Re: ramsonware URI list
On 2017-07-15 12:19, David B Funk wrote:

> Another way to use that data is to extract the hostnames and feed them
> into a local URI-dnsbl.
> Using "rbldnsd" is an easy to maintain, lightweight (low CPU/RAM
> overhead) way to implement a local DNSbl for multiple purposes (EG an
> IP-addr based list for RBLDNSd or host-name based URI-dnsbl).
> The URI-dnsbl has an advantage of being easy to add names (just 'cat'
> them on to the end of the data-file with appropriate suffix) and
> doesn't require a restart of any daemon to take effect.

But one still needs to signal rbldnsd to reload the data, right?

If one has just hostname data or fixed IP address data (no ranges), yet
another option is the "constant database" cdb [1]. I use it a lot for
these purposes. You can even match domain wildcards, by successively
stripping the most significant parts of the subject domain before
trying the match.

I am wondering if (or why not) a similar no-daemon option exists for
CIDR range data. There are definitely perl modules that manipulate such
data, but none I'm aware of with a built-in compiled, quickly loaded
dataset format.

[1] https://cr.yp.to/cdb.html

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.
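A minimal sketch of that successive-stripping cdb lookup, assuming the
CPAN module CDB_File and a file "domains.cdb" keyed on bare hostnames
(the file name and key format are illustrative, not a standard):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use CDB_File;

  # Build the file once, e.g.:
  #   CDB_File::create %domains, 'domains.cdb', 'domains.tmp';
  tie my %bl, 'CDB_File', 'domains.cdb' or die "tie: $!";

  sub listed {
      my ($host) = @_;
      my @labels = split /\./, lc $host;
      # Try "www.evil.example", then "evil.example", and so on.
      while (@labels >= 2) {
          my $key = join '.', @labels;
          return $key if exists $bl{$key};
          shift @labels;    # strip the leftmost label and retry
      }
      return;
  }

  print listed('www.invoiceholderqq.com') ? "listed\n" : "clean\n";

Once built, the cdb is a compiled, quickly loaded dataset; lookups are
just hash probes against a read-only file, with no daemon involved.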
Re: ramsonware URI list
On Sat, 15 Jul 2017 13:13:31 -0500 (CDT) David B Funk wrote:

> On Sat, 15 Jul 2017, Antony Stone wrote:
>
> One observation; that list has over 10,000 entries which means that
> you're going to be adding thousands of additional rules to SA on an
> automated basis.
>
> Some time in the past other people had worked up automated mechanisms
> to add large numbers of rules derived from example spam messages (Hi
> Chris;) and there were performance issues (significant increase in SA
> load time, memory usage, etc).

I'm not an expert on perl internals, so I may be wide of the mark, but
I would have thought that the most efficient way to do this using uri
rule(s) would be to generate a single regex recursively, so that
scanning would be O(log(n)) in the number of entries rather than O(n).

You start by stripping the http:// and then make a list of all the
first characters; then for each character you recurse. You end up with
something like

  ^http://(a(...)|b(...)...|z(...))

where each of the (...) contains a similar list of alternations to the
top level.

You can take this a bit further and detect when all the strings in the
current list start with a common sub-string - you can then generate the
equivalent of a patricia trie in regex form.

> Be aware, you may run into that situation. Using a URI-dnsbl avoids
> that risk.

The list contains full URLs; I presume there's a reason for that. For
example:

http://invoiceholderqq.com/85.exe
http://invoiceholderqq.com/87.exe
http://invoiceholderqq.com/93.exe
http://inzt.net/08yhrf3
http://inzt.net/0ftce4
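For what it's worth, the CPAN module Regexp::Assemble builds exactly
this kind of trie-shaped regex, so nobody has to hand-roll the
recursion; note also that perl 5.10+ already applies an internal trie
optimisation to long literal alternations. A sketch, using the file
name from the thread (the rule name is made up for illustration):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Regexp::Assemble;

  my $ra = Regexp::Assemble->new;
  open my $fh, '<', 'lista.txt' or die "lista.txt: $!";
  while (my $url = <$fh>) {
      chomp $url;
      $url =~ s{^http://}{};        # strip the scheme first
      $ra->add(quotemeta $url);     # literal entry; shared prefixes merge
  }
  my $re = $ra->re;                 # one patricia-trie-like regex

  # One rule instead of 10,000:
  print "uri      LOCAL_RANSOMWARE_URL m{^http://$re}\n";
  print "describe LOCAL_RANSOMWARE_URL URL on ransomware blacklist\n";
  print "score    LOCAL_RANSOMWARE_URL 5.0\n";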
Re: ramsonware URI list
Ahah, ok ok. Thank you for the reply.
Re: ramsonware URI list
On 7/15/2017 2:13 PM, David B Funk wrote:
> How quickly do stale entries get removed from it?

I randomly sorted this list, then I tried visiting 10 randomly selected
links. I know that isn't a very large sample size, but it is a strong
indicator, since they were purely randomly chosen. 9 of the 10 links
had already been taken down. So there might be much stale data in that
list?

I also extracted the host names, deleted duplicates, randomly sorted
those, then ran checks of 500 randomly selected host names against
SURBL, URIBL, DBL, and ivmURI. The number of hits on all 4 lists was
shockingly low. But I think that probably has more to do with stale
data on this URL list (and this is really a URL list, not a URI list)
than with any lack of effectiveness of those domain/URI blacklists.

Still, there can be situations where a URI list won't list such a host
name due to too much collateral damage - but where a URL list that
specifically lists the entire URL can still be effective. Because such
a URL list would be LESS efficient (due to being rules-based), it would
be preferable for such a list to have much less stale data - and
perhaps to focus on the stuff that isn't found on any (or very many) of
the 4 major URI lists I mentioned, so as to keep the data small and
focused, for maximum processing efficiency.

-- 
Rob McEwen
http://www.invaluement.com
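For anyone wanting to reproduce that kind of spot-check, here is a
minimal sketch querying one hostname against SURBL's public
multi.surbl.org zone with Net::DNS (mind the lists' query-volume
policies before checking hundreds of names):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Net::DNS;

  my $host  = shift // 'invoiceholderqq.com';
  my $res   = Net::DNS::Resolver->new;
  my $reply = $res->query("$host.multi.surbl.org", 'A');

  if ($reply) {
      for my $rr (grep { $_->type eq 'A' } $reply->answer) {
          # 127.0.0.x return codes encode which sub-list matched
          print "$host listed: ", $rr->address, "\n";
      }
  } else {
      print "$host not listed\n";
  }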
Re: ramsonware URI list
On Sat, 2017-07-15 at 09:59 -0700, Ian Zimmerman wrote:
> On 2017-07-15 11:59, Antony Stone wrote:
>
> > Maybe other people have further optimisations.
>
> With awk already part of the pipeline, all those seds are screaming
> for a vacation.

Indeed. I think the whole job can be done fairly easily with a single
awk script. I didn't look at the input (have parts of it appeared on
this list?), which makes it hard to work out what the entire pipeline
does. However, the more I look at it, the more it looks as if awk's
default action of chopping each line into words, combined with the awk
functions that use regexes to modify words - gsub() and friends -
should simplify the whole exercise.

To the OP: if you want to raise your game with sed and awk, about the
best thing you can do is to get the O'Reilly "sed & awk" book by Dale
Dougherty - it's a real eye-opener and much easier to read and
understand than the manpages, if only because it's better organised and
includes a lot of example code.

Martin
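Martin's point is about awk, but for illustration the whole
sed/sed/sed/sed/sed/nl/awk/for/sed chain collapses into a single pass
in perl too - a sketch, using the file names from the thread, with
quotemeta doing the dot-escaping (run as: perl gen.pl > blacklist.cf):

  #!/usr/bin/perl
  use strict;
  use warnings;

  open my $fh, '<', 'lista.txt' or die "lista.txt: $!";
  my $n = 0;
  while (my $url = <$fh>) {
      chomp $url;
      $url =~ s{^http://}{};     # drop the scheme
      $url =~ s{/.*$}{};         # drop the path, keeping the hostname
      my $re = quotemeta $url;   # escape the dots (and anything else)
      $n++;
      print "uri RULE_NR_$n /$re\\b/i\n";
      print "describe RULE_NR_$n Url presente nella Blacklist Ransomware\n";
      print "score RULE_NR_$n 5.0\n";
  }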
Re: ramsonware URI list
On Sat, 15 Jul 2017, Antony Stone wrote:
> On Saturday 15 July 2017 at 11:19:54, mastered wrote:
>> Hi Nicola,
>>
>> I'm not good at SHELL script language, but this might be fine:
>>
>> 1 - Save file into lista.txt
>>
>> 2 - transform lista.txt in spamassassin rules:
>>
>> cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' \
>>   | sed s'/^/\//' | sed s'/$/\\b\/i/' | nl \
>>   | awk '{print "uri;RULE_NR_"$1";"$2" describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware score;RULE_NR_"$1";5.0" }' > listone.txt
>> for i in $(sed -n p listone.txt) ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf

[snip..]

One observation; that list has over 10,000 entries, which means that
you're going to be adding thousands of additional rules to SA on an
automated basis.

Some time in the past other people had worked up automated mechanisms
to add large numbers of rules derived from example spam messages (Hi
Chris;) and there were performance issues (significant increase in SA
load time, memory usage, etc).

Be aware, you may run into that situation. Using a URI-dnsbl avoids
that risk.

I see that list gets updated frequently. How quickly do stale entries
get removed from it? I couldn't find a policy statement about that,
other than the note about the 30 days retention for the RW_IPBL list.
When I checked a random sample of the URLs on that list, the majority
of them hit 404 errors. If that list grows without bound and isn't
periodically pruned of stale entries, then it will become problematic
for automated rule generation.

I'm not saying that this isn't an idea worth pursuing, just be aware
there may be issues.

-- 
Dave Funk                          University of Iowa College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
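The 404 spot-check Dave describes can be scripted with HEAD requests -
a sketch, again assuming lista.txt (these are live malware URLs, so
HEAD only, and preferably from a throwaway machine):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use LWP::UserAgent;
  use List::Util qw(shuffle);

  my $ua = LWP::UserAgent->new(timeout => 10, max_redirect => 0);

  open my $fh, '<', 'lista.txt' or die "lista.txt: $!";
  chomp(my @urls = <$fh>);

  for my $url ((shuffle @urls)[0 .. 19]) {   # 20 random entries
      my $res = $ua->head($url);
      # 404s and timeouts suggest the entry has gone stale
      printf "%s %s\n", $res->code, $url;
  }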
Re: ramsonware URI list
On Sat, 15 Jul 2017, Antony Stone wrote:
> On Saturday 15 July 2017 at 11:19:54, mastered wrote:
>> Hi Nicola,
>>
>> I'm not good at SHELL script language, but this might be fine:
>>
>> 1 - Save file into lista.txt
>>
>> 2 - transform lista.txt in spamassassin rules:
>>
>> cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' \
>>   | sed s'/^/\//' | sed s'/$/\\b\/i/' | nl \
>>   | awk '{print "uri;RULE_NR_"$1";"$2" describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware score;RULE_NR_"$1";5.0" }' > listone.txt
>> for i in $(sed -n p listone.txt) ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf
>>
>> If anyone can optimize it, I'm happy.
>
> My first comment would be "useless use of cat" :)
>
> My second comment would be that you can combine the sed commands into
> a single invocation, separating the expressions with ; so that you
> only have to call sed once at the start of all that:
>
> sed 's/http:\/\///; s/\/.*//; s/\./\\./g; s/^/\//; s/$/\\b\/i/' lista.txt | nl ...

Another observation/optimization: use the perl pattern-match separator
character specifier to avoid delimiter collision (EG "m!"). The
following two regexes are functionally equivalent, but one is easier to
write/read:

  /http:\/\/site\.com\/this\/that\/the\/other\//i
  m!http://site\.com/this/that/the/other/!i

The second one avoids the "leaning toothpick syndrome":
https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome

Another way to use that data is to extract the hostnames and feed them
into a local URI-dnsbl. Using "rbldnsd" is an easy to maintain,
lightweight (low CPU/RAM overhead) way to implement a local DNSbl for
multiple purposes (EG an IP-addr based list for RBLDNSd or host-name
based URI-dnsbl). The URI-dnsbl has an advantage of being easy to add
names (just 'cat' them on to the end of the data-file with appropriate
suffix) and doesn't require a restart of any daemon to take effect.

Clearly it has a greater risk of FPs than a targeted rule that matches
on the specific URL of the malware. However, if the site was
purpose-created by blackhats to disseminate malware, or is a legitimate
site that has been compromised and isn't being maintained, then there's
a high probability that it will be (ab)used again for other payloads.
In that case, blacklisting the host name gets all future garbage too.

IMHO: any site on that list with more than 3 entries, or a registration
age of less than a year, is fair game for URI-dnsbl listing.

Looking at that data, there are clearly several patterns that could be
used to create targeted rules.

-- 
Dave Funk                          University of Iowa College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
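For anyone wiring this up, a sketch of the two halves Dave describes -
an rbldnsd "dnset" data file plus the SpamAssassin glue. The zone name,
file name, and rule name are all illustrative; rbldnsd would serve the
file with a zone spec along the lines of uribl.example.local:dnset:rw.dnset:

  # rw.dnset - new hostnames can simply be cat'ed onto the end
  :127.0.0.2:Listed on local ransomware URI list
  invoiceholderqq.com
  inzt.net

  # local.cf
  urirhssub URIBL_LOCAL_RW uribl.example.local. A 127.0.0.2
  body      URIBL_LOCAL_RW eval:check_uridnsbl('URIBL_LOCAL_RW')
  describe  URIBL_LOCAL_RW Hostname on local ransomware URI DNSBL
  score     URIBL_LOCAL_RW 5.0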
Re: ramsonware URI list
On 2017-07-15 11:59, Antony Stone wrote:

> Maybe other people have further optimisations.

With awk already part of the pipeline, all those seds are screaming for
a vacation.

Also, isn't the following command just a no-op?

  sed -n p

A couple of quick tests failed to detect any difference from cat ;-)

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.
Re: "bout u" campaign
On Thu, 13 Jul 2017 18:26:54 -0400 Alex wrote:

> Hi,
>
> >> Are you paying for DCC? I think we're over their limit and they
> >> blacklisted us long ago, lol.
> >
> > I have my own DCC server joined into the DCC network.
> >
> > https://www.dcc-servers.net/dcc/
>
> So you only provide spam services for your own users? Or do you pay?
>
> > I am classifying about 10K ham and 8K spam each day which I also
> > use in the masscheck processing (currently on hold). Since I have
> > started doing this
>
> Through autolearn?
>
> It is otherwise extremely time-intensive.
>
> > Yep. Again my block threshold is 6.0 in MailScanner and I have
> > less default trust for FREEMAIL senders. I also have meta rules
> > based on FREEMAIL and other hits that add to the score based on
> > combinations I have seen over the years.
>
> Adjusting many of the default rules disrupts the score balance
> created by masschecks, no?
>
> I want to avoid having to juggle scores around, in addition to
> already worrying about writing rules that ultimately have the same
> effect as existing metas.
>
> >>> 2.2 ENA_DIGEST_FREEMAIL  Freemail account hitting message
> >>> digest spam seen by the Internet (DCC, Pyzor, or Razor).
>
> Are you worried about overlap between the checksum systems?
>
> I've enabled DCC again today, and remembered what I don't like about
> it. Do you have DCC_CHECK at its default 1.1 score? That's quite
> high for something described as "bulk mail" when bulk mail is
> already scored very close to 5.0.

And with FREEMAIL_FROM plus DCC_CHECK (or any digest) you have

  1.2 FREEMAIL_FROM
  2.2 DCC_CHECK
  2.2 ENA_DIGEST_FREEMAIL
  0.0 ENA_BAD_SPAM

which is 5.6 points. And judging by the name, at least in some cases,
maybe all:

  2.2 ENA_BAD_SPAM_FREEMAIL

which makes 7.8 points. This is something that presumably works for
him, but could cause problems in general.
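The actual ENA_* definitions were never posted to the thread, but the
shape being described would be something like this hypothetical
reconstruction (rule names from the thread, logic guessed):

  meta  ENA_DIGEST_FREEMAIL (FREEMAIL_FROM && (DCC_CHECK || PYZOR_CHECK || RAZOR2_CHECK))
  score ENA_DIGEST_FREEMAIL 2.2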
Re: "bout u" campaign
On 07/14/2017 09:22 PM, Alex wrote:
> Hi,
>
>>>> The ENA_BAD_SPAM rule is a combination of 2 different types
>>>> (reputation and content) rules with an AND between them. For
>>>> example (this is about one-third of the rule):
>>>
>>> Is it usable like this?
>>
>> Try it out with a score of 0.001 and see what you think. It should
>> have been valid. Just drop it in and run:
>>
>> spamassassin -D --lint 2>&1 | /bin/grep -Ei '(failed|undefined
>> dependency|score set for non-existent rule)' | /bin/grep ENA_
>
> By "usable" I meant have you included enough of the rule for it to
> really be effective? I let it run for the day, and it's just not
> anchored well enough to provide any meaningful benefit. It's hitting
> on jcpenny, vresp.com, constantcontact, sendgrid, facebook, etc.

I have all of those senders in whitelist_auth entries. The ENA_BAD_SPAM
rule has a score of 0.001 just as a place holder for other meta rules
based on it that have a score of 1.2 - 3.2.

Once you set up different tiers of senders and SHORTCIRCUIT all of the
trusted senders that usually score very low, you will be able to handle
regular and untrusted senders more aggressively. As I have said before,
I SHORTCIRCUIT as ham thousands of domains based on their envelope-from
domain, as long as they have legit unsubscribe/opt-out processes/links.
Now I don't have to worry about these being falsely categorized as spam
based on content. I don't SHORTCIRCUIT any FREEMAIL domains or any
domains that have user mailboxes that can be compromised.

My MTA blocks the majority of the junk, so what passes through SA is
mostly SHORTCIRCUIT'd as ham. Less than 5 percent is spam blocked by
SA. I only get the occasional report of spam from customers from
compromised accounts now, which are very difficult to block based on
reputation. Content-based rules are really the only way, since these
spammers are crafting zero-hour email designed to get through major
mail filters.

-- 
David Jones
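A sketch of the shortcircuit tier being described, using the stock
Shortcircuit plugin (the rule name, example domains, and scores are
illustrative, not the actual production config):

  # enabled via v320.pre:
  # loadplugin Mail::SpamAssassin::Plugin::Shortcircuit

  header       LOCAL_TRUSTED_SENDER EnvelopeFrom =~ /\@(?:bigstore\.example|newsletter\.example)$/i
  describe     LOCAL_TRUSTED_SENDER Envelope-from domain on local trusted tier
  priority     LOCAL_TRUSTED_SENDER -1000
  shortcircuit LOCAL_TRUSTED_SENDER ham
  score        LOCAL_TRUSTED_SENDER -20

The negative priority makes the rule run early, so trusted bulk senders
skip the rest of the scan entirely.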
Re: ramsonware URI list
On Saturday 15 July 2017 at 11:19:54, mastered wrote:

> Hi Nicola,
>
> I'm not good at SHELL script language, but this might be fine:
>
> 1 - Save file into lista.txt
>
> 2 - transform lista.txt in spamassassin rules:
>
> cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' \
>   | sed s'/^/\//' | sed s'/$/\\b\/i/' | nl \
>   | awk '{print "uri;RULE_NR_"$1";"$2" describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware score;RULE_NR_"$1";5.0" }' > listone.txt
> for i in $(sed -n p listone.txt) ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf
>
> If anyone can optimize it, I'm happy.

My first comment would be "useless use of cat" :)

My second comment would be that you can combine the sed commands into a
single invocation, separating the expressions with ; so that you only
have to call sed once at the start of all that:

sed 's/http:\/\///; s/\/.*//; s/\./\\./g; s/^/\//; s/$/\\b\/i/' lista.txt | nl ...

My only other comment is that you might want to adjust the spelling of
Ransomware :)

Maybe other people have further optimisations.

Antony.

-- 
The gravitational attraction exerted by a single doctor at a distance
of 6 inches is roughly twice that of Jupiter at its closest point to
the Earth.

Please reply to the list; please *don't* CC me.
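Equivalently, for readers who find one expression per option easier to
maintain, sed accepts multiple -e arguments (these are the same five
substitutions, just split up):

  sed -e 's/http:\/\///' \
      -e 's/\/.*//' \
      -e 's/\./\\./g' \
      -e 's/^/\//' \
      -e 's/$/\\b\/i/' lista.txt | nl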
Re: ramsonware URI list
Hi Nicola,

I'm not good at SHELL script language, but this might be fine:

1 - Save file into lista.txt

2 - transform lista.txt in spamassassin rules:

cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' \
  | sed s'/^/\//' | sed s'/$/\\b\/i/' | nl \
  | awk '{print "uri;RULE_NR_"$1";"$2" describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware score;RULE_NR_"$1";5.0" }' > listone.txt
for i in $(sed -n p listone.txt) ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf

If anyone can optimize it, I'm happy.

Alberto.
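For readers skimming the thread: given a list entry such as
http://invoiceholderqq.com/85.exe (an example quoted elsewhere in the
thread), the pipeline numbers the hostname and emits three lines of
blacklist.cf per URL. The semicolons in the awk output are stand-ins
for spaces, so that the for loop's word-splitting puts each rule
component on its own line before the final sed restores the spaces:

  uri RULE_NR_1 /invoiceholderqq\.com\b/i
  describe RULE_NR_1 Url presente nella Blacklist Ramsonware
  score RULE_NR_1 5.0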