Re: ramsonware URI list

2017-07-15 Thread Ian Zimmerman
On 2017-07-15 12:19, David B Funk wrote:

> Another way to use that data is to extract the hostnames and feed them
> into a local URI-dnsbl.

> Using "rbldnsd" is an easy to maintain, lightweight (low CPU/RAM
> overhead) way to implement a local DNSbl for multiple purposes (EG an
> IP-addr based list for RBLDNSd or host-name based URI-dnsbl).

> The URI-dnsbl has an advantage of being easy to add names (just 'cat'
> them on to the end of the data-file with appropriate suffix) and
> doesn't require a restart of any daemon to take effect.

But one still needs to signal rbldnsd to reload the data, right?

If one has just hostname data or fixed IP address data (no ranges), yet
another option is the "constant database" cdb [1].  I use it a lot for
these purposes.  You can even match domain wildcards, by successively
stripping the leftmost (most specific) labels of the subject domain before
trying the match.
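
A minimal sketch of that lookup in Perl, using the CDB_File module (the
data-file name domains.cdb is illustrative; it would be built once from the
blacklist hostnames with CDB_File::create):

  #!/usr/bin/perl
  # Sketch: look a hostname up in a cdb file, falling back to its parent
  # domains, so an entry for "example.com" also matches "a.b.example.com".
  use strict;
  use warnings;
  use CDB_File;

  tie my %bl, 'CDB_File', 'domains.cdb' or die "domains.cdb: $!";

  sub listed {
      my ($host) = @_;
      my @labels = split /\./, lc $host;
      while (@labels >= 2) {
          my $candidate = join '.', @labels;
          return $candidate if exists $bl{$candidate};
          shift @labels;            # strip the leftmost (most specific) label
      }
      return;
  }

  my $hit = listed('some.host.invoiceholderqq.com');
  print $hit ? "listed as $hit\n" : "not listed\n";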

I am wondering whether (and if not, why not) a similar no-daemon option
exists for CIDR range data.  There are definitely perl modules that
manipulate such data, but none I'm aware of with a built-in compiled,
quickly loaded dataset format.
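
As a point of comparison (though not the compiled on-disk format asked about),
a radix-trie module such as Net::Patricia loads a plain CIDR text file quickly
enough for many non-daemon uses; a sketch, with cidrs.txt as an assumed input
file:

  #!/usr/bin/perl
  # Sketch: load CIDR ranges (one per line) into a radix trie and match
  # addresses against them.  The trie is rebuilt in memory at load time;
  # it is not read from a pre-compiled on-disk dataset.
  use strict;
  use warnings;
  use Net::Patricia;

  my $pt = Net::Patricia->new;
  open my $fh, '<', 'cidrs.txt' or die "cidrs.txt: $!";
  while (<$fh>) {
      chomp;
      next unless /\S/;
      $pt->add_string($_, $_);      # store the CIDR itself as the value
  }

  for my $ip (@ARGV) {
      my $hit = $pt->match_string($ip);
      print $hit ? "$ip is in $hit\n" : "$ip is not listed\n";
  }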

[1]
https://cr.yp.to/cdb.html
-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.


Re: ramsonware URI list

2017-07-15 Thread RW
On Sat, 15 Jul 2017 13:13:31 -0500 (CDT)
David B Funk wrote:

> > On Sat, 15 Jul 2017, Antony Stone wrote:

> One observation; that list has over 10,000 entries which means that
> you're going to be adding thousands of additional rules to SA on an
> automated basis.
> 
> Some time in the past other people had worked up automated mechanisms
> to add large numbers of rules derived from example spam messages (Hi
> Chris;) and there were performance issues (significant increase in SA
> load time, memory usage, etc).

I'm not an expert on perl internals, so I may be wide of the mark,
but I would have thought that the most efficient way to do this
using uri rule(s) would be to generate a single regex recursively so
that scanning would be O(log(n)) in the number of entries rather than
O(n). 

You start by stripping the http:// and then make a list of all the
first characters, then for each character you recurse. You end up
with something like

^http://(a(...)|b(...)...|z(...))

Where each of the (...) contains a similar list of alternations to the
top level. 

You can take this a bit further and detect when all the strings in
the current list start with a common sub-string - you can then generate
the equivalent of a patricia trie in regex form.
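
A rough sketch of that recursion in Perl (illustrative only: lista.txt and the
rule name LOCAL_RANSOM_HOST are assumptions, and it is built here from the
hostnames, although the same recursion works on full stripped URLs; note also
that perl 5.10+ applies a similar trie optimisation internally to alternations
of plain literals, and the CPAN module Regexp::Assemble automates building
such regexes):

  #!/usr/bin/perl
  # Sketch: build one trie-shaped alternation regex from the blacklist
  # hostnames and emit a single SpamAssassin uri rule.
  use strict;
  use warnings;

  sub trie_re {                     # list of literal strings -> trie regex
      my @strings = @_;
      my $ends_here = 0;
      my %by_first;
      for my $s (@strings) {
          if ($s eq '') { $ends_here = 1; next }
          push @{ $by_first{ substr $s, 0, 1 } }, substr $s, 1;
      }
      my @alts;
      for my $c (sort keys %by_first) {
          my @tails   = @{ $by_first{$c} };
          my $tail_re = (@tails == 1 && $tails[0] eq '') ? '' : trie_re(@tails);
          push @alts, quotemeta($c) . $tail_re;
      }
      return '' unless @alts;
      my $re = (@alts > 1 || $ends_here) ? '(?:' . join('|', @alts) . ')'
                                         : $alts[0];
      $re .= '?' if $ends_here;     # one of the strings ended at this point
      return $re;
  }

  open my $fh, '<', 'lista.txt' or die "lista.txt: $!";
  my (@hosts, %seen);
  while (<$fh>) {
      chomp;
      next unless m!^https?://([^/\s]+)!;
      push @hosts, lc $1 unless $seen{lc $1}++;
  }
  print 'uri      LOCAL_RANSOM_HOST /http:\/\/', trie_re(@hosts), '\b/i', "\n";
  print 'describe LOCAL_RANSOM_HOST hostname from the ransomware URL list', "\n";
  print 'score    LOCAL_RANSOM_HOST 5.0', "\n";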


> Be aware, you may run into that situation. Using a URI-dnsbl avoids
> that risk.

The list contains full URLs; I presume there's a reason for that. For
example:

http://invoiceholderqq.com/85.exe
http://invoiceholderqq.com/87.exe
http://invoiceholderqq.com/93.exe
http://inzt.net/08yhrf3
http://inzt.net/0ftce4


Re: ramsonware URI list

2017-07-15 Thread mastered
Ahuhaauahu ok ok

Thank you for the reply







Re: ramsonware URI list

2017-07-15 Thread Rob McEwen

On 7/15/2017 2:13 PM, David B Funk wrote:

How quickly do stale entries get removed from it?


I randomly sorted this list, then tried visiting 10 randomly selected
links. I know that isn't a very large sample size, but it is a strong
indicator since they were purely randomly chosen. 9 of the 10 links had
already been taken down, so there may be a lot of stale data in that list.
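
For anyone wanting to repeat that spot-check on a larger sample, a sketch
along the same lines (lista.txt and the sample size are assumptions, and this
is best run from an isolated machine, since the URLs point at live malware):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use List::Util qw(shuffle);
  use LWP::UserAgent;

  open my $fh, '<', 'lista.txt' or die "lista.txt: $!";
  chomp(my @urls = <$fh>);
  my @sample = (shuffle @urls)[0 .. 49];      # 50 random URLs

  my $ua   = LWP::UserAgent->new(timeout => 10, max_redirect => 0);
  my $live = 0;
  for my $url (@sample) {
      my $resp = $ua->head($url);             # HEAD only, never fetch the payload
      $live++ if $resp->is_success;
      printf "%-60s %s\n", $url, $resp->status_line;
  }
  print "$live of ", scalar @sample, " sampled URLs still respond\n";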


I also extracted the host names, deleted duplicates, randomly sorted
those, then ran checks of 500 randomly selected host names against
SURBL, URIBL, DBL, and ivmURI. The number of hits on all 4 lists was
shockingly low. But I think that probably has more to do with stale data
on this URL list (and this is really a URL list, not a URI list) than
with a lack of effectiveness of these other domain/URI blacklists.
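
A sketch of that sort of lookup with Net::DNS, querying the public zones for
SURBL, URIBL and the Spamhaus DBL (ivmURI is subscription-only, so it is left
out here; the public mirrors also have query limits, so bulk runs should go
through a local mirror instead):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Net::DNS;

  my $host  = shift @ARGV or die "usage: $0 hostname\n";
  my $res   = Net::DNS::Resolver->new;
  my @zones = qw(multi.surbl.org multi.uribl.com dbl.spamhaus.org);

  for my $zone (@zones) {
      my $reply = $res->query("$host.$zone", 'A');
      if ($reply) {
          my @addrs = map { $_->address } grep { $_->type eq 'A' } $reply->answer;
          print "$zone: listed (@addrs)\n";
      } else {
          print "$zone: not listed\n";
      }
  }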


Still, there can be situations where a URI list won't list such a host
name due to too much collateral damage, yet a URL list that specifically
lists the entire URL can still be effective.


Because such a URL list would be LESS efficient (due to being
rules-based), it would be preferable for such a list to have much less
stale data, and perhaps to focus on the stuff that isn't found on any
(or very many) of the 4 major URI lists I mentioned, so as to keep the
data small and focused for maximum processing efficiency.


--
Rob McEwen
http://www.invaluement.com


Re: ramsonware URI list

2017-07-15 Thread Martin Gregorie
On Sat, 2017-07-15 at 09:59 -0700, Ian Zimmerman wrote:
> On 2017-07-15 11:59, Antony Stone wrote:
> 
> > Maybe other people have further optimisations.
> 
> With awk already part of the pipeline, all those seds are screaming
> for
> a vacation.
> 
Indeed. I think the whole job can be done fairly easily with a single
awk script. I didn't look at the input (have parts of it appeared on
this list?), which makes it hard to work out what the entire pipeline
does. However, the more I look at it, the more it looks as if awk's
default action of chopping each line into words, combined with awk
functions that use regexes to modify words - gsub() and friends -
should simplify the whole exercise.

To the OP: if you want to raise your game with sed and awk, about
the best thing you can do is to get the O'Reilly "sed & awk" book by
Dale Dougherty - it's a real eye-opener and much easier to read and
understand than the manpages, if only because it's better organised and
includes a lot of example code.


Martin



Re: ramsonware URI list

2017-07-15 Thread David B Funk

On Sat, 15 Jul 2017, Antony Stone wrote:


On Saturday 15 July 2017 at 11:19:54, mastered wrote:


Hi Nicola,

I'm not good at SHELL script language, but this might be fine:

1 - Save file into lista.txt

2 - transform lista.txt into spamassassin rules:

cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print "uri;RULE_NR_"$1";"$2"
describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt) ;
do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf

[snip..]

One observation; that list has over 10,000 entries which means that you're going 
to be adding thousands of additional rules to SA on an automated basis.


Some time in the past other people had worked up automated mechanisms to add 
large numbers of rules derived from example spam messages (Hi Chris;) and there 
were performance issues (significant increase in SA load time, memory usage, 
etc).

Be aware, you may run into that situation. Using a URI-dnsbl avoids that risk.

I see that list gets updated frequently. How quickly do stale entries get
removed from it?
I couldn't find a policy statement about that other than the note about the
30-day retention for the RW_IPBL list.
Checking a random sample of the URLs on that list, the majority of them hit
404 errors.
If that list grows without bound and isn't periodically pruned of stale
entries, then it will become problematic for automated rule generation.


I'm not saying that this isn't an idea worth pursuing, just be aware there may 
be issues.


--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_adminIowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: ramsonware URI list

2017-07-15 Thread David B Funk

On Sat, 15 Jul 2017, Antony Stone wrote:


On Saturday 15 July 2017 at 11:19:54, mastered wrote:


Hi Nicola,

I'm not good at SHELL script language, but this might be fine:

1 - Save file into lista.txt

2 - transform lista.txt into spamassassin rules:

cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print "uri;RULE_NR_"$1";"$2"
describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt)
; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf


If anyone can optimize it, i'm happy.


My first comment would be "useless use of cat" :)

My second comment would be that you can combine sed commands into a single
string, separated by ; so that you only have to call sed itself once at the
start of all that:

sed "s'/http:\/\///'; s'/\/.*//'; s'/\./\\./g'; s'/^/\//'; s'/$/\\b\/i/'"
lista.txt | nl .


Another observation/optimization: use the perl pattern-match separator
character specifier to avoid delimiter collision (EG "m!").


The following two regexes are functionally equivalent but one is easier to 
write/read:


  /http:\/\/site\.com\/this\/that\/the\/other\//i

  m!http://site\.com/this/that/the/other/!i

Second one avoids the "Leaning toothpick syndrome" 
https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome


Another way to use that data is to extract the hostnames and feed them into a
local URI-dnsbl.

Using "rbldnsd" is an easy-to-maintain, lightweight (low CPU/RAM overhead) way
to implement a local DNSbl for multiple purposes (EG an IP-addr based list for
RBLDNSd or a host-name based URI-dnsbl).  The URI-dnsbl has the advantage of
being easy to add names to (just 'cat' them onto the end of the data-file with
the appropriate suffix) and doesn't require a restart of any daemon to take
effect.

Clearly it has a greater risk of FPs than a targeted rule that matches on the
specific URL of the malware. However, if the site is purpose-created by
blackhats to disseminate malware, or is a legitimate site that has been
compromised and isn't being maintained, then there's a high probability that
it will be (ab)used again for other payloads. In that case blacklisting the
host name gets all future garbage too.

IMHO: any site on that list with more than 3 entries or a registration age of
less than a year is fair game for URIdnsbl listing.
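
A sketch of the extraction step as a Perl one-liner (the file names are
illustrative, and the entries still need whatever suffix/value the local
data-file format expects):

  perl -lne 'print lc $1 if m!^https?://([^/:\s]+)! && !$seen{lc $1}++' \
      lista.txt >> local-uribl.data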


Looking at that data there are clearly several patterns that could be used to 
create targeted rules.



--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_adminIowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: ramsonware URI list

2017-07-15 Thread Ian Zimmerman
On 2017-07-15 11:59, Antony Stone wrote:

> Maybe other people have further optimisations.

With awk already part of the pipeline, all those seds are screaming for
a vacation.

Also, isn't the following command just a no-op?

sed -n p

A couple of quick tests failed to detect any difference from cat ;-)

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.


Re: "bout u" campaign

2017-07-15 Thread RW
On Thu, 13 Jul 2017 18:26:54 -0400
Alex wrote:

> Hi,
> 
> >> Are you paying for DCC? I think we're over their limit and they
> >> blacklisted us long ago, lol.  
> >
> > I have my own DCC server joined into the DCC network.
> >
> > https://www.dcc-servers.net/dcc/  
> 
> So you only provide spam services for your own users? Or do you pay?
> 
> > I am classifying about 10K ham and 8K spam each day which I also
> > use in the masscheck processing (currently on hold).  Since I have
> > started doing this  
> 
> Through autolearn?
> 
> It is otherwise extremely time-intensive.
> 
> > Yep.  Again my block threshold is 6.0 in MailScanner and I have
> > less default trust for FREEMAIL senders.  I also have meta rules
> > based on FREEMAIL and other hits that add to the score based on
> > combinations I have seen over the years.  
> 
> Adjusting many of the default rules disrupts the score balance created
> by masschecks, no?
> 
> I want to avoid having to juggle scores around, in addition to already
> worrying about writing rules that ultimately have the same effect as
> existing metas.
> 
> >>>   2.2 ENA_DIGEST_FREEMAIL  Freemail account hitting message
> >>> digest spam seen by the Internet (DCC, Pyzor, or Razor).
> 
> Are you worried about overlap between the checksum systems?
> 
> I've enabled DCC again today, and remembered what I don't like about
> it. Do you have DCC_CHECK at its default 1.1 score? That's quite high
> for something described as "bulk mail" when bulk mail is already
> scored very close to 5.0.

And with  FREEMAIL_FROM plus DCC_CHECK (or any digest) you
have 

1.2 FREEMAIL_FROM 
2.2 DCC_CHECK
2.2 ENA_DIGEST_FREEMAIL
0.0 ENA_BAD_SPAM

which is 5.6 points. And judging by the name, at least in some cases,
maybe all:

2.2 ENA_BAD_SPAM_FREEMAIL

which makes  7.8 points. This is something that presumably works for
him, but could cause problems in general.

 






Re: "bout u" campaign

2017-07-15 Thread David Jones

On 07/14/2017 09:22 PM, Alex wrote:

Hi,


The ENA_BAD_SPAM rule is a combination of 2 different types (reputation
and
content) rules with an AND between them.  For example (this is is about
one-third of the rule):


Is it usable like this?


Try it out with a score of 0.001 and see what you think.  It should have
been valid.  Just drop it in and run:

spamassassin -D --lint 2>&1 | /bin/grep -Ei '(failed|undefined
dependency|score set for non-existent rule)' | /bin/grep ENA_


By "usable" I meant have you included enough of the rule for it to
really be effective?

I let it run for the day, and it's just not anchored well enough to
provide any meaningful benefit. It's hitting on jcpenny, vresp.com,
constantcontact, sendgrid, facebook, etc.



I have all of those senders in whitelist_auth entries.  The ENA_BAD_SPAM 
has a score of 0.001 just as a place holder for other meta rules based 
on it that have a score of 1.2 - 3.2.


Once you set up different tiers of senders and SHORTCIRCUIT all of the 
trusted senders that usually score very low, you will be able to handle 
regular and untrusted senders more aggressively.


As I have said before, I SHORTCIRCUIT as ham thousands of domains based 
on their envelope-from domain as long as they have legit unsubscribe/opt 
out processes/links.  Now I don't have to worry about these being 
falsely categorized as spam based on content.  I don't SHORTCIRCUIT any 
FREEMAIL domains or any domains that have user mailboxes that can be 
compromised.


My MTA blocks the majority of the junk, so what passes through SA is
mostly SHORTCIRCUIT'd as ham.  Less than 5 percent is spam blocked by
SA.  I only get the occasional customer report of spam from compromised
accounts now, and those are very difficult to block based on reputation.
Content-based rules are really the only way, since these spammers are
crafting zero-hour email that is designed to get through major mail
filters.


--
David Jones


Re: ramsonware URI list

2017-07-15 Thread Antony Stone
On Saturday 15 July 2017 at 11:19:54, mastered wrote:

> Hi Nicola,
> 
> I'm not good at SHELL script language, but this might be fine:
> 
> 1 - Save file into lista.txt
> 
> 2 - transform lista.txt into spamassassin rules:
> 
> cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
> sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print "uri;RULE_NR_"$1";"$2"
> describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
> score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt)
> ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf
> 
> 
> If anyone can optimize it, i'm happy.

My first comment would be "useless use of cat" :)

My second comment would be that you can combine sed commands into a single 
string, separated by ; so that you only have to call sed itself once at the 
start of all that:

sed "s'/http:\/\///'; s'/\/.*//'; s'/\./\\./g'; s'/^/\//'; s'/$/\\b\/i/'" 
lista.txt | nl .

My only other comment is that you might want to adjust the spelling of 
Ransomware :)

Maybe other people have further optimisations.


Antony.

-- 
The gravitational attraction exerted by a single doctor at a distance of 6 
inches is roughly twice that of Jupiter at its closest point to the Earth.

   Please reply to the list;
 please *don't* CC me.


Re: ramsonware URI list

2017-07-15 Thread mastered
Hi Nicola, 

I'm not good at SHELL script language, but this might be fine:

1 - Save file into lista.txt

2 - transform lista.txt into spamassassin rules:

cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print "uri;RULE_NR_"$1";"$2"
describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt) ;
do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf 


If anyone can optimize it, i'm happy.

Alberto.



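For comparison with the pipeline above, a single Perl process can do the whole
transformation; a sketch, assuming the same lista.txt input and blacklist.cf
output, which also drops duplicate hostnames (the original pipeline does not):

  #!/usr/bin/perl
  use strict;
  use warnings;

  open my $in,  '<', 'lista.txt'    or die "lista.txt: $!";
  open my $out, '>', 'blacklist.cf' or die "blacklist.cf: $!";

  my ($n, %seen);
  while (<$in>) {
      chomp;
      next unless m!^https?://([^/\s]+)!;   # keep only the hostname
      my $host = lc $1;
      next if $seen{$host}++;               # skip duplicate hostnames
      $n++;
      my $re = quotemeta $host;             # escapes the dots
      print $out "uri RULE_NR_$n /$re\\b/i\n";
      print $out "describe RULE_NR_$n Url presente nella blacklist ransomware\n";
      print $out "score RULE_NR_$n 5.0\n";
  }
  close $out or die "blacklist.cf: $!";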