Re: add_header all Date of Scan _DATE_

2014-06-09 Thread Matus UHLAR - fantomas

On 09.06.14 05:49, Karsten Bräckelmann wrote:

Found the culprit after some digging. Bug 6915 [1], revision 1453407. As
a band-aid, the following trivial one-line patch fixes it. Can easily be
applied manually.


Could that by any chance also fix the problem with Date: in mail received
via SSL? That one behaves similarly...

http://mail-archives.apache.org/mod_mbox/spamassassin-users/201401.mbox/20140131144406.GA28818%40fantomas.sk


Since it is kind of way past getting late here, and there may be other
Template Tags affected, I'll defer proper bug handling and committing
code changes for tomorrow.


--- lib/Mail/SpamAssassin/Util.pm   (revision 1601300)
+++ lib/Mail/SpamAssassin/Util.pm   (working copy)
@@ -582,6 +582,7 @@
}

sub time_to_rfc822_date {
+  my $pms = shift;
  my($time) = @_;

  my @days = qw/Sun Mon Tue Wed Thu Fri Sat/;


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Despite the cost of living, have you noticed how popular it remains? 
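
The patch above makes time_to_rfc822_date discard a leading argument before it reads $time, which suggests the function was being handed something other than an epoch timestamp in first position. As a rough illustration only (written in Python rather than the Perl of Util.pm; the name rfc822_date and the fixed -0500 offset are assumptions of this sketch, and the thread does not say where the bogus number came from), the two X-Spam-Date values reported in this thread correspond to the following epoch values:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Fixed offset matching the -0500 seen in the reported headers (assumption:
# we ignore DST rules and simply pin the offset for the illustration).
REPORTED_TZ = timezone(timedelta(hours=-5))


def rfc822_date(epoch_seconds, tz=REPORTED_TZ):
    """Format epoch seconds as an RFC 822 style date string."""
    return format_datetime(datetime.fromtimestamp(epoch_seconds, tz=tz))


# A sane scan time, as seen before the upgrade.
print(rfc822_date(1402248911))  # Sun, 08 Jun 2014 12:35:11 -0500

# The nonsense date seen after the upgrade is what you get when a much
# smaller, unrelated number is treated as the timestamp.
print(rfc822_date(193777048))   # Sat, 21 Feb 1976 13:57:28 -0500
```

The point of the sketch is only that a wrong first argument silently produces a valid-looking but meaningless date, which matches the symptom described.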


Spam score range and distribution statistics?

2014-06-09 Thread Ben Stover
As far as I found out SpamAssassin calculates the spam score and puts the value 
into the email header.

What is the maximum range of the score?

-10,,+10

or other?

Is there a statistic for an average email account how many emails get which
score?

In other words is there something like a gaussian distribution graphic 
visualisation?

Ben




Re: Forged yahoo and mass mailers

2014-06-09 Thread Anthony Cartmell
I have a few messages that have been incorrectly tagged because the sender
used their yahoo address as the sender, but used a mass mailer
(contactbeacon.com) to send their newsletter for them. Apparently this is
enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO, causing it
to be marked as spam.

Is there something I'm missing, or is there a better way to do this to
avoid the FPs in the future?


The problem probably has something to do with Yahoo! (and AOL) publishing
strict DMARC records. So anything From: a @yahoo.com (or @aol.com)
address that isn't coming from a Yahoo! (or AOL) mail server is required
to be blocked according to DMARC.


The mass mailer needs to change the From: address to be something  
@contactbeacon.com and use the Reply-to: for the email address they want  
replies to go to. Certainly anything sent From: a @yahoo.com address but  
from a contactbeacon.com server will be rejected by mail systems that  
implement DMARC checking, such as Yahoo!, AOL, and more.


Anthony
--
www.fonant.com - Quality web sites
Tel. 01903 867 810
Fonant Ltd is registered in England and Wales, company No. 7006596
Registered office: Amelia House, Crescent Road, Worthing, West Sussex,  
BN11 1QR
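
A minimal sketch of the DMARC reasoning described above, under stated assumptions: the record strings are illustrative (not the actual published Yahoo! or contactbeacon.com records), all function names are hypothetical, and alignment is approximated by a same-domain-or-subdomain check rather than the organizational-domain lookup that real DMARC implementations use.

```python
def parse_dmarc_record(txt):
    """Parse a DMARC TXT record such as 'v=DMARC1; p=reject' into a dict."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def is_aligned(from_domain, authenticated_domain):
    """Approximate relaxed alignment: identical domains or parent/subdomain."""
    f = from_domain.lower()
    a = authenticated_domain.lower()
    return a == f or a.endswith("." + f) or f.endswith("." + a)


def dmarc_disposition(from_domain, spf_domain, dkim_domain, record):
    """Return 'pass', or the policy (none/quarantine/reject) the domain requests.

    spf_domain / dkim_domain are the domains that passed SPF / DKIM,
    or None when that check did not pass.
    """
    policy = parse_dmarc_record(record).get("p", "none")
    passed = any(
        d is not None and is_aligned(from_domain, d)
        for d in (spf_domain, dkim_domain)
    )
    return "pass" if passed else policy


# From: @yahoo.com, but only the mass mailer's own domain authenticates.
print(dmarc_disposition("yahoo.com", "contactbeacon.com", None,
                        "v=DMARC1; p=reject; pct=100"))  # reject

# From: rewritten to the mass mailer's domain (replies via Reply-To:).
print(dmarc_disposition("contactbeacon.com", "contactbeacon.com", None,
                        "v=DMARC1; p=none"))  # pass
```

This is why changing the From: address to the mailer's own domain, as suggested above, avoids the problem: the authenticated domain and the From: domain then line up.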


Re: Spam score range and distribution statistics?

2014-06-09 Thread Matus UHLAR - fantomas

On 09.06.14 09:47, Ben Stover wrote:

As far as I found out SpamAssassin calculates the spam score and puts the
value into the email header.

What is the maximum range of the score?

-10,,+10


I don't think it has limits. Maybe just the limits for an integer.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Eagles may soar, but weasels don't get sucked into jet engines. 


Re: Spam score range and distribution statistics?

2014-06-09 Thread Antony Stone
On Monday 09 June 2014 at 09:50, Matus UHLAR - fantomas wrote:

 On 09.06.14 09:47, Ben Stover wrote:
 As far as I found out SpamAssassin calculates the spam score and puts the
  value into the email header.
 
 What is the maximum range of the score?
 
 -10,,+10
 
 I don't think it has limits. Maybe just the limits for an integer.

http://spamassassin.apache.org/gtube for example has a default score of 1000.


Antony.

-- 
In fact I wanted to be John Cleese and it took me some time to realise that 
the job was already taken.

 - Douglas Adams

 Please reply to the list;
   please don't CC me.
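
To illustrate why there is no fixed range: the total is simply the sum of the scores of whichever rules hit, so it is bounded only by the rules themselves. The sketch below assumes a threshold of 5.0 (SpamAssassin's stock required_score); apart from GTUBE and its 1000 points, the rule names and scores are hypothetical.

```python
REQUIRED_SCORE = 5.0  # assumed threshold (stock required_score)

# Only GTUBE's score comes from the thread; the other rules are made up.
RULE_SCORES = {
    "GTUBE": 1000.0,
    "HYPOTHETICAL_SPAMMY_RULE": 2.5,
    "HYPOTHETICAL_HAM_HINT": -1.5,
    "HYPOTHETICAL_TRUSTED_SENDER": -100.0,
}


def total_score(hits, rule_scores=RULE_SCORES):
    """Sum the scores of all rules that hit on a message."""
    return sum(rule_scores[name] for name in hits)


def is_spam(hits, required=REQUIRED_SCORE):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits) >= required


print(total_score(["GTUBE"]))                                                 # 1000.0
print(total_score(["HYPOTHETICAL_TRUSTED_SENDER", "HYPOTHETICAL_HAM_HINT"]))  # -101.5
print(is_spam(["HYPOTHETICAL_SPAMMY_RULE"]))                                  # False
```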


Re: add_header all Date of Scan _DATE_

2014-06-09 Thread Chris
On Mon, 2014-06-09 at 05:49 +0200, Karsten Bräckelmann wrote:
 On Sun, 2014-06-08 at 20:56 -0500, Chris wrote:
  In my etc/mail/spamassassin/local.cf I have the above line. I just
 
 For completeness: That add_header option does work, although there are
 actually exactly 3 arguments.
 
   add_header { spam | ham | all } header_name string
 
 Just like stock configuration shows, the string argument should be
 enclosed by double quotes.
 
   add_header all Date of Scan _DATE_
 
  upgraded to 3.4.0 today and I notice that the 'date of scan' is showing
  something like this:
 
 Sic, it's the (X-Spam-) Date header, not Date of Scan header. ;)
 
 
  X-spam-date: of Scan Sat, 21 Feb 1976 13:57:28 -0500
  
  Does this add header line not work anymore? Previous to the upgrade it
  was working correctly:
  
  X-spam-date: of Scan Sun, 08 Jun 2014 12:35:11 -0500
 
 Interesting. Unrelated to the number of arguments, though...
 
 Found the culprit after some digging. Bug 6915 [1], revision 1453407. As
 a band-aid, the following trivial one-line patch fixes it. Can easily be
 applied manually.
 
 Since it is kind of way past getting late here, and there may be other
 Template Tags affected, I'll defer proper bug handling and committing
 code changes for tomorrow.
 
 
 --- lib/Mail/SpamAssassin/Util.pm   (revision 1601300)
 +++ lib/Mail/SpamAssassin/Util.pm   (working copy)
 @@ -582,6 +582,7 @@
  }
  
  sub time_to_rfc822_date {
 +  my $pms = shift;
my($time) = @_;
  
my @days = qw/Sun Mon Tue Wed Thu Fri Sat/;
 
 
 [1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6915
 

Thanks Karsten, that did the trick. Much appreciated.

Chris

-- 
Chris
KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
08:31:32 up 6 days, 17:01, 2 users, load average: 0.28, 0.30, 0.26
Mandriva Linux 2010.2, kernel 2.6.33.7-desktop586-2mnb
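
A small sketch of how the three add_header arguments described above split up, which explains why the header came out as "X-Spam-Date: of Scan ...". This is an illustration in Python, not SpamAssassin's actual parser; the function name and the header name Scan-Date are hypothetical.

```python
VALID_SCOPES = {"spam", "ham", "all"}


def parse_add_header(line):
    """Split an add_header line into (scope, full header name, template)."""
    parts = line.split(None, 3)  # keyword, scope, header_name, rest of line
    if len(parts) != 4 or parts[0] != "add_header":
        raise ValueError("expected: add_header {spam|ham|all} header_name string")
    _, scope, header_name, template = parts
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    # Strip one pair of surrounding double quotes, if present.
    if len(template) >= 2 and template[0] == template[-1] == '"':
        template = template[1:-1]
    # Added headers are prefixed with X-Spam-.
    return scope, "X-Spam-" + header_name, template


# The line from local.cf: "Date" becomes the header name, the rest is the string.
print(parse_add_header("add_header all Date of Scan _DATE_"))
# ('all', 'X-Spam-Date', 'of Scan _DATE_')

# A single-token header name with a quoted string (hypothetical name).
print(parse_add_header('add_header all Scan-Date "_DATE_"'))
# ('all', 'X-Spam-Scan-Date', '_DATE_')
```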



Re: Viagra spam not caught

2014-06-09 Thread Daniele Paoni

On 06/07/2014 03:55 PM, Matus UHLAR - fantomas wrote:


On 06.06.14 18:06, Daniele Paoni wrote:

I deleted the bayes database and trained it using real spam and ham


I would not clear the BAYES DB so fast. Even BAYES_00 spam can become
BAYES_99 after a few properly trained samples.


OK, I will keep it in mind for the next time :-)


Today I got another one of these emails, the strange thing is that if
I scan it with spamassassin manually the TO_NO_BRKTS_MSFT is triggered
but it is not triggered on the original mail scanned with postfix +
amavisd-new.


Did you reload amavis after the spamassassin rule updates?

Yes, I have also rebooted the server for a kernel upgrade, so it was
definitely restarted.




Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread Kevin A. McGrail

On 6/7/2014 3:31 AM, David B Funk wrote:

This does require some baby-sitting as it will get traffic that is the
result of a real human fat-fingering a legit recipient.


Perhaps use just subdomains then?  Such as 
venusflyt...@invalid.uiowa.edu to eliminate the risk of legit 
fat-fingered email.


Regards,
KAM



Re: Forged yahoo and mass mailers

2014-06-09 Thread Kevin A. McGrail

On 6/8/2014 10:49 PM, Alex wrote:
I have a few messages that have been incorrectly tagged because the
sender used their yahoo address as the sender, but used a mass mailer
(contactbeacon.com) to send their newsletter for them. Apparently this
is enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO,
causing it to be marked as spam.

Is there something I'm missing, or is there a better way to do this to
avoid the FPs in the future?

People with Yahoo! accounts (and AOL) and any other senders that have a
DMARC policy of reject/quarantine need to either A) use a mailing list
sender that has modified their process for DMARC or B) not use those
accounts.


See 
http://www.pcworld.com/article/2141120/yahoo-email-antispoofing-policy-breaks-mailing-lists.html


Regards,
KAM


Re: Spam score range and distribution statistics?

2014-06-09 Thread Bowie Bailey

On 6/9/2014 3:47 AM, Ben Stover wrote:

As far as I found out SpamAssassin calculates the spam score and puts the value 
into the email header.

What is the maximum range of the score?

-10,,+10

or other?


There are no limits on the score.  The higher the score, the more likely 
the email is spam and the lower the score, the more likely it is to be 
non-spam.  Looking through the last month's worth of logs on my server, 
I see scores ranging from -98 to 101.



Is there a statistic for an average email account how many emails get which
score?

In other words is there something like a gaussian distribution graphic 
visualisation?


That would be different on every server depending on what type of spam 
and ham you see and which rule sets you are running.  I graphed mine out 
of curiosity and it forms a reasonable bell curve from -14 to 40 peaking 
at about 9.  Although there is an odd spike sticking up from -3 to 1 for 
some reason (and a rather large spike at 0).


I'm not a statistics guy, so I can't give you all the distribution 
numbers -- and, as I said, it will likely differ a fair amount between 
installations.


Are you just looking for general information, or is there something you 
are trying to determine?  If you tell us what you are looking for, we 
may be able to give you some better answers.


--
Bowie


Re: Spam score range and distribution statistics?

2014-06-09 Thread Joe Quinn

On 6/9/2014 11:34 AM, Bowie Bailey wrote:

On 6/9/2014 3:47 AM, Ben Stover wrote:
As far as I found out SpamAssassin calculates the spam score and puts 
the value into the email header.


What is the maximum range of the score?

-10,,+10

or other?


There are no limits on the score.  The higher the score, the more 
likely the email is spam and the lower the score, the more likely it 
is to be non-spam.  Looking through the last month's worth of logs on 
my server, I see scores ranging from -98 to 101.


Is there a statistic for an average email account how many emails get
which score?


In other words is there something like a gaussian distribution 
graphic visualisation?


That would be different on every server depending on what type of spam 
and ham you see and which rule sets you are running.  I graphed mine 
out of curiosity and it forms a reasonable bell curve from -14 to 40 
peaking at about 9.  Although there is an odd spike sticking up from 
-3 to 1 for some reason (and a rather large spike at 0).


I'm not a statistics guy, so I can't give you all the distribution 
numbers -- and, as I said, it will likely differ a fair amount between 
installations.


Are you just looking for general information, or is there something 
you are trying to determine?  If you tell us what you are looking for, 
we may be able to give you some better answers.


That spike around zero is going to be your typical boring ham. It passes 
SPF and some other minor ham rules, and hits very very minor spam rules, 
if any.


RE: SPAM from a registrar

2014-06-09 Thread Patrick Domack

I have been tracking this for about 2 weeks now myself.

Comparing against my list of new domains shows that DOB seems to pick
them up after they are 2 days old.

I also tried to compare my list to fresh.spameatingmonkey.net, but
none of my domains in the 0-5 days old range would get a match for com/net
domains. I do get some hits for info and us though. But it's normally
com and a few us that are on my lists.

I am currently doing whois lookups for about 30 TLDs, and tracking
their time and registrar. I do minimize the lookups.

I am currently seeing about 2 .asia, 2 .uk, and then around 100 .com
(all the .com are ENOM) sending email to me with an age of 1 day.
This is pretty consistent day to day.






Have you looked into Day old bread?   
http://wiki.apache.org/spamassassin/Rules/URIBL_RHS_DOB


 ...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4500
Registered Linux User No: 307357
-Original Message-
From: James B. Byrne [mailto:byrn...@harte-lyne.ca]
Sent: Wednesday, May 14, 2014 8:52 AM
To: users@spamassassin.apache.org
Subject: SPAM from a registrar

This AM we received (and are continuing to receive) numerous spam
messages from multiple domains that were all registered today
(2014-05-14) with a company called enom, inc.  This firm is also the
registrar for the mail server domain BOSJAW.com that is sending some
if not all of the UCEM.  That server is hosted in CZ.

It seems likely that this is a planned UCEM campaign designed to use
disposable domains, probably registered with stolen credit cards or
some other form of fraud, in order to escape blacklisting services.
No doubt by tomorrow they will be abandoned.

Is there any test to check how long a domain name has been in
existence and set a spam score with that information?

Along the same lines, is there any test to determine the country of
origin of the IP address in the last hop before it connects to our
servers?
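
For reference, domain-based lists such as the fresh.spameatingmonkey.net zone mentioned above are conventionally queried by prepending the domain to the list's zone and resolving the result. A minimal sketch, assuming that convention; the function names are hypothetical, and a resolution failure of any kind is treated as "not listed":

```python
import socket


def rhsbl_query_name(domain, zone):
    """Build the DNS name to look up for a domain-based (RHS) blocklist."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"


def is_listed(domain, zone):
    """Return True if the list answers for this domain (needs network access)."""
    try:
        socket.gethostbyname(rhsbl_query_name(domain, zone))
    except socket.gaierror:
        # NXDOMAIN (not listed) or any other resolution failure.
        return False
    return True


print(rhsbl_query_name("Example.COM", "fresh.spameatingmonkey.net"))
# example.com.fresh.spameatingmonkey.net
```

Calling is_listed() performs a live DNS query, so results depend on the resolver in use and on the list operator's usage policy.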



Re: Spam score range and distribution statistics?

2014-06-09 Thread Karsten Bräckelmann
On Mon, 2014-06-09 at 11:34 -0400, Bowie Bailey wrote:
  In other words is there something like a gaussian distribution
  graphic visualisation?
 
 That would be different on every server depending on what type of spam 
 and ham you see and which rule sets you are running.  I graphed mine out 
 of curiosity and it forms a reasonable bell curve from -14 to 40 peaking 
 at about 9.  Although there is an odd spike sticking up from -3 to 1 for 
 some reason (and a rather large spike at 0).

I don't think that second spike is odd. That's the majority of your ham.

Since the data-set includes both spam and ham combined, there are two
spikes to be expected. A single bell curve would mean too many messages
in the gray area, no clear distinction between ham and spam, and
consequently lots of false positives and negatives.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
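
To check for the two peaks described above on your own mail, one option is to pull the scores out of X-Spam-Status headers and bucket them. A minimal sketch, assuming the usual "score=N" field in that header; the sample scores are made up and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "X-Spam-Status: Yes, score=12.3 required=5.0 tests=..."
SCORE_RE = re.compile(r"^X-Spam-Status:.*?\bscore=(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_scores(lines):
    """Collect the score from every X-Spam-Status header line."""
    scores = []
    for line in lines:
        match = SCORE_RE.match(line)
        if match:
            scores.append(float(match.group(1)))
    return scores


def histogram(scores, bucket_width=5):
    """Count scores per bucket, keyed by the bucket's lower bound."""
    counts = Counter()
    for score in scores:
        counts[int(score // bucket_width) * bucket_width] += 1
    return dict(sorted(counts.items()))


def render(hist):
    """Print a crude text bar chart, one row per bucket."""
    for start, count in hist.items():
        print(f"{start:>5} {'#' * count}")


# Made-up sample: a ham cluster around 0 and a spam cluster further up.
sample = [-0.5, 0.2, 0.0, 1.1, 8.7, 9.3, 12.0, 25.4]
render(histogram(sample))
```

With real data, a healthy setup should show a gap between the ham and spam clusters rather than one continuous hump.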



Re: SPAM from a registrar

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 1:23 PM, Patrick Domack wrote:

I have been tracking this for about 2 weeks now myself.

Comparing against my list of new domains shows that DOB seems to pick
them up after they are 2 days old.

I also tried to compare my list to fresh.spameatingmonkey.net, but
none of my domains in the 0-5 days old range would get a match for com/net
domains. I do get some hits for info and us though. But it's normally
com and a few us that are on my lists.

I am currently doing whois lookups for about 30 TLDs, and tracking
their time and registrar. I do minimize the lookups.

I am currently seeing about 2 .asia, 2 .uk, and then around 100 .com
(all the .com are ENOM) sending email to me with an age of 1 day.
This is pretty consistent day to day.

I wonder how we can use DNS, an RBL and distributed lookups to get the
age of domains AND share the information so it's centrally available...


Regards,
KAM


Re: SPAM from a registrar

2014-06-09 Thread Patrick Domack

Quoting Kevin A. McGrail kmcgr...@pccc.com:


On 6/9/2014 1:23 PM, Patrick Domack wrote:

I have been tracking this for about 2 weeks now myself.

Comparing against my list of new domains shows that DOB seems to pick
them up after they are 2 days old.

I also tried to compare my list to fresh.spameatingmonkey.net, but
none of my domains in the 0-5 days old range would get a match for com/net
domains. I do get some hits for info and us though. But it's
normally com and a few us that are on my lists.

I am currently doing whois lookups for about 30 TLDs, and
tracking their time and registrar. I do minimize the lookups.

I am currently seeing about 2 .asia, 2 .uk, and then around 100
.com (all the .com are ENOM) sending email to me with an age of 1 day.
This is pretty consistent day to day.

I wonder how we can use DNS, an RBL and distributed lookups to get
the age of domains AND share the information so it's centrally
available...


That could be easily done. The only issue is whether you trust the
distributed lookups to have accurate information.
I suppose we could build in a trust system, where if enough
distributed clients upload the same info, it could be trusted.

This could work out pretty well. Each dns-rbl cluster could run with
its own shared database, and you can cross-publish to other dns-rbl
clusters and set your own trust rating: depending on how many copies
you get, you either trust the info or do your own whois lookup for
it.

The bad thing is, I wonder how fast these are hammered out, and whether
the trust and replication would even matter, given the latency.
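
The trust idea above (accept a value once enough independent clients report the same thing) can be sketched as a simple quorum check. This is only an illustration: the function name, the reporter identifiers and the default quorum of 3 are assumptions, and it does nothing to stop one party from posing as several reporters.

```python
from collections import Counter


def trusted_creation_date(reports, quorum=3):
    """Return the creation date enough reporters agree on, else None.

    reports maps a reporter id to the creation date it observed
    (ISO date strings here, but any hashable value works).
    """
    if not reports:
        return None
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value if votes >= quorum else None


reports = {
    "client-a": "2014-06-08",
    "client-b": "2014-06-08",
    "client-c": "2014-06-08",
    "client-d": "2014-06-01",  # disagreeing report
}
print(trusted_creation_date(reports))            # 2014-06-08
print(trusted_creation_date(reports, quorum=4))  # None
```

When the function returns None, the fallback described above applies: do your own whois lookup instead of trusting the shared data.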






Re: SPAM from a registrar

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


On 6/9/2014 1:23 PM, Patrick Domack wrote:

 Comparing my list of new domains, shows that DOB seems to pick them up
 after they are 2 days old.


I wonder how we can use DNS, an RBL and distributed lookups to get the age of 
domains AND share the information so it's centrally available...


Perhaps we should cultivate contacts at a registrar so that the BL can be 
generated directly off their feed of changes?


Perhaps somebody at DailyChanges.com or WhoisAPI.com? Though I agree 
getting the data for free will be challenging.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control laws aren't enacted to control guns, they are enacted
  to control people: catholics (1500s), japanese peasants (1600s),
  blacks (1860s), italian immigrants (1911), armenians (1911),
  the irish (1920s), jews (1930s), blacks (1960s), the poor (always)
---
 739 days since the first successful private support mission to ISS (SpaceX)


Re: SPAM from a registrar

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 2:24 PM, Patrick Domack wrote:

Quoting Kevin A. McGrail kmcgr...@pccc.com:


On 6/9/2014 1:23 PM, Patrick Domack wrote:

I have been tracking this for about 2 weeks now myself.

Comparing against my list of new domains shows that DOB seems to pick
them up after they are 2 days old.

I also tried to compare my list to fresh.spameatingmonkey.net, but
none of my domains in the 0-5 days old range would get a match for com/net
domains. I do get some hits for info and us though. But it's
normally com and a few us that are on my lists.

I am currently doing whois lookups for about 30 TLDs, and
tracking their time and registrar. I do minimize the lookups.

I am currently seeing about 2 .asia, 2 .uk, and then around 100
.com (all the .com are ENOM) sending email to me with an age of 1 day.
This is pretty consistent day to day.

I wonder how we can use DNS, an RBL and distributed lookups to get
the age of domains AND share the information so it's centrally
available...


That could be easily done. The only issue is whether you trust the
distributed lookups to have accurate information.
I suppose we could build in a trust system, where if enough
distributed clients upload the same info, it could be trusted.

This could work out pretty well. Each dns-rbl cluster could run with
its own shared database, and you can cross-publish to other dns-rbl
clusters and set your own trust rating: depending on how many copies
you get, you either trust the info or do your own whois lookup for
it.

The bad thing is, I wonder how fast these are hammered out, and whether
the trust and replication would even matter, given the latency.

Thanks for weighing in.  These are all issues we've solved with other
RBLs via rsync of the data and I want to keep the hurdle low for
implementation so you are right about the trust rating, etc.


Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread David F. Skoll
On Mon, 09 Jun 2014 14:24:19 -0400
Patrick Domack patric...@patrickdk.com wrote:

 That could be easily done. Only issue is, if you trust the
 distributed lookups to have accurate information.
 I suppose we could build in a trust system, where if enough  
 distributed clients upload the same info, it could be trusted.

There's a company that offers a domain-age-like service:
https://www.farsightsecurity.com/Services/NOD/

Their approach is interesting (they receive a huge volume of DNS
traffic and keep track of domain lookups that are newly seen.)

Their price for practical volumes of lookups, unfortunately, is
ridiculously expensive, which has prevented us from pursuing this
any further.

Regards,

David.


Re: SPAM from a registrar

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 2:33 PM, John Hardin wrote:

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


On 6/9/2014 1:23 PM, Patrick Domack wrote:

 Comparing my list of new domains, shows that DOB seems to pick them up
 after they are 2 days old.


I wonder how we can use DNS, an RBL and distributed lookups to get 
the age of domains AND share the information so it's centrally 
available...


Perhaps we should cultivate contacts at a registrar so that the BL can 
be generated directly off their feed of changes?


Perhaps somebody at DailyChanges.com or WhoisAPI.com? Though I agree 
getting the data for free will be challenging.


Good idea.  If we can get existing data from trustable sources such as 
registries, we can add that to the source RBL and then only query the 
new ones.


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 2:38 PM, David F. Skoll wrote:

On Mon, 09 Jun 2014 14:24:19 -0400
Patrick Domack patric...@patrickdk.com wrote:


That could be easily done. Only issue is, if you trust the
distributed lookups to have accurate information.
I suppose we could build in a trust system, where if enough
distributed clients upload the same info, it could be trusted.

There's a company that offers a domain-age-like service:
https://www.farsightsecurity.com/Services/NOD/

Their approach is interesting (they receive a huge volume of DNS
traffic and keep track of domain lookups that are newly seen.)

Their price for practical volumes of lookups, unfortunately, is
ridiculously expensive, which has prevented us from pursuing this
any further.

I think the core issue is that the age of a domain is a good indicator of
spam.  So there is merit in building a distributed look-up system using SA.


I have more ideas than resources, of course...


Re: SPAM from a registrar

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


On 6/9/2014 2:33 PM, John Hardin wrote:

 On Mon, 9 Jun 2014, Kevin A. McGrail wrote:

  On 6/9/2014 1:23 PM, Patrick Domack wrote:
Comparing my list of new domains, shows that DOB seems to pick 
them up after they are 2 days old.
 
  I wonder how we can use DNS, an RBL and distributed lookups to get the 
  age of domains AND share the information so it's centrally available...


 Perhaps we should cultivate contacts at a registrar so that the BL can be
 generated directly off their feed of changes?

 Perhaps somebody at DailyChanges.com or WhoisAPI.com? Though I agree
 getting the data for free will be challenging.


Good idea.  If we can get existing data from trustable sources such as 
registries, we can add that to the source RBL and then only query the new 
ones.


I was referring to a feed of the new ones. Inferring that is the difficult 
part, I was hoping there was some way to avoid the inference part.




Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


So there is merit in building a distributed look-up system using SA.


Distributed lookup of *what*, though? Can you clarify that part of your 
idea? Are you referring to distributed whois queries for a domain name, to 
determine its age?




Re: SPAM from a registrar

2014-06-09 Thread Patrick Domack


Quoting Kevin A. McGrail kmcgr...@pccc.com:


On 6/9/2014 2:24 PM, Patrick Domack wrote:

Quoting Kevin A. McGrail kmcgr...@pccc.com:


On 6/9/2014 1:23 PM, Patrick Domack wrote:

I have been tracking this for about 2 weeks now myself.

Comparing against my list of new domains shows that DOB seems to pick
them up after they are 2 days old.

I also tried to compare my list to fresh.spameatingmonkey.net,
but none of my domains in the 0-5 days old range would get a match for
com/net domains. I do get some hits for info and us though. But
it's normally com and a few us that are on my lists.

I am currently doing whois lookups for about 30 TLDs, and
tracking their time and registrar. I do minimize the lookups.

I am currently seeing about 2 .asia, 2 .uk, and then around 100
.com (all the .com are ENOM) sending email to me with an age of
1 day. This is pretty consistent day to day.

I wonder how we can use DNS, an RBL and distributed lookups to get
the age of domains AND share the information so it's centrally
available...


That could be easily done. The only issue is whether you trust the
distributed lookups to have accurate information.
I suppose we could build in a trust system, where if enough
distributed clients upload the same info, it could be trusted.

This could work out pretty well. Each dns-rbl cluster could run
with its own shared database, and you can cross-publish to other
dns-rbl clusters and set your own trust rating: depending on how
many copies you get, you either trust the info or do your own whois
lookup for it.

The bad thing is, I wonder how fast these are hammered out, and whether
the trust and replication would even matter, given the latency.

Thanks for weighing in.  These are all issues we've solved with
other RBLs via rsync of the data and I want to keep the hurdle low
for implementation so you are right about the trust rating, etc.


Well, while rsync works, you need a source. If the source was a feed
from the TLDs themselves, that would work just fine.

The main thing I'm more worried about here is making sure new domains
are noticed. At least I have seen 1-day-old domains send a lot more
spam than 2-3-day-old ones.

So the new, unknown domain is going to be more important to look up.




Re: SPAM from a registrar

2014-06-09 Thread Jim Popovitch
On Mon, Jun 9, 2014 at 2:39 PM, Kevin A. McGrail kmcgr...@pccc.com wrote:

 On 6/9/2014 2:33 PM, John Hardin wrote:

 On Mon, 9 Jun 2014, Kevin A. McGrail wrote:

  On 6/9/2014 1:23 PM, Patrick Domack wrote:

  Comparing my list of new domains, shows that DOB seems to pick them up
  after they are 2 days old.


 I wonder how we can use DNS, an RBL and distributed lookups to get the
 age of domains AND share the information so it's centrally available...


 Perhaps we should cultivate contacts at a registrar so that the BL can be
 generated directly off their feed of changes?

 Perhaps somebody at DailyChanges.com or WhoisAPI.com? Though I agree
 getting the data for free will be challenging.

  Good idea.  If we can get existing data from trustable sources such as
 registries, we can add that to the source RBL and then only query the new
 ones.



I haven't been following this whole thread.

I always thought it odd to look for new domains.  I tend to think that
everything is new unless it's been seen before (and there's a bunch of data
out there on existing domains)

-Jim P.
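
The "everything is new unless it's been seen before" approach can be sketched as a first-seen table: a domain counts as new until a chosen period has passed since it was first observed. The class name and the two-day window are assumptions of this sketch; as noted above, such a table would need to be pre-seeded with existing domain data, otherwise every long-established domain looks new the first time it shows up.

```python
from datetime import datetime, timedelta


class FirstSeenTracker:
    """Remember when each domain was first observed."""

    def __init__(self, new_for=timedelta(days=2)):
        self.new_for = new_for
        self.first_seen = {}

    def observe(self, domain, now):
        """Record the domain and return True while it still counts as new."""
        first = self.first_seen.setdefault(domain.lower(), now)
        return now - first < self.new_for


tracker = FirstSeenTracker()
t0 = datetime(2014, 6, 9, 12, 0)
print(tracker.observe("example.com", t0))                      # True
print(tracker.observe("EXAMPLE.com", t0 + timedelta(days=1)))  # True
print(tracker.observe("example.com", t0 + timedelta(days=3)))  # False
```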


Re: SPAM from a registrar

2014-06-09 Thread Axb

On 06/09/2014 08:39 PM, Kevin A. McGrail wrote:

On 6/9/2014 2:33 PM, John Hardin wrote:

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


On 6/9/2014 1:23 PM, Patrick Domack wrote:

 Comparing my list of new domains, shows that DOB seems to pick them up
 after they are 2 days old.


I wonder how we can use DNS, an RBL and distributed lookups to get
the age of domains AND share the information so it's centrally
available...


Perhaps we should cultivate contacts at a registrar so that the BL can
be generated directly off their feed of changes?

Perhaps somebody at DailyChanges.com or WhoisAPI.com? Though I agree
getting the data for free will be challenging.


Good idea.  If we can get existing data from trustable sources such as
registries, we can add that to the source RBL and then only query the
new ones.


WHOIS age data is a good indicator with a handful of TLDs but only in 
combination with their registrars and NS.

Even low scoring on age only will cause lost of surprises.

What you  want is something like reputation  data which URIBL publishes 
via datafeeds


http://www.uribl.com/datasets.shtml
domain_data.txt

and then you come across zones such as .us, which is slow in updating
its zone data.


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Rob McEwen
Domain age is a good metric to factor in. But I'm always fascinated with
some people's desire to block all messages with extremely new domains. 
(NOT saying that this applies to everyone who posted on this thread!)

Keep in mind that many large and famous businesses... who have fairly
good mail sending practices... sometimes launch new products complete
with links to very newly registered domains. Same is often true for
advertisements for things like rock concerts, etc. Or web sites that deal
with specific events or hot-topic political issues that appeared out of
nowhere. Yes, some of these are UBE. But many are NOT!

These examples provide one of the largest sources of FPs for all the major
domain/URI blacklists. But the better domain/URI blacklists have good
mechanisms in place to (a) PREVENT... MANY of these from ever becoming
FPs in the first place, and (b) where those mechanisms fail, they
have good triggers/feedback to remove/whitelist such FPs VERY QUICKLY
if/when they do occur.

In contrast, many who might go overboard by outright blocking on
newness... and/or scoring too aggressively on newness... may find
too-high FP problems kicking their butts in the long run. And when such
a FP starts happening, they may not have the proper telemetry to
catch/fix it until AFTER much FP damage has happened.

Personally, I think that the real problem here is that some of the most
famous URI/domain blacklists are NOT catching everything and/or NOT
catching everything fast enough... combined with many sys admins failing
to make use of ALL the good and low-FP URI/domain blacklists... where
they'd see MUCH better results if they were using ALL of the good URI
blacklists! ...but I'm a little biased on this point! :)

-- 
Rob McEwen
+1 (478) 475-9032



Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread David F. Skoll
On Mon, 9 Jun 2014 11:51:21 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:

  So there is merit in building a distributed look-up system using SA.

 Distributed lookup of *what*, though? Can you clarify that part of
 your idea? Are you referring to distributed whois queries for a
 domain name, to determine its age?

Well, here's how it could be done.  Imagine someone runs a DNS zone
for newdomain.example.net.  You want to see if example.org is a new
domain, so you look up a TXT record for example.org.newdomain.example.net.

The DNS software that serves the zone newdomain.example.net runs
the following pseudo-code when example.org is looked up:

IF example.org is in my database
THEN
   return the TXT record associated with example.org
   update the last-looked-up time for example.org
ELSE
   generate a TXT record of the form MMDDHHMMSS corresponding to current time (UTC)
   insert it in the database
   return it
ENDIF

A background job will periodically clean out domains that haven't been
queried in a long time.
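
The pseudo-code above might be sketched in Python roughly as follows. This is only an illustration under assumptions: the `FirstSeenDB` class, its in-memory dict, and the injectable `clock` are hypothetical names, and the year-prefixed timestamp format is a guess at a workable TXT payload rather than something the proposal specifies.

```python
import time


class FirstSeenDB:
    """Hypothetical in-memory stand-in for the zone's database."""

    def __init__(self, clock=time.time, fmt="%Y%m%d%H%M%S"):
        self._clock = clock      # injectable so the logic can be tested
        self._fmt = fmt          # assumed TXT payload format (UTC)
        self._records = {}       # domain -> [txt_payload, last_lookup_epoch]

    def lookup(self, domain):
        """Return the TXT payload for a domain, creating it on first sight."""
        now = self._clock()
        domain = domain.lower().rstrip(".")
        entry = self._records.get(domain)
        if entry is None:
            # ELSE branch: first time anyone asked about this domain
            txt = time.strftime(self._fmt, time.gmtime(now))
            entry = self._records[domain] = [txt, now]
        # Both branches: remember when the domain was last looked up
        entry[1] = now
        return entry[0]

    def expire(self, max_idle_seconds):
        """Background cleanup: drop domains not queried recently."""
        cutoff = self._clock() - max_idle_seconds
        stale = [d for d, (_, last) in self._records.items() if last < cutoff]
        for d in stale:
            del self._records[d]
        return len(stale)
```

A real deployment would sit behind a DNS server and use persistent storage; the sketch only shows the lookup and cleanup logic.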

The clever part is that once lots of sites begin using this in their
SA setups, we'll very quickly build up quite an accurate database of
newly-seen domains that's completely independent of any registrar for
a data source.

Yes, spammers can poison it by specifically looking up a domain,
waiting a couple of days, and then spamming.  But I think most won't bother
(witness how effective greylisting still is.)

Furthermore, you can ignore all but the first few hundred lookups before you
enter the TXT record in the database; this will make it more expensive
for spammers to poison the data.  Or you could not enter a record in the
database until it has been looked up from 100 different IP addresses... I
can think of a few other countermeasures.
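
The second countermeasure (no record until enough distinct IP addresses have asked) could look something like the sketch below. The class name `GatedFirstSeen`, the `min_sources` parameter, and the choice to return `None` while a domain is still gated are all assumptions made for illustration; the proposal does not say what the server should answer in the meantime.

```python
import time


class GatedFirstSeen:
    """Hypothetical gate: record a domain only after enough distinct askers."""

    def __init__(self, min_sources=100, clock=time.time):
        self.min_sources = min_sources
        self._clock = clock
        self._pending = {}      # domain -> set of client IPs seen so far
        self._first_seen = {}   # domain -> epoch seconds when it was recorded

    def lookup(self, domain, client_ip):
        """Return the first-seen epoch, or None while the domain is gated."""
        if domain in self._first_seen:
            return self._first_seen[domain]
        sources = self._pending.setdefault(domain, set())
        sources.add(client_ip)  # repeat queries from one IP count once
        if len(sources) >= self.min_sources:
            self._first_seen[domain] = self._clock()
            del self._pending[domain]
            return self._first_seen[domain]
        return None
```

A single poisoner querying repeatedly from one address never gets the domain recorded under this scheme; they would need `min_sources` distinct addresses.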

So who's volunteering to do this? :)

Regards,

David.


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 2:51 PM, John Hardin wrote:

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


So there is merit in building a distributed look-up system using SA.


Distributed lookup of *what*, though? Can you clarify that part of
your idea? Are you referring to distributed whois queries for a domain
name, to determine its age?

Yes.  Because whois data is hard to get and many whois servers limit
lookups, distributing and sharing the lookup load to determine the age of
domains has merit, IMO.




Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 3:02 PM, Rob McEwen wrote:

Domain age is a good metric to factor in. But I'm always fascinated with
some people's desire to block all messages with extremely new domains.
(NOT saying that this applies to everyone who posted on this thread!)

Keep in mind that many large and famous businesses... who have fairly
good mail sending practices... sometimes launch new products complete
with links to very newly registered domains. Same is often true for
advertisements for things like rock concerts, etc. Or web sites that deal
with specific events or hot-topic political issues that appeared out of
nowhere. Yes, some of these are UBE. But many are NOT!

These examples provide one of the largest sources of FPs for all the major
domain/URI blacklists. But the better domain/URI blacklists have good
mechanisms in place to (a) PREVENT... MANY of these from ever becoming
FPs in the first place, and (b) where those mechanisms fail, they
have good triggers/feedback to remove/whitelist such FPs VERY QUICKLY
if/when they do occur.

In contrast, many who might go overboard by outright blocking on
newness... and/or scoring too aggressively on newness... may find
too-high FP problems kicking their butts in the long run. And when such
a FP starts happening, they may not have the proper telemetry to
catch/fix it until AFTER much FP damage has happened.

Personally, I think that the real problem here is that some of the most
famous URI/domain blacklists are NOT catching everything and/or NOT
catching everything fast enough... combined with many sys admins failing
to make use of ALL the good and low-FP URI/domain blacklists... where
they'd see MUCH better results if they were using ALL of the good URI
blacklists! ...but I'm a little biased on this point! :)

A great point.  My goal is simply to build a system to identify the age
of domains and use it as YAIOS, or yet another indicator of spamminess,
not as a poison pill.


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Patrick Domack

Quoting David F. Skoll d...@roaringpenguin.com:


On Mon, 9 Jun 2014 11:51:21 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:


 So there is merit in building a distributed look-up system using SA.



Distributed lookup of *what*, though? Can you clarify that part of
your idea? Are you referring to distributed whois queries for a
domain name, to determine its age?


Well, here's how it could be done.  Imagine someone runs a DNS zone
for newdomain.example.net.  You want to see if example.org is a new
domain, so you look up a TXT record for example.org.newdomain.example.net.

The DNS software that serves the zone newdomain.example.net runs
the following pseudo-code when example.org is looked up:

IF example.org is in my database
THEN
   return the TXT record associated with example.org
   update the last-looked-up time for example.org
ELSE
   generate a TXT record of the form MMDDHHMMSS corresponding to current time (UTC)
   insert it in the database
   return it
ENDIF

A background job will periodically clean out domains that haven't been
queried in a long time.

The clever part is that once lots of sites begin using this in their
SA setups, we'll very quickly build up quite an accurate database of
newly-seen domains that's completely independent of any registrar for
a data source.

Yes, spammers can poison it by specifically looking up a domain,
waiting a couple of days, and then spamming.  But I think most won't bother
(witness how effective greylisting still is.)

Furthermore, you can ignore all but the first few hundred lookups before you
enter the TXT record in the database; this will make it more expensive
for spammers to poison the data.  Or you could not enter a record in the
database until it has been looked up from 100 different IP addresses... I
can think of a few other countermeasures.

So who's volunteering to do this? :)

Regards,

David.


The point was, I have already done this, and have it in production. I
did this because this subject keeps coming up from time to time, and I
was personally interested to see the results of it.


And I do agree with Rob McEwen on many points, and I would be
hesitant to block outright. But so far I haven't found any issues with
blocking 1-day-old domains outright, and I doubt there will be many in
real usage.


But then, the only way to be completely sure of that is time.





Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, David F. Skoll wrote:


On Mon, 9 Jun 2014 11:51:21 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:


So there is merit in building a distributed look-up system using SA.



Distributed lookup of *what*, though? Can you clarify that part of
your idea? Are you referring to distributed whois queries for a
domain name, to determine its age?


The clever part is that once lots of sites begin using this in their
SA setups, we'll very quickly build up quite an accurate database of
newly-seen domains that's completely independent of any registrar for
a data source.


Ah, ok, that's where I was confused. The proposal is for a distributed 
network gathering newly-SEEN domain names, rather than newly-REGISTERED 
domain names.


Thanks for the clarification. I was focusing on the latter.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You can't reason a person out of a position if he didn't use
  reason to get there in the first place.   -- Kristopher, at Marko's
---
 739 days since the first successful private support mission to ISS (SpaceX)


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread David F. Skoll
On Mon, 09 Jun 2014 15:24:29 -0400
Patrick Domack patric...@patrickdk.com wrote:

 The point was, I have already done this, and have it in production.
 I did this because this subject keeps coming up from time to time, and
 I was personally interested to see the results of it.

Interesting.  If you don't mind my asking... how much data do you
collect?  How many lookups/day?

I was thinking a system that gets lookups from thousands or more SA
installations would get a pretty good overview of new domains.  A local
installation would necessarily see a limited subset.

 And I do agree with Rob McEwen on many points, and I would be
 hesitant to block outright. But so far I haven't found any issues with
 blocking 1-day-old domains outright, and I doubt there will be many in
 real usage.

Or even just holding the mail for a day or so and then re-analyzing it.

Regards,

David.


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 3:24 PM, Patrick Domack wrote:
The point was, I have already done this, and have it in production. I
did this because this subject keeps coming up from time to time, and I
was personally interested to see the results of it.


And I do agree with Rob McEwen on many points, and I would be
hesitant to block outright. But so far I haven't found any issues with
blocking 1-day-old domains outright, and I doubt there will be many in
real usage.


But then, the only way to be completely sure of that is time.


My conjecture is that many people have built this for lower volumes. But
you can't do much volume before your IP gets blocked by the whois
servers.  The twist I want to add is to bring data such as whois results
back from SA installations to a central place, since it can only be
gathered in a distributed manner.


regards,
KAM


RE: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread David Jones
If SEM was able to detect newly registered domains more quickly then that would 
solve the problem.

From: John Hardin jhar...@impsec.org
Sent: Monday, June 09, 2014 2:24 PM
To: users@spamassassin.apache.org
Subject: Re: Domain ages (was Re: SPAM from a registrar)

On Mon, 9 Jun 2014, David F. Skoll wrote:

 On Mon, 9 Jun 2014 11:51:21 -0700 (PDT)
 John Hardin jhar...@impsec.org wrote:

 So there is merit in building a distributed look-up system using SA.

 Distributed lookup of *what*, though? Can you clarify that part of
 your idea? Are you referring to distributed whois queries for a
 domain name, to determine its age?

 The clever part is that once lots of sites begin using this in their
 SA setups, we'll very quickly build up quite an accurate database of
 newly-seen domains that's completely independent of any registrar for
 a data source.

Ah, ok, that's where I was confused. The proposal is for a distributed
network gathering newly-SEEN domain names, rather than newly-REGISTERED
domain names.

Thanks for the clarification. I was focusing on the latter.



Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


On 6/9/2014 2:51 PM, John Hardin wrote:

 On Mon, 9 Jun 2014, Kevin A. McGrail wrote:

  So there is merit in building a distributed look-up system using SA.

 Distributed lookup of *what*, though? Can you clarify that part of your
 idea? Are you referring to distributed whois queries for a domain name, to
 determine its age?


Yes.  Because whois data is hard to get and many whois servers limit lookups, 
distributing and sharing the lookup load to determine age of domains IMO has 
merit.


Ah, I think there are still two different assumptions occurring in this 
discussion: newly-seen (David and Patrick) vs. newly-registered (me and 
Kevin)...


Maybe we need to clarify that first.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You can't reason a person out of a position if he didn't use
  reason to get there in the first place.   -- Kristopher, at Marko's
---
 739 days since the first successful private support mission to ISS (SpaceX)


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 3:33 PM, John Hardin wrote:

On Mon, 9 Jun 2014, Kevin A. McGrail wrote:


On 6/9/2014 2:51 PM, John Hardin wrote:

 On Mon, 9 Jun 2014, Kevin A. McGrail wrote:

  So there is merit in building a distributed look-up system using SA.

 Distributed lookup of *what*, though? Can you clarify that part of your
 idea? Are you referring to distributed whois queries for a domain name, to
 determine its age?


Yes.  Because whois data is hard to get and many whois servers limit 
lookups, distributing and sharing the lookup load to determine age of 
domains IMO has merit.


Ah, I think there are still two different assumptions occurring in this 
discussion: newly-seen (David and Patrick) vs. newly-registered (me 
and Kevin)...


Maybe we need to clarify that first. 


Good clarification.  The spam I envision stopping comes from spammers who
use things like stolen credit cards or trial accounts to register domains,
spam from them, and then disappear quite quickly.


So this builds a database of domain whois data (initial discussions
focused on the creation date), using distributed SA nodes to gather the data.


And I chose to discuss it here because I get more ideas than I have time 
and resources to implement.


Regards,
KAM


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 3:31 PM, David Jones wrote:

If SEM was able to detect newly registered domains more quickly then that would 
solve the problem.

That is the crux of the issue, yes.  So how do you identify new domains
if the registrars/registries won't give you the data? That's the problem
my idea solves: it monitors newly seen domains, the idea being that
spammers are not going to buy domains and sit on them before using them.


Regards,
KAM


RE: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, David Jones wrote:

If SEM was able to detect newly registered domains more quickly then 
that would solve the problem.


Oh, agreed.

The problem is, a registrar feed of registration changes costs a lot, and 
this is a free project.


That's why I suggested trying to develop relationships with registrars, 
to maybe get them onboard with providing this data for free for this 
purpose.


It's possible that the Apache name could provide cachet to get registrars 
onboard to provide rsync'able data feeds of domain names registered in the 
last N days. It might be possible/better to get them to provide the data 
to URIBL.org (to act as an aggregator) with a license to provide the data 
free via DNS (i.e. non-bulk access) and at a nominal fee for rsync access 
(which URIBL already charges for the data they collect).


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You can't reason a person out of a position if he didn't use
  reason to get there in the first place.   -- Kristopher, at Marko's
---
 739 days since the first successful private support mission to ISS (SpaceX)


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Axb

On 06/09/2014 09:38 PM, Kevin A. McGrail wrote:

That is the crux of the issue, yes.  So how do you identify new domains
if the registrars/registries won't give you the data? That's the problem
my idea solves by monitoring newly seen domains with the idea being that
spammers are not going to buy domains and sit on them before using them.


You get the TLD zone files... and depending on your budget you get them
once/24hrs or as hourly diffs (if you can afford a house in The Hamptons,
you can afford the diffs .-)


Some TLDs won't hand out zone files, period.
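
For TLDs that do provide zone files, spotting new domains from two daily snapshots comes down to a set difference. The sketch below is a simplification under stated assumptions: `delegated_names` and `newly_delegated` are hypothetical helpers, every record is assumed to carry an explicit owner name on a single line, and real zone files (continuation lines, `$ORIGIN` changes, multi-line records) need a proper parser.

```python
def delegated_names(zone_lines, origin):
    """Collect owner names of NS records from simplified zone-file lines."""
    origin = origin.lower().strip(".")
    names = set()
    for raw in zone_lines:
        fields = raw.split(";", 1)[0].split()   # drop trailing comments
        if len(fields) < 3 or fields[0].startswith("$"):
            continue                            # directives, short lines
        if "NS" not in [f.upper() for f in fields[1:-1]]:
            continue                            # only delegations matter here
        owner = fields[0].lower()
        if owner == "@":
            continue                            # NS records of the TLD itself
        if owner.endswith("."):
            owner = owner.rstrip(".")           # already fully qualified
        else:
            owner = f"{owner}.{origin}"         # relative to the zone origin
        if owner != origin:
            names.add(owner)
    return names


def newly_delegated(yesterday_lines, today_lines, origin):
    """Domains delegated in today's snapshot but absent from yesterday's."""
    return delegated_names(today_lines, origin) - delegated_names(
        yesterday_lines, origin
    )
```

With hourly diffs the same idea applies; only the snapshot interval changes.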



Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Matthias Leisi
On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com wrote:


 I think the core issue is that age of domains is a good indicator of spam.
  So there is merit in building a distributed look-up system using SA.

 I have more ideas than resources, of course...


I repeat my question: which domain? HELO, MAIL FROM, From:, ...?

-- Matthias


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Kevin A. McGrail

On 6/9/2014 4:25 PM, Matthias Leisi wrote:



On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com 
mailto:kmcgr...@pccc.com wrote:


I think the core issue is that age of domains is a good indicator
of spam.  So there is merit in building a distributed look-up
system using SA.

I have more ideas than resources, of course...


I repeat my question: which domain? HELO, MAIL FROM, From:, ...?


I envision it for potentially any and all domains in the email.


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Matthias Leisi
On Mon, Jun 9, 2014 at 9:11 PM, David F. Skoll d...@roaringpenguin.com
wrote:


 The clever part is that once lots of sites begin using this in their
 SA setups, we'll very quickly build up quite an accurate database of
 newly-seen domains that's completely independent of any registrar for
 a data source.


dnswl.org (and many other DNSxLs) already have some of that data as part of
their parsing/handling of DNS logs.  For

Furthermore, you can ignore all but the first few hundred lookups before you
 enter the TXT record in the database; this will make it more expensive
 for spammers to poison the data.  Or you could not enter a record in the
 database until it has been looked up from 100 different IP addresses... I
 can think of a few other countermeasures.

 So who's volunteering to do this? :)


We had some plans to publish such data. However since it is not really
clear what domains to look for, we did not pursue that a lot further. We
have at least a primary domain for each DNSWL record, but at least
historically we were not strict in what type of domain to put there (we
like to use the domain name that most closely links the IPs to the
administratively responsible owner, which is admittedly somewhat vague).

Based on the usage data we gather, we can pretty accurately extract a
last seen date for a particular domain (or its associated IPs, to be
exact).

*But*, again: which domains would be queried for such a list?

-- Matthias


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Patrick Domack


Quoting Matthias Leisi matth...@leisi.net:


On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com wrote:



I think the core issue is that age of domains is a good indicator of spam.
 So there is merit in building a distributed look-up system using SA.

I have more ideas than resources, of course...



I repeat my question: which domain? HELO, MAIL FROM, From:, ...?

-- Matthias


HELO hasn't matched anything in my tests.

MAIL FROM has matched many, though the HELOs are always a different domain.

I only started checking From yesterday, and I'm not sure exactly how I
will track the results. Likely I'll just wait a few days, then check my
ham/spam folders and compare which rules were hit.






Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Axb

On 06/09/2014 10:32 PM, Patrick Domack wrote:


Quoting Matthias Leisi matth...@leisi.net:


On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com
wrote:



I think the core issue is that age of domains is a good indicator of
spam.
 So there is merit in building a distributed look-up system using SA.

I have more ideas than resources, of course...



I repeat my question: which domain? HELO, MAIL FROM, From:, ...?

-- Matthias


HELO hasn't matched anything in my tests.

MAIL FROM has matched many, though the helo's are always a different domain

 From I have only started doing yesterday, and not sure exactly how I
will track them. Likely just wait a few days, and check my ham/spam
folders and compare which rules were hit.


LOTS of the recent .us and .me domains will match sender/ptr/A/HELO



Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread David F. Skoll
On Mon, 9 Jun 2014 22:31:55 +0200
Matthias Leisi matth...@leisi.net wrote:

 *But*, again: which domains would be queried for such a list?

I think MAIL FROM domain.

Regards,

David.


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread James B. Byrne

On Mon, June 9, 2014 15:35, Patrick Domack wrote:

 I guess what would need to be hammered out, is, the exact info wanted.
 We know age, and registrar. Though doing the registrar isn't so
 simple, as the same for just ENOM changes between tld, and even within
 a single tld (likely from the mergers they had).

My investigations of the domains used against us revealed that all of the
handful checked were between 4 and 20 hours old when first encountered by our
servers.

It would suffice, I think, to have a negative-lookup RTBL service where, if a
domain is not listed therein, it may be considered new, at least insofar
as mailing traffic is concerned.  The registrar and the age of the domain need
not concern us overmuch at the outset of a spam attack. What is more important
to know is whether the domain has been seen by others before and how long
before so that the information in DOB and SEM can be considered in that light.

Lookup domains may be added as and when they are encountered albeit after some
delay and only if some threshold of volume and distinct number of enquiring
hosts is passed.  A graded approach is probably called for with one listing a
previously unseen domain only after 24 hours from the first enquiry, one only
after 48, and so on.  Of course, the domains in question need to be verified
before being added.  And other precautions are no doubt necessary to avoid
poisoning or advance loading subversion attempts.
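
The graded approach might be sketched like this. Everything here is illustrative: the `Sighting` record, the `tiers_listing` helper, and the threshold values (`min_queries`, `min_hosts`, the 24/48/72 hour tiers) are assumptions, not numbers proposed in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Sighting:
    """Hypothetical per-domain record kept by the lookup service."""
    first_enquiry: float                 # epoch seconds of the first query
    queries: int = 0                     # total number of enquiries
    hosts: set = field(default_factory=set)  # distinct enquiring hosts


def tiers_listing(s, now, tier_hours=(24, 48, 72), min_queries=50, min_hosts=10):
    """Return the tier delays (hours) whose list would include this domain."""
    # Volume and distinct-host thresholds guard against advance loading
    if s.queries < min_queries or len(s.hosts) < min_hosts:
        return []
    age_hours = (now - s.first_enquiry) / 3600
    return [h for h in tier_hours if age_hours >= h]
```

A client doing a negative lookup would treat a domain missing from the 24-hour list as new, and could score more gently once it appears in the later tiers.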

Comments?


-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3



Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Matthias Leisi
On Mon, Jun 9, 2014 at 9:11 PM, David F. Skoll d...@roaringpenguin.com
wrote:


 The DNS software that serves the zone newdomain.example.net runs
 the following pseudo-code when example.org is looked up:
 [..]

So who's volunteering to do this? :)


*raises hand*

I still have an experimental DNS server (written in Perl) lying around that
does more or less what is described here. The overall system would need a
bit more thought, though.

* Distributed over n nodes. Given that data can have pretty long TTL, it
does not need a lot of nodes, but still the distributed nature brings some
challenges.
* Definition of the granularity of data - should a first seen date be
returned, or an age (in days?)
* Querying whois servers is not practical at that scale.
* How would the queries be sent to the nodes? Domain-based BL-type queries?
* Would the SA project take on some operational responsibilities?
* The dnswl.org project can sponsor resources and take on some operational
aspects, but we would welcome some support.
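
On the granularity and query-format questions in the list above, a client-side sketch may help frame the options. It assumes a domain-based BL-style query under the example zone `newdomain.example.net` used earlier in the thread and a first-seen payload in a year-prefixed `YYYYMMDDHHMMSS` form; both the helper names and that payload format are assumptions for illustration.

```python
from datetime import datetime, timezone


def query_name(domain, zone="newdomain.example.net"):
    """Build the BL-style lookup name for a domain (zone is an example)."""
    return f"{domain.lower().rstrip('.')}.{zone}"


def age_in_days(first_seen_txt, now, fmt="%Y%m%d%H%M%S"):
    """Convert a first-seen payload (assumed UTC) into whole days of age."""
    first = datetime.strptime(first_seen_txt, fmt).replace(tzinfo=timezone.utc)
    return (now - first).days
```

Returning the first-seen date lets each client derive whatever age granularity it wants; returning an age in days would push that choice onto the server and shorten the usable TTL.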

-- Matthias


Local BL support?

2014-06-09 Thread Philip Prindeville
I’d like to add a plugin (and eventually share it once the bugs are out) that 
uses either Net::CIDR::Lite to allow manual entry of IP-based blacklists for 
known offending address blocks, or else using the Geo::IP module to blacklist 
based on the country or ISP.

It would need to expose parts of the API depending on how it detects the 
presence of modules, I suppose.

Not sure if it’s worth making run-time detection of the Geo::IP licenses and 
databases do the same.

Is there a prototype Plugin that I could use for doing parsing/looking up the 
URI’s hostname?  Since I’m using a local database without network access, it 
could happen synchronously…
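
As a language-neutral illustration of the CIDR matching such a plugin would do, here is a minimal Python sketch using the standard `ipaddress` module. It is not SpamAssassin plugin code and does not use Net::CIDR::Lite or Geo::IP; the `LocalBlacklist` class is hypothetical and the address blocks are documentation ranges.

```python
import ipaddress


class LocalBlacklist:
    """Hypothetical manual blacklist of offending address blocks."""

    def __init__(self, cidrs):
        # strict=False tolerates entries whose host bits are set
        self._nets = [ipaddress.ip_network(c, strict=False) for c in cidrs]

    def listed(self, address):
        """True if the address falls inside any blacklisted block."""
        ip = ipaddress.ip_address(address)
        # Membership across IPv4/IPv6 mismatches is simply False
        return any(ip in net for net in self._nets)


bl = LocalBlacklist(["192.0.2.0/24", "2001:db8::/32"])
print(bl.listed("192.0.2.77"))    # inside the first block
print(bl.listed("198.51.100.1"))  # not listed
```

Because the data is local, the check is synchronous, as suggested above; a linear scan is fine for a short manual list, while a large list would want a prefix tree.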

Thanks,

-Philip



Re: Local BL support?

2014-06-09 Thread Axb

On 06/09/2014 10:46 PM, Philip Prindeville wrote:

I’d like to add a plugin (and eventually share it once the bugs are
out) that uses either Net::CIDR::Lite to allow manual entry of
IP-based blacklists for known offending address blocks, or else using
the Geo::IP module to blacklist based on the country or ISP.

It would need to expose parts of the API depending on how it detects
the presence of modules, I suppose.

Not sure if it’s worth making run-time detection of the Geo::IP
licenses and databases do the same.

Is there a prototype Plugin that I could use for doing
parsing/looking up the URI’s hostname?  Since I’m using a local
database without network access, it could happen synchronously…

Thanks,


The standard SA URIBL.pm ?
put your data in a local NS instance (rbldnsd, bind, whatever you prefer)



Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread Philip Prindeville

On Jun 6, 2014, at 3:50 PM, Axb axb.li...@gmail.com wrote:

 If you have to post a spam sample, pls use pastebin and post the full msg
 
 On 06/06/2014 11:32 PM, Philip Prindeville wrote:
 We’re getting a lot of spam that contains URL’s which look like (remove the 
 ):
 
 http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol
 
 Some observations… The URL’s should be fairly easy to filter against via a 
 regex.  Anyone have some working rules they could share?
 
 Pls note that any rule shared via lists usually loses its teeth within a few 
 hours .-)

Well, it depends on the nature of the rule…  Some characteristics are less 
fungible than others.


 
 
 The other thing is, the URL is almost always hosted by solarvps.com, in the 
 CIDR block 65.181.64.0/18.
 
 Is there an easy way to do a domain lookup on the host portion of the URL 
 and then filter it if it’s in this subnet?
 
 Yes, there is:
 
 run a local A record blacklist with rbldnsd
 
 65.181.64.0/18
 
 and a rule like, for example:
 
 uridnssub  YOUR_A_URIBL  yourabl.example.net.  A  127.0.0.2
 body       YOUR_A_URIBL  eval:check_uridnsbl('YOUR_A_URIBL')
 describe   YOUR_A_URIBL  URL domain A rec listed by YOUR_A_URIBL
 score      YOUR_A_URIBL  5.0
 tflags     YOUR_A_URIBL  net a
 
 


If I used local A records, for a /18 network, I’d need all 2^14 records, right?

Because a lookup is always on a full dotted-quad (in reverse order)…
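
The arithmetic and the reversed-quad lookup name can be checked with a few lines of Python. This is just a worked example: `reversed_query` is a hypothetical helper, and the zone name is the placeholder from the quoted rule. Whether the DNS server needs one record per address is a separate question; the quoted rbldnsd example lists the block as a single CIDR entry.

```python
import ipaddress

# A /18 leaves 32 - 18 = 14 host bits
net = ipaddress.ip_network("65.181.64.0/18")
print(net.num_addresses)   # 16384, i.e. 2**14


def reversed_query(address, zone):
    """Build the DNSBL-style name: octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


print(reversed_query("65.181.64.1", "yourabl.example.net."))
# 1.64.181.65.yourabl.example.net
```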

I tried using multi.uribl.com and couldn’t get this to work.

I had:

urirhssub L_URIBL_BLACK  multi.uribl.com. A 2
body      L_URIBL_BLACK  eval:check_uridnsbl('L_URIBL_BLACK')
describe  L_URIBL_BLACK  Contains a URL listed in the URIBL blacklist
tflags    L_URIBL_BLACK  net
score     L_URIBL_BLACK  20.0


set, and also:

skip_rbl_checks 0

at the end of /etc/mail/spamassassin/sa-mimedefang.cf set.

Running this over the message in a file:

spamassassin -t --lint -D  /tmp/cable.eml

I get:

…
Jun  9 14:57:13.029 [32297] dbg: rules: compiled meta tests
Jun  9 14:57:13.032 [32297] dbg: check: is spam? score=-2.348 required=5
Jun  9 14:57:13.032 [32297] dbg: check: tests=L_EMPTY_SENDER,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Jun  9 14:57:13.032 [32297] dbg: check: subtests=__BODY_TEXT_LINE,__EMPTY_BODY,__EMPTY_SENDER,__GATED_THROUGH_RCVD_REMOVER,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_SUBJECT,__L_UNDISCLOSED2,__MISSING_REF,__MISSING_REPLY,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__SANE_MSGID,__TO_NO_ARROWS_R,__UNUSABLE_MSGID
Jun  9 14:57:13.033 [32297] dbg: timing: total 1908 ms - init: 1384 (72.5%), parse: 1.17 (0.1%), extract_message_metadata: 11 (0.6%), get_uri_detail_list: 1.06 (0.1%), tests_pri_-1000: 9 (0.5%), compile_gen: 202 (10.6%), compile_eval: 37 (1.9%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 7 (0.4%), tests_pri_-400: 6 (0.3%), tests_pri_0: 404 (21.2%), tests_pri_500: 75 (3.9%)


so I’m not sure why it’s failing to find nqtel.com in the uribl.com database.

What am I missing?

-Philip



Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Axb

On 06/09/2014 10:43 PM, James B. Byrne wrote:


On Mon, June 9, 2014 15:35, Patrick Domack wrote:


I guess what would need to be hammered out, is, the exact info wanted.
We know age, and registrar. Though doing the registrar isn't so
simple, as the same for just ENOM changes between tld, and even within
a single tld (likely from the mergers they had).


My investigations of the domains used against us revealed that all of the
handful checked were between 4 and 20 hours old when first encountered by our
servers.

It would suffice, I think, to have a negative-lookup RTBL service where, if a
domain is not listed therein, it may be considered new, at least insofar
as mailing traffic is concerned.  The registrar and the age of the domain need
not concern us overmuch at the outset of a spam attack. What is more important
to know is whether the domain has been seen by others before and how long
before so that the information in DOB and SEM can be considered in that light.

Lookup domains may be added as and when they are encountered albeit after some
delay and only if some threshold of volume and distinct number of enquiring
hosts is passed.  A graded approach is probably called for with one listing a
previously unseen domain only after 24 hours from the first enquiry, one only
after 48, and so on.  Of course, the domains in question need to be verified
before being added.  And other precautions are no doubt necessary to avoid
poisoning or advance loading subversion attempts.
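A minimal sketch of the graded listing described above, in Python. Everything concrete here is an assumption made for illustration: the class and method names, the query and host thresholds, and the 24/48 hour tiers taken as plain age cut-offs. The verification and anti-poisoning steps mentioned above are not modelled.

```python
from dataclasses import dataclass, field

HOUR = 3600


@dataclass
class DomainRecord:
    first_seen: float                          # epoch seconds of the first enquiry
    queries: int = 0                           # total enquiries seen so far
    hosts: set = field(default_factory=set)    # distinct enquiring hosts


class SeenDomains:
    """Track enquiries and report on which graded lists a domain would appear."""

    def __init__(self, min_queries=10, min_hosts=3, tiers_hours=(24, 48)):
        # All thresholds are illustrative assumptions, not values from the thread.
        self.min_queries = min_queries
        self.min_hosts = min_hosts
        self.tiers_hours = tiers_hours
        self.records = {}

    def record_enquiry(self, domain, host, now):
        rec = self.records.setdefault(domain, DomainRecord(first_seen=now))
        rec.queries += 1
        rec.hosts.add(host)

    def listed_tiers(self, domain, now):
        """Return the tiers (in hours) on which the domain is listed."""
        rec = self.records.get(domain)
        if rec is None:
            return []    # never seen before: treated as new
        if rec.queries < self.min_queries or len(rec.hosts) < self.min_hosts:
            return []    # not enough volume or distinct hosts yet
        age = now - rec.first_seen
        return [t for t in self.tiers_hours if age >= t * HOUR]
```

A domain that returns an empty list is "not listed" and would be considered new by a client of such a service.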

Comments?


You have a domain reputation method on your drawing board and, imo, it has 
some flaws:


- Delayed data is good for research, not to efficiently stop spam.

- Verifying anything that large needs 40k indians in the basement or 
huge clusters of cycles doing something - neither is trivial or cheap.


- There's a bunch of Passive DNS projects which do what you're 
describing, and none will work as the FUSSP - they're data points which can 
be combined with other stuff to achieve something (aka research)







Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread Axb

On 06/09/2014 11:03 PM, Philip Prindeville wrote:


On Jun 6, 2014, at 3:50 PM, Axb axb.li...@gmail.com wrote:


If you have to post a spam sample, pls use pastebin and post the full msg

On 06/06/2014 11:32 PM, Philip Prindeville wrote:

We’re getting a lot of spam that contains URL’s which look like (remove the 
):

http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol



Some observations… The URL’s should be fairly easy to filter against via a 
regex.  Anyone have some working rules they could share?


Pls note that any rule shared via lists usually loses its teeth within a few 
hours .-)


Well, it depends on the nature of the rule…  Some characteristics are less 
fungible than others.






The other thing is, the URL is almost always hosted by solarvps.com, in the 
CIDR block 65.181.64.0/18.

Is there an easy way to do a domain lookup on the host portion of the URL and 
then filter it if it’s in this subnet?


Yes, there is:

run a local A record blacklist with rbldnsd

65.181.64.0/18

and a rule like, for example:

uridnssub  YOUR_A_URIBL yourabl.example.net.  A  127.0.0.2
body  YOUR_A_URIBL  eval:check_uridnsbl('YOUR_A_URIBL')
describe  YOUR_A_URIBL  URL domain A rec listed by YOUR_A_URIBL
score YOUR_A_URIBL  5.0
tflags   YOUR_A_URIBL   net a





If I used local A records, for a /18 network, I’d need all 2^14 records, right?

Because a lookup is always on a full dotted-quad (in reverse order)…



nope... with rbldnsd you set your BL zone to use the ip4trie dataset

which as per http://www.corpit.ru/mjt/rbldnsd/rbldnsd.8.html

ip4trie Dataset
Set of IP4 CIDR ranges with corresponding (A, TXT) values. This dataset 
is similar to ip4set, but uses a different internal representation. It 
accepts CIDR ranges only (not a.b.c.d−e.f.g.h), and allows for the 
specification of A/TXT values on a per CIDR range basis. (If multiple 
CIDR ranges match a query, the value for longest matching prefix is 
returned.) Exclusions are supported too.
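To make the exchange above concrete, here is a small Python sketch using only the standard `ipaddress` module. It shows that a /18 does cover 2^14 addresses, how a reversed dotted-quad query name is built, and what "longest matching prefix" means. It is not rbldnsd code; the function names and the second, narrower range with its own return value are assumptions added for illustration.

```python
import ipaddress


def reversed_query_name(ip, zone):
    """Build a DNSBL-style query name: octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def longest_prefix_value(ip, entries):
    """Return the value of the most specific CIDR range containing ip, or None."""
    addr = ipaddress.IPv4Address(ip)
    best = None
    for cidr, value in entries:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return best[1] if best else None


block = ipaddress.ip_network("65.181.64.0/18")
print(block.num_addresses)    # size of the /18

# The /24 entry and its value are hypothetical, added to show prefix precedence.
entries = [("65.181.64.0/18", "127.0.0.2"), ("65.181.64.0/24", "127.0.0.3")]
print(reversed_query_name("65.181.64.10", "yourabl.example.net."))
print(longest_prefix_value("65.181.64.10", entries))
```

With a CIDR-aware dataset a single range entry stands in for all of those addresses, so nothing has to be enumerated per host.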




I tried using multi.uribl.com and couldn’t get this to work.

I had:

urirhssub L_URIBL_BLACK multi.uribl.com. A 2
body L_URIBL_BLACK  eval:check_uridnsbl('L_URIBL_BLACK')
describe L_URIBL_BLACK  Contains a URL listed in the URIBL blacklist
tflags L_URIBL_BLACK    net
score L_URIBL_BLACK 20.0


URIBL is enabled by default in SA - no need to add extra rules.



set, and also:

skip_rbl_checks 0

at the end of /etc/mail/spamassassin/sa-mimedefang.cf set.

Running this over the message in a file:

spamassassin -t --lint -D  /tmp/cable.eml

I get:

…
Jun  9 14:57:13.029 [32297] dbg: rules: compiled meta tests
Jun  9 14:57:13.032 [32297] dbg: check: is spam? score=-2.348 required=5
Jun  9 14:57:13.032 [32297] dbg: check: 
tests=L_EMPTY_SENDER,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Jun  9 14:57:13.032 [32297] dbg: check: 
subtests=__BODY_TEXT_LINE,__EMPTY_BODY,__EMPTY_SENDER,__GATED_THROUGH_RCVD_REMOVER,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_SUBJECT,__L_UNDISCLOSED2,__MISSING_REF,__MISSING_REPLY,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__SANE_MSGID,__TO_NO_ARROWS_R,__UNUSABLE_MSGID
Jun  9 14:57:13.033 [32297] dbg: timing: total 1908 ms - init: 1384 (72.5%), 
parse: 1.17 (0.1%), extract_message_metadata: 11 (0.6%), get_uri_detail_list: 
1.06 (0.1%), tests_pri_-1000: 9 (0.5%), compile_gen: 202 (10.6%), compile_eval: 
37 (1.9%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 7 (0.4%), tests_pri_-400: 
6 (0.3%), tests_pri_0: 404 (21.2%), tests_pri_500: 75 (3.9%)


so I’m not sure why it’s failing to find nqtel.com in the uribl.com database.
What am I missing?


--lint doesn't do network tests








Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Richard Doyle
On 06/09/2014 12:29 PM, Kevin A. McGrail wrote:
 On 6/9/2014 3:24 PM, Patrick Domack wrote:
 The point was, I have already done this, and have it in production. I
 did this because this subject keeps coming up from time to time, and I
 was personally interested to see the results of it.

 And I do agree with Rob McEwen on many points. And I would be
 hesitant to outright block. But so far I haven't found any issues
 with blocking domains less than a day old outright, and I doubt
 there will be many in real usage.

 But then the only way to be completely sure of that, will be time.

 My conjecture is that many people have built this for lower volume.
 But you can't be doing much volume or your IP gets blocked from whois
 servers.  The twist I want to do is bring more data back centralized
 from SA installations such as whois data where it can only be done in
 a distributed manner.

 regards,
 KAM


A caching whois client (jwhois, for example) can significantly reduce
the volume of queries.
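The effect of caching can be sketched in a few lines of Python. This is not how jwhois is implemented; the class name, the one-week lifetime and the injected `lookup` callable (standing in for a real whois query) are all assumptions for illustration.

```python
import time


class CachedLookup:
    """Wrap a slow lookup function with a time-limited in-memory cache."""

    def __init__(self, lookup, ttl_seconds=7 * 24 * 3600, clock=time.time):
        self.lookup = lookup          # hypothetical callable: domain -> result
        self.ttl = ttl_seconds        # assumed lifetime of a cached answer
        self.clock = clock            # injectable so expiry can be tested
        self.cache = {}               # domain -> (expiry_time, result)
        self.upstream_queries = 0     # how often the real lookup was used

    def get(self, domain):
        key = domain.lower().rstrip(".")
        now = self.clock()
        hit = self.cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]             # answered from cache, no upstream query
        self.upstream_queries += 1
        result = self.lookup(key)
        self.cache[key] = (now + self.ttl, result)
        return result
```

Repeated sightings of the same domain then cost one upstream query per cache lifetime, which is where the reduction in volume comes from. Domains seen only once gain nothing from the cache.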



Re: Local BL support?

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, Axb wrote:


On 06/09/2014 10:46 PM, Philip Prindeville wrote:

 I’d like to add a plugin (and eventually share it once the bugs are
 out) that uses either Net::CIDR::Lite to allow manual entry of
 IP-based blacklists for known offending address blocks, or else using
 the Geo::IP module to blacklist based on the country or ISP.

 Is there a prototype Plugin that I could use for doing
 parsing/looking up the URI’s hostname?  Since I’m using a local
 database without network access, it could happen synchronously…


The standard SA URIBL.pm ?
put your data in a local NS instance (rbldnsd, bind, whatever you prefer)


Second URIBL.pm.

For small sites it would be nice if it supported specifying a netblock 
explicitly in the rule. If you're only doing a few that would be easier 
than setting up a zone or rbldnsd. You might look at extending URIBL.pm to 
do that.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You are in a maze of twisty little protocols,
  all written by Microsoft.
--
 739 days since the first successful private support mission to ISS (SpaceX)

Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Matthias Leisi
On Mon, Jun 9, 2014 at 11:31 PM, Richard Doyle lists...@islandnetworks.com
wrote:


 A caching whois client (jwhois, for example) can significantly reduce
 the volume of queries.


You will need to query potentially hundreds or thousands of domains *per
day* - mostly throw away domains from spammers.

1) What are the typical rate limits on public whois servers?
2) How to protect against attackers sending random non-existent domain
names your way, thus ensuring you hit rate limits early?
3) How to parse the myriads of formats sent by whois servers?
4) How do you handle TLDs which do not publish registration dates, like eg
.de? (At least they did not last time I checked.)

Whois is not a feasible data source.

-- Matthias


Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Patrick Domack

Quoting Matthias Leisi matth...@leisi.net:


On Mon, Jun 9, 2014 at 11:31 PM, Richard Doyle lists...@islandnetworks.com
wrote:



A caching whois client (jwhois, for example) can significantly reduce
the volume of queries.



You will need to query potentially hundreds or thousands of domains *per
day* - mostly throw away domains from spammers.

1) What are the typical rate limits on public whois servers?
2) How to protect against attackers sending random non-existent domain
names your way, thus ensuring you hit rate limits early?
3) How to parse the myriads of formats sent by whois servers?
4) How do you handle TLDs which do not publish registration dates, like eg
.de? (At least they did not last time I checked.)

Whois is not a feasible data source.

-- Matthias


1) I dunno, but I am doing around 15k lookups a day, from a single ip,  
without getting limited/blocked
2) This is hard, and I don't know, currently the postfix reject  
unknown sender helps solve this for me, but won't for dns based lookups

3) This, while annoying, is solved in my code, not too hard
4) These I just don't bother doing lookups for, there is no solution,  
other than to let them bypass this system, or rate them via seen  
before method.





Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread Philip Prindeville

On Jun 9, 2014, at 3:10 PM, Axb axb.li...@gmail.com wrote:

 On 06/09/2014 11:03 PM, Philip Prindeville wrote:
 
 On Jun 6, 2014, at 3:50 PM, Axb axb.li...@gmail.com wrote:
 
 If you have to post a spam sample, pls use pastebin and post the full msg
 
 On 06/06/2014 11:32 PM, Philip Prindeville wrote:
 We’re getting a lot of spam that contains URL’s which look like (remove 
 the ):
 
 http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol
 
 Some observations… The URL’s should be fairly easy to filter against via a 
 regex.  Anyone have some working rules they could share?
 
 Pls note that any rule shared via lists usually loses its teeth within a 
 few hours .-)
 
 Well, it depends on the nature of the rule…  Some characteristics are less 
 fungible than others.


BTW, I found that the last N characters of the above URL’s were always the 
same, and tried to do a “body” rule based on those last N characters, but I 
couldn’t get the rule to match.

Still not sure why.  The entire a ... sequence is only 382 characters long.

Any ideas?


 
 
 
 
 The other thing is, the URL is almost always hosted by solarvps.com, in 
 the CIDR block 65.181.64.0/18.
 
 Is there an easy way to do a domain lookup on the host portion of the URL 
 and then filter it if it’s in this subnet?
 
 Yes, there is:
 
 run a local A record blacklist with rbldnsd
 
 65.181.64.0/18
 
 and a rule like, for example:
 
 uridnssub  YOUR_A_URIBL yourabl.example.net.  A  127.0.0.2
 body  YOUR_A_URIBL  eval:check_uridnsbl('YOUR_A_URIBL')
 describe  YOUR_A_URIBL  URL domain A rec listed by YOUR_A_URIBL
 score YOUR_A_URIBL  5.0
 tflags   YOUR_A_URIBL   net a
 
 
 
 
 If I used local A records, for a /18 network, I’d need all 2^14 records, 
 right?
 
 Because a lookup is always on a full dotted-quad (in reverse order)…
 
 
 nope... with rbldnsd you set your BL zone to use the ip4trie dataset
 
 which as per http://www.corpit.ru/mjt/rbldnsd/rbldnsd.8.html
 
 ip4trie Dataset
 Set of IP4 CIDR ranges with corresponding (A, TXT) values. This dataset is 
 similar to ip4set, but uses a different internal representation. It accepts 
 CIDR ranges only (not a.b.c.d−e.f.g.h), and allows for the specification of 
 A/TXT values on a per CIDR range basis. (If multiple CIDR ranges match a 
 query, the value for longest matching prefix is returned.) Exclusions are 
 supported too.


Okay, and what would 65.181.64.0/18 look like as a BIND RR?  I wasn’t able to 
infer this from the documentation you pointed at.



 
 
 I tried using multi.uribl.com and couldn’t get this to work.
 
 I had:
 
 urirhssub L_URIBL_BLACK multi.uribl.com. A 2
 body L_URIBL_BLACK  eval:check_uridnsbl('L_URIBL_BLACK')
 describe L_URIBL_BLACK  Contains a URL listed in the URIBL blacklist
 tflags L_URIBL_BLACK    net
 score L_URIBL_BLACK 20.0
 
 URIBL is enabled by default in SA - no need to add extra rules.
 
 
 set, and also:
 
 skip_rbl_checks 0
 
 at the end of /etc/mail/spamassassin/sa-mimedefang.cf set.
 
 Running this over the message in a file:
 
 spamassassin -t --lint -D  /tmp/cable.eml
 
 I get:
 
 …
 Jun  9 14:57:13.029 [32297] dbg: rules: compiled meta tests
 Jun  9 14:57:13.032 [32297] dbg: check: is spam? score=-2.348 required=5
 Jun  9 14:57:13.032 [32297] dbg: check: 
 tests=L_EMPTY_SENDER,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
 Jun  9 14:57:13.032 [32297] dbg: check: 
 subtests=__BODY_TEXT_LINE,__EMPTY_BODY,__EMPTY_SENDER,__GATED_THROUGH_RCVD_REMOVER,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_SUBJECT,__L_UNDISCLOSED2,__MISSING_REF,__MISSING_REPLY,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__SANE_MSGID,__TO_NO_ARROWS_R,__UNUSABLE_MSGID
 Jun  9 14:57:13.033 [32297] dbg: timing: total 1908 ms - init: 1384 (72.5%), 
 parse: 1.17 (0.1%), extract_message_metadata: 11 (0.6%), 
 get_uri_detail_list: 1.06 (0.1%), tests_pri_-1000: 9 (0.5%), compile_gen: 
 202 (10.6%), compile_eval: 37 (1.9%), tests_pri_-950: 6 (0.3%), 
 tests_pri_-900: 7 (0.4%), tests_pri_-400: 6 (0.3%), tests_pri_0: 404 
 (21.2%), tests_pri_500: 75 (3.9%)
 
 
 so I’m not sure why it’s failing to find nqtel.com in the uribl.com database.
 What am I missing?
 
 --lint doesn't do network tests
 


Okay, taking out --lint changed the results.

Thanks,

-Philip



Re: Local BL support?

2014-06-09 Thread Philip Prindeville

On Jun 9, 2014, at 3:36 PM, John Hardin jhar...@impsec.org wrote:

 On Mon, 9 Jun 2014, Axb wrote:
 
 On 06/09/2014 10:46 PM, Philip Prindeville wrote:
 I’d like to add a plugin (and eventually share it once the bugs are
 out) that uses either Net::CIDR::Lite to allow manual entry of
 IP-based blacklists for known offending address blocks, or else using
 the Geo::IP module to blacklist based on the country or ISP.
 
 Is there a prototype Plugin that I could use for doing
 parsing/looking up the URI’s hostname?  Since I’m using a local
 database without network access, it could happen synchronously…
 
 The standard SA URIBL.pm ?
 put your data in a local NS instance (rbldnsd, bind, whatever you prefer)
 
 Second URIBL.pm.
 
 For small sites it would be nice if it supported specifying a netblock 
 explicitly in the rule. If you're only doing a few that would be easier than 
 setting up a zone or rbldnsd. You might look at extending URIBL.pm to do that.
 

I’m happy to try doing that, since I know Perl and need this…  I’m just lacking 
on the expertise about doing SA modules…  Anyone want to walk me through it?

-Philip




Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, Philip Prindeville wrote:


We’re getting a lot of spam that contains URL’s which look like (remove the 
):

http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol


BTW, I found that the last N characters of the above URL’s were always 
the same, and tried to do a “body” rule based on those last N 
characters, but I couldn’t get the rule to match.


Still not sure why.  The entire a ... sequence is only 382 characters 
long.


Any ideas?


If it's in an HTML anchor tag the URL itself isn't in the body text, 
only the display label will be.


Try a uri rule.
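The distinction can be seen with a short Python sketch that separates the visible text of an HTML part from its link targets. This only illustrates the idea and is not SpamAssassin's HTML parser; the class, the function and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class LinkAndText(HTMLParser):
    """Collect visible text and href targets separately."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.text.append(data)


def split_html(html):
    """Return (visible_text, list_of_link_targets) for an HTML fragment."""
    parser = LinkAndText()
    parser.feed(html)
    parser.close()
    return "".join(parser.text), parser.hrefs


html = '<p>Great deals! <a href="http://spam.example/12345678/abc_tail">Click here</a></p>'
visible, links = split_html(html)
print("abc_tail" in visible)    # the URL tail is not part of the visible text
print(links)                    # it only shows up among the link targets
```

A pattern written against the rendered text never sees the href value, which is why matching on the URL needs a rule type that looks at the extracted URIs.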

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control laws cannot reduce violent crime, because gun control
  laws focus obsessively on a tool a criminal might use to commit a
  crime rather than the criminal himself and his act of violence.
---
 739 days since the first successful private support mission to ISS (SpaceX)

Re: Domain ages (was Re: SPAM from a registrar)

2014-06-09 Thread Richard Doyle
On 06/09/2014 02:42 PM, Matthias Leisi wrote:

 On Mon, Jun 9, 2014 at 11:31 PM, Richard Doyle
 lists...@islandnetworks.com mailto:lists...@islandnetworks.com wrote:
  

 A caching whois client (jwhois, for example) can significantly reduce
 the volume of queries.


 You will need to query potentially hundreds or thousands of domains
 *per day* - mostly throw away domains from spammers. 

 1) What are the typical rate limits on public whois servers?
Apparently higher than my usage (cached names aren't rechecked)

 2) How to protect against attackers sending random non-existent domain
 names your way, thus ensuring you hit rate limits early?
Sender verification

 3) How to parse the myriads of formats sent by whois servers?
Don't try (see 4)

 4) How do you handle TLDs which do not publish registration dates,
 like eg .de? (At least they did not last time I checked.)
I only check .com, .net and .org


 Whois is not a feasible data source.
Whois certainly has limited usefulness, but is a feasible data source
within those limits


 -- Matthias

-Richard



Re: Local BL support?

2014-06-09 Thread John Hardin

On Mon, 9 Jun 2014, Philip Prindeville wrote:



On Jun 9, 2014, at 3:36 PM, John Hardin jhar...@impsec.org wrote:


On Mon, 9 Jun 2014, Axb wrote:


On 06/09/2014 10:46 PM, Philip Prindeville wrote:

I’d like to add a plugin (and eventually share it once the bugs are
out) that uses either Net::CIDR::Lite to allow manual entry of
IP-based blacklists for known offending address blocks, or else using
the Geo::IP module to blacklist based on the country or ISP.

Is there a prototype Plugin that I could use for doing
parsing/looking up the URI’s hostname?  Since I’m using a local
database without network access, it could happen synchronously…


The standard SA URIBL.pm ?
put your data in a local NS instance (rbldnsd, bind, whatever you prefer)


Second URIBL.pm.

For small sites it would be nice if it supported specifying a netblock 
explicitly in the rule. If you're only doing a few that would be easier than 
setting up a zone or rbldnsd. You might look at extending URIBL.pm to do that.



I’m happy to try doing that, since I know Perl and need this… I’m just 
lacking on the expertise about doing SA modules… Anyone want to walk me 
through it?


The URIBL module is already there. If you know Perl it should be fairly 
easy to look at the existing code and add a variant where it accepts a 
netblock spec instead of a URIBL hostname and does the IP comparison to 
that rather than performing a DNS query...
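The comparison step John describes (resolve the URI's host, then test the address against a configured netblock) looks roughly like the Python sketch below. It is only a model of the logic: a real variant would be written in Perl inside the plugin and would use SpamAssassin's own DNS handling rather than a blocking resolver call. The function name and the injectable `resolve` parameter are assumptions for illustration.

```python
import ipaddress
import socket


def host_in_netblocks(host, netblocks, resolve=socket.gethostbyname):
    """Return True if host resolves to an address inside any given netblock.

    `resolve` is injectable so the check can be exercised without real DNS.
    """
    try:
        addr = ipaddress.ip_address(resolve(host))
    except (OSError, ValueError):
        return False    # unresolvable or malformed: treat as no hit
    return any(addr in ipaddress.ip_network(block) for block in netblocks)
```

For example, `host_in_netblocks("mabsut.com", ["65.181.64.0/18"])` would perform one A lookup and then a purely local comparison, with no blacklist zone involved.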


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control laws cannot reduce violent crime, because gun control
  laws focus obsessively on a tool a criminal might use to commit a
  crime rather than the criminal himself and his act of violence.
---
 739 days since the first successful private support mission to ISS (SpaceX)

Re: add_header all Date of Scan _DATE_

2014-06-09 Thread Karsten Bräckelmann
On Mon, 2014-06-09 at 05:49 +0200, Karsten Bräckelmann wrote:
 Found the culprit after some digging. Bug 6915 [1], revision 1453407. As
 a band-aid, the following trivial one-line patch fixes it. Can easily be
 applied manually.
 
 Since it is kind of way past getting late here, and there may be other
 Template Tags affected, I'll defer proper bug handling and committing
 code changes for tomorrow.

Bug 7050 [1]. Fixed in trunk, to be committed to 3.4 branch after RTC
mode review and voting.

While the quick fix I posted yesterday does work, it does so only
because all occurrences want the current time formatted. It will not
work in general for other dates than now (which SA does not use with
that function).

A proper M::SA::PerMsgStatus.pm fix can be found in bug 7050 comment 1,
linked to the svn revision.


[1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7050

-- 
char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: add_header all Date of Scan _DATE_

2014-06-09 Thread Karsten Bräckelmann
On Mon, 2014-06-09 at 09:23 +0200, Matus UHLAR - fantomas wrote:
 On 09.06.14 05:49, Karsten Bräckelmann wrote:
  Found the culprit after some digging. Bug 6915 [1], revision 1453407. As
  a band-aid, the following trivial one-line patch fixes it. Can easily be
  applied manually.
 
 can that by any chance fix problem with Date: in mail received by SSL ?
 That one behaves similarly...
 
 http://mail-archives.apache.org/mod_mbox/spamassassin-users/201401.mbox/20140131144406.GA28818%40fantomas.sk

No, these are unrelated. The code change mentioned above affects
Template Tags only.

And while a date-string related function is involved in this issue, the
underlying bug is calling that function with a bad argument. Besides,
all instances of calling that function are now correct.


-- 
char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread Amir Caspi
On Jun 9, 2014, at 4:25 PM, John Hardin jhar...@impsec.org wrote:

 On Mon, 9 Jun 2014, Philip Prindeville wrote:
 
 http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol
 
 If it's in an HTML anchor tag the URL itself isn't in the body text, only 
 the display label will be.
 
 Try a uri rule.

This URL is already in my AC_SPAMMY_URI template group, though I don't know 
if this particular one has been released or not (I never sent an update since 
the first batch a few months ago), and even if so the current version would not 
have caught it due to being a bit too restrictive.

Try this:

uri __AC_LONGSTRS_URI   /\/[0-9]{8}(?:\/[a-z0-9_~]{50,}){3}\b/

Score as desired (I assign 3 points to all AC_SPAMMY_URI templates, but the 
released ones score differently).
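For anyone wanting to try the pattern outside SpamAssassin first, it can be exercised with Python's `re` module. SpamAssassin compiles the rule as a Perl regex, but this particular pattern uses only syntax common to both, so the slashes just lose their escaping. The helper name and the test URLs are made up for the example.

```python
import re

# Same pattern as the uri rule above, without the Perl /.../ delimiters.
LONGSTRS = re.compile(r"/[0-9]{8}(?:/[a-z0-9_~]{50,}){3}\b")


def looks_like_longstrs(uri):
    """True if the URI has an 8-digit path segment followed by 3 long segments."""
    return LONGSTRS.search(uri) is not None


segment = "ab_c~d0123" * 6    # 60 characters from the allowed class
spammy = "http://spam.example/20220362/" + "/".join([segment] * 4)
short = "http://shop.example/20220362/" + "/".join(["ab_c~d0123"] * 3)

print(looks_like_longstrs(spammy))
print(looks_like_longstrs(short))
```

The pattern keys on structure (eight digits, then at least three path segments of 50 or more lowercase, digit, underscore or tilde characters) rather than on any fixed string, so it survives changes to the random-looking parts of the URL.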

--- Amir

Re: add_header all Date of Scan _DATE_

2014-06-09 Thread Karsten Bräckelmann
On Tue, 2014-06-10 at 02:03 +0200, Karsten Bräckelmann wrote:
 On Mon, 2014-06-09 at 09:23 +0200, Matus UHLAR - fantomas wrote:

  can that by any chance fix problem with Date: in mail received by SSL ?
  That one behaves similarly...
  
  http://mail-archives.apache.org/mod_mbox/spamassassin-users/201401.mbox/20140131144406.GA28818%40fantomas.sk
 
 No, these are unrelated. The code change mentioned above affects
 Template Tags only.  [...]

Moreover, that sample shows SA 3.3.2. The bad Date Template Tag is
strictly 3.4 and trunk.

I've run the headers (after manually fixing that horribly mis-formatted
paste) through a 3.3 test environment and could not reproduce
DATE_IN_FUTURE rules firing. We will need a proper sample.

Since the check_for_shifted_date() eval works with the actual Date and
Received headers, I suspect the glue to result in that rule's misfiring.


-- 
char *t=\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread David B Funk

On Mon, 9 Jun 2014, Amir Caspi wrote:


On Jun 9, 2014, at 4:25 PM, John Hardin jhar...@impsec.org wrote:


On Mon, 9 Jun 2014, Philip Prindeville wrote:


http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol


If it's in an HTML anchor tag the URL itself isn't in the body text, only the 
display label will be.

Try a uri rule.


This URL is already in my AC_SPAMMY_URI template group, though I don't know 
if this particular one has been released or not (I never sent an update since the first 
batch a few months ago), and even if so the current version would not have caught it due 
to being a bit too restrictive.

Try this:

uri __AC_LONGSTRS_URI   /\/[0-9]{8}(?:\/[a-z0-9_~]{50,}){3}\b/

Score as desired (I assign 3 points to all AC_SPAMMY_URI templates, but the 
released ones score differently).

--- Amir


Just beware of FPs, I've seen some ugly URLs from things like airline
reservation confirmations. (spammers are getting better at stealing
features from legit messages to protect their garbage).

Also be aware that you cannot set the score for the rule __AC_LONGSTRS_URI
at all (as it's an indirect rule and thus scoreless), you'll either
have to rename it or use it in a meta rule.


--
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include std_disclaimer.h
Better is not better, 'standard' is better. B{


Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread Philip Prindeville

On Jun 9, 2014, at 4:25 PM, John Hardin jhar...@impsec.org wrote:

 On Mon, 9 Jun 2014, Philip Prindeville wrote:
 
 We’re getting a lot of spam that contains URL’s which look like (remove 
 the ):
 
 http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol
 
 BTW, I found that the last N characters of the above URL’s were always the 
 same, and tried to do a “body” rule based on those last N characters, but I 
 couldn’t get the rule to match.
 
 Still not sure why.  The entire a ... sequence is only 382 characters long.
 
 Any ideas?
 
 If it's in an HTML anchor tag the URL itself isn't in the body text, only 
 the display label will be.
 
 Try a uri rule.


Thanks, that did it.

-Philip



Re: Forged yahoo and mass mailers

2014-06-09 Thread Alex
Hi,

  is enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO,
  causing it to be marked as spam.

 Scores of 1.63 and 2.5 respectively, according to your sample. With a
 total score of 6.995, it is the latter one pushing it over the 5.0
 threshold, not the first one.

 Moreover, the responsible rule is NOT stock SA. The obvious L local
 prefix should be a clear hint. You defined it as from yahoo, but not
 DKIM valid.

 For amusement, search google for UNVERIFIED_YAHOO (and insist you really
 mean it literally with the underscore rather than two words).

 Yahoo uses DKIM and this wasn't signed. Funnily enough, that's a quote
 from a bug report back April 2007. Actually the OP closing its own
 report as not a bug.

This was a set of rules created by Mark back in 2011. Thanks for not
flaming me.

  Is there something I'm missing, or is there a better way to do this to
  avoid the FPs in the future?

 If by doing this you mean writing a safer variant of your local rule,
 you should have  (a) clearly stated it's a local rule, and  (b) pasted
 the complete current version of that local rule.

 By making us chase your local rules in archives, all you'll get is
 fingers pointing at your own, local rule.

I never intended to do that. I completely forgot this was a local rule.
I've disabled it for now, pending any words of wisdom on improving it from
those more knowledgeable than myself.

header __L_ML1         Precedence =~ m{\b(list|bulk)\b}i
header __L_ML2         exists:List-Id
header __L_ML3         exists:List-Post
header __L_ML4         exists:Mailing-List
header __L_HAS_SNDR    exists:Sender
meta   __L_VIA_ML      __L_ML1 || __L_ML2 || __L_ML3 || __L_ML4 || __L_HAS_SNDR
header __L_FROM_Y1     From:addr =~ m{[@.]yahoo\.com$}i
header __L_FROM_Y2     From:addr =~ m{\@yahoo\.com\.(ar|br|cn|hk|mx|my|ph|sg)$}i
header __L_FROM_Y3     From:addr =~ m{\@yahoo\.co\.(id|in|jp|nz|th|uk)$}i
header __L_FROM_Y4     From:addr =~ m{\@yahoo\.(ca|cn|de|dk|es|fr|gr|ie|it|pl|ru|se)$}i
meta   __L_FROM_YAHOO  __L_FROM_Y1 || __L_FROM_Y2 || __L_FROM_Y3 || __L_FROM_Y4
header __L_FROM_GMAIL  From:addr =~ m{\@gmail\.com$}i
meta     L_UNVERIFIED_YAHOO  !DKIM_VALID && !DKIM_VALID_AU && __L_FROM_YAHOO && !__L_VIA_ML
priority L_UNVERIFIED_YAHOO  500
score    L_UNVERIFIED_YAHOO  2.5
meta     L_UNVERIFIED_GMAIL  !DKIM_VALID && !DKIM_VALID_AU && __L_FROM_GMAIL && !__L_VIA_ML
priority L_UNVERIFIED_GMAIL  500
score    L_UNVERIFIED_GMAIL  2.5

Thanks,
Alex


Re: Can't keep up with spam from SolarVPS sites

2014-06-09 Thread Amir Caspi
On Jun 9, 2014, at 7:11 PM, David B Funk dbf...@engineering.uiowa.edu wrote:

 Just beware of FPs, I've seen some ugly URLs from things like airline
 reservation confirmations. (spammers are getting better at stealing
 features from legit messages to protect their garbage).

FWIW, I haven't had a single FP on that or any of my other AC rules... but, 
that's only been tested on ham and spam for myself and my limited user base.  
An FP could, in principle, happen.

 Also be aware that you cannot set the score for the rule __AC_LONGSTRS_URI
 at all (as it's an indirect rule and thus scoreless), you'll either
 have to rename it or use it in a meta rule.

Indeed, I use this as part of a meta for AC_SPAMMY_URIs, so if you're using it 
standalone, remove the underscores.

--- Amir



Re: Forged yahoo and mass mailers

2014-06-09 Thread Alex
Hi,

On Mon, Jun 9, 2014 at 11:27 AM, Kevin A. McGrail kmcgr...@pccc.com wrote:

  On 6/8/2014 10:49 PM, Alex wrote:

  I have a few messages that have been incorrectly tagged because the
 sender used their yahoo address as the sender, but used a mass mailer (
 contactbeacon.com) to send their newsletter for them. Apparently this is
 enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO, causing it
 to be marked as spam.

 Is there something I'm missing, or is there a better way to do this to
 avoid the FPs in the future?

 People with Yahoo! accounts (and AOL) and any other senders that have a
 DMARC policy of reject/quarantine need to use either A) a mailing list
 sender that has modified their process for DMARC or B) not use those
 accounts.

 See
 http://www.pcworld.com/article/2141120/yahoo-email-antispoofing-policy-breaks-mailing-lists.html


Great information, thanks so much guys. It looks like it would be better to
reject the p=reject DKIM at SMTP time, no?

Thanks,
Alex


auto-learn

2014-06-09 Thread Chris
Since having to wipe my bayes db I've thought about going back to having
'auto-learn' setup for awhile. It's been so long since I did this I have
a fairly dumb question. Do I need the two below lines to be set and if
so is this the correct setting? Anything here above a score of 5 is
considered spam.

# bayes_auto_learn_threshold_nonspam 0.1
# bayes_auto_learn_threshold_spam 12.0

Thanks
Chris

-- 
Chris
KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
21:38:18 up 7 days, 6:08, 1 user, load average: 0.53, 0.45, 0.34
Mandriva Linux 2010.2, kernel 2.6.33.7-desktop586-2mnb



Re: Forged yahoo and mass mailers

2014-06-09 Thread Karsten Bräckelmann
On Mon, 2014-06-09 at 21:40 -0400, Alex wrote:
  For amusement, search google for UNVERIFIED_YAHOO (and insist you really
  mean it literally with the underscore rather than two words).

 This was a set of rules created by Mark back in 2011. Thanks for not
 flaming me.

Heh. ;)

Sorry, but I kind of expect some due diligence, in particular by long
time and experienced community members. Coming across blatantly obvious
cases of local rules being complained about to misfire might make me
snappy.

Think about it this way: In order to help you, my first step is to find
out details about those rules (grep stock cf files) and their respective
score (your sample). You provided an exemplary, flawless sample. Why did
you not have a look at the rules' sources?


  By making us chase your local rules in archives, all you'll get is
  fingers pointing at your own, local rule.
 
 I never intended to do that. I completely forgot this was a local
 rule. I've disabled it for now, pending any words of wisdom on
 improving it from those more knowledgeable than myself.

The rule itself was not that bad. Actually, as Kevin and Anthony pointed
out, Yahoo even expressly states in their DMARC records you should never
have genuinely received those messages, nor accepted them. Yahoo
classifies it forged.

It is the mass mailer's and its client's fault. (Back to the cheap
part. Doing mass mailings but don't own your own domain? Accepting and
actually using free-mailer address as sender? Even worse, failing to get
the note about Yahoo DMARC policy in that business?)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: auto-learn

2014-06-09 Thread Karsten Bräckelmann
On Mon, 2014-06-09 at 21:40 -0500, Chris wrote:
 Since having to wipe my bayes db I've thought about going back to having
 'auto-learn' setup for awhile. It's been so long since I did this I have
 a fairly dumb question. Do I need the two below lines to be set and if
 so is this the correct setting? Anything here above a score of 5 is
 considered spam.
 
 # bayes_auto_learn_threshold_nonspam 0.1
 # bayes_auto_learn_threshold_spam 12.0

Answering the direct questions first: Yes, that is correct syntax. No,
you don't need them (commented out), they are default.

An auto-learning setup generally isn't a bad idea, and actually default.
Depending on your amount of messages, you might want to have a look at
the recent train-on-error option.

If (since) there was any need to wipe your old Bayes DB and start fresh,
I seriously recommend continued manual training. And in any case, always
(manually) training spam with low-ish Bayes probability. Likewise for
ham that doesn't already have a very low Bayes probability.

In non-high-volume environments, there's hardly any down-side on
training the extremes, too. Learning hand-confirmed non-extremes is
always worth it.
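
As a rough illustration of what the two thresholds mean, here is a minimal Python sketch using the default values quoted above. The function name and the exact boundary handling (strictly below, at or above) are assumptions for illustration, and the real plugin applies further checks that are left out here.

```python
# Defaults quoted in the thread above.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(score, nonspam=NONSPAM_THRESHOLD, spam=SPAM_THRESHOLD):
    """Classify a message score as 'ham', 'spam', or None (do not learn).

    Boundary handling is an assumption made for this sketch.
    """
    if score < nonspam:
        return "ham"
    if score >= spam:
        return "spam"
    return None


for s in (-1.2, 5.0, 14.3):
    print(s, autolearn_decision(s))
```

Note that a message scoring 5.0 would be tagged as spam with a required score of 5, but falls between the two thresholds and so would not be auto-learned either way. Those are the non-extremes that benefit from manual training.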


[1] 
http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Forged yahoo and mass mailers

2014-06-09 Thread Alex
Hi,

  This was a set of rules created by Mark back in 2011. Thanks for not
  flaming me.

 Heh. ;)

 Sorry, but I kind of expect some due diligence, in particular by long
 time and experienced community members. Coming across blatantly obvious
 cases of local rules being complained about to misfire might make me
 snappy.

 Think about it this way: In order to help you, my first step is to find
 out details about those rules (grep stock cf files) and their respective
 score (your sample). You provided an exemplary, flawless sample. Why did
 you not have a look at the rules' sources?

It really was a temporary lapse. I'm now managing so much, and thought for
sure it was an SA rule since I didn't immediately recognize it. Also, my
local rules all begin with LOC_, or immediately recognizable KAM_ or AXB_.

 The rule itself was not that bad. Actually, as Kevin and Anthony pointed
 out, Yahoo even expressly states in their DMARC records you should never
 have genuinely received those messages, nor accepted them. Yahoo
 classifies it forged.

 It is the mass mailer's and its client's fault. (Back to the cheap
 part. Doing mass mailings but don't own your own domain? Accepting and
 actually using free-mailer address as sender? Even worse, failing to get
 the note about Yahoo DMARC policy in that business?)

Great points. I've found the rule has hit a very large amount of ham, even
some that's been whitelisted. Investigating a bit further, it appears to
hit quite a few messages that did in fact pass through yahoo.com. I've
included one such example set of headers here:

http://pastebin.com/XiHpRbJb

However, it doesn't have the p=reject DKIM auth statement, so I don't yet
fully understand how it all works. It hit DKIM_SIGNED but not DKIM_VALID,
and in fact hit T_DKIM_INVALID.

Thanks,
Alex


Re: auto-learn

2014-06-09 Thread Chris
On Tue, 2014-06-10 at 05:13 +0200, Karsten Bräckelmann wrote:
 On Mon, 2014-06-09 at 21:40 -0500, Chris wrote:
  Since having to wipe my bayes db I've thought about going back to having
  'auto-learn' setup for awhile. It's been so long since I did this I have
  a fairly dumb question. Do I need the two below lines to be set and if
  so is this the correct setting? Anything here above a score of 5 is
  considered spam.
  
  # bayes_auto_learn_threshold_nonspam 0.1
  # bayes_auto_learn_threshold_spam 12.0
 
 Answering the direct questions first: Yes, that is correct syntax. No,
 you don't need them (commented out), they are default.
 
 An auto-learning setup generally isn't a bad idea, and actually default.
 Depending on your amount of messages, you might want to have a look at
 the recent train-on-error option.
 
 If (since) there was any need to wipe your old Bayes DB and start fresh,
 I seriously recommend continued manual training. And in any case, always
 (manually) training spam with low-ish Bayes probability. Likewise for
 ham that doesn't already have a very low Bayes probability.
 
 In non-high-volume environments, there's hardly any down-side on
 training the extremes, too. Learning hand-confirmed non-extremes is
 always worth it.
 
 
 [1] 
 http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html
 
Thanks very much Karsten for the quick reply.

-- 
Chris
KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
22:18:08 up 7 days, 6:48, 1 user, load average: 0.56, 0.49, 0.61
Mandriva Linux 2010.2, kernel 2.6.33.7-desktop586-2mnb



Re: DMARC policy check with AskDNS posible?

2014-06-09 Thread Franck Martin

On Jun 7, 2014, at 9:49 PM, Christian Laußat us...@spamassassin.shambhu.info 
wrote:

 Am 07.06.2014 19:55, schrieb Franck Martin:
 As DMARC provides a feedback mechanism to the sender, it is up to
 the sender to deal with these issues. You are just following their
 policy; you don't need to second-guess them. You can use
 some whitelists in openDMARC for some streams you really care about,
 like mailing lists. There are usually not too many.
 The default option of openDMARC is not to reject, to avoid rejecting
 all incoming mail if you forgot to set up opendkim or SPF.
 Once you are happy with the config, you ought to change that option.
 
 The problem is that the sender is not the postmaster, so when e.g. yahoo.com 
 changed its policy to p=reject, many senders were blocked without even 
 knowing why. There are many postmasters who think they have understood DMARC 
 and set a wrong policy. For human interaction the DMARC policy should be 
 p=none, and p=reject should only be used for automated mailing systems, e.g. 
 shopping systems and banks.

This is not correct. I think it is strange to claim that yahoo or aol, being 
co-creators of DMARC and having outstanding engineers in the profession, do not 
know what they are doing.

 
 So it's your decision whether to risk losing some e-mail, but for me it 
 is just another indicator for SpamAssassin to rate the mail.

Because of the monitoring mode, when you move to p=reject, with all the 
aggregate reports, you know exactly how much mail you will lose. As you take 
control of your email streams, there comes a sweet spot where fixing exact 
domain spoofing matters more than losing some emails. Your mileage may 
vary.

 
 If you let OpenDMARC block on policy failures, why don't you let OpenDKIM 
 block on DKIM failures and SPF-milter on SPF failures? Blocking on only one 
 criterion leads to many false positives. That's the power of SpamAssassin: 
 combining many rating points and then deciding whether it's spam or not.
 
DKIM and SPF have no mechanism for reporting to the sender how many emails 
were blocked/rejected. DKIM does not have a policy method; only SPF does. So as 
a sender with SPF -all you have no idea how many emails are blocked, and very 
few are willing to take that risk. With DMARC, you know exactly which emails 
are getting blocked/rejected.
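
For readers who have not looked at a DMARC record, here is a minimal Python sketch of pulling the policy and the aggregate-report address out of one. The record string and the example.com address are made up for illustration, and the parser is deliberately simplified: it does no DNS lookup and skips most of the validation a real implementation needs.

```python
def parse_dmarc_record(txt):
    """Split a DMARC TXT record into a dict of tag -> value.

    Returns None if the text does not look like a DMARC1 record.
    """
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        name, value = part.split("=", 1)
        tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        return None
    return tags


# Hypothetical record, as it might be published at _dmarc.example.com
record = "v=DMARC1; p=reject; pct=100; rua=mailto:dmarc-reports@example.com"
tags = parse_dmarc_record(record)

print(tags["p"])    # reject
print(tags["rua"])  # mailto:dmarc-reports@example.com
```

The p tag is the policy the domain owner asks receivers to apply, and rua is where the aggregate reports mentioned above are sent.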


