Re: AWL functionality messed up?

2009-05-27 Thread Matt Kettler
Linda Walsh wrote:
 Bowie Bailey wrote:
 Linda Walsh wrote:

 I got a really poorly scored piece of spam -- one thing that stood out
 as weird was that the report claimed the sender was in my AWL.

 Any sender who has sent mail to you previously will be in your AWL. 
 This is probably the most misunderstood component of SA.  Read the wiki.

 http://wiki.apache.org/spamassassin/AutoWhitelist


 
 At face value, this seems very counterproductive.
It's obvious you're taking it at face value and you've not read the
URL above.

You're seeing whitelist in the name, and believing it. Sorry, the name
is misleading, but the AWL is not a whitelist.

 If I get spam from 1000 senders, they all end up in my
 AWL???

 WTF?
You're leaping to wildly incorrect conclusions, mostly because you're
assuming the AWL is a whitelist. It's not.

*READ* the URL above. No, really READ IT. You don't understand the AWL yet.

 AWL should only be added to by emails judged to be 'ham' via
 the feedback mechanisms -- spammers shouldn't get bonuses for
 being repeat senders...
Who says they get bonuses just for being a repeat sender? They get
bonuses or penalties, depending on their history.

The AWL isn't a whitelist, Linda. It's an averager. It can whitelist or
blacklist messages. If they send a message that scores less than their
previous average, they get a positive AWL score (blacklisting). If they
send one that's higher they get a negative score (whitelisting).

HOWEVER, in the AWL, a simple look at the positive or negative sign on
the score doesn't really tell you much.

Take this example: Pre-AWL score +12, AWL -3, Final score +9. What
did the AWL think of this sender based on history? +6, a spammer.

If the same sender instead sent a message with a pre-AWL score of +4,
the AWL would hit at +1.0, resulting in a final score of +5.0.

End result: same sender, different messages, different signs on the AWL,
but both are still tagged as spam. And in one example, a false negative
was avoided based on their history.
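
The arithmetic above can be sketched in a few lines. This is a simplified model, assuming the AWL's default behaviour of moving the score halfway toward the sender's stored mean (factor 0.5); the real implementation tracks a running total and count per sender/IP pair, but the adjustment works out the same way:

```python
# Simplified model of the AWL adjustment: the reported AWL score is
# the difference between the sender's historical mean and the current
# pre-AWL score, scaled by a factor (0.5 by default).
def awl_adjustment(historical_mean, pre_awl_score, factor=0.5):
    # Negative result = score pulled down (whitelist effect);
    # positive result = score pushed up (blacklist effect).
    return (historical_mean - pre_awl_score) * factor

mean = 6.0  # this sender's history averages +6: clearly a spammer

# A spam scoring +12 gets pulled DOWN toward the mean...
print(awl_adjustment(mean, 12.0))  # -3.0, final score +9

# ...while a spam scoring +4 gets pushed UP toward it.
print(awl_adjustment(mean, 4.0))   # +1.0, final score +5
```

Either way the final score stays above the spam threshold; the sign of the AWL hit alone tells you nothing.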


 How do I delete spammer addresses from my 'auto-white-list'?
spamassassin --remove-addr-from-whitelist=...@example.com


 (That's just insane..whitelisting spammers?!?!)
No, it's insane to have the AWL named AWL, because it's not a white list.

It's really A History-Based Score Averaging System With Automatic
Whitelisting and Blacklisting effects. However, AHBSASWAWB is an awfully
long name.

I *REALLY* suggest you read up on how the AWL works, for real, before
jumping to conclusions about what it is, and what it does. It really
doesn't work the way you think.








Re: my AWL messed up?

2009-05-27 Thread Matt Kettler
Linda Walsh wrote:
 To be clear about what is being white listed, would it
 hurt if the 'brief report for the AWL', instead of :
 -1.3 AWL AWL: From: address is in the auto white-list

 it had
 -1.3 AWL AWL: 'From: 518501.com' addr is in auto white-list

 So I can see what domain it is flagging with a 'white' value?

 I don't know of any emails from '518501.com' that wouldn't have
 been classified spam, so none should have a 'negative value'.


What was the final message score in this example? Looking at the AWL
score alone is meaningless, and doesn't show what the AWL thinks the
historical average is.

If the final score was over 6.3, the AWL still thought the sender was
a spammer. It's just splitting the averages.


Re: Problem with check_invalid_ip()

2009-05-29 Thread Matt Kettler
Eric Rodriguez wrote:
 Hi,

 I'm having trouble with the check invalid_ip subroutine in the
 RelayEval.pm.
 See
 http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/RelayEval.pm?view=log&r1=451385&pathrev=451385

 After a couple of tests, it seems that 193.X.X.X and 194.X.X.X IPs are
 not valid with respect to the regexp.
 Is this a bug? Or am I wrong about the test?

 I used http://www.fileformat.info/tool/regex.htm with
 RegExp:   
 (?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+
 Tests:
 127.0.0.1
 192.168.1.1
 87.248.121.75
 193.1.1.1
 194.1.1.1


 Could someone explain to me which IPs are valid according to this test?
 Thanks

 Eric Rodriguez
Using the above tool I get results telling me that 193.1.1.1 and
194.1.1.1 do NOT match, and therefore are valid IPs.

Test  Target String  matches()  replaceFirst()  replaceAll()  lookingAt()  find()  group(0)
1     193.1.1.1      *No*       193.1.1.1       193.1.1.1     No           No
2     194.1.1.1      *No*       194.1.1.1       194.1.1.1     No           No



In fact, NONE of your test strings match the regex. But 127.1.1.1,
correctly, does.
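
The same check can be reproduced outside the web tool. As a sketch, Python's regex engine handles this particular pattern the same way Perl's does; a full match means the address is treated as *illegal*, not valid:

```python
import re

# The pattern from check_for_illegal_ip, anchored as in RelayEval.pm.
# A full match means the IP is considered illegal/invalid.
illegal_ip = re.compile(
    r"(?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d"
    r"|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+"
)

for ip in ("127.0.0.1", "192.168.1.1", "87.248.121.75",
           "193.1.1.1", "194.1.1.1", "127.1.1.1"):
    verdict = "illegal" if illegal_ip.fullmatch(ip) else "valid"
    print(ip, verdict)
# All five of the original test strings come out "valid";
# only 127.1.1.1 matches the pattern and is flagged "illegal".
```

Note the negative lookahead `(?!127.0.0.)` exempts the 127.0.0.x loopback range, which is why 127.0.0.1 passes while 127.1.1.1 does not.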




Re: Problem with check_invalid_ip()

2009-05-29 Thread Matt Kettler
Eric Rodriguez wrote:
 Hi,

 I removed the negation (!~), the begin ^ and end $ characters from the
 original source:
 sub check_for_illegal_ip {
   my ($self, $pms) = @_;

   foreach my $rcvd ( @{$pms->{relays_untrusted}} ) {
     # (note this might miss some hits if the Received.pm skips any invalid IPs)
     foreach my $check ( $rcvd->{ip}, $rcvd->{by} ) {
       return 1 if ($check =~ /^
         (?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+
         $/x);
     }
   }
   return 0;
 }
   

 Here are my results:
 Test  Target String   matches()  replaceFirst()  replaceAll()  lookingAt()  find()  group(0)
 1     127.0.0.1       No         12              12            No           Yes     7.0.0.1
 2     192.168.1.1     No         19              19            No           Yes     2.168.1.1
 3     87.248.121.75   No         8               8             No           Yes     7.248.121.75
 4     193.1.1.1       No         193.1.1.1       193.1.1.1     No           No
 5     194.1.1.1       No         194.1.1.1       194.1.1.1     No           No



 If I understand correctly, the first 3 tests are valid IPs, but not
 193.1.1.1 and 194.1.1.1??

 Eric Rodriguez

No, none of the 5 has 'Yes' in the matches() column, so they're all valid
(i.e., none of them matches the regex).

The other columns are irrelevant to the application here. Please ignore
them unless you fully understand them.







Re: tests= SIZE_LIMIT_EXCEEDED ??

2009-06-08 Thread Matt Kettler
Stefan-Michael Guenther wrote:
 Hi,

 I just had a closer look at the header of an email which should have
 been recognized by spamassassin as spam.

 What I found was this:

 X-SpamScore:   0
 tests= SIZE_LIMIT_EXCEEDED

 I have checked /usr/share/spamassassin/ for a rule which might contain
 a size limit, but didn't find any.

 A search with Google didn't help either.

 So, any suggestions from the list members where I can define the size
 that has been exceeded?

 Thanks,

 Stefan


Interesting, are you just using spamc/spamd, or a different integration
tool?


In general it sounds like something decided not to feed the message to
the main SpamAssassin instance at all. Spamc can do this, but I didn't
know it added a test when doing so.

Also, X-SpamScore is not a default header, and not one that SA could add
itself (SA adds headers beginning with X-Spam-, so the closest you'd get
is X-Spam-Score), so I suspect this was done by your integration tools.




Re: Unsubscribe

2009-06-12 Thread Matt Kettler
Mike Yrabedra wrote:
 unsubscribe
   
If you look at the message headers, there's a header explaining where to
send unsubscribe messages to (this is the RFC standard header for doing
this, so look for it in other mailing lists):

List-Unsubscribe: mailto:users-unsubscr...@spamassassin.apache.org







Re: Unsubscribe

2009-06-13 Thread Matt Kettler
Michael Scheidell wrote:


 Since we saw two of them come in pretty much back to back, I suspect a
 joe job of some type. Those people might not have subscribed.
That would be a bit tricky to just be a joe job. This list is confirmed
opt-in. i.e.: if you subscribe, an automated bot from ezmlm sends you a
message that you need to reply to to get subscribed. Well, actually all
you have to do is send a second message to a different address that
contains randomly generated text as a magic cookie. But still, you need
to know that randomly generated address.

Of course, there's always the possibility someone guessed the random text
in the reply address.. but, good luck..

They start off a bit like this (note: I've munged the email address, and
changed the values of the magic text and serial number, but I have not
changed the length. I substituted letters for letters, and numbers for
numbers. Otherwise, this is the start of a real confirm message.)

-

Hi! This is the ezmlm program. I'm managing the
users@spamassassin.apache.org mailing list.

To confirm that you would like

   exam...@example.com

added to the users mailing list, please send
a short reply to this address:

   
users-sc.1244818352.jacibredcfjnkiobdtef-example=example@spamassassin.apache.org

Usually, this happens when you just hit the reply button.
...






Re: Custom Rule Sets

2009-06-21 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 Good morning,

 Looking at the docs I see a 'don't add your custom rules here' warning
 in reference to the default /usr/share/spamassassin dir. Instead it
 lists a couple of options including local.cf

 Is it possible to ask local.cf to include external files/dir for custom
 rules at all? 
Yes, there is an include directive (see the Mail::SpamAssassin::Conf
docs) but by default SA will load *ALL* .cf files from your site rules
directory (usually /etc/mail/spamassassin), so includes at the local.cf
level are a bit silly. Just put extra .cf files in the same directory
and SA will load them.

Generally speaking, the include directive is only used at the user_prefs
level, where a single file is parsed by default, not a whole directory.

See also:
http://wiki.apache.org/spamassassin/WritingRules
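
For example, a custom rule can live in its own file alongside local.cf. The file name and rule below are made-up placeholders; the point is just that any *.cf in the site rules directory is picked up automatically:

```
# /etc/mail/spamassassin/my_rules.cf -- hypothetical example file
body     MY_BULK_OFFER   /unsolicited bulk offer/i
describe MY_BULK_OFFER   Example custom body rule
score    MY_BULK_OFFER   0.5
```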



Re: Custom Rule Sets

2009-06-22 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 On Mon, 2009-06-22 at 00:26 -0400, Matt Kettler wrote:
   
 rich...@buzzhost.co.uk wrote:
 
 Good morning,

 Looking at the docs I see a 'don't add your custom rules here' warning
 in reference to the default /usr/share/spamassassin dir. Instead it
 lists a couple of options including local.cf

 Is it possible to ask local.cf to include external files/dir for custom
 rules at all? 
   
 Yes, there is an include directive (see the Mail::SpamAssassin::Conf
 docs) but by default SA will load *ALL* .cf files from your site rules
 directory (usually /etc/mail/spamassassin), so includes at the local.cf
 level are a bit silly.
 

 I agree - but the docs seem to imply that you should not put them in
 here - hence my confusion.

   

Where do they imply you should not create additional .cf files?




Re: Custom Rule Sets

2009-06-22 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 On Mon, 2009-06-22 at 07:30 -0400, Matt Kettler wrote:
   
 rich...@buzzhost.co.uk wrote:
 
 On Mon, 2009-06-22 at 00:26 -0400, Matt Kettler wrote:
   
   
 rich...@buzzhost.co.uk wrote:
 
 
 Good morning,

 Looking at the docs I see a 'don't add your custom rules here' warning
 in reference to the default /usr/share/spamassassin dir. Instead it
 lists a couple of options including local.cf

 Is it possible to ask local.cf to include external files/dir for custom
 rules at all? 
   
   
 Yes, there is an include directive (see the Mail::SpamAssassin::Conf
 docs) but by default SA will load *ALL* .cf files from your site rules
 directory (usually /etc/mail/spamassassin), so includes at the local.cf
 level are a bit silly.
 
 
 I agree - but the docs seem to imply that you should not put them in
 here - hence my confusion.

   
   
 Where do they imply you should not create additional .cf files?


 
 It does not. I've already covered that and thanked a poster earlier for
 guiding me in my error. Did you not read the follow-up I posted?


   
About 20 seconds after I replied..

Sorry, just waking up for the AM here... Didn't think to read the rest
of the thread.



Re: Unable to update SARE

2009-06-22 Thread Matt Kettler
Frank Bures wrote:
 Since yesterday, when running

 sa-update --channelfile /etc/mail/spamassassin/sare-sa-update-channels.txt
 --gpgkey 856AA88A

 I get

 Use of uninitialized value in concatenation (.) or string at
 /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/Scalar/Util.pm line 30.

 An example line from sare-sa-update-channels.txt:
 70_sare_adult.cf.sare.sa-update.dostech.net


 Any ideas will be greatly appreciated.
   

Why are you trying to update SARE?

You might want to read the front page of the website:

http://www.rulesemporium.com/



Re: Use of uninitialized value $dir in scalar chomp at /usr/local/bin/spamd line 2118, GEN103 line 2.

2009-06-22 Thread Matt Kettler
alexus wrote:
 On Thu, Apr 23, 2009 at 4:08 PM, alexusale...@gmail.com wrote:
   
 On Wed, Apr 8, 2009 at 12:50 AM, Matt Kettler mkettler...@verizon.net 
 wrote:
 
 alexus wrote:
   
 I keep getting this line in my logs every time spamd is called:

 Apr  8 03:55:15 mx1 spamd[36109]: Use of uninitialized value $dir in
 scalar chomp at /usr/local/bin/spamd line 2118, GEN103 line 2.

 I don't suppose this is normal.

 
 Are you using the -v parameter when you start spamd, but are passing a
 username that's not a vpopmail user with working vuserinfo?


 Code:
 -
  if ( $opt{'vpopmail'} ) {
my $vpopdir = $dir;
$dir = `$vpopdir/bin/vuserinfo -d \Q$username\E`;
if ($? != 0) {
  #
  # If vuserinfo failed $username could be an alias
  #
  $dir = `$vpopdir/bin/valias \Q$username\E`;
  if ($? == 0 && $dir !~ /.+ - /) {
$dir =~ s,.+ - (/.+)/Maildir/,$1,;
  } else {
undef($dir);
  }
}
chomp($dir);
  }
 --




   
 i even tried with vpopmail user instead of spamd user, I still get this 
 warning

 --
 http://alexus.org/

 

 sorry for getting back to an older post, but I never got around to
 fixing this issue, and I think it should be fixed...
 Can someone suggest how I can resolve this issue?

 let me recap

 every time an email comes in, I get following line in my syslog

 spamd[30649]: Use of uninitialized value $dir in scalar chomp at
 /usr/local/bin/spamd line 2118, GEN990 line 2.

 that's how I run spamd

 root  1736  0.0  0.5 70044 40568  ??  SsJ  23May09   3:53.05
 /usr/local/bin/spamd --allow-tell --daemonize --vpopmail
 --username=spamd --socketpath=/tmp/spamd.sock --pidfile
 /usr/local/var/run/spamd.pid (perl)

   
Ok, so you're running with vpopmail, and spamd is running as the spamd
user.

So, what virtual user are you passing to spamc's -u parameter?

What happens when you run vuserinfo and pass the above username to it?




Re: 552 spam score (11.3) exceeded threshold

2009-06-22 Thread Matt Kettler
John Hardin wrote:
 On Mon, 22 Jun 2009, Paweł Tęcza wrote:

 Yesterday I was trying to send here a warning of the new www.shopXX.net
 spam flood. It was a short letter with a few URLs to pastebin.com.
 Unfortunately my message hasn't arrived at the mailing list.

 What's up? Do I really look like a spammer? ;)

 It's a bad idea to pass SA list email through SA...

All mail on the list passes through SA anyway, albeit with a high
required_score (10.0). The Apache Software Foundation (ASF) runs it on
their mailservers, and this list is hosted by them. Check the headers.

It's a little unfortunate in that it makes posting spam samples a pain,
but we're not the only project using these servers.



Re: gpg signed spam email ???

2009-06-27 Thread Matt Kettler
RobertH wrote:
 i was reading at

 http://www.karan.org/blog/

 specifically

 http://www.karan.org/blog/index.php/2009/06/15/gpg-signed-spam

 that he recv'd a gpg signed spam email

 ive never heard of that before yet i havent thought much about it or studied
 it...

 Q: is this unheard of, or common?

 near as i can quickly investigate, it doesnt appear to be common as per
 papa google [sic].

 comments? feedback?

 just trying to get up on the curve now.

Well, let's put it this way:

A long, long time ago, SA had a rule in the default set, giving negative
score to PGP and GPG signed messages. Quickly, spammers started adding
enough fragments of a signature to match the rule. This was very
obvious, as the rule only matched the begin clause, and the spams had a
begin clause dropped at the bottom of the message, with no end clause.

The rule could have been modified to validate the signature, but of
course, anyone can GPG sign a message and have it be valid, and the
spammers probably would have done so if the rule changed. Therefore, the
rule was dropped from the set entirely.

GPG signatures only validate that the sender has the private key that
matches the public one signing the email. Like SPF, and many other
authentication-only technologies, this doesn't tell you anything about
the sender. Even perfect authentication at best only provides
confirmation of who the sender is, and most of these technologies only
prove a sender is the proper holder of some abstract identity like
a key or domain.

Authentication needs to be paired with recognition to be meaningful.  If
a sender proves who they are, will you immediately accept the email
without further question? What if they just proved they were Alan Ralsky?

http://www.spamhaus.org/rokso/listing.lasso?-op=cnspammer=Alan%20Ralsky


Moral of the story: don't assign negative scores to systems that only
provide authentication, unless you're somehow pairing it with proof the
sender is someone you actually trust (or at least is trusted by a
service you trust, etc).

Ever notice that the negative score of SPF_PASS is insignificantly
small? There's a reason for that: spammers can pass SPF too, so by
itself, it's meaningless. But paired with your explicit trust of a
domain or sender, it provides forgery-resistant whitelisting
(whitelist_from_spf).
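
As a sketch of that pairing in local.cf (the domain below is a placeholder for a sender you actually trust):

```
# Whitelist mail claiming to be from example.com only when it also
# passes that domain's SPF check -- forgery-resistant whitelisting.
whitelist_from_spf  *@example.com
```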








 


Re: gpg signed spam email ???

2009-06-28 Thread Matt Kettler
True, it likely is. But it would also be trivial for the spammer to
generate a valid one.

Given what we've seen with the image spams in the past (custom generated
image for *every* email with random font, size, color, offset, and
randomized dots added on), computational power is hardly an obstacle.

As before, you might be able to write a plugin to check the signature
and assign positive points if it is invalid, but I don't know if that
would work long enough to be worthwhile.

Justin Mason wrote:
 there's a very good chance the GPG signature in this case was fake --
 ie. a cut-and-paste job.

 --j.

 On Sat, Jun 27, 2009 at 19:05, Matt Kettlermkettler...@verizon.net wrote:
   
 RobertH wrote:
 
 i was reading at

 http://www.karan.org/blog/

 specifically

 http://www.karan.org/blog/index.php/2009/06/15/gpg-signed-spam

 that he recv'd a gpg signed spam email

 ive never heard of that before yet i havent thought much about it or studied
 it...

 Q: is this unheard of, or common?

 near as i can quickly investigate, it doesnt appear to be common as per
 papa google [sic].

 comments? feedback?

 just trying to get up on the curve now.
   
 Well, let's put it this way:

 A long, long time ago, SA had a rule in the default set, giving negative
 score to PGP and GPG signed messages. Quickly, spammers started adding
 enough fragments of a signature to match the rule. This was very
 obvious, as the rule only matched the begin clause, and the spams had a
 begin clause dropped at the bottom of the message, with no end clause.

 The rule could have been modified to validate the signature, but of
 course, anyone can GPG sign a message and have it be valid, and the
 spammers probably would have done so if the rule changed. Therefore, the
 rule was dropped from the set entirely.

 GPG signatures only validate that the sender has the private key that
 matches the public one signing the email. Like SPF, and many other
 authentication only technologies, this doesn't tell you anything about
 the sender. Even perfect authentication at best only provides
 confirmation of who the sender is, and most of these technologies only
 prove a sender is the proper owner holder of some abstract identity like
 a key or domain.

 Authentication needs to be paired with recognition to be meaningful.  If
 a sender proves who they are, will you immediately accept the email
 without further question? What if they just proved they were Alan Ralsky?

 http://www.spamhaus.org/rokso/listing.lasso?-op=cnspammer=Alan%20Ralsky


 Moral of the story: don't assign negative scores to systems that only
 provide authentication, unless you're somehow pairing it with proof the
 sender is someone you actually trust (or at least is trusted by a
 service you trust, etc).

 Ever notice that the negative score of SPF_PASS is insignificantly
 small, there's a reason for that.. Spammers can pass SPF too, so by
 itself, it's meaningless. But paired with your explicit trust of a
 domain or sender, it provides forgery resistant whitelisting
 (whitelist_from_spf).











 


   



Re: RulesDuJour

2009-06-30 Thread Matt Kettler
Anshul Chauhan wrote:
 Do we have to copy KAM.cf to /usr/share/spamassassin for its
 integration with SpamAssassin, or is something else to be done?

 I'm using spamassassin-3.2.5-1.el4.rf on Centos4.7

Any add-on rules should be placed in the same directory as your local.cf
(ie: /etc/mail/spamassassin/ in most cases). SA reads *.cf from this
directory, not just local.cf.

Adding files to /usr/share/spamassassin, or making changes to files
present there, is not a good idea. When SpamAssassin gets upgraded, this
whole directory will be nuked by the installer.





Re: perms problems galore

2009-07-03 Thread Matt Kettler
Gene Heskett wrote:
 Greetings all;

 I _thought_ I had sa-update running ok, but it seemed that the effectiveness 
 was stagnant, so I found the cron entry that was running sa-update and 
 discovered a syntax error there, which, when I fixed it, disclosed that I had 
 all sorts of perms problems that I don't seem to be able to fix readily.

 sa-update is being run as the user saupdate, which is a member of the group 
 mail.  I have made the whole /var/lib/spamassassin/keys tree
 saupdate:mail, with very limited rights as in:
 drw--- 2 saupdate mail 4096 2008-12-19 16:05 keys

 But sa-update appears not to have perms to access or create gpg keys there.
 --
 [r...@coyote init.d]# su saupdate -c /usr/bin/sa-update --gpghomedir 
 /var/lib/spamassassin/keys
 gpg: failed to create temporary file 
 `/var/lib/spamassassin/keys/.#lk0xb9bfb8a8.coyote.coyote.den.8955': 
 Permission 
 denied
 --
 What do I need to open that up to?

 Thanks.
   
In order to create files in a directory, you need both the w (write) and
x (search/execute) permissions on it; your keys directory above lacks x
even for its owner.

That said, why give the saupdate user the ability to add keys at all?
Import them as root and only give the saupdate user read access.

 



Re: perms problems galore

2009-07-03 Thread Matt Kettler
Gene Heskett wrote:

 Ok, I'll fix that, thanks.

   
 That said, why give the saupdate user the ability to add keys at all?
 Import them as root and only give the saupdate user read access.
 

 Basically, since I run myself as root, I was trying to reduce the exposure.
 All the rest of the routine mail handling here is by unprivileged users.
 And it is all behind a dd-wrt firewall with NAT.
 it is all behind a dd-wrt firewall with NAT.

   
True, but installing keys isn't something that should be routine. This
should only be possible manually. i.e.: sa-update does not need to
create or write to the key file to perform an update.

If you're concerned about exposure, it's really best that your automatic
saupdate user not have rights over the key file; it doesn't need them.




Re: Annoying auto_whitelist

2009-07-04 Thread Matt Kettler
Michelle Konzack wrote:
 Hello,

 while I currently get several 1000 shop/meds/pill/gen spams a day and
 some are going through my filters, I have to move them to my spamfolder
 manually and feed them to sa-learn --spam, but this does not work...

 ...because the Spammer From: is in the auto_whitelist.

Wait a second. The AWL has nothing to do with bayes or sa-learn.

The only reason SA won't learn a message as spam would be if it has
already been learned as spam, as noted in the bayes_seen database (or
corresponding SQL table).

 For me, this seems to be a bug, because sa-learn has to remove the From:
 from the auto_whitelist and then RESCAN this crap.
Um, the AWL has nothing to do with sa-learn --spam, and this action will
neither consult, nor modify the AWL.

What makes you think the AWL is inhibiting learning?

The AWL is actually going to contain *EVERY* sender that ever sent you
email (because it is an averager, not a whitelist), so if it would
inhibit learning, you'd never be able to learn anything.



Re: Annoying auto_whitelist

2009-07-04 Thread Matt Kettler
Michelle Konzack wrote:
 Hello,

 while I currently get several 1000 shop/meds/pill/gen spams a day and
 some are going through my filters, I have to move them to my spamfolder
 manually and feed them to sa-learn --spam, but this does not work...

 ...because the Spammer From: is in the auto_whitelist.

 For me, this seems to be a bug, because sa-learn has to remove the From:
 from the auto_whitelist and then RESCAN this crap.

Is the AWL actually causing false negatives?

Please be aware the AWL is NOT a whitelist or a blacklist, and the scores
don't really quite work the way they look. The AWL is essentially an
averager, and as such, it's sometimes going to assign negative scores to
spam.

This does *NOT* necessarily mean the AWL has whitelisted the sender,
unless it pushes it below the required_score. It just means that this
spam scored higher than the last one. i.e.: if a spam scoring +20 gets a
-5 AWL, the AWL still believes the sender is a spammer with a +10
average. If that same sender had instead sent a message scoring 0, the
AWL would have given them a +5.

Please be sure to read:

http://wiki.apache.org/spamassassin/AwlWrongWay

Before you make too many judgments about what the AWL is doing. Looking
at the score it assigns alone does not tell you anything about what the
AWL is doing.






Re: Current Rules Repository

2009-07-08 Thread Matt Kettler
Patrick Sherrill - Coconet wrote:
 With SARE et al not being updated, where is the best repository for
 current rules being maintained?
   
The default sa-update channel.





Re: Never ending spam flood www.viaXX.net?

2009-07-10 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 On Fri, 2009-07-10 at 21:26 +1200, Jason Haar wrote:
   
 On 07/10/2009 09:01 PM, Paweł Tęcza wrote:
 
 Please see my initial post on Pastebin:

 http://pastebin.com/f6a83e9fb
   
   
 If it's true that all those domains resolve to just a handful of IP
 addresses, then why aren't they listed in - oh wait - SURBLs don't cover
 IPs just the DNS names - argh!

 Is there a way to do SURBL lookups of the IP instead of the FQDN?

 
 Is there not some kind of 'intent' plugin for SA?

 Barracuda (which steal everything else) have an intent scanner that
 looks at links in mails and resolves the name to an IP *AND* the AUTH
 NS, then looks up the IPs found.
   
SA has always avoided resolving forward lookups of potentially spammer
controlled domains to IPs. This is extremely foolish to do, as it opens
you up to a variety of attacks against your DNS resolver. (resolver
cache poisoning, DoS, etc)

 I can't believe they wrote it themselves - seriously I can't! What
 plug-in is it?

   
It's no plugin I know of, but it's a feature we intentionally left out
of SA for security reasons. So given that it's a really bad idea I'd
guess barracuda did implement it themselves.


Re: Never ending spam flood www.viaXX.net?

2009-07-10 Thread Matt Kettler
Steve Freegard wrote:
 Matt Kettler wrote:
   
 rich...@buzzhost.co.uk wrote:
 
 On Fri, 2009-07-10 at 21:26 +1200, Jason Haar wrote:
   
   
 On 07/10/2009 09:01 PM, Paweł Tęcza wrote:
 
 
 Please see my initial post on Pastebin:

 http://pastebin.com/f6a83e9fb
   
   
   
 If it's true that all those domains resolve to just a handful of IP
 addresses, then why aren't they listed in - oh wait - SURBLs don't cover
 IPs just the DNS names - argh!

 Is there a way to do SURBL lookups of the IP instead of the FQDN?

 
 
 Is there not some kind of 'intent' plugin for SA?

 Barracuda (which steal everything else) have an intent scanner that
 looks at links in mails and resolves the name to IP *AND* the AUTH NS.
 Then looking the IP's found up.
   
   
 SA has always avoided resolving forward lookups of potentially spammer
 controlled domains to IPs. This is extremely foolish to do, as it opens
 you up to a variety of attacks against your DNS resolver. (resolver
 cache poisoning, DoS, etc)

 
 I can't believe they wrote it themselves - seriously I can't! What plug
 in is it?

   
   
 It's no plugin I know of, but it's a feature we intentionally left out
 of SA for security reasons. So given that it's a really bad idea I'd
 guess barracuda did implement it themselves.

 

 Are you forgetting URIBL_SBL??   That requires the A or NS records of
 the URI to function.
   

We do NS only. Not A.



Re: Annoying auto_whitelist

2009-07-10 Thread Matt Kettler
RW wrote:
 On Fri, 10 Jul 2009 12:33:51 +0200
 Matus UHLAR - fantomas uh...@fantomas.sk wrote:

   
 On Sat, 04 Jul 2009 08:56:35 -0400
 Matt Kettler mkettler...@verizon.net wrote:
 
 Please be aware the AWL is NOT whitelist, or a blacklist, and
 the scores don't really quite work the way they look. The AWL is
 essentially an averager, and as such, it's sometimes going to
 assign negative scores to spam sometimes.
   
 And it works from its own version of the score that ignores
 whitelisting and bayes scores. So if learning a spam leads to the
 next spam from the same address getting a higher bayes score,
 that benefit isn't washed-out by AWL. 
 
 On 04.07.09 22:42, RW wrote:
 
 I take that back, I thought the BAYES_XX rules were ignored by
 AWL, but they aren't.

 Personally I think BAYES should be ignored by AWL, emails from the
 same from address and ip address will have a lot of tokens in
 common.  They should train quickly, and there shouldn't be any need
 to damp-out that learning.
   
 I don't think so. Teaching BAYES is a good way to hint to the AWL which
 way it should push scores. By ignoring Bayes, you could move much spam
 the ham way, since much spam isn't caught by scores other than
 BAYES, and vice versa.

 
 Right, but that's only a benefit if the BAYES score drops - remember
 it's an averaging system. Personally I only have a single spam in my
 spam corpus that has an AWL hit and doesn't hit BAYES_99, and that hits
 BAYES_95. Sending multiple spams from the same from address and IP
 address is a gift to Bayesian filters.

 The much more common scenario is that the first spam hits BAYES_50 and
 subsequent BAYES_99 hits are countered by a negative  AWL score.
   
Technically, this only counters half the score. It also gets paid back
later. It raises the stored average that will apply to subsequent messages.

I'd also argue it's a rather rare case. Most of my spam hits BAYES_99
the first time around, and most has varying sender addresses and IPs.
The odds of one having an increasing score and the same sender
address/IP seem extraordinarily low to me.

Besides, the real problem there isn't the AWL, but the fact that the
first message scored low.

Are you really seeing cases where this is causing false negatives, or
are you just pontificating about what's possible?




Re: deactivate all checks except specific tests

2009-07-10 Thread Matt Kettler
sebast...@debianfan.de wrote:
 Hello,

 i have set up a virtual server for experiments.

 I want to disable all the spamassassin tests - except one specific RBL -
 in this case, the Manitu RBL.

 Is there a parameter for disabling all the tests?
There is no option to disable all rules.

However, you could use the -C parameter to either spamd or spamassassin
and point SA to a directory that does not contain any rulefiles, except
a single .cf containing the rule you want to run. This would effectively
remove the stock ruleset from the parse.
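
A minimal sketch of such a directory: a single .cf with one DNSBL rule. The rule and set names below are made up, and the Manitu zone shown is an assumption based on the question; check_rbl is SA's standard DNSBL eval test, but verify the zone you actually want to query:

```
# Hypothetical /etc/sa-minimal/manitu.cf, run with:
#   spamassassin -C /etc/sa-minimal -t < message.eml
header   MY_NIX_SPAM  eval:check_rbl('mynixspam', 'ix.dnsbl.manitu.net.')
describe MY_NIX_SPAM  Relay listed in ix.dnsbl.manitu.net
score    MY_NIX_SPAM  2.0
```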






Re: Opt In Spam

2009-07-16 Thread Matt Kettler
Have you reported the abuse to habeas@abuse.net, as Neil
Schwartzman from Return Path (operators of Habeas) requested last time?

Just posting to the sa-users list isn't really going to do very much. If
there are pervasive FP problems, it will show up in the mass-checks and
we'll drop the score.



twofers wrote:
 And yet another SPAM from these opt-in guys.
  
 I believe this group is nothing but covert spammers abusing a
 privilege afforded them.
  
 I receive these spams at two separate email addresses, both of which I use
 exclusively for my business; there is no way I'd use these addresses
 as an opt-in for anything. They are not personal addresses and I'd never
 consider using them to opt in to anything. I don't opt in to anything
 ever to begin with anyway.
  
 X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on
 H67646.safesecureweb.com
 X-Spam-Level:
 X-Spam-Status: No, score=0.6 required=5.0 tests=HABEAS_ACCREDITED_SOI,

 HTML_IMAGE_RATIO_02,HTML_MESSAGE,LOCAL_URI_NUMERIC_ENDING,MISSING_MID,
 MPART_ALT_DIFF,SARE_UNSUB09 autolearn=no version=3.2.1
 X-Spam-Report:
 *  0.0 MISSING_MID Missing Message-Id: header
 *  1.3 SARE_UNSUB09 URI: SARE_UNSUB09
 *  2.0 LOCAL_URI_NUMERIC_ENDING URI: Ends in a number of at
 least 4 digits
 *  0.0 HTML_MESSAGE BODY: HTML included in message
 *  1.1 MPART_ALT_DIFF BODY: HTML and text parts are different
 *  0.6 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text
 to image area
 * -4.3 HABEAS_ACCREDITED_SOI RBL: Habeas Accredited Opt-In or
 Better
 *  [66.59.8.161 listed in sa-accredit.habeas.com]
 Received: (qmail 17894 invoked from network); 15 Jul 2009 12:21:13 -0400
 Received: from mailengine.8lmediamail.com (66.59.8.161)
   by mail.jelsma.com with SMTP; 15 Jul 2009 12:21:12 -0400
 Received-SPF: pass (mail.jelsma.com: SPF record at
 mailengine.8lmediamail.com designates 66.59.8.161 as permitted sender)
 Received: by mailengine.8lmediamail.com (PowerMTA(TM) v3.2r23) id
 hbo0ve0eutci for embroid...@x.com;
 Wed, 15 Jul 2009 09:14:23 -0700 (envelope-from
 streamsendboun...@mailengine.8lmediamail.com)
 Content-Type: multipart/alternative;
 boundary=_--=_1073964459106330
 MIME-Version: 1.0
 X-Mailer: StreamSend - 23361
 X-Report-Abuse-At: ab...@streamsend.com
 X-Report-Abuse-Info: It is important to please include full email
 headers in the report
 X-Campaign-ID: 20812
 X-Streamsendid: 23361+362+1918562+20812+mailengine.8lmediamail.com
 Date: Wed, 15 Jul 2009 09:14:24 -0700
 From: Paul DiFrancesco: Eight Legged Media efly...@8lmediamail.com
 To: embroid...@x.com
 Subject: Visit with over 25 suppliers
 This is a multi-part message in MIME format.





Re: Underscores

2009-07-16 Thread Matt Kettler


twofers wrote:
 How can I pattern match when every word has an underscore after it.
 Example:
 This_sentenance_has_an_underscore_after_every_word

 I'm not really good at Perl pattern matching, but \w and \W see an
 underscore as a word character, so I'm just not sure what might work.

 body =~ /^([a-z]+_+)+/i

 Is that something that will work effectively?

 Thanks.

 Wes



I'd do something like this:

body MY_UNDERSCORES /\S+_+\S+_+\S+/

Unless you really want to restrict it to A-Z.

Regardless, ending any regex with + in an SA rule is redundant. Since +
allows a one-instance match, it will devolve to that. You don't need to
match the entire line with your rule, so the extra matches are
redundant: the regex will match at the first instance, and that's all it
needs to be a hit.

Also, any regex ending in * should just have its last element removed,
as that will devolve to a zero-count match.
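A quick way to sanity-check the pattern outside SA, assuming a POSIX grep, where Perl's \S corresponds to the class [^[:space:]]:

```shell
# The sample subject from the post; -E enables POSIX extended regexes.
sample='This_sentenance_has_an_underscore_after_every_word'
echo "$sample" | grep -Eq '[^[:space:]]+_+[^[:space:]]+_+[^[:space:]]+' \
  && echo 'MY_UNDERSCORES would hit'
```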




Re: sa-update errors

2009-07-20 Thread Matt Kettler
MrGibbage wrote:
 I get errors like this when I run sa-update from cron

 /usr/local/bin/setlock -n /tmp/cronlock.4051759.53932 sh -c
 $'/home/skipmorrow/bin/sa-update --gpgkey 6C6191E3 --channel
 sought.rules.yerp.org'

 gpg: WARNING: unsafe ownership on homedir
 `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys' 
 gpg: failed to create temporary file
 `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/.#lk0x5d7320.ps11651.23686':
 Permission denied
 gpg: keyblock
 resource`/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/secring.gpg':
 general error
 gpg: failed to create temporary file
 `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/.#lk0x5d7320.ps11651.23686':
 Permission denied
 gpg: keyblock resource
 `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/pubring.gpg': general
 error
 gpg: no writable keyring found: eof
 gpg: error reading
 `/home/skipmorrow/share/spamassassin/sa-update-pubkey.txt': general error
 gpg: import from
 `/home/skipmorrow/share/spamassassin/sa-update-pubkey.txt' failed: general
 error

 But when I run it from a login shell, it doesn't show those errors.  So I
 wrote a script to verify that the cron job is running as the correct user by
 putting in whoami, and indeed it is running as skipmorrow

 skipmor...@ps11651:~$ ls etc/mail/spamassassin/sa-update-keys/ -la
 total 28
 drwx-- 2 skipmorrow pg652 4096 Jul 20 00:00 .
 drwxr-xr-x 3 skipmorrow pg652 4096 Jul 17 13:29 ..
 -rw--- 1 skipmorrow pg652 5123 Jul 17 14:29 pubring.gpg
 -rw--- 1 skipmorrow pg652 4505 Jul 17 13:32 pubring.gpg~
 -rw--- 1 skipmorrow pg6520 Jul 17 13:29 secring.gpg
 -rw--- 1 skipmorrow pg652 1200 Jul 17 13:29 trustdb.gpg
 skipmor...@ps11651:~$ ls .gnupg/ -la
 total 24
 drwx--  2 skipmorrow pg652 4096 Jul 10 13:27 .
 drwxr-x--x 30 skipmorrow pg652 4096 Jul 20 03:48 ..
 -rw---  1 skipmorrow pg652 4128 Jul 10 13:27 pubring.gpg
 -rw---  1 skipmorrow pg652 3039 Jul 10 13:27 pubring.gpg~
 -rw---  1 skipmorrow pg6520 Jul 10 13:27 secring.gpg
 -rw---  1 skipmorrow pg652 1200 Jul 10 13:27 trustdb.gpg

 should sa-update be looking for keys in ~/.gnupg? 
No, it should not be looking in .gnupg. That would be the location for
keys you use. The keys used by sa-update are application specific, so
why would you want them on the keyring you use for email?

  Or is it working correctly?
Well, it's not working correctly, as you're having errors :)
   What environment variable does sa-learn and gnupg look for that
 would be present in my login shell but not be present when running in a cron
 environment?
   
I don't think it's missing an environment variable. Are you sure the
cronjob is running with an effective userid of skipmorrow?

This message:

gpg: failed to create temporary file
`/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/.#lk0x5d7320.ps11651.23686':
Permission denied


Strongly suggests you've got a permissions issue, where the cronjob is
running as a user that can't create files in
/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/ . Since
skipmorrow has rwx, that suggests the cronjob is running as some other
userid (probably cron or some other system account).





Re: WEb Frontend for SQL Bayes

2009-07-21 Thread Matt Kettler
Luis Daniel Lucio Quiroz wrote:
 Is there a good frontend that lets me admin SQL Bayes?


   
mysql-admin?

What are you looking to do as far as administering bayes? (particularly
what would you be doing more than once every 2-3 years)



Re: Spamcheck and how it affects bayes question

2009-07-21 Thread Matt Kettler
Gary Smith wrote:
 We have a process in place using the perl CPAN module for invoking SA.  This 
 is outside of the scope of the normal mail system.  Basically we use this to 
 see what scores emails would generate for some statistical stuff.  The spam 
 engine this calls is set to use -100 as the score so that everything is 
 considered spam.  Our production spam engine is set to 7.  We are looking at 
 the score that the perl modules returns and logging it (rather than the 
 isspam flag).  To complicate things a little more, we are using MySql for the 
 bayes store.  This store is also used by our production boxes.  This isn't 
 the problem, just what we are doing.

 The CPAN module has this as the description:
 public instance (\%) process (String $msg, Boolean $is_check_p)
 Description:
 This method makes a call to the spamd server and depending on the value of
 C$is_check_p either calls PROCESS or CHECK.

 Given that the perl call has a boolean option for PROCESS and CHECK, I would 
 assume that they make some difference, but it really doesn't say what the 
 difference is.  Currently in our code we are calling it with a false value, 
 which executes the PROCESS command.

 What I'm wondering is: will this throw off bayes if we keep doing this, as 
 everything that SA is returning is considered spam?  I'm just worried that 
 these continued tests will cause bayes to get wacky.  Also, should we be 
 using PROCESS or CHECK when doing this type of check?

 Gary

   
The bayes auto-learning system does not care what your required_score
is set to, and does not care if messages are tagged as spam or not. It
uses its own thresholds, and its own additional criteria for learning.

So, feeding it lots of mail with the threshold set to -100 shouldn't
matter at all.
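For reference, auto-learning is governed by its own options; the values below are the stock defaults as I recall them (verify against the Mail::SpamAssassin::Conf documentation for your version):

```
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam    12.0
```

The learner also applies extra criteria beyond the raw score (for example, it ignores points contributed by bayes itself and by whitelist/blacklist rules when deciding whether to learn).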






Re: WEb Frontend for SQL Bayes

2009-07-21 Thread Matt Kettler
Luis Daniel Lucio Quiroz wrote:
 Le mardi 21 juillet 2009 22:11:39, Matt Kettler a écrit :
   
 Luis Daniel Lucio Quiroz wrote:
 
  Is there a good frontend that lets me admin SQL Bayes?
   
 mysql-admin?

 What are you looking to do as far as administering bayes? (particularly
 what would you be doing more than once every 2-3 years)
 
 No, pas mysql-admin

 To let a common user or admin teach SA about spam or ham.

   
Fair enough. Since you were specific about SQL, I was wondering if you
were looking to do something SQL-specific (ie: database compacts, etc).

At this point you need a web frontend for sa-learn, and I don't really
know of any.

If you've got a per-user bayes setup, such a frontend could get a little
messy (needing to authenticate and setuid to the right user prior to
invoking sa-learn, etc)



Re: WEb Frontend for SQL Bayes

2009-07-22 Thread Matt Kettler
Benny Pedersen wrote:
 On Wed, July 22, 2009 04:18, Luis Daniel Lucio Quiroz wrote:
   
  Is there a good frontend that lets me admin SQL Bayes?
 

 sa-learn

   
That's not exactly a web front-end, Benny.





Re: Subject Rules

2009-07-22 Thread Matt Kettler
twofers wrote:
 I'm writing rules for header Subject and have a rule question.
  
 I want a rule that would hit on specific words, no matter what order
 they were. Would a rule written like this rule below accomplish that?
  
 Is the  *  redundant and not needed?
  
 Would a rule written like this be more efficient and faster than a
 rule where say, each of these words was used in a separate individual
 rule?
  
 header LR  Subject =~
 / 
 [independent]*[opportunity]*[luxury]*[cowhides]*[win]*[money]*[rep]*[save]*/i
  
 Thanks.
  
 Wes


Well, I wouldn't say that * is redundant.. however, I would say this
entire rule is silly and doesn't do what you want, and it's a little
ambiguous what you're really trying to do.

The whole rule devolves to an empty regex (//) if you express the
*'s as {0}, meaning it will match *any* text.  I highly doubt that's
what you meant.

Also, you've put the words inside [], which turns them into character
classes.
[win] will match a single character: a w, an i, or an n, not the word
win. I doubt that's what you want either.

You probably meant to do something like this:

header LR Subject =~ /\bindependent\b.*\bopportunity\b.*\bluxury\b.*\bcowhides\b.*\bwin\b.*\bmoney\b.*\brep\b.*\bsave\b/i

But that will only match if all the words are used IN THAT ORDER.

If you want to match all of them being used in arbitrary order, you'll
have to use multiple rules and combine them with a meta rule.
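A sketch of that approach (rule names and the nominal score are illustrative, not from the original post; the __ prefix marks sub-rules that don't score on their own):

```
header __LR_WIN    Subject =~ /\bwin\b/i
header __LR_MONEY  Subject =~ /\bmoney\b/i
header __LR_LUXURY Subject =~ /\bluxury\b/i
meta     LR_SUBJ_COMBO  (__LR_WIN && __LR_MONEY && __LR_LUXURY)
describe LR_SUBJ_COMBO  Subject contains win, money and luxury in any order
score    LR_SUBJ_COMBO  1.0
```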

Or perhaps you were looking to detect if any one of them was used, which
would be this rule:

header LR Subject =~ /\b(?:independent|opportunity|luxury|cowhides|win|money|rep|save)\b/i

Probably very false positive prone, but that works.






Re: DNSWL

2009-07-24 Thread Matt Kettler
twofers wrote:
 I get:
 * -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at
 http://www.dnswl.org/, low
 *  trust
  
 and I read the dnswl.org home page, but I don't understand why this
 rule would get a -1.0 for a LOW trust rating.
  
 It just seems awkward to me, I think LOW trust would dictate a
 positive rating, say a 1.0 or higher.
  
 Any insights?
  


Low doesn't mean it's a likely spam source, it means it's a nonspam
source, but with less confidence than the higher tiers.

Regardless, this test performed reasonably well in the 3.2 mass-checks


OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  0.092   0.0058   0.2442   0.023   0.66   -1.00  RCVD_IN_DNSWL_LOW

(from 
http://svn.apache.org/repos/asf/spamassassin/branches/3.2/rules/STATISTICS-set3.txt)

With a S/O of 0.023, that means that 97.7% of the email this rule hit was 
nonspam, and 2.3% was spam.

With that S/O, I don't think -1 is an out-of-order score, particularly since 
the test set was spam biased (63.8% of the test email was spam)





Re: How can I view bayes score for individual words?

2009-07-24 Thread Matt Kettler
snowweb wrote:
 I tried to view the files bayes.toks, bayes.journal, bayes.seen and
 autowhitelist but they just look like gibberish when opened in a unix
 editor. What's the solution to this?
   
The bayes database stores truncated SHA1 hashes of the words, it is not
reversible back to human readable text using the database alone. This is
done for performance reasons (fixed size tokens = faster random access),
but has a side benefit of preventing your bayes DB from containing words
that may imply things about your confidential emails.

However, if you run a message through spamassassin with -D bayes=9 it
should dump all the tokens in the message with their score from the
bayes DB.

  I was hoping to be able to tweak some of the
 scores and add certain words etc.
That would be a very misguided thing to do. Bayes is a statistical
system, and statistics work better with real measurements, not biased
numbers based on your own guesswork.

The reality of things is that a learning statistics system based on
email is really gathering statistics based on human behavior. Human
behavior is *way* more complex than you think it is. :-)

If you really want to tweak the score of some words, create static rules
for them. Leave bayes to doing its own exacting measurements.


Re: anchor forgery

2009-07-25 Thread Matt Kettler
mouss wrote:
 Mike Cardwell a écrit :
   
 Just checking through my Spam folder and I came across a message that
 contained this in the html:


 
censored example, Verizon won't let me send it 
 Yet, there was no mention of this obvious forgery in the spamassassin
 rules which caught the email.

 How would you create a rule which matched when the anchor text is a url
 which uses a different domain to the anchor href?

 

 this has been discussed a (very) long time ago. the outcome is that a
 mismatch also happens in legitimate mail.

It doesn't just happen; it happens quite a lot.

Sometimes in nonspam it is differences that are easy to compensate for,
like the link being to hosting.example.com, but the anchor text is
www.example.com.

Other times it's difficult to compensate for, where they first send you
to a link at their ESP, which then redirects you to the actual site.
Some ESPs prefer to do this, either for billing (charge extra for
clicks) or spam control reasons (if the sender violates the ToS, the ESP
will disable the redirect, which isn't much, but it does prevent the
sender from profiting at the ESPs expense.).

Regardless of reasons, Senders tend to make the text match what your
browser will show after the redirect occurs, not the ESP target in some
totally different domain.



Re: Score -71 for VERY spammy message!

2009-07-25 Thread Matt Kettler
snowweb wrote:

 Terry Carmen wrote:
   
 This is the result,

 X-Spam-Level:
 X-Spam-Status: No, score=-71.4 required=4.7 tests=HELO_DYNAMIC_IPADDR,

 HTML_IMAGE_ONLY_20,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,

 MIME_HTML_ONLY,MISSING_DATE,MISSING_MID,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_PBL,

 RCVD_IN_SORBS_DUL,RCVD_IN_XBL,RDNS_NONE,RELAYCOUNTRY_PE,SARE_FROM_DRUGS,

 SARE_UNI,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL,
 USER_IN_WHITELIST autolearn=no version=3.2.4
 X-Spam-Relay-Country: PE

 I can't understand what is going on here! How can it get a score like
 that?
 The message contained just an image and a link.
   
 -- USER_IN_WHITELIST

 

 Ah ok. I hadn't seen that. By that does it mean sender or user? The
 spammer is actually in my whitelist? Where can I check entries in my
 whitelist please?

   
USER_IN_WHITELIST would be sender.

Check your whitelist_from and whitelist_from_* statements in your local.cf.

In particular, make sure you didn't make this common mistake:

whitelist_from insert your own address or domain here

Spammers *WILL* abuse this, and regularly.
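If you genuinely need a whitelist entry, whitelist_from_rcvd is much harder to abuse, since it also checks the relaying server's reverse DNS (addresses here are placeholders):

```
# Forgeable: trusts the easily-faked From:/envelope sender alone.
#whitelist_from friend@example.com
# Better: also requires the relay's rDNS to end in example.com.
whitelist_from_rcvd friend@example.com example.com
```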


Re: bayes not active although enabled?

2009-07-25 Thread Matt Kettler
snowweb wrote:
 Sorry, got mixed up. In /etc/mail/spamassassin/local.cf

 use_bayes 1

 Is there anywhere else that I need to switch this on since it does not
 appear to be doing bayesian testing at all for any messages.

   
check your sa-learn --dump magic

SA won't activate bayes until it has learned at least 200 spam, and 200
nonspam messages. (under the general premise that until you have a
decent amount of mail learned, the statistics are going to be a bit
erratic and not worthwhile using)
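You can check the counters like this; the sample dump is inlined so the pipeline is visible, and the column layout is sketched from a typical 3.2 `sa-learn --dump magic` (verify against your own output):

```shell
# Pull the learned-message counters out of a dump-magic listing.
magic='0.000          0          3          0  non-token data: bayes db version
0.000          0        212          0  non-token data: nspam
0.000          0        187          0  non-token data: nham'
# In practice: sa-learn --dump magic | awk '$7 == "nspam" || ...'
echo "$magic" | awk '$7 == "nspam" || $7 == "nham" { print $7 ": " $3 }'
```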




Re: Score -71 for VERY spammy message!

2009-07-26 Thread Matt Kettler
Benny Pedersen wrote:
 On Sun, July 26, 2009 05:07, snowweb wrote:
   
 I can't understand what is going on here! How can it get a score like that?
 The message contained just an image and a link.
 

 add
 score whitelist_from 0.1

 in user_prefs or local.cf

 restart spamd

   
Um, that's not going to do anything except generate errors, Benny.

1) There is no rule named whitelist_from to assign a score to. The rule
name is USER_IN_WHITELIST, not whitelist_from. So you'd have to do:

score USER_IN_WHITELIST 0.1

2) Doing this completely defeats all the user-configured static
whitelisting in SA. You'd be better off removing all your whitelist_from
statements instead.

 i.e.: don't be stupid and treat the symptoms when it's just as easy to
treat the cause.






Re: whitelist_from questions

2009-07-27 Thread Matt Kettler
MySQL Student wrote:
 Hi,

 I'm looking an email that appears to be one of the users from the
 whitelist, but instead was from:

From probesqt...@segunitb1.freeserve.co.uk  Mon Jul 27 19:49:19 2009

 Why can't a comparison be made between the From: info and the actual
 sender? Is this because of virtual domains and/or users?
   
It's not done because this mismatch happens for nearly every mailing
list in existence (including this one).

Every message you get from this mailing list is From: the poster, but
the envelope is from the apache list server's bounce handler.

The To: header and Rcpt to: mismatch for similar reasons (To: will be
the list, but RCPT TO will be your mailbox).







Re: AutoWhiteList

2009-07-31 Thread Matt Kettler
--[ UxBoD ]-- wrote:
 Hi, 

 Where can I find sa-awlUtil as it does not appear to be in the download file 
 ? 

 Best Regards, 

   
Hmmm, it looks like someone has been editing the wiki in ways that don't
match anything in any released or unreleased version of SA.

The tool is named check_whitelist.

There's been talk of changing AWL stuff to not reference the word
whitelist, but AFAIK, this hasn't even been done in the unreleased 3.3
code.

Regardless, you can fetch check_whitelist from SVN:

http://svn.apache.org/repos/asf/spamassassin/branches/3.2/tools/







Re: Parallelizing Spam Assassin

2009-07-31 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
   
 On Fri, Jul 31, 2009 at 09:32,
 rich...@buzzhost.co.ukrich...@buzzhost.co.uk wrote:
 
 Imagine what Barracuda Networks could do with that if they did not fill
 their gay little boxes with hardware rubbish from the floors of MSI and
 supermicro. Jesus, try and process that many messages with a $30,000
 Barracuda and watch support bitch 'You are fully scanning to much mail
 and making our rubbish hardware wet the bed.' LOL.
   
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.

 
 I apologise for any language deemed offensive. Whilst 'Jesus',
 'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for
 openly swearing and using the filthy phrase 'Barracuda Networks'. For
 this I apologise.



   
Richard, we are not joking. Please watch your language on this mailing
list, or you will be banned from it.

You have now been warned by 2 members of the Project Management
Committee. You will not be warned again.





Re: Parallelizing Spam Assassin

2009-07-31 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 email me off list as I've just been
 banned for upsetting a sponsor LOL
   
Richard, this has nothing to do with Barracuda. They have no influence
over my opinions whatsoever. I don't work for Apache or Barracuda, or
any company sponsored by either. Neither Apache nor Barracuda has
complained. At the time I warned you, I didn't even remember that
Barracuda ever donated to Apache. I don't think any member of the PMC
has any regular contact with Barracuda, although we've had occasional
contact about using their RBL.

Your warning is about using foul language, and then choosing to thumb
your nose at the warning Justin gave you. You're behaving like an
impudent and foul-mouthed child, and that's unwelcome here.

That said, I really don't appreciate you using this list to rant about
Barracuda's products, or discuss them at all. This is the SpamAssassin
list, not the Barracuda list. Barracuda may use SpamAssassin, and
SpamAssassin may support the Barracuda public RBL, but beyond that, any
discussion of them is, quite frankly, off-topic. I don't care how good
or bad their commercial product, or its support is, because it is
off-topic here. I don't welcome people praising Barracuda any more than
I welcome complaints. It simply doesn't matter to SpamAssassin, so it
doesn't belong here.

You may as well be ranting about Ford cars for all I care; it still
doesn't belong here.

This list is about SpamAssassin, nothing more, nothing less.

Continue with the foul language, and you'll find the door very quickly.
Keep harping on the same off-topic subject and we will eventually get
tired of it. You've said your piece about Barracuda, now give it a rest,
because frankly I don't care about their products, I care about our product.

Is that difficult to understand?






Re: Parallelizing Spam Assassin

2009-08-01 Thread Matt Kettler
Um, Linda.. I'm pretty positive Justin is Irish, not American.

Linda Walsh wrote:
 It's an American thing.  Things that are normal speech for UK blokes, get
 Americans all disturbed.

 Funny, used to be the other way around...but well...times change.



 Justin Mason wrote:
 On Fri, Jul 31, 2009 at 09:32,
 rich...@buzzhost.co.ukrich...@buzzhost.co.uk wrote:
 Imagine what Barracuda Networks could do with that if they did not fill
 their gay little boxes with hardware rubbish from the floors of MSI and
 supermicro. Jesus, try and process that many messages with a $30,000
 Barracuda and watch support bitch 'You are fully scanning to much mail
 and making our rubbish hardware wet the bed.' LOL.

 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.






Re: SA-learn (spamassassin)

2009-08-02 Thread Matt Kettler
monolit wrote:
 Question is logical. When SA learns new spam/ham, SA has to write new info
 to the database, and I'd think the database has to increase in size. If you
 have, for example, a *.doc file and you modify it by adding several words,
 the *.doc will be bigger (increase in size).
   
The database doesn't need to grow in size.

A Berkeley DB file can contain free space. This is done to avoid
constantly shrinking and growing the file on disk. Deleted elements are
merely marked as free space for later use.

Therefore, data can be added to a Berkeley DB file without an increase
in file size.



Re: blacklisting a forger; summary; /* end

2009-08-03 Thread Matt Kettler
LuKreme wrote:
 On 3-Aug-2009, at 10:21, Dennis G German wrote:
 Content-Type: text/html;
 charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable


 Yes, there IS a problem.

 What the hell?

The message was multipart/alternative. You are more than capable of
reading the text/plain part.

html-only messages are strongly discouraged on the list, but so is
complaining about multipart/alternative.







Re: RelayCountry Config

2009-08-06 Thread Matt Kettler
MySQL Student wrote:
 Hi,

   
 I don't know if it makes a difference, but I call it Relay-Countries to
 match the name of the pseudo-header used in the tests

 add_header all Relay-Countries  _RELAYCOUNTRY_
 

 It doesn't appear to make a difference. I must be doing something else
 wrong. Using spamassassin --lint -D 2>&1 | less shows the
 X-Relay-Countries header, but it's null:

 # spamassassin --lint -D 2>&1 | egrep -i 'relay|country|countries'


   
snip
 [23760] dbg: metadata: X-Spam-Relays-Trusted:
 [23760] dbg: metadata: X-Spam-Relays-Untrusted:
 [23760] dbg: metadata: X-Spam-Relays-Internal:
 [23760] dbg: metadata: X-Spam-Relays-External:

   
snip
 [23760] dbg: metadata: X-Relay-Countries:
   
The --lint test is *NOT* valid for this. --lint is *ONLY* to verify your
config files are parseable.

The lint test uses a dummy message that has no Received: headers in it.
This prevents --lint from wasting time doing RBL lookups, etc, which
speeds up the lint run. This is valid because --lint is not intended to
be a comprehensive test of the system, it's intended to check if your
rulefiles are readable.

Since the lint dummy message has no Received: headers, it hasn't been
anywhere, so it's been in no countries.

Try again with a real message with real headers, and try to remember
that --lint is not a general-purpose test.





Re: Scores, razor, and other questions

2009-08-07 Thread Matt Kettler
MySQL Student wrote:
 Hi,

 After another day of hacking, I have a handful of general questions
 that I hoped you could help me to answer.

 - How can I find the score of a particular rule, without having to use
 grep? I'm concerned that I might find it at some score, only for it to
 be redefined somewhere else that I didn't catch. Something I can do
 from the command-line?
   
No, to be comprehensive you'd have to do a series of greps, one for the
default set, site rules, and user_prefs.

You could probably make a little shell script to automate grepping all 3.
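Something along these lines (the directories are the typical defaults for a source install; adjust the paths for yours):

```shell
# score_for RULE DIR... -- print every "score RULE" line under each config
# directory, in SA's parse order (default rules, then site rules, then
# user_prefs). The last definition SA parses is the one that wins.
score_for() {
  rule="$1"; shift
  for dir in "$@"; do
    [ -d "$dir" ] && grep -rn "^[[:space:]]*score[[:space:]][[:space:]]*$rule\b" "$dir"
  done
  return 0
}

score_for BAYES_99 /usr/share/spamassassin /etc/mail/spamassassin "$HOME/.spamassassin"
```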

 - How do I find out what servers razor is using? What is the current
 license now that it's hosted on sf, or are the query servers not also
 running there? It doesn't list any restrictions on the web site.
   
Wow.. the razor client has been hosted on SF for a LOOong time..
Like 6 years now?

Regardless, the servers are operated by Vipul's company, Cloudmark. Try
running razor-admin -d -discover. Alternatively, look at razor's
server.lst file.
 - The large majority of the spam that I receive these days is a result
 of a URL not being listed in one of the SBLs. I'm using SURBL, URIBL,
 and spamcop. For example, I caught censored several hours
 ago, and it's still not listed in any of the SBLs. Am I doing
 something wrong or am I missing an SBL? Has anyone else's spam with
 URLs increased a lot lately?
   
Note: domain censored, verizon's spam outbreak controls won't let me
send the message with that domain in it right now.

URIBLs have some inherent lag, and spammers are playing a race game with
the URIBLs, trying to change domains faster than they get listed.
Fortunately, the domain registrations cost the spammers money, so
increasing the number of those they need is good.

Personally, I find bayes tends to clean up most of what gets missed,
although I auto-feed my bayes using spamtrap addresses that
automatically submit to sa-learn --spam, resulting in very fresh spam
training.

Looking at uribl, they've currently got it listed in URIBL gold, but
that's a non-free list of theirs. It's also a proactive list, so it
will list domains before they send spam, making it more effective
against mutating runs, but also might toss a FP or two on new domains.


 Thanks,
 Alex


   



Re: Mailbox for auto learning

2009-08-09 Thread Matt Kettler
Luis Daniel Lucio Quiroz wrote:
 Hi SAs,

 Well, after reading this link 
 http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html I'm still looking 
 for an easy way to let my mortal users train our antispam. I was thinking of 
 mailboxes such as h...@antispamserver and s...@antispamserver, to which users 
 would forward their false positives or false negatives. Inside each box (ham 
 or spam), procmail would of course forward the input to sa-learn.

 My questions are:
 1. Will forwarded mails be useful for training? I mean, if the spam was From: 
 spa...@example.net To: u...@mydomain, when forwarded it will be From: 
 mu...@mydomain To: s...@antispamserver. Won't this change, plus the forwarding 
 (losing headers because of mail clients), change the learning?
   
Forwarded mails are NOT useful.

You also neglected to mention the change of Received headers, and pretty
much every header in the message, the re-encoding of the body by your
mail client, etc.

Since SA's bayes tokenizes headers, that's disastrous.
 2. If the technique in question 1 is useless, what other way would be nice to
 let users report a false positive/negative for training?
In some cases you can have the client forward as attachment, and use a
mailbox that strips attachments and feeds them to sa-learn. As long as
the client being used forwards the entire original message, with
complete headers, this should work fine.

   

 TIA

 LD


   



Re: two different spamassassin outputs

2009-08-09 Thread Matt Kettler
David Banning wrote:
 With every email, for some reason, I get two reports from spamassassin.
 In the headers I get this line:

 -
 X-Spam-Status: No, score=2.5 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
 HTML_MESSAGE,SPF_PASS,URIBL_BLACK,URIBL_OB_SURBL autolearn=no version=3.
 2.5
 -


 Then in the actual message content area, I get this message (notice the 
 difference in the score):



 -
 Content analysis details:   (6.3 points, 5.0 required)

  pts rule name  description
  -- --
 -0.0 SPF_PASS   SPF: sender matches SPF record
  2.2 DEAR_SOMETHING BODY: Contains 'Dear (something)'
  0.0 HTML_MESSAGE   BODY: HTML included in message
  2.1 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
 [URIs: verery.net]
  2.0 URIBL_BLACKContains an URL listed in the URIBL blacklist
 [URIs: verery.net]
 -

 I would like it to toss the email away based on the second score (6.3)
 but I would also like to know why it is scoring twice, each with a different
 score.
   
It looks like you're scanning twice. One copy of SA has a bayes database
(that thinks the message is nonspam, so it's probably badly trained),
and the other doesn't seem to have bayes enabled.

That alone accounts for -2.6 points.

The other side of it: because bayes isn't active, the second copy is
using scoreset 1 instead of scoreset 3, which raises the
scores of other tests (the points that bayes would otherwise hog get
sprinkled around across other rules).

ie: in set 1, URIBL_OB_SURBL is 2.132 points, in set 3 it is 1.50.. etc.

 Any comments or suggestions would be helpful. Thanks - 


   



Re: whitelist_from_rcvd and short circuit

2009-08-13 Thread Matt Kettler
Chris wrote:
 It appears as though I don't understand how this is supposed to work. I
 have a file in /etc/mail/spamassassin called my-whitelist.cf. In it I
 have entries such as:


   
snip
 whitelist_from_rcvd harley-requ...@the-hed.net the-hed.net

   
snip
 however, a message from the 2nd address doesn't hit the
 USER_IN_WHITELIST for some reason:

 Return-path: harley-requ...@the-hed.net
 X-spam-checker-version: SpamAssassin 3.2.5 (2008-06-10) on
 localhost.localdomain
 X-spam-status: No, score=-4.9 required=5.0
 tests=AWL=0.445,BAYES_00=-6.4,
 DCC_CHECK_NEGATIVE=-0.0001,KHOP_NO_FULL_NAME=0.259,RDNS_NONE=0.1,
 SPF_NEUTRAL=0.686,UNPARSEABLE_RELAY=0.001
 AWL,BAYES_00,DCC_CHECK_NEGATIVE,
 KHOP_NO_FULL_NAME,RDNS_NONE,SPF_NEUTRAL,UNPARSEABLE_RELAY
 shortcircuit=no autolearn=disabled version=3.2.5

 Complete headers of both posts are here:

 http://pastebin.com/m1d1d5e07

   
snip
 So, what am I doing wrong here?
   
Two problems with that message:

First, there's an unparseable Received: header, which appears to be the
one created by your fetchmail. That's breaking SA's trust path and
preventing any hosts from being trusted, making whitelist_from_rcvd
impossible. I'm not sure what's throwing it off, but the (single-drop)
bit looks a bit odd to me. You need to get SA to understand the
Received: headers for any Received-based mechanisms to work. You'll also
need it to trust all the servers at your ISP/ESP/whatever relationship
you have with embarqmail.com and synacor.com.

Second, the message from harley-requ...@the-hed.net is not relayed to
your site from a server using the-hed.net as its reverse DNS. In fact,
the-hed.net is not used as the domain of *ANY* server in the received
headers of that message. The server they appear to be using is
kyoto.hostforweb.net, so hostforweb.net should be the second parameter
in your whitelist_from_rcvd, not the-hed.net.
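Assuming kyoto.hostforweb.net really is the relay observed in those headers (verify against your own copies), the corrected entry would look something like:

```
# second argument must match the rDNS domain of the relaying
# server, not the domain in the sender's address
whitelist_from_rcvd harley-requ...@the-hed.net hostforweb.net
```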









Re: Counting RAZOR2 hits

2009-08-15 Thread Matt Kettler
MySQL Student wrote:
 Hi,

 I thought grep -c RAZOR2_CHECK through my mail logs would give me a
 good approximation of the number of times RAZOR2 was consulted, but
 that doesn't seem to be the case. There are some mails that don't have
 it listed in the tests= section.

 I've also tried the razor-* commands, and they don't appear to be able
 to help here either. What am I missing?

 Does RAZOR2_CHECK mean that it was found in the RAZOR2 db, or that it
 merely consulted the db?
   
That means it was found and was above your min_cf. i.e.: Razor believes
it is spam.




Re: sa-update.com expired ?

2009-08-16 Thread Matt Kettler
Stefan wrote:
 Hello list,

 I just configured sa-update on a server with some sare rule sets. And it 
 couldn't download some sets because the MIRRORED.BY file has an entry with sa-
 update.com. In this case it was the 70_zmi_german rule set and the 
 MIRRORED.BY 
 file has the following content:
 http://daryl.dostech.ca/sa-update/zmi/70_zmi_german.cf/
 http://updates.sa-update.com/zmi/70_zmi_german.cf/

 sa-update tried the latter but there is nothing, because the domain seems to 
 have expired.

 Is this temporary or are there plans to fix that?
   
Hmm, interesting. Looks like it expired on August 8th.

Perhaps Daryl can answer this (AFAIK, he's the owner of the
sa-update.com domain. It is not owned by the ASF or the SpamAssassin team.)




Re: Counting RAZOR2 hits

2009-08-17 Thread Matt Kettler
Karsten Bräckelmann wrote:
 On Mon, 2009-08-17 at 09:52 +0200, Matus UHLAR wrote:
   
 On 15.08.09 14:32, Matt Kettler wrote:
 
 That means it was found and was above your min_cf. i.e.: Razor believes
 it is spam.
   
 There's no min_cf for RAZOR and there's no public hitcount. RAZOR2 has an
 internal trust system which counts reports and revokes from its
 users/reporters and uses those to decide if the message is listed or not.
 

 There is -- the minimum confidence level is the second option for the
 check_razor2_range() eval rule.


   
You can also set your min_cf in your razor config files, which will
affect when the RAZOR2_CHECK rule fires. This does work in SpamAssassin,
as I have over-ridden the min_cf on my own system, and have done so for
years.

The private part of Razor's trust system has to do with how much impact
your reports have on the cf values everyone else gets when they query
razor. However, you're free to tweak razor to be more or less aggressive.

The razor system also advertises a suggested cf value, which they call
ac (average confidence?) and you can define min_cf to either be your
own absolute value (ie: 10), or relative to the advertised one (ie: 
ac+10, or ac-5).

Razor's cf's go from -100 to +100.

See man razor-agent.conf for further details on how to configure razor,
if you're so inclined.
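For example, a razor-agent.conf override might look like this (the value is illustrative; see the man page for the exact syntax on your version):

```
# razor-agent.conf fragment: min_cf may be absolute, or relative
# to the server-advertised average confidence "ac"
min_cf = ac + 10
```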


Re: SA Timeouts

2009-08-19 Thread Matt Kettler
Cory Hawkless wrote:

 Hi All,

  

 Having a problem with my SA setup. I’m using amavisd and Postfix. For
 some reason I get the following occasionally

  

 Aug 19 15:37:20.176 ceg.caznet.com.au /usr/sbin/amavisd[5]:
 (5-01-6) SA dbg: bayes: database connection established

 Aug 19 15:37:20.177 ceg.caznet.com.au /usr/sbin/amavisd[5]:
 (5-01-6) SA dbg: bayes: found bayes db version 3

 Aug 19 15:37:20.179 ceg.caznet.com.au /usr/sbin/amavisd[5]:
 (5-01-6) SA dbg: bayes: Using userid: 4

 Aug 19 15:37:20.184 ceg.caznet.com.au /usr/sbin/amavisd[5]:
 (5-01-6) SA dbg: bayes: corpus size: nspam = 5993, nham = 24505

 Aug 19 15:39:30.977 ceg.caznet.com.au /usr/sbin/amavisd[4]:
 (4-02-4) (!)SA TIMED OUT, backtrace: at
 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PerMsgStatus.pm
 line 1961\n\teval {...} called at
 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PerMsgStatus.pm
 line
 1961\n\tMail::SpamAssassin::PerMsgStatus::_get_parsed_uri_list('Mail::SpamAssassin::PerMsgStatus=HASH(0xb0945cc)')
 called at
 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PerMsgStatus.pm
 line
 1852\n\tMail::SpamAssassin::PerMsgStatus::get_uri_detail_list('Mail::SpamAssassin::PerMsgStatus=HASH(0xb0945cc)')
 called at
 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin/URIDNSBL.pm
 line
 207\n\tMail::SpamAssassin::Plugin::URIDNSBL::parsed_metadata('Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0xae5421c)',
 'HASH(0xb05f97c)') called at
 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PluginHandler.pm
 line 202\n\teval {...} called at
 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin[...]

  


Roughly twice a day?

If so, I'm guessing a bayes expire run makes the SA run just long enough
to get killed (expiry does take a while depending on hardware and DB
size; it adds around 1-2 minutes to a run).

Try either:
1) extend the amavis timeout by 30 seconds, or
2) disable SA's bayes_auto_expire, and use a cronjob to run sa-learn
--force-expire instead,

and see if it goes away.
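Option 2 would look roughly like this (the cron time of day is arbitrary; run it as the user that owns the bayes database):

```
# local.cf: turn off inline expiry during message scans
bayes_auto_expire 0

# crontab entry to expire the bayes db nightly instead:
# 30 3 * * * sa-learn --force-expire
```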



Re: sare channels

2009-08-20 Thread Matt Kettler
Dave wrote:
 Hello,
   I'm trying to add additional sa rules and wanted to use the sare
 channels referenced by the wiki. I'm using sa 3.2.5 and when i atempted to
 get updates from saupdates.openprotect.com the channel didn't exist. Has it
 moved?
 Thanks.
 Dave.


   
Read the top of the rulesemporium site:

http://www.rulesemporium.com/

SARE rules aren't being updated. Hence, sa-updating them is pointless.


Re: sare channels

2009-08-20 Thread Matt Kettler
Gary Smith wrote:
 Read the top of the rulesemporium site:

 http://www.rulesemporium.com/

 SARE rules aren't being updated. Hence, sa-updating them is pointless.
 

 Is it still recommended to run the SARE rules?


   
There's nothing wrong with running them if you want.. but using
sa-update on them regularly is utterly pointless..



Re: Obfuscation Question

2009-08-27 Thread Matt Kettler
Irish Online Help Desk wrote:

 When I send a test message for my broadcast email I am receiving “0.6
 HTML_OBFUSCATE_05_10 BODY: Message is 5% to 10% HTML obfuscation” in
 the spam score.  It is a pretty basic email message with a few
 hyperlinks and a numbered list.  Can you explain what may be causing
 this spam score.

Well, at 0.6 points, it's not really anything to worry about. Nobody (at
least nobody with more than 2 braincells) should be tagging or
discarding email at such a low score level.

As for the rule, it's generally going to be looking for abuse of tables,
etc to obscure what the user-perceived text of a message is. (ie:
writing a message by populating columns vertical-first in a table), 
etc. If you're really worried, you might want to look at the raw message
source and see if the innocent looking text has a lot of really weird
html layout in it.

However, with such a low obfuscation ratio, and such a small score.. I'd
really not worry about it.





Re: SA-learn is it a problem

2009-08-30 Thread Matt Kettler
peperami97 wrote:
 Hi

 Whenever I run sa-learn it claims to learn from every message regardless of
 whether its being run immediately after being run on the same folder.

 Is this normal or is this a problem ?
   
That would seem to be a problem. It shouldn't relearn from the same
message..

are you using SQL based bayes, or the default db_file based bayes?

If db_file, is your bayes_seen file being updated when you run sa-learn?





Re: SA-learn is it a problem

2009-08-30 Thread Matt Kettler
Ben Whyte wrote:

 By default:

 ~/.spamassassin/bayes_seen


 (ie: inside the .spamassassin subdirectory of your home directory)


 I found it I assume.  Its in /home/.spamassassin/spamassassin_seen

 .spamassassin_seen is not getting updated

 .spamassasin_toks is getting updated

 Ben

Erm. Did someone mess with the bayes_path setting in your configuration?

Also, are you running SA as a user whose home directory is just /home
(ie: the user named nobody)?







Re: SA-learn is it a problem

2009-08-31 Thread Matt Kettler
Ben Whyte wrote:

 Erm. Did someone mess with the bayes_path setting in your configuration?

 Also, are you running SA as a user whose home directory is just /home
 (ie: the user named nobody)






 The bayes config is pointing to /home/spamd
Based on what you've told me so far, it is not. It is pointing to
/home/.spamassassin/

What's the exact bayes_path statement you used, and what file is it in? 
(please post the exact one.. bayes_path is a VERY tricky option to use,
because it requires more than just a path.)
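To illustrate the trap (paths here are placeholders for your own setup): bayes_path takes a filename *prefix*, not a directory; SA appends _toks, _seen, etc. to whatever you give it.

```
# Correct: a prefix, producing /home/spamd/.spamassassin/bayes_toks etc.
bayes_path /home/spamd/.spamassassin/bayes

# Wrong: a bare directory, producing /home/spamd/.spamassassin/_toks
# bayes_path /home/spamd/.spamassassin/
```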




 Its running as the user spamd.

 I noticed that the spamassassin_seen is not being touched by running
 sa-learn; the size isn't changing.  spamassassin_toks is
 changing size.

Did you run sa-learn as the user spamd?
 Ben





Re: some domains in my local.cf file not being tagged

2009-09-02 Thread Matt Kettler
Mark Mahabir wrote:
 Hi,

 I have a large number of domains I've blacklisted in my local.cf file e.g.

 blacklist_from *...@domain.com

 however spam from some domains gets tagged, whereas others don't. What
 can I do to improve the situation?

 Thanks

 Mark


   
Does the From: header of these messages match *...@domain.com, or are they
*...@something.somedomain.com (which wouldn't match)?


Does the X-Spam-Status header show that a blacklist matched
(USER_IN_BLACKLIST)?
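As a sanity check on the entries themselves: blacklist_from takes file-glob style wildcards, and a subdomain needs its own pattern (example.com is a placeholder here):

```
blacklist_from *@example.com
# needed additionally if mail comes from user@sub.example.com:
blacklist_from *@*.example.com
```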



Re: Rule PTR != localhost

2009-09-03 Thread Matt Kettler
Clunk Werclick wrote:
 Howdie;

 I'm starting to see plenty of these and they are new to us:

 zgrep 'address not listed' /var/log/mail.info
 Sep  3 05:26:59 : warning: 222.252.239.56: address not listed for
 hostname localhost
 dig -x 222.252.239.56

 ...
 ;; QUESTION SECTION:
 ;56.239.252.222.in-addr.arpa. IN PTR

 ;; ANSWER SECTION:
 56.239.252.222.in-addr.arpa. 83651 IN PTR localhost.
 ...

 Taking to one side the various RBL's which are catching these, and not
 going the whole 'PTR must match' route - would it be practical to craft
 a 10 point rule based on PTR = localhost? Is it even possible to build a
 rule based upon DNS returns?

 Forgive the stupidity of the question, but I'm not sure how to, or even
 if it can be implemented?
Not without writing a plugin. Although if your MTA inserts a (may be
forged) note into the Received: headers, SA will pick up on this.

Generally speaking, SA does not perform A record lookups of anything
that could be spammer-provided, neither hosts in URLs nor Received:
hosts. Doing so poses a potential security risk. (NS record queries are
performed, but not A.)

Attack vectors include:

1) malicious insertion of hosts that are slow-to-resolve, forcing a DNS
timeout, thus slowing down mail processing. A small flood of such
messages (each with different hostnames) could readily occupy all your
spamd children. Spamd does not have sufficient cross child co-ordination
to implement countermeasures, and anyone using the API or spamassassin
script would have to roll their own.

2) there is the potential to abuse chosen queries to facilitate DNS
cache poisoning attacks, on servers that are vulnerable.







Re: Rule PTR != localhost

2009-09-03 Thread Matt Kettler
Matt Kettler wrote:
 Clunk Werclick wrote:
   
 Howdie;

 I'm starting to see plenty of these and they are new to us:

 zgrep address not listed /var/log/mail.info
 Sep  3 05:26:59 : warning: 222.252.239.56: address not listed for
 hostname localhost
 dig -x 222.252.239.56

 ...
 ;; QUESTION SECTION:
 ;56.239.252.222.in-addr.arpa. IN PTR

 ;; ANSWER SECTION:
 56.239.252.222.in-addr.arpa. 83651 IN PTR localhost.
 ...

 Taking to one side the various RBL's which are catching these, and not
 going the whole 'PTR must match' route - would it be practical to craft
 a 10 point rule based on PTR = localhost? Is it even possible to build a
 rule based upon DNS returns?

 Forgive the stupidity of the question, but I'm not sure how to, or even
 if it can be implemented?
 
 Not without writing a plugin. Although if your MTA inserts a may be
 forged note into the Received: headers, SA will pick up on this.
   
Correction, SA dropped this rule a LONG time ago in the 2.5x series due
to wild false positives.

The legacy rule from 2.4x

header MAY_BE_FORGED    Received =~ /\(may be forged\)/i
describe MAY_BE_FORGED  'Received:' has 'may be forged' warning
score MAY_BE_FORGED     0.038


OVERALL%   SPAM%   NONSPAM%   S/O    RANK   SCORE   NAME
  2.530    3.757    2.290     0.62   0.34   0.04    MAY_BE_FORGED

0.62  S/O is not so good (ie: 62% of the email matched was spam, but 38%
was nonspam)



   



Re: Rule PTR != localhost

2009-09-03 Thread Matt Kettler
Clunk Werclick wrote:
 On Thu, 2009-09-03 at 05:23 -0400, Matt Kettler wrote:
   
 Clunk Werclick wrote:
 
 Howdie;

 I'm starting to see plenty of these and they are new to us:

 zgrep address not listed /var/log/mail.info
 Sep  3 05:26:59 : warning: 222.252.239.56: address not listed for
 hostname localhost
 dig -x 222.252.239.56

 ...
 ;; QUESTION SECTION:
 ;56.239.252.222.in-addr.arpa. IN PTR

 ;; ANSWER SECTION:
 56.239.252.222.in-addr.arpa. 83651 IN PTR localhost.
 ...

 Taking to one side the various RBL's which are catching these, and not
 going the whole 'PTR must match' route - would it be practical to craft
 a 10 point rule based on PTR = localhost? Is it even possible to build a
 rule based upon DNS returns?

 Forgive the stupidity of the question, but I'm not sure how to, or even
 if it can be implemented?
   
 Not without writing a plugin. Although if your MTA inserts a may be
 forged note into the Received: headers, SA will pick up on this.

 Generally speaking, SA does not perform A record lookups of anything
 that could be spammer-provided, neither hosts in URLs nor Received:
 hosts. Doing so posses a potential security risk. (NS record queries are
 performed, but not A).

 Attack vectors include:

 1) malicious insertion of hosts that are slow-to-resolve, forcing a DNS
 timeout, thus slowing down mail processing. A small flood of such
 messages (each with different hostnames) could readily occupy all your
 spamd children. Spamd does not have sufficient cross child co-ordination
 to implement countermeasures, and anyone using the API or spamassassin
 script would have to roll their own.

 2) there is the potential to abuse chosen queries to facilitate DNS
 cache poisoning attacks, on servers that are vulnerable.
 

 Thank you Matt. That is a fine quality of answer and makes total sense.
 I had never thought to consider this attack vector. On an SA install
 running hundreds of thousands of messages I could see a significant
 issue if DNS returns ran much past 300ms or so. I am guessing (and I
 have not at all examined the code, nor shall I pretend that I would
 understand it) that there is some kind of sanity check for DNS timeout
 there someplace? Again, potentially a stupid question - but I'm curious
 as to how we would say 'that query has taken too long, I'm out of
 here'. 
   
AFAIK, all the DNS lookups for a message are subject to the rbl_timeout
code.

See the conf docs:
http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html




Re: some domains in my local.cf file not being tagged

2009-09-03 Thread Matt Kettler
Mark Mahabir wrote:
 2009/9/3 Matt Kettler mkettler...@verizon.net:
   
 Does the From: header of these messages match *...@domain.com, or are they
 *...@something.somedomain.com (which wouldn't match)?
 

 They're definitely *...@domain.com in the From: header.

   
 Does the X-Spam-Status header show that a blacklist matched
 (USER_IN_BLACKLIST)?
 

 No, they don't (the ones that don't get tagged).

 Thanks,

 Mark


   
Interesting. Then one of the following is the cause:

1) There are errors in your config, and SA isn't parsing local.cf at all.
To check for this, run spamassassin --lint. It should run quietly; if
it complains, find and fix the offending lines.

2) You're editing a local.cf in the wrong path. Check what the site
rules dir is near the top of the debug output when you run
spamassassin -D --lint.

3) the offending message has multiple From: headers, and SA is
interpreting the other one. You can try looking at the raw message
source for this.

4) The configuration being used at delivery time is over-riding the one
used at the command line. You can try pumping the message as a file
through spamassassin on the command line and see what it comes up with.
If it matches USER_IN_BLACKLIST on the command-line, but fails to match
at delivery, something is fishy about your integration and how it
configures SA.




Re: some domains in my local.cf file not being tagged

2009-09-04 Thread Matt Kettler
d.h...@yournetplus.com wrote:
 Quoting Matt Kettler mkettler...@verizon.net:

 Mark Mahabir wrote:
 2009/9/3 Matt Kettler mkettler...@verizon.net:

 Does the From: header of these messages match *...@domain.com, or are
 they
 *...@something.somedomain.com (which wouldn't match)?


 They're definitely *...@domain.com in the From: header.


 Does the X-Spam-Status header show that a blacklist matched
 (USER_IN_BLACKLIST)?


 No, they don't (the ones that don't get tagged).

 Thanks,

 Mark



 Interesting, then one of the following is the cause:

 1) there's errors in your config, and SA isn't parsing local.cf at all.
 To check for this, run spamassassin --lint. It should run quietly, if
 it complains, find and fix the offending lines.

 2) You're editing a local.cf in the wrong path. Check what the site
 rules dir is near the top of the debug output when you run
 spamassassin -D --lint.

 3) the offending message has multiple From: headers, and SA is
 interpreting the other one. You can try looking at the raw message
 source for this.

 4) The configuration being used at delivery time is over-riding the one
 used at the command line. You can try pumping the message as a file
 through spamassassin on the command line and see what it comes up with.
 If it matches USER_IN_BLACKLIST on the command-line, but fails to match
 at delivery, something is fishy about your integration and how it
 configures SA.

 Or, does order of comparison matter. From the documentation,
 blacklist_from states to see whitelist_from. whitelist_from states:

 The headers checked for whitelist addresses are as follows: if
 Resent-From is set, use that; otherwise check all addresses
 taken from the following set of headers:

 Envelope-Sender
 Resent-Sender
 X-Envelope-From
 From

 If taken in that order, the From header field would be compared last.


It will check *ALL* of the From-like headers, and it will fire if
*ANY* of them match.

So that's not the problem.


Re: user prefs from sql problem

2009-09-08 Thread Matt Kettler
Karel Beneš wrote:
 Hi,

   I am trying to load user preferences from SQL db (mysql). Setup was
 done according to doc/spamassassin/sql/README.gz, but user
 preferences are still loaded from files. No error message is raised
 into log file in debug mode. DB-based bayes and awl works fine.

 Debian GNU/Linux 5.0.3, spamassassin 3.2.5, mysql 5.0.51a.

 Spamassassin is invoked by spamc in /etc/procmailrc.

 spamd --max-children 2 --helper-home-dir --setuid-with-sql -d
 --pidfile=x

 What is going wrong?
   
Did you set these options in your local.cf?:

  user_scores_dsn           DBI:driver:connection
  user_scores_sql_username  dbusername
  user_scores_sql_password  dbpassword

And what did you set user_scores_dsn to?

See also:

sql/README from the tarball (web copy for 3.2.x at:
http://svn.apache.org/repos/asf/spamassassin/branches/3.2/sql/README)
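For a concrete MySQL setup, the local.cf entries might look like this (database name, host, and credentials are placeholders):

```
user_scores_dsn           DBI:mysql:spamassassin:localhost
user_scores_sql_username  sauser
user_scores_sql_password  sapassword
```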




 Thanks a lot,
 --kb

   



Re: URL rule creation question

2009-09-10 Thread Matt Kettler
MySQL Student wrote:
 Hi all,

 I've seen this pattern in spam quite a bit lately:

   
snip - URI that verizon won't let me send
 Would it be reasonable to create a rule that looks for this two-char
 then dot pattern, or is it reasonable that it might appear in a
 legitimate email too frequently? If possible, how would you create a
 rule to capture this?
   

This rule should detect 10 consecutive occurrences:
uri   L_URI_FUNNYDOTS   /(?:\.[a-z,0-9]{2}\.){10}

I do think that 4-in-a-row might be pretty common (ie: IP addresses),
but 10 in a row seems unlikely.

Warning: I wrote this quickly without too much thought. It may have
bugs, but I'm short on time at the moment.



Re: URL rule creation question

2009-09-11 Thread Matt Kettler
McDonald, Dan wrote:

 From: Matt Kettler [mailto:mkettler...@verizon.net]

 This rule  should detect 10 consecutive occurrences.
 uri   L_URI_FUNNYDOTS   /(?:\.[a-z,0-9]{2}\.){10}

 Warning: I wrote this quickly without too much thought. It may have
 bugs, but I'm short on time at the moment.

 your variant would require two periods in a row between each pair.

So it would... Hence the warning :)
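One way to sanity-check a candidate pattern before deploying it as an SA rule is a few lines of Python (a hypothetical test harness, not an SA component). It shows the posted pattern only matches doubled dots, while dropping the trailing `\.` inside the group matches single-dot sequences:

```python
import re

# Pattern as posted: each repetition consumes ".XX.", so consecutive
# repetitions require ".." between pairs.
posted = re.compile(r'(?:\.[a-z,0-9]{2}\.){10}')
# Variant with one dot per repetition.
fixed = re.compile(r'(?:\.[a-z,0-9]{2}){10}')

# Ten two-char labels separated by single dots:
uri = 'http://aa.bb.cc.dd.ee.ff.gg.hh.ii.jj.kk.example.com/'
print(bool(posted.search(uri)))  # False: the URI has no ".." sequences
print(bool(fixed.search(uri)))   # True
```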


Re: Does Spam Assassin support additional languages?

2009-09-15 Thread Matt Kettler
Chris Arrendale wrote:

 I am running version 3.2.4 and interested to know if there are
 additional language packs for Spam Assassin such as German, Turkish,
 Chinese, etc.?  If they are available, does anybody know where I can
 download them?

There are a few language packs that come with SA: German, Spanish,
French, Italian, etc.

See the 30_text_*.cf files that come with it.

SA picks up which language to use from the LANG environment
variable of the system it is running on.



Re: Re-running SA on an mbox

2009-09-20 Thread Matt Kettler
MySQL Student wrote:
 Hi,

 I have an mbox with about a 100 messages in it from a few days ago.
 The mbox is a combination of spam and ham. What is the best way to run
 SA through these messages again, so I can catch the ones that have
 URLs in them that weren't on the blacklist at the time they were
 received?

 Must I break them all apart to do this, or can SA somehow parse the
 whole mbox? If not, what program do you suggest I use to accomplish
 this?
   
Do you just want to re-scan the whole mbox and see what rules hit now
for research reasons?

You could probably abuse the mass-check tool for that purpose:

http://svn.apache.org/repos/asf/spamassassin/branches/3.2/masses/

It's normally used to generate logs we feed into the score generation
process, but it can be run on a single mbox.

The downside, is all it does is generate a report, one line per message,
with a list of hits.

There's no way to (directly) get SA to modify email that's already in an
mbox file. The mass-check and sa-learn tools can read them, but nothing
in SA can write to that. However, there might be a utility out there to
do this (although I'm not aware of any)..




Re: Re-running SA on an mbox

2009-09-20 Thread Matt Kettler
MySQL Student wrote:
 Hi,

   
 Do you just want to re-scan the whole mbox and see what rules hit now
 for research reasons?
 

 That's a good start, but I'd like to see if I can break out the ham to
 train bayes.

   
 There's no way to (directly) get SA to modify email that's already in an
 mbox file. The mass-check and sa-learn tools can read them, but nothing
 in SA can write to that. However, there might be a utility out there to
 do this (although I'm not aware of any)..
 

 Yeah, that's kind of what I thought. Maybe a program that can split
 each message back into an individual file? Would procmail even help
 here? Or even a simple shell script that looks for '^From ', redirects
 it to a file, runs spamassassin -d on it, then re-runs SA on each
 file? I could then concatenate each of them back together and pass it
 through sa-learn.
   

That sounds like a good plan.

If you google around for mbox split or mbox splitter you can find
some sample code out there that does it. It's all just simple code
looking for the ^From  boundary.
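As an alternative to hand-rolling the '^From ' parsing, Python's stdlib mailbox module can do the split in a few lines (a minimal sketch; the function name, paths, and file naming scheme are arbitrary):

```python
import mailbox
import os

def split_mbox(mbox_path, out_dir):
    """Write each message in an mbox out as an individual .eml file."""
    os.makedirs(out_dir, exist_ok=True)
    # mailbox.mbox handles the '^From ' message boundaries for us.
    for i, msg in enumerate(mailbox.mbox(mbox_path)):
        out = os.path.join(out_dir, 'msg%04d.eml' % i)
        with open(out, 'wb') as f:
            f.write(msg.as_bytes(unixfrom=True))
```

Each resulting file can then be fed to spamassassin or sa-learn individually.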



Re: Re-running SA on an mbox

2009-09-20 Thread Matt Kettler
Theo Van Dinter wrote:
 You probably want spamassassin --mbox. :)
 It won't modify the messages in-place, but you can do something like
 spamassassin --mbox infile > outfile.

 If you're talking about sa-learn, though, it also knows --mbox.
   
Yes, but he's got mixed spam and nonspam in one mbox. You've got to
split that before you can feed sa-learn.


 On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
   
 Yeah, that's kind of what I thought. Maybe a program that can split
 each message back into an individual file? Would procmail even help
 here? Or even a simple shell script that looks for '^From ', redirects
 it to a file, runs spamassassin -d on it, then re-runs SA on each
 file? I could then concatenate each of them back together and pass it
 through sa-learn.
 


   



Re: partial (lazy) scoring? (shortcircuit features)

2009-09-22 Thread Matt Kettler
ArtemGr wrote:
 I would like to configure Spamassassin to only do certain tests
 when the required_score is not yet reached.
 For example, do the usual rule-based and bayesian tests first,
 and if the score is lower than the required_score,
 then do the DCC and RAZOR2 tests.

 Is it possible?


   
Not exactly the way you describe, no.

SpamAssassin has a priority and a shortcircuit facility that provide a
vaguely similar functionality, but it doesn't really work exactly the
way you want.

Priority allows you to change the order in which rules are processed, so
you can make some rules run earlier, or later, than others. This part
fits your needs.

Shortcircuit allows you to stop processing when a particular rule fires.
However, it is strictly based on the rule firing, not the message score.
This part doesn't fit your needs.

Collectively they allow you to make some rules (ie: USER_IN_WHITELIST,
USER_IN_BLACKLIST) run first, and abort processing if they fire.
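For example, the whitelist case looks like this in local.cf (this mirrors the Shortcircuit plugin's documented usage; the plugin must be loaded, typically via v320.pre, for the shortcircuit keyword to be recognized):

```
# run the whitelist check before everything else, and stop
# scanning immediately if it fires
priority     USER_IN_WHITELIST  -1000
shortcircuit USER_IN_WHITELIST  on
```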

However, this doesn't really work for your scenario of delaying a few
rules and aborting if they're not needed.

I suppose there could be some kind of mod to the shortcircuit plugin to
do this, however it's a little dangerous from a false-positive
perspective, so the devs may not be very enthusiastic about adding it.

A long, long time ago, SpamAssassin had a feature where it would abort
as soon as a given score was hit. However, this introduced a problem
where it could cause false positives. A nonspam message might hit
several spam rules early in the processing, and drive the score over the
abort threshold, causing it to be tagged as spam. However, this could
prevent it from matching negative scoring rules that would push it back
under the spam threshold.

Now, that version of SA was a long time ago, and we didn't have any
priority going on, and it was also checking the score pretty often in
between rules.

In theory, a feature could be added to let you do something like this
(SA doesn't have this feature, but I'm proposing it could be added):

shortcircuit_if_score_above_at score priority

Which would let you do:

shortcircuit_if_score_above_at 5.0 99
priority RAZOR_CHECK 100
priority DCC_CHECK 100

You'd have to be careful about your priorities, as this will prevent any
nonspam rules with higher priority numbers from running, but it could
work for this scenario.

You could also prevent the rules from running on nonspam if they're
pointless as well with a similar score below feature:

shortcircuit_if_score_below_at -1.17 99

The highest score you can ever get out of both DCC and Razor (with the
current scores) is +6.17 (unlikely, but possible, assuming both e4 and
e8 have high cf's and DCC fires too). If the score is already below
-1.17, there's no way these rules can ever drive the score up enough to
be over 5.0 and make the message spam.

Obviously this would greatly depend on what rules you're running late.





Re: partial (lazy) scoring? (shortcircuit features)

2009-09-24 Thread Matt Kettler
Matus UHLAR - fantomas wrote:
 Matt Kettler mkettler_sa at verizon.net writes:
 
 In theory, a feature could be added to let you do something like this
 (SA doesn't have this feature, but I'm proposing it could be added):
   

 On 22.09.09 11:46, ArtemGr wrote:
   
 That would be a nice optimization: most of the spam we receive have a 10
 score. It seems a real waste of resource to perform all the complex tests
 (like distributed hashing or OCR-ing) on spam which is DNS and
 rule-detectable.
 

 You haven't read Matt's explanation of why it wasn't a good idea, did you?

 There are rules with negative scores, which can push the score back to
 ham, e.g. whitelist. Would you like to stop scoring before e.g. the
 whitelist is checked?
   
*You* obviously haven't read my message, which explains how this *can*
be done safely.




Re: 3.3.0 and sa-compile

2009-09-25 Thread Matt Kettler
to...@starbridge.org wrote:
 Benny Pedersen a écrit :
  On fre 25 sep 2009 13:38:19 CEST, to...@starbridge.org wrote

  I've tested with SA 3.2.5 and it's working fine with Rule2XSBody
  active. I've tried to delete compiled rules and compile again:
  same result.
  forget to sa-compile in 3.3 ?

 sa-compile has been run correctly with no errors (even in debug)



Re: SQL Bayes behavior

2009-09-29 Thread Matt Kettler
pm...@email.it wrote:
 Hi,

 I've few question about the behavior of Bayes and SQL. Before the
 questions, i've followed this tutorial 
 http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html that
 should be the same thing of this:
 http://spamassassin.apache.org/full/3.0.x/dist/sql/README.bayes, my db
 are updated constantly, so it should work.

 1- In the bayes_vars table i've only a row for amavis user.
 Theoretically is it a good choise to use only one db for all users of
 my domain? (if i've understood well, spamassassin use this single db
 to store Bayes for all users of my domain)
In theory, per-user is slightly more accurate than systemwide. However,
training is more important than granularity. So when it comes down to
it, unless you're ready to set up something where users can individually
report spam and nonspam (can be a bit tricky) you're probably better off
going with a single system-wide bayes database. At least this way if you
need to do some manual training, it's only one DB to train on and
everyone benefits.



 2- How can i use single Bayes db for each users? Should i use
 bayes_sql_override_username ? I don't know where to get the right
 username.
You'd need to get amavis to pass this to spamassassin. I don't know
enough about amavis to know if this is supported or not. Generally most
MTA layer integrations don't, and most MDA integrations do, but there's
lots of exceptions. Amavis is a MTA integration, but it might be one of
the exceptions.


 3- Every 10-15 seconds, the counts of ham_count or spam_count in the
 bayes_vars table increase without any users sending or receiving mail.
 So, is the behavior of spamassassin to analyze all mails present in all
 my users' Maildirs?
No. SpamAssassin has no concept that your users' maildirs even exist; it
will not scan them.

There are only 2 ways training occurs:

1) a message passes through SA during delivery, and gets auto-learned
due to the scoring criteria
2) someone (or some cronjob) calls sa-learn and explicitly feeds it mail.

And the only other way the counts could update would be during a
journal sync, which occurs only during message processing or calls to
sa-learn (the exact triggers are slightly different, but from a
high-level view they're more-or-less the same).
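As an illustration of way 2, a nightly cron-driven training job might look
like the following sketch (the Maildir paths and the idea of shared
training folders are assumptions for illustration, not something from this
thread):

```
#!/bin/sh
# Hypothetical nightly Bayes training job.
# Users drop misclassified mail into these shared training folders.
sa-learn --spam /var/mail/training/spam/cur/
sa-learn --ham  /var/mail/training/ham/cur/
# Flush the journal so the counts in bayes_vars update right away.
sa-learn --sync
```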

It seems strange you're seeing the counts increase without any incoming
mail... Are you *positive* nothing is arriving, or recently arrived and
is just finishing up being processed by SA?




 Thanks :)
 Marco





Re: New spamhaus list not included

2009-10-04 Thread Matt Kettler
Mike Cardwell wrote:
 SpamHaus announced a new list a couple of days back -
 http://www.spamhaus.org/news.lasso?article=646

 According to that page it returns results of 127.0.0.3

 I just took a quick look at 20_dnsbl_tests.cf and it doesn't seem to
 include it yet. Currently we have:

 RCVD_IN_SBL - 127.0.0.2
 RCVD_IN_XBL - 127.0.0.[45678]
 RCVD_IN_PBL - 127.0.0.1[01]

It was announced 2 days ago.. are you really surprised it's not in SA
proper yet? (2 days isn't really enough time to test a new RBL for
accuracy) :-)

That said, we do appreciate you passing along the announcement, and it
looks like Alex committed a rule for it to his sandbox for testing
shortly after your email and created bug 6215 to track it.

So, the ball is now rolling. Thanks much.



Re: SpamAssassin Ruleset Generation

2009-10-06 Thread Matt Kettler
poifgh wrote:
 I have a question about - understanding how are rulesets generated for
 spamassassin.

 For example - consider the rule in 20_drugs.cf : 
 header SUBJECT_DRUG_GAP_C   Subject =~
 /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i
 describe SUBJECT_DRUG_GAP_C Subject contains a gappy version of 'cialis'

 Who generated the regular expression
 /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i
   
Man, that's a good question. I wrote a large chunk of the rules in
20_drugs.cf, but not that one. ( I wrote the stuff near the bottom that
uses meta rules. ie:  __DRUGS_ERECTILE1 through DRUGS_MANYKINDS,
originally distributed as a separate set called antidrug.cf). As I
recall, there were 2 other people making drug rules, but it's been a
LONG time, and I forget who did it. Those rules were written in the
2004-2006 time frame when pharmacy spams were just hammering the heck
outa everyone.

 a. Is it done manually with people writing regex to see how efficiently they
 capture spams?
   
Yes. Many hours of reading spams, studying them, testing various regex
tweaks, checking for false positives, etc, etc.

mass-check is your friend for this kind of stuff.

One post from when I was developing this as a stand-alone set:

http://mail-archives.apache.org/mod_mbox/spamassassin-users/200404.mbox/%3c6.0.0.22.0.20040428132346.029d9...@opal.evi-inc.com%3e

Note: the comcast link mentioned in that message should be considered
DEAD. The antidrug set is no longer maintained separately from the
mailline ruleset, and hasn't been for years.


If you want to break the rules down a bit, here are some tips:

The rules are in general designed to detect common methods to obscure
text by inserting spaces, punctuation, etc between letters, and possibly
substituting some of the letters for other similar looking characters.
(W4R3Z style, etc)

The simple format would be to think of it in groupings. You end up using
a repeating pattern of (some representation of a character)(some kind of
gap sequence)(character)(gap)...etc.

.{0,2} is a gap sequence, although not one I prefer. I prefer
[_\W]{0,3} in most cases because it's a bit less FP-prone, but risks
missing things using small lower-case letters to gap.

You also get replacements for characters in some of those, like [A4]
instead of just A. Or, more elaborately..  [a4\xe0-\...@]

So this mess:

body __DRUGS_ERECTILE1  
/(?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}[ij1!|l\xEC\xED\xEE\xEF][_\W]{0,3}[a40\xe0-\...@][_\w]{0,3}[xyz]?[gj][_\W]{0,3}r[_\W]{0,3}[a40\xe0-\...@][_\w]{0,3}x?[_\W]{0,3}(?:\b|\s)/i


Could be broken down:

(?:\b|\s)   - preamble, detecting space or word boundary.
[_\W]{0,3}   - gap
(?:\\\/|V)   - V
[_\W]{0,3}   - gap
[ij1!|l\xEC\xED\xEE\xEF] - I
[_\W]{0,3}   - gap
[a40\xe0-\...@]   - A
[_\W]{0,3}   - gap
[xyz]?[gj]   - G (with optional extra garbage before it)
[_\W]{0,3}   - gap
r    - just R :-)
[_\W]{0,3}   - gap
[a40\xe0-\...@]   - A
[_\W]{0,3}   - gap
x?   - optional garbage
[_\W]{0,3}   - gap
(?:\b|\s)- suffix, detecting space or word boundary.

Which detects weird spacings and substitutions in the word Viagra.
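The same grouping technique can be sketched in runnable form. This is a
simplified, ASCII-only approximation in Python, not the actual SA rule
(whose accented-character classes the archive has obfuscated anyway):

```python
import re

GAP = r"[_\W]{0,3}"  # up to 3 underscores/punctuation chars between letters

# Simplified gap-pattern for "viagra": (char class)(gap)(char class)(gap)...
pattern = re.compile(
    r"(?:\b|\s)" + GAP
    + r"(?:\\/|v)" + GAP    # V, or a "V" drawn as backslash-slash
    + r"[ij1!|l]" + GAP     # I and common look-alikes
    + r"[a4@0]" + GAP       # A and look-alikes
    + r"[xyz]?[gj]" + GAP   # G, with optional garbage before it
    + r"r" + GAP
    + r"[a4@0]" + GAP
    + r"x?" + GAP           # optional trailing garbage
    + r"(?:\b|\s)",
    re.IGNORECASE,
)

for sample in ("viagra", "V-i.a-g.r-a", "v1agra", "vigor"):
    print(sample, bool(pattern.search(sample)))
```

The first three samples hit; "vigor" does not, which is the kind of
false-positive check the mass-check testing described above is for.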


 But how are the rules generated themselves? 
   
Mostly meatware, except the sought rules others have mentioned.
 Thnx
   



Re: Valid mail from blacklisted dynamic IPs

2009-10-08 Thread Matt Kettler
MySQL Student wrote:
 Hi,

 I have a set of users that are authorized to use the mail server via
 pop-before-smtp, but SA catches the mail they send through the system
 as spam because they are on blacklisted Verizon or Comcast IPs:

 X-Spam-Status: Yes, hits=5.4 tag1=-300.0 tag2=5.0 kill=5.0
  use_bayes=1 tests=BAYES_50, BOTNET, FH_HOST_EQ_VERIZON_P, RCVD_IN_PBL,
  RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL
   
Does your pop-before-smtp method cause your MTA to indicate they've been
authed in the Received: header?
 I also don't understand how SPF_SOFTFAIL could happen when there
 wasn't any SPF record to test to begin with.
   
Are you sure? What was the envelope from domain for the message? (keep
in mind, this checks the envelope from, not the from header..)

 One of the Comcast users:

 X-Spam-Status: Yes, hits=6.4 tag1=-300.0 tag2=5.0 kill=5.0
  use_bayes=1 tests=BAYES_50, BOTNET, DYN_RDNS_SHORT_HELO_HTML, HTML_MESSAGE,
  RCVD_IN_PBL, RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL,
  SUBJ_ALL_CAPS

 We are working on better Bayes training, but sans that problem, what
 is the right way to address this, through a rule that whitelists their
 specific IP?

 Another mail that I'm dealing with is one sent by Marriott that hit
 SARE_HTML_URI_REFID, DCC_CHECK, and AE_DETAILS_WITH_MONEY, among being
 whitelisted by JMF/HOSTKARMA. I don't know how it hit DCC when there
 are details in there specific to the user, including account numbers,
 user names, etc. 

Some of DCC's signatures are fuzzy, thus will match similar messages
with minor differences. This is done to avoid spammers bypassing by
simply adding a text counter to the message, or some other similar bit
to make each one unique. Combine that with DCC being strictly a measure
of "bulkiness", not "spamminess", and you most likely have your answer.

You could run it through dccproc to see which of DCC's signatures matched.
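To make "fuzzy" concrete: the idea is to normalize away the parts that
vary per recipient before checksumming. This toy sketch is not DCC's
actual algorithm, just an illustration of why per-user account numbers
don't defeat a bulk-mail checksum:

```python
import hashlib
import re

def fuzzy_digest(body: str) -> str:
    """Collapse case, digits, and whitespace before hashing, so copies
    of one bulk mailing that differ only in per-recipient numbers
    produce the same checksum. Illustrative only; DCC's real fuzzy
    checksums are considerably more sophisticated."""
    normalized = re.sub(r"\d+", "#", body.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.sha256(normalized.encode()).hexdigest()

# Two "personalized" copies of the same bulk template:
a = fuzzy_digest("Dear guest, account 12345 earned 800 points.")
b = fuzzy_digest("Dear guest, account 98765 earned 120 points.")
print(a == b)  # the per-recipient details no longer distinguish them
```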

As for dealing with it:
 1) whitelist Marriott at the SA level (as you suggest)
 2) whitelist Marriott at the DCC level
 3) remove or severely cut back the score of AE_DETAILS_WITH_MONEY, if
    you ever actually expect to get important email about traveling to
    the UAE.
   
Personally I strongly recommend the third option if you're likely to get
emails about travel to the UAE. That rule (with the IMO overly strong
3.0 score that floats around) is really designed for people who would
never travel there, but get hammered with spam offering trips there. For
folks that might actually do so, maybe 0.5 is more appropriate.


 How should I go about allowing this type of mail
 without disrupting its ability to block mail that should be blocked
 with these rules? I'm sure I can add a rule subtracting points if it
 hits these and comes from Marriott, but I thought there might be
 something that could address the more general problem rather than this
 specific one from Marriott. Perhaps I'm making it too hard.

 Thanks,
 Alex


   



Re: results in languages other than english

2009-10-08 Thread Matt Kettler
ahattarki wrote:
 The spamassassin report comes back in English. Is this configurable to
 return results in languages other than English?

 Also can a single spamassassin handle returning results in different
 languages. One user gets the results back in English, while another gets the
 results back in Korean all on the same instance of SpamAssassin ??

 thanks,
 Anjali
   
SA reads the LANG environment variable when it runs, and if it matches
one of the extra language sets (see 30_text_*.cf in the ruleset), then
it will use that text set.

At present, there's no Korean translation set, but it's not difficult to
write your own; look at some of the other files for examples.

As for switching per-user on the fly, AFAIK SA isn't set up for that.
In part, this would require the SA instance to maintain strings for all
language sets in memory at the same time. Right now, if I remember
right, it only loads strings for the language it is set for at the time
the ruleset is parsed during load.




Re: How can i block blank messeage mail

2009-10-22 Thread Matt Kettler
cofe2003 wrote:
 I find SA will not scan a mail if the message is blank.

 So, I want to score all blank-message mails as 6.00.

 How can I do that?
 My SA version is 3.1.7.
 Thanks
   
That's odd. SA should scan it, unless it is so blank there aren't even
any headers.

How have you integrated SA into your system?

Something like this should work for the scoring:

rawbody  MSG_BODY_EMPTY !~ /./
describe MSG_BODY_EMPTY  Message has no body text
score MSG_BODY_EMPTY 6.0



Re: update does not work correctly?

2009-10-23 Thread Matt Kettler
klop...@gmx.de wrote:

 Hi,

  

 I use SpamAssassin 3.2.5 with CentOS.

 On October 20 I started an update with this command:

 sa-update --channel updates.spamassassin.org

 When I run the update now, the date of the folder and file in
 /var/lib/spamassassin/3.002005 does not change. It is still
 October 20.

 Does anyone have an idea why the date does not change?

Updates are published as needed, which at times means there may be
updates every day, and other times there may be several months between
releases.

In general spam signatures are fairly broad and generic, and need to be
updated *MUCH* less often than virus signatures. Virus signatures target
a single virus at a time, thus need updating for every new variant,
hence the very frequent releases. SA rules target a generic trait of a
message, and only need updating when there is radical change in the spam
stream.
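Rather than watching directory timestamps, you can let sa-update's exit
status tell you whether anything new was published. A sketch for a cron
wrapper (double-check the exact exit codes against the man page for your
version):

```
#!/bin/sh
# 0 = an update was downloaded and installed, 1 = no fresh updates,
# higher values indicate an actual error (see man sa-update).
sa-update --channel updates.spamassassin.org
case $? in
  0) echo "rules updated; restart spamd to pick them up" ;;
  1) echo "no new rules published since last run" ;;
  *) echo "sa-update failed" >&2 ;;
esac
```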

Looking at the SVN tags, the last update to rules for the 3.2 branch was
pushed back on July 20th.





Re: New to Spamassassin. Have a few ?s...

2009-11-08 Thread Matt Kettler
Computerflake wrote:
 I'm looking into a free spam filter that can do the following. Will
 Spamassassin do these things?

 1) Will it filter multiple domains so I can filter for many different
 companies?
   
Sure. Depending on how you set it up, you can even have per-domain
customization of the whole ruleset.
 2) Will it send individual users an email once a day (for example) to inform
 them of the spam that was captured in case they were not actually spam?
   
Directly? No.. SpamAssassin, by itself, is really just a scanning engine
with header modification abilities. It does not do email management,
quarantines, etc at all. It receives a message, evaluates it, and
modifies it based on the results, nothing more, nothing less.  (this is
done to make SA flexible.. it's a mail pipe, so you can glue it into
almost anything.)

Generally matters like this are handled by integration tools such as
MailScanner, amavisd-new, etc, although I do not know of any that
provide comprehensive quarantine management. That said, I've never
desired such, so I've not looked at length for one. (I mostly just tag
mail, and let users filter at the client level as they see fit.)

See also:
http://wiki.apache.org/spamassassin/IntegratedInMta

 3) Will it allow users to add people to an individual whitelist so they can
 handle their own spam settings?
   
Yes, provided the tools integrate it in a per-user manner.
 4) I understand it connects in to ClamAV using a plugin. How easy is it to
 install the plugin so I can also scan for viruses for folks? 
   
Personally, I'd suggest letting an integration tool call ClamAV and
SpamAssassin independently. The clamav plugin for SA is functional, and
not difficult to set up, but it's not what I would consider an ideal
solution. All it does is cause viruses to show up as an SA rule named
CLAMAV. However, since SpamAssassin can't drop mail directly, you'll
still need to get an integration tool to detect that marker in the
header and delete the message.
 Thanks for any help. I don't want to spend a fortune on a spam filter if I
 can find a free filter that will do everything I would need. 
   



Re: About log generation

2009-11-08 Thread Matt Kettler
Jose Luis Marin Perez wrote:
 Dear friends,

 Is there some configuration of SA to generate different logs, one for
 each mail domain?
spamd, like most well behaved unix daemons, uses syslog. It doesn't
write logfiles directly.

The old-school approach to this would be to run several instances of
spamd, one per domain, have each log to a separate local* syslog
facility, and have syslogd write each to a separate logfile.
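A sketch of that old-school approach (the facility names, ports, and log
paths are illustrative assumptions):

```
# Run one spamd per domain, each on its own port and syslog facility:
#   spamd --port 784 --syslog local3   # domain-a
#   spamd --port 785 --syslog local4   # domain-b
#
# /etc/syslog.conf -- route each facility to its own file:
local3.*    /var/log/spamd-domain-a.log
local4.*    /var/log/spamd-domain-b.log
```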

A more modern approach might be possible using some of the newer
syslogds that can be configured based on message content, not just
facility.severity. However, that assumes you can tell the domain from
the log message alone; I'm not sure offhand whether spamd includes that
info in its syslog messages.

 The antispam system analyzes emails from different domains and what I
 want is to generate statistics for each domain.

 Thanks

 Jose Luis



Re: Development dead

2009-11-11 Thread Matt Kettler
Anatoly Pugachev wrote:
 On 04.11.2009 / 09:20:16 -0500, Bowie Bailey wrote:
   
 polloxx wrote:
 
 Hi,

 Is the spamassassin development dead?
 On the website there's: 2008-06-12: SpamAssassin 3.2.5 has been released.
   
   
 Not quite.  If you look at svn, you'll see this:

 spamassassin_20091103151200.tar.gz  03-Nov-2009 15:12  2.1M

 Doesn't look dead to me!  :)
 

 Hello!
 Can you please post a full URL to this archive? 
 Since http://svn.apache.org/snapshots/spamassassin/ doesn't have it.


   
The snapshots directory is automatically built and old versions are
purged. The November 3rd image is gone. Now we've got ones from the 10th
and 11th. By the time you look at it again, these might be gone and
newer ones may have replaced them.

[   ] spamassassin_20091110151200.tar.gz 10-Nov-2009 15:12  2.1M 
[   ] spamassassin_20091110211200.tar.gz 10-Nov-2009 21:12  2.1M 
[   ] spamassassin_20091111031200.tar.gz 11-Nov-2009 03:12  2.1M 
[   ] spamassassin_20091111091200.tar.gz 11-Nov-2009 09:12  2.1M 

However, if you're really just looking to gauge development activity, it
would be better to look at the list archives of all the SVN commits.

http://mail-archives.apache.org/mod_mbox/spamassassin-commits/

or, for the current month of November 2009, sorted by date:

http://mail-archives.apache.org/mod_mbox/spamassassin-commits/200911.mbox/date


Re: Relation between MAIL FROM: and From:

2009-11-12 Thread Matt Kettler
Luis Daniel Lucio Quiroz wrote:

 Hi All,

 I'm wondering if someone knows whether it is possible to stop this
 using SA. Look:

MAIL FROM and From: are commonly mismatched in legitimate mail.

For example, every message that you receive from this list (and every
other sanely configured mailing list) will have an apache.org address in
the MAIL FROM, and the sender in the From:. That's because apache is
remailing, and should receive all DSN's, but they are not the originator
of the message.

There's quite a few other scenarios where mismatches occur outside of
spam. Perhaps you should look more closely at your nonspam email.






Re: Problem with sa-blacklist

2009-11-21 Thread Matt Kettler
Michael Monnerie wrote:
 I can't reach Bill Stearns, so I try at this list:

 Dear Bill,

 I'm using the sa-blacklist.reject for postfix since a long time, but 
 these last days your rsync doesn't work anymore:
 rsync: failed to connect to rsync.sa-blacklist.stearns.org: Connection 
 timed out (110)

 So I had a look if something changed on 
 http://www.sa-blacklist.stearns.org/sa-blacklist/
 but obviously the information there is quite old: If I download the sa-
 blacklist.current.reject, it has a version of April: 200904171539
 while my last rsync version is 200910142031

 Any chance for a fix?

 mfg zmi
   
SA-blacklist and sa-blacklist-uri are both dead as far as use within
SpamAssassin goes. Although someone updated it in 2009, for all
practical purposes it's use as a SA ruleset has been dead (or at least
dying) since 2004. (when the WS sub-list of surbl.org was created)

While it was an interesting case study, it is *VERY* inefficient,
and will kill most servers. Any use of it should be restricted to
research purposes only (i.e.: reading the list manually to study
patterns in emerging spam domains). It is too heavyweight to use under
SpamAssassin.

The plain sa-blacklist was not very effective, and consumed lots of
memory (750MB per spamd instance?). This list worked on the From:
address of the message, which spammers recycle very quickly. This means
lots of addresses, a huge list, and very low hitrate due to low re-use.
Plain and simple waste of memory to use it under SA. (although manually
looking at the list does have some uses... as noted above..)

The URI version has become the WS list over on surbl. This version had
better hitrates, but the very large list consumed large amounts of
memory too. Also, searching this huge list as a large number of regular
expressions is so computationally intensive that most systems can
complete a DNS lookup against surbl.org before the regexes finish
running. It is not unheard of for this ruleset to add 10 or more seconds
to message processing, in addition to the over 1 gig of ram it consumes.
Sure a more recent server with more CPU beef and fast ram could probably
complete it in 3 seconds or so, but that is still slower than a DNS lookup.

Most admins are not willing to devote several gigs of ram just for their
SpamAssassin instances. I doubt you are either, so please don't use
sa-blacklist.

Unless you're looking to use it as a data set for analysis purposes, it
is dead, and has been for a long time. The valuable parts have evolved
into parts of SURBL, which is already in SpamAssassin, unless you're
dealing with a version that is over 4 years old.


Re: SpamAssassin 3.3

2009-11-22 Thread Matt Kettler
LuKreme wrote:
 Is there a roadmap for the release of SA 3.3? 
   
Probably the best roadmap would be to look at the list of bugs assigned
against 3.3.0

https://issues.apache.org/SpamAssassin/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&version=3.3.0

 a best guess on when it might be released? 
when it's done...

It shouldn't be too terribly long now before a beta is released, based
on reading some of the latest dev list traffic. However, exactly how
long that is depends a lot on how much free time the team has.
 A URL I should be reading instead of posting to the list?
   
You can always browse the dev list archives. There's often good tidbits
on there (and often lots of noise too, but...)

http://mail-archives.apache.org/mod_mbox/spamassassin-dev/






Re: X_Report_Header

2009-11-29 Thread Matt Kettler
Daniel D Jones wrote:
 Running 3.2.5 under Debian Etch.

 I'm trying to add the Spamassassin X_Report_Header.   Per the website at

 http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html

 report_safe ( 0 | 1 | 2 ) (default: 1)
 ...
 If this option is set to 0, incoming spam is only modified by adding some X-
 Spam- headers and no changes will be made to the body. In addition, a header 
 named X-Spam-Report will be added to spam. You can use the remove_header 
 option to remove that header after setting report_safe to 0.


 I have the option set to 0 in /etc/spamassassin/local.cf and the
 remove_header option is not configured in any of the files in that
 directory.  I'm getting the X-Spam-Score, X-Spam_score_int, X-Spam_bar,
 etc. headers but I am not getting the Report header.  I've been unable
 to find anything on the web as to why this might be.  Any assistance
 appreciated.

   
That sounds like your headers are not being generated by SpamAssassin...
SpamAssassin cannot generate headers starting with X-Spam_. They have to
start with X-Spam- (note dash instead of underscore).

What happens when you run a message through SA on the command-line? ie:
  spamassassin < testmsg.txt

Are you using something like this exim integration:
http://www.debianhelp.org/node/10614

In which case, you'll have to edit that exim script, because SA isn't
generating headers in that kind of setup.







Re: Scoring for DATE_IN_FUTURE_96_XX

2009-12-01 Thread Matt Kettler
Thomas Harold wrote:
 On 11/30/2009 9:27 PM, Thomas Harold wrote:
 While looking at the scores in 50_scores.cf, I noticed the following:

 score DATE_IN_FUTURE_03_06 2.303 0.416 1.461 0.274
 score DATE_IN_FUTURE_06_12 3.099 3.099 2.136 1.897
 score DATE_IN_FUTURE_12_24 3.300 3.299 3.000 2.189
 score DATE_IN_FUTURE_24_48 3.599 2.800 3.599 3.196
 score DATE_IN_FUTURE_48_96 3.199 3.182 3.199 3.199
 score DATE_IN_FUTURE_96_XX 3.899 3.899 2.598 1.439

 Why does the 96+ hour rule score so much lower than the 48-96 hour test
 for the last two entries?

 (I'm also wondering if there should be an even higher score rule for
 stuff over 168 hours in the future or past.)

 I did dig up the following thread from back in Oct '06...

 http://mail-archives.apache.org/mod_mbox/spamassassin-users/200611.mbox/browser


 I'm guessing that what it boils down to is contained in the wiki page?
 The spam is better off caught by another rule once network tests are
 allowed?
Yep, since SA is scored as a set, score stealing between rules is
pretty common when there's a lot of overlap between two rules and one
performs slightly better than the other. It's also possible for there to
be more complicated cascades where one rule affects another, which in
turn affects a third, which affects a fourth...

Also, looking at the above scores, there are likely no network tests
that cover the same spam as 48_96, because its score is pretty much the
same across all four score sets.

On average, the scores of all non-network spam rules should go down a
little bit when the network tests are enabled, because there are more
rules in the set competing for score. However, since the distribution of
hits across rules is distinctly not random, you'll see a lot of
non-average cases, which means some rules will be:
 - staying the same, because they cover mail the network tests don't
 - going down radically, due to heavy overlap
 - going up, because they correct false negatives in some of the
   non-spam network tests.

 http://wiki.apache.org/spamassassin/HowScoresAreAssigned




Re: Clear Database Question

2009-12-04 Thread Matt Kettler
Jason Carson wrote:
 Hello everyone,

 Is it necessary to clear the database...

 sa-learn --clear

 ...before I run the following to train SpamAssassin's bayesian classifier...

 sa-learn --spam /home/jason/.maildir/.Spam/cur/

   
No. That would be ill advised.

Running --clear deletes your entire bayes database, which can take a
long time to recover from. I would only advise using it if you've
decided all your previous training is worthless, or your database
becomes corrupted.

Also be sure to consider that once you clear the database SA will
deactivate bayes until 200 spam and 200 nonspam messages get trained.

SpamAssassin will automatically make room when it needs to by pushing
out the least popular tokens through the expire process (which you can
manually trigger via the sa-learn --force-expire command, but it
normally checks during message processing twice a day)







Re: Language detection in TextCat

2009-12-06 Thread Matt Kettler
Marc Perkel wrote:
 I'm wondering if the language detection in TextCat can be improved.
 Here's the situation.

 It appears that TextCat was designed to be inclusive. You list the
 languages you want and it returns many possibilities so as not to
 trigger falsely on unwanted matches.

 What I'm doing is extracting the language list for Exim where I hope
 to offer a language reject list. The problem is that when you are
 rejecting languages you want a smaller list than when you are
 including languages, to avoid false positives. I'd rather have a single
 (non-english) result.

 I'm wondering if there's a way to add some more options to alter the
 behavior of the plugin so it is more optimized towards the idea of
 rejecting languages?


The language detection would have to be radically redesigned to have
enough accuracy to support this.

Currently TextCat is a *very* crude match, and will often return
multiple languages for plain English text.

Textcat is not designed to decide what language the email is, but to
find a set of languages it *might* be. It is very prone to declaring
extra languages that are not really present due to its design.

This is useful in the "if it can't be my language, then it's garbage"
sense, but not so useful in a "reject if it could be this language I
don't like" sense. You'd really want "reject if it *IS* this language I
don't like", but textcat doesn't tell you what language an email is,
only a set of what it might be.



Re: Possible to whitelist *all* incoming emails that contain specific text in the subject line?

2009-12-08 Thread Matt Kettler
nathang wrote:
 Hi,

 I'd like to setup an email account in cPanel so that I receive *all*
 incoming emails that contain a specific word in the subject line.

 It would be critical that I get 100% of the emails sent to me (that contain
 a specific word in the subject line), and that none of them get trapped by a
 spam filter or whatnot, as these emails would signify my paying customers
 with their order details.

 I know that you can whitelist individual email addresses, but is it possible
 to whitelist based on subject line text?

 If this possible to do in cPanel / WHM, how would I go about doing it?

 Thanks!

   
Assuming a non-ancient version of SA (3.1.0 or higher), the
whitelist_subject plugin should be loaded.

So you can just add this to your configuration (i.e.: local.cf):
  whitelist_subject customer

which would whitelist any email with the word customer in the subject.

As for doing it in cPanel / WHM... no clue, I've never used either tool.




Re: Sharing and merging bayes data?

2009-12-17 Thread Matt Kettler
On 12/17/2009 2:50 AM, Rajkumar S wrote:
 Hello,

 I have 2 SA servers running for a single domain. Both were primed with
 a set of 200 spam and ham messages and are now auto-learning. After
 about a day both have auto learned different numbers of ham and spam
 mails. Is it possible to merge the bayes data every night and update
 both servers with new merged data?

 with regards,

 raj

   

No. If you're using file-based bayes, there's no good way to share
updates between one DB and the other. The information needed to make
such a merger successful isn't stored, because it is not needed for any
reason within SpamAssassin. The database merely stores the token, its
spam count, its nonspam count, and a last-seen timestamp. If you look
at the same token in 2 different databases, you can't really merge these
counts, because you don't know how many occurred since your last merge.

 If you really want common bayes data between two servers, you should
configure bayes to use a SQL server (MySQL, etc) and point both
SpamAssassin configurations to the same database. This also has the
benefit that both servers are continuously in-sync.



Re: Sharing and merging bayes data?

2009-12-17 Thread Matt Kettler
On 12/17/2009 11:17 AM, RW wrote:
 If you're using file-based bayes, there's no good way to share
  updates between one DB and the other. The information needed to make
  such a merger successful isn't stored, because it is not needed for
  any reason within SpamAssassin. The database merely stores the token,
  it's spam count, it's nonspam count, and a last-seen timestamp. If
  you look at the same token in 2 different databases, you can't really
  merge these counts, because you don't know how many occurred since
  your last merge.
 
 I'm not saying it's a good idea, but it is possible provided that you
 retained the result of the previous merge. It should be simple to
 script too.

   
Agreed. I didn't mean to say that a merge is impossible; it's just not
possible with the tools that SA comes with, and you need more info than
just what's in the current database.

As you mentioned, you'd need a custom script (not wildly complicated
for a good perl scripter, but beyond the reach of someone with only
crude scripting skills), as well as historical copies of each database
from the last merge.
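The reconciliation such a script would perform is ordinary three-way
merge arithmetic. A sketch (not SA code; "base" stands for the token's
count in the retained snapshot from the last merge):

```python
def merge_counts(a: int, b: int, base: int) -> int:
    """Three-way merge of one token's spam (or ham) count across two
    Bayes databases that diverged from a common snapshot 'base'."""
    # Sum each server's delta since the last merge onto the base count.
    return base + (a - base) + (b - base)

# Server A saw 3 new hits and server B saw 5 since the snapshot of 10:
print(merge_counts(13, 15, 10))
```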

Setting up SQL would be much easier.





Re: spamassassin or spamd with amavisd-new?

2010-01-05 Thread Matt Kettler

On 1/5/2010 6:09 AM, Angel L. Mateo wrote:
 Hello,

 Because of the FH_DATE_PAST_20XX bug, I have found that when I run
 spamassassin through amavisd-new (on a postfix server) I need to
 restart spamassassin and amavisd-new after any change to spamassassin.

 Debugging this, I found that amavisd-new doesn't connect to my
 spamd daemon to check mails, so I think it is using the spamassassin
 command instead of spamc (I have spamd running in the foreground,
 without the -d option, and I haven't seen any connection).

 However, I have read in spamassassin that spamc has better
 performance than spamassassin, so I would like amavisd-new to use
 spamc instead of spamassassin.

 I don't know much about amavisd-new and spamassassin implementation
 details, but I have found that amavisd-new connects with spamassassin
 through its perl interface by creating a Mail::SpamAssassin object like
 this:

  my($spamassassin_obj) = Mail::SpamAssassin->new({
     debug => $sa_debug,
     save_pattern_hits => $sa_debug,
     dont_copy_prefs   => 1,
     local_tests_only  => $sa_local_tests_only,
     home_dir_for_helpers => $helpers_home,
     stop_at_threshold => 0,
  });

 Do you know if there is any option to tell perl object to use the
 spamd daemon? Is there any way to use spamd daemon with amavis? Is it
 worth in a mail gateway with hugh loads?


Stop, you do NOT need to do this. It would be slower.

Amavisd-new does not use the spamassassin command-line application
(which is really slow); it loads the perl API directly and re-uses
that API instance, which is even more efficient than spamc. You don't
see the perl API method discussed very often because it only makes sense
when using an integration tool written in perl (which amavis is). In
effect, amavisd-new is already its own spamd daemon using this method.
Invoking spamc on the command line would add more overhead to this process.
 
Really, all spamd does is create a reusable instance of a
Mail::SpamAssassin perl object, and keeps it loaded so it can process
several messages that spamc feeds it. This is exactly what amavisd-new
is already doing internal to its own code, so it doesn't need spamd.

Running spamassassin on the command line is really slow, because it
creates a new Mail::SpamAssassin object, scans a single message, and
exits. This is great for quick checks of the configuration, but not at
all efficient in a mailstream. However, amavisd-new does not do this. It
creates and re-uses a Mail::SpamAssassin object.

Read the main page of the amavis website:

http://www.ijs.si/software/amavisd/

which points out:

when configured to call Mail::SpamAssassin (this is optional), it
orders SA to pre-load its config files and to precompile the patterns,
so performance is at least as good as with spamc/spamd setup. All Perl
modules are pre-loaded by a parent process at startup time, so forked
children need not re-compile the code, and can hopefully share some
memory for compiled code;






Re: ALL_TRUSTED rule no longer working

2010-01-05 Thread Matt Kettler
On 1/5/2010 8:03 PM, Julian Yap wrote:
 Previously I was running SpamAssassin-3.1.8_1 on FreeBSD.

 I recently upgraded to 3.2.5_4.

 It seems that now I never get any hits on the rule ALL_TRUSTED.

 Previously it seemed like SA was doing some kind of dynamic evaluation
 which was working well.

 - Julian

is NO_RELAYS or UNPARSEABLE_RELAY also hitting?

In older versions of SA, ALL_TRUSTED was really implemented as "no
untrusted relays", so it would fire if there were no relays at all, or
no parseable ones. This caused problems with ALL_TRUSTED matching spam when
people ran SA on servers with malformed headers.

Later we changed it to fire only if there are:
- at least one trusted relay
- no untrusted relays
- no unparseable relays

That change might be the cause of your problem.
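The three conditions above amount to a simple boolean check. The sketch
below is a paraphrase for illustration only; the subroutine and
parameter names are hypothetical, not SA's actual internals:

```perl
# Hypothetical paraphrase of the newer ALL_TRUSTED condition.
sub all_trusted_fires {
    my ($trusted, $untrusted, $unparseable) = @_;   # relay counts
    return ($trusted >= 1) && ($untrusted == 0) && ($unparseable == 0);
}
```

Under the old "no untrusted" logic, a message with zero parseable relays
could still fire ALL_TRUSTED; under the new logic it cannot, which is why
hosts producing malformed headers stopped hitting the rule.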






Re: ALL_TRUSTED rule no longer working

2010-01-06 Thread Matt Kettler
On 1/6/2010 3:43 PM, Julian Yap wrote:

 On Tue, Jan 5, 2010 at 5:12 PM, Matt Kettler mkettler...@verizon.net
 mailto:mkettler...@verizon.net wrote:

 On 1/5/2010 8:03 PM, Julian Yap wrote:
 Previously I was running SpamAssassin-3.1.8_1 on FreeBSD.

 I recently upgraded to 3.2.5_4.

 It seems that now I never get any hits on the rule ALL_TRUSTED.

 Previously it seemed like SA was doing some kind of dynamic
 evaluation which was working well.

 - Julian

 is NO_RELAYS or UNPARSEABLE_RELAY also hitting?

 In older versions of SA, ALL_TRUSTED was really implemented as no
 untrusted, so it would fire off if there were no relays, or no
 parseable ones. This caused problems with ALL_TRUSTED matching
 spam when people ran SA on servers with malformed headers.

 Later we changed it to fire if there is:
 -at least one trusted relay
 -no untrusted relays
 -no unparseable relays.

 Which might be the cause of your problem.


 NO_RELAYS gets no hits but UNPARSEABLE_RELAY is working.

 Should I be getting some hits on NO_RELAYS?

 Thanks for the further explanation.

 - Julian

Neither of these rules should *EVER* fire. They both indicate error
conditions.


