Re: AWL functionality messed up?
Linda Walsh wrote: Bowie Bailey wrote: Linda Walsh wrote: I got a really poorly scored piece of spam -- one thing that stood out as weird was that the report claimed the sender was in my AWL.

Any sender who has sent mail to you previously will be in your AWL. This is probably the most misunderstood component of SA. Read the wiki: http://wiki.apache.org/spamassassin/AutoWhitelist

At face value, this seems very counterproductive.

It's obvious you're taking it at face value and you've not read the URL above. You're seeing "whitelist" in the name and believing it. Sorry the name is misleading, but the AWL is not a whitelist.

If I get spam from 1000 senders, they all end up in my AWL??? WTF?

You're leaping to wildly incorrect conclusions, mostly because you're assuming the AWL is a whitelist. It's not. *READ* the URL above. No, really, READ IT. You don't understand the AWL yet.

The AWL should only be added to by emails judged to be 'ham' via the feedback mechanisms -- spammers shouldn't get bonuses for being repeat senders...

Who says they get bonuses just for being a repeat sender? They get bonuses or penalties, all depending. The AWL isn't a whitelist, Linda. It's an averager. It can whitelist or blacklist messages. If a sender sends a message that scores less than their previous average, they get a positive AWL score (blacklisting). If they send one that scores higher, they get a negative score (whitelisting). HOWEVER, a simple look at the positive or negative sign of the AWL score doesn't really tell you much. Take this example: pre-AWL score +12, AWL -2, final score +10. What did the AWL think of this sender based on history? +6: spammer. If the same sender instead sent a message with a pre-AWL score of +4, the AWL would hit at +1.0, resulting in a final score of +5.0. End result: same sender, different messages, different signs on the AWL score, but both are still tagged as spam. And in one example, a false negative was avoided based on the sender's history.
How do I delete spammer addresses from my 'auto-white-list'?

spamassassin --remove-addr-from-whitelist=...@example.com

(That's just insane.. whitelisting spammers?!?!)

No, what's insane is that the AWL is named "AWL", because it's not a whitelist. It's really a history-based score-averaging system with automatic whitelisting and blacklisting effects. However, AHBSASWAWB is an awfully long name. I *REALLY* suggest you read up on how the AWL works, for real, before jumping to conclusions about what it is and what it does. It really doesn't work the way you think.
Re: my AWL messed up?
Linda Walsh wrote: To be clear about what is being whitelisted, would it hurt if the brief report for the AWL, instead of:

-1.3 AWL AWL: From: address is in the auto white-list

had:

-1.3 AWL AWL: 'From: 518501.com' addr is in auto white-list

so I can see what domain it is flagging with a 'white' value? I don't know of any emails from '518501.com' that wouldn't have been classified spam, so none should have a negative value.

What was the final message score in this example? Looking at the AWL score alone is meaningless, and doesn't show what the AWL thinks the historical average is. If the final score was over 6.3, the AWL still thought the sender was a spammer. It's just splitting the averages.
Re: Problem with check_invalid_ip()
Eric Rodriguez wrote: Hi, I'm having trouble with the check_for_illegal_ip subroutine in RelayEval.pm. See http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/RelayEval.pm?view=logr1=451385pathrev=451385 After a couple of tests, it seems that 193.x.x.x and 194.x.x.x IPs are not valid with respect to the regexp. Is this a bug, or am I wrong about the test? I used http://www.fileformat.info/tool/regex.htm with RegExp: (?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+ Tests: 127.0.0.1 192.168.1.1 87.248.121.75 193.1.1.1 194.1.1.1 Could someone explain to me which IPs are valid according to this test? Thanks, Eric Rodriguez

Using the above tool I get results telling me that 193.1.1.1 and 194.1.1.1 do NOT match, and therefore are valid IPs:

Test  Target String  matches()  replaceFirst()  replaceAll()  lookingAt()  find()  group(0)
1     193.1.1.1      *No*       193.1.1.1       193.1.1.1     No           No
2     194.1.1.1      *No*       194.1.1.1       194.1.1.1     No           No

In fact, NONE of your test strings match the regex. But 127.1.1.1, correctly, does.
Re: Problem with check_invalid_ip()
Eric Rodriguez wrote: Hi, I removed the negation (~), and the begin (^) and end ($) characters, from the original source:

sub check_for_illegal_ip {
  my ($self, $pms) = @_;

  foreach my $rcvd ( @{$pms->{relays_untrusted}} ) {
    # (note this might miss some hits if the Received.pm skips any invalid IPs)
    foreach my $check ( $rcvd->{ip}, $rcvd->{by} ) {
      return 1 if ($check =~ /^
          (?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)
          \.\d+\.\d+\.\d+
        $/x);
    }
  }
  return 0;
}

Here are my results:

Test  Target String   matches()  replaceFirst()  replaceAll()  lookingAt()  find()  group(0)
1     127.0.0.1       No         12              12            No           Yes     7.0.0.1
2     192.168.1.1     No         19              19            No           Yes     2.168.1.1
3     87.248.121.75   No         8               8             No           Yes     7.248.121.75
4     193.1.1.1       No         193.1.1.1       193.1.1.1     No           No
5     194.1.1.1       No         194.1.1.1       194.1.1.1     No           No

If I understand correctly, the first 3 tests are valid IPs, but not 193.1.1.1 and 194.1.1.1?? Eric Rodriguez

No, none of the 5 has Yes in the matches() column, so they're all valid (i.e.: none of them matches the regex). The other columns are irrelevant to the application here. Please ignore them unless you fully understand them.
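For anyone who wants to reproduce this without the web tool, here is a quick sanity check in Python (not Perl, but the regex dialect is close enough for this pattern). The pattern is the anchored expression from the RelayEval.pm excerpt above, with the lookahead's dots left unescaped as in the original:

```python
import re

# The anchored check_for_illegal_ip pattern from RelayEval.pm, as quoted
# in the thread (dots inside the 127.0.0. lookahead left unescaped).
illegal_ip = re.compile(
    r'^(?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)'
    r'\.\d+\.\d+\.\d+$'
)

for ip in ['127.0.0.1', '192.168.1.1', '87.248.121.75',
           '193.1.1.1', '194.1.1.1', '127.1.1.1']:
    print(ip, 'illegal' if illegal_ip.match(ip) else 'ok')
```

Only 127.1.1.1 matches (flagged illegal); all five of Eric's test addresses, including 193.1.1.1 and 194.1.1.1, do not, which agrees with the web tool's matches() column.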
Re: tests= SIZE_LIMIT_EXCEEDED ??
Stefan-Michael Guenther wrote: Hi, I just had a closer look at the header of an email which should have been recognized by spamassassin as spam. What I found was this:

X-SpamScore: 0 tests= SIZE_LIMIT_EXCEEDED

I have checked /usr/share/spamassassin/ for a rule which might contain a size limit, but didn't find any. A search with Google didn't help either. So, any suggestions from the list members where I can define the size that has been exceeded? Thanks, Stefan

Interesting. Are you just using spamc/spamd, or a different integration tool? In general it sounds like something decided not to feed the message to the main SpamAssassin instance at all. Spamc can do this, but I didn't know it added a test when doing so. Also, X-SpamScore is not a default header, and not one that SA could add itself (it must add headers beginning with X-Spam-, so X-Spam-Score would be the closest), so I suspect this was added by your integration tools.
Re: Unsubscribe
Mike Yrabedra wrote: unsubscribe

If you look at the message headers, there's a header explaining where to send unsubscribe messages (this is the RFC-standard header for doing this, so look for it in other mailing lists too):

List-Unsubscribe: mailto:users-unsubscr...@spamassassin.apache.org
Re: Unsubscribe
Michael Scheidell wrote: Since we saw two of them come in pretty much back to back, I suspect a joe job of some type. Those people might not have subscribed.

That would be a bit tricky to pull off as just a joe job. This list is confirmed opt-in, i.e.: if you subscribe, an automated bot from ezmlm sends you a message that you need to reply to to get subscribed. Well, actually all you have to do is send a second message to a different address that contains randomly generated text as a magic cookie. But still, you need to know that randomly generated address. Of course, there's always the possibility someone guessed the random text in the reply address.. but, good luck.

They start off a bit like this (note: I've munged the email address, and changed the values of the magic text and serial number, but I have not changed the length. I substituted letters for letters, and numbers for numbers. Otherwise, this is the start of a real confirm message.)

Hi! This is the ezmlm program. I'm managing the users@spamassassin.apache.org mailing list. To confirm that you would like exam...@example.com added to the users mailing list, please send a short reply to this address: users-sc.1244818352.jacibredcfjnkiobdtef-example=example@spamassassin.apache.org Usually, this happens when you just hit the reply button. ...
Re: Custom Rule Sets
rich...@buzzhost.co.uk wrote: Good morning, Looking at the docs I see a 'don't add your custom rules here' warning in reference to the default /usr/share/spamassassin dir. Instead it lists a couple of options, including local.cf. Is it possible to ask local.cf to include external files/dirs for custom rules at all?

Yes, there is an include directive (see the Mail::SpamAssassin::Conf docs), but by default SA will load *ALL* .cf files from your site rules directory (usually /etc/mail/spamassassin), so includes at the local.cf level are a bit silly. Just put extra .cf files in the same directory and SA will load them. Generally speaking, the include directive is only used at the user_prefs level, where a single file is parsed by default, not a whole directory. See also: http://wiki.apache.org/spamassassin/WritingRules
Re: Custom Rule Sets
rich...@buzzhost.co.uk wrote: On Mon, 2009-06-22 at 00:26 -0400, Matt Kettler wrote: rich...@buzzhost.co.uk wrote: Good morning, Looking at the docs I see a 'don't add your custom rules here' warning in reference to the default /usr/share/spamassassin dir. Instead it lists a couple of options, including local.cf. Is it possible to ask local.cf to include external files/dirs for custom rules at all?

Yes, there is an include directive (see the Mail::SpamAssassin::Conf docs), but by default SA will load *ALL* .cf files from your site rules directory (usually /etc/mail/spamassassin), so includes at the local.cf level are a bit silly.

I agree - but the docs seem to imply that you should not put them in here - hence my confusion.

Where do they imply you should not create additional .cf files?
Re: Custom Rule Sets
rich...@buzzhost.co.uk wrote: On Mon, 2009-06-22 at 07:30 -0400, Matt Kettler wrote: rich...@buzzhost.co.uk wrote: On Mon, 2009-06-22 at 00:26 -0400, Matt Kettler wrote: rich...@buzzhost.co.uk wrote: Good morning, Looking at the docs I see a 'don't add your custom rules here' warning in reference to the default /usr/share/spamassassin dir. Instead it lists a couple of options, including local.cf. Is it possible to ask local.cf to include external files/dirs for custom rules at all?

Yes, there is an include directive (see the Mail::SpamAssassin::Conf docs), but by default SA will load *ALL* .cf files from your site rules directory (usually /etc/mail/spamassassin), so includes at the local.cf level are a bit silly.

I agree - but the docs seem to imply that you should not put them in here - hence my confusion.

Where do they imply you should not create additional .cf files?

It does not. I've already covered that and thanked a poster earlier for guiding me in my error. Did you not read the follow-up I posted? About 20 seconds after I replied..

Sorry, just waking up for the AM here... Didn't think to read the rest of the thread.
Re: Unable to update SARE
Frank Bures wrote: Since yesterday, when running sa-update --channelfile /etc/mail/spamassassin/sare-sa-update-channels.txt --gpgkey 856AA88A I get Use of uninitialized value in concatenation (.) or string at /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/Scalar/Util.pm line 30. An example line from sare-sa-update-channels.txt: 70_sare_adult.cf.sare.sa-update.dostech.net Any ideas will be greatly appreciated. Why are you trying to update SARE? You might want to read the front page of the website: http://www.rulesemporium.com/
Re: Use of uninitialized value $dir in scalar chomp at /usr/local/bin/spamd line 2118, GEN103 line 2.
alexus wrote: On Thu, Apr 23, 2009 at 4:08 PM, alexus ale...@gmail.com wrote: On Wed, Apr 8, 2009 at 12:50 AM, Matt Kettler mkettler...@verizon.net wrote: alexus wrote: I keep getting this line in my logs every time spamd is called: Apr 8 03:55:15 mx1 spamd[36109]: Use of uninitialized value $dir in scalar chomp at /usr/local/bin/spamd line 2118, GEN103 line 2. I don't suppose this is normal.

Are you using the -v parameter when you start spamd, but passing a username that's not a vpopmail user with working vuserinfo? Code:

if ( $opt{'vpopmail'} ) {
  my $vpopdir = $dir;
  $dir = `$vpopdir/bin/vuserinfo -d \Q$username\E`;
  if ($? != 0) {
    # If vuserinfo failed, $username could be an alias
    $dir = `$vpopdir/bin/valias \Q$username\E`;
    if ($? == 0 && $dir !~ /.+ - /) {
      $dir =~ s,.+ - (/.+)/Maildir/,$1,;
    } else {
      undef($dir);
    }
  }
  chomp($dir);
}

i even tried with a vpopmail user instead of the spamd user, I still get this warning -- http://alexus.org/

Sorry for getting back to an older post, but I never got around to fixing this issue, and I think it should be fixed... can someone suggest how to resolve it? Let me recap: every time an email comes in, I get the following line in my syslog: spamd[30649]: Use of uninitialized value $dir in scalar chomp at /usr/local/bin/spamd line 2118, GEN990 line 2. That's how I run spamd: root 1736 0.0 0.5 70044 40568 ?? SsJ 23May09 3:53.05 /usr/local/bin/spamd --allow-tell --daemonize --vpopmail --username=spamd --socketpath=/tmp/spamd.sock --pidfile /usr/local/var/run/spamd.pid (perl)

Ok, so you're running with vpopmail, and spamd is running as the spamd user. So, what virtual user are you passing to spamc's -u parameter? What happens when you run vuserinfo and pass that username to it?
Re: 552 spam score (11.3) exceeded threshold
John Hardin wrote: On Mon, 22 Jun 2009, Paweł Tęcza wrote: Yesterday I was trying to send here a warning of a new www.shopXX.net spam flood. It was a short letter with a few URLs to pastebin.com. Unfortunately my message hasn't arrived at the mailing list. What's up? Do I really look like a spammer? ;)

It's a bad idea to pass SA list email through SA...

All mail on the list passes through SA anyway, albeit with a high required_score (10.0); the Apache Software Foundation (ASF) runs it on their mailservers, and this list is hosted by them. Check the headers. It's a little unfortunate in that it makes posting spam samples a pain, but we're not the only project using these servers.
Re: gpg signed spam email ???
RobertH wrote: i was reading at http://www.karan.org/blog/ specifically http://www.karan.org/blog/index.php/2009/06/15/gpg-signed-spam that he recv'd a gpg signed spam email. ive never heard of that before, yet i havent thought much about it or studied it... Q: is this unheard of, or common? near as i can quickly investigate, it doesnt appear to be common as per papa google [sic]. comments? feedback? just trying to get up on the curve now.

Well, let's put it this way: A long, long time ago, SA had a rule in the default set giving a negative score to PGP- and GPG-signed messages. Quickly, spammers started adding enough fragments of a signature to match the rule. This was very obvious, as the rule only matched the begin clause, and the spams had a begin clause dropped at the bottom of the message with no end clause. The rule could have been modified to validate the signature, but of course anyone can GPG sign a message and have it be valid, and the spammers probably would have done so if the rule changed. Therefore, the rule was dropped from the set entirely.

GPG signatures only validate that the sender has the private key that matches the public one signing the email. Like SPF, and many other authentication-only technologies, this doesn't tell you anything about the sender. Even perfect authentication at best only provides confirmation of who the sender is, and most of these technologies only prove a sender is the proper holder of some abstract identity like a key or domain. Authentication needs to be paired with recognition to be meaningful. If a sender proves who they are, will you immediately accept the email without further question? What if they just proved they were Alan Ralsky?
http://www.spamhaus.org/rokso/listing.lasso?-op=cnspammer=Alan%20Ralsky

Moral of the story: don't assign negative scores to systems that only provide authentication, unless you're somehow pairing it with proof that the sender is someone you actually trust (or at least is trusted by a service you trust, etc.). Ever notice that the negative score of SPF_PASS is insignificantly small? There's a reason for that: spammers can pass SPF too, so by itself it's meaningless. But paired with your explicit trust of a domain or sender, it provides forgery-resistant whitelisting (whitelist_from_spf).
Re: gpg signed spam email ???
True, it likely is. But it would also be trivial for the spammer to generate a valid one. Given what we've seen with the image spams in the past (a custom-generated image for *every* email, with random font, size, color, offset, and randomized dots added on), computational power is hardly an obstacle. As before, you might be able to write a plugin to check the signature and assign positive points if it is invalid, but I don't know if that would work long enough to be worthwhile.

Justin Mason wrote: there's a very good chance the GPG signature in this case was fake -- ie. a cut-and-paste job. --j. On Sat, Jun 27, 2009 at 19:05, Matt Kettler mkettler...@verizon.net wrote: ...
Re: RulesDuJour
Anshul Chauhan wrote: Do we have to copy KAM.cf to /usr/share/spamassassin for its integration with spamassassin, or is something else to be done? I'm using spamassassin-3.2.5-1.el4.rf on CentOS 4.7.

Any add-on rules should be placed in the same directory as your local.cf (i.e.: /etc/mail/spamassassin/ in most cases). SA reads *.cf from this directory, not just local.cf. Adding files to /usr/share/spamassassin, or making changes to files present there, is not a good idea: when SpamAssassin gets upgraded, that whole directory will be nuked by the installer.
Re: perms problems galore
Gene Heskett wrote: Greetings all; I _thought_ I had sa-update running ok, but it seemed that the effectiveness was stagnant, so I found the cron entry that was running sa-update, discovered a syntax error there which, when I fixed it, disclosed that I had all sorts of perms problems that I don't seem to be able to fix readily. sa-update is being run as the user saupdate, which is a member of the group mail. I have made the whole /var/lib/spamassassin/keys tree saupdate:mail, with very limited rights, as in:

drw------- 2 saupdate mail 4096 2008-12-19 16:05 keys

But sa-update appears not to have perms to access or create gpg keys there.

[r...@coyote init.d]# su saupdate -c /usr/bin/sa-update --gpghomedir /var/lib/spamassassin/keys
gpg: failed to create temporary file `/var/lib/spamassassin/keys/.#lk0xb9bfb8a8.coyote.coyote.den.8955': Permission denied

What do I need to open that up to? Thanks.

To access or create files in a directory, you need the x (execute/search) permission on it; your keys directory has rw but no x. That said, why give the saupdate user the ability to add keys at all? Import them as root and only give the saupdate user read access.
Re: perms problems galore
Gene Heskett wrote: Ok, I'll fix that, thanks.

That said, why give the saupdate user the ability to add keys at all? Import them as root and only give the saupdate user read access.

Basically, since I run myself as root, I was trying to reduce the exposure. All the rest of the routine mail handling here is by unprivileged users. And it is all behind a dd-wrt firewall with NAT.

True, but installing keys isn't something that should be routine. This should only be possible manually, i.e.: sa-update does not need to create or write to the key file to perform an update. If you're concerned about exposure, it's really best that your automatic saupdate user not have write rights over the key file; it doesn't need them.
Re: Annoying auto_whitelist
Michelle Konzack wrote: Hello, while I currently get several 1000 shop/meds/pill/gen spams a day and some are going through my filters, I have to move them to my spam folder manually and feed them to sa-learn --spam, but this does not work... ...because the spammer From: is in the auto_whitelist.

Wait a second. The AWL has nothing to do with bayes or sa-learn. The only reason SA won't learn a message as spam would be if it has already been learned as spam, as noted in the bayes_seen database (or corresponding SQL table).

For me, this seems to be a bug, because sa-learn has to remove the From: from the auto_whitelist and then RESCAN this crap.

Um, the AWL has nothing to do with sa-learn --spam, and this action will neither consult nor modify the AWL. What makes you think the AWL is inhibiting learning? The AWL is actually going to contain *EVERY* sender that ever sent you email (because it is an averager, not a whitelist), so if it inhibited learning, you'd never be able to learn anything.
Re: Annoying auto_whitelist
Michelle Konzack wrote: Hello, while I currently get several 1000 shop/meds/pill/gen spams a day and some are going through my filters, I have to move them to my spam folder manually and feed them to sa-learn --spam, but this does not work... ...because the spammer From: is in the auto_whitelist. For me, this seems to be a bug, because sa-learn has to remove the From: from the auto_whitelist and then RESCAN this crap.

Is the AWL actually causing false negatives? Please be aware the AWL is NOT a whitelist or a blacklist, and the scores don't really work quite the way they look. The AWL is essentially an averager, and as such, it's sometimes going to assign negative scores to spam. This does *NOT* necessarily mean the AWL has whitelisted the sender, unless it pushes the score below the required_score. It just means that this spam scored higher than the last one. i.e.: if a spam scoring +20 gets a -5 AWL, the AWL still believes the sender is a spammer with a +10 average. If that same sender had instead sent a message scoring 0, the AWL would have given them a +5. Please be sure to read http://wiki.apache.org/spamassassin/AwlWrongWay before you make too many judgments about what the AWL is doing. Looking at the score it assigns alone does not tell you anything about what the AWL is doing.
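To make the arithmetic concrete, here's a tiny model of the averaging. This is a hypothetical sketch, not SA's actual Perl implementation; it assumes the default auto_whitelist_factor of 0.5, which moves each score halfway toward the sender's stored mean:

```python
# Hypothetical sketch of the AWL's averaging arithmetic, NOT SA's real code.
# Assumes auto_whitelist_factor = 0.5 (the documented default).

def awl_adjustment(history_mean, pre_awl_score, factor=0.5):
    """AWL score added to a message: a fraction of the gap between the
    sender's historical mean and this message's pre-AWL score."""
    return factor * (history_mean - pre_awl_score)

mean = 10.0  # sender's stored average: a known spammer

# A spam scoring +20 is pulled DOWN (negative AWL), but stays spam:
print(awl_adjustment(mean, 20.0))         # -5.0
print(20.0 + awl_adjustment(mean, 20.0))  # final +15.0

# The same sender sending a 0-point message is pushed UP (positive AWL):
print(awl_adjustment(mean, 0.0))          # +5.0
print(0.0 + awl_adjustment(mean, 0.0))    # final +5.0
```

Same sender, opposite signs on the AWL score, yet the AWL's opinion of the sender (a +10 average) never changed; that's why the sign alone tells you nothing.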
Re: Current Rules Repository
Patrick Sherrill - Coconet wrote: With SARE et al. not being updated, where is the best repository of current rules being maintained?

The default sa-update channel.
Re: Never ending spam flood www.viaXX.net?
rich...@buzzhost.co.uk wrote: On Fri, 2009-07-10 at 21:26 +1200, Jason Haar wrote: On 07/10/2009 09:01 PM, Paweł Tęcza wrote: Please see my initial post on Pastebin: http://pastebin.com/f6a83e9fb

If it's true that all those domains resolve to just a handful of IP addresses, then why aren't they listed in - oh wait - SURBLs don't cover IPs, just the DNS names - argh! Is there a way to do SURBL lookups of the IP instead of the FQDN?

Is there not some kind of 'intent' plugin for SA? Barracuda (which steals everything else) has an intent scanner that looks at links in mails and resolves the name to the IP *AND* the AUTH NS, then looks up the IPs found.

SA has always avoided resolving forward lookups of potentially spammer-controlled domains to IPs. This is extremely foolish to do, as it opens you up to a variety of attacks against your DNS resolver (resolver cache poisoning, DoS, etc).

I can't believe they wrote it themselves - seriously I can't! What plugin is it?

It's no plugin I know of, but it's a feature we intentionally left out of SA for security reasons. So given that it's a really bad idea, I'd guess Barracuda did implement it themselves.
Re: Never ending spam flood www.viaXX.net?
Steve Freegard wrote: Matt Kettler wrote: ... SA has always avoided resolving forward lookups of potentially spammer-controlled domains to IPs. This is extremely foolish to do, as it opens you up to a variety of attacks against your DNS resolver (resolver cache poisoning, DoS, etc). It's no plugin I know of, but it's a feature we intentionally left out of SA for security reasons. So given that it's a really bad idea, I'd guess Barracuda did implement it themselves.

Are you forgetting URIBL_SBL?? That requires the A or NS records of the URI to function.

We do NS only. Not A.
Re: Annoying auto_whitelist
RW wrote: On Fri, 10 Jul 2009 12:33:51 +0200 Matus UHLAR - fantomas uh...@fantomas.sk wrote: On Sat, 04 Jul 2009 08:56:35 -0400 Matt Kettler mkettler...@verizon.net wrote: Please be aware the AWL is NOT a whitelist or a blacklist, and the scores don't really work quite the way they look. The AWL is essentially an averager, and as such, it's sometimes going to assign negative scores to spam.

And it works from its own version of the score that ignores whitelisting and bayes scores. So if learning a spam leads to the next spam from the same address getting a higher bayes score, that benefit isn't washed out by the AWL.

On 04.07.09 22:42, RW wrote: I take that back, I thought the BAYES_XX rules were ignored by the AWL, but they aren't. Personally I think BAYES should be ignored by the AWL; emails from the same from address and IP address will have a lot of tokens in common. They should train quickly, and there shouldn't be any need to damp out that learning.

I don't think so. Teaching BAYES is a good way to hint to the AWL which way it should push scores. By ignoring bayes, you could move much spam the ham-way, since much spam isn't caught by scores other than BAYES, and vice versa.

Right, but that's only a benefit if the BAYES score drops - remember it's an averaging system. Personally I have only a single spam in my spam corpus that has an AWL hit and doesn't hit BAYES_99, and that one hits BAYES_95. Sending multiple spams from the same from address and IP address is a gift to Bayesian filters. The much more common scenario is that the first spam hits BAYES_50 and subsequent BAYES_99 hits are countered by a negative AWL score.

Technically, this only counters half the score. It also gets paid back later: it raises the stored average that will apply to subsequent messages. I'd also argue it's a rather rare case. Most of my spam hits BAYES_99 the first shot around, and most has varying sender addresses and IPs.
The odds of one having an increasing score and the same sender address/IP seem extraordinarily unlikely to me. Besides, the real problem there isn't the AWL, but the fact that the first message scored low. Are you really seeing cases where this is causing false negatives, or are you just pontificating about what's possible?
Re: deactivate all checks except specific tests
sebast...@debianfan.de wrote: Hello, I have set up a virtual server for experiments. I want to disable all the spamassassin tests except one specific RBL - in this case, the manitu RBL. Is there a parameter for disabling all the tests?

There is no option to disable all rules. However, you could use the -C parameter to either spamd or spamassassin and point SA to a directory that does not contain any rule files except a single .cf containing the rule you want to run. This would effectively remove the stock ruleset from the parse.
Re: Opt In Spam
Have you reported the abuse to habeas@abuse.net, as Neil Schwartzman from Return Path (operators of Habeas) requested last time? Just posting to the sa-users list isn't really going to do very much. If there are pervasive FP problems, it will show up in the mass-checks and we'll drop the score.

twofers wrote: And yet another SPAM from these opt-in guys. I believe this group are nothing but covert spammers abusing a privilege afforded them. I receive these spams at two separate email addresses, both of which I use exclusively for my business; there is no way I'd use these addresses as an opt-in for anything. They are not personal emails and I'd never consider using them as opt-in for anything. I don't opt in for anything ever to begin with anyway.

X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on H67646.safesecureweb.com
X-Spam-Level:
X-Spam-Status: No, score=0.6 required=5.0 tests=HABEAS_ACCREDITED_SOI, HTML_IMAGE_RATIO_02,HTML_MESSAGE,LOCAL_URI_NUMERIC_ENDING,MISSING_MID, MPART_ALT_DIFF,SARE_UNSUB09 autolearn=no version=3.2.1
X-Spam-Report:
 * 0.0 MISSING_MID Missing Message-Id: header
 * 1.3 SARE_UNSUB09 URI: SARE_UNSUB09
 * 2.0 LOCAL_URI_NUMERIC_ENDING URI: Ends in a number of at least 4 digits
 * 0.0 HTML_MESSAGE BODY: HTML included in message
 * 1.1 MPART_ALT_DIFF BODY: HTML and text parts are different
 * 0.6 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area
 * -4.3 HABEAS_ACCREDITED_SOI RBL: Habeas Accredited Opt-In or Better
 *      [66.59.8.161 listed in sa-accredit.habeas.com]
Received: (qmail 17894 invoked from network); 15 Jul 2009 12:21:13 -0400
Received: from mailengine.8lmediamail.com (66.59.8.161) by mail.jelsma.com with SMTP; 15 Jul 2009 12:21:12 -0400
Received-SPF: pass (mail.jelsma.com: SPF record at mailengine.8lmediamail.com designates 66.59.8.161 as permitted sender)
Received: by mailengine.8lmediamail.com (PowerMTA(TM) v3.2r23) id hbo0ve0eutci for embroid...@x.com; Wed, 15 Jul 2009 09:14:23 -0700
(envelope-from streamsendboun...@mailengine.8lmediamail.com)
Content-Type: multipart/alternative; boundary=_--=_1073964459106330
MIME-Version: 1.0
X-Mailer: StreamSend - 23361
X-Report-Abuse-At: ab...@streamsend.com
X-Report-Abuse-Info: It is important to please include full email headers in the report
X-Campaign-ID: 20812
X-Streamsendid: 23361+362+1918562+20812+mailengine.8lmediamail.com
Date: Wed, 15 Jul 2009 09:14:24 -0700
From: Paul DiFrancesco: Eight Legged Media efly...@8lmediamail.com
To: embroid...@x.com
Subject: Visit with over 25 suppliers

This is a multi-part message in MIME format.
Re: Underscores
twofers wrote: How can I pattern match when every word has an underscore after it? Example: This_sentenance_has_an_underscore_after_every_word I'm not really good at Perl pattern matching, but \w and \W see an underscore as a word character, so I'm just not sure what might work. body =~ /^([a-z]+_+)+/i Is that something that will work effectively? Thanks. Wes I'd do something like this: body MY_UNDERSCORES /\S+_+\S+_+\S+/ Unless you really want to restrict it to A-Z. Regardless, ending any regex in + in an SA rule is redundant. Since + allows a one-instance match, it will devolve to that. You don't need to match the entire line with your rule, so the extra matches are redundant. It will match the first instance, and that's all it needs to be a match. Also, any regex ending in * should just have its last element removed, as that will devolve to a zero-count match.
Re: sa-update errors
MrGibbage wrote: I get errors like this when I run sa-update from cron:

/usr/local/bin/setlock -n /tmp/cronlock.4051759.53932 sh -c $'/home/skipmorrow/bin/sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org'
gpg: WARNING: unsafe ownership on homedir `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys'
gpg: failed to create temporary file `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/.#lk0x5d7320.ps11651.23686': Permission denied
gpg: keyblock resource `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/secring.gpg': general error
gpg: failed to create temporary file `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/.#lk0x5d7320.ps11651.23686': Permission denied
gpg: keyblock resource `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/pubring.gpg': general error
gpg: no writable keyring found: eof
gpg: error reading `/home/skipmorrow/share/spamassassin/sa-update-pubkey.txt': general error
gpg: import from `/home/skipmorrow/share/spamassassin/sa-update-pubkey.txt' failed: general error

But when I run it from a login shell, it doesn't show those errors. So I wrote a script to verify that the cron job is running as the correct user by putting in whoami, and indeed it is running as skipmorrow.

skipmor...@ps11651:~$ ls etc/mail/spamassassin/sa-update-keys/ -la
total 28
drwx------  2 skipmorrow pg652 4096 Jul 20 00:00 .
drwxr-xr-x  3 skipmorrow pg652 4096 Jul 17 13:29 ..
-rw-------  1 skipmorrow pg652 5123 Jul 17 14:29 pubring.gpg
-rw-------  1 skipmorrow pg652 4505 Jul 17 13:32 pubring.gpg~
-rw-------  1 skipmorrow pg652    0 Jul 17 13:29 secring.gpg
-rw-------  1 skipmorrow pg652 1200 Jul 17 13:29 trustdb.gpg
skipmor...@ps11651:~$ ls .gnupg/ -la
total 24
drwx------  2 skipmorrow pg652 4096 Jul 10 13:27 .
drwxr-x--x 30 skipmorrow pg652 4096 Jul 20 03:48 ..
-rw-------  1 skipmorrow pg652 4128 Jul 10 13:27 pubring.gpg
-rw-------  1 skipmorrow pg652 3039 Jul 10 13:27 pubring.gpg~
-rw-------  1 skipmorrow pg652    0 Jul 10 13:27 secring.gpg
-rw-------  1 skipmorrow pg652 1200 Jul 10 13:27 trustdb.gpg

should sa-update be looking for keys in ~/.gnupg? No, it should not be looking in .gnupg. That would be the location for keys you use. The keys used by sa-update are application specific, so why would you want them on the keyring you use for email? Or is it working correctly? Well, it's not working correctly, as you're having errors :) What environment variable does sa-learn and gnupg look for that would be present in my login shell but not be present when running in a cron environment? I don't think it's missing an environment variable. Are you sure the cronjob is running with an effective userid of skipmorrow? This message: gpg: failed to create temporary file `/home/skipmorrow/etc/mail/spamassassin/sa-update-keys/.#lk0x5d7320.ps11651.23686': Permission denied strongly suggests you've got a permissions issue, where the cronjob is running as a user that can't create files in /home/skipmorrow/etc/mail/spamassassin/sa-update-keys/ . Since skipmorrow has rwx, that suggests the cronjob is running as some other userid (probably cron or some other system account).
Re: WEb Frontend for SQL Bayes
Luis Daniel Lucio Quiroz wrote: Is there a good frontend that lets me admin SQL Bayes? mysql-admin? What are you looking to do as far as administering bayes? (particularly, what would you be doing more than once every 2-3 years?)
Re: Spamcheck and how it affects bayes question
Gary Smith wrote: We have a process in place using the perl CPAN module for invoking SA. This is outside of the scope of the normal mail system. Basically we use this to see what scores emails would generate for some statistical stuff. The spam engine this calls is set to use -100 as the score so that everything is considered spam. Our production spam engine is set to 7. We are looking at the score that the perl module returns and logging it (rather than the isspam flag). To complicate things a little more, we are using MySQL for the bayes store. This store is also used by our production boxes. This isn't the problem, just what we are doing. The CPAN module has this as the description: public instance (\%) process (String $msg, Boolean $is_check_p) Description: This method makes a call to the spamd server and depending on the value of C<$is_check_p> either calls PROCESS or CHECK. Given that the perl call has a boolean option for PROCESS and CHECK, I would assume that they make some difference, but it really doesn't say what the difference is. Currently in our code we are calling it with a false value, which executes the PROCESS command. What I'm wondering is, will this throw off bayes if we keep doing this, as everything that SA is returning is considered spam? I'm just worried that these continued tests will cause bayes to get wacky. Also, should we be using PROCESS or CHECK when doing this type of check? Gary The bayes auto-learning system does not care what your required_score is set to, and does not care if messages are tagged as spam or not. It uses its own thresholds, and its own additional criteria for learning. So, feeding it lots of mail with the threshold set to -100 shouldn't matter at all.
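Those auto-learn thresholds are tunable independently of required_score; a minimal local.cf sketch (the values shown are the usual defaults, adjust to taste):

```
bayes_auto_learn 1
# learn as ham only if the pre-bayes score is at or below this
bayes_auto_learn_threshold_nonspam 0.1
# learn as spam only if the pre-bayes score is at or above this
bayes_auto_learn_threshold_spam 12.0
```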
Re: WEb Frontend for SQL Bayes
Luis Daniel Lucio Quiroz wrote: On Tuesday, 21 July 2009 at 22:11:39, Matt Kettler wrote: Luis Daniel Lucio Quiroz wrote: Is there a good frontend that lets me admin SQL Bayes? mysql-admin? What are you looking to do as far as administering bayes? (particularly what would you be doing more than once every 2-3 years) No, not mysql-admin. I want to let a common user or admin teach SA about spam or ham. Fair enough; since you were specific about SQL, I was wondering if you were looking to do something SQL specific (ie: database compacts, etc). At this point you need a web frontend for sa-learn, and I don't really know of any. If you've got a per-user bayes setup, such a frontend could get a little messy (needing to authenticate and setuid to the right user prior to invoking sa-learn, etc)
Re: WEb Frontend for SQL Bayes
Benny Pedersen wrote: On Wed, July 22, 2009 04:18, Luis Daniel Lucio Quiroz wrote: Is there a good frontend that lets me admin SQL Bayes? sa-learn That's not exactly a web front-end, Benny.
Re: Subject Rules
twofers wrote: I'm writing rules for header Subject and have a rule question. I want a rule that would hit on specific words, no matter what order they were in. Would a rule written like the rule below accomplish that? Is the * redundant and not needed? Would a rule written like this be more efficient and faster than a rule where, say, each of these words was used in a separate individual rule? header LR Subject =~ /[independent]*[opportunity]*[luxury]*[cowhides]*[win]*[money]*[rep]*[save]*/i Thanks. Wes Well, I wouldn't say that * is redundant.. however, I would say this entire rule is silly and doesn't do what you want, and it's a little ambiguous what you're really trying to do. The whole rule devolves to being an empty regex (//) if you express the *'s as {0}, meaning this should match *any* text. I highly doubt that's what you meant. Also, you've put the words inside [], which turns them into character classes. [win] will match a single character: a w, an i, or an n, not the word win. I doubt that's what you want either. You probably meant to do something like this: header LR Subject =~ /\bindependent\b.*\bopportunity\b.*\bluxury\b.*\bcowhides\b.*\bwin\b.*\bmoney\b.*\brep\b.*\bsave\b/i But that will only match if all the words are used IN THAT ORDER. If you want to match all of them being used in arbitrary order, you'll have to use multiple rules and combine them with a meta rule. Or perhaps you were looking to detect if any one of them was used, which would be this rule: header LR Subject =~ /\b(?:independent|opportunity|luxury|cowhides|win|money|rep|save)\b/i Probably very false positive prone, but that works.
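The multiple-rules-plus-meta approach can be sketched like this, using two of the words (the rule names are made up for illustration; __ prefixed rules score nothing on their own):

```
header   __LR_WIN    Subject =~ /\bwin\b/i
header   __LR_MONEY  Subject =~ /\bmoney\b/i
meta     LR_WIN_MONEY (__LR_WIN && __LR_MONEY)
describe LR_WIN_MONEY Subject mentions both "win" and "money", in any order
score    LR_WIN_MONEY 0.5
```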
Re: DNSWL
twofers wrote: I get: * -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust and I read the dnswl.org home page, but I don't understand why this rule would get a -1.0 for a LOW trust rating. It just seems awkward to me; I think LOW trust would dictate a positive rating, say a 1.0 or higher. Any insights? Low doesn't mean it's a likely spam source, it means it's a nonspam source, but with less confidence than the higher tiers. Regardless, this test performed reasonably well in the 3.2 mass-checks:

OVERALL%  SPAM%   HAM%    S/O    RANK  SCORE  NAME
0.092     0.0058  0.2442  0.023  0.66  -1.00  RCVD_IN_DNSWL_LOW

(from http://svn.apache.org/repos/asf/spamassassin/branches/3.2/rules/STATISTICS-set3.txt) With an S/O of 0.023, that means that 97.7% of the email this rule hit was nonspam, and 2.3% was spam. With that S/O, I don't think -1 is an out-of-order score, particularly since the test set was spam biased (63.8% of the test email was spam)
Re: How can I view bayes score for individual words?
snowweb wrote: I tried to view the files bayes.toks, bayes.journal, bayes.seen and autowhitelist, but they just look like gibberish when opened in a unix editor. What's the solution to this? The bayes database stores truncated SHA1 hashes of the words; it is not reversible back to human-readable text using the database alone. This is done for performance reasons (fixed size tokens = faster random access), but has a side benefit of preventing your bayes DB from containing words that may imply things about your confidential emails. However, if you run a message through spamassassin with -D bayes=9 it should dump all the tokens in the message with their score from the bayes DB. I was hoping to be able to tweak some of the scores and add certain words etc. That would be a very misguided thing to do. Bayes is a statistical system, and statistics work better with real measurements, not biased numbers based on your own guesswork. The reality of things is that a learning statistics system based on email is really gathering statistics based on human behavior. Human behavior is *way* more complex than you think it is. :-) If you really want to tweak the score of some words, create static rules for them. Leave bayes to its own exacting measurements.
Re: anchor forgery
mouss wrote: Mike Cardwell wrote: Just checking through my Spam folder and I came across a message that contained this in the html: censored example, Verizon won't let me send it Yet, there was no mention of this obvious forgery in the spamassassin rules which caught the email. How would you create a rule which matched when the anchor text is a url which uses a different domain to the anchor href? This has been discussed a (very) long time ago. The outcome is that a mismatch also happens in legitimate mail. Not just happens, it happens quite a lot. Sometimes in nonspam the differences are easy to compensate for, like the link being to hosting.example.com, but the anchor text is www.example.com. Other times it's difficult to compensate for, where they first send you to a link at their ESP, which then redirects you to the actual site. Some ESPs prefer to do this, either for billing (charge extra for clicks) or spam control reasons (if the sender violates the ToS, the ESP will disable the redirect, which isn't much, but it does prevent the sender from profiting at the ESP's expense). Regardless of reasons, senders tend to make the text match what your browser will show after the redirect occurs, not the ESP target in some totally different domain.
Re: Score -71 for VERY spammy message!
snowweb wrote: Terry Carmen wrote: This is the result, X-Spam-Level: X-Spam-Status: No, score=-71.4 required=4.7 tests=HELO_DYNAMIC_IPADDR, HTML_IMAGE_ONLY_20,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3, MIME_HTML_ONLY,MISSING_DATE,MISSING_MID,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_PBL, RCVD_IN_SORBS_DUL,RCVD_IN_XBL,RDNS_NONE,RELAYCOUNTRY_PE,SARE_FROM_DRUGS, SARE_UNI,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL, USER_IN_WHITELIST autolearn=no version=3.2.4 X-Spam-Relay-Country: PE I can't understand what is going on here! How can it get a score like that? The message contained just an image and a link. -- USER_IN_WHITELIST Ah ok. I hadn't seen that. By that does it mean sender or user? The spammer is actually in my whitelist? Where can I check entries in my whitelist please? USER_IN_WHITELIST would be sender. Check your whitelist_from and whitelist_from_* statements in your local.cf. In particular, make sure you didn't make this common mistake: whitelist_from insert your own address or domain here Spammers *WILL* abuse this, regularly.
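When you do want to whitelist mail from a given domain, the safer form ties the address to the relay it actually comes through, so forged From: addresses don't qualify; a sketch (example.com and mail.example.com are placeholders for your own domain and its real outbound relay):

```
# DON'T do this -- spammers forge sender addresses, including your own domain:
# whitelist_from *@example.com

# Safer: only whitelist when the mail really relayed through the named host
whitelist_from_rcvd *@example.com mail.example.com
```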
Re: bayes not active although enabled?
snowweb wrote: Sorry, got mixed up. In /etc/mail/spamassassin/local.cf use_bayes 1 Is there anywhere else that I need to switch this on since it does not appear to be doing bayesian testing at all for any messages. check your sa-learn --dump magic SA won't activate bayes until it has learned at least 200 spam, and 200 nonspam messages. (under the general premise that until you have a decent amount of mail learned, the statistics are going to be a bit erratic and not worthwhile using)
Re: Score -71 for VERY spammy message!
Benny Pedersen wrote: On Sun, July 26, 2009 05:07, snowweb wrote: I can't understand what is going on here! How can it get a score like that? The message contained just an image and a link. add score whitelist_from 0.1 in user_prefs or local.cf restart spamd Um, that's not going to do anything except generate errors Benny. 1) There is no rule named whitelist_from to assign a score to. The rule name is USER_IN_WHITELIST, not whitelist_from. So you'd have to do: score USER_IN_WHITELIST 0.1. 2) Doing this completely defeats all the user-configured static whitelisting in SA. You'd be better off removing all your whitelist_from statements instead. i.e.: don't be stupid and treat the symptoms when it's just as easy to treat the cause.
Re: whitelist_from questions
MySQL Student wrote: Hi, I'm looking at an email that appears to be from one of the users in the whitelist, but instead was from: From probesqt...@segunitb1.freeserve.co.uk Mon Jul 27 19:49:19 2009 Why can't a comparison be made between the From: info and the actual sender? Is this because of virtual domains and/or users? It's not done because this mismatch happens for nearly every mailing list in existence (including this one). Every message you get from this mailing list is From: the poster, but the envelope is from the apache list server's bounce handler. The To: header and RCPT TO: mismatch for similar reasons (To: will be the list, but RCPT TO will be your mailbox).
Re: AutoWhiteList
--[ UxBoD ]-- wrote: Hi, Where can I find sa-awlUtil, as it does not appear to be in the download file? Best Regards, Hmmm, it looks like someone has been editing the wiki in ways that don't match anything in any released or unreleased version of SA. The tool is named check_whitelist. There's been talk of changing AWL stuff to not reference the word whitelist, but AFAIK, this hasn't even been done in the unreleased 3.3 code. Regardless, you can fetch check_whitelist from SVN: http://svn.apache.org/repos/asf/spamassassin/branches/3.2/tools/
Re: Parallelizing Spam Assassin
rich...@buzzhost.co.uk wrote: On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote: On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk wrote: Imagine what Barracuda Networks could do with that if they did not fill their gay little boxes with hardware rubbish from the floors of MSI and supermicro. Jesus, try and process that many messages with a $30,000 Barracuda and watch support bitch 'You are fully scanning too much mail and making our rubbish hardware wet the bed.' LOL. Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate. I apologise for any language deemed offensive. Whilst 'Jesus', 'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for openly swearing and using the filthy phrase 'Barracuda Networks'. For this I apologise. Richard, we are not joking. Please watch your language on this mailing list, or you will be banned from it. You have now been warned by 2 members of the Project Management Committee. You will not be warned again.
Re: Parallelizing Spam Assassin
rich...@buzzhost.co.uk wrote: email me off list as I've just been banned for upsetting a sponsor LOL Richard, this has nothing to do with Barracuda. They have no influence over my opinions whatsoever. I don't work for Apache or Barracuda, or any company sponsored by either. Neither Apache nor Barracuda has complained. At the time I warned you, I didn't even remember that Barracuda ever donated to Apache. I don't think any member of the PMC has any regular contact with Barracuda, although we've had occasional contact about using their RBL. Your warning is about using foul language, and then choosing to thumb your nose at the warning Justin gave you. You're behaving like an impudent and foul-mouthed child, and that's unwelcome here. That said, I really don't appreciate you using this list to rant about Barracuda's products, or discuss them at all. This is the SpamAssassin list, not the Barracuda list. Barracuda may use SpamAssassin, and SpamAssassin may support the Barracuda public RBL, but beyond that, any discussion of them is, quite frankly, off-topic. I don't care how good or bad their commercial product or its support is, because it is off-topic here. I don't welcome people praising Barracuda any more than I welcome complaints. It simply doesn't matter to SpamAssassin, so it doesn't belong here. You may as well be ranting about Ford cars for all I care; it still doesn't belong here. This list is about SpamAssassin, nothing more, nothing less. Continue with the foul language, and you'll find the door very quickly. Keep harping on the same off-topic subject and we will eventually get tired of it. You've said your piece about Barracuda, now give it a rest, because frankly I don't care about their products, I care about our product. Is that difficult to understand?
Re: Parallelizing Spam Assassin
Um, Linda.. I'm pretty positive Justin is Irish, not American. Linda Walsh wrote: It's an American thing. Things that are normal speech for UK blokes get Americans all disturbed. Funny, it used to be the other way around... but well... times change. Justin Mason wrote: On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk wrote: Imagine what Barracuda Networks could do with that if they did not fill their gay little boxes with hardware rubbish from the floors of MSI and supermicro. Jesus, try and process that many messages with a $30,000 Barracuda and watch support bitch 'You are fully scanning too much mail and making our rubbish hardware wet the bed.' LOL. Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate.
Re: SA-learn (spamassassin)
monolit wrote: The question is logical. When SA learns new spam/ham, SA has to write new info to the database, and I think the database has to increase in size. If you have for example a *.doc file and you modify it, adding several words, the *.doc will get bigger (increase its size). The database doesn't need to grow in size. A Berkeley DB file can contain free space. This is done to avoid constantly shrinking and growing the file on disk. Deleted elements are merely marked as free space for later use. Therefore, data can be added to a Berkeley DB file without an increase in file size.
Re: blacklisting a forger; summary; /* end
LuKreme wrote: On 3-Aug-2009, at 10:21, Dennis G German wrote: Content-Type: text/html; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Yes, there IS a problem. What the hell? The message was multipart/alternative. You are more than capable of reading the text/plain part. html-only messages are strongly discouraged on the list, but so is complaining about multipart/alternative.
Re: RelayCountry Config
MySQL Student wrote: Hi, I don't know if it makes a difference, but I call it Relay-Countries to match the name of the pseudo-header used in the tests add_header all Relay-Countries _RELAYCOUNTRY_ It doesn't appear to make a difference. I must be doing something else wrong. Using spamassassin --lint -D 2>&1 | less shows the X-Relay-Countries header, but it's null: # spamassassin --lint -D 2>&1 | egrep -i 'relay|country|countries' snip [23760] dbg: metadata: X-Spam-Relays-Trusted: [23760] dbg: metadata: X-Spam-Relays-Untrusted: [23760] dbg: metadata: X-Spam-Relays-Internal: [23760] dbg: metadata: X-Spam-Relays-External: snip [23760] dbg: metadata: X-Relay-Countries: The --lint test is *NOT* valid for this. --lint is *ONLY* to verify your config files are parseable. The lint test uses a dummy message that has no Received: headers in it. This prevents --lint from wasting time doing RBL lookups, etc, which speeds up the lint run. This is valid because --lint is not intended to be a comprehensive test of the system; it's intended to check if your rulefiles are readable. Since the lint dummy message has no Received: headers, it hasn't been anywhere, so it's been in no countries. Try again with a real message with real headers, and try to remember that --lint is not a general-purpose test.
Re: Scores, razor, and other questions
MySQL Student wrote: Hi, After another day of hacking, I have a handful of general questions that I hoped you could help me to answer. - How can I find the score of a particular rule, without having to use grep? I'm concerned that I might find it at some score, only for it to be redefined somewhere else that I didn't catch. Something I can do from the command-line? No, to be comprehensive you'd have to do a series of greps, one for the default set, site rules, and user_prefs. You could probably make a little shell script to automate grepping all 3. - How do I find out what servers razor is using? What is the current license now that it's hosted on sf, or are the query servers not also running there? It doesn't list any restrictions on the web site. Wow.. the razor client has been hosted on SF for a LOOong time.. Like 6 years now? Regardless, the servers are operated by Vipul's company, cloudmark. Try running razor-admin -d -discover. Alternatively, look at razor's server.lst file. - The large majority of the spam that I receive these days is a result of a URL not being listed in one of the SBLs. I'm using SURBL, URIBL, and spamcop. For example, I caught censored several hours ago, and it's still not listed in any of the SBLs. Am I doing something wrong or am I missing an SBL? Has anyone else's spam with URLs increased a lot lately? Note: domain censored, verizon's spam outbreak controls won't let me send the message with that domain in it right now. URIBLs have some inherent lag, and spammers are playing a race game with the URIBLs, trying to change domains faster than they get listed. Fortunately, the domain registrations cost the spammers money, so increasing the number of those they need is good. Personally, I find bayes tends to clean up most of what gets missed, although I auto-feed my bayes using spamtrap addresses that automatically submit to sa-learn --spam, resulting in very fresh spam training. 
Looking at uribl, they've currently got it listed in URIBL gold, but that's a non-free list of theirs. It's also a proactive list, so it will list domains before they send spam, making it more effective against mutating runs, but also might toss a FP or two on new domains. Thanks, Alex
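The "grep all 3" suggestion above can be sketched as a small shell script (the paths are typical defaults for a 3.x install; adjust for your layout, and note that later files override earlier ones, so the last line printed wins):

```shell
#!/bin/sh
# Show every score assignment for a rule across the usual config locations.
rule="${1:-RCVD_IN_DNSWL_LOW}"
for f in /usr/share/spamassassin/*.cf \
         /etc/mail/spamassassin/*.cf \
         "$HOME/.spamassassin/user_prefs"; do
    [ -r "$f" ] && grep -H "^[[:space:]]*score[[:space:]]\+$rule\b" "$f"
done
exit 0
```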
Re: Mailbox for auto learning
Luis Daniel Lucio Quiroz wrote: Hi SAs, Well, after reading this link http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html I'm still looking for an easy way to let my mortal users train our antispam. I was thinking of a mailbox such as h...@antispamserver and s...@antispamserver to let users forward their false positives or their false negatives. Inside each box (ham or spam), of course, a procmail recipe would forward the input to sa-learn. My doubts are these: 1. Will forwarded mails be useful for training? I mean, if the spam was From: spa...@example.net To: u...@mydomain, when forwarding it will be From: mu...@mydomain To: s...@antispamserver. Won't this change, plus the forwarding itself (mail clients getting rid of headers), change the learning? Forwarded mails are NOT useful. You also neglected to mention the change of Received headers, and pretty much every header in the message, the re-encoding of the body by your mail client, etc. Since SA's bayes tokenizes headers, that's disastrous. 2. If the technique in question 1 is useless, what other way would be nice to let a user report a false positive/negative for training? In some cases you can have the client forward as attachment, and use a mailbox that strips attachments and feeds them to sa-learn. As long as the client being used forwards the entire original message, with complete headers, this should work fine. TIA LD
Re: two different spamassassin outputs
David Banning wrote: With every email, for some reason, I get two reports from spamassassin. In the headers I get this line:

X-Spam-Status: No, score=2.5 required=5.0 tests=BAYES_00,DEAR_SOMETHING, HTML_MESSAGE,SPF_PASS,URIBL_BLACK,URIBL_OB_SURBL autolearn=no version=3.2.5

Then in the actual message content area, I get this message (notice the difference in the score):

Content analysis details: (6.3 points, 5.0 required)

 pts rule name      description
---- -------------- --------------------------------------------------
-0.0 SPF_PASS       SPF: sender matches SPF record
 2.2 DEAR_SOMETHING BODY: Contains 'Dear (something)'
 0.0 HTML_MESSAGE   BODY: HTML included in message
 2.1 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist [URIs: verery.net]
 2.0 URIBL_BLACK    Contains an URL listed in the URIBL blacklist [URIs: verery.net]

I would like it to toss the email away based on the second score (6.3), but I would also like to know why it is scoring twice, each with a different score. It looks like you're scanning twice. One copy of SA has a bayes database (that thinks the message is nonspam, so it's probably badly trained), and the other doesn't seem to have bayes enabled. That alone accounts for -2.6 points. The other side of it: because bayes isn't active in the second copy, it is using scoreset 1 instead of scoreset 3, which raises the scores of other tests (the points that bayes would otherwise hog get sprinkled around across other rules). ie: in set 1, URIBL_OB_SURBL is 2.132 points, in set 3 it is 1.50.. etc. Any comments or suggestions would be helpful. Thanks -
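That scoreset difference is visible in the stock score lines, which carry four values, one per scoreset; a sketch using the URIBL_OB_SURBL numbers mentioned above (network rules score 0 in the non-network sets 0 and 2):

```
# score RULE           set0  set1   set2  set3
# set0: no net/no bayes, set1: net only, set2: bayes only, set3: bayes+net
score  URIBL_OB_SURBL  0     2.132  0     1.500
```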
Re: whitelist_from_rcvd and short circuit
Chris wrote: It appears as though I don't understand how this is supposed to work. I have a file in /etc/mail/spamassassin called my-whitelist.cf. In it I have entries such as: snip whitelist_from_rcvd harley-requ...@the-hed.net the-hed.net snip however, a message from the 2nd address doesn't hit the USER_IN_WHITELIST for some reason: Return-path: harley-requ...@the-hed.net X-spam-checker-version: SpamAssassin 3.2.5 (2008-06-10) on localhost.localdomain X-spam-status: No, score=-4.9 required=5.0 tests=AWL=0.445,BAYES_00=-6.4, DCC_CHECK_NEGATIVE=-0.0001,KHOP_NO_FULL_NAME=0.259,RDNS_NONE=0.1, SPF_NEUTRAL=0.686,UNPARSEABLE_RELAY=0.001 AWL,BAYES_00,DCC_CHECK_NEGATIVE, KHOP_NO_FULL_NAME,RDNS_NONE,SPF_NEUTRAL,UNPARSEABLE_RELAY shortcircuit=no autolearn=disabled version=3.2.5 Complete headers of both posts are here: http://pastebin.com/m1d1d5e07 snip So, what am I doing wrong here? Two problems with that message: First, there's an unparsable Received: header, which appears to be the one created by your fetchmail. That's breaking SA's trust path, and preventing any hosts from being trusted, making whitelist_from_rcvd impossible. I'm not sure what's throwing it off, but the (single-drop) bit looks a bit odd to me. You need to get SA to understand the Received: headers for any Received-based mechanisms to work. You'll also need it to trust all the servers at your isp/esp/whatever relationship you have with embarqmail.com and synacor.com. Second, the message from harley-requ...@the-hed.net is not relayed to your site from a server using the-hed.net as its reverse DNS. In fact, the-hed.net is not used as the domain of *ANY* server in the received headers of that message. The server they appear to be using is kyoto.hostforweb.net, so hostforweb.net should be the second parameter in your whitelist_from_rcvd, not the-hed.net.
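So, assuming kyoto.hostforweb.net really is their outbound relay (and once the fetchmail Received: header issue is sorted out), the corrected entry would look like:

```
# second parameter must match the relaying server's reverse DNS domain
whitelist_from_rcvd harley-requ...@the-hed.net hostforweb.net
```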
Re: Counting RAZOR2 hits
MySQL Student wrote: Hi, I thought grep -c RAZOR2_CHECK through my mail logs would give me a good approximation of the number of times RAZOR2 was consulted, but that doesn't seem to be the case. There are some mails that don't have it listed in the tests= section. I've also tried the razor-* commands, and they don't appear to be able to help here either. What am I missing? Does RAZOR2_CHECK mean that it was found in the RAZOR2 db, or that it merely consulted the db? That means it was found and was above your min_cf. i.e.: Razor believes it is spam.
Re: sa-update.com expired ?
Stefan wrote: Hello list, I just configured sa-update on a server with some sare rule sets. And it couldn't download some sets because the MIRRORED.BY file has an entry with sa-update.com. In this case it was the 70_zmi_german rule set and the MIRRORED.BY file has the following content: http://daryl.dostech.ca/sa-update/zmi/70_zmi_german.cf/ http://updates.sa-update.com/zmi/70_zmi_german.cf/ sa-update tried the latter but there is nothing, because the domain seems to be expired. Is this temporary or are there plans to fix that? Hmm, interesting. Looks like it expired on August 8th. Perhaps Daryl can answer this (AFAIK, he's the owner of the sa-update.com domain. It is not owned by the ASF or the SpamAssassin team.)
Re: Counting RAZOR2 hits
Karsten Bräckelmann wrote: On Mon, 2009-08-17 at 09:52 +0200, Matus UHLAR wrote: On 15.08.09 14:32, Matt Kettler wrote: That means it was found and was above your min_cf. i.e.: Razor believes it is spam. There's no min_cf for RAZOR and there's no public hitcount. RAZOR2 has an internal trust system which counts reports and revokes from its users/reporters and uses those to decide if the message is listed or not. There is -- the minimum confidence level is the second option for the check_razor2_range() eval rule. You can also set your min_cf in your razor config files, which will affect when the RAZOR2_CHECK rule fires. This does work in SpamAssassin, as I have over-ridden the min_cf on my own system, and have done so for years. The private part of Razor's trust system has to do with how much impact your reports have on the cf values everyone else gets when they query razor. However, you're free to tweak razor to be more or less aggressive. The razor system also advertises a suggested cf value, which they call ac (average confidence?) and you can define min_cf to either be your own absolute value (ie: 10), or relative to the advertised one (ie: ac+10, or ac-5). Razor's cf's go from -100 to +100. See man razor-agent.conf for further details on how to configure razor, if you're so inclined.
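For example, to make razor a bit more conservative than the advertised confidence, a razor-agent.conf entry might look like this (ac + 10 is just an illustrative choice; see the man page for the exact semantics on your version):

```
# in ~/.razor/razor-agent.conf -- only treat as spam when the confidence
# is at least 10 above the server-advertised average confidence
min_cf = ac + 10
```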
Re: SA Timeouts
Cory Hawkless wrote: Hi All, Having a problem with my SA setup. I’m using amavisd and Postfix. For some reason I get the following occasionally:

Aug 19 15:37:20.176 ceg.caznet.com.au /usr/sbin/amavisd[5]: (5-01-6) SA dbg: bayes: database connection established
Aug 19 15:37:20.177 ceg.caznet.com.au /usr/sbin/amavisd[5]: (5-01-6) SA dbg: bayes: found bayes db version 3
Aug 19 15:37:20.179 ceg.caznet.com.au /usr/sbin/amavisd[5]: (5-01-6) SA dbg: bayes: Using userid: 4
Aug 19 15:37:20.184 ceg.caznet.com.au /usr/sbin/amavisd[5]: (5-01-6) SA dbg: bayes: corpus size: nspam = 5993, nham = 24505
Aug 19 15:39:30.977 ceg.caznet.com.au /usr/sbin/amavisd[4]: (4-02-4) (!)SA TIMED OUT, backtrace: at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PerMsgStatus.pm line 1961\n\teval {...} called at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PerMsgStatus.pm line 1961\n\tMail::SpamAssassin::PerMsgStatus::_get_parsed_uri_list('Mail::SpamAssassin::PerMsgStatus=HASH(0xb0945cc)') called at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PerMsgStatus.pm line 1852\n\tMail::SpamAssassin::PerMsgStatus::get_uri_detail_list('Mail::SpamAssassin::PerMsgStatus=HASH(0xb0945cc)') called at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 207\n\tMail::SpamAssassin::Plugin::URIDNSBL::parsed_metadata('Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0xae5421c)', 'HASH(0xb05f97c)') called at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/PluginHandler.pm line 202\n\teval {...} called at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin[...]

Roughly twice a day? If so, I'm guessing a bayes expire run makes the SA run take just long enough to get killed (expiry does take a while; depending on hardware and DB size, it adds around 1-2 minutes to a run). Try either: 1) extend the amavis timeout by 30 seconds, or 2) disable SA's bayes_auto_expire and use a cronjob to run sa-learn --force-expire instead, and see if it goes away.
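Option 2 can be sketched like this (the 4 AM schedule is arbitrary; run the cronjob as the same user that owns the bayes DB):

```
# In local.cf: stop expiry from running opportunistically during scans
bayes_auto_expire 0

# In that user's crontab, run the expiry off-peak instead:
# 0 4 * * * sa-learn --force-expire >/dev/null 2>&1
```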
Re: sare channels
Dave wrote: Hello, I'm trying to add additional SA rules and wanted to use the SARE channels referenced by the wiki. I'm using SA 3.2.5 and when I attempted to get updates from saupdates.openprotect.com the channel didn't exist. Has it moved? Thanks. Dave. Read the top of the rulesemporium site: http://www.rulesemporium.com/ SARE rules aren't being updated. Hence, sa-updating them is pointless.
Re: sare channels
Gary Smith wrote: Read the top of the rulesemporium site: http://www.rulesemporium.com/ SARE rules aren't being updated. Hence, sa-updating them is pointless. Is it still recommended to run the SARE rules? There's nothing wrong with running them if you want.. but using sa-update on them regularly is utterly pointless..
Re: Obfuscation Question
Irish Online Help Desk wrote: When I send a test message for my broadcast email I am receiving “0.6 HTML_OBFUSCATE_05_10 BODY: Message is 5% to 10% HTML obfuscation” in the spam score. It is a pretty basic email message with a few hyperlinks and a numbered list. Can you explain what may be causing this spam score? Well, at 0.6 points, it's not really anything to worry about. Nobody (at least nobody with more than 2 braincells) should be tagging or discarding email at such a low score level. As for the rule, it's generally going to be looking for abuse of tables, etc. to obscure what the user-perceived text of a message is (ie: writing a message by populating columns vertical-first in a table), etc. If you're really worried, you might want to look at the raw message source and see if the innocent looking text has a lot of really weird html layout in it. However, with such a low obfuscation ratio, and such a small score.. I'd really not worry about it.
Re: SA-learn is it a problem
peperami97 wrote: Hi, Whenever I run sa-learn it claims to learn from every message, regardless of whether it's being run immediately after being run on the same folder. Is this normal or is this a problem? That would seem to be a problem. It shouldn't relearn from the same message.. are you using SQL based bayes, or the default db_file based bayes? If db_file, is your bayes_seen file being updated when you run sa-learn?
Re: SA-learn is it a problem
Ben Whyte wrote: By default: ~/.spamassassin/bayes_seen (ie: inside the .spamassassin subdirectory of your home directory) I found it, I assume. It's in /home/.spamassassin/spamassassin_seen .spamassassin_seen is not getting updated; .spamassasin_toks is getting updated. Ben Erm. Did someone mess with the bayes_path setting in your configuration? Also, are you running SA as a user whose home directory is just /home (ie: the user named nobody)?
Re: SA-learn is it a problem
Ben Whyte wrote: Erm. Did someone mess with the bayes_path setting in your configuration? Also, are you running SA as a user whose home directory is just /home (ie: the user named nobody)? The bayes config is pointing to /home/spamd Based on what you've told me so far, it is not. It is pointing to /home/.spamassassin/ What's the exact bayes_path statement you used, and what file is it in? (please post the exact one.. bayes_path is a VERY tricky option to use, because it requires more than just a path.) It's running as the user spamd. I noticed that the spamassassin_seen is not being touched by running sa-learn. The size isn't changing. spamassassin_toks is changing size. Did you run sa-learn as the user spamd? Ben
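To illustrate why bayes_path is tricky: it takes a directory plus a filename prefix, not a bare directory. A sketch using the paths from this thread:

```
# local.cf -- SA appends _toks, _seen, etc. to this prefix:
bayes_path /home/spamd/.spamassassin/bayes
# yields /home/spamd/.spamassassin/bayes_toks, bayes_seen, ...
```

Pointing bayes_path at a bare directory (e.g. /home/spamd/.spamassassin/) would instead produce oddly-named files in the parent directory.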
Re: some domains in my local.cf file not being tagged
Mark Mahabir wrote: Hi, I have a large number of domains I've blacklisted in my local.cf file, e.g. blacklist_from *...@domain.com, however spam from some domains gets tagged, whereas spam from others doesn't. What can I do to improve the situation? Thanks Mark Does the From: header of these messages match *...@domain.com, or are they *...@something.somedomain.com (which wouldn't match)? Does the X-Spam-Status header show that a blacklist matched (USER_IN_BLACKLIST)?
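As a rough illustration of the subdomain pitfall (using Python's fnmatch as an approximation of SA's own glob-to-regex handling; the addresses are placeholders):

```python
from fnmatch import fnmatch

pattern = "*@domain.com"  # a blacklist_from-style glob

# Matches the domain itself...
print(fnmatch("spammer@domain.com", pattern))       # True
# ...but not a subdomain, which needs its own entry
# (e.g. "*@*.domain.com"):
print(fnmatch("spammer@mail.domain.com", pattern))  # False
```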
Re: Rule PTR != localhost
Clunk Werclick wrote: Howdie; I'm starting to see plenty of these and they are new to us: zgrep address not listed /var/log/mail.info Sep 3 05:26:59 : warning: 222.252.239.56: address not listed for hostname localhost dig -x 222.252.239.56 ... ;; QUESTION SECTION: ;56.239.252.222.in-addr.arpa. IN PTR ;; ANSWER SECTION: 56.239.252.222.in-addr.arpa. 83651 IN PTR localhost. ... Taking to one side the various RBL's which are catching these, and not going the whole 'PTR must match' route - would it be practical to craft a 10 point rule based on PTR = localhost? Is it even possible to build a rule based upon DNS returns? Forgive the stupidity of the question, but I'm not sure how to, or even if it can be implemented? Not without writing a plugin. Although if your MTA inserts a may be forged note into the Received: headers, SA will pick up on this. Generally speaking, SA does not perform A record lookups of anything that could be spammer-provided, neither hosts in URLs nor Received: hosts. Doing so poses a potential security risk. (NS record queries are performed, but not A). Attack vectors include: 1) malicious insertion of hosts that are slow-to-resolve, forcing a DNS timeout, thus slowing down mail processing. A small flood of such messages (each with different hostnames) could readily occupy all your spamd children. Spamd does not have sufficient cross-child co-ordination to implement countermeasures, and anyone using the API or spamassassin script would have to roll their own. 2) there is the potential to abuse chosen queries to facilitate DNS cache poisoning attacks, on servers that are vulnerable.
Re: Rule PTR != localhost
Matt Kettler wrote: Clunk Werclick wrote: Howdie; I'm starting to see plenty of these and they are new to us: zgrep address not listed /var/log/mail.info Sep 3 05:26:59 : warning: 222.252.239.56: address not listed for hostname localhost dig -x 222.252.239.56 ... ;; QUESTION SECTION: ;56.239.252.222.in-addr.arpa. IN PTR ;; ANSWER SECTION: 56.239.252.222.in-addr.arpa. 83651 IN PTR localhost. ... Taking to one side the various RBL's which are catching these, and not going the whole 'PTR must match' route - would it be practical to craft a 10 point rule based on PTR = localhost? Is it even possible to build a rule based upon DNS returns? Forgive the stupidity of the question, but I'm not sure how to, or even if it can be implemented? Not without writing a plugin. Although if your MTA inserts a may be forged note into the Received: headers, SA will pick up on this. Correction, SA dropped this rule a LONG time ago in the 2.5x series due to wild false positives. The legacy rule from 2.4x:
header MAY_BE_FORGED Received =~ /\(may be forged\)/i
describe MAY_BE_FORGED 'Received:' has 'may be forged' warning
score MAY_BE_FORGED 0.038
OVERALL%  SPAM%   NONSPAM%  S/O   RANK  SCORE  NAME
2.530     3.757   2.290     0.62  0.34  0.04   MAY_BE_FORGED
0.62 S/O is not so good (ie: 62% of the email matched was spam, but 38% was nonspam)
Re: Rule PTR != localhost
Clunk Werclick wrote: On Thu, 2009-09-03 at 05:23 -0400, Matt Kettler wrote: Clunk Werclick wrote: Howdie; I'm starting to see plenty of these and they are new to us: zgrep address not listed /var/log/mail.info Sep 3 05:26:59 : warning: 222.252.239.56: address not listed for hostname localhost dig -x 222.252.239.56 ... ;; QUESTION SECTION: ;56.239.252.222.in-addr.arpa. IN PTR ;; ANSWER SECTION: 56.239.252.222.in-addr.arpa. 83651 IN PTR localhost. ... Taking to one side the various RBL's which are catching these, and not going the whole 'PTR must match' route - would it be practical to craft a 10 point rule based on PTR = localhost? Is it even possible to build a rule based upon DNS returns? Forgive the stupidity of the question, but I'm not sure how to, or even if it can be implemented? Not without writing a plugin. Although if your MTA inserts a may be forged note into the Received: headers, SA will pick up on this. Generally speaking, SA does not perform A record lookups of anything that could be spammer-provided, neither hosts in URLs nor Received: hosts. Doing so posses a potential security risk. (NS record queries are performed, but not A). Attack vectors include: 1) malicious insertion of hosts that are slow-to-resolve, forcing a DNS timeout, thus slowing down mail processing. A small flood of such messages (each with different hostnames) could readily occupy all your spamd children. Spamd does not have sufficient cross child co-ordination to implement countermeasures, and anyone using the API or spamassassin script would have to roll their own. 2) there is the potential to abuse chosen queries to facilitate DNS cache poisoning attacks, on servers that are vulnerable. Thank you Matt. That is a fine quality of answer and makes total sense. I had never thought to consider this attack vector. On an SA install running hundreds of thousands of messages I could see a significant issue if DNS returns ran much past 300ms or so. 
I am guessing (and I have not at all examined the code, nor shall I pretend that I would understand it) that there is some kind of sanity check for DNS timeout there someplace? Again, potentially a stupid question - but I'm curious as to how we would say 'that query has taken too long, I'm out of here'. AFAIK, all the DNS lookups for a message are subject to the rbl_timeout code. See the conf docs: http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html
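The knob itself is a one-liner (a sketch; if I recall correctly the 3.2.x default is 15 seconds, so this tightens it):

```
# local.cf -- upper bound on time spent waiting for RBL/DNS lookups:
rbl_timeout 10
```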
Re: some domains in my local.cf file not being tagged
Mark Mahabir wrote: 2009/9/3 Matt Kettler mkettler...@verizon.net: Does the From: header of these messages match *...@domain.com, or are they *...@something.somedomain.com (which wouldn't match)? They're definitely *...@domain.com in the From: header. Does the X-Spam-Status header show that a blacklist matched (USER_IN_BLACKLIST)? No, they don't (the ones that don't get tagged). Thanks, Mark Interesting, then one of the following is the cause: 1) there's errors in your config, and SA isn't parsing local.cf at all. To check for this, run spamassassin --lint. It should run quietly, if it complains, find and fix the offending lines. 2) You're editing a local.cf in the wrong path. Check what the site rules dir is near the top of the debug output when you run spamassassin -D --lint. 3) the offending message has multiple From: headers, and SA is interpreting the other one. You can try looking at the raw message source for this. 4) The configuration being used at delivery time is over-riding the one used at the command line. You can try pumping the message as a file through spamassassin on the command line and see what it comes up with. If it matches USER_IN_BLACKLIST on the command-line, but fails to match at delivery, something is fishy about your integration and how it configures SA.
Re: some domains in my local.cf file not being tagged
d.h...@yournetplus.com wrote: Quoting Matt Kettler mkettler...@verizon.net: Mark Mahabir wrote: 2009/9/3 Matt Kettler mkettler...@verizon.net: Does the From: header of these messages match *...@domain.com, or are they *...@something.somedomain.com (which wouldn't match)? They're definitely *...@domain.com in the From: header. Does the X-Spam-Status header show that a blacklist matched (USER_IN_BLACKLIST)? No, they don't (the ones that don't get tagged). Thanks, Mark Interesting, then one of the following is the cause: 1) there's errors in your config, and SA isn't parsing local.cf at all. To check for this, run spamassassin --lint. It should run quietly, if it complains, find and fix the offending lines. 2) You're editing a local.cf in the wrong path. Check what the site rules dir is near the top of the debug output when you run spamassassin -D --lint. 3) the offending message has multiple From: headers, and SA is interpreting the other one. You can try looking at the raw message source for this. 4) The configuration being used at delivery time is over-riding the one used at the command line. You can try pumping the message as a file through spamassassin on the command line and see what it comes up with. If it matches USER_IN_BLACKLIST on the command-line, but fails to match at delivery, something is fishy about your integration and how it configures SA. Or, does order of comparison matter. From the documentation, blacklist_from states to see whitelist_from. whitelist_from states: The headers checked for whitelist addresses are as follows: if Resent-From is set, use that; otherwise check all addresses taken from the following set of headers: Envelope-Sender Resent-Sender X-Envelope-From From If taken in that order, the From header field would be compared last. It will check *ALL* of the from like headers, and it will fire if *ANY* of them match. So that's not the problem.
Re: user prefs from sql problem
Karel Beneš wrote: Hi, I am trying to load user preferences from SQL db (mysql). Setup was done according to doc/spamassassin/sql/README.gz, but user preferences are still loaded from files. No error message is raised into the log file in debug mode. DB-based bayes and awl work fine. Debian GNU/Linux 5.0.3, spamassassin 3.2.5, mysql 5.0.51a. Spamassassin is invoked by spamc in /etc/procmailrc. spamd --max-children 2 --helper-home-dir --setuid-with-sql -d --pidfile=x What is going wrong? Did you set these options in your local.cf?: user_scores_dsn DBI:driver:connection user_scores_sql_username dbusername user_scores_sql_password dbpassword And what did you set user_scores_dsn to? See also: sql/README from the tarball (web copy for 3.2.x at: http://svn.apache.org/repos/asf/spamassassin/branches/3.2/sql/README) Thanks a lot, --kb
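Filled in for a MySQL setup, those options might look like this (a sketch; database name, host, and credentials are placeholders):

```
# local.cf
user_scores_dsn          DBI:mysql:spamassassin:localhost
user_scores_sql_username sauser
user_scores_sql_password sapass
```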
Re: URL rule creation question
MySQL Student wrote: Hi all, I've seen this pattern in spam quite a bit lately: snip - URI that verizon won't let me send Would it be reasonable to create a rule that looks for this two-char then dot pattern, or is it reasonable that it might appear in a legitimate email too frequently? If possible, how would you create a rule to capture this? This rule should detect 10 consecutive occurrences: uri L_URI_FUNNYDOTS /(?:\.[a-z,0-9]{2}\.){10}/ I do think that 4-in-a-row might be pretty common (ie: IP addresses), but 10 in a row seems unlikely. Warning: I wrote this quickly without too much thought. It may have bugs, but I'm short on time at the moment.
Re: URL rule creation question
McDonald, Dan wrote: From: Matt Kettler [mailto:mkettler...@verizon.net] This rule should detect 10 consecutive occurrences: uri L_URI_FUNNYDOTS /(?:\.[a-z,0-9]{2}\.){10}/ Warning: I wrote this quickly without too much thought. It may have bugs, but I'm short on time at the moment. Your variant would require two periods in a row between each pair. So it would... Hence the warning :)
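The double-dot problem is easy to demonstrate, along with one possible fix (my own variant, not from the thread):

```python
import re

# Pattern from the thread: each repetition both starts AND ends with a
# literal dot, so it only matches runs with DOUBLE dots (".ab..cd..ef.").
original = re.compile(r"(?:\.[a-z,0-9]{2}\.){10}")

# Possible fix: let adjacent repetitions share the dots (this also drops
# the stray comma from the character class):
fixed = re.compile(r"\.(?:[a-z0-9]{2}\.){10}")

uri = "http://x." + "ab." * 10 + "example.com"
print(bool(original.search(uri)))  # False -- single dots never match
print(bool(fixed.search(uri)))     # True
```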
Re: Does Spam Assassin support additional languages?
Chris Arrendale wrote: I am running version 3.2.4 and am interested to know if there are additional language packs for SpamAssassin such as German, Turkish, Chinese, etc.? If they are available, does anybody know where I can download them? There are a few language packs that come with SA: German, Spanish, French, Italian, etc. See the 30_text_*.cf files that come with it. SA picks up which language to use from the LANG environment variable of the system it is running on.
Re: Re-running SA on an mbox
MySQL Student wrote: Hi, I have an mbox with about 100 messages in it from a few days ago. The mbox is a combination of spam and ham. What is the best way to run SA through these messages again, so I can catch the ones that have URLs in them that weren't on the blacklist at the time they were received? Must I break them all apart to do this, or can SA somehow parse the whole mbox? If not, what program do you suggest I use to accomplish this? Do you just want to re-scan the whole mbox and see what rules hit now for research reasons? You could probably abuse the mass-check tool for that purpose: http://svn.apache.org/repos/asf/spamassassin/branches/3.2/masses/ It's normally used to generate logs we feed into the score generation process, but it can be run on a single mbox. The downside is that all it does is generate a report, one line per message, with a list of hits. There's no way to (directly) get SA to modify email that's already in an mbox file. The mass-check and sa-learn tools can read mboxes, but nothing in SA can write to them. However, there might be a utility out there to do this (although I'm not aware of any)..
Re: Re-running SA on an mbox
MySQL Student wrote: Hi, Do you just want to re-scan the whole mbox and see what rules hit now for research reasons? That's a good start, but I'd like to see if I can break out the ham to train bayes. There's no way to (directly) get SA to modify email that's already in an mbox file. The mass-check and sa-learn tools can read them, but nothing in SA can write to that. However, there might be a utility out there to do this (although I'm not aware of any).. Yeah, that's kind of what I thought. Maybe a program that can split each message back into an individual file? Would procmail even help here? Or even a simple shell script that looks for '^From ', redirects it to a file, runs spamassassin -d on it, then re-runs SA on each file? I could then concatenate each of them back together and pass it through sa-learn. That sounds like a good plan. If you google around for mbox split or mbox splitter you can find some sample code out there that does it. It's all just simple code looking for the ^From boundary.
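A minimal splitter along those lines, using Python's stdlib mailbox module instead of hand-rolling the '^From ' parsing (a sketch, not a polished tool; the demo corpus is made up):

```python
import mailbox
import os
import tempfile

def split_mbox(mbox_path, out_dir):
    """Write each message in an mbox to its own numbered .eml file."""
    paths = []
    for i, msg in enumerate(mailbox.mbox(mbox_path)):
        path = os.path.join(out_dir, "msg_%04d.eml" % i)
        with open(path, "w") as f:
            f.write(msg.as_string())
        paths.append(path)
    return paths

# Tiny two-message demo corpus:
work = tempfile.mkdtemp()
mbox_file = os.path.join(work, "test.mbox")
with open(mbox_file, "w") as f:
    f.write("From a@example.com Mon Sep 21 00:00:00 2009\n"
            "Subject: one\n\nbody one\n\n"
            "From b@example.com Mon Sep 21 00:01:00 2009\n"
            "Subject: two\n\nbody two\n")

files = split_mbox(mbox_file, work)
print(len(files))  # 2
```

The resulting .eml files could then be sorted by hand (or by re-running SA on each) and fed to sa-learn.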
Re: Re-running SA on an mbox
Theo Van Dinter wrote: You probably want spamassassin --mbox. :) It won't modify the messages in-place, but you can do something like spamassassin --mbox infile outfile. If you're talking about sa-learn, though, it also knows --mbox. Yes, but he's got mixed spam and nonspam in one mbox. You've got to split that before you can feed sa-learn. On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student mysqlstud...@gmail.com wrote: Yeah, that's kind of what I thought. Maybe a program that can split each message back into an individual file? Would procmail even help here? Or even a simple shell script that looks for '^From ', redirects it to a file, runs spamassassin -d on it, then re-runs SA on each file? I could then concatenate each of them back together and pass it through sa-learn.
Re: partial (lazy) scoring? (shortcircuit features)
ArtemGr wrote: I would like to configure Spamassassin to only do certain tests when the required_score is not yet reached. For example, do the usual rule-based and bayesian tests first, and if the score is lower than the required_score, then do the DCC and RAZOR2 tests. Is it possible? Not exactly the way you describe, no. SpamAssassin has a priority and a shortcircuit facility that provide a vaguely similar functionality, but it doesn't really work exactly the way you want. Priority allows you to change the order in which rules are processed, so you can make some rules run earlier, or later, than others. This part fits your needs. Shortcircuit allows you to stop processing when a particular rule fires. However, it is strictly based on the rule firing, not the message score. This part doesn't fit your needs. Collectively they allow you to make some rules (ie: USER_IN_WHITELIST, USER_IN_BLACKLIST) run first, and abort processing if they fire. However, this doesn't really work for your scenario of delaying a few rules and aborting if they're not needed. I suppose there could be some kind of mod to the shortcircuit plugin to do this, however it's a little dangerous from a false-positive perspective, so the devs may not be very enthusiastic about adding it. A long, long time ago, SpamAssassin had a feature where it would abort as soon as a given score was hit. However, this introduced a problem where it could cause false positives. A nonspam message might hit several spam rules early in the processing, and drive the score over the abort threshold, causing it to be tagged as spam. However, this could prevent it from matching negative scoring rules that would push it back under the spam threshold. Now, that version of SA was a long time ago, and we didn't have any priority going on, and it was also checking the score pretty often in between rules. 
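For reference, the existing facility described above looks something like this in local.cf (a sketch based on the Shortcircuit plugin docs shipped with 3.2.x):

```
loadplugin Mail::SpamAssassin::Plugin::Shortcircuit

# run the whitelist/blacklist checks before everything else...
priority USER_IN_WHITELIST -1000
priority USER_IN_BLACKLIST -1000

# ...and stop scanning entirely if either one fires:
shortcircuit USER_IN_WHITELIST on
shortcircuit USER_IN_BLACKLIST on
```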
In theory, a feature could be added to let you do something like this (SA doesn't have this feature, but I'm proposing it could be added): shortcircuit_if_score_above_at score priority Which would let you do: shortcircuit_if_score_above_at 5.0 99 priority RAZOR_CHECK 100 priority DCC_CHECK 100 You'd have to be careful about your priorities, as this will prevent any nonspam rules with higher priority numbers from running, but it could work for this scenario. You could also prevent the rules from running on nonspam if they're pointless as well, with a similar score below feature: shortcircuit_if_score_below_at -1.17 99 The highest score you can ever get out of both DCC and Razor (with the current scores) is +6.17 (unlikely, but possible, assuming both e4 and e8 have high cf's and DCC fires too). If the score is already below -1.17, there's no way these rules can ever drive the score up enough to be over 5.0 and make the message spam. Obviously this would greatly depend on what rules you're running late.
Re: partial (lazy) scoring? (shortcircuit features)
Matus UHLAR - fantomas wrote: Matt Kettler mkettler_sa at verizon.net writes: In theory, a feature could be added to let you do something like this (SA doesn't have this feature, but I'm proposing it could be added): On 22.09.09 11:46, ArtemGr wrote: That would be a nice optimization: most of the spam we receive has a 10 score. It seems a real waste of resources to perform all the complex tests (like distributed hashing or OCR-ing) on spam which is DNS and rule-detectable. You haven't read Matt's explanation of why it wasn't a good idea, did you? There are rules with negative scores, which can push the score back to ham, e.g. whitelists. Would you like to stop scoring before e.g. the whitelist is checked? *You* obviously haven't read my message, which explains how this *can* be done safely.
Re: 3.3.0 and sa-compile
to...@starbridge.org wrote: Benny Pedersen wrote: On fre 25 sep 2009 13:38:19 CEST, to...@starbridge.org wrote: I've tested with SA 3.2.5 and it's working fine with Rule2XSBody active. I've tried to delete the compiled rules and compile again: same result. Forgot to sa-compile in 3.3? sa-compile has been run correctly with no errors (even in debug).
Re: SQL Bayes behavior
pm...@email.it wrote: Hi, I've a few questions about the behavior of Bayes and SQL. Before the questions: I've followed this tutorial http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html which should be the same thing as this: http://spamassassin.apache.org/full/3.0.x/dist/sql/README.bayes. My db is updated constantly, so it should work. 1- In the bayes_vars table I've only a row for the amavis user. Theoretically, is it a good choice to use only one db for all users of my domain? (if I've understood well, spamassassin uses this single db to store Bayes for all users of my domain) In theory, per-user is slightly more accurate than systemwide. However, training is more important than granularity. So when it comes down to it, unless you're ready to set up something where users can individually report spam and nonspam (can be a bit tricky) you're probably better off going with a single system-wide bayes database. At least this way if you need to do some manual training, it's only one DB to train on and everyone benefits. 2- How can I use a single Bayes db for each user? Should I use bayes_sql_override_username? I don't know where to get the right username. You'd need to get amavis to pass this to spamassassin. I don't know enough about amavis to know if this is supported or not. Generally most MTA layer integrations don't, and most MDA integrations do, but there's lots of exceptions. Amavis is an MTA integration, but it might be one of the exceptions. 3- Every 10-15 seconds, the counts of ham_count or spam_count in the bayes_vars table increase without any users sending or receiving mail. So, is the behavior of spamassassin to analyze all mails present in all my users' Maildirs? No.
spamassassin has no concept that your users' maildirs even exist; it will not scan them. There are only 2 ways training occurs: 1) a message passes through SA during delivery, and gets auto-learned due to the scoring criteria 2) someone (or some cronjob) calls sa-learn and explicitly feeds it mail. And the only other way that the counts could update would be during a journal sync, which occurs only during message processing or calls to sa-learn. (the exact triggers are slightly different, but from a high-level view they're more-or-less the same.) It seems strange you're seeing the counts increase without any incoming mail... Are you *positive* nothing is arriving, or recently arrived and is just finishing up being processed by SA? Thanks :) Marco
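For a single site-wide SQL bayes DB, the relevant local.cf pieces look roughly like this (a sketch; the DSN and credentials are placeholders, and the override username matches this thread's amavis setup):

```
bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn      DBI:mysql:spamassassin:localhost
bayes_sql_username sauser
bayes_sql_password sapass

# force every scan to use one shared bayes database:
bayes_sql_override_username amavis
```

Dropping the override line (and having the glue pass real usernames through) is what switches this to per-user databases.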
Re: New spamhaus list not included
Mike Cardwell wrote: SpamHaus announced a new list a couple of days back - http://www.spamhaus.org/news.lasso?article=646 According to that page it returns results of 127.0.0.3 I just took a quick look at 20_dnsbl_tests.cf and it doesn't seem to include it yet. Currently we have: RCVD_IN_SBL - 127.0.0.2 RCVD_IN_XBL - 127.0.0.[45678] RCVD_IN_PBL - 127.0.0.1[01] It was announced 2 days ago.. are you really surprised it's not in SA proper yet? (2 days isn't really enough time to test a new RBL for accuracy) :-) That said, we do appreciate you passing along the announcement, and it looks like Alex committed a rule for it to his sandbox for testing shortly after your email and created bug 6215 to track it. So, the ball is now rolling. Thanks much.
Re: SpamAssassin Ruleset Generation
poifgh wrote: I have a question about understanding how rulesets are generated for spamassassin. For example, consider the rule in 20_drugs.cf: header SUBJECT_DRUG_GAP_C Subject =~ /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i describe SUBJECT_DRUG_GAP_C Subject contains a gappy version of 'cialis' Who generated the regular expression /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i? Man, that's a good question. I wrote a large chunk of the rules in 20_drugs.cf, but not that one. (I wrote the stuff near the bottom that uses meta rules, ie: __DRUGS_ERECTILE1 through DRUGS_MANYKINDS, originally distributed as a separate set called antidrug.cf). As I recall, there were 2 other people making drug rules, but it's been a LONG time, and I forget who did it. Those rules were written in the 2004-2006 time frame when pharmacy spams were just hammering the heck outa everyone. a. Is it done manually with people writing regex to see how efficiently they capture spams? Yes. Many hours of reading spams, studying them, testing various regex tweaks, checking for false positives, etc, etc. mass-check is your friend for this kind of stuff. One post from when I was developing this as a stand-alone set: http://mail-archives.apache.org/mod_mbox/spamassassin-users/200404.mbox/%3c6.0.0.22.0.20040428132346.029d9...@opal.evi-inc.com%3e Note: the comcast link mentioned in that message should be considered DEAD. The antidrug set is no longer maintained separately from the mainline ruleset, and hasn't been for years. If you want to break the rules down a bit, here's some tips: The rules are in general designed to detect common methods to obscure text by inserting spaces, punctuation, etc between letters, and possibly substituting some of the letters for other similar looking characters (W4R3Z style, etc). The simple format would be to think of it in groupings. You end up using a repeating pattern of (some representation of a character)(some kind of gap sequence)(character)(gap)...etc.
.{0,2} is a gap sequence, although not one I prefer. I prefer [_\W]{0,3} in most cases because it's a bit less FP-prone, but risks missing things using small lower-case letters to gap. You also get replacements for characters in some of those, like [A4] instead of just A. Or, more elaborately.. [a4\xe0-\...@] So this mess:
body __DRUGS_ERECTILE1 /(?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}[ij1!|l\xEC\xED\xEE\xEF][_\W]{0,3}[a40\xe0-\...@][_\w]{0,3}[xyz]?[gj][_\W]{0,3}r[_\W]{0,3}[a40\xe0-\...@][_\w]{0,3}x?[_\W]{0,3}(?:\b|\s)/i
Could be broken down:
(?:\b|\s) - preamble, detecting space or word boundary.
[_\W]{0,3} - gap
(?:\\\/|V) - V
[_\W]{0,3} - gap
[ij1!|l\xEC\xED\xEE\xEF] - I
[_\W]{0,3} - gap
[a40\xe0-\...@] - A
[_\W]{0,3} - gap
[xyz]?[gj] - G (with optional extra garbage before it)
[_\W]{0,3} - gap
r - just R :-)
[_\W]{0,3} - gap
[a40\xe0-\...@] - A
[_\W]{0,3} - gap
x? - optional garbage
[_\W]{0,3} - gap
(?:\b|\s) - suffix, detecting space or word boundary.
Which detects weird spacings and substitutions in the word Viagra. But how are the rules generated themselves? Mostly meatware, except the sought rules others have mentioned. Thnx
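A stripped-down illustration of the gap-sequence idea (my own simplified pattern, far less thorough than the real __DRUGS_ERECTILE1 rule, but built the same way: letter class, gap, letter class, gap...):

```python
import re

GAP = r"[_\W]{0,3}"  # up to 3 junk characters between letters
letters = [r"v", r"[i1!|l]", r"[a4@]", r"[gj]", r"r", r"[a4@]"]
pattern = re.compile(r"\b" + GAP.join(letters) + r"\b", re.IGNORECASE)

print(bool(pattern.search("buy V-i.a*g r a now")))  # True
print(bool(pattern.search("viable grass")))         # False
```

The second test shows why the gap class is [_\W] rather than . -- lowercase letters between the target characters don't count as a gap, so ordinary words don't fire.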
Re: Valid mail from blacklisted dynamic IPs
MySQL Student wrote: Hi, I have a set of users that are authorized to use the mail server via pop-before-smtp, but SA catches the mail they send through the system as spam because they are on blacklisted Verizon or Comcast IPs: X-Spam-Status: Yes, hits=5.4 tag1=-300.0 tag2=5.0 kill=5.0 use_bayes=1 tests=BAYES_50, BOTNET, FH_HOST_EQ_VERIZON_P, RCVD_IN_PBL, RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL Does your pop-before-smtp method cause your MTA to indicate they've been authed in the Received: header? I also don't understand how SPF_SOFTFAIL could happen when there wasn't any SPF record to test to begin with. Are you sure? What was the envelope from domain for the message? (keep in mind, this checks the envelope from, not the from header..) One of the Comcast users: X-Spam-Status: Yes, hits=6.4 tag1=-300.0 tag2=5.0 kill=5.0 use_bayes=1 tests=BAYES_50, BOTNET, DYN_RDNS_SHORT_HELO_HTML, HTML_MESSAGE, RCVD_IN_PBL, RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL, SUBJ_ALL_CAPS We are working on better Bayes training, but sans that problem, what is the right way to address this, through a rule that whitelists their specific IP? Another mail that I'm dealing with is one sent by Marriott that hit SARE_HTML_URI_REFID, DCC_CHECK, and AE_DETAILS_WITH_MONEY, among being whitelisted by JMF/HOSTKARMA. I don't know how it hit DCC when there are details in there specific to the user, including account numbers, user names, etc. Some of DCC's signatures are fuzzy, thus will match similar messages with minor differences. This is done to avoid spammers bypassing by simply adding a text counter to the message, or some other similar bit to make each one unique. Combine that with DCC being strictly a measure of bulkiness not spamminess, and you most likely have your answer. You could run it through dccproc to see which of DCC's signatures matched. 
As for dealing with it:
1) whitelist Marriott at the SA level (as you suggest)
2) whitelist Marriott at the DCC level
3) remove or severely cut back the score of AE_DETAILS_WITH_MONEY, if you ever actually expect to get important email about traveling to the UAE.
Personally I strongly recommend the third option if you're likely to get emails about travel to the UAE. That rule (with the IMO overly strong 3.0 score that floats around) is really designed for people who would never travel there, but get hammered with spam offering trips there. For folks that might actually do so, maybe 0.5 is more appropriate. How should I go about allowing this type of mail without disrupting its ability to block mail that should be blocked with these rules? I'm sure I can add a rule subtracting points if it hits these and comes from Marriott, but I thought there might be something that could address the more general problem rather than this specific one from Marriott. Perhaps I'm making it too hard. Thanks, Alex
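Concretely, cutting the rule's score back is a one-liner (0.5 being the value suggested above; the whitelist alternative is sketched with a placeholder domain -- check the actual sending domain and relay first):

```
# local.cf -- tone the rule down rather than removing it:
score AE_DETAILS_WITH_MONEY 0.5

# or whitelist the sender (domain/rDNS here are placeholders):
# whitelist_from_rcvd *@marriott.com marriott.com
```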
Re: results in languages other than english
ahattarki wrote: The SpamAssassin report comes back in English. Is this configurable to return results in languages other than English? Also, can a single SpamAssassin instance return results in different languages -- one user gets the results back in English, while another gets the results back in Korean, all on the same instance of SpamAssassin? thanks, Anjali SA reads the LANG environment variable when it runs, and if it matches one of the extra language sets (see 30_text_*.cf in the ruleset), then it will use that text set. At present, there's no Korean translation set, but it's not difficult to write your own; look at some of the other files for examples. As for switching per-user on the fly, AFAIK SA isn't set up for that. In part, this would require the SA instance to maintain strings for all language sets in memory at the same time. Right now, if I remember right, it only loads strings for the language it is set for at the time the ruleset is parsed during load.
Re: How can i block blank messeage mail
cofe2003 wrote: I find SA will not scan a mail if the message body is blank, so I want to score all blank-message mails as 6.00. How can I do this? My SA version is 3.17. Thanks. That's odd. SA should scan it, unless it is so blank there aren't even any headers. How have you integrated SA into your system? Something like this should work for the scoring:
rawbody MSG_BODY_EMPTY !~ /./
describe MSG_BODY_EMPTY Message has no body text
score MSG_BODY_EMPTY 6.0
Re: update does not work correctly?
klop...@gmx.de wrote: Hi, I use SpamAssassin 3.2.5 with CentOS. On October 20 I started an update with this command: sa-update --channel updates.spamassassin.org When I run the update now, the date of the folder and file in /var/lib/spamassassin/3.002005 does not change. It is still October 20. Does anyone have an idea why the date does not change? Updates are published as needed, which at times means there may be updates every day, and at other times several months may pass between releases. In general spam signatures are fairly broad and generic, and need to be updated *MUCH* less often than virus signatures. Virus signatures target a single virus at a time, thus need updating for every new variant, hence the very frequent releases. SA rules target a generic trait of a message, and only need updating when there is a radical change in the spam stream. Looking at the SVN tags, the last update to rules for the 3.2 branch was pushed back on July 20th.
Re: New to Spamassassin. Have a few ?s...
Computerflake wrote: I'm looking into a free spam filter that can do the following. Will SpamAssassin do these things? 1) Will it filter multiple domains so I can filter for many different companies? Sure. Depending on how you set it up, you can even have per-domain customization of the whole ruleset. 2) Will it send individual users an email once a day (for example) to inform them of the spam that was captured, in case some of it was not actually spam? Directly? No. SpamAssassin, by itself, is really just a scanning engine with header modification abilities. It does not do email management, quarantines, etc., at all. It receives a message, evaluates it, and modifies it based on the results, nothing more, nothing less. (This is done to make SA flexible: it's a mail pipe, so you can glue it into almost anything.) Generally matters like this are handled by integration tools such as MailScanner, amavisd-new, etc., although I do not know of any that provide comprehensive quarantine management. That said, I've never desired such, so I've not looked at length for one. (I mostly just tag mail, and let users filter at the client level as they see fit.) See also: http://wiki.apache.org/spamassassin/IntegratedInMta 3) Will it allow users to add people to an individual whitelist so they can handle their own spam settings? Yes, provided the tools integrate it in a per-user manner. 4) I understand it connects to ClamAV using a plugin. How easy is it to install the plugin so I can also scan for viruses for folks? Personally, I'd suggest letting an integration tool call ClamAV and SpamAssassin independently. The ClamAV plugin for SA is functional, and not difficult to set up, but it's not what I would consider an ideal solution. All it does is cause viruses to show up as an SA rule named CLAMAV. However, since SpamAssassin can't drop mail directly, you'll still need to get an integration tool to detect that marker in the header and delete the message. Thanks for any help.
I don't want to spend a fortune on a spam filter if I can find a free filter that will do everything I would need.
Re: About log generation
Jose Luis Marin Perez wrote: Dear friends, Is there some configuration of SA to generate different logs, one for each mail domain? spamd, like most well-behaved unix daemons, uses syslog. It doesn't write logfiles directly. The old-school approach to this would be to run several instances of spamd, one per domain, have each log to a separate local* syslog facility, and have syslogd write each to a separate logfile. A more modern approach might be possible using some of the newer syslogds that can be configured based on message content, not just facility.severity. However, that assumes you can tell the domain from the log message alone; I'm not sure offhand if spamd has that info in its syslog messages. The antispam system analyzes emails from different domains and what I want is to generate statistics for each domain. Thanks, Jose Luis
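The content-based approach could be sketched with rsyslog's RainerScript filters. This is a hypothetical fragment, and it assumes the domain string actually appears in spamd's log lines, which you should verify first:

```
# /etc/rsyslog.d/spamd-split.conf (hypothetical)
if $programname == 'spamd' then {
    if $msg contains 'example.com' then /var/log/spamd-example.com.log
    if $msg contains 'example.org' then /var/log/spamd-example.org.log
    stop
}
```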
Re: Development dead
Anatoly Pugachev wrote: On 04.11.2009 / 09:20:16 -0500, Bowie Bailey wrote: polloxx wrote: Hi, Is the SpamAssassin development dead? On the website there's: 2008-06-12: SpamAssassin 3.2.5 has been released. Not quite. If you look at svn, you'll see this: spamassassin_20091103151200.tar.gz 03-Nov-2009 15:12 2.1M Doesn't look dead to me! :) Hello! Can you please post a full URL to this archive? Since http://svn.apache.org/snapshots/spamassassin/ doesn't have it. The snapshots directory is automatically built and old versions are purged. The November 3rd image is gone. Now we've got ones from the 10th and 11th. By the time you look at it again, these might be gone and newer ones may have replaced them.
[ ] spamassassin_20091110151200.tar.gz 10-Nov-2009 15:12 2.1M
[ ] spamassassin_20091110211200.tar.gz 10-Nov-2009 21:12 2.1M
[ ] spamassassin_20091111031200.tar.gz 11-Nov-2009 03:12 2.1M
[ ] spamassassin_20091111091200.tar.gz 11-Nov-2009 09:12 2.1M
However, if you're really just looking to gauge development activity, it would be better to look at the list archives of all the SVN commits: http://mail-archives.apache.org/mod_mbox/spamassassin-commits/ or, for the current month of November 2009, sorted by date: http://mail-archives.apache.org/mod_mbox/spamassassin-commits/200911.mbox/date
Re: Relation bettwen MAIL FROM: and From:
Luis Daniel Lucio Quiroz wrote: Hi All, I'm wondering if someone knows whether it is possible to stop this using SA. Look: MAIL FROM: and From: are commonly mismatched in legitimate mail. For example, every message that you receive from this list (and every other sanely configured mailing list) will have an apache.org address in the MAIL FROM, and the sender in the From:. That's because apache is remailing, and should receive all DSNs, but they are not the originator of the message. There are quite a few other scenarios where mismatches occur outside of spam. Perhaps you should look more closely at your nonspam email.
Re: Problem with sa-blacklist
Michael Monnerie wrote: I can't reach Bill Stearns, so I try at this list: Dear Bill, I have been using the sa-blacklist.reject for postfix for a long time, but these last days your rsync doesn't work anymore: rsync: failed to connect to rsync.sa-blacklist.stearns.org: Connection timed out (110) So I had a look to see if something changed on http://www.sa-blacklist.stearns.org/sa-blacklist/ but obviously the information there is quite old: if I download the sa-blacklist.current.reject, it has a version of April: 200904171539, while my last rsync version is 200910142031. Any chance for a fix? mfg zmi sa-blacklist and sa-blacklist-uri are both dead as far as use within SpamAssassin goes. Although someone updated it in 2009, for all practical purposes its use as an SA ruleset has been dead (or at least dying) since 2004, when the WS sub-list of surbl.org was created. While it was an interesting case study, it is *VERY* inefficient, and will kill most servers. Any use of it should be restricted to research purposes only (i.e.: reading the list manually to study patterns in emerging spam domains). It is too heavyweight to use under SpamAssassin. The plain sa-blacklist was not very effective, and consumed lots of memory (750MB per spamd instance?). This list worked on the From: address of the message, which spammers recycle very quickly. This means lots of addresses, a huge list, and a very low hitrate due to low re-use. Plain and simple, it's a waste of memory to use it under SA (although manually looking at the list does have some uses, as noted above). The URI version has become the WS list over on SURBL. This version had better hitrates, but the very large list consumed large amounts of memory too. Also, searching this huge list as a large number of regular expressions is so computationally intensive that most systems can complete a DNS lookup against surbl.org before the regexes finish running.
It is not unheard of for this ruleset to add 10 or more seconds to message processing, in addition to the over 1 GB of RAM it consumes. Sure, a more recent server with more CPU beef and fast RAM could probably complete it in 3 seconds or so, but that is still slower than a DNS lookup. Most admins are not willing to devote several gigs of RAM just to their SpamAssassin instances. I doubt you are either, so please don't use sa-blacklist. Unless you're looking to use it as a data set for analysis purposes, it is dead, and has been for a long time. The valuable parts have evolved into parts of SURBL, which is already in SpamAssassin, unless you're dealing with a version that is over 4 years old.
Re: SpamAssassin 3.3
LuKreme wrote: Is there a roadmap for the release of SA 3.3? Probably the best roadmap would be to look at the list of bugs assigned against 3.3.0: https://issues.apache.org/SpamAssassin/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&version=3.3.0 A best guess on when it might be released? When it's done... It shouldn't be too terribly long now before a beta is released, based on reading some of the latest dev list traffic. However, exactly how long that is depends a lot on how much free time the team has. A URL I should be reading instead of posting to the list? You can always browse the dev list archives. There are often good tidbits on there (and often lots of noise too, but...): http://mail-archives.apache.org/mod_mbox/spamassassin-dev/
Re: X_Report_Header
Daniel D Jones wrote: Running 3.2.5 under Debian Etch. I'm trying to add the SpamAssassin X-Spam-Report header. Per the website at http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html report_safe ( 0 | 1 | 2 ) (default: 1) ... If this option is set to 0, incoming spam is only modified by adding some X-Spam- headers and no changes will be made to the body. In addition, a header named X-Spam-Report will be added to spam. You can use the remove_header option to remove that header after setting report_safe to 0. I have the option set to 0 in /etc/spamassassin/local.cf and the remove_header option is not configured in any of the files in that directory. I'm getting the X-Spam-Score, X-Spam_score_int, X-Spam_bar, etc. headers but I am not getting the Report header. I've been unable to find anything on the web as to why this might be. Any assistance appreciated. That sounds like your headers are not being generated by SpamAssassin. SpamAssassin cannot generate headers starting with X-Spam_. They have to start with X-Spam- (note dash instead of underscore). What happens when you run a message through SA on the command line? i.e.: spamassassin testmsg.txt Are you using something like this exim integration: http://www.debianhelp.org/node/10614 ? In which case, you'll have to edit that exim script, because SA isn't generating headers in that kind of setup.
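Once the headers are genuinely coming from SA, the relevant local.cf settings look like this. report_safe and add_header are standard options; the add_header line below just makes the default X-Spam-Report behavior explicit:

```
# Do not wrap spam in an attachment; only add X-Spam-* headers.
report_safe 0
# Add an X-Spam-Report header containing the rule-hit report to spam.
add_header spam Report _REPORT_
```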
Re: Scoring for DATE_IN_FUTURE_96_XX
Thomas Harold wrote: On 11/30/2009 9:27 PM, Thomas Harold wrote: While looking at the scores in 50_scores.cf, I noticed the following: score DATE_IN_FUTURE_03_06 2.303 0.416 1.461 0.274 score DATE_IN_FUTURE_06_12 3.099 3.099 2.136 1.897 score DATE_IN_FUTURE_12_24 3.300 3.299 3.000 2.189 score DATE_IN_FUTURE_24_48 3.599 2.800 3.599 3.196 score DATE_IN_FUTURE_48_96 3.199 3.182 3.199 3.199 score DATE_IN_FUTURE_96_XX 3.899 3.899 2.598 1.439 Why does the 96+ hour rule score so much lower than the 48-96 hour test for the last two entries? (I'm also wondering if there should be an even higher-scoring rule for stuff over 168 hours in the future or past.) I did dig up the following thread from back in Oct '06: http://mail-archives.apache.org/mod_mbox/spamassassin-users/200611.mbox/browser I'm guessing that what it boils down to is contained in the wiki page? The spam is better off caught by another rule once network tests are allowed? Yep. Since SA is scored as a set, score stealing between rules is pretty common when there's a lot of overlap between two rules and one performs slightly better than the other. It's also possible for there to be more complicated cascades where one rule affects another, which in turn affects a third, which affects a fourth... Also, looking at the above scores, there are likely no network tests that cover the same spam as 48_96, because its score is pretty much the same with and without them. On average, the scores of all non-network spam rules should go down a little bit when the network tests are enabled, because there are more rules in the set competing for score. However, since the distribution of hits across rules is distinctly not random, you'll see a lot of non-average cases, which means some rules will be:
- staying the same, because they cover mail the network tests don't
- going down radically, due to heavy overlap
- going up, because they correct false negatives caused by some of the negative-scoring network tests.
http://wiki.apache.org/spamassassin/HowScoresAreAssigned
Re: Clear Database Question
Jason Carson wrote: Hello everyone, Is it necessary to clear the database... sa-learn --clear ...before I run the following to train SpamAssassin's bayesian classifier... sa-learn --spam /home/jason/.maildir/.Spam/cur/ No, that would be ill-advised. Running --clear deletes your entire bayes database, which can take a long time to recover from. I would only advise using it if you've decided all your previous training is worthless, or your database has become corrupted. Also be sure to consider that once you clear the database, SA will deactivate bayes until 200 spam and 200 nonspam messages have been trained. SpamAssassin will automatically make room when it needs to by pushing out the least popular tokens through the expire process (which you can manually trigger via the sa-learn --force-expire command, though it normally checks during message processing, roughly twice a day).
Re: Language detection in TextCat
Marc Perkel wrote: I'm wondering if the language detection in TextCat can be improved. Here's the situation. It appears that TextCat was designed to be inclusive: you list the languages you want, and it returns many possibilities so as not to trigger false positives. What I'm doing is extracting the language list for Exim, where I hope to offer a language reject list. The problem is that when you are rejecting languages you want a smaller list than when you are including languages to avoid false positives. I'd rather have a single (non-English) result. I'm wondering if there's a way to add some more options to alter the behavior of the plugin so it is more optimized towards the idea of rejecting languages? The language detection would have to be radically redesigned to have enough accuracy to support this. Currently TextCat is a *very* crude match, and will often return multiple languages for plain English text. TextCat is not designed to decide what language the email is in, but to find a set of languages it *might* be in. It is very prone to declaring extra languages that are not really present, due to its design. This is useful in the "if it can't be my language, then it's garbage" sense, but not so useful for "reject if it could be this language I don't like". You'd really want "reject if it *IS* this language I don't like", but TextCat doesn't tell you what language an email is, only a set of what it might be.
Re: Possible to whitelist *all* incoming emails that contain specific text in the subject line?
nathang wrote: Hi, I'd like to set up an email account in cPanel so that I receive *all* incoming emails that contain a specific word in the subject line. It would be critical that I get 100% of the emails sent to me (that contain a specific word in the subject line), and that none of them get trapped by a spam filter or whatnot, as these emails would signify my paying customers with their order details. I know that you can whitelist individual email addresses, but is it possible to whitelist based on subject line text? If this is possible to do in cPanel / WHM, how would I go about doing it? Thanks! Assuming a non-ancient version of SA (3.1.0 or higher), the whitelist_subject plugin should be loaded. So you can just add this to your configuration (i.e.: local.cf): whitelist_subject customer which would whitelist any email with the word customer in the subject. As for doing it in cPanel / WHM... no clue, I've never used either tool.
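A sketch of what that looks like in config form. Two assumptions here worth double-checking against the plugin's own documentation: on some installs the WhiteListSubject plugin is commented out in the .pre files and must be enabled first, and the pattern may be matched as a glob, in which case surrounding wildcards are the safer form:

```
# In init.pre or v310.pre (uncomment or add if not already loaded):
loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject

# In local.cf:
whitelist_subject *customer*
```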
Re: Sharing and merging bayes data?
On 12/17/2009 2:50 AM, Rajkumar S wrote: Hello, I have 2 SA servers running for a single domain. Both were primed with a set of 200 spam and ham messages and are now auto-learning. After about a day, both have auto-learned different numbers of ham and spam mails. Is it possible to merge the bayes data every night and update both servers with the new merged data? with regards, raj No. If you're using file-based bayes, there's no good way to share updates between one DB and the other. The information needed to make such a merger successful isn't stored, because it is not needed for any reason within SpamAssassin. The database merely stores the token, its spam count, its nonspam count, and a last-seen timestamp. If you look at the same token in 2 different databases, you can't really merge these counts, because you don't know how many occurred since your last merge. If you really want common bayes data between two servers, you should configure bayes to use a SQL server (MySQL, etc.) and point both SpamAssassin configurations to the same database. This also has the benefit that both servers are continuously in sync.
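The SQL setup amounts to a few local.cf lines, identical on both servers. The DSN, username, and password below are placeholders; the schema setup is described in the sql/ README files shipped with the SpamAssassin distribution:

```
bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn       DBI:mysql:bayes:db.example.com
bayes_sql_username  sa_user
bayes_sql_password  secret
```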
Re: Sharing and merging bayes data?
On 12/17/2009 11:17 AM, RW wrote: If you're using file-based bayes, there's no good way to share updates between one DB and the other. The information needed to make such a merger successful isn't stored, because it is not needed for any reason within SpamAssassin. The database merely stores the token, its spam count, its nonspam count, and a last-seen timestamp. If you look at the same token in 2 different databases, you can't really merge these counts, because you don't know how many occurred since your last merge. I'm not saying it's a good idea, but it is possible, provided that you retained the result of the previous merge. It should be simple to script too. Agreed. I didn't mean to say that a merge is impossible; it's just not possible with the tools that SA comes with, and you need more info than just what's in the current database. As you mentioned, you'd need a custom script (not wildly complicated for a good perl scripter, but beyond the bounds of someone with only crude scripting skills) as well as historical copies of each database from the last merge. Setting up SQL would be much easier.
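The arithmetic behind such a three-way merge is simple: each server's count equals the last merged count plus whatever it learned since, so the combined count is A plus B minus the shared base. A sketch in Python, purely to illustrate the math -- a real script would have to read and write SA's actual Berkeley DB files (e.g. with perl's DB_File), which this does not attempt:

```python
def merge_bayes(db_a, db_b, last_merge):
    """Three-way merge of two token -> (spam_count, ham_count) maps,
    given the retained result of the previous merge as the base."""
    merged = {}
    for token in set(db_a) | set(db_b):
        a_spam, a_ham = db_a.get(token, (0, 0))
        b_spam, b_ham = db_b.get(token, (0, 0))
        base_spam, base_ham = last_merge.get(token, (0, 0))
        # a = base + a's new learnings, b = base + b's new learnings,
        # so a + b - base counts each side's growth exactly once.
        merged[token] = (a_spam + b_spam - base_spam,
                         a_ham + b_ham - base_ham)
    return merged

# Example: both servers started from the same base, then learned
# independently.
base = {"viagra": (4, 0), "lunch": (0, 3)}
server_a = {"viagra": (5, 0), "lunch": (0, 3)}   # learned 1 more spam
server_b = {"viagra": (7, 0), "lunch": (0, 3),
            "meeting": (0, 2)}                   # 3 more spam, 1 new token
merged = merge_bayes(server_a, server_b, base)
```

The catch RW points out is exactly the `last_merge` argument: without a retained copy of the previous merge result, the base counts are unknown and the merge is ambiguous.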
Re: spamassassin or spamd with amavisd-new?
On 1/5/2010 6:09 AM, Angel L. Mateo wrote: Hello, Because of the FH_DATE_PAST_20XX bug, I have found that when I run SpamAssassin through amavisd-new (in a postfix server) I need to restart spamassassin and amavisd-new after any change in spamassassin. Debugging this, I found that amavisd-new doesn't connect to my spamd daemon to check mails, so I think it is using the spamassassin command instead of spamc (I have spamd running in the foreground, without the -d option, and I haven't seen any connection). However, I have read that spamc has better performance than spamassassin, so I would like amavisd-new to use spamc instead of spamassassin. I don't know much of the amavisd-new and spamassassin implementation details, but I have found that amavisd-new connects with SpamAssassin through its perl interface by creating a SpamAssassin object like this: my($spamassassin_obj) = Mail::SpamAssassin->new({ debug => $sa_debug, save_pattern_hits => $sa_debug, dont_copy_prefs => 1, local_tests_only => $sa_local_tests_only, home_dir_for_helpers => $helpers_home, stop_at_threshold => 0, }); Do you know if there is any option to tell the perl object to use the spamd daemon? Is there any way to use the spamd daemon with amavis? Is it worth it in a mail gateway with huge loads? Stop, you do NOT need to do this. It would be slower. Amavisd-new does not use the spamassassin command-line application (which is really slow); it is loading the perl API directly and re-using that API instance, which is even more efficient than spamc. You don't see the perl API method discussed very often because it only makes sense when using an integration tool written in perl (which amavis is). In effect, amavisd-new is already its own spamd daemon using this method. Invoking spamc on the command line would add more overhead to this process. Really, all spamd does is create a reusable instance of a Mail::SpamAssassin perl object, and keep it loaded so it can process the messages that spamc feeds it.
This is exactly what amavisd-new is already doing internal to its own code, so it doesn't need spamd. Running spamassassin on the command line is really slow, because it creates a new Mail::SpamAssassin object, scans a single message, and exits. This is great for quick checks of the configuration, but not at all efficient in a mailstream. However, amavisd-new does not do this. It creates and re-uses a Mail::SpamAssassin object. Read the main page of the amavisd website, http://www.ijs.si/software/amavisd/ which points out: when configured to call /Mail::SpamAssassin/ (this is optional), it orders SA to pre-load its config files and to precompile the patterns, so performance is at least as good as with a spamc/spamd setup. All Perl modules are pre-loaded by a parent process at startup time, so forked children need not re-compile the code, and can hopefully share some memory for compiled code.
Re: ALL_TRUSTED rule no longer working
On 1/5/2010 8:03 PM, Julian Yap wrote: Previously I was running SpamAssassin-3.1.8_1 on FreeBSD. I recently upgraded to 3.2.5_4. It seems now I never get any hits on the rule ALL_TRUSTED. Previously it seemed like SA was doing some kind of dynamic evaluation which was working well. - Julian Is NO_RELAYS or UNPARSEABLE_RELAY also hitting? In older versions of SA, ALL_TRUSTED was really implemented as "no untrusted", so it would fire off if there were no relays, or no parseable ones. This caused problems with ALL_TRUSTED matching spam when people ran SA on servers with malformed headers. Later we changed it to fire only if there is:
- at least one trusted relay
- no untrusted relays
- no unparseable relays
Which might be the cause of your problem.
Re: ALL_TRUSTED rule no longer working
On 1/6/2010 3:43 PM, Julian Yap wrote: On Tue, Jan 5, 2010 at 5:12 PM, Matt Kettler mkettler...@verizon.net mailto:mkettler...@verizon.net wrote: On 1/5/2010 8:03 PM, Julian Yap wrote: Previously I was running SpamAssassin-3.1.8_1 on FreeBSD. I recently upgraded to 3.2.5_4. It seems now I never get any hits on the rule ALL_TRUSTED. Previously it seemed like SA was doing some kind of dynamic evaluation which was working well. - Julian Is NO_RELAYS or UNPARSEABLE_RELAY also hitting? In older versions of SA, ALL_TRUSTED was really implemented as "no untrusted", so it would fire off if there were no relays, or no parseable ones. This caused problems with ALL_TRUSTED matching spam when people ran SA on servers with malformed headers. Later we changed it to fire only if there is:
- at least one trusted relay
- no untrusted relays
- no unparseable relays
Which might be the cause of your problem. NO_RELAYS gets no hits but UNPARSEABLE_RELAY is working. Should I be getting some hits on NO_RELAYS? Thanks for the further explanation. - Julian Neither of these rules should *EVER* fire. They both indicate error conditions.