Forwarded spam
I'm trying to improve the effectiveness of a spamassassin installation, and there's one user who gets a lot of spam that is forwarded from another address, which effectively kills the network tests and in some cases messes with the BAYES score as well. I want to get rid of it.

My solution to the problem was originally to add the forwarding MTAs to trusted_networks (seems ironic, but I think this is appropriate). Unfortunately, this doesn't work, because the headers look like this (with apologies for the munging, but it's not my e-mail):

Received: from mta3.iomartmail.com ([62.128.193.153])
    by smtp.DOMAIN.com with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
    (Exim 4.69) (envelope-from [EMAIL PROTECTED])
    id 1KOUZB-0001Xq-Eb for [EMAIL PROTECTED];
    Thu, 31 Jul 2008 10:35:29 +0100
Received: from mta3.iomartmail.com (localhost.localdomain [127.0.0.1])
    by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with ESMTP
    id m6V9ZOVc018574 for [EMAIL PROTECTED];
    Thu, 31 Jul 2008 10:35:24 +0100
Received: from p548AAE80.dip0.t-ipconnect.de (p548AB09B.dip0.t-ipconnect.de [84.138.176.155])
    by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with SMTP
    id m6V9ZNUK018506 for [EMAIL PROTECTED];
    Thu, 31 Jul 2008 10:35:24 +0100

[EMAIL PROTECTED] is the original address, which is handled by mta[X].iomartmail.com, and it's forwarded to [EMAIL PROTECTED], which is handled by smtp.DOMAIN.com.

I can put 62.128.193.153 into trusted_networks, which should make spamassassin look at the next header back, but that's another iomartmail.com machine (presumably a virus/spam checker), and I'm fairly sure adding 127.0.0.1 to trusted_networks would be a mistake.

Question one: Is there a way of getting the network tests working on these forwarded e-mails?

My next idea is just to add a load of score to messages to ORIGINALDOMAIN.com. Looking in the wiki at http://wiki.apache.org/spamassassin/WritingRules#head-36104467608e64f77e1878ec3201073b8180c728 I see this:

===
Checking the From: line, or any other header, works much the same:

header LOCAL_DEMONSTRATION_FROM From =~ /test\.com/i
score LOCAL_DEMONSTRATION_FROM 0.1

Now, that rule is pretty silly, as it doesn't do much that a blacklist_from can't.
===

What I want to do is blacklist_to [EMAIL PROTECTED], but with a score of 3 (i.e., it's not really a blacklisting). The quote above seems to suggest I can do that, but I can't see it in the docs.

Question two: is it possible to set a score on a blacklisted address?

Finally, I can use header ToCC, and that'll probably do, but I wanted to know if there's a better way.

Thanks,
Chris
Re: Forwarded spam
* Matt Kettler wrote (31/07/08 11:25): Chris Lear wrote: I'm trying to improve the effectiveness of a spamassassin installation, and there's one user who gets a lot of spam that is forwarded from another address, which effectively kills the network tests and in some cases messes with the BAYES score as well. I want to get rid of it. My solution to the problem was originally to add the forwarding mtas to trusted_networks (seems ironic, but I think this is appropriate). Unfortunately, this doesn't work, because the headers look like this (with apologies for the munging, but it's not my e-mail): Received: from mta3.iomartmail.com ([62.128.193.153]) by smtp.DOMAIN.com with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from [EMAIL PROTECTED]) id 1KOUZB-0001Xq-Eb for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:29 +0100 Received: from mta3.iomartmail.com (localhost.localdomain [127.0.0.1]) by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with ESMTP id m6V9ZOVc018574 for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:24 +0100 Received: from p548AAE80.dip0.t-ipconnect.de (p548AB09B.dip0.t-ipconnect.de [84.138.176.155]) by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with SMTP id m6V9ZNUK018506 for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:24 +0100 [EMAIL PROTECTED] is the original address, which is handled by mta[X].iomartmail.com, and it's forwarded to [EMAIL PROTECTED], which is handled by smtp.DOMAIN.com. I can put 62.128.193.153 into trusted_networks, which should make spamassassin look at the next header back, but that's another iomartmail.com machine (presumably a virus/spam checker), and I'm fairly sure adding 127.0.0.1 to trusted_networks would be a mistake. Why would adding 127.0.0.1 to trusted_networks be a mistake? Since trust is a path this won't lead to spammers being able to forge trust, as they'd have to first get to your system from a trusted IP address. 
(or manage to do a TCP blind-spoofing attack and make it look like it came from one) OK, you've persuaded me. It seemed fishy, but I wasn't being logical. I'll do that and keep an eye on it. Don't worry - I'm not going to obsess about TCP spoofing. Question one: Is there a way of getting the network tests working on these forwarded e-mails? My next idea is just to add a load of score to messages to ORIGINALDOMAIN.com. Looking in the wiki at http://wiki.apache.org/spamassassin/WritingRules#head-36104467608e64f77e1878ec3201073b8180c728 I see this: === Checking the From: line, or any other header, works much the same: header LOCAL_DEMONSTRATION_FROM From =~ /test\.com/i score LOCAL_DEMONSTRATION_FROM 0.1 Now, that rule is pretty silly, as it doesn't do much that a blacklist_from can't. === What I want to do is blacklist_to [EMAIL PROTECTED], but with a score of 3 (ie, it's not really a blacklisting). The quote above seems to suggest I can do that, but I can't see it in the docs. Question two: is it possible to set a score on a blacklisted address? No, unless you reset the score for all blacklist_to's score USER_IN_BLACKLIST_TO 3.0 When I said it doesn't do much that a blacklist_from can't, I didn't mean to say there's nothing it can do that a blacklist_from/to can't.. there's just not much. Custom per-address scoring, using a full regex instead of a file-glob, and per-address combinations with other rules in a meta are things blacklist_from/to can't do that a rule can. Thanks. That all makes sense. I was reading too much into the remark. As a side note, in my perusal of the documentation, I didn't stumble easily on the link between the blacklist_to option and the USER_IN_BLACKLIST_TO rule. Finally, I can use header ToCC, and that'll probably do, but I wanted to know if there's a better way. That's the best way I know of. 
Also, be aware that unless your MTA drops hints about the recipient in the Received: headers with a for clause, SA won't know who the real recipient is when a message is BCC'ed. This is important, as lots of spam is effectively BCC'ed (i.e.: actual recipient is in the envelope, but not the To: or Cc:), so your ToCC may not match spam. Understood. That's part of the reason I didn't take to this solution originally. I assumed that the blacklist_to option would fetch the real recipient out of the received headers (which, as you can see above, do contain the for clause). Thanks for the help. Chris
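For readers following along, the approach the thread settles on might look roughly like this in local.cf. This is only a sketch under stated assumptions: the IP addresses are the ones from the Received headers quoted above, ORIGINALDOMAIN is the placeholder used in the thread, and the rule name LOCAL_TO_ORIGINALDOMAIN, its regex, and its description are hypothetical and do not come from any of the posters.

```
# Trust the forwarding MTA and its internal loopback hop, so that
# SpamAssassin looks further back along the Received chain.
trusted_networks 62.128.193.153
trusted_networks 127.0.0.1

# Hypothetical rule: add a moderate score to mail addressed to the
# forwarded domain, instead of a full blacklist_to.
header   LOCAL_TO_ORIGINALDOMAIN  ToCc =~ /\@ORIGINALDOMAIN\.com\b/i
describe LOCAL_TO_ORIGINALDOMAIN  Addressed to the forwarded, spam-heavy domain
score    LOCAL_TO_ORIGINALDOMAIN  3.0
```

As Matt notes, ToCc only sees the To: and Cc: headers, so spam that is effectively BCC'ed would not match this rule. Running spamassassin --lint after editing is a quick way to catch syntax mistakes.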
Re: Forwarded spam
* Matus UHLAR - fantomas wrote (31/07/08 14:07):
> On 31.07.08 11:05, Chris Lear wrote:
>> I'm trying to improve the effectiveness of a spamassassin installation, and there's one user who gets a lot of spam that is forwarded from another address, which effectively kills the network tests and in some cases messes with the BAYES score as well. I want to get rid of it.
>
> many tests (e.g. those that check for dynamic IP) use the last external IP, which means some network checks will still be killed by such a forwarder.

I seem to remember someone saying a while ago that it's not clear to the average spamassassin admin (e.g. me) which rules use trusted and which use external. Is there a place that explains it all, or is there some logic that anyone can tell me? Not crucial, but I'm interested.

> I think it's the forwarder who has to take care of spam... any further forwarding blurs the difference between ham and spam...

I agree entirely.

Chris
Re: PDF rule not matching -- split line content type?
* Jo Rhett wrote (16/08/07 07:41):
> Since nobody is paying attention

Or they're asleep. Your messages were at 23:44 and 07:41 here.

> , let me clarify. The current rule is wrong:
>
> mimeheader __TVD_MIME_ATT_AP Content-Type =~ /^application\/pdf/i
> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i
> meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
>
> This evaluates to exactly the same as this:
>
> meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY
>
> I believe that the original rule's intent was this:
>
> meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY

I don't think you're right. The rule looks like this to me:

meta TVD_PDF_FINGER01
    __TVD_MIME_CT_MM            # content-type is multi-part mixed
    __TVD_MIME_ATT_TP           # and has a text-plain part
    __TVD_MIME_ATT              # and has an attachment that is either
        __TVD_MIME_ATT_AP       #   application/pdf
        __TVD_MIME_ATT_AOPDF    #   or application/octet-stream.*.pdf
    !__TVD_BODY                 # and has no non-whitespace text content

Your rule would seem to match anything with no non-whitespace text content regardless of whether or not a pdf was attached.

I was looking into this very rule about 3 days ago, because of false positives (client mailing out auto-generated pdfs which are being rejected by messagelabs), and I found that spamassassin -D told me all I needed to know about why some e-mail hit this rule and some didn't.

Chris
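One way to make the grouping explicit is to write the attachment test as its own meta with parentheses. The following is a sketch of that reading, not the shipped rule: the LOCAL_ names, the describe text, and the token score are hypothetical, and it assumes the sub-rules __TVD_MIME_CT_MM, __TVD_MIME_ATT_TP and __TVD_BODY are already defined by the installed ruleset, as the thread implies.

```
# Either kind of PDF attachment counts (hypothetical local names).
mimeheader __LOCAL_PDF_AP     Content-Type =~ /^application\/pdf/i
mimeheader __LOCAL_PDF_AOPDF  Content-Type =~ /^application\/octet-stream.*\.pdf/i
meta       __LOCAL_PDF_ATT    (__LOCAL_PDF_AP || __LOCAL_PDF_AOPDF)

# multipart/mixed AND a text/plain part AND a PDF attachment
# AND no non-whitespace body text.
# Assumes __TVD_MIME_CT_MM, __TVD_MIME_ATT_TP and __TVD_BODY exist elsewhere.
meta       LOCAL_PDF_FINGER   (__TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __LOCAL_PDF_ATT && !__TVD_BODY)
describe   LOCAL_PDF_FINGER   Mixed mail with text part, PDF attachment, empty body
score      LOCAL_PDF_FINGER   0.001
```

Written this way, the text/plain requirement and the PDF requirement apply to different MIME parts of the same message, which is the point of disagreement in the thread. The 0.001 score only makes the rule show up in reports for testing.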
Re: PDF rule not matching -- split line content type?
Jo Rhett wrote:
> Chris Lear wrote:
>> * Jo Rhett wrote (16/08/07 07:41):
>>> Since nobody is paying attention
>>
>> Or they're asleep. Your messages were at 23:44 and 07:41 here.
>>
>>> , let me clarify. The current rule is wrong:
>>>
>>> mimeheader __TVD_MIME_ATT_AP Content-Type =~ /^application\/pdf/i
>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i
>>> meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
>>>
>>> This evaluates to exactly the same as this:
>>>
>>> meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY
>>>
>>> I believe that the original rule's intent was this:
>>>
>>> meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY
>>
>> I don't think you're right. The rule looks like this to me:
>>
>> meta TVD_PDF_FINGER01
>>     __TVD_MIME_CT_MM            # content-type is multi-part mixed
>>     __TVD_MIME_ATT_TP           # and has a text-plain part
>>     __TVD_MIME_ATT              # and has an attachment that is either
>>         __TVD_MIME_ATT_AP       #   application/pdf
>>         __TVD_MIME_ATT_AOPDF    #   or application/octet-stream.*.pdf
>>     !__TVD_BODY                 # and has no non-whitespace text content
>>
>> Your rule would seem to match anything with no non-whitespace text content regardless of whether or not a pdf was attached.
>
> I did a full analysis of why the rule is broken, line by line in the message you replied to. But I'll do it again. (dropping __TVD_MIME_ for ease of typing)
>
> ATT is a meta of ATT_AP *or* ATT_AOPDF. But the PDF_FINGER01 requires ATT_TP as well as ATT. This means that really it will only work if ATT_TP matches. If ATT_AOPDF matches then it won't match.
>
> Now go back up and read the text I quoted at the top. Because if this is the author's intent then you can shorten the rule, but I somehow don't think so.

I read it. I think you got it wrong. The author's intent seems to accord with my analysis.

>> I was looking into this very rule about 3 days ago, because of false positives (client mailing out auto-generated pdfs which are being rejected by messagelabs), and I found that spamassassin -D told me all I needed to know about why some e-mail hit this rule and some didn't.
>
> Perhaps. But maybe you have difficulty reading the line by line analysis I posted below, hm? I have ~200 messages here that are 100% spam that would match the fixed rule, which seems to be the author's intent.

As I say, I read it. It was clear from the start that you didn't understand why the rule wasn't firing (and TVD, the rule author, explained that). It also appeared to me that your rewrite of the rule was the result of a misreading of the logic (or a misunderstanding of multipart mime). I thought I could elucidate.

I stand by my comments, except that I misread your rewrite and thought it was looking only for text/plain, whereas it's looking only for pdf mime parts. Theo has explained it all now anyway, so there's no more to add.

But forgive me. I should have known better than to step in to a Jo Rhett thread. I'll try not to do it again.

Chris
Re: URIBL_BLACK matching on messages with no URLs in them...
Jo Rhett wrote:
> Note: yes, uribl has their own mailing list. That server has been down for quite some time, so I gave up and posted it here in case someone is dual listed and can fix it.
>
> There's no URL in this message. What is it mis-matching against?

This has been answered, but, if you're still interested, also see http://marc.info/?l=spamassassin-users&m=113533589419731&w=2 with details of a similar problem.

Chris
Re: Rules report
* Matt Kettler wrote (19/04/07 14:49): Matt Kettler wrote: If you try to build it off a live feed and use SA's marking as the spam criteria, your statistics are useless. Any rule with a high enough score would get perfect results.. all the mail it matched would be spam, and no nonspam. You have, essentially, created a self fulfilling prophecy. The higher-scoring a rule is, the more likely messages that match it will be tagged as spam, even if they're not really spam. Self correction. Such stats aren't useless, it depends on what you want out of them. If you want to know how accurate a particular rule is, by comparing the spam vs nonspam hit rates, those stats are useless, because of the bias. You need a manually sorted corpus to get this kind of information. If you want to see which rules are getting used a lot, vs those that are rarely getting used, these stats are quite useful. If you want a top x rules list, sa-stats can do that for you: http://www.rulesemporium.com/programs/sa-stats.txt http://www.rulesemporium.com/programs/sa-stats-1.0.txt is probably a bit better in this case. It will parse a spamd logfile and report the most-frequently used spam and nonspam rules (and you can configure how many it will list for each) The 1.0 version can do per-domain and per-user info, given a 3.1 log. Chris
Re: New stock spam (2/14/07)
* Jonathan Nichols wrote (15/02/07 05:19): Maciej Friedel wrote: On 02/14/07 Jonathan wrote: http://www.pbp.net/~jnichols/spam2.txt 0.0 BOTNET_NORDNS IP address has no PTR record 0.1 HTML_50_60 BODY: Message is 50% to 60% HTML 0.0 HTML_MESSAGE BODY: HTML included in message 1.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score: 0.5002] 5.0 BOTNET The submitting mail server looks like part of a Botnet i think botnet is a good idea maciek I thought botnet was unstable.. is it working ok now? It's not (in my experience) unstable. It's excellent. But the default score of 5 is way too high. It gets a lot of false positives, especially (again, in my experience) from small mail-order operations who don't understand dns (Exchange users, I rather uncharitably assume). I score botnet at 2 and I'm very happy with it. I reckon better network tests are the future of spam filtering, now that spammers are sending blocks of text from Harry Potter books along with undetectable URLs containing spaces etc. Chris
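Lowering the plugin's score as described is a one-line override in local.cf. A minimal sketch, assuming the rule is named BOTNET as in the report quoted above and that 2 is the value you want to try:

```
# Override the Botnet plugin's default of 5 with a gentler score.
score BOTNET 2.0
```

The override has to be read after the plugin's own .cf file for it to take effect; local.cf is normally a safe place for it.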
Re: complete false hits for BASE64 and LW_STOCK_SPAM4
* Loren Wilton wrote (08/02/07 19:46): As for LW_STOCK_SPAM4, it's being triggered by the fact that the message is base-64 encoded text AND has a Date: header that's missing a proper timezone. Apparently a batch of stock spam went out at some point with both of these abnormal features. I have to admit, it's a pretty rare combination. Date: February 6, 2007 9:52:29 AM PST That should, properly, should read something like this: Date: Wed, 06 Feb 2007 09:52:29 -0800 Actually LW_STOCK_SPAM4 was written on 02/19/2006, and is looking for a Base64 encoded message that has a valid timezone that is specifically \s\+, not an invalid time zone. Internally I have it scored at 5 points and haven't had a problem with it, but people don't send me messages from Blackberrys. I suppose a blackberry might not have a clock so send all messages as though they came from London regardless of where they are. That would somewhat surprise me, since cell phones certainly know where they are and what time it is. But if Verizon is involved then it is certainly possible that the software has been deliberately crippled in a number of ways, and creating a proper date header might be one of those deliberate malfunctions. Just to confirm that this unmodified rule does hit some legit blackberry e-mail, here's an example (apologies for the obfuscation, but I've only messed with addresses. 
It's not my e-mail): Return-path: someone's address Envelope-to: my wife Delivery-date: Wed, 07 Feb 2007 17:21:42 + Received: from smtp02.bis.eu.blackberry.com ([216.9.253.49]) by mail.barcombe.net with esmtp (Exim 4.63) (envelope-from the sender) id 1HEqUG-0008Ku-IV for my wife's address; Wed, 07 Feb 2007 17:21:41 + Message-ID: [EMAIL PROTECTED] Content-Transfer-Encoding: base64 Reply-To: the sender References: [EMAIL PROTECTED] In-Reply-To: [EMAIL PROTECTED] Sensitivity: Normal Importance: Normal To: My Wife Her address Subject: Re: 25th august From: the sender Date: Wed, 7 Feb 2007 17:22:58 + Content-Type: text/plain; charset=Windows-1252 MIME-Version: 1.0 X-AntiVirus: Clean X-Spam-Score: 2.1 X-Spam-Level: ++ X-Spam-Report: Barcombe.net spam report: Score = 2.1. Tests=BAYES_00=-2.599,LW_STOCK_SPAM4=1.66,MIME_BASE64_NO_NAME=0.224,MIME_BASE64_TEXT=1.885,NO_REAL_NAME=0.961 A bit of grepping suggests that LW_STOCK_SPAM4 has hit 5 ham and 3 spam (all scoring 20+) on that server since about November. So its usefulness is perhaps questionable. Normal disclaimer applies: this is only one low-traffic server. I live in the UK which might make the + timezone more likely. [Also see the thread Blackberry email] Chris (whose mail from blackberries has all been received OK)
Re: Techworld says 'spam shows sudden slide'?
Tony Finch wrote:
> On Thu, 11 Jan 2007, Michael Scheidell wrote:
>> I don't think I see any sudden drop, was the world's #1 spammer in that hut in fluga that got bombed last night?
>
> I haven't seen any drop recently either. For my systems (daily legit volume 300,000 and spam 10x that) the spam peak was in the first half of November and levels have been fairly constant (but with a level slightly lower than the peak) since then.

I noticed a significant (absolute) drop towards the end of November. I put it down to a change of tactics: a reduction in the number of repeat-the-same-message-with-small-differences spam. These were previously skewing our stats upwards, because effectively the same spam from the same machine was being sent ~10-15 times to the same user with small text changes (we were rate-limiting connections to reduce the SA cost). This seems to be rarer now, or maybe even abandoned as a technique by spammers.

Chris
Re: Easyjet e-mail scoring very high
* Chris Lear wrote (01/12/06 16:57): * Adam Stephens wrote (01/12/06 16:10): Chris Lear wrote: * Loren Wilton wrote (01/12/06 14:54): The html contains this sort of thing: http://www#46;easyjet#46;com/EN/Members/ Which looks like the culprit. In fact, every full stop in the html is represented as #46; for some reason. Still wondering though... how do you solve a problem like EasyJet? Sure looks like spam to me. ;-) Which also looks like just about every airline message I've seen from any airline. :-( Apparently they hired spammers to design their marketing campain mail. You could try sending to mostmaster or whatever at whichever marketing company is really sending that mail and see if you can get any attention from them. Probably not, but it might be worth trying. The trouble is, it's not marketing. It's a confirmation of a flight booking, which I paid for. The airline doesn't issue tickets. So it's something I genuinely want in my inbox. It looks like it's generated directly by the easyjet.com web server. I had some complaints about that this week; it's obviously a new issue, and it looks like it only applies to the ticket confirmations. Since people really need these booking confirmations I've whitelisted it - using a whitelist_from_rcvd rule seems to catch the booking confirmations only as the marketing material is sent from a different machine. Thanks for all the advice. I've reluctantly whitelisted them and written a polite message to [EMAIL PROTECTED] It doesn't seem to have bounced, so maybe someone will read it. I'll let you know if I get a response. Meanwhile, I suppose this is something for others to be aware of if you run an mta that rejects on high SA scores (and have users that might want to fly EasyJet). This thread is ancient now, but here's a followup: I never got a response from Easyjet, but I did get (today) a replica of the original e-mail. 
It's almost identical (same appalling html, still from savvis.net, but from a different ip), but missing a chunk of advertising (hotels, car rental, etc), and with some very slightly different wording about hand luggage. The new version hits these rules: DNS_FROM_RFC_ABUSE, FORGED_RCVD_HELO, [this is new] HTML_FONT_FACE_BAD, HTML_MESSAGE, HTML_TINY_FONT, MIME_HTML_MOSTLY, SARE_OBFU_AMP2B, SARE_SPEC_LEO_LINE03a, USER_IN_WHITELIST [because I whitelisted them] DNS_FROM_RFC_ABUSE HTML_FONT_FACE_BAD HTML_MESSAGE HTML_TINY_FONT MARKETING_PARTNERS [This has gone] MIME_HTML_MOSTLY MPART_ALT_DIFF [This has gone] SARE_OBFU_AMP2B SARE_SPEC_LEO_LINE03a Chris
Re: Botnet 0.6 plugin for Spam Assassin available
* Oliver Schulze L. wrote (18/12/06 15:42):
> Nice stats!
> How do you generate them in SA 3.1.7 ?

I use this:
http://www.rulesemporium.com/programs/sa-stats-1.0.txt

Chris

> Thanks
> Oliver
>
> Chris Lear wrote:
>> Here's some sa-stats output:
>>
>> TOP SPAM RULES FIRED
>> ----------------------------------------------------------------
>> RANK  RULE NAME            COUNT  %OFMAIL  %OFSPAM  %OFHAM
>> ----------------------------------------------------------------
>>    1  BOTNET                1381    66.37    90.86    6.44
>>    2  BAYES_99              1274    59.50    83.82    0.00
>>    3  HTML_MESSAGE          1184    75.06    77.89   68.12
>>    4  BOTNET_CLIENT         1048    50.21    68.95    4.35
>>    5  BOTNET_IPINHOSTNAME    962    45.45    63.29    1.77
>>    6  URIBL_BLACK            751    35.12    49.41    0.16
>>    7  RCVD_IN_SORBS_DUL      725    33.96    47.70    0.32
>>    8  URIBL_JP_SURBL         688    32.13    45.26    0.00
>>    9  BOTNET_CLIENTWORDS     608    29.61    40.00    4.19
>>   10  URIBL_SC_SURBL         524    24.47    34.47    0.00
Re: MSRBL
Bret Miller wrote: I'm more interested in the Image signatures it has. If they're really useful and reliable. I expect that keeping up with image spam wouldn't be very scalable, but it might at least help reduce some load (since we do virus scanning before letting Spam Assassin see a message) for whichever images are known. I ran about half a day yesterday with both images and spam signatures. Images hit a whopping 4 messages and spam hit about 40 with 3 FPs, both a very, very low percentage (way under 1%) of spam. ImageInfo does a much better job IMO. I'm using http://www.sanesecurity.com/clamav/ (on my home domain only at the moment) which saves sa some work (clamav runs before sa). About a third of the spam that was previously caught by sa is now caught by clamav instead. I tried MSRBL, but got very few hits. Sorry - no info about false positives, because anything that hits is rejected. I haven't heard from anyone, though. I'm surprised by how effective it is. Chris
Re: Botnet 0.6 plugin for Spam Assassin available
* John Rudd wrote (07/12/06 18:33):
> (I had a bout of insomnia last night, and got more done than I had pre-announced yesterday...)
>
> The next version of the Botnet plugin for Spam Assassin is ready. The install instructions are in the Botnet.txt file, and in the INSTALL file.
>
> For those who don't know what Botnet is, it's a plugin which tries to identify whether or not the message has been submitted by a botnet/spam-zombie type host by looking at its DNS characteristics (no reverse DNS, reverse DNS that doesn't resolve, or doesn't resolve back to the relay's IP, or reverse DNS that contains things that look like an ISP's client address). The places I've been using it, and the people I hear about who are using it, have seen a high degree of success.
>
> It can be downloaded from: http://people.ucsc.edu/~jrudd/spamassassin/Botnet.tar
>
> As usual, feedback, statistics, bug reports, feature suggestions, are all welcome.

I've been running the BOTNET rules for a little while now. It's the most-hit rule on the machine (above BAYES_99 even). But I get a significant number of false positives.

Here's some sa-stats output:

TOP SPAM RULES FIRED
----------------------------------------------------------------
RANK  RULE NAME            COUNT  %OFMAIL  %OFSPAM  %OFHAM
----------------------------------------------------------------
   1  BOTNET                1381    66.37    90.86    6.44
   2  BAYES_99              1274    59.50    83.82    0.00
   3  HTML_MESSAGE          1184    75.06    77.89   68.12
   4  BOTNET_CLIENT         1048    50.21    68.95    4.35
   5  BOTNET_IPINHOSTNAME    962    45.45    63.29    1.77
   6  URIBL_BLACK            751    35.12    49.41    0.16
   7  RCVD_IN_SORBS_DUL      725    33.96    47.70    0.32
   8  URIBL_JP_SURBL         688    32.13    45.26    0.00
   9  BOTNET_CLIENTWORDS     608    29.61    40.00    4.19
  10  URIBL_SC_SURBL         524    24.47    34.47    0.00

I think the default score of 5 is far too high. I'm scoring it at 2 at the moment, which seems OK.

I'd quite like to be able to give more score to BOTNET_IPINHOSTNAME than BOTNET_CLIENTWORDS, because it seems to give fewer false positives [I think this will probably improve in 0.6, though]. But this isn't a very big deal. So that's a mild vote against the __ prefix.

I added p0f to my arsenal recently, hoping it would work to lower the false-positive rate of BOTNET by checking for Windows machines, but it seems that almost all the BOTNET false positives are Exchange servers, so p0f aggravates rather than mitigates that.

Hope this feedback is useful. Thanks for the plugin. I take the view that network tests and RBLs (especially URIBLs), rather than body checks, are the best long-term spam-fighting tools.

Chris
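Weighting the sub-rules differently, as wished for above, is only possible while they keep scoreable names (no __ prefix). A sketch of what that could look like in local.cf, assuming the sub-rule names shown in the sa-stats output; the numbers are purely illustrative and are not taken from the thread:

```
# Keep the combined rule modest...
score BOTNET               1.0

# ...and let the sub-rule with fewer false positives count for more.
score BOTNET_IPINHOSTNAME  1.5
score BOTNET_CLIENTWORDS   0.5
```

Scores are additive, so a message that hits BOTNET and both sub-rules would collect 3.0 in total with these values; that is worth bearing in mind when picking numbers.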
Re: SV: Help with understanding a rule
* [EMAIL PROTECTED] wrote (07/12/06 12:03):
>> The list managers are the first ones who have to change.
>
> Yes, you are probably right. But: there must be a reason why the rule no_real_name exists? And if there is a rule (written or not) that From: headers should contain a real name, I want to follow it. And to follow it I need to convince my IT staff somehow... So, what is the reason behind no_real_name?

Most MUAs, most of the time, put a real name into mail they send. It's standard setup. So not having a real name is, perhaps, a spam sign.

This isn't the same as contravening RFCs. Remember that there's a rule called HTML_MESSAGE as well, which might be a spam sign. Both of these are bound to hit ham a lot of the time, so scoring them high would be, at best, an unusual decision. Scoring them high enough to reject would be very unusual.

As it happens, on a server I manage NO_REAL_NAME hits 5% of spam, and 25% of ham (much of which is not MUA-originated). So it's not a rule I'd like to reject on. But if a mailing list or a user has a "you must provide a real name" policy, spamassassin's flexible enough to be able to enforce it.

Chris
Easyjet e-mail scoring very high
I got an EasyJet confirmation E-mail that scored like this: BAYES_00=-2.599 DNS_FROM_RFC_ABUSE=0.2 FORGED_RCVD_HELO=0.135 HTML_FONT_FACE_BAD=0.156 HTML_MESSAGE=0.001 HTML_TINY_FONT=2.324 MARKETING_PARTNERS=1.765 MIME_HTML_MOSTLY=1.102 SARE_OBFU_AMP2B=2.555 SARE_SPEC_LEO_LINE03a=0.408 Which adds to 6.0, and only the Bayes score stopped it being rejected (I'm rejecting at 6.5). [SA 3.1.3 with recent sa-update+SARE rules] What's the recommended practice here? Whitelist? Lower the SARE scores? Remove some less-safe SARE rules? Lower the HTML_TINY_FONT score [which looks right, but if it's right for me, why not everyone else]? I'd like all ham to score under 2, ideally. And almost all of it does. But I'd prefer not to whitelist if possible. I like to feel I can trust SA without introducing special cases. Here are the received headers: Received: from s217124rg180-p.uklond6.savvis.net ([213.174.202.180] helo=easyjet.com) by mail.barcombe.net with esmtp (Exim 4.60) (envelope-from [EMAIL PROTECTED]) id 1GpoFF-0007fV-Ne for [EMAIL PROTECTED]; Thu, 30 Nov 2006 15:54:47 + Received: from mail pickup service by easyjet.com with Microsoft SMTPSVC; Thu, 30 Nov 2006 15:54:50 + I think the Received: from mail pickup service line is causing the SARE_OBFU_AMP2B rule to fire. Am I right? If so, isn't this likely to be a reasonably common cause of false positives? Chris
Re: Easyjet e-mail scoring very high
* Loren Wilton wrote (01/12/06 13:57):
>> HTML_FONT_FACE_BAD=0.156 HTML_MESSAGE=0.001 HTML_TINY_FONT=2.324 MARKETING_PARTNERS=1.765 MIME_HTML_MOSTLY=1.102 SARE_OBFU_AMP2B=2.555 SARE_SPEC_LEO_LINE03a=0.408
>>
>> I think the Received: from mail pickup service line is causing the SARE_OBFU_AMP2B rule to fire. Am I right? If so, isn't this likely to be
>
> Nope. All of the rules above are effectively body rules, dealing mostly with various forms of HTML obfuscation.

Thanks for pointing that out. I was being rather dim. The html contains this sort of thing:

http://www&#46;easyjet&#46;com/EN/Members/

Which looks like the culprit. In fact, every full stop in the html is represented as &#46; for some reason.

Still wondering though... how do you solve a problem like EasyJet?

Chris
Re: Easyjet e-mail scoring very high
* Loren Wilton wrote (01/12/06 14:54):
>> The html contains this sort of thing:
>>
>> http://www&#46;easyjet&#46;com/EN/Members/
>>
>> Which looks like the culprit. In fact, every full stop in the html is represented as &#46; for some reason.
>>
>> Still wondering though... how do you solve a problem like EasyJet?
>
> Sure looks like spam to me. ;-) Which also looks like just about every airline message I've seen from any airline. :-(
>
> Apparently they hired spammers to design their marketing campaign mail. You could try sending to postmaster or whatever at whichever marketing company is really sending that mail and see if you can get any attention from them. Probably not, but it might be worth trying.

The trouble is, it's not marketing. It's a confirmation of a flight booking, which I paid for. The airline doesn't issue tickets. So it's something I genuinely want in my inbox. It looks like it's generated directly by the easyjet.com web server.
Re: Easyjet e-mail scoring very high
* Adam Stephens wrote (01/12/06 16:10):
> Chris Lear wrote:
>> * Loren Wilton wrote (01/12/06 14:54):
>>>> The html contains this sort of thing:
>>>>
>>>> http://www&#46;easyjet&#46;com/EN/Members/
>>>>
>>>> Which looks like the culprit. In fact, every full stop in the html is represented as &#46; for some reason.
>>>>
>>>> Still wondering though... how do you solve a problem like EasyJet?
>>>
>>> Sure looks like spam to me. ;-) Which also looks like just about every airline message I've seen from any airline. :-(
>>>
>>> Apparently they hired spammers to design their marketing campaign mail. You could try sending to postmaster or whatever at whichever marketing company is really sending that mail and see if you can get any attention from them. Probably not, but it might be worth trying.
>>
>> The trouble is, it's not marketing. It's a confirmation of a flight booking, which I paid for. The airline doesn't issue tickets. So it's something I genuinely want in my inbox. It looks like it's generated directly by the easyjet.com web server.
>
> I had some complaints about that this week; it's obviously a new issue, and it looks like it only applies to the ticket confirmations. Since people really need these booking confirmations I've whitelisted it - using a whitelist_from_rcvd rule seems to catch the booking confirmations only as the marketing material is sent from a different machine.

Thanks for all the advice. I've reluctantly whitelisted them and written a polite message to [EMAIL PROTECTED] It doesn't seem to have bounced, so maybe someone will read it. I'll let you know if I get a response.

Meanwhile, I suppose this is something for others to be aware of if you run an MTA that rejects on high SA scores (and have users that might want to fly EasyJet).

Chris
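The whitelist_from_rcvd approach Adam describes ties a sender address to the reverse DNS of the relay that handed the message over. A sketch of what such an entry might look like, assuming the relay's rDNS ends in savvis.net as in the Received header quoted earlier in the thread; the sender pattern is a guess, since the real envelope address is munged in the archive:

```
# Sender pattern is an assumption; adjust to the real From address.
# The second argument is matched against the relay's reverse DNS.
whitelist_from_rcvd *@easyjet.com savvis.net
```

This only works if trusted_networks is set up correctly, because SpamAssassin has to know which Received header is the first one it can believe. A narrower rDNS suffix would whitelist less of the hosting provider's address space.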
Re: How do I stop these?
* John Rudd wrote (20/11/06 15:46):
> John Tice wrote:
>> On Nov 20, 2006, at 10:00 AM, Nathan Zabaldo wrote:
>>> I am getting pounded by these types of emails. Does anyone else get these? What rule can I apply to have them killed. It's driving me nuts. Please help!!!
>>
>> These are scoring at about 4X my threshold without the SARE stock ruleset. You may need to tweak your scoring. I find bayes_99 to be reliable.
>>
>> FROM_LOCAL_NOVOWEL FORGED_RCVD_HELO BAYES_99 RCVD_IN_SORBS_DUL RCVD_IN_NJABL_DUL
>
> RelayCatcher is doing a fine job of keeping me from seeing most of the spam that's out there, lately. See any messages on this list with RelayCatcher in the subject. Particularly RelayCatcher 0.3 in the subject.

...or RelayChecker 0.3.

Chris
Re: Amazon / RFCI false positives
* Tony Finch wrote (05/11/06 17:43): On Sat, 4 Nov 2006, Michael Scheidell wrote:

So? Build something better. It's open source. Don't use the RFCI scores, drop them, stop bitching about something YOU can change.

Well, I've added a -2 for email from Amazon, but I thought other people might like a warning.

Thanks. Warning appreciated. I think that the people who made derogatory claims about Tony's logic, or claimed that "you don't understand", had failed to appreciate what "These messages are wanted by their recipients so should not be scored as spam by SpamAssassin" means. Anyone who disagrees with that piece of logic would appear to be using SpamAssassin for a purpose that its designers didn't think of. Chris
Re: Amazon / RFCI false positives
jdow wrote: From: Chris Lear [EMAIL PROTECTED] * Tony Finch wrote (05/11/06 17:43): On Sat, 4 Nov 2006, Michael Scheidell wrote:

So? Build something better. It's open source. Don't use the RFCI scores, drop them, stop bitching about something YOU can change.

Well, I've added a -2 for email from Amazon, but I thought other people might like a warning.

Thanks. Warning appreciated. I think that the people who made derogatory claims about Tony's logic, or claimed that "you don't understand", had failed to appreciate what "These messages are wanted by their recipients so should not be scored as spam by SpamAssassin" means. Anyone who disagrees with that piece of logic would appear to be using SpamAssassin for a purpose that its designers didn't think of.

Tony's phrasing implied that he thought the scoring was so wrong that it should be modified by the people who wrote the rule and ran it against mass checks. That logic is dead wrong.

That logic, right or wrong, is yours, not Tony's.

The correct phrasing might have indicated there is a problem for some sites with Amazon failing RFCi requiring a special rule to negate Amazon.com's negative scores on RFCi.

I think that the correct phrasing was exactly what was given, in that case. I understood it, anyway.

Demanding that the RFCi rules vanish into the night just is not going to fly. And it indicates flawed thought processes.

Which, again, may or may not be true, but certainly wasn't even vaguely hinted at by Tony. These flawed thought processes appear (to me, but maybe I'm unusually pedantic) to be imaginary. Chris
Re: I'm thinking about suing Microsoft
* Marc Perkel wrote (25/10/06 05:22):

Europeans have sued Microsoft many times.

For anti-competitive behaviour, maybe. For copyright infringement, perhaps. But for attracting crime? For discriminating against owners of illegal software? I hope not. If you win, of course, you might take on php, perl and other easy-to-use web scripting languages that allow people to write crime-attracting sites that are easy targets for IRC bots etc. Plenty of scope for the Perkel suing machine. Unless your real gripe is simply that Microsoft a) is successful and b) insists on licensing software. Unfortunately, neither of these things is illegal in any country as far as I can tell.

Chris Lear wrote: * Marc Perkel wrote (23/10/06 19:34):

I'm considering filing a lawsuit against Microsoft to try to get an order to make them make public security updates for Windows available to everyone, registered or not. The idea is that their product Windows creates a toxic byproduct (spam, DDoS zombies) that interferes with everyone else's internet usage and that they have a responsibility to clean it up. It would be similar to a suit where a business that is otherwise legitimate attracts crime in a neighborhood or a manufacturer dumping toxic waste into a stream. Virus-infected spam zombies are a toxic byproduct of their business model and it affects all of us and they have a duty to the public to fix it. I'm somewhat of a legal expert, not a lawyer though. But just wanted to get some feedback on the idea.

Only in America...
Re: score=0.0 tests=none -- how can that be???
* Debbie D wrote (25/10/06 04:48): Matt Kettler [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Debbie D wrote:

I'm just not getting it.. I have a whole list of custom rules, I use RulesDuJour, I have custom scores to mark stuff higher.. I have reasonable limits set.. the users do not adjust things here, I do.. I use lint when I add scores and rules.. So tell me.. how in the past week or so I have 11 mails in *my* box that show: X-Spam-Status: No, score=0.0 required=4.5 tests=none

Usually that means a timeout, or your milter was configured to skip SA for the message. How do you call SA? mimedefang? spamc call in procmail.rc?

Exim 4.52 with SA and ClamAV. I use spamc.

In that case, the header is (I'm fairly sure) not added by SA, but by exim. Try stopping spamd. Does exim still add the headers? If so, then the occasional occurrence is because spamd is overloaded. Look in the exim mail log for the mail in question. It might give the answer. Chris
Re: I'm thinking about suing Microsoft
* Marc Perkel wrote (23/10/06 19:34):

I'm considering filing a lawsuit against Microsoft to try to get an order to make them make public security updates for Windows available to everyone, registered or not. The idea is that their product Windows creates a toxic byproduct (spam, DDoS zombies) that interferes with everyone else's internet usage and that they have a responsibility to clean it up. It would be similar to a suit where a business that is otherwise legitimate attracts crime in a neighborhood or a manufacturer dumping toxic waste into a stream. Virus-infected spam zombies are a toxic byproduct of their business model and it affects all of us and they have a duty to the public to fix it. I'm somewhat of a legal expert, not a lawyer though. But just wanted to get some feedback on the idea.

Only in America...
Re: Psst!
* Chris Santerre wrote (20/10/06 15:30): -Original Message- From: David B Funk [mailto:[EMAIL PROTECTED] Sent: Friday, October 20, 2006 1:20 AM To: users@spamassassin.apache.org Subject: Re: Psst! On Thu, 19 Oct 2006, Matt Kettler wrote:

Another thing I've been noticing recently.. some idiot has been culling the web archives of mailing lists, and is trying to send spam emails to MESSAGE IDs of posts I've made. Check your mail logs! One or more of those would make a great spamtrap.

Actually this kind of thing has been going on for some time. I still occasionally see spam sent to a Message-ID address derived from a machine that died years ago. The last owner of it was an active Usenet poster and is probably in all kinds of news archives.

Just curious, but how many people see spam being sent to usernames with the first letter dropped? I see a ton in my logs. I believe spammers figure [EMAIL PROTECTED] will also have a [EMAIL PROTECTED] Too bad for them...they do not. :)

Loads. Also with a variety of other manglings. One local part is dwoodhouse, and some rejected variations are: 8jwoodhouse 8odhouse dhouse oodhouse woodhousejwoodhouse ydoodhouse. I can't see why they bother. Or maybe the address harvester is broken.
Re: ALL_TRUSTED creating a problem
* Jo Rhett wrote (19/10/06 08:55): Mark wrote:

We cannot really say SA's autodetection is broken, because SA is designed to be called post-SMTP. Nor that a milter is broken per se for not adding a Received: header, as that is the responsibility of the MTA itself. But a milter using SA *can* be said to be broken if it's not providing SA with the required post-SMTP view of things. Instead of patching SA, or trying to fix it even, any milter using SA should simply DTRT (Do The Right Thing), which is: add a pseudo Received: header before handing it over to SA.

Y'all are way behind the boat. We've already patched it to support the undocumented requirement. That's not an issue. Perhaps SA being focused on post-SMTP is the problem here. Why is this the focus? In the modern world, you want to reject during SMTP, not send backscatter to the poor folks whose e-mail got forged. Frankly, a milter environment is the only possible right way to run SA. So why the constant comments as if this is some one-off weird config?

Frankly, anyone who considers the way they do things to be the only possible right way is in danger of being Just Plain Wrong. [further spleen-venting withheld]
Re: SA 3.1.7 children hang but don't die
* David B Funk wrote (19/10/06 03:47): On Wed, 18 Oct 2006, Sandy S wrote:

Daryl - I switched back to 3.1.5 after my last post, and am sorry to report that I'm still seeing the same issue under 3.1.5. After running a while, the processes in a state of K start building up until I manually kill them. Regretfully (VERY regretfully) turning off FuzzyOCR. Sandy

I'll second this: SA 3.1.5 and FuzzyOCR on RHEL-AS4. I've been seeing this off and on ever since I added FuzzyOCR. Logs seem to correlate to FuzzyOCR processing a gif image during a peak of messages. Get FuzzyOcr.log message: FuzzyOcr received timeout after running 10 seconds.

I'm running SA 3.1.5 with FuzzyOCR. I'm seeing errors in the FuzzyOcr log, like this:

[2006-10-18 09:34:24] FuzzyOcr received timeout after running 10 seconds.
[2006-10-18 09:49:14] FuzzyOcr received timeout after running 10 seconds.
[2006-10-18 10:09:26] Unexpected error in pipe to external programs. Please check that all helper programs are installed and in the correct path. (Pipe Command /usr/bin/gifasm -d /tmp/.spamassassin2589Eye8ALtmp/out, Pipe exit code 1 (), Temporary file: /tmp/.spamassassin25893ZSX3Ltmp)

But I'm no longer getting children in the K state, since I put a spamd restart into the logrotate script. I haven't turned off FuzzyOCR, which is doing an excellent job for me. This isn't particularly conclusive, I'm afraid, because when I was seeing the problem it was sporadic and occasional, so it might just be luck, though it's been OK for a few days. Chris
Re: tmp files being left over from FuzzyOCR?
* Bill wrote (19/10/06 14:03):

Since I installed FuzzyOCR I've noticed I'm having a lot of files named similar to .spamassassin8932mZBFrtmp left in my /tmp folder. These are from FuzzyOCR, correct? The content of these files has lots of spaces, hyphens, commas with a few readable words and the word picture a few times. Is there something I need to do to ensure these files are removed? After I manually remove them I see new tmp files being created and removed but sometimes a file is NOT removed.

I suspect that if you look in your FuzzyOCR log, you will find errors that match the unremoved temp files. Eg from my FuzzyOCR.log:

[2006-10-18 10:10:47] Unexpected error in pipe to external programs. Please check that all helper programs are installed and in the correct path. (Pipe Command /usr/bin/gifasm -d /tmp/.spamassassin2591CHsvrEtmp/out, Pipe exit code 1 (), Temporary file: /tmp/.spamassassin2591dNqOn7tmp)

I see that /tmp/.spamassassin2591CHsvrEtmp/ is still there, but /tmp/.spamassassin2591dNqOn7tmp isn't. And another example:

[2006-10-18 09:34:24] FuzzyOcr received timeout after running 10 seconds.

#ls -l /tmp/.spamassassin* | grep 09:34
-rw------- 1 spamd users     0 Oct 18 09:34 /tmp/.spamassassin2589Wc3z7Gtmp
-rw------- 1 spamd users 23579 Oct 18 09:34 /tmp/.spamassassin2589yvpP1Htmp

Looks like when gifasm fails, you get a dir left over. If there's a timeout, you get a file left over. Chris
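The matching of leftover temp files against log timeouts described above can be sketched in Python. This is only a rough illustration: it assumes the log line format shown in the excerpts, and the function names and the match-on-the-same-minute rule are my own invention, not anything FuzzyOcr provides.

```python
import re
from datetime import datetime

# Matches log lines such as:
# [2006-10-18 09:34:24] FuzzyOcr received timeout after running 10 seconds.
TIMEOUT_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\] FuzzyOcr received timeout"
)


def timeout_minutes(log_lines):
    """Return the set of 'YYYY-MM-DD HH:MM' minutes in which a timeout was logged."""
    minutes = set()
    for line in log_lines:
        match = TIMEOUT_RE.match(line)
        if match:
            minutes.add(match.group(1))
    return minutes


def leftovers_explained_by_timeouts(files, log_lines):
    """files is an iterable of (path, mtime) pairs, with mtime a datetime.

    Returns the paths whose modification minute coincides with a logged timeout.
    """
    minutes = timeout_minutes(log_lines)
    return [
        path
        for path, mtime in files
        if mtime.strftime("%Y-%m-%d %H:%M") in minutes
    ]
```

In practice the (path, mtime) pairs would come from something like os.stat() on the /tmp/.spamassassin* files; a file that matches no timeout minute would need another explanation, such as the gifasm pipe error.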
Re: tmp files being left over from FuzzyOCR?
* Bill wrote (19/10/06 15:29):

I'm using FuzzyOcr-2.3b and I can't find any reference to this option in any of the FuzzyOCR software I downloaded. focr_keep_bad_images 0 Here's a sample of the items in my /tmp folder. You said yours were folders, mine aren't. All of these files are left behind, as at the time I made this sample it was 9:25.

Look in your FuzzyOCR log. If it's like mine, you will see timeouts like this:

[2006-10-18 09:49:14] FuzzyOcr received timeout after running 10 seconds.

If the times on these timeouts match the times on the temp files, then that's what's causing them. That logic works for what I'm seeing.

=== CIRCULAR 230 DISCLOSURE: Pursuant to Regulations Governing Practice Before the Internal Revenue Service, any tax advice contained herein is not intended or written to be used and cannot be used by a taxpayer for the purpose of avoiding tax penalties that may be imposed on the taxpayer. ===

Shame. I was hoping to get out of paying some tax.

CONFIDENTIALITY NOTICE: This electronic mail message and any attached files contain information intended for the exclusive use of the individual or entity to whom it is addressed and may contain information that is proprietary, privileged, confidential and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any viewing, copying, disclosure or distribution of this information may be subject to legal restriction or sanction. Please notify the sender, by electronic mail or telephone, of any unintended recipients and delete the original message without making any copies.

I hope I was the intended recipient, but I'm not sure how I can know.
Re: Spamd not killing children
* Chris Lear wrote (16/10/06 10:32): The problem I'm having is that spamd doesn't seem to be able to clean up unwanted idle child processes. [...] I've had a look in the spamd code, and I'm now wondering whether my problem is related to logging bugs (eg http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4237). I've set logrotate to restart spamd after syslog restarts as per the advice in http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4316. Hopefully this will fix it. I'm still unsure whether this is a spamd bug or not. Chris
Spamd not killing children
Subject sounds unpleasantly like incitement to filicide, for which I apologise.

The problem I'm having is that spamd doesn't seem to be able to clean up unwanted idle child processes. Here's the logfile evidence:

Oct 16 00:12:59 marvin spamd[6351]: prefork: child states: III
Oct 16 00:13:09 marvin spamd[18043]: spamd: connection from localhost [127.0.0.1] at port 35720
Oct 16 00:13:09 marvin spamd[18043]: spamd: setuid to spamd succeeded
Oct 16 00:13:09 marvin spamd[18043]: spamd: checking message [EMAIL PROTECTED] for spamd:210
Oct 16 00:13:12 marvin spamd[25627]: spamd: connection from localhost [127.0.0.1] at port 35722
Oct 16 00:13:12 marvin spamd[25627]: spamd: setuid to spamd succeeded
Oct 16 00:13:12 marvin spamd[25627]: spamd: checking message [EMAIL PROTECTED] for spamd:210
Oct 16 00:13:14 marvin spamd[18043]: spamd: identified spam (29.7/5.0) for spamd:210 in 5.3 seconds, 1545 bytes.
Oct 16 00:13:14 marvin spamd[18043]: spamd: result: Y 29 - BAYES_99,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL scantime=5.3,size=1545,user=spamd,uid=210,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=35720,mid=[EMAIL PROTECTED],bayes=0.891,autolearn=spam
Oct 16 00:13:15 marvin spamd[6351]: prefork: child states: IBK
                                                            -^
[...] Time passes, and spamd continues to work [...]
Oct 16 10:18:00 marvin spamd[6351]: prefork: child states: IIKK
                                                            -^^

spamd seems to be trying to kill child processes to get the number of threads down to 2. But for some (apparently unreported) reason the threads don't die, and the server is slowly collecting children marked as K. I recently upgraded spamassassin to 3.1.5, and I also installed FuzzyOcr, which I suspect might be part of the problem.

Can anyone tell me
a) what logs to look in to work out why this has happened? (I've looked in the FuzzyOcr log, which does show some errors and timeouts, but apparently none at relevant times),
b) whether there's anything I can do about it (I'll start by disabling FuzzyOcr, but I'd like to use it), or
c) whether there's a spamassassin bug?

I looked at the code in SpamdForkScaling.pm, and I see that there are 2 places where child processes are killed. In one place (sub child_error_kill, line 134), there is a warn line if the kill fails. In the other (sub need_to_del_server, line 732) there isn't. Chris
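For anyone wanting to watch for the same build-up, here is a small Python sketch that pulls the "prefork: child states" strings out of a spamd log and counts the children shown as K. It assumes only the log format in the excerpt above; the function name is made up for illustration.

```python
import re

# Matches the state summary spamd logs, e.g. "prefork: child states: IIKK"
STATES_RE = re.compile(r"prefork: child states: ([A-Z]+)")


def killed_counts(log_lines):
    """Return (state_string, number_of_K_children) for each child-states line."""
    results = []
    for line in log_lines:
        match = STATES_RE.search(line)
        if match:
            states = match.group(1)
            results.append((states, states.count("K")))
    return results
```

A K count that only ever grows over the life of the parent process would correspond to the symptom described here.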
Re: DEAR_SOMETHING rule scoring issue
* Gregory T Pelle wrote (09/08/06 15:14): What is the procedure to have a rule score reviewed? I have been looking over the scoring for version 3.1.x at http://spamassassin.apache.org/tests_3_1_x.html and think that a score of 1.6 is high for the DEAR_SOMETHING rule. I know that our customer support emails have the first line as Dear customer's name It would seem to me that any business that is trying to sound professional would have emails that hit this rule. Where I work I'm always trying to persuade the people who write bulk e-mail to customers *not* to start it with Dear customer's name, because I think it does the opposite of sounding professional. But maybe it's just me. They are indeed trying to sound professional, and think that personalising the e-mail with Dear will do that, and I don't seem to win the argument. It hasn't made me lower the DEAR_SOMETHING score, though. Chris
Re: Allowing IMAP/POP to Send Email
* Marc Perkel wrote (03/08/06 14:39): Tony Finch wrote:

The reason that message submission is done with SMTP is because of the number of SMTP extensions that the MUA will want to use, in particular DSNs, deliver-by, deliver-after, message tracking, and whatever else may be invented in the future. If you want to make message submission a part of IMAP and POP then you'll have to re-do all these SMTP extensions twice, which is a colossal waste of time.

Not really - what I'm proposing is that the IMAP connection just pipe the message into an SMTP server. The IMAP is acting only as an authenticated connection back to SMTP. I'm not suggesting replacing SMTP. What I'm suggesting is that POP/IMAP can be used as a transport to get the mail there because it's an existing connection, is already established, is already authenticated with the credentials of the email account, and it isn't a port that people would block like port 25 is. I'm not trying to replace SMTP. I'm just trying to suggest a better way for end users to get outgoing email to the SMTP server.

What if I set up an SMTP server at home behind my ADSL router, collect my vanity-domain mail there, and access it via IMAP or POP3? It seems I only have one option, which is to send my mail via IMAP to my home server. Which then sends via SMTP to... the Internet (or via a smarthost). And the home server sending via SMTP is going to look a bit like a MUA sending via SMTP. How would you tell the difference? Is a home mail server outlawed in the brave new world? Or does my SMTP server have to learn to talk IMAP to make message submissions to the ISP's server? Chris
Re: exim4 + forwarding + spamassassin
* Zinski, Steve wrote (27/07/06 02:50):

Not sure how to get exim to pass the initial scan to spamd using a different user. I've gone through my exim.conf file and changed every single user = entry to a known user and it still insists on using nobody for the first pass. Another thing that intrigues me is the wording of the log entries. In the first pass, spamd says that it's checking the message. In the second pass it says processing the message.

I think exim only puts the message through spamassassin once (then subsequently caches the result, if required), and uses the username set up in the acl:

# Reject messages with a SpamAssassin score > 7
deny message = Rejected: Flagged as spam ($spam_score).
     spam = nobody:true
            ^^ - **here**
     condition = ${if >{$spam_score_int}{70}{1}{0}}

I have a similar setup, except that I run spamc as a user called spamd. This gives site-wide bayes, and works fine. Is it possible that the second run through spamd is from you running spamc after the message is delivered? Ie, not from exim? There's an exim-users mailing list that's probably a better place for these questions. Chris

-Original Message- From: Stuart Johnston [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 26, 2006 3:05 PM To: users@spamassassin.apache.org Subject: Re: exim4 + forwarding + spamassassin

Your first scan is running as nobody (that's bad) but the second is running as szinski. That would explain the BAYES_99. I'm not sure about the FORGED_RCVD_HELO and HTML_50_60 though.

Zinski, Steve wrote:

I need some help trying to figure out why spamassassin scores the same message differently. I am using an ACL with exim4 to scan email during the actual smtp connection (so I can reject spam before my server accepts it). It's pretty straightforward. My ACL looks like this:

# Reject messages with a SpamAssassin score > 7
deny message = Rejected: Flagged as spam ($spam_score).
     spam = nobody:true
     condition = ${if >{$spam_score_int}{70}{1}{0}}

Everything works just fine for mail destined to local accounts, but there seems to be a discrepancy in spamassassin when mail is delivered to a forwarded account (the forwarder directs mail to another local account; i.e., [EMAIL PROTECTED] --> [EMAIL PROTECTED]). What happens is that spamassassin scores the message low (non-spam) when it accepts it from the Internet, but then scores it higher (as spam) when the message is rerouted to the local mailbox. Here is a snippet from maillog that illustrates this:

Jul 26 07:58:20 vps spamd[7361]: spamd: connection from localhost [127.0.0.1] at port 56458
Jul 26 07:58:20 vps spamd[7361]: spamd: setuid to nobody succeeded
Jul 26 07:58:20 vps spamd[7361]: spamd: checking message [EMAIL PROTECTED] for nobody:99
Jul 26 07:58:20 vps spamd[7361]: spamd: clean message (2.6/5.0) for nobody:99 in 0.1 seconds, 2230 bytes.
Jul 26 07:58:20 vps spamd[7361]: spamd: result: . 2 - HTML_MESSAGE,URIBL_SBL,URIBL_WS_SURBL scantime=0.1,size=2230,user=nobody,uid=99,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=56458,mid=[EMAIL PROTECTED] 8,autolearn=no
Jul 26 07:58:20 vps spamd[26587]: prefork: child states: II
Jul 26 07:58:21 vps spamd[7361]: spamd: connection from localhost [127.0.0.1] at port 56459
Jul 26 07:58:21 vps spamd[7361]: spamd: setuid to szinski succeeded
Jul 26 07:58:21 vps spamd[7361]: spamd: processing message [EMAIL PROTECTED] for szinski:503
Jul 26 07:58:21 vps spamd[7361]: spamd: identified spam (7.5/5.0) for szinski:503 in 0.6 seconds, 2183 bytes.
Jul 26 07:58:21 vps spamd[7361]: spamd: result: Y 7 - BAYES_99,FORGED_RCVD_HELO,HTML_50_60,HTML_MESSAGE,URIBL_SBL,URIBL_WS_SURBL scantime=0.6,size=2183,user=szinski,uid=503,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=56459,mid=[EMAIL PROTECTED] hn8,bayes=0.97051713734,autolearn=no

As you can see, during the initial smtp pass (accepting from remote host) the message is deemed clean with a score of 2.6. Then, when the same message is delivered to the local account, it's identified as spam with a score of 7.5. Unfortunately, my ACL only kicks in during the first pass so the message gets accepted and delivered instead of rejected. Anyone know what I might be doing wrong here? Any help would be greatly appreciated. Steve Zinski University of Richmond
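To see exactly which rules differ between the two passes, the spamd "result:" lines can be compared mechanically. The following Python sketch assumes the log format in the snippet above (with the line-wrap spaces removed); parse_result and its return shape are hypothetical helpers, not part of SpamAssassin.

```python
import re

# "spamd: result: <flag> <score> - <comma-separated tests> <key=value,...>"
RESULT_RE = re.compile(r"spamd: result: (\S) (-?\d+) - (\S*) (\S+)")


def parse_result(line):
    """Return {'user': ..., 'tests': set(...)} for a result line, else None."""
    match = RESULT_RE.search(line)
    if not match:
        return None
    tests = {name for name in match.group(3).split(",") if name}
    fields = dict(
        pair.split("=", 1) for pair in match.group(4).split(",") if "=" in pair
    )
    return {"user": fields.get("user"), "tests": tests}


first = parse_result(
    "Jul 26 07:58:20 vps spamd[7361]: spamd: result: . 2 - "
    "HTML_MESSAGE,URIBL_SBL,URIBL_WS_SURBL "
    "scantime=0.1,size=2230,user=nobody,uid=99,required_score=5.0"
)
second = parse_result(
    "Jul 26 07:58:21 vps spamd[7361]: spamd: result: Y 7 - "
    "BAYES_99,FORGED_RCVD_HELO,HTML_50_60,HTML_MESSAGE,URIBL_SBL,URIBL_WS_SURBL "
    "scantime=0.6,size=2183,user=szinski,uid=503,required_score=5.0"
)

# Rules that fired only on the second (per-user) pass
extra_on_second_pass = second["tests"] - first["tests"]
```

Here the difference comes out as BAYES_99, FORGED_RCVD_HELO and HTML_50_60, which are the three rules Stuart's reply singles out; the differing user= values (nobody versus szinski) are visible in the same parse.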
Re: The best way to use Spamassassin is to not use Spamassassin
* Marc Perkel wrote (12/07/06 18:30):

Catchy subject line eh? OK - so what I mean by this is that I now use SA for about 5% of all incoming email. The rest of the spam is rejected before I get to SA through a fairly large number of tricks that allow me to determine with near 100% accuracy things that are spam. It is done mostly through behavior and karma related lists. Being host blacklisted or URI blacklisted.

I don't know if it's relevant to Marc's point, but it seems to me that if SA was reduced to network checks only it would still be a very good blocker of spam. And perhaps what Marc is doing is, more or less, moving SA's network checks into the MTA and using them to reject rather than just score. I suppose something similar would be to score all the URIBL rules and RCVD_IN rules high, and abandon the traditional regex rules. Network checks are easily the most hit spam rules in SA anyway. Here's a bit of sa-stats for spam on a machine I look after (the MTA blocks based on sbl-xbl.spamhaus.org before anything gets to SA, so that's not represented here):

 1 BAYES_99
 2 URIBL_BLACK
 3 URIBL_SBL
 4 URIBL_JP_SURBL
 5 URIBL_OB_SURBL
 6 RCVD_IN_SORBS_DUL
 7 RCVD_IN_NJABL_DUL
 8 HTML_MESSAGE
 9 FORGED_RCVD_HELO
10 URIBL_SC_SURBL
11 URIBL_WS_SURBL
12 SARE_MLB_Stock6
13 URIBL_AB_SURBL
14 SARE_MLB_Stock1
15 STOCK_NAME_FVGT1

Of course that 5% is very important because that is where I get the data for the other tests that allow me to bypass filtering.

Even this isn't necessarily so. Data for network tests can be collected automatically, by trapping spammers who trawl the web/usenet for addresses, those who scan for open port 25s, or those who try high MXs. So at least some useful data can be collected without SA, or even human intervention.

But - I want you all to start thinking of a new way to look at spam filtering.

I'm not sure this is a new way to look at spam filtering, but I agree that content testing against regular expressions is increasingly looking like a crude and easily-outwitted technique compared to dns tests. Bayes is still good, though.
Re: sa-learn script
* Nicholas Payne-Roberts wrote (11/07/06 11:58):

Does anybody know a good way to script sa-learn to daily check on junk e-mail folders? I'm currently trying the following line in a cron.daily script, but it's throwing up an error:

find /home/vpopmail/domains -name ".Junk E-mail" -exec sa-learn --showdots --spam cur {} \;

Your -exec subcommand is the problem. The {} expands to the full path of the found file. It doesn't change directory. A version that might work is

find /home/vpopmail/domains -name ".Junk E-mail" -exec sa-learn --showdots --spam {}/cur \;

There's not much point using --showdots in cron, I would have thought, but it's probably useful for testing. To make sure your find command is right, you can do something like this:

find /home/vpopmail/domains -name ".Junk E-mail" -exec echo sa-learn --showdots --spam {}/cur \;

which will simply echo a list of commands that would get executed. Chris
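The same dry-run idea can be written in Python, for anyone who prefers that to find. This is a sketch under assumptions: the folder name ".Junk E-mail" and the Maildir "cur" subdirectory come from the thread, while the function name and the choice to return argv lists are mine. It only builds the commands and does not run sa-learn.

```python
import os


def junk_learn_commands(root, folder_name=".Junk E-mail"):
    """Return one sa-learn argv list per junk folder's cur/ directory under root."""
    commands = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if os.path.basename(dirpath) == folder_name:
            cur = os.path.join(dirpath, "cur")
            # Only report folders that really have a Maildir cur/ directory
            if os.path.isdir(cur):
                commands.append(["sa-learn", "--spam", cur])
    return sorted(commands)
```

Printing the result plays the role of the echo variant; once the list looks right, each entry could be handed to subprocess.run(cmd, check=True) from the cron job.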
Yahoo! SpamGuard spam
I was entertained by this. A score of 5.491 added to an e-mail because of a Yahoo! advert stuck on the bottom by the Yahoo! MTA. And the advert is for SpamGuard.

[... headers chopped... ]
X-Spam-Score: 2.9
X-Spam-Level: ++
X-Spam-Report: Spam report: Score = 2.9. Tests=BAYES_00=-2.599,DRUGS_ERECTILE=0.493,DRUGS_ERECTILE_OBFU=2.408,FUZZY_VPILL=0.924,SARE_OBFU_VIAGRA=1.666
[... email body chopped ...]
___
All New Yahoo! Mail - Tired of [EMAIL PROTECTED]@! come-ons? Let our SpamGuard protect you. http://uk.docs.yahoo.com/nowyoucan.html

Chris
Re: Lots of missed spam
* Leigh Sharpe wrote (29/06/06 03:03):

This was my first suspicion. I turned off Bayes tests temporarily and it had little effect. I'm seriously considering resetting the bayes and starting again.

I can recommend that. I had a situation a while ago where the bayes database got mysteriously corrupted (sa-learn --dump magic suddenly showed nspam way way less than nham). I deleted the whole bayes database, did a bit of manual training, let it carry on with the automatic training, and it was all fine again in a day or so. If spam hits BAYES_00 (which carries a negative score), you're better off without bayes at all. But with good bayes, most of the spam you've posted will be blocked. The difference between BAYES_00 and BAYES_99 is +6.099. So a small negative score with BAYES_00 will be sent over 5 by BAYES_99. Chris
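As a worked example of that arithmetic, here is a tiny Python sketch. The two rule scores are assumptions: BAYES_00 = -2.599 and BAYES_99 = 3.5 are the values that appear in spam reports elsewhere on this list and are consistent with the 6.099 difference quoted above, but a local install may score them differently. The -0.5 starting score is purely illustrative.

```python
# Assumed rule scores (consistent with the +6.099 difference quoted above)
BAYES_00 = -2.599
BAYES_99 = 3.5

# How far the total moves if a message goes from BAYES_00 to BAYES_99
swing = round(BAYES_99 - BAYES_00, 3)


def rescored(total_with_bayes_00):
    """Total score if BAYES_00 were replaced by BAYES_99, other rules unchanged."""
    return round(total_with_bayes_00 - BAYES_00 + BAYES_99, 3)


# A hypothetical message that scored slightly negative with BAYES_00
example = rescored(-0.5)
```

With these assumed values the hypothetical message lands at 5.599, above a required score of 5.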
Re: Suing Spammers
* jdow wrote (14/05/06 02:09): From: Gary W. Smith [EMAIL PROTECTED] On another paw, Craig, do consider who is the injured party. Marc is not. The final recipient, the addressee, is an injured party for the spam in her mailbox. The addressee's ISP is also an injured party due to the (vastly) increased mail volume her servers must handle. They have a tort for filing suit. The person who filters the spam is, one can argue, benefiting from the spam. So it is hard for him to sue and win anything. I disagree. As a provider you are paying for the acceptance, processing, storage and re-transmission of that spam. It is costing you resources which can be quantified. My boxes have been running at about 15% on average, 24x7. Knowing that spam is 80% of that then you might be able to prove in a court of law that it is indeed damaging you financially to process this. But the burden gets turned back to you to prove this damage. So the question is what the return will be versus the cost of proving it. Unless you are processing millions of spams per day from a single spammer then more than likely you will be hard pressed to see any type of return. jdow Waitaminit - Marc heavily implied that he was offering a spam filtering service. If that is true then Marc is not being injured. The spam is his bread and butter, regardless of how much he wishes to be put out of that business. What if he's not providing a spam filtering service, but a clean e-mail service? Then the spammer is the enemy, not the bread and butter. And it's the same service even if all spammers boycott his servers. Indeed, I imagine he would get more customers if all spammers boycotted his servers. jdow That is why I made comment of three cases, the actual end recipient, the actual end recipient's ISP, and the spam filtering service provider. Of the three the first can sue and win something nominal. 
In the second case the ISP has so much bulk that the costs of the filtering and extra machinery are demonstrable injuries that amount to big money. The third case is a person actually making the spam filtering his business. In what way is that third person being injured? In just the same way as the ISP, it seems to me. He's trying to provide a service (delivering legit E-mail), and incurs demonstrable costs. Chris
Can spamassassin stop this?
I run a fairly uncompromising spamassassin, which rejects mail scoring 5.5 or above (and in my own mailbox, I treat anything scoring over 0 as suspect). I find that almost all false negatives that slip through are the result of a not-perfectly-trained site-wide bayes database [Basically, I train it, so it works well for me. Hardly anyone else bothers]. I run lots of network tests, which work really well. But this e-mail looks like it would never get blocked. Does sa have a hope against this, or have the spammers finally come up with something that can't be filtered? Even with BAYES_99 (default score 3.5) it would score just under 5.5. This is the first time I've noticed a spam e-mail that I can't see how spamassassin could kill. Chris = Return-path: [EMAIL PROTECTED] Envelope-to: [EMAIL PROTECTED] Delivery-date: Fri, 12 May 2006 04:52:03 +0100 Received: from bzq-88-155-227-248.red.bezeqint.net ([88.155.227.248]) by marvin.thomasmurray.com with smtp (Exim 4.54) id 1FeOh7-0001os-6a for [EMAIL PROTECTED]; Fri, 12 May 2006 04:52:03 +0100 From: kalyn kari [EMAIL PROTECTED] To: dacia katelin [EMAIL PROTECTED] Subject: Was it love, or was it the thought of being in love? Date: Fri, 12 May 2006 03:52:03 + Message-ID: [EMAIL PROTECTED] MIME-Version: 1.0 Content-Type: text/html; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: PHP/4.4.0 X-Marvin-Spam-Score: 1.9 X-Marvin-Spam-Level: + X-Marvin-Spam-Report: Marvin spam report: Score = 1.9. Tests=BAYES_50=0.001,HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RCVD_IN_NJABL_DUL=1.946 X-Marvin-AntiVirus: Clean !DOCTYPE HTML PUBLIC -//W3C//DTD HTML 4.01 Transitional//EN html head meta http-equiv=Content-Type content=text/html; charset=us-ascii /head body Hullo!brbr [E]rectilebr [D]ysfunction?brbr We can help! 
Our site: bochhorfando/b[dot]bcom/b ;) Don't forget to replace b[dot]/b to b./bbrbr ---br cigarette after another and extinguishing them on the edge of a full ash tray, with Dolly, and with the old prince, where there was talk about dinner, about politics, about Marya Petrovna's illness, and where Levin suddenly forgot for a minute what was happening, and felt as though he had waked up from sleep; the other was in her presence, at her pillow, where his heart seemed breaking and still did not break from sympathetic suffering, and he prayed to God without ceasing. And every time he was brought back from a moment of oblivion by a scream reaching him from the /body/html
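The "just under 5.5" claim can be checked from the report header itself. This Python sketch parses the Tests= part of the X-Marvin-Spam-Report line above and swaps BAYES_50 for BAYES_99 at the 3.5 score mentioned in the post; the helper names are hypothetical.

```python
REPORT_TESTS = (
    "BAYES_50=0.001,HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RCVD_IN_NJABL_DUL=1.946"
)


def parse_tests(tests_field):
    """Turn 'RULE=score,RULE=score' into a dict of rule name -> float score."""
    scores = {}
    for item in tests_field.split(","):
        name, value = item.split("=", 1)
        scores[name] = float(value)
    return scores


def swap_rule(scores, old, new, new_score):
    """Return a copy of scores with rule `old` replaced by `new` at `new_score`."""
    swapped = {name: value for name, value in scores.items() if name != old}
    swapped[new] = new_score
    return swapped


scores = parse_tests(REPORT_TESTS)
total_as_scanned = round(sum(scores.values()), 3)
total_with_bayes_99 = round(
    sum(swap_rule(scores, "BAYES_50", "BAYES_99", 3.5).values()), 3
)
```

The scanned total works out to 1.949 (reported as 1.9), and with a perfectly trained bayes hit it would reach 5.448, still short of the 5.5 rejection threshold.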
Re: Could you scan your logs for me?
* Ole Nomann Thomsen wrote (03/02/06 09:27):
Hi, can I ask a small favor from some of you running SA with Bayes enabled: Please run the following perl-oneliner on your SA-log (mine is current):

    perl -ne 'if (/result:/) {$n++; $b++ if (/BAYES/);} } print $b/$n,"\n"; {' current

(I promise it's not a rootkit :-) I get: 0.710109622411693
I suspect you really ought to see 1, always. What do you get?

0.960777058279371

In my case, the difference is attributable to this in local.cf:

    bayes_ignore_to users@spamassassin.apache.org
    whitelist_to users@spamassassin.apache.org

Chris
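For anyone who finds the one-liner hard to read, here is a rough Python equivalent. It is a sketch only: it counts log lines containing "result:" and reports the fraction that also mention a BAYES rule, exactly as the one-liner does. The sample log lines are invented for illustration and are not taken from a real log.

```python
def bayes_fraction(lines):
    """Fraction of 'result:' log lines that also mention a BAYES rule."""
    results = 0
    with_bayes = 0
    for line in lines:
        if "result:" in line:
            results += 1
            if "BAYES" in line:
                with_bayes += 1
    # Avoid dividing by zero on a log with no result lines.
    return with_bayes / results if results else 0.0


# Hypothetical log lines, made up for this example.
sample_log = [
    "spamd: result: Y 12 - BAYES_99,HTML_MESSAGE scantime=1.2",
    "spamd: result: . 0 - HTML_MESSAGE scantime=0.9",
    "spamd: connection from localhost",
    "spamd: result: . -2 - BAYES_00 scantime=1.0",
]

print(bayes_fraction(sample_log))
```

To run it against a real log, pass an open file instead of the list, for example `with open("current") as f: print(bayes_fraction(f))`.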
Re: Another URL obfuscation
* Jeff Chan wrote (10/01/2006 15:42):
On Tuesday, January 10, 2006, 6:17:38 AM, Larry Rosenbaum wrote:
I found this obfuscated URL in a drug spam:

A href=3Dhttp://gozifo .upze5otbbutzanbb655k685ys5nn%2Eridgykh= comFONT SIZE=3D2/FONT

Good grief, does any mail client actually parse that as a functional URI?

Yes. In your e-mail, my Thunderbird created a clickable link to http://gozifo

My IE gives a DNS error when it tries that address. My FireFox redirects to
http://www.google.com/search?btnI=I%27m+Feeling+Lucky&ie=UTF-8&oe=UTF-8&q=gozifo
which in turn redirects to
http://www.vojir.com/other/basic-myebol.html
which gives a 404 error.

It's probably possible to turn this (mis)feature off in FireFox, but there it is by default. I have no idea whether this is the original intention of the obfuscation. I would guess not - and if it's viewed as html to start with that might make a difference.

Chris
Re: SARE_URI_EQUALS false positives
* Loren Wilton wrote (24/12/2005 00:23):
Does anyone have any suggestions, apart from simply reducing the score for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to guarantee that only real uris are parsed as such?

Several.

Hi. Thanks for the response. I'm replying rather late due to pressures of Christmas.

1. Change your report generator to remove the extraneous dot between updated and by. Or change it to the more common underscore, if you insist on these words being connected for some reason.
2. Put spaces around the equal sign.

These are fine suggestions, but sadly not practical. The e-mails are auto-generated diffs from cvs commits. The files being committed are java properties files. In particular, the updated.by property contains internationalised versions of the phrase Updated by. The more common underscore would be unusual in the java properties file, and expecting the developers to change the way they work to avoid SARE misfires is a slightly overzealous reaction to the spam problem, I think. However, it is possible if there's no sensible alternative. The second suggestion is only a workaround, not a fix, anyway, because spamassassin will still check http://updated.by as a uri.

3. If you are reluctant for the correct fix, drop the score on the uri_equals rule to 4 or maybe 3, depending on what else your report manages to hit.

I am reluctant to use the correct fix. Actually I'm inclined to think that the word correct is being misapplied here. I've changed the scores appropriately, though.

4. You could submit a Bugzilla on the parsing of that phrase. But frankly I consider the bug in the report generation, not SA's parsing of strange syntax.

The reason I didn't submit a bug was that I was not sure there was one - hence the original query. And I'm still not going to submit a bug, because I'm persuaded that there is not one.

What bothered me (and still does a bit) was that the string updated.by=anything matches a rule that looks for uris of the form http(s)://*=*. Ie the http(s) is conjured out of nowhere for schemeless uris. I can see the point, but I thought it would be worth bringing a possible problem to light. It's a possible problem, not a bug per se, and the subsequent discussion shows that people take different views on the seriousness of this kind of parsing issue.

One thing that hasn't been mentioned in respect of this is that if spamassassin is looking aggressively for schemeless uris, it could in some cases create quite a lot of unwanted uri checking traffic.

I'm happy to stick with what I've got now. I've sent some examples off as indicated so that the SARE corpus will contain my mail in future.

Chris
SARE_URI_EQUALS false positives
I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is therefore skewing the scoring of some mail quite badly. The weird thing is that the uris that spamassassin is complaining about aren't uris at all. The mail in question is auto-created reports of cvs diffs, so it's slightly unusual. I've tried to condense the debug information. Here it is:

This is some of the output from spamassassin -D false_positive

[16733] dbg: uri: parsed uri found, updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, updated.by=Mis
[16733] dbg: uri: parsed uri found, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: parsed uri found, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated
[16733] dbg: uri: parsed uri found, http://updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated

These parsed uris are not links in the e-mail. They are just text. I've had a bit of a look at the regexps that spamassassin uses to work out what is a uri, and it seems that updated.by=Updated is treated as a uri because .by is a valid tld and spamassassin looks for schemeless uris, then prepends http:// for the tests.

I'm running spamassassin 3.1.0 on perl 5.8.2.

Does anyone have any suggestions, apart from simply reducing the score for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to guarantee that only real uris are parsed as such?

Chris
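The behaviour described above can be reproduced with a deliberately naive matcher. This is a sketch, not SpamAssassin's actual regexp: the pattern, the tiny TLD list and the function names are all assumptions made up for illustration. It only shows how "a dotted name ending in a valid TLD" plus "prepend http://" turns a java property assignment into something a uri rule looking for "=" will hit.

```python
import re

# Hypothetical, tiny TLD list; a real parser would know many more.
TLDS = ("com", "net", "org", "by")

# Naive schemeless-uri pattern: a dotted name ending in a known TLD,
# followed by any trailing non-space characters.
# Illustration only; this is NOT the regexp SpamAssassin uses.
SCHEMELESS = re.compile(
    r"\b(?:[a-z0-9-]+\.)+(?:%s)\S*" % "|".join(TLDS),
    re.IGNORECASE,
)


def find_uris(text):
    """Return every schemeless match with http:// prepended."""
    return ["http://" + m.group(0) for m in SCHEMELESS.finditer(text)]


def uri_has_equals(uri):
    """Stand-in for a rule that fires when a uri contains '='."""
    return "=" in uri


for uri in find_uris("updated.by=Updated"):
    print(uri, uri_has_equals(uri))
```

With this pattern, the plain-text property line is reported as a uri containing "=", which matches what the debug output shows.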
Re: SARE_URI_EQUALS false positives
* jdow wrote (23/12/05 11:26): From: Chris Lear [EMAIL PROTECTED] I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is therefore skewing the scoring of some mail quite badly. The weird thing is that the uris that spamassassin is complaining about aren't uris at all. The mail in question is auto-created reports of cvs diffs, so it's slightly unusual. [...] I've had a bit of a look at the regexps that spamassassin uses to work out what is a uri, and it seems that updated.by=Updated is treated as a uri because .by is a valid tld and spamassassin looks for schemeless uris, then prepends http:// for the tests. I'm running spamassassin 3.1.0 on perl 5.8.2. Does anyone have any suggestions, apart from simply reducing the score for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to guarantee that only real uris are parsed as such? Before you drop the score precipitously check if there is some other characteristic of the emails that trigger falsely which can be used to apply a negative score. If there is such a characteristic then generate the appropriate negative score. If not weigh how effective the rule is for you. The version of sa-stats.pl that is on the SARE site helps figure this out nicely. That said it's close to a 50/50 rule that hits on very few messages here so should have a low score. (It hit on 6 messages out of 75000.) Cutting it out completely here seems like it would be effective TODAY. That could change. At one time it was quite necessary. Spammer fads change.) I've reduced the score, and a quick check shows that that rule hits almost nothing anyway, so it's not a big problem. The bayes rules were keeping the false positives from doing much damage, anyway. But spamassassin uses uris for lots of things, and if it's commonly parsing (reasonably) normal text as uris, I would expect that to be a problem in more rules than just SARE_URI_EQUALS. Chris
Re: SARE_URI_EQUALS false positives
* jdow wrote (23/12/05 12:06): From: Chris Lear [EMAIL PROTECTED] * jdow wrote (23/12/05 11:26): From: Chris Lear [EMAIL PROTECTED] I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is therefore skewing the scoring of some mail quite badly. The weird thing is that the uris that spamassassin is complaining about aren't uris at all. The mail in question is auto-created reports of cvs diffs, so it's slightly unusual. [...] I've had a bit of a look at the regexps that spamassassin uses to work out what is a uri, and it seems that updated.by=Updated is treated as a uri because .by is a valid tld and spamassassin looks for schemeless uris, then prepends http:// for the tests. I'm running spamassassin 3.1.0 on perl 5.8.2. Does anyone have any suggestions, apart from simply reducing the score for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to guarantee that only real uris are parsed as such? Before you drop the score precipitously check if there is some other characteristic of the emails that trigger falsely which can be used to apply a negative score. If there is such a characteristic then generate the appropriate negative score. If not weigh how effective the rule is for you. The version of sa-stats.pl that is on the SARE site helps figure this out nicely. That said it's close to a 50/50 rule that hits on very few messages here so should have a low score. (It hit on 6 messages out of 75000.) Cutting it out completely here seems like it would be effective TODAY. That could change. At one time it was quite necessary. Spammer fads change.) I've reduced the score, and a quick check shows that that rule hits almost nothing anyway, so it's not a big problem. The bayes rules were keeping the false positives from doing much damage, anyway. But spamassassin uses uris for lots of things, and if it's commonly parsing (reasonably) normal text as uris, I would expect that to be a problem in more rules than just SARE_URI_EQUALS. 
That is a standalone rule. And I do note that many of the SARE rules have severe problems in very specific cases. There are some mailing lists that are not well filtered for spam which have postings which trigger some of the too effective to toss SARE rules. I've developed some massive meta rules to at least partially get a handle on the problem. (A number of times XXX hit option would be nice to have for this.)

Sorry to go on, but I wonder whether you've missed my point. The SARE_URI_EQUALS rule is working fine. It just looks in the uris that spamassassin gives it, and complains when they contain =. The problem is that spamassassin is treating things that aren't uris as uris. So SARE_URI_EQUALS is working on dud data. In this specific case, the e-mail contains the text updated.by=Updated. This is not a uri, and nor should it be treated as one. But spamassassin thinks it is (because .by is a valid tld), so, as far as I can tell, *all* uri rules will check it. It so happens that SARE_URI_EQUALS hits in this case, but other uri rules are vulnerable to false positives if the uri parsing is wrong, aren't they?

Chris
Re: How can i block this?
* Matt Kettler wrote (10/11/05 19:37): Alessio wrote: I have received this mail, the heading from is blank! Is possible? Yes, it's quite normal and is called a message with a null return path. Is it? I thought the return path (or envelope sender) was quite distinct from the From: header in the message itself. Bounce messages usually have From: headers (normally showing [EMAIL PROTECTED]). A blank From: header is possible, but it's unusual in normal mail from MUAs. Chris
Re: How can i block this?
* mouss wrote (10/12/05 13:13):
Chris Lear wrote:
* Matt Kettler wrote (10/11/05 19:37):
Alessio wrote: I have received this mail, the heading from is blank! Is possible?

Yes, it's quite normal and is called a message with a null return path.

Is it? I thought the return path (or envelope sender) was quite distinct from the From: header in the message itself. Bounce messages usually have From: headers (normally showing [EMAIL PROTECTED]). A blank From: header is possible, but it's unusual in normal mail from MUAs.

while the OP seems confused (he said: heading from), his logs show he is talking about the envelope sender (from= of his sendmail or whatever).

I see. Sorry.

Chris
Bayes expiry/oddity
I'm running a reasonably small site-wide spamassassin, and I use a site-wide bayes db. Spamassassin runs as the user spamd.

I noticed that I got spam last night with no BAYES_XX markup. I looked into it this morning, and discovered that the bayes db only has 47 spam messages in it (nspam from sa-learn --dump magic). It has about 69000 ham. It must have gone from 200 spams at around 11pm last night to 50 this morning, and the only explanation I can think of is that the spam has been expired, but on the other hand this seems odd.

Spamassassin learnt 143 messages as spam yesterday (according to my logs). In the same period it learnt 291 as ham. These figures are reasonably representative of the traffic (on weekdays, anyway).

Can anyone explain what happened to the bayes db? It's now steadily auto-learning itself back to normal, but we are going to get many more false negatives today I think. Any information/explanation appreciated.

Chris

PS I think it's extremely unlikely that there's been a concerted attack/mistake by users using sa-learn the wrong way and re-learning the spam as ham. For one thing, spamassassin is called by exim during the smtp phase, and if the e-mail is marked as spam it's never delivered to anyone. For another thing, there's nobody else around that knows what sa-learn is.
Re: Bayes expiry/oddity
* Chris Lear wrote (09/23/05 10:34):
I'm running a reasonably small site-wide spamassassin, and I use a site-wide bayes db. Spamassassin runs as the user spamd. I noticed that I got spam last night with no BAYES_XX markup. I looked into it this morning, and discovered that the bayes db only has 47 spam messages in it (nspam from sa-learn --dump magic). It has about 69000 ham. It must have gone from 200 spams at around 11pm last night to 50 this morning, and the only explanation I can think of is that the spam has been expired, but on the other hand this seems odd. Spamassassin learnt 143 messages as spam yesterday (according to my logs). In the same period it learnt 291 as ham. These figures are reasonably representative of the traffic (on weekdays, anyway). Can anyone explain what happened to the bayes db? It's now steadily auto-learning itself back to normal, but we are going to get many more false negatives today I think. Any information/explanation appreciated.

None forthcoming, so I'm putting this down to a freak bayes database corruption. sa-learn --dump magic now shows 161 spam and 69310 ham learnt, and I'm letting it sort itself out. In about 3 months I guess it will be back to normal :-). Spamassassin works fairly well without bayes, so I don't mind too much, but I would feel happier if I thought that what happened was understandable.

Chris
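A small script can keep an eye on the counts so that a sudden drop like this is noticed sooner. The sketch below rests on two assumptions: that each magic line of `sa-learn --dump magic` ends in "non-token data: NAME" with the value in the third column, and that bayes scoring is skipped until at least 200 spam and 200 ham have been learnt (the default minimums, as far as I know). The sample text is hand-written to match the figures in the message above, not real output.

```python
def parse_dump_magic(text):
    """Extract the 'non-token data' counters from sa-learn --dump magic output.

    Assumes the value sits in the third whitespace-separated column.
    """
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, name = line.split("non-token data:", 1)
        cols = fields.split()
        if len(cols) >= 3:
            counts[name.strip()] = int(cols[2])
    return counts


def bayes_active(counts, min_spam=200, min_ham=200):
    """True if both counters reach the (assumed default) minimums."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


# Hand-written sample using the figures quoted in the message.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        161          0  non-token data: nspam
0.000          0      69310          0  non-token data: nham
"""

counts = parse_dump_magic(SAMPLE)
print(counts["nspam"], counts["nham"], bayes_active(counts))
```

If the assumption about the minimums holds, it would also explain why mail arrived with no BAYES_XX markup once nspam fell below 200.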
Re: Unsubscribing
* Duane Hill wrote (07/15/05 10:49): On Friday, July 15, 2005 at 9:45:17 AM, [EMAIL PROTECTED] confabulated: I am shortly to go on hols for 2 weeks and so was planning to unsubscribe until I get back. I notice on the web page at http://wiki.apache.org/spamassassin/MailingLists it tells you how to subscribe And in the headers of all messages to the list state this: list-help: mailto:[EMAIL PROTECTED] list-unsubscribe: mailto:[EMAIL PROTECTED] List-Post: mailto:users@spamassassin.apache.org Which helps. The OP's suggestion was... [...] I would like to suggest that unsubscribe details be added to the page. I think this is a reasonably sensible suggestion. I also notice that I seem to be subscribed to two spamassassin lists, not sure how that happened, And you seem to have sent mail to both at once, resulting in a duplicate. I think that spamassassin-users@incubator.apache.org is out of date. probably user stupidity knowing me. Is there information somewhere else that tells people how to unsubscribe from the list. See the headers (as mentioned above) -- Chris
Re: How can I correct this FalsePositive?
* Loren Wilton wrote (07/15/05 12:02): X-Spam-Status: Yes, score=2.2 required=2.0 tests=HTML_BACKHAIR_8,HTML_MESSAGE, HTML_OBFUSCATE_05_10,MIME_HTML_ONLY autolearn=no version=3.0.4 The easiest way to eliminate this FP would be to take your spam threshold back to 5, or at least something close to that. The rules that hit on this mail have nothing whatever to do with the site - they are related to the mail message formatting. Since it only got 2.2 points, nobody should really notice this. But since you have set your spam cutoff way too low, it FPs for you. ...and the cheapest way to fix the message formatting, as I see it, is to get them to fix the message so it doesn't hit this rule: 1.2 MIME_HTML_ONLY BODY: Message only has text/html MIME parts Which should also make the message more friendly to non-HTML mail readers, which is worthwhile anyway. And it will take the score down to 1.0. -- Chris
SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
I've been running quite a lot of sare rules on a site-wide SA installation for a month or two now. I've been keeping a fairly close eye on it, and there have been few false positives generally. But today I noticed that several e-mails are hitting both SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from (one specific address in) Ukraine to a Ukrainian in England, written in English. The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so only bayes saves it from being rejected (we reject at 5.5). I can re-score these rules (or remove sare_header0, which will lower the scores anyway), but I have 2 questions: - Is this a slightly unfair double-scoring? - Are there any other similar rules I should worry about, given that some Russian mail to this server is ham? -- Chris
Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
* John Wilcock wrote (05/20/05 10:51):
Chris Lear wrote: But today I noticed that several e-mails are hitting both SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from (one specific address in) Ukraine to a Ukrainian in England, written in English. The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so only bayes saves it from being rejected (we reject at 5.5). I can re-score these rules (or remove sare_header0, which will lower the scores anyway), but I have 2 questions: - Is this a slightly unfair double-scoring? - Are there any other similar rules I should worry about, given that some Russian mail to this server is ham?

These are actually in the header1 file, not header0, but surely they ought to be moved to the 70_sare_header_eng.cf as they hit non-English ham. Bob?

They're in my header0.cf from sare/rules du jour. And in header.cf with a lower score as well. Have I got the wrong files?

RulesDuJour $ grep SARE_FROM_CHAR_W1251 *
70_sare_header.cf:header SARE_FROM_CHAR_W1251 From:raw =~ /\=\?Windows-1251\?/i
70_sare_header.cf:describe SARE_FROM_CHAR_W1251 Displays in unexpected charset
70_sare_header.cf:score SARE_FROM_CHAR_W1251 1.666
70_sare_header.cf:#ham SARE_FROM_CHAR_W1251 Found in some Russian ham
70_sare_header.cf:#hist SARE_FROM_CHAR_W1251 Created by Bob Menschel May 17 2004
70_sare_header.cf:#counts SARE_FROM_CHAR_W1251 245s/4h of 238550 corpus (112525s/126025h RM) 02/28/05
70_sare_header.cf:#counts SARE_FROM_CHAR_W1251 640s/0h of 54176 corpus (16997s/37179h JH-3.01) 02/01/05
70_sare_header.cf:#counts SARE_FROM_CHAR_W1251 0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04
70_sare_header0.cf:header SARE_FROM_CHAR_W1251 From:raw =~ /\=\?Windows-1251\?/i
70_sare_header0.cf:describe SARE_FROM_CHAR_W1251 Displays in unexpected charset
70_sare_header0.cf:score SARE_FROM_CHAR_W1251 4.000
70_sare_header0.cf:#stype SARE_FROM_CHAR_W1251 spamgg
70_sare_header0.cf:#hist SARE_FROM_CHAR_W1251 Created by Bob Menschel May 17 2004
70_sare_header0.cf:#counts SARE_FROM_CHAR_W1251 180s/0h of 66979 corpus (41757s/25222h RM) 09/04/04
70_sare_header0.cf:#counts SARE_FROM_CHAR_W1251 209s/0h of 38398 corpus (14914s/23484h JH) 08/14/04 TM2 SA3.0-pre2
70_sare_header0.cf:#counts SARE_FROM_CHAR_W1251 0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04

-- Chris
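To see why English-language ham can still hit this rule, it helps to try the rule's regexp on a raw From: header. The sketch below reuses the pattern shown in the grep output; the two sample headers and the function name are invented for illustration. The point is that the rule looks only at the charset label of the encoded display name, not at the language of the message body.

```python
import re

# Pattern taken from the SARE_FROM_CHAR_W1251 rule in the grep output above.
RULE = re.compile(r"\=\?Windows-1251\?", re.IGNORECASE)


def hits_from_char_w1251(raw_from):
    """True if the raw From: header carries a Windows-1251 encoded-word."""
    return RULE.search(raw_from) is not None


# Hypothetical headers: one with a Windows-1251 encoded display name,
# one plain ASCII.
encoded_from = "=?windows-1251?B?0ODn7ODt?= <someone@example.com>"
plain_from = "Chris <chris@example.com>"

print(hits_from_char_w1251(encoded_from))
print(hits_from_char_w1251(plain_from))
```

A sender whose mail client encodes the display name this way will hit the rule on every message, whatever the body contains.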
Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
* John Wilcock wrote (05/20/05 12:15): Chris Lear wrote: They're in my header0.cf from sare/rules du jour. And in header.cf with a lower score as well. Have I got the wrong files? Methinks you have an old header0.cf that is no longer being updated - these rules aren't in the current header0 on rulesemporium.com. OK, thanks. I'll try to find out what's wrong with my Rules du Jour. And in any case you shouldn't be using header and header0 together... I didn't know that. I'll fix that as well. Thanks for your help. -- Chris
Re: how to config SA to scan mail from localhost
* Evan Platt wrote (10/05/2005 05:21): At 09:16 PM 5/9/2005, you wrote: I'm testing the SA but my server can't connect to outside world. Thus, i've to send mail from localhost to myself to find how accurate SA is. Unfortunately, SA don't scan mails that sent from localhost. how can I reconfig it to scan every mail. You don't. You tell spamassassin what mail to scan. How are you calling spamassassin, and what is your mail configuration? The original question is a restatement of yesterday's how to force SA to scan mail that send from php post. My reading of the situation (which might be wrong) is this: The Original Poster wants to do some sort of project that will give statistics on the accuracy of spamassassin. He has followed a recipe that installs qmail with qmail-scanner, and has got a php script that will send mail to the mail server. But the mail server appears to skip the scan for local messages, so the project is getting no statistics. The solution to this problem is to work out how qmail-scanner decides what to scan, and change it. Unfortunately, I can't help there. I would try doing a manual smtp connection from the local machine (telnet localhost 25) and take it from there. But my worry is that sending a load of e-mail via a php form will produce hopeless project results, because it will effectively only test the value of spamassassin's body checks. But perhaps that's part of the plan. -- Chris
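The manual smtp connection suggested above can also be scripted, which makes it easier to repeat the test. This is only a sketch: the addresses and function names are placeholders invented for the example, and it assumes an MTA is listening on localhost port 25. Nothing is sent unless send_via_local_smtp() is called explicitly.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, subject, body):
    """Build a simple plain-text message to feed to the local MTA."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_local_smtp(msg, host="localhost", port=25):
    """Hand the message to the MTA over SMTP, much as a manual
    'telnet localhost 25' session would."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Placeholder addresses, invented for this example.
test_msg = build_test_message(
    "tester@example.com", "postmaster@example.com",
    "SA scan test", "hello from the local test script\n",
)
print(test_msg["Subject"])
```

Because the message still arrives from the local machine, this has the same limitation as the php form: it mostly exercises the body checks.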
Re: OT: Confession and rage
* Stewart, John wrote (05/06/05 15:55): [... excellent story chopped ...] Do I: - Never go there again, as I said would be the case in my previous email? - Show up and try to convince her what a horrible thing she is doing? - Just screw with their (horribly insecure) online site, signing up for appointments all day for Elmer Fudd, etc? - Simply ban their domain from my mailserver and report them to the RBLs? Or... - Offer them some consultancy, in return for a haircut (is this the same as option 2?) -- Chris
Re: Simply don't run spam for Mailing Liste
* arnaud wrote (27/04/2005 23:06):
Kris Deugau wrote: [...] In my case, for instance, SA is called from procmail just before the message is written to a mailbox. In my .procmailrc file, I have a number of procmail recipes that look something like this:

    # SATalk
    :0:
    * ^List-Id: users.spamassassin.apache.org
    /home/kdeugau/mail/spam-stomping

This one files messages from this list in the spam-stomping folder before SA even sees the message. I have quite a long list of similar entries for other mailing lists. -kgd

Ok Thank you. As your can see, i haven't understand this option. I use exiscan with exim. It would be better i suppose to perform spamassassin with procmail that i use too.

Or use exim configuration rules to prevent scanning of certain messages. If you are using exim's acls (either exim 4.50+ or older exim with the exiscan-acl patch), something like this should work:

    [in main config]
    acl_smtp_rcpt = acl_check_rcpt
    acl_smtp_data = acl_check_content

    [in acls]
    acl_check_rcpt:
      [...]
      # Set acl_m0 variable to tell the later acl not to use SA
      accept hosts = veronyk.net : freetelecom.com
             set acl_m0 = dontcheckdata
      [...]

    acl_check_content:
      # Skip all content checks if acl_m0 variable set
      accept condition = ${if eq{$acl_m0}{dontcheckdata}{1}{0}}
      [...]
      deny message = I don't like your nasty spam
           spam = spamd:true/defer_ok
           condition = ${if >{$spam_score_int}{80}{1}{0}}
      [...]
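The decision the two acls make between them can be modelled in a few lines, which may help when adapting the threshold. This is a sketch of the logic only, not of exim itself: the function names are invented, and it assumes $spam_score_int is the spamd score multiplied by ten (so 80 corresponds to a score of 8.0), with rounding chosen here for simplicity.

```python
SKIP_MARKER = "dontcheckdata"  # value stored in acl_m0 by the rcpt acl
DENY_ABOVE = 80                # threshold compared against $spam_score_int


def spam_score_int(score):
    """Score times ten as an integer (rounding is an assumption here)."""
    return int(round(score * 10))


def check_content(acl_m0, score):
    """Mimic acl_check_content: skip marked mail, deny high scorers."""
    if acl_m0 == SKIP_MARKER:
        return "accept"    # content checks skipped for listed hosts
    if spam_score_int(score) > DENY_ABOVE:
        return "deny"      # "I don't like your nasty spam"
    return "continue"      # fall through to the remaining acl statements


print(check_content("dontcheckdata", 20.0))
print(check_content("", 9.5))
print(check_content("", 3.0))
```

Mail from the listed hosts is accepted before the spam condition is ever evaluated, so the order of the two statements in acl_check_content matters.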