Re: Spamassassin not tagging some emails
Hi,
> SpamAssassin DOES NOT bypass scanning if the internal or trusted networks contain the server.
Hmm... thanks for correcting me. How would you, then, go about preventing SA from scanning localhost or a specific domain without whitelisting that domain or range?
Thanks,
Alex
Re: Email / Inbox Speed Problems
Hi,
I really hate to respond to this because it's so off-topic (how long did it take you to write that email, anyway?), but you're so badly missing the point that I just can't let it go, and it's slow on a late Friday night.
> Yet, you open up a new Mac and what's inside? A PC motherboard and processor, that's what there is!!! You can even boot OS X on a PC
That's not the point. Haven't you ever bought bottled water, or spoken with someone who has, because it tastes better? It's all in the marketing. Apple caters to people who just don't care that it's a PC inside.
> Yet, Apple's response to the Open Source community is APSL 2.0, which is incompatible with the GPL. And do you think that anyone in a Mac store
That's a different issue. There's no business model for corporations like Adobe to build open source apps for the PC, let alone the Mac, where the user base is even smaller.
> You're amazed WE have Mac customers?!? At least WE try to EDUCATE them so they aren't stuck with Apple sticking it to their wallets. I'm amazed that ANY Mac-specific retailer, much less APPLE, has ANY Mac customers.
You mentioned someone jammed a screwdriver into the computer and broke it; do you really think they care about going to Fry's to buy a replacement hard disk? They just don't care. They want it to just work. Who cares that the mouse is $30? They buy Macs for the convenience, the looks, the famous multimedia support, and the ease of use. They buy them because it's a single point of contact. They buy them because someone else can make the choices for them, so they can skip worrying about the details of the computer and just start using it.
Best,
Alex
Re: Elena wants an iron cast oven
Hi,
> What's the business model of this scam? I can't believe they really want millions of iron cast ovens from all around the world. Maybe I should answer and ask directly ;D Long time since I've last seen one of these...
> My impression was, they want money of course. The victim falling for it
Yes, follow the money. It's always about the money. The oven ploy is just weird enough to attract your attention in hopes of garnering a response.
Regards,
Alex
Re: Elena wants an iron cast oven
Hi,
> http://englishrussia.com/?p=2137 plenty of abandoned scrap metal already in Russia.
Maybe they could blow it up like the brain surgeons did to that dead whale that was littering the beach in Oregon?
The Infamous Exploding Whale: http://www.youtube.com/watch?v=8Vmnq5dBF7Y
Alex
Re: Spamassassin not tagging some emails
Hi,
On the message that should have been scanned:
> The emails that have not been tagged at all: [...]
> From: Angus - 3idea angus.d...@3idea.com
> To: supp...@3idea.com
Are you forwarding this spam from your internal account to the other internal account, supp...@3idea.com? It also looks like there was no external mail server involved. If so, I would think that SA trusts your internal network, and is therefore passing the message through without even evaluating it. If you want your internal mail to be scanned too, remove your mail server from trusted_networks and internal_networks. I think that should fix it.
Regards,
Alex
hostkarma/uribl_black disparity
Hi,
Over the past few days I have been looking more closely at email that wasn't tagged when I thought it should have been, and vice versa, using various factors such as URIBL_BLACK and JMF_W.
I'm very surprised that obvious hosts like receiveeweek.com are on the URIBL_BLACK list. Even more interesting is the number of FNs that hit both URIBL_BLACK and JMF_W. In many cases I'm not sure which list is correct, because they are not always so cut-and-dried. For example, there was a Citi Bank email (whitelisted) that happened to use an image server (csnimages.com) that is on URIBL_BLACK.
While I don't think that particular email should have been tagged as spam, it's only an example, and I hoped someone would be interested enough to check out a list I created of these types of disparities from the last day or so. It's too long to include here, so I've created a pastebin for it: http://pastebin.com/m4a1561b5
I realize this could happen for many reasons, not the least of which is an otherwise-legitimate host that has been compromised and is now used to send spam. However, many on my list are quite persistent, like blr-events.com and eturbonews.com, and I have no idea whether they are legitimate or bogus.
Whatever the case, there are definitely mistakes, and I'd like to help correct them. Ideas appreciated. I'd be glad to gather more info if necessary.
Thanks
Alex
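To build a list like that pastebin, the rule names in each message's X-Spam-Status header can be tallied. A minimal Python sketch, assuming amavisd-style headers like the ones quoted in this thread (a comma-separated `tests=` list) and that the whitelist rule is named RCVD_IN_JMF_W as in the JMF rule definitions posted on this list; the function names are hypothetical:

```python
import re
from collections import Counter

# Everything after "tests=" in an X-Spam-Status value, e.g.
# "Yes, hits=5.4 tag1=-300.0 ... tests=BAYES_50, BOTNET, URIBL_BLACK"
TESTS_RE = re.compile(r"tests=(.*)", re.DOTALL)


def rules_hit(x_spam_status):
    """Return the set of rule names listed in one X-Spam-Status value."""
    match = TESTS_RE.search(x_spam_status)
    if not match:
        return set()
    # Folded headers may contain newlines/tabs, so split on commas and any
    # whitespace; drop trailing key=value fields such as "autolearn=no".
    tokens = re.split(r"[,\s]+", match.group(1))
    return {tok for tok in tokens if tok and "=" not in tok}


def tally_disparities(statuses, black="URIBL_BLACK", white="RCVD_IN_JMF_W"):
    """Count messages that hit the blacklist rule, the whitelist rule, or both."""
    counts = Counter()
    for status in statuses:
        hits = rules_hit(status)
        if black in hits and white in hits:
            counts["both"] += 1
        elif black in hits:
            counts["black_only"] += 1
        elif white in hits:
            counts["white_only"] += 1
    return counts
```

Fed the X-Spam-Status values from a quarantine folder, the "both" bucket is the set of messages where the two lists disagree and that would be worth reporting.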
Re: Is there a WANTS_MY_INFO rule?
Hi,
> In order to confirm you Web-Mail identity, you are to provide the following data;
> First Name:
> Last Name:
> Username/ID:
> Password:
> Date of Birth:
Try John Hardin's fillform: http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/?sortby=date
Regards,
Alex
Downloading sandbox rules
Hi,
I'd like to download a few of the rules from the SVN sandbox for testing without using svn. It used to be possible by clicking "Download", but in the last week or so the site was updated and that option is no longer available. Do I have to use svn now for this?
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/
Thanks,
Alex
Re: Downloading sandbox rules
Hi,
Sorry, just after I sent this I saw the message from yesterday about using svn.
Thanks,
Alex
On Sat, Oct 17, 2009 at 1:24 PM, MySQL Student mysqlstud...@gmail.com wrote:
> Hi, I'd like to download a few of the rules from the SVN sandbox for testing without using svn for this. It used to be possible by clicking Download but in the last week or so the site was updated and that option is no longer available. Do I have to use svn now for this? http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/ Thanks, Alex
Re: Constant Contact
Hi,
> rawbody __CCM_UNSUB /https?:..visitor\.constantcontact.com\/[^]{60,200}SafeUnsubscribe/
> Ouch! Rawbody, that hurts.
Do you mean that it's much more resource-intensive than a regular body check? When is it necessary (or possible) to use it instead of the URIDetail substitute you mentioned? For example, I have to use rawbody here because I'm searching within HTML tags:
    rawbody DDN_SPAM_3 /\/.{5}\-.{4}\-.{3}\/.{5}\-.{4}\-.{3}\-1\.jpg border=0\\\/a\\br\/
    describe DDN_SPAM_3 New DDN Spam
    score DDN_SPAM_3 2.201
However, I suspect it's pretty resource-intensive, and I have several of them, along with dozens of rules like:
    rawbody __SARE_HTML_INV_TAG /\w\!\w{18,60}\w/i
Is there an easy way to measure the overhead of a particular rule? I'd love to find out which rules are consuming the most resources. Certainly, as the number of rules has increased, the constant load on the server has increased too. Does everyone systematically run sa-compile on their rules?
Thanks,
Alex
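For a rough first look at which patterns are expensive, they can be timed against a message-sized sample outside SA. A minimal Python sketch; the sample HTML, the pattern labels and the helper name are hypothetical, the DDN entry is only the readable first part of DDN_SPAM_3, and Python's `re` engine is not Perl's, so the numbers are relative hints rather than SpamAssassin's real per-rule cost:

```python
import re
import timeit

# Hypothetical test input: an HTML fragment repeated to roughly message size.
SAMPLE = '<a href="http://example.com/x"><img src="a.jpg" border=0></a><br>\n' * 500

# Patterns transcribed (as posted) into Python syntax.
PATTERNS = {
    "__SARE_HTML_INV_TAG (as posted)": re.compile(r"\w\!\w{18,60}\w", re.I),
    "plain literal 'SafeUnsubscribe'": re.compile(r"SafeUnsubscribe"),
    "DDN_SPAM_3 (first part only)": re.compile(r"/.{5}-.{4}-.{3}/.{5}-.{4}-.{3}-1\.jpg"),
}


def time_patterns(patterns, text, number=200):
    """Return the average seconds per search() call for each pattern."""
    results = {}
    for name, rx in patterns.items():
        # timeit runs immediately, so binding rx inside the loop is safe here.
        total = timeit.timeit(lambda: rx.search(text), number=number)
        results[name] = total / number
    return results


if __name__ == "__main__":
    ranked = sorted(time_patterns(PATTERNS, SAMPLE).items(),
                    key=lambda item: item[1], reverse=True)
    for name, secs in ranked:
        print(f"{secs * 1e6:10.1f} us  {name}")
```

A sample that does not match is the interesting case, since the engine has to scan the whole body before giving up; that is the normal situation for most rules on most mail.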
Re: Constant Contact
Hi,
> Does anybody here know anything about the legitimacy of Constant Contact http://www.constantcontact.com/anti_spam.jsp ?
> Sometimes abused, but too legit to outright block based on sending IP, imo.
In addition to constantcontact, can I add the following to the list of hosts I'd like people's input on as to whether they send spam:
- blueskycommunications.com
- pm0.net
- topica.com
I believe topica.com is very similar to constantcontact in that they send bulk mail for small businesses and don't necessarily care what they send. The emails typically contain something like "You may be eligible for a cash advance" and a URL like macho-man-fitness.c.topica.com that is just a redirect to something like cashadvancenow.com. It's only on URIBL's grey list.
Thanks,
Alex
Re: Constant Contact
Hi,
> How is Constant Contact better than (say) GNU mailman for that purpose? I don't understand the concept of sending internal mail via an external third party...
In addition to what's already been mentioned, CC also provides a nice template that people can drop their message into and click Send. This is very appealing to the local bagel shop or restaurant that wants to advertise its specials to its favorite customers without even having an Internet connection of its own. I don't doubt that if you pitched your mailman product to these types of businesses, with the ability to add their logo to the top of an HTML email, they'd choose your service just the same.
Best,
Alex
Re: sneaky pharma spam shooting past standard rules
Hi,
> With this:
> Received: from public30108.xdsl.centertel.pl (HELO marcin-8963fd6f) (79.163.117.156)
> my postfix setup would have simply dropped it on the floor at the HELO/EHLO. If it doesn't HELO with an FQDN and a proper rDNS, we don't talk to it.
Kurt, can you explain how you're doing that with postfix?
Thanks,
Alex
Re: sneaky pharma spam shooting past standard rules
Hi,
> smtpd_helo_restrictions = permit_mynetworks,
>     reject_invalid_helo_hostname,
>     reject_non_fqdn_helo_hostname,
>     permit
I'm currently using reject_non_fqdn_sender and reject_non_fqdn_recipient. I just want to be sure: I should use the two HELO restrictions you listed above in addition to the ones I'm already using, correct?
Hopefully this isn't too far off-topic, but this is the full list of restrictions I'm currently using:
    smtpd_recipient_restrictions = permit_mynetworks,
        reject_non_fqdn_sender,
        reject_non_fqdn_recipient,
        reject_unknown_sender_domain,
        reject_unknown_recipient_domain,
        check_client_access hash:/etc/postfix/client_access,
        reject_unauth_destination,
        check_recipient_access pcre:/etc/postfix/relay_recips_access,
        reject_unauth_pipelining,
        reject_invalid_hostname
Thanks,
Alex
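To see which HELO names the two HELO restrictions would catch, here is a rough Python approximation. It is only a sketch under simplifying assumptions (label syntax limited to letters, digits and hyphens; address literals accepted), not Postfix's actual implementation, which differs in details; `helo_verdict` is a hypothetical name:

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, 1-63 chars, no leading or
# trailing hyphen. (A simplification; Postfix's own checks differ in details.)
LABEL_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def helo_verdict(helo):
    """Roughly mimic reject_invalid_helo_hostname / reject_non_fqdn_helo_hostname.

    Returns "invalid", "non_fqdn" or "ok".
    """
    name = helo.strip()
    # Address literals such as [192.0.2.1] or [IPv6:2001:db8::1] are allowed.
    if name.startswith("[") and name.endswith("]"):
        inner = name[1:-1]
        if inner.lower().startswith("ipv6:"):
            inner = inner[5:]
        try:
            ipaddress.ip_address(inner)
        except ValueError:
            return "invalid"
        return "ok"
    if name.endswith("."):
        name = name[:-1]  # a single trailing root dot is harmless
    if not name or len(name) > 255:
        return "invalid"
    labels = name.split(".")
    if not all(LABEL_RE.match(label) for label in labels):
        return "invalid"
    # A bare name like "marcin-8963fd6f" has no dots, so it is not an FQDN.
    if len(labels) < 2:
        return "non_fqdn"
    return "ok"
```

The HELO from Kurt's example, `marcin-8963fd6f`, comes back as "non_fqdn", which is the case reject_non_fqdn_helo_hostname is meant to refuse.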
Re: Hostkarma whitelist needs something..
Hi,
> http://www.impsec.org/jhardin/antispam/
This should be: http://www.impsec.org/~jhardin/antispam/ (note the missing tilde :-)
Regards,
Alex
Mismarked Ham
Hi,
I thought I would look through the quarantine for BAYES_00 to see whether there were any mis-marked messages or whether Bayes was not firing correctly. I found a few, although not in the way I expected: instead of finding BAYES_00 in spam, I found it in ham that was pushed over the spam threshold by other rules. Here are the headers from one such instance:
http://pastebin.com/m6c3cd5e3
exxample.com is my obfuscation. It was an HTML email with two small GIF attachments (a basic background image) and two links to YouTube videos of a Muslim religious ceremony in Arabic with English subtitles. All indications are that Bayes is correct and it's ham.
Which rule(s), then, are incorrect? What is the right solution here? Is the only option to whitelist the user?
Thanks,
Alex
Re: Mismarked Ham
Hi,
> What makes you think any of the rules are incorrect? A score of 6.1 is not 100% (or even 99%, IIRC) spam.
Incorrect in that at least one of the rules fired when it should not have, causing a valid email to be marked as spam.
> There are a couple of things here. First, for some reason you have DKIM_SIGNED but not DKIM_VERIFIED, which seems odd, as this looks like a legit gmail message with a legit DKIM signature. So there's one thing to check.
Why is that? How do I go about figuring that out?
> I'm not sure which of those scored what.
> Then there is the fact that your custom rule L_UNVERIFIED_GMAIL hit. If that's the same rule I see in the list archives, it scored 2.5 and pushed this email firmly into being tagged as spam.
Yes, that looks like it. It was posted to the list by Dan McDonald on August 25th. It's a meta:
    meta L_UNVERIFIED_GMAIL !DKIM_VERIFIED && __L_FROM_GMAIL && !__L_VIA_ML
    priority L_UNVERIFIED_GMAIL 500
    score L_UNVERIFIED_GMAIL 2.5
I've set it to 0.5 for now. Ideas on tracking down the DKIM_VERIFIED issue would be appreciated.
> Maybe adjust that score, or adjust the assumptions that caused that rule to be added to your config? This IS a gmail message, right? So your unverified-gmail custom rule is in error.
Yes, that's correct. I think you've identified the root of the problem. Thanks so much.
Best regards,
Alex
Re: Mismarked Ham
Hi,
>> I'm not sure which of those scored what. [...]
> Seconded. I do see quite a few custom rules. How much did they score?
My apologies; I hadn't realized so much of it was non-standard.
> It's otherwise obviously not very possible to help without knowing what the rules are for if you haven't seen them.
I've re-run the message through SA. It looks like the Bayes result has changed, which now makes the score 8.2. I've also reduced L_UNVERIFIED_GMAIL from 2.5 to 0.5.
X-Spam-Report:
    * 2.0 RELAYCOUNTRY_HIGH Relayed by a country thats a bad spam source
    * 0.0 RELAYCOUNTRY_US Relayed through United States
    * 1.0 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
    * 0.5 FREEMAIL_FROM Sender email is freemail (learnlivelove[at]gmail.com)
    * -0.0 SPF_PASS SPF: sender matches SPF record
    * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
    * 0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
    * 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
    * [score: 0.5000]
    * 1.1 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
    * 0.0 HTML_MESSAGE BODY: HTML included in message
    * 0.0 T_TVD_FW_GRAPHIC_ID1 BODY: T_TVD_FW_GRAPHIC_ID1
    * 1.4 SARE_GIF_ATTACH FULL: Email has a inline gif
    * 1.6 PART_CID_STOCK Has a spammy image attachment (by Content-ID)
    * 0.5 L_UNVERIFIED_GMAIL L_UNVERIFIED_GMAIL
Should SARE_GIF_ATTACH score so high by default?
    full SARE_GIF_ATTACH /name=\?[0-9a-z._\-]{3,18}\.gif\?/i
    describe SARE_GIF_ATTACH Email has a inline gif
    score SARE_GIF_ATTACH 1.42
I think this one might also be too aggressive by default:
    meta PART_CID_STOCK (__ANY_IMAGE_ATTACH && __PART_STOCK_CID && !__PART_STOCK_CL && !__PART_STOCK_CD_F)
    describe PART_CID_STOCK Has a spammy image attachment (by Content-ID)
> Even stranger, there is a T_-prefixed rule, which of course is not stock, and is generally used for NON-published rules still in evaluation. How did that one end up in there? What does it score?
That originated in updates_spamassassin_org/72_active.cf, so it's part of the channel updates:
    mimeheader T_TVD_FW_GRAPHIC_ID1 Content-Id =~ /[0-9a-f]{12}(?:\$[0-9a-f]{8}){2}\@/
Thanks,
Alex
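To see what kind of Content-Id that pattern targets, it can be tried outside SA. A small Python sketch; the sample IDs below are made up, and the "12 hex $ 8 hex $ 8 hex @" shape is simply what the regex describes, not a claim about which mail client produces it:

```python
import re

# Transcribed from the mimeheader rule above. There is no /i flag, so it is
# case-sensitive: only lowercase hex digits match. Perl's "\@" is a literal "@".
TVD_FW_GRAPHIC_ID1 = re.compile(r"[0-9a-f]{12}(?:\$[0-9a-f]{8}){2}@")


def hits_tvd_fw_graphic_id1(content_id):
    """True if a Content-Id header value matches the T_TVD_FW_GRAPHIC_ID1 pattern."""
    return TVD_FW_GRAPHIC_ID1.search(content_id) is not None


if __name__ == "__main__":
    print(hits_tvd_fw_graphic_id1("<000801ca3f0f$e6a2d2f0$0200a8c0@PC>"))  # True
    print(hits_tvd_fw_graphic_id1("<image001.gif@01CA3F0F.E6A2D2F0>"))     # False
```

Because the rule is case-sensitive, the same ID written in uppercase hex would not hit, which is worth keeping in mind when judging how broad the rule is.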
Re: .cn Oddity
Hi,
> We use some rules; if we talk openly about them and say "hey, this spammer is stupid, look here", it will take less than 12 hours for that gap to be closed, and we lose a valuable trick.
> Yes, that's the way it is; spammers can also read mailing lists and adapt their spam to get past the rules.
It sounds like social engineering needs to be part of the attack rules/strategy that we employ against these spammers :-)
Regards,
Alex
Re: Valid mail from blacklisted dynamic IPs
Hi,
>> I also don't understand how SPF_SOFTFAIL could happen when there wasn't any SPF record to test to begin with.
> http://www.openspf.org/
> i have no spf either http://old.openspf.org/wizard.html?mydomain=junc.org&submit=Go! :)
But it's sent from cron, so the host is localhost. I definitely have to read more to learn why SPF would fail without an SPF record. Maybe that's the whole point.
> what is the sender domain? why do users need to be sending via pop_before_smtp?
They are mostly on laptops or home connections with dynamic IPs. Road warriors.
> remember that IP could very well be a single user (NAT and friends)? has their ISP forbidden them from sending mail?
No, they haven't, and perhaps the best suggestion is to just have them use their own ISP's mail server in the first place.
Thanks so much. Great suggestions.
Best,
Alex
Re: Valid mail from blacklisted dynamic IPs
Hi,
>> I have a set of users that are authorized to use the mail server via pop-before-smtp, but SA catches the mail they send through the system as spam because they are on blacklisted Verizon or Comcast IPs:
> why are they not using smtp authentication?
I think you're referring to SASL? We used it some time ago, but the implementation was so buggy and such a security nightmare that we removed it, not thinking it would become so intrinsic to email on the Internet. Kind of like the security fears people had about bind-4 back then.
Thanks,
Alex
Re: SA needs a new paradigm for rule structure
Hi,
>> What we need are rules that combine a lot of simple rules into concepts and then combine those rules into rules that score - and score big. As an example, [...]
> Yes, SA definitely needs that and sorely lacks this ultimate feature!
Can I respectfully add that John Hardin has already done what I think you're describing in his lotsa_money and advance_fee rules:
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/
Regards,
Alex
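The "simple rules into concepts into scoring rules" idea can be illustrated in a few lines of Python. This is a hypothetical sketch of the layering, not John Hardin's actual rules: the sub-rule and concept names are invented, and the thresholds stand in for SpamAssassin meta expressions that add up sub-rule hits.

```python
# Hypothetical sub-rule names grouped into "concepts". Each concept fires when
# at least `threshold` of its sub-rules hit, like an SA meta such as
#   meta __CONCEPT_MONEY (__MONEY_AMOUNT + __MONEY_TRANSFER + __MONEY_BENEFICIARY) >= 2
CONCEPTS = {
    "__CONCEPT_MONEY": ({"__MONEY_AMOUNT", "__MONEY_TRANSFER", "__MONEY_BENEFICIARY"}, 2),
    "__CONCEPT_URGENT": ({"__URGENT_REPLY", "__CONFIDENTIAL", "__ACT_NOW"}, 2),
    "__CONCEPT_CONTACT": ({"__FREEMAIL_REPLYTO", "__PHONE_NUMBER"}, 1),
}


def concept_hits(subrule_hits):
    """Return the names of concepts whose sub-rule hit count meets the threshold."""
    return {
        name
        for name, (members, threshold) in CONCEPTS.items()
        if len(members & subrule_hits) >= threshold
    }


def scoring_rule(subrule_hits, concepts_needed=2, score=4.0):
    """Top-level scoring rule: score big only when several concepts co-occur."""
    return score if len(concept_hits(subrule_hits)) >= concepts_needed else 0.0
```

The design point is that no single sub-rule carries any score; only the co-occurrence of several independent concepts does, which is what lets the final rule score big without many false positives.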
Valid mail from .cn
Hi,
Some of our users are in China. I hoped someone could help me work out the best way to let a user from .cn forward mail without it being wrongly tagged as spam, while still blocking the majority of spam from .cn. Here's the SA report:
X-Spam-Report:
    * 0.1 RELAYCOUNTRY_CN Relayed through China
    * 2.0 RELAYCOUNTRY_HIGH Relayed by a country thats a bad spam source
    * 1.0 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
    * -0.0 SPF_PASS SPF: sender matches SPF record
    * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
    * 0.0 LOC_URI_CN URI: Contains CN URI
    * 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
    * [score: 0.5000]
    * 0.0 HTML_MESSAGE BODY: HTML included in message
    * 0.0 T_TVD_FW_GRAPHIC_ID1 BODY: T_TVD_FW_GRAPHIC_ID1
    * 1.8 MIME_BASE64_TEXT RAW: Message text disguised using base64 encoding
    * 1.5 MY_CID_AND_ARIAL2 SARE CID and Arial2
    * 1.6 PART_CID_STOCK Has a spammy image attachment (by Content-ID)
    * 1.5 MY_CID_AND_STYLE SARE cid and style
    * 1.6 MY_CID_ARIAL_STYLE SARE cid arial2 style
Bayes could probably use a bit of work, but is there something I should be investigating, based on this, to improve the accuracy? Or should I just whitelist_from_rcvd the user, since valid accounts from China are a minority? Even if I remove the RELAYCOUNTRY_HIGH meta, it's still over the 5.0 threshold.
Thanks,
Alex
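A quick way to explore "what if I removed or lowered this rule" is to re-total the report with some scores overridden. A minimal Python sketch with hypothetical helper names; it works on the displayed scores, which SA rounds to one decimal, so the total can differ slightly from SA's own figure:

```python
import re

# One scored line of an X-Spam-Report: "* <score> <RULE_NAME> <description>".
# Continuation lines such as "* [score: 0.5000]" do not match.
REPORT_LINE = re.compile(r"^\s*\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)\b", re.M)


def parse_report(report):
    """Map rule name -> displayed score for every scored line in the report."""
    return {rule: float(score) for score, rule in REPORT_LINE.findall(report)}


def what_if(scores, overrides, threshold=5.0):
    """Re-total the report with some rule scores overridden (e.g. set to 0)."""
    adjusted = {**scores, **overrides}
    total = round(sum(adjusted.values()), 2)
    return total, total >= threshold
```

Pasting in the report above gives a total of about 11.1; overriding RELAYCOUNTRY_HIGH to 0.0 still leaves about 9.1, which matches the observation that removing that meta alone is not enough. The MY_CID_* and PART_CID_STOCK rules together contribute 6.2 on this message.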
Fwd: SA needs a new paradigm for rule structure
Hi,
I sent this message more than an hour ago, and it looks like it has yet to hit the list. Resending.
Thanks,
Alex
---------- Forwarded message ----------
From: MySQL Student mysqlstud...@gmail.com
Date: Fri, Oct 9, 2009 at 2:34 PM
Subject: Re: SA needs a new paradigm for rule structure
To: SA Mailing list users@spamassassin.apache.org
Hi,
>> What we need are rules that combine a lot of simple rules into concepts and then combine those rules into rules that score - and score big. As an example, [...]
> Yes, SA definitely needs that and sorely lacks this ultimate feature!
Can I respectfully add [...]
> Whoa, dude! You just left the heavy sarcasm in, and snipped everything from the quote that clarifies this statement and identifies it as sarcasm.
Yes, I'm really sorry about that. I didn't think it would fail to come across as sarcasm the way I quoted it, but looking at it now, I see how it might.
Best,
Alex
Re: Valid mail from .cn
Hi,
> Could you ask them to provide ham samples for the automated masschecks? We currently have none in the corpus, so we cannot test the safety of rules against Chinese-language mail.
Yes, I know how important that is. I recall you mentioning that a few days ago. I think it would be quite difficult for me, though. I'll evaluate how much mail there really is over the coming work week, and see if there's something I can do.
Best,
Alex
Re: Subject Rewrite Based on Score
Hi,
> I actually would be doing that, but the filter does not know how to handle int(), so I would have to build a filter for all possible number combinations. If I could just get SA to do the basic math for me and write a header or subject, I can filter off of that.
We do something similar here using a procmail/formail script that calls a Perl script to match on X-Spam-Status and then rewrite the subject, prepending the Bayes score. We then use a few procmail rules to filter the mail by Bayes score for analysis.
Regards,
Alex
Re: Subject Rewrite Based on Score
Hi,
> That sounds overly complicated and like a lot of wasted cycles. Calling a Perl script for each message? What you just described sounds a hell of a lot like this light-weight SA configuration:
Yes, I should have mentioned that it's a copy of the mail that users receive, visible only to a single account. It also only runs once every four hours, as the mail is pulled from the spool.
Regards,
Alex
Re: Subject Rewrite Based on Score
Hi,
> It still is spawning a Perl process per message. You can do away with that processing hog if you use the add_header rule I mentioned before and have SA do it instead.
You may be right. I'll have to investigate doing this for this specific user only. Thanks for the info.
Alex
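For reference, the per-message step being discussed (read X-Spam-Status, prepend the Bayes result to the Subject) looks roughly like this. It is a Python stand-in for the Perl script, not that script itself, and `tag_subject` is a hypothetical name. As pointed out above, SpamAssassin's own `add_header` option with template tags such as `_SCORE_` can write a header without spawning a process per message, so this is mainly useful for understanding or testing the rewrite:

```python
import email
import re
from email import policy

BAYES_RE = re.compile(r"\b(BAYES_\d{2})\b")
# amavisd writes "hits=5.4"; SpamAssassin's own header writes "score=5.4".
HITS_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def tag_subject(raw_message):
    """Prepend e.g. "[BAYES_50 5.4]" to the Subject, based on X-Spam-Status."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    status = str(msg.get("X-Spam-Status", ""))
    bayes = BAYES_RE.search(status)
    hits = HITS_RE.search(status)
    tag = " ".join(filter(None, [bayes.group(1) if bayes else None,
                                 hits.group(1) if hits else None]))
    if tag:
        if "Subject" in msg:
            msg.replace_header("Subject", f"[{tag}] {msg['Subject']}")
        else:
            msg["Subject"] = f"[{tag}]"
    return msg.as_string()
```

A fixed tag such as "[BAYES_50 5.4]" is easy to match with plain string rules, which sidesteps the filter's lack of int() mentioned earlier in the thread.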
Valid mail from blacklisted dynamic IPs
Hi,
I have a set of users that are authorized to use the mail server via pop-before-smtp, but SA catches the mail they send through the system as spam because they are on blacklisted Verizon or Comcast IPs:
    X-Spam-Status: Yes, hits=5.4 tag1=-300.0 tag2=5.0 kill=5.0 use_bayes=1 tests=BAYES_50, BOTNET, FH_HOST_EQ_VERIZON_P, RCVD_IN_PBL, RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL
I also don't understand how SPF_SOFTFAIL could happen when there wasn't any SPF record to test to begin with.
One of the Comcast users:
    X-Spam-Status: Yes, hits=6.4 tag1=-300.0 tag2=5.0 kill=5.0 use_bayes=1 tests=BAYES_50, BOTNET, DYN_RDNS_SHORT_HELO_HTML, HTML_MESSAGE, RCVD_IN_PBL, RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL, SUBJ_ALL_CAPS
We are working on better Bayes training, but setting that problem aside, what is the right way to address this? Through a rule that whitelists their specific IPs?
Another message I'm dealing with was sent by Marriott and hit SARE_HTML_URI_REFID, DCC_CHECK, and AE_DETAILS_WITH_MONEY, in addition to being whitelisted by JMF/HOSTKARMA. I don't understand how it hit DCC when it contains details specific to the user, including account numbers, user names, etc.
How should I go about allowing this type of mail without disrupting these rules' ability to block mail that should be blocked? I'm sure I can add a rule that subtracts points if a message hits these and comes from Marriott, but I thought there might be something that addresses the more general problem rather than this specific one. Perhaps I'm making it too hard.
Thanks,
Alex
Re: Valid mail from blacklisted dynamic IPs
Hi,
> Does your pop-before-smtp method cause your MTA to indicate they've been authed in the Received: header?
I don't believe so. There doesn't appear to be anything additional in the header relating to pop-b4-smtp. I'm using postfix. Perhaps off-topic, but any ideas on how to do this, if you think it's the right approach?
>> I also don't understand how SPF_SOFTFAIL could happen when there wasn't any SPF record to test to begin with.
> Are you sure? What was the envelope from domain for the message? (keep in mind, this checks the envelope from, not the from header..)
No, I'm not sure. I just don't see anything relating to SPF in the message at all.
> Some of DCC's signatures are fuzzy, thus will match similar messages with minor differences. This is done to avoid spammers bypassing by
Yes, understood. The fuz1 and fuz2 max settings are 99, which I assume is the maximum possible; they were set by the previous admin.
> As for dealing with it:
> - whitelist Marriott at the SA level (as you suggest)
> - whitelist Marriott at the dcc level
> - remove or severely cut back the score of AE_DETAILS_WITH_MONEY, if you ever actually expect to get important email about traveling to the UAE.
I've whitelisted the Marriott address. I also removed the rule entirely, and am relying on John's excellent lotsa and fillform rules instead.
Thanks very much.
Best,
Alex
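To make the "envelope from, not From: header" point concrete: SPF_SOFTFAIL only needs the envelope-from domain to publish a record ending in `~all` and the checked IP to be absent from it. One plausible reading (an assumption, not something confirmed in this thread) is that, because pop-before-smtp leaves no authentication mark in Received:, SA checks the user's dynamic IP against that domain's record. The Python sketch below evaluates only a tiny subset of SPF (`ip4`/`ip6` and `all`, no DNS lookups, `include`, `a`, `mx` or `redirect`), and the record and addresses are hypothetical:

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def simple_spf_result(record, client_ip):
    """Evaluate a tiny subset of SPF: ip4/ip6 mechanisms and "all" only."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no SPF record published for the envelope-from domain
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIERS[qualifier]
        mech, _, value = term.partition(":")
        if mech in ("ip4", "ip6") and value:
            net = ipaddress.ip_network(value, strict=False)
            if ip.version == net.version and ip in net:
                return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and no "all": the SPF default
```

With a hypothetical record `v=spf1 ip4:192.0.2.0/24 ~all`, the server's own address (say 192.0.2.10) gets "pass", while a roaming user's dynamic address such as 198.51.100.7 gets "softfail", even though nothing in the message itself mentions SPF.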
Re: OT bad news
Hi,
> It's a shame that, living in Denver, I will be *just* out of range of hearing the screams as the mailspools fill with viruses, malware, and massive payloads of Spanish Prisoner spams.
Aw, c'mon now. Yes, I agree SA is a better solution, but Microsoft didn't become a multi-billion-dollar company solely because of its marketing. A competent admin following some SANS guides can certainly secure an Exchange box well enough to keep it from getting hacked, and a properly installed version of Symantec will keep most spam away. It /is/ possible, I suppose :-)
I'd bet that if he kept the FreeBSD box in place and just told his boss he'd upgraded to Exchange, they'd never even know :-)
Regards,
Alex
Re: Uppercase E-mail in Latin America
Hi,
> doesn't it appear to everyone else that this has the (slim to none) makings of a new urban legend?
I have to admit that when Warren posted this, I went to snopes to check, and there was nothing there :-)
Regards,
Alex
Re: SpamAssassin Ruleset Generation
Hi,
> Other than the sought rules, are all the rules manually generated? Are there any statistics on how frequently new rules/regexes are adopted by SpamAssassin? Who are the people who write them? Any details related to
Information on Justin Mason's SOUGHT rules is here: http://taint.org/2007/08/15/004348a.html
Use sa-update to update your SA rules once or twice per day with the new stuff. His ongoing development work is here: http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jm/?sortby=date
HTH,
Alex
Re: .cn Oddity
Hi All,
Regarding the .cn oddity, I added these to my rules, and out of about 79k messages today so far, I have the following:
    uri LOC_URI_CN m;^https?://[^/?]+\.cn\b;
    uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
    LOC_URI_CN: 2926
    T_CN_8_URL: 1634
HTH,
Alex
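To check what the two patterns actually catch before relying on the counts, they can be run over sample URIs. A Python sketch: the patterns are direct translations of the rules above (the `m;...;` form just uses `;` as the Perl delimiter), while the helper name and the sample URIs are made up. Like the numbers above, it counts messages, not URIs:

```python
import re

# Python translations of the two uri rules above.
LOC_URI_CN = re.compile(r"^https?://[^/?]+\.cn\b")
# T_CN_8_URL: a host label of exactly 8 word characters directly before ".cn".
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.I)


def count_messages(messages):
    """Count messages (each a list of URIs) that would hit each rule at least once."""
    counts = {"LOC_URI_CN": 0, "T_CN_8_URL": 0}
    for uris in messages:
        if any(LOC_URI_CN.search(u) for u in uris):
            counts["LOC_URI_CN"] += 1
        if any(T_CN_8_URL.search(u) for u in uris):
            counts["T_CN_8_URL"] += 1
    return counts
```

Running it over a few samples shows that T_CN_8_URL is a strict subset-style check: `http://abcd1234.cn/buy` hits both rules, while `http://www.shop.cn` and `http://abcdefghij.cn` (4- and 10-character labels) hit only LOC_URI_CN, and a `.cn` suffix in a path, as in `http://example.com/page.cn`, hits neither.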
Re: Hostkarma white list
Hi,
> For those of you getting spam from IPs/hostnames on my hostkarma white list, if you could email me a list of false hits (IP or host name) I could probably clean out the bad entries in the white list pretty quickly.
I'm not sure this is the best approach. I have a procmail recipe that filters out the messages that hit JMF_W, and I go through that folder every day before training it as ham. I'd say around a quarter of the messages are spam. How many entries are on the whitelist? How were they added? I'd almost rather start from scratch (or from a more proven list) with a percentage known to be valid and build from there.
At the least, wouldn't it be best to move the default score closer to zero on your wiki page for the time being? Maybe another method for submitting FPs, other than emailing them to you, could be created? Wouldn't the veracity of the list be better assured if you built it from a pile of known ham?
Mail originating from priorityoneemail.com [69.10.237.52] would be one prime candidate for removal.
On a somewhat related topic, how do people classify topica.com? It definitely sends junk, but it looks like people may actually request it, heh.
Thanks,
Alex
Re: Hostkarma Blacklist Climbing the Charts
Hi,
> header RCVD_IN_JMF_W eval:check_rbl_sub('JMF-lastexternal', '127.0.0.1')
> describe RCVD_IN_JMF_W Sender listed in JMF-WHITE
> tflags RCVD_IN_JMF_W net nice
> score RCVD_IN_JMF_W -5
Hopefully my comment isn't out of place in the current discussion of JMF/Hostkarma. I think this is a really bad default score; it should be reduced to -0.5, or perhaps the rule should not be used at all. I have a money/fraud email that hit RCVD_IN_JMF_W and passed through these servers:
    Received: from 41.220.75.3
    Received: from webmail.stu.qmul.ac.uk (138.37.100.37) by mercury.stu.qmul.ac.uk
    Received: from qmwmail2.stu.qmul.ac.uk ([138.37.100.210]
    Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6])
It also hit these other rules:
    X-Spam-Status: No, hits=1.3 tagged_above=-300.0 required=5.0 use_bayes=1 tests=AE_GBP, BAYES_50, LOTS_OF_MONEY, LOTTERY_PH_004470, LOTTO_RELATED, MONEY_TO_NO_R, RCVD_IN_DNSWL_MED, RCVD_IN_JMF_W, RELAYCOUNTRY_UK, SPF_FAIL, SPF_HELO_FAIL
Unless I'm really missing something, which server is JMF/Hostkarma whitelisting that it shouldn't be? This happens time after time.
Thanks,
Alex
> header RCVD_IN_JMF_BL eval:check_rbl_sub('JMF-lastexternal', '127.0.0.2')
> describe RCVD_IN_JMF_BL Sender listed in JMF-BLACK
> tflags RCVD_IN_JMF_BL net
> score RCVD_IN_JMF_BL 3.0
> header RCVD_IN_JMF_BR eval:check_rbl_sub('JMF-lastexternal', '127.0.0.4')
> describe RCVD_IN_JMF_BR Sender listed in JMF-BROWN
> tflags RCVD_IN_JMF_BR net
> score RCVD_IN_JMF_BR 1.0
> ===8<---
> You pick the names and then the world can use them. The JMF names are out there today.
> {^_^} Joanne
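To answer "which server is whitelisted", the relevant IP can be looked up by hand. "lastexternal" means SA queries only the last relay outside your trusted network; if the Received lines above are listed oldest first, as the hostnames suggest, that would presumably be mail2.qmul.ac.uk [138.37.6.6] rather than the 41.220.75.3 origin. A Python sketch of the query and of mapping the answers back to the rule names; the zone name is an assumption to check against your own JMF-lastexternal definition:

```python
import ipaddress
import socket

# Return codes taken from the three rules above.
JMF_CODES = {
    "127.0.0.1": "RCVD_IN_JMF_W",
    "127.0.0.2": "RCVD_IN_JMF_BL",
    "127.0.0.4": "RCVD_IN_JMF_BR",
}

# ASSUMPTION: the Hostkarma zone name; confirm it against the JMF-lastexternal
# definition in your own config before relying on it.
ZONE = "hostkarma.junkemailfilter.com"


def dnsbl_query_name(ip, zone=ZONE):
    """Build the reversed-octet query name used by IPv4 DNS lists."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def classify(answers):
    """Map A-record answers to the JMF rule names they would trigger."""
    return sorted({JMF_CODES[a] for a in answers if a in JMF_CODES})


def lookup(ip):
    """Live lookup (needs network access); returns [] when the IP is unlisted."""
    try:
        _, _, answers = socket.gethostbyname_ex(dnsbl_query_name(ip))
    except (socket.gaierror, socket.herror):
        return []
    return classify(answers)
```

For example, `dnsbl_query_name("138.37.6.6")` builds `6.6.37.138.hostkarma.junkemailfilter.com`; an answer of 127.0.0.1 for that name would mean the university's outbound relay, not the original sender, is what earned the -5.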
Re: New money/fraud spam
Okay, my bad, please ignore. Damn Google auto-complete.
Alex
On Sun, Sep 27, 2009 at 6:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
> Hi John,
> Another batch of money spam attached. Everything is the same as the last time.
> Thanks,
> Alex
Sought regex problem
Hi,
I posted bug 6198 a few weeks ago, there have been no comments or fixes on it in two weeks, and I'm unsure what to do next. Either it's not a bug and I'm doing something wrong, or it's not significant enough to bother with given the focus on v3.3. Thought someone here might have some ideas?
I'm using perl-5.6. Is anyone else using perl-5.6 with the sought rules?
    [13204] dbg: config: read file /var/lib/spamassassin/3.002005/sought_rules_yerp_org/20_sought.cf
    [13204] warn: config: invalid regexp for rule __SEEK_D52BRW: / Don\'t want to lose your potential of a lover\? Lucky you are, in 21th century all bed-related male problems can be solved by the powerful remedy, the all-mighty blue caplet\! This solution will give you the right support for 50\(\!\) hours\. Rock-like and ready to go\. more\x{bb}/: / Don\'t want to lose your potential of a lover\? Lucky you are, in 21th century all bed-related male problems can be solved by /: Can't use \x{} without 'use utf8' declaration
Maybe it's a perl module that's incompatible? Ideas greatly appreciated.
Thanks,
Alex
Re: Sought regex problem
Hi,
>> [13204] dbg: config: read file /var/lib/spamassassin/3.002005/sought_rules_yerp_org/20_sought.cf
>> [13204] warn: config: invalid regexp for rule __SEEK_D52BRW:
> grep doesn't find __SEEK_D52BRW in my copy of the rules.
That was from the sa-update that was current when I submitted the bug report.
Thanks to all for the feedback and the Bugzilla update. I'm in the process of upgrading perl, but there are still a few applications that depend on the old version.
Mark suggested in the Bugzilla update changing SpamAssassin to add 'use utf8' to the code generated from rules when it sees that it is running under a pre-5.8 version of perl. How do I do this in the meantime?
Thanks,
Alex
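This does not implement Mark's 'use utf8' change, but while perl-5.6 is still in place it helps to know how many sought rules are affected after each sa-update. A Python sketch that lists rules whose pattern contains a `\x{...}` escape, the construct named in the error above; the rule-line format is simplified and the function name is hypothetical:

```python
import re

# A rule definition line: type, name, then the pattern/expression.
RULE_LINE = re.compile(
    r"^\s*(body|rawbody|header|uri|full|mimeheader)\s+(\S+)\s+(.*)$")
# Perl's wide-character escape, e.g. \x{bb}; plain \xbb is not flagged.
WIDE_HEX = re.compile(r"\\x\{[0-9A-Fa-f]+\}")


def rules_needing_utf8(cf_text):
    """Return names of rules whose pattern uses a \\x{...} escape."""
    names = []
    for line in cf_text.splitlines():
        match = RULE_LINE.match(line)
        if match and WIDE_HEX.search(match.group(3)):
            names.append(match.group(2))
    return names
```

Pointed at the contents of 20_sought.cf, an empty result would mean the current update should load cleanly under perl-5.6; a non-empty one lists exactly the rules that will trigger the warning.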
Re: Re-running SA on an mbox
Hi,
> Try using a local SA setup for stripping the headers. By local, I mean don't use your main production SA - run a separate copy with its own (cut down) configuration and all database accesses and UBL calls etc turned off.
Much better idea, thanks. Thanks for the script, too.
Best,
Alex
Re: Re-running SA on an mbox
Hi,
Thank you all for your help. The mbox split suggestion is a good one. I'll follow that route and post my experience later.
> formail -s is the way to go.
I thought about that as a component of procmail. Sounds great.
Thanks,
Alex
Re: Re-running SA on an mbox
> but this will invalidate DKIM headers if these headers are signed; is SpamAssassin aware of this problem? (in general)
Are you saying there is a bug?
> mutt -f mbox
> in mutt, save to another folder if misclassified
Yes, I use pine for that, but I would like to eliminate as many of the FNs as possible, particularly ones that I can't identify visually.
Thanks,
Dave
Re: Re-running SA on an mbox
Hi,
> IIRC you previously mentioned using Pine. Just in case you're not aware, the default format for Pine/Alpine is MBX, an extended version of MBOX. You can tell the difference because MBX mailboxes start with a dummy email that's hidden by the software.
It seems that if you save messages into a separate folder, it does not add the DUMMY information at the top. I believe this is why the system was set up to use mbox and not mbx. Does this sound correct?
> I'd be very wary about allowing any tool to modify an MBX file unless you know it's safe. Where locking is an issue, Mark Crispin recommends that they only be accessed via the c-client library.
This isn't the actual spool file, but a copy in the home directory.
Thanks,
Alex
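Before pointing any splitting tool at the copy, it is cheap to check which format it actually is. A read-only Python sketch; both markers are my understanding of c-client's formats rather than something confirmed in this thread: an MBX file begins with the text `*mbx*`, and the hidden "dummy" message c-client keeps at the top of a traditional mbox folder carries the subject "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA". Verify both against your own files:

```python
# ASSUMPTIONS (see above): the MBX magic string and the c-client pseudo-message
# subject. Both are checked only against the first few KB of the file.
MBX_MAGIC = b"*mbx*"
PSEUDO_SUBJECT = b"DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA"


def detect_format(head):
    """Classify a mailbox from its first few KB of raw bytes."""
    if head.startswith(MBX_MAGIC):
        return "mbx"
    if head.startswith(b"From "):
        if PSEUDO_SUBJECT in head:
            return "mbox-with-pseudo-message"
        return "mbox"
    return "unknown"


def mailbox_format(path, nbytes=4096):
    """Read (never write) the start of a mailbox file and classify it."""
    with open(path, "rb") as fh:
        return detect_format(fh.read(nbytes))
```

Only a plain "mbox" result is a safe input for formail or a '^From ' splitter; anything else is better exported through Pine itself.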
Re: Re-running SA on an mbox
Hi, It's certainly not a fast operation, but using the following will split an mbox into individual messages: export FILENO=0; mkdir msgs; formail -s sh -c 'cat - > msgs/$FILENO' < mbox-name.mbox I also created a loop that would strip all the SA headers from the messages: for file in *; do echo "Processing: $file"; spamassassin -d "$file" > "$file.txt"; done This worked for a few hundred of the messages, but then started to fail on my production system with: [22135] warn: bayes: cannot open bayes databases /home/user/.spamassassin/bayes_* R/W: lock failed: File exists How can I tell when another process is using the database and when it is free for my script to use? Is there a faster way to run spamassassin just to strip the SA headers? Maybe there is a faster way, like passing the messages through the running amavisd instead of having to restart spamassassin each time to re-process each message? Thanks, Alex
Re-running SA on an mbox
Hi, I have an mbox with about 100 messages in it from a few days ago. The mbox is a combination of spam and ham. What is the best way to run SA through these messages again, so I can catch the ones that have URLs in them that weren't on the blacklist at the time they were received? Must I break them all apart to do this, or can SA somehow parse the whole mbox? If not, what program do you suggest I use to accomplish this? Thanks, Alex
Re: Re-running SA on an mbox
Hi, Do you just want to re-scan the whole mbox and see what rules hit now for research reasons? That's a good start, but I'd like to see if I can break out the ham to train bayes. There's no way to (directly) get SA to modify email that's already in an mbox file. The mass-check and sa-learn tools can read them, but nothing in SA can write to that. However, there might be a utility out there to do this (although I'm not aware of any).. Yeah, that's kind of what I thought. Maybe a program that can split each message back into an individual file? Would procmail even help here? Or even a simple shell script that looks for '^From ', redirects it to a file, runs spamassassin -d on it, then re-runs SA on each file? I could then concatenate each of them back together and pass it through sa-learn. Thanks, Alex
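The '^From ' splitting idea can be sketched in plain shell and awk. This is a minimal sketch, not a SpamAssassin tool: `split_mbox` is a hypothetical helper name, and it assumes a classic mbox in which every message starts with a line beginning with `From `. Quoted `>From ` lines are not unescaped, and a Pine/c-client internal-data pseudo-message, if present, is written out like any other message. The `formail -s` route discussed in this thread handles more edge cases.

```sh
# split_mbox MBOX DIR: write each message of MBOX to DIR/001, DIR/002, ...
# Naive: a new message starts at every line beginning with "From ".
split_mbox() {
  mkdir -p "$2" || return 1
  awk -v dir="$2" '
    /^From / {
      if (out) close(out)                 # finish the previous message file
      n++
      out = sprintf("%s/%03d", dir, n)    # zero-padded file name
    }
    out { print > out }                   # lines before the first From_ are dropped
  ' "$1"
}
```

Each resulting file can then have its old markup removed with `spamassassin -d msgs/001 > msgs/001.clean` before it is re-scanned or fed to sa-learn.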
Re: Re-running SA on an mbox
Hi, You probably want spamassassin --mbox. :) It won't modify the messages in-place, but you can do something like spamassassin --mbox infile > outfile. My apologies if it wasn't clear, but these messages have already been marked by SA. Some are ham, and the rest are FNs that I'd like to re-run through SA, in hopes of it now properly detecting them as spam. Thank you all for your help. The mbox split suggestion is a good one. I'll follow that route and post my experience later. Thanks again, Alex
Re: Re-running SA on an mbox
Hi, You probably want spamassassin --mbox. :) It won't modify the messages in-place, but you can do something like spamassassin --mbox infile > outfile. My apologies if it wasn't clear, but these messages have already been [...] Wait, my mistake. I read that too fast. Does that work, and rewrite the X-Spam-Status header? Guess I could find out for myself, but it just contradicts my experience and info I've learned previously. Thanks again, Alex
URIBL_BLACK vs RCVD_IN_JMF_W
Hi, I have been going through about 15MB of email generated from a procmail recipe searching for RCVD_IN_JMF_W, and you would not believe how many also match URIBL_BLACK or URIBL_GREY. Call me naive, but are there really that many providers that are unaware their clients are sending spam? (okay, rhetorical question :-) IOW, I guess this email is more of an informational note to those who may not be aware, and perhaps for others to comment on whether they even use it? The winner for me was a Bank of America scam with the following two relays: Received: from User (channelf.5460.net [61.137.93.80]) Received: from ortiz.unizar.es (ortiz.unizar.es [155.210.1.52]) No b-of-a relays, of course. This message also hit RAZOR2_CHECK and SPF_FAIL. There's also a money scam that passed through nasa.gov, hit RCVD_IN_JMF_W, and a few fraud rules: Received: from ALTPHYEMBEVSP30.RES.AD.JPL ([128.149.137.84]) by Received: from mail.jpl.nasa.gov (altvirehtstap02.jpl.nasa.gov [128.149.137.73]) Received: from mail.jpl.nasa.gov (sentrion2.jpl.nasa.gov [128.149.139.106]) X-Spam-Status: No, hits=1.1 tagged_above=-300.0 required=5.0 use_bayes=1 tests=AE_ADVICE_WITH_MONEY, AE_FRAUD_ADVICE, BAYES_50, LOTS_OF_MONEY, MILLION_USD, MONEY_TO_NO_R, RCVD_IN_DNSWL_MED, RCVD_IN_JMF_W, RELAYCOUNTRY_US I have RCVD_IN_JMF_W set to -0.5 points. It was also listed in RCVD_IN_DNSWL_MED? 
Running it a bit later, it scored as spam with the RAZOR rules: X-Spam-Report: * 0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/) * -0.5 RCVD_IN_JMF_W RBL: Sender listed in JMF-WHITE * [128.149.139.106 listed in hostkarma.junkemailfilter.com] * -4.0 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/, * medium trust * [128.149.139.106 listed in list.dnswl.org] * 0.0 RELAYCOUNTRY_US Relayed through United States * 1.0 AE_FRAUD_ADVICE BODY: Someone offering free advice * 1.8 MILLION_USD BODY: Talks about millions of dollars * 2.1 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level * above 50% * [cf: 56] * 0.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50% * [cf: 56] * 0.0 LOTS_OF_MONEY Huge... sums of money * 2.0 AE_ADVICE_WITH_MONEY Has advice and mentions much money * 1.0 MONEY_TO_NO_R Lots of money and bare, missing or undisclosed To * 0.2 MONEY_INHERIT Lots of money from a dead guy X-Spam-Relay-Country: US US US X-Spam-Status: Yes, score=5.4 required=5.0 tests=AE_ADVICE_WITH_MONEY, AE_FRAUD_ADVICE,LOTS_OF_MONEY,MILLION_USD,MONEY_INHERIT,MONEY_TO_NO_R, RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK, RCVD_IN_DNSWL_MED,RCVD_IN_JMF_W,RELAYCOUNTRY_US shortcircuit=no autolearn=disabled version=3.2.5 Thanks, Alex
Re: Problems with high spam
Hi, Also, if you're using amavisd, putting its temp dir on a RAM disk speeds up scanning, and it's considered safe because the MTA keeps a copy on disk as a backup :) How about mounting /var with noatime? Does anyone do that? Do you think it helps? What Linux filesystem is best suited for this? ext4? Thanks, Alex
Re: URL rule creation question
\s is the proper way to represent whitespace. lol, yes, I know that; I was actually trying to match 's' and the slash is the start of the pattern match. I wasn't referring to the beginning of the RE. Yeah, I realized that just after I sent this, if anyone cares :-) Thanks again, Alex
Re: URL rule creation question
Hi, The 'doubleheadedrover' domain currently shows up in Razor(E8), uribl_black, surbl_jp, and invaluement. But it wasn't in all of those when he first started posting about it. Yes, that's correct. Thanks for your help. That's already caught a few. I have another that I thought you could help with. I'd like to create a rule that matches a specific letter and up to 5 spaces after it, repeated ten times. I'm thinking something like this: /s\ {5}o\ {5}n\ {5}i\ {5}c\ {5}\ m\ {5}e\ {5}d\ {5}i\ {5}a/i I'm still learning regexes, so hopefully this isn't too far off. The opportunities for rules are coming faster than my ability to learn. Thanks, Alex
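Note that `\ {5}` matches exactly five spaces, not up to five; a bounded repeat such as `{1,5}` expresses "one to five". A quick way to try the idea before writing a rule is to test it with `grep -E`. This sketch assumes at least one whitespace character follows each letter (a lower bound of 0 would also match the ordinary phrase "sonic media"), uses POSIX `[[:space:]]` where a SpamAssassin rule would use Perl's `\s`, and the function and rule names are made up for illustration.

```sh
# spaced_sonic_media: succeed if stdin contains "sonic media" written with
# 1 to 5 whitespace characters after every letter, e.g. "s o n i c  m e d i a".
# Rough SpamAssassin equivalent (hypothetical rule name, not tested here):
#   body LOC_SPACED_SONIC /s\s{1,5}o\s{1,5}n\s{1,5}i\s{1,5}c\s{1,5}m\s{1,5}e\s{1,5}d\s{1,5}i\s{1,5}a/i
spaced_sonic_media() {
  grep -Eqi 's[[:space:]]{1,5}o[[:space:]]{1,5}n[[:space:]]{1,5}i[[:space:]]{1,5}c[[:space:]]{1,5}m[[:space:]]{1,5}e[[:space:]]{1,5}d[[:space:]]{1,5}i[[:space:]]{1,5}a'
}
```

Usage: `printf 's o n i c  m e d i a\n' | spaced_sonic_media && echo hit`. It is still worth checking the real rule with `spamassassin -t` on a saved sample, since the body text SA matches against may not be byte-for-byte the raw message.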
Re: JMF whitelist and RAZOR conflict
Hi, I have several emails that are tagged with RCVD_IN_JMF_W, SPF_SOFTFAIL, and RAZOR2_CHECK such as this one: http://pastebin.com/m4a4d990e Why accept SPF_SOFTFAIL? Can't this be solved? I don't understand. I'm still learning how the SPF rules work. Shouldn't I be adding points for an SPF_FAIL? This indicates a spoof attempt, no? Are you receiving forwarded emails from SPF domains? If I understand correctly, no. I have no relationship with any external source and their SPF records. If so, add the forwarding IP to trusted_networks (so SPF will be disabled for these hosts). Do you mean to avoid the processing overhead? IOW, don't bother checking SPF records for trusted domains? Is the criterion for being listed on the JMF_W simply that it contains a domain that is whitelisted, regardless of whether it contains another URL that is blacklisted? This is SpamAssassin working; if there is a blacklisted domain, add it to your uribl_skip_domain list. Ah, you mean if the domain is erroneously on the blacklist, right? Would I be advised to make the JMF_W score very low, or create a meta that doesn't really whitelist it unless it isn't also blacklisted? This is IP-based, not domain-based. On a somewhat related note, how does BOTNET differ from RDNS_NONE? What is the logic behind the BOTNET rule? Is there some known list that it's checking, or is it just likely to be a dynamic IP or compromised host if it doesn't have a reverse DNS entry? Thanks so much for the clarification, and confirmation about Gevalia/Kraft. Thanks, Alex
URL rule creation question
Hi all, I've seen this pattern in spam quite a bit lately: href=http://doubleheaderover.com/jazert/html/?39.6d.3d.31.66.67.6b.79.77.63.77.63.65.6e.74.69.6e.6e.69.61.6c.5f.68.31.33.33.2e.6f.39.39.41.4d.2e.30.30.45.33.39.2e.30.32.30.61.64.6b.37.61.76.61.67.63.31.66.62.2e.6a.61.7a.65.72.74.2e.68.74.6d.6c3az8fO Would it be reasonable to create a rule that looks for this two-character-then-dot pattern, or is it likely to appear in legitimate email too frequently? If possible, how would you create a rule to capture this? Thanks, Alex
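One rough way to gauge how common the pattern is: test URLs for a query string made of many two-hex-digit groups, each followed by a dot. The threshold of 10 groups and the function and rule names are assumptions, and query strings full of dotted numbers (version strings, lists of IPs) would also match, so it is worth running against saved ham before giving it a score.

```sh
# has_hex_dot_run: succeed if a URL on stdin has a query string that begins
# with at least 10 consecutive groups of two hex digits followed by a dot.
# Rough SpamAssassin equivalent (hypothetical rule name, not tested here):
#   uri LOC_HEX_DOT_QUERY /\?(?:[0-9a-f]{2}\.){10,}/i
has_hex_dot_run() {
  grep -Eqi '\?([0-9a-f]{2}\.){10,}'     # in ERE, \? is a literal question mark
}
```

Running it over URLs extracted from a ham corpus gives a quick sense of the false-positive rate before the `uri` rule goes into production.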
JMF whitelist and RAZOR conflict
Hi, I have several emails that are tagged with RCVD_IN_JMF_W, SPF_SOFTFAIL, and RAZOR2_CHECK such as this one: http://pastebin.com/m4a4d990e Is the criterion for being listed on the JMF_W simply that it contains a domain that is whitelisted, regardless of whether it contains another URL that is blacklisted? Would I be advised to make the JMF_W score very low, or create a meta that doesn't really whitelist it unless it isn't also blacklisted? meta META_NOT_JMF_RAZOR (RCVD_IN_JMF_W && !RAZOR2_CHECK) It also appears to spoof the kraftfoods.com mail server, correct? Is there a possible rule to be created here? Thanks, Alex
Re: JMF whitelist and RAZOR conflict
Hi, http://pastebin.com/m4a4d990e Is the criterion for being listed on the JMF_W simply that it contains a domain that is whitelisted, regardless of whether it contains another URL that is blacklisted? I'm not sure what you are saying here; it's not as if the people running the whitelist could look up the IP address on razor. I'm saying that it appears odd that it would be listed on both RAZOR and JMF_W, unless the JMF_W found the kraftfoods.com URL and the RAZOR rules found the bogus http://ADSENSETREASUREONLINE.yolasite.com URL. Unless the yolasite.com is a legitimate kraftfoods site? meta META_NOT_JMF_RAZOR (RCVD_IN_JMF_W && !RAZOR2_CHECK) Why RAZOR2_CHECK? Why not other positive-scoring rules? The trouble is that the whitelist rule is then pointless. Set its score at a value that's commensurate with its effectiveness on your email. Does my question now make sense? I was looking at it from more of a validation point of view for JMF_W, because of the apparent conflict with RAZOR. It also appears to spoof the kraftfoods.com mail server, correct? Is there a possible rule to be created here? No, it was almost certainly sent through kraftfoods.com. It's based on an IP address recorded by your trusted network. Maybe I should have used a better example. Can I ask you to look at this one? http://pastebin.com/m7d61b26f This uses IP 66.132.135.108 as its URL (xybersleuth.com), and unless that's not a spammer's site, then there's something wrong. This email includes JMF_W and RAZOR2_CF_RANGE_51_100 and URIBL_BLACK in the same message, although it has a very low bayes score. Which is correct? Thanks, Alex
Shortcircuit info
Hi all, I'm trying to understand how shortcircuit works to ease some of the load on the servers. First, does anyone have any recommended metas that they use in their environment that might help? Can I add shortcircuit to an existing rule, or does the rule have to be designed to be used with shortcircuit? In other words, I have a meta that combines spamcop with spamhaus: meta META_HAUS_COP (RCVD_IN_BL_SPAMCOP_NET && RCVD_IN_XBL) describe META_HAUS_COP Contains SPAMHAUS XBL and SPAMCOP score META_HAUS_COP 0 4.0 0 4.0 shortcircuit META_HAUS_COP spam In order for it to be actually shortcircuited, however, I have to make the score 100, correct? Thanks, Alex
Re: Porn-portal spammers
Hi, I am getting rather tired of messages spamming porn-portals. They typically originate from hotmail.com, and advertise a porn-portal based on google.com/groups, google.com/reader, groups.yahoo.com, pipes.yahoo.com, spaces.live.com, docs.google.com, sites.google.com and livejournal.com. This was posted by Martin a week or so ago in response to a similar question by me: This should catch your set and more: uri LOC_YAHOO /^http:.{1,40}\.yahoo[.,]com/i score LOC_YAHOO 0 1.5 0 1.5 describe LOC_YAHOO Contains *.yahoo.com uri Or, if you want to be more specific, try this: uri LOC_YAHOO /^http:\/\/(groups|profile|personals)\.yahoo[.,]com/i score LOC_YAHOO 0 1.5 0 1.5 describe LOC_YAHOO Contains yahoo.com groups/profile/personals uri Does this help? Best regards, Alex
Re: 3.3.0 alpha 2 on production mail servers / clusters ???
Hi, On Saturday August 29 2009 19:47:32 R-Elists wrote: have many, or any of you folks on the list migrated your production servers to the 3.3.0 alpha 2 or later release? We are certainly one of them (actually running CVS head, which is pretty close to alpha2). About 1000 users here. Do we have an idea of a timeline for the next release and/or production release currently? How about dependencies? Will perl-5.8 work okay? What modules will need to be updated? How about for use with amavis? Will I need to upgrade that? A list of the top five best new features would also be great! *salivates* :-) I'm trying to anticipate what I can do ahead of time to get it into place as soon as possible. Thanks, Alex
Google/Yahoo Spam
Hi all, I'm seeing an increase in Google Reader and yahoo groups/personals/profile spam. Here's an example of the Google Reader spam: http://pastebin.com/m1021fc5f Any ideas on how to catch this one? For the Yahoo spam (with links to yahoo sites ending in '/1'), I've created these: uri LOC_YAHOO1 m{http://groups\.yahoo\.com\/}i score LOC_YAHOO1 0 1.5 0 1.5 describe LOC_YAHOO1 Contains groups.yahoo.com uri uri LOC_YAHOO2 m{http://profile\.yahoo\.com\/}i score LOC_YAHOO2 0 1.5 0 1.5 describe LOC_YAHOO2 Raw body contains profile.yahoo uri LOC_YAHOO3 m{http://personals\.yahoo\.com\/}i score LOC_YAHOO3 0 1.5 0 1.5 describe LOC_YAHOO3 Raw body contains personals.yahoo They're somewhat pared down because I'm not very good at pattern matching, so I thought someone could improve on them? Thanks, Alex
Converting spam to email message
Hi all, I thought I understood, but I'm still having trouble converting a message in the quarantine back into a normal email message that I can forward on to a recipient. Does anyone know how to do this? Thanks so much. Best regards, Alex
Re: Converting spam to email message
Hi, I thought I understood, but I'm still having trouble converting a message in the quarantine back into a normal email message that I can forward on to a recipient. Does anyone know how to do this? Maybe I missed something, but SpamAssassin doesn't have a quarantine. http://wiki.apache.org/spamassassin/SpamQuarantine Yes, my apologies. I guess it would then be amavisd-new that's managing the quarantine. I didn't realize that amavisd manipulated the mail in that way. Hopefully someone can still help. Thanks, Alex
Training spam as ham and forwarding
Hi SA users, I have a few messages found in the quarantine that I need to train as ham because they were marked as spam incorrectly. To do this, I added the following to the top of the file so it becomes a normal email: From DUMMY-LINE Thu Jan 1 00:00:00 1970 Is this correct? (without the leading spaces) I can now accurately access and index it using pine, whereas before it didn't acknowledge it as a normal email. I'd also now like to forward it to the intended recipient as an attachment, but the recipient isn't able to read it as a normal email, but instead as plain text. How can I accomplish this? Are there mail tools, like procmail or formail, I believe, that were designed to automate this? Does anyone request ham from their users to be trained by bayes, or is autolearning typically the only way (or only real effective way) to do this? Also, on another note, how can I have all email destined for a particular user sent to them, including spam? This is what all_spam_to is for, correct? Thanks, Alex
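Prepending the dummy `From ` line by hand gets tedious for more than a few quarantined files. A minimal sketch; `add_from_line` is a hypothetical helper, and the two-space padding before the day follows the ctime layout that mbox `From ` lines conventionally use.

```sh
# add_from_line FILE: print FILE to stdout with a dummy mbox "From " line
# prepended, unless its first line already is one.
add_from_line() {
  first=$(head -n 1 "$1")
  case $first in
    "From "*) cat "$1" ;;                                   # already mbox-style
    *) printf 'From DUMMY-LINE Thu Jan  1 00:00:00 1970\n'  # ctime-style date
       cat "$1" ;;
  esac
}
```

For example, `for f in quarantine/*; do add_from_line "$f"; done >> ham.mbox` collects the messages into one mbox that can be reviewed and then passed to `sa-learn --ham --mbox ham.mbox`.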
Re: lottery message scored hammy by bayes
Hi, If you're using autolearning, what are your learning thresholds? What do you recommend for thresholds? I'm considering using autolearning, but very concerned about corrupting the database. I think I would use something like +15 for spam. There are FNs on occasion in the 2.x range with low bayes numbers (or BAYES_50) that I wouldn't want to be tagged as ham. Should that be a concern? Even mail that has been whitelisted could also contain spam, so would a ham threshold of like -100 work, or present the same problem? Thanks, Alex
Re: spam mail with flagged style images
Hi, mimeheader AS_090508_CTYP_PNG Content-Type =~ /image\/png/ mimeheader AS_090508_CTYP_JPG Content-Type =~ /image\/jpg/ mimeheader AS_090508_CTYP_JPEG Content-Type =~ /image\/jpeg/ All scored the same. Can be written as a single rule. I've spent some time and tried to refine my rules based on your advice, guenther. Can I ask you to check them over again and see if this is any better, or at least more inclusive? mimeheader LOC_CDIS_INLINE Content-Disposition =~ /inline/ score LOC_CDIS_INLINE 0.1 describe LOC_CDIS_INLINE Content-Disposition: inline mimeheader LOC_CTYP_IMG ((Content-Type =~ /image\/png/) || (Content-Type =~ /image\/jpg/) || (Content-Type =~ /image\/jpeg/) || (Content-Type =~ /^application\/octet-stream.\.rtf/)) score LOC_CTYP_IMG 0.1 describe LOC_CTYP_IMG Content-Type: PNG-JPG-JPEG-RTF meta LOC_IMGSPAM (LOC_CDIS_INLINE && LOC_CTYP_IMG) score LOC_IMGSPAM 0.1 describe LOC_IMGSPAM Probably inline image meta LOC_BOTNET_IMG ((BOTNET && LOC_IMGSPAM) || (BAYES_99 && LOC_IMGSPAM)) score LOC_BOTNET_IMG 1.5 describe LOC_BOTNET_IMG Probably inline image spam Generally, no. A spam advertising body part enhancers also has correctly spelled words. Training them doesn't poison Bayes either. And there usually are still useful tokens around. That's great, thanks! Thanks, Alex
Re: spam mail with flagged style images
Hi, mimeheader LOC_CTYP_IMG ((Content-Type =~ /image\/png/) || (Content-Type =~ /image\/jpg/) || (Content-Type =~ /image\/jpeg/) || I thought this passed through my --lint, but I only caught it the second time. I was looking around for the (new) right way to do it, and found this in 80_additional.cf: mimeheader __ANY_IMAGE_ATTACH Content-Type =~ /image\/(?:gif|jpeg|png)/ Now I know. Does the rest look like it will work as expected? Thanks, Alex
Re: spam mail with flagged style images
Hi, Text added to the e-mail is bogus and never repeated, the same as the old-style spam mail with attached images. OCR doesn't detect anything, I understand because of the flagged effect. Also, the image file name changes, if it has one. A few of these have slipped through on my systems, but for the most part, these rules have worked here: mimeheader AS_090505_CDIS_INLINE Content-Disposition =~ /inline/ score AS_090505_CDIS_INLINE 0.5 describe AS_090505_CDIS_INLINE Rule by AS: Content-Disposition: inline mimeheader AS_090508_CTYP_PNG Content-Type =~ /image\/png/ score AS_090508_CTYP_PNG 0.5 describe AS_090508_CTYP_PNG Rule by AS: Content-Type: PNG mimeheader AS_090508_CTYP_JPG Content-Type =~ /image\/jpg/ score AS_090508_CTYP_JPG 0.5 describe AS_090508_CTYP_JPG Rule by AS: Content-Type: JPG mimeheader AS_090508_CTYP_JPEG Content-Type =~ /image\/jpeg/ score AS_090508_CTYP_JPEG 0.5 describe AS_090508_CTYP_JPEG Rule by AS: Content-Type: JPEG meta AS_090508_PNGSPAM (AS_090505_CDIS_INLINE && AS_090508_CTYP_PNG) score AS_090508_PNGSPAM 0.5 describe AS_090508_PNGSPAM Rule by AS: Probably an Inline PNG spam meta AS_090508_JPGSPAM (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPG) score AS_090508_JPGSPAM 0.5 describe AS_090508_JPGSPAM Rule by AS: Probably an Inline JPEG spam meta AS_090508_JPEGSPAM (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPEG) score AS_090508_JPEGSPAM 0.5 describe AS_090508_JPEGSPAM Rule by AS: Probably an Inline JPEG spam meta LOCAL_BOTNET_JPG (BOTNET && AS_090508_JPGSPAM) score LOCAL_BOTNET_JPG 1.5 describe LOCAL_BOTNET_JPG Rule by AS: Probably an Inline JPEG spam meta LOCAL_BOTNET_JPEG (BOTNET && AS_090508_JPEGSPAM) score LOCAL_BOTNET_JPEG 1.5 describe LOCAL_BOTNET_JPEG Rule by AS: Probably an Inline JPEG spam The LOCAL_* are mine, adapted from others I found some time ago. I'd be interested in people's input on these. Can they be simplified? Do you agree with the scoring? How about bayes poisoning? The messages also all have random text, mostly spelled correctly, but nonsensical. 
If they are trained, could it adversely affect my bayes db? Thanks, Alex
Junkmailfilter rules
Hi, I've been using the junkmailfilter rules for a few days now, and they're doing quite well. It occurred to me that I might be able to use the RCVD_IN_JMF_W rule to filter mail from whitelisted domains, and use that to train bayes ham. Would this work? There of course would be mail from constantcontact.com, mailing list mail, newsletters, etc., that all contain a lot of HTML and other components that could equally be seen in spam. How do people typically train bayes ham? I can't rely on my users not to mix up spam and ham, which would surely corrupt the database. I did find this in one of the emails, passed through delivery.net: X-Spam-Status: No, hits=4.9 tagged_above=-300.0 required=5.0 use_bayes=1 tests=BAYES_50, BOTNET, DKIM_SIGNED, DKIM_VERIFIED, HTML_MESSAGE, RAZOR2_CF_RANGE_51_100, RAZOR2_CF_RANGE_E4_51_100, RAZOR2_CHECK, RCVD_IN_JMF_W, RELAYCOUNTRY_US, SPF_HELO_PASS, SPF_PASS It was a citibank credit card email. How could it be in RAZOR and also whitelisted, and BOTNET? Certainly there were no domains in there that it was relayed through that were part of a botnet. Ideas greatly appreciated. Thanks, Alex
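As a sketch of the JMF_W-to-ham idea, combined with the "don't trust it if Razor also fired" meta discussed in the JMF threads, the following lists saved messages whose headers mention RCVD_IN_JMF_W but not RAZOR2_CHECK or URIBL_BLACK. It only does a substring match on header text, assumes one message per file, and the helper names are hypothetical. The output is a candidate list for manual review before `sa-learn --ham`, not something to feed in blindly.

```sh
# hdr_has FILE RULE: succeed if RULE appears anywhere in the header block of FILE
hdr_has() {
  awk -v rule="$2" '
    /^$/ { exit 1 }                        # blank line: end of headers, not found
    index($0, rule) { found = 1; exit 0 }  # substring match on a header line
    END { exit !found }                    # final status: 0 only if found
  ' "$1"
}

# jmf_ham_candidates FILE...: print files whitelisted by JMF_W that did not
# also hit Razor or URIBL_BLACK.
jmf_ham_candidates() {
  for f in "$@"; do
    if hdr_has "$f" RCVD_IN_JMF_W &&
       ! hdr_has "$f" RAZOR2_CHECK &&
       ! hdr_has "$f" URIBL_BLACK; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `jmf_ham_candidates msgs/* > candidates.txt`, then read through the list before training any of it.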
Re: sa-update: stuck at 795855?
Hi, The problem is that the spammers test with the SA rulesets as soon as they are released, which is why the rulesets become ineffective. I'm not sure I agree with that. If this were the case, I would have a lot less spam with scores of 50 or more, which obviously aren't even trying to do something as easy as pass it through SA first. Also, couldn't we then draw conclusions from this that, since vendors like Symantec have rules which never are seen by spammers, that their rules are better? Incidentally, are there technologies that vendors like Symantec, Proofpoint, Cisco, Google, etc, use that we don't have or don't have access to? Thanks, Alex
Re: Assistance needed with spamassassin under RedHat 5.2
Hi, spamassassin. I have a test message which is genuine. Running this through spamassassin with -t (test) mode as described below gives the output below: Running: spamassassin -t /tmp/rose2 gives at the bottom the following (edited for privacy) report. Try adding some debugging output, and first look for something obviously wrong: # spamassassin -D -t /tmp/rose2 2>&1 | less Go line-by-line looking for something that stands out as obviously wrong. Consider obfuscating your message, replacing your domain with example.com, for instance, and uploading it to pastebin.com. Then post a link here so we can all view the message for further ideas. Regards, Alex
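The domain replacement can be scripted so it is applied consistently to every header and body line. A minimal sketch; `munge_domain` is a hypothetical helper and `mydomain.com` a placeholder. It only rewrites the literal domain, case-sensitively, so usernames, differently spelled hostnames, and IP addresses still need a manual look before posting.

```sh
# munge_domain DOMAIN [FILE...]: copy the message(s) to stdout with every
# occurrence of DOMAIN replaced by example.com (case-sensitive).
munge_domain() {
  pat=$(printf '%s\n' "$1" | sed 's/\./\\./g')   # escape dots so they match literally
  shift
  sed "s/$pat/example.com/g" "$@"                # reads stdin if no FILE is given
}
```

For example, `munge_domain mydomain.com /tmp/rose2 > /tmp/rose2.munged`, then upload the munged copy.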
Re: gpgkey failures with sa-update
Hi, list. No errors reported then, and I've now forgotten the url. www.yerp.org now gets me a webmail login screen, so obviously that wasn't it. Toss that url to me and I'll replay it again. You should be able to search through your browser history, no? With Firefox v3.5, you can also just type yerp in the location bar, and it will do a more aggressive search through your previous URLs for anything containing those letters. Regards, Alex
Re: Counting RAZOR2 hits
Hi, You can also set your min_cf in your razor config files, which will affect when the RAZOR2_CHECK rule fires. This does work in SpamAssassin, as I have overridden the min_cf on my own system, and have done so for years. Thanks to everyone for their great ideas thus far. I'm looking forward to working through it to learn more. I'm seeing a lot of FNs that include various RAZOR rules, but still don't have enough points to tip them over the threshold. Are there meta rules that people have created and can share that might help? How about combining it with BOTNET? The ones that have BAYES_99 and most of the SURBLS and RAZOR* are all properly tagged already, but many only have BAYES_50. Some have only RAZOR2_CHECK and contain an inline image. X-Spam-Status: No, hits=4.1 tagged_above=-300.0 required=5.0 use_bayes=1 tests=BAYES_50, HTML_MESSAGE, RAZOR2_CF_RANGE_51_100, RAZOR2_CF_RANGE_E8_51_100, RAZOR2_CHECK, RDNS_NONE, RELAYCOUNTRY_US, SPF_HELO_PASS, SPF_PASS score RAZOR2_CHECK 0 0.9 0 0.9 score RAZOR2_CF_RANGE_51_100 0 0.8 0 0.8 score RAZOR2_CF_RANGE_E4_51_100 0 1.8 0 1.8 score RAZOR2_CF_RANGE_E8_51_100 0 1.5 0 1.5 I see now that RAZOR2_CF_RANGE_E8_51_100 should also be at least 1.8, which I've now changed. Does everyone do their own mass-checks these days? How do you go about analyzing the FNs to figure out why they aren't caught and adjust the scores? Of course they need to be looked at individually for additional patterns, but how are the scores of the triggered rules best personalized? Thanks, Alex
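One way to start on FN analysis is to list the messages that were not tagged but came close and hit a given rule, then look at those individually. A sketch, assuming the messages are saved one per file with their X-Spam-Status header intact; both the amavisd `hits=` form and the SpamAssassin `score=` form seen in this thread are accepted. `near_misses` is a hypothetical name.

```sh
# near_misses MIN RULE FILE...: print "score file" for each message whose
# X-Spam-Status says "No" but whose score is >= MIN and whose tests include RULE.
near_misses() {
  min=$1; rule=$2; shift 2
  for f in "$@"; do
    awk -v min="$min" -v rule="$rule" -v file="$f" '
      /^$/ { exit }                                   # end of the header block
      /^X-Spam-Status:/ { st = $0; cont = 1; next }   # start of the header
      cont && /^[ \t]/ { st = st " " $0; next }       # folded continuation line
      { cont = 0 }
      END {
        if (st !~ /^X-Spam-Status: No,/) exit         # only untagged mail
        if (!match(st, /(hits|score)=-?[0-9.]+/)) exit
        v = substr(st, RSTART, RLENGTH)
        sub(/^[a-z]+=/, "", v)                        # keep just the number
        # RULE must appear as a whole name in the tests list
        if (v + 0 >= min + 0 && st ~ ("[=, \t]" rule "([, \t]|$)")) print v, file
      }' "$f"
  done
}
```

For example, `near_misses 3.0 RAZOR2_CHECK msgs/* | sort -rn` lists the closest misses first.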
Re: Barracuda RBL in first place
Hi, So perhaps instead of adding another RBL, maybe some admins need to consider adding in some HELO checking / rejection. Can you explain a bit more here? What are you checking for, that the host is valid? Thanks, Alex
Re: Barracuda RBL in first place
Hi,
Unknown user                     32.00% (32.00%)  87427696
Greylisted                       24.88% (16.92%)  46225401
Throttled                        11.03%  (5.64%)  15399444
Relay access denied               0.01%  (0.00%)      7034
Bogus DNS (Broadcast)             0.01%  (0.00%)     11692
Bogus DNS (RFC 1918 space)        0.07%  (0.03%)     82135
Spoofed Address                   0.26%  (0.12%)    319551
Unclassified Event                0.77%  (0.35%)    949388
Temporary Local Problem           0.01%  (0.00%)      8165
Require FQDN sender address       0.04%  (0.02%)     51022
Require FQDN for HELO hostname    8.97%  (4.02%)  10988455
[...]
Can I ask how you produced those stats? They look very helpful. Thanks, Alex
Re: Barracuda RBL in first place
Hi, What log script do you good people use to generate the list above ? Is it a home brew or one we can download so we can compare our own hits ? http://www.rulesemporium.com/programs/sa-stats.txt Any chance someone knows where there is a compatible one that parses amavisd instead of spamd? I've tried, but guess I don't know enough perl to get it right. Any chance someone has a bit of time to hack on it on this lazy Saturday afternoon? :-) Thanks, Alex
Counting RAZOR2 hits
Hi, I thought grep -c RAZOR2_CHECK through my mail logs would give me a good approximation of the number of times RAZOR2 was consulted, but that doesn't seem to be the case. There are some mails that don't have it listed in the tests= section. I've also tried the razor-* commands, and they don't appear to be able to help here either. What am I missing? Does RAZOR2_CHECK mean that it was found in the RAZOR2 db, or that it merely consulted the db? Thanks, Alex
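A rule name only appears in tests= when that rule hit, so counting RAZOR2_CHECK counts messages that were found in Razor, not messages for which Razor was consulted. To tally hits per rule across saved messages rather than log lines, here is a sketch that unfolds each X-Spam-Status header and counts the names in its tests= list. It assumes one message per file; the function name is hypothetical.

```sh
# count_rule_hits FILE...: tally how often each rule name appears in the
# tests= list of the X-Spam-Status header of each message file.
count_rule_hits() {
  awk '
    function tally(   s, n, i, parts) {
      if (status !~ /tests=/) { status = ""; return }
      s = status
      sub(/.*tests=/, "", s)                                   # keep the tests list
      sub(/[ \t]+(shortcircuit|autolearn|version)=.*/, "", s)  # drop trailing fields
      gsub(/[ \t]/, "", s)                                     # drop folding whitespace
      n = split(s, parts, ",")
      for (i = 1; i <= n; i++)
        if (parts[i] != "" && parts[i] != "none") count[parts[i]]++
      status = ""
    }
    FNR == 1 { tally(); inhdr = 1; cont = 0 }                  # new file
    inhdr && /^$/ { tally(); inhdr = 0; next }                 # end of headers
    inhdr && /^X-Spam-Status:/ { status = $0; cont = 1; next }
    inhdr && cont && /^[ \t]/ { status = status " " $0; next } # folded line
    inhdr { cont = 0 }
    END { tally(); for (r in count) print count[r], r }
  ' "$@"
}
```

For example, `count_rule_hits msgs/* | sort -rn | head` shows the most frequent rules; `grep RAZOR` on that output gives the Razor hit counts.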
Elusive spam
Hi, I'm having trouble catching a particular type of spam, and hoped someone had some time to take a look: http://pastebin.com/d57336542 It doesn't match RAZOR2, or any of the URI lists, and it's only BAYES_50. I have a pretty well-established BAYES db, so I'm surprised it's only BAYES_50. What can I do to block spam like this in the future? Thanks, Alex
Re: Elusive spam
Hi, Maybe this will sound dumb, but wouldn't it be perfectly safe to blacklist example.com? After all, that isn't a domain you're ever going to get mail from. I could be wrong, but I'm guessing the example.com is the OP's munging. Yes, that's correct. My apologies. Best, Alex
Re: Elusive spam
Hi, Are we to make guesses on what else might be munged? Is just example.com munged or the 172.0.0.1 also munged? Just the domain was munged. Thanks for the info. I should have been able to figure that out. Thanks, Alex
Re: Elusive spam
Hi, it hits spamhaus and spamcop, what more do you want? meta haus_cop (spamhaus && spamcop) score haus_cop 5 X-Spam-Status: No, hits=4.8 tagged_above=-300.0 required=5.0 use_bayes=1 tests=BAYES_50, DATE_IN_PAST_03_06, RCVD_IN_BL_SPAMCOP_NET, RCVD_IN_SORBS_WEB, RCVD_IN_XBL, RELAYCOUNTRY_US, URI_HEX 50_scores.cf:score RCVD_IN_BL_SPAMCOP_NET 0 2.188 0 1.960 # n=0 n=2 50_scores.cf:score RCVD_IN_XBL 0 2.896 0 3.033 # n=0 n=2 70_relay_country.cf:score RELAYCOUNTRY_US 0.1 50_scores.cf:score RCVD_IN_SORBS_WEB 0 1.117 0 0.619 # n=0 n=2 50_scores.cf:score BAYES_50 0 0 0.001 0.001 50_scores.cf:score URI_HEX 1.777 1.316 1.395 0.368 50_scores.cf:score DATE_IN_PAST_03_06 2.299 1.394 1.306 0.044 Something doesn't seem right. Am I adding them wrong? It sure seems to equal more than 5.0. Is it possible the rules are being scored differently in another location? The meta rule is a good one. I'll create that now. Thanks, Alex
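When a rule has four scores, SpamAssassin picks one by score set: the first with Bayes and network tests both off, the second with network tests only, the third with Bayes only, and the fourth with both on. With use_bayes=1 and network rules such as RCVD_IN_XBL firing, the fourth column applies, and a single score applies to every set. This sketch sums grep output like the lines above for a chosen set; `sum_scores` is a hypothetical helper, and it deliberately ignores later redefinitions (local.cf, user_prefs), which is exactly what a mismatch against the header total points to.

```sh
# sum_scores SET: read "score RULE s0 [s1 s2 s3]" lines (optionally prefixed
# with "file.cf:" as grep prints them) on stdin; print the total for SET (0-3).
sum_scores() {
  awk -v set="$1" '
    {
      sub(/^[^ \t]*:/, "")                  # drop a leading "file.cf:" prefix
      if ($1 != "score") next
      if (NF >= 6 && $4 ~ /^-?[0-9.]+$/ && $6 ~ /^-?[0-9.]+$/)
        total += $(3 + set)                 # four scores: pick the set column
      else
        total += $3                         # one score: same for every set
    }
    END { printf "%.3f\n", total }'
}
```

Fed the seven `score` lines quoted above with `sum_scores 3`, it prints 6.125, well above the hits=4.8 in the header, which suggests at least one of these rules is rescored somewhere else.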
Re: Elusive spam
Hi, 50_scores.cf:score RCVD_IN_BL_SPAMCOP_NET 0 2.188 0 1.960 # n=0 n=2 50_scores.cf:score RCVD_IN_XBL 0 2.896 0 3.033 # n=0 n=2 70_relay_country.cf:score RELAYCOUNTRY_US 0.1 50_scores.cf:score RCVD_IN_SORBS_WEB 0 1.117 0 0.619 # n=0 n=2 50_scores.cf:score BAYES_50 0 0 0.001 0.001 50_scores.cf:score URI_HEX 1.777 1.316 1.395 0.368 50_scores.cf:score DATE_IN_PAST_03_06 2.299 1.394 1.306 0.044 Something doesn't seem right. Am I adding them wrong? It sure seems to equal more than 5.0. Is it possible the rules are being scored differently in another location? It does look like the XBL scores may have been modified in another config file by a previous admin, ugh. Thanks, now I know. Thanks, Alex
Post trips pastebin spam filter
Hi, I have another spam message that is very elusive, and thought someone might be able to take a look. I tried to post it to pastebin, and its spam filter apparently catches it, and prevents me from posting. It's definitely in the header. Is there something else I can do to post it, or does someone know how their spam filter works? I tried even obfuscating the spam URLs, but it still catches it. The spam has BAYES_99, and is also DKIM signed and verified, and passes SPF, and despite having Congratulations!, Wal-Mart and several URLs in the body, it's not caught. Thanks, Alex
Scores, razor, and other questions
Hi, After another day of hacking, I have a handful of general questions that I hoped you could help me to answer. - How can I find the score of a particular rule, without having to use grep? I'm concerned that I might find it at some score, only for it to be redefined somewhere else that I didn't catch. Something I can do from the command-line? - How do I find out what servers razor is using? What is the current license now that it's hosted on sf, or are the query servers not also running there? It doesn't list any restrictions on the web site. - The large majority of the spam that I receive these days is a result of a URL not being listed in one of the SBLs. I'm using SURBL, URIBL, and spamcop. For example, I caught guadelumbouis.com several hours ago, and it's still not listed in any of the SBLs. Am I doing something wrong or am I missing an SBL? Has anyone else's spam with URLs increased a lot lately? Thanks, Alex
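If your version has no direct command for this, listing every definition in load order at least makes overrides visible, since the last `score` line read is the one that takes effect. This sketch assumes you pass the directories in the order SpamAssassin reads them (default rules or the sa-update channel directory first, then the site config directory); it does not recurse, follow `include` files, or evaluate `ifplugin` blocks. `show_score` is a hypothetical name.

```sh
# show_score RULE DIR...: print every "score RULE ..." line found in the *.cf
# files of each DIR, in the order given; the last line printed is the one
# that would take effect (ignoring include/ifplugin handling).
show_score() {
  rule=$1; shift
  for dir in "$@"; do
    for f in "$dir"/*.cf; do
      [ -f "$f" ] || continue               # skip an unmatched glob
      awk -v rule="$rule" -v file="$f" \
        '$1 == "score" && $2 == rule { print file ": " $0 }' "$f"
    done
  done
}
```

For example, `show_score RCVD_IN_XBL /var/lib/spamassassin/3.002005/updates_spamassassin_org /etc/mail/spamassassin`; adjust the paths to wherever your rules actually live.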
RelayCountry Config
Hi, I'm trying to configure RelayCountry. I have it installed, and SA recognizes it: # spamassassin --lint -D 2>&1 | grep -i country [4278] dbg: diag: module installed: IP::Country::Fast, version 604.001 [4278] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry from @INC [4278] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9648) implements 'extract_metadata', priority 0 [4278] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9648) implements 'parsed_metadata', priority 0 I've loaded the plugin, and added add_header according to the wiki page: add_header all Relay-Country _RELAYCOUNTRY_ loadplugin Mail::SpamAssassin::Plugin::RelayCountry I can create rules for each country I'd like to identify, and that successfully adds it to the header: header RELAYCOUNTRY_RU X-Relay-Countries =~ /RU/ describe RELAYCOUNTRY_RU Relayed through Russian Federation score RELAYCOUNTRY_RU 2.0 I was hoping to also have the X-Spam-Countries header added, but that doesn't seem to work. I'm using v3.2.5, so it has the RelayCountries.pm patch to add that support. What am I missing? Somewhat of a basic question, but once I do manage to get that header working, I know I can parse that and make decisions based on it. Are there any pre-written perl routines or utilities that can make that information useful? Also, I believe I read it adds bayes metadata to the email. Is that just through the additional headers or is it supposed to add something else? Thanks, Alex
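Once the header is present, pulling the country codes out of saved mail is simple enough to do in shell. A sketch; `relay_countries` is hypothetical, the default header name assumes `add_header all Relay-Country _RELAYCOUNTRY_` (which SpamAssassin writes with its X-Spam- prefix, as X-Spam-Relay-Country), and a header value folded onto a second line is not handled.

```sh
# relay_countries FILE [HEADER]: print the distinct country codes listed in the
# relay-country header of FILE (default header: X-Spam-Relay-Country).
relay_countries() {
  awk -v h="${2:-X-Spam-Relay-Country}" '
    /^$/ { exit }                               # look at headers only
    index($0, h ":") == 1 {
      sub(/^[^:]*:/, "")                        # strip the header name
      for (i = 1; i <= NF; i++) print $i        # one code per line
    }' "$1" | LC_ALL=C sort -u
}
```

For example, `relay_countries msg | grep -qx RU && echo "relayed via RU"` gives a yes/no test that a procmail recipe or cron report could use.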
Re: RelayCountry Config
Hi,

> I don't know if it makes a difference, but I call it Relay-Countries to match the name of the pseudo-header used in the tests:
>
> add_header all Relay-Countries _RELAYCOUNTRY_

It doesn't appear to make a difference, so I must be doing something else wrong. Running spamassassin --lint -D 2>&1 | less shows the X-Relay-Countries header, but it's empty:

# spamassassin --lint -D 2>&1 | egrep -i 'relay|country|countries'
[23760] dbg: diag: module installed: IP::Country::Fast, version 604.001
[23760] dbg: config: read file /etc/mail/spamassassin/70_relay_country.cf
[23760] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry from @INC
[23760] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayEval from @INC
[23760] dbg: Botnet: adding (\b|\d)relay(\b|\d) to botnet_serverwords
[23760] dbg: Botnet: adding (\b|\d)relay(\b|\d) to botnet_serverwords
[23760] dbg: metadata: X-Spam-Relays-Trusted:
[23760] dbg: metadata: X-Spam-Relays-Untrusted:
[23760] dbg: metadata: X-Spam-Relays-Internal:
[23760] dbg: metadata: X-Spam-Relays-External:
[23760] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9698) implements 'extract_metadata', priority 0
[23760] dbg: metadata: X-Relay-Countries:
[23760] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9698) implements 'parsed_metadata', priority 0
[23760] dbg: rules: ran eval rule NO_RELAYS == got hit (1)
[23760] dbg: Botnet: no trusted relays
[23760] dbg: check: tests=MISSING_DATE,MISSING_HEADERS,MISSING_SUBJECT,NO_RECEIVED,NO_RELAYS,RELAYCOUNTRY_LOW

I've added your rules to 70_relay_country.cf, and they show up in tests=, but the header isn't added. I put the add_header line in init.pre, above the loadplugin line. When that didn't work, I also added it to local.cf. I've also checked mail that was actually tagged by these rules, not just output from a -D run, and the header isn't there either.

Thanks again,
Alex
Anti-Phishing and Spear-Phishing Version 2
Hi,

Has anyone tried the phishing rules generated by Julian Field and developed by Google? They look really neat:

http://www.jules.fm/Logbook/files/anti-phishing-v2.html

It's basically a list of 3.5k email addresses found in email thought to be spam. It appears to have been developed by Google, so is it safe to use?

Thanks,
Alex
Re: RelayCountry Config
Hi,

> [23760] dbg: metadata: X-Relay-Countries:
>
> The --lint test is *NOT* valid for this. --lint is *ONLY* to verify your config files are parseable.

Yes, thanks. I should have known that, and I think I did. As I mentioned in my previous post, I also tried it with a real message and looked at a number of messages already in quarantine, with the same result.

I found this message on Nabble describing the same problem back in '08, with no resolution:

http://www.nabble.com/Question-about-RelayCountry-td18309349.html#a18339974

I even downgraded to the IP::Country::Fast release from January 2009, and it made no difference. Could this be a problem with one of the modules, or is it most likely a configuration issue?

What I don't understand is that SA knows which country the message was relayed through, because the country rules appear in the tests= section:

X-Spam-Status: Yes, hits=21.8 tag1=-300.0 tag2=4.9 kill=4.9 use_bayes=1 tests=BAYES_50, BODY_ENHANCEMENT, BOTNET, FH_HELO_EQ_D_D_D_D, RDNS_NONE, RELAYCOUNTRY_UK, SARE_ADULT2, SARE_RECV_IP_FROMIP3, URIBL_AB_SURBL, URIBL_BLACK, [...]

Out of curiosity, why doesn't it list each rule on its own line with its description, instead of all together?

Thanks,
Alex
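The X-Spam-Status header shown above carries the whole tests= list on one line. To get a one-per-line view when reading quarantined mail, a small script can split that list itself. This is a Python sketch under two assumptions: the comma-separated tests= format shown above, and upper-case rule names. The descriptions dictionary is a hypothetical stand-in for the describe lines in your .cf files.

```python
import re

# Assumes rule names are upper-case letters, digits and underscores.
RULE_NAME = re.compile(r"^[A-Z0-9_]+$")


def list_tests(status_header):
    """Return the rule names from the tests= field of an X-Spam-Status value."""
    _, sep, rest = status_header.partition("tests=")
    if not sep:
        return []
    names = []
    for token in re.split(r"[\s,\[\]]+", rest):
        name = token.split("=", 1)[0]  # drop a trailing "=score" if present
        if RULE_NAME.match(name):
            names.append(name)
    return names


def format_tests(status_header, descriptions):
    """One rule per line, followed by its description when one is known."""
    return "\n".join(
        f"{name:<24} {descriptions.get(name, '')}".rstrip()
        for name in list_tests(status_header)
    )
```

Lower-case fields that may follow the list (such as autolearn=...) are filtered out by the upper-case check.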
Re: RelayCountry Config
Hi,

> This is also why the plugin works and you do get the per-country rule hits, but don't get the SA Relay-Countries header.

Yes, you are correct. Thanks for the lead and the explanation. Here's a thread that talks about how to add the header for amavisd:

http://www.mail-archive.com/amavis-u...@lists.sourceforge.net/msg12416.html

I'm not sure it's really necessary after all, though, because the rules work without it, and the header still doesn't appear in quarantined mail.

> char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

How did you get line noise from your modem to look so much like Perl code? :-)

Thanks,
Alex
Re: RelayCountry Config
Hi,

> I find ordinary header and meta rules are all I need: http://pastebin.com/f5e5232d1

Among those rules you have:

meta RELAYCOUNTRY_MED !RELAYCOUNTRY_HIGH && ( __RELAYCOUNTRY_AF || __RELAYCOUNTRY_AS || __RELAYCOUNTRY_EU_S || __RELAYCOUNTRY_OC_S || __RELAYCOUNTRY_AM_S )

It's probably hard to read, but doesn't this exclude the US? __RELAYCOUNTRY_AM_S covers all the Americas except the US and CA. If I understand correctly, this says NOT RELAYCOUNTRY_HIGH and all countries except US and CA, which means that RELAYCOUNTRY_MED would trigger on all US and CA relays.

Thanks,
Alex
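One way to check a reading of a meta rule is to evaluate the expression mechanically for a few relay paths. This Python sketch assumes that each __RELAYCOUNTRY_* subrule fires when any relay in the path falls in its region. The pastebin definitions are not reproduced here, so the region sets are small hypothetical placeholders. Only AM_S follows the description above (the Americas minus the US and CA). RELAYCOUNTRY_HIGH is passed in as a plain boolean.

```python
# Hypothetical placeholder regions; the real __RELAYCOUNTRY_* definitions are in the
# pastebin file and are not reproduced here.
SUBRULES = {
    "__RELAYCOUNTRY_AF": {"NG", "GH"},
    "__RELAYCOUNTRY_AS": {"CN", "IN"},
    "__RELAYCOUNTRY_EU_S": {"RO", "UA"},
    "__RELAYCOUNTRY_OC_S": {"FJ"},
    "__RELAYCOUNTRY_AM_S": {"BR", "MX", "AR"},  # the Americas minus US and CA
}


def subrule_hits(relay_countries):
    """Assumed semantics: a subrule fires if any relay in the path is in its region set."""
    path = set(relay_countries)
    return {name: bool(region & path) for name, region in SUBRULES.items()}


def relaycountry_med(relay_countries, high_hit):
    """Evaluate: !RELAYCOUNTRY_HIGH && (AF || AS || EU_S || OC_S || AM_S)."""
    return (not high_hit) and any(subrule_hits(relay_countries).values())
```

With these placeholder sets, a path made up only of US or CA relays fires none of the subrules, so the meta evaluates to false for it. Confirming this for the real rules needs the actual subrule definitions.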
Upgrading bayes DB
Hi,

I'm still working on my Bayes training project, but I'm also trying to upgrade the Bayes DB, because I upgraded Perl and all the associated modules. I started with this output from sa-learn --dump magic:

0.000 0 3 0 non-token data: bayes db version
0.000 0 1786 0 non-token data: nspam
0.000 0 3698 0 non-token data: nham
0.000 0 198349 0 non-token data: ntokens
0.000 0 929232460 0 non-token data: oldest atime
0.000 0 1249369370 0 non-token data: newest atime
0.000 0 1249369387 0 non-token data: last journal sync atime
0.000 0 1249342872 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

After the upgrade (sa-learn --sync -D), nspam, nham, and ntokens were all zeroed. How could this happen? What could I have done wrong? This is after the upgrade:

0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 1249438016 0 non-token data: oldest atime
0.000 0 1249438016 0 non-token data: newest atime
0.000 0 1249438016 0 non-token data: last journal sync atime
0.000 0 1249438016 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

The output seemed to indicate that it was upgrading from db version 0 to db version 2, then to db version 3, although the first sa-learn output shows that it was already at version 3.

Thanks,
Alex
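Saving the magic dump before and after an upgrade step makes a reset like this easy to spot. This Python sketch parses the layout shown above: four numeric columns with the value in the third, followed by "non-token data: <label>". It then reports which counters changed. parse_magic and count_changes are hypothetical helpers.

```python
def parse_magic(dump_text):
    """Map each 'non-token data' label from `sa-learn --dump magic` to its value.

    Assumes the layout shown above: four numeric columns with the value in the
    third, then 'non-token data: <label>'.
    """
    values = {}
    for line in dump_text.splitlines():
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith("non-token data:"):
            label = fields[4][len("non-token data:"):].strip()
            values[label] = int(fields[2])
    return values


def count_changes(before, after, keys=("nspam", "nham", "ntokens")):
    """Return {key: (before, after)} for the counters that differ."""
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}
```

Because split() ignores repeated whitespace, the padded columns that sa-learn prints parse the same way as the compact lines above.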
Bayes training
Hi,

We have accumulated quite a large list of whitelisted users, mostly because their mail was previously tagged incorrectly. I've extracted a copy of all whitelisted mail into a separate mbox. There is certainly some spam in there as well, but assuming I only learn the ham, would it make sense to train Bayes using the emails from this folder?

It's all business-related, but I'm concerned that it may contain the very things that caused it to be tagged, like excessive HTML or being sent from a host with no reverse DNS. Those are all the reasons it ended up whitelisted in the first place.

Looking at the logs from before the addresses were added to the whitelist, I see quite a few that hit BAYES_99, probably because they resemble mailing lists such as those from networkworld. IOW, I wouldn't want to whitelist an email from networkworld.com, but one of the company's partners could send the company an email that has many of those characteristics.

Someone may also send them a one-line email with a small GIF attached, such as the corporate logo in their signature. This would be a valid email, but it also closely resembles a typical spam.

The goal is to train Bayes to better recognize corporate email, and hopefully to cut down on the number of whitelisted senders that must be added in the future (that is, corporate email that gets tagged and then has to be whitelisted).

Ideas greatly appreciated.

Thanks,
Alex
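If the whitelisted mbox is split before training, only the messages judged to be ham need to reach sa-learn. This minimal sketch uses Python's mailbox module. It assumes the ham/spam decision is supplied as a predicate, for example one built from a hand-reviewed list. write_ham_mbox and the predicate are hypothetical.

```python
import mailbox


def write_ham_mbox(src_path, dest_path, is_ham):
    """Copy the messages for which is_ham(msg) is true into a new mbox; return the count."""
    src = mailbox.mbox(src_path, create=False)
    dest = mailbox.mbox(dest_path)
    copied = 0
    try:
        for msg in src:
            if is_ham(msg):
                dest.add(msg)
                copied += 1
        dest.flush()
    finally:
        dest.close()
        src.close()
    return copied
```

The resulting file could then be fed to sa-learn --ham --mbox. The spam left behind can be reviewed separately rather than learned blindly.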
Upgrading perl modules for SA
Hi,

I recently upgraded Perl from 5.6.0 to perl-5.10.0, along with all the modules necessary for SA 3.2.5 and amavisd-new (still an old version). I'm now having a problem that I really don't understand:

Jul 30 14:24:30 bigship amavis[1757]: (01757-175) TROUBLE in check_mail: decoding2-get-file-types FAILED: 'file' utility (/usr/bin/file) failed, status=1 (256 ) at /usr/sbin/amavisd line 4019.
Jul 30 14:24:30 bigship amavis[1757]: (01757-175) PRESERVING EVIDENCE in /var/amavis/amavis-20090730T142430-01757

The amavisd children run as a regular user. When I su to that user and run /usr/bin/file on the files listed above, it correctly reports the type of each file.

The lines in amavisd around line 4019 are:

  $file ne '' or die "Unix utility file(1) not available, but is needed";
  for my $part (@$partslist) {
    my($filename) = "$tempdir/parts/$part";
    my($filetype) = '';
    my($proc_fh) = run_command(undef, undef, $file, $filename);
    while( defined($_ = $proc_fh->getline) ) { $filetype .= $_ }
    my($err); $proc_fh->close or $err=$!;
    my($ret) = retcode($?);
    $ret==0 or die "'file' utility ($file) failed, status=$ret ($? $err)";  # <= line 4019
    chomp($filetype);
    my($taint) = substr($filetype,0,0);
    # remove file name
    $filetype = $1.$taint if $filetype=~/^.+?:[\t ](.*)$(?!\n)/s;
    section_time('get-file-type');
    local($_) = $filetype;
    my($ty);
    # try to classify some common types and give them short type name
    # _last_ match wins!

Running spamassassin --lint returns no errors or warnings. Amavis complains that I'm missing a few modules, like SPF, DKIM, and IO::Socket::SSL. I don't think they're related, and I believe they weren't installed before either, when everything was working fine.

Thanks,
Alex
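One way to narrow this down is to reproduce amavisd's check outside amavisd. Run file(1) on a saved part as the amavisd user, and look at the exit status rather than just the printed type. The excerpt above dies on any non-zero status even when the output looks right. For reference, the 256 in the log is Perl's raw $? value, which is exit status 1 shifted left by 8. This is a Python sketch (3.7+). file_type is a hypothetical helper, and its regex mirrors the one in the excerpt that strips the leading file name.

```python
import re
import subprocess


def file_type(path, file_cmd=("/usr/bin/file",)):
    """Run file(1) on `path` and fail on any non-zero exit status, as the amavisd excerpt does."""
    proc = subprocess.run([*file_cmd, path], capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"'file' utility failed, status={proc.returncode}: {proc.stderr.strip()}"
        )
    output = proc.stdout.strip()
    # Same idea as the amavisd regex: drop the leading "filename: " prefix.
    match = re.match(r"^.+?:[\t ](.*)$", output, re.S)
    return match.group(1) if match else output
```

The equivalent check from the shell is to run /usr/bin/file on the part and then echo $?. A correct-looking type with a status of 1 would match the log above.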
Re: Upgrading perl modules for SA
Hi,

> check_mail: decoding2-get-file-types FAILED: 'file' utility (/usr/bin/file) failed, status=1 (256 ) at /usr/sbin/amavisd line
>
> How's this a SA question?

Yes, my apologies. I don't know enough about amavis yet, and I thought the problem might be related to all the modules I upgraded rather than to amavis itself. I've since reverted my changes back to perl-5.6.0, and I'm going to subscribe to that list too.

I also upgraded Berkeley DB to db4 and left db3, db2, and db1 on the system. However, now I'm having a problem with Bayes:

[10496] dbg: bayes: tie-ing to DB file R/O /home/sscan/.spamassassin/bayes_toks
[10496] dbg: bayes: tie-ing to DB file R/O /home/sscan/.spamassassin/bayes_seen
[10496] dbg: bayes: found bayes db version 0
[10496] warn: bayes: bayes db version 0 is not able to be used, aborting! at /usr/lib/perl5/site_perl/5.6.0/Mail/SpamAssassin/BayesStore/DBM.pm line 196.

I guess I don't understand the logic. The following is around line 196. It appears to say that if $self->_check_db_version() doesn't return zero, then fail, but we know the version is zero from the output above:

  $self->{db_version} = ($self->get_storage_variables())[6];
  dbg("bayes: found bayes db version ".$self->{db_version});

  # If the DB version is one we don't understand, abort!
  if ($self->_check_db_version() != 0) {
    warn("bayes: bayes db version ".$self->{db_version}." is not able to be used, aborting!");
    $self->untie_db();
    return 0;
  }

Thanks,
Alex
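The != 0 test makes more sense if _check_db_version() returns a comparison result rather than the version number itself. This is an assumption, not something the excerpt shows: that the function works like Perl's <=> between the version found on disk and the version the code expects. Under that reading, 0 means "same version", and a found version of 0 compares as -1 (older), so the store refuses it. This is a Python sketch of that reading. EXPECTED_DB_VERSION = 3 is taken from the bayes db version 3 in the earlier sa-learn --dump magic output and is itself an assumption.

```python
# Assumption: the version this BayesStore expects (3, as in the earlier --dump magic output).
EXPECTED_DB_VERSION = 3


def check_db_version(found, expected=EXPECTED_DB_VERSION):
    """Mimic Perl's `found <=> expected`: -1 if older, 0 if equal, 1 if newer."""
    return (found > expected) - (found < expected)


def db_usable(found):
    """Non-zero means 'not the expected version', so the store refuses to use it."""
    return check_db_version(found) == 0
```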
Re: Low Scoring Lotto Spam
Hi,

> * 3.0 RCVD_IN_UCEPROTECT2 RBL: Received via a relay in
> *      dnsbl-2.uceprotect.net
> *      [81.202.69.68 listed in dnsbl-2.uceprotect.net]
> * 2.0 RCVD_IN_UCEPROTECT3 RBL: Received via a relay in
> *      dnsbl-3.uceprotect.net
> *      [81.202.69.68 listed in dnsbl-3.uceprotect.net]

How successful have you been with the UCEPROTECT lists? It seems like a nice project; how come more people aren't using it? IOW, of the four or five people who posted their output for this lotto spam, you seemed to be the only one using it. Why is there such a disparity in the rules people use?

Thanks,
Alex
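Anyone who wants to try the UCEPROTECT lists by hand before adding them can query them directly. A DNSBL lookup for an IPv4 address reverses the four octets and appends the list's zone, and any answer means the address is listed. This sketch uses only the Python standard library. The function names are hypothetical. is_listed treats every lookup failure as "not listed", which also hides timeouts.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Any A record means listed; any lookup failure is treated as not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True
```

For the relay quoted above, the query name for the level-2 list is 68.69.202.81.dnsbl-2.uceprotect.net.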
Re: whitelist_from questions
Hi,

I'm looking at an email that appears to be from one of the users on the whitelist, but was actually from:

From probesqt...@segunitb1.freeserve.co.uk Mon Jul 27 19:49:19 2009

Why can't SA compare the From: header with the actual (envelope) sender? Is this because of virtual domains and/or users?

Thanks,
Alex
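The mbox "From " line above carries the envelope sender, which can differ from the From: header address. Comparing the two takes only a few lines. However, legitimate mail often differs between them too: mailing lists and bounce-handling services use their own envelope senders. A mismatch alone therefore cannot mark mail as forged. This is a Python sketch. The function names are hypothetical, and the example.com/example.org addresses in the test stand in for real ones.

```python
from email import message_from_string
from email.utils import parseaddr


def envelope_sender(mbox_from_line):
    """Return the address from an mbox separator line: 'From <addr> <date>'."""
    parts = mbox_from_line.split()
    return parts[1] if len(parts) > 1 and parts[0] == "From" else ""


def domain_of(address):
    """Lower-cased part after the last '@'."""
    return address.rpartition("@")[2].lower()


def sender_domains_differ(mbox_from_line, raw_message):
    """True when the envelope sender's domain differs from the From: header's domain."""
    header_addr = parseaddr(message_from_string(raw_message).get("From", ""))[1]
    return domain_of(header_addr) != domain_of(envelope_sender(mbox_from_line))
```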
Re: Lotto/Money email address spam
Hi,

> Please don't paste examples to this list. Please post them to pastebin (or a similar service) and then include the link.

Yes, understood. FWIW, I know enough not to post an entire message with headers to the list; I'm sure half the time it would be filtered anyway. This time it was just a snippet, but in the future I'll post even those online, too.

Thanks,
Alex
Re: Lotto/Money email address spam
Hi,

> sa-update lint checks the rules in a sandbox, and does not update the local channel, if there are any issues. Moreover, do NOT copy these updates to your site config dir -- but keep it in the update dir where sa-update puts them [1]. SA knows how to use them instead of the install-time default conf.

Okay, great. That is what I have now done. I actually have multiple mail servers, none of which has direct access to the Internet other than inbound SMTP. So I run sa-update on another box, which creates a tarball that is then scp'd to the mail servers and extracted. For me, this now means the sa-update channels are in /var/lib/spamassassin/3.0005/ and my local site config is in /etc/mail/spamassassin, where local.cf and init.pre reside.

I also spent much of the day reading docs. I've worked with Linux for many years, and have been involved with SA, just not to the level that I'm involved now.

> It's a rather bizarre picture I'm sensing here. From your recent posts I understand you are running a mail server for a large organization. Yet there is this cannonade with rather basic questions...

guenther, I knew you were a smart guy :-) Yes, there is a bigger picture; hopefully I get some cred for trying to tackle this on my own (with the help of others more experienced).

Anyway, I'm trying to use sa-update to install the SOUGHT rules, and linting them shows this:

[17021] warn: config: invalid regexp for rule __SEEK_AY2NNY: /This place is so exclusive, how did you get an invite\x{e2}\x{80}\x{a6} /: /This place is so exclusive, how did you get an invite\x{e2}\x{80}\x{a6} /: Can't use \x{} without 'use utf8' declaration

I'm using perl-5.6.0; is that the cause?

Thanks again,
Alex
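The three escapes in the failing SOUGHT rule, \x{e2}\x{80}\x{a6}, are the UTF-8 bytes of a single character: U+2026, HORIZONTAL ELLIPSIS ("…"). In other words, the rule is trying to match a literal ellipsis as it appears in UTF-8 encoded mail. One quick way to confirm what such an escape run spells is to decode it. The helper name below is hypothetical. The sketch only explains the pattern and says nothing about which Perl versions accept it.

```python
import re


def decode_perl_hex_escapes(fragment):
    """Collect the \\x{hh} byte escapes in `fragment` and decode them as UTF-8."""
    hex_bytes = re.findall(r"\\x\{([0-9a-fA-F]{1,2})\}", fragment)
    return bytes(int(h, 16) for h in hex_bytes).decode("utf-8")
```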
Re: whitelist_from questions
Hi,

> Firstly, before you convert all these to whitelist_from_rcvd, perhaps you ought to ask yourself whether you really need 1000 entries on your whitelist.

I'm surprised you were the first to make that very comment, so thanks.

> Does mail from these addresses actually get miscategorised as spam, or would SA get it right without the whitelist?

Mail was being tagged as spam, and the organization became concerned that more of it would be tagged. So any time there was a high-profile external business contact whose mail they couldn't risk being tagged, they had that contact added to the whitelist. The list used to be much larger, until we spent months going through it with them to prune it. I don't doubt that SA would do the right thing if we removed a substantial number of entries, but there doesn't seem to be any scientific way to decide which ones to remove.

> Secondly, don't forget about whitelist_from_spf. If a domain has an SPF record, this is a better solution than whitelist_from_rcvd as it avoids the need for *you* to work out which are the outgoing servers.

Is there a way to script that for the 1000 or so entries, to see which domains have SPF records?

> Lastly, if you do use whitelist_from_rcvd, remember that there may be multiple outgoing servers for a given domain, and worse they may change over time.

Yeah, I thought of that too, so it doesn't sound like that's going to work well here.

Thanks,
Alex
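Here is a sketch of such a script in Python. It collects the domains from the whitelist_from lines, looks up each domain's TXT records, and reports which ones publish a record starting with v=spf1. It assumes dig is installed, and it skips wildcard domains. The function names are hypothetical. A domain with an SPF record is only a candidate for whitelist_from_spf; you still need to check whether its record covers all of its outgoing servers.

```python
import re
import subprocess


def whitelist_domains(cf_text):
    """Domains named in whitelist_from lines; wildcard domains are skipped."""
    domains = set()
    for line in cf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields or fields[0] != "whitelist_from":
            continue
        for addr in fields[1:]:
            domain = addr.rpartition("@")[2].lower()
            if "@" in addr and "*" not in domain and "?" not in domain:
                domains.add(domain)
    return sorted(domains)


def parse_dig_txt(output):
    """dig +short prints each TXT record as one or more quoted strings; join them."""
    return ["".join(re.findall(r'"((?:[^"\\]|\\.)*)"', line)) for line in output.splitlines()]


def dig_txt(domain):
    """Look up TXT records with dig (assumed to be installed)."""
    result = subprocess.run(["dig", "+short", "TXT", domain], capture_output=True, text=True)
    return parse_dig_txt(result.stdout)


def has_spf(domain, lookup=dig_txt):
    """True if any TXT record for the domain is an SPF record."""
    return any(re.match(r"v=spf1(\s|$)", record, re.I) for record in lookup(domain))
```

Running has_spf over whitelist_domains(...) for the whitelist file gives the shortlist to review by hand.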
Eliminating unnecessary rules
Hi,

I have created a routine where I can enter a string into a text file, and it gets converted into a set of rules that form a .cf file. They are all of the form LOCAL_RULE_N, where N is a random 6-digit number, and each adds two points when it fires. There are now about 3800 of these rules, dating back about a year.

I've learned a lot over the past year, and I now think some of these patterns may be catching valid mail. I'd like to figure out how best to prune at least the ones that no longer fire, or that fire but don't make the difference between ham and spam. IOW, the message would be spam regardless of whether the rule fired.

What is the best way to do this? An awk script on mail.log over the past few weeks? How can I match so many rules in a script when each one ends in a random number?

I'm still surprised how many are hitting on things like "Acai Berry" or "PO Box 1845 | Ft. Worth | TX", for example.

Thanks for any ideas.
Alex
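Here is a sketch of the counting step in Python rather than awk. It assumes each log line of interest contains the rule names and a hits=<score> field, as in the X-Spam-Status style, so the regex may need adjusting for the actual mail.log format. A single pattern, LOCAL_RULE_\d{6}, covers all 3800 rules whatever their random suffix. A hit counts as decisive when the message reached the threshold only because of that rule's 2 points, considering one rule at a time. threshold=5.0 is an assumed default, and the function names are hypothetical.

```python
import re
from collections import Counter

# One pattern covers every generated rule, whatever its random 6-digit suffix.
LOCAL_RULE = re.compile(r"\bLOCAL_RULE_\d{6}\b")
# Assumption: each log line carries the total score as "hits=<number>".
HITS = re.compile(r"\bhits=(-?\d+(?:\.\d+)?)")


def defined_rules(cf_text):
    """All LOCAL_RULE_N names that appear in the generated .cf file."""
    return set(LOCAL_RULE.findall(cf_text))


def rule_stats(log_lines, threshold=5.0, rule_score=2.0):
    """Count hits per rule, and hits where that rule alone lifted the score past threshold."""
    hits, decisive = Counter(), Counter()
    for line in log_lines:
        rules = set(LOCAL_RULE.findall(line))
        if not rules:
            continue
        match = HITS.search(line)
        total = float(match.group(1)) if match else None
        for rule in rules:
            hits[rule] += 1
            if total is not None and total >= threshold > total - rule_score:
                decisive[rule] += 1
    return hits, decisive


def prune_candidates(defined, hits, decisive):
    """Rules that never fired, and rules that fired but never decided the verdict."""
    never = sorted(defined - set(hits))
    never_decisive = sorted(r for r in hits if r in defined and decisive[r] == 0)
    return never, never_decisive
```

A rule that fires but is never decisive can still be worth a manual look before removal, since a low-scoring ham hit is exactly the false-positive risk described above.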
Re: Spam troubleshooting
How effective are razor/pyzor and SPF/DKIM?

very effective, razor/pyzor altogether with DCC. SPF also helps much, although it should be implemented at SMTP level and refuse all messages that cause (hard) fail.

While DKIM is currently in SA, the only place it applies is whitelisting, since its rules have scores of +/-0.001. Different scores were mentioned here, but they haven't been incorporated into the SA scores yet.

I've always been a bit hesitant to use any of those.

Why?

Because how often do spammers have DNS entries with valid SPF or DKIM information? How often do spammers use compromised hosts with valid SPF or DKIM information? Will they help with emails that only contain a random URL and a line or two of text, like:

ma...@myhost.com: Get your Nursing Degree here http://spamsite.com/

Or would that be DCC? These types of emails often get through, apparently before the URL is listed in SpamCop, SURBL, or URIBL_BLACK.

Where is the best place to start to implement Razor and/or Pyzor in SA 3.2 on Linux with Postfix?

Thanks,
Alex
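For URL-only spam, the relevant checks are the URI blocklists. SURBL-style lists are queried with the domain from the link followed by the list's zone. This simplified Python sketch builds those queries from a message body. It assumes the multi.surbl.org zone and a naive "last two labels" rule for the registered domain, which is wrong for suffixes like co.uk; a public suffix list would be needed in practice. The names are hypothetical.

```python
import re

# Host part of http/https links; deliberately simple.
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.I)


def uri_hosts(body):
    """Unique, lower-cased hosts from the http(s) links in a message body, in order."""
    hosts = []
    for host in URL_RE.findall(body):
        host = host.lower().rstrip(".")
        if host not in hosts:
            hosts.append(host)
    return hosts


def registered_domain(host):
    """Naive: keep the last two labels (wrong for suffixes such as co.uk)."""
    return ".".join(host.split(".")[-2:])


def surbl_query_name(host, zone="multi.surbl.org"):
    """Name to look up; an answer means the domain is listed in that zone."""
    return registered_domain(host) + "." + zone
```

This only shows what gets looked up. A brand-new domain will still pass until the list operators add it, which matches the delay described above.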