Re: Uppercase E-mail in Latin America
On 10/6/2009 2:33 AM, Warren Togami wrote: Please excuse me, I used faulty logic. I wasn't asking you anything further. I meant that I asked this friend for more details, and it seems that non-technical users are the most likely type of people to type legitimate mail in all caps. Warren

so what score is being added to this uppercase stuff?

score UPPERCASE_50_75 0.001 0.490 0.001 0.001
score UPPERCASE_75_100 2.402 1.930 1.127 1.528

reminder: with the default SA scores, one rule alone won't tag something as spam. where's the problem? what's the worry?
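To make the arithmetic in the reply above concrete, here is a minimal Python sketch. It assumes the stock required_score of 5.0 and that the four numbers on each score line are the four score sets; the function name and the dictionary are made up for illustration and are not part of SpamAssassin.

```python
# Sketch only: is a single rule hit enough to tag a message as spam?
REQUIRED_SCORE = 5.0  # assumption: the stock required_score default

# Scores quoted in this thread, one value per score set.
SCORES = {
    "UPPERCASE_50_75": (0.001, 0.490, 0.001, 0.001),
    "UPPERCASE_75_100": (2.402, 1.930, 1.127, 1.528),
}


def tags_as_spam_alone(rule, required=REQUIRED_SCORE):
    """Return True if the rule's highest score in any set reaches the threshold."""
    return max(SCORES[rule]) >= required


for rule in SCORES:
    print(rule, max(SCORES[rule]), tags_as_spam_alone(rule))
```

Even the highest value quoted (2.402) is well short of the assumed threshold, so other rules would have to hit as well before an all-caps message gets tagged.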
Re: OT bad news
On Mon, 2009-10-05 at 15:05 -0700, Quanah Gibson-Mount wrote: --On Monday, October 05, 2009 11:50 PM +0200 mouss mo...@ml.netoyen.net wrote: Thomas Mullins a écrit : We have been running Spamassassin for maybe eight years now. But, my coworkers do not like OpenSource. So they have finally complained enough that my boss is going to replace our reliable FreeBSD/Spamassassin boxes. They are planning on purchasing something that runs ON Exchange. What a bummer. and the problem is? if they want exchange, give them exchange. don't fight (directly), watch instead. take pleasure of the situation, get fun as you can. I personally took fun all day long in windows-only (and believe it or not, in linux-only) environments. that said, you can still try to explain that exchange should not be exposed to the internet. you still need a relay (such as freebsd/postfix). And once exchange falls over, show them Zimbra. ;) Which uses postfix/SA/amavis, etc, and looks a lot like exchange... only better. ;) Isnt zimbra dead as yet ? Yahoo deliberately messed it I believe , and now look to dump it Anyway I think people run away from open source because it is unsupported. Management doesnt want to have any indispensable IT team , so that they can always recruit some cheap M$$ trained guy from the market to do a dirty job. There is also security in question. If something goes wrong with your linux/BSD box *you* will be blamed. If something goes wrong with m$ box (as usual) they would claim that that is how it is supposed to work :-). After all it is from the leading software makers. Never mind that the management also get sponsored International holidays for putting their entire budget in worthless stuff. --Quanah -- Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc Zimbra :: the leader in open source messaging and collaboration
Re: Problems with whitelist_from_rcvd
Ignore the text immediately after the "from", in this case SUB.MYDOMAIN.MAIL. That is _not_ rDNS data; it is whatever the client sent in its SMTP HELO, and can be _anything_. If you see the correct hostname there, it just means that computer is sending its correct hostname when it says HELO. To illustrate, I pulled this out of your message to the list; it is not edited in any way:

Received: from localhost (unknown [213.108.33.133]) by highlink.ru (Postfix) with ESMTP id 37F236A818D for users@spamassassin.apache.org; Mon, 5 Oct 2009 10:28:48 +0400 (MSD)

I'm pretty sure 213.108.33.133's rDNS does not say localhost. The (unknown [12.12.12.12]) is the DNS data about the client as your MTA sees it, and the fact that it says unknown means that for some reason it cannot perform rDNS on that IP address, or perhaps its rDNS is explicitly set to unknown. If rDNS was working you'd see something like:

Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by ga.impsec.org (8.13.7/8.13.7) with SMTP id n956Tp8L020518 for jhar...@impsec.org; Sun, 4 Oct 2009 23:29:55 -0700

Exactly how are you checking the rDNS of that IP address? Can you demonstrate? For example, here are rDNS lookups on the two IP addresses from my examples above:

jhar...@dendarii ~ $ host 213.108.33.133
133.33.108.213.in-addr.arpa domain name pointer 133.33.108.213.hl.ru.
jhar...@dendarii ~ $ host 140.211.11.3
3.11.211.140.in-addr.arpa domain name pointer hermes.apache.org.

I note that the first does have an rDNS, even though the Received: header from the MTA in the example above says unknown. Are you performing your rDNS tests on the MTA computer? It looks to me like the DNS setup on it is misconfigured somehow and it can't perform rDNS queries successfully.

What I do (all commands on the mail-server, where SA is installed):

# host SUB.MYDOMAIN.MAIL
SUB.MYDOMAIN.MAIL has address 12.12.12.12
# host 12.12.12.12
12.12.12.12.in-addr.arpa domain name pointer SUB.MYDOMAIN.MAIL.
host does not produce anything else but a single row -- Regards, Igor Bogomazov Игорь Богомазов Chief Technical Specialist HighLink Ltd. St-Petersburg, Russia 8(812)334-12-12 [ext. 220] 8(963)344-44-38 (Beeline) http://www.hl.ru
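The difference between the HELO string and the rDNS result in a Received: header, as discussed in this message, can be illustrated with a small Python sketch. It assumes the Postfix/sendmail-style layout "from HELO (RDNS [IP])" seen in the two headers quoted in this thread; other MTAs format the header differently, and the function name is hypothetical.

```python
import re

# Matches only the "from HELO (RDNS [IP])" prefix used in the quoted headers.
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?P<rdns>\S+)\s+\[(?P<ip>[0-9a-fA-F.:]+)\]\)"
)


def split_received(header):
    """Return the HELO name, the MTA's rDNS result and the client IP, or None."""
    match = RECEIVED_RE.search(header)
    if match is None:
        return None
    return match.groupdict()


example = (
    "Received: from localhost (unknown [213.108.33.133]) "
    "by highlink.ru (Postfix) with ESMTP id 37F236A818D"
)
print(split_received(example))
```

In the example header, the "helo" field is the client's own claim (localhost), while "rdns" is what the receiving MTA looked up (unknown, meaning the lookup failed there). The rDNS field is the one that matters for whitelist_from_rcvd.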
Re: Hostkarma White list Updated and Improved
Jon Trulson wrote: On Mon, 5 Oct 2009, Marc Perkel wrote: John Hardin wrote: On Mon, 5 Oct 2009, Marc Perkel wrote: Our white list is supposed to be a source of pure good email. So if spam comes for any of the white listed IPs then it's an error. Whose? Yours or theirs? Meaning: is a single spam reason for an IP to be dropped from the hostkarma whitelist? It depends on what kind of spam it is. If it is a virus generated spam - then yes. If it's a spam determined by message content - no. Sorry if I missed this in the thread, but how do you determine whether a spam originates from a bot-net vs. a 'lone wolf'? A combination of several factors including hitting my tarbaby server AND not using QUIT to close the connection AND some HELO sins. I'm catching near 100% of botnet spam.
Re: OT bad news
Ted Mittelstaedt wrote: Gary Smith wrote: Let them have as much Windows stuff as they want. Just plead the case to supplement. I'll have to repeat, for the original poster this isn't a technology vs technology argument. If it was, his coworkers would be listing specific things Exchange does that FreeBSD/SA does not do. This is a political battle. He is essentially in the position of a mechanic that someone brings their car to for repair, then sits there telling the mechanic what tools he should be using to repair their car. If the car gets repaired the owner claims that they knew how to repair the car better than the mechanic and the mechanic was an idiot. If the car repair fails the owner claims the mechanic is incompetent and an idiot. Either way, once your boss starts micromanaging, you're going to be screwed whether you do a good job or not. He's tried rescuing the situation for 8 years, now you're giving advice to help him rescue the situation more. If he helps them by keeping the BSD server in reserve, and they fall flat on their face and he rescues them, then it is just teaching them what to fix on their Exchange setup. They will try it again - perhaps falling flat again - and this will continue over and over with them putting more powerful hardware and more expensive add-on software on their Exchange box until eventually they will figure it out, make him get rid of the BSD box - then they won't fall flat anymore. Then they will claim how much better Exchange works, completely ignoring the fact that he helped them troubleshoot their Exchange setup. There is absolutely no fix for these types other than to let them fail and not help them back up - just let them be fired for incompetence. Trust me - even if that happened to these coworkers they will just go to the next employer that's a Windows only shop and will never once believe that the Windows solution is worse. It's just like the people who believe in Apple.
They will go spend $1K on an iMac and accessories and get -exactly- the same thing that I can build with FreeBSD and a whitebox clone for a quarter of the cost - but will never believe that they overpaid for what they have. Ted (Standing ovation on both emails) -- Dan Schaefer Web Developer/Systems Analyst Performance Administration Corp.
Re: OT bad news
On 5-Oct-2009, at 14:49, Thomas Mullins wrote: I will pull out our BSD box, and I will let them connect the Exchange box straight to the Net. It's a shame that, living in Denver, I will be *just* out of range of hearing the screams as the mailspools fill with viruses, malware, and massive payloads of Spanish Prisoner spams. Really, it should be fun. Personally, I would NOT keep the old setup standing by unless specifically told to. I would rebuild it, slowly. Take a few days, maybe a week, to get it all back online. After all, once the panic starts, it's worth it to teach them a lesson. Also, there's going to be some advantage to building a nice new install with updated everything, right? It's not like you'd be being VINDICTIVE, just cautious, right? -- Battlemage? That's not a profession. It barely qualifies as a hobby. 'Battlemage' is about as impressive a title as 'Lord of the Dance'. PAUSE I'm adding Lord of the Dance to my titles.
Re: OT bad news
On 5-Oct-2009, at 16:58, Ted Mittelstaedt wrote: It's just like the people who believe in Apple. They will go spend $1K on an iMac and accessories and get -exactly- the same thing that I can build with FreeBSD and a whitebox clone for a quarter of the cost - but will never believe that they overpaid for what they have. Now if that were true then a lot of Unix admins would not have Macs for their personal machines. If you're buying a machine to be a mailserver then buying an iMac is silly. If you're buying a machine to use and to administer unix server, then a Mac is a fine choice (Probably not an iMac, a MacBookPro). The question is are there things you want your computer to do outside of the command line? -- Mac OS X, because making Unix user-friendly was easier than fixing Windows
Re: Problems with whitelist_from_rcvd
On Tue, 6 Oct 2009, Igor Bogomazov wrote: Exactly how are you checking the rDNS of that IP address? Can you demonstrate? Are you performing your rDNS tests on the MTA computer? It looks to me like the DNS setup on it is misconfigured somehow and it can't perform rDNS queries successfully. What I do (all commands on the mail-server, where SA is installed): # host SUB.MYDOMAIN.MAIL SUB.MYDOMAIN.MAIL has address 12.12.12.12 # host 12.12.12.12 12.12.12.12.in-addr.arpa domain name pointer SUB.MYDOMAIN.MAIL. host does not produce anything else but a single row Okay, good. That proves that host's rDNS is properly set up. Can you run that command on the same computer that your _MTA_ is running on? The MTA is what is doing the rDNS lookups for the Received: header. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- If healthcare is a Right means that the government is obligated to provide the people with hospitals, physicians, treatments and medications at low or no cost, then the right to free speech means the government is obligated to provide the people with printing presses and public address systems, the right to freedom of religion means the government is obligated to build churches for the people, and the right to keep and bear arms means the government is obligated to provide the people with guns, all at low or no cost. --- 5 days since a sunspot last seen - EPA blames CO2 emissions
Re: OT bad news
- Quanah Gibson-Mount qua...@zimbra.com wrote: --On Monday, October 05, 2009 11:50 PM +0200 mouss mo...@ml.netoyen.net wrote: Thomas Mullins a écrit : We have been running Spamassassin for maybe eight years now. But, my coworkers do not like OpenSource. So they have finally complained enough that my boss is going to replace our reliable FreeBSD/Spamassassin boxes. They are planning on purchasing something that runs ON Exchange. What a bummer. and the problem is? if they want exchange, give them exchange. don't fight (directly), watch instead. take pleasure of the situation, get fun as you can. I personally took fun all day long in windows-only (and believe it or not, in linux-only) environments. that said, you can still try to explain that exchange should not be exposed to the internet. you still need a relay (such as freebsd/postfix). And once exchange falls over, show them Zimbra. ;) Which uses postfix/SA/amavis, etc, and looks a lot like exchange... only better. ;) --Quanah Seconded :) Best Regards, -- SplatNIX IT Services :: Innovation through collaboration
Re: Uppercase E-mail in Latin America
On 5-Oct-2009, at 12:53, René Berber wrote: Warren Togami wrote: On 10/05/2009 02:30 PM, René Berber wrote: Warren Togami wrote: I heard an interesting story from a friend who was working in Mexico for the past few months. Apparently in some Latin American countries, uppercase legitimate person-to-person e-mail is common because it is seen as a sign of respect. This apparently is due to historical telegraph messages being in uppercase. Not true. Could you provide some context? Where are you from? What kind of industry or people are you exposed to? I am Mexican, living in México City. I grew up in Guadalajara and still have friends there, and in 'el De Effe' as well as scattered around a few other places in Mexico and I can confirm this is simply not true. No one uses all caps as a sign of respect. I can't speak to other Latin American countries. Perhaps this is true in Guatemala, or Nicaragua? I doubt it though. -- Everybody hates a tourist, especially one who thinks it's all such laugh. Yeah, and the chip stains and grease will come out in the bath. You will never understand how it feels to live your life with no meaning or control, and with nowhere left to go. You are amazed that the exist, and they burn so bright whilst you can only wonder why.
Re: SIGCHLD query
Martin Gregorie wrote: What causes a spamd 3.2.5 child process to be terminated by receiving a SIGCHLD signal? A parent process receives a SIGCHLD when a child process terminates. My last month's logs show 7 of them and I can't work out what caused them to be sent. However, Jose Luis Marin Perez' system is seeing a lot of them - on the order of 10% of messages scanned are getting hit by them, though his seem to be connected with very long running scans. A timeout in the child perhaps? /Per Jessen, Zürich
RE: OT bad news
I have no explanation, Their supposed complaint is, they don't know *nix. But my coworker and I manage those boxes, so even if one of us left, there would be at least one person to run those boxes. SA/ClamAV has been working great. Our BSD box sits in front of the Exchange, hands off clean mail, what more could you ask for. We have two boxes, in case we need to take one down for an upgrade. I will pull out our BSD box, and I will let them connect the Exchange box straight to the Net. Shane Shane, you have probably already thought of and done this yet just in case... document the entire history of these boxes and save the configs of course... plus compile as much the functional statistics as you can over the life (logs) of those servers re: how much total email and how much malware and ham and spam and rejected and delivered email qty etc etc... that way, when the doodie hits the fan and end users are screaming over the huge increase in spam, you have hard stats that tell the real story and write the one page paper about it... whether now, or later, possibly consider distributing it to people that seriously need to know. - rh
RE: Uppercase E-mail in Latin America
I grew up in Guadalajara and still have friends there, and in 'el De Effe' as well as scattered around a few other places in Mexico and I can confirm this is simply not true. No one uses all caps as a sign of respect. I can't speak to other Latin American countries. Perhaps this is true in Guatemala, or Nicaragua? I doubt it though. hm, doesn't it appear to everyone else that this has the (slim to none) makings of a new urban legend? i mean, if all caps was a sign of respect on that continent, then wouldn't all of the advertising be in all caps out of respect? a few days ago when this was posted it was almost believable, for like 3 seconds of pondering. - rh
Re: SIGCHLD query
On Tue, 2009-10-06 at 16:46 +0200, Per Jessen wrote: Martin Gregorie wrote: What causes a spamd 3.2.5 child process to be terminated by receiving a SIGCHLD signal? A timeout in the child perhaps? I thought that may be the reason. It certainly seems to apply when a child runs longer than the time set by --timeout-child but there are a few cases where a SIGCHLD is sent when the child has only run for a second or two. It's a pity the log message doesn't include the reason why the SIGCHLD was sent. Martin
RE: OT bad news
(Standing ovation on both emails) -- Dan Schaefer Web Developer/Systems Analyst Performance Administration Corp. I feel beat down now :( j/k
Re: OT bad news
Hi, It's a shame that, living in Denver, I will be *just* out of range of hearing the screams as the mailspools fill with viruses, malware, and massive payloads of Spanish Prisoner spams. Aw, c'mon now. Yes, I agree SA is a better solution, but Microsoft didn't get to be a multi-billion-dollar company solely because of its marketing. Certainly a competent admin following some SANS guides can secure an Exchange box sufficiently to avoid it getting hacked, and a properly-installed version of Symantec will keep most spam away. It /is/ possible, I suppose :-) I'd bet that if he kept the FreeBSD box in place and just told his boss he upgraded to Exchange, they'd never even know :-) Regards, Alex
Re: Uppercase E-mail in Latin America
Hi, doesnt it appear to everyone else that this has the (slim to none) makings of a new urban legend? I have to admit that when Warren posted this, I went to snopes to check, and there was nothing there :-) Regards, Alex
SpamAssassin Ruleset Generation
I have a question about understanding how rulesets are generated for SpamAssassin. For example, consider the rule in 20_drugs.cf: header SUBJECT_DRUG_GAP_C Subject =~ /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i describe SUBJECT_DRUG_GAP_C Subject contains a gappy version of 'cialis' Who generated the regular expression /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i? a. Is it done manually with people writing regex to see how efficiently they capture spams? b. Is there an algorithm that examines a large corpus of spam and then comes up with these regexes on its own? c. Is it a combination of (a), (b)? I know scores for rules are generated using a neural network trained with error back propagation http://wiki.apache.org/spamassassin/HowScoresAreAssigned But how are the rules themselves generated? Thnx -- View this message in context: http://www.nabble.com/SpamAssassin-Ruleset-Generation-tp25773508p25773508.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: SpamAssassin Ruleset Generation
On Tue, 6 Oct 2009 11:08:28 -0700 (PDT) poifgh abhinav.pat...@gmail.com wrote: I have a question about - understanding how are rulesets generated for ... a. Is it done manually with people writing regex to see how efficiently they capture spams? b. Is there an algorithm that identifies large corpus of spam and the comes up with these regex'es on its own? c. Is it a combination of (a), (b)? The optional sought rules are autogenerated, the rest are manual.
Re: SpamAssassin Ruleset Generation
RW-15 wrote: On Tue, 6 Oct 2009 11:08:28 -0700 (PDT) poifgh abhinav.pat...@gmail.com wrote: I have a question about - understanding how are rulesets generated for ... a. Is it done manually with people writing regex to see how efficiently they capture spams? b. Is there an algorithm that identifies large corpus of spam and the comes up with these regex'es on its own? c. Is it a combination of (a), (b)? The optional sought rules are autogenerated, the rest are manual. Thnx - What are optional sought rules?
Re: SpamAssassin Ruleset Generation
poifgh wrote: RW-15 wrote: On Tue, 6 Oct 2009 11:08:28 -0700 (PDT) poifgh abhinav.pat...@gmail.com wrote: I have a question about - understanding how are rulesets generated for ... a. Is it done manually with people writing regex to see how efficiently they capture spams? b. Is there an algorithm that identifies large corpus of spam and the comes up with these regex'es on its own? c. Is it a combination of (a), (b)? The optional sought rules are autogenerated, the rest are manual. Thnx - What are optional sought rules? http://www.google.com/search?q=spamassassin+sought -- Bowie
Re: SpamAssassin Ruleset Generation
Bowie Bailey wrote: http://www.google.com/search?q=spamassassin+sought :-D - Thnx
Re: SpamAssassin Ruleset Generation
poifgh wrote: Bowie Bailey wrote: http://www.google.com/search?q=spamassassin+sought :-D - Thnx Other than the sought rules, are all the rules manually generated? Are there any statistics on how frequently new rules/regexes are adopted by SpamAssassin? Who are the people who write them? Any details related to it? thnx
Re: SIGCHLD query
Martin Gregorie wrote: On Tue, 2009-10-06 at 16:46 +0200, Per Jessen wrote: Martin Gregorie wrote: What causes a spamd 3.2.5 child process to be terminated by receiving a SIGCHLD signal? A timeout in the child perhaps? I thought that may be the reason. It certainly seems to apply when a child runs longer than the time set by --timeout-child but there are a few cases where a SIGCHLD is sent when the child has only run for a second or two. It's a pity the log message doesn't include the reason why the SIGCHLD was sent. Martin, generally speaking, the parent can only report the signal and that the child has gone away. The child would have to report on why. /Per Jessen, Zürich
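The point made in this thread (the parent is told that a child has ended, and can see only the exit status or the terminating signal, not the reason) can be sketched in Python on a Unix-like system. This is an illustration under stated assumptions, not how spamd itself is written: run_child is a made-up helper, and SIGKILL merely stands in for whatever the parent would send on a timeout.

```python
import os
import signal
import time


def run_child(seconds, kill_after=None):
    """Fork a child that sleeps, optionally kill it, and report how it ended."""
    pid = os.fork()
    if pid == 0:
        # Child process: pretend to do some work, then exit cleanly.
        time.sleep(seconds)
        os._exit(0)
    if kill_after is not None:
        # Parent: give up on the child after a while, as a timeout would.
        time.sleep(kill_after)
        os.kill(pid, signal.SIGKILL)
    # The kernel notifies the parent with SIGCHLD when the child ends;
    # waitpid() collects the status that a SIGCHLD handler would inspect.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "exited with status %d" % os.WEXITSTATUS(status)


print(run_child(0.1))
print(run_child(5, kill_after=0.1))
```

All the parent can log from the status word is an exit code or a signal number. Anything more specific, such as which rule or lookup was running when the time ran out, would have to be logged by the child before it goes away.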
Re: SpamAssassin Ruleset Generation
Hi, Other than the sought rules, are all the rules manually generated? Are there any statistics on how frequently new rules/regexes are adopted by SpamAssassin? Who are the people who write them? Any details related to Information on Justin Mason's SOUGHT rules is here: http://taint.org/2007/08/15/004348a.html Use sa-update to update your SA rules once or twice per day with the new stuff. His ongoing development work is here: http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jm/?sortby=date HTH, Alex
Re: SpamAssassin Ruleset Generation
On Tue, 6 Oct 2009, poifgh wrote: Other than the sought rules, are all the rules manually generated? Are there any statistics on how frequently new rules/regexes are adopted by SpamAssassin? Who are the people who write them? Any details related to it? Most of the rules are manually written by contributors such as myself. Some meta rules are generated by various means from existing rules - for example, the ADVANCE_FEE rules are generated using genetic algorithms to find effective combinations of simpler subrules that were manually generated. New rules are added whenever a contributor works on them, and this is generally based on when they have time to do so, when they have new ideas, and when new forms of spam appear. Indirect contributors will post rules to the users list and a contributor may add them to the rules sandbox for testing and eventual inclusion in the base ruleset. The CREDITS file in the sources should list all of the contributors. Some contributors may not have added their names to that file, though. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- 5 days since a sunspot last seen - EPA blames CO2 emissions
Re: OT bad news
Quanah Gibson-Mount wrote: --On Monday, October 05, 2009 11:50 PM +0200 mouss mo...@ml.netoyen.net wrote: Thomas Mullins wrote: We have been running Spamassassin for maybe eight years now. But, my coworkers do not like OpenSource. So they have finally complained enough that my boss is going to replace our reliable FreeBSD/Spamassassin boxes. They are planning on purchasing something that runs ON Exchange. What a bummer. and the problem is? if they want exchange, give them exchange. don't fight (directly), watch instead. take pleasure of the situation, get fun as you can. I personally took fun all day long in windows-only (and believe it or not, in linux-only) environments. that said, you can still try to explain that exchange should not be exposed to the internet. you still need a relay (such as freebsd/postfix). And once exchange falls over, show them Zimbra. ;) Which uses postfix/SA/amavis, etc, and looks a lot like exchange... only better. ;) If I have to choose between zimbra and exchange, I'll go for exchange. but I don't need to choose between the two. I want different components for different tasks. and for many things, I go for web oriented solutions.
Re: Spam Eating Monkey?
On 10/04/2009 09:32 PM, Blaine Fleming wrote: Warren Togami wrote: http://spameatingmonkey.com Anyone have any experience using these DNSBL and URIBL's? Is anyone from this site on this list? I wonder if we should add these rules to the sandbox for masschecks as well. Since someone is bound to ask I figure I'll state right now that I have no objections to the SEM lists being included in the masschecks. In fact, I'm quite curious. I would also recommend adding AnonWhois.org to the list. I'll add your existing rules to the Sandbox for testing. But have you considered putting all the DNSBL's and URIBL's into aggregated zones so you can cut down on redundant queries? http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists For example, one DNSBL lookup here can respond with 127.0.0.[1-5] depending on which list it is. Warren Togami wtog...@redhat.com
Re: SpamAssassin Ruleset Generation
poifgh wrote: I have a question about - understanding how are rulesets generated for spamassassin. For example - consider the rule in 20_drugs.cf :

header SUBJECT_DRUG_GAP_C Subject =~ /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i
describe SUBJECT_DRUG_GAP_C Subject contains a gappy version of 'cialis'

Who generated the regular expression /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i

Man, that's a good question. I wrote a large chunk of the rules in 20_drugs.cf, but not that one. (I wrote the stuff near the bottom that uses meta rules, i.e. __DRUGS_ERECTILE1 through DRUGS_MANYKINDS, originally distributed as a separate set called antidrug.cf.) As I recall, there were 2 other people making drug rules, but it's been a LONG time, and I forget who did it. Those rules were written in the 2004-2006 time frame when pharmacy spams were just hammering the heck outa everyone.

a. Is it done manually with people writing regex to see how efficiently they capture spams?

Yes. Many hours of reading spams, studying them, testing various regex tweaks, checking for false positives, etc, etc. mass-check is your friend for this kind of stuff. One post from when I was developing this as a stand-alone set: http://mail-archives.apache.org/mod_mbox/spamassassin-users/200404.mbox/%3c6.0.0.22.0.20040428132346.029d9...@opal.evi-inc.com%3e

Note: the comcast link mentioned in that message should be considered DEAD. The antidrug set is no longer maintained separately from the mainline ruleset, and hasn't been for years.

If you want to break the rules down a bit, here are some tips: The rules are in general designed to detect common methods of obscuring text by inserting spaces, punctuation, etc. between letters, and possibly substituting other similar-looking characters for some of the letters (W4R3Z style, etc). The simple approach is to think of it in groupings. You end up using a repeating pattern of (some representation of a character)(some kind of gap sequence)(character)(gap)...etc.

.{0,2} is a gap sequence, although not one I prefer. I prefer [_\W]{0,3} in most cases because it's a bit less FP-prone, but it risks missing things that use small lower-case letters to gap. You also get replacements for characters in some of those, like [A4] instead of just A. Or, more elaborately.. [a4\xe0-\...@]

So this mess:

body __DRUGS_ERECTILE1 /(?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}[ij1!|l\xEC\xED\xEE\xEF][_\W]{0,3}[a40\xe0-\...@][_\W]{0,3}[xyz]?[gj][_\W]{0,3}r[_\W]{0,3}[a40\xe0-\...@][_\W]{0,3}x?[_\W]{0,3}(?:\b|\s)/i

Could be broken down:

(?:\b|\s) - preamble, detecting space or word boundary.
[_\W]{0,3} - gap
(?:\\\/|V) - V
[_\W]{0,3} - gap
[ij1!|l\xEC\xED\xEE\xEF] - I
[_\W]{0,3} - gap
[a40\xe0-\...@] - A
[_\W]{0,3} - gap
[xyz]?[gj] - G (with optional extra garbage before it)
[_\W]{0,3} - gap
r - just R :-)
[_\W]{0,3} - gap
[a40\xe0-\...@] - A
[_\W]{0,3} - gap
x? - optional garbage
[_\W]{0,3} - gap
(?:\b|\s) - suffix, detecting space or word boundary.

Which detects weird spacings and substitutions in the word Viagra.

But how are the rules generated themselves?

Mostly meatware, except the sought rules others have mentioned.

Thnx
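The (character)(gap)(character)(gap) grouping described in this message can be sketched in Python. This is only an illustration of the idea, not how the shipped rules were produced (those were written by hand, as the post says). The helper name and the substitution table are made up, and Python's re module stands in for the Perl regexes SpamAssassin actually uses.

```python
import re

GAP = r"[_\W]{0,3}"  # the gap sequence the post prefers over .{0,2}


def gappy_pattern(word, substitutions=None):
    """Build a regex matching `word` with gaps and optional look-alike characters.

    `substitutions` maps a lower-case letter to the set of characters that
    may stand in for it, e.g. {"a": "a4@"}. The table is hypothetical.
    """
    substitutions = substitutions or {}
    groups = []
    for letter in word:
        allowed = substitutions.get(letter.lower(), letter)
        groups.append("[" + re.escape(allowed) + "]")
    # character, gap, character, gap, ... bounded by word boundaries
    return re.compile(r"\b" + GAP.join(groups) + r"\b", re.IGNORECASE)


plain = gappy_pattern("cialis")
leet = gappy_pattern("cialis", {"a": "a4@", "i": "i1!|l"})

for subject in ("buy c.i.a.l.i.s now", "C I A L I S", "special list", "c1-4-l1s"):
    print(repr(subject), bool(plain.search(subject)), bool(leet.search(subject)))
```

A generator like this makes the structure visible, but it does not replace the testing the post describes: every widened character class or gap raises the false-positive risk, which is why candidate rules are checked against ham and spam corpora with mass-check.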
Re: OT bad news
On Tue, Oct 6, 2009 at 4:12 AM, Dan Schaefer d...@performanceadmin.com wrote: I'll have to repeat, for the original poster this isn't a technology vs technology argument. If it was, his coworkers would be listing specific things Exchange does that FreeBSD/SA does not do. (Standing ovation on both emails) Uncloaking to vigorously second. Ted is so painfully right on that I wish that it were otherwise (out of sympathy for the OP). Royce
Re: Spam Eating Monkey?
Warren Togami wrote: I'll add your existing rules to the Sandbox for testing. Thank you! But have you considered putting all the DNSBL's and URIBL's into aggregated zones so you can cut down on redundant queries? Actually, the uri red list is an aggregate zone of my uri black, red and yellow lists. The main reason I haven't merged the black list with any of the other IP zones is because I haven't had enough user response on the other lists yet. Basically, the relevant zones are the SEM-URIRED and SEM-BLACK and each of them needs to be its own query because of the two completely different datasets. --Blaine
Re: Spam Eating Monkey?
On 10/06/2009 11:15 PM, Blaine Fleming wrote: Warren Togami wrote: I'll add your existing rules to the Sandbox for testing. Thank you! But have you considered putting all the DNSBL's and URIBL's into aggregated zones so you can cut down on redundant queries? Actually, the uri red list is an aggregate zone of my uri black, red and yellow lists. The main reason I haven't merged the black list with any of the other IP zones is because I haven't had enough user response on the other lists yet. You are misunderstanding the question. A single DNS query could respond with different numbers meaning they are hits on different lists. Your lists that are subsets or supersets of other lists can easily use this. The querying software need only know what each result means. Warren
consolidating DNSBLs into a single query (was Spam Eating Monkey?)
Warren Togami wrote: You are misunderstanding the question. A single DNS query could respond with different numbers meaning they are hits on different lists. Your lists that are subsets or supersets of other lists can easily use this. The querying software need only know what each result means. Not saying that this is a bad idea, but it does have its limitations. For example, some lists are hundreds of megabytes in size, and getting the whole file rsynced and updated can take more than several minutes. Often, such lists update only once or twice per hour, if even that often. In contrast, some lists are smaller and faster reacting and update every few minutes. Trying to merge all such lists into a single list every several minutes is no trivial task in terms of having enough CPU cycles and RAM to get that done correctly and within a reasonably short time. Likewise, doing the merge hourly loses the benefit of some of the smaller-footprint faster-reacting lists which can react to emerging spam threats faster. Not saying such a consolidation can't be done... and maybe a few tradeoffs here are worthwhile? But if these issues are not dealt with smartly and competently, then one could easily find that the all-in-one comprehensive DNSBL is not as effective as querying the lists separately. Also, this loses the ability to *score* on multiple lists... unless you use a bitmasked scoring system whereby one list gets assigned .2, another .4, another .8, on to .128. But that leaves a maximum of only 7 lists. Sure, you can add more than 7 by employing other octets in the answer IP, but that only severely complicates matters. And as it stands, you'd also have the complexity of getting the spam filter to parse, understand, and react properly to those bitmasks. -- Rob McEwen http://dnsbl.invaluement.com/ r...@invaluement.com +1 (478) 475-9032
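The bitmask scheme mentioned in this message (.2, .4, .8 and so on up to .128 in the last octet of the answer) can be sketched in Python as follows. Everything specific here is an assumption for illustration: the zone dnsbl.example, the list names, and the per-list scores are made up, and no DNS lookup is actually performed.

```python
import ipaddress

# Hypothetical mapping: one bit of the answer's last octet per list.
LIST_BITS = {2: "list-a", 4: "list-b", 8: "list-c"}
# Made-up per-list scores, so each list can still be scored separately.
LIST_SCORES = {"list-a": 1.0, "list-b": 2.0, "list-c": 0.5}


def query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def decode_answer(answer):
    """Return the names of the lists whose bit is set in a 127.0.0.x answer."""
    last_octet = int(ipaddress.IPv4Address(answer)) & 0xFF
    return [name for bit, name in sorted(LIST_BITS.items()) if last_octet & bit]


def total_score(answer):
    """Sum the scores of every list flagged in the answer."""
    return sum(LIST_SCORES[name] for name in decode_answer(answer))


print(query_name("213.108.33.133", "dnsbl.example"))
print(decode_answer("127.0.0.6"), total_score("127.0.0.6"))
```

With one bit per list, a single answer can report hits on several lists at once, and the client can still apply a separate score to each. The cost is the limit raised above: only a handful of bits fit in one octet before the scheme has to spill into other octets.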
Re: consolidating DNSBLs into a single query (was Spam Eating Monkey?)
On Tue, Oct 6, 2009 at 8:19 PM, Rob McEwen r...@invaluement.com wrote: Warren Togami wrote: You are misunderstanding the question. A single DNS query could respond with different numbers meaning they are hits on different lists. Your lists that are subsets or supersets of other lists can easily use this. The querying software need only know what each result means. Not saying that this is a bad idea, but it does have its limitations. For example, some lists are hundreds of megabytes in size, and getting the whole file rsynced and updated can take more than several minutes. Often, such lists update only once or twice per hour, if even that often. Hmm ... interesting. If implemented via rbldnsd, each list could be maintained in a separate file, and since rbldnsd can be configured to build a single zone using multiple files on the back end, different lists could be refreshed at different rates. Your comments about tradeoffs and bitmasking still stand, of course. Royce