AWL observations
Sometimes the AWL rule doesn't appear in the list. From observing the behavior, the rule only seems guaranteed to fire if the stored score for the tuple is significantly different from the message score, or if the stored tuple has a very high score. If the stored score and the message score are close and the stored tuple does not have a large score, the rule will not fire.

I assume this reflects the logic for when to adjust the score, rather than whether the tuple was matched. But the plugin text and code all talk about the rule firing on a match, not when corrective scoring occurred.

Is this a bug, or should the text be changed? If the current code is intended, I'd like to request a new function call that tells whether the tuple exists and the number of times it has been seen.

--
Eric A. Hall http://www.eric-a-hall.com/
Network Technology Research Group http://www.ntrg.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
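The behavior described above is what you would expect if the rule is only reported when the adjustment is non-zero. A minimal sketch of that model follows. It assumes the adjustment is (stored mean - message score) * factor with a factor of 0.5, and that a hit is recorded only when the rounded adjustment is not zero. Both are assumptions about the plugin, and the function names are made up for illustration.

```python
def awl_adjustment(stored_total, stored_count, message_score, factor=0.5):
    """Model of the AWL correction for one message.

    Returns None when the sender tuple has no history (assumption: an
    unseen tuple produces no adjustment at all).
    """
    if stored_count == 0:
        return None
    mean = stored_total / stored_count
    # Pull the message score part of the way toward the stored mean.
    return round((mean - message_score) * factor, 3)


def awl_rule_visible(adjustment):
    """Assumption: the AWL hit is only listed when the adjustment is non-zero."""
    return adjustment is not None and adjustment != 0.0
```

Under this model a tuple that has been seen three times with a mean equal to the current score yields an adjustment of 0.0, so nothing is listed even though the tuple exists. That is why a separate "has this tuple been seen" call would be needed.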
Re: AWL observations
On 7/22/2010 11:24 AM, RW wrote:

> I don't recall seeing anything like that. Are you sure it's not due to the IP address changing or AWL being short-circuited?

My testing is with local message files. If I use sa-awl to dump the database I can see the counter increment, but the rule doesn't fire unless the conditions are met.
Re: AWL observations
On 7/22/2010 11:07 PM, Matt Kettler wrote:

> On 7/22/2010 10:32 AM, Eric A. Hall wrote:
>> If the current code is intended, I'd like to request a new function call that tells if the tuple exists and the number of times it has been seen
>
> For what purpose? (Not trying to be mean, just asking, because if it's not of use to the general SA community, it doesn't belong in the mainline release. However, if it's useful.)

I want to use a previously-seen match list for a variety of purposes. I already have my SAGrey plugin [1] that uses the AWL for limited-use greylists (it only fires when the spam threshold is exceeded, so as not to penalize everybody, but it's not as useful if the AWL rule isn't reliable). I also have a rule I use locally that blocks mail with binary attachments if the sender is unknown, which I would like to modify so that it only fires on spammy messages. There are a couple of other things on the to-do list here that would benefit from a seen-before database. I can write my own, but it would be easier to use AWL if it's going to be present and reliable.

It would also be nice to have a last-updated field so that the entries can be aged. I already run the pruning tools to purge one-time senders (spammers) at the end of each month, but I would rather purge senders seen only once in six months.
Re: SVN notifications killing spamassassin
On 2/18/2008 5:50 AM, Justin Mason wrote:

> Eric A. Hall writes:
>> I sometimes get SVN notifications that contain lists of files and their status. The filenames will often get picked up by the URI matching algorithm, each of which ends up being processed through numerous lookups (URICOUNTRY, my LDAP filter, etc). Sometimes I get very large messages with hundreds of file lists, which in turn causes spamassassin to go into never-never land while it thinks about the hundreds of URI matches.
>>
>> For example,
>>
>> Afpo/reports/perl/nagios_notifications1.pl.bak
>> Afoo/reports/perl/nagios_outages1.pl
>> Afoo/reports/perl/GWIR.pm
>>
>> nagios_outages1.pl will be determined as a URI for .pl domain and GWIR.pm will be determined as a URI for .pm domain, and so forth.
>>
>> The only way to get these messages through is to disable spamassassin...
>>
>> I've updated to 3.2.4 just now and it still has the same problem
>>
>> I'm guessing the URI analyzer needs to be smarter.
>
> The URI analyzer already is smarter ;) Changing the URICountry plugin is the way to fix this.

It doesn't appear to be URICountry that's dying. Either way though, I bet all of the plugins will perform a lot better when they are no longer being passed filenames.

--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
SVN notifications killing spamassassin
I sometimes get SVN notifications that contain lists of files and their status. The filenames will often get picked up by the URI matching algorithm, each of which ends up being processed through numerous lookups (URICOUNTRY, my LDAP filter, etc). Sometimes I get very large messages with hundreds of file lists, which in turn causes spamassassin to go into never-never land while it thinks about the hundreds of URI matches.

For example,

Afpo/reports/perl/nagios_notifications1.pl.bak
Afoo/reports/perl/nagios_outages1.pl
Afoo/reports/perl/GWIR.pm

nagios_outages1.pl will be determined as a URI for .pl domain and GWIR.pm will be determined as a URI for .pm domain, and so forth.

The only way to get these messages through is to disable spamassassin...

I've updated to 3.2.4 just now and it still has the same problem

I'm guessing the URI analyzer needs to be smarter.
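One way a "smarter" analyzer could skip these paths is to require that the first path segment of a schemeless token looks like a hostname. The sketch below is only an illustration of that heuristic. The helper name and the regular expression are hypothetical and are not taken from SpamAssassin.

```python
import re

# Hypothetical hostname pattern: dot-separated labels of letters, digits
# and hyphens, ending in an alphabetic TLD of two or more characters.
_HOST_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)


def plausible_schemeless_uri(token):
    """Return True if a bare token could start with a hostname."""
    if "://" in token:
        return True  # explicit scheme: leave it to the real URI parser
    first_segment = token.split("/", 1)[0]
    return bool(_HOST_RE.match(first_segment))
```

With this check a token such as Afoo/reports/perl/GWIR.pm is rejected because its first segment has no dot, and nagios_outages1.pl is rejected because of the underscore. A bare GWIR.pm would still pass, so the heuristic narrows the problem rather than removing it.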
Re: Question - How many of you run ALL your email through SA?
On 8/16/2007 12:39 PM, Marc Perkel wrote:

> OK - it's interesting that of all of you who responded this is the only person who is doing it right. I have to say that I'm somewhat surprised that so few people are preprocessing their email to reduce the SA load. As we all know SA is very processor and memory expensive. Personally, I'm filtering 1600 domains and I route less than 1% of incoming email through SA. SA does do a good job on the remaining 1% that I can't figure out with blacklists and whitelists and Exim tricks, but if I ran everything through SA I'd have to have a rack of dedicated SA servers.

third-party blacklists are good indicators but they are not perfectly accurate. the errors make them unsuitable as a sole metric, but they are by definition very good inputs for spamassassin's probability scoring systems.

for those of us that can afford this approach it works very well. I'm sorry you can't, but that's not our fault.
Re: Question - How many of you run ALL your email through SA?
On 8/15/2007 11:11 PM, Marc Perkel wrote:

> As opposed to preprocessing before using SA to reduce the load. (ie. using blacklist and whitelist before SA)

All email sent to port 25 goes through SA for processing. Postfix has a couple of regular expressions and some behavioral stuff (invalid commands, invalid recipients, etc), but otherwise it just looks for the spam score and if it's too high the transfer is rejected.
Re: plugin to test attachments from unknown senders
On 7/14/2007 3:49 PM, Eric A. Hall wrote:

> Like other folks I've been getting hit with the PDF spam pretty hard. I think the way to solve this and the image spam in general is to do a plugin that does two things:
>
> 1) looks in the message to see if there is a binary attachment
> 2) looks in the AWL to see if the sender tuple is known
> 3) if (1==true) && (2==false) fire a score

I was able to do this with basic rules. Note the low (0.1) scores. It would be nice to use this as a DEFER check in the MTA, since resends will hit the AWL rule and get cleared.

#
# This rule looks for in-line MIME Content-Type headers of various
# types, and then looks to see if the sender tuple is already known
# to the autowhitelist system. If the message contains a binary
# attachment and the sender tuple is unknown, fire a rule that tells
# us that the message is a gift from a stranger.
#
mimeheader __L_C_TYPE_APP    Content-Type =~ /^application/i
mimeheader __L_C_TYPE_IMAGE  Content-Type =~ /^image/i
mimeheader __L_C_TYPE_AUDIO  Content-Type =~ /^audio/i
mimeheader __L_C_TYPE_VIDEO  Content-Type =~ /^video/i
mimeheader __L_C_TYPE_MODEL  Content-Type =~ /^model/i

meta     L_STRANGER_APP    (!AWL && __L_C_TYPE_APP)
score    L_STRANGER_APP    0.1
tflags   L_STRANGER_APP    noautolearn
priority L_STRANGER_APP    1001 # defer till after AWL

meta     L_STRANGER_IMAGE  (!AWL && __L_C_TYPE_IMAGE)
score    L_STRANGER_IMAGE  0.1
tflags   L_STRANGER_IMAGE  noautolearn
priority L_STRANGER_IMAGE  1001 # defer till after AWL

meta     L_STRANGER_AUDIO  (!AWL && __L_C_TYPE_AUDIO)
score    L_STRANGER_AUDIO  0.1
tflags   L_STRANGER_AUDIO  noautolearn
priority L_STRANGER_AUDIO  1001 # defer till after AWL

meta     L_STRANGER_VIDEO  (!AWL && __L_C_TYPE_VIDEO)
score    L_STRANGER_VIDEO  0.1
tflags   L_STRANGER_VIDEO  noautolearn
priority L_STRANGER_VIDEO  1001 # defer till after AWL

meta     L_STRANGER_MODEL  (!AWL && __L_C_TYPE_MODEL)
score    L_STRANGER_MODEL  0.1
tflags   L_STRANGER_MODEL  noautolearn
priority L_STRANGER_MODEL  1001 # defer till after AWL
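The decision these rules make can be sketched outside SpamAssassin with Python's standard email package. This is only an illustration of the logic, not a replacement for the rules: the sender_known flag stands in for the AWL lookup, and the function name and the RULE_SUFFIX table are made up for the example.

```python
from email import message_from_string

# Map MIME main types to the rule-name suffixes used in the post.
RULE_SUFFIX = {
    "application": "APP",
    "image": "IMAGE",
    "audio": "AUDIO",
    "video": "VIDEO",
    "model": "MODEL",
}


def stranger_hits(raw_message, sender_known):
    """Return the L_STRANGER_* names that would fire for this message.

    sender_known is a stand-in for "the AWL already has this sender tuple".
    """
    if sender_known:
        return []
    msg = message_from_string(raw_message)
    # walk() visits the container and every nested MIME part.
    main_types = {part.get_content_maintype() for part in msg.walk()}
    return sorted(
        "L_STRANGER_" + RULE_SUFFIX[t] for t in main_types if t in RULE_SUFFIX
    )
```

A message with a PDF part from an unknown sender yields ["L_STRANGER_APP"], and the same message from a known sender yields an empty list, which matches the "resends get cleared" behavior described above.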
some of you have bad meta rules...
noticed this in the debug output while upgrading

[10637] dbg: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
[10637] info: rules: meta test FM__TIMES_2 has dependency 'FH_HOST_EQ_D_D_D_D' with a zero score
[10637] info: rules: meta test FM_SEX_HOST has dependency 'FH_HOST_EQ_D_D_D_D' with a zero score
[10637] dbg: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_MKSHRT'
[10637] dbg: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_GT'
[10637] dbg: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_TINY'
[10637] info: rules: meta test HS_PHARMA_1 has dependency 'HS_SUBJ_ONLINE_PHARMACEUTICAL' with a zero score
[10637] dbg: rules: meta test SARE_HEAD_SUBJ_RAND has undefined dependency 'SARE_XMAIL_SUSP2'
[10637] dbg: rules: meta test SARE_HEAD_SUBJ_RAND has undefined dependency 'SARE_HEAD_XAUTH_WARN'
[10637] dbg: rules: meta test SARE_HEAD_SUBJ_RAND has undefined dependency 'X_AUTH_WARN_FAKED'

don't feel bad, I had some broken ones myself :)

--lint probably ought to be extended to catch meta rules btw
plugin to test attachments from unknown senders
Like other folks I've been getting hit with the PDF spam pretty hard. I think the way to solve this and the image spam in general is to do a plugin that does two things:

1) looks in the message to see if there is a binary attachment
2) looks in the AWL to see if the sender tuple is known
3) if (1==true) && (2==false) fire a score

I've been meaning to adapt my SAGREY plugin [1] for this but have not had time and may not have time for a while yet, so I thought I'd throw this out there to see if anybody else is interested in doing it

[1] http://www.ntrg.com/misc/sagrey/
Re: Rule suggestion - smtp sanity
On 7/13/2007 11:04 AM, arni wrote:

> From large providers I sometimes receive messages through encrypted smtp, the header looks something like this (qmail):
>
> ... with (AES256-SHA encrypted) SMTP; ...
>
> Would it be a good idea to give a minimal negative score on this -0.1 or -0.2 if this happens on the last hop? - It proves that the sending smtp server is very protocol sane, which spambots are usually not.

It's a good idea to look at last-hop transfer and see if it used STARTTLS, if the certificate was valid, etc., and is something I've got on my to-do list for future development.

The big problem is that there is no real standard and every MTA records the details differently.
Re: Rule based on X Greylist header
On 3/13/2007 2:40 PM, Arjun Datta wrote:

> when milter-greylist detects a user that has passed SMTP AUTH - it does not delay it and adds a header:
>
> X Greylist: Sender succeeded SMTP Authentication, not delayed by milter-greylist 0.3
>
> Now, how do I add a rule to spamassassin that assigns a negative score to emails (like whitelisting where it adds a score of -100) that are detected with that header so that spamass-milter will not scan those emails.

Assuming you mean X-Greylist instead of X Greylist, something like the following will either work or get you close:

header L_MILTER_GREY X-Greylist =~ /^Sender succeeded SMTP Authentication/
score  L_MILTER_GREY -100

put that into a cf file in one of your rules directories
Re: Annoying stocks scams
On 3/6/2007 5:30 AM, [EMAIL PROTECTED] wrote:

> It's my first meta rule, which only gives a score if both conditions are true, and I was wondering if there's a possibility to make the score more intelligent :

my local rules use combinations. any message that hits AT LEAST one rule gets the L_STOCKS_1 match. messages that hit more than one ALSO get a separate score, in addition to L_STOCKS_1:

meta     L_STOCKS_1 (__L_STOCKS_01 || __L_STOCKS_02 || __L_STOCKS_03 || __L_STOCKS_04 || __L_STOCKS_05 || __L_STOCKS_06 || __L_STOCKS_07 || __L_STOCKS_08 || __L_STOCKS_09 || __L_STOCKS_10 || __L_STOCKS_11 || __L_STOCKS_12 || __L_STOCKS_13 || __L_STOCKS_14 || __L_STOCKS_15 || __L_STOCKS_16 || __L_STOCKS_17 || __L_STOCKS_18 || __L_STOCKS_19 || __L_STOCKS_20 || __L_STOCKS_21 || __L_STOCKS_22 || __L_STOCKS_23 || __L_STOCKS_24 || __L_STOCKS_25 || __L_STOCKS_26 || __L_STOCKS_27)
describe L_STOCKS_1 One or more stock markers
score    L_STOCKS_1 1.0

meta     L_STOCKS_2 ((__L_STOCKS_01 + __L_STOCKS_02 + __L_STOCKS_03 + __L_STOCKS_04 + __L_STOCKS_05 + __L_STOCKS_06 + __L_STOCKS_07 + __L_STOCKS_08 + __L_STOCKS_09 + __L_STOCKS_10 + __L_STOCKS_11 + __L_STOCKS_12 + __L_STOCKS_13 + __L_STOCKS_14 + __L_STOCKS_15 + __L_STOCKS_16 + __L_STOCKS_17 + __L_STOCKS_18 + __L_STOCKS_19 + __L_STOCKS_20 + __L_STOCKS_21 + __L_STOCKS_22 + __L_STOCKS_23 + __L_STOCKS_24 + __L_STOCKS_25 + __L_STOCKS_26 + __L_STOCKS_27) == 2)
describe L_STOCKS_2 Two stock markers
score    L_STOCKS_2 4.0

meta     L_STOCKS_3 ((__L_STOCKS_01 + __L_STOCKS_02 + __L_STOCKS_03 + __L_STOCKS_04 + __L_STOCKS_05 + __L_STOCKS_06 + __L_STOCKS_07 + __L_STOCKS_08 + __L_STOCKS_09 + __L_STOCKS_10 + __L_STOCKS_11 + __L_STOCKS_12 + __L_STOCKS_13 + __L_STOCKS_14 + __L_STOCKS_15 + __L_STOCKS_16 + __L_STOCKS_17 + __L_STOCKS_18 + __L_STOCKS_19 + __L_STOCKS_20 + __L_STOCKS_21 + __L_STOCKS_22 + __L_STOCKS_23 + __L_STOCKS_24 + __L_STOCKS_25 + __L_STOCKS_26 + __L_STOCKS_27) == 3)
describe L_STOCKS_3 Three stock markers
score    L_STOCKS_3 9.0

meta     L_STOCKS_4 ((__L_STOCKS_01 + __L_STOCKS_02 + __L_STOCKS_03 + __L_STOCKS_04 + __L_STOCKS_05 + __L_STOCKS_06 + __L_STOCKS_07 + __L_STOCKS_08 + __L_STOCKS_09 + __L_STOCKS_10 + __L_STOCKS_11 + __L_STOCKS_12 + __L_STOCKS_13 + __L_STOCKS_14 + __L_STOCKS_15 + __L_STOCKS_16 + __L_STOCKS_17 + __L_STOCKS_18 + __L_STOCKS_19 + __L_STOCKS_20 + __L_STOCKS_21 + __L_STOCKS_22 + __L_STOCKS_23 + __L_STOCKS_24 + __L_STOCKS_25 + __L_STOCKS_26 + __L_STOCKS_27) > 3)
describe L_STOCKS_4 Four or more stock markers
score    L_STOCKS_4 20.0

My scores are high because I have some mail accounts on other networks that are lightly whitelisted and I need to hit the spams that come from there. Do not use those scores or else you will fry mailing lists etc.
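To make the combined effect of these four meta rules concrete, here is a small sketch that counts marker hits and adds up the tiers the same way. The function name is made up for the example. The scores are the ones quoted in the post, which the author warns are too high for general use.

```python
def stocks_score(marker_hits):
    """Total score contributed by L_STOCKS_1..4 for a list of marker booleans."""
    count = sum(1 for hit in marker_hits if hit)
    total = 0.0
    if count >= 1:
        total += 1.0   # L_STOCKS_1: one or more markers
    if count == 2:
        total += 4.0   # L_STOCKS_2: exactly two markers
    elif count == 3:
        total += 9.0   # L_STOCKS_3: exactly three markers
    elif count > 3:
        total += 20.0  # L_STOCKS_4: four or more markers
    return total
```

Because L_STOCKS_1 fires alongside the counting rules, two markers contribute 5.0 in total, three contribute 10.0, and four or more contribute 21.0.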
[Fwd: Re: *****POSIBLE SPAM***** Re: Annoying stocks scams]
please suspend this user's mailing list account

---BeginMessage---
Automatic message

*** This user is no longer active. For any matter, please contact Leandro Gayango [EMAIL PROTECTED] ***

ehall 03/06/07 19:24

Spam detection software, running on the system vm-antispam2.mpsistemas.es, has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details.

Content preview: On 3/6/2007 5:30 AM, [EMAIL PROTECTED] wrote: It's my first meta rule, which only gives a score if both conditions are true, and I was wondering if there's a possibility to make the score more intelligent : [...]

Content analysis details: (5.1 points, 4.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 MY_DSL                 I could use a BL for this.
 0.5 NO_RDNS                Sending MTA has no reverse DNS (Postfix variant)
 0.2 MR_NOT_ATTRIBUTED_IP   Beta rule: an non-attributed IPv4 found in headers
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60% [score: 0.5000]
 2.0 RATWR10_MESSID         Message-ID has ratware pattern (HEXHEX.HEXHEX@)
 0.4 UPPERCASE_50_75        message body is 50-75% uppercase
 0.0 NO_RDNS2               Sending MTA has no reverse DNS
 1.0 RCVD_IN_SORBS          RCVD_IN_SORBS
---End Message---
feature req
need a --show-rule option to spamassassin cmd that will display all the information associated with a named rule (DESC, SCORE, rule syntax, etc) -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: feature req
On 2/15/2007 8:53 AM, Justin Mason wrote:

> Eric A. Hall writes:
>> need a --show-rule option to spamassassin cmd that will display all the information associated with a named rule (DESC, SCORE, rule syntax, etc)
>
> could you open a feat req on the bugzilla? it'll get lost otherwise.

bug 5335

> for what it's worth, we already (internally) use a tool called build/parse-rules-for-masses, which parses and generates a perl data structure representing the rules. If you're consuming it in perl, that'd be a good way to do it.

I'm looking at it from operator perspective--it's very time consuming to track down information about a rule and I'm thinking that this would make the process simpler.
Re: How to deal with mailing list spam?
On 1/24/2007 3:29 PM, Chris Purves wrote:

> I was wondering what is the best way to deal with spam that comes through on mailing lists? For mailing lists like spamassassin I whitelist all mail because I expect to see examples of spam, but for other lists, is it a good idea to run 'sa-learn --spam'? What about reporting those spam to razor/pyzor or spamcop?

1) subscribe to lists that are well run

2) whitelist the envelope-sender address, or the originating network, or in some cases you may also want to whitelist the list address itself so that directed replies that are TO you but CC the list also get boosted
Re: One person to filter spam
On 1/24/2007 12:37 PM, IT_Architect wrote:

> I'm thinking about using SpamAssassin. Is it possible to have the suspected spam go to one account to have one person clear or delete possible spam. When they say it's good, will it then go to the correct user?

It's possible to do that but not with spamassassin alone. You'll need a mailer that can resubmit the messages after they've been cleared, at the very least. That's trivial, but it's not something spamassassin does.
Re: INFO_TLD
On 1/16/2007 1:52 AM, Eric A. Hall wrote:

> On 1/16/2007 12:06 AM, Theo Van Dinter wrote:
>> On Mon, Jan 15, 2007 at 10:44:33PM -0500, Eric A. Hall wrote:
>>> sa-update nuked INFO_TLD which I was still finding useful
>>>
>>> can somebody with the rule send it to me? thanks

One of the aggressive porno spammers is all about the .info so in case anybody else is looking for these

uri      INFO_TLD /\.info(?::\d+)?(?:\/|$)/i
describe INFO_TLD Contains an URL in the INFO top-level domain
score    INFO_TLD 1.0

btw, I run with a lot higher than 1.0 here
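For anyone who wants to see what the pattern does and does not match before raising the score, here is the same expression ported to Python's re module. The sample URLs and the helper name are made up for the illustration.

```python
import re

# Same pattern as the uri rule: ".info", an optional ":port", then "/" or end.
INFO_TLD = re.compile(r"\.info(?::\d+)?(?:/|$)", re.IGNORECASE)


def hits_info_tld(uri):
    """Return True if the URI would match the INFO_TLD pattern."""
    return INFO_TLD.search(uri) is not None
```

Note that the pattern is not anchored to the host part, so a URI on another TLD whose path ends in ".info" (for example http://example.com/readme.info) also matches. That may matter if the rule is scored well above 1.0.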
INFO_TLD
sa-update nuked INFO_TLD which I was still finding useful can somebody with the rule send it to me? thanks -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: INFO_TLD
On 1/16/2007 12:06 AM, Theo Van Dinter wrote: On Mon, Jan 15, 2007 at 10:44:33PM -0500, Eric A. Hall wrote: sa-update nuked INFO_TLD which I was still finding useful can somebody with the rule send it to me? thanks It's pretty straightforward to write, but the rule still exists in the standard 3.1 install. Check out /usr/share/spamassassin or the tarball. got it -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
would SA benefit from port to Java
Thinking about the GPL Java announcement some, and trying to imagine the kinds of opportunities this allows for, it occurs to me that SpamAssassin might be a natural fit for Java. I'm just thinking out loud here, not advocating anything... Would it run better? Would it be faster, have smaller memory footprint, better reclamation, better hooks for plugins etc? OTOH, would it be harder to build, given the dependence of SA on perl modules? Thoughts? -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Feature Request: envelope scanning
On 10/25/2006 7:15 PM, Mark Martinec wrote: For envelope sender there is a standard header: Return-Path Return-Path is supposed to be added when the message is placed in the mailstore (ie, last hop, after the transfer network). Since I do scanning at the MTA level before delivery, I don't have Return-Path yet. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Scoring PTR's
On 10/24/2006 4:01 PM, John Rudd wrote:

> Eric A. Hall wrote:
>> Note that this is entirely legal, and even necessary:
>>
>> [ root# ] host 207.65.71.14
>> 14.71.65.207.in-addr.arpa is an alias for 14.in-addr.ntrg.com.
>> 14.in-addr.ntrg.com is an alias for 14.in-addr.labs.ntrg.com.
>> 14.in-addr.labs.ntrg.com domain name pointer bulldog.labs.ntrg.com.
>
> All of that's ok. The question is: is bulldog.labs.ntrg.com an A record, or a CNAME record? That's the thing I have been testing for (is it a CNAME). That's the thing that RFC1912 doesn't like (the PTR record itself, not merely in-addr.arpa aliases that eventually get to the PTR record, but the PTR record itself, may not _refer_to_ a CNAME record, it must refer to an A record)

There's nothing that prohibits the target domain name entry of a PTR from having a CNAME record. A PTR is just a pointer to some other domain name. The target domain name can have whatever records the owner feels they need. It's probably something that should be discouraged, since additional processing would be needed to obtain a complete answer, but on its face it's not illegal (again RFC1912 is informational, is not authoritative, and has significant errors).

You'd probably need a plugin to check for this, since you'd need to generate your own query for the RRs associated with the target domain name in order to get a definitive answer.

I'm not really sure this would be reliable spam-sign. I can imagine some legitimate uses for this.
Re: Feature Request: envelope scanning
On 10/25/2006 2:35 PM, Joe Flowers wrote:

> If I pre-pend a message's Envelope to its Body, can Spamassassin do anything useful with it?

At a minimum you can use the envelope recipient(s) to do some kinds of spam-trap filtering (eg, is the message addressed to a spamtrap and me). You can use the envelope sender to do some kinds of whitelisting too (such as whitelisting your aunt at yahoo even if you have the whole yahoo domain otherwise blacklisted, or whitelisting a mailing list sender). My LDAPfilter plugin (http://www.ehsco.com/misc/ldapfilter/) uses them for these kinds of purposes.

Other possibilities exist too. Envelope sender can be used for some SPF filters that aren't currently done, for example.

The first problem is that there is no standard header field, and in the case of envelope recipient(s) where there can be multiple entries, there is no standard for the field data. I use X-Envelope-To and X-Envelope-From with typical RFC822 address syntaxes (no real name blob, etc), but only because I had nothing else to use and that structure seems to be the most obvious and least harmful.

Another consideration is that they have to be created by the MTA, and spamassassin doesn't have possession of the envelope data so it can't create them. In my case I have to make postfix generate them in order for them to be usable, and the LDAPfilter plugin has .cf options that point to the header fields in question (eg, ldapfilter_env_from_header)

But yeah, if they are provided and if there is a way to tell spamassassin where to look, they are very useful.
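As a rough illustration of the spam-trap idea, the sketch below reads X-Envelope-From and X-Envelope-To with Python's standard email package. It assumes the MTA has already added those headers as described above. The function names and the sample addresses are hypothetical, and this is not how the LDAPfilter plugin is implemented.

```python
from email import message_from_string
from email.utils import getaddresses


def envelope_addresses(raw_message,
                       from_header="X-Envelope-From",
                       to_header="X-Envelope-To"):
    """Return (senders, recipients) taken from MTA-added envelope headers."""
    msg = message_from_string(raw_message)
    senders = [addr.lower()
               for _, addr in getaddresses(msg.get_all(from_header, []))
               if addr]
    recipients = [addr.lower()
                  for _, addr in getaddresses(msg.get_all(to_header, []))
                  if addr]
    return senders, recipients


def hits_spamtrap(recipients, spamtraps):
    """True if any envelope recipient is a known spamtrap address."""
    return any(r in spamtraps for r in recipients)
```

The header names are parameters because, as noted above, there is no standard field, so each site has to say where its MTA puts the envelope data.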
Re: Scoring PTR's
On 10/23/2006 10:50 PM, John Rudd wrote:

> Eric A. Hall wrote:
>> On 10/23/2006 7:01 PM, John Rudd wrote:
>>> a) does the hostname in the PTR record point to a CNAME instead of an A record
>>
>> That's not illegal. It's pretty common too, since subnet delegation of in-addr space only works on /8, /16 and /24 subnets due to the way that octets are mapped to domain name labels in that hierarchy.
>
> RFC 1912 says don't do that :-)

RFC1912 is informational non-authoritative. It has some big errors (ie, it says a label may not be all-numeric, which is wrong).

> Though, honestly, I've yet to see it actually get triggered in my mimedefang filter, so I don't mind losing it.

Can you clarify what you are looking for here?

Note that this is entirely legal, and even necessary:

[ root# ] host 207.65.71.14
14.71.65.207.in-addr.arpa is an alias for 14.in-addr.ntrg.com.
14.in-addr.ntrg.com is an alias for 14.in-addr.labs.ntrg.com.
14.in-addr.labs.ntrg.com domain name pointer bulldog.labs.ntrg.com.

In that example, the entry for 14.71.65.207.in-addr.arpa. has a CNAME RR pointing to 14.in-addr.ntrg.com. (the entry has been delegated to my zone using a CNAME), which in turn aliases to 14.in-addr.labs.ntrg.com., which in turn has a PTR record that resolves to bulldog.labs.ntrg.com.

A PTR record is just a pointer to some other domain name and only has semantic meaning when lookups are keyed to a name in the in-addr.arpa. hierarchy.
Re: Scoring PTR's
On 10/24/2006 7:55 AM, John Rudd wrote:

> Here's an example for one I got tonight (I got 3, but trashed the others before thinking I should send that as an example).
>
> (i577A0BC3.versanet.de [87.122.11.195])
>
> 577A0BC3 is the hex encoding of the IP address, with no separators.

That may be spam-sign, but unless there's something more than what you're showing it's not a standards violation.
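The hex check John describes is easy to express with the standard ipaddress module. The sketch below only covers the unseparated eight-digit form shown in the example, and the function names are made up for the illustration.

```python
import ipaddress


def ip_as_hex(ip):
    """Return an IPv4 address as eight uppercase hex digits, no separators."""
    return "%08X" % int(ipaddress.IPv4Address(ip))


def hostname_has_hex_ip(hostname, ip):
    """True if the hostname embeds the relay's IPv4 address in hex form."""
    return ip_as_hex(ip) in hostname.upper()
```

For the relay quoted above, 87.122.11.195 becomes 577A0BC3, which is the string embedded in i577A0BC3.versanet.de.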
Re: Scoring PTR's
On 10/23/2006 7:01 PM, John Rudd wrote:

> Eric A. Hall wrote:
>> http://www.ehsco.com/misc/spamassassin/std_compliance.cf might help or work for what you're doing. Make sure to read the disclaimers and warnings
>
> Those helped a lot. There's only three checks I can't do with them (probably need to use a plugin for it):
>
> a) does the hostname in the PTR record point to a CNAME instead of an A record

That's not illegal. It's pretty common too, since subnet delegation of in-addr space only works on /8, /16 and /24 subnets due to the way that octets are mapped to domain name labels in that hierarchy.

> b) does the hostname contain its IP address in _hex_ form (instead of in decimal form, which I've already got working)

I don't recall ever seeing that. If you create a rule for that you might also want to do octal notations too, which is another valid address encoding syntax that should never appear naturally.

> c) does the hostname in the PTR record actually go to an A record which includes the relay's IP addr

that's a reasonable test
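Check (c) is the usual forward-confirmed reverse DNS test. A minimal sketch of the logic follows, with the two lookups passed in as functions so that the example does not depend on live DNS. The function name and the sample data are hypothetical.

```python
def forward_confirmed(ip, ptr_lookup, a_lookup):
    """True if any PTR name for the IP has an A record that includes the IP.

    ptr_lookup(ip) returns a list of hostnames and a_lookup(name) returns
    a list of address strings. Both are supplied by the caller.
    """
    for name in ptr_lookup(ip):
        if ip in a_lookup(name):
            return True
    return False


# Made-up sample data standing in for real DNS answers.
PTR = {"192.0.2.10": ["mail.example.com"], "192.0.2.99": ["forged.example.com"]}
A = {"mail.example.com": ["192.0.2.10"], "forged.example.com": ["198.51.100.1"]}

good = forward_confirmed("192.0.2.10", lambda i: PTR.get(i, []), lambda n: A.get(n, []))
bad = forward_confirmed("192.0.2.99", lambda i: PTR.get(i, []), lambda n: A.get(n, []))
```

In a real check the two callables could wrap a resolver library, or socket.gethostbyaddr and socket.gethostbyname_ex from the standard library, although those follow the system resolver and hide whether a CNAME was involved.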
Re: R: R: Scoring PTR's
On 10/20/2006 10:43 AM, Giampaolo Tomassoni wrote: RFC 2821 Section 4.1.4 Order of Commands ... An SMTP server MAY verify that the domain name parameter in the EHLO command actually corresponds to the IP address of the client. However, the server MUST NOT refuse to accept a message for this reason if the verification fails: the information about verification failure is for logging and tracing only. ... It can mean whatever you like (do note MAY and MUST NOT though). It just mean you can't drop a message based solely on the parameter of the EHLO command. You MAY check it, if you like to. But you MUST NOT drop it. 2821 is for implementors, not operators. Software developers must not automatically drop mail for this reason --as a matter of design-- but as an operator you can do whatever you want with any piece of mail. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Scoring PTR's
http://www.ehsco.com/misc/spamassassin/std_compliance.cf might help or work for what you're doing. Make sure to read the disclaimers and warnings -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: R: R: Scoring PTR's
On 10/19/2006 7:11 PM, John Rudd wrote:

> It is my observation that the messages which come from an immediate relay that:
>
> A) does not have a PTR record, or
> B) has forged DNS (PTR record doesn't lead to an A record which resolves back to the SMTP client's IP address), or
> C) has a hostname that appears to be an end-client of some other network than my own (contains its own IP addr in the hostname, contains words like dynamic, dsl, dial-up, etc.)
>
> are generating spam.

It's a bigger list than that but yeah. My theory is that if they can't get their network configured, no telling what else is broken, so I flag it.

> In order to exempt my own legitimate users, I skip the check if they're on my IP block OR if they do SMTP-AUTH.

I've got two listeners, one for SMTP 25, one for SUBMIT 587. The latter only allows authenticated sessions. Mail sent to the former is heavily inspected while the session is active, while mail to the latter bypasses the filters altogether.

> The one thing I'm thinking about changing is, at home I _reject_ messages that fail these checks (using filter_sender in mimedefang). I'm thinking that, for the production systems at work, just to cover that incredibly small percentage of people who can't or won't use their ISP's mail server or do SMTP-AUTH, I'll merely quarantine their messages, via spam assassin score, instead of rejecting them.

Yeah, I moved almost everything out of postfix and into spamassassin so that I could work on probability instead of binary.

Just make sure to whitelist all traffic for any mailing list that you're on, possibly including to/cc whitelists.
Re: Any comments of the SpamHaus lawsuit?
On 10/11/2006 1:16 AM, Jason Haar wrote:

> If Spamhaus lose this lawsuit (which they are ignoring as they are

they lost it a while ago. summary judgement was in september. the latest papers are because spamhaus didn't comply with the default judgement. pretty routine stuff.

> Americans to arms I say... Start sending Internet for Dummies to the judge for starters ;-)

Actually the judge seems to have left an out for spamhaus in the original default judgement. Excerpting myself:

the original default injunctive order states that Spamhaus must not interfere with Mr. Linhardt's e-mail messages ...unless Spamhaus can demonstrate by clear and convincing evidence that Plaintiffs have violated relevant United States law. Well, that should be easy enough--there are millions of people who have received his spam, and it seems to be in violation of the CAN-SPAM act as I know it (and the judge might know it, too). Once demonstrated, the injunction would be partially lifted automatically. Better yet, affected parties could then pursue damages against Mr. Linhardt of their own, thus forcing him to back down.

The problem here is that Spamhaus isn't subject to U.S. jurisdiction (as it has argued itself) and so isn't eligible for relief under the CAN-SPAM act, either. Instead, it needs a U.S.-based partner to pursue this angle on its behalf. Worse, due to the way that the CAN-SPAM act is written, only certain parties can sue for damages, which further limits the pool of potential partners. However, many of the organizations that are eligible for relief are also some of Spamhaus' biggest beneficiaries (namely the ISPs that rely most on its filters), and so there should be a natural pool of willing partners for Spamhaus to choose from.

http://www.informationweek.com/blog/main/archives/2006/10/spamhaus_needs.html
Re: double letter porn
On 10/4/2006 5:57 PM, Richard Doyle wrote: I've been getting lots of porn site spam containing words with doubled letters, like this one: Can anybody suggest a rule or ruleset to catch these double-letter obfuscations? I'm using Spamassassin 3.1.4. You'd probably need to write a plug-in that used some kind of typo-matching logic to find porno words. Would be a good plug-in actually. Get busy :) -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: testing for empty text/plain
On 8/7/2006 12:25 AM, Theo Van Dinter wrote: On Mon, Aug 07, 2006 at 12:07:58AM -0400, Eric A. Hall wrote: Anybody written a rule that tests for empty text/plain, preferably only when a non-empty text/html or some other media-type is provided? Sounds very similar to MPART_ALT_DIFF. That might be useful as a pre-test filter, such as looking to see if MPART_ALT_DIFF fired before doing anything else. From there I can grep to see if text/plain has any printable characters. What's the most efficient way to grab the text/plain part? -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
testing for empty text/plain
Anybody written a rule that tests for empty text/plain, preferably only when a non-empty text/html or some other media-type is provided? Thanks -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
spec file for cpan2rpm and suse 9.3
Anybody got one that works with gnome/evolution? evolution requires spamassassin, which requires perl-spamassassin. cpan2rpm makes perl-Mail-SpamAssassin, which doesn't satisfy either of the packaging dependencies. Attempts on my part to tweak the spec file generated with cpan2rpm have failed miserably. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: spec file for cpan2rpm and suse 9.3
On 10/20/2005 1:27 PM, Eric A. Hall wrote: Anybody got one that works with gnome/evolution? evolution requires spamassassin, which requires perl-spamassassin. cpan2rpm makes perl-Mail-SpamAssassin, which doesn't satisfy either of the packaging dependencies. Attempts on my part to tweak the spec file generated with cpan2rpm have failed miserably. Fixed it by adding the following two lines to the header block:
Provides: perl-spamassassin
Provides: spamassassin
I was trying too hard before, thinking I needed version numbers etc. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Individual scoring at SMTP time
On 10/14/2005 6:40 AM, Magnus Holmgren wrote: If you want to reject spam at SMTP time (which I think all agree is a good practice as long as you weigh the risks against the benefits properly), but also want to apply individual settings (according to the "One person's ham is another person's spam" maxim), what is the best practice for handling multiple recipients (RCPT TO: commands)? Best current practice for rejecting per-user at MTA level is to configure your MTA so that it only allows one RCPT per message transfer, as you spotted in your option #1. 2. Run a global check at SMTP time, using a conservative ruleset (possibly including bayes with low scores) that only catches 100% certain spam, then let each user run SA a second time any way they want (but without the ability to reject, just accept/blackhole/file in spam folder as usual). That's the better way, although it loses the ability to reject modest spam at the gateway. 3. Run SA with custom configs for each user at SMTP time. Reject if a) any user rejects, b) all users reject, c) the majority rejects, d) the average score is above the average limit, e) other criterion. (Potentially time-consuming with many recipients, risking that the sending MTA times out.) You still need to limit recipients (actually more important since all processing load is now N*msg instead of 1*msg). 6. Write an RFC about changing the SMTP protocol to allow DATA before RCPT TO: (if the mail is sufficiently short). :-) There have been several drafts trying to attack this problem. Mine is at http://www.ntrg.com/misc/I-Ds/draft-hall-inline-dsn-01.txt and suggests returning per-RCPT response codes after the DATA ack (eg, user1 returns 250, while user2 returns 550). 1 and 2 are easy to implement, but I don't know if someone has implemented support for 3 in current software. It's feasible enough with some scripting work.
The hard part is applying the per-user settings into the chain (have to read the settings, apply them to the local scanning process without clobbering the others, etc). Really though the problem is load, since you are looking at multiples of scanning processes. FWIW, I wrote a primer on this kind of architecture for Network Computing Magazine, archived at http://www.ehsco.com/reading/20040916ncf.html -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
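The rejection criteria listed under option 3 can be sketched roughly as follows. This is only an illustration in Python, not anything from the thread: the function name `should_reject`, the policy labels, and the idea of passing one score and one limit per recipient are all assumptions made for the example.

```python
from statistics import mean

def should_reject(scores, thresholds, policy="any"):
    """Decide whether to reject a message sent to several recipients.

    scores[i] is the score the message got under recipient i's settings and
    thresholds[i] is that recipient's rejection limit (hypothetical inputs).
    """
    if not scores or len(scores) != len(thresholds):
        raise ValueError("need one score and one threshold per recipient")
    votes = [s >= t for s, t in zip(scores, thresholds)]
    if policy == "any":        # a) any user rejects
        return any(votes)
    if policy == "all":        # b) all users reject
        return all(votes)
    if policy == "majority":   # c) the majority rejects
        return sum(votes) * 2 > len(votes)
    if policy == "average":    # d) average score is above the average limit
        return mean(scores) > mean(thresholds)
    raise ValueError(f"unknown policy: {policy}")

scores = [6.2, 4.1, 9.0]
thresholds = [5.0, 5.0, 10.0]
for policy in ("any", "all", "majority", "average"):
    print(policy, should_reject(scores, thresholds, policy))
```

Whatever policy is picked, the scoring itself still has to run once per recipient, which is the N*msg load problem mentioned above.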
Re: [SPAM] RE: GeoCities Link-only spam
On 8/22/2005 3:34 PM, Derek Harding wrote: On Sun, 2005-08-21 at 20:05 -0400, Eric A. Hall wrote: What's the benefit of using this instead of the uridnsbl plugin? The code below will look for the IP address behind a URI and then query the cn-kr.blackholes.us RBL to see if that addr is in China: This one doesn't require a DNS lookup which makes it faster. IP::Country use Whois lookups instead though, and UDP/DNS lookups are going to be faster than chained TCP/Whois queries. blackholes.us only covers a limited set. Just an example for discussion purposes (worth noting that their main web site is down too). http://countries.nerd.dk/more.html is another one I'll play with the plugin and see what kind of times and load I get -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: [SPAM] RE: GeoCities Link-only spam
On 8/22/2005 3:50 PM, Eric A. Hall wrote: IP::Country use Whois lookups instead though, and UDP/DNS lookups are going to be faster than chained TCP/Whois queries. I'll play with the plugin and see what kind of times and load I get After some poking around: IP::Country::Fast uses a pre-built mapping database instead of issuing lookups (IP::Country::Slow) or caching lookups (IP::Country::Medium). The pre-built database is stored in the .gif files in /usr/lib/perl5/site_perl/5.8.6/IP/Country/Fast/ on my system, and presumably this stuff gets repackaged when IP allocations change. This means keeping the package synched, of course, but it does seem to be somewhat faster and requires less overhead. BTW, lookups for dead domain names are really slow and block the rest of the message processing. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: GeoCities Link-only spam
On 8/22/2005 4:14 PM, Dallas L. Engelken wrote: IP::Country use Whois lookups instead though Hrmm? Where does it say it uses Real-Time Whois lookups? The docu for IP::Country::Fast is empty and refers to IP::Country, which describes the use of whois. See my follow-up post though -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: OT: sa-learn, interfaced with Cyrus mailboxes
On 8/21/2005 1:59 AM, Forrest Aldrich wrote: I just switched over to Cyrus IMAP - and it didn't occur to me I'd need to change several ways I report spam, due to the mailstore format. I wonder who else is using Cyrus IMAP here, and how you may be handling this. I don't use sa-learn, but Cyrus mailstore is basically just a folder hierarchy in which each folder contains individual messages, each of which is its own mbox file. Just read *. into sa-learn and it should work on message-id the same as usual. Automatically moving the messages may be more of a problem. http://www.google.com/search?q=sa-learn+cyrus seems to return a bunch of relevant matches -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
SAGrey plugin
I've written a little plugin called SAGrey that provides a limited amount of greylisting functionality using SpamAssassin's existing services. SAGrey is two-phased, in that it first looks to see if the current score of the current message exceeds the user-defined threshold value (as set in one of the cf files), and then looks to see if the message sender's email and IP address tuple are already known to the auto-whitelist (AWL) repository. If the message is spam and the sender is unknown, SAGrey assumes that this is one-time spam from a throwaway or zombie account, the SAGREY rule fires, adds 1.0 to the current message score, and optionally creates a header field in the message itself. The rulename or header field can then be used to perform additional functions (EG, having your delivery or transfer agent defer delivery), or the score by itself can be used to penalize the message. This model has two benefits over MTA-specific greylisting mechanisms: first, it only subjects probable-spam to greylisting (instead of making everybody be deferred, which has known problems), and it repurposes the existing spamassassin history database (meaning no additional databases need to be maintained). Another benefit is that it can still work at the MTA level if your MTA can call spamassassin while the transfer is active and then defer delivery based on the presence of header-field data (postfix 2.x will not do this unfortunately, since the header checks don't provide a DEFER verb), but can also be used in other models (such as delivery routines). The plugin and cf are posted at http://www.ntrg.com/misc/sagrey/ and I've also updated the wiki -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
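The two-phase check described above can be sketched as below. This is a Python illustration rather than the plugin's Perl, and it makes assumptions: `sagrey_check` is a hypothetical name, and a plain set of (sender, IP) pairs stands in for the AWL repository that the real plugin consults through SpamAssassin.

```python
def sagrey_check(score, threshold, sender, ip, known_tuples):
    """Return (fired, adjusted_score) for one message.

    known_tuples is a set of (sender, ip) pairs standing in for the AWL
    repository; this is an assumption made for the sketch.
    """
    looks_like_spam = score > threshold          # phase 1: score exceeds threshold
    sender_known = (sender, ip) in known_tuples  # phase 2: tuple already in the AWL?
    if looks_like_spam and not sender_known:
        return True, score + 1.0                 # rule fires and adds 1.0
    return False, score

known = {("friend@example.org", "192.0.2.10")}
print(sagrey_check(7.5, 5.0, "stranger@example.net", "198.51.100.7", known))
print(sagrey_check(7.5, 5.0, "friend@example.org", "192.0.2.10", known))
print(sagrey_check(2.0, 5.0, "stranger@example.net", "198.51.100.7", known))
```

Only the first call fires: the message is over the threshold and the tuple is unknown. A known sender over the threshold, or an unknown sender under it, is left alone.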
Re: [SPAM] RE: GeoCities Link-only spam
On 8/8/2005 5:05 PM, Derek Harding wrote: It allows rules such as:
uricountry URICOUNTRY_CN CN
header URICOUNTRY_CN eval:check_uricountry('URICOUNTRY_CN')
describe URICOUNTRY_CN Contains a URI hosted in China
tflags URICOUNTRY_CN net
score URICOUNTRY_CN 2.0
What's the benefit of using this instead of the uridnsbl plugin? The code below will look for the IP address behind a URI and then query the cn-kr.blackholes.us RBL to see if that addr is in China:
uridnsbl URIBL_CNKR cn-kr.blackholes.us TXT
body URIBL_CNKR eval:check_uridnsbl('URIBL_CNKR')
tflags URIBL_CNKR net
score URIBL_CNKR 2.0
I'm sure there's a difference but I guess I'm not seeing it -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: having spamc/spamd include hostname?
On 8/20/2005 4:22 PM, Dan Mahoney, System Admin wrote: basically I will have a different sql userpref for [EMAIL PROTECTED] or [EMAIL PROTECTED], or different global defaults for hosta.com. This seems elementary to do, but I can't figure out how to make spamd tell which one to use -- maybe based on the connecting ip, maybe based on a command line/config file variable passed. A simple plug-in would probably do the trick. You'd need to call the Sys::Hostname::Long module yourself, since SA itself does not need or provide the local hostname itself. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: messages with no body
On 7/12/2005 8:59 PM, Loren Wilton wrote: Note that in business circles content includes the subject. As far as I know, rawbody won't see a subject. It is fairly common to send one line questions in the subject with an empty body, and one line replies likewise. I have trained my users better than that, which is why I don't care about these tests. Other people might tho. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: messages with no body
On 7/10/2005 4:41 PM, Eric A. Hall wrote: On 7/10/2005 3:49 PM, Loren Wilton wrote: However, if you want something like this, just off the top of my head:
header __HAS_TO To =~ /\S/
body __HAS_BODY /\S/
meta EMPTY_MSG (!__HAS_TO && !__HAS_BODY)
Good idea. rawbody works better but the model is right. As was pointed out off-list, this rule will wrongly fire if there is an attachment and no text body. The following rule is adapted from the suggested rules that were provided (I assume anonymity was desired from the off-list response so...). The rule checks for the presence of a nested media-type (message/ and multipart/ are the only nested types) or the presence of body data.
header __L_MSG_HAS_C_TYPE_M Content-Type =~ /^(message|multipart)/i
rawbody __L_MSG_HAS_BODY /\S/
describe L_MSG_NO_BODY Raw message does not have any body data
meta L_MSG_NO_BODY (!__L_MSG_HAS_C_TYPE_M && !__L_MSG_HAS_BODY)
score L_MSG_NO_BODY 0.1
There are lots of fancier things to look for but that is pretty minimal testing which is what I'm looking for. BTW, I am doing this so that postfix can trap the rule after the message has undergone filtering, so that the message can simply be rejected (there's no judgement as to spamminess here, just a check to see if the message has any content). -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
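For anyone who wants to try the same test outside SpamAssassin, here is a rough equivalent using Python's standard email package. It is a sketch under assumptions, not a translation of how SA evaluates rawbody rules: `has_no_body` is a made-up name, and the parser's view of the body is treated as a stand-in for the raw body.

```python
from email import message_from_string

def has_no_body(raw):
    """True when the message is not a nested type and its body is blank."""
    msg = message_from_string(raw)
    # message/* and multipart/* are the nested types; never flag those
    if msg.get_content_maintype() in ("message", "multipart"):
        return False
    body = msg.get_payload()
    if not isinstance(body, str):
        return False
    # same idea as the /\S/ test: any non-whitespace character counts as content
    return body.strip() == ""

empty = "From: a@example.com\nSubject: hi\n\n   \n"
plain = "From: a@example.com\nSubject: hi\n\nhello there\n"
attachment_only = (
    "From: a@example.com\n"
    'Content-Type: multipart/mixed; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: application/octet-stream\n"
    "\n"
    "AAAA\n"
    "--b--\n"
)
for raw in (empty, plain, attachment_only):
    print(has_no_body(raw))
```

The attachment-only message is the case that was raised off-list: it has no text body but is not flagged, because its top-level type is multipart.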
Re: messages with no body
On 7/10/2005 3:12 PM, Loren Wilton wrote: Anybody got a rule that will catch messages that don't have a body? There are things like that around. I have a rather draconian personal rule I use. There is a much milder form in one of the SARE rulesets. The problem is you can't check just missing body, as you will get way too many FPs in a business environment. I guess I should have asked the obvious question: and if so, could you post it? thanks -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: messages with no body
On 7/10/2005 3:49 PM, Loren Wilton wrote: However, if you want something like this, just off the top of my head:
header __HAS_TO To =~ /\S/
body __HAS_BODY /\S/
meta EMPTY_MSG (!__HAS_TO && !__HAS_BODY)
Good idea. rawbody works better but the model is right. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: messages with no body
On 7/10/2005 4:56 PM, Loren Wilton wrote: Rawbody will miss the subject, so you will need to add a test for that too. I'm not looking for that -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: OT: Insecure dependency in connect?
On 6/16/2005 5:47 PM, Eric A. Hall wrote: I'm trying to update my ldap plugin to use SRV lookups for server discovery but am getting barked at during tests with the Insecure dependency in connect... error. I'm not having much luck with googling this error, but I remember this was a problem with razor and spamassassin before, and I'm wondering if anybody knows what the resolution was. For the benefit of others, and for Google's cache:
#
# this stops IO::Socket from complaining about taint problems
#
if ($permsgstatus->{ldap_server} =~ /^(\S+)$/) {
    $permsgstatus->{ldap_server} = $1;
}
I found that in an unofficial SA patch to razor and it seems to do the trick here too. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
OT: Insecure dependency in connect?
I'm trying to update my ldap plugin to use SRV lookups for server discovery but am getting barked at during tests with the Insecure dependency in connect... error. I'm not having much luck with googling this error, but I remember this was a problem with razor and spamassassin before, and I'm wondering if anybody knows what the resolution was. Thanks -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: AWL pokes, and SAGray.pm
On 6/15/2005 3:20 PM, Justin Mason wrote: Eric -- you may have to patch the AutoWhitelist class to throw those numbers into variables hanging off the PerMsgStatus object. Then the plugin can access those values safely. I'd be +1 on applying a patch that simply sets a variable or two on the PerMsgStatus object as the AWL logic is run, that wouldn't have any noticeable effect during normal use (and it seems handy in general). I don't disagree that it would be handy in general, but I'm not sure it's useful strategy for this plugin given some of the synchronization issues at play here. In particular, AWL runs after all of the eval rules, and that is too late in the cycle for my rule to update the message. This is kind of tricky. On the one hand, the plugin needs to run after all of the other eval tests so that it can get the current spam score. But if it is going to assign a booster score to the message (+1.0 for being first-time spam from unknown source, and getting the outcome recorded in the appropriate header field), then it also needs to run before the end of the message processing so that SA is still in a position to modify the score (and the underlying message) appropriately. This means it has to be pretty much the last rule to run, which is proving to be pretty challenging in its own right. On top of that it also has to pull data from the AWL database, but without allowing AWL to actually run against the message (it would be too late for my eval rule to update the message at that point). Therefore, the easiest way for me to find out if the message has been seen by AWL is to just ask AWL directly, using the exposed method (but that doesn't seem to be working, for reasons unknown). So I agree with you as to general utility, but it won't really help with this plugin. I need to get the AWL method figured out, and I need to get the timing factors figured out (eg, how do I make the rule be last). 
I'm stuck on both of those, although I'll readily admit that I'm not really trying very hard either, since I've got other stuff to work on. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: AWL pokes, and SAGray.pm
On 6/10/2005 3:04 PM, Eric A. Hall wrote: What I specifically need from AWL is number of instances for the current sender tuple, with the value of one (for the current message) being the magic number. Any suggestions would be appreciated. http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_AutoWhitelist.html says that $meanscore = awl->check_address($addr, $originating_ip); is supposed to work for this but it always seems to return undef no matter what. Is it supposed to do what I think it's supposed to do or do I need to do some other stuff first (like setup a factory or whatever)? Looking through the permsgstatus docs, getting the threshold and current spam score values looks pretty simple. This doesn't seem to be easy, either. It looks like I have to put the code for pulling current score in a sub check_end {} block but it's not behaving... I'm trying to figure out what URIBLDNS does here but it's not simple like I'd hoped. So much for quick and dirty -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
AWL pokes, and SAGray.pm
I'm looking to do a quick-n-dirty plugin that:
1) reads the spam threshold score from config (eg, default is 5.0)
2) reads the spam score for the current message
3) checks whether the current score is greater than the threshold score, AND whether the auto-whitelist learner has not seen this sender tuple
4) appends a header field that says probable spam from unknown sender
The purpose of this is to allow my MTA to defer accepting messages that have this header field, providing a pseudo-greylisting feature that is keyed to spamassassin score and which reuses the AWL tracking. Using this approach, I can do selective keying on spam instead of everybody (thus minimizing collateral damage to the honest mail systems that don't respond well to greylisting), and can avoid implementing yet-another tracking system (if I can get away with reusing AWL). [I should state the obligatory -- this module won't do much for people who call SA from procmail. But in my setup, postfix is calling spamassassin during the transfer process and I'm currently rejecting spam over 8.0, and rerouting mail in the 5.0-8.0 range to a per-user Junk mail folder for quarantine. This module would simply defer mail in the 5.0-8.0 range the first time they try, while subsequent transfers would be quarantined as current behavior.] Looking through the permsgstatus docs, getting the threshold and current spam score values looks pretty simple. But there doesn't seem to be much support for working with the AWL system, and I'm looking for suggestions here. I don't want to manipulate the database since it may not exist (maybe it's using SQL storage or something). What I specifically need from AWL is number of instances for the current sender tuple, with the value of one (for the current message) being the magic number. Any suggestions would be appreciated. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
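The MTA-side handling described in the bracketed note can be summarized with a small sketch. It is Python for illustration only: the function name `disposition` and the exact boundary handling (over 8.0 rejects, 5.0 up to and including 8.0 is the grey band) are assumptions, and `times_seen` is taken to include the current message, so 1 means a first-time sender.

```python
def disposition(score, times_seen, spam_threshold=5.0, reject_threshold=8.0):
    """Map a score and an AWL-style seen count to what the MTA would do."""
    if score > reject_threshold:
        return "reject"                      # spam over 8.0 is refused outright
    if score >= spam_threshold:
        # first sighting of this sender tuple: defer and let a real MTA retry
        return "defer" if times_seen <= 1 else "quarantine"
    return "deliver"

for score, seen in ((9.1, 1), (6.0, 1), (6.0, 3), (1.2, 1)):
    print(score, seen, disposition(score, seen))
```

The only new behavior compared with the current setup is the "defer" outcome; everything else matches the reject and quarantine bands already in place.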
ldapfilter.pm updated
FYI to the handful of people that use it, the ldapfilter.pm plugin on http://www.ntrg.com/misc/ldapfilter/ has been updated to v0.02 The significant change was the use of an eval {} timer block around the LDAP searches, so that if Net::LDAP doesn't come back on its own, the plugin timer kicks in. This seems to have fixed the sporadic timeout problems with LDAP searches, and it seems to operate in persistent mode reliably now. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
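The general idea of putting a timer around a call that might never return can be sketched as follows. Note the swap: the plugin uses a Perl eval {} timer block, while this Python illustration uses a worker thread with a result timeout instead. `call_with_timeout` and `slow_lookup` are hypothetical names, and `slow_lookup` merely sleeps in place of an LDAP search.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def call_with_timeout(func, timeout, *args, default=None):
    """Run func(*args) but give up and return default after timeout seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return default
    finally:
        # do not wait for a stuck worker; it is abandoned, not killed
        pool.shutdown(wait=False)

def slow_lookup():
    """Stand-in for a directory search that takes too long."""
    time.sleep(0.3)
    return "late answer"

print(call_with_timeout(lambda: "quick answer", 1.0))
print(call_with_timeout(slow_lookup, 0.05, default="timed out"))
```

One difference worth keeping in mind: the abandoned thread keeps running until the call finishes on its own, so this bounds how long the caller waits, not how long the lookup runs.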
Re: Is Bayes Really Necessary?
On 5/26/2005 10:08 AM, Jake Colman wrote: Given the rather complete set of rules that ship with SA and which can expanded with SARE, does bayes learning really help? Won't the rules catch pretty much everything anyway? The base SA install is insufficient, but if you tweak the scores and add some additional tests, you can get by without bayes just fine. I use a select set of RBLs, Razor, rulesets from rulesemporium, and my own LDAP-based weighting plugin, and my highest spam only gets an average of one spam per day, and even those are over the 5.0 threshold (so they are auto-filed into the Junk Email folder). Bayes is great for per-user stuff, but unless you are willing to manage the per-user databases (which I'm not), it is easier to just tweak the system scores and rules. Less management overhead, less CPU, etc. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Comparison of SA and commercial solutions
On 5/26/2005 10:30 AM, Chris Santerre wrote: Understood, and very good effort by you to educate them. Mostly all the reviews slam the cost benefit of SA with the Pay an employee to support it. line of crap. Every filtering system requires admin time, and if the reviews don't say as much then they're junk. There is a critical difference with SA, however, which is that the admins need to be proficient at stuff like CPAN, Perl, etc., while some of the packaged offerings provide simple click-the-button GUI, and those can have significantly lower salary associations. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
LDAPfilter plugin posted
I got my plugin finished (I think) and have posted links to the plugin and documentation at http://www.ntrg.com/misc/ldapfilter/ Is the wiki locked? I wanted to post a link there but the pages don't appear to be editable. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: LDAPfilter plugin posted
On 5/24/2005 2:29 PM, Justin Mason wrote: the plugin looks good. did you run into any more weirdness in the core that we should look at? any core APIs that aren't documented but should be, etc.? I think I submitted all the bugs and wishlist items that seemed reasonable. One cleanup point is that the permsgstatus docs still list finish() which is still apparently dead, and now includes per_msg_finish() which is apparently new. I'm not using either of them for portability reasons but there ya have it. One thing I'd like to request would be ability to fetch explicit data-types from permsgstatus. What I mean by that is stuff like ->myhostname() and ->mailboxaddr(0) and so forth. The rules are hard for newbies to understand so they will get them wrong, and for bad programmers like me they are too much trouble to write cleanly, so being able to just ask SA for well-known data-types would be a big help. The only bug I know of in this plugin is that Net::LDAP doesn't always come back from a query when persistency is enabled and I can't figure out why, but that doesn't seem to have anything to do with SA, and it might be an artifact of my system's super-weird kernel/perl setup. Is the wiki locked? I wanted to post a link there but the pages don't appear to be editable. you need to create an account and log in. (I think there's a mention of this somewhere on the front page and the user accounts page...) Okay I'll check, thanks. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Relaying Server and sa-learn --spam
Matt Kettler wrote: I've never played with thunderbird's forward as a attachment feature, but you might be able to use that. In this situation you'd need to set up a script that strips off the attachment and feeds the attachment to sa-learn. It creates a message/rfc822 attachment, just like what SA does when it creates a report for an (attached) message. Stripping the embedded message out should be relatively straightforward using some of the mime tools. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
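As a rough idea of what "stripping the embedded message out" could look like, here is a sketch using Python's standard email package (one of many MIME tools that would do). The function name `extract_forwarded` and the sample addresses are made up for the example.

```python
from email import message_from_string

def extract_forwarded(raw):
    """Return each message/rfc822 attachment in raw as its own message text."""
    msg = message_from_string(raw)
    found = []
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            inner = part.get_payload(0)   # the embedded message object
            found.append(inner.as_string())
    return found

wrapper = (
    "From: user@example.com\n"
    "Subject: Fwd: spam\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "see attached\n"
    "--XX\n"
    "Content-Type: message/rfc822\n"
    "\n"
    "From: spammer@example.net\n"
    "Subject: Buy now\n"
    "\n"
    "spam body\n"
    "--XX--\n"
)
for text in extract_forwarded(wrapper):
    print(text)
```

Each string returned is the original message with its own headers, which is the form that would then be handed to sa-learn.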
Re: SPAMassassin headers missplaced and follow message body
On 5/11/2005 6:58 AM, Martin G. Diehl wrote: I saw a SPAM message with the SPAMassassin message headers (X-spam headers) grossly out of sequence. The message was recognized as SPAM ... but because the X-spam headers were written in the wrong part of the message, it was able I get this periodically too. Very annoying. I haven't really looked into this much yet, but it appears that some embedded CR or LF characters are getting processed by SA and then fed back to Postfix, which then cleans up the message and splits the headers where it sees the bare CR or LF. The result is two sets of headers, the second of which naturally becomes part of the body. I've dealt with this phenomenon by having postfix check the message body for the locally-generated X-Spam-NTRG header (apart from the header block check), and reject those messages. If somebody wants to see the message I should have it in my trash still. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: SPAMassassin headers missplaced and follow message body
On 5/11/2005 3:02 PM, Martin G. Diehl wrote: Eric A. Hall wrote: I haven't really looked into this much yet, but it appears that some embedded CR or LF characters are getting processed by SA and then fed back to Postfix, which then cleans up the message and splits the headers where it sees the bare CR or LF. The result is two sets of headers, the second of which naturally becomes part of the body. If somebody wants to see the message I should have it in my trash still. Please send the headers for that message.
Return-Path: [EMAIL PROTECTED]
Received: from goose.ehsco.com (localhost [127.0.0.1]) by goose.ehsco.com (Cyrus v2.2.3) with LMTP; Tue, 10 May 2005 04:01:56 -0500
X-Sieve: CMU Sieve 2.2
Received: from goose.ehsco.com (localhost [127.0.0.1]) by clean.ehsco.com (Postfix ) with ESMTP id 5AED93D877 for [EMAIL PROTECTED]; Tue, 10 May 2005 04:01:48 -0500 (CDT)
X-Envelope-Sender: [EMAIL PROTECTED]
X-Envelope-Recipients: [EMAIL PROTECTED]
Received: from 24.232.159.2 (OL2-159.fibertel.com.ar [24.232.159.2]) by goose.ehsco.com (Postfix ) with SMTP for [EMAIL PROTECTED]; Tue, 10 May 2005 04:01:48 -0500 (CDT)
Received: from 168.213.224.150 by ; Tue, 10 May 2005 21:58:35 +0100
Message-Id: [EMAIL PROTECTED]
Date: Tue, 10 May 2005 04:01:48 -0500 (CDT)
From: [EMAIL PROTECTED]
To: undisclosed-recipients:; sdp.com.arMSS_ID
From: Pablo [EMAIL PROTECTED]
Subject: Su sitio web en doce cuotas de 35 pesos
Date: Wed, 11 May 2005 02:59:35 +0600
MIME-Version: 1.0
Content-Type: multipart/related; type=multipart/alternative; boundary==_NextPart_000_0001_01C55496.4B31A720
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1106
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-Spam-Status: Yes
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on goose.ehsco.com
X-Spam-NTRG: *** (19.0); AWL,DNS_FROM_RFC_ABUSE, EXTRA_MPART_TYPE,FORGED_MUA_OUTLOOK,FORGED_RCVD_HELO,HTML_10_20, HTML_MESSAGE,L_SMTP_MANY_PROBS,MIME_MISSING_BOUNDARY, RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_NERDS_AR,RCVD_IN_NJABL_DUL, RCVD_IN_SORBS_DUL,RCVD_NUMERIC_HELO,UNWANTED_LANGUAGE_BODY
X-Spam-Virus: No
-- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: SPAMassassin headers missplaced and follow message body
On 5/11/2005 2:51 PM, Kevin W. Gagel wrote: On 5/11/2005 6:58 AM, Martin G. Diehl wrote: I haven't really looked into this much yet, but it appears that some embedded CR or LF characters are getting processed by SA and then fed back to Postfix, which then cleans up the message and splits the headers where it sees the bare CR or LF. The result is two sets of headers, the second of which naturally becomes part of the body. SpamAssassin does not alter the message. Like I said, I haven't really looked into it very closely and I don't know who's doing the conversion of bare CR/LF into CRLF pairs. How sure are you that SA doesn't do conversion? I don't have much doubt that postfix cleanup is doing this, but frankly it seems more likely to be SA. All MTAs will interpret the first blank line as the beginning of the body. No kidding. The problem we are seeing happens when there is an EOL marker at the end of a header, and when that is cleaned up we have two CRLF pairs all of a sudden, with all of the headers which follow suddenly being part of the message body. Trying to figure out who/where this is happening is the exercise -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
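The suspected mechanism is easy to reproduce in isolation. The sketch below is Python and purely illustrative: it does not claim to show what SpamAssassin or Postfix actually do, only what happens when any component turns a stray CR at the end of a header line into a full CRLF. The function names and the sample message are made up.

```python
import re

def normalize_eol(raw):
    """Turn every bare CR or bare LF into CRLF, leaving existing CRLF alone."""
    return re.sub(r"\r\n|\r|\n", "\r\n", raw)

def split_message(raw):
    """Split at the first blank line, the way a mail parser finds the body."""
    head, _, body = raw.partition("\r\n\r\n")
    return head.split("\r\n"), body

# A Subject line that carries one extra bare CR before its CRLF
raw = (
    "From: a@example.com\r\n"
    "Subject: offer\r\r\n"
    "X-Mailer: Foo\r\n"
    "\r\n"
    "body text\r\n"
)

headers_before, body_before = split_message(raw)
headers_after, body_after = split_message(normalize_eol(raw))
print(len(headers_before), repr(body_before))
print(len(headers_after), repr(body_after))
```

Before the cleanup there are three header lines; afterwards the extra CR has become a blank line, so X-Mailer and anything appended after it land in the body, which matches the symptom described in this thread.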
Re: SPAMassassin headers missplaced and follow message body
On 5/11/2005 3:28 PM, Justin Mason wrote: BTW I've seen some similar messages -- as far as I can see, when it happens in my case, it's one of postfix, procmail or my MUA which is interpreting the message structure wrongly due to the whitespace weirdness. That's possible too, if whitespace is wrapped improperly it can be read as a blank line. My setup is arranged so that postfix SMTP receives the mail, hands it off to spamassassin while the session is still open, and then examines the message that comes back from SA for header flags. If the headers show that the message is spam, then postfix rejects the message, but otherwise accepts the mail and then hands it off to the cleanup agent. Looking at the headers that come in and out of spamassassin is what leads me to believe that it's doing the mangling. In particular, we can assume that SA didn't see the blank line in the headers that it read (or else it would have appended the lines at the end of the top block, not the second block), so it seems kind of likely that writing a new headers block is what causes the conversion to happen, and results in CR/blanks/whatever getting turned into a blank line. OTOH, I know that postfix does some cleanup before it performs analysis (it adds Message-ID and does other stuff), so it is entirely possible that it is doing a CR/null conversion as part of that. Very annoying whatever it is -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: spamd or amavisd-new
On 5/6/2005 5:38 AM, Beast wrote: I would like to create a mail/antispam gateway using postfix, sqlgrey and spam assassin. I don't want to install AV on this gateway because it is already handled separately by each internal mail server. What is the recommendation on SA setup and which is preferred, using spamd or amavisd-new (traffic is around 15k-20k/day). I use SpamPD [http://www.worlddesign.com/index.cfm/rd/mta/spampd.htm] so that I can call SpamAssassin from the Postfix proxy filter mechanism [http://www.postfix.org/SMTPD_PROXY_README.html], meaning in-line rejections instead of after-transfer rejections. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: What is better DCC or Razor2?
On 4/17/2005 11:23 AM, Robert Nicholson wrote: I currently run DCC and since adding But what benefit is there in running razor2? DCC just checks for volume, and doesn't quantify content. Mailings from ~cnet or elsewhere end up getting the same rank as spam, so you really have to couple DCC with a whitelisting system of some kind. Razor scores are based on tags that reflect on content, the credibility of the reporter, etc. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: SpamAssassin Without Bayes
On 4/4/2005 12:28 PM, Gustafson, Tim wrote: I know that Bayes is the de facto best way to fight SPAM right now, but I wonder if anyone out there is running SA without Bayes turned on and what their experience with it is? I have it turned off and don't miss it. Tweaking your rules works just as well, and you don't have to maintain a bunch of user-specific databases. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Effectiveness
On 3/28/2005 9:30 AM, Matt wrote: That worked but you're right, it has no effect on the autolearn=spam. Any idea how I get it to autolearn all email to a given address as spam? Can you pipe incoming mail for that account to sa-learn? -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Effectiveness
On 3/28/2005 2:07 PM, Daryl C. W. O'Shea wrote: Better yet, is to not even bother running mail for that account through SpamAssassin in the first place and instead just pipe it to sa-learn. No point in filtering mail that you are positive is 100% spam. Except that he wants to blacklist for all of the other recipients too, so running it through SA with blacklist_to is needed for that, even with really high bayes marks. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Effectiveness
On 3/26/2005 4:47 PM, Matt wrote: blacklist_to appears to add 10 points to spam score. I would like to change it so it adds 20 points. How would I do that? Reason being that way blacklist_to messages will always be scored high enough to trigger them to be bayes auto_learn spam. Add this to one of your *.cf files: score USER_IN_BLACKLIST_TO 100.0 (or whatever score you want). Dunno if the bayes auto-learner works with blacklist_to rules; it doesn't work with some whitelist rules. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: question about greylisting
alan premselaar wrote: Rob McEwen wrote: I have a question about greylisting. Does greylisting **always** involve blocking upon receipt of the SMTP envelope and not accepting the rest of the message? Or, can greylisting alternatively work where it **does** accept the **entire** message (for auditing purposes, for example) and THEN returns the temporary rejection code? however, temporarily rejecting the message after fully receiving it and processing it kind of defeats the purpose of greylisting. (or at least one major purpose of it) Yeah, it would still require CPU processing, which is one of the advantages of refusing to accept the mail in the first place. OTOH, it would still have value in terms of keeping spam away from the end-users, which is its own reward sometimes. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: RES: Dictionary Attack
On 3/23/2005 4:16 PM, Matt Kettler wrote: Daniel A. de Araujo wrote: Thanks Matt. The 2nd option looks fine, but we use Postfix. Do u (or somebody) know how to implement this option at Postfix ? Try looking at smtpd_error_sleep_time and smtpd_soft_error_limit at this page: http://www.postfix.org/rate.html That's the right track definitely. I use:
    smtpd_error_sleep_time = 10s
    smtpd_soft_error_limit = 3
    smtpd_hard_error_limit = 5
That stops most malware and dictionary attacks but still tolerates problematic clients and my fat-fingered tests. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Effectiveness
On 3/23/2005 12:01 PM, Matt wrote: Another thing is I have several domains. One is from our dialup ISP 10 years old. It has several email addresses that are dead and receive nothing but junk and lots of it. About 20 pieces or more an hour. Is there any way I can use these to improve the effectiveness of Spamassassin? Add them to your cf with a blacklist_to [EMAIL PROTECTED] entry and they'll make good spamtraps for other recipients of those same messages (but will have no effect on recipients of other copies that were sent under separate cover). You could also record the message-id and/or envelope sender (among other things) and deal with secondary copies that way. One thing I'm noticing more of lately is that some spam will come from three or four sources all at once, which is presumably happening because somebody has submitted the spam and mailing list to multiple trojaned PCs, so my spamtraps are having a little bit less success lately, but they still work very well. You can also use the messages to feed a ~global bayes training process if you're willing to accept the possible side-effects of one-dimensional training. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: How do I whitelist this list?
Jim Maul wrote: While the above works great for people using procmail, does anyone have a solution that works without procmail? whitelist_from_rcvd [EMAIL PROTECTED] apache.org worked when I used static whitelists. I had a bunch of similar entries for various mailing lists in a big whitelists.cf file in /etc/mail/spamassassin -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: plugins and parrallelization
Eric A. Hall wrote: I'm storing the session variables (such as login status) as part of $self, and storing message variables with $permsgstatus. But where do I put the logout/disconnect code? DESTROY seems to get called after every message (seems to but I'm fairly blurry at this point), which causes the session to get killed after every message. Where am I supposed to put this stuff? Got around to looking at this some more. DESTROY() does actually get called when everything is being zapped, but that is way too late to do anything useful (Net::LDAP is already dead, for example). http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Plugin.html says $plugin->finish() is called when the Mail::SpamAssassin object is destroyed, but either that is wrong or there is a bug, because as near as I can tell finish() never gets called, and it doesn't appear to even get probed (as opposed to $plugin->parse_config, which shows up in debug probes, and even gets called). Is this a bug? Frankly I'm not sure that finish() would work, since the description sounds like it happens at the same time as DESTROY(), which is no different. What would be really useful here would be something that SA calls after it is done hitting all of the rules that it's going to. That probably ought to be finish(), and maybe it is, dunno. I can post this on bugzilla so it can be ignored there too. :o -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
plug-in timeouts
Every so often I get spampd complaining about a time-out while SA is trying to interact with one of my eval functions. I've watched the logs, and what basically happens is that the plug-in *sometimes* goes to sleep when the (current) first eval rule in a batch is activated. It seems to hit a couple of them more than others, which is what strikes me as the most suspicious. Verbatim log data below is absolutely typical:
Mar 21 05:10:59 goose spampd[28292]: debug: running raw-body-text per-line regexp tests; score so far=-99.95
Mar 21 05:10:59 goose spampd[28292]: debug: running full-text regexp tests; score so far=-99.95
Mar 21 05:10:59 goose spampd[28292]: debug: ClamAV: No virus detected
Mar 21 05:10:59 goose spampd[28292]: debug: DCCifd is not available: no r/w dccifd socket found.
Mar 21 05:10:59 goose spampd[28292]: debug: Running tests for priority: 500
Mar 21 05:10:59 goose spampd[28169]: Failed to run LDAP_MSG_FROM_LIGHT SpamAssassin test, skipping: (Timed out! )
Mar 21 05:10:59 goose spampd[28169]: debug: forged-HELO: from=apache.org helo=apache.org by=ehsco.com
Mar 21 05:10:59 goose spampd[28169]: debug: forged-HELO: from=smtp-vbr11.xs4all.nl helo=smtp-vbr11.xs4all.nl by=apache.org
Mar 21 05:10:59 goose spampd[28169]: debug: forged-HELO: from=webmail7.xs4all.nl helo=webmail.xs4all.nl by=smtp-vbr11.xs4all.nl
Mar 21 05:10:59 goose spampd[28169]: debug: forged-HELO: mismatch on HELO: 'webmail.xs4all.nl' != 'webmail7.xs4all.nl'
Mar 21 05:10:59 goose spampd[28169]: debug: forged-HELO: from=adsl.xs4all.nl helo= by=webmail.xs4all.nl
Everything is hunky-dory and then poop no-habla-API... You can tell that the plug-in itself wasn't even activated because there's no debug output from it. Also, once spampd recovers it goes right into the next set of tests, and on the next message the plug-in will be working fine again... LDAP_MSG_FROM_LIGHT is by far the most common rule to be cited, and yes it works fine when it doesn't trigger a suicide pact. It is the 23rd eval rule in the cf, if that means anything. I've done some back-end debugging, and there aren't any protocol problems like dropped connections or anything that would suggest network trouble (I've even switched to LDAPI sessions via UNIX domain socket and it still happens). So my first guess is that something in the plug-in has gone into blocking mode. This doesn't seem to happen with any other plug-ins, so I'm guessing this has something to do with one of the modules I'm using, or with something in my plug-in itself. Does this ring any bells for anybody? Could there be too many open eval calls (there are a couple of dozen in my LDAP cf), excess garbage that needs to be collected more frequently, or anything like that? The only other thing I can think of would be that the LDAP server itself is blocked, but like I said the protocol traces don't show any problems, and it's a pretty common cluster of crashes, mostly failing on the same rules (but succeeding on them the majority of the time). Quite curious. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Best way to disable a test from running?
Vicki Brown wrote: I could give it a score of 0 but I'd like to simply say don't even test against it. I'm getting tired of seeing ALL_TRUSTED. We run SMTP; they connect directly to us; there are no interim hosts. You just want to do this for specific hosts, or period? -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: DCC License Change
Greg Allen wrote: I read through some of these postings at rhyolite.com. It sounds to me like DCC should be off in SA by default going forward, or possibly completely removed from SA future versions so users don't accidentally get in a license/legal dispute without their knowledge. Seems to me that most of this stuff should be using the plug-in interface anyway. So maybe just move it out of the core and into a plug-in, and then hand the module off for Vernon to do whatever he feels like with it. SA can still provide pointers in the distro and a link on the Wiki. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
call-back plug-in
I'm thinking that SA might also benefit from a call-back plug-in that looked at the MAIL-FROM and various 822 addresses, opened a connection to the mail server for the domain[s], and verified the sender's address as valid. This would actually be a fair bit of effort given all the stuff that has to be done (MX and fall-back processing, connection management within a time-limit, etc). I'm also aware that some people really dislike these things. The real question it seems is the amount of spam something like this might catch. I've done some poking in google but can't seem to find trustworthy numbers and experiences. Anybody got any thoughts here? -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: call-back plug-in
List Mail User wrote: since you have mentioned that you do/did use Postfix - there is an option to have Postfix perform that task. Yeah, but I'm on a mission to convert the binary pass-fail tests in postfix into probability tests in SA, and this is on that list. I don't even use the call-back system in postfix here, but friends and clients have been known to. That said: everyone I find doing callbacks, gets a letter asking them to stop (at least to my addresses); Until I recognize the pattern, they look just like SMTP port scanners and/or address verification/harvesting `bots. Also, the Postfix notes warn that you should expect people to complain if you enable the option:) Yep. One option might be to cache addresses so that it only does it once per sender per ~six month window, although I'm not keen on keeping a database with this. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
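The once-per-sender caching idea in the message above could be sketched roughly as follows. This is only an illustration in Python, not SpamAssassin or Postfix code: the class name CallbackCache, the verify callable (standing in for the actual SMTP call-back probe), and the 180-day figure for "~six months" are all assumptions made for the example. Lower-casing the whole address is also a simplification.

```python
import time

SIX_MONTHS = 180 * 24 * 3600  # assumed approximation of a six-month window, in seconds


class CallbackCache:
    """Remember call-back results so each sender is probed at most once per window."""

    def __init__(self, verify, ttl=SIX_MONTHS, clock=time.time):
        self._verify = verify   # hypothetical probe: address -> bool
        self._ttl = ttl
        self._clock = clock     # injectable so the expiry logic can be tested
        self._seen = {}         # address -> (timestamp, result)

    def check(self, address):
        key = address.lower()   # simplification: treats the local part as case-insensitive
        now = self._clock()
        hit = self._seen.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]       # still fresh, so no new probe is sent
        result = self._verify(key)
        self._seen[key] = (now, result)
        return result
```

A persistent store would be needed in practice for the cache to survive restarts, which is exactly the database the message says it would rather avoid.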
what diff between init.pre and local.cf?
I'm trying to figure out any issues regarding config data and my ldapBlacklist plug-in, and this is a mystery to me. What purpose does init.pre serve exactly, if local.cf and user_prefs can load the same plug-in modules? -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: plugins and parrallelization
Justin Mason wrote: yeah -- as discussed in the Plugin pod docs, the life-cycle of the objects you have access to there is: I'm currently trying to work this so the LDAP session is maintained for the lifetime of the module. TCP sessions are pretty expensive, and having hundreds or even thousands of dead sessions lying around in timeout mode (not uncommon for busy sites) is going to be very undesirable. I'm storing the session variables (such as login status) as part of $self, and storing message variables with $permsgstatus. But where do I put the logout/disconnect code? DESTROY seems to get called after every message (seems to but I'm fairly blurry at this point), which causes the session to get killed after every message. Where am I supposed to put this stuff? Thanks -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Is this Received header correctly formatted?
mouss wrote: Eric A. Hall wrote: Huh? The helo= stuff is inside the parentheses. Perhaps I am missing something but your point 3 seems to conflict with your point 2. comments are only allowed where whitespace occurs can you give me the line num in the rfc? It's actually somewhat stricter than that, and actually says that comments can only be used where folding would occur (that's a hyper-technical but accurate reading; see the robustness principle). Here is what rfc2822 says:

3.2.3. Folding white space and comments [...] There are several places in this standard where comments and FWS may be freely inserted. To accommodate that syntax, an additional token for "CFWS" is defined for places where comments and/or FWS can occur. However, where CFWS occurs in this standard, it MUST NOT be inserted in such a way that any line of a folded header field is made up entirely of WSP characters and nothing else.

    FWS      = ([*WSP CRLF] 1*WSP) /   ; Folding white space
               obs-FWS
    ctext    = NO-WS-CTL /             ; Non white space controls
               %d33-39 /               ; The rest of the US-ASCII
               %d42-91 /               ;  characters not including "(",
               %d93-126                ;  ")", or "\"
    ccontent = ctext / quoted-pair / comment
    comment  = "(" *([FWS] ccontent) [FWS] ")"
    CFWS     = *([FWS] comment) (([FWS] comment) / FWS)

Throughout this standard, where FWS (the folding white space token) appears, it indicates a place where header folding, as discussed in section 2.2.3, may take place. Wherever header folding appears in a message (that is, a header field body containing a CRLF followed by any WSP), header unfolding (removal of the CRLF) is performed before any further lexical analysis is performed on that header field according to this standard. That is to say, any CRLF that appears in FWS is semantically "invisible."

A comment is normally used in a structured field body to provide some human readable informational text. Since a comment is allowed to contain FWS, folding is permitted within the comment. Also note that since quoted-pair is allowed in a comment, the parentheses and backslash characters may appear in a comment so long as they appear as a quoted-pair. Semantically, the enclosing parentheses are not part of the comment; the comment is what is contained between the two parentheses. As stated earlier, the "\" in any quoted-pair and the CRLF in any FWS that appears within the comment are semantically "invisible" and therefore not part of the comment either.

Runs of FWS, comment or CFWS that occur between lexical tokens in a structured field header are semantically interpreted as a single space character.

RFC 2822 is slightly stricter than RFC 822 in this regard. And while it's not a full standard like 822, it is a standards-track update to 822 and was sanctioned by the IESG as such, and was developed after years of debate over good and bad behavior. and even then, the original thing was: Received: from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net ([4.16.241.28] helo=watson1) and here helo=watson1 is inside parens, and with whitespace (before and after the parens). or am I missing something? Check the BNF again. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
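To make the comment rules quoted above concrete, here is a rough Python sketch that removes comments from an already-unfolded structured header value, handling nesting and quoted-pairs and leaving a single space where a comment run was. The function name strip_comments is made up for this illustration; it is not how SpamAssassin or any MTA parses Received headers. It also collapses all runs of whitespace, including inside quoted strings, which a real parser would not do.

```python
def strip_comments(value):
    """Remove RFC 2822 comments (nested parentheses) from an unfolded header value."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string, where parentheses are literal
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep it outside comments, drop it inside them
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")  # a comment run reads as a single space
        elif depth == 0:
            out.append(ch)
        i += 1
    # simplification: normalise every whitespace run to one space
    return " ".join("".join(out).split())
```

Run against the header from this thread, the parenthesised "([4.16.241.28] helo=watson1)" and "(Exim 3.33 #1)" parts disappear and only the from/by/with/id tokens remain, which is why the helo= text has no standing as protocol data.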
Re: Is this Received header correctly formatted?
Christopher Weimann wrote: On 03/16/2005-04:49AM, Eric A. Hall wrote: Loren Wilton wrote: Received: from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net ([4.16.241.28] helo=watson1) by pop-a065d23.pas.sa.earthlink.net with smtp (Exim 3.33 #1) id 1DBKRe-Kp-00; Tue, 15 Mar 2005 14:23:22 -0800 [snip] 2) header data in parentheses is comment data. comments are supposed to be ~allowed anywhere that whitespace is allowed (this rule is actually documented in RFC2822, which governs header fields). with that in mind, yes, it's fine there. 3) the helo= stuff isn't conformant Huh? The helo= stuff is inside the parentheses. Perhaps I am missing something but your point 3 seems to conflict with your point 2. comments are only allowed where whitespace occurs -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
need testers for ldapBlacklist.pm plug-in
I got the ldapBlacklist plug-in pretty much finished, and it just needs some polishing I think. I'd like to get some help testing this for load and latency, so if anybody has a local LDAP server running already and is pretty comfortable with SA and LDAP, and is willing to poke at this, let me know. Be warned that this plugin can really beat the crap out of your LDAP server, and will add some measurable latency if the SA system is already heavily loaded. But it works pretty well, and is interesting if you're into LDAP. Responses off-list pls. Thanks -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Is this Received header correctly formatted?
Loren Wilton wrote: Received: from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net ([4.16.241.28] helo=watson1) by pop-a065d23.pas.sa.earthlink.net with smtp (Exim 3.33 #1) id 1DBKRe-Kp-00; Tue, 15 Mar 2005 14:23:22 -0800 1) Is smtp in lower case valid, or should it have been SMTP? 2) Is it valid to have the (Exim etc) stuff between 'smtp' and 'id'? 3) Anything else that may be off the mark?
The robustness principle says that you should be strict in what you send and liberal in what you accept. From that perspective, it's not a strictly conformant header, but it's not broken enough for somebody to refuse to parse it. In answer to your questions:
1) the spec calls for uppercase
2) header data in parentheses is comment data. comments are supposed to be ~allowed anywhere that whitespace is allowed (this rule is actually documented in RFC2822, which governs header fields). with that in mind, yes, it's fine there.
3) the helo= stuff isn't conformant
Here's the BNF notation for the Received header as provided in RFC2821:
| Time-stamp-line = "Received:" FWS Stamp <CRLF>
|
| Stamp = From-domain By-domain Opt-info ";" FWS date-time
|
|       ; where "date-time" is as defined in [32]
|       ; but the "obs-" forms, especially two-digit
|       ; years, are prohibited in SMTP and MUST NOT be used.
|
| From-domain = "FROM" FWS Extended-Domain CFWS
|
| By-domain = "BY" FWS Extended-Domain CFWS
|
| Extended-Domain = Domain /
|            ( Domain FWS "(" TCP-info ")" ) /
|            ( Address-literal FWS "(" TCP-info ")" )
|
| TCP-info = Address-literal / ( Domain FWS Address-literal )
|       ; Information derived by server from TCP connection
|       ; not client EHLO.
|
| Opt-info = [Via] [With] [ID] [For]
|
| Via = "VIA" FWS Link CFWS
|
| With = "WITH" FWS Protocol CFWS
|
| ID = "ID" FWS String / msg-id CFWS
|
| For = "FOR" FWS 1*( Path / Mailbox ) CFWS
|
| Link = "TCP" / Addtl-Link
| Addtl-Link = Atom
|       ; Additional standard names for links are registered with the
|       ; Internet Assigned Numbers Authority (IANA). "Via" is
|       ; primarily of value with non-Internet transports. SMTP
|       ; servers SHOULD NOT use unregistered names.
| Protocol = "ESMTP" / "SMTP" / Attdl-Protocol
| Attdl-Protocol = Atom
|       ; Additional standard names for protocols are registered with the
|       ; Internet Assigned Numbers Authority (IANA). SMTP servers
|       ; SHOULD NOT use unregistered names.
-- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Is this Received header correctly formatted?
List Mail User wrote: the with is sometimes also either a by or via (and probably other string values which I haven't noticed). BTW. by via and with are separate sub-fields with their own meaning -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Is this Received header correctly formatted?
Daryl C. W. O'Shea wrote: ...and if you can, avoid running messages to the list through SA (easy to do if you're using procmail, not so easy in other cases). Or run them through with whitelist_from_rcvd *.* apache.org to pad the value so that it doesn't matter. I do wish that postfix would let me add dynamic headers to the message before the proxy filter is called, or give me an ACL for no-filter, either of which would work to skip well-known message origins. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: plugins and (more)
Eric A. Hall wrote: Over the weekend I banged together a preliminary ldapBlacklist.pm plugin which lets the master process query an ldap server for whitelist or blacklist flags associated with the connecting SMTP client's reverse DNS, the HELO identifier, the mail-from address, the From address, and so forth... The problem is that each of these tests have to do a fair amount [...] I got this working more fully, including the persistence stuff (thanks again). Couple of other things I'm looking for help on: - is there an internal means to determine the local domain name? I'm having trouble with Sys::Hostname::Long on a couple of systems and would rather use something internal anyway since it's sure to work everywhere that SA itself works. - is there a way to force a plugin to load last? like, if I want SPF and all of the other validation stuff to get called first, but not to rely on it (it may not be installed), is there a way to force the plugin to get called last (presumably this is done by numbering the ldapBlacklist.cf to something like 99_ldap_blacklist.cf, but maybe there's a better way)? Thanks -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Header-Rule with multiple lines
On 3/15/2005 2:50 AM, Jörg Schütter wrote: I want to write a additional rule for spamassassin (3.0.2) which match the following header lines. Received: from blabla (unknown [1.2.3.4]) by my.mailserver.com This rule shuld add bad scores to machines which don't talk rfc. http://www.rulesemporium.com/forums/showthread.php?s=threadid=105 has a set of rules that might do what you want, or might be adaptable. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
plugins and parrallelization
It seems that the plugin architecture only allows a single pass/fail result, so if you want to have multiple tests with different shades of results, you have to call the plugin multiple times. Is that right? Over the weekend I banged together a preliminary ldapBlacklist.pm plugin which lets the master process query an ldap server for whitelist or blacklist flags associated with the connecting SMTP client's reverse DNS, the HELO identifier, the mail-from address, the From address, and so forth... The problem is that each of these tests has to do a fair amount of processing with some significant serialization (ie, DNS lookup for SRV RRs, DNS lookup for ldap server, connect-bind-query the server), as well as the rest of the background code. Using the pass/fail model as a front-end to this system, each test basically has to be its own rule, and each rule has to call its own eval() in order for each rule to use its defined weighting (eg, -50 for whitelisted, +50 for blacklisted, on a per-test basis). But in that model, the core LDAP stuff has to be run ~six times to process ~six tests, and that's a significant serialization penalty in sum, just to find out if one of the sending domains is listed as blacklisted or whitelisted in a local LDAP server. It's so bad that I'm not sure it's feasible to do this. What are the thoughts? -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
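One common way around the "six rules, six lookups" penalty described above is to do the expensive query once per message and let every eval-style test answer from the cached result. The following Python sketch only illustrates that pattern; it is not the SpamAssassin plugin API, and the names ListChecker, query_directory, start_message and check are all hypothetical stand-ins (query_directory takes the place of the DNS/LDAP round trip).

```python
class ListChecker:
    """Run the expensive directory query once per message, then answer each rule from cache."""

    def __init__(self, query_directory):
        # query_directory is a hypothetical callable: list of keys -> {key: "white" | "black"}
        self._query_directory = query_directory
        self._identities = {}
        self._flags = None

    def start_message(self, identities):
        # identities maps test names to values, e.g. {"helo": "...", "mail_from": "..."}
        self._identities = identities
        self._flags = None  # nothing fetched yet for this message

    def _lookup(self):
        if self._flags is None:
            keys = sorted(set(self._identities.values()))
            self._flags = self._query_directory(keys)  # one round trip covers all tests
        return self._flags

    def check(self, test_name, wanted):
        """Eval-style test: True if the identity behind test_name carries the wanted flag."""
        key = self._identities.get(test_name)
        return self._lookup().get(key) == wanted
```

Each rule keeps its own score (for example -50 for a "white" hit and +50 for a "black" hit), but only the first check() on a message pays for the lookup.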
Re: SA addr tests need to be updated
After considering all the discussion, I've filed these three bugs:
4188 -- RCVD_HELO_IP_MISMATCH should check address literals (this was argued against by Justin, but I'm convinced it's spam-sign)
4186 -- RCVD_NUMERIC_HELO does not test reserved addresses (they are still 'numeric' and aren't hostnames, and should still hit)
4187 -- RCVD_ILLEGAL_IP does not fire in all cases (reserved, malformed, and literals should all be tested, but aren't)
The rest of it can stay where it is and still be useful. Thanks -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: SA addr tests need to be updated
On 3/9/2005 1:38 PM, Eric A. Hall wrote: I think the four affected rules are RCVD_HELO_IP_MISMATCH, RCVD_NUMERIC_HELO, RCVD_ILLEGAL_IP, RCVD_BY_IP Extending the problem report--it seems that these rules don't fire in some instances. I haven't really checked this out yet, but addresses with a leading octet of 111, 123, and some others at or below ~130 seem to get skipped entirely (so does 99 and a few other two-digit numbers). Further, in keeping with the notion that all-numeric is illegal, high-numbered decimals (eg, 789) don't trip the RCVD_NUMERIC_HELO rule either. Let me know what the plan is on this, as I can add these kinds of tests to my private set, but would rather not if they'll be in the core set. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: SA addr tests need to be updated
On 3/11/2005 3:42 PM, Theo Van Dinter wrote: On Fri, Mar 11, 2005 at 03:25:06PM -0500, Eric A. Hall wrote: Extending the problem report--it seems that these rules don't fire in some instances. I haven't really checked this out yet, but addresses with a leading octet of 111, 123, and some others at or below ~130 seem to get skipped entirely (so does 99 and a few other two-digit numbers). Yeah, they're all listed as reserved. See M::SA::Constants for more detail... I suspected as much. But even then, RCVD_NUMERIC_HELO should match in all cases because all-numeric is always illegal (regardless of the number itself, any number is illegal period). Furthermore, they should be firing on RCVD_ILLEGAL_IP since they are also illegal--bonus ratware sign. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: SA addr tests need to be updated
You already got a couple of responses but let me pile on. On 3/10/2005 3:17 AM, [EMAIL PROTECTED] wrote: However, I still believe it is perfectly legal to refuse mail if - the HELO matches my own MX, or lists one of my IPs I do this too. My local networks get an immediate exception to all other filters, and all other connections are queried against an LDAP server that stores PERMIT/REJECT ACLs, with REJECT entries for my own networks. So if a remote connection gets to that point in the process and claims to be me, it's lying. Separately, I run a submission server on another port, which uses strict authentication, and doesn't use the LDAP ACLs. All my clients use the submission server, which allows them to roam. - the MAIL FROM pretends to be one of my users I don't recommend that. There's the eBay problem, but there are also online newspapers and magazines (send this article) that use ~your address as the envelope sender. There are some mailing groups that use aliases instead of lists, and some mailing lists don't re-send the message, in both cases the envelope sender doesn't get rewritten. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
SA addr tests need to be updated
SA 3.0.2 currently performs a handful of tests against HELO greetings that contain an IP address. These tests don't currently fire when an address literal is used in the HELO greeting, but they should. See section 3.6 of RFC 2821:
| - The domain name given in the EHLO command MUST BE either a primary
|   host name (a domain name that resolves to an A RR) or, if the host
|   has no name, an address literal as described in section 4.1.1.1.
and section 4.1.3:
| 4.1.3 Address Literals
|
| Sometimes a host is not known to the domain name system and
| communication (and, in particular, communication to report and repair
| the error) is blocked. To bypass this barrier a special literal form
| of the address is allowed as an alternative to a domain name. For
| IPv4 addresses, this form uses four small decimal integers separated
| by dots and enclosed by brackets such as [123.255.37.2], which
| indicates an (IPv4) Internet Address in sequence-of-octets form. For
| IPv6 and other forms of addressing that might eventually be
| standardized, the form consists of a standardized tag that
| identifies the address syntax, a colon, and the address itself, in a
| format specified as part of the IPv6 standards [17].
Technically, addresses that are NOT enclosed in brackets are illegal, but those are the only ones that SA sniffs out currently. Extending the current rules to include literals can probably be done by simply changing the sniff code to look for open and close brackets, but I haven't looked so I'm just guessing. As far as that goes, the tests might already do this, and just not be firing. I think the four affected rules are RCVD_HELO_IP_MISMATCH, RCVD_NUMERIC_HELO, RCVD_ILLEGAL_IP, RCVD_BY_IP -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
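The distinction drawn above between a bracketed address literal, a bare IP address, and other all-numeric junk can be sketched in Python as below. This is a guess at what "look for open and close brackets" might amount to, not the actual SpamAssassin rule code; classify_helo and its return labels are invented for the example, and it relies only on the standard ipaddress and re modules.

```python
import ipaddress
import re


def classify_helo(arg):
    """Label a HELO/EHLO argument: 'literal', 'malformed-literal', 'bare-ip', 'numeric' or 'name'."""
    if arg.startswith("[") and arg.endswith("]"):
        inner = arg[1:-1]
        if inner.lower().startswith("ipv6:"):
            inner = inner[5:]            # drop the tag used for IPv6 literals
        try:
            ipaddress.ip_address(inner)
            return "literal"             # the form RFC 2821 section 4.1.3 allows
        except ValueError:
            return "malformed-literal"   # brackets present, contents not an address
    try:
        ipaddress.ip_address(arg)
        return "bare-ip"                 # an address without brackets
    except ValueError:
        pass
    if re.fullmatch(r"[0-9.]+", arg):
        return "numeric"                 # e.g. 789.1.2.3: all digits and dots, not a valid address
    return "name"
```

Under this sketch the address-comparison rules would run on both "literal" and "bare-ip" results, while "bare-ip" and "numeric" would both count as numeric HELOs.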
Re: SA addr tests need to be updated
On 3/9/2005 4:01 PM, Justin Mason wrote: SA 3.0.2 currently performs a handful of tests against HELO greetings that contain an IP address. These tests don't currently fire when an address literal is used in the HELO greeting, but they should. actually, that's deliberate -- compare the frequencies of an RFC-2821 address literal, vs. a raw address, and you'll notice that the latter is much more prevalent in spam. That's true, but the rules that compare for addresses should still check the address in literals. I think the four affected rules are RCVD_HELO_IP_MISMATCH, RCVD_NUMERIC_HELO, RCVD_ILLEGAL_IP, RCVD_BY_IP if the addr doesn't check out... -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: SA addr tests need to be updated
On 3/9/2005 3:29 PM, List Mail User wrote: See section 3.6 of RFC 2821: | - The domain name given in the EHLO command MUST BE either a primary |host name (a domain name that resolves to an A RR) or, if the host |has no name, an address literal as described in section 4.1.1.1. 3.6 Domains used. There are two exceptions to the rule requiring FQDNs: ... Nothing in either the section you have quoted, or the one I have allows a hostname which is not a FQDN to be used. see the first exception, which is the text I cited above. Technically, addresses that are NOT enclosed in brackets are illegal, but those are the only ones that SA sniffs out currently. Of course, my machines just refuse these during the SMTP conversation, Many do. BTW, postfix has similar problems wrt literals. For example, if postfix gets a regular address (non-literal) in the HELO, it will split the address into octets and do lookups for PERMIT/REJECT ACLs on incrementally smaller sets, which is all very nice. But if it finds a literal, it doesn't parse for the address inside, and treats the literal like a domain name. Another bug here is that the strict-syntax checks in postfix don't match against non-literal addresses, which it should (RFC1123 spells out what is a valid hostname, and all-numerics is clearly not legal). Please be careful and check the definitions and references in each document indeed -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
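The octet-splitting lookup described above, extended so that it also looks inside an address literal, might look something like this in Python. It is a sketch of the behaviour as described in the message, not Postfix's implementation; lookup_keys is a made-up name and only IPv4 is handled.

```python
def lookup_keys(helo_arg):
    """Return ACL lookup keys, most specific first, for an IPv4 HELO argument."""
    addr = helo_arg
    if addr.startswith("[") and addr.endswith("]"):
        addr = addr[1:-1]  # parse the address inside a literal instead of treating it as a name
    octets = addr.split(".")
    is_ipv4 = len(octets) == 4 and all(
        o.isascii() and o.isdigit() and int(o) <= 255 for o in octets
    )
    if not is_ipv4:
        return [helo_arg]  # not an IPv4 address: leave it as a single name-style key
    # 1.2.3.4 -> 1.2.3.4, 1.2.3, 1.2, 1
    return [".".join(octets[:n]) for n in range(4, 0, -1)]
```

With this, "[1.2.3.4]" and "1.2.3.4" produce the same keys, so a REJECT entry for a network prefix applies whichever form the client sends.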
Re: SA addr tests need to be updated
On 3/9/2005 5:17 PM, List Mail User wrote: Postfix option reject_invalid_hostname will reject bare IPs (when used in the smtpd_helo_restrictions section of main.cf). Good to hear this was fixed. I filed a bug report on it in May '04 but didn't get much of a response. I'll have to upgrade. -- Eric A. Hallhttp://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/