Re: SA 3.3.1 bug or mistake in my custom rules?
On 13/10/2011 1:45 AM, Karsten Bräckelmann wrote:
> On Wed, 2011-10-12 at 23:32 -0230, Lawrence @ Rogers wrote:
>> Starting today, I've noticed that 3 of my rules fire in situations
>> where they should not. They are simple meta rules that count how many
>> of certain URIBL rules fire, and then raise the spam score.
>>
>>   meta LW_URIBL_LO ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL + URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + URIBL_WS_SURBL) == 1)
>
> URIBL_RHS_DOB is missing here.
>
>>   meta LW_URIBL_MD ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL + URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + URIBL_WS_SURBL + URIBL_RHS_DOB) == 2)
>>   meta LW_URIBL_HI [...]
>>
>> I'm receiving e-mails where both LW_URIBL_LO and LW_URIBL_MD are fired.
>
> That would happen, if URIBL_RHS_DOB and another rule of the LO meta
> variant are hit.
>
>> The only rules in the message that could trigger them are
>> URIBL_DBL_SPAM and URIBL_RHS_DOB
>
> DBL is not part of the meta, so I don't get this. Or did you actually
> mean to communicate these are the only URI DNSBL rules triggered? That
> would be even more confusing -- a real Status header copied would have
> helped...
>
> The above rules are *verbatim*, copy and paste from your rc files, with
> no human messing around, right?
>
> On a related note, as per the M::SA::Conf docs for meta rules --
>
>   The value of a hit meta test is that of its arithmetic expression.
>   The value of a hit eval test is that returned by its method.
>
> The latter means this style of adding rules is not necessarily safe,
> since these are eval tests. However, in this case, I believe they all
> should be set to 1 in case of a match.
>
> The former means you could eliminate such issues due to inconsistencies
> and code duplication, by using an additional meta level:
>
>   meta __VALUE  FOO + BAR
>   meta ONE      __VALUE == 1
>   meta TWO      __VALUE == 2

Hi Karsten,

I don't know how I overlooked that omission in the first rule :) Thanks, it's working as expected now.
I designed the rules using the information available on
http://wiki.apache.org/spamassassin/WritingRules

Under "Meta rules" it has this rule:

  meta LOCAL_MULTIPLE_TESTS ((__LOCAL_TEST1 + __LOCAL_TEST2 + __LOCAL_TEST3) > 1)

"The value of the sub rule in an arithmetic meta rule is the true/false (1/0) value for whether or not the rule hit."

If this is incorrect, perhaps this documentation should be updated.

Regards, Lawrence
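Karsten's extra-meta-level suggestion can be applied directly to the LW_URIBL rules from this thread. A minimal sketch of what the rewritten rules might look like (the `__LW_URIBL_COUNT` name is made up for illustration; scores are the ones from the original post):

```
# Define the counting expression exactly once, in a non-scoring
# sub-rule (the leading __ keeps it from scoring on its own).
meta __LW_URIBL_COUNT (URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL + URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + URIBL_WS_SURBL + URIBL_RHS_DOB)

# Each tier compares the single shared value, so LO/MD/HI can never
# drift apart by one list being omitted from one of them.
meta   LW_URIBL_LO (__LW_URIBL_COUNT == 1)
score  LW_URIBL_LO 1.5
tflags LW_URIBL_LO net

meta   LW_URIBL_MD (__LW_URIBL_COUNT == 2)
score  LW_URIBL_MD 3.0
tflags LW_URIBL_MD net

meta   LW_URIBL_HI (__LW_URIBL_COUNT > 2)
score  LW_URIBL_HI 4.5
tflags LW_URIBL_HI net
```

With this layout, adding or removing a URIBL from the count is a one-line change rather than three.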
Re: Results of eval and meta rules
On 13/10/2011 9:01 PM, Karsten Bräckelmann wrote:
> On Thu, 2011-10-13 at 03:57 -0230, Lawrence @ Rogers wrote:
>> On 13/10/2011 1:45 AM, Karsten Bräckelmann wrote:
>>> On a related note, as per the M::SA::Conf docs for meta rules --
>>>
>>>   The value of a hit meta test is that of its arithmetic expression.
>>>   The value of a hit eval test is that returned by its method.
>>>
>>> The latter means this style of adding rules is not necessarily safe,
>>> since these are eval tests. However, in this case, I believe they
>>> all should be set to 1 in case of a match.
>>>
>>> The former means you could eliminate such issues due to
>>> inconsistencies and code duplication, by using an additional meta
>>> level:
>>>
>>>   meta __VALUE  FOO + BAR
>>>   meta ONE      __VALUE == 1
>>>   meta TWO      __VALUE == 2
>>
>> I don't know how I overlooked that omission in the first rule :)
>
> In particular, since these rules are not exactly complex, and seeing
> them side by side... ;)  Anyway, that's why I also included a way to
> prevent this from ever happening. Define once, don't duplicate code,
> simply by adding another meta rule level.
>
>> Thanks, it's working as expected now.
>>
>> I designed the rules using the information available on
>> http://wiki.apache.org/spamassassin/WritingRules
>>
>> Under "Meta rules" it has this rule:
>>
>>   meta LOCAL_MULTIPLE_TESTS ((__LOCAL_TEST1 + __LOCAL_TEST2 + __LOCAL_TEST3) > 1)
>>
>> "The value of the sub rule in an arithmetic meta rule is the
>> true/false (1/0) value for whether or not the rule hit."
>>
>> If this is incorrect, perhaps this documentation should be updated.
>
> Well, "incorrect"... Put into easy terms, I'd say. It's intended as a
> quick-start tutorial. After that, I seriously recommend having a look
> into the full documentation.
>
> There are two points here:
>
>   The value of a hit eval test is that returned by its method.
>
> Which, I believe (without looking at the code), is generally the
> boolean value as mentioned in the wiki -- including the URI DNSBL eval
> rules you are using.
> However, and that was mostly meant as a heads-up, it MAY NOT always
> hold true, since eval rules MAY return something else.
>
>   The value of a hit meta test is that of its arithmetic expression.
>
> This also most likely is generally the boolean value. Definitely in
> the example given, since the (non-boolean!) sub-result of the
> arithmetic expression then is compared against a number -- either
> true, or false.
>
> The trick is to keep the duplicated arithmetic sub-expression in a
> single meta, and use that result for your comparison -- using a
> supported, though generally unused, feature to your benefit.

Thanks for the info :)

Regards, Lawrence
SA 3.3.1 bug or mistake in my custom rules?
Hi,

I am using SpamAssassin 3.3.1 (cPanel) with the latest rule updates. Starting today, I've noticed that 3 of my rules fire in situations where they should not. They are simple meta rules that count how many of certain URIBL rules fire, and then raise the spam score. They are as follows:

  meta LW_URIBL_LO ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL + URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + URIBL_WS_SURBL) == 1)
  meta LW_URIBL_MD ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL + URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + URIBL_WS_SURBL + URIBL_RHS_DOB) == 2)
  meta LW_URIBL_HI ((URIBL_BLACK + URIBL_RED + URIBL_SBL + URIBL_AB_SURBL + URIBL_JP_SURBL + URIBL_OB_SURBL + URIBL_PH_SURBL + URIBL_SC_SURBL + URIBL_WS_SURBL + URIBL_RHS_DOB) > 2)

  score  LW_URIBL_LO 1.5
  tflags LW_URIBL_LO net
  score  LW_URIBL_MD 3.0
  tflags LW_URIBL_MD net
  score  LW_URIBL_HI 4.5
  tflags LW_URIBL_HI net

I'm receiving e-mails where both LW_URIBL_LO and LW_URIBL_MD are fired. The only rules in the message that could trigger them are URIBL_DBL_SPAM and URIBL_RHS_DOB.

Any thoughts?

Regards, Lawrence
Re: sa users list down due to irene?
On 29/08/2011 4:03 PM, Michael Scheidell wrote:
> On 8/29/11 2:13 PM, David F. Skoll wrote:
>> Is anyone even maintaining qmail any more? I thought the project was
>> dead. I wish it would just go away.
> I wish ASF would stop using it for its mailing lists, or just apply
> all the patches that seem to be needed to make it 'play nice' with the
> rest of the world. (ok, I don't care if it plays nice with
> aol/hotmail/etc, you get free email? you get what you pay for).

What about Yahoo, which is not only freemail, but also used by the biggest ISP here in Canada (Rogers)?

Unfortunately, talking about RFC compliance is all well and good, but not everybody will comply. It's like HTML and CSS support in browsers: everyone has a different level of compliance. Some are average, and some are pretty spot on (Firefox, and KHTML-derived engines such as WebKit).

- Lawrence
Re: Theories on blocking OUTGOING spam
On 16/08/2011 7:32 PM, Marc Perkel wrote:
> When email is coming fast from an account, I start tracking the number
> of bad recipients, and if the number of bad recipients is high it's
> probably spam. I also have restrictions on valid domains the from has
> to match, I look for URIBLs, high SA scores, etc. Just curious what
> others do to detect outgoing spam. I use Exim for the MTA because it
> has the power to do the tricks I need done.

Exim + MailScanner does the job fine here. Just configure MailScanner to scan outgoing e-mail as well using SpamAssassin, and discard anything over a certain point (I usually say 7.0, as that seems to be the point where any FPs end).

- Lawrence
Re: exclude from freemail_domains
On 29/06/2011 8:37 AM, Tom Kinghorn wrote:
> Good afternoon list.
>
> Is there a way to exclude a domain from the freemail_domains checks in
> the local.cf? I do not want to have to manually remove our domain
> (which does not offer freemail) from the 20_freemail_domains.cf file
> every time we update.
>
> Refer to bug 6542:
> http://old.nabble.com/-Bug-6542--New%3A-Freemail_domains.cf-FP-td30899122.html
>
> Our own mail is matching: FREEMAIL_FROM FROM_MISSP_FREEMAIL
>
> Thanks in advance.
> Tom

Hi Tom,

FREEMAIL_FROM by itself is harmless. However, if your e-mail is also hitting FROM_MISSP_FREEMAIL, it means it has a malformed From: header. Something like:

  From: "Lawrence Williams"<lawrencewilli...@nl.rogers.com>

Notice the missing space between the quote ending my name and the beginning of the e-mail address. A proper From header would be:

  From: lawrencewilli...@nl.rogers.com

or

  From: "Lawrence Williams" <lawrencewilli...@nl.rogers.com>

This is most likely a bug in the e-mail system you are using.

Regards, Lawrence
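The "missing space" pattern described above is easy to check for outside SA. A minimal sketch in Python -- this is an illustrative heuristic only, not SpamAssassin's actual FROM_MISSP_FREEMAIL implementation, and the function name is made up:

```python
import re

def missing_space_before_addr(from_header: str) -> bool:
    """Flag a From: header where a quoted display name runs straight
    into the <addr-spec> with no separating whitespace, e.g.
    'From: "Name"<user@example.com>'. Hypothetical helper, not SA code."""
    return re.search(r'"[^"]*"<[^>]+>', from_header) is not None

# The malformed form (no space after the closing quote) is flagged;
# the well-formed variant with a space is not.
print(missing_space_before_addr('From: "Lawrence Williams"<user@example.com>'))   # True
print(missing_space_before_addr('From: "Lawrence Williams" <user@example.com>'))  # False
```

Fixing the sending software is still the right answer; a check like this only helps confirm which messages have the defect.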
Re: [Q] Writing rule for career opportunity type messages
On 29/06/2011 3:59 PM, JKL wrote:
> On 06/29/2011 04:59 PM, John Hardin wrote:
>> On Wed, 29 Jun 2011, J4K wrote:
>>> Over the past few months I noticed an increase in 'Start New
>>> Employment Today | Career Opportunity' style email. The rules I use,
>>> that are pretty much stock rules, correctly tag the email as spam.
>>> Usually the spam score hovers between 5.5 and 6.9.
>> Is there some reason you're unwilling or unable to use Bayes? If you
>> are getting these regularly, then training a few as spam would likely
>> catch most of the rest.
>
> Hi,
>
> I thought that Bayes was enabled. I have fed spam and ham into
> sa-learn daily since February 2011. Of course, I might well have been
> feeding data into a black hole if it is not working. I enabled (I
> thought) Bayes as per the local.cf below:
>
>   use_bayes 1
>   bayes_auto_learn 1
>   bayes_expiry_max_db_size 30
>   bayes_auto_expire 1
>
> Here is what is in the DB. Not a lot, really:
>
>   # sa-learn --dump magic
>   0.000          0          3          0  non-token data: bayes db version
>   0.000          0          0          0  non-token data: nspam
>   0.000          0          0          0  non-token data: nham
>   0.000          0          0          0  non-token data: ntokens
>   0.000          0 2147483647          0  non-token data: oldest atime
>   0.000          0          0          0  non-token data: newest atime
>   0.000          0          0          0  non-token data: last journal sync atime
>   0.000          0          0          0  non-token data: last expiry atime
>   0.000          0          0          0  non-token data: last expire atime delta
>   0.000          0          0          0  non-token data: last expire reduction count
>
> nham and nspam = 0. Says it all :(
>
> spamassassin -D --lint confirms:
>
>   Jun 29 20:25:17.682 [26298] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
>   Jun 29 20:25:17.847 [26298] dbg: config: fixed relative path: /var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf
>   Jun 29 20:25:17.847 [26298] dbg: config: using /var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf for included file
>   Jun 29 20:25:17.848 [26298] dbg: config: read file /var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf
>   Jun 29 20:25:19.998 [26298] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670) implements 'learner_new', priority 0
>   Jun 29 20:25:19.998 [26298] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670), bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
>   Jun 29 20:25:20.010 [26298] dbg: bayes: using username:
>   Jun 29 20:25:20.010 [26298] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x40bfe48)
>   Jun 29 20:25:20.010 [26298] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670) implements 'learner_is_scan_available', priority 0
>   Jun 29 20:25:20.012 [26298] dbg: bayes: database connection established
>   Jun 29 20:25:20.013 [26298] dbg: bayes: found bayes db version 3
>   Jun 29 20:25:20.013 [26298] dbg: bayes: Using userid: 77
>   Jun 29 20:25:20.013 [26298] dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
>   Jun 29 20:25:20.027 [26298] dbg: bayes: database connection established
>   Jun 29 20:25:20.027 [26298] dbg: bayes: found bayes db version 3
>   Jun 29 20:25:20.028 [26298] dbg: bayes: Using userid: 77
>   Jun 29 20:25:20.028 [26298] dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
>
> I read the entry on
> http://wiki.apache.org/spamassassin/SiteWideBayesSetup, and it looks
> like these are missing in my local.cf:
>
>   bayes_path /var/spamassassin/bayes/bayes
>   bayes_file_mode 0777
>
> * QUESTION: Other than defining these entries (bayes_path,
> bayes_file_mode) in the local.cf and rerunning sa-learn, is there
> anything else I should do to get this to work?

You don't need those entries at all. Most likely, your MTA (Exim, most likely) is running as a user other than root.

Set bayes_sql_override_username to the user name that your MTA is running under. Example:

  bayes_sql_override_username mailnull

Then access your Bayes MySQL database and open the bayes_vars table. It should only contain one record if it's set up properly. Change the user name there to the same one you used above as well.

If you are using spamd, restart it and restart your MTA.

Regards, Lawrence
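For reference, a minimal site-wide SQL Bayes block in local.cf might look like the sketch below. The DSN, database name, username, and password are placeholders you would replace with your own; only use_bayes and bayes_sql_override_username come from the thread itself:

```
use_bayes                   1
bayes_store_module          Mail::SpamAssassin::BayesStore::MySQL
# Placeholders -- point these at your own database and credentials:
bayes_sql_dsn               DBI:mysql:sa_bayes:localhost
bayes_sql_username          sa_user
bayes_sql_password          CHANGE_ME
# Force all learning and scanning onto one site-wide user, matching
# the account your MTA runs as (mailnull on cPanel/Exim setups):
bayes_sql_override_username mailnull
```

After changing this, re-run sa-learn (or restore your backup) so the tokens land under the overridden username, then restart spamd.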
Re: [Q] Writing rule for career opportunity type messages
On 29/06/2011 4:58 PM, JKL wrote:
> select count(spam_count) from bayes_vars

Run this query:

  SELECT username, spam_count, ham_count FROM bayes_vars;

This will give a list of usernames that have been used to learn ham and spam into SpamAssassin's Bayes MySQL DB. For a site-wide installation, this should only return one result.

To answer your previous question: I meant to simply add the bayes_sql_override_username setting to your local.cf and restart spamassassin. If you are using Postfix with the postfix username, set it as:

  bayes_sql_override_username postfix

This ensures that all future e-mails are labeled as being learned by the postfix user, regardless of whether you did it manually using sa-learn via ssh or another interface, or auto-learning is used. For a single site-wide Bayes installation, this is what you want.

Regards, Lawrence
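The check-and-fix described in this thread can be done in two statements. A sketch, assuming the stock bayes_mysql.sql schema (bayes_vars with id, username, spam_count, ham_count columns); the id value 77 is the one shown in the -D output earlier in the thread, and 'mailnull' stands in for whatever user your MTA runs as:

```sql
-- List which users have learned tokens; a site-wide setup
-- should show exactly one row.
SELECT username, spam_count, ham_count FROM bayes_vars;

-- If the single row is under the wrong name, rename it to match
-- bayes_sql_override_username in local.cf:
UPDATE bayes_vars SET username = 'mailnull' WHERE id = 77;
```

Back up the database before running the UPDATE, and restart spamd afterwards so it picks up the change.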
Re: Issuing rollback() due to DESTROY without explicit disconnect() of DBD::mysql::db handle
On 28/06/2011 11:20 PM, Marc Perkel wrote:
> Hi everyone,
>
> Now I'm seeing these error messages in the logs:
>
>   Issuing rollback() due to DESTROY without explicit disconnect() of DBD::mysql::db handle
>
> I'm beginning to wonder if MySQL Bayes actually works. I'm just seeing
> too many strange errors. Thanks in advance for any help.

Works here on a cPanel server with no issues. Are you running 3.3.2 or 3.3.1?
Re: Migrating bayes to mysql fails with parsing errors
On 21/06/2011 7:01 PM, Dave Wreski wrote:
> Hi,
>
> It looks like that may be my problem too. This is the result with your
> patch:
>
>   dbg: bayes: database connection established
>   dbg: bayes: found bayes db version 3
>   dbg: bayes: Using userid: 2
>   dbg: bayes: database connection established
>   dbg: bayes: found bayes db version 3
>   dbg: bayes: using userid: 3
>   dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3, token: 7�OR�
>   dbg: bayes: error inserting token for line: t 0 1 1308332646 37fc4f52eb
>   dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3, token: Y
>   dbg: bayes: error inserting token for line: t 0 2 1308070890 d2eec4f659
>
> I'll try the suggested my.cnf changes and restart the process. I
> thought it would take longer before it started to fail again, but
> trying to change the character set didn't make a difference for me.
>
> Thanks, Dave

It may be easier to just start from scratch with your Bayes database.

- Lawrence
Re: Migrating bayes to mysql fails with parsing errors
On 21/06/2011 7:01 PM, Dave Wreski wrote:
> Hi,
>
> It looks like that may be my problem too. This is the result with your
> patch:
>
>   dbg: bayes: database connection established
>   dbg: bayes: found bayes db version 3
>   dbg: bayes: Using userid: 2
>   dbg: bayes: database connection established
>   dbg: bayes: found bayes db version 3
>   dbg: bayes: using userid: 3
>   dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3, token: 7�OR�
>   dbg: bayes: error inserting token for line: t 0 1 1308332646 37fc4f52eb
>   dbg: bayes: _put_token: Updated an unexpected number of rows: 3, id: 3, token: Y
>   dbg: bayes: error inserting token for line: t 0 2 1308070890 d2eec4f659
>
> I'll try the suggested my.cnf changes and restart the process. I
> thought it would take longer before it started to fail again, but
> trying to change the character set didn't make a difference for me.
>
> Thanks, Dave

Ignore my last suggestion of starting from scratch. Try commenting out these lines (or similar ones), if present, in /etc/my.cnf and restarting MySQL before attempting again:

  default-character-set=utf8
  character-set-server=utf8
  collation-server=utf8_unicode_ci
  init_connect='set collation_connection = utf8_unicode_ci;'

Regards, Lawrence
Re: Migrating bayes to mysql fails with parsing errors
On 21/06/2011 8:47 PM, Benny Pedersen wrote:
> On Tue, 21 Jun 2011 22:16:05 +0300, Panagiotis Christias wrote:
>> After commenting out the utf8 definitions and reverting back to
>> latin1, sa-learn --restore worked fine.
> thanks for this report, but imho this should NOT be fixed in my.cnf

What other option does he have? iconv??

- Lawrence
Re: Migrating bayes to mysql fails with parsing errors
This one is the current SQL schema, and works:

http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_current_release_3.3.x/sql/bayes_mysql.sql

- Lawrence

On 20/06/2011 7:34 PM, Dave Wreski wrote:
> Hi,
>
> I have an existing v3.3.2 on fedora14 (perl v5.12.3) that I'm trying
> to convert Bayes to use MySQL. The restore process fails after a few
> minutes due to too many errors:
>
>   dbg: bayes: error inserting token for line: t 1 0 1308114254 4fd2b3f2f0
>   dbg: bayes: _put_token: Updated an unexpected number of rows.
>   [repeats ...]
>   bayes: encountered too many errors (20) while parsing token line, reverting to empty database and exiting
>   dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x26b8af8) implements 'learner_close', priority 0
>   ERROR: Bayes restore returned an error, please re-run with -D for more information
>
> This was already run with -D, so no further information is available.
> I used the sql files from
> spamassassin.apache.org/full/3.0.x/dist/sql/bayes_mysql.sql to create
> the tables. Maybe the format has changed since then and there is a
> more updated file? I'm using the sa from
> http://kojipkgs.fedoraproject.org/packages/spamassassin/3.3.2/1.fc14/x86_64/
>
> Is there a way to skip these invalid records? Other ideas for
> resolving this? I can successfully restore back to the normal dbm
> database.
>
> Thanks, Dave
Re: Migrating bayes to mysql fails with parsing errors
On 20/06/2011 10:09 PM, Dave Wreski wrote:
> I have an existing v3.3.2 on fedora14 (perl v5.12.3) that I'm trying
> to convert Bayes to use MySQL. The restore process fails after a few
> minutes due to too many errors:
>
>   dbg: bayes: error inserting token for line: t 1 0 1308114254 4fd2b3f2f0
>   dbg: bayes: _put_token: Updated an unexpected number of rows.
>   [repeats ...]

Did you make the backup using 3.3.2 as well?

Lawrence
Re: Migrating bayes to mysql fails with parsing errors
On 20/06/2011 11:55 PM, Dave Wreski wrote:
>>> Hi, I have an existing v3.3.2 on fedora14 (perl v5.12.3) that I'm
>>> trying to convert Bayes to use MySQL. The restore process fails
>>> after a few minutes due to too many errors:
>>>
>>>   dbg: bayes: error inserting token for line: t 1 0 1308114254 4fd2b3f2f0
>>>   dbg: bayes: _put_token: Updated an unexpected number of rows.
>>>   [repeats ...]
>>
>> Did you make the backup using 3.3.2 as well?
>
> Yes, and the bdb was originally created just recently using a v3.3.2
> pre-release as well. I also made sure the bdb was synced before trying
> to do the backup.
>
> Thanks, Dave

Was it made using the very same version, though? I don't know what to tell you. I've never seen this issue myself; it sounds like a corrupt backup or a bug in the restore.

When I did it, I used the instructions found at the end of this file:

http://svn.apache.org/repos/asf/spamassassin/branches/3.3/sql/README.bayes

On cPanel servers, exim and such generally run as the mailnull user, so I had to set:

  bayes_sql_override_username mailnull

Once that was done, I restored the backup and it has worked flawlessly since.

- Lawrence
Re: Spam not stopped???
On 15/06/2011 10:00 PM, User for SpamAssassin Mail List wrote:
> Hello,
>
> I have something I cannot explain. We blacklisted an email address for
> a client, but SpamAssassin still let it through. Here are the logs:
>
>   Jun 15 08:08:10 mail spamd[20901]: spamd: identified spam (104.0/6.0) for client:2130 in 0.2 seconds, 1729 bytes.
>   Jun 15 08:08:10 mail spamd[20901]: spamd: result: Y 103 - BAYES_50,HTML_MESSAGE,MISSING_SUBJECT,SPF_PASS,TVD_SPACE_RATIO,USER_IN_BLACKLIST scantime=0.2,size=1729,user=client,uid=2130,required_score=6.0,rhost=localhost,raddr=127.0.0.1,rport=55987,mid=snt117-w309552c1e79d42eb67a294ad...@phx.gbl,bayes=0.479706,autolearn=no
>   Jun 15 08:08:10 mail sm-mta[21077]: p5FF86ld021067: to=cli...@pcez.com, delay=00:00:03, xdelay=00:00:02, mailer=local, pri=31672, dsn=2.0.0, stat=Sent
>
> As you can see, the user is in the blacklist, yet the mail was
> delivered. I checked other email that was over a score of 9 and it was
> rejected, but for some reason this was not. Anyone have an idea why
> this is making it through?
>
> Thanks, Ken

SpamAssassin merely assigns scores and doesn't do any rejections on its own. That is handled by whatever is calling SpamAssassin and using the score that the e-mail is assigned. This could be something like MailScanner, Amavis, or some other third-party software.

Also, it would be better to blacklist an e-mail address at the MTA level (e.g. Exim, Postfix).

Regards, Lawrence
Re: Spam not stopped???
On 15/06/2011 11:13 PM, User for SpamAssassin Mail List wrote:
> Lawrence,
>
> Thanks for the response. I know SpamAssassin doesn't stop it; we use a
> spamassassin milter for sendmail to reject it (we've been doing this
> for years). Anyway, here is a log of an email that was rejected:
>
>   Jun 15 06:27:33 mail spamd[981]: spamd: identified spam (22.2/6.0) for spamass-milter:111 in 2.1 seconds, 5378 bytes.
>   Jun 15 06:27:33 mail spamd[981]: spamd: result: Y 22 - AWL,BAYES_99,HTML_IMAGE_ONLY_12,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,SARE_SPEC_ROLEX,SARE_SPOOF_COM2COM,SARE_SPOOF_COM2OTH,SPOOF_COM2COM,SPOOF_COM2OTH,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_RHS_DOB,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL scantime=2.1,size=5378,user=spamass-milter,uid=111,required_score=6.0,rhost=localhost,raddr=127.0.0.1,rport=42127,mid=20110615185711.2964.qmail@vsp-6214cbe9e6d,bayes=1.00,autolearn=spam
>   Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: Milter: data, reject=550 5.7.1 Blocked by SpamAssassin
>   Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: to=u...@pcez.com, delay=00:00:02, pri=35237, stat=Blocked by SpamAssassin
>
> The reason we did not block this at the MTA level is that we do not
> know if OTHER users might want email from this address. Anyway, I'm
> still looking for a clue why one is blocked and the other is not.
>
> Thanks, Ken
>
> On Wed, 15 Jun 2011, Lawrence @ Rogers wrote:
>> [...]

Although you shouldn't be using SARE rules anymore (no longer developed, and reported to hit many FPs), this e-mail would be blocked by a 9.0 limit. That would indicate that your setup is working, at least sometimes.

The first set of headers you posted was as follows:

  Jun 15 08:08:10 mail spamd[20901]: spamd: result: Y 103 - BAYES_50,HTML_MESSAGE,MISSING_SUBJECT,SPF_PASS,TVD_SPACE_RATIO,USER_IN_BLACKLIST scantime=0.2,size=1729,user=client,uid=2130,required_score=6.0,rhost=localhost,raddr=127.0.0.1,rport=55987,mid=snt117-w309552c1e79d42eb67a294ad...@phx.gbl,bayes=0.479706,autolearn=no

The scores for these rules are:

  BAYES_50            0.8
  HTML_MESSAGE        0.001
  MISSING_SUBJECT     0.001
  SPF_PASS           -0.001
  TVD_SPACE_RATIO     0.001
  USER_IN_BLACKLIST 100.0

I got these from http://spamassassin.apache.org/tests_3_3_x.html (except MISSING_SUBJECT and TVD_SPACE_RATIO, which are not listed but are present in the current 3.3 rules available via sa-update).

So the overall score should have been 100.802. What was the score shown as being returned by SA?

Regards, Lawrence
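The arithmetic above can be double-checked mechanically. A quick Python sketch using the per-rule scores quoted in the message (from tests_3_3_x.html):

```python
# Per-rule scores quoted in the message above (3.3.x stock rule set).
scores = {
    "BAYES_50": 0.8,
    "HTML_MESSAGE": 0.001,
    "MISSING_SUBJECT": 0.001,
    "SPF_PASS": -0.001,
    "TVD_SPACE_RATIO": 0.001,
    "USER_IN_BLACKLIST": 100.0,
}

# Sum the hit rules, as SA does when totalling a message's score.
total = round(sum(scores.values()), 3)
print(total)  # 100.802
```

The gap between this 100.802 and the 104.0 spamd logged would point to per-site score overrides or a rule version difference, which is exactly why the question is worth asking.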
Re: Spam not stopped???
On 16/06/2011 3:13 AM, User for SpamAssassin Mail List wrote:
> On Thu, 16 Jun 2011, Lawrence @ Rogers wrote:
>> [...]
>
> As the log showed:
>
>   Jun 15 08:08:10 mail spamd[20901]: spamd: identified spam (104.0/6.0)
>
> spamd is reporting it as spam. sendmail.mc is set up as:
>
>   INPUT_MAIL_FILTER(`spamassassin', `S=local:/var/run/spamass/spamass.sock, F=, T=S:6m;R:9m;E:16m')dnl
>
> As you can see, the one message is blocked by the MTA:
>
>   Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: Milter: data, reject=550 5.7.1 Blocked by SpamAssassin
>   Jun 15 06:27:33 mail sm-mta[1251]: p5FDRUgF001251: to=u...@pcez.com, delay=00:00:02, pri=35237, stat=Blocked by SpamAssassin
>
> But the message in question got delivered even though SpamAssassin
> said it was spam. So it looks like the milter is working for one email
> but not the other. What would cause this?
>
> Thanks, Ken

Hi Ken,

It's odd that one spam e-mail is being blocked by the milter while another is not. It's definitely something with your milter configuration. Unfortunately, I cannot
Re: Sought rules
On 10/06/2011 10:24 PM, Warren Togami Jr. wrote:
> On 6/10/2011 2:01 PM, Karsten Bräckelmann wrote:
>> IFF you use the sought channel with SA 3.3.x, you will need the
>> reorder hack to bend the alphabet.
> It is not entirely clear to me -- what exactly are you supposed to
> rename for the reorder hack? Do you have to do it every time you
> sa-update?
>
> Warren

Would renaming 20_sought_fraud.cf to 99_sought_fraud.cf, putting 20_sought_fraud.cf (from the yerp.org channel) after 72_active.cf (the default, and assumed older, SA rules), solve this problem?

Regards, Lawrence
Re: Spamassasin - SQLITE as storage database
On 17/05/2011 12:06 PM, monolit939 wrote:
> Hello, do you have any experience with using an SQLite database as
> storage for SpamAssassin? SpamAssassin uses Berkeley DB, but I need to
> replace it. I could not find any manual, guide, or even forum
> discussion about using SpamAssassin with SQLite. I appreciate any
> advice. Thanks a lot

I have no experience with this, but I do have experience with using MySQL with InnoDB tables. The performance is actually much better than Berkeley DBs.

Regards, Lawrence
Re: DKIM_SIGNED positive score
On 13/04/2011 10:08 PM, Noel Butler wrote:
> I've looked high and low and don't seem to be adding this locally;
> shouldn't it be a negative score of 0.1? Or better still, null, and
> only get a score if valid, which is applied (DKIM_VALID=-0.1)? Seems
> the above only cancels this out and either way is not needed, or am I
> missing something?
>
> Cheers

It appears this is a default score for DKIM_SIGNED, and it is intended to be canceled out by DKIM_VALID or DKIM_VALID_AU.
Re: ups.com virus has now switched to dhl.com
On 31/03/2011 1:29 PM, Michael Scheidell wrote:
> 'from' dhl.com (come on ups/dhl.. I know SPF is broken, but in this
> case it would sure help us decide if the sending ip is authorized to
> send on your behalf) with some pretty weird received lines: is this
> 'ipv8'?

Doubtful. IPv8 is still very much a pipe dream; the world hasn't even embraced IPv6 yet. I would say most of the Received: headers are just messed up to bypass IPv4 and RBL checks.

- Lawrence
Re: Spam
On 29/03/2011 9:27 PM, Martin Gregorie wrote: On Wed, 2011-03-30 at 00:58 +0200, mar...@swetech.se wrote: recently I've been getting A LOT of these mails with subjects like this, containing a link to some scam/Chinese crap factory. I run the latest SpamAssassin along with amavis, but these mails keep getting through. Any ideas? Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat Since the longest (English) word I know has 28 letters (antidisestablishmentarianism), a private rule like: header VERY_LONG_WORD Subject =~ /Re:\s+\S{29}/ should catch that spam. Martin We started getting those spams about 6 months ago. What I did was come up with a low-scoring rule that hits on this # Rule 1: Subject contains only numbers, letters, or common formatting (no spaces) and is 31 or more characters header LW_SUBJECT_SPAMMY Subject =~ /^[0-9a-zA-Z,.+_\-'!\\\/]{31,}$/ describe LW_SUBJECT_SPAMMY Subject appears spammy (31 or more characters without spaces. Only numbers, letters, and formatting) score LW_SUBJECT_SPAMMY 0.2 #tflags LW_SUBJECT_SPAMMY noautolearn I'm sure this rule could use some improvement. The ones we saw also always followed 2 possible patterns (sometimes both in the same e-mail): 1) Hit HTML_MESSAGE, and either the FREEMAIL_FROM or TRACKER_ID rules. 2) Hit MIME_QP_LONG_LINE and a network test. We have the above 2 in the form of meta rules, scored at 1.0 each. We also have a 3rd meta rule, combining the first rule with the 2 described above, scored at 1.5. This has proven to be quite effective at nuking these spams without FPs, because the likelihood of a ham e-mail setting off all of the above rules is quite low. Regards, Lawrence
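For anyone who wants to sanity-check that Subject pattern against their own corpus, here is a rough Python translation (the function name is mine; SA itself evaluates the Perl RE directly):

```python
import re

# Python translation of the LW_SUBJECT_SPAMMY pattern quoted above:
# 31+ characters drawn only from letters, digits, and a little common
# punctuation, anchored at both ends so any space (or colon) makes it miss.
SPAMMY_SUBJECT = re.compile(r"^[0-9a-zA-Z,.+_\-'!\\/]{31,}$")

def subject_is_spammy(subject):
    """Return True if the whole Subject matches the spammy pattern."""
    return SPAMMY_SUBJECT.match(subject) is not None
```

Note that the anchors mean a leading "Re: " (with its colon and space) prevents a hit, which is one place the rule could be improved.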
Re: fake URL's in mail
On 23/03/2011 4:36 PM, Adam Katz wrote: On 03/23/2011 11:43 AM, Matus UHLAR - fantomas wrote: On 03/21/2011 09:37 AM, Matus UHLAR - fantomas wrote: Does anyone successfully use a plugin, or at least rules, that catch fake URLs? On 21.03.11 13:36, Adam Katz wrote: __SPOOFED_URL, a rule already shipping with SA, does this. I know about the problem with legitimate mail and spoofed URLs. That's why I asked about a plugin that would be able to accept whitelists. That would require an ENORMOUS whitelist and very close attention to its upkeep. I do not see this as practical without using a URIBL-style mechanism (which would also require high maintenance). Even with such a mechanism in place, it unduly penalizes the little guys. Agreed. It's just one of those impractical things and just ain't worth the effort. Regards, Lawrence
Re: BUG : all messages rule RP_8BIT
On 22/03/2011 7:02 PM, Bagnoud Thierry [ezwww.ch] wrote: until 21 March 2011 after the normal cron.daily/update_spamassassin, SpamAssassin reports all messages with the rule RP_8BIT header RP_8BIT Return-Path:raw =~ /[^\000-\177]/ describe RP_8BIT Return-Path contains 8-bit characters with high bit on score RP_8BIT 2.8 Please correct this rule. Thierry Bagnoud I looked through our mail logs and don't see any such hits on our e-mail. If all of your e-mail is hitting this rule, I would think something before SpamAssassin is messing up the Return-Path (perhaps another scanner or the MTA) - Lawrence
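As a quick way to check what that RE actually flags, here is a Python equivalent of the /[^\000-\177]/ test (a sketch of mine, not SA code):

```python
import re

# Matches any character outside 7-bit ASCII (octal \000-\177, i.e. 0x00-0x7F),
# which is what the RP_8BIT pattern looks for in the raw Return-Path header.
EIGHT_BIT = re.compile(r"[^\000-\177]")

def has_8bit_chars(header_value):
    return EIGHT_BIT.search(header_value) is not None
```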
Re: BUG : all messages rule RP_8BIT
On 22/03/2011 7:21 PM, Bagnoud Thierry [ezwww.ch] wrote: oops, since 21 March and not until 21 March, excuse my bad English :-) the modification to the rule on 2011-03-21: -header RP_8BIT Return-Path =~ /[^\000-\177]/ +header RP_8BIT Return-Path:raw =~ /[^\000-\177]/ http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/mmartinec/20_misc.cf?r1=906046&r2=906045&pathrev=906046 perhaps the MTA or MailScanner is messing up the Return-Path Thierry Bagnoud On 3/22/11 5:32 PM, Bagnoud Thierry [ezwww.ch] wrote: hi, until 21 March 2011 after the normal cron.daily/update_spamassassin, SpamAssassin reports all messages with the rule RP_8BIT I don't see this on any inbound email. What are you saying? You want the rule to be changed to the below? I don't see any difference between this rule and your proposed changes except for the score, and I don't recommend changing that score. Leave it at 2.866 1.389 2.866 1.389 header RP_8BIT Return-Path:raw =~ /[^\000-\177]/ describe RP_8BIT Return-Path contains 8-bit characters with high bit on score RP_8BIT 2.8 Please correct this rule. Or are you saying that it is hitting on email that does not have the 8th bit high? You might want to post a full email to pastebin.com and send THE LINK ONLY to this group. Or are you saying it's a false positive on YOUR system, since you like getting emails with illegal chars in the headers? Then add this to local.cf: score RP_8BIT 0 Thierry Bagnoud Something is definitely off. We use SA with MailScanner, and that rule never hits anything (fewer than 1 or 2 messages in several thousand). - Lawrence
Re: Very large subjects in all caps with no spaces
I use the following rule that, combined with other meta rules, catches the majority of these header LW_SUBJECT_SPAMMY Subject =~ /^[0-9a-zA-Z,.+_\-'!\\\/]{31,}$/ describe LW_SUBJECT_SPAMMY Subject appears spammy (31 or more characters without spaces. Only numbers, letters, and formatting) score LW_SUBJECT_SPAMMY 0.2 The key is to score the actual subject rule low, but bump the SA score with meta rules that increase the score as more indicators are hit. I've had moderate success with the rules below: # Rule 2: Message is HTML and has a tracking ID, or comes from a free mail address # Therefore, must hit HTML_MESSAGE, and either TRACKER_ID or FREEMAIL_FROM meta LW_SPAMMY_EMAIL1 (LW_SUBJECT_SPAMMY && HTML_MESSAGE && (TRACKER_ID || FREEMAIL_FROM)) describe LW_SPAMMY_EMAIL1 Spammy HTML message that has a tracking ID or is freemail score LW_SPAMMY_EMAIL1 1.0 #tflags LW_SPAMMY_EMAIL1 noautolearn # Rule 3: Message hits LW_SPAMMY_EMAIL1 and MIME_QP_LONG_LINE # It's unusual for non-spam HTML messages to have really long Quoted-Printable lines meta LW_SPAMMY_EMAIL2 (LW_SPAMMY_EMAIL1 && (MIME_QP_LONG_LINE || __LW_NET_TESTS)) describe LW_SPAMMY_EMAIL2 Spammy HTML message also has a Quoted-Printable line > 76 chars, or hits net check score LW_SPAMMY_EMAIL2 1.0 #tflags LW_SPAMMY_EMAIL2 noautolearn Hope this helps! Regards, Lawrence On 15/03/2011 1:53 AM, jambroo wrote: Is there a way of filtering emails with very large one-word subjects? They are also in all caps. I can see rules that set emails to spam if they contain specific wording, but nothing like this. Thanks.
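To make the layering concrete, here is a toy model (mine, not SA internals) of how the scores stack as more indicators fire:

```python
# Toy re-implementation of the scoring ladder above: the subject rule alone
# adds only 0.2, and each meta rule adds another 1.0 as indicators pile up.
def layered_score(hits):
    total = 0.0
    if "LW_SUBJECT_SPAMMY" in hits:
        total += 0.2
    email1 = ("LW_SUBJECT_SPAMMY" in hits
              and "HTML_MESSAGE" in hits
              and bool({"TRACKER_ID", "FREEMAIL_FROM"} & hits))
    if email1:
        total += 1.0  # LW_SPAMMY_EMAIL1
    if email1 and {"MIME_QP_LONG_LINE", "__LW_NET_TESTS"} & hits:
        total += 1.0  # LW_SPAMMY_EMAIL2
    return total
```

A ham message that trips only one indicator stays near zero, while a message hitting everything picks up 2.2 points on top of whatever the stock rules contribute.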
Re: The one year anniversary of the Spamhaus DBL brings a new zone
On 08/03/2011 4:54 PM, dar...@chaosreigns.com wrote: Looks like that would be something like this? urirhssub URIBL_DBL_REDIRECTOR dbl.spamhaus.org. A 127.0.1.3 body URIBL_DBL_REDIRECTOR eval:check_uridnsbl('URIBL_DBL_SPAM') describe URIBL_DBL_REDIRECTOR Contains a URL listed in the DBL as a spammed redirector domain tflags URIBL_DBL_REDIRECTOR net domains_only score URIBL_DBL_REDIRECTOR 0.1 Anybody know of a domain that hits this? Close. I believe that you should be using this eval:check_uridnsbl('URIBL_DBL_REDIRECTOR') instead of this eval:check_uridnsbl('URIBL_DBL_SPAM') So the correct rule would be urirhssub URIBL_DBL_REDIRECTOR dbl.spamhaus.org. A 127.0.1.3 body URIBL_DBL_REDIRECTOR eval:check_uridnsbl('URIBL_DBL_REDIRECTOR') describe URIBL_DBL_REDIRECTOR Contains a URL listed in the DBL as a spammed redirector domain tflags URIBL_DBL_REDIRECTOR net domains_only score URIBL_DBL_REDIRECTOR 0.1 Regards, Lawrence
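The urirhssub line keys the rule to a specific DNS answer; a toy lookup (mine, covering only the two return codes mentioned in this thread) shows the idea:

```python
# Which rule fires depends on the A record the DBL returns for the queried
# domain; only the two return codes discussed in this thread are listed.
DBL_RULES = {
    "127.0.1.2": "URIBL_DBL_SPAM",        # spam domain
    "127.0.1.3": "URIBL_DBL_REDIRECTOR",  # spammed redirector domain
}

def dbl_rule_for(answer):
    """Map a DBL A-record answer to the rule it would fire, if any."""
    return DBL_RULES.get(answer)
```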
Re: The one year anniversary of the Spamhaus DBL brings a new zone
On 08/03/2011 5:12 PM, Yet Another Ninja wrote: On 2011-03-08 21:24, dar...@chaosreigns.com wrote: Looks like that would be something like this? urirhssub URIBL_DBL_REDIRECTOR dbl.spamhaus.org. A 127.0.1.3 body URIBL_DBL_REDIRECTOR eval:check_uridnsbl('URIBL_DBL_SPAM') describe URIBL_DBL_REDIRECTOR Contains a URL listed in the DBL as a spammed redirector domain tflags URIBL_DBL_REDIRECTOR net domains_only score URIBL_DBL_REDIRECTOR 0.1 Anybody know of a domain that hits this? I tried to post a list of the domains, but Apache's infra rejected it with: Delivery to the following recipient failed permanently: users@spamassassin.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (13.3) exceeded threshold (FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS,T_SURBL_MULTI1,T_SURBL_MULTI2,T_TO_NO_BRKTS_FREEMAIL,T_URIBL_BLACK_OVERLAP,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_PH_SURBL,URIBL_WS_SURBL (state 18). pretty amazing... How so? You posted a list of spam domains, and SpamAssassin picked up on them. Why not try mangling them a bit, like www dot crappydomain dot com? Regards, Lawrence
Re: Should Emails Have An Expiration Date
On 28/02/2011 5:12 PM, Matt wrote: I think this would be a great idea. Many end users never bother to delete old emails, and for some, such as sales, etc., there is no valid reason for them to continue to waste disk and server space. http://www.zdnet.com/news/should-emails-have-an-expiration-date/6197888 Dumbest. Idea. Ever. Regards, Lawrence
Re: [Q] sa-compile: not compiling; 'spamassassin --lint' check failed!
On 15/02/2011 9:27 AM, J4K wrote: spamassassin --lint This may seem obvious, but did you run spamassassin --lint like sa-compile suggested? I assume DCC is probably not loaded, or is disabled in your setup. Open up /etc/spamassassin/local.cf and find this line dcc_add_header 1 Comment it out. The line should look like this when you are done #dcc_add_header 1 Save your changes and run spamassassin --lint again. This time there should be no complaints from it. If there are none, try sa-compile again. Regards, Lawrence
Re: [Q] sa-compile: not compiling; 'spamassassin --lint' check failed!
On 15/02/2011 10:07 AM, J4K wrote: It's pretty moot anyway, because now, after running spamassassin --lint, sa-compile still fails with the same error. Hi, Just because DCC is running doesn't mean SA is configured to use it. Can you post the following: - Output of spamassassin --lint - Contents of /etc/spamassassin/local.cf Those should help pinpoint the exact problem. Regards, Lawrence
Re: eval:html_tag_balance - short tags not accepted?
On 28/01/2011 4:13 AM, Per Jessen wrote: completely valid don't you think? It's invalid HTML and contains no content. Yes, I agree. All HTML/XHTML tags are required to close (XHTML is supposed to be more strict, as it was intended to follow XML structure more closely), but only the ones I mentioned earlier are allowed to self-close with a /. The others require an explicit closing tag in order to be considered valid. Example: <p></p> OT: I am curious to know why the W3C Validator considers <p/> to be valid, when it goes against every bit of documentation from them I've ever read. I agree with John Hardin though. <head/> is more likely to appear in spam than ham (ham being legit e-mail that the sender wants to be readable by even the most broken of HTML renderers, like Outlook 2010). - Lawrence
Re: eval:html_tag_balance - short tags not accepted?
On 28/01/2011 5:28 AM, Per Jessen wrote: <script type=.../> <style type=.../> <fieldset/> <legend/> Sounds like it doesn't care whether the actual tag can be shorthand or not. It just looks at the structure and decides they're valid, even though the HTML and XHTML specifications say otherwise. - Lawrence
Re: Training Bayes on outbound mail
On 28/01/2011 2:53 PM, David F. Skoll wrote: On Fri, 28 Jan 2011 18:10:08 + Dominic Benson domi...@lenny.cus.org wrote: Recently, in order to balance the ham/spam ratio given to sa-learn, I have started to pass mail submitted by authenticated users to sa-learn --ham. I haven't seen any mention of this strategy on-list or on the web, so I'm interested in whether (a) anyone else does this, and (b) is there a good reason not to do it that I haven't thought of? It's possibly a good idea, but you want to be really careful of one thing: Make sure your users are savvy enough not to have their accounts phished. It'll take just one compromised account that blasts out a spam run to destroy the usefulness of your Bayes data. Regards, David. Agreed. I was considering the same idea at one point, and came to the same conclusion. One person could poison the DB completely.
Re: eval:html_tag_balance - short tags not accepted?
On 27/01/2011 4:15 AM, Per Jessen wrote: I've just been looking at a mail that got a hit on HTML_TAG_BALANCE_HEAD due to this: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head/> <body style="width: 800px"> I can't quite figure out whether the short tag syntax is allowed - the HTML above was generated by XSLT based on this input: <head></head> Other popular short tags: <br/> <div/> <p/> - I don't think we should be judging those to be unbalanced HTML tags. /Per Jessen, Zürich As a person who writes HTML/XHTML every single day, there are several flaws in your argument: - <head/> is not valid HTML or XHTML (in any version) - HTML 4.01 Transitional doesn't allow for an XHTML xmlns attribute, nor does it permit short tags - The only valid short tag that you mentioned is <br />. <div/> and <p/> are not - Using a short tag without a space between the name and the / is also not recommended, as it causes problems for older browsers and poorly written HTML parsers. You appear to have made a flawed statement based upon a flawed study (no HTML e-mail will ever be just a <head></head> combination) Regards, Lawrence
Re: eval:html_tag_balance - short tags not accepted?
On 27/01/2011 4:43 PM, Per Jessen wrote: Lawrence @ Rogers wrote: On 27/01/2011 4:15 AM, Per Jessen wrote: I've just been looking at a mail that got a hit on HTML_TAG_BALANCE_HEAD due to this: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head/> <body style="width: 800px"> I can't quite figure out whether the short tag syntax is allowed - the HTML above was generated by XSLT based on this input: <head></head> Other popular short tags: <br/> <div/> <p/> - I don't think we should be judging those to be unbalanced HTML tags. /Per Jessen, Zürich As a person who writes HTML/XHTML every single day, there are several flaws in your argument: - <head/> is not valid HTML or XHTML (in any version) Ah, because it needs at least <title>. Okay. - HTML 4.01 Transitional doesn't allow for an XHTML xmlns attribute, nor does it permit short tags Irrelevant for this issue. SpamAssassin doesn't care about the DTD when it's evaluating for unbalanced tags. Use your imagination and put any suitable DTD instead. - The only valid short tag that you mentioned is <br />. <div/> and <p/> are not They're certainly all valid in XHTML. (the validator at w3c says ok for both). - Using a short tag without a space between the name and the / is also not recommended as it causes problems for older browsers and poorly written HTML parsers. Irrelevant for this issue. You appear to have made a flawed statement based upon a flawed study Gee, what's with the hostility? I never made an argument, I asked a simple question. (no HTML e-mail will ever be just a <head></head> combination) I didn't suggest that. /Per Jessen, Zürich Hi Per, I did not intend for my message to be hostile in any way. My apologies if my terse tone came across that way. <div/> and <p/> may pass the validator, but that is most certainly a bug.
A quick look through the XHTML 1.0 DTDs reveals only ten tags that may be closed using the short form, and I am unable to find any documentation on the W3C web site to support anything otherwise. <area /> <base /> <br /> <col /> <hr /> <img /> <input /> <link /> <meta /> <param /> Using any other shorthanded elements would result in HTML rendering engines choking and giving unpredictable results. What I was suggesting is that your belief is flawed because your test was flawed itself. No e-mail will ever be just <head></head>. Ignoring the fact that a <title> tag is required as a minimum (although many e-mails probably omit it), the <head/> form is invalid as well. SpamAssassin may not care about DTDs and the like, but HTML rendering engines such as the one used in Internet Explorer (where people may be using webmail clients) and Outlook (which recently reverted from IE's engine to a crappy one used in Microsoft Word) do care. Programs that send HTML e-mails are going to do at least the bare minimum to ensure their messages are displayed and readable, and they will know that Internet Explorer's HTML rendering engine is what will most likely be parsing the HTML they supply. This almost ensures that a HTML message will be at least like this <html> <head></head> <body> Some content here </body> </html> Even spammers know that using anything less than the above runs a very real risk of the message being unable to be displayed, which would make the e-mail completely pointless. I believe that the behavior of HTML_TAG_BALANCE_HEAD is valid in this case, as <head/> is invalid HTML (despite what the validator says) and should not be used by anyone. Regards, Lawrence (For what it's worth, <div/> and <p/> are not popular. I've never seen them used on any legit site)
Re: eval:html_tag_balance - short tags not accepted?
On 27/01/2011 5:36 PM, Per Jessen wrote: Lawrence @ Rogers wrote: <div/> and <p/> may pass the validator, but that is most certainly a bug. A quick look through the XHTML 1.0 DTDs reveals only ten tags that may be closed using the short form, and I am unable to find any documentation on the W3C web site to support anything otherwise. <area /> <base /> <br /> <col /> <hr /> <img /> <input /> <link /> <meta /> <param /> Using any other shorthanded elements would result in HTML rendering engines choking and giving unpredictable results. I'm not so sure - I think relatively modern renderers are quite capable of dealing with both <div/> and <p/> without causing any problems. <p/> instead of <p></p> is not unusual. What I was suggesting is that your belief is flawed because your test was flawed itself. No e-mail will ever be just <head></head>. Ignoring the fact that a <title> tag is required as a minimum (although many e-mails probably omit it), the <head/> form is invalid as well. Accepted, but it doesn't change the problem in html_eval_tag() - the code doesn't attempt to validate HTML, it just does a simple regex check for a balanced tag, but doesn't accept or ignore the short tag version with no content. I believe that the behavior of HTML_TAG_BALANCE_HEAD is valid in this case, as <head/> is invalid HTML (despite what the validator says) and should not be used by anyone. True, but html_eval_tag() will fire on _any_ short tag. /Per Jessen, Zürich The problem is that the majority of HTML e-mails out there are being handled by older HTML engines (like IE7 or worse). If it's firing on <head/> with no content, that's completely valid don't you think? It's invalid HTML and contains no content. That throws the balance off. Any HTML coder will never assume that people are using programs with modern HTML capabilities. Most of the world is only now finally letting IE6 die. Could you provide an example of a site using <div/> or <p/> shorthand tags? I've never seen them before anywhere.
Previously, my understanding has always been that shorthanded closing was only allowed for tags that didn't have a closing tag before (such as <meta>). The HTML recommendations support this. Perhaps there is further work to be done in SA regarding handling HTML balancing, but <head/> is pointless to test for, as it has no reason or possible use in the real world. If html_eval_tag() is firing on any short tag, and not just the invalid example code, that would signal a possible bug worth investigating. Cheers, Lawrence
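A crude sketch of the kind of open-versus-close counting being discussed (my illustration, not SA's actual html_eval_tag() code) shows why a self-closed <head/> trips a naive balance check:

```python
import re

# Naive balance check: count <head ...> openings and </head> closings.
# A self-closed <head/> is counted as an opening with no matching close,
# so the tag appears unbalanced -- the behavior debated in this thread.
def head_is_balanced(html):
    opens = len(re.findall(r"<head[\s/>]", html, re.IGNORECASE))
    closes = len(re.findall(r"</head\s*>", html, re.IGNORECASE))
    return opens == closes
```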
Re: BlackBerry Email Being Blocked by SpamAssassin
On 13/01/2011 3:10 PM, Brendan Murtagh wrote: We are running SpamAssassin 3.2.5 (1.1) with IceWarp Mail Server and currently the following are whitelisted within IceWarp: *.bis.na.blackberry.com *.blackberry.com *.blackberry.net A score of 3.0 is much too low a threshold for determining if an e-mail is spam. We have clients get e-mails all the time that score between 3.5 and 4.0 and are non-spam. Anything that scores 5.0 or above is definitely spam in our experience. You may have whitelisted the domains within IceWarp, but all that does is ensure they get to SA for scanning. Nothing more. Cheers, Lawrence
Re: Spam bot Spam seems to be decreasing
On 11/01/2011 4:47 PM, Julian Yap wrote: On Sun, Jan 9, 2011 at 11:42 PM, Jeff Chan je...@surbl.org wrote: On Sunday, January 9, 2011, 12:50:12 PM, Lawrence Rogers wrote: On 09/01/2011 4:41 PM, Jari Fredriksson wrote: On 9.1.2011 18:40, Marc Perkel wrote: Just wondering if anyone else is noticing this. Spam bot spam is down to 1/4 of what it was a year ago. I had noticed my black list shrinking. But here's some raw data from someone who tracks it. Now: http://www.sdsc.edu/~jeff/spam/cbc.html A year ago: http://www.sdsc.edu/~jeff/spam/2010/bc-20100109.html Are we winning? It has been in the news also; spam has decreased since autumn and then again in December. We just have to wait and see if this is permanent. It has been since the shutdown of Spamit late last year http://www.telegraph.co.uk/news/worldnews/europe/russia/8090100/Spam-falls-by-a-fifth-after-Russian-operation-shut-down.html Rustock is spamming again: http://www.spamcop.net/spamgraph.shtml?spamweek http://cbl.abuseat.org/totalflow.html I concur. I see a rise again this week. It really dropped from around Christmas time. - Julian I guess criminals take Christmas off too lol - Lawrence
Re: Spam bot Spam seems to be decreasing
On 09/01/2011 4:41 PM, Jari Fredriksson wrote: On 9.1.2011 18:40, Marc Perkel wrote: Just wondering if anyone else is noticing this. Spam bot spam is down to 1/4 of what it was a year ago. I had noticed my black list shrinking. But here's some raw data from someone who tracks it. Now: http://www.sdsc.edu/~jeff/spam/cbc.html A year ago: http://www.sdsc.edu/~jeff/spam/2010/bc-20100109.html Are we winning? It has been in the news also; spam has decreased since autumn and then again in December. We just have to wait and see if this is permanent. It has been since the shutdown of Spamit late last year http://www.telegraph.co.uk/news/worldnews/europe/russia/8090100/Spam-falls-by-a-fifth-after-Russian-operation-shut-down.html
Re: BOTNET rules question
On 05/01/2011 6:22 PM, Michael Monnerie wrote: Dear list, I received this info from a customer, whose order confirmation from londontheatredirect.com got marked as spam because of BOTNET* rules. Are those rules too old, or is that server in a botnet? How do I find out? Or which rule scores should I tune to optimize? -- Forwarded message -- Date: Tuesday, 28 December 2010 Preview: LondonTheatreDirect.com Order confirmation Many thanks for your order, christian enserer Please print this confirmation for your reference [...] Analysis details: (6.0 points, 5.0 required) Pts Rule name Description -- - -0.5 L_P0F_D7 L_P0F_D7 0.5 L_P0F_W Relayed through Windows OS except Windows XP 0.0 RELAY_UK Relayed through Britain 2.2 BOTNET Relay might be a spambot or virusbot [botnet0.8,ip=88.208.245.26,rdns=server88-208-245-26.live-servers.net,maildomain=londontheatredir... 0.3 BOTNET_IPINHOSTNAME Hostname contains its own IP address [botnet_ipinhosntame,ip=88.208.245.26,rdns=server88-208-245-26.live-servers.net] 0.0 BOTNET_CLIENT Relay has a client-like hostname [botnet_client,ip=88.208.245.26,rdns=server88-208-245-26.live-servers.net,ipinhostname] -0.0 BAYES_40 BODY: Bayes spam probability is 20 to 40% [score: 0.3460] 0.0 HTML_MESSAGE BODY: HTML included in message 0.5 MIME_HTML_ONLY BODY: Message only has text/html MIME parts 0.4 HTML_MIME_NO_HTML_TAG HTML-only message, but there is no HTML tag 1.0 RDNS_DYNAMIC Delivered to internal network by host with dynamic-looking rDNS 0.0 LOTS_OF_MONEY Huge... sums of money 1.6 BOTNET_WIN Mail from Windows XP which seems to be in a Botnet I would suspect that you are using non-standard rules. What's most concerning is the old p0f rules that are looking for Windows XP. The OS of the sender is a dangerous and bad thing to use as a rule. I would remove the p0f and botnet rules if I were you. That would solve your problem. Regards, Lawrence
Re: BOTNET rules question
On 05/01/2011 8:38 PM, RW wrote: Aside from BOTNET_WIN, the p0f rules are low-scoring and add up to zero. Since botnets are 100% Windows, it doesn't seem unreasonable to use p0f in a meta rule. However, you might want to look into this inconsistency: You are right about the overlap, with one rule saying it's Windows XP and the other saying it's not. However, as for botnets, there are a number of Linux botnets nowadays as well. Remember Psyb0t from 2009? So while you can argue Windows is 90%+, it's not alone :) Regards, Lawrence
lots of freemail spam
Hi, Lately, I notice we are getting a fair amount (10-12 per day per client) of spam coming from freemail users (FREEMAIL_FROM triggers). Usually the Subject is non-existent or empty, and the message is always just a URL. Is there a good rule for flagging these as possible spam? I understand that there may be some legit e-mails that would hit all 3 factors, so I would score the rule low. Thoughts? Regards, Lawrence
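A sketch of how those three indicators could be combined (everything here, names, pattern, and logic, is a hypothetical illustration, not an existing SA rule):

```python
import re

# Body consists of nothing but a single URL (plus surrounding whitespace).
URL_ONLY = re.compile(r"^\s*https?://\S+\s*$")

def freemail_url_spam(freemail_from, subject, body):
    """All three indicators from the post: freemail sender, empty or
    missing Subject, and a body that is just a URL."""
    return (freemail_from
            and not subject.strip()
            and URL_ONLY.match(body) is not None)
```

In SA terms this would be a meta rule over FREEMAIL_FROM, an empty-Subject subrule, and a URL-only body subrule, scored low as the post suggests.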
Re: Additional sa-update channels
On 15/12/2010 1:32 PM, Bowie Bailey wrote: On 12/15/2010 11:57 AM, Andy Jezierski wrote: Sorry all, Been away from the list for quite some time. Just updated SA from 3.2.5 to 3.3.1. Have been trying to find a list of sa-update channels that are still relevant, but not with much success. Does anyone know if such a list exists, or which additional channels can still be used? I know a lot of them have been merged into SA, and some are outdated and recommended not to be used. All of the good SARE rules have been merged into SA. All of the SARE update channels should no longer be used (as the rules are no longer being updated). The best additional channel to use at the moment is the Sought ruleset. http://wiki.apache.org/spamassassin/SoughtRules I have to disagree on the Sought rules. I've seen them give quite a few false positives (mostly on e-mail notifications from the social networks Facebook and Twitter), and hit on hardly any spam at all. Your best bet is to use the khop rules, along with the one SARE set still being updated by Daryl. Below are the channels I recommend: updates.spamassassin.org khop-bl.sa.khopesh.com khop-blessed.sa.khopesh.com khop-dynamic.sa.khopesh.com khop-general.sa.khopesh.com khop-sc-neighbors.sa.khopesh.com 90_2tld.cf.sare.sa-update.dostech.net Regards, Lawrence
Re: Additional sa-update channels
On 15/12/2010 3:51 PM, Bowie Bailey wrote: The khop rules are good. I thought the 2tld stuff had been pulled into SA as 20_aux_tlds.cf? It has, but the Daryl-edited one has some additional stuff (I think) that isn't in there. There is conditional code that enables certain rules in the file depending on what version of SA you are running.
Re: facebook phishing, SPF_PASS
On 19/11/2010 4:43 PM, Michael Scheidell wrote: Thought you would be interested: a facebook phishing email (yes, it is) with SPF_PASS (reminding EVERYONE, SPF IS NOT A SPAM VS HAM INDICATOR AT ALL). yes, I publish SPF, I use it in meta rules. this one passed because the sender used an envelope-from in the IP range of the SPF rules. http://secnap.pastebin.com/zTmkSc6J ps, scored a 3.5 here. by now, hopefully, it scores higher with razor/dcc/spamcop, urlbl, etc. I'm not sure how SPF could pass on this one. The sending server doesn't have the same domain name, nor is it using an IP authorized in Facebook's SPF records. SPF is supposed to confirm that the sending server is authorized to send for the domain, but that clearly fails here.
Re: Sought False Positives
On 08/11/2010 12:06 PM, Ned Slider wrote: Fair enough - fortunately I've not seen any of those here so assumed a genuine facebook mail had maybe slipped through into the corpus by mistake. Either way, it was fixed by the time I'd spotted it. I've seen it as well, and disabled the Sought rules. They were causing too many FPs and not hitting enough spam to be worthwhile.
Re: Reservation scam?
On 07/11/2010 8:29 PM, Alex wrote: Hi, I just noticed a handful of emails similar to this scam: http://spamdb.vp44.com/emails/feb09/feb09-234.php I realize it's a scam, but I'm not sure exactly how, and searching produced nothing useful. Is this another 419 scam? Can someone point me to where I can find more info on how this scam works, and more importantly how to stop them? Are there any individual rules developed that people are finding useful? Thanks, Alex Can you post the full headers and body from the spam message (including Received: lines)?
Re: Reservation scam?
On 07/11/2010 10:37 PM, Alex wrote: Hi, Can you post the full headers and body from the spam message (including Received: lines)? Okay, I've figured it out. It's the whole scheme where they convince you to either deposit one of their checks or accept a credit card purchase, then expect you to send real money to some other person or account, all the while they are giving you a fake check or credit card. Here's the latest example: http://pastebin.com/ZUxiLjMy Rules would very much be appreciated. Thanks, Alex It's going to be difficult to help, as you modified the headers before posting the message on Pastebin. Can you put up the full unmodified message? Cheers, Lawrence
Re: new headers rule
On 05/11/2010 10:58 AM, Randy Ramsdell wrote: X-MB-Message-Source: WebUI You appear to have records of the same spam influencing your bayes results (it hits BAYES_99, which is good). What are your Bayes threshold settings? Cheers, Lawrence
Re: new headers rule
On 05/11/2010 6:00 PM, Randy Ramsdell wrote: Lawrence @ Rogers wrote: On 05/11/2010 10:58 AM, Randy Ramsdell wrote: X-MB-Message-Source: WebUI You appear to have records of the same spam influencing your bayes results (it hits BAYES_99, which is good). What are your Bayes threshold settings? Cheers, Lawrence I am not sure what you are asking me. Our spam cutoff is around 5. Note that the above example was from a subject-modified message that made it through SpamAssassin. I simply removed the Subject. In your SpamAssassin configuration, what do you have the following options set to: bayes_auto_learn_threshold_nonspam bayes_auto_learn_threshold_spam Cheers, Lawrence
new headers rule
Hi, I've noticed a bunch of spams coming in recently that have no To: and Subject: and have cobbled together the following rule to combat them. Any feedback would be appreciated. # Message has empty To: and Subject: headers # Likely spam header __LW_EMPTY_SUBJECT Subject =~ /[[:space:]]$/ meta LW_EMPTY_SUBJECT_TO (__LW_EMPTY_SUBJECT && MISSING_HEADERS) describe LW_EMPTY_SUBJECT_TO Message has empty To and Subject headers score LW_EMPTY_SUBJECT_TO 2.5 If anyone would like to test this as part of the mass corpus, please feel free to do so. I am curious to know how it performs. Regards, Lawrence Williams LCWSoft www.lcwsoft.com
Re: new headers rule
On 04/11/2010 5:56 PM, Karsten Bräckelmann wrote: On Thu, 2010-11-04 at 15:55 -0230, Lawrence @ Rogers wrote: I've noticed a bunch of spams coming in recently that have no To: and Subject: and have cobbled together the following rule to combat them. Any feedback would be appreciated. Just as a side note, there is a difference between a missing and an empty header. # Message has empty To: and Subject: headers # Likely spam header __LW_EMPTY_SUBJECT Subject =~ /[[:space:]]$/ That rule does *not* do what you intend. It matches, if the last char of the Subject happens to be a whitespace. By definition, that header is not empty. Moreover, it is not equivalent to a header that has no printable chars, which seems to be what you actually tried the RE to match. How's about this then # Message has empty To: and Subject: headers # Likely spam header __LW_EMPTY_TO To =~ /^[[:space:]]$/ header __LW_EMPTY_SUBJECT Subject =~ /^[[:space:]]$/ meta LW_EMPTY_SUBJECT_TO (__LW_EMPTY_SUBJECT && __LW_EMPTY_TO) describe LW_EMPTY_SUBJECT_TO Message has empty To and Subject headers score LW_EMPTY_SUBJECT_TO 2.5
Re: new headers rule
On 04/11/2010 6:35 PM, Randy Ramsdell wrote: Are the Subject lines blank or missing from the body? And that goes for the To also. In the spam I am seeing, both are present and empty. Example To: Subject:
Re: new headers rule
On 04/11/2010 8:11 PM, Karsten Bräckelmann wrote: Moving back on-list, since it doesn't appear to be personally directed at me. On Thu, 2010-11-04 at 19:22 -0230, Lawrence @ Rogers wrote: On 04/11/2010 7:13 PM, Karsten Bräckelmann wrote: No, that requires the Subject to consist of exactly one whitespace. Read it out loud. The ^ beginning of the string, followed by exactly one whitespace char [2]. Followed by the $ end of the string. No offense, but I am a C and PHP programmer and Perl's documentation is lacking, to put it politely. Too much theory and far too few actual real world examples. This is not about Perl, but Regular Expressions. The much more feature-rich (and widely adopted) Perl flavor, out of all the existing variants. But that's actually irrelevant in this case, cause you would need a very limited sub-set only, pretty much available in any tool sporting REs. Any introduction to REs would do, no need to tend to the Perl docs you don't like. Though it sounds like you didn't even have a look at the docs I pointed you to. That is exactly what I am trying to match, and according to my tests, it works as expected. When the To and Subject are empty, all that's there (before the newline) is one whitespace. Are you referring to the whitespace delimiter between the Header: and its content? It's not part of the content. What I am looking to check is a situation where both the To: and Subject: headers contain nothing at all, but are set (I've seen this in several spam e-mails recently) Now you're confusing me. Do you want to match a single whitespace, or a completely empty header? If there's a better way of doing this, I would appreciate you providing an example. Well, better way... One that does what you just described. Assuming you want to match headers containing nothing at all, as per your previous paragraph. That would be nothing between the beginning and end. header __FOO Foo =~ /^$/ Or, negated, not anything. 
header __FOO Foo !~ /./ Now, since you specifically constrained this, you might want to check for the header's existence. Probably not worth it, though. The following is copied from stock 20_head_tests.cf, and documented in SA Conf. header __HAS_SUBJECT exists:Subject Anyway, in cases like these it's best to provide a *raw* sample, showing the headers in question completely un-munged and exactly as seen by SA. (Otherwise our help often is limited to guessing and an informal description.) This prohibits copy-n-paste from your MUA, which too often changes subtle but important details. One easy way to come to a conclusion whether you want to match whitespace or not, is the following ad-hoc header rule with spamassassin debug. The matching header's contents are shown in double quotes. spamassassin -D --cf='header FOO To =~ /^.*/' < msg 2>&1 | grep FOO And just for reference, 'grep' uses REs... Thanks Karsten, One of these days when I get some free time, I will be sitting down and reading up on REs :) Using your examples, and some hackery, I came up with this. It checks for the existence of the To header as well, as SA doesn't seem to have a rule for doing this on its own (a grep -r exists:To * on the rules pulled in from updates.spamassassin.org produced nothing). # Message has empty To: and Subject: headers # Likely spam header __LW_HAS_TO exists:To header __LW_EMPTY_TO To =~ /^$/ header __LW_EMPTY_SUBJECT Subject =~ /^$/ meta LW_EMPTY_SUBJECT_TO (__HAS_SUBJECT && __LW_HAS_TO && __LW_EMPTY_SUBJECT && __LW_EMPTY_TO) describe LW_EMPTY_SUBJECT_TO Message has empty To and Subject headers score LW_EMPTY_SUBJECT_TO 2.5 I added this to my custom .cf rules file and ran spamassassin --lint and got no complaints. I ran it over a sample spam, and it hit. I took another spam where both headers had information in them, and it didn't hit. Guess it works as expected :) Cheers, Lawrence
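The three patterns debated in this thread really do behave differently, and that is easy to confirm outside SA. A minimal Python sketch (Python's re is used here as a stand-in for SA's Perl-flavored engine; these particular patterns behave the same in both):

```python
import re

# The three candidate patterns from the thread, in Python syntax.
trailing_ws = re.compile(r"\s$")   # original attempt: last char is whitespace
one_ws      = re.compile(r"^\s$")  # exactly one whitespace char, nothing else
empty       = re.compile(r"^$")    # a truly empty header value

samples = ["", " ", "Hello", "Hello "]

for value in samples:
    print(repr(value),
          "trailing_ws:", bool(trailing_ws.search(value)),
          "one_ws:", bool(one_ws.search(value)),
          "empty:", bool(empty.search(value)))
```

Only the empty string matches `/^$/`; a single space matches `/^\s$/` but not `/^$/`; and `/\s$/` fires on any subject that merely ends in whitespace, which is the false-positive trap Karsten pointed out.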
new rule
Hi, Does anyone see anything wrong with this rule I just put together. It is a meta rule that is intended to attempt to detect HTML-only spam with forged freemail Reply-To: header meta LW_HTML_REPLYTO_FORGED (FREEMAIL_FORGED_REPLYTO && HTML_MESSAGE && MIME_HTML_ONLY) describe LW_HTML_REPLYTO_FORGED HTML-only message with forged freemail reply-to score LW_HTML_REPLYTO_FORGED 2.0 tflags LW_HTML_REPLYTO_FORGED noautolearn I've set it to noautolearn while testing. Regards, Lawrence
comparing From and Reply-To:
As a sort of follow up to my last message, I was wondering how complicated it is to write a rule that would compare the From: and Reply-To: headers, and set it to 0.001 or make it a meta rule that could be used in conjunction with others? Would this plugin suffice? http://wiki.apache.org/spamassassin/FromNotReplyTo Regards, Lawrence
Re: comparing From and Reply-To:
On 02/11/2010 6:43 PM, Chris Conn wrote: On 2010-11-02 17:01, Lawrence @ Rogers wrote: As a sort of follow up to my last message, I was wondering how complicated it is to write a rule that would compare the From: and Reply-To: headers, and set it to 0.001 or make it a meta rule that could be used in conjunction with others? Would this plugin suffice? http://wiki.apache.org/spamassassin/FromNotReplyTo Regards, Lawrence I use this plugin for precisely that. We have modified the plugin to match particular addresses in order to score highly for phishing and whatnot. Chris I've gotten it working here and it seems to do exactly what I want. Compare the 2 e-mail addresses only, and ignore the extra crap like the name and such. I've set it to score 0.001 and used it as part of a few meta rules to help out with some spam.
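The FromNotReplyTo plugin linked above implements this check inside SA; purely as an illustration of the core idea (comparing only the address part of From: and Reply-To:, ignoring display names), here is a sketch using Python's standard library. The function name is mine, not the plugin's:

```python
from email.utils import parseaddr

def from_differs_from_replyto(from_header: str, replyto_header: str) -> bool:
    """Return True if the bare addresses differ (case-insensitively).

    parseaddr strips the display name, so 'Bob <bob@example.com>'
    and 'bob@example.com' compare equal.
    """
    from_addr = parseaddr(from_header)[1].lower()
    replyto_addr = parseaddr(replyto_header)[1].lower()
    return bool(from_addr and replyto_addr) and from_addr != replyto_addr

print(from_differs_from_replyto("Bob <bob@example.com>", "bob@example.com"))
print(from_differs_from_replyto("Bob <bob@example.com>", "eve@phish.example"))
```

This mirrors the behaviour described in the thread: the comparison ignores "the extra crap like the name" and looks only at the addresses themselves, making it suitable as a low-scoring building block for meta rules.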
Re: Rule works in testing, but not hitting live mail
On 29/10/2010 3:32 PM, NFN Smith wrote: header LR_OBSC_RECIPS To =~ /\\\/ Is this rule being used standalone, or as part of a meta rule? Do you have a score declared for it? If so, what is it? Does spamassassin --lint report any errors at the end of its output? Cheers, Lawrence
Re: Rule works in testing, but not hitting live mail
On 29/10/2010 4:06 PM, NFN Smith wrote: Lawrence @ Rogers wrote: On 29/10/2010 3:32 PM, NFN Smith wrote: header LR_OBSC_RECIPS To =~ /\\\/ Is this rule being used standalone, or as part of a meta rule? Do you have a score declared for it? If so, what is it? Right now, I'm scoring at 1.25 points. Thus, it's not a hidden rule. Also, in testing, not only is the rule showing as expected in the SpamAssassinReport.txt attachment, the debug log is also showing that the rule is firing correctly. Oct 29 18:30:18.807 [27696] dbg: rules: ran header rule LR_OBSC_RECIPS == got hit: Does spamassassin --lint report any errors at the end of its output? I double-checked, and a --lint check comes up clean. When I apply rules updates to working configurations, I use a script, and part of that script includes a --lint check. If --lint complains, then I don't replicate the update to my production servers. Smith Are you running it against an e-mail with a known match? Using spamassassin -D -t sample-spam.txt and having sample-spam.txt contain the complete e-mail including headers? Are you sure the machine in question doesn't have 2 copies of SA installed? (I have seen this before on cPanel servers: one installed via CPAN and the other via RPM.)
Re: Collecting IP reputation data from many people
On 28/10/2010 1:45 PM, David F. Skoll wrote: OK, On a somewhat less sarcastic note: One reason we didn't use TCP is that it simply doesn't scale. If you have clients that open a TCP connection, do a report, and then close the TCP connection, there's a huge bandwidth penalty. On the other hand, if your clients maintain persistent TCP connections, your server is going to run out of sockets rather quickly. Remember, our system is designed to scale to tens or hundreds of thousands of reporting systems sending tens or hundreds of thousands of reports per second. Regards, David. What reporting system do you use, and how does one avail of the data it provides?
Re: rule to catch subject spamming
On 23/10/2010 5:47 PM, RW wrote: On Sat, 23 Oct 2010 14:28:38 -0230 Lawrence @ Rogers lawrencewilli...@nl.rogers.com wrote: Hello all, I noticed recently that our users are getting spam with the subject similar to the following: SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet I got some of these a while ago. They were pretty hard to catch because they came through Hotmail and had little to work with in the body. I added: header SUBJ_LONG_WORD Subject =~ /\b[^[:space:][:punct:]]{30}/ describe SUBJ_LONG_WORD Longwordinsubjectlikethis score SUBJ_LONG_WORD 2.0 header SUBJ_JOIN_CAP_WORD Subject =~ /([[:upper:]]+[[:lower:]]+){5}/ describe SUBJ_JOIN_CAP_WORD JoinedCapitalizedWordsRuntogether score SUBJ_JOIN_CAP_WORD 1.5 They are missing some ?:, but for single header rules I don't really care. Thanks, but some testing showed that your rules FP on URLs in the Subject line. I have settled on the following as it's more specific and less prone to FPs (I can't think of any possibilities right now) # Matches a new technique used by spammers in the Subject line # Running a bunch of pornographic words together (with no spaces) to evade spam filters # The message itself is generally malformed HTML with one or more unusually long lines # This rule is a meta rule that tests for the Subject containing any numbers, letters, or common formatting # Must hit at least 3 SA rules (__LOCAL_SUBJECT_SPAMMY, and 2 others... usually HTML_MESSAGE and MIME_QP_LONG_LINE) # string must be at least 42 characters and contain no spaces header __LOCAL_SUBJECT_SPAMMY Subject =~ /^[0-9a-zA-Z,.+]{42,}$/ meta LOCAL_SUBJECT_SPAMMY1 ((__LOCAL_SUBJECT_SPAMMY + HTML_MESSAGE + MIME_QP_LONG_LINE + MPART_ALT_DIFF + TRACKER_ID) > 2) describe LOCAL_SUBJECT_SPAMMY1 Subject looks spammy (contains a lot of characters, and no spaces) score LOCAL_SUBJECT_SPAMMY1 5.0 tflags LOCAL_SUBJECT_SPAMMY1 noautolearn Cheers, Lawrence Williams LCWSoft
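Whether the tightened pattern really avoids the URL false positives is easy to verify; a quick Python sketch (same regex in Python syntax — this is my own test harness, and the URL subject is a made-up example, not from the thread):

```python
import re

# The settled rule's pattern: 42+ chars, only alphanumerics and , . +
# anchored to the whole subject, so no spaces and no URL punctuation.
subject_spammy = re.compile(r"^[0-9a-zA-Z,.+]{42,}$")

spam_subject = "SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet"
url_subject  = "http://www.example.com/averylongpathsegmentthatkeepsgoingandgoing"

print(bool(subject_spammy.search(spam_subject)))  # one long unbroken run: hits
print(bool(subject_spammy.search(url_subject)))   # ':' and '/' break it: no hit
```

The ':' and '/' in any URL fall outside the character class, so the anchored pattern cannot match a URL-bearing subject — which is exactly why this version is less FP-prone than a bare long-run check.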
compare 2 headers
Hi, Is there a quick way to compare 2 headers? I am seeing spam lately that has an invalid e-mail address (one not hosted by us) set in the To: header, but has the intended one in the Envelope-To: header What I would like to do is take the Envelope-To and run a regex to check if the To: header contains it. Is this possible? Regards, Lawrence Williams LCWSoft
Re: compare 2 headers
On 24/10/2010 5:44 PM, Karsten Bräckelmann wrote: On Sun, 2010-10-24 at 16:26 -0230, Lawrence @ Rogers wrote: Is there a quick way to compare 2 headers? I am seeing spam lately that has an invalid e-mail address (one not hosted by us) set in the To: header, but has the intended one in the Envelope-To: header What I would like to do is take the Envelope-To and run a regex to check if the To: header contains it. The To header is merely cosmetic. It does not have any solid meaning, in particular does not necessarily match the recipient. There are perfectly valid reasons to not have the actual recipient in the To header. Ever sent a message with Bcc recipients? Ever received a post via a mailing list? I had not thought of that, but you are right :) I see this mailing list sets the To: header to users@spamassassin.apache.org, even though the e-mail comes to me. I am writing a rule that deals with spam that claims to be coming from AOL's webmail client, where the e-mail has malformed HTML, references to remote images, and a high ratio of images to content. I guess I will have to find another way to detect them.
Re: compare 2 headers
On 24/10/2010 9:27 PM, Martin Gregorie wrote: On Sun, 2010-10-24 at 18:03 -0230, Lawrence @ Rogers wrote: On 24/10/2010 5:44 PM, Karsten Bräckelmann wrote: There are perfectly valid reasons to not have the actual recipient in the To header. Ever sent a message with Bcc recipients? Ever received a post via a mailing list? I had not thought of that, but you are right :) I see this mailing list sets the To: header to users@spamassassin.apache.org, even though the e-mail comes to me. You might want to write a very low scoring rule (score 0.01) that fires on 'List-id' headers for mailing lists you are subscribed to, and use this in meta rules to apply different rules to mail from known mailing lists and everything else. Martin Thanks, but I decided to go a different route with this one, as Karsten was right (it was too risky).
rule to catch subject spamming
Hello all, I noticed recently that our users are getting spam with the subject similar to the following: SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet SpamAssassin seems to be having a hard time determining whether it is spam or not because it appears as one long word. In all cases, the subject contains no spaces (to prevent detection I would think) and is longer than 62 characters (not sure why they do this, but it is true in every sample I've seen so far). I would like to create a rule to pick up on this, but am having a bit of difficulty with the regex for the rule. This is what I've come up with so far header CR_SUBJECT_SPAMMY Subject =~ /.{62}/ describe CR_SUBJECT_SPAMMY Subject looks spammy (contains a lot of characters, and no spaces) score CR_SUBJECT_SPAMMY 2.5 I just need to modify the regex to check that the Subject contains no spaces. I've done some research, and the longest non-coined word in a major dictionary is 30 characters long, meaning that if it was used twice in a subject, the total length would still only be 60 characters. There may be some FPs if the sender used formatting like commas and such, but the possibility of them using two such words, joined without spacing, would probably be extremely remote. Any assistance or advice would be greatly appreciated. Regards, Lawrence Williams LCWSoft
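For the specific question asked here — 62+ characters with no spaces — a run of consecutive non-whitespace characters does it. A Python illustration of the idea (my sketch, not the rule the thread eventually settled on; the ham subject is an invented example):

```python
import re

# 62 or more consecutive non-whitespace characters anywhere in the Subject.
long_run = re.compile(r"\S{62}")

spam = "SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet"
ham  = "Meeting notes for the quarterly planning session on Friday"

print(bool(long_run.search(spam)))  # the 69-char run-together subject hits
print(bool(long_run.search(ham)))   # spaces break every run well short of 62
```

Unlike the original `/.{62}/`, which `.` lets match across spaces, `\S{62}` only fires when a single unbroken "word" reaches the threshold.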
Re: rule to catch subject spamming
On 23/10/2010 2:28 PM, Lawrence @ Rogers wrote: [...] This is the rule I've come up with now # Matches a new technique used by spammers in the Subject line # Running a bunch of pornographic words together (with no spaces) to evade # spam filters # This rule tests for the Subject containing any numbers, letters, or common formatting # string must be at least 42 characters and contain no spaces header CR_SUBJECT_SPAMMY Subject =~ /^[0-9a-zA-Z,.+]{42,}$/ describe CR_SUBJECT_SPAMMY Subject looks spammy (contains a lot of characters, and no spaces) score CR_SUBJECT_SPAMMY 3.5 tflags CR_SUBJECT_SPAMMY noautolearn
prevent rule from being considered for Bayes auto-learning
Hi, I recall reading somewhere that there is a way to prevent a rule from being considered for Bayes auto-learning. I am trying to create a rule that hits upon some obvious spam that I am seeing, yet I want to make sure (for now) that any scores it assigns are not used for anything Bayes-related. I cannot seem to find any documentation on how to do this (Google doesn't help). I think it is something to do with setting a tflag, but any guidance would be appreciated. Regards, Lawrence Williams LCWSoft www.lcwsoft.com
Re: prevent rule from being considered for Bayes auto-learning
On 21/10/2010 2:17 PM, Karsten Bräckelmann wrote: On Thu, 2010-10-21 at 18:39 +0200, Karsten Bräckelmann wrote: See M::SA::Plugin::AutoLearnThreshold. In a nutshell, (a) there are a few tflags that will prevent a rule's score from being used for auto-learning and (b) the score used is picked from the respective non-bayes score-set. With (a) you can make a rule invisible to the auto-learning decision. And by setting the scores for score-set 0 and 1 both to 0 as per (b), you can effectively disable a rule unless Bayes is enabled. ... *and* have that rule ignored for the auto-learning decision, if Bayes and auto-learn is enabled. (Actually not ignored, but adding zero doesn't influence the result. ;) The tflags way is much more straightforward, though. You cannot, however, create a rule to conditionally prevent auto-learning altogether (which, as I understand isn't what you had in mind anyway). Thanks everyone, I have set the rule to noautolearn using the tflags directive (this is what I wanted, for the rule to simply not be considered when auto-learning). - Lawrence
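Karsten's point (a) boils down to a simple filter over the rules that fired: rules carrying certain tflags contribute nothing to the score the learner compares against the thresholds. A deliberately simplified Python model of that decision (the real M::SA::Plugin::AutoLearnThreshold applies further conditions, e.g. requiring points from both header and body rules; the function names here are mine, the 0.1/12.0 defaults are SA's stock bayes_auto_learn_threshold values):

```python
# tflags that exclude a rule's score from the auto-learn decision.
BLOCKING_TFLAGS = {"noautolearn", "userconf", "learn"}

def autolearn_score(hits):
    """hits: list of (score, set-of-tflags) for every rule that fired."""
    return sum(score for score, tflags in hits
               if not (tflags & BLOCKING_TFLAGS))

def autolearn_verdict(hits, nonspam_threshold=0.1, spam_threshold=12.0):
    s = autolearn_score(hits)
    if s <= nonspam_threshold:
        return "learn as ham"
    if s >= spam_threshold:
        return "learn as spam"
    return "no autolearning"

# A noautolearn rule fires alongside two ordinary rules: its 2.5 points
# count toward the message verdict, but not toward the learning decision.
hits = [(2.5, {"noautolearn"}), (5.0, set()), (8.0, set())]
print(autolearn_score(hits))    # 13.0 -- the noautolearn rule is ignored
print(autolearn_verdict(hits))  # learn as spam
```

This is exactly why tagging Lawrence's experimental rule noautolearn is safe: the rule still scores the message, but a burst of hits from it can never push borderline mail over the auto-learn threshold and poison Bayes.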