Re: anyone know anything about lashback?
On 8/9/2011 3:39 AM, Michael Scheidell wrote:

Does anyone know about this rbl? http://www.lashback.com/blacklist/ We have a persistent sender who is sending phishing emails through a large corporate server (not ours .. ;-) The only two reputation filters that list them are the commercial DCC, and lashback. (Oh, where do I submit the phishing url...) It's not listed either.

http://www.spamtips.org/2011/05/dnsbl-safety-report-5142011.html

UBL has been tested multiple times in the past two years with dismal results. When tested against spamassassin's corpus, it has demonstrated poor spam detection rates, high false positives, and high unreliability, making it dangerous to rely upon for your spam filtering deployment. Our own tests were not alone in making this assessment.

Warren
Re: sa-update failing
On 7/16/2011 4:54 AM, dar...@chaosreigns.com wrote:

On 07/15, ssapp80 wrote: Running spamassassin-3.3.2 on CentOS 5.5, perl-Net-DNS ver 0.59 installed. When I run sa-update I receive the following failures on the Net::DNS module: name2labels is not exported by the Net::DNS module

My guess is Net::DNS version 0.59 is too old. In which case it would be nice if spamassassin specified that in its use statement so it would give a more useful error. If upgrading that perl module fixes the problem, it might be worth opening a bug to improve that error. I'm using Net::DNS v0.65.

EL-5 has perl-Net-DNS 0.59. I've been using it in production for years without issue, even with 3.3.2. This is not the droid we're looking for. Move along.

Warren
Re: sa-update failing
On 7/17/2011 7:55 AM, Axb wrote:

On 2011-07-17 18:32, Warren Togami Jr. wrote:

On 7/16/2011 4:54 AM, dar...@chaosreigns.com wrote:

On 07/15, ssapp80 wrote: Running spamassassin-3.3.2 on CentOS 5.5, perl-Net-DNS ver 0.59 installed. When I run sa-update I receive the following failures on the Net::DNS module: name2labels is not exported by the Net::DNS module

My guess is Net::DNS version 0.59 is too old. In which case it would be nice if spamassassin specified that in its use statement so it would give a more useful error. If upgrading that perl module fixes the problem, it might be worth opening a bug to improve that error. I'm using Net::DNS v0.65.

EL-5 has perl-Net-DNS 0.59. I've been using it in production for years without issue, even with 3.3.2. This is not the droid we're looking for. Move along. Warren

Unless RH has patched perl-Net-DNS 0.59; IIRC, the original had some issues in RR.pm (going back to 2007), including some security thingie. I'd recommend using 0.63 or higher. http://pkgs.repoforge.org/perl-Net-DNS/ offers a good choice. (I'm using 0.66 and happy.)

Looking at the changelog, it appears they did patch several bugs and a security issue. All I know is I have had no issues using EL-5's 0.59 for years now.

Warren
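To sanity-check an installed module version before chasing other causes, a naive dotted-version comparison is enough. This helper and its names are mine, not anything shipped by spamassassin, and the version numbers come from this thread; real Perl version strings (e.g. with underscores) would need more care.

```python
def version_tuple(v):
    """Split a dotted version string like '0.59' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum required."""
    return version_tuple(installed) >= version_tuple(minimum)

# EL-5 ships perl-Net-DNS 0.59; the recommendation above is 0.63 or higher.
print(meets_minimum("0.59", "0.63"))  # False
print(meets_minimum("0.66", "0.63"))  # True
```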
Re: sa-update failing
On 7/15/2011 10:35 AM, ssapp80 wrote:

Running spamassassin-3.3.2 on CentOS 5.5, perl-Net-DNS ver 0.59 installed. When I run sa-update I receive the following failures on the Net::DNS module: name2labels is not exported by the Net::DNS module Can't continue after import errors at /usr/lib/perl5/vendor_perl/5.8.8/Net/DNS/RR/NSEC3.pm line 24 BEGIN failed--compilation aborted at

Why CentOS 5.5? http://www.spamtips.org/p/rpm-packages.html I am running CentOS 5.6 here with the standard OS + EPEL packages plus the SpamTips.org spamassassin-3.3.2 RPM. Are you doing anything different from this? I have never seen an error like that before.

Why are you running sa-update manually? The upstream RPM and the SpamTips.org RPM (both designed by me) automatically run sa-update once per day if spamd is running.

Warren Togami war...@togami.com
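For anyone not using those RPMs, a nightly sa-update can be wired up with a cron entry along these lines. The paths and the reload step are assumptions about a typical EL layout, not taken from either package:

```
# /etc/cron.d/sa-update (hypothetical; adjust paths for your system)
# sa-update exits 0 when new rules were installed, nonzero otherwise,
# so spamd is only reloaded when there is actually something new.
30 4 * * * root /usr/bin/sa-update && /sbin/service spamassassin reload
```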
Re: spamassassin 3.3.2 rpms for el4 / centos4 etc ???
On 7/11/2011 7:53 PM, R - elists wrote:

Its removal was based at least in part on a belief that it was not actually usable for anybody. You could take it up with the dev list, particularly if you're up for maintaining it in a way that's useful for the major rpm platforms. Either way you probably want to talk to Warren Togami, the resident RedHat guy. I'd like to see it included, but nobody was willing to maintain it. You should be able to easily copy the relevant files from the 3.3.1 tarball, if they worked for you.

Darxus, thanks for the info. I checked the bug link you gave, and frankly, pulling the .spec file because of https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6314 doesn't make any sense to me, yet what do I know... ;-) Anyways, if I knew what the relevant files were between the two, I'd take a shot at it. Looks like it might be time to find a different solution. Bums us out cause we have actually been supportive (in a small personal way) of the SA people / project. - rh

I am sorry that you have been inconvenienced by this change. It was done because, far more often, it has caused support confusion and breakage for RPM distributions: the rpmbuild -ta packages are incompatible with the way spamassassin is packaged by all distributions. The rpmbuild -ta method has never been a supportable method; people often experience problems after installing it that way, report them, cause confusion for distributors, and in the end those reports were always ignored.

The official .spec files for EL5, EL6 and Fedora are identical. It could technically build and work on EL4 with minor changes, but I dropped support for EL4 LONG AGO because the old version of perl there has problems with the proper operation of spamassassin. http://en.wikipedia.org/wiki/Red_hat_enterprise_linux In any case, it is time that you upgrade from EL4 because its supported lifetime ends February 2012. It's a good time to upgrade to EL6, which is supported until the year 2017. RHEL6 is great.
CentOS 6 was just released. Scientific Linux 6 has been out for a while now, and 6.1 is coming real soon.

Warren Togami war...@togami.com
SpamTips.org: Why run your own DNS server?
Hey folks,

http://www.spamtips.org/2011/07/spamassassin-why-run-your-own-dns.html I wrote this article about why it can be important to run your own DNS server if you have a busy Spamassassin deployment. Does anyone have better tips, an alternate DNS resolver to suggest, or configuration options to improve this suggested configuration?

http://www.spamtips.org/p/ultimate-setup-guide.html Please see my Ultimate Setup Guide for all the latest tweaks to maximize your Spamassassin effectiveness and safety. Do you have any tips or tricks that are not mentioned here?

https://admin.fedoraproject.org/mailman/listinfo/spamassassin-news Subscribe here for my Spamassassin for Sysadmins Newsletter.

Thanks,
Warren Togami war...@togami.com
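As background on why a busy deployment hammers DNS so hard: every DNSBL test is a plain DNS lookup against a name built from the reversed client IP, and a single message can trigger dozens of them. A minimal sketch; the zone name is a placeholder and is_listed is my own illustrative helper, not spamassassin's code.

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build a DNSBL query name: reverse the IPv4 octets, append the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(ip, zone="dnsbl.example.org"):
    """True if the zone answers for this IP (i.e. the IP is listed)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
# 1.2.0.192.dnsbl.example.org
```

Multiply a handful of such lookups by every message scanned and the value of a nearby caching resolver becomes obvious.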
Re: SpamTips.org: Why run your own DNS server?
On 7/4/2011 12:58 AM, Toni Mueller wrote:

Hi Warren,

On Mon, 04.07.2011 at 00:46:15 -1000, Warren Togami Jr. wtog...@gmail.com wrote: http://www.spamtips.org/2011/07/spamassassin-why-run-your-own-dns.html Anyone have any better tips of an alternate DNS resolver, or configuration options to improve this suggested configuration?

While I do agree that it is generally a very good idea to run your own DNS resolver, even if you have less than one mail per day, I am thoroughly unconvinced about the qualities of PowerDNS. I do have a suggested alternative, though: http://unbound.net/ This server doesn't resort to proprietary changes to the DNS protocol (like inventing new record types that no one else understands), but concentrates on delivering DNS according to the latest specs instead.

I have heard others recommend unbound, but I haven't tried it yet. Is it more RAM efficient than the other alternatives, and fast? I don't believe pdns-recursor is guilty of this particular complaint, as it is ONLY a recursor.

Warren
Re: SpamTips.org: Why run your own DNS server?
On 7/4/2011 1:52 AM, Axb wrote:

On 2011-07-04 12:46, Warren Togami Jr. wrote: Hey folks, http://www.spamtips.org/2011/07/spamassassin-why-run-your-own-dns.html I wrote this article about why it can be important to run your own DNS server if you have a busy Spamassassin deployment. Anyone have any better tips of an alternate DNS resolver, or configuration options to improve this suggested configuration? Warren

Sadly, your post has unleashed a sequence of pretty useless hints and rants.

There is a drawback to running pdns-recursor. The above pdns-recursor instance is using ~400MB of memory. If you cannot afford this kind of memory use, you can reduce the limits in options max-cache-entries and max-packetcache-entries in /etc/pdns-recursor/recursor.conf as documented upstream. You will need to find a balance between memory use and effective cache hit performance.

A small site will never use 400MB of DNS caching... don't scare ppl unnecessarily :) Larger sites already do local recursion and have the iron to do it. (Other recursors will also use a lot of memory under high-ish load.)

I am not 100% certain about this, but it appears that pdns-recursor is tuned to normal patterns of DNS lookups (like web browsing or maybe a squid proxy server). It is caching a large amount of useless data, evidenced by the piss terrible cache hit ratio. My in-brain logic, without testing, suggested that timing out most of that nearly-useless cache may shrink memory usage considerably without making that poor cache hit ratio much worse, since more recent data is often more relevant. That is my theory anyway. I'm testing that now.

Be careful when endorsing: For example, DNS results of DNSBL and URIBL's are very transient in nature with tiny TTL's, so perhaps we could substantially reduce memory usage by forcing max-cache-ttl and max-negative-ttl to a much smaller duration. It also appears that the packetcache is far more effective than the cache at achieving hits, so we may be better off favoring the packetcache rather than the memory hogging and less effective cache.

Reducing negative TTL time should ONLY be done if the user runs *local* copies of most of the queried BLs, otherwise he may hit BL abuse thresholds way earlier. BLs generally adjust their negative TTL to get a practical balance between query load and positive hits. Gaming these settings can become a costly process. Axb

Good point, I'll remove that paragraph for now and actually test that theory myself to see how it affects the actual hit/miss ratio.

Warren
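For concreteness, the tunables discussed above all live in /etc/pdns-recursor/recursor.conf. The values below are only illustrative starting points for the kind of experiment described, not tested recommendations:

```
# /etc/pdns-recursor/recursor.conf -- illustrative values only
max-cache-entries=200000        # cap the record cache to bound memory
max-packetcache-entries=200000  # the packet cache seems to earn its keep
max-cache-ttl=3600              # drop cached records after an hour at most
max-negative-ttl=600            # see the caveat above before shrinking this
```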
Re: SpamTips.org: Why run your own DNS server?
On 7/4/2011 1:52 AM, Axb wrote: A small site will never use 400MB of DNS caching... don't scare ppl unnecessarily :) Larger sites already do local recursion and have the iron to do it. (Other recursors will also use a lot of memory under high-ish load.)

It is also possible that pdns-recursor just sucks and I should be trying other daemons. I will try unbound next.

Warren
Re: Rule updates
On 6/27/2011 7:03 AM, dar...@chaosreigns.com wrote:

On 06/27, Lars Jørgensen wrote: I noticed the rules for 3.3.1 were updated during the weekend (don't worry about my workaholism, I noticed this Monday morning ^-^). I was preparing to upgrade to 3.3.2, but seeing the updated rules makes me doubt whether the upgrade is necessary.

I expect rule updates to remain compatible throughout the 3.3.x series, so as long as updates are happening for any 3.3.x version, you should get them, and they should work, with 3.3.1 (and 3.3.0, etc.). That *could* change, I suppose, but I don't expect it. There has been talk of adding a rule to hit all emails for versions no longer being maintained, something like SPAMASSASSIN_OUT_OF_DATE: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6614

3.3.x is the first version that supports rule conditionals, so it is possible that 3.4.x rule updates could refer to plugins that do not exist in 3.3.x, and those sections would be safely ignored by 3.3.x. It seems the intent is to release 3.4 late this year. I heard that the only compat change from 3.3.x to 3.4.x is in the spamc/spamd protocol, so it should theoretically be an easy upgrade. It remains to be seen exactly what is decided for 3.3.x rule updates after 3.4.x is released.

Warren
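The conditional syntax in question lets an update file guard rules behind a version test so that older installs skip blocks they cannot use. A schematic example; the rule name and header here are hypothetical, only the if/endif construct is real:

```
# Only evaluated by SpamAssassin 3.4.0 and later; older versions skip it
if (version >= 3.004000)
  header MY_HYPOTHETICAL_RULE exists:X-Example-Header
  score  MY_HYPOTHETICAL_RULE 0.1
endif
```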
ANNOUNCE: Apache SpamAssassin 3.3.2 available
Release Notes -- Apache SpamAssassin -- Version 3.3.2

Introduction

This is a minor release, primarily to support perl-5.12 and later. Additionally several other minor bugs are fixed.

Downloading and availability

Downloads are available from: http://spamassassin.apache.org/downloads.cgi

md5sum of archive files:

253f8fcbeb6c8bfcab9d139865c1a404 Mail-SpamAssassin-3.3.2.tar.bz2
d1d62cc5c6eac57e88c4006d9633b81e Mail-SpamAssassin-3.3.2.tar.gz
06d84d34834d9aecdcdffcc4de08b2a7 Mail-SpamAssassin-3.3.2.zip
72f8075499c618518c68c7399f02b458 Mail-SpamAssassin-rules-3.3.2-r1104058.tar.gz

sha1sum of archive files:

f38480352935fe3bb849a27a52615e400dee7d66 Mail-SpamAssassin-3.3.2.tar.bz2
de954f69e190496eff4a796a9bab61747f03072b Mail-SpamAssassin-3.3.2.tar.gz
edc6297dc651eeb7a4872f596ec5a54aeea85349 Mail-SpamAssassin-3.3.2.zip
a199d5f0f8c2381e3dfe421e7a774356b3ffda4b Mail-SpamAssassin-rules-3.3.2-r1104058.tar.gz

Note that the *-rules-*.tar.gz files are only necessary if you cannot, or do not wish to, run sa-update after install to download the latest fresh rules. See the INSTALL and UPGRADE files in the distribution for important installation notes.

GPG Verification Procedure

The release files also have a .asc accompanying them. The file serves as an external GPG signature for the given release file.
The signing key is available via the wwwkeys.pgp.net key server, as well as http://www.apache.org/dist/spamassassin/KEYS

The key information is:

pub 4096R/F7D39814 2009-12-02
    Key fingerprint = D809 9BC7 9E17 D7E4 9BC2 1E31 FDE5 2F40 F7D3 9814
uid SpamAssassin Project Management Committee priv...@spamassassin.apache.org
uid SpamAssassin Signing Key (Code Signing Key, replacement for 1024D/265FA05B) d...@spamassassin.apache.org
sub 4096R/7B3265A5 2009-12-02

To verify a release file, download the file with the accompanying .asc file and run the following commands:

gpg -v --keyserver wwwkeys.pgp.net --recv-key F7D39814
gpg --verify Mail-SpamAssassin-3.3.2.tar.bz2.asc
gpg --fingerprint F7D39814

Then verify that the key matches the signature.

Note that older versions of gnupg may not be able to complete the steps above. Specifically, GnuPG v1.0.6, 1.0.7 and 1.2.6 failed while v1.4.11 worked flawlessly. See http://www.apache.org/info/verification.html for more information on verifying Apache releases.

Summary of major changes since 3.3.1

NOTE: Complete changes are available at http://svn.apache.org/repos/asf/spamassassin/branches/3.3/Changes

Bug #6353: Fix FH_FROMEML_NOTLD, add MISSING_FROM
Bug #6427: Spamc windows header library missing two defines.
Bug #6476: patch to fix missing sa-awl man page bug
Bug #6470: Small change in windows to exit stating that the exit status is unknown. Thanks to Daniel Lemke for many of these small win32 patches.
Bug #6314: Complete removal of spamassassin.spec
Bug #6589: Errors in man pages
Bug #6588: Small bug in the regexp caught by Jose Borges Ferreira in
Bug #6515: spamd timeout_child option overrides time_limit configuration option with nastier behaviour
Bug #6490: Mail::SpamAssassin::Plugin::SPF - Two enhancement issues
Bug #6562: NULL reference bug in libspamc. Quick workaround to avoid a crash.
Bug #6454: wrong status test on $sth->rows in BayesStore::PgSQL
Bug #6418: Cannot Log to stderr without timestamps
Bug #6403: GMail should use ESMTPSA to indicate that it is in fact authenticated, but doesn't
Bug #6229: TextCat is too case sensitive
Bug #6241: mkrules does not understand newer options and else
Bug #6382: add missing unwhitelist_from_dkim, remove facebook and linkedin from dkim whitelisting
Bug #5744: some documentation fixes
Bug #6447: new feature to bayes autolearning: learn-on-error
Bug #6566: X-Ham-Report default wording (has identified this incoming email as possible spam) is confusing and inaccurate
Bug #6468: splice() offset past end of array in HTML.pm
Bug #6377: win32: spamd signal handling
Bug #6376: win32: consider negative pids under windows in spamd's waitpid
Bug #6375: win32: posix macro not implemented - spamd
Bug #6336: Illegal octal digit 9 received during rules compile
Bug #6526: Disable rfc-ignorant.org
Bug #6531: clear_uridnsbl_skip_domain feature to allow admin override of default configuration
Bug #5491: MIME_QP_LONG_LINE triggering on valid email
Bug #6558: body rules having tflags multiple may cause infinite loop when compiled - a workaround
Bug #6557: Use same age limits in ruleqa as in sa-updates
Bug #6548: spamd protocol examples are wrong
Bug #6500: clear_originating_ip_headers seems to be broken
Bug #6565: check_rbl_sub rules - all dots need to be escaped - commit felicity/70_dnswl.cf and felicity/70_iadb.cf too
Bug #6565: check_rbl_sub rules - all dots need to be escaped
Bug #6578: Move TLD
Spamassassin 3.3.2 RPM Packages for Fedora and RHEL
http://www.spamtips.org/p/rpm-packages.html These packages for EL5 and EL6 are identical to the Fedora versions, and I personally use them in production. Warren Togami war...@togami.com
Re: Sought rules
On 6/11/2011 10:03 AM, Justin Mason wrote: guys -- I'm going to make the whole question moot (in trunk at least) -- the only reason SOUGHT and SOUGHT_FRAUD were being checked in there was to make their accuracy visible in ruleqa. It's been months since I've looked at that, so it's needless. I'll remove them from svn asap. --j. WAIT!!! Wouldn't this remove our ability to check for false positives of your patterns against the much larger ham collection of nightly masscheck? I wouldn't be concerned about this if there were a way to collaborate on feeding more ham into SOUGHT's safety check corpus, but when I asked about this earlier you seemed hesitant. Warren
Re: Sought rules
On 6/12/2011 12:32 AM, Warren Togami Jr. wrote: On 6/11/2011 10:03 AM, Justin Mason wrote: guys -- I'm going to make the whole question moot (in trunk at least) -- the only reason SOUGHT and SOUGHT_FRAUD were being checked in there was to make their accuracy visible in ruleqa. It's been months since I've looked at that, so it's needless. I'll remove them from svn asap. --j. WAIT!!! Wouldn't this remove our ability to check for false positives of your patterns against the much larger ham collection of nightly masscheck? The alternative is to filter SOUGHT from the sa-update rule updates with a script, but still allow it in the nightly masschecks. Testing sought in nightly masschecks has been useful to occasionally find obvious SOUGHT problems, or sometimes to locate spam that was misplaced in the ham folder. Warren
Re: Sought rules
Wait a sec, I'm confused about this. JM_SOUGHT_2 hit on every legit Facebook message on the dev@ list on February 17th, 2011. If the SOUGHT channel was being overridden by the sa-update rules, how would this problem appear from the SOUGHT channel? Doesn't this suggest that spamassassin was successfully using the SOUGHT channel? (I still agree we should remove the static SOUGHT from the sa-update rules.)

Warren
READ THIS Re: Sought rules
On 6/10/2011 11:13 PM, Warren Togami Jr. wrote:

Wait a sec, I'm confused about this. JM_SOUGHT_2 hit on every legit Facebook message on the dev@ list on February 17th, 2011. If the SOUGHT channel was being overridden by the sa-update rules, how would this problem appear from the SOUGHT channel? Doesn't this suggest that spamassassin was successfully using the SOUGHT channel? (I still agree we should remove the static SOUGHT from the sa-update rules.) Warren

NOTE: I'm skeptical that this bug is affecting us on Linux. I am not using a re-order hack and yet SOUGHT seems to be changing its behavior on a daily basis.

Warren
Re: Sought rules
On 6/10/2011 7:14 AM, Karsten Bräckelmann wrote:

You are generally correct about the numerical (actually lexical) order, though it doesn't apply to the files you are talking about. The mentioned 72_active and 20_sought are in different sa-update channels. Now, the bad thing about this is that updates_spamassassin_org.cf is lexically *after* sought_rules_yerp_org.cf in your rule update dir. Which means the more recent rules in the dedicated Sought channel are overwritten by the stock rules... This merely requires a re-ordering hack, though. A symlink zzz_sought.cf in your rule updates dir, pointing at the channel generated cf, should do. These channel cf files only hold include statements, to pull in the actual cf files in the per-channel dir.

Without a re-ordering hack, does this mean that essentially EVERYONE is using SOUGHT wrong? This is a bit worrisome.

Warren
Re: Sought rules
On 6/10/2011 2:01 PM, Karsten Bräckelmann wrote: IFF you use the sought channel with SA 3.3.x, you will need the reorder hack to bend the alphabet.

It is not entirely clear to me: what exactly are you supposed to rename for the reorder hack? Do you have to do it every time you run sa-update?

Warren
Re: Sought rules
On 6/10/2011 3:34 PM, John Hardin wrote:

On Fri, 10 Jun 2011, Lawrence @ Rogers wrote:

On 10/06/2011 10:24 PM, Warren Togami Jr. wrote: On 6/10/2011 2:01 PM, Karsten Bräckelmann wrote: IFF you use the sought channel with SA 3.3.x, you will need the reorder hack to bend the alphabet. It is not entirely clear to me: what exactly are you supposed to rename for the reorder hack? Do you have to do it every time you run sa-update?

Would renaming 20_sought_fraud.cf to 99_sought_fraud.cf, putting 20_sought_fraud.cf (from the yerp.org channel) after 72_active.cf (the default and assumed older SA rules), solve this problem?

Or symlinks from your local configs directory to the SOUGHT channel directory files. That would probably be easier to not forget about when things get fixed.

Is Lawrence's suggestion something we can do upstream to fix this problem? Alternatively, I think it is a mistake for us to ship SOUGHT rules at all in the standard sa-update channel. That is, unless we plan on updating the patterns and scores of SOUGHT on a daily basis. I highly doubt we will do that.

Warren Togami war...@togami.com
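To make the lexical-ordering problem concrete, here is a small sketch (the file names are from this thread; the rest is mine) of why a zzz_-prefixed symlink makes the Sought channel load last:

```python
import os
import tempfile

# The channel .cf files are read in lexical order, so the stock
# updates_spamassassin_org.cf loads *after* sought_rules_yerp_org.cf and
# its static SOUGHT rules clobber the fresher channel ones.
print(sorted(["sought_rules_yerp_org.cf", "updates_spamassassin_org.cf"]))
# ['sought_rules_yerp_org.cf', 'updates_spamassassin_org.cf']

# The reorder hack: a zzz_-prefixed symlink sorts after everything else,
# so the Sought rules are read last and win.
update_dir = tempfile.mkdtemp()
target = os.path.join(update_dir, "sought_rules_yerp_org.cf")
open(target, "w").close()
os.symlink(target, os.path.join(update_dir, "zzz_sought.cf"))

order = sorted(os.listdir(update_dir))
print(order)  # ['sought_rules_yerp_org.cf', 'zzz_sought.cf']
```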
3.3.2 Ready for Testing
We need +3 votes from the PMC (or the release manager) to declare 3.3.2 an official ASF release. This 3.3.2 release has no changes since 3.3.2-rc2. Please do some testing before voting. If you are not a PMC member, please let us know if you see any regressions since 3.3.1, along with details of your platform.

Proposed Official Release of 3.3.2
===
http://people.apache.org/~wtogami/devel/3.3.2/

f38480352935fe3bb849a27a52615e400dee7d66 Mail-SpamAssassin-3.3.2.tar.bz2
de954f69e190496eff4a796a9bab61747f03072b Mail-SpamAssassin-3.3.2.tar.gz
edc6297dc651eeb7a4872f596ec5a54aeea85349 Mail-SpamAssassin-3.3.2.zip
a199d5f0f8c2381e3dfe421e7a774356b3ffda4b Mail-SpamAssassin-rules-3.3.2-r1104058.tar.gz

GPG Verify Procedure

wget http://people.apache.org/~wtogami/devel/3.3.2/Mail-SpamAssassin-3.3.2.tar.bz2
wget http://people.apache.org/~wtogami/devel/3.3.2/Mail-SpamAssassin-3.3.2.tar.bz2.asc
gpg --recv-key F7D39814
gpg --verify Mail-SpamAssassin-3.3.2.tar.bz2.asc

spamassassin-3.3.2-rc2 RPM Test packages for EL5 and EL6
http://www.spamtips.org/p/rpm-packages.html
http://people.apache.org/~wtogami/rpm/3.3.2/

Warren Togami war...@togami.com
3.3.2-rc2 Call for Testing
3.3.2-rc2 is meant to be the true release candidate for 3.3.2. If we find no problems with rc2, then I will recut it as 3.3.2 final with no code changes.

http://people.apache.org/~wtogami/devel/3.3.2-rc2/ 3.3.2-rc2 tarballs plus rules from the sa-update channel

sha1sum of archive files:

445b3d0a9e93284af82180c03f8c3b0fa4c5d2fc Mail-SpamAssassin-3.3.2-rc2.tar.bz2
4eb6a3c23714e33c0413fa25ed45c2796129ac9a Mail-SpamAssassin-3.3.2-rc2.tar.gz
876314a64730604df9468243f68a1fd15b18c214 Mail-SpamAssassin-3.3.2-rc2.zip
a199d5f0f8c2381e3dfe421e7a774356b3ffda4b Mail-SpamAssassin-rules-3.3.2-rc2.r1104058.tar.gz

http://people.apache.org/~wtogami/rpm/3.3.2-rc2/ RPM packages for EL5 and EL6

Warren Togami war...@togami.com
Re: Trouble starting Spamassassin
On 5/18/2011 1:20 AM, john ffitch wrote:

Thank you. Removing the defined() cleared one error, but I still get:

May 18 12:17:36.306 [5489] warn: Use of uninitialized value $opt{syslog-socket} in lc at /usr/bin/spamd line 444.
child process [5491] exited or timed out without signaling production of a PID file: exit 255 at /usr/bin/spamd line 2588.

so it does not work. I am reluctant to install an rc1 on a live system. ==John ffitch

3.3.2-rc1 actually works while 3.3.1 does not. By my download counts, it appears at least 200 people are running my 3.3.2-rc1 RPMS, and I have heard no complaints.

Warren
EL5 and EL6 Packages of spamassassin-3.3.2-rc1
http://people.apache.org/~wtogami/rpm/3.3.2-rc1/ I made test packages for EL5 and EL6. I began using both in production just now with no apparent ill effects. We need more people to test this and provide feedback.

Warren

On 05/14/2011 10:34 PM, Warren Togami Jr. wrote:

Hey folks, This is an UNRELEASED CANDIDATE of spamassassin-3.3.2-rc1. It would be helpful for folks to test it and provide feedback. Don't worry about the rules tarball, because the real rules you get from running sa-update the first time.

http://people.apache.org/~wtogami/devel/3.3.2-rc1/

sha1sum of archive files:

191fc4548c7619e11127ef04714be19741122ea9 Mail-SpamAssassin-3.3.2-rc1.tar.bz2
813b2adb7ab15f6ddc34c9de7fc10e0f9b7b28cd Mail-SpamAssassin-3.3.2-rc1.tar.gz
23bee590d0e4ec5f11936bc931fb73211970966a Mail-SpamAssassin-3.3.2-rc1.zip
9e20dd49fbbb1bf1ff4d171ac3531b53ba7c9dfd Mail-SpamAssassin-rules-3.3.2-rc1.r1083704.tgz

GPG signatures available at the above URL.

WARNING: I did not test this in production.

Warren Togami war...@togami.com
Testing Needed: spamassassin-3.3.2-rc1
Hey folks, This is an UNRELEASED CANDIDATE of spamassassin-3.3.2-rc1. It would be helpful for folks to test it and provide feedback. Don't worry about the rules tarball, because the real rules you get from running sa-update the first time.

http://people.apache.org/~wtogami/devel/3.3.2-rc1/

sha1sum of archive files:

191fc4548c7619e11127ef04714be19741122ea9 Mail-SpamAssassin-3.3.2-rc1.tar.bz2
813b2adb7ab15f6ddc34c9de7fc10e0f9b7b28cd Mail-SpamAssassin-3.3.2-rc1.tar.gz
23bee590d0e4ec5f11936bc931fb73211970966a Mail-SpamAssassin-3.3.2-rc1.zip
9e20dd49fbbb1bf1ff4d171ac3531b53ba7c9dfd Mail-SpamAssassin-rules-3.3.2-rc1.r1083704.tgz

GPG signatures available at the above URL.

WARNING: I did not test this in production.

Warren Togami war...@togami.com
DNSBL Safety Report 5/14/2011
http://www.spamtips.org/2011/05/dnsbl-safety-report-5142011.html Several of the well known add-on DNSBL's have changed in safety or overlap since the previous January 2011 report, so sysadmins of Spamassassin servers may want to look carefully at this new report. https://admin.fedoraproject.org/mailman/listinfo/spamassassin-news Subscribe to my Spamassassin for Sysadmins Announce-Only Newsletter. The next issue is coming sometime after the release of spamassassin-3.3.2. Warren
Re: Testing Needed: spamassassin-3.3.2-rc1
Please file bugs. Nothing can be committed to spamassassin-3.3.x without bugs and votes. Warren
Re: Dumb questions
On 5/6/2011 9:19 AM, Greg Lentz wrote: Well, since it looks like SA 3.2 hasn't been getting rules for a couple of years, that probably isn't as critical at the moment. -- Greg Lentz Of course it is critical. How effective would your virus scanner be after several years without updates? Warren
Re: Any active rules repositories left?
On 4/22/2011 6:32 AM, Morten wrote: Hi folks, I'm looking at upgrading a SA 3.2.5 installation. I see that there's a 3.3.1 release, but that's more than a year old. Is there some shared rules repository out there that's more recent? Thanks, Morten

http://www.spamtips.org/p/ultimate-setup-guide.html Please follow spamtips.org, which documents every add-on and configuration to maximize the effectiveness and safety of your spamassassin deployment.

Warren
Mailspike Performance
We haven't had working statistics viewing for a few weeks, but now it is fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL.

http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_MSPIKE_BL/detail

RCVD_IN_MSPIKE_BL has nearly the highest spam detection ratio of all the DNSBL's, second only to RCVD_IN_XBL. But our measurements also indicate it is detecting this huge amount of spam with a very good ham safety rating.

* 84% overlap with RCVD_IN_XBL - redundancy isn't a huge problem here because XBL is a tiny score. But 84% is a surprisingly low overlap ratio for such a high-detecting rule. This confirms that Mailspike is doing an excellent job of building their IP reputation database in a truly independent fashion.
* 67% overlap with RCVD_IN_PBL - overlap with PBL is concerning because PBL is a high score. But 67% isn't too bad compared to other production DNSBL's.
* 58% overlap with RCVD_IN_PSBL - pretty good

Given Mailspike's sustained decent performance since late 2009, it seems clear that it is a great candidate for addition to spamassassin-3.4 by default. It would be very interesting to see what it does to the scores during an automatic rescoring of the network rules.

Thoughts about Future Rescoring
===
Before that rescoring, we may want to have a serious discussion about reducing score pile-up in the case where multiple production DNSBL's all hit at the same time. Adam Katz's approach is one possibility, albeit confusing to users because users see subtractions in the score reports. There may be other, better approaches to this.

In related news...
==
http://www.spamtips.org/2011/01/dnsbl-safety-report-1232011.html The January DNSBL Safety report found RCVD_IN_SEMBLACK to be reasonably safe, but at the time it overlapped with RCVD_IN_PBL 91% of the time, making it dangerously redundant due to PBL's high production score.
http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_SEMBLACK/detail Our most recent measurements indicate that SEMBLACK is back to its previous behavior of an extremely poor safety rating, with false positives on ~7% of ham from recent weeks. It was a bad idea to use SEMBLACK earlier this year due to the high overlap with RCVD_IN_PBL, but this significant decline in safety rating is a clear indication that you should not be using RCVD_IN_SEMBLACK.

http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_HOSTKARMA_BL/detail HOSTKARMA_BL overlaps with MSPIKE_BL 88% of the time, but detects far less spam, with slightly more FP's. Compared to last year, HOSTKARMA_BL's safety rating has improved considerably on a sustained basis, and if we were evaluating it alone it wouldn't be too bad. But now that we see the overlaps, HOSTKARMA_BL at this very moment is pretty close to a redundant and slightly less safe subset of RCVD_IN_MSPIKE_BL. Given these measurements, it probably isn't helpful to use HOSTKARMA_BL.

Warren Togami war...@togami.com
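The overlap figures in these reports are just set arithmetic over which messages each rule hit during a masscheck. A sketch with made-up message IDs; the function is my own illustration, not ruleqa's code:

```python
def overlap_ratio(rule_hits, other_hits):
    """Fraction of this rule's hits that the other rule also caught."""
    if not rule_hits:
        return 0.0
    return len(rule_hits & other_hits) / float(len(rule_hits))

# Hypothetical message IDs hit by each rule during a masscheck run
mspike_bl = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
xbl = {1, 2, 3, 4, 5, 6, 7, 8, 11, 12}

print("%.0f%% overlap" % (100 * overlap_ratio(mspike_bl, xbl)))  # 80% overlap
```

A high ratio against a rule that already carries a big score is what makes an otherwise-safe list redundant, which is the argument made above about SEMBLACK and PBL.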
Re: Suddenly tons of spam
On 3/29/2011 8:30 AM, RW wrote:

On Tue, 29 Mar 2011 12:55:51 -0500, Max mdun...@breakawaysystems.com wrote: Here's the output of spamassassin -D --lint: [29434] dbg: logger: adding facilities: all [29434] dbg: logger: logging level is DBG [29434] dbg: generic: SpamAssassin version

Update to the current version. It's not worth giving it any more thought until you've done that. The rules for 3.2.5 haven't been worked on in some time.

http://www.spamtips.org/p/ultimate-setup-guide.html Indeed. Upgrade to spamassassin-3.3.1, make sure sa-update is set to run at least once daily, then follow everything on this page to maximize its performance.

Warren
Re: Spam Eating Monkey causing 100% false positives for large institutions
On 3/23/2011 7:38 AM, Blaine Fleming wrote:

On 3/23/2011 9:56 AM, dar...@chaosreigns.com wrote: In the recent sa-updates, the Spam Eating Monkey rules were inappropriately enabled. If you hit them too much, they start returning 100% false positives. Their listed limits are more than 100,000 queries per day or more than 5 queries per second for more than a few minutes.

As soon as the bug was reported on the dev list, I disabled the 127.0.0.255 response code to avoid any additional issues. I will be turning this functionality back on as soon as the SA rules are updated, which I assume will be soon.

I would recommend blackholing those IP addresses at the firewall of the DNS server, especially those 300 million+ sites that are impossible to contact. They might finally notice they have a serious configuration issue and stop querying if their mail delivery backs up.

Warren
Re: Spam Eating Monkey causing 100% false positives for large institutions
On 3/23/2011 10:58 AM, Karsten Bräckelmann wrote: On Wed, 2011-03-23 at 10:18 -1000, Warren Togami Jr. wrote: On 3/23/2011 7:38 AM, Blaine Fleming wrote: In the recent sa-updates, the Spam Eating Monkey rules were inappropriately enabled. [...] As soon as the bug was reported on the dev list I disabled the 127.0.0.255 response code to avoid any additional issues. I will be turning this functionality back on as soon as the SA rules are updated which I assume will be soon. I would recommend blackholing those IP addresses at the firewall of the DNS server, especially those 300 million+ sites that are impossible to contact. They might finally notice they have a serious configuration issue and stop querying if their mail delivery backs up. Ugh, nasty boy. ;) You do realize they wouldn't be hammering the SEM DNS servers if test rules hadn't accidentally slipped out via sa-update. Personally, I'd much prefer to have this resolved by another manual rule update, so the queries should die down within another 24-48 hours. Obviously, these sites do use sa-update... Thanks and props to Blaine, for effectively disabling the limit temporarily, and sustaining the load for a while! :) Agreed that would be the ideal solution. Who knows the procedure? Is that procedure documented? Warren
Re: Performance on Spear Phishing?
On 3/16/2011 4:08 PM, Hamad Ali wrote: Hi folks -- wondering if anyone has monitored SA's performance against phishing mails. SA is able to detect 86% of phishing emails my clients get, with 0.5% false positives on all the ham. It seems non-phish spam is easier to detect than phish (~99% for non-phish spam). Probably I need to participate in nightly checks to improve phish and lower false positives. But all the above stuff is about bulk-phish, excluding spear phish. I haven't received any spear phishing complaints from my clients, and yet none of the detected phish mails are spear phish -- which is alarming as it's too good to be true that no one did spear phishing yet (especially since it works far better than bulk-phish)! What's the scenario in your mail systems folks? Do you detect spear phishing mail by SA? Users report it? -- H Are you using spamassassin-3.3.1? http://www.spamtips.org/p/ultimate-setup-guide.html Have you tweaked it with the best tested add-ons? Please read this page. In particular, the fuzzy-hash-based plugins like pyzor, Razor and DCC are sometimes effective against phishing. Warren
Re: Performance on Spear Phishing?
On 3/16/2011 5:45 PM, Karsten Bräckelmann wrote: On Wed, 2011-03-16 at 20:30 -0700, John Hardin wrote: On Thu, 17 Mar 2011, Hamad Ali wrote: Probably I need to participate on nightly checks to improve phish and lower false positives. More masscheck participants are always welcome! No. There is this thing called trust. Credibility. And track-record. Which pretty much is the opposite of a freemail address, venting two questions on this list -- without ever getting back even to specific requests for better data, offer for precise help, or a dialog. Karsten, thanks for pointing out that this is the same guy. I had missed that. Warren
Re: how to disable network tests?
On 3/11/2011 10:05 AM, Hamad Ali wrote: hi folks --- everything seems working like chicken. I'm loving SA so far. However, I would like to disable all network tests (each mail takes ~10 seconds!). Except that I dunno how to do it the neat way. Will the tests be disabled if their score is 0? I know that would lead into disabling the effect of a rule on the decision making of SA (i.e. Spam/Ham marking), but would SA exclude them from running too? I need to disable all BLs, DNS queries, and anything that uses the internet. Kindly advise. Thank you guys -- May OOP Raise and Shine! H Please consider that spamassassin is CRIPPLED without the network tests. If it is taking 10 seconds per message then you likely have some kind of serious misconfiguration. The first likely culprit is your DNS server is not good. Several times in past years I've had to stop using my ISP's (or data center's!) official DNS servers because they were simply not capable of handling the load of spamassassin. In such cases I run pdns-recursor on each Spamassassin server directly, and set /etc/resolv.conf to use 127.0.0.1 as the DNS resolver. After you have switched to a known good DNS server, do the following to diagnose the network tests. 1) Save a single spam message as a flat file, with headers and body intact. If your folders are Maildir format then a single file in your directory tree is suitable for this purpose. 2) cat FILE | spamassassin -D 3) Copy the entire output and paste into a text editor. 4) Look at the lines near the bottom for async: timing: Those are followed by a number of seconds that an individual DNS request took to respond. All of these numbers are typically between 0 and 3 seconds on my server. If you have much larger numbers or some queries are timing out entirely, then you may have further issues with your DNS server, or you may have been blocked from queries because you have exceeded free usage limits. 
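The four diagnostic steps above boil down to something like the following (the filename is hypothetical; the debug output goes to stderr, hence the redirect):

```sh
# Run one saved spam message (headers + body intact) through SA in
# debug mode and pull out the per-lookup DNS timing lines:
spamassassin -D < sample-spam.eml 2>&1 | grep async

# To answer the original question directly: -L (--local) skips all
# network tests for a single run, e.g. for timing comparisons:
spamassassin -L < sample-spam.eml > /dev/null
```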
http://www.spamtips.org/2011/01/usage-limits-of-spamassassin-network.html Please see my article here about the free usage limits of the various spamassassin network tests. http://www.spamtips.org/p/ultimate-setup-guide.html Please read this page for all known safe and effective configuration tweaks to spamassassin. Warren Togami war...@togami.com
Re: sa-updates
On 3/10/2011 1:41 AM, Nigel Frankcom wrote: Hi All, Apologies if this has been covered, an admittedly fairly cursory Google showed nothing new. My local sa-update hasn't updated in the better part of a month. Is it that there have been no updates or do I need to dig into my systems to see what I broke, how and when? Regards to all Nigel http://ruleqa.spamassassin.org/ The auto-promotion mechanism that promotes/demotes and rescores new rules has been broken lately because we are lacking sufficient quantities of ham and spam in the nightly masscheck. You can see the results of each nightly masscheck at the above link. https://fedorahosted.org/auto-mass-check/ We are seriously in need of additional volunteers in the nightly masscheck. Please read this page to learn how to join. Warren Togami war...@togami.com
Re: The one year anniversary of the Spamhaus DBL brings a new zone
On 3/8/2011 9:58 AM, Bill Landry wrote: FYI: Spamhaus created a new URL shortener/redirector zone in the DBL. See: http://www.spamhaus.org/news.lasso?article=667 Will Spamassassin be adding support for this new DBL shortener/redirector response code?: 127.0.1.3 spammed redirector domain For details, see: http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291 Regards, Bill OK, so this is meant to be used as a URIBL. I don't see this as anything special because there is no way to query the pathname portion of a URI, which would allow more fine-grained detection of spammy URIs even on a non-evil shortening service. Is this new DBL return code meant to carry a lower score than ordinary URIBLs that often choose to list evil shortener domains? My point is this is no different from an ordinary URIBL listing. Warren
Re: Open letter to Yahoo and Hotmail concerning junkmail
On 3/6/2011 3:15 AM, Ned Slider wrote: On 06/03/11 11:46, Warren Togami Jr. wrote: I have no comment on your proposed solution. I can however point out the statistics that I see on my own spam traps. It seems that 90%+ of the spam coming from DNSWL listed hosts is Yahoo and Hotmail which are listed as DNSWL_NONE. Meanwhile very little spam comes from gmail.com. Apparently DNSWL agrees because they give gmail.com's outgoing MTAs a LOW ranking which is pretty good for a freemail provider. Google is doing something right in outgoing spam prevention. Warren Exactly. If Google can manage to do a pretty good job then it just tells me Microsoft and Yahoo don't care. I've long since stopped caring too and have scored them in SpamAssassin - the only way their mail gets through now is if the sender address is whitelisted or they score some negative points (e.g. Bayes) to get them back below my threshold. These providers are NOT too big to block and the sooner we all start realising that the sooner they might start to care about their reputations and stop emitting huge volumes of spam. Personally I think it's about time FROM_HOTMAIL and FROM_YAHOO became high scoring stock rules in SpamAssassin. A score of 3 points might be a reasonable starting point. I'd agree, but users won't rebel against Yahoo unless they begin to see actual bounces to their sent mail. I do agree that we should have FROM_HOTMAIL and FROM_YAHOO so we can independently decide how to treat their mail separate from typical FREEMAIL. Warren
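A sketch of what such stock rules could look like in local.cf. The rule names follow the proposal above, but the patterns and the 3-point score are assumptions; a production version would need FreeMail-style care to avoid hitting forwarded or forged mail.

```
# Hypothetical local.cf sketch: penalize mail with a Yahoo/Hotmail
# sender address. From-header matching alone is forgeable; real rules
# should also consult Received/DKIM evidence.
header   FROM_YAHOO   From =~ /\@yahoo\./i
describe FROM_YAHOO   Sender address is at Yahoo
score    FROM_YAHOO   3.0

header   FROM_HOTMAIL From =~ /\@(?:hotmail|live|msn)\./i
describe FROM_HOTMAIL Sender address is at Hotmail
score    FROM_HOTMAIL 3.0
```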
Re: Open letter to Yahoo and Hotmail concerning junkmail
On 3/7/2011 2:10 AM, Mynabbler wrote: Warren Togami Jr. wrote: I'd agree, but users won't rebel against Yahoo unless they begin to see actual bounces to their sent mail. I don't know about your end users, but ours typically get flummoxed if mail from these well-known and trusted free mail providers would not arrive to them... There's just too many users actually using their services, mixed with too many spammers abusing it. My point here is that getting an explicit reject is better than mail silently disappearing. I wasn't commenting on the wisdom of being prejudiced against Yahoo or Hotmail though. Warren
Re: Open letter to Yahoo and Hotmail concerning junkmail
I have no comment on your proposed solution. I can however point out the statistics that I see on my own spam traps. It seems that 90%+ of the spam coming from DNSWL listed hosts is Yahoo and Hotmail which are listed as DNSWL_NONE. Meanwhile very little spam comes from gmail.com. Apparently DNSWL agrees because they give gmail.com's outgoing MTAs a LOW ranking which is pretty good for a freemail provider. Google is doing something right in outgoing spam prevention. Warren
Re: low score for ($1.5Million)
On 3/3/2011 3:06 PM, Karsten Bräckelmann wrote: On Fri, 2011-03-04 at 01:53 +0100, Mikael Syska wrote: I get the following hits: Content analysis details: (19.1 points, 5.0 required) Note though, that your score is on SA 3.3.x, while the OP uses SA 3.2.x. Yes, I can tell this from the scores. :) Major changes between these versions are clearly reflected in your score and rules hit. Namely a lot of work by John Hardin to catch exactly such fraud, and the FreeMail plugin now upstream -- with 3.2 it is available as a third-party plugin. Could we please make an official project statement that 3.2.x is unsupported and people should really update to 3.3.x? Warren
Re: DNSWL rules downscoring spam
On 2/20/2011 6:21 AM, Matthias Leisi wrote: On Sun, Feb 20, 2011 at 4:22 PM, Pasi Hirvonen p...@iki.fi wrote: Hello, I just recently moved our mail setup to new hardware and I've been paying close attention to what gets marked as spam and what doesn't. Looking at my spam folder, I have received roughly 550 spam emails to my email account since last tuesday (15th). Out of those 550, *345* have been downscored by RCVD_IN_DNSWL_MED. Annoyingly, a significant number of those spam mails have dropped just below the spam threshold because of it. That should not happen. Can you share some headers? Thanks, -- Matthias, for dnswl.org Matthias, we really need a method to auto-report violations of DNSWL. My spam traps receive dozens or more every week. But I don't have time to file a web form every time it happens. Warren
Re: DNSWL rules downscoring spam
On 2/20/2011 6:31 AM, dar...@chaosreigns.com wrote: I know of no reason it would be a temporary hiccup, but it is certainly unusual. According to spamassassin's mass checks, 0.89% of spam hits RCVD_IN_DNSWL_MED: http://www.chaosreigns.com/dnswl/ The masscheck results are a bit misleading, overwhelmed in quantity with lower quality trap spam at the moment because the higher quality real-address sorted spam is in such low quantity. A few weeks ago, when DOS revealed that we need a minimum of 150k spam in a 2-month window, I even adjusted my servers to include a larger percentage of trap spam. I know this is problematic, but I intend this to be temporary. I will adjust this down as we have more volunteers join the nightly masscheck and our overall quantities are boosted. My point is, DNSWL violations seem to be occurring at a higher rate to real e-mail addresses than to fake addresses. Warren
Re: DNSWL rules downscoring spam
On 2/20/2011 9:11 AM, Michelle Konzack wrote: Hello Pasi Hirvonen, Am 2011-02-20 17:22:23, hacktest Du folgendes herunter: Hello, I just recently moved our mail setup to new hardware and I've been paying close attention to what gets marked as spam and what doesn't. Looking at my spam folder, I have received roughly 550 spam emails to my email account since last tuesday (15th). Out of those 550, *345* have been downscored by RCVD_IN_DNSWL_MED. [...] I have EXACTLY the same problem here... I get around 280,000 per day on my 8 servers, with 86,000 users. You couldn't consider me to be anti-DNSWL given that I've been strongly promoting DNSWL, urging many to list themselves in recent months. But if automated enforcement doesn't become a reality I am going to push harder for further default score reductions for not only DNSWL but all whitelists. I've seen problems with DNSWL and IADB whitelists in the past year. RP whitelists were bothering me during 2009 but I haven't seen so many problems during late 2010. Warren
Re: Sa-update and proxy servers
On 2/17/2011 11:44 PM, Daniel Lemke wrote: Michael Scheidell wrote: [...] I now need to set a proxy server to do sa-updates through, but could not find any information on settings for a proxy server. [...] Added cmd options: -x --proxy -U --proxy-user -P --proxy-password -t --connect-timeout. [...] Hi, just found this old thread regarding the proxy capabilities of sa-update. I wonder why Michael's patch hasn't been included in the official source. We've got a customer that wants to use sa-update through a proxy, but using a custom patch to provide such a feature is kind of weird. Would it be possible to make the patch official? At least it'd be great if one could specify username and password in addition to the proxy URL by using environment variables for LWP::Agent. Any comments on this? Daniel Was this ever filed as a bug with the suggested patch attached? Nothing gets into the code without a bug filed. Warren
Re: using spamhaus droplist with sa ?
On 2/17/2011 5:40 AM, RW wrote: The suggestion is that it be scored higher for that reason. Or just outright block all MTA connections from anything listed in zen.spamhaus.org, which seems to be safe. Large sites I know have been doing that for years without any complaints. Warren
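Blocking at connection time is done in the MTA rather than in SpamAssassin; in Postfix, for example, the standard idiom is reject_rbl_client (a sketch, assuming an otherwise typical restriction list):

```
# Postfix main.cf sketch: reject SMTP clients listed in Spamhaus ZEN
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org
```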
Re: alert: New event: ET EXPLOIT Possible SpamAssassin Milter Plugin Remote Arbitrary Command Injection Attempt
On 2/10/2011 1:29 PM, John Hardin wrote: On Thu, 10 Feb 2011, David B Funk wrote: On Fri, 11 Feb 2011, Jason Haar wrote: On 02/11/2011 09:37 AM, Mark Martinec wrote: Yes, the security hole is entirely within the milter, independent of the MTA. That exploit is dated Mar 2010? Has this really not been fixed in about a year??? a year??, try half-a-decade. I've got a copy of that code from March 2006 and the vulnerability is there. Rather stale project. ;) heh. I suppose we ought to compose a boilerplate response for the inevitable visitors who will show up asking about this exploit in SpamAssassin... Perhaps more than boilerplate, but rather an official advisory to clear up the confusion? Given that the upstream of that milter is dead, will anybody else make an official advisory? Warren
Re: mx1.res.cisco.com a dynamic ip?
On 2/10/2011 2:30 PM, Michael Scheidell wrote: host mx1.res.cisco.com mx1.res.cisco.com has address 208.90.57.13 $ host 208.90.57.13 13.57.90.208.in-addr.arpa domain name pointer mx1.res.cisco.com. looks fine to me, why does this look to SA like a dynamic ip? (TRIGGERED RDNS_DYNAMIC.) what, because of 'res' in it? yes, they SHOUTED AT THE RECIPIENT, AND I EXPLAINED DON'T DO THAT IN SUBJECT LINE, it's rude. The RDNS_DYNAMIC rule might be better replaced by the more precise S25R-based patterns in KHOP_DYNAMIC. Care enough? Please file a bug and look into the relative results of the masschecks to start an analysis. Warren
Re: Need Volunteers for Ham Trap
On 02/07/2011 05:37 PM, Mahmoud Khonji wrote: On 01/21/2011 01:06 AM, Warren Togami Jr. wrote: On 1/20/2011 7:23 AM, R - elists wrote: initially this came across as a really suspect idea... i.e., one man's junk is another man's treasure Ham is a lot easier to define than Spam. Ham is simply anything that you subscribed for. I am currently subscribed to a number of mailing lists to collect ham emails (in addition to other sources). While it might be true that mailing lists can be good sources of ham, their emails do not contain a realistic diversity of features/characteristics. I explicitly excluded discussion mailing lists from the ham trap. In my view, the issue is not just ensuring an email is ham, but also ensuring that it contains a realistic set of features. If the features are not realistic, and if we optimize test scores based on that, then we might end up worsening test scores for realistic end-users. Not if it is subscribed to hundreds of opt-in subscriptions for legitimate mail that ordinary users receive, most of which is otherwise not represented in the corpora. Many of these subscriptions send mail only once a week or month. It is true that the hamtrap corpus is synthetic and thus not fully representative in frequencies of real ham. But its volume is only a tiny fraction of a percent of our total ham. It helps us to detect and fix problems in individual rules by injecting some variety without causing a measurable impact on the entire corpus. For example, most list emails are non-HTML, while most end-user ham and spam emails are HTML. Evaluating sets of features (or tests) based on this unrealistic corpus is likely to fool us into thinking that a feature/test is more effective than it is in reality (i.e. we might end up giving MIME-based tests higher scores). The spec and implementation of this ham trap already took this and many other issues into consideration. We've already had a few experts here conclude the plan is sound.
I'm somewhat annoyed by the armchair-quarterback negative comments on this topic. (Not just you.) The commenters didn't read the rest of this thread to realize this particular concern is moot. None of the people complaining about how this is such a bad idea are being helpful by actually participating in the nightly masscheck. Talk is cheap. I'm actually doing something. Warren
Re: RFC-Ignorant (was Re: Irony)
On 2/2/2011 7:45 AM, John Levine wrote: RFC Ignorant is deep into kook territory, as should be apparent if you look at which RFCs they expect people to follow, and what their definition of follow is. abuse.net has been listed for years, since there is an autoresponder on ab...@abuse.net, and I've never noticed any delivery problems. One time I asked if they'd delist me if I got rid of the autoresponder and just threw all the abuse mail away. Yes. QED. Regards, John Levine, jo...@iecc.com, Primary Perpetrator of The Internet for Dummies, Please consider the environment before reading this e-mail. http://jl.ly https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6526 We finally agreed that rfc-ignorant.org is useless, or slightly more harmful than good. Spamassassin will be disabling these rules by default sometime soon. http://www.spamtips.org/2011/01/disable-rfc-ignorantorg-rules.html You can disable these rules with this config and avoid a useless DNS query on every mail scan. Warren
Spamassassin News Issue #2
Hey folks, http://lists.fedoraproject.org/pipermail/spamassassin-news/2011-January/01.html Here is Issue #2 of my Spamassassin for Sysadmins Newsletter. https://admin.fedoraproject.org/mailman/listinfo/spamassassin-news Subscribe here. It is intended to be like a Foo Weekly News publication, except it will likely happen monthly as there isn't enough interesting news in a week. Warren
Re: DCC plugin for SA
On 1/20/2011 12:49 AM, J4 wrote: Good morning to all of you, This popped up in the spamd.log after a reboot (done to test everything worked after a reboot). warn: dcc: dccifd - check skipped: dcc: failed to connect to a socket /var/dcc/dccifd: Connection refused The socket is there: srw-rw-rw- 1 dcc spamd 0 Jan 10 09:40 /var/dcc/dccifd local.cf has :- use_dcc 1 dcc_path /usr/local/bin/dccproc v310.pre has :- loadplugin Mail::SpamAssassin::Plugin::DCC Is there anywhere else I could look? The last log entry for DCC in /var/dcc/log was yesterday at 16:36, which makes sense. -rw--- 1 dcc spamd 2771 Jan 19 16:36 msg.0cTgcW Is this an SA-related problem or specific to DCC? If the latter, then I shall seek help elsewhere as it might be considered off-topic. Best wishes, s What distribution are you using?
Re: DCC plugin for SA
On 1/20/2011 1:06 AM, J4 wrote: I had not realised it was in the repos - I just checked and it is. Damn. I'm surprised it would be in the repos. DCC is not Free Software. Warren
Re: Need Volunteers for Ham Trap
On 1/20/2011 7:23 AM, R - elists wrote: initially this came across as a really suspect idea... i.e., one man's junk is another man's treasure Ham is a lot easier to define than Spam. Ham is simply anything that you subscribed for. for a moment, it appeared we were gonna need to review the good and the bad of spam-l to avoid serious SA list issues. statistically speaking, this shouldn't sway the scoring substantially anyway, would it? You are correct. This is more of a tool to have *some* variety in the ham corpus, to make it possible to flag rules in need of scrutiny. For example, prior to 3.3.x many of our rules were utterly broken with Japanese mail. We had no idea of this fact until I added a few thousand Japanese mails to the ham corpus. JM understood the problem and fixed those rules. what should be known so that bad data is not allowed into the HAM corpus? The previous discussion described a sort of tagged-sender ham trap. This simple process automatically excludes extraneous mail in cases where the address was shared with affiliates or spammer lists. We also will be careful to stick to reputable companies and orgs for the ham trap. Warren
Re: What is Ham? (was Re: Need Volunteers for Ham Trap)
On 01/20/2011 11:31 AM, Bowie Bailey wrote: Public discussion lists are bit different. In that case, it is the individual post that is being considered spam rather than considering the list spammy. Since there is no overall control over the content of the posts, public lists are vulnerable to being filled with spam if the list owners are not paying attention. For this reason, the ham trap will not be subscribed to any discussion lists. When you sign up for a company's email list, you get whatever they decide to send you. If they decide to start sending marketing to the list, I would not consider that spam because they own the list and they can decide what to use it for. The recipients signed up to get that company's emails and if they no longer want to receive them, they can unsubscribe. And as I said before, if the unsubscribe function doesn't work, then the emails become spam (regardless of the actual content). Your understanding is exactly correct. Warren
Re: Need Volunteers for Ham Trap
On 01/18/2011 11:49 PM, Jeff Chan wrote: On Tuesday, January 18, 2011, 4:59:05 AM, Warren Jr. wrote: * Yes, we cannot be 100% sure our opt-in was only for that particular site and not their partners. But in any case automatic ham trapped mail will be only the mail branded by the subscribed provider, because that is the only mail we know for sure was opted-in. Anything else is kept separate for later analysis. * If clearly spammy other mail arrives at a particular address, the original subscription can be unsubscribed and the continued flow monitored. That address could then be discarded. Both seem reasonable approaches. Those degenerate cases of both are indeed interesting. Cheers, Jeff C. Yes, I think this is a reasonably simple and effective plan. I only need volunteers to help me find appropriate sites and to help subscribe. It is very boring to do all this myself. Warren
Re: Need Volunteers for Ham Trap
On 1/17/2011 11:46 PM, Jeff Chan wrote: So a couple points: 1. Subscribing to lists opens up lots of grey areas including the above. 2. Some of the areas are very difficult to resolve into spam or ham. Some more aggressive anti-spammers may say all of the above is spam, but others may disagree, and the mail may be legal. Before anyone accuses me of being in favor of spammers, please be aware that I am personally highly against any of these unethical practices, but when essentially making decisions for others, one needs to be very careful and consider whether there may be legitimate, ethical, legal or even wanted uses of such things. One person's ham may be another persons spam, and vice versa. However, most people don't want the stuff bots send. The issue is complex, and there are many deliverability, security and anti-spam companies and organizations that struggle with these issues every day. Maintaining accurate ham and spam corpora and making policies for what belongs in which category is trivial in some easy cases like bot pill spam, but non-trivial in other cases. Cheers, Jeff C. I appreciate the nuanced feedback but I have thought of similar considerations. I believe the following will help to avoid ambiguity and legal issues. * Yes, we cannot be 100% sure our opt-in was only for that particular site and not their partners. But in any case automatic ham trapped mail will be only the mail branded by the subscribed provider, because that is the only mail we know for sure was opted-in. Anything else is kept separate for later analysis. * If clearly spammy other mail arrives at a particular address, the original subscription can be unsubscribed and the continued flow monitored. That address could then be discarded. Warren
Re: Need Volunteers for Ham Trap
On 1/18/2011 1:15 AM, Martin Gregorie wrote: On Tue, 2011-01-18 at 01:46 -0800, Jeff Chan wrote: While I certainly would encourage improving ham and spam corpora, this proposal may open up a lot of grey areas that may be non-trivial to resolve. Agreed, and some companies will get you to sign up for accounting and service problem notifications and then pump advertising down the channel in such volume that the purpose for which you signed up seems utterly forgotten. British Telecom sets a bad example here: they even behave like a spammer inasmuch as they regularly vary their promotions text to dodge spam filters. I'd be worried that if word gets around that SA is developing rules that give signed-up bulk mail a free ride then a lot more companies will do the same. This is a misunderstanding. I am largely against whitelisting or negative score rules. I merely intend to increase the variety of legitimate mail in the nightly ham corpus so our spam-hostile rules can be better tested for safety. This will be interesting especially with non-English ham. Warren
Re: Greylisting delay (was Re: Q about short-circuit over ruling blacklisting rule)
On 01/18/2011 12:31 PM, David F. Skoll wrote: On Tue, 18 Jan 2011 22:18:20 + Gary Forrest ga...@netnorth.co.uk wrote: Interesting, 2 of our 3 scanning heads use a grey list system that uses /32 addresses as part of the process; these two servers have 100's of emails delayed for well over a day. Our 3rd scanning head uses a grey list system that is less granular /24 , this does not. Ah, I should mention that we use a /24 for greylisting for IPv4 and a /64 for IPv6. On the other hand, we also add a hash of the subject into the greylisting tuple so it becomes: I recently gave up entirely on greylisting after: * Last week I discovered /24 was not good enough for redelivery attempts at one major ISP. All mail from that ISP was failing for the past month except in rare cases where randomly the same /24 attempted delivery within the time window. * Years of complaints of mail delivery delays or failures from my users. They had begun creating gmail accounts in order to bypass it. They kept running into too many cases of broken individual mail servers (major companies!) that failed to redeliver. Users don't care that so-and-so is violating RFC-XXX. They are trying to get business done and it was simply causing too many problems. Warren
Re: Need Volunteers for Ham Trap
On 01/18/2011 03:25 PM, Dave Pooser wrote: On 1/18/11 12:52 AM, Warren Togami Jr. wtog...@gmail.com wrote: I am seeking volunteers to help me build and administrate a ham trap. The idea is to subscribe a list of unique e-mail addresses to various retailers, airlines, government and other legitimate bulk mail senders. The possible fly in the ointment I see is that you wouldn't necessarily have access to some sorts of transactional emails -- airline flight reminders and things of that nature. Would that be something where you'd be interested in getting mail cc:ed to a hamtrap address? For example, I use tagged email addresses for different airlines, and it would be trivial for me to have my server relay those messages to a hamtrap address as well as delivering to my personal email if that sort of thing would be useful. You are correct that this isn't transactional mail. It is however low-effort automatic collection of a subset of ham that real users receive, much of which we are entirely missing from the nightly corpus. https://fedorahosted.org/auto-mass-check/ As for the ham you suggest, I highly suggest running your own nightly masscheck and uploading logs. This avoids privacy problems and allows you to check/correct quality issues in your own corpus. Warren
Re: SARE and RulesDuJour still relevant
On 01/15/2011 01:36 AM, Ned Slider wrote: In a year of running them locally I've never seen them hit on a ham message. They appear to hit quite well for me because I pre-filter 95%+ of my spam at the smtp level (greylisting, HELO checks, spamhaus etc) so SA only gets to see the difficult to catch stuff which might inflate the percentage hits. As I said, they typically hit against bank phish sent from compromised accounts on legit servers hence why they make it through greylisting and many DNSBLs. In my corpus of 3402 spam I see NSL_RCVD_FROM_USER hit 604 (17.8%) and NSL_RCVD_HELO_USER hit 181 (5.3%). As there is (virtually?) no overlap, that's a combined hit rate of ~23%, the vast majority of which I would bet is bank phish. That is why I say these rules perform well for me - once you take out the spam that's trivial to filter (spambot spam), the hit rate against the remaining spam goes up. It seems that NSL_RCVD_FROM_USER is indeed safe (no FPs except for trec_enron), but the spam hit rate may vary wildly on different targets. My servers without any pre-spamassassin filters are seeing ~0.5-1.5% hit rates. In 72_scores.cf: score NSL_RCVD_FROM_USER 1.180 1.226 1.180 1.226 spamassassin-3.3.x already has NSL_RCVD_FROM_USER with a production score. I am confused as to how NSL_RCVD_FROM_USER got this score, because AFAICT NSL_RCVD_FROM_USER was not in the 3.3 masscheck. In any case, OR with NSL_RCVD_HELO_USER isn't going to be helpful as you're only piling up more score. Assigning a score to the HELO rule might be a good idea if we are certain it is safe. OTOH, the masschecks indicate very few hits at all on that rule. Warren
Re: SARE and RulesDuJour still relevant
On 1/14/2011 2:28 AM, James Lay wrote: Hey All! Been a while since I did a full blown install of SpamAssassin, and as I'm looking at my old setup, I see a fair amount of changes. I have the SARE rules as well as RulesDuJour running, but noticed that on a fresh install of SA, after doing an sa-update, there are very few rules files (the bulk of which are in /var/lib/spamassassin/3.003001/). Have rules been optimized or something? Should I copy over all the SARE rules and setup RulesDuJour to update, or leave as is? Thanks for the input. James http://www.spamtips.org/ See my blog for current recommendations of rules that are tested to be safe. I use nightly masscheck results at http://ruleqa.spamassassin.org/ in addition to local masschecks to verify that rules are safe before making recommendations. https://admin.fedoraproject.org/mailman/listinfo/spamassassin-news Spamassassin for Sysadmins Newsletter You have installed all the optional plugins right (pyzor, razor, dcc)? http://www.spamtips.org/2010/12/cacheredir-rule-prevent-google-cache.html CACHEREDIR here has proven to be completely safe, while effective against 1-4% of low scoring spam. http://wiki.apache.org/spamassassin/SoughtRules Use SOUGHT. It is good. Anyone else have effective local rules? Please let me know and I'll put them into the nightly masscheck for testing. Warren
Re: SARE and RulesDuJour still relevant
On 01/14/2011 01:09 PM, Ned Slider wrote: On 14/01/11 21:04, Warren Togami Jr. wrote: Anyone else have effective local rules? Please let me know and I'll put them into the nightly masscheck for testing. Warren

  header   NSL_RCVD_HELO_USER  Received =~ /helo[= ]user\)/i
  describe NSL_RCVD_HELO_USER  Received from HELO User

Might want to combine into a meta rule with existing NSL_RCVD_FROM_USER rule:

  header   NSL_RCVD_FROM_USER  Received =~ /from User [\[\(]/
  describe NSL_RCVD_FROM_USER  Received from User

The above are particularly effective (here) against 419 / bank phish type emails sent from compromised webmail accounts. Hit rate is not great, but the FP count is near zero. Regards, Ned

Thanks Ned, Both of the above rules are already in trunk/rulesrc/sandbox/jhardin/20_misc_testing.cf. http://ruleqa.spamassassin.org/20110114-r1058896-n/NSL_RCVD_FROM_USER/detail 0.5% spam hit rate, and some ham hits, however they are all in the ancient enron corpus that we will soon be removing. http://ruleqa.spamassassin.org/20110114-r1058896-n/T_NSL_RCVD_HELO_USER/detail Very few spam hits, and a number of ham hits, but all in DOS's corpus. Perhaps we should ask him if they really are ham? Could you please describe how these rules work, and why the combination of them would be useful? NSL_RCVD_FROM_USER already has a score. It appears that the combination of the two rules would yield zero masscheck FP's, but a maximum of 0.1% spam hits. I suppose this is worthwhile for a night of testing, but I suspect it will be too small? Warren
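A combined check along the lines Ned suggests could be sketched in local.cf roughly as follows. This is only an illustration: the meta rule name and score are invented here; only the two NSL rule names come from the thread.

  meta     NSL_RCVD_USER_COMBINED  (NSL_RCVD_FROM_USER || NSL_RCVD_HELO_USER)
  describe NSL_RCVD_USER_COMBINED  Received from User or with HELO User
  score    NSL_RCVD_USER_COMBINED  1.2

Since the two patterns (virtually) never overlap, an OR meta like this mainly gives both patterns a single tunable score rather than changing what is matched.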
What's up with AHBL?
http://ruleqa.spamassassin.org/20110107-r1056221-n/DNS_FROM_AHBL_RHSBL/detail I just noticed this network rule with very poor performance. 0.02% spam detected in recent masschecks. My local logs show 16 hits out of 300K mail scanned in the last several months, 2 of which were false positives. http://ruleqa.spamassassin.org/20090930-r808953-n/DNS_FROM_AHBL_RHSBL/detail Apparently it was performing poorly even in the 3.3.0 rescore masscheck late 2009, with 0.072% spam detected in the much larger sample of the rescore masscheck. NJABL and rfc-ignorant.org were controversial at 1% spam, but certainly *this* is an obvious candidate for removal? Where should we draw the line? Warren Togami war...@togami.com
Re: New plugin: DecodeShortURLs
On Wed, Jan 5, 2011 at 2:41 AM, Warren Togami Jr. wtog...@gmail.com wrote: The only trouble here is HTTP's TCP handshake and teardown is significantly slower than DNSBL and URIBL lookups already used in spamassassin. My average scan time is less than one second. A plugin that catches the 1% of URL shortening spam is only worthwhile if it doesn't slow down your mail scanning considerably. Doing the HTTP query asynchronously would help, but I fear that this could easily add several seconds per mail. Warren

Another problem... spammers could intentionally max out the number of shortener URL's per spam. The URL's don't even have to be real. Any random garbage after the domain name will trigger an HTTP GET, and render the local cache useless. HTTP GETs could happen dozens or hundreds of times a minute until the shortening service decides to block the spamassassin IP.
Re: New plugin: DecodeShortURLs
On Thu, Jan 6, 2011 at 7:23 AM, Henrik K h...@hege.li wrote: There are lots of plugins out there that aren't part of the core for one reason or another. If you ask me, this is one of them. It just asks for trouble when widely used. It's not the only way to solve the problem anyway. And the problem itself is somewhat temporary in nature, just like image spam was etc. I don't disagree, but I am wondering how this is temporary? Warren
Re: New plugin: DecodeShortURLs
On Sat, Jan 1, 2011 at 7:19 AM, Steve Freegard st...@stevefreegard.com wrote: On 01/01/11 11:51, Warren Togami Jr. wrote: I'll help you start the process with a Bugzilla ticket. I also hope you could get it into some sort of public source control mechanism soon so we can see the changes that go into it before inclusion in upstream. I feel uncomfortable using something that is only available from a URL without being able to see its change history. Know how to use git? github.com is pretty good for something small like this. Sure. No problem. Set up a git repository? I'd like to collaborate on development of this plugin. 2) How widespread is URL shortening abuse now? I can figure this out very easily by adding a non-network URI rule to the nightly masscheck. Could you please send me privately your updated list of shorteners so that I may write such a rule? Based on the reports I get - quite prevalent at times, and when these are used it's effectively a free pass through the URIBL plug-in, which often results in a false negative. As soon as I've sorted out the list - I'll send it to you. According to yesterday's masschecks, it appears that roughly 1% of spam and 1% of ham contains a URL shortener. Of the spam in the corpus, ~49% of the spam containing a URL shortener scored 5 points or fewer. A score this low probably means they are successful in avoiding positive URIBL hits. If you look at the borderline scores all the way up to 7, then you're looking at 64% of URL shortening spam. Higher scores are almost always a sign that the URL shortener domain itself is listed in a URIBL, probably because they didn't police themselves and they were abused too much. But the spam bias of URL shorteners is definitely weighted heavily on the lower end of spamassassin scoring, meaning this is a worthwhile approach to develop. The only trouble here is HTTP's TCP handshake and teardown is significantly slower than DNSBL and URIBL lookups already used in spamassassin. 
My average scan time is less than one second. A plugin that catches the 1% of URL shortening spam is only worthwhile if it doesn't slow down your mail scanning considerably. Doing the HTTP query asynchronously would help, but I fear that this could easily add several seconds per mail. Warren
What NOT to use?
Can anyone think of custom rules or old sites that continue to be online, misleading people into believing that they should be using some custom rule or plugin that is no longer effective or safe? The former SARE repo was the only one that I know about, but there are apparently others. http://www.rulesemporium.com/ http://saupdates.openprotect.com/ I vaguely recall people saying for years that portions are safe; if so, why not include only the safe portions? Otherwise these instructions should be taken offline as they are doing more harm than good. Warren
Re: IPv6 DNSBL/WL design, was Fwd: [Asrg] draft-levine-iprangepub-01
On Mon, Jan 3, 2011 at 9:27 PM, Jason Haar jason.h...@trimble.co.nz wrote: On 01/04/2011 04:50 PM, Dave Pooser wrote: Frankly, I'd think that besides costing the spammers money (a good thing in and of itself) ...spammers steal other people's resources - so they'll pay nothing... The best case scenario we can ever hope for is that they will be stuck sending all their spam using the From: address and SMTP server of the infected host - nothing better is possible, unless you can figure out how to stop 100% of humanity clicking on %*# executables. Some ISP's appear to be doing a much better job at preventing spam-through-official-SMTP-servers than they used to. I just now noticed that rr.com appears to be using Cloudmark on customer mail leaving their official MTA's. Looking through my logs, it appears very little of my spam is coming from official rr.com MTA's these days. This is a good sign. Now why can't Yahoo do this!? =) Warren
DNSBL Safety Report 1/2/2011
http://www.spamtips.org/2011/01/dnsbl-safety-report-122011.html Further on the topic of RBL's, I wrote this article yesterday for add-on DNSBL's for spamassassin. (BTW, I do agree that zen.spamhaus.org is an excellent choice for outright blocking of spam.) Warren
Re: lots of freemail spam
I've been thinking, perhaps we should consider making a Freemail Realtime BL that lists not IP addresses, but rather ID's at the Freemail provider.

1) I am assuming that ID's you see in headers of mail from Yahoo are always from an authenticated user?
2) Traps and user reports can quickly list a new Freemail user ID.
3) Subsequent spam from that user ID is more easily blocked because the RBL has the ID listed.
4) The RBL feed can be automated to be sent to the provider (like Yahoo) so they can more quickly enforce locking down compromised accounts or enforce their ToS.

Warren
Re: lots of freemail spam
If I understand that thread correctly, that is for e-mail addresses in body text? I'm suggesting looking only at authenticated UID's in headers from specific providers like Yahoo who are notorious for spam, but their MTA's also send a significant amount of ham so we cannot DNSBL block them. Given that we know the UID's cannot be spoofed (if we verify the delivery with DKIM), such a BL can be safely populated in an automated fashion using spam traps. So this might be more of an Authenticated User RBL. Warren
Re: New plugin: DecodeShortURLs
http://ruleqa.spamassassin.org/20110102-r1054364-n/T_URL_SHORTENER/detail I inserted a giant uri regex into the nightly masscheck in order to get a rough measure of the true extent of the URL shortener problem. It appears that under 1% of spam is abusing shortening redirectors. ~40% of the shortening redirector spam has local-only spamassassin scores below the 5 point threshold. We'll see next Saturday how it scores with all network rules. Warren
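A local-only tagging rule of this sort can be approximated as below. The rule name, the short list of shortener domains, and the near-zero score are illustrative only; the regex actually used in the masscheck covered far more services.

  uri      LOCAL_URL_SHORTENER  m{^https?://(?:bit\.ly|tinyurl\.com|goo\.gl|is\.gd|ow\.ly)/}i
  describe LOCAL_URL_SHORTENER  Contains a link through a URL-shortening service
  score    LOCAL_URL_SHORTENER  0.001

The tiny score keeps the rule from affecting classification while still making every hit visible in headers and logs for measurement.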
Re: New plugin: DecodeShortURLs
What is the status of this plugin? I notice that there is no Bugzilla ticket for this plugin. Do you intend to submit it for inclusion in future spamassassin upstream? Would a DoS happen if the scanned e-mail contains 10,000 short URL's, and your mail server is hit by many such mails? (Either spamassassin becomes very slow, or you piss off the short URL provider by hitting them too quickly and often.) Could the plugin detect when there are intentionally too many short URL's? If so, what should it do in such cases? Are there ever legit reasons for an e-mail to have a large number of short URL's? Warren Togami war...@togami.com
Re: New plugin: DecodeShortURLs
http://www.surbl.org/faqs#redirect BTW, this page mentions SpamCopURI and urirhdbl as existing tools that handle redirection to some degree. Have you confirmed that you are not needlessly reinventing the wheel? It is entirely possible that your design with suggestions here could be better than the existing tools, but it might be worthwhile to look at the existing tools to see if they have useful ideas to borrow. Warren
Re: New plugin: DecodeShortURLs
On Sat, Jan 1, 2011 at 7:19 AM, Steve Freegard st...@stevefreegard.com wrote: 7) How fast are typical URL shortening responses? What is the timeout? We want to avoid degrading the scan time and delivery performance of spamassassin, but in a way that cannot be abused by the spammer to evade detection. This could be a problem with your huge list of shortening services. If you blindly include all possible shortening services, spammers could purposefully use only the slowest in order to time out spamassassin. Web browsers are more forgiving in timeouts, so a slow redirector is the ideal way to evade your plugin. It is possible that you may want to include only the most reputable shortening services by default, because you don't know what will happen during the multiple years of your plugin being deployed on arbitrary servers. Other less reputable shortening services might be hijacked, have their domain ownership change, or simply be neglected and become slow. Such services may need to be blacklisted entirely. For the non-default shortening services, it may be safe only if the list can be updated via sa-update. The timeout is set to 5 seconds and with a default of 10 short URIs scanned it would take 50 seconds before it timed out the lookups. Thinking about it I could possibly mitigate this by tracking timeouts by shortener domain; so if the 1st lookup to that shortener service timed out then it wouldn't attempt the rest. Everything else about this sounds very good, but this part is a bit worrisome. Looking through my logs, my average scantime is under 1 second. During debugging, a timeout of 5 seconds would be fine in order to help determine how fast the shorteners typically respond. But changes are needed to avoid severely impacting delivery times. * Consecutive timeouts won't work. The combined timeout of all short lookups when this plugin goes into production must be under maybe 3-5 seconds. 
* I know this would be difficult, but would it be possible to make asynchronous and concurrent queries to the shorteners instead of one-after-another? Kind of like how the URIDNSBL plugin currently works. There might be some complications here, like most HTTP servers will only respond to the first two concurrent connections from an IP address while further connections are serviced only after the first two have disconnected.

Rule Ideas:
  SHORT_URL_MULTI10
  SHORT_URL_TOOMANY
Rules triggering on suspicious behavior even if your plugin didn't have time to query it all.
  SHORT_URL_TIMEOUT
The plugin could print out which URL timed out. Something like:

  X-Spam-Report:
    * 0.5 SHORT_URL_TIMEOUT Shortened URL timed out
    *     [3 second timeout for http://example.com/298fauu]

Warren
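The "too many shorteners" idea can arguably be approximated in plain rule syntax even without the plugin, by counting shortener URIs. Everything below is illustrative (rule names, the sample domain list, the threshold of 10, and the score); the key ingredient is tflags multiple, which makes the subrule count every match instead of stopping at the first.

  uri      __LOCAL_SHORT_URL        m{^https?://(?:bit\.ly|tinyurl\.com|goo\.gl)/}i
  tflags   __LOCAL_SHORT_URL        multiple
  meta     LOCAL_SHORT_URL_TOOMANY  (__LOCAL_SHORT_URL > 10)
  describe LOCAL_SHORT_URL_TOOMANY  Suspiciously many URL-shortener links
  score    LOCAL_SHORT_URL_TOOMANY  0.5

A rule like this catches only the crude max-out abuse case; it says nothing about where the URLs actually lead.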
Re: IPv6 DNSBL/WL design, was Fwd: [Asrg] draft-levine-iprangepub-01
On Thu, Dec 30, 2010 at 5:21 PM, Ted Mittelstaedt t...@ipinc.net wrote: On 12/30/2010 5:43 PM, John Levine wrote: Ah, I see the problem. You're assuming that spammers will follow the rules. That's a poor assumption. No, I am assuming the spammers will do as they have always done in the past - attempt to use other people's computers for free. Other computers that are NOT cycling through lots of IP number in the normal case. I didn't want to get into this debate, but I think this point is naively optimistic. If a system is capable of cycling through IP addresses, the spammer will take advantage of this. It is trivial to do this on a Linux machine without disrupting operation of the owner's software by adding/removing IP aliases. I would assume there is a way to do it on Windows as well, although it is better hidden. Warren
Re: NJABL is dead?
Folks here are missing the point, that NJABL is catching not much of anything, like less than 1% of spam, and with a relatively high FP ratio. I don't understand this desire to keep such a poor performing rule, especially when it costs a network query. Warren
Re: NJABL is dead?
Whoa. Ted, please calm down. I think you read too much into this and are seriously overreacting. I didn't propose immediately replacing NJABL with something else like mailspike. I was only pointing out that NJABL was performing very poorly, to such an extent that you're better off removing it because it is needlessly using your resources. In effect my proposal makes nearly zero difference to SpamAssassin's current performance because these rules are nearly useless. The process of adding new DNSBL's to the official spamassassin rules is very lengthy. Among the things we need to improve/verify for eligibility: As you have correctly noted, the website of Mailspike needs improvements. Then we need to ask about the robustness of the mirror network. Then ask for clarification about future plans for taking it private and demanding money from users. I also know about other measures to further improve Mailspike's performance. Masschecks have confirmed for over a year now that Mailspike's performance is awesome. Even after the above things are done, it still might be months or even a year before SpamAssassin uses it as a default rule, because current policies seem to allow for big changes like this only at major releases like 3.4.0. It seems we need a general discussion about rule update policies and procedures, soon to happen on the dev@ list. Warren

On Tue, Dec 28, 2010 at 6:23 PM, Sahil Tandon sa...@freebsd.org wrote: On Tue, 2010-12-28 at 22:44:09 +0000, João Gouveia wrote: Again, a bit harsh, but I see your point. We shall improve the web site whenever possible. As with everything free (and we would like to keep it that way), it's kind of subject to time+effort constraints, and typically we prefer to make use of that improving the efficiency of the list, and not so much working on the web site.. João, please do not be discouraged by the ranting. We use mailspike at multiple sites and it is a valuable, low-FP addition to the DNSBL arsenal. Thanks for your efforts. 
-- Sahil Tandon sa...@freebsd.org
Re: NJABL is dead?
On Tue, Dec 28, 2010 at 8:11 PM, Ted Mittelstaedt t...@ipinc.net wrote: All very good points. I guess I'm a bit frustrated because njabl is clearly not performing anymore, I noticed that a few years back, and yet it's still in SA but better BL's are not. As you (and I) both illustrated, certain things need to be in place before a BL is added to SA. It's frustrating that mailspike hasn't done the last little bit needed to polish it up (although it is good that they care enough about it to pay attention) and it's also frustrating that the njabl owner has (apparently) gotten complacent with its non-performance.

It is a bit unfair to blame Mailspike for not having everything 100% ready. As I understand it, ANBREP began as their cleverly designed in-house spam solution. A while back they wanted it to be tested in nightly masscheck, but they didn't even have a public webpage at all. At my encouragement they slowly over the last year built the public infrastructure (Mailspike.net) and began preparing for public release. It isn't a top priority for them, and it seems one guy there is doing it bit-by-bit in his spare time. We have not even formally proposed inclusion upstream yet. If these things aren't ready at the time when it is asked to be included then you can rightfully complain. Meanwhile please understand the situation. There might even be opportunities for you to help. However, because the BL's are so important to the usefulness of SA, I would like to see SA change the blacklist configuration to something a bit different. What I would like to see is a BL rules subdirectory that contains rules for every known blacklist that is functioning, no matter how poor they are, and then the main SA rules contain a check into that subdirectory, looking for a config file in that subdir. That config file is nothing more than a series of lines, one for each BL. Each line is a name. 
If a BL name is present in the config file (or uncommented) then the BL rule for that name is sucked into SA; if the BL name isn't there (or is commented out) the rule or rules for that BL are ignored.

It is not that simple. The scores work together and are carefully balanced to maximize spam classification while minimizing ham False Positives. That means the score assigned to one rule depends on the scores assigned to other rules in order to work. Adding or removing significant rules like BL's has a major impact that can substantially tip the balance in either direction. They cannot be changed very often for this reason. Warren
Re: NJABL is dead?
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6525 Discussion about disabling NJABL. https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6526 Discussion about disabling rfc-ignorant.org.

  score __RCVD_IN_NJABL 0
  score RCVD_IN_NJABL_CGI 0
  score RCVD_IN_NJABL_MULTI 0
  score RCVD_IN_NJABL_PROXY 0
  score RCVD_IN_NJABL_RELAY 0
  score RCVD_IN_NJABL_SPAM 0
  score __RFC_IGNORANT_ENVFROM 0
  score DNS_FROM_RFC_DSN 0
  score DNS_FROM_RFC_BOGUSMX 0
  score __DNS_FROM_RFC_POST 0
  score __DNS_FROM_RFC_ABUSE 0
  score __DNS_FROM_RFC_WHOIS 0

If you add these to local.cf, it makes almost zero difference to spamassassin's scoring, but you do two fewer network queries per mail scan. Warren Togami war...@togami.com
Re: NJABL is dead?
I found that if I don't set the non-scoring subrule to zero, it does the DNS lookup anyway. I will try that meta. Thx. Warren
Re: mass-check submissions Re: My attempt at re-calculating test scores
I thought a bit more about the --reuse problem. While there are pros and cons to reuse, I guess there is more benefit to --reuse than without. So I now recommend it in all cases of masscheck. On Fri, Dec 24, 2010 at 1:58 PM, Warren Togami Jr. wtog...@gmail.comwrote: This does remind me however that there is a serious and confusing problem if people should be using --reuse or not. As it is now, it is misleading and broken for most people due to the chicken and egg problem of missing tags for newer DNSBL's. We should probably tell people to turn off --reuse unless they are sure they know what they are doing. Warren On Dec 24, 2010 1:05 PM, John Hardin jhar...@impsec.org wrote:
Re: mass-check submissions Re: My attempt at re-calculating test scores
In general, please stop worrying about your corpus being ideal. Our sample size right now is so small that even non-ideal corpora would be helpful. Get started with cron nightly masschecks, then work on improving your corpus later. I personally include:

* The last 4 weeks of spam. I use logrotate to automatically rotate one week at a time so I don't have to worry about it. I receive LOTS of spam so this is a good quantity. IMHO, spam older than a month is far less useful to test spamassassin's rules.
* Last 2 years of ham. If we had 10x as many contributors to nightly masscheck then I might reduce this to last 1 year of ham.

Warren
NJABL is dead?
Hey folks, Does anyone know the story of what is going on with NJABL? http://ruleqa.spamassassin.org/20101225-r1052760-n/RCVD_IN_NJABL_PROXY/detail http://ruleqa.spamassassin.org/20101225-r1052760-n/RCVD_IN_NJABL_RELAY/detail http://ruleqa.spamassassin.org/20101225-r1052760-n/RCVD_IN_NJABL_SPAM/detail After stopping early this year, I only began looking again at ruleqa results in recent weeks. It now appears that NJABL is almost useless or dead.

  50_scores.cf:score RCVD_IN_NJABL_PROXY 0 0.208 0 2.224 # n=0 n=2
  50_scores.cf:score RCVD_IN_NJABL_RELAY 0 1.881 0 2.499 # n=0 n=2
  50_scores.cf:score RCVD_IN_NJABL_SPAM 0 1.466 0 1.249 # n=0 n=2

These scores were assigned by the previous rescoring masscheck before the release of 3.3.0. It appears that NJABL is not worthwhile to remain in spamassassin any longer. We are only creating extra network queries for no good reason. And NJABL just happens to be among the slowest of all my network queries in spamassassin. Perhaps it is time to remove NJABL? RFC Ignorant appears to be the next most useless network query. We may want to consider investigating whether it is worthwhile to retain it. If we eliminate network queries to useless or less effective blacklists, we could consider later adding more effective lists. Here are a few examples: http://ruleqa.spamassassin.org/20101225-r1052760-n/T_RCVD_IN_MSPIKE_BL/detail Excellent performance, I use this on my server. http://ruleqa.spamassassin.org/20101225-r1052760-n/T_RCVD_IN_SEMBLACK/detail Much improved performance since last year. I am considering using it on my server. Only tagging for now. http://ruleqa.spamassassin.org/20101225-r1052760-n/T_RCVD_IN_HOSTKARMA_BL/detail Dangerously high false positive rate. It would need to become safer. I personally use this for tagging but not scoring. For now I'm proposing only disabling NJABL in sa-update, since it is currently useless and not worth the extra network query. Any thoughts? Warren Togami war...@togami.com
Re: My attempt at re-calculating test scores
You have the option of uploading your corpus to the central server to process every night. But most people have privacy concerns about that if it is their own personal ham. For this reason you have the option of running the masscheck script yourself every night on your own server and rsync-uploading only the logs to the spamassassin central server. https://fedorahosted.org/auto-mass-check/ I run this script every night from cron on my corpora. I wrote this as a friendlier wrapper script around spamassassin's confusing and difficult-to-configure scripts. And yes, a ham-only corpus is extremely useful. You must confirm that it is 100% human verified. Start small, make sure the script is working properly, and sort more ham into that folder. Warren
Re: mass-check submissions Re: My attempt at re-calculating test scores
http://www.mail-archive.com/users@spamassassin.apache.org/msg69546.html Whitelists have almost zero impact on spamassassin's determination of ham vs spam. Believe me. This is not harmful. If you have any ham corpus it would be extremely useful to spamassassin. We have a severe lack of variety of data sources, so even a flawed data source would be incredibly useful. In this case the flaw is not harmful, unlike the skew that a blacklist would cause. Why recuse yourself from providing statistical data on the thousand other tests? http://ruleqa.spamassassin.org/ Look at how few contributors there are. The WORLD of spamassassin users is relying on the ham of a tiny group. spamassassin defaults are working great on MY spam, but I worry about others, especially non-US, non-English, or non-geek mail. We need greater variety and a larger sample size. Warren
Re: mass-check submissions Re: My attempt at re-calculating test scores
I think what he is failing to understand is that the scores are irrelevant, as the masscheck is only determining yes or no for each rule across a corpus. Also, "current" refers to the nightly masscheck snapshot of svn trunk including the latest rules. This does remind me however that there is a serious and confusing problem of whether people should be using --reuse or not. As it is now, it is misleading and broken for most people due to the chicken-and-egg problem of missing tags for newer DNSBL's. We should probably tell people to turn off --reuse unless they are sure they know what they are doing. Warren On Dec 24, 2010 1:05 PM, John Hardin jhar...@impsec.org wrote:
Re: My attempt at re-calculating test scores
BTW, if you have your own corpora, why not participate in the nightly masscheck? We are in serious need of additional participants in order to enable promotion of new rules to the sa-update channel, and to make it possible to release new versions of spamassassin. Warren
spamassassin-3.3.1 RPM packages for Fedora and RHEL5
http://wtogami.livejournal.com/34108.html Please see my blog post here for official, tested RPM packages for Fedora and RHEL5. I highly recommend NOT building the RPM package from the spec file contained within the spamassassin tarball. It has never been tested to work on Fedora or Red Hat Enterprise Linux. Warren Togami wtog...@fedoraproject.org
Re: Sought rules not doing so good
On 02/03/2010 09:18 AM, Justin Mason wrote: The corpus-quality for that masscheck doesn't look too bad though: http://ruleqa.spamassassin.org/20100201-r905213-n/T_JM_SOUGHT_1/detail?s_corpus=1#corpus That day was fine. The weekly masscheck however had only 50k spam. Warren
Re: Sought rules not doing so good
On 02/02/2010 12:07 PM, Adam Katz wrote: That is quite different from our masscheck stats. Today's results at http://ruleqa.spamassassin.org/20100201/%2FJM_SOUGHT look like this:

  SPAM%    HAM%    S/O    RANK  SCORE  NAME
   9.8564  0.0042  1.000  0.94  0.01   T_JM_SOUGHT_3
   8.1587  0.0068  0.999  0.93  0.01   T_JM_SOUGHT_2
  11.6464  0.0289  0.998  0.89  0.01   T_JM_SOUGHT_1
   0       0       0.500  0.48  0.00   JM_SOUGHT_FRAUD_1
   0       0       0.500  0.48  0.00   JM_SOUGHT_FRAUD_2
   0       0       0.500  0.48  0.00   JM_SOUGHT_FRAUD_3

FWIW the nightly masscheck is often very unbalanced especially on the spam side. Sometimes we have only 50k spam, sometimes over 500k spam. Some spam corpora contain a disproportionate amount of high scoring spam trap mail. I personally randomly filter out a large percentage of high scoring mail in an attempt to balance my spam corpus. But ultimately we need more masscheck participants to have better results. Warren
Re: blog article on 3.3.0
On 01/28/2010 11:33 AM, J.D. Falk wrote: http://www.returnpath.net/blog/2010/01/spamassasin-rarely-misses.php Yeah, it's partly self-serving, but that's what corporate blogs are for. The people who read this blog are mostly marketers with very little exposure to the open source community, so this should help them understand a bit more of how the real email ecosystem operates. -- J.D. Falk jdf...@returnpath.net Return Path Inc

I wasn't planning on responding to this thread, but other positive responses have annoyed me. This article is borderline misleading. "We didn't pay the Apache Foundation (which sponsors the SpamAssassin project) for these scores, or try to sell the developers on using it. We did talk about the products with them for quite a while: what the listing criteria is, our plans for the future, et cetera. Some of the community members were friendly, others...not so much. In the end, it was SpamAssassin's own testing process which convinced them to include these tests with these scores. The data spoke for itself, and they saw the value in it." The data spoke for itself? http://www.gossamer-threads.com/lists/spamassassin/users/145597?do=post_view_threaded The data showed that whitelists made almost ZERO difference, actually a slightly negative impact on spam filtering. Warren
Re: painting everybody in Taiwan with the same brush
On 01/26/2010 05:31 AM, Kai Schaetzl wrote: This is an SARE rule, I suggest you ask there. Kai Huh? Aren't we supposed to be telling people to stop using SARE? Warren
ANNOUNCE: Apache SpamAssassin 3.3.0 available
Release Notes -- Apache SpamAssassin -- Version 3.3.0 Introduction This is a major release, incorporating enhancements and bug fixes that have accumulated in a year and a half of development since the 3.2.5 release. Apart from some new or changed dependencies on perl modules, this version is compatible to large extent with existing installations, so the upgrade is not expected to be problematic (neither is downgrading, if need arises). Please consult the list of known incompatibilities below before upgrading. Downloading and availability Downloads are available from: http://spamassassin.apache.org/downloads.cgi md5sum of archive files: 15af629a95108bf245ab600d78ae754b Mail-SpamAssassin-3.3.0.tar.bz2 38078b07396c0ab92b46386bc70ef086 Mail-SpamAssassin-3.3.0.tar.gz e66856085ca14947146d57a40a51beaa Mail-SpamAssassin-3.3.0.zip 5be313a60c27ae522700e20b557ade33 Mail-SpamAssassin-rules-3.3.0.r901671.tgz sha1sum of archive files: 209a97102e2c0568f6ae8151e5a55cd949317b69 Mail-SpamAssassin-3.3.0.tar.bz2 35ff5ab33dd83bf8e3a63bd1540d819ab35117d5 Mail-SpamAssassin-3.3.0.tar.gz d1c61c67c806054c4404a854fc113a1a3c3e71c7 Mail-SpamAssassin-3.3.0.zip 04ac1d5d02a69f382909b01a4426a048a1e69278 Mail-SpamAssassin-rules-3.3.0.r901671.tgz Note that the *-rules-*.tgz files are only necessary if you cannot, or do not wish to, run sa-update after install to download the latest fresh rules. The release files also have a .asc accompanying them. The file serves as an external GPG signature for the given release file. 
The signing key is available via the wwwkeys.pgp.net key server, as well as http://www.apache.org/dist/spamassassin/KEYS The key information is: pub 4096R/F7D39814 2009-12-02 Key fingerprint = D809 9BC7 9E17 D7E4 9BC2 1E31 FDE5 2F40 F7D3 9814 uid SpamAssassin Project Management Committee priv...@spamassassin.apache.org uid SpamAssassin Signing Key (Code Signing Key, replacement for 1024D/265FA05B) d...@spamassassin.apache.org sub 4096R/7B3265A5 2009-12-02 See the INSTALL and UPGRADE files in the distribution for important installation notes. Summary of major changes since 3.2.5 COMPATIBILITY WITH 3.2.5 - rules are no longer distributed with the package, but installed by sa-update - either automatically fetched from the network (preferably) or from a tar archive, which is available for downloading separately (see below, section INSTALLING RULES); - CPAN module requirements: - minimum required version of ExtUtils::MakeMaker is 6.17; - modules now required: Time::HiRes, NetAddr::IP (4.000 or later), Archive::Tar (1.23 or later), IO::Zlib; - minimal version of Mail::DKIM is 0.31 (preferred: 0.37 or later); expect some tests in t/dkim2.t to fail with versions older than 0.36_5; - no longer used: Mail::DomainKeys, Mail::SPF::Query; - either Digest::SHA or the older Digest::SHA1 is required, though note that the DKIM plugin requires Digest::SHA for sha256 hashes and Razor agents still need Digest::SHA1; - some IPv6 functionality requires IO::Socket::INET6; - if keeping the AWL database in SQL, the field awl.ip must be extended to 40 characters. The change is necessary to allow AWL to keep track of IPv6 addresses which may appear in a mail header even on non-IPv6 -enabled host. While at it, consider also adding a field 'signedby' to the SQL table 'awl' (and adding 'auto_whitelist_distinguish_signed 1' to local.cf); see sql/README.awl for details. 
  The change need not be undone even if downgrading back to 3.2.* for some reason;

- fixing a protocol implementation error regarding a PING command required bumping the SPAMC protocol version to 1.5. Spamd retains compatibility with older spamc clients. Combining new spamc clients with pre-3.3 versions of a spamd daemon is not supported (but happens to work, except for the PING and SKIP commands);

- if using one of the plugins (FreeMail, PhishTag, Reuse) which were previously not part of the official package, please retire your local copy to avoid it conflicting with the new native plugin;

- as the plugin AWL is no longer loaded by default, to continue using it the following line is needed in one of the .pre files (e.g. local.pre):

    loadplugin Mail::SpamAssassin::Plugin::AWL

- the rule DKIM_VERIFIED has been renamed to DKIM_VALID to match its semantics;

- the DKIM plugin is now enabled by default for new installs, if the perl module Mail::DKIM is installed. However, installation of SpamAssassin will not overwrite existing .pre configuration files, so to use DKIM when upgrading from a previous release that did not use DKIM, a directive:

    loadplugin Mail::SpamAssassin::Plugin::DKIM

  will need to be uncommented in the file v312.pre, or added to some other .pre file, such as local.pre;

- due to changes in some
spamassassin-3.3.0 for Fedora/RHEL
http://wtogami.livejournal.com/33674.html If you use spamassassin on Fedora or RHEL5, please see my blog post for RPM packages and distro-specific notes. Warren Togami wtog...@redhat.com
Re: spamassassin-3.3.0 for Fedora/RHEL
On 01/26/2010 03:31 PM, Kai Schaetzl wrote: Charles Gregory wrote on Tue, 26 Jan 2010 14:10:51 -0500 (EST): Anyone know where to find a RHEL(CentOS) 4 rpm? Or will it appear in the CentOS 4 official update channels in due time? Just do it yourself. Follow the instructions on the download page, it's a *one liner*! Kai FWIW, RHEL4 is older than anything I expect that .src.rpm to work with. You may also need to build your own perl modules that might be missing. Warren
Re: insecure dependency in sa-learn --import
On 01/26/2010 06:16 PM, David Morton wrote: Trying to import a bayes db, I get: # sa-learn --import bayes: perform_upgrade: Insecure dependency in open while running with -T switch at /usr/share/perl/5.8/File/Copy.pm line 133. perl 5.8.8 What distribution? Warren
Re: That Future Bug
Did you enable sa-update? That will get rid of the broken rule as well. Warren
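On most systems "enabling sa-update" means a scheduled job. Distro packages often ship one that only needs switching on, so check before hand-rolling. A hypothetical system crontab entry (paths and service name vary by distro):

```
# /etc/cron.d/sa-update -- hypothetical; check whether your package
# already ships an sa-update cron job before adding one.
# sa-update exits 0 only when new rules were installed, so the
# reload runs just when the ruleset actually changed.
10 4 * * * root /usr/bin/sa-update && /sbin/service spamassassin reload
```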