Re: Whitelists, not directly useful to spamassassin...

Charles Gregory Thu, 17 Dec 2009 08:30:45 -0800

Thank you, Warren. That (finally) gives some real perspective to this mess, and gets some of the 'real' questions answered.


- C

On Wed, 16 Dec 2009, Warren Togami wrote:

I made a discovery today that surprised even myself. Using the rescore masscheck and weekly masscheck logs while working on Bug #6247, I found some interesting details that throw a wrench into this lively debate.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247#c49
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247#c51
It turns out that the ReturnPath and DNSWL whitelists have a statistically insignificant impact on spamassassin's ability to determine ham vs. spam. Meanwhile, both whitelists have high levels of accuracy.
How can both of these statements be true? I suspect this is because the scores are balanced by the rescoring algorithm to be "safe" in the majority case where no whitelist rule has triggered. Thus whitelists are not needed or relied upon to prevent false positive classification.
While whitelists are not directly effective (statistically, when averaged across a large corpus), whitelists are powerful tools in indirect ways including:
* Pushing the score beyond the auto-learn threshold for things like Bayes to function without manual intervention.
* The albeit controversial method where some automated spam trap blacklists use whitelists to help determine if they really should list an IP address.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6251
spamassassin-3.3.0 has reduced the score impact of these whitelists to more modest levels, maxing out at -5 points. -5 is PLENTY for spamassassin, as 5 points is the level for which the scoreset is tuned. Mail from a whitelisted host would need greater than 10 points to be blocked, which is statistically very rare for ham. I believe that we are striking the right balance with these modest whitelist scores in this release.
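[Editorial illustration, not part of the original message.] The arithmetic in the paragraph above can be sketched roughly as follows. This is not SpamAssassin code: the constant and function names are hypothetical, the numbers are taken from the paragraph itself, and the behaviour at exactly the threshold is an assumption.

```python
# Rough sketch of the scoring arithmetic described above.
# All names here are hypothetical; this is not SpamAssassin's implementation.

SPAM_THRESHOLD = 5.0        # the level the scoreset is tuned for (per the message)
MAX_WHITELIST_SCORE = -5.0  # strongest whitelist adjustment in 3.3.0 (per the message)


def is_blocked(other_rules_score: float, whitelisted: bool) -> bool:
    """Return True if the total score reaches the spam threshold.

    other_rules_score: sum of all non-whitelist rule scores.
    whitelisted: whether the strongest whitelist rule fired.
    Treating a score exactly at the threshold as spam is an assumption.
    """
    adjustment = MAX_WHITELIST_SCORE if whitelisted else 0.0
    return other_rules_score + adjustment >= SPAM_THRESHOLD


if __name__ == "__main__":
    # A message scoring 6 from other rules is blocked without a whitelist hit,
    # but not with one; with a hit it takes roughly 10+ points to be blocked.
    for score in (6.0, 9.5, 10.5):
        print(score, is_blocked(score, False), is_blocked(score, True))
```

Under these assumptions, a whitelisted message is blocked only once its other rules contribute about 10 points, which matches the "greater than 10 points" figure given above.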
That being said, whitelists should be constantly policed to maintain their reputation and trust levels. For example, while I currently am impressed by DNSWL's performance, I am not pleased that they seem to lack automated trap-based enforcement. Relying only on manual reports and manual intervention requires too much effort in the long term for any organization, be it company- or volunteer-run.
Warren Togami
wtog...@redhat.com
