Thank you, Warren. That (finally) gives some real perspective to this
mess, and gets some of the 'real' questions answered.
- C
On Wed, 16 Dec 2009, Warren Togami wrote:
I made a discovery today that surprised even me. Using the rescore
masscheck and weekly masscheck logs while working on Bug #6247, I found some
interesting details that throw a wrench into this lively debate.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247#c49
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247#c51
It turns out that the ReturnPath and DNSWL whitelists have a statistically
insignificant impact on spamassassin's ability to determine ham vs. spam.
Meanwhile, both whitelists have high levels of accuracy.
How can both of these statements be true? I suspect this is because the
scores are balanced by the rescoring algorithm to be "safe" in the majority
case where no whitelist rule has triggered. Thus whitelists are not needed
or relied upon to prevent false positive classification.
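
A toy sketch of how both statements can hold at once. The message scores
below are invented for illustration and are not taken from the masscheck
logs; the 5.0 threshold, the ">=" comparison, and the names classify() and
count_changes() are likewise assumptions. It counts how many verdicts change
when a whitelist score is applied: ham that already scores well below the
threshold is classified the same way with or without the whitelist.

```python
# Toy illustration only: these scores are made up, not masscheck data.
THRESHOLD = 5.0  # assumed spam threshold


def classify(base_score, whitelist_score=0.0):
    """Return 'spam' if the total score reaches the threshold, else 'ham'."""
    return "spam" if base_score + whitelist_score >= THRESHOLD else "ham"


def count_changes(messages, whitelist_score):
    """Count messages whose verdict changes once the whitelist score applies."""
    changed = 0
    for base_score, whitelisted in messages:
        without = classify(base_score)
        with_wl = classify(base_score, whitelist_score if whitelisted else 0.0)
        if without != with_wl:
            changed += 1
    return changed


# (score from all non-whitelist rules, did a whitelist rule hit?)
messages = [
    (0.3, True),
    (1.2, True),
    (-0.5, False),
    (2.1, True),
    (14.8, False),
    (6.4, True),   # the only message the whitelist actually rescues
    (22.0, False),
]

print(count_changes(messages, -5.0))
```

In this made-up sample the whitelist is accurate (every hit is on ham) yet it
changes the verdict for only one message out of seven.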
While whitelists are not directly effective (statistically, when averaged
across a large corpus), whitelists are powerful tools in indirect ways
including:
* Pushing the score beyond the auto-learn threshold for things like Bayes to
function without manual intervention.
* The admittedly controversial practice in which some automated spam trap
blacklists use whitelists to help decide whether they really should list an
IP address.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6251
spamassassin-3.3.0 has reduced the score impact of these whitelists to more
modest levels, maxing out at -5 points. -5 is PLENTY for spamassassin, as 5
points is the level for which the scoreset is tuned. Mail from a whitelisted host
would need greater than 10 points to be blocked, which is statistically very
rare for ham. I believe that we are striking the right balance with these
modest whitelist scores in this release.
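
The arithmetic above, as a minimal sketch. The -5 cap and the 5-point
threshold come from the text; the constant and function names are
hypothetical, and treating a total equal to the threshold as spam is an
assumption about the boundary case.

```python
REQUIRED_SCORE = 5.0        # threshold the scoreset is tuned for
MAX_WHITELIST_SCORE = -5.0  # largest whitelist adjustment in 3.3.0


def points_needed_to_block(whitelisted):
    """Score the other rules must contribute before mail is marked as spam."""
    adjustment = MAX_WHITELIST_SCORE if whitelisted else 0.0
    return REQUIRED_SCORE - adjustment


def is_spam(other_rules_score, whitelisted):
    """Apply the whitelist adjustment, then compare against the threshold."""
    adjustment = MAX_WHITELIST_SCORE if whitelisted else 0.0
    return other_rules_score + adjustment >= REQUIRED_SCORE


print(points_needed_to_block(whitelisted=False))
print(points_needed_to_block(whitelisted=True))
print(is_spam(9.9, whitelisted=True))
```

A whitelisted message therefore has to collect about twice the usual number
of points from other rules before it is blocked.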
That being said, whitelists should be constantly policed to maintain their
reputation and trust levels. For example, while I am currently impressed by
DNSWL's performance, I am not pleased that they seem to lack automated
trap-based enforcement. Relying only on manual reports and manual
intervention requires too much effort in the long-term for any organization,
be it company- or volunteer-run.
Warren Togami
wtog...@redhat.com