Joe E.:
Thanks for getting past the usual knee-jerk reaction and seeing.
Joe F.
Joe Emenaker wrote:
Steve Bertrand wrote:
> SA isn't about the "average" it's about the accuracy.
If this were the case, then why aren't the spam scores
("*required_hits*") for each message either 1 or 0 and nothing else?
Oh, come on now. This is just a troll on a very legitimate and
informative statement.
No... actually, I think he's got you there.
When I read his first post, my knee-jerk reaction was: "Dude... RTFM!!
Learn about 'spam_threshold', adjust that to your spam/ham averages and
not the other way around and stop asking silly questions..."
Then, he mentioned that he has a bunch of users who already are using
spam_threshold, but their values need to be tweaked, and it would be
easier for him to tweak the scoring, than everybody's thresholds. At
*that* point, my knee-jerk reaction was to tell him to write a script
that records all of the spam scores for each user, along with whether
that user categorized it as spam or ham and then write a little script
like this (http://fruitpie.blastpoint.com/~jemenake/spamreport.cgi) to
let the users custom-pick their desired level of
false-positives/false-negatives.
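For what it's worth, the tallying side of that is simple. Here's a minimal Python sketch, assuming a hypothetical per-user log of (score, is_spam) pairs, where is_spam records whether the user filed the message as spam or ham. It doesn't reproduce the spamreport.cgi page, just the arithmetic a page like that would need. A message counts as flagged when its score is at or above the threshold, the same way SA compares against required_hits.

```python
from typing import Iterable, List, Tuple


def tradeoff_table(records: Iterable[Tuple[float, bool]],
                   thresholds: Iterable[float]) -> List[Tuple[float, int, int]]:
    """For each candidate threshold, count (false positives, false negatives).

    `records` is a hypothetical log of (score, is_spam) pairs, where is_spam
    is how the user filed the message (Spam vs. Ham folder).
    """
    records = list(records)
    table = []
    for t in thresholds:
        # Ham that would have been flagged as spam at this threshold.
        fp = sum(1 for score, is_spam in records if not is_spam and score >= t)
        # Spam that would have slipped through at this threshold.
        fn = sum(1 for score, is_spam in records if is_spam and score < t)
        table.append((t, fp, fn))
    return table


if __name__ == "__main__":
    log = [(-1.2, False), (2.5, False), (4.8, False),
           (4.1, True), (7.9, True), (12.0, True)]
    for t, fp, fn in tradeoff_table(log, [3.0, 5.0, 8.0]):
        print(f"threshold {t:4.1f}: {fp} false positives, {fn} false negatives")
```

The user then just picks the row whose mix of mistakes they can live with.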
But *THEN*, I finally saw the light.
From what I can gather, he's talking about the problem presented
whenever you add/remove/change your SA rules (or.... heaven forbid...
upgrade to 3.0). Whenever that happens, SA's scoring is going to shift
and everyone's individual optimum spam_threshold would shift, too.
That's pretty screwed.
What the-other-Joe seems to be asking for is some way for SA to
keep "re-centering" itself so that he doesn't have to go fix
everybody's spam_threshold (or ask the individual users to) whenever
he changes the rules.
The easiest way to do this is probably for SA to somehow track what
the highest and lowest scores have been for the last week or so and,
if they both shift in the same direction by some amount, then SA would
compensate for that. On the face of it, this could probably be
implemented with something similar to the auto-whitelisting that SA
already has (since the auto-whitelist is just an averaging feature).
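A rough sketch of that tracking in Python, with several assumptions of my own: the class name, the fixed baseline range, the one-week window measured in seconds, and the min_shift dead zone are all hypothetical, and this is not how SA's auto-whitelist actually works internally. It keeps the raw scores seen during the last week and, if both the lowest and the highest have moved the same way relative to the baseline, subtracts the average of the two shifts from new scores.

```python
from collections import deque
from typing import Deque, Tuple

WEEK = 7 * 24 * 3600  # window length in seconds (assumed: "the last week or so")


class ShiftCompensator:
    """Hypothetical drift corrector based on the lowest/highest recent scores."""

    def __init__(self, baseline_low: float, baseline_high: float,
                 window: float = WEEK, min_shift: float = 0.5):
        self.baseline = (baseline_low, baseline_high)
        self.window = window
        self.min_shift = min_shift  # ignore shifts smaller than this
        self.recent: Deque[Tuple[float, float]] = deque()  # (timestamp, raw score)

    def observe(self, timestamp: float, score: float) -> None:
        """Record a raw score; assumes timestamps arrive in increasing order."""
        self.recent.append((timestamp, score))
        # Drop observations that have aged out of the window.
        while self.recent and self.recent[0][0] < timestamp - self.window:
            self.recent.popleft()

    def offset(self) -> float:
        """Average shift of both ends, or 0.0 if they didn't move together."""
        if not self.recent:
            return 0.0
        scores = [s for _, s in self.recent]
        d_low = min(scores) - self.baseline[0]
        d_high = max(scores) - self.baseline[1]
        same_direction = (d_low > 0 and d_high > 0) or (d_low < 0 and d_high < 0)
        if same_direction and min(abs(d_low), abs(d_high)) >= self.min_shift:
            return (d_low + d_high) / 2
        return 0.0

    def adjusted(self, score: float) -> float:
        """Raw score with the detected drift removed."""
        return score - self.offset()
```

Using only the extremes is cheap but touchy (one wild outlier moves them), which is part of why the folder-based approach below is more attractive.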
The even slicker way to do what the-other-Joe is talking about is like
this, but it requires user feedback in the form of them having Spam
and Ham trash folders (like many people already use, myself included,
for Bayes training). If you had that, then SA would have reliable
data on the average score of all ham and all spam.
Then, SA would be able to always adjust its scoring so that these
averages fell equally on either side of 5. Then, nobody would ever
need to mess with their spam_threshold, really. The admin could change
the ruleset, the spammers could change their tactics, etc. As long as
the user kept using their Spam and Ham trash folders, SA would keep
learning and keep re-centering. The user would experience brief spikes
of false-negatives and false-positives, but the auto-centering would
correct for it within a week or so.
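The re-centering step itself is just arithmetic. A minimal Python sketch, assuming the scores of everything in a user's Spam and Ham trash folders are available as lists (a hypothetical interface; SA doesn't hand you this as-is): find the offset that moves the midpoint between the two averages onto 5, then add it to every new raw score.

```python
from statistics import mean
from typing import Sequence


def recenter_offset(ham_scores: Sequence[float],
                    spam_scores: Sequence[float],
                    target: float = 5.0) -> float:
    """Offset that puts the midpoint of the ham and spam averages at `target`."""
    midpoint = (mean(ham_scores) + mean(spam_scores)) / 2
    return target - midpoint


def adjusted_score(raw: float, offset: float) -> float:
    # Shifting every score by the same amount leaves the two averages
    # equally far from `target`, ham on one side and spam on the other.
    return raw + offset
```

Recomputing the offset periodically over recent folder contents is what would let it keep tracking rule changes and new spammer tactics.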
Now, the ultra-deluxe-honeymoon-suite version of this would go one
step further. If SA did have access to the scores of everything dropped
into the Spam and Ham folders, then SA could not only adjust so that a
score of 5 fell squarely between the averages, but it could *scale*
the scoring so that a spam_threshold of "10" would be guaranteed to
*catch* everything that the user has ever dropped into their Spam
trash folder (aka, all known spam from the past).... and a
spam_threshold of "0" would be guaranteed to *pass* everything that
the user has ever dropped into their Ham trash folder (aka, all known
ham from the past).
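One way to get those two guarantees, sketched in Python under my own interpretation: rather than rescaling each rule, treat the user's 0-10 number as a preference knob (10 = most aggressive) and translate it into a raw SA threshold. The function name and the small eps are hypothetical. Knob 10 maps to the lowest score in the Spam folder, so every known spam is caught; knob 0 maps to just above the highest score in the Ham folder, so every known ham passes; values in between interpolate linearly.

```python
from typing import Sequence


def raw_threshold(knob: float,
                  ham_scores: Sequence[float],
                  spam_scores: Sequence[float],
                  eps: float = 1e-6) -> float:
    """Map a 0-10 preference (10 = catch all known spam) to a raw SA threshold.

    A message is treated as spam when its raw score >= the returned value.
    """
    if not 0 <= knob <= 10:
        raise ValueError("knob must be between 0 and 10")
    lenient = max(ham_scores) + eps   # just above every known ham: all ham passes
    aggressive = min(spam_scores)     # at the lowest known spam: all spam caught
    w = knob / 10.0
    # Written as a weighted sum so the endpoints come out exactly
    # (w = 0 gives `lenient`, w = 1 gives `aggressive`).
    return (1.0 - w) * lenient + w * aggressive
```

When ham and spam scores overlap, no single setting gets both guarantees at once; the knob just picks where in that overlap the user wants to sit.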
If SA could do *that*, then the spam_threshold just becomes a 0-10
number that the user chooses to indicate their personal preference
between false-positives and false-negatives. The user would never have
to change that value unless the user's *preference* changed.
And I think *that* should be an ultimate goal.
So, to summarize, I think that the-other-Joe has hit upon an important
idea here... but I think that it can, ultimately, be taken even
further and that it could make SA really, really slick.
- Joe