Martin Radford wrote:
> It might be because you get the occasional false positive that you
> want to avoid (but all the rest come under your threshold). You
> probably would want these autolearned as ham.
Actually, at the moment the bayes engine thinks 99% of the messages
going through it are spam, simply because it's auto-learning spam
messages but never auto-learning ham...because messages never get
negative scores.
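To make the imbalance concrete, here is a rough Python sketch of threshold-based auto-learning. It is a simplification for illustration, not SpamAssassin's actual code: the function name autolearn_decision and the sample scores are made up, and the 0.1 and 12.0 values are assumed to stand in for the bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings (check your own config for the real numbers).

```python
from collections import Counter

# Assumed thresholds for illustration; check your own configuration.
HAM_THRESHOLD = 0.1    # learn as ham only when the score is below this
SPAM_THRESHOLD = 12.0  # learn as spam only when the score is above this


def autolearn_decision(score, ham_threshold=HAM_THRESHOLD,
                       spam_threshold=SPAM_THRESHOLD):
    """Return "ham", "spam", or None (not learned) for a message score."""
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None


# Hypothetical scores: the legitimate mail scores low, but never below 0.1.
sample_scores = [0.4, 1.2, 2.5, 0.9, 14.3, 22.0, 3.1, 17.8]

learned = Counter(autolearn_decision(s) for s in sample_scores)

# Same mail with a hypothetical, more permissive ham threshold of 3.0.
relaxed = Counter(autolearn_decision(s, ham_threshold=3.0)
                  for s in sample_scores)

print("default thresholds:", dict(learned))
print("relaxed ham threshold:", dict(relaxed))
```

With the made-up scores above, the default thresholds learn three spams and no ham at all, while the relaxed ham threshold also learns four hams. That one-sided diet is the problem I'm describing.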
> Or it might be because the messages are from a mailing list like this
> one, where the messages may well contain extracts from spam. In this
> case you positively *don't* want to autolearn them as ham, because
> it'll adversely affect the Bayes database's training.
I read this in the archives while researching the problem before asking
the list (gasp, yes! A user did research before posting!), and I think
it's such an obscure problem that it doesn't affect us. In our
specific use, in fact, this particular circumstance will never, ever,
ever happen; nobody forwards spam to this particular mail server (though
it does get deluges of spam on its own). As for the people on SA-user
getting copies of spams... believe it or not, SpamAssassin users vastly,
and I do mean vastly, outnumber spamassassin-* list members. I also
don't know many people who forward each other spams (at least not people
who want to keep their friends).
Further, as I recall, the last time someone asked what happens if a spam
gets quoted on-list, several people on the SA list pointed out that even
there, such occurrences are rare.
So yes, I think your argument is rather obscure and moot for 99% of your
users.
Did you consider that the occasional spam auto-learned as ham really
isn't that bad if you're auto-learning many more legitimate messages?
SA grossly tips the scales towards auto-learning spam over ham, all for
the sake of not accidentally learning a case that is rather theoretical
for most users. Left to its own devices, the Bayes engine will
eventually mark more and more messages as spam until it becomes
completely useless, which is much worse than a slight inaccuracy from
the occasional spam that gets auto-learned as ham.
Developers are always well-meaning when they institute rules (that
cannot be overridden) to address specific circumstances. However, these
little rules often end up causing a lot of people a lot of grief while
solving a problem that really wasn't that big in the first place. It's
like not giving your dinner guests steak knives because there's a slim
chance they might poke themselves in the ear with one. Yeah, your dinner
guests will be safe, but they're going to have a hell of a time enjoying
the steak with that butter knife you gave them instead. Another example
would be the infamous crash involving that Airbus plane that overrode
the pilot's command for more throttle. The computer (and its
programmers) had good intentions, but failed to realize that in the end,
there has to be a magic red button somewhere that puts someone with
situational knowledge back in control of things. SA has many such
restrictions and few Magic Red Buttons.
In several cases, SpamAssassin assumes it knows better than I do and
overrides my config directives (and further, doesn't warn me it's doing
so). If you want to warn me in the install/config/whatever docs that
turning on auto-learning of messages above X score or turning on
auto-learning of whitelisted messages is dangerous, fine. So be it.
Some people might not instantly realize the implications. But give us
the OPTION of doing it.
So here's my suggestion, and it's two-part:
a) Strip the min and max limits from the two auto-learn threshold
params. If I want to be a moron and set my auto-learn-spam threshold to
2 (i.e., below the magic number 6), that's my bloody business, not
yours ;-)
b) Add an auto_learn_whitelist option with a few settings: off
(nothin'), auto (i.e., let Bayes auto-learn messages that were
auto-whitelisted), manual (i.e., messages matched by config-file
whitelist rules), and all (both auto and manual, mwuaha). OK, so they're
not intelligently named, but that combo will make just about anybody
happy.
Make the default 'off' if you REALLY, really think the whole
subversive-spam thing is a problem for the MAJORITY OF YOUR USERS.
Chances are 'manual' is the next-safest option, since users generally
have to be smarter than the average bear to set up their own rules (or
their admins had good reasons for adding global rules, as I did on our
system when whitelisting our biggest customers). 'auto' and 'all' would
be the least safe.
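For what it's worth, the behaviour I have in mind for part (b) could be sketched in Python like this. Everything here is hypothetical: auto_learn_whitelist is only my proposed option, not something SA has today, and the names AutoLearnWhitelist and learn_whitelisted_as_ham are invented purely to show the four settings.

```python
from enum import Enum


class AutoLearnWhitelist(Enum):
    """Hypothetical values for the proposed auto_learn_whitelist option."""
    OFF = "off"        # never auto-learn whitelisted mail
    AUTO = "auto"      # learn mail that was auto-whitelisted
    MANUAL = "manual"  # learn mail matched by config-file whitelist rules
    ALL = "all"        # learn both kinds


def learn_whitelisted_as_ham(mode, auto_whitelisted, manually_whitelisted):
    """Decide whether a whitelisted message should be auto-learned as ham."""
    if mode is AutoLearnWhitelist.OFF:
        return False
    if mode is AutoLearnWhitelist.AUTO:
        return auto_whitelisted
    if mode is AutoLearnWhitelist.MANUAL:
        return manually_whitelisted
    # AutoLearnWhitelist.ALL: either kind of whitelist hit is enough.
    return auto_whitelisted or manually_whitelisted


# Example: a message from one of our manually whitelisted customers.
for mode in AutoLearnWhitelist:
    decision = learn_whitelisted_as_ham(
        mode, auto_whitelisted=False, manually_whitelisted=True)
    print(mode.value, "->", decision)
```

The point of the sketch is only that the admin picks the mode; the software doesn't second-guess it.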
> You could work around the problem by creating your own rules to
> identify these messages, and give them a negative score.
The messages in question have no common element. They come from
virtually anyone; in most cases, they're initiated by the user out of
the blue, so we can't even inject headers or taglines to look for later.
Brett