On Thu, 2004-05-20 at 20:27, E. Falk wrote:
> I don't have any anti-hoax rules to share, but I would caution against
> scoring these as spam, especially if you're using any sort of Bayesian
> auto-learning or auto-whitelisting.
>
> Hoaxes are almost always passed along by people who are actually known
> to the recipient and whom the recipient would like to receive mail from.
> You don't want the system to recognize future e-mail from the sender as
> spam in this case.
[...]
This thread has provided some very interesting reading material, but the above has me confused. Just when I thought I was beginning to understand Bayes...

If we add rules that mark hoaxes as spam, Bayes will learn tokens from those messages and use them to score future messages, which is a "good thing", right? But Evan warns that Bayes will then also mark ham from the people who sent these hoaxes as spam, which is of course a "bad thing". BUT AFAIK Bayes doesn't care about the sender; it only looks at the message body. Or am I wrong here?

AWL looks at the sender, so AWL might score ham from these senders higher than it should. BUT since AWL calculates scores as an average, this shouldn't be much of a problem unless you receive more hoaxes and/or spam than ham from this person. In that case I don't think it's a "bad thing" anymore for SA to score messages from this person as spam, but that's just me. And again, you could always use whitelist_from.

Even if Bayes does look at the sender, the only harm adding this ruleset would do is that some tokens that used to be good indicators of spam would become less effective for a small subset of messages. Doesn't seem like a big problem to me.

About FPs: I usually cut the hoax from my replies when I try to convince my friends (again) that sending hoaxes, virus warnings or jokes[1] is "A REALLY BAD THING", so all we'd have to be careful about is using too few tokens to identify a message as a hoax. But as the SARE ninjas seem to have taken up this battle, I'm quite confident the rules that make it into this ruleset will be near-perfect. (thanks ninjas!)

So could someone explain the dangers of adding a hoax ruleset to me? It seems I'm missing something.

TIA
Bram

-- 
# Mertens Bram "M8ram" <[EMAIL PROTECTED]> Linux User #349737 #
# SuSE Linux 8.2 (i586) kernel 2.4.20-4GB i686 256MB RAM #
# 10:18am up 62 days 13:57, 7 users, load average: 0.00, 0.01, 0.00 #
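To make the AWL averaging argument concrete, here is a minimal Python sketch. It assumes the adjustment described in the SpamAssassin AWL documentation: final = score + (mean - score) * factor, where mean is the average score of earlier mail from the same sender and auto_whitelist_factor defaults to 0.5. It is a simplified model, not SpamAssassin's implementation (the real AWL keeps its per-sender totals in a database). The class name, sender address and sample scores are hypothetical.

```python
# Simplified model of the SpamAssassin auto-whitelist (AWL) adjustment.
# Assumption: final = score + (mean - score) * factor, where mean is the
# average pre-AWL score of earlier mail from the same sender.


class SimpleAWL:
    """Hypothetical per-sender history: running total and message count."""

    def __init__(self, factor=0.5):
        self.factor = factor  # mirrors auto_whitelist_factor (default 0.5)
        self.history = {}     # sender -> (total_score, count)

    def adjust(self, sender, score):
        total, count = self.history.get(sender, (0.0, 0))
        if count:
            mean = total / count
            final = score + (mean - score) * self.factor
        else:
            final = score     # no history yet: score is left unchanged
        # Record the pre-AWL score so later messages regress toward it.
        self.history[sender] = (total + score, count + 1)
        return final


if __name__ == "__main__":
    awl = SimpleAWL()
    friend = "friend@example.com"  # hypothetical sender
    # Nine ordinary messages scoring 0.0, then one hoax a hoax rule pushed to 6.0.
    for _ in range(9):
        awl.adjust(friend, 0.0)
    print(awl.adjust(friend, 6.0))  # 3.0: hoax pulled toward the 0.0 mean
    print(awl.adjust(friend, 0.0))  # 0.3: next ham nudged up only slightly
```

Under these assumptions, one hoax among nine ordinary messages raises the sender's mean to only 0.6, so the next ham moves by 0.3 points. A sender whose mail is mostly hoaxes would drag the mean, and therefore later scores, much higher.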
