Re: Bayes problems and German Spam

Ronan McGlue Mon, 16 May 2005 05:14:28 -0700

Simon Byrnand wrote:

At 09:53 16/05/2005, Jo wrote:
Simon Byrnand wrote:
Hi All,
After going from 2.64 to 3.0.3 I thought Bayes was working much better - previously certain classes of spam were being consistently reported as ham, scoring BAYES_00 no matter what I did, or how much manual training I did. (Autolearning enabled)

After upgrading to 3.0.3 and clearing the Bayes database everything seemed fine for a week or so, now it's back to its old habits :(

Particularly frustrating is the complete inability of sa-learn to correct the thinking of Bayes - all the recent flood of German spams are scoring BAYES_00, and DESPITE the fact that I have manually learnt well over two dozen of these as spam (which includes all the variations of them I've seen so far) new copies of identical spams STILL score BAYES_00. WHY ?

If the autolearn system can't be overridden with some manual learning, it makes it more or less useless :(

A few other spams that were previously getting BAYES_99 are now down to BAYES_00 for no apparent reason. It's highly unlikely that they were autolearnt as ham, as they hit several other tests too. It seems that Bayes is still exploitable... :(
Any suggestions ?
Regards,
Simon
Clear your bayes database and start all over again. Switch off auto-learning and rely purely on manual learning in a feedback loop. Grab a mail box of known ham and another folder of known spam. Preferably use a thousand of each.
Hmm, not very practical when the system has several thousand users/mailboxes. There is no way I would be able to keep current with manual learning just based on my own personal mailbox... (and I can hardly go poking around in other people's mailboxes to gather ham/spam to learn)

If you ever switch on autolearning again, set the threshold at -0.2 for ham and 10 or 15 for spam.
Are there even any negative scores in 3.0.3 ? I thought negative scores were pretty much eliminated in recent versions, so with -0.2 it would never learn any ham.
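
For reference, the thresholds discussed above would look roughly like this in local.cf. This is a minimal sketch, assuming the stock option names from the SpamAssassin configuration documentation; the values are simply the ones suggested in this thread, not tested recommendations.

```
# local.cf: autolearn thresholds as suggested in this thread (illustrative values)
bayes_auto_learn 1
# learn as ham only when the message scores below this value
bayes_auto_learn_threshold_nonspam -0.2
# learn as spam only when the message scores above this value
bayes_auto_learn_threshold_spam 10.0
```

With a ham threshold below zero, a message is only autolearnt as ham if the rules it hits sum to a negative score, which is the concern raised in the reply above.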

Enable network tests; Razor2, Pyzor and DCC work wonders on the site I administer.
Already have all network tests enabled, always have done.
Regards,
Simon

I too have all net tests enabled and started from a fresh, clean database on Friday, and already I'm seeing the German spams hit BAYES_00... I don't want to switch autolearning off because, well, I find it incredibly useful. I have spam/ham thresholds at 10/0 respectively and all appears well aside from the German bunch of spams...

Don't know what else I can do...

*clutches at straws* Is there a way to tie in a positive net test, say multi.surbl.org, to sway Bayes? Generally, if SURBL reports spam, you can guarantee that all the other rules are surplus to requirements... IMHO

ronan

--
========

Regards

Ronan McGlue
Info. Services
QUB
