Re: Text contained in HTML comments causing BAYES_00 to classify as non-spam

Happy Chap Wed, 04 Aug 2010 06:08:17 -0700

Hi RW, thanks for your reply.

>It's unlikely that that could push the BAYES RESULT down to BAYES_00
>unless there is uncorrected mistraining.

Possibly, but I suspect mistraining isn't the problem: apart from this
specific type of spam, SpamAssassin is doing (and has done for some time) a
very good job of identifying mail correctly. A dump of the Bayes database
shows about 30k each of spam and ham that it has learned from, and with
those numbers I don't think the percentage of mistrained messages would be
significant even if the odd few were mistrained.

>I don't think the 3.2.x rules get updated much. Perhaps this is leading
>to false autotraining in BAYES.

Ah, perhaps this is more of a problem. I didn't realise that rule updates
differed between SpamAssassin versions (well, not between 3.2.x and 3.3.x
anyway). In that case, I'll try upgrading SpamAssassin and see if that
helps.

Incidentally, I'm not sure autotraining is much of a problem, as only very
obvious (high-scoring) spam (and ham) seems to trigger it, according to the
headers at least. Certainly none of this particular type of spam is being
autotrained according to the headers.
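
For anyone who wants to check this across a whole mailbox rather than by
eye, here is a minimal Python sketch. It assumes the messages carry the
usual X-Spam-Status header with an autolearn= field; the function name is
just illustrative and not part of SpamAssassin.

```python
import email
import re

# Matches the autolearn=<value> field inside an X-Spam-Status header.
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_status(raw_message):
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    msg = email.message_from_string(raw_message)
    # A folded header keeps its line breaks; the regex still finds the field.
    header = str(msg.get("X-Spam-Status", ""))
    match = AUTOLEARN_RE.search(header)
    return match.group(1) if match else None
```

Running this over the messages of this particular spam type and counting
how many come back as "spam" or "ham" would confirm whether any of them
were autolearned.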

Finally, do you know if SpamAssassin has rules that *should* catch this type
of spam (i.e. no legitimate email would include big blocks of random
paragraphs inside HTML comments)? I would have thought that in itself would
have been picked up by a rule identifying it as spam.
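
To make the idea concrete, here is a rough Python sketch of the kind of
check I have in mind. It is not a SpamAssassin rule, only an illustration
of the heuristic: it measures what fraction of the text in an HTML body
sits inside comments. The class and function names are made up for this
example, and any cut-off applied to the result would need tuning against
real mail.

```python
from html.parser import HTMLParser


class CommentRatio(HTMLParser):
    """Count characters inside HTML comments versus text outside them."""

    def __init__(self):
        super().__init__()
        self.comment_chars = 0
        self.other_chars = 0

    def handle_comment(self, data):
        # Called with the text between <!-- and -->.
        self.comment_chars += len(data.strip())

    def handle_data(self, data):
        # Called for text outside tags and comments.
        self.other_chars += len(data.strip())


def comment_ratio(html):
    """Return the share of text (0.0 to 1.0) that is inside HTML comments."""
    parser = CommentRatio()
    parser.feed(html)
    parser.close()
    total = parser.comment_chars + parser.other_chars
    return parser.comment_chars / total if total else 0.0
```

A message where most of the text is hidden in comments would give a ratio
close to 1.0, which is what this spam looks like.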

Thanks again, David.
