Re: [SAtalk] Removing headers etc.. to feed Bayes correctly

Justin Mason Mon, 16 Jun 2003 12:13:52 -0700

Tom Meunier said:

> A Bayes database doesn't reach maturity by having a certain number of
> SA-filtered spams >15 and SA-filtered hams <-2; it reaches maturity by
> having a certain number of confirmed hams and spams, period.  Therefore,
> if one organization obtains initial Bayes seeding strictly through
> auto-learning for three weeks and gets 2000 hams and 2000 spams in it,
> and another does theirs in 15 minutes by manually teaching it 2000 hams
> from this week, and 2000 spams from this week (that SpamAssassin has
> never touched), the LATTER would be the much, much more accurate
> Bayesian seeding procedure.


Yes, exactly correct.
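A minimal Python sketch of the two seeding approaches makes the difference concrete. It is not SpamAssassin code. The 15 and -2 cut-offs come from Tom's message, and the `Message` type and function names are hypothetical. Auto-learning only picks up mail the rules already scored decisively, and it trusts the rules' verdict. Manual training uses every human-confirmed message, including the borderline ones.

```python
from dataclasses import dataclass

# Cut-offs taken from the quoted message; these names are hypothetical,
# not SpamAssassin configuration options.
SPAM_THRESHOLD = 15.0
HAM_THRESHOLD = -2.0


@dataclass
class Message:
    text: str
    sa_score: float  # score assigned by the rule-based tests
    is_spam: bool    # label confirmed by a human


def auto_learn_set(messages):
    """Learn only mail the rules already scored decisively, using the
    rules' verdict (the human label is never consulted)."""
    picked = []
    for m in messages:
        if m.sa_score > SPAM_THRESHOLD:
            picked.append((m.text, True))
        elif m.sa_score < HAM_THRESHOLD:
            picked.append((m.text, False))
    return picked


def manual_learn_set(messages):
    """Learn every message with a confirmed label, whatever it scored."""
    return [(m.text, m.is_spam) for m in messages]
```

With a borderline spam scored 4.0 and a ham scored 1.0, auto-learning skips both, so the database never sees the hard cases that Bayes is most useful for.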

> This is discussed in-depth in Paul Graham's writing on the topic,
> specifically the part where he mentions that tokens like "per" and "FL"
> and "ff0000" are actually very reliable indicators of spammishness.

Mind you, this is not so correct. ;)

Ignore that part of PG's writings; it indicates only that he does not get
very much HTML email ;)   We tested this, and against our corpora it did
very badly.  So these are tokens that mean one thing for one person and
something else for others -- which, coincidentally, is exactly where Bayes
does well ;)
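The per-token probability from Graham's "A Plan for Spam" shows why a token like "ff0000" can't be rated universally. Below is a Python sketch of that formula, with ham counts doubled, tokens seen fewer than 5 times skipped, and results clamped to [0.01, 0.99]. It is Graham's original formula, not necessarily what SpamAssassin's own Bayes code computes, and the corpus counts in the usage note are invented for illustration.

```python
def graham_token_prob(token, good_counts, bad_counts, ngood, nbad):
    """Spam probability for one token, per Graham's 'A Plan for Spam'.

    good_counts/bad_counts map token -> occurrences in ham/spam;
    ngood/nbad are the (positive) numbers of ham/spam messages.
    Returns None for tokens seen too rarely to rate.
    """
    g = 2 * good_counts.get(token, 0)  # doubled to bias against false positives
    b = bad_counts.get(token, 0)
    if g + b < 5:
        return None
    bad_freq = min(1.0, b / nbad)
    good_freq = min(1.0, g / ngood)
    # g + b >= 5 guarantees the denominator is non-zero.
    return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))
```

With made-up counts of 1000 hams and 1000 spams each, "ff0000" appearing in 40 spams and no hams rates 0.99. Add 60 hams containing it (someone who wants HTML newsletters) and the same token drops to about 0.25.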

--j.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
