I'm new to all of this, and I'm not sure whether training with sa-learn
is having any effect: this SPAM still scores the same, and Bayes thinks
it's probably less than 1% SPAM (BAYES_00). I run a small vanity
domain for friends and family, so there isn't exactly a ton of training
going on, but I'm sure I'm doing it right, since Bayes mostly gives
95-99% for actual SPAM and 0-5% for HAM. I only train on mail I've
personally verified as HAM or SPAM, and in fact these e-mails are the
only actual SPAM that gets a 1% probability.
I've attached an example below. There is an HTML component as well, but
other than the markup it is identical. My thinking is there should be some
way to write a rule checking words against a dictionary, but it sounds
like an expensive filter process-wise. This poor user gets about 10 of
these mails a day.
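
To make the dictionary idea concrete, here is a rough sketch of the kind
of check I have in mind. It is plain Python, not a SpamAssassin rule or
plugin, and the function name, the tiny word set, and the idea of scoring
on the share of unknown tokens are all my own assumptions for
illustration. A real word list (for example a system dictionary file)
would have to be loaded in place of WORDS.

```python
import re

# Hypothetical stand-in for a real dictionary; a full word list
# would be loaded from a file in practice.
WORDS = {"ultimately", "about", "percent", "of", "the", "same",
         "amount", "last", "year"}

def non_word_ratio(text, dictionary):
    """Return the fraction of alphabetic tokens not found in dictionary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    # Set membership is a hash lookup, so the cost grows with the
    # number of tokens in the message, not the size of the dictionary.
    unknown = sum(1 for t in tokens if t.lower() not in dictionary)
    return unknown / len(tokens)

chopped = "Ulti mate ly Ab ou t"
normal = "Ultimately about the same amount"
print(non_word_ratio(chopped, WORDS), non_word_ratio(normal, WORDS))
```

A message whose ratio is far above what normal mail produces could then
be flagged. I haven't measured what a sensible threshold would be, and
short fragments that happen to be real words (like "Of" or "Last") will
pull the ratio down, so this is only a starting point.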
---------BODY----------------
http://groups.yahoo.com/group/ayazpahlmu/message/Chat/220686/
Ulti mate ly Ab ou t Per ce nt Of Ind ivi dual Re turn s Qual ifie d Fo
r e fu nd s Last Ye ar Tot alin g Abou t Bil lion Th e Re fu nds Aver ge
d Ab out The Sa me Am ou nt.