Scott at HobbyLink Japan on 5/22/03 said

>>This is one of the first things SpamSieve teaches itself to do once you
>>start using it - in fact, if I remember correctly, the manual says that
>>one of the simplest predictors of spam is the "word" FF0000 (the HTML
>>hexcode for the color red). 
>><http://www.c-command.com/spamsieve/>
>
>I have no knowledge as to SpamSieve's inner workings, but almost all of
>the spam it's missing these days is 100% HTML mail, hence my comment.
>The others involve Nigeria and large sums of money.

SpamSieve uses Bayesian filtering, which uses every word of an email to
build its corpus. Mr. Tsai has said that after a while you might have to
back up the corpus.plist file, select and remove all the words in the
corpus window, and then retrain with new good mail and spam.
He says:

"I did this in late January, and my accuracy
increased from 91.5% to 98.6%, even though the new corpus only had
about 1300 messages."

He also said:

"If you want to go over this in more detail, please e-mail me at
<[EMAIL PROTECTED]>."

Speaking for myself now: About.com had several very interesting articles
about Bayesian filtering.

It will be interesting to see whether spam writers find some way to
circumvent this type of filter.

Remember: even if you don't do the above, you still need to keep "adding
spam" and "adding good" for messages that end up in the wrong place.
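For anyone curious how a word-based Bayesian filter learns from "adding spam" and "adding good", here is a minimal Python sketch. It is a simplified naive Bayes classifier (it scores only the words present in a message, with add-one smoothing), not SpamSieve's actual algorithm, which isn't described here. The class and method names (BayesFilter, add_spam, add_good) and the sample messages are all hypothetical.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Hypothetical, minimal word-based naive Bayes spam filter (not SpamSieve's code)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam messages containing it
        self.good_words = Counter()  # word -> number of good messages containing it
        self.spam_count = 0          # spam messages trained so far
        self.good_count = 0          # good messages trained so far

    @staticmethod
    def tokens(text):
        # Every "word" is a token, including HTML colour codes such as FF0000.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add_spam(self, text):
        # Equivalent of telling the filter "this is spam".
        self.spam_words.update(self.tokens(text))
        self.spam_count += 1

    def add_good(self, text):
        # Equivalent of telling the filter "this is good mail".
        self.good_words.update(self.tokens(text))
        self.good_count += 1

    def spam_probability(self, text):
        # Naive Bayes over the words present, with add-one smoothing, in log space.
        total = self.spam_count + self.good_count
        log_spam = math.log((self.spam_count + 1) / (total + 2))
        log_good = math.log((self.good_count + 1) / (total + 2))
        for word in self.tokens(text):
            log_spam += math.log((self.spam_words[word] + 1) / (self.spam_count + 2))
            log_good += math.log((self.good_words[word] + 1) / (self.good_count + 2))
        # P(spam | words) = 1 / (1 + exp(log_good - log_spam)), computed without overflow.
        diff = log_good - log_spam
        if diff > 0:
            e = math.exp(-diff)
            return e / (1 + e)
        return 1 / (1 + math.exp(diff))

    def is_spam(self, text, threshold=0.5):
        return self.spam_probability(text) > threshold


f = BayesFilter()
f.add_spam("URGENT transfer of large sums of money from Nigeria")
f.add_spam('<font color="#FF0000">FREE offer</font>')
f.add_good("Minutes from Thursday's model club meeting")
f.add_good("Your HobbyLink Japan order has shipped")

print(f.is_spam("Confidential: Nigeria money transfer"))  # True

# A good message that lands in the spam folder...
dues = "Money transfer for the club dues"
print(f.is_spam(dues))  # True (a false positive)
# ...is corrected by "adding good", which shifts the word counts.
f.add_good(dues)
print(f.is_spam(dues))  # False
```

In this sketch, starting over as Mr. Tsai describes would simply mean creating a fresh BayesFilter and training it on a new set of good and spam messages.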

-- 
Barbara Needham

