Re: [spambayes-dev] some tokenising ideas for someone who wants to experiment

T. Alexander Popiel Thu, 16 Jun 2005 15:20:04 -0700

In message:  <[EMAIL PROTECTED]>
             Anthony Baxter <[EMAIL PROTECTED]> writes:
>
>multipart/alternative:
>
>   When confronted by a multipart/alternative, score each alternative 
>separately, and keep the highest score only. Discard the scoring from the 
>lower scoring part(s). I'm seeing a _lot_ of spam with pure wordsalad 
>text/plain, and spam text in the html only.
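
A minimal sketch of the "keep the highest score" idea quoted above, using only the standard library email package. The scorer is passed in as a plain callable; toy_score, SPAMMY, part_text and highest_alternative_score are hypothetical names made up for illustration and are not part of spambayes. HTML tags are not stripped here, which a real tokenizer would need to handle.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Callable


def part_text(part: Message) -> str:
    """Decode a non-multipart part to str, tolerating bad bytes."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return payload.decode("utf-8", errors="replace")


def highest_alternative_score(msg: Message,
                              score_text: Callable[[str], float]) -> float:
    """Score each text alternative on its own and keep only the highest."""
    if msg.get_content_type() == "multipart/alternative":
        parts = [p for p in msg.get_payload()
                 if p.get_content_maintype() == "text"]
    else:
        parts = [msg]  # not multipart/alternative: score it as one piece
    return max((score_text(part_text(p)) for p in parts), default=0.0)


# Hypothetical stand-in scorer: fraction of words found in a tiny "spammy" set.
SPAMMY = {"viagra", "cheap", "pills"}


def toy_score(text: str) -> float:
    words = text.lower().split()
    return sum(w in SPAMMY for w in words) / len(words) if words else 0.0


# Word-salad text/plain next to a spammy text/html alternative.
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("garden window yesterday purple", "plain"))
msg.attach(MIMEText("cheap pills cheap viagra", "html"))
print(highest_alternative_score(msg, toy_score))
```

The word-salad part no longer dilutes the result, because only the worst-scoring alternative (from the recipient's point of view) counts.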


Interesting.  To fight those particular spams, I was considering a
pre-spambayes filter that tokenizes each of the alternatives and, if
they differ significantly, just throws the message away as spam,
without scoring (or training on) it at all.  The test would probably
be implemented as: at least 90% of the words in the text/plain must
show up in the text/html, in the same order; anything less counts as
a significant difference.
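
The 90% in-order test could be sketched roughly as below. This is only one possible reading of it: "in the same order" is taken to mean a longest common subsequence over lowercased words, and the HTML is reduced to words with the standard library html.parser. The names words, html_words, lcs_length and alternatives_differ are hypothetical, and the word regex is an assumption, not the spambayes tokenizer.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        # Note: text inside <script>/<style> is collected too in this sketch.
        self.chunks.append(data)


def words(text: str) -> List[str]:
    """Split text into lowercased word tokens (assumed token definition)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def html_words(html: str) -> List[str]:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return words(" ".join(parser.chunks))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (order-preserving matches)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def alternatives_differ(plain: str, html: str, threshold: float = 0.9) -> bool:
    """True if fewer than `threshold` of the plain words appear, in order, in the HTML."""
    plain_words = words(plain)
    if not plain_words:
        return False  # assumption: an empty text/plain is not evidence by itself
    matched = lcs_length(plain_words, html_words(html))
    return matched / len(plain_words) < threshold
```

For example, alternatives_differ("Meeting moved to 3pm", "<p>Meeting moved to <b>3pm</b></p>") is False, while a word-salad text/plain paired with unrelated HTML comes out True. The dynamic-programming LCS is quadratic in the word counts, which is probably acceptable for typical message sizes but would want a cap on very large parts.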

- Alex
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev
