On Monday, June 19, 2006 5:11 AM -0500, yahoo.de wrote:

> The other question is, as I read in the documentation of
> tokenizer.py: does SpamBayes not care about HTML tags
> in HTML-type emails?
That's correct. It does not care about HTML tags, because they probably don't correlate well with either ham or spam. If you suspect that some HTML constructs correlate with spam, you can alter the classifier code to test this.

> So if I send an HTML-type email with a spam image as the
> background of the email and without any spam text, only a
> three-character subject, SpamBayes does not recognize
> some token for the image in the background to classify it
> at least as unsure!

Since SpamBayes can only interpret text, it can't do anything with an image. The text it can see includes the subject line, some address fields, the Message-ID and even the MUA used to generate the message. If the subject line has too few characters, it will not generate a token. However, the other header fields will generate at least a few tokens, even if there is no text below the subject line. Depending on the scores for the few tokens it does generate, it can still score a message as ham, spam or unsure.

-- 
Seth Goodman
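
To make the point about header tokens concrete, here is a minimal sketch in Python. It is not the SpamBayes tokenizer: the function `toy_tokens`, the minimum token length `MIN_LEN`, the choice of headers and the sample message are all assumptions made up for illustration. It only shows how a message with a very short subject and an image-only HTML body can still yield a few tokens from its other header fields.

```python
import re
from email import message_from_string

MIN_LEN = 3  # assumed minimum token length, for illustration only

# Hypothetical message: two-character subject, HTML body with only a background image.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: Hi\n"
    "Message-ID: <abc123@mailer.example.net>\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "Content-Type: text/html\n"
    "\n"
    '<html><body background="http://example.com/ad.gif"></body></html>\n'
)


def toy_tokens(raw):
    """Yield crude tokens from a few headers and the tag-stripped body."""
    msg = message_from_string(raw)
    for name in ("Subject", "From", "Message-ID", "X-Mailer"):
        value = msg.get(name, "")
        # Split header values on whitespace, angle brackets and '@'.
        for word in re.findall(r"[^\s<>@]+", value):
            if len(word) >= MIN_LEN:
                yield "%s:%s" % (name.lower(), word.lower())
    if not msg.is_multipart():
        # Drop HTML tags entirely, including any image URL inside them.
        text = re.sub(r"<[^>]*>", " ", msg.get_payload())
        for word in text.split():
            if len(word) >= MIN_LEN:
                yield word.lower()


if __name__ == "__main__":
    print(sorted(toy_tokens(SAMPLE)))
```

In this toy version the subject "Hi" and the image-only body produce nothing, while the From, Message-ID and X-Mailer headers each contribute tokens that a classifier could still score.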
