"Seth Goodman" <[EMAIL PROTECTED]> writes:

> David Abrahams wrote on Tuesday, February 06, 2007 11:05 AM -0600:
>
>> David Abrahams <[EMAIL PROTECTED]> writes:
>>
>> > How is it that for a message with
>> >
>> >   Subject: Huge online pharmacy
>> >
>> > Spambayes isn't using "pharmacy" as a classification token?  I can't
>> > find a setting that will make it do that, either.
>>
>> Am I just misinterpreting what I'm seeing, or does SB really ignore
>> the Subject header?
>
> The subject header produces tokens that start with the string
> "subject:".  When looking at the list of clues Spambayes finds, you
> first see the list of "significant tokens", which means up to 150 (?)
> tokens that score below 0.4 and above 0.6.  The complete list is shown
> as "all message tokens".  If a token appears in the "all message tokens"
> list but not in the "significant token" list, it's probably because the
> token scored between 0.4 and 0.6, which means the statistics do not
> indicate ham or spam.
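To make the rule described above concrete, here is a minimal Python sketch of the selection as Seth states it: keep only tokens scoring below 0.4 or above 0.6, strongest first, up to a cap. The function name, the parameter names, and the cap of 150 (which Seth himself marks with a question mark) are illustrative assumptions, not SpamBayes's actual code.

```python
def significant_tokens(token_probs, low=0.4, high=0.6, max_tokens=150):
    """Return (token, prob) pairs that lean clearly toward ham or spam.

    token_probs: dict mapping token -> spam probability in [0, 1].
    Tokens scoring between `low` and `high` are treated as uninformative.
    """
    # Keep only tokens outside the neutral band.
    strong = [(tok, p) for tok, p in token_probs.items() if p < low or p > high]
    # Strongest evidence first: sort by distance from the neutral value 0.5.
    strong.sort(key=lambda item: abs(item[1] - 0.5), reverse=True)
    return strong[:max_tokens]


if __name__ == "__main__":
    # Probabilities taken from the regex query results quoted in this thread.
    probs = {"subject:spam": 0.983271, "spam.": 0.506679, "spam?": 0.091837}
    for tok, p in significant_tokens(probs):
        print(tok, p)
```

With these inputs, "spam." at 0.506679 falls in the neutral band and is dropped, while "subject:spam" and "spam?" are kept.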
I understand from the above that subject words are considered, but it
still seems to me that something must be wrong.  Subject lines
containing [spam] are invariably spam.  I have 12 messages that have
[spam] in the subject in my spam training folder and zero in my ham
training folder.  Yet messages with [spam] in the subject line are
commonly classified as ham or unsure.  When I ask sb_imapfilter about
"[spam]" *or* "subject:[spam]" I get nothing.  In fact, if I do a
regex query for .*spam.*, I see:

  Word             # Spam   # Ham   Probability
  (spambayes            0       1   0.155172
  spam?                 0       2   0.091837
  spam,                 0       1   0.155172
  spam.                14      12   0.506679
  spam!                 1       0   0.844828
  *spam*                0       1   0.155172
  subject:spam         13       0   0.983271
  spamming?             1       0   0.844828
  url:spamguard        13       0   0.983271
  spamguard,           13       0   0.983271

which tells me that the tokenizer may be throwing out the brackets.

OK, I see that it's doing so on both ends (when training and when
classifying) so it's okay.  Well, I'm not sure why [spam] hasn't
gained more significance, but I guess I'll just keep training it.

Thanks, and sorry for the noise.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
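As a sanity check on the numbers in the query results above: the rows where one of the two counts is zero are consistent with Gary Robinson's adjusted word probability, f(w) = (s*x + n*p) / (s + n), taking s = 0.45 and x = 0.5. Those two values are assumed here to be the settings in effect (they are not stated in this thread), and the function below is a sketch of the formula, not SpamBayes's own code. The training totals nspam and nham are hypothetical placeholders; they cancel out whenever one count is zero, which is why those rows can be checked without knowing them.

```python
def adjusted_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Robinson-style adjusted spam probability for one token.

    spam_count, ham_count: trained messages of each kind containing the token.
    nspam, nham: total trained spam and ham messages (must be positive).
    s, x: strength and value of the prior for unseen words (assumed values).
    """
    if nspam <= 0 or nham <= 0:
        raise ValueError("nspam and nham must be positive")
    n = spam_count + ham_count
    if n == 0:
        return x  # never-seen token: fall back to the prior
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    p = spam_ratio / (spam_ratio + ham_ratio)  # raw estimate from counts
    return (s * x + n * p) / (s + n)           # shrink toward x when n is small


if __name__ == "__main__":
    # nspam and nham are made-up totals; they do not affect these rows.
    for spam_count, ham_count in [(13, 0), (1, 0), (0, 1), (0, 2)]:
        print(spam_count, ham_count,
              round(adjusted_prob(spam_count, ham_count, 100, 100), 6))
    # prints 0.983271, 0.844828, 0.155172, 0.091837 for the four rows
```

Under these assumptions, subject:spam at 13 spam and 0 ham comes out at 0.983271, matching the query output, so that token already counts as a strong spam clue. The "spam." row (14 spam, 12 ham) cannot be reproduced this way because it depends on the real training totals, which the message does not give.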
