>Scott at HobbyLink Japan on 5/22/03 said
>>I have no knowledge as to SpamSieve's inner workings, but almost all of
>>the spam its missing these days are 100% HTML mails, hence my comment.
>>The others involve Nigeria and large sums of money.
>
>SpamSieve uses Bayesian filtering which uses every word of the email to
>build its corpus. Mr. Tsai had said that after a while you might have to
>back up the corpus.plist file and select and remove all words in the
>corpus window; then retraining with new good mail and spam mail.
>He says:
>
>"I did this in late January, and my accuracy
>increased from 91.5% to 98.6%, even though the new corpus only had
>about 1300 messages."

I did this, and yes, it did increase my accuracy. I'm extremely happy with the SpamSieve/PowerMail combination overall.

But... most of the mails it's missing seem, from a layman's standpoint, to be complete no-brainers: all-HTML messages laced with porn words and links to external images. Perhaps only 1 in 20 to 30 of the HTML mails I get is not spam, so I'd like the option, either in PM itself or in SpamSieve, to adopt a "guilty until proven innocent" policy regarding HTML mail, especially mail with links to images. (Can this be done with a PM filter? I don't know how, since body filtering is not provided.) How hard can this be?

And why does Nigeria mail still get through, even though I have trained every darn one of them as spam? One would think that by now the word 'Nigeria' alone would be almost an automatic trigger, but I don't know exactly how these Bayesian algorithms work.

---
Scott T. Hards
President
HobbyLink Japan (www.hlj.com)
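
On the 'Nigeria' question, a rough sketch of how a generic word-based Bayesian filter combines evidence may help. This is not SpamSieve's actual code or algorithm, which is not described in this thread. It is the textbook naive Bayes combination, and the function names and every per-word probability below are made up for illustration. The point it shows: one strongly spammy word can be outvoted by several ordinary, innocent-looking words in the same message.

```python
import math

def word_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate P(spam | word) from training counts (hypothetical helper)."""
    spam_freq = spam_count / n_spam if n_spam else 0.0
    ham_freq = ham_count / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.4  # unseen word: arbitrary, mildly innocent prior (assumption)
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that no single word can ever be treated as absolute proof.
    return min(0.99, max(0.01, p))

def combined_spam_probability(word_probs):
    """Combine per-word probabilities, treating words as independent."""
    log_spam = sum(math.log(p) for p in word_probs)
    log_ham = sum(math.log(1.0 - p) for p in word_probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Illustrative values only: 'nigeria' is almost always spam in this made-up
# corpus, but the polite business wording around it looks like good mail.
nigeria_alone = [0.99]
nigeria_mail = [0.99, 0.20, 0.10, 0.15]  # nigeria, business, regards, family

score_alone = combined_spam_probability(nigeria_alone)
score_mail = combined_spam_probability(nigeria_mail)
print(f"'nigeria' alone: {score_alone:.2f}")
print(f"whole message:   {score_mail:.2f}")
```

With these made-up numbers the word on its own scores about 0.99, but the whole message comes out near 0.33, below a typical 0.5 cutoff. A filter built this way would also need an extra, hand-written rule (or HTML tags counted as tokens) to get the "guilty until proven innocent" treatment of HTML mail, since plain word counting has no such notion.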

