> SpamBayes has worked great for me, except for one thing: most of the spam I'm getting now in my inbox is picture ads disguised with innocuous text, or no text (usually drug ads). I'm concerned that SpamBayes only trains on the text, and if I keep "deleting as spam" these ads, I'll train SpamBayes into false positives.
No text is fine, because there are lots of tokens in the headers that will be used. Innocuous text might be a problem - it really depends how often those words appear in spam compared to ham. If they're just random words, then it's probably still fine.

The best thing to do, IMO, would be to keep training as usual and see if things improve. If you do end up with more good mail in your unsure folder (possible, though I suspect not), then we'll have to figure something out.

Countering this type of spam is difficult, because no-one has a way of converting a picture to something like "lots of white pills", which we could use as a token. OTOH, many mailers don't show images by default any more, which means that this type of spam isn't effective for the spammer, either.

I wonder whether generating tokens from the image would work (simple ones that aren't particularly time-consuming to compute). This sort of thing is used in some image algorithms (e.g. cascades of classifiers built on Haar-like features for face detection), so it's feasible that it would both work and be fast enough - it really depends whether suitable classifiers can be found that differentiate between ham images and spam ones.

I'd quite like to do some research into this, but (a) I don't have the time right at the moment, and (b) either I get just about no spam like this, or if I do, it's all classified correctly and I don't notice it.

=Tony.Meyer
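
To make the image-token idea above a little more concrete, here is a minimal sketch of "simple tokens that aren't particularly time consuming". It is only an illustration under stated assumptions, not something the post says SpamBayes does: the function names (`log2_bucket`, `image_tokens`) and the token strings are hypothetical, and it reads only the file header rather than decoding pixels, so it is far cruder than a Haar-style classifier.

```python
import struct


def log2_bucket(n):
    """Round n down to a power of two so similar values share one token."""
    return 0 if n <= 0 else 1 << (n.bit_length() - 1)


def image_tokens(data):
    """Yield cheap, hypothetical tokens from raw image bytes (header only)."""
    width = height = None
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        kind = "gif"
        # GIF stores width and height as little-endian 16-bit integers.
        width, height = struct.unpack("<HH", data[6:10])
    elif (data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR"
          and len(data) >= 24):
        kind = "png"
        # PNG stores width and height as big-endian 32-bit integers in IHDR.
        width, height = struct.unpack(">II", data[16:24])
    elif data[:2] == b"\xff\xd8":
        kind = "jpeg"  # dimensions are not parsed in this sketch
    else:
        kind = "unknown"

    yield "image-type:" + kind
    yield "image-bytes:%d" % log2_bucket(len(data))
    if width and height:
        yield "image-width:%d" % log2_bucket(width)
        yield "image-height:%d" % log2_bucket(height)
        if width > height:
            yield "image-aspect:wide"
        elif height > width:
            yield "image-aspect:tall"
        else:
            yield "image-aspect:square"
```

Tokens like these would simply be fed to the classifier alongside the header and body tokens. Whether they actually separate ham images from spam ones is exactly the open question raised above, and would need testing on real mail.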
