David Abrahams wrote on Friday, January 05, 2007 9:22 AM -0600:

> "Seth Goodman" <[EMAIL PROTECTED]> writes:
>
> > Image spam is gradually moving in the direction of a captcha:
> > images that people can identify but computers can't. How far they
> > can go before it becomes so annoying that no one will look at it is
> > anyone's guess. As long as people can design effective captcha's,
> > it will be possible to construct image spam that OCR will not
> > detect.
>
> Yes, I understand the principle. Of course, the effectiveness of
> captchas depends on the ineffectiveness of OCR. On the other hand,
> most OCR is built to deal with reasonably legible text, so we may need
> spam-specific OCR tools.
The human eye and brain are amazing image analyzers; OCR looks ineffective only by comparison. Although our visual sense can be fooled (witness optical illusions), its power lies in being robust to so many forms of noise and image degradation. You don't need training to find the text in a captcha: we are told it's there and we all just see it. OCR programs, by contrast, use a variety of mathematical methods plus heuristics, and they require care and feeding to function at all. This is why computers will remain behind humans in processing images for the foreseeable future.

Make OCR as "spam-specific" as you like, but it will require tweaking each time spammers switch to an unusual font, background noise, or text distortion. I don't want to seem morose about this, but I don't believe it's a battle we can ultimately win. OCR can still help Spambayes classify messages containing image spam, but it's not a silver bullet. This is really a problem to be solved at the MTA with stricter connection rules.

Nonetheless, I suspect that Spambayes could improve by creating more synthetic tokens that describe the image better, taking advantage of serendipitous differences between the tokens for image spam and those in each user's ham. I'm not sure what those attributes are, but looking for them probably beats trying to keep up with a quickly evolving captcha. Outlook doesn't help the situation, as it destroys much of the MIME armor that might provide useful spam clues.

-- 
Seth Goodman
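The synthetic-token idea in the reply can be illustrated with a minimal sketch. It assumes nothing about Spambayes' actual tokenizer: the function `image_tokens` and the token names (`image-type:`, `image-size:`, `image-cte:`, `image-disposition:`, `image-count:`) are hypothetical. It only uses Python's standard `email` package to turn attributes of each image part, including MIME details such as the transfer encoding and disposition, into tokens a Bayesian classifier could learn from.

```python
import math
from email.message import EmailMessage, Message


def image_tokens(msg: Message) -> list[str]:
    """Return hypothetical synthetic tokens describing the image parts of msg."""
    tokens = []
    images = 0
    for part in msg.walk():
        # Only image/* parts contribute tokens; text and multipart containers are skipped.
        if part.get_content_maintype() != "image":
            continue
        images += 1
        tokens.append("image-type:" + part.get_content_subtype())
        payload = part.get_payload(decode=True) or b""
        # Bucket the decoded size by powers of two so similar sizes share one token.
        size_bucket = int(math.log2(len(payload))) if payload else 0
        tokens.append(f"image-size:2**{size_bucket}")
        # MIME "armor" details that a mail client may rewrite or strip.
        cte = part.get("Content-Transfer-Encoding", "none").lower()
        tokens.append("image-cte:" + cte)
        tokens.append("image-disposition:" + (part.get_content_disposition() or "none"))
    # Cap the count so "many images" collapses into a single token.
    tokens.append(f"image-count:{min(images, 5)}")
    return tokens


# Usage: a small message with one (fake) GIF attachment.
msg = EmailMessage()
msg["Subject"] = "demo"
msg.set_content("See the picture.")
msg.add_attachment(b"GIF89a" + bytes(100), maintype="image", subtype="gif", filename="x.gif")
example = image_tokens(msg)
```

Coarse buckets are a deliberate choice in this sketch: exact sizes would rarely repeat between messages, while bucketed values give the classifier tokens that can recur across spam and ham.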
