> I just discovered the existence of Tesseract OCR, whose
> homepage[1] says:
>
>   A commercial quality OCR engine originally developed at HP between
>   1985 and 1995. In 1995, this engine was among the top 3 evaluated by
>   UNLV. It was open-sourced by HP and UNLV in 2005.
>
> I thought some of you (Skip, Mark) might be interested if you hadn't
> heard about this software yet.

You could help us out here too, by running some of your image spam against
the various engines and manually inspecting the accuracy of the text versus
what you actually see in the image.  My quick experiments show that
tesseract's results are very close to those I get from gocr, and
significantly better than ocrad's.
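
If you'd rather put a rough number on that comparison than eyeball it, here
is a minimal sketch in Python.  It assumes you have already captured each
engine's output as text and typed in what you actually see in the image.
The function names and the use of difflib's similarity ratio are choices
made for illustration only; none of the engines provide this, and it is not
how the quick experiments above were scored.

```python
import difflib


def normalize(text):
    # Collapse whitespace and case so layout differences don't count as errors.
    return " ".join(text.lower().split())


def accuracy(ocr_text, seen_text):
    # Similarity ratio in [0, 1]; 1.0 means the normalized strings are identical.
    matcher = difflib.SequenceMatcher(
        None, normalize(ocr_text), normalize(seen_text), autojunk=False
    )
    return matcher.ratio()


def rank_engines(outputs, seen_text):
    # outputs maps an engine name to the text it produced for one image.
    scores = {name: accuracy(text, seen_text) for name, text in outputs.items()}
    # Best-scoring engine first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample strings, not real engine output.
    seen = "Buy cheap pills now"
    outputs = {
        "engine_a": "Buy cheap pills now",
        "engine_b": "8uy chcap pi11s n0w",
    }
    for name, score in rank_engines(outputs, seen):
        print(f"{name}: {score:.2f}")
```

A character-level ratio like this is crude, but it should be enough to rank
engines on the same set of images; manual inspection is still the real check.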

Mark

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
