Hi, I am working on the limitations of OSRA, an Optical Structure Recognition Application that detects chemical structures (2D diagrams) presented as images and uses tesseract-ocr as an optional library. I came across an issue where the tool performs badly at detecting structures when a few annotations are present in the vicinity of the 2D diagram of the molecule (have a look at the attached image). I manually removed those annotations and tested the tool again. It worked perfectly this time! I have tested it on a large number of images and observed similar behaviour. These annotations are in a different font from the other characters in the image, and they are mostly alphanumeric.
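To make the removal step concrete, here is a rough sketch of the kind of masking I have in mind, assuming the bounding boxes of the annotations are already known. The function name `mask_boxes`, the box format and the list-of-rows image representation are made up for illustration; none of this is part of OSRA or tesseract-ocr, and finding the boxes for a specific font is exactly the part I do not know how to do.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def mask_boxes(image: List[List[int]], boxes: List[Box], fill: int = 255) -> List[List[int]]:
    """Return a copy of a grayscale image with every box painted over.

    The image is a list of rows of pixel values (0 = black, 255 = white).
    Boxes that extend past the image edges are clipped to the image.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # copy so the input is left untouched
    for left, top, right, bottom in boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                out[y][x] = fill
    return out
```

With a white fill, the masked regions blend into the page background, which is what I did by hand before re-running the tool.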
So, what I would like to know is whether it is possible to identify characters of a specific font in an image and remove them using tesseract-ocr. Any ideas or suggestions would be greatly appreciated.

Thanks very much,
Vishal

