Sorry for forgetting the attachment. Please find the image attached. The numbers around the 2D diagram are the annotations I would like to address.
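To make the request concrete: once the annotation characters have been located (for example from the word-level bounding boxes Tesseract can report), the removal step I have in mind is simply painting those regions white before passing the image to OSRA. The sketch below assumes a grayscale image stored as rows of pixel values and a hypothetical `(left, top, width, height)` box format. It does not solve the harder part, which is deciding which boxes belong to the annotation font.

```python
from typing import List, Tuple

# A grayscale image as rows of pixel values (0 = black, 255 = white).
Image = List[List[int]]
# Hypothetical box format (an assumption, not a Tesseract type):
# (left, top, width, height) in pixels.
Box = Tuple[int, int, int, int]


def blank_regions(image: Image, boxes: List[Box], fill: int = 255) -> Image:
    """Return a copy of `image` with every box painted over with `fill`.

    Boxes are clipped to the image bounds, so boxes that extend past
    the edges are handled safely. The input image is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]  # copy each row so the input is untouched
    for left, top, w, h in boxes:
        # Clip the box to the image.
        x0, y0 = max(left, 0), max(top, 0)
        x1, y1 = min(left + w, width), min(top + h, height)
        for y in range(y0, y1):
            for x in range(x0, x1):
                result[y][x] = fill
    return result


if __name__ == "__main__":
    img = [[0] * 4 for _ in range(3)]  # a small all-black test image
    cleaned = blank_regions(img, [(1, 0, 2, 2)])
    for row in cleaned:
        print(row)
```

Blanking to white rather than cropping keeps the diagram's coordinates unchanged, so the rest of the structure is left exactly where OSRA expects it.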
On Tuesday, 12 November 2013 16:43:23 UTC, Vishalkpp wrote:
>
> Hi,
>
> I am working on the limitations of OSRA, an Optical Structure Recognition Application that detects chemical structures (2D diagrams) presented as images and can optionally use tesseract-ocr as a library. I came across an issue where the tool performs badly at detecting structures when a few annotations are present near the 2D diagram of the molecule (have a look at the attached image). I manually removed those annotations and tested the tool again, and this time it worked perfectly. I have tested this on a large number of images and observed similar behaviour. These annotations use a font that differs from the other characters in the image, and they are mostly alphanumeric.
>
> <https://lh5.googleusercontent.com/-3de1iEHTDac/UoJi83pHrMI/AAAAAAAAAfE/-h8U_OPD2Zc/s1600/031.png>
>
> So, what I would like to know is whether it is possible to identify characters of a specific font in an image and remove them using tesseract-ocr.
>
> Any ideas or suggestions will be greatly appreciated.
>
> Thanks very much,
> Vishal
>
-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

