Sorry for forgetting the attachment. Please find the image attached. The 
numbers around the 2D diagram are the annotations I would like to remove.

On Tuesday, 12 November 2013 16:43:23 UTC, Vishalkpp wrote:
>
> Hi,
>
> I am investigating the limitations of OSRA, an Optical Structure Recognition 
> Application that recognizes chemical structures (2D diagrams) presented as 
> images and can optionally use tesseract-ocr as a library. I came across 
> an issue where the tool performs badly at detecting structures when a few 
> annotations are present near the 2D diagram of the molecule 
> (have a look at the attached image). I manually removed those annotations 
> and tested the tool again, and this time it worked perfectly. I have tested 
> this on a large number of images and observed similar behaviour. These 
> annotations are in a font distinct from the other characters in the image, 
> and they are mostly alphanumeric characters only.
>
>
> <https://lh5.googleusercontent.com/-3de1iEHTDac/UoJi83pHrMI/AAAAAAAAAfE/-h8U_OPD2Zc/s1600/031.png>
>
> So, all I want to know is whether it is possible to identify characters 
> in a specific font in an image and remove them using tesseract-ocr.
>
> Any ideas or suggestions will be greatly appreciated.
>
> Thanks very much,
> Vishal
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
