Hi,

I am working on the limitations of OSRA (Optical Structure Recognition
Application), a tool that recognizes chemical structures (2D diagrams)
in images and uses tesseract-ocr as an optional library. I came across
an issue where recognition degrades badly when a few annotations are
present in the vicinity of the molecule's 2D diagram (have a look at the
attached image). When I manually removed those annotations and tested the
tool again, it worked perfectly. I have tested it on a large number of
images and observed similar behaviour. These annotations are in a
different font from the other characters in the image, and they are
mostly alphanumeric.

So my question is: is it possible to identify characters of a specific
font in an image and remove them using tesseract-ocr?
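
To make the "remove" part concrete, here is a minimal sketch of what I have
in mind, under some assumptions. It assumes character boxes are already
available in the Tesseract box-file layout ("<char> <left> <bottom> <right>
<top> <page>", origin at the bottom-left of the image) and that the image is
a grayscale grid held as a list of rows. The names parse_box_line,
erase_boxes and the keep predicate are hypothetical helpers of mine, not part
of tesseract-ocr or OSRA. Deciding which boxes belong to the annotation font
is not covered here; that is the open part of my question.

```python
WHITE = 255  # assumed background value for an 8-bit grayscale image


def parse_box_line(line):
    """Parse one box-file line into (char, left, bottom, right, top)."""
    char, left, bottom, right, top, _page = line.rsplit(" ", 5)
    return char, int(left), int(bottom), int(right), int(top)


def erase_boxes(image, boxes, keep=lambda ch: False):
    """Return a copy of image with every box painted WHITE.

    image: list of rows (row 0 is the top of the image).
    boxes: iterable of (char, left, bottom, right, top), bottom-left origin.
    keep:  hypothetical predicate; boxes whose char it accepts are left alone.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    for ch, left, bottom, right, top in boxes:
        if keep(ch):
            continue
        # Box coordinates count up from the bottom edge; rows count down
        # from the top, so flip the vertical axis and clamp to the image.
        row_start = max(0, height - top)
        row_end = min(height, height - bottom)
        for y in range(row_start, row_end):
            for x in range(max(0, left), min(width, right)):
                result[y][x] = WHITE
    return result
```

With something like this, the annotation characters could be blanked out
before the image is passed to OSRA, provided their boxes can be selected
by font in the first place.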

Any ideas or suggestions will be greatly appreciated.

Thanks very much,
Vishal

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
