Hi Alex,

You might consider a template matching toolkit like OpenCV [1]. I haven’t used 
it with words, but I suspect it would work well in this kind of situation. 
OpenCV can also be used to remove basic shapes, such as circles, but 
having a list of the words you want is a huge advantage.
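
In case it helps, here is a rough sketch of what template matching does: slide 
a small template over the image and score every position. It is plain Python on 
toy 0/1 pixel grids, not OpenCV code; the function name match_template and the 
toy data are made up for illustration. On a real image you would more likely 
call OpenCV's cv2.matchTemplate with one template per word, rendered in a font 
close to the one in the figure (that font match is an assumption I have not 
tested).

```python
# Pure-Python illustration of sliding-window template matching.
# Images are lists of rows of 0/1 pixels (hypothetical toy data).

def match_template(image, template):
    """Return (score, (row, col)) for the best match.

    The score is the sum of squared differences between the template and
    the image window at that position, so 0 means a perfect match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            # Keep the first position with the lowest score seen so far.
            if best is None or ssd < best[0]:
                best = (ssd, (r, c))
    return best


# Toy 3x3 "glyph" standing in for one rendered word from the known list.
glyph = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

# Blank 6x8 "pathway image" with the glyph pasted at row 2, column 4.
image = [[0] * 8 for _ in range(6)]
for dr, row in enumerate(glyph):
    for dc, pixel in enumerate(row):
        image[2 + dr][4 + dc] = pixel

score, location = match_template(image, glyph)
print(score, location)  # prints: 0 (2, 4)
```

With a known word list you would repeat this once per word and keep the hits 
whose score passes a threshold you pick by eye.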

art
---
1. http://docs.opencv.org/

From: [email protected] [mailto:[email protected]] On 
Behalf Of Alexander Pico
Sent: Monday, April 27, 2015 2:34 PM
To: [email protected]
Subject: [tesseract-ocr] Extracting molecular labels from biological pathway 
images

I am trying to identify the molecules from pathway images. This should be 
relatively simple from clear, high-res images like the one attached, but my 
attempts with Tesseract so far are pretty dismal...

It found 9 of 25 molecules. I even have the luxury of knowing in advance all 
the words I'd like to extract and tried supplying these as eng.user-words, but 
there was no improvement.

I suspect I need to find the magic combination of parameter settings or perhaps 
image pre-processing.  Any suggestions?

Thanks!
 - Alex
--
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
[email protected]<mailto:[email protected]>.
To post to this group, send email to 
[email protected]<mailto:[email protected]>.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ff5a2873-8392-4771-b314-3f2f146b0027%40googlegroups.com<https://groups.google.com/d/msgid/tesseract-ocr/ff5a2873-8392-4771-b314-3f2f146b0027%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.

