Is there a document explaining how to tweak Tesseract to get the most out of non-document-type images?
I have a collection of webcam images containing small amounts of text on signage. I find the results extremely variable, without any clear or easily understood cause.

For example, http://peepo.com/pics/ocr/road_signs.jpg outputs:

    West End Barbican Exhibition -9 Halls

and a small extract, http://peepo.com/pics/ocr/when_red.png (the sign reads "when red light shows wait here"), outputs:

    ' LIGHT SHOWS wmr HERE

How can I improve the output in the general case? And is there a way to constrain the output to real words, i.e. reject "wmr" and hence force "wait"?
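To make the second question concrete, below is roughly the kind of thing I am imagining, sketched with the pytesseract wrapper. The file names and word list are made up for illustration, and I have not confirmed that --psm 7 and --user-words behave exactly as described, so treat this as a sketch of the idea rather than a known-good recipe.

    # Sketch only: try to bias Tesseract towards a known vocabulary via a
    # user-supplied word list, and tell it the image is a single text line.
    from PIL import Image
    import pytesseract

    # Hypothetical list of the words expected on this particular sign.
    with open("sign-words.txt", "w") as f:
        f.write("\n".join(["when", "red", "light", "shows", "wait", "here"]))

    img = Image.open("when_red.png")

    # --psm 7: treat the image as a single line of text.
    # --user-words: supply the expected vocabulary. As far as I understand,
    # this only biases the dictionary; it does not hard-reject strings like
    # "wmr" that fall outside the list, which is what I am really after.
    text = pytesseract.image_to_string(
        img, config="--psm 7 --user-words sign-words.txt"
    )
    print(text)

If there is a proper way to do this kind of word-level constraint (or a document describing it), pointers would be much appreciated.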

