Re: Strategies to binarize and recognize characters with more than 95% accuracy in less time

Nick White Wed, 22 Jan 2014 09:11:36 -0800

Hi Muhammad, sorry for not replying sooner.

I wonder whether Tesseract is trying to apply binarisation to the
image which you've already binarised, and is making things worse as
a result. You can see what Tesseract's binarised version looks like
with the configuration variable 'tessedit_write_images' - see the
PoorQuality[0] page on the wiki for more details on using it.
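
In case it helps, here is a rough Python sketch of how that could be
scripted. The file names are placeholders, and I'm assuming a
Tesseract build whose command line accepts '-c var=value'. On the
versions I know of, the dumped image ends up as 'tessinput.tif' in
the working directory, but check that on your install.

```python
def binarised_dump_command(image_path, output_base):
    """Build a tesseract command line that also dumps the image
    Tesseract works from internally.

    Assumes the installed tesseract accepts '-c var=value'; if it
    does not, the same variable can go in a config file instead.
    """
    return [
        "tesseract", image_path, output_base,
        "-c", "tessedit_write_images=1",
    ]


# Usage (placeholder file names, needs tesseract on the PATH):
#   import subprocess
#   subprocess.run(binarised_dump_command("scan.png", "out"), check=True)
```

Comparing the dumped image with your own binarised input should show
fairly quickly whether a second binarisation pass is hurting.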


If Tesseract's re-binarisation is indeed reducing the quality, there
may be a way to disable it (I seem to recall someone mentioning
one...). Either way, writing out the image with
'tessedit_write_images' lets you check what Tesseract is actually
using.

If the final binarised version looks fine, check that the lines are
being detected properly (line detection is less reliable if the
image is skewed). The easiest way to do that is to look at the line
bounding boxes in the hOCR output.
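
A minimal sketch of that check in Python, assuming the hOCR file was
produced with something like 'tesseract scan.png out hocr' and that
lines are marked with class 'ocr_line' and a 'bbox x0 y0 x1 y1'
title, which is what Tesseract's hOCR output normally looks like. The
regular expression is a shortcut for illustration, not a proper HTML
parser.

```python
import re

# Matches an hOCR line element and captures its bounding box.
# Assumes 'class' comes before 'title' inside the tag.
LINE_BBOX = re.compile(
    r"""class=["']ocr_line["'][^>]*?title=["']bbox (\d+) (\d+) (\d+) (\d+)"""
)


def line_boxes(hocr_text):
    """Return (x0, y0, x1, y1) for every ocr_line element, in order."""
    return [
        tuple(int(n) for n in match.groups())
        for match in LINE_BBOX.finditer(hocr_text)
    ]


# Usage (placeholder file name; older versions write out.html):
#   with open("out.hocr", encoding="utf-8") as f:
#       for box in line_boxes(f.read()):
#           print(box)
```

Boxes that are much taller than a single text line, or that overlap
their neighbours, usually mean that lines have been merged or split
incorrectly.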

If lines and characters look like they're being correctly determined,
but the characters are just being recognised incorrectly, try
disabling the dictionaries.
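
One way to try that, sketched in Python: write a small config file
that switches off 'load_system_dawg' and 'load_freq_dawg', and pass
it as the last argument. The file names are placeholders, and I'm
assuming these two variables cover the dictionaries your language
data loads; other word lists may need their own variables.

```python
# Config file contents: one "name value" pair per line.
NO_DICT_CONFIG = "load_system_dawg F\nload_freq_dawg F\n"


def write_no_dict_config(path):
    """Write a config file that disables the two main dictionaries."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(NO_DICT_CONFIG)
    return path


def no_dict_command(image_path, output_base, config_path):
    """Build a tesseract command line that loads the given config file."""
    return ["tesseract", image_path, output_base, config_path]


# Usage (placeholder file names, needs tesseract on the PATH):
#   import subprocess
#   cfg = write_no_dict_config("nodict")
#   subprocess.run(no_dict_command("scan.png", "out", cfg), check=True)
```

A config file is used here rather than '-c' because, as far as I
know, these variables are read at initialisation and some versions
ignore them when they are set on the command line.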

I doubt that retraining for the monospace font would be worthwhile,
as it looks like you're working with pretty ordinary fonts, which
Tesseract ought to do a decent job on.

Let us know how you get on, and do ask any more questions you have.

Apologies again for being so slow to respond.

Nick

0. https://code.google.com/p/tesseract-ocr/wiki/PoorQuality 

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
