[tesseract-ocr] improving boxes

Mike Sat, 10 May 2014 11:02:54 -0700

Hello,

I'm working on a mobile app that uses the Tesseract library for OCR. I trained 
Tesseract on my own fonts, but the results are still very unstable. When I 
debug the results, it seems the library recognizes letters correctly when the 
boxes are found correctly. However, in many cases the boxes are incorrect.


For preprocessing I'm using adaptive thresholding, which works pretty 
well. 
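
For readers unfamiliar with the term, here is a minimal sketch of mean-based 
adaptive thresholding in plain Python. It is only an illustration of the 
general technique, not the preprocessing code from the app; the function name 
`adaptive_threshold` and the `window` and `offset` defaults are assumptions 
made for this example.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes 255 (background) if it is brighter than the mean of its
    local window minus `offset`; otherwise it becomes 0 (ink).
    `window` and `offset` are illustrative values, not tuned settings.
    """
    h, w = len(gray), len(gray[0])

    # Integral image with a zero border:
    # integ[y][x] holds the sum of all pixels above and to the left of (y, x).
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum

    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Sum of the local window, clipped at the image borders.
            total = (integ[y1][x1] - integ[y0][x1]
                     - integ[y1][x0] + integ[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 255 if gray[y][x] > mean - offset else 0
    return out
```

Because the threshold is computed per neighbourhood rather than once for the 
whole image, uneven lighting affects the result less than it would with a 
single global threshold.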

The common problems with the boxes are:
1) one character detected as two, or vice versa
2) very long but narrow boxes spanning a few lines
3) boxes not detected at all

How can I improve box detection? Can I constrain the box sizes or aspect ratio?
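
To make the question concrete, the kind of constraint meant here could look 
like the rough Python sketch below. It is a post-filter applied to boxes 
after detection, not a Tesseract setting: `Box`, `plausible`, `filter_boxes` 
and all thresholds are hypothetical names and values invented for this 
example, and the coordinates are assumed to have a top-left origin.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical character box in pixel coordinates (top-left origin)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def plausible(box: Box, min_h: float, max_h: float,
              min_ratio: float = 0.2, max_ratio: float = 1.5) -> bool:
    """Return True if the box has a believable height and width/height ratio.

    The ratio limits are illustrative guesses for a single Latin character.
    """
    if box.width <= 0 or box.height <= 0:
        return False
    ratio = box.width / box.height
    return min_h <= box.height <= max_h and min_ratio <= ratio <= max_ratio


def filter_boxes(boxes: List[Box], expected_height: float,
                 tolerance: float = 0.5) -> List[Box]:
    """Keep only boxes whose height is within `tolerance` of the expected
    character height and whose aspect ratio looks like one character."""
    min_h = expected_height * (1 - tolerance)
    max_h = expected_height * (1 + tolerance)
    return [b for b in boxes if plausible(b, min_h, max_h)]
```

With an expected character height of 20 pixels, a 4x120 box spanning several 
lines and a 60x20 box covering merged characters would both be rejected, 
while a 10x20 box would be kept. A filter like this can only discard bad 
boxes; it does not recover characters that were never boxed.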

Any suggestions are appreciated.


Mike

