I printed out the "Welcome" page on my HP LaserJet printer and scanned it 
in as a .png. The scan quality is quite good, so I had been anticipating 
maybe 85%+ accuracy from tesseract-ocr. I did not bother to tally 
carefully, but from eyeballing it the accuracy seems to be about 50%. I 
used all default settings.

Some of the consistent errors:

W -> H
in -> m
li -> h
b -> t)
ll -> H
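
For anyone who wants to put a number on this rather than eyeball it, here 
is a minimal sketch using only Python's standard difflib module. It assumes 
the original page text and the OCR output are both available as plain 
strings. The function names are my own, not part of tesseract, and the 
similarity ratio is only a rough stand-in for a proper character accuracy 
figure.

```python
from collections import Counter
from difflib import SequenceMatcher


def ocr_accuracy(truth: str, ocr: str) -> float:
    """Return difflib's similarity ratio (0.0 to 1.0) between the
    reference text and the OCR output. This is an approximation of
    character accuracy, not a standard OCR metric."""
    return SequenceMatcher(None, truth, ocr, autojunk=False).ratio()


def substitutions(truth: str, ocr: str) -> Counter:
    """Count (expected, recognized) pairs for every span that difflib
    reports as replaced, e.g. ("W", "H")."""
    counts = Counter()
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            counts[(truth[i1:i2], ocr[j1:j2])] += 1
    return counts
```

Calling substitutions(truth, ocr).most_common(10) on the two strings should 
surface consistent confusions like the ones listed above.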

So is this just "the way things are" in OCR land? Or am I missing some 
fundamental setting that is needed to get reasonably useful results?


