[tesseract-ocr] Newbie: wondering why a fairly crisp document has such low accuracy

Stephen Boesch Sat, 12 Aug 2017 11:54:37 -0700

I printed the "Welcome" page on my HP LaserJet printer and scanned it back 
in as a .png file. The scan quality is quite good, so I had been anticipating 
maybe 85%+ accuracy from tesseract-OCR. I did not bother to tally it 
carefully, but by eyeballing it the accuracy seems to be about 50%. I used 
all default settings.


Some of the consistent errors:

W -> H
in -> m
li -> h
b -> t)
ll -> H
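
The eyeballed estimate and the confusion pairs above could be tallied with a 
short script instead. This is only a minimal sketch, assuming the true text of 
the page is available as a string for comparison; char_accuracy and 
substitutions are hypothetical helper names built on Python's standard difflib 
module, not part of tesseract.

```python
from difflib import SequenceMatcher


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of reference characters that difflib aligns with the OCR text."""
    if not reference:
        # Nothing to recognize: perfect only if the OCR output is empty too.
        return 1.0 if not ocr_output else 0.0
    # autojunk=False keeps difflib from ignoring frequent characters in long texts.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


def substitutions(reference: str, ocr_output: str) -> list[tuple[str, str]]:
    """List (expected, recognized) pairs where difflib reports a replacement."""
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    return [
        (reference[i1:i2], ocr_output[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


if __name__ == "__main__":
    expected = "Welcome"   # ground truth typed in by hand (assumed available)
    recognized = "Helcome"  # illustrative OCR output, not real tesseract output
    print(f"accuracy: {char_accuracy(expected, recognized):.0%}")
    print("substitutions:", substitutions(expected, recognized))
```

The alignment is a heuristic, so the figure is an approximation rather than a 
formal character error rate, but it is enough to tell 50% from 85%.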

So is this just "the way things are" in OCR land? Or am I missing some 
fundamental settings that are needed to get reasonably useful results?

thanks

stephenb

