Re: [tesseract-ocr] Newbie: wondering why a fairly crisp document has such low accuracy

2017-08-12 Thread ShreeDevi Kumar
With English you should probably get close to 99% accuracy.

Is your png at 300 dpi?

Which version of tesseract did you use?
Which traineddata?
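
In case it helps with the 300 dpi question: below is a minimal sketch, not an official tesseract tool, that reads the resolution a scanner recorded in a PNG's pHYs chunk. It uses only the Python standard library. The function name `png_dpi` and the file name `scan.png` are hypothetical, and it assumes the scanner wrote a pHYs chunk at all; many tools omit it, in which case the function returns None.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if unavailable."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs" and len(body) == 9:
            x_ppm, y_ppm, unit = struct.unpack(">IIB", body)
            if unit != 1:
                # Unit 0 means only the aspect ratio is known, not the DPI.
                return None
            # Unit 1 is pixels per metre; one inch is 0.0254 m.
            return (round(x_ppm * 0.0254), round(y_ppm * 0.0254))
        if ctype in (b"IDAT", b"IEND"):
            # pHYs, when present, must appear before the first IDAT chunk.
            return None
        pos += 12 + length
    return None
```

Usage would look something like `png_dpi(open("scan.png", "rb").read())`. For the other two questions, `tesseract --version` prints the installed version and `tesseract --list-langs` lists the traineddata files it can find.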

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Aug 12, 2017 at 11:46 PM, Stephen Boesch wrote:

> I printed out the "Welcome" page on my HP LaserJet printer and scanned it
> in as a .png. The quality is quite good, so I had been anticipating
> maybe 85%+ accuracy from Tesseract OCR. I did not even bother to tally
> carefully, but by eyeballing it the accuracy seems to be about 50%. I had
> used all default settings.
>
> Some of the consistent errors:
>
> W -> H
> in -> m
> li -> h
> b -> t)
> ll -> H
>
> So is this just "the way things are" in OCR land? Or am I missing some
> fundamental settings needed to get reasonably useful results?
>
> thanks
>
> stephenb

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVTY0XZ%2BFAD6xp%2BKOrE946J6EEJS0A9ihRPb%2BwVW%2BoGXQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Newbie: wondering why a fairly crisp document has such low accuracy

2017-08-12 Thread Stephen Boesch
I printed out the "Welcome" page on my HP LaserJet printer and scanned it
in as a .png. The quality is quite good, so I had been anticipating
maybe 85%+ accuracy from Tesseract OCR. I did not even bother to tally
carefully, but by eyeballing it the accuracy seems to be about 50%. I had
used all default settings.

Some of the consistent errors:

W -> H
in -> m
li -> h
b -> t)
ll -> H
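
The tally and the substitution pairs above could be computed rather than eyeballed. Here is a rough sketch, assuming the page's true text is available as a string alongside the OCR output. The function name `ocr_report` is hypothetical, and "accuracy" here is just one simple definition (matched characters divided by the length of the true text), not a standard OCR metric.

```python
from collections import Counter
from difflib import SequenceMatcher


def ocr_report(truth: str, ocr: str):
    """Return (character accuracy, Counter of (expected, got) substitutions)."""
    # autojunk=False stops difflib from ignoring very frequent characters.
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = 0
    confusions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        elif tag == "replace":
            # Record what was expected and what the OCR produced instead.
            confusions[(truth[i1:i2], ocr[j1:j2])] += 1
    accuracy = matched / len(truth) if truth else 1.0
    return accuracy, confusions
```

Calling `ocr_report(expected_text, tesseract_output)` and then `confusions.most_common(10)` would list the most frequent mix-ups, which makes it easier to tell a systematic problem (resolution, wrong traineddata) from random noise.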

So is this just "the way things are" in OCR land? Or am I missing some
fundamental settings needed to get reasonably useful results?

thanks

stephenb
