https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality

=> You must remove all noise (i.e. everything except the text) from the input
image.
Image preprocessing is your task; no Tesseract parameter will do it for
you.
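Here is a rough illustration of what that preprocessing can look like. It is a minimal Python sketch, not a Tesseract feature and not the exact method from the wiki page. It binarizes a grayscale image and then erases small isolated specks, so that only larger blobs (hopefully the digits) are passed to OCR. To stay standard-library only, it works on a plain list of rows of 0-255 gray values. Loading the real cropped.tif into that form (for example with an imaging library such as Pillow) is left out. The helper names `binarize` and `remove_small_blobs`, the threshold of 128 and the `min_size` value are hypothetical and would need tuning for the actual image.

```python
from collections import deque
from typing import List

Image = List[List[int]]


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each 0-255 gray pixel to 1 (ink) if darker than threshold, else 0.

    The fixed threshold of 128 is an assumed starting value, not a tuned one.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_small_blobs(binary: Image, min_size: int) -> Image:
    """Erase 4-connected ink components with fewer than min_size pixels."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the size limit are treated as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


if __name__ == "__main__":
    gray = [
        [255, 0, 255, 255, 255],
        [255, 255, 255, 0, 0],
        [255, 255, 255, 0, 0],
    ]
    # The lone speck at (0, 1) is erased; the 2x2 blob survives.
    for row in remove_small_blobs(binarize(gray), min_size=3):
        print(row)
```

The size filter is a common despeckling step. It only helps if `min_size` stays below the pixel count of the smallest real character stroke. Otherwise genuine digits are erased along with the noise.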


Zdenko

On Fri, May 26, 2017 at 1:34 AM, Mat <[email protected]> wrote:

> Hi,
>
> I'm working with some image products and trying to extract numbers from an
> image. I've been trying to segment and extract digits from specific areas,
> but I haven't had great results.
>
> I converted the attached image to a .tif (for some reason my environment
> was segfaulting with the .gif), extracted a specific area (also attached),
> resized it, and processed it with tesseract.
>
> Below are 3 of the many configuration combinations I tried, with the
> corresponding output:
>
> # Test 1: No options
> $ tesseract cropped.tif stdout
> Page 1
> Empty page!!
> Empty page!!
>
>
> # Test 2: Setting psm gave better results, but still lots of junk
> $ tesseract cropped.tif stdout -psm 11
> Page 1
>
> 14-15
>
> ..................
>
> 10-11
>
>
> 113-14
>
> _ I.
>
> i
>
>
> # Test 3: Setting psm and whitelisting
>
> # ./config/digits file
> tessedit_char_whitelist 0123456789
>
> $ tesseract cropped.tif stdout -psm 11 ./config/digits
> Page 1
> 14 15
>
> 10 11
>
> 113 14
>
> 3
>
>
>
> As you can see, I got the best results when I whitelisted just 0-9
> (test 3). However, it's still not perfect: it misses the 18, which is
> probably the most critical value for my application.
>
>
> I also tweaked some of the parameters listed at
> http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version, but
> this didn't give anything better.
>
>
> Are there any other suggested configuration parameters I can play with to
> increase accuracy?
>
>
> Thanks.
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/4dfec158-280e-446d-a5ae-cf0b93e9d392%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>