[tesseract-ocr] Extracting sparse digits from images

Mat Thu, 25 May 2017 23:19:46 -0700

Hi,

I'm working with some image products and trying to extract numbers from an 
image. I've been trying to segment and extract digits from specific areas, 
but I haven't had great results.

I converted the attached image to a .tif (for some reason my environment
was segfaulting with the .gif), extracted a specific area (also attached),
resized it, and ran it through tesseract.

Below are 3 of the many configuration combinations I tried, with the
corresponding output:

# Test 1: No options
$ tesseract cropped.tif stdout
Page 1
Empty page!!
Empty page!!

# Test 2: Setting psm gave better results, but still lots of junk
$ tesseract cropped.tif stdout -psm 11
Page 1

14-15

..................

10-11

113-14

_ I.

# Test 3: Setting psm and whitelisting

# ./config/digits file
tessedit_char_whitelist 0123456789

$ tesseract cropped.tif stdout -psm 11 ./config/digits
Page 1
14 15

10 11

113 14

As you can see, I got the best results when I whitelisted just 0-9
(test 3). However, it's still not perfect: it misses the 18, which is
probably the most critical value for my application.
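
For comparison, the cleanup that the whitelist achieves in test 3 can be
roughly approximated by post-filtering the test 2 output. The sketch below
is only an illustration: digits_only is a hypothetical helper, not a
tesseract option, and it assumes the "Page 1" banner is not part of the
piped text. It only tidies what was already recognized, so it cannot
recover a number tesseract never detected, such as the missing 18.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): keep only the digit runs
# from OCR text on stdin and drop lines that contain no digits at all.
#   grep: discard lines without any digit
#   tr:   replace every run of non-digit characters with a single space
#   sed:  trim leading and trailing spaces
digits_only() {
  grep -E '[0-9]' |
    tr -cs '0-9\n' ' ' |
    sed -e 's/^ *//' -e 's/ *$//'
}

# Sample input: the text lines from test 2 (without the "Page 1" banner).
printf '%s\n' '14-15' '' '..................' '' '10-11' '' '113-14' '' '_ I.' |
  digits_only
# prints:
# 14 15
# 10 11
# 113 14
```

With a real image this would be used as
"tesseract cropped.tif stdout -psm 11 | digits_only".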

I also tweaked some of the command-line parameters
(see http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version),
but this didn't produce anything better.

Are there any other suggested configuration parameters I can play with to
increase accuracy?

Thanks.
