https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
=> you must remove all noise (=> everything excluding text) from input image.
Image preprocessing is your task. No tesseract parameter will do it instead of you.

Zdenko

On Fri, May 26, 2017 at 1:34 AM, Mat <[email protected]> wrote:

> Hi,
>
> I'm working with some image products and trying to extract numbers from an
> image. I've been trying to segment and extract digits from specific areas,
> but I haven't had great results.
>
> I converted the attached image to a .tif (for some reason my environment
> was seg faulting with the .gif), extracted a specific area (also attached),
> resized it, and processed via tesseract.
>
> Below are 3 of the many iterations/configuration combinations I ran with
> corresponding output:
>
> # Test 1: No options
> $ tesseract cropped.tif stdout
> Page 1
> Empty page!!
> Empty page!!
>
>
> # Test 2: Setting psm, resulted in better results but still lots of junk
> $ tesseract cropped.tif stdout -psm 11
> Page 1
>
> 14-15
>
> ..................
>
> 10-11
>
>
> 113-14
>
> _ I.
>
> i
>
>
> # Test 3: Setting psm and whitelisting
>
> # ./config/digits file
> tessedit_char_whitelist 0123456789
>
> $ tesseract cropped.tif stdout -psm 11 ./config/digits
> Page 1
> 14 15
>
> 10 11
>
> 113 14
>
> 3
>
>
>
> As you can see, I got the best results when I whitelisted for just 0-9
> (test 3). However, it's still not perfect and missing the 18, which is
> probably the most critical for my application.
>
>
> I did some tweaking of the command line values (i.e.
> http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version) but
> this didn't result in anything better.
>
>
> Are there any other suggested configuration parameters I can play with to
> increase accuracy?
>
>
> Thanks.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8xonV%3DyC38wKHfcs1RXfQcyn5Vawd%2B0EnfVHycR2z70Tw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
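
For anyone reading this thread later: the advice at the top of this message (remove everything that is not text before running tesseract) could look roughly like the sketch below. It is only an illustration under stated assumptions, not something from the thread or from Tesseract itself. The image is a plain nested list of grayscale values standing in for real pixel data, the names `binarize` and `remove_small_blobs` are made up for this example, and the threshold of 128 and the minimum blob size of 3 pixels are arbitrary values that would need tuning for a real image.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map grayscale rows (0 = black, 255 = white) to 1 (ink) / 0 (background).

    The threshold is an assumed value, not one recommended in the thread.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_small_blobs(binary, min_pixels=3):
    """Erase 4-connected ink components with fewer than min_pixels pixels.

    Small isolated specks are treated as noise; larger components are
    assumed to be parts of characters and are kept.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if not cleaned[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and cleaned[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_pixels:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


# Toy 5x6 "image": one 2x2 dark block (text-like) and two single-pixel specks.
gray = [
    [255, 255, 255, 255, 255, 255],
    [255,  10,  10, 255, 255,  40],
    [255,  10,  10, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
    [ 30, 255, 255, 255, 255, 255],
]

cleaned = remove_small_blobs(binarize(gray), min_pixels=3)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

On a real scan the pixel data would come from an image library and the cleaned result would be written back to a file such as `cropped.tif` before OCR. The size cut-off also has to stay below the smallest stroke that must survive, otherwise thin digits like "1" can be erased along with the noise.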

