[tesseract-ocr] Re: Tesseract mistakes letters for numbers

Eric Hodges Wed, 21 Jul 2021 11:37:21 -0700

Update:

I discovered the command-line option:

    -c load_number_dawg=0

That did not improve my results.
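For reference, here is a minimal Python sketch of how `-c` parameters like this one are passed to the `tesseract` CLI (`tesseract IMAGE OUTPUTBASE [options]`; an output base of `stdout` prints the text instead of writing a file). The helper names `build_tesseract_args` and `run_tesseract`, the file name `sample_si.jpg`, and the optional `--psm` argument are assumptions for illustration, not settings confirmed in this thread.

```python
import subprocess
from typing import List, Mapping, Optional


def build_tesseract_args(
    image: str,
    params: Mapping[str, str],
    psm: Optional[int] = None,
) -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    The output base "stdout" makes tesseract print the recognized text
    instead of writing OUTPUTBASE.txt.
    """
    args = ["tesseract", image, "stdout"]
    if psm is not None:
        # Page segmentation mode, e.g. 7 = treat the image as a single text line.
        args += ["--psm", str(psm)]
    for name, value in params.items():
        # Each -c sets one Tesseract parameter as NAME=VALUE.
        args += ["-c", f"{name}={value}"]
    return args


def run_tesseract(args: List[str]) -> str:
    """Run tesseract and return its stdout (requires tesseract on PATH)."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


args = build_tesseract_args(
    "sample_si.jpg",  # hypothetical local copy of the sample image
    {
        "load_number_dawg": "0",
        "tessedit_char_whitelist": "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    },
)
print(" ".join(args))
```

Building an argument list rather than a shell string avoids quoting problems with the parameter values; `run_tesseract(args)` would then run the actual command.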

On Wednesday, July 21, 2021 at 1:07:15 PM UTC-5 Eric Hodges wrote:

> I need some help. I have a bunch of images of text like this:
>
> [image: sample_si.jpg]
> They are all 200 dpi, black-and-white images. In over 50% of cases, 
> Tesseract misreads the "SI" at the front as digits: most often "51", 
> but sometimes "81" or "31".
>
> I've tried tweaking all of the settings I can find, but none of them 
> improve the results. I'm currently using a config file like this:
>
> tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
>
> Interesting fact: If I cut off the digits and only send the alphas to 
> Tesseract, it recognizes them correctly. Is there something in Tesseract 
> that makes it less likely to mix letters and numbers in a single word?
>
> Any suggestions?
>
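The observation that the letters are recognized correctly when sent on their own suggests a two-pass workaround, with each pass restricted to one character class. This is a minimal Python sketch, not a confirmed fix. It assumes the letter prefix and the digits have already been cropped into separate images. The file names, the config names `letters.cfg`/`digits.cfg`, the helper functions, and the default `--psm 7` (single text line) are all hypothetical. The config files use the same `name value` format as the config above and are passed as trailing arguments after the output base.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"


def write_config(directory: Path, name: str, params: Dict[str, str]) -> Path:
    """Write a Tesseract config file: one 'name value' pair per line."""
    path = directory / name
    lines = [f"{key} {value}" for key, value in params.items()]
    path.write_text("\n".join(lines) + "\n")
    return path


def build_pass(image: str, config: Path, psm: int = 7) -> List[str]:
    """argv for one pass; the config file path goes after the options."""
    return ["tesseract", image, "stdout", "--psm", str(psm), str(config)]


def ocr_two_pass(prefix_image: str, number_image: str) -> Tuple[str, str]:
    """Hypothetical workaround: OCR the letter and digit crops separately.

    Returns (letters, digits); requires tesseract on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmpdir = Path(tmp)
        letters_cfg = write_config(tmpdir, "letters.cfg", {"tessedit_char_whitelist": LETTERS})
        digits_cfg = write_config(tmpdir, "digits.cfg", {"tessedit_char_whitelist": DIGITS})
        texts = []
        for image, cfg in ((prefix_image, letters_cfg), (number_image, digits_cfg)):
            out = subprocess.run(build_pass(image, cfg), capture_output=True, text=True, check=True)
            texts.append(out.stdout.strip())
    return texts[0], texts[1]
```

Returning the two fields separately, rather than joining them, avoids guessing how the original text separates the prefix from the number.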

