Update: I discovered the command-line option:

-c load_number_dawg=0

That did not improve my results.

On Wednesday, July 21, 2021 at 1:07:15 PM UTC-5 Eric Hodges wrote:
> I need some help. I have a bunch of images of text like this:
>
> [image: sample_si.jpg]
>
> They are all 200 dpi, black-and-white images. In over 50% of the cases,
> Tesseract mistakes the "SI" at the front for digits. Most of them come out
> as "51", but some are "81" or "31".
>
> I've tried tweaking all of the settings I can find, but none of them
> improve the results. I'm currently using a config file like this:
>
> tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
>
> Interesting fact: if I cut off the digits and send only the letters to
> Tesseract, it recognizes them correctly. Is there something in Tesseract
> that makes it less likely to mix letters and numbers in a single word?
>
> Any suggestions?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/af5a508c-cca8-4db1-a741-4aa10972c129n%40googlegroups.com.
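For anyone trying to reproduce this setup, here is a minimal Python sketch of the two ways of passing these settings that appear in this thread: a config file with one "name value" pair per line, and -c name=value options on the tesseract command line (writing the result to stdout). The helper names render_config and build_tesseract_argv are hypothetical. The code only builds the text and the argument list. It does not run Tesseract, and it says nothing about whether a given variable actually helps with the "SI" misreads.

```python
import shlex
from typing import Dict, List

# Hypothetical helpers for illustration; they are not part of Tesseract.


def render_config(variables: Dict[str, str]) -> str:
    """Render Tesseract config-file text: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


def build_tesseract_argv(image_path: str, variables: Dict[str, str]) -> List[str]:
    """Build a tesseract command line that prints the OCR text to stdout
    and sets each variable with its own '-c name=value' option."""
    argv = ["tesseract", image_path, "stdout"]
    for name, value in variables.items():
        argv += ["-c", f"{name}={value}"]
    return argv


# The two settings mentioned in this thread.
settings = {
    "tessedit_char_whitelist": "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "load_number_dawg": "0",
}

print(render_config(settings), end="")
print(shlex.join(build_tesseract_argv("sample_si.jpg", settings)))
```

The argument list could then be handed to subprocess.run(argv, capture_output=True, text=True) to read the recognized text from the result's stdout.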
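One possible workaround, not something proposed in this thread, is to post-correct the OCR output instead of tuning Tesseract further. The sketch below assumes every label really does begin with the letters "SI" and rewrites a leading "51", "81" or "31" (the misreadings reported above) back to "SI". The function name restore_si_prefix is hypothetical.

```python
import re

# Assumption: every label is supposed to start with the letters "SI".
# Only the misreadings reported in this thread ("51", "81", "31") are handled.
MISREAD_PREFIX = re.compile(r"^[583]1")


def restore_si_prefix(text: str) -> str:
    """Strip surrounding whitespace, then replace a leading '51', '81' or '31'
    with 'SI'. Text that does not start with one of those pairs is unchanged."""
    return MISREAD_PREFIX.sub("SI", text.strip())


for raw in ["51123", "81ABC", "SI123"]:
    print(raw, "->", restore_si_prefix(raw))
```

If some labels can legitimately start with one of those digit pairs, this heuristic would corrupt them, so it only makes sense when the "SI" prefix is guaranteed.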