Fair enough, though I was just using this as an example. In practice, it will be an 8- or 9-character alphanumeric string, like a code. Would the extra 6-7 characters be enough context?
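For concreteness, here is a minimal sketch of the kind of invocation I have in mind for a single-line code image. It only builds the command line; the helper name `build_tesseract_cmd` and the file name `code.png` are made up for illustration, and the uppercase-plus-digits whitelist is an assumption about what the codes contain. The `--psm 7` (single text line) and `tessedit_char_whitelist` settings are additions to what I tried, and I have not confirmed that any of this resolves the S/5 confusion.

```python
import shlex
import string


def build_tesseract_cmd(image_path, psm=7):
    """Build a tesseract CLI call for a short alphanumeric code on one line.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Assumption: codes consist of uppercase letters and digits only.
    whitelist = string.ascii_uppercase + string.digits
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),                 # 7 = treat the image as a single text line
        "-c", "load_system_dawg=0",        # skip the main dictionary
        "-c", "load_freq_dawg=0",          # skip the frequent-words dictionary
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell.
    print(shlex.join(build_tesseract_cmd("code.png")))
```

The list form can be passed directly to `subprocess.run` if tesseract is installed.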
On Tuesday, October 22, 2019 at 5:32:37 AM UTC-7, zdenop wrote:
>
> I am afraid that such a small fraction of text (where there are just letters
> commonly misinterpreted, like S or 5 or ?) cannot be recognized with 100%
> accuracy. Try to use it in some context (a line).
>
> Zdenko
>
>
> On Mon, Oct 21, 2019 at 20:22, Ast <[email protected]> wrote:
>
>> I've spent a good amount of time looking into how to resolve this issue. I came
>> across this unanswered post
>> <https://groups.google.com/forum/?fromgroups#!searchin/tesseract-ocr/2s%7Csort:date/tesseract-ocr/uDxMr-65_nk/csA6aYaLCwAJ>
>> from 2017. I tried it, and it is still reproducible today. There are 2 images:
>> one with the letter S, one with 2S. As a single character, the letter S
>> is detected successfully, but 2S is detected as 25.
>>
>> From what I've been able to learn, this issue stems from the combination
>> of alphanumeric characters (common in receipts or codes) and how Tesseract
>> tries to use dictionary words.
>>
>> *Environment:*
>>
>> tesseract 4.1.0
>>  leptonica-1.76.0
>>   libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 :
>>   libtiff 4.0.10 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>>
>> Debian 10 64bit
>>
>> I've tried changing some configurations such as *load_system_dawg=0* and
>> *load_freq_dawg=0*, but without luck.
>>
>> I am fairly new to OCR, so any input and feedback is greatly appreciated.
>> Thank you.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/9e8203e6-fbd5-47dc-8b2b-0327fe1e2e0a%40googlegroups.com.

