characters

Ast Mon, 21 Oct 2019 11:22:44 -0700

I've spent a good amount of time looking into how to resolve this issue. I came 
across this unanswered post 
<https://groups.google.com/forum/?fromgroups#!searchin/tesseract-ocr/2s%7Csort:date/tesseract-ocr/uDxMr-65_nk/csA6aYaLCwAJ>
 
from 2017. I tried it and it is still reproducible today. There are 2 images: 
one with the letter S, one with 2S. As a single character, the letter S 
is detected successfully, but 2S is detected as 25.


From what I've been able to learn, this issue stems from the combination of 
alphanumeric characters (common in receipts or codes) and how Tesseract 
tries to use dictionary words. 

*Environment:*

tesseract 4.1.0
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 
4.0.10 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found SSE

Debian 10 64bit

I've tried changing some configuration variables such as *load_system_dawg=0* and 
*load_freq_dawg=0*, but without luck.
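
For anyone who wants to reproduce this, here is a rough sketch of one way to pass those two variables on the command line with `-c`. It is not necessarily how the settings were applied in the tests described above. The helper names `build_tesseract_cmd` and `run_ocr`, the file name `2S.png`, and the choice of `--psm 7` (treat the image as a single text line) are assumptions made for illustration only.

```python
import subprocess


def build_tesseract_cmd(image_path, config_vars=None, psm=None):
    """Build a tesseract CLI command that prints the recognized text to stdout.

    config_vars is a dict of config variables, each passed as `-c name=value`.
    psm is an optional page segmentation mode passed as `--psm`.
    """
    cmd = ["tesseract", image_path, "stdout"]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    for name, value in (config_vars or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


def run_ocr(image_path, config_vars=None, psm=None):
    """Run tesseract (must be installed and on PATH) and return its text output."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, config_vars, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical usage, assuming an image named 2S.png exists:
# text = run_ocr("2S.png", {"load_system_dawg": 0, "load_freq_dawg": 0}, psm=7)
```

Keeping the command construction separate from the subprocess call makes it easy to print the exact command and paste it into a shell when comparing results.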

I am fairly new to OCR so any input and feedback is greatly appreciated. 
Thank you. 

[tesseract-ocr] Accuracy with non-standard words consisting of random combinations/mix of digits + letters/characters
