[tesseract-ocr] Problems recognizing certain string using font OCR A

Alexander Bartl Thu, 07 Nov 2019 09:17:29 -0800

Hello everybody, 

I want to recognize a string made up of a random combination of digits and 
letters, printed in the OCR A font. I did not have any luck with that, so I 
tried to recognize reference text in the same font and was surprised: the 
"normal" text was recognized without a problem, while my desired string was 
misinterpreted. Searching the web led me to disable, in the config file, the 
dictionaries that Tesseract uses to match common words. Unfortunately, 
that did not help either.


Furthermore I tried the same text with a different font (CourierNew) and I 
was able to detect my desired string. So I would assume it has something to 
do with the font.

String I want to detect: 0300FY9N457

My machine is Win10 and the output of "tesseract --version" is:
tesseract v5.0.0-alpha.20191030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5


Attached are the input and output files, as well as the config file and the 
generated .tif. 

The command line used to generate the output file was:
tesseract OCRA_Reference.jpg test [PATH]\config.txt 
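
For anyone who wants to repeat the test from a script, here is a minimal 
sketch of the same invocation in Python 3. It assumes that `tesseract` is on 
the PATH. The function names `write_config`, `build_command` and `run_ocr` 
are hypothetical helpers made up for this illustration; the three config 
variables are the ones from the attached config file.

```python
import subprocess
from pathlib import Path

# Config variables copied from the attached config file.
CONFIG_LINES = [
    "tessedit_write_images true",
    "load_system_dawg false",
    "load_freq_dawg false",
]


def write_config(path):
    """Write the Tesseract config variables to `path`, one per line."""
    path = Path(path)
    path.write_text("\n".join(CONFIG_LINES) + "\n", encoding="ascii")
    return path


def build_command(image, output_base, config_path):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE CONFIGFILE."""
    return ["tesseract", str(image), str(output_base), str(config_path)]


def run_ocr(image, output_base, config_path):
    """Run Tesseract (assumed to be installed) and return the recognized text.

    Tesseract writes its text result to OUTPUTBASE.txt.
    """
    subprocess.run(build_command(image, output_base, config_path), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("OCRA_Reference.jpg", "test", write_config("config.txt"))` 
would correspond to the command line above, with the config file written to 
the current directory instead of [PATH].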


All kinds of help/suggestions are much appreciated. 
Best regards 
Alex


tessedit_write_images true
load_system_dawg false
load_freq_dawg false

OCR A Std Regular

the quick brown fox
1234567690
O300FY9N457

O30 OF YOUNAS 7
THE QUICK BROWN FOX
JUMPS OVER TI
