Dear developers, 

I have carefully read the online documentation on how to use Tesseract for OCR 
tasks, and it works well for most of my data. However, I found one odd 
behavior that confuses me quite a lot. Here are the details.

1. Below is the image I am using. I have already binarized it so that every 
pixel value is either 0 or 255, and the letter height is about 30 
pixels.
[image: TesseractInputImageSingle.png]
2. I compiled the main branch locally. Here is my version info:
[image: Tesseract Version Info.png]
3. After running the command "Tesseract TesseractInputImageSingle.png - 
--oem 1 --psm 7", I got "DOT *0*O4N 6VHPPC", which contains an extra zero 
(marked in red).

*Could anyone kindly explain why this happens and how to avoid the spurious 
zero during OCR?*

4. I also tried --oem 0, because some users recommend this mode for code 
recognition. The result was "DOT O4N *G*VHPPC", with "6" wrongly 
recognized as "G".
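To double-check the two input properties described in step 1 (every pixel exactly 0 or 255, letters about 30 pixels tall), a minimal Python sketch could look like the following. It assumes the image has already been loaded as rows of 0-255 grayscale integers (the loading step is not shown), dark text on a white background, and no stray noise pixels, since any isolated dark pixel would inflate the measured height. The function names are hypothetical.

```python
from typing import List, Sequence


def is_binary(pixels: Sequence[Sequence[int]]) -> bool:
    """Return True if every pixel value is exactly 0 or 255."""
    return all(value in (0, 255) for row in pixels for value in row)


def text_height(pixels: Sequence[Sequence[int]], ink: int = 0) -> int:
    """Height of the band of rows that contain at least one ink pixel.

    Assumes dark text (ink == 0) on a white background; pass ink=255 for
    inverted images. Returns 0 if no ink pixel is found.
    """
    ink_rows: List[int] = [y for y, row in enumerate(pixels) if ink in row]
    if not ink_rows:
        return 0
    return ink_rows[-1] - ink_rows[0] + 1
```

For a single line of capitals and digits without descenders, such as "DOT O4N 6VHPPC", this band height is roughly the letter height.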
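To reproduce the comparison between steps 3 and 4, a small Python sketch can build the same command line for each `--oem` value and list where each result differs from the intended text. It assumes the executable is named `tesseract` and is on the PATH, and that the intended text is "DOT O4N 6VHPPC", inferred from the two errors described above; `tesseract_cmd`, `run_ocr`, and `diff_ocr` are hypothetical helper names.

```python
import difflib
import subprocess
from typing import List, Tuple


def tesseract_cmd(image: str, oem: int, psm: int = 7) -> List[str]:
    """Equivalent of `tesseract IMAGE - --oem OEM --psm PSM` (text on stdout)."""
    return ["tesseract", image, "-", "--oem", str(oem), "--psm", str(psm)]


def run_ocr(image: str, oem: int, psm: int = 7) -> str:
    """Run the tesseract CLI (assumed to be on PATH) and return the recognized text."""
    result = subprocess.run(
        tesseract_cmd(image, oem, psm), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def diff_ocr(expected: str, actual: str) -> List[Tuple[str, str, str]]:
    """Return (operation, expected_chunk, actual_chunk) for each mismatching span."""
    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    return [
        (op, expected[i1:i2], actual[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

For example, `diff_ocr("DOT O4N 6VHPPC", "DOT 0O4N 6VHPPC")` reports a single insertion of "0", while the `--oem 0` output "DOT O4N GVHPPC" reports "6" replaced by "G".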

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ab282294-3ac3-4087-b346-e96a301d18c5n%40googlegroups.com.