Dear developers, I have carefully read the online material about using Tesseract for OCR tasks, and it works well for most of my data. However, I have run into one strange result that confuses me quite a lot. Here are the details.
1. Below is the image I am using. I have already binarized it so that every pixel value is either 0 or 255, and the letter height is about 30 pixels.
[image: TesseractInputImageSingle.png]
2. I compiled the main branch locally. Here is the version info on my side:
[image: Tesseract Version Info.png]
3. After running the command "tesseract TesseractInputImageSingle.png - --oem 1 --psm 7", I got "DOT *0*O4N 6VHPPC", which contains an extra ZERO (marked in red). *Could anyone kindly explain why this happens and how to avoid the spurious ZERO during OCR?*
4. I also tried --oem 0, because some users recommended this mode for code recognition. The result was "DOT O4N *G*VHPPC", with "6" wrongly recognized as "G".

-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ab282294-3ac3-4087-b346-e96a301d18c5n%40googlegroups.com.
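For anyone who wants to reproduce the two runs side by side, here is a minimal Python sketch that wraps the same command and reports where each output deviates from the intended text. It assumes the `tesseract` binary is on PATH, and it treats "DOT O4N 6VHPPC" as the ground truth, which is inferred from the two outputs above rather than stated. The helper names `build_cmd`, `run_ocr` and `char_diff` are hypothetical.

```python
import difflib
import subprocess


def build_cmd(image: str, oem: int, psm: int) -> list[str]:
    """Argument list equivalent to: tesseract <image> - --oem <oem> --psm <psm>."""
    # "-" as the output base makes tesseract write the recognized text to stdout.
    return ["tesseract", image, "-", "--oem", str(oem), "--psm", str(psm)]


def run_ocr(image: str, oem: int, psm: int = 7) -> str:
    """Run the tesseract CLI (assumed to be on PATH) and return the recognized line."""
    result = subprocess.run(
        build_cmd(image, oem, psm), capture_output=True, text=True, check=True
    )
    # strip() also removes the trailing newline / form feed after the text.
    return result.stdout.strip()


def char_diff(expected: str, got: str) -> list[tuple[str, str, str]]:
    """List (operation, expected_chars, ocr_chars) for every non-matching span."""
    matcher = difflib.SequenceMatcher(None, expected, got)
    return [
        (op, expected[i1:i2], got[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

Usage, e.g. `for oem in (1, 0): print(oem, char_diff("DOT O4N 6VHPPC", run_ocr("TesseractInputImageSingle.png", oem)))`. With the outputs reported above, `char_diff` gives `[('insert', '', '0')]` for --oem 1 and `[('replace', '6', 'G')]` for --oem 0, which makes it easy to check whether a given setting change actually removes the extra ZERO.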

