Hi,

On 13/06/2022 10:21, 'Yunlong Liu' via tesseract-ocr wrote:
Dear developers,

I have carefully read the online material on using Tesseract for OCR tasks. It works well for most of my data. However, I found one odd thing that confuses me quite a lot. Here are the details.

1. Below is the image I am using. I have already binarized it so that every pixel value is either 0 or 255, and the letters are ~30 pixels high.
TesseractInputImageSingle.png
2. I compiled the main branch locally. Here is the version info on my side:
Tesseract Version Info.png
3. After running the command "Tesseract TesseractInputImageSingle.png - --oem 1 --psm 7", I got "DOT *0*O4N 6VHPPC", which contains an extra ZERO (marked with asterisks).

*Could anyone kindly explain why it happens and how to avoid the confusing ZERO during OCR?*

4. I also tried oem = 0, because some users recommend this mode for code recognition. The result is "DOT O4N *G*VHPPC", with "6" wrongly recognized as "G".
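
For anyone who wants to reproduce the two runs from steps 3 and 4 side by side, here is a minimal sketch. It assumes the tesseract binary is on PATH and that the installed traineddata includes the legacy model (needed for --oem 0). The helper names build_command and run_ocr are made up for illustration and are not part of Tesseract.

```python
import shutil
import subprocess


def build_command(image_path, oem, psm=7):
    """Return the tesseract argument list for one run (hypothetical helper)."""
    # "-" as the output base makes tesseract print the text to stdout.
    return ["tesseract", image_path, "-", "--oem", str(oem), "--psm", str(psm)]


def run_ocr(image_path, oem, psm=7):
    """Run tesseract once; return the text, or None if it is missing or fails."""
    if shutil.which("tesseract") is None:
        return None  # assumption: the binary is reachable via PATH
    result = subprocess.run(
        build_command(image_path, oem, psm),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # For example, --oem 0 fails when the traineddata has no legacy model.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    image = "TesseractInputImageSingle.png"  # file name from step 1
    for oem in (0, 1):
        print(f"--oem {oem}: {run_ocr(image, oem)!r}")
```

This only compares the legacy (0) and LSTM (1) engines on the same line image. It does not by itself remove the extra character.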


You might want to take a look at this pull request and see if it helps: https://github.com/tesseract-ocr/tesseract/pull/3476

Cheers,
Merlijn
