[tesseract-ocr] Recognition contains an extra letter

'Yunlong Liu' via tesseract-ocr Mon, 13 Jun 2022 01:57:15 -0700

Dear developers, 

I have carefully read the online documentation on how to use Tesseract for OCR
tasks, and it works well for most of my data. However, I found one weird
behavior that confuses me quite a lot. Here are the details.

1. Below is the image I am using. I have already binarized it so that every
pixel value is either 0 or 255, and the letters are about 30 pixels tall.
[image: TesseractInputImageSingle.png]
2. I compiled the main branch locally. Here is my version info:
[image: Tesseract Version Info.png]
3. After running the command "tesseract TesseractInputImageSingle.png -
--oem 1 --psm 7", I got "DOT *0*O4N 6VHPPC", which contains an extra zero
(marked with asterisks).

*Could anyone kindly explain why this happens and how to avoid the spurious
zero during OCR?*

4. I also tried --oem 0, because some users recommended this mode for code
recognition. The result was "DOT O4N *G*VHPPC", with the "6" wrongly
recognized as "G".
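Step 1 says the image was binarized so every pixel is 0 or 255, but not how. As a hedged sketch of one common way to pick the cut-off, here is Otsu's global threshold in plain Python. It assumes 8-bit grayscale pixels already held in a list of lists (loading the PNG itself would need an imaging library, which is left out), and `otsu_threshold` and `binarize` are hypothetical helper names, not necessarily what was done for this image.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale values (0-255)


def otsu_threshold(image: Image) -> int:
    """Return the threshold t maximizing between-class variance (Otsu)."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    weight_bg = 0  # number of pixels with value <= t
    sum_bg = 0     # sum of their intensities
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image: Image, threshold: int) -> Image:
    """Map pixels above the threshold to 255 (white), the rest to 0 (black)."""
    return [[255 if v > threshold else 0 for v in row] for row in image]


if __name__ == "__main__":
    sample = [[10, 20, 200], [30, 220, 240]]
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))  # 30 [[0, 0, 255], [0, 255, 255]]
```

A single global threshold suits a cleanly lit crop like this one. Unevenly lit images usually need an adaptive (local) threshold instead.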
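The two engine modes fail differently: --oem 1 inserts a character, while --oem 0 substitutes one. A character-level diff against the expected string makes this explicit. Below is a minimal sketch using Python's standard `difflib`. The expected text "DOT O4N 6VHPPC" is an assumption inferred from the two outputs above (the post does not state it), and `char_errors` is a hypothetical helper.

```python
import difflib
from typing import List, Tuple

# Assumed ground truth, inferred from the two OCR outputs in this post.
EXPECTED = "DOT O4N 6VHPPC"


def char_errors(expected: str, actual: str) -> List[Tuple[str, str, str]]:
    """List (operation, expected_chars, actual_chars) for each mismatch."""
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    errors = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only insert / delete / replace spans
            errors.append((op, expected[i1:i2], actual[j1:j2]))
    return errors


if __name__ == "__main__":
    print(char_errors(EXPECTED, "DOT 0O4N 6VHPPC"))  # --oem 1 output
    print(char_errors(EXPECTED, "DOT O4N GVHPPC"))   # --oem 0 output
```

The first call reports `[('insert', '', '0')]` and the second reports `[('replace', '6', 'G')]`. This separates the spurious zero from the 6/G confusion.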


To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/ab282294-3ac3-4087-b346-e96a301d18c5n%40googlegroups.com.
