Re: [tesseract-ocr] Recognition contains an extra letter

Merlijn B.W. Wajer Mon, 13 Jun 2022 02:03:32 -0700

Hi,

On 13/06/2022 10:21, 'Yunlong Liu' via tesseract-ocr wrote:

Dear developers,
I have carefully read the online material about how to use Tesseract for OCR tasks. It works well for most of my data. However, I found one odd behaviour that confuses me quite a lot. Here are the details.
1. Below is the image I am using. I have already binarized it so that every pixel value is either 0 or 255, and the letters are about 30 pixels tall.
[attached image: TesseractInputImageSingle.png]
2. I compiled the main branch locally. Here is my version info:
[attached image: Tesseract Version Info.png]
3. After running the command "Tesseract TesseractInputImageSingle.png - --oem 1 --psm 7", I got "DOT *0*O4N 6VHPPC", which contains an extra ZERO (in red).
*Could anyone kindly explain why this happens and how to avoid the spurious ZERO during OCR?*
4. I also tried --oem 0, because some users recommended this mode for code recognition. The result was "DOT O4N *G*VHPPC", with "6" wrongly recognized as "G".
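Step 1 describes a strict 0/255 binarization. A minimal sketch of such a global threshold is below; it assumes the image is already greyscale and held as rows of 0..255 integers, and the threshold of 128 and the helper names `binarize` and `ink_height` are hypothetical choices for illustration, not the method the poster used. `ink_height` counts rows that contain any ink, which for a tightly cropped single text line gives a rough estimate of glyph height (the "~30 pixels" mentioned in step 1).

```python
# Minimal sketch: global thresholding of a greyscale image to strict 0/255.
# Assumptions (hypothetical): pixels are rows of ints in 0..255 and the
# threshold is 128; the poster's actual method is not stated.


def binarize(gray, threshold=128):
    """Map each pixel to 0 (ink) if below threshold, else 255 (background)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def ink_height(binary):
    """Count rows containing at least one black (0) pixel.

    For a tightly cropped single text line this roughly estimates glyph height.
    """
    return sum(1 for row in binary if 0 in row)


if __name__ == "__main__":
    gray = [
        [250, 240, 255, 251],
        [30, 20, 200, 10],
        [40, 100, 190, 15],
        [252, 255, 245, 248],
    ]
    bw = binarize(gray)
    print(bw)              # every value is now 0 or 255
    print(ink_height(bw))  # 2 rows contain ink
```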
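Steps 3 and 4 differ only in the `--oem` value (1 selects the LSTM engine, 0 the legacy engine), while `--psm 7` tells Tesseract to treat the image as a single text line. The sketch below scripts both runs from Python; it assumes Python 3.7+ and a `tesseract` binary on PATH, and `tesseract_argv` and `run_tesseract` are hypothetical helper names. The output base `-` makes tesseract print the text to stdout instead of writing a file. Note that `--oem 0` only works with traineddata files that include the legacy model.

```python
# Sketch: drive the tesseract CLI from Python for the runs in steps 3 and 4.
# Assumptions: Python 3.7+, `tesseract` on PATH; helper names are hypothetical.
import os
import shutil
import subprocess


def tesseract_argv(image, oem=1, psm=7, outputbase="-"):
    """Build argv for `tesseract <image> <outputbase> --oem N --psm N`.

    An outputbase of "-" sends the recognised text to stdout.
    """
    return ["tesseract", image, outputbase,
            "--oem", str(oem), "--psm", str(psm)]


def run_tesseract(image, oem=1, psm=7):
    """Return the recognised text, or None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        tesseract_argv(image, oem, psm),
        capture_output=True, text=True, check=True,  # raise on non-zero exit
    )
    return result.stdout.strip()


if __name__ == "__main__":
    image = "TesseractInputImageSingle.png"
    print(" ".join(tesseract_argv(image)))
    if os.path.exists(image):
        for oem in (1, 0):  # LSTM engine, then legacy engine
            print(oem, run_tesseract(image, oem=oem))
```

Calling `run_tesseract(image, oem=1)` and `run_tesseract(image, oem=0)` on the same file reproduces the comparison in steps 3 and 4.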
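To pin down exactly where each output goes wrong, the recognised strings can be aligned against the expected text character by character. The sketch below uses Python's `difflib.SequenceMatcher`; the expected string "DOT O4N 6VHPPC" is an assumption inferred from the two results (it is not stated in the message), and `char_diff` is a hypothetical helper.

```python
# Sketch: align recognised text against an expected string to locate OCR errors.
# Assumption: the ground truth "DOT O4N 6VHPPC" is inferred, not given.
import difflib


def char_diff(expected, recognized):
    """Return (op, expected_chars, recognized_chars) for each non-equal opcode."""
    matcher = difflib.SequenceMatcher(a=expected, b=recognized, autojunk=False)
    return [
        (op, expected[i1:i2], recognized[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    expected = "DOT O4N 6VHPPC"  # hypothetical ground truth
    outputs = {1: "DOT 0O4N 6VHPPC", 0: "DOT O4N GVHPPC"}  # from the message
    for oem, text in outputs.items():
        print(oem, char_diff(expected, text))
    # 1 [('insert', '', '0')]
    # 0 [('replace', '6', 'G')]
```

With that assumed ground truth, the `--oem 1` result shows a single inserted `0` and the `--oem 0` result a single substitution of `G` for `6`, matching what the message describes.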

You might want to take a look at this pull request and see if it helps: https://github.com/tesseract-ocr/tesseract/pull/3476


Cheers,
Merlijn

--
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/bc7bf0ee-cd16-9f7a-25c6-aede8872b387%40archive.org.
