Hi,
On 13/06/2022 10:21, 'Yunlong Liu' via tesseract-ocr wrote:
Dear developers,
I have carefully read the online documentation on using Tesseract for
OCR tasks, and it works well for most of my data. However, I found one
odd behavior that confuses me quite a lot. Here are the details.
1. Below is the image I am using. I have already binarized it so that
every pixel value is either 0 or 255, and the letters are about 30
pixels tall.
TesseractInputImageSingle.png
2. I compiled the main branch locally. Here is my version info:
Tesseract Version Info.png
3. After running the command "tesseract TesseractInputImageSingle.png -
--oem 1 --psm 7", I got "DOT *0*O4N 6VHPPC", which contains an extra
zero (the highlighted *0*).
*Could anyone kindly explain why this happens and how to avoid the
spurious zero during OCR?*
4. I also tried --oem 0, because some users recommend that mode for
code recognition. The result was "DOT O4N *G*VHPPC", with the "6"
wrongly recognized as "G".
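Item 1 above states two properties of the input: every pixel is 0 or 255, and the letters are about 30 pixels tall. A minimal sketch for checking both, assuming the image has already been loaded as a list of rows of 8-bit grayscale values with dark text on a light background (loading it is left out, and the function names are hypothetical):

```python
def binarize(gray, threshold=128):
    """Map every 8-bit grayscale value to 0 (ink) or 255 (background)."""
    return [[0 if v < threshold else 255 for v in row] for row in gray]


def is_binary(img):
    """True if every pixel is exactly 0 or 255."""
    return all(v in (0, 255) for row in img for v in row)


def ink_height(img):
    """Rows from the first to the last row containing any ink (0) pixel.

    For a single all-caps line of text this approximates the letter height.
    """
    rows = [y for y, row in enumerate(img) if 0 in row]
    return rows[-1] - rows[0] + 1 if rows else 0


# Tiny synthetic 6-row by 4-column image with a 3-pixel-tall dark stroke.
gray = [
    [250, 250, 250, 250],
    [20, 240, 250, 250],
    [30, 240, 250, 250],
    [10, 240, 250, 250],
    [250, 250, 250, 250],
    [250, 250, 250, 250],
]
bw = binarize(gray)
print(is_binary(bw), ink_height(bw))  # True 3
```

The fixed threshold of 128 is only an illustrative choice; the image in item 1 is already binary, so in that case the sketch is useful mainly for confirming the 0/255 values and the roughly 30-pixel height.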
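To compare the two engine modes from items 3 and 4 on the same image, a rough sketch that builds the same command line and counts character errors against an expected string. The intended text "DOT O4N 6VHPPC" is an assumption inferred from the two reported outputs, and the helper names are hypothetical. The error count uses difflib opcodes, which approximate but are not guaranteed to equal the minimal edit distance:

```python
import difflib
import subprocess


def tesseract_cmd(image, oem, psm=7):
    """Mirror the item 3 command; "-" sends the recognized text to stdout."""
    return ["tesseract", image, "-", "--oem", str(oem), "--psm", str(psm)]


def run_tesseract(image, oem, psm=7):
    """Run the tesseract CLI (assumed to be on PATH) and return its text output."""
    result = subprocess.run(tesseract_cmd(image, oem, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def char_errors(expected, got):
    """Approximate count of inserted, deleted and substituted characters."""
    errors = 0
    matcher = difflib.SequenceMatcher(None, expected, got, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors += max(i2 - i1, j2 - j1)
    return errors


expected = "DOT O4N 6VHPPC"  # assumption: the intended text
reported = {1: "DOT 0O4N 6VHPPC", 0: "DOT O4N GVHPPC"}  # outputs from items 3 and 4
for oem, text in reported.items():
    print(oem, char_errors(expected, text))  # one error each
```

With tesseract installed, `run_tesseract("TesseractInputImageSingle.png", 1)` would repeat the item 3 invocation. Item 4 is assumed to keep `--psm 7` and change only `--oem`.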
You might want to take a look at this pull request and see if it helps:
https://github.com/tesseract-ocr/tesseract/pull/3476
Cheers,
Merlijn
--
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/bc7bf0ee-cd16-9f7a-25c6-aede8872b387%40archive.org.