[tesseract-ocr] '33' recognized correctly, '3' not recognized at all...

Jack Sat, 31 Aug 2019 09:24:40 -0700

I have a weird niche project here: about 4,000 images, each containing 
2 numbers between 0 and 127.
I've tweaked the images in a million different ways and I can't get 
tesseract to recognize single-digit numbers. With the exception of 2, 
no 1-digit number is recognized at all.
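
With roughly 4,000 images, one way to spot dropped digits is to check that 
each OCR result contains exactly two integers in the 0 to 127 range. The 
following is only a minimal sketch: the helper name `parse_pair` and the 
assumption that the two numbers appear as separate digit runs in the OCR 
text are illustrative, not something taken from the project itself.

```python
import re


def parse_pair(ocr_text, low=0, high=127):
    """Return the two integers found in ocr_text as a tuple.

    Returns None when the text does not contain exactly two integers,
    or when either value falls outside [low, high]. Hypothetical helper
    for flagging images where a digit was missed.
    """
    # Collect every run of digits; anything else is treated as a separator.
    values = [int(token) for token in re.findall(r"\d+", ocr_text)]
    if len(values) != 2:
        return None
    if any(value < low or value > high for value in values):
        return None
    return tuple(values)
```

A result of None for an image marks it as one to inspect by hand, for 
example when '33' is read but the '3' next to it is missing.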


Also, for some reason I get far worse results when I run tesseract 
directly on the images than when I convert them to PDF first and use 
ocrmypdf, which apparently uses tesseract under the hood. I don't 
understand why.
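
For comparison, here is a sketch of a direct tesseract call with the page 
segmentation mode, DPI, and character set stated explicitly. Whether these 
settings account for the gap with ocrmypdf is an assumption and has not 
been confirmed here. The helper name `build_tesseract_cmd` and the default 
values are hypothetical. `--psm`, `--dpi`, and `-c 
tessedit_char_whitelist=...` are existing tesseract options, though how 
well the whitelist is honoured may depend on the tesseract version and 
engine in use.

```python
def build_tesseract_cmd(image_path, psm=7, dpi=None, digits_only=True):
    """Build an argument list for a direct tesseract invocation.

    psm=7 treats the image as a single text line; psm=10 treats it as a
    single character. These defaults are guesses, not known-good values.
    """
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if dpi is not None:
        # Tell tesseract the resolution when the image carries no DPI info.
        cmd += ["--dpi", str(dpi)]
    if digits_only:
        # Restrict recognition to digits (assumes the images hold only digits).
        cmd += ["-c", "tessedit_char_whitelist=0123456789"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, 
capture_output=True, text=True)` and the same image tried with a few 
different `psm` values to see whether the single digits start to appear.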

The font is very straightforward, I think, so I'm not sure whether 
training would help, but I'm open to the idea if needed.

Here are the sample images I'm using for testing, before and after I 
modified them:
Before: https://imgur.com/a/PhjWXXK
After: https://imgur.com/a/sCRE67S
Okay, some of them failed to upload, but that's the gist.

Thanks,
Jack

