Thank you for replying, that was very helpful.
I've now tried the tessdata_best and tessdata_fast trained data from the 
tesseract GitHub. Both have drastically improved my results, but they are 
still not as accurate as yours.
Here are my outputs:

tesseract listpng output2 --psm 6 --tessdata-dir ~/tessdata/tessdata_best --oem 1
3 70
2 127
4 15
7 96
7 98
9 B58
9 65
19 695
29 91
33 75

tesseract listpng output_fast --psm 6 --tessdata-dir ~/tessdata/tessdata_fast --oem 1
3 70
2 127
4 15
7 56
7 58
9 #58
9 #65
19 ~=665
24 #691
33 #675
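
Since every image should contain exactly two numbers between 0 and 127, 
lines such as "19 695" or "9 #58" can be flagged automatically instead of 
by eye. Below is a minimal Python sketch of such a check. The function name 
parse_line is hypothetical and not part of tesseract, and the script only 
flags suspicious lines; it does not try to correct them.

```python
import re

# Assumption taken from the original post: two numbers per image, each 0..127.
VALID_RANGE = range(0, 128)


def parse_line(line):
    """Return (values, ok) for one line of OCR output.

    values: the integers found on the line (runs of digits only)
    ok: True only if the line holds nothing but digits and whitespace,
        contains exactly two numbers, and both are within 0..127
    """
    # Stray characters such as '#', '~', '=' or letters mark the line as suspect.
    clean = re.fullmatch(r"[\d\s]+", line) is not None
    values = [int(tok) for tok in re.findall(r"\d+", line)]
    ok = clean and len(values) == 2 and all(v in VALID_RANGE for v in values)
    return values, ok


if __name__ == "__main__":
    sample = ["3 70", "9 B58", "19 695", "9 #58", "19 ~=665"]
    for line in sample:
        values, ok = parse_line(line)
        print(f"{line!r:12} -> {values} {'ok' if ok else 'CHECK'}")
```

Lines marked CHECK would still need to be reviewed against the source image; 
a line that passes is only plausible, not guaranteed correct.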

On Saturday, August 31, 2019 at 11:24:23 AM UTC-5, Jack wrote:
>
> I have a weird niche project here, essentially I have about 4,000 images, 
> each with 2 numbers between 0 and 127.
> I've tweaked the images in a million different ways and I can't get 
> tesseract to recognize individual numbers; with the exception of 2, all 
> other 1-digit numbers are not recognized.
>
> Also, for some reason if I use tesseract directly I get way worse results, 
> whereas if I convert to pdf first and use ocrmypdf, which apparently uses 
> tesseract, I get WAY better results, which I don't understand. 
>
> The font is very straightforward, I think, so I'm not sure if training 
> would be helpful, but I'm open to the idea if needed.
>
> Here are the sample images I'm using for testing, before and after I 
> modified them:
> Before: https://imgur.com/a/PhjWXXK
> After: https://imgur.com/a/sCRE67S
> Okay some of them failed to upload but that's the gist.
>
> Thanks,
> Jack
>
