[tesseract-ocr] Re: Microscopy label, poor recognition

Keith M Tue, 21 Dec 2021 08:30:50 -0800

Martin,

I'd normally reply privately here, but I don't think that's an option given 
the Google Groups configuration.


I know you didn't ask this specifically, but I ran your sample image, 
unmodified, through AWS Textract and got great results. If you have a wide 
range of inputs and image quality, I'm happy to run a small subset of your 
images through it.
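
For anyone who wants to try the same kind of run themselves, here is a rough 
Python sketch using boto3's Textract client. It is only an illustration, not 
necessarily how the run above was done: the function names and the default 
region are my own placeholders, and it assumes boto3 is installed and AWS 
credentials are already configured.

```python
def extract_lines(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_with_textract(image_path, region_name="us-east-1"):
    """Send one local JPEG/PNG to Textract and return the detected lines.

    region_name is an assumed default; use whichever region your account uses.
    """
    import boto3  # imported lazily so extract_lines works without boto3

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return extract_lines(response)
```

Calling ocr_with_textract("JBOBF.jpg") would return a list of strings, one 
per detected text line, which makes it easy to check whether a line such as 
"LOT40446" was picked up.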

Please contact me off-list: keith a_ t_ techtravels dot org.

Thanks,
Keith


On Tuesday, December 21, 2021 at 5:08:21 AM UTC-5 [email protected] 
wrote:

> I have an image (the label of a microscopy slide) that I thought would be 
> easy to OCR because it is easily readable by humans. I am using the 
> latest Tesseract v5 from the command line under Windows. However, with
> tesseract image.jpg image.txt --oem 1 --psm x 
>
> where x is any of the "--psm" values I tried, the results are poor: it 
> misses the bottom line with "LOT40446" and thinks "+" is a "4" after 
> binarization of the image I post here. Is there anything I can do to 
> improve the results? 
>
> I tried:
>
> - Binarizing the image
>
> - Setting DPI to 300 dpi
>
> With both of these applied, it produced: 
>
> *| +125 PROCock tai*
>
> * | 12/03/2021*
>
> *| 36729/21 344*
>
>
> Do you have any suggestions for improvement? On a side note, I tried 
> a9t9, an OCR library available for Windows 10, which was a lot better but 
> also had weaknesses.
>
> [image: JBOBF.jpg] 
>
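
The steps described in the quoted message (binarizing, setting 300 dpi, and 
trying several --psm values) could be scripted roughly as follows. This is a 
sketch under assumptions, not the original poster's procedure: Pillow is an 
assumed dependency, the fixed threshold of 128 and the list of modes are 
arbitrary starting points, and the function names are made up for 
illustration. It relies on the tesseract executable being on the PATH and 
uses "stdout" as the output base so the text can be captured directly.

```python
import subprocess


def binarize(src_path, dst_path, threshold=128, dpi=300):
    """Save a black-and-white copy of src_path tagged with the given DPI."""
    from PIL import Image  # Pillow, imported lazily; an assumed dependency

    gray = Image.open(src_path).convert("L")
    # Pixels brighter than the threshold become white, the rest black.
    bw = gray.point(lambda p: 255 if p > threshold else 0)
    bw.save(dst_path, dpi=(dpi, dpi))


def build_command(image_path, psm, oem=1, dpi=300):
    """Build a tesseract command line that prints recognized text to stdout."""
    return [
        "tesseract", image_path, "stdout",
        "--oem", str(oem), "--psm", str(psm), "--dpi", str(dpi),
    ]


def sweep_psm(image_path, psm_values=(3, 4, 6, 11, 12)):
    """Run tesseract once per page segmentation mode and collect the output."""
    results = {}
    for psm in psm_values:
        proc = subprocess.run(
            build_command(image_path, psm), capture_output=True, text=True
        )
        results[psm] = proc.stdout
    return results
```

Running binarize("image.jpg", "label_bw.png") and then printing the items of 
sweep_psm("label_bw.png") puts the output of each mode side by side, which 
makes it easier to see which one, if any, recovers the bottom line.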

