Martin, I'd normally reply privately here, but I don't think that's an option given the Google Groups configuration.
I know you didn't ask about this specifically, but I ran your sample image, unmodified, through AWS Textract and got great results. If you have a wide range of inputs, image quality, etc., I'm happy to run a small subset of your images through it. Please contact me off-list: keith a_ t_ techtravels dot org.

Thanks,
Keith

On Tuesday, December 21, 2021 at 5:08:21 AM UTC-5 [email protected] wrote:

> I have an image (the label of a microscopy slide) which I thought would be
> easy to OCR, because it is easily readable for humans. I am using the
> latest Tesseract V5 from the command line under Windows. However, with
>
> tesseract image.jpg image.txt --oem 1 --psm x
>
> where x is any of the numbers I tried, the results are poor: it
> misses the bottom line with "LOT40446" and thinks "+" is a "4" after
> binarization of the image I post here. Is there anything I can do to
> improve the results?
>
> I tried:
>
> - Binarizing the image
> - Setting DPI to 300 dpi
>
> With these, it produced:
>
> *| +125 PROCock tai*
> * | 12/03/2021*
> *| 36729/21 344*
>
> Do you have any suggestions for improvement? On a side note, I tried the
> a9t9 library available in Windows 10, which was a lot better but also had
> weaknesses.
>
> [image: JBOBF.jpg]
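For anyone who wants to try the Textract route mentioned at the top of this message, here is a minimal sketch of how such a run could be scripted with boto3. It is not necessarily how the result above was produced. It assumes boto3 is installed and AWS credentials are already configured; the helper names `extract_lines` and `ocr_image` and the default region are my own choices for illustration.

```python
def extract_lines(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_image(path, region_name="us-east-1"):
    """Send a local JPEG/PNG to Textract and return the detected text lines.

    Assumes valid AWS credentials; the region is an arbitrary default.
    """
    import boto3  # imported here so extract_lines can be used without boto3

    client = boto3.client("textract", region_name=region_name)
    with open(path, "rb") as f:
        # Synchronous text detection on raw image bytes
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return extract_lines(response)


# Example call (needs credentials and a real file):
# print("\n".join(ocr_image("image.jpg")))
```

Textract returns PAGE, LINE and WORD blocks; keeping only the LINE blocks gives output that is roughly comparable to the line-oriented text Tesseract writes.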

