Ajinkya and I basically agree that Tesseract has to be retrained for this specific case. What helped me quite a lot was a walkthrough by Guiem: https://medium.com/@guiem/how-to-train-tesseract-4-ebe5881ff3b7
Some image cropping will probably be needed as well.

- Tom

On Friday, February 21, 2020 at 17:17:45 UTC+1, Adrian Enders wrote:
>
> Would you be willing to share some of these steps here in the group? I
> would be curious to know what some of your techniques are. I have worked
> with this in the past as well with other technologies.
>
> - Adrian
>
> On Friday, February 21, 2020 at 1:46:21 AM UTC-7, Ajinkya Bobade wrote:
>>
>> Hello,
>>
>> I have solved this problem for multiple clients over the past 3 years. I can
>> walk you through the steps.
>> You can reach out at [email protected]
>>
>> Regards
>> Ajinkya
>>
>> On Fri, Feb 21, 2020 at 10:42 AM Tom Apeltauer <[email protected]> wrote:
>>
>>> Greetings everyone,
>>>
>>> I am facing quite a challenging task: optical character recognition
>>> of data from IDs photographed with a smartphone camera. I have
>>> tried Tesseract as-is, but the accuracy rate is somewhere around 40%.
>>>
>>> I have started tweaking things: disabling dictionaries, preprocessing
>>> images to grayscale, and using different page segmentation methods, but
>>> each setting gives a different accuracy on different photos.
>>>
>>> I am asking you as experts in the field: are there any tips you
>>> could give me? See the example here:
>>>
>>> https://drive.google.com/open?id=14PDZlbJ-HNFcHsPlE28cBT5VIxV9ceqW
>>>
>>> Don't mind the red parts. I have applied at least some basic
>>> "protection". You have seen nothing, obviously.
>>>
>>> Thanks!
>>> Tom A.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/4c9e4681-23dd-435b-a6f2-73ab78a122e7%40googlegroups.com
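
The original question mentions trying different page segmentation methods and disabling dictionaries, with accuracy varying from photo to photo. A minimal sketch of how such a comparison could be made systematic is given here. It assumes the `tesseract` command-line binary is installed and on the PATH; the function names, the default language, and the chosen page segmentation modes are illustrative assumptions, not anything used in this thread. It does not cover retraining or cropping.

```python
import subprocess
from difflib import SequenceMatcher


def build_tesseract_cmd(image_path, psm=6, lang="eng", use_dictionaries=True):
    """Build the argument list for one Tesseract CLI run that prints text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if not use_dictionaries:
        # Turn off the word-list dictionaries; ID fields are mostly names and numbers.
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return cmd


def char_accuracy(expected, recognized):
    """Rough similarity in [0, 1] between ground truth and OCR output."""
    return SequenceMatcher(None, expected, recognized).ratio()


def compare_settings(image_path, expected_text, psm_modes=(4, 6, 11)):
    """Run Tesseract once per setting; return {(psm, use_dictionaries): accuracy}."""
    scores = {}
    for psm in psm_modes:
        for use_dicts in (True, False):
            cmd = build_tesseract_cmd(image_path, psm, use_dictionaries=use_dicts)
            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
            scores[(psm, use_dicts)] = char_accuracy(expected_text, result.stdout.strip())
    return scores
```

Calling `compare_settings("id_photo.png", "hand-typed ground truth")` (both arguments hypothetical) on a handful of photos would show whether any single setting is consistently better, or whether the variation is large enough that retraining is the more promising route.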

