Ajinkya and I basically agree that Tesseract has to be retrained for this 
specific case. What helped me quite a lot is a walkthrough by Guiem: 
https://medium.com/@guiem/how-to-train-tesseract-4-ebe5881ff3b7
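For reference, one common way to fine-tune Tesseract 4 is the tesstrain Makefile, though it is not necessarily the route the walkthrough above takes. tesstrain reads single-line images (.tif or .png) paired with same-named .gt.txt transcriptions from data/<MODEL_NAME>-ground-truth/. The sketch below only prepares that folder. The model name "idcard" and the file names are hypothetical, and the exact layout is worth checking against the tesstrain README for the version you use.

```python
"""Sketch: lay out line images + transcriptions the way tesstrain expects.

Assumption: fine-tuning with the tesstrain Makefile, which reads pairs of
line images (*.tif / *.png) and same-named *.gt.txt transcriptions from
data/<MODEL_NAME>-ground-truth/. The model name "idcard" is hypothetical.
"""
from pathlib import Path
import shutil


def write_ground_truth(samples: dict[str, str], out_dir: Path) -> list[Path]:
    """Copy each line image into out_dir and write <stem>.gt.txt next to it.

    samples maps a line-image path to its exact transcription (one line).
    Returns the list of written .gt.txt paths.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path, text in samples.items():
        src = Path(image_path)
        # tesstrain trains on single text lines, so reject multi-line labels.
        if "\n" in text:
            raise ValueError(f"{src.name}: a transcription must be a single line")
        dst = out_dir / src.name
        if src.resolve() != dst.resolve():
            shutil.copy2(src, dst)  # keep the image next to its transcription
        gt = out_dir / (src.stem + ".gt.txt")
        gt.write_text(text + "\n", encoding="utf-8")
        written.append(gt)
    return written
```

Once the folder is filled, training would be started from the tesstrain checkout with something like make training MODEL_NAME=idcard START_MODEL=eng, where START_MODEL names the existing traineddata to fine-tune from (adjust it to the language on the IDs; tesstrain also needs to know where that model lives, see its README).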

Some image cropping will probably also be needed.
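A minimal sketch of that kind of cutting, assuming each photo has already been located and perspective-corrected so the card fills the frame (a separate step not shown here). The field names and fractional coordinates in ID_LAYOUT are hypothetical placeholders to be measured on the real ID type. The returned (left, top, right, bottom) tuples are in the form Pillow's Image.crop() accepts.

```python
"""Sketch: cut fixed fields out of a card photo using a relative template.

Assumptions: the card has already been located and perspective-corrected so
it fills the whole image; the field layout below is hypothetical and must be
measured on your own ID type.
"""
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical layout: fractions of card width/height (left, top, right, bottom).
ID_LAYOUT: Dict[str, Tuple[float, float, float, float]] = {
    "surname": (0.35, 0.20, 0.95, 0.30),
    "given_names": (0.35, 0.32, 0.95, 0.42),
    "birth_date": (0.35, 0.55, 0.70, 0.64),
}


def field_boxes(width: int, height: int,
                layout: Dict[str, Tuple[float, float, float, float]],
                pad: float = 0.01) -> Dict[str, Box]:
    """Convert relative field rectangles to padded pixel boxes clamped to the image."""
    boxes = {}
    for name, (l, t, r, b) in layout.items():
        left = max(0, round((l - pad) * width))
        top = max(0, round((t - pad) * height))
        right = min(width, round((r + pad) * width))
        bottom = min(height, round((b + pad) * height))
        boxes[name] = (left, top, right, bottom)
    return boxes
```

Each box can then be passed to Image.open(path).crop(box). Cropping every field into its own image also makes it possible to pick a page segmentation mode per field, for example --psm 7 (single text line) for the name lines.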

- Tom
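The original question quoted below reports that every setting behaves differently on different photos, so it may help to score settings systematically against a few hand-transcribed samples. This sketch assumes the tesseract 4.x CLI is on PATH. It sets the load_system_dawg and load_freq_dawg config variables to 0, which is the usual way to disable the dictionaries (how much this changes the LSTM engine's output is worth verifying). The PSM list and the "eng" language code are only examples. Accuracy here means 1 minus the character error rate, with whitespace ignored.

```python
"""Sketch: score several Tesseract settings against hand-checked transcriptions.

Assumptions: the tesseract 4.x CLI is on PATH; load_system_dawg and
load_freq_dawg are set to 0 to turn off the dictionaries; the PSM values and
the language code are only examples to adapt.
"""
import shutil
import subprocess
from typing import Dict, List, Sequence


def tesseract_cmd(image: str, psm: int, lang: str = "eng",
                  no_dict: bool = True) -> List[str]:
    """Build a tesseract command line that prints the recognised text to stdout."""
    cmd = ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]
    if no_dict:
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return cmd


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def char_accuracy(truth: str, ocr: str) -> float:
    """1 - character error rate, clipped at 0; whitespace is ignored."""
    t, o = "".join(truth.split()), "".join(ocr.split())
    if not t:
        return 1.0 if not o else 0.0
    return max(0.0, 1.0 - levenshtein(t, o) / len(t))


def sweep(image: str, truth: str,
          psms: Sequence[int] = (6, 7, 11, 13)) -> Dict[int, float]:
    """Run each PSM on one image and return {psm: character accuracy}."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    scores = {}
    for psm in psms:
        out = subprocess.run(tesseract_cmd(image, psm), capture_output=True,
                             text=True, check=True).stdout
        scores[psm] = char_accuracy(truth, out)
    return scores
```

For example, sweep("crops/surname_001.png", "NOVAK") (hypothetical file and transcription) returns a mapping from PSM to accuracy. Averaging these scores over a few dozen crops gives a fairer comparison between settings than judging them on single photos.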

On Friday, February 21, 2020 at 5:17:45 PM UTC+1, Adrian Enders wrote:
>
> Would you be willing to share some of these steps here in the group? I 
> would be curious to know what techniques you use. I have worked on this 
> in the past as well, with other technologies.
>
> - Adrian
>
> On Friday, February 21, 2020 at 1:46:21 AM UTC-7, Ajinkya Bobade wrote:
>>
>> Hello, 
>>
>> I have solved this problem for multiple clients over the past 3 years. I 
>> can walk you through the steps.
>> You can reach out at [email protected]
>>
>> Regards
>> Ajinkya 
>>
>> On Fri, Feb 21, 2020 at 10:42 AM Tom Apeltauer <[email protected]> wrote:
>>
>>> Greetings everyone,
>>>
>>> I am facing quite a challenging task: optical character recognition of 
>>> the data on IDs photographed with a smartphone camera. I have tried 
>>> Tesseract as-is, but the accuracy rate is somewhere around 40%.
>>>
>>> I have started tweaking things: disabling dictionaries, preprocessing 
>>> images to grayscale and trying different page segmentation modes, but 
>>> each setting gives different accuracy on different photos.
>>>
>>> Since you are experts in the field, are there any tips you could give 
>>> me? See an example here:
>>>
>>> https://drive.google.com/open?id=14PDZlbJ-HNFcHsPlE28cBT5VIxV9ceqW
>>>
>>> Don't mind the red parts; I have applied at least some basic 
>>> "protection". Obviously, you have seen nothing.
>>>
>>> Thanks!
>>> Tom A.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/4c9e4681-23dd-435b-a6f2-73ab78a122e7%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/tesseract-ocr/4c9e4681-23dd-435b-a6f2-73ab78a122e7%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>
