Does anyone have an idea why https://cloud.google.com/document-ai#section-2 
is so good while I get bad results with plain Tesseract? What could cause 
this?

[email protected] wrote on Friday, 4 June 2021 at 21:13:50 UTC+2:

> Following up: try uploading images of real-world documents. Please avoid 
> taking photos of photos (that is, photos of a computer screen displaying a 
> document). Capture the real document and upload that. 
>
> On Sat, Jun 5, 2021, 12:38 AM Ajinkya Bobade <[email protected]> wrote:
>
>> Hi Timo,
>>
>> The result is in low resolution because the image you uploaded must have 
>> been taken from a sample set; it was not captured with a real mobile 
>> phone camera.
>>
>> I recommend uploading an image captured with a good-quality phone camera 
>> and retrying a few more times with different phone-camera images. 
>> My software works poorly on sample images that are not from the real 
>> world. It works very well on real-world images. 
>>
>> Feel free to reach out to me if you have any questions or concerns. 
>>
>> Regards
>> Ajinkya 
>>
>> On Thu, Jun 3, 2021, 4:38 PM Timo Richter <[email protected]> wrote:
>>
>>> Hi Ajinkya,
>>>
>>> the result looks better than mine, but it appears to be very low 
>>> resolution and the text is not readable. How did you do it?
>>> The Google AI website is still a lot more accurate. How might they have 
>>> done this?
>>>
>>>
>>> [email protected] wrote on Wednesday, 2 June 2021 at 17:23:44 UTC+2:
>>>
>>>> Hello,
>>>> I have created a web extension that solves this problem. Upload the image 
>>>> to https://imagescanner-online.com/ and it will remove the noise and 
>>>> segment the text at pixel level, giving you a very good quality input 
>>>> that you can feed to Tesseract to get good output.
>>>>
>>>> Regards
>>>> Ajinkya
>>>>
>>>> On Wed, Jun 2, 2021 at 12:13 AM Timo Richter <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I have tried to OCR an identity card [1], and large parts were not 
>>>>> recognised. I get nothing from the headline or the first few rows. 
>>>>> From the middle onwards, Tesseract finds part of the text correctly. 
>>>>> There are lines and patterns in the background, as usual. In the 
>>>>> monochrome picture I could not completely separate the letters from the 
>>>>> background; some gray pixels remain. But there is a website that does 
>>>>> OCR and it works perfectly [2]. Why do I get bad results, and why does 
>>>>> my Tesseract not read the text? What does the website do differently?
>>>>>
>>>>>
>>>>> Thank you in advance,
>>>>>
>>>>> Timo
>>>>>
>>>>>
>>>>> [1] https://en.wikipedia.org/wiki/Philippine_passport#/media/File:Philippine_passport_(2016_edition)_data_page.jpg 
>>>>> (public domain)
>>>>> [2] https://cloud.google.com/document-ai#section-2
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4f6d0261-5e0a-49c8-b6db-3e2b0e4ad9f5n%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4f6d0261-5e0a-49c8-b6db-3e2b0e4ad9f5n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>
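
The original question mentions gray pixels that stay behind after converting 
the picture to monochrome. That is a binarization problem. As a minimal 
sketch only (this is not claimed to be what Document AI or 
imagescanner-online.com do), Otsu's method picks a global threshold from the 
gray-level histogram. The function names and the toy pixel values below are 
made up for illustration; a real image would be loaded with an imaging 
library first.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no dark class yet
        w1 = total - w0
        if w1 == 0:
            break             # no bright class left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels at or below the threshold to black (0), the rest to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Hypothetical toy data: dark text (20), gray background lines (150), paper (230).
pixels = [20] * 8 + [150] * 6 + [230] * 26
threshold = otsu_threshold(pixels)   # 20 for this toy data
mono = binarize(pixels, threshold)   # the gray lines end up white, the text black
```

A single global threshold like this tends to fail when the background 
pattern is nearly as dark as the text or the lighting is uneven, which may 
be part of why gray pixels remain on a document with security patterns. 
Locally adaptive thresholding is the usual next thing to try.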
