Does anyone have an idea why https://cloud.google.com/document-ai#section-2 gives such accurate results while plain Tesseract gives me bad ones? What could cause the difference?
[email protected] wrote on Friday, 4 June 2021 at 21:13:50 UTC+2:

> Following up: try uploading images of real-world documents. Please avoid
> taking photos of photos (that is, photos of a computer screen showing
> documents). Capture the real document and upload that.
>
> On Sat, Jun 5, 2021, 12:38 AM Ajinkya Bobade <[email protected]> wrote:
>
>> Hi Timo,
>>
>> The result is low resolution because the image you uploaded must have
>> been taken from a sample set; it was not taken with a real mobile phone
>> camera.
>>
>> I recommend uploading an image captured with a good-quality phone camera
>> and retrying a few more times with different phone-camera images. My
>> software works poorly on sample images that are not real-world photos.
>> It works very well on real-world images.
>>
>> Feel free to reach out to me if you have any questions or concerns.
>>
>> Regards
>> Ajinkya
>>
>> On Thu, Jun 3, 2021, 4:38 PM Timo Richter <[email protected]> wrote:
>>
>>> Hi Ajinkya,
>>>
>>> the result looks better than mine, but the resolution seems very low
>>> and the text is not readable. How did you do it?
>>> Still, the Google AI website is a lot more accurate. How did they
>>> do this?
>>>
>>> [email protected] wrote on Wednesday, 2 June 2021 at 17:23:44 UTC+2:
>>>
>>>> Hello,
>>>> I have created a web extension which solves this problem. Upload the
>>>> image to https://imagescanner-online.com/ and it will remove the noise
>>>> and pixel-segment the text so that you get a very good quality input,
>>>> which you can feed to Tesseract and get good output.
>>>>
>>>> Regards
>>>> Ajinkya
>>>>
>>>> On Wed, Jun 2, 2021 at 12:13 AM Timo Richter <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I have tried to OCR an identity card [1], and large parts were not
>>>>> recognised. I get nothing from the headline or the first few rows.
>>>>> From the middle onwards, Tesseract finds some of the text correctly.
>>>>> There are lines and patterns in the background, as usual. In the
>>>>> monochrome picture I could not completely separate the letters from
>>>>> the background; some gray pixels remain. But there is a website that
>>>>> does OCR and it works perfectly [2]. Why do I get bad results, and
>>>>> why does my Tesseract not read the text? What does the website do
>>>>> differently?
>>>>>
>>>>> Thank you in advance,
>>>>>
>>>>> Timo
>>>>>
>>>>> [1] https://en.wikipedia.org/wiki/Philippine_passport#/media/File:Philippine_passport_(2016_edition)_data_page.jpg
>>>>> (public domain)
>>>>> [2] https://cloud.google.com/document-ai#section-2
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4f6d0261-5e0a-49c8-b6db-3e2b0e4ad9f5n%40googlegroups.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4cc66248-46a9-4914-9aed-0b5acf91375an%40googlegroups.com.
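
On the "some gray pixels remain" part of the original question: here is a minimal sketch in plain Python of how a single global threshold (Otsu's method) can lump gray background lines together with the dark text. The pixel values are made up for illustration, and the names otsu_threshold and binarize are hypothetical helpers, not part of Tesseract. This is not a claim about what Document AI or imagescanner-online.com do internally; that is not known from this thread.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximises the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no pixels in the dark class yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Map pixels <= t to black (0) and all others to white (255)."""
    return [0 if p <= t else 255 for p in pixels]


# Invented example data: 20 dark text pixels, 15 gray pixels from
# background lines, 65 bright paper pixels.
pixels = [30] * 20 + [120] * 15 + [220] * 65

t = otsu_threshold(pixels)
print("Otsu threshold:", t)
print("black pixels with Otsu:", binarize(pixels, t).count(0))

# A hand-picked threshold between text and lines (assumed value, 75)
# keeps only the text pixels black.
print("black pixels with t=75:", binarize(pixels, 75).count(0))
```

With this invented data the automatic threshold lands on the gray level of the background lines, so they turn black together with the text, while a lower threshold keeps only the text. On a real document the gray levels overlap and vary across the page, so a fixed value like 75 will not transfer; the point is only that the binarisation step matters for what Tesseract gets to see.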

