Hi Zdenko, very good job! I've tried so many image manipulations, but that was the wrong approach for problems 1-3. The idea with the uzn file is great, and I think it's the perfect solution. Thanks :-)
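For anyone who finds this thread later, here is a minimal sketch of how such a uzn file could be generated. It assumes that each line of the file has the form "left top width height label" in pixels from the top-left corner, and that tesseract picks up a .uzn file with the same base name as the image (as in the "UZN file 1_input_r.uzn loaded." output quoted below). The zone coordinates and the helper names uzn_lines and write_uzn are made up for illustration; they are not taken from Zdenko's actual file.

```python
from pathlib import Path


def uzn_lines(zones, label="Text"):
    """Format (left, top, width, height) pixel boxes as UZN lines.

    Assumption: one zone per line, "left top width height label".
    """
    return [f"{x} {y} {w} {h} {label}" for (x, y, w, h) in zones]


def write_uzn(image_path, zones):
    """Write the zones next to the image, e.g. 1_input_r.png -> 1_input_r.uzn."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text("\n".join(uzn_lines(zones)) + "\n")
    return uzn_path


# Hypothetical boxes for a three-column table: rank, name, amount.
example_zones = [(0, 0, 60, 400), (60, 0, 300, 400), (360, 0, 80, 400)]
print("\n".join(uzn_lines(example_zones)))
```

After calling write_uzn("1_input_r.png", example_zones), the command from Zdenko's mail (tesseract 1_input_r.png - --psm 4 --oem 2) should then read the zones instead of doing its own layout analysis.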
I can confirm that scaling these images didn't help (letters more than 30 pixels high is the right explanation). What do you mean by the "end" traineddata? I have the "eng" traineddata and can't find "end.traineddata", not even on Google. I've tested it with your files and the result is perfect. Thank you, thank you, thank you!

On Saturday, 5 October 2019 at 20:24:08 UTC+2, zdenop wrote:
>
> The first image has several problems:
>
> 1. not straight baseline
> 2. different font size
> 3. table-like structure
> 4. amount/digits fields
>
> 1-3 could be solved with custom layout analysis, e.g. splitting the image into
> individual parts and sending them to tesseract via the API or a uzn file.
>
> There was an analysis (you can find it in the forum) that suggests not using
> letters higher than 30 pixels, so I also resized the input image.
>
> The LSTM engine is not (always) good at OCR of amount fields, so I suggest
> using the legacy engine for this image (you will need end.trainneddata from
> the tessdata repository).
>
> Here is the result:
> tesseract 1_input_r.png - --psm 4 --oem 2
> UZN file 1_input_r.uzn loaded.
> 15.
>
> 16.
>
> 17.
>
> 18.
>
> 19.
>
> Sophie
> Mitglied
>
> DerNick03
> Mitglied
>
> Joko
> Mitglied
>
> Jens
> Mitglied
>
> Christian
> Mitglied
>
> 76
>
> 51
>
> 0
>
> 0
>
>
> Zdenko
>
>
> On Sat, 5 Oct 2019 at 18:27, test0r man <test0r...@gmail.com> wrote:
>
>> Thanks for your test. I set the border with ImageMagick for a better
>> result on the first image. Tesseract detects all numbers correctly with
>> psm 6, but only on the second image. Have you tried the first image too?
>>
>>
>> On Saturday, 5 October 2019 at 14:52:15 UTC+2, zdenop wrote:
>>>
>>>
>>> tesseract 2_input_cropped.png - --psm 6 --oem 0
>>> 6.
>>> 7.
>>> 8.
>>> 9.
>>> 10.
>>>
>>>
>>>
>>> Zdenko
>>>
>>>
>>> On Sat, 5 Oct 2019 at 10:04, test0r man <test0r...@gmail.com> wrote:
>>>
>>>> --Push--
>>>>
>>>> Does anyone have an idea?
>>>>
>>>> Thanks for the help!
>>>>
>>>>
>>>> On Sunday, 8 September 2019 at 12:23:28 UTC+2, test0r man wrote:
>>>>>
>>>>> Hi,
>>>>> I use this command:
>>>>>
>>>>> tesseract input/image.jpg output/output --dpi 72 --oem 1 -l deu+eng
>>>>>
>>>>> to scan images like "1_input.jpg" and "2_input.jpg". The OCR result is
>>>>> good, but it seems that tesseract ignores short/single characters.
>>>>> In the first image it ignores the three "0".
>>>>> In the second image it only detects the "10.".
>>>>>
>>>>> The tessinput files are attached too.
>>>>> If I use the "--psm 6" option, none of the other words are detected
>>>>> correctly.
>>>>> If I scale the images to 300 dpi, the result is the same.
>>>>>
>>>>> Does anyone have an idea? Thanks for the help!
>>>>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7cd3752d-7fcc-44fe-bd0b-da291ea12d93%40googlegroups.com.
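
P.S. for later readers: the resizing advice quoted above (letters no higher than about 30 pixels) comes down to simple arithmetic. The following is a rough sketch only; the 45 px letter height and the 900x600 image size are invented example values, and the function names are hypothetical. It only computes the target size and leaves the actual resizing to whatever image tool you use.

```python
def scale_for_letter_height(current_px, target_px=30.0):
    """Return the factor that shrinks letters to at most target_px high.

    Letters that are already small enough are left alone (factor 1.0).
    """
    if current_px <= 0:
        raise ValueError("letter height must be positive")
    return min(1.0, target_px / current_px)


def resized_size(width, height, scale):
    """Apply the scale factor to an image size, rounding to whole pixels."""
    return (max(1, round(width * scale)), max(1, round(height * scale)))


# Invented example: letters measured at 45 px in a 900x600 image.
scale = scale_for_letter_height(45)
print(resized_size(900, 600, scale))
```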