"end" is a typo ;-) It should be read as "eng" :-)

On Sat, 5 Oct 2019 at 21:31, test0r man <test0rman...@gmail.com> wrote:
> Hi Zdenko,
>
> Very good job! I tried so many image manipulations, but that was the
> wrong approach for problems 1-3. The idea with the uzn file is great and I
> think it is the perfect solution. Thanks :-)
>
> I can confirm that scaling these images didn't help (more than 30 pixels
> per letter is the right explanation).
>
> What do you mean by the "end" traineddata? I have the "eng" traineddata
> and can't find "end.traineddata", not even on Google.
>
> I've tested it with your files and the result is perfect. Thank you, thank you,
> thank you!
>
>
> On Saturday, 5 October 2019 20:24:08 UTC+2, zdenop wrote:
>>
>> The first image has several problems:
>>
>> 1. not straight baseline
>> 2. different font sizes
>> 3. table-like structure
>> 4. amount/digit fields
>>
>>
>> 1-3 could be solved with custom layout analysis, e.g. splitting the image into
>> individual parts and sending them to tesseract via the API or a uzn file.
>>
>> There was an analysis (you can find it in the forum) that suggests not using
>> letters higher than 30 pixels, so I also resized the input image.
>>
>> The LSTM engine is not (always) good at OCR of amount fields, so I suggest
>> using the legacy engine for this image (you will need end.traineddata from
>> the tessdata repository).
>>
>> Here is the result:
>> tesseract 1_input_r.png - --psm 4 --oem 2
>> UZN file 1_input_r.uzn loaded.
>> 15.
>>
>> 16.
>>
>> 17.
>>
>> 18.
>>
>> 19.
>>
>> Sophie
>> Mitglied
>>
>> DerNick03
>> Mitglied
>>
>> Joko
>> Mitglied
>>
>> Jens
>> Mitglied
>>
>> Christian
>> Mitglied
>>
>> 76
>>
>> 51
>>
>> 0
>>
>> 0
>>
>>
>> Zdenko
>>
>>
>> On Sat, 5 Oct 2019 at 18:27, test0r man <test0r...@gmail.com> wrote:
>>
>>> Thanks for your test. I added the border with ImageMagick for a better
>>> result on the first image. With psm 6 tesseract detects all numbers correctly,
>>> but only on the second image. Have you tried the first image too?
>>>
>>>
>>> On Saturday, 5 October 2019 14:52:15 UTC+2, zdenop wrote:
>>>>
>>>>
>>>> tesseract 2_input_cropped.png - --psm 6 --oem 0
>>>> 6.
>>>> 7.
>>>> 8.
>>>> 9.
>>>> 10.
>>>>
>>>>
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sat, 5 Oct 2019 at 10:04, test0r man <test0r...@gmail.com> wrote:
>>>>
>>>>> --Push--
>>>>>
>>>>> Does anyone have an idea?
>>>>>
>>>>> Thanks for the help!
>>>>>
>>>>>
>>>>> On Sunday, 8 September 2019 12:23:28 UTC+2, test0r man wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I use this command:
>>>>>>
>>>>>> tesseract input/image.jpg output/output --dpi 72 --oem 1 -l deu+eng
>>>>>>
>>>>>> to scan images like "1_input.jpg" and "2_input.jpg". The OCR result is
>>>>>> good, but it seems that tesseract ignores short/single characters.
>>>>>> In the first image it ignores the three "0".
>>>>>> In the second image it only detects the "10.".
>>>>>>
>>>>>> The tessinput files are attached too.
>>>>>> If I use the "--psm 6" option, all the other words are not detected
>>>>>> correctly.
>>>>>> If I scale the images to 300 dpi, the result is the same.
>>>>>>
>>>>>> Does anyone have an idea? Thanks for the help!
>>>>>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8yykvwuDqfMb-r3OaS_-HJvFWe0882aHzKXnJvLbcK%3DgA%40mail.gmail.com.
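
For anyone who wants to script the workflow from this thread, here is a rough Python sketch of the three steps Zdenko describes: shrink the image so letters are at most about 30 pixels high, describe the regions in a uzn file, and run the quoted command. It is only an illustration under some assumptions. The helper names and the zone coordinates are made up for this example. The uzn layout used here (one zone per line: left, top, width, height, label) is the commonly described one, so please check it against your tesseract version. The sketch only prepares the inputs and the argument list; it does not call tesseract or resize anything itself.

```python
from pathlib import Path

# Letter height limit mentioned in the thread (in pixels).
MAX_LETTER_HEIGHT_PX = 30


def scale_factor(letter_height_px, target_px=MAX_LETTER_HEIGHT_PX):
    """Resize factor that brings letters down to target_px (never upscales)."""
    if letter_height_px <= 0:
        raise ValueError("letter height must be positive")
    return min(1.0, target_px / letter_height_px)


def uzn_path(image):
    """The uzn file sits next to the image and shares its base name."""
    return Path(image).with_suffix(".uzn")


def uzn_lines(zones):
    """Format (left, top, width, height, label) tuples, one zone per line.

    Assumption: this is the plain-text uzn layout; coordinates are pixels
    measured from the top-left corner of the image.
    """
    return [f"{left} {top} {width} {height} {label}"
            for left, top, width, height, label in zones]


def write_uzn(image, zones):
    """Write the zone file for the given image and return its path."""
    path = uzn_path(image)
    path.write_text("\n".join(uzn_lines(zones)) + "\n")
    return path


def tesseract_command(image, psm=4, oem=2):
    """Argument list mirroring the command quoted in the thread.

    --psm 4 and --oem 2 are the values from the quoted command;
    "-" sends the recognized text to stdout.
    """
    return ["tesseract", str(image), "-", "--psm", str(psm), "--oem", str(oem)]


if __name__ == "__main__":
    # Hypothetical zones for a three-column table (rank, name, amount).
    zones = [
        (0, 0, 60, 400, "Text"),
        (60, 0, 300, 400, "Text"),
        (360, 0, 80, 400, "Text"),
    ]
    print(scale_factor(60))               # letters 60 px high -> halve the image
    print("\n".join(uzn_lines(zones)))
    print(" ".join(tesseract_command("1_input_r.png")))
```

The argument list can be passed to `subprocess.run` once the resized image and the uzn file are in place. The legacy engine also needs a traineddata file that contains the legacy model, which is why the tessdata repository is mentioned in the thread.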