
test0r man Sat, 05 Oct 2019 12:31:59 -0700

Hi Zdenko,

Very good job! I tried so many image manipulations, but that was the 
wrong approach for problems 1-3. The idea with the uzn file is great and, I 
think, the perfect solution. Thanks :-)
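
For anyone else trying this: as far as I understand it, a uzn file is a plain-text list of zones, one per line, in the form "left top width height label" (pixels, measured from the top-left corner). Below is a minimal Python sketch for generating one, assuming that format. The function name write_uzn and any coordinates passed to it are made up for illustration and are not taken from the attached files.

```python
from pathlib import Path


def write_uzn(image_path, zones):
    """Write a zone file with the same base name as image_path.

    zones: iterable of (left, top, width, height, label) tuples in pixels.
    The line layout "left top width height label" is an assumption about
    the uzn format, not something stated in this thread.
    """
    lines = []
    for left, top, width, height, label in zones:
        if width <= 0 or height <= 0:
            raise ValueError(f"zone {label!r} has non-positive size")
        if not label or any(ch.isspace() for ch in label):
            raise ValueError("label must be a single word")
        lines.append(f"{left} {top} {width} {height} {label}")

    # 1_input_r.png -> 1_input_r.uzn, next to the image
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return uzn_path
```

Calling write_uzn("1_input_r.png", zones) with one zone per table column writes 1_input_r.uzn next to the image, which is the file name tesseract reports loading in the --psm 4 run quoted below.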


I can confirm that scaling these images up didn't help (letters taller than 30 
pixels is the right explanation).
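
To make the 30 pixel rule concrete, here is a small sketch of the resize arithmetic, assuming the letter height has already been measured by hand. The function names and the default of 30 are only illustrative; the sketch never upscales, since upscaling is what pushed the letters over the limit here.

```python
def scale_factor(letter_height_px, target_height_px=30):
    """Factor to resize an image by so letters are at most target_height_px tall.

    Returns 1.0 when the letters are already small enough (no upscaling).
    """
    if letter_height_px <= 0 or target_height_px <= 0:
        raise ValueError("heights must be positive")
    return min(1.0, target_height_px / letter_height_px)


def resized_dimensions(width, height, letter_height_px, target_height_px=30):
    """New (width, height) in pixels after applying scale_factor."""
    factor = scale_factor(letter_height_px, target_height_px)
    return max(1, round(width * factor)), max(1, round(height * factor))
```

The resulting dimensions can then be passed to whatever tool does the actual resizing.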

What do you mean by the "end" traineddata? I have the "eng" traineddata 
and can't find "end.traineddata", not even on Google.

I've tested it with your files and the result is perfect. Thank you, thank you, 
thank you!


On Saturday, 5 October 2019 20:24:08 UTC+2, zdenop wrote:
>
> The first image has several problems:
>
>    1. baseline not straight
>    2. different font sizes
>    3. table-like structure
>    4. amount/digit fields
>
>
> 1-3 could be solved with custom layout analysis, e.g. splitting the image into 
> individual parts and sending them to tesseract via the API or a uzn file. 
>
> There was an analysis (you can find it in the forum) that suggests not using 
> letters taller than 30 pixels, so I also resized the input image.
>
> The LSTM engine is not (always) good at OCR of amount fields, so I suggest 
> using the legacy engine for this image (you will need end.traineddata from the 
> tessdata repository).
>
> Here is result:
> tesseract 1_input_r.png - --psm 4 --oem 2
> UZN file 1_input_r.uzn loaded.
> 15.
>
> 16.
>
> 17.
>
> 18.
>
> 19.
>
> Sophie
> Mitglied
>
> DerNick03
> Mitglied
>
> Joko
> Mitglied
>
> Jens
> Mitglied
>
> Christian
> Mitglied
>
> 76
>
> 51
>
> 0
>
> 0
>
>
> Zdenko
>
>
> On Sat, 5 Oct 2019 at 18:27, test0r man <[email protected]> wrote:
>
>> Thanks for your test. I added the border with ImageMagick to get a better 
>> result on the first image. With psm 6, tesseract detects all numbers correctly, 
>> but only on the second image. Have you tried the first image too?
>>
>>
>> On Saturday, 5 October 2019 14:52:15 UTC+2, zdenop wrote:
>>>
>>>
>>> tesseract 2_input_cropped.png - --psm 6 --oem 0
>>> 6.
>>> 7.
>>> 8.
>>> 9.
>>> 10.
>>>
>>>
>>>
>>> Zdenko
>>>
>>>
>>> On Sat, 5 Oct 2019 at 10:04, test0r man <[email protected]> wrote:
>>>
>>>> --Push--
>>>>
>>>> Does anyone have an idea?
>>>>
>>>> Thanks for the help!
>>>>
>>>>
>>>> On Sunday, 8 September 2019 12:23:28 UTC+2, test0r man wrote:
>>>>>
>>>>> Hi,
>>>>> I use this command:
>>>>>
>>>>> tesseract input/image.jpg output/output --dpi 72 --oem 1 -l deu+eng
>>>>>
>>>>> to OCR images like "1_input.jpg" and "2_input.jpg". The OCR result is 
>>>>> good, but it seems that tesseract ignores short/single characters.
>>>>> In the first image it ignores the three "0"s.
>>>>> In the second image it only detects the "10.".
>>>>>
>>>>> The tessinput files are attached too.
>>>>> If I use the "--psm 6" option, the other words are no longer detected 
>>>>> correctly.
>>>>> If I scale the images to 300 dpi, the result is the same.
>>>>>
>>>>> Does anyone have an idea? Thanks for the help!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/6bb8a731-afa3-4dbf-a805-90b9120b791b%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/6bb8a731-afa3-4dbf-a805-90b9120b791b%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>


Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?
