I am not sure which version of Tesseract and which traineddata file you are
using. It works fine with the latest code and the traineddata files from all
three tessdata repos (tessdata, tessdata_best, and tessdata_fast).

ubuntu@tesseract-ocr:~/TEST$ tesseract pan.png - -l pan --tessdata-dir ~/tessdata --psm 6 --oem 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
ਖੀ ਜ਼ਿੰਦਗੀ ਦਾ
ਤੋਂ ਵੱਡਾ ਗੁਣ
ubuntu@tesseract-ocr:~/TEST$ tesseract pan.png - -l pan --tessdata-dir ~/tessdata_best --psm 6 --oem 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
ਖੀ ਜ਼ਿੰਦਗੀ ਦਾ
ਤੋਂ ਵੱਡਾ ਗੁਣ
ubuntu@tesseract-ocr:~/TEST$ tesseract pan.png - -l pan --tessdata-dir ~/tessdata_fast --psm 6 --oem 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
ਖੀ ਜ਼ਿੰਦਗੀ ਦਾ
ਤੋਂ ਵੱਡਾ ਗੁਣ
ubuntu@tesseract-ocr:~/TEST$
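
The suggestion quoted further down in this thread (set
"preserve_interword_spaces" to 1 and to 0 in separate runs) could be tried
from the command line roughly as follows. This is only a sketch: it assumes
the same pan.png and ~/tessdata_best as in the runs above, and the output
names out_spaces_0 and out_spaces_1 are made up for illustration.

```shell
#!/bin/sh
# Compare OCR output with preserve_interword_spaces set to 0 and to 1.
# Assumptions: pan.png and ~/tessdata_best match the runs shown above;
# out_spaces_0 / out_spaces_1 are hypothetical output base names.
IMG="${IMG:-pan.png}"
TESSDATA="${TESSDATA:-$HOME/tessdata_best}"

run_ocr() {
    # $1 is the value for preserve_interword_spaces (0 or 1).
    # tesseract appends .txt to the output base name.
    tesseract "$IMG" "out_spaces_$1" -l pan --tessdata-dir "$TESSDATA" \
        --psm 6 --oem 1 -c preserve_interword_spaces="$1"
}

if command -v tesseract >/dev/null 2>&1 && [ -f "$IMG" ]; then
    run_ocr 0
    run_ocr 1
    # diff prints the lines that differ; report explicitly when none do.
    if diff out_spaces_0.txt out_spaces_1.txt; then
        echo "No difference between the two runs"
    fi
else
    echo "tesseract or $IMG not found; skipping" >&2
fi
```

If the two files are identical, the variable is probably not what is causing
the missing spaces for that image.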



On Sun, Jan 12, 2020 at 5:29 PM neet k <[email protected]> wrote:

> I am using Tesseract to recognize text from images. The spaces between
> words are ignored for Punjabi text.
>
> Library : Tess-Two
>
> Platform : Android
>
> How can I fix the problem with spaces? I am attaching a screenshot with
> the input and output text.
>
> Regards
>
> On Tuesday, May 29, 2018 at 4:33:43 PM UTC+5:30, shree wrote:
>>
>> Set the config variable "preserve_interword_spaces" to 1 for one run
>> and to 0 for another, and see if that makes any difference.
>>
>> On Tue 29 May, 2018, 4:30 PM ShreeDevi Kumar, <[email protected]> wrote:
>>
>>> >The traineddata from tesseract does not have a spacing problem,
>>>
>>> Then the problem is related to training.
>>>
>>>
>>>
>>>
>>> On Tue 29 May, 2018, 4:16 PM Sumedhe Dissanayake, <
>>> [email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Friday, May 18, 2018 at 6:32:44 PM UTC+5:30, shree wrote:
>>>>>
>>>>> The image is not visible.
>>>>>
>>>>> ShreeDevi
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>>> On Fri, May 18, 2018 at 5:39 PM, Sumedhe Dissanayake <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Sometimes spaces between words are ignored when tesseract is used to
>>>>>> recognize Sinhala text.
>>>>>>
>>>>>> - The traineddata from tesseract does not have a spacing problem,
>>>>>> even though there were changes in tesseract since it was uploaded.
>>>>>> - The spacing problem occurs regardless of whether I start the
>>>>>> training from scratch or bootstrap with the traineddata from tesseract.
>>>>>> - The spacing problem gets worse with more training.
>>>>>> - Adding more space between the words during training does not make a
>>>>>> difference.
>>>>>> - Adding double space between the words during recognition solves the
>>>>>> problem.
>>>>>> - The spacing problem is not consistent, i.e. in the recognition of a
>>>>>> text only some of the inter-word spaces are ignored (could not figure out
>>>>>> any logic as to when it happens).
>>>>>>
>>>>>> I have attached a screenshot, comparing a sample of input and output
>>>>>> text.
>>>>>>
>>>>>> Words missing spaces are underlined.
>>>>>>
>>>>>>
>>>>>> <https://lh3.googleusercontent.com/-T6hAiA4VclA/Wv1HEKkrioI/AAAAAAAAIN4/hZors3-ZJq01n24E3_c_JFzhws90X-x9gCLcBGAs/s1600/Screenshot_20180517_143558.png>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dfba845a-abe4-48fa-b834-7c64faf54f13%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/dfba845a-abe4-48fa-b834-7c64faf54f13%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
