Thank you so much for all your help.

On Fri, May 31, 2019 at 11:26 PM Jennil Thiyam <[email protected]>
wrote:

> So, your suggestion is to fine-tune this
> Bengali.traineddata?
>
> On Fri, May 31, 2019 at 11:16 PM Shree Devi Kumar <[email protected]>
> wrote:
>
>> https://github.com/tesseract-ocr/tessdata_best/tree/master/script
>>
>>
>>
>> On Fri, 31 May 2019, 23:01 Jennil Thiyam, <[email protected]> wrote:
>>
>>> What is this script/Bengali traineddata?
>>> Is it not ben.traineddata?
>>>
>>> On Fri, May 31, 2019 at 10:55 PM Shree Devi Kumar <[email protected]>
>>> wrote:
>>>
>>>> Did you try script/Bengali.traineddata?
>>>>
>>>> Adding a character in an Indic language, where it can form many
>>>> ligatures, consonant conjuncts, and different vowel forms, is like adding
>>>> many letters, so the plus-minus instructions won't work.
>>>>
>>>> You will need to do replace-a-layer type training instead.
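
A rough sketch of what a replace-a-layer run might look like, written as Python that only assembles the lstmtraining argument list and does not run it. The flag names are the ones used in the Tesseract 4 training documentation; the values for --append_index and --net_spec are that tutorial's example values and are only an assumption here, since they may not suit the Bengali network. All file paths and the helper name replace_layer_command are made up for illustration.

```python
import shlex

def replace_layer_command(lstm, traineddata, listfile, out_prefix):
    """Build an lstmtraining argument list for replace-a-layer training."""
    return [
        "lstmtraining",
        "--continue_from", lstm,            # LSTM extracted from the base model
        "--traineddata", traineddata,       # starter traineddata with the new unicharset
        "--append_index", "5",              # assumption: tutorial example value
        "--net_spec", "[Lfx256 O1c111]",    # assumption: tutorial example value
        "--model_output", out_prefix,
        "--train_listfile", listfile,
        "--max_iterations", "3000",
    ]

# Hypothetical paths, for illustration only.
cmd = replace_layer_command(
    "ben.lstm",
    "bentrain/ben/ben.traineddata",
    "bentrain/ben.training_files.txt",
    "benlayer/base",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell once the paths point at real files. The ben.lstm input would first have to be extracted from ben.traineddata, for example with combine_tessdata -e.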
>>>>
>>>> Regarding normalization, you should look at the text to make sure that
>>>> it is OK. I don't know the script, but my guess is that the vowel maatraas
>>>> that go on both sides of a consonant may have been encoded as two separate
>>>> code points rather than one.
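
One way to check the normalization guess, as a minimal Python sketch. It assumes the problem is a two-part vowel sign typed as two code points (for example U+09C7 followed by U+09BE instead of the single U+09CB) and uses only the standard unicodedata module. The helper name find_decomposed is made up for illustration.

```python
import unicodedata

def find_decomposed(text):
    """Return (index, pair) for adjacent code points that NFC would merge."""
    hits = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]
        # If NFC shortens the pair, a single precomposed code point exists
        # for it (e.g. U+09C7 + U+09BE -> U+09CB).
        if len(unicodedata.normalize("NFC", pair)) < len(pair):
            hits.append((i, pair))
    return hits

# U+09F1 followed by vowel sign O typed as two separate parts.
sample = "\u09f1\u09c7\u09be"
print(find_decomposed(sample))
print(unicodedata.normalize("NFC", sample) == "\u09f1\u09cb")
```

Running find_decomposed over each line of the training text would list the places where a split vowel sign was entered. Normalizing the whole file to NFC is one possible fix, though whether that matches what the training tools expect is not something this sketch verifies.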
>>>>
>>>> On Fri, 31 May 2019, 22:40 Jennil Thiyam, <[email protected]>
>>>> wrote:
>>>>
>>>>> Shree Devi, any suggestions?
>>>>>
>>>>> On Fri, May 31, 2019 at 5:45 PM Jennil Thiyam <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Assamese uses some extra characters which are not used in Bengali or
>>>>>> our language, so I want to modify ben.traineddata. I tried using
>>>>>> asm.traineddata; it recognizes the character that I wanted, but it also
>>>>>> misrecognizes other characters as Assamese characters (which are not used
>>>>>> in Bengali or our language). So I want to modify ben.traineddata. I also
>>>>>> want to know why ben.traineddata fails to recognize the character (that I
>>>>>> asked about) even though that character is in the unicharset.
>>>>>>
>>>>>> On Fri, May 31, 2019 at 5:00 PM Shree Devi Kumar <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> script/Bengali.traineddata is another option
>>>>>>>
>>>>>>> On Fri, 31 May 2019, 16:58 Shree Devi Kumar, <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Please try asm.traineddata, which is for Assamese, which is
>>>>>>>> written in the Bengali script.
>>>>>>>>
>>>>>>>> On Fri, 31 May 2019, 16:55 Jennil Thiyam, <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> How come this character is in here? It's not used in Bengali, and it is
>>>>>>>>> also not recognized by the ben.traineddata model. The character is in the
>>>>>>>>> unicharset that I got after running tesstrain.sh.
>>>>>>>>> The character is pronounced "waa". I have attached two pictures: the
>>>>>>>>> first one, wa.png, is a screenshot of the unicharset from the link you
>>>>>>>>> gave, and the second, wa_11.png, is the unicharset that I got after
>>>>>>>>> running tesstrain.sh (after adding this new character to ben.training_text).
>>>>>>>>> The character is on line 35 (in wa.png) and line 79 (in wa_11.png).
>>>>>>>>>
>>>>>>>>> Please help me out
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 3:47 PM Shree Devi Kumar <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> or in
>>>>>>>>>>
>>>>>>>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/asm/asm.unicharset
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 31, 2019 at 3:45 PM Shree Devi Kumar <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is your new character included in
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/ben/ben.unicharset
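
A quick way to answer this kind of question without reading the file by eye, as a small Python sketch. It assumes the usual unicharset layout: the first line holds the entry count and every later line starts with the character, followed by space-separated properties. The sample lines below are made up and shortened; they are not copied from ben.unicharset, and the helper name unicharset_symbols is hypothetical.

```python
def unicharset_symbols(lines):
    """Collect the first field of each entry line in a unicharset file."""
    entries = lines[1:]  # assumption: line 1 is the entry count
    return {line.split(" ", 1)[0] for line in entries if line.strip()}

# Made-up, shortened sample; real entries carry more property fields.
sample = [
    "3",
    "NULL 0 Common 0",
    "\u0995 1 Bengali 1",
    "\u09f1 1 Bengali 2",
]
symbols = unicharset_symbols(sample)
print("\u09f1" in symbols)
```

For a real file, the lines could be read with open(path, encoding="utf-8").read().splitlines() and passed to the same function.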
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 31, 2019 at 3:22 PM Jennil Thiyam <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I followed the procedure (described in the Tesseract 4 training
>>>>>>>>>>>> documentation for fine-tuning eng.traineddata to add the plus-minus sign)
>>>>>>>>>>>> to train ben.traineddata, adding one character that is not in the
>>>>>>>>>>>> Bengali alphabet more than 30 times to ben.training_text. After
>>>>>>>>>>>> creating the starter training data and then running lstmtraining, the
>>>>>>>>>>>> model failed to recognize the new character, whereas in the plus-minus
>>>>>>>>>>>> case the documentation says the plus-minus sign was recognized.
>>>>>>>>>>>> Does anyone have any suggestions?
>>>>>>>>>>>> A sample of the training_text is given below:
>>>>>>>>>>>> .....
>>>>>>>>>>>> লক্ষ্যমাত্রা নির্দেশ ধ্বংস কে
>>>>>>>>>>>> দেখতে শুধু লাইব্রেরী আশা স্বাগত থাং
>>>>>>>>>>>> শতাব্দী অন্ধ্রপ্রদেশ (িপিপিপ)
>>>>>>>>>>>> সন্ধান করে অভ্যুত্থানের প্রসিদ্ধ
>>>>>>>>>>>> ময়ূরের শুরু ইন্টারেস্টিং দলের ও
>>>>>>>>>>>> পুিলেশর খ্রিস্টপূর্ব আশা প্রদর্শিত
>>>>>>>>>>>> কহীং উইকিপিডিয়াতে এ্যান্ড 19 ইঞ্চি
>>>>>>>>>>>> আছে ০ লিখতে অর্পানেট পরে এেক
>>>>>>>>>>>> ভূঁইয়ার আছে করুন, গ্লোব সেপ্টেম্বর
>>>>>>>>>>>> প্রশ্ন,
>>>>>>>>>>>> *ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> ....
>>>>>>>>>>>> The marked text (between asterisks) shows the possible forms that this
>>>>>>>>>>>> new character can take. Is there any rule for adding this new
>>>>>>>>>>>> character to the training text?
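
For reference, the forms listed in the sample can be enumerated programmatically. This is only a sketch of how such lines might be generated, not a rule from the Tesseract documentation. It assumes the new character is U+09F1 and that the forms of interest are the bare letter, the dependent vowel signs AA through AU (skipping the vocalic R signs), and the anusvara. The names BASE, SIGNS, and forms are made up for illustration.

```python
BASE = "\u09f1"  # assumption: the new character from the sample lines

# Empty string for the bare letter, then vowel signs AA, I, II, U, UU,
# E, AI, O, AU, and finally ANUSVARA.
SIGNS = ["", "\u09be", "\u09bf", "\u09c0", "\u09c1", "\u09c2",
         "\u09c7", "\u09c8", "\u09cb", "\u09cc", "\u0982"]

def forms(base, signs=SIGNS):
    """Return the base character combined with each sign in turn."""
    return [base + sign for sign in signs]

print(" ".join(forms(BASE)))
```

Each of the eleven results is a distinct shape the recognizer has to learn, so one new letter adds many new forms to the training text.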
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to [email protected]
>>>>>>>>>>>> .
>>>>>>>>>>>> To post to this group, send email to
>>>>>>>>>>>> [email protected].
>>>>>>>>>>>> Visit this group at
>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/CAJxgooeysg5AfzppAXjKpREOvH2Jnz14wksMUjhsjotMJxE3bA%40mail.gmail.com
>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/CAJxgooeysg5AfzppAXjKpREOvH2Jnz14wksMUjhsjotMJxE3bA%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>> .
>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>>>