Thank you so much for all your help.

On Fri, May 31, 2019 at 11:26 PM Jennil Thiyam <[email protected]> wrote:
> So, your suggestion is to perform the fine-tuning process on this
> Bengali.traineddata?
>
> On Fri, May 31, 2019 at 11:16 PM Shree Devi Kumar <[email protected]> wrote:
>
>> https://github.com/tesseract-ocr/tessdata_best/tree/master/script
>>
>> On Fri, 31 May 2019, 23:01 Jennil Thiyam, <[email protected]> wrote:
>>
>>> What is this script/Bengali traineddata?
>>> Is it not the ben.traineddata?
>>>
>>> On Fri, May 31, 2019 at 10:55 PM Shree Devi Kumar <[email protected]> wrote:
>>>
>>>> Did you try the script/Bengali traineddata?
>>>>
>>>> Adding a character in Indic languages, where it can form many
>>>> ligatures, consonant conjuncts and different vowel forms, is like adding
>>>> many letters, so the plus-minus instructions won't work.
>>>>
>>>> You will need to do "replace a layer" type training instead.
>>>>
>>>> Regarding normalization, you should look at the text to make sure that
>>>> it is OK. I don't know the script, but my guess is that the vowel maatraa
>>>> that go on both sides of consonants may have been encoded as separate
>>>> characters rather than one.
>>>>
>>>> On Fri, 31 May 2019, 22:40 Jennil Thiyam, <[email protected]> wrote:
>>>>
>>>>> Shree Devi, any suggestions?
>>>>>
>>>>> On Fri, May 31, 2019 at 5:45 PM Jennil Thiyam <[email protected]> wrote:
>>>>>
>>>>>> Assamese uses some extra characters which are not used in Bengali and
>>>>>> our language, so I want to modify ben.traineddata. I tried using
>>>>>> asm.traineddata; it recognizes the character that I wanted, but it also
>>>>>> misrecognizes other characters as Assamese-specific characters (which
>>>>>> are not used in Bengali and our language). So I want to modify
>>>>>> ben.traineddata. I also want to know why ben.traineddata fails to
>>>>>> recognize the character (that I asked about) even though that character
>>>>>> is in the unicharset.
>>>>>>
>>>>>> On Fri, May 31, 2019 at 5:00 PM Shree Devi Kumar <[email protected]> wrote:
>>>>>>
>>>>>>> script/Bengali.traineddata is another option.
>>>>>>>
>>>>>>> On Fri, 31 May 2019, 16:58 Shree Devi Kumar, <[email protected]> wrote:
>>>>>>>
>>>>>>>> Please try asm.traineddata, which is for Assamese, which is
>>>>>>>> written in Bengali script.
>>>>>>>>
>>>>>>>> On Fri, 31 May 2019, 16:55 Jennil Thiyam, <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> How come this character is in here? It is not used in Bengali, and
>>>>>>>>> it is also not recognized by the ben.traineddata model. The character
>>>>>>>>> is in my unicharset that I got after running tesstrain.sh.
>>>>>>>>> The character is pronounced as "waa". I attached two pictures: the
>>>>>>>>> first one, wa.png, is a screenshot of the unicharset from the link
>>>>>>>>> you have given, and the picture wa_11.png is the unicharset that I
>>>>>>>>> got after running tesstrain.sh (after adding this new character to
>>>>>>>>> ben.training_text).
>>>>>>>>> The character is on line no. 35 (in wa.png) and 79 (in wa_11.png).
>>>>>>>>>
>>>>>>>>> Please help me out.
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 3:47 PM Shree Devi Kumar <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> or in
>>>>>>>>>>
>>>>>>>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/asm/asm.unicharset
>>>>>>>>>>
>>>>>>>>>> On Fri, May 31, 2019 at 3:45 PM Shree Devi Kumar <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is your new character included in
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/ben/ben.unicharset
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 31, 2019 at 3:22 PM Jennil Thiyam <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have followed the procedure (described in "Training
>>>>>>>>>>>> Tesseract 4" for fine tuning to add the plus-minus sign to
>>>>>>>>>>>> eng.traineddata) to train ben.traineddata, by adding one
>>>>>>>>>>>> character which is not in the Bengali alphabet, more than 30
>>>>>>>>>>>> times, to ben.training_text. After creating the starter training
>>>>>>>>>>>> data and then running lstmtraining, the model failed to recognize
>>>>>>>>>>>> the new character, whereas in the plus-minus case it is said that
>>>>>>>>>>>> the plus-minus sign was recognized.
>>>>>>>>>>>> Does anyone have any suggestion?
>>>>>>>>>>>> A sample of the training_text is given below:
>>>>>>>>>>>> .....
>>>>>>>>>>>> লক্ষ্যমাত্রা নির্দেশ ধ্বংস কে
>>>>>>>>>>>> দেখতে শুধু লাইব্রেরী আশা স্বাগত থাং
>>>>>>>>>>>> শতাব্দী অন্ধ্রপ্রদেশ (িপিপিপ)
>>>>>>>>>>>> সন্ধান করে অভ্যুত্থানের প্রসিদ্ধ
>>>>>>>>>>>> ময়ূরের শুরু ইন্টারেস্টিং দলের ও
>>>>>>>>>>>> পুিলেশর খ্রিস্টপূর্ব আশা প্রদর্শিত
>>>>>>>>>>>> কহীং উইকিপিডিয়াতে এ্যান্ড 19 ইঞ্চি
>>>>>>>>>>>> আছে ০ লিখতে অর্পানেট পরে এেক
>>>>>>>>>>>> ভূঁইয়ার আছে করুন, গ্লোব সেপ্টেম্বর
>>>>>>>>>>>> প্রশ্ন,
>>>>>>>>>>>> *ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>>>>>> ....
>>>>>>>>>>>> The underlined text shows the possible forms that this new
>>>>>>>>>>>> character can take. Is there any rule for adding this new
>>>>>>>>>>>> character to the training text?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/CAJxgooeysg5AfzppAXjKpREOvH2Jnz14wksMUjhsjotMJxE3bA%40mail.gmail.com
>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
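
For anyone following this thread, the two checks suggested in it (is the character really in the unicharset, and is the training text normalized the way you expect) can be sketched in Python roughly as follows. This is only a minimal sketch under assumptions, not part of the Tesseract tooling: the function names are made up for illustration, and the unicharset layout assumed here (a count on the first line, then one entry per line whose first space-separated field is the character) should be verified against your own file.

```python
import unicodedata


def describe(text):
    """List the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def find_split_vowel_signs(text):
    """Return (offset, pair) for adjacent code points that NFC would merge.

    A two-part vowel sign typed as two separate code points is reported here;
    the same vowel sign stored as one precomposed code point is not.
    """
    hits = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]
        if len(unicodedata.normalize("NFC", pair)) == 1:
            hits.append((i, pair))
    return hits


def unicharset_entries(unicharset_text):
    """Collect the first field of every entry line.

    Assumed layout (check your own file): line 1 is the entry count, and each
    following line starts with the character or cluster, then a space.
    """
    lines = unicharset_text.splitlines()[1:]
    return {line.split(" ", 1)[0] for line in lines if line.strip()}


if __name__ == "__main__":
    wa = "\u09f1"  # the "waa" letter discussed in this thread
    print(describe(wa))
    # The same vowel sign written as two code points, then as one.
    print(find_split_vowel_signs(wa + "\u09c7\u09be"))
    print(find_split_vowel_signs(wa + "\u09cb"))
    # Placeholder unicharset text; a real file carries more fields per line.
    sample = "2\nNULL 0\n\u09f1 1\n"
    print(wa in unicharset_entries(sample))
```

To use it on real data, read ben.training_text and the generated unicharset as UTF-8 and pass their contents to `find_split_vowel_signs` and `unicharset_entries`. Any hits from the first function are places where the text and the model's character set may disagree about how a vowel sign is encoded.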

