Did you try script/Bengali.traineddata? In Indic scripts a single character can form many ligatures, consonant conjuncts, and vowel forms, so adding one character is like adding many letters. That is why the plus-minus fine-tuning instructions won't work here.
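As a rough illustration of why one new consonant behaves like many letters, here is a small Python sketch (not part of the Tesseract tooling). It assumes the character in question is U+09F1 and takes the list of vowel signs from the sample training text quoted further down; `forms` and `is_nfc` are hypothetical helper names. It lists each consonant+sign form and reports whether it is in Unicode NFC, since a two-part vowel typed as two code points is not.

```python
import unicodedata

# Assumption: the "wa" letter discussed in this thread is U+09F1.
WA = "\u09f1"

# Dependent vowel signs plus anusvara, as they appear in the sample
# training text in this thread (aa, i, ii, u, uu, e, ai, o, au, anusvara).
SIGNS = ["\u09be", "\u09bf", "\u09c0", "\u09c1", "\u09c2",
         "\u09c7", "\u09c8", "\u09cb", "\u09cc", "\u0982"]


def forms(consonant, signs=SIGNS):
    """Return the bare consonant followed by each consonant+sign form."""
    return [consonant] + [consonant + s for s in signs]


def is_nfc(text):
    """True if the text is already in Unicode NFC form."""
    return unicodedata.normalize("NFC", text) == text


if __name__ == "__main__":
    for form in forms(WA):
        names = ", ".join(unicodedata.name(ch) for ch in form)
        print(form, is_nfc(form), names)

    # The vowel sign O (U+09CB) can also be typed as its two halves,
    # U+09C7 followed by U+09BE; that spelling is not in NFC.
    split_o = WA + "\u09c7\u09be"
    print(split_o, is_nfc(split_o))
```

Running the same `is_nfc` check over each line of a training text is one way to spot vowel signs that were entered as separate parts. Conjuncts are not covered by this sketch; they would add further forms on top of these.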
You will need to use "replace a layer" type training instead. Regarding normalization, you should look at the text to make sure that it is OK. I don't know the script, but my guess is that the vowel maatraas that go on both sides of a consonant may have been encoded as two separate code points rather than one.

On Fri, 31 May 2019, 22:40 Jennil Thiyam wrote:

> Shree Devi, any suggestions?
>
> On Fri, May 31, 2019 at 5:45 PM Jennil Thiyam wrote:
>
>> Assamese uses some extra characters that are not used in Bengali or in
>> our language, so I want to modify ben.traineddata. I tried
>> asm.traineddata; it recognizes the character I wanted, but it also
>> misrecognizes other characters as Assamese characters (which are not
>> used in Bengali or our language). So I want to modify ben.traineddata.
>> I also want to know why ben.traineddata fails to recognize the
>> character I asked about even though it is in the unicharset.
>>
>> On Fri, May 31, 2019 at 5:00 PM Shree Devi Kumar wrote:
>>
>>> script/Bengali.traineddata is another option.
>>>
>>> On Fri, 31 May 2019, 16:58 Shree Devi Kumar wrote:
>>>
>>>> Please try asm.traineddata, which is for Assamese, a language
>>>> written in Bengali script.
>>>>
>>>> On Fri, 31 May 2019, 16:55 Jennil Thiyam wrote:
>>>>
>>>>> How come this character is in here? It is not used in Bengali and
>>>>> is not recognized by the ben.traineddata model. The character is
>>>>> in the unicharset that I got after running tesstrain.sh.
>>>>> The character is pronounced "waa". I attached two pictures: the
>>>>> first one, wa.png, is a screenshot of the unicharset from the link
>>>>> you gave, and wa_11.png is the unicharset that I got after running
>>>>> tesstrain.sh (after adding this new character to ben.training_text).
>>>>> The character is on line 35 (in wa.png) and line 79 (in wa_11.png).
>>>>>
>>>>> Please help me out.
>>>>>
>>>>> On Fri, May 31, 2019 at 3:47 PM Shree Devi Kumar wrote:
>>>>>
>>>>>> or in
>>>>>>
>>>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/asm/asm.unicharset
>>>>>>
>>>>>> On Fri, May 31, 2019 at 3:45 PM Shree Devi Kumar wrote:
>>>>>>
>>>>>>> Is your new character included in
>>>>>>>
>>>>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/ben/ben.unicharset
>>>>>>>
>>>>>>> On Fri, May 31, 2019 at 3:22 PM Jennil Thiyam wrote:
>>>>>>>
>>>>>>>> I followed the procedure described in "Training Tesseract 4"
>>>>>>>> for fine-tuning (the one that adds the plus-minus sign to
>>>>>>>> eng.traineddata) to train ben.traineddata, adding one character
>>>>>>>> that is not in the Bengali alphabet more than 30 times to
>>>>>>>> ben.training_text. After creating the starter training data and
>>>>>>>> then running lstmtraining, the model failed to recognize the
>>>>>>>> new character, whereas in the plus-minus case it is said that
>>>>>>>> the plus-minus sign was recognized.
>>>>>>>> Does anyone have any suggestions?
>>>>>>>> A sample of the training_text is given below:
>>>>>>>> .....
>>>>>>>> লক্ষ্যমাত্রা নির্দেশ ধ্বংস কে
>>>>>>>> দেখতে শুধু লাইব্রেরী আশা স্বাগত থাং
>>>>>>>> শতাব্দী অন্ধ্রপ্রদেশ (িপিপিপ)
>>>>>>>> সন্ধান করে অভ্যুত্থানের প্রসিদ্ধ
>>>>>>>> ময়ূরের শুরু ইন্টারেস্টিং দলের ও
>>>>>>>> পুিলেশর খ্রিস্টপূর্ব আশা প্রদর্শিত
>>>>>>>> কহীং উইকিপিডিয়াতে এ্যান্ড 19 ইঞ্চি
>>>>>>>> আছে ০ লিখতে অর্পানেট পরে এেক
>>>>>>>> ভূঁইয়ার আছে করুন, গ্লোব সেপ্টেম্বর
>>>>>>>> প্রশ্ন,
>>>>>>>> *ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>>>>> ....
>>>>>>>> The underlined text shows the possible forms that this new
>>>>>>>> character can take. Is there any rule for adding this new
>>>>>>>> character to the training text?
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/CAJxgooeysg5AfzppAXjKpREOvH2Jnz14wksMUjhsjotMJxE3bA%40mail.gmail.com
>>>>>>>> For more options, visit https://groups.google.com/d/optout.

