script/Bengali.traineddata is another option

On Fri, 31 May 2019, 16:58 Shree Devi Kumar, <[email protected]> wrote:
> Please try asm.traineddata, which is for Assamese, which is written in
> Bengali script.
>
> On Fri, 31 May 2019, 16:55 Jennil Thiyam, <[email protected]> wrote:
>
>> How come this character is in here? It's not used in Bengali, and it is
>> also not recognized by the ben.traineddata model. The character is in the
>> unicharset that I got after running tesstrain.sh.
>> The character is pronounced "waa". I attached two pictures: the first,
>> wa.png, is a screenshot of the unicharset from the link you gave, and
>> the second, wa_11.png, is the unicharset that I got after running
>> tesstrain.sh (after adding this new character to ben.training_text).
>> The character is on line 35 (in wa.png) and line 79 (in wa_11.png).
>>
>> Please help me out
>>
>> On Fri, May 31, 2019 at 3:47 PM Shree Devi Kumar <[email protected]>
>> wrote:
>>
>>> or in
>>>
>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/asm/asm.unicharset
>>>
>>> On Fri, May 31, 2019 at 3:45 PM Shree Devi Kumar <[email protected]>
>>> wrote:
>>>
>>>> Is your new character included in
>>>>
>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/ben/ben.unicharset
>>>>
>>>> On Fri, May 31, 2019 at 3:22 PM Jennil Thiyam <[email protected]>
>>>> wrote:
>>>>
>>>>> I followed the procedure described in the Tesseract 4 training
>>>>> documentation for fine-tuning (the example that adds the plus-minus
>>>>> sign to eng.traineddata) to train ben.traineddata, adding one
>>>>> character that is not in the Bengali alphabet more than 30 times to
>>>>> ben.training_text. After creating the starter training data and then
>>>>> running lstmtraining, the model failed to recognize the new character,
>>>>> whereas in the plus-minus example the sign is said to be recognized.
>>>>> Does anyone have any suggestions?
>>>>> A sample of the training_text is given below:
>>>>> .....
>>>>> লক্ষ্যমাত্রা নির্দেশ ধ্বংস কে
>>>>> দেখতে শুধু লাইব্রেরী আশা স্বাগত থাং
>>>>> শতাব্দী অন্ধ্রপ্রদেশ (িপিপিপ)
>>>>> সন্ধান করে অভ্যুত্থানের প্রসিদ্ধ
>>>>> ময়ূরের শুরু ইন্টারেস্টিং দলের ও
>>>>> পুিলেশর খ্রিস্টপূর্ব আশা প্রদর্শিত
>>>>> কহীং উইকিপিডিয়াতে এ্যান্ড 19 ইঞ্চি
>>>>> আছে ০ লিখতে অর্পানেট পরে এেক
>>>>> ভূঁইয়ার আছে করুন, গ্লোব সেপ্টেম্বর
>>>>> প্রশ্ন,
>>>>> *ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> ....
>>>>> The underlined text shows the possible forms that this new character
>>>>> can take. Is there any rule for adding this new character to the
>>>>> training text?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/CAJxgooeysg5AfzppAXjKpREOvH2Jnz14wksMUjhsjotMJxE3bA%40mail.gmail.com
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
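For the question raised in the thread (is the new character in ben.unicharset or asm.unicharset?), a quick check can be sketched in Python. This is only a minimal sketch. It assumes the usual unicharset layout: a first line holding the entry count, then one entry per line whose first space-separated field is the unichar. The function name `find_char` and the abbreviated `SAMPLE` data are made up for illustration and are not taken from a real unicharset file.

```python
def find_char(unicharset_text, char):
    """Return (line_number, unichar) pairs for entries that contain `char`.

    Assumes line 1 is the entry count and every following non-empty line
    starts with the unichar, followed by a space and its properties.
    """
    hits = []
    lines = unicharset_text.split("\n")
    # Skip line 1 (the count); number the remaining lines from 2.
    for line_number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue
        unichar = line.split(" ", 1)[0]
        if char in unichar:
            hits.append((line_number, unichar))
    return hits


# Illustrative, abbreviated sample only; real entries carry more fields.
SAMPLE = "\n".join([
    "4",
    "NULL 0 Common 0",
    "ক 1 Bengali 1",
    "ৱ 1 Bengali 2",
    "া 0 Bengali 3",
])

if __name__ == "__main__":
    char = "ৱ"  # the character discussed in the thread
    print(hex(ord(char)), find_char(SAMPLE, char))
```

To run it against a downloaded file, read the file as UTF-8 and pass the text in, for example `find_char(open("ben.unicharset", encoding="utf-8").read(), "ৱ")`. An empty list would mean that no entry contains the character.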

