script/Bengali.traineddata is another option
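
Before picking a model, it may help to check which unicharset actually lists the character. Below is a minimal sketch of such a check. It assumes the usual unicharset layout (an entry count on the first line, then one entry per line starting with the unichar followed by a space and its properties). The function names and file paths are only illustrative, not part of Tesseract. For a .traineddata file, the unicharset would first have to be unpacked, for example with combine_tessdata -u.

```python
from pathlib import Path


def unichars(unicharset_text: str) -> list[str]:
    """Return the unichar strings listed in a unicharset file's text.

    Assumption: line 1 holds the entry count, and every later line
    starts with the unichar, followed by a space and its properties.
    """
    lines = unicharset_text.split("\n")
    # Skip the count line and any blank lines; keep the first field only.
    return [line.split(" ", 1)[0] for line in lines[1:] if line.strip()]


def entries_containing(unicharset_text: str, char: str) -> list[str]:
    """Return every unicharset entry that contains the given character."""
    return [u for u in unichars(unicharset_text) if char in u]


def check_file(path: str, char: str) -> list[str]:
    """Read a unicharset file (path is hypothetical) and look up char."""
    text = Path(path).read_text(encoding="utf-8")
    return entries_containing(text, char)
```

For example, check_file("asm.unicharset", "ৱ") and check_file("ben.unicharset", "ৱ") can be compared; an empty list means that unicharset has no entry with the character, so the corresponding model cannot output it without further training.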

On Fri, 31 May 2019, 16:58 Shree Devi Kumar, <[email protected]> wrote:

> Please try asm.traineddata, the model for Assamese, which is written in
> the Bengali script.
>
> On Fri, 31 May 2019, 16:55 Jennil Thiyam, <[email protected]> wrote:
>
>> How did this character end up in there? It is not used in Bengali, and the
>> ben.traineddata model does not recognize it, yet it is in the unicharset
>> that I got after running tesstrain.sh.
>> The character is pronounced "waa". I attached two pictures: wa.png is a
>> screenshot of the unicharset from the link you gave, and wa_11.png is the
>> unicharset that I got after running tesstrain.sh (after adding this new
>> character to ben.training_text).
>> The character is on line 35 in wa.png and on line 79 in wa_11.png.
>>
>> Please help me out.
>>
>> On Fri, May 31, 2019 at 3:47 PM Shree Devi Kumar <[email protected]>
>> wrote:
>>
>>> or in
>>>
>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/asm/asm.unicharset
>>>
>>>
>>> On Fri, May 31, 2019 at 3:45 PM Shree Devi Kumar <[email protected]>
>>> wrote:
>>>
>>>> Is your new character included in
>>>>
>>>>
>>>> https://github.com/tesseract-ocr/langdata_lstm/blob/master/ben/ben.unicharset
>>>>
>>>>
>>>> On Fri, May 31, 2019 at 3:22 PM Jennil Thiyam <[email protected]>
>>>> wrote:
>>>>
>>>>> I followed the fine-tuning procedure described in the Tesseract 4
>>>>> training guide (the example that adds the plus-minus sign to
>>>>> eng.traineddata) to train ben.traineddata, adding one character that is
>>>>> not in the Bengali alphabet to ben.training_text more than 30 times.
>>>>> After creating the starter training data and running lstmtraining, the
>>>>> model failed to recognize the new character, whereas in the plus-minus
>>>>> example the sign is reported to be recognized.
>>>>> Does anyone have any suggestions?
>>>>> A sample of the training text is given below:
>>>>> .....
>>>>> লক্ষ্যমাত্রা নির্দেশ ধ্বংস কে
>>>>> দেখতে শুধু লাইব্রেরী আশা স্বাগত থাং
>>>>> শতাব্দী অন্ধ্রপ্রদেশ (িপিপিপ)
>>>>> সন্ধান করে অভ্যুত্থানের প্রসিদ্ধ
>>>>> ময়ূরের শুরু ইন্টারেস্টিং দলের ও
>>>>> পুিলেশর খ্রিস্টপূর্ব আশা প্রদর্শিত
>>>>> কহীং উইকিপিডিয়াতে এ্যান্ড 19 ইঞ্চি
>>>>> আছে ০ লিখতে অর্পানেট পরে এেক
>>>>> ভূঁইয়ার আছে করুন, গ্লোব সেপ্টেম্বর
>>>>> প্রশ্ন,
>>>>> *ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> *ৱ ৱা ৱি ৱী ৱু ৱূ ৱে ৱৈ ৱো ৱৌ ৱং*
>>>>> ....
>>>>> The underlined text (marked with asterisks above) shows the possible
>>>>> forms that this new character can take. Is there any rule for adding a
>>>>> new character to the training text?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/CAJxgooeysg5AfzppAXjKpREOvH2Jnz14wksMUjhsjotMJxE3bA%40mail.gmail.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/CAJxgooeysg5AfzppAXjKpREOvH2Jnz14wksMUjhsjotMJxE3bA%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>
>>>
>

