For LSTM training, the punc, numbers, and wordlist files are NOT required.
You can add them if you like. The unicharset is generated from the training
text.
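
To give a rough idea of what "generated from the training text" means, here
is a small Python sketch that lists the distinct characters found in a piece
of text. This is only an illustration, not Tesseract's own extractor: the
function name character_inventory and the sample line are made up for this
example, and the real unicharset also stores per-character properties and
handles Indic grapheme clusters, which this sketch ignores.

```python
from collections import Counter
import unicodedata


def character_inventory(text: str) -> Counter:
    """Count every non-whitespace code point in NFC-normalized text.

    Hypothetical helper for illustration only; it does not reproduce
    the format or the logic of a Tesseract unicharset file.
    """
    normalized = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in normalized if not ch.isspace())


# Hypothetical one-line sample; a real training text is far larger.
sample = "मराठी भाषा"
inventory = character_inventory(sample)

# Print each distinct code point with its Unicode name and frequency.
for ch, count in sorted(inventory.items()):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')} x{count}")
```

Running something like this over the converted Modi text is a quick way to
check that it contains only the characters you expect before you start
training.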

Are you planning to train from text or images?

On Mon, Jan 27, 2020 at 2:19 AM 'Nilambari Joshi' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> Thanks for your response. I will proceed as suggested. Please also clarify
> whether I need to create a separate language directory for Modi, similar to
> the one for Marathi, with all the files (numbers, punc, wordlist) included
> and a separate unicharset file as well.
> Thanks in advance.
>
> On Sunday, January 26, 2020 at 12:26:51 PM UTC-5, shree wrote:
>>
>> Thanks for the link to Modi Unicode font.
>>
>> I would convert the Marathi training text to Modi script (using
>> Aksharamukha) and then train using the Unicode font.
>>
>> On Sun, Jan 26, 2020 at 10:28 PM Patrick CHEW <patri...@gmail.com> wrote:
>>
>>>
>>> On Jan 26, 2020, at 08:16, Shree Devi Kumar <shree...@gmail.com> wrote:
>>>
>>> Is there a Unicode font for modi script?
>>>
>>>
>>> https://github.com/MihailJP/MarathiCursive
>>>
>>> On Sun, Jan 26, 2020, 21:22 'Nilambari Joshi' via tesseract-ocr <
>>> tesser...@googlegroups.com> wrote:
>>>
>>>> Hi... I want to create Modi script (Marathi language) traineddata in
>>>> Tesseract for OCR. Can somebody tell me which steps I should follow?
>>>> I referred to
>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>>> but got stuck at the stage of creating box files.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>> --
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>


