Why do you want to fine-tune eng to get a Hindi traineddata?

You can fine-tune hin.traineddata or script/Devanagari.traineddata.
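
As a rough sketch of what fine-tuning from hin.traineddata could look like, the Python helper below only builds the two command lines and prints them; it does not run Tesseract. The helper name `fine_tune_commands`, every path, and the iteration count are placeholders assumed for illustration. It also assumes the base model is a float model (for example from tessdata_best), since integer models cannot be trained further.

```python
import os
import shlex


def fine_tune_commands(base_traineddata, train_listfile, model_output,
                       max_iterations=400):
    """Build argv lists for fine-tuning an existing traineddata (hypothetical helper)."""
    # The LSTM component is extracted next to the base model, e.g. hin.lstm.
    lstm_path = os.path.splitext(base_traineddata)[0] + ".lstm"
    # Step 1: extract the LSTM model from the existing traineddata.
    extract = ["combine_tessdata", "-e", base_traineddata, lstm_path]
    # Step 2: continue training from that model on your own .lstmf files.
    train = [
        "lstmtraining",
        "--continue_from", lstm_path,
        "--traineddata", base_traineddata,
        "--train_listfile", train_listfile,
        "--model_output", model_output,
        "--max_iterations", str(max_iterations),
    ]
    return extract, train


if __name__ == "__main__":
    # Placeholder paths; substitute your own.
    commands = fine_tune_commands(
        "tessdata_best/hin.traineddata",
        "hin.training_files.txt",
        "output/hin_tuned",
    )
    for argv in commands:
        print(shlex.join(argv))
```

The same helper works for script/Devanagari.traineddata by passing that path as the base model.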

On Wed, Apr 8, 2020, 21:00 Piyush Chandra <[email protected]> wrote:

> When I downloaded Devanagari.unicharset, Latin.unicharset and
> radical-stroke.txt, it worked. What are these files and why do we need
> them? Do we need to use them every time we work on a new language, or do
> we need to create our own?
>
>
> On Wednesday, 8 April 2020 20:42:44 UTC+5:30, Piyush Chandra wrote:
>>
>> Hi,
>>
>> I am trying to create a Hindi traineddata from scratch using
>> eng.traineddata.
>>
>> I used some png and txt files to create box files using lstmbox and edited
>> those box files to correct the words.
>>
>> Then I used lstm.train to create lstmf files and created a unicharset file
>> from the box files using unicharset_extractor.
>>
>> But now, when I use combine_lang_model to get a starter traineddata file, I
>> am getting an error. Please help.
>>
>> ~/hindiFiles/hindi$ /usr/local/bin/combine_lang_model --input_unicharset
>> ./langdata/hin/hin.unicharset --script_dir ./langdata --words
>> ./langdata/hin.wordlist --numbers ./langdata/hin.numbers --puncs
>> ./langdata/hin.punc --output_dir /home/piyush/hindiFiles/hindi/langdata/
>> --lang hin
>> Loaded unicharset of size 39 from file ./langdata/hin/hin.unicharset
>> Setting unichar properties
>> Setting script properties
>> Failed to load script unicharset from:./langdata/Latin.unicharset
>> Failed to load script unicharset from:./langdata/Devanagari.unicharset
>> Warning: properties incomplete for index 3 = मे
>> Warning: properties incomplete for index 4 = रा
>> Warning: properties incomplete for index 5 = ना
>> Warning: properties incomplete for index 6 = म
>> Warning: properties incomplete for index 7 = पी
>> Warning: properties incomplete for index 8 = यू
>> Warning: properties incomplete for index 9 = ष
>> Warning: properties incomplete for index 10 = है
>> Warning: properties incomplete for index 11 = ।
>> Warning: properties incomplete for index 12 = हाँ
>> Warning: properties incomplete for index 13 = ,
>> Warning: properties incomplete for index 14 = मु
>> Warning: properties incomplete for index 15 = झे
>> Warning: properties incomplete for index 16 = भू
>> Warning: properties incomplete for index 17 = ख
>> Warning: properties incomplete for index 18 = ल
>> Warning: properties incomplete for index 19 = गी
>> Warning: properties incomplete for index 20 = तु
>> Warning: properties incomplete for index 21 = म्‌
>> Warning: properties incomplete for index 22 = हा
>> Warning: properties incomplete for index 23 = क्‌
>> Warning: properties incomplete for index 24 = या
>> Warning: properties incomplete for index 25 = कै
>> Warning: properties incomplete for index 26 = से
>> Warning: properties incomplete for index 27 = हो
>> Warning: properties incomplete for index 28 = ?
>> Warning: properties incomplete for index 29 = क
>> Warning: properties incomplete for index 30 = ब
>> Warning: properties incomplete for index 31 = त
>> Warning: properties incomplete for index 32 = आ
>> Warning: properties incomplete for index 33 = ओ
>> Warning: properties incomplete for index 34 = गे
>> Warning: properties incomplete for index 35 = नीं
>> Warning: properties incomplete for index 36 = द
>> Warning: properties incomplete for index 37 = र
>> Warning: properties incomplete for index 38 = ही
>> Config file is optional, continuing...
>> Failed to read data from: ./langdata/hin/hin.config
>> Failed to read data from: ./langdata/radical-stroke.txt
>> Error reading radical code table ./langdata/radical-stroke.txt
>>
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/77cf0099-a40e-4186-b76c-b844832e2240%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/77cf0099-a40e-4186-b76c-b844832e2240%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
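
On the files question: the log above shows combine_lang_model looking under --script_dir for Latin.unicharset, Devanagari.unicharset and radical-stroke.txt. As far as I know these are shared files from the langdata_lstm repository, so you download them once and point --script_dir at them instead of creating your own for each language. A small Python sketch to check for them before running the tool follows; the function name and the default script list are assumptions taken from this particular log, not a complete list of what other languages need.

```python
from pathlib import Path
import sys


def missing_script_files(script_dir, scripts=("Latin", "Devanagari")):
    """Return the expected files that are absent from script_dir (hypothetical helper)."""
    script_dir = Path(script_dir)
    # One <Script>.unicharset per script, plus the shared radical-stroke table,
    # mirroring the paths printed in the combine_lang_model log.
    wanted = [f"{name}.unicharset" for name in scripts] + ["radical-stroke.txt"]
    return [name for name in wanted if not (script_dir / name).is_file()]


if __name__ == "__main__":
    # Usage: python check_script_dir.py ./langdata
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in missing_script_files(directory):
        print(f"missing: {Path(directory) / name}")
```

If the script prints nothing, the three files from the log are present in that directory.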
