What is the script for adding a language model to Tesseract?

On Wednesday, July 10, 2019 at 16:57:10 UTC+2, shree wrote:
>
> --user-words does not currently work in tesseract4.
>
> On Wed, Jul 10, 2019 at 7:59 PM David Novak <[email protected]> wrote:
>
>>
>> Hello,
>>
>> I have a custom list of words that I'd like to add to (or practically 
>> substitute for) the default word list in my language. Some of these words 
>> combine letters & digits & punctuation e.g.
>> 0.5KG
>> 0.5L
>> 1.1L
>> 1.25KG
>> 108G
>> 4DOG
>>
>> I'm using tesseract 4.0. My approach so far:
>>  - unpack lang.traineddata
>>  - create cus.lstm-word-dawg  (either just from my wordlist or as 
>> combination of standard language list + my list)
>>  - create new .traineddata from cus.lstm cus.lstm-recoder 
>> cus.lstm-unicharset cus.lstm-word-dawg cus.traineddata
>>
>> It has practically no effect... Often, a word that actually is in the 
>> list is recognized wrongly as some string that is not in the list.
>>
>> I have tried to add these words using --user-words <mylist.txt>: no 
>> effect, or the same as my approach
>> I have tried -c language_model_penalty_non_dict_word=1.0  (I thought it 
>> would limit the output to words in cus.lstm-word-dawg): no effect
>>
>> I'm out of ideas after two weeks of trying. Any tips, please?
>>
>> Thanks
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/5b015d58-9958-4c1f-a330-abdb001f7957%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> -- 
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>

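For readers following the thread, the three steps described in the quoted message (unpack the traineddata, build a new lstm-word-dawg, recombine) can be sketched roughly as below. This is only an illustration under assumptions: the helper names `merge_wordlists` and `build_commands`, the `work` directory, and the `wordlist.txt` file name are hypothetical, and it assumes the `combine_tessdata` and `wordlist2dawg` training tools from Tesseract 4 are installed. Note that the original poster reports this approach had practically no effect on recognition, so treat it as a description of the procedure, not a confirmed fix.

```python
import shlex
from pathlib import Path


def merge_wordlists(*wordlists):
    """Merge word lists, dropping blanks and duplicates, keeping first-seen order."""
    seen = set()
    merged = []
    for words in wordlists:
        for word in words:
            word = word.strip()
            if word and word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


def build_commands(base_traineddata, workdir, lang="cus", wordlist="wordlist.txt"):
    """Return the command lines as argument lists; nothing is executed here."""
    # combine_tessdata works on a path prefix that ends with a dot, e.g. "work/cus."
    prefix = str(Path(workdir) / lang) + "."
    dawg = prefix + "lstm-word-dawg"
    unicharset = prefix + "lstm-unicharset"
    return [
        # 1. Unpack lang.traineddata into work/cus.lstm, work/cus.lstm-unicharset, ...
        ["combine_tessdata", "-u", str(base_traineddata), prefix],
        # 2. Compile the plain-text word list into a DAWG using the LSTM unicharset.
        ["wordlist2dawg", str(Path(workdir) / wordlist), dawg, unicharset],
        # 3. Recombine every component that starts with the prefix into work/cus.traineddata.
        ["combine_tessdata", prefix],
    ]


if __name__ == "__main__":
    # Print the commands only; run them by hand or via subprocess.run(cmd, check=True).
    for cmd in build_commands("lang.traineddata", "work"):
        print(shlex.join(cmd))
```

The merged list from `merge_wordlists` would be written one word per line to the word list file before step 2. Building the commands as argument lists keeps entries such as `0.5KG` or `1.25KG` out of any shell quoting issues.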