What is the script for adding a language model to Tesseract?

On Wednesday, 10 July 2019 16:57:10 UTC+2, shree wrote:
>
> --user-words does not currently work in tesseract4.
>
> On Wed, Jul 10, 2019 at 7:59 PM David Novak <[email protected]> wrote:
>
>>
>> Hello,
>>
>> I have a custom list of words that I'd like to add to (or practically
>> substitute for) the default word list in my language. Some of these words
>> combine letters & digits & punctuation e.g.
>> 0.5KG
>> 0.5L
>> 1.1L
>> 1.25KG
>> 108G
>> 4DOG
>>
>> I'm using tesseract 4.0. My approach so far:
>> - unpack lang.traineddata
>> - create cus.lstm-word-dawg (either just from my wordlist or as
>> combination of standard language list + my list)
>> - create new .traineddata from cus.lstm cus.lstm-recoder
>> cus.lstm-unicharset cus.lstm-word-dawg cus.traineddata
>>
>> It has practically no effect... Often, a word that actually is in the
>> list is recognized wrongly as some string that is not in the list.
>>
>> I have tried to add these words using --user-words <mylist.txt>: no
>> effect, or the same as my approach
>> I have tried -c language_model_penalty_non_dict_word=1.0 (I thought it
>> would limit the output to words in cus.lstm-word-dawg): no effect
>>
>> I'm out of ideas after two weeks of trying. Any tips, please?
>>
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/5b015d58-9958-4c1f-a330-abdb001f7957%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
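
The three steps David lists (unpack, rebuild the word DAWG, recombine) can be sketched as a small Python helper that only assembles the command lines. This is a minimal sketch, not a confirmed fix: the thread itself reports that replacing the word list had practically no effect on recognition. It assumes the Tesseract training tools combine_tessdata and wordlist2dawg are installed and on PATH. The function names and the paths in the usage note are hypothetical, chosen for illustration.

```python
import subprocess
from typing import List


def build_wordlist_commands(lang: str, new_lang: str, wordlist: str,
                            tessdata_dir: str, work_dir: str) -> List[List[str]]:
    """Return the argv lists for the three steps; nothing is executed here."""
    src = f"{tessdata_dir}/{lang}.traineddata"
    # combine_tessdata expects an output prefix that ends with a dot.
    prefix = f"{work_dir}/{new_lang}."
    return [
        # 1. Unpack every component of the source model under the new prefix.
        ["combine_tessdata", "-u", src, prefix],
        # 2. Compile the plain-text word list into the LSTM word DAWG,
        #    using the unicharset unpacked in step 1.
        ["wordlist2dawg", wordlist, prefix + "lstm-word-dawg",
         prefix + "lstm-unicharset"],
        # 3. Combine all files that share the prefix into <new_lang>.traineddata.
        ["combine_tessdata", prefix],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, build_wordlist_commands("eng", "cus", "mylist.txt", "/usr/share/tessdata", "work") returns the three commands, and run_commands executes them. The resulting work/cus.traineddata would then be copied into the tessdata directory and selected with -l cus.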

