Yep, I have done it :) Thank you for your help, Shree.

So far I have modified my wordlist (now 1 million entries), puncs, numbers and net spec, but the result is still not as good as fine-tuning from tessdata_best. Can anyone suggest tips for getting better results when training Tesseract from scratch?
On Wednesday, January 10, 2018 at 6:16:14 PM UTC+7, shree wrote:
>
> On Wed, Jan 10, 2018 at 3:56 PM, <[email protected]> wrote:
>
>> It works !!
>> I modified your bash script and executed it. Finally I get different
>> traineddata size.
>>
>> But, can I train it from scratch?
>> It needs starting traineddata which I can get from combine_lang_model,
>> isn't it?
>
> Starter traineddata will be generated by tesstrain.sh, change the files
> in langdata folder.
>
> To train from scratch, you need to change the lstmtraining command. It
> will not need continue_from and old_traineddata.
>
> You will need to add a network specification - such as
>
> --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>
> Usually the best traineddata will have the network spec used for training
> by Ray as part of the version string.
>
> See https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
> for more details.
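
For anyone finding this thread later, here is a rough sketch of what the from-scratch command described above might look like. It is not the exact script from this thread: the directory names, the `eng` language code, the `DRY_RUN` switch and the iteration count are all placeholders I made up, and only the net spec is copied from Shree's reply. The flags are the ones listed on the TrainingTesseract-4.00 wiki page. The point to notice is that `--continue_from` and `--old_traineddata` are absent and `--net_spec` is present. If you want to copy the spec from a tessdata_best model, I believe `combine_tessdata -d` prints the version string that contains it.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder. Point them at
# wherever tesstrain.sh wrote the starter traineddata and the .lstmf list.
LANG_CODE="${LANG_CODE:-eng}"
TRAIN_DIR="${TRAIN_DIR:-$HOME/tesstrain_output}"
OUT_DIR="${OUT_DIR:-$HOME/scratch_output}"

# Network spec taken verbatim from the quoted reply.
NET_SPEC='[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]'

# Build the command as positional parameters so the net spec, which contains
# spaces, is passed as a single argument. No --continue_from and no
# --old_traineddata: those are only needed for fine-tuning.
set -- lstmtraining \
  --traineddata "$TRAIN_DIR/$LANG_CODE/$LANG_CODE.traineddata" \
  --net_spec "$NET_SPEC" \
  --model_output "$OUT_DIR/base" \
  --train_listfile "$TRAIN_DIR/$LANG_CODE.training_files.txt" \
  --max_iterations 5000

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only show what would be run.
  echo "$*"
else
  mkdir -p "$OUT_DIR" && "$@"
fi
```

By default this only prints the command. Run it with `DRY_RUN=0` once the paths match your setup.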

