Yep, I have done it :)
Thank you for your help, Shree.

So far I have modified my wordlist (now 1 million words), puncs, numbers, and 
net spec, but the result is still not as good as fine-tuning from tessdata_best. 
If anyone has tips for getting better results when training Tesseract from 
scratch, please share them with me :))

On Wednesday, January 10, 2018 at 6:16:14 PM UTC+7, shree wrote:
>
>
> On Wed, Jan 10, 2018 at 3:56 PM, <[email protected]> wrote:
>
>> It works!!
>> I modified your bash script and executed it. Finally I get a different 
>> traineddata size.
>>
>> But can I train it from scratch?
>> It needs a starter traineddata, which I can get from combine_lang_model, 
>> doesn't it?
>>
>>
> Starter traineddata will be generated by tesstrain.sh; change the files 
> in the langdata folder.
>
> To train from scratch, you need to change the lstmtraining command. It 
> will not need continue_from and old_traineddata.
>
> You will need to add a network specification - such as
>
>  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>
> Usually the traineddata files in tessdata_best have the network spec that 
> Ray used for training as part of the version string.
>
> See https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 
> for more details.
>
>
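
For reference, the from-scratch invocation described in the quoted reply might 
look roughly like the sketch below. The directory names, the language code and 
the iteration count are placeholders I made up for illustration; only the 
net spec is taken from the thread. It assumes tesstrain.sh has already written 
a starter traineddata and a training file list into TRAIN_DIR. By default the 
script only prints the command; set RUN_TRAINING=1 to actually run it.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder, not from the thread.
LANG_CODE="eng"                     # assumed language code
TRAIN_DIR="$HOME/tesstrain_out"     # assumed tesstrain.sh output directory
OUT_DIR="$HOME/scratch_model"       # assumed directory for checkpoints

# Network spec quoted in the thread; kept in single quotes so the shell
# passes the brackets and spaces through as one argument.
NET_SPEC='[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]'

# Build the command as positional parameters.
# Note what is absent: no --continue_from and no --old_traineddata,
# because nothing is being fine-tuned.
set -- lstmtraining \
  --traineddata "$TRAIN_DIR/$LANG_CODE/$LANG_CODE.traineddata" \
  --net_spec "$NET_SPEC" \
  --model_output "$OUT_DIR/base" \
  --train_listfile "$TRAIN_DIR/$LANG_CODE.training_files.txt" \
  --max_iterations 5000

# Print the command for review.
printf '%s\n' "$*"

# Run it only when explicitly requested.
if [ "${RUN_TRAINING:-0}" = "1" ]; then
  mkdir -p "$OUT_DIR"
  "$@"
fi
```

Training from scratch generally needs far more training data and iterations 
than fine-tuning, so the 5000 above is only a stand-in value.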
