Re: [tesseract-ocr] Traineddata always ended in same size and did not match with wordlist

easymavinmind Mon, 08 Jan 2018 22:32:32 -0800

Yes, I ran the following command in the tesseract/training directory:

lstmtraining --stop_training \
  --continue_from ../result/mylangoutput/base_checkpoint \
  --traineddata ../result/mylangcombine/mylang/mylang.traineddata \
  --model_output ../result/mylangoutput/mylang.traineddata
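
For anyone who wants to repeat this step, here is a rough sketch of the same command as a small script. The paths are the ones from my setup above; the variable names and the helper finalize_cmd are only illustrative names I made up, and the script just prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch only: paths are taken from this message; the variable names and
# the helper finalize_cmd are hypothetical, not part of Tesseract.
CHECKPOINT=../result/mylangoutput/base_checkpoint
STARTER=../result/mylangcombine/mylang/mylang.traineddata
OUTPUT=../result/mylangoutput/mylang.traineddata

finalize_cmd() {
    # --stop_training converts the checkpoint given by --continue_from
    # into a final traineddata file instead of training further.
    printf '%s\n' "lstmtraining --stop_training --continue_from $CHECKPOINT --traineddata $STARTER --model_output $OUTPUT"
}

# Print the assembled command for review.
finalize_cmd

# To actually run it and print the size of the result in bytes, uncomment:
# sh -c "$(finalize_cmd)" && wc -c < "$OUTPUT"
```

Printing the byte count after each run makes it easy to compare the output sizes between trainings.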


On Monday, January 8, 2018 at 7:36:50 PM UTC+7, shree wrote:
>
> Did you use --stop_training flag at the end?
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Jan 8, 2018 at 5:51 PM, <[email protected]> wrote:
>
>> Hi all,
>>
>> I am working on a project with Tesseract v4.00, and the traineddata 
>> output always comes out at the same size after training with my own data.
>> I suspect that I did not follow the steps correctly.
>>
>> The only data that I provided were:
>> 1. training_text
>> 2. puncs (a reduced version of the general punc file provided in the 
>> Tesseract GitHub repository)
>> 3. numbers
>> 4. wordlists (I tried different wordlists across several training runs, 
>> ranging from 100,000 to 2,000,000 entries)
>> 5. font name (I also tried different sets of fonts across several 
>> training runs, ranging from 1 to 20 fonts)
>>
>> The steps that I did were:
>> 1. Generated the tiff files, unicharset and other supporting data using 
>> tesstrain.sh
>> 2. Generated the tiff files, unicharset and other supporting data for 
>> evaluation, again using tesstrain.sh
>> 3. Combined unicharset, wordlists, puncs, numbers and version_str to 
>> create the starter traineddata using combine_lang_model (I am still not 
>> sure what value to use for version_str, though)
>> 4. Trained the model using lstmtraining
>> 5. Combined all output files using lstmtraining --continue_from ...
>>
>> Yet every training run ended with the same output size, 10.5 MB.
>> Did I do all of the steps correctly?
>>
>> I also trained once with WORD_DAWG_FACTOR in language-specific.sh set to 
>> 0 and to 1, because I want the recognized text to match my wordlists 100%. 
>> But the result was still not satisfactory: some output words, such as 
>> "USISUSISU", are not in my wordlists.
>> Do you know what causes this?
>>
>> I would really appreciate any help or suggestions.
>> Thank you!
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/b6ca74b2-1e50-44cb-93f6-586fcd26cec5%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

