Did you use the --stop_training flag at the end?

ShreeDevi
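
For context, the finalization step that the --stop_training question refers to can be sketched roughly as below. This is only a sketch: the flag names (--stop_training, --continue_from, --traineddata, --model_output) are the ones lstmtraining uses in Tesseract 4, but the helper function and every path here are hypothetical placeholders, not taken from the original poster's setup.

```python
import shlex


def build_stop_training_cmd(checkpoint, starter_traineddata, model_output):
    """Return the argv list for the final lstmtraining call.

    With --stop_training, lstmtraining does not train further; it converts
    the given checkpoint into a traineddata file, using the starter
    traineddata produced earlier by combine_lang_data.
    """
    return [
        "lstmtraining",
        "--stop_training",
        "--continue_from", checkpoint,
        "--traineddata", starter_traineddata,
        "--model_output", model_output,
    ]


if __name__ == "__main__":
    # All three paths are hypothetical; substitute your own files.
    cmd = build_stop_training_cmd(
        checkpoint="output/mylang_checkpoint",
        starter_traineddata="starter/mylang/mylang.traineddata",
        model_output="output/mylang.traineddata",
    )
    # Print the command so it can be inspected before running it by hand.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The script only prints the command; it can then be run manually, or passed to subprocess.run(cmd, check=True) if lstmtraining is on the PATH.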
On Mon, Jan 8, 2018 at 5:51 PM, <[email protected]> wrote:

> Hi all,
>
> I am doing my project using Tesseract v4.00, and the traineddata output
> always comes out the same size after training with my own data.
> I suspect that I did not follow the steps correctly.
>
> The only data that I provided were:
> 1. training_text
> 2. puncs (a reduced version of the general punc file provided in the
>    Tesseract GitHub repository)
> 3. numbers
> 4. wordlists (I made various wordlists for several training runs, ranging
>    between 100,000 and 2,000,000)
> 5. font names (I also used various sets of fonts for several training runs,
>    ranging between 1 and 20 fonts)
>
> The steps that I followed were:
> 1. Made the tiff files, unicharset and other supporting data using
>    tesstrain.sh
> 2. Made the tiff files, unicharset and other supporting data using
>    tesstrain.sh for evaluation
> 3. Combined unicharset, wordlists, puncs, numbers and version_str to
>    create the starter traineddata using combine_lang_data (I am still not
>    confident about the value of version_str, though)
> 4. Trained using lstmtraining
> 5. Combined all output files using lstmtraining --continue_from ...
>
> Yet all of my training runs ended with the same size, which is 10.5MB.
> Did I do all the steps correctly?
>
> I also once trained with WORD_DAWG_FACTOR in language-specific.sh changed
> to 0 and 1, because I want the recognized text to match my wordlists 100%.
> That result did not satisfy me either: some output words are not in my
> wordlists, such as "USISUSISU".
> Do you know what the cause is?
>
> I would really appreciate it if anyone can help or suggest a solution.
> Thank you!!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.

