Overview of Training Process

The overall training process is similar to training 3.04 <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>. Conceptually the same:
1. Prepare training text.
2. Render text to image + box file. (Or create hand-made box files for existing image data.)
3. Make unicharset file.
4. Optionally make dictionary data.
5. Run tesseract to process image + box file to make training data set.
6. Run training on training data set.
7. Combine data files.

The key differences are:

- The boxes only need to be at the *textline level.* It is thus *far easier* to make training data from existing image data.
- The .tr files are replaced by .lstmf data files.
- Fonts *can and should be mixed freely* instead of being separate.
- The clustering steps (mftraining, cntraining, shapeclustering) are replaced with a single slow lstmtraining step.

Hello ShreeDevi,

Could you please guide me by elaborating on the steps marked above? I have not been able to find the relevant instructions for them. The steps I am currently following give me the errors I posted in my previous reply. Please guide me.

On Wednesday, April 5, 2017 at 9:07:40 AM UTC+5:30, shree wrote:
>
> Read
>
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer
>
> and
>
> https://github.com/tesseract-ocr/tesseract/wiki/Documentation
> https://github.com/tesseract-ocr/tesseract/wiki/Fonts
> https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
> https://github.com/tesseract-ocr/tesseract/wiki/FAQ
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Apr 5, 2017 at 12:54 AM, <[email protected]> wrote:
>
>> Can you please post some of your experiences in this thread, as there
>> are no posts about training Tesseract 4?
>>
>> 1) Also, is there any way to add the newly trained data file to the old
>> trained data file, without replacing the old file?
>> 2) If we don't know which fonts we may get in our images, how should we
>> proceed with training Tesseract?
>>
>> On Tuesday, April 4, 2017 at 9:27:06 PM UTC+5:30, Saurabh Srivastav wrote:
>>>
>>> Yes, I trained my Tesseract for the eng font and made it read the
>>> characters from the image.
>>>
>>>>> thanks,
>>>>> Saurabh Srivastav
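
Coming back to the seven steps at the top of this message: as far as I understand them from the TrainingTesseract-4.00 wiki page, they could be strung together roughly as below. This is only a sketch under assumptions, not a verified recipe. It assumes fine-tuning from an existing traineddata file, in which case the unicharset and dictionary (steps 3 and 4) come from that file rather than being rebuilt. The function name, the `train` and `model` base names, and the list-file name are hypothetical, and the flags are the ones I recall from the wiki, so please check them against `--help` of your installed tools. The function only builds the command lines; it does not run anything.

```python
from pathlib import Path


def lstm_training_commands(text_file, font, workdir, start_traineddata,
                           max_iterations=400):
    """Build (but do not run) command lines for a minimal fine-tuning pass.

    All file names derived from `workdir` are hypothetical examples.
    """
    work = Path(workdir)
    base = work / "train"                 # text2image writes train.tif / train.box
    start_lstm = work / "start.lstm"      # LSTM model extracted from the start file
    listfile = work / "train.files.txt"   # text file naming each .lstmf, one per line
    model = work / "model"                # checkpoints are written as model_checkpoint

    return [
        # Step 2: render the training text to an image + box file.
        ["text2image", f"--text={text_file}", f"--outputbase={base}",
         f"--font={font}"],
        # Step 5: process image + box file into train.lstmf (replaces the .tr file).
        ["tesseract", f"{base}.tif", str(base), "--psm", "6", "lstm.train"],
        # Assumption: fine-tuning, so extract the existing LSTM model first.
        ["combine_tessdata", "-e", str(start_traineddata), str(start_lstm)],
        # Step 6: the single slow lstmtraining step (no mftraining/cntraining).
        ["lstmtraining", f"--continue_from={start_lstm}",
         f"--traineddata={start_traineddata}",
         f"--train_listfile={listfile}",
         f"--model_output={model}",
         f"--max_iterations={max_iterations}"],
        # Step 7: combine the last checkpoint into a new traineddata file.
        ["lstmtraining", "--stop_training",
         f"--continue_from={model}_checkpoint",
         f"--traineddata={start_traineddata}",
         f"--model_output={model}.traineddata"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in lstm_training_commands("training_text.txt", "Arial",
                                      "work", "eng.traineddata"):
        print(" ".join(cmd))
```

The list file passed to `--train_listfile` has to exist before step 6; it should name the `.lstmf` file produced in step 5. Since fonts can be mixed freely, steps 2 and 5 could be repeated once per font and every resulting `.lstmf` added to that list.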

