See the wiki page on training Tesseract 3.03-3.05: https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05
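To check a candidate training text against the per-character minimum raised in the question quoted below, a minimal Python sketch might help. It is not part of Tesseract: the function name `rare_characters` is hypothetical, and the default threshold of 10 is simply the figure from the question.

```python
from collections import Counter


def rare_characters(text, min_count=10):
    """Return {character: count} for characters seen fewer than min_count times.

    Whitespace is ignored. min_count=10 is an assumption taken from the
    question in this thread, not a value read from any Tesseract tool.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    return {ch: n for ch, n in counts.items() if n < min_count}


def report(path, min_count=10):
    """Print under-represented characters in a UTF-8 text file, rarest first."""
    with open(path, encoding="utf-8") as handle:
        rare = rare_characters(handle.read(), min_count)
    for ch, n in sorted(rare.items(), key=lambda item: item[1]):
        print(f"{ch!r}: {n}")
```

Calling `report("eng.training_text")` on a draft file lists the characters that still need more samples. A check like this only verifies coverage; it does not make a bare word list representative of real text.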
ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Sat, Apr 7, 2018 at 4:02 PM, Romil Mehla <[email protected]> wrote:

> Thanks for your reply. I have read about Tesseract 4.0, and Ray mentioned
> how he used so many files to train it, but I don't want to use
> Tesseract 4.0; I want to know about Tesseract 3.05.00. From my
> understanding, taking English as an example, the eng.training_text file is
> built from the eng.wordlist file in langdata. For a new language, how can I
> build a training text from my new language's wordlist? Any idea who
> created the eng.training_text file? Is there a rule or algorithm for doing
> so, or is it randomly generated from eng.wordlist while maintaining a
> minimum of 10 occurrences of each character in the training text?
>
> Please clarify this, and please let me know how to generate the
> training_text.
>
> On Saturday, April 7, 2018 at 3:46:10 PM UTC+5:30, shree wrote:
>>
>> Just a word list is not enough for training text.
>>
>> For tesseract 4.0.0 it needs to be representative of the text to be
>> recognized.
>>
>> On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, <[email protected]> wrote:
>>
>>> Is there any program to generate it? I see ambiguous_words.cpp
>>> generating dictionary words and ambiguous words; where is it used? Or
>>> can it be used to build the unicharambigs file to generate rules?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/2ce880b4-b750-4be9-a1a0-01f832f679df%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

