Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-09 Thread Romil Mehla
Thanks Shree, but if Tesseract is open source, why can't the developers answer questions? If I train my model on random text, how can I arrive at a reliable accuracy figure for it? The model's accuracy will then be random too. I want to know the reason for the conditions imposed on the training text, how much

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-09 Thread ShreeDevi Kumar
For Tesseract 3.05, random text will work; it is suggested to use combinations similar to the English training text. It is unlikely you will get answers to your questions from the developers. You can search past issues and questions in the forum and on GitHub. 3.05 training does not take long, so run a few experiments
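As a rough illustration of the "random text" suggestion above, here is a minimal sketch that builds lines of text by picking words at random from a wordlist. It is not a Tesseract tool; the helper name `make_training_text`, its parameters, and the sample words are all hypothetical, and whether such text is good enough for a given language is something only experiments will show.

```python
import random


def make_training_text(words, num_lines=50, words_per_line=10, seed=0):
    """Return lines of randomly chosen words (hypothetical helper, not part of Tesseract)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    lines = []
    for _ in range(num_lines):
        line = " ".join(rng.choice(words) for _ in range(words_per_line))
        lines.append(line)
    return "\n".join(lines)


# Placeholder wordlist; in practice this would be read from your own wordlist file.
sample_words = ["alpha", "beta", "gamma", "delta"]
text = make_training_text(sample_words, num_lines=3, words_per_line=5)
print(text)
```

The output could then be saved as a plain-text file and compared against the English training text in langdata to see how closely the character mix matches.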

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-09 Thread Romil Mehla
Hi Shree, thanks for replying. For Tesseract *3.05.00*, I had already checked that link. It says: *"Make sure there are a minimum number of samples of each character. 10 is good, but 5 is OK for rare characters.* *There should be more samples of the more frequent characters - at least
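The per-character minimum quoted above can be checked mechanically before training. The following is a minimal sketch, assuming the training text is available as a Python string; the function name `undersampled_chars` and the sample string are hypothetical, and the threshold of 10 is taken from the quoted guideline (5 could be passed instead for rare characters).

```python
from collections import Counter


def undersampled_chars(text, min_count=10):
    """Return {character: count} for non-space characters seen fewer than min_count times."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return {ch: n for ch, n in counts.items() if n < min_count}


# Tiny made-up sample: 'a' appears 12 times, 'b' 6 times, 'c' twice.
sample = "aaaaaaaaaaaa bbbbbb cc"
low = undersampled_chars(sample, min_count=10)
print(sorted(low.items()))
```

Any character reported here would need more occurrences in the training text before it meets the guideline.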

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-07 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-07 Thread Romil Mehla
Thanks for your reply. I have read about Tesseract 4.0, and Ray mentioned how he used so many files to train it, but I don't want to use Tesseract 4.0; I want to know about Tesseract 3.05.00. From my understanding, taking the eng language as an example, the eng.training_text file is built from

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-07 Thread ShreeDevi Kumar
A word list alone is not enough for training text. For Tesseract 4.0.0, the training text needs to be representative of the text to be recognized.

[tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-07 Thread Romil Mehla
Is there any program to generate it? I see ambiguous_words.cpp generating dictionary words and ambiguous words; where is it used? Or can it be used to build the unicharambigs file to generate rules?