Re: [tesseract-ocr] How to regenerate the training text

2017-06-15 Thread ShreeDevi Kumar
You can also see https://ancientgreekocr.org/ for Nick White's method of creating training data for Ancient Greek.

Re: [tesseract-ocr] How to regenerate the training text

2017-06-15 Thread ShreeDevi Kumar
> Where are these scripts, or how can I otherwise generate training text from dictionary/corpus data?
These are (most probably) internal scripts at Google which have not been open-sourced. Please see

[tesseract-ocr] How to regenerate the training text

2017-06-15 Thread Dingyuan Wang
Dear all, I'm trying to generate a training text (chi_sim) for training Tesseract, because I have a better dictionary and better unigram/bigram data than the defaults. I've found the following comment in training/language-specific.sh (line 845):
# Set language-specific values for several global
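
For what it's worth, one simple way to build a training text from unigram data alone is to sample words in proportion to their frequency and join them into lines. Below is a minimal sketch in Python. It is not the method used to produce the stock training texts (those scripts are not public, per the replies in this thread), and the function name, its parameters, and the toy frequency table are all hypothetical. Bigram data is ignored here; for chi_sim the words are joined without spaces.

```python
import random


def generate_training_text(unigrams, num_lines=5, words_per_line=10,
                           separator="", seed=0):
    """Sample words in proportion to their frequency and join them into lines.

    unigrams: dict mapping word -> frequency count (hypothetical input format).
    separator: "" suits Chinese text; use " " for space-delimited languages.
    """
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    words = list(unigrams)
    weights = [unigrams[w] for w in words]
    lines = []
    for _ in range(num_lines):
        # random.choices draws with replacement, weighted by frequency
        sampled = rng.choices(words, weights=weights, k=words_per_line)
        lines.append(separator.join(sampled))
    return lines


# Toy frequency table for illustration only; real data would come from
# your own dictionary/unigram files.
toy_unigrams = {"我们": 50, "中国": 30, "文字": 15, "识别": 5}

for line in generate_training_text(toy_unigrams, num_lines=3, words_per_line=6):
    print(line)
```

Frequency-weighted sampling keeps common words common in the output, but it produces no real sentences, so it is only a rough stand-in for text drawn from a corpus.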