+ tesseract-dev Google group

Thank you, Marco. I will download the training tools packages and give them a try.

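As a first sanity check after installing, something like the shell sketch below could confirm that the files landed where Marco's listing (quoted further down) says they should. The function name `check_training_layout` is hypothetical. The paths for `font_properties`, `eng.training_text` and the two test images come from that listing. The assumption that every tesseract-training-{lang} package follows the same `{lang}/{lang}.training_text` layout as eng is mine, since only the eng listing is shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): report which of the
# expected training files are missing under a tessdata root.
#   $1 = tessdata root, e.g. /usr/share/tessdata
#   $2 = language code, e.g. eng
# ASSUMPTION: each language package ships training/<lang>/<lang>.training_text,
# as the eng package does in the listing.
check_training_layout() {
    root=$1
    lang=$2
    missing=0
    for f in \
        "$root/training/font_properties" \
        "$root/training/$lang/$lang.training_text" \
        "$root/testing/eurotext.tif" \
        "$root/testing/phototest.tif"
    do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # 0 when every file is present, 1 otherwise
    return "$missing"
}
```

On Cygwin this would be called as `check_training_layout /usr/share/tessdata eng`. It prints one line per missing file and returns non-zero if anything is absent.
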
In future updates to the tesseract package, may I suggest packaging more languages from 'tessdata' (https://github.com/tesseract-ocr/tessdata), especially the ones that have multiple files per language, such as ara, hin, etc. Languages that have just one traineddata file can be downloaded easily as a zip from the 'raw' link. It would be very helpful to have a single tar/zip for the others.

Thanks so much for packaging 3.04.00 for cygwin.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Aug 2, 2015 at 12:53 PM, Marco Atzeri <[email protected]> wrote:

> On 7/29/2015 11:40 AM, ShreeDevi Kumar wrote:
>
>> Marco,
>>
>> Thanks for building the training tools for cygwin. Till now just the
>> additional binaries have been shipped as part of the tesseract package.
>>
>> With Tesseract 3.04.00 there are additional scripts provided to help
>> with training. Google has also provided the language data which can be
>> used for training different languages and building the traineddata
>> files. Hence my request to include these.
>>
>> Not all users will be interested in training for a new language or
>> trying to improve an existing traineddata, so in my opinion, it may be
>> better to package these separately.
>>
>
> Hi ShreeDevi
> uploading 3.04.00-2.
>
> The training tools are in the new package
> tesseract-training-util
>
> while the training language files are split between
> tesseract-training-core
> tesseract-training-{lang}
>
> I have not changed the previous data structure,
> just added an additional level
> /usr/share/tessdata/training
>
> and the two test files are in
> /usr/share/tessdata/testing/eurotext.tif
> /usr/share/tessdata/testing/phototest.tif
>
>
> $ cygcheck -l tesseract-training-util
> /usr/bin/ambiguous_words.exe
> /usr/bin/classifier_tester.exe
> /usr/bin/cntraining.exe
> /usr/bin/combine_tessdata.exe
> /usr/bin/dawg2wordlist.exe
> /usr/bin/mftraining.exe
> /usr/bin/set_unicharset_properties.exe
> /usr/bin/shapeclustering.exe
> /usr/bin/text2image.exe
> /usr/bin/unicharset_extractor.exe
> /usr/bin/wordlist2dawg.exe
> /usr/bin/language-specific.sh
> /usr/bin/tesstrain.sh
> /usr/bin/tesstrain_utils.sh
>
> $ cygcheck -l tesseract-training-core
> /usr/share/tessdata/training/Arabic.unicharset
> /usr/share/tessdata/training/Arabic.xheights
> ...
> /usr/share/tessdata/training/Cherokee.xheights
> /usr/share/tessdata/training/common.punc
> /usr/share/tessdata/training/common.unicharambigs
> /usr/share/tessdata/training/Common.unicharset
> /usr/share/tessdata/training/Cyrillic.unicharset
> ...
> /usr/share/tessdata/training/Ethiopic.xheights
> /usr/share/tessdata/training/font_properties
> /usr/share/tessdata/training/forbidden_characters_default
> /usr/share/tessdata/training/Georgian.unicharset
> ...
> /usr/share/tessdata/training/Tibetan.unicharset
>
> $ cygcheck -l tesseract-training-eng
> /usr/share/tessdata/training/eng/desired_characters
> /usr/share/tessdata/training/eng/eng.cube-unicharset
> /usr/share/tessdata/training/eng/eng.cube-word-dawg
> /usr/share/tessdata/training/eng/eng.numbers
> /usr/share/tessdata/training/eng/eng.punc
> /usr/share/tessdata/training/eng/eng.training_text
> /usr/share/tessdata/training/eng/eng.training_text.bigram_freqs
> /usr/share/tessdata/training/eng/eng.training_text.unigram_freqs
> /usr/share/tessdata/training/eng/eng.unicharambigs
> /usr/share/tessdata/training/eng/eng.word.bigrams
> /usr/share/tessdata/training/eng/eng.wordlist
>
> Regards
> Marco
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/55BDC558.2090205%40gmail.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWr4%3Dw1%3D024PCj5eKBYs_b3Jx3DOtgGp4UonwyB5EO7Rg%40mail.gmail.com.

