Hi Marco,

Could you please link to a tutorial on how to create all the language-specific files?
Thanks!
Mikayel

On Monday, December 14, 2015 at 2:02:17 PM UTC+4, marco atzeri wrote:
> Hi,
> I updated both arch (x86 and x86_64) packages to 3.04.00-3.
>
> The tesseract-training-util packages now contain
> the scripts taken from the development repository and should work correctly.
>
> Mini HOWTO, using the Kan language files provided by Sriranga
> as an example:
>
> 1) packages to be installed
>
> tesseract-ocr
> tesseract-training-util
> tesseract-training-core
>
> in addition, the specific font needed for the language:
> lohit-kannada-fonts
>
> 2) copied directory "/usr/share/tessdata/training"
> to a working area.
> In my case "/pub/devel/tesseract/training"
>
> 3) added the kan subdirectory with the specific language files
>
> training/kan/desired_characters
> training/kan/kan.config
> training/kan/kan.numbers
> training/kan/kan.punc
> training/kan/kan.training_text
> training/kan/kan.training_text.bigram_freqs
> training/kan/kan.training_text.train_ngrams
> training/kan/kan.training_text.unigram_freqs
> training/kan/kan.unicharambigs
> training/kan/kan.word.bigrams
> training/kan/kan.wordlist
>
> 4) command for training
>
> tesstrain.sh --lang kan --langdata_dir /pub/devel/tesseract/training \
>   --tessdata_dir /usr/share/tessdata/ --fontlist "Lohit Kannada" \
>   --training_text /pub/devel/tesseract/training/kan/kan.training_text
>
> As a result, the output file is located at
>
> /tmp/tesstrain/tessdata/kan.traineddata
>
> and the log of the run can be found at
>
> /tmp/tmp<random-name>/kan/tesstrain.log
>
> Hope this helps
>
> Regards
> Marco
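Steps 2 to 4 of the quoted HOWTO could be wrapped in a small script along these lines. This is only a sketch: the function names (lang_files, missing_lang_files, train_lang) are hypothetical helpers and not part of Tesseract, and the file list is simply the one given for kan in step 3, with the assumption that other languages follow the same naming. The tesstrain.sh options are the ones from the quoted command.

```shell
#!/bin/sh
# Sketch of steps 2-4 from the quoted mini HOWTO.
# The helper functions below are hypothetical, not shipped with Tesseract.

# Language files listed in step 3, relative to <langdata_dir>/<lang>/.
# Assumption: other languages use the same names with their own prefix.
lang_files() {
    lang=$1
    printf '%s\n' \
        desired_characters \
        "$lang.config" \
        "$lang.numbers" \
        "$lang.punc" \
        "$lang.training_text" \
        "$lang.training_text.bigram_freqs" \
        "$lang.training_text.train_ngrams" \
        "$lang.training_text.unigram_freqs" \
        "$lang.unicharambigs" \
        "$lang.word.bigrams" \
        "$lang.wordlist"
}

# Print each expected file that is missing; print nothing if all exist.
missing_lang_files() {
    langdata_dir=$1
    lang=$2
    lang_files "$lang" | while IFS= read -r f; do
        [ -e "$langdata_dir/$lang/$f" ] || printf '%s\n' "$f"
    done
}

# Run the training command from step 4, but only if no file is missing.
train_lang() {
    langdata_dir=$1
    lang=$2
    font=$3
    missing=$(missing_lang_files "$langdata_dir" "$lang")
    if [ -n "$missing" ]; then
        printf 'missing in %s/%s:\n%s\n' "$langdata_dir" "$lang" "$missing" >&2
        return 1
    fi
    tesstrain.sh --lang "$lang" --langdata_dir "$langdata_dir" \
        --tessdata_dir /usr/share/tessdata/ --fontlist "$font" \
        --training_text "$langdata_dir/$lang/$lang.training_text"
}

# Example, not run automatically (paths as in the quoted message):
#   cp -r /usr/share/tessdata/training /pub/devel/tesseract/
#   train_lang /pub/devel/tesseract/training kan "Lohit Kannada"
```

Checking the file list first is a design choice of this sketch, so that a missing file is reported before tesstrain.sh is started at all.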

