Hi,
I updated the packages for both architectures (x86 and x86_64) to 3.04.00-3.
The tesseract-training-util package now contains
the scripts taken from the development repository and should work correctly.
Here is a mini HOWTO, using as an example the Kan language files
provided by Sriranga:
1) Packages to install:
tesseract-ocr
tesseract-training-util
tesseract-training-core
plus the specific font needed for the language:
lohit-kannada-fonts
2) Copy the directory "/usr/share/tessdata/training"
to a working area.
In my case: "/pub/devel/tesseract/training"
3) Add the kan subdirectory with the language-specific files:
training/kan/desired_characters
training/kan/kan.config
training/kan/kan.numbers
training/kan/kan.punc
training/kan/kan.training_text
training/kan/kan.training_text.bigram_freqs
training/kan/kan.training_text.train_ngrams
training/kan/kan.training_text.unigram_freqs
training/kan/kan.unicharambigs
training/kan/kan.word.bigrams
training/kan/kan.wordlist
4) Command for training:
tesstrain.sh --lang kan --langdata_dir /pub/devel/tesseract/training \
  --tessdata_dir /usr/share/tessdata/ --fontlist "Lohit Kannada" \
  --training_text /pub/devel/tesseract/training/kan/kan.training_text
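Steps 2) and 3) can be sketched as a small shell helper that runs before the training command. This is only a sketch: the function names copy_training_dir and check_langdata are made up for illustration, and the file list is simply the one given in step 3).

```shell
#!/bin/sh
# Sketch of steps 2) and 3): copy the packaged training directory to a
# working area, then check that the language files are in place.
# Both function names are hypothetical helpers, not part of tesseract.

# copy_training_dir SRC DEST: copy SRC to DEST unless DEST already exists.
copy_training_dir() {
    src=$1
    dest=$2
    if [ -e "$dest" ]; then
        echo "working area already exists: $dest" >&2
        return 0
    fi
    mkdir -p "$(dirname "$dest")" && cp -r "$src" "$dest"
}

# check_langdata DIR LANG: print each expected file that is missing and
# return non-zero if any is absent.
check_langdata() {
    dir=$1
    lang=$2
    missing=0
    for f in desired_characters \
             "$lang.config" "$lang.numbers" "$lang.punc" \
             "$lang.training_text" \
             "$lang.training_text.bigram_freqs" \
             "$lang.training_text.train_ngrams" \
             "$lang.training_text.unigram_freqs" \
             "$lang.unicharambigs" "$lang.word.bigrams" "$lang.wordlist"
    do
        if [ ! -f "$dir/$lang/$f" ]; then
            echo "missing: $dir/$lang/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example with the paths from this message (uncomment to run):
# copy_training_dir /usr/share/tessdata/training /pub/devel/tesseract/training
# check_langdata /pub/devel/tesseract/training kan
```

If check_langdata prints nothing and returns 0, the working area has the same layout as described in step 3).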
The resulting output file is located at
/tmp/tesstrain/tessdata/kan.traineddata
and the log of the run can be found at
/tmp/tmp<random-name>/kan/tesstrain.log
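Since the log directory has a random name, a small helper can pick the most recent log. This is a sketch only: latest_tesstrain_log is a made-up name, and it assumes the log path follows the pattern shown above.

```shell
#!/bin/sh
# latest_tesstrain_log LANG [TMPROOT]: print the most recently modified
# tesstrain.log under TMPROOT/tmp*/LANG/, or return non-zero if none exists.
# Hypothetical helper; TMPROOT defaults to /tmp as in this message.
latest_tesstrain_log() {
    lang=$1
    root=${2:-/tmp}
    # ls -t sorts newest first; errors are silenced when nothing matches.
    log=$(ls -t "$root"/tmp*/"$lang"/tesstrain.log 2>/dev/null | head -n 1)
    [ -n "$log" ] && printf '%s\n' "$log"
}

# Example (uncomment to run):
# log=$(latest_tesstrain_log kan) && tail -n 20 "$log"
# ls -l /tmp/tesstrain/tessdata/kan.traineddata
```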
Hope this helps.
Regards
Marco