Hi,
I updated the packages for both architectures (x86 and x86_64) to 3.04.00-3.

The tesseract-training-util package now contains
the scripts taken from the development repository and should work correctly.

Mini HOWTO, using the kan (Kannada) language files provided by Sriranga
as an example:

1) Packages to be installed:

  tesseract-ocr
  tesseract-training-util
  tesseract-training-core

In addition, the specific font needed for the language:
  lohit-kannada-fonts
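
Before training, it may be worth confirming that the font is actually visible on the system. The following is only a sketch: has_font is a hypothetical helper name, and the example in the comments assumes that fontconfig's fc-list is installed, which this post does not state.

```shell
#!/bin/sh
# Sketch: check whether a font family appears in a list of font names.
# has_font is a hypothetical helper; it reads the font list on stdin
# and does a case-insensitive, fixed-string match on its first argument.
has_font() {
    grep -qiF -- "$1"
}

# Example (assumes fontconfig's fc-list is available):
#   fc-list : family | has_font "Lohit Kannada" && echo "font found"
```

The helper reads from stdin so that it can be fed from whatever tool lists the fonts on your system.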


2) Copy the directory "/usr/share/tessdata/training"
   to a working area.
   In my case: "/pub/devel/tesseract/training"
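
As a rough sketch of step 2, the copy could be wrapped in a small function. copy_training_dir is a hypothetical name, and refusing to overwrite an existing destination is an assumption about what you want, not something the packages require.

```shell
#!/bin/sh
# Sketch: copy the packaged training directory to a writable working area.
# copy_training_dir is a hypothetical helper name.
copy_training_dir() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    # Refuse to touch an existing destination, so that cp -R does not
    # silently nest the copy inside it.
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")" && cp -R "$src" "$dest"
}

# Example, using the paths from this post:
#   copy_training_dir /usr/share/tessdata/training /pub/devel/tesseract/training
```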


3) Add the kan subdirectory with the specific language files:

  training/kan/desired_characters
  training/kan/kan.config
  training/kan/kan.numbers
  training/kan/kan.punc
  training/kan/kan.training_text
  training/kan/kan.training_text.bigram_freqs
  training/kan/kan.training_text.train_ngrams
  training/kan/kan.training_text.unigram_freqs
  training/kan/kan.unicharambigs
  training/kan/kan.word.bigrams
  training/kan/kan.wordlist
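
A quick check that all the files listed above are in place can save a failed run. This is only a sketch: check_langdata is a hypothetical helper, and it simply tests for the file names in the list above. Whether tesstrain.sh strictly needs every one of them is not something I have verified.

```shell
#!/bin/sh
# Sketch: verify that the language files listed in step 3 are present.
# check_langdata is a hypothetical helper name.
#   $1 = working training directory, $2 = language code
check_langdata() {
    dir=$1
    lang=$2
    missing=0
    for f in desired_characters \
             "$lang.config" "$lang.numbers" "$lang.punc" \
             "$lang.training_text" \
             "$lang.training_text.bigram_freqs" \
             "$lang.training_text.train_ngrams" \
             "$lang.training_text.unigram_freqs" \
             "$lang.unicharambigs" "$lang.word.bigrams" "$lang.wordlist"
    do
        if [ ! -f "$dir/$lang/$f" ]; then
            echo "missing: $dir/$lang/$f" >&2
            missing=1
        fi
    done
    # Returns 0 only if every listed file was found.
    return "$missing"
}

# Example, using the working area from this post:
#   check_langdata /pub/devel/tesseract/training kan && echo "all files present"
```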

4) Command for training:

tesstrain.sh --lang kan --langdata_dir /pub/devel/tesseract/training --tessdata_dir /usr/share/tessdata/ --fontlist "Lohit Kannada" --training_text /pub/devel/tesseract/training/kan/kan.training_text
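
The same command can be parameterised for other languages. The sketch below uses only the options shown in the command above. run_training and the TESSTRAIN variable are hypothetical names introduced here for illustration, and the layout <langdata_dir>/<lang>/<lang>.training_text is assumed from the kan example.

```shell
#!/bin/sh
# Sketch: parameterised wrapper around the training command above.
# run_training is a hypothetical helper name.
#   $1 = language code, $2 = working training directory,
#   $3 = font name, $4 = tessdata directory (optional)
# TESSTRAIN is a hypothetical override hook; set TESSTRAIN=echo for a
# dry run that only prints the arguments.
run_training() {
    lang=$1
    langdata_dir=$2
    font=$3
    tessdata_dir=${4:-/usr/share/tessdata/}
    "${TESSTRAIN:-tesstrain.sh}" \
        --lang "$lang" \
        --langdata_dir "$langdata_dir" \
        --tessdata_dir "$tessdata_dir" \
        --fontlist "$font" \
        --training_text "$langdata_dir/$lang/$lang.training_text"
}

# Example, equivalent to the command in this post:
#   run_training kan /pub/devel/tesseract/training "Lohit Kannada"
```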

As a result, the output file is located at

/tmp/tesstrain/tessdata/kan.traineddata

and the log of the run can be found at

/tmp/tmp<random-name>/kan/tesstrain.log
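
Since the log directory name is random, a small helper can pick out the most recent log and confirm that the output file was written. This is a sketch under the path layout described above. latest_log and check_traineddata are hypothetical helper names, and a non-empty file is only a weak sign of a successful run.

```shell
#!/bin/sh
# Sketch: locate the newest tesstrain.log and check the output file.
# latest_log and check_traineddata are hypothetical helper names.

# Print the most recently modified <base>/tmp*/<lang>/tesstrain.log,
# or nothing if no such file exists.
#   $1 = base directory, $2 = language code
latest_log() {
    ls -t "$1"/tmp*/"$2"/tesstrain.log 2>/dev/null | head -n 1
}

# Succeed only if the given file exists and is not empty.
check_traineddata() {
    [ -s "$1" ]
}

# Example, using the paths from this post:
#   latest_log /tmp kan
#   check_traineddata /tmp/tesstrain/tessdata/kan.traineddata && echo "output present"
```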


Hope this helps.

Regards
Marco

