Hi Marco,

Could you please link a tutorial on how to generate/create all the
language-specific files?

Thanks!
Mikayel

On Monday, December 14, 2015 at 2:02:17 PM UTC+4, marco atzeri wrote:
>
> Hi, 
> I updated the packages for both architectures (x86 and x86_64) to 3.04.00-3.
>
> The tesseract-training-util package now contains
> the scripts taken from the development repository and should work correctly.
>
> Mini HOWTO, using the Kan language files provided by Sriranga
> as an example:
>
> 1) packages to be installed
>
>    tesseract-ocr 
>    tesseract-training-util 
>    tesseract-training-core 
>
> plus the specific font needed for the language:
>    lohit-kannada-fonts 
>
>
> 2) copied directory "/usr/share/tessdata/training" 
>     to a working area. 
>     In my case "/pub/devel/tesseract/training" 
>
>
> 3) added the kan subdirectory with the specific language files 
>
>    training/kan/desired_characters 
>    training/kan/kan.config 
>    training/kan/kan.numbers 
>    training/kan/kan.punc 
>    training/kan/kan.training_text 
>    training/kan/kan.training_text.bigram_freqs 
>    training/kan/kan.training_text.train_ngrams 
>    training/kan/kan.training_text.unigram_freqs 
>    training/kan/kan.unicharambigs 
>    training/kan/kan.word.bigrams 
>    training/kan/kan.wordlist 
>
> 4) command for training
>
> tesstrain.sh --lang kan --langdata_dir /pub/devel/tesseract/training \
>   --tessdata_dir /usr/share/tessdata/ --fontlist "Lohit Kannada" \
>   --training_text /pub/devel/tesseract/training/kan/kan.training_text
>
> As a result, the output file is located at
>
> /tmp/tesstrain/tessdata/kan.traineddata
>
> and the log of the run can be found at
>
> /tmp/tmp<random-name>/kan/tesstrain.log
>
>
> Hope this helps
>
> Regards 
> Marco 
>
>
>
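
Before running the training command from step 4, it may help to confirm that the language subdirectory from step 3 is complete. The following is only a rough sketch: the function name missing_lang_files is made up for illustration and is not part of the tesseract packages, and the file list simply mirrors the one quoted above (it is an assumption that every one of those files is needed for a given language).

```shell
#!/bin/sh
# Hypothetical helper, not part of tesseract-training-util.
# Prints the name of each expected language file that is absent.
#   $1 = working copy of the training directory (step 2)
#   $2 = language code, e.g. kan
missing_lang_files() {
    dir="$1/$2"
    lang="$2"
    # File names copied from the listing in step 3.
    for f in \
        desired_characters \
        "$lang.config" \
        "$lang.numbers" \
        "$lang.punc" \
        "$lang.training_text" \
        "$lang.training_text.bigram_freqs" \
        "$lang.training_text.train_ngrams" \
        "$lang.training_text.unigram_freqs" \
        "$lang.unicharambigs" \
        "$lang.word.bigrams" \
        "$lang.wordlist"
    do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example use, with the paths from the HOWTO:
#   missing=$(missing_lang_files /pub/devel/tesseract/training kan)
#   if [ -n "$missing" ]; then echo "missing: $missing"; fi
```

If nothing is printed, the directory matches the listing and the tesstrain.sh command from step 4 can be tried as quoted.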

