Hello,
I am trying to train Tesseract on the MS Mincho font (~14k characters)
so that it recognizes only that Japanese font.
All the training steps complete, but when I run recognition with the
resulting language I get the following error:
$ tesseract myjap.mincho.exp0.tif output -l myjap
Error: Size of unicharset is greater than MAX_NUM_CLASSES
Failed loading language 'myjap'
Tesseract couldn't load any languages!
Could not initialize tesseract.

I am using the latest SVN revision (715).

I don't know if it's related, but the "shapetable" contains only
one third of the shapes.

The tif file and the box file can be downloaded from:
http://dl.dropbox.com/u/64426696/train.zip
The font_properties file contains:
mincho 0 0 0 0 0

The commands I ran to train are:
tesseract myjap.mincho.exp0.tif myjap.mincho.exp0 nobatch box.train
unicharset_extractor myjap.mincho.exp0.box
shapeclustering -F font_properties -U unicharset myjap.mincho.exp0.tr
mftraining -F font_properties -U unicharset -O myjap.unicharset myjap.mincho.exp0.tr
cntraining myjap.mincho.exp0.tr
combine_tessdata myjap.
tesseract myjap.mincho.exp0.tif output -l myjap
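
In case it helps with diagnosis, here is a minimal sketch for comparing the size of the generated unicharset with the number of distinct symbols in the box file. It assumes that the first line of a unicharset file holds its entry count and that the first field of each box-file line is the symbol. The function names unicharset_size and box_symbols are made up for this example and are not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the Tesseract tools.

# Print the entry count of a unicharset file.
# Assumption: the first line of the file is the number of entries.
unicharset_size() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# Print the number of distinct symbols in a box file.
# Assumption: each non-empty line starts with the symbol, followed by
# its coordinates. LC_ALL=C makes sort compare bytes, so different
# multi-byte characters are never treated as equal.
box_symbols() {
    awk 'NF { print $1 }' "$1" | LC_ALL=C sort -u | wc -l | tr -d '[:space:]'
}

# Example usage with the files from the commands above:
#   unicharset_size myjap.unicharset
#   box_symbols myjap.mincho.exp0.box
```

If the first number is larger than the MAX_NUM_CLASSES value compiled into the build, that would be consistent with the error message above.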


Do you have any suggestions?

Thanks
Andrea
