Hello,

I am trying to train the MS Mincho font (~14k characters) so that Tesseract recognizes only that Japanese font. Training runs through all the steps, but when I try to run recognition with the result I get the following error:

    $ tesseract myjap.mincho.exp0.tif output -l myjap
    Error: Size of unicharset is greater than MAX_NUM_CLASSES
    Failed loading language 'myjap'
    Tesseract couldn't load any languages!
    Could not initialize tesseract.
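For anyone trying to reproduce this, here is a minimal sketch for counting the distinct symbols in a box file, to get a rough idea of how large the resulting unicharset will be. It assumes each box line starts with the symbol followed by whitespace-separated coordinates. The names `count_box_symbols` and `exceeds_limit` are hypothetical helpers, and `limit` is a value supplied by the caller, not the actual MAX_NUM_CLASSES constant, which would have to be looked up in the Tesseract source for the revision in use. The generated unicharset may also contain a few extra entries beyond the symbols counted here.

```python
from collections import Counter


def count_box_symbols(lines):
    """Count how often each symbol appears in an iterable of box-file lines.

    Assumption: every non-empty line has the form
    '<symbol> <left> <bottom> <right> <top> [<page>]'.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines (need a symbol plus four coordinates).
        if len(fields) < 5:
            continue
        counts[fields[0]] += 1
    return counts


def exceeds_limit(counts, limit):
    """Return True if the number of distinct symbols is greater than `limit`.

    `limit` is a caller-supplied number, not read from Tesseract itself.
    """
    return len(counts) > limit
```

A possible use would be to open `myjap.mincho.exp0.box` with `encoding="utf-8"`, pass the file object to `count_box_symbols`, and compare `len(counts)` with the limit found in the source.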
I am using the latest svn revision: 715.

I don't know if it's related, but the "shapetable" contains only one third of the shapes.

The tif file and the box file can be downloaded from:
http://dl.dropbox.com/u/64426696/train.zip

font_properties content:

    mincho 0 0 0 0 0

The list of commands executed to train is:

    tesseract myjap.mincho.exp0.tif myjap.mincho.exp0 nobatch box.train
    unicharset_extractor myjap.mincho.exp0.box
    shapeclustering -F font_properties -U unicharset myjap.mincho.exp0.tr
    mftraining -F font_properties -U unicharset -O myjap.unicharset myjap.mincho.exp0.tr
    cntraining myjap.mincho.exp0.tr
    combine_tessdata myjap.
    tesseract myjap.mincho.exp0.tif output -l myjap

Do you have any suggestions?

Thanks,
Andrea

