I am working on Yoruba OCR using Tesseract 3.02. I followed the steps on the wiki, referring to Cedric's guide <http://blog.cedric.ws/how-to-train-tesseract-301>, and all the training steps complete. However, running Tesseract converts my images of Yoruba text to all dashes (-), in proportion to the amount of text in the image. This happens even for the image I trained on. I used a very small sample of Yoruba text, and I realize I may not meet the minimum per-character requirement, because during mftraining I get a bunch of warnings like these:
Warning: no protos/configs for ò in CreateIntTemplates()
Warning: no protos/configs for w in CreateIntTemplates()
Warning: no protos/configs for ú in CreateIntTemplates()
Warning: no protos/configs for à in CreateIntTemplates()
...

Is there a way to build on the existing English training data? That is, I want to extend the existing English training data, because Yoruba uses most of the English characters plus three dozen additional non-English characters. The existing English characters should always be recognized.

I wanted to start with a small training image so that I could finish with minimal effort, run simple tests, and expand later. I've tried both running the commands manually and training within JTessBoxEditor, with the same end result. It would be nice to at least get some characters in the output.
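To check whether my sample really is too small, something like the following could count how many boxes each character has in the box file. This is only a minimal sketch: it assumes the usual Tesseract 3.x box line layout (symbol, left, bottom, right, top, page), and the function names, the inline sample lines, and the `minimum` threshold are all made up for illustration, not values taken from the Tesseract documentation.

```python
from collections import Counter


def count_box_samples(box_lines):
    """Count how many boxes each symbol has in Tesseract box-file lines."""
    counts = Counter()
    for line in box_lines:
        parts = line.split()
        # Assumed layout: symbol left bottom right top page
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        counts[parts[0]] += 1
    return counts


def under_sampled(counts, minimum=5):
    """Return symbols seen fewer than `minimum` times (hypothetical threshold)."""
    return sorted(sym for sym, n in counts.items() if n < minimum)


# Illustrative box lines, not real training data.
sample = [
    "ò 10 20 30 40 0",
    "w 35 20 55 40 0",
    "a 60 20 80 40 0",
    "a 85 20 105 40 0",
]
counts = count_box_samples(sample)
print(under_sampled(counts, minimum=2))
```

With a real box file, the lines could be read with `open(path, encoding="utf-8")` and passed to `count_box_samples`; any character that shows up in the under-sampled list would be a candidate for more training text.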

