Try using the training text from the following repositories and see if it helps:

https://code.google.com/r/shreeshrii-langdata/source/browse?name=asc
https://code.google.com/r/shreeshrii-langdata/source/browse?name=iast
https://code.google.com/r/shreeshrii-tessdata/source/browse?name=iast

You can use eng+your_language_code to recognize English text together with text in your language.

ShreeDevi
____________________________________________________________
Bhajans, kirtans and aartis (Hindu devotional songs) @ http://bhajans.ramparivar.com

On Thu, Dec 4, 2014 at 5:22 AM, Victor Williamson <[email protected]> wrote:
> I am working on Yoruba OCR using Tesseract 3.02. After following the steps
> on the wiki and referring to Cedric's guide
> <http://blog.cedric.ws/how-to-train-tesseract-301>, all the training
> goes through, but running Tesseract converts my images with Yoruba text to
> all dashes (-), proportional to the size of the text in the image. This
> happens even for the image I trained on. I used a very small sample of
> Yoruba text, and I realize I may not meet the minimum per-character
> requirement, because during mftraining I get a bunch of warnings like these:
>
> Warning: no protos/configs for ò in CreateIntTemplates()
> Warning: no protos/configs for w in CreateIntTemplates()
> Warning: no protos/configs for ú in CreateIntTemplates()
> Warning: no protos/configs for à in CreateIntTemplates()
> ...
>
> Is there a way to build on the existing English training data? That is, I
> want to extend the existing English training data, because Yoruba uses most
> of the English characters plus three dozen additional non-English
> characters. The existing English characters should always be recognized. I
> wanted to start with a small training image so that I could finish with
> minimal effort, run simple tests, and expand later.
>
> I've tried both manual commands and training within JTessBoxEditor, with
> the same end result. It would be nice to get at least some characters in
> the output.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/e23b7124-2df2-44a1-ab0d-5fdea104177e%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
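The eng+your_language_code suggestion in the reply amounts to passing several language codes, joined with "+", to tesseract's -l option. The Python sketch below only builds and runs that command line. It assumes the tesseract binary is on PATH and that the Yoruba data was installed as yor.traineddata; the code "yor" is an assumption, so substitute whatever prefix you trained with. build_tesseract_cmd and run_ocr are hypothetical helper names, not part of Tesseract.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_tesseract_cmd(image: str, output_base: str, langs: Sequence[str]) -> List[str]:
    """Build a tesseract command line that loads several traineddata files at once.

    The language codes are joined with '+', e.g. ["eng", "yor"] -> "eng+yor".
    """
    if not langs:
        raise ValueError("at least one language code is required")
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


def run_ocr(image: str, output_base: str, langs: Sequence[str]) -> None:
    """Run tesseract if it is installed; the text is written to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    subprocess.run(build_tesseract_cmd(image, output_base, langs), check=True)


if __name__ == "__main__":
    # "yor" is an assumed language code: use the prefix of your own
    # traineddata file in the tessdata directory.
    print(" ".join(build_tesseract_cmd("page.tif", "page", ["eng", "yor"])))
```

Running the script prints `tesseract page.tif page -l eng+yor`; calling run_ocr with the same arguments would make Tesseract write its result to page.txt.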
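The mftraining warnings quoted in the question ("no protos/configs for ò in CreateIntTemplates()") fit the asker's own suspicion that some characters have too few training samples. One way to check this before retraining is to count the boxes per character across the .box files. This sketch assumes the Tesseract 3.x box format (one glyph per line, followed by left, bottom, right, top and page number). count_box_samples, under_sampled and the MIN_SAMPLES value of 10 are illustrative assumptions, not Tesseract tools or official limits; check the training wiki for the recommended minimum.

```python
from collections import Counter
from typing import Dict, Iterable


def count_box_samples(box_lines: Iterable[str]) -> Counter:
    """Count how many boxes each glyph has in Tesseract 3.x box-file lines.

    Each line is assumed to look like: <glyph> <left> <bottom> <right> <top> <page>
    """
    counts: Counter = Counter()
    for line in box_lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        counts[parts[0]] += 1
    return counts


def under_sampled(counts: Counter, minimum: int) -> Dict[str, int]:
    """Return the glyphs that have fewer than `minimum` samples, sorted by glyph."""
    return {ch: n for ch, n in sorted(counts.items()) if n < minimum}


if __name__ == "__main__":
    import sys

    # Usage (file names are hypothetical): python count_boxes.py yor.font.exp0.box ...
    total: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            total += count_box_samples(fh)
    # MIN_SAMPLES is an assumed threshold; adjust it to the recommended minimum.
    MIN_SAMPLES = 10
    for ch, n in under_sampled(total, MIN_SAMPLES).items():
        print(f"{ch}\t{n}")
```

Any glyph this lists (for example ò, ú or à from the warnings) probably needs more examples in the training images before mftraining can build prototypes for it.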

