Try using the training text from the following repositories and see if it helps:

https://code.google.com/r/shreeshrii-langdata/source/browse?name=asc
https://code.google.com/r/shreeshrii-langdata/source/browse?name=iast


https://code.google.com/r/shreeshrii-tessdata/source/browse?name=iast

You can pass -l eng+your_language_code to recognize English text together
with text in your language.
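
As a rough sketch of the eng+your_language_code idea, the Python snippet below builds a Tesseract command line that joins several language codes with "+". The helper names, the file names page.png and page, and the code "yor" (standing in for whatever your own traineddata file is called) are assumptions for illustration, not something from the repositories above.

```python
import subprocess


def build_tesseract_command(image_path, output_base, languages):
    """Return the argument list for a multi-language Tesseract run."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several traineddata names joined with '+'.
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


def run_ocr(image_path, output_base, languages):
    """Run Tesseract; it writes the recognized text to <output_base>.txt."""
    cmd = build_tesseract_command(image_path, output_base, languages)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "yor" is a hypothetical name for your own traineddata file.
    print(" ".join(build_tesseract_command("page.png", "page", ["eng", "yor"])))
```

Each code must match a .traineddata file in your tessdata directory, so "yor" only works once your own training has produced yor.traineddata there.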



ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Thu, Dec 4, 2014 at 5:22 AM, Victor Williamson <[email protected]>
wrote:

> I am working on Yoruba OCR using Tesseract 3.02. After following the steps
> on the wiki and referring to Cedric
> <http://blog.cedric.ws/how-to-train-tesseract-301>, all the training
> goes through, but running Tesseract converts my images with Yoruba text to all
> dashes (-) proportional to the size of the text in the image. This happens
> even for the image I trained on. I used a very small sample of Yoruba text,
> and I realize I may not meet the minimum per-character requirement because
> during mftraining I get a bunch of warnings like these:
>
> Warning: no protos/configs for ò in CreateIntTemplates()
> Warning: no protos/configs for w in CreateIntTemplates()
> Warning: no protos/configs for ú in CreateIntTemplates()
> Warning: no protos/configs for à in CreateIntTemplates()
> ...
>
> Is there a way to build off the existing English training data? i.e. I
> want to extend the existing English training data because Yoruba uses most
> of the English characters plus 3 dozen additional special non-English
> characters. The existing English characters should always be recognized. I
> wanted to start with a small training image so that I could finish with
> minimal effort, run simple tests, and expand later.
>
> I've tried both manual commands and training within JTessBoxEditor, with
> the same end result. It would be nice to get at least some characters in
> the output.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/e23b7124-2df2-44a1-ab0d-5fdea104177e%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/e23b7124-2df2-44a1-ab0d-5fdea104177e%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

