[tesseract-ocr] Yoruba OCR

Victor Williamson Wed, 03 Dec 2014 23:21:50 -0800

I am working on Yoruba OCR using Tesseract 3.02. I followed the steps 
on the wiki, referring to Cedric 
<http://blog.cedric.ws/how-to-train-tesseract-301>, and all the training goes 
through, but running Tesseract converts my images with Yoruba text to all 
dashes (-), proportional to the size of the text in the image. This happens 
even for the image I trained on. I used a very small sample of Yoruba text, 
and I realize I may not meet the minimum per-character requirement, because 
during mftraining I get a bunch of warnings like:


Warning: no protos/configs for ò in CreateIntTemplates()
Warning: no protos/configs for w in CreateIntTemplates()
Warning: no protos/configs for ú in CreateIntTemplates()
Warning: no protos/configs for à in CreateIntTemplates()
...
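
One way to check whether a small sample falls short of the per-character minimum is to count how often each glyph appears in the box file. The following is a minimal sketch in Python, assuming the usual box-file line layout (glyph first, then the coordinates and page number). The function names, the sample data, and the threshold of 5 are illustrative assumptions, not values taken from Tesseract.

```python
from collections import Counter


def count_box_glyphs(box_text):
    """Count how many times each glyph appears in box-file text."""
    counts = Counter()
    for line in box_text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines (a glyph plus four coordinates at minimum).
        if len(parts) < 5:
            continue
        counts[parts[0]] += 1
    return counts


def rare_glyphs(counts, min_samples=5):
    """Return glyphs seen fewer than min_samples times (threshold is an assumption)."""
    return sorted(g for g, n in counts.items() if n < min_samples)


# Hypothetical box-file content, for illustration only.
sample = "ò 10 10 20 30 0\nw 22 10 34 30 0\nw 36 10 48 30 0\n"
counts = count_box_glyphs(sample)
print(rare_glyphs(counts, min_samples=2))
```

Glyphs reported by such a check are candidates for adding more samples to the training image before rerunning the training steps.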

Is there a way to build off the existing English training data? i.e. I want 
to extend the existing English training data because Yoruba uses most of 
the English characters plus 3 dozen additional special non-English 
characters. The existing English characters should always be recognized. I 
wanted to start with a small training image so that I could finish with 
minimal effort, run simple tests, and expand later.

I've tried both manual commands and the training within 
JTessBoxEditor, with the same end result. It would be nice to at least see 
some characters output.


