Check out the training sample files bundled with jTessBoxEditor, located 
under the samples\vie folder. The Vietnamese alphabet seems to share some 
characters with Yoruba. You can certainly adapt them to your language.
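
The "no protos/configs" warnings quoted below may simply mean that some 
characters appear too rarely in the training image. One way to check is to 
count the boxes per character in the .box file. This is only a minimal 
sketch, assuming the usual Tesseract 3.x box line layout of 
"<glyph> <left> <bottom> <right> <top> <page>". The function names, the 
sample coordinates, and the threshold of 2 are made up for illustration and 
are not an official minimum.

```python
from collections import Counter


def count_box_chars(lines):
    """Count how many boxes each glyph has in Tesseract box-file lines."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        # Assumed layout: <glyph> <left> <bottom> <right> <top> <page>
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        glyph = " ".join(parts[:-5])
        counts[glyph] += 1
    return counts


def rare_chars(counts, minimum):
    """Return the glyphs that have fewer than `minimum` boxes."""
    return sorted(ch for ch, n in counts.items() if n < minimum)


# Hypothetical box lines, for illustration only.
sample = [
    "ò 10 10 20 30 0",
    "w 22 10 35 30 0",
    "w 40 10 53 30 0",
    "a 60 10 70 30 0",
]
counts = count_box_chars(sample)
print(rare_chars(counts, minimum=2))
```

For a real run, pass an open file instead of the sample list, for example 
open("yor.font.exp0.box", encoding="utf-8"), where the file name is 
whatever your own box file is called. Characters that show up in the result 
are candidates for adding more samples to the training image.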

On Wednesday, December 3, 2014 5:52:04 PM UTC-6, Victor Williamson wrote:
>
> I am working on Yoruba OCR using Tesseract 3.02. I followed the steps 
> on the wiki and Cedric's guide 
> <http://blog.cedric.ws/how-to-train-tesseract-301>, and the training 
> completes, but running Tesseract converts my images of Yoruba text to all 
> dashes (-), proportional to the size of the text in the image. This happens 
> even for the image I trained on. I used a very small sample of Yoruba text, 
> and I realize I may not meet the minimum per-character requirement, because 
> during mftraining I get a bunch of warnings like these:
>
> Warning: no protos/configs for ò in CreateIntTemplates()
> Warning: no protos/configs for w in CreateIntTemplates()
> Warning: no protos/configs for ú in CreateIntTemplates()
> Warning: no protos/configs for à in CreateIntTemplates()
> ...
>
> Is there a way to build off the existing English training data? i.e. I 
> want to extend the existing English training data because Yoruba uses most 
> of the English characters plus 3 dozen additional special non-English 
> characters. The existing English characters should always be recognized. I 
> wanted to start with a small training image so that I could finish with 
> minimal effort, run simple tests, and expand later.
>
> I've tried both manual commands and the training feature within 
> jTessBoxEditor, with the same end result. It would be nice to get at least 
> some characters output.
>
>
>
