On Thu, Jan 24, 2013 at 02:01:03PM -0800, h12g wrote:
> As you can see, that is Traditional Mongolian. If I want to add a new
> language to tesseract, must I get a traindata file?

Yes, you need an existing traineddata file during training, but only
for the box generation step. Download and install the English one,
then forget about it :)
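
As a rough sketch of that box generation step, here is how the command
could be put together in Python. It assumes the command form shown in
the training wiki; "mon" and "myfont" are placeholder names I made up
for illustration, so substitute your own language code and font name.

```python
def makebox_command(lang, font, num=0, bootstrap_lang="eng"):
    """Build the tesseract command that writes a .box file for one image.

    lang and font are placeholders chosen by you; bootstrap_lang is the
    already installed traineddata used only to guess the initial boxes.
    """
    base = f"{lang}.{font}.exp{num}"
    return [
        "tesseract",
        f"{base}.tif",       # input training image
        base,                # output base name, giving <base>.box
        "-l", bootstrap_lang,
        "batch.nochop",
        "makebox",
    ]


cmd = makebox_command("mon", "myfont")
print(" ".join(cmd))
# To actually run it (needs tesseract installed and the .tif present):
#   import subprocess; subprocess.run(cmd, check=True)
```

The boxes it produces will mostly be wrong for a new script, so expect
to correct the box file by hand before moving on.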

> Traditional Mongolian's train data does not exist in the tesseract download
> list, so I will generate it from another existing language, such as English.
> After generating some related files, I then run combine_tessdata to crunch
> out a traindata file.

Don't worry, Tesseract won't reuse the existing English traineddata in
your new training file. Follow the instructions at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 and
you will make one from scratch.

> But for some of the training-related files, such as word-draw, I don't know
> how to use them or what they mean, and I can't find any documentation on them.

I suspect you mean word-dawg. This is described further down in the
training documentation, at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Dictionary_Data_(Optional)

Note that it is optional, so start with just the basic box/tif
parts, and then you can add things like dictionary files and
the unicharambigs file later.
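
When you do get to the dictionary part, the word-dawg is built from a
plain word list with the wordlist2dawg tool. A minimal sketch, assuming
the three-argument form (word list, output dawg, unicharset) described
in the 3.02 training documentation; "mon" and "words_list" are again
placeholder names for illustration.

```python
def wordlist2dawg_command(lang, wordlist, kind="word"):
    """Build the wordlist2dawg command for one dictionary file.

    kind is "word" for <lang>.word-dawg or "freq" for <lang>.freq-dawg.
    The unicharset is the one produced earlier in training.
    """
    return [
        "wordlist2dawg",
        wordlist,                  # UTF-8 text file, one word per line
        f"{lang}.{kind}-dawg",     # output file
        f"{lang}.unicharset",
    ]


print(" ".join(wordlist2dawg_command("mon", "words_list")))
```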

Hope this makes things clearer for you. Let me know if you have more
questions.

Nick
