On Thu, Jan 24, 2013 at 02:01:03PM -0800, h12g wrote:
> As you see that is Traditional Mongolian. If I want add a new language to
> tesseract , I must get a traindata?

Yes, you need an existing traineddata file during training, but only for
the box generation step. Download and install the English one, then forget
about it :)

> Traditional Mongolian's train data is not exist in tesseract download list. so
> I will generate it from another exist language, such as english. after
> generate
> some releated files than combine_tessdata, crunch a traindata file.

Don't worry, Tesseract won't use the existing English traineddata when it
builds your new one. Follow the instructions at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
and you will make one from scratch.

> But, some files releated train, such as word-draw, I don't know how to use it
> and what means in it, I cant find some document about it.

I suspect you mean word-dawg. This is described further down in the
training documentation, at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Dictionary_Data_(Optional)

Note that it is optional, so start with just the basic box/tif parts.
You can add things like dictionary files and the unicharambigs file
later.

Hope this makes things clearer for you. Let me know if you have more
questions.

Nick
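
For reference, here is a rough sketch of the command lines behind the steps
mentioned above: box generation (bootstrapped from the English data), the
training run on the box/tif pair, and the optional word-dawg. It only builds
and prints the commands, it does not run Tesseract. The language code "mon",
the font name "testfont" and the word list file name "words_list" are
placeholders chosen for illustration, and the command shapes are assumed to
match the TrainingTesseract3 wiki page, so check them against that page for
your Tesseract version.

```python
import shlex
from typing import List


def base_name(lang: str, font: str, num: int = 0) -> str:
    """Build the [lang].[fontname].exp[num] stem used for training files."""
    return f"{lang}.{font}.exp{num}"


def makebox_command(lang: str, font: str, num: int = 0,
                    bootstrap_lang: str = "eng") -> List[str]:
    """Box generation: the only step that needs an existing traineddata."""
    base = base_name(lang, font, num)
    return ["tesseract", f"{base}.tif", base,
            "-l", bootstrap_lang, "batch.nochop", "makebox"]


def train_command(lang: str, font: str, num: int = 0) -> List[str]:
    """Training run on the (hand-corrected) box/tif pair."""
    base = base_name(lang, font, num)
    return ["tesseract", f"{base}.tif", base, "box.train"]


def word_dawg_command(lang: str, word_list: str) -> List[str]:
    """Optional dictionary step: turn a plain word list into a word-dawg."""
    return ["wordlist2dawg", word_list,
            f"{lang}.word-dawg", f"{lang}.unicharset"]


if __name__ == "__main__":
    # "mon", "testfont" and "words_list" are hypothetical placeholders.
    commands = [
        makebox_command("mon", "testfont"),
        train_command("mon", "testfont"),
        word_dawg_command("mon", "words_list"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

The remaining steps (unicharset_extractor, the clustering tools and
combine_tessdata) are left out on purpose; the wiki page lists them in order.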