Hi,

For the last couple of days, I have been training Tesseract 3.0 to read Telugu. It all went relatively smoothly until I got to the line on the training page that said:
>> The resulting lang.traineddata goes in your tessdata directory. Tesseract
>> can then recognize text in your language (in theory) with the following:
>>
>> tesseract image.tif output -l lang

I had my tel.traineddata file ready alongside eng.traineddata in the /usr/local/share/tessdata folder. When I run

$ tesseract image.tif output

I do not have any problem, and the output is some Roman characters. But when I run

$ tesseract image.tif output -l tel

on the same input, I get nothing but "Segmentation fault". What could be the reason?

I am on an AMD Phenom X4 running 64-bit Ubuntu 10.10. As I was getting an error in mftraining, I used the bug fix suggested at ( http://code.google.com/p/tesseract-ocr/issues/detail?id=385 ). That's the only deviation from the standard training procedure given at ( http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Training_Procedure ).

The eng.traineddata is nearly 2 MB, whereas the tel.traineddata is nearly 6.5 MB.

Thanks for reading through.

rakeshvara

