Hi,

For the last couple of days, I have been training Tesseract 3.0 to
read Telugu. It all went relatively smoothly until I got to the line
in the training page that said,

>> The resulting lang.traineddata goes in your tessdata directory. Tesseract 
>> can then recognize text in your language (in theory) with the following:
>> >> tesseract image.tif output -l lang

I had my tel.traineddata file ready alongside the eng.traineddata in
the /usr/local/share/tessdata folder.
When I run
$ tesseract image.tif output
I do not have any problem, and the output is some Roman characters.
But when I run
$ tesseract image.tif output -l tel
I get a simple "Segmentation fault" for the same input.

What could be the reason?
I am on an AMD Phenom X4, Ubuntu 10.10, 64-bit.

As I was getting an error in mftraining, I used the bug fix suggested
at ( http://code.google.com/p/tesseract-ocr/issues/detail?id=385 ).
That's the only deviation from the standard training procedure given at
( http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Training_Procedure )

The eng.traineddata is nearly 2 MB, whereas the tel.traineddata is
nearly 6.5 MB.

Thanks for reading through.
rakeshvara
