Did you ever get it to work for Urdu? I am trying the same thing and would appreciate some help, please.
On Monday, November 3, 2008 7:23:04 AM UTC, Ainie wrote:
>
> Hi all
>
> I am working with the Urdu OCR. I came to know about Tesseract. I tried
> to train tesseract for the Urdu characters. In the training procedure's
> instruction, it is written that it cannot support the right to left
> writing style. I myself tried to training the simple alphabets of Urdu as
> follows:
>
> 1. I made the characters txt file with name UrduCharacters.txt with
>    utf8 encoding
> 2. Then from it TIF image is obtained and saved as UrduCharacters.tif
> 3. Run the tesseract command to makebox file
>    1. tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox
>    2. tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox
>
> I have tried the both the commands for training. In the second one the
> error occurs indicating the message that "Unable to locate Urdunichaset
> file"
> In the second one the boxfile is generated with four character which are
> ~, 7,7,! . If anyone has any idea about it please let me know.
>
> Regards
> Ainie
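For anyone following along, here is a minimal Python sketch of steps 1 and 3 from the quoted message. It only writes a UTF-8 character file and assembles the two command lines exactly as quoted; it does not run tesseract or render the TIF. The sample letters and the function names `write_character_file` and `makebox_command` are my own placeholders, not something from the original post.

```python
from pathlib import Path

# Hypothetical sample; the quoted post does not say which characters were used.
SAMPLE_LETTERS = ["ا", "ب", "پ", "ت", "ٹ"]


def write_character_file(path, letters=SAMPLE_LETTERS):
    """Step 1: save the characters, space-separated, as UTF-8 text."""
    Path(path).write_text(" ".join(letters) + "\n", encoding="utf-8")


def makebox_command(base, lang=None):
    """Step 3: build the argument list for the makebox run quoted above.

    Only arguments that appear in the quoted commands are used;
    `lang` adds the `-l` option from the second command.
    """
    cmd = ["tesseract", f"{base}.tif", base]
    if lang is not None:
        cmd += ["-l", lang]
    return cmd + ["batch.nochop", "makebox"]


if __name__ == "__main__":
    write_character_file("UrduCharacters.txt")
    # Print both variants so they can be compared or pasted into a shell.
    print(" ".join(makebox_command("UrduCharacters")))
    print(" ".join(makebox_command("UrduCharacters", lang="urd")))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later, once the TIF from step 2 exists.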

