Thanks for your immediate reply. I followed the instructions given on the wiki, but the procedure fails at the very first step, generating the box files. This is the main problem, and it is why I cannot proceed further. I need a developer's assistance to tell me whether there is a problem in my procedure, or where I have to change the code so that Tesseract can generate the box file with the Urdu character set.
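To make the failing step easier to reproduce, here is a rough Python sketch of the two makebox invocations quoted below, plus a check of what ends up in the box file. The helper names (`build_makebox_command`, `parse_box_text`, `non_ascii_glyphs`) are made up for illustration and are not part of Tesseract. The box line layout "glyph left bottom right top" is my assumption about the training format and should be checked against the wiki, and the sample coordinates are invented; only the four glyphs come from my actual output.

```python
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    glyph: str
    left: int
    bottom: int
    right: int
    top: int


def build_makebox_command(image: str, base: str, lang: Optional[str] = None) -> List[str]:
    """Assemble the makebox command line used in this thread (hypothetical helper)."""
    cmd = ["tesseract", image, base]
    if lang is not None:
        cmd += ["-l", lang]
    return cmd + ["batch.nochop", "makebox"]


def parse_box_text(text: str) -> List[Box]:
    """Parse box-file text, assuming each line is 'glyph left bottom right top'."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        try:
            left, bottom, right, top = (int(p) for p in parts[1:5])
        except ValueError:
            continue  # skip lines whose coordinates are not integers
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def non_ascii_glyphs(boxes: List[Box]) -> List[str]:
    """Return glyphs containing any non-ASCII character (e.g. Urdu letters)."""
    return [b.glyph for b in boxes if any(ord(ch) > 127 for ch in b.glyph)]


if __name__ == "__main__":
    # The two command lines I tried; running them needs tesseract installed,
    # so they are only printed here.
    print(" ".join(build_makebox_command("UrduCharacters.tif", "UrduCharacters")))
    print(" ".join(build_makebox_command("UrduCharacters.tif", "UrduCharacters", lang="urd")))

    # The four glyphs I actually got; the coordinates are invented placeholders.
    sample = "~ 10 20 30 40\n7 35 20 50 40\n7 55 20 70 40\n! 75 20 80 40\n"
    boxes = parse_box_text(sample)
    print(len(boxes), "boxes,", len(non_ascii_glyphs(boxes)), "with non-ASCII glyphs")
```

If `non_ascii_glyphs` returns an empty list for the generated box file, as it does for the sample, then no Urdu characters were recognised and every box would have to be corrected by hand before training could continue.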
On 11/3/08, 74yrs old <[EMAIL PROTECTED]> wrote:
>
> Eight data files have to be generated. Please visit the Tesseract wiki,
> where generating the data files is explained in detail. At present
> Tesseract supports left-to-right text. If you succeed in generating the
> data files, you have to read in the opposite direction, i.e. left to right.
> cheers
>
> On Mon, Nov 3, 2008 at 12:53 PM, Qurat-ul-Ain Akram <[EMAIL PROTECTED]>
> wrote:
>
>> Hi all
>>
>> I am working on Urdu OCR. I came to know about Tesseract and tried to
>> train it for the Urdu characters. The instructions for the training
>> procedure say that it cannot support the right-to-left writing style.
>> I tried to train the simple alphabet of Urdu myself as follows:
>>
>> 1. I made a text file of the characters, named UrduCharacters.txt, with
>>    UTF-8 encoding.
>> 2. From it a TIF image was obtained and saved as UrduCharacters.tif.
>> 3. I ran the tesseract command to make the box file:
>>    1. tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox
>>    2. tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox
>>
>> I have tried both commands for training. In the second one the error
>> occurs indicating the message that "Unable to locate Urdunichaset file".
>> In the second one the box file is generated with four characters, which
>> are ~, 7, 7, ! . If anyone has any idea about it, please let me know.
>>
>> Regards
>> Ainie

