Why not try the bbt tool?

On Mon, Nov 3, 2008 at 5:10 PM, Qurat-ul-Ain Akram <[EMAIL PROTECTED]> wrote:
> Thanks for your immediate reply.
>
> I followed the instructions given on the wiki site, but the process fails
> at the step of generating the box files (the very first step). This is the
> main problem and the reason I cannot proceed further. I need a developer's
> assistance to tell me whether there is a problem in my procedure, or where
> I have to make changes in the code so that Tesseract can generate the box
> file with the Urdu character set.
>
> On 11/3/08, 74yrs old <[EMAIL PROTECTED]> wrote:
>>
>> Eight data files have to be generated. Please visit the Tesseract wiki,
>> where generating the data files is explained in detail. At present
>> Tesseract supports left-to-right text. If you succeed in generating the
>> data files, you have to read the output in the opposite direction,
>> i.e. left to right.
>> Cheers
>>
>> On Mon, Nov 3, 2008 at 12:53 PM, Qurat-ul-Ain Akram <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi all
>>>
>>> I am working on Urdu OCR and came to know about Tesseract. I tried
>>> to train Tesseract for the Urdu characters. The training instructions
>>> say that it cannot support the right-to-left writing style. I tried
>>> training on the simple Urdu alphabet as follows:
>>>
>>> 1. I made a text file of the characters named UrduCharacters.txt, with
>>>    UTF-8 encoding.
>>> 2. From it a TIF image was obtained and saved as UrduCharacters.tif.
>>> 3. I ran the tesseract command to make the box file:
>>>    1. *tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox*
>>>    2. *tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox*
>>>
>>> I have tried both commands. With the second one an error occurs with
>>> the message "Unable to locate Urdunichaset file". With the first one
>>> the box file is generated with four characters, which are ~, 7, 7, !.
>>> If anyone has any idea about it, please let me know.
>>>
>>> Regards
>>> Ainie

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---

