Hi all

I am working on Urdu OCR and came across Tesseract. I tried to train
Tesseract on Urdu characters. The training instructions say that it does not
support right-to-left writing. I nevertheless tried training it on the simple
Urdu alphabet as follows:

1.     I made a text file of the characters, named UrduCharacters.txt, with
UTF-8 encoding.
2.     From it I produced a TIF image and saved it as UrduCharacters.tif.
3.     I ran the tesseract command to make the box file:

              1    tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox

              2    tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox

I have tried both commands for training. With the second one an error occurs
with the message "Unable to locate Urdunichaset file". With the first one the
box file is generated, but it contains only four characters: ~, 7, 7, !.
If anyone has any idea about this, please let me know.
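
In case it helps anyone check the same thing, here is a minimal sketch of how the symbols in a box file can be listed. It assumes the usual box-file layout of one symbol per line followed by its coordinates (symbol, left, bottom, right, top). The function name box_symbols is only a hypothetical helper of mine, not part of Tesseract.

```shell
# Sketch only: print the symbol column of a Tesseract box file on one line.
# Assumes each line looks like: <symbol> <left> <bottom> <right> <top>
# box_symbols is a hypothetical helper, not a Tesseract command.
box_symbols() {
    # $1: path to the box file written by the makebox step
    awk '{ print $1 }' "$1" | paste -sd ' ' -
}
```

It would be called with the path of whatever file the makebox step produced, for example box_symbols followed by that file name. The output should be the list of recognised symbols in order.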


Regards
Ainie

