Need Help To Train Tesseract for Urdu Language

Qurat-ul-Ain Akram Sun, 02 Nov 2008 23:23:23 -0800

Hi all

I am working on Urdu OCR and recently came to know about Tesseract. I tried to
train Tesseract on Urdu characters. The training instructions state that
Tesseract does not support right-to-left writing. Even so, I tried training it
on the simple Urdu alphabet as follows:


1.     I made a text file of the characters, named UrduCharacters.txt, with
       UTF-8 encoding.
2.     From it I produced a TIF image and saved it as UrduCharacters.tif.
3.     I ran the tesseract command to make the box file, in two ways:

       Command 1:
              tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox

       Command 2:
              tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox

I have tried both commands. The second one fails with the error message
"Unable to locate Urdunichaset file". The first one generates a box file
containing four characters: ~, 7, 7, ! . If anyone has any idea about this,
please let me know.
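
For reference, here is a minimal sketch of how the box file contents could be
checked. It assumes each box file line has the form "symbol left bottom right
top", and it treats any symbol whose Unicode name does not start with "ARABIC"
as suspicious. The function names and the sample coordinates are hypothetical,
made up only for illustration; they are not taken from my actual output.

```python
import unicodedata


def parse_box_lines(lines):
    """Parse 'symbol left bottom right top' lines into (symbol, box) tuples."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        symbol = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        entries.append((symbol, (left, bottom, right, top)))
    return entries


def is_arabic_script(symbol):
    """True if every character's Unicode name starts with 'ARABIC'."""
    return all(unicodedata.name(ch, "").startswith("ARABIC") for ch in symbol)


def suspicious_entries(entries):
    """Return the entries whose symbol is not an Arabic-script character."""
    return [entry for entry in entries if not is_arabic_script(entry[0])]


# Hypothetical sample: the four symbols reported above plus one Urdu letter
# (U+0627, ARABIC LETTER ALEF). The coordinates are invented.
sample = [
    "~ 10 20 30 40",
    "7 35 20 50 40",
    "7 55 20 70 40",
    "! 75 20 80 40",
    "\u0627 90 20 95 60",
]

for symbol, box in suspicious_entries(parse_box_lines(sample)):
    print(repr(symbol), box)
```

To check a real box file, its lines could be read with
open(path, encoding="utf-8") and passed to parse_box_lines in place of the
sample list.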


Regards
Ainie

