Hi all,
I am working on Urdu OCR and recently came across Tesseract. I tried to
train Tesseract on Urdu characters. The training instructions say that
it cannot support right-to-left writing. Even so, I tried training it on
the basic Urdu alphabet as follows:
1. I created a text file of the characters, named UrduCharacters.txt, with
UTF-8 encoding.
2. From it I produced a TIF image and saved it as UrduCharacters.tif.
3. I ran the tesseract command to make the box file:
   1. tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox
   2. tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox
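
For anyone who wants to reproduce the two invocations from a script, here is a minimal sketch. The helper names makebox_command and run_makebox are hypothetical, made up for illustration. The arguments are exactly the ones in the two command lines above, and the script assumes a tesseract binary is on PATH.

```python
import shutil
import subprocess


def makebox_command(image, base, lang=None):
    """Build the argument list for a Tesseract makebox run.

    Mirrors the two command lines above: when lang is given,
    "-l <lang>" is placed before the config names.
    """
    cmd = ["tesseract", image, base]
    if lang is not None:
        cmd += ["-l", lang]
    cmd += ["batch.nochop", "makebox"]
    return cmd


def run_makebox(image, base, lang=None):
    """Run the command and return its exit code.

    Assumption: the tesseract executable is installed and on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    return subprocess.run(makebox_command(image, base, lang)).returncode
```

Calling run_makebox("UrduCharacters.tif", "UrduCharacters") corresponds to the first command, and adding lang="urd" corresponds to the second.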
I have tried both commands. With the second one, an error occurs with the
message "Unable to locate Urdunichaset file".
With the first one, a box file is generated, but it contains only four
characters: ~, 7, 7 and !. If anyone has any idea what is going wrong,
please let me know.
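
To make it easier to see what ended up in the box file, here is a small sketch that lists the glyphs it contains and checks whether any of them fall in the Unicode Arabic block (U+0600 to U+06FF), which covers the basic Urdu letters. It assumes each box line has the form "glyph left bottom right top", optionally followed by a page number. The function names are hypothetical.

```python
def read_box_glyphs(box_text):
    """Return the glyph (first field) of every well-formed box line.

    Assumption: each line is "glyph left bottom right top", optionally
    followed by a page number; shorter lines are skipped.
    """
    glyphs = []
    for line in box_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete box entry
        glyphs.append(fields[0])
    return glyphs


def has_arabic_script(glyphs):
    """True if any glyph contains a character in U+0600..U+06FF."""
    return any(
        0x0600 <= ord(ch) <= 0x06FF
        for glyph in glyphs
        for ch in glyph
    )
```

Read the box file with open(path, encoding="utf-8") and pass its contents to read_box_glyphs. For the four characters I got (~, 7, 7, !), has_arabic_script returns False, which shows that none of the Urdu letters were recognized as such.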
Regards
Ainie