Dear Ray/Tom,
I did some work on adding Indic script support to Tesseract this
summer. The hack blog can be found at
http://debayanin.googlepages.com/hackingtessearct
and the project is at http://code.google.com/p/tesseractindic/ . I
knew before I started the project that my work would not be merged
into the main branch, as it was not of sufficient quality. I worked on
it for under a month.
A few months later I learned that Tom had implemented "matraa
clipping" using morphological operations
(http://sites.google.com/site/ocropus/languages/devanagari-hindi-sanskrit
and http://sites.google.com/site/ocropus/morphological-operations),
which is a better method than using a projection operation.
Recently, Sarai, an organisation in India that works on FLOSS projects,
announced a fellowship for work on Indic OCR. One of the proposals
likely to be accepted is to continue adding Indic script support to
Tesseract. I plan to increase accuracy (it currently stands at 87%)
and implement morphological operations. But for that, I need an
"ack" from you guys. Are you planning to implement Indic language
support anytime soon? Would our work overlap with yours? Could you tell
me when exactly you plan to implement it?
Eagerly awaiting a reply from anyone who knows.

~Debayan
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---
