Dear Ray/Tom,

I did some work on adding Indic script support to Tesseract this summer. The hack-blog may be found at http://debayanin.googlepages.com/hackingtessearct and the project is at http://code.google.com/p/tesseractindic/ . Before I started the project, I knew that my work would not be merged into the main branch, as it was not of sufficient quality; I worked on it for less than a month.

A few months later I learned that Tom had implemented "matraa clipping" using morphological operations (http://sites.google.com/site/ocropus/languages/devanagari-hindi-sanskrit and http://sites.google.com/site/ocropus/morphological-operations), which is a better method than the projection-based approach I used.

Recently, Sarai, an organisation that works on FLOSS projects in India, announced a fellowship for work on Indic OCR. One of the proposals likely to be accepted is to continue adding Indic script support to Tesseract. I plan to increase accuracy (currently 87%) and implement morphological operations. But for that, I need an "ack" from you. Are you planning to implement Indic language support any time soon? Would our work overlap with yours? Could you tell me when exactly you plan to implement it?

I look forward to a reply from anyone who knows.
~Debayan

You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

