Hi, I am also working on Oriya OCR. Could you please share your procedure for recognizing words or letters?
Regards,
Bikash

On 1 August 2010 12:35, Sriranga(77yrsold) <[email protected]> wrote:
> Dear Rakesh,
> Really interesting. Please don't forget me; I would like to join you in
> developing OCR for Indian languages under your leadership.
> Yes, complexity exists, as well as fundamental grammar in Indian languages
> based on Sanskrit only.
> I can also contribute Kannada TIFF images with their text converted to
> Unicode for experimental purposes.
> I would like to have the software for hands-on experience, beta testing, and
> feedback.
> Wishing you the best of luck and good wishes,
> -sriranga(77yrsold)
>
> On Sun, Aug 1, 2010 at 12:12 PM, Rakesh Achanta <[email protected]> wrote:
>
>> Very interesting.
>> A bunch of friends and I are currently dealing with Indian languages. As
>> Tibetan is also based on the Devanagari system of writing, and is written as
>> an abugida, your work will be very helpful for us.
>> We would like details such as how you account for sandhis/joins in
>> Sanskrit, e.g. sah + aham = soham.
>>
>> Complexity in Sanskrit-like languages arises primarily from two things:
>> 1) Writing in syllables takes the number of symbols to a thousand or so
>> (compare English's 80 or so).
>> 2) The number of words in Sanskrit is limitless, as one can keep
>> combining them.
>>
>> I would be interested in reading any notes that detail how you are able to
>> cope with the above two.
>>
>> Also, as you said your system can learn new languages, it must be very easy
>> for it to learn Indian languages that have the same writing concept as
>> Tibetan. If you want a list of all possible combos for, say, Telugu, with the
>> TIFF image and the Unicode string, I can give them to you.
>>
>> Regards
>> Rakesh
>>
>> On 30 July 2010 04:59, Moscow Rime Dharma Centre <[email protected]> wrote:
>>
>>> Good day.
>>> For a few years our group has been developing an open-source OCR (optical
>>> character recognition) and translation system.
>>> Now we have the first solid results and will be happy to share this system
>>> and our knowledge with you. The key features of the OCR system include:
>>>
>>> 1. Stream OCR processing
>>> During the first stage of the project, we recognized 300 000 pages of
>>> the Tibetan Canon in Tibetan for the TBRC Digital Library (www.tbrc.org). We
>>> used a MacPro stream server that processed all 280 volumes with one
>>> OCR set.
>>>
>>> 2. Tibetan spell checker and online dictionary with 250 000 words and a
>>> 6.5 mln wordlist.
>>>
>>> 3. Multilingual support
>>> At present, the key direction of the project is Tibetan and Sanskrit
>>> OCR. However, its main algorithm can learn a new language in two
>>> months.
>>>
>>> 4. High accuracy
>>> The system uses dictionary control at all stages of OCR processing.
>>> Its Grammar Corrector can use a statistical dictionary containing 20-30
>>> mln phrases (the Tibetan dictionary now includes 8.5 mln). For Tibetan
>>> books, the current recognition results are 1 error per 1000
>>> characters. Here you can see a screenshot:
>>> http://www.buddism.ru///ocrlib/OCRLib21_07_2010.png
>>>
>>> All these features can be integrated into the Tesseract project.
>>>
>>> We believe that we may help you in your research and projects. And
>>> probably you may help us to continue the development of the OCR system
>>> and start a Tibetan translation program. We are looking forward to
>>> hearing from you and will be happy to answer your questions!
>>>
>>> Best regards,
>>> Alexander Stroganov,
>>> [email protected]
>>>
>>> Rime Center Russia
>>> OCR Project Web pages:
>>> http://sourceforge.net/projects/ocrlib/
>>> www.buddism.ru/ocrlib
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.

