Re: Open Source OCR system

Bikash Bag Sun, 01 Aug 2010 04:59:52 -0700

Hi, I am also working on Oriya OCR. Can you please share your procedure for
recognizing words or letters?


regards,
bikash

On 1 August 2010 12:35, Sriranga(77yrsold) <[email protected]> wrote:

> Dear Rakesh,
> Really interesting. Please don't forget me; I would like to join you in
> developing OCR for Indian languages under your leadership.
> Yes, the complexity exists, as does the fundamental grammar, in Indian
> languages based on Sanskrit.
> I can also contribute Kannada TIFF images, with their text converted to
> Unicode, for experimental purposes.
> I would like to have the software for hands-on experience, beta testing, and
> feedback.
> Wishing you the best of luck,
> -sriranga(77yrsold)
>
>
> On Sun, Aug 1, 2010 at 12:12 PM, Rakesh Achanta <[email protected]> wrote:
>
>> Very interesting.
>> A bunch of friends and I are currently working on Indian languages. As the
>> Tibetan script, like Devanagari, belongs to the Brahmic family and is written
>> as an abugida, your work will be very helpful for us.
>> I am curious about details such as how you account for sandhi (joins) in
>> Sanskrit, e.g. sah + aham = soham.
>>
>> Complexity in Sanskrit-like languages arises primarily from two things:
>> 1) Writing in syllables takes the number of symbols to a thousand or so
>> (compare English's 80 or so).
>> 2) The number of words in Sanskrit is limitless, as one can keep
>> combining them.
>>
>> I would be interested in reading any notes that detail how you are able to
>> cope with the above two.
>>
>> Also, since you said your system can learn new languages, it should be easy
>> for it to learn Indian languages that have the same writing concept as
>> Tibetan. If you want a list of all possible combinations for, say, Telugu,
>> with the TIFF image and the Unicode string, I can give them to you.
>>
>> Regards
>> Rakesh
>>
>> On 30 July 2010 04:59, Moscow Rime Dharma Centre <[email protected]> wrote:
>>
>>> Good day.
>>> For a few years our group has been developing an open-source OCR (optical
>>> character recognition) and translation system. We now have the first
>>> solid results and will be happy to share this system and our
>>> knowledge with you. The key features of the OCR system include:
>>>
>>> 1. Stream OCR processing
>>> During the first stage of the project, we recognized 300,000 pages of
>>> the Tibetan Canon in Tibetan for the TBRC Digital Library (www.tbrc.org).
>>> We used a Mac Pro stream server, which processed all 280 volumes with one
>>> OCR set.
>>>
>>> 2. Tibetan spell checker and online dictionary of 250,000 words and a
>>> 6.5 million-entry wordlist.
>>>
>>> 3. Multilingual support
>>> At present, the main focus of the project is Tibetan and Sanskrit
>>> OCR. However, its main algorithm can learn a new language in two
>>> months.
>>>
>>> 4. High accuracy
>>> The system uses dictionary control at all stages of OCR processing.
>>> Its Grammar Corrector can use a statistical dictionary containing 20-30
>>> million phrases (the Tibetan dictionary now includes 8.5 million). For
>>> Tibetan books, the current recognition rate is 1 error per 1000
>>> characters. Here you can see a screenshot:
>>> http://www.buddism.ru///ocrlib/OCRLib21_07_2010.png
>>>
>>> All these features can be integrated into the Tesseract project.
>>>
>>> We believe that we can help you in your research and projects, and
>>> that you may be able to help us continue developing the OCR system
>>> and start a Tibetan translation program. We look forward to
>>> hearing from you and will be happy to answer your questions!
>>>
>>> Best regards,
>>> Alexander Stroganov,
>>> [email protected]
>>>
>>> Rime Center Russia
>>> OCR Project Web pages:
>>> http://sourceforge.net/projects/ocrlib/
>>> www.buddism.ru/ocrlib
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>
>>>

