Hi,

Has anyone tried to create Nepali language data for Tesseract?
I think the Hindi/Sanskrit data files can also be used with Tesseract. I am not sure which is the right place to discuss this: the tesseract-ocr forum or FOSS Nepal. Any suggestions?

Art is a librarian at the University of Windsor and has been working on using open source OCR for newspaper collections. A friend asked him about Nepali and he became curious, but he doesn't have a specific project for the language at this point. He has opted for Tesseract <http://code.google.com/p/tesseract-ocr/> and wants to use it on newspaper pages in batch.

Earlier I was interested in creating a Nepali OCR, but these days I am more into creating a Nepali translator [Hindi or English to Nepali text translator <http://code.google.com/p/nepaliwikipediatranslator>]. I read the tesseract-ocr threads daily, but I would still call myself a noob in this regard.

====

On Tue, Apr 17, 2012 at 7:26 AM, Art W Rhyno <> wrote:

> Hi,
>
> I was curious whether you were aware of any efforts to create Nepali
> language data for tesseract 3.01 and above. I see the 2.x test data but I
> can't find anything more recent.
>
> art
> ---
> Art Rhyno
> Systems Librarian
> University of Windsor

On Sat, Dec 3, 2011 at 7:10 PM, Bal Krishna Bal <[email protected]> wrote:

> Hi Anish,
>
> On Sat, Dec 3, 2011 at 2:04 PM, ANISH SHRESTHA
> <[email protected]> wrote:
>
>> Great, that's good news!! I have been wondering how accurately it
>> analyzes handwritten scanned documents.
>>
>> 100% accuracy is hardly possible to obtain, and there are of course
>> many criteria that affect its accuracy. Is it possible to
>> know the current status of the project? Are we ready to jump
>> into digitization already? How optimistic can we be about the digitization
>> of the documents in the near future?
>>
>> Honestly, glad to hear about the progress of the project! Cheers!!
>>
>
> There is still a long way to go for an accurate OCR application for
> scanned images of printed text, let alone handwritten scanned documents,
> the latter entailing additional challenges as handwritten texts are hardly
> uniform and clean compared to printed text. There are issues with the
> segmentation of words in Nepali. Tesseract, although a good classifier or
> recognition engine, does not recognize conjoined characters. Hence there is
> the challenge of developing an accurate segmentation module. We have made some
> effort on this front in the past, and it would be really great if
> somebody could take this up to further the work. Details of the work can be
> found in the links that I shared earlier in this thread.
> Regards,
> Bal Krishna
>
>>
>> On Sat, Dec 3, 2011 at 1:42 PM, Sushil <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am from OTRC.
>>> Yes, we have been working on Nepali OCR.
>>> The most difficult part for us was the segmentation of Nepali
>>> characters so that they could be trained in the tesseract-ocr engine.
>>> But recently we have got some good results in segmentation. So the
>>> remaining part is training tesseract on Devanagari characters
>>> and building a good user interface.
>>>
>>> Maybe we can collaborate on further development.
>>> You can reach me @ 9849038151 for more information.
>>>
>>>
>>> On Dec 3, 11:38 am, ANISH SHRESTHA <[email protected]> wrote:
>>> > Thank you, sir. I should get in touch with OTRC and LTK for more details.
>>> > I very much hope it might help the digitization of the govt data! I really
>>> > appreciate your help, sir.
>>> >
>>> > Cheers!!!
>>> >
>>> > On Sat, Dec 3, 2011 at 10:25 AM, Rajesh Pandey <[email protected]> wrote:
>>> >
>>> > > Yes, I worked on nepaliocr at MPP after my thesis on it at Kathmandu
>>> > > University. Currently OTRC and LTK are working on it.
>>> > > Tesseract for
>>> > > Devanagari and sanskritocr are some other OCRs that I know of. The accuracy
>>> > > of sanskritocr is fairly good; however, it produces its output in
>>> > > German/Roman.
>>> >
>>> > > On Dec 1, 2011 4:26 PM, "ANISH SHRESTHA" <[email protected]>
>>> > > wrote:
>>> >
>>> > >> Thank you everyone! Will get back for more details!!
>>> >
>>> > >> Totally appreciate the help.
>>> >
>>> > >> On Thu, Dec 1, 2011 at 11:26 AM, Bal Krishna Bal <
>>> > >> [email protected]> wrote:
>>> >
>>> > >>> Hi,
>>> > >>> The link below lists some past efforts on the research and development
>>> > >>> of the Nepali OCR.
>>> >
>>> > >>> http://nepalinux.org/index.php?option=com_content&task=view&id=46&Ite...
>>> >
>>> > >>> I think the Open Technology Resource Center (OTRC) guys were also
>>> > >>> working on it.
>>> > >>> http://www.otrc.gov.np/?q=projects/devanagari-ocr
>>> >
>>> > >>> Please feel free to contact the Language Technology Kendra (LTK,
>>> > >>> http://ltk.org.np) if you require further information.
>>> >
>>> > >>> Regards,
>>> > >>> Bal Krishna Bal
>>> > >>> Chief Technical Officer
>>> > >>> Language Technology Kendra
>>> > >>> Lalitpur, PatanDhoka
>>> > >>> Assistant Professor
>>> > >>> Department of Computer Science and Engineering
>>> > >>> Kathmandu University
>>> > >>> Dhulikhel, Kavre
>>> > >>> Nepal
>>> >
>>> > >>> On Thu, Dec 1, 2011 at 10:43 AM, Sagar Kshetri <[email protected]> wrote:
>>> >
>>> > >>>> I think it is under development at MPP or KU.
>>> > >>>> I have also heard a rumor that the project was closed.
>>> > >>>> Better to contact MPP or KU.
>>> >
>>> > >>>> On Wed, Nov 30, 2011 at 4:08 PM, ANISH SHRESTHA <
>>> > >>>> [email protected]> wrote:
>>> >
>>> > >>>>> I have been searching for a Nepali OCR and found that some research was
>>> > >>>>> going on at NepaLinux a couple of years ago. But I could not track down
>>> > >>>>> anything after that!!
>>> >
>>> > >>>>> Would be very grateful if anyone could help me on this!!
>>> >
>>> > >>>>> Thank you in advance.
>>> >
>>> > >>>>> Cheers!
>>> >
>>> > >>>>> --
>>> > >>>>> Anish Shrestha
>>> > >>>>> Mob:(+977)-9841472979
>>> > >>>>> [email protected]
>>> > >>>>> http://aniXification.com
>>> > >>>>> Lalitpur, Nepal.
>>> >
>>> > >>>>> --
>>> > >>>>> FOSS Nepal mailing list: [email protected]
>>> > >>>>> http://groups.google.com/group/foss-nepal
>>> > >>>>> To unsubscribe, e-mail: [email protected]
>>> >
>>> > >>>>> Mailing List Guidelines:
>>> > >>>>> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>> > >>>>> Community website: http://www.fossnepal.org/
>>> >
>>> > >>>> --
>>> > >>>> Regards
>>> > >>>> Mr. Sagar Kshetri (ASK?)
>>> > >>>> Url: www.sagarkshetri.com.np
--
Rajesh Pandey

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

