Hi

Has anyone tried to create Nepali language data for tesseract?

I think the Hindi/Sanskrit data files could also be used with tesseract for
Nepali. I don't know which is the right place to discuss this: the
tesseract-ocr forum or fossnepal.

Any suggestions on this?

Art is a librarian at the University of Windsor and has been working on
using open source OCR for newspaper collections. He was asked about Nepali
by a friend and became curious, but he doesn't have a specific project for
the language at this point. He has chosen
tesseract <http://code.google.com/p/tesseract-ocr/> for this and wants
to use it for newspaper pages in batch.
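
For the batch case, here is a minimal sketch of how the per-page commands
could be generated. It assumes the usual `tesseract <image> <outputbase> -l <lang>`
command-line form, and it assumes a Hindi data file installed under the code
`hin` is used as a stand-in for Nepali (both are written in Devanagari; how
well that works on Nepali newspaper text is untested here). The function
name, the directory layout and the `.tif` extension are made up for
illustration.

```python
from pathlib import Path


def build_tesseract_commands(image_dir, output_dir, lang="hin", pattern="*.tif"):
    """Return one tesseract command (as an argument list) per page image.

    Assumptions: `lang` names a language data file that is already
    installed (here Hindi, "hin", as a Devanagari stand-in for Nepali),
    and the page images sit directly inside `image_dir`.
    """
    commands = []
    for image in sorted(Path(image_dir).glob(pattern)):
        # tesseract adds the ".txt" extension to the output base itself.
        output_base = Path(output_dir) / image.stem
        commands.append(["tesseract", str(image), str(output_base), "-l", lang])
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` once
tesseract is on the PATH; building the commands separately makes it easy to
inspect them before starting a long run.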



Earlier I was interested in creating a Nepali OCR, but these days I am more
into creating a Nepali translator [Hindi or English to Nepali text
translator <http://code.google.com/p/nepaliwikipediatranslator>].
I read the tesseract-ocr threads daily, but I would still call myself a noob
in this regard.


====
On Tue, Apr 17, 2012 at 7:26 AM, Art W Rhyno <> wrote:

> Hi,
>
> I was curious whether you were aware of any efforts to create Nepali
> language data for tesseract 3.01 and above. I see the 2.x test data but I
> can't find anything more recent.
>
> art
> ---
> Art Rhyno
> Systems Librarian
> University of Windsor
>






On Sat, Dec 3, 2011 at 7:10 PM, Bal Krishna Bal <[email protected]> wrote:

> Hi Anish,
>
> On Sat, Dec 3, 2011 at 2:04 PM, ANISH SHRESTHA <[email protected]> wrote:
>
>> Great, that's good news! I have been wondering how accurately it
>> analyzes handwritten scanned documents.
>>
>> 100% accuracy is hardly attainable, and of course many factors affect
>> the accuracy. Is it possible to know the current status of the project?
>> Are we ready to jump into digitization already? How optimistic can we be
>> about digitizing the documents in the near future?
>>
>> Honestly, glad to hear about the progress of the project! Cheers!!
>>
>
> There is still a long way to go before we have an accurate OCR application
> for scanned images of printed text, let alone handwritten scanned documents;
> the latter pose additional challenges because handwritten text is hardly
> as uniform and clean as printed text. There are issues with the
> segmentation of words in Nepali. Tesseract, although a good classifier or
> recognition engine, does not recognize conjoined characters. Hence there is
> the challenge of developing an accurate segmentation module. We have made
> some effort on this front in the past, and it would be really great if
> somebody could take this up to further the work. Details of the work can be
> found in the links that I shared earlier in this thread.
>
> Regards,
> Bal Krishna
>
>
>
>>
>>
>> On Sat, Dec 3, 2011 at 1:42 PM, Sushil <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am from OTRC.
>>> Yes, we have been working on Nepali OCR.
>>> The most difficult part for us was segmenting Nepali characters
>>> so that they could be trained in the tesseract-ocr engine.
>>> Recently, however, we have got some good results in segmentation, so
>>> the remaining work is training tesseract on Devanagari characters
>>> and building a good user interface.
>>>
>>> Maybe we can collaborate on further development.
>>> You can reach me @ 9849038151 for more information.
>>>
>>>
>>> On Dec 3, 11:38 am, ANISH SHRESTHA <[email protected]> wrote:
>>> > Thank you, sir. I should get in touch with OTRC and LTK for more
>>> > details. I very much hope it will help with digitizing the govt data!
>>> > I really appreciate your help, sir.
>>> >
>>> > Cheers!!!
>>> >
>>> > On Sat, Dec 3, 2011 at 10:25 AM, Rajesh Pandey <[email protected]> wrote:
>>> >
>>> > > Yes, I worked on nepaliocr at mpp after my thesis on it at Kathmandu
>>> > > University. Currently OTRC and LTK are working on it. Tesseract for
>>> > > Devanagari and sanskritocr are some other OCRs that I know of. The
>>> > > accuracy of Sanskritocr is fairly good; however, it produces its
>>> > > results in German/Roman.
>>> >
>>> > > On Dec 1, 2011 4:26 PM, "ANISH SHRESTHA" <[email protected]>
>>> > > wrote:
>>> >
>>> > >> Thank you everyone! Will get back for more details!!
>>> >
>>> > >> Totally appreciate the help.
>>> >
>>> > >> On Thu, Dec 1, 2011 at 11:26 AM, Bal Krishna Bal <
>>> > >> [email protected]> wrote:
>>> >
>>> > >>> Hi,
>>> > >>> The link below lists some past research and development efforts
>>> > >>> on Nepali OCR.
>>> >
>>> > >>> http://nepalinux.org/index.php?option=com_content&task=view&id=46&Ite...
>>> >
>>> > >>> I think the Open Technology Resource Center (OTRC) guys were also
>>> > >>> working on it.
>>> > >>> http://www.otrc.gov.np/?q=projects/devanagari-ocr
>>> >
>>> > >>> Please feel free to contact the Language Technology Kendra (LTK,
>>> > >>> http://ltk.org.np) if you require further information.
>>> >
>>> > >>> Regards,
>>> > >>> Bal Krishna Bal
>>> > >>> Chief Technical Officer
>>> > >>> Language Technology Kendra
>>> > >>> Lalitpur, PatanDhoka
>>> > >>> Assistant Professor
>>> > >>> Department of Computer Science and Engineering
>>> > >>> Kathmandu University
>>> > >>>  Dhulikhel, Kavre
>>> > >>> Nepal
>>> >
>>> > >>> On Thu, Dec 1, 2011 at 10:43 AM, Sagar Kshetri <[email protected]> wrote:
>>> >
>>> > >>>> I think it is under development at mpp or ku.
>>> > >>>> I have also heard a rumor that the project was closed.
>>> > >>>> Better to contact mpp or ku.
>>> >
>>> > >>>> On Wed, Nov 30, 2011 at 4:08 PM, ANISH SHRESTHA <
>>> > >>>> [email protected]> wrote:
>>> >
>>> > >>>>> I have been searching for a Nepali OCR and found that some research
>>> > >>>>> on it was under way at NepaLinux a couple of years ago, but I could
>>> > >>>>> not track down anything after that!!
>>> >
>>> > >>>>> Would be very grateful if anyone could help me on this!!
>>> >
>>> > >>>>> Thank you in advance.
>>> >
>>> > >>>>> Cheers!
>>> >
>>> > >>>>> --
>>> > >>>>> Anish Shrestha
>>> > >>>>> Mob:(+977)-9841472979
>>> > >>>>> [email protected]
>>> > >>>>> http://aniXification.com
>>> > >>>>> Lalitpur, Nepal.
>>> >
>>> > >>>>> --
>>> > >>>>> FOSS Nepal mailing list: [email protected]
>>> > >>>>> http://groups.google.com/group/foss-nepal
>>> > >>>>> To unsubscribe, e-mail: [email protected]
>>> >
>>> > >>>>> Mailing List Guidelines:
>>> > >>>>> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>> > >>>>> Community website: http://www.fossnepal.org/
>>> >
>>> > >>>> --
>>> > >>>> Regards
>>> >
>>> > >>>> ><((((º>`·.¸¸.·´¯`·.¸.·´¯`·...¸><((((º>¸.
>>> >
>>> > >>>> ·´¯`·.¸. , . .·´¯`·.. ><((((º>`·.¸¸.·´¯`·.¸.·´¯`·...¸><((((º>
>>> > >>>> Mr. Sagar Kshetri (ASK?)
>>> > >>>> Url:www.sagarkshetri.com.np
>>> >
>>> >



-- 
Rajesh Pandey

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
