Hi Falke,

Here is a sample image. I have more images that are used for testing, but they are copyrighted, so I can't post them publicly here. I can email them to you individually, though.
On Mon, Apr 30, 2012 at 12:10 AM, Falke <[email protected]> wrote:
>
> On Apr 26, 2:18 pm, Rajesh Pandey <[email protected]> wrote:
> > > > Earlier I was interested in creating a Nepali OCR but I am these days
> > > > more
> > >
> > > You were going to write the whole engine, from scratch? Wow.
> >
> > Yes, indeed. We (as a team) were creating a complete OCR. We *were*
> > researching and developing a full-fledged Nepali OCR.
> >
> > Some of the work is still there at code.google.com/p/nepaliocr
> >
> > I haven't tried training it again. I was asking whether anyone had ever
> > tried it for Nepali, because some people might have had luck. If I'd known
> > that people had luck with training, it would have been worth trying. It
> > has been nearly 3 years since I attempted to train tesseract for Nepali.
> >
> > Fossnepal is a Nepali open source community group.
> >
>
> If you uploaded a sample scanned image to this forum, others
> (including myself) could try it with tesseract. I'm not sure how much
> difference there is between the font(s) in (older?) Nepali documents and
> Hindi documents... While the alphabet is the same (correct me if I'm
> wrong), maybe the styles (font variations) are different enough to
> call for separate training (?) But I don't think they should be SO
> different as to negate the following deductive statement: "If
> tesseract is trainable for Hindi, it should be trainable for Nepali."
> Or, IOW: at best, you can piggyback on the Hindi training; at
> worst, you'll need to train specifically for Nepali (thereby
> achieving accuracy comparable to that for Hindi).
>
> Of course, not being an expert on this, I may have to eat my words...
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

--
Rajesh Pandey
<<attachment: nepaliSampleImage.png>>

