On Apr 26, 2:18 pm, Rajesh Pandey <[email protected]> wrote:
>
> > Earlier I was interested in creating a Nepali OCR but I am these days
> > more
>
> > You were going to write the whole engine, from scratch? Wow.
>
> Yes indeed. We (as a team) were creating a complete OCR. We
> *were* researching and developing a full fledged Nepali OCR.
>
> Some of the work is still there at code.google.com/p/nepaliocr
>
> I haven't tried to train again. I was asking if anyone had ever tried for
> Nepali because there might be some people who had luck. If I'd know that
> people had luck training, it would be worth trying it. Its nearly 3 years I
> had attempted to train tesseract for Nepali.
>
> Fossnepal is a group of Nepali Open source community.
If you upload a sample scanned image to this forum, others (myself included) could try it with tesseract.

I'm not sure how much the fonts in (older?) Nepali documents differ from those in Hindi documents. The script is the same, Devanagari (correct me if I'm wrong), but the styles (font variations) may be different enough to call for separate training. Even so, I don't think they are so different as to negate this deduction: "If tesseract is trainable for Hindi, it should be trainable for Nepali."

In other words: at best, you can piggyback on the Hindi training; at worst, you'll need to train specifically for Nepali, and should then reach accuracy comparable to what is achieved for Hindi.

Of course, not being an expert on this, I may have to eat my words...

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

