Hi Falke,

Thanks for trying this out. The Hindi-language Tesseract data files should work. When I was working on this in 2007-2008, Hindi language data files were not available, and a Bengali guy named debayanin tried hard to get Hindi/Devanagari working. Today the Hindi language data files (tessdata) are available. I haven't tested them myself, but I am sure they will work.

So, to answer your question: Nepali should be able to use the Hindi data files. It all depends on how accurate the results for Hindi are. If Hindi is recognized flawlessly, Nepali should work just as well. There is one slight difference: Nepali does not use some of the characters that Hindi uses, although those characters are all in the Devanagari chart. This is good for Nepali, because the Hindi data already covers every character Nepali needs. If it were the other way round, we would have to train again to incorporate the missing characters.
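To try the Hindi data on a Nepali page, the only change is to select the `hin` language when running Tesseract (`tesseract IMAGE OUTBASE -l hin`, which writes the text to `OUTBASE.txt`). Below is a minimal Python sketch of that call, assuming the `tesseract` binary is on your PATH and `hin.traineddata` is installed in its tessdata directory; the function names and the `nepali_out` file name are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def tesseract_command(image: str, out_base: str, lang: str = "hin") -> List[str]:
    """Build the command line `tesseract IMAGE OUTBASE -l LANG`."""
    return ["tesseract", image, out_base, "-l", lang]


def ocr_nepali_with_hindi_data(image: str, out_base: str = "nepali_out") -> str:
    """Hypothetical helper: OCR a Nepali image using the Hindi (hin) tessdata.

    Assumes `tesseract` is on PATH and hin.traineddata is installed.
    Tesseract writes its result to OUTBASE.txt, which is read back here.
    """
    subprocess.run(tesseract_command(image, out_base), check=True)
    return Path(out_base + ".txt").read_text(encoding="utf-8")
```

Keeping the command builder separate from the call makes it easy to swap in a different language code later without touching the rest.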
So everything should be fine. Thanks for testing with the Nepali sample image. The result is not good yet, but I think it can be improved by digging in with the correct Hindi tessdata and the new Tesseract. Thanks, everyone, for reading this.

2012/5/1 Falke <[email protected]>
> I subjected your png to some pre-processing (resize, blur, threshold,
> etc.) and got slightly better results:
>
> ---------- my results -----------
> दृप्राक्वछ संसारन्मा पाइज्ञे प्रश्मीइरनंआ टूपबम्भन्दा चन्नाग्ध र
> बुद्धिन्मात्न प्राणी
> हो । यसले अक्वफ्लो बुद्धिको उपयोग ब्वगरेर संसारन्नाई बं सत्नाएको छा
> ह्नरासँचद्धरि इसको चन्नान्धीठो रांहॉका सवं प्रग़गीत्माई कट्सभाएको छा
> एक
> सइपरांन्मा सार:; प्रग़गीहाँ टज्ञइगत्मरज्ञइचक्वत्म चह्मर्ल आक्वछठो
> छात्र पात्रों छाडी
> चन्द्रछग़आ सठोत्त पाइत्मा इउप्तिसकेको छा रांप्तत्ये शयद्धत्प्त
> न्माप्तिसत्माई
> हृपृह्नयुको हपुरब्रबाट बत्ताठज्ञे अऋत्तलुल्या औंषणा बत्नाएप्त कि ।
> संसारका सवं
> न्माप्तिटूपत्माई ष्टकप्ताश सिपइरांश्वठज्ञे अज्वगुबन्म बत्नाएत्न कि ।
> घऊहिरिएर हैच हो अले
> न्नानंछ, शो अद-छे रास संदृपारको कति ड्डप्तलौठो प्राणी रहेछ ।
> ---------- end my results ------
>
> But, essentially, it's much better to start with higher-resolution
> scans.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
Rajesh Pandey
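For anyone who wants to repeat Falke's pre-processing (resize, blur, threshold), here is a minimal pure-Python sketch of those three steps on a grayscale image stored as a list of pixel rows (0 = black, 255 = white). The scale factor of 2, the 3x3 box blur and the threshold of 128 are assumptions, since the message doesn't give the actual parameters; all function names are hypothetical, and in practice you would use an image library such as Pillow or OpenCV instead.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, each pixel 0 (black) .. 255 (white)


def resize_nearest(img: Image, factor: int) -> Image:
    """Upscale by an integer factor using nearest-neighbour sampling."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]  # repeat each pixel
        for _ in range(factor):
            out.append(list(wide))  # copy so rows are independent lists
    return out


def box_blur(img: Image) -> Image:
    """3x3 mean filter; edge pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        new_row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            new_row.append(sum(vals) // len(vals))
        out.append(new_row)
    return out


def threshold(img: Image, level: int = 128) -> Image:
    """Binarise: pixels below `level` become black (0), the rest white (255)."""
    return [[0 if p < level else 255 for p in row] for row in img]


def preprocess(img: Image, factor: int = 2, level: int = 128) -> Image:
    """Resize, then blur, then threshold, in that order."""
    return threshold(box_blur(resize_nearest(img, factor)), level)
```

The order matters: blurring after upscaling softens the blocky edges that nearest-neighbour resizing creates, and thresholding last leaves a clean black-and-white image for Tesseract. As Falke notes, though, a higher-resolution scan is still the better starting point.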

