> The best fix for the unconnected scripts may be to break them into sub-akshara glyphs and recognize those separately.
After correct recognition, is there a method to put the output into the accepted form of the language?

MNS Rao

On Apr 2, 8:43 pm, Ray Smith <[email protected]> wrote:
> The biggest problem with unconnected Indic scripts seems to be the aspect
> ratio and the amount of horizontal detail. Hindi seems to work quite well as
> it doesn't seem to have very big ligatures. The best fix for the unconnected
> scripts may be to break them into sub-akshara glyphs and recognize those
> separately.
>
> Ray.
> Sent from my Nexus1 Android phone.
> On Mar 29, 2011 11:18 PM, "Debayan Banerjee" <[email protected]> wrote:
>
> > Hi,
> >
> > I gather that Tesseract 3.0 works well for Chinese script now. The
> > hallmark of Chinese script is that it is unconnected (unlike say Hindi
> > which has a line connecting all its characters), and it has a large
> > number of characters in the alphabet. In this light, I think it should
> > also work well with unconnected Indic script such as Kannada,
> > Malayalam, Punjabi etc.
> >
> > Anyone know if this works?
> >
> > --
> > Debayan Banerjee
> > http://hacking-tesseract.blogspot.com/

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

