On 8 April 2011 01:22, Debayan Banerjee <[email protected]> wrote:
> On 2 April 2011 21:13, Ray Smith <[email protected]> wrote:
>> The biggest problem with unconnected Indic scripts seems to be the aspect
>> ratio and the amount of horizontal detail. Hindi seems to work quite well as
>> it doesn't seem to have very big ligatures. The best fix for the unconnected
>> scripts may be to break them into sub-akshara glyphs and recognize those
>> separately.
>>
>
> Wrote a blog post about a possible strategy to handle descender vowel
> signs:
> http://hacking-tesseract.blogspot.com/2011/04/horizontal-histogram-profiles-of.html

This will work for Bengali and Hindi. I am not working on South Indian languages for now. When you say it seems to work well for Hindi, have you tested 3.0 with this?

--
Debayan Banerjee

