The biggest problem with the unconnected Indic scripts seems to be the aspect
ratio of each akshara and the amount of horizontal detail in it. Hindi works
quite well, probably because it doesn't have very big ligatures. The best fix
for the unconnected scripts may be to break the aksharas into sub-akshara
glyphs and recognize those separately.
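
A rough sketch of the splitting idea, in case it helps. It assumes each
sub-akshara glyph shows up as its own connected blob of ink, which will not
always hold (parts can touch, and one glyph can be made of several blobs). The
toy bitmap and the function names are made up for illustration; this is not
how Tesseract itself segments text.

```python
from collections import deque


def connected_components(bitmap):
    """Return sorted bounding boxes (left, top, right, bottom) of the
    8-connected blobs of '#' pixels in a list of equal-length strings."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            left, top, right, bottom = c, r, c, r
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == "#"
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


def aspect_ratio(box):
    """Width divided by height of an inclusive bounding box."""
    left, top, right, bottom = box
    return (right - left + 1) / (bottom - top + 1)


# Hypothetical wide "akshara" made of four separate pieces (toy data).
AKSHARA = [
    "###..#..##",
    "#.#..#..##",
    "###..#....",
    ".....#..##",
]

if __name__ == "__main__":
    whole = (0, 0, len(AKSHARA[0]) - 1, len(AKSHARA) - 1)
    print("whole akshara:", whole, aspect_ratio(whole))
    for box in connected_components(AKSHARA):
        print("piece:", box, aspect_ratio(box))
```

The point is only that the pieces are individually much less wide than the
whole akshara, so each one is an easier shape to classify. The recognized
pieces would then have to be recombined into aksharas, which this sketch does
not attempt.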

Ray.
Sent from my Nexus1 Android phone.
On Mar 29, 2011 11:18 PM, "Debayan Banerjee" <[email protected]> wrote:
> Hi,
>
> I gather that Tesseract 3.0 works well for Chinese script now. The
> hallmark of Chinese script is that it is unconnected (unlike say Hindi
> which has a line connecting all its characters), and it has a large
> number of characters in the alphabet. In this light, I think it should
> also work well with unconnected Indic script such as Kannada,
> Malayalam, Punjabi etc.
>
> Anyone know if this works?
>
> --
> Debayan Banerjee
> http://hacking-tesseract.blogspot.com/
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
>

