Hello,
    Thanks for the reply. I will check the points you raised as far as the
font issues are concerned. We all know how jna, shra and ksh are formed in
Unicode and ISCII, but the point I wanted to make was that if we have to
sort / search / process data in Devanagari script, then we have to keep
track of at least three characters and not one. This becomes tedious, though
not impossible. If a single code point were present, it would be very easy
to process.
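
    As a rough illustration of the bookkeeping involved, here is a minimal
Python sketch. The helper names (code_points, split_units) are made up for
this example, and split_units is only a crude grouping rule: it glues
characters across a virama and ignores vowel signs. It is not a full
implementation of Unicode text segmentation or collation.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

# The three conjuncts mentioned above, each stored as three code points.
CONJUNCTS = {
    "jna": "\u091C\u094D\u091E",   # JA + VIRAMA + NYA
    "shra": "\u0936\u094D\u0930",  # SHA + VIRAMA + RA
    "ksh": "\u0915\u094D\u0937",   # KA + VIRAMA + SSA
}


def code_points(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def split_units(text):
    """Group code points so that a virama glues its neighbours together.

    Hypothetical helper: a simplification that treats
    consonant + virama + consonant as one unit and leaves every other
    code point (including vowel signs) as its own unit.
    """
    units = []
    for ch in text:
        if units and (ch == VIRAMA or units[-1].endswith(VIRAMA)):
            units[-1] += ch
        else:
            units.append(ch)
    return units


if __name__ == "__main__":
    for name, seq in CONJUNCTS.items():
        print(name, len(seq), code_points(seq))
    # A word starting with jna: JA VIRAMA NYA, vowel sign AA, NA
    print(split_units("\u091C\u094D\u091E\u093E\u0928"))
```

    Any sort or search routine working on such text has to look at the
whole three-code-point sequence, as split_units does, before it can treat
the conjunct as one letter.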
    With regard to "predict language by using some heuristic": in my
opinion it is a very risky solution, at least when I don't have much
information at stage one of my application. I am running an OCR engine on a
Devanagari page and then tagging the language based on the formatting. So I
think tagging, as I am doing right now, is a better solution. I also agree
with the views expressed by Asmus Freytag that if we go on including all
the 6000 languages, it will be nearly impossible to cross-correlate these
'code pages'.

-Aditya


