Hello,

Thanks for the reply. As far as the font issues are concerned, I will check the points you mentioned.

We all know how jna, shra and ksha are formed in Unicode and ISCII, but the point I wanted to make was this: if we have to sort, search or otherwise process data in Devanagari script, we have to keep track of at least three characters, not one. This is tedious, though not impossible. If a single code point were assigned to each of these conjuncts, processing would be much easier.

With regard to "predict language by using some heuristic": in my opinion it is a very risky solution, at least when I don't have much information at the first stage of my application. I am running an OCR engine on a Devanagari page and then tagging the language based on the formatting. So I think tagging, as I am doing right now, is the better solution.

I also agree with the views expressed by Asmus Freytag that if we go on including all of the 6000 languages, it will be practically impossible to cross-correlate these 'code pages'.
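The point about having to track three characters for jna, shra and ksha can be illustrated with a minimal Python sketch. It is not from the original post. It assumes the Unicode encoding only (ISCII builds the same conjuncts from consonant + halant + consonant but is not shown), and the names `CONJUNCTS`, `describe`, `count_conjunct` and the sample text are hypothetical, chosen for illustration. It shows that each conjunct is stored as consonant + VIRAMA (U+094D) + consonant, and that searching for a single code point such as KA overcounts, so the whole sequence has to be matched.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

# Hypothetical table: each conjunct is consonant + VIRAMA + consonant,
# i.e. three code points rather than one.
CONJUNCTS = {
    "ksha": "\u0915" + VIRAMA + "\u0937",  # KA + VIRAMA + SSA
    "jnya": "\u091C" + VIRAMA + "\u091E",  # JA + VIRAMA + NYA
    "shra": "\u0936" + VIRAMA + "\u0930",  # SHA + VIRAMA + RA
}

# Sample text built from escapes: "kshama", "pariksha", "kamal".
# The first two contain the ksha conjunct; "kamal" has only a plain KA.
SAMPLE = (
    CONJUNCTS["ksha"] + "\u092E\u093E"                       # kshama
    + " " + "\u092A\u0930\u0940" + CONJUNCTS["ksha"] + "\u093E"  # pariksha
    + " " + "\u0915\u092E\u0932"                             # kamal
)


def describe(cluster: str) -> list[str]:
    """List the code points (with their Unicode names) that make up a cluster."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in cluster]


def count_conjunct(text: str, conjunct: str) -> int:
    """Count non-overlapping occurrences of a conjunct's full code-point sequence.

    Note: this does not look at what follows the match, so a longer cluster
    that merely starts with the conjunct would also be counted.
    """
    return text.count(conjunct)


if __name__ == "__main__":
    for name, cluster in CONJUNCTS.items():
        print(name, len(cluster), describe(cluster))
    # Matching the three-code-point sequence finds the two real conjuncts...
    print("ksha:", count_conjunct(SAMPLE, CONJUNCTS["ksha"]))
    # ...while searching for KA alone also hits the plain KA in "kamal".
    print("KA alone:", SAMPLE.count("\u0915"))
```

Run as a script, the sequence search reports 2 occurrences of ksha, while a search for KA alone reports 3, because it also matches the unrelated KA in "kamal". A real search or collation routine would additionally need to handle longer clusters and normalization; this sketch only demonstrates why a single code point is not enough.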
-Aditya