Aman Chawla wrote,
> I would be grateful if I could get opinions on the following:
>
> 1. Which encoding/character set is most suitable for using Hindi/Marathi
> (both of which use Devanagari) on the internet as well as in databases, and
> why? In your response, please refer to:
> http://www.iiit.net/ltrc/Publications/iscii_plugin_display.html,
> particularly the following paragraphs:

<snip>

Unicode is the best. It is the world's standard for computer encoding and, as such, offers the best chance that text can be exchanged around the globe and across platforms.

The arguments about relative size are true, but nowadays they are generally considered unimportant. Graphics and sound files are far larger than text files in any script. In UTF-8, each Devanagari character takes three bytes. So far, four-byte UTF-8 sequences are used only for characters in Plane 1 and above.

> 3. With reference to the previous question, can programs that convert
> the myriad Devanagari encodings in use today to a standard encoding
> (question 1) be made freely available, and how?

Yes, converters exist and are being distributed. Go to the Google search engine and enter "character conversion Unicode" into the box. Look for ICU and Rosette, to name a few. You might also run across Mark Leisher's download page at:

http://crl.nmsu.edu/~mleisher/download.html

which has a Perl script for converting the Naidunia Devanagari encoding to UTF-16.

> 4. Is there any search engine on the internet that maintains an up to date
> index of sites in Devanagari? If not, what can be done to encourage
> proprietary search engines to support Hindi? Google supposedly has a
> Hindi language option, but surprise, it's in Roman script! Several emails
> to them have elicited the response: "At the moment we don't support
> Devanagari..."

This appears to be because Google converts UTF-8 strings entered into the search box into decimal NCRs (numeric character references). When I pasted यूनिकोड क्या है into the Google box, it displayed fine.
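To make the two encoding points concrete, here is a minimal Python sketch (Python 3.9+ assumed; the helper names `utf8_byte_counts` and `to_decimal_ncrs` are hypothetical, chosen for illustration). It prints how many UTF-8 bytes each character of यूनिकोड क्या है needs, shows that a Plane 1 character needs four, and converts the string to decimal NCRs with Python's standard `xmlcharrefreplace` error handler. The NCR step only imitates the conversion suspected above; it is not Google's actual code.

```python
# Minimal sketch, assuming Python 3.9+. Helper names are hypothetical.

def utf8_byte_counts(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number of bytes it needs in UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def to_decimal_ncrs(text: str) -> str:
    """Turn every non-ASCII character into a decimal NCR such as &#2351;.

    Python's built-in "xmlcharrefreplace" error handler emits a decimal
    reference for each character the target codec (ASCII here) can't encode.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    query = "यूनिकोड क्या है"  # "What is Unicode?" in Hindi
    for ch, size in utf8_byte_counts(query):
        # Devanagari letters and signs take 3 bytes; the ASCII space takes 1.
        print(f"U+{ord(ch):04X}: {size} byte(s)")
    # A Plane 1 character (U+10330 GOTHIC LETTER AHSA) needs four bytes.
    print(len("\U00010330".encode("utf-8")))  # → 4
    print(to_decimal_ncrs(query))  # begins with &#2351;&#2370;
```

A literal search for the NCR form would only match pages that also store their text as NCRs, not pages stored as raw UTF-8, which fits the search result described next.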
Since the "What is Unicode?" pages are popular and have been up for a while, I thought they would have a good chance of being indexed. But there were no hits for the resulting search string (यूनिकोड क्या है converted to decimal NCRs), which is not surprising since the actual page doesn't use NCRs.

Best regards,

James Kass

