On Sun, Jan 20, 2002 at 10:44:00PM -0500, Aman Chawla wrote:
> For sites providing archives of documents/manuscripts (in plain text) in
> Devanagari, this factor could be as high as approx. 3 using UTF-8 and
> around 1 using ISCII.
Uncompressed, yes. It shouldn't be nearly as bad compressed - with gzip,
zip, bzip2, or whatever your favorite tool is. You could also use UTF-16
or SCSU, which will get it down to about 2 bytes per character or about
1, respectively.

What's your point in continuing this? Most of the people on this list
already know that UTF-8 can expand the size of non-English text. There's
nothing we can do about it; even if you had brought it up when UTF-8 was
being designed, there's not much anyone could have done about it.

There is no simple encoding scheme that will encode Indic text in
Unicode in one byte per character. It's the pigeonhole principle in
action - one byte can distinguish only 256 values and two bytes only
65,536, so if you need to encode 150,000 characters, you can't encode
each one in one or two bytes. And while you can write encodings that
approach that for normal text, they aren't going to be simple or pretty.
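For what it's worth, here's a minimal Python sketch of the size
comparison (the sample string and repetition count are arbitrary, the
exact ratios depend on the text, and SCSU is left out since Python has
no standard-library codec for it):

    import gzip

    # "Devanagari" written in Devanagari (8 characters), repeated
    text = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940" * 100

    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")   # no BOM; 2 bytes per BMP character

    print(len(text))                   # 800 characters
    print(len(utf8))                   # 2400 bytes - 3 per character
    print(len(utf16))                  # 1600 bytes - 2 per character
    print(len(gzip.compress(utf8)))    # far smaller than either encoding

For this (admittedly very repetitive) sample, the gzipped UTF-8 comes
out well below even the one-byte-per-character ISCII size; real prose
won't compress that dramatically, but it closes most of the gap.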
--
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb,
we'll still be freakin' friends. - "Freakin' Friends"
