On Sun, Jan 20, 2002 at 10:44:00PM -0500, Aman Chawla wrote:
> For sites providing archives of documents/manuscripts (in plain text) in
> Devanagari, this factor could be as high as approx. 3 using UTF-8 and
> around 1 using ISCII.
Uncompressed, yes. It shouldn't be nearly as bad compressed - with gzip,
zip, bzip2, or whatever your favorite tool is. You could also use UTF-16
or SCSU, which will get it down to about 2 bytes per character or about
1, respectively.

What's your point in continuing this? Most of the people on this list
already know that UTF-8 can expand the size of non-English text. There's
nothing we can do about it; even if you had brought it up when UTF-8 was
being designed, there's not much anyone could have done about it.

There is no simple encoding scheme that will encode Indic text in
Unicode in one byte per character. It's the pigeonhole principle in
action - one byte can distinguish only 256 values and two bytes only
65,536, so if you need to encode 150,000 characters, you can't encode
each one in one or two bytes. And while you can write encodings that
approach that for normal text, they aren't going to be simple or pretty.
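For what it's worth, here's a minimal Python sketch of the size
comparison (the sample string and repetition count are arbitrary, the
exact ratios depend on the text, and SCSU is left out since Python has
no standard-library codec for it):

    import gzip

    # "Devanagari" written in Devanagari (8 characters), repeated
    text = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940" * 100

    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")   # no BOM; 2 bytes per BMP character

    print(len(text))                   # 800 characters
    print(len(utf8))                   # 2400 bytes - 3 per character
    print(len(utf16))                  # 1600 bytes - 2 per character
    print(len(gzip.compress(utf8)))    # far smaller than either encoding

For this (admittedly very repetitive) sample, the gzipped UTF-8 comes
out well below even the one-byte-per-character ISCII size; real prose
won't compress that dramatically, but it closes most of the gap.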
--
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb,
we'll still be freakin' friends. - "Freakin' Friends"
