On Sun, 20 Jan 2002, Aman Chawla wrote:

> Taking the extra links into account the sizes are:
> English: 10.4 Kb
> Devanagari: 15.0 Kb
> Thus the Dev. page is 1.44 times the Eng. page. For sites providing
> archives of documents/manuscripts (in plain text) in Devanagari, this
> factor could be as high as approx. 3 using UTF-8 and around 1 using
> ISCII.
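The approx. 3x factor quoted above can be checked directly: every character in the Devanagari block (U+0900-U+097F) takes 3 bytes in UTF-8 versus 2 in UTF-16 and 1 in ISCII. A minimal sketch (the sample string is my own; ISCII is not in Python's standard library, so its 1-byte-per-character cost is stated by design rather than computed):

```python
# Per-character byte cost of Devanagari text under Unicode encodings.
# ISCII, for comparison, uses ~1 byte per character by design.
text = "नमस्ते"  # 6 Devanagari code points, all in the U+0900 block

utf8 = text.encode("utf-8")       # 3 bytes per code point in this block
utf16 = text.encode("utf-16-le")  # 2 bytes per BMP code point

print(len(text), len(utf8), len(utf16))  # → 6 18 12
```

So for pure Devanagari plain text the UTF-8:ISCII size ratio is indeed about 3:1, and UTF-16 brings it down to about 2:1.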
Well, a trivial adjustment is to use UTF-16 to store your documents if you know they are going to be predominantly Devanagari. Or, if you have so much text that the number of extra disks is going to be painful, use SCSU to bring the size very close to the ISCII ratio.

Of course, I would note that you can store millions of pages of plain text on a single hard disk these days. If you are going to be storing so many hundreds of millions of pages of plain text that the number of extra disks is a bother, I am amazed that none of it might be outside the ISCII repertoire. And this huge document archive has no graphics component to go with it...

But the real reason for publishing the data in Unicode on the web is so that people not using a machine specially configured for ISCII will still be able to read and process the data.

[Aman Chawla then later wrote:]

> With regards to South Asia, where the most widely used modems are
> approx. 14 kbps, maybe some 36 kbps and rarely 56 kbps, where
> broadband/DSL is mostly unheard of, efficiency in data transmission is
> of paramount importance... how can we convince the south asian user to
> create websites in an encoding that would make his client's 14 kbps
> modem as effective (rather, ineffective) as a 4.6 kbps modem?

Can you read 500 characters per second? So long as they are receiving only plain text, even this dawdling speed is not going to impact them. People wanting to transfer data efficiently will use a compression program.

Geoffrey

