On Sat, Apr 28, 2012 at 6:22 PM, Naena Guru <[email protected]> wrote:
> How I see Unicode is as a
> set of character groups, 7-bit, 8-bit (extends and replaces 7-bit), 16-bit,
> and CJKV that use some sort of 16-bit paring.

That's one lens to see Unicode through, but in most cases it's substantially distorting. Unicode is a set of 1,112,064 possible characters, divided up into a flat section of 55,296 code points, a break of 2,048 surrogates (which are not characters), and then another 1,056,768 code points. There are a number of other ways to view it, but there's no guarantee that U+0370 won't be filled with an Egyptian hieroglyph, and any view of Unicode that assumes that it won't is thus not a correct view.

> As Unicode says, they are just
> numeric codes assigned to letters or whatever other ideas. It is the task if
> the devices to decide what they are and show them

That is the concept of a character encoding. It has continued to exist since the first days of computing because plain text seems to encode something important and distinct from higher levels.

> It shows perfectly when 'dressed' with a
> smartfont.

Except in IE, one of the most common browsers on the market. Except to anyone using a screen reader.

> It takes about half the bandwidth to transmit that the double-byte set.

Who cares? SMS's restrictions are not technical ones. G.711, the most common digital compression for telephony, uses 8 kb per second.* One byte per character or two, that's faster than you can type. Outside telephony, plain text is trivial; long novels, like Dracula, come in at under a MB, and download instantaneously for me--partially because it's automatically gzipped down to 330 KB. At 3 bytes per Even on not-so-good connections the time taken to download a full novel is nowhere near the time needed to read it, and is always a fraction of the time needed to download a song, and is less than 1% of the time needed to download a TV show.

http://www.lovatasinhala.com/ is 4 kb of text and 8 kb of images. The costs you're trying to impose on everyone to save 4 kb just aren't worth it, especially as you're sending 177 kb of font to avoid it.

* Before anyone asks: kb = kilobytes here, so yes, 64 kilobits / sec = 8 kb / sec.

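For anyone who wants to check the code point arithmetic and the bytes-per-character question in this thread, here is a minimal Python sketch. The variable names and the sample word are my own choices for illustration; the only facts assumed are that the code space runs from U+0000 to U+10FFFF and that U+D800..U+DFFF are the surrogates.

```python
# Code space arithmetic: U+0000..U+10FFFF, minus the surrogate block
# U+D800..U+DFFF, which is reserved for UTF-16 and never holds characters.
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF

below_surrogates = SURROGATE_FIRST                  # U+0000..U+D7FF
surrogates = SURROGATE_LAST - SURROGATE_FIRST + 1   # U+D800..U+DFFF
above_surrogates = MAX_CODE_POINT - SURROGATE_LAST  # U+E000..U+10FFFF
scalar_values = below_surrogates + above_surrogates

print(below_surrogates, surrogates, above_surrogates, scalar_values)
# 55296 2048 1056768 1112064

# Bytes on the wire for a short Sinhala word (illustrative sample only):
# five code points from the Sinhala block, written as escapes.
sample = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"
utf8_bytes = len(sample.encode("utf-8"))       # 3 bytes per code point here
utf16_bytes = len(sample.encode("utf-16-le"))  # 2 bytes per code point here

print(len(sample), utf8_bytes, utf16_bytes)
# 5 15 10
```

Either way the difference is a handful of bytes per word, which is small next to the 8 kilobytes per second that a G.711 voice call uses.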
> In the small market of Singhala, no font is present that
> goes typographically well with Arial Unicode. There is no incentive or money
> to make beautiful fonts for a minority language like Singhala.

I'm sorry; unfortunately, that's what's known as a Hard Problem. There is nothing any character encoding can do about that.

> I hope both the mobile device industry and the PC side separate fonts and
> characters and allow the users to decide the default font sets in their
> devices.

It'd be nice, but that doesn't have much to do with Unicode.

>This is eminently rational because the rendering of the font
> happens locally, whereas the characters travel across the network.

I don't see the connection. The font is almost always local, whether or not it's user-selectable.

> This will
> also help those who like me who understand that their language is better
> served by a transliteration solution than a convoluted double-byte solution
> that discourages the natives to use their script.

I see no evidence that using an industry-standard solution that treats all scripts equally discourages people from using the script. I do think that "Please get a browser that keeps with times" discourages people.

-- 
Kie ekzistas vivo, ekzistas espero.

