At 11:53 AM 2/7/02 -0600, David Starner wrote:
>a superset of a number of preexisting character sets, so that it was
>possible for those users to move to Unicode without problems. Since
>important preexisting character sets separated Greek, Cyrillic and Latin
>scripts, Unicode had to. Had Unicode not chosen to follow these
>principles, ISO 10646 would have, and it would have become the dominant
>character set, with the same problems.

Actually, this discussion ignores that, in order to be workable, a character set standard for *cased* scripts must support context-free case transitions. That's why B, B, and B (Latin, Greek, and Cyrillic) need to be separated, since they lowercase into three different characters: 'b', 'beta' and 'small B'. That they are also considered to come from different scripts just reinforces that argument.

However, the Latin character that looks like a capital D with stroke can lowercase into a straight 'd with stroke' or a curly form, which is an Icelandic letter. As long as the two lowercase forms aren't unified, and little speaks in favor of that, least of all legacy, the two uppercase forms must be separated as well.

The one exception that survived (Turkish I) is causing innumerable problems, which supports the rule I gave at the outset.

Any workable multilingual character set containing these characters will allow spoofing on the character level, and all existing ones (including 8859-7 for Latin/Greek, for example) do. But, as the discussion shows, spoofing on the word level (.com for .gov) is alive and well, and supported by any character set whatsoever.

For that reason, there seems to be little to gain from chasing the holy grail of a multilingual character set that somehow avoids character-level spoofing, if word-level spoofing can go on unchecked.

A./
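
The case-transition argument above can be checked with a minimal sketch, assuming Python 3 and its built-in `str.lower()` / `str.upper()`, which apply locale-independent (context-free) case mappings. The helper name `lowercase_names` and the two constant lists are hypothetical, chosen only for this illustration; the code points are written as escapes because the glyphs themselves are indistinguishable.

```python
import unicodedata

# Look-alike capitals from three scripts (written as escapes on purpose).
LOOKALIKE_B = ["\u0042", "\u0392", "\u0412"]  # Latin B, Greek Beta, Cyrillic Ve

# Two Latin capitals that both look like "D with stroke".
LOOKALIKE_D = ["\u0110", "\u00D0"]  # D with stroke, Icelandic Eth


def lowercase_names(chars):
    """Return the Unicode character name of each character's lowercase form.

    Illustrative helper (not from the original post); assumes each input
    lowercases to exactly one character, which holds for the inputs above.
    """
    return [unicodedata.name(ch.lower()) for ch in chars]


if __name__ == "__main__":
    # Each look-alike capital has its own, distinct lowercase form, so the
    # mapping needs no knowledge of the surrounding text or language.
    for name in lowercase_names(LOOKALIKE_B) + lowercase_names(LOOKALIKE_D):
        print(name)

    # The Turkish exception: dotted 'i' and dotless U+0131 both uppercase
    # to the same 'I', so lowercasing 'I' correctly requires knowing the
    # language. The default mapping simply picks the dotted form.
    print("i".upper() == "\u0131".upper())
    print(unicodedata.name("I".lower()))
```

Because the look-alike capitals are separate code points, every call above is a plain table lookup. Only the shared capital 'I' cannot be lowercased correctly for Turkish text without outside information, which is the problem the exception creates.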