At 11:53 AM 2/7/02 -0600, David Starner wrote:
>a superset of a number of preexisting character sets, so that it was
>possible for those users to move to Unicode without problems. Since
>important preexisting character sets separated Greek, Cyrillic and Latin
>scripts, Unicode had to. Had Unicode not chosen to follow these
>principles, ISO 10646 would have, and it would have become the dominant
>character set, with the same problems.

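Actually, this discussion ignores that, to be workable, a character
set standard for *cased* scripts must support context-free case
transitions: each character has to map to its other case on its own,
without looking at neighboring text or at the language of the text.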

That's why the capital Latin B, Greek beta and Cyrillic ve, which all
look like B, need to be separated: they lowercase into three different
characters, 'b', 'beta' and Cyrillic 've' (which looks like a small
capital B). That they are also considered to come from different
scripts just reinforces that argument.
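A minimal Python sketch of the point above, assuming Python 3, whose `str.lower()` applies Unicode's default case mappings. It lowercases the three look-alike capitals and prints the Unicode names on both sides; the helper `lowercase_targets` is a made-up name for illustration.

```python
import unicodedata

def lowercase_targets(chars):
    """Map each character's Unicode name to the name of its lowercase form."""
    return {unicodedata.name(ch): unicodedata.name(ch.lower()) for ch in chars}

# Latin B, Greek capital beta, Cyrillic capital ve: usually identical glyphs
look_alikes = ["\u0042", "\u0392", "\u0412"]

for upper_name, lower_name in lowercase_targets(look_alikes).items():
    print(f"{upper_name} -> {lower_name}")
# LATIN CAPITAL LETTER B -> LATIN SMALL LETTER B
# GREEK CAPITAL LETTER BETA -> GREEK SMALL LETTER BETA
# CYRILLIC CAPITAL LETTER VE -> CYRILLIC SMALL LETTER VE
```

Because each capital is its own code point, lowercasing never has to guess which of the three small letters is meant.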

However, the Latin character that looks like a capital D with stroke
can lowercase either into a straight 'd with stroke' or into a curly
form, the eth, which is an Icelandic letter. As long as the two
lowercase forms are not unified (and little speaks in favor of that,
least of all legacy), the two uppercase forms must be separated as well.
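The argument can be phrased as a check: group lowercase letters by their uppercase form, and any capital shared by two lowercase letters cannot be lowercased without context. A rough Python sketch under that framing; `case_collisions` and `unified_upper` are hypothetical helpers, and the second one models an imagined character set with a single shared capital, not any real encoding.

```python
from collections import defaultdict

def case_collisions(lowercase_letters, upper=str.upper):
    """Return {capital: sorted lowercase letters} for every capital that more
    than one lowercase letter maps onto. A non-empty result means that
    lowercasing that capital would need context to pick the right letter."""
    groups = defaultdict(set)
    for ch in lowercase_letters:
        groups[upper(ch)].add(ch)
    return {cap: sorted(lows) for cap, lows in groups.items() if len(lows) > 1}

# LATIN SMALL LETTER D WITH STROKE, LATIN SMALL LETTER ETH
letters = ["\u0111", "\u00f0"]

# Unicode keeps the capitals apart (U+0110 and U+00D0): no collision.
print(case_collisions(letters))  # {}

def unified_upper(ch):
    # Hypothetical character set that encodes only one "capital D with stroke".
    return "\u0110" if ch in ("\u0111", "\u00f0") else ch.upper()

print(case_collisions(letters, upper=unified_upper))  # {'Đ': ['ð', 'đ']}
```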

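The one exception that survived, Turkish I, is causing innumerable
problems, which supports the rule I gave at the outset: whether capital
I lowercases to dotted i or to dotless i depends on the language of the
text, not on the character alone.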
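A small Python illustration of why the Turkish I breaks context-free casing, assuming Python 3's `str.lower()`/`str.upper()`, which use Unicode's default (language-independent) mappings. `lower_tr` is a hypothetical helper that hard-codes the Turkish rule; it is not a library function.

```python
def lower_tr(text):
    """Hypothetical Turkish-aware lowercasing: dotted capital I -> i and
    plain capital I -> dotless i, then Python's default mapping for the rest."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

word = "D\u0130YARBAKIR"  # the Turkish city name Diyarbakir, in capitals

print(lower_tr(word))                        # diyarbakır
print(ascii(word.lower()))                   # 'di\u0307yarbakir': default mapping, wrong for Turkish
print("\u0131".upper().lower() == "\u0131")  # False: dotless i does not round-trip
```

The same capital I needs two different answers depending on the language, which is exactly the context a context-free mapping cannot see.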

Any workable multilingual character set containing these characters
will allow spoofing on the character level, and all existing ones do
(ISO 8859-7, which covers Latin and Greek, for example).
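A short Python sketch of character-level spoofing inside a single legacy set, using the standard `iso8859_7` codec. The word "KATE" is just an arbitrary example spelled once with Latin capitals and once with Greek capitals.

```python
latin = "KATE"                      # Latin capitals
greek = "\u039a\u0391\u03a4\u0395"  # Greek KAPPA, ALPHA, TAU, EPSILON

# Both strings fit in the single-byte ISO 8859-7 (Latin/Greek) character set...
latin_bytes = latin.encode("iso8859_7")
greek_bytes = greek.encode("iso8859_7")

# ...and typically render identically, yet they are different data.
print(latin_bytes == greek_bytes)             # False
print(latin_bytes.hex(), greek_bytes.hex())   # 4b415445 cac1d4c5
```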

But, as the discussion shows, spoofing on the word level (.com
for .gov) is alive and well, and is possible with any character set
whatsoever. For that reason, there seems little to gain from chasing
the holy grail of a multilingual character set that somehow avoids
character-level spoofing, as long as word-level spoofing can go on
unchecked.
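To make the contrast concrete, here is a rough Python sketch of a character-level defense (flagging labels that mix scripts) and what it misses. The script detection is a simplification that reads the first word of each letter's Unicode name rather than the real Unicode Script property, and `scripts` and `looks_spoofed` are hypothetical helpers.

```python
import unicodedata

def scripts(text):
    """Rough script detector: first word of each letter's Unicode name
    (LATIN, GREEK, CYRILLIC, ...). A simplification, not the Script property."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

def looks_spoofed(label):
    """Hypothetical character-level check: flag labels that mix scripts."""
    return len(scripts(label)) > 1

print(looks_spoofed("p\u0430ypal"))  # True: Cyrillic small a hidden in a Latin word
print(looks_spoofed("example.gov"))  # False
print(looks_spoofed("example.com"))  # False: a .com posing as a .gov passes unnoticed
```

The check catches the mixed-script label but has nothing to say about two all-Latin names, which is the word-level spoofing described above.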

A./
