--On Thursday, February 05, 2015 19:23 +0900 "Martin J. Dürst"
<[email protected]> wrote:

>...
> I think the above should work. Given that it's the most busy
> time of the Japanese academic year, I haven't been able to
> follow the discussion in as much detail as I would have
> wanted, but I remember that quite some time ago, I was very
> much concerned by phrases along the line of "only allow those
> characters whose implications they fully understand".
> 
> Except maybe for two or three people on the Unicode Technical
> Committee I know, I wouldn't want to claim that anybody knows
> the implications of even a significant (in terms of size and
> use) part of the Unicode repertoire. And for the average
> implementer or system administrator, it's of course much less.
> But we definitely don't want that to lead to a situation where
> we go back to (some time) last century and ASCII only.

Martin,

I don't see how you get from there to "ASCII only".  First,
there are a lot of people in the world who don't "understand the
implications of" Latin Script, even the basic undecorated Latin
characters and even though they might use them.  I think that,
while it may require some effort on their part, it is reasonable
to expect implementers and system administrators who establish
rules for identifiers to take responsibility for understanding
the use and possible risks associated with the characters of
their own scripts, especially the subset of those characters
that are relevant to their own languages.  

I recognize that makes it hard to design software systems that
are somehow internationally script-insensitive where identifiers
are concerned, but I think we have to live with that as the
price of the diversity of human languages and writing systems.
It may also imply a need for software implementations that are
far more rule-driven, possibly with locally-tailorable rules for
individual scripts, languages, and context, rather than an
approach that is construed as "this magic table of characters is
ok".   Again, that may be the price of the diversity of human
writing system and, by looking at tables and global profiles, we
may just be in denial about that diversity and its implications.


None of the above is made any easier by Unicode decisions,
however justified by the same diversity issues, pushing us from
design principles that apply to all of the coding system, to
design principles that are different on a per-script basis, to
specific and exception-driven rules such as "normalization does
all of the right comparison things within the Latin script
except for the following code points for which decomposition is
appropriate under some circumstances but not others" or "there
are case matching rules that are universally applicable except
for certain specific locales and code points, where special
treatment is needed".
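The kind of exception-driven rule being described can be illustrated with a small Python sketch using the standard unicodedata module. The specific code points are just convenient examples: one where normalization does the right thing, and one (the Turkish dotted capital I) where case folding produces a form no plain Latin "i" will ever match bit-by-bit.

```python
import unicodedata

# "é" as one precomposed code point vs. "e" plus a combining acute accent:
precomposed = "\u00E9"
decomposed = "e\u0301"

# Naive bit-by-bit comparison says these differ...
assert precomposed != decomposed
# ...but NFC normalization makes them compare equal, as intended.
assert unicodedata.normalize("NFC", precomposed) == \
       unicodedata.normalize("NFC", decomposed)

# The exception: case folding maps U+0130 (LATIN CAPITAL LETTER I WITH
# DOT ABOVE, the Turkish dotted capital I) to "i" + U+0307 (combining
# dot above), not to a bare "i" -- a per-code-point special case that
# no general rule predicts.
assert "\u0130".casefold() == "i\u0307"
assert "\u0130".casefold() != "i"
```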

It may be that we have been in denial, that the whole concept of
identifiers without language context is unworkable for at least
some protocols, and that we should be thinking of an
"internationalized identifier" as a tuple with a string and
language identifier.  Comparisons would then depend, not on
catenation and bit-by-bit comparison but on 
consideration of the language identifier based on RFC 4647 and
then interpretation and comparison of the string based on that
information.
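If identifiers really did become such tuples, the comparison might look
something like the following Python sketch. The tuple shape, the function
names, and the use of NFC normalization as the string-comparison step are
all illustrative assumptions on my part; only the basic-filtering rule is
taken from RFC 4647, Section 3.3.1.

```python
import unicodedata

def basic_match(range_, tag):
    # RFC 4647 "basic filtering": a language range matches a tag if,
    # compared case-insensitively, they are equal or the tag begins
    # with the range followed by "-"; the range "*" matches any tag.
    r, t = range_.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")

def identifiers_equal(a, b):
    # Hypothetical (string, language-tag) identifiers: only when the
    # language tags match do we go on to compare the strings, here with
    # NFC normalization standing in for whatever language-aware
    # interpretation a real protocol would actually specify.
    (s1, lang1), (s2, lang2) = a, b
    if not basic_match(lang1, lang2):
        return False
    return unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2)

# "de" as a language range matches the more specific tag "de-CH"...
assert identifiers_equal(("stra\u00DFe", "de"), ("stra\u00DFe", "de-CH"))
# ...but identical strings under unrelated language tags do not match.
assert not identifiers_equal(("resume", "en"), ("resume", "fr"))
```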

That suggests that we should finish the PRECIS work based on
current documents rather than looking for a more perfect
solution (or textual phrasing) now.  However, it does also
suggest that, for at least some purposes, the PRECIS work may be
a waypoint rather than a final answer.

    john


_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
