[precis] Unicode version review

John C Klensin Mon, 21 Jul 2014 13:52:21 -0700

Hi.

The recently posted I-D,
draft-klensin-idna-5892upd-unicode70-00.txt, is oriented toward
IDNA but identifies an issue that reinforces my concerns about
precis-framework citing IDNA's categories and rules by inclusion
rather than by reference.


My sense is that there really isn't enough energy to do a lot of
hard, down-in-the-details, multiple-script work on i18n in the
IETF.  PRECIS has moved along slowly, the IAB i18n Program has
had difficulties getting work done, and one or two other
possible efforts of which I'm aware have not gotten off the
ground.  

Some of it takes a lot of work, too: Patrik and I, with the help
of Andrew Sullivan and a few others, spent significant time over
several months trying to completely understand the problem and
how to explain it to each other and others.  The draft cited
above explains the problem (maybe still not as well as needed),
and proposes a specific action for IDNA.

It describes a situation in which two ways to code and represent
the same character (same script and even the same Unicode name,
not a look-alike issue) do not compare equal under
normalization.  For IDNA and other user-visible and confusable
identifiers that is almost certainly a bad situation (whether it
is worth fixing, and how, is a separate question).  For strings
that might be improved by being hard to type or obscure, it may
be wonderful.  Either way, we have now discovered that such
characters exist in Unicode, and we discovered it as part of
reviewing a new version.
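A minimal sketch of what "do not compare equal under normalization" means in practice, using Python's standard `unicodedata` module. The message does not name the character; the sketch assumes, for illustration, U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE (added in Unicode 7.0, with no canonical decomposition) and contrasts it with U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE, which does decompose canonically to ALEF plus HAMZA ABOVE. The helper name `equal_under` is hypothetical.

```python
import unicodedata

# Assumed example character (not named in the message): U+08A1,
# which has no canonical or compatibility decomposition.
precomposed = "\u08A1"
combining_seq = "\u0628\u0654"  # ARABIC LETTER BEH + ARABIC HAMZA ABOVE

# Contrast case: U+0623 decomposes canonically to U+0627 U+0654.
alef_precomposed = "\u0623"
alef_seq = "\u0627\u0654"  # ARABIC LETTER ALEF + ARABIC HAMZA ABOVE


def equal_under(form, a, b):
    """Hypothetical helper: True if a and b are identical after normalization `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


forms = ("NFC", "NFD", "NFKC", "NFKD")
for label, a, b in [("BEH", precomposed, combining_seq),
                    ("ALEF", alef_precomposed, alef_seq)]:
    # Show the decomposition mapping recorded in the Unicode database
    # (an empty string means no decomposition) and the comparison per form.
    print(label, repr(unicodedata.decomposition(a)),
          {f: equal_under(f, a, b) for f in forms})
```

On a Python whose Unicode database is version 7.0 or later, the BEH row should report no decomposition and False for every normalization form, while the ALEF row reports True. Nothing in the character's properties flags the problem; it only shows up when someone asks whether the new character duplicates an existing sequence.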

I think this raises three challenges for PRECIS:

(1) Unlike IDNA several years ago, we now know that the problem
exists, so it would be inappropriate for PRECIS not to discuss
it and figure out what to do.  In particular, while I don't
recommend that PRECIS keep rule sets and categories separate
from IDNA's, if you are going to do so, it might be worth
addressing this new character and the several cases that are
similar to it in some consistent way.

(2) This was not discovered, and could not have been discovered,
by running some computations with a few implementations of an
algorithm.  Detecting it required careful examination of changes
in a new version of Unicode on (shudder) a character-by-character
basis.  That isn't as bad as it might seem, because newly added
scripts are rarely, if ever, a problem we have to worry about:
similar-looking characters from different scripts are common,
and we and Unicode have accepted that we have to deal with them.
The issues can arise only when new precomposed characters are
added to existing scripts (a situation that, we believe, we were
told would never occur again when IDNA2008 was being completed).
But it does require review if potentially significant risks are
not to get through unnoticed.  And that review needs to be
"expert" and done with a lot of attention, not merely running an
algorithm and seeing whether it identifies a property change.

Now, if precis-framework incorporated the rules and categories
of IDNA by reference, including allowing forward-pointing
references to issues detected there, we would need only one
review effort and team.  The present model requires a separate
one for PRECIS (and, if actions are required, presumably separate
IETF review), maybe more than one given multiple profiles.  While
that separate review would have the advantage of being able to
tune things very precisely (sic) to PRECIS needs, it will, if the
IDNA experience this time is indicative, require a lot of energy.

I think the WG needs to examine the question of where the
necessary skill level and energy are going to come from in the
near term and for as long as new versions of Unicode keep being
produced.  

(3) The language about the per-version review in
precis-framework seems to me (without having done a careful
side-by-side comparison) to be a little more lightweight and less
systematic than that called for by IDNA.  In particular, it does
not seem to me to allow the type of analysis that detected this
problem, since the issue is neither one of the consistency and
applicability of rules nor one of the ability to create a table.
If the intent is that this be covered by the expert review,
instructions to the expert seem to be missing.

Recommendations are available on request if not obvious.  But I
suggest that people look through the I-D to the extent needed to
understand at least the general nature of the problem.  If you
find that difficult, please think of it not only in terms of the
quality of my writing (which can be checked by also reading the
relevant, and referenced, sections of the Unicode Standard) but
in terms of your expectations of experts and the discussion above.

best,
    john
 
_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
