--On Tuesday, February 03, 2015 22:40 -0600 Pete Resnick
<[email protected]> wrote:

>...
> Your suggestion I think is making too strong a claim, but I
> see where you're going. You needn't limit to a single script
> or otherwise heavily restrict to stay clear of potential
> problems; you simply have to stick to the more restrictive
> classes provided, or if you need to use free-form, then
> restrict to something more limited. So perhaps this would be
> clearer, and capture your concern:
> 
>     Even so, implementations that are sensitive to the advice
>     given in this specification (to use the more restrictive
>     String Classes, or otherwise to only allow a restricted set
>     of characters, particularly ones whose implications they
>     actually understand) are unlikely to run into significant
>     problems as a consequence of these issues or potential
>     changes.

Pete,

It seems to me that "particularly ones whose implications they
actually understand" in this newer phrasing essentially
encourages people to allow/use characters whose implications
they know they don't understand.  I don't think we should
encourage that.  Ever.  It may well happen with FreeFormClass,
but I still think we shouldn't encourage it.

Bjoern wrote,
 
        "I think the text to be added is very clear that
        implementations should disallow all characters whose
        implications are not actually understood. If you cannot
        produce a complete list of all characters
        implementations should disallow with the current version
        of the Unicode standard, then my comment seems very
        applicable to me."

We cannot produce such a list, especially now that
decomposability has turned out to be less predictable than we
had assumed.  Restricting to "a single script" does not avoid the
problem either: it is equally possible to have characters in a
single script (even the script in which one's first language is
written) that one does not understand and characters in other
scripts that one does.

To illustrate the problem within the Latin script, while I might
be able to make guesses from his use of a domain in the DE.
tree, I would need to claim significantly more understanding
than I have to be sure whether "Bjoern" is most properly
"Bjoern", "Björn" or "Bjørn" and, if the last, whether it is
expected to be coded as U+00F8 or as \u'006F'\u'0338' (or, for
that matter, something else).  Anyone who doesn't fully
understand that remark probably doesn't fully understand the
code points involved.  I expect that Bjoern (sic) does
understand them, but that many users of the Latin script might
not.
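
The code-point question can be made concrete with a small sketch
in Python, using only the standard unicodedata module.  The
constant and helper names are illustrative assumptions, not
anything taken from the PRECIS documents.  It shows that "o"
followed by U+0308 composes to U+00F6 under NFC, while "o"
followed by U+0338 does not compose to U+00F8, because U+00F8 has
no canonical decomposition.

```python
import unicodedata

# Illustrative constants (hypothetical names) for the spellings above.
O_DIAERESIS = "\u00F6"        # precomposed o with diaeresis, as in "Björn"
O_STROKE = "\u00F8"           # precomposed o with stroke, as in "Bjørn"
O_PLUS_DIAERESIS = "o\u0308"  # U+006F + COMBINING DIAERESIS
O_PLUS_SOLIDUS = "o\u0338"    # U+006F + COMBINING LONG SOLIDUS OVERLAY


def nfc(s):
    """Return s in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", s)


def nfd(s):
    """Return s in Unicode Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", s)


def code_points(s):
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


if __name__ == "__main__":
    for label, s in [
        ("o + U+0308", O_PLUS_DIAERESIS),
        ("o + U+0338", O_PLUS_SOLIDUS),
        ("U+00F6", O_DIAERESIS),
        ("U+00F8", O_STROKE),
    ]:
        print(f"{label:12} NFC: {code_points(nfc(s)):16} NFD: {code_points(nfd(s))}")
```

An implementation that compares identifiers only after
normalization would therefore treat the two codings of "Björn" as
the same string, but would treat the two codings of "Bjørn" as
different strings even though they may look alike when rendered.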

    john

_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
