Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

Peter Saint-Andre Tue, 08 Oct 2013 20:22:03 -0700

Hi Florian, thanks for the review! Comments inline.

On 09/10/2013 08:06 PM, Florian Zeitz wrote:

Am 28.08.2013 08:46, schrieb Yoshiro YONEYA:

Dear all,


This message starts two weeks Working Group Last Call (WGLC) on
draft-ietf-precis-framework-09.txt (PRECIS Framework: Preparation and
Comparison of Internationalized Strings in Application Protocols).

Please review the document and send comments to the list ([email protected]),
the co-chairs ([email protected]), or the authors
([email protected]) by the end of WGLC.

The WGLC will end on Wednesday, Sep 11th.

I've reviewed this draft, and generally think it takes a sensible approach.
However, I do have one major gripe about it, as well as some smaller
comments.

The major thing that bothers me about this draft is that string classes
IMHO conflate two separate concepts. On the one hand they specify valid
and disallowed codepoints. On the other hand they specify (or rather,
let the application protocol specify) mappings and normalization.
The problem I have with this is that it makes it unclear which strings
are valid in a certain class.

You are correct. Validity really applies at the level of a profile, not
a class.


E.g. consider an application protocol that specifies FreeformClass
mixed with NFKC. This means characters that have a compatibility
equivalent are valid in the sense that they are FREE_PVAL, but are
invalid in the normalization form. It is unclear to me whether a string
containing characters with a compatibility equivalent would be contained
in the FreeformClass, or more precisely, this specialization thereof.

Similar considerations are true for e.g. mixing case mapping with
IdentifierClass. Uppercase characters are PVALID/ID_PVAL, but shouldn't
be present after mapping.
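
To make the two cases above concrete, here is a minimal sketch using Python's
standard unicodedata module. The helper names are hypothetical and are not
taken from the draft; the sketch only shows that a string can consist of
individually allowed code points and still be changed by NFKC or by case
mapping.

```python
import unicodedata


def survives_nfkc(s: str) -> bool:
    """Return True if NFKC normalization leaves the string unchanged."""
    return unicodedata.normalize("NFKC", s) == s


def survives_lowercasing(s: str) -> bool:
    """Return True if simple lowercasing leaves the string unchanged."""
    return s.lower() == s


# U+FB01 LATIN SMALL LIGATURE FI has a compatibility equivalent ("fi").
ligature_word = "\ufb01le"

print(survives_nfkc(ligature_word))                    # False
print(unicodedata.normalize("NFKC", ligature_word))    # file
print(survives_lowercasing("Juliet"))                  # False
```

Whether such a string counts as "in" the class or only as mappable into it is
exactly the question raised here; the sketch does not answer it.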

I would prefer it if we specified classes solely in terms of valid and
disallowed codepoints and directionality requirements.

When you suggest that we specify a class in terms of codepoints, are you
suggesting that we go back to something like the stringprep model, in which
a class or profile defines a lookup table?

We would then have separate text saying that an application protocol
MUST also specify which mappings and normalization to apply, what entity
needs to apply them (e.g. only the server), and when they need to be
applied (e.g. when comparing strings, before storing them, before
display to a user). Both StringPrep-bis and 6122bis already have text to
this effect. It seems sensible to me to generally require application
protocols to specify the "who", and "when" beyond the "what". E.g. it is
often sensible to display identifiers with their case as entered, but
compare them after case folding. The current text might suggest that
mappings have to be applied to user input immediately.

I agree that all good application protocols that use PRECIS need to
specify the enforcement rules, as we already do for SASL and XMPP. I am
less sure that the PRECIS framework needs to legislate that.
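
The "display as entered, compare after case folding" example from the review
can be sketched as follows. This is an illustration under assumptions: the
Roster class and its method names are hypothetical, and it assumes that the
storing entity is the one that applies the mapping, which is precisely the
kind of "who" and "when" decision an application protocol would have to state.

```python
from typing import Dict, Optional


class Roster:
    """Keeps identifiers as entered, but compares them after case folding."""

    def __init__(self) -> None:
        # Maps the case-folded comparison key to the form the user entered.
        self._entries: Dict[str, str] = {}

    def add(self, identifier: str) -> None:
        key = identifier.casefold()          # mapping applied for comparison
        if key in self._entries:
            raise ValueError("identifier already present: " + identifier)
        self._entries[key] = identifier      # original case kept for display

    def display_form(self, identifier: str) -> Optional[str]:
        """Look up by comparison key and return the form that was entered."""
        return self._entries.get(identifier.casefold())


roster = Roster()
roster.add("Juliet")
print(roster.display_form("JULIET"))   # Juliet
```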


The following are smaller comments ordered by section:

Section 3.1:
This section talks about "safety" of strings, without ever defining what
that means in this context. The term "very safe" used to describe the
IdentifierClass also strangely reminds me of statements about "absolute
security". Maybe there is a way to generally word this better?


I'll think about better wording and suggest something on the list.


The sentence "Directionality:  defines application behavior in the
presence of code points that have directionality" seems a bit off to me.
It is very different from the explanation given later in Section 4.1.
From my understanding this is about the allowed combinations of
characters with directionality, and not about "application behavior" in
their presence. It could be about both, but I have not seen a draft talk
about anything but allowed combinations (i.e. the Bidi Rule) yet.


See other messages in this thread.


Section 3.3.3 and 3.4.3:
While "unassigned codepoints are unassigned" is a nice tautology, I'm
not sure what this means in terms of their treatment. In general I feel
like more explanation is needed about unassigned codepoints and their
(possible) handling.

Good point. I'll propose text.
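
For reference, RFC 5892 defines the Unassigned category as code points whose
General_Category is Cn and that are not noncharacters. A rough sketch of that
test in Python follows; the function names are hypothetical, and the result
depends on the Unicode version bundled with the Python build, which is part of
why the handling of such code points needs explanation.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(cp: int) -> bool:
    """General_Category Cn, excluding noncharacters (per RFC 5892)."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)


print(unicodedata.unidata_version)  # Unicode version this answer is based on
print(is_unassigned(0x0041))        # False: LATIN CAPITAL LETTER A is assigned
print(is_unassigned(0xFFFE))        # False: a noncharacter, not "unassigned"
```

A code point reported as unassigned by one implementation may be assigned in
a later Unicode version, so two endpoints can disagree about the same string.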


Section 3.3.6:
I think it would be sensible to suggest using the Unicode Default Case
Folding algorithm, if case mapping is to be applied.

That seems reasonable.
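
As an illustration of why case folding and lowercasing are not the same
operation, here is a small sketch using Python's str.casefold(), which the
Python documentation describes as implementing the case folding algorithm of
the Unicode Standard. It is offered only as an example of the behavior, not
as text proposed for the draft.

```python
word = "Stra\u00dfe"   # "Straße", containing U+00DF LATIN SMALL LETTER SHARP S

lowered = word.lower()      # lowercasing keeps the sharp s
folded = word.casefold()    # case folding maps the sharp s to "ss"

print(lowered)                           # straße
print(folded)                            # strasse
print(folded == "STRASSE".casefold())    # True
```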


Section 5:
I feel like this lacks a normative statement about contextual rules.
E.g. "A character with the derived property value CONTEXTJ or CONTEXTO
    (CONTEXTUAL RULE REQUIRED) MUST NOT be used unless an appropriate
    rule has been established and the context of the character is

As mentioned, we just point to IDNA2008 here, but I think you and Martin
are right that we need to provide some more details here. For example, RFC
5891 says:

###

The Unicode string MUST NOT contain any characters whose validity is
context-dependent, unless the validity is positively confirmed by a
contextual rule. To check this, each code point identified as CONTEXTJ
or CONTEXTO in the Tables document [RFC5892]
<http://tools.ietf.org/html/rfc5892> MUST have a non-null rule. If such
a code point is missing a rule, the label is invalid. If the rule exists
but the result of applying the rule is negative or inconclusive, the
proposed label is invalid.


###

IMHO your text is more to the point.
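
The check described in the RFC 5891 text can be sketched roughly as below.
Everything here is an assumption made for illustration: the tables are a tiny
hand-picked subset rather than the RFC 5892 tables, the function names are
hypothetical, and only the ZERO WIDTH JOINER rule (preceded by a character
with canonical combining class Virama) is implemented. No rule is registered
for U+00B7 MIDDLE DOT purely to exercise the "missing rule" branch; RFC 5892
does define one.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class value for Virama


def zwj_rule(s: str, i: int) -> bool:
    """ZERO WIDTH JOINER is allowed only directly after a Virama."""
    return i > 0 and unicodedata.combining(s[i - 1]) == VIRAMA_CCC


# Illustrative subset of code points that require a contextual rule.
CONTEXT_REQUIRED = {0x200D, 0x00B7}

# Rules known to this sketch; U+00B7 is deliberately left without one.
CONTEXT_RULES = {0x200D: zwj_rule}


def contextual_check(s: str) -> bool:
    """Invalid if a rule is missing or does not positively confirm the context."""
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp not in CONTEXT_REQUIRED:
            continue
        rule = CONTEXT_RULES.get(cp)
        if rule is None or not rule(s, i):
            return False
    return True


print(contextual_check("\u0915\u094d\u200d\u0937"))  # True: ZWJ after Virama
print(contextual_check("a\u200db"))                  # False: rule not satisfied
print(contextual_check("a\u00b7b"))                  # False: no rule registered
```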

Peter

_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
