Re: [precis] Enforcement as an Idempotent operation

Peter Saint-Andre Sun, 12 Feb 2017 11:28:25 -0800

Hi Bill, thanks for your message and sorry about the seriously delayed reply - I've been working to finish some other Internet-Drafts and now have time again to finish the PRECIS updates.


On 10/13/16 1:33 PM, William Fisher wrote:

On Wed, Oct 12, 2016 at 9:03 PM, Peter Saint-Andre <stpe...@stpeter.im> wrote:

It's not clear to me that U+1F11 has the problem you describe; perhaps you 
could sketch it out further?


Oops, that should be U+0001F11A.

Did you mean U+212A (KELVIN SIGN)? That decomposes to U+004B (LATIN CAPITAL LETTER K).

The full example is:
"\U0001f11aevin" => "(K)evin" => "(k)evin"


Yes, "U+212Aevin" => "Kevin" via NFKC.

However, "U+212Aevin" => "kevin" via toLower() if I am not mistaken.
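
Both examples can be checked with Python's standard unicodedata module. This is only a minimal sketch: it assumes that str.lower() is an acceptable stand-in for the toLower() operation discussed here, and the helper name nfkc is made up for illustration.

```python
import unicodedata

def nfkc(s: str) -> str:
    """Apply Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", s)

kelvin = "\u212aevin"         # KELVIN SIGN + "evin"
paren_k = "\U0001f11aevin"    # PARENTHESIZED LATIN CAPITAL LETTER K + "evin"

# KELVIN SIGN has a direct lowercase mapping as well as a decomposition.
print(nfkc(kelvin))           # Kevin
print(kelvin.lower())         # kevin

# U+1F11A has no lowercase mapping, so case mapping only takes effect
# once normalization has exposed the plain letter K.
print(paren_k.lower() == paren_k)
print(nfkc(paren_k))          # (K)evin
print(nfkc(paren_k).lower())  # (k)evin
```

The last two lines correspond to the "(K)evin" => "(k)evin" step in the full example: a second pass changes the string again, so a single pass is not idempotent for this input.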

I wrote a program to categorize characters that are not idempotent
under Nickname "ToLower" (ignoring white space). The numbers are the
same for Unicode 6.3, 8.0 and 9.0.

{
  '<font>': 467,
  '<square>': 90,
  '<compat>': 35,
  '<super>': 27,
  '<circle>': 4
}
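
A program of this kind might look like the following sketch. It is not the program described above. It assumes that one pass means str.lower() followed by NFKC, that code points whose result contains white space are skipped, and that characters are grouped by the tag in their UnicodeData decomposition field. The names one_pass, is_idempotent, decomposition_tag, and categorize are illustrative, and the counts depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata
from collections import Counter

def one_pass(s: str) -> str:
    """Assumed single pass: lowercase first, then NFKC."""
    return unicodedata.normalize("NFKC", s.lower())

def is_idempotent(s: str) -> bool:
    """True if a second pass leaves the result of the first pass unchanged."""
    once = one_pass(s)
    return one_pass(once) == once

def decomposition_tag(ch: str) -> str:
    """Return '<font>', '<compat>', etc., or '(none)' when the field has no tag."""
    field = unicodedata.decomposition(ch)
    return field.split()[0] if field.startswith("<") else "(none)"

def categorize() -> Counter:
    """Count non-idempotent code points by decomposition tag."""
    counts = Counter()
    for cp in range(0x110000):
        ch = chr(cp)
        # Skip unassigned, surrogate, and private-use code points.
        if unicodedata.category(ch) in ("Cn", "Cs", "Co"):
            continue
        # Skip results containing white space (an assumption of this sketch).
        if any(c.isspace() for c in one_pass(ch)):
            continue
        if not is_idempotent(ch):
            counts[decomposition_tag(ch)] += 1
    return counts

if __name__ == "__main__":
    print(dict(categorize()))
```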

Would you mind sending me your list of characters? (I'm happy to receive it off-list.) I suspect that it might be similar to a list that emerged from differing assumptions regarding how to apply the PRECIS rules in implementations. My original implementation for testing purposes was rather naïve, whereas the implementation that Yoshiro Yoneya and Takahiro Nemoto created was smarter, in the sense that it would follow the chain of characters and decompose each one fully as it went along (this might require a few rounds of applying the normalization rule in order to fully decompose the original characters).

The following two characters also appear to fail the idempotent test.
The initial decompositions do not begin with '<'.

\u03d3 GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
\u03d4 GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL

These examples are different from the KELVIN SIGN example, because there is no direct toLower() transformation - normalization needs to happen before toLower() is applied.
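
The chain of decompositions for these two characters can be followed with a small recursive helper. This is a sketch under stated assumptions: decompose_fully is a hypothetical name, it ignores Hangul syllables and canonical reordering, and str.lower() stands in for toLower().

```python
import unicodedata

def decompose_fully(ch: str) -> str:
    """Expand a character's decomposition mapping until nothing is left to expand."""
    field = unicodedata.decomposition(ch)
    if not field:
        return ch
    parts = field.split()
    if parts[0].startswith("<"):    # drop a compatibility tag such as <compat>
        parts = parts[1:]
    return "".join(decompose_fully(chr(int(p, 16))) for p in parts)

for ch in ("\u03d3", "\u03d4"):
    name = unicodedata.name(ch)
    direct = unicodedata.decomposition(ch)    # one step only, no '<' tag
    full = " ".join(f"U+{ord(c):04X}" for c in decompose_fully(ch))
    unchanged = ch.lower() == ch              # no direct lowercase mapping
    print(f"{name}: {direct} -> {full}; lower() is a no-op: {unchanged}")
```

The first decomposition step yields U+03D2 plus a combining mark, and only the second step reaches the ordinary capital upsilon U+03A5, which does have a lowercase mapping.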

Thanks for your input. Personally I will think about it further and post again 
after I do so.


To me, the problem is to take untrusted input, validate it using
specified rules, and transform it into a stable, unambiguous format.
I'm still learning more about Unicode.


Trust me: it never ends.

Is there a reason that the case
mapping rule has to be applied *before* the normalization rule?

As explained in Section 5.2.1 of RFC 7564, there is a good reason to apply the width mapping rule before the normalization rule. I'm now less sure that it makes sense, for comparison purposes, to apply the case mapping rule before the normalization rule.

The
order appears to make a difference for NFKC.  I suppose the Nickname
"comparison" profile could re-apply the case mapping rule after the
normalization rule?
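
The effect of the ordering can be illustrated with three small functions. This is a sketch only: the function names are hypothetical, str.lower() is assumed as the case mapping, and the two sample inputs do not show that any of these orderings is idempotent for every string.

```python
import unicodedata

def case_then_nfkc(s: str) -> str:
    """Case mapping first, then normalization."""
    return unicodedata.normalize("NFKC", s.lower())

def nfkc_then_case(s: str) -> str:
    """Normalization first, then case mapping."""
    return unicodedata.normalize("NFKC", s).lower()

def case_nfkc_case(s: str) -> str:
    """Case mapping, normalization, then case mapping re-applied."""
    return case_then_nfkc(s).lower()

for s in ("\U0001f11aevin", "\u03d3"):
    # ascii() makes the non-ASCII results visible as escape sequences.
    print(ascii(case_then_nfkc(s)), ascii(nfkc_then_case(s)), ascii(case_nfkc_case(s)))
```

For both sample inputs, the first ordering leaves an uppercase letter in the result, while the other two agree on a fully lowercased string.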

If I understand correctly, you are suggesting that an implementation that is processing nickname strings for purposes of comparison would do the following:


1. Apply the "enforcement" action in Section 2.3
2. Apply the "comparison" action in Section 2.4

Let's choose a practical but somewhat contrived example: a nickname of ΨϓΧΗ, which is U+03A8 U+03D3 U+03A7 U+0397 (something like an uppercase version of the Greek word for soul, although the accent is wrong). This includes the code point U+03D3 that you mention above (which, by the way, is not the standard code point for the Greek letter upsilon but an alternative with a hook symbol, the usual character being U+03A5).


The two-step process you suggest would involve the following:

1. The "enforcement" action results in normalization (note that full normalization involves several steps):


U+03A8 U+03D3 U+03A7 U+0397
=>
U+03A8 U+03D2 U+0301 U+03A7 U+0397
=>
U+03A8 U+03A5 U+0301 U+03A7 U+0397

(note that U+03D2 has a compatibility equivalent of U+03A5)

2. The "comparison" action results in case mapping:

U+03A8 U+03A5 U+0301 U+03A7 U+0397
=>
U+03C8 U+03C5 U+0301 U+03C7 U+03B7

Thus, for comparison purposes, ΨϓΧΗ and ψύχη would be considered equivalent.
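
The two steps can be reproduced with Python's unicodedata module. The sketch below reduces the enforcement action to NFKC and the comparison action to str.lower(), which is a simplification for illustration and not the full Nickname profile; the function names are hypothetical. Note that NFKC recomposes after decomposing, so Python returns the precomposed U+038E and U+03CD where the trace above lists U+03A5 U+0301 and U+03C5 U+0301; the two spellings are canonically equivalent.

```python
import unicodedata

def enforce(s: str) -> str:
    """Simplified enforcement step: NFKC only."""
    return unicodedata.normalize("NFKC", s)

def comparison_key(s: str) -> str:
    """Simplified comparison step: enforcement followed by lowercasing."""
    return enforce(s).lower()

def code_points(s: str) -> str:
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

nickname = "\u03a8\u03d3\u03a7\u0397"     # the uppercase nickname with U+03D3
lowercase = "\u03c8\u03cd\u03c7\u03b7"    # lowercase form, precomposed U+03CD

print(code_points(enforce(nickname)))
print(code_points(comparison_key(nickname)))
# Decomposing the key again gives the sequence shown in step 2 of the trace.
print(code_points(unicodedata.normalize("NFD", comparison_key(nickname))))
print(comparison_key(nickname) == comparison_key(lowercase))
```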

Unfortunately, even though that seems to yield the correct outcome, it's not what RFC 7700 specifies.

I'll continue to think about this - in particular, about any negative implications from modifying the order of operations so that normalization comes before case mapping (unlike what we specified in RFC 7700). Because we would prefer that all the PRECIS specs follow the same order, we'd also need to look at the implications for RFC 7613 (although the OpaqueString profile that we use there for passwords has quite a different purpose than the Nickname profile in RFC 7700); however, this might not be possible.


Peter

_______________________________________________
precis mailing list
precis@ietf.org
https://www.ietf.org/mailman/listinfo/precis
