On Wed, Oct 12, 2016 at 9:03 PM, Peter Saint-Andre <stpe...@stpeter.im> wrote:
> It's not clear to me that U+1F11 has the problem you describe; perhaps could
> you sketch it out further?
Oops, that should be U+1F11A (PARENTHESIZED LATIN CAPITAL LETTER K). The full example is:

    "\U0001f11aevin" => "(K)evin" => "(k)evin"

I wrote a program to categorize the characters that are not idempotent under the Nickname profile's "ToLower" case mapping (ignoring white space), grouped by decomposition tag. The numbers are the same for Unicode 6.3, 8.0 and 9.0:

    { '<font>': 467, '<square>': 90, '<compat>': 35, '<super>': 27, '<circle>': 4 }

The following two characters also appear to fail the idempotence test, although their initial decompositions do not begin with '<':

    \u03d3 GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
    \u03d4 GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL

> Thanks for your input. Personally I will think about it further and post
> again after I do so.

To me, the problem is to take untrusted input, validate it using specified rules, and transform it into a stable, unambiguous format. I'm still learning about Unicode.

Is there a reason that the case mapping rule has to be applied *before* the normalization rule? The order appears to make a difference for NFKC. I suppose the Nickname "comparison" profile could re-apply the case mapping rule after the normalization rule?

Thanks,
-Bill

_______________________________________________
precis mailing list
precis@ietf.org
https://www.ietf.org/mailman/listinfo/precis
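A minimal Python sketch of the idempotence test described in the message, assuming that Python's `str.lower()` stands in for "ToLower" and that only case mapping and NFKC matter (the width and white-space rules are left out, as in the message). All function names are hypothetical. It also tries the variant raised at the end of the message, re-applying case mapping after NFKC. The counts it prints depend on the Unicode version bundled with the interpreter, so they may not match the figures above. The two Greek symbols fail because their canonical decompositions begin with U+03D2 GREEK UPSILON WITH HOOK SYMBOL, which itself has a `<compat>` decomposition to U+03A5 GREEK CAPITAL LETTER UPSILON; NFKC therefore yields an uppercase letter that a second lowercasing pass changes.

```python
import unicodedata
from collections import Counter
from typing import Callable, List, Tuple


def lower_then_nfkc(s: str) -> str:
    """Case mapping ("ToLower", approximated here by str.lower) followed by
    NFKC. Width and white-space rules are deliberately omitted."""
    return unicodedata.normalize("NFKC", s.lower())


def lower_nfkc_lower(s: str) -> str:
    """Hypothetical variant: re-apply case mapping after NFKC."""
    return lower_then_nfkc(s).lower()


def is_idempotent(f: Callable[[str], str], s: str) -> bool:
    """True if applying f a second time leaves the output unchanged."""
    once = f(s)
    return f(once) == once


def decomposition_tag(c: str) -> str:
    """Leading '<tag>' of the character's decomposition, or '' when the
    decomposition is canonical (untagged) or absent."""
    decomp = unicodedata.decomposition(c)
    return decomp.split()[0] if decomp.startswith("<") else ""


def categorize_failures(f: Callable[[str], str]) -> Tuple[Counter, List[str]]:
    """Group assigned code points that are not idempotent under f by
    decomposition tag; failures without a tag are listed separately."""
    tags: Counter = Counter()
    untagged: List[str] = []
    for cp in range(0x110000):
        c = chr(cp)
        if unicodedata.category(c) in ("Cn", "Cs"):
            continue  # skip unassigned code points and surrogates
        if is_idempotent(f, c):
            continue
        tag = decomposition_tag(c)
        if tag:
            tags[tag] += 1
        else:
            untagged.append(c)
    return tags, untagged


if __name__ == "__main__":
    name = "\U0001f11aevin"
    once = lower_then_nfkc(name)
    print(repr(once), repr(lower_then_nfkc(once)))  # '(K)evin' '(k)evin'
    tags, untagged = categorize_failures(lower_then_nfkc)
    print(dict(tags.most_common()))
    print([f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in untagged])
```

Re-applying case mapping makes both of the examples above stable, but the sketch does not show that this holds for every code point; calling `categorize_failures(lower_nfkc_lower)` is one way to check that under a given Unicode version.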