"Martin v. Löwis" writes: > > TR 15, section 19, numbered paragraph 3 > > """ > > Higher-level processes that transform or compare strings, or that > > perform other higher-level functions, must respect canonical > > equivalence or problems will result. > > """ > > That's not a mandatory requirement, but an "important aspect". Also, > it applies to "higher-level processes"; I would expect that string > comparison is not a higher-level function. Indeed, UAX#15 only > gives definitions, no rules.

In the language of these standards, I would expect that string
comparison is exactly the kind of higher-level process they have in
mind. In fact, it is given as an example in what Jim quoted above.

 > > C9 A process shall not assume that the interpretations of two
 > > canonical-equivalent character sequences are distinct.
 >
 > Right. What is "a process"?

Anything that accepts Unicode on input or produces it on output, and
claims to conform to the standard.

 > > ...
 > > Ideally, an implementation would always interpret two
 > > canonical-equivalent character sequences identically. There are
 > > practical circumstances under which implementations may reasonably
 > > distinguish them.
 > > """
 >
 > So it should be the application's choice.

I don't think so. I think the kind of practical circumstance they
have in mind is (eg) a Unicode document which is PGP-signed. PGP
clearly will not be able to verify a canonicalized document, unless it
happened to be in canonical form when transmitted. But I think it is
quite clear that they do not admit that an implementation might return
False when evaluating u"L\u00F6wis" == u"Lo\u0308wis".

 > So this *allows* to canonicalize strings, it doesn't *require* Python
 > to do so. Indeed, doing so would be fairly expensive, and therefore
 > it should not be done (IMO).

It would be much more expensive to make all string comparisons grok
canonical equivalence. That's why it *allows* canonicalization.
Otherwise the PGP signature case would suggest that canonicalization
should be forbidden (except where that is part of the definition of
the process), and canonical equivalencing be done at the site of each
comparison.

You are correct that this is outside the scope of PEP 3131, but I
don't want your interpretation of "Unicode conformance" (which I
believe to be incorrect) to go unchallenged.
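
For concreteness, here is a minimal sketch of the two spellings of
"Löwis" discussed above. It assumes the standard library's
unicodedata.normalize (with NFC as the chosen form) is an acceptable
stand-in for whatever canonicalization an implementation might apply;
canonically_equal is a hypothetical helper for illustration, not an
existing API or a proposal.

```python
import unicodedata

precomposed = u"L\u00F6wis"   # o-with-diaeresis as one code point, U+00F6
decomposed = u"Lo\u0308wis"   # 'o' followed by U+0308 COMBINING DIAERESIS

# A plain comparison works code point by code point, so it
# distinguishes the two canonically equivalent spellings.
raw_equal = precomposed == decomposed


def canonically_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Option 1: do the equivalence work at the site of each comparison.
normalized_equal = canonically_equal(precomposed, decomposed)

# Option 2: canonicalize once on input; later comparisons are plain ==.
stored_a = unicodedata.normalize("NFC", precomposed)
stored_b = unicodedata.normalize("NFC", decomposed)
stored_equal = stored_a == stored_b
```

Option 2 pays the normalization cost once per string rather than once
per comparison, which is the trade-off at issue in the last exchange.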