On Thu, Oct 13, 2016 at 5:17 PM, Stephen J. Turnbull <turnbull.stephen...@u.tsukuba.ac.jp> wrote:
> Chris Angelico writes:
>
> > I'm not sure what you mean by "strcmp-able"; do you mean that the
> > lexical ordering of two Unicode strings is guaranteed to be the same
> > as the byte-wise ordering of their UTF-8 encodings?
>
> This is definitely not true for the Han characters. In Japanese, the
> most commonly used lexical ordering is based on the pronunciation,
> meaning that there are few characters (perhaps none) in common use
> that has a unique place in lexical ordering (most individual
> characters have multiple pronunciations, and even many whole personal
> names do).

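A minimal sketch of that distinction, assuming a tiny hand-made
READINGS table with one reading per character (purely illustrative;
real characters have several, which is the whole problem). Code-point
order and UTF-8 byte order agree with each other, but neither matches
a pronunciation-based order:

```python
# Hypothetical sample data: three common kanji with ONE assumed reading each.
# Picking a single reading is an illustration-only simplification.
READINGS = {
    "山": "やま",  # yama
    "川": "かわ",  # kawa
    "田": "た",    # ta
}

words = list(READINGS)

# Python compares str objects by code point; bytes compare byte by byte.
by_code_point = sorted(words)
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Sorting on the (assumed) kana reading approximates a dictionary order.
by_reading = sorted(words, key=READINGS.__getitem__)

print(by_code_point == by_utf8_bytes)  # True
print(by_code_point)                   # ['山', '川', '田']
print(by_reading)                      # ['川', '田', '山']
```

The first comparison holds in general, since UTF-8 preserves code-point
order. The last two lists differ because code points carry no
information about readings.
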
Yeah, and even just with Latin-1 characters, you have (a) non-ASCII
characters that sort between ASCII characters, and (b) characters that
have different meanings in different languages, and should be sorted
differently. So a linguistically correct ordering is impossible in a
generic string sort that doesn't know the language of the text.

ChrisA
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/