On Sun, Feb 5, 2017 at 3:52 AM, boB Stepp <robertvst...@gmail.com> wrote:
> Does the list sort() method (and other sort methods in Python) just go
> by the hex value assigned to each symbol to determine sort order in
> whichever Unicode encoding chart is being implemented?

list.sort uses a less-than comparison. What you really want to know is
how Python compares strings. They're compared by ordinal at
corresponding indexes, i.e. ord(s1[i]) vs ord(s2[i]) for i less than
min(len(s1), len(s2)). The first index at which the ordinals differ
decides the result. If there is no such index, i.e. one string is a
prefix of the other, then the shorter string compares as less.

This gets a bit interesting when you're comparing characters that have
composed and decomposed Unicode forms, i.e. a single code point vs a
base character followed by one or more combining code points. For
example:

    >>> s1 = '\xc7'
    >>> s2 = 'C' + '\u0327'
    >>> print(s1, s2)
    Ç Ç
    >>> s2 < s1
    True

where U+0327 is a combining cedilla. As characters, s1 and s2 are the
same (they're canonically equivalent). However, codewise s2 is less
than s1 because 0x43 ("C") is less than 0xc7 ("Ç"). In this case you
can first normalize the strings to either composed or decomposed form
[1]. For example:

    >>> import functools, unicodedata
    >>> strings = ['\xc7', 'C\u0327', 'D']
    >>> sorted(strings)
    ['Ç', 'D', 'Ç']
    >>> norm_nfc = functools.partial(unicodedata.normalize, 'NFC')
    >>> sorted(strings, key=norm_nfc)
    ['D', 'Ç', 'Ç']

In the first result, the leading 'Ç' is the decomposed string, which
sorts before 'D' because it starts with "C". With the NFC key, both
forms of 'Ç' compare equal and sort after 'D'.

[1]: https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
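
P.S. To make the comparison rule concrete, here is a rough pure-Python sketch of it. The names less_than and nfc_key are made up for illustration. The interpreter's actual string comparison is implemented in C and is not written this way, so treat this only as a model of the behavior described above.

```python
import unicodedata


def less_than(s1, s2):
    """Hypothetical helper that models s1 < s2 for str objects."""
    # Walk both strings in step; zip() stops at the shorter length,
    # which covers every index i < min(len(s1), len(s2)).
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            # The first differing code point decides the result.
            return ord(c1) < ord(c2)
    # No difference in the common prefix: the shorter string is less.
    return len(s1) < len(s2)


def nfc_key(s):
    """Hypothetical sort key: compare strings in composed (NFC) form."""
    return unicodedata.normalize('NFC', s)


composed = '\xc7'        # U+00C7, precomposed C with cedilla
decomposed = 'C\u0327'   # "C" followed by U+0327 combining cedilla

# The model agrees with the built-in operator on these inputs.
assert less_than(decomposed, composed) == (decomposed < composed)
assert less_than('ab', 'abc') == ('ab' < 'abc')

# After normalization the two forms are the same string.
assert nfc_key(composed) == nfc_key(decomposed)

strings = [composed, decomposed, 'D']
by_code_point = sorted(strings)
by_nfc = sorted(strings, key=nfc_key)
```

Since sorted() is stable, the two forms of 'Ç' keep their original relative order in by_nfc. The key only affects how items are compared; the strings in the result are left as they were, not normalized.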