Re: [Python-Dev] order of Misc/ACKS
Xavier Morel writes:

> On 2011-11-12, at 10:24, Georg Brandl wrote:
>> On 12.11.2011 08:03, Stephen J. Turnbull wrote:
>>> The sensible thing is to just sort in Unicode code point order, I think.
>>
>> The sensible thing is to accept that there is no solution, and to stop worrying.
>
> The file could use the default collation order, that way it'd be incorrectly sorted for everybody.

What I tell you three times is true.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] order of Misc/ACKS
On 11/11/2011 11:03 PM, Stephen J. Turnbull wrote:
> The sensible thing is to just sort in Unicode code point order, I think.

I was going to suggest the official Unicode Collation Algorithm:

http://unicode.org/reports/tr10/

But I peeked in the can, saw it was chock-a-block with worms, and declined to open it.

/larry/
Re: [Python-Dev] order of Misc/ACKS
On 12.11.2011 08:03, Stephen J. Turnbull wrote:
> Eli Bendersky writes:
>
>> special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone.
>
> Yes, it is. The examples were already given in this thread. The Han-using languages also have this problem, and Japanese is nondeterministic all by itself (there are kanji names which for historical reasons are pronounced in several different ways, and therefore cannot be placed in phonetic order without additional information).
>
> The sensible thing is to just sort in Unicode code point order, I think.

The sensible thing is to accept that there is no solution, and to stop worrying.

Georg
Re: [Python-Dev] order of Misc/ACKS
On Nov 12, 2011, at 04:03 PM, Stephen J. Turnbull wrote:
> The sensible thing is to just sort in Unicode code point order, I think.

M-x sort-lines-by-unicode-point-order RET

<wink>

-Barry
Re: [Python-Dev] order of Misc/ACKS
On 2011-11-12, at 10:24, Georg Brandl wrote:
> On 12.11.2011 08:03, Stephen J. Turnbull wrote:
>> Eli Bendersky writes:
>>
>>> special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone.
>>
>> Yes, it is. The examples were already given in this thread. The Han-using languages also have this problem, and Japanese is nondeterministic all by itself (there are kanji names which for historical reasons are pronounced in several different ways, and therefore cannot be placed in phonetic order without additional information).
>>
>> The sensible thing is to just sort in Unicode code point order, I think.
>
> The sensible thing is to accept that there is no solution, and to stop worrying.

The file could use the default collation order, that way it'd be incorrectly sorted for everybody.
Re: [Python-Dev] order of Misc/ACKS
Hi,

On 11/11/2011 10.39, Eli Bendersky wrote:
> The PS: at the top of Misc/ACKS says:
>
> PS: In the standard Python distribution, this file is encoded in UTF-8 and the list is in rough alphabetical order by last names.
>
> However, the last 3 names in the list don't appear to be part of that alphabetical order. Is this somehow intentional, or just a mistake?

Only the last two are out of place, and should be fixed. The 'Å' in Peter Åstrand sorts after 'Z'.
See http://mail.python.org/pipermail/python-dev/2010-August/102961.html for a discussion about the order of Misc/ACKS.

Best Regards,
Ezio Melotti

> Eli
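Why 'Å' lands after 'Z' can be seen from a plain code point sort. The following is only a minimal sketch: apart from "Åstrand", the names are made up for illustration and are not taken from Misc/ACKS, and the 'Å' is assumed to be the precomposed character U+00C5.

```python
# Illustrative last names; only "Åstrand" (written with U+00C5) comes
# from the thread, the others are hypothetical.
names = ["\u00c5strand", "Zeller", "Adams", "Olsen"]

# Python compares str objects code point by code point, so a plain
# sorted() call gives Unicode code point order.
by_code_point = sorted(names)

print(by_code_point)                        # ['Adams', 'Olsen', 'Zeller', 'Åstrand']
print(hex(ord("Z")), hex(ord("\u00c5")))    # 0x5a 0xc5
```

Since U+00C5 is greater than U+005A, every name starting with 'Å' follows every name starting with an ASCII capital letter.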
Re: [Python-Dev] order of Misc/ACKS
On 11.11.2011 10:56, Ezio Melotti wrote:
> Hi,
>
> On 11/11/2011 10.39, Eli Bendersky wrote:
>> The PS: at the top of Misc/ACKS says:
>>
>> PS: In the standard Python distribution, this file is encoded in UTF-8 and the list is in rough alphabetical order by last names.
>>
>> However, the last 3 names in the list don't appear to be part of that alphabetical order. Is this somehow intentional, or just a mistake?
>
> Only the last two are out of place, and should be fixed. The 'Å' in Peter Åstrand sorts after 'Z'.
> See http://mail.python.org/pipermail/python-dev/2010-August/102961.html for a discussion about the order of Misc/ACKS.

The key point here is that it is *rough* alphabetic order. IMO, sorting accented characters along with their unaccented versions would be fine as well, and be more practical.

In general, it's not possible to provide a correct alphabetic order. For example, in German, 'ö' sorts after 'o', whereas in Swedish, it sorts after 'z'. In fact, in German, we have two different ways of sorting the ö: one is to treat it as a letter after o, and the other is to treat it as equivalent to oe.

Regards,
Martin
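The conventions mentioned above can be compared side by side. This is a rough sketch, not a proposal for how Misc/ACKS is or should be sorted: the helper names `fold_accents` and `german_phonebook_key` and the sample names are hypothetical, and only 'ö' is handled because it is the only letter discussed here.

```python
import unicodedata


def fold_accents(name: str) -> str:
    """Hypothetical key: drop combining marks so 'ö' sorts with 'o'."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def german_phonebook_key(name: str) -> str:
    """Hypothetical key: treat 'ö' as equivalent to 'oe'."""
    composed = unicodedata.normalize("NFC", name).casefold()
    return composed.replace("\u00f6", "oe")


# Made-up names; "\u00d6hm" is "Öhm".
names = ["Zorn", "\u00d6hm", "Ofen", "Odin"]

print(sorted(names))                            # ['Odin', 'Ofen', 'Zorn', 'Öhm']
print(sorted(names, key=fold_accents))          # ['Odin', 'Ofen', 'Öhm', 'Zorn']
print(sorted(names, key=german_phonebook_key))  # ['Odin', 'Öhm', 'Ofen', 'Zorn']
```

The three calls give three different positions for the same name, which is the point being made: no single key satisfies every convention.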
Re: [Python-Dev] order of Misc/ACKS
> The key point here is that it is *rough* alphabetic order. IMO, sorting accented characters along with their unaccented versions would be fine as well, and be more practical.
>
> In general, it's not possible to provide a correct alphabetic order. For example, in German, 'ö' sorts after 'o', whereas in Swedish, it sorts after 'z'. In fact, in German, we have two different ways of sorting the ö: one is to treat it as a letter after o, and the other is to treat it as equivalent to oe.

This is really interesting. I guess lexical ordering of alphabet letters is a locale thing, but Misc/ACKS isn't supposed to be any special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone. We can then call it the Misc/ACKS incompleteness theorem ;-)

Eli
Re: [Python-Dev] order of Misc/ACKS
Eli Bendersky writes:

> special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone.

Yes, it is. The examples were already given in this thread. The Han-using languages also have this problem, and Japanese is nondeterministic all by itself (there are kanji names which for historical reasons are pronounced in several different ways, and therefore cannot be placed in phonetic order without additional information).

The sensible thing is to just sort in Unicode code point order, I think.
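If code point order were adopted, checking a list of names against it would be mechanical. The sketch below is one possible way to do that, under stated assumptions: `last_name` and `out_of_place` are hypothetical helpers, the last name is assumed to be the last whitespace-separated word of a non-empty entry, and the entries are invented apart from Peter Åstrand.

```python
def last_name(entry: str) -> str:
    # Assumption: the last whitespace-separated word is the last name.
    return entry.split()[-1]


def out_of_place(entries):
    """Return adjacent pairs whose last names violate code point order."""
    pairs = zip(entries, entries[1:])
    return [(a, b) for a, b in pairs if last_name(a) > last_name(b)]


# Made-up list; "\u00c5strand" is "Åstrand".
entries = ["Peter \u00c5strand", "Ann Adams", "Bo Zeller"]

print(out_of_place(entries))             # [('Peter Åstrand', 'Ann Adams')]
print(sorted(entries, key=last_name))    # ['Ann Adams', 'Bo Zeller', 'Peter Åstrand']
```

The check needs no locale and gives the same answer on every machine, which is the appeal of this rule even though the result matches no language's alphabetical order.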