2013/3/20 Richard Wordingham <[email protected]>:
> On Wed, 20 Mar 2013 07:34:38 -1000
> Markus Scherer <[email protected]> wrote:
>
>> More processing in the bowels
>> of the collation code would be very complicated, and ambiguous:
>> "file-5.txt" is probably file number 5 rather than file minus five.
>
> And "section-2.3.txt" should probably be sorted before
> "section-2.12.txt"!

This also demonstrates that the generic ASCII full stop found in identifiers such as filenames should not be interpreted as a decimal point by identifier collators. If one wants to sort filenames in which the dot is to be interpreted unambiguously as a decimal point, that decimal point should be encoded distinctly (just as the hyphen-minus would be replaced by the mathematical minus sign).

Identifier collators have to follow different rules than generic collators that parse normal text actually written in human languages: they can only process unsigned integers and nothing else. That also excludes grouping separators, so a file named "item 1,000.txt" will sort **before** "item 2.txt", whereas "item 1000.txt" will sort after "item 2.txt". There is no clear way to encode the commas, breaking or non-breaking spaces, dots, or apostrophes that may be used as grouping separators, nor is there any standardized sequence that combines these punctuation marks or spaces with a semantic variation selector.
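
To make the rule concrete, here is a minimal sketch in Python of a sort key that treats only runs of ASCII digits as unsigned integers and leaves every other character (hyphen, dot, comma) as plain text. The name `identifier_key` is hypothetical, and this is not how ICU or any other collator implements numeric ordering; the non-digit parts are compared by code point, not by a real collation.

```python
import re

# Runs of ASCII digits only (an assumption of this sketch); no sign,
# no decimal point, no grouping separator is recognized.
_DIGIT_RUN = re.compile(r"([0-9]+)")

def identifier_key(name):
    """Split a name into alternating text and unsigned-integer parts."""
    parts = _DIGIT_RUN.split(name)
    # With one capturing group, re.split puts text at even indexes and
    # digit runs at odd indexes, so equal positions always compare
    # str with str or int with int.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

items = sorted(["item 1000.txt", "item 2.txt", "item 1,000.txt"],
               key=identifier_key)
sections = sorted(["section-2.12.txt", "section-2.3.txt"],
                  key=identifier_key)
files = sorted(["file-10.txt", "file-5.txt"], key=identifier_key)
```

Under these assumptions, "item 1,000.txt" is split at the comma into the integers 1 and 0, so it lands before "item 2.txt", while "section-2.3.txt" precedes "section-2.12.txt" because 3 and 12 are compared as separate integers and not as decimal fractions.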

