Re: Processing Digit Variants

Philippe Verdy Thu, 21 Mar 2013 15:02:12 -0700

2013/3/20 Richard Wordingham <[email protected]>:
> On Wed, 20 Mar 2013 07:34:38 -1000
> Markus Scherer <[email protected]> wrote:
>
>> More processing in the bowels
>> of the collation code would be very complicated, and ambiguous:
>> "file-5.txt" is probably file number 5 rather than file minus five.
>
> And "section-2.3.txt" should probably be sorted before
> "section-2.12.txt"!


This also demonstrates that the generic ASCII full stop found in
identifiers such as filenames should not be interpreted as a decimal
point by identifier collators.
If one wants to sort filenames in which the dot is to be interpreted
unambiguously as a decimal point, that decimal point should be encoded
distinctly (just as the hyphen-minus would be replaced by the
mathematical minus sign).

Identifier collators have to follow different rules than generic
collators that parse normal text actually written in human languages:
they can only process unsigned integers and nothing else. This also
excludes grouping separators, so a file named "item 1,000.txt" will
sort **before** "item 2.txt", whereas "item 1000.txt" will sort after
"item 2.txt". There is no clear way to encode the commas, breaking or
non-breaking spaces, dots, or apostrophes that may be used as grouping
separators, nor any standardized sequence combining these punctuation
or whitespace characters with a semantic variation selector.
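
To make the ordering above concrete, here is a minimal sketch in Python of
such an identifier collator. It is only an illustration, not the behaviour
of any particular collation library: the name identifier_key is
hypothetical, only ASCII digits 0-9 are treated as numeric, and everything
else (dot, comma, hyphen-minus) is compared as plain text by code point.

```python
import re

# Assumption: only runs of ASCII digits count as unsigned integers.
_DIGIT_RUN = re.compile(r"([0-9]+)")

def identifier_key(name):
    """Build a sort key where digit runs compare as unsigned integers."""
    # Splitting on a capturing group alternates text, digits, text, ...
    # so even positions are always text and odd positions always digits.
    parts = _DIGIT_RUN.split(name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = [
    "item 1000.txt",
    "item 2.txt",
    "item 1,000.txt",
    "section-2.12.txt",
    "section-2.3.txt",
]

for n in sorted(names, key=identifier_key):
    print(n)
# item 1,000.txt
# item 2.txt
# item 1000.txt
# section-2.3.txt
# section-2.12.txt
```

With this key, "item 1,000.txt" is read as the integer 1 followed by a
comma and the integer 0, which is why it lands before "item 2.txt", and
the dot in "section-2.3.txt" merely separates the integers 2 and 3.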
