2013/3/20 Markus Scherer <[email protected]>:
> Numeric collation is actually much more limited than number parsing, to
> strictly strings of digits, not including sign (thus only non-negative),
> decimal, exponent, etc. More processing in the bowels of the collation code
> would be very complicated, and ambiguous: "file-5.txt" is probably file
> number 5 rather than file minus five.

File names are identifiers; they are not real human language. They don't obey the grammatical rules of any language, even if they are named according to some convention in a given language, and they are frequently abbreviated and use a reduced set of characters. So parsing numbers in collation for sorting filenames is in fact parsing numbers in technical identifiers.

It would be different when performing collation on true text such as a book, or even on an OCR'd facsimile of accounting reports when preparing them to rebuild a spreadsheet. Imagine you import a list of filenames into a spreadsheet: the column type would be set to "text", not "number". In such cases, sorting as "text" should use the sort options appropriate for sorting identifiers.

Numbers imported into a "number" column should be converted whatever their format, accepting signs and exponent notations and correctly filtering out format controls to compute the effective value. So for converting formatted numbers to effective numeric values, lenient parsing should be used (the numbers will then not sort using collation, but by their effective numeric value after this conversion).

If the lenient parsing of numbers fails, the column in the spreadsheet will be treated as "text" and will sort with collation, but with a reduced supported format for numbers (so effectively the ambiguous ASCII hyphen-minus will be treated as hyphen punctuation, not as a minus sign).

If filenames have to be sorted according to the numeric value they represent, the ambiguous ASCII hyphen-minus should not be used; the mathematical MINUS character (U+2212) should be used in their names instead (and it should remain interpreted as a sign in the more restrictive collation parsing of numbers in identifiers).
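
To make the distinction concrete, here is a minimal Python sketch of the two behaviours described above. It is only an illustration of the proposal, not how ICU or any spreadsheet actually implements it: the names lenient_number, identifier_key and sort_column are hypothetical, text runs are compared by plain code point order rather than by a real collator, and treating U+2212 as a sign inside identifiers is the suggestion made in this message, not existing numeric collation behaviour.

```python
import re
import unicodedata

MINUS = "\u2212"  # U+2212 MINUS SIGN, distinct from ASCII hyphen-minus

# A digit run, optionally preceded by U+2212 (never by ASCII '-').
_DIGIT_RUN = re.compile("(" + MINUS + r"?\d+)")

# Plain decimal syntax: optional sign, digits, optional fraction and exponent.
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?", re.ASCII)


def lenient_number(text):
    """Return the numeric value of a formatted number, or None if it is not one."""
    # Drop format controls (general category Cf), e.g. directional marks.
    stripped = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    stripped = stripped.strip().replace(MINUS, "-")
    if _NUMBER.fullmatch(stripped):
        return float(stripped)
    return None


def identifier_key(name):
    """Sort key for identifiers: digit runs compare by value, the rest as text."""
    key = []
    # split() with a capturing group alternates text (even) and digit runs (odd).
    for i, part in enumerate(_DIGIT_RUN.split(name)):
        if i % 2 == 0:
            key.append(part)              # ASCII '-' stays here as punctuation
        elif part.startswith(MINUS):
            key.append(-int(part[1:]))    # U+2212 is read as a sign
        else:
            key.append(int(part))
    return key


def sort_column(values):
    """Sort as numbers if every cell parses leniently, else as identifiers."""
    if all(lenient_number(v) is not None for v in values):
        return sorted(values, key=lenient_number)
    return sorted(values, key=identifier_key)
```

With this sketch, sort_column(["file-10.txt", "file-9.txt", "file-5.txt"]) falls back to identifier sorting and orders the names by 5, 9, 10, since the hyphen-minus is kept as punctuation, while a column such as ["1e3", "-2", "5"] parses fully and is ordered by numeric value.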

