Re: [Pharo-dev] CaseInsensitiveOrder map with upper/lower case characters for all latin1 entries

Henrik Johansen Wed, 06 Jan 2016 02:10:48 -0800

> On 06 Jan 2016, at 10:09 , Sven Van Caekenberghe <[email protected]> wrote:
> 
> Hallo Nicolai,
> 
>> On 06 Jan 2016, at 09:58, Nicolai Hess <[email protected]> wrote:
>> 
>> 
>> see issues 17302/17242/17227
>> String>>findString:startingAt:caseSensitive: appears to fail for
>> extended character sets.
>> String>>compare:caseSensitive: seems to fail for extended-charset
>> comparisons.
>> String>>beginsWithEmpty:caseSensitive: has test failures in some cases.
>> 
>> The problem is that the CaseInsensitiveOrder map is built from the standard
>> (ASCII) character set, so it only folds case for ASCII characters, yet the
>> findString/compare/beginsWith methods use it for all byte characters (0-255).
>> 
>> Any objections to filling this map as suggested in case 17242?
>> 
>> CaseInsensitiveOrder := AsciiOrder copy.
>> (0 to: 255) do: [ :v |
>>     | char upper |
>>     char := v asCharacter.
>>     upper := char asUppercase.
>>     "Keep the character itself if its uppercase form lies outside 0-255"
>>     upper isOctetCharacter
>>         ifFalse: [ upper := char ].
>>     CaseInsensitiveOrder
>>         at: char asciiValue + 1
>>         put: (CaseInsensitiveOrder at: upper asciiValue + 1) ].
>> 
>> (The check for #isOctetCharacter is needed because for some entries the
>> corresponding uppercase character lies outside this character set.)
>> 
>> This would solve all three issues.
>> 
>> 
>> nicolai
> 
> That looks like a beautiful fix that makes perfect sense.
> If all tests are green, I see no reason not to do it.
> 
> Thanks and well done (again),
> 
> Sven
> 
> 
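To make the failure concrete, here is a small Python model of the tables (a sketch, not Pharo code). It assumes AsciiOrder is the identity table over 0-255 and that the old CaseInsensitiveOrder only folds a-z onto A-Z, and it uses Python's Unicode case mapping in place of Pharo's Character>>asUppercase. The function names are hypothetical.

```python
# Hypothetical Python model of the Pharo order tables (not Pharo's API).
# Assumption: AsciiOrder is the identity table over byte values 0..255.
ASCII_ORDER = list(range(256))


def ascii_only_table() -> list[int]:
    """Old behaviour (assumed): only a-z are mapped onto the entries of A-Z."""
    table = ASCII_ORDER.copy()
    for v in range(ord("a"), ord("z") + 1):
        table[v] = table[ord(chr(v).upper())]
    return table


def latin1_table() -> list[int]:
    """Proposed fix: map every byte character onto its uppercase form,
    keeping the character itself when the uppercase is not a single byte char."""
    table = ASCII_ORDER.copy()
    for v in range(256):
        upper = chr(v).upper()
        if len(upper) != 1 or ord(upper) > 255:  # e.g. 'ÿ' -> 'Ÿ' (U+0178)
            upper = chr(v)
        table[v] = table[ord(upper)]
    return table


def compare(a: str, b: str, table: list[int]) -> int:
    """Compare two byte strings through an order table: -1, 0 or 1."""
    for x, y in zip(a, b):
        ox, oy = table[ord(x)], table[ord(y)]
        if ox != oy:
            return -1 if ox < oy else 1
    return (len(a) > len(b)) - (len(a) < len(b))


old, new = ascii_only_table(), latin1_table()
print(compare("abc", "ABC", old))  # 0: ASCII letters already fold
print(compare("été", "ÉTÉ", old))  # 1: 'é' and 'É' stay distinct
print(compare("été", "ÉTÉ", new))  # 0: folded by the Latin-1 table
```

The `len(upper) != 1` check plays the role of isOctetCharacter here; it also covers 'ß', whose Python uppercase is the two-character string 'SS'.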
If you use asLowercase as the "canonical" ordering index instead, can you drop
the isOctetCharacter test, or are there uppercase characters in Latin-1 whose
lowercase form lies outside Latin-1?
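One way to check this empirically, sketched in Python: list the Latin-1 characters whose case mapping leaves 0-255. This relies on Python's Unicode tables; that Pharo's Character tables agree for these code points is an assumption, and `latin1_table_lower` is a hypothetical name.

```python
# Which Latin-1 characters have a case mapping that leaves the 0..255 range?
# Uses Python's Unicode case tables (assumed to match Pharo's for Latin-1).


def escapes_latin1(mapping) -> list[str]:
    """Characters in 0..255 whose mapped form is not a single byte char."""
    out = []
    for v in range(256):
        m = mapping(chr(v))
        if len(m) != 1 or ord(m) > 255:
            out.append(chr(v))
    return out


upper_escapes = escapes_latin1(str.upper)
lower_escapes = escapes_latin1(str.lower)
print([hex(ord(c)) for c in upper_escapes])  # ['0xb5', '0xdf', '0xff'] (µ, ß, ÿ)
print(lower_escapes)                         # []


def latin1_table_lower() -> list[int]:
    """Hypothetical lowercase-canonical table over an identity AsciiOrder.
    No range guard: lower_escapes above is empty."""
    table = list(range(256))
    for v in range(256):
        table[v] = table[ord(chr(v).lower())]
    return table
```

If the tables agree, lowercasing never leaves Latin-1, so the guard becomes unnecessary, and µ, ß and ÿ map to themselves in both variants. One side effect: with an identity AsciiOrder, letters then take the codes of a-z (97-122) instead of A-Z (65-90), so they sort after [ \ ] ^ _ ` rather than before them.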


I was about to suggest copying the CaseSensitiveOrder mapping instead of
AsciiOrder, since it has an ordering more refined than just A-Z, but that would
quickly lead to wanting to extend it to a generic Latin-1 sort order rather than
just ASCII (é between e and f, for example). That is a can of worms which is
hard to solve without making the ordering locale-specific...
I mean, one could use the default Unicode ordering, but would inevitably 
receive complaints from, say, Norwegians, that å sorts between a and b instead 
of after z.

After all, it only affects the cases where compare is used for ordering anyway.
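The ordering problem can be illustrated with three sort keys in Python. The accent-stripping key is only a crude stand-in for the Unicode Collation Algorithm, and `nb_key` is a hypothetical, simplified Norwegian-style tailoring that ranks æ, ø and å after z; neither is how Pharo or any collation library actually does it.

```python
import unicodedata

words = ["zebra", "éclair", "egg", "fig", "år", "apple", "bok"]


def strip_accents(s: str) -> str:
    """Decompose (NFD) and drop combining marks: 'é' -> 'e', 'å' -> 'a'."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))


NB_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def nb_key(word: str) -> list[int]:
    """Hypothetical Norwegian-style key: æ, ø, å are separate letters after z;
    other accented letters fall back to their base letter."""
    ranks = []
    for c in word.lower():
        base = c if c in "æøå" else strip_accents(c)
        for b in base:
            i = NB_ALPHABET.find(b)
            # Unknown characters sort after the alphabet, by code point.
            ranks.append(i if i >= 0 else len(NB_ALPHABET) + ord(b))
    return ranks


# Code-point order: accented letters land after z
print(sorted(words))  # ['apple', 'bok', 'egg', 'fig', 'zebra', 'år', 'éclair']
# Accent-insensitive order: é among the e's, å between a and b
print(sorted(words, key=lambda w: (strip_accents(w), w)))
# ['apple', 'år', 'bok', 'éclair', 'egg', 'fig', 'zebra']
# Norwegian-style order: å after z
print(sorted(words, key=nb_key))
# ['apple', 'bok', 'éclair', 'egg', 'fig', 'zebra', 'år']
```

The second and third orderings disagree only on where å goes, which is exactly the kind of locale-dependent choice a single built-in table cannot make for everyone.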

Cheers,
Henry
