> On 06 Jan 2016, at 10:09 , Sven Van Caekenberghe <[email protected]> wrote:
>
> Hallo Nicolai,
>
>> On 06 Jan 2016, at 09:58, Nicolai Hess <[email protected]> wrote:
>>
>> see issues 17302/17242/17227
>> String>>findString:startingAt:caseSensitive: appears to be failing for
>> extended charsets
>> String>>compare:caseSensitive: seems to be failing for extended charset
>> comparisons
>> String>>beginsWithEmpty:caseSensitive: has test failure for some cases
>>
>> the problem is, the standard character set used for building the
>> CaseInsensitiveOrder map
>> only maps characters from the set of ASCII characters but it is used in the
>> findString/compare/beginsWith-methods for all byte characters.
>>
>> Any objections if we fill this map like it is suggested in case 17242?
>>
>> CaseInsensitiveOrder := AsciiOrder copy.
>> (0 to: 255) do:[ :v |
>>     | char upper |
>>     char := v asCharacter.
>>     upper := char asUppercase.
>>     upper isOctetCharacter
>>         ifFalse: [ upper := char ].
>>     CaseInsensitiveOrder at: char asciiValue + 1 put:
>>         (CaseInsensitiveOrder at: upper asciiValue + 1) ].
>>
>> (the check for #isOctetCharacter is needed because for some entries the
>> corresponding
>> uppercase character is not within this character set).
>>
>> This would solve all three issues.
>>
>>
>> nicolai
>
> That looks like a beautiful fix that makes perfect sense.
> If all tests are green, I see no reason not to do it.
>
> Thanks and well done (again),
>
> Sven
>

If you use asLowercase as the "canonical" ordering index instead, can you drop the isOctetCharacter test, or are there uppercase characters in Latin1 with no corresponding lowercase?
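
The asLowercase question can be checked directly in an image. The following is a minimal Playground sketch, not part of the proposed fix; the variable names latin1, escapesUpper and escapesLower are made up for illustration. Under Unicode's simple case mappings, the Latin1 uppercase letters (A-Z, À-Ö, Ø-Þ) all have Latin1 lowercase forms, while µ and ÿ have uppercase forms outside Latin1, so one would expect escapesLower to be empty. Whether that actually holds depends on the case tables in the image, so inspect the result rather than assume it.

```smalltalk
"Sketch only: list the Latin1 characters whose case conversion leaves Latin1.
 Uses the selectors from the proposed fix, plus asLowercase."
| latin1 escapesUpper escapesLower |
latin1 := (0 to: 255) collect: [ :v | v asCharacter ].

"Characters for which the proposed fix needs its isOctetCharacter guard."
escapesUpper := latin1 reject: [ :c | c asUppercase isOctetCharacter ].

"Characters that would need the same guard if asLowercase were the canonical form."
escapesLower := latin1 reject: [ :c | c asLowercase isOctetCharacter ].

"Inspect both collections; an empty escapesLower means the guard could be dropped."
{ escapesUpper. escapesLower }
```

If escapesLower turns out to be non-empty in a given image, the guard has to stay regardless of which case is chosen as canonical.
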
I was about to suggest copying the CaseSensitiveOrder mapping instead of AsciiOrder, since its ordering is more refined than just A-Z. But that would quickly lead to wanting to extend it to a generic Latin1 sort order rather than just ASCII (é between e and f, for example), which is a can of worms that is hard to solve without making the ordering locale-specific... I mean, one could use the default Unicode ordering, but would inevitably receive complaints from, say, Norwegians, that å sorts between a and b instead of after z.

After all, it only affects the case where compare is used for ordering anyway.

Cheers,
Henry
