2016-01-06 11:09 GMT+01:00 Henrik Johansen <[email protected]>:
>
> > On 06 Jan 2016, at 10:09, Sven Van Caekenberghe <[email protected]> wrote:
> >
> > Hallo Nicolai,
> >
> >> On 06 Jan 2016, at 09:58, Nicolai Hess <[email protected]> wrote:
> >>
> >> see issues 17302/17242/17227
> >> String>>findString:startingAt:caseSensitive: appears to be failing for extended charsets
> >> String>>compare:caseSensitive: seems to be failing for extended charset comparisons
> >> String>>beginsWithEmpty:caseSensitive: has test failure for some cases
> >>
> >> The problem is that the standard character set used for building the
> >> CaseInsensitiveOrder map only maps characters from the set of ASCII
> >> characters, but it is used in the findString/compare/beginsWith methods
> >> for all byte characters.
> >>
> >> Any objections if we fill this map as suggested in case 17242?
> >>
> >> CaseInsensitiveOrder := AsciiOrder copy.
> >> (0 to: 255) do: [ :v |
> >>     | char upper |
> >>     char := v asCharacter.
> >>     upper := char asUppercase.
> >>     upper isOctetCharacter
> >>         ifFalse: [ upper := char ].
> >>     CaseInsensitiveOrder
> >>         at: char asciiValue + 1
> >>         put: (CaseInsensitiveOrder at: upper asciiValue + 1) ].
> >>
> >> (The check for #isOctetCharacter is needed because for some entries the
> >> corresponding uppercase character is not within this character set.)
> >>
> >> This would solve all three issues.
> >>
> >> nicolai
> >
> > That looks like a beautiful fix that makes perfect sense.
> > If all tests are green, I see no reason not to do it.
> >
> > Thanks and well done (again),
> >
> > Sven
> >
>
> If you use asLowercase as the "canonical" ordering index instead, can you
> drop the isOctetCharacter test, or are there uppercase characters in Latin-1
> with no corresponding lowercases?
>

Interesting, good idea: in Latin-1 there are no uppercase characters without a
lowercase counterpart.
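
A minimal, untested sketch of that variant, assuming (as noted above) that
asLowercase of every byte character stays within 0..255, so the
#isOctetCharacter guard can go. It is meant for the same place as the snippet
in case 17242, where the AsciiOrder and CaseInsensitiveOrder class variables
are in scope:

```smalltalk
"Sketch only: use the lowercase character as the canonical entry.
 Assumption: asLowercase never leaves the 0..255 range for byte characters."
CaseInsensitiveOrder := AsciiOrder copy.
(0 to: 255) do: [ :v |
    | char lower |
    char := v asCharacter.
    lower := char asLowercase.
    "Map each character to the table entry of its lowercase form."
    CaseInsensitiveOrder
        at: char asciiValue + 1
        put: (CaseInsensitiveOrder at: lower asciiValue + 1) ].
```

One visible difference if the map is also used for ordering: with lowercase as
the canonical form, letters map to 97..122 instead of 65..90, so characters
such as [ and _ would sort before the letters rather than after them (assuming
AsciiOrder is the identity map).
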
>
> I was about to suggest copying the CaseSensitiveOrder mapping instead of
> AsciiOrder, since it has an ordering more refined than just A-Z, but
> that would quickly lead to wanting to extend it to a generic Latin-1 sort
> order rather than just ASCII (é between e and f, for example), which is a
> can of worms that is hard to solve without making the ordering
> locale-specific...
> I mean, one could use the default Unicode ordering, but would inevitably
> receive complaints from, say, Norwegians, that å sorts between a and b
> instead of after z.
>

I hadn't thought about ordering, and I think I don't want to :)

>
> After all, it only affects the case where compare is used for ordering
> anyways.
>
> Cheers,
> Henry
>
