Re: [Pharo-dev] CasInsensitiveOrder map with upper/lower case characters for all latin1 entries

Nicolai Hess Wed, 06 Jan 2016 02:21:17 -0800

2016-01-06 11:09 GMT+01:00 Henrik Johansen <[email protected]>:


>
> > On 06 Jan 2016, at 10:09 , Sven Van Caekenberghe <[email protected]> wrote:
> >
> > Hallo Nicolai,
> >
> >> On 06 Jan 2016, at 09:58, Nicolai Hess <[email protected]> wrote:
> >>
> >>
> >> see issues 17302/17242/17227
> >> String>>findString:startingAt:caseSensitive: appears to be failing for extended charsets
> >> String>>compare:caseSensitive: seems to be failing for extended charset comparisons
> >> String>>beginsWithEmpty:caseSensitive: has test failure for some cases
> >>
> >> The problem is that the standard character set used for building the CaseInsensitiveOrder map
> >> only maps characters from the ASCII range, but the map is used in the
> >> findString/compare/beginsWith methods for all byte characters.
> >>
> >> Any objections if we fill this map as suggested in issue 17242?
> >>
> >> CaseInsensitiveOrder := AsciiOrder copy.
> >> (0 to: 255) do: [ :v |
> >>     | char upper |
> >>     char := v asCharacter.
> >>     upper := char asUppercase.
> >>     upper isOctetCharacter
> >>         ifFalse: [ upper := char ].
> >>     CaseInsensitiveOrder
> >>         at: char asciiValue + 1
> >>         put: (CaseInsensitiveOrder at: upper asciiValue + 1) ].
> >>
> >> (the check for #isOctetCharacter is needed because for some entries the corresponding
> >> uppercase character is not within this character set).
> >>
> >> This would solve all three issues.
> >>
> >>
> >> nicolai
> >
> > That looks like a beautiful fix that makes perfect sense.
> > If all tests are green, I see no reason not to do it.
> >
> > Thanks and well done (again),
> >
> > Sven
> >
> >
> If you use asLowercase as the "canonical" ordering index instead, can you
> drop the isOctetCharacter test, or are there uppercase characters in latin1
> with no corresponding lowercases?
>

Interesting, good idea. There are no uppercase characters in Latin-1 without a
corresponding lowercase.
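
An untested sketch of that lowercase variant, for evaluating in a Playground.
The local variable `order` is only a stand-in for the CaseInsensitiveOrder
class variable, and the table is built from (0 to: 255) on the assumption that
this matches the contents of AsciiOrder:

```smalltalk
"Sketch only: `order` is a hypothetical local stand-in for CaseInsensitiveOrder."
| order |
order := (0 to: 255) asByteArray.    "assumed to match AsciiOrder"
0 to: 255 do: [ :v |
    | char lower |
    char := Character value: v.
    lower := char asLowercase.
    "No isOctetCharacter guard here: this relies on every Latin-1
     character having its lowercase inside Latin-1 as well."
    order at: v + 1 put: (order at: lower asciiValue + 1) ].
order
```

One side effect to keep in mind: with lowercase as the canonical entry, the
ASCII letters map to 97..122 instead of 65..90, so the characters between $Z
and $a ([ \ ] ^ _ `) would sort before the letters rather than after them in a
case-insensitive comparison.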


>
> I was about to suggest copying the CaseSensitiveOrder mapping instead of
> the AsciiOrder, since it has an ordering more refined than just A-Z, but
> that would quickly lead to wanting to extend it to a generic Latin1 sort
> order rather than just ASCII (é between e and f, for example), which is a
> can of worms that is hard to solve without making the ordering locale
> specific...
> I mean, one could use the default Unicode ordering, but would inevitably
> receive complaints from, say, Norwegians, that å sorts between a and b
> instead of after z.
>

I didn't think about ordering..., and I don't think I want to :)


>
> After all, it only affects the case where compare is used for ordering
> anyways.
>
> Cheers,
> Henry
>
