On Fri, Apr 20, 2001 at 04:12:17PM -0400, Dan Sugalski wrote:
> At 02:34 PM 4/20/2001 -0500, Jarkko Hietaniemi wrote:
> > > >One additional datapoint to overload your brain with is to consider
> > > >the ambiguity of equality and comparison. Unicode normalization:
> > > >is A + grave equal to Agrave? Is Agrave less than Aacute? Unicode
> > > >collation combined with language/locale-specific rules.
> > >
> > > Comparisons on Unicode data will do it on the Unicode collation version of
> > > the string data. Equality checking will be done either on normalized data
> >
> >We need to include in our design a spot for the customization hooks, though.
>
> This'll be buried in the vtable code for the various data types. If you do
> a comparison with one or both arguments Unicode, then we do the collation
> thing. Otherwise the vtable code's free to do whatever it thinks is
> appropriate. (We can certainly encapsulate this stuff--Simon and I have
> both been considering some sort of loadable string type system)

I'm talking about things like the Funky French collation rules,
or the collation difference between Deutsch and svenskan, err,
German and Swedish. I think we need to have a *standard* way
of being able to customize the collation rules, in addition
to the Unicode basic collation.
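
To make the idea of a standard customization hook concrete, here
is a minimal sketch. It is in Python purely for illustration, and
every name in it (default_key, swedish_key, COLLATORS, sort_words)
is hypothetical, not a proposed perl 6 interface. default_key is
only a crude stand-in for the Unicode basic collation: it ignores
case and accents and nothing more. The Swedish tailoring treats
a-ring, a-umlaut and o-umlaut as separate letters after z, while
German dictionary order sorts a-umlaut with a.

```python
import unicodedata

def _strip_marks(ch):
    # Decompose the character and drop any combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def default_key(s):
    # Crude stand-in for the basic collation: ignore case and accents.
    s = unicodedata.normalize("NFC", s).casefold()
    return [(0, _strip_marks(ch)) for ch in s]

# a-ring, a-umlaut, o-umlaut, in Swedish alphabet order
SWEDISH_EXTRA = "\u00e5\u00e4\u00f6"

def swedish_key(s):
    # Tailoring: the three extra letters sort after z, in their own order.
    s = unicodedata.normalize("NFC", s).casefold()
    key = []
    for ch in s:
        if ch in SWEDISH_EXTRA:
            key.append((1, str(SWEDISH_EXTRA.index(ch))))
        else:
            key.append((0, _strip_marks(ch)))
    return key

# Hypothetical registry: the "standard spot" where tailorings plug in.
COLLATORS = {"de": default_key, "sv": swedish_key}

def sort_words(words, lang):
    # Fall back to the basic collation when no tailoring is registered.
    return sorted(words, key=COLLATORS.get(lang, default_key))

words = ["zebra", "\u00e4pple", "apa"]
print(sort_words(words, "de"))  # a-umlaut sorts with a
print(sort_words(words, "sv"))  # a-umlaut sorts after z
```

The point is only that the tailoring is looked up in one agreed
place, with the basic collation as the fallback.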

> >The NFC seems like the way to go.
>
> I wasn't all that clear. For places where parrot^Wperl 6 normalizes it'll
> go for NFC. The question is whether we actually check for/force
> normalization, or rely on the programmer to Do The Right Thing.

Ahhh. Okay. I can see how overly aggressive NFCification can
burn cycles and annoy programmers/users.
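
For illustration, the two policies side by side, again sketched in
Python rather than anything perl 6 would really do. The helper
names nfc_equal and ensure_nfc are made up; unicodedata.normalize
and unicodedata.is_normalized (the latter needs Python 3.8 or
later) are real library calls. Checking first means data that is
already in NFC is returned without being rewritten.

```python
import unicodedata

def nfc_equal(a, b):
    # Equality on normalized data: force both sides to NFC, then compare.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def ensure_nfc(s):
    # Check before forcing: only rewrite the string when it is not NFC yet.
    if unicodedata.is_normalized("NFC", s):
        return s
    return unicodedata.normalize("NFC", s)

decomposed = "A\u0300"   # A followed by COMBINING GRAVE ACCENT
precomposed = "\u00c0"   # LATIN CAPITAL LETTER A WITH GRAVE

print(decomposed == precomposed)          # raw comparison of code points
print(nfc_equal(decomposed, precomposed)) # comparison after NFC
print(ensure_nfc(decomposed) == precomposed)
```

A raw comparison says A + grave and Agrave differ; the normalized
one says they are equal.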
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen