OK, this is in line with how I understood this paragraph in perluniintro:
The short answer is that by default, Perl compares strings ("lt",
"le", "cmp", "ge", "gt") based only on the code points of the
characters. In the above case, the answer is "after", since 0x00C1 >
0x00C0.
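A minimal sketch of that default behaviour, assuming no "use locale" and no collation module is in effect. It compares U+00C1 (A WITH ACUTE) against U+00C0 (A WITH GRAVE) with plain cmp, so only the code points decide the result. The variable names are just for illustration.

```perl
use strict;
use warnings;

# U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
# U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
my $a_acute = "\x{00C1}";
my $a_grave = "\x{00C0}";

# Without "use locale" or a collation module, cmp looks only at code points.
my $result = $a_acute cmp $a_grave;

printf "U+%04X cmp U+%04X = %d (%s)\n",
    ord($a_acute), ord($a_grave), $result,
    $result > 0 ? "after" : $result < 0 ? "before" : "same";
```

Since 0x00C1 > 0x00C0, cmp returns 1, i.e. "after".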
So is it just by chance that these French words are accurately sorted?
I think a "qualified yes" here is in order...
% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
cote coté côte côté
Is this the famous French "backwards accents" rule in action? (http://www-clips.imag.fr/geta/gilles.serasset/tri-du-francais.html) (no, I don't speak French)
But in this case, with those particular words, I think the ISO Latin 1 code point order (none of the characters are beyond ISO Latin 1) just "happens" to work right: o < ô (0x6F < 0xF4), and e < é (0x65 < 0xE9).
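To make the "it just happens" explanation visible, here is a small sketch that runs the same plain sort as the one-liner above and then prints the code points of each word. It assumes the script is saved as UTF-8 (hence "use utf8"). Reading the columns left to right shows where each pair of words first differs.

```perl
use strict;
use warnings;
use utf8;                                   # the word literals are UTF-8 source
binmode(STDOUT, ':encoding(UTF-8)');

my @words  = qw(côte côté cote coté);
my @sorted = sort @words;                   # plain sort: code point order

# Print each word with its code points, e.g. o = U+006F vs ô = U+00F4,
# and e = U+0065 vs é = U+00E9.
for my $w (@sorted) {
    printf "%-5s %s\n", $w,
        join ' ', map { sprintf('U+%04X', ord($_)) } split //, $w;
}
```

"cote" and "coté" first differ at the last character (U+0065 < U+00E9), while "coté" and "côte" already differ at the second (U+006F < U+00F4). That reproduces the "cote coté côte côté" result.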
Some more links (database-related, since database vendors have had to think about these things
for years already) that hopefully explain some of the problems of "linguistic sorting":
http://www.engin.umich.edu/caen/wls/software/oracle/server.901/a90236/ch4.htm
http://developer.mimer.com/documentation/html_92/Mimer_SQL_Engine_DocSet/Mimer_Concepts14.html
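In Perl itself, linguistic sorting is handled by the core Unicode::Collate module, which implements the Unicode Collation Algorithm (UTS #10). The sketch below compares three orders for the same four words: plain code point order, the default UCA order, and UCA with the documented `backwards => 2` option. That option reverses the accent (level 2) comparison, which is the "backward accent" handling UTS #10 describes. If I read the UCA correctly, the default UCA order coincides with the code point order for these words, while the backwards variant puts "côte" before "coté". Which of the two counts as "right" depends on the French convention you follow.

```perl
use strict;
use warnings;
use utf8;                                   # the word literals are UTF-8 source
use Unicode::Collate;
binmode(STDOUT, ':encoding(UTF-8)');

my @words = qw(côte côté cote coté);

# Default UCA (DUCET): accents are compared left to right at level 2.
my $uca     = Unicode::Collate->new();
my @forward = $uca->sort(@words);

# backwards => 2: accent weights are compared from the end of the word.
my $backward_uca = Unicode::Collate->new(backwards => 2);
my @backward     = $backward_uca->sort(@words);

print "code point: @{[ sort @words ]}\n";
print "UCA:        @forward\n";
print "backwards:  @backward\n";
```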
--
Thanks, -- Eric Cholet
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen