Dan Sugalski wrote:
> At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote:
>
>>> 1) ISO-8859-1 is used to represent text in several different languages,
>>> including German and Swedish. German and Swedish differ in their sort
>>> order, even for things they have in common. (For example, ö
>>> (o-with-diaeresis) is considered a separate letter in Swedish, but is
>>> just an accented "o" in German.) So (assuming my strings aren't
>>> explicitly language-tagged, or are tagged with "Dunno"), what sort
>>> order does ISO-8859-1 define? I'm not sure whether the national
>>> standards themselves actually define a sort order, so are we going to
>>
>> National standards yes, ISO 8859 (and the like) not. In other words,
>> sorting standards exist, but they have (quite rightly) nothing to do
>> with sorting standards.
>
> ?

Ooops. Replace the last "sorting" with "character". That's what I get,
errrm, what you get, from writing email while watching evening news :-)

>> Real life sorting is messy (multiple passes,
>> some parts may be ignored in some passes, acronyms, etc.) and worlds
>> apart from "let's compare the bytes one by one" or even from "let's
>> compare code points" or even from "let's compare grapheme (clusters)".
>
> True enough, though what I want the language for
> is as much case-mangling as sorting.

I just think that having languages for strings is akin to having
types (dimensioned or -less) for numbers. (Making 2 kg plus 3 Hz
to croak, that kind of thing.)

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'."
-- Jack Cohen
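
To make the ö example from the thread concrete, here is a minimal Python sketch of why the same ISO-8859-1 strings need a language before they can be sorted. Everything in it is an assumption for illustration only: the names `swedish_key` and `german_key`, the tiny alphabets, and the two-pass German key are hand-rolled stand-ins, not how Perl, Parrot, or any real collation library does it. Real collation is multi-level and far messier, as noted above.

```python
# Hypothetical, hand-rolled collation keys (illustration only).

# Swedish: å, ä and ö are letters in their own right, sorted after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    # Map each letter to its position in the Swedish alphabet.
    # Raises ValueError for characters outside this toy alphabet.
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]

# German (dictionary-style order): umlauts are treated as accented base letters.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def german_key(word):
    # First pass: compare with accents folded away.
    # Second pass: fall back to the unfolded word to break ties.
    lowered = word.lower()
    return (lowered.translate(GERMAN_FOLD), lowered)

words = ["oz", "öl", "ob", "zoo"]
by_swedish = sorted(words, key=swedish_key)  # ö sorts after z
by_german = sorted(words, key=german_key)    # ö sorts next to o

print(by_swedish)
print(by_german)
```

With the Swedish key, "öl" lands after "zoo"; with the German key it lands between "ob" and "oz". Neither answer can be read off the bytes or the code points alone, which is the point of tagging strings with a language.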