Dan Sugalski wrote:
> At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote:
> 
>>> 1) ISO-8859-1 is used to represent text in several different languages,
>>> including German and Swedish. German and Swedish differ in their sort
>>> order, even for things they have in common. (For example, ö
>>> (o-with-diaeresis) is considered a separate letter in Swedish, but is
>>> just an accented "o" in German.) So (assuming my strings aren't
>>> explicitly language-tagged, or are tagged with "Dunno"), what sort
>>> order does ISO-8859-1 define? I'm not sure whether the national
>>> standards themselves actually define a sort order, so are we going to
>>
>>National standards yes, ISO 8859 (and the like) not.  In other words,
>>sorting standards exist, but they have (quite rightly) nothing to do
>>with sorting standards.
> 
> 
> ?

Ooops.  Replace the last "sorting" with "character".  That's what I get,
errrm, what you get, from writing email while watching evening news :-)
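
To make the German/Swedish point concrete, here is a minimal sketch of
how the same strings order differently depending on the language rules
applied. The key functions and the word list are hypothetical
illustrations, not anything from a real collation library, and the
"German" rule assumes the dictionary-style convention in which an umlaut
sorts as its base vowel. Real collation needs far more than this.

```python
# Hypothetical sketch: one word list, three different orderings.

# Swedish treats å, ä, ö as separate letters placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    # Rank each letter by its position in the Swedish alphabet.
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]

# Assumed German dictionary-style folding: accented vowels sort as their
# base vowel (å is not a German letter; folding it to "a" is an assumption).
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "å": "a", "ß": "ss"})

def german_key(word):
    lowered = word.lower()
    # Folded form first, original spelling as a tie-breaker.
    return (lowered.translate(GERMAN_FOLD), lowered)

words = ["zon", "öl", "ost", "år", "ärt"]

by_code_point = sorted(words)                # plain code point comparison
by_swedish = sorted(words, key=swedish_key)  # ost, zon, år, ärt, öl
by_german = sorted(words, key=german_key)    # år, ärt, öl, ost, zon

print(by_code_point)
print(by_swedish)
print(by_german)
```

All three results differ, and nothing in ISO-8859-1 itself says which
one is "right".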

>>  Real life sorting is messy (multiple passes,
>>some parts may be ignored in some passes, acronyms, etc.) and worlds
>>apart from "let's compare the bytes one by one" or even from "let's
>>compare code points" or even from "let's compare grapheme (clusters)".
> 
> 
> True enough, though what I want the language for 
> is as much case-mangling as sorting.
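
As a rough illustration of why case-mangling can depend on the language,
here is a small sketch. The Turkish dotted/dotless i is my own choice of
example (it is not mentioned in this thread), and the function names and
the `lang` tag are hypothetical. Python's built-in `str.upper()` and
`str.lower()` apply language-independent mappings, so the language-aware
part has to be layered on top.

```python
# Hypothetical sketch: language-aware case mapping on top of the defaults.

def upper_for(text, lang=None):
    # In Turkish and Azerbaijani, "i" uppercases to dotted "İ"
    # and dotless "ı" uppercases to "I".
    if lang in ("tr", "az"):
        text = text.replace("i", "İ").replace("ı", "I")
    return text.upper()

def lower_for(text, lang=None):
    # The reverse mapping: "I" lowercases to "ı", and "İ" to "i".
    if lang in ("tr", "az"):
        text = text.replace("I", "ı").replace("İ", "i")
    return text.lower()

print(upper_for("istanbul"))        # ISTANBUL
print(upper_for("istanbul", "tr"))  # İSTANBUL
print(lower_for("ISTANBUL", "tr"))  # ıstanbul
print(upper_for("straße"))          # STRASSE
```

Without a language tag (or with "Dunno") the only option is the default
mapping.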

I just think that having languages for strings is akin to
having types (dimensioned or -less) for numbers.
(Making 2 kg plus 3 Hz croak, that kind of thing.)
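
A minimal sketch of the numeric side of that analogy, assuming a
hypothetical `Quantity` type that carries a unit and refuses to add
mismatched ones. The class and its fields are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """A number tagged with a unit (hypothetical illustration)."""
    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.unit != other.unit:
            # The "croak": mixing dimensions is an error, not a number.
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)

print(Quantity(2, "kg") + Quantity(3, "kg"))  # same unit: fine

try:
    Quantity(2, "kg") + Quantity(3, "Hz")
except TypeError as err:
    print("croak:", err)
```

A language tag on a string would play the role of `unit` here.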

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
