CHARACTER SET - Contains meta-information about code points. This includes both the meaning of individual code points (65 is capital A, 776 is a combining diaeresis) as well as a set of categorizations of code points (alpha, numeric, whitespace, punctuation, and so on), and a sorting order.
I'm assuming here that you are referring to things like Shift-JIS and ISO-8859-1 as character sets, right?
Questions (based on that assumption):
[*Note: assume everywhere below that the strings in question are not explicitly language-tagged (or, are tagged with "Dunno"--however it's supposed to work).]
1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for characters they have in common. (For example, ö (o-with-diaeresis) is considered a separate letter in Swedish, but is just an accented "o" in German.) So (assuming my strings aren't explicitly language-tagged, or are tagged with "Dunno"), what sort order does ISO-8859-1 define? I'm not sure whether the national standards themselves actually define a sort order, so are we going to define one for every "character set"? In addition, many languages can be represented in several different "character sets", so that seems to mean that the sort order for "öut" v. "out" will vary, depending on the "character set" used for those strings?
2) In light of the above, how do you sort an array of strings, assuming they're not all in the same "character set"?
3) If the answer to (2) is "you must upgrade them all to UTF-8", then that means that the sort order for an array might totally change when you add one new member, right? If the answer is, "for a given pair, when you compare them during sorting, only upgrade if their character sets don't match", then the comparison is no longer guaranteed to be transitive, and you open the door to non-convergent sorting (i.e., the sort might never finish).
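To make (1) and (3) concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two key functions are hand-rolled stand-ins for German-like and Swedish-like orderings (not real collation tables), the charset tags "A" and "B" are hypothetical, and "upgrade on mismatch" is modelled as a plain code point comparison.

```python
from functools import cmp_to_key
from itertools import permutations

O_UMLAUT = "\u00f6"  # precomposed o-with-diaeresis

def german_like_key(text):
    # Stand-in for a German-style order: o-umlaut is just an accented "o".
    return text.replace(O_UMLAUT, "o")

def swedish_like_key(text):
    # Stand-in for a Swedish-style order: o-umlaut is a separate letter after "z".
    return [(1, ch) if ch == O_UMLAUT else (0, ch) for ch in text]

# Question (1): the same words sort differently under the two orders.
words = [O_UMLAUT + "ut", "out", "oz", "pa"]
by_german = sorted(words, key=german_like_key)
by_swedish = sorted(words, key=swedish_like_key)
print(by_german)
print(by_swedish)

def compare(a, b):
    # Question (3): use the charset's own order only when both tags match
    # (hypothetical charset "A" has the German-like order); otherwise
    # "upgrade" both strings and compare by code point.
    (text_a, charset_a), (text_b, charset_b) = a, b
    if charset_a == charset_b == "A":
        key_a, key_b = german_like_key(text_a), german_like_key(text_b)
    else:
        key_a, key_b = text_a, text_b
    return (key_a > key_b) - (key_a < key_b)

x = (O_UMLAUT + "ut", "A")
y = ("oz", "A")
z = ("pa", "B")

# Each call returns -1, so x < y, y < z, and z < x: a cycle.
print(compare(x, y), compare(y, z), compare(z, x))

# Sort every input ordering of the same three strings and collect the results.
orders = {
    tuple(sorted(p, key=cmp_to_key(compare)))
    for p in permutations([x, y, z])
}
print(len(orders))  # more than one distinct "sorted" result
```

Python's built-in sort still terminates on this input, but the result depends on the order the strings arrived in. A sort that loops until no pair is out of order would have no such guarantee with a cyclic comparison.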
My worry here is that if the semantics of the Latin Capital Letter A ("A"), for example (or pick any other character), are allowed to differ between different "character sets", then we'll have problems for any binary string operation.
JEff