>>We'll basically need 4 levels of string support:
>>
>>,--[ Larry Wall ]------------------------------------------------------
>>| level 0 byte      == character, "use bytes" basically
>>| level 1 codepoint == character, what we seem to be aiming for, vaguely
>>| level 2 grapheme  == character, what the user usually wants
>>| level 3 letter    == character, what the "current language" wants
>>`----------------------------------------------------------------------
>
> Yes, and I'm boldly arguing that this is the wrong way to go, and I
> guarantee you that you can't find any other string or encoding library
> out there which takes an approach like that, or anyone asking for one.
> I'm eager for Larry to comment.
I'm no Larry, either :-) but I think Larry is *not* saying that the
"localeness" or "languageness" should hang off each string (or *shudder*
off each substring).  What I've seen is that Larry wants the "level" to
be a lexical pragma (in Perl terms).  The "abstract string" stays the
same, but the operative level decides, for _some_ ops, what a
"character" stands for.

The default level should be somewhere between levels 1 and 2 (again, it
depends on the ops).  For example, usually /./ means "match one Unicode
code point" (a CCS character code), but one can somehow ratchet the
level up to 2 and make it mean "match one Unicode base character,
followed by zero or more modifier characters".  For level 3 the language
(locale) needs to be specified.  As another example, bitstring xor does
not make much sense at any level other than zero.

The basic idea is that we cannot and should not dictate at what level of
abstraction the user wants to operate.  We will give a default level,
and ways to "zoom in" and "zoom out".

(If Larry is really saying that the "locale" should be an attribute of
the string value, I'm on the barricades with you, holding cobblestones
and Molotov cocktails...)

Larry can feel free to correct me :-)
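To make the level 1 vs. level 2 distinction concrete, here is a small
sketch (in Python, just for illustration; the `graphemes()` helper is a
naive stand-in I made up, not any proposed API, and real grapheme
segmentation per UAX #29 is more involved):

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: two code points,
# but one user-perceived character.
s = "e\u0301"

# Level 1 view: codepoint == character.
print(len(s))  # -> 2

def graphemes(text):
    """Naive level 2 segmentation: a base character followed by
    zero or more combining (modifier) characters counts as one unit."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach modifier to the preceding base
        else:
            out.append(ch)
    return out

# Level 2 view: grapheme == character.
print(len(graphemes(s)))  # -> 1
```

This is exactly the /./ example above: at level 1 the pattern consumes
one code point, while at level 2 it would consume the base character
plus its trailing modifiers.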