>>We'll basically need 4 levels of string support:
>>
>>,--[ Larry Wall ]--------------------------------------------------------
>>|  level 0    byte == character, "use bytes" basically
>>|  level 1    codepoint == character, what we seem to be aiming for, vaguely
>>|  level 2    grapheme == character, what the user usually wants
>>|  level 3    letter == character, what the "current language" wants
>>`------------------------------------------------------------------------
> 
> Yes, and I'm boldly arguing that this is the wrong way to go, and I  
> guarantee you that you can't find any other string or encoding library  
> out there which takes an approach like that, or anyone asking for one.  
> I'm eager for Larry to comment.

I'm no Larry, either :-) but I think Larry is *not* saying that the
"localeness" or "languageness" should hang off each string (or *shudder*
off each substring).  What I've seen is that Larry wants the "level" to
be a lexical pragma (in Perl terms).  The "abstract string" stays the
same, but the operative level decides for _some_ ops what a "character"
stands for.

The default level should be somewhere between levels 1 and 2 (again, it
depends on the ops).
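
To make "same abstract string, different level" concrete, here is a rough
sketch in Python (not Perl, purely for illustration) of how one string's
"length" differs at levels 0 through 2.  The grapheme count is a crude
approximation that simply skips combining marks, nothing like a full
Unicode segmentation algorithm:

```python
import unicodedata

s = "re\u0301sume\u0301"   # "résumé" spelled with combining accents

# Level 0: characters are bytes.
assert len(s.encode("utf-8")) == 10

# Level 1: characters are code points.
assert len(s) == 8

# Level 2 (crude sketch): characters are base + combining-mark clusters.
graphemes = sum(1 for ch in s if not unicodedata.combining(ch))
assert graphemes == 6
```

The string itself never changes; only the notion of what counts as one
"character" does.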

For example, usually /./ means "match one Unicode code point" (a CCS
character code).  But one can somehow ratchet the level up to 2 and make
it mean "match one Unicode base character, followed by zero or more
modifier characters".  For level 3 the language (locale) needs to be
specified.
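
A rough Python sketch of the level-1 vs level-2 difference for /./ (again,
illustration only, not Perl): `.` in Python's `re` matches exactly one code
point, and a small hand-rolled helper approximates a level-2 "one base
character followed by zero or more modifier characters" match.  The
`match_grapheme` helper is invented here, not an existing API:

```python
import re
import unicodedata

s = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Level 1: /./ matches exactly one code point -- the base letter only.
assert re.match(r".", s).group() == "e"

# Level 2 (sketch): one base code point plus any trailing combining marks.
def match_grapheme(s):
    if not s:
        return ""
    i = 1
    while i < len(s) and unicodedata.combining(s[i]):
        i += 1
    return s[:i]

assert match_grapheme(s) == "e\u0301"  # the whole grapheme
```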

As another example, bitstring xor does not make much sense at any level
other than level zero.
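
A minimal Python illustration of why xor is a level-0 operation: it is
defined on raw bytes, and the result need not correspond to valid
characters at any higher level.  The key bytes here are arbitrary:

```python
data = "é".encode("utf-8")          # level-0 view: b'\xc3\xa9'
key = bytes([0x5A, 0x5A])

# Xor operates on bytes; the result has no character-level meaning
# and need not even decode as UTF-8.
xored = bytes(b ^ k for b, k in zip(data, key))

# Xoring again with the same key recovers the original bytes.
assert bytes(b ^ k for b, k in zip(xored, key)) == data
```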

The basic idea is that we cannot and should not dictate the level of
abstraction at which the user wants to operate.  We will provide a default
level, plus ways to "zoom in" and "zoom out".

(If Larry is really saying that the "locale" should be an attribute of
the string value, I'm on the barricades with you, holding cobblestones
and Molotov cocktails...)

Larry can feel free to correct me :-)
