We'll basically need 4 levels of string support:
,--[ Larry Wall ]----------------------------------------------------
| level 0  byte      == character, "use bytes" basically
| level 1  codepoint == character, what we seem to be aiming for, vaguely
| level 2  grapheme  == character, what the user usually wants
| level 3  letter    == character, what the "current language" wants
`--------------------------------------------------------------------
Yes, and I'm boldly arguing that this is the wrong way to go. I'd wager you can't find any other string or encoding library out there that takes an approach like that, or anyone asking for one. I'm eager for Larry to comment.
I'm no Larry, either :-) but I think Larry is *not* saying that the
"localeness" or "languageness" should hang off each string (or *shudder*
off each substring). What I've seen is that Larry wants the "level" to
be a lexical pragma (in Perl terms). The "abstract string" stays the
same, but the operative level decides, for _some_ ops, what a
"character" stands for.
That makes a lot of sense to me, and I'd further it by saying that levels 2 and 3 don't mean that we need to have "grapheme" or "letter" data types, per se. (If we tried to have those, we'd need properties databases to go with them, and we'd go crazy.)
For example, usually /./ means "match one Unicode code point" (a CCS
character code). But one can somehow ratchet the level up to 2 and make
it mean "match one Unicode base character, followed by zero or more
modifier characters". For level 3 the language (locale) needs to be
specified.
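Purely for illustration (Python standing in, since Perl hadn't settled on syntax for this): the level-2 reading of /./ amounts to grouping a base code point with its trailing combining marks. A minimal sketch using the stdlib unicodedata module:

```python
import unicodedata

def graphemes(s):
    """Split a string into level-2 'characters': a base code point
    followed by zero or more combining (modifier) code points."""
    clusters = []
    for ch in s:
        # combining() is nonzero for combining marks; attach each
        # mark to the preceding base character's cluster.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "a" + COMBINING ACUTE ACCENT + "b": three code points (level 1),
# but only two characters at level 2.
s = "a\u0301b"
print(len(s), len(graphemes(s)))  # 3 2
```

(Real grapheme segmentation is more involved than this, of course -- which is exactly why you'd want it in the pragma machinery, not in a "grapheme" data type.)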
Another example could be that at level 2 (and 3), maybe "eq" automatically normalizes before doing string comparisons, and at levels 1 and 0 it doesn't.
(If Larry is really saying that the "locale" should be an attribute of the string value, I'm on the barricades with you, holding cobblestones and Molotov cocktails...)
It's nice to have company!
JEff