At 11:09 PM -0800 1/10/08, Larry Wall wrote:
> It's really already very much like you want it to be. Most Str objects
> do not in fact have any byte semantics. If you say "foo".bytes, that
> is shorthand for "foo".bytes(:nf<c>, :enc<UTF-8>). In other words,
> you have to tell it what units you want the bytes to be measured in.
> It just assumes utf-8 as a convenient default. Likewise a Str does
> not have any codepoint semantics unless you tell it the normalization
> to assume.
Oh, that's good then.
Until now, my interpretation of the Perl 6 situation was that, while
Str objects were conceptually grapheme strings (which is what .graphs
refers to), you could access the implementation details currently in
use for that object via .codes, .bytes, et al. Timtoady (user choice
of abstraction level) and all that.
As such, in my own Muldis D language design, which is heavily
influenced by Perl 6 and has its character strings as
highest-possible-abstraction Unicode (generally graphemes), I made a
point of keeping all character string operations
implementation-agnostic; hence, rather than 'graphs' or 'codes', there
are 'nfc_graphs', 'nfd_codes', etc.
I'm glad to see, from your latest post, that this is how Perl 6
actually works as well. Specifically, .codes works in terms of a
particular normal form (either a specified one or a default one)
rather than the current implementation, which makes this aspect of
Perl 6 much more deterministic and portable.
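To illustrate why the counts only make sense once a normal form (and,
for bytes, an encoding) is named, here is a rough sketch in Python
rather than Perl 6. The helper names codes() and byte_count() are
made up for this example and are not any Perl 6 or Muldis D API; they
only mimic the idea of .codes and .bytes taking explicit parameters,
with NFC and UTF-8 assumed as defaults.

```python
import unicodedata

def codes(s, nf="NFC"):
    # Hypothetical helper: count code points after normalizing to nf.
    return len(unicodedata.normalize(nf, s))

def byte_count(s, nf="NFC", enc="utf-8"):
    # Hypothetical helper: count bytes after normalizing to nf and
    # encoding as enc.
    return len(unicodedata.normalize(nf, s).encode(enc))

# "resume" with two combining acute accents (U+0301), i.e. the same
# six graphemes however the string happens to be stored.
s = "re\u0301sume\u0301"

print(codes(s, "NFC"))                    # 6
print(codes(s, "NFD"))                    # 8
print(byte_count(s, "NFC", "utf-8"))      # 8
print(byte_count(s, "NFD", "utf-8"))      # 10
print(byte_count(s, "NFC", "utf-16-le"))  # 12
```

The same string gives different answers for each combination, so an
answer that depended on whatever form the implementation happened to
store would not be portable.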
-- Darren Duncan