At 11:09 PM -0800 1/10/08, Larry Wall wrote:
> It's really already very much like you want it to be.  Most Str objects
> do not in fact have any byte semantics.  If you say "foo".bytes, that
> is shorthand for "foo".bytes(:nf<c>, :enc<UTF-8>).  In other words,
> you have to tell it what units you want the bytes to be measured in.
> It just assumes utf-8 as a convenient default.  Likewise a Str does
> not have any codepoint semantics unless you tell it the normalization
> to assume.

Oh, that's good then.

Until now, my interpretation of the Perl 6 situation was that while Str objects were conceptually grapheme strings, which is what .graphs refers to, you could access the currently in-use implementation details of the object using .codes, .bytes, et al. TIMTOWTDI (user choice of abstraction level) and all that.

As such, in my own Muldis D language design, which is heavily influenced by Perl 6 and has its character strings at the highest possible Unicode abstraction (generally graphemes), I made a point of making all character string operations more implementation-agnostic; hence, rather than 'graphs' or 'codes', there are 'nfc_graphs', 'nfd_codes', etc.

I'm glad to see, from your latest post, that this is how Perl 6 actually works as well: .codes works in terms of a particular normal form (either a specified one or a default one) rather than the current implementation, which makes this aspect of Perl 6 a lot more deterministic and portable.
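
To make the point concrete, here is a rough sketch of why a code point or byte count only means something once a normal form (and, for bytes, an encoding) is named. It is in Python rather than Perl 6, and the helper names codes and byte_count are made up for illustration; they are not Perl 6 or Muldis D APIs. Grapheme counting is left out because Python's standard library has no grapheme segmentation.

```python
import unicodedata

def codes(s, nf):
    """Count code points after normalizing s to the named form (hypothetical helper)."""
    return len(unicodedata.normalize(nf, s))

def byte_count(s, nf, enc):
    """Count bytes after normalizing s to nf and encoding it as enc (hypothetical helper)."""
    return len(unicodedata.normalize(nf, s).encode(enc))

# "café", written here with a precomposed e-acute (U+00E9).
s = "caf\u00e9"

print(codes(s, "NFC"))                    # 4: e-acute stays one code point
print(codes(s, "NFD"))                    # 5: e-acute splits into e + U+0301
print(byte_count(s, "NFC", "utf-8"))      # 5
print(byte_count(s, "NFD", "utf-8"))      # 6
print(byte_count(s, "NFC", "utf-16-le"))  # 8
```

The same abstract string gives different answers for each (normal form, encoding) pair, so a count that names both is reproducible regardless of how an implementation happens to store the string.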

-- Darren Duncan
