On Mon, May 18, 2009 at 02:16:17PM -0400, Mark J. Reed wrote:
: Surrogates are just weird, since they have assigned code points even
: though they're purely an encoding mechanism.  As such, they straddle
: the line between abstract characters and an encoding form. I assume
: that if text comes in as UTF-16, the surrogates will disappear as far
: as character-level P6 code is concerned.

I devoutly hope so.  UTF-8 is much cleaner than UTF-16 in this regard.
(And it's why I qualified my "code point" with "abstract" earlier, to
mean the UTF-8 interpretation rather than the UTF-16 interpretation.)
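
To illustrate the distinction (a rough sketch in Python rather than P6, since the P6 string API is not pinned down in this thread; the variable names are made up for the example): a character outside the BMP is one abstract code point whether it arrives as UTF-8 or as UTF-16, and the surrogate pair exists only in the UTF-16 encoded form.

```python
# Illustration only: Python stands in for P6 here.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a code point outside the BMP

as_utf8 = clef.encode("utf-8")       # four bytes, no surrogates involved
as_utf16 = clef.encode("utf-16-be")  # two 16-bit units: a surrogate pair

# Decoding either form yields the same single abstract code point.
from_utf8 = as_utf8.decode("utf-8")
from_utf16 = as_utf16.decode("utf-16-be")

code_points = [ord(c) for c in from_utf16]
print(as_utf8.hex(), as_utf16.hex(), [hex(cp) for cp in code_points])
```

At the character level neither decoded string contains anything in the D800..DFFF range; the surrogates are visible only in the UTF-16 bytes.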

: So is there any way for P6
: to manipulate surrogates as "characters"?  Maybe an adverb or trait?
: Or does one have to descend to the bytewise layer for that?  (As you
: said, that *normally* shouldn't be necessary outside encoding and
: decoding, where you need to do things bytewise anyway; just trying to
: cover all the bases...)

Buf16 should work for raw UTF-16 just fine.  That's one of the main
reasons we have buffers in sizes other than 8, after all.
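
As a rough analogue of that idea (again Python, not P6; `utf16_code_units` is a hypothetical helper, and Buf16 itself may well behave differently): treating the data as a sequence of 16-bit integers rather than as characters is what makes the surrogates individually visible.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    raw = text.encode("utf-16-le")
    # Reinterpret the bytes as little-endian unsigned 16-bit values.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

units = utf16_code_units("a\U0001D11E")

# At this level the surrogates can be picked out like any other integer.
surrogates = [u for u in units if 0xD800 <= u <= 0xDFFF]
print([hex(u) for u in units], [hex(u) for u in surrogates])
```

The string has two characters but three 16-bit units, and only the buffer view exposes the pair.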

Larry
