On Mon, May 18, 2009 at 02:16:17PM -0400, Mark J. Reed wrote:
: Surrogates are just weird, since they have assigned code points even
: though they're purely an encoding mechanism. As such, they straddle
: the line between abstract characters and an encoding form. I assume
: that if text comes in as UTF-16, the surrogates will disappear as far
: as character-level P6 code is concerned.
I devoutly hope so. UTF-8 is much cleaner than UTF-16 in this regard.
(And it's why I qualified my "code point" with "abstract" earlier, to
mean the UTF-8 interpretation rather than the UTF-16 interpretation.)

: So is there any way for P6
: to manipulate surrogates as "characters"? Maybe an adverb or trait?
: Or does one have to descend to the bytewise layer for that? (As you
: said, that *normally* shouldn't be necessary outside encoding and
: decoding, where you need to do things bytewise anyway; just trying to
: cover all the bases...)

Buf16 should work for raw UTF-16 just fine. That's one of the main
reasons we have buffers in sizes other than 8, after all.

Larry
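To make the distinction in this exchange concrete, here is a small sketch in Python (a stand-in only; it is not Perl 6 code and says nothing about how Perl 6 implements Buf16). It shows that a character outside the BMP, U+1D11E MUSICAL SYMBOL G CLEF, is a single code point at the character level, but two 16-bit code units (a high and a low surrogate) at the raw UTF-16 level. A plain list of unsigned 16-bit integers unpacked with `struct` plays the role of a 16-bit buffer here. The helper names `utf16_code_units` and `combine_surrogates` are hypothetical, chosen for illustration.

```python
import struct


def utf16_code_units(text):
    """Return the raw 16-bit UTF-16 code units (big-endian, no BOM) of text."""
    data = text.encode("utf-16-be")
    # Each code unit is two bytes; ">H" is a big-endian unsigned 16-bit int.
    return list(struct.unpack(f">{len(data) // 2}H", data))


def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a high/low surrogate pair into the code point it encodes."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# At the character level the surrogates are gone: one code point.
print(len(clef), hex(ord(clef)))          # 1 0x1d11e

# At the code-unit level (what a 16-bit buffer would hold) there are two.
units = utf16_code_units(clef)
print([hex(u) for u in units])            # ['0xd834', '0xdd1e']
print(hex(combine_surrogates(*units)))    # 0x1d11e

# UTF-8 has no surrogates: the same code point is simply four bytes.
print(clef.encode("utf-8").hex())         # f09d849e
```

One reason to keep surrogates in a 16-bit buffer rather than in a string: in Python, at least, a lone surrogate such as "\ud834" cannot be encoded as strict UTF-8 ("\ud834".encode("utf-8") raises UnicodeEncodeError), so treating surrogates as "characters" only really works at the code-unit level.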