> All in all, looks really good, especially the fact that it defaults to a grapheme view rather than a codepoint view. I also like the escape valve for drilling down to bytes if you really need it, but it reminds me that we'll need something similar for drilling down to codepoints for those charsets that define graphemes with multiple codepoints.
Ah, I'm too close to the source.
The charset API defines both "get_grapheme(s)" and "get_codepoint(s)" entry points. get_codepoint returns a single 32-bit integer; the rest return STRINGs with the appropriate stuff in 'em. I think this covers what you're worried about.
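For anyone who wants to see the shape of that split, here is a minimal C sketch. Everything in it is an assumption for illustration: the STRING struct, the function signatures, the index conventions, and the toy grapheme rule (combining marks U+0300 to U+036F attach to the preceding codepoint) are hypothetical and are not the actual charset API. It shows get_codepoint handing back a single 32-bit integer while get_grapheme(s) and get_codepoints return STRINGs, so a grapheme made of several codepoints can still be drilled into one codepoint at a time.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a STRING: a buffer of codepoints.
   Error handling (e.g. malloc failure) is omitted in this sketch. */
typedef struct {
    uint32_t *cps;   /* codepoint buffer */
    size_t    len;   /* number of codepoints */
} STRING;

/* Toy grapheme rule (assumption): combining marks U+0300..U+036F attach
   to the preceding codepoint, so one grapheme may span several codepoints. */
static int is_combining(uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

STRING *string_new(const uint32_t *cps, size_t len) {
    STRING *s = malloc(sizeof *s);
    s->cps = malloc((len ? len : 1) * sizeof *s->cps);
    if (len) memcpy(s->cps, cps, len * sizeof *cps);
    s->len = len;
    return s;
}

void string_free(STRING *s) {
    free(s->cps);
    free(s);
}

/* Codepoint index where grapheme number g starts (s->len if past the end). */
static size_t grapheme_start(const STRING *s, size_t g) {
    size_t i = 0;
    while (i < s->len && g > 0) {
        i++;                                              /* skip the base codepoint */
        while (i < s->len && is_combining(s->cps[i])) i++; /* and its marks */
        g--;
    }
    return i;
}

/* get_codepoint: one codepoint as a 32-bit integer (0 if out of range,
   which is an assumption of this sketch). idx counts codepoints. */
uint32_t get_codepoint(const STRING *s, size_t idx) {
    return idx < s->len ? s->cps[idx] : 0;
}

/* get_codepoints: up to count codepoints starting at idx, as a new STRING. */
STRING *get_codepoints(const STRING *s, size_t idx, size_t count) {
    if (idx > s->len) idx = s->len;
    if (count > s->len - idx) count = s->len - idx;
    return string_new(s->cps + idx, count);
}

/* get_graphemes: count graphemes starting at grapheme g, as a new STRING.
   g and count are in grapheme units, not codepoints. */
STRING *get_graphemes(const STRING *s, size_t g, size_t count) {
    size_t start = grapheme_start(s, g);
    size_t end   = grapheme_start(s, g + count);
    return string_new(s->cps + start, end - start);
}

/* get_grapheme: a single grapheme, still returned as a STRING because it
   may contain more than one codepoint. */
STRING *get_grapheme(const STRING *s, size_t g) {
    return get_graphemes(s, g, 1);
}
```

With the input {U+0065, U+0301, U+0078} ("e" plus a combining acute, then "x"), get_grapheme(s, 0) yields a two-codepoint STRING, and get_codepoint(s, 1) drills into it to return 0x301. Returning STRINGs for everything except the single-codepoint call keeps the grapheme view as the default while still leaving a codepoint-level escape valve.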
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk