Updates:
        Status: Accepted
        Labels: Priority-Low

Comment #14 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761

I'm not sure what you are trying to send here. The "\u" suggests that it's part of a string, but in that case it should be followed by ASCII hex digits.

You can't send the character U+1D356 to the V8 JavaScript engine, since it simply doesn't recognize code points outside the BMP.

Since you are running in a browser, the above discussion doesn't apply - that was about the V8 API. When running in the browser, UTF-8 decoding is generally handled by WebKit.

If you want to send the two 16-bit words D834 and DF56, and the browser will be the one interpreting the input first, send the character's UTF-8 encoding as part of a normal HTML file or JS file. The browser then expands it into the two surrogate code units before passing it to V8. This only works for valid character encodings (my U+10000 above should be encoded as F0 90 80 80; then it works too).
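
As a rough illustration of the arithmetic involved, the sketch below computes the surrogate pair and the four-byte UTF-8 form of a non-BMP code point. The helper names (toSurrogatePair, toUtf8FourBytes, hex) are made up for this example and are not part of V8 or WebKit; the code assumes its input is in the range U+10000..U+10FFFF.

```javascript
// Hypothetical helpers for illustration only; not V8 or WebKit code.

// Split a code point in U+10000..U+10FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;      // 20-bit value
  var high = 0xD800 + (offset >> 10);    // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return [high, low];
}

// Encode a code point in U+10000..U+10FFFF as four UTF-8 bytes.
function toUtf8FourBytes(codePoint) {
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F)
  ];
}

// Format a list of numbers as space-separated uppercase hex.
function hex(values) {
  return values.map(function (v) {
    return v.toString(16).toUpperCase();
  }).join(' ');
}

console.log(hex(toSurrogatePair(0x1D356)));   // D834 DF56
console.log(hex(toUtf8FourBytes(0x1D356)));   // F0 9D 8D 96
console.log(hex(toUtf8FourBytes(0x10000)));   // F0 90 80 80
```

So, under these assumptions, the bytes to put in the file for U+1D356 would be F0 9D 8D 96, and the browser would hand V8 the code units D834 DF56.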

I haven't checked whether Chrome does something else to characters coming through a WebSocket, but I would try the same thing there.

If you are embedding V8 directly, and creating strings through the API, then it's a different matter, because then you use the V8 UTF-8 decoder, which turns any non-BMP character into U+FFFD. That's the one that we might consider changing (if it can be done without breaking the parser/preparser interaction), but it's not a high priority. I'll reopen this feature request.
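
To make the behavior described above concrete, here is a simplified model of a decoder that keeps BMP characters and substitutes U+FFFD for every four-byte sequence. It is only a sketch written for this message, not the actual V8 decoder: the function name decodeBmpOnly is invented, and the code assumes well-formed input and does not validate continuation bytes.

```javascript
// Hypothetical model for illustration; this is NOT V8's decoder.
// Decodes UTF-8 bytes into 16-bit code units and replaces every
// four-byte (non-BMP) sequence with U+FFFD. Assumes well-formed input.
function decodeBmpOnly(bytes) {
  var units = [];
  var i = 0;
  while (i < bytes.length) {
    var b = bytes[i];
    if (b < 0x80) {
      // One byte: ASCII.
      units.push(b);
      i += 1;
    } else if ((b & 0xE0) === 0xC0) {
      // Two bytes: U+0080..U+07FF.
      units.push(((b & 0x1F) << 6) | (bytes[i + 1] & 0x3F));
      i += 2;
    } else if ((b & 0xF0) === 0xE0) {
      // Three bytes: U+0800..U+FFFF.
      units.push(((b & 0x0F) << 12) |
                 ((bytes[i + 1] & 0x3F) << 6) |
                 (bytes[i + 2] & 0x3F));
      i += 3;
    } else if ((b & 0xF8) === 0xF0) {
      // Four bytes: a non-BMP character, replaced as a whole.
      units.push(0xFFFD);
      i += 4;
    } else {
      // Stray continuation or invalid lead byte.
      units.push(0xFFFD);
      i += 1;
    }
  }
  return units;
}

// "A", U+1D356, "B": the middle character is lost.
var out = decodeBmpOnly([0x41, 0xF0, 0x9D, 0x8D, 0x96, 0x42]);
console.log(out.map(function (u) { return u.toString(16); }).join(' '));
// 41 fffd 42
```

A decoder that preserved non-BMP characters would instead push the two surrogate code units in the four-byte branch.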

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
