[v8-dev] Re: Issue 761 in v8: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

codesite-noreply Thu, 15 Sep 2011 05:06:45 -0700

Updates:
        Status: Accepted
        Labels: Priority-Low

Comment #14 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

http://code.google.com/p/v8/issues/detail?id=761

I'm not sure what you are trying to send here. The "\u" suggests that it's part of a string, but in that case the following should be ASCII hex digits.

You can't send the character U+1D356 to the V8 JavaScript engine, since it simply doesn't recognize code points outside the BMP.

Since you are running in a browser, the above discussion doesn't apply - that was about the V8 API. When running in the browser, UTF-8 decoding is generally handled by WebKit.

If you want to send the two 16-bit words D834 and DF56, and the browser will be the one interpreting it first, you send the UTF-8 encoding as part of a normal HTML file or JS file. Then it will be expanded into the two surrogate codes before being passed to V8. It only works for valid character encodings (my U+10000 above should be encoded as F0 90 80 80, then it works too).
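
For reference, here is a small sketch of the arithmetic behind those numbers. The helper names (toSurrogatePair, toUtf8, hex) are made up for illustration and are not part of V8 or WebKit; the functions only handle code points in the range U+10000..U+10FFFF.

```javascript
// Illustrative helpers only; these names are not V8 or WebKit APIs.

// Split a supplementary code point (U+10000..U+10FFFF) into the two
// 16-bit surrogate code units that JavaScript strings actually store.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

// Encode a supplementary code point as its four-byte UTF-8 sequence.
function toUtf8(codePoint) {
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F)
  ];
}

// Format a list of numbers as upper-case hex, separated by spaces.
function hex(values) {
  return values.map(function (v) {
    return v.toString(16).toUpperCase();
  }).join(" ");
}

console.log(hex(toSurrogatePair(0x1D356))); // D834 DF56
console.log(hex(toUtf8(0x1D356)));          // F0 9D 8D 96
console.log(hex(toUtf8(0x10000)));          // F0 90 80 80

// Inside a string literal, each "\u" escape names one 16-bit code unit,
// so the same character is written as two escapes.
console.log("\uD834\uDF56".length);         // 2
```

So the bytes to put in the HTML or JS file for U+1D356 would be F0 9D 8D 96, and the script should then see a string of length 2.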

I haven't checked whether Chrome does something else to characters coming through a web-socket, but I would try the same thing there.

If you are embedding V8 directly, and creating strings through the API, then it's a different matter, because then you use the V8 UTF-8 decoder, which turns any non-BMP character into U+FFFD. That's the one that we might consider changing (if it can be done without breaking the parser/preparser interaction), but it's not a high priority. I'll reopen this feature request.
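
To make the difference between the two paths concrete, here is a toy decoder written in JavaScript. It is not V8's or WebKit's actual code: decodeUtf8 and its replaceNonBmp flag are hypothetical, and the sketch assumes well-formed UTF-8 input with no validation. With the flag set it mimics the behaviour described above for the API (one U+FFFD per non-BMP character); without it, it expands to a surrogate pair the way the browser path is described.

```javascript
// Toy model only; not the real V8 or WebKit decoder.
// Assumes well-formed UTF-8 input and does no error checking.
function decodeUtf8(bytes, replaceNonBmp) {
  var units = [];
  var i = 0;
  while (i < bytes.length) {
    var lead = bytes[i];
    var codePoint, length;
    if (lead < 0x80) {
      codePoint = lead; length = 1;          // 0xxxxxxx
    } else if ((lead & 0xE0) === 0xC0) {
      codePoint = lead & 0x1F; length = 2;   // 110xxxxx
    } else if ((lead & 0xF0) === 0xE0) {
      codePoint = lead & 0x0F; length = 3;   // 1110xxxx
    } else {
      codePoint = lead & 0x07; length = 4;   // 11110xxx
    }
    // Fold in six payload bits from each continuation byte.
    for (var k = 1; k < length; k++) {
      codePoint = (codePoint << 6) | (bytes[i + k] & 0x3F);
    }
    i += length;

    if (codePoint < 0x10000) {
      units.push(codePoint);                 // BMP: one code unit
    } else if (replaceNonBmp) {
      units.push(0xFFFD);                    // behaviour described for the API
    } else {
      var offset = codePoint - 0x10000;      // expand to a surrogate pair
      units.push(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
    }
  }
  return units;
}

var input = [0xF0, 0x9D, 0x8D, 0x96];        // UTF-8 for U+1D356
console.log(decodeUtf8(input, true));        // one unit: 0xFFFD
console.log(decodeUtf8(input, false));       // two units: 0xD834, 0xDF56
```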


--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
