Comment #18 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761

The \uXXXX sequence is recognized only in ECMAScript string and RegExp literals and in identifiers. It is always a six-character sequence: '\u' followed by four ASCII hex digits. The example above used four-byte sequences containing non-ASCII bytes rather than ASCII hex digits, and they were not obviously inside a string or RegExp literal.
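To illustrate the distinction: a code point outside the BMP (above U+FFFF) cannot be written as a single \uXXXX escape at all; in ECMAScript 3/5 it must be written as a surrogate pair of two such escapes, each with exactly four ASCII hex digits. A small sketch (using U+1D11E, the musical G clef, as an example character):

```javascript
// U+1D11E cannot be expressed as one \uXXXX escape; it is written as
// the UTF-16 surrogate pair \uD834\uDD1E inside a string literal.
var clef = "\uD834\uDD1E";

// String length counts UTF-16 code units, so one non-BMP character
// contributes two to .length.
clef.length;               // 2
clef.charCodeAt(0);        // 0xD834 (high surrogate)
clef.charCodeAt(1);        // 0xDD1E (low surrogate)
```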

In any case, I agree that we should have consistent behavior. If non-BMP code points encoded as UTF-8 are treated one way when entering the browser as part of a script, but differently when arriving over a WebSocket, that's a problem. I'd say it's the browser code's responsibility to normalize the data the same way before passing it on to JavaScript.

I'll see if I can reproduce it locally, and then I'll open a Chromium bug for it (or you can go ahead and do that, since you already have an example). Then we'll see whether it should be handled inside WebKit or non-V8 Chromium (like other incoming UTF-8 data), or delegated to V8 (in which case our UTF-8 decoder needs changing).
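For reference, the transformation a correct decoder would have to perform for a 4-byte UTF-8 sequence can be sketched as follows. This is not V8's actual decoder, just the standard arithmetic: the four-byte sequence F0 9D 84 9E encodes U+1D11E, which a JavaScript string must hold as a UTF-16 surrogate pair.

```javascript
// Sketch of decoding one 4-byte UTF-8 sequence into the UTF-16
// surrogate pair that JavaScript strings use (not V8's real code).
function decode4(b0, b1, b2, b3) {
  // Reassemble the 21-bit code point from the payload bits.
  var cp = ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) |
           ((b2 & 0x3F) << 6)  |  (b3 & 0x3F);
  // Split into a surrogate pair (cp is above U+FFFF by construction).
  var offset = cp - 0x10000;
  var hi = 0xD800 + (offset >> 10);    // high (lead) surrogate
  var lo = 0xDC00 + (offset & 0x3FF);  // low (trail) surrogate
  return String.fromCharCode(hi, lo);
}

decode4(0xF0, 0x9D, 0x84, 0x9E); // "\uD834\uDD1E", i.e. U+1D11E
```

If the same bytes are instead decoded naively (or passed through twice), the result diverges from this, which would explain the inconsistency between entry paths.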

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev