[v8-dev] Re: Issue 761 in v8: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

codesite-noreply Tue, 13 Sep 2011 02:19:41 -0700

Updates:
        Owner: [email protected]

Comment #7 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

http://code.google.com/p/v8/issues/detail?id=761


No plans at the moment, no.

We will never (barring a development in ECMAScript) support surrogate pairs in JavaScript strings. Characters with codes in the surrogate pair range are considered a single stand-alone character from JavaScript's point of view.

I'm reconsidering whether it's possible to convert all incoming UTF-8 into UTF-16 sequences instead of UCS-2 (i.e., convert a non-BMP character into a surrogate pair). This will be on input only, and won't make sense outside of comments and String and RegExp literals (since a surrogate code isn't valid anywhere else). It's likely to confuse users, since we won't ever interpret the result as UTF-16 anyway. That means that the length of a string literal containing non-BMP characters is different from the number of Unicode characters sent as UTF-8.
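
To make the length mismatch concrete, here is a minimal sketch in Python of the standard UTF-16 surrogate-pair arithmetic applied to UTF-8 input. It is not V8 code; the helper names to_surrogate_pair and utf16_code_units are made up for illustration, and the sample string is an assumption chosen to contain one non-BMP character.

```python
from typing import List, Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a non-BMP code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> lead surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> trail surrogate
    return high, low


def utf16_code_units(utf8_bytes: bytes) -> List[int]:
    """Decode UTF-8 and return the 16-bit code units a UTF-16 conversion would yield."""
    units: List[int] = []
    for ch in utf8_bytes.decode("utf-8"):
        cp = ord(ch)
        if cp >= 0x10000:
            units.extend(to_surrogate_pair(cp))  # one character, two code units
        else:
            units.append(cp)                     # BMP character, one code unit
    return units


# Hypothetical input: 'a', U+1D11E (MUSICAL SYMBOL G CLEF), 'b'
source = "a\U0001D11Eb".encode("utf-8")
units = utf16_code_units(source)

print(len(source.decode("utf-8")))  # Unicode characters sent as UTF-8: 3
print(len(units))                   # 16-bit code units after conversion: 4
print([hex(u) for u in units])      # ['0x61', '0xd834', '0xdd1e', '0x62']
```

With such a conversion, a JavaScript string built from this input would report a length of 4 even though only three Unicode characters were sent, which is the mismatch described above.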

If we can avoid problems with the parsers by always counting non-BMP input as two code-points, then this might be possible, but it's not obvious that it's desirable, except for very specific uses.

As such, no current plans to change anything.

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
