Updates:
Owner: [email protected]
Comment #7 on issue 761 by [email protected]: Incorrect UTF-8
encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761
No plans at the moment, no.
Barring a change in ECMAScript, we will never support surrogate pairs in
JavaScript strings. From JavaScript's point of view, each code in the
surrogate range is a single stand-alone character.
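To illustrate the point, here is a small sketch of how a string holding the two surrogate codes for U+1D11E looks from script. The choice of U+1D11E and the variable names are only for illustration; the calls used are the standard String ones.

```javascript
// Build a string from the two surrogate codes that UTF-16 would use
// for U+1D11E (MUSICAL SYMBOL G CLEF).
var clef = String.fromCharCode(0xD834, 0xDD1E);

// Each 16-bit code is counted and addressed as its own character.
var unitCount = clef.length;        // two characters, not one
var first = clef.charCodeAt(0);     // the high surrogate code
var second = clef.charCodeAt(1);    // the low surrogate code
var firstAlone = clef.charAt(0);    // a one-character string holding a lone surrogate
```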
I'm reconsidering whether it's possible to convert all incoming UTF-8 into
UTF-16 sequences instead of UCS-2 (i.e., convert a non-BMP character into a
surrogate pair).
This will be on input only, and won't make sense outside of comments and
String and RegExp literals (since a surrogate code isn't valid anywhere
else).
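For reference, the conversion being considered is the standard UTF-16 split of a code point above U+FFFF into two surrogate codes. A minimal sketch follows; toSurrogatePair is a hypothetical helper written for this comment, not a function in V8.

```javascript
// Hypothetical helper (not part of V8): split a non-BMP code point
// into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a non-BMP code point: " + codePoint);
  }
  var offset = codePoint - 0x10000;      // 20-bit value
  var high = 0xD800 + (offset >> 10);    // top 10 bits select the high surrogate
  var low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits select the low surrogate
  return [high, low];
}

// Example input: U+1D11E (MUSICAL SYMBOL G CLEF).
var pair = toSurrogatePair(0x1D11E);
```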
It's likely to confuse users, since we won't ever interpret the result as
UTF-16 anyway. That means that the length of a string literal containing
non-BMP characters is different from the number of Unicode characters sent
as UTF-8.
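A rough sketch of that mismatch, assuming well-formed UTF-8 input: countUtf8 is a hypothetical helper that counts the Unicode characters in a byte sequence and the number of 16-bit codes the same input would occupy after conversion to surrogate pairs.

```javascript
// Hypothetical helper: count characters in well-formed UTF-8 two ways.
// Assumes the input is valid UTF-8; no error checking is done.
function countUtf8(bytes) {
  var codePoints = 0;
  var utf16Units = 0;
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    if ((b & 0xC0) === 0x80) continue;   // continuation byte, belongs to the previous character
    codePoints += 1;
    utf16Units += (b >= 0xF0) ? 2 : 1;   // 4-byte sequences encode non-BMP characters
  }
  return { codePoints: codePoints, utf16Units: utf16Units };
}

// "a", U+1D11E, "b" encoded as UTF-8.
var counts = countUtf8([0x61, 0xF0, 0x9D, 0x84, 0x9E, 0x62]);
```

Here counts.codePoints is 3 while counts.utf16Units is 4, so a string literal with this content would report a length of 4.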
If we can avoid problems with the parsers by always counting each non-BMP
input character as two 16-bit code units, then this might be possible, but
it's not obvious that it's desirable, except for very specific uses.
As such, no current plans to change anything.
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev