Comment #4 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761
After closer inspection, I don't see any way we can safely use the second option.
We parse the same input either as UTF-8 or as a String value, and the two must parse identically. More precisely, the pre-parser parses it as UTF-8, and the real parser parses it from a string later on. The pre-parser stores indices into the string for later use, so the number of code points in the two representations MUST be the same. That means we can't turn one code point into a surrogate pair as long as we don't parse the string as UTF-16 too.
I.e., the second option would also require changing all parsing from string values to interpret the string value as UTF-16, which is a larger change.
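To make the index mismatch concrete, here is a small illustrative sketch (not V8 code, and the sample source string is invented): a non-BMP character is one code point when the pre-parser decodes UTF-8, but two UTF-16 code units once it becomes a surrogate pair in the string value, so any stored index past that character would point at different positions in the two representations.

```python
# Illustration of why code-point indices computed over UTF-8 input
# diverge from UTF-16 indices once a non-BMP character is represented
# as a surrogate pair. The sample string is hypothetical.

s = "var x\U0001D306y"  # U+1D306 TETRAGRAM FOR CENTRE, outside the BMP

# What a UTF-8 pre-parser counts: one unit per decoded code point.
# (Python strings index by code point.)
code_points = len(s)

# What a parser reading a UTF-16 string value counts: one unit per
# 16-bit code unit, so the astral character occupies two units.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points, utf16_units)  # 7 vs 8 -- indices after U+1D306 disagree
```

Any pre-parser index recorded after the astral character (e.g. the position of `y`) would be off by one in the UTF-16 view, which is exactly why the surrogate-pair option is unsafe without also reparsing string values as UTF-16.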
-- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev
