[v8-dev] Re: Issue 761 in v8: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

codesite-noreply Wed, 14 Sep 2011 01:20:03 -0700

Comment #12 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

http://code.google.com/p/v8/issues/detail?id=761

While it would be convenient to convert UTF-8 to UTF-16 and then treat it as UCS-2, we should still be compatible with other browsers.

Currently we match Safari and IE: A four-byte sequence like F0 90 80 80 (UTF-8 of U+10000) is converted to four U+FFFD characters (probably because the first byte isn't recognized by the decoder, and the following bytes aren't valid UTF-8 starters).

(In comparison, Opera and Firefox read it as one U+FFFD. They obviously decode the UTF-8 correctly, and then replace the single non-BMP character with U+FFFD.)
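
To make the two behaviours concrete, here is a rough Python sketch. It is not V8 code or any browser's actual decoder: the names decode_bmp_only and decode_then_replace_non_bmp are made up for illustration, and the first function is only a guess at the "four U+FFFD" mechanism described above (it also skips overlong and surrogate checks). The input is F0 90 80 80, the UTF-8 encoding of U+10000.

```python
REPLACEMENT = "\ufffd"


def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (10xxxxxx)."""
    return 0x80 <= byte <= 0xBF


def decode_bmp_only(data: bytes) -> str:
    """Hypothetical decoder that only knows 1- to 3-byte sequences.

    Any byte that cannot start such a sequence (a four-byte lead byte
    or a stray continuation byte) is replaced by U+FFFD on its own.
    Simplified: overlong forms and surrogates are not rejected.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            need, cp = 0, b
        elif 0xC0 <= b <= 0xDF:
            need, cp = 1, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            need, cp = 2, b & 0x0F
        else:
            # Unrecognized lead byte or stray continuation byte.
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) == need and all(is_continuation(t) for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            out.append(chr(cp))
            i += 1 + need
        else:
            # Truncated or malformed sequence: replace the lead byte only.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


def decode_then_replace_non_bmp(data: bytes) -> str:
    """Decode UTF-8 fully, then map each non-BMP character to U+FFFD."""
    text = data.decode("utf-8")
    return "".join(REPLACEMENT if ord(ch) > 0xFFFF else ch for ch in text)


sample = bytes([0xF0, 0x90, 0x80, 0x80])  # UTF-8 for U+10000
print(len(decode_bmp_only(sample)))              # number of U+FFFD produced
print(len(decode_then_replace_non_bmp(sample)))  # number of U+FFFD produced
print("\U00010000".encode("utf-16-be").hex())    # UTF-16 surrogate pair
```

The first function yields one replacement character per input byte, the second yields a single one. The last line shows the alternative of keeping the character as a UTF-16 surrogate pair.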


We should probably keep compatibility for now.

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
