[v8-dev] Re: Issue 761 in v8: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

codesite-noreply Mon, 12 Mar 2012 06:30:05 -0700

Comment #33 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions

http://code.google.com/p/v8/issues/detail?id=761

The bleeding edge revision 11007 has fixes to handle surrogate pairs on input and output. The intended behaviour is:


* 4-byte UTF-8 sequences turn into 2 surrogates in the JS String

* Two 3-byte UTF-8 sequences can also be used to create 2 surrogates in the JS String

* String.fromCharCode(x) takes a single UTF-16 code unit, so you still can't give it numbers above 0xffff

* Most places in JS (RegExp, [], charCodeAt, charAt, etc.) work on UTF-16 code units with no special treatment for surrogates.

* On output to UTF-8, unmatched surrogates map to a 3-byte UTF-8 sequence, and surrogate pairs map to a single 4-byte UTF-8 sequence.
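
The behaviour listed above can be illustrated with a small JavaScript sketch. The utf8Encode function below is a hypothetical helper written only for this illustration; it is not V8's implementation and only mirrors the output rule described (a surrogate pair becomes one 4-byte sequence, an unmatched surrogate becomes a 3-byte sequence). U+1D11E is just an arbitrary non-BMP code point chosen as an example.

```javascript
// Hypothetical helper, not V8's code: encodes a JS string (a sequence of
// UTF-16 code units) to an array of UTF-8 bytes using the rules above.
function utf8Encode(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let c = str.charCodeAt(i);
    // A lead surrogate followed by a trail surrogate forms one code point.
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        c = 0x10000 + ((c - 0xD800) << 10) + (next - 0xDC00);
        i++; // the trail surrogate has been consumed
      }
    }
    if (c < 0x80) {
      bytes.push(c);
    } else if (c < 0x800) {
      bytes.push(0xC0 | (c >> 6), 0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
      // This branch also handles unmatched surrogates (3-byte sequence).
      bytes.push(0xE0 | (c >> 12), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F));
    } else {
      bytes.push(0xF0 | (c >> 18), 0x80 | ((c >> 12) & 0x3F),
                 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F));
    }
  }
  return bytes;
}

// U+1D11E is stored as two UTF-16 code units (a surrogate pair).
const clef = String.fromCharCode(0xD834, 0xDD1E);
const pairBytes = utf8Encode(clef);            // 0xF0 0x9D 0x84 0x9E
const loneBytes = utf8Encode(clef.charAt(0));  // 0xED 0xA0 0xB4

// String.fromCharCode keeps only the low 16 bits of its argument,
// so passing 0x1D11E yields the single code unit 0xD11E.
const truncated = String.fromCharCode(0x1D11E);
```

Here clef.length is 2 and clef.charCodeAt(0) returns 0xD834, since indexing works on code units rather than code points.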


--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
