Status: New
Owner: ----
New issue 761 by kangxn: Incorrect UTF-8 encoding/decoding for non-BMP
characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761
Non-BMP Unicode characters (code points above 0xFFFF) are encoded as 4 bytes
in UTF-8 and as a surrogate pair (2 uint16_ts) in UTF-16. V8 does not handle
them correctly.
Sample case:
Unicode: \U00010412
UTF-8: f0 90 90 92
UTF-16LE: 01 d8 12 dc ('\ud801\udc12')
String::New fails to accept the UTF-8 string and returns an empty string.
String::WriteUtf8 writes 6 bytes for the sample instead of the expected 4.
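
The byte values in the sample case can be checked with a short sketch. The Python below is only an illustration of the encoding arithmetic, not V8 code. The helper names utf8_encode and utf16_units are hypothetical. The last step shows one plausible explanation for the 6-byte output, namely that each UTF-16 surrogate is encoded as its own 3-byte sequence. That is an assumption and is not confirmed by the report.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single 16-bit unit or code point value as UTF-8 (1 to 4 bytes).

    Deliberately does not reject surrogates (0xD800-0xDFFF), so it can also
    show what a per-code-unit encoder would emit.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_units(cp: int) -> list:
    """Split a code point into UTF-16 code units (surrogate pair if non-BMP)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


cp = 0x10412                      # the sample character from the report
correct = utf8_encode(cp)         # well-formed 4-byte UTF-8
units = utf16_units(cp)           # surrogate pair
# Assumed failure mode: encode each surrogate separately (3 bytes each).
per_unit = b"".join(utf8_encode(u) for u in units)

print(correct.hex(" "))           # f0 90 90 92
print([hex(u) for u in units])    # ['0xd801', '0xdc12']
print(per_unit.hex(" "), len(per_unit))  # ed a0 81 ed b0 92 6
```

If the assumption holds, a fix would need to combine the surrogate pair back into one code point before encoding, and to accept 4-byte sequences when decoding.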
--
v8-dev mailing list
http://groups.google.com/group/v8-dev