Status: New
Owner: ----

New issue 761 by kangxn: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761

Non-BMP Unicode characters (code points above 0xFFFF) are represented as 4 bytes in UTF-8 and as 2 uint16_t code units (a surrogate pair) in UTF-16. V8 does not handle them correctly.

Sample case:

Unicode: \U00010412
UTF-8: f0 90 90 92
UTF-16LE: 01 d8 12 dc ('\ud801\udc12')
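
For reference, the sample bytes above can be reproduced with a small standalone encoder. This is only a sketch of the standard UTF-8 and UTF-16 encoding rules, not V8 code; EncodeUtf8 and EncodeUtf16 are hypothetical helper names introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Hypothetical helper, not part of the V8 API.
std::string EncodeUtf8(uint32_t cp) {
  assert(cp <= 0x10FFFF);
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    // Non-BMP code points take exactly 4 bytes.
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Encode one Unicode code point as UTF-16 code units (1 or 2 units).
// Hypothetical helper, not part of the V8 API.
std::vector<uint16_t> EncodeUtf16(uint32_t cp) {
  assert(cp <= 0x10FFFF);
  if (cp < 0x10000) {
    return {static_cast<uint16_t>(cp)};
  }
  // Non-BMP code points become a high/low surrogate pair.
  cp -= 0x10000;
  return {static_cast<uint16_t>(0xD800 | (cp >> 10)),
          static_cast<uint16_t>(0xDC00 | (cp & 0x3FF))};
}
```

Calling EncodeUtf8(0x10412) yields the bytes f0 90 90 92, and EncodeUtf16(0x10412) yields the code units 0xD801 0xDC12, which are 01 d8 12 dc when stored little-endian.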

String::New fails to accept the UTF-8 string and returns an empty string.
String::WriteUtf8 writes 6 bytes for the sample instead of the expected 4.
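
One plausible explanation for the 6-byte output, which is an assumption and not something this report confirms, is that each UTF-16 code unit of the surrogate pair is encoded on its own as a 3-byte sequence. The sketch below contrasts that per-unit conversion with one that combines the pair first. All function names are hypothetical and none of this is V8 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of `cp` to `*out`. Surrogate values are
// deliberately not rejected, so the per-unit path can be demonstrated.
void AppendUtf8(uint32_t cp, std::string* out) {
  assert(out != nullptr);
  if (cp < 0x80) {
    *out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    *out += static_cast<char>(0xC0 | (cp >> 6));
    *out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    *out += static_cast<char>(0xE0 | (cp >> 12));
    *out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    *out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    *out += static_cast<char>(0xF0 | (cp >> 18));
    *out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    *out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    *out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Combines each valid surrogate pair into one code point before encoding.
// Unpaired surrogates are passed through unchanged in this sketch.
std::string Utf16ToUtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;  // The low surrogate has been consumed.
    }
    AppendUtf8(cp, &out);
  }
  return out;
}

// Assumed faulty behavior: encodes every code unit independently, so a
// surrogate pair becomes two 3-byte sequences.
std::string Utf16ToUtf8PerUnit(const std::u16string& in) {
  std::string out;
  for (char16_t unit : in) {
    AppendUtf8(unit, &out);
  }
  return out;
}
```

For the sample pair 0xD801 0xDC12, Utf16ToUtf8 produces the 4 bytes f0 90 90 92, while Utf16ToUtf8PerUnit produces 6 bytes, matching the length observed from String::WriteUtf8.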



--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
