Comment #29 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761
I don't see why this issue was closed... I think offering a full UTF-8 codec is still a valid feature request.
I think V8 should take a pragmatic approach here. I don't mind if the JavaScript String.* functions simply ignore surrogate pairs (which conforms to the standard); I just want to be able to move UTF-8 data to and from a socket, a file, or a database, converting it to and from V8 strings along the way, without losing data because of the few characters outside the BMP.
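To make the requested behaviour concrete, here is a minimal sketch of the encoding half of such a codec. It is written in Python purely for illustration and is not V8 code; the function name utf16_units_to_utf8 is hypothetical. It assumes the string is available as a sequence of UTF-16 code units and combines each surrogate pair into one code point before emitting a 4-byte UTF-8 sequence.

```python
def utf16_units_to_utf8(units):
    """Encode a sequence of UTF-16 code units (ints) as UTF-8 bytes.

    Hypothetical helper for illustration only; not part of V8.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        # A high surrogate followed by a low surrogate is one non-BMP code point.
        if 0xD800 <= cp <= 0xDBFF and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # Note: an unpaired surrogate falls through to the 3-byte branch below;
        # a real codec would have to decide how to handle that case.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)


# U+1D11E (MUSICAL SYMBOL G CLEF) is stored as the surrogate pair D834 DD1E.
clef = utf16_units_to_utf8([0xD834, 0xDD1E])
```

With the pair combined first, the character is written as a single 4-byte sequence rather than as two separately encoded surrogates, which is what lets the data survive a round trip through a file or socket.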
If you are interested, I have a patch for V8 that does this.

Regards,
M.

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
