Comment #6 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761

We are using WebSocket to send bulk data from a native application to a web browser application written in JavaScript. Our native application sends this data encoded as UTF-8.

The JavaScript application in the browser works fine when the data contains only characters in the Basic Multilingual Plane. If the data contains a UTF-8 encoded character outside the Basic Multilingual Plane (a code point above U+FFFF, which UTF-16 represents as a surrogate pair), that character is replaced with U+FFFD (REPLACEMENT CHARACTER). As a result, the JavaScript application never knows what string was actually received.

One option is to work around this by using UTF-16 surrogate pairs for those code points. However, our original data is in UTF-8, and converting these characters from UTF-8 to UTF-16 requires scanning the complete string, locating each such character, and replacing it with a UTF-16 surrogate pair. This replacement is a costly operation and slows down the whole application.
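To illustrate the conversion described above, here is a rough sketch of a manual decoder that scans the UTF-8 bytes and emits a surrogate pair for every 4-byte sequence. The name `decodeUtf8` is hypothetical (it is not a V8 or browser API), and the sketch assumes the input is an array of well-formed UTF-8 bytes; it does no validation.

```javascript
// Hypothetical helper, not part of V8 or any browser API.
// Assumption: `bytes` is an Array or Uint8Array of well-formed UTF-8.
// Malformed or truncated sequences are NOT detected.
function decodeUtf8(bytes) {
  var out = "";
  var i = 0;
  while (i < bytes.length) {
    var b0 = bytes[i];
    var cp;
    if (b0 < 0x80) {
      // 1-byte sequence: U+0000..U+007F
      cp = b0;
      i += 1;
    } else if (b0 < 0xE0) {
      // 2-byte sequence: U+0080..U+07FF
      cp = ((b0 & 0x1F) << 6) | (bytes[i + 1] & 0x3F);
      i += 2;
    } else if (b0 < 0xF0) {
      // 3-byte sequence: U+0800..U+FFFF
      cp = ((b0 & 0x0F) << 12) |
           ((bytes[i + 1] & 0x3F) << 6) |
           (bytes[i + 2] & 0x3F);
      i += 3;
    } else {
      // 4-byte sequence: U+10000..U+10FFFF (outside the BMP)
      cp = ((b0 & 0x07) << 18) |
           ((bytes[i + 1] & 0x3F) << 12) |
           ((bytes[i + 2] & 0x3F) << 6) |
           (bytes[i + 3] & 0x3F);
      i += 4;
    }
    if (cp > 0xFFFF) {
      // Split the code point into a UTF-16 high/low surrogate pair.
      var v = cp - 0x10000;
      out += String.fromCharCode(0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF));
    } else {
      out += String.fromCharCode(cp);
    }
  }
  return out;
}

// U+1F600 is encoded in UTF-8 as F0 9F 98 80.
var s = decodeUtf8([0xF0, 0x9F, 0x98, 0x80]);
// s.length is 2: the string holds the surrogate pair D83D DE00.
```

Every byte of the message is visited in script code, which is the per-message cost we would like to avoid.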

Is there any plan to support UTF-8 encoded code points outside the Basic Multilingual Plane in the browser itself, so that this costly conversion can be avoided?

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
