Comment #15 on issue 761 by [email protected]: Incorrect UTF-8 encoding/decoding for non-BMP characters in String related functions
http://code.google.com/p/v8/issues/detail?id=761

Thanks for your reply. Actually, I am sending UTF-8 encoded data from a native (C) application to a client inside the Chrome browser, which receives the data over a WebSocket using JavaScript (I think V8 is used for this). This data contains non-BMP characters as well.

But, apparently due to a limitation of the V8 engine, I have seen these characters converted into U+FFFD (the replacement character) in the Chrome browser.

So I tried sending the non-BMP character as a UTF-16 surrogate pair instead,
e.g. the character 𝍖 (U+1D356), which is F0 9D 8D 96 in UTF-8 and D834 DF56 in UTF-16.
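The two encodings quoted above can be checked with a small C sketch. The helpers `utf8_encode` and `utf16_surrogates` are hypothetical names used only for illustration (they are not part of V8, WebKit, or any browser API); they follow the standard UTF-8 and UTF-16 bit layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-8.
 * Writes 1-4 bytes to out and returns the count, or 0 if cp is not a
 * valid scalar value (above U+10FFFF or a surrogate code point). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx + 2 continuation */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx + 3 continuation bytes (non-BMP characters) */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: split a non-BMP code point (U+10000..U+10FFFF)
 * into a UTF-16 surrogate pair. Returns 0 if cp is outside that range. */
int utf16_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                            /* leaves a 20-bit value */
    *hi = (uint16_t)(0xD800 | (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* bottom 10 bits */
    return 1;
}
```

For U+1D356, `utf8_encode` produces F0 9D 8D 96 and `utf16_surrogates` produces D834 and DF56, matching the values above. The surrogate pair is how the character is represented inside a JavaScript string; on the wire, a WebSocket text frame carries the UTF-8 form.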

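Native app (C):
        unsigned char out[5];   /* 4 payload bytes + NUL terminator */
        unsigned char *p = out;
        /* U+1D356 as a big-endian UTF-16 surrogate pair (D834 DF56) */
        *p++ = 0xd8;
        *p++ = 0x34;
        *p++ = 0xdf;
        *p++ = 0x56;
        *p = '\0';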
        
JavaScript in Chrome:
        var ws = new WebSocket('ws://localhost:12345/mySession');
        ws.onmessage = function(evt)
        {
                var reply = evt.data;
                // non-BMP char sent as UTF-16 surrogate pair: reply is an empty string
                // non-BMP char sent as UTF-8: reply contains U+FFFD (replacement char)
                console.log('reply: ' + reply);
        }
        
The native (C) code sends the character as raw UTF-16 bytes (the surrogate pair D834 DF56), but the JavaScript application in Chrome receives an empty string.
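Raw UTF-16 bytes are not valid UTF-8: 0xD8 is a two-byte UTF-8 lead byte, and the byte that follows it (0x34) is not a continuation byte. The WebSocket protocol as standardized in RFC 6455 requires text frames to carry valid UTF-8, so a conforming receiver would reject this payload; whether Chrome's WebSocket code at the time did exactly this and then reported an empty string is an assumption here, not something confirmed in this thread. The minimal sketch below uses a hypothetical `utf8_valid` helper that performs a strict UTF-8 check. It does not try to reproduce how any browser handles invalid input (for example, substituting U+FFFD).

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if buf[0..len) is well-formed UTF-8
 * (no overlong forms, no surrogate code points, nothing above U+10FFFF),
 * else 0. */
int utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;            /* number of continuation bytes expected */
        unsigned long cp;    /* decoded code point */

        if (c < 0x80) {      /* ASCII */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;        /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */
        }
        if (len - i <= n)
            return 0;        /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;    /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;        /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;        /* surrogate code points are not allowed in UTF-8 */
        if (cp > 0x10FFFF)
            return 0;        /* beyond the Unicode range */
        i += n + 1;
    }
    return 1;
}
```

With this check, the bytes D8 34 DF 56 from the native app above are rejected, while F0 9D 8D 96 is accepted. A surrogate pair encoded as two separate three-byte sequences (ED A0 B4 ED BD 96) is also rejected, because each half decodes to a surrogate code point.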

I am not sure whether this is a problem in V8, WebKit, or Chrome, but either way, data sent as a UTF-16 surrogate pair is not received.

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev