Re: [v8-users] Why UTF surrogate pairs are modified by V8?

2016-06-03 Thread Roman Budnyjj
Thank you for the explanation!

2016-06-03 9:42 GMT+03:00 Jochen Eisinger :

> Your input string is encoded as CESU-8, not UTF-8. Older versions of V8
> would silently accept that encoding; however, this leads to broken behavior
> when interacting with libraries that expect real UTF-8, so we changed our
> implementation to actually require UTF-8, and invalid characters are
> replaced when converting the string to our internal UTF-16 representation.
>
> On Thu, 2 June 2016, 20:44, Roman Budnyjj wrote:
>
>> Hi guys,
>> I'm trying to pass some string data to JS functions managed by V8.
>> These strings are UTF-8 encoded and contain surrogate pairs (emoji).
>> I've found that, for some reason, newer versions of the library
>> (5.1.281.56) modify these strings, so after converting them back to
>> std::string the contents are no longer the same as they were initially:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>>
>> #include <string>
>>
>> #include "include/libplatform/libplatform.h"
>> #include "include/v8.h"
>>
>>
>> class SampleArrayBufferAllocator : public v8::ArrayBuffer::Allocator {
>>  public:
>>   virtual void* Allocate(size_t length) {
>>     void* data = AllocateUninitialized(length);
>>     return data == NULL ? data : memset(data, 0, length);
>>   }
>>   virtual void* AllocateUninitialized(size_t length) { return malloc(length); }
>>   virtual void Free(void* data, size_t) { free(data); }
>> };
>>
>> int main(int argc, char* argv[]) {
>>   v8::V8::InitializeICU();
>>   v8::V8::InitializeExternalStartupData(argv[0]);
>>   v8::Platform* platform = v8::platform::CreateDefaultPlatform();
>>   v8::V8::InitializePlatform(platform);
>>   v8::V8::Initialize();
>>   v8::V8::SetFlagsFromCommandLine(&argc, argv, true);
>>   SampleArrayBufferAllocator array_buffer_allocator;
>>   v8::Isolate::CreateParams create_params;
>>   create_params.array_buffer_allocator = &array_buffer_allocator;
>>   v8::Isolate* isolate = v8::Isolate::New(create_params);
>>   {
>>     v8::Isolate::Scope isolate_scope(isolate);
>>     v8::HandleScope handle_scope(isolate);
>>     // v8::Local<v8::Context> context = CreateShellContext(isolate);
>>
>>     // \uD83C\uDC32 (U+1F032), each surrogate encoded as its own 3-byte sequence
>>     std::string src("\355\240\274\355\260\262");
>>     // Round-trip: bytes -> v8::String -> UTF-8 bytes.
>>     std::string dst =
>>         *v8::String::Utf8Value(
>>             v8::String::NewFromUtf8(
>>                 isolate, src.c_str(),
>>                 v8::NewStringType::kNormal).ToLocalChecked());
>>     if (src != dst) {
>>       printf("!\n");
>>     }
>>   }
>>
>>   // Tear down in reverse order of initialization.
>>   isolate->Dispose();
>>   v8::V8::Dispose();
>>   v8::V8::ShutdownPlatform();
>>   delete platform;
>>   return 0;
>> }
>>
>> It prints "!" both on my x64 machine and on android-19 (ARM).
>> I should also mention that on older versions of V8 (3.27.34) this
>> string stays unmodified.
>> Could you please explain the reason for this behavior?
>>
>> --
>> --
>> v8-users mailing list
>> v8-users@googlegroups.com
>> http://groups.google.com/group/v8-users
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "v8-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to v8-users+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>



Re: [v8-users] Why UTF surrogate pairs are modified by V8?

2016-06-03 Thread Jochen Eisinger
Your input string is encoded as CESU-8, not UTF-8. Older versions of V8
would silently accept that encoding; however, this leads to broken behavior
when interacting with libraries that expect real UTF-8, so we changed our
implementation to actually require UTF-8, and invalid characters are
replaced when converting the string to our internal UTF-16 representation.

