Re: [protobuf] BinaryToJsonString mishandling strings containing UTF8 multibyte characters

2016-04-10 Thread Ron Ben-Yosef
type_url should be an identifier of the type of the protobuf message you're 
transcoding. By default the URL of a specific message type looks like 
*type.googleapis.com/..*
I'd imagine the prefix might be configurable to something other than 
*type.googleapis.com*, but I can't say for sure; I haven't tried changing it.
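
As a small illustration of the default form, the sketch below joins a URL 
prefix and a fully qualified message name into a type URL. MakeTypeUrl is a 
hypothetical helper written for this example, not a protobuf function, and 
the message name used in the note after it is only an assumed example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of protobuf): joins a URL prefix and a
// fully qualified message name into "<prefix>/<full name>".
std::string MakeTypeUrl(const std::string& url_prefix,
                        const std::string& full_name) {
  assert(!full_name.empty());  // a type URL needs a type name
  // Avoid a doubled slash if the prefix already ends with one.
  if (!url_prefix.empty() && url_prefix.back() == '/') {
    return url_prefix + full_name;
  }
  return url_prefix + "/" + full_name;
}
```

For instance, MakeTypeUrl("type.googleapis.com", "tutorial.Person") gives 
"type.googleapis.com/tutorial.Person". In real code the name would normally 
come from the message's GetTypeName().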

A TypeResolver instance can be created with the function 
*NewTypeResolverForDescriptorPool*, declared in type_resolver_util.h:
https://github.com/google/protobuf/blob/master/src/google/protobuf/util/type_resolver_util.h

*NewTypeResolverForDescriptorPool* takes a URL prefix and a pointer to a 
DescriptorPool. If the generated code for the relevant message type has been 
compiled into your binary, its descriptor should be in the generated 
descriptor pool, so you should just use that. Otherwise, you can build the 
descriptor and the pool from a FileDescriptorProto.


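Usage might look something like this:
...


// Header names assumed from the identifiers used below.
#include <iostream>
#include <string>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>
#include <google/protobuf/util/json_util.h>
#include <google/protobuf/util/type_resolver_util.h>


using namespace google::protobuf;
using namespace google::protobuf::util;

...


void foo(const Message& msg)
{
 ...

 std::string json_output;
 // generated_pool() already returns a pointer, so it is passed as-is.
 TypeResolver* resolver = NewTypeResolverForDescriptorPool(
     "type.googleapis.com", DescriptorPool::generated_pool());

 // The type URL is the resolver's prefix plus the full message name.
 Status status = BinaryToJsonString(resolver,
     "type.googleapis.com/" + msg.GetTypeName(),
     msg.SerializeAsString(), &json_output);

 // Only print the JSON if the conversion succeeded.
 if (status.ok())
     std::cout << json_output;
 else
     std::cerr << status.ToString();

 // The caller owns the resolver.
 delete resolver;
 ...
}


...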


I hope this helps.


Ron



On Tuesday, April 5, 2016 at 9:23:17 PM UTC+3, Zachary Deretsky wrote:

> Ron,
> could you post an example and some explanation of how to (de)serialize 
> proto3 to JSON using
> LIBPROTOBUF_EXPORT util::Status BinaryToJsonString(
> TypeResolver* resolver,
> const string& type_url,
> const string& binary_input,
> string* json_output,
> const JsonOptions& options);
>
> How do I create a TypeResolver, and what is type_url?
>
> I am asking because you seem to be the only one with expertise on the 
> subject.
> Thank you, Zach. 

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to protobuf+unsubscr...@googlegroups.com.
To post to this group, send email to protobuf@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.


Re: [protobuf] BinaryToJsonString mishandling strings containing UTF8 multibyte characters

2016-04-05 Thread Zachary Deretsky
Ron,
could you post an example and some explanation of how to (de)serialize 
proto3 to JSON using
LIBPROTOBUF_EXPORT util::Status BinaryToJsonString(
TypeResolver* resolver,
const string& type_url,
const string& binary_input,
string* json_output,
const JsonOptions& options);

How do I create a TypeResolver, and what is type_url?

I am asking because you seem to be the only one with expertise on the 
subject.
Thank you, Zach. 




Re: [protobuf] BinaryToJsonString mishandling strings containing UTF8 multibyte characters

2015-11-26 Thread Ron

On Wednesday, November 25, 2015 at 8:56:51 PM UTC+2, Feng Xiao wrote:
>
> Thanks for the explanation. Could you help file a bug for this on the 
> protobuf GitHub site? If you know of a solution to this, you are also 
> welcome to send us a pull request.
>

Sure, no problem.

https://github.com/google/protobuf/issues/1010 



Re: [protobuf] BinaryToJsonString mishandling strings containing UTF8 multibyte characters

2015-11-25 Thread 'Feng Xiao' via Protocol Buffers
On Wed, Nov 25, 2015 at 12:47 AM Ron wrote:

> Sure.
>
> For example, I defined the below message in the proto file:
> message Person
> {
>  string first_name = 1;
>  string last_name = 2;
> }
>
>
> When I set the first_name field to "Ron" both binary serialization and
> JSON serialization work fine.
>
>
> But when I set it to "רון" (as UTF8) , while the serialization to binary
> is correct (shown here as base64):
>
> *CgbXqNeV158=*
> ... when using *BinaryToJsonString* to get the JSON representation the
> value is mishandled and is ultimately replaced with an empty string:
> { "firstName": "" }
>
>
> This example will probably only work correctly with compilers that define
> char as unsigned by default, but with compilers that define char as signed
> (such as Microsoft's) - I think you should get the same (incorrect) result
> I pasted above.
>
Thanks for the explanation. Could you help file a bug for this on the
protobuf GitHub site? If you know of a solution to this, you are also
welcome to send us a pull request.




Re: [protobuf] BinaryToJsonString mishandling strings containing UTF8 multibyte characters

2015-11-25 Thread Ron
Sure.

For example, I defined the below message in the proto file:
message Person 
{
 string first_name = 1;
 string last_name = 2;
}


When I set the first_name field to "Ron" both binary serialization and JSON 
serialization work fine.


But when I set it to "רון" (as UTF8), the serialization to binary is 
correct (shown here as base64):

*CgbXqNeV158=*
... but when using *BinaryToJsonString* to get the JSON representation, the 
value is mishandled and is ultimately replaced with an empty string:
{ "firstName": "" }


This example will probably only work correctly with compilers that define 
char as unsigned by default, but with compilers that define char as signed 
(such as Microsoft's) - I think you should get the same (incorrect) result 
I pasted above.
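
To check that the binary form above really is correct, the sketch below 
hand-encodes field 1 as a length-delimited protobuf field and base64-encodes 
the result. EncodeStringField and Base64Encode are hypothetical helpers 
written for this illustration, not protobuf APIs, and the encoder only 
handles tags and lengths that fit in a single byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hand-encodes a length-delimited protobuf field (wire type 2).
// Only covers field numbers and payload sizes that fit in one byte each.
std::string EncodeStringField(int field_number, const std::string& value) {
  assert(field_number >= 1 && field_number <= 15);  // one-byte tag
  assert(value.size() < 128);                       // one-byte length
  std::string out;
  out.push_back(static_cast<char>((field_number << 3) | 2));  // tag
  out.push_back(static_cast<char>(value.size()));             // length
  out += value;                                               // raw UTF-8
  return out;
}

// Standard base64 (RFC 4648 alphabet, '=' padding).
std::string Base64Encode(const std::string& in) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  for (std::size_t i = 0; i < in.size(); i += 3) {
    // Go through unsigned char so bytes >= 0x80 are not sign-extended.
    std::uint32_t chunk = static_cast<unsigned char>(in[i]) << 16;
    if (i + 1 < in.size()) chunk |= static_cast<unsigned char>(in[i + 1]) << 8;
    if (i + 2 < in.size()) chunk |= static_cast<unsigned char>(in[i + 2]);
    out.push_back(kAlphabet[(chunk >> 18) & 0x3F]);
    out.push_back(kAlphabet[(chunk >> 12) & 0x3F]);
    out.push_back(i + 1 < in.size() ? kAlphabet[(chunk >> 6) & 0x3F] : '=');
    out.push_back(i + 2 < in.size() ? kAlphabet[chunk & 0x3F] : '=');
  }
  return out;
}
```

Encoding field 1 with the UTF-8 bytes of "רון" (D7 A8 D7 95 D7 9F) gives the 
bytes 0A 06 D7 A8 D7 95 D7 9F, and base64-encoding those gives CgbXqNeV158=, 
matching the value above. Every payload byte here is 0x80 or higher, which is 
exactly the range that turns negative when stored in a signed char.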



On Tuesday, November 24, 2015 at 10:51:55 PM UTC+2, Feng Xiao wrote:
> Could you provide a sample input that will fail for this reason?


Re: [protobuf] BinaryToJsonString mishandling strings containing UTF8 multibyte characters

2015-11-24 Thread 'Feng Xiao' via Protocol Buffers
On Tue, Nov 24, 2015 at 11:42 AM, Ron wrote:

> Hi,
>
> When using *BinaryToJsonString* or *BinaryToJsonStream*, I seem to
> encounter a problem whenever there's a message containing a string
> containing multibyte characters.
> After some debugging, it seems the place where things start to go wrong is
> in *ReadCodePoint* (in json_escaping.cc) when the first byte of the
> multibyte character is being read from the string (as char) and assigned
> into a variable of type uint32. This casting directly from a signed 1-byte
> value to an unsigned 4-byte value seems to produce values that are
> different than intended and different than expected a little later on by
> some *if-else* statements trying to look at that value to determine the
> correct length of the multibyte character. From there things go wrong and
> the string isn't serialized and just gets dropped...
>
> For now as a temporary solution I added a cast of the value returned by
> StringPiece's *operator[ ]* to uint8 before the assignment into uint32,
> but any advice or a more permanent solution will be appreciated.
>
Could you provide a sample input that will fail for this reason?




[protobuf] BinaryToJsonString mishandling strings containing UTF8 multibyte characters

2015-11-24 Thread Ron
Hi,

When using *BinaryToJsonString* or *BinaryToJsonStream*, I seem to 
encounter a problem whenever a message contains a string with 
multibyte characters.
After some debugging, it seems the place where things start to go wrong is 
in *ReadCodePoint* (in json_escaping.cc), when the first byte of the 
multibyte character is read from the string (as char) and assigned 
to a variable of type uint32. Converting directly from a signed 1-byte 
value to an unsigned 4-byte value seems to sign-extend it, so the result 
differs from what was intended and from what the *if-else* statements a 
little later on expect when they inspect that value to determine the 
length of the multibyte character. From there things go wrong and 
the string isn't serialized and just gets dropped...

For now, as a temporary solution, I added a cast of the value returned by 
StringPiece's *operator[]* to uint8 before the assignment to uint32, but 
any advice or a more permanent solution would be appreciated.
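
The effect described above can be reproduced in isolation. The sketch below 
is not the code from json_escaping.cc: WidenDirect, WidenViaUint8 and 
Utf8SequenceLength are hypothetical functions written for this illustration, 
and signed char is used explicitly to model a compiler where plain char is 
signed.

```cpp
#include <cassert>
#include <cstdint>

// Direct widening: a negative byte such as 0xD7 (-41 as signed char)
// is sign-extended, producing 0xFFFFFFD7 instead of 0xD7.
std::uint32_t WidenDirect(signed char c) {
  return static_cast<std::uint32_t>(c);
}

// Widening via an unsigned 8-bit type first keeps the value in 0..255.
std::uint32_t WidenViaUint8(signed char c) {
  const std::uint32_t widened = static_cast<std::uint8_t>(c);
  assert(widened <= 0xFF);  // no high bits set
  return widened;
}

// Length of a UTF-8 sequence judged from its already-widened lead byte.
// Returns 0 when the value is not a valid lead byte.
int Utf8SequenceLength(std::uint32_t lead) {
  if (lead <= 0x7F) return 1;
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}
```

With the lead byte 0xD7 stored in a signed char, WidenDirect yields 
0xFFFFFFD7, which falls outside every range check, so the length comes back 
as 0. WidenViaUint8 yields 0xD7 and the length is 2, which is what the cast 
to uint8 achieves in the workaround.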

Thanks,
Ron
