Re: [protobuf] JSON Serialization Performance

2018-03-23 Thread Edward Clark
Thanks for the input.  I'm not looking to beat binary serialization 
performance, but I would like to avoid having to hand-write the JSON 
serialization for insertion into elasticsearch.  I understand the proto 
JSON serialization has to look up field names to generate the JSON, which 
isn't required when building manually, but I wouldn't expect that to 
account for an order of magnitude difference.

A repeated double would not give the desired JSON output.  The field is 
used for the coordinates section of GeoJSON (the format elasticsearch 
understands).
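
For illustration, a minimal sketch of how the nested coordinates get built 
with ListValue (coordinate values made up); a flat repeated double would 
serialize as [1.0, 2.0, 3.0, 4.0] and lose the nesting GeoJSON requires:

    #include <google/protobuf/struct.pb.h>

    // Builds [[[1.0, 2.0], [3.0, 4.0]]]: outer array, one ring, two points.
    void fillCoordinates(google::protobuf::ListValue* coords) {
        auto* ring = coords->add_values()->mutable_list_value();
        auto* p1 = ring->add_values()->mutable_list_value();
        p1->add_values()->set_number_value(1.0);
        p1->add_values()->set_number_value(2.0);
        auto* p2 = ring->add_values()->mutable_list_value();
        p2->add_values()->set_number_value(3.0);
        p2->add_values()->set_number_value(4.0);
    }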

Thanks,
Ed

On Thursday, March 22, 2018 at 6:45:41 PM UTC-4, Feng Xiao wrote:
>
> On Thu, Mar 22, 2018 at 8:23 AM, Edward Clark wrote:
>
>> Howdy,
>>
>> I'm working on a project that recently needed to insert data represented 
>> by protobufs into elasticsearch.  Using the built-in JSON serialization we 
>> were able to quickly get data into elasticsearch; however, the JSON 
>> serialization seems to be rather slow compared to generating it with a 
>> library like rapidjson.  Is this expected, or is it likely we're doing 
>> something wrong?
>>
> It's expected for proto-to-JSON conversion to be slower (and likely much 
> slower) than a dedicated JSON library converting objects designed to 
> represent JSON objects to JSON. It's like comparing a library that converts 
> rapidjson::Document to protobuf binary format against protobuf binary 
> serialization. The latter is definitely going to be faster no matter how 
> you optimize the former. Proto objects are just not designed to be 
> efficiently converted to JSON.
>
> There are ways to improve the proto-to-JSON conversion, but at the end of 
> the day it isn't going to beat proto binary serialization, so 
> performance-sensitive services will usually just support the proto binary 
> format instead. 
>  
>
>> Below is info on what we're using, and relative serialization performance 
>> results.  Surprisingly, rapidjson serialization was faster than protobuf's 
>> binary serialization in some cases, which leads me to believe I'm doing 
>> something wrong.
>>
>> Ubuntu 16.04
>> GCC 7.3, std=c++17, libstdc++11 string api
>> Protobuf 3.5.1.1 compiled with -O3, proto3 syntax
>>
>> I've measured the performance of 3 cases: serializing the protobuf to 
>> binary, serializing the protobuf to JSON via MessageToJsonString, and 
>> building a rapidjson::Document from the protobuf and then serializing that 
>> to JSON.  All tests use the same message with different portions of the 
>> message populated, 100,000 iterations.  The JSON generated from the 
>> protobuf and rapidjson match exactly.
>>
>> Test 1, a single string field populated.
>> proto binary: 0.01s
>> proto json:   0.50s
>> rapidjson:    0.02s
>>
>> Test 2, 1 top-level string field, 1 nested object with 3 more string 
>> fields.
>> proto binary: 0.02s
>> proto json:   1.06s
>> rapidjson:    0.05s
>>
>> Test 3, 2 string fields, and 1 ::google::protobuf::ListValue containing 
>> doubles of the form [[[double, double], [double, double], ...]], 36 
>> pairs of doubles total.
>> *proto binary: 1.50s*
>> *proto json:   8.87s*
>> *rapidjson:    0.41s*
>>
> I think this is because of your choice of using 
> google::protobuf::ListValue. That type (along with 
> google::protobuf::Value/Struct) is specifically designed to mimic arbitrary 
> JSON content in proto form, and it is far from efficient compared to 
> protobuf primitive types. I would just use a "repeated double" to represent 
> these 36 pairs of doubles.
>  
>
>>
>> Protobuf binary serialization code:
>> std::string toBinary(Message const& msg) { return msg.SerializeAsString(); }
>>
>> Protobuf json serialization code:
>> std::string toJSON(Message const& msg) {
>>     std::string json;
>>     ::google::protobuf::util::MessageToJsonString(msg, std::addressof(json));
>>     return json;
>> }
>>
>> Rapidjson serialization code:
>> // It's a lengthy section of code manually populating the document.  Of 
>> note, empty strings and numbers set to 0 are omitted from the JSON, as the 
>> protobuf serializer does.  The resulting JSON is exactly the same as the 
>> protobuf JSON.
>>
>> Any info on how to improve the protobuf to JSON serialization would be 
>> greatly appreciated! 
>>
>> Thanks,
>> Ed
>>


Re: [protobuf] JSON Serialization Performance

2018-03-22 Thread 'Feng Xiao' via Protocol Buffers
On Thu, Mar 22, 2018 at 8:23 AM, Edward Clark wrote:

> Howdy,
>
> I'm working on a project that recently needed to insert data represented
> by protobufs into elasticsearch.  Using the built-in JSON serialization we
> were able to quickly get data into elasticsearch; however, the JSON
> serialization seems to be rather slow compared to generating it with a
> library like rapidjson.  Is this expected, or is it likely we're doing
> something wrong?
>
It's expected for proto-to-JSON conversion to be slower (and likely much
slower) than a dedicated JSON library converting objects designed to
represent JSON objects to JSON. It's like comparing a library that converts
rapidjson::Document to protobuf binary format against protobuf binary
serialization. The latter is definitely going to be faster no matter how
you optimize the former. Proto objects are just not designed to be
efficiently converted to JSON.

There are ways to improve the proto-to-JSON conversion, but at the end of
the day it isn't going to beat proto binary serialization, so
performance-sensitive services will usually just support the proto binary
format instead.


> Below is info on what we're using, and relative serialization performance
> results.  Surprisingly, rapidjson serialization was faster than protobuf's
> binary serialization in some cases, which leads me to believe I'm doing
> something wrong.
>
> Ubuntu 16.04
> GCC 7.3, std=c++17, libstdc++11 string api
> Protobuf 3.5.1.1 compiled with -O3, proto3 syntax
>
> I've measured the performance of 3 cases: serializing the protobuf to
> binary, serializing the protobuf to JSON via MessageToJsonString, and
> building a rapidjson::Document from the protobuf and then serializing that
> to JSON.  All tests use the same message with different portions of the
> message populated, 100,000 iterations.  The JSON generated from the
> protobuf and rapidjson match exactly.
>
> Test 1, a single string field populated.
> proto binary: 0.01s
> proto json:   0.50s
> rapidjson:    0.02s
>
> Test 2, 1 top-level string field, 1 nested object with 3 more string
> fields.
> proto binary: 0.02s
> proto json:   1.06s
> rapidjson:    0.05s
>
> Test 3, 2 string fields, and 1 ::google::protobuf::ListValue containing
> doubles of the form [[[double, double], [double, double], ...]], 36
> pairs of doubles total.
> *proto binary: 1.50s*
> *proto json:   8.87s*
> *rapidjson:    0.41s*
>
I think this is because of your choice of using
google::protobuf::ListValue. That type (along with
google::protobuf::Value/Struct) is specifically designed to mimic arbitrary
JSON content in proto form, and it is far from efficient compared to
protobuf primitive types. I would just use a "repeated double" to represent
these 36 pairs of doubles.
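
For example, a hedged sketch (the Geometry message and coordinates field
are hypothetical, not the actual schema): given "repeated double
coordinates = 1;" in the .proto, the C++ side is just:

    Geometry geo;
    geo.add_coordinates(1.0);  // values made up for illustration
    geo.add_coordinates(2.0);
    // Serializes to JSON as a flat array, [1.0, 2.0, ...], with any x/y
    // pairing implied by position rather than expressed as nested arrays.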


>
> Protobuf binary serialization code:
> std::string toBinary(Message const& msg) { return msg.SerializeAsString(); }
>
> Protobuf json serialization code:
> std::string toJSON(Message const& msg) {
>     std::string json;
>     ::google::protobuf::util::MessageToJsonString(msg, std::addressof(json));
>     return json;
> }
>
> Rapidjson serialization code:
> // It's a lengthy section of code manually populating the document.  Of
> note, empty strings and numbers set to 0 are omitted from the JSON, as the
> protobuf serializer does.  The resulting JSON is exactly the same as the
> protobuf JSON.
>
> Any info on how to improve the protobuf to JSON serialization would be
> greatly appreciated!
>
> Thanks,
> Ed
>



[protobuf] JSON Serialization Performance

2018-03-22 Thread Edward Clark
Howdy,

I'm working on a project that recently needed to insert data represented by 
protobufs into elasticsearch.  Using the built-in JSON serialization we 
were able to quickly get data into elasticsearch; however, the JSON 
serialization seems to be rather slow compared to generating it with a 
library like rapidjson.  Is this expected, or is it likely we're doing 
something wrong?  Below is info on what we're using, and relative 
serialization performance results.  Surprisingly, rapidjson serialization 
was faster than protobuf's binary serialization in some cases, which leads 
me to believe I'm doing something wrong.

Ubuntu 16.04
GCC 7.3, std=c++17, libstdc++11 string api
Protobuf 3.5.1.1 compiled with -O3, proto3 syntax

I've measured the performance of 3 cases: serializing the protobuf to 
binary, serializing the protobuf to JSON via MessageToJsonString, and 
building a rapidjson::Document from the protobuf and then serializing that 
to JSON.  All tests use the same message with different portions of the 
message populated, 100,000 iterations.  The JSON generated from the 
protobuf and rapidjson match exactly.
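
For reference, a minimal sketch of the timing loop (this assumes a
populated message and one of the serialization functions below; it is not
the exact harness used):

    #include <chrono>
    #include <cstddef>

    template <typename Fn>
    double secondsFor(Fn serialize, int iterations = 100000) {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i) {
            // Keep the result alive so the call isn't optimized away.
            volatile std::size_t size = serialize().size();
            (void)size;
        }
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();  // seconds
    }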

Test 1, a single string field populated.
proto binary: 0.01s
proto json:   0.50s
rapidjson:    0.02s

Test 2, 1 top-level string field, 1 nested object with 3 more string fields.
proto binary: 0.02s
proto json:   1.06s
rapidjson:    0.05s

Test 3, 2 string fields, and 1 ::google::protobuf::ListValue containing 
doubles of the form [[[double, double], [double, double], ...]], 36 
pairs of doubles total.
*proto binary: 1.50s*
*proto json:   8.87s*
*rapidjson:    0.41s*

Protobuf binary serialization code:
std::string toBinary(Message const& msg) { return msg.SerializeAsString(); }

Protobuf json serialization code:
std::string toJSON(Message const& msg) {
    std::string json;
    ::google::protobuf::util::MessageToJsonString(msg, std::addressof(json));
    return json;
}

Rapidjson serialization code:
// It's a lengthy section of code manually populating the document.  Of 
note, empty strings and numbers set to 0 are omitted from the JSON, as the 
protobuf serializer does.  The resulting JSON is exactly the same as the 
protobuf JSON.
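
The shape of it, as a minimal sketch (MyMessage and the "name" field are
hypothetical stand-ins, not the real schema):

    #include <string>

    #include <rapidjson/document.h>
    #include <rapidjson/stringbuffer.h>
    #include <rapidjson/writer.h>

    std::string toJSON(MyMessage const& msg) {
        rapidjson::Document doc(rapidjson::kObjectType);
        auto& alloc = doc.GetAllocator();
        // Skip defaults (empty strings, zeros) to match protobuf's JSON output.
        if (!msg.name().empty()) {
            doc.AddMember("name",
                          rapidjson::Value(msg.name().c_str(), alloc), alloc);
        }
        rapidjson::StringBuffer buffer;
        rapidjson::Writer<rapidjson::StringBuffer> writer(buffer);
        doc.Accept(writer);
        return {buffer.GetString(), buffer.GetSize()};
    }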

Any info on how to improve the protobuf to JSON serialization would be 
greatly appreciated! 

Thanks,
Ed
