Thanks for the input.  I'm not looking to beat binary serialization 
performance, but I would like to avoid hand-writing the JSON serialization 
for insertion into elasticsearch.  I understand the proto JSON serialization 
has to look up field names to generate the JSON, which isn't required when 
building manually, but I wouldn't expect that to account for an 
order-of-magnitude difference.

A repeated double would not give the desired JSON output.  The nested arrays 
are needed for the coordinates section of GeoJSON (the format elasticsearch 
understands).

Thanks,
Ed

On Thursday, March 22, 2018 at 6:45:41 PM UTC-4, Feng Xiao wrote:
>
> On Thu, Mar 22, 2018 at 8:23 AM, Edward Clark <[email protected]> wrote:
>
>
>> Howdy,
>>
>> I'm working on a project that recently needed to insert data represented 
>> by protobufs into elasticsearch.  Using the built in JSON serialization we 
>> were able to quickly get data into elasticsearch, however, the JSON 
>> serialization seems to be rather slow when compared to generating with a 
>> library like rapidjson.  Is this expected, or is it likely we're doing 
>> something wrong? 
>>
> It's expected for proto-to-JSON conversion to be slower (and likely much 
> slower) than a dedicated JSON library converting objects designed to 
> represent JSON objects to JSON. It's like comparing a library that converts 
> rapidjson::Document to protobuf binary format against protobuf binary 
> serialization. The latter is definitely going to be faster no matter how 
> you optimize the former. Proto objects are just not designed to be 
> efficiently converted to JSON.
>
> There are ways to improve the proto-to-JSON conversion, though at the end 
> of the day it isn't going to beat proto binary serialization, so 
> performance-sensitive services will usually just support the proto binary 
> format instead. 
>  
>
>> Below is info on what we're using, along with relative serialization 
>> performance results.  Surprisingly, rapidjson serialization was faster than 
>> protobuf's binary serialization in some cases, which leads me to believe 
>> I'm doing something wrong.
>>
>> Ubuntu 16.04
>> GCC 7.3, std=c++17, libstdc++11 string api
>> Protobuf 3.5.1.1 compiled with -O3, proto3 syntax
>>
>> I've measured the performance of 3 cases: serializing the protobuf to 
>> binary, serializing the protobuf to JSON via MessageToJsonString, and 
>> building a rapidjson::Document from the protobuf and then serializing that 
>> to JSON.  All tests use the same message with different portions of the 
>> message populated, 100,000 iterations.  The JSON generated from the 
>> protobuf and rapidjson match exactly.
>>
>> Test 1, a single string field populated.
>> proto binary: 0.01s
>> proto json:    0.50s
>> rapidjson:     0.02s
>>
>> Test 2, 1 top level string field, 1 nested object with 3 more string 
>> fields.
>> proto binary: 0.02s
>> proto json:    1.06s
>> rapidjson:     0.05s
>>
>> Test 3, 2 string fields, and 1 ::google::protobuf::ListValue containing 
>> doubles of the format, [[[double, double], [double, double], ...]], 36 
>> pairs of doubles total.
>> *proto binary: 1.50s*
>> *proto json:    8.87s*
>> *rapidjson:     0.41s*
>>
> I think this is because of your choice of using 
> google::protobuf::ListValue. That type (along with 
> google::protobuf::Value/Struct) is specifically designed to mimic arbitrary 
> JSON content with proto and is far from efficient compared to protobuf 
> primitive types. I would just use a "repeated double" to represent these 36 
> pairs of doubles.
>  
>
>>
>> Protobuf binary serialization code:
>>     std::string toBinary(Message const& msg) { return 
>> msg.SerializeAsString(); }
>>
>> Protobuf JSON serialization code:
>>     std::string toJSON(Message const& msg) {
>>         std::string json;
>>         ::google::protobuf::util::MessageToJsonString(msg, 
>> std::addressof(json));
>>         return json;
>>     }
>>
>> Rapidjson serialization code:
>>     // It's a lengthy section of code manually populating the document.  
>> Of note, empty strings and numbers set to 0 are omitted from the JSON as 
>> the protobuf does.  The resulting JSON is exactly the same as the protobuf 
>> json.
>>
>> Any info on how to improve the protobuf to JSON serialization would be 
>> greatly appreciated! 
>>
>> Thanks,
>> Ed
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Protocol Buffers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/protobuf.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

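For reference, the two schema shapes discussed above can be sketched as follows; the message and field names here are hypothetical, chosen only to illustrate the trade-off:

```proto
syntax = "proto3";

import "google/protobuf/struct.proto";

message Geometry {
  string type = 1;

  // Mimics arbitrary JSON, so it can carry the nested
  // [[[lon, lat], ...]] arrays GeoJSON requires, but it is
  // far slower to serialize than primitive fields.
  google.protobuf.ListValue coordinates = 2;

  // Suggested alternative: fast, but serializes to a flat
  // [lon, lat, lon, lat, ...] array, which is not the
  // nesting GeoJSON requires.
  // repeated double coordinates = 2;
}
```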