Thanks for the input. I'm not looking to beat binary serialization performance, but I would like to avoid having to hand-write the JSON serialization for insertion into elasticsearch. I understand the proto JSON serialization has to look up field names to generate the JSON, which isn't required when building manually, but I wouldn't expect that to account for an order of magnitude difference.
A repeated double would not give the desired JSON output. This is used for the coordinates section of GeoJSON (what elasticsearch understands).

Thanks,
Ed

On Thursday, March 22, 2018 at 6:45:41 PM UTC-4, Feng Xiao wrote:
>
> On Thu, Mar 22, 2018 at 8:23 AM, Edward Clark <[email protected]> wrote:
>
>> Howdy,
>>
>> I'm working on a project that recently needed to insert data represented
>> by protobufs into elasticsearch. Using the built in JSON serialization we
>> were able to quickly get data into elasticsearch, however, the JSON
>> serialization seems to be rather slow when compared to generating with a
>> library like rapidjson. Is this expected or is it likely we're doing
>> something wrong?
>>
> It's expected for proto-to-JSON conversion to be slower (and likely much
> slower) than a dedicated JSON library converting objects designed to
> represent JSON objects to JSON. It's like comparing a library that converts
> rapidjson::Document to protobuf binary format against protobuf binary
> serialization. The latter is definitely going to be faster no matter how
> you optimize the former. Proto objects are just not designed to be
> efficiently converted to JSON.
>
> There are ways to improve the proto to JSON conversion though, but at the
> end of the day it isn't going to beat proto to proto binary serialization, so
> usually performance sensitive services will just support proto binary
> format instead.
>
>> Below is info on what we're using, and relative serialization performance
>> results. Surprisingly, rapidjson serialization was faster than protobufs
>> binary serialization in some cases, which leads me to believe I'm doing
>> something wrong.
>>
>> Ubuntu 16.04
>> GCC 7.3, std=c++17, libstdc++11 string api
>> Protobuf 3.5.1.1 compiled with -O3, proto3 syntax
>>
>> I've measured the performance of 3 cases: serializing the protobuf to
>> binary, serializing the protobuf to JSON via MessageToJsonString, and
>> building a rapidjson::Document from the protobuf and then serializing that
>> to JSON. All tests use the same message with different portions of the
>> message populated, 100,000 iterations. The json generated from the
>> protobuf and rapidjson match exactly.
>>
>> Test 1, a single string field populated.
>> proto binary: 0.01s
>> proto json: 0.50s
>> rapidjson: 0.02s
>>
>> Test 2, 1 top level string field, 1 nested object with 3 more string
>> fields.
>> proto binary: 0.02s
>> proto json: 1.06s
>> rapidjson: 0.05s
>>
>> Test 3, 2 string fields, and 1 ::google::protobuf::ListValue containing
>> doubles of the format, [[[double, double], [double, double], ...]], 36
>> pairs of doubles total.
>> *proto binary: 1.50s*
>> *proto json: 8.87s*
>> *rapidjson: 0.41s*
>>
> I think this is because of your choice of using
> google::protobuf::ListValue. That type (along with
> google::protobuf::Value/Struct) is specifically designed to mimic arbitrary
> JSON content with proto and is far from efficient compared to protobuf
> primitive types. I would just use a "repeated double" to represent these 36
> pairs of doubles.
>
>>
>> Protobuf binary serialization code:
>>     std::string toJSON(Message const& msg) {
>>         return msg.SerializeAsString();
>>     }
>>
>> Protobuf json serialization code:
>>     std::string toJSON(Message const& msg) {
>>         std::string json;
>>         ::google::protobuf::util::MessageToJsonString(msg,
>>             std::addressof(json));
>>         return json;
>>     }
>>
>> Rapidjson serialization code:
>>     // It's a lengthy section of code manually populating the document.
>>
>> Of note, empty strings and numbers set to 0 are omitted from the JSON as
>> the protobuf does. The resulting JSON is exactly the same as the protobuf
>> json.
>>
>> Any info on how to improve the protobuf to JSON serialization would be
>> greatly appreciated!
>>
>> Thanks,
>> Ed
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Protocol Buffers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/protobuf.
>> For more options, visit https://groups.google.com/d/optout.
>>
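
For reference, the coordinates layout from Test 3 ([[[double, double], [double, double], ...]]) is nested, while a "repeated double" field would come out of the proto JSON conversion as one flat array. The following is only a rough sketch of what bridging that gap by hand might look like, assuming the doubles are stored flat as x0, y0, x1, y1, ... The helper name ringToGeoJson is hypothetical and is not part of protobuf or rapidjson; it uses only the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of protobuf or rapidjson): formats a flat
// list of doubles, laid out as x0, y0, x1, y1, ..., as one GeoJSON-style
// ring of positions, i.e. [[[x0,y0],[x1,y1],...]].
// Assumption: the values are finite; NaN or infinity would not be valid JSON.
std::string ringToGeoJson(const std::vector<double>& flat) {
    assert(flat.size() % 2 == 0 && "expected x,y pairs");

    std::string out = "[[";
    char buf[64];  // large enough for two doubles printed with %.17g
    for (std::size_t i = 0; i < flat.size(); i += 2) {
        if (i != 0) out += ',';
        // %.17g keeps enough digits to round-trip an IEEE-754 double.
        std::snprintf(buf, sizeof buf, "[%.17g,%.17g]", flat[i], flat[i + 1]);
        out += buf;
    }
    out += "]]";
    return out;
}
```

Calling ringToGeoJson({1.5, 2.0, 3.25, 4.0}) yields [[[1.5,2],[3.25,4]]]. This still means hand-writing part of the JSON, which is exactly the trade-off discussed above, so it only makes sense if the ListValue cost turns out to dominate.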
