bmahler commented on issue #356: libprocess: check protobuf (de)serialisation success. URL: https://github.com/apache/mesos/pull/356#issuecomment-610590957 Took a look at the protobuf code: > Before the code didn't check at all the return value of > Message::SerializeToString, which can fail for various reasons, > e.g. out-of-memory, message too large, or invalid UTF-8 string. As far as I see, `MessageLite::SerializeToString` boils down to [this code](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.cc#L422-L436): ``` bool MessageLite::AppendPartialToString(std::string* output) const { size_t old_size = output->size(); size_t byte_size = ByteSizeLong(); if (byte_size > INT_MAX) { GOOGLE_LOG(ERROR) << GetTypeName() << " exceeded maximum protobuf size of 2GB: " << byte_size; return false; } STLStringResizeUninitialized(output, old_size + byte_size); uint8* start = reinterpret_cast<uint8*>(io::mutable_string_data(output) + old_size); SerializeToArrayImpl(*this, start, byte_size); return true; } ``` This will only return false for the message being too large? Where do you see out of memory being handled? It also looks like invalid UTF-8 does not fail serialization, but rather only logs an error for "proto3" always and logs an error in debug mode for "proto2", see [this example of a proto3 message](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/any.pb.cc#L214). On the parsing side, looks like the possible cases for a false return are: `!IsInitialized()` [[1]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.h#L529)[[2]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.h#L475-L479), didn't consume entire message [[3]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.cc#L135), invalid message data (e.g. invalid varint, length of a string / submessage goes out of bounds). For invalid UTF-8, it looks [pretty complicated](https://github.com/protocolbuffers/protobuf/blob/e667bf6eaaa2fb1ba2987c6538df81f88500d030/src/google/protobuf/compiler/cpp/cpp_helpers.cc#L1034-L1101): * If the message is "proto3", it seems to always validate UTF-8 by directly calling `WireFormatLite::VerifyUtf8String` and will then strictly fail parsing if not UTF-8. * For "proto2" code, it seems to call [`WireFormat::VerifyUTF8StringNamedField`](https://github.com/protocolbuffers/protobuf/blob/df6e3a2f54ca893635119e98688a47ad85ecca26/src/google/protobuf/wire_format.h#L355-L381) and **only log** if it's invalid UTF-8, not fail. Also the code seems to be within a `#ifdef GOOGLE_PROTOBUF_UTF8_VALIDATION_ENABLED` which in turns is [only enabled if in a debug build i.e. `!NDEBUG`](https://github.com/protocolbuffers/protobuf/blob/e667bf6eaaa2fb1ba2987c6538df81f88500d030/src/google/protobuf/wire_format_lite.h#L54-L57). I'm a little puzzled at how you saw the logging, is your libprotobuf using debug mode? If you're using protoc from mesos I wonder if the way it's built in mesos is hitting the same issue as https://github.com/Homebrew/legacy-homebrew/issues/9279. Here's a summary of my findings: **Serialization of invalid UTF-8**: * "proto3": serialization will succeed but invalid UTF-8 will be logged. * "proto2": serialization will succeed. Invalid UTF-8 will only be logged in debug mode (NDEBUG not defined). **De-serialization of invalid UTF-8**: * "proto3": de-serialization will fail and it will be logged. * "proto2": de-serialization will succeed. Invalid UTF-8 will only be logged in debug mode (NDEBUG not defined).
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
