bmahler commented on issue #356: libprocess: check protobuf (de)serialisation 
success.
URL: https://github.com/apache/mesos/pull/356#issuecomment-610590957
 
 
   Took a look at the protobuf code:
   
   > Before the code didn't check at all the return value of
   > Message::SerializeToString, which can fail for various reasons,
   > e.g. out-of-memory, message too large, or invalid UTF-8 string.
   
   As far as I see, `MessageLite::SerializeToString` boils down to [this 
code](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.cc#L422-L436):
   
   ```
   bool MessageLite::AppendPartialToString(std::string* output) const {
     size_t old_size = output->size();
     size_t byte_size = ByteSizeLong();
     if (byte_size > INT_MAX) {
       GOOGLE_LOG(ERROR) << GetTypeName()
                  << " exceeded maximum protobuf size of 2GB: " << byte_size;
       return false;
     }
   
     STLStringResizeUninitialized(output, old_size + byte_size);
     uint8* start =
         reinterpret_cast<uint8*>(io::mutable_string_data(output) + old_size);
     SerializeToArrayImpl(*this, start, byte_size);
     return true;
   }
   ```
   
   This will only return false for the message being too large? Where do you 
see out of memory being handled? It also looks like invalid UTF-8 does not fail 
serialization, but rather only logs an error for "proto3" always and logs an 
error in debug mode for "proto2", see [this example of a proto3 
message](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/any.pb.cc#L214).
   
   On the parsing side, looks like the possible cases for a false return are: 
`!IsInitialized()` 
[[1]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.h#L529)[[2]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.h#L475-L479),
 didn't consume entire message 
[[3]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.cc#L135),
 invalid message data (e.g. invalid varint, length of a string / submessage 
goes out of bounds). For invalid UTF-8, it looks [pretty 
complicated](https://github.com/protocolbuffers/protobuf/blob/e667bf6eaaa2fb1ba2987c6538df81f88500d030/src/google/protobuf/compiler/cpp/cpp_helpers.cc#L1034-L1101):
   
   * If the message is "proto3", it seems to always validate UTF-8 by directly 
calling `WireFormatLite::VerifyUtf8String` and will then strictly fail parsing 
if not UTF-8.
   * For "proto2" code, it seems to call 
[`WireFormat::VerifyUTF8StringNamedField`](https://github.com/protocolbuffers/protobuf/blob/df6e3a2f54ca893635119e98688a47ad85ecca26/src/google/protobuf/wire_format.h#L355-L381)
 and **only log** if it's invalid UTF-8, not fail. Also the code seems to be 
within a `#ifdef GOOGLE_PROTOBUF_UTF8_VALIDATION_ENABLED` which in turns is 
[only enabled if in a debug build i.e. 
`!NDEBUG`](https://github.com/protocolbuffers/protobuf/blob/e667bf6eaaa2fb1ba2987c6538df81f88500d030/src/google/protobuf/wire_format_lite.h#L54-L57).
 I'm a little puzzled at how you saw the logging, is your libprotobuf using 
debug mode? If you're using protoc from mesos I wonder if the way it's built in 
mesos is hitting the same issue as 
https://github.com/Homebrew/legacy-homebrew/issues/9279.
   
   Here's a summary of my findings:
   
   **Serialization of invalid UTF-8**:
   * "proto3": serialization will succeed but invalid UTF-8 will be logged.
   * "proto2": serialization will succeed. Invalid UTF-8 will only be logged in 
debug mode (NDEBUG not defined).
   
   **De-serialization of invalid UTF-8**:
   * "proto3": de-serialization will fail and it will be logged.
   * "proto2": de-serialization will succeed. Invalid UTF-8 will only be logged 
in debug mode (NDEBUG not defined).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to