On 05/08/2012 06:04 PM, Daniel Wright wrote:
On Tue, May 8, 2012 at 4:42 PM, Jeremy Stribling <st...@nicira.com> wrote:

    I'm working on a project to upgrade- and downgrade-proof a distributed
    system that uses protobufs to communicate data between instances
    of a C++ program.  I'm trying to cover all possible cases for data schema
    changes between versions of my programs, and I was hoping to get some
    insight from the community on what the best practice is for the
    following tricky scenario.

    To reduce serialization time and protobuf message size, the format of
    a field in a message is changed between incompatible types.  For
    example, a string field gets changed to an int, or perhaps a field
    gets changed from one message type to another.  Because this is being
    done as an optimization, it makes no sense to keep both versions of
    the data around, so I think whether we change the field ID is not
    relevant -- we only ever want to have one version of the field in any
    particular protobuf.

Even though you don't keep both versions of the data around, you should keep both fields around, and have the code be able to read from whichever is set during the transition. You can rename the old one (say put "deprecated" in the name) so that people know that it's old, but don't actually remove it from the .proto file until no old instances of the proto remain. To put it more concretely, say you have

  optional string my_data = 1;

Now you come up with a way to encode it as an int64 instead. You'd change the .proto to:

  optional string deprecated_my_data = 1;
  optional int64 my_data = 2;

- At this point, you write the data to "deprecated_my_data" and not "my_data", but when you read, you check has_my_data() and has_deprecated_my_data() and read from whichever one is present. It might help to write wrapper functions for reading and writing during the transition if the field is accessed in many places.

- Once all instances of the program have been recompiled so they all know about the new int64 field, you can start writing to my_data and not deprecated_my_data.

- Once all of the instances of the program have been recompiled again, you can remove the code that reads deprecated_my_data, and delete the field.
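
The wrapper functions mentioned in the first step could look roughly like the sketch below. Everything in it is an assumption made for illustration: the Message struct is a hand-written stand-in that only mimics the accessor names protoc would generate for the two fields, ReadMyData, WriteMyData and WritePhase are made-up names, and the string form is assumed to be a plain decimal number so that the conversion to int64 is trivial.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hand-written stand-in for the generated message class (assumption:
// the real class would come from protoc, not from this struct).
struct Message {
  std::optional<std::string> deprecated_my_data_;  // field 1, old encoding
  std::optional<std::int64_t> my_data_;            // field 2, new encoding

  bool has_deprecated_my_data() const { return deprecated_my_data_.has_value(); }
  bool has_my_data() const { return my_data_.has_value(); }
  // As with generated accessors, only call these after the has_ check.
  const std::string& deprecated_my_data() const { return *deprecated_my_data_; }
  std::int64_t my_data() const { return *my_data_; }
  void set_deprecated_my_data(const std::string& v) { deprecated_my_data_ = v; }
  void set_my_data(std::int64_t v) { my_data_ = v; }
  void clear_deprecated_my_data() { deprecated_my_data_.reset(); }
  void clear_my_data() { my_data_.reset(); }
};

// Which field writers populate. Flip to kNewInt64 only once every
// deployed reader knows about field 2.
enum class WritePhase { kLegacyString, kNewInt64 };

// Reads from whichever field is present; empty if neither is set.
std::optional<std::int64_t> ReadMyData(const Message& msg) {
  // Writers only ever set one of the two fields.
  assert(!(msg.has_my_data() && msg.has_deprecated_my_data()));
  if (msg.has_my_data()) {
    return msg.my_data();
  }
  if (msg.has_deprecated_my_data()) {
    // Assumption: the old string held a decimal number.
    return static_cast<std::int64_t>(std::stoll(msg.deprecated_my_data()));
  }
  return std::nullopt;
}

// Writes the value to exactly one field, chosen by the rollout phase.
void WriteMyData(Message& msg, std::int64_t value, WritePhase phase) {
  msg.clear_my_data();
  msg.clear_deprecated_my_data();
  if (phase == WritePhase::kNewInt64) {
    msg.set_my_data(value);
  } else {
    msg.set_deprecated_my_data(std::to_string(value));
  }
}
```

With wrappers like these, moving from the first step to the second is a change to the WritePhase value passed by callers (or to a single flag behind it) rather than an edit at every place the field is touched.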

This is kind of painful, but it's much cleaner than adding a version number. It also only ever writes the data to one field, so there's no bloat during the transition.

Thanks for the response. As you say, this solution is painful because you can't enable the optimization until the old version of the program is completely deprecated. This is somewhat simple in the case that you yourself are deploying the software, but when you're shipping software to customers (as we are) and have to support many old versions, it will take a very long time (possibly years) before you can enable the optimization. Also, it breaks the downgrade path. Once you enable the optimization, you can never downgrade back to a version that did not know about the new field.

You received this message because you are subscribed to the Google Groups "Protocol 
Buffers" group.
To post to this group, send email to protobuf@googlegroups.com.