> That makes no sense. If the bytes were the same, how would deserializing
> them be able to produce unequal messages?
Yes, I guess if we can rely on the canonical ordering of the fields,
that should be enough.
> If possible, I would recommend designing your application such that it only
> requires that equal messages have the same serialization *most* of the time.
> For example, if you were designing a cache where the cache key is the hash
> of a serialized message, then the worst that can happen if two equal
> messages had different serializations is that you'd perform the same
> operation twice rather than hitting cache. As long as this is relatively
> rare, it's no big deal.
I was thinking more along the lines of keys in a MapReduce (between the
Mapper and Reducer phases). I don't think this would work in this case.
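The quoted cache design is easy to sketch. In the fragment below (the names
`compute_or_cached` and `cache_key` are mine, not from the thread), two equal
messages that happen to serialize differently simply hash to different keys, so
the worst case is redundant computation rather than a wrong result:

```python
import hashlib

# Hypothetical cache keyed on the hash of a message's serialized bytes.
cache = {}

def cache_key(msg_bytes):
    # Key on the serialization, not the message object itself.
    return hashlib.sha256(msg_bytes).hexdigest()

def compute_or_cached(msg_bytes, expensive_fn):
    # If an equal message serialized differently, we miss the cache and
    # recompute; the result is still correct, just duplicated work.
    key = cache_key(msg_bytes)
    if key not in cache:
        cache[key] = expensive_fn(msg_bytes)
    return cache[key]
```

For MapReduce keys, by contrast, two serializations of the same logical key
would land in different reduce groups, which is a correctness problem rather
than a performance one.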
> If you must know the details, in the varint encoding, the upper bit of each
> byte is used to indicate whether there are more bytes in the value. So, in
> a 3-byte varint, the first two bytes have the upper bit set, but the last
> byte does not. So obviously a 3-byte varint cannot start with the same
> bytes as a 6-byte varint, because in a 6-byte varint the third byte would
> have the upper bit set.
Yes, I actually know this :) I've implemented a similar encoding myself, and
also looked at the source... the beauty of open source. It was just
hard to grok the entirety of the serialization scheme from the code.
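For anyone else following the thread, the continuation-bit scheme quoted above
can be sketched in a few lines (the function name and example values below are
mine, not from the protobuf source):

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf-style varint.

    The low 7 bits of each byte carry payload; the upper bit is a
    continuation flag, set on every byte except the last. So a 3-byte
    varint can never be a prefix of a 6-byte one: in the 6-byte varint
    the third byte has its upper bit set, in the 3-byte one it doesn't.
    """
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the upper bit
        else:
            out.append(byte)         # final byte: upper bit clear
            return bytes(out)

# 300 = 0b1_0010_1100 encodes as b'\xac\x02': the low 7 bits (0x2c)
# with the continuation bit set, then the remaining bits (0x02).
```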
You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.