When I started using protocol buffers I faced the problem of sending unstructured data inside a protocol buffers message: for example, letting the user send any custom data inside a message without having to define extensions, recompile messages, etc. One option is to send JSON inside a string field, or binary JSON/MessagePack inside a bytes field. Both are good approaches. However, I thought about using the protocol buffers encoding techniques themselves to store arbitrary data. You can define JSON as a set of protocol buffers messages, but that is not efficient, because the many submessage fields take extra space. So I have changed the protocol buffers encoding rules a little to allow encoding arbitrary data efficiently, using basic encoding concepts that protocol buffers already uses, like tags, varints, strings, etc. Here is an example of how a sample JSON document is encoded:
{"str":"hello", "val": 1, "array" : [true, false], "nested": { "value" : true }}

TAG(PSON, OBJECT) VARINT(OBJECT_SIZE)
    VARINT(3) "str" TAG(PSON, STRING) VARINT(5) "hello"
    VARINT(3) "val" TAG(PSON, ONE)
    VARINT(5) "array" TAG(PSON, ARRAY) VARINT(2)
        TAG(PSON, TRUE)
        TAG(PSON, FALSE)
    VARINT(6) "nested" TAG(PSON, OBJECT) VARINT(7)
        VARINT(5) "value" TAG(PSON, TRUE)

For reference, TAG is defined in protocol buffers as (wire type, field number).

This approach could be easily integrated into protocol buffers by defining a new wire type 'pson' (the message still remains traversable), which defines a set of custom field values inside the tag to determine the data type (object, array, string, bytes, varint, signed varint, float, boolean, true, false, 1, 0, etc).

It also applies some optimizations: encoding a boolean, a zero, or a one requires just one byte. Floating point numbers are encoded as varints under some circumstances. Signed integers are encoded as simple varints and the sign is restored on decoding, and so on. The format can also encode data without a root container such as an array or a JSON object, and it can store binary data.

So the pson wire type could be an efficient way to store any unstructured data. I have published a preliminary implementation of this approach on GitHub (https://github.com/thinger-io/Protoson). Depending on the encoded content, the output size is similar to or smaller than MessagePack.
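To make the layout in the example above more concrete, here is a minimal Python sketch of an encoder that follows the same pattern. It is only an illustration, not the Protoson code: the wire type value 6 and the field numbers are hypothetical picks for this sketch, and it assumes that the size written after an OBJECT or ARRAY tag is the byte length of its body (which matches VARINT(7) for the nested object in the example). Only booleans, non-negative integers, strings, lists and dicts are handled.

```python
# Hypothetical constants for this sketch only: 6 is a wire type value that
# protocol buffers does not use, and the field numbers are arbitrary.
PSON_WIRE_TYPE = 6
OBJECT, ARRAY, STRING, VARINT, TRUE, FALSE, ZERO, ONE = range(8)


def varint(n: int) -> bytes:
    """Base-128 varint as used by protocol buffers (non-negative ints only)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field: int) -> bytes:
    """Protocol buffers tag: (field_number << 3) | wire_type, as a varint."""
    return varint((field << 3) | PSON_WIRE_TYPE)


def encode(value) -> bytes:
    # Booleans are checked first because bool is a subclass of int in Python.
    if value is True:
        return tag(TRUE)
    if value is False:
        return tag(FALSE)
    if isinstance(value, int) and value >= 0:
        if value == 0:
            return tag(ZERO)  # the tag alone carries the value
        if value == 1:
            return tag(ONE)
        return tag(VARINT) + varint(value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return tag(STRING) + varint(len(raw)) + raw
    if isinstance(value, list):
        body = b"".join(encode(item) for item in value)
        return tag(ARRAY) + varint(len(body)) + body  # assumed: byte length
    if isinstance(value, dict):
        body = bytearray()
        for key, item in value.items():
            raw = key.encode("utf-8")
            body += varint(len(raw)) + raw + encode(item)  # key has no tag
        return tag(OBJECT) + varint(len(body)) + bytes(body)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


doc = {"str": "hello", "val": 1, "array": [True, False], "nested": {"value": True}}
encoded = encode(doc)
print(len(encoded), encoded.hex())
```

Under these assumptions the sample document takes 44 bytes, and a lone true, false, 0 or 1 takes a single byte, since the tag carries the whole value. A real implementation would also need signed integers, floats, bytes and a decoder.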