Hi,
I recently read the protocol buffers encoding documentation
<https://developers.google.com/protocol-buffers/docs/encoding#packed> and
noticed a way to save space.
It is cheaper to hold repeated scalar values than a repeated message that
wraps those values.
Protobuf writes a header (tag and length) for every embedded message, plus a
tag for every field inside it, so a repeated message of two int64 fields costs
more than two repeated int64 fields, which can each be packed behind a single
tag and length (int64 is just an example).
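As a rough sanity check of the overhead argument, here is a minimal sketch that estimates the encoded size of both layouts by hand, without the protobuf library. It assumes proto3 syntax (packed repeated scalars, zero-valued singular fields omitted) and field numbers below 16, so every tag fits in one byte. The class and method names are hypothetical and only for illustration.

```java
final class WireSizeSketch {
    // Bytes needed to encode a value as a varint (7 payload bits per byte).
    // Negative int64 values always take 10 bytes.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Layout 1 (repeated message): every entry pays a tag and a length prefix
    // for the embedded message, plus one tag per field inside it.
    static int head1Size(long[] v1, long[] v2) {
        int total = 0;
        for (int i = 0; i < v1.length; i++) {
            int body = 0;
            // Assumption: proto3, so a singular field equal to 0 is not written.
            if (v1[i] != 0) body += 1 + varintSize(v1[i]);
            if (v2[i] != 0) body += 1 + varintSize(v2[i]);
            total += 1 + varintSize(body) + body;
        }
        return total;
    }

    // One packed repeated field: a single tag, a single length prefix,
    // then the varints back to back.
    static int packedSize(long[] values) {
        if (values.length == 0) {
            return 0; // an empty repeated field is not written at all
        }
        int payload = 0;
        for (long v : values) {
            payload += varintSize(v);
        }
        return 1 + varintSize(payload) + payload;
    }

    // Layout 2 (parallel repeated scalars): two packed fields.
    static int head2Size(long[] v1, long[] v2) {
        return packedSize(v1) + packedSize(v2);
    }
}
```

For 400 entries where every value needs a 3-byte varint (for example 100000), this estimate gives 4000 bytes for the repeated-message layout and 2406 bytes for the two packed fields. The difference is the per-entry message header and field tags.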
I used protobuf-java version 3.4.0.
I ran a test to check this, with and without compression (LZ4); see the
results below (this is similar to a case we have in production).
message Head1 {
  repeated Data d1 = 1;
}

message Data {
  int64 v1 = 1;
  int64 v2 = 2;
}

message Head2 {
  repeated int64 v1 = 1;
  repeated int64 v2 = 2;
}
*With 400 messages of Head1 and Head2 (same random values in each message):*
Message 'Head1' uncompressed data size is: 3985 bytes
Message 'Head1' compressed data size is: *3697* bytes
Message 'Head2' uncompressed data size is: 2391 bytes
Message 'Head2' compressed data size is: *2402* bytes --> 35% less than compressed 'Head1'
*Questions:*
The problem is that I lose the per-entry grouping from the schema on the
application side, and I will have to keep the two lists in Head2 in sync at
all times.
Is this correct, or am I missing something?
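To make the sync concern concrete, here is a minimal sketch of the glue code the application side would need for the parallel-list layout. It works on plain `List<Long>` values such as the ones a generated `Head2` class would return for its repeated fields. The `ParallelColumns` and `Pair` names are hypothetical, and this is only one possible way to do it.

```java
import java.util.ArrayList;
import java.util.List;

final class ParallelColumns {
    // One logical entry, i.e. what a single Data message holds in Head1.
    static final class Pair {
        final long v1;
        final long v2;

        Pair(long v1, long v2) {
            this.v1 = v1;
            this.v2 = v2;
        }
    }

    // Rebuild entries from the two parallel lists. The schema no longer
    // guarantees the lists line up, so the lengths are checked explicitly.
    static List<Pair> zip(List<Long> v1, List<Long> v2) {
        if (v1.size() != v2.size()) {
            throw new IllegalArgumentException(
                "v1 has " + v1.size() + " values but v2 has " + v2.size());
        }
        List<Pair> rows = new ArrayList<>(v1.size());
        for (int i = 0; i < v1.size(); i++) {
            rows.add(new Pair(v1.get(i), v2.get(i)));
        }
        return rows;
    }

    // Split entries back into two lists, appending in the same order to both
    // so that index i in each list belongs to the same entry.
    static void unzip(List<Pair> rows, List<Long> v1Out, List<Long> v2Out) {
        for (Pair row : rows) {
            v1Out.add(row.v1);
            v2Out.add(row.v2);
        }
    }
}
```

Keeping all reads and writes behind helpers like these would at least confine the ordering invariant to one place in the application.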
Adding a new flag to the proto for this could save a lot of data in the
encoded proto (in case it's relevant).
I also tested writing to Cassandra, and the saving is huge (+40%).
Thoughts?