Hi,
I recently read the protocol-buffers encoding documentation
<https://developers.google.com/protocol-buffers/docs/encoding#packed> and
noticed a way to save space: it is better to hold repeated values than a
repeated message that wraps those values.
Protobuf spends extra bytes on every embedded message (a tag holding the field
number and wire type, plus a length prefix) and on every field inside it, so a
repeated message of two int64 fields costs more than two repeated (packed)
int64 fields (int64 is just an example).
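To make the per-entry overhead concrete, here is a minimal sketch that hand-encodes one (v1, v2) pair both ways, following the wire format on the linked page. In Head1 the pair becomes a Data message (entry tag, length prefix, and a tag per field). In Head2 it becomes two bare varints appended to the packed lists. The class and method names are hypothetical. This is a byte-counting model, not protobuf-java's API, and it assumes proto3, where zero-valued fields are omitted.

```java
import java.io.ByteArrayOutputStream;

// Hand-rolled encoder for the protobuf wire format (varints, tags, length
// prefixes), used only to count bytes. Not protobuf-java's API.
public class EntryOverhead {

    // Write v as a base-128 varint; negative int64 values take 10 bytes.
    static void writeVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // A tag is (field_number << 3) | wire_type, itself written as a varint.
    static void writeTag(ByteArrayOutputStream out, int field, int wireType) {
        writeVarint(out, ((long) field << 3) | wireType);
    }

    // One Data{v1, v2} entry as it appears inside Head1 (field 1, wire type 2).
    static byte[] head1Entry(long v1, long v2) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        if (v1 != 0) { writeTag(data, 1, 0); writeVarint(data, v1); } // Data.v1
        if (v2 != 0) { writeTag(data, 2, 0); writeVarint(data, v2); } // Data.v2
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTag(out, 1, 2);               // Head1.d1 tag, once per entry
        writeVarint(out, data.size());     // length prefix, once per entry
        out.write(data.toByteArray(), 0, data.size());
        return out.toByteArray();
    }

    // The same pair in Head2's packed lists: only the varints. The tag and
    // length prefix of each list are paid once per list, not per pair.
    static byte[] head2Values(long v1, long v2) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, v1);
        writeVarint(out, v2);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("Head1 entry: " + head1Entry(300, 5).length + " bytes");
        System.out.println("Head2 values: " + head2Values(300, 5).length + " bytes");
    }
}
```

For v1 = 300 and v2 = 5 this prints 7 bytes for the Head1 entry and 3 bytes for the Head2 values. The 4-byte difference is the entry tag, the length prefix and the two field tags.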

I used protobuf-java 3.4.0 and wrote a test to check this, with and without
LZ4 compression. The results are below (the setup is similar to a case we have
in production):

syntax = "proto3";  // repeated scalar fields are packed by default in proto3

message Head1 {
    repeated Data d1 = 1;
}

message Data {
    int64 v1 = 1;
    int64 v2 = 2;
}

message Head2 {
    repeated int64 v1 = 1;
    repeated int64 v2 = 2;
}

*With 400 entries (400 Data messages in Head1, 400 values per list in Head2), using the same random values in both:*
Message 'Head1': uncompressed 3985 bytes, LZ4-compressed *3697* bytes

Message 'Head2': uncompressed 2391 bytes, LZ4-compressed *2402* bytes   --> 35% less compressed (40% less uncompressed)
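The uncompressed gap can be checked against the wire format. Here is a minimal sketch, assuming proto3, non-zero values and field numbers below 16 (one-byte tags). Head1 pays 4 bytes per pair: an entry tag, a one-byte length prefix and two field tags. Head2 pays one tag and one length prefix per list. With 400 pairs and lists longer than 127 bytes (two-byte length prefixes), that predicts 4 x 400 - 6 = 1594 bytes, which matches 3985 - 2391 above. The value range in main() is a hypothetical stand-in, since the original test's distribution isn't given.

```java
import java.util.Random;

// Byte-count model of Head1 vs Head2 following the protobuf wire format.
// Assumes proto3 (packed repeated scalars, zero-valued fields omitted) and
// field numbers < 16, so every tag is one byte.
public class LayoutSizes {

    // Bytes needed for v as an unsigned base-128 varint (1..10).
    static int varintSize(long v) {
        int n = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    // Head1: tag + length prefix per Data entry, plus a tag per non-zero field.
    static long head1Size(long[] v1, long[] v2) {
        long total = 0;
        for (int i = 0; i < v1.length; i++) {
            int payload = 0;
            if (v1[i] != 0) payload += 1 + varintSize(v1[i]);
            if (v2[i] != 0) payload += 1 + varintSize(v2[i]);
            total += 1 + varintSize(payload) + payload;
        }
        return total;
    }

    // One packed field: a single tag + length prefix, then the raw varints.
    static long packedSize(long[] values) {
        if (values.length == 0) return 0; // empty repeated fields are not written
        long payload = 0;
        for (long v : values) payload += varintSize(v);
        return 1 + varintSize(payload) + payload;
    }

    static long head2Size(long[] v1, long[] v2) {
        return packedSize(v1) + packedSize(v2);
    }

    public static void main(String[] args) {
        int n = 400;
        Random rnd = new Random(42);
        long[] v1 = new long[n];
        long[] v2 = new long[n];
        for (int i = 0; i < n; i++) {
            // Hypothetical value range; not the distribution of the original test.
            v1[i] = 1 + rnd.nextInt(1_000_000);
            v2[i] = 1 + rnd.nextInt(1_000_000);
        }
        long h1 = head1Size(v1, v2);
        long h2 = head2Size(v1, v2);
        System.out.println("Head1: " + h1 + " bytes, Head2: " + h2
                + " bytes, diff: " + (h1 - h2));
    }
}
```

For 400 non-zero pairs the printed diff is 1594 regardless of the values, because the varints themselves are shared by both layouts. Only the absolute sizes change with the value range.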

*Questions:*
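The problem is that on the app side I lose the structure the schema gave me
(each v1 paired with its v2 in one Data message), and I have to keep the two
lists in Head2 in sync (same length, same order) all the time.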
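Here is a minimal sketch of one way to contain the sync problem. It assumes the app keeps working with pairs and only converts to and from the two Head2 lists at the serialization boundary. ColumnarPairs and Pair are hypothetical names. With generated protobuf-java classes, the two lists would correspond to accessors such as getV1List() and addAllV1(...) on Head2 and its builder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical app-side helper: the app works with (v1, v2) pairs, and the
// parallel lists of Head2 are only built/read here, so they cannot drift
// apart elsewhere in application code.
public class ColumnarPairs {

    static final class Pair {
        final long v1;
        final long v2;
        Pair(long v1, long v2) { this.v1 = v1; this.v2 = v2; }
    }

    // Rows -> columns: fill the lists that would go into Head2.v1 / Head2.v2.
    static void toColumns(List<Pair> rows, List<Long> v1, List<Long> v2) {
        for (Pair p : rows) {
            v1.add(p.v1);
            v2.add(p.v2);
        }
    }

    // Columns -> rows: reject lists of different lengths instead of
    // silently pairing the wrong values.
    static List<Pair> fromColumns(List<Long> v1, List<Long> v2) {
        if (v1.size() != v2.size()) {
            throw new IllegalArgumentException(
                "v1 has " + v1.size() + " values but v2 has " + v2.size());
        }
        List<Pair> rows = new ArrayList<>(v1.size());
        for (int i = 0; i < v1.size(); i++) {
            rows.add(new Pair(v1.get(i), v2.get(i)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Pair> rows = List.of(new Pair(1, 10), new Pair(2, 20));
        List<Long> v1 = new ArrayList<>();
        List<Long> v2 = new ArrayList<>();
        toColumns(rows, v1, v2);
        System.out.println(v1 + " " + v2);                 // [1, 2] [10, 20]
        System.out.println(fromColumns(v1, v2).get(1).v2); // 20
    }
}
```

Keeping the conversion in one place means ordering only has to be right in two small methods, and a message with mismatched lists fails loudly when it is read.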

Is this correct, or am I missing something?

Adding a new flag to the proto definition for this layout (where it's
relevant) could save a lot of space in encoded messages.

I also tested writing to Cassandra, and the saving is huge: 40% or more!

Thoughts? 

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/protobuf/69b44003-5821-4678-9ba7-18c1a7a05ee5%40googlegroups.com.