Re: [protobuf] Reduce protobuf with repeated messages size

2019-05-13 Thread Nadav Samet
A few more thoughts:
- Random data doesn't tend to compress well - try to measure the benefit of
compression for your preferred message layout using data that is typical
for your application.
- It's always possible to add helper functions/classes to make it easier to
deal with inconvenient message layouts.
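
As a rough illustration of the first point, the sketch below compares how well random values and more regular values compress. It rests on several assumptions: zlib stands in for LZ4 (only because it ships with Python), the values are packed as fixed 8-byte integers rather than protobuf-encoded, and the "typical" data (slowly increasing timestamps) is hypothetical. Real measurements should use your own serialized messages.

```python
import random
import struct
import zlib


def compressed_ratio(values):
    """Pack values as 8-byte little-endian integers; return compressed/raw size."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw)) / len(raw)


rng = random.Random(0)  # fixed seed so runs are repeatable
random_values = [rng.getrandbits(63) for _ in range(400)]

# Hypothetical "typical" data: timestamps that increase in small steps.
typical_values = [1_557_700_000 + 10 * i for i in range(400)]

random_ratio = compressed_ratio(random_values)
typical_ratio = compressed_ratio(typical_values)
print(f"random data:  {random_ratio:.2f} of original size")
print(f"typical data: {typical_ratio:.2f} of original size")
```

The random input should stay close to its original size, while the regular input shrinks noticeably, which is why a benchmark on random values can understate what compression does for production data.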

-Nadav
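
For the second point, a helper along these lines could hide the parallel-list layout of Head2 from application code. This is only a sketch: `PairedColumns` is a hypothetical name, and it wraps plain Python lists rather than a generated protobuf class.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class PairedColumns:
    """Hypothetical wrapper that keeps two parallel lists the same length."""

    v1: List[int] = field(default_factory=list)
    v2: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.v1) != len(self.v2):
            raise ValueError("v1 and v2 must have the same length")

    def append(self, a: int, b: int) -> None:
        # Always add to both lists together so they cannot drift apart.
        self.v1.append(a)
        self.v2.append(b)

    def __iter__(self) -> Iterator[Tuple[int, int]]:
        # Present the two columns as a sequence of (v1, v2) rows.
        return iter(zip(self.v1, self.v2))

    def __len__(self) -> int:
        return len(self.v1)


pairs = PairedColumns()
pairs.append(10, 20)
pairs.append(30, 40)
rows = list(pairs)
print(rows)
```

Callers only ever see rows, so the compact column layout stays an encoding detail.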

On Mon, May 13, 2019 at 2:25 PM 'Adam Cozzette' via Protocol Buffers <
protobuf@googlegroups.com> wrote:

> I have not looked into size savings from compression, but your
> uncompressed sizes sound right assuming about 3 bytes per int64 (those use
> a variable-size integer so the size depends on the value being stored). I
> think there's a tradeoff here between ease of use and serialized size, but
> if it's important for your use case to keep serialized size small then this
> technique sounds like one which is worth considering for sure.
>
> *From: *'Boaz Yaniv' via Protocol Buffers 
> *Date: *Sun, May 12, 2019 at 6:41 AM
> *To: *Protocol Buffers
>
>> Hi,
>> I recently read about the protocol-buffers encoding and noticed a way to
>> save space.
>> It is better to hold repeated values than a repeated message wrapping
>> those values: protobuf stores extra data for each message
>> (header/type/length), so a repeated message of two int64 fields costs more
>> than two repeated int64 fields (int64 as an example).
>>
>> I used protobuf-java version 3.4.0 and ran a test to check this, with and
>> without compression (LZ4). See the results below (this is similar to a
>> case we have in production).
>>
>> message Head1 {
>>   repeated Data d1 = 1;
>> }
>>
>> message Data {
>>   int64 v1 = 1;
>>   int64 v2 = 2;
>> }
>>
>> message Head2 {
>>   repeated int64 v1 = 1;
>>   repeated int64 v2 = 2;
>> }
>>
>> *With 400 messages of Head1 and Head2 (same random values in each
>> message):*
>> Message 'Head1' Uncompressed data size is: 3985 bytes
>> Message 'Head1' compressed data size is: *3697* bytes
>>
>> Message 'Head2' Uncompressed data size is: 2391 bytes
>> Message 'Head2' compressed data size is: *2402* bytes   --> 35% less
>>
>> *Questions:*
>> The problem is that I lose schema ordering on the app side, and I will
>> have to keep the lists (in Head2) synced all the time.
>>
>> Is this correct, or am I missing something?
>>
>> Adding a new flag to the proto for this could save a lot of data in the
>> encoded proto (in case it's relevant).
>>
>> I also tested writing to Cassandra, and the saving is huge (+40%).
>>
>> Thoughts?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Protocol Buffers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to protobuf+unsubscr...@googlegroups.com.
>> To post to this group, send email to protobuf@googlegroups.com.
>> Visit this group at https://groups.google.com/group/protobuf.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/protobuf/69b44003-5821-4678-9ba7-18c1a7a05ee5%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>


Re: [protobuf] Reduce protobuf with repeated messages size

2019-05-13 Thread 'Adam Cozzette' via Protocol Buffers
I have not looked into size savings from compression, but your uncompressed
sizes sound right assuming about 3 bytes per int64 (those use a
variable-size integer so the size depends on the value being stored). I
think there's a tradeoff here between ease of use and serialized size, but
if it's important for your use case to keep serialized size small then this
technique sounds like one which is worth considering for sure.
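
The uncompressed sizes can be roughly reproduced with some arithmetic. The sketch below assumes 3 bytes per int64 value (as estimated above), one-byte tags for field numbers 1 and 2, and proto3 syntax, in which repeated scalar fields are packed by default. The constant and variable names are chosen for illustration only.

```python
def varint_size(n: int) -> int:
    """Bytes needed to encode a non-negative integer as a base-128 varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


COUNT = 400
VALUE_BYTES = 3  # assumption: about 3 bytes per int64 varint
TAG_BYTES = 1    # field numbers 1 and 2 fit in a one-byte tag

# Head1: every Data element is a length-delimited submessage:
# outer tag + length prefix + (tag + value) for v1 and for v2.
data_payload = 2 * (TAG_BYTES + VALUE_BYTES)
head1_size = COUNT * (TAG_BYTES + varint_size(data_payload) + data_payload)

# Head2: each repeated int64 field is packed:
# one tag + one length prefix + all values back to back.
packed_payload = COUNT * VALUE_BYTES
head2_size = 2 * (TAG_BYTES + varint_size(packed_payload) + packed_payload)

print(head1_size, head2_size)
```

Under these assumptions the estimate comes to 4000 bytes for Head1 and 2406 bytes for Head2, close to the reported 3985 and 2391. The difference comes almost entirely from the per-element tags and length prefixes that Head1 pays 400 times.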

*From: *'Boaz Yaniv' via Protocol Buffers 
*Date: *Sun, May 12, 2019 at 6:41 AM
*To: *Protocol Buffers

> Hi,
> I recently read about the protocol-buffers encoding and noticed a way to
> save space.
> It is better to hold repeated values than a repeated message wrapping
> those values: protobuf stores extra data for each message
> (header/type/length), so a repeated message of two int64 fields costs more
> than two repeated int64 fields (int64 as an example).
>
> I used protobuf-java version 3.4.0 and ran a test to check this, with and
> without compression (LZ4). See the results below (this is similar to a
> case we have in production).
>
> message Head1 {
>   repeated Data d1 = 1;
> }
>
> message Data {
>   int64 v1 = 1;
>   int64 v2 = 2;
> }
>
> message Head2 {
>   repeated int64 v1 = 1;
>   repeated int64 v2 = 2;
> }
>
> *With 400 messages of Head1 and Head2 (same random values in each
> message):*
> Message 'Head1' Uncompressed data size is: 3985 bytes
> Message 'Head1' compressed data size is: *3697* bytes
>
> Message 'Head2' Uncompressed data size is: 2391 bytes
> Message 'Head2' compressed data size is: *2402* bytes   --> 35% less
>
> *Questions:*
> The problem is that I lose schema ordering on the app side, and I will
> have to keep the lists (in Head2) synced all the time.
>
> Is this correct, or am I missing something?
>
> Adding a new flag to the proto for this could save a lot of data in the
> encoded proto (in case it's relevant).
>
> I also tested writing to Cassandra, and the saving is huge (+40%).
>
> Thoughts?
>


Re: [protobuf] Proposal: a mechanism to deal with sensitive/redacted fields in string output

2019-05-13 Thread 'Adam Cozzette' via Protocol Buffers
*From: *Zellyn Hunter 
*Date: *Mon, May 13, 2019 at 8:16 AM
*To: *Adam Cozzette
*Cc: *Josh Humphries, Protocol Buffers

On Fri, May 10, 2019 at 6:06 PM Adam Cozzette  wrote:
>
>> I asked for feedback about this proposal within Google and unfortunately
>> it sounds like there's not a lot of support for accepting this kind of
>> change. The general feedback I got was that it's best to simply avoid
>> printing out any protos at all if they might contain sensitive information.
>> This kind of feature might provide a false sense of security and encourage
>> developers to print out protos that haven't necessarily been fully
>> annotated with the sensitive field option. There was some agreement that in
>> Java it is particularly easy to print stringified protos by accident, but
>> it seems that ideally we would want to disable that behavior entirely
>> rather than redacting particular fields.
>>
>
> For what it's worth, when discussing this before, some folks on the
> Protobuf Team mentioned that the parts of Google that deal with financial
> transactions actually have something similar to our proposal. Or at least
> something that accomplishes the same goal.
>

That is true, but from what I understand that solution is a bit different.
It is built as libraries and tools on top of protobuf, so while it has the
advantage of not needing access to protobuf internals, it doesn't really
prevent accidental stringification the same way. You would have to call a
library function to sanitize a message, so this would not just happen
automatically.



Re: [protobuf] reading binary data

2019-05-13 Thread 'Adam Cozzette' via Protocol Buffers
Protobuf uses its own binary format, so it's unlikely that you would be
able to use protobuf to parse binary data written by another program
that isn't using protobuf. The binary format doesn't carry much metadata;
it's mostly just a series of tag-value pairs (see the protobuf
documentation for more information).
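
To make the tag-value structure concrete, here is a minimal sketch of a scanner for the top-level fields of a serialized message. It is not the protobuf library's API: `read_varint` and `scan_fields` are hypothetical helpers, and the sketch skips error handling and the deprecated group wire types.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def scan_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        # Each field starts with a varint key: (field_number << 3) | wire_type.
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, submessage, packed
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Field 1 holds the varint 150; field 2 holds the two bytes "hi".
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69])
fields = list(scan_fields(sample))
print(fields)
```

The stream carries field numbers and wire types but no field names or schema, so data laid out as raw C structs by a non-protobuf writer will not decode into anything meaningful this way.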

You can definitely set things up to rebuild your binary using a shell
script, and like you mentioned this would usually involve at least two
steps: first calling protoc on your .proto files and then invoking the C++
compiler. Protobuf doesn't have any built-in support for that because this
is really the responsibility of whichever build system you're using.

*From: *Sonny Diaz 
*Date: *Mon, May 13, 2019 at 12:06 PM
*To: *Protocol Buffers

Hi,
>
> May I use protocol buffers strictly for reading binary data? In other
> words, the sender/writer did not use protocol buffers. Or does protocol
> buffers inject some metadata into the binary? I ask because the "optional"
> option makes me think there is metadata. Also, Avro injects the
> protocol into the binary. I am fine with everything being "required".
>
> Usually, we would parse this data with structs in C/C++. However, going
> forward we want to expose the protocol definition to non-programmers, so
> that as a protocol evolves, a non-programmer can just edit a text file.
> That brings up another question: can I make it so that once the proto file
> is edited, the user can just double-click a shell script and the binary is
> updated? I think that is still a bit too much for a non-programmer, but
> from what I am reading, a proto edit requires two compilations (protocol
> buffers and my program), correct? Avro seemed a better fit in that
> respect, but I don't want any metadata to be required for reading.
>
> Thanks
>


[protobuf] reading binary data

2019-05-13 Thread Sonny Diaz
Hi,

May I use protocol buffers strictly for reading binary data? In other
words, the sender/writer did not use protocol buffers. Or does protocol
buffers inject some metadata into the binary? I ask because the "optional"
option makes me think there is metadata. Also, Avro injects the
protocol into the binary. I am fine with everything being "required".

Usually, we would parse this data with structs in C/C++. However, going
forward we want to expose the protocol definition to non-programmers, so
that as a protocol evolves, a non-programmer can just edit a text file.
That brings up another question: can I make it so that once the proto file
is edited, the user can just double-click a shell script and the binary is
updated? I think that is still a bit too much for a non-programmer, but
from what I am reading, a proto edit requires two compilations (protocol
buffers and my program), correct? Avro seemed a better fit in that
respect, but I don't want any metadata to be required for reading.

Thanks



Re: [protobuf] Proposal: a mechanism to deal with sensitive/redacted fields in string output

2019-05-13 Thread Zellyn Hunter
On Fri, May 10, 2019 at 6:06 PM Adam Cozzette  wrote:

> I asked for feedback about this proposal within Google and unfortunately
> it sounds like there's not a lot of support for accepting this kind of
> change. The general feedback I got was that it's best to simply avoid
> printing out any protos at all if they might contain sensitive information.
> This kind of feature might provide a false sense of security and encourage
> developers to print out protos that haven't necessarily been fully
> annotated with the sensitive field option. There was some agreement that in
> Java it is particularly easy to print stringified protos by accident, but
> it seems that ideally we would want to disable that behavior entirely
> rather than redacting particular fields.
>

For what it's worth, when discussing this before, some folks on the
Protobuf Team mentioned that the parts of Google that deal with financial
transactions actually have something similar to our proposal. Or at least
something that accomplishes the same goal.


> I gather that Square is already relying on this functionality in its
> internal protobuf fork, so I would say if it helps we could probably at
> least try to refactor things to minimize the complexity of maintaining that
> behavior difference.
>
That would be super-helpful. I'll have to catch up on the current state of
protobuf library code and figure out how to allow convenient interception.

Zellyn
