Re: [protobuf] Re: Streaming Serialization - Suggestion

2016-04-01 Thread 'Feng Xiao' via Protocol Buffers
On Wed, Mar 30, 2016 at 5:27 PM, Yoav H  wrote:

> I saw the start/end group tags but I couldn't find any information on them or
> how to use them.
>
> Your point about skipping fields makes sense.
> I think it is also solvable by applying the same idea of chunked
> encoding, even on sub-fields.
> So instead of writing the full length of the child field, you allow the
> serializer to write it in smaller chunks.
> The deserializer can then just read the chunk markings and skip them.
> A very basic serializer can put just one chunk (which will be equivalent
> to the current implementation, plus one more zero marking at the end), but
> it allows a more efficient serializer to stream data.
>
> Regarding adding something to the encoding spec, are you allowing proto2
> serializers to call into proto3 deserializers and vice versa?
> I thought that if you have a protoX server, you expect clients to take the
> protoX file and generate a client out of it, which will match that proto
> version encoding. Isn't that the case?
>
Proto2 and proto3 are wire-compatible. We already have a lot of proto3
clients communicating with proto2 servers or vice versa. Like Josh
mentioned, we can't change proto3's wire format now.
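
For context on why that interoperability works: proto2 and proto3 share the same key
encoding for every field, (field_number << 3) | wire_type written as a varint, so a
parser generated from either version can read the other's output. A small illustrative
sketch of that formula (the code below is an illustration added here, not from the thread):

    def encode_varint(value):
        # Encode a non-negative integer as a base-128 varint.
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    def field_key(field_number, wire_type):
        # A field's key is (field_number << 3) | wire_type, encoded as a varint.
        return encode_varint((field_number << 3) | wire_type)

    # Field 1 as a length-delimited value (wire type 2) is the byte 0x0A under
    # both proto2 and proto3; the key and wire-type scheme never changed.
    assert field_key(1, 2) == b"\x0a"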


>
> Thanks,
> Yoav.

Re: [protobuf] Re: Streaming Serialization - Suggestion

2016-04-01 Thread 'Josh Haberman' via Protocol Buffers
Hi Yoav,

Chunked encoding is definitely an interesting idea, and I can see the 
benefits you mentioned. However, proto2 and proto3 are more or less frozen 
from a wire perspective. There are lots of existing clients out there 
already communicating with proto3, so we're not really at liberty to make 
any changes. Sorry about that.

Best,
Josh


Re: [protobuf] Re: Streaming Serialization - Suggestion

2016-03-30 Thread Yoav H
I saw the start/end group tags but I couldn't find any information on them or
how to use them.

Your point about skipping fields makes sense.
I think it is also solvable by applying the same idea of chunked
encoding, even on sub-fields.
So instead of writing the full length of the child field, you allow the 
serializer to write it in smaller chunks.
The deserializer can then just read the chunk markings and skip them.
A very basic serializer can put just one chunk (which will be equivalent to 
the current implementation, plus one more zero marking at the end), but it 
allows a more efficient serializer to stream data.
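
To make the skipping part concrete, here is a small sketch of how a parser could skip
such a chunked field without decoding its contents. This is a hypothetical framing for
illustration only (it is not part of the protobuf wire format): each chunk is a varint
length followed by that many payload bytes, and a zero length ends the field.

    import io

    def read_varint(stream):
        # Read a base-128 varint from the stream.
        result, shift = 0, 0
        while True:
            byte = stream.read(1)
            if not byte:
                raise EOFError("truncated varint")
            result |= (byte[0] & 0x7F) << shift
            if not byte[0] & 0x80:
                return result
            shift += 7

    def skip_chunked_field(stream):
        # Hop from chunk to chunk; nothing is buffered or decoded.
        while True:
            length = read_varint(stream)
            if length == 0:                     # zero-length chunk ends the field
                return
            stream.seek(length, io.SEEK_CUR)    # or stream.read(length) if not seekable

    # Example: two chunks ("ab", "cde"), the zero terminator, then one more byte.
    buf = io.BytesIO(b"\x02ab\x03cde\x00\x2a")
    skip_chunked_field(buf)
    assert buf.read(1) == b"\x2a"

So a skipping parser pays one varint read per chunk rather than one decode per field,
which is the trade-off being discussed.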

Regarding adding something to the encoding spec, are you allowing proto2 
serializers to call into proto3 deserializers and vice versa?
I thought that if you have a protoX server, you expect clients to take the 
protoX file and generate a client out of it, which will match that proto 
version encoding. Isn't that the case?

Thanks,
Yoav.


Re: [protobuf] Re: Streaming Serialization - Suggestion

2016-03-29 Thread David Yu
On Wed, Mar 30, 2016 at 8:06 AM, 'Feng Xiao' via Protocol Buffers <
protobuf@googlegroups.com> wrote:

>
>
> On Mon, Mar 28, 2016 at 10:53 PM, Yoav H  wrote:
>
>> They say on their website: "When evaluating new features, we look for
>> additions that are very widely useful or very simple".
>> What I'm suggesting here is both very useful (speeding up serialization
>> and eliminating memory duplication) and very simple (simple additions to
>> the encoding, no need to change the language).
>> So far, no response from the Google guys...
>>
> Actually there are already a "start embedding" tag and an "end embedding"
> tag in protobuf:
> https://developers.google.com/protocol-buffers/docs/encoding#structure
>
> 3: Start group (used for groups; deprecated)
> 4: End group (used for groups; deprecated)
>
> They are deprecated though.
>
> You mentioned it will be a performance gain, but what we experienced in
> Google says otherwise. For example, in a lot of places we are only interested
> in a few fields and want to skip through all other fields (if we are
> building a proxy, or the field is simply an unknown field). The start
> group/end group tag pair forces the parser to decode every single field in
> the whole group even if the whole group is to be ignored after parsing, and
> that's a very significant drawback.
>
This is definitely the use-case where delimiting makes perfect sense
(proxy/middleware service that reads part of a message).
The name 'protocol buffers' does kinda make that use-case obvious.
If using protobuf to simply serialize/deserialize, then start/end group
would definitely benefit the streaming use-case.
Shameless plug: https://github.com/protostuff/protostuff optimizes for the
latter use-case, which was mostly the reason it was created (Java only, though).


Re: [protobuf] Re: Streaming Serialization - Suggestion

2016-03-29 Thread 'Feng Xiao' via Protocol Buffers
On Mon, Mar 28, 2016 at 10:53 PM, Yoav H  wrote:

> They say on their website: "When evaluating new features, we look for
> additions that are very widely useful or very simple".
> What I'm suggesting here is both very useful (speeding up serialization
> and eliminating memory duplication) and very simple (simple additions to
> the encoding, no need to change the language).
> So far, no response from the Google guys...
>
Actually there are already a "start embedding" tag and an "end embedding"
tag in protobuf:
https://developers.google.com/protocol-buffers/docs/encoding#structure

3: Start group (used for groups; deprecated)
4: End group (used for groups; deprecated)

They are deprecated though.

You mentioned it will be a performance gain, but what we experienced in
Google says otherwise. For example, in a lot of places we are only interested
in a few fields and want to skip through all other fields (if we are
building a proxy, or the field is simply an unknown field). The start
group/end group tag pair forces the parser to decode every single field in
the whole group even if the whole group is to be ignored after parsing, and
that's a very significant drawback.
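
To illustrate the cost difference in rough Python-style pseudocode (this is a sketch
added for illustration, not the actual parser): skipping a length-delimited field is one
varint read plus one seek, while skipping a group means re-running the key/wire-type
loop over every nested field until the matching end-group tag.

    import io

    def read_varint(stream):
        result, shift = 0, 0
        while True:
            b = stream.read(1)[0]
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result
            shift += 7

    # Wire types from the encoding doc: 0 varint, 1 64-bit, 2 length-delimited,
    # 3 start group (deprecated), 4 end group (deprecated), 5 32-bit.
    def skip_field(stream, wire_type):
        if wire_type == 0:
            read_varint(stream)                              # consume the varint
        elif wire_type == 1:
            stream.seek(8, io.SEEK_CUR)
        elif wire_type == 2:
            stream.seek(read_varint(stream), io.SEEK_CUR)    # one hop, regardless of content
        elif wire_type == 3:
            while True:                                      # must walk every nested field
                key = read_varint(stream)
                if key & 7 == 4:                             # reached the matching end-group tag
                    return
                skip_field(stream, key & 7)
        elif wire_type == 5:
            stream.seek(4, io.SEEK_CUR)
        else:
            raise ValueError("unknown wire type %d" % wire_type)

The last branch also previews the next point: every existing parser contains a switch
like this, so output that uses a brand-new wire type falls straight into that error path.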

And adding a new wire tag type to protobuf is not a simple thing. Actually
I don't think we have added any new wire type to protobuf before. There are
a lot of issues to consider. For example, isn't all code that switches on
protobuf wire types now suddenly broken? If a new serializer uses the new
wire type in its output, what will happen if the parsers can't understand
it?

Proto3 is already finalized and we will not add new wire types in proto3.
Whether to add it in proto4 depends on whether we have a good use for it
and whether we can mitigate the risks of rolling out a new wire type.



Re: [protobuf] Re: Streaming Serialization - Suggestion

2016-03-28 Thread Yoav H
They say on their website: "When evaluating new features, we look for 
additions that are very widely useful or very simple".
What I'm suggesting here is both very useful (speeding up serialization and 
eliminating memory duplication) and very simple (simple additions to the 
encoding, no need to change the language).
So far, no response from the Google guys...




Re: [protobuf] Re: Streaming Serialization - Suggestion

2016-03-28 Thread Peter Hultqvist
This exact suggestion came up for discussion a long time ago (years ago,
before proto2?).

When it comes to taking suggestions I'm only a third-party implementer, but my
understanding is that the design process of protocol buffers and its goals
are internal to Google, and they usually publish new versions of their code
implementing new features before you can read about them in the documentation.


[protobuf] Re: Streaming Serialization - Suggestion

2016-03-26 Thread Yoav H
Any comment on this?
Will you consider this for proto3?

On Wednesday, March 23, 2016 at 11:50:36 AM UTC-7, Yoav H wrote:
>
> Hi,
>
> I have a suggestion for improving the protobuf encoding.
> Is proto3 final?
>
> I like the simplicity of the encoding of protobuf.
> But I think it has one issue with serialization using streams.
> The problem is with length-delimited fields and the fact that they require
> knowing the length ahead of time.
> If we have a very long string, we need to encode the entire string before 
> we know its length, so we basically duplicate the data in memory.
> Same is true for embedded messages, where we need to encode the entire 
> embedded message before we can append it to the stream.
>
> I think there is a simple solution for both issues.
>
> For strings and byte arrays, a simple solution is to use "chunked
> encoding", which means that the byte array is split into chunks and every
> chunk starts with its length. The end of the array is indicated by a length
> of zero.
>
> For embedded messages, the solution is to have a "start embedding" tag
> and an "end embedding" tag.
> Everything in between is the embedded message.
>
> By adding these two new features, serialization can be fully streamable 
> and there is no need to pre-serialize big chunks in memory before writing 
> them to the stream.
>
> Hope you'll find this suggestion useful and incorporate it into the 
> protocol.
>
> Thanks,
> Yoav.
>
>
>
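
As a closing illustration of the proposal quoted above, here is a rough sketch of what a
streaming serializer could look like under the suggested extensions. Everything here is
hypothetical: the chunk framing, the START_EMBED/END_EMBED marker values, and the
function names are made up for illustration and are not part of the protobuf encoding.

    import io

    START_EMBED = 0xF0   # made-up marker bytes, purely illustrative
    END_EMBED = 0xF1

    def write_varint(out, value):
        # Write a non-negative integer as a base-128 varint.
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.write(bytes([byte | 0x80]))
            else:
                out.write(bytes([byte]))
                return

    def write_chunked_bytes(out, source, chunk_size=8192):
        # Stream a long value without knowing its total length up front:
        # emit (length, data) chunks and finish with a zero-length chunk.
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            write_varint(out, len(chunk))
            out.write(chunk)
        write_varint(out, 0)

    def write_embedded(out, write_body):
        # Frame an embedded message with markers instead of a length prefix,
        # so the nested fields can be streamed directly to the output.
        out.write(bytes([START_EMBED]))
        write_body(out)
        out.write(bytes([END_EMBED]))

    # Example: stream a 20 KB value without ever holding a length-prefixed copy.
    buf = io.BytesIO()
    write_chunked_bytes(buf, io.BytesIO(b"x" * 20000))

A minimal serializer could emit each value as a single chunk (one length prefix plus the
trailing zero byte), which is essentially today's encoding plus one extra byte, while a
smarter serializer could stream arbitrarily large values in pieces, as Yoav notes
elsewhere in the thread.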
