# Re: [protobuf] Re: Streaming Serialization - Suggestion

I saw the start/end group tags, but I couldn't find any information on them or
how to use them.

I think this is also solvable by applying the same idea of chunked
encoding, even to sub-fields.
Instead of writing the full length of the child field up front, the
serializer would be allowed to write it in smaller chunks.
The deserializer can then just read the chunk markers and skip over them.
A very basic serializer can emit just one chunk (which is equivalent to
the current implementation, plus one extra zero marker at the end), but it
allows a more efficient serializer to stream data.
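To make the idea concrete, here is a minimal sketch in Python of what such chunked length-delimited encoding could look like. The chunk size and the zero-length end marker are my own illustrative choices, not part of any protobuf spec; the varint format matches protobuf's base-128 varints.

```python
import io

def write_varint(out, value):
    """Write a non-negative integer as a base-128 varint (protobuf style)."""
    while True:
        b = value & 0x7F
        value >>= 7
        out.write(bytes([b | (0x80 if value else 0)]))
        if not value:
            return

def read_varint(inp):
    """Read a base-128 varint back from the stream."""
    shift = result = 0
    while True:
        b = inp.read(1)[0]
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7

def write_chunked(out, data, chunk_size=4):
    """Hypothetical chunked field encoding: each chunk is prefixed with
    its length as a varint; a zero-length marker terminates the field.
    The serializer never needs to know the total length up front."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        write_varint(out, len(chunk))
        out.write(chunk)
    write_varint(out, 0)  # end-of-field marker

def read_chunked(inp):
    """Reassemble a chunked field by reading chunks until the zero marker."""
    parts = []
    while True:
        n = read_varint(inp)
        if n == 0:
            return b"".join(parts)
        parts.append(inp.read(n))
```

A trivial serializer that already has the whole value in memory would just call `write_chunked` with `chunk_size=len(data)`, producing a single chunk plus the zero marker, which is the "one more zero marking" overhead mentioned above.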

Regarding adding something to the encoding spec: do you allow proto2
serializers to talk to proto3 deserializers and vice versa?
I thought that if you run a protoX server, you expect clients to take the
protoX file and generate a client from it, which will match that proto
version's encoding. Isn't that the case?

Thanks,
Yoav.

On Tuesday, March 29, 2016 at 5:06:46 PM UTC-7, Feng Xiao wrote:
>
>
>
> On Mon, Mar 28, 2016 at 10:53 PM, Yoav H <joe.dai...@gmail.com> wrote:
>
>> They say on their website: "When evaluating new features, we look for
>> additions that are very widely useful or very simple".
>> What I'm suggesting here is both very useful (speeding up serialization
>> and eliminating memory duplication) and very simple (simple additions to
>> the encoding, no need to change the language).
>> So far, no response from the Google guys...
>>
> Actually there are already a "start group" tag and an "end group" tag in
> protobuf's wire format:
>
> 3  Start group  groups (deprecated)
> 4  End group    groups (deprecated)
>
> They are deprecated though.
>
> You mentioned it would be a performance gain, but our experience at Google
> says otherwise. For example, in a lot of places we are only interested
> in a few fields and want to skip all other fields (if we are
> building a proxy, or the field is simply an unknown field). The start
> group/end group tag pair forces the parser to decode every single field in
> the whole group even when the whole group is to be discarded after parsing,
> and that's a very significant drawback.
>
> And adding a new wire type to protobuf is not a simple thing. Actually
> I don't think we have ever added a new wire type to protobuf. There are
> a lot of issues to consider. For example, wouldn't all code that switches on
> protobuf wire types suddenly be broken? If a new serializer uses the new
> wire type in its output, what happens when parsers can't understand it?
>
> Proto3 is already finalized and we will not add new wire types in proto3.
> Whether to add it in proto4 depends on whether we have a good use for it
> and whether we can mitigate the risks of rolling out a new wire type.
>
>
>>
>>
>> On Monday, March 28, 2016 at 10:24:17 AM UTC-7, Peter Hultqvist wrote:
>>>
>>> This exact suggestion came up for discussion a long time ago (years?,
>>> before proto2?).
>>>
>>> When it comes to taking suggestions, I'm only a third-party implementer,
>>> but my understanding is that the design process of protocol buffers and
>>> its goals are internal to Google; they usually publish new versions of
>>> their code implementing new features before you can read about them in
>>> the documentation.
>>> On Mar 27, 2016 5:31 AM, "Yoav H" <joe.dai...@gmail.com> wrote:
>>>
>>>> Any comment on this?
>>>> Will you consider this for proto3?
>>>>
>>>> On Wednesday, March 23, 2016 at 11:50:36 AM UTC-7, Yoav H wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have a suggestion for improving the protobuf encoding.
>>>>> Is proto3 final?
>>>>>
>>>>> I like the simplicity of the encoding of protobuf.
>>>>> But I think it has one issue with serialization, using streams.
>>>>> The problem is with length-delimited fields and the fact that they
>>>>> require knowing the length ahead of time.
>>>>> If we have a very long string, we need to encode the entire string
>>>>> before we know its length, so we basically duplicate the data in memory.
>>>>> The same is true for embedded messages, where we need to encode the
>>>>> entire embedded message before we can append it to the stream.
>>>>>
>>>>> I think there is a simple solution for both issues.
>>>>>
>>>>> For strings and byte arrays, a simple solution is to use "chunked
>>>>> encoding":
>>>>> the byte array is split into chunks, and every chunk starts with its
>>>>> length. The end of the array is indicated by a zero length.
>>>>>
>>>>> For embedded messages, the solution is to have a "start embedding"
>>>>> tag and an "end embedding" tag.
>>>>> Everything in between is the embedded message.
>>>>>
>>>>> By adding these two new features, serialization can be fully
>>>>> streamable and there is no need to pre-serialize big chunks in memory
>>>>> before writing them to the stream.
>>>>>
>>>>> Hope you'll find this suggestion useful and incorporate it into the
>>>>> protocol.
>>>>>
>>>>> Thanks,
>>>>> Yoav.
>>>>>
>>>>>
>
>

--
You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email