I should clarify: when talking about "groups" I should emphasise that Google have marked that feature as deprecated. Which is a shame, since I'm still of the opinion that they are at least as good as, and probably better than, sub-messages **on the wire** (how they appear in the class model is of far less interest to me, since I don't use the Google API).
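To make the wire-level comparison concrete, here is a rough sketch in Python (not taken from any particular protobuf implementation; the helper names `tag`, `as_submessage` and `as_group` are made up for illustration). It assumes the standard wire types: 2 for length-delimited, 3 for start-group and 4 for end-group.

```python
def encode_varint(value: int) -> bytes:
    """Minimal base-128 varint encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, continuation bit clear
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """Field header: field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def as_submessage(field_number: int, body: bytes) -> bytes:
    """Length-prefixed form: the body length must be known before the body is written."""
    return tag(field_number, 2) + encode_varint(len(body)) + body


def as_group(field_number: int, body: bytes) -> bytes:
    """Group form: start tag, body, end tag. No length is needed up front."""
    return tag(field_number, 3) + body + tag(field_number, 4)


body = b"\x08\x01"  # an inner field 1 (varint) with value 1
print(as_submessage(1, body).hex())  # 0a020801
print(as_group(1, body).hex())       # 0b08010c
```

For a small body and a field number below 16, both forms cost two bytes of overhead. The difference is that the group's overhead does not depend on the body size, so the writer never has to go back and fix up a length.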
Marc

On 14 May 2013 17:47, "Marc Gravell" <marc.grav...@gmail.com> wrote:

> I asked about this a few years ago (feel free to search the archive - I
> couldn't find it; I believe I used the term "subnormal forms" for this).
> IIRC the answer then was along the lines of "hmmm.... looking at the
> current implementation that will probably work, but it isn't guaranteed and
> won't be tested on all platforms; we don't recommend it".
>
> However: I should note that if you want optimal encoding, groups (rather
> than length-prefix) might be worth a look - since the group doesn't demand
> you know the length.
>
> Note that on some implementations the length *is* known in advance, so
> they don't have any overhead here.
>
> Note that 2 bytes *is not enough* to guarantee every scenario, but it is
> probably enough to avoid the large majority of shuffles, if (whichever
> implementation you are using) is doing what I suspect it is doing.
>
> Marc
>
> On 14 May 2013 16:35, <mailto.jo...@gmail.com> wrote:
>
>> I am trying to understand the performance overhead of serializing Google
>> Protocol Buffer messages. One aspect that annoys me a bit is the way
>> submessages are handled, with their variable-size length field. It seems to be
>> optimized for reducing message size and not for serialization performance.
>>
>> Problem:
>> ======
>> The size field that precedes a submessage is of type varint.
>>
>> The number of bytes (octets) needed:
>> submessage size (bytes):               1-127, 128-16383, ...
>> number of bytes for serialized "size": 1    , 2        , ...
>>
>> Since the serialized size varies, we cannot know the start position of the
>> submessage in the stream before the submessage's serialized size is known.
>> I.e. we have to make a guess and/or use temporary buffers for submessage
>> serialization, which will affect serialization performance in a negative way.
>>
>> Question:
>> =======
>> Now to my question: would it be possible (without violating the standard) to
>> force a varint to be 2 bytes long even though the value is less than 128?
>> This could be achieved by padding the value with zeros according to the
>> following scheme:
>>
>> value = b0,b1,b2,b3,b4,b5,b6,b7 where b0=0 (i.e. value < 128)
>>
>> Serialized: 1,b1,b2,b3,b4,b5,b6,b7,0,0,0,0,0,0,0,0
>>
>> Padding with 7 zero bits would add to the overall message size, but
>> reduce serialization time (for submessages < 16384 bytes). Does the standard
>> allow such manipulation? I have tried to decode such a message using protoc
>> but it reports a failure. However, I have not found any description in the
>> Google documentation saying that this is not allowed.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Protocol Buffers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to protobuf+unsubscr...@googlegroups.com.
>> To post to this group, send email to email@example.com.
>> Visit this group at http://groups.google.com/group/protobuf?hl=en.
>> For more options, visit https://groups.google.com/groups/opt_out.
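For reference, the padded-varint idea from the original question can be sketched roughly as follows in Python. This is only an illustration of the scheme: the names `encode_varint_padded` and `write_length_delimited` are hypothetical, and as noted in the thread, acceptance of non-minimal varints by any given decoder is not guaranteed.

```python
def encode_varint(value: int) -> bytes:
    """Minimal base-128 varint encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_padded(value: int, width: int) -> bytes:
    """Encode value in exactly `width` bytes, padding with zero groups."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on all but the last byte
        out.append(byte)
    return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Lenient decoder: returns (value, next position); accepts padded forms."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_length_delimited(buf: bytearray, field_number: int, write_body) -> None:
    """Reserve two length bytes, write the body in place, then patch the length."""
    buf += encode_varint((field_number << 3) | 2)  # wire type 2
    length_pos = len(buf)
    buf += b"\x80\x00"                             # placeholder, patched below
    write_body(buf)
    body_len = len(buf) - length_pos - 2
    # Raises ValueError for bodies of 16384 bytes or more: 2 bytes is not always enough.
    buf[length_pos:length_pos + 2] = encode_varint_padded(body_len, 2)


buf = bytearray()
write_length_delimited(buf, 1, lambda b: b.extend(b"\x08\x01"))
print(buf.hex())  # 0a82000801
```

The body is written straight into the output buffer and nothing is shuffled afterwards, at the cost of one extra byte for every submessage shorter than 128 bytes.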