(I believe the term I failed to use here is "over-long forms")

On Mar 10, 3:48 pm, Marc Gravell <marc.grav...@gmail.com> wrote:
> The variable-length encoding allows multiple representations of the
> same value - for example, 1 could be written as:
>  0000 0001
> or it could be (I believe):
>  1000 0001    1000 0000    0000 0000
> Is there anything in the core (or other) implementations that would
> object to this second form?
> In particular, the scenario I'm thinking about is where a message is
> known to be pretty deep - for example:
>  A
>  > B
>      > C
>         > D
>         > D
>      > C
>         > D
>         > D
> At the moment, my code leaves an optimistic single-byte dummy value
> for the length prefix, and then backfills this value when the length
> is known (i.e. after writing this subtree), shuffling the data if needed.
> I'm toying with making this voluntarily non-minimal; for example it
> might (in some cases) decide to leave a longer (2-5 byte) dummy value,
> and write the alternative form if the value turns out to be small (i.e.
> if actually only 4 bytes were written). This would reduce the number of
> times I need to block-copy the data, since the same data might otherwise
> be copied multiple times: potentially every non-root object could be
> more than 127 bytes and so need a prefix longer than one byte.
> I'm tempted to spike it (to gauge the performance benefit), but I
> don't want to waste my time if this is going to make it incompatible
> with the core implementations (i.e. if they would actively spot this
> and raise an error). And if it *is* valid, would it be possible to
> make that explicit in the encoding spec? (Or indeed, if it *isn't*
> valid, to make that explicit in the encoding spec.)
> Thanks,
> Marc Gravell
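
A minimal Python sketch of the two byte layouts described above, assuming the usual base-128 varint scheme (7 payload bits per byte, least-significant group first, high bit set on every byte except the last). `encode_varint`, `decode_varint` and the `min_width` parameter are hypothetical names for illustration, not part of any protobuf library; `min_width` pads the value with over-long continuation bytes. The decoder here accepts both forms, which says nothing about whether the core implementations do; that is exactly the open question.

```python
from __future__ import annotations


def encode_varint(value: int, min_width: int = 1) -> bytes:
    """Encode a non-negative int as a base-128 varint (hypothetical helper).

    If min_width exceeds the canonical length, extra continuation bytes are
    emitted, producing an "over-long" form of the same value.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        # Keep going while bits remain OR the requested width isn't reached yet.
        if value or len(out) + 1 < min_width:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)  # final byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint at pos; return (value, position after it).

    Note: this sketch does not enforce the 10-byte limit implied by 64-bit values.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(encode_varint(1).hex())               # 01
print(encode_varint(1, min_width=3).hex())  # 818000
```

Both `01` and `81 80 00` decode to the value 1; the padded form simply carries zero-valued 7-bit groups before the terminating byte.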

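The reserve-then-backfill idea can also be sketched in Python. `PrefixWriter`, `begin_field`, `end_field` and `reserve` are hypothetical names chosen for illustration, and the sketch assumes length-delimited fields whose tag is the varint `(field_number << 3) | 2`. When the final length fits in the reserved width, the prefix is written in place as an over-long varint; only when it doesn't does the slice assignment move the trailing bytes (counted in `shifts`).

```python
from __future__ import annotations


def encode_varint(value: int, min_width: int = 1) -> bytes:
    # Base-128 varint, padded with over-long continuation bytes up to min_width.
    if value < 0:
        raise ValueError("non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value or len(out) + 1 < min_width:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


class PrefixWriter:
    """Hypothetical writer: reserve a length prefix, backfill it later."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.shifts = 0  # how many times trailing data had to be moved

    def write_bytes(self, data: bytes) -> None:
        self.buf += data

    def begin_field(self, field_number: int, reserve: int = 1) -> tuple[int, int]:
        # Tag for wire type 2 (length-delimited), then placeholder bytes.
        self.buf += encode_varint((field_number << 3) | 2)
        start = len(self.buf)
        self.buf += bytes(reserve)
        return start, reserve

    def end_field(self, token: tuple[int, int]) -> None:
        start, reserve = token
        body_len = len(self.buf) - start - reserve
        prefix = encode_varint(body_len, min_width=reserve)
        if len(prefix) != reserve:
            self.shifts += 1  # reservation too small: the body must move
        # Splice the prefix in; the tail only shifts if the lengths differ.
        self.buf[start:start + reserve] = prefix


w = PrefixWriter()
outer = w.begin_field(1, reserve=3)
w.write_bytes(b"abcd")
w.end_field(outer)
print(w.buf.hex(), w.shifts)  # 0a84800061626364 0
```

Because nested fields close in stack order, any still-open field starts before the one being closed, so a shift never invalidates an outer field's recorded start position. The trade-off is exactly the one raised in the message: a larger `reserve` avoids copies at the cost of a few wasted bytes per small field.
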
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.