[protobuf] Variant-length encoding : is it OK to be wasteful?

Marc Gravell Wed, 10 Mar 2010 07:48:43 -0800

The variable-length encoding allows multiple representations of the
same value - for example, 1 could be written as:


 0000 0001

or it could be (I believe):

 1000 0001    1000 0000    0000 0000

Is there anything in the core (or other) implementations that would
object to this second form?
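
To make the two forms concrete, here is a minimal Python sketch of a
base-128 varint writer that can pad its output to a minimum width, plus a
plain reader. The names encode_varint, decode_varint and min_bytes are
made up for illustration and are not taken from any protobuf library; the
sketch only shows that a straightforward decoder reads both forms as the
same value, not how any particular implementation behaves.

```python
def encode_varint(value, min_bytes=1):
    """Encode a non-negative int as a base-128 varint.

    min_bytes is a hypothetical knob: the output is padded with
    zero-valued continuation groups until it is at least that long.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        # Keep going if bits remain, or if we have not reached the padding width.
        more = value != 0 or len(out) + 1 < min_bytes
        out.append(low7 | (0x80 if more else 0))
        if not more:
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


short_form = encode_varint(1)               # 0000 0001
long_form = encode_varint(1, min_bytes=3)   # 1000 0001  1000 0000  0000 0000
```

Both short_form and long_form decode to 1 with this reader; the only
difference is how many bytes are consumed.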

In particular, the scenario I'm thinking about is where a message is
known to be pretty deep - for example:

 A
 > B
     > C
        > D
        > D
     > C
        > D
        > D

At the moment, my code leaves an optimistic single-byte dummy-value
for the prefix, and then backfills this value when the length is known
(i.e. after writing this subtree), shuffling the data if needed.

I'm toying with making this deliberately wasteful; for example it might
(in some cases) decide to leave a longer (2-5 byte) dummy value, and write
the padded alternative form if the length turns out to be small (i.e. if
actually only 4 bytes were written). This would reduce the number of
times I need to block-copy the data (noting that it might have to copy
the same data multiple times - potentially every non-root object might
be more than 127 bytes).
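
As a rough sketch of the idea (in Python for brevity, not my actual code):
reserve a fixed-width placeholder for the length, write the sub-message,
then backfill the length as a padded varint so nothing has to be moved.
PREFIX_WIDTH, write_padded_length and write_nested are hypothetical names,
and the field tag that would normally precede the length is left out.

```python
PREFIX_WIDTH = 2  # assumed placeholder size; 2 bytes covers lengths up to 16383


def write_padded_length(buf, at, length, width):
    """Overwrite buf[at:at+width] with length as a varint of exactly width bytes."""
    if length < 0 or length >> (7 * width):
        raise ValueError("length does not fit in the reserved prefix")
    for i in range(width):
        low7 = (length >> (7 * i)) & 0x7F
        # Every byte except the last carries the continuation bit.
        cont = 0x80 if i < width - 1 else 0
        buf[at + i] = low7 | cont


def write_nested(buf, payload_writer, width=PREFIX_WIDTH):
    """Write a length-prefixed sub-message without shuffling any data."""
    start = len(buf)
    buf.extend(b"\x00" * width)        # reserve the dummy prefix
    payload_writer(buf)                # write the subtree (may nest further)
    length = len(buf) - start - width  # payload size is only known now
    write_padded_length(buf, start, length, width)


out = bytearray()
# A 4-byte payload gets the wasteful two-byte prefix 1000 0100  0000 0000.
write_nested(out, lambda b: b.extend(b"abcd"))
```

The trade-off is one extra byte per small sub-message in exchange for
never block-copying a subtree whose length was guessed too small, as long
as the length fits in the reserved width.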

I'm tempted to spike it (to gauge the performance benefit), but I
don't want to waste my time if this is going to make it incompatible
with the core implementations (i.e. if they would actively spot this
and cause an error). And if it *is* valid, would it be possible to
make that explicit in the encoding spec? (Or indeed, if it *isn't* valid,
to make that explicit in the encoding spec instead.)

Thanks,

Marc Gravell

