Re: optional bytes vs repeated bytes

jasonh Mon, 24 Aug 2009 12:17:17 -0700

On Aug 24, 7:51 am, Saptarshi <saptarshi.g...@gmail.com> wrote:
> Hello,
> Suppose I would like to store a type that could be a sequence of raw
> bytes, so
>
>     message ...{
>
>             optional bytes sdata1=1; //A
>             repeated bytes sdata2=2; //B
>             optional BYT sdata3=3; //C
>
> }
>
> message BYT{
> optional uint32 length=1;
> optional bytes data=2;}
>
> Now I have an unsigned char * array which I wish to store in the
> message.
>
> My first approach was using (B), add_sdata2(array[i],i) (something
> similar), but this is 3bytes per byte stored.

It shouldn't be 3 bytes per byte unless you are storing each byte as its
own element: every element costs a 1-byte tag plus a 1-byte length on
top of its payload. If you have a single byte string, add the whole
thing as one repeated element. Also, that add_sdata2() call passes the
index as the length of the string. Given
const char* array[] = { "foo", "bar" }; you want
add_sdata2(array[i], strlen(array[i])); for raw binary data that may
contain 0x00, pass the known length instead of calling strlen().
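
To make the per-element overhead concrete, here is a rough sketch of the
size arithmetic for a length-delimited field. It does not use the
protobuf library; VarintSize, BytesElementSize and SizePerByte are
made-up helper names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes a base-128 varint needs to hold `value`.
std::size_t VarintSize(std::uint64_t value) {
  std::size_t n = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++n;
  }
  return n;
}

// Encoded size of one length-delimited element (bytes or string):
// tag varint + length varint + payload.
std::size_t BytesElementSize(std::uint32_t field_number,
                             std::size_t payload_len) {
  assert(field_number >= 1);
  const std::uint64_t tag =
      (static_cast<std::uint64_t>(field_number) << 3) | 2;  // wire type 2
  return VarintSize(tag) + VarintSize(payload_len) + payload_len;
}

// Total size when each of `n` input bytes becomes its own element.
std::size_t SizePerByte(std::uint32_t field_number, std::size_t n) {
  return n * BytesElementSize(field_number, 1);
}
```

With field number 2, SizePerByte(2, 100) comes to 300 bytes, while
BytesElementSize(2, 100) for the same data stored as one element comes
to 102.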

>
> I then tried, option A, storing the entire set of data into sdata1
> (which is actually a string, according to the generated protobuf
> header files). But when it comes to reading it, how do I know the
> number of bytes stored in the string? Suppose my data looks like
> 0x00,0x00,0x00, what will be the length?
>
> I am currently using option (c).

You should use option a or b: c adds additional overhead. Here is the
wire format for each of the options:

(a): <1-byte tag: field number + wire type (00001010 = 0x0a for field 1,
length-delimited)><varint length><raw bytes of sdata1>
If your data contains three null bytes, then you'll get
<0x0a><0x03><0x00><0x00><0x00>
When parsing, the size of the data will just be msg.sdata1().size() ==
3. The length comes from the wire, so embedded nulls are not a problem.
(b): has the same wire format as (a), except that you can encode
multiple byte arrays with the same tag. I.e., if you had:
const char* array[] = { "foo", "bar", "quux" };
You could add each to the repeated bytes field, and each string would
get encoded to the wire with the format above (0x12 is the tag for
field 2, length-delimited):
<0x12><0x03>"foo"<0x12><0x03>"bar"<0x12><0x04>"quux"
(c): The wire format for nested messages already encodes the size, so
you are adding extra bytes of overhead by encoding the length
separately. You now have:
[tag for BYT field][BYT message length][tag for length][varint length]
[tag for data][length of data][data]
Or for three null bytes:
<0x1a><0x07><0x08><0x03><0x12><0x03><0x00><0x00><0x00>
That is 9 bytes on the wire, versus 5 for (a).
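
If you want to check the three layouts above yourself, here is a
minimal hand-rolled encoder sketch. It is not the protobuf library's
API; AppendVarint, AppendBytesField, AppendVarintField and the
EncodeOption* functions are hypothetical names, and the field numbers
are taken from the message definition in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Appends `value` to `out` as a base-128 varint (low 7 bits first).
void AppendVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends a length-delimited field (wire type 2): tag, length, payload.
void AppendBytesField(std::uint32_t field_number, const std::string& payload,
                      std::string* out) {
  assert(field_number >= 1);
  AppendVarint((static_cast<std::uint64_t>(field_number) << 3) | 2, out);
  AppendVarint(payload.size(), out);
  out->append(payload);
}

// Appends a varint field (wire type 0), e.g. a uint32.
void AppendVarintField(std::uint32_t field_number, std::uint64_t value,
                       std::string* out) {
  assert(field_number >= 1);
  AppendVarint(static_cast<std::uint64_t>(field_number) << 3, out);
  AppendVarint(value, out);
}

// Option (a): optional bytes sdata1 = 1.
std::string EncodeOptionA(const std::string& data) {
  std::string out;
  AppendBytesField(1, data, &out);
  return out;
}

// Option (b): repeated bytes sdata2 = 2, one tag + length per element.
std::string EncodeOptionB(const std::vector<std::string>& items) {
  std::string out;
  for (const std::string& item : items) AppendBytesField(2, item, &out);
  return out;
}

// Option (c): optional BYT sdata3 = 3, with BYT { length = 1; data = 2; }.
std::string EncodeOptionC(const std::string& data) {
  std::string inner;
  AppendVarintField(1, data.size(), &inner);
  AppendBytesField(2, data, &inner);
  std::string out;
  AppendBytesField(3, inner, &out);  // a nested message is length-delimited too
  return out;
}
```

Calling EncodeOptionA(std::string(3, '\0')) gives the 5 bytes listed
under (a), and EncodeOptionC on the same input gives the 9 bytes listed
under (c). Note that the payload is a std::string built with an
explicit size, so the null bytes survive.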

Hope that helps,
Jason

>
> Have I missed something? Is there a better approach?
> Thank you in advance
> Saptarshi