[protobuf] Incorrect encoding of protocol buffer message

alok Thu, 15 Dec 2011 01:56:54 -0800

Hi,

 I am facing a strange issue when writing a binary file with the
protocol buffers library. I had a hard time reading the generated
binary file in my C# program: it would find an incorrect byte and
unexpectedly encounter an end-of-file byte in the middle of the file.
The interesting thing is that my C++ reader program reads the same
file properly, while the C# program fails. So I am very confused.


After investigating the binary file further, we saw that some
incorrect bytes had been inserted during encoding (either they are
incorrect, or I am missing something).

I can share my C++ and C# reader programs and the binary file to help
resolve this issue. We suspect that there is a bug in the library
which is writing the data incorrectly.

Below are the findings from investigating the data.

* I am showing only the information required to understand the issue:
the actual message bytes, without the length and other header bytes
associated with each message.

<code>
message TradeMessage {
        required double timestamp = 1;
        required string ric_code = 2;
        required double price = 3;
        required int64 size = 4;
        required int64 AccumulatedVolume = 5;
}
</code>

Some of the objects read from the binary file look like this:

(object 1 - good)
09 06 81 95 43 c3 27 dc 40 12 07 30 30 32 35 2e 48 4b 19 00 00 00 00
00 00 20 40 20 00 28 00
(object 2 - good)
09 25 06 81 95 c3 27 dc 40 12 07 30 30 32 34 2e 48 4b 19 00 00 00 00
00 00 00 00 20 00 28 00
(object 3 - corrupt)
09 71 3d 0d 0a d7 c3 27 dc 40 12 07 30 30 32 33 2e 48 4b 19 00 00 00
00 00 00 3b 40 20 00 28

If you look at the 3 objects above, each one starts with field 1, the
timestamp (a double). Its tag is encoded as 09, i.e. "field 1,
wire-type 1" (64-bit), so the next 8 bytes represent the timestamp
value.
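
As a rough illustration of the tag decoding described above, here is a
minimal Python sketch. The helper name split_tag and the WIRE_TYPES
table are made up for this example and are not part of the protobuf
library; the sketch also assumes single-byte tags, which holds only for
field numbers below 16.

```python
# Hypothetical helper: split a one-byte protobuf tag into its parts.
# The low 3 bits hold the wire type, the remaining bits the field number.
def split_tag(tag):
    return tag >> 3, tag & 0x07

# Wire type names used by the fields of TradeMessage.
WIRE_TYPES = {0: "varint", 1: "64-bit", 2: "length-delimited", 5: "32-bit"}

# The five tag bytes that appear in the dumps, one per field.
for tag in (0x09, 0x12, 0x19, 0x20, 0x28):
    field, wire_type = split_tag(tag)
    print(f"{tag:02x}: field {field}, wire type {wire_type} "
          f"({WIRE_TYPES[wire_type]})")
```

The five tags map onto timestamp, ric_code, price, size and
AccumulatedVolume in that order, which matches the message definition.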

If you look carefully at the first 10 bytes of each object, you will
see that objects 1 and 2 encode field 1 properly: a 1-byte header (09)
followed by 8 bytes of payload. In object 3, field 1 takes up 10 bytes
instead of 9. Either the header is 2 bytes (09 71) and the payload
starts at byte 3, or the header is 1 byte and the payload is 9 bytes.
In either case, one extra byte has been written to my binary file.
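
The byte counts above can be checked with a small Python sketch. It
only uses the leading bytes of the three dumps shown earlier; the
function name field1_span is invented for this example, and it assumes
the first occurrence of "12 07" (tag of field 2 plus the length of the
7-character ric_code) marks the end of field 1, which is true for these
three dumps but not in general.

```python
import struct

# Leading bytes of the three objects, copied from the dumps above.
OBJECTS = {
    "object 1": "09 06 81 95 43 c3 27 dc 40 12 07 30 30 32 35 2e 48 4b",
    "object 2": "09 25 06 81 95 c3 27 dc 40 12 07 30 30 32 34 2e 48 4b",
    "object 3": "09 71 3d 0d 0a d7 c3 27 dc 40 12 07 30 30 32 33 2e 48 4b",
}

def field1_span(data):
    """Return the bytes between the field 1 tag (09) and the start of
    field 2 (12 07). Assumes 12 07 does not occur inside the payload."""
    assert data[0] == 0x09, "expected field 1, wire type 1"
    end = data.index(b"\x12\x07", 1)
    return data[1:end]

for name, dump in OBJECTS.items():
    span = field1_span(bytes.fromhex(dump))
    line = f"{name}: {len(span)} bytes after tag 09: {span.hex(' ')}"
    if len(span) == 8:
        # A wire-type 1 payload is a little-endian IEEE 754 double.
        (timestamp,) = struct.unpack("<d", span)
        line += f" -> timestamp {timestamp}"
    print(line)
```

With these inputs, objects 1 and 2 give an 8-byte span that unpacks as
a double, while object 3 gives a 9-byte span, which is the extra byte
described above.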

Why is this happening? How does the C++ reader manage to decode this
data? Is this correct, or is there a bug involved here?

Please advise.

Regards,
Alok


