I figured it out myself.
The problem is that the size prefix is a varint, not a fixed 4-byte field.
_DecodeVarint() may consume fewer than the 4 bytes I reserved for it, and it
reports how many bytes it really consumed in the second element of the
returned tuple. So advancing offset by that returned value rather than by 4
does the trick.
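
For anyone finding this later, here is a minimal sketch of the corrected
loop. It is not the protobuf library's own code: decode_varint() and
split_delimited() are hypothetical stand-ins I wrote so the example runs
without protobuf installed. decode_varint() mirrors the way _DecodeVarint()
returns the value together with the new position.

```python
from typing import Iterator, Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint at pos; return (value, new_pos).

    Hypothetical stand-in for protobuf's internal _DecodeVarint().
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def split_delimited(raw_data: bytes) -> Iterator[bytes]:
    """Yield each size-delimited payload from a fully decompressed buffer."""
    offset = 0
    while offset < len(raw_data):
        # Advance by however many bytes the prefix really used, not by 4.
        size, offset = decode_varint(raw_data, offset)
        end = offset + size
        if end > len(raw_data):
            raise ValueError("truncated message")
        yield raw_data[offset:end]
        offset = end


# A 3-byte payload (1-byte prefix) followed by a 300-byte payload
# (2-byte prefix 0xAC 0x02).
stream = b"\x03abc" + b"\xac\x02" + b"x" * 300
payloads = list(split_delimited(stream))
```

Each yielded payload can then be passed to ParseFromString() on a fresh
message object.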
Cheers,
Moose
Mr Moose wrote on Friday, 17 May 2024 at 13:19:58 UTC+2:
> Hello everyone,
>
> I hope I can find some advice here.
> I have C++ code that writes a number of protobuf messages to a
> gzip-compressed, size-delimited stream like this (simplified):
>
> FILE *ofile = fopen("myfile.bin.gz", "wb");
> google::protobuf::io::FileOutputStream ostream(_fileno(ofile));
> google::protobuf::io::GzipOutputStream zipstream(&ostream);
>
> while (loop) {
>     // The second argument is a ZeroCopyOutputStream*, so pass the address.
>     google::protobuf::util::SerializeDelimitedToZeroCopyStream(my_msg, &zipstream);
> }
>
> This works fine. The files are written and I can read them back in C++
> with no issues.
> Now I am trying to read them in Python and I'm having difficulty
> understanding the structure of the files. Here's what I'm trying:
>
> def read_messages(raw_data: bytes):
>     offset = 0
>     while offset < len(raw_data):
>         # Read the size (4 bytes, little-endian) and decode
>         size_bytes = raw_data[offset : offset + 4]
>         offset += 4
>         size, _ = _DecodeVarint(size_bytes, 0)
>         # This reads the correct size of the message (verified in C++)
>
>         message_data = raw_data[offset : offset + size]
>         offset += size
>
>         # This causes an "Error parsing message" exception at the first message
>         msg = my_messages_protobuf.MyMessage()
>         msg.ParseFromString(message_data)
>
> ... and ...
>
> with gzip.open("myfile.bin.gz", "r") as f:
>     while True:
>         chunk = f.read(chunk_size)
>         if not chunk:
>             break
>         read_messages(chunk)
>
> Now, to clarify a bit, I have worked with protobuf for a long time, although
> not in Python. A lot of our Python code already deserializes such messages
> arriving from elsewhere, so I assume the whole "setup Protobuf in Python"
> thing is not an issue here. It should work.
>
> The fact that _DecodeVarint() correctly reads the message size leads
> me to believe the reading of the gzipped file is okay too.
>
> Yet when I look at the raw buffer "message_data", it looks very different
> from the raw message data I see in C++ when I use the debugger there. I
> have no idea what could cause this difference.
>
> Can anybody give me a hint on what could be wrong here?
>
> Much appreciated,
> Moose
>
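
One more thing worth noting about the quoted reading loop: it hands each
chunk to read_messages() separately, so a message that straddles a chunk
boundary would be cut in two unless chunk_size happens to cover the whole
file. Below is a rough sketch of one way around that, reading the size
prefix byte by byte from the decompressed stream. read_delimited() is a
hypothetical helper name, and the sample bytes are made up for the demo.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def read_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield size-delimited payloads from a binary file-like object."""
    while True:
        size = 0
        shift = 0
        while True:
            b = stream.read(1)
            if not b:
                if shift == 0:
                    return  # clean end of stream between messages
                raise EOFError("stream ended inside a size prefix")
            size |= (b[0] & 0x7F) << shift
            if not b[0] & 0x80:  # high bit clear: last prefix byte
                break
            shift += 7
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload


# Demo with an in-memory gzip stream instead of a real file.
raw = b"\x03abc" + b"\xac\x02" + b"x" * 300
compressed = gzip.compress(raw)
with gzip.open(io.BytesIO(compressed), "rb") as f:
    messages = list(read_delimited(f))
```

With a real file, gzip.open("myfile.bin.gz", "rb") would take the place of
the in-memory stream, and each payload would go to ParseFromString().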