[protobuf] Re: Decode size delimited compressed protobuf file in python

Mr Moose Fri, 17 May 2024 05:06:17 -0700

I have figured it out myself.
The problem was that the size prefix is a varint, not a fixed 4-byte field.
_DecodeVarint() may consume fewer than the 4 bytes I reserved for it, and
the second element of the tuple it returns tells you where the varint
actually ended (when decoding starts at position 0, that is its length).
So advancing offset by that returned value rather than by 4 does the trick.
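A minimal sketch of the corrected loop, assuming the whole decompressed stream fits in memory. To stay self-contained it uses a hypothetical decode_varint helper that mirrors the (value, new_position) return convention of protobuf's private _DecodeVarint, and it returns the raw payloads instead of parsing them, since MyMessage is specific to the original project.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at buf[pos].

    Hypothetical stand-in for google.protobuf.internal.decoder._DecodeVarint:
    returns (value, position just past the varint).
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated size prefix")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift  # low 7 bits carry the value
        if not b & 0x80:               # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("size prefix is not a valid varint")


def split_delimited(raw_data: bytes) -> list[bytes]:
    """Split an in-memory size-delimited stream into serialized messages."""
    messages = []
    offset = 0
    while offset < len(raw_data):
        # The prefix is 1 to 10 bytes long, so resume exactly where it ended
        # instead of assuming a fixed 4-byte field.
        size, offset = decode_varint(raw_data, offset)
        if offset + size > len(raw_data):
            raise ValueError("truncated message")
        messages.append(raw_data[offset:offset + size])
        offset += size
    return messages
```

Each payload would then be parsed with ParseFromString() as before. Passing raw_data and offset directly, instead of a 4-byte slice, also keeps sizes of 2^28 bytes and above working, since their varints need 5 bytes.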


Cheers,
Moose

On Friday, 17 May 2024 at 13:19:58 UTC+2, Mr Moose wrote:

> Hello everyone,
>
> I hope I can find some advice here.
> I have C++ code that writes a number of protobuf messages to a compressed 
> size delimited stream like this (simplified):
>
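> #include <google/protobuf/io/gzip_stream.h>
> #include <google/protobuf/io/zero_copy_stream_impl.h>
> #include <google/protobuf/util/delimited_message_util.h>
>
> FILE *ofile = fopen("myfile.bin.gz", "wb");
> // _fileno is the MSVC name; POSIX systems use fileno
> google::protobuf::io::FileOutputStream ostream(_fileno(ofile));
> google::protobuf::io::GzipOutputStream zipstream(&ostream);
>
> while (loop) {
>    // Writes a varint size prefix, then the serialized message
>    google::protobuf::util::SerializeDelimitedToZeroCopyStream(my_msg, &zipstream);
> }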
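For reference, each record written by SerializeDelimitedToZeroCopyStream is the message size encoded as a base-128 varint followed by the serialized message bytes, and GzipOutputStream (gzip format by default) compresses that byte stream. The Python sketch below reproduces that framing, e.g. for generating test files; encode_varint, frame_delimited and write_delimited_gz are hypothetical helpers that take already-serialized payloads such as the output of SerializeToString().

```python
import gzip


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("sizes are non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def frame_delimited(payloads: list[bytes]) -> bytes:
    """Concatenate <varint size><message bytes> records."""
    return b"".join(encode_varint(len(p)) + p for p in payloads)


def write_delimited_gz(path: str, payloads: list[bytes]) -> None:
    """Write the framed records to a gzip-compressed file."""
    with gzip.open(path, "wb") as f:
        f.write(frame_delimited(payloads))
```

Sizes below 128 need one prefix byte and sizes below 16384 need two, so the prefix length varies with the message size rather than being fixed.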
>
> This works fine. The files are written and I can read them back in C++
> with no issues.
> Now I am trying to read them in Python, and I'm having difficulty
> understanding the structure of the files. Here's what I'm trying:
>
> from google.protobuf.internal.decoder import _DecodeVarint
>
> def read_messages(raw_data: bytes):
>     offset = 0
>     while offset < len(raw_data):
>         # Read the size (4 bytes, little-endian) and decode
>         size_bytes = raw_data[offset : offset + 4]
>         offset += 4
>         size, _ = _DecodeVarint(size_bytes, 0)
>         # This reads the correct size of the message (verified in C++)
>
>         message_data = raw_data[offset : offset + size]
>         offset += size
>
>         # This causes an "Error parsing message" exception at the first message
>         msg = my_messages_protobuf.MyMessage()
>         msg.ParseFromString(message_data)
>
> ... and ...
>
> import gzip
>
> with gzip.open("myfile.bin.gz", "rb") as f:
>     while True:
>         chunk = f.read(chunk_size)
>         if not chunk:
>             break
>         read_messages(chunk)
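Separately, calling read_messages() on fixed-size chunks assumes that no record straddles a chunk boundary, which generally does not hold: a size prefix or a message body can be split between two reads. A minimal sketch that avoids this by reading from the decompressed file object directly, one varint byte at a time and then exactly `size` bytes; read_varint and iter_delimited are hypothetical names.

```python
import gzip
import io
from typing import BinaryIO, Iterator, Optional


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one base-128 varint; return None if the stream ends cleanly first."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # clean end of stream: no more records
            raise EOFError("stream ended inside a size prefix")
        b = byte[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: this was the last varint byte
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("size prefix is not a valid varint")


def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each serialized message from a size-delimited byte stream."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        # GzipFile and BytesIO return exactly `size` bytes unless the data ends
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload


# Usage with an in-memory .gz blob holding two records (b"abc" and b"xy");
# for the real file: with gzip.open("myfile.bin.gz", "rb") as f: ...
blob = gzip.compress(b"\x03abc\x02xy")
with gzip.GzipFile(fileobj=io.BytesIO(blob)) as f:
    records = list(iter_delimited(f))
print(records)  # [b'abc', b'xy']
```

Each yielded payload would then go to MyMessage().ParseFromString() as in the original code. Reading the prefix one byte at a time is relatively cheap here because GzipFile buffers internally.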
>
> Now, to clarify a bit: I have worked with protobuf for a very long time,
> although not in Python. Still, plenty of existing Python code here already
> deserializes such messages when they arrive from elsewhere, so I assume the
> whole "set up protobuf in Python" part is not the issue here. It should work.
>
> The fact that _DecodeVarint() correctly reads the message size leads me to
> believe that reading the gzipped file is okay too.
>
> Yet when I look at the raw buffer "message_data", it looks very different
> from the raw message data I see in the C++ debugger. I have no idea what
> could cause this difference.
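The mismatch in message_data can be reproduced with made-up bytes. Assuming both sizes are below 128 (so each varint prefix is a single byte), skipping a fixed 4 bytes starts the slice 3 bytes into the first message and runs into the next record. That is consistent with the size decoding correctly while ParseFromString() fails on the very first message. The records here are arbitrary stand-in bytes, not real protobuf messages.

```python
# Two made-up records: size 5 + b"AAAAA", then size 3 + b"BBB".
# Both sizes are below 128, so each prefix is a single varint byte.
stream = b"\x05AAAAA" + b"\x03BBB"

size = stream[0]                  # 5: a one-byte varint is just the byte value
fixed_skip = stream[4:4 + size]   # what the original loop slices after offset += 4
varint_skip = stream[1:1 + size]  # slicing right after the one-byte prefix

print(fixed_skip)   # b'AA\x03BB': 3 payload bytes skipped, runs into record 2
print(varint_skip)  # b'AAAAA'
```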
>
> Can anybody give me a hint on what could be wrong here?
>
> Much appreciated,
> Moose
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/protobuf/b8db4d69-987b-452a-bad3-d2743302c068n%40googlegroups.com.
