[protobuf] Finding the starting point of a PB block in a file that contains noise at the beginning.

Angel Cervera Claudio Sun, 04 Oct 2020 10:17:06 -0700

I am trying to read chunks of a file that contains a sequence of PB blocks. Is 
there a way to detect where a block starts?


A little bit of context:
It is a huge file (around 60GB).
The file format is a sequence of [[Block header][Block content]]. In 
reality it is a little more complex, but this is enough as an example.
The [Block header] contains the length of the following [Block content].
So the file has to be read sequentially.
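A minimal sketch of that sequential read, assuming (for illustration only) that the [Block header] is just a 4-byte big-endian unsigned length of the following [Block content]. The real header is more complex, as noted above, and `read_blocks` is a hypothetical helper name.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Assumption: the header is a 4-byte big-endian unsigned content length.
HEADER = struct.Struct(">I")


def read_blocks(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each [Block content] in order, reading the stream sequentially."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file on a block boundary
        if len(header) < HEADER.size:
            raise ValueError("truncated block header")
        (length,) = HEADER.unpack(header)
        content = stream.read(length)
        if len(content) < length:
            raise ValueError("truncated block content")
        yield content


if __name__ == "__main__":
    data = struct.pack(">I", 3) + b"abc" + struct.pack(">I", 2) + b"hi"
    print(list(read_blocks(io.BytesIO(data))))
```

Each block's position is only known after the previous header has been decoded, which is why a reader that starts in the middle of the file has no direct way to find the next block.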

I wrote a Spark connector. The first version also reads the file 
sequentially.

In the next version, I want to process the file in splits, as Spark 
provides them, so each task will get a chunk of the file.
I need to find where a [Block header] starts, so that I can read 
sequentially from that point.
So, how can I find this first block? Any ideas?
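One possible heuristic, offered as an assumption rather than anything established in this thread: scan forward one byte at a time and accept an offset only if the length decoded there, and the next few lengths reached by jumping over each block, all look plausible. It reuses the same hypothetical 4-byte big-endian header; `find_block_start`, `max_len` and `confirm` are illustrative names.

```python
import struct
from typing import Optional

# Assumption: same simplified header as before (4-byte big-endian length).
HEADER = struct.Struct(">I")


def _chains(buf: bytes, pos: int, max_len: int, confirm: int) -> bool:
    """Check that `confirm` consecutive plausible headers chain from `pos`."""
    for _ in range(confirm):
        if pos == len(buf):
            return True  # landed exactly on the end of the buffer
        if pos + HEADER.size > len(buf):
            return False
        (length,) = HEADER.unpack_from(buf, pos)
        # Reject empty or implausibly large blocks to cut down false matches.
        if not 0 < length <= max_len:
            return False
        pos += HEADER.size + length
        if pos > len(buf):
            return False
    return True


def find_block_start(buf: bytes, start: int, max_len: int,
                     confirm: int = 3) -> Optional[int]:
    """Return the first offset >= start that looks like a block header."""
    for pos in range(start, len(buf) - HEADER.size + 1):
        if _chains(buf, pos, max_len, confirm):
            return pos
    return None
```

Noise can still produce a false match, so a real reader would also validate the decoded content (for example by parsing it as the expected protobuf message). The chain is only checked inside `buf`, so the buffer should extend somewhat past the split boundary.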

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/protobuf/01bd0fbf-cc13-476d-ab3a-c50a278f81aen%40googlegroups.com.
