[protobuf] Re: Fast navigation in binary gzip'ed file that includes messages of variable length

Roman Vinogradov Mon, 21 May 2012 23:58:06 -0700

Yes, this is what I am thinking about too. I will use a format similar
to TFile, with some metadata relevant to my problem.
If the file gets corrupted somewhere and the index is not available, I
will still be able to recover the data before the point of corruption
and at least rebuild the index for it (if the compression algorithm is known).
If I also prefix each data chunk with a magic number, we will not need to
worry about invalid reads at the end: if the magic number is
missing at the beginning of a chunk, there is no more data to
read.
This scheme works well with protobuf encoding.
Thank you.
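
To make the idea concrete, here is a minimal sketch of the layout discussed in this thread. Everything in it is an assumption made for illustration, not TFile's actual on-disk format: the names CHUNK_MAGIC, TRAILER_MAGIC, write_file, read_chunk, load_index and rebuild_index are hypothetical, the byte layout is made up, and zlib stands in for whatever compression algorithm is used. Each payload would be one serialized protobuf message (or a batch of them).

```python
import io
import struct
import zlib

CHUNK_MAGIC = b"CHNK"    # hypothetical marker in front of every data chunk
TRAILER_MAGIC = b"TEND"  # hypothetical marker that closes a complete file
HEADER = struct.Struct("<4sI")    # chunk magic + compressed length
TRAILER = struct.Struct("<QI4s")  # index offset + entry count + trailer magic


def write_file(out, payloads):
    """Write framed, compressed chunks, then the offset index and the trailer."""
    offsets = []
    for payload in payloads:
        offsets.append(out.tell())
        body = zlib.compress(payload)
        out.write(HEADER.pack(CHUNK_MAGIC, len(body)))
        out.write(body)
    index_offset = out.tell()
    for offset in offsets:
        out.write(struct.pack("<Q", offset))
    out.write(TRAILER.pack(index_offset, len(offsets), TRAILER_MAGIC))


def read_chunk(f, offset):
    """Return (payload, next_offset), or None if no valid chunk starts at offset."""
    f.seek(offset)
    header = f.read(HEADER.size)
    if len(header) < HEADER.size:
        return None  # ran off the end of the file
    magic, length = HEADER.unpack(header)
    if magic != CHUNK_MAGIC:
        return None  # not a data chunk: index, trailer, or garbage
    body = f.read(length)
    if len(body) < length:
        return None  # chunk was cut short, e.g. the writer crashed
    try:
        return zlib.decompress(body), offset + HEADER.size + length
    except zlib.error:
        return None  # chunk body is corrupted


def load_index(f):
    """Fast path: return chunk offsets from the trailer, or None if it is unusable."""
    size = f.seek(0, io.SEEK_END)
    if size < TRAILER.size:
        return None
    f.seek(size - TRAILER.size)
    index_offset, count, magic = TRAILER.unpack(f.read(TRAILER.size))
    if magic != TRAILER_MAGIC or index_offset + 8 * count + TRAILER.size != size:
        return None
    f.seek(index_offset)
    return list(struct.unpack(f"<{count}Q", f.read(8 * count)))


def rebuild_index(f):
    """Slow path: walk the chunks from the start and stop at the first bad one."""
    offsets, offset = [], 0
    while True:
        chunk = read_chunk(f, offset)
        if chunk is None:
            return offsets
        offsets.append(offset)
        offset = chunk[1]
```

A reader would call load_index first and fall back to rebuild_index only when it returns None. The sequential scan stops at the first position that does not start with the chunk magic or that fails to decompress, so everything before the damage stays readable.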


/Roman

On 20 May, 09:40, Eyal Farago <eyal.far...@gmail.com> wrote:
> A separate file only makes things worse, as you have to deal with the
> possibility of corruption in either one of the files, or both...
>
> I think a better approach is to start with the TFile format and prefix each
> chunk in the file (sorry, it's been some time since I last read about this
> interesting format) with its length, then write the metadata + index at the end
> when you close the file, and add some kind of magic number and/or signature at
> the end. When you open the file, check the magic number, the signature, or
> both. If it's valid, you can trust the index for 'random access' to blocks in
> the file; if it's not, you can take the sequential path, based on the lengths
> that prefix each block in the file.
>
> Eyal.
>
>
>
> On Friday, May 18, 2012 7:05:38 PM UTC+3, Igor Gatis wrote:
>
> > Is a separate index file an option? You could build the main file and this
> > index file. If a single file is desirable, you can prepend or append the
> > index later.
>
> > On Thu, May 17, 2012 at 8:22 AM, Roman Vinogradov <vinogradov.ro...@gmail.com> wrote:
>
> >> Thank you all.
> >> I think TFile is very similar to what I am trying to do.
> >> One issue with TFile is that it stores all metadata and indices in its
> >> tail, which means that if the writing process suddenly crashes (e.g.
> >> because of a hardware issue or something unrelated to our process), the
> >> file will be left in an incomplete format and it won't be possible to
> >> read it or even recover it.
>
> >> /Roman
>
> >> On 17 May, 10:28, Eyal Farago <...> wrote:
> >> > You can use an approach like Hadoop's TFile:
> >> > https://issues.apache.org/jira/secure/attachment/12396286/TFile%20Spe...
>
> >> > Basically, they store compressed chunks in a file, and at the end of the
> >> > file (hence T(ail)File) they store some kind of sparse index to
> >> > the chunks.
>
> >> > Eyal.
>
>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "Protocol Buffers" group.
> >> To post to this group, send email to protobuf@googlegroups.com.
> >> To unsubscribe from this group, send email to
> >> protobuf+unsubscr...@googlegroups.com.
> >> For more options, visit this group at
> >> http://groups.google.com/group/protobuf?hl=en.

