Madhav,

I am not concerned with the integrity of the file itself. My only goal
is to be able to read back (completely or partially) a message saved
on disk in case file corruption occurs. A checksum would help to
decide if the file got corrupted but would not help at all in
recovering the data if indeed it got corrupted.
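
For reference, here is a rough sketch of the marker-and-length framing
Michael describes below, which is what would allow partial recovery.
It is only an illustration under my own assumptions: the marker value,
the 4-byte big-endian length field, and the function names are made up
for this example and are not part of Protocol Buffers. The payloads are
treated as opaque bytes (for instance, serialized Items).

```python
import struct

# Hypothetical 4-byte sync marker (an assumption for this sketch): it starts
# with a zero byte, as suggested in the thread, followed by uncommon bytes.
MARKER = b"\x00\xfe\xed\xc3"
HEADER = struct.Struct(">I")  # assumed 4-byte big-endian payload length


def write_records(payloads):
    """Frame each serialized message as MARKER + length + payload."""
    out = bytearray()
    for p in payloads:
        out += MARKER + HEADER.pack(len(p)) + p
    return bytes(out)


def read_records(data):
    """Return payloads whose frame ends at another marker or at end of data.

    When a frame does not check out, resynchronize by scanning forward
    for the next marker.
    """
    records = []
    pos = data.find(MARKER)
    while pos != -1:
        start = pos + len(MARKER) + HEADER.size
        if start <= len(data):
            (length,) = HEADER.unpack_from(data, pos + len(MARKER))
            end = start + length
            if end == len(data) or data[end:end + len(MARKER)] == MARKER:
                records.append(data[start:end])
                pos = end if end < len(data) else -1
                continue
        # Frame looks damaged: resume the search just past this marker.
        pos = data.find(MARKER, pos + 1)
    return records
```

Note that this only validates the frame boundaries. A payload whose
bytes were damaged without disturbing the length or the markers would
still be returned, so each recovered payload would still have to be
parsed (or checksummed) to decide whether to keep it.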

On Jan 28, 12:19 am, Madhav Ancha <madhavan...@gmail.com> wrote:
> Stefan,
>
>    Under what circumstances does persistence occur? If the message does not
> break into smaller ones naturally and it makes sense to keep the message in
> one piece, you can also use some checksum algorithm to verify your
> persistence.
>
> -Madhav Ancha
>
> On Wed, Jan 27, 2010 at 11:57 PM, Stefan <sneg...@gmail.com> wrote:
> > Kenton, Michael thanks for your quick answers (that was fast!). The
> > suggestions are great and to the point (and if I remember correctly
> > the approach was mentioned before).
>
> > So, a possible solution would be to break a big Bag into a couple of
> > Bags with a smaller number of Items. Now, I would need a mechanism to
> > write those smaller Bags delimited by some sort of a frame or marker.
> > Next step would be to discard corrupted messages from the file (a
> > corrupt message is one that does not parse) and seek to the next
> > marker. If I want to lose no more than one message per corruption, I
> > would need to write each Item separately but the overhead from the
> > markers would be bigger. On the other hand, if the Bag has too many
> > Items then I have the chance of losing too much data on a single
> > corruption. Aside from the markers, I would get overhead from
> > collating and separating the lists each time I need to use/save a big
> > Bag.
>
> > I hope I got the idea correctly. I will give it a try (hopefully it
> > will not be slow).
>
> > Again, thanks for your quick answers.
>
> > On Jan 27, 11:05 pm, Michael Poole <mdpo...@troilus.org> wrote:
> > > Stefan writes:
> > > > What could I do to reduce the risk of losing the entire list due to
> > > > arbitrary corruption? What if corruption only occurs at the end of the
> > > > file, would it be simpler to recover all the elements up to the
> > > > corruption point?
>
> > > If you serialize the elements inside the Bag to the disk individually,
> > > you could prefix them with a synchronizing marker and length.  A marker
> > > would typically be a fixed-length pattern that is unlikely to appear in
> > > legitimate data -- starting with a zero byte is a good choice for
> > > Protocol Buffers data, and it should also contain some other (ideally
> > > uncommon) bytes for robustness.
>
> > > By reading the marker, length, message, and checking the next marker,
> > > your program can be reasonably sure that the detected message boundaries
> > > are correct.  Recovery then becomes a matter of looking for the next
> > > synchronizing marker, and checking it the same way.
>
> > > There is obviously a tradeoff between how much data you can lose with a
> > > corrupted message and the per-message overhead.  If you were using the
> > > particular example in your email, you might serialize a Bag that
> > > contains several Items rather than serializing each Item individually.
>
> > > Michael Poole
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Protocol Buffers" group.
> > To post to this group, send email to proto...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > protobuf+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/protobuf?hl=en.

