Stefan writes:

> What could I do to reduce the risk of losing the entire list due to
> arbitrary corruption? What if corruption only occurs at the end of the
> file, would it be simpler to recover all the elements up to the
> corruption point?

If you serialize the elements inside the Bag to the disk individually,
you could prefix each one with a synchronizing marker and a length.  A
marker would typically be a fixed-length pattern that is unlikely to
appear in legitimate data.  With Protocol Buffers data, a zero byte is a
good way to start it, since zero is never a valid field tag; for
robustness, the rest of the marker should contain other (ideally
uncommon) bytes.
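
As a rough sketch of that framing in Python: the marker value, the
4-byte big-endian length, and the names frame() and write_records() are
all my own assumptions for illustration, not anything defined by
Protocol Buffers.  Each payload would be the bytes of one serialized
message.

```python
import struct

# Hypothetical 4-byte marker: a zero byte followed by uncommon bytes.
MARKER = b"\x00\xf7\xa5\x1e"


def frame(payload: bytes) -> bytes:
    """Return marker + 4-byte big-endian length + payload."""
    return MARKER + struct.pack(">I", len(payload)) + payload


def write_records(stream, payloads):
    """Write each already-serialized message as its own framed record."""
    for payload in payloads:
        stream.write(frame(payload))
```

With this layout the per-message overhead is 8 bytes (4 for the marker,
4 for the length).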

By reading the marker, length, message, and checking the next marker,
your program can be reasonably sure that the detected message boundaries
are correct.  Recovery then becomes a matter of looking for the next
synchronizing marker, and checking it the same way.
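
A minimal sketch of that read-and-resynchronize loop, under assumptions
of my own: a hypothetical 4-byte marker followed by a 4-byte big-endian
length, and the whole file already read into memory.  recover() is an
illustrative name, not a library function.

```python
import struct

# Hypothetical marker; it must match whatever the writer used.
MARKER = b"\x00\xf7\xa5\x1e"
HEADER = len(MARKER) + 4  # marker plus 4-byte big-endian length


def recover(data: bytes):
    """Yield payloads whose framing checks out, skipping damaged regions."""
    pos = 0
    while True:
        pos = data.find(MARKER, pos)
        if pos < 0 or pos + HEADER > len(data):
            return  # no further complete header in the file
        (length,) = struct.unpack_from(">I", data, pos + len(MARKER))
        end = pos + HEADER + length
        # Trust the record only if it ends exactly at end of file or is
        # immediately followed by another marker.
        if end == len(data) or data[end:end + len(MARKER)] == MARKER:
            yield data[pos + HEADER:end]
            pos = end
        else:
            pos += 1  # false or damaged marker; look for the next one
```

A record truncated at the end of the file fails the check and is
dropped, while everything before it is still returned.  Note that this
only validates the boundaries: damage inside a payload that leaves the
marker and length intact would go unnoticed unless you also stored a
checksum per record.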

There is obviously a tradeoff between how much data you can lose with a
corrupted message and the per-message overhead.  If you were using the
particular example in your email, you might serialize a Bag that
contains several Items rather than serializing each Item individually.

Michael Poole
