Stefan,

Under what circumstances does persistence occur? If the message does not naturally break into smaller ones and it makes sense to keep it in one piece, you can also use a checksum algorithm to verify the persisted data.
-Madhav Ancha

On Wed, Jan 27, 2010 at 11:57 PM, Stefan <sneg...@gmail.com> wrote:
> Kenton, Michael, thanks for your quick answers (that was fast!). The
> suggestions are great and to the point (and if I remember correctly
> the approach was mentioned before).
>
> So, a possible solution would be to break a big Bag into a couple of
> Bags with a smaller number of items. Now, I would need a mechanism to
> write those smaller Bags delimited by some sort of a frame or marker.
> Next step would be to discard corrupted messages from the file (a
> corrupt message is one that does not parse) and seek to the next
> marker. If I want to lose no more than one message per corruption, I
> would need to write each Item separately but the overhead from the
> markers would be bigger. On the other hand, if the Bag has too many
> Items then I have the chance of losing too much data on a single
> corruption. Aside from the markers, I would get overhead from
> collating and separating the lists each time I need to use/save a big
> Bag.
>
> I hope I got the idea correctly. I will give it a try (hopefully it
> will not be slow).
>
> Again, thanks for your quick answers.
>
> On Jan 27, 11:05 pm, Michael Poole <mdpo...@troilus.org> wrote:
> > Stefan writes:
> > > What could I do to reduce the risk of losing the entire list due to
> > > arbitrary corruption? What if corruption only occurs at the end of the
> > > file, would it be simpler to recover all the elements up to the
> > > corruption point?
> >
> > If you serialize the elements inside the Bag to the disk individually,
> > you could prefix them with a synchronizing marker and length. A marker
> > would typically be a fixed-length pattern that is unlikely to appear in
> > legitimate data -- starting with a zero byte is a good way given
> > Protocol Buffers data; it should contain some other (ideally uncommon)
> > bytes for robustness.
> >
> > By reading the marker, length, message, and checking the next marker,
> > your program can be reasonably sure that the detected message boundaries
> > are correct. Recovery then becomes a matter of looking for the next
> > synchronizing marker, and checking it the same way.
> >
> > There is obviously a tradeoff between how much data you can lose with a
> > corrupted message and the per-message overhead. If you were using the
> > particular example in your email, you might serialize a Bag that
> > contains several Items rather than serializing each Item individually.
> >
> > Michael Poole
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To post to this group, send email to proto...@googlegroups.com.
> To unsubscribe from this group, send email to
> protobuf+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/protobuf?hl=en.
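
For anyone who wants to try the scheme discussed in this thread, here is a minimal Python sketch that combines Michael's marker-plus-length framing with the checksum Madhav suggests. Everything named here is an assumption for illustration, not something from the thread or from the Protocol Buffers library: the 4-byte MARKER value, the header layout (little-endian length followed by a CRC-32), and the helper names write_record and read_records. The payloads are treated as opaque bytes; in practice each one would be a serialized small Bag.

```python
import struct
import zlib

# Hypothetical sync marker: a zero byte followed by a few uncommon bytes.
MARKER = b"\x00\xf5\xa7\x3c"
# Assumed header layout: payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")


def write_record(buf: bytearray, payload: bytes) -> None:
    """Append one framed record: marker, length, checksum, payload."""
    buf += MARKER
    buf += HEADER.pack(len(payload), zlib.crc32(payload))
    buf += payload


def read_records(data: bytes):
    """Yield every payload whose frame is intact, skipping damaged regions."""
    pos = 0
    while True:
        pos = data.find(MARKER, pos)
        if pos < 0:
            return  # no further markers, nothing more to recover
        header_start = pos + len(MARKER)
        body_start = header_start + HEADER.size
        if body_start <= len(data):
            length, crc = HEADER.unpack_from(data, header_start)
            body_end = body_start + length
            if body_end <= len(data):
                payload = data[body_start:body_end]
                if zlib.crc32(payload) == crc:
                    yield payload
                    pos = body_end  # continue right after this record
                    continue
        # Frame is truncated or fails its checksum: resynchronize by
        # searching for the next marker one byte further on.
        pos += 1
```

With this layout a single corrupted record costs only that record: the reader drops it and picks up again at the next marker. The fixed overhead is 12 bytes per record, so the tradeoff Michael describes comes down to how many Items go into each framed Bag. The sketch verifies a record by its checksum instead of by peeking at the following marker; either check, or both, would fit the approach described above.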