On Tue, Oct 19, 2010 at 06:45, ksamdev <ksam...@gmail.com> wrote:
> Hi,
>
> I am wondering how do Protocol Buffers read input files? Is the entire
> file read into memory or some proxy technique is used and entries are
> read only when required?

If you have a sequence of messages to process, you put some container
format around them in the file. A very simple scheme is
<length-of-next-message><next-message>, which lets you read the messages
one by one instead of loading the whole file. This has been discussed
several times on this list. (You are free to hide this behind some proxy
implementation, though it will just complicate things without much gain.)

The protocol buffer library tries to be as simple as possible and provides
only the serialization itself. It gives you everything you need to send
messages over the wire or store them in files, but the actual sending and
storing is up to you. Adding that to the core protocol buffer library would
be beyond its scope, and you might already have something you would like to
store your data in, such as Berkeley DB for keyed data.

>
> This is a vital feature for large lists, say, some dataset with 10^9
> messages.

I regularly process datasets with more than 10^9 Protocol Buffer messages
and store them essentially the way I described above. Depending on the
content, it helps to add compression at the file level (many people use
gzip streams). For larger datasets it also makes sense to add some sort of
CRC, as disks have a noticeable error rate at that size.

-h

> Do Protocol Buffers use any additional archiving technique (zip, tar,
> etc.) to further compress saved information?
>
> sincerely, Sam.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To post to this group, send email to proto...@googlegroups.com.
> To unsubscribe from this group, send email to
> protobuf+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/protobuf?hl=en.
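For illustration, here is a minimal sketch of the <length-of-next-message><next-message> scheme. It is written in Python with only the standard library, so the payloads are plain bytes standing in for messages serialized with SerializeToString(). The length prefix is a base-128 varint. As far as I know, the Java library's writeDelimitedTo()/parseDelimitedFrom() use this same prefix. The helper names (write_varint, read_varint, write_record, read_records) are hypothetical and do not come from the protobuf API. Because read_records is a generator, only one message is held in memory at a time.

```python
import io
from typing import BinaryIO, Iterator, Optional

# Hypothetical helpers sketching <length-of-next-message><next-message> framing.
# Payloads are opaque bytes, e.g. the output of message.SerializeToString().


def write_varint(out: BinaryIO, value: int) -> None:
    """Write a non-negative int as a base-128 varint, low 7 bits first."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))  # high bit set: more bytes follow
        else:
            out.write(bytes([byte]))
            return


def read_varint(inp: BinaryIO) -> Optional[int]:
    """Read a varint; return None if the stream ends cleanly before it starts."""
    result = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None  # clean end of file between records
            raise EOFError("truncated length prefix")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_record(out: BinaryIO, payload: bytes) -> None:
    write_varint(out, len(payload))  # <length-of-next-message>
    out.write(payload)  # <next-message>


def read_records(inp: BinaryIO) -> Iterator[bytes]:
    """Yield serialized messages one at a time instead of reading the whole file."""
    while True:
        size = read_varint(inp)
        if size is None:
            return
        payload = inp.read(size)
        if len(payload) != size:
            raise EOFError("truncated message")
        yield payload


# Usage with an in-memory stream; a file opened with open(path, "rb") works the same.
buf = io.BytesIO()
for msg in [b"first", b"", b"x" * 300]:
    write_record(buf, msg)
buf.seek(0)
print([len(m) for m in read_records(buf)])  # [5, 0, 300]
```

Each record costs only one to a few extra bytes of framing, and a reader can stop or skip at any record boundary without parsing the rest of the file.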