On Tue, Oct 19, 2010 at 06:45, ksamdev <ksam...@gmail.com> wrote:
> Hi,
>
> I am wondering how do Protocol Buffers read input files? Is the entire
> file read into memory or some proxy technique is used and entries are
> read only when required?

If you process a sequence of messages, you put some container around
them in the file. A very simple scheme is
  <length-of-next-message><next-message>
repeated for each message. That way you can read the messages one by
one instead of loading the whole file. This has been discussed several
times on this list.
(You are free to hide this behind some proxy implementation, though it
will just complicate things without much gain.)
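
The framing scheme above could be sketched roughly as follows. This is
only an illustration under assumptions of mine: the fixed 4-byte
big-endian length prefix and the helper names write_framed/read_framed
are hypothetical and not part of the protocol buffer library, and the
payloads are plain bytes standing in for serialized messages (in Python,
what SerializeToString() returns).

```python
import io
import struct


def write_framed(stream, payload: bytes) -> None:
    """Write one record as <4-byte big-endian length><payload>."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_framed(stream):
    """Yield payloads one at a time; only one record is held in memory."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise EOFError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated message")
        # With real messages you would now call msg.ParseFromString(payload).
        yield payload


# Round trip through an in-memory buffer standing in for a file.
buf = io.BytesIO()
for item in (b"first", b"", b"third message"):
    write_framed(buf, item)
buf.seek(0)
records = list(read_framed(buf))
```

Because read_framed is a generator, a loop over it touches one message
at a time, which is what makes the approach usable for very large files.
A variable-length (varint) prefix is a common alternative to the fixed
4 bytes assumed here.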

The protocol buffer library tries to be as simple as possible and
provides pure serialization functionality. It gives you everything you
need to send messages over the wire or store them in files, but you
have to do that part yourself. Adding it to the core library would be
beyond its scope, and you might already have something you would like
to store your data in, such as Berkeley DB for keyed data.

>
> This is a vital feature for large lists, say, some dataset with 10^9
> messages.

I regularly process datasets with more than 10^9 Protocol Buffer
messages and essentially store them the way I described above.
Depending on the content, it helps to use a compression scheme at the
file level (many people use gzip streams).
For larger datasets it also makes sense to add some sort of CRC, as
disks have a noticeable error rate at that size.
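
As a rough sketch of combining the two ideas, the following writes
length-prefixed records with a per-record CRC-32 through a gzip stream.
The record layout (4-byte length, 4-byte CRC, payload) and the names
write_checked/read_checked are assumptions made for this example, not
something the protocol buffer library defines; the payloads are again
plain bytes standing in for serialized messages.

```python
import gzip
import io
import struct
import zlib


def write_checked(stream, payload: bytes) -> None:
    """Write <4-byte length><4-byte CRC-32><payload>, big-endian."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    stream.write(struct.pack(">II", len(payload), crc))
    stream.write(payload)


def read_checked(stream):
    """Yield payloads one by one, raising ValueError on a CRC mismatch."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise EOFError("truncated record header")
        length, expected_crc = struct.unpack(">II", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated message")
        if zlib.crc32(payload) & 0xFFFFFFFF != expected_crc:
            raise ValueError("CRC mismatch, record is corrupt")
        yield payload


# Round trip through a gzip stream; BytesIO stands in for a file on disk.
messages = [b"alpha", b"beta" * 1000, b""]
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    for item in messages:
        write_checked(gz, item)
raw.seek(0)
with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
    restored = list(read_checked(gz))
```

The gzip format carries its own CRC-32 over the whole stream, but that
is only checked once the end of the stream is reached. A per-record
checksum like the one assumed here lets you detect and locate a damaged
record while reading.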

-h

> Do Protocol Buffers use any additional archiving technique (zip, tar,
> etc.) to further compress saved information?
>
> sincerely, Sam.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Protocol Buffers" group.
> To post to this group, send email to proto...@googlegroups.com.
> To unsubscribe from this group, send email to 
> protobuf+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/protobuf?hl=en.
>
>

