Note that the inefficiency of repeated primitive types will be fixed...
 eventually...  but, yeah, right now it's no good.

On Fri, Sep 12, 2008 at 2:09 AM, Chris <[EMAIL PROTECTED]> wrote:

> Nicolas wrote:
> > Can anyone with some experience in these matters, and especially of
> > alternative formats, e.g. netCDF, comment on this and recommend a
> > standard well-supported solution?
> I do not think scientific data should be stored with protocol-buffers.
> I would suggest, since netCDF-4 is now built on HDF5, that you also look
> at HDF5 (e.g. Wikipedia is always a good
> source of links).
> netCDF and HDF5 are both self-describing data file formats, unlike the
> protocol buffer wire format.
> It is impossible to read back and analyze a protocol-buffer without the
> .proto file description because the wire format does not hold the actual
> type of any of the data (e.g. bool vs int vs unsigned int vs
> enumeration, or string vs bytes vs embedded message).
> The protocol-buffer wire format for repeated elements, such as you need
> for scientific data vectors and arrays, is relatively inefficient since
> it includes the "field# + wire tag" before each and every single number
> in your file.  This kills efficient bulk reading and writing unless you
> create a new format and embed it as a binary blob.
> So you end up using HDF5 or something else anyway.
> Cheers,
>   Chris
> >
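
To put a rough number on the per-element overhead Chris describes, here is a
minimal Python sketch that hand-encodes 1000 doubles two ways: once with a key
(field number + wire type) in front of every value, and once as a single
length-delimited blob of raw bytes. It does not use the protobuf library; the
function names and the choice of field number 1 are assumptions made purely
for illustration.

```python
import struct

WIRE_FIXED64 = 1           # wire type for 64-bit values such as double
WIRE_LENGTH_DELIMITED = 2  # wire type for strings, bytes, embedded messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def make_key(field_number: int, wire_type: int) -> bytes:
    """Build the key that precedes a field: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_repeated_doubles(field_number: int, values) -> bytes:
    """Hypothetical helper: one key in front of every single double."""
    key = make_key(field_number, WIRE_FIXED64)
    return b"".join(key + struct.pack("<d", v) for v in values)


def encode_doubles_as_blob(field_number: int, values) -> bytes:
    """Hypothetical helper: one key, one length, then the raw doubles."""
    payload = struct.pack(f"<{len(values)}d", *values)
    key = make_key(field_number, WIRE_LENGTH_DELIMITED)
    return key + encode_varint(len(payload)) + payload


values = [float(i) for i in range(1000)]
per_element = encode_repeated_doubles(1, values)
blob = encode_doubles_as_blob(1, values)
print(len(per_element), len(blob))  # 9000 8003
```

For doubles the extra key byte adds about 12.5% on top of the 8-byte payload.
For small integers that encode as a single-byte varint, the key doubles the
size of each element, and in both cases a reader has to parse a key per value
rather than copying one contiguous buffer.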
