Re: [protobuf] ProtoBuf file larger than CSV

Peter Hultqvist Sat, 12 Apr 2014 12:28:02 -0700

That could be the case.

A string value is encoded with the overhead of a tag (field number and
wire type, 1 byte) and a length (1 byte or more).
In CSV you have one byte per field, the "," character; if your strings
are also quoted, that adds 2 more bytes, 3 in total.
If a string is longer than 127 bytes, its length takes 2 bytes, so the
protobuf prefix grows to 3 bytes as well.
So if your strings are not quoted, protobuf adds one extra byte to each
field.
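For illustration, here is a minimal Python sketch of that per-field comparison. The helper names (varint_size, string_field_overhead, csv_field_overhead) are hypothetical; it assumes a string field is written as a tag varint followed by a varint length prefix (wire type 2, length-delimited), and that each CSV field costs one separator plus two quote characters when quoted.

```python
def varint_size(value: int) -> int:
    """Bytes needed to store a non-negative integer as a protobuf varint (7 payload bits per byte)."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def string_field_overhead(field_number: int, length: int) -> int:
    """Bytes added around a string payload: the tag varint plus the length varint."""
    wire_type = 2  # length-delimited
    tag = (field_number << 3) | wire_type
    return varint_size(tag) + varint_size(length)


def csv_field_overhead(quoted: bool) -> int:
    """Assumed CSV cost per field: one separator, plus two quote characters if quoted."""
    return 1 + (2 if quoted else 0)


for length in (10, 127, 128):
    print(length, string_field_overhead(1, length),
          csv_field_overhead(False), csv_field_overhead(True))
# prints:
# 10 2 1 3
# 127 2 1 3
# 128 3 1 3
```

With unquoted CSV the difference is one byte per field as long as strings stay under 128 bytes.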


Furthermore, you will add an extra byte to every field with an id (field
number) larger than 15, because its tag no longer fits in one byte.
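A small sketch of why the cutoff is 15, assuming the standard tag layout (field_number << 3) | wire_type encoded as a varint; the function names here are hypothetical. Field 15 with wire type 2 gives 122, which fits in one 7-bit varint byte, while field 16 gives 130, which needs two.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint, least significant 7 bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def tag_bytes(field_number: int, wire_type: int = 2) -> bytes:
    """Encoded tag for a field; wire type 2 (length-delimited) is what strings use."""
    return encode_varint((field_number << 3) | wire_type)


print(tag_bytes(15).hex())  # 7a
print(tag_bytes(16).hex())  # 8201
```

The two-byte tag holds up to field number 2047; beyond that the tag grows to three bytes.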

In your case:
The first 15 fields will have headers of at least 2 bytes.
The last 22 fields will have a 3-byte overhead.
All this assumes no string is longer than 127 bytes.
That is 96 bytes per record (15*2 + 22*3), compared to 36 if you don't
use quoted strings in CSV.

(96-36)*200000 = 12M
which matches your data exactly: 60M-48M=12M
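The same estimate as a short Python sketch, assuming the 37 columns use field numbers 1..37 and every string is shorter than 128 bytes (so each length prefix is 1 byte). The helper names are hypothetical. Like the reply, it counts only the 36 commas on the CSV side and ignores the newline per record and whatever framing separates messages in the .bin file.

```python
NUM_RECORDS = 200_000  # from the original question
NUM_FIELDS = 37        # 37 string columns, assumed to use field numbers 1..37


def varint_size(value: int) -> int:
    """Bytes needed to store a non-negative integer as a protobuf varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def proto_overhead_per_record(num_fields: int) -> int:
    """Tag varint + 1-byte length prefix per string field (assumes strings under 128 bytes)."""
    return sum(varint_size((n << 3) | 2) + 1 for n in range(1, num_fields + 1))


csv_overhead = NUM_FIELDS - 1  # the 36 commas counted in the reply
proto_overhead = proto_overhead_per_record(NUM_FIELDS)
extra_bytes = (proto_overhead - csv_overhead) * NUM_RECORDS

print(proto_overhead, csv_overhead, extra_bytes)  # 96 36 12000000
```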

On 2014-04-11 00:50, Dan Ling wrote:
> I have a 48mb CSV file with 200,000 records and 37 string columns.  I
> used protobuf to write the same 200,000 records to a .bin file, and
> the .bin file is about 60mb.  Is this expected?  I thought the
> protobuf file would be smaller.
> -- 
> You received this message because you are subscribed to the Google
> Groups "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected]
> <mailto:[email protected]>.
> To post to this group, send email to [email protected]
> <mailto:[email protected]>.
> Visit this group at http://groups.google.com/group/protobuf.
> For more options, visit https://groups.google.com/d/optout.
