That could be the case. A string value is encoded with an overhead of a type/tag (1 byte) and a length (1 byte or more). In CSV, by comparison, each field costs one byte, the "," character; if the string is also quoted, that adds 2 bytes, for 3 in total. If a string is longer than 127 bytes, the protobuf prefix grows to 3 bytes as well. So if your strings are not quoted, protobuf adds one byte to each field compared to CSV.
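To make the prefix sizes concrete, here is a minimal Python sketch of how the prefix of a string field could be sized. The function names are hypothetical (they are not part of the protobuf library), and it assumes the standard varint encoding, where each byte carries 7 bits of payload.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a protobuf varint."""
    size = 1
    while value >= 0x80:  # more than 7 bits left, so another byte is needed
        value >>= 7
        size += 1
    return size


def string_field_prefix_size(field_number: int, payload_length: int) -> int:
    """Overhead in bytes (tag + length) in front of a string field's data."""
    # The tag packs the field number with a 3-bit wire type;
    # wire type 2 means "length-delimited", which is what strings use.
    tag = (field_number << 3) | 2
    return varint_size(tag) + varint_size(payload_length)


print(string_field_prefix_size(1, 10))   # short string: 1 tag byte + 1 length byte
print(string_field_prefix_size(1, 128))  # over 127 bytes: the length needs 2 bytes
```

The second call shows the growth to a 3-byte prefix once the string passes 127 bytes.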
Furthermore, protobuf adds an extra byte to every field with a field number larger than 15. In your case, the first 15 fields have headers of at least 2 bytes, and the last 22 fields have a 3-byte overhead. All of this assumes no string is longer than 127 bytes. That is 15*2 + 22*3 = 96 bytes per record, compared to 36 (one comma per separator) if you don't use quoted strings in CSV.

(96-36)*200000 = 12M

which matches your data exactly: 60M-48M=12M

On 2014-04-11 00:50, Dan Ling wrote:
> I have a 48mb CSV file with 200,000 records and 37 string columns. I
> used protobuf to write the same 200,000 records to a .bin file, and
> the .bin file is about 60mb. Is this expected? I thought the
> protobuf file would be smaller.
> --
> You received this message because you are subscribed to the Google
> Groups "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected]
> <mailto:[email protected]>.
> To post to this group, send email to [email protected]
> <mailto:[email protected]>.
> Visit this group at http://groups.google.com/group/protobuf.
> For more options, visit https://groups.google.com/d/optout.
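As a rough check of the arithmetic in this reply, here is a small Python sketch. It assumes the 37 columns use field numbers 1 through 37 in order, that no string exceeds 127 bytes, and that CSV costs one comma per separator (36 per record, ignoring the line terminator). All names are illustrative only.

```python
FIELDS = 37        # string columns per record (from the question)
RECORDS = 200_000  # records in the file (from the question)


def protobuf_prefix_bytes(field_number: int) -> int:
    """Tag plus length bytes for a string of at most 127 bytes."""
    tag_bytes = 1 if field_number <= 15 else 2  # field numbers above 15 need a 2-byte tag
    length_bytes = 1                            # assumption: string length <= 127
    return tag_bytes + length_bytes


per_record_protobuf = sum(protobuf_prefix_bytes(n) for n in range(1, FIELDS + 1))
per_record_csv = FIELDS - 1  # commas between unquoted fields
extra_bytes = (per_record_protobuf - per_record_csv) * RECORDS

print(per_record_protobuf, per_record_csv, extra_bytes)
```

Under these assumptions the result is 96 and 36 bytes per record, and 12,000,000 extra bytes in total, in line with the 60M-48M difference reported.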
