Hi Marc,

Thanks for the reply.

Ours is basically text data. The record size varies, roughly from 300
bytes to 3 kB, but most records are over 1 kB. I serialized one record
from production and found that a 3 kB object became a 728-byte string
once serialized. Our data is structured and somewhat hierarchical. CPU
is not a limitation in our case either, so I was thinking that if
compression can further reduce the data size, it will help us save on
SSD cost.
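
To make it concrete, what I have in mind is roughly the following (just
a sketch, assuming C# with protobuf-net and GZipStream, which may not
match your setup exactly; everything stays in memory, no files):

using System;
using System.IO;
using System.IO.Compression;
using ProtoBuf;

static class CompressionCheck
{
    // Serialize one record entirely in memory with protobuf-net, gzip the
    // resulting bytes (no files involved), and print both sizes so we can
    // see whether compression actually helps for our data.
    public static void Measure<T>(T record)
    {
        byte[] raw;
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, record);
            raw = ms.ToArray();
        }

        byte[] packed;
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            packed = buffer.ToArray();
        }

        Console.WriteLine("serialized: {0} bytes, gzipped: {1} bytes",
                          raw.Length, packed.Length);
    }
}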

Our proto is something similar to:

message test {
        optional string key = 1;             // ~16 bytes
        message inner1 {
                optional int32 id = 1;
                message inner2 {
                        optional int32 inner_id = 1;
                        optional string entry = 2;   // size varies a lot, from 16 bytes to 100 bytes
                }
                repeated inner2 rec = 2;
        }
        repeated inner1 in = 2;
}

But this will change over time, and we do plan to push a lot more
information (hence changes to the proto) into this store once the
basic version is rolled out.
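
Also, regarding the try-GZip-and-keep-whichever-is-smaller approach you
describe below: I understand it roughly like this (a simplified sketch,
assuming C#; it compresses fully and then compares, rather than
aborting mid-stream as you do, and the 256-byte threshold is only an
illustrative guess):

using System.IO;
using System.IO.Compression;

static class MaybeGzip
{
    // Try compression only when the payload is above a nominal lower bound,
    // and keep the gzipped form only if it actually came out smaller.
    // The 256-byte threshold is just an illustrative guess.
    public static byte[] Compress(byte[] raw, out bool compressed)
    {
        compressed = false;
        if (raw.Length < 256)
            return raw;                         // too small to be worth trying

        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            }

            byte[] packed = buffer.ToArray();
            if (packed.Length >= raw.Length)
                return raw;                     // gzip made it bigger: keep the original

            compressed = true;
            return packed;
        }
    }
}

We would of course also have to store the compressed flag alongside the
blob so that reads know whether to gunzip. Is that roughly what you do?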

Thanks and Regards,
Suraj Narkhede

On Sep 22, 11:40 pm, Marc Gravell <marc.grav...@gmail.com> wrote:
> This will depend on many factors:
>
> - how big is each fragment? Very small fragments of *anything* generally get 
> bigger when compressed
> - what is the data? If it contains a lot of text data you might see benefits; 
> however, many typical fragments will get bigger when compressed - it depends 
> entirely on the content
>
> In one of our uses, I cheat: if the size is above some nominal lower-bound, I 
> *try* GZipStream; the moment this exceeds the original size I kill it and 
> send the uncompressed original. If it turns out to be smaller, I store that.
>
> This works well for us as the tier that is processing this data has plenty of 
> spare CPU to speculatively try both options.
>
> Marc
>
> On 22 Sep 2011, at 11:39, Suraj <surajn.v...@gmail.com> wrote:
>
> > Hi,
>
> > We are planning to use protocol buffers to serialize the data before
> > inserting it into the db. We will then be inserting this serialized
> > string into the db.
>
> > We will be storing this on SSD, so lookup throughput is pretty high.
> > But since SSDs are costly, to save on disk cost I am thinking of
> > compressing the serialized string before inserting it into the DB.
>
> > Has anyone done any benchmarking of using GZipStream on the
> > binary serialized string?
> > Also, can you please give me an example of how to do this? I want to
> > compress the serialized string; the data is in memory, not in a
> > file.
>
> > Thanks.
>

