Adam,

I agree. As our system grows we will probably want compression in some
cases. I will look into this by making the changes in couch_file as you
suggest and report back.
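
To check that I have understood the suggestion, here is a minimal sketch of
what I would try first. The module and function names (compress_sketch,
encode/2, decode/1, demo/0) are made up for illustration and are not the
existing couch_file code, and the boolean flag stands in for an ini setting
that does not exist yet. The only real calls are term_to_binary/1,2 and
binary_to_term/1.

```erlang
-module(compress_sketch).
-export([encode/2, decode/1, demo/0]).

%% Hypothetical helper, not part of couch_file.
%% The second argument is assumed to come from a (not yet existing)
%% ini option that switches compression on or off.
encode(Term, true) ->
    %% The 'compressed' option makes term_to_binary zlib-compress the
    %% external term format.
    term_to_binary(Term, [compressed]);
encode(Term, false) ->
    term_to_binary(Term).

%% binary_to_term/1 decodes both compressed and uncompressed binaries,
%% so the read path needs no flag.
decode(Bin) ->
    binary_to_term(Bin).

%% Round-trips a deliberately repetitive term both ways and returns
%% {UncompressedSize, CompressedSize} in bytes.
demo() ->
    Doc = lists:duplicate(1000, {<<"field">>, <<"some repetitive value">>}),
    Plain = encode(Doc, false),
    Packed = encode(Doc, true),
    Doc = decode(Plain),
    Doc = decode(Packed),
    {byte_size(Plain), byte_size(Packed)}.
```

Since old and new terms decode the same way, files written before the
option is switched on should stay readable. The demo data is artificial, so
the sizes it returns say nothing about real documents or btree nodes.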

Norman

On Fri, Jun 18, 2010 at 5:27 AM, Adam Kocoloski <[email protected]> wrote:
> On Jun 17, 2010, at 6:00 PM, Norman Barker wrote:
>
>> Hi,
>>
>> I am looking at the CouchDB database and view index directories and I
>> see the files are saved as binary. My indexes and database are getting
>> fairly large, so I tried gzipping them (by hand) and it made a big
>> difference (at least for my data).
>>
>> Looking at
>>
>> http://www.erlang.org/doc/man/file.html
>>
>> I see that compressed is an option when reading or writing a file. Is
>> it worth trying this out? Could it be an option in the ini file so we
>> could trade off database size versus a possible lag in access?
>>
>> I can look into this. Does everything go through the couch_file
>> module, and is there a suitable test dataset that we can analyse
>> performance with?
>>
>> thanks,
>>
>> Norman
>
> Hi Norman, I'd support making gzip compression a config option.  Yes, 
> everything goes through couch_file, so adding a flag to the term_to_binary 
> calls in append_term and append_term_md5 would get you there.
>
> You should search the archives for a discussion about this.  We used to 
> compress the terms, and IIRC it almost cut the file size in half.  However, 
> it also introduced a measurable drop in write throughput.  That's a tradeoff 
> I'm sure some folks would be willing to make.
>
> One other interesting thing to investigate might be to have separate 
> compression settings for document bodies and btree nodes.  It could be that 
> one compresses more effectively than the other.  Best,
>
> Adam
>
>
