Adam,

I agree. As our system grows we will probably want compression in some cases. I will look into this by making the changes in couch_file as you suggest, and report back.
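As a starting point, here is a minimal sketch of the flag in question, outside of CouchDB. It only shows that term_to_binary/2 accepts a compressed option and that binary_to_term/1 reads the result back unchanged. The module name compress_sketch and the functions sizes/1 and roundtrip/1 are hypothetical names for illustration; this is not the couch_file code, and the actual change to append_term and append_term_md5 may look different.

```erlang
%% Hypothetical standalone module, not part of CouchDB.
-module(compress_sketch).
-export([sizes/1, roundtrip/1]).

%% Encode Term with and without the compressed option and
%% return {PlainBytes, CompressedBytes} for comparison.
sizes(Term) ->
    Plain = term_to_binary(Term),
    Packed = term_to_binary(Term, [compressed]),
    {byte_size(Plain), byte_size(Packed)}.

%% binary_to_term/1 decodes compressed input without an extra flag,
%% so the read path needs no matching option.
roundtrip(Term) ->
    Term =:= binary_to_term(term_to_binary(Term, [compressed])).
```

Running sizes/1 over a sample of real document bodies and btree nodes separately should give a first idea of whether one compresses better than the other, before measuring write throughput.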
Norman

On Fri, Jun 18, 2010 at 5:27 AM, Adam Kocoloski <[email protected]> wrote:
> On Jun 17, 2010, at 6:00 PM, Norman Barker wrote:
>
>> Hi,
>>
>> I am looking at the couchdb db database and view index directory and I
>> see the files are saved as binary, my indexes and database are getting
>> fairly large so I tried gzipping them (by hand) and it made a big
>> difference (at least for my data).
>>
>> Looking at
>>
>> http://www.erlang.org/doc/man/file.html
>>
>> I see that compressed is an option when reading or writing a file, is
>> it worth trying this out, could it be an option in the ini file so we
>> could trade off database size versus a possible lag in access?
>>
>> I can do look into this, does everything go through the couch_file
>> module and is there a suitable test dataset that we can analyse
>> performance with?
>>
>> thanks,
>>
>> Norman
>
> Hi Norman, I'd support making gzip compression a config option. Yes,
> everything goes through couch_file, so adding a flag to the term_to_binary
> calls in append_term and append_term_md5 would get you there.
>
> You should search the archives for a discussion about this. We used to
> compress the terms, and IIRC it almost cut the file size in half. However,
> it also introduced a measurable drop in write throughput. That's a tradeoff
> I'm sure some folks would be willing to make.
>
> One other interesting thing to investigate might be to have separate
> compression settings for document bodies and btree nodes. It could be that
> one compresses more effectively than the other. Best,
>
> Adam
