Thanks - I've set the following values:

checkpoint_after = 524288000
doc_buffer_size = 52428800

and started the compact process. Have to wait for a bit.

-Sharath

On Sun, Jan 25, 2015 at 5:59 PM, Alexander Shorin <[email protected]> wrote:

> Ok, so far, this looks exactly like what I have for my hashes database:
>
> data_size: 557537537
> disk_size: 1542664311
> doc_count: 1298255
> doc_del_count: 18
> avg doc size: ~350 bytes
>
> Although the disk_size/data_size ratio is about 3, this database cannot
> be compacted any further: CouchDB isn't able to get it down to 500MB,
> leaving it at 1.5GB. This looks like some "specifics" of the underlying
> database format, which isn't able to rationally allocate a huge number
> of tiny documents... But CouchDB provides two interesting options to
> configure database compaction: doc_buffer_size and checkpoint_after.
>
> http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction
>
> By default they have the following values:
>
> checkpoint_after = 5242880
> doc_buffer_size = 524288
>
> And this makes my hashes database stop at the 1.5GB point. If I
> multiply them both by 10, the database size after compaction will be
> ~900MB - yay! If I do this again, with the resulting config:
>
> checkpoint_after = 524288000
> doc_buffer_size = 52428800
>
> Then the database sizes are much better:
>
> disk_size: 633688183
> data_size: 556759808
>
> Almost no overhead! Why does this happen? Paul or Robert may correct me,
> but it seems that most of the wasted space after compaction is
> consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
> make compaction checkpoints less often and to use a bigger buffer for
> docs allows it to build the resulting btree in the new database file in
> a more optimized way. The downside of such a configuration: if your
> compaction fails, it has to restart from much further back, and a
> bigger buffer size requires more memory.
>
> Try playing with these options and see how they affect your
> databases.
>
> P.S. This issue is eventually solved for the upcoming 2.0 with the default config.
> --
> ,,,^..^,,,
>
>
> On Sun, Jan 25, 2015 at 9:52 AM, Sharath <[email protected]> wrote:
> > yes the databases were recently compacted - both the databases run as
> > insert only (no deletion for either).
> > database2 completed compaction about 4 hours ago and I've triggered
> > compaction again (so what you see below for database2 could be misleading)
> >
> > database1:
> > {
> > "db_name":"database1",
> > "doc_count":13337224,
> > "doc_del_count":0,
> > "update_seq":13337224,
> > "purge_seq":0,
> > "compact_running":false,
> > "disk_size":8574615674,
> > "data_size":6896805847,
> > "instance_start_time":"1422157234994080",
> > "disk_format_version":6,
> > "committed_update_seq":13337224
> > }
> >
> > database2:
> > {
> > "db_name":"database2",
> > "doc_count":12982621,
> > "doc_del_count":0,
> > "update_seq":12982621,
> > "purge_seq":0,
> > "compact_running":true,
> > "disk_size":31587352698,
> > "data_size":8026729752,
> > "instance_start_time":"1422157235289671",
> > "disk_format_version":6,
> > "committed_update_seq":12982621
> > }
> >
> > -Sharath
> >
> > On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <[email protected]> wrote:
> >
> >> Hm...are you sure that database was recently compacted? How many
> >> deleted documents in these databases?
> >> --
> >> ,,,^..^,,,
> >>
> >>
> >> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <[email protected]> wrote:
> >> > Hi Alexander,
> >> >
> >> > CouchDB version: 1.61
> >> >
> >> > database1: "disk_size":8574615674,"data_size":6896805847
> >> > database2: "disk_size":31587352698,"data_size":8026729752
> >> >
> >> > -Sharath
> >> >
> >> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <[email protected]> wrote:
> >> >
> >> >> Hi Sharath,
> >> >>
> >> >> What is your CouchDB version?
> >> >> Could you provide data_size and disk_size values from database info for both?
> >> >> curl http://localhost:5984/db1
> >> >> curl http://localhost:5984/db2
> >> >> --
> >> >> ,,,^..^,,,
> >> >>
> >> >>
> >> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <[email protected]> wrote:
> >> >> > Hi All,
> >> >> >
> >> >> > recently moved to couchdb and find my databases taking a lot of diskspace
> >> >> >
> >> >> > I have two database both with json documents (no attachments) - however the
> >> >> > sizes vary by a lot
> >> >> >
> >> >> > database1 size 8.0GB number of documents: 13337224
> >> >> > database2 size 29.4 GB number of documents: 12981148
> >> >> >
> >> >> > both the databases have been compacted
> >> >> >
> >> >> > each document in database1 is 487 bytes long (including _id and _rev)
> >> >> > each document in database2 is 564 bytes long (including _id and _rev)
> >> >> >
> >> >> > database1 should be ~6.1GB (only data without compression) [487 * 13337224 / 1024 /1024]
> >> >> > database2 should be ~6.9GB (only data without compression) [564 * 12981148 / 1024 /1024]
> >> >> >
> >> >> > I'm curious why the database file takes 29 GB.
> >> >> >
> >> >> > unfortunately I cannot post the document as this is prod data.
> >> >> >
> >> >> > CouchDb is running on my mac 10.10.1 with default configuration.
> >> >> >
> >> >> > database1 was populated by a bulk upload from a mysql extract and database
> >> >> > 2 was populated by individual document inserts (put)
> >> >> > database compaction was let to complete (took ~30hr on database 2)
> >> >> >
> >> >> > is there a command that compacts superfluous data? or am i missing anything?
> >> >> >
> >> >> >
> >> >> > thanks!
> >> >> >
> >> >> > -Sharath
> >> >>
> >> >
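
For anyone following along, the overhead figures discussed in this thread can be reproduced with a few lines of arithmetic. The sketch below uses Python, which is an assumption of this note (nobody in the thread used it), and `overhead_ratio` is a hypothetical helper name. The numbers are copied from the messages in the thread; nothing is measured here.

```python
# (disk_size, data_size) pairs copied from the messages in this thread.
# The labels are only for display; overhead_ratio is a hypothetical helper.
SIZES = {
    "database1": (8574615674, 6896805847),
    "database2": (31587352698, 8026729752),
    "hashes db, default settings": (1542664311, 557537537),
    "hashes db, both settings x100": (633688183, 556759808),
}


def overhead_ratio(disk_size: int, data_size: int) -> float:
    """Return how many bytes of file are used per byte of live data."""
    if data_size <= 0:
        raise ValueError("data_size must be positive")
    return disk_size / data_size


if __name__ == "__main__":
    for label, (disk, data) in SIZES.items():
        print(f"{label}: {overhead_ratio(disk, data):.2f}x")
```

Keep in mind that the database2 figures were captured while "compact_running" was true, so, as Sharath notes, that ratio could be misleading.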

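
The thread does not say how the two values were applied. One possible route on a 1.x server is the HTTP config API followed by a compaction request, sketched below in Python. The assumptions: the server listens on localhost:5984 as in the curl commands above, the `/_config/{section}/{key}` endpoint of the 1.x series is available (later releases moved it), no admin credentials are required, and the function names are hypothetical. Editing the `[database_compaction]` section of the local config file would be the other option.

```python
import json
import urllib.request

# Assumed: same host and port as the curl commands in the thread.
BASE_URL = "http://localhost:5984"


def config_request(key: str, value: int) -> urllib.request.Request:
    """Build a PUT to the 1.x /_config API for the database_compaction section."""
    url = f"{BASE_URL}/_config/database_compaction/{key}"
    # The 1.x config API takes the new value as a JSON string.
    body = json.dumps(str(value)).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


def compact_request(db_name: str) -> urllib.request.Request:
    """Build a POST to /{db}/_compact; CouchDB requires a JSON content type."""
    return urllib.request.Request(
        f"{BASE_URL}/{db_name}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_and_compact(db_name: str) -> None:
    """Send the requests. Needs a running server; not called automatically."""
    pending = [
        config_request("checkpoint_after", 524288000),
        config_request("doc_buffer_size", 52428800),
        compact_request(db_name),
    ]
    for req in pending:
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)
```

Calling `apply_and_compact("database2")` would send all three requests in order. Whether a compaction that is already running picks up changed values is not covered in the thread, so it is probably safest to change the settings before starting it.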