OK, so far this looks exactly like what I have for my hashes database:

  data_size: 557537537
  disk_size: 1542664311
  doc_count: 1298255
  doc_del_count: 18
  avg doc size: ~350 bytes
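The overhead discussed in this thread is simply disk_size divided by data_size from the database info. Here is a minimal Python sketch that computes it for the hashes database. It uses the values above and the post-tuning values reported later in this message. The helper name overhead_ratio is hypothetical.

```python
def overhead_ratio(db_info: dict) -> float:
    """Return disk_size / data_size from a CouchDB database info document.

    data_size is the size of the live data and disk_size is the size of the
    file on disk, so a ratio close to 1.0 means little wasted space.
    """
    return db_info["disk_size"] / db_info["data_size"]


# Hashes database after compaction with the default settings (from above).
hashes_default = {"data_size": 557537537, "disk_size": 1542664311}
# Same database after compaction with the enlarged compaction settings.
hashes_tuned = {"data_size": 556759808, "disk_size": 633688183}

for label, info in (("default", hashes_default), ("tuned", hashes_tuned)):
    # Bytes in the file that are not accounted for by live data, in MB (10**6).
    wasted_mb = (info["disk_size"] - info["data_size"]) / 10**6
    print(f"{label}: ratio {overhead_ratio(info):.2f}, "
          f"~{wasted_mb:.0f} MB not used by live data")
```

With the default settings, roughly 1 GB of the file is not live data. With the tuned settings, that drops to under 80 MB.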
Although the disk_size/data_size ratio is nearly 3, this database can't be compacted any further. CouchDB isn't able to bring it down to ~500 MB and leaves it at 1.5 GB. This looks like a peculiarity of the underlying database format, which can't lay out a huge number of tiny documents efficiently.

However, CouchDB provides two interesting options for tuning database compaction, doc_buffer_size and checkpoint_after:
http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction

By default they have the following values:

  checkpoint_after = 5242880
  doc_buffer_size = 524288

These defaults make my hashes database stop at the 1.5 GB point. If I multiply them both by 10, the database size after compaction is ~900 MB. Yay! If I do this again, the resulting config is:

  checkpoint_after = 524288000
  doc_buffer_size = 52428800

With this config, the database sizes get much better:

  disk_size: 633688183
  data_size: 556759808

Almost no overhead!

Why does this happen? Paul or Robert may correct me, but most of the space wasted after compaction seems to go to checkpoint headers and B-tree rebalancing. Asking CouchDB to checkpoint rarely during compaction, and to use a bigger document buffer, lets it build the B-tree in the new database file in a more optimized way. This configuration has downsides. If compaction fails, it has to restart from a checkpoint much further back, and a bigger buffer requires more memory.

Try playing with these options and see how they affect your databases.

P.S. This issue has finally been solved for the upcoming 2.0, with the default config.
--
,,,^..^,,,

On Sun, Jan 25, 2015 at 9:52 AM, Sharath wrote:
> Yes, the databases were recently compacted. Both databases are
> insert-only (no deletions in either).
> database2 completed compaction about 4 hours ago and I've triggered
> compaction again (so what you see below for database2 could be misleading).
>
> database1:
> {
>   "db_name":"database1",
>   "doc_count":13337224,
>   "doc_del_count":0,
>   "update_seq":13337224,
>   "purge_seq":0,
>   "compact_running":false,
>   "disk_size":8574615674,
>   "data_size":6896805847,
>   "instance_start_time":"1422157234994080",
>   "disk_format_version":6,
>   "committed_update_seq":13337224
> }
>
> database2:
> {
>   "db_name":"database2",
>   "doc_count":12982621,
>   "doc_del_count":0,
>   "update_seq":12982621,
>   "purge_seq":0,
>   "compact_running":true,
>   "disk_size":31587352698,
>   "data_size":8026729752,
>   "instance_start_time":"1422157235289671",
>   "disk_format_version":6,
>   "committed_update_seq":12982621
> }
>
> -Sharath
>
> On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin wrote:
>
>> Hm... are you sure that the database was recently compacted? How many
>> deleted documents are in these databases?
>> --
>> ,,,^..^,,,
>>
>>
>> On Sun, Jan 25, 2015 at 9:27 AM, Sharath wrote:
>> > Hi Alexander,
>> >
>> > CouchDB version: 1.6.1
>> >
>> > database1: "disk_size":8574615674,"data_size":6896805847
>> > database2: "disk_size":31587352698,"data_size":8026729752
>> >
>> > -Sharath
>> >
>> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin wrote:
>> >
>> >> Hi Sharath,
>> >>
>> >> What is your CouchDB version?
>> >> Could you provide the data_size and disk_size values from the database
>> >> info for both?
>> >> curl http://localhost:5984/db1
>> >> curl http://localhost:5984/db2
>> >> --
>> >> ,,,^..^,,,
>> >>
>> >>
>> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath wrote:
>> >> > Hi All,
>> >> >
>> >> > I recently moved to CouchDB and find my databases taking a lot of
>> >> > disk space.
>> >> >
>> >> > I have two databases, both with JSON documents (no attachments), but
>> >> > their sizes vary a lot:
>> >> >
>> >> > database1 size: 8.0 GB, number of documents: 13337224
>> >> > database2 size: 29.4 GB, number of documents: 12981148
>> >> >
>> >> > Both databases have been compacted.
>> >> >
>> >> > Each document in database1 is 487 bytes long (including _id and _rev).
>> >> > Each document in database2 is 564 bytes long (including _id and _rev).
>> >> >
>> >> > database1 should be ~6.1 GB (data only, without compression)
>> >> > [487 * 13337224 / 1024 / 1024]
>> >> > database2 should be ~6.9 GB (data only, without compression)
>> >> > [564 * 12981148 / 1024 / 1024]
>> >> >
>> >> > I'm curious why the database file takes 29 GB.
>> >> >
>> >> > Unfortunately I cannot post the documents, as this is production data.
>> >> >
>> >> > CouchDB is running on my Mac (OS X 10.10.1) with the default
>> >> > configuration.
>> >> >
>> >> > database1 was populated by a bulk upload from a MySQL extract, and
>> >> > database2 was populated by individual document inserts (PUT). Database
>> >> > compaction was allowed to complete (it took ~30 hours on database2).
>> >> >
>> >> > Is there a command that compacts superfluous data, or am I missing
>> >> > anything?
>> >> >
>> >> > Thanks!
>> >> >
>> >> > -Sharath
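Sharath's size estimate can be checked against the database info he posted later in the thread. The sketch below assumes the per-document sizes he reports (487 and 564 bytes) are accurate averages. doc_count is taken from that database info, so database2 uses 12982621 rather than the 12981148 in his first message. The [... / 1024 / 1024] formula yields MiB, so the results are printed in both MiB and GiB. The helper name raw_payload_bytes is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# doc_count, data_size and disk_size come from the database info in the
# thread; avg_doc_bytes is the per-document size Sharath reported.
databases = {
    "database1": {"doc_count": 13337224, "avg_doc_bytes": 487,
                  "data_size": 6896805847, "disk_size": 8574615674},
    "database2": {"doc_count": 12982621, "avg_doc_bytes": 564,
                  "data_size": 8026729752, "disk_size": 31587352698},
}


def raw_payload_bytes(info: dict) -> int:
    """Rough size of the documents themselves: doc_count * average doc size."""
    return info["doc_count"] * info["avg_doc_bytes"]


for name, info in databases.items():
    raw = raw_payload_bytes(info)
    print(f"{name}: raw ~{raw / MIB:.0f} MiB ({raw / GIB:.2f} GiB), "
          f"data_size {info['data_size'] / GIB:.2f} GiB, "
          f"disk_size {info['disk_size'] / GIB:.2f} GiB, "
          f"disk/data {info['disk_size'] / info['data_size']:.2f}")
```

For database1, data_size is only modestly above the raw estimate, and disk_size is about 1.24 times data_size. For database2, the disk/data ratio is close to 4, and that gap is what this thread is about.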

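The tuning in the reply at the top of this thread comes down to two keys in the [database_compaction] section. Here is a minimal sketch that scales the quoted defaults by a common factor and renders the section in local.ini form. The helper names are hypothetical. Which ini file your installation reads, and whether a restart is needed, are assumptions to check against your own setup.

```python
import configparser
import io

# Defaults for [database_compaction] as quoted in this thread (bytes).
DEFAULTS = {"checkpoint_after": 5242880, "doc_buffer_size": 524288}


def scaled_compaction_settings(factor: int) -> dict:
    """Multiply both compaction settings by the same factor."""
    return {key: value * factor for key, value in DEFAULTS.items()}


def render_ini(settings: dict) -> str:
    """Render the settings as a [database_compaction] ini section."""
    parser = configparser.ConfigParser()
    # configparser stores option values as strings.
    parser["database_compaction"] = {k: str(v) for k, v in settings.items()}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# Multiplying by 10 twice (a factor of 100) gives the config from the thread.
print(render_ini(scaled_compaction_settings(100)))
```

Larger values produce a tighter file, but compaction uses more memory and a failed compaction restarts from further back, as noted in the reply. On 1.x the same keys can also be written through the PUT /_config/{section}/{key} endpoint with a JSON string body such as "52428800". This thread does not establish whether a compaction that is already running picks up such a change.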