Thanks - I've set the following values:
checkpoint_after = 524288000
doc_buffer_size = 52428800

and started the compact process. Have to wait for a bit.
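
For anyone following along, here is a minimal Python sketch of how the two settings could be applied over HTTP and compaction triggered. It assumes a CouchDB 1.x node at http://localhost:5984 whose /_config API is reachable without authentication, and it takes the section name `database_compaction` from the docs link quoted below. The helper names (`config_request`, `compact_request`, `build_requests`) and the database name are illustrative only. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB 1.x node, no auth

# Values from this thread (100x the defaults quoted below).
SETTINGS = {
    "checkpoint_after": "524288000",
    "doc_buffer_size": "52428800",
}


def config_request(key, value, section="database_compaction"):
    """Build a PUT /_config/<section>/<key> request; the body is a JSON string."""
    return urllib.request.Request(
        f"{BASE_URL}/_config/{section}/{key}",
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def compact_request(db_name):
    """Build a POST /<db>/_compact request with a JSON content type."""
    return urllib.request.Request(
        f"{BASE_URL}/{db_name}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_requests(db_name):
    """Return the config updates followed by the compaction trigger, in order."""
    requests = [config_request(k, v) for k, v in SETTINGS.items()]
    requests.append(compact_request(db_name))
    return requests
```

Each request could then be sent with `urllib.request.urlopen(req)`; on a node with admins configured, credentials would have to be added first.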

-Sharath

On Sun, Jan 25, 2015 at 5:59 PM, Alexander Shorin <[email protected]> wrote:

> OK, so far this looks exactly like what I have for my hashes database:
>
> data_size: 557537537
> disk_size: 1542664311
> doc_count: 1298255
> doc_del_count: 18
> avg doc size: ~350 bytes
>
> While the disk_size/data_size ratio is roughly 3x, this database is
> uncompactable: CouchDB isn't able to get it down to 500MB, leaving it
> at 1.5GB. This looks like a "specific" of the underlying database
> format, which isn't able to rationally allocate a huge number of tiny
> documents... But CouchDB provides two interesting options to
> configure database compaction: doc_buffer_size and checkpoint_after.
>
> http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction
>
> By default they have the following values:
>
> checkpoint_after = 5242880
> doc_buffer_size = 524288
>
> And this makes my hashes database stop at the 1.5GB point. If I
> multiply them both by 10, the database size after compaction will be
> ~900MB - yay! If I do this again, with the resulting config:
>
> checkpoint_after = 524288000
> doc_buffer_size = 52428800
>
> Then the database sizes will be much better:
>
> disk_size: 633688183
> data_size: 556759808
>
> Almost no overhead! Why does this happen? Paul or Robert may correct me,
> but it seems that most of the wasted space after compaction is
> consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
> make compaction checkpoints less often and to use a bigger buffer for
> docs allows it to build the resulting btree in the new database file in
> a more optimized way. The downside of such a configuration: if your
> compaction fails, it has to restart from further back, and a bigger
> buffer requires more memory.
>
> Try playing with these options and see how they affect your
> databases.
>
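
A quick way to see the effect of these options is to compare disk_size with data_size before and after compaction. The sketch below does that arithmetic in Python using the figures quoted in this thread; the function name `overhead_ratio` and the labels are illustrative only.

```python
def overhead_ratio(info):
    """Return disk_size / data_size for a database info document."""
    return info["disk_size"] / info["data_size"]


# Figures quoted in this thread.
samples = {
    "hashes, default settings": {"disk_size": 1542664311, "data_size": 557537537},
    "hashes, 100x settings": {"disk_size": 633688183, "data_size": 556759808},
    "database1": {"disk_size": 8574615674, "data_size": 6896805847},
    "database2": {"disk_size": 31587352698, "data_size": 8026729752},
}

for name, info in samples.items():
    print(f"{name}: {overhead_ratio(info):.2f}x")
```

A ratio close to 1 means the file holds little beyond the live data. The same function can be fed the JSON returned by a GET on the database URL.
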
> P.S. This issue is solved in the upcoming 2.0 with the default config.
> --
> ,,,^..^,,,
>
>
> On Sun, Jan 25, 2015 at 9:52 AM, Sharath <[email protected]> wrote:
> > Yes, the databases were recently compacted - both databases run as
> > insert-only (no deletions in either).
> > database2 completed compaction about 4 hours ago and I've triggered
> > compaction again (so what you see below for database2 could be
> > misleading)
> >
> > database1:
> > {
> >    "db_name":"database1",
> >    "doc_count":13337224,
> >    "doc_del_count":0,
> >    "update_seq":13337224,
> >    "purge_seq":0,
> >    "compact_running":false,
> >    "disk_size":8574615674,
> >    "data_size":6896805847,
> >    "instance_start_time":"1422157234994080",
> >    "disk_format_version":6,
> >    "committed_update_seq":13337224
> > }
> >
> > database2:
> > {
> >    "db_name":"database2",
> >    "doc_count":12982621,
> >    "doc_del_count":0,
> >    "update_seq":12982621,
> >    "purge_seq":0,
> >    "compact_running":true,
> >    "disk_size":31587352698,
> >    "data_size":8026729752,
> >    "instance_start_time":"1422157235289671",
> >    "disk_format_version":6,
> >    "committed_update_seq":12982621
> > }
> >
> > -Sharath
> >
> > On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <[email protected]> wrote:
> >
> >> Hm... are you sure that the database was recently compacted? How many
> >> deleted documents are in these databases?
> >> --
> >> ,,,^..^,,,
> >>
> >>
> >> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <[email protected]> wrote:
> >> > Hi Alexander,
> >> >
> >> > CouchDB version: 1.6.1
> >> >
> >> > database1: "disk_size":8574615674,"data_size":6896805847
> >> > database2: "disk_size":31587352698,"data_size":8026729752
> >> >
> >> > -Sharath
> >> >
> >> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <[email protected]> wrote:
> >> >
> >> >> Hi Sharath,
> >> >>
> >> >> What is your CouchDB version?
> >> >> Could you provide data_size and disk_size values from database info
> >> >> for both?
> >> >> curl http://localhost:5984/db1
> >> >> curl http://localhost:5984/db2
> >> >> --
> >> >> ,,,^..^,,,
> >> >>
> >> >>
> >> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <[email protected]> wrote:
> >> >> > Hi All,
> >> >> >
> >> >> > I recently moved to CouchDB and find my databases taking a lot of
> >> >> > disk space.
> >> >> >
> >> >> > I have two databases, both with JSON documents (no attachments);
> >> >> > however, the sizes vary by a lot:
> >> >> >
> >> >> > database1      size 8.0GB    number of documents: 13337224
> >> >> > database2      size 29.4 GB    number of documents: 12981148
> >> >> >
> >> >> > both the databases have been compacted
> >> >> >
> >> >> > each document in database1 is 487 bytes long (including _id and _rev)
> >> >> > each document in database2 is 564 bytes long (including _id and _rev)
> >> >> >
> >> >> > database1 should be ~6.1GB (only data without compression) [487 * 13337224 / 1024 / 1024]
> >> >> > database2 should be ~6.9GB (only data without compression) [564 * 12981148 / 1024 / 1024]
> >> >> >
> >> >> > I'm curious why the database file takes 29 GB.
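
The size estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the document counts and per-document sizes given in the original post; the function name `raw_size_gib` is illustrative, and the result is expressed in GiB (bytes divided by 1024 three times).

```python
def raw_size_gib(doc_count, bytes_per_doc):
    """Estimate raw JSON payload size in GiB, ignoring any storage overhead."""
    return doc_count * bytes_per_doc / 1024 ** 3


# Document counts and sizes as stated in the original post.
estimates = {
    "database1": raw_size_gib(13337224, 487),
    "database2": raw_size_gib(12981148, 564),
}

for name, gib in estimates.items():
    print(f"{name}: ~{gib:.2f} GiB of raw document data")
```

Both estimates come out in the 6 to 7 GiB range, which is why a 29 GB file for database2 stands out.
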
> >> >> >
> >> >> > unfortunately I cannot post the document as this is prod data.
> >> >> >
> >> >> > CouchDB is running on my Mac (10.10.1) with the default configuration.
> >> >> >
> >> >> > database1 was populated by a bulk upload from a MySQL extract and
> >> >> > database2 was populated by individual document inserts (PUT). Database
> >> >> > compaction was left to complete (took ~30hr on database2).
> >> >> >
> >> >> > Is there a command that compacts superfluous data? Or am I missing
> >> >> > anything?
> >> >> >
> >> >> >
> >> >> > thanks!
> >> >> >
> >> >> > -Sharath
> >> >>
> >>
>
