Re: Disk full

2019-05-02 Thread Robert Newson
Indeed puzzling.

If you delete a database (DELETE /dbname) and the request succeeds (2xx 
response), then all of the database's data is fully deleted. If you think 
you're seeing data persist after deletion, you have a problem (the delete is 
failing, you're not really deleting the db, or something extremely strange is 
happening).

Another cause of invisible bloat is failed writes (especially ones with 
attachment data): we write the data as we go, but if the write then fails, 
the partial write stays in the file with nothing pointing back at it. 
Compaction will clean that up, of course.

Compaction is essential in practically all cases. You could maybe get away with 
disabling it if you never create, update or delete a document, but even then 
the files will grow on restart (and perhaps when the db is closed and 
reopened?) as we append a new database footer.
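A minimal sketch of automating that over the HTTP API with Python's standard library, assuming an unsecured node at localhost:5984 (a secured node needs admin credentials); POST /dbname/_compact is the per-database compaction endpoint:

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumed local node; adjust as needed

def compact_url(db_name: str) -> str:
    """URL that starts compaction for one database."""
    return f"{BASE}/{db_name}/_compact"

def compact_all() -> None:
    """POST _compact for every non-system database listed by _all_dbs."""
    with urllib.request.urlopen(f"{BASE}/_all_dbs") as resp:
        dbs = json.load(resp)
    for db in dbs:
        if db.startswith("_"):  # skip system dbs such as _users
            continue
        req = urllib.request.Request(
            compact_url(db),
            data=b"{}",
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            print(db, resp.status)  # CouchDB answers 202 Accepted

# Against a live server: compact_all()
```

Compaction is asynchronous, so the 202 only means it started; poll GET /dbname and check "compact_running" to see when it finishes.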

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 2 May 2019, at 18:02, Adam Kocoloski wrote:
> [quoted text trimmed]

Re: Disk full

2019-05-02 Thread Adam Kocoloski
Hi Willem,

Good question. CouchDB has a 100% copy-on-write storage engine, including for 
all updates to btree nodes, etc., so any update to the database will 
necessarily increase the file size until compaction runs. Looking at your info 
I don’t see a heavy source of updates, so it is a little puzzling.

Adam
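The copy-on-write effect shows up in the "sizes" object that GET /dbname returns: "file" is what the shard files occupy on disk, "active" is the live data the compactor would keep. A back-of-the-envelope check on the figures reported for xxx_1590 elsewhere in this thread:

```python
# "sizes" as reported for xxx_1590 in this thread
sizes = {"file": 595928643, "external": 462778, "active": 1393380}

# file   = bytes the shard file occupies on disk
# active = live data the compactor would keep
garbage_fraction = (sizes["file"] - sizes["active"]) / sizes["file"]
print(f"{garbage_fraction:.4f}")  # ~0.9977: over 99% of the file is reclaimable
```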


> On May 2, 2019, at 12:53 PM, Willem Bison  wrote:
> [quoted text trimmed]



Re: Disk full

2019-05-02 Thread Willem Bison
Hi Adam,

I ran "POST _compact" on the DB mentioned in my post and 'disk_size' went
from 729884227 (yes, it had grown that much in 1 hour!?) to 1275480.

Wow.

I disabled compacting because I thought it was useless in our case, since
the dbs and the docs are so small. I do wonder how it is possible for a db
to grow so much when it's being deleted several times a week. What is all
the 'air'?
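For scale, plain arithmetic on the two disk_size figures quoted above:

```python
before, after = 729884227, 1275480  # disk_size before/after _compact
reclaimed = before - after
pct = 100 * reclaimed / before
print(f"{reclaimed} bytes reclaimed ({pct:.2f}% of the file)")
```

So compaction recovered roughly 695 MiB; only about 0.17% of the file was live data.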

On Thu, 2 May 2019 at 18:31, Adam Kocoloski  wrote:

> [quoted text trimmed]


Re: Disk full

2019-05-02 Thread Adam Kocoloski
Hi Willem,

Compaction would certainly reduce your storage space. You have such a small 
number of documents in these databases that it would be a fast operation. Did 
you try it and run into issues?

Changing cluster.q shouldn’t affect the overall storage consumption.

Adam

> On May 2, 2019, at 12:15 PM, Willem Bison  wrote:
> [quoted text trimmed]



Disk full

2019-05-02 Thread Willem Bison
Hi,

Our CouchDB 2.3.1 standalone server (AWS, Ubuntu 18.04) is using a lot of
disk space, so much so that it regularly fills the disk and crashes.

The server contains approximately 100 databases, each with a reported
(Fauxton) size of less than 2.5 MB and fewer than 250 docs. Yesterday the
'shards' folders combined exceeded 14 GB in total, causing the server to crash.

The server is configured with
cluster.n = 1 and
cluster.q = 8
because that was suggested during setup.

When I write this the 'shards' folders look like this:
/var/lib/couchdb/shards# du -hs *
869M -1fff
1.4G 2000-3fff
207M 4000-5fff
620M 6000-7fff
446M 8000-9fff
458M a000-bfff
400M c000-dfff
549M e000-

One of the largest files is this:
curl localhost:5984/xxx_1590
{
    "db_name": "xxx_1590",
    "purge_seq": "0-g1FTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFNSApBMqv___39WIgMedXksQJKhAUgBlc4nRu0DiFoC5iYpgOy3J9L-BRAz9-NXm8iQJE_YYgeQxfFEWnwAYvF9oNosADncXo4",
    "update_seq": "3132-g1FWeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8sxI18ChKUgCSSfYgdUkMDNw1-JQ6gJTGg42UxacuAaSuHqxOAo-6PBYgydAApIBK52clchNUuwCidn9Wog5BtQcgau9nJQoTVPsAohboXsksAJuwX9Y",
    "sizes": {
        "file": 595928643,
        "external": 462778,
        "active": 1393380
    },
    "other": {
        "data_size": 462778
    },
    "doc_del_count": 0,
    "doc_count": 74,
    "disk_size": 595928643,
    "disk_format_version": 7,
    "data_size": 1393380,
    "compact_running": false,
    "cluster": {
        "q": 8,
        "n": 1,
        "w": 1,
        "r": 1
    },
    "instance_start_time": "0"
}
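One way to act on output like this is to compare "sizes.file" against "sizes.active" and compact past a threshold. A sketch (the 70% cutoff is an arbitrary example, not a CouchDB default):

```python
def needs_compaction(db_info: dict, threshold: float = 0.70) -> bool:
    """True when more than `threshold` of the shard file is dead space."""
    file_size = db_info["sizes"]["file"]
    active = db_info["sizes"]["active"]
    return file_size > 0 and (file_size - active) / file_size > threshold

# The figures from the GET /xxx_1590 response above:
info = {"sizes": {"file": 595928643, "active": 1393380}}
print(needs_compaction(info))  # True: ~99.8% of this file is dead space
```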

curl localhost:5984/xxx_1590/_local_docs
{"total_rows":null,"offset":null,"rows":[
{"id":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","key":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","value":{"rev":"0-3022"}},
{"id":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","key":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","value":{"rev":"0-18"}},
{"id":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","key":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","value":{"rev":"0-3749"}},
{"id":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","key":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","value":{"rev":"0-172"}}
]}
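The _local checkpoint revisions above are a clue: revs like "0-3022" and "0-3749" suggest those replication checkpoint documents have been rewritten thousands of times, and in a copy-on-write file every rewrite appends. A sketch of totalling the rewrite counts from a _local_docs response (ids shortened here for illustration):

```python
# The rev suffix "0-N" on a _local doc roughly counts its rewrites.
rows = [
    {"id": "_local/189d9109...", "value": {"rev": "0-3022"}},
    {"id": "_local/7b3e0d92...", "value": {"rev": "0-18"}},
    {"id": "_local/7da4a2aa...", "value": {"rev": "0-3749"}},
    {"id": "_local/9619b06f...", "value": {"rev": "0-172"}},
]

def rewrites(row: dict) -> int:
    """N from a rev of the form '0-N'."""
    return int(row["value"]["rev"].split("-", 1)[1])

print(sum(rewrites(r) for r in rows))  # 6961 appends from checkpoints alone
```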

Each database push/pull replicates with a small number of clients (< 10).
Most of the documents contain orders that are short-lived. We throw away all
dbs 3 times a week as a brute-force purge.
Compacting has been disabled because it takes too much CPU and was
considered useless in our case (small dbs, purging).

I read this:
https://github.com/apache/couchdb/issues/1621
but I'm not sure how it helps me.

These are my questions:
How is it possible that such a small db occupies so much space?
What can I do to reduce this?
Would changing 'cluster.q' have any effect, or would the same number of
bytes be used in fewer folders? (Am I correct in assuming that cluster.q > 1
is pointless in a standalone configuration?)

Thanks!
Willem