[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-24 Thread Jan Lehnardt (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090078#comment-13090078
 ] 

Jan Lehnardt commented on COUCHDB-1153:
---

Paul, this came across wrong, sorry for the added confusion and feelings hurt.

My post was aimed at let's get back to work, not disregarding your efforts.

Just to close the loop, I was going through this ticket and dev@ looking for 
open issues and then categorised them into this code needs work and this 
code needs work and thus should not be in trunk at this point (with the 
obvious goal of removing all of the latter ones, so we can retroactively have 
the patch in trunk okayed). I used the catch-all bikedshed for the former 
category and hence totally disregarded your comments. I was meaning to say that 
these aren't blockers (in my opinion) that would warrant a revert of the patch 
and I didn't mean to imply that this is some opinion you pulled out of thin 
air. Sorry, again.

I hope we can put this to rest now.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Jan Lehnardt (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089354#comment-13089354
 ] 

Jan Lehnardt commented on COUCHDB-1153:
---

Robert, thanks for raising the review concerns, I thought between this ticket 
and dev@ they were all sorted out. I was wrong. 

Filipe, thanks for addressing Paul's initial concerns.

Let's look at the open items:

1. 0.6 vs 60%: total bikeshed, I like 60% better, I think it is more 
user-friendly. But this is in no way a blocker and if someone feels strongly 
about this, I'm happy to be convinced in passing a patch.

2. background-sweep: I think both Paul's and Robert's concerns are valid. 
There's a reason this feature is disabled by default. We need to see some field 
testing before we know how it all works out in the end. I think it is futile to 
a) pretend we know what it looks like in practice* and b) aim for an 
improvement without having a reproducible benchmark.

Paul, of the four detailed points you raise, I only found the receive timeout 
one to be a non-bikeshed. Filipe, could you look at that one?

* I specifically don't want to discard the operational experience of the 
Cloudant folks here, their input is very valuable here.


In summary: This is a major new feature. This is our first stab at it. It won't 
work perfectly in all conditions, not the ones we have thought of and not the 
ones we haven't thought of. But we'll never know if we don't put it out there. 
That's why it is disabled by default.

This isn't to say we should start half-baked features to users. We shouldn't, 
but I believe this patch is in shape enough to go out there and get tested. 
There are plenty of features we added over the time that we had to refine later 
on, that's how this works. I don't see how this should be suddenly different 
for this specific feature.



 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089361#comment-13089361
 ] 

Benoit Chesneau commented on COUCHDB-1153:
--

My point hasn't be addressed too. I also put a formal -1 that have been 
ignored. I'm not so happy with that but that's not big deal either.

One of the point I made join rnewson  davisp concern. I would prefer all this 
this thing more evented/asynchronous and in anycase (except on startup) be 
based on polling db lists and such. That open the door to a lot of expected 
problem. This daemon is also not the only one to run around. I've the feeling 
we could have a generic service in couch handling a pool of workers reacting on 
db events with different kind of workers could be used for that and useful for 
others purposes too. I will provide such thing asap (developed for refuge) 
probably on thursday on the release.

Second is config / db. Having it configured in an ini file is not the best 
thing to do. Having to parse n lines / dbs is awkward. I would prefer this 
config like readers/admins set on a db level.  On that part , it can of course 
be added later but I would really prefer we handle it when 1.2 is out.

I will open another ticket for this _meta thing.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Jan Lehnardt (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089380#comment-13089380
 ] 

Jan Lehnardt commented on COUCHDB-1153:
---

Thanks for the reminder Benoit. 

Your -1 was one of the points that went over dev@ and got lost in this ticket.

My rebuttal was that while I agree that a per-db config would be nice, we don't 
have an infrastructure for that currently and that adding one is out of the 
scope of this ticket. Once we have one, we should totally make use of it for 
this feature. I hope that convinces you to retract the -1. Sorry for not making 
this more implicit in this ticket earlier.

As for the practical concerns, we need real world testing and benchmarks for 
problematic situations, no amount of armchair jiraing will get us any closer to 
a solution, so I'd again propose to retroactively OK that this patch lands in 
trunk.


 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089391#comment-13089391
 ] 

Robert Newson commented on COUCHDB-1153:


For the record, my interrupt is better than polling _all_dbs thing is derived 
from operational experience.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 compact
 ; a particular database or view index, a warning message is logged.
 ;
 ; Examples:
 ;
 ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
 ;The `foo` database is compacted if its fragmentation is 70% or more.
 ;Any view index of this database is compacted only if its fragmentation
 ;is 60% or more.
 ;
 ; 2) foo = 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Jan Lehnardt (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089419#comment-13089419
 ] 

Jan Lehnardt commented on COUCHDB-1153:
---

Robert, I wasn't expecting anything else, but without being able to quantify 
either situation, andy discussion is moot.


 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 compact
 ; a particular database or view index, a warning message is logged.
 ;
 ; Examples:
 ;
 ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
 ;The `foo` database is compacted if its fragmentation is 70% or more.
 ;Any view index of this database is compacted only if its fragmentation
 ;is 60% or more.
 ;
 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089512#comment-13089512
 ] 

Damien Katz commented on COUCHDB-1153:
--

Robert, Benoit, your issues can still be addressed. You can submit patches that 
improve upon Filipes work. But telling Filipe to code the patch your way, 
without code is not how this community works. Filipe's work is a feature people 
care about, and any objections of correctness have been addressed. Switching 
the code to an evented model, or any other improvements is welcome from you or 
any other community member, but users want this feature, and Filipe should not 
be expected to code it up to everyone else expectations before any check-in can 
occur. Improvement can, and should, happen continuously.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Filipe Manana (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089727#comment-13089727
 ] 

Filipe Manana commented on COUCHDB-1153:


Paul,
Thanks for your long analysis. My use case test, with a fairly large amount of 
databases, was under the scenario where they're constantly being updated with 
new documents, with delayed commits set to true (which makes couch_db:is_idle/1 
return false very often) and several indexers running at the same time (not 
more than 25 indexers).

I agree this could be much better in all aspects, and that's the motivation for 
having it disabled by default and having details such as the periodicity of the 
scans configurable. The allowed period parameter also helps people ensuring 
compactions will happen only in low activity periods (not an uncommon case).

I would like to see all these details improved, together with a better 
configuration system (either using such a _meta object or _local documents per 
database, or whatever else), but that is far beyond this feature.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089978#comment-13089978
 ] 

Paul Joseph Davis commented on COUCHDB-1153:


Gonna make this in two comments. This one will be the non-ranty version.

@Jan

Calling waving your hands at something and calling it bike-shedding is not 
overly productive. I listed specific points on why I prefer 0.6 over 60%. If 
you would like to argue something along the lines of I see your point, but I 
believe that the percent format will be more intuitive to users, that's fine. 
But to dismiss it as bike-shedding is a bit off putting.

Discarding the point about reducing the redundancy of code as bike-shedding is 
equally off putting. If some asks me to review code and I make suggestions, I'd 
rather not expect people to walk by and say that's just bike-shedding.

On the other hand I do admit that the order of functions can come off as 
bike-shedding. Then again I also feel obliged to apologize to other Erlangers 
when they mention they've looked at our code base.


@Filipe

No worries. couch_server is quite a hard bit of code to test in any sort of 
thorough manner at the scale that matters. I'm still mulling over how to test 
it so that I can have a way to decide if a fix has fixed anything or not. Its 
on my agenda to address in the next couple weeks so I'll make sure and ping you 
with anything I find.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089991#comment-13089991
 ] 

Paul Joseph Davis commented on COUCHDB-1153:


I've deleted at least four versions of a ranty comment. Basically it all boils 
down to this:


Can we get past this meta discussion bullshit and get back to engineering 
yet?

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 compact
 ; a particular database or view index, a warning message is logged.
 ;
 ; Examples:
 ;
 ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
 ;The `foo` database is compacted if its fragmentation is 70% or more.
 ;Any view index of this 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-22 Thread Filipe Manana (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088483#comment-13088483
 ] 

Filipe Manana commented on COUCHDB-1153:


Thanks for your concerns Robert.

Regarding the percentage, I haven't seen before any comment about it. I think 
it's more clear to users what the ranges are when explicitly using percentages. 
Either way this is a very minor thing imho.

I've been testing this over the last 2 months or so in systems with a high 
number of databases where all of them are constantly being updated. It behaves 
fairly well, specially for the case where the number of databases is = 
max_dbs_open. I haven't seen the case where a compaction lasts forever due to 
retries (2 to 5 retries were the worst scenario I think I ever saw). There's a 
tradeoff always.

Asking the filesystem for a list of .couch files and opening each respective 
database to check if it needs to be compacted - There's a price to pay here 
yes, but there are also costs of not compacting databases often - databases 
(and views) with high amounts of wasted space are much less cache friendly, and 
the server can run out of disk space quickly, something clearly undesirable as 
well.

This is far from perfect yes (and so are many features of CouchDB or any other 
software). The periodicity of the scans if configurable, and doing such scans 
in systems with a small amount of databases (= max_dbs_open) is not a major 
problem. Such deployments aren't an uncommon case.

If you have a patch or concrete proposal, I'll be happy to review and get it 
into the repository.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ; 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-22 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088676#comment-13088676
 ] 

Robert Newson commented on COUCHDB-1153:


The comment about percentages was just a minor thing and certainly not part of 
my earlier concerns. I still feel that 60% is a UI-level thing, but it's not 
a blocker for me.

It sounds like you've tested under the conditions that concerned me, though 
your findings contradict my own. I also agree that it's a matter of tradeoffs.

My proposal would be to have a much less frequent crawl of _all_dbs but to also 
build a higher priority queue of actively updated databases by hooked into 
db_update_notifier. This seems to work out quite well in practice. The 
infrequent background sweep would ensure that eventually everything gets 
compacted.

In any case, my objections were relatively minor and easily addressable, my 
main issue was the apparent disregard for consensus building in this instance. 
It turns out to have been a simple misunderstanding, fortunately.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-22 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089077#comment-13089077
 ] 

Paul Joseph Davis commented on COUCHDB-1153:


@Filipe

Re you're earlier comment: It behaves fairly well, specially for the case 
where the number of databases is = max_dbs_open.

Yes. That's the be expected. Re-reading my earlier comment I wasn't as clear as 
I could have been.

The issue here is that couch_server's LRU cache can easily turn into a table 
scan for every incoming open/create message that it receives. There are a 
couple of conditions that you need to satisfy for it to become noticeable, but 
when it happens it turns into a positive feedback loop that grinds couch_server 
to a halt and eventually crashes the VM due to running out of memory because it 
can't process its mailbox fast enough.

First condition is that you have a large amount of active databases that is 
near the max_dbs_open limit. The reason that this is important is that you need 
a large number of databases for which couch_db:is_idle/1 returns false. The way 
that couch_server's LRU works is by checking the oldest used DB, and if its 
idle it scans for the next one. If there are lots of active db's and a largish 
max_dbs_open setting, this can turn into a largish loop as it scans through ETS 
looking for an idle db.

I haven't tried triggering this on purpose yet, but if I were going to I'd 
start by setting the max_dbs_open to something like 1000, open 990 or so 
clients that are all listening to continuous changes to make sure that is_idle 
returns false for most db's. The test then would be to run a load test under 
this condition while the auto-compactor loops through all dbs. This would be 
especially painful where max_dbs_open covers say a 20% list of hot databases 
with some breathing room for the less often used databases.

Obviously the correct solution here is to fix couch_server to not suck, but a 
proper fix there is going to take some serious engineering and will require 
modifying some critical pieces of code. The worry with the auto-compactor is 
that its going to make hitting this error condition more likely as it churns 
through a possibly large number of databases eating up open db slots in 
couch_server's ets tables. Then again it may be fine, but it doesn't sound like 
anyone's addressed it.



Turning our attention to the patch itself, you've addressed most of what I 
commented on before but there are still things that I'd like to see changed 
that don't relate to the performance questions:

* The issue with adding value formats like 60% is that you have to spend time 
writing and maintaing code that's essentially useless. There's nothing that  a 
% sign indicates that a simple comment wouldn't handle. And yet the parsing 
itself is prone to barfing on users if they happen to make a small typo or 
similar error. Specific error conditions that are apparent just reading the 
code  60% 60% , 60%5 60%% etc etc. There's also no default clause when 
setting record members which will puke on users as well.

* There are some record tricks you can use to remove some of the redundancy in 
this config code. Also, I've found it more sane to have two passes for this 
sort of thing. The first pass sets the values and the second pass enforces 
constraints. This makes things like the handling for #period much nicer.

* There's still not a timeout on that receive clause for the parallel view 
builds. I understand there's a receive clause further down and it could never 
happen that we get stuck there. But we could. Cause there's no timeout.

* The order of function definitions in this module is giving me the rage eyes. 
But maybe I'm the only crotchety bastard that grumbles to himself about such 
things.


 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-21 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088331#comment-13088331
 ] 

Robert Newson commented on COUCHDB-1153:


I'm concerned that this landed on trunk without a follow-up review once you'd 
addressed Paul' concerns. Since we will all share the burden of maintenance 
once this is included in a release, a little more effort to gain consensus 
would have been appreciated.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 compact
 ; a particular database or view index, a warning message is logged.
 ;
 ; Examples:
 ;
 ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
 ;The `foo` database is compacted 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-21 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088332#comment-13088332
 ] 

Robert Newson commented on COUCHDB-1153:


as a small note, and one Paul mentioned in passing, it seems simpler if the 
percentage values were expressed as ratios instead. That is, 0.6 instead of 
60%.

I'd also like more detail on the impact of periodically crawling all_dbs on a 
very active system. Having seen a significant negative impact of that approach 
in production I remain skeptical that it's a viable approach. I can contribute 
a patch to hook actively updated databases, for example.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085599#comment-13085599
 ] 

Robert Newson commented on COUCHDB-1153:


Could you hold off on this commit until after the srcmv? I'd really prefer to 
see it be added as a separate, optional application, not core. Different 
environments will need quite different approaches to compaction scheduling.

It seems this patch causes a periodic scan of all_dbs? If so, I don't think 
that's going to fly in a hosted environment like Cloudant's (or, presumably, 
IrisCouch).

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 compact
 ; a particular database or view index, a warning 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085602#comment-13085602
 ] 

Benoit Chesneau commented on COUCHDB-1153:
--

I'm -1 on this patch. Passing db options in the ini file seems awkward. But I 
really like the idea of a daemon.

We should rather have these options saved  when creating a db via query 
parameters or headers.  It may be the perfect time to transform this 
_security object in a _meta object used to save such db's settings . So we 
could do :

create a db:

PUT /db?db_fragmentation= 

Update setting

PUT /db/_meta 

Options could be passed as a meta document when creating the db too rather than 
passing an empty body. 

Note the _meta object could be later used for other purposes by app developers 
to annotate a db.. Like some devs already do with this _security object.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085603#comment-13085603
 ] 

Benoit Chesneau commented on COUCHDB-1153:
--

about the _all_dbs scanning, maybe we could have a database maintaing created 
dbs like cloudant do. Or Elasticsearch for that purpose. Rather than scanning 
_all_dbs it oculd react  on _changes ?

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 compact
 ; a particular database or view index, a warning message is logged.
 ;
 ; Examples:
 ;
 ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
 ;The `foo` database is compacted if its fragmentation is 70% or more.
 ;Any view index 

Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau
On Tue, Aug 16, 2011 at 11:30 AM, Filipe Manana (JIRA) j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085605#comment-13085605
  ]

 Filipe Manana commented on COUCHDB-1153:
 

 I'm -1 on adding such a _meta thing.

why?


 I don't understand either that idea of _changes nor how it can be applied.

creating db, adding db document to dbs db., update - update db document.


[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085613#comment-13085613
 ] 

Benoit Chesneau commented on COUCHDB-1153:
--

why not? I'm -1 on -1 without any arguments. And... security object is already 
used for such uses around. Annotating dbs is also something people wants 
around. 


Creating a db - create a db document, Update - update ? Simple enough. Can be 
used by people who want to have a db listener for any purpose. (Also solve an 
old ticket).




 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space 
 is
 ; needed is computed. This estimation corresponds to 2 times the data size of
 ; the database or view index. When there's not enough free disk space to 
 compact
 ; a particular database or view index, a warning message is logged.
 ;
 ; Examples:
 ;
 ; 1) foo = 

Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau
On Tue, Aug 16, 2011 at 11:46 AM, Filipe David Manana
fdman...@apache.org wrote:
 On Tue, Aug 16, 2011 at 2:38 AM, Benoit Chesneau bchesn...@gmail.com wrote:
 On Tue, Aug 16, 2011 at 11:30 AM, Filipe Manana (JIRA) j...@apache.org 
 wrote:

    [ 
 https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085605#comment-13085605
  ]

 Filipe Manana commented on COUCHDB-1153:
 

 I'm -1 on adding such a _meta thing.

 why?

 From your description, that _meta sounds like something that can be
 done with _local docs. But that is a whole separate discussion and
 topic I think.


Could be a local docs, But why didn't we took this path for this
_security object ? Also since they are really meta informations,
i've the feeling it should be solved as a special member in the db
file, just like the _security object.

Anyway what I really dislike is saving per db configuration in an ini
file. Per db configuration should be done on the db. What if you more
than 100 dbs. Having 100 lines in an ini file to parse is awkward.
meta informations (like security, db params, ...) should be saved in
the db file and available in the same time. Since we have already this
_security object that is available when you open why not reusing it ?



 I don't understand either that idea of _changes nor how it can be applied.

 creating db, adding db document to dbs db., update - update db document.

 You'll have to elaborate a lot more than that :) I'm not familiar with
 that bigcouch special db nor elasticsearch.

 Reacting to a changes feed of some database it's not something easy
 (the _replicator db is such a case and might have been the hardest
 thing i did ever for couch, really)



This is just as simple as this line, creating a db create an entry in
a db index (or db file) that you can use later.

 I suspect what you think is something like rather than scanning
 periodically, to let the daemon be notified when a db (or view) can be
 compacted?
 At some point I considered reacting to db_updated events but this was
 pretty much flooding the the event handler (daemon).
 Was this your idea?


Using db events is my idea yes.  If t actually flood the db event
handler (not sure why), then maybe we should fix it first?

- benoit


Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Filipe David Manana
On Tue, Aug 16, 2011 at 2:58 AM, Benoit Chesneau bchesn...@gmail.com wrote:

 Could be a local docs, But why didn't we took this path for this
 _security object ? Also since they are really meta informations,
 i've the feeling it should be solved as a special member in the db
 file, just like the _security object.

I don't know why _security is like it is now, that predates me, and
it's another topic :)


 Anyway what I really dislike is saving per db configuration in an ini
 file. Per db configuration should be done on the db. What if you more
 than 100 dbs. Having 100 lines in an ini file to parse is awkward.

I don't think the common case is to have a separate compact config for
every single database.
The fragmentation parameter, which is likely the most useful, you're
likely to not set a different value for 100 databases (neither the
period for e.g.).

For other things like the oauth tokens/secrets, the .ini system
doesn't scale. But that's again another topic.

 This is just as simple as this line, creating a db create an entry in
 a db index (or db file) that you can use later.

 I suspect what you think is something like rather than scanning
 periodically, to let the daemon be notified when a db (or view) can be
 compacted?
 At some point I considered reacting to db_updated events but this was
 pretty much flooding the the event handler (daemon).
 Was this your idea?


 Using db events is my idea yes.  If t actually flood the db event
 handler (not sure why), then maybe we should fix it first?

The problem is when you have many dbs in the system and under a
reasonable write load, the daemon (which is the receiver of db_updated
events) receives too many messages. To know if you need to compact the
db after such message, you need to open it, and opening it on every
message is a big burden as well.
I tried this on a system with 1024 databases being updated constantly.
It also doesn't deal with the case on startup where if a db with a
high fragmentation is not updated for a long period, it won't have
compaction started.

If someone can measure the current solution's impact and present
another working alternative with a lower impact (and practical tests,
not just theory) I would be the first one wanting to make the change
asap.


 - benoit




-- 
Filipe David Manana,
fdman...@gmail.com, fdman...@apache.org

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Filipe Manana (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086096#comment-13086096
 ] 

Filipe Manana commented on COUCHDB-1153:


Paul,

I'm addressing all the concerns pointed before. Some of them are already done 
and be tracked in individual commits:
https://github.com/fdmanana/couchdb/commits/compaction_daemon

The thing I'm not 100% sure is how to make the config and load/start of os_mon. 
I'm not that familiar with all those OTP structuring details. I've come up with 
this so far:

http://friendpaste.com/R43WflJ8r75MupvXuS98v

Using the args_file you mentioned makes sense, but we already have some stuff 
that could be moved into that new file, so I think it should go into a separate 
change. I'll help/do it, just need to figure out exactly how to do it and 
integrate into the build system / startup scripts.


 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-15 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085491#comment-13085491
 ] 

Paul Joseph Davis commented on COUCHDB-1153:


Couple notes so far.

I'm don't care much either way, but I would've just parsed proplists from 
Erlang terms from the config file like we do for other various options instead 
of creating the key=val syntax goop.

Never register anonymous config change functions. Always register functions 
using the M:F/A pattern. This has to do with how functions are called and code 
reloading. If module aren't calling exported functions it'll eventually cause 
random processes to crash when the code they were referring to is purged.

I'm not a super huge fan of how os_mon is being started. There's a -args_file 
command line switch that we might want to look into supporting for VM 
configuration.

The compact_loop thing seems kinda weird. A pattern I've had luck with lately 
is to use erlang:send_interval to replace loops like that. Not super concerned 
about this, but on first skim it looks like it could clean that loop's logic up 
a bit.

Also, I'm wondering if there should be some sort of throttling on how quickly 
the scan for databases to compact runs. The concern is that for installs that 
have non-trivial numbers of databases this could start doing mean things to 
couch_server as well as start thrashing system resources by opening and closing 
a large number of files.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-15 Thread Filipe Manana (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085507#comment-13085507
 ] 

Filipe Manana commented on COUCHDB-1153:


Thanks Paul

Not sure about what you mean with the loop weirdness. Doesn't seem complicated 
to me:   loop() - do_stuff(), sleep(...), loop().

An alternative ti start os_mon (i really don't care) is to add it to list it as 
a dependency in the .app file.

You're right about the couch_server. It's part of the reason why the 
autocompaction is disabled by default. Haven't seen however yet a big issue 
with about ~1000 databases. An approach would be to wait a bit before opening a 
db if it's not in the lru cache perhahps.

Certainly there's a lot of room for improvements in auto compaction and an 
initial implementation will unlikely ever be perfect for all scenarios.



 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is equal to or greater then this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage), of the 
 amount
 ;of old data (and its supporting metadata) over the 
 view
 ;index (view group) file size is equal to or greater 
 then
 ;this value, then this view index compaction 
 condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained 
 when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the 
 allowed
 ;   period, it will be canceled if this parameter is set to 
 yes.
 ;   It defaults to no and it's meaningful only if the 
 *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the 
 database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
 ;
 ; Before a 

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-15 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085523#comment-13085523
 ] 

Paul Joseph Davis commented on COUCHDB-1153:


My thoughts on the loop were based on my day dreaming that it's entirely 
possible that there's going to be feature requests to handle multiple 
simultaneous compactions. I tend to have better luck reacting to messages to 
maintain the state of a set of long running process directly from the 
gen_server rather than have this middleman process looping around accepting 
messages. Also, the more I look at this compact_loop the more things I see 
wrong with it:

* You have a Pid = spawn_link/1, MonRef = erlang:monitor(process, Pid) 
sequence for the parallel
  view compactor. One of these is redundant. You want a link if you want 
the compactor_loop to
  exit when the view compactor crashes, or you want the monitor if you just 
want to know when it dies.
* When you wait for the view compaction process to end there's no timeout. 
That means that the
  compactor loop could never move depending on whether the view compactor 
process exits or not.
* You never flush monitor messages. This means the compact_loop process 
mailbox
  will slowly fill with messages over time causing hard to track memory 
leaks.
* Views don't seem to be checked to see if they need to be compacted if 
their database doesn't
  need to be.
* View compaction holds open a reference to the database its compacting 
views for. What happens if
  views haven't finished compacting before the main database compaction 
gets swapped out?


I'd prefer to either have os_mon in an app file or started as an app when the 
VM boots. If we're going to talk about moving towards being more OTP compliant 
we should be trying to avoid adding more non-OTP bits when possible.

The important part to trigger the couch_server issues you need to have a lot of 
active databases as well as a lot of load so that try_close_lru turns into a 
table scan of that ets table. Adam rewrote couch_server quite a long time ago 
to replace this so that requests for open databases turned into a single ets 
lookup on a public table which helped quite a bit. Though it introduces the 
possibility of a race condition when opening a database that's just about to be 
shut. Since then other things have been fixed and couch_server has become a 
bottleneck again. I looked at it the other day and the only thing I came up 
with would require some non-trivial changes to the close semantics of databases.

I think the general approach here is quite good and I'm quite fine with leaving 
room for improvement. On the flip side, we need to avoid just pushing features 
into trunk without considering how we might be asked to improve them or what 
sort of maintenance cost they'll incur.




 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and they're views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view 
 indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller then this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation therefore it's not worth to compact them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and they're respective view groups when all 
 the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
 ;  of old data (and its supporting metadata) over the 
 database
 ;  file size is