[jira] [Commented] (COUCHDB-1242) Filtered replication silently converts all query parameters to strings
[ https://issues.apache.org/jira/browse/COUCHDB-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085014#comment-13085014 ] Robert Newson commented on COUCHDB-1242: For trunk this also needs to update the validate_doc_update fun for _replicator. Filtered replication silently converts all query parameters to strings -- Key: COUCHDB-1242 URL: https://issues.apache.org/jira/browse/COUCHDB-1242 Project: CouchDB Issue Type: Bug Affects Versions: 1.0.2, 1.1 Reporter: Robert Newson Assignee: Robert Newson Attachments: 0001-throw-400-bad_request-if-any-query_params-value-is-n.patch All filtered query params are silently converted to strings. This causes tests such as if (req.query.foo) to evaluate to true even for {"query_params":{"foo":false}}, because the filter receives the string "false", which is truthy. No clean solution exists to fix this, as _changes has a GET entry point and request parameters are untyped. The suggested fix is to scan the query_params in handle_replicate_req and throw a 400 Bad Request if any value is not a JSON string. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
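A small JavaScript sketch of both halves of the report: why a stringified `false` still passes an `if (req.query.foo)` test, and what the suggested 400 check could look like. `checkReplicationBody` is a hypothetical helper written for illustration only; the real check would live in the Erlang handle_replicate_req (and, per the comment above, in the _replicator validate_doc_update function).

```javascript
// Why the bug bites: once {foo: false} travels as a query string,
// the filter sees the string "false", and any non-empty string is truthy.
const received = String(false); // "false"
const filterWouldPass = Boolean(received); // true

// Hypothetical validation step for a replication request body:
// reject non-string query_params values instead of silently stringifying them.
function checkReplicationBody(body) {
  const params = body.query_params;
  if (params === undefined) {
    return { status: 200 }; // no filter parameters at all
  }
  if (params === null || typeof params !== "object" || Array.isArray(params)) {
    return { status: 400, error: "bad_request", reason: "query_params must be a JSON object" };
  }
  // Collect every key whose value is not a JSON string.
  const bad = Object.keys(params).filter((key) => typeof params[key] !== "string");
  if (bad.length > 0) {
    return {
      status: 400,
      error: "bad_request",
      reason: "query_params values must be strings: " + bad.join(", "),
    };
  }
  return { status: 200 };
}
```

With this check, `{"query_params":{"foo":false}}` is rejected up front, while `{"query_params":{"foo":"false"}}` is accepted and the filter author can compare against the string explicitly.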
[jira] [Reopened] (COUCHDB-911) Repeating a doc._id in a _bulk_docs request results in erroneous Document conflict error
[ https://issues.apache.org/jira/browse/COUCHDB-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kocoloski reopened COUCHDB-911: I think the issue here is that we respond with two conflict errors, but we do end up saving the document in the database. If I get a 409 Conflict response from CouchDB, I assume the database rejected the request, but that's not the case here. Repeating a doc._id in a _bulk_docs request results in erroneous Document conflict error -- Key: COUCHDB-911 URL: https://issues.apache.org/jira/browse/COUCHDB-911 Project: CouchDB Issue Type: Bug Components: HTTP Interface Affects Versions: 1.0 Environment: Cloudant BigCouch EC2 node Reporter: Jay Nelson Priority: Minor Original Estimate: 48h Remaining Estimate: 48h Repeating an _id in a _bulk_docs post data file results in both entries being reported as document conflict errors. The first occurrence actually inserts into the database, and only the second occurrence should report a conflict.

curl -d '{ "docs": [ {"_id":"foo"}, {"_id":"foo"} ] }' -H 'Content-Type:application/json' -X POST http://appadvice.cloudant.com/foo/_bulk_docs
[{"id":"foo","error":"conflict","reason":"Document update conflict."},{"id":"foo","error":"conflict","reason":"Document update conflict."}]

But the database shows that one new document was actually inserted. Only the second occurrence should report a conflict. The first occurrence should report the _rev property of the newly inserted doc.
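The behaviour the reporter expects can be modelled with a small in-memory store. This is a hypothetical sketch of the desired semantics for new documents, not CouchDB's implementation: the first occurrence of an `_id` in the batch is written and reports a revision, and later duplicates in the same batch report a conflict. The `1-placeholderN` strings stand in for real revision hashes.

```javascript
// Hypothetical in-memory model of the expected _bulk_docs semantics
// for brand-new documents (no _rev supplied by the client).
function bulkInsert(store, docs) {
  let counter = 0;
  return docs.map((doc) => {
    if (store.has(doc._id)) {
      // The id already exists, either from an earlier request
      // or from an earlier entry in this same batch.
      return { id: doc._id, error: "conflict", reason: "Document update conflict." };
    }
    counter += 1;
    const rev = "1-placeholder" + counter; // placeholder, not a real revision hash
    store.set(doc._id, { ...doc, _rev: rev });
    return { id: doc._id, rev: rev };
  });
}

const store = new Map();
const results = bulkInsert(store, [{ _id: "foo" }, { _id: "foo" }]);
```

Here `results[0]` carries the new `rev` and only `results[1]` is a conflict, which matches the report: one document saved, one rejected.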
Re: Futon Test Suite
I thought about suggesting node's parser, especially since you'd get the REPL for free. I think the downside is that there are roughly 300 versions of node out there, and I'd hate for our tests to keep breaking because of node's development pace. libcurl is nothing if not stable. Adam On Aug 14, 2011, at 12:55 PM, Paul Davis wrote: My plan was to rewrite couch.js to use the new request/response classes internally and then when we need closer HTTP access we'd be able to have it. Same for T and TEquals and whatnot. There is at least one test that we just can't make work in our current couchjs-based test runner because it needs to use async HTTP requests, so at a certain point we have to at least add some of this stuff. I quite like using etap over eunit as it seems better. Also, now that we're going to a second language for make check tests, it seems like an even better approach. Though I'm not at all married to it by any means. Also, I do understand your concerns about moving parts and unnecessary dependencies. I should get around to updating the build system to use the single-file etap distribution but it's never really been a concern. Another thing I've been contemplating is whether it'd be beneficial to remove libcurl and replace it with node.js's parser or with the ragel parser from Mongrel. Anyway, food for thought. I'll be around this afternoon to hack. On Sun, Aug 14, 2011 at 7:50 AM, Robert Dionne dio...@dionne-associates.com wrote: Paul, This is interesting, and if you're willing to put together the new infrastructure I can help with writing tests. I would suggest a more incremental approach that's less of a rewrite (rewrites often just get you back to 0 from a user's perspective). The existing CouchDB JS object seems to work ok in terms of the http interface, and the Futon tests more or less all ran using couchjs until very recently. 
I would suggest getting these all running first, reusing copies of the existing CouchDB objects and such so we can hack them as needed. Then we would review and throw out all the tests that are not part of the core APIs, like the coffee stuff (I don't know why we decided to bundle coffee in there) and any tests that are for specific internals. If at some point something like BigCouch or MobileCouch is integrated, we might have different make targets for the different deployments. Perhaps in that case we'd have different sets of tests. There needs to be a set of tests that can verify that the semantics of API calls are the same in CouchDB and BigCouch. So I'd say let's work backwards from what we have. Also I'm not a big fan of etap, preferring eunit mainly because it's one less moving part. For JS we already have the T(...) and TEquals() funs, which seem to do the trick. All that said, I have a few hours today to hack on this if you want some help; just ping me on #couchdb Bob On Aug 12, 2011, at 11:46 AM, Paul Davis wrote: Here's a bit of a brain dump on the sort of environment I'd like to see our CLI JS tests have. If anyone has any thoughts I'd like to hear them. Otherwise I'll start hacking on this at some point over the weekend. https://gist.github.com/1142306
Re: Futon Test Suite
Not sure I follow what you mean there. When I mentioned node's HTTP parser, I meant, the parser [1]. I'd still have to write my own C adaptor for that to Spidermonkey objects. Not entirely certain on the REPL bit, but couchjs was basically a hack on top of the Spidermonkey js REPL so going back to our roots a bit there shouldn't be too hard. [1] https://github.com/ry/http-parser On Mon, Aug 15, 2011 at 8:38 AM, Adam Kocoloski kocol...@apache.org wrote: I thought about suggesting node's parser, especially since you'd get the REPL for free. I think the downside is that there are roughly 300 versions of node out there, and I'd hate for our tests to keep breaking because of node's development pace. libcurl is nothing if not stable. Adam
Re: Configuration Load Order
On Jul 19, 2011, at 5:28 PM, Noah Slater wrote: On 19 Jul 2011, at 09:22, Matt Goodall wrote: This makes sense to me. Personally, I don't think a generated.ini/generated.d pair is needed - just a single generated.ini would do. As well as ensuring changes are written to the last .ini file in the configuration chain, a generated.ini would act very nicely as a per-instance configuration for when multiple CouchDB instances are run from the same, read-only installation, i.e.

default.ini/default.d -- CouchDB default config, system-wide (R)
local.ini/local.d -- local sysadmin's config, system-wide (R)
generated.ini -- per-instance config (RW)

As such, I would suggest naming generated.ini something more like instance.ini. I am +1 on all of this. This doesn't solve the problem that spawned this discussion:

1. Write admin = password to local.ini
2. Restart CouchDB
3. Hash gets persisted to generated.ini
4. Plain text password remains in local.ini

Cheers Jan --
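The proposed load order, and the plaintext-password problem Jan raises, can be modelled in a few lines. This is a hypothetical sketch, not CouchDB's couch_config module: files are merged in order so later files win, and every runtime write goes to the last file in the chain (here generated.ini). The file names follow the thread; the `section/key` flattening and the hashed value are illustrative placeholders.

```javascript
// Hypothetical model of an ordered .ini chain: each file maps
// "section/key" to a value, and later files override earlier ones.
function effectiveConfig(chain) {
  const merged = {};
  for (const file of chain) {
    Object.assign(merged, file.values);
  }
  return merged;
}

// Runtime changes are persisted only to the last (writable) file.
function persist(chain, key, value) {
  chain[chain.length - 1].values[key] = value;
}

const chain = [
  { name: "default.ini", values: { "httpd/port": "5984" } },
  { name: "local.ini", values: { "admins/anna": "secret" } }, // hand-edited plaintext
  { name: "generated.ini", values: {} },
];

// Step 3: on restart the server hashes the password and writes the hash...
persist(chain, "admins/anna", "-hashed-placeholder");

// ...so the effective value is the hash, while local.ini still holds "secret".
const effective = effectiveConfig(chain);
```

In this model `effective["admins/anna"]` is the hash, but `chain[1].values["admins/anna"]` is still `"secret"`: exactly step 4 of Jan's list. Reordering the chain changes which file wins, but not which file keeps the plaintext.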
Re: Configuration Load Order
On 15 Aug 2011, at 18:32, Jan Lehnardt wrote: 1. Write admin = password to local.ini 2. Restart CouchDB 3. Hash gets persisted to generated.ini 4. Plain text password remains in local.ini Which one of these steps is the problem? 4? What would you have happen in place of that? That the plain text password be removed? Could we not simply leave that up to the admin to remove it from the config? What if it is needed again at some point? If I put my plain text password in a config file that I had edited by hand on a server, I would not expect it to be removed by the software. If I was concerned about saving the plain text password in the first place, I would hope that the software in question would come with an interactive prompt that would ask me for my password and write the hash out to the file for me.
Re: Configuration Load Order
On Aug 15, 2011, at 7:36 PM, Noah Slater wrote: On 15 Aug 2011, at 18:32, Jan Lehnardt wrote: 1. Write admin = password to local.ini 2. Restart CouchDB 3. Hash gets persisted to generated.ini 4. Plain text password remains in local.ini Which one of these steps is the problem? 4? What would you have happen in place of that? That the plain text password be removed? Could we not simply leave that up to the admin to remove it from the config? What if it is needed again at some point? If I put my plain text password in a config file that I had edited by hand on a server, I would not expect it to be removed by the software. If I was concerned about saving the plain text password in the first place, I would hope that the software in question would come with an interactive prompt that would ask me for my password and write the hash out to the file for me. I would expect that a plaintext admin password would never survive a server restart. If you want to change the admin-addition procedure to a startup prompt thing, I'd be happy to consider this, but currently we are stuck between a rock and a hard place because all the documentation out there suggests adding an admin to local.ini will do the trick, yet distributions that add config files to local.d/ will keep plaintext passwords around, contrary to what is documented. I consider this a bad user experience as well as a security issue. I was supporting that local.ini should come after local.d/*.ini, but dev@ overturned me here and came up with generated.ini, which I'd be fine with, except, it doesn't solve the original problem. Cheers Jan --
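Noah's alternative, a tool that asks for the password and writes only the hash, could be sketched as below. It assumes the `-hashed-<sha1 hex>,<salt>` layout that CouchDB 1.x writes into the `[admins]` section, with SHA-1 taken over the password concatenated with the salt. Treat that format as an assumption to verify against your release. `hashedAdminLine` is a hypothetical name, and the interactive prompt itself is left out.

```javascript
const crypto = require("crypto");

// Hypothetical helper: build an [admins] line that never contains the plaintext.
// Assumed format (CouchDB 1.x era): -hashed-<hex sha1(password + salt)>,<salt>
function hashedAdminLine(name, password, salt) {
  // 16 random bytes as hex gives a 32-character salt when none is supplied.
  const useSalt = salt || crypto.randomBytes(16).toString("hex");
  const digest = crypto
    .createHash("sha1")
    .update(password + useSalt)
    .digest("hex");
  return name + " = -hashed-" + digest + "," + useSalt;
}

// A fixed salt makes the output reproducible; real use should omit it.
const line = hashedAdminLine("anna", "secret", "abc123");
```

A setup script could append `line` to local.ini, so the plaintext never touches disk and Jan's step 4 cannot occur.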
Re: Futon Test Suite
Ah, you'd just embed the http-parser itself, reducing dependencies instead of trading one for another. +1, Adam On Aug 15, 2011, at 10:41 AM, Paul Davis wrote: Not sure I follow what you mean there. When I mentioned node's HTTP parser, I meant, the parser [1]. I'd still have to write my own C adaptor for that to Spidermonkey objects. Not entirely certain on the REPL bit, but couchjs was basically a hack on top of the Spidermonkey js REPL so going back to our roots a bit there shouldn't be too hard. [1] https://github.com/ry/http-parser
Re: Configuration Load Order
On Tue, Aug 16, 2011 at 12:32 AM, Jan Lehnardt j...@apache.org wrote: This doesn't solve the problem that spawned this discussion:

1. Write admin = password to local.ini
2. Restart CouchDB
3. Hash gets persisted to generated.ini
4. Plain text password remains in local.ini

That is an excellent point. IMO (and sysadmins responsible for couches would agree), the .ini system, especially the automatic editing and overwriting by couch itself, is brittle or at best confusing. CouchDB's raison d'être is to store structured data which changes over time. And yet the config is a totally separate, less mature, less coherent implementation. You can talk about bootstrapping or human access or backups, but wearing my sysadmin hat, I don't care. All I know is the config files change arbitrarily and incomprehensibly depending on the whims of CouchDB. (Remember, I wrote the config_whitelist patch in part to address this.) Maybe the answer is not in code but in documentation. Is it possible to deprecate the .ini files as a configuration tool? In other words, tell the world: configure CouchDB over HTTP via the /_config URLs, probably via Futon. The .ini files become an irrelevant implementation detail. The fact that one of them changes is of no concern. Is that possible? The abstraction is slightly leaky:

1. Sysadmins still have to edit the bootstrapping config, such as the listen address and port.
2. Sysadmins still have to back up the .ini files because they do in fact reflect changes to the config.

But I still think it's a net win:

1. No changes to the code, just to the mental model of CouchDB.
2. Nobody will ever put an admin account in foo.ini and then find the hash showing up in bar.ini.

-- Iris Couch
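Under Jason's proposal every config change goes through `/_config`. As a sketch assuming a CouchDB 1.x server, a change is an HTTP PUT to `/_config/<section>/<key>` whose body is the new value encoded as a JSON string. The helper below (the hypothetical `configPutRequest`) only builds the request description, so it runs without a server. Send it with any HTTP client and admin credentials.

```javascript
// Hypothetical helper: describe a PUT /_config/<section>/<key> request.
// The body is the value encoded as a JSON string, e.g. secret -> "secret".
function configPutRequest(baseUrl, section, key, value) {
  return {
    method: "PUT",
    url:
      baseUrl.replace(/\/+$/, "") + // drop trailing slashes from the base URL
      "/_config/" +
      encodeURIComponent(section) +
      "/" +
      encodeURIComponent(key),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(String(value)),
  };
}

const req = configPutRequest("http://127.0.0.1:5984/", "admins", "anna", "secret");
```

The equivalent with curl is roughly `curl -X PUT http://127.0.0.1:5984/_config/admins/anna -d '"secret"'`. Either way the sysadmin never edits an .ini file by hand.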
Bringing automatic compaction into trunk
Developers, users, It's been a while now since I opened a Jira ticket for it ( https://issues.apache.org/jira/browse/COUCHDB-1153 ). I won't describe it in detail here since that's already done in the Jira ticket. Unless there are objections, I would like to get this moving soon. Thanks -- Filipe David Manana, fdman...@gmail.com, fdman...@apache.org Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men.
[jira] [Created] (COUCHDB-1250) Start accepting pull requests via github - this system is a bear for simple documentation fixes - something this community really needs!
Start accepting pull requests via github - this system is a bear for simple documentation fixes - something this community really needs! Key: COUCHDB-1250 URL: https://issues.apache.org/jira/browse/COUCHDB-1250 Project: CouchDB Issue Type: Wish Reporter: Mike McKay
[jira] [Created] (COUCHDB-1249) Documentation for view function in jquery.couch.js needs work
Documentation for view function in jquery.couch.js needs work - Key: COUCHDB-1249 URL: https://issues.apache.org/jira/browse/COUCHDB-1249 Project: CouchDB Issue Type: Bug Components: JavaScript View Server Affects Versions: 1.1 Reporter: Mike McKay Priority: Minor Fix For: 1.1 patch here: http://pastie.org/2378357
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085491#comment-13085491 ] Paul Joseph Davis commented on COUCHDB-1153: Couple notes so far. I don't care much either way, but I would've just parsed proplists from Erlang terms from the config file like we do for other various options instead of creating the key=val syntax goop. Never register anonymous config change functions. Always register functions using the M:F/A pattern. This has to do with how functions are called and code reloading. If modules aren't calling exported functions, it'll eventually cause random processes to crash when the code they were referring to is purged. I'm not a super huge fan of how os_mon is being started. There's a -args_file command line switch that we might want to look into supporting for VM configuration. The compact_loop thing seems kinda weird. A pattern I've had luck with lately is to use timer:send_interval to replace loops like that. Not super concerned about this, but on first skim it looks like it could clean that loop's logic up a bit. Also, I'm wondering if there should be some sort of throttling on how quickly the scan for databases to compact runs. The concern is that for installs that have non-trivial numbers of databases this could start doing mean things to couch_server as well as start thrashing system resources by opening and closing a large number of files. Database and view index compaction daemon - Key: COUCHDB-1153 URL: https://issues.apache.org/jira/browse/COUCHDB-1153 Project: CouchDB Issue Type: New Feature Environment: trunk Reporter: Filipe Manana Assignee: Filipe Manana Priority: Minor Labels: compaction I've recently written an Erlang process to automatically compact databases and their views based on some configurable parameters. 
These parameters can be global or per database and are: minimum database fragmentation, minimum view fragmentation, allowed period and strict_window (whether an ongoing compaction should be canceled if it doesn't finish within the allowed period). These fragmentation values are based on the recently added data_size parameter to the database and view group information URIs (COUCHDB-1132). I've documented the .ini configuration, as a comment in default.ini, which I paste here: [compaction_daemon] ; The delay, in seconds, between each check for which database and view indexes ; need to be compacted. check_interval = 60 ; If a database or view index file is smaller than this value (in bytes), ; compaction will not happen. Very small files always have a very high ; fragmentation, therefore it's not worth compacting them. min_file_size = 131072 [compactions] ; List of compaction rules for the compaction daemon. ; The daemon compacts databases and their respective view groups when all the ; condition parameters are satisfied. Configuration can be per database or ; global, and it has the following format: ; ; database_name = parameter=value [, parameter=value]* ; _default = parameter=value [, parameter=value]* ; ; Possible parameters: ; ; * db_fragmentation - If the ratio (as an integer percentage) of the amount ; of old data (and its supporting metadata) over the database ; file size is equal to or greater than this value, this ; database compaction condition is satisfied. ; This value is computed as: ; ; (file_size - data_size) / file_size * 100 ; ; The data_size and file_size values can be obtained when ; querying a database's information URI (GET /dbname/). ; ; * view_fragmentation - If the ratio (as an integer percentage) of the amount ; of old data (and its supporting metadata) over the view ; index (view group) file size is equal to or greater than ; this value, then this view index compaction condition is ; satisfied. 
This value is computed as: ; ;(file_size - data_size) / file_size * 100 ; ;The data_size and file_size values can be obtained when ;querying a view group's information URI ;(GET /dbname/_design/groupname/_info). ; ; * period - The period for which a database (and its view groups) compaction ;is allowed. This value must obey the following format: ; ;HH:MM - HH:MM (HH in [0..23], MM in [0..59]) ; ; * strict_window - If a compaction is still running after the end of the allowed ;
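Both fragmentation rules above reduce to the same arithmetic. The sketch below applies the documented formula, `(file_size - data_size) / file_size * 100`, together with the `min_file_size` cut-off. The function names and the sample sizes are illustrative, not taken from the patch, and the input object simply mirrors the `file_size`/`data_size` fields named above.

```javascript
// Fragmentation as documented: the share of the file that is old data/metadata.
function fragmentation(fileSize, dataSize) {
  return ((fileSize - dataSize) / fileSize) * 100;
}

// Hypothetical decision helper mirroring the documented rules:
// skip small files, otherwise compare against the configured threshold.
function needsCompaction(info, thresholdPercent, minFileSize = 131072) {
  if (info.file_size < minFileSize) {
    return false; // tiny files always look fragmented; not worth compacting
  }
  return fragmentation(info.file_size, info.data_size) >= thresholdPercent;
}

// A 10 MiB file holding 2.5 MiB of live data is 75% fragmented.
const dbInfo = { file_size: 10485760, data_size: 2621440 };
```

For example, `needsCompaction(dbInfo, 70)` is true and `needsCompaction(dbInfo, 80)` is false, while a 4 KiB file is always skipped regardless of its ratio.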
Re: Bringing automatic compaction into trunk
Did a quick review. Posted to the ticket. On Mon, Aug 15, 2011 at 8:29 PM, Filipe David Manana fdman...@apache.org wrote: Developers, users, It's been a while now since I opened a Jira ticket for it ( https://issues.apache.org/jira/browse/COUCHDB-1153 ). I won't describe it here with detail since it's already done in the Jira ticket. Unless there are objections, I would like to get this moving soon. Thanks -- Filipe David Manana, fdman...@gmail.com, fdman...@apache.org Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men.
[jira] [Resolved] (COUCHDB-1250) Start accepting pull requests via github - this system is a bear for simple documentation fixes - something this community really needs!
[ https://issues.apache.org/jira/browse/COUCHDB-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis resolved COUCHDB-1250. Resolution: Invalid This is a known issue and affects all projects at the ASF. If you'd like to follow this up, the place to file a ticket is the infrastructure group's JIRA instance [1]. There's already been talk, but last I heard the most likely solution was to just try to have GitHub disable pull requests for the Apache account. [1] https://issues.apache.org/jira/browse/INFRA
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085507#comment-13085507 ] Filipe Manana commented on COUCHDB-1153: Thanks, Paul. I'm not sure what you mean by the loop weirdness. It doesn't seem complicated to me: loop() -> do_stuff(), sleep(...), loop(). An alternative way to start os_mon (I really don't care) is to list it as a dependency in the .app file. You're right about couch_server. It's part of the reason why autocompaction is disabled by default. However, I haven't yet seen a big issue with ~1000 databases. One approach might be to wait a bit before opening a db if it's not in the LRU cache. Certainly there's a lot of room for improvement in auto compaction, and an initial implementation is unlikely ever to be perfect for all scenarios. Database and view index compaction daemon - Key: COUCHDB-1153 URL: https://issues.apache.org/jira/browse/COUCHDB-1153 Project: CouchDB Issue Type: New Feature Environment: trunk Reporter: Filipe Manana Assignee: Filipe Manana Priority: Minor Labels: compaction I've recently written an Erlang process to automatically compact databases and their views based on some configurable parameters. These parameters can be global or per database and are: minimum database fragmentation, minimum view fragmentation, allowed period and strict_window (whether an ongoing compaction should be canceled if it doesn't finish within the allowed period). These fragmentation values are based on the recently added data_size parameter to the database and view group information URIs (COUCHDB-1132). I've documented the .ini configuration, as a comment in default.ini, which I paste here: [compaction_daemon] ; The delay, in seconds, between each check for which database and view indexes ; need to be compacted. check_interval = 60 ; If a database or view index file is smaller than this value (in bytes), ; compaction will not happen. 
Very small files always have a very high ; fragmentation, therefore it's not worth compacting them. min_file_size = 131072 [compactions] ; List of compaction rules for the compaction daemon. ; The daemon compacts databases and their respective view groups when all the ; condition parameters are satisfied. Configuration can be per database or ; global, and it has the following format: ; ; database_name = parameter=value [, parameter=value]* ; _default = parameter=value [, parameter=value]* ; ; Possible parameters: ; ; * db_fragmentation - If the ratio (as an integer percentage) of the amount ; of old data (and its supporting metadata) over the database ; file size is equal to or greater than this value, this ; database compaction condition is satisfied. ; This value is computed as: ; ; (file_size - data_size) / file_size * 100 ; ; The data_size and file_size values can be obtained when ; querying a database's information URI (GET /dbname/). ; ; * view_fragmentation - If the ratio (as an integer percentage) of the amount ; of old data (and its supporting metadata) over the view ; index (view group) file size is equal to or greater than ; this value, then this view index compaction condition is ; satisfied. This value is computed as: ; ; (file_size - data_size) / file_size * 100 ; ; The data_size and file_size values can be obtained when ; querying a view group's information URI ; (GET /dbname/_design/groupname/_info). ; ; * period - The period for which a database (and its view groups) compaction ; is allowed. This value must obey the following format: ; ; HH:MM - HH:MM (HH in [0..23], MM in [0..59]) ; ; * strict_window - If a compaction is still running after the end of the allowed ; period, it will be canceled if this parameter is set to yes. ; It defaults to no and it's meaningful only if the *period* ; parameter is also specified. ; ; * parallel_view_compaction - If set to yes, the database and its views are ; compacted in parallel. 
This is only useful on ; certain setups, like for example when the database ; and view index directories point to different ; disks. It defaults to no. ; ; Before a
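The `period` and `strict_window` rules amount to a time-of-day window check. The sketch below parses the documented `HH:MM - HH:MM` format and tests whether a given local time falls inside it. Two behaviours are assumptions, not taken from the patch: the end time is exclusive, and a window whose end is earlier than its start (e.g. `23:00 - 04:00`) wraps past midnight. The function names are hypothetical.

```javascript
// Parse "HH:MM" into minutes since midnight, enforcing the documented ranges.
function parseClock(text) {
  const m = /^(\d{1,2}):(\d{2})$/.exec(text.trim());
  if (!m) throw new Error("bad time: " + text);
  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  if (hours > 23 || minutes > 59) throw new Error("out of range: " + text);
  return hours * 60 + minutes;
}

// Parse a period value such as "23:00 - 04:00".
function parsePeriod(value) {
  const parts = value.split("-");
  if (parts.length !== 2) throw new Error("bad period: " + value);
  return { from: parseClock(parts[0]), to: parseClock(parts[1]) };
}

// True if the local time of `date` falls inside the window.
// Assumption: the end is exclusive, and end < start means the window wraps midnight.
function inPeriod(period, date) {
  const now = date.getHours() * 60 + date.getMinutes();
  if (period.from <= period.to) {
    return now >= period.from && now < period.to;
  }
  return now >= period.from || now < period.to;
}

const night = parsePeriod("23:00 - 04:00");
```

Under `strict_window = yes`, a daemon built on this check would cancel a compaction that is still running once `inPeriod` returns false.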
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085523#comment-13085523 ] Paul Joseph Davis commented on COUCHDB-1153: My thoughts on the loop were based on my daydreaming that it's entirely possible there are going to be feature requests to handle multiple simultaneous compactions. I tend to have better luck reacting to messages to maintain the state of a set of long-running processes directly from the gen_server, rather than having this middleman process looping around accepting messages. Also, the more I look at this compact_loop the more things I see wrong with it:

* You have a Pid = spawn_link/1, MonRef = erlang:monitor(process, Pid) sequence for the parallel view compactor. One of these is redundant. You want a link if you want the compactor_loop to exit when the view compactor crashes, or you want the monitor if you just want to know when it dies.
* When you wait for the view compaction process to end there's no timeout. That means the compactor loop could block forever if the view compactor process never exits.
* You never flush monitor messages. This means the compact_loop process mailbox will slowly fill with messages over time, causing hard-to-track memory leaks.
* Views don't seem to be checked to see if they need to be compacted if their database doesn't need to be.
* View compaction holds open a reference to the database it's compacting views for. What happens if views haven't finished compacting before the main database compaction gets swapped out?

I'd prefer to either have os_mon in an app file or started as an app when the VM boots. If we're going to talk about moving towards being more OTP compliant we should be trying to avoid adding more non-OTP bits when possible. 
The important part: to trigger the couch_server issues, you need a lot of active databases as well as a lot of load, so that try_close_lru turns into a table scan of that ets table. Adam rewrote couch_server quite a long time ago to replace this, so that requests for open databases turned into a single ets lookup on a public table. That helped quite a bit, though it introduces the possibility of a race condition when opening a database that's just about to be shut. Since then other things have been fixed, and couch_server has become a bottleneck again. I looked at it the other day, and the only fix I came up with would require some non-trivial changes to the close semantics of databases.

I think the general approach here is quite good, and I'm fine with leaving room for improvement. On the flip side, we need to avoid just pushing features into trunk without considering how we might be asked to improve them or what sort of maintenance cost they'll incur.

Database and view index compaction daemon
Key: COUCHDB-1153
URL: https://issues.apache.org/jira/browse/COUCHDB-1153
Project: CouchDB
Issue Type: New Feature
Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
Labels: compaction

I've recently written an Erlang process to automatically compact databases and their views based on some configurable parameters. These parameters can be global or per database and are: minimum database fragmentation, minimum view fragmentation, allowed period and strict_window (whether an ongoing compaction should be canceled if it doesn't finish within the allowed period). These fragmentation values are based on the recently added data_size parameter to the database and view group information URIs (COUCHDB-1132).

I've documented the .ini configuration as a comment in default.ini, which I paste here:

[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 60
; If a database or view index file is smaller than this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation, therefore it's not worth compacting them.
min_file_size = 131072

[compactions]
; List of compaction rules for the compaction daemon.
; The daemon compacts databases and their respective view groups when all the
; condition parameters are satisfied. Configuration can be per database or
; global, and it has the following format:
;
; database_name = parameter=value [, parameter=value]*
; _default = parameter=value [, parameter=value]*
;
; Possible parameters:
;
; * db_fragmentation - If the ratio (as an integer percentage), of the amount
; of old data (and its supporting metadata) over the database
; file size is