[jira] [Commented] (COUCHDB-1242) Filtered replication silently converts all query parameters to strings

2011-08-15 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085014#comment-13085014
 ] 

Robert Newson commented on COUCHDB-1242:


For trunk, this also needs an update to the validate_doc_update fun for _replicator.

 Filtered replication silently converts all query parameters to strings
 --

 Key: COUCHDB-1242
 URL: https://issues.apache.org/jira/browse/COUCHDB-1242
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 1.0.2, 1.1
Reporter: Robert Newson
Assignee: Robert Newson
 Attachments: 
 0001-throw-400-bad_request-if-any-query_params-value-is-n.patch


 All filtered query params are silently converted to strings. This causes
 tests like if (req.query.foo) to evaluate to true even for
 {"query_params": {"foo": false}}, because the string "false" is truthy.
 No clean solution exists to fix this, as _changes has a GET entry point and
 request parameters are untyped.
 The suggested fix is to scan the query_params in handle_replicate_req and
 throw a 400 Bad Request if any value is not a JSON string.
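 A minimal Erlang sketch of that check (the module and function names here
 are illustrative, not the attached patch; EJSON objects are assumed to be
 {Props} tuples with binary keys):

     -module(query_params_check).
     -export([validate/1]).

     %% Every query_params value must be a JSON string, i.e. a binary in the
     %% EJSON representation; anything else becomes a 400 Bad Request, which
     %% couch_httpd builds from a {bad_request, Reason} throw.
     validate({Props}) ->
         lists:foreach(fun check_param/1, Props).

     check_param({_Key, Val}) when is_binary(Val) ->
         ok;
     check_param({Key, _Val}) ->
         throw({bad_request, <<"query_params value for `", Key/binary,
                               "` must be a JSON string">>}).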

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (COUCHDB-911) Repeating a doc._id in a _bulk_docs request results in erroneous Document conflict error

2011-08-15 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski reopened COUCHDB-911:



I think the issue here is that we respond with two conflict errors, but we do 
end up saving the document in the database.  If I get a 409 Conflict response 
from CouchDB I assume the database rejected the request, but that's not the 
case here.

 Repeating a doc._id in a _bulk_docs request results in erroneous Document 
 conflict error
 --

 Key: COUCHDB-911
 URL: https://issues.apache.org/jira/browse/COUCHDB-911
 Project: CouchDB
  Issue Type: Bug
  Components: HTTP Interface
Affects Versions: 1.0
 Environment: Cloudant BigCouch EC2 node
Reporter: Jay Nelson
Priority: Minor
   Original Estimate: 48h
  Remaining Estimate: 48h

 Repeating an _id in a _bulk_docs post data file results in both entries 
 being reported as document conflict errors.  The first occurrence actually 
 inserts into the database, and only the second occurrence should report a 
 conflict.
 curl -d '{"docs": [{"_id": "foo"}, {"_id": "foo"}]}' -H 
 'Content-Type: application/json' -X POST 
 http://appadvice.cloudant.com/foo/_bulk_docs
 [{"id":"foo","error":"conflict","reason":"Document update 
 conflict."},{"id":"foo","error":"conflict","reason":"Document update 
 conflict."}]
 But the database shows that one new document was actually inserted.
 Only the second occurrence should report a conflict.  The first occurrence 
 should report the _rev property of the newly inserted doc.
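 A hedged Erlang sketch of spotting the repeated ids in such a payload (the
 function name is illustrative; Docs is assumed to be the decoded EJSON list
 from the request body):

     %% Returns one entry per extra occurrence of an _id, so a handler could
     %% save the first occurrence and report a conflict only for the rest.
     duplicate_ids(Docs) ->
         Ids = [proplists:get_value(<<"_id">>, Props) || {Props} <- Docs],
         Ids -- lists:usort(Ids).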

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Futon Test Suite

2011-08-15 Thread Adam Kocoloski
I thought about suggesting node's parser, especially since you'd get the REPL 
for free.  I think the downside is that there are roughly 300 versions of node 
out there, and I'd hate for our tests to keep breaking because of node's 
development pace.  libcurl is nothing if not stable.

Adam

On Aug 14, 2011, at 12:55 PM, Paul Davis wrote:

 My plan was to rewrite couch.js to use the new request/response
 classes internally, and then when we need closer HTTP access we'd be
 able to have it. Same for T and TEquals, and whatnot. There is at
 least one test that we just can't make work in our current couchjs
 based test runner because it needs to use async HTTP requests, so at a
 certain point we have to at least add some of this stuff.
 
 I quite like using etap over eunit as it seems better. Also, now
 that we're going to a second language for make check tests, it seems
 like an even better approach. Though I'm not at all married to it by
 any means. Also, I do understand your concerns about moving parts and
 unnecessary dependencies. I should get around to updating the build
 system to use the single-file etap distribution but it's never really
 been a concern.
 
 Another thing I've been contemplating is if it'd be beneficial to
 remove libcurl and replace it with node.js's parser or with the ragel
 parser from Mongrel. Anyway, food for thought. I'll be around this
 afternoon to hack.
 
 On Sun, Aug 14, 2011 at 7:50 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Paul,
 
  This is interesting, and if you're willing to put together the new 
 infrastructure I can help with writing tests. I would suggest a more 
 incremental approach that's less of a rewrite (rewrites often just get you 
 back to 0 from a user's perspective).
 
   The existing CouchDB JS object seems to work ok in terms of the http 
 interface, and the Futon tests more or less all ran using couchjs until very 
 recently. I would suggest getting these all running first, reusing copies of 
 the existing CouchDB objects and such so we can hack them as needed. Then we 
 would review and throw out all the tests that are not part of the core APIs, 
 like the coffee stuff (I don't know why we decided to bundle coffee in 
 there) and any tests that are for specific internals.
 
   At some point, when something like BigCouch or MobileCouch is integrated, 
 we might have different make targets for the different deployments. Perhaps 
 in that case we'd have different sets of tests. There needs to be a set of 
 tests that can verify that the semantics of API calls are the same in CouchDB 
 and BigCouch.
 
  So I'd say let's work backwards from what we have. Also I'm not a big fan 
 of etap, preferring eunit mainly because it's one less moving part. For JS 
 we already have the T(...) and TEquals() funs which seem to do the 
 trick.
 
   All that said, I have a few hours to hack on this today; if you want 
 some help, just ping me on #couchdb.
 
 Bob
 
 
 
 
 On Aug 12, 2011, at 11:46 AM, Paul Davis wrote:
 
 Here's a bit of a brain dump on the sort of environment I'd like to
 see our CLI JS tests have. If anyone has any thoughts I'd like to hear
 them. Otherwise I'll start hacking on this at some point over the
 weekend.
 
 https://gist.github.com/1142306
 
 



Re: Futon Test Suite

2011-08-15 Thread Paul Davis
Not sure I follow what you mean there. When I mentioned node's HTTP
parser, I meant, the parser [1]. I'd still have to write my own C
adaptor for that to Spidermonkey objects. Not entirely certain on the
REPL bit, but couchjs was basically a hack on top of the Spidermonkey
js REPL so going back to our roots a bit there shouldn't be too hard.

[1] https://github.com/ry/http-parser

On Mon, Aug 15, 2011 at 8:38 AM, Adam Kocoloski kocol...@apache.org wrote:
 I thought about suggesting node's parser, especially since you'd get the REPL 
 for free.  I think the downside is that there are roughly 300 versions of 
 node out there, and I'd hate for our tests to keep breaking because of node's 
 development pace.  libcurl is nothing if not stable.

 Adam

 On Aug 14, 2011, at 12:55 PM, Paul Davis wrote:

 My plan was to rewrite couch.js to use the new request/response
 classes internally, and then when we need closer HTTP access we'd be
 able to have it. Same for T and TEquals, and whatnot. There is at
 least one test that we just can't make work in our current couchjs
 based test runner because it needs to use async HTTP requests, so at a
 certain point we have to at least add some of this stuff.
 
 I quite like using etap over eunit as it seems better. Also, now
 that we're going to a second language for make check tests, it seems
 like an even better approach. Though I'm not at all married to it by
 any means. Also, I do understand your concerns about moving parts and
 unnecessary dependencies. I should get around to updating the build
 system to use the single-file etap distribution but it's never really
 been a concern.

 Another thing I've been contemplating is if it'd be beneficial to
 remove libcurl and replace it with node.js's parser or with the ragel
 parser from Mongrel. Anyway, food for thought. I'll be around this
 afternoon to hack.

 On Sun, Aug 14, 2011 at 7:50 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Paul,

  This is interesting, and if you're willing to put together the new 
 infrastructure I can help with writing tests. I would suggest a more 
 incremental approach that's less of a rewrite (rewrites often just get you 
 back to 0 from a user's perspective).

   The existing CouchDB JS object seems to work ok in terms of the http 
 interface, and the Futon tests more or less all ran using couchjs until 
 very recently. I would suggest getting these all running first, reusing 
 copies of the existing CouchDB objects and such so we can hack them as 
 needed. Then we would review and throw out all the tests that are not part 
 of the core APIs, like the coffee stuff (I don't know why we decided to 
 bundle coffee in there) and any tests that are for specific internals.

   At some point, when something like BigCouch or MobileCouch is integrated, 
 we might have different make targets for the different deployments. Perhaps 
 in that case we'd have different sets of tests. There needs to be a set of 
 tests that can verify that the semantics of API calls are the same in 
 CouchDB and BigCouch.
 
  So I'd say let's work backwards from what we have. Also I'm not a big fan 
 of etap, preferring eunit mainly because it's one less moving part. For JS 
 we already have the T(...) and TEquals() funs which seem to do the 
 trick.
 
   All that said, I have a few hours to hack on this today; if you want 
 some help, just ping me on #couchdb.

 Bob




 On Aug 12, 2011, at 11:46 AM, Paul Davis wrote:

 Here's a bit of a brain dump on the sort of environment I'd like to
 see our CLI JS tests have. If anyone has any thoughts I'd like to hear
 them. Otherwise I'll start hacking on this at some point over the
 weekend.

 https://gist.github.com/1142306






Re: Configuration Load Order

2011-08-15 Thread Jan Lehnardt

On Jul 19, 2011, at 5:28 PM, Noah Slater wrote:

 
 On 19 Jul 2011, at 09:22, Matt Goodall wrote:
 
 This makes sense to me. Personally, I don't think a
 generated.ini/generated.d pair is needed - just a single generated.ini would
 do.
 
 As well as ensuring changes are written to the last .ini file in the
 configuration chain a generated.ini would act very nicely as a per-instance
 configuration for when multiple CouchDB instances are run from the same,
 read-only installation, i.e.
 
   default.ini/default.d  --  CouchDB default config, system-wide (R)
   local.ini/local.d  --  local sysadmin's config, system-wide (R)
   generated.ini  --  per-instance config (RW)
 
 As such, I would suggest naming generated.ini something more like
 instance.ini.
 
 I am +1 on all of this.

This doesn't solve the problem that spawned this discussion:

1. Write admin = password to local.ini
2. Restart CouchDB
3. Hash gets persisted to generated.ini
4. Plain text password remains in local.ini

Cheers
Jan
-- 




Re: Configuration Load Order

2011-08-15 Thread Noah Slater

On 15 Aug 2011, at 18:32, Jan Lehnardt wrote:

 1. Write admin = password to local.ini
 2. Restart CouchDB
 3. Hash gets persisted to generated.ini
 4. Plain text password remains in local.ini

Which one of these steps is the problem? 4? What would you have happen in place 
of that? That the plain text password be removed? Could we not simply leave 
that up to the admin to remove it from the config? What if it is needed again 
at some point? If I put my plain text password in a config file that I had 
edited by hand on a server, I would not expect it to be removed by the 
software. If I was concerned about saving the plain text password in the first 
place, I would hope that the software in question would come with an 
interactive prompt that would ask me for my password and write the hash out to 
the file for me.

Re: Configuration Load Order

2011-08-15 Thread Jan Lehnardt

On Aug 15, 2011, at 7:36 PM, Noah Slater wrote:

 
 On 15 Aug 2011, at 18:32, Jan Lehnardt wrote:
 
 1. Write admin = password to local.ini
 2. Restart CouchDB
 3. Hash gets persisted to generated.ini
 4. Plain text password remains in local.ini
 
 Which one of these steps is the problem? 4? What would you have happen in 
 place of that? That the plain text password be removed? Could we not simply 
 leave that up to the admin to remove it from the config? What if it is needed 
 again at some point? If I put my plain text password in a config file that I 
 had edited by hand on a server, I would not expect it to be removed by the 
 software. If I was concerned about saving the plain text password in the 
 first place, I would hope that the software in question would come with an 
 interactive prompt that would ask me for my password and write the hash out 
 to the file for me.

I would expect that a plaintext admin password would never survive a server 
restart.

If you want to change the admin-addition procedure to a startup prompt thing, 
I'd be happy to consider this, but currently we are stuck between a rock and a 
hard place because all the documentation out there suggests adding an admin to 
local.ini will do the trick, yet distributions that add config files to 
local.d/ will keep plaintext passwords around, contrary to what is documented. 
I consider this a bad user experience as well as a security issue.

I was arguing that local.ini should come after local.d/*.ini, but dev@ 
overturned me here and came up with generated.ini, which I'd be fine with, 
except that it doesn't solve the original problem.

Cheers
Jan
-- 



Re: Futon Test Suite

2011-08-15 Thread Adam Kocoloski
Ah, you'd just embed the http-parser itself, reducing dependencies instead of 
trading one for another.  +1,

Adam

On Aug 15, 2011, at 10:41 AM, Paul Davis wrote:

 Not sure I follow what you mean there. When I mentioned node's HTTP
 parser, I meant, the parser [1]. I'd still have to write my own C
 adaptor for that to Spidermonkey objects. Not entirely certain on the
 REPL bit, but couchjs was basically a hack on top of the Spidermonkey
 js REPL so going back to our roots a bit there shouldn't be too hard.
 
 [1] https://github.com/ry/http-parser
 
 On Mon, Aug 15, 2011 at 8:38 AM, Adam Kocoloski kocol...@apache.org wrote:
 I thought about suggesting node's parser, especially since you'd get the 
 REPL for free.  I think the downside is that there are roughly 300 versions 
 of node out there, and I'd hate for our tests to keep breaking because of 
 node's development pace.  libcurl is nothing if not stable.
 
 Adam
 
 On Aug 14, 2011, at 12:55 PM, Paul Davis wrote:
 
 My plan was to rewrite couch.js to use the new request/response
 classes internally, and then when we need closer HTTP access we'd be
 able to have it. Same for T and TEquals, and whatnot. There is at
 least one test that we just can't make work in our current couchjs
 based test runner because it needs to use async HTTP requests, so at a
 certain point we have to at least add some of this stuff.
 
 I quite like using etap over eunit as it seems better. Also, now
 that we're going to a second language for make check tests, it seems
 like an even better approach. Though I'm not at all married to it by
 any means. Also, I do understand your concerns about moving parts and
 unnecessary dependencies. I should get around to updating the build
 system to use the single-file etap distribution but it's never really
 been a concern.
 
 Another thing I've been contemplating is if it'd be beneficial to
 remove libcurl and replace it with node.js's parser or with the ragel
 parser from Mongrel. Anyway, food for thought. I'll be around this
 afternoon to hack.
 
 On Sun, Aug 14, 2011 at 7:50 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Paul,
 
  This is interesting, and if you're willing to put together the new 
 infrastructure I can help with writing tests. I would suggest a more 
 incremental approach that's less of a rewrite (rewrites often just get you 
 back to 0 from a user's perspective).
 
   The existing CouchDB JS object seems to work ok in terms of the http 
 interface, and the Futon tests more or less all ran using couchjs until 
 very recently. I would suggest getting these all running first, reusing 
 copies of the existing CouchDB objects and such so we can hack them as 
 needed. Then we would review and throw out all the tests that are not part 
 of the core APIs, like the coffee stuff (I don't know why we decided to 
 bundle coffee in there) and any tests that are for specific internals.
 
   At some point, when something like BigCouch or MobileCouch is integrated, 
 we might have different make targets for the different deployments. Perhaps 
 in that case we'd have different sets of tests. There needs to be a set of 
 tests that can verify that the semantics of API calls are the same in 
 CouchDB and BigCouch.
 
  So I'd say let's work backwards from what we have. Also I'm not a big fan 
 of etap, preferring eunit mainly because it's one less moving part. For JS 
 we already have the T(...) and TEquals() funs which seem to do the 
 trick.
 
   All that said, I have a few hours to hack on this today; if you 
 want some help, just ping me on #couchdb.
 
 Bob
 
 
 
 
 On Aug 12, 2011, at 11:46 AM, Paul Davis wrote:
 
 Here's a bit of a brain dump on the sort of environment I'd like to
 see our CLI JS tests have. If anyone has any thoughts I'd like to hear
 them. Otherwise I'll start hacking on this at some point over the
 weekend.
 
 https://gist.github.com/1142306
 
 
 
 



Re: Configuration Load Order

2011-08-15 Thread Jason Smith
On Tue, Aug 16, 2011 at 12:32 AM, Jan Lehnardt j...@apache.org wrote:

 On Jul 19, 2011, at 5:28 PM, Noah Slater wrote:


 On 19 Jul 2011, at 09:22, Matt Goodall wrote:

 This makes sense to me. Personally, I don't think a
 generated.ini/generated.d pair is needed - just a single generated.ini would
 do.

 As well as ensuring changes are written to the last .ini file in the
 configuration chain a generated.ini would act very nicely as a per-instance
 configuration for when multiple CouchDB instances are run from the same,
 read-only installation, i.e.

   default.ini/default.d  --  CouchDB default config, system-wide (R)
   local.ini/local.d  --  local sysadmin's config, system-wide (R)
   generated.ini  --  per-instance config (RW)

 As such, I would suggest naming generated.ini something more like
 instance.ini.

 I am +1 on all of this.

 This doesn't solve the problem that spawned this discussion:

 1. Write admin = password to local.ini
 2. Restart CouchDB
 3. Hash gets persisted to generated.ini
 4. Plain text password remains in local.ini

That is an excellent point.

IMO (and sysadmins responsible for couches would agree): the .ini
system, especially the automatic editing and overwriting by couch
itself, is brittle or at best confusing. CouchDB's raison d'être is to
store structured data which changes over time. And yet the config is a
totally separate, less mature, less coherent implementation.

You can talk about bootstrapping or human-access or backups, but
wearing my sysadmin hat, I don't care. All I know is the config files
change arbitrarily and incomprehensibly depending on the whims of
CouchDB. (Remember, I wrote the config_whitelist patch in part to
address this.)

Maybe the answer is not in code but in documentation.

Is it possible to deprecate the .ini files as a configuration tool? In
other words, tell the world: "Configure CouchDB over HTTP via the
/_config URLs, probably via Futon."

The .ini files become an irrelevant implementation detail. The fact
that one of them changes is of no concern. Is that possible?
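
Config-over-HTTP already works today; as a hedged sketch, from an Erlang
shell with stock OTP httpc (the section, key, and value are only examples):

    %% inets must be running for httpc; a PUT to /_config/Section/Key sets
    %% the value and returns the previous value in the response body.
    inets:start(),
    {ok, {{_, 200, _}, _, OldValue}} =
        httpc:request(put,
            {"http://127.0.0.1:5984/_config/couchdb/delayed_commits",
             [], "application/json", "\"false\""},
            [], []).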

The abstraction is slightly leaky:

1. Sysadmins still have to edit the bootstrapping config, such as the
listen address and port.
2. Sysadmins still have to back up the .ini files because they do in
fact reflect changes to the config.

But I still think it's a net-win:

1. No changes to the code, just to the mental model of CouchDB.
2. Nobody will ever put an admin account in foo.ini only to have the
hash show up in bar.ini.

-- 
Iris Couch


Bringing automatic compaction into trunk

2011-08-15 Thread Filipe David Manana
Developers, users,

It's been a while now since I opened a Jira ticket for it (
https://issues.apache.org/jira/browse/COUCHDB-1153 ).
I won't describe it in detail here since that's already done in the Jira ticket.

Unless there are objections, I would like to get this moving soon.

Thanks


-- 
Filipe David Manana,
fdman...@gmail.com, fdman...@apache.org

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


[jira] [Created] (COUCHDB-1250) Start accepting pull requests via github - this system is a bear for simple documentation fixes - something this community really needs!

2011-08-15 Thread Mike McKay (JIRA)
Start accepting pull requests via github - this system is a bear for simple 
documentation fixes - something this community really needs!


 Key: COUCHDB-1250
 URL: https://issues.apache.org/jira/browse/COUCHDB-1250
 Project: CouchDB
  Issue Type: Wish
Reporter: Mike McKay




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (COUCHDB-1249) Documentation for view function in jquery.couch.js needs work

2011-08-15 Thread Mike McKay (JIRA)
Documentation for view function in jquery.couch.js needs work
-

 Key: COUCHDB-1249
 URL: https://issues.apache.org/jira/browse/COUCHDB-1249
 Project: CouchDB
  Issue Type: Bug
  Components: JavaScript View Server
Affects Versions: 1.1
Reporter: Mike McKay
Priority: Minor
 Fix For: 1.1


patch here: http://pastie.org/2378357

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-15 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085491#comment-13085491
 ] 

Paul Joseph Davis commented on COUCHDB-1153:


A couple of notes so far.

I don't care much either way, but I would've just parsed proplists from 
Erlang terms in the config file, like we do for various other options, 
instead of creating the key=val syntax goop.

Never register anonymous config change functions. Always register functions 
using the M:F/A pattern. This has to do with how functions are called and code 
reloading. If modules aren't calling exported functions, it'll eventually cause 
random processes to crash when the code they were referring to is purged.
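
As a self-contained sketch of why that matters (the module and function
names here are illustrative):

    -module(reload_safe).
    -export([callbacks/0, on_change/2]).

    %% An anonymous fun captures the code version of the module that created
    %% it; once that module has been reloaded twice, the old version is
    %% purged and any process still calling the fun crashes. A fun M:F/A
    %% reference always dispatches to the newest loaded code.
    callbacks() ->
        Fragile = fun(Key, Val) -> io:format("~p = ~p~n", [Key, Val]) end,
        Safe = fun ?MODULE:on_change/2,
        {Fragile, Safe}.

    on_change(Key, Val) ->
        io:format("~p = ~p~n", [Key, Val]).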

I'm not a super huge fan of how os_mon is being started. There's a -args_file 
command line switch that we might want to look into supporting for VM 
configuration.

The compact_loop thing seems kinda weird. A pattern I've had luck with lately 
is to use timer:send_interval to replace loops like that. Not super concerned 
about this, but on first skim it looks like it could clean up that loop's 
logic a bit.
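
A hedged sketch of that pattern, as gen_server callback fragments (the names
are illustrative, not the daemon's actual code):

    %% Instead of a middleman process that does work and sleeps in a loop,
    %% have the gen_server receive a periodic message and react to it.
    init([]) ->
        %% timer:send_interval/2 sends check_compaction to self() every 60s.
        {ok, _TRef} = timer:send_interval(60 * 1000, check_compaction),
        {ok, #state{}}.

    handle_info(check_compaction, State) ->
        %% scan for databases and views over their fragmentation thresholds
        {noreply, State}.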

Also, I'm wondering if there should be some sort of throttling on how quickly 
the scan for databases to compact runs. The concern is that for installs that 
have non-trivial numbers of databases this could start doing mean things to 
couch_server as well as start thrashing system resources by opening and closing 
a large number of files.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and their views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:

 [compaction_daemon]
 ; The delay, in seconds, between each check for which databases and view
 ; indexes need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller than this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation, therefore it's not worth compacting them.
 min_file_size = 131072

 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and their respective view groups when all the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
 ;                      of old data (and its supporting metadata) over the
 ;                      database file size is equal to or greater than this
 ;                      value, this database compaction condition is satisfied.
 ;                      This value is computed as:
 ;
 ;                          (file_size - data_size) / file_size * 100
 ;
 ;                      The data_size and file_size values can be obtained
 ;                      when querying a database's information URI
 ;                      (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage) of the
 ;                        amount of old data (and its supporting metadata)
 ;                        over the view index (view group) file size is equal
 ;                        to or greater than this value, then this view index
 ;                        compaction condition is satisfied. This value is
 ;                        computed as:
 ;
 ;                            (file_size - data_size) / file_size * 100
 ;
 ;                        The data_size and file_size values can be obtained
 ;                        when querying a view group's information URI
 ;                        (GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;            is allowed. This value must obey the following format:
 ;
 ;                HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the
 ;                   allowed

Re: Bringing automatic compaction into trunk

2011-08-15 Thread Paul Davis
Did a quick review. Posted to the ticket.

On Mon, Aug 15, 2011 at 8:29 PM, Filipe David Manana
fdman...@apache.org wrote:
 Developers, users,

 It's been a while now since I opened a Jira ticket for it (
 https://issues.apache.org/jira/browse/COUCHDB-1153 ).
 I won't describe it in detail here since that's already done in the Jira 
 ticket.

 Unless there are objections, I would like to get this moving soon.

 Thanks


 --
 Filipe David Manana,
 fdman...@gmail.com, fdman...@apache.org

 Reasonable men adapt themselves to the world.
  Unreasonable men adapt the world to themselves.
  That's why all progress depends on unreasonable men.



[jira] [Resolved] (COUCHDB-1250) Start accepting pull requests via github - this system is a bear for simple documentation fixes - something this community really needs!

2011-08-15 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis resolved COUCHDB-1250.


Resolution: Invalid

This is a known issue and affects all projects at the ASF. If you'd like to 
follow this up, the place to file a ticket is the infrastructure group's JIRA 
instance [1]. There's already talk, but last I heard the most likely solution 
was to just try and have GitHub disable pull requests for the Apache account.

[1] https://issues.apache.org/jira/browse/INFRA

 Start accepting pull requests via github - this system is a bear for simple 
 documentation fixes - something this community really needs!
 

 Key: COUCHDB-1250
 URL: https://issues.apache.org/jira/browse/COUCHDB-1250
 Project: CouchDB
  Issue Type: Wish
Reporter: Mike McKay



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-15 Thread Filipe Manana (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085507#comment-13085507
 ] 

Filipe Manana commented on COUCHDB-1153:


Thanks, Paul.

Not sure what you mean about the loop weirdness. It doesn't seem complicated 
to me:   loop() -> do_stuff(), sleep(...), loop().

An alternative to starting os_mon (I really don't care) is to list it as a 
dependency in the .app file.

You're right about couch_server. It's part of the reason why autocompaction 
is disabled by default. However, I haven't yet seen a big issue with about 
~1000 databases. An approach would perhaps be to wait a bit before opening a 
db if it's not in the LRU cache.

Certainly there's a lot of room for improvement in auto compaction, and an 
initial implementation is unlikely to ever be perfect for all scenarios.



 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and their views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:

 [compaction_daemon]
 ; The delay, in seconds, between each check for which databases and view
 ; indexes need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller than this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation, therefore it's not worth compacting them.
 min_file_size = 131072

 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and their respective view groups when all the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
 ;                      of old data (and its supporting metadata) over the
 ;                      database file size is equal to or greater than this
 ;                      value, this database compaction condition is satisfied.
 ;                      This value is computed as:
 ;
 ;                          (file_size - data_size) / file_size * 100
 ;
 ;                      The data_size and file_size values can be obtained
 ;                      when querying a database's information URI
 ;                      (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage) of the
 ;                        amount of old data (and its supporting metadata)
 ;                        over the view index (view group) file size is equal
 ;                        to or greater than this value, then this view index
 ;                        compaction condition is satisfied. This value is
 ;                        computed as:
 ;
 ;                            (file_size - data_size) / file_size * 100
 ;
 ;                        The data_size and file_size values can be obtained
 ;                        when querying a view group's information URI
 ;                        (GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;            is allowed. This value must obey the following format:
 ;
 ;                HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the
 ;                   allowed period, it will be canceled if this parameter is
 ;                   set to yes. It defaults to no and it's meaningful only if
 ;                   the *period* parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;                              compacted in parallel. This is only useful on
 ;                              certain setups, for example when the database
 ;                              and view index directories point to different
 ;                              disks. It defaults to no.
 ;
 ; Before a

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-15 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085523#comment-13085523
 ] 

Paul Joseph Davis commented on COUCHDB-1153:


My thoughts on the loop were based on my daydreaming that it's entirely 
possible there will be feature requests to handle multiple simultaneous 
compactions. I tend to have better luck reacting to messages to maintain the 
state of a set of long-running processes directly from the gen_server rather 
than have this middleman process looping around accepting messages. Also, the 
more I look at this compact_loop the more things I see wrong with it (a 
sketch of the monitor pattern follows the list):

* You have a Pid = spawn_link/1, MonRef = erlang:monitor(process, Pid)
  sequence for the parallel view compactor. One of these is redundant. You
  want the link if you want the compactor_loop to exit when the view
  compactor crashes, or you want the monitor if you just want to know when
  it dies.
* When you wait for the view compaction process to end there's no timeout.
  That means the compactor loop could never move, depending on whether the
  view compactor process exits or not.
* You never flush monitor messages. This means the compact_loop process
  mailbox will slowly fill with messages over time, causing hard-to-track
  memory leaks.
* Views don't seem to be checked to see if they need to be compacted if
  their database doesn't need to be.
* View compaction holds open a reference to the database it's compacting
  views for. What happens if views haven't finished compacting before the
  main database compaction gets swapped out?
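
A minimal sketch of the monitor-only pattern the first three points suggest
(the module, function, and timeout are illustrative assumptions, not the
daemon's code):

    -module(compact_watch).
    -export([run/2]).

    %% Monitor instead of link+monitor, wait with a timeout, and consume the
    %% 'DOWN' message on every path so the mailbox never accumulates stray
    %% monitor messages.
    run(CompactFun, TimeoutMs) ->
        {Pid, MRef} = spawn_monitor(CompactFun),
        receive
            {'DOWN', MRef, process, Pid, Reason} ->
                {done, Reason}
        after TimeoutMs ->
            exit(Pid, kill),
            %% the monitor still fires after the kill; flush it
            receive {'DOWN', MRef, process, Pid, _} -> ok end,
            timeout
        end.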


I'd prefer to either have os_mon listed in an .app file or started as an 
application when the VM boots. If we're going to talk about moving towards 
being more OTP-compliant, we should be trying to avoid adding more non-OTP 
bits when possible.

The important part is that to trigger the couch_server issues you need a lot 
of active databases as well as a lot of load, so that try_close_lru turns into 
a table scan of that ets table. Adam rewrote couch_server quite a long time 
ago so that requests for open databases turned into a single ets lookup on a 
public table, which helped quite a bit, though it introduced the possibility 
of a race condition when opening a database that's just about to be closed. 
Since then other things have been fixed and couch_server has become a 
bottleneck again. I looked at it the other day and the only thing I came up 
with would require some non-trivial changes to the close semantics of 
databases.

I think the general approach here is quite good and I'm quite fine with leaving 
room for improvement. On the flip side, we need to avoid just pushing features 
into trunk without considering how we might be asked to improve them or what 
sort of maintenance cost they'll incur.




 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and their views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:

 [compaction_daemon]
 ; The delay, in seconds, between each check for which databases and view
 ; indexes need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller than this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation, therefore it's not worth compacting them.
 min_file_size = 131072

 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and their respective view groups when all the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
 ;                      of old data (and its supporting metadata) over the
 ;                      database file size is