[jira] [Commented] (COUCHDB-3415) EUnit: should_accept_live_as_an_alias_for_continuous invalid_trailing_data

2017-06-09 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044461#comment-16044461
 ] 

Paul Joseph Davis commented on COUCHDB-3415:


Fix incoming. I saw data for this while looking for a different log. The 
timeout=1 parameter will sometimes cause a timeout to fire before we get the 
result, which puts a newline at the front of the body, so the split fails to 
find the proper last_seq data. The fix is to use the global option for 
binary:split/3 and then filter out any empty binaries (sketch below). PR 
incoming.
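
A minimal sketch of that approach (the variable names here are illustrative, 
not the test's actual code):

{code}
%% Split on every newline, then drop the empty binaries produced by the
%% timeout-injected leading (or trailing) newline.
Lines0 = binary:split(Body, <<"\n">>, [global]),
Lines = [Line || Line <- Lines0, Line =/= <<>>],
LastSeqLine = lists:last(Lines).
{code}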

> EUnit: should_accept_live_as_an_alias_for_continuous invalid_trailing_data
> --
>
> Key: COUCHDB-3415
> URL: https://issues.apache.org/jira/browse/COUCHDB-3415
> Project: CouchDB
>  Issue Type: Bug
>  Components: Test Suite
>Reporter: Joan Touzet
>
> New bug. Seen once in Travis, Erlang 17.5. Re-running caused the error to 
> disappear.
> {noformat}
> module 'chttpd_db_test'
>   chttpd db tests
> chttpd_db_test:71: should_return_ok_true_on_bulk_update...[0.073 s] ok
> chttpd_db_test:86: 
> should_accept_live_as_an_alias_for_continuous...*failed*
> in function couch_util:json_decode/1 (src/couch_util.erl, line 414)
> in call from 
> chttpd_db_test:'-should_accept_live_as_an_alias_for_continuous/1-fun-1-'/1 
> (test/chttpd_db_test.erl, line 98)
> **throw:{invalid_json,{error,{257,invalid_trailing_data}}}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (COUCHDB-3376) Fix mem3_shards under load

2017-04-25 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3376.
--
Resolution: Fixed

> Fix mem3_shards under load
> --
>
> Key: COUCHDB-3376
> URL: https://issues.apache.org/jira/browse/COUCHDB-3376
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Paul Joseph Davis
>
> There were two issues with mem3_shards that were fixed while I've been 
> testing the PSE code.
> The first issue was found by [~jaydoane]: a database can have its shards 
> inserted into the cache after it's been deleted. This can happen if a client 
> does a rapid CREATE/DELETE/GET cycle on a database. The fix is to track the 
> update sequence from the changes feed listener and only insert shard maps 
> that come from a client that has read an update_seq at least as recent as 
> mem3_shards' own.
> The second issue, found during heavy benchmarking, was that large shard 
> maps (in the Q>=128 range) can quite easily cause mem3_shards to back up 
> when there's a thundering herd attempting to open the database. There's no 
> coordination among workers trying to add a shard map to the cache, so if a 
> bunch of independent clients all send the shard map at once (say, at the 
> beginning of a benchmark) then mem3_shards can get overwhelmed. The fix for 
> this was twofold. First, rather than send the shard map directly to 
> mem3_shards, we copy it into a spawned process, and when/if mem3_shards 
> wants to write it, it tells this writer process to do its business. The 
> second optimization is to create an ets table to track these processes. 
> Independent clients can then check whether a shard map is already en route 
> to mem3_shards by using ets:insert_new and cancel their writer if that 
> returns false.
> PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (COUCHDB-3378) Fix mango full text detection

2017-04-18 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3378.
--
Resolution: Fixed

> Fix mango full text detection
> -
>
> Key: COUCHDB-3378
> URL: https://issues.apache.org/jira/browse/COUCHDB-3378
> Project: CouchDB
>  Issue Type: Bug
>  Components: Mango
>Reporter: Paul Joseph Davis
>
> The renaming of source files for mango's full text adapter was not super 
> awesome. So I fixed it to not do that. PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (COUCHDB-3379) Fix couch_auth_cache reinitialization logic

2017-04-18 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3379.
--
Resolution: Fixed

> Fix couch_auth_cache reinitialization logic
> ---
>
> Key: COUCHDB-3379
> URL: https://issues.apache.org/jira/browse/COUCHDB-3379
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> The reinitialization logic is subtle and, in hindsight, quite silly. It 
> interacted badly with the PSE work, which slightly changes the order of 
> signals (something nothing should be relying on in an async system :). This 
> change simplifies and fixes the reinitialization of couch_auth_cache.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3343) JS: show_documents failure

2017-04-18 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973323#comment-15973323
 ] 

Paul Joseph Davis commented on COUCHDB-3343:


Another instance:

https://s3.amazonaws.com/archive.travis-ci.org/jobs/223225332/log.txt

> JS: show_documents failure
> --
>
> Key: COUCHDB-3343
> URL: https://issues.apache.org/jira/browse/COUCHDB-3343
> Project: CouchDB
>  Issue Type: Test
>  Components: Test Suite
>Reporter: Joan Touzet
>
> Has occurred once so far in Jenkins CI runs.
> {noformat}
> test/javascript/tests/show_documents.js
> Error: changed ddoc
> Trace back (most recent call first):
>   52: test/javascript/test_setup.js
>   T(false,"changed ddoc")
>  296: test/javascript/tests/show_documents.js
>   ()
>   37: test/javascript/cli_runner.js
>   runTest()
>   48: test/javascript/cli_runner.js
>   
> fail
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (COUCHDB-3380) Fix mem3_sync_event_listener unit tests

2017-04-18 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3380.
--
Resolution: Fixed

> Fix mem3_sync_event_listener unit tests
> ---
>
> Key: COUCHDB-3380
> URL: https://issues.apache.org/jira/browse/COUCHDB-3380
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> The tests in mem3_sync_event_listener get skipped because of meck issues, 
> but if you run the mem3 eunit tests directly (i.e., make eunit apps=mem3) 
> you'll see this failure. The change is pretty trivial; it's just a matter 
> of this test never having run in CI.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3380) Fix mem3_sync_event_listener unit tests

2017-04-18 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3380:
--

 Summary: Fix mem3_sync_event_listener unit tests
 Key: COUCHDB-3380
 URL: https://issues.apache.org/jira/browse/COUCHDB-3380
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Paul Joseph Davis


The tests in mem3_sync_event_listener get skipped because of meck issues, but 
if you run the mem3 eunit tests directly (i.e., make eunit apps=mem3) you'll 
see this failure. The change is pretty trivial; it's just a matter of this 
test never having run in CI.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3379) Fix couch_auth_cache reinitialization logic

2017-04-18 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3379:
--

 Summary: Fix couch_auth_cache reinitialization logic
 Key: COUCHDB-3379
 URL: https://issues.apache.org/jira/browse/COUCHDB-3379
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Paul Joseph Davis


The reinitialization logic is subtle and, in hindsight, quite silly. It 
interacted badly with the PSE work, which slightly changes the order of 
signals (something nothing should be relying on in an async system :). This 
change simplifies and fixes the reinitialization of couch_auth_cache.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3378) Fix mango full text detection

2017-04-18 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973279#comment-15973279
 ] 

Paul Joseph Davis commented on COUCHDB-3378:


Whoops. Thought GH integration was still broken.

> Fix mango full text detection
> -
>
> Key: COUCHDB-3378
> URL: https://issues.apache.org/jira/browse/COUCHDB-3378
> Project: CouchDB
>  Issue Type: Bug
>  Components: Mango
>Reporter: Paul Joseph Davis
>
> The renaming of source files for mango's full text adapter was not super 
> awesome. So I fixed it to not do that. PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3378) Fix mango full text detection

2017-04-18 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973278#comment-15973278
 ] 

Paul Joseph Davis commented on COUCHDB-3378:


PR: https://github.com/apache/couchdb/pull/480

> Fix mango full text detection
> -
>
> Key: COUCHDB-3378
> URL: https://issues.apache.org/jira/browse/COUCHDB-3378
> Project: CouchDB
>  Issue Type: Bug
>  Components: Mango
>Reporter: Paul Joseph Davis
>
> The renaming of source files for mango's full text adapter was not super 
> awesome. So I fixed it to not do that. PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3378) Fix mango full text detection

2017-04-18 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3378:
--

 Summary: Fix mango full text detection
 Key: COUCHDB-3378
 URL: https://issues.apache.org/jira/browse/COUCHDB-3378
 Project: CouchDB
  Issue Type: Bug
  Components: Mango
Reporter: Paul Joseph Davis


The renaming of source files for mango's full text adapter was not super 
awesome. So I fixed it to not do that. PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3261) Test case couch_compress_tests failed

2017-04-18 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973258#comment-15973258
 ] 

Paul Joseph Davis commented on COUCHDB-3261:


I'm not a fan of special-casing architectures here to make the tests pass. 
Looking at the actual change in data, this seems to be more than just a 
change in byte ordering: snappy is actually generating different output on a 
different architecture (which is fine, as long as that output is portable 
between architectures).

There are two things here that I think we should change:

1. Comparing compression output against a known value seems rather wrong. I 
would change those tests to be more along the lines of: check that 
compression doesn't throw an error, check that the output is not identical to 
the input, and check that the output binary is smaller than an uncompressed 
"compression". We may need to add a largish string so that we're giving each 
algorithm a softball for compression, to prevent silly changes in the 
algorithm from breaking the unit test (as best exemplified by this case). See 
the sketch after this list.

2. I think we should add the new s390x output to the list of various tests so 
that we can verify that snappy is capable of reading its own output from 
various architectures.
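
A property-style sketch of what those assertions might look like (using 
couch_compress:compress/2 and decompress/1 as in the existing tests; the 
payload and test name are illustrative, and eunit.hrl is assumed included):

{code}
compress_properties_test() ->
    %% A largish, highly compressible payload: a softball for the algorithm.
    Term = {[{<<"data">>, binary:copy(<<"abcd">>, 1024)}]},
    None = couch_compress:compress(Term, none),
    Snappy = couch_compress:compress(Term, snappy),
    %% Compression didn't throw, round-trips, differs from the uncompressed
    %% encoding, and is smaller than it.
    ?assertEqual(Term, couch_compress:decompress(Snappy)),
    ?assertNotEqual(None, Snappy),
    ?assert(byte_size(Snappy) < byte_size(None)).
{code}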

> Test case couch_compress_tests failed
> -
>
> Key: COUCHDB-3261
> URL: https://issues.apache.org/jira/browse/COUCHDB-3261
> Project: CouchDB
>  Issue Type: Bug
>  Components: Test Suite
>Reporter: salamani
>  Labels: test
> Attachments: couch_compress_tests.patch
>
>
> CouchDB : 2.0.0
> I have built the CouchDB source for version 2.0.0. 
> Test case log of couch_compress_tests:
> module 'couch_compress_tests'
> couch_compress_tests:33: compress_test_...ok
> couch_compress_tests:34: compress_test_...ok
> couch_compress_tests:35: compress_test_...*failed*
> in function couch_compress_tests:'-compress_test_/0-fun-4-'/0 
> (test/couch_compress_tests.erl, line 35)
> **error:{assertEqual,[{module,couch_compress_tests},
>   {line,35},
>   {expression,"couch_compress : compress ( ? TERM , snappy )"},
>   {expected,<<1,49,64,131,104,1,108,0,0,0,5,...>>},
>   {value,<<1,49,60,131,104,1,108,0,0,0,...>>}]}
>   output:<<"">>
> couch_compress_tests:40: decompress_test_...ok
> couch_compress_tests:41: decompress_test_...ok
> couch_compress_tests:42: decompress_test_...ok
> couch_compress_tests:43: decompress_test_...ok
> couch_compress_tests:48: recompress_test_...ok
> couch_compress_tests:49: recompress_test_...*failed*
> in function couch_compress_tests:'-recompress_test_/0-fun-2-'/0 
> (test/couch_compress_tests.erl, line 49)
> **error:{assertEqual,[{module,couch_compress_tests},
>   {line,49},
>   {expression,"couch_compress : compress ( ? NONE , snappy )"},
>   {expected,<<1,49,64,131,104,1,108,0,0,0,5,...>>},
>   {value,<<1,49,60,131,104,1,108,0,0,0,...>>}]}
>   output:<<"">>
> couch_compress_tests:50: recompress_test_...ok
> couch_compress_tests:51: recompress_test_...*failed*
> in function couch_compress_tests:'-recompress_test_/0-fun-6-'/0 
> (test/couch_compress_tests.erl, line 51)
> **error:{assertEqual,[{module,couch_compress_tests},
>   {line,51},
>   {expression,"couch_compress : compress ( ? DEFLATE , snappy )"},
>   {expected,<<1,49,64,131,104,1,108,0,0,0,5,...>>},
>   {value,<<1,49,60,131,104,1,108,0,0,0,...>>}]}
>   output:<<"">>
> couch_compress_tests:52: recompress_test_...ok
> couch_compress_tests:53: recompress_test_...ok
> couch_compress_tests:58: is_compressed_test_...ok
> couch_compress_tests:59: is_compressed_test_...ok
> couch_compress_tests:60: is_compressed_test_...ok
> couch_compress_tests:61: is_compressed_test_...ok
> couch_compress_tests:62: is_compressed_test_...ok
> couch_compress_tests:63: is_compressed_test_...ok
> couch_compress_tests:64: is_compressed_test_...ok
> couch_compress_tests:65: is_compressed_test_...ok
> couch_compress_tests:66: is_compressed_test_...ok
> couch_compress_tests:67: is_compressed_test_...ok
> couch_compress_tests:68: is_compressed_test_...ok
> couch_compress_tests:70: is_compressed_test_...ok
> couch_compress_tests:72: is_compressed_test_...ok
> [done in 0.078 s]
> [done in 0.078 s]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3376) Fix mem3_shards under load

2017-04-14 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969627#comment-15969627
 ] 

Paul Joseph Davis commented on COUCHDB-3376:


PR: https://github.com/apache/couchdb/pull/476

> Fix mem3_shards under load
> --
>
> Key: COUCHDB-3376
> URL: https://issues.apache.org/jira/browse/COUCHDB-3376
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Paul Joseph Davis
>
> There were two issues with mem3_shards that were fixed while I've been 
> testing the PSE code.
> The first issue was found by [~jaydoane]: a database can have its shards 
> inserted into the cache after it's been deleted. This can happen if a client 
> does a rapid CREATE/DELETE/GET cycle on a database. The fix is to track the 
> update sequence from the changes feed listener and only insert shard maps 
> that come from a client that has read an update_seq at least as recent as 
> mem3_shards' own.
> The second issue, found during heavy benchmarking, was that large shard 
> maps (in the Q>=128 range) can quite easily cause mem3_shards to back up 
> when there's a thundering herd attempting to open the database. There's no 
> coordination among workers trying to add a shard map to the cache, so if a 
> bunch of independent clients all send the shard map at once (say, at the 
> beginning of a benchmark) then mem3_shards can get overwhelmed. The fix for 
> this was twofold. First, rather than send the shard map directly to 
> mem3_shards, we copy it into a spawned process, and when/if mem3_shards 
> wants to write it, it tells this writer process to do its business. The 
> second optimization is to create an ets table to track these processes. 
> Independent clients can then check whether a shard map is already en route 
> to mem3_shards by using ets:insert_new and cancel their writer if that 
> returns false.
> PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3376) Fix mem3_shards under load

2017-04-14 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3376:
--

 Summary: Fix mem3_shards under load
 Key: COUCHDB-3376
 URL: https://issues.apache.org/jira/browse/COUCHDB-3376
 Project: CouchDB
  Issue Type: Bug
Reporter: Paul Joseph Davis


There were two issues with mem3_shards that were fixed while I've been testing 
the PSE code.

The first issue was found by [~jaydoane]: a database can have its shards 
inserted into the cache after it's been deleted. This can happen if a client 
does a rapid CREATE/DELETE/GET cycle on a database. The fix is to track the 
update sequence from the changes feed listener and only insert shard maps that 
come from a client that has read an update_seq at least as recent as 
mem3_shards' own.

The second issue, found during heavy benchmarking, was that large shard maps 
(in the Q>=128 range) can quite easily cause mem3_shards to back up when 
there's a thundering herd attempting to open the database. There's no 
coordination among workers trying to add a shard map to the cache, so if a 
bunch of independent clients all send the shard map at once (say, at the 
beginning of a benchmark) then mem3_shards can get overwhelmed. The fix for 
this was twofold. First, rather than send the shard map directly to 
mem3_shards, we copy it into a spawned process, and when/if mem3_shards wants 
to write it, it tells this writer process to do its business. The second 
optimization is to create an ets table to track these processes. Independent 
clients can then check whether a shard map is already en route to mem3_shards 
by using ets:insert_new and cancel their writer if that returns false (see 
the sketch below).

PR incoming.
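
A rough sketch of the writer/dedup idea (the mem3_openers table and the 
function shown here are illustrative, not necessarily what the PR does):

{code}
spawn_shard_writer(DbName, Shards) ->
    %% Hold the shard map in a throwaway process instead of mem3_shards'
    %% message queue; mem3_shards decides later whether it gets written.
    Writer = spawn(fun() ->
        receive
            write -> ets:insert(mem3_shards, Shards);
            cancel -> ok
        end
    end),
    case ets:insert_new(mem3_openers, {DbName, Writer}) of
        true ->
            %% We won the race; mem3_shards may tell Writer to write later.
            {ok, Writer};
        false ->
            %% Another client already has a writer en route; cancel ours.
            %% (Real code would also monitor writers and clean up entries.)
            Writer ! cancel,
            ignore
    end.
{code}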



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3314) Add an option in doc creation APIs to specify a random value for an initial doc revision

2017-03-17 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930326#comment-15930326
 ] 

Paul Joseph Davis commented on COUCHDB-3314:


A couple of clarifications, a summary, and an opinion:

1. We only use a random revision for the *initial* revision, which does *not* 
include when a document is deleted and being recreated (as that causes wide 
revision trees, which have their own issues).
2. Random initial revisions are not a hard requirement for clustered purge; 
they're purely to avoid some confusing behavior in a specific scenario of 
create/purge cycling with the same doc content.
3. I think we're all in agreement that letting people specify their own 
revisions is still useful and theoretically already Just Works if they use 
new_edits=false.

Opinion:

I'd still like to add this pre-3.0 with a config switch that we either remove 
or swap the default on when 3.0 goes out. Given that we'll want to be playing 
with this a lot before a 3.0 (and there are loads of other backwards 
incompatible things planned for 3.0), this still seems best to me. Though we 
could also just work on specifying the revision pre-3.0 (i.e., make sure it 
works) and then make this change when we get all the things ready for 3.0. It 
also occurs to me that we should look at allowing the revision to be 
specified without requiring new_edits=false, which would only work if the 
document doesn't exist. That way we can do all of this without conditioning 
users to specify new_edits=false in situations where it could end up causing 
conflicts.

> Add an option in doc creation APIs to specify a random value for an initial 
> doc revision
> 
>
> Key: COUCHDB-3314
> URL: https://issues.apache.org/jira/browse/COUCHDB-3314
> Project: CouchDB
>  Issue Type: New Feature
>  Components: Database Core, HTTP Interface
>Reporter: Mayya Sharipova
>
> Currently the initial revision of a document is deterministic. For 
> instance, anyone who has created an empty document probably recognizes the 
> revision starting with "1-967a00dff...". In order to account for situations 
> where a document is continually purged and recreated, we're going to add 
> randomness to this initial revision by specifying a 0-$rev in the request 
> coordinator. We will then include this in the revision generation but drop 
> the 0-$rev entry from the revision's path.
> Thus, the new API will look like this:
> acurl -X PUT 
> http://adm:pass@127.0.0.1:5984/test-db/newdoc1?rev=0-adfdafa123 -d '{}'
> And similarly for _bulk_docs.
> For a user who wants to create a doc, then purge it, and then re-create it, 
> it is recommended to recreate it with another random revision.
> It is important to note that the 0-$rev only affects document creation. Once
> a document exists, updates to the document will continue to update their 
> hash in the same deterministic fashion. I.e., once a document exists, 
> identical updates will result in identical revisions.
> _
> The following changes need to be made in the code:
> 1. API changes to allow specifying a random rev in doc PUT requests and 
> _bulk_docs
> 2. Internals:
> 2.1 Use a new revision here: 
> https://github.com/apache/couchdb-couch/blob/master/src/couch_db.erl#L886
> 2.2 Don't include the provided 0-$rev entry in the revision's path (find 
> wherever new_revid is called from; could be 2-3 places)
> 2.3 Reject a 0-$rev during replication



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3314) Add an option in doc creation APIs to specify a random value for an initial doc revision

2017-03-16 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928298#comment-15928298
 ] 

Paul Joseph Davis commented on COUCHDB-3314:


To clarify the create/purge cycle behavior: my worry is that users would end 
up seeing purges that don't seem to take effect and/or creates that don't 
seem to take effect. As this would be racing with internal replication and 
read-repair, the eventual-consistency aspects of the system would, I think, 
produce "interesting" results that the random initial revision would avoid.


> Add an option in doc creation APIs to specify a random value for an initial 
> doc revision
> 
>
> Key: COUCHDB-3314
> URL: https://issues.apache.org/jira/browse/COUCHDB-3314
> Project: CouchDB
>  Issue Type: New Feature
>  Components: Database Core, HTTP Interface
>Reporter: Mayya Sharipova
>
> Currently the initial revision of a document is deterministic. For 
> instance, anyone who has created an empty document probably recognizes the 
> revision starting with "1-967a00dff...". In order to account for situations 
> where a document is continually purged and recreated, we're going to add 
> randomness to this initial revision by specifying a 0-$rev in the request 
> coordinator. We will then include this in the revision generation but drop 
> the 0-$rev entry from the revision's path.
> Thus, the new API will look like this:
> acurl -X PUT 
> http://adm:pass@127.0.0.1:5984/test-db/newdoc1?rev=0-adfdafa123 -d '{}'
> And similarly for _bulk_docs.
> For a user who wants to create a doc, then purge it, and then re-create it, 
> it is recommended to recreate it with another random revision.
> It is important to note that the 0-$rev only affects document creation. Once
> a document exists, updates to the document will continue to update their 
> hash in the same deterministic fashion. I.e., once a document exists, 
> identical updates will result in identical revisions.
> _
> The following changes need to be made in the code:
> 1. API changes to allow specifying a random rev in doc PUT requests and 
> _bulk_docs
> 2. Internals:
> 2.1 Use a new revision here: 
> https://github.com/apache/couchdb-couch/blob/master/src/couch_db.erl#L886
> 2.2 Don't include the provided 0-$rev entry in the revision's path (find 
> wherever new_revid is called from; could be 2-3 places)
> 2.3 Reject a 0-$rev during replication



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3314) Add an option in doc creation APIs to specify a random value for an initial doc revision

2017-03-16 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928293#comment-15928293
 ] 

Paul Joseph Davis commented on COUCHDB-3314:


A couple points of clarification:

The random initial revision isn't *required* for clustered purge or permanent 
deletes. It's merely to try and avoid some *possible* odd behavior if a user 
has a specific pattern around "create doc, completely purge doc": if they 
cycle quickly enough they may get into a state where things get a bit wonky 
until they create a second revision. And even that's not known for certain. 
This just makes things a lot more sane by not having the same revisions 
floating around in a given document's revision tree.

That said, the other reason this was optional was so that we could split it 
between the 2.x and 3.0 branches. One ticket was to make it possible (either 
via API or config), and the second was to swap the default on the 3.0 release.

[~rnewson] I'd say that's only part of the swap and, as [~janl] says, a rare 
part of the reasoning. It's after a doc is created in a db that deterministic 
revisions are mostly important.

[~janl] Responding point by point:

1. That's my assumption, but I haven't got any hard data either way.
2. Cool
3. This is unrelated. Purge will work the same regardless of how the doc is 
created.
4/5. The escape hatch I came up with was the 0- hack. Your suggestion to 
specify a revision with new_edits=false seems better on the face of it, 
because it even somehow matches the semantics better, I think. Creating the 
same doc in two different databases is almost like a "pre-creation 
replication" type of operation, if that makes sense.

> Add an option in doc creation APIs to specify a random value for an initial 
> doc revision
> 
>
> Key: COUCHDB-3314
> URL: https://issues.apache.org/jira/browse/COUCHDB-3314
> Project: CouchDB
>  Issue Type: New Feature
>  Components: Database Core, HTTP Interface
>Reporter: Mayya Sharipova
>
> Currently the initial revision of a document is deterministic. For 
> instance, anyone who has created an empty document probably recognizes the 
> revision starting with "1-967a00dff...". In order to account for situations 
> where a document is continually purged and recreated, we're going to add 
> randomness to this initial revision by specifying a 0-$rev in the request 
> coordinator. We will then include this in the revision generation but drop 
> the 0-$rev entry from the revision's path.
> Thus, the new API will look like this:
> acurl -X PUT 
> http://adm:pass@127.0.0.1:5984/test-db/newdoc1?rev=0-adfdafa123 -d '{}'
> And similarly for _bulk_docs.
> For a user who wants to create a doc, then purge it, and then re-create it, 
> it is recommended to recreate it with another random revision.
> It is important to note that the 0-$rev only affects document creation. Once
> a document exists, updates to the document will continue to update their 
> hash in the same deterministic fashion. I.e., once a document exists, 
> identical updates will result in identical revisions.
> _
> The following changes need to be made in the code:
> 1. API changes to allow specifying a random rev in doc PUT requests and 
> _bulk_docs
> 2. Internals:
> 2.1 Use a new revision here: 
> https://github.com/apache/couchdb-couch/blob/master/src/couch_db.erl#L886
> 2.2 Don't include the provided 0-$rev entry in the revision's path (find 
> wherever new_revid is called from; could be 2-3 places)
> 2.3 Reject a 0-$rev during replication



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (COUCHDB-3298) Improve couch_btree:chunkify logic

2017-03-01 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis resolved COUCHDB-3298.

Resolution: Fixed

Merged.

> Improve couch_btree:chunkify logic
> --
>
> Key: COUCHDB-3298
> URL: https://issues.apache.org/jira/browse/COUCHDB-3298
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> The current chunkify has problems when reduce functions create large 
> values: it will produce chunks (i.e., kp nodes) that contain a single key. 
> In some pathological cases this can create long chains of nodes that never 
> branch.
> The old chunkify would also try to create nodes with an even number of 
> bytes in each chunk. Given that we don't re-use chunks, it makes more sense 
> to pack our chunks as close to the threshold as possible so that we're 
> creating fewer branches in our tree.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3309) Remove disk_size, data_size and other.data_size attribute from db info blobs

2017-02-24 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883392#comment-15883392
 ] 

Paul Joseph Davis commented on COUCHDB-3309:


I have no idea how to set this as blocking so I just assigned a Fix Version. If 
anyone knows better feel free to correct that.

> Remove disk_size, data_size and other.data_size attribute from db info blobs
> 
>
> Key: COUCHDB-3309
> URL: https://issues.apache.org/jira/browse/COUCHDB-3309
> Project: CouchDB
>  Issue Type: Bug
>  Components: HTTP Interface
>Reporter: Paul Joseph Davis
> Fix For: 3.0.0
>
>
> Since 2.0 we've had duplicate keys in our database info blobs for size 
> fields. I was going to remove these as part of the storage engine work but 
> that'd be backwards incompatible. I'm opening this ticket and setting it as 
> blocking for 3.0 so that we remember to remove them when we can make 
> backwards incompatible changes.
> Also, to be clear, these are duplicates. The same data is available under 
> the sizes key with much less ambiguous naming (and storage engines will be 
> able to return whatever they want there).
> {code}
> {
> "compact_running": false,
> "data_size": 23403,
> "db_name": "test-db",
> "disk_format_version": 6,
> "disk_size": 513032,
> "doc_count": 10,
> "doc_del_count": 2,
> "instance_start_time": "0",
> "other": {
> "data_size": 6020
> },
> "purge_seq": 0,
> "sizes": {
> "active": 23403,
> "external": 6020,
> "file": 513032
> },
> "update_seq": 
> "82-g1DveJzLYWBgYMlgTmFQSklKzi9KdUhJMjLWy83PzyvOyMxL1UvOyS9NScwr0ctLLckBqmVKZEiyf1YiH5ouU3y6khyAZFI9No2GeDXmsQBJhgYgBdS7PytRAs1WQ8KaD0A0A22WywIAA-tQaQ"
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections

2017-02-21 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876234#comment-15876234
 ] 

Paul Joseph Davis commented on COUCHDB-3302:


You logged that nothing sends that message? There's a call to send it in two 
places:

https://github.com/apache/couchdb-fabric/blob/master/src/fabric_doc_attachments.erl#L40
https://github.com/apache/couchdb-fabric/blob/master/src/fabric_doc_attachments.erl#L71

This code is gnarly enough that it's quite possible something is broken with 
those calls, obviously, but given that rexi:reply/1 should throw a badmatch 
if the rexi_from pdict entry isn't set, I'm not sure what it'd be.

> Attachment replication over low bandwidth network connections
> -
>
> Key: COUCHDB-3302
> URL: https://issues.apache.org/jira/browse/COUCHDB-3302
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Jan Lehnardt
> Attachments: attach_large.py, replication-failure.log, 
> replication-failure-target.log
>
>
> Setup:
> Two CouchDB instances `source` (5981) and `target` (5983) with a 2MBit 
> network connection (simulated locally with traffic shaping, see way below for 
> an example).
> {noformat}
> git clone https://github.com/apache/couchdb.git
> cd couchdb
> ./configure --disable-docs --disable-fauxton
> make release
> cd ..
> cp -r couchdb/rel/couchdb source
> cp -r couchdb/rel/couchdb target
> # set up local ini: chttpd / port: 5981 / 5983
> # set up vm.args: source@hostname.local / target@hostname.local
> # no admins
> Start both CouchDB in their own terminal windows: ./bin/couchdb
> # create all required databases, and our `t` test database
> curl -X PUT http://127.0.0.1:598{1,3}/{_users,_replicator,_global_changes,t}
> # create 64MB attachments
> dd if=/dev/urandom of=att-64 bs=1024 count=65536
> # create doc on source
> curl -X PUT http://127.0.0.1:5981/t/doc1/att_64 -H 'Content-Type: 
> application/octet-stream' -d @att-64
> # replicate to target
> curl -X POST http://127.0.0.1:5981/_replicate -Hcontent-type:application/json 
> -d '{"source":"http://127.0.0.1:5981/t","target":"http://127.0.0.1:5983/t"}'
> {noformat}
> With the traffic shaping in place, the replication call doesn’t return, and 
> eventually CouchDB fails with:
> {noformat}
> [error] 2017-02-16T17:37:30.488990Z source@hostname.local emulator  
> Error in process <0.15811.0> on node 'source@hostname.local' with exit value:
> {{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,591}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,642}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}
> [error] 2017-02-16T17:37:30.490610Z source@hostname.local <0.8721.0>  
> Replicator, request PUT to "http://127.0.0.1:5983/t/doc1?new_edits=false" 
> failed due to error {error,
> {'EXIT',
> {{{nocatch,{mp_parser_died,noproc}},
>   [{couch_att,'-foldl/4-fun-0-',3,
>[{file,"src/couch_att.erl"},{line,591}]},
>{couch_att,fold_streamed_data,4,
>[{file,"src/couch_att.erl"},{line,642}]},
>{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},
>{couch_httpd_multipart,atts_to_mp,4,
>[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
>  {gen_server,call,
>  [<0.15778.0>,
>   {send_req,
>   {{url,"http://127.0.0.1:5983/t/doc1?new_edits=false;,
>"127.0.0.1",5983,undefined,undefined,
>"/t/doc1?new_edits=false",http,ipv4_address},
>[{"Accept","application/json"},
> {"Content-Length",33194202},
> {"Content-Type",
>  "multipart/related; 
> boundary=\"0dea87076009b928b191e0b456375c93\""},
> {"User-Agent","CouchDB-Replicator/2.0.0"}],
>put,
>{#Fun,
> 
> {<<"{\"_id\":\"doc1\",\"_rev\":\"1-15ae43c5b53de894b936c08db31d537c\",\"_revisions\":{\"start\":1,\"ids\":[\"15ae43c5b53de894b936c08db31d537c\"]},\"_attachments\":{\"att_64\":{\"content_type\":\"application/octet-stream\",\"revpos\":1,\"digest\":\"md5-s3AA0cYvwOzrSFTaALGh8g==\",\"length\":33193656,\"follows\":true}}}">>,
>  [{att,<<"att_64">>,<<"application/octet-stream">>,
>   33193656,33193656,
>   <<179,112,0,209,198,47,192,236,235,72,84,218,0,177,
> 161,242>>,
>   1,
>   {follows,<0.8720.0>,#Ref<0.0.1.23804>},
>   

[jira] [Commented] (COUCHDB-3300) Merge all apps that can't be used externally

2017-02-15 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868338#comment-15868338
 ] 

Paul Joseph Davis commented on COUCHDB-3300:


It has calls to couch_log and couch_stats, or else it'd be fine. It's 
definitely a gray area, but I think I'd rather keep it part of the monorepo, 
especially if we ever get around to creating separate data channels between 
nodes.

> Merge all apps that can't be used externally
> 
>
> Key: COUCHDB-3300
> URL: https://issues.apache.org/jira/browse/COUCHDB-3300
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: Paul Joseph Davis
>
> Managing a whole bunch of repos isn't fun. Most of our repos aren't really 
> useful outside of CouchDB so we're looking to merge them into the main 
> repository while still leaving our generally useful apps as standalone 
> repositories. Here's the current list of how we're categorizing repos:
> *monorepo*
> chttpd
> couch
> couch_epi
> couch_event
> couch_index
> couch_log
> couch_mrview
> couch_peruser
> couch_plugins
> couch_replicator
> couch_stats
> couch_tests
> ddoc_cache
> fabric
> global_changes
> mango
> mem3
> rexi
> *independent life cycle*
> fauxton
> docs
> setup
> *deprecated*
> oauth
> *standalone*
> config
> ets_lru
> khash
> b64url
> snappy
> ioq
> *third-party*
> jiffy
> rebar
> bear
> folsom
> meck
> mochiweb
> ibrowse



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3300) Merge all apps that can't be used externally

2017-02-15 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868252#comment-15868252
 ] 

Paul Joseph Davis commented on COUCHDB-3300:


Also, here's the script to generate the merged repository:

https://gist.github.com/davisp/99d1ac0516e0a0d02104b123e79ff6a0

With this and the patches listed above I got everything compiled and a dev 
cluster running. If someone wants to check that work, that'd be nice.

Also, I went to push a COUCHDB-3300-merge-repos branch on couchdb.git but it 
failed after writing a whole bunch of stuff, so we may have to talk to infra 
about that. I also realized while it was writing that it may generate 
thousands of notifications, since we're adding a whole bunch of commits at 
once.

{code}
#!/bin/bash -e

rm -rf couchdb
git clone https://github.com/apache/couchdb.git
cd couchdb
echo ""


add_subtree () {
name=$1
if [ -z "$2" ]; then
path=`echo $1 | sed -e 's/-/_/g'`
else
path=$2
fi

echo "Adding couchdb-$name.git as src/$path"
git subtree add -P src/$path https://github.com/apache/couchdb-$name.git master
echo ""
}


add_subtree "chttpd"
add_subtree "couch"
add_subtree "couch-epi"
add_subtree "couch-event"
add_subtree "couch-index"
add_subtree "couch-log"
add_subtree "couch-mrview"
add_subtree "peruser" "couch_peruser"
add_subtree "couch-plugins"
add_subtree "couch-replicator"
add_subtree "couch-stats"
add_subtree "erlang-tests" "couch_tests"
add_subtree "ddoc-cache"
add_subtree "fabric"
add_subtree "global-changes"
add_subtree "mango"
add_subtree "mem3"
add_subtree "rexi"
{code}

> Merge all apps that can't be used externally
> 
>
> Key: COUCHDB-3300
> URL: https://issues.apache.org/jira/browse/COUCHDB-3300
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: Paul Joseph Davis
>
> Managing a whole bunch of repos isn't fun. Most of our repos aren't really 
> useful outside of CouchDB so we're looking to merge them into the main 
> repository while still leaving our generally useful apps as standalone 
> repositories. Here's the current list of how we're categorizing repos:
> *monorepo*
> chttpd
> couch
> couch_epi
> couch_event
> couch_index
> couch_log
> couch_mrview
> couch_peruser
> couch_plugins
> couch_replicator
> couch_stats
> couch_tests
> ddoc_cache
> fabric
> global_changes
> mango
> mem3
> rexi
> *independent life cycle*
> fauxton
> docs
> setup
> *deprecated*
> oauth
> *standalone*
> config
> ets_lru
> khash
> b64url
> snappy
> ioq
> *third-party*
> jiffy
> rebar
> bear
> folsom
> meck
> mochiweb
> ibrowse



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3300) Merge all apps that can't be used externally

2017-02-15 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868249#comment-15868249
 ] 

Paul Joseph Davis commented on COUCHDB-3300:


Here's the patch I needed to make things compile:

https://gist.github.com/davisp/218c17a96886f05dc4e0e6b5fef99f4c
(pasted below as well)

Also, this patch to setup to fix a search path:

https://git-wip-us.apache.org/repos/asf?p=couchdb-setup.git;a=blobdiff;f=src/setup.erl;h=085decce63ed5243c3792692ea109036850e21a2;hp=b27c6c63dca63d1032cd937b6c85a547c65b789a;hb=bdf96f926952071c5b8b7b04d6c4de932aee6d65;hpb=e8d1e32ba3b4f5f3be0e06e5269b12d811f24d52


{code}
commit 6bfc236edea2ac9e285517056dabeaf67f7cd7f7
Author: Paul J. Davis 
Date:   Wed Feb 15 11:46:31 2017 -0600

Fix rebar configuration after repository merge

diff --git a/rebar.config.script b/rebar.config.script
index 85d5c94fc..9770a3f6c 100644
--- a/rebar.config.script
+++ b/rebar.config.script
@@ -21,42 +21,52 @@ os:putenv("COUCHDB_CONFIG", ConfigureEnv).
 
 os:putenv("COUCHDB_APPS_CONFIG_DIR", filename:join([COUCHDB_ROOT, 
"rel/apps"])).
 
+SubDirs = [
+%% must be compiled first as it has a custom behavior
+"src/couch_epi",
+"src/couch_log",
+"src/chttpd",
+"src/couch",
+"src/couch_index",
+"src/couch_mrview",
+"src/couch_replicator",
+"src/couch_plugins",
+"src/couch_event",
+"src/couch_stats",
+"src/couch_peruser",
+"src/couch_tests",
+"src/ddoc_cache",
+"src/fabric",
+"src/global_changes",
+"src/mango",
+"src/mem3",
+"src/rexi",
+"rel"
+],
+
 DepDescs = [
-%% must be compiled first as it has a custom behavior
-{couch_epi,"couch-epi",
"60e7f808513b2611eb412cf641d6e7132dda2a30"},
+%% Independent Apps
 {config,   "config",   
"f62d553b337ce975edb0fb68772d22bdd3bf6490"},
-%% keep these sorted
 {b64url,   "b64url",   
"6895652d80f95cdf04efb14625abed868998f174"},
-{couch_log,"couch-log",
"ad803f66dbd1900b67543259142875a6d03503ce"},
-{chttpd,   "chttpd",   
"cb0f20ea0898cd24ff8ac0617b326874088d9157"},
-{couch,"couch",
"66292dbdfee1a6d5981085d7e50751feacf860c8"},
-{couch_index,  "couch-index",  
"f0a6854e578469612937a766632fdcdc52ee9c65"},
-{couch_mrview, "couch-mrview", 
"e1d13a983a0ba56fcb1eb31c4e4fe56bc3692719"},
-{couch_replicator, "couch-replicator", 
"648e465f54f538a133fb31c9b1e3b487a6f2ca7c"},
-{couch_plugins,"couch-plugins",
"3e73b723cb126cfc471b560d17c24a8b5c540085"},
-{couch_event,  "couch-event",  
"7e382132219d708239306aa3591740694943d367"},
-{couch_stats,  "couch-stats",  
"7895d4d3f509ed24f09b6d1a0bd0e06af34551dc"},
-{couch_peruser,"peruser",  
"4eea9571171a5b41d832da32204a1122a01f4b0e"},
-{couch_tests,   "erlang-tests",
"37b3bfeb4b1a48a592456e67991362e155ed81e0"},
-{docs, "documentation",
"59a887a97f9b6befc6de0c5bdaf17d79fb7f915d", [raw]},
-{ddoc_cache,   "ddoc-cache",   
"c762e90a33ce3cda19ef142dd1120f1087ecd876"},
 {ets_lru,  "ets-lru",  
"c05488c8b1d7ec1c3554a828e0c9bf2888932ed6"},
-{fabric,   "fabric",   
"ec2235196d7195afab59cedc2d61a02b11596ab4"},
+{ioq,  "ioq",  
"1d2b149ee12dfeaf8d89a67b2f937207f4c5bdf2"},
+{khash,"khash",
"7c6a9cd9776b5c6f063ccafedfa984b00877b019"},
+{snappy,   "snappy",   
"a728b960611d0795025de7e9668d06b9926c479d"},
+{setup,"setup",
"e8d1e32ba3b4f5f3be0e06e5269b12d811f24d52"},
+
+%% Non-Erlang deps
+{docs, "documentation",
"59a887a97f9b6befc6de0c5bdaf17d79fb7f915d", [raw]},
 {fauxton,  "fauxton",  {tag, "v1.1.9"}, [raw]},
+
+%% Third party deps
 {folsom,   "folsom",   
"a5c95dec18227c977029fbd3b638966d98f17003"},
-{global_changes,   "global-changes",   
"f6e4c5629a7d996d284e4489f1897c057823f846"},
 {ibrowse,  "ibrowse",  
"4af2d408607874d124414ac45df1edbe3961d1cd"},
-{ioq,  "ioq",  
"1d2b149ee12dfeaf8d89a67b2f937207f4c5bdf2"},
 {jiffy,"jiffy",
"d3c00e19d8fa20c21758402231247602190988d3"},
-{khash,"khash",
"7c6a9cd9776b5c6f063ccafedfa984b00877b019"},
-{mango,"mango",
"4afd60e84d0e1c57f5d6a1e3542955faa565ca4b"},
-{mem3, "mem3", 
"c3c5429180de14a2b139f7741c934143ef73988c"},
 {mochiweb, "mochiweb", 
"bd6ae7cbb371666a1f68115056f7b30d13765782"},
-{oauth,"oauth",
"099057a98e41f3aff91e77e3cf496d6c6fd901df"},
-{rexi, "rexi", 
"a327b7dbeb2b0050f7ca9072047bf8ef2d282833"},
-{snappy,   "snappy",   
"a728b960611d0795025de7e9668d06b9926c479d"},
-{setup,"setup",

[jira] [Created] (COUCHDB-3300) Merge all apps that can't be used externally

2017-02-15 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3300:
--

 Summary: Merge all apps that can't be used externally
 Key: COUCHDB-3300
 URL: https://issues.apache.org/jira/browse/COUCHDB-3300
 Project: CouchDB
  Issue Type: Improvement
Reporter: Paul Joseph Davis


Managing a whole bunch of repos isn't fun. Most of our repos aren't really 
useful outside of CouchDB so we're looking to merge them into the main 
repository while still leaving our generally useful apps as standalone 
repositories. Here's the current list of how we're categorizing repos:

# monorepo
chttpd
couch
couch_epi
couch_event
couch_index
couch_log
couch_mrview
couch_peruser
couch_plugins
couch_replicator
couch_stats
couch_tests
ddoc_cache
fabric
global_changes
mango
mem3
rexi

# independent life cycle
fauxton
docs
setup

# deprecated
oauth

# standalone
config
ets_lru
khash
b64url
snappy
ioq

# third-party
jiffy
rebar
bear
folsom
meck
mochiweb
ibrowse



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3298) Improve couch_btree:chunkify logic

2017-02-11 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3298:
--

 Summary: Improve couch_btree:chunkify logic
 Key: COUCHDB-3298
 URL: https://issues.apache.org/jira/browse/COUCHDB-3298
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Paul Joseph Davis


The current chunkify has problems when reduce functions create large values: 
it will produce chunks (i.e., kp nodes) that contain a single key. In some 
pathological cases this can create long chains of nodes that never branch.

The old chunkify would also try to create nodes with an even number of bytes 
in each chunk. Given that we don't re-use chunks, it makes more sense to pack 
our chunks as close to the threshold as possible so that we're creating fewer 
branches in our tree. A greedy sketch of the idea follows.
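
A greedy sketch of the packing idea (the names and sizing via 
erlang:external_size/1 are illustrative, not couch_btree's actual code):

{code}
chunkify(KVs, Threshold) ->
    chunkify(KVs, Threshold, 0, [], []).

chunkify([], _Threshold, _Size, Chunk, Chunks) ->
    lists:reverse([lists:reverse(Chunk) | Chunks]);
chunkify([KV | Rest], Threshold, Size, Chunk, Chunks) ->
    KVSize = erlang:external_size(KV),
    case Size + KVSize > Threshold andalso Chunk =/= [] of
        true ->
            %% Close the current chunk once adding another KV would cross
            %% the threshold, packing each chunk as full as possible.
            chunkify(Rest, Threshold, KVSize, [KV],
                     [lists:reverse(Chunk) | Chunks]);
        false ->
            chunkify(Rest, Threshold, Size + KVSize, [KV | Chunk], Chunks)
    end.
{code}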



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3288) Remove access to the #db{} record

2017-02-01 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3288:
--

 Summary: Remove access to the #db{} record
 Key: COUCHDB-3288
 URL: https://issues.apache.org/jira/browse/COUCHDB-3288
 Project: CouchDB
  Issue Type: Improvement
Reporter: Paul Joseph Davis


To enable a mixed cluster upgrade (i.e., rolling reboot upgrade) we need to do 
some preparatory work to remove access to the #db{} record since this record is 
shared between nodes.

This work is all straightforward and just involves changing things like 
Db#db.main_pid to couch_db:get_main_pid(Db) or similar.
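
For illustration, the accessor side of such a change might look like this (a 
sketch only; couch_db's actual exports may differ):

{code}
%% In couch_db: export an accessor so callers never touch #db{} fields
%% directly, keeping the record's shape private to the local node.
-export([get_main_pid/1]).

get_main_pid(#db{main_pid = Pid}) ->
    Pid.
{code}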



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3287) Implement pluggable storage engines

2017-02-01 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3287:
--

 Summary: Implement pluggable storage engines
 Key: COUCHDB-3287
 URL: https://issues.apache.org/jira/browse/COUCHDB-3287
 Project: CouchDB
  Issue Type: Improvement
Reporter: Paul Joseph Davis


Opening branches for the pluggable storage engine work described here:

http://mail-archives.apache.org/mod_mbox/couchdb-dev/201606.mbox/%3CCAJ_m3YDjA9xym_JRVtd6Xi7LX7Ajwc6EmH_wyCRD1jgTzk8mKA%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (COUCHDB-3255) Conflicts introduced by recreating docs with attachments

2016-12-15 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3255.
--
Resolution: Fixed

> Conflicts introduced by recreating docs with attachments
> 
>
> Key: COUCHDB-3255
> URL: https://issues.apache.org/jira/browse/COUCHDB-3255
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> When a document is re-created with an attachment it receives a 
> non-deterministic revision. This is due to a fairly old commit [1] that 
> introduced the behavior by accidentally including information about on-disk 
> revisions in the revision id calculation performed by couch_db_updater when 
> it realized that the update was re-creating a previously deleted document.
> I'm opening a PR with the fix.
> [1] 
> https://github.com/apache/couchdb-couch/commit/08a94d582cd3086ebcbd51ad8ac98ca6df98a1b7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3255) Conflicts introduced by recreating docs with attachments

2016-12-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746609#comment-15746609
 ] 

Paul Joseph Davis commented on COUCHDB-3255:


PR: https://github.com/apache/couchdb-couch/pull/218

> Conflicts introduced by recreating docs with attachments
> 
>
> Key: COUCHDB-3255
> URL: https://issues.apache.org/jira/browse/COUCHDB-3255
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> When a document is re-created with an attachment it receives a 
> non-deterministic revision. This is due to a fairly old commit [1] that 
> introduced the behavior by accidentally including information about on-disk 
> revisions in the revision id calculation performed by couch_db_updater when 
> it realized that the update was re-creating a previously deleted document.
> I'm opening a PR with the fix.
> [1] 
> https://github.com/apache/couchdb-couch/commit/08a94d582cd3086ebcbd51ad8ac98ca6df98a1b7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3255) Conflicts introduced by recreating docs with attachments

2016-12-13 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3255:
--

 Summary: Conflicts introduced by recreating docs with attachments
 Key: COUCHDB-3255
 URL: https://issues.apache.org/jira/browse/COUCHDB-3255
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Paul Joseph Davis


When a document is re-created with an attachment it receives a 
non-deterministic revision. This is due to a fairly old commit [1] that 
introduced the behavior by accidentally including information about on-disk 
revisions in the revision id calculation performed by couch_db_updater when 
it realized that the update was re-creating a previously deleted document.

I'm opening a PR with the fix.

[1] 
https://github.com/apache/couchdb-couch/commit/08a94d582cd3086ebcbd51ad8ac98ca6df98a1b7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3251) Remove hot loop usage of filename:rootname/1

2016-12-07 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3251.
--

> Remove hot loop usage of filename:rootname/1
> 
>
> Key: COUCHDB-3251
> URL: https://issues.apache.org/jira/browse/COUCHDB-3251
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> We added a call to filename:rootname/1 that removes the ".couch" extension 
> when it exists. We've been doing some profiling of CouchDB 2.0 recently and 
> found this to be a fairly expensive call. It and related calls are among 
> the most expensive functions according to eprof (this is VM-wide, so not 
> just cherry-picking couch_server, where it's actually even worse).
> {code}
> lists:zip/2                      157491702  1.35   77463688  [  0.49]
> erlang:setelement/3              139509262  1.48   85212600  [  0.61]
> erlang:term_to_binary/2           14724676  1.52   87419458  [  5.94]
> erlang:phash/2                    30943420  1.54   88195214  [  2.85]
> erlang:send/3                     13487486  2.06  118261137  [  8.77]
> filename:rootname/4              514574672  2.59  148907072  [  0.29]
> ets:lookup/2                      32852756  2.66  152952875  [  4.66]
> erts_internal:port_command/3      10448091  2.95  169649699  [ 16.24]
> ioq_server:matching_request/4    906453003  3.19  183041235  [  0.20]
> ioq_server:split/4               535820540  3.31  189913578  [  0.35]
> snappy:compress/1                  7950803  3.42  196220575  [ 24.68]
> filename:do_flatten/2            516517594  4.21  241562020  [  0.47]
> gen_server:try_handle_call/4       9529789  5.66  324927694  [ 34.10]
> gen_server:loop/6                 16844687  7.41  425628355  [ 25.27]
> {code}
> There's an obvious, easy way to optimize this using binary matching, so a 
> simple PR is incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3251) Remove hot loop usage of filename:rootname/1

2016-12-07 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis resolved COUCHDB-3251.

Resolution: Fixed

Merged.

> Remove hot loop usage of filename:rootname/1
> 
>
> Key: COUCHDB-3251
> URL: https://issues.apache.org/jira/browse/COUCHDB-3251
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> We added a call to filename:rootname/1 that removes the ".couch" extension 
> when it exists. We've been doing some profiling of CouchDB 2.0 recently and 
> found this to be a fairly expensive call. It and related calls are among 
> the most expensive functions according to eprof (this is VM-wide, so not 
> just cherry-picking couch_server, where it's actually even worse).
> {code}
> lists:zip/2                      157491702  1.35   77463688  [  0.49]
> erlang:setelement/3              139509262  1.48   85212600  [  0.61]
> erlang:term_to_binary/2           14724676  1.52   87419458  [  5.94]
> erlang:phash/2                    30943420  1.54   88195214  [  2.85]
> erlang:send/3                     13487486  2.06  118261137  [  8.77]
> filename:rootname/4              514574672  2.59  148907072  [  0.29]
> ets:lookup/2                      32852756  2.66  152952875  [  4.66]
> erts_internal:port_command/3      10448091  2.95  169649699  [ 16.24]
> ioq_server:matching_request/4    906453003  3.19  183041235  [  0.20]
> ioq_server:split/4               535820540  3.31  189913578  [  0.35]
> snappy:compress/1                  7950803  3.42  196220575  [ 24.68]
> filename:do_flatten/2            516517594  4.21  241562020  [  0.47]
> gen_server:try_handle_call/4       9529789  5.66  324927694  [ 34.10]
> gen_server:loop/6                 16844687  7.41  425628355  [ 25.27]
> {code}
> There's an obvious, easy way to optimize this using binary matching, so a 
> simple PR is incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3251) Remove hot loop usage of filename:rootname/1

2016-12-06 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3251:
--

 Summary: Remove hot loop usage of filename:rootname/1
 Key: COUCHDB-3251
 URL: https://issues.apache.org/jira/browse/COUCHDB-3251
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Paul Joseph Davis


We added a call to filename:rootname/1 that removes the ".couch" extension when 
it exists. We've been doing some profiling of CouchDB 2.0 recently and found 
this to be a fairly expensive call. It and related calls are in the top few 
most expensive functions according to eprof (this is VM wide, so not just 
cherry picking couch_server where its actually even worse).

{code}
lists:zip/2                      157491702   1.35   77463688  [  0.49]
erlang:setelement/3              139509262   1.48   85212600  [  0.61]
erlang:term_to_binary/2           14724676   1.52   87419458  [  5.94]
erlang:phash/2                    30943420   1.54   88195214  [  2.85]
erlang:send/3                     13487486   2.06  118261137  [  8.77]
filename:rootname/4              514574672   2.59  148907072  [  0.29]
ets:lookup/2                      32852756   2.66  152952875  [  4.66]
erts_internal:port_command/3      10448091   2.95  169649699  [ 16.24]
ioq_server:matching_request/4    906453003   3.19  183041235  [  0.20]
ioq_server:split/4               535820540   3.31  189913578  [  0.35]
snappy:compress/1                  7950803   3.42  196220575  [ 24.68]
filename:do_flatten/2            516517594   4.21  241562020  [  0.47]
gen_server:try_handle_call/4       9529789   5.66  324927694  [ 34.10]
gen_server:loop/6                 16844687   7.41  425628355  [ 25.27]
{code}

There's an obvious easy way to optimize this by using binary matching so simple 
PR is incoming.
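
For illustration, a minimal sketch of the binary matching approach (the
function name is made up; this is a sketch under assumptions, not the actual
PR):

{code}
%% Strip a known ".couch" suffix with a direct binary match instead of
%% the generic filename:rootname/1, which flattens and walks the whole
%% path on every call.
strip_couch_ext(FileName) when is_binary(FileName), byte_size(FileName) >= 6 ->
    PrefixSize = byte_size(FileName) - 6,  %% byte_size(<<".couch">>) =:= 6
    case FileName of
        <<Root:PrefixSize/binary, ".couch">> -> Root;
        _ -> FileName
    end;
strip_couch_ext(FileName) ->
    FileName.
{code}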



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3239) incorrect ordering of results when using open_revs and latest=true

2016-12-06 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725967#comment-15725967
 ] 

Paul Joseph Davis commented on COUCHDB-3239:


Hah, well that'd do it.



> incorrect ordering of results when using open_revs and latest=true
> --
>
> Key: COUCHDB-3239
> URL: https://issues.apache.org/jira/browse/COUCHDB-3239
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Affects Versions: 2.0.0
>Reporter: Will Holley
> Attachments: docs.json
>
>
> When fetching open_revs with latest=true for a conflicted document, the order 
> of results is incorrect. For example, if I create a document with the rev 
> tree:
> {code}
>             4-d1
>            /
>        3-c1
>       /
>    2-b1
>   /
> 1-a
>   \
>    2-b2
>       \
>        3-c2
> {code}
> and ask for {{open_revs=["2-b1","2-b2"]&latest=true}}, the response will 
> return {{3-c2}} followed by {{4-d1}} - the reverse of what I'd expect.
> Below is a test/reproduction executed against Couch 1.6.1 and 2.0.
> 1.6.1:
> {code}
> $ export COUCH_HOST="http://127.0.0.1:5984;
> $ curl -XPUT "$COUCH_HOST/open_revs_test"
> {"ok":true}
> $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H 
> "Content-Type:application/json" -XPOST -d @docs.json
> []
> # GET open_revs=["2-b1","2-b2"]
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D"
> [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}]
> # GET open_revs=["2-b1","2-b2"]=true
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true"
> [{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}},{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}}]
> {code}
> 2.0:
> {code}
> $ export COUCH_HOST="http://127.0.0.1:15984;
> $ curl -XPUT "$COUCH_HOST/open_revs_test"
> {"ok":true}
> $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H 
> "Content-Type:application/json" -XPOST -d @docs.json
> []
> # GET open_revs=["2-b1","2-b2"]
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D"
> [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}]
> # GET open_revs=["2-b1","2-b2"]=true
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true"
> [{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}},{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}}]
> {code}
> Note the reversed order of the results in 2.0 when {{latest=true}} is 
> specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3239) incorrect ordering of results when using open_revs and latest=true

2016-12-06 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725798#comment-15725798
 ] 

Paul Joseph Davis commented on COUCHDB-3239:


With your example, specifying 1-a would be the easiest way. If it
doesn't return multiple revisions then it's broken.



> incorrect ordering of results when using open_revs and latest=true
> --
>
> Key: COUCHDB-3239
> URL: https://issues.apache.org/jira/browse/COUCHDB-3239
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Affects Versions: 2.0.0
>Reporter: Will Holley
> Attachments: docs.json
>
>
> When fetching open_revs with latest=true for a conflicted document, the order 
> of results is incorrect. For example, if I create a document with the rev 
> tree:
> {code}
>             4-d1
>            /
>        3-c1
>       /
>    2-b1
>   /
> 1-a
>   \
>    2-b2
>       \
>        3-c2
> {code}
> and ask for {{open_revs=["2-b1","2-b2"]&latest=true}}, the response will 
> return {{3-c2}} followed by {{4-d1}} - the reverse of what I'd expect.
> Below is a test/reproduction executed against Couch 1.6.1 and 2.0.
> 1.6.1:
> {code}
> $ export COUCH_HOST="http://127.0.0.1:5984;
> $ curl -XPUT "$COUCH_HOST/open_revs_test"
> {"ok":true}
> $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H 
> "Content-Type:application/json" -XPOST -d @docs.json
> []
> # GET open_revs=["2-b1","2-b2"]
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D"
> [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}]
> # GET open_revs=["2-b1","2-b2"]=true
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true"
> [{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}},{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}}]
> {code}
> 2.0:
> {code}
> $ export COUCH_HOST="http://127.0.0.1:15984;
> $ curl -XPUT "$COUCH_HOST/open_revs_test"
> {"ok":true}
> $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H 
> "Content-Type:application/json" -XPOST -d @docs.json
> []
> # GET open_revs=["2-b1","2-b2"]
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D"
> [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}]
> # GET open_revs=["2-b1","2-b2"]=true
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true"
> [{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}},{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}}]
> {code}
> Note the reversed order of the results in 2.0 when {{latest=true}} is 
> specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3239) incorrect ordering of results when using open_revs and latest=true

2016-12-06 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725561#comment-15725561
 ] 

Paul Joseph Davis commented on COUCHDB-3239:


[~wilhol] Yeah, the docs are pretty bad for latest=true:

"Forces retrieving latest “leaf” revision, no matter what rev was requested. 
Default is false"

Even for a single node, latest=true might return multiple revisions, and the 
docs don't say anything about ordering. A commit there would be useful. We'd 
also probably want to add a note that explains how complicated that API call 
can get. In hindsight, the open_revs and latest=true parameters should probably 
have been different API endpoints, since they fundamentally change the body 
from a single doc with optional info into a multi-doc response.

> incorrect ordering of results when using open_revs and latest=true
> --
>
> Key: COUCHDB-3239
> URL: https://issues.apache.org/jira/browse/COUCHDB-3239
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Affects Versions: 2.0.0
>Reporter: Will Holley
> Attachments: docs.json
>
>
> When fetching open_revs with latest=true for a conflicted document, the order 
> of results is incorrect. For example, if I create a document with the rev 
> tree:
> {code}
>             4-d1
>            /
>        3-c1
>       /
>    2-b1
>   /
> 1-a
>   \
>    2-b2
>       \
>        3-c2
> {code}
> and ask for {{open_revs=["2-b1","2-b2"]&latest=true}}, the response will 
> return {{3-c2}} followed by {{4-d1}} - the reverse of what I'd expect.
> Below is a test/reproduction executed against Couch 1.6.1 and 2.0.
> 1.6.1:
> {code}
> $ export COUCH_HOST="http://127.0.0.1:5984;
> $ curl -XPUT "$COUCH_HOST/open_revs_test"
> {"ok":true}
> $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H 
> "Content-Type:application/json" -XPOST -d @docs.json
> []
> # GET open_revs=["2-b1","2-b2"]
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D"
> [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}]
> # GET open_revs=["2-b1","2-b2"]=true
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true"
> [{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}},{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}}]
> {code}
> 2.0:
> {code}
> $ export COUCH_HOST="http://127.0.0.1:15984;
> $ curl -XPUT "$COUCH_HOST/open_revs_test"
> {"ok":true}
> $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H 
> "Content-Type:application/json" -XPOST -d @docs.json
> []
> # GET open_revs=["2-b1","2-b2"]
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D"
> [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}]
> # GET open_revs=["2-b1","2-b2"]=true
> $ curl -H "Accept:application/json" 
> "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true"
> [{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}},{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}}]
> {code}
> Note the reversed order of the results in 2.0 when {{latest=true}} is 
> specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3234) Track open shard timeouts with a counter instead of logging

2016-11-11 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3234.
--
Resolution: Fixed

Merged.

> Track open shard timeouts with a counter instead of logging
> ---
>
> Key: COUCHDB-3234
> URL: https://issues.apache.org/jira/browse/COUCHDB-3234
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> Fabric uses the open_shard RPC method to get security objects for every 
> request. These calls have very short timeouts on them which can cause massive 
> amounts of log spam when a node is under load. Rather than log a whole bunch 
> of garbage when each one fails, let's just use a counter instead.
> PR incoming



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3234) Track open shard timeouts with a counter instead of logging

2016-11-11 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3234:
--

 Summary: Track open shard timeouts with a counter instead of 
logging
 Key: COUCHDB-3234
 URL: https://issues.apache.org/jira/browse/COUCHDB-3234
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Paul Joseph Davis


Fabric uses the open_shard RPC method to get security objects for every 
request. These calls have very short timeouts on them which can cause massive 
amounts of log spam when a node is under load. Rather than log a whole bunch of 
garbage when each one fails, let's just use a counter instead.

PR incoming
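
As a hedged sketch of the idea (the metric name and helper function are
illustrative, not the actual patch):

{code}
%% Bump a stats counter on each open_shard timeout instead of writing
%% a log line; the running total stays visible through the stats
%% endpoints.
note_open_shard_timeout() ->
    couch_stats:increment_counter([fabric, open_shard, timeouts]).
{code}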



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3191) Improve couch_lru performance

2016-10-13 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3191:
--

 Summary: Improve couch_lru performance
 Key: COUCHDB-3191
 URL: https://issues.apache.org/jira/browse/COUCHDB-3191
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Paul Joseph Davis


This ticket is to track work on making couch_lru more performant. So far I 
have a change that replaces the gb_tree/dict pair with two khashes. This 
approach changes the cost of each LRU update from O(log N) to O(1), which 
should in theory make this faster.

This is motivated by the poor behavior of couch_server when under load from 
lots of concurrent clients and a high max_dbs_open value.
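
As a toy model of the two-hash design (maps stand in for khash here; this is
an illustrative sketch, not the couch_lru patch):

{code}
-module(lru_sketch).
-export([new/0, insert/2, evict/1]).

%% Two hashes keyed both ways plus a monotonically increasing clock.
%% Inserting or touching a key is a couple of O(1) hash writes;
%% eviction probes clock slots upward, each probe O(1).
new() ->
    #{objs => #{}, clocks => #{}, next => 0, oldest => 0}.

%% Insert or touch: drop the key's stale clock slot, restamp with N.
insert(Key, #{objs := Objs, clocks := Clocks0, next := N} = Lru) ->
    Clocks1 = case maps:find(Key, Objs) of
        {ok, OldClock} -> maps:remove(OldClock, Clocks0);
        error -> Clocks0
    end,
    Lru#{objs := Objs#{Key => N},
         clocks := Clocks1#{N => Key},
         next := N + 1}.

%% Evict the least recently used key, skipping superseded clock slots.
evict(#{objs := Objs, clocks := Clocks, oldest := O, next := N} = Lru)
        when O < N ->
    case maps:find(O, Clocks) of
        {ok, Key} ->
            {Key, Lru#{objs := maps:remove(Key, Objs),
                       clocks := maps:remove(O, Clocks),
                       oldest := O + 1}};
        error ->
            evict(Lru#{oldest := O + 1})
    end;
evict(Lru) ->
    {empty, Lru}.
{code}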



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3178) Fabric does not send message when filtering lots of documents

2016-10-04 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546713#comment-15546713
 ] 

Paul Joseph Davis commented on COUCHDB-3178:


Yeap. That fixed it. Kind of amazing how something like that can have such a 
profound impact on the system. For background, what would happen is that when 
we got a call to the clustered _changes endpoint, we'd fire off RPC workers for 
each shard and wait to hear back from them, which we never did, so we'd time 
out.

However, the RPC workers were still furiously looking for docs that passed the 
filter, which just wasted resources since their coordinator had already 
abandoned them.

So now filtered changes feeds work again when they have to filter lots of rows 
(once we merge the PR and get it into a release).

> Fabric does not send message when filtering lots of documents
> -
>
> Key: COUCHDB-3178
> URL: https://issues.apache.org/jira/browse/COUCHDB-3178
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> We managed to mess up part of the fabric merge where fabric_rpc workers that 
> are running filtered changes end up not sending a message for long periods of 
> time if no documents are passing the filter. PR Incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3178) Fabric does not send message when filtering lots of documents

2016-10-04 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546545#comment-15546545
 ] 

Paul Joseph Davis commented on COUCHDB-3178:


I should note that if you have a replication with a filter that's constantly 
timing out, this is likely the cause. Also, if that replication is a 
replicator doc, we're seeing a large amount of load on various nodes: the 
couchjs process count climbs because the same docs get filtered over and over 
as the replication manager retries the replication. So while it seems like a 
small fix, it should actually have a fairly sizable impact on cluster 
performance and resource usage. I'll update once I've learned more.

> Fabric does not send message when filtering lots of documents
> -
>
> Key: COUCHDB-3178
> URL: https://issues.apache.org/jira/browse/COUCHDB-3178
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> We managed to mess up part of the fabric merge where fabric_rpc workers that 
> are running filtered changes end up not sending a message for long periods of 
> time if no documents are passing the filter. PR Incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3178) Fabric does not send message when filtering lots of documents

2016-10-04 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3178:
--

 Summary: Fabric does not send message when filtering lots of 
documents
 Key: COUCHDB-3178
 URL: https://issues.apache.org/jira/browse/COUCHDB-3178
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Paul Joseph Davis


We managed to mess up part of the fabric merge where fabric_rpc workers that 
are running filtered changes end up not sending a message for long periods of 
time if no documents are passing the filter. PR Incoming.
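
The general shape of the fix, sketched with made-up names (this is not the
actual fabric_rpc diff, and the threshold is arbitrary):

{code}
%% When rows keep failing the filter, periodically ping the
%% coordinator anyway so it knows the worker is still alive.
-record(cacc, {filtered = 0}).

maybe_heartbeat(#cacc{filtered = N} = Acc) when N >= 1000 ->
    rexi:sync_reply({no_pass, heartbeat}),  %% keep-alive, carries no row
    Acc#cacc{filtered = 0};
maybe_heartbeat(#cacc{filtered = N} = Acc) ->
    Acc#cacc{filtered = N + 1}.
{code}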



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters

2016-10-04 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545522#comment-15545522
 ] 

Paul Joseph Davis commented on COUCHDB-3173:


Fixed. PR incoming.

> Views return corrupt data for text fields containing non-BMP characters
> ---
>
> Key: COUCHDB-3173
> URL: https://issues.apache.org/jira/browse/COUCHDB-3173
> Project: CouchDB
>  Issue Type: Bug
>  Components: JavaScript View Server
>Affects Versions: 2.0.0
>Reporter: Loke
>
> When inserting a non-BMP character (i.e. characters with a Unicode codepoint 
> above {{U+FFFF}}), the content gets corrupted after reading it from a view. 
> At every instance of such characters, there is an extra {{U+FFFD REPLACEMENT 
> CHARACTER}} inserted into the text.
> To reproduce, use the following commands.
> Create the document containing a field with the character {{U+1F604 SMILING 
> FACE WITH OPEN MOUTH AND SMILING EYES}}:
> {noformat}
> $ curl -X PUT -d '{"type":"foo","value":""}' http://localhost:5984/foo/foo2
> {"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"}
> {noformat}
> Get the document to ensure that it was saved properly:
> {noformat}
> curl -X GET http://localhost:5984/foo/foo2
> {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":""}
> {noformat}
> Create a view that will return that document:
> {noformat}
> $ curl --user user:password -X PUT -d 
> '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}'
>  http://localhost:5984/foo/_design/bugdemo
> {"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"}
> {noformat}
> Get the document from the view:
> {noformat}
> $ curl -X GET  http://localhost:5984/foo/_design/bugdemo/_view/v
> {"total_rows":1,"offset":0,"rows":[
> {"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"�"}}
> ]}
> {noformat}
> Now we can see that the field {{value}} now contains two characters. The 
> original character as well as {{U+FFFD}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters

2016-10-04 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545511#comment-15545511
 ] 

Paul Joseph Davis commented on COUCHDB-3173:


Here's a simpler reproducer:

https://gist.github.com/davisp/3cc1a0e5b0de04a3c027f694d5a4bc31

The contents of the gist are pasted below for posterity, but I dunno how well 
Jira and Chrome will store the raw byte values:

repro.js:

["reset", {"reduce_limit":"true", "timeout":5000}]
["add_fun", "function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"]
["map_doc", 
{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":""}]

run.sh:

cat repro.js | ./bin/couchjs share/server/main.js

Should have a fix in a few minutes if I'm lucky.
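
For reference, the arithmetic involved, assuming the bug is in how couchjs
recombines UTF-16 surrogate pairs (an Erlang shell sketch; U+1F604 arrives
from SpiderMonkey as the pair 16#D83D,16#DE04):

{code}
1> Hi = 16#D83D, Lo = 16#DE04.
56836
2> 16#10000 + ((Hi - 16#D800) bsl 10) + (Lo - 16#DC00).
128516
3> 128516 =:= 16#1F604.
true
{code}

Treating the two surrogates as independent characters instead of one code
point is exactly the kind of mistake that yields a stray U+FFFD in the output.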

> Views return corrupt data for text fields containing non-BMP characters
> ---
>
> Key: COUCHDB-3173
> URL: https://issues.apache.org/jira/browse/COUCHDB-3173
> Project: CouchDB
>  Issue Type: Bug
>  Components: JavaScript View Server
>Affects Versions: 2.0.0
>Reporter: Loke
>
> When inserting a non-BMP character (i.e. characters with a Unicode codepoint 
> above {{U+FFFF}}), the content gets corrupted after reading it from a view. 
> At every instance of such characters, there is an extra {{U+FFFD REPLACEMENT 
> CHARACTER}} inserted into the text.
> To reproduce, use the following commands.
> Create the document containing a field with the character {{U+1F604 SMILING 
> FACE WITH OPEN MOUTH AND SMILING EYES}}:
> {noformat}
> $ curl -X PUT -d '{"type":"foo","value":""}' http://localhost:5984/foo/foo2
> {"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"}
> {noformat}
> Get the document to ensure that it was saved properly:
> {noformat}
> curl -X GET http://localhost:5984/foo/foo2
> {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":""}
> {noformat}
> Create a view that will return that document:
> {noformat}
> $ curl --user user:password -X PUT -d 
> '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}'
>  http://localhost:5984/foo/_design/bugdemo
> {"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"}
> {noformat}
> Get the document from the view:
> {noformat}
> $ curl -X GET  http://localhost:5984/foo/_design/bugdemo/_view/v
> {"total_rows":1,"offset":0,"rows":[
> {"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"�"}}
> ]}
> {noformat}
> Now we can see that the field {{value}} now contains two characters. The 
> original character as well as {{U+FFFD}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3143) Make Mango's MR index default limit match the docs

2016-09-11 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3143:
--

 Summary: Make Mango's MR index default limit match the docs
 Key: COUCHDB-3143
 URL: https://issues.apache.org/jira/browse/COUCHDB-3143
 Project: CouchDB
  Issue Type: Bug
  Components: Mango
Reporter: Paul Joseph Davis


We document that mango indexes return 25 rows per call by default, but the 
code used a very large default limit that effectively returned unlimited rows. 
The fix is to update mango to match the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3101) Builtin reduce functions should not throw errors

2016-08-16 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422782#comment-15422782
 ] 

Paul Joseph Davis commented on COUCHDB-3101:


Adding an "error" key would be difficult, but the almost same format could be:

{"key":null, "value":{"error": "invalid input from map function 'name'"}}

Which I think would be easy enough.

> Builtin reduce functions should not throw errors
> 
>
> Key: COUCHDB-3101
> URL: https://issues.apache.org/jira/browse/COUCHDB-3101
> Project: CouchDB
>  Issue Type: Bug
>  Components: View Server Support
>Reporter: Paul Joseph Davis
>
> So I just figured out we have an issue with the builtin reduce functions. 
> Currently, if they receive invalid data they'll throw an error. Unfortunately 
> what ends up happening is that if the error is never corrected then the view 
> files end up becoming bloated and refusing to open (because they're searching 
> for a header as Jay pointed out the other week).
> We should either return null or ignore the bad data. My preference would be 
> to return null so that it indicates bad data was given somewhere but I could 
> also see just dropping the bad value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3101) Builtin reduce functions should not throw errors

2016-08-12 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418998#comment-15418998
 ] 

Paul Joseph Davis commented on COUCHDB-3101:


I should note, I prefer null because if we silently drop data the user doesn't 
realize was malformed, they'll have no signal that something is broken and may 
instead rely on probably invalid data out of the reducer. Also, note the null 
is scoped to whichever reducer is broken; it doesn't null out any other 
reducer or anything of that nature.

So basically, if a user has a _sum reduce function and emits a string as a 
value, any reduce query for that specific view would return null. Any other 
view in the same ddoc would be unaffected.

> Builtin reduce functions should not throw errors
> 
>
> Key: COUCHDB-3101
> URL: https://issues.apache.org/jira/browse/COUCHDB-3101
> Project: CouchDB
>  Issue Type: Bug
>  Components: View Server Support
>Reporter: Paul Joseph Davis
>
> So I just figured out we have an issue with the builtin reduce functions. 
> Currently, if they receive invalid data they'll throw an error. Unfortunately 
> what ends up happening is that if the error is never corrected then the view 
> files end up becoming bloated and refusing to open (because they're searching 
> for a header as Jay pointed out the other week).
> We should either return null or ignore the bad data. My preference would be 
> to return null so that it indicates bad data was given somewhere but I could 
> also see just dropping the bad value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3101) Builtin reduce functions should not throw errors

2016-08-12 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3101:
--

 Summary: Builtin reduce functions should not throw errors
 Key: COUCHDB-3101
 URL: https://issues.apache.org/jira/browse/COUCHDB-3101
 Project: CouchDB
  Issue Type: Bug
  Components: View Server Support
Reporter: Paul Joseph Davis


So I just figured out we have an issue with the builtin reduce functions. 
Currently, if they receive invalid data they'll throw an error. Unfortunately 
what ends up happening is that if the error is never corrected then the view 
files end up becoming bloated and refusing to open (because they're searching 
for a header as Jay pointed out the other week).

We should either return null or ignore the bad data. My preference would be to 
return null so that it indicates bad data was given somewhere but I could also 
see just dropping the bad value.
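
A sketch of what the return-null option could look like for _sum (simplified;
the real builtin also accepts lists of numbers):

{code}
%% Collapse to null on non-numeric input instead of throwing, so a bad
%% map function cannot wedge the view build.
builtin_sum(Vals) ->
    try
        lists:foldl(fun(V, Acc) when is_number(V) -> V + Acc end, 0, Vals)
    catch
        error:_ -> null
    end.
{code}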



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3096) Fix config listener handler accumulation

2016-08-05 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3096:
--

 Summary: Fix config listener handler accumulation
 Key: COUCHDB-3096
 URL: https://issues.apache.org/jira/browse/COUCHDB-3096
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Paul Joseph Davis


We found an issue in production with config listeners piling up in the 
config_event gen_event server. This was due to how we fixed the API 
inconsistencies: we had re-parented the handler supervision to the config 
gen_server instead of the process that wanted config notifications, which 
means that since config never dies, the handlers are never removed. The 
proposed patch takes the config gen_server out of the handler supervision path 
and uses a dedicated gen_server per event handler that handles the 
gen_event_EXIT messages. PR incoming after I have a ticket number.
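
A minimal sketch of the per-listener gen_server pattern described above
(module and names are illustrative, not the actual patch):

{code}
-module(config_listener_mon_sketch).
-behaviour(gen_server).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

init({Module, Owner}) ->
    %% add_sup_handler ties the handler's lifetime to this process and
    %% delivers a gen_event_EXIT here if the handler dies or is removed.
    ok = gen_event:add_sup_handler(config_event, {Module, Owner}, nil),
    {ok, Owner}.

handle_call(_Msg, _From, St) -> {reply, ignored, St}.

handle_cast(_Msg, St) -> {noreply, St}.

handle_info({gen_event_EXIT, _Handler, Reason}, St) ->
    %% The handler went away, so stop with the same reason rather than
    %% leaving a dead entry behind in config_event.
    {stop, Reason, St}.
{code}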



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3067) Improve couch_log implementation

2016-08-05 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-3067.
--
   Resolution: Fixed
 Assignee: Paul Joseph Davis
Fix Version/s: 2.0.0

Done and done. This has been merged.

> Improve couch_log implementation
> 
>
> Key: COUCHDB-3067
> URL: https://issues.apache.org/jira/browse/COUCHDB-3067
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Paul Joseph Davis
>Assignee: Paul Joseph Davis
> Fix For: 2.0.0
>
>
> The current couch_log implementation splits its configuration between 
> CouchDB's config app and lager's use of the standard sys.config system. 
> Generally speaking we don't use the fancy features of lager so there's not 
> much reason to keep it around. This ticket is to remove lager and its 
> dependencies and fix up the shortcomings of the existing couch_log app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3092) couch_log_writer_file_test failure on Windows

2016-08-05 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409532#comment-15409532
 ] 

Paul Joseph Davis commented on COUCHDB-3092:


[~wohali] Should be fixed on master now.

> couch_log_writer_file_test failure on Windows
> -
>
> Key: COUCHDB-3092
> URL: https://issues.apache.org/jira/browse/COUCHDB-3092
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Joan Touzet
>Priority: Critical
> Fix For: 2.0.0
>
>
> {noformat}
>   couch_log_writer_file_test: couch_log_writer_file_test_...*failed*
> in function couch_log_writer_file_test:'-check_reopen/0-fun-1-'/2 
> (test/couch_log_writer_file_test.erl, line 147)
> in call from couch_log_writer_file_test:check_reopen/0 
> (test/couch_log_writer_file_test.erl, line 147)
> **error:{assertion_failed,[{module,couch_log_writer_file_test},
>{line,147},
>{expression,"element ( 3 , St3 ) /= element ( 3 , St2 )"},
>{expected,true},
>{value,false}]}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3092) couch_log_writer_file_test failure on Windows

2016-08-05 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409520#comment-15409520
 ] 

Paul Joseph Davis commented on COUCHDB-3092:


Derp. This is simple. The test just checks that the file will be reopened at 
the same path if it's been deleted (or moved for log rotation), so I'll just 
disable it on Windows.

> couch_log_writer_file_test failure on Windows
> -
>
> Key: COUCHDB-3092
> URL: https://issues.apache.org/jira/browse/COUCHDB-3092
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Joan Touzet
>Priority: Critical
> Fix For: 2.0.0
>
>
> {noformat}
>   couch_log_writer_file_test: couch_log_writer_file_test_...*failed*
> in function couch_log_writer_file_test:'-check_reopen/0-fun-1-'/2 
> (test/couch_log_writer_file_test.erl, line 147)
> in call from couch_log_writer_file_test:check_reopen/0 
> (test/couch_log_writer_file_test.erl, line 147)
> **error:{assertion_failed,[{module,couch_log_writer_file_test},
>{line,147},
>{expression,"element ( 3 , St3 ) /= element ( 3 , St2 )"},
>{expected,true},
>{value,false}]}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3067) Improve couch_log implementation

2016-07-19 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3067:
--

 Summary: Improve couch_log implementation
 Key: COUCHDB-3067
 URL: https://issues.apache.org/jira/browse/COUCHDB-3067
 Project: CouchDB
  Issue Type: Bug
  Components: Logging
Reporter: Paul Joseph Davis


The current couch_log implementation splits its configuration between CouchDB's 
config app and lager's use of the standard sys.config system. Generally 
speaking we don't use the fancy features of lager so there's not much reason to 
keep it around. This ticket is to remove lager and its dependencies and fix up 
the shortcomings of the existing couch_log app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes

2016-07-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375998#comment-15375998
 ] 

Paul Joseph Davis commented on COUCHDB-2791:


Initial implementation seems to be working well enough. I've tested db info, 
single doc ops, _all_docs, _changes, and basic views. I've disabled any sort of 
write operation though as that would likely get a cluster into a very bad state 
if a user wasn't being careful. We can investigate adding write ops when we 
look at adding safety precautions in the storage engine.

All in all this change is rather small and I'm actually fairly happy with how 
it's turned out. It's only superficially tested at this point so there will 
need to be more done there before we call it good. I only read enough between 
chttpd/fabric/couch_db to hopefully get all return values consistent. However 
it's possible I missed some differences here and there.

Branches are up:

https://github.com/apache/couchdb-couch/compare/master...cloudant:2791-allow-shard-access-through-cluster-port
https://github.com/apache/couchdb-fabric/compare/master...cloudant:2791-allow-shard-access-through-cluster-port
https://github.com/apache/couchdb-chttpd/compare/master...cloudant:2791-allow-shard-access-through-cluster-port

Let me know what y'all think.

> Allow for direct parallel access to shards via _changes
> ---
>
> Key: COUCHDB-2791
> URL: https://issues.apache.org/jira/browse/COUCHDB-2791
> Project: CouchDB
>  Issue Type: New Feature
>  Components: Database Core
>Reporter: Tony Sun
>Assignee: Tony Sun
>
> For performance gains, we introduce a new _changes feed option parallel that 
> returns a list of urls that the user can use to directly access individual 
> shards. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes

2016-07-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375345#comment-15375345
 ] 

Paul Joseph Davis commented on COUCHDB-2791:


I contemplated allowing writes and rejecting things but yeah, the level at 
which we'd enforce that becomes the issue. I could easily add it in new HTTP 
handlers I'll be adding but as you note it doesn't help anywhere else. But 
adding this low enough means that we're tagging individual shards with an idea 
of their shard range which would be the first time an individual shard has ever 
known it was part of a cluster database. Which isn't a big deal, its just that 
there's no current plumbing for that. I'm gonna start work on the read side and 
will contemplate writes as I go, but I could see it happening after pluggable 
storage engines land when we're starting to actually muck with the core storage 
bits again.

> Allow for direct parallel access to shards via _changes
> ---
>
> Key: COUCHDB-2791
> URL: https://issues.apache.org/jira/browse/COUCHDB-2791
> Project: CouchDB
>  Issue Type: New Feature
>  Components: Database Core
>Reporter: Tony Sun
>Assignee: Tony Sun
>
> For performance gains, we introduce a new _changes feed option parallel that 
> returns a list of urls that the user can use to directly access individual 
> shards. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes

2016-07-12 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373851#comment-15373851
 ] 

Paul Joseph Davis commented on COUCHDB-2791:


I should note that we'll likely end up adding a few special API endpoints to 
help clients with changes feeds. There's some non-trivial shard replacement 
logic that's not available to people outside the database (plus it'd be good to 
have that logic in one place). Though I may be able to piggy back this onto the 
existing _shards endpoint by just expanding some of its capabilities.

> Allow for direct parallel access to shards via _changes
> ---
>
> Key: COUCHDB-2791
> URL: https://issues.apache.org/jira/browse/COUCHDB-2791
> Project: CouchDB
>  Issue Type: New Feature
>  Components: Database Core
>Reporter: Tony Sun
>Assignee: Tony Sun
>
> For performance gains, we introduce a new _changes feed option parallel that 
> returns a list of urls that the user can use to directly access individual 
> shards. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes

2016-07-12 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373845#comment-15373845
 ] 

Paul Joseph Davis commented on COUCHDB-2791:


I'm taking another look at this and contemplating a few different approaches. 
Originally when we were kicking this around, the idea was purely motivated 
by trying to make the changes feed faster by allowing clients to stream from 
individual shards. However [~kxepal] makes a good point that it probably makes 
better sense to turn this into a more generic feature allowing access to 
individual shards.

One thing I wanted to put out explicitly is that the idea for this is that it 
would be available to users over the clustered 5984 port if they want to do 
fancy advanced stuff client side. I.e., this isn't something for the 5986 port 
(and will try and avoid using 5986 things since we're looking to get rid of 
that anyway).

Also, as I think about this, it seems like it'd be bad to allow write/modification 
APIs across this new shard specific interface as that seems like it'd be a 
super easy way to mess up a clustered database by getting docs in the wrong 
shard and/or getting shards desynchronized with other settings. So for the time 
being at least I'm going to limit this to read-only APIs which will basically 
be fetching shard db info, individual docs, all docs, views, and changes off 
the top of my head. Beyond that I think I can make this happen as a change to 
chttpd plus some additional support code to fabric for the new local 
operations. The end result API I'm looking at will be something like this:

http://hostname:5984/dbname/_shard/-/$rest

Where $rest is any supported API call that will match the same operations in 
the cluster case.

To implement this I'm planning on adding a new field to the #httpd record that 
selects the fabric module to use. By default this will be set to fabric which 
is the current default. I'll then add a fabric_local (or something, if anyone 
wants to suggest a better name) that will support just the set of things we 
want to export over this interface. This will then be fairly similar to 
fabric_rpc internally but without going through RPC/rexi calls and the like. 
Once that's done then we should hopefully be good to go for making everything 
work all magically.
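
As a rough sketch of that dispatch shape (the record field, its default, and
the handler name are hypothetical):

{code}
%% Let the #httpd record carry which fabric implementation to call, so
%% the same handler code serves clustered and shard-scoped requests.
-record(httpd, {mochi_req, method, path_parts, fabric = fabric}).

db_info_req(#httpd{fabric = FabricMod} = Req, DbName) ->
    {ok, Info} = FabricMod:get_db_info(DbName),
    chttpd:send_json(Req, 200, {Info}).
{code}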

Does that seem sane to everyone?

> Allow for direct parallel access to shards via _changes
> ---
>
> Key: COUCHDB-2791
> URL: https://issues.apache.org/jira/browse/COUCHDB-2791
> Project: CouchDB
>  Issue Type: New Feature
>  Components: Database Core
>Reporter: Tony Sun
>Assignee: Tony Sun
>
> For performance gains, we introduce a new _changes feed option parallel that 
> returns a list of urls that the user can use to directly access individual 
> shards. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3036) Bug in fabric_db_update_listener breaks continuous changes feeds when a node is down

2016-06-10 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis resolved COUCHDB-3036.

Resolution: Fixed

Merged.

> Bug in fabric_db_update_listener breaks continuous changes feeds when a node 
> is down
> 
>
> Key: COUCHDB-3036
> URL: https://issues.apache.org/jira/browse/COUCHDB-3036
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> A recent fix [1] to fabric_db_update_listener uncovered the fact that we were 
> never starting rexi monitors to know if a node went down during a changes 
> feed. Fixing that bug led us to realize that we don't handle rexi_DOWN 
> messages correctly in fabric_db_updater.
> Patch is incoming.
> [1] 
> https://github.com/apache/couchdb-fabric/commit/b592c390b99a198d6a051c6ed7b0280800cc2939



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3036) Bug in fabric_db_update_listener breaks continuous changes feeds when a node is down

2016-06-10 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324958#comment-15324958
 ] 

Paul Joseph Davis commented on COUCHDB-3036:


PR here: https://github.com/apache/couchdb-fabric/pull/56

> Bug in fabric_db_update_listener breaks continuous changes feeds when a node 
> is down
> 
>
> Key: COUCHDB-3036
> URL: https://issues.apache.org/jira/browse/COUCHDB-3036
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Paul Joseph Davis
>
> A recent fix [1] to fabric_db_update_listener uncovered the fact that we were 
> never starting rexi monitors to know if a node went down during a changes 
> feed. Fixing that bug led us to realize that we don't handle rexi_DOWN 
> messages correctly in fabric_db_updater.
> Patch is incoming.
> [1] 
> https://github.com/apache/couchdb-fabric/commit/b592c390b99a198d6a051c6ed7b0280800cc2939



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3036) Bug in fabric_db_update_listener breaks continuous changes feeds when a node is down

2016-06-10 Thread Paul Joseph Davis (JIRA)
Paul Joseph Davis created COUCHDB-3036:
--

 Summary: Bug in fabric_db_update_listener breaks continuous 
changes feeds when a node is down
 Key: COUCHDB-3036
 URL: https://issues.apache.org/jira/browse/COUCHDB-3036
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Paul Joseph Davis


A recent fix [1] to fabric_db_update_listener uncovered the fact that we were 
never starting rexi monitors to know if a node went down during a changes feed. 
Fixing that bug led us to realize that we don't handle rexi_DOWN messages 
correctly in fabric_db_updater.

Patch is incoming.


[1] 
https://github.com/apache/couchdb-fabric/commit/b592c390b99a198d6a051c6ed7b0280800cc2939



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3026) fabric:open_revs doesn't filter out not_found replies anymore

2016-05-31 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308365#comment-15308365
 ] 

Paul Joseph Davis commented on COUCHDB-3026:


Yeap, it's a bug. In my original comment on Alexander's PR [1] I called out 
that we were using remove_ancestors wrong to handle this, and then forgot to 
make sure a function was added to handle the removal.

Adding a simple filter is the right fix.

[1] https://github.com/apache/couchdb-fabric/pull/35#issuecomment-152303652
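
The shape of that filter, as a sketch (not the merged diff):

{code}
%% Drop {{not_found, missing}, Rev} tuples from the open_revs replies,
%% but keep them if every requested rev was missing so the caller
%% still gets an answer.
filter_not_found(Replies) ->
    IsFound = fun({{not_found, missing}, _Rev}) -> false;
                 (_Other) -> true
              end,
    case lists:filter(IsFound, Replies) of
        [] -> Replies;
        Found -> Found
    end.
{code}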

> fabric:open_revs doesn't filter out not_found replies anymore
> -
>
> Key: COUCHDB-3026
> URL: https://issues.apache.org/jira/browse/COUCHDB-3026
> Project: CouchDB
>  Issue Type: Bug
>Reporter: ILYA
>
> Previously we filtered out `{{not_found,missing}, …}` replies in this line.
> We don’t filter them out anymore. Therefore `fabric:open_revs` returns more 
> than one reply. In some places we assume that the return from open_revs is 
> always a list with one element in it. As a result we get a badmatch there.
> Here is the list of places where we assume single reply:
> - https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L699
> - 
> https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L1040:L1044
> - 
> https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L1209:L1210
> - 
> https://github.com/apache/couchdb-ddoc-cache/blob/master/src/ddoc_cache_opener.erl#L123
> - 
> https://github.com/apache/couchdb-fabric/blob/master/src/fabric_view.erl#L180:L183
> All above places are broken if we don't filter not_found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-2863) function_clause on requesting multiple open_revs with lastest=true

2016-05-26 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis resolved COUCHDB-2863.

   Resolution: Fixed
Fix Version/s: 2.0.0

The fix for this has been merged.

> function_clause on requesting multiple open_revs with lastest=true
> --
>
> Key: COUCHDB-2863
> URL: https://issues.apache.org/jira/browse/COUCHDB-2863
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Alexander Shorin
>Assignee: Alexander Shorin
>Priority: Blocker
>  Labels: has-pr
> Fix For: 2.0.0
>
>
> During work on the COUCHDB-2857 found another issue for us:
> {code}
> $ echo '{}' | http put http://localhost:15984/db/doc
> {
> "id": "doc",
> "ok": true,
> "rev": "1-967a00dff5e02add41819138abb3284d"
> }
> $ echo '{"_rev": "1-967a00dff5e02add41819138abb3284d"}' | http put 
> http://localhost:15984/db/doc
> {
> "id": "doc",
> "ok": true,
> "rev": "2-7051cbe5c8faecd085a3fa619e6e6337"
> }
> $ http 
> 'http://localhost:15984/db/doc?open_revs=["1-967a00dff5e02add41819138abb3284d",
>  "2-7051cbe5c8faecd085a3fa619e6e6337"]&latest=true'
> {"error":"unknown_error","reason":"function_clause","ref":162084788}
> $ cat dev/logs/node1.log
> 2015-10-28 02:38:26.707 [error] node1@127.0.0.1 <0.1222.0> req_err(162084788) 
> unknown_error : function_clause
> [<<"lists:zipwith/3 L450">>,<<"lists:zipwith/3 
> L450">>,<<"fabric_doc_open_revs:handle_message/3 
> L104">>,<<"rexi_utils:process_mailbox/6 L55">>,<<"rexi_utils:recv/6 
> L49">>,<<"fabric_doc_open_revs:go/4 L47">>,<<"chttpd_db:db_doc_req/3 
> L660">>,<<"chttpd:handle_request_int/1 L238">>]
> 2015-10-28 02:38:26.707 [error] node1@127.0.0.1 <0.1222.0> httpd 500 error 
> response:
>  {"error":"unknown_error","reason":"function_clause","ref":162084788}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2784) Re-optimize skip query-string parameter in clusters

2015-08-25 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712337#comment-14712337
 ] 

Paul Joseph Davis commented on COUCHDB-2784:


It's a more stringent requirement in that you have to be able to re-read from 
the same snapshot or reverse iteration direction in the current snapshot 
multiple times.

The reason is that when you send the new Foo start key, all but one of the RPC 
workers will most likely have to back up to Foo.

 Re-optimize skip query-string parameter in clusters
 ---

 Key: COUCHDB-2784
 URL: https://issues.apache.org/jira/browse/COUCHDB-2784
 Project: CouchDB
  Issue Type: Improvement
  Security Level: public(Regular issues) 
  Components: Database Core
Reporter: Adam Kocoloski

 In COUCHDB-977 we implemented a more efficient version of the skip function 
 that relies on the document counts we maintain in the inner nodes of 
 couch_btree. The 2.0 codebase did not initially take advantage of this 
 enhancement, because when a user specifies `skip=X` we don't know a priori 
 how many rows will be skipped from each shard.
 The current implementation tells each shard to not skip any rows and then has 
 the coordinator discard the first N rows after doing the mergesort. It's O(N) 
 complexity just like the bad old days before COUCHDB-977 and is actually 
 substantially more expensive because of all the message traffic.
 Good news is we can do better. For a database with Q shards and a request 
 specifying ?skip=X We know that either a) at least one of the shards will end 
 skipping at least `X / Q` rows, or b) the entire response body will be empty. 
 So, I propose the following:
 # Set the per-shard skip value to `X div Q`
 #* If a shard has fewer than `X div Q` rows remaining it should send its last 
 row
 #* If `X div Q` is zero we can short-circuit and just use the current 
 algorithm.
 # The coordinator sorts the first responses from each shard. It then sends 
 the key of the row that sorts first (let's call it Foo) back to all the shards
 # Each shard counts the number of rows in between the original startkey and 
 Foo and sends that number, then starts streaming with Foo as the new startkey
 # The coordinator deducts the computed per-shard skip values from the 
 user-specified skip and then takes care of the remainder in the usual way we 
 do it today (i.e. by consuming the rows as they come in).
 What do you think? Did I overlook anything here?
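
To make step 1 concrete with illustrative numbers (an Erlang shell sketch):

{code}
%% With Q = 8 shards and ?skip=10000, every shard gets a head start:
1> 10000 div 8.
1250
%% At least one shard really does have >= 1250 rows to skip (or the
%% response is empty), so the coordinator no longer has to pull and
%% discard all 10000 rows itself.
{code}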



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2784) Re-optimize skip query-string parameter in clusters

2015-08-24 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710135#comment-14710135
 ] 

Paul Joseph Davis commented on COUCHDB-2784:


Thinking about this for awhile I think it'd work. And in the worst case it'd be 
a logarithmic number of iterations to find the start key for each shard. I got 
successfully nerd sniped trying to chase down a closed form solution to get the 
number of iterations as a function of Q. Thanks for that.

Anyway, the biggest thing I see is that this requires a snapshot with rewind 
capabilities or re-reading from the same snapshot. We'll need to be careful in 
how we guarantee that. Currently as long as we hold a #db{} record without 
reopening it we'll be fine. But if we get fancier in the future this will 
require more thought if the storage engine could change underneath our feet 
while performing this calculation.

An alternative approach that occurs to me that seems a bit easier to digest 
while placing much stricter restrictions on our btree would be to do a merge 
sort of the btree traversal if that makes any sense. Basically, we could insert 
a clustered coordination into the traverse/skip decisions in couch_btree. Of 
course that means that all storage would always have to be a btree written in 
Erlang to a fairly specific API.

 Re-optimize skip query-string parameter in clusters
 ---

 Key: COUCHDB-2784
 URL: https://issues.apache.org/jira/browse/COUCHDB-2784
 Project: CouchDB
  Issue Type: Improvement
  Security Level: public(Regular issues) 
  Components: Database Core
Reporter: Adam Kocoloski

 In COUCHDB-977 we implemented a more efficient version of the skip function 
 that relies on the document counts we maintain in the inner nodes of 
 couch_btree. The 2.0 codebase did not initially take advantage of this 
 enhancement, because when a user specifies `skip=X` we don't know a priori 
 how many rows will be skipped from each shard.
 The current implementation tells each shard to not skip any rows and then has 
 the coordinator discard the first N rows after doing the mergesort. It's O(N) 
 complexity just like the bad old days before COUCHDB-977 and is actually 
 substantially more expensive because of all the message traffic.
 Good news is we can do better. For a database with Q shards and a request 
 specifying ?skip=X We know that either a) at least one of the shards will end 
 skipping at least `X / Q` rows, or b) the entire response body will be empty. 
 So, I propose the following:
 # Set the per-shard skip value to `X div Q`
 #* If a shard has fewer than `X div Q` rows remaining it should send its last 
 row
 #* If `X div Q` is zero we can short-circuit and just use the current 
 algorithm.
 # The coordinator sorts the first responses from each shard. It then sends 
 the key of the row that sorts first (let's call it Foo) back to all the shards
 # Each shard counts the number of rows in between the original startkey and 
 Foo and sends that number, then starts streaming with Foo as the new startkey
 # The coordinator deducts the computed per-shard skip values from the 
 user-specified skip and then takes care of the remainder in the usual way we 
 do it today (i.e. by consuming the rows as they come in).
 What do you think? Did I overlook anything here?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2732) Use thread local storage for couch_ejson_compare NIF

2015-07-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624988#comment-14624988
 ] 

Paul Joseph Davis commented on COUCHDB-2732:


When we saw this in testing my recollection was that we'd probably missed it 
due to the concurrency issue.

For reference, the tests that we use to illustrate the performance difference 
is to set up a view on a clustered database with q=128 and then ask for a set 
of 10 rows from that view with a large number of clients bypassing HAProxy. The 
end result is that we end up having to call couch_ejson_compare when streaming 
the view response an extremely large number of times in lots of different 
request handling processes. This was enough to demonstrate that the mutex 
locking was a global bottleneck. On single node couch the number of collations 
is significantly smaller because it doesn't have to merge the responses from 
all 128 shards before returning them to the user.


 Use thread local storage for couch_ejson_compare NIF
 

 Key: COUCHDB-2732
 URL: https://issues.apache.org/jira/browse/COUCHDB-2732
 Project: CouchDB
  Issue Type: Improvement
  Security Level: public(Regular issues) 
Reporter: Adam Kocoloski

 Some folks inside IBM have demonstrated conclusively that the NIF we use for 
 JSON sorting is a significant bottleneck with more than a few concurrent 
 users hitting us. The VM ends up spending all of its time dealing with lock 
 contention. We'd be better off sticking with the pure Erlang code, but we 
 have an even better alternative, which is to use thread local storage to pin 
 an allocator to each OS thread and eliminate the locks.
 Patch forthcoming, but I wanted to make sure this got in the tracker. The 
 improvement looks really significant. Interestingly, there was some discussion 
 about a performance regression after this was introduced back in COUCHDB-1186 
 ... maybe the missing element in that discussion was the client concurrency?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)