[jira] [Commented] (COUCHDB-3415) EUnit: should_accept_live_as_an_alias_for_continuous invalid_trailing_data
[ https://issues.apache.org/jira/browse/COUCHDB-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044461#comment-16044461 ] Paul Joseph Davis commented on COUCHDB-3415: Fix incoming. Saw data for this when looking for a different log. The timeout=1 parameter will sometimes cause a timeout to fire before we get the result, which ends up putting a newline at the front of the body, so the split fails to find the proper last_seq data. The fix is to use the global option for binary:split/3 and then filter out any empty binaries. PR incoming. > EUnit: should_accept_live_as_an_alias_for_continuous invalid_trailing_data > -- > > Key: COUCHDB-3415 > URL: https://issues.apache.org/jira/browse/COUCHDB-3415 > Project: CouchDB > Issue Type: Bug > Components: Test Suite > Reporter: Joan Touzet > > New bug. Seen once in Travis, Erlang 17.5. Re-running caused the error to > disappear. > {noformat} > module 'chttpd_db_test' > chttpd db tests > chttpd_db_test:71: should_return_ok_true_on_bulk_update...[0.073 s] ok > chttpd_db_test:86: > should_accept_live_as_an_alias_for_continuous...*failed* > in function couch_util:json_decode/1 (src/couch_util.erl, line 414) > in call from > chttpd_db_test:'-should_accept_live_as_an_alias_for_continuous/1-fun-1-'/1 > (test/chttpd_db_test.erl, line 98) > **throw:{invalid_json,{error,{257,invalid_trailing_data}}} > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
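The parsing fix described above can be sketched as follows. This is an illustrative Erlang fragment, not the actual patch; the variable names and the surrounding decode call are assumptions.

```erlang
%% With timeout=1, a heartbeat newline may be prepended to the body,
%% e.g. <<"\n{...}\n{\"last_seq\":...}\n">>. A single split can then
%% land on an empty binary instead of the last_seq line, so split on
%% every newline ([global]) and filter out the empty chunks.
Lines = binary:split(Body, <<"\n">>, [global]),
NonEmpty = [Line || Line <- Lines, Line =/= <<>>],
LastSeqLine = lists:last(NonEmpty),
LastSeq = couch_util:json_decode(LastSeqLine).
```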
[jira] [Closed] (COUCHDB-3376) Fix mem3_shards under load
[ https://issues.apache.org/jira/browse/COUCHDB-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3376. -- Resolution: Fixed > Fix mem3_shards under load > -- > > Key: COUCHDB-3376 > URL: https://issues.apache.org/jira/browse/COUCHDB-3376 > Project: CouchDB > Issue Type: Bug > Reporter: Paul Joseph Davis > > There were two issues with mem3_shards that were fixed while I've been > testing the PSE code. > The first issue was found by [~jaydoane] where a database can have its shards > inserted into the cache after it's been deleted. This can happen if a client > does a rapid CREATE/DELETE/GET cycle on a database. The fix for this is to > track the changes feed update sequence from the changes feed listener and > only insert shard maps that come from a client that has read an update_seq at > least as recent as mem3_shards'. > The second issue, found during heavy benchmarking, was that large shard maps > (in the Q>=128 range) can quite easily cause mem3_shards to back up when > there's a thundering herd attempting to open the database. There's no > coordination among workers trying to add a shard map to the cache, so if a > bunch of independent clients all send the shard map at once (say, at the > beginning of a benchmark) then mem3_shards can get overwhelmed. The fix for > this was twofold. First, rather than send the shard map directly to > mem3_shards, we copy it into a spawned process and, when/if mem3_shards wants > to write it, it tells this writer process to do its business. The second > optimization for this change is to create an ets table to track these > processes. Then independent clients can check if a shard map is already > en route to mem3_shards by using ets:insert_new and canceling their writer if > that returns false. > PR incoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
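The writer-process coordination described above lends itself to a short sketch. This is not the actual mem3 code; the table name (mem3_openers), the cast message, and the function name are all assumptions made for illustration.

```erlang
%% Copy the shard map into a spawned writer, then use ets:insert_new/2
%% as an atomic "first client wins" check. Clients that lose the race
%% cancel their writer, so mem3_shards sees at most one pending write
%% per database instead of one per client in the thundering herd.
maybe_spawn_writer(DbName, ShardMap) ->
    Writer = spawn(fun() ->
        receive
            write  -> gen_server:cast(mem3_shards, {cache_insert, DbName, ShardMap});
            cancel -> ok
        end
    end),
    case ets:insert_new(mem3_openers, {DbName, Writer}) of
        true  -> {ok, Writer};               % first client: writer registered
        false -> Writer ! cancel, pending    % someone else got there first
    end.
```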
[jira] [Closed] (COUCHDB-3378) Fix mango full text detection
[ https://issues.apache.org/jira/browse/COUCHDB-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3378. -- Resolution: Fixed > Fix mango full text detection > - > > Key: COUCHDB-3378 > URL: https://issues.apache.org/jira/browse/COUCHDB-3378 > Project: CouchDB > Issue Type: Bug > Components: Mango >Reporter: Paul Joseph Davis > > The renaming of source files for mango's full text adapter was not super > awesome. So I fixed it to not do that. PR incoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (COUCHDB-3379) Fix couch_auth_cache reinitialization logic
[ https://issues.apache.org/jira/browse/COUCHDB-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3379. -- Resolution: Fixed > Fix couch_auth_cache reinitialization logic > --- > > Key: COUCHDB-3379 > URL: https://issues.apache.org/jira/browse/COUCHDB-3379 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > The reinitialization logic is subtle and quite silly in hindsight. This > reacted badly with the PSE work that has a slight change to the order of > signals (which nothing should be relying on in an async system :). This > simplifies and fixes the reinitialization of couch_auth_cache. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3343) JS: show_documents failure
[ https://issues.apache.org/jira/browse/COUCHDB-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973323#comment-15973323 ] Paul Joseph Davis commented on COUCHDB-3343: Another instance: https://s3.amazonaws.com/archive.travis-ci.org/jobs/223225332/log.txt > JS: show_documents failure > -- > > Key: COUCHDB-3343 > URL: https://issues.apache.org/jira/browse/COUCHDB-3343 > Project: CouchDB > Issue Type: Test > Components: Test Suite >Reporter: Joan Touzet > > Has occurred once so far in Jenkins CI runs. > {noformat} > test/javascript/tests/show_documents.js > Error: changed ddoc > Trace back (most recent call first): > 52: test/javascript/test_setup.js > T(false,"changed ddoc") > 296: test/javascript/tests/show_documents.js > () > 37: test/javascript/cli_runner.js > runTest() > 48: test/javascript/cli_runner.js > > fail > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (COUCHDB-3380) Fix mem3_sync_event_listener unit tests
[ https://issues.apache.org/jira/browse/COUCHDB-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3380. -- Resolution: Fixed > Fix mem3_sync_event_listener unit tests > --- > > Key: COUCHDB-3380 > URL: https://issues.apache.org/jira/browse/COUCHDB-3380 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > The tests in mem3_sync_event_listener get skipped because of meck issues but > if you run the mem3 eunit tests directly (i.e., make eunit apps=mem3) you'll > see this failure. The change is pretty trivial. Just a matter of this test > never having run in CI because reasons. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3380) Fix mem3_sync_event_listener unit tests
Paul Joseph Davis created COUCHDB-3380: -- Summary: Fix mem3_sync_event_listener unit tests Key: COUCHDB-3380 URL: https://issues.apache.org/jira/browse/COUCHDB-3380 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Paul Joseph Davis The tests in mem3_sync_event_listener get skipped because of meck issues but if you run the mem3 eunit tests directly (i.e., make eunit apps=mem3) you'll see this failure. The change is pretty trivial. Just a matter of this test never having run in CI because reasons. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3379) Fix couch_auth_cache reinitialization logic
Paul Joseph Davis created COUCHDB-3379: -- Summary: Fix couch_auth_cache reinitialization logic Key: COUCHDB-3379 URL: https://issues.apache.org/jira/browse/COUCHDB-3379 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Paul Joseph Davis The reinitialization logic is subtle and quite silly in hindsight. This reacted badly with the PSE work that has a slight change to the order of signals (which nothing should be relying on in an async system :). This simplifies and fixes the reinitialization of couch_auth_cache. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3378) Fix mango full text detection
[ https://issues.apache.org/jira/browse/COUCHDB-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973279#comment-15973279 ] Paul Joseph Davis commented on COUCHDB-3378: Whoops. Thought GH integration was still broken. > Fix mango full text detection > - > > Key: COUCHDB-3378 > URL: https://issues.apache.org/jira/browse/COUCHDB-3378 > Project: CouchDB > Issue Type: Bug > Components: Mango >Reporter: Paul Joseph Davis > > The renaming of source files for mango's full text adapter was not super > awesome. So I fixed it to not do that. PR incoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3378) Fix mango full text detection
[ https://issues.apache.org/jira/browse/COUCHDB-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973278#comment-15973278 ] Paul Joseph Davis commented on COUCHDB-3378: PR: https://github.com/apache/couchdb/pull/480 > Fix mango full text detection > - > > Key: COUCHDB-3378 > URL: https://issues.apache.org/jira/browse/COUCHDB-3378 > Project: CouchDB > Issue Type: Bug > Components: Mango >Reporter: Paul Joseph Davis > > The renaming of source files for mango's full text adapter was not super > awesome. So I fixed it to not do that. PR incoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3378) Fix mango full text detection
Paul Joseph Davis created COUCHDB-3378: -- Summary: Fix mango full text detection Key: COUCHDB-3378 URL: https://issues.apache.org/jira/browse/COUCHDB-3378 Project: CouchDB Issue Type: Bug Components: Mango Reporter: Paul Joseph Davis The renaming of source files for mango's full text adapter was not super awesome. So I fixed it to not do that. PR incoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3261) Test case couch_compress_tests failed
[ https://issues.apache.org/jira/browse/COUCHDB-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973258#comment-15973258 ] Paul Joseph Davis commented on COUCHDB-3261: I'm not a fan of special-casing architectures here to make the tests pass. Looking at the actual change in data, it seems that this is more than just changing some byte ordering: snappy is actually generating different output on a different architecture (which is fine as long as that output is portable between architectures). There are two things here that I think we should change: 1. The actual test comparing compression output to a known value seems rather wrong. I would change those tests to be more along the lines of: check that compression doesn't throw an error, check that the output is not identical to the input, and then check that the output binary is smaller than an uncompressed "compression" (i.e., the output of the none method). We may need to add a largish string so that we're giving each algorithm a softball for compression, to prevent silly changes in the algorithm from breaking the unit test (as best exemplified by this case). 2. I think we should add the new s390x output to the list of various tests so that we can verify that snappy is capable of reading its own output from various architectures. > Test case couch_compress_tests failed > - > > Key: COUCHDB-3261 > URL: https://issues.apache.org/jira/browse/COUCHDB-3261 > Project: CouchDB > Issue Type: Bug > Components: Test Suite > Reporter: salamani > Labels: test > Attachments: couch_compress_tests.patch > > > CouchDB : 2.0.0 > I have built the CouchDB source for version 2.0.0.
> Test case log of couch_compress_tests: > module 'couch_compress_tests' > couch_compress_tests:33: compress_test_...ok > couch_compress_tests:34: compress_test_...ok > couch_compress_tests:35: compress_test_...*failed* > in function couch_compress_tests:'-compress_test_/0-fun-4-'/0 > (test/couch_compress_tests.erl, line 35) > **error:{assertEqual,[{module,couch_compress_tests}, > {line,35}, > {expression,"couch_compress : compress ( ? TERM , snappy )"}, > {expected,<<1,49,64,131,104,1,108,0,0,0,5,...>>}, > {value,<<1,49,60,131,104,1,108,0,0,0,...>>}]} > output:<<"">> > couch_compress_tests:40: decompress_test_...ok > couch_compress_tests:41: decompress_test_...ok > couch_compress_tests:42: decompress_test_...ok > couch_compress_tests:43: decompress_test_...ok > couch_compress_tests:48: recompress_test_...ok > couch_compress_tests:49: recompress_test_...*failed* > in function couch_compress_tests:'-recompress_test_/0-fun-2-'/0 > (test/couch_compress_tests.erl, line 49) > **error:{assertEqual,[{module,couch_compress_tests}, > {line,49}, > {expression,"couch_compress : compress ( ? NONE , snappy )"}, > {expected,<<1,49,64,131,104,1,108,0,0,0,5,...>>}, > {value,<<1,49,60,131,104,1,108,0,0,0,...>>}]} > output:<<"">> > couch_compress_tests:50: recompress_test_...ok > couch_compress_tests:51: recompress_test_...*failed* > in function couch_compress_tests:'-recompress_test_/0-fun-6-'/0 > (test/couch_compress_tests.erl, line 51) > **error:{assertEqual,[{module,couch_compress_tests}, > {line,51}, > {expression,"couch_compress : compress ( ? 
DEFLATE , snappy )"}, > {expected,<<1,49,64,131,104,1,108,0,0,0,5,...>>}, > {value,<<1,49,60,131,104,1,108,0,0,0,...>>}]} > output:<<"">> > couch_compress_tests:52: recompress_test_...ok > couch_compress_tests:53: recompress_test_...ok > couch_compress_tests:58: is_compressed_test_...ok > couch_compress_tests:59: is_compressed_test_...ok > couch_compress_tests:60: is_compressed_test_...ok > couch_compress_tests:61: is_compressed_test_...ok > couch_compress_tests:62: is_compressed_test_...ok > couch_compress_tests:63: is_compressed_test_...ok > couch_compress_tests:64: is_compressed_test_...ok > couch_compress_tests:65: is_compressed_test_...ok > couch_compress_tests:66: is_compressed_test_...ok > couch_compress_tests:67: is_compressed_test_...ok > couch_compress_tests:68: is_compressed_test_...ok > couch_compress_tests:70: is_compressed_test_...ok > couch_compress_tests:72: is_compressed_test_...ok > [done in 0.078 s] > [done in 0.078 s] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3376) Fix mem3_shards under load
[ https://issues.apache.org/jira/browse/COUCHDB-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969627#comment-15969627 ] Paul Joseph Davis commented on COUCHDB-3376: PR: https://github.com/apache/couchdb/pull/476 > Fix mem3_shards under load > -- > > Key: COUCHDB-3376 > URL: https://issues.apache.org/jira/browse/COUCHDB-3376 > Project: CouchDB > Issue Type: Bug >Reporter: Paul Joseph Davis > > There were two issues with mem3_shards that were fixed while I've been > testing the PSE code. > The first issue was found by [~jaydoane] where a database can have its shards > inserted into the cache after its been deleted. This can happen if a client > does a rapid CREATE/DELETE/GET cycle on a database. The fix for this is to > track the changes feed update sequence from the changes feed listener and > only insert shard maps that come from a client that has read as recent of an > update_seq as mem3_shards. > The second issue found during heavy benchmarking was that large shard maps > (in the Q>=128 range) can quite easily cause mem3_shards to backup when > there's a thundering herd attempting to open the database. There's no > coordination among workers trying to add a shard map to the cache so if a > bunch of independent clients all send the shard map at once (say, at the > beginning of a benchmark) then mem3_shards can get overwhelmed. The fix for > this was two fold. First, rather than send the shard map directly to > mem3_shards, we copy it into a spawned process and when/if mem3_shards wants > to write it, it tells this writer process to do its business. The second > optimization for this change is to create an ets table to track these > processes. Then independent clients can check if a shard map is already > enroute to mem3_shards by using ets:insert_new and canceling their writer if > that returns false. > PR incoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3376) Fix mem3_shards under load
Paul Joseph Davis created COUCHDB-3376: -- Summary: Fix mem3_shards under load Key: COUCHDB-3376 URL: https://issues.apache.org/jira/browse/COUCHDB-3376 Project: CouchDB Issue Type: Bug Reporter: Paul Joseph Davis There were two issues with mem3_shards that were fixed while I've been testing the PSE code. The first issue was found by [~jaydoane] where a database can have its shards inserted into the cache after its been deleted. This can happen if a client does a rapid CREATE/DELETE/GET cycle on a database. The fix for this is to track the changes feed update sequence from the changes feed listener and only insert shard maps that come from a client that has read as recent of an update_seq as mem3_shards. The second issue found during heavy benchmarking was that large shard maps (in the Q>=128 range) can quite easily cause mem3_shards to backup when there's a thundering herd attempting to open the database. There's no coordination among workers trying to add a shard map to the cache so if a bunch of independent clients all send the shard map at once (say, at the beginning of a benchmark) then mem3_shards can get overwhelmed. The fix for this was two fold. First, rather than send the shard map directly to mem3_shards, we copy it into a spawned process and when/if mem3_shards wants to write it, it tells this writer process to do its business. The second optimization for this change is to create an ets table to track these processes. Then independent clients can check if a shard map is already enroute to mem3_shards by using ets:insert_new and canceling their writer if that returns false. PR incoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3314) Add an option in doc creation APIs to specify a random value for an initial doc revision
[ https://issues.apache.org/jira/browse/COUCHDB-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930326#comment-15930326 ] Paul Joseph Davis commented on COUCHDB-3314: A couple of clarifications, a summary, and an opinion: 1. We only do the random revision for the initial revision, which does *not* include when a document is deleted and being recreated (as that causes wide revision trees, which have their own issues). 2. Random initial revisions are not a hard requirement for clustered purge; it's purely to avoid some confusing behavior in a specific scenario of create/purge cycling with the same doc content. 3. I think we're all in agreement that letting people specify their own revisions is still useful and theoretically already Just Works if they use new_edits=false. Opinion: I'd still like to add this pre-3.0 with a config switch that we either remove or swap the default on when 3.0 goes out. Given that at least initially we'll want to be playing with this a lot before a 3.0 (and there's loads of other things we have planned for 3.0 that are backwards incompatible) this still seems best to me. Though we could also just work on specifying the revision pre-3.0 (i.e., make sure it works) and then make this change when we get all the things ready for 3.0. Oooh, also it occurs to me that we should look at allowing the revision to be specified without requiring new_edits=false, which would only work if the document doesn't exist. This way we can do all of this without conditioning users to specify new_edits=false in some situations, which could end up causing conflicts. > Add an option in doc creation APIs to specify a random value for an initial > doc revision > > > Key: COUCHDB-3314 > URL: https://issues.apache.org/jira/browse/COUCHDB-3314 > Project: CouchDB > Issue Type: New Feature > Components: Database Core, HTTP Interface > Reporter: Mayya Sharipova > > Currently the initial revision of a document is deterministic.
For instance, > anyone that has created an empty document probably recognizes the revision > starting with "1-967a00dff...". In order to account for situations when a > document is continually purged and recreated we're going to add randomness to > this initial revision by specifying a 0-$rev in the request coordinator. We > will then include this in the revision generation but drop the 0-$rev entry > from the revision's path. > Thus, the new API will look like this: > acurl -X PUT > http://adm:pass@127.0.0.1:5984/test-db/newdoc1?rev=0-adfdafa123 -d > '{}' > And similarly for _bulk_docs > For a user who wants to create a doc, then purge it, and then re-create, it > is recommended to recreate it with another random revision. > It is important to note that the 0-$rev only affects document creation. Once > a document exists, updates to the document will continue to update their hash > in the same deterministic fashion. I.e., once a document exists, identical > updates will result in identical revisions. > _ > The following changes need to be made in the code: > 1. API changes to allow to specify random rev in doc PUT requests, _bulk_docs > 2. Internals: > 2.1 Use a new revision here: > https://github.com/apache/couchdb-couch/blob/master/src/couch_db.erl#L886 > 2.2 Don't include provided 0-$rev entry to the revision's path (find wherever > new_revid is called from; could be 2-3 places) > 2.3 Reject a 0-$rev during replication -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3314) Add an option in doc creation APIs to specify a random value for an initial doc revision
[ https://issues.apache.org/jira/browse/COUCHDB-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928298#comment-15928298 ] Paul Joseph Davis commented on COUCHDB-3314: To clarify on the create/purge cycle behavior. My worry there is that users would end up seeing either purges that don't seem to take effect and/or creates that don't seem to take effect. As this would be racing with internal replication and read-repair the eventual consistency aspects of the system would I think produce "interesting" results that the random initial revision would solve. > Add an option in doc creation APIs to specify a random value for an initial > doc revision > > > Key: COUCHDB-3314 > URL: https://issues.apache.org/jira/browse/COUCHDB-3314 > Project: CouchDB > Issue Type: New Feature > Components: Database Core, HTTP Interface >Reporter: Mayya Sharipova > > Currently the initial revision of a document is deterministic. For instance, > anyone that has created an empty document probably recognizes the revision > starting with "1-967a00dff...". In order to account for situations when a > document is continually purged and recreated we're going to add randomness to > this initial revision by specifying a 0-$rev in the request coordinator. We > will then include this in the revision generation but drop the 0-$rev entry > from the revision's path. > Thus, the new API will look like this: > acurl -X PUT > https://http://adm:pass@127.0.0.1:5984/test-db/newdoc1?rev=0-adfdafa123 -d > '{}' > And similarly for _bulk_docs > For a user who wants to create a doc, then purge it, and then re-create, it > is recommended to recreate it with another random revision. > It is important to note that the 0-$rev only affects document creation. Once > a document exists, updates to the document will continue to update their hash > in the same deterministic fashion. Ie, once a document exists, identical > updates will result in identical revisions. 
> _ > The following changes need to be made in the code: > 1. API changes to allow to specify random rev in doc PUT requests, _bulk_docs > 2. Internals: > 2.1 Use a new revision here: > https://github.com/apache/couchdb-couch/blob/master/src/couch_db.erl#L886 > 2.2 Don't include provided 0-$rev entry to the revision's path (find wherever > new_revid is called from; could be 2-3 places) > 2.3 Reject a 0-$rev during replication -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3314) Add an option in doc creation APIs to specify a random value for an initial doc revision
[ https://issues.apache.org/jira/browse/COUCHDB-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928293#comment-15928293 ] Paul Joseph Davis commented on COUCHDB-3314: A couple of points for clarification: The random initial revision isn't *required* for clustered purge or permanent deletes. It's merely to try to avoid some *possible* odd behavior if a user has a specific pattern around "create doc, completely purge doc", in that if they cycle quickly enough they may get into a state where things get a bit wonky until they create a second revision. And that's not even known for certain. This just makes things a lot more sane by not having the same revisions floating around in a given document's revision tree. That said, the other reason this was optional was so that we could split it between the 2.x/3.0 branches. One ticket was to make it possible (either via API or config) and the second was to swap the default on the 3.0 release. [~rnewson] I'd say that's only part of the swap and, as [~janl] says, a rare part of the reasoning. It's after a doc is created in a db that deterministic revisions are most important. [~janl] Responding point by point: 1. That's my assumption but I haven't got any hard data either way. 2. Cool 3. This is unrelated. Purge will work the same regardless of how the doc is created. 4/5. The escape hatch I came up with was the 0- hack. Your suggestion to specify a revision with new_edits=false seems better on the face of it, because it somehow matches the semantics better, I think. Creating the same doc in two different databases is almost like a "pre-creation replication" type of operation, if that makes sense.
> Add an option in doc creation APIs to specify a random value for an initial > doc revision > > > Key: COUCHDB-3314 > URL: https://issues.apache.org/jira/browse/COUCHDB-3314 > Project: CouchDB > Issue Type: New Feature > Components: Database Core, HTTP Interface >Reporter: Mayya Sharipova > > Currently the initial revision of a document is deterministic. For instance, > anyone that has created an empty document probably recognizes the revision > starting with "1-967a00dff...". In order to account for situations when a > document is continually purged and recreated we're going to add randomness to > this initial revision by specifying a 0-$rev in the request coordinator. We > will then include this in the revision generation but drop the 0-$rev entry > from the revision's path. > Thus, the new API will look like this: > acurl -X PUT > https://http://adm:pass@127.0.0.1:5984/test-db/newdoc1?rev=0-adfdafa123 -d > '{}' > And similarly for _bulk_docs > For a user who wants to create a doc, then purge it, and then re-create, it > is recommended to recreate it with another random revision. > It is important to note that the 0-$rev only affects document creation. Once > a document exists, updates to the document will continue to update their hash > in the same deterministic fashion. Ie, once a document exists, identical > updates will result in identical revisions. > _ > The following changes need to be made in the code: > 1. API changes to allow to specify random rev in doc PUT requests, _bulk_docs > 2. Internals: > 2.1 Use a new revision here: > https://github.com/apache/couchdb-couch/blob/master/src/couch_db.erl#L886 > 2.2 Don't include provided 0-$rev entry to the revision's path (find wherever > new_revid is called from; could be 2-3 places) > 2.3 Reject a 0-$rev during replication -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (COUCHDB-3298) Improve couch_btree:chunkify logic
[ https://issues.apache.org/jira/browse/COUCHDB-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis resolved COUCHDB-3298. Resolution: Fixed Merged. > Improve couch_btree:chunkify logic > -- > > Key: COUCHDB-3298 > URL: https://issues.apache.org/jira/browse/COUCHDB-3298 > Project: CouchDB > Issue Type: Improvement > Components: Database Core >Reporter: Paul Joseph Davis > > The current chunkify has problems when reduce functions create large values > in that it will produce chunks (ie, kp nodes) that contain a single key. In > some pathological cases this can create long chains of nodes that never > branch. > The old chunkify would also try and create nodes with an even number of bytes > in each chunk. Given that we don't re-use chunks it makes more sense to try > and pack our chunks as close to the threshold as possible so that we're > creating fewer branches in our tree. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3309) Remove disk_size, data_size and other.data_size attribute from db info blobs
[ https://issues.apache.org/jira/browse/COUCHDB-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883392#comment-15883392 ] Paul Joseph Davis commented on COUCHDB-3309: I have no idea how to set this as blocking so I just assigned a Fix Version. If anyone knows better feel free to correct that. > Remove disk_size, data_size and other.data_size attribute from db info blobs > > > Key: COUCHDB-3309 > URL: https://issues.apache.org/jira/browse/COUCHDB-3309 > Project: CouchDB > Issue Type: Bug > Components: HTTP Interface >Reporter: Paul Joseph Davis > Fix For: 3.0.0 > > > Since 2.0 we've had duplicate keys in our database info blobs for size > fields. I was going to remove these as part of the storage engine work but > that'd be backwards incompatible. I'm opening this ticket and setting it as > blocking for 3.0 so that we remember to remove them when we can make backward > incompatible changes. > Also, to be clear, these are duplicates. The same data is available under the > sizes key with extremely less ambiguous naming (and will be configurable for > storage engines to return whatever they want there). > {code} > { > "compact_running": false, > "data_size": 23403, > "db_name": "test-db", > "disk_format_version": 6, > "disk_size": 513032, > "doc_count": 10, > "doc_del_count": 2, > "instance_start_time": "0", > "other": { > "data_size": 6020 > }, > "purge_seq": 0, > "sizes": { > "active": 23403, > "external": 6020, > "file": 513032 > }, > "update_seq": > "82-g1DveJzLYWBgYMlgTmFQSklKzi9KdUhJMjLWy83PzyvOyMxL1UvOyS9NScwr0ctLLckBqmVKZEiyf1YiH5ouU3y6khyAZFI9No2GeDXmsQBJhgYgBdS7PytRAs1WQ8KaD0A0A22WywIAA-tQaQ" > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections
[ https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876234#comment-15876234 ] Paul Joseph Davis commented on COUCHDB-3302: You logged that nothing sends that message? There's a call to send it in two places: https://github.com/apache/couchdb-fabric/blob/master/src/fabric_doc_attachments.erl#L40 https://github.com/apache/couchdb-fabric/blob/master/src/fabric_doc_attachments.erl#L71 This code is gnarly enough that it's quite possible there's something broken with those calls, obviously, but given that rexi:reply/1 should throw a badmatch if the rexi_from pdict entry isn't set, I'm not sure what it'd be. > Attachment replication over low bandwidth network connections > - > > Key: COUCHDB-3302 > URL: https://issues.apache.org/jira/browse/COUCHDB-3302 > Project: CouchDB > Issue Type: Bug > Components: Replication > Reporter: Jan Lehnardt > Attachments: attach_large.py, replication-failure.log, > replication-failure-target.log > > > Setup: > Two CouchDB instances `source` (5981) and `target` (5983) with a 2MBit > network connection (simulated locally with traffic shaping, see way below for > an example). > {noformat} > git clone https://github.com/apache/couchdb.git > cd couchdb > ./configure --disable-docs --disable-fauxton > make release > cd ..
> cp -r couchdb/rel/couchdb source > cp -r couchdb/rel/couchdb target > # set up local ini: chttpd / port: 5981 / 5983 > # set up vm.args: source@hostname.local / target@hostname.local > # no admins > Start both CouchDB in their own terminal windows: ./bin/couchdb > # create all required databases, and our `t` test database > curl -X PUT http://127.0.0.1:598{1,3}/{_users,_replicator,_global_changes,t} > # create 64MB attachments > dd if=/dev/urandom of=att-64 bs=1024 count=65536 > # create doc on source > curl -X PUT http://127.0.0.1:5981/t/doc1/att_64 -H 'Content-Type: > application/octet-stream' -d @att-64 > # replicate to target > curl -X POST http://127.0.0.1:5981/_replicate -Hcontent-type:application/json > -d '{"source":"http://127.0.0.1:5981/t","target":"http://127.0.0.1:5983/t"}' > {noformat} > With the traffic shaping in place, the replication call doesn’t return, and > eventually CouchDB fails with: > {noformat} > [error] 2017-02-16T17:37:30.488990Z source@hostname.local emulator > Error in process <0.15811.0> on node 'source@hostname.local' with exit value: > {{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,591}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,642}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]} > [error] 2017-02-16T17:37:30.490610Z source@hostname.local <0.8721.0> > Replicator, request PUT to "http://127.0.0.1:5983/t/doc1?new_edits=false; > failed due to error {error, > {'EXIT', > {{{nocatch,{mp_parser_died,noproc}}, > [{couch_att,'-foldl/4-fun-0-',3, >[{file,"src/couch_att.erl"},{line,591}]}, >{couch_att,fold_streamed_data,4, >[{file,"src/couch_att.erl"},{line,642}]}, >{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]}, >{couch_httpd_multipart,atts_to_mp,4, >[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}, > {gen_server,call, > [<0.15778.0>, > 
{send_req, > {{url,"http://127.0.0.1:5983/t/doc1?new_edits=false;, >"127.0.0.1",5983,undefined,undefined, >"/t/doc1?new_edits=false",http,ipv4_address}, >[{"Accept","application/json"}, > {"Content-Length",33194202}, > {"Content-Type", > "multipart/related; > boundary=\"0dea87076009b928b191e0b456375c93\""}, > {"User-Agent","CouchDB-Replicator/2.0.0"}], >put, >{#Fun, > > {<<"{\"_id\":\"doc1\",\"_rev\":\"1-15ae43c5b53de894b936c08db31d537c\",\"_revisions\":{\"start\":1,\"ids\":[\"15ae43c5b53de894b936c08db31d537c\"]},\"_attachments\":{\"att_64\":{\"content_type\":\"application/octet-stream\",\"revpos\":1,\"digest\":\"md5-s3AA0cYvwOzrSFTaALGh8g==\",\"length\":33193656,\"follows\":true}}}">>, > [{att,<<"att_64">>,<<"application/octet-stream">>, > 33193656,33193656, > <<179,112,0,209,198,47,192,236,235,72,84,218,0,177, > 161,242>>, > 1, > {follows,<0.8720.0>,#Ref<0.0.1.23804>}, >
[jira] [Commented] (COUCHDB-3300) Merge all apps that can't be used externally
[ https://issues.apache.org/jira/browse/COUCHDB-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868338#comment-15868338 ] Paul Joseph Davis commented on COUCHDB-3300: It has calls to couch_log and couch_stats or else it'd be fine. It's definitely a gray area but I think I'd rather keep it part of the monorepo, especially if we ever get around to creating separate data channels between nodes. > Merge all apps that can't be used externally > > > Key: COUCHDB-3300 > URL: https://issues.apache.org/jira/browse/COUCHDB-3300 > Project: CouchDB > Issue Type: Improvement >Reporter: Paul Joseph Davis > > Managing a whole bunch of repos isn't fun. Most of our repos aren't really > useful outside of CouchDB so we're looking to merge them into the main > repository while still leaving our generally useful apps as standalone > repositories. Here's the current list of how we're categorizing repos: > *monorepo* > chttpd > couch > couch_epi > couch_event > couch_index > couch_log > couch_mrview > couch_peruser > couch_plugins > couch_replicator > couch_stats > couch_tests > ddoc_cache > fabric > global_changes > mango > mem3 > rexi > *independent life cycle* > fauxton > docs > setup > *deprecated* > oauth > *standalone* > config > ets_lru > khash > b64url > snappy > ioq > *third-party* > jiffy > rebar > bear > folsom > meck > mochiweb > ibrowse -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3300) Merge all apps that can't be used externally
[ https://issues.apache.org/jira/browse/COUCHDB-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868252#comment-15868252 ] Paul Joseph Davis commented on COUCHDB-3300: Also, here's the script to generate the merged repository: https://gist.github.com/davisp/99d1ac0516e0a0d02104b123e79ff6a0 With this and the patches listed above I got everything compiled and a dev cluster running. If someone wants to check that work that'd be nice. Also I went to push a COUCHDB-3300-merge-repos branch on couchdb.git but it failed after writing a whole bunch of stuff. So we may have to talk to infra about that. I also realized while it was writing that it may generate thousands of notifications since we're adding a whole bunch of commits at once. {code}
#!/bin/bash -e

rm -rf couchdb
git clone https://github.com/apache/couchdb.git
cd couchdb
echo ""

add_subtree () {
    name=$1
    if [ -z "$2" ]; then
        path=`echo $1 | sed -e 's/-/_/g'`
    else
        path=$2
    fi
    echo "Adding couchdb-$name.git as src/$path"
    git subtree add -P src/$path https://github.com/apache/couchdb-$name.git master
    echo ""
}

add_subtree "chttpd"
add_subtree "couch"
add_subtree "couch-epi"
add_subtree "couch-event"
add_subtree "couch-index"
add_subtree "couch-log"
add_subtree "couch-mrview"
add_subtree "peruser" "couch_peruser"
add_subtree "couch-plugins"
add_subtree "couch-replicator"
add_subtree "couch-stats"
add_subtree "erlang-tests" "couch_tests"
add_subtree "ddoc-cache"
add_subtree "fabric"
add_subtree "global-changes"
add_subtree "mango"
add_subtree "mem3"
add_subtree "rexi"
{code} > Merge all apps that can't be used externally > > > Key: COUCHDB-3300 > URL: https://issues.apache.org/jira/browse/COUCHDB-3300 > Project: CouchDB > Issue Type: Improvement >Reporter: Paul Joseph Davis > > Managing a whole bunch of repos isn't fun. 
Most of our repos aren't really > useful outside of CouchDB so we're looking to merge them into the main > repository while still leaving our generally useful apps as standalone > repositories. Here's the current list of how we're categorizing repos: > *monorepo* > chttpd > couch > couch_epi > couch_event > couch_index > couch_log > couch_mrview > couch_peruser > couch_plugins > couch_replicator > couch_stats > couch_tests > ddoc_cache > fabric > global_changes > mango > mem3 > rexi > *independent life cycle* > fauxton > docs > setup > *deprecated* > oauth > *standalone* > config > ets_lru > khash > b64url > snappy > ioq > *third-party* > jiffy > rebar > bear > folsom > meck > mochiweb > ibrowse -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3300) Merge all apps that can't be used externally
[ https://issues.apache.org/jira/browse/COUCHDB-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868249#comment-15868249 ] Paul Joseph Davis commented on COUCHDB-3300: Here's the patch I needed to make things compile: https://gist.github.com/davisp/218c17a96886f05dc4e0e6b5fef99f4c (pasted below as well) Also, this patch to setup to fix a search path: https://git-wip-us.apache.org/repos/asf?p=couchdb-setup.git;a=blobdiff;f=src/setup.erl;h=085decce63ed5243c3792692ea109036850e21a2;hp=b27c6c63dca63d1032cd937b6c85a547c65b789a;hb=bdf96f926952071c5b8b7b04d6c4de932aee6d65;hpb=e8d1e32ba3b4f5f3be0e06e5269b12d811f24d52 {code} commit 6bfc236edea2ac9e285517056dabeaf67f7cd7f7 Author: Paul J. DavisDate: Wed Feb 15 11:46:31 2017 -0600 Fix rebar configuration after repository merge diff --git a/rebar.config.script b/rebar.config.script index 85d5c94fc..9770a3f6c 100644 --- a/rebar.config.script +++ b/rebar.config.script @@ -21,42 +21,52 @@ os:putenv("COUCHDB_CONFIG", ConfigureEnv). os:putenv("COUCHDB_APPS_CONFIG_DIR", filename:join([COUCHDB_ROOT, "rel/apps"])). 
+SubDirs = [ +%% must be compiled first as it has a custom behavior +"src/couch_epi", +"src/couch_log", +"src/chttpd", +"src/couch", +"src/couch_index", +"src/couch_mrview", +"src/couch_replicator", +"src/couch_plugins", +"src/couch_event", +"src/couch_stats", +"src/couch_peruser", +"src/couch_tests", +"src/ddoc_cache", +"src/fabric", +"src/global_changes", +"src/mango", +"src/mem3", +"src/rexi", +"rel" +], + DepDescs = [ -%% must be compiled first as it has a custom behavior -{couch_epi,"couch-epi", "60e7f808513b2611eb412cf641d6e7132dda2a30"}, +%% Independent Apps {config, "config", "f62d553b337ce975edb0fb68772d22bdd3bf6490"}, -%% keep these sorted {b64url, "b64url", "6895652d80f95cdf04efb14625abed868998f174"}, -{couch_log,"couch-log", "ad803f66dbd1900b67543259142875a6d03503ce"}, -{chttpd, "chttpd", "cb0f20ea0898cd24ff8ac0617b326874088d9157"}, -{couch,"couch", "66292dbdfee1a6d5981085d7e50751feacf860c8"}, -{couch_index, "couch-index", "f0a6854e578469612937a766632fdcdc52ee9c65"}, -{couch_mrview, "couch-mrview", "e1d13a983a0ba56fcb1eb31c4e4fe56bc3692719"}, -{couch_replicator, "couch-replicator", "648e465f54f538a133fb31c9b1e3b487a6f2ca7c"}, -{couch_plugins,"couch-plugins", "3e73b723cb126cfc471b560d17c24a8b5c540085"}, -{couch_event, "couch-event", "7e382132219d708239306aa3591740694943d367"}, -{couch_stats, "couch-stats", "7895d4d3f509ed24f09b6d1a0bd0e06af34551dc"}, -{couch_peruser,"peruser", "4eea9571171a5b41d832da32204a1122a01f4b0e"}, -{couch_tests, "erlang-tests", "37b3bfeb4b1a48a592456e67991362e155ed81e0"}, -{docs, "documentation", "59a887a97f9b6befc6de0c5bdaf17d79fb7f915d", [raw]}, -{ddoc_cache, "ddoc-cache", "c762e90a33ce3cda19ef142dd1120f1087ecd876"}, {ets_lru, "ets-lru", "c05488c8b1d7ec1c3554a828e0c9bf2888932ed6"}, -{fabric, "fabric", "ec2235196d7195afab59cedc2d61a02b11596ab4"}, +{ioq, "ioq", "1d2b149ee12dfeaf8d89a67b2f937207f4c5bdf2"}, +{khash,"khash", "7c6a9cd9776b5c6f063ccafedfa984b00877b019"}, +{snappy, "snappy", "a728b960611d0795025de7e9668d06b9926c479d"}, 
+{setup,"setup", "e8d1e32ba3b4f5f3be0e06e5269b12d811f24d52"}, + +%% Non-Erlang deps +{docs, "documentation", "59a887a97f9b6befc6de0c5bdaf17d79fb7f915d", [raw]}, {fauxton, "fauxton", {tag, "v1.1.9"}, [raw]}, + +%% Third party deps {folsom, "folsom", "a5c95dec18227c977029fbd3b638966d98f17003"}, -{global_changes, "global-changes", "f6e4c5629a7d996d284e4489f1897c057823f846"}, {ibrowse, "ibrowse", "4af2d408607874d124414ac45df1edbe3961d1cd"}, -{ioq, "ioq", "1d2b149ee12dfeaf8d89a67b2f937207f4c5bdf2"}, {jiffy,"jiffy", "d3c00e19d8fa20c21758402231247602190988d3"}, -{khash,"khash", "7c6a9cd9776b5c6f063ccafedfa984b00877b019"}, -{mango,"mango", "4afd60e84d0e1c57f5d6a1e3542955faa565ca4b"}, -{mem3, "mem3", "c3c5429180de14a2b139f7741c934143ef73988c"}, {mochiweb, "mochiweb", "bd6ae7cbb371666a1f68115056f7b30d13765782"}, -{oauth,"oauth", "099057a98e41f3aff91e77e3cf496d6c6fd901df"}, -{rexi, "rexi", "a327b7dbeb2b0050f7ca9072047bf8ef2d282833"}, -{snappy, "snappy", "a728b960611d0795025de7e9668d06b9926c479d"}, -{setup,"setup",
[jira] [Created] (COUCHDB-3300) Merge all apps that can't be used externally
Paul Joseph Davis created COUCHDB-3300: -- Summary: Merge all apps that can't be used externally Key: COUCHDB-3300 URL: https://issues.apache.org/jira/browse/COUCHDB-3300 Project: CouchDB Issue Type: Improvement Reporter: Paul Joseph Davis Managing a whole bunch of repos isn't fun. Most of our repos aren't really useful outside of CouchDB so we're looking to merge them into the main repository while still leaving our generally useful apps as standalone repositories. Here's the current list of how we're categorizing repos: # monorepo chttpd couch couch_epi couch_event couch_index couch_log couch_mrview couch_peruser couch_plugins couch_replicator couch_stats couch_tests ddoc_cache fabric global_changes mango mem3 rexi # independent life cycle fauxton docs setup # deprecated oauth # standalone config ets_lru khash b64url snappy ioq # third-party jiffy rebar bear folsom meck mochiweb ibrowse -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3298) Improve couch_btree:chunkify logic
Paul Joseph Davis created COUCHDB-3298: -- Summary: Improve couch_btree:chunkify logic Key: COUCHDB-3298 URL: https://issues.apache.org/jira/browse/COUCHDB-3298 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Paul Joseph Davis The current chunkify has problems when reduce functions create large values in that it will produce chunks (i.e., kp nodes) that contain a single key. In some pathological cases this can create long chains of nodes that never branch. The old chunkify would also try to create nodes with a roughly equal number of bytes in each chunk. Given that we don't re-use chunks it makes more sense to try to pack our chunks as close to the threshold as possible so that we're creating fewer branches in our tree. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
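For illustration only, the packing idea reads roughly like this Python sketch of a greedy chunkifier (not the actual couch_btree code; the `chunkify` name and byte accounting here are simplified assumptions):

```python
def chunkify(items, threshold):
    # Greedily pack (key, value) pairs into chunks that fill up to
    # `threshold` bytes, instead of balancing bytes evenly across
    # chunks. A single oversized item still gets its own chunk, so
    # we never emit an empty chunk or a chain of single-key nodes.
    chunks, current, size = [], [], 0
    for key, value in items:
        item_size = len(key) + len(value)
        if current and size + item_size > threshold:
            chunks.append(current)
            current, size = [], 0
        current.append((key, value))
        size += item_size
    if current:
        chunks.append(current)
    return chunks
```

Packing each chunk as close to the threshold as possible yields fewer, fuller kp nodes, so the tree branches less often.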
[jira] [Created] (COUCHDB-3288) Remove access to the #db{} record
Paul Joseph Davis created COUCHDB-3288: -- Summary: Remove access to the #db{} record Key: COUCHDB-3288 URL: https://issues.apache.org/jira/browse/COUCHDB-3288 Project: CouchDB Issue Type: Improvement Reporter: Paul Joseph Davis To enable a mixed cluster upgrade (i.e., rolling reboot upgrade) we need to do some preparatory work to remove access to the #db{} record since this record is shared between nodes. This work is all straightforward and just involves changing things like Db#db.main_pid to couch_db:get_main_pid(Db) or similar. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
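The refactor is plain accessor encapsulation; as a hedged sketch in Python (the `Db` class and field names are invented for illustration, only `get_main_pid` mirrors the Erlang accessor named above):

```python
class Db:
    # Opaque handle: callers must not depend on the field layout,
    # which may differ between nodes during a rolling upgrade.
    def __init__(self, name, main_pid):
        self._name = name
        self._main_pid = main_pid

def get_main_pid(db):
    # Equivalent in spirit to couch_db:get_main_pid(Db): callers
    # depend on the accessor function, not on the record layout, so
    # the layout can change between releases without breaking nodes
    # that exchange these handles.
    return db._main_pid
```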
[jira] [Created] (COUCHDB-3287) Implement pluggable storage engines
Paul Joseph Davis created COUCHDB-3287: -- Summary: Implement pluggable storage engines Key: COUCHDB-3287 URL: https://issues.apache.org/jira/browse/COUCHDB-3287 Project: CouchDB Issue Type: Improvement Reporter: Paul Joseph Davis Opening branches for the pluggable storage engine work described here: http://mail-archives.apache.org/mod_mbox/couchdb-dev/201606.mbox/%3CCAJ_m3YDjA9xym_JRVtd6Xi7LX7Ajwc6EmH_wyCRD1jgTzk8mKA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (COUCHDB-3255) Conflicts introduced by recreating docs with attachments
[ https://issues.apache.org/jira/browse/COUCHDB-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3255. -- Resolution: Fixed > Conflicts introduced by recreating docs with attachments > > > Key: COUCHDB-3255 > URL: https://issues.apache.org/jira/browse/COUCHDB-3255 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > When a document is re-created with an attachment it receives a > non-deterministic revision. This is due to a fairly old commit [1] that > introduced the behavior by accidentally including information about revisions > on disk into the revision id calculation when the revision id was being > calculated by couch_db_updater when it realized that the update was > re-creating a document that was previously deleted. > I'm opening a PR with the fix. > [1] > https://github.com/apache/couchdb-couch/commit/08a94d582cd3086ebcbd51ad8ac98ca6df98a1b7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3255) Conflicts introduced by recreating docs with attachments
[ https://issues.apache.org/jira/browse/COUCHDB-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746609#comment-15746609 ] Paul Joseph Davis commented on COUCHDB-3255: PR: https://github.com/apache/couchdb-couch/pull/218 > Conflicts introduced by recreating docs with attachments > > > Key: COUCHDB-3255 > URL: https://issues.apache.org/jira/browse/COUCHDB-3255 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > When a document is re-created with an attachment it receives a > non-deterministic revision. This is due to a fairly old commit [1] that > introduced the behavior by accidentally including information about revisions > on disk into the revision id calculation when the revision id was being > calculated by couch_db_updater when it realized that the update was > re-creating a document that was previously deleted. > I'm opening a PR with the fix. > [1] > https://github.com/apache/couchdb-couch/commit/08a94d582cd3086ebcbd51ad8ac98ca6df98a1b7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3255) Conflicts introduced by recreating docs with attachments
Paul Joseph Davis created COUCHDB-3255: -- Summary: Conflicts introduced by recreating docs with attachments Key: COUCHDB-3255 URL: https://issues.apache.org/jira/browse/COUCHDB-3255 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Paul Joseph Davis When a document is re-created with an attachment it receives a non-deterministic revision. This is due to a fairly old commit [1] that introduced the behavior: when couch_db_updater calculated the revision id and realized the update was re-creating a previously deleted document, it accidentally included on-disk revision information in the revision id calculation. I'm opening a PR with the fix. [1] https://github.com/apache/couchdb-couch/commit/08a94d582cd3086ebcbd51ad8ac98ca6df98a1b7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-3251) Remove hot loop usage of filename:rootname/1
[ https://issues.apache.org/jira/browse/COUCHDB-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3251. -- > Remove hot loop usage of filename:rootname/1 > > > Key: COUCHDB-3251 > URL: https://issues.apache.org/jira/browse/COUCHDB-3251 > Project: CouchDB > Issue Type: Improvement > Components: Database Core >Reporter: Paul Joseph Davis > > We added a call to filename:rootname/1 that removes the ".couch" extension > when it exists. We've been doing some profiling of CouchDB 2.0 recently and > found this to be a fairly expensive call. It and related calls are in the top > few most expensive functions according to eprof (this is VM wide, so not just > cherry picking couch_server where its actually even worse). > {code} > lists:zip/2 > 157491702 1.35 77463688 [ 0.49] > erlang:setelement/3 > 139509262 1.48 85212600 [ 0.61] > erlang:term_to_binary/2 > 14724676 1.52 87419458 [ 5.94] > erlang:phash/2 > 30943420 1.54 88195214 [ 2.85] > erlang:send/3 > 13487486 2.06 118261137 [ 8.77] > filename:rootname/4 > 514574672 2.59 148907072 [ 0.29] > ets:lookup/2 > 32852756 2.66 152952875 [ 4.66] > erts_internal:port_command/3 > 10448091 2.95 169649699 [ 16.24] > ioq_server:matching_request/4 > 906453003 3.19 183041235 [ 0.20] > ioq_server:split/4 > 535820540 3.31 189913578 [ 0.35] > snappy:compress/1 > 7950803 3.42 196220575 [ 24.68] > filename:do_flatten/2 > 516517594 4.21 241562020 [ 0.47] > gen_server:try_handle_call/4 > 9529789 5.66 324927694 [ 34.10] > gen_server:loop/6 > 16844687 7.41 425628355 [ 25.27] > {code} > There's an obvious easy way to optimize this by using binary matching so > simple PR is incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3251) Remove hot loop usage of filename:rootname/1
[ https://issues.apache.org/jira/browse/COUCHDB-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis resolved COUCHDB-3251. Resolution: Fixed Merged. > Remove hot loop usage of filename:rootname/1 > > > Key: COUCHDB-3251 > URL: https://issues.apache.org/jira/browse/COUCHDB-3251 > Project: CouchDB > Issue Type: Improvement > Components: Database Core >Reporter: Paul Joseph Davis > > We added a call to filename:rootname/1 that removes the ".couch" extension > when it exists. We've been doing some profiling of CouchDB 2.0 recently and > found this to be a fairly expensive call. It and related calls are in the top > few most expensive functions according to eprof (this is VM wide, so not just > cherry picking couch_server where its actually even worse). > {code} > lists:zip/2 > 157491702 1.35 77463688 [ 0.49] > erlang:setelement/3 > 139509262 1.48 85212600 [ 0.61] > erlang:term_to_binary/2 > 14724676 1.52 87419458 [ 5.94] > erlang:phash/2 > 30943420 1.54 88195214 [ 2.85] > erlang:send/3 > 13487486 2.06 118261137 [ 8.77] > filename:rootname/4 > 514574672 2.59 148907072 [ 0.29] > ets:lookup/2 > 32852756 2.66 152952875 [ 4.66] > erts_internal:port_command/3 > 10448091 2.95 169649699 [ 16.24] > ioq_server:matching_request/4 > 906453003 3.19 183041235 [ 0.20] > ioq_server:split/4 > 535820540 3.31 189913578 [ 0.35] > snappy:compress/1 > 7950803 3.42 196220575 [ 24.68] > filename:do_flatten/2 > 516517594 4.21 241562020 [ 0.47] > gen_server:try_handle_call/4 > 9529789 5.66 324927694 [ 34.10] > gen_server:loop/6 > 16844687 7.41 425628355 [ 25.27] > {code} > There's an obvious easy way to optimize this by using binary matching so > simple PR is incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3251) Remove hot loop usage of filename:rootname/1
Paul Joseph Davis created COUCHDB-3251: -- Summary: Remove hot loop usage of filename:rootname/1 Key: COUCHDB-3251 URL: https://issues.apache.org/jira/browse/COUCHDB-3251 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Paul Joseph Davis We added a call to filename:rootname/1 that removes the ".couch" extension when it exists. We've been doing some profiling of CouchDB 2.0 recently and found this to be a fairly expensive call. It and related calls are in the top few most expensive functions according to eprof (this is VM wide, so not just cherry-picking couch_server, where it's actually even worse). {code}
lists:zip/2                       157491702  1.35   77463688  [  0.49]
erlang:setelement/3               139509262  1.48   85212600  [  0.61]
erlang:term_to_binary/2            14724676  1.52   87419458  [  5.94]
erlang:phash/2                     30943420  1.54   88195214  [  2.85]
erlang:send/3                      13487486  2.06  118261137  [  8.77]
filename:rootname/4               514574672  2.59  148907072  [  0.29]
ets:lookup/2                       32852756  2.66  152952875  [  4.66]
erts_internal:port_command/3       10448091  2.95  169649699  [ 16.24]
ioq_server:matching_request/4     906453003  3.19  183041235  [  0.20]
ioq_server:split/4                535820540  3.31  189913578  [  0.35]
snappy:compress/1                   7950803  3.42  196220575  [ 24.68]
filename:do_flatten/2             516517594  4.21  241562020  [  0.47]
gen_server:try_handle_call/4        9529789  5.66  324927694  [ 34.10]
gen_server:loop/6                  16844687  7.41  425628355  [ 25.27]
{code} There's an obvious easy way to optimize this by using binary matching, so a simple PR is incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
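The optimization amounts to checking the one known suffix directly rather than doing general-purpose path parsing. A rough Python equivalent of the idea (the function name is made up; the Erlang fix matches the ".couch" bytes at the end of a binary):

```python
def strip_couch_suffix(path):
    # Cheap fixed-suffix strip: a direct tail check, analogous to
    # binary matching on ".couch", instead of calling the generic
    # filename:rootname/1 machinery in a hot loop.
    suffix = ".couch"
    if path.endswith(suffix):
        return path[:-len(suffix)]
    return path
```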
[jira] [Commented] (COUCHDB-3239) incorrect ordering of results when using open_revs and latest=true
[ https://issues.apache.org/jira/browse/COUCHDB-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725967#comment-15725967 ] Paul Joseph Davis commented on COUCHDB-3239: Hah, well that'd do it. > incorrect ordering of results when using open_revs and latest=true > -- > > Key: COUCHDB-3239 > URL: https://issues.apache.org/jira/browse/COUCHDB-3239 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Affects Versions: 2.0.0 >Reporter: Will Holley > Attachments: docs.json > > > When fetching open_revs with latest=true for a conflicted document, the order > of results is incorrect. For example, if I create a document with the rev > tree: > {code} > 4-d1 > / > 3-c1 > / > 2-b1 > / > 1-a > \ > 2-b2 > \ > 3-c2 > {code} > and ask for {{open_revs=["2-b1","2-b2"]=true}}, the response will > return {{3-c2}} followed by {{4-d1}} - the reverse of what I'd expect. > Below is a test/reproduction executed against Couch 1.6.1 and 2.0. > 1.6.1: > {code} > $ export COUCH_HOST="http://127.0.0.1:5984; > $ curl -XPUT "$COUCH_HOST/open_revs_test" > {"ok":true} > $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H > "Content-Type:application/json" -XPOST -d @docs.json > [] > # GET open_revs=["2-b1","2-b2"] > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D" > [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}] > # GET open_revs=["2-b1","2-b2"]=true > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true" > [{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}},{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}}] > {code} > 2.0: > {code} > $ export COUCH_HOST="http://127.0.0.1:15984; > $ curl -XPUT "$COUCH_HOST/open_revs_test" > {"ok":true} > $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H > "Content-Type:application/json" -XPOST -d @docs.json > [] > # GET 
open_revs=["2-b1","2-b2"] > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D" > [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}] > # GET open_revs=["2-b1","2-b2"]=true > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true" > [{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}},{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}}] > {code} > Note the reversed order of the results in 2.0 when {{latest=true}} is > specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3239) incorrect ordering of results when using open_revs and latest=true
[ https://issues.apache.org/jira/browse/COUCHDB-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725798#comment-15725798 ] Paul Joseph Davis commented on COUCHDB-3239: With your example specifying the 1-a would be the easiest way. If it doesn't return multiple then its broken. > incorrect ordering of results when using open_revs and latest=true > -- > > Key: COUCHDB-3239 > URL: https://issues.apache.org/jira/browse/COUCHDB-3239 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Affects Versions: 2.0.0 >Reporter: Will Holley > Attachments: docs.json > > > When fetching open_revs with latest=true for a conflicted document, the order > of results is incorrect. For example, if I create a document with the rev > tree: > {code} > 4-d1 > / > 3-c1 > / > 2-b1 > / > 1-a > \ > 2-b2 > \ > 3-c2 > {code} > and ask for {{open_revs=["2-b1","2-b2"]=true}}, the response will > return {{3-c2}} followed by {{4-d1}} - the reverse of what I'd expect. > Below is a test/reproduction executed against Couch 1.6.1 and 2.0. 
> 1.6.1: > {code} > $ export COUCH_HOST="http://127.0.0.1:5984; > $ curl -XPUT "$COUCH_HOST/open_revs_test" > {"ok":true} > $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H > "Content-Type:application/json" -XPOST -d @docs.json > [] > # GET open_revs=["2-b1","2-b2"] > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D" > [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}] > # GET open_revs=["2-b1","2-b2"]=true > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true" > [{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}},{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}}] > {code} > 2.0: > {code} > $ export COUCH_HOST="http://127.0.0.1:15984; > $ curl -XPUT "$COUCH_HOST/open_revs_test" > {"ok":true} > $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H > "Content-Type:application/json" -XPOST -d @docs.json > [] > # GET open_revs=["2-b1","2-b2"] > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D" > [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}] > # GET open_revs=["2-b1","2-b2"]=true > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true" > [{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}},{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}}] > {code} > Note the reversed order of the results in 2.0 when {{latest=true}} is > specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3239) incorrect ordering of results when using open_revs and latest=true
[ https://issues.apache.org/jira/browse/COUCHDB-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725561#comment-15725561 ] Paul Joseph Davis commented on COUCHDB-3239: [~wilhol] Yeah, the docs are pretty bad for latest=true: "Forces retrieving latest “leaf” revision, no matter what rev was requested. Default is false" Even for a single node latest=true might return multiple revisions and it doesn't say anything about ordering. A commit there would be useful. We'd also probably want to add a note that explains how complicated that API call can get. In hind sight, the open_revs and latest=true parameters should have probably been different API end points since they fundamentally change the body from a single doc with optional info into a multiple doc body response. > incorrect ordering of results when using open_revs and latest=true > -- > > Key: COUCHDB-3239 > URL: https://issues.apache.org/jira/browse/COUCHDB-3239 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Affects Versions: 2.0.0 >Reporter: Will Holley > Attachments: docs.json > > > When fetching open_revs with latest=true for a conflicted document, the order > of results is incorrect. For example, if I create a document with the rev > tree: > {code} > 4-d1 > / > 3-c1 > / > 2-b1 > / > 1-a > \ > 2-b2 > \ > 3-c2 > {code} > and ask for {{open_revs=["2-b1","2-b2"]=true}}, the response will > return {{3-c2}} followed by {{4-d1}} - the reverse of what I'd expect. > Below is a test/reproduction executed against Couch 1.6.1 and 2.0. 
> 1.6.1: > {code} > $ export COUCH_HOST="http://127.0.0.1:5984; > $ curl -XPUT "$COUCH_HOST/open_revs_test" > {"ok":true} > $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H > "Content-Type:application/json" -XPOST -d @docs.json > [] > # GET open_revs=["2-b1","2-b2"] > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D" > [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}] > # GET open_revs=["2-b1","2-b2"]=true > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true" > [{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}},{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}}] > {code} > 2.0: > {code} > $ export COUCH_HOST="http://127.0.0.1:15984; > $ curl -XPUT "$COUCH_HOST/open_revs_test" > {"ok":true} > $ curl "$COUCH_HOST/open_revs_test/_bulk_docs" -H > "Content-Type:application/json" -XPOST -d @docs.json > [] > # GET open_revs=["2-b1","2-b2"] > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D" > [{"ok":{"_id":"mydoc","_rev":"2-b1","value":"x-winning"}},{"ok":{"_id":"mydoc","_rev":"2-b2","value":"x-losing"}}] > # GET open_revs=["2-b1","2-b2"]=true > $ curl -H "Accept:application/json" > "$COUCH_HOST/open_revs_test/mydoc?open_revs=%5B%222-b1%22%2C%222-b2%22%5D=true" > [{"ok":{"_id":"mydoc","_rev":"3-c2","value":"y-losing"}},{"ok":{"_id":"mydoc","_rev":"4-d1","value":"z-winning"}}] > {code} > Note the reversed order of the results in 2.0 when {{latest=true}} is > specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-3234) Track open shard timeouts with a counter instead of logging
[ https://issues.apache.org/jira/browse/COUCHDB-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3234. -- Resolution: Fixed Merged. > Track open shard timeouts with a counter instead of logging > --- > > Key: COUCHDB-3234 > URL: https://issues.apache.org/jira/browse/COUCHDB-3234 > Project: CouchDB > Issue Type: Improvement > Components: Database Core >Reporter: Paul Joseph Davis > > Fabric uses the open_shard RPC method to get security objects for every > request. These calls have very short timeouts on them which can cause massive > amounts of log spam when a node is under load. Rather than log a whole bunch > of garbage when each one fails lets just use a counter instead. > PR incoming -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3234) Track open shard timeouts with a counter instead of logging
Paul Joseph Davis created COUCHDB-3234: -- Summary: Track open shard timeouts with a counter instead of logging Key: COUCHDB-3234 URL: https://issues.apache.org/jira/browse/COUCHDB-3234 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Paul Joseph Davis Fabric uses the open_shard RPC method to get security objects for every request. These calls have very short timeouts on them which can cause massive amounts of log spam when a node is under load. Rather than log a whole bunch of garbage when each one fails lets just use a counter instead. PR incoming -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3191) Improve couch_lru performance
Paul Joseph Davis created COUCHDB-3191: -- Summary: Improve couch_lru performance Key: COUCHDB-3191 URL: https://issues.apache.org/jira/browse/COUCHDB-3191 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Paul Joseph Davis This ticket is to track work around updating couch_lru to be more performant. So far I have a change that replaces the gb_tree/dict pair with two khash'es. This approach allows us to change the algorithmic speed from O(N log N) to O(1) which should in theory make this faster. This is motivated by the poor behavior of couch_server when under load by lots of concurrent clients and a high max_dbs_open value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
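The redesign described above can be sketched in Python (illustrative only; couch_lru itself is Erlang using khash): the point is that touching or evicting an entry becomes O(1) hash work rather than an O(log N) gb_tree update.

```python
# O(1) LRU sketch; OrderedDict stands in for the pair of khash tables.
from collections import OrderedDict

class LRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> value, least recently used first

    def insert(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)       # O(1) reorder on re-insert
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)    # O(1) eviction of the LRU entry

    def touch(self, key):
        self.items.move_to_end(key)           # O(1), no tree rebalancing

lru = LRU(2)
lru.insert("db_a", 1)
lru.insert("db_b", 2)
lru.touch("db_a")        # db_b is now the least recently used
lru.insert("db_c", 3)    # evicts db_b
print(list(lru.items))   # ['db_a', 'db_c']
```

Under heavy couch_server load every database open touches the LRU, so shaving the per-touch cost matters most exactly when max_dbs_open is large.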
[jira] [Commented] (COUCHDB-3178) Fabric does not send message when filtering lots of documents
[ https://issues.apache.org/jira/browse/COUCHDB-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546713#comment-15546713 ] Paul Joseph Davis commented on COUCHDB-3178: Yeap. That fixed it. Kind of amazing how something like that can have such a profound impact on the system. For background, what would happen is that when we got a call to the clustered _changes endpoint, we'd fire off RPC workers for each shard and wait to hear back from them, which we never did, so we'd time out. However, the RPC workers were still furiously looking for docs that passed the filter, which was just wasting resources since their coordinator had already abandoned them. So now filtered changes feeds work again when they have to filter lots of rows (once we merge the PR and get it into a release). > Fabric does not send message when filtering lots of documents > - > > Key: COUCHDB-3178 > URL: https://issues.apache.org/jira/browse/COUCHDB-3178 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > We managed to mess up part of the fabric merge where fabric_rpc workers that > are running filter changes end up not sending a message for long periods of > time if no documents are passing the filter. PR Incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
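The failure mode and the shape of the fix can be sketched as follows (an assumed pattern, not fabric's actual code): a worker that only sends matching rows can go silent long enough for its coordinator to time out, so it should emit a periodic progress message while filtering. The tick-based `heartbeat_every` parameter here is hypothetical.

```python
def filtered_changes(docs, passes_filter, heartbeat_every=5):
    """Simulated worker, one 'tick' per doc examined: yields ('row', doc)
    for matches and ('heartbeat', None) whenever heartbeat_every docs go
    by with nothing sent, so the coordinator always hears something."""
    since_last_send = 0
    for doc in docs:
        if passes_filter(doc):
            yield ("row", doc)
            since_last_send = 0
        else:
            since_last_send += 1
            if since_last_send >= heartbeat_every:
                yield ("heartbeat", None)
                since_last_send = 0
    yield ("done", None)

# 20 docs, none pass the filter: without the heartbeats the coordinator
# would hear nothing at all until 'done' -- the bug described above.
msgs = list(filtered_changes(range(20), lambda d: False))
print(msgs.count(("heartbeat", None)), msgs[-1])
```

The same starvation explains the replication symptoms in the next comment: a coordinator that times out abandons workers that are still burning couchjs cycles on the filter.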
[jira] [Commented] (COUCHDB-3178) Fabric does not send message when filtering lots of documents
[ https://issues.apache.org/jira/browse/COUCHDB-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546545#comment-15546545 ] Paul Joseph Davis commented on COUCHDB-3178: I should note, that if you have a replication with a filter that's constantly timing out, this is likely the cause. Also, if you have that replication as a replicator doc, we're seeing a large amount of load on various nodes because the couchjs process count is much higher as we're filtering a whole bunch of docs repeatedly because replications are retried by the replication manager. So, while it seems like a small fix it should actually have a fairly sizable impact on cluster performance and resource usage. I'll update more once I've learned more. > Fabric does not send message when filtering lots of documents > - > > Key: COUCHDB-3178 > URL: https://issues.apache.org/jira/browse/COUCHDB-3178 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > We managed to mess up part of the fabric merge where fabric_rpc workers that > are running filter changes end up not sending a message for long periods of > time if no documents are passing the filter. PR Incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3178) Fabric does not send message when filtering lots of documents
Paul Joseph Davis created COUCHDB-3178: -- Summary: Fabric does not send message when filtering lots of documents Key: COUCHDB-3178 URL: https://issues.apache.org/jira/browse/COUCHDB-3178 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Paul Joseph Davis We managed to mess up part of the fabric merge where fabric_rpc workers that are running filter changes end up not sending a message for long periods of time if no documents are passing the filter. PR Incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters
[ https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545522#comment-15545522 ] Paul Joseph Davis commented on COUCHDB-3173: Fixed. PR incoming. > Views return corrupt data for text fields containing non-BMP characters > --- > > Key: COUCHDB-3173 > URL: https://issues.apache.org/jira/browse/COUCHDB-3173 > Project: CouchDB > Issue Type: Bug > Components: JavaScript View Server >Affects Versions: 2.0.0 >Reporter: Loke > > When inserting a non-BMP character (i.e. characters with a Unicode codepoint > above {{U+FFFF}}), the content gets corrupted after reading it from a view. > At every instance of such characters, there is an extra {{U+FFFD REPLACEMENT > CHARACTER}} inserted into the text. > To reproduce, use the following commands. > Create the document containing a field with the character {{U+1F604 SMILING > FACE WITH OPEN MOUTH AND SMILING EYES}}: > {noformat} > $ curl -X PUT -d '{"type":"foo","value":"😄"}' http://localhost:5984/foo/foo2 > {"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"} > {noformat} > Get the document to ensure that it was saved properly: > {noformat} > curl -X GET http://localhost:5984/foo/foo2 > {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"} > {noformat} > Create a view that will return that document: > {noformat} > $ curl --user user:password -X PUT -d > '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}' > http://localhost:5984/foo/_design/bugdemo > {"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"} > {noformat} > Get the document from the view: > {noformat} > $ curl -X GET http://localhost:5984/foo/_design/bugdemo/_view/v > {"total_rows":1,"offset":0,"rows":[ > {"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"�"}} > ]} > {noformat} > Now we can see that the field {{value}} now contains two characters. The > original character as well as {{U+FFFD}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
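The corruption pattern is a UTF-16 surrogate-pair issue: U+1F604 lies outside the BMP, so the JavaScript view server represents it as two 16-bit code units, and mishandling one half of the pair is what produces the extra U+FFFD. A correct JSON round trip must preserve the character (Python shown purely for illustration):

```python
import json

ch = "\U0001F604"  # SMILING FACE WITH OPEN MOUTH AND SMILING EYES
assert len(ch.encode("utf-16-le")) == 4   # two 16-bit code units

doc = {"type": "foo", "value": ch}
out = json.loads(json.dumps(doc))          # dumps escapes as \ud83d\ude04
print(out["value"] == ch, "\ufffd" in out["value"])  # True False
```

Any layer that counts UTF-16 code units as characters, or re-encodes each unit independently, will emit the replacement character seen in the view output above.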
[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters
[ https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545511#comment-15545511 ] Paul Joseph Davis commented on COUCHDB-3173: Here's a simpler reproducer: https://gist.github.com/davisp/3cc1a0e5b0de04a3c027f694d5a4bc31 The contents of the gist are pasted below for posterity, but I dunno how well Jira and Chrome will store the raw byte values: repro.js: ["reset", {"reduce_limit":"true", "timeout":5000}] ["add_fun", "function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"] ["map_doc", {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"}] run.sh: cat repro.js | ./bin/couchjs share/server/main.js Should have a fix in a few minutes if I'm lucky. > Views return corrupt data for text fields containing non-BMP characters > --- > > Key: COUCHDB-3173 > URL: https://issues.apache.org/jira/browse/COUCHDB-3173 > Project: CouchDB > Issue Type: Bug > Components: JavaScript View Server >Affects Versions: 2.0.0 >Reporter: Loke > > When inserting a non-BMP character (i.e. characters with a Unicode codepoint > above {{U+FFFF}}), the content gets corrupted after reading it from a view. > At every instance of such characters, there is an extra {{U+FFFD REPLACEMENT > CHARACTER}} inserted into the text. > To reproduce, use the following commands. > Create the document containing a field with the character {{U+1F604 SMILING > FACE WITH OPEN MOUTH AND SMILING EYES}}: > {noformat} > $ curl -X PUT -d '{"type":"foo","value":"😄"}' http://localhost:5984/foo/foo2 > {"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"} > {noformat} > Get the document to ensure that it was saved properly: > {noformat} > curl -X GET http://localhost:5984/foo/foo2 > {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"} > {noformat} > Create a view that will return that document: > {noformat} > $ curl --user user:password -X PUT -d > '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}' > http://localhost:5984/foo/_design/bugdemo > {"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"} > {noformat} > Get the document from the view: > {noformat} > $ curl -X GET http://localhost:5984/foo/_design/bugdemo/_view/v > {"total_rows":1,"offset":0,"rows":[ > {"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"�"}} > ]} > {noformat} > Now we can see that the field {{value}} now contains two characters. The > original character as well as {{U+FFFD}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3143) Make Mango's MR index default limit match the docs
Paul Joseph Davis created COUCHDB-3143: -- Summary: Make Mango's MR index default limit match the docs Key: COUCHDB-3143 URL: https://issues.apache.org/jira/browse/COUCHDB-3143 Project: CouchDB Issue Type: Bug Components: Mango Reporter: Paul Joseph Davis We document that mango indexes return 25 rows per call by default but the code had a large value to basically return unlimited. Fix is to update mango to match the docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3101) Builtin reduce functions should not throw errors
[ https://issues.apache.org/jira/browse/COUCHDB-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422782#comment-15422782 ] Paul Joseph Davis commented on COUCHDB-3101: Adding an "error" key would be difficult, but the almost same format could be: {"key":null, "value":{"error": "invalid input from map function 'name'"}} Which I think would be easy enough. > Builtin reduce functions should not throw errors > > > Key: COUCHDB-3101 > URL: https://issues.apache.org/jira/browse/COUCHDB-3101 > Project: CouchDB > Issue Type: Bug > Components: View Server Support >Reporter: Paul Joseph Davis > > So I just figured out we have an issue with the builtin reduce functions. > Currently, if they receive invalid data they'll throw an error. Unfortunately > what ends up happening is that if the error is never corrected then the view > files end up becoming bloated and refusing to open (because they're searching > for a header as Jay pointed out the other week). > We should either return null or ignore the bad data. My preference would be > to return null so that it indicates bad data was given somewhere but I could > also see just dropping the bad value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
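The proposed row shape can be modeled with a small sketch (hypothetical Python; the actual builtin reducers are Erlang): a `_sum` that folds invalid input into an error-valued row instead of throwing, so the view can still build and the user still gets a signal that something was malformed.

```python
def builtin_sum(keys_values):
    """Sum numeric values; on any non-numeric value, return a row shaped
    like the {"key": null, "value": {"error": ...}} format proposed in
    the comment above instead of raising."""
    total = 0
    for _key, value in keys_values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return {"key": None,
                    "value": {"error": "invalid input from map function"}}
        total += value
    return {"key": None, "value": total}

print(builtin_sum([("a", 1), ("b", 2)]))       # {'key': None, 'value': 3}
print(builtin_sum([("a", 1), ("b", "oops")]))  # error row, no exception
```

As the comments note, the error value is scoped to the broken view: other views in the same design document reduce normally.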
[jira] [Commented] (COUCHDB-3101) Builtin reduce functions should not throw errors
[ https://issues.apache.org/jira/browse/COUCHDB-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418998#comment-15418998 ] Paul Joseph Davis commented on COUCHDB-3101: I should note, I prefer null because if we drop data the user doesn't realize was malformed then they'll have no signal that something is broken and may instead rely on probably invalid data out of the reducer. Also notice the null is just for any reducer that's broken it doesn't null out any other reducer or anything of that nature. So basically if say a user has a _sum reduce function and emits a string as a value this would return null for any reduce query for that specific view. Any other view in the same ddoc would be unaffected. > Builtin reduce functions should not throw errors > > > Key: COUCHDB-3101 > URL: https://issues.apache.org/jira/browse/COUCHDB-3101 > Project: CouchDB > Issue Type: Bug > Components: View Server Support >Reporter: Paul Joseph Davis > > So I just figured out we have an issue with the builtin reduce functions. > Currently, if they receive invalid data they'll throw an error. Unfortunately > what ends up happening is that if the error is never corrected then the view > files end up becoming bloated and refusing to open (because they're searching > for a header as Jay pointed out the other week). > We should either return null or ignore the bad data. My preference would be > to return null so that it indicates bad data was given somewhere but I could > also see just dropping the bad value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3101) Builtin reduce functions should not throw errors
Paul Joseph Davis created COUCHDB-3101: -- Summary: Builtin reduce functions should not throw errors Key: COUCHDB-3101 URL: https://issues.apache.org/jira/browse/COUCHDB-3101 Project: CouchDB Issue Type: Bug Components: View Server Support Reporter: Paul Joseph Davis So I just figured out we have an issue with the builtin reduce functions. Currently, if they receive invalid data they'll throw an error. Unfortunately what ends up happening is that if the error is never corrected then the view files end up becoming bloated and refusing to open (because they're searching for a header as Jay pointed out the other week). We should either return null or ignore the bad data. My preference would be to return null so that it indicates bad data was given somewhere but I could also see just dropping the bad value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3096) Fix config listener handler accumulation
Paul Joseph Davis created COUCHDB-3096: -- Summary: Fix config listener handler accumulation Key: COUCHDB-3096 URL: https://issues.apache.org/jira/browse/COUCHDB-3096 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Paul Joseph Davis We found an issue in production with config listeners piling up in the config_event gen_event server. This was due to how we fixed the API inconsistencies. We had re-parented the handler supervision to the config gen_server instead of the process that wanted config notifications. This means that since config never dies the handlers are never removed. The proposed patch just removes the config gen_server and uses a dedicated gen_server per event handler that handles the gen_event_EXIT messages. PR incoming after I have a ticket number. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-3067) Improve couch_log implementation
[ https://issues.apache.org/jira/browse/COUCHDB-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis closed COUCHDB-3067. -- Resolution: Fixed Assignee: Paul Joseph Davis Fix Version/s: 2.0.0 Done and done. This has been merged. > Improve couch_log implementation > > > Key: COUCHDB-3067 > URL: https://issues.apache.org/jira/browse/COUCHDB-3067 > Project: CouchDB > Issue Type: Improvement > Components: Logging >Reporter: Paul Joseph Davis >Assignee: Paul Joseph Davis > Fix For: 2.0.0 > > > The current couch_log implementation splits its configuration between > CouchDB's config app and lager's use of the standard sys.config system. > Generally speaking we don't use the fancy features of lager so there's not > much reason to keep it around. This ticket is to remove lager and its > dependencies and fix up the short comings of the existing couch_log app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3092) couch_log_writer_file_test failure on Windows
[ https://issues.apache.org/jira/browse/COUCHDB-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409532#comment-15409532 ] Paul Joseph Davis commented on COUCHDB-3092: [~wohali] Should be fixed on master now. > couch_log_writer_file_test failure on Windows > - > > Key: COUCHDB-3092 > URL: https://issues.apache.org/jira/browse/COUCHDB-3092 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Joan Touzet >Priority: Critical > Fix For: 2.0.0 > > > {noformat} > couch_log_writer_file_test: couch_log_writer_file_test_...*failed* > in function couch_log_writer_file_test:'-check_reopen/0-fun-1-'/2 > (test/couch_log_writer_file_test.erl, line 147) > in call from couch_log_writer_file_test:check_reopen/0 > (test/couch_log_writer_file_test.erl, line 147) > **error:{assertion_failed,[{module,couch_log_writer_file_test}, >{line,147}, >{expression,"element ( 3 , St3 ) /= element ( 3 , St2 )"}, >{expected,true}, >{value,false}]} > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3092) couch_log_writer_file_test failure on Windows
[ https://issues.apache.org/jira/browse/COUCHDB-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409520#comment-15409520 ] Paul Joseph Davis commented on COUCHDB-3092: Derp. This is simple. I'll just disable when on Windows. This is just testing that the file will reopen at the path if its been deleted (or moved for log rotation). I'll just disable on Windows. > couch_log_writer_file_test failure on Windows > - > > Key: COUCHDB-3092 > URL: https://issues.apache.org/jira/browse/COUCHDB-3092 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Joan Touzet >Priority: Critical > Fix For: 2.0.0 > > > {noformat} > couch_log_writer_file_test: couch_log_writer_file_test_...*failed* > in function couch_log_writer_file_test:'-check_reopen/0-fun-1-'/2 > (test/couch_log_writer_file_test.erl, line 147) > in call from couch_log_writer_file_test:check_reopen/0 > (test/couch_log_writer_file_test.erl, line 147) > **error:{assertion_failed,[{module,couch_log_writer_file_test}, >{line,147}, >{expression,"element ( 3 , St3 ) /= element ( 3 , St2 )"}, >{expected,true}, >{value,false}]} > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
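The skip-on-Windows pattern can be sketched with Python's unittest for illustration (the real change is in an Erlang EUnit test): the check depends on deleting a file that is still held open, which POSIX allows but Windows does not.

```python
import os
import tempfile
import unittest

class LogWriterReopenTest(unittest.TestCase):
    @unittest.skipIf(os.name == "nt",
                     "open files cannot be unlinked on Windows")
    def test_reopen_after_delete(self):
        path = os.path.join(tempfile.mkdtemp(), "couch.log")
        fh = open(path, "a")
        os.unlink(path)         # simulate log rotation removing the file
        fh.close()
        fh = open(path, "a")    # the writer reopens at the same path
        fh.write("after rotation\n")
        fh.close()
        self.assertTrue(os.path.exists(path))
```

Guarding the test is reasonable here because the behavior under test (reopening after an external delete) simply cannot occur on Windows filesystem semantics.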
[jira] [Created] (COUCHDB-3067) Improve couch_log implementation
Paul Joseph Davis created COUCHDB-3067: -- Summary: Improve couch_log implementation Key: COUCHDB-3067 URL: https://issues.apache.org/jira/browse/COUCHDB-3067 Project: CouchDB Issue Type: Bug Components: Logging Reporter: Paul Joseph Davis The current couch_log implementation splits its configuration between CouchDB's config app and lager's use of the standard sys.config system. Generally speaking we don't use the fancy features of lager so there's not much reason to keep it around. This ticket is to remove lager and its dependencies and fix up the short comings of the existing couch_log app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes
[ https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375998#comment-15375998 ] Paul Joseph Davis commented on COUCHDB-2791: Initial implementation seems to be working well enough. I've tested db info, single doc ops, _all_docs, _changes, and basic views. I've disabled any sort of write operation though as that would likely get a cluster into a very bad state if a user wasn't being careful. We can investigate adding write ops when we look at adding safety precautions in the storage engine. All in all this change is rather small and I'm actually fairly happy with how it's turned out. It's only superficially tested at this point so there will need to be more done there before we call it good. I only read enough between chttpd/fabric/couch_db to hopefully get all return values consistent. However it's possible I missed some differences here and there. Branches are up: https://github.com/apache/couchdb-couch/compare/master...cloudant:2791-allow-shard-access-through-cluster-port https://github.com/apache/couchdb-fabric/compare/master...cloudant:2791-allow-shard-access-through-cluster-port https://github.com/apache/couchdb-chttpd/compare/master...cloudant:2791-allow-shard-access-through-cluster-port Let me know what y'all think. > Allow for direct parallel access to shards via _changes > --- > > Key: COUCHDB-2791 > URL: https://issues.apache.org/jira/browse/COUCHDB-2791 > Project: CouchDB > Issue Type: New Feature > Components: Database Core >Reporter: Tony Sun >Assignee: Tony Sun > > For performance gains, we introduce a new _changes feed option parallel that > returns a list of urls that the user can use to directly access individual > shards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes
[ https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375345#comment-15375345 ] Paul Joseph Davis commented on COUCHDB-2791: I contemplated allowing writes and rejecting things but yeah, the level at which we'd enforce that becomes the issue. I could easily add it in new HTTP handlers I'll be adding but as you note it doesn't help anywhere else. But adding this low enough means that we're tagging individual shards with an idea of their shard range which would be the first time an individual shard has ever known it was part of a cluster database. Which isn't a big deal, its just that there's no current plumbing for that. I'm gonna start work on the read side and will contemplate writes as I go, but I could see it happening after pluggable storage engines land when we're starting to actually muck with the core storage bits again. > Allow for direct parallel access to shards via _changes > --- > > Key: COUCHDB-2791 > URL: https://issues.apache.org/jira/browse/COUCHDB-2791 > Project: CouchDB > Issue Type: New Feature > Components: Database Core >Reporter: Tony Sun >Assignee: Tony Sun > > For performance gains, we introduce a new _changes feed option parallel that > returns a list of urls that the user can use to directly access individual > shards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes
[ https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373851#comment-15373851 ] Paul Joseph Davis commented on COUCHDB-2791: I should note that we'll likely end up adding a few special API endpoints to help clients with changes feeds. There's some non-trivial shard replacement logic that's not available to people outside the database (plus it'd be good to have that logic in one place). Though I may be able to piggyback this onto the existing _shards endpoint by just expanding some of its capabilities. > Allow for direct parallel access to shards via _changes > --- > > Key: COUCHDB-2791 > URL: https://issues.apache.org/jira/browse/COUCHDB-2791 > Project: CouchDB > Issue Type: New Feature > Components: Database Core >Reporter: Tony Sun >Assignee: Tony Sun > > For performance gains, we introduce a new _changes feed option parallel that > returns a list of urls that the user can use to directly access individual > shards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2791) Allow for direct parallel access to shards via _changes
[ https://issues.apache.org/jira/browse/COUCHDB-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373845#comment-15373845 ] Paul Joseph Davis commented on COUCHDB-2791: I'm taking another look at this and contemplating a few different approaches. Originally when we were kicking this idea around the idea was purely motivated by trying to make the changes feed faster by allowing clients to stream from individual shards. However [~kxepal] makes a good point that it probably makes better sense to turn this into a more generic feature allowing access to individual shards. One thing I wanted to put out explicitly is that the idea for this is that it would be available to users over the clustered 5984 port if they want to do fancy advanced stuff client side. Ie, this isn't something for the 5986 port (and will try and avoid using 5986 things since we're looking to get rid of that anyway). Also, as I think about this I think it'd be bad to allow write/modification APIs across this new shard specific interface as that seems like it'd be a super easy way to mess up a clustered database by getting docs in the wrong shard and/or getting shards desynchronized with other settings. So for the time being at least I'm going to limit this to read-only APIs which will basically be fetching shard db info, individual docs, all docs, views, and changes off the top of my head. Beyond that I think I can make this happen as a change to chttpd plus some additional support code to fabric for the new local operations. The end result API I'm looking at will be something like this: http://hostname:5984/dbname/_shard/-/$rest Where $rest is any supported API call that will match the same operations in the cluster case. To implement this i'm planning on adding a new field to the #httpd record that selects the fabric module to use. By default this will be set to fabric which is the current default. 
I'll then add a fabric_local (or something, if anyone wants to suggest a better name) that will support just the set of things we want to export over this interface. This will then be fairly similar to fabric_rpc internally but without going through RPC/rexi calls and the like. Once that's done then we should hopefully be good to go for making everything work all magically. That seem sane to everyone? > Allow for direct parallel access to shards via _changes > --- > > Key: COUCHDB-2791 > URL: https://issues.apache.org/jira/browse/COUCHDB-2791 > Project: CouchDB > Issue Type: New Feature > Components: Database Core >Reporter: Tony Sun >Assignee: Tony Sun > > For performance gains, we introduce a new _changes feed option parallel that > returns a list of urls that the user can use to directly access individual > shards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
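The dispatch idea can be sketched as follows (hypothetical Python; the real change would live in chttpd's #httpd record and a new fabric-like module): requests under a _shard path select a local, read-only backend, while everything else keeps the clustered default. The names fabric/fabric_local come from the comment above; the shard range shown is only illustrative.

```python
READ_ONLY_OPS = {"db_info", "open_doc", "all_docs", "view", "changes"}

def select_backend(path):
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[1] == "_shard":
        return "fabric_local"   # per-shard access over the cluster port
    return "fabric"             # normal clustered behavior

def handle(path, operation):
    backend = select_backend(path)
    # Reject writes on shard endpoints, per the read-only restriction
    # discussed above.
    if backend == "fabric_local" and operation not in READ_ONLY_OPS:
        raise PermissionError("writes are not allowed on shard endpoints")
    return backend

print(handle("/dbname/_shard/00000000-7fffffff/_changes", "changes"))
# fabric_local
```

Keeping the backend choice in one field of the request record means the individual handlers stay unchanged, which matches the "rather small" diff described in the later comment.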
[jira] [Resolved] (COUCHDB-3036) Bug in fabric_db_update_listener breaks continuous changes feeds when a node is down
[ https://issues.apache.org/jira/browse/COUCHDB-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis resolved COUCHDB-3036. Resolution: Fixed Merged. > Bug in fabric_db_update_listener breaks continuous changes feeds when a node > is down > > > Key: COUCHDB-3036 > URL: https://issues.apache.org/jira/browse/COUCHDB-3036 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > A recent fix [1] to fabric_db_update_listener uncovered the fact that we were > never starting rexi monitors to know if a node went down during a changes > feed. Fixing that bug led us to realize that we don't handle rexi_DOWN > messages correctly in fabric_db_updater. > Patch is incoming. > [1] > https://github.com/apache/couchdb-fabric/commit/b592c390b99a198d6a051c6ed7b0280800cc2939 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3036) Bug in fabric_db_update_listener breaks continuous changes feeds when a node is down
[ https://issues.apache.org/jira/browse/COUCHDB-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324958#comment-15324958 ] Paul Joseph Davis commented on COUCHDB-3036: PR here: https://github.com/apache/couchdb-fabric/pull/56 > Bug in fabric_db_update_listener breaks continuous changes feeds when a node > is down > > > Key: COUCHDB-3036 > URL: https://issues.apache.org/jira/browse/COUCHDB-3036 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Paul Joseph Davis > > A recent fix [1] to fabric_db_update_listener uncovered the fact that we were > never starting rexi monitors to know if a node went down during a changes > feed. Fixing that bug led us to realize that we don't handle rexi_DOWN > messages correctly in fabric_db_updater. > Patch is incoming. > [1] > https://github.com/apache/couchdb-fabric/commit/b592c390b99a198d6a051c6ed7b0280800cc2939 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3036) Bug in fabric_db_update_listener breaks continuous changes feeds when a node is down
Paul Joseph Davis created COUCHDB-3036: -- Summary: Bug in fabric_db_update_listener breaks continuous changes feeds when a node is down Key: COUCHDB-3036 URL: https://issues.apache.org/jira/browse/COUCHDB-3036 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Paul Joseph Davis A recent fix [1] to fabric_db_update_listener uncovered the fact that we were never starting rexi monitors to know if a node went down during a changes feed. Fixing that bug led us to realize that we don't handle rexi_DOWN messages correctly in fabric_db_updater. Patch is incoming. [1] https://github.com/apache/couchdb-fabric/commit/b592c390b99a198d6a051c6ed7b0280800cc2939 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3026) fabric:open_revs doesn't filter out not_found replies anymore
[ https://issues.apache.org/jira/browse/COUCHDB-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308365#comment-15308365 ] Paul Joseph Davis commented on COUCHDB-3026: Yeap its a bug. I called out [1] that we were using remove ancestors wrong to handle this in my original comment on Alexander's original PR and forgot to make sure to add a function to handle the removal. Adding a simple filter is the right fix. [1] https://github.com/apache/couchdb-fabric/pull/35#issuecomment-152303652 > fabric:open_revs doesn't filter out not_found replies anymore > - > > Key: COUCHDB-3026 > URL: https://issues.apache.org/jira/browse/COUCHDB-3026 > Project: CouchDB > Issue Type: Bug >Reporter: ILYA > > Previously we filtered out `{{not_found,missing}, …}` replies in this line. > We don’t filter them out anymore. Therefore `fabric:open_revs` returns more > than one reply. In some places we assume that the return from open_revs is > always a list with one element in it. As a result we get a badmatch there. > Here is the list of places where we assume single reply: > - https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L699 > - > https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L1040:L1044 > - > https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L1209:L1210 > - > https://github.com/apache/couchdb-ddoc-cache/blob/master/src/ddoc_cache_opener.erl#L123 > - > https://github.com/apache/couchdb-fabric/blob/master/src/fabric_view.erl#L180:L183 > All above places are broken if we don't filter not_found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
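The "simple filter" fix can be sketched in Python (the real change is in Erlang's fabric_doc_open_revs; collapsing to a single not_found reply when every copy misses is an assumption): drop {not_found, missing} replies so callers that expect one reply per requested rev get exactly that.

```python
def filter_not_found(replies):
    """Drop not_found replies from the per-rev reply list."""
    found = [r for r in replies if r[0] != "not_found"]
    # If every copy was not_found, keep one reply so the caller still
    # sees the miss rather than an empty list.
    return found if found else [replies[0]]

replies = [("ok", {"_id": "doc", "_rev": "1-abc"}),
           ("not_found", "missing")]
print(filter_not_found(replies))
```

This keeps the single-reply invariant that the chttpd_db, ddoc_cache, and fabric_view call sites listed above all depend on.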
[jira] [Resolved] (COUCHDB-2863) function_clause on requesting multiple open_revs with lastest=true
[ https://issues.apache.org/jira/browse/COUCHDB-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Joseph Davis resolved COUCHDB-2863. Resolution: Fixed Fix Version/s: 2.0.0 The fix for this has been merged. > function_clause on requesting multiple open_revs with lastest=true > -- > > Key: COUCHDB-2863 > URL: https://issues.apache.org/jira/browse/COUCHDB-2863 > Project: CouchDB > Issue Type: Bug >Reporter: Alexander Shorin >Assignee: Alexander Shorin >Priority: Blocker > Labels: has-pr > Fix For: 2.0.0 > > > While working on COUCHDB-2857 we found another issue: > {code} > $ echo '{}' | http put http://localhost:15984/db/doc > { > "id": "doc", > "ok": true, > "rev": "1-967a00dff5e02add41819138abb3284d" > } > $ echo '{"_rev": "1-967a00dff5e02add41819138abb3284d"}' | http put > http://localhost:15984/db/doc > { > "id": "doc", > "ok": true, > "rev": "2-7051cbe5c8faecd085a3fa619e6e6337" > } > $ http > 'http://localhost:15984/db/doc?open_revs=["1-967a00dff5e02add41819138abb3284d", > "2-7051cbe5c8faecd085a3fa619e6e6337"]&latest=true' > {"error":"unknown_error","reason":"function_clause","ref":162084788} > $ cat dev/logs/node1.log > 2015-10-28 02:38:26.707 [error] node1@127.0.0.1 <0.1222.0> req_err(162084788) > unknown_error : function_clause > [<<"lists:zipwith/3 L450">>,<<"lists:zipwith/3 > L450">>,<<"fabric_doc_open_revs:handle_message/3 > L104">>,<<"rexi_utils:process_mailbox/6 L55">>,<<"rexi_utils:recv/6 > L49">>,<<"fabric_doc_open_revs:go/4 L47">>,<<"chttpd_db:db_doc_req/3 > L660">>,<<"chttpd:handle_request_int/1 L238">>] > 2015-10-28 02:38:26.707 [error] node1@127.0.0.1 <0.1222.0> httpd 500 error > response: > {"error":"unknown_error","reason":"function_clause","ref":162084788} > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
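For context on the stack trace above: `lists:zipwith/3` raises `function_clause` as soon as one of its two list arguments runs out before the other, which is the standard failure mode when a coordinator pairs requested revs with replies and the counts diverge. The sketch below only illustrates that mechanism; `safe_zip/3` is a hypothetical guard, not the fix that was merged.

```erlang
%% Illustrative only: lists:zipwith/3 demands equal-length lists, so a
%% length check turns the crash into an explicit error value.
safe_zip(F, As, Bs) when length(As) =:= length(Bs) ->
    {ok, lists:zipwith(F, As, Bs)};
safe_zip(_F, As, Bs) ->
    {error, {length_mismatch, length(As), length(Bs)}}.
```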
[jira] [Commented] (COUCHDB-2784) Re-optimize skip query-string parameter in clusters
[ https://issues.apache.org/jira/browse/COUCHDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712337#comment-14712337 ] Paul Joseph Davis commented on COUCHDB-2784: It's a more stringent requirement in that you have to be able to re-read from the same snapshot, or reverse the iteration direction in the current snapshot, multiple times. The reason is that when you send the new Foo start key, all but one of the RPC workers will most likely have to back up to Foo. Re-optimize skip query-string parameter in clusters --- Key: COUCHDB-2784 URL: https://issues.apache.org/jira/browse/COUCHDB-2784 Project: CouchDB Issue Type: Improvement Security Level: public(Regular issues) Components: Database Core Reporter: Adam Kocoloski In COUCHDB-977 we implemented a more efficient version of the skip function that relies on the document counts we maintain in the inner nodes of couch_btree. The 2.0 codebase did not initially take advantage of this enhancement, because when a user specifies `skip=X` we don't know a priori how many rows will be skipped from each shard. The current implementation tells each shard to not skip any rows and then has the coordinator discard the first N rows after doing the mergesort. It's O(N) complexity just like the bad old days before COUCHDB-977, and is actually substantially more expensive because of all the message traffic. The good news is we can do better. For a database with Q shards and a request specifying ?skip=X, we know that either a) at least one of the shards will end up skipping at least `X / Q` rows, or b) the entire response body will be empty. So, I propose the following: # Set the per-shard skip value to `X div Q` #* If a shard has fewer than `X div Q` rows remaining it should send its last row #* If `X div Q` is zero we can short-circuit and just use the current algorithm. # The coordinator sorts the first responses from each shard. 
It then sends the key of the row that sorts first (let's call it Foo) back to all the shards # Each shard counts the number of rows in between the original startkey and Foo and sends that number, then starts streaming with Foo as the new startkey # The coordinator deducts the computed per-shard skip values from the user-specified skip and then takes care of the remainder in the usual way we do it today (i.e. by consuming the rows as they come in). What do you think? Did I overlook anything here? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
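Step 1 of the proposal, including its short-circuit, can be sketched in a few lines. `per_shard_skip/2` is an illustrative name and not an existing fabric function; the atoms it returns are placeholders for whichever code paths the real implementation would take.

```erlang
%% Sketch of the proposed step 1: split the user-specified skip across
%% the Q shards, falling back to today's algorithm when X div Q is 0.
per_shard_skip(Skip, Q) when is_integer(Skip), Skip >= 0, Q > 0 ->
    case Skip div Q of
        0 -> current_algorithm;          % too small to be worth splitting
        PerShard -> {optimized, PerShard}
    end.
```

For example, with `?skip=1000` on a Q=128 database each shard would be told to skip 7 rows before the coordinator reconciles the remainder.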
[jira] [Commented] (COUCHDB-2784) Re-optimize skip query-string parameter in clusters
[ https://issues.apache.org/jira/browse/COUCHDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710135#comment-14710135 ] Paul Joseph Davis commented on COUCHDB-2784: Thinking about this for a while, I think it'd work. And in the worst case it'd be a logarithmic number of iterations to find the start key for each shard. I got successfully nerd-sniped trying to chase down a closed-form solution to get the number of iterations as a function of Q. Thanks for that. Anyway, the biggest thing I see is that this requires a snapshot with rewind capabilities, or re-reading from the same snapshot. We'll need to be careful in how we guarantee that. Currently, as long as we hold a #db{} record without reopening it, we'll be fine. But if we get fancier in the future this will require more thought, since the storage engine could change underneath our feet while performing this calculation. An alternative approach occurs to me that seems a bit easier to digest, while placing much stricter restrictions on our btree: do a merge sort of the btree traversal, if that makes any sense. Basically, we could insert clustered coordination into the traverse/skip decisions in couch_btree. Of course, that means that all storage would always have to be a btree written in Erlang to a fairly specific API. Re-optimize skip query-string parameter in clusters --- Key: COUCHDB-2784 URL: https://issues.apache.org/jira/browse/COUCHDB-2784 Project: CouchDB Issue Type: Improvement Security Level: public(Regular issues) Components: Database Core Reporter: Adam Kocoloski In COUCHDB-977 we implemented a more efficient version of the skip function that relies on the document counts we maintain in the inner nodes of couch_btree. The 2.0 codebase did not initially take advantage of this enhancement, because when a user specifies `skip=X` we don't know a priori how many rows will be skipped from each shard. 
The current implementation tells each shard to not skip any rows and then has the coordinator discard the first N rows after doing the mergesort. It's O(N) complexity just like the bad old days before COUCHDB-977, and is actually substantially more expensive because of all the message traffic. The good news is we can do better. For a database with Q shards and a request specifying ?skip=X, we know that either a) at least one of the shards will end up skipping at least `X / Q` rows, or b) the entire response body will be empty. So, I propose the following: # Set the per-shard skip value to `X div Q` #* If a shard has fewer than `X div Q` rows remaining it should send its last row #* If `X div Q` is zero we can short-circuit and just use the current algorithm. # The coordinator sorts the first responses from each shard. It then sends the key of the row that sorts first (let's call it Foo) back to all the shards # Each shard counts the number of rows between the original startkey and Foo and sends that number, then starts streaming with Foo as the new startkey # The coordinator deducts the computed per-shard skip values from the user-specified skip and then takes care of the remainder in the usual way we do it today (i.e. by consuming the rows as they come in). What do you think? Did I overlook anything here? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2732) Use thread local storage for couch_ejson_compare NIF
[ https://issues.apache.org/jira/browse/COUCHDB-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624988#comment-14624988 ] Paul Joseph Davis commented on COUCHDB-2732: When we saw this in testing, my recollection was that we'd probably missed it due to the concurrency issue. For reference, the test that we use to illustrate the performance difference is to set up a view on a clustered database with q=128 and then ask for a set of 10 rows from that view with a large number of clients bypassing HAProxy. The end result is that we end up having to call couch_ejson_compare an extremely large number of times when streaming the view response, in lots of different request-handling processes. This was enough to demonstrate that the mutex locking was a global bottleneck. On single-node couch the number of collations is significantly smaller because it doesn't have to merge the responses from all 128 shards before returning them to the user. Use thread local storage for couch_ejson_compare NIF Key: COUCHDB-2732 URL: https://issues.apache.org/jira/browse/COUCHDB-2732 Project: CouchDB Issue Type: Improvement Security Level: public(Regular issues) Reporter: Adam Kocoloski Some folks inside IBM have demonstrated conclusively that the NIF we use for JSON sorting is a significant bottleneck with more than a few concurrent users hitting us. The VM ends up spending all of its time dealing with lock contention. We'd be better off sticking with the pure Erlang code, but we have an even better alternative, which is to use thread local storage to pin an allocator to each OS thread and eliminate the locks. Patch forthcoming, but I wanted to make sure this got in the tracker. The improvement looks really significant. Interestingly, there was some discussion about a performance regression after this was introduced back in COUCHDB-1186 ... maybe the missing element in that discussion was the client concurrency? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)