[jira] [Closed] (COUCHDB-2965) Race condition in replicator rescan logic
[ https://issues.apache.org/jira/browse/COUCHDB-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Vatamaniuc closed COUCHDB-2965.
------------------------------------

> Race condition in replicator rescan logic
> -----------------------------------------
>
>                 Key: COUCHDB-2965
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2965
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Nick Vatamaniuc
>
> There is a race condition between the full rescan and the regular change
> feed processing in the couch_replicator_manager code.
> This race condition can leave replication docs in an untriggered state when
> a rescan of all the docs is performed. A rescan may happen when nodes
> connect and disconnect. The likelihood of hitting this race condition goes
> up when a lot of documents are updated and there is a backlog of messages
> in the replicator manager's mailbox.
> The race condition happens in the following way:
> * A full rescan is initiated here:
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L424
> It clears the db_to_seq ets table, which holds the latest change sequence
> for each replicator database, then launches a scan_all_dbs process.
> * scan_all_dbs finds all replicator-like databases and for each one sends a
> {resume_scan, DbName} message to the main couch_replicator_manager process.
> * The {resume_scan, DbName} message is handled here:
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L233
> The expectation is that, because db_to_seq was reset, no sequence
> checkpoint is found there, so the handler starts from 0 and spawns a new
> change feed, which rescans all documents (since we need to determine
> ownership for them).
> But the race occurs because when change feeds stop, they call the
> replicator manager with a {rep_db_checkpoint, DbName} message, which
> updates the db_to_seq ets table with the latest change sequence:
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L225
> Which means this sequence of operations could happen:
> * db_to_seq is reset to 0 and scan_all_dbs is spawned
> * a change feed stops at sequence 1042 and calls
> {rep_db_checkpoint, <<"_replicator">>}
> * the {rep_db_checkpoint, <<"_replicator">>} call is handled, so the latest
> db_to_seq entry for _replicator is now 1042
> * {resume_scan, <<"_replicator">>} is sent from the scan_all_dbs process
> and received by the replicator manager. It sees that db_to_seq has
> _replicator at sequence 1042, so it starts from there instead of 0, thus
> skipping updates from 0 to 1042.
> This was observed in an experiment in which 1000 replication documents were
> being updated. Around document 700, node1 was killed (pkill -f node1).
> node2 hit the race condition on rescan and never picked up a number of
> documents that should have belonged to it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
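The interleaving described above can be sketched as a minimal model, here in Python rather than the Erlang of couch_replicator_manager (function and variable names are illustrative, not the actual API):

```python
# Minimal model of the COUCHDB-2965 race (names are illustrative only;
# the real logic lives in couch_replicator_manager.erl).

db_to_seq = {}  # replicator db name -> latest processed change sequence


def full_rescan():
    # The rescan clears all checkpoints, expecting feeds to restart at 0.
    db_to_seq.clear()


def rep_db_checkpoint(db_name, seq):
    # A stopping change feed records its final sequence. If this lands
    # after the rescan has cleared the table, it re-introduces a stale
    # checkpoint -- the heart of the race.
    db_to_seq[db_name] = seq


def resume_scan(db_name):
    # The manager trusts whatever checkpoint it finds and resumes there.
    return db_to_seq.get(db_name, 0)


# The racy interleaving from the report:
full_rescan()                            # db_to_seq reset, scan_all_dbs spawned
rep_db_checkpoint("_replicator", 1042)   # old change feed stops late
start = resume_scan("_replicator")       # expected 0, actually 1042
print(start)                             # -> 1042: updates 0..1042 are skipped
```

In the race-free ordering, resume_scan would run before any rep_db_checkpoint arrives, find no entry, and return 0, triggering the intended full rescan of the documents.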
[jira] [Resolved] (COUCHDB-2965) Race condition in replicator rescan logic
[ https://issues.apache.org/jira/browse/COUCHDB-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Vatamaniuc resolved COUCHDB-2965.
--------------------------------------
    Resolution: Fixed

> Race condition in replicator rescan logic
> -----------------------------------------
>
>                 Key: COUCHDB-2965
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2965
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Nick Vatamaniuc
[jira] [Commented] (COUCHDB-2984) mem3_sync event listener performance degrades with high q values
[ https://issues.apache.org/jira/browse/COUCHDB-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277086#comment-15277086 ]

ASF GitHub Bot commented on COUCHDB-2984:
-----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/couchdb-mem3/pull/19

> mem3_sync event listener performance degrades with high q values
> ----------------------------------------------------------------
>
>                 Key: COUCHDB-2984
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2984
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Benjamin Anderson
>
> High throughput applications on databases with high (300+) q values have a
> tendency to cause very poor performance. While I don't fully understand the
> issue at hand, one clear manifestation is in mem3_sync's event listener.
> With high q values, the shard "selection" routine (the
> <<"shards/", _/binary>> head of handle_event/3) will bottleneck on calls to
> mem3_shards:for_db/1 due to the large (tens of KB) shard maps in ETS.
[jira] [Commented] (COUCHDB-2984) mem3_sync event listener performance degrades with high q values
[ https://issues.apache.org/jira/browse/COUCHDB-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277081#comment-15277081 ]

ASF subversion and git services commented on COUCHDB-2984:
----------------------------------------------------------

Commit d3ce2273c0c1eba5b4107e7bb0a83aaa1736cc6a in couchdb-mem3's branch
refs/heads/master from [~banjiewen]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb-mem3.git;h=d3ce227 ]

Refactor mem3_sync events to dedicated module

COUCHDB-2984

> mem3_sync event listener performance degrades with high q values
> ----------------------------------------------------------------
[jira] [Commented] (COUCHDB-2984) mem3_sync event listener performance degrades with high q values
[ https://issues.apache.org/jira/browse/COUCHDB-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277084#comment-15277084 ]

ASF subversion and git services commented on COUCHDB-2984:
----------------------------------------------------------

Commit 0b70afb7cc11f39e894a37211349e711facb in couchdb-mem3's branch
refs/heads/master from [~banjiewen]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb-mem3.git;h=0b70afb ]

Add read_concurrency option to mem3_shards table

This table sees a great deal of activity from various subsystems - turning
on read_concurrency should be a win.

COUCHDB-2984

> mem3_sync event listener performance degrades with high q values
> ----------------------------------------------------------------
[jira] [Commented] (COUCHDB-2984) mem3_sync event listener performance degrades with high q values
[ https://issues.apache.org/jira/browse/COUCHDB-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277082#comment-15277082 ]

ASF subversion and git services commented on COUCHDB-2984:
----------------------------------------------------------

Commit d5e0a4a19de99b2c6a91c9de8a1bc120664e36d5 in couchdb-mem3's branch
refs/heads/master from [~banjiewen]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb-mem3.git;h=d5e0a4a ]

Reduce frequency of mem3_sync:push/2 calls

In high-throughput scenarios on databases with large q values the mem3_sync
event listener becomes overloaded with messages due to the poor performance
of the shard selection logic. It's not strictly necessary to sync on every
update, but we do need to be careful not to lose updates by keeping history
too naively. This patch adds a configurable delay and push frequency to
reduce pressure on the mem3_sync event listener.

COUCHDB-2984

> mem3_sync event listener performance degrades with high q values
> ----------------------------------------------------------------
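The coalescing idea behind the configurable delay and push frequency can be sketched as a small wrapper (a Python stand-in with hypothetical names; the actual patch lives in the Erlang mem3_sync event listener and its config):

```python
import time


class SyncPusher:
    """Coalesce per-shard sync requests so mem3_sync:push/2-style calls
    happen at most once per `delay` seconds per shard. Sketch only; the
    real patch batches inside the mem3_sync event listener."""

    def __init__(self, push_fn, delay=5.0):
        self.push_fn = push_fn
        self.delay = delay
        self.last_push = {}   # shard -> timestamp of last actual push
        self.pending = set()  # shards updated since their last push

    def on_update(self, shard, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_push.get(shard, float("-inf"))
        if now - last >= self.delay:
            self.last_push[shard] = now
            self.pending.discard(shard)
            self.push_fn(shard)
        else:
            # Remember the update without pushing; flush() catches it later,
            # so no update is lost -- it is only delayed.
            self.pending.add(shard)

    def flush(self, now=None):
        now = time.monotonic() if now is None else now
        for shard in list(self.pending):
            self.on_update(shard, now)


pushes = []
p = SyncPusher(pushes.append, delay=5.0)
p.on_update("shards/00000000-1fffffff/db", now=0.0)    # pushed immediately
for t in (0.1, 0.2, 0.3):
    p.on_update("shards/00000000-1fffffff/db", now=t)  # coalesced
p.flush(now=6.0)                                       # one catch-up push
print(len(pushes))                                     # -> 2
```

Three rapid updates collapse into a single deferred push, which is the pressure reduction the commit message describes.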
[jira] [Commented] (COUCHDB-2984) mem3_sync event listener performance degrades with high q values
[ https://issues.apache.org/jira/browse/COUCHDB-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277083#comment-15277083 ]

ASF subversion and git services commented on COUCHDB-2984:
----------------------------------------------------------

Commit 130efcd6b0b6f9fa6403e131586dfbf003643074 in couchdb-mem3's branch
refs/heads/master from [~banjiewen]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb-mem3.git;h=130efcd ]

Use ets:select/2 to retrieve shards by name

The result of mem3_shards:for_db/1 on databases with high q values can be
very large, resulting in suboptimal performance for high-volume callers.
mem3_sync_event_listener is only interested in a small subset of the result
of mem3_shards:for_db/1; moving this filter into an ets:select/2 call
improves performance significantly.

COUCHDB-2984

> mem3_sync event listener performance degrades with high q values
> ----------------------------------------------------------------
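The difference between fetching the full shard map and selecting only the matching rows can be sketched like this (a Python dict standing in for the mem3_shards ETS table; names and shapes are illustrative, the real fix uses an ets:select/2 match spec):

```python
# Stand-in for the mem3_shards ETS table: db name -> full shard map.
# With a high q, a for_db-style lookup copies a very large list to the
# caller on every event, even though only one shard is of interest.
shards_by_db = {
    "big_db": [
        {"name": "shards/%08x-%08x/big_db" % (i, i + 1),
         "node": "node%d" % (i % 3)}
        for i in range(512)  # q=512: hundreds of entries per lookup
    ],
}


def for_db(db):
    # Old approach: hand the whole shard map back; the caller filters.
    return list(shards_by_db.get(db, []))


def shards_for_name(db, shard_name):
    # New approach: filter inside the "table" scan, analogous to pushing
    # the predicate into ets:select/2, so only matching rows come back.
    return [s for s in shards_by_db.get(db, []) if s["name"] == shard_name]


target = shards_by_db["big_db"][7]["name"]
print(len(for_db("big_db")))                   # -> 512 rows copied out
print(len(shards_for_name("big_db", target)))  # -> 1 row, the one needed
```

The win in the real code is that ETS does the filtering without building and copying the huge intermediate list into the caller's heap.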
[GitHub] couchdb-mem3 pull request: Improve mem3_sync event listener perfor...
Github user asfgit closed the pull request at:

    https://github.com/apache/couchdb-mem3/pull/19

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] couchdb-mem3 pull request: Improve mem3_sync event listener perfor...
Github user banjiewen commented on the pull request:

    https://github.com/apache/couchdb-mem3/pull/19#issuecomment-217993884

    Squashed.
[GitHub] couchdb-chttpd pull request: Add new metric (histogram) to track n...
Github user kxepal commented on a diff in the pull request:

    https://github.com/apache/couchdb-chttpd/pull/118#discussion_r62553105

    --- Diff: src/chttpd_db.erl ---
    @@ -448,6 +448,7 @@ db_req(#httpd{method='POST', path_parts=[_, <<"_bulk_get">>]}=Req, Db) ->
         undefined ->
             throw({bad_request, <<"Missing JSON list of 'docs'.">>});
         Docs ->
    +couch_stats:update_histogram([couchdb, httpd, bulk_reads], length(Docs)),
    --- End diff --

    I think the name is confusing. If we want to count only "/_bulk_get"
    requests here, better to give such a name to the metric instead of the
    generic bulk_reads. Otherwise, if you want to use "bulk_reads", should
    we count requests with include_docs=true to views and all_docs as bulk
    reads too?
[GitHub] couchdb-chttpd pull request: Add new metric (histogram) to track n...
Github user brkolla commented on the pull request:

    https://github.com/apache/couchdb-chttpd/pull/118#issuecomment-217954638

    @chewbranca I have updated the code to add a new metric, bulk_reads, to
    track the number of doc reads done as part of the bulk_get. Can you
    review this code?
[GitHub] couchdb-mem3 pull request: Improve mem3_sync event listener perfor...
Github user banjiewen commented on the pull request:

    https://github.com/apache/couchdb-mem3/pull/19#issuecomment-217951954

    @kocolosk: This exact changeset hasn't been tested, but a functionally
    equivalent one has; the Cloudant mem3 isn't quite in sync with the
    Apache mem3. Performance results were indeed quite positive.

    /cc @chewbranca
[GitHub] couchdb-mem3 pull request: Make sure mem3_rep autocreates target s...
Github user iilyak commented on a diff in the pull request:

    https://github.com/apache/couchdb-mem3/pull/21#discussion_r62547512

    --- Diff: src/mem3_rpc.erl ---
    @@ -275,6 +275,16 @@ rexi_call(Node, MFA) ->
         end.

    +get_or_create_db(DbName, Options) ->
    +case couch_db:open_int(DbName, Options) of
    +{not_found, no_db_file} ->
    +twig:log(warn, "~p creating ~s", [?MODULE, DbName]),
    --- End diff --

    good catch
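The diff adds a get-or-create shape on the RPC side: try to open the target shard, and create it if the open reports not-found. A hedged sketch of the same pattern (Python, with hypothetical helpers standing in for couch_db:open_int and the create path):

```python
# Sketch of the get-or-create pattern from the diff (names hypothetical;
# couch_db:open_int and {not_found, no_db_file} are the real Erlang side).

class NotFound(Exception):
    """Stands in for the {not_found, no_db_file} return value."""


def open_db(name, registry):
    # registry: dict standing in for the server's database registry
    if name not in registry:
        raise NotFound(name)
    return registry[name]


def get_or_create_db(name, registry):
    try:
        return open_db(name, registry)
    except NotFound:
        # The real patch logs a warning here (twig:log) before creating.
        registry[name] = {"name": name, "docs": {}}
        return registry[name]


registry = {}
db = get_or_create_db("shards/00000000-1fffffff/mydb", registry)    # created
same = get_or_create_db("shards/00000000-1fffffff/mydb", registry)  # reopened
print(db is same)  # -> True
```

The point of doing this on the target node is that internal replication no longer fails when the target shard file does not exist yet.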
[jira] [Commented] (COUCHDB-3011) Unable to create or access database with "/" in name when using futon
[ https://issues.apache.org/jira/browse/COUCHDB-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276605#comment-15276605 ]

Dane Jones commented on COUCHDB-3011:
-------------------------------------

See TurnKey Linux's issue ticket at:
https://github.com/turnkeylinux/tracker/issues/629

> Unable to create or access database with "/" in name when using futon
> ---------------------------------------------------------------------
>
>                 Key: COUCHDB-3011
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3011
>             Project: CouchDB
>          Issue Type: Question
>          Components: Futon, HTTP Interface
>    Affects Versions: 1.6.1
>            Reporter: Dane Jones
>              Labels: database, forwardslash, no_db_file
>
> First of all, this CouchDB v1.6.1 appliance was acquired from
> TurnKeyLinux.com
> I'm attempting to create a database named "artist/guid".
> In Futon, using the UI with the name "artist/guid" returns: "no_db_file"
> With the HTTP API I used:
> curl -u admin -X PUT http://127.0.0.1:5984/artist%2Fguid/
> which returns successfully.
> Futon lists the DB, but upon accessing it I get the "no_db_file" error
> again.
> If I create a document within the database using the HTTP API it seems to
> work fine:
> curl -u admin -X POST http://127.0.0.1:5984/artist%2Fguid/ -H "Content-Type: application/json" -d {}
> *edited*
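The curl commands above encode the "/" in the database name as %2F by hand; a client can derive that encoding with the standard library (a sketch; the host and port are the reporter's local defaults):

```python
from urllib.parse import quote


def db_url(base, db_name):
    # safe="" ensures "/" inside the db name is encoded as %2F instead of
    # being left as a path separator (quote() keeps "/" by default).
    return "%s/%s" % (base.rstrip("/"), quote(db_name, safe=""))


url = db_url("http://127.0.0.1:5984", "artist/guid")
print(url)  # -> http://127.0.0.1:5984/artist%2Fguid
```

Any proxy in front of CouchDB must pass this %2F through unchanged; a proxy that decodes it back to "/" before forwarding reproduces exactly the no_db_file symptom described here.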
[jira] [Closed] (COUCHDB-3011) Unable to create or access database with "/" in name when using futon
[ https://issues.apache.org/jira/browse/COUCHDB-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dane Jones closed COUCHDB-3011.
-------------------------------
    Resolution: Not A Problem

Issue is with TurnKey Linux's configuration of Nginx and not Futon.

> Unable to create or access database with "/" in name when using futon
> ---------------------------------------------------------------------
[jira] [Commented] (COUCHDB-3011) Unable to create or access database with "/" in name when using futon
[ https://issues.apache.org/jira/browse/COUCHDB-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276602#comment-15276602 ]

Dane Jones commented on COUCHDB-3011:
-------------------------------------

After working with the TurnKey Linux team, the issue appears to be caused by
the Nginx proxy. The full details are not yet hashed out, but I'm confident
that the issue is not with Futon.

> Unable to create or access database with "/" in name when using futon
> ---------------------------------------------------------------------
[jira] [Commented] (COUCHDB-3009) Cluster node databases unreadable when first node in cluster is down
[ https://issues.apache.org/jira/browse/COUCHDB-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276415#comment-15276415 ]

Jason Gordon commented on COUCHDB-3009:
---------------------------------------

Thanks for the pointer! You're right. What seems to be happening is that the
./dev/run script puts the shards on one node fewer than requested (n-1).
With ./dev/run -n 2, shards are only placed on node1. With ./dev/run
(default n=3), shards get placed on node1 and node2. And with ./dev/run -n 4,
shards get placed on node1, node2, and node3. BTW, I tried the same thing on
a Mac instead of CentOS and did not have this issue.

This is the scenario where two nodes are running on a single CentOS machine.
With all nodes up and everything running fine (./dev/run -n 2 -a admin:x):

curl -X GET "http://198.72.252.244:15984/_membership" --user admin:x
{"all_nodes":["node1@198.72.252.244","node2@198.72.252.244"],"cluster_nodes":["node1@198.72.252.244","node2@198.72.252.244"]}

curl -X GET "http://198.72.252.244:25984/_membership" --user admin:x
{"all_nodes":["node1@198.72.252.244","node2@198.72.252.244"],"cluster_nodes":["node1@198.72.252.244","node2@198.72.252.244"]}

curl -X GET "http://198.72.252.244:15984/_users" --user admin:x
{"db_name":"_users","update_seq":"1-g1FreJzLYWBg4MhgTmEQyctPSTV0MLS00DM01DMyNdIzMjHJAcoyJTIkyf___z8rkQG_uiQFIJlkT5RSB5DSeLBSRgJKE0BK64kxNY8FSDI0ACmg6vlEKl8AUb6fSOUHIMrvE6n8AUQ5yO1ZAL6wXv8","sizes":{"file":38110,"external":2003,"active":2199},"purge_seq":0,"other":{"data_size":2003},"doc_del_count":0,"doc_count":1,"disk_size":38110,"disk_format_version":6,"data_size":2199,"compact_running":false,"instance_start_time":"0"}

curl -X GET "http://198.72.252.244:25984/_users" --user admin:x
{"db_name":"_users","update_seq":"1-g1FreJzLYWBg4MhgTmEQyctPSTV0MLS00DM01DMyNdIzMjHJAcoyJTIkyf___z8rkQG_uiQFIJlkT5RSB5DSeLBSRgJKE0BK64kxNY8FSDI0ACmg6vlEKl8AUb6fSOUHIMrvE6n8AUQ5yO1ZAL6wXv8","sizes":{"file":38110,"external":2003,"active":2199},"purge_seq":0,"other":{"data_size":2003},"doc_del_count":0,"doc_count":1,"disk_size":38110,"disk_format_version":6,"data_size":2199,"compact_running":false,"instance_start_time":"0"}

curl -X GET "http://198.72.252.244:15984/_users/_shards" --user admin:x
{"shards":{"-1fff":["node1@198.72.252.244"],"2000-3fff":["node1@198.72.252.244"],"4000-5fff":["node1@198.72.252.244"],"6000-7fff":["node1@198.72.252.244"],"8000-9fff":["node1@198.72.252.244"],"a000-bfff":["node1@198.72.252.244"],"c000-dfff":["node1@198.72.252.244"],"e000-":["node1@198.72.252.244"]}}

curl -X GET "http://198.72.252.244:25984/_users/_shards" --user admin:x
{"shards":{"-1fff":["node1@198.72.252.244"],"2000-3fff":["node1@198.72.252.244"],"4000-5fff":["node1@198.72.252.244"],"6000-7fff":["node1@198.72.252.244"],"8000-9fff":["node1@198.72.252.244"],"a000-bfff":["node1@198.72.252.244"],"c000-dfff":["node1@198.72.252.244"],"e000-":["node1@198.72.252.244"]}}

> Cluster node databases unreadable when first node in cluster is down
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-3009
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3009
>             Project: CouchDB
>          Issue Type: Bug
>          Components: BigCouch, Database Core
>    Affects Versions: 2.0.0
>            Reporter: Jason Gordon
>
> After creating 3 nodes in a cluster, if the first node is taken down, the
> other two nodes' default databases (_global_changes, _metadata, _replicator,
> _users) become unreadable with the error 500
> {"error":"nodedown","reason":"progress not possible"}.
> Bringing up the first node restores access. However, if the first node is
> down, restarting nodes 2 and 3 does not restore access and also causes the
> user databases to become unreachable.
> Note, only the first node created in the cluster causes this problem. As
> long as node 1 is up, nodes 2 and 3 can go up and down without having an
> issue.
> Log messages seen on nodes 2 and 3:
> 15:23:46.388 [notice] cassim_metadata_cache changes listener died
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.388 [error] Error in process <0.27407.0> on node
> 'couchdb@198.72.252.245' with exit value:
> {{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,190}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}
> 15:23:46.389 [notice]
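The _shards responses in the comment show every range hosted only on node1, which is why losing node1 yields "progress not possible". That check can be sketched directly (Python; the shard-map literal below is abbreviated and its range labels are illustrative, since the transcript's labels are truncated):

```python
# Which shard ranges lose their last replica when the given nodes go down?
def unreachable_ranges(shard_map, down_nodes):
    down = set(down_nodes)
    # A range is unreachable when every node hosting it is down.
    return sorted(r for r, nodes in shard_map.items() if set(nodes) <= down)


# Abbreviated version of the _shards output: every range on node1 only.
shard_map = {
    "00000000-1fffffff": ["node1@198.72.252.244"],
    "20000000-3fffffff": ["node1@198.72.252.244"],
    "e0000000-ffffffff": ["node1@198.72.252.244"],
}

print(unreachable_ranges(shard_map, ["node1@198.72.252.244"]))
# every range listed: no surviving copy, so reads cannot make progress

print(unreachable_ranges(shard_map, ["node2@198.72.252.244"]))
# -> [] : losing node2 costs nothing, matching the reporter's observation
```

With a correctly built cluster each range would list n nodes, and a single node going down would leave this set empty.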
[jira] [Created] (COUCHDB-3012) HTTPS / SSL certificate from letsencrypt do not work with couchDB but with apache webserver
Lutz Hohle created COUCHDB-3012:
-----------------------------------

             Summary: HTTPS / SSL certificate from letsencrypt does not work with couchDB but does with the apache webserver
                 Key: COUCHDB-3012
                 URL: https://issues.apache.org/jira/browse/COUCHDB-3012
             Project: CouchDB
          Issue Type: Bug
          Components: HTTP Interface
            Reporter: Lutz Hohle

I have installed couchdb on my vserver, which says:

{"couchdb":"Welcome","uuid":"6f37da04e99f90f8f2ba3b8165202922","version":"1.6.1","vendor":{"name":"Ubuntu","version":"12.04"}}

I wanted to add SSL, so I used https://letsencrypt.org/ to get free &
automated HTTPS certificates which are authority signed. These certificates
work well (tested with Chrome) with the apache webserver. I've spent the
whole night, but unfortunately I have not gotten couchDB to work with them.
Chrome says ERR_CONNECTION_RESET when I try to connect using
https://mydomain.de:6984

The problem is also described in detail by another user here:
http://serverfault.com/questions/743452/configure-couchdb-with-lets-encrypt-ssl-certificate

When I do an https request I get this log:

[Mon, 09 May 2016 10:54:04 GMT] [error] [<0.178.0>] {error_report,<0.62.0>,
    {<0.178.0>,std_error,
     [83,83,76,58,32,"1095",58,32,"error",58,
      [123,["try_clause",44,[123,["error",44,"eacces"],125]],125],
      32,
      "/etc/letsencrypt/live/www.digiscales.de/cert.pem",
      "\n",32,32,
      [91,
       [[123,["ssl_manager",44,"cache_pem_file",44,"2"],125],
        44,10," ",
        [123,["ssl_certificate",44,"file_to_certificats",44,"2"],125],
        44,10," ",
        [123,["ssl_connection",44,"init_certificates",44,"6"],125],
        44,10," ",
        [123,["ssl_connection",44,"ssl_init",44,"2"],125],
        44,10," ",
        [123,["ssl_connection",44,"init",44,"1"],125],
        44,10," ",
        [123,["gen_fsm",44,"init_it",44,"6"],125],
        44,10," ",
        [123,["proc_lib",44,"init_p_do_apply",44,"3"],125]],
       93],
      "\n"]}}

(the nested list above decodes to: SSL: 1095: error:
{try_clause,{error,eacces}} /etc/letsencrypt/live/www.digiscales.de/cert.pem)

[Mon, 09 May 2016 10:54:04 GMT] [error] [<0.178.0>] {error_report,<0.62.0>,
    {<0.178.0>,crash_report,
     [[{initial_call,{ssl_connection,init,['Argument__1']}},
       {pid,<0.178.0>},
       {registered_name,[]},
       {error_info,
        {exit,ecertfile,
         [{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}},
       {ancestors,[ssl_connection_sup,ssl_sup,<0.63.0>]},
       {messages,[]},
       {links,[<0.67.0>]},
       {dictionary,[]},
       {trap_exit,false},
       {status,running},
       {heap_size,987},
       {stack_size,24},
       {reductions,1488}],
      []]}}

[Mon, 09 May 2016 10:54:04 GMT] [error] [<0.153.0>] {error_report,<0.31.0>,
    {<0.153.0>,std_error,
     [{application,mochiweb},
      "Accept failed error",
      "{error,ecertfile}"]}}

[Mon, 09 May 2016 10:54:04 GMT] [error] [<0.153.0>] {error_report,<0.31.0>,
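The error buried in the first report is {try_clause,{error,eacces}} while reading cert.pem: the Erlang VM lacks read permission on the Let's Encrypt files (the /etc/letsencrypt/live directory is root-only by default), so this is a file-permission problem rather than an SSL one. A quick readability check, to be run as the user the CouchDB process runs as (the cert.pem path is taken from the log; the privkey.pem path is an assumption based on the usual Let's Encrypt layout):

```python
import os


def unreadable(paths):
    # os.access reflects the *current* user's permissions, so run this as
    # the user the CouchDB/Erlang process runs as, not as root.
    return [p for p in paths if not os.access(p, os.R_OK)]


cert_files = [
    "/etc/letsencrypt/live/www.digiscales.de/cert.pem",     # from the log
    "/etc/letsencrypt/live/www.digiscales.de/privkey.pem",  # assumed path
]

for path in unreadable(cert_files):
    print("not readable by this user:", path)
```

Any path printed here would produce exactly the eacces / ecertfile errors in the log; fixing ownership or copying the PEM files somewhere the CouchDB user can read is the usual remedy.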