[jira] [Closed] (COUCHDB-3245) couchjs -S option doesn't have any effect
[ https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Vatamaniuc closed COUCHDB-3245.

> couchjs -S option doesn't have any effect
>
> Key: COUCHDB-3245
> URL: https://issues.apache.org/jira/browse/COUCHDB-3245
> Project: CouchDB
> Issue Type: Bug
> Reporter: Nick Vatamaniuc
>
> Currently the -S option of couchjs sets the stack _chunk_ size for JS contexts.
> Reference:
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext
> The documentation recommends 8K, and I have seen cases where it was raised to
> 1G+ in production! That doesn't seem right at all, and it probably also kills
> performance and eats memory.
> Docs from above say:
> > The stackchunksize parameter does not control the JavaScript stack size.
> > (The JSAPI does not provide a way to adjust the stack depth limit.) Passing
> > a large number for stackchunksize is a mistake. In a DEBUG build, large
> > chunk sizes can degrade performance dramatically. The usual value of 8192
> > is recommended
> Instead we should be setting the max GC value, which is set in the runtime:
> {{JS_NewRuntime(uint32_t maxbytes)}}
> Experimentally, a large maxbytes seems to fix the out-of-memory error caused by
> large views. I suspect that it works because it stops GC. At some point we
> probably drop some object, GC collects it, and we crash...

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Resolved] (COUCHDB-3245) couchjs -S option doesn't have any effect
[ https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Vatamaniuc resolved COUCHDB-3245.

Resolution: Fixed
[jira] [Commented] (COUCHDB-3245) couchjs -S option doesn't have any effect
[ https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085228#comment-16085228 ]

Nick Vatamaniuc commented on COUCHDB-3245:

@gdelfino the issue was fixed; I just forgot to close the ticket.
[jira] [Created] (COUCHDB-3404) Improve ./dev/run command to allow overriding config values from a file
Nick Vatamaniuc created COUCHDB-3404: Summary: Improve ./dev/run command to allow overriding config values from a file Key: COUCHDB-3404 URL: https://issues.apache.org/jira/browse/COUCHDB-3404 Project: CouchDB Issue Type: Improvement Reporter: Nick Vatamaniuc Allow passing a config file path to ./dev/run and have those values be applied to the running dev cluster instance. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3389) Bring back jittered delay during replication shard scan
Nick Vatamaniuc created COUCHDB-3389:

Summary: Bring back jittered delay during replication shard scan
Key: COUCHDB-3389
URL: https://issues.apache.org/jira/browse/COUCHDB-3389
Project: CouchDB
Issue Type: Bug
Reporter: Nick Vatamaniuc

When we switched to using the mem3 db for shard discovery, we dropped the jittered delay during shard scan. On a large production system with thousands of replicator dbs, back-to-back shard notifications, which spawn change feeds, can cause performance issues.

https://github.com/apache/couchdb/blob/884cf3e55f77ab1a5f26dc7202ce21771062eae6/src/couch_replicator_manager.erl#L940-L946
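To illustrate the idea of a jittered delay, here is a minimal Python sketch. It is not the Erlang code at the link above; the function name `jittered_schedule`, the `max_jitter_sec` parameter, and the shard names are all made up for illustration. Each discovered shard gets a random start delay, so change feeds are spawned spread over a window instead of back to back.

```python
import random


def jittered_schedule(shards, max_jitter_sec=30.0, rng=None):
    """Assign each shard a random start delay between 0 and max_jitter_sec.

    Returns a list of (delay_seconds, shard) tuples sorted by delay, i.e.
    the order in which the (hypothetical) change feeds would be started.
    """
    rng = rng or random.Random()
    schedule = [(rng.uniform(0.0, max_jitter_sec), shard) for shard in shards]
    return sorted(schedule)


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    shards = ["shard-%02d/_replicator" % i for i in range(5)]
    for delay, shard in jittered_schedule(shards, max_jitter_sec=5.0):
        print("%6.2fs  %s" % (delay, shard))
```

The upper bound on the jitter trades startup latency for a smoother load: a larger window delays the last feed longer but lowers the number of feeds started per second.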
[jira] [Created] (COUCHDB-3386) Use plugin-based authentication for transient replication cancelation
Nick Vatamaniuc created COUCHDB-3386: Summary: Use plugin-based authentication for transient replication cancelation Key: COUCHDB-3386 URL: https://issues.apache.org/jira/browse/COUCHDB-3386 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc Currently there is a direct check for <<"_admin">> in roles. Instead for consistency use https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db.erl#L434 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3371) Investigate putting user specified replication filtering function in replication document
Nick Vatamaniuc created COUCHDB-3371:

Summary: Investigate putting user specified replication filtering function in replication document
Key: COUCHDB-3371
URL: https://issues.apache.org/jira/browse/COUCHDB-3371
Project: CouchDB
Issue Type: Improvement
Reporter: Nick Vatamaniuc

Investigate letting users specify the filter function in the replication document. There are two main reasons for it:

1) User-specified filters live on the source, and the filter code contents are used to generate replication IDs. So in order to even create a replication it is necessary to do a remote network fetch. This also implies having to handle retries and temporary failures in an area of code where this would otherwise not be needed.

2) If filtering code is provided in the replication document, replication ID calculation and tracking of changes to the replication ID when the filter is updated become trivial. Now it ranges from broken to awkwardly complicated.
[jira] [Created] (COUCHDB-3327) Improve CouchDB's LRU
Nick Vatamaniuc created COUCHDB-3327:

Summary: Improve CouchDB's LRU
Key: COUCHDB-3327
URL: https://issues.apache.org/jira/browse/COUCHDB-3327
Project: CouchDB
Issue Type: Task
Reporter: Nick Vatamaniuc

Since we recently started to put all dbs into the LRU, try to improve it a bit to make it more performant.
[jira] [Commented] (COUCHDB-3324) Scheduling Replicator
[ https://issues.apache.org/jira/browse/COUCHDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925398#comment-15925398 ] Nick Vatamaniuc commented on COUCHDB-3324: -- Fauxton PR https://github.com/apache/couchdb-fauxton/pull/864 > Scheduling Replicator > - > > Key: COUCHDB-3324 > URL: https://issues.apache.org/jira/browse/COUCHDB-3324 > Project: CouchDB > Issue Type: New Feature >Reporter: Nick Vatamaniuc > > Merge scheduling replicator -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (COUCHDB-3324) Scheduling Replicator
Nick Vatamaniuc created COUCHDB-3324:

Summary: Scheduling Replicator
Key: COUCHDB-3324
URL: https://issues.apache.org/jira/browse/COUCHDB-3324
Project: CouchDB
Issue Type: New Feature
Reporter: Nick Vatamaniuc

Merge scheduling replicator
[jira] [Created] (COUCHDB-3323) Idle dbs cause excessive overhead
Nick Vatamaniuc created COUCHDB-3323:

Summary: Idle dbs cause excessive overhead
Key: COUCHDB-3323
URL: https://issues.apache.org/jira/browse/COUCHDB-3323
Project: CouchDB
Issue Type: Bug
Reporter: Nick Vatamaniuc

Idle dbs, especially sys_dbs like _replicator shards, once opened for scanning, would stay open forever. In a large cluster with many _replicator shards that can add up to significant overhead, mostly in terms of the number of active processes. Add a mechanism to close dbs which are idle.
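One possible shape for such a mechanism, sketched in Python under stated assumptions: the `IdleTracker` class, its method names, and the timeout value are hypothetical and are not taken from the CouchDB code base. The sketch records the last time each db was used and reports the dbs that have been idle for at least a configured timeout, so the caller can close them.

```python
import time


class IdleTracker:
    """Track last-use times of open dbs and report the idle ones.

    Hypothetical helper for illustration; `clock` is injectable so the
    behaviour can be exercised without real waiting.
    """

    def __init__(self, idle_timeout_sec, clock=time.monotonic):
        self.idle_timeout_sec = idle_timeout_sec
        self.clock = clock
        self.last_used = {}

    def touch(self, db_name):
        """Record that db_name was opened or used just now."""
        self.last_used[db_name] = self.clock()

    def close_idle(self):
        """Forget and return every db idle for at least idle_timeout_sec."""
        now = self.clock()
        idle = [name for name, used_at in self.last_used.items()
                if now - used_at >= self.idle_timeout_sec]
        for name in idle:
            del self.last_used[name]
        return idle
```

A periodic timer would call `close_idle()` and close the returned dbs; any request that uses a db calls `touch()` first, which keeps active dbs open.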
[jira] [Created] (COUCHDB-3308) Upgrade snappy to 1.1.4
Nick Vatamaniuc created COUCHDB-3308: Summary: Upgrade snappy to 1.1.4 Key: COUCHDB-3308 URL: https://issues.apache.org/jira/browse/COUCHDB-3308 Project: CouchDB Issue Type: Improvement Reporter: Nick Vatamaniuc They claim a 20% decompression and 5% compression speed improvement. https://github.com/google/snappy/commit/2d99bd14d471664758e4dfdf81b44f413a7353fd -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections
[ https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876250#comment-15876250 ]

Nick Vatamaniuc commented on COUCHDB-3302:

fabric_doc_attachments is used when PUT-ing individual attachments; I was looking at a doc PUT with attachments in multipart/related format.

> Attachment replication over low bandwidth network connections
>
> Key: COUCHDB-3302
> URL: https://issues.apache.org/jira/browse/COUCHDB-3302
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Reporter: Jan Lehnardt
> Attachments: attach_large.py, replication-failure.log, replication-failure-target.log
>
> Setup:
> Two CouchDB instances `source` (5981) and `target` (5983) with a 2MBit
> network connection (simulated locally with traffic shaping, see way below for
> an example).
> {noformat}
> git clone https://github.com/apache/couchdb.git
> cd couchdb
> ./configure --disable-docs --disable-fauxton
> make release
> cd ..
> cp -r couchdb/rel/couchdb source
> cp -r couchdb/rel/couchdb target
> # set up local ini: chttpd / port: 5981 / 5983
> # set up vm.args: source@hostname.local / target@hostname.local
> # no admins
> # start both CouchDB instances in their own terminal windows: ./bin/couchdb
> # create all required databases, and our `t` test database
> curl -X PUT http://127.0.0.1:598{1,3}/{_users,_replicator,_global_changes,t}
> # create 64MB attachments
> dd if=/dev/urandom of=att-64 bs=1024 count=65536
> # create doc on source
> curl -X PUT http://127.0.0.1:5981/t/doc1/att_64 -H 'Content-Type: application/octet-stream' -d @att-64
> # replicate to target
> curl -X POST http://127.0.0.1:5981/_replicate -Hcontent-type:application/json -d '{"source":"http://127.0.0.1:5981/t","target":"http://127.0.0.1:5983/t"}'
> {noformat}
> With the traffic shaping in place, the replication call doesn’t return, and
> eventually CouchDB fails with:
> {noformat}
> [error] 2017-02-16T17:37:30.488990Z source@hostname.local emulator Error in process <0.15811.0> on node 'source@hostname.local' with exit value:
> {{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,591}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,642}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}
> [error] 2017-02-16T17:37:30.490610Z source@hostname.local <0.8721.0> Replicator, request PUT to "http://127.0.0.1:5983/t/doc1?new_edits=false" failed due to error {error,
> {'EXIT',
> {{{nocatch,{mp_parser_died,noproc}},
> [{couch_att,'-foldl/4-fun-0-',3,
> [{file,"src/couch_att.erl"},{line,591}]},
> {couch_att,fold_streamed_data,4,
> [{file,"src/couch_att.erl"},{line,642}]},
> {couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},
> {couch_httpd_multipart,atts_to_mp,4,
> [{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
> {gen_server,call,
> [<0.15778.0>,
> {send_req,
> {{url,"http://127.0.0.1:5983/t/doc1?new_edits=false",
> "127.0.0.1",5983,undefined,undefined,
> "/t/doc1?new_edits=false",http,ipv4_address},
> [{"Accept","application/json"},
> {"Content-Length",33194202},
> {"Content-Type",
> "multipart/related; boundary=\"0dea87076009b928b191e0b456375c93\""},
> {"User-Agent","CouchDB-Replicator/2.0.0"}],
> put,
> {#Fun,
> {<<"{\"_id\":\"doc1\",\"_rev\":\"1-15ae43c5b53de894b936c08db31d537c\",\"_revisions\":{\"start\":1,\"ids\":[\"15ae43c5b53de894b936c08db31d537c\"]},\"_attachments\":{\"att_64\":{\"content_type\":\"application/octet-stream\",\"revpos\":1,\"digest\":\"md5-s3AA0cYvwOzrSFTaALGh8g==\",\"length\":33193656,\"follows\":true}}}">>,
> [{att,<<"att_64">>,<<"application/octet-stream">>,
> 33193656,33193656,
> <<179,112,0,209,198,47,192,236,235,72,84,218,0,177,
> 161,242>>,
> 1,
> {follows,<0.8720.0>,#Ref<0.0.1.23804>},
> identity}],
> <<"0dea87076009b928b191e0b456375c93">>,33194202}},
> [{response_format,binary},
> {inactivity_timeout,3},
> {socket_options,[{keepalive,true},{nodelay,false}]}],
> infinity}},
> infinity]
> {noformat}
> Expected Behaviour:
[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections
[ https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875076#comment-15875076 ]

Nick Vatamaniuc commented on COUCHDB-3302:

Confirmed that setting fabric request_timeout to a higher value like 9 helps with this. At least {{./attach_large.py --size=10 --mintime=80}} finishes successfully, while it doesn't with the default value of 6.
[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections
[ https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875031#comment-15875031 ]

Nick Vatamaniuc commented on COUCHDB-3302:

From investigating, it seems to be related to how long it takes for the request to complete. I created a "paced" Python multipart sender which PUTs an attachment over a period of time. It splits the attachment into chunks, then sends those with a sleep in between. Attached the script as attach_large.py.

It can be run with {{./attach_large.py --size=10 --mintime=80}}, which will PUT an attachment of size 10 bytes over at least 80 seconds. With that code I was able to get a 500 error:

{code}
HTTP/1.1 500 Internal Server Error
Cache-Control: must-revalidate
Content-Length: 47
Content-Type: application/json
Date: Mon, 20 Feb 2017 19:27:30 GMT
Server: CouchDB/2.0.0 (Erlang OTP/18)
X-Couch-Request-ID: 80a6cfd301
X-CouchDB-Body-Time: 0

{"error":"unknown_error","reason":"undefined"}
{code}
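The pacing technique described in this comment (split the body into chunks, sleep between sends) can be sketched in Python as follows. This is not the attached attach_large.py, whose contents are not shown in this thread; the function name `paced_chunks` and its parameters are assumptions made for illustration.

```python
import time


def paced_chunks(data, n_chunks, min_time_sec, sleep=time.sleep):
    """Yield `data` in at most n_chunks pieces, pausing before each piece.

    The pauses add up to min_time_sec, so the last piece is handed out no
    earlier than min_time_sec after iteration starts. `sleep` is injectable
    so the pacing can be checked without waiting.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Ceiling division, so that no more than n_chunks pieces are produced.
    chunk_size = max(1, -(-len(data) // n_chunks))
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return
    pause = min_time_sec / len(chunks)
    for chunk in chunks:
        sleep(pause)
        yield chunk
```

An HTTP client that accepts an iterable request body can consume the generator directly, which keeps the request open for at least `min_time_sec` and makes timeout-related failures like the one above easier to reproduce.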
[jira] [Created] (COUCHDB-3293) Configure maximum document ID length
Nick Vatamaniuc created COUCHDB-3293:

Summary: Configure maximum document ID length
Key: COUCHDB-3293
URL: https://issues.apache.org/jira/browse/COUCHDB-3293
Project: CouchDB
Issue Type: New Feature
Reporter: Nick Vatamaniuc

Allow users / operators to specify a maximum document ID length. Currently it is easy to break CouchDB by feeding it large IDs through the _bulk_docs endpoint; the same IDs will hit the limits of the HTTP parser if sent through GET/PUT/DELETE methods. When those limits are hit, the error returned is not obvious, as the requests often crash in the mochiweb HTTP parser step before a request even makes it to CouchDB code.
[jira] [Created] (COUCHDB-3291) Excessively long document IDs prevent replicator from making progress
Nick Vatamaniuc created COUCHDB-3291:

Summary: Excessively long document IDs prevent replicator from making progress
Key: COUCHDB-3291
URL: https://issues.apache.org/jira/browse/COUCHDB-3291
Project: CouchDB
Issue Type: Bug
Reporter: Nick Vatamaniuc

Currently there is no protection in CouchDB against creating IDs which are too long. So large IDs will hit various implicit limits, which usually results in unpredictable failure modes.

One such implicit limit is hit in the replicator code. The replicator usually handles document IDs in bulk-like calls: it either gets them via the changes feed, computes revs_diffs in a POST, or inserts them with bulk_docs. The one exception is when it fetches open_revs. There it uses a single GET request, and that request fails because there is a bug / limitation in the HTTP parser. The first GET line in the HTTP request has to fit in the receive buffer of the receiving socket. Increasing that buffer allows larger HTTP request lines to pass through. In configuration options it can be manipulated as

{code}
chttpd.server_options="[...,{recbuf, 32768},...]"
{code}

Steve Vinoski mentions something about a possible bug in the HTTP packet parser code as well:

http://erlang.org/pipermail/erlang-questions/2011-June/059567.html

Tracing this a bit, I see that a proper mochiweb request is never even created and instead the request hangs. So that confirms it further. It seems that in the code here:

https://github.com/apache/couchdb-mochiweb/blob/bd6ae7cbb371666a1f68115056f7b30d13765782/src/mochiweb_http.erl#L90

the timeout clause is hit. Adding a catch-all there, I get the {tcp_error,#Port<0.40682>,emsgsize} message, which we don't handle. That seems like a sane place to return a 413 or similar.

There are probably multiple ways to address the issue:

* Increase the mochiweb listener buffer to fit larger doc IDs. However, that is a separate bug, and using it to control document size during replication is not reliable. Moreover, that would allow larger IDs to propagate through the system during replication, and all future replication sources would then have to be configured with the same maximum recbuf value.

* Introduce a validation step in {code} couch_doc:validate_docid {code}. Currently that code doesn't read from config files and is in the hot path. Adding a config read there might reduce performance. If enabled, it would stop new documents with large IDs from being created. But we would have to decide how to handle already existing IDs which are larger than the limit.

* Introduce a validation/bypass in the replicator. Specifically targeting the replicator might help prevent propagation of large IDs during replication. There is already a similar case of skipping writes of large attachments or large documents (which exceed the request size) and bumping {code} doc_write_failures {code}.
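The validation and replicator-bypass options listed above can be sketched in Python. This is only an illustration of the idea, not the Erlang implementation: the limit of 512 bytes, the Python function names, and the shape of the stats dictionary are assumptions; only the names validate_docid and doc_write_failures come from the ticket.

```python
DEFAULT_MAX_ID_LENGTH = 512  # illustrative value, not a CouchDB default


def validate_docid(doc_id, max_length=DEFAULT_MAX_ID_LENGTH):
    """Return doc_id unchanged, or raise ValueError if it is too long.

    Length is measured in UTF-8 bytes, since that is what ends up on the
    HTTP request line.
    """
    size = len(doc_id.encode("utf-8"))
    if size > max_length:
        raise ValueError("document id is %d bytes, limit is %d" % (size, max_length))
    return doc_id


def filter_for_replication(doc_ids, max_length=DEFAULT_MAX_ID_LENGTH):
    """Skip over-long IDs instead of failing, counting each skip.

    Mirrors the bypass idea: the replication keeps making progress and the
    skipped documents show up in a doc_write_failures counter.
    """
    accepted = []
    stats = {"doc_write_failures": 0}
    for doc_id in doc_ids:
        try:
            accepted.append(validate_docid(doc_id, max_length))
        except ValueError:
            stats["doc_write_failures"] += 1
    return accepted, stats
```

Validating on write rejects new over-long IDs at the source, while the replicator-side filter deals with IDs that already exist; the two are complementary rather than alternatives.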
[jira] [Resolved] (COUCHDB-3284) 8Kb read-ahead in couch_file causes extra IO and binary memory usage
[ https://issues.apache.org/jira/browse/COUCHDB-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-3284. -- Resolution: Fixed Merged fix > 8Kb read-ahead in couch_file causes extra IO and binary memory usage > > > Key: COUCHDB-3284 > URL: https://issues.apache.org/jira/browse/COUCHDB-3284 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Affects Versions: 2.0.0 >Reporter: Nick Vatamaniuc > Attachments: jira_io_increased.png > > > 8Kb read-ahead logic in couch_file seems to cause extra input IO thrashing, > binary memory usage but doesn't speed-up -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (COUCHDB-3284) 8Kb read-ahead in couch_file causes extra IO and binary memory usage
[ https://issues.apache.org/jira/browse/COUCHDB-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843539#comment-15843539 ] Nick Vatamaniuc commented on COUCHDB-3284: -- Attached performance graphs showing increased IO usage and increased binary memory usage. Those go away after disabling 8kb read-ahead logic > 8Kb read-ahead in couch_file causes extra IO and binary memory usage > > > Key: COUCHDB-3284 > URL: https://issues.apache.org/jira/browse/COUCHDB-3284 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Affects Versions: 2.0.0 >Reporter: Nick Vatamaniuc > Attachments: jira_io_increased.png > > > 8Kb read-ahead logic in couch_file seems to cause extra input IO thrashing, > binary memory usage but doesn't speed-up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3284) 8Kb read-ahead in couch_file causes extra IO and binary memory usage
Nick Vatamaniuc created COUCHDB-3284: Summary: 8Kb read-ahead in couch_file causes extra IO and binary memory usage Key: COUCHDB-3284 URL: https://issues.apache.org/jira/browse/COUCHDB-3284 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3271) Replications crash with 'kaboom' exit
[ https://issues.apache.org/jira/browse/COUCHDB-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Vatamaniuc resolved COUCHDB-3271.

Resolution: Fixed

> Replications crash with 'kaboom' exit
>
> Key: COUCHDB-3271
> URL: https://issues.apache.org/jira/browse/COUCHDB-3271
> Project: CouchDB
> Issue Type: Bug
> Reporter: Nick Vatamaniuc
>
> In a few cases it was observed that replications were crashing with a `kaboom`
> exit. This happens here:
> https://github.com/apache/couchdb-couch-replicator/blob/cb41bacb2a06613649df46d62249afebda42b8c0/src/couch_replicator_api_wrap.erl#L236
> This is during an open_revs call for one of the docs. So the change feed found
> it but then could not get its revisions.
> The reason is that an open_revs GET request returns an empty result when more
> than one node is in maintenance mode.
[jira] [Closed] (COUCHDB-3271) Replications crash with 'kaboom' exit
[ https://issues.apache.org/jira/browse/COUCHDB-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Vatamaniuc closed COUCHDB-3271.
[jira] [Commented] (COUCHDB-3271) Replications crash with 'kaboom' exit
[ https://issues.apache.org/jira/browse/COUCHDB-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822100#comment-15822100 ]

Nick Vatamaniuc commented on COUCHDB-3271:

https://github.com/apache/couchdb-fabric/pull/84
[jira] [Created] (COUCHDB-3271) Replications crash with 'kaboom' exit
Nick Vatamaniuc created COUCHDB-3271:

Summary: Replications crash with 'kaboom' exit
Key: COUCHDB-3271
URL: https://issues.apache.org/jira/browse/COUCHDB-3271
Project: CouchDB
Issue Type: Bug
Reporter: Nick Vatamaniuc

In a few cases it was observed that replications were crashing with a `kaboom` exit. This happens here:

https://github.com/apache/couchdb-couch-replicator/blob/cb41bacb2a06613649df46d62249afebda42b8c0/src/couch_replicator_api_wrap.erl#L236

This is during an open_revs call for one of the docs. So the change feed found it but then could not get its revisions.

The reason is that an open_revs GET request returns an empty result when more than one node is in maintenance mode.
[jira] [Closed] (COUCHDB-3267) Don't exit on timeout callback in cassim fabric:changes feed
[ https://issues.apache.org/jira/browse/COUCHDB-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-3267. > Don't exit on timeout callback in cassim fabric:changes feed > > > Key: COUCHDB-3267 > URL: https://issues.apache.org/jira/browse/COUCHDB-3267 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > cassim metadata changes feed uses a continuous changes feed with heartbeats. > Don't exit on timeout and restart after 5 seconds instead continue receiving > changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3267) Don't exit on timeout callback in cassim fabric:changes feed
[ https://issues.apache.org/jira/browse/COUCHDB-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-3267. -- Resolution: Fixed > Don't exit on timeout callback in cassim fabric:changes feed > > > Key: COUCHDB-3267 > URL: https://issues.apache.org/jira/browse/COUCHDB-3267 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > cassim metadata changes feed uses a continuous changes feed with heartbeats. > Don't exit on timeout and restart after 5 seconds instead continue receiving > changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3266) changes feed not invoked when deleting a document using a selector filtered feed.
[ https://issues.apache.org/jira/browse/COUCHDB-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798696#comment-15798696 ] Nick Vatamaniuc commented on COUCHDB-3266: -- Thanks for checking out the new selector feature! Deleting a document via the DELETE method is equivalent to PUT-ing a {"_deleted":true} document as a new revision, so the resulting tombstone has none of the fields the selector matches on. It seems that, to achieve the desired effect, it is also possible to delete a document and keep all the original fields by just adding a "_deleted":true field and PUT-ing that. For example, with {"type":"message", "subtype":"email", ..., "_deleted":true} the document will pass the filter. > changes feed not invoked when deleting a document using a selector filtered > feed. > - > > Key: COUCHDB-3266 > URL: https://issues.apache.org/jira/browse/COUCHDB-3266 > Project: CouchDB > Issue Type: Bug > Components: Database Core >Reporter: Steven Spungin > > When I subscribe to the _changes endpoint with a selector filter, I get > updated and created changes, but not deleted changes. > But when I subscribe without a selector, I get all the changes as expected. > Here is my posted selector: > {"selector": {"type": "message", "subtype": "email"}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
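To illustrate the comment above, here is a minimal sketch of why a bare tombstone is filtered out while a tombstone that keeps its original fields is not. The `matches_selector` helper is hypothetical and only handles top-level equality; it is not CouchDB's selector implementation, and the `_id` and `_rev` values are made up.

```python
def matches_selector(doc, selector):
    """Return True if every selector field equals the same field in doc.

    Simplified illustration: only top-level equality is handled.
    """
    return all(doc.get(field) == value for field, value in selector.items())


selector = {"type": "message", "subtype": "email"}

# A tombstone without the original fields, as left behind by a plain DELETE.
bare_tombstone = {"_id": "msg1", "_rev": "2-abc", "_deleted": True}

# A deletion done by PUT-ing the original fields plus "_deleted": true.
tombstone_with_fields = {
    "_id": "msg1",
    "_rev": "2-abc",
    "type": "message",
    "subtype": "email",
    "_deleted": True,
}

print(matches_selector(bare_tombstone, selector))         # False
print(matches_selector(tombstone_with_fields, selector))  # True
```

Only the second form carries the `type` and `subtype` values, so only it would show up in a feed filtered on those fields.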
[jira] [Created] (COUCHDB-3267) Don't exit on timeout callback in cassim fabric:changes feed
Nick Vatamaniuc created COUCHDB-3267: Summary: Don't exit on timeout callback in cassim fabric:changes feed Key: COUCHDB-3267 URL: https://issues.apache.org/jira/browse/COUCHDB-3267 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc The cassim metadata changes feed uses a continuous changes feed with heartbeats. Don't exit on timeout and restart after 5 seconds; instead, continue receiving changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3245) couchjs -S option doesn't have any effect
[ https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707631#comment-15707631 ] Nick Vatamaniuc commented on COUCHDB-3245: -- Interesting! Looks like it was fixed before: https://issues.apache.org/jira/browse/COUCHDB-1792 > couchjs -S option doesn't have any effect > - > > Key: COUCHDB-3245 > URL: https://issues.apache.org/jira/browse/COUCHDB-3245 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > currently -S option of couchjs sets stack _chunk_ size for js contexts > Reference: to > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext > Documentation recommends 8K and I have seen cases where it was raised to 1G+ > in production!. That doesn't seem right at all and also probably kills > performance and eats memory. > Docs from above say: > > The stackchunksize parameter does not control the JavaScript stack size. > > (The JSAPI does not provide a way to adjust the stack depth limit.) Passing > > a large number for stackchunksize is a mistake. In a DEBUG build, large > > chunk sizes can degrade performance dramatically. The usual value of 8192 > > is recommended > Instead we should be setting the max gc value which is set in the runtime > {{JS_NewRuntime(uint32_t maxbytes)}} > It seems that acts similarly to a max heap used (from what I understand). > Which makes more sense. A stack size of hundreds of megabytes doesn't sound > right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3245) couchjs -S option doesn't have any effect
Nick Vatamaniuc created COUCHDB-3245: Summary: couchjs -S option doesn't have any effect Key: COUCHDB-3245 URL: https://issues.apache.org/jira/browse/COUCHDB-3245 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc Currently the -S option of couchjs sets the stack _chunk_ size for JS contexts. Reference: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext The documentation recommends 8K, and I have seen cases where it was raised to 1G+ in production! That doesn't seem right at all, and it also probably kills performance and eats memory. The docs from above say: > The stackchunksize parameter does not control the JavaScript stack size. (The JSAPI does not provide a way to adjust the stack depth limit.) Passing a large number for stackchunksize is a mistake. In a DEBUG build, large chunk sizes can degrade performance dramatically. The usual value of 8192 is recommended. Instead we should be setting the max GC value, which is set in the runtime: {{JS_NewRuntime(uint32_t maxbytes)}} It seems that it acts similarly to a max heap size (from what I understand), which makes more sense. A stack size of hundreds of megabytes doesn't sound right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3242) Make get view group info timeout in couch_indexer configurable
Nick Vatamaniuc created COUCHDB-3242: Summary: Make get view group info timeout in couch_indexer configurable Key: COUCHDB-3242 URL: https://issues.apache.org/jira/browse/COUCHDB-3242 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc Some busy views will take longer than the default 5 seconds to return. https://github.com/cloudant/couchdb-couch-index/blob/master/src/couch_index.erl#L57-L58 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3199) Replicator VDU function doesn't acount for an already malformed document in replicator db
[ https://issues.apache.org/jira/browse/COUCHDB-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-3199. -- Resolution: Fixed > Replicator VDU function doesn't acount for an already malformed document in > replicator db > - > > Key: COUCHDB-3199 > URL: https://issues.apache.org/jira/browse/COUCHDB-3199 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > In case when code is updated from an older version of couchdb which didn't > have (or had a less restrictive) VDU function. A malformed document could > have ended up in the _replicator database. > Replicator will try to parse it and flag it as an error then try to update > the document. However the more restrictive VDU function will cause the > document update to crash the replicator manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3199) Replicator VDU function doesn't acount for an already malformed document in replicator db
Nick Vatamaniuc created COUCHDB-3199: Summary: Replicator VDU function doesn't acount for an already malformed document in replicator db Key: COUCHDB-3199 URL: https://issues.apache.org/jira/browse/COUCHDB-3199 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc When code is updated from an older version of CouchDB which didn't have a VDU function (or had a less restrictive one), a malformed document could have ended up in the _replicator database. The replicator will try to parse it, flag it as an error, and then try to update the document. However, the more restrictive VDU function will reject that update, which crashes the replicator manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3174) max_document_size setting can by bypassed by issuing multipart/related requests
[ https://issues.apache.org/jira/browse/COUCHDB-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-3174. -- Resolution: Fixed > max_document_size setting can by bypassed by issuing multipart/related > requests > --- > > Key: COUCHDB-3174 > URL: https://issues.apache.org/jira/browse/COUCHDB-3174 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > Attachments: attach_large.py > > > Testing how replicator handled small values of max_document_size parameter, > discovered if user issues PUT requests which are multipart/related, then > max_document_size setting is bypassed. > Wireshark capture of a PUT with attachments request coming from replicator in > a EUnit test I wrote. max_document_size was set to 1 yet a 70k byte > document with a 70k byte attachment was created. > {code} > PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1 > Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3" > Content-Length: 140515 > Accept: application/json > --e5d21d5fd988dc1c6c6e8911030213b3 > Content-Type: application/json > {"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":" > ...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}} > --e5d21d5fd988dc1c6c6e8911030213b3 > Content-Disposition: attachment; filename="att1" > Content-Type: app/binary > Content-Length: 7 > xx > --e5d21d5fd988dc1c6c6e8911030213b3-- > HTTP/1.1 201 Created > {code} > Here is a regular request which works as expected: > {code} > PUT /dbl/dl2 HTTP/1.1 > Content-Length: 100026 > Content-Type: application/json > Accept: application/json > {"_id": "dl2", "size": "...xxx"} > HTTP/1.1 413 Request Entity Too Large > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3174) max_document_size setting can by bypassed by issuing multipart/related requests
[ https://issues.apache.org/jira/browse/COUCHDB-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1211#comment-1211 ] Nick Vatamaniuc commented on COUCHDB-3174: -- The issue is not 100% fixed, but it should help against accidental cases > max_document_size setting can by bypassed by issuing multipart/related > requests > --- > > Key: COUCHDB-3174 > URL: https://issues.apache.org/jira/browse/COUCHDB-3174 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > Attachments: attach_large.py > > > Testing how replicator handled small values of max_document_size parameter, > discovered if user issues PUT requests which are multipart/related, then > max_document_size setting is bypassed. > Wireshark capture of a PUT with attachments request coming from replicator in > a EUnit test I wrote. max_document_size was set to 1 yet a 70k byte > document with a 70k byte attachment was created. > {code} > PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1 > Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3" > Content-Length: 140515 > Accept: application/json > --e5d21d5fd988dc1c6c6e8911030213b3 > Content-Type: application/json > {"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":" > ...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}} > --e5d21d5fd988dc1c6c6e8911030213b3 > Content-Disposition: attachment; filename="att1" > Content-Type: app/binary > Content-Length: 7 > xx > --e5d21d5fd988dc1c6c6e8911030213b3-- > HTTP/1.1 201 Created > {code} > Here is a regular request which works as expected: > {code} > PUT /dbl/dl2 HTTP/1.1 > Content-Length: 100026 > Content-Type: application/json > Accept: application/json > {"_id": "dl2", "size": "...xxx"} > HTTP/1.1 413 Request Entity Too Large > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (COUCHDB-3180) Add ability to return a list of features in server's welcome message
[ https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc updated COUCHDB-3180: - Comment: was deleted (was: From GH [~rnewson] mentioned: {quote} We've wanted a general feature discovery thing for a while now so let's come up with a way for code to register with this. The patch should include that mechanism even if nothing in couchdb calls it yet, and a test that shows that a registered feature shows up in the welcome message. {quote} Here is what I came up with on first try: * Don't add a new application, it would be silly. * Stick it in config application. It seems like a configuration-y thing. * API looks like - {{config:features() -> \[<<"feature1">>, <<"feature2">>, ...\].}} - {{config:feature_enable(<<"feature1">>).}} - {{config:feature_disable(<<"feature2">>).}} * Applications enable features and disable them. Then `chttpd` reads list of features from config and shows them in the welcome message. * Behind the scenes it is really just writing to config "\[features\]" section a bunch of booleans. With persistence set to `false`. * Users can directly set features in the config file if they want. Could be a something external to the CouchDB instance, maybe something about how code was compiled or where it is running that warrants a different treatment from the API standpoint. The advantage is it doesn't reinvent the world. Takes advantage of config server (so applications can monitor for changes and such if needed).) > Add ability to return a list of features in server's welcome message > > > Key: COUCHDB-3180 > URL: https://issues.apache.org/jira/browse/COUCHDB-3180 > Project: CouchDB > Issue Type: New Feature >Reporter: Nick Vatamaniuc > > This could be used to let users discover quickly the availability of some API > or modes of operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3180) Add ability to return a list of features in server's welcome message
[ https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550508#comment-15550508 ] Nick Vatamaniuc commented on COUCHDB-3180: -- From GH [~rnewson] mentioned: {quote} We've wanted a general feature discovery thing for a while now so let's come up with a way for code to register with this. The patch should include that mechanism even if nothing in couchdb calls it yet, and a test that shows that a registered feature shows up in the welcome message. {quote} Here is what I came up with on first try: * Don't add a new application, it would be silly. * Stick it in the config application. It seems like a configuration-y thing. * API looks like - {{config:features() -> \[<<"feature1">>, <<"feature2">>, ...\].}} - {{config:feature_enable(<<"feature1">>).}} - {{config:feature_disable(<<"feature2">>).}} * Applications enable features and disable them. Then `chttpd` reads the list of features from config and shows them in the welcome message. * Behind the scenes it is really just writing a bunch of booleans to the config "\[features\]" section, with persistence set to `false`. * Users can directly set features in the config file if they want. This could be for something external to the CouchDB instance, maybe something about how the code was compiled or where it is running that warrants a different treatment from the API standpoint. The advantage is that it doesn't reinvent the world. It takes advantage of the config server (so applications can monitor for changes and such if needed). > Add ability to return a list of features in server's welcome message > > > Key: COUCHDB-3180 > URL: https://issues.apache.org/jira/browse/COUCHDB-3180 > Project: CouchDB > Issue Type: New Feature >Reporter: Nick Vatamaniuc > > This could be used to let users discover quickly the availability of some API > or modes of operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
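The proposal above can be modelled roughly as follows. This is only a sketch in Python of the behaviour described (a non-persistent set of boolean flags that the welcome handler reads); the `FeatureRegistry` class and `welcome_message` function are hypothetical stand-ins, not the Erlang `config` application, and the `"couchdb": "Welcome"` field is included purely for illustration.

```python
class FeatureRegistry:
    """In-memory stand-in for a non-persistent [features] config section."""

    def __init__(self):
        self._flags = {}  # feature name -> bool

    def feature_enable(self, name):
        self._flags[name] = True

    def feature_disable(self, name):
        self._flags[name] = False

    def features(self):
        # Only features whose flag is currently true are reported.
        return sorted(name for name, on in self._flags.items() if on)


def welcome_message(registry):
    """Build a welcome document that includes the enabled features."""
    return {"couchdb": "Welcome", "features": registry.features()}


registry = FeatureRegistry()
registry.feature_enable("feature1")
registry.feature_enable("feature2")
registry.feature_disable("feature2")

print(welcome_message(registry))
```

Keeping the flags as plain booleans means a disabled feature and a never-registered feature look the same to a client, which matches the "list of features" shape in the proposed API.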
[jira] [Issue Comment Deleted] (COUCHDB-3180) Add ability to return a list of features in server's welcome message
[ https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc updated COUCHDB-3180: - Comment: was deleted (was: https://github.com/apache/couchdb-chttpd/pull/144) > Add ability to return a list of features in server's welcome message > > > Key: COUCHDB-3180 > URL: https://issues.apache.org/jira/browse/COUCHDB-3180 > Project: CouchDB > Issue Type: New Feature >Reporter: Nick Vatamaniuc > > This could be used to let users discover quickly the availability of some API > or modes of operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3180) Add ability to return a list of features in server's welcome message
[ https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549772#comment-15549772 ] Nick Vatamaniuc commented on COUCHDB-3180: -- https://github.com/apache/couchdb-chttpd/pull/144 > Add ability to return a list of features in server's welcome message > > > Key: COUCHDB-3180 > URL: https://issues.apache.org/jira/browse/COUCHDB-3180 > Project: CouchDB > Issue Type: New Feature >Reporter: Nick Vatamaniuc > > This could be used to let users discover quickly the availability of some API > or modes of operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-3179) Add ability to return a list of features in server's welcome message
[ https://issues.apache.org/jira/browse/COUCHDB-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-3179. Resolution: Duplicate > Add ability to return a list of features in server's welcome message > > > Key: COUCHDB-3179 > URL: https://issues.apache.org/jira/browse/COUCHDB-3179 > Project: CouchDB > Issue Type: New Feature >Reporter: Nick Vatamaniuc > > This could be used to let users discover quickly the availability of some API > or modes of operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3180) Add ability to return a list of features in server's welcome message
Nick Vatamaniuc created COUCHDB-3180: Summary: Add ability to return a list of features in server's welcome message Key: COUCHDB-3180 URL: https://issues.apache.org/jira/browse/COUCHDB-3180 Project: CouchDB Issue Type: New Feature Reporter: Nick Vatamaniuc This could be used to let users discover quickly the availability of some API or modes of operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3179) Add ability to return a list of features in server's welcome message
Nick Vatamaniuc created COUCHDB-3179: Summary: Add ability to return a list of features in server's welcome message Key: COUCHDB-3179 URL: https://issues.apache.org/jira/browse/COUCHDB-3179 Project: CouchDB Issue Type: New Feature Reporter: Nick Vatamaniuc This could be used to let users discover quickly the availability of some API or modes of operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3168) Replicator doesn't handle well writing documents to a target db which has a small max_document_size
[ https://issues.apache.org/jira/browse/COUCHDB-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-3168. -- Resolution: Fixed > Replicator doesn't handle well writing documents to a target db which has a > small max_document_size > --- > > Key: COUCHDB-3168 > URL: https://issues.apache.org/jira/browse/COUCHDB-3168 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > If a target db has set a smaller document max size, replication crashes. > It might make sense for the replication to not crash and instead treat > document size as an implicit replication filter then display doc write > failures in the stats / task info / completion record of normal replications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
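The "implicit replication filter" idea in the description could look something like the sketch below: documents that exceed the target's limit are skipped and counted instead of crashing the job. The `split_by_size` helper and the stat names are assumptions made for illustration, and measuring size as the length of the JSON encoding is also an assumption; the actual replicator is written in Erlang and may measure and report this differently.

```python
import json


def split_by_size(docs, max_document_size):
    """Partition docs into those that fit the target limit and those that do not.

    Size is approximated as the byte length of the JSON-encoded document.
    """
    to_write, skipped = [], []
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if size <= max_document_size:
            to_write.append(doc)
        else:
            skipped.append(doc)
    return to_write, skipped


docs = [{"_id": "doc1"}, {"_id": "doc2", "large": "x" * 1000}]
to_write, skipped = split_by_size(docs, max_document_size=100)

# Hypothetical stats record; the oversized document is reported, not fatal.
stats = {"docs_written": len(to_write), "doc_write_failures": len(skipped)}
print(stats)
```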
[jira] [Created] (COUCHDB-3175) When PUT-ing a multipart/related doc with attachment get a 500 error on md5 mismatch
Nick Vatamaniuc created COUCHDB-3175: Summary: When PUT-ing a multipart/related doc with attachment get a 500 error on md5 mismatch Key: COUCHDB-3175 URL: https://issues.apache.org/jira/browse/COUCHDB-3175 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc fabric_doc_updater handle_message crashes with a function_clause error, which crashes the whole request. Instead, perhaps it should handle: {code} handle_message({md5_mismatch, Blah}, _Worker, _Acc0) -> ... {code} and return a 4xx code... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3174) max_document_size setting can by bypassed by issuing multipart/related requests
[ https://issues.apache.org/jira/browse/COUCHDB-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544234#comment-15544234 ] Nick Vatamaniuc commented on COUCHDB-3174: -- The problem seems to be here: https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L763-L776 and here: https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L763-L776 We need to call the json_body bit in order to get the max document size which is passed to `MochiReq:recv_body(MaxSize)`. Presumably we could retrieve Content-Length ourselves before mp parsing and raise a 413, but I haven't thought about it too much yet... > max_document_size setting can by bypassed by issuing multipart/related > requests > --- > > Key: COUCHDB-3174 > URL: https://issues.apache.org/jira/browse/COUCHDB-3174 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > Testing how replicator handled small values of max_document_size parameter, > discovered if user issues PUT requests which are multipart/related, then > max_document_size setting is bypassed. > Wireshark capture of a PUT with attachments request coming from replicator in > a EUnit test I wrote. max_document_size was set to 1 yet a 70k byte > document with a 70k byte attachment was created. 
> {code} > PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1 > Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3" > Content-Length: 140515 > Accept: application/json > --e5d21d5fd988dc1c6c6e8911030213b3 > Content-Type: application/json > {"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":" > ...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}} > --e5d21d5fd988dc1c6c6e8911030213b3 > Content-Disposition: attachment; filename="att1" > Content-Type: app/binary > Content-Length: 7 > xx > --e5d21d5fd988dc1c6c6e8911030213b3-- > HTTP/1.1 201 Created > {code} > Here is a regular request which works as expected: > {code} > PUT /dbl/dl2 HTTP/1.1 > Content-Length: 100026 > Content-Type: application/json > Accept: application/json > {"_id": "dl2", "size": "...xxx"} > HTTP/1.1 413 Request Entity Too Large > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
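The idea floated in the comment, checking Content-Length before multipart parsing and answering 413, might be sketched like this. The `check_content_length` function is hypothetical, the `headers` argument is a plain dict with exact-case keys, and the 4096-byte limit is an arbitrary value chosen for the example. As the later comment on this issue notes, a check of this kind helps against accidental cases only, since it relies on the header the client sends.

```python
def check_content_length(headers, max_size):
    """Return an HTTP status to reject with, or None if the request may proceed.

    Only the declared Content-Length is inspected; the body is not read.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return None  # nothing declared; left to later checks in this sketch
    try:
        length = int(raw)
    except ValueError:
        return 400  # malformed header
    return 413 if length > max_size else None


# Header values taken from the multipart/related capture quoted above.
multipart_headers = {
    "Content-Type": 'multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3"',
    "Content-Length": "140515",
}

print(check_content_length(multipart_headers, max_size=4096))  # 413
```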
[jira] [Created] (COUCHDB-3174) max_document_size setting can by bypassed by issuing multipart/related requests
Nick Vatamaniuc created COUCHDB-3174: Summary: max_document_size setting can by bypassed by issuing multipart/related requests Key: COUCHDB-3174 URL: https://issues.apache.org/jira/browse/COUCHDB-3174 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc While testing how the replicator handles small values of the max_document_size parameter, I discovered that if a user issues PUT requests which are multipart/related, the max_document_size setting is bypassed. Below is a Wireshark capture of a PUT-with-attachments request coming from the replicator in an EUnit test I wrote. max_document_size was set to 1 yet a 70k byte document with a 70k byte attachment was created.
{code}
PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1
Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3"
Content-Length: 140515
Accept: application/json

--e5d21d5fd988dc1c6c6e8911030213b3
Content-Type: application/json

{"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":" ...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}}
--e5d21d5fd988dc1c6c6e8911030213b3
Content-Disposition: attachment; filename="att1"
Content-Type: app/binary
Content-Length: 7

xx
--e5d21d5fd988dc1c6c6e8911030213b3--

HTTP/1.1 201 Created
{code}
Here is a regular request which works as expected:
{code}
PUT /dbl/dl2 HTTP/1.1
Content-Length: 100026
Content-Type: application/json
Accept: application/json

{"_id": "dl2", "size": "...xxx"}

HTTP/1.1 413 Request Entity Too Large
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2992) Add additional support for document size
[ https://issues.apache.org/jira/browse/COUCHDB-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532980#comment-15532980 ] Nick Vatamaniuc commented on COUCHDB-2992: -- This is related to replications crashing unexpectedly. Users can add documents smaller than the limit. Then replicator batches them up to 500 at a time and then repeatedly crashes. https://issues.apache.org/jira/browse/COUCHDB-3168 > Add additional support for document size > > > Key: COUCHDB-2992 > URL: https://issues.apache.org/jira/browse/COUCHDB-2992 > Project: CouchDB > Issue Type: Improvement > Components: Database Core >Reporter: Tony Sun > > Currently, only max_document_size of 64 GB is the restriction for users > creating documents. Large documents often leads to issues with our indexers. > This feature will allow users more finer grain control over document size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size
[ https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc updated COUCHDB-3169: - Comment: was deleted (was: There is already an open ticket and associated pr related to this: https://issues.apache.org/jira/browse/COUCHDB-2992) > couchdb.max_document_size setting is actually max_http_request_size > --- > > Key: COUCHDB-3169 > URL: https://issues.apache.org/jira/browse/COUCHDB-3169 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > It turns out couchdb.max_document_size doesn't limit document size really, it > limits http request size. > For PUT document requests both are similar, but that is not the case for > _bulk_docs requests. For example if max_document_size is set to 1MB, and user > post: 10, 200KB dbs, their whole _bulk_docs will fail. > It would probably be useful to rename the setting to max_request_size and put > it in chttpd section. And then possibly implement a max_document_size as > well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size
[ https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-3169. -- Resolution: Duplicate https://issues.apache.org/jira/browse/COUCHDB-2992 > couchdb.max_document_size setting is actually max_http_request_size > --- > > Key: COUCHDB-3169 > URL: https://issues.apache.org/jira/browse/COUCHDB-3169 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > It turns out couchdb.max_document_size doesn't limit document size really, it > limits http request size. > For PUT document requests both are similar, but that is not the case for > _bulk_docs requests. For example if max_document_size is set to 1MB, and user > post: 10, 200KB dbs, their whole _bulk_docs will fail. > It would probably be useful to rename the setting to max_request_size and put > it in chttpd section. And then possibly implement a max_document_size as > well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size
[ https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532971#comment-15532971 ] Nick Vatamaniuc commented on COUCHDB-3169: -- There is already an open ticket and associated pr related to this: https://issues.apache.org/jira/browse/COUCHDB-2992 > couchdb.max_document_size setting is actually max_http_request_size > --- > > Key: COUCHDB-3169 > URL: https://issues.apache.org/jira/browse/COUCHDB-3169 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > It turns out couchdb.max_document_size doesn't limit document size really, it > limits http request size. > For PUT document requests both are similar, but that is not the case for > _bulk_docs requests. For example if max_document_size is set to 1MB, and user > post: 10, 200KB dbs, their whole _bulk_docs will fail. > It would probably be useful to rename the setting to max_request_size and put > it in chttpd section. And then possibly implement a max_document_size as > well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size
Nick Vatamaniuc created COUCHDB-3169: Summary: couchdb.max_document_size setting is actually max_http_request_size Key: COUCHDB-3169 URL: https://issues.apache.org/jira/browse/COUCHDB-3169 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc It turns out couchdb.max_document_size doesn't really limit document size; it limits HTTP request size. For PUT document requests the two are similar, but that is not the case for _bulk_docs requests. For example, if max_document_size is set to 1MB and a user posts ten 200KB docs in one request, their whole _bulk_docs request will fail. It would probably be useful to rename the setting to max_request_size and put it in the chttpd section, and then possibly implement a real max_document_size as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
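The arithmetic behind the _bulk_docs example can be spelled out as a short sketch. The numbers come from the description above; treating 1KB as 1000 bytes and ignoring the JSON envelope around the docs are simplifying assumptions that do not change the outcome.

```python
KB = 1000
MB = 1000 * KB

max_document_size = 1 * MB   # the configured limit
doc_sizes = [200 * KB] * 10  # ten docs of 200KB each

# The limit is applied to the whole HTTP request, not to each document.
request_size = sum(doc_sizes)

every_doc_fits = all(size <= max_document_size for size in doc_sizes)
request_rejected = request_size > max_document_size

print(request_size)      # 2000000
print(every_doc_fits)    # True
print(request_rejected)  # True
```

Every document is well under the limit, yet the request as a whole is twice the limit, which is why the setting behaves like a maximum request size.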
[jira] [Commented] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size
[ https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530654#comment-15530654 ] Nick Vatamaniuc commented on COUCHDB-3169: -- Related to: https://issues.apache.org/jira/browse/COUCHDB-3168 > couchdb.max_document_size setting is actually max_http_request_size > --- > > Key: COUCHDB-3169 > URL: https://issues.apache.org/jira/browse/COUCHDB-3169 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > It turns out couchdb.max_document_size doesn't limit document size really, it > limits http request size. > For PUT document requests both are similar, but that is not the case for > _bulk_docs requests. For example if max_document_size is set to 1MB, and user > post: 10, 200KB dbs, their whole _bulk_docs will fail. > It would probably be useful to rename the setting to max_request_size and put > it in chttpd section. And then possibly implement a max_document_size as > well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3168) Replicator doesn't handle writing document to a db which has a limited document size
[ https://issues.apache.org/jira/browse/COUCHDB-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530569#comment-15530569 ] Nick Vatamaniuc commented on COUCHDB-3168: -- 413 are emitted per request, generated from here: https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd.erl#L607-L611 So "max_document_size" is not strictly true as it is max_request_size really. Can still have documents smaller than that size just have many of them in a _bulk_docs request. > Replicator doesn't handle writing document to a db which has a limited > document size > > > Key: COUCHDB-3168 > URL: https://issues.apache.org/jira/browse/COUCHDB-3168 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > If a target db has set a smaller document max size, replication crashes. > It might make sense for the replication to not crash and instead treat > document size as an implicit replication filter then display doc write > failures in the stats / task info / completion record of normal replications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3168) Replicator doesn't handle writing document to a db which has a limited document size
[ https://issues.apache.org/jira/browse/COUCHDB-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530426#comment-15530426 ] Nick Vatamaniuc commented on COUCHDB-3168: -- Initially this seemed like a one-line change: https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_api_wrap.erl#L451 However, it seems that a document which is too large fails the whole _bulk_docs request with: {"error":"too_large","reason":"the request entity is too large"} This means we don't know which docs from the list succeeded and which didn't. I tried this with: curl -X DELETE http://adm:pass@localhost:15984/x; curl -X PUT http://adm:pass@localhost:15984/x && curl -d @large_docs.json -H 'Content-Type: application/json' -X POST http://adm:pass@localhost:15984/x/_bulk_docs where large_docs.json looked something like {code} { "docs" : [ {"_id" : "doc1"}, {"_id" : "doc2", "large":"x"} ] } {code} and the max document size was set to something smaller than the "large" value in the docs. > Replicator doesn't handle writing document to a db which has a limited > document size > > > Key: COUCHDB-3168 > URL: https://issues.apache.org/jira/browse/COUCHDB-3168 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > If a target db has set a smaller document max size, replication crashes. > It might make sense for the replication to not crash and instead treat > document size as an implicit replication filter then display doc write > failures in the stats / task info / completion record of normal replications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3168) Replicator doesn't handle writing document to a db which has a limited document size
Nick Vatamaniuc created COUCHDB-3168: Summary: Replicator doesn't handle writing document to a db which has a limited document size Key: COUCHDB-3168 URL: https://issues.apache.org/jira/browse/COUCHDB-3168 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc If a target db has set a smaller document max size, replication crashes. It might make sense for the replication to not crash and instead treat document size as an implicit replication filter then display doc write failures in the stats / task info / completion record of normal replications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3167) CouchDB replicator will retry forever if it cannot write to source db
Nick Vatamaniuc created COUCHDB-3167: Summary: CouchDB replicator will retry forever if it cannot write to source db Key: COUCHDB-3167 URL: https://issues.apache.org/jira/browse/COUCHDB-3167 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc If a replication is using checkpoints (and by default it does), and the replication document does not have authorization to write to the source db, the replication will crash repeatedly. Crashing is expected and not a problem. However, each time it crashes it writes an error state to the replication doc and then the replication job exits. Writing the error state generates a new doc update change for the _replicator db. The replicator reads the document change, starts a new replication job, and writes a "triggered" state to the document. The replication starts successfully, then crashes and writes "error" to the document. So alternating states of "triggered" and "error" keep being written to the document forever. Looking at some examples of this there was a shard >900GB in size. Some as high as 500GB. The critical bit above is that the replication starts successfully. There is a mechanism to fail and cancel replications which fail repeated starts. However, if a replication job crashes after it has started, it will be restarted an unlimited number of times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3149) Exception written to the log if db deleted while there is a change feed running
[ https://issues.apache.org/jira/browse/COUCHDB-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493655#comment-15493655 ] Nick Vatamaniuc commented on COUCHDB-3149: -- Here is an attempt at fixing it: https://github.com/apache/couchdb-fabric/pull/69 But not familiar with that code, so probably took the wrong approach. > Exception written to the log if db deleted while there is a change feed > running > --- > > Key: COUCHDB-3149 > URL: https://issues.apache.org/jira/browse/COUCHDB-3149 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > {code} > [info] 2016-09-14T20:08:23.217251Z node1@127.0.0.1 <0.23485.0> ea02496172 > ea02496172 127.0.0.1 localhost:15984 DELETE /d1 200 ok 46 > [error] 2016-09-14T20:08:23.221676Z node1@127.0.0.1 <0.22945.0> > rexi_server > error:{'EXIT',{{stop,{cb_state,<0.22937.0>,#Ref<0.0.1.15627>,true}},[{couch_event_listener_mfa,handle_event,3,[{file,"src/couch_event_listener_mfa.erl"},{line,91}]},{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,142}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl" > },{line,139}]}]}} > [{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,150}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}] > [info] 2016-09-14T20:08:23.222174Z node1@127.0.0.1 <0.22898.0> 549ae68ef1 > 549ae68ef1 127.0.0.1 localhost:15984 GET /d1/_changes?feed=longpoll 200 ok > 32901 > {code} > Appears in the log if a database gets deleted while there is a change feed > running. Both longpoll or continuous seem to trigger the behavior. 
> Exception above in couch_event_listener_mfa:handle_event comes from > https://github.com/apache/couchdb-couch-event/blob/master/src/couch_event_listener_mfa.erl#L91 > which, in turn, comes from fabric_db_update_listener handle_db_event returning > \{stop, St\} in: > https://github.com/apache/couchdb-fabric/blob/master/src/fabric_db_update_listener.erl#L87 > It seems couch_event_listener_mfa:handle_event doesn't handle \{stop, St\}; > it accepts only \{ok, NewState\} or just stop, and otherwise raises an exception. > I tried to replace \{stop, St\} with \{ok, St\} and then with stop. But in > both cases change feeds never stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3149) Exception written to the log if db deleted while there is a change feed running
Nick Vatamaniuc created COUCHDB-3149: Summary: Exception written to the log if db deleted while there is a change feed running Key: COUCHDB-3149 URL: https://issues.apache.org/jira/browse/COUCHDB-3149 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc {code} [info] 2016-09-14T20:08:23.217251Z node1@127.0.0.1 <0.23485.0> ea02496172 ea02496172 127.0.0.1 localhost:15984 DELETE /d1 200 ok 46 [error] 2016-09-14T20:08:23.221676Z node1@127.0.0.1 <0.22945.0> rexi_server error:{'EXIT',{{stop,{cb_state,<0.22937.0>,#Ref<0.0.1.15627>,true}},[{couch_event_listener_mfa,handle_event,3,[{file,"src/couch_event_listener_mfa.erl"},{line,91}]},{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,142}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl" },{line,139}]}]}} [{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,150}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}] [info] 2016-09-14T20:08:23.222174Z node1@127.0.0.1 <0.22898.0> 549ae68ef1 549ae68ef1 127.0.0.1 localhost:15984 GET /d1/_changes?feed=longpoll 200 ok 32901 {code} Appears in the log if a database gets deleted while there is a change feed running. Both longpoll and continuous feeds seem to trigger the behavior. Exception above in couch_event_listener_mfa:handle_event comes from https://github.com/apache/couchdb-couch-event/blob/master/src/couch_event_listener_mfa.erl#L91 which, in turn, comes from fabric_db_update_listener handle_db_event returning \{stop, St\} in: https://github.com/apache/couchdb-fabric/blob/master/src/fabric_db_update_listener.erl#L87 It seems couch_event_listener_mfa:handle_event doesn't handle \{stop, St\}; it accepts only \{ok, NewState\} or just stop, and otherwise raises an exception. I tried to replace \{stop, St\} with \{ok, St\} and then with stop. But in both cases change feeds never stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports
[ https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467500#comment-15467500 ] Nick Vatamaniuc commented on COUCHDB-2980: -- I wonder if it is worth at least preventing the creation of local replications, like the original pr did? https://github.com/apache/couchdb-couch-replicator/pull/41 Otherwise the behavior is surprising for someone with 1.x experience. And then later, even if we add local clustered support (say in 2.1), it will all of a sudden do something different. In the meantime, is using `http://localhost:5984/db` an alternative for users to get the equivalent behavior? In other words, would that cover Chris's case of making the replicator db work as expected if it is replicated to another cluster? > Replicator DB on 15984 replicates to backdoor ports > --- > > Key: COUCHDB-2980 > URL: https://issues.apache.org/jira/browse/COUCHDB-2980 > Project: CouchDB > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0 >Reporter: Robert Kowalski > > If you POST a doc into the replicator database a replication is kicked off > and finishes successfully (usual 5984 port which maps to 15984 via haproxy). > The problem is that the DB is replicated to the backdoor ports (15986) and is > not visible on the other ports. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3111) Default replicator change feed timeout too short
[ https://issues.apache.org/jira/browse/COUCHDB-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446957#comment-15446957 ] Nick Vatamaniuc commented on COUCHDB-3111: -- For reference here is an example of a change feed request : {code} GET /rdyno_001/_changes?filter=rdyno_filterdoc%2Frdyno_filtername=continuous=all_docs=%22334-g1IieJyV0MENgjAUgOGnmKAnR9AJjG2RlpPcHENbHg0SxJNn3UQ30U10EyzUBAkhgTRpk7b_d3gZAEwTB2GenzEmIaF8tTaLZOZhLEEtiqJIE0e6J3PhKqEiLfz2905CLc2utj8FKoUTuqGaIswuOcb6mMfY3Ydlv2_0WiqfR9ivP5T9tdFLznxNvF59PjE73MxhiHttIGFCoxhgPKzxrA0aKCLQH2C8rPH-nwX1Auw3S2t8rFHOQwGMdhXDNPOQYztMv-12jrg%22=1 {code} > Default replicator change feed timeout too short > > > Key: COUCHDB-3111 > URL: https://issues.apache.org/jira/browse/COUCHDB-3111 > Project: CouchDB > Issue Type: Improvement >Reporter: Nick Vatamaniuc > > Current replicator change feeds are set up to timeout based on default > connection_timeout parameter divided by 3. Default connection timeout is > 3 (msec). So replicator change feeds are torn down and established again > every 10 seconds. > That doesn't seem bad on a smaller scale but if there are 1000 replications > jobs on a server it would means tearing down change feed connections every 10 > msec. It seems like it might not be optimal so wanted to discuss it. > Looking at the commit which introduced 'div 3' behavior wondering if there is > anything to improve here: > https://github.com/apache/couchdb-couch-replicator/commit/ed447f8c01880c7f99f5829a8ef485fd8d399376 > Maybe keep div 3 but increase default connection timeout to 60 seconds? Or > maybe apply div 2 - 5 seconds, or have a minimum of 30 seconds? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3111) Default replicator change feed timeout too short
Nick Vatamaniuc created COUCHDB-3111: Summary: Default replicator change feed timeout too short Key: COUCHDB-3111 URL: https://issues.apache.org/jira/browse/COUCHDB-3111 Project: CouchDB Issue Type: Improvement Reporter: Nick Vatamaniuc Current replicator change feeds are set up to time out based on the default connection_timeout parameter divided by 3. The default connection timeout is 30000 (msec), so replicator change feeds are torn down and established again every 10 seconds. That doesn't seem bad on a smaller scale, but if there are 1000 replication jobs on a server it would mean tearing down a change feed connection every 10 msec on average. It seems like it might not be optimal, so I wanted to discuss it. Looking at the commit which introduced the 'div 3' behavior, I am wondering if there is anything to improve here: https://github.com/apache/couchdb-couch-replicator/commit/ed447f8c01880c7f99f5829a8ef485fd8d399376 Maybe keep div 3 but increase default connection timeout to 60 seconds? Or maybe apply div 2 - 5 seconds, or have a minimum of 30 seconds? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
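The arithmetic behind the figures above can be sketched in a few lines of Python. This is only an illustration: it assumes a default connection timeout of 30000 msec (which is what the 10-second reconnect figure implies), and the function names are made up for this example rather than taken from the replicator code.

```python
# Assumed default, consistent with "divided by 3" giving 10 seconds.
DEFAULT_CONNECTION_TIMEOUT_MS = 30000


def changes_feed_timeout_ms(connection_timeout_ms, divisor=3):
    """Change feed timeout derived from the connection timeout ('div 3')."""
    return connection_timeout_ms // divisor


def mean_reconnect_interval_ms(jobs, connection_timeout_ms, divisor=3):
    """Average time between feed teardowns on a server running `jobs` replications."""
    return changes_feed_timeout_ms(connection_timeout_ms, divisor) / jobs


print(changes_feed_timeout_ms(DEFAULT_CONNECTION_TIMEOUT_MS))           # per-feed timeout in msec
print(mean_reconnect_interval_ms(1000, DEFAULT_CONNECTION_TIMEOUT_MS))  # msec between teardowns
print(changes_feed_timeout_ms(60000))                                   # the "60 seconds, keep div 3" option
```

With 1000 jobs the server tears down a feed roughly every 10 msec; raising the connection timeout to 60 seconds while keeping div 3 would only double that interval.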
[jira] [Created] (COUCHDB-3104) Replicator manager does not checkpoint properly
Nick Vatamaniuc created COUCHDB-3104: Summary: Replicator manager does not checkpoint properly Key: COUCHDB-3104 URL: https://issues.apache.org/jira/browse/COUCHDB-3104 Project: CouchDB Issue Type: Bug Components: Replication Reporter: Nick Vatamaniuc In couch_replicator_manager, the {code} changes_reader_cb({stop, EndSeq, _Pending}, ...) -> {code} function at one point in the past handled callback messages from {{fabric:change}}, and so it would get pending info in the callback messages. After it was optimized to use local shards this no longer works, because local changes feeds don't send stop messages. As a result the replicator manager never checkpoints and, on every change to a replicator shard, rescans all the changes in that shard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3076) CouchDB 2.0 Blog Series: Feature: replicator
[ https://issues.apache.org/jira/browse/COUCHDB-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419612#comment-15419612 ] Nick Vatamaniuc commented on COUCHDB-3076: -- Good idea, Jenn. I added a little line about me at the end. I also linked to the syntax of Mango selectors, as Jan suggested. Let me know if anything else needs to be done. > CouchDB 2.0 Blog Series: Feature: replicator > > > Key: COUCHDB-3076 > URL: https://issues.apache.org/jira/browse/COUCHDB-3076 > Project: CouchDB > Issue Type: New JIRA Project >Reporter: Jenn Turner >Assignee: kzx >Priority: Minor > > This issue is to track progress on a series of blog posts promoting the > release of CouchDB 2.0. > Topic: Feature: replicator > -TBD > Nick Vatamaniuc volunteered via email thread: > https://lists.apache.org/thread.html/47637fe64739d26eca81a109650022b77c92aac05d15d49b18ade813@%3Cdev.couchdb.apache.org%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-3101) Builtin reduce functions should not throw errors
[ https://issues.apache.org/jira/browse/COUCHDB-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419039#comment-15419039 ] Nick Vatamaniuc commented on COUCHDB-3101: -- I think it makes sense to signal the error to the user somehow, so I like returning null better. There is already a pattern of returning an error to the user if their reduce function doesn't reduce fast enough: {code}query_server_config.reduce_limit{code} And that is enabled by default. (Also, there is currently a bug in how it calculates the limit; a pr with a fix is here: https://github.com/apache/couchdb/pull/425 ). > Builtin reduce functions should not throw errors > > > Key: COUCHDB-3101 > URL: https://issues.apache.org/jira/browse/COUCHDB-3101 > Project: CouchDB > Issue Type: Bug > Components: View Server Support >Reporter: Paul Joseph Davis > > So I just figured out we have an issue with the builtin reduce functions. > Currently, if they receive invalid data they'll throw an error. Unfortunately > what ends up happening is that if the error is never corrected then the view > files end up becoming bloated and refusing to open (because they're searching > for a header as Jay pointed out the other week). > We should either return null or ignore the bad data. My preference would be > to return null so that it indicates bad data was given somewhere but I could > also see just dropping the bad value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
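The two options discussed above (return null, or drop the bad value) can be contrasted with a small Python sketch. This is not the CouchDB implementation: `sum_or_null` and `sum_ignoring_bad` are hypothetical names, and the sketch only handles plain numbers, which is a simplification of what a builtin reduce would need to accept.

```python
def _is_number(value):
    """True for ints and floats, but not booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sum_or_null(values):
    """Option 1: return None (JSON null) as soon as invalid data is seen."""
    total = 0
    for value in values:
        if not _is_number(value):
            return None
        total += value
    return total


def sum_ignoring_bad(values):
    """Option 2: silently skip invalid values."""
    return sum(value for value in values if _is_number(value))


print(sum_or_null([1, 2, 3]))
print(sum_or_null([1, "oops", 3]))
print(sum_ignoring_bad([1, "oops", 3]))
```

The first variant keeps a visible marker that bad data was emitted somewhere, while the second yields a plausible-looking number that hides the problem.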
[jira] [Commented] (COUCHDB-3076) CouchDB 2.0 Blog Series: Feature: replicator
[ https://issues.apache.org/jira/browse/COUCHDB-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411197#comment-15411197 ] Nick Vatamaniuc commented on COUCHDB-3076: -- Draft: https://docs.google.com/document/d/14rk9jRrAElzAFA3XdXDsjahmklGrMHfwY_j9LJji1bA/edit?usp=sharing > CouchDB 2.0 Blog Series: Feature: replicator > > > Key: COUCHDB-3076 > URL: https://issues.apache.org/jira/browse/COUCHDB-3076 > Project: CouchDB > Issue Type: New JIRA Project >Reporter: Jenn Turner >Assignee: kzx >Priority: Minor > > This issue is to track progress on a series of blog posts promoting the > release of CouchDB 2.0. > Topic: Feature: replicator > -TBD > Nick Vatamaniuc volunteered via email thread: > https://lists.apache.org/thread.html/47637fe64739d26eca81a109650022b77c92aac05d15d49b18ade813@%3Cdev.couchdb.apache.org%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2824) group & group_level view parameters override each
[ https://issues.apache.org/jira/browse/COUCHDB-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2824. > group & group_level view parameters override each > - > > Key: COUCHDB-2824 > URL: https://issues.apache.org/jira/browse/COUCHDB-2824 > Project: CouchDB > Issue Type: Bug > Components: Database Core, HTTP Interface >Reporter: Nick Vatamaniuc >Assignee: Nick Vatamaniuc > Fix For: 2.0.0 > > > In a view query, if both group and group_level are specified, the last one > specified overrides any of the previous "group" or "group_level" parameters. > Example: > Create a db (db1), at least one document, and a design doc (des1) that looks like: > {code:javascript} > { >"views": { > "v1" : { "map": "function(d){ > emit([1,1],1); > emit([1,1],10); > emit([1,2],100); > emit([1,2],1000); > emit([2,2],1); >}" , > "reduce":"_sum" > } > } > } > {code} > Then these queries show the problem: > {code} > $ http "$DB1/db1/_design/des1/_view/v1?group_level=1&group=true" > {"rows":[ > {"key":[1,1],"value":11}, > {"key":[1,2],"value":1100}, > {"key":[2,2],"value":1} > ]} > {code} > But users might expect either the group_level=1 results or a 400 invalid request error. > Specifying group_level=1 after group=true makes group_level=1 take effect: > {code} > $ http "$DB1/db1/_design/des1/_view/v1?group_level=1&group=true&group_level=1" > {"rows":[ > {"key":[1],"value":1111}, > {"key":[2],"value":1} > ]} > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
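For reference, the two result sets in the description can be reproduced offline with a small Python sketch of `_sum` grouping. It is only a model of the grouping semantics, not CouchDB code; `ROWS` mirrors the emit calls in the design doc above and `reduce_sum` is a name made up for this example.

```python
# The rows emitted by the map function in the design doc above.
ROWS = [
    ([1, 1], 1),
    ([1, 1], 10),
    ([1, 2], 100),
    ([1, 2], 1000),
    ([2, 2], 1),
]


def reduce_sum(rows, group_level=None):
    """Sum values per key; group_level=None models group=true (exact keys)."""
    totals = {}
    for key, value in rows:
        group_key = tuple(key) if group_level is None else tuple(key[:group_level])
        totals[group_key] = totals.get(group_key, 0) + value
    return [{"key": list(k), "value": v} for k, v in sorted(totals.items())]


print(reduce_sum(ROWS))                 # what group=true returns
print(reduce_sum(ROWS, group_level=1))  # what group_level=1 returns
```

The first call matches the rows returned when group=true wins, and the second matches the rows returned when group_level=1 is specified last.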
[jira] [Closed] (COUCHDB-2831) OS Daemons configuration test is failing when run in isolation
[ https://issues.apache.org/jira/browse/COUCHDB-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2831. > OS Daemons configuration test is failing when run in isolation > -- > > Key: COUCHDB-2831 > URL: https://issues.apache.org/jira/browse/COUCHDB-2831 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > It seems to work when run as part of the whole test suite. When run on its > own it fails. > ... apps=couch tests=configuration_reader_test_, > {code} > [error] Ignoring OS daemon request: {error,{1,invalid_json}} > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2815) POST to /{db}/_all_docs with invalid keys should return a 400 error instead of 500
[ https://issues.apache.org/jira/browse/COUCHDB-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2815. > POST to /{db}/_all_docs with invalid keys should return a 400 error instead > of 500 > -- > > Key: COUCHDB-2815 > URL: https://issues.apache.org/jira/browse/COUCHDB-2815 > Project: CouchDB > Issue Type: Bug > Components: Database Core, HTTP Interface >Reporter: Nick Vatamaniuc > Fix For: 2.0.0 > > > Related to the > http://docs.couchdb.org/en/latest/api/database/bulk-api.html#post--db-_all_docs > endpoint. > Example: > * db1 created with two documents, with ids "1" and "2". > {code} > http -a adm:pass POST http://127.0.0.1:15984/db1/_all_docs keys:='["1",2]' > HTTP/1.1 500 Internal Server Error > Cache-Control: must-revalidate > Content-Length: 43 > Content-Type: application/json > Date: Wed, 16 Sep 2015 18:25:08 GMT > Server: CouchDB/b8b9968 (Erlang OTP/17) > X-Couch-Request-ID: 898d97fc1f > X-CouchDB-Body-Time: 0 > { > "error": "2", > "reason": "{illegal_docid,2}" > } > {code} > A 400 error is expected instead, as there is nothing wrong on the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2818) Design documents accept invalid views
[ https://issues.apache.org/jira/browse/COUCHDB-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2818. > Design documents accept invalid views > - > > Key: COUCHDB-2818 > URL: https://issues.apache.org/jira/browse/COUCHDB-2818 > Project: CouchDB > Issue Type: Bug > Components: Database Core, Documentation, JavaScript View Server >Reporter: Nick Vatamaniuc >Assignee: Nick Vatamaniuc > Fix For: 2.0.0 > > > Design documents seem to accept invalid views. > For example: > {code} > $ http PUT $DB1/db2/_design/des1 views:='{ "v1" : > "function(d){emit(d._id,d);}" }' > HTTP/1.1 201 Created > { > "id": "_design/des1", > "ok": true, > "rev": "1-04701f13eb827265c442d219bd995e91" > } > {code} > Going by the documentation for design documents: > http://docs.couchdb.org/en/latest/api/ddoc/common.html#put--db-_design-ddoc , > a view should be an object that has a map (a string) and an optional reduce > (string). > Interestingly, some validation is performed to check that the views field itself > is an object. For example: > {code} > $ http PUT $DB1/db2/_design/des1 views:='"x"' > HTTP/1.1 400 Bad Request > { > "error": "invalid_design_doc", > "reason": "`views` parameter must be an object." > } > {code} > There is also a deeper level of validation for map functions: > {code} > $ http PUT $DB1/db2/_design/des1 views:='{ "m":{"map":""} }' > { > "error": "not_found", > "reason": "missing function" > } > {code} > If there is interest, I have a patch that checks that, if provided, views, filters, > lists, show, updates and options are objects, rewrites is an array, and > validate_doc_update and language are strings. > Then, if views is provided, it checks that each view is an object that has a map > function (a string) and an optional reduce function (also a string). 
> Here is an example of how it works: > {code} > $ http PUT $DB1/db2/_design/des1 views:='{ "m":"bad" }' > HTTP/1.1 400 Bad Request > { > "error": "invalid_design_doc", > "reason": "View m must be an object" > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
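The validation rules listed in the description can be sketched in Python as follows. This is an illustration of the rules only, not the actual patch: `validate_ddoc` and `InvalidDesignDoc` are hypothetical names, the field is spelled `shows` here as an assumption about the design doc field the description calls "show", and only the two error messages quoted in the description come from the source; the other messages are made up.

```python
OBJECT_FIELDS = ("views", "filters", "lists", "shows", "updates", "options")
STRING_FIELDS = ("validate_doc_update", "language")


class InvalidDesignDoc(ValueError):
    """Stands in for a 400 invalid_design_doc response."""


def validate_ddoc(ddoc):
    """Check field types of a design doc body; raise InvalidDesignDoc on error."""
    for field in OBJECT_FIELDS:
        if field in ddoc and not isinstance(ddoc[field], dict):
            raise InvalidDesignDoc(f"`{field}` parameter must be an object.")
    if "rewrites" in ddoc and not isinstance(ddoc["rewrites"], list):
        raise InvalidDesignDoc("`rewrites` parameter must be an array.")
    for field in STRING_FIELDS:
        if field in ddoc and not isinstance(ddoc[field], str):
            raise InvalidDesignDoc(f"`{field}` parameter must be a string.")
    for name, view in ddoc.get("views", {}).items():
        if not isinstance(view, dict):
            raise InvalidDesignDoc(f"View {name} must be an object")
        if not isinstance(view.get("map"), str):
            raise InvalidDesignDoc(f"View {name} must have a map function (a string)")
        if "reduce" in view and not isinstance(view["reduce"], str):
            raise InvalidDesignDoc(f"Reduce function of view {name} must be a string")


# The accepted-but-invalid example from the description is now rejected.
try:
    validate_ddoc({"views": {"v1": "function(d){emit(d._id,d);}"}})
except InvalidDesignDoc as err:
    print("rejected:", err)

# A well-formed view passes.
validate_ddoc({"views": {"v1": {"map": "function(d){emit(d._id,d);}", "reduce": "_sum"}}})
```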
[jira] [Closed] (COUCHDB-2848) EUnit Tests Fail Intermetently
[ https://issues.apache.org/jira/browse/COUCHDB-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2848. > EUnit Tests Fail Intermetently > -- > > Key: COUCHDB-2848 > URL: https://issues.apache.org/jira/browse/COUCHDB-2848 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > Fix For: 2.0.0 > > > Use this for now to keep track of them -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2954) Deprecate configurable _replicator db name in 2.0
[ https://issues.apache.org/jira/browse/COUCHDB-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2954. > Deprecate configurable _replicator db name in 2.0 > - > > Key: COUCHDB-2954 > URL: https://issues.apache.org/jira/browse/COUCHDB-2954 > Project: CouchDB > Issue Type: Improvement >Reporter: Nick Vatamaniuc > > CouchDB 1.x has a configurable replicator database name. > CouchDB 2.x uses another pattern for having custom replicator databases -- it > scans files in the local database data directory for names matching {code} > "_replicator(\\.[0-9]{10,})?.couch$" {code}. So, for example, one can create a > database called {{"joe/_replicator"}} and it will be considered a replicator > database by the replication management code. This way one can even have multiple > replicator databases ( {{"mike/_replicator"}}, or {{"joe/other/_replicator"}} > ), so configuration is even more flexible than it was in 1.x. > Current code in couch_replicator_manager.erl is a mix of using the 1.x config > option and scanning recursively for db files with the _replicator pattern. It > also already assumes a hard-coded "_replicator" name in a few places: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L918 > The proposal is to deprecate _replicator db name configuration in order to > simplify and clean up the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
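To show what the file name pattern above accepts, here is a short Python sketch that applies the same regular expression (with the Erlang string escaping removed). The file paths are hypothetical examples, not taken from a real data directory, and `is_replicator_db_file` is a name made up for this illustration.

```python
import re

# Same pattern as in the description, written as a Python raw string.
REPLICATOR_DB_FILE = re.compile(r"_replicator(\.[0-9]{10,})?.couch$")


def is_replicator_db_file(path):
    """True if the file name ends like a replicator database file."""
    return REPLICATOR_DB_FILE.search(path) is not None


# Hypothetical file paths for illustration only.
for path in (
    "_replicator.couch",
    "shards/00000000-1fffffff/joe/_replicator.1456789012.couch",
    "shards/00000000-1fffffff/joe/other/_replicator.1456789012.couch",
    "shards/00000000-1fffffff/joe/mydb.1456789012.couch",
    "shards/00000000-1fffffff/joe/_replicator.123.couch",
):
    print(path, is_replicator_db_file(path))
```

Note that the optional group requires at least 10 digits, so a short numeric suffix does not match, and that the dot before `couch` is unescaped in the original pattern, so it matches any single character there.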
[jira] [Closed] (COUCHDB-2832) Task status test setup fails
[ https://issues.apache.org/jira/browse/COUCHDB-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2832. > Task status test setup fails > > > Key: COUCHDB-2832 > URL: https://issues.apache.org/jira/browse/COUCHDB-2832 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > Unit test couch_task_status_tests fails > {code} > $ reunit apps=couch tests=couch_task_status_test_ > ==> couch_log (eunit) > Running test function(s): > EUnit > There were no tests to run. > ==> couch (eunit) > Compiled test/couch_doc_json_tests.erl > Compiled test/couchdb_os_daemons_tests.erl > Running test function(s): > couch_task_status_tests:couch_task_status_test_/0 > EUnit > CouchDB task status updates > couch_task_status_tests:58: should_register_task...ok > couch_task_status_tests:62: should_set_task_startup_time...[0.002 s] ok > couch_task_status_tests:67: > should_have_update_time_as_startup_before_any_progress...ok > couch_task_status_tests:71: should_set_task_type...ok > couch_task_status_tests:75: > should_not_register_multiple_tasks_for_same_pid...ok > couch_task_status_tests:80: should_set_task_progress...ok > couch_task_status_tests:85: should_update_task_progress...*skipped* > undefined > *unexpected termination of test process* > ::{{badmatch,undefined}, >[{couch_log,debug,2,[{file,"src/couch_log.erl"},{line,32}]}, > {couch_task_status,handle_cast,2, >[{file,"src/couch_task_status.erl"},{line,137}]}, > {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]}, > {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]}, > {proc_lib,init_p_do_apply,3,[{file,[...]},{line,...}]}]} > === > Failed: 0. Skipped: 0. Passed: 6. > One or more tests were cancelled. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2963) Replication manager does not rescan databases on cluster membership change
[ https://issues.apache.org/jira/browse/COUCHDB-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2963. > Replication manager does not rescan databases on cluster membership change > -- > > Key: COUCHDB-2963 > URL: https://issues.apache.org/jira/browse/COUCHDB-2963 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc >Assignee: Nick Vatamaniuc > Fix For: 2.0.0 > > > Replication manager should rescan all replicator databases on cluster > membership changes from sequence 0, in order to possibly pick up new > replication it might be an owner of. > On receipt of nodedown or nodeup message, replication manager attempts to > start a new scan by resetting the checkpointed sequence IDs ets table. With > the intent that change feeds will exit and then check if they need to rescan > again. However because change feeds used for the replicator databases are > "continuous" they never exit, so consequently they never get a chance start > rescanning from 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2959) Deadlock condition in replicator with remote source and configured 1 http connection
[ https://issues.apache.org/jira/browse/COUCHDB-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2959. > Deadlock condition in replicator with remote source and configured 1 http > connection > > > Key: COUCHDB-2959 > URL: https://issues.apache.org/jira/browse/COUCHDB-2959 > Project: CouchDB > Issue Type: Bug > Components: Replication >Reporter: Nick Vatamaniuc > Attachments: rep.py > > > A deadlock can occur that causes starting replications to get stuck > (and never update their state to triggered). This happens with a remote > source and when using a single http connection and a single worker. > The deadlock occurs in this case: > - Replication process starts, it starts the changes reader: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator.erl#L276 > - Changes reader consumes the worker from httpc pool. At some point it will > make a call back to the replication process to report how much work it has > done using gen_server call {{report_seq_done}} > - In the meantime, main replication process calls {{get_pending_changes}} to > get changes from the source. If the source is remote it will attempt to > consume a worker from httpc pool. However the worker is used by the change > feed process. So get_pending_changes is blocked waiting for a worker to be > released. > - So changes feed is waiting for report_seq_done call to replication process > to return while holding a worker and main replication process is waiting for > httpc pool to release the worker and it never responds to report_seq_done. > Attached python script (rep.py) to reproduce the issue. The script creates n > databases (tested with n=1000), then replicates those databases to 1 single > database. It also needs the Python CouchDB module from pip (or package repos). > 1. It can be run from ipython by importing {{rep}}. > 2. start dev cluster {{./dev/run --admin=adm:pass}} > 3. 
{{rep.replicate_1_to_n(1000)}} > wait > 4. {{rep.check_untriggered()}} > When it fails, result might look like this: > {code} > { > 'rdyno_1_6': None, > 'rdyno_1_00158': None > } > {code} > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
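The wait cycle described in the bullet points above can be modelled with plain Python threads. This is a toy model, not replicator code: a semaphore stands in for the httpc pool, a queue stands in for the replication process mailbox, and timeouts are added (an assumption of this sketch) so that the stuck case terminates instead of hanging forever.

```python
import queue
import threading


def simulate(pool_size, timeout=0.2):
    """Return True if the changes reader's report_seq_done call gets answered."""
    pool = threading.Semaphore(pool_size)  # stand-in for the httpc worker pool
    calls = queue.Queue()                  # stand-in for the gen_server mailbox
    holding = threading.Event()
    result = {}

    def changes_reader():
        pool.acquire()                     # change feed holds a worker
        holding.set()
        answered = threading.Event()
        calls.put(answered)                # models the report_seq_done call
        result["answered"] = answered.wait(timeout)
        pool.release()

    def replication_process():
        holding.wait()
        # Models get_pending_changes: needs a worker before serving any calls.
        if pool.acquire(timeout=2 * timeout):
            pool.release()
        try:
            calls.get(timeout=timeout).set()
        except queue.Empty:
            pass

    threads = [threading.Thread(target=f) for f in (changes_reader, replication_process)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["answered"]


print("1 connection: ", simulate(1))
print("2 connections:", simulate(2))
```

With a single worker the reader's call is only served after the reader has given up and released the worker; without the timeout neither side would ever proceed. With two workers the call is answered right away.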
[jira] [Resolved] (COUCHDB-2959) Deadlock condition in replicator with remote source and configured 1 http connection
[ https://issues.apache.org/jira/browse/COUCHDB-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-2959. -- Resolution: Fixed > Deadlock condition in replicator with remote source and configured 1 http > connection > > > Key: COUCHDB-2959 > URL: https://issues.apache.org/jira/browse/COUCHDB-2959 > Project: CouchDB > Issue Type: Bug > Components: Replication >Reporter: Nick Vatamaniuc > Attachments: rep.py > > > A deadlock can occur that causes starting replications to get stuck > (and never update their state to triggered). This happens with a remote > source and when using a single http connection and a single worker. > The deadlock occurs in this case: > - Replication process starts, it starts the changes reader: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator.erl#L276 > - Changes reader consumes the worker from httpc pool. At some point it will > make a call back to the replication process to report how much work it has > done using gen_server call {{report_seq_done}} > - In the meantime, main replication process calls {{get_pending_changes}} to > get changes from the source. If the source is remote it will attempt to > consume a worker from httpc pool. However the worker is used by the change > feed process. So get_pending_changes is blocked waiting for a worker to be > released. > - So changes feed is waiting for report_seq_done call to replication process > to return while holding a worker and main replication process is waiting for > httpc pool to release the worker and it never responds to report_seq_done. > Attached python script (rep.py) to reproduce the issue. The script creates n > databases (tested with n=1000), then replicates those databases to 1 single > database. It also needs the Python CouchDB module from pip (or package repos). > 1. It can be run from ipython by importing {{rep}}. > 2. start dev cluster {{./dev/run --admin=adm:pass}} > 3. 
{{rep.replicate_1_to_n(1000)}} > wait > 4. {{rep.check_untriggered()}} > When it fails, result might look like this: > {code} > { > 'rdyno_1_6': None, > 'rdyno_1_00158': None > } > {code} > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
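The circular wait described in COUCHDB-2959 can be modelled outside of Erlang. The sketch below is only an illustration under stated assumptions: the httpc pool is represented by a semaphore, the {{report_seq_done}} call by an event wait, and timeouts stand in for "blocked forever" so that the script terminates. The function and variable names are hypothetical and do not come from the replicator code.

```python
import threading

def simulate(pool_size, timeout=0.1):
    """Model the changes reader and the main replication process sharing a pool.

    Returns a dict saying whether each side made progress. Timeouts are an
    artifact of the model; in the real system both sides would wait forever.
    """
    pool = threading.Semaphore(pool_size)       # stands in for the httpc pool
    reader_holds_worker = threading.Event()
    reply_sent = threading.Event()              # reply to report_seq_done
    result = {}

    def changes_reader():
        pool.acquire()                          # reader takes a worker
        reader_holds_worker.set()
        # Synchronous call back to the main process while holding the worker.
        result["reader_got_reply"] = reply_sent.wait(3 * timeout)
        pool.release()

    def replication_process():
        reader_holds_worker.wait()
        # get_pending_changes needs its own worker when the source is remote.
        got_worker = pool.acquire(timeout=timeout)
        result["main_got_worker"] = got_worker
        if got_worker:
            pool.release()
            reply_sent.set()                    # only now can it answer the reader

    threads = [threading.Thread(target=changes_reader),
               threading.Thread(target=replication_process)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

one_connection = simulate(pool_size=1)
two_connections = simulate(pool_size=2)
```

With a pool of one, neither side makes progress in this model; with a second worker available the main process can fetch pending changes and then answer the reader.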
[jira] [Resolved] (COUCHDB-2954) Deprecate configurable _replicator db name in 2.0
[ https://issues.apache.org/jira/browse/COUCHDB-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-2954. -- Resolution: Fixed > Deprecate configurable _replicator db name in 2.0 > - > > Key: COUCHDB-2954 > URL: https://issues.apache.org/jira/browse/COUCHDB-2954 > Project: CouchDB > Issue Type: Improvement >Reporter: Nick Vatamaniuc > > CouchDB 1.x has a configurable replicator database name. > CouchDB 2.x uses another pattern for having custom replicator databases -- it > scans files in local database data directory for patterns matching {code} > "_replicator(\\.[0-9]{10,})?.couch$" {code}. So, for example, one can create a > database called {{"joe/_replicator"}} and it will be considered a replicator > database by the replication management code. This way one can even have multiple > replicator databases ( {{"mike/_replicator"}}, or {{"joe/other/_replicator"}} > ), so configuration is even more flexible than it was in 1.x. > Current code in couch_replicator_manager.erl is a mix of using the 1.x config > option and scanning recursively for db files with _replicator pattern. It > already also assumes a hard-coded "_replicator" name in a few places: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L918 > The proposal is to deprecate _replicator db name configuration in order to > simplify and clean up the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
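To see which file names the pattern quoted in COUCHDB-2954 accepts, it can be tried in Python, whose regex syntax agrees with Erlang's for this expression. This is a sketch only: the pattern is taken from the issue text (with the doubled backslash of the Erlang string literal reduced to one), while the helper name and the sample paths are made up for illustration.

```python
import re

# Pattern as quoted in the issue; the "." before "couch" is unescaped there,
# so it is left unescaped here as well.
REPLICATOR_DB_FILE = re.compile(r"_replicator(\.[0-9]{10,})?.couch$")

def is_replicator_db_file(path):
    """Return True if a database file path would be treated as a replicator db."""
    return REPLICATOR_DB_FILE.search(path) is not None

# Hypothetical file paths, relative to the database data directory.
candidates = [
    "_replicator.couch",
    "joe/_replicator.couch",
    "joe/other/_replicator.couch",
    "shards/00000000-1fffffff/mike/_replicator.1460972107.couch",
    "joe/_replicator.123.couch",   # numeric suffix shorter than 10 digits
    "joe/replicator.couch",        # no leading underscore
]
matches = {path: is_replicator_db_file(path) for path in candidates}
```

Because the pattern is not anchored on the left, any path whose final component ends in `_replicator` (optionally followed by a numeric suffix of ten or more digits) is accepted.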
[jira] [Resolved] (COUCHDB-2988) Allow query selector as changes and replication filter
[ https://issues.apache.org/jira/browse/COUCHDB-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-2988. -- Resolution: Fixed > Allow query selector as changes and replication filter > -- > > Key: COUCHDB-2988 > URL: https://issues.apache.org/jira/browse/COUCHDB-2988 > Project: CouchDB > Issue Type: Improvement > Components: Database Core, Mango >Reporter: Nick Vatamaniuc > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2979) Replicator manager attempts to checkpoint too frequently
[ https://issues.apache.org/jira/browse/COUCHDB-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2979. > Replicator manager attempts to checkpoint too frequently > > > Key: COUCHDB-2979 > URL: https://issues.apache.org/jira/browse/COUCHDB-2979 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > Current checkpoint interval is set to 5 seconds. That works well for a few > replications but when there are thousands of them it ends up being an attempt > every few milliseconds or so. > Moreover, to decide on ownership (in order to keep one replication running per > cluster) each replication during an attempted checkpoint uses a gen_server > call to replicator manager. Those usually are fast (I benchmarked them at > 100-200 usec); however, if replicator manager is busy (say stuck fetching large > filter documents when computing replication ids), none of the replications > would be able to checkpoint and make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
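The "every few milliseconds" figure in COUCHDB-2979 follows from simple arithmetic, worked through below. The sketch assumes checkpoint attempts are spread evenly over the interval and uses 2000 replications as a stand-in for "thousands"; both are assumptions for illustration, and the function names are hypothetical.

```python
def mean_gap_between_attempts_ms(checkpoint_interval_ms, replication_count):
    """Average time between checkpoint attempts seen by the replicator manager."""
    return checkpoint_interval_ms / replication_count

def manager_busy_fraction(call_usec, checkpoint_interval_ms, replication_count):
    """Share of each interval the manager spends answering ownership calls."""
    busy_usec = call_usec * replication_count
    return busy_usec / (checkpoint_interval_ms * 1000)

few = mean_gap_between_attempts_ms(5000, 10)       # 10 replications
many = mean_gap_between_attempts_ms(5000, 2000)    # 2000 replications
busy = manager_busy_fraction(200, 5000, 2000)      # at 200 usec per call
```

Under these assumptions ten replications produce an attempt every 500 ms, while 2000 produce one every 2.5 ms, and the ownership calls alone occupy about 8% of the manager's time even when each call is fast.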
[jira] [Closed] (COUCHDB-2988) Allow query selector as changes and replication filter
[ https://issues.apache.org/jira/browse/COUCHDB-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2988. > Allow query selector as changes and replication filter > -- > > Key: COUCHDB-2988 > URL: https://issues.apache.org/jira/browse/COUCHDB-2988 > Project: CouchDB > Issue Type: Improvement > Components: Database Core, Mango >Reporter: Nick Vatamaniuc > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-3006) Source failure in one source to many target replications causes a stampede
[ https://issues.apache.org/jira/browse/COUCHDB-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-3006. > Source failure in one source to many target replications causes a stampede > -- > > Key: COUCHDB-3006 > URL: https://issues.apache.org/jira/browse/COUCHDB-3006 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > This affects multiple replications from a single source to multiple targets. If the source > fails, all replications post an error state back to their replication document > and attempt to restart. This creates a stampede effect and causes sharp load > spikes on the replication cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-3039) Inconsistent behavior with _all_docs handling of null keys between CouchDB 1.x and 2.x
[ https://issues.apache.org/jira/browse/COUCHDB-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-3039. -- Resolution: Fixed > Inconsistent behavior with _all_docs handling of null keys between > CouchDB 1.x and 2.x > --- > > Key: COUCHDB-3039 > URL: https://issues.apache.org/jira/browse/COUCHDB-3039 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > CouchDB 1.x, in a POST request to _all_docs where a key is null, will return an error > row: > {code} > { >"total_rows": 14970916, >"offset": 0, >"rows": [ > { > "key": null, > "error": "not_found" > }, > ... other valid rows ... > ] > } > {code} > CouchDB 2.0 will return a 400 error > {code} > HTTP/1.1 400 Bad Request > { > "error": "illegal_docid", > "reason": null > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-3082) Replicator manager crashes in terminate/2 if initial change feed spawned for _replicate hasn't finished
[ https://issues.apache.org/jira/browse/COUCHDB-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-3082. > Replicator manager crashes in terminate/2 if initial change feed spawned for > _replicate hasn't finished > --- > > Key: COUCHDB-3082 > URL: https://issues.apache.org/jira/browse/COUCHDB-3082 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > During init we spawn a change feed for the _replicator db and assign > rep_start_pids = [Pid]. However the shape of rep_start_pids should be {Tag, > Pid}. In terminate/2 we clean up by doing: > {code} > lists:foreach( > fun({_Tag, Pid}) -> > ... > [{scanner, ScanPid} | StartPids]), > {code} > > Which ends up crashing with a function clause because we expect foreach > function to get a tuple of 2 items. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
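The shape mismatch in COUCHDB-3082 can be illustrated with a rough Python analogue. This is not the Erlang code: tuple unpacking in a `for` loop stands in for the `fun({_Tag, Pid})` clause, a `TypeError` stands in for the function clause error, and the function name and integer "pids" are hypothetical.

```python
def terminate(scan_pid, rep_start_pids):
    """Visit every (tag, pid) pair, as the cleanup in terminate/2 expects to."""
    stopped = []
    for _tag, pid in [("scanner", scan_pid)] + rep_start_pids:
        stopped.append(pid)          # stand-in for stopping the process
    return stopped

# Expected shape: a list of (tag, pid) pairs.
ok = terminate(1, [("changes", 2)])

# Shape produced during init in the bug report: a bare pid in the list.
try:
    terminate(1, [2])
    crashed = False
except TypeError:
    crashed = True
```

The first call succeeds, while the second fails as soon as the loop reaches the bare pid, which mirrors how the Erlang clause cannot match anything that is not a 2-tuple.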
[jira] [Commented] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports
[ https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405193#comment-15405193 ] Nick Vatamaniuc commented on COUCHDB-2980: -- [~chrisfosterelli] Interesting points. Thinking more about this, it seems it is hard for a node in a cluster to know the host of the cluster in general. Say a cluster is behind a proxy for fault tolerance; after the document is added to a replicator db, I can't see how it would know what the external cluster host would be. Say database {{a}} means "https://user:p...@mycluster.com/a; or "http://user:p...@user.somecluster.net/a; for example. In case of { > Replicator DB on 15984 replicates to backdoor ports > --- > > Key: COUCHDB-2980 > URL: https://issues.apache.org/jira/browse/COUCHDB-2980 > Project: CouchDB > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0 >Reporter: Robert Kowalski >Priority: Blocker > > If you POST a doc into the replicator database a replication is kicked off > and finishes successfully (usual 5984 port which maps to 15984 via haproxy). > The problem is that the DB is replicated to the backdoor ports (15986) and is > not visible on the other ports. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3082) Replicator manager crashes in terminate/2 if initial change feed spawned for _replicate hasn't finished
Nick Vatamaniuc created COUCHDB-3082: Summary: Replicator manager crashes in terminate/2 if initial change feed spawned for _replicate hasn't finished Key: COUCHDB-3082 URL: https://issues.apache.org/jira/browse/COUCHDB-3082 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc During init we spawn a change feed for the _replicator db and assign rep_start_pids = [Pid]. However the shape of rep_start_pids should be {Tag, Pid}. In terminate/2 we clean up by doing: lists:foreach( fun({_Tag, Pid}) -> ... [{scanner, ScanPid} | StartPids]), Which ends up crashing with a function clause because we expect foreach function to get a tuple of 2 items. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3046) Improve reduce function overflow protection
Nick Vatamaniuc created COUCHDB-3046: Summary: Improve reduce function overflow protection Key: COUCHDB-3046 URL: https://issues.apache.org/jira/browse/COUCHDB-3046 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Nick Vatamaniuc The protection algorithm: https://github.com/apache/couchdb/blob/master/share/server/views.js#L36-L41 When enabled, looks at couchjs' reduce command input and output line lengths (as stringified json). If 2*len(output) > len(input) and len(output) > 200 then an error is triggered. There are a few issues in that scheme: * Input line contains the length of the reduce function code itself. A large reduce function body (say 100KB) might lead to failure to trip the error. * On the other hand, the output size checking threshold is too small = 200. It prevents functions using a single large accumulator object (say with fields like .sum, .count, .stddev, and so on) from working. The size of output will be > 200 and, even though it won't be growing, the function will still be prevented from running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
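Both problems listed in COUCHDB-3046 can be reproduced with a small model of the check. The sketch below only restates the condition given in the issue (2*len(output) > len(input) and len(output) > 200) in Python; it is not the views.js code, the function name is hypothetical, and the line lengths are made-up sizes chosen to hit each case.

```python
def reduce_overflow_tripped(input_line, output_line, min_output_len=200):
    """Apply the length heuristic described in the issue to one reduce call."""
    return (2 * len(output_line) > len(input_line)
            and len(output_line) > min_output_len)

# Case 1: a 100KB reduce function body inflates the input line, so output
# that is far larger than the actual values does not trip the check.
big_function_input = "f" * 100_000 + "v" * 500    # function source + values
runaway_output = "o" * 20_000
missed = reduce_overflow_tripped(big_function_input, runaway_output)

# Case 2: a fixed-size accumulator object (sum, count, stddev, ...) of 250
# characters trips the check even though it never grows.
small_input = "v" * 300
accumulator_output = "a" * 250
false_alarm = reduce_overflow_tripped(small_input, accumulator_output)
```

In the first case the check stays silent, since 2 * 20,000 is well below the 100,500-character input; in the second it fires, since 2 * 250 exceeds 300 and 250 exceeds the 200-character threshold.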
[jira] [Created] (COUCHDB-3039) Inconsistent behavior with _all_docs handling of null keys between CouchDB 1.x and 2.x
Nick Vatamaniuc created COUCHDB-3039: Summary: Inconsistent behavior with _all_docs handling of null keys between CouchDB 1.x and 2.x Key: COUCHDB-3039 URL: https://issues.apache.org/jira/browse/COUCHDB-3039 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc CouchDB 1.x, in a POST request to _all_docs where a key is null, will return an error row: {code} { "total_rows": 14970916, "offset": 0, "rows": [ { "key": null, "error": "not_found" }, ... other valid rows ... ] } {code} CouchDB 2.0 will return a 400 error {code} HTTP/1.1 400 Bad Request { "error": "illegal_docid", "reason": null } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (COUCHDB-2965) Race condition in replicator rescan logic
[ https://issues.apache.org/jira/browse/COUCHDB-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc closed COUCHDB-2965. > Race condition in replicator rescan logic > - > > Key: COUCHDB-2965 > URL: https://issues.apache.org/jira/browse/COUCHDB-2965 > Project: CouchDB > Issue Type: Bug > Components: Replication >Reporter: Nick Vatamaniuc > > There is a race condition between the full rescan and regular change feed > processing in the couch_replicator_manager code. > This race condition would lead to replication docs left in untriggered state > when a rescan of all the docs is performed. The rescan might happen when > nodes connect and disconnect. The likelihood of this race condition appearing > goes up if a lot of documents are updated and there is a back-up of messages > in the replicator manager's mailbox. > The race condition happens in the following way: > * A full rescan is initiated here: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L424 > It clears the db_to_seq ets table which holds the latest change sequence for > each replicator database. Then launches a scan_all_dbs process. > * scan_all_dbs will find all replicator-looking databases and for each > send a \{resume_scan, DbName\} message to the main couch_replicator_manager > process. > * \{resume_scan, DbName\} message is handled here: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L233 > The expectation is that, because db_to_seq was reset, it ends up not finding a > sequence checkpoint in db_to_seq, so it starts from 0 and spawns a new change feed, > which will rescan all documents (since we need to determine ownership for > them). > But the race condition occurs because when change feeds stop, they call > replicator manager with \{ rep_db_checkpoint, DbName \} message. 
That updates > db_to_seq ets table with the latest change sequence: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L225 > Which means this sequence of operations could happen: > * db_to_seq is reset to 0, scan_all_dbs is spawned > * change feed stops at sequence 1042, it calls \{rep_db_checkpoint, > <<"_replicator">>\} > * \{rep_db_checkpoint, <<"_replicator">>\} call is handled, now latest > db_to_seq for _replicator is 1042 > * \{resume_scan, <<"_replicator">>\} is sent from scan_all_dbs process and > received by replicator manager. It sees that db_to_seq has _replicator with > latest sequence 1042, so it will start from that instead of 0, thus > skipping updates from 0 to 1042. > This was seen by running an experiment in which 1000 replication documents were > being updated. Around document 700 or so, node1 was killed (pkill -f node1). > node2 experienced the race condition on rescan and never picked up a bunch > of documents that should have belonged to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
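The interleaving described in COUCHDB-2965 can be replayed deterministically with a toy model of the replicator manager's mailbox. This is a sketch under stated assumptions: a plain dict stands in for the db_to_seq ets table, messages are handled strictly in list order, and the message tuples and the function name are simplified stand-ins rather than the real Erlang terms.

```python
def replay(messages):
    """Handle messages in order; return the sequence each change feed starts from."""
    db_to_seq = {}        # stands in for the db_to_seq ets table
    feed_starts = {}
    for msg, db, *args in messages:
        if msg == "rescan":
            db_to_seq.clear()                    # full rescan resets checkpoints
        elif msg == "rep_db_checkpoint":
            db_to_seq[db] = args[0]              # a stopping feed reports its seq
        elif msg == "resume_scan":
            feed_starts[db] = db_to_seq.get(db, 0)
    return feed_starts

# Order the code expects: reset, then resume from 0.
expected_order = replay([
    ("rescan", None),
    ("resume_scan", "_replicator"),
])

# Racy order from the report: a checkpoint lands between reset and resume.
racy_order = replay([
    ("rescan", None),
    ("rep_db_checkpoint", "_replicator", 1042),
    ("resume_scan", "_replicator"),
])
```

In the expected order the new change feed starts at 0 and revisits every document; in the racy order it starts at 1042, so documents changed before that sequence are never re-examined for ownership.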
[jira] [Resolved] (COUCHDB-2965) Race condition in replicator rescan logic
[ https://issues.apache.org/jira/browse/COUCHDB-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-2965. -- Resolution: Fixed > Race condition in replicator rescan logic > - > > Key: COUCHDB-2965 > URL: https://issues.apache.org/jira/browse/COUCHDB-2965 > Project: CouchDB > Issue Type: Bug > Components: Replication >Reporter: Nick Vatamaniuc > > There is a race condition between the full rescan and regular change feed > processing in the couch_replicator_manager code. > This race condition would lead to replication docs left in untriggered state > when a rescan of all the docs is performed. The rescan might happen when > nodes connect and disconnect. The likelihood of this race condition appearing > goes up if a lot of documents are updated and there is a back-up of messages > in the replicator manager's mailbox. > The race condition happens in the following way: > * A full rescan is initiated here: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L424 > It clears the db_to_seq ets table which holds the latest change sequence for > each replicator database. Then launches a scan_all_dbs process. > * scan_all_dbs will find all replicator-looking databases and for each > send a \{resume_scan, DbName\} message to the main couch_replicator_manager > process. > * \{resume_scan, DbName\} message is handled here: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L233 > The expectation is that, because db_to_seq was reset, it ends up not finding a > sequence checkpoint in db_to_seq, so it starts from 0 and spawns a new change feed, > which will rescan all documents (since we need to determine ownership for > them). > But the race condition occurs because when change feeds stop, they call > replicator manager with \{ rep_db_checkpoint, DbName \} message. 
That updates > db_to_seq ets table with the latest change sequence: > https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L225 > Which means this sequence of operations could happen: > * db_to_seq is reset to 0, scan_all_dbs is spawned > * change feed stops at sequence 1042, it calls \{rep_db_checkpoint, > <<"_replicator">>\} > * \{rep_db_checkpoint, <<"_replicator">>\} call is handled, now latest > db_to_seq for _replicator is 1042 > * \{resume_scan, <<"_replicator">>\} is sent from scan_all_dbs process and > received by replicator manager. It sees that db_to_seq has _replicator with > latest sequence 1042, so it will start from that instead of 0, thus > skipping updates from 0 to 1042. > This was seen by running an experiment in which 1000 replication documents were > being updated. Around document 700 or so, node1 was killed (pkill -f node1). > node2 experienced the race condition on rescan and never picked up a bunch > of documents that should have belonged to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-3006) Source failure in one source to many target replications causes a stampede
Nick Vatamaniuc created COUCHDB-3006: Summary: Source failure in one source to many target replications causes a stampede Key: COUCHDB-3006 URL: https://issues.apache.org/jira/browse/COUCHDB-3006 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc This affects multiple replications from a single source to multiple targets. If the source fails, all replications post an error state back to their replication document and attempt to restart. This creates a stampede effect and causes sharp load spikes on the replication cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2631) Ensure that system databases callbacks are adds correctly for shared case
[ https://issues.apache.org/jira/browse/COUCHDB-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248545#comment-15248545 ] Nick Vatamaniuc commented on COUCHDB-2631: -- {code} IsReplicatorDb = DbName == config:get("replicator", "db", "_replicator") {code} Doesn't apply to 2.x anymore. Local replicator is always "_replicator". Also it expects binaries. {code} (node1@127.0.0.1)4> couch_db:normalize_dbname("shards/-1fff/_users.1460972107"). "shards/-1fff/_users.1460972107" (node1@127.0.0.1)5> couch_db:normalize_dbname("shards/-1fff/_users"). "shards/-1fff/_users" (node1@127.0.0.1)6> couch_db:normalize_dbname(<<"shards/-1fff/_users">>). <<"_users">> (node1@127.0.0.1)7> couch_db:normalize_dbname(<<"shards/-1fff/_users.134565677">>). <<"_users">> {code} [~eiri] pointed to this PR that should handle this issue https://github.com/apache/couchdb-couch/pull/160 > Ensure that system databases callbacks are adds correctly for shared case > - > > Key: COUCHDB-2631 > URL: https://issues.apache.org/jira/browse/COUCHDB-2631 > Project: CouchDB > Issue Type: Bug > Components: BigCouch >Reporter: Alexander Shorin >Priority: Blocker > Labels: needs-pr > Fix For: 2.0.0 > > > We have the following code in > [couch_server|https://github.com/apache/couchdb-couch/blob/master/src/couch_server.erl#L119-L143] > {code} > maybe_add_sys_db_callbacks(DbName, Options) when is_binary(DbName) -> > maybe_add_sys_db_callbacks(?b2l(DbName), Options); > maybe_add_sys_db_callbacks(DbName, Options) -> > DbsDbName = config:get("mem3", "shard_db", "dbs"), > NodesDbName = config:get("mem3", "node_db", "nodes"), > IsReplicatorDb = DbName == config:get("replicator", "db", "_replicator") > orelse > path_ends_with(DbName, <<"_replicator">>), > IsUsersDb = DbName ==config:get("couch_httpd_auth", "authentication_db", > "_users") orelse > path_ends_with(DbName, <<"_users">>), > if > DbName == DbsDbName -> > [sys_db | Options]; > DbName == NodesDbName -> > [sys_db | Options]; > 
IsReplicatorDb -> > [{before_doc_update, fun > couch_replicator_manager:before_doc_update/2}, >{after_doc_read, fun couch_replicator_manager:after_doc_read/2}, >sys_db | Options]; > IsUsersDb -> > [{before_doc_update, fun couch_users_db:before_doc_update/2}, >{after_doc_read, fun couch_users_db:after_doc_read/2}, >sys_db | Options]; > true -> > Options > end. > {code} > Which works perfectly except if system database is clustered. So, for shared > _users and _replicator the check condition will not work since shared > databases ends with timestamp and full name looks as > "shards/-1fff/_users.1424979962" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2834) Server sends connection: close too early
[ https://issues.apache.org/jira/browse/COUCHDB-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247182#comment-15247182 ] Nick Vatamaniuc commented on COUCHDB-2834: -- Just noticed email from JIRA. Will have PR ready by tomorrow. > Server sends connection: close too early > > > Key: COUCHDB-2834 > URL: https://issues.apache.org/jira/browse/COUCHDB-2834 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc >Priority: Blocker > Labels: has-pr > Fix For: 2.0.0 > > > This is related to COUCHDB-2833. > This was found while investigating the failure of replication tests. Specifically > couch_replicator_large_atts_tests, the {local, remote} sub-case. > The test sets up push replications from local to remote. > Replication workers have more than 1 document larger than > MAX_BULK_ATT_SIZE=64K. They start pushing them to the target using a > keep-alive connection (default for HTTP 1.1). The first few pipelined > requests will go through using the same connection; then the server will accept > the first PUT to …/docid?edits=false, then return Connection:close and close > the connection after the 201 Created result. > Server should not close the connection too early and should instead keep it open longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-2988) Allow query selector as changes and replication filter
Nick Vatamaniuc created COUCHDB-2988: Summary: Allow query selector as changes and replication filter Key: COUCHDB-2988 URL: https://issues.apache.org/jira/browse/COUCHDB-2988 Project: CouchDB Issue Type: Improvement Components: Database Core, Mango Reporter: Nick Vatamaniuc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-2987) Mango Python tests failure
Nick Vatamaniuc created COUCHDB-2987: Summary: Mango Python tests failure Key: COUCHDB-2987 URL: https://issues.apache.org/jira/browse/COUCHDB-2987 Project: CouchDB Issue Type: Bug Components: Mango Reporter: Nick Vatamaniuc Saw this test failure running mango's test suite: {code} $ nosetests S...SF..SSS.S.S...SSS == FAIL: test_empty_subsel_match (02-basic-find-test.BasicFindTests) -- Traceback (most recent call last): File "/Users/nvatama/asf/couchdb/src/mango/test/02-basic-find-test.py", line 256, in test_empty_subsel_match assert len(docs) == 1 AssertionError: >> begin captured logging << requests.packages.urllib3.connectionpool: DEBUG: "POST /mango_test_b7fb2baf897741a288e8174971ef388c/_bulk_docs HTTP/1.1" 201 97 requests.packages.urllib3.connectionpool: DEBUG: "POST /mango_test_b7fb2baf897741a288e8174971ef388c/_find HTTP/1.1" 200 None - >> end captured logging << - -- Ran 137 tests in 51.613s FAILED (SKIP=90, failures=1) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports
[ https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc updated COUCHDB-2980: - Comment: was deleted (was: We should probably disallow "local" replications from being accepted in source and target of replication doc. Those end up as "local" databases (like say _users, _nodes, _dbs) don't do what is expected. To make things more interesting, for the _replicate http endpoint we do some hacks to turn a local db into a full url: https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd.erl#L389 But that is running inside the context of a http request so it easy to access to authorization headers and such. ) > Replicator DB on 15984 replicates to backdoor ports > --- > > Key: COUCHDB-2980 > URL: https://issues.apache.org/jira/browse/COUCHDB-2980 > Project: CouchDB > Issue Type: Bug > Components: Replication >Reporter: Robert Kowalski > > If you POST a doc into the replicator database a replication is kicked off > and finishes successfully (usual 5984 port which maps to 15984 via haproxy). > The problem is that the DB is replicated to the backdoor ports (15986) and is > not visible on the other ports. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports
[ https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227328#comment-15227328 ] Nick Vatamaniuc commented on COUCHDB-2980: -- We should probably disallow "local" replications from being accepted in source and target of replication doc. Those end up as "local" databases (like, say, _users, _nodes, _dbs) and don't do what is expected. To make things more interesting, for the _replicate http endpoint we do some hacks to turn a local db into a full url: https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd.erl#L389 But that is running inside the context of an http request, so it is easy to access authorization headers and such. > Replicator DB on 15984 replicates to backdoor ports > --- > > Key: COUCHDB-2980 > URL: https://issues.apache.org/jira/browse/COUCHDB-2980 > Project: CouchDB > Issue Type: Bug > Components: Replication >Reporter: Robert Kowalski > > If you POST a doc into the replicator database a replication is kicked off > and finishes successfully (usual 5984 port which maps to 15984 via haproxy). > The problem is that the DB is replicated to the backdoor ports (15986) and is > not visible on the other ports. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COUCHDB-2979) Replicator manager attempts to checkpoint too frequently
[ https://issues.apache.org/jira/browse/COUCHDB-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Vatamaniuc resolved COUCHDB-2979. -- Resolution: Fixed > Replicator manager attempts to checkpoint too frequently > > > Key: COUCHDB-2979 > URL: https://issues.apache.org/jira/browse/COUCHDB-2979 > Project: CouchDB > Issue Type: Bug >Reporter: Nick Vatamaniuc > > Current checkpoint interval is set to 5 seconds. That works well for a few > replications but when there are thousands of them it ends up being an attempt > every few milliseconds or so. > Moreover, to decide on ownership (in order to keep one replication running per > cluster) each replication during an attempted checkpoint uses a gen_server > call to replicator manager. Those usually are fast (I benchmarked them at > 100-200 usec); however, if replicator manager is busy (say stuck fetching large > filter documents when computing replication ids), none of the replications > would be able to checkpoint and make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (COUCHDB-2979) Replicator manager attempts to checkpoint too frequently
Nick Vatamaniuc created COUCHDB-2979: Summary: Replicator manager attempts to checkpoint too frequently Key: COUCHDB-2979 URL: https://issues.apache.org/jira/browse/COUCHDB-2979 Project: CouchDB Issue Type: Bug Reporter: Nick Vatamaniuc Current checkpoint interval is set to 5 seconds. That works well for a few replications but when there are thousands of them it ends up being an attempt every few milliseconds or so. Moreover, to decide on ownership (in order to keep one replication running per cluster) each replication during an attempted checkpoint uses a gen_server call to replicator manager. Those usually are fast (I benchmarked them at 100-200 usec); however, if replicator manager is busy (say stuck fetching large filter documents when computing replication ids), none of the replications would be able to checkpoint and make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COUCHDB-2975) Automatically restart replication jobs if they crash
[ https://issues.apache.org/jira/browse/COUCHDB-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211933#comment-15211933 ] Nick Vatamaniuc commented on COUCHDB-2975: -- Noticed transient mode does not clean up child specs after it is done, even if the exit is normal. The intent behind that is to let users restart children. From the Erlang docs: {{If the child is temporary, the child specification is deleted as soon as the process terminates. This means that delete_child/2 has no meaning, and restart_child/2 can not be used for these children.}} However, in our code we sometimes explicitly delete the child: {code} cancel_replication({BaseId, Extension}) -> ... case supervisor:terminate_child(couch_replicator_job_sup, FullRepId) of ok -> ... case supervisor:delete_child(couch_replicator_job_sup, FullRepId) of ok -> {ok, {cancelled, ?l2b(FullRepId)}}; ... {code} That would make it seem as if supervisor auto-deleted the child spec in some cases. To test that it doesn't, start a normal replication (not a continuous one) and then, after it is finished, inspect the state of {{couch_replicator_job_sup}}. 
An example of state from supervisor after 10 replication have finished on a cluster: {code} {state, {local,couch_replicator_job_sup}, one_for_one, [{child,undefined,"ac35738f5003c02b6780116fdf04b524", {gen_server,start_link, [couch_replicator, {rep, {"ac35738f5003c02b6780116fdf04b524",[]}, {httpdb,"http://adm:pass@localhost:5984/rdyno_src_0001/;, nil, [{"Accept","application/json"}, {"User-Agent","CouchDB-Replicator/5fa9098"}], 20, [{socket_options,[{keepalive,true},{nodelay,false}]}], 1,250,nil,1}, {httpdb,"http://adm:pass@localhost:5984/rdyno_tgt_0009/;, nil, [{"Accept","application/json"}, {"User-Agent","CouchDB-Replicator/5fa9098"}], 20, [{socket_options,[{keepalive,true},{nodelay,false}]}], 1,250,nil,1}, [{checkpoint_interval,5000}, {connection_timeout,20}, {continuous,false}, {http_connections,1}, {retries,1}, {socket_options,[{keepalive,true},{nodelay,false}]}, {use_checkpoints,true}, {worker_batch_size,500}, {worker_processes,1}], {user_ctx,null,[],undefined}, db,nil, <<"rdyno_0001"...(15 B)>>, <<"shards/a00"...(47 B)>>}, [{timeout,20}]]}, transient,250,worker, [couch_replicator]}, {child,undefined,"6c48c1ab7a6e3ed5e3d4415ced912e4a", {gen_server,start_link, [couch_replicator, {rep, {"6c48c1ab7a6e3ed5e3d4415ced912e4a",[]}, {httpdb,"http://adm:pass@localhost:5984/rdyno_src_0001/;, nil, [{"Accept","application/json"}, {"User-Agent","CouchDB-Replicator/5fa9098"}], 20, [{socket_options,[{keepalive,true},{nodelay,false}]}], 1,250,nil,1}, {httpdb,"http://adm:pass@localhost:5984/rdyno_tgt_0002/;, nil, [{"Accept","application/json"}, {"User-Agent","CouchDB-Replicator/5fa9098"}], 20, [{socket_options,[{keepalive,true},{nodelay,false}]}], 1,250,nil,1}, [{checkpoint_interval,5000}, {connection_timeout,20}, {continuous,false}, {http_connections,1}, {retries,1}, {socket_options,[{keepalive,true},{nodelay,false}]}, {use_checkpoints,true}, {worker_batch_size,500}, {worker_processes,1}], {user_ctx,null,[],undefined}, db,nil, <<"rdyno_0001"...(15 B)>>, <<"shards/200"...(47 
B)>>}, [{timeout,20}]]}, transient,250,worker, [couch_replicator]}], undefined,100,1,[],couch_replicator_job_sup,[]} {code} > Automatically restart replication jobs if they crash > > > Key: COUCHDB-2975 > URL: https://issues.apache.org/jira/browse/COUCHDB-2975 > Project: CouchDB > Issue Type: Improvement >
[jira] [Commented] (COUCHDB-2975) Automatically restart replication jobs if they crash
[ https://issues.apache.org/jira/browse/COUCHDB-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211101#comment-15211101 ] Nick Vatamaniuc commented on COUCHDB-2975: -- We might have to increase the intensity threshold. One common use case that will trigger it is one-source-to-multiple-targets replication: if the source fails, all replications will fail as well. Tested it with 1 source to 200 targets. Then killed the source and noticed supervisors were restarted: (node1@127.0.0.1)4> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]). {[<0.352.0>,<26873.355.0>,<26910.354.0>],[]} % before deleting source (node1@127.0.0.1)5> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]). {[<0.5617.4>,<26873.7071.3>,<26910.8924.3>],[]} % after deleting source Saw we already have some protection against repeated failed replication restarts in the form of the “max_replication_retry_count” parameter. By default it is 10. So 10 failed replication starts for a particular replication will cancel that replication. Once it starts successfully, the number of remaining retries gets reset back to max (10). Another thing: noticed replications will restart even without {{transient}} supervisors if they are killed with an exit reason other than 'kill' (brutal kill). So if the goal is to just restart them, sending them exit(Pid, meh) should suffice. > Automatically restart replication jobs if they crash > > > Key: COUCHDB-2975 > URL: https://issues.apache.org/jira/browse/COUCHDB-2975 > Project: CouchDB > Issue Type: Improvement > Components: Replication >Reporter: Robert Newson > > We currently use the temporary restart strategy for replication jobs, which > means if they crash they are not restarted. > Instead, let's use the transient restart strategy, ensuring they are > restarted on abnormal termination, while still allowing these tasks to end > successfully on completion or cancellation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
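The retry behaviour described in the comment above (10 failed starts cancel a replication, and a successful start resets the count) can be sketched as a small counter. This is only a model of the description, not the replicator manager's code; the class and method names are hypothetical, and the exact point at which the real code cancels is an assumption.

```python
class RetryBudget:
    """Per-replication budget of failed start attempts."""

    def __init__(self, max_retries=10):      # 10 is the default quoted above
        self.max_retries = max_retries
        self.remaining = max_retries
        self.cancelled = False

    def start_failed(self):
        """Record a failed start; cancel once the budget is used up."""
        if self.cancelled:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.cancelled = True

    def start_succeeded(self):
        """A successful start restores the full budget."""
        if not self.cancelled:
            self.remaining = self.max_retries

budget = RetryBudget()
for _ in range(9):
    budget.start_failed()
after_nine_failures = (budget.remaining, budget.cancelled)

budget.start_succeeded()
after_success = (budget.remaining, budget.cancelled)

for _ in range(10):
    budget.start_failed()
after_ten_more = (budget.remaining, budget.cancelled)
```

A budget like this limits restarts of a single replication, but it does not by itself prevent many replications from failing and restarting at the same moment when a shared source goes away.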
[jira] [Commented] (COUCHDB-2971) Provide cardinality estimate (COUNT DISTINCT) as builtin reducer
[ https://issues.apache.org/jira/browse/COUCHDB-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205902#comment-15205902 ]

Nick Vatamaniuc commented on COUCHDB-2971:
------------------------------------------

Ah, good point on having a nicer way to specify precision. Yeah, otherwise it looks kind of hackish.

I noticed they provide various backends for the registers; one is a C NIF. I tried to compile and run their code on Erlang 18 and had to fiddle with it a bit, but got it to work and got these results:

https://gist.github.com/nickva/bf19a2b7b537f5051a99

There are some tradeoffs between memory usage, cardinality estimation time, and union time. The C array backend is interesting because it has the cheapest union operation (under 1 ms), but its cardinality estimation takes more than a few milliseconds, which might not play well with the Erlang schedulers. If that happens only during the finalize stage, though, it could be handled in another way (some thread + queue mechanism). Unfortunately it also has large, constant memory usage even at low cardinalities.

> Provide cardinality estimate (COUNT DISTINCT) as builtin reducer
> ----------------------------------------------------------------
>
>                 Key: COUCHDB-2971
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2971
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Adam Kocoloski
>
> We’ve seen a number of applications now where a user needs to count the
> number of unique keys in a view. Currently the recommended approach is to add
> a trivial reduce function and then count the number of rows in a _list
> function or client-side application code, but of course that doesn’t scale
> nicely.
> It seems that in a majority of these cases all that’s required is an
> approximation of the number of distinct entries, which brings us into the
> space of hash sets, linear probabilistic counters, and the ever-popular
> “HyperLogLog” algorithm. Taking HLL specifically, this seems like quite a
> nice candidate for a builtin reduce. The size of the data structure is
> independent of the number of input elements and individual HLL filters can be
> unioned together. There’s already what seems to be a good MIT-licensed
> implementation on GitHub:
> https://github.com/GameAnalytics/hyper
> One caveat is that this reducer would not work for group_level reductions;
> it’d only give the correct result for the exact key. I don’t think that
> should preclude us from evaluating it.
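The two HyperLogLog properties the COUCHDB-2971 description relies on, a fixed-size structure and a cheap union, can be illustrated with a minimal Python sketch. It is not the GameAnalytics/hyper implementation and not a proposed CouchDB reducer. The class name, the use of SHA-1 as the hash, and the `precision` parameter name are assumptions made for illustration. The estimator uses the standard bias constant for 128 or more registers and falls back to linear counting for small cardinalities.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog over string keys (illustrative only)."""

    def __init__(self, precision=12):
        if not 7 <= precision <= 16:
            raise ValueError("precision must be between 7 and 16")
        self.p = precision
        self.m = 1 << precision          # number of registers, fixed up front
        self.registers = [0] * self.m

    def add(self, key):
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other):
        """Merge two sketches by taking the element-wise register maximum."""
        if self.p != other.p:
            raise ValueError("precision mismatch")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self):
        """Estimate the number of distinct keys added so far."""
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small sets
        return raw
```

In reduce terms, `add` plays the role of the reduce step over raw keys and `union` the rereduce step over partial sketches; `cardinality` would run only once at the end, which is the finalize-stage cost mentioned in the comment above.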