[jira] [Closed] (COUCHDB-3245) couchjs -S option doesn't have any effect

2017-07-13 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-3245.


> couchjs -S option doesn't have any effect
> -
>
> Key: COUCHDB-3245
> URL: https://issues.apache.org/jira/browse/COUCHDB-3245
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Currently the -S option of couchjs sets the stack _chunk_ size for JS contexts.
> Reference: 
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext
> The documentation recommends 8K, and I have seen cases where it was raised to 
> 1G+ in production! That doesn't seem right at all and probably kills 
> performance and eats memory. 
> Docs from above say:
> > The stackchunksize parameter does not control the JavaScript stack size. 
> > (The JSAPI does not provide a way to adjust the stack depth limit.) Passing 
> > a large number for stackchunksize is a mistake. In a DEBUG build, large 
> > chunk sizes can degrade performance dramatically. The usual value of 8192 
> > is recommended
> Instead we should be setting the max GC value, which is set on the runtime via
> {{JS_NewRuntime(uint32_t maxbytes)}}.
> Experimentally, a large maxbytes seems to fix the out-of-memory errors caused by 
> large views. I suspect that it works because it stops GC. At some point we 
> probably drop some objects, GC collects them, and we crash...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (COUCHDB-3245) couchjs -S option doesn't have any effect

2017-07-13 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3245.
--
Resolution: Fixed

> couchjs -S option doesn't have any effect
> -
>
> Key: COUCHDB-3245
> URL: https://issues.apache.org/jira/browse/COUCHDB-3245
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Currently the -S option of couchjs sets the stack _chunk_ size for JS contexts.
> Reference: 
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext
> The documentation recommends 8K, and I have seen cases where it was raised to 
> 1G+ in production! That doesn't seem right at all and probably kills 
> performance and eats memory. 
> Docs from above say:
> > The stackchunksize parameter does not control the JavaScript stack size. 
> > (The JSAPI does not provide a way to adjust the stack depth limit.) Passing 
> > a large number for stackchunksize is a mistake. In a DEBUG build, large 
> > chunk sizes can degrade performance dramatically. The usual value of 8192 
> > is recommended
> Instead we should be setting the max GC value, which is set on the runtime via
> {{JS_NewRuntime(uint32_t maxbytes)}}.
> Experimentally, a large maxbytes seems to fix the out-of-memory errors caused by 
> large views. I suspect that it works because it stops GC. At some point we 
> probably drop some objects, GC collects them, and we crash...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (COUCHDB-3245) couchjs -S option doesn't have any effect

2017-07-13 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085228#comment-16085228
 ] 

Nick Vatamaniuc commented on COUCHDB-3245:
--

@gdelfino the issue was fixed; I just forgot to close the ticket. 

> couchjs -S option doesn't have any effect
> -
>
> Key: COUCHDB-3245
> URL: https://issues.apache.org/jira/browse/COUCHDB-3245
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Currently the -S option of couchjs sets the stack _chunk_ size for JS contexts.
> Reference: 
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext
> The documentation recommends 8K, and I have seen cases where it was raised to 
> 1G+ in production! That doesn't seem right at all and probably kills 
> performance and eats memory. 
> Docs from above say:
> > The stackchunksize parameter does not control the JavaScript stack size. 
> > (The JSAPI does not provide a way to adjust the stack depth limit.) Passing 
> > a large number for stackchunksize is a mistake. In a DEBUG build, large 
> > chunk sizes can degrade performance dramatically. The usual value of 8192 
> > is recommended
> Instead we should be setting the max GC value, which is set on the runtime via
> {{JS_NewRuntime(uint32_t maxbytes)}}.
> Experimentally, a large maxbytes seems to fix the out-of-memory errors caused by 
> large views. I suspect that it works because it stops GC. At some point we 
> probably drop some objects, GC collects them, and we crash...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (COUCHDB-3404) Improve ./dev/run command to allow overriding config values from a file

2017-04-28 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3404:


 Summary: Improve ./dev/run command to allow overriding config 
values from a file
 Key: COUCHDB-3404
 URL: https://issues.apache.org/jira/browse/COUCHDB-3404
 Project: CouchDB
  Issue Type: Improvement
Reporter: Nick Vatamaniuc


Allow passing a config file path to ./dev/run and have those values be applied 
to the running dev cluster instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3389) Bring back jittered delay during replication shard scan

2017-04-21 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3389:


 Summary: Bring back jittered delay during replication shard scan
 Key: COUCHDB-3389
 URL: https://issues.apache.org/jira/browse/COUCHDB-3389
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


When we switched to using the mem3 db for shard discovery, we dropped the jittered 
delay during shard scans. On a large production system with thousands of replicator 
dbs, back-to-back shard notifications, each of which spawns a changes feed, can 
cause performance issues.

https://github.com/apache/couchdb/blob/884cf3e55f77ab1a5f26dc7202ce21771062eae6/src/couch_replicator_manager.erl#L940-L946
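For illustration, a minimal sketch of the kind of jitter that could be 
reintroduced; the function names below are placeholders, not the actual 
couch_replicator_manager code:

{code}
-define(MAX_JITTER_MSEC, 1000).

%% Sleep for a random interval before spawning the per-shard changes
%% feed, so that back-to-back shard notifications don't all start feeds
%% at the same instant.
handle_shard_notification(DbName) ->
    timer:sleep(rand:uniform(?MAX_JITTER_MSEC)),
    spawn_changes_feed(DbName).

spawn_changes_feed(DbName) ->
    %% placeholder for the real feed-spawning logic
    couch_log:notice("starting changes feed for ~s", [DbName]).
{code}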



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3386) Use plugin-based authentication for transient replication cancelation

2017-04-20 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3386:


 Summary: Use plugin-based authentication for transient replication 
cancelation
 Key: COUCHDB-3386
 URL: https://issues.apache.org/jira/browse/COUCHDB-3386
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


Currently there is a direct check for <<"_admin">> in the roles list. Instead, for 
consistency, use 
https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db.erl#L434
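For comparison, a hedged sketch of the direct-check style that would be replaced; 
the helper behind the linked couch_db.erl line is deliberately not named here, and 
ensure_admin/1 is purely illustrative:

{code}
%% Sketch of the current style of direct role check; the proposal is to
%% replace call sites like this with the shared helper at the linked
%% couch_db.erl line so all endpoints share one authorization path.
ensure_admin(Roles) when is_list(Roles) ->
    case lists:member(<<"_admin">>, Roles) of
        true -> ok;
        false -> throw({unauthorized, <<"Administrator privileges required.">>})
    end.
{code}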



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3371) Investigate putting user specified replication filtering function in replication document

2017-04-10 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3371:


 Summary: Investigate putting user specified replication filtering 
function in replication document
 Key: COUCHDB-3371
 URL: https://issues.apache.org/jira/browse/COUCHDB-3371
 Project: CouchDB
  Issue Type: Improvement
Reporter: Nick Vatamaniuc


Investigate letting users specify the filter function in the replication 
document.

There are two main reasons for it:

1) Because user-specified filters live on the source and the filter code contents 
are used to generate replication IDs, it is necessary to do a remote network fetch 
just to create a replication. This also implies having to handle retries and 
temporary failures in an area of code where this would otherwise not be needed.

2) If the filtering code is provided in the replication document, replication ID 
calculation and tracking of changes to the replication ID when the filter is 
updated become trivial (see the sketch below). Right now it ranges from broken to 
awkwardly complicated.
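
A hedged sketch of why point 2 becomes trivial: with the filter source embedded in 
the replication document, the ID can be computed locally from values already at 
hand (the function name and hashing scheme below are illustrative, not the actual 
replicator code):

{code}
%% Derive a replication ID from the endpoints plus the embedded filter
%% source; no network fetch of a design doc is needed, and any edit to
%% the filter changes the ID automatically.
rep_id(Source, Target, FilterSrc) ->
    Digest = erlang:md5(term_to_binary({Source, Target, FilterSrc})),
    lists:flatten([io_lib:format("~2.16.0b", [B]) || <<B>> <= Digest]).
{code}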



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3327) Improve CouchDB's LRU

2017-03-15 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3327:


 Summary: Improve CouchDB's LRU
 Key: COUCHDB-3327
 URL: https://issues.apache.org/jira/browse/COUCHDB-3327
 Project: CouchDB
  Issue Type: Task
Reporter: Nick Vatamaniuc


Since we recently started to put all dbs into the LRU, try to improve it a bit 
to make it more performant.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3324) Scheduling Replicator

2017-03-14 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925398#comment-15925398
 ] 

Nick Vatamaniuc commented on COUCHDB-3324:
--

Fauxton PR https://github.com/apache/couchdb-fauxton/pull/864

> Scheduling Replicator
> -
>
> Key: COUCHDB-3324
> URL: https://issues.apache.org/jira/browse/COUCHDB-3324
> Project: CouchDB
>  Issue Type: New Feature
>Reporter: Nick Vatamaniuc
>
> Merge scheduling replicator 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3324) Scheduling Replicator

2017-03-14 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3324:


 Summary: Scheduling Replicator
 Key: COUCHDB-3324
 URL: https://issues.apache.org/jira/browse/COUCHDB-3324
 Project: CouchDB
  Issue Type: New Feature
Reporter: Nick Vatamaniuc


Merge scheduling replicator 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3323) Idle dbs cause excessive overhead

2017-03-13 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3323:


 Summary: Idle dbs cause excessive overhead
 Key: COUCHDB-3323
 URL: https://issues.apache.org/jira/browse/COUCHDB-3323
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


Idle dbs, especially sys_dbs like _replicator shards, once opened for scanning
would stay open forever. In a large cluster with many _replicator shards that can
add up to a significant overhead, mostly in terms of the number of active
processes.

Add a mechanism to close dbs which are idle.
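
A minimal hypothetical sketch of such a mechanism, assuming per-db last-access 
timestamps are tracked somewhere; the real couch_server bookkeeping is not shown:

{code}
%% Periodically sweep the set of open dbs and close handles that have
%% not been touched for IdleMsec. Names and shapes are illustrative.
sweep_idle(OpenDbs, IdleMsec) ->
    Now = erlang:monotonic_time(millisecond),
    [close_idle_db(Name) || {Name, LastAccess} <- OpenDbs,
                            Now - LastAccess > IdleMsec],
    ok.

close_idle_db(Name) ->
    %% placeholder for whatever actually releases the handle
    couch_log:info("closing idle db ~s", [Name]).
{code}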



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3308) Upgrade snappy to 1.1.4

2017-02-23 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3308:


 Summary: Upgrade snappy to 1.1.4
 Key: COUCHDB-3308
 URL: https://issues.apache.org/jira/browse/COUCHDB-3308
 Project: CouchDB
  Issue Type: Improvement
Reporter: Nick Vatamaniuc


They claim a 20% decompression and 5% compression speed improvement.

https://github.com/google/snappy/commit/2d99bd14d471664758e4dfdf81b44f413a7353fd



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections

2017-02-21 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876250#comment-15876250
 ] 

Nick Vatamaniuc commented on COUCHDB-3302:
--

fabric_doc_attachments is used when PUT-ing individual attachments; I was looking 
at a doc PUT with attachments in multipart/related format.

> Attachment replication over low bandwidth network connections
> -
>
> Key: COUCHDB-3302
> URL: https://issues.apache.org/jira/browse/COUCHDB-3302
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Jan Lehnardt
> Attachments: attach_large.py, replication-failure.log, 
> replication-failure-target.log
>
>
> Setup:
> Two CouchDB instances `source` (5981) and `target` (5983) with a 2MBit 
> network connection (simulated locally with traffic shaping, see way below for 
> an example).
> {noformat}
> git clone https://github.com/apache/couchdb.git
> cd couchdb
> ./configure --disable-docs --disable-fauxton
> make release
> cd ..
> cp -r couchdb/rel/couchdb source
> cp -r couchdb/rel/couchdb target
> # set up local ini: chttpd / port: 5981 / 5983
> # set up vm.args: source@hostname.local / target@hostname.local
> # no admins
> Start both CouchDB in their own terminal windows: ./bin/couchdb
> # create all required databases, and our `t` test database
> curl -X PUT http://127.0.0.1:598{1,3}/{_users,_replicator,_global_changes,t}
> # create 64MB attachments
> dd if=/dev/urandom of=att-64 bs=1024 count=65536
> # create doc on source
> curl -X PUT http://127.0.0.1:5981/t/doc1/att_64 -H 'Content-Type: 
> application/octet-stream' -d @att-64
> # replicate to target
> curl -X POST http://127.0.0.1:5981/_replicate -Hcontent-type:application/json 
> -d '{"source":"http://127.0.0.1:5981/t","target":"http://127.0.0.1:5983/t"}'
> {noformat}
> With the traffic shaping in place, the replication call doesn’t return, and 
> eventually CouchDB fails with:
> {noformat}
> [error] 2017-02-16T17:37:30.488990Z source@hostname.local emulator  
> Error in process <0.15811.0> on node 'source@hostname.local' with exit value:
> {{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,591}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,642}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}
> [error] 2017-02-16T17:37:30.490610Z source@hostname.local <0.8721.0>  
> Replicator, request PUT to "http://127.0.0.1:5983/t/doc1?new_edits=false; 
> failed due to error {error,
> {'EXIT',
> {{{nocatch,{mp_parser_died,noproc}},
>   [{couch_att,'-foldl/4-fun-0-',3,
>[{file,"src/couch_att.erl"},{line,591}]},
>{couch_att,fold_streamed_data,4,
>[{file,"src/couch_att.erl"},{line,642}]},
>{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},
>{couch_httpd_multipart,atts_to_mp,4,
>[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
>  {gen_server,call,
>  [<0.15778.0>,
>   {send_req,
>   {{url,"http://127.0.0.1:5983/t/doc1?new_edits=false;,
>"127.0.0.1",5983,undefined,undefined,
>"/t/doc1?new_edits=false",http,ipv4_address},
>[{"Accept","application/json"},
> {"Content-Length",33194202},
> {"Content-Type",
>  "multipart/related; 
> boundary=\"0dea87076009b928b191e0b456375c93\""},
> {"User-Agent","CouchDB-Replicator/2.0.0"}],
>put,
>{#Fun,
> 
> {<<"{\"_id\":\"doc1\",\"_rev\":\"1-15ae43c5b53de894b936c08db31d537c\",\"_revisions\":{\"start\":1,\"ids\":[\"15ae43c5b53de894b936c08db31d537c\"]},\"_attachments\":{\"att_64\":{\"content_type\":\"application/octet-stream\",\"revpos\":1,\"digest\":\"md5-s3AA0cYvwOzrSFTaALGh8g==\",\"length\":33193656,\"follows\":true}}}">>,
>  [{att,<<"att_64">>,<<"application/octet-stream">>,
>   33193656,33193656,
>   <<179,112,0,209,198,47,192,236,235,72,84,218,0,177,
> 161,242>>,
>   1,
>   {follows,<0.8720.0>,#Ref<0.0.1.23804>},
>   identity}],
>  <<"0dea87076009b928b191e0b456375c93">>,33194202}},
>[{response_format,binary},
> {inactivity_timeout,3},
> {socket_options,[{keepalive,true},{nodelay,false}]}],
>infinity}},
>   infinity]
> {noformat}
> Expected Behaviour:
> 

[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections

2017-02-20 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875076#comment-15875076
 ] 

Nick Vatamaniuc commented on COUCHDB-3302:
--

Confirmed that setting fabric request_timeout to a higher value like 9 
helps with this.

At least {{./attach_large.py --size=10 --mintime=80}} successfully finishes, 
while it doesn't with the default value of 6.

> Attachment replication over low bandwidth network connections
> -
>
> Key: COUCHDB-3302
> URL: https://issues.apache.org/jira/browse/COUCHDB-3302
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Jan Lehnardt
> Attachments: attach_large.py, replication-failure.log, 
> replication-failure-target.log
>
>
> Setup:
> Two CouchDB instances `source` (5981) and `target` (5983) with a 2MBit 
> network connection (simulated locally with traffic shaping, see way below for 
> an example).
> {noformat}
> git clone https://github.com/apache/couchdb.git
> cd couchdb
> ./configure --disable-docs --disable-fauxton
> make release
> cd ..
> cp -r couchdb/rel/couchdb source
> cp -r couchdb/rel/couchdb target
> # set up local ini: chttpd / port: 5981 / 5983
> # set up vm.args: source@hostname.local / target@hostname.local
> # no admins
> Start both CouchDB in their own terminal windows: ./bin/couchdb
> # create all required databases, and our `t` test database
> curl -X PUT http://127.0.0.1:598{1,3}/{_users,_replicator,_global_changes,t}
> # create 64MB attachments
> dd if=/dev/urandom of=att-64 bs=1024 count=65536
> # create doc on source
> curl -X PUT http://127.0.0.1:5981/t/doc1/att_64 -H 'Content-Type: 
> application/octet-stream' -d @att-64
> # replicate to target
> curl -X POST http://127.0.0.1:5981/_replicate -Hcontent-type:application/json 
> -d '{"source":"http://127.0.0.1:5981/t","target":"http://127.0.0.1:5983/t"}'
> {noformat}
> With the traffic shaping in place, the replication call doesn’t return, and 
> eventually CouchDB fails with:
> {noformat}
> [error] 2017-02-16T17:37:30.488990Z source@hostname.local emulator  
> Error in process <0.15811.0> on node 'source@hostname.local' with exit value:
> {{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,591}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,642}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}
> [error] 2017-02-16T17:37:30.490610Z source@hostname.local <0.8721.0>  
> Replicator, request PUT to "http://127.0.0.1:5983/t/doc1?new_edits=false; 
> failed due to error {error,
> {'EXIT',
> {{{nocatch,{mp_parser_died,noproc}},
>   [{couch_att,'-foldl/4-fun-0-',3,
>[{file,"src/couch_att.erl"},{line,591}]},
>{couch_att,fold_streamed_data,4,
>[{file,"src/couch_att.erl"},{line,642}]},
>{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},
>{couch_httpd_multipart,atts_to_mp,4,
>[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
>  {gen_server,call,
>  [<0.15778.0>,
>   {send_req,
>   {{url,"http://127.0.0.1:5983/t/doc1?new_edits=false;,
>"127.0.0.1",5983,undefined,undefined,
>"/t/doc1?new_edits=false",http,ipv4_address},
>[{"Accept","application/json"},
> {"Content-Length",33194202},
> {"Content-Type",
>  "multipart/related; 
> boundary=\"0dea87076009b928b191e0b456375c93\""},
> {"User-Agent","CouchDB-Replicator/2.0.0"}],
>put,
>{#Fun,
> 
> {<<"{\"_id\":\"doc1\",\"_rev\":\"1-15ae43c5b53de894b936c08db31d537c\",\"_revisions\":{\"start\":1,\"ids\":[\"15ae43c5b53de894b936c08db31d537c\"]},\"_attachments\":{\"att_64\":{\"content_type\":\"application/octet-stream\",\"revpos\":1,\"digest\":\"md5-s3AA0cYvwOzrSFTaALGh8g==\",\"length\":33193656,\"follows\":true}}}">>,
>  [{att,<<"att_64">>,<<"application/octet-stream">>,
>   33193656,33193656,
>   <<179,112,0,209,198,47,192,236,235,72,84,218,0,177,
> 161,242>>,
>   1,
>   {follows,<0.8720.0>,#Ref<0.0.1.23804>},
>   identity}],
>  <<"0dea87076009b928b191e0b456375c93">>,33194202}},
>[{response_format,binary},
> {inactivity_timeout,3},
> {socket_options,[{keepalive,true},{nodelay,false}]}],
> 

[jira] [Commented] (COUCHDB-3302) Attachment replication over low bandwidth network connections

2017-02-20 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875031#comment-15875031
 ] 

Nick Vatamaniuc commented on COUCHDB-3302:
--

From investigating, it seems to be related to how long it takes for the 
request to complete.

I created a "paced" Python multipart sender which PUTs an attachment over a 
period of time. It splits the attachment into chunks and then sends those with a 
sleep in between.

Attached the script as attach_large.py. It can be run with {{./attach_large.py 
--size=10 --mintime=80}}, which will PUT an attachment of size 10 bytes 
over at least 80 seconds.

With that script I was able to get a 500 error:

{code}
HTTP/1.1 500 Internal Server Error
Cache-Control: must-revalidate
Content-Length: 47
Content-Type: application/json
Date: Mon, 20 Feb 2017 19:27:30 GMT
Server: CouchDB/2.0.0 (Erlang OTP/18)
X-Couch-Request-ID: 80a6cfd301
X-CouchDB-Body-Time: 0

{"error":"unknown_error","reason":"undefined"}
{code}

> Attachment replication over low bandwidth network connections
> -
>
> Key: COUCHDB-3302
> URL: https://issues.apache.org/jira/browse/COUCHDB-3302
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Jan Lehnardt
> Attachments: attach_large.py, replication-failure.log, 
> replication-failure-target.log
>
>
> Setup:
> Two CouchDB instances `source` (5981) and `target` (5983) with a 2MBit 
> network connection (simulated locally with traffic shaping, see way below for 
> an example).
> {noformat}
> git clone https://github.com/apache/couchdb.git
> cd couchdb
> ./configure --disable-docs --disable-fauxton
> make release
> cd ..
> cp -r couchdb/rel/couchdb source
> cp -r couchdb/rel/couchdb target
> # set up local ini: chttpd / port: 5981 / 5983
> # set up vm.args: source@hostname.local / target@hostname.local
> # no admins
> Start both CouchDB in their own terminal windows: ./bin/couchdb
> # create all required databases, and our `t` test database
> curl -X PUT http://127.0.0.1:598{1,3}/{_users,_replicator,_global_changes,t}
> # create 64MB attachments
> dd if=/dev/urandom of=att-64 bs=1024 count=65536
> # create doc on source
> curl -X PUT http://127.0.0.1:5981/t/doc1/att_64 -H 'Content-Type: 
> application/octet-stream' -d @att-64
> # replicate to target
> curl -X POST http://127.0.0.1:5981/_replicate -Hcontent-type:application/json 
> -d '{"source":"http://127.0.0.1:5981/t","target":"http://127.0.0.1:5983/t"}'
> {noformat}
> With the traffic shaping in place, the replication call doesn’t return, and 
> eventually CouchDB fails with:
> {noformat}
> [error] 2017-02-16T17:37:30.488990Z source@hostname.local emulator  
> Error in process <0.15811.0> on node 'source@hostname.local' with exit value:
> {{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,591}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,642}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}
> [error] 2017-02-16T17:37:30.490610Z source@hostname.local <0.8721.0>  
> Replicator, request PUT to "http://127.0.0.1:5983/t/doc1?new_edits=false; 
> failed due to error {error,
> {'EXIT',
> {{{nocatch,{mp_parser_died,noproc}},
>   [{couch_att,'-foldl/4-fun-0-',3,
>[{file,"src/couch_att.erl"},{line,591}]},
>{couch_att,fold_streamed_data,4,
>[{file,"src/couch_att.erl"},{line,642}]},
>{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},
>{couch_httpd_multipart,atts_to_mp,4,
>[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
>  {gen_server,call,
>  [<0.15778.0>,
>   {send_req,
>   {{url,"http://127.0.0.1:5983/t/doc1?new_edits=false;,
>"127.0.0.1",5983,undefined,undefined,
>"/t/doc1?new_edits=false",http,ipv4_address},
>[{"Accept","application/json"},
> {"Content-Length",33194202},
> {"Content-Type",
>  "multipart/related; 
> boundary=\"0dea87076009b928b191e0b456375c93\""},
> {"User-Agent","CouchDB-Replicator/2.0.0"}],
>put,
>{#Fun,
> 
> {<<"{\"_id\":\"doc1\",\"_rev\":\"1-15ae43c5b53de894b936c08db31d537c\",\"_revisions\":{\"start\":1,\"ids\":[\"15ae43c5b53de894b936c08db31d537c\"]},\"_attachments\":{\"att_64\":{\"content_type\":\"application/octet-stream\",\"revpos\":1,\"digest\":\"md5-s3AA0cYvwOzrSFTaALGh8g==\",\"length\":33193656,\"follows\":true}}}">>,
>  

[jira] [Created] (COUCHDB-3293) Configure maximum document ID length

2017-02-06 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3293:


 Summary: Configure maximum document ID length
 Key: COUCHDB-3293
 URL: https://issues.apache.org/jira/browse/COUCHDB-3293
 Project: CouchDB
  Issue Type: New Feature
Reporter: Nick Vatamaniuc


Allow users / operators to specify maximum document ID length.

Currently it is easy to break CouchDB by feeding it large IDs through the 
_bulk_docs endpoint; those IDs will then hit the limits of the HTTP parser if sent 
through GET/PUT/DELETE methods. When those limits are hit, the error returned 
is not obvious, as the requests often crash in the mochiweb HTTP parser 
step before a request even makes it to CouchDB code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (COUCHDB-3291) Excessively long document IDs prevent replicator from making progress

2017-02-03 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3291:


 Summary: Excessively long document IDs prevent replicator from 
making progress
 Key: COUCHDB-3291
 URL: https://issues.apache.org/jira/browse/COUCHDB-3291
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


Currently there is no protection in CouchDB against creating IDs which are too 
long, so large IDs will hit various implicit limits, which usually results in 
unpredictable failure modes.

One such example of an implicit limit is hit in the replicator code. The replicator 
usually fetches document IDs in bulk-like calls: it either gets them via the 
changes feed, computes revs_diffs in a POST, or inserts them with bulk_docs, except 
in one case when it fetches open_revs. There it uses a single GET request. That 
request fails because there is a bug / limitation in the HTTP parser: the first GET 
line in the HTTP request has to fit in the receive buffer of the receiving 
socket. 

Increasing that buffer allows passing through larger HTTP request lines. In the 
configuration options it can be manipulated as 
{code}
 chttpd.server_options="[...,{recbuf, 32768},...]"
{code}

Steve Vinoski mentions something about a possible bug in the HTTP packet parser 
code as well:

http://erlang.org/pipermail/erlang-questions/2011-June/059567.html

Tracing this a bit, I see that a proper mochiweb request is never even created 
and instead the request hangs, which confirms it further. It seems that in the code 
here:

https://github.com/apache/couchdb-mochiweb/blob/bd6ae7cbb371666a1f68115056f7b30d13765782/src/mochiweb_http.erl#L90

the timeout clause is hit. Adding a catch-all clause, I get the 
{tcp_error,#Port<0.40682>,emsgsize} message, which we don't handle. That seems 
like a sane place to throw a 413 or similar.
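
For illustration only, a hedged sketch of handling that message; plain gen_tcp 
calls are used here and this is not the actual mochiweb_http loop:

{code}
%% If the request line never fits in the receive buffer we get a
%% {tcp_error, _, emsgsize} message instead of an HTTP packet; answer
%% with a 413 and close rather than sitting in the timeout clause.
wait_for_request(Socket, TimeoutMsec) ->
    receive
        {tcp_error, Socket, emsgsize} ->
            gen_tcp:send(Socket,
                <<"HTTP/1.1 413 Request Entity Too Large\r\n\r\n">>),
            gen_tcp:close(Socket)
    after TimeoutMsec ->
        gen_tcp:close(Socket)
    end.
{code}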

There are probably multiple ways to address the issue:

 * Increase the mochiweb listener buffer to fit larger doc IDs. However, that is a 
separate bug, and using it to control document size during replication is not 
reliable. Moreover, that would allow larger IDs to propagate through the system 
during replication; we would then have to configure all future replication 
sources with the same maximum recbuf value.

 * Introduce a validation step in {{couch_doc:validate_docid}} (see the sketch 
after this list). Currently that code doesn't read from config files and is in the 
hot path; adding a config read in there might reduce performance. If that is 
enabled it would stop the creation of new documents with large IDs, but we have to 
decide how to handle already existing IDs which are larger than the limit.

 * Introduce a validation/bypass in the replicator. Specifically targeting the 
replicator might help prevent propagation of large IDs during replication. 
There is already a similar case of skipping the writing of large attachments or 
large documents (which exceed the request size) and bumping 
{{doc_write_failures}}.
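
A hedged sketch of the second option above; the config section, key name, and 
default are assumptions for illustration, not a shipped setting:

{code}
%% Reject document IDs longer than a configurable limit. The config read
%% in this hot path is the performance concern mentioned above.
validate_docid_length(DocId) when is_binary(DocId) ->
    MaxLen = config:get_integer("couchdb", "max_document_id_length", 512),
    case byte_size(DocId) =< MaxLen of
        true -> ok;
        false -> throw({illegal_docid, <<"Document id is too long">>})
    end.
{code}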




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (COUCHDB-3284) 8Kb read-ahead in couch_file causes extra IO and binary memory usage

2017-01-31 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3284.
--
Resolution: Fixed

Merged fix

> 8Kb read-ahead in couch_file causes extra IO and binary memory usage
> 
>
> Key: COUCHDB-3284
> URL: https://issues.apache.org/jira/browse/COUCHDB-3284
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Affects Versions: 2.0.0
>Reporter: Nick Vatamaniuc
> Attachments: jira_io_increased.png
>
>
> 8Kb read-ahead logic in couch_file seems to cause extra input IO thrashing and 
> binary memory usage, but doesn't provide a speed-up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COUCHDB-3284) 8Kb read-ahead in couch_file causes extra IO and binary memory usage

2017-01-27 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843539#comment-15843539
 ] 

Nick Vatamaniuc commented on COUCHDB-3284:
--

Attached performance graphs showing increased IO usage and increased binary 
memory usage. Those go away after disabling the 8Kb read-ahead logic.

> 8Kb read-ahead in couch_file causes extra IO and binary memory usage
> 
>
> Key: COUCHDB-3284
> URL: https://issues.apache.org/jira/browse/COUCHDB-3284
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Affects Versions: 2.0.0
>Reporter: Nick Vatamaniuc
> Attachments: jira_io_increased.png
>
>
> 8Kb read-ahead logic in couch_file seems to cause extra input IO thrashing and 
> binary memory usage, but doesn't provide a speed-up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3284) 8Kb read-ahead in couch_file causes extra IO and binary memory usage

2017-01-27 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3284:


 Summary: 8Kb read-ahead in couch_file causes extra IO and binary 
memory usage
 Key: COUCHDB-3284
 URL: https://issues.apache.org/jira/browse/COUCHDB-3284
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3271) Replications crash with 'kaboom' exit

2017-01-20 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3271.
--
Resolution: Fixed

> Replications crash with 'kaboom' exit 
> --
>
> Key: COUCHDB-3271
> URL: https://issues.apache.org/jira/browse/COUCHDB-3271
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> In a few cases it was observed that replications were crashing with a `kaboom` 
> exit. This happens here:
> https://github.com/apache/couchdb-couch-replicator/blob/cb41bacb2a06613649df46d62249afebda42b8c0/src/couch_replicator_api_wrap.erl#L236
> during an open_revs call for one of the docs. So the changes feed found the 
> document but then could not get its revisions.
> The reason is that the open_revs GET request returns an empty result when more 
> than one node is in maintenance mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3271) Replications crash with 'kaboom' exit

2017-01-20 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-3271.


> Replications crash with 'kaboom' exit 
> --
>
> Key: COUCHDB-3271
> URL: https://issues.apache.org/jira/browse/COUCHDB-3271
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> In a few cases it was observed that replications were crashing with a `kaboom` 
> exit. This happens here:
> https://github.com/apache/couchdb-couch-replicator/blob/cb41bacb2a06613649df46d62249afebda42b8c0/src/couch_replicator_api_wrap.erl#L236
> during an open_revs call for one of the docs. So the changes feed found the 
> document but then could not get its revisions.
> The reason is that the open_revs GET request returns an empty result when more 
> than one node is in maintenance mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3271) Replications crash with 'kaboom' exit

2017-01-13 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822100#comment-15822100
 ] 

Nick Vatamaniuc commented on COUCHDB-3271:
--

https://github.com/apache/couchdb-fabric/pull/84

> Replications crash with 'kaboom' exit 
> --
>
> Key: COUCHDB-3271
> URL: https://issues.apache.org/jira/browse/COUCHDB-3271
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> In a few cases it was observed that replications were crashing with a `kaboom` 
> exit. This happens here:
> https://github.com/apache/couchdb-couch-replicator/blob/cb41bacb2a06613649df46d62249afebda42b8c0/src/couch_replicator_api_wrap.erl#L236
> during an open_revs call for one of the docs. So the changes feed found the 
> document but then could not get its revisions.
> The reason is that the open_revs GET request returns an empty result when more 
> than one node is in maintenance mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3271) Replications crash with 'kaboom' exit

2017-01-13 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3271:


 Summary: Replications crash with 'kaboom' exit 
 Key: COUCHDB-3271
 URL: https://issues.apache.org/jira/browse/COUCHDB-3271
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


In a few cases it was observed that replications were crashing with a `kaboom` 
exit. This happens here:

https://github.com/apache/couchdb-couch-replicator/blob/cb41bacb2a06613649df46d62249afebda42b8c0/src/couch_replicator_api_wrap.erl#L236

during an open_revs call for one of the docs. So the changes feed found the 
document but then could not get its revisions.

The reason is that the open_revs GET request returns an empty result when more 
than one node is in maintenance mode. 
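
Roughly, the intended handling would look like the sketch below; the shapes are 
illustrative and differ from the real couch_replicator_api_wrap code at the link:

{code}
%% The GET for open_revs comes back empty when more than one copy of the
%% shard is in maintenance mode; with no clause for that, the caller
%% exits with kaboom. Surfacing an error value instead would let the
%% replication retry or fail cleanly.
open_revs_or_error([]) ->
    {error, no_revs_returned};
open_revs_or_error(RevResults) when is_list(RevResults) ->
    {ok, RevResults}.
{code}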




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3267) Don't exit on timeout callback in cassim fabric:changes feed

2017-01-06 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-3267.


> Don't exit on timeout callback in cassim fabric:changes feed
> 
>
> Key: COUCHDB-3267
> URL: https://issues.apache.org/jira/browse/COUCHDB-3267
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> The cassim metadata changes feed uses a continuous changes feed with heartbeats. 
> Don't exit on timeout and restart after 5 seconds; instead, continue receiving 
> changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3267) Don't exit on timeout callback in cassim fabric:changes feed

2017-01-06 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3267.
--
Resolution: Fixed

> Don't exit on timeout callback in cassim fabric:changes feed
> 
>
> Key: COUCHDB-3267
> URL: https://issues.apache.org/jira/browse/COUCHDB-3267
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> The cassim metadata changes feed uses a continuous changes feed with heartbeats. 
> Don't exit on timeout and restart after 5 seconds; instead, continue receiving 
> changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3266) changes feed not invoked when deleting a document using a selector filtered feed.

2017-01-04 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798696#comment-15798696
 ] 

Nick Vatamaniuc commented on COUCHDB-3266:
--

Thanks for checking out the new selector feature!

Deleting a document via the DELETE method is an operation equivalent to PUT-ing a 
{"_deleted":true} document as a new revision.

To achieve the desired effect, it is also possible to delete a 
document and keep all the original fields by just adding a "_deleted":true field 
and PUT-ing that. For example, with {"type":"message", "subtype":"email", ..., 
"_deleted":true} the document will pass the filter.

> changes feed not invoked when deleting a document using a selector filtered 
> feed.
> -
>
> Key: COUCHDB-3266
> URL: https://issues.apache.org/jira/browse/COUCHDB-3266
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Reporter: Steven Spungin
>
> When I subscribe to the _changes endpoint with a selector filter, I get 
> updated and created changes, but not deleted changes.
> But when I subscribe without a selector, I get all the changes as expected.
> Here is my posted selector:
>  {"selector": {"type": "message", "subtype": "email"}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3267) Don't exit on timeout callback in cassim fabric:changes feed

2017-01-03 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3267:


 Summary: Don't exit on timeout callback in cassim fabric:changes 
feed
 Key: COUCHDB-3267
 URL: https://issues.apache.org/jira/browse/COUCHDB-3267
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


The cassim metadata changes feed uses a continuous changes feed with heartbeats. 
Don't exit on timeout and restart after 5 seconds; instead, continue receiving 
changes.
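
A hedged sketch of the intended behaviour; the message shapes are assumptions 
about the fabric changes callback, not quoted from cassim:

{code}
%% On a heartbeat timeout keep accumulating instead of exiting and
%% restarting the whole feed after 5 seconds.
changes_callback(timeout, Acc) ->
    {ok, Acc};
changes_callback({change, Change}, Acc) ->
    {ok, [Change | Acc]};
changes_callback(_Other, Acc) ->
    {ok, Acc}.
{code}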





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3245) couchjs -S option doesn't have any effect

2016-11-29 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707631#comment-15707631
 ] 

Nick Vatamaniuc commented on COUCHDB-3245:
--

Interesting! Looks like it was fixed before: 
https://issues.apache.org/jira/browse/COUCHDB-1792

> couchjs -S option doesn't have any effect
> -
>
> Key: COUCHDB-3245
> URL: https://issues.apache.org/jira/browse/COUCHDB-3245
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Currently the -S option of couchjs sets the stack _chunk_ size for JS contexts.
> Reference: 
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext
> The documentation recommends 8K, and I have seen cases where it was raised to 
> 1G+ in production! That doesn't seem right at all and probably kills 
> performance and eats memory. 
> Docs from above say:
> > The stackchunksize parameter does not control the JavaScript stack size. 
> > (The JSAPI does not provide a way to adjust the stack depth limit.) Passing 
> > a large number for stackchunksize is a mistake. In a DEBUG build, large 
> > chunk sizes can degrade performance dramatically. The usual value of 8192 
> > is recommended
> Instead we should be setting the max GC value, which is set on the runtime via
> {{JS_NewRuntime(uint32_t maxbytes)}}.
> It seems that it acts similarly to a max heap size (from what I understand), 
> which makes more sense. A stack size of hundreds of megabytes doesn't sound 
> right.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3245) couchjs -S option doesn't have any effect

2016-11-29 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3245:


 Summary: couchjs -S option doesn't have any effect
 Key: COUCHDB-3245
 URL: https://issues.apache.org/jira/browse/COUCHDB-3245
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


Currently the -S option of couchjs sets the stack _chunk_ size for JS contexts.

Reference: 
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS_NewContext

The documentation recommends 8K, and I have seen cases where it was raised to 1G+ 
in production! That doesn't seem right at all and probably kills performance 
and eats memory. 

Docs from above say:

> The stackchunksize parameter does not control the JavaScript stack size. (The 
> JSAPI does not provide a way to adjust the stack depth limit.) Passing a 
> large number for stackchunksize is a mistake. In a DEBUG build, large chunk 
> sizes can degrade performance dramatically. The usual value of 8192 is 
> recommended

Instead we should be setting the max GC value, which is set on the runtime via

{{JS_NewRuntime(uint32_t maxbytes)}}

It seems that it acts similarly to a max heap size (from what I understand), which 
makes more sense. A stack size of hundreds of megabytes doesn't sound right.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3242) Make get view group info timeout in couch_indexer configurable

2016-11-23 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3242:


 Summary: Make get view group info timeout in couch_indexer 
configurable
 Key: COUCHDB-3242
 URL: https://issues.apache.org/jira/browse/COUCHDB-3242
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


Some busy views will take longer than the default 5 seconds to return.

https://github.com/cloudant/couchdb-couch-index/blob/master/src/couch_index.erl#L57-L58
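
A minimal sketch of what making it configurable could look like; the config 
section and key names are assumptions, not a final setting:

{code}
%% Read the get-view-group-info timeout from config instead of relying
%% on the hardcoded 5000 ms default linked above.
group_info_timeout_msec() ->
    config:get_integer("couch_index", "get_info_timeout", 5000).
{code}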



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3199) Replicator VDU function doesn't account for an already malformed document in replicator db

2016-10-31 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3199.
--
Resolution: Fixed

> Replicator VDU function doesn't account for an already malformed document in 
> replicator db
> -
>
> Key: COUCHDB-3199
> URL: https://issues.apache.org/jira/browse/COUCHDB-3199
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> In the case when code is updated from an older version of CouchDB which didn't 
> have (or had a less restrictive) VDU function, a malformed document could 
> have ended up in the _replicator database.
> The replicator will try to parse it, flag it as an error, and then try to update 
> the document. However, the more restrictive VDU function will cause the 
> document update to crash the replicator manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3199) Replicator VDU function doesn't account for an already malformed document in replicator db

2016-10-14 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3199:


 Summary: Replicator VDU function doesn't account for an already 
malformed document in replicator db
 Key: COUCHDB-3199
 URL: https://issues.apache.org/jira/browse/COUCHDB-3199
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


In the case when code is updated from an older version of CouchDB which didn't have 
(or had a less restrictive) VDU function, a malformed document could have ended 
up in the _replicator database.

The replicator will try to parse it, flag it as an error, and then try to update 
the document. However, the more restrictive VDU function will cause the document 
update to crash the replicator manager.
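
A hedged sketch of a defensive update along those lines; the function name and 
error handling are illustrative, not the shipped fix:

{code}
%% If the stricter VDU rejects the error-flagging update, log it and
%% carry on instead of crashing the replicator manager.
update_rep_doc_safely(RepDb, Doc) ->
    try
        couch_db:update_doc(RepDb, Doc, [])
    catch
        throw:{forbidden, Reason} ->
            couch_log:error("VDU rejected _replicator doc update: ~p",
                            [Reason]),
            ok
    end.
{code}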





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3174) max_document_size setting can be bypassed by issuing multipart/related requests

2016-10-07 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3174.
--
Resolution: Fixed

> max_document_size setting can be bypassed by issuing multipart/related 
> requests
> ---
>
> Key: COUCHDB-3174
> URL: https://issues.apache.org/jira/browse/COUCHDB-3174
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
> Attachments: attach_large.py
>
>
> While testing how the replicator handled small values of the max_document_size 
> parameter, I discovered that if a user issues PUT requests which are 
> multipart/related, the max_document_size setting is bypassed.
> A Wireshark capture of a PUT-with-attachments request coming from the replicator 
> in an EUnit test I wrote: max_document_size was set to 1, yet a 70k byte 
> document with a 70k byte attachment was created.
> {code}
> PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1
> Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3"
> Content-Length: 140515
> Accept: application/json
> --e5d21d5fd988dc1c6c6e8911030213b3
> Content-Type: application/json
> {"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":"
> ...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}}
> --e5d21d5fd988dc1c6c6e8911030213b3
> Content-Disposition: attachment; filename="att1"
> Content-Type: app/binary
> Content-Length: 7
> xx
> --e5d21d5fd988dc1c6c6e8911030213b3--
> HTTP/1.1 201 Created
> {code}
> Here is a regular request which works as expected:
> {code}
> PUT /dbl/dl2 HTTP/1.1
> Content-Length: 100026
> Content-Type: application/json
> Accept: application/json
> {"_id": "dl2", "size": "...xxx"}
> HTTP/1.1 413 Request Entity Too Large
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3174) max_document_size setting can be bypassed by issuing multipart/related requests

2016-10-07 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1211#comment-1211
 ] 

Nick Vatamaniuc commented on COUCHDB-3174:
--

The issue is not 100% fixed, but it should help against accidental cases.

> max_document_size setting can be bypassed by issuing multipart/related 
> requests
> ---
>
> Key: COUCHDB-3174
> URL: https://issues.apache.org/jira/browse/COUCHDB-3174
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
> Attachments: attach_large.py
>
>
> While testing how the replicator handled small values of the max_document_size 
> parameter, I discovered that if a user issues PUT requests which are 
> multipart/related, the max_document_size setting is bypassed.
> A Wireshark capture of a PUT-with-attachments request coming from the replicator 
> in an EUnit test I wrote: max_document_size was set to 1, yet a 70k byte 
> document with a 70k byte attachment was created.
> {code}
> PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1
> Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3"
> Content-Length: 140515
> Accept: application/json
> --e5d21d5fd988dc1c6c6e8911030213b3
> Content-Type: application/json
> {"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":"
> ...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}}
> --e5d21d5fd988dc1c6c6e8911030213b3
> Content-Disposition: attachment; filename="att1"
> Content-Type: app/binary
> Content-Length: 7
> xx
> --e5d21d5fd988dc1c6c6e8911030213b3--
> HTTP/1.1 201 Created
> {code}
> Here is a regular request which works as expected:
> {code}
> PUT /dbl/dl2 HTTP/1.1
> Content-Length: 100026
> Content-Type: application/json
> Accept: application/json
> {"_id": "dl2", "size": "...xxx"}
> HTTP/1.1 413 Request Entity Too Large
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (COUCHDB-3180) Add ability to return a list of features in server's welcome message

2016-10-05 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc updated COUCHDB-3180:
-
Comment: was deleted

(was: From GH [~rnewson] mentioned:
{quote}
We've wanted a general feature discovery thing for a while now so let's come up 
with a way for code to register with this. The patch should include that 
mechanism even if nothing in couchdb calls it yet, and a test that shows that a 
registered feature shows up in the welcome message.
{quote}

Here is what I came up with on first try:

* Don't add a new application, it would be silly.

* Stick it in config application. It seems like a configuration-y thing.

* API looks like
   - {{config:features() -> \[<<"feature1">>, <<"feature2">>, ...\].}}
   - {{config:feature_enable(<<"feature1">>).}}
   - {{config:feature_disable(<<"feature2">>).}}
* Applications enable and disable features. Then `chttpd` reads the list of 
features from config and shows them in the welcome message.

* Behind the scenes it is really just writing a bunch of booleans to the config 
"\[features\]" section, with persistence set to `false`. 

* Users can directly set features in the config file if they want. It could be 
something external to the CouchDB instance, maybe something about how the code was 
compiled or where it is running, that warrants a different treatment from the 
API standpoint.

The advantage is it doesn't reinvent the world. Takes advantage of config 
server (so applications can monitor for changes and such if needed).)

> Add ability to return a list of features in server's welcome message
> 
>
> Key: COUCHDB-3180
> URL: https://issues.apache.org/jira/browse/COUCHDB-3180
> Project: CouchDB
>  Issue Type: New Feature
>Reporter: Nick Vatamaniuc
>
> This could be used to let users discover quickly the availability of some API 
> or modes of operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3180) Add ability to return a list of features in server's welcome message

2016-10-05 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550508#comment-15550508
 ] 

Nick Vatamaniuc commented on COUCHDB-3180:
--

From GH, [~rnewson] mentioned:
{quote}
We've wanted a general feature discovery thing for a while now so let's come up 
with a way for code to register with this. The patch should include that 
mechanism even if nothing in couchdb calls it yet, and a test that shows that a 
registered feature shows up in the welcome message.
{quote}

Here is what I came up with on first try:

* Don't add a new application, it would be silly.

* Stick it in config application. It seems like a configuration-y thing.

* API looks like
   - {{config:features() -> \[<<"feature1">>, <<"feature2">>, ...\].}}
   - {{config:feature_enable(<<"feature1">>).}}
   - {{config:feature_disable(<<"feature2">>).}}
* Applications enable and disable features. Then `chttpd` reads the list of 
features from config and shows them in the welcome message (sketched below).

* Behind the scenes it is really just writing a bunch of booleans to the config 
"\[features\]" section, with persistence set to `false`. 

* Users can directly set features in the config file if they want. It could be 
something external to the CouchDB instance, maybe something about how the code was 
compiled or where it is running, that warrants a different treatment from the 
API standpoint.

The advantage is that it doesn't reinvent the world and takes advantage of the 
config server (so applications can monitor for changes and such if needed).
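
A minimal sketch of the chttpd side of this, assuming the config:features/0 API 
proposed above; the handler shape is illustrative, not the actual chttpd code:

{code}
handle_welcome_req(Req) ->
    Features = config:features(),    % proposed API from the list above
    chttpd:send_json(Req, {[
        {couchdb, <<"Welcome">>},
        {version, list_to_binary(couch_server:get_version())},
        {features, Features}
    ]}).
{code}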

> Add ability to return a list of features in server's welcome message
> 
>
> Key: COUCHDB-3180
> URL: https://issues.apache.org/jira/browse/COUCHDB-3180
> Project: CouchDB
>  Issue Type: New Feature
>Reporter: Nick Vatamaniuc
>
> This could be used to let users discover quickly the availability of some API 
> or modes of operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (COUCHDB-3180) Add ability to return a list of features in server's welcome message

2016-10-05 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc updated COUCHDB-3180:
-
Comment: was deleted

(was: https://github.com/apache/couchdb-chttpd/pull/144)

> Add ability to return a list of features in server's welcome message
> 
>
> Key: COUCHDB-3180
> URL: https://issues.apache.org/jira/browse/COUCHDB-3180
> Project: CouchDB
>  Issue Type: New Feature
>Reporter: Nick Vatamaniuc
>
> This could be used to let users discover quickly the availability of some API 
> or modes of operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3180) Add ability to return a list of features in server's welcome message

2016-10-05 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549772#comment-15549772
 ] 

Nick Vatamaniuc commented on COUCHDB-3180:
--

https://github.com/apache/couchdb-chttpd/pull/144

> Add ability to return a list of features in server's welcome message
> 
>
> Key: COUCHDB-3180
> URL: https://issues.apache.org/jira/browse/COUCHDB-3180
> Project: CouchDB
>  Issue Type: New Feature
>Reporter: Nick Vatamaniuc
>
> This could be used to let users discover quickly the availability of some API 
> or modes of operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3179) Add ability to return a list of features in server's welcome message

2016-10-05 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-3179.

Resolution: Duplicate

> Add ability to return a list of features in server's welcome message
> 
>
> Key: COUCHDB-3179
> URL: https://issues.apache.org/jira/browse/COUCHDB-3179
> Project: CouchDB
>  Issue Type: New Feature
>Reporter: Nick Vatamaniuc
>
> This could be used to let users discover quickly the availability of some API 
> or modes of operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3180) Add ability to return a list of features in server's welcome message

2016-10-05 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3180:


 Summary: Add ability to return a list of features in server's 
welcome message
 Key: COUCHDB-3180
 URL: https://issues.apache.org/jira/browse/COUCHDB-3180
 Project: CouchDB
  Issue Type: New Feature
Reporter: Nick Vatamaniuc


This could be used to let users discover quickly the availability of some API 
or modes of operation.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3179) Add ability to return a list of features in server's welcome message

2016-10-05 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3179:


 Summary: Add ability to return a list of features in server's 
welcome message
 Key: COUCHDB-3179
 URL: https://issues.apache.org/jira/browse/COUCHDB-3179
 Project: CouchDB
  Issue Type: New Feature
Reporter: Nick Vatamaniuc


This could be used to let users discover quickly the availability of some API 
or modes of operation.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3168) Replicator doesn't handle well writing documents to a target db which has a small max_document_size

2016-10-04 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3168.
--
Resolution: Fixed

> Replicator doesn't handle well writing documents to a target db which has a 
> small max_document_size
> ---
>
> Key: COUCHDB-3168
> URL: https://issues.apache.org/jira/browse/COUCHDB-3168
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> If a target db has a smaller max document size set, replication crashes.
> It might make sense for the replication not to crash and instead treat 
> document size as an implicit replication filter, then display doc write 
> failures in the stats / task info / completion record of normal replications.
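
A hedged sketch of that suggestion; stats handling here is a plain map for 
illustration, and the replicator's real stats records differ:

{code}
%% Treat a too-large rejection from the target as a skipped write: bump
%% doc_write_failures and keep going instead of crashing the job.
record_write_result({error, request_entity_too_large}, Stats) ->
    maps:update_with(doc_write_failures, fun(N) -> N + 1 end, 1, Stats);
record_write_result(ok, Stats) ->
    maps:update_with(docs_written, fun(N) -> N + 1 end, 1, Stats).
{code}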



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3175) When PUT-ing a multipart/related doc with attachment get a 500 error on md5 mismatch

2016-10-03 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3175:


 Summary: When PUT-ing a multipart/related doc with attachment get 
a 500 error on md5 mismatch
 Key: COUCHDB-3175
 URL: https://issues.apache.org/jira/browse/COUCHDB-3175
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


fabric_doc_updater's handle_message crashes with a function_clause, which crashes 
the whole request.

Instead, perhaps it should handle:

{code}
{md5_mismatch, Blah}, _Worker, _Acc0) -> ...
{code}

and return a 4xx code...
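As a rough sketch of that idea (the accumulator handling and return shape here are assumptions, not the real fabric_doc_update internals):

{code}
-module(md5_mismatch_sketch).
-export([handle_message/3]).

%% Sketch only: match the md5 mismatch explicitly so it can be mapped to a
%% 4xx response instead of crashing the coordinator with a function_clause.
handle_message({md5_mismatch, _Reason}, _Worker, Acc0) ->
    {stop, {bad_request, <<"content md5 mismatch">>}, Acc0};
handle_message(_Other, _Worker, Acc0) ->
    %% stand-in for the existing clauses
    {ok, Acc0}.
{code}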



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3174) max_document_size setting can be bypassed by issuing multipart/related requests

2016-10-03 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544234#comment-15544234
 ] 

Nick Vatamaniuc commented on COUCHDB-3174:
--

The problem seems to be here:

https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L763-L776


We need to call the json_body bit in order to get the max document size which 
is passed to `MochiReq:recv_body(MaxSize)`.

Presumably we could retrieve Content-Length ourselves before mp parsing and 
raise a 413, but I haven't thought about it too much yet...
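A minimal sketch of that pre-check (the mochiweb call and where MaxSize comes from are illustrative assumptions, not a patch against chttpd_db.erl):

{code}
-module(mp_length_check_sketch).
-export([check_content_length/2]).

%% Sketch only: reject an oversized multipart/related body up front based on
%% Content-Length, before any multipart parsing happens.
check_content_length(MochiReq, MaxSize) ->
    case MochiReq:get_header_value("Content-Length") of
        undefined ->
            ok;
        LenStr ->
            case list_to_integer(LenStr) of
                Len when Len > MaxSize ->
                    throw({request_entity_too_large, Len});
                _ ->
                    ok
            end
    end.
{code}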

> max_document_size setting can be bypassed by issuing multipart/related 
> requests
> ---
>
> Key: COUCHDB-3174
> URL: https://issues.apache.org/jira/browse/COUCHDB-3174
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Testing how the replicator handled small values of the max_document_size 
> parameter, I discovered that if a user issues PUT requests which are 
> multipart/related, the max_document_size setting is bypassed.
> Wireshark capture of a PUT with attachments request coming from the replicator 
> in an EUnit test I wrote.  max_document_size was set to 1, yet a 70k byte 
> document with a 70k byte attachment was created.
> {code}
> PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1
> Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3"
> Content-Length: 140515
> Accept: application/json
> --e5d21d5fd988dc1c6c6e8911030213b3
> Content-Type: application/json
> {"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":"
> ...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}}
> --e5d21d5fd988dc1c6c6e8911030213b3
> Content-Disposition: attachment; filename="att1"
> Content-Type: app/binary
> Content-Length: 7
> xx
> --e5d21d5fd988dc1c6c6e8911030213b3--
> HTTP/1.1 201 Created
> {code}
> Here is a regular request which works as expected:
> {code}
> PUT /dbl/dl2 HTTP/1.1
> Content-Length: 100026
> Content-Type: application/json
> Accept: application/json
> {"_id": "dl2", "size": "...xxx"}
> HTTP/1.1 413 Request Entity Too Large
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3174) max_document_size setting can be bypassed by issuing multipart/related requests

2016-10-03 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3174:


 Summary: max_document_size setting can be bypassed by issuing 
multipart/related requests
 Key: COUCHDB-3174
 URL: https://issues.apache.org/jira/browse/COUCHDB-3174
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


Testing how the replicator handled small values of the max_document_size 
parameter, I discovered that if a user issues PUT requests which are 
multipart/related, the max_document_size setting is bypassed.

Wireshark capture of a PUT with attachments request coming from the replicator 
in an EUnit test I wrote.  max_document_size was set to 1, yet a 70k byte 
document with a 70k byte attachment was created.

{code}
PUT /eunit-test-db-147555017168185/doc0?new_edits=false HTTP/1.1
Content-Type: multipart/related; boundary="e5d21d5fd988dc1c6c6e8911030213b3"
Content-Length: 140515
Accept: application/json

--e5d21d5fd988dc1c6c6e8911030213b3
Content-Type: application/json

{"_id":"doc0","_rev":"1-40a6a02761aba1474c4a1ad9081a4c2e","x":"
...","_revisions":{"start":1,"ids":["40a6a02761aba1474c4a1ad9081a4c2e"]},"_attachments":{"att1":{"content_type":"app/binary","revpos":1,"digest":"md5-u+COd6RLUd6BGz0wJyuZFg==","length":7,"follows":true}}}
--e5d21d5fd988dc1c6c6e8911030213b3
Content-Disposition: attachment; filename="att1"
Content-Type: app/binary
Content-Length: 7

xx
--e5d21d5fd988dc1c6c6e8911030213b3--

HTTP/1.1 201 Created
{code}


Here is a regular request which works as expected:

{code}
PUT /dbl/dl2 HTTP/1.1
Content-Length: 100026
Content-Type: application/json
Accept: application/json
{"_id": "dl2", "size": "...xxx"}

HTTP/1.1 413 Request Entity Too Large

{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2992) Add additional support for document size

2016-09-29 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532980#comment-15532980
 ] 

Nick Vatamaniuc commented on COUCHDB-2992:
--

This is related to replications crashing unexpectedly. Users can add documents 
smaller than the limit; the replicator then batches them up, 500 at a time, and 
repeatedly crashes. 

https://issues.apache.org/jira/browse/COUCHDB-3168

> Add additional support for document size
> 
>
> Key: COUCHDB-2992
> URL: https://issues.apache.org/jira/browse/COUCHDB-2992
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Database Core
>Reporter: Tony Sun
>
> Currently, a max_document_size of 64 GB is the only restriction on users 
> creating documents. Large documents often lead to issues with our indexers. 
> This feature will allow users finer-grained control over document size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size

2016-09-29 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc updated COUCHDB-3169:
-
Comment: was deleted

(was: There is already an open ticket and associated pr related to this:

https://issues.apache.org/jira/browse/COUCHDB-2992)

> couchdb.max_document_size setting is actually max_http_request_size
> ---
>
> Key: COUCHDB-3169
> URL: https://issues.apache.org/jira/browse/COUCHDB-3169
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> It turns out couchdb.max_document_size doesn't really limit document size; it 
> limits http request size. 
> For PUT document requests both are similar, but that is not the case for 
> _bulk_docs requests. For example, if max_document_size is set to 1MB and a user 
> posts ten 200KB docs, their whole _bulk_docs request will fail.
> It would probably be useful to rename the setting to max_request_size and put 
> it in the chttpd section. And then possibly implement a max_document_size as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size

2016-09-29 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3169.
--
Resolution: Duplicate

https://issues.apache.org/jira/browse/COUCHDB-2992

> couchdb.max_document_size setting is actually max_http_request_size
> ---
>
> Key: COUCHDB-3169
> URL: https://issues.apache.org/jira/browse/COUCHDB-3169
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> It turns out couchdb.max_document_size doesn't really limit document size; it 
> limits http request size. 
> For PUT document requests both are similar, but that is not the case for 
> _bulk_docs requests. For example, if max_document_size is set to 1MB and a user 
> posts ten 200KB docs, their whole _bulk_docs request will fail.
> It would probably be useful to rename the setting to max_request_size and put 
> it in the chttpd section. And then possibly implement a max_document_size as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size

2016-09-29 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532971#comment-15532971
 ] 

Nick Vatamaniuc commented on COUCHDB-3169:
--

There is already an open ticket and associated pr related to this:

https://issues.apache.org/jira/browse/COUCHDB-2992

> couchdb.max_document_size setting is actually max_http_request_size
> ---
>
> Key: COUCHDB-3169
> URL: https://issues.apache.org/jira/browse/COUCHDB-3169
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> It turns out couchdb.max_document_size doesn't really limit document size; it 
> limits http request size. 
> For PUT document requests both are similar, but that is not the case for 
> _bulk_docs requests. For example, if max_document_size is set to 1MB and a user 
> posts ten 200KB docs, their whole _bulk_docs request will fail.
> It would probably be useful to rename the setting to max_request_size and put 
> it in the chttpd section. And then possibly implement a max_document_size as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size

2016-09-28 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3169:


 Summary: couchdb.max_document_size setting is actually 
max_http_request_size
 Key: COUCHDB-3169
 URL: https://issues.apache.org/jira/browse/COUCHDB-3169
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


It turns out couchdb.max_document_size doesn't really limit document size; it 
limits http request size. 

For PUT document requests both are similar, but that is not the case for 
_bulk_docs requests. For example, if max_document_size is set to 1MB and a user 
posts ten 200KB docs, their whole _bulk_docs request will fail.

It would probably be useful to rename the setting to max_request_size and put 
it in the chttpd section. And then possibly implement a max_document_size as well.
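A sketch of what a separate per-document check could look like for _bulk_docs (illustrative only; using the encoded JSON size as the document size, and jiffy for encoding, are assumptions):

{code}
-module(doc_size_sketch).
-export([validate_doc_sizes/2]).

%% Sketch only: measure each document in a _bulk_docs body individually
%% instead of limiting the size of the whole request.
validate_doc_sizes(Docs, MaxDocSize) ->
    lists:foreach(fun(Doc) ->
        Size = iolist_size(jiffy:encode(Doc)),
        case Size > MaxDocSize of
            true -> throw({document_too_large, Size});
            false -> ok
        end
    end, Docs).
{code}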



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3169) couchdb.max_document_size setting is actually max_http_request_size

2016-09-28 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530654#comment-15530654
 ] 

Nick Vatamaniuc commented on COUCHDB-3169:
--

Related to: https://issues.apache.org/jira/browse/COUCHDB-3168

> couchdb.max_document_size setting is actually max_http_request_size
> ---
>
> Key: COUCHDB-3169
> URL: https://issues.apache.org/jira/browse/COUCHDB-3169
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> It turns out couchdb.max_document_size doesn't really limit document size; it 
> limits http request size. 
> For PUT document requests both are similar, but that is not the case for 
> _bulk_docs requests. For example, if max_document_size is set to 1MB and a user 
> posts ten 200KB docs, their whole _bulk_docs request will fail.
> It would probably be useful to rename the setting to max_request_size and put 
> it in the chttpd section. And then possibly implement a max_document_size as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3168) Replicator doesn't handle writing document to a db which has a limited document size

2016-09-28 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530569#comment-15530569
 ] 

Nick Vatamaniuc commented on COUCHDB-3168:
--

413s are emitted per request, generated from here:

https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd.erl#L607-L611

So "max_document_size" is not strictly true as it is max_request_size really. 
Can still have documents smaller than that size just have many of them in a 
_bulk_docs request.

> Replicator doesn't handle writing document to a db which has a limited 
> document size
> 
>
> Key: COUCHDB-3168
> URL: https://issues.apache.org/jira/browse/COUCHDB-3168
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> If a target db has set a smaller document max size, replication crashes.
> It might make sense for the replication to not crash and instead treat 
> document size as an implicit replication filter then display doc write 
> failures in the stats / task info / completion record of normal replications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3168) Replicator doesn't handle writing document to a db which has a limited document size

2016-09-28 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530426#comment-15530426
 ] 

Nick Vatamaniuc commented on COUCHDB-3168:
--

Initially this seemed like a one-line change:

https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_api_wrap.erl#L451

However, a too-large document seems to crash the whole _bulk_docs request with:

{"error":"too_large","reason":"the request entity is too large"}

This means we don't know which docs from the list succeeded and which ones 
didn't.

I tried this with:

{code}
curl -X DELETE http://adm:pass@localhost:15984/x; \
curl -X PUT http://adm:pass@localhost:15984/x && \
curl -d @large_docs.json -H 'Content-Type: application/json' \
     -X POST http://adm:pass@localhost:15984/x/_bulk_docs
{code}

where large_docs.json looked something like

{code}
{
"docs" : [
{"_id" : "doc1"},
{"_id" : "doc2", "large":"x"}
]
}
{code}

and the max doc size was set to something smaller than the "large" value in the 
docs.

> Replicator doesn't handle writing document to a db which has a limited 
> document size
> 
>
> Key: COUCHDB-3168
> URL: https://issues.apache.org/jira/browse/COUCHDB-3168
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> If a target db has set a smaller document max size, replication crashes.
> It might make sense for the replication to not crash and instead treat 
> document size as an implicit replication filter then display doc write 
> failures in the stats / task info / completion record of normal replications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3168) Replicator doesn't handle writing document to a db which has a limited document size

2016-09-28 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3168:


 Summary: Replicator doesn't handle writing document to a db which 
has a limited document size
 Key: COUCHDB-3168
 URL: https://issues.apache.org/jira/browse/COUCHDB-3168
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


If a target db has set a smaller document max size, replication crashes.

It might make sense for the replication to not crash and instead treat document 
size as an implicit replication filter then display doc write failures in the 
stats / task info / completion record of normal replications.
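A sketch of that idea (the result shapes and the stats map are illustrative assumptions, not the replicator's real data structures): count a rejected document as a doc write failure instead of crashing.

{code}
-module(rep_write_failures_sketch).
-export([handle_write_result/2]).

%% Sketch only: treat "document too large" as an implicit filter by bumping
%% a doc_write_failures counter rather than failing the whole replication.
handle_write_result({ok, _Rev}, Stats) ->
    maps:update_with(docs_written, fun(N) -> N + 1 end, 1, Stats);
handle_write_result({error, too_large}, Stats) ->
    maps:update_with(doc_write_failures, fun(N) -> N + 1 end, 1, Stats).
{code}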



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3167) CouchDB replicator will retry forever if it cannot write to source db

2016-09-27 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3167:


 Summary: CouchDB replicator will retry forever if it cannot write 
to source db
 Key: COUCHDB-3167
 URL: https://issues.apache.org/jira/browse/COUCHDB-3167
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


If a replication is using checkpoints (and by default it does), and the 
replication does not have authorization to write to the source db, the 
replication will crash repeatedly.

Crashing is expected and not a problem; however, each time it crashes it writes 
an error state to the replication doc and then the replication job exits. 
Writing the error state generates a new doc update change for the _replicator 
db. The replicator reads the document change, starts a new replication job, and 
writes a "triggered" state to the document. The replication starts successfully, 
then crashes and writes "error" to the document.

So alternating states of "triggered" and "error" keep being written to the 
document forever. Looking at some examples of this, one _replicator shard was 
over 900GB in size, and others were as large as 500GB.

The critical bit above is that the replication starts successfully. There is a 
mechanism to fail and cancel replications which repeatedly fail to start. 
However, after a replication job starts, if it crashes it will be restarted an 
unlimited number of times.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3149) Exception written to the log if db deleted while there is a change feed running

2016-09-15 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493655#comment-15493655
 ] 

Nick Vatamaniuc commented on COUCHDB-3149:
--

Here is an attempt at fixing it:

https://github.com/apache/couchdb-fabric/pull/69

But I'm not familiar with that code, so I probably took the wrong approach.

> Exception written to the log if db deleted while there is a change feed 
> running
> ---
>
> Key: COUCHDB-3149
> URL: https://issues.apache.org/jira/browse/COUCHDB-3149
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> {code}
> [info] 2016-09-14T20:08:23.217251Z node1@127.0.0.1 <0.23485.0> ea02496172 
> ea02496172 127.0.0.1  localhost:15984 DELETE /d1 200 ok 46
> [error] 2016-09-14T20:08:23.221676Z node1@127.0.0.1 <0.22945.0>  
> rexi_server 
> error:{'EXIT',{{stop,{cb_state,<0.22937.0>,#Ref<0.0.1.15627>,true}},[{couch_event_listener_mfa,handle_event,3,[{file,"src/couch_event_listener_mfa.erl"},{line,91}]},{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,142}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"
> },{line,139}]}]}} 
> [{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,150}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
> [info] 2016-09-14T20:08:23.222174Z node1@127.0.0.1 <0.22898.0> 549ae68ef1 
> 549ae68ef1 127.0.0.1  localhost:15984 GET /d1/_changes?feed=longpoll 200 ok 
> 32901
> {code}
> Appears in the log if a database gets deleted while there is a change feed 
> running. Both longpoll and continuous seem to trigger the behavior.
> Exception above in couch_event_listener_mfa:handle_event comes from
> https://github.com/apache/couchdb-couch-event/blob/master/src/couch_event_listener_mfa.erl#L91
> which, in turn comes from fabric_db_update_listener handle_db_event returning 
> \{stop, St\}  in:
> https://github.com/apache/couchdb-fabric/blob/master/src/fabric_db_update_listener.erl#L87
> It seems couch_event_listener_mfa:handle_event doesn't handle \{stop, St\}; it 
> only handles \{ok, NewState\} or plain stop, otherwise it raises an exception.
> I tried to replace \{stop, St\} with \{ok, St\} and then with stop. But in 
> both cases change feeds never stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3149) Exception written to the log if db deleted while there is a change feed running

2016-09-15 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3149:


 Summary: Exception written to the log if db deleted while there is 
a change feed running
 Key: COUCHDB-3149
 URL: https://issues.apache.org/jira/browse/COUCHDB-3149
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


{code}

[info] 2016-09-14T20:08:23.217251Z node1@127.0.0.1 <0.23485.0> ea02496172 
ea02496172 127.0.0.1  localhost:15984 DELETE /d1 200 ok 46

[error] 2016-09-14T20:08:23.221676Z node1@127.0.0.1 <0.22945.0>  
rexi_server 
error:{'EXIT',{{stop,{cb_state,<0.22937.0>,#Ref<0.0.1.15627>,true}},[{couch_event_listener_mfa,handle_event,3,[{file,"src/couch_event_listener_mfa.erl"},{line,91}]},{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,142}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"
},{line,139}]}]}} 
[{couch_event_listener,do_event,3,[{file,"src/couch_event_listener.erl"},{line,150}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]

[info] 2016-09-14T20:08:23.222174Z node1@127.0.0.1 <0.22898.0> 549ae68ef1 
549ae68ef1 127.0.0.1  localhost:15984 GET /d1/_changes?feed=longpoll 200 ok 
32901
{code}

Appears in the log if a database gets deleted while there is a change feed 
running. Both longpoll and continuous seem to trigger the behavior.

Exception above in couch_event_listener_mfa:handle_event comes from

https://github.com/apache/couchdb-couch-event/blob/master/src/couch_event_listener_mfa.erl#L91

which, in turn comes from fabric_db_update_listener handle_db_event returning 
\{stop, St\}  in:

https://github.com/apache/couchdb-fabric/blob/master/src/fabric_db_update_listener.erl#L87

It seems couch_event_listener_mfa:handle_event doesn't handle \{stop, St\}; it 
only handles \{ok, NewState\} or plain stop, otherwise it raises an exception.

I tried to replace \{stop, St\} with \{ok, St\} and then with stop. But in both 
cases change feeds never stopped.
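A sketch of the kind of clause that seems to be missing (the wrapper's state shape and return conventions here are assumptions, not couch_event_listener_mfa's real internals):

{code}
-module(event_listener_stop_sketch).
-export([handle_event/3]).

%% Sketch only: accept a {stop, State} return from the callback and turn it
%% into an orderly stop instead of crashing with a function_clause.
handle_event(DbName, Event, {Mod, Fun, State}) ->
    case Mod:Fun(DbName, Event, State) of
        {ok, NewState}   -> {ok, {Mod, Fun, NewState}};
        {stop, NewState} -> {stop, {Mod, Fun, NewState}};
        stop             -> {stop, {Mod, Fun, State}}
    end.
{code}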




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports

2016-09-06 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467500#comment-15467500
 ] 

Nick Vatamaniuc commented on COUCHDB-2980:
--

Wonder if it is worth at least preventing the creation of local replications like 
the original PR did? https://github.com/apache/couchdb-couch-replicator/pull/41

Otherwise the behavior is surprising for someone with 1.x experience. And then 
later, even if we add local clustered support (say in 2.1), it will all of a 
sudden do something different.

In the meantime, is using `http://localhost:5984/db` an alternative for users to 
get the equivalent behavior? In other words, would that cover Chris's case of 
making the replicator db work as expected if it is replicated to another cluster?

> Replicator DB on 15984 replicates to backdoor ports
> ---
>
> Key: COUCHDB-2980
> URL: https://issues.apache.org/jira/browse/COUCHDB-2980
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Robert Kowalski
>
> If you POST a doc into the replicator database a replication is kicked off 
> and finishes successfully (usual 5984 port which maps to 15984 via haproxy).
> The problem is that the DB is replicated to the backdoor ports (15986) and is 
> not visible on the other ports.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3111) Default replicator change feed timeout too short

2016-08-29 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446957#comment-15446957
 ] 

Nick Vatamaniuc commented on COUCHDB-3111:
--

For reference here is an example of a change feed request :

{code} GET 
/rdyno_001/_changes?filter=rdyno_filterdoc%2Frdyno_filtername=continuous=all_docs=%22334-g1IieJyV0MENgjAUgOGnmKAnR9AJjG2RlpPcHENbHg0SxJNn3UQ30U10EyzUBAkhgTRpk7b_d3gZAEwTB2GenzEmIaF8tTaLZOZhLEEtiqJIE0e6J3PhKqEiLfz2905CLc2utj8FKoUTuqGaIswuOcb6mMfY3Ydlv2_0WiqfR9ivP5T9tdFLznxNvF59PjE73MxhiHttIGFCoxhgPKzxrA0aKCLQH2C8rPH-nwX1Auw3S2t8rFHOQwGMdhXDNPOQYztMv-12jrg%22=1
 {code}


> Default replicator change feed timeout too short
> 
>
> Key: COUCHDB-3111
> URL: https://issues.apache.org/jira/browse/COUCHDB-3111
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: Nick Vatamaniuc
>
> Current replicator change feeds are set up to time out based on the default 
> connection_timeout parameter divided by 3. The default connection timeout is 
> 30000 (msec). So replicator change feeds are torn down and established again 
> every 10 seconds.
> That doesn't seem bad on a smaller scale, but if there are 1000 replication 
> jobs on a server it would mean tearing down a change feed connection every 10 
> msec on average. It seems like it might not be optimal so wanted to discuss it.
> Looking at the commit which introduced 'div 3' behavior wondering if there is 
> anything to improve here:
> https://github.com/apache/couchdb-couch-replicator/commit/ed447f8c01880c7f99f5829a8ef485fd8d399376
> Maybe keep div 3 but increase default connection timeout to 60 seconds? Or 
> maybe apply div 2 - 5 seconds, or have a minimum of 30 seconds?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3111) Default replicator change feed timeout too short

2016-08-29 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3111:


 Summary: Default replicator change feed timeout too short
 Key: COUCHDB-3111
 URL: https://issues.apache.org/jira/browse/COUCHDB-3111
 Project: CouchDB
  Issue Type: Improvement
Reporter: Nick Vatamaniuc


Current replicator change feeds are set up to time out based on the default 
connection_timeout parameter divided by 3. The default connection timeout is 30000 
(msec). So replicator change feeds are torn down and established again every 10 
seconds.

That doesn't seem bad on a smaller scale, but if there are 1000 replication 
jobs on a server it would mean tearing down a change feed connection every 10 
msec on average. It seems like it might not be optimal so wanted to discuss it.

Looking at the commit which introduced 'div 3' behavior wondering if there is 
anything to improve here:

https://github.com/apache/couchdb-couch-replicator/commit/ed447f8c01880c7f99f5829a8ef485fd8d399376

Maybe keep div 3 but increase default connection timeout to 60 seconds? Or 
maybe apply div 2 - 5 seconds, or have a minimum of 30 seconds?
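A sketch of the "div 3 but with a minimum" idea (the 30 second floor is just the value suggested above, not current behavior):

{code}
-module(changes_timeout_sketch).
-export([changes_feed_timeout/1]).

%% Sketch only: keep the div 3 behavior but never go below a 30 second floor.
changes_feed_timeout(ConnectionTimeoutMsec) ->
    max(ConnectionTimeoutMsec div 3, 30000).

%% e.g. changes_feed_timeout(30000) returns 30000 instead of 10000.
{code}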



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3104) Replicator manager does not checkpoint properly

2016-08-15 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3104:


 Summary: Replicator manager does not checkpoint properly
 Key: COUCHDB-3104
 URL: https://issues.apache.org/jira/browse/COUCHDB-3104
 Project: CouchDB
  Issue Type: Bug
  Components: Replication
Reporter: Nick Vatamaniuc


In couch_replicator_manager, the {code} changes_reader_cb({stop, EndSeq, _Pending}, 
...) -> {code} function at one point in the past handled callback messages 
from {{fabric:changes}} and so it would get pending info in the callback 
messages. When it was optimized to use the local shard, that stopped working, 
because local changes feeds don't send stop messages.

As a result the replicator manager never checkpoints and, on every change to a 
replicator shard, rescans all the changes in that shard.
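A sketch of the shape of a fix (the message shapes, accumulator, and checkpoint call below are assumptions, not couch_replicator_manager's real API): checkpoint on ordinary change messages too, since a local shard changes feed never delivers a stop message.

{code}
-module(rep_mgr_checkpoint_sketch).
-export([changes_reader_cb/2]).

%% Sketch only: checkpoint on each change as well as on {stop, ...}.
changes_reader_cb({change, Seq}, #{dbname := DbName} = Acc) ->
    ok = maybe_checkpoint(DbName, Seq),
    {ok, Acc};
changes_reader_cb({stop, EndSeq, _Pending}, #{dbname := DbName} = Acc) ->
    ok = maybe_checkpoint(DbName, EndSeq),
    {ok, Acc}.

maybe_checkpoint(_DbName, _Seq) ->
    ok.  %% stand-in for the real checkpointing logic
{code}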



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3076) CouchDB 2.0 Blog Series: Feature: replicator

2016-08-12 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419612#comment-15419612
 ] 

Nick Vatamaniuc commented on COUCHDB-3076:
--

Good idea, Jenn.  I added a little line about me at the end. 

Also linked to the syntax of Mango selectors Jan suggested.

Let me know if anything else needs to be done.


> CouchDB 2.0 Blog Series: Feature: replicator
> 
>
> Key: COUCHDB-3076
> URL: https://issues.apache.org/jira/browse/COUCHDB-3076
> Project: CouchDB
>  Issue Type: New JIRA Project
>Reporter: Jenn Turner
>Assignee: kzx
>Priority: Minor
>
> This issue is to track progress on a series of blog posts promoting the 
> release of CouchDB 2.0.
> Topic: Feature: replicator 
> -TBD
> Nick Vatamaniuc volunteered via email thread: 
> https://lists.apache.org/thread.html/47637fe64739d26eca81a109650022b77c92aac05d15d49b18ade813@%3Cdev.couchdb.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3101) Builtin reduce functions should not throw errors

2016-08-12 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419039#comment-15419039
 ] 

Nick Vatamaniuc commented on COUCHDB-3101:
--

I think it makes sense to signal the user about the error somehow, so I like 
returning null better.

There is already a pattern of returning an error to the user if their reduce 
function doesn't reduce fast enough: 
{code}query_server_config.reduce_limit{code} And that is enabled by default. 
(Also, there is currently a bug in how it calculates the limit, with a PR 
fix: https://github.com/apache/couchdb/pull/425 ).


> Builtin reduce functions should not throw errors
> 
>
> Key: COUCHDB-3101
> URL: https://issues.apache.org/jira/browse/COUCHDB-3101
> Project: CouchDB
>  Issue Type: Bug
>  Components: View Server Support
>Reporter: Paul Joseph Davis
>
> So I just figured out we have an issue with the builtin reduce functions. 
> Currently, if they receive invalid data they'll throw an error. Unfortunately 
> what ends up happening is that if the error is never corrected then the view 
> files end up becoming bloated and refusing to open (because they're searching 
> for a header as Jay pointed out the other week).
> We should either return null or ignore the bad data. My preference would be 
> to return null so that it indicates bad data was given somewhere but I could 
> also see just dropping the bad value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3076) CouchDB 2.0 Blog Series: Feature: replicator

2016-08-07 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411197#comment-15411197
 ] 

Nick Vatamaniuc commented on COUCHDB-3076:
--

Draft: 



https://docs.google.com/document/d/14rk9jRrAElzAFA3XdXDsjahmklGrMHfwY_j9LJji1bA/edit?usp=sharing

> CouchDB 2.0 Blog Series: Feature: replicator
> 
>
> Key: COUCHDB-3076
> URL: https://issues.apache.org/jira/browse/COUCHDB-3076
> Project: CouchDB
>  Issue Type: New JIRA Project
>Reporter: Jenn Turner
>Assignee: kzx
>Priority: Minor
>
> This issue is to track progress on a series of blog posts promoting the 
> release of CouchDB 2.0.
> Topic: Feature: replicator 
> -TBD
> Nick Vatamaniuc volunteered via email thread: 
> https://lists.apache.org/thread.html/47637fe64739d26eca81a109650022b77c92aac05d15d49b18ade813@%3Cdev.couchdb.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2824) group & group_level view parameters override each other

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2824.


> group & group_level view parameters override each other
> -
>
> Key: COUCHDB-2824
> URL: https://issues.apache.org/jira/browse/COUCHDB-2824
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core, HTTP Interface
>Reporter: Nick Vatamaniuc
>Assignee: Nick Vatamaniuc
> Fix For: 2.0.0
>
>
> In a view query, if both group and group_level are specified, the last one 
> specified overrides any of the previous "group" or "group_level" parameters.
> Example:
> Create a db (db1), at least one document, a design doc (des1) that looks like:
> {code:javascript}
> {
>"views": { 
>  "v1" : { "map": "function(d){
>  emit([1,1],1); 
>  emit([1,1],10);
>  emit([1,2],100); 
>  emit([1,2],1000); 
>  emit([2,2],1);
>}" , 
>  "reduce":"_sum" 
>  } 
> }
> {code}
> Then these queries show the problem:
> {code}
> $ http "$DB1/db1/_design/des1/_view/v1?group_level=1=true"
> {"rows":[
> {"key":[1,1],"value":11},
> {"key":[1,2],"value":1100},
> {"key":[2,2],"value":1}
> ]}
> {code}
> But users might expect group_level=1 results to show or a 400 request invalid.
> Specifying group_level=1 after group=true make group_level=1 take effect:
> {code}
> $ http "$DB1/db1/_design/des1/_view/v1?group_level=1=true_level=1"
> {"rows":[
> {"key":[1],"value":},
> {"key":[2],"value":1}
> ]}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2831) OS Daemons configuration test is failing when run in isolation

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2831.


> OS Daemons configuration test is failing when run in isolation
> --
>
> Key: COUCHDB-2831
> URL: https://issues.apache.org/jira/browse/COUCHDB-2831
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> It seems to work when run as part of the whole test suite. When run on its 
> own it fails.
> ... apps=couch tests=configuration_reader_test_,
> {code}
> [error] Ignoring OS daemon request: {error,{1,invalid_json}}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2815) POST to /{db}/_all_docs with invalid keys should return a 400 error instead of 500

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2815.


> POST to /{db}/_all_docs with invalid keys should return a 400 error instead 
> of 500
> --
>
> Key: COUCHDB-2815
> URL: https://issues.apache.org/jira/browse/COUCHDB-2815
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core, HTTP Interface
>Reporter: Nick Vatamaniuc
> Fix For: 2.0.0
>
>
> Related to 
> http://docs.couchdb.org/en/latest/api/database/bulk-api.html#post--db-_all_docs
>  end point.
> Example:
>  *  db1 created with two documents ids : "1" and "2".
> {code}
>  http -a adm:pass POST http://127.0.0.1:15984/db1/_all_docs  keys:='["1",2]'
> HTTP/1.1 500 Internal Server Error
> Cache-Control: must-revalidate
> Content-Length: 43
> Content-Type: application/json
> Date: Wed, 16 Sep 2015 18:25:08 GMT
> Server: CouchDB/b8b9968 (Erlang OTP/17)
> X-Couch-Request-ID: 898d97fc1f
> X-CouchDB-Body-Time: 0
> {
> "error": "2",
> "reason": "{illegal_docid,2}"
> }
> {code}
> Expected a 400 error instead, as there is nothing wrong on the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2818) Design documents accept invalid views

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2818.


> Design documents accept invalid views
> -
>
> Key: COUCHDB-2818
> URL: https://issues.apache.org/jira/browse/COUCHDB-2818
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core, Documentation, JavaScript View Server
>Reporter: Nick Vatamaniuc
>Assignee: Nick Vatamaniuc
> Fix For: 2.0.0
>
>
> Design documents seem to accept invalid views. 
> For example:
> {code}
> $ http PUT $DB1/db2/_design/des1 views:='{ "v1" : 
> "function(d){emit(d._id,d);}" }'
> HTTP/1.1 201 Created
> {
> "id": "_design/des1",
> "ok": true,
> "rev": "1-04701f13eb827265c442d219bd995e91"
> }
> {code}
> Going by the documentation for design documents: 
> http://docs.couchdb.org/en/latest/api/ddoc/common.html#put--db-_design-ddoc , 
> a view should be an object that has a map (a string) and an optional reduce 
> (string).  
> Interestingly some validation is performed to check that views field itself 
> is an object.  For example:
> {code}
> $ http PUT $DB1/db2/_design/des1 views:='"x"'
> HTTP/1.1 400 Bad Request
> {
> "error": "invalid_design_doc",
> "reason": "`views` parameter must be an object."
> }
> {code}
> Also there is a deeper level validation of map functions:
> {code}
> $  http PUT $DB1/db2/_design/des1 views:='{ "m":{"map":""} }'
> {
> "error": "not_found",
> "reason": "missing function"
> }
> {code}
> If there is interest, I have a patch that checks that, if provided, views, 
> filters, lists, shows, updates, and options are objects; rewrites is an array; 
> and validate_doc_update and language are strings.
> Then, if views is provided, each view must be an object with a map 
> function (a string) and an optional reduce function (also a string).
> Here is an example how it works:
> {code}
> $ http PUT $DB1/db2/_design/des1 views:='{ "m":"bad"  }'
> HTTP/1.1 400 Bad Request
> {
> "error": "invalid_design_doc",
> "reason": "View m must be an object"
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2848) EUnit Tests Fail Intermittently

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2848.


> EUnit Tests Fail Intermittently
> --
>
> Key: COUCHDB-2848
> URL: https://issues.apache.org/jira/browse/COUCHDB-2848
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
> Fix For: 2.0.0
>
>
> Use this for now to keep track of them



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2954) Deprecate configurable _replicator db name in 2.0

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2954.


> Deprecate configurable _replicator db name in 2.0
> -
>
> Key: COUCHDB-2954
> URL: https://issues.apache.org/jira/browse/COUCHDB-2954
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: Nick Vatamaniuc
>
> CouchDB 1.x has a configurable replicator database name. 
> CouchDB 2.x uses another pattern for having custom replicator databases -- it 
> scans files in local database data directory for patterns matching {code} 
> "_replicator(\\.[0-9]{10,})?.couch$" {code}. So for example, can create a 
> database called {{"joe/_replicator"}} and it will be considered a replicator 
> database by the replication management code. This way can even have multiple 
> replicator databases ( {{"mike/_replicator"}}, or {{"joe/other/_replicator"}} 
> ), so configuration is even more flexible than it was in 1.x.
> Current code in couch_replicator_manager.erl is a mix of using the 1.x config 
> option and scanning recursively for db files with _replicator pattern. It 
> already also assumes a hard-coded "_replicator" name in a few places:
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L918
> The proposal is to deprecate the _replicator db name configuration in order to 
> simplify and clean up the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2832) Task status test setup fails

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2832.


> Task status test setup fails
> 
>
> Key: COUCHDB-2832
> URL: https://issues.apache.org/jira/browse/COUCHDB-2832
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Unit test couch_task_status_tests fails 
> {code}
> $ reunit apps=couch tests=couch_task_status_test_
> ==> couch_log (eunit)
> Running test function(s):
>  EUnit 
>   There were no tests to run.
> ==> couch (eunit)
> Compiled test/couch_doc_json_tests.erl
> Compiled test/couchdb_os_daemons_tests.erl
> Running test function(s):
>   couch_task_status_tests:couch_task_status_test_/0
>  EUnit 
> CouchDB task status updates
>   couch_task_status_tests:58: should_register_task...ok
>   couch_task_status_tests:62: should_set_task_startup_time...[0.002 s] ok
>   couch_task_status_tests:67: 
> should_have_update_time_as_startup_before_any_progress...ok
>   couch_task_status_tests:71: should_set_task_type...ok
>   couch_task_status_tests:75: 
> should_not_register_multiple_tasks_for_same_pid...ok
>   couch_task_status_tests:80: should_set_task_progress...ok
>   couch_task_status_tests:85: should_update_task_progress...*skipped*
> undefined
> *unexpected termination of test process*
> ::{{badmatch,undefined},
>[{couch_log,debug,2,[{file,"src/couch_log.erl"},{line,32}]},
> {couch_task_status,handle_cast,2,
>[{file,"src/couch_task_status.erl"},{line,137}]},
> {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]},
> {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]},
> {proc_lib,init_p_do_apply,3,[{file,[...]},{line,...}]}]}
> ===
>   Failed: 0.  Skipped: 0.  Passed: 6.
> One or more tests were cancelled.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2963) Replication manager does not rescan databases on cluster membership change

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2963.


> Replication manager does not rescan databases on cluster membership change
> --
>
> Key: COUCHDB-2963
> URL: https://issues.apache.org/jira/browse/COUCHDB-2963
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>Assignee: Nick Vatamaniuc
> Fix For: 2.0.0
>
>
> Replication manager should rescan all replicator databases on cluster 
> membership changes from sequence 0, in order to possibly pick up new 
> replication it might be an owner of.
> On receipt of nodedown or nodeup message, replication manager attempts to 
> start a new scan by resetting the checkpointed sequence IDs ets table. With 
> the intent that change feeds will exit and then check if they need to rescan 
> again. However because change feeds used for the replicator databases are 
> "continuous" they never exit, so consequently they never get a chance start 
> rescanning from 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2959) Deadlock condition in replicator with remote source and configured 1 http connection

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2959.


> Deadlock condition in replicator with remote source and configured 1 http 
> connection
> 
>
> Key: COUCHDB-2959
> URL: https://issues.apache.org/jira/browse/COUCHDB-2959
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Nick Vatamaniuc
> Attachments: rep.py
>
>
> A deadlock can occur that causes starting replications to get stuck 
> (and never update their state to triggered). This happens with a remote 
> source and when using a single http connection and single worker.
>  The deadlock occurs in this case:
>  - Replication process starts, it starts the changes reader: 
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator.erl#L276
>  - Changes reader consumes the worker from httpc pool. At some point it will 
> make a call back to the replication process to report how much work it has 
> done using gen_server call {{report_seq_done}}
>  - In the meantime, main replication process calls {{get_pending_changes}} to 
> get changes from the source. If the source is remote it will attempt to 
> consume a worker from the httpc pool. However the worker is used by the change 
> feed process. So get_pending_changes is blocked waiting for a worker to be 
> released.
>  - So changes feed is waiting for report_seq_done call to replication process 
> to return while holding a worker and main replication process is waiting for 
> httpc pool to release the worker and it never responds to report_seq_done.
> Attached python script (rep.py) to reproduce issue. Script creates n 
> databases (tested with n=1000). Then replicates those databases to 1 single 
> database. It also needs the Python CouchDB module from pip (or package repos).
> 1. It can be run from ipython by importing {{rep}}. 
> 2. start dev cluster {{./dev/run --admin=adm:pass}}
> 3. {{rep.replicate_1_to_n(1000)}}
> wait
> 4. {{rep.check_untriggered()}}
> When it fails, result might look like this:
> {code}
> {
>  'rdyno_1_6': None,
>  'rdyno_1_00158': None
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-2959) Deadlock condition in replicator with remote source and configured 1 http connection

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-2959.
--
Resolution: Fixed

> Deadlock condition in replicator with remote source and configured 1 http 
> connection
> 
>
> Key: COUCHDB-2959
> URL: https://issues.apache.org/jira/browse/COUCHDB-2959
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Nick Vatamaniuc
> Attachments: rep.py
>
>
> A deadlock can occur that causes starting replications to get stuck 
> (and never update their state to triggered). This happens with a remote 
> source and when using a single http connection and single worker.
>  The deadlock occurs in this case:
>  - Replication process starts, it starts the changes reader: 
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator.erl#L276
>  - Changes reader consumes the worker from httpc pool. At some point it will 
> make a call back to the replication process to report how much work it has 
> done using gen_server call {{report_seq_done}}
>  - In the meantime, main replication process calls {{get_pending_changes}} to 
> get changes from the source. If the source is remote it will attempt to 
> consume a worker from the httpc pool. However the worker is used by the change 
> feed process. So get_pending_changes is blocked waiting for a worker to be 
> released.
>  - So changes feed is waiting for report_seq_done call to replication process 
> to return while holding a worker and main replication process is waiting for 
> httpc pool to release the worker and it never responds to report_seq_done.
> Attached python script (rep.py) to reproduce issue. Script creates n 
> databases (tested with n=1000). Then replicates those databases to 1 single 
> database. It also needs the Python CouchDB module from pip (or package repos).
> 1. It can be run from ipython by importing {{rep}}. 
> 2. start dev cluster {{./dev/run --admin=adm:pass}}
> 3. {{rep.replicate_1_to_n(1000)}}
> wait
> 4. {{rep.check_untriggered()}}
> When it fails, result might look like this:
> {code}
> {
>  'rdyno_1_6': None,
>  'rdyno_1_00158': None
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-2954) Deprecate configurable _replicator db name in 2.0

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-2954.
--
Resolution: Fixed

> Deprecate configurable _replicator db name in 2.0
> -
>
> Key: COUCHDB-2954
> URL: https://issues.apache.org/jira/browse/COUCHDB-2954
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: Nick Vatamaniuc
>
> CouchDB 1.x has a configurable replicator database name. 
> CouchDB 2.x uses another pattern for having custom replicator databases -- it 
> scans files in local database data directory for patterns matching {code} 
> "_replicator(\\.[0-9]{10,})?.couch$" {code}. So for example, can create a 
> database called {{"joe/_replicator"}} and it will be considered a replicator 
> database by the replication management code. This way can even have multiple 
> replicator databases ( {{"mike/_replicator"}}, or {{"joe/other/_replicator"}} 
> ), so configuration is even more flexible than it was in 1.x.
> Current code in couch_replicator_manager.erl is a mix of using the 1.x config 
> option and scanning recursively for db files with _replicator pattern. It 
> already also assumes a hard-coded "_replicator" name in a few places:
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L918
> The proposal is to deprecate the _replicator db name configuration in order to 
> simplify and clean up the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-2988) Allow query selector as changes and replication filter

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-2988.
--
Resolution: Fixed

> Allow query selector as changes and replication filter
> --
>
> Key: COUCHDB-2988
> URL: https://issues.apache.org/jira/browse/COUCHDB-2988
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Database Core, Mango
>Reporter: Nick Vatamaniuc
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2979) Replicator manager attempts to checkpoint too frequently

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2979.


> Replicator manager attempts to checkpoint too frequently
> 
>
> Key: COUCHDB-2979
> URL: https://issues.apache.org/jira/browse/COUCHDB-2979
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Current checkpoint interval is set to 5 seconds. That works well for a few 
> replications but when there are thousands of them it ends up being an attempt 
> every few milliseconds or so.
> Moreover, to decide on ownership (in order to keep one replication running per 
> cluster) each replication during an attempted checkpoint uses a gen_server 
> call to the replicator manager. Those are usually fast (I bench-marked them at 
> 100-200 usec), however if the replicator manager is busy (say stuck fetching large 
> filter documents when computing replication ids), none of the replications 
> would be able to checkpoint and make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2988) Allow query selector as changes and replication filter

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2988.


> Allow query selector as changes and replication filter
> --
>
> Key: COUCHDB-2988
> URL: https://issues.apache.org/jira/browse/COUCHDB-2988
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Database Core, Mango
>Reporter: Nick Vatamaniuc
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3006) Source failure in one source to many target replications causes a stampede

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-3006.


> Source failure in one source to many target replications causes a stampede
> --
>
> Key: COUCHDB-3006
> URL: https://issues.apache.org/jira/browse/COUCHDB-3006
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> For multiple replications from a single source to multiple targets, if the source 
> fails, all replications post an error state back to their replication document 
> and attempt to restart. This creates a stampede effect and causes sharp load 
> spikes on the replication cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-3039) Inconsistent behavior with _all_docs handling of null keys between CouchDB 1.x and 2.x

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-3039.
--
Resolution: Fixed

> Inconsistent behavior with _all_docs handling of null keys between 
> CouchDB 1.x and 2.x
> ---
>
> Key: COUCHDB-3039
> URL: https://issues.apache.org/jira/browse/COUCHDB-3039
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> CouchDB 1.x, in a POST request to _all_docs where a key is null, will return an 
> error row:
> {code}
>  {
>"total_rows": 14970916,
>"offset": 0,
>"rows": [
>   {
>  "key": null,
>  "error": "not_found"
>   },
>   ... other valid rows ...
> ]
> }
> {code}
> CouchDB 2.0 will return a 400 error
> {code}
> HTTP/1.1 400 Bad Request
> {
> "error": "illegal_docid",
> "reason": null
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-3082) Replicator manager crashes in terminate/2 if initial change feed spawned for _replicate hasn't finished

2016-08-02 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-3082.


> Replicator manager crashes in terminate/2 if initial change feed spawned for 
> _replicate hasn't finished
> ---
>
> Key: COUCHDB-3082
> URL: https://issues.apache.org/jira/browse/COUCHDB-3082
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> During init we spawn a change feed for the _replicator db and assign 
> rep_start_pids = [Pid]. However the shape of rep_start_pids should be {Tag, 
> Pid}. In terminate/2 we clean up by doing:
> {code}
> lists:foreach(
> fun({_Tag, Pid}) ->
> ...
> [{scanner, ScanPid} | StartPids]),
> {code}
>  
> Which ends up crashing with a function_clause error because the foreach fun 
> expects a tuple of 2 items.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports

2016-08-02 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405193#comment-15405193
 ] 

Nick Vatamaniuc commented on COUCHDB-2980:
--

[~chrisfosterelli] Interesting points.

Thinking more about this, it seems it is hard for a node in a cluster to 
know the host of the cluster in general. Say a cluster is behind a proxy for 
fault tolerance; after the document is added to a replicator db, I can't see how 
it would know what the external cluster host would be, i.e. whether database {{a}} means 
"https://user:p...@mycluster.com/a" or 
"http://user:p...@user.somecluster.net/a" for example. In case of {

> Replicator DB on 15984 replicates to backdoor ports
> ---
>
> Key: COUCHDB-2980
> URL: https://issues.apache.org/jira/browse/COUCHDB-2980
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Robert Kowalski
>Priority: Blocker
>
> If you POST a doc into the replicator database a replication is kicked off 
> and finishes successfully (usual 5984 port which maps to 15984 via haproxy).
> The problem is that the DB is replicated to the backdoor ports (15986) and is 
> not visible on the other ports.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3082) Replicator manager crashes in terminate/2 if initial change feed spawned for _replicate hasn't finished

2016-07-26 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3082:


 Summary: Replicator manager crashes in terminate/2 if initial 
change feed spawned for _replicate hasn't finished
 Key: COUCHDB-3082
 URL: https://issues.apache.org/jira/browse/COUCHDB-3082
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc




During init we spawn a change feed for the _replicator db and assign 
rep_start_pids = [Pid]. However the shape of rep_start_pids should be {Tag, 
Pid}. In terminate/2 we clean up by doing:

{code}
lists:foreach(
fun({_Tag, Pid}) ->
...
[{scanner, ScanPid} | StartPids]),
{code}

 

Which ends up crashing with a function_clause error because the foreach fun 
expects a tuple of 2 items.
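A sketch of the shape fix (names here are illustrative, not the real couch_replicator_manager state): store tagged pids so terminate/2's foreach clause matches.

{code}
-module(rep_start_pids_sketch).
-export([init_pids/1, terminate_pids/1]).

%% Sketch only: keep rep_start_pids as a list of {Tag, Pid} tuples from the
%% start, so the cleanup fun({_Tag, Pid}) clause always matches.
init_pids(ChangesFeedPid) ->
    [{changes_feed, ChangesFeedPid}].

terminate_pids(StartPids) ->
    lists:foreach(
        fun({_Tag, Pid}) ->
            exit(Pid, shutdown)  %% stand-in for the real cleanup
        end, StartPids).
{code}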




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3046) Improve reduce function overflow protection

2016-06-22 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3046:


 Summary: Improve reduce function overflow protection 
 Key: COUCHDB-3046
 URL: https://issues.apache.org/jira/browse/COUCHDB-3046
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Nick Vatamaniuc


The protection algorithm:

https://github.com/apache/couchdb/blob/master/share/server/views.js#L36-L41

When enabled, it looks at couchjs' reduce command input and output line lengths 
(as stringified JSON). If 2*len(output) > len(input) and len(output) > 200, 
an error is triggered.

There a few issues in that scheme:

 * Input line contains the length of the reduce function code itself. A large 
reduce function body (say 100KB) might lead to failure to trip the error.

 * On the other hand, output size checking threshold is too small = 200. It 
prevents functions using single large accumulator object (say with fields like 
.sum, .count, .stddev, and so on) from working. The size of output will be > 
200 but, even though it won't be growing it will still be prevented from 
running.
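For illustration only (this is not the actual share/server/views.js code), the 
heuristic described above boils down to a predicate like:

{code}
%% Illustration of the described heuristic, not the real implementation:
%% trip the overflow error when the stringified reduce output is both over
%% 200 bytes and more than half the size of the input line.
reduce_output_too_large(InputLen, OutputLen) ->
    (OutputLen > 200) andalso (2 * OutputLen > InputLen).
{code}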



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3039) Inconsistent behavior with _all_docs handling of null keys between CouchDB 1.x and 2.x

2016-06-17 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3039:


 Summary: Inconsistent behavior with _all_docs handling of 
null keys between CouchDB 1.x and 2.x
 Key: COUCHDB-3039
 URL: https://issues.apache.org/jira/browse/COUCHDB-3039
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


CouchDB 1.x, in a POST request to _all_docs where a key is null, will return an 
error row:

{code}
{
    "total_rows": 14970916,
    "offset": 0,
    "rows": [
        {
            "key": null,
            "error": "not_found"
        },
        ... other valid rows ...
    ]
}
{code}

CouchDB 2.0 will return a 400 error

{code}
HTTP/1.1 400 Bad Request
{
"error": "illegal_docid",
"reason": null
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (COUCHDB-2965) Race condition in replicator rescan logic

2016-05-09 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc closed COUCHDB-2965.


> Race condition in replicator rescan logic
> -
>
> Key: COUCHDB-2965
> URL: https://issues.apache.org/jira/browse/COUCHDB-2965
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Nick Vatamaniuc
>
> There is a race condition between the full rescan and regular change feed 
> processing in the couch_replicator_manager code.
> This race condition can leave replication docs in an untriggered state when a 
> rescan of all the docs is performed. The rescan might happen when nodes 
> connect and disconnect. The likelihood of hitting this race condition goes up 
> if a lot of documents are updated and there is a back-up of messages in the 
> replicator manager's mailbox.
> The race condition happens in the following way:
> * A full rescan is initiated here: 
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L424
>  It clears the db_to_seq ets table, which holds the latest change sequence 
> for each replicator database, and then launches a scan_all_dbs process.
>  * scan_all_dbs finds all replicator-looking databases and for each one sends 
> a \{resume_scan, DbName\} message to the main couch_replicator_manager 
> process.
>  * The \{resume_scan, DbName\} message is handled here:  
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L233
>  The expectation is that, because db_to_seq was reset, no sequence checkpoint 
> is found in db_to_seq, so the feed starts from 0 and a new change feed is 
> spawned, which rescans all documents (since we need to determine ownership 
> for them).  
> But the race occurs because when change feeds stop, they call the replicator 
> manager with a \{rep_db_checkpoint, DbName\} message. That updates the 
> db_to_seq ets table with the latest change sequence: 
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L225
>  Which means this sequence of operations can happen:
>  * db_to_seq is reset to 0, scan_all_dbs is spawned
>  * a change feed stops at sequence 1042 and calls \{rep_db_checkpoint, 
> <<"_replicator">>\}
>  * the \{rep_db_checkpoint, <<"_replicator">>\} call is handled, so the latest 
> db_to_seq for _replicator is now 1042
>  * \{resume_scan, <<"_replicator">>\} is sent from the scan_all_dbs process 
> and received by the replicator manager. It sees that db_to_seq has 
> _replicator at sequence 1042, so it starts from that instead of 0, skipping 
> updates from 0 to 1042.
> This was seen by running an experiment in which 1000 replication documents 
> were being updated. Around document 700 or so, node1 was killed (pkill -f 
> node1). node2 hit the race condition on rescan and never picked up a bunch of 
> documents that should have belonged to it.
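Paraphrasing the resume_scan handling described above as code (hypothetical 
function and table names, not the real couch_replicator_manager module):

{code}
%% The change feed restarts from whatever sequence is in db_to_seq, so a stale
%% rep_db_checkpoint written between the rescan reset and this message makes
%% the feed silently skip the 0..Seq range instead of rescanning from 0.
handle_resume_scan(DbName, DbToSeq) ->
    Since = case ets:lookup(DbToSeq, DbName) of
        [] -> 0;                   %% expected path right after a rescan reset
        [{DbName, Seq}] -> Seq     %% race: a late checkpoint already wrote Seq
    end,
    spawn_changes_feed(DbName, Since).   %% hypothetical helper
{code}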



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-2965) Race condition in replicator rescan logic

2016-05-09 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-2965.
--
Resolution: Fixed

> Race condition in replicator rescan logic
> -
>
> Key: COUCHDB-2965
> URL: https://issues.apache.org/jira/browse/COUCHDB-2965
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Nick Vatamaniuc
>
> There is a race condition between the full rescan and regular change feed 
> processing in the couch_replicator_manager code.
> This race condition can leave replication docs in an untriggered state when a 
> rescan of all the docs is performed. The rescan might happen when nodes 
> connect and disconnect. The likelihood of hitting this race condition goes up 
> if a lot of documents are updated and there is a back-up of messages in the 
> replicator manager's mailbox.
> The race condition happens in the following way:
> * A full rescan is initiated here: 
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L424
>  It clears the db_to_seq ets table, which holds the latest change sequence 
> for each replicator database, and then launches a scan_all_dbs process.
>  * scan_all_dbs finds all replicator-looking databases and for each one sends 
> a \{resume_scan, DbName\} message to the main couch_replicator_manager 
> process.
>  * The \{resume_scan, DbName\} message is handled here:  
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L233
>  The expectation is that, because db_to_seq was reset, no sequence checkpoint 
> is found in db_to_seq, so the feed starts from 0 and a new change feed is 
> spawned, which rescans all documents (since we need to determine ownership 
> for them).  
> But the race occurs because when change feeds stop, they call the replicator 
> manager with a \{rep_db_checkpoint, DbName\} message. That updates the 
> db_to_seq ets table with the latest change sequence: 
> https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_manager.erl#L225
>  Which means this sequence of operations can happen:
>  * db_to_seq is reset to 0, scan_all_dbs is spawned
>  * a change feed stops at sequence 1042 and calls \{rep_db_checkpoint, 
> <<"_replicator">>\}
>  * the \{rep_db_checkpoint, <<"_replicator">>\} call is handled, so the latest 
> db_to_seq for _replicator is now 1042
>  * \{resume_scan, <<"_replicator">>\} is sent from the scan_all_dbs process 
> and received by the replicator manager. It sees that db_to_seq has 
> _replicator at sequence 1042, so it starts from that instead of 0, skipping 
> updates from 0 to 1042.
> This was seen by running an experiment in which 1000 replication documents 
> were being updated. Around document 700 or so, node1 was killed (pkill -f 
> node1). node2 hit the race condition on rescan and never picked up a bunch of 
> documents that should have belonged to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-3006) Source failure in one source to many target replications causes a stampede

2016-04-27 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3006:


 Summary: Source failure in one source to many target replications 
causes a stampede
 Key: COUCHDB-3006
 URL: https://issues.apache.org/jira/browse/COUCHDB-3006
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


For multiple replications from a single source to multiple targets: if the 
source fails, all of the replications post an error state back to their 
replication documents and attempt to restart. This creates a stampede effect 
and causes sharp load spikes on the replication cluster.
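The ticket does not prescribe a fix; purely as an illustration of one common 
generic mitigation (not this project's actual approach), restart delays can be 
splayed with random jitter so the jobs do not all come back at once:

{code}
%% Generic sketch, not from this ticket: add random jitter to each job's
%% restart delay so replications sharing a failed source restart at
%% different times instead of stampeding.
restart_delay_msec(BaseMsec, MaxJitterMsec) ->
    BaseMsec + rand:uniform(MaxJitterMsec).
{code}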





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2631) Ensure that system database callbacks are added correctly for shared case

2016-04-19 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248545#comment-15248545
 ] 

Nick Vatamaniuc commented on COUCHDB-2631:
--

{code}
IsReplicatorDb = DbName == config:get("replicator", "db", "_replicator") 
{code}

This doesn't apply to 2.x anymore; the local replicator db is always 
"_replicator". Also, couch_db:normalize_dbname only behaves as expected with 
binaries:

{code}
(node1@127.0.0.1)4> 
couch_db:normalize_dbname("shards/-1fff/_users.1460972107").
"shards/-1fff/_users.1460972107"
(node1@127.0.0.1)5> 
couch_db:normalize_dbname("shards/-1fff/_users").
"shards/-1fff/_users"
(node1@127.0.0.1)6> 
couch_db:normalize_dbname(<<"shards/-1fff/_users">>).
<<"_users">>
(node1@127.0.0.1)7> 
couch_db:normalize_dbname(<<"shards/-1fff/_users.134565677">>).
<<"_users">>
{code}

[~eiri] pointed to this PR that should handle this issue 

https://github.com/apache/couchdb-couch/pull/160
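
For illustration only (an assumed helper, not the linked PR's actual diff), the 
kind of normalization needed before the system-db comparisons is roughly:

{code}
%% Normalize a clustered shard path such as
%% <<"shards/-1fff/_users.1424979962">> down to the bare db name.
strip_shard_suffix(DbName) when is_binary(DbName) ->
    Base = filename:basename(DbName),       %% <<"_users.1424979962">>
    hd(binary:split(Base, <<".">>)).        %% <<"_users">>
{code}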

> Ensure that system database callbacks are added correctly for shared case
> -
>
> Key: COUCHDB-2631
> URL: https://issues.apache.org/jira/browse/COUCHDB-2631
> Project: CouchDB
>  Issue Type: Bug
>  Components: BigCouch
>Reporter: Alexander Shorin
>Priority: Blocker
>  Labels: needs-pr
> Fix For: 2.0.0
>
>
> We have the following code in 
> [couch_server|https://github.com/apache/couchdb-couch/blob/master/src/couch_server.erl#L119-L143]
> {code}
> maybe_add_sys_db_callbacks(DbName, Options) when is_binary(DbName) ->
>     maybe_add_sys_db_callbacks(?b2l(DbName), Options);
> maybe_add_sys_db_callbacks(DbName, Options) ->
>     DbsDbName = config:get("mem3", "shard_db", "dbs"),
>     NodesDbName = config:get("mem3", "node_db", "nodes"),
>     IsReplicatorDb = DbName == config:get("replicator", "db", "_replicator") orelse
>         path_ends_with(DbName, <<"_replicator">>),
>     IsUsersDb = DbName == config:get("couch_httpd_auth", "authentication_db", "_users") orelse
>         path_ends_with(DbName, <<"_users">>),
>     if
>         DbName == DbsDbName ->
>             [sys_db | Options];
>         DbName == NodesDbName ->
>             [sys_db | Options];
>         IsReplicatorDb ->
>             [{before_doc_update, fun couch_replicator_manager:before_doc_update/2},
>              {after_doc_read, fun couch_replicator_manager:after_doc_read/2},
>              sys_db | Options];
>         IsUsersDb ->
>             [{before_doc_update, fun couch_users_db:before_doc_update/2},
>              {after_doc_read, fun couch_users_db:after_doc_read/2},
>              sys_db | Options];
>         true ->
>             Options
>     end.
> {code}
> Which works perfectly, except when the system database is clustered. For 
> shared _users and _replicator the check condition will not work, since shard 
> databases end with a timestamp and the full name looks like 
> "shards/-1fff/_users.1424979962"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2834) Server sends connection: close too early

2016-04-18 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247182#comment-15247182
 ] 

Nick Vatamaniuc commented on COUCHDB-2834:
--

Just noticed email from JIRA. Will have PR ready by tomorrow.

> Server sends connection: close too early
> 
>
> Key: COUCHDB-2834
> URL: https://issues.apache.org/jira/browse/COUCHDB-2834
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>Priority: Blocker
>  Labels: has-pr
> Fix For: 2.0.0
>
>
> This is related COUCHDB-2833.
> This was found investigating the failure of replication tests. Specifically 
> couch_replicator_large_atts_tests, the {local, remote} sub-case.
> The test sets up push replications from local to remote.
> Replication workers have more than one document larger than 
> MAX_BULK_ATT_SIZE=64K. They start pushing them to the target using a 
> keep-alive connection (the default for HTTP 1.1). The first few pipelined 
> requests go through using the same connection, then the server accepts the 
> first PUT to …/docid?edits=false, returns Connection: close, and closes the 
> connection after the 201 Created result.
> The server should not close the connection that early; it should keep it open 
> longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-2988) Allow query selector as changes and replication filter

2016-04-13 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-2988:


 Summary: Allow query selector as changes and replication filter
 Key: COUCHDB-2988
 URL: https://issues.apache.org/jira/browse/COUCHDB-2988
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core, Mango
Reporter: Nick Vatamaniuc






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-2987) Mango Python tests failure

2016-04-13 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-2987:


 Summary: Mango Python tests failure
 Key: COUCHDB-2987
 URL: https://issues.apache.org/jira/browse/COUCHDB-2987
 Project: CouchDB
  Issue Type: Bug
  Components: Mango
Reporter: Nick Vatamaniuc


Saw this test failure running Mango's test suite:

{code}
$ nosetests
S...SF..SSS.S.S...SSS
==
FAIL: test_empty_subsel_match (02-basic-find-test.BasicFindTests)
--
Traceback (most recent call last):
  File "/Users/nvatama/asf/couchdb/src/mango/test/02-basic-find-test.py", line 
256, in test_empty_subsel_match
assert len(docs) == 1
AssertionError:
 >> begin captured logging << 
requests.packages.urllib3.connectionpool: DEBUG: "POST 
/mango_test_b7fb2baf897741a288e8174971ef388c/_bulk_docs HTTP/1.1" 201 97
requests.packages.urllib3.connectionpool: DEBUG: "POST 
/mango_test_b7fb2baf897741a288e8174971ef388c/_find HTTP/1.1" 200 None
- >> end captured logging << -

--
Ran 137 tests in 51.613s

FAILED (SKIP=90, failures=1)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports

2016-04-05 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc updated COUCHDB-2980:
-
Comment: was deleted

(was: We should probably disallow "local" replications from being accepted in 
the source and target of a replication doc. Those end up as "local" databases 
(like _users, _nodes, _dbs) and don't do what is expected.

To make things more interesting, for the _replicate http endpoint we do some 
hacks to turn a local db into a full url:

https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd.erl#L389

But that runs inside the context of an http request, so it is easy to access 
authorization headers and such.

)

> Replicator DB on 15984 replicates to backdoor ports
> ---
>
> Key: COUCHDB-2980
> URL: https://issues.apache.org/jira/browse/COUCHDB-2980
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Robert Kowalski
>
> If you POST a doc into the replicator database a replication is kicked off 
> and finishes successfully (usual 5984 port which maps to 15984 via haproxy).
> The problem is that the DB is replicated to the backdoor ports (15986) and is 
> not visible on the other ports.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2980) Replicator DB on 15984 replicates to backdoor ports

2016-04-05 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227328#comment-15227328
 ] 

Nick Vatamaniuc commented on COUCHDB-2980:
--

We should probably disallow "local" replications from being accepted in the 
source and target of a replication doc. Those end up as "local" databases (like 
_users, _nodes, _dbs) and don't do what is expected.

To make things more interesting, for the _replicate http endpoint we do some 
hacks to turn a local db into a full url:

https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd.erl#L389

But that runs inside the context of an http request, so it is easy to access 
authorization headers and such.



> Replicator DB on 15984 replicates to backdoor ports
> ---
>
> Key: COUCHDB-2980
> URL: https://issues.apache.org/jira/browse/COUCHDB-2980
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Reporter: Robert Kowalski
>
> If you POST a doc into the replicator database a replication is kicked off 
> and finishes successfully (usual 5984 port which maps to 15984 via haproxy).
> The problem is that the DB is replicated to the backdoor ports (15986) and is 
> not visible on the other ports.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (COUCHDB-2979) Replicator manager attempts to checkpoint too frequently

2016-03-31 Thread Nick Vatamaniuc (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Vatamaniuc resolved COUCHDB-2979.
--
Resolution: Fixed

> Replicator manager attempts to checkpoint too frequently
> 
>
> Key: COUCHDB-2979
> URL: https://issues.apache.org/jira/browse/COUCHDB-2979
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> Current checkpoint interval is set to 5 seconds. That works well for a few 
> replications, but when there are thousands of them it ends up being an 
> attempt every few milliseconds or so.
> Moreover, to decide on ownership (in order to keep one replication running 
> per cluster), each replication during an attempted checkpoint makes a 
> gen_server call to the replicator manager. Those calls are usually fast 
> (bench-marked at 100-200 usec), however if the replicator manager is busy 
> (say stuck fetching large filter documents while computing replication ids), 
> none of the replications would be able to checkpoint and make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (COUCHDB-2979) Replicator manager attempts to checkpoint too frequently

2016-03-31 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-2979:


 Summary: Replicator manager attempts to checkpoint too frequently
 Key: COUCHDB-2979
 URL: https://issues.apache.org/jira/browse/COUCHDB-2979
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


Current checkpoint interval is set to 5 seconds. That works well for a few 
replications, but when there are thousands of them it ends up being an attempt 
every few milliseconds or so (for example, 2,000 replications checkpointing 
every 5 seconds average out to one attempt every 2.5 ms).

Moreover, to decide on ownership (in order to keep one replication running per 
cluster), each replication during an attempted checkpoint makes a gen_server 
call to the replicator manager. Those calls are usually fast (bench-marked at 
100-200 usec), however if the replicator manager is busy (say stuck fetching 
large filter documents while computing replication ids), none of the 
replications would be able to checkpoint and make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2975) Automatically restart replication jobs if they crash

2016-03-25 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211933#comment-15211933
 ] 

Nick Vatamaniuc commented on COUCHDB-2975:
--

Noticed that transient mode does not clean up child specs after a child is 
done, even if the exit is normal. The intent behind that is to let users 
restart children.

From the Erlang docs: {{If the child is temporary, the child specification is 
deleted as soon as the process terminates. This means that delete_child/2 has 
no meaning, and restart_child/2 can not be used for these children.}}

However in our code sometimes we explicitly delete child:

{code}
cancel_replication({BaseId, Extension}) ->
...
case supervisor:terminate_child(couch_replicator_job_sup, FullRepId) of
ok ->
...
case supervisor:delete_child(couch_replicator_job_sup, FullRepId) of
ok ->
{ok, {cancelled, ?l2b(FullRepId)}};
  ...
{code}

That would make it seem as if the supervisor auto-deleted the child spec in 
some cases. To test that it doesn't, start a normal (non-continuous) 
replication and then, after it is finished, inspect the state of 
{{couch_replicator_job_sup}}.

An example of the supervisor state after 10 replications have finished on a 
cluster:

{code}
{state,
{local,couch_replicator_job_sup},
one_for_one,
[{child,undefined,"ac35738f5003c02b6780116fdf04b524",
 {gen_server,start_link,
 [couch_replicator,
  {rep,
  {"ac35738f5003c02b6780116fdf04b524",[]},
  {httpdb,"http://adm:pass@localhost:5984/rdyno_src_0001/;,
  nil,
  [{"Accept","application/json"},
   {"User-Agent","CouchDB-Replicator/5fa9098"}],
  20,
  [{socket_options,[{keepalive,true},{nodelay,false}]}],
  1,250,nil,1},
  {httpdb,"http://adm:pass@localhost:5984/rdyno_tgt_0009/;,
  nil,
  [{"Accept","application/json"},
   {"User-Agent","CouchDB-Replicator/5fa9098"}],
  20,
  [{socket_options,[{keepalive,true},{nodelay,false}]}],
  1,250,nil,1},
  [{checkpoint_interval,5000},
   {connection_timeout,20},
   {continuous,false},
   {http_connections,1},
   {retries,1},
   {socket_options,[{keepalive,true},{nodelay,false}]},
   {use_checkpoints,true},
   {worker_batch_size,500},
   {worker_processes,1}],
  {user_ctx,null,[],undefined},
  db,nil,
  <<"rdyno_0001"...(15 B)>>,
  <<"shards/a00"...(47 B)>>},
  [{timeout,20}]]},
 transient,250,worker,
 [couch_replicator]},
 {child,undefined,"6c48c1ab7a6e3ed5e3d4415ced912e4a",
 {gen_server,start_link,
 [couch_replicator,
  {rep,
  {"6c48c1ab7a6e3ed5e3d4415ced912e4a",[]},
  {httpdb,"http://adm:pass@localhost:5984/rdyno_src_0001/;,
  nil,
  [{"Accept","application/json"},
   {"User-Agent","CouchDB-Replicator/5fa9098"}],
  20,
  [{socket_options,[{keepalive,true},{nodelay,false}]}],
  1,250,nil,1},
  {httpdb,"http://adm:pass@localhost:5984/rdyno_tgt_0002/;,
  nil,
  [{"Accept","application/json"},
   {"User-Agent","CouchDB-Replicator/5fa9098"}],
  20,
  [{socket_options,[{keepalive,true},{nodelay,false}]}],
  1,250,nil,1},
  [{checkpoint_interval,5000},
   {connection_timeout,20},
   {continuous,false},
   {http_connections,1},
   {retries,1},
   {socket_options,[{keepalive,true},{nodelay,false}]},
   {use_checkpoints,true},
   {worker_batch_size,500},
   {worker_processes,1}],
  {user_ctx,null,[],undefined},
  db,nil,
  <<"rdyno_0001"...(15 B)>>,
  <<"shards/200"...(47 B)>>},
  [{timeout,20}]]},
 transient,250,worker,
 [couch_replicator]}],
undefined,100,1,[],couch_replicator_job_sup,[]}
{code}
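
A dump like the one above can be obtained from a remsh, for example (hedged, 
not necessarily how this particular dump was produced):

{code}
%% Inspect the retained child specs of the replicator job supervisor.
sys:get_state(couch_replicator_job_sup).
supervisor:which_children(couch_replicator_job_sup).
{code}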



> Automatically restart replication jobs if they crash
> 
>
> Key: COUCHDB-2975
> URL: https://issues.apache.org/jira/browse/COUCHDB-2975
> Project: CouchDB
>  Issue Type: Improvement
>  

[jira] [Commented] (COUCHDB-2975) Automatically restart replication jobs if they crash

2016-03-24 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211101#comment-15211101
 ] 

Nick Vatamaniuc commented on COUCHDB-2975:
--

We might have to increase the restart intensity threshold. One common use case 
that will trigger it is one-source-to-many-targets replications: the source 
fails, so all the replications fail as well. Tested it with 1 source and 200 
targets, then killed the source and noticed the supervisors were restarted:

{code}
(node1@127.0.0.1)4> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]).
{[<0.352.0>,<26873.355.0>,<26910.354.0>],[]} % before deleting source
(node1@127.0.0.1)5> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]).
{[<0.5617.4>,<26873.7071.3>,<26910.8924.3>],[]} % after deleting source
{code}

Saw we already have some protection against repeated failed replication 
restarts via the "max_replication_retry_count" parameter. By default it is 10, 
so 10 failed starts for a particular replication will cancel that replication. 
Once it successfully starts, the failed-retries counter gets reset back to the 
maximum (10).

Another thing: noticed replications will restart even without {{transient}} 
supervisors if they are killed with an exit reason other than 'kill' (a brutal 
kill). So if the goal is just to restart them, sending them exit(Pid, meh) 
should suffice.

> Automatically restart replication jobs if they crash
> 
>
> Key: COUCHDB-2975
> URL: https://issues.apache.org/jira/browse/COUCHDB-2975
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Robert Newson
>
> We currently use the temporary restart strategy for replication jobs, which 
> means if they crash they are not restarted.
> Instead, let's use the transient restart strategy, ensuring they are 
> restarted on abnormal termination, while still allowing these tasks to end 
> successfully on completion or cancellation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-2971) Provide cardinality estimate (COUNT DISTINCT) as builtin reducer

2016-03-22 Thread Nick Vatamaniuc (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205902#comment-15205902
 ] 

Nick Vatamaniuc commented on COUCHDB-2971:
--

Ah, good point on having a nicer way to specify precision. Yeah otherwise it 
looks kind of hackish. 

Noticed they provide various backends for the registers. One is a C NIF. Tried 
to compile and run their code on Erlang 18 and had to fiddle with it a bit, but 
got it to work and got these results:

https://gist.github.com/nickva/bf19a2b7b537f5051a99

There are some tradeoffs between memory usage, cardinality, and union times. 
The C array backend is interesting in that it has the cheapest union operation 
(under 1 ms), but its cardinality estimation takes more than a few 
milliseconds, which might not play well with the Erlang schedulers. If that 
only happens during the finalize stage it could be handled in another way (some 
thread + queue mechanism). Unfortunately, it also has a large, constant memory 
usage even for low cardinalities.
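
As a rough usage sketch of the library (function names assumed from the 
GameAnalytics/hyper README; keys assumed to be binaries), per-shard filters 
could be built independently and unioned at query time:

{code}
%% Build one HyperLogLog filter per shard and union them for a cluster-wide
%% distinct-count estimate. Precision 14 is the README's example value.
estimate_distinct(KeysPerShard) ->
    Filters = [lists:foldl(fun hyper:insert/2, hyper:new(14), Keys)
               || Keys <- KeysPerShard],
    round(hyper:card(hyper:union(Filters))).
{code}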

> Provide cardinality estimate (COUNT DISTINCT) as builtin reducer
> 
>
> Key: COUCHDB-2971
> URL: https://issues.apache.org/jira/browse/COUCHDB-2971
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: Adam Kocoloski
>
> We’ve seen a number of applications now where a user needs to count the 
> number of unique keys in a view. Currently the recommended approach is to add 
> a trivial reduce function and then count the number of rows in a _list 
> function or client-side application code, but of course that doesn’t scale 
> nicely.
> It seems that in a majority of these cases all that’s required is an 
> approximation of the number of distinct entries, which brings us into the 
> space of hash sets, linear probabilistic counters, and the ever-popular 
> “HyperLogLog” algorithm. Taking HLL specifically, this seems like quite a 
> nice candidate for a builtin reduce. The size of the data structure is 
> independent of the number of input elements and individual HLL filters can be 
> unioned together. There’s already what seems to be a good MIT-licensed 
> implementation on GitHub:
> https://github.com/GameAnalytics/hyper
> One caveat is that this reducer would not work for group_level reductions; 
> it’d only give the correct result for the exact key. I don’t think that 
> should preclude us from evaluating it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

