[GitHub] [couchdb-documentation] ksnavely commented on a change in pull request #581: [RFC] Replicator Implementation for CouchDB 4.x

2020-08-19 Thread GitBox


ksnavely commented on a change in pull request #581:
URL: 
https://github.com/apache/couchdb-documentation/pull/581#discussion_r47436



##
File path: rfcs/016-fdb-replicator.md
##
@@ -0,0 +1,384 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Replicator Implementation On FDB'
+labels: rfc, discussion
+assignees: 'vatam...@apache.org'
+
+---
+
+# Introduction
+
+This document describes the design of the replicator application for CouchDB
+4.x. The replicator will rely on `couch_jobs` for centralized scheduling and
+monitoring of replication jobs.
+
+## Abstract
+
+Replication jobs can be created from documents in `_replicator` databases, or
+by `POST`-ing requests to the HTTP `/_replicate` endpoint. Previously, in
+CouchDB <= 3.x, replication jobs were mapped to individual cluster nodes and a
+scheduler component would run up to `max_jobs` number of jobs at a time on each
+node. The new design proposes using `couch_jobs`, as described in the
+[Background Jobs
+RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md),
+to have a central, FDB-based queue of replication jobs. The `couch_jobs`
+application will manage job scheduling and coordination. The new design also
+proposes using heterogeneous node types as defined in the [Node Types
+RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+such that replication jobs will be created only on `api_frontend` nodes and run
+only on `replication` nodes.
+
+## Requirements Language
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC
+2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`_replicator` databases : A database that is either named `_replicator` or ends
+with the `/_replicator` suffix.
+
+`transient` replications : Replication jobs created by `POST`-ing to the
+`/_replicate` endpoint.
+
+`persistent` replications : Replication jobs defined in a document in a
+`_replicator` database.
+
+`continuous` replications : Replication jobs created with the `"continuous":
+true` parameter. These jobs will try to run continuously until the user removes
+them. They may be temporarily paused to allow other jobs to make progress.
+
+`one-shot` replications : Replication jobs which are not `continuous`. If the
+`"continuous":true` parameter is not specified, by default, replication jobs
+will be `one-shot`. These jobs will try to run until they reach the end of the
+changes feed, then stop.
+
+`api_frontend node` : Database node which has the `api_frontend` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be created on these nodes.
+
+`replication node` : Database node which has the `replication` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be run on these nodes.
+
+`filtered` replications : Replications with a user-defined filter on the source
+endpoint to filter its changes feed.
+
+`replication_id` : An ID defined by replication jobs, which is a hash of
+replication parameters that affect the result of the replication. These may
+include source and target endpoint URLs, as well as a filter function specified
+in a design document on the source endpoint.
+
+`job_id` : A replication job ID derived from the database and document IDs for
+persistent replications, and from the source and target endpoints, user name,
+and some options for transient replications. Computing a `job_id`, unlike a
+`replication_id`, doesn't require making any network requests. During its
+lifetime, a filtered replication with a given `job_id` may change its
+`replication_id` multiple times as the filter contents change on the source.
+
+`max_jobs` : Configuration parameter which specifies the maximum number of
+replication jobs to run on each `replication` node.
+
+`max_churn` : Configuration parameter which limits how many new jobs are
+spawned during each rescheduling interval.
+
+`min_backoff_penalty` : Configuration parameter specifying the minimum (the
+base) penalty applied to jobs which crash repeatedly.
+
+`max_backoff_penalty` : Configuration parameter specifying the maximum penalty
+applied to jobs which crash repeatedly.
+
+---
+
+# Detailed Description
+
+Replication job creation and scheduling work roughly as follows:
+
+ 1) `Persistent` and `transient` jobs both start by creating or updating a
+ `couch_jobs` record in a separate replication key-space on `api_frontend`
+ nodes. Persistent jobs are driven by the `couch_epi` callback mechanism which
+ notifies the `couch_replicator` application when documents in `_replicator` DBs
+ are updated, or when `_replicator` DBs 

[GitHub] [couchdb-documentation] ksnavely commented on a change in pull request #581: [RFC] Replicator Implementation for CouchDB 4.x

2020-08-13 Thread GitBox


ksnavely commented on a change in pull request #581:
URL: 
https://github.com/apache/couchdb-documentation/pull/581#discussion_r470289084



##
File path: rfcs/016-fdb-replicator.md
##
@@ -0,0 +1,359 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Replicator Implementation On FDB'
+labels: rfc, discussion
+assignees: 'vatam...@apache.org'
+
+---
+
+# Introduction
+
+This document describes the design of the replicator application for CouchDB
+4.x. The replicator will rely on `couch_jobs` for centralized scheduling and
+monitoring of replication jobs.
+
+## Abstract
+
+CouchDB replicator is the CouchDB application which runs replication jobs.
+Replication jobs can be created from documents in `_replicator` databases, or
+by `POST`-ing requests to the HTTP `/_replicate` endpoint. Previously, in
+CouchDB <= 3.x, replication jobs were mapped to individual cluster nodes and a
+scheduler component would run up to `max_jobs` number of jobs at a time on each
+node. The new design proposes using `couch_jobs`, as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md),
+to have a central, FDB-based queue of replication jobs. The `couch_jobs`
+application would manage job scheduling and coordination. The new design also
+proposes using heterogeneous node types as defined in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+such that replication jobs will be created only on `api_frontend` nodes and run
+only on `replication` nodes.
+
+## Requirements Language
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC
+2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`_replicator` databases : A database that is either named `_replicator` or ends
+with the `/_replicator` suffix.
+
+`transient` replications : Replication jobs created by `POST`-ing to the
+`/_replicate` endpoint.
+
+`persistent` replications : Replication jobs created from a document in a
+`_replicator` database.
+
+`continuous` replications : Jobs created with the `"continuous": true`
+parameter. When such a job reaches the end of the changes feed, it will continue
+waiting for new changes in a loop until the user removes the job.
+
+`normal` replications : Replication jobs which are not `continuous`. If the
+`"continuous":true` parameter is not specified, by default, replication jobs
+will be `normal`.
+
+`api_frontend node` : Database node which has the `api_frontend` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be created on these nodes.
+
+`replication node` : Database node which has the `replication` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be run on these nodes.

Review comment:
   Great clarification, thank you.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [couchdb-documentation] ksnavely commented on a change in pull request #581: [RFC] Replicator Implementation for CouchDB 4.x

2020-08-13 Thread GitBox


ksnavely commented on a change in pull request #581:
URL: 
https://github.com/apache/couchdb-documentation/pull/581#discussion_r470289014



##
File path: rfcs/016-fdb-replicator.md
##
@@ -0,0 +1,359 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Replicator Implementation On FDB'
+labels: rfc, discussion
+assignees: 'vatam...@apache.org'
+
+---
+
+# Introduction
+
+This document describes the design of the replicator application for CouchDB
+4.x. The replicator will rely on `couch_jobs` for centralized scheduling and
+monitoring of replication jobs.
+
+## Abstract
+
+CouchDB replicator is the CouchDB application which runs replication jobs.
+Replication jobs can be created from documents in `_replicator` databases, or
+by `POST`-ing requests to the HTTP `/_replicate` endpoint. Previously, in
+CouchDB <= 3.x, replication jobs were mapped to individual cluster nodes and a
+scheduler component would run up to `max_jobs` number of jobs at a time on each
+node. The new design proposes using `couch_jobs`, as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md),
+to have a central, FDB-based queue of replication jobs. The `couch_jobs`
+application would manage job scheduling and coordination. The new design also
+proposes using heterogeneous node types as defined in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+such that replication jobs will be created only on `api_frontend` nodes and run
+only on `replication` nodes.
+
+## Requirements Language
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC
+2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`_replicator` databases : A database that is either named `_replicator` or ends
+with the `/_replicator` suffix.

Review comment:
   Ah yes, makes total sense.









[GitHub] [couchdb-documentation] ksnavely commented on a change in pull request #581: [RFC] Replicator Implementation for CouchDB 4.x

2020-08-13 Thread GitBox


ksnavely commented on a change in pull request #581:
URL: 
https://github.com/apache/couchdb-documentation/pull/581#discussion_r470130225



##
File path: rfcs/016-fdb-replicator.md
##
@@ -0,0 +1,359 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Replicator Implementation On FDB'
+labels: rfc, discussion
+assignees: 'vatam...@apache.org'
+
+---
+
+# Introduction
+
+This document describes the design of the replicator application for CouchDB
+4.x. The replicator will rely on `couch_jobs` for centralized scheduling and
+monitoring of replication jobs.
+
+## Abstract
+
+CouchDB replicator is the CouchDB application which runs replication jobs.
+Replication jobs can be created from documents in `_replicator` databases, or
+by `POST`-ing requests to the HTTP `/_replicate` endpoint. Previously, in
+CouchDB <= 3.x, replication jobs were mapped to individual cluster nodes and a
+scheduler component would run up to `max_jobs` number of jobs at a time on each
+node. The new design proposes using `couch_jobs`, as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md),
+to have a central, FDB-based queue of replication jobs. The `couch_jobs`
+application would manage job scheduling and coordination. The new design also
+proposes using heterogeneous node types as defined in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+such that replication jobs will be created only on `api_frontend` nodes and run
+only on `replication` nodes.
+
+## Requirements Language
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC
+2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`_replicator` databases : A database that is either named `_replicator` or ends
+with the `/_replicator` suffix.
+
+`transient` replications : Replication jobs created by `POST`-ing to the
+`/_replicate` endpoint.
+
+`persistent` replications : Replication jobs created from a document in a
+`_replicator` database.
+
+`continuous` replications : Jobs created with the `"continuous": true`
+parameter. When such a job reaches the end of the changes feed, it will continue
+waiting for new changes in a loop until the user removes the job.
+
+`normal` replications : Replication jobs which are not `continuous`. If the
+`"continuous":true` parameter is not specified, by default, replication jobs
+will be `normal`.
+
+`api_frontend node` : Database node which has the `api_frontend` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be created on these nodes.
+
+`replication node` : Database node which has the `replication` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be run on these nodes.

Review comment:
   It might be worth clarifying that no other activities take place other 
than replication.









[GitHub] [couchdb-documentation] ksnavely commented on a change in pull request #581: [RFC] Replicator Implementation for CouchDB 4.x

2020-08-13 Thread GitBox


ksnavely commented on a change in pull request #581:
URL: 
https://github.com/apache/couchdb-documentation/pull/581#discussion_r470130225



##
File path: rfcs/016-fdb-replicator.md
##
@@ -0,0 +1,359 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Replicator Implementation On FDB'
+labels: rfc, discussion
+assignees: 'vatam...@apache.org'
+
+---
+
+# Introduction
+
+This document describes the design of the replicator application for CouchDB
+4.x. The replicator will rely on `couch_jobs` for centralized scheduling and
+monitoring of replication jobs.
+
+## Abstract
+
+CouchDB replicator is the CouchDB application which runs replication jobs.
+Replication jobs can be created from documents in `_replicator` databases, or
+by `POST`-ing requests to the HTTP `/_replicate` endpoint. Previously, in
+CouchDB <= 3.x, replication jobs were mapped to individual cluster nodes and a
+scheduler component would run up to `max_jobs` number of jobs at a time on each
+node. The new design proposes using `couch_jobs`, as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md),
+to have a central, FDB-based queue of replication jobs. The `couch_jobs`
+application would manage job scheduling and coordination. The new design also
+proposes using heterogeneous node types as defined in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+such that replication jobs will be created only on `api_frontend` nodes and run
+only on `replication` nodes.
+
+## Requirements Language
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC
+2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`_replicator` databases : A database that is either named `_replicator` or ends
+with the `/_replicator` suffix.
+
+`transient` replications : Replication jobs created by `POST`-ing to the
+`/_replicate` endpoint.
+
+`persistent` replications : Replication jobs created from a document in a
+`_replicator` database.
+
+`continuous` replications : Jobs created with the `"continuous": true`
+parameter. When such a job reaches the end of the changes feed, it will continue
+waiting for new changes in a loop until the user removes the job.
+
+`normal` replications : Replication jobs which are not `continuous`. If the
+`"continuous":true` parameter is not specified, by default, replication jobs
+will be `normal`.
+
+`api_frontend node` : Database node which has the `api_frontend` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be created on these nodes.
+
+`replication node` : Database node which has the `replication` type set to
+`true` as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+Replication jobs can only be run on these nodes.

Review comment:
   It might be worth clarifying that no other activities take place other 
than replication on a `replication` node.
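
   To make the distinction concrete, here is a rough sketch of the gating the
   two quoted definitions describe. The `NodeTypes` class and both function
   names are hypothetical and only illustrate the rule as written; the actual
   configuration mechanism is whatever the Node Types RFC defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTypes:
    """Hypothetical view of a node's type flags (flag names from the RFC text)."""
    api_frontend: bool = False
    replication: bool = False


def can_create_replication_job(node: NodeTypes) -> bool:
    # Per the quoted definition, jobs are created only on `api_frontend` nodes.
    return node.api_frontend


def can_run_replication_job(node: NodeTypes) -> bool:
    # Per the quoted definition, jobs run only on `replication` nodes.
    return node.replication
```

   The quoted text does not say whether a node with `replication` set to
   `true` may also serve other roles, so the sketch treats the two flags
   independently.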









[GitHub] [couchdb-documentation] ksnavely commented on a change in pull request #581: [RFC] Replicator Implementation for CouchDB 4.x

2020-08-13 Thread GitBox


ksnavely commented on a change in pull request #581:
URL: 
https://github.com/apache/couchdb-documentation/pull/581#discussion_r470126606



##
File path: rfcs/016-fdb-replicator.md
##
@@ -0,0 +1,359 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Replicator Implementation On FDB'
+labels: rfc, discussion
+assignees: 'vatam...@apache.org'
+
+---
+
+# Introduction
+
+This document describes the design of the replicator application for CouchDB
+4.x. The replicator will rely on `couch_jobs` for centralized scheduling and
+monitoring of replication jobs.
+
+## Abstract
+
+CouchDB replicator is the CouchDB application which runs replication jobs.
+Replication jobs can be created from documents in `_replicator` databases, or
+by `POST`-ing requests to the HTTP `/_replicate` endpoint. Previously, in
+CouchDB <= 3.x, replication jobs were mapped to individual cluster nodes and a
+scheduler component would run up to `max_jobs` number of jobs at a time on each
+node. The new design proposes using `couch_jobs`, as described in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md),
+to have a central, FDB-based queue of replication jobs. The `couch_jobs`
+application would manage job scheduling and coordination. The new design also
+proposes using heterogeneous node types as defined in
+[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+such that replication jobs will be created only on `api_frontend` nodes and run
+only on `replication` nodes.
+
+## Requirements Language
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC
+2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`_replicator` databases : A database that is either named `_replicator` or ends
+with the `/_replicator` suffix.

Review comment:
   Just double checking, is the suffix identifier `_replicator` or 
`/_replicator`?
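
   For reference, the definition as quoted reads as the check below. This is
   only a sketch of the rule as the RFC text states it; `is_replicator_db` is
   a hypothetical name, not a function from the CouchDB code base.

```python
def is_replicator_db(db_name: str) -> bool:
    """Return True if db_name is a `_replicator` database per the quoted text.

    Hypothetical helper: the database is either named exactly `_replicator`
    or its name ends with the `/_replicator` suffix (leading slash included).
    """
    return db_name == "_replicator" or db_name.endswith("/_replicator")
```

   Under this reading, a database named `my_replicator` would not qualify,
   while `accounts/_replicator` would.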




