[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203280029
 
 

 ##########
 File path: src/cluster/sharding.rst
 ##########
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-========
-Sharding
-========
+================
+Shard Management
+================
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+<https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
 
 Review comment:
   "subsections of documents" sounds like sharding happens on a sub-document 
level. Is that true? (are attachments and documents sharded separately, for 
example?)  If so, is this clarified somewhere? Or should this read "subsets of 
documents"?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203291174
 
 

 ##########
 File path: src/cluster/sharding.rst
 ##########
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-========
-Sharding
-========
+================
+Shard Management
+================
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+<https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-    curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+    [cluster]
+    q=8
+    n=3
 
-    data/
-    +-- shards/
-    |   +-- 00000000-7fffffff/
-    |   |   |-- small.1425202577.couch
-    |   +-- 80000000-ffffffff/
-    |       |-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+    $ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-    curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~~~~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+<https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__ of
+database nodes have responded; 2, by default.
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+    $ curl "$COUCH_URL:5984/<doc>?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+    $ curl -X PUT "$COUCH_URL:5984/<doc>?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+<https://en.wikipedia.org/wiki/ACID>`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--------------
+
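
For illustration beyond what the hunk shows: the ``[cluster]`` defaults
quoted above can also be read and changed at runtime through the
configuration API. A minimal sketch, assuming a CouchDB 2.x node named
``couchdb@127.0.0.1`` (hypothetical) reachable at ``$COUCH_URL``, with
the usually required admin credentials omitted:

    # Read the current default shard count (q) for newly created databases.
    curl -s "$COUCH_URL:5984/_node/couchdb@127.0.0.1/_config/cluster/q"

    # Change the default; configuration values are JSON strings, hence '"4"'.
    curl -s -X PUT \
        "$COUCH_URL:5984/_node/couchdb@127.0.0.1/_config/cluster/q" \
        -d '"4"'

Only databases created after the change pick up the new default; an
existing database keeps the q it was created with.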

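Likewise, the ``?q=4&n=2`` creation example can be verified after the
fact: in CouchDB 2.x, ``GET /{db}`` returns a ``cluster`` object that
echoes the database's q and n along with its default read and write
quorums. A sketch, again assuming ``$COUCH_URL`` points at a cluster
node:

    # Create a database with 4 shards and 2 replicas of each shard.
    curl -s -X PUT "$COUCH_URL:5984/database-name?q=4&n=2"

    # The "cluster" object in the response shows q, n, r and w.
    curl -s "$COUCH_URL:5984/database-name"

    # Writes report whether the quorum was met: 201 Created when at least
    # w replicas confirmed the write, 202 Accepted when fewer than w did.
    curl -si -X PUT "$COUCH_URL:5984/database-name/doc-id?w=2" \
        -H "Content-Type: application/json" \
        -d '{"hello": "world"}' | head -n 1
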
[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203291005
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203287056
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203287516
 
 



[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203290566
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203285036
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203288977
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203285695
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203289160
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203283772
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203291906
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203290115
 
 


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203290263
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-
-Sharding
-
+
+Shard Management
+
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+`__ of
+database nodes have responded; 2, by default.
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--
+

[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203291369
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-
-Sharding
-
+
+Shard Management
+
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+`__ of
+database nodes have responded; 2, by default.
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--
+

[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203286763
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-
-Sharding
-
+
+Shard Management
+
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+`__ of
+database nodes have responded; 2, by default.
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--
+

[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203287254
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-
-Sharding
-
+
+Shard Management
+
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+`__ of
+database nodes have responded; 2, by default.
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--
+

[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203283458
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-
-Sharding
-
+
+Shard Management
+
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+`__ of
+database nodes have responded; 2, by default.
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--
+

[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203284621
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-
-Sharding
-
+
+Shard Management
+
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard
+`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+`__ of
+database nodes have responded; 2, by default.
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--
+

[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179913553
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
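+
+If you prefer not to edit the ini file by hand, the same defaults can
+be changed through the node configuration API (a sketch, assuming a
+CouchDB 2.x ``/_node/{node-name}/_config`` endpoint where ``_local``
+resolves to the current node):
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/_local/_config/cluster/n" -d '"2"'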
+
+Quorum
+------
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
 
 Review comment:
   I think this should say "to establish a quorum" ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179913647
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+------
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/{docId}?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded; however, even this does not guarantee `ACIDic
+consistency <https://en.wikipedia.org/wiki/ACID>`__. Setting
+``r`` or ``w`` to 1 means you will receive a response after only one
+relevant node has responded.
+
+Adding a node
+-------------
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
+endpoint:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
+
+This will add the node to the cluster. Existing shards will not be moved
+or re-balanced in response to the addition, but future operations will
+distribute shards to the new node.
+
+Now when you GET the ``/_membership`` endpoint, you will see the new
+node.
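+
+For example (a sketch; ``all_nodes`` lists the nodes this node is
+connected to, while ``cluster_nodes`` lists the configured cluster
+members):
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/_membership"
+{"all_nodes":["{name}@{address}", ...],"cluster_nodes":["{name}@{address}", ...]}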
+
+Removing a node
+---------------
+
+To remove a node from the cluster, you must first acquire the ``_rev``
+value for the document that signifies its existence:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/_node/{name}@{address}"
+{"_id":"{name}@{address}","_rev":"{rev}"}
+
+Using that 

[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179913724
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -97,27 +224,49 @@ called ``small`` there. Let us look in it. Yes, you can 
get it with curl too:
 }
 }
 
-* ``_id`` The name of the database.
-* ``_rev`` The current revision of the metadata.
-* ``shard_suffix`` The numbers after small and before .couch. This is seconds
-  after UNIX epoch when the database was created. Stored as ASCII characters.
-* ``changelog`` Self explaining. Mostly used for debugging.
-* ``by_node`` List of shards on each node.
-* ``by_range`` On which nodes each shard is.
+Here is a brief anatomy of that document:
+
+-  ``_id``: The name of the database.
+-  ``_rev``: The current revision of the metadata.
+-  ``shard_suffix``: A timestamp of the database's creation, marked as
+   seconds after the Unix epoch mapped to the codepoints for ASCII
+   numerals.
+-  ``changelog``: History of the database's shards.
+-  ``by_node``: List of shards on each node.
+-  ``by_range``: On which nodes each shard is.
+
+To reflect the shard move in the metadata, there are three steps:
+
+1. Add appropriate changelog entries.
+2. Update the ``by_node`` entries.
+3. Update the ``by_range`` entries.
+
+As of this writing, this process must be done manually. **WARNING: Be
+very careful! Mistakes during this process can irreparably corrupt the
+cluster!**
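+
+Before editing anything, fetch the current metadata document from the
+node-local port so that you are working from the latest revision (a
+sketch; ``{name}`` is the database whose shards you are moving):
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5986/_dbs/{name}"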
 
-Nothing here, nothing there, a shard in my sleeve
--------------------------------------------------
+To add a shard to a node, add entries like this to the database
+metadata's ``changelog`` attribute:
 
-Start node2 and add it to the cluster. Check in ``/_membership`` that the
-nodes are talking with each other.
+.. code:: json
 
-If you look in the directory ``data`` on node2, you will see that there is no
-directory called shards.
+[
+"add",
+"{range}",
+"{name}@{address}"
+]
 
-Use curl to change the ``_dbs/small`` node-local document for small, so it
-looks like this:
+*Note*: You can remove a node by specifying 'remove' instead of 'add'.
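+
+For example, a changelog entry recording a removal would look like
+this (same placeholders as above):
+
+.. code:: json
+
+[
+"remove",
+"{range}",
+"{name}@{address}"
+]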
 
-.. code-block:: javascript
+Once you have figured out the new changelog entries, you will need to
+update the ``by_node`` and ``by_range`` attributes to reflect who is
+storing what shards. The data in the changelog entries and these
+attributes must match. If they do not, the database may become corrupted.
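+
+One way to sanity-check that the two attributes agree is to expand
+both into ``[node, range]`` pairs and compare them (a sketch,
+assuming ``jq`` is available and the metadata has been saved to
+``meta.json``):
+
+.. code:: bash
+
+$ jq '[.by_node | to_entries[] | .key as $n | .value[] | [$n, .]] | sort' meta.json > by_node.pairs
+$ jq '[.by_range | to_entries[] | .key as $r | .value[] | [., $r]] | sort' meta.json > by_range.pairs
+$ diff by_node.pairs by_range.pairs && echo "consistent"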
+
+As an example, here is an updated version of the metadata above that
+adds shards to a second node called ``node2``:
+
+.. code:: json
 
 {
 "_id": "small",
 
 Review comment:
   Do you want to rename this `small` to `{name}` for consistency, as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179913827
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -97,27 +224,49 @@ called ``small`` there. Let us look in it. Yes, you can 
get it with curl too:
 }
 }
 
-* ``_id`` The name of the database.
-* ``_rev`` The current revision of the metadata.
-* ``shard_suffix`` The numbers after small and before .couch. This is seconds
-  after UNIX epoch when the database was created. Stored as ASCII characters.
-* ``changelog`` Self explaining. Mostly used for debugging.
-* ``by_node`` List of shards on each node.
-* ``by_range`` On which nodes each shard is.
+Here is a brief anatomy of that document:
+
+-  ``_id``: The name of the database.
+-  ``_rev``: The current revision of the metadata.
+-  ``shard_suffix``: A timestamp of the database's creation, marked as
+   seconds after the Unix epoch mapped to the codepoints for ASCII
+   numerals.
+-  ``changelog``: History of the database's shards.
+-  ``by_node``: List of shards on each node.
+-  ``by_range``: On which nodes each shard is.
+
+To reflect the shard move in the metadata, there are three steps:
+
+1. Add appropriate changelog entries.
+2. Update the ``by_node`` entries.
+3. Update the ``by_range`` entries.
+
+As of this writing, this process must be done manually. **WARNING: Be
+very careful! Mistakes during this process can irreparably corrupt the
+cluster!**
 
-Nothing here, nothing there, a shard in my sleeve
--------------------------------------------------
+To add a shard to a node, add entries like this to the database
+metadata's ``changelog`` attribute:
 
-Start node2 and add it to the cluster. Check in ``/_membership`` that the
-nodes are talking with each other.
+.. code:: json
 
-If you look in the directory ``data`` on node2, you will see that there is no
-directory called shards.
+[
+"add",
+"{range}",
+"{name}@{address}"
+]
 
-Use curl to change the ``_dbs/small`` node-local document for small, so it
-looks like this:
+*Note*: You can remove a node by specifying 'remove' instead of 'add'.
 
 Review comment:
   This comment doesn't seem especially relevant here.  Is there a better 
place we can put this, where it will be more discoverable by people trying to 
remove nodes?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179913572
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+------
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
 
 Review comment:
   again "a quorum".


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] flimzy commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
flimzy commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179913852
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -179,123 +328,114 @@ looks like this:
 }
 }
 
-After PUTting this document, it's like magic: the shards are now on node2 too!
-We now have ``n=2``!
+Now you can ``PUT`` this new metadata:
 
-If the shards are large, then you can copy them over manually and only have
-CouchDB sync the changes from the last minutes instead.
+.. code:: bash
 
-.. _cluster/sharding/move:
+$ curl -X PUT "$COUCH_URL:5986/_dbs/{name}" -d '{...}'
 
-Moving Shards
-=============
+Replicating from old to new
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Add, then delete
-----------------
+Because shards are just CouchDB databases, you can replicate them
+around. In order to make sure the new shard receives any updates the old
+one processed while you were updating its metadata, you should replicate
+the old shard to the new one:
 
-In the world of CouchDB there is no such thing as "moving" shards, only adding
-and removing shard replicas.
-You can add a new replica of a shard and then remove the old replica,
-thereby creating the illusion of moving.
-If you do this for a database that has ``n=1``,
-you might be caught by the following mistake:
+::
 
-#. Copy the shard onto a new node.
-#. Update the metadata to use the new node.
-#. Delete the shard on the old node.
-#. Oh, no!: You have lost all writes made between 1 and 2.
+$ curl -X POST "$COUCH_URL:5986/_replicate" \
+    -H "Content-Type: application/json" \
+    -d "{\"source\": \"$OLD_SHARD_URL\", \"target\": \"$NEW_SHARD_URL\"}"
 
-To avoid this mistake, you always want to make sure
-that both shards have been live for some time
-and that the shard on your new node is fully caught up
-before removing a shard on an old node.
-Since "moving" is a more conceptually (if not technically)
-accurate description of what you want to do,
-we'll use that word in this documentation as well.
+This will bring the new shard up to date so that we can safely delete
+the old one.
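+
+A rough way to confirm the two copies have converged is to compare
+their document counts (a sketch; ``$OLD_SHARD_URL`` and
+``$NEW_SHARD_URL`` are the same shard database URLs used above):
+
+.. code:: bash
+
+$ curl -s "$OLD_SHARD_URL" | grep -o '"doc_count":[0-9]*'
+$ curl -s "$NEW_SHARD_URL" | grep -o '"doc_count":[0-9]*'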
 
-Moving
-------
+Delete old shard
+~~~~~~~~~~~~~~~~
+
+You can remove the old shard either by deleting its file or by deleting
+it through the private 5986 port:
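+
+(A sketch; the exact request depends on the shard's range and the
+database's timestamp suffix, which must match the on-disk file name,
+with the ``/`` separators percent-encoded.)
+
+.. code:: bash
+
+$ curl -X DELETE "$COUCH_URL:5986/shards%2F{range}%2F{name}.{timestamp}"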
 
 Review comment:
   Is there a preferred method we should encourage people to use?
   
   I'd vote for through the 5986 port, as it seems less error prone, but this 
whole process is already pretty manual, so maybe it doesn't really matter 
unless/until the whole thing can be done through the API.  Maybe others have 
thoughts? @wohali ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services