[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203456026
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-========
-Sharding
-========
+================
+Shard Management
+================
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
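
These defaults can also be changed at runtime; a minimal sketch, assuming
the node-local configuration API on the "back-door" (5986) port used
elsewhere in this guide, run once against each node:

.. code:: bash

# set the default shard count for databases created after this point
$ curl -X PUT "$COUCH_URL:5986/_config/cluster/q" -d '"4"'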
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
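
To see how an existing database is sharded, you can read its parameters
back from the database info; a sketch, assuming the ``cluster`` object
that CouchDB 2.x includes in the response (output abridged):

.. code:: bash

$ curl -s "$COUCH_URL/database-name"
{"db_name":"database-name","cluster":{"q":4,"n":2,"w":2,"r":2},...}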
 
+Quorum
+~~~~~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+<https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__ of
+database nodes have responded; 2, by default (a simple majority when ``n=3``).
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+<https://en.wikipedia.org/wiki/ACID>`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
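
The tradeoff is easy to observe by hand; a sketch, assuming a 3-node
cluster (``n=3``) in which one node has been stopped:

.. code:: bash

# succeeds quickly: a single live replica is enough
$ curl "$COUCH_URL:5984/{docId}?r=1"

# stalls on the unreachable node and only returns after an internal
# timeout, because all three replicas can never respond
$ curl "$COUCH_URL:5984/{docId}?r=3"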
+
+.. _cluster/sharding/move:
+
+Moving a shard
+--------------
+

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203377375
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-========
-Sharding
-========
+================
+Shard Management
+================
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
 Review comment:
   The Dynamo paper's primary example is n=3, but it doesn't go into why.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203375866
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-========
-Sharding
-========
+================
+Shard Management
+================
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~~~~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+<https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__ of
+database nodes have responded; 2, by default (a simple majority when ``n=3``).
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+<https://en.wikipedia.org/wiki/ACID>`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
 
 Review comment:
   My point being - if the explanation is useful, 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-07-18 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203374588
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -12,290 +12,490 @@
 
 .. _cluster/sharding:
 
-========
-Sharding
-========
+================
+Shard Management
+================
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__ is a
+horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "shard replicas" or
+just "replicas") to different nodes in a cluster gives the data greater
+durability against node loss. CouchDB clusters automatically shard
+databases and distribute the subsections of documents that compose each
+shard among nodes. Modifying cluster membership and sharding behavior
+must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. The default value for n is 3,
+and for q is 8. With q=8, the database is split into 8 shards. With
+n=3, the cluster distributes three replicas of each shard. Altogether,
+that's 24 shards for a single database. In a default 3-node cluster,
+each node would receive 8 shards. In a 4-node cluster, each node would
+receive 6 shards. We recommend in the general case that the number of
+nodes in your cluster should be a multiple of n, so that shards are
+distributed evenly.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
 
+Quorum
+~~~~~~
+
+Depending on the size of the cluster, the number of shards per database,
+and the number of shard replicas, not every node may have access to
+every shard, but every node knows where all the replicas of each shard
+can be found through CouchDB's internal shard map.
+
+Each request that comes in to a CouchDB cluster is handled by any one
+random coordinating node. This coordinating node proxies the request to
+the other nodes that have the relevant data, which may or may not
+include itself. The coordinating node sends a response to the client
+once a `quorum
+<https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__ of
+database nodes have responded; 2, by default (a simple majority when ``n=3``).
+
+The size of the required quorum can be configured at
+request time by setting the ``r`` parameter for document and view
+reads, and the ``w`` parameter for document writes. For example, here
+is a request that specifies that at least two nodes must respond in
+order to establish a quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{...}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded or timed out, and as such this approach does not
+guarantee `ACIDic consistency
+<https://en.wikipedia.org/wiki/ACID>`__. Setting ``r`` or
+``w`` to 1 means you will receive a response after only one relevant
+node has responded.
 
 Review comment:
   Our documentation should be as self-contained 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-07-17 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r203230201
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -179,123 +328,114 @@ looks like this:
 }
 }
 
-After PUTting this document, it's like magic: the shards are now on node2 too!
-We now have ``n=2``!
+Now you can ``PUT`` this new metadata:
 
-If the shards are large, then you can copy them over manually and only have
-CouchDB sync the changes from the last minutes instead.
+.. code:: bash
 
-.. _cluster/sharding/move:
+$ curl -X PUT $COUCH_URL:5986/_dbs/{name} -d '{...}'
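
For reference, the shard map document being ``PUT`` here has roughly the
following shape; the node names, range, and suffix values are illustrative
(the suffix is the database timestamp as character codes):

.. code:: javascript

{
    "_id": "{name}",
    "_rev": "{rev}",
    "shard_suffix": [46, 49, 52, 50, 53, 50, 48, 50, 53, 55, 55],
    "changelog": [
        ["add", "00000000-7fffffff", "node1@xxx.xxx.xxx.xxx"],
        ["add", "00000000-7fffffff", "node2@yyy.yyy.yyy.yyy"]
    ],
    "by_node": {
        "node1@xxx.xxx.xxx.xxx": ["00000000-7fffffff"],
        "node2@yyy.yyy.yyy.yyy": ["00000000-7fffffff"]
    },
    "by_range": {
        "00000000-7fffffff": ["node1@xxx.xxx.xxx.xxx", "node2@yyy.yyy.yyy.yyy"]
    }
}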
 
-Moving Shards
-=============
+Replicating from old to new
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Add, then delete
-----------------
+Because shards are just CouchDB databases, you can replicate them
+around. In order to make sure the new shard receives any updates the old
+one processed while you were updating its metadata, you should replicate
+the old shard to the new one:
 
-In the world of CouchDB there is no such thing as "moving" shards, only adding
-and removing shard replicas.
-You can add a new replica of a shard and then remove the old replica,
-thereby creating the illusion of moving.
-If you do this for a database that has ``n=1``,
-you might be caught by the following mistake:
+.. code:: bash
 
-#. Copy the shard onto a new node.
-#. Update the metadata to use the new node.
-#. Delete the shard on the old node.
-#. Oh, no!: You have lost all writes made between 1 and 2.
+$ curl -X POST "$COUCH_URL:5986/_replicate" \
+  -H "Content-Type: application/json" \
+  -d "{\"source\": \"$OLD_SHARD_URL\", \"target\": \"$NEW_SHARD_URL\"}"
 
-To avoid this mistake, you always want to make sure
-that both shards have been live for some time
-and that the shard on your new node is fully caught up
-before removing a shard on an old node.
-Since "moving" is a more conceptually (if not technically)
-accurate description of what you want to do,
-we'll use that word in this documentation as well.
+This will bring the new shard up to date so that we can safely delete
+the old one.
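
Before deleting anything, it is worth checking that the two shards agree;
a minimal sketch comparing document counts through the node-local port,
using the same ``$OLD_SHARD_URL`` and ``$NEW_SHARD_URL`` as above:

.. code:: bash

$ curl -s "$OLD_SHARD_URL" | grep -o '"doc_count":[0-9]*'
$ curl -s "$NEW_SHARD_URL" | grep -o '"doc_count":[0-9]*'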
 
-Moving
-------
+Delete old shard
+~~~~~~~~~~~~~~~~
+
+You can remove the old shard either by deleting its file or by deleting
+it through the private 5986 port:
+
+.. code:: bash
+
+# delete the file
+rm $COUCH_DIR/data/shards/$OLD_SHARD
 
-When you get to ``n=3`` you should start moving the shards instead of adding
-more replicas.
+# OR delete the database
+curl -X DELETE $COUCH_URL:5986/$OLD_SHARD
 
-We will stop on ``n=2`` to keep things simple. Start node number 3 and add it to
-the cluster. Then create the directories for the shard on node3:
+Congratulations! You have manually added a new shard. By adding and
+removing database shards in this way, they can be moved between nodes.
 
-.. code-block:: bash
+Specifying database placement
+-----------------------------
 
-mkdir -p data/shards/00000000-7fffffff
+Database shards can be configured to live solely on specific nodes using
+placement rules.
 
-And copy over ``data/shards/00000000-7fffffff/small.1425202577.couch`` from
-node1 to node3. Do not move files between the shard directories as that will
-confuse CouchDB!
+First, each node must be labeled with a zone attribute. This defines
+which zone each node is in. You do this by editing the node’s document
+in the ``/_nodes`` database, which is accessed through the “back-door”
+(5986) port. Add a key-value pair of the form:
 
-Edit the database document in ``_dbs`` again. Make it so that node3 have a
-replica of the shard ``00000000-7fffffff``. Save the document and let CouchDB
-sync. If we do not do this, then writes made during the copy of the shard and
-the updating of the metadata will only have ``n=1`` until CouchDB has synced.
+::
 
-Then update the metadata document so that node2 no longer have the shard
-``00000000-7fffffff``. You can now safely delete
-``data/shards/00000000-7fffffff/small.1425202577.couch`` on node 2.
+"zone": "{zone-name}"
 
-The changelog is nothing that CouchDB cares about, it is only for the admins.
-But for the sake of completeness, we will update it again. Use ``delete`` for
-recording the removal of the shard ``00000000-7fffffff`` from node2.
+Do this for all of the nodes in your cluster. For example:
 
-Start node4, add it to the cluster and do the same as above with shard
-``80000000-ffffffff``.
+.. code:: bash
 
-All documents added during this operation was saved and all reads responded to
-without the users noticing anything.
+$ curl -X PUT "$COUCH_URL:5986/_nodes/{name}@{address}" \
+  -d '{
+    "_id": "{name}@{address}",
+    "_rev": "{rev}",
+    "zone": "{zone-name}"
+  }'
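
You can read the node document back to confirm that the zone attribute
was saved; a small sketch, again via the back-door port:

.. code:: bash

$ curl "$COUCH_URL:5986/_nodes/{name}@{address}"
{"_id":"{name}@{address}","_rev":"{rev}","zone":"{zone-name}"}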
 
-.. _cluster/sharding/views:
+In the config file (local.ini or default.ini) of each node, define a
+consistent cluster-wide setting like:
 
-Views
-=====
+::
+
+[cluster]
+placement = {zone-name-1}:2,{zone-name-2}:1
 
-The views need to be moved together with the shards. If you 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179922811
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -179,123 +328,114 @@ looks like this:
 }
 }
 
-After PUTting this document, it's like magic: the shards are now on node2 too!
-We now have ``n=2``!
+Now you can ``PUT`` this new metadata:
 
-If the shards are large, then you can copy them over manually and only have
-CouchDB sync the changes from the last minutes instead.
+.. code:: bash
 
-.. _cluster/sharding/move:
+$ curl -X PUT $COUCH_URL:5986/_dbs/{name} -d '{...}'
 
-Moving Shards
-=============
+Replicating from old to new
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Add, then delete
-----------------
+Because shards are just CouchDB databases, you can replicate them
+around. In order to make sure the new shard receives any updates the old
+one processed while you were updating its metadata, you should replicate
+the old shard to the new one:
 
-In the world of CouchDB there is no such thing as "moving" shards, only adding
-and removing shard replicas.
-You can add a new replica of a shard and then remove the old replica,
-thereby creating the illusion of moving.
-If you do this for a database that has ``n=1``,
-you might be caught by the following mistake:
+.. code:: bash
 
-#. Copy the shard onto a new node.
-#. Update the metadata to use the new node.
-#. Delete the shard on the old node.
-#. Oh, no!: You have lost all writes made between 1 and 2.
+$ curl -X POST "$COUCH_URL:5986/_replicate" \
+  -H "Content-Type: application/json" \
+  -d "{\"source\": \"$OLD_SHARD_URL\", \"target\": \"$NEW_SHARD_URL\"}"
 
-To avoid this mistake, you always want to make sure
-that both shards have been live for some time
-and that the shard on your new node is fully caught up
-before removing a shard on an old node.
-Since "moving" is a more conceptually (if not technically)
-accurate description of what you want to do,
-we'll use that word in this documentation as well.
+This will bring the new shard up to date so that we can safely delete
+the old one.
 
-Moving
-------
+Delete old shard
+~~~~~~~~~~~~~~~~
+
+You can remove the old shard either by deleting its file or by deleting
+it through the private 5986 port:
 
 Review comment:
   I'm ambivalent, with a slight preference for file deletion. The key is to 
update the cluster metadata first so that the shard is no longer "live" as far 
as the cluster is concerned; after that, the file is just detritus.
   
   As I mentioned in my review, port 5986 will go away with 3.0, so this 
section will need at least a partial rewrite sometime before then anyway.




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179922771
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+------
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/{docId}?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded; however, even this does not guarantee `ACIDic
+consistency <https://en.wikipedia.org/wiki/ACID>`__. Setting
+``r`` or ``w`` to 1 means you will receive a response after only one
+relevant node has responded.
+
+Adding a node
+-------------
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
+endpoint:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
+
+This will add the node to the cluster. Existing shards will not be moved
+or re-balanced in response to the addition, but future operations will
+distribute shards to the new node.
+
+Now when you GET the ``/_membership`` endpoint, you will see the new
+node.
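
A sketch of that check, with an illustrative two-node response:

.. code:: bash

$ curl "$COUCH_URL:5984/_membership"
{"all_nodes":["node1@xxx.xxx.xxx.xxx","node2@yyy.yyy.yyy.yyy"],
 "cluster_nodes":["node1@xxx.xxx.xxx.xxx","node2@yyy.yyy.yyy.yyy"]}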
+
+Removing a node
+---------------
+
+To remove a node from the cluster, you must first acquire the ``_rev``
+value for the document that signifies its existence:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/_node/{name}@{address}"
+{"_id":"{name}@{address}","_rev":"{rev}"}
+
+Using that 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-07 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179922734
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+------
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
 
 Review comment:
   Wikipedia uses the phrase "to obtain a quorum," for what it's worth.




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179886705
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+------
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/{docId}?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded; however, even this does not guarantee `ACIDic
+consistency <https://en.wikipedia.org/wiki/ACID>`__. Setting
+``r`` or ``w`` to 1 means you will receive a response after only one
+relevant node has responded.
+
+Adding a node
+-------------
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
 
 Review comment:
   I'd use a domain name in an actual cluster; running multiple nodes on the 
same host kind-of defeats the purpose of a cluster and is only really done in 
dev/demo environments.




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179886087
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===========
+Introduction
+============
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~~~~~~~~~~~~~~~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+------
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/{docId}?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded; however, even this does not guarantee `ACIDic
 
 Review comment:
   ...or if that number of replicas have been consulted, and timed out while 
waiting for a response.
   
   This is a good one to try yourself: set up a 3 node cluster, then kill a 
node, then try to issue a `r=3` or `w=3` request and see what response you get 
(and how long it takes).




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179887035
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard `__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+--
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum `__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/{docId}?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded, however even this does not guarantee `ACIDic
+consistency `__. Setting
+``r`` or ``w`` to 1 means you will receive a response after only one
+relevant node has responded.
+
+Adding a node
+-
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
+endpoint:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
+
+This will add the node to the cluster. Existing shards will not be moved
+or re-balanced in response to the addition, but future operations will
+distribute shards to the new node.
+
+Now when you GET the ``/_membership`` endpoint, you will see the new
+node.
+
+Removing a node
+---
+
+To remove a node from the cluster, you must first acquire the ``_rev``
+value for the document that signifies its existence:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/_node/{name}@{address}"
+{"_id":"{name}@{address}","_rev":"{rev}"}
+
+Using that 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179887306
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard `__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1=2; --user daboss
+CouchDB nodes have a ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- -7fff/
-|   |-- small.1425202577.couch
-|   +-- 8000-/
-|-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small;
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+--
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum `__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/{docId}?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded, however even this does not guarantee `ACIDic
+consistency `__. Setting
+``r`` or ``w`` to 1 means you will receive a response after only one
+relevant node has responded.
+
+Adding a node
+-
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
+endpoint:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
+
+This will add the node to the cluster. Existing shards will not be moved
+or re-balanced in response to the addition, but future operations will
+distribute shards to the new node.
+
+Now when you GET the ``/_membership`` endpoint, you will see the new
+node.
+
+Removing a node
+---
+
+To remove a node from the cluster, you must first acquire the ``_rev``
+value for the document that signifies its existence:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/_node/{name}@{address}"
+{"_id":"{name}@{address}","_rev":"{rev}"}
+
+Using that 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179887907
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -179,123 +328,114 @@ looks like this:
 }
 }
 
-After PUTting this document, it's like magic: the shards are now on node2 too!
-We now have ``n=2``!
+Now you can ``PUT`` this new metadata:
 
-If the shards are large, then you can copy them over manually and only have
-CouchDB sync the changes from the last minutes instead.
+.. code:: bash
 
-.. _cluster/sharding/move:
+$ curl -X PUT $COUCH_URL:5986/_dbs/{name} -d '{...}'
 
-Moving Shards
-=
+Replicating from old to new
+~~~
 
-Add, then delete
-
+Because shards are just CouchDB databases, you can replicate them
+around. In order to make sure the new shard receives any updates the old
+one processed while you were updating its metadata, you should replicate
+the old shard to the new one:
 
-In the world of CouchDB there is no such thing as "moving" shards, only adding
-and removing shard replicas.
-You can add a new replica of a shard and then remove the old replica,
-thereby creating the illusion of moving.
-If you do this for a database that has ``n=1``,
-you might be caught by the following mistake:
+::
 
-#. Copy the shard onto a new node.
-#. Update the metadata to use the new node.
-#. Delete the shard on the old node.
-#. Oh, no!: You have lost all writes made between 1 and 2.
+$ curl -X POST $COUCH_URL:5986/_replicate -d '{ \
+"source": $OLD_SHARD_URL,
+"target": $NEW_SHARD_URL
+}'
 
-To avoid this mistake, you always want to make sure
-that both shards have been live for some time
-and that the shard on your new node is fully caught up
-before removing a shard on an old node.
-Since "moving" is a more conceptually (if not technically)
-accurate description of what you want to do,
-we'll use that word in this documentation as well.
+This will bring the new shard up to date so that we can safely delete
+the old one.
 
-Moving
---
+Delete old shard
+
+
+You can remove the old shard either by deleting its file or by deleting
+it through the private 5986 port:
+
+.. code:: bash
+
+# delete the file
+rm $COUCH_DIR/data/shards/$OLD_SHARD
 
-When you get to ``n=3`` you should start moving the shards instead of adding
-more replicas.
+# OR delete the database
+curl -X DELETE $COUCH_URL:5986/$OLD_SHARD
 
-We will stop on ``n=2`` to keep things simple. Start node number 3 and add it 
to
-the cluster. Then create the directories for the shard on node3:
+Congratulations! You have manually added a new shard. By adding and
+removing database shards in this way, they can be moved between nodes.
 
-.. code-block:: bash
+Specifying database placement
+-
 
-mkdir -p data/shards/-7fff
+Database shards can be configured to live solely on specific nodes using
+placement rules.
 
-And copy over ``data/shards/-7fff/small.1425202577.couch`` from
-node1 to node3. Do not move files between the shard directories as that will
-confuse CouchDB!
+First, each node must be labeled with a zone attribute. This defines
+which zone each node is in. You do this by editing the node’s document
+in the ``/nodes`` database, which is accessed through the “back-door”
+(5986) port. Add a key value pair of the form:
 
-Edit the database document in ``_dbs`` again. Make it so that node3 have a
-replica of the shard ``-7fff``. Save the document and let CouchDB
-sync. If we do not do this, then writes made during the copy of the shard and
-the updating of the metadata will only have ``n=1`` until CouchDB has synced.
+::
 
-Then update the metadata document so that node2 no longer have the shard
-``-7fff``. You can now safely delete
-``data/shards/-7fff/small.1425202577.couch`` on node 2.
+"zone": "{zone-name}"
 
-The changelog is nothing that CouchDB cares about, it is only for the admins.
-But for the sake of completeness, we will update it again. Use ``delete`` for
-recording the removal of the shard ``-7fff`` from node2.
+Do this for all of the nodes in your cluster. For example:
 
-Start node4, add it to the cluster and do the same as above with shard
-``8000-``.
+.. code:: bash
 
-All documents added during this operation was saved and all reads responded to
-without the users noticing anything.
+$ curl -X PUT $COUCH_URL:5986/_nodes/{name}@{address} \
+-d '{ \
+"_id": "{name}@{address}",
+"_rev": "{rev}",
+"zone": "{zone-name}"
+}'
 
-.. _cluster/sharding/views:
+In the config file (local.ini or default.ini) of each node, define a
+consistent cluster-wide setting like:
 
-Views
-=
+::
 
-The views need to be moved together with the shards. If you 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179887158
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
 .. _cluster/sharding/scaling-out:
 
-Scaling out
-===
+Introduction
+
 
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A
+`shard `__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
 
-For simplicity we will start fresh and small.
+Shards and Replicas
+~~~
 
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
-curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2" --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
 
-If you look in the directory ``data/shards`` you will find the 2 shards.
+::
 
-.. code-block:: text
+[cluster]
+q=8
+n=3
 
-data/
-+-- shards/
-|   +-- 00000000-7fffffff/
-|   |   +-- small.1425202577.couch
-|   +-- 80000000-ffffffff/
-|       +-- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
 
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
 
-.. code-block:: javascript
+$ curl -X PUT "$COUCH_URL/database-name?q=4=2"
 
-curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small"
+That creates a database that is split into 4 shards and 2 replicas,
+yielding 8 shards distributed throughout the cluster.
+
+Quorum
+--
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum `__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish
+quorum:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/{docId}?r=2"
+
+Here is a similar example for writing a document:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/{docId}?w=2" -d '{}'
+
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded; however, even this does not guarantee `ACIDic
+consistency `__. Setting
+``r`` or ``w`` to 1 means you will receive a response after only one
+relevant node has responded.
+
+Adding a node
+-
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
+endpoint:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
+
+This will add the node to the cluster. Existing shards will not be moved
+or re-balanced in response to the addition, but future operations will
+distribute shards to the new node.
+
+Now when you GET the ``/_membership`` endpoint, you will see the new
+node.
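
A sketch of that check, using the same ``$COUCH_URL`` convention as the
other examples here:

.. code:: bash

$ curl "$COUCH_URL:5984/_membership"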
+
+Removing a node
+---
+
+To remove a node from the cluster, you must first acquire the ``_rev``
+value for the document that signifies its existence:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/_node/{name}@{address}"
+{"_id":"{name}@{address}","_rev":"{rev}"}
+
+Using that 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179887511
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -179,123 +328,114 @@ looks like this:
 }
 }
 
-After PUTting this document, it's like magic: the shards are now on node2 too!
-We now have ``n=2``!
+Now you can ``PUT`` this new metadata:
 
-If the shards are large, then you can copy them over manually and only have
-CouchDB sync the changes from the last minutes instead.
+.. code:: bash
 
-.. _cluster/sharding/move:
+$ curl -X PUT $COUCH_URL:5986/_dbs/{name} -d '{...}'
 
-Moving Shards
-=
+Replicating from old to new
 
 Review comment:
   Nooo no no no. Big -1. Don't do this. We have even talked about disabling 
the replicator on port 5986 for any purpose other than upgrading from 1.x to 
2.x.
   
   Instead just add the shards to the map and CouchDB will manage this thru 
internal replication itself.





[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179886582
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
+Adding a node
+-
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
+endpoint:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
 
 Review comment:
   I think this is wrong, don't you mean 
`127.0.0.1:5986/_nodes/{name}@{address}`, issued on one machine in the cluster?




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r17987
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
 
 Review comment:
  I would insert before "With":  "The default values are `q=8` and `n=3`
respectively."




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179886812
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
+Adding a node
+-
+
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+``127.0.0.1``, then ``PUT`` an empty document to the ``/_node``
+endpoint:
+
+.. code:: bash
+
+$ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
 
 Review comment:
   You could also just point to the documentation over in the Cluster Setup 
section rather than duplicating this info here. In fact, DRYing this out is 
probably a good idea...




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179886897
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
+Removing a node
+---
+
+To remove a node from the cluster, you must first acquire the ``_rev``
+value for the document that signifies its existence:
+
+.. code:: bash
+
+$ curl "$COUCH_URL:5984/_node/{name}@{address}"
 
 Review comment:
   Same problem as above: wrong port, and 

[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179885381
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
+Quorum
+--
+
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum `__.
+The size of the required quorum can be configured at request time by
 
 Review comment:
  You should state how quorum is calculated by default, which is
`r=w=((n+1)/2)` using integer division, or `r=w=2` for CouchDB's defaults.




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179878729
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -18,43 +18,170 @@ Sharding
 
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
 
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
 
 Review comment:
   Maybe a sentence here about why we recommend that the number of nodes in a 
cluster be divisible by 3?




[GitHub] wohali commented on a change in pull request #268: Rewrite sharding documentation

2018-04-06 Thread GitBox
wohali commented on a change in pull request #268: Rewrite sharding 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/268#discussion_r179868287
 
 

 ##
 File path: src/cluster/sharding.rst
 ##
 @@ -97,27 +224,49 @@ called ``small`` there. Let us look in it. Yes, you can get it with curl too:
 }
 }
 
-* ``_id`` The name of the database.
-* ``_rev`` The current revision of the metadata.
-* ``shard_suffix`` The numbers after small and before .couch. This is seconds
-  after UNIX epoch when the database was created. Stored as ASCII characters.
-* ``changelog`` Self explaining. Mostly used for debugging.
-* ``by_node`` List of shards on each node.
-* ``by_range`` On which nodes each shard is.
+Here is a brief anatomy of that document:
+
+-  ``_id``: The name of the database.
+-  ``_rev``: The current revision of the metadata.
+-  ``shard_suffix``: A timestamp of the database's creation, expressed
+   as seconds after the Unix epoch and stored as the codepoints of the
+   ASCII numerals.
+-  ``changelog``: History of the database's shards.
+-  ``by_node``: List of shards on each node.
+-  ``by_range``: Which nodes each shard is on.
+
+To reflect the shard move in the metadata, there are three steps:
+
+1. Add appropriate changelog entries.
+2. Update the ``by_node`` entries.
+3. Update the ``by_range`` entries.
+
+As of this writing, this process must be done manually. **WARNING: Be
+very careful! Mistakes during this process can irreparably corrupt the
+cluster!**
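
For orientation, each changelog entry is a small JSON array naming the
operation, the shard range, and the node it applies to. A sketch with
hypothetical values:

.. code:: json

["add", "00000000-7fffffff", "node1@xxx.xxx.xxx.xxx"]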
 
-Nothing here, nothing there, a shard in my sleeve
--
+To add a shard to a node, add entries like this to the database
+metadata's ``changelog`` attribute:
 
-Start node2 and add it to the cluster. Check in ``/_membership`` that the
-nodes are talking with each other.
+.. code:: json1
 
 Review comment:
   This is causing the build to fail, I think the `1` is the problem

