Hi Jens,

Quick question: what language/environment are you planning to use for this feature?

On reflection, this looks like something that might be useful to me. If your work is public domain, and if I have the skills for your environment, I would like to contribute.

Thanks
Vivek


On 11/28/13, 11:59 PM, Vivek Pathak wrote:

Hi

Just curious when you say this is a problem:

   2. Inconsistent views; While transferring documents from one shard to
another there will be a moment when a document resides on two databases.

Are you talking about the view that shows which documents do not belong, or are you talking about views in general?

If you mean general views, one possibility is to specifically ignore all documents that are in the wrong shard. This will give you a stale view, but that is exactly what you get if you use stale=ok in the single-db case as well.
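
To make the "ignore documents in the wrong shard" idea concrete, here is a minimal Python sketch that filters view rows at read time. It is only an illustration: `drop_foreign_rows`, the `belongs_here` predicate, and the toy placement rule (ids starting with "a" live on shard A) are assumptions, not part of CouchDB or of Jens's design.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def drop_foreign_rows(rows: List[Row], belongs_here: Callable[[str], bool]) -> List[Row]:
    """Return only the view rows whose source document belongs on this shard."""
    return [row for row in rows if belongs_here(str(row["id"]))]


# Toy placement rule (an assumption): ids starting with "a" belong on shard A,
# everything else on shard B. Document "a2" is mid-transfer, so a copy is
# still present on shard B as well.
shard_a_rows = [
    {"id": "a1", "key": "k", "value": 1},
    {"id": "a2", "key": "k", "value": 1},
]
shard_b_rows = [
    {"id": "a2", "key": "k", "value": 1},
    {"id": "b1", "key": "k", "value": 1},
]

visible = (
    drop_foreign_rows(shard_a_rows, lambda doc_id: doc_id.startswith("a"))
    + drop_foreign_rows(shard_b_rows, lambda doc_id: not doc_id.startswith("a"))
)
```

With this filter the duplicated document shows up once. A document that has not yet reached its proper shard is simply not visible until it arrives, which is the stale behaviour mentioned above.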

Thanks
Vivek



On 11/28/2013 06:28 PM, Jens Rantil wrote:
Hey couchers,

For the past week, I've been playing with the thought of implementing a
sharding orchestrator for CouchDB instances. My use case is that I have a lot of data/documents that can't be stored on a single machine and I'd like to be able to query the data with fairly low latencies. Oh, there's a big
write throughput constantly, too.

Querying could be done by rereducing the view results coming in from the
various shards. AFAIK, this is how BigCouch does it too.
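
As a rough illustration of the re-reduce step, the Python sketch below merges grouped reduce rows from several shards by summing values per key. It assumes each shard returns rows of the form `{"key": ..., "value": ...}` with string keys, and that the reduce is a sum or count, so partial results can be added in any order. The name `merge_sum_rows` is made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def merge_sum_rows(per_shard_rows: Iterable[List[Row]]) -> List[Row]:
    """Combine grouped reduce rows from every shard by adding values per key."""
    totals = defaultdict(int)
    for rows in per_shard_rows:
        for row in rows:
            totals[row["key"]] += row["value"]
    # Return rows sorted by key, the way a single grouped view result would look.
    return [{"key": key, "value": totals[key]} for key in sorted(totals)]


shard_a = [{"key": "x", "value": 2}, {"key": "y", "value": 1}]
shard_b = [{"key": "x", "value": 3}]
merged = merge_sum_rows([shard_a, shard_b])
```

Reductions that cannot be combined from partial results (a median, for example) would need a different merge step.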

Now, the rebalancing act of all this is pretty straightforward; each shard
is given a generated view that displays all documents that don't belong
there in the shard (and thus need to be transferred). The orchestrator will then periodically check if any documents should be transferred to another
server, replicate those documents and purge* them on the source server.

* Purging since I am not interested in keeping them on disk on the source
database anymore, not even tombstones. All the document data should only
reside on the new shard.
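
The rebalancing pass described above can be sketched in Python as follows. This is an in-memory simulation only: shards are plain dicts, `owner_shard` is a hypothetical hash-based placement rule, and the copy/delete steps stand in for real replication and purging.

```python
import hashlib
from typing import Dict, List

Shard = Dict[str, dict]


def owner_shard(doc_id: str, num_shards: int) -> int:
    """Hypothetical placement rule: hash the document id onto a shard index."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def misplaced_docs(shard_index: int, docs: Shard, num_shards: int) -> Dict[str, int]:
    """Stand-in for the generated view: misplaced doc id -> shard it belongs on."""
    targets = {doc_id: owner_shard(doc_id, num_shards) for doc_id in docs}
    return {doc_id: t for doc_id, t in targets.items() if t != shard_index}


def rebalance_once(shards: List[Shard]) -> int:
    """Move every misplaced document to its owner shard; return how many moved."""
    moved = 0
    for index, docs in enumerate(shards):
        for doc_id, target in misplaced_docs(index, docs, len(shards)).items():
            shards[target][doc_id] = docs[doc_id]  # copy first (replication)
            del docs[doc_id]                       # then remove (purge)
            moved += 1
    return moved
```

Between the copy and the delete, the document exists on two shards, which is exactly the window in which views can show incorrect information.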

My issue with the described solution is that there will be a moment when
views will be showing incorrect information.

As I see it, there are two problems that need to be solved with this
solution:

    1. Purging of documents: According to the purge
documentation<http://docs.couchdb.org/en/latest/api/database/misc.html#post--db-_purge>,
purging documents will require rebuilding entire views. That ain't good.
    2. Inconsistent views; While transferring documents from one shard to
another there will be a moment when a document resides on two databases.

1) could possibly be solved by introducing a replica database of the
source database so that views could be recreated offline. However, this
makes replication way more tedious than I initially expected.

2) could be solved by simply locking view reads until the inconsistency is
fixed. But hey, we're not big fans of that kind of global locking. Possibly
patching all map-functions in the views would also fix the issue of double
documents. At the same time, that feels kind of like a hack.

Now, I know BigCouch is around the corner. How does it cope with the above issues? Maybe it's introduced some new magic for this that currently is not in CouchDB? I've been trying to Google this, but so far haven't found too
much. Feel free to point me in the right direction if you know of any
material.

Cheers,
Jens


