My replicator is fairly young, so I think calling it "reliable" might be a little misleading.

It does less: I don't ever attempt to cache the high watermark (last seq written) and start over from there. If the process crashes, it just starts over from scratch. This can lead to a delay after restart, but I find that it's much simpler and more reliable on failure.

It's also simpler because it doesn't have to contend with being an http client and a client of the internal couchdb erlang API. It just proxies requests from one couch to another.

While I'm sure there are bugs in it that I haven't found yet, I can say that it replicates the npm repository quite well and I'm using it in production.

-Mikeal

On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:

> Hi Chris,
>
> From what I understand, the current state of the replicator (as of 1.1) is
> that for certain types of collections of documents it can be somewhat
> fragile. In the case of the node.js package repository, http://npmjs.org,
> there are many relatively large (~100MB) documents that would sometimes
> throw errors or time out during replication and crash the replicator, at
> which point the replicator would restart and attempt to pick up where it
> left off. I am not an expert in the internals of the replicator, but
> apparently the cumulative time required for the replicator to repeatedly
> crash and then relocate itself in the _changes feed made the built-in
> couch replicator unusable for replicating the node package manager.
>
> Two solutions exist that I know of. There is a new replicator in trunk (not
> to be confused with the _replicator db from 1.1 -- it is still using the old
> replicator algorithms), and there is also a more reliable replicator written
> in node.js, https://github.com/mikeal/replicate, that was written
> specifically to replicate the node package repository between hosting
> providers.
>
> Additionally, it may be useful if you could describe the 'fingerprint' of
> your documents a bit. How many documents are in the failing databases? Are
> the documents large or small? Do they have many attachments? How large is
> your _changes feed?
>
> Cheers,
>
> Max
>
> On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
> <[email protected]> wrote:
>
>> Hello,
>>
>> We now have about 150 dbs that are refusing to replicate, with random
>> crashes that provide really zero debug information. The error is "db
>> not found", but I know it's available. Does anyone know how I can
>> troubleshoot this? Do we just have too many databases replicating for
>> couchdb to handle? 4000 is a small number for the massive hardware
>> these are running on.
>>
>> -Chris
>>
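For anyone curious what the "no high watermark, start from scratch on crash" approach Mikeal describes above looks like, here is a minimal sketch. It is not mikeal/replicate itself and not a complete replicator: it assumes Node 18+ for the global fetch, hypothetical local database URLs, and it ignores attachments and full revision histories. It only walks the source's _changes feed from seq 0 and pushes docs to the target with new_edits=false so a restart can safely redo the whole pass.

    // Minimal sketch (not mikeal/replicate): copy one database by walking the
    // source's _changes feed from seq 0 on every run and pushing each doc to
    // the target with new_edits=false, so re-running after a crash is
    // idempotent. Assumes Node 18+ (global fetch) and hypothetical URLs that
    // already include any credentials. Attachments and revision histories,
    // which a real replicator must handle, are skipped here.

    const SOURCE = 'http://localhost:5984/registry';       // hypothetical
    const TARGET = 'http://localhost:5984/registry_copy';  // hypothetical

    async function replicateOnce(source, target) {
      // Always start from 0 -- no cached high watermark.
      const res = await fetch(`${source}/_changes?since=0&include_docs=true`);
      if (!res.ok) throw new Error(`_changes failed: ${res.status}`);
      const { results, last_seq } = await res.json();

      // include_docs=true puts the current doc (or deletion tombstone) on
      // each row; copy them all, tombstones included.
      const docs = results.filter((row) => row.doc).map((row) => row.doc);

      // new_edits:false keeps the source revisions, so writing the same doc
      // twice (e.g. after a restart) is a no-op rather than a 409 conflict.
      const push = await fetch(`${target}/_bulk_docs`, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify({ docs, new_edits: false }),
      });
      if (!push.ok) throw new Error(`_bulk_docs failed: ${push.status}`);

      console.log(`copied ${docs.length} docs, source is at seq ${last_seq}`);
    }

    replicateOnce(SOURCE, TARGET).catch((err) => {
      // On any failure, just exit; the next run starts over from scratch.
      console.error(err);
      process.exit(1);
    });

The trade-off is exactly the one stated above: no resume point to persist or corrupt, at the cost of re-reading the whole changes feed after every restart.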
