replicating docs with tons of conflicts

Stephen Bartell Thu, 14 Mar 2013 02:03:12 -0700

Hi all, 

tl;dr: I've got a database with just a couple of docs.  Conflict management went 
unchecked and these docs have thousands of conflicts each.  Replication fails.  
Couch consumes all of the server's CPU.


First the story, then the questions.  Please bear with me!

I wanted to replicate this database to another, new database.  So I started the 
replication.  beam.smp took 100% of my cpu and the replicator status held 
steady at a constant percent for quite a while.  It eventually finished.

I thought maybe I should handle the conflicts and then replicate.  Hopefully 
it would go faster next time.  So I cleared all the conflicts.  I replicated 
again, but this time I could not get anything to replicate.  Again, the CPU 
held steady, topped out.  I eventually restarted couch.

I dug throughout the logs and saw that the POSTS were failing.  I figure that 
the replicator was timing out when trying to post to couch.

I have a replicator that I've been working on that's written in node.js.  So I 
started that one up to do the same thing.  I drew inspiration from PouchDB's 
replicator and from Jens Alfke's amazing replication algorithm documentation, so 
my replicator follows more or less the same story.  1) consume _changes with 
style=all_docs.  2) _revs_diff on the target database.  3) get each revision 
from source with revs=true.  4) bulk post with new_edits=false.
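
A minimal sketch of the requests behind those four steps, in Python rather than 
node.js.  It only builds URLs and JSON bodies and makes no HTTP calls.  The 
function names are hypothetical and are not taken from my replicator; the 
endpoints and parameters (_changes, _revs_diff, _bulk_docs, style, revs, 
new_edits) are the standard CouchDB ones.

```python
from urllib.parse import quote, urlencode


def changes_url(source, since=0):
    # Step 1: style=all_docs lists every leaf revision, not only the winner.
    return f"{source}/_changes?" + urlencode({"style": "all_docs", "since": since})


def revs_diff_body(changes):
    # Step 2: body for POST {target}/_revs_diff, mapping doc id -> leaf revs.
    return {row["id"]: [c["rev"] for c in row["changes"]]
            for row in changes["results"]}


def missing_pairs(diff):
    # The _revs_diff response maps doc id -> {"missing": [rev, ...]}.
    return [(doc_id, rev)
            for doc_id, entry in diff.items()
            for rev in entry.get("missing", [])]


def doc_url(source, doc_id, rev):
    # Step 3: revs=true returns the revision's ancestry in _revisions.
    return f"{source}/{quote(doc_id, safe='')}?" + urlencode({"rev": rev, "revs": "true"})


def bulk_docs_body(docs):
    # Step 4: new_edits=false makes the target keep the supplied _rev values.
    return {"docs": docs, "new_edits": False}
```

With thousands of conflicts per doc, missing_pairs returns thousands of 
(id, rev) pairs for a single document, and every one of them ends up in step 4.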

Same thing.  Except now I can kind of make sense of what's going on.  Sucking 
the data out of the source is no problem.  Diffing the revs against the target 
is no problem.  Posting the docs is THE problem.  Since the target database is 
empty, thousands of revisions are being thrown at couch at once to build up the 
revision trees.  Couch is just taking forever to finish the job.  It doesn't 
matter if I bulk post the docs or post them individually, couch sucks 100% of 
my CPU every time and takes forever to finish.  (I actually never let it finish.) 

So that is the story.  Here are my questions.

1) Has anyone else stepped on this mine?  If so, could I get pointed towards 
some workarounds?  I don't think it is right to make the assumption that users 
of couchdb will never have databases with huge conflict sausages like this. So 
simply saying manage your conflicts won't help.

2) Let's say I did manage my conflicts.  I still have the _deleted_conflicts 
sausage.  I know that _deleted and _deleted_docs must be replicated to maintain 
consistency across the cluster.  If the replicator throws up when these huge 
sausages come through, how is the data ever going to replicate?  Is there a 
trade secret I don't know about?

3) Is there any limit on the resources that CouchDB is allowed to consume?  I 
can accept that we run into cases where there's tons of data to move and it's 
just going to take a hell of a long time.  But I don't get why it's permissible 
for CouchDB to eat all my CPU.  The whole server should never grind to a halt 
because it's moving lots of data.  I feel like it should be like the little 
train who could.  Just chug along slow and steady until it crests the hill.

I would really like to rely on the Erlang replicator, but I can't.  At least 
with the replicator I wrote I have a chance of throttling the posts so 
CouchDB doesn't render my server useless.
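
A rough sketch of what I mean by throttling, again in Python.  The names 
throttled_post, batches, size and pause are hypothetical, and the post argument 
stands in for whatever function sends a body to the target's _bulk_docs.  This 
only spaces the requests out; it does not cap how much CPU CouchDB spends on 
each one, so treat it as an idea to try rather than a confirmed fix.

```python
import time


def batches(docs, size):
    # Yield consecutive slices of at most `size` docs.
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def throttled_post(docs, post, size=50, pause=1.0, sleep=time.sleep):
    # Send small _bulk_docs bodies one at a time, resting between them.
    sent = 0
    for batch in batches(docs, size):
        post({"docs": batch, "new_edits": False})
        sent += len(batch)
        if sent < len(docs):
            sleep(pause)  # no pause is needed after the final batch
    return sent
```

The sleep parameter is injectable so the pacing can be exercised without 
actually waiting.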

Sorry for wrapping more questions into those questions.  I'm pretty tired, 
stumped, and have machines in production crumbling.

Best, 
Stephen
