Hi Mathias,

Yes, your analysis looks spot on. There are advantages, though, to using the replication feature - looking at the _changes feed, I'm not immediately clear on how I would achieve the same behavior (e.g., for deleted documents) - maybe due to my lack of exposure to couch internals; even the _changes feed was new to me.
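If I'm reading the docs right, deletions do show up in the feed as rows flagged with "deleted": true, so a script could mirror them in the target databases. Below is a rough, untested sketch of the kind of script I have in mind (Python with the requests library; the URLs and the routing rule are placeholders for whatever the replication filters currently decide):

    # Rough, untested sketch; URLs, database names and the routing rule are
    # placeholders. Ids that need URL-escaping are not handled here.
    import requests

    COUCH = "http://localhost:5984"
    SOURCE = COUCH + "/bigdb"            # the monolithic database

    def target_for(row):
        # Placeholder routing rule. Note: deleted rows only carry a stub
        # document, so routing a deletion may have to work from the _id alone.
        doc = row.get("doc") or {}
        return "%s/split_%s" % (COUCH, doc.get("type", "misc"))

    since = 0   # persist this between runs to be able to catch up later
    while True:
        resp = requests.get(SOURCE + "/_changes",
                            params={"since": since,
                                    "include_docs": "true",
                                    "limit": 1000}).json()
        for row in resp["results"]:
            url = "%s/%s" % (target_for(row), row["id"])
            if row.get("deleted"):
                # Mirror the deletion: if the doc was copied earlier, look up
                # its rev in the target and delete it there.
                head = requests.head(url)
                if head.status_code == 200:
                    rev = head.headers["ETag"].strip('"')
                    requests.delete(url, params={"rev": rev})
            else:
                doc = dict(row["doc"])
                doc.pop("_rev", None)        # let the target assign its own revs
                head = requests.head(url)
                if head.status_code == 200:  # already copied once: update it
                    doc["_rev"] = head.headers["ETag"].strip('"')
                requests.put(url, json=doc)
        since = resp["last_seq"]
        if not resp["results"]:
            break

Does that roughly match what you had in mind?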
Benoit's suggestion is along the same lines and very interesting, but it requires an even newer couch installation than I currently have in place.

That said, I also looked into whether disk seek time is the cause, but iostat and its iowait numbers suggest otherwise:

avg-cpu:  %user   %nice  %system  %iowait  %steal  %idle
          90.36    0.00     3.71     0.10    5.19   0.64

Device:   tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
xvdf      49.05      799.60      31.57     8004      316

I see many couchjs processes running, each at a constant 6-9% CPU in top. Is that expected?

Tailing the log files for POSTs also makes me wonder about scheduling fairness among the replications: I see activity mostly for a small number of the target databases. Do you know how this is handled?

Thanks,

Andreas

________________________________
From: Mathias Leppich <[email protected]>
To: [email protected]; Andreas Kemkes <[email protected]>
Sent: Friday, July 6, 2012 2:37 AM
Subject: Re: How many filtered replications is too many?

Hi Andreas,

If you want to split one large database into many smaller ones as a one-time task, it's probably more efficient to write a script that reads the _changes feed of the large database and then decides where to put each document. Compared to the 200 filtered replications, you only need to read the changes feed once instead of 200 times in parallel, which would otherwise perform very poorly because of disk seek times. Such a migration script is only a few lines of code.

The _changes feed also lets you catch up after an initial split: just log the seq number you passed so you know where you left off and can start over from there.

- mathias

On Jul 6, 2012, at 3:37 , Andreas Kemkes wrote:

> I'm trying to split up a monolithic database into smaller ones using filtered
> continuous replications in couchdb 1.2.
>
> I need about 200 of these replications (on a single server) and would like to
> parallelize as much as possible. Yet, when I do, the cpu load gets very high
> and the system seems to be crawling, replication seems to be slow, and I'm
> seeing timeout and other errors.
>
> How can I best determine what the bottleneck is?
>
> Are there suggestions on how to configure couchdb to handle it better (I've
> increased max_dbs_open to 200)?
>
> How do I best achieve good throughput?
>
> This will be a one-time task, so any large measurement / monitoring effort is
> probably overkill.
>
> Any suggestions are much appreciated (including suggestions for different
> approaches).
>
> Thanks,
>
> Andreas
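[For the catch-up part Mathias describes, I assume all that is needed is to persist the last processed seq number; a minimal sketch, again in Python, with a made-up checkpoint file name:]

    import json, os

    CHECKPOINT = "split_checkpoint.json"   # made-up file name

    def load_since():
        # Resume from the last processed update sequence, or start at 0.
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return json.load(f)["last_seq"]
        return 0

    def save_since(last_seq):
        # Record how far the split has progressed after each _changes batch.
        with open(CHECKPOINT, "w") as f:
            json.dump({"last_seq": last_seq}, f)

Calling save_since(resp["last_seq"]) after each batch in the sketch above would make the script restartable.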
