Hi Andreas,

If you want to split one large database into many smaller ones as a 
one-time task, it's probably more efficient to write a script that reads the 
_changes feed of the large database and decides where to put each 
document. Compared to 200 filtered replications, you only need to read the 
changes feed once instead of 200 times in parallel, which would otherwise give 
very poor performance because of disk seek times…

Such a migration script is only a few lines of code. The _changes feed also 
lets you catch up after an initial split; you just need to log the last seq 
number you processed so you know where you left off and can resume from there.
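For illustration, here is a minimal sketch of such a script against CouchDB's 
HTTP API, assuming Python with the requests library; the database names, the 
seq file, and the route_doc() routing function are hypothetical placeholders 
you would replace with your own logic:

    # Minimal sketch of a one-time split script (assumptions: CouchDB on
    # localhost:5984, the "requests" library, and placeholder routing logic).
    import json
    import requests

    COUCH = "http://localhost:5984"
    SOURCE = "monolithic"          # hypothetical source database name
    SEQ_FILE = "last_seq.txt"      # where we log the last processed seq

    def route_doc(doc):
        """Decide which target database a document belongs to (placeholder)."""
        return "split_" + doc.get("type", "misc")

    def read_last_seq():
        try:
            with open(SEQ_FILE) as f:
                return f.read().strip()
        except IOError:
            return 0               # first run: start from the beginning

    def write_last_seq(seq):
        with open(SEQ_FILE, "w") as f:
            f.write(str(seq))

    def split():
        # Read the _changes feed once, with the full documents included,
        # starting after the last seq we already handled.
        resp = requests.get("%s/%s/_changes" % (COUCH, SOURCE),
                            params={"include_docs": "true",
                                    "since": read_last_seq()})
        resp.raise_for_status()
        changes = resp.json()

        for row in changes["results"]:
            if row.get("deleted"):
                continue           # skip deletions for a one-time copy
            doc = row["doc"]
            target = route_doc(doc)
            requests.put("%s/%s" % (COUCH, target))   # create target; 412 if it exists
            body = dict(doc)
            body.pop("_rev", None)  # drop the source revision so the target accepts it
            requests.put("%s/%s/%s" % (COUCH, target, doc["_id"]),
                         data=json.dumps(body),
                         headers={"Content-Type": "application/json"})

        # Log where we stopped so a later run can catch up from here.
        write_last_seq(changes["last_seq"])

    if __name__ == "__main__":
        split()

Running it again later picks up only the changes made since the logged seq, 
which is what lets you do the catch-up pass after the initial split.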

- mathias

On Jul 6, 2012, at 3:37, Andreas Kemkes wrote:

> I'm trying to split up a monolithic database into smaller ones using filtered 
> continuous replications in couchdb 1.2.
> 
> I need about 200 of these replications (on a single server) and would like to 
> parallelize as much as possible.  Yet, when I do, the cpu load gets very high 
> and the system seems to be crawling, replication seems to be slow, and I'm 
> seeing timeout and other errors.
> 
> How can I best determine what the bottleneck is?
> 
> Are there suggestions on how to configure couchdb to handle it better (I've 
> increased max_dbs_open to 200)?
> 
> How do I best achieve good throughput?
> 
> This will be a one-time task, so any large measurement / monitoring effort is 
> probably overkill.
> 
> Any suggestions are much appreciated (including suggestions for different 
> approaches).
> 
> Thanks,
> 
> Andreas
