We've run into this twice in production in the last two days. We are running CouchDB 1.1 on an EC2 AMI with multi-master replication across two regions. Every now and then CouchDB simply pegs the CPU at 100%, takes up 50% of total memory, and stops responding entirely. So far the logs only show sporadic replication errors; one of the stack traces (failed to replicate after 10 times) is about 500,000 lines long. We are using the _replicator database.
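With the _replicator database, each replication is a document, and CouchDB writes its progress back into that document (for example `_replication_state` becomes `error` once retries are exhausted). A watcher that re-kicks errored replications could look roughly like the sketch below. It is a hypothetical illustration, not the worker described in this post: the base URL, the helper names, the pause, and the assumption that deleting an errored doc and re-creating it without `_rev` and the `_replication_*` fields makes the 1.1 replicator start over are all mine.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical base URL of the CouchDB node that owns the _replicator database.
COUCH = "http://localhost:5984"

def needs_kick(doc):
    """True if CouchDB has marked this replication document as errored."""
    return doc.get("_replication_state") == "error"

def fresh_copy(doc):
    """Copy of the doc minus _rev and the _replication_* bookkeeping fields,
    i.e. just the replication definition (source, target, continuous, ...)."""
    return {k: v for k, v in doc.items()
            if k != "_rev" and not k.startswith("_replication")}

def errored_docs(feed):
    """Yield non-deleted, errored docs from one _changes response
    (fetched with include_docs=true)."""
    for row in feed.get("results", []):
        doc = row.get("doc")
        if doc and not row.get("deleted") and needs_kick(doc):
            yield doc

def _request(method, path, body=None):
    """Send one JSON request to CouchDB and return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(COUCH + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def kick(doc):
    """Restart a replication by deleting its doc and re-creating it.
    Assumption: a fresh doc makes the 1.1 replicator start over."""
    path = "/_replicator/" + urllib.parse.quote(doc["_id"], safe="")
    _request("DELETE", path + "?rev=" + urllib.parse.quote(doc["_rev"]))
    _request("PUT", path, fresh_copy(doc))

def watch(since=0, pause=5.0):
    """Long-poll the _replicator changes feed forever, kicking errored docs."""
    while True:
        feed = _request("GET", "/_replicator/_changes?feed=longpoll"
                               "&include_docs=true&since=" + str(since))
        for doc in errored_docs(feed):
            time.sleep(pause)  # back off a little so a flapping link isn't hammered
            kick(doc)
        since = feed["last_seq"]
```

Calling `watch()` runs forever; the pure helpers (`needs_kick`, `fresh_copy`, `errored_docs`) are split out so the restart decision can be checked without a live server. The pause is there because blindly restarting a replication that keeps failing may itself add load.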
Anyone else running into this? Since 1.1 doesn't have the try-until-infinity-and-beyond mode, we have a worker task that watches _replication_state and kicks the replicator as soon as it errors out. Are there any settings in terms of replicator memory usage, etc. that could help us?
Thanks!
K.
---
http://blog.mudynamics.com
http://blitz.io
@pcapr
