CouchDB Crashing under high CPU wait

[mRg] Mon, 12 Jul 2010 09:27:18 -0700

Hi all,

I wonder if any of you can help me with a problem that has been plaguing us
for months.


We have CouchDB running on 3 virtualised (VMware) RHEL5 servers, with each
CouchDB instance replicating to the others. The disks for these virtual
machines sit on top of an HP EVA SAN.

We are noticing that the disk latency on the SAN sometimes gets high
(when a lot of snapshots/backups are occurring), which causes high CPU
wait (>4.0) on the machines. At that point CouchDB seems to fail. There is
no error message in the logs; it just stops without warning.

We've turned full debug logging on and there is nothing in any of the logs
showing any kind of CouchDB error, but we're seeing all 3 machines fail at
the same time, which leads us to believe it may be due to the CPU wait / disk
I/O latency issue. We also have Solr running on these boxes and it doesn't
seem to be affected by this.

I was wondering if anyone has seen this issue before and if there is
anything else we can try (other than monitoring whether the process is
still running and restarting it when it is not). Could some other process
on RHEL be killing off the CouchDB process?
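
One way to check that last possibility is to look for kernel OOM-killer
entries in the syslog around the time of a failure. The following is only a
minimal sketch in Python, not something we have run in production. The
function name find_kill_events is made up for illustration, the matched
phrases are an assumption (the exact kernel wording varies between kernel
versions), and "beam" is used as the hint because CouchDB runs inside the
Erlang VM, whose process is usually named beam or beam.smp.

```python
import re

# Phrases that kernel OOM-killer messages commonly contain. The exact wording
# differs between kernel versions, so this pattern is an assumption.
OOM_PATTERN = re.compile(r"oom-killer|out of memory|killed process", re.IGNORECASE)


def find_kill_events(lines, process_hint="beam"):
    """Scan syslog lines for OOM-killer activity.

    Returns a tuple (oom_lines, hinted_lines):
      oom_lines    - every line that looks like an OOM-killer message
      hinted_lines - the subset that also mentions process_hint
    """
    oom_lines = [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
    hinted_lines = [line for line in oom_lines if process_hint in line]
    return oom_lines, hinted_lines
```

To use it, open the syslog file (on RHEL this is normally /var/log/messages)
and pass the file object to find_kill_events. An empty result does not rule
out the process being killed some other way; it only says the kernel did not
log an OOM kill in that file.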

We have tried 0.10.1 and 0.10.2, and have now upgraded Erlang and are
running 0.11, but we are still having exactly the same issues.

Any help would be appreciated. We go live at the end of the month after a
year in development; this is the only outstanding issue, and it is
perplexing to say the least.

Regards

Stephen
