Hi all, I am experiencing a problem with CouchDB 0.10 and, since upgrading, with 0.11 as well.
We have a process that generates lots of small PNG images (~12 KB each) and uploads them into CouchDB as document attachments. The process kicks off about 3000 of these uploads at a specific time of day, and the run takes about half an hour. Unfortunately, sometimes during this loaded period CouchDB crashes and appears to restart. Replication is then offline, and our daemon process temporarily loses its connection to CouchDB for a few seconds during the restart.

Here is the command we are using (based on the supplied init.d file), as it appears in the ps list:

couchdb 24095 0.0 0.0 4020 644 ? S Jun13 0:00 /bin/sh -e /opt/couchdb/bin/couchdb -a /opt/couchdb/etc/couchdb/default.ini -a /opt/couchdb/etc/couchdb/local.ini -b -r 5 -p /var/couchdb/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb 24105 0.0 0.0 4020 356 ? S Jun13 0:00 \_ /bin/sh -e /opt/couchdb/bin/couchdb -a /opt/couchdb/etc/couchdb/default.ini -a /opt/couchdb/etc/couchdb/local.ini -b -r 5 -p /var/couchdb/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb 24106 1.8 0.2 330228 41784 ? Sl Jun13 22:56 \_ /opt/erlang_R13B03/lib/erlang/erts-5.7.4/bin/beam.smp -Bd -K true -- -root /opt/erlang_R13B03/lib/erlang -progname erl -- -home /home/couchdb -- -noshell -noinput -sasl errlog_type error -couch_ini /opt/couchdb/etc/couchdb/default.ini /opt/couchdb/etc/couchdb/local.ini /opt/couchdb/etc/couchdb/default.ini /opt/couchdb/etc/couchdb/local.ini -s couch -pidfile /var/couchdb/run/couchdb/couchdb.pid -heart
couchdb 24122 0.0 0.0 3784 504 ? Ss Jun13 0:00 \_ heart -pid 24106 -ht 11
couchdb 24127 0.0 0.0 10640 524 ? Ss Jun13 0:00 \_ inet_gethost 4
couchdb 24128 0.0 0.0 12736 628 ? S Jun13 0:00 \_ inet_gethost 4
couchdb 24638 0.0 0.0 12736 624 ? S Jun13 0:00 \_ inet_gethost 4

During one of these crashes, process 24106 and the processes below it restart, while the processes above it keep their original start time (Jun13 in this listing). I can't find anything relevant in the CouchDB logs. This happens with fsync per commit both on and off.
Questions:

1 - What's the best way to find out why the crash is occurring? Should I be running without the -o /dev/null -e /dev/null -R options?
2 - Does anyone know why Couch would be crashing under load?
3 - Would it be wise to try trunk instead of 0.11?

I'm sure we can alleviate the load and speed things up greatly using batch inserts, but I still don't feel comfortable seeing Couch restart itself when it hits a little write load. I really like CouchDB. It's a great solution to the replication problem it's being used to solve here, and I'm keen to work out what's going on so we can keep using it.

Cheers
Carl.
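P.S. For reference, the batch insert idea mentioned above might look roughly like the sketch below. It is only a sketch, not what we run today: it assumes the images can be sent as inline base64 attachments through CouchDB's _bulk_docs endpoint, and the document ids, file names, batch size, server URL and database name are all made up for illustration.

```python
import base64
import json
import urllib.request


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_payload(images):
    """Build a _bulk_docs request body from (doc_id, filename, png_bytes) tuples.

    Each image becomes one document carrying a single inline attachment.
    The doc ids and file names are hypothetical.
    """
    docs = []
    for doc_id, filename, png_bytes in images:
        docs.append({
            "_id": doc_id,
            "_attachments": {
                filename: {
                    "content_type": "image/png",
                    # Inline attachments are sent as base64 text.
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                }
            },
        })
    return json.dumps({"docs": docs})


def post_bulk(base_url, db_name, payload):
    """POST one batch to <base_url>/<db_name>/_bulk_docs and return the parsed reply."""
    request = urllib.request.Request(
        "%s/%s/_bulk_docs" % (base_url, db_name),
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def upload_all(base_url, db_name, images, batch_size=100):
    """Upload all images in batches; batch_size=100 is an arbitrary guess."""
    results = []
    for batch in chunked(list(images), batch_size):
        results.extend(post_bulk(base_url, db_name, build_bulk_payload(batch)))
    return results
```

With a batch size of 100, the ~3000 uploads would become roughly 30 requests, e.g. upload_all("http://localhost:5984", "images", images), where both the URL and the database name are placeholders.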
