Hi All,

I am experiencing a problem with both CouchDB version 0.10 and, since 
upgrading, version 0.11.

We have a process that generates lots of small PNG images (~12 KB each) and 
uploads them into CouchDB as document attachments. The process kicks off about 
3000 of these at a specific time of day, and the run takes about half an hour.

Unfortunately, sometimes during this loaded period CouchDB crashes and appears 
to restart. Replication is then offline, and the daemon process temporarily 
loses its connection to Couch for a few seconds during the restart.

Here's the command we are using (based on the supplied init.d file), as it 
appears in the ps listing:

couchdb  24095  0.0  0.0   4020   644 ?        S    Jun13   0:00 /bin/sh -e /opt/couchdb/bin/couchdb -a /opt/couchdb/etc/couchdb/default.ini -a /opt/couchdb/etc/couchdb/local.ini -b -r 5 -p /var/couchdb/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb  24105  0.0  0.0   4020   356 ?        S    Jun13   0:00  \_ /bin/sh -e /opt/couchdb/bin/couchdb -a /opt/couchdb/etc/couchdb/default.ini -a /opt/couchdb/etc/couchdb/local.ini -b -r 5 -p /var/couchdb/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb  24106  1.8  0.2 330228 41784 ?        Sl   Jun13  22:56      \_ /opt/erlang_R13B03/lib/erlang/erts-5.7.4/bin/beam.smp -Bd -K true -- -root /opt/erlang_R13B03/lib/erlang -progname erl -- -home /home/couchdb -- -noshell -noinput -sasl errlog_type error -couch_ini /opt/couchdb/etc/couchdb/default.ini /opt/couchdb/etc/couchdb/local.ini /opt/couchdb/etc/couchdb/default.ini /opt/couchdb/etc/couchdb/local.ini -s couch -pidfile /var/couchdb/run/couchdb/couchdb.pid -heart
couchdb  24122  0.0  0.0   3784   504 ?        Ss   Jun13   0:00          \_ heart -pid 24106 -ht 11
couchdb  24127  0.0  0.0  10640   524 ?        Ss   Jun13   0:00          \_ inet_gethost 4
couchdb  24128  0.0  0.0  12736   628 ?        S    Jun13   0:00              \_ inet_gethost 4
couchdb  24638  0.0  0.0  12736   624 ?        S    Jun13   0:00              \_ inet_gethost 4

During one of these crashes we see that process 24106 and everything below it 
restarts, while the processes above it still show their original start date 
(Jun13, for instance).

There is nothing in the CouchDB logs that I can find.

This happens with fsync per commit both on and off.

Questions

1 - What's the best way to find out why the crash is occurring? Should I be 
running without the -o /dev/null -e /dev/null -R options?

2 - Does anyone know why Couch would be crashing under load?

3 - Would it be wise to try trunk instead of 0.11?

I'm sure we can alleviate the load and speed things up greatly using batch 
inserts, but I still don't feel comfortable seeing Couch restart itself when 
it hits a little write load.
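
For reference, the batching I have in mind would look roughly like the sketch 
below: build one JSON body per batch for a POST to the database's _bulk_docs 
endpoint, with each PNG embedded as a base64 inline attachment. This is only a 
sketch and has not been run against our server. The document ids, the 
attachment name "image.png", and the batch size of 100 are placeholders I made 
up for illustration, and the actual HTTP POST is left out.

```python
import base64
import json


def build_bulk_docs_payload(images, content_type="image/png"):
    """Return the JSON body for one POST to /<db>/_bulk_docs.

    `images` maps a document id (hypothetical naming) to raw PNG bytes.
    Each image is embedded as a base64-encoded inline attachment.
    """
    docs = []
    for doc_id, png_bytes in images.items():
        docs.append({
            "_id": doc_id,
            "_attachments": {
                # "image.png" is a placeholder attachment name.
                "image.png": {
                    "content_type": content_type,
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                }
            },
        })
    return json.dumps({"docs": docs})


def batches(images, size=100):
    """Split the id -> bytes mapping into dicts of at most `size` entries."""
    items = list(images.items())
    for start in range(0, len(items), size):
        yield dict(items[start:start + size])


if __name__ == "__main__":
    # Fake image data standing in for the real ~12 KB PNGs.
    fake_images = {"img-%04d" % i: b"\x89PNG fake" for i in range(250)}
    for batch in batches(fake_images, size=100):
        body = build_bulk_docs_payload(batch)
        print(len(batch), "docs,", len(body), "bytes of JSON")
```

With about 3000 images a day, batches of 100 would turn the run into roughly 
30 requests instead of 3000.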

I really like CouchDB. It's a great solution to the replication problem it's 
being used to solve here, and I'm keen to work out what's going on so we can 
keep using it.

Cheers
Carl.
