Yeah, we've been using monit quite happily, as well:

check process couchdb with pidfile /usr/local/var/run/couchdb/couchdb.pid
  group database
  start program = "/etc/init.d/couchdb start"
  stop program = "/etc/init.d/couchdb stop"
  if failed host 127.0.0.1 port 5984 then restart
  if cpu is greater than 40% for 2 cycles then alert
  if cpu > 60% for 5 cycles then restart
  if 10 restarts within 10 cycles then timeout
  depends on data_fs

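A dependency such as data_fs has to be declared as its own monit check. A minimal sketch of what that could look like, assuming the EBS volume is attached at /dev/sdf; the device path and both thresholds are hypothetical and not taken from this setup (older monit releases spell the check type "check device"):

```
# Hypothetical filesystem check; /dev/sdf and the 80% limits are assumptions.
check filesystem data_fs with path /dev/sdf
  group database
  if space usage > 80% then alert
  if inode usage > 80% then alert
```

Because the couchdb check lists data_fs as a dependency, monit takes this check into account when it starts and stops couchdb.
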
You'll note that it depends on data_fs, which is an Amazon EBS drive that is
also monitored. Furthermore, we can be notified of high CPU usage for
traffic spikes...

On Mon, Oct 5, 2009 at 1:56 PM, Nicholas Orr <[email protected]> wrote:

> I've changed mine to do the -r 5 and to send an alert if it is not running.
> as long as -r 5 does what it is suppose to do everything will be ok
> if it fails at least I'll know about is - this is where monit is useful, no
> matter how smart/capable an erlang app is "suppose" to be, I'd like to know
> if it goes down :)
>
> Nick
>
> On Tue, Oct 6, 2009 at 4:48 AM, Robert Newson <[email protected]> wrote:
>
>> Understood. All I'm saying is that Erlang applications should already
>> have rich support for process restarting, heartbeat/keep-alive.
>>
>> monit is a generic wrapper to add those things when they are absent. A
>> correctly configured Erlang application shouldn't need monit, imo.
>>
>> B.
>>
>> On Mon, Oct 5, 2009 at 6:40 PM, Francisco Viramontes <[email protected]>
>> wrote:
>> > I dunno but I tried with the respawn parameter for couchdb command in Gentoo
>> > but it did not work. Also I have other services setup with monit so its more
>> > convenient for me to have everything in one place.
>> >
>> > PAco
>> > On Oct 5, 2009, at 12:22 PM, Robert Newson wrote:
>> >
>> >> Isn't couchdb (at least in the Debian package) monitored by heart?
>> >>
>> >> B.
>> >>
>> >> On Mon, Oct 5, 2009 at 6:05 PM, Nicholas Orr <[email protected]>
>> >> wrote:
>> >>>
>> >>> great!
>> >>> i was wondering what to put for the "test" conditions.
>> >>> Yours work well, so thanks to you as well ;)
>> >>>
>> >>> Nick
>> >>>
>> >>> On Tue, Oct 6, 2009 at 4:01 AM, Francisco Viramontes
>> >>> <[email protected]> wrote:
>> >>>
>> >>>> Nicholas
>> >>>>
>> >>>> Thanks man it worked I had been banging on my head for a week because of
>> >>>> this
>> >>>>
>> >>>> my final monit scipt is
>> >>>>
>> >>>> check process couchdb
>> >>>> with pidfile /var/run/couchdb/couchdb.pid
>> >>>> #start program = "/etc/init.d/couchdb start"
>> >>>> #stop program = "/etc/init.d/couchdb stop"
>> >>>> start program = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid"
>> >>>> stop program = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid -d"
>> >>>> if failed host 127.0.0.1 port 5984 then restart
>> >>>> if failed url http://localhost:5984/ and content == '"couchdb"' then restart
>> >>>> group couchdb
>> >>>>
>> >>>> PAco
>> >>>>
>> >>>>
>> >>>> On Oct 5, 2009, at 2:45 AM, Nicholas Orr wrote:
>> >>>>
>> >>>> My monit script is verbatim, as monit is run as root I want couchdb
>> >>>>>
>> >>>>> run as couchdb so do the following
>> >>>>>
>> >>>>> check process couchdb with pidfile /var/run/couchdb/couchdb.pid
>> >>>>> start program = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid"
>> >>>>> stop program = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid -d"
>> >>>>>
>> >>>>> try that and see what happens...
>> >>>>>
>> >>>>> On Mon, Oct 5, 2009 at 7:49 AM, Francisco Viramontes <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hey Guys
>> >>>>>>
>> >>>>>> has anyone tried to monitor couch with monit?
>> >>>>>>
>> >>>>>> I am using this settings and monit successfully monitors but when couchdb
>> >>>>>> dies it fails to restart the service and I can find out why
>> >>>>>>
>> >>>>>> here is my couchdb.monitrc file:
>> >>>>>>
>> >>>>>> check process couchdb
>> >>>>>> with pidfile /var/run/couchdb/couchdb.pid
>> >>>>>> start program = "/etc/init.d/couchdb start"
>> >>>>>> stop program = "/etc/init.d/couchdb stop"
>> >>>>>> if failed host 127.0.0.1 port 5984 then restart
>> >>>>>> if failed url http://localhost:5984/ and content == '"couchdb"' then restart
>> >>>>>> group couchdb
>> >>>>>>
>> >>>>>> BTW I am using couch 0.9.1 and about once a day it dies on me the only thing
>> >>>>>> I get from the log are strange erlang error messages saying OS procees
>> >>>>>> timeout, anyone know whats that about?
>> >>>>>>
>> >>>>>> PAco
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>
>> >
>> >
>> >
