Thanks, Nathan. That startup script is long overdue for removal from the repo.
My recommendation is to just run couch directly from e.g. runit/supervisord/etc,
given the crash-only design. Erlang's heartbeat setup just doesn't do a good
enough job.

Any reason you wouldn't replace your while/sleep/run loop with runit?

-Joan

----- Original Message -----
From: "Nathan Vander Wilt" <[email protected]>
To: [email protected]
Sent: Wednesday, October 29, 2014 6:30:09 PM
Subject: Re: Trying to get to bottom of another CouchDB crash scenario

Hmm, actually the "timeout" suggestion is a key difference between the linked
log and mine, which has "noproc" there instead.

Changing the way I start CouchDB from:

    bc3/build/bin/couchdb -b -r 5 -o /dev/null -e production_couch/couch.stderr -p production_couch/couch.pid -n -a bc3/build/etc/couchdb/default.ini -a production_couch/local.ini

To:

    while sleep 5; do bc3/build/bin/couchdb -n -a bc3/build/etc/couchdb/default.ini -a production_couch/local.ini; done

…and I have not seen any further restarts after an hour. (Yes, it is still
writing to its logfile… ;-)

Maybe these notes can help somebody. I know I am pretty much cured of using the
wrapper script's "-b" and "-r" options now; trusting that script to keep Couch
working was a cause of a much longer production outage where the restart never
happened either (see OP way below…).

tired of this,
-natevw


On Oct 29, 2014, at 2:12 PM, Nathan Vander Wilt <[email protected]> wrote:

> I've now got a huge rash of crashes on another (slightly less critical)
> production server. Some are similar to this. I did find one thread, which
> seems about as inconclusive but does kind of match and at least has a few
> sort of cryptic suggestions:
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201408.mbox/%3C3B310764-6F57-4208-ADEC-381D23CC170B%40apache.org%3E
>
> The "nice" thing about the current situation is that it stays up for fifteen
> minutes tops before crashing. So I flipped debug logging on. Didn't get much
> more (most crashes are signalled by nothing other than the "CouchDB has
> started" log line), but here's one with a bit more info. I'll try to figure out
> which vhosts setting is being referred to in the linked thread and see if it
> has any effect.
>
> -nvw
>
>
> [Wed, 29 Oct 2014 19:54:30 GMT] [debug] [<0.444.0>] OAuth Params: []
> [Wed, 29 Oct 2014 19:54:30 GMT] [debug] [<0.102.0>] DDocProc found for DDocKey: {<<"_design/ipcalf">>, <<"46-bb2d975f3e712e0077c884153c48ad09">>}
> [Wed, 29 Oct 2014 19:54:31 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Input :: ["reset",{"reduce_limit":true,"timeout":5000}]
> [Wed, 29 Oct 2014 19:54:32 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Output :: true
> [Wed, 29 Oct 2014 19:54:32 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Input :: ["ddoc","_design/ipcalf",["shows","address"],[null,{"info":{"db_name":"public","doc_count":60,"doc_del_count":2,"update_seq":184,"purge_seq":0,"compact_running":false,"disk_size":16244847,"data_size":16042203,"instance_start_time":"1414612231302700","disk_format_version":6,"committed_update_seq":184},"id":null,"uuid":"b9fd1dce73cc155b922bda0c230091de","method":"GET","requested_path":[],"path":["public","_design","ipcalf","_show","address"],"raw_path":"/public/_design/ipcalf/_show/address/","query":{},"headers":{"Connection":"close","Host":"ipcalf.com","User-Agent":"Mozilla/5.0 (compatible; monitis - premium monitoring service; http://www.monitis.com)","x-couchdb-vhost-path":"/","X-Forwarded-For":"174.36.220.194"},"body":"undefined","peer":"174.36.220.194","form":{},"cookie":{},"userCtx":{"db":"public","name":null,"roles":[]},"secObj":{}}]]
> [Wed, 29 Oct 2014 19:54:33 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Output :: ["resp",{"headers":{"Access-Control-Allow-Origin":"*","Content-Type":"text/html; charset=utf-8"},"body":"Your IP address is: <h1>174.36.220.194</h1> Have a nice day.\n"}]
> [Wed, 29 Oct 2014 19:54:33 GMT] [info] [<0.444.0>] 174.36.220.194 - - GET /public/_design/ipcalf/_show/address/ 200
> [Wed, 29 Oct 2014 19:55:13 GMT] [error] [<0.107.0>] {error_report,<0.31.0>,
> {<0.107.0>,crash_report,
> [[{initial_call,
> {mochiweb_acceptor,init,
> ['Argument__1','Argument__2','Argument__3']}},
> {pid,<0.107.0>},
> {registered_name,[]},
> {error_info,
> {exit,
> {noproc,
> {gen_server,call,[couch_httpd_vhost,get_state]}},
> [{gen_server,call,2,
> [{file,"gen_server.erl"},{line,180}]},
> {couch_httpd_vhost,dispatch_host,1,
> [{file,
> "/home/ubuntu/bc3/dependencies/couchdb/src/couchdb/couch_httpd_vhost.erl"},
> {line,96}]},
> {couch_httpd,handle_request,5,
> [{file,
> "/home/ubuntu/bc3/dependencies/couchdb/src/couchdb/couch_httpd.erl"},
> {line,232}]},
> {mochiweb_http,headers,5,
> [{file,
> "/home/ubuntu/bc3/dependencies/couchdb/src/mochiweb/mochiweb_http.erl"},
> {line,94}]},
> {proc_lib,init_p_do_apply,3,
> [{file,"proc_lib.erl"},{line,239}]}]}},
> {ancestors,
> [couch_httpd,couch_secondary_services,
> couch_server_sup,<0.32.0>]},
> {messages,[]},
> {links,[<0.106.0>,#Port<0.2074>]},
> {dictionary,[{couch_rewrite_count,0}]},
> {trap_exit,false},
> {status,running},
> {heap_size,1598},
> {stack_size,27},
> {reductions,1011}],
> []]}}
> [Wed, 29 Oct 2014 19:55:14 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://127.0.0.1:5984/
>
>
> On Oct 9, 2014, at 10:33 AM, Nathan Vander Wilt <[email protected]> wrote:
>
>> Any idea what might have caused the second crash, at bottom of this email?
>> Yesterday the same CouchDB server went down like this and didn't come back
>> up:
>>
>> -- first crash
>> heart: Wed Oct 8 10:31:25 2014: Erlang has closed.
>> Segmentation fault (core dumped)
>> sh: echo: I/O error
>> heart: Wed Oct 8 10:31:26 2014: Executed "/home/natevw/bc16/build/bin/couchdb -k" -> 256. Terminating.
>>
>> …which have been because I was just starting it from crontab and hoping the
>> `-b -r 5` options would actually work. By today I've got the daemonization
>> more properly set up, using upstart and its respawn option.
>>
>> No big outage today; however, I did notice another crash in the logs. I'd
>> like to avoid the daemon restarting at all in routine use if possible. I
>> don't see anything particularly useful/interesting as to the cause of the
>> crash…does the backtrace below imply anything in particular?
>>
>> The main difference the last two days is that this system is now back under
>> some load (maybe 50 users, up from maybe one or two in preceding weeks).
>> Right now (under "higher" load) the server is showing "0.00, 0.01, 0.05"
>> load average and 2.6 of 3.7GB memory free, so it doesn't seem offhand we're
>> pushing the system too hard. Besides basic reads/writes/view stuff, we still
>> haven't migrated off use of per-user filtered changes, which is the only
>> thing I can think might lead to a load-related problem.
>>
>> thanks,
>> -natevw
>>
>>
>> -- second crash
>>
>> [Thu, 09 Oct 2014 15:23:24 GMT] [info] [<0.21979.2>] 127.0.0.1 - - GET /production-db/org.couchdb.user%3Au123456 200
>> [Thu, 09 Oct 2014 15:23:26 GMT] [error] [<0.108.0>] {error_report,<0.31.0>,
>> {<0.108.0>,crash_report,
>> [[{initial_call,
>> {mochiweb_acceptor,init,
>> ['Argument__1','Argument__2','Argument__3']}},
>> {pid,<0.108.0>},
>> {registered_name,[]},
>> {error_info,
>> {exit,
>> {noproc,
>> {gen_server,call,[couch_httpd_vhost,get_state]}},
>> [{gen_server,call,2,
>> [{file,"gen_server.erl"},{line,180}]},
>> {couch_httpd_vhost,dispatch_host,1,
>> [{file,
>> "/home/natevw/bc16/dependencies/couchdb/src/couchdb/couch_httpd_vhost.erl"},
>> {line,96}]},
>> {couch_httpd,handle_request,5,
>> [{file,
>> "/home/natevw/bc16/dependencies/couchdb/src/couchdb/couch_httpd.erl"},
>> {line,217}]},
>> {mochiweb_http,headers,5,
>> [{file,
>> "/home/natevw/bc16/dependencies/couchdb/src/mochiweb/mochiweb_http.erl"},
>> {line,94}]},
>> {proc_lib,init_p_do_apply,3,
>> [{file,"proc_lib.erl"},{line,239}]}]}},
>> {ancestors,
>> [couch_httpd,couch_secondary_services,
>> couch_server_sup,<0.32.0>]},
>> {messages,[]},
>> {links,[<0.107.0>,#Port<0.2017>]},
>> {dictionary,[{couch_rewrite_count,0}]},
>> {trap_exit,false},
>> {status,running},
>> {heap_size,2586},
>> {stack_size,27},
>> {reductions,1173}],
>> []]}}
>> [Thu, 09 Oct 2014 15:23:26 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://127.0.0.1:55984/
