Hmm, actually the "timeout" suggestion is a key difference between the linked
log and mine, which has "noproc" there instead.
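
To check which of the two variants a given log actually contains, something
like the following could work. It is only a minimal sketch: `count_reason` is a
name made up here, the log path is whatever your own config points at (it is
not shown in this thread), and it counts every line naming that reason, not
only the ones followed by the couch_httpd_vhost call.

```shell
# Count log lines that name a given exit reason, e.g. "{noproc," or "{timeout,".
# count_reason is a hypothetical helper; pass the reason and a log file path.
count_reason() {
    reason=$1
    logfile=$2
    # -F matches the text literally; -c prints the number of matching lines.
    # Note that grep exits with status 1 when that number is 0.
    grep -c -F -- "{$reason," "$logfile"
}

# Usage (substitute your own log path):
# count_reason noproc path/to/couch.log
# count_reason timeout path/to/couch.log
```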
Changing the way I start CouchDB from:

    bc3/build/bin/couchdb -b -r 5 -o /dev/null -e production_couch/couch.stderr \
        -p production_couch/couch.pid -n -a bc3/build/etc/couchdb/default.ini \
        -a production_couch/local.ini

To:

    while sleep 5; do bc3/build/bin/couchdb -n \
        -a bc3/build/etc/couchdb/default.ini -a production_couch/local.ini; done

…and I have not seen any further restarts after an hour. (Yes, it is still
writing to its logfile… ;-)
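
For anyone wanting a record of when and why the process exited, a slightly more
talkative version of that loop might look like the sketch below. Assumptions:
`supervise`, `RESTART_DELAY` and `MAX_RESTARTS` are names invented for this
example (they are not couchdb options), and unlike the one-liner it runs the
command first and sleeps afterwards.

```shell
# Re-run a foreground command whenever it exits, noting the time and exit code.
# RESTART_DELAY (seconds, default 5) and MAX_RESTARTS (default 0 = loop forever)
# are hypothetical knobs for this sketch.
supervise() {
    delay=${RESTART_DELAY:-5}
    max=${MAX_RESTARTS:-0}
    runs=0
    while :; do
        "$@"
        exit_code=$?
        runs=$((runs + 1))
        echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') exited with status $exit_code (run $runs)" >&2
        # Stop once the optional run limit is reached, passing on the last status.
        if [ "$max" -gt 0 ] && [ "$runs" -ge "$max" ]; then
            return "$exit_code"
        fi
        sleep "$delay"
    done
}

# Usage, with the same paths as above:
# supervise bc3/build/bin/couchdb -n -a bc3/build/etc/couchdb/default.ini \
#     -a production_couch/local.ini 2>> production_couch/restarts.log
```

The `restarts.log` name in the usage line is likewise just a placeholder.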
Maybe these notes can help somebody. I know I am pretty much cured of using the
wrapper script's "-b" and "-r" options now; trusting that script to keep Couch
working was a cause of a much longer production outage where the restart never
happened either (see OP way below…).
tired of this,
-natevw
On Oct 29, 2014, at 2:12 PM, Nathan Vander Wilt <[email protected]>
wrote:
> I've now got a huge rash of crashes on another (slightly less critical)
> production server. Some are similar to this. I did find one thread, which
> seems about as inconclusive but does kind of match and at least has a few
> sort of cryptic suggestions:
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201408.mbox/%3C3B310764-6F57-4208-ADEC-381D23CC170B%40apache.org%3E
>
> The "nice" thing about the current situation is that it stays up for fifteen
> minutes tops before crashing. So I flipped debug logging on. Didn't get much
> more (most crashes are signalled by nothing other than the "CouchDB has
> started" log line), but here's one with a bit more info. I'll try to figure
> out which vhosts setting is being referred to in the linked thread and see if
> it has any effect.
>
> -nvw
>
>
> [Wed, 29 Oct 2014 19:54:30 GMT] [debug] [<0.444.0>] OAuth Params: []
> [Wed, 29 Oct 2014 19:54:30 GMT] [debug] [<0.102.0>] DDocProc found for
> DDocKey: {<<"_design/ipcalf">>,
>
> <<"46-bb2d975f3e712e0077c884153c48ad09">>}
> [Wed, 29 Oct 2014 19:54:31 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532>
> Input :: ["reset",{"reduce_limit":true,"timeout":5000}]
> [Wed, 29 Oct 2014 19:54:32 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532>
> Output :: true
> [Wed, 29 Oct 2014 19:54:32 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532>
> Input ::
> ["ddoc","_design/ipcalf",["shows","address"],[null,{"info":{"db_name":"public","doc_count":60,"doc_del_count":2,"update_seq":184,"purge_seq":0,"compact_running":false,"disk_size":16244847,"data_size":16042203,"instance_start_time":"1414612231302700","disk_format_version":6,"committed_update_seq":184},"id":null,"uuid":"b9fd1dce73cc155b922bda0c230091de","method":"GET","requested_path":[],"path":["public","_design","ipcalf","_show","address"],"raw_path":"/public/_design/ipcalf/_show/address/","query":{},"headers":{"Connection":"close","Host":"ipcalf.com","User-Agent":"Mozilla/5.0
> (compatible; monitis - premium monitoring service;
> http://www.monitis.com)","x-couchdb-vhost-path":"/","X-Forwarded-For":"174.36.220.194"},"body":"undefined","peer":"174.36.220.194","form":{},"cookie":{},"userCtx":{"db":"public","name":null,"roles":[]},"secObj":{}}]]
> [Wed, 29 Oct 2014 19:54:33 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532>
> Output ::
> ["resp",{"headers":{"Access-Control-Allow-Origin":"*","Content-Type":"text/html;
> charset=utf-8"},"body":"Your IP address is: <h1>174.36.220.194</h1> Have a
> nice day.\n"}]
> [Wed, 29 Oct 2014 19:54:33 GMT] [info] [<0.444.0>] 174.36.220.194 - - GET
> /public/_design/ipcalf/_show/address/ 200
> [Wed, 29 Oct 2014 19:55:13 GMT] [error] [<0.107.0>] {error_report,<0.31.0>,
> {<0.107.0>,crash_report,
> [[{initial_call,
> {mochiweb_acceptor,init,
> ['Argument__1','Argument__2','Argument__3']}},
> {pid,<0.107.0>},
> {registered_name,[]},
> {error_info,
> {exit,
> {noproc,
> {gen_server,call,[couch_httpd_vhost,get_state]}},
> [{gen_server,call,2,
> [{file,"gen_server.erl"},{line,180}]},
> {couch_httpd_vhost,dispatch_host,1,
> [{file,
>
> "/home/ubuntu/bc3/dependencies/couchdb/src/couchdb/couch_httpd_vhost.erl"},
> {line,96}]},
> {couch_httpd,handle_request,5,
> [{file,
>
> "/home/ubuntu/bc3/dependencies/couchdb/src/couchdb/couch_httpd.erl"},
> {line,232}]},
> {mochiweb_http,headers,5,
> [{file,
>
> "/home/ubuntu/bc3/dependencies/couchdb/src/mochiweb/mochiweb_http.erl"},
> {line,94}]},
> {proc_lib,init_p_do_apply,3,
> [{file,"proc_lib.erl"},{line,239}]}]}},
> {ancestors,
> [couch_httpd,couch_secondary_services,
> couch_server_sup,<0.32.0>]},
> {messages,[]},
> {links,[<0.106.0>,#Port<0.2074>]},
> {dictionary,[{couch_rewrite_count,0}]},
> {trap_exit,false},
> {status,running},
> {heap_size,1598},
> {stack_size,27},
> {reductions,1011}],
> []]}}
> [Wed, 29 Oct 2014 19:55:14 GMT] [info] [<0.32.0>] Apache CouchDB has started
> on http://127.0.0.1:5984/
>
> On Oct 9, 2014, at 10:33 AM, Nathan Vander Wilt <[email protected]>
> wrote:
>
>> Any idea what might have caused the second crash, at bottom of this email?
>> Yesterday the same CouchDB server went down like this and didn't come back
>> up:
>>
>> -- first crash
>> heart: Wed Oct 8 10:31:25 2014: Erlang has closed.
>> Segmentation fault (core dumped)
>> sh: echo: I/O error
>> heart: Wed Oct 8 10:31:26 2014: Executed
>> "/home/natevw/bc16/build/bin/couchdb -k" -> 256. Terminating.
>>
>> …which would have been because I was just starting it from crontab and
>> hoping the `-b -r 5` options would actually work. By today I've got the
>> daemonization more properly set up, using upstart and its respawn option.
>>
>> No big outage today, however I did notice another crash in the logs — I'd
>> like to avoid the daemon restarting at all in routine use if possible. I
>> don't see anything particularly useful/interesting as to the cause of the
>> crash…does the backtrace below imply anything in particular?
>>
>> The main difference the last two days is that this system is now back under
>> some load (maybe 50 users, up from maybe one or two in preceding weeks).
>> Right now (under "higher" load) the server is showing "0.00, 0.01, 0.05"
>> load average and 2.6 of 3.7GB memory free, so it doesn't seem offhand we're
>> pushing the system too hard. Besides basic reads/writes/view stuff, we still
>> haven't migrated off use of per-user filtered changes, which is the only
>> thing I can think of that might lead to a load-related problem.
>>
>> thanks,
>> -natevw
>>
>>
>>
>> -- second crash
>>
>> [Thu, 09 Oct 2014 15:23:24 GMT] [info] [<0.21979.2>] 127.0.0.1 - - GET
>> /production-db/org.couchdb.user%3Au123456 200
>> [Thu, 09 Oct 2014 15:23:26 GMT] [error] [<0.108.0>] {error_report,<0.31.0>,
>> {<0.108.0>,crash_report,
>> [[{initial_call,
>> {mochiweb_acceptor,init,
>> ['Argument__1','Argument__2','Argument__3']}},
>> {pid,<0.108.0>},
>> {registered_name,[]},
>> {error_info,
>> {exit,
>> {noproc,
>> {gen_server,call,[couch_httpd_vhost,get_state]}},
>> [{gen_server,call,2,
>> [{file,"gen_server.erl"},{line,180}]},
>> {couch_httpd_vhost,dispatch_host,1,
>> [{file,
>>
>> "/home/natevw/bc16/dependencies/couchdb/src/couchdb/couch_httpd_vhost.erl"},
>> {line,96}]},
>> {couch_httpd,handle_request,5,
>> [{file,
>>
>> "/home/natevw/bc16/dependencies/couchdb/src/couchdb/couch_httpd.erl"},
>> {line,217}]},
>> {mochiweb_http,headers,5,
>> [{file,
>>
>> "/home/natevw/bc16/dependencies/couchdb/src/mochiweb/mochiweb_http.erl"},
>> {line,94}]},
>> {proc_lib,init_p_do_apply,3,
>> [{file,"proc_lib.erl"},{line,239}]}]}},
>> {ancestors,
>> [couch_httpd,couch_secondary_services,
>> couch_server_sup,<0.32.0>]},
>> {messages,[]},
>> {links,[<0.107.0>,#Port<0.2017>]},
>> {dictionary,[{couch_rewrite_count,0}]},
>> {trap_exit,false},
>> {status,running},
>> {heap_size,2586},
>> {stack_size,27},
>> {reductions,1173}],
>> []]}}
>> [Thu, 09 Oct 2014 15:23:26 GMT] [info] [<0.32.0>] Apache CouchDB has started
>> on http://127.0.0.1:55984/
>>
>