Thanks, Nathan. That startup script is long overdue for removal from the repo.

My recommendation is to run couch directly under a process supervisor, e.g. 
runit or supervisord, given the crash-only design. Erlang's heartbeat setup 
just doesn't do a good enough job. Any reason you wouldn't replace your 
while/sleep/run loop with runit?
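
For anyone who hasn't used runit: a service is a directory holding an executable `run` script that execs the daemon in the foreground, and runsv restarts it whenever it exits. A rough sketch of setting one up, assuming a made-up helper name (`write_run_script`) and service directory name (`couchdb-service`), with the couchdb command line taken from Nathan's loop quoted below:

```shell
#!/bin/sh
# Sketch only: write a runit-style service directory for CouchDB.
# write_run_script and the directory name are hypothetical; the couchdb
# command line is the one from the quoted message.

write_run_script() {
    svc_dir="$1"
    mkdir -p "$svc_dir" || return 1
    # Quoted heredoc delimiter: the script text is written out verbatim.
    cat > "$svc_dir/run" <<'EOF'
#!/bin/sh
# Send stderr to stdout so a single logger sees everything.
exec 2>&1
# No -b or -r: stay in the foreground and let runit do the restarting.
# Paths are relative as in the original message; runsv starts ./run from
# the service directory, so absolute paths are needed in practice.
exec bc3/build/bin/couchdb -n \
    -a bc3/build/etc/couchdb/default.ini \
    -a production_couch/local.ini
EOF
    chmod +x "$svc_dir/run"
}

# Example: write_run_script ./couchdb-service
```

Where the service directory then gets linked (often /etc/service or /var/service) varies by distribution, so check your runit install.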

-Joan

----- Original Message -----
From: "Nathan Vander Wilt" <[email protected]>
To: [email protected]
Sent: Wednesday, October 29, 2014 6:30:09 PM
Subject: Re: Trying to get to bottom of another CouchDB crash scenario

Hmm, actually the "timeout" suggestion is a key difference between the linked 
log and mine, which has "noproc" there instead.

Changing the way I start CouchDB from:

    bc3/build/bin/couchdb -b -r 5 -o /dev/null -e production_couch/couch.stderr \
        -p production_couch/couch.pid -n -a bc3/build/etc/couchdb/default.ini \
        -a production_couch/local.ini


To:

    while sleep 5; do bc3/build/bin/couchdb -n \
        -a bc3/build/etc/couchdb/default.ini -a production_couch/local.ini; done


…and I have not seen any further restarts after an hour. (Yes, it is still 
writing to its logfile… ;-)
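
A variant of the loop above that also records when and how each run exited might look like the following. This is only a sketch: `run_forever`, `RESTART_DELAY` and `MAX_RUNS` are made-up names (not couchdb options), and unlike the one-liner it starts the command before the first sleep.

```shell
#!/bin/sh
# Restart-loop sketch: run a command in the foreground, log each exit
# status with a UTC timestamp on stderr, then sleep before restarting.
# RESTART_DELAY and MAX_RUNS are hypothetical knobs for this sketch.

run_forever() {
    delay="${RESTART_DELAY:-5}"   # seconds to wait between runs
    max_runs="${MAX_RUNS:-0}"     # 0 means loop without limit
    runs=0
    while :; do
        "$@"
        status=$?
        runs=$((runs + 1))
        echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') exited with status $status (run $runs)" >&2
        if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
            return "$status"
        fi
        sleep "$delay"
    done
}

# Example, with the paths from the loop above:
# run_forever bc3/build/bin/couchdb -n \
#     -a bc3/build/etc/couchdb/default.ini -a production_couch/local.ini
```

The timestamped exit lines at least show how often the restarts happen, even when the CouchDB log itself only shows the "has started" line.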

Maybe these notes can help somebody. I know I am pretty much cured of using the 
wrapper script's "-b" and "-r" options now; trusting that script to keep Couch 
running was one cause of a much longer production outage where the restart never 
happened either (see OP way below…).

tired of this,
-natevw


On Oct 29, 2014, at 2:12 PM, Nathan Vander Wilt <[email protected]> 
wrote:

> I've now got a huge rash of crashes on another (slightly less critical) 
> production server. Some are similar to this. I did find one thread, which 
> seems about as inconclusive but does kind of match and at least has a few 
> sort of cryptic suggestions:
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201408.mbox/%3C3B310764-6F57-4208-ADEC-381D23CC170B%40apache.org%3E
> 
> The "nice" thing about the current situation is that it stays up for fifteen 
> minutes tops before crashing. So I flipped debug logging on. Didn't get much 
> more (most crashes are signalled by nothing other than the "CouchDB has 
> started" log line), but here's one with a bit more info. I'll try to figure out 
> which vhosts setting is being referred to in the linked thread and see if it 
> has any effect.
> 
> -nvw
> 
> 
> [Wed, 29 Oct 2014 19:54:30 GMT] [debug] [<0.444.0>] OAuth Params: []
> [Wed, 29 Oct 2014 19:54:30 GMT] [debug] [<0.102.0>] DDocProc found for DDocKey: {<<"_design/ipcalf">>,
>                                                  <<"46-bb2d975f3e712e0077c884153c48ad09">>}
> [Wed, 29 Oct 2014 19:54:31 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Input  :: ["reset",{"reduce_limit":true,"timeout":5000}]
> [Wed, 29 Oct 2014 19:54:32 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Output :: true
> [Wed, 29 Oct 2014 19:54:32 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Input  :: ["ddoc","_design/ipcalf",["shows","address"],[null,{"info":{"db_name":"public","doc_count":60,"doc_del_count":2,"update_seq":184,"purge_seq":0,"compact_running":false,"disk_size":16244847,"data_size":16042203,"instance_start_time":"1414612231302700","disk_format_version":6,"committed_update_seq":184},"id":null,"uuid":"b9fd1dce73cc155b922bda0c230091de","method":"GET","requested_path":[],"path":["public","_design","ipcalf","_show","address"],"raw_path":"/public/_design/ipcalf/_show/address/","query":{},"headers":{"Connection":"close","Host":"ipcalf.com","User-Agent":"Mozilla/5.0 (compatible; monitis - premium monitoring service; http://www.monitis.com)","x-couchdb-vhost-path":"/","X-Forwarded-For":"174.36.220.194"},"body":"undefined","peer":"174.36.220.194","form":{},"cookie":{},"userCtx":{"db":"public","name":null,"roles":[]},"secObj":{}}]]
> [Wed, 29 Oct 2014 19:54:33 GMT] [debug] [<0.253.0>] OS Process #Port<0.2532> Output :: ["resp",{"headers":{"Access-Control-Allow-Origin":"*","Content-Type":"text/html; charset=utf-8"},"body":"Your IP address is: <h1>174.36.220.194</h1> Have a nice day.\n"}]
> [Wed, 29 Oct 2014 19:54:33 GMT] [info] [<0.444.0>] 174.36.220.194 - - GET /public/_design/ipcalf/_show/address/ 200
> [Wed, 29 Oct 2014 19:55:13 GMT] [error] [<0.107.0>] {error_report,<0.31.0>,
>                      {<0.107.0>,crash_report,
>                       [[{initial_call,
>                          {mochiweb_acceptor,init,
>                           ['Argument__1','Argument__2','Argument__3']}},
>                         {pid,<0.107.0>},
>                         {registered_name,[]},
>                         {error_info,
>                          {exit,
>                           {noproc,
>                            {gen_server,call,[couch_httpd_vhost,get_state]}},
>                           [{gen_server,call,2,
>                             [{file,"gen_server.erl"},{line,180}]},
>                            {couch_httpd_vhost,dispatch_host,1,
>                             [{file,
>                               "/home/ubuntu/bc3/dependencies/couchdb/src/couchdb/couch_httpd_vhost.erl"},
>                              {line,96}]},
>                            {couch_httpd,handle_request,5,
>                             [{file,
>                               "/home/ubuntu/bc3/dependencies/couchdb/src/couchdb/couch_httpd.erl"},
>                              {line,232}]},
>                            {mochiweb_http,headers,5,
>                             [{file,
>                               "/home/ubuntu/bc3/dependencies/couchdb/src/mochiweb/mochiweb_http.erl"},
>                              {line,94}]},
>                            {proc_lib,init_p_do_apply,3,
>                             [{file,"proc_lib.erl"},{line,239}]}]}},
>                         {ancestors,
>                          [couch_httpd,couch_secondary_services,
>                           couch_server_sup,<0.32.0>]},
>                         {messages,[]},
>                         {links,[<0.106.0>,#Port<0.2074>]},
>                         {dictionary,[{couch_rewrite_count,0}]},
>                         {trap_exit,false},
>                         {status,running},
>                         {heap_size,1598},
>                         {stack_size,27},
>                         {reductions,1011}],
>                        []]}}
> [Wed, 29 Oct 2014 19:55:14 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://127.0.0.1:5984/
> 
> 
> 
> 
> 
> 
> 
> On Oct 9, 2014, at 10:33 AM, Nathan Vander Wilt <[email protected]> 
> wrote:
> 
>> Any idea what might have caused the second crash, at bottom of this email? 
>> Yesterday the same CouchDB server went down like this and didn't come back 
>> up:
>> 
>> -- first crash
>>    heart: Wed Oct  8 10:31:25 2014: Erlang has closed.
>>    Segmentation fault (core dumped)
>>    sh: echo: I/O error
>>    heart: Wed Oct  8 10:31:26 2014: Executed "/home/natevw/bc16/build/bin/couchdb -k" -> 256. Terminating.
>> 
>> …which would have been because I was just starting it from crontab and hoping 
>> the `-b -r 5` options would actually work. By today I've got the 
>> daemonization set up more properly, using upstart and its respawn option.
>> 
>> No big outage today, however I did notice another crash in the logs — I'd 
>> like to avoid the daemon restarting at all in routine use if possible. I 
>> don't see anything particularly useful/interesting as to the cause of the 
>> crash…does the backtrace below imply anything in particular?
>> 
>> The main difference the last two days is that this system is now back under 
>> some load (maybe 50 users, up from maybe one or two in preceding weeks). 
>> Right now (under "higher" load) the server is showing "0.00, 0.01, 0.05" 
>> load average and 2.6 of 3.7GB memory free, so it doesn't seem offhand we're 
>> pushing the system too hard. Besides basic reads/writes/view stuff, we still 
>> haven't migrated off use of per-user filtered changes, which is the only 
>> thing I can think might lead to a load-related problem.
>> 
>> thanks,
>> -natevw
>> 
>> 
>> 
>> -- second crash
>> 
>> [Thu, 09 Oct 2014 15:23:24 GMT] [info] [<0.21979.2>] 127.0.0.1 - - GET /production-db/org.couchdb.user%3Au123456 200
>> [Thu, 09 Oct 2014 15:23:26 GMT] [error] [<0.108.0>] {error_report,<0.31.0>,
>>                     {<0.108.0>,crash_report,
>>                      [[{initial_call,
>>                         {mochiweb_acceptor,init,
>>                          ['Argument__1','Argument__2','Argument__3']}},
>>                        {pid,<0.108.0>},
>>                        {registered_name,[]},
>>                        {error_info,
>>                         {exit,
>>                          {noproc,
>>                           {gen_server,call,[couch_httpd_vhost,get_state]}},
>>                          [{gen_server,call,2,
>>                            [{file,"gen_server.erl"},{line,180}]},
>>                           {couch_httpd_vhost,dispatch_host,1,
>>                            [{file,
>>                              "/home/natevw/bc16/dependencies/couchdb/src/couchdb/couch_httpd_vhost.erl"},
>>                             {line,96}]},
>>                           {couch_httpd,handle_request,5,
>>                            [{file,
>>                              "/home/natevw/bc16/dependencies/couchdb/src/couchdb/couch_httpd.erl"},
>>                             {line,217}]},
>>                           {mochiweb_http,headers,5,
>>                            [{file,
>>                              "/home/natevw/bc16/dependencies/couchdb/src/mochiweb/mochiweb_http.erl"},
>>                             {line,94}]},
>>                           {proc_lib,init_p_do_apply,3,
>>                            [{file,"proc_lib.erl"},{line,239}]}]}},
>>                        {ancestors,
>>                         [couch_httpd,couch_secondary_services,
>>                          couch_server_sup,<0.32.0>]},
>>                        {messages,[]},
>>                        {links,[<0.107.0>,#Port<0.2017>]},
>>                        {dictionary,[{couch_rewrite_count,0}]},
>>                        {trap_exit,false},
>>                        {status,running},
>>                        {heap_size,2586},
>>                        {stack_size,27},
>>                        {reductions,1173}],
>>                       []]}}
>> [Thu, 09 Oct 2014 15:23:26 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://127.0.0.1:55984/
>> 
> 
