RE: Apache children hanging
On Tue, 13 Jun 2000, Paul G. Weiss wrote:
Yes, that much I knew; however, when you do that you can't use curinfo from .gdbinit, because the process is not running.

% gdb httpd core

That has worked with curinfo for me in the past.
RE: Apache children hanging
On Mon, 12 Jun 2000, Ken Williams wrote:
[EMAIL PROTECTED] (Blue) wrote:
there is a procedure in the mod_peril SUPPORT document that works if you

Uh-oh - is mod_peril a Microsoft product or something? =)

Perhaps the DSO version of mod_perl should be called mod_peril.

Sorry, it's a joke wrt mod_perl + perl 5.6. :)

------
Ken Williams, Last Bastion of Euclidity
[EMAIL PROTECTED], The Math Forum

--
Blue Lang, Unix Systems Admin
QSP, Inc., 3200 Atlantic Ave, Ste 100, Raleigh, NC, 27604
Home: 919 835 1540 Work: 919 875 6994 Fax: 919 872 4015
RE: Apache children hanging
Is there any equivalent procedure for debugging core dumps? I've tried this and it doesn't work because the process is not running. When a process dies it would be nice to know where it was in the Perl stack.

-P

-----Original Message-----
From: Doug MacEachern [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 01, 2000 6:27 PM
To: John Armstrong
Cc: Gustavo Duarte; [EMAIL PROTECTED]
Subject: Re: Apache children hanging

% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

oops, that should be:

% gdb httpd $pid_of_spinning_process
(gdb) source modperl_x.xx/.gdbinit
(gdb) curinfo
Re: Apache children hanging (not exiting)
On Thu, 1 Jun 2000, Jay Jacobs wrote:
With that previous thread of Apache children hanging up the server it made me think of an issue I see quite frequently in development... When I stop the mod_perl server, it won't exit properly (or fast):

[warn] child process 8530 still did not exit, sending a SIGTERM
(multiplied by number of processes)

I've seen this on servers running really sloppy code as well as really tight code (but a lot of it). Eventually the children do die off, but it takes about 15 seconds...

from mod_perl.pod:

=item PERL_DESTRUCT_LEVEL

With Apache versions 1.3.0 and higher, mod_perl will call the perl_destruct() Perl API function during the child exit phase. This will cause proper execution of END blocks found during server startup along with invoking the DESTROY method on global objects who are still alive. It is possible that this operation may take a long time to finish, causing problems during a restart. If your code does not contain any END blocks or DESTROY methods which need to be run during child server shutdown, this destruction can be avoided by setting the PERL_DESTRUCT_LEVEL environment variable to -1.
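The mod_perl.pod excerpt above says the slow shutdown can be avoided through an environment variable. A minimal sketch of one way to set it, assuming the variable is read from the environment that httpd inherits from the shell that starts it; the start command in the comment is only a placeholder for however the server is normally launched on your system:

```shell
# Skip perl_destruct() at child exit. Only appropriate if no END blocks
# or DESTROY methods need to run when a child shuts down.
PERL_DESTRUCT_LEVEL=-1
export PERL_DESTRUCT_LEVEL

# Then start the server from this same shell (placeholder command):
#   /usr/local/apache/bin/apachectl start
```

With the variable exported, children started from this shell should skip the destruction phase, so a stop or restart no longer waits on it. Code that relies on END or DESTROY for cleanup will silently lose that cleanup, so check for such code first.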
Re: Apache children hanging
Doug MacEachern [EMAIL PROTECTED] writes:

% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

oops, that should be:

% gdb httpd $pid_of_spinning_process
(gdb) source modperl_x.xx/.gdbinit
(gdb) curinfo

Is this magic in the guide? I'm on the bus right now... ;-)

--
Dave Hodgkinson, http://www.hodgkinson.org
Editor-in-chief, The Highway Star http://www.deep-purple.com
Apache, mod_perl, MySQL, Sybase hired gun for, well, hire
Re: Apache children hanging
On Fri, Jun 02, 2000 at 09:06:02AM +0100, David Hodgkinson wrote:
Doug MacEachern [EMAIL PROTECTED] writes:

% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

oops, that should be:

% gdb httpd $pid_of_spinning_process
(gdb) source modperl_x.xx/.gdbinit
(gdb) curinfo

Is this magic in the guide? I'm on the bus right now... ;-)

I've added it to SUPPORT, and it will be added to the guide soonish.

--
Dave Hodgkinson, http://www.hodgkinson.org
Editor-in-chief, The Highway Star http://www.deep-purple.com
Apache, mod_perl, MySQL, Sybase hired gun for, well, hire

--
Eric Cholet
Re: Apache children hanging
Eric Cholet [EMAIL PROTECTED] writes:
I've added it to SUPPORT, and it will be added to the guide soonish.

Cool. One of those things you hope you never need...

--
Dave Hodgkinson, http://www.hodgkinson.org
Editor-in-chief, The Highway Star http://www.deep-purple.com
Apache, mod_perl, MySQL, Sybase hired gun for, well, hire
Re: Apache children hanging
John Armstrong wrote:
I have had this problem to varying degrees in all of my high traffic mod perl installations. The thing that saves me is Apache::Resource. In my httpd.conf I put:

PerlModule Apache::Resource
PerlSetEnv PERL_RLIMIT_DATA 32
PerlSetEnv PERL_RLIMIT_CPU 640
PerlChildInitHandler Apache::Resource

That kills off any bad children. Before I installed this, things were regularly spinning out of control. You could sit and watch them go nuts. After installing it you could watch them _attempt_ to go nuts and then watch apache clean house. It's a lifesaver.

Why does it happen? A variety of reasons, but I have just accepted it as the cost of doing business with mod_perl and gone on with things. Not the most proactive response but so be it :(

John Armstrong

Yes, I always run into this trouble, on a GNU/Linux Red Hat 6.1 box with the latest kernel and patches. About 280k hits per day, and it goes down every two days. Sometimes a watchdog can reboot it, but sometimes not. mod_perl seems too complex in concept. It feels more like a hack than FastCGI does.
Re: Apache children hanging
Steve Reppucci wrote:
This is *exactly* the symptoms we see, and we're just about always up to date with Apache/Perl/modperl releases. We've spent a fair amount of time trying to isolate the cause of these, but haven't been able to point the finger at any one cause. Some of the things we've determined:

- The same behavior is displayed under Solaris (5.6 and 5.7) and Linux (2.2.14).
- We've seen this through a bunch of releases of Apache/Perl/modperl over the past 6 months.
- When a child process goes astray, it is in a tight loop, quickly growing to consume 95 to 100% of the cpu cycles.
- Under Linux, running strace on the runaway results in nothing -- no system calls are shown whatsoever, so it's apparently spinning in a tight CPU loop (though see the next bullet -- it's possible I've just never caught it at an early enough stage.)
- Under Solaris, I've managed to catch a few of these at an early stage and observed (via truss) an endless series of 'sbrk' calls; eventually this gets bound up tight with no system calls displayed, like the Linux case.
- This seems to happen more often under heavy load, but we've also seen it fairly regularly during low traffic periods.
- We did have some luck in doing a thorough read of our handlers that use DBI, making sure that all database connections are explicitly closed at the end of a request (we *don't* use Apache::DBI). This cut down on the number of runaways, but we still see them.

We've kept our runaways under control by running a watchdog script that looks for modperl processes with the correct load numbers (cpu% > 10% and run time > something), but we've all along thought that this would be a temporary solution until we determined what we're doing wrong.

Yup, I've done that before, but sometimes runaways still appear and quickly take down the system before you can kill them.

Now that I've seen this report from a couple of others on the list, I'm wondering if it's not something we're doing, but rather something within Apache or modperl. If there's anything anyone on the list can recommend that I do to try to collect more clues on the cause, I'll be happy to try it. Or maybe if there are others who've seen the same behavior, pipe in so that we can get a feeling for how many sites are experiencing this?

Steve Reppucci

I just wonder about IMDb's Apache/mod_perl version:

Server: Apache/1.3.11-dev (Unix) mod_perl/1.21_01-dev

Maybe this version is the most stable for them; they must also have a load balancer for failover.
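The watchdog described in the quoted message (look for httpd processes above a CPU percentage and a run time, then kill them) might be sketched as follows. This is not Steve's script, which the thread never shows. The function name find_runaways, the two thresholds, and the choice of ps columns are all assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical thresholds; the thread only says "cpu% > 10%" and
# "run time > something".
CPU_MIN=10     # percent CPU
TIME_MIN=300   # accumulated CPU time, in seconds

# find_runaways: reads "pid pcpu time comm" lines on stdin and prints the
# PIDs of httpd processes that exceed both thresholds.
# The time column is expected in ps's [dd-]hh:mm:ss format.
find_runaways() {
    awk -v cpu="$CPU_MIN" -v secs="$TIME_MIN" '
        $4 == "httpd" {
            days = 0; t = $3
            # Split off an optional "dd-" day prefix.
            if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
            # Fold hh:mm:ss (or mm:ss) into seconds.
            n = split(t, p, ":")
            s = 0
            for (i = 1; i <= n; i++) s = s * 60 + p[i]
            s += days * 86400
            if ($2 + 0 > cpu && s > secs) print $1
        }'
}

# Usage, left commented out so that sourcing this file kills nothing:
#   ps -eo pid,pcpu,time,comm | find_runaways | while read pid; do
#       kill "$pid"
#   done
```

As the reply above points out, a polling watchdog can lose the race against a fast runaway, so per-process limits such as the Apache::Resource settings mentioned elsewhere in this thread are a useful complement rather than an alternative.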
Re: Apache children hanging
Now if only I had known this two years ago... Awesome tidbit though. Thanks!

you can find out which line of Perl code is triggering a spin, by attaching to the process with gdb:

% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

should show you the filename:line_number where Perl is stuck.
Re: Apache children hanging
Someone just pointed out that this should probably go into the guide or FAQ somewhere. Just a thought...

On Thu, 1 Jun 2000, Doug MacEachern wrote:

% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

oops, that should be:

% gdb httpd $pid_of_spinning_process
(gdb) source modperl_x.xx/.gdbinit
(gdb) curinfo
Re: Apache children hanging
On Fri, 2 Jun 2000, Ian Struble wrote:
Someone just pointed out that this should probably go into the guide or FAQ somewhere. Just a thought...

On Thu, 1 Jun 2000, Doug MacEachern wrote:

% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

oops, that should be:

% gdb httpd $pid_of_spinning_process
(gdb) source modperl_x.xx/.gdbinit
(gdb) curinfo

don't worry, it's on the todo list already.

_
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:[EMAIL PROTECTED] http://perl.org http://stason.org/TULARC
http://singlesheaven.com http://perlmonth.com http://sourcegarden.org
Re: Apache children hanging
I have had this problem to varying degrees in all of my high traffic mod perl installations. The thing that saves me is Apache::Resource. In my httpd.conf I put:

PerlModule Apache::Resource
PerlSetEnv PERL_RLIMIT_DATA 32
PerlSetEnv PERL_RLIMIT_CPU 640
PerlChildInitHandler Apache::Resource

That kills off any bad children. Before I installed this, things were regularly spinning out of control. You could sit and watch them go nuts. After installing it you could watch them _attempt_ to go nuts and then watch apache clean house. It's a lifesaver.

Why does it happen? A variety of reasons, but I have just accepted it as the cost of doing business with mod_perl and gone on with things. Not the most proactive response but so be it :(

John Armstrong

At 3:33 PM -0700 6/1/00, Gustavo Duarte wrote:
Hi there people,

I have inherited a web server running mod_perl and I am experiencing a somewhat critical problem: http processes sometimes get into an infinite loop, using 100% cpu time, and given enough time bring the machine to a halt. I've done a lot of testing, and there isn't a specific http request that triggers the behaviour, even though it always happens after a request.

It seems to happen every few hours: the httpd process simply starts hogging up the CPU, and won't let go of it. After a while, I have a few of these processes running, and the machine's load average skyrockets. Sometimes it's bad enough I'm not even able to log in via console.

I'll upgrade all the software to new versions, but apparently this problem has been occurring for a while, and survived a couple of hardware/software upgrades. I'll also be rewriting the perl code running there to see if it stops the problem (the code isn't too clean - lots of global variables, not written under strict, etc, but "it works").

However, it would be cool if someone could enlighten me on what's going on, and possibly suggest a fix :). Thanks a lot!

signed,
gustavo

begin debugging info

= our OS is:
[root@blueland /root]# uname -a
Linux blueland 2.2.14-5.0 #4 Wed Apr 12 20:28:28 MDT 2000 i586 unknown

= Apache:
Server Version: Apache/1.3.6 (Unix) mod_perl/1.19 mod_ssl/2.2.8 OpenSSL/0.9.2b

= let's look into one of the monster processes:
497 ? R 288:06 /usr/local/apache_1.3.6/bin/httpd

= (nice cpu time there...)

= now for gdb...
[root@blueland /root]# gdb /usr/local/apache/bin/httpd 497
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
Attaching to program: /usr/local/apache/bin/httpd, Pid 497
Reading symbols from /lib/libNoVersion.so.1...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /usr/lib/perl5/5.00502/i586-linux/CORE/libperl.so...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
Reading symbols from /usr/lib/perl5/site_perl/5.005/i586-linux/auto/Sybase/DBlib/DBlib.none...done.
Reading symbols from /opt/sybase/lib/libsybdb.so...done.
Reading symbols from /opt/sybase/lib/libinsck.so...done.
Reading symbols from /lib/libnss_nisplus.so.2...done.
Reading symbols from /lib/libnss_nis.so.2...done.
0x80a7407 in free_blocks ()

= let's see what the stack tells us...
(gdb) i s
#0 0x80a7407 in free_blocks ()
#1 0x80a7660 in ap_clear_pool ()
#2 0x80a76a1 in ap_destroy_pool ()
#3 0x80be71a in ap_destroy_sub_req ()
#4 0x8070d8a in XS_Apache__SubRequest_DESTROY ()
#5 0x400d0f45 in Perl_pp_entersub () at pp_hot.c:1965
#6 0x4007aa8d in perl_call_sv () at perl.c:1008
#7 0x400d9d4c in Perl_sv_clear () at sv.c:2418
#8 0x400da451 in Perl_sv_free () at sv.c:2418
#9 0x400d24ab in do_clean_objs (sv=0x8385744) at sv.c:338
#10 0x400d237c in visit (f=0x400d2410 <do_clean_objs>) at sv.c:306
#11 0x400d263f in Perl_sv_clean_objs () at sv.c:359
#12 0x40077fa5 in perl_destruct () at perl.c:1008
#13 0x80635a3 in perl_shutdown ()
#14 0x80645c5 in perl_child_exit ()
#15 0x8064430 in perl_child_exit_cleanup ()
#16 0x80a8dd6 in run_cleanups ()
#17 0x80a762c in ap_clear_pool ()
#18 0x80a76a1 in ap_destroy_pool ()
#19 0x80b3b03 in clean_child_exit ()
#20 0x80b6773 in child_main ()
#21 0x80b6e14 in make_child ()
#22 0x80b716e in perform_idle_server_maintenance ()
#23 0x80b7665 in standalone_main ()
#24 0x80b7cd3 in main ()

= registers have:
(gdb) i r
eax 0x8a3e408 144958472
ecx 0x8608414 140542996
edx 0x8a3e408 144958472
ebx 0x8239208 136548872
esp 0xb90c 0xb90c
ebp 0xb910 0xb910
esi 0x1
Re: Apache children hanging
On Thu, 1 Jun 2000, Gustavo Duarte wrote:
there to see if it stops the problem (the code isn't too clean - lots of global variables, not written under strict, etc, but "it works").

looks like a global variable is exactly what your problem is. somebody is creating an Apache::SubRequest object (via $r->lookup_{file,uri}), and it's not going out of scope until the child exits. in this case, when Apache::SubRequest::DESTROY is called, it calls destroy_sub_req() on a pool that's already been cleared.

#0 0x80a7407 in free_blocks ()
#1 0x80a7660 in ap_clear_pool ()
#2 0x80a76a1 in ap_destroy_pool ()
#3 0x80be71a in ap_destroy_sub_req ()
#4 0x8070d8a in XS_Apache__SubRequest_DESTROY ()
...
#12 0x40077fa5 in perl_destruct () at perl.c:1008
#13 0x80635a3 in perl_shutdown ()
#14 0x80645c5 in perl_child_exit ()
Re: Apache children hanging
you can find out which line of Perl code is triggering a spin, by attaching to the process with gdb:

% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

should show you the filename:line_number where Perl is stuck.
Re: Apache children hanging
% gdb httpd $pid_of_spinning_process
% source modperl_x.xx/.gdbinit
% curinfo

oops, that should be:

% gdb httpd $pid_of_spinning_process
(gdb) source modperl_x.xx/.gdbinit
(gdb) curinfo
Re: Apache children hanging
This is *exactly* the symptoms we see, and we're just about always up to date with Apache/Perl/modperl releases. We've spent a fair amount of time trying to isolate the cause of these, but haven't been able to point the finger at any one cause. Some of the things we've determined:

- The same behavior is displayed under Solaris (5.6 and 5.7) and Linux (2.2.14).
- We've seen this through a bunch of releases of Apache/Perl/modperl over the past 6 months.
- When a child process goes astray, it is in a tight loop, quickly growing to consume 95 to 100% of the cpu cycles.
- Under Linux, running strace on the runaway results in nothing -- no system calls are shown whatsoever, so it's apparently spinning in a tight CPU loop (though see the next bullet -- it's possible I've just never caught it at an early enough stage.)
- Under Solaris, I've managed to catch a few of these at an early stage and observed (via truss) an endless series of 'sbrk' calls; eventually this gets bound up tight with no system calls displayed, like the Linux case.
- This seems to happen more often under heavy load, but we've also seen it fairly regularly during low traffic periods.
- We did have some luck in doing a thorough read of our handlers that use DBI, making sure that all database connections are explicitly closed at the end of a request (we *don't* use Apache::DBI). This cut down on the number of runaways, but we still see them.

We've kept our runaways under control by running a watchdog script that looks for modperl processes with the correct load numbers (cpu% > 10% and run time > something), but we've all along thought that this would be a temporary solution until we determined what we're doing wrong.

Now that I've seen this report from a couple of others on the list, I'm wondering if it's not something we're doing, but rather something within Apache or modperl. If there's anything anyone on the list can recommend that I do to try to collect more clues on the cause, I'll be happy to try it. Or maybe if there are others who've seen the same behavior, pipe in so that we can get a feeling for how many sites are experiencing this?

Steve Reppucci

On Thu, 1 Jun 2000, Gustavo Duarte wrote:
Hi there people,

I have inherited a web server running mod_perl and I am experiencing a somewhat critical problem: http processes sometimes get into an infinite loop, using 100% cpu time, and given enough time bring the machine to a halt. I've done a lot of testing, and there isn't a specific http request that triggers the behaviour, even though it always happens after a request.

It seems to happen every few hours: the httpd process simply starts hogging up the CPU, and won't let go of it. After a while, I have a few of these processes running, and the machine's load average skyrockets. Sometimes it's bad enough I'm not even able to log in via console.

I'll upgrade all the software to new versions, but apparently this problem has been occurring for a while, and survived a couple of hardware/software upgrades. I'll also be rewriting the perl code running there to see if it stops the problem (the code isn't too clean - lots of global variables, not written under strict, etc, but "it works").

However, it would be cool if someone could enlighten me on what's going on, and possibly suggest a fix :). Thanks a lot!

signed,
gustavo

begin debugging info

= our OS is:
[root@blueland /root]# uname -a
Linux blueland 2.2.14-5.0 #4 Wed Apr 12 20:28:28 MDT 2000 i586 unknown

= Apache:
Server Version: Apache/1.3.6 (Unix) mod_perl/1.19 mod_ssl/2.2.8 OpenSSL/0.9.2b

= let's look into one of the monster processes:
497 ? R 288:06 /usr/local/apache_1.3.6/bin/httpd

= (nice cpu time there...)

= now for gdb...
[root@blueland /root]# gdb /usr/local/apache/bin/httpd 497
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
Attaching to program: /usr/local/apache/bin/httpd, Pid 497
Reading symbols from /lib/libNoVersion.so.1...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /usr/lib/perl5/5.00502/i586-linux/CORE/libperl.so...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
Reading symbols from /usr/lib/perl5/site_perl/5.005/i586-linux/auto/Sybase/DBlib/DBlib.none...done.
Reading symbols from /opt/sybase/lib/libsybdb.so...done.
Reading symbols from /opt/sybase/lib/libinsck.so...done.
Reading symbols from