For what it's worth, I've had the same problem. It always follows my other problem: MySQL throwing (2013, 'Lost connection to MySQL server during query'). I get the same symptoms, with WebKit still listening but not responding. I have been unable to reproduce the problem reliably, but it does happen about once a day.
Date: Thu, 07 Aug 2003 07:55:16 -0400

On Wed, 2003-08-06 at 22:07, Hancock, David (DHANCOCK) wrote:
> Adam: Thanks for the additional information. Stephan Diehl also has
> seen this situation on his systems.
>
> I agree about the gap in the PIDs, but most of the time, they're
> contiguous. We do sort of a "heartbeat" ping on our servers with an
> HTTP request at least every 5 minutes, which is how we notice the
> problem. We've got two machines running Apache and WebKit,
> load-balanced, but each gets hit pretty often. There's a LOT of
> memory on these machines (2GB physical); we've typically got 500MB
> physical free and swap generally shows 0K used. We'll start capturing
> memory data to see if we really are using some swap space.
>
> My understanding of swapping (which, granted, is apt to be faulty) is
> that Linux isn't apt to swap something to disk while there's unused
> physical memory.
>
> We are using mod_webkit, and even with the WebKit processes wedged,
> the port (we're using 8086) is still listening, just not responding.
>
> If we were able to reproduce this situation on our development or test
> systems, we could use the debugger to find out more about what's going
> on, but in production, our first priority is to get the system
> responding again.
>
> If/when I learn more, I'll follow up to the list. And if anybody else
> has some ideas, I'd be grateful to hear them.
>
> Cheers!
> --
> David Hancock | [EMAIL PROTECTED] | 410-266-4384
>
> -----Original Message-----
> From: Adam Kerrison [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 06, 2003 10:26 AM
> To: [EMAIL PROTECTED]
> Subject: RE: [Webware-discuss] RE: Anyone seen WebKit processes going into a weird state?
>
> I can't say I've experienced this behaviour directly but few points:
>
> - Process name in brackets does mean "swapped to disk" probably
>   because the process has been inactive for a while (seems likely!)
>
> - The gap in the PID could just be that another process started at
>   that time - you can't rely on the PID's being contiguous
>
> - I have had problems where I had to kill threaded apps when the code
>   raises an exception. In SOME cases the thread dies and the
>   application stops responding (depends a lot on how the app is
>   designed). I don't think I've seen this specifically with Webware
>   but if the socket handler dies then the other threads will be
>   waiting for things that will never happen (and the process will be
>   swapped out eventually). I am assuming a lot about how the AppServer
>   is working - I don't know that this is right but I'm sure someone
>   will correct me :-)
>
> If you're using mod_webkit - and assuming that it maintains a
> connection from apache to webkit - you should be able to see this
> connection via netstat. If the socket handler has died then the socket
> may have gone. Using gdb you should be able to see the threads running
> and the state but that probably less useful in python.
>
> Not sure that this helps or not - might be a red herring
>
> Adam
>
> -----Original Message-----
> From: Hancock, David (DHANCOCK) [mailto:[EMAIL PROTECTED]
> Sent: 06 August 2003 13:28
> To: '[EMAIL PROTECTED]'
> Subject: [Webware-discuss] RE: Anyone seen WebKit processes going into a weird state?
>
> Sorry to be replying to my own post, but I haven't seen any list
> traffic related to my question below, so maybe it didn't get out to
> the list. The situation described below has occurred several times
> this week, and in most cases there is a gap in the process numbering.
> Every other time I've looked, the "python Launch.py ThreadedAppServer"
> process numbers are sequential, with no gaps. They must start up very
> quickly. In the list below, there is a gap (25802 is missing).
>
> I'm grasping at straws here. I think that the process id in brackets
> with no command line means that the process is swapped to disk, but
> I'm not sure about that. When we see the processes looking like they
> do below, they really ARE wedged, though, and require manual
> termination.
>
> Cheers!
> --
> David Hancock | [EMAIL PROTECTED] | 410-266-4384
>
> -----Original Message-----
> From: Hancock, David (DHANCOCK)
> Sent: Friday, August 01, 2003 4:57 PM
> To: [EMAIL PROTECTED]
> Subject: Anyone seen WebKit processes going into a weird state?
>
> Several times a week on our production systems, we're seeing our
> WebKit processes (normally entitled "python Launch.py
> ThreadedAppServer") lose their command lines in the output from ps.
> They're also well wedged, and the processes need to be killed by hand
> to clear this situation. Has anybody else seen this and have some
> ideas to help us troubleshoot? For now, we're detecting the situation
> with automated monitoring (and process-killing and
> webkit-restarting), but we'd sure like to know how we can prevent it,
> not just work around it.
>
> Output from ps auxww:
>
> adc  25799  0.1  1.6 130288 34252 ?  SN  Jul28  10:04 [python]
> adc  25800  0.0  1.6 130288 34252 ?  SN  Jul28   0:00 [python]
> adc  25801  0.0  1.6 130288 34252 ?  SN  Jul28   2:52 [python]
> adc  25803  0.0  1.6 130288 34252 ?  SN  Jul28   1:37 [python]
> adc  25804  0.0  1.6 130288 34252 ?  SN  Jul28   2:17 [python]
> adc  25805  0.0  1.6 130288 34252 ?  SN  Jul28   1:37 [python]
> adc  25806  0.0  1.6 130288 34252 ?  SN  Jul28   1:45 [python]
> adc  25807  0.0  1.6 130288 34252 ?  SN  Jul28   1:27 [python]
> adc  25808  0.0  1.6 130288 34252 ?  SN  Jul28   1:51 [python]
> adc  25809  0.0  1.6 130288 34252 ?  SN  Jul28   1:08 [python]
> adc  25810  0.0  1.6 130288 34252 ?  SN  Jul28   3:37 [python]
>
> Our setup includes:
>
> Python 2.2
> Webware 0.8
> RedHat Linux 7.3
> A couple C extensions: DCOracle2 and pymqi (interface to IBM's
> MQSeries)
>
> Thanks in advance for any ideas and assistance.
>
> P.S. We had an extreme example of something similar several months
> ago, but even the "[python]" was missing from the ps output. Thus, it
> didn't look like WebKit was running at all, but a start attempt
> couldn't bind to the port. We could only find the culprit process
> with "netstat -anp | grep 8086" run as root. I don't know if that
> failure is related, though, it was just weird.
>
> Cheers!
> --
> David Hancock | [EMAIL PROTECTED] | 410-266-4384

--
Thomas E Jenkins <[EMAIL PROTECTED]>

_______________________________________________
Webware-discuss mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/webware-discuss
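
For anyone wanting to tell "listening but not responding" apart from "not running at all", here is a rough sketch of the kind of heartbeat check described in the thread. It is not the monitoring David's site actually runs; the function names and default timeouts are made up for illustration, and it is written for a current Python 3 rather than the Python 2.2 in the original setup. It assumes the app server port (8086 in the thread) can be probed with a plain TCP connect and that some URL served through Apache exercises WebKit.

```python
import http.client
import socket
import urllib.error
import urllib.request


def port_is_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False


def http_responds(url, timeout=10.0):
    """Return True if the URL yields any HTTP response before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connection, timeout, or a malformed/absent reply.
        return False


def classify(listening, responding):
    """Map the two probe results onto the states discussed in the thread."""
    if responding:
        return "ok"
    if listening:
        # Port accepts connections but requests go unanswered.
        return "wedged"
    return "down"


def check(host, port, url):
    """Probe the app server port and the HTTP front end (both hypothetical)."""
    return classify(port_is_listening(host, port), http_responds(url))
```

Something like check("localhost", 8086, url), with url pointing at whatever page your own heartbeat requests, run from cron every few minutes, would return "wedged" in the situation described above; what to do about it (kill and restart, capture memory data first) is left to the caller.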