For what it's worth, I've had the same problem.  It always follows my
other problem: MySQL throwing (2013, 'Lost connection to MySQL server
during query').  I get the same symptoms, with WebKit still
listening but not responding.  I have been unable to reproduce this
problem reliably, but it does happen about once a day.
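
Since "listening but not responding" is the common symptom here, a
heartbeat check can tell that case apart from a server that is simply
not running.  Below is a minimal sketch, not anyone's actual monitoring
code.  The function name probe, the three result strings and the
timeout value are my own assumptions, and it assumes the check goes
through the HTTP front end (Apache) rather than straight to the WebKit
adapter port, which does not speak HTTP.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify an HTTP endpoint as 'unreachable', 'silent' or 'ok'.

    'silent' is the wedged case: the TCP connection is accepted,
    but no reply arrives before the timeout.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, host unreachable, or connect timed out.
        return "unreachable"
    with sock:
        try:
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            first_byte = sock.recv(1)
        except socket.timeout:
            # Port is listening, but nothing answered in time.
            return "silent"
        except OSError:
            # Connection was reset or otherwise broke mid-request.
            return "unreachable"
    # An empty read means the peer closed without sending anything.
    return "ok" if first_byte else "silent"
```

Calling probe("localhost", 80) from cron every few minutes and logging
anything other than "ok" would at least timestamp each hang, which
might help line it up with the MySQL errors.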


Date: Thu, 07 Aug 2003 07:55:16 -0400
On Wed, 2003-08-06 at 22:07, Hancock, David (DHANCOCK) wrote:
> Adam: Thanks for the additional information.  Stephan Diehl also has
> seen this situation on his systems.
>  
> I agree about the gap in the PIDs, but most of the time, they're
> contiguous.  We do sort of a "heartbeat"  ping on our servers with an
> HTTP request at least every 5 minutes, which is how we notice the
> problem. We've got two machines running Apache and WebKit,
> load-balanced, but each gets hit pretty often.  There's a LOT of
> memory on these machines (2GB physical); we've typically got 500MB
> physical free and swap generally shows 0K used.  We'll start capturing
> memory data to see if we really are using some swap space.
>  
> My understanding of swapping (which, granted, is apt to be faulty) is
> that Linux isn't apt to swap something to disk while there's unused
> physical memory.
>  
> We are using mod_webkit, and even with the WebKit processes wedged, 
> the port (we're using 8086) is still listening, just not responding.
>  
> If we were able to reproduce this situation on our development or test
> systems, we could use the debugger to find out more about what's going
> on, but in production, our first priority is to get the system
> responding again.
>  
> If/when I learn more, I'll follow up to the list.  And if anybody else
> has some ideas, I'd be grateful to hear them.
> 
> Cheers!
> --
> David Hancock | [EMAIL PROTECTED] | 410-266-4384
> 
>         -----Original Message-----
>         From: Adam Kerrison [mailto:[EMAIL PROTECTED] 
>         Sent: Wednesday, August 06, 2003 10:26 AM
>         To: [EMAIL PROTECTED]
>         Subject: RE: [Webware-discuss] RE: Anyone seen WebKit
>         processes going into a weird state?
>         
>         
>         I can't say I've experienced this behaviour directly, but a few
>         points:
>          
>         - Process name in brackets does mean "swapped to disk"
>         probably because the process has been inactive for a while
>         (seems likely!)
>          
>         - The gap in the PID could just be that another process
>         started at that time - you can't rely on the PID's being
>         contiguous
>          
>         - I have had problems where I had to kill threaded apps
>         when the code raises an exception. In SOME cases the thread
>         dies and the application stops responding (depends a lot on
>         how the app is designed). I don't think I've seen this
>         specifically with Webware but if the socket handler dies then
>         the other threads will be waiting for things that will never
>         happen (and the process will be swapped out eventually). I am
>         assuming a lot about how the AppServer is working - I don't
>         know that this is right but I'm sure someone will correct
>         me :-)
>          
>         If you're using mod_webkit  - and assuming that it maintains a
>         connection from apache to webkit - you should be able to see
>         this connection via netstat. If the socket handler has died
>         then the socket may have gone. Using gdb you should be able to
>         see the threads running and their state, but that's probably
>         less useful in Python.
>          
>         Not sure that this helps or not - might be a red herring
>          
>         Adam
>                 -----Original Message-----
>                 From: Hancock, David (DHANCOCK)
>                 [mailto:[EMAIL PROTECTED] 
>                 Sent: 06 August 2003 13:28
>                 To: '[EMAIL PROTECTED]'
>                 Subject: [Webware-discuss] RE: Anyone seen WebKit
>                 processes going into a weird state?
>                 
>                 
>                 
>                 Sorry to be replying to my own post, but I haven't
>                 seen any list traffic related to my question below, so
>                 maybe it didn't get out to the list.  The situation
>                 described below has occurred several times this week,
>                 and in most cases there is a gap in the process
>                 numbering.  Every other time I've looked, the "python
>                 Launch.py ThreadedAppServer" process numbers are
>                 sequential, with no gaps.  They must start up very
>                 quickly.  In the list below, there is a gap (25802 is
>                 missing).
>                 
>                 I'm grasping at straws here.  I think that the process
>                 id in brackets with no command line means that the
>                 process is swapped to disk, but I'm not sure about
>                 that.  When we see the processes looking like they do
>                 below, they really ARE wedged, though, and require
>                 manual termination.
>                 
>                 Cheers!
>                 --
>                 David Hancock | [EMAIL PROTECTED] | 410-266-4384
>                 
>                          -----Original Message-----
>                         From:   Hancock, David (DHANCOCK)  
>                         Sent:   Friday, August 01, 2003 4:57 PM
>                         To:     [EMAIL PROTECTED]
>                         Subject:        Anyone seen WebKit processes
>                         going into a weird state?
>                         
>                         Several times a week on our production
>                         systems, we're seeing our WebKit processes
>                         (normally entitled "python Launch.py
>                         ThreadedAppServer") lose their command lines
>                         in the output from ps.  They're also well
>                         wedged, and the processes need to be killed by
>                         hand to clear this situation.  Has anybody
>                         else seen this and have some ideas to help us
>                         troubleshoot?  For now, we're detecting the
>                         situation with automated monitoring (and
>                         process-killing and webkit-restarting), but
>                         we'd sure like to know how we can prevent it,
>                         not just work around it.
>                         
>                         Output from ps auxww:
>                         
>                         adc      25799  0.1  1.6 130288 34252 ?        SN   Jul28  10:04 [python]
>                         adc      25800  0.0  1.6 130288 34252 ?        SN   Jul28   0:00 [python]
>                         adc      25801  0.0  1.6 130288 34252 ?        SN   Jul28   2:52 [python]
>                         adc      25803  0.0  1.6 130288 34252 ?        SN   Jul28   1:37 [python]
>                         adc      25804  0.0  1.6 130288 34252 ?        SN   Jul28   2:17 [python]
>                         adc      25805  0.0  1.6 130288 34252 ?        SN   Jul28   1:37 [python]
>                         adc      25806  0.0  1.6 130288 34252 ?        SN   Jul28   1:45 [python]
>                         adc      25807  0.0  1.6 130288 34252 ?        SN   Jul28   1:27 [python]
>                         adc      25808  0.0  1.6 130288 34252 ?        SN   Jul28   1:51 [python]
>                         adc      25809  0.0  1.6 130288 34252 ?        SN   Jul28   1:08 [python]
>                         adc      25810  0.0  1.6 130288 34252 ?        SN   Jul28   3:37 [python]
>                         
>                         Our setup includes:
>                         
>                                 Python 2.2
>                                 Webware 0.8
>                                 RedHat Linux 7.3
>                                 A  couple C extensions: DCOracle2 and
>                                 pymqi (interface to IBM's MQSeries)
>                                 
>                                 
>                         Thanks in advance for any ideas and
>                         assistance.
>                         
>                         P.S. We had an extreme example of something
>                         similar several months ago, but even the
>                         "[python]" was missing from the ps output.
>                         Thus, it didn't look like WebKit was running
>                         at all, but a start attempt couldn't bind to
>                         the port. We could only find the culprit
>                         process with "netstat -anp | grep 8086" run as
>                         root.  I don't know if that failure is
>                         related, though, it was just weird.
>                         
>                         Cheers!
>                         --
>                         David Hancock | [EMAIL PROTECTED] |
>                         410-266-4384
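
On the "[python]" entries quoted above: as far as I know, ps prints the
name in brackets whenever it cannot get the command-line arguments for
a process, which on Linux shows up as an empty /proc/<pid>/cmdline.
Here is a rough sketch for spotting such processes from a monitoring
script.  The function name and the proc_root parameter are my own
inventions, and note that kernel threads have an empty cmdline too, so
the result needs filtering by name.

```python
import os


def procs_without_cmdline(proc_root="/proc"):
    """Return sorted (pid, name) pairs for processes with an empty cmdline."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                cmdline = f.read()
            with open(os.path.join(base, "status"), errors="replace") as f:
                # The first line of status looks like "Name:\tpython".
                name = f.readline().split(":", 1)[1].strip()
        except (OSError, IndexError):
            continue  # process exited while we were reading, or no access
        if not cmdline:
            found.append((int(entry), name))
    return sorted(found)
```

Filtering the result for name == "python" and logging it alongside the
heartbeat failures would show whether the missing command line always
coincides with the hang.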
-- 
Thomas E Jenkins <[EMAIL PROTECTED]>


