On Oct 7, 2:11 am, Omry Yadan wrote:
> Either that, or the gdb did not show the stack for the frozen processes.
I doubt gdb would be having an issue. I am also pretty confident the
mod_wsgi daemon processes weren't frozen either. The Apache server child
processes were the ones left hanging, waiting.
Now, you don't actually say what version of mod_wsgi you are using,
nor whether you are compiling it from source code or using a prebuilt
binary.
If you are building from source code, can you apply the following
patch, recompile and reinstall? The patch is against the mod_wsgi 2.6
source code, but should apply cleanly against 2.5 as well.
The patch moves the point at which an internal server timeout is set on
the socket to the daemon process to an earlier phase of handling that
connection. The original purpose of the timeout was to avoid a mutual
deadlock when the client sent more data than the socket buffer size and
the WSGI application, instead of reading in and consuming all of it,
replied with a response also larger than the socket buffer size. In that
case the timeout causes the request to fail rather than deadlock.
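To illustrate the deadlock-avoidance idea, here is a minimal standalone
sketch in C. It is an analogue, not mod_wsgi code: it assumes POSIX
socketpair() and the SO_SNDTIMEO socket option in place of the APR
apr_file_pipe_timeout_set() call used in the patch, and the helper name
write_to_idle_peer() is hypothetical. A writer pushes more data than the
socket buffer can hold at a peer that never reads; without a timeout
send() would block forever, with one it fails with EAGAIN/EWOULDBLOCK.

```c
/* Hypothetical illustration: POSIX sockets, not the APR calls in mod_wsgi. */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to write `total` bytes into one end of a UNIX socket pair whose other
 * end never reads. The send timeout bounds how long each send() may block.
 * Returns 0 if everything fit, or -1 with errno set (EAGAIN/EWOULDBLOCK on
 * timeout). *written receives the number of bytes actually accepted. */
static int write_to_idle_peer(size_t total, long timeout_ms, size_t *written)
{
    int sv[2];
    char chunk[4096];
    struct timeval tv;
    int rc = 0;
    int saved_errno = 0;

    assert(timeout_ms > 0);
    *written = 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Without this, send() would block forever once the buffer is full,
     * the same shape as the client/application deadlock described above. */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) != 0) {
        saved_errno = errno;
        rc = -1;
        goto done;
    }

    memset(chunk, 'x', sizeof chunk);
    while (*written < total) {
        size_t left = total - *written;
        size_t n = left < sizeof chunk ? left : sizeof chunk;
        ssize_t r = send(sv[0], chunk, n, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            saved_errno = errno; /* EAGAIN/EWOULDBLOCK means timed out */
            rc = -1;
            break;
        }
        *written += (size_t)r;
    }

done:
    close(sv[0]);
    close(sv[1]);
    errno = saved_errno; /* close() must not clobber the reported error */
    return rc;
}
```

For example, write_to_idle_peer(16 * 1024 * 1024, 200, &written) should
return -1 after roughly the timeout, with written less than the requested
total, while a 1 KB write fits in the buffer and returns 0.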
Your problem isn't this but, as I said, an apparent failure of epoll(),
or maybe of the UNIX socket connect. Moving the timeout will help detect
the kind of failure you are seeing, and will at least cause the Apache
server child processes to time out on the connection when the mod_wsgi
daemon process doesn't do anything. Overall this may not help, as
subsequent requests may just suffer the same fate, but if it is a
temporary glitch, it may well recover. The value of this timeout is
controlled by the Apache 'Timeout' directive, which defaults to 300
seconds. As this timeout controls many things in Apache, you do need to
be a bit careful in changing it, but dropping it to 60 seconds wouldn't
be over the top.
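The effect of setting the timeout earlier can be sketched the same way.
In this hypothetical C example (again POSIX sockets and SO_RCVTIMEO
standing in for APR, with invented helpers set_recv_timeout() and
wait_for_ack()), the receive timeout is set as soon as the connection
exists, before the request is sent, so a peer that never sends its
handshake reply produces a timeout error instead of an indefinite wait.
The 'Timeout' directive plays the role of timeout_ms here, though Apache
expresses it in seconds.

```c
/* Hypothetical illustration: POSIX sockets, not the APR calls in mod_wsgi. */
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Bound how long recv() on fd may block. */
static int set_recv_timeout(int fd, long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Send a one-byte "request" over a fresh UNIX socket pair and wait for a
 * one-byte acknowledgement. If peer_replies is 0 the peer stays silent,
 * modelling a daemon process that never notices the request.
 * Returns 1 if the ack arrived, 0 on timeout, -1 on any other error. */
static int wait_for_ack(long timeout_ms, int peer_replies)
{
    int sv[2];
    char byte = 'R';
    int result = -1;

    assert(timeout_ms > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Set the timeout as soon as the connection exists, before any
     * request is written, rather than later in the exchange. */
    if (set_recv_timeout(sv[0], timeout_ms) != 0)
        goto done;

    if (send(sv[0], &byte, 1, 0) != 1)
        goto done;

    /* A responsive peer answers; its byte lands in sv[0]'s receive queue. */
    if (peer_replies && send(sv[1], "A", 1, 0) != 1)
        goto done;

    for (;;) {
        ssize_t r = recv(sv[0], &byte, 1, 0);
        if (r == 1) {
            result = 1;
            break;
        }
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            result = 0; /* timed out waiting for the handshake reply */
        break;          /* timeout, EOF or another error */
    }

done:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Here wait_for_ack(200, 0) models a stuck daemon and should return 0 after
about 200 ms, while wait_for_ack(200, 1) models a responsive one and
returns 1.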
It may be worth building Apache yourself from source code and having it
use the APR bundled with Apache rather than the system one, in case the
issue is a mismatch between the versions of the precompiled Apache and
the APR/APU libraries being used.
Given the strangeness of this issue, I would suggest the discussion be
moved over to the mod_wsgi list on Google Groups. I have cc'd this
email there.
Index: mod_wsgi.c
===================================================================
--- mod_wsgi.c (revision 1446)
+++ mod_wsgi.c (working copy)
@@ -9568,6 +9568,9 @@
apr_os_pipe_put_ex(&tmpsock, &daemon->fd, 1, r->pool);
apr_pool_cleanup_kill(r->pool, daemon, wsgi_close_socket);
+ apr_file_pipe_timeout_get(tmpsock, &timeout);
+ apr_file_pipe_timeout_set(tmpsock, r->server->timeout);
+
/* Setup bucket brigade for reading response from daemon. */
bbin = apr_brigade_create(r->pool, r->connection->bucket_alloc);
@@ -9662,6 +9665,9 @@
apr_os_pipe_put_ex(&tmpsock, &daemon->fd, 1, r->pool);
apr_pool_cleanup_kill(r->pool, daemon,
wsgi_close_socket);
+ apr_file_pipe_timeout_get(tmpsock, &timeout);
+ apr_file_pipe_timeout_set(tmpsock, r->server->timeout);
+
apr_brigade_destroy(bbin);
bbin = apr_brigade_create(r->pool, r->connection->bucket_alloc);
@@ -9687,9 +9693,6 @@
bbout = apr_brigade_create(r->pool, r->connection->bucket_alloc);
- apr_file_pipe_timeout_get(tmpsock, &timeout);
- apr_file_pipe_timeout_set(tmpsock, r->server->timeout);
-
do {
apr_bucket *bucket;
> my kernel version is : 2.6.26-bpo.1-amd64.
>
> my apache version is : 2.2.9-10
>
> libapr version is : 1.2.12-5+lenny1
>
> Graham Dumpleton wrote:
> > Most odd.
>
> > All the mod_wsgi daemon processes are waiting for a request to arrive.
> > They aren't actively handling any.
>
> > All the Apache server child processes are waiting for initial
> > handshaking response from mod_wsgi daemon processes after having just
> > sent a request to them.
>
> > To me this looks a bit like the epoll() implementation on your system
> > is broken which is causing a failure of mod_wsgi daemon processes to
> > detect a new request has been sent.
>
> > I would suggest that you rebuild Apache/APR such that use of
> > epoll() is disabled and it instead falls back to either poll() or
> > select(). Unfortunately, I'm not exactly sure how you can override
> > 'configure' for Apache/APR to stop it using epoll() if it exists.
>
> > That is about the only pointer I can give.
>
> > What versions of Apache/APR are you using?
>
> > Graham
>
> > On Oct 5, 12:48 am, Omry Yadan wrote:
>
> >> apache backtraces attached.
>
> >>> I will configure monit to use this command before it restarts apache in
> >>> the next time it hangs:
>
> >>> pgrep apache2 | xargs -I {} bash -c "gdb -ex \"set height 0\" -ex \"thread apply all bt\" --batch -p {}" > apache.backtrace.txt
>
> >>> it's supposed to dump the backtrace of all apache processes to a file.
> >>> hopefully it will contain something useful.
>
> >> apache.backtrace.zip
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Trac Development" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/trac-dev?hl=en
-~----------~----~----~----~------~----~------~--~---