Solaris, prefork, accept mutex and mod_ext_filter (Was: Prefork MPM and mod_watchdog)

Rainer Jung Sun, 03 Feb 2019 04:30:51 -0800

Am 31.01.2019 um 10:31 schrieb Stefan Eissing:

Am 27.01.2019 um 14:40 schrieb Rainer Jung <rainer.j...@kippdata.de>:

- as soon as I enable mod_watchdog only (but not the above modules that would 
use it), the hangs start to happen every now and then.


Hmm, that sounds strange. As I understood the code, none of the parent/child 
watchdogs would be active then. So, its child_init should not do anything 
either.

But, if one of the proxy monitors runs against the server itself, could it 
connect against the child process it (the watchdog) is running on?


mod_watchdog was a red herring.

I can now frequently reproduce running t/modules/ext_filter.t only. Istripped the reproducer down to the part of t/modules/ext_filter.t whereit runs

POST "/apache/extfilter/out-limit/modules/cgi/perl_echo.pl", content =>"foo and bar\n"

10 times in a row. If I increase the iteration count slightly to 20, theproblem happens nearly every time. I also replaced perl_echo.pl andeval-cmd.pl by small C programs doing the echo and the s/foo/bar/ andcan still reproduce.


This test incolved mod_ext_filter and LimitRequestBody.

It seems I can not reproduce, if I shorten the POST body, so that it nolonger hits the LimitRequestBody 6 configured for/apache/extfilter/out-limit/.


What happens, but I do not understand is:

- the first few requests are handled by two children in an alternatingway. I can see the accept mutex being passed between these two childrenusing lwp_mutex_timedlock(ADDRESS, 0x00000000) andlwp_mutex_unlock(ADDRESS) using truss (Solaris variant of strace). SoSolaris seems to implement the pthread mutexes via these lwp mutexes.

- after a few requests alternating between the two children, somethingstrange happens:

- child A returns from port_getn() for 22 (probably part of theaccept() impl)

  - child A calls accept()
  - child A unlocks the accept mutex using lwp_mutex_unlock()
  - child B locks the accept mutex using lwp_mutex_timedlock()
  - child B calls port_associate for 22 (probably part of accept() impl)
  - child A handles the request
! - child A also calls port_associate for 22 (no sign of lock acquire)
! - child A returns from port_getn() for 22
  - child A calls accept() and starts to handle the connection
! - child B also returns from port_getn() for 22
! - child B gets EAGAIN for the accept()
  - child B calls port_associate for 22
  - child A handles the request

! - child A gets EDEADLK for pthread_mutex_lock() (this is from theerror_log; there's no system call for this instance ofpthread_mutex_lock() in the truss output). EDEADLK forpthread_mutex_lock() means that this process already has that lock.

  - child B returns from port_getn() for 22
  - child B calls accept() and starts to handle the connection
  - child A exits (now B is the only child left)
  - child B proceeds request handling

- child B does all further port_associate(), port_getn(), andaccept() plus connection and request handling. No morelwp_mutex_timedlock() or lwp_mutex_unlock() for B, maybe due to someoptimization when only one process for a lock is left.

It is quite possible, that there is an underlying lwp_mutex or portfsbug here. But it is strange, that this only comes up when used withmod_ext_filter in combination with LimitRequestBody.


There was the fix

https://bz.apache.org/bugzilla/show_bug.cgi?id=60375

but I don't see, how this would influence the above list.

I can try to further narrow down (using static content to eliminate oneforked child; using LimitRequestBody without mod_exit-filter etc.), butmaybe someone already has an idea?


Regards,

Rainer

Solaris, prefork, accept mutex and mod_ext_filter (Was: Prefork MPM and mod_watchdog)

Reply via email to