From:             gonzalo dot paniagua at acquia dot com
Operating system: Linux
PHP version:      5.5.1
Package:          FPM related
Bug Type:         Bug
Bug description:Race condition between SIGCHLD and child stdout/stderr event 
leads to segfault

Description:
------------
This problem affects 5.3.27, 5.4.17, 5.5.1.

The master fpm process dies with a SIGSEGV. Relevant configuration options
that 
we have set is "catch_workers_output = yes" and pm on_demand.

What follows is a stack trace, the strace output, and an explanation of the

events that lead to the segfault.

StaThread 1 (Thread 0x7ff47eed4720 (LWP 15282)):
#0  0x00007ff479fe5cc6 in fork () from /lib/libc.so.6
#1  0x0000000000870c05 in fpm_children_make (wp=0x1d746d0,
in_event_loop=<value 
optimized out>, nb_to_spawn=<value optimized out>, is_debug=0)
    at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_children.c:400
#2  0x000000000087125c in fpm_children_bury () at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/fpm_children.c:288
#3  0x0000000000875942 in fpm_got_signal (ev=<value optimized out>,
which=<value 
optimized out>, arg=<value optimized out>)
    at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_events.c:73
#4  0x0000000000880833 in fpm_event_epoll_wait (queue=<value optimized
out>, 
timeout=<value optimized out>) at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/events/epoll.c:143
#5  0x00000000008755d7 in fpm_event_loop (err=0) at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/fpm_events.c:403
#6  0x00000000008708c7 in fpm_run (max_requests=0x7fffafba0acc) at 
/usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm.c:113
#7  0x0000000000877dec in main (argc=5, argv=0x7fffafba2c38) at 
/usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_main.c:1842ck trace:

Strace output:
[pid 25742] <... clock_gettime resumed> {4285884, 268457129}) = 0
[pid 25742] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 25742] write(33, "C", 1)           = 1
[pid 25742] rt_sigreturn(0)             = 0
[pid 25742] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 25742] write(33, "C", 1)           = 1
[pid 25742] rt_sigreturn(0)             = 0
[pid 25742] epoll_wait(86, {{EPOLLHUP, {u32=24867480, u64=24867480}},
{EPOLLHUP, 
{u32=24867552, u64=24867552}}, {EPOLLIN, {u32=17325088, u64=17325088}}, 
{EPOLLHUP, {u32=24866200, u64=24866200}}, {EPOLLHUP, {u32=24866272, 
u64=24866272}}}, 261, 1001) = 5
[pid 25742] read(87, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 87, {EPOLLIN, {u32=24867480, 
u64=24867480}}) = 0
[pid 25742] close(87)                   = 0
[pid 25742] read(90, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 90, {EPOLLIN, {u32=24867552, 
u64=24867552}}) = 0
[pid 25742] close(90)                   = 0
[pid 25742] read(31, "C", 1)            = 1
[pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}],
WNOHANG|WSTOPPED, 
NULL) = 26809
[pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0
[pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}],
WNOHANG|WSTOPPED, 
NULL) = 26813
[pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0
[pid 25742] read(89, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 89, {EPOLLIN, {u32=24866200, 
u64=24866200}}) = 0
[pid 25742] close(89)                   = 0
[pid 25742] read(92, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 92, {EPOLLIN, {u32=24866272, 
u64=24866272}}) = 0
[pid 25742] close(92)                   = 0
[pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0
[pid 25742] read(31, "C", 1)            = 1
[pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0
[pid 25742] read(31, 0x7fffc66ba90f, 1) = -1 EAGAIN (Resource temporarily 
unavailable)
[pid 25742] read(89, 0x7fffc66ba490, 1023) = -1 EBADF (Bad file
descriptor)
[pid 25742] gettimeofday({1375385643, 703965}, NULL) = 0
[pid 25742] open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo",
O_RDONLY) 
= -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo",
O_RDONLY) = 
-1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) =
-1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo",
O_RDONLY) = 
-1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY)
= -1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1

ENOENT (No such file or directory)
[pid 25742]
open("/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742]
open("/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo",

O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo",

O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo",
O_RDONLY) 
= -1 ENOENT (No such file or directory)
[pid 25742] write(4, "[01-Aug-2013 19:34:03] ERROR: un"..., 85) = 85
[pid 25742] close(4294967295)           = -1 EBADF (Bad file descriptor)
[pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 25742] futex(0x7fba74d4f820, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) ---

What's going on? In order of appearance:
  * 2 processes finish, which triggers the SIGCHLD handler twice. They both

write one character to the writing end of a pipe.
  * epoll_wait() wakes up containing 5 events:
    a) 2 EPOLLHUP are the stdout and stderr of PID 26809.
    b) 1 is the availability of data in the read end of the pipe used by
the 
SIGCHLD handler.
    c) 2 EPOLLHUP are the stdout and stderr of PID 26813.
  * The file descriptors 87 and 90 from a) above are closed and removed
from the 
epoll socket.
  * Now we go on to process the SIGCHLD. Notice that the loop finds the 2 
processes that died. In both cases, fpm_children_close() is called, which
in 
turn calls fpm_children_free().
  * Time to process the events from c), only that in this case, the 'arg' 
received by "fpm_stdio_child_said(struct fpm_event_s *ev, short which, void

*arg)" points to a location of memory that has been freed in the previous
step. 
EBADF and SIGSEGV follow shortly after that.


I am attaching a patch that fixes this problem for the epoll backend. If
this is 
accepted as the right way to fix this issue, it is possible that similar
fixes 
have to made to devpoll, poll, select, and kqueue event backends.

The patch simply ignores EPOLLHUP events since there is nothing to read and
the 
SIGCHLD generated event will take care of cleaning up the file descriptors.
This 
patch fixed the issue for us.


-- 
Edit bug report at https://bugs.php.net/bug.php?id=65398&edit=1
-- 
Try a snapshot (PHP 5.4):   
https://bugs.php.net/fix.php?id=65398&r=trysnapshot54
Try a snapshot (PHP 5.3):   
https://bugs.php.net/fix.php?id=65398&r=trysnapshot53
Try a snapshot (trunk):     
https://bugs.php.net/fix.php?id=65398&r=trysnapshottrunk
Fixed in SVN:               https://bugs.php.net/fix.php?id=65398&r=fixed
Fixed in release:           https://bugs.php.net/fix.php?id=65398&r=alreadyfixed
Need backtrace:             https://bugs.php.net/fix.php?id=65398&r=needtrace
Need Reproduce Script:      https://bugs.php.net/fix.php?id=65398&r=needscript
Try newer version:          https://bugs.php.net/fix.php?id=65398&r=oldversion
Not developer issue:        https://bugs.php.net/fix.php?id=65398&r=support
Expected behavior:          https://bugs.php.net/fix.php?id=65398&r=notwrong
Not enough info:            
https://bugs.php.net/fix.php?id=65398&r=notenoughinfo
Submitted twice:            
https://bugs.php.net/fix.php?id=65398&r=submittedtwice
register_globals:           https://bugs.php.net/fix.php?id=65398&r=globals
PHP 4 support discontinued: https://bugs.php.net/fix.php?id=65398&r=php4
Daylight Savings:           https://bugs.php.net/fix.php?id=65398&r=dst
IIS Stability:              https://bugs.php.net/fix.php?id=65398&r=isapi
Install GNU Sed:            https://bugs.php.net/fix.php?id=65398&r=gnused
Floating point limitations: https://bugs.php.net/fix.php?id=65398&r=float
No Zend Extensions:         https://bugs.php.net/fix.php?id=65398&r=nozend
MySQL Configuration Error:  https://bugs.php.net/fix.php?id=65398&r=mysqlcfg

Reply via email to