From: gonzalo dot paniagua at acquia dot com Operating system: Linux PHP version: 5.5.1 Package: FPM related Bug Type: Bug Bug description:Race condition between SIGCHLD and child stdout/stderr event leads to segfault
Description: ------------ This problem affects 5.3.27, 5.4.17, 5.5.1. The master fpm process dies with a SIGSEGV. Relevant configuration options that we have set is "catch_workers_output = yes" and pm on_demand. What follows is a stack trace, the strace output, and an explanation of the events that lead to the segfault. StaThread 1 (Thread 0x7ff47eed4720 (LWP 15282)): #0 0x00007ff479fe5cc6 in fork () from /lib/libc.so.6 #1 0x0000000000870c05 in fpm_children_make (wp=0x1d746d0, in_event_loop=<value optimized out>, nb_to_spawn=<value optimized out>, is_debug=0) at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_children.c:400 #2 0x000000000087125c in fpm_children_bury () at /usr/local/src/php- 5.3.26/sapi/fpm/fpm/fpm_children.c:288 #3 0x0000000000875942 in fpm_got_signal (ev=<value optimized out>, which=<value optimized out>, arg=<value optimized out>) at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_events.c:73 #4 0x0000000000880833 in fpm_event_epoll_wait (queue=<value optimized out>, timeout=<value optimized out>) at /usr/local/src/php- 5.3.26/sapi/fpm/fpm/events/epoll.c:143 #5 0x00000000008755d7 in fpm_event_loop (err=0) at /usr/local/src/php- 5.3.26/sapi/fpm/fpm/fpm_events.c:403 #6 0x00000000008708c7 in fpm_run (max_requests=0x7fffafba0acc) at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm.c:113 #7 0x0000000000877dec in main (argc=5, argv=0x7fffafba2c38) at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_main.c:1842ck trace: Strace output: [pid 25742] <... clock_gettime resumed> {4285884, 268457129}) = 0 [pid 25742] --- SIGCHLD (Child exited) @ 0 (0) --- [pid 25742] write(33, "C", 1) = 1 [pid 25742] rt_sigreturn(0) = 0 [pid 25742] --- SIGCHLD (Child exited) @ 0 (0) --- [pid 25742] write(33, "C", 1) = 1 [pid 25742] rt_sigreturn(0) = 0 [pid 25742] epoll_wait(86, {{EPOLLHUP, {u32=24867480, u64=24867480}}, {EPOLLHUP, {u32=24867552, u64=24867552}}, {EPOLLIN, {u32=17325088, u64=17325088}}, {EPOLLHUP, {u32=24866200, u64=24866200}}, {EPOLLHUP, {u32=24866272, u64=24866272}}}, 261, 1001) = 5 [pid 25742] read(87, "", 1023) = 0 [pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 87, {EPOLLIN, {u32=24867480, u64=24867480}}) = 0 [pid 25742] close(87) = 0 [pid 25742] read(90, "", 1023) = 0 [pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 90, {EPOLLIN, {u32=24867552, u64=24867552}}) = 0 [pid 25742] close(90) = 0 [pid 25742] read(31, "C", 1) = 1 [pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 26809 [pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0 [pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 26813 [pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0 [pid 25742] read(89, "", 1023) = 0 [pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 89, {EPOLLIN, {u32=24866200, u64=24866200}}) = 0 [pid 25742] close(89) = 0 [pid 25742] read(92, "", 1023) = 0 [pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 92, {EPOLLIN, {u32=24866272, u64=24866272}}) = 0 [pid 25742] close(92) = 0 [pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0 [pid 25742] read(31, "C", 1) = 1 [pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0 [pid 25742] read(31, 0x7fffc66ba90f, 1) = -1 EAGAIN (Resource temporarily unavailable) [pid 25742] read(89, 0x7fffc66ba490, 1023) = -1 EBADF (Bad file descriptor) [pid 25742] gettimeofday({1375385643, 703965}, NULL) = 0 [pid 25742] open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] open("/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 25742] write(4, "[01-Aug-2013 19:34:03] ERROR: un"..., 85) = 85 [pid 25742] close(4294967295) = -1 EBADF (Bad file descriptor) [pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) --- [pid 25742] futex(0x7fba74d4f820, FUTEX_WAKE_PRIVATE, 2147483647) = 0 [pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) --- What's going on? In order of appearance: * 2 processes finish, which triggers the SIGCHLD handler twice. They both write one character to the writing end of a pipe. * epoll_wait() wakes up containing 5 events: a) 2 EPOLLHUP are the stdout and stderr of PID 26809. b) 1 is the availability of data in the read end of the pipe used by the SIGCHLD handler. c) 2 EPOLLHUP are the stdout and stderr of PID 26813. * The file descriptors 87 and 90 from a) above are closed and removed from the epoll socket. * Now we go on to process the SIGCHLD. Notice that the loop finds the 2 processes that died. In both cases, fpm_children_close() is called, which in turn calls fpm_children_free(). * Time to process the events from c), only that in this case, the 'arg' received by "fpm_stdio_child_said(struct fpm_event_s *ev, short which, void *arg)" points to a location of memory that has been freed in the previous step. EBADF and SIGSEGV follow shortly after that. I am attaching a patch that fixes this problem for the epoll backend. If this is accepted as the right way to fix this issue, it is possible that similar fixes have to made to devpoll, poll, select, and kqueue event backends. The patch simply ignores EPOLLHUP events since there is nothing to read and the SIGCHLD generated event will take care of cleaning up the file descriptors. This patch fixed the issue for us. -- Edit bug report at https://bugs.php.net/bug.php?id=65398&edit=1 -- Try a snapshot (PHP 5.4): https://bugs.php.net/fix.php?id=65398&r=trysnapshot54 Try a snapshot (PHP 5.3): https://bugs.php.net/fix.php?id=65398&r=trysnapshot53 Try a snapshot (trunk): https://bugs.php.net/fix.php?id=65398&r=trysnapshottrunk Fixed in SVN: https://bugs.php.net/fix.php?id=65398&r=fixed Fixed in release: https://bugs.php.net/fix.php?id=65398&r=alreadyfixed Need backtrace: https://bugs.php.net/fix.php?id=65398&r=needtrace Need Reproduce Script: https://bugs.php.net/fix.php?id=65398&r=needscript Try newer version: https://bugs.php.net/fix.php?id=65398&r=oldversion Not developer issue: https://bugs.php.net/fix.php?id=65398&r=support Expected behavior: https://bugs.php.net/fix.php?id=65398&r=notwrong Not enough info: https://bugs.php.net/fix.php?id=65398&r=notenoughinfo Submitted twice: https://bugs.php.net/fix.php?id=65398&r=submittedtwice register_globals: https://bugs.php.net/fix.php?id=65398&r=globals PHP 4 support discontinued: https://bugs.php.net/fix.php?id=65398&r=php4 Daylight Savings: https://bugs.php.net/fix.php?id=65398&r=dst IIS Stability: https://bugs.php.net/fix.php?id=65398&r=isapi Install GNU Sed: https://bugs.php.net/fix.php?id=65398&r=gnused Floating point limitations: https://bugs.php.net/fix.php?id=65398&r=float No Zend Extensions: https://bugs.php.net/fix.php?id=65398&r=nozend MySQL Configuration Error: https://bugs.php.net/fix.php?id=65398&r=mysqlcfg