Sorry, ignore the last question. PS. do we actually need to run s6-rc-init on each boot as a first line in the stage2 script?
On Mon, Jun 6, 2016 at 12:35 PM, Vasil Yordanov <[email protected]> wrote: > Hello Laurent, > > I have an issue with s6-rc, using the latest master code of the s6, s6-rc, > s6-linux-init. > > When I run a docker container with s6 installed on it, from time to time > it hangs on s6-rc-init. > > This happens from four times, ones or even more rear, but unfortunately > happens :( > > I have investigated a bit more the issue with strace, but it seems more > complex to understand the core reason of the problem. > > Here is my setup. > > Docker, with ENTRYPOINT "/etc/s6-linux-init/init" > root@25b2cb1a6998 / # cat /etc/s6-linux-init/init > #!/bin/execlineb -P > > /bin/export PATH "/usr/bin:/usr/sbin:/bin:/sbin" > /bin/cd / > s6-setsid -qb -- > umask 022 > if { s6-echo -- "\n init created by s6-linux-init-maker\n seehttp:// > skarnet.org/software/s6-linux-init/\n\n > <http://skarnet.org/software/s6-linux-init/%5Cn%5Cn>" } > #if { s6-mount -nwt tmpfs -o mode=0755 tmpfs "/run" } > if { s6-hiercopy "/etc/s6-linux-init"/run-image "/run" } > emptyenv -p > s6-envdir -I -- "/etc/s6-linux-init"/env > redirfd -r 0 /dev/null > redirfd -wnb 1 "/run"/service/s6-svscan-log/fifo > background > { > s6-setsid -- > redirfd -w 1 "/run"/service/s6-svscan-log/fifo > fdmove -c 1 2 > "/etc/rc.init" > } > unexport ! > cd "/run"/service > fdmove -c 2 1 > s6-svscan -st0 > > and the rc.init is > root@25b2cb1a6998 / # cat /etc/rc.init > #!/bin/sh > strace -o /tmp/str_s6-rc-init s6-rc-init -c /tmp/s6-rc-compiled-dir > /run/service > s6-rc change > > Currently I have put strace to trace s6-rc-init. > > Here is how my processes looks like when it hangs: > root 1 0 0 Jun04 ? 00:00:00 s6-svscan -st0 > root 7 1 0 Jun04 ? 00:00:00 /bin/sh /etc/rc.init > root 8 1 0 Jun04 ? 00:00:00 s6-supervise s6-svscan-log > nobody 9 8 0 Jun04 ? 00:00:00 s6-log -bp -- t > /run/uncaught-logs > root 10 7 0 Jun04 ? 00:00:00 strace -o > /tmp/str_s6-rc-init s6-rc-init -c /tmp/s6-rc-compiled-dir /run/service > root 13 10 0 Jun04 ? 00:00:00 s6-rc-init -c > /tmp/s6-rc-compiled-dir /run/service > root 14 13 0 Jun04 ? 00:00:00 s6-ftrigrd > .... > > You can see there is one additional process s6-ftrigrd. > > In the strace log you can see: > > .... > open("/run/s6-rc/scandir/.s6-svscan/control", O_WRONLY|O_NONBLOCK) = 6 > fcntl(6, F_GETFL) = 0x8801 (flags > O_WRONLY|O_NONBLOCK|O_LARGEFILE) > fcntl(6, F_SETFL, O_WRONLY|O_LARGEFILE) = 0 > write(6, "a", 1) = 1 > close(6) = 0 > ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975, > 950409000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left > {1152921504606846975, 909368723}) > recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0\f!s\0\0\0\4\0\0\0 > !s\0\0\0\4\0\0\0\"!s", 2047}, {NULL, 0}], msg_controllen=0, > msg_flags=MSG_CMSG_C > LOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 30 > recvmsg(5, 0x7ffd3eaca250, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = > -1 EAGAIN (Resource temporarily unavailable) > ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975, > 909072000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left > {1152921504606846975, 907791669}) > recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0\ > 0!s\0\0\0\4\0\0\0#!s\0\0\0\4\0\0\0!!s\0\0"..., 2018}, {"", 29}], > msg_controllen=0, msg_flags=MSG_C > MSG_CLOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 350 > ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975, > 907695000}, NULL, 8 > > it seems s6-rc-init is hanging and waiting for some event. > > The normal strace without hangs is > .... > open("/run/s6-rc/scandir/.s6-svscan/control", O_WRONLY|O_NONBLOCK) = 6 > fcntl(6, F_GETFL) = 0x8801 (flags > O_WRONLY|O_NONBLOCK|O_LARGEFILE) > fcntl(6, F_SETFL, O_WRONLY|O_LARGEFILE) = 0 > write(6, "a", 1) = 1 > close(6) = 0 > ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975, > 950277000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left > {1152921504606846975, 911900084}) > recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0\0!s", 2047}, > {NULL, 0}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, > MSG_DONTWAIT|MSG_WAITALL|M > SG_CMSG_CLOEXEC) = 10 > recvmsg(5, 0x7fff43f02640, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = > -1 EAGAIN (Resource temporarily unavailable) > ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975, > 911598000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left > {1152921504606846975, 910204953}) > recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0%! > s\0\0\0\4\0\0\0$!s\0\0\0\4\0\0\0\1!s\0\0"..., 2038}, {"", 9}], > msg_controllen=0, msg_flags=MSG_CM > SG_CLOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 290 > recvmsg(5, 0x7fff43f02640, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = > -1 EAGAIN (Resource temporarily unavailable) > ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975, > 910117000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left > {1152921504606846975, 909970613}) > recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0#! > s\0\0\0\4\0\0\0\t!s\0\0\0\4\0\0\0\6!s\0\0"..., 1748}, {"", 299}], > msg_controllen=0, msg_flags=MSG > _CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 80 > recvmsg(5, 0x7fff43f02640, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = > -1 EAGAIN (Resource temporarily unavailable) > close(4) = 0 > close(5) = 0 > wait4(14, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14 > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14, si_status=0, > si_utime=0, si_stime=1} --- > exit_group(0) = ? > +++ exited with 0 +++ > > And the processes are: > root 1 0 0 06:11 ? 00:00:00 s6-svscan -st0 > root 8 1 0 06:11 ? 00:00:00 s6-supervise s6-svscan-log > nobody 9 8 0 06:11 ? 00:00:00 s6-log -bp -- t > /run/uncaught-logs > ... > > > I have observed that the hangs of s6-rc-init happens more often when I > have more services. Currently I have 38 services and 2 bundles that group > these services > > Please help me to resolve this issue. > > Regards, > Vasil > > PS. do we actually need to run s6-rc-init on each boot as a first line in > the stage2 script? >
