On 07/06/2016 13:23, Vasil Yordanov wrote:
Unfortunately I'm not able to reproduce the problem. I decided to upgrade Docker from 1.9 to 1.10, because 1.9 does not support tmpfs mounts. After that upgrade, s6-rc-init no longer hangs. I even wrote a script to test it continuously. For now it works for me, but if this happens again in the future I will capture a more detailed strace to trace the forks.
Thanks, I would appreciate that. I've also been unable to reproduce the hang; I have triple-checked the code path and it definitely looks correct, so I have no idea what's happening.

s6-rc-init, among other things, copies service directories from the service database to the live working copy (adding ./down files to the copies because the services start down). It starts a s6-ftrigrd process to listen for "s6-supervise has started" notifications on all the service directories, then it links the servicedirs into the scandir and sends a "rescan" message to s6-svscan. At this point s6-svscan sees the new servicedirs and spawns s6-supervise processes on them. When a s6-supervise starts, it notifies the listening s6-ftrigrd process. s6-rc-init waits until a notification has been received from every service directory, then exits.

When you get a hang, it means s6-rc-init is not receiving all the notifications: one or more are missing, and s6-rc-init will not exit until it has received them all. Either a s6-supervise process is failing to start, or it had already started by the time s6-ftrigrd began listening. But since it's boot time, the scandir is empty, so the latter shouldn't be possible (unless you have given the same name to one of your early services and one of your s6-rc longruns - don't do that!)

If you have kept logs from your catch-all logger from the time s6-rc-init hanged, it would be a good idea to check them: one s6-supervise might be dying repeatedly, and the catch-all logs would say so.

-- 
Laurent
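To illustrate why a single missing notification produces an indefinite hang rather than an error: the wait step completes only when every expected service directory has reported. Here is a minimal sketch of that logic in Python (purely illustrative; the real s6-rc-init is C and uses libftrigr, and the service names below are made up):

```python
# Hypothetical simulation of s6-rc-init's final wait: startup events
# arrive on a queue, and we return only once every expected servicedir
# has reported. One missing event means we block forever, which is the
# observed hang. A timeout is added here only so the failure case is
# demonstrable; the real tool has no such timeout.

import queue

def wait_for_all(expected, events, timeout=None):
    """Block until a startup event has been seen for every name in
    `expected`. Returns the set of services still missing if `timeout`
    expires, or an empty set on success."""
    pending = set(expected)
    while pending:
        try:
            name = events.get(timeout=timeout)
        except queue.Empty:
            return pending          # these services never notified
        pending.discard(name)       # duplicate/unknown events are ignored
    return pending

# Normal case: all three supervisors notify, so the wait completes.
q = queue.Queue()
for svc in ("sshd", "getty", "syslogd"):
    q.put(svc)
assert wait_for_all(["sshd", "getty", "syslogd"], q, timeout=0.1) == set()

# Failure case: "syslogd" never notifies. Without the timeout this call
# would block forever, just as s6-rc-init does when one s6-supervise
# fails to start or its notification is lost.
q = queue.Queue()
q.put("sshd")
q.put("getty")
missing = wait_for_all(["sshd", "getty", "syslogd"], q, timeout=0.1)
assert missing == {"syslogd"}
```

This is why checking the catch-all logs is the right move: the wait itself gives no diagnostic, so the identity of the non-reporting service has to be found elsewhere.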
