My question is: why are you running Upstart? Runit has its own init, so Upstart is pointless. Runit's own binary should be maintaining runsv. It could also be that the run script is handling something improperly.
Sent from my Windows Phone
________________________________
From: Caleb Spare<mailto:[email protected]>
Sent: 7/25/2014 5:16 PM
To: [email protected]<mailto:[email protected]>
Subject: Rare runsv logging problem

Hi,

I've been using runit for a while now and it has been mostly wonderful. I'm noticing a persistent issue and I'm not sure how to debug it.

On the servers we're running Ubuntu and we use runit 2.1.1 via the default package that comes with the distro. Upstart runs runsvdir and we use runit to manage all of our application processes. Each application has a simple ./run and ./log/run; the latter execs svlogd (this is all a typical configuration, as I understand it).

The problem I'm seeing is that, very occasionally, runsv will get into a bad state where svlogd is not running. (I'm not sure if it fails to start svlogd or if this happens later on after it has been running properly.)

When the problem occurs, pstree shows something like this:

    runsvdir-+-runsv-+-foo---5*[{foo}]
             |       `-svlogd
             |-runsv-+-bar---21*[{bar}]
             |       `-svlogd
             `-runsv---baz---250*[{baz}]

Here you can see that the baz process does not have an associated svlogd process. Further:

    $ sudo sv s foo
    run: foo: (pid 4885) 526260s; run: log: (pid 875) 526517s
    $ sudo sv s baz
    run: baz: (pid 2337) 2983swarning: baz: unable to open supervise/ok: file does not exist
    ; run: log: (pid 2337) 2983s

Two strange things there: the warning about supervise/ok and also that the pid for 'log' is the same as for 'baz'.

When runsv is in this bad state, the output from baz goes right to runsvdir and ends up in /var/log/upstart/runsvdir.log.

The fix I've been using is to 'sv d baz' and then kill the offending runsv process. Runsvdir will quickly restart it and then everything will be working:

    runsvdir-+-runsv-+-foo---5*[{foo}]
             |       `-svlogd
             |-runsv-+-baz---25*[{baz}]
             |       `-svlogd
             `-runsv-+-bar---20*[{bar}]
                     `-svlogd

I'm unsure what causes this rare problem. We only do simple things with the runit: sv {t,d,u}. When we deploy services, we rsync a directory from elsewhere on the box into /etc/services/<name> and then 'sv t <name>'. That source dir only has ./run, ./finish, and ./log/run.

Any ideas of what we might be doing wrong, or how to otherwise avoid this issue? Or if not, what I could do to further debug?

Sorry for the long email; I wanted to be thorough in my description and avoid making assumptions about what could be causing this problem.

Thanks,
Caleb Spare
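
One of the symptoms in the 'sv s' output quoted above is that the 'log' pid equals the service pid. A rough way to watch for that might look like the sketch below. The function name log_shares_service_pid is made up for illustration and is not part of runit; it only compares the first two "(pid N)" fields in whatever status text it is given, and it assumes a grep that supports -o.

```shell
#!/bin/sh
# Hypothetical helper, not part of runit.
# Takes the text printed by `sv status` as its single argument and succeeds
# (exit 0) when the first two "(pid N)" fields carry the same number, which
# is the bad state described in the email above.
log_shares_service_pid() {
    # Pull out every "(pid N)" field, one per line, and keep only the digits.
    pids=$(printf '%s\n' "$1" | grep -o '(pid [0-9][0-9]*)' | tr -cd '0-9\n')
    service_pid=$(printf '%s\n' "$pids" | sed -n '1p')
    log_pid=$(printf '%s\n' "$pids" | sed -n '2p')
    # If fewer than two pids were found, log_pid is empty and this fails.
    [ -n "$service_pid" ] && [ "$service_pid" = "$log_pid" ]
}
```

It could be called with the combined output of sv, for example: log_shares_service_pid "$(sudo sv s baz 2>&1)" && echo "baz: log pid matches service pid".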
