On Fri, Jul 25, 2014 at 9:35 PM, James Powell <[email protected]> wrote:
> My question is why are you running Upstart? Runit has it's own init so
> Upstart is pointless.

Ubuntu uses upstart; this configuration is how runit is packaged in
that distro. Replacing PID 1 on our servers with runit is a large
effort I don't intend to undertake. We use various distro-packaged
services that are run by Upstart. But runit has been great for managing
our application processes thus far, modulo the issue I described.

> Runit's binary should maintain runsv. It also could depend on the run script
> also having an improper handling.

Can you explain what this means?

>
> Sent from my Windows Phone
> ________________________________
> From: Caleb Spare<mailto:[email protected]>
> Sent: 7/25/2014 5:16 PM
> To: [email protected]<mailto:[email protected]>
> Subject: Rare runsv logging problem
>
> Hi,
>
> I've been using runit for a while now and it has been mostly
> wonderful. I'm noticing a persistent issue and I'm not sure how to
> debug it.
>
> On the servers we're running Ubuntu and we use runit 2.1.1 via the
> default package that comes with the distro. Upstart runs runsvdir and
> we use runit to manage all of our application processes. Each
> application has a simple ./run and ./log/run; the latter execs svlogd
> (this is all a typical configuration, as I understand it).
>
> The problem I'm seeing is that, very occasionally, runsv will get into
> a bad state where svlogd is not running. (I'm not sure if it fails to
> start svlogd or if this happens later on after it has been running
> properly.) When the problem occurs, pstree shows something like this:
>
> runsvdir-+-runsv-+-foo---5*[{foo}]
>          |       `-svlogd
>          |-runsv-+-bar---21*[{bar}]
>          |       `-svlogd
>          `-runsv---baz---250*[{baz}]
>
> Here you can see that the baz process does not have an associated
> svlogd process. Further:
>
> $ sudo sv s foo
> run: foo: (pid 4885) 526260s; run: log: (pid 875) 526517s
> $ sudo sv s baz
> run: baz: (pid 2337) 2983swarning: baz: unable to open supervise/ok:
> file does not exist
> ; run: log: (pid 2337) 2983s
>
> Two strange things there: the warning about supervise/ok and also that
> the pid for 'log' is the same as for 'baz'.
>
> When runsv is in this bad state, the output from baz goes right to
> runsvdir and ends up in /var/log/upstart/runsvdir.log.
>
> The fix I've been using is to 'sv d baz' and then kill the offending
> runsv process. Runsvdir will quickly restart it and then everything
> will be working:
>
> runsvdir-+-runsv-+-foo---5*[{foo}]
>          |       `-svlogd
>          |-runsv-+-baz---25*[{baz}]
>          |       `-svlogd
>          `-runsv-+-bar---20*[{bar}]
>                  `-svlogd
>
> I'm unsure what causes this rare problem. We only do simple things
> with the runit: sv {t,d,u}. When we deploy services, we rsync a
> directory from elsewhere on the box into /etc/services/<name> and then
> 'sv t <name>'. That source dir only has ./run, ./finish, and
> ./log/run.
>
> Any ideas of what we might be doing wrong, or how to otherwise avoid
> this issue? Or if not, what I could do to further debug?
>
> Sorry for the long email; I wanted to be thorough in my description
> and avoid making assumptions about what could be causing this problem.
>
> Thanks,
> Caleb Spare
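
For reference, here is a rough sketch of how the symptom in the quoted
'sv s baz' output (the same pid reported for the service and for its
log) could be spotted automatically. It is only an illustration: the
function names same_pid_symptom and scan_services are made up for this
example, and it assumes that 'sv status' prints "(pid N)" once for the
service and once for the log, as in the output quoted above. It also
relies on 'grep -o', which GNU and BSD grep provide.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the status text reports
# the same pid for the service and for its log.
same_pid_symptom() {
    # Extract every "(pid N)" occurrence and keep only the digits,
    # one pid per line; the first is the service, the second the log.
    pids=$(printf '%s\n' "$1" | grep -o '(pid [0-9][0-9]*)' | tr -dc '0-9\n')
    main=$(printf '%s\n' "$pids" | sed -n 1p)
    log=$(printf '%s\n' "$pids" | sed -n 2p)
    [ -n "$main" ] && [ -n "$log" ] && [ "$main" = "$log" ]
}

# Hypothetical helper: run 'sv status' on each service directory under
# the directory given as $1 and print the ones showing the symptom.
scan_services() {
    for dir in "$1"/*; do
        [ -d "$dir" ] || continue
        status=$(sv status "$dir" 2>&1)
        if same_pid_symptom "$status"; then
            printf 'suspect: %s\n' "$dir"
        fi
    done
}
```

Calling scan_services with the directory that runsvdir watches would
list the services that look suspect. It only detects the state; it does
not explain or repair it.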
