Hi Michael,

Thanks so much for wrestling this one to the ground. I've been out for a couple days but an engineer on our team has pulled your update and is using it successfully.

Best,
Marc

On Thu, Mar 20, 2014 at 07:57:29PM +0100, Michael G. Noll wrote:
> Thanks for bringing this up, Marc.
>
> Yes, I can confirm your problem. And I can confirm this for both
> supervisor-3.0b2 (beta) and supervisor-3.0 (stable release).
>
> I summarized the details here:
> https://github.com/miguno/puppet-zookeeper/issues/1
>
> After some back-and-forth testing I learned that the correct fix (or
> maybe I should call it a workaround, as it could be a bug in supervisord
> actually) involves the use of /both/:
>
> - `stopasgroup` must be set to true
> - `trap "kill -- -$$" EXIT` must be added to /usr/bin/zookeeper-server
>
> Using only one of the two is not enough. See the link above for what
> breaks in each case if you still try.
>
> The problem is fixed in puppet-zookeeper 1.0.4 and in the latest
> (master/trunk) version of Wirbelsturm.
>
> Hope this helps!
> Michael
>
>
>
> On 03/19/2014 07:18 PM, Marc Vaillant wrote:
> > Hi Michael,
> >
> > Thanks very much for your hard work on this, your puppet scripts have
> > been very helpful. We are having a specific issue with supervision of
> > zookeeper and I wonder if you have encountered something similar or if
> > we are doing something wrong. Even with the stopasgroup=true
> > supervisord option, there still seems to be a problem with orphaned
> > child processes when the parent process (zookeeper-server script) goes
> > down from an external event. Although running "supervisorctl stop
> > zookeeper" will take down the zookeeper-server script and its child
> > processes, issuing a "killall zookeeper-server" will take down only the
> > script, leaving the child processes running. This sends supervisord
> > into an infinite loop of attempting to restart zookeeper, but failing
> > because the child processes are still alive and occupying the required
> > ports.
> >
> > A fix we've found (refer to the last answer here
> > http://stackoverflow.com/questions/9090683/supervisord-stopping-child-processesis)
> > is to put
> >
> >     trap "kill -- -$$" EXIT
> >
> > at the top of the zookeeper-server script. However, it seems like the
> > stopasgroup=true setting was designed to handle this case. I know that
> > stopasgroup was part of the 3.0b2 (05.28.2013) release of supervisord. We
> > are using the 3.0 (07.30.2013) release from your RPM
> > https://github.com/miguno/wirbelsturm-rpm-supervisord so I believe it
> > should be available.
> >
> > Thanks,
> > Marc
> >
> >
> > On Mon, Mar 17, 2014 at 09:02:11PM +0100, Michael G. Noll wrote:
> >> Hi everyone,
> >>
> >> I have released a tool called Wirbelsturm
> >> (https://github.com/miguno/wirbelsturm) that allows you to perform local
> >> and remote deployments of Storm. It's also a small way of saying a big
> >> "thank you" to the Storm community.
> >>
> >> Wirbelsturm uses Vagrant for creating and managing machines, and Puppet
> >> for provisioning the machines once they're up and running. You can also
> >> use Ansible to interact with deployed machines. Deploying Storm is but
> >> one example, of course -- you can deploy other software with Wirbelsturm
> >> as well (e.g. Graphite, Kafka, Redis, ZooKeeper).
> >>
> >> I also wrote a quick intro and behind-the-scenes blog post at [1], which
> >> covers, for instance, the motivation behind building Wirbelsturm and
> >> lessons learned along the way (read: mistakes made :-P).
> >>
> >> Enjoy!
> >> Michael
> >>
> >>
> >> [1]
> >> http://www.michael-noll.com/blog/2014/03/17/wirbelsturm-one-click-deploy-storm-kafka-clusters-with-vagrant-puppet/
> >>
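
For reference, the supervisord half of the fix described in this thread can be sketched as a program section like the one below. This is only a minimal sketch, not the configuration that puppet-zookeeper ships: the section name and the bare `command` line (with no arguments) are assumptions made for illustration. `stopasgroup` and `killasgroup` are existing supervisord options. As the thread notes, this setting alone is not enough; the `trap "kill -- -$$" EXIT` line in /usr/bin/zookeeper-server is still needed for the case where only the script is killed from outside supervisord.

```ini
; Hypothetical program section; the name and the command line are assumptions.
[program:zookeeper]
command=/usr/bin/zookeeper-server
autorestart=true
; Send the stop signal to the whole process group, not just the script.
stopasgroup=true
; If supervisord has to fall back to SIGKILL, send it to the whole group too.
killasgroup=true
```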
