Thanks Nate, this is exactly what I was looking for. One more question: does Puppet have any mechanism for monitoring service daemons and restarting them when they fail or crash catastrophically? How do others in the Bigtop world deal with high availability and make sure processes are restarted when they terminate unexpectedly? Does anyone have this kind of need?
On 12/11/14, 12:26 AM, "Nate D'Amico" <[email protected]> wrote:

>Guess breaking this into two items:
>
>-detecting a failed puppet run when triggered via script/external apply
>-how many times to retry
>
>For the former, you could try using "--detailed-exitcodes", which should
>force a non-zero exit code; your script could detect that and act
>accordingly. I remember seeing a bug a while back that mentioned you needed
>to assert that param on apply to force puppet to return non-zero on
>error. Not sure if it still exists, or what version you are running, but
>it's probably safe to try.
>
>As far as the number of retries, all apps/services/etc. could be different.
>The only specific point of view I would offer: given the puppet apply has all
>the data/attributes it needs to successfully converge, after two failed
>attempts you can safely assume it failed, and then resort to a log check to
>see what the issue could be.
>
>One other aspect to consider is that the puppet converge could succeed
>but something outside causes a failure right after. Depending on
>resiliency, you would want your process/other monitor to assert after a
>successful run, and restart the whole converge run again, or just
>notify, etc.
>
>Does that help?
>
>
>-----Original Message-----
>From: Konstantin Boudnik [mailto:[email protected]]
>Sent: Wednesday, December 10, 2014 4:08 PM
>To: [email protected]
>Cc: [email protected]; Nate D'Amico; Rich
>Subject: Re: Problem using puppet scripts to configure bigtop on
>AmazonLinux
>
>Rob,
>
>following on our IRC chat I will Cc here two guys from the community who
>know Puppet the best. Nate and Rich are likely to have the answer. Guys,
>if you can chime in on the topic, it'd be great!
>
>To reiterate: you are looking for a way to automatically tell if a
>recipe has failed and repeat it, if required, right?
>
>On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
>> Thanks Cos,
>>
>> This would be something that I would want to automate, as it would be
>> running many times across many different clusters. Ideally I would fix
>> any issues causing the puppet scripts to not complete properly, but I
>> don't know how realistic that is in the short term, so I would like to
>> set up retry logic if that is the recommended way of doing things.
>> That's why I was hoping for some direction on how many times to run the
>> retry.
>>
>> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <[email protected]> wrote:
>>
>> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
>> >> Thanks Roman,
>> >>
>> >> I actually fixed the problem. I had an existing process monitoring
>> >> the daemon and restarting it if it terminated. However, puppet
>> >> encapsulates this, so it is no longer needed. Also, this process was
>> >> causing the namenode service to terminate once. I removed my
>> >> existing monitoring process and everything is working fine.
>> >>
>> >> That being said, is there a recommended number of times we should
>> >> retry the puppet scripts on failure?
>> >
>> >Good to see you're coming through! As for the retries: if something
>> >doesn't work I usually check the logs immediately. Sometimes after a
>> >second re-run.
>> >
>> >Cos
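Nate's suggestions in this thread (detect a failed run via `--detailed-exitcodes`, treat two failed attempts as a real failure, and optionally re-check after a successful converge) can be combined into a small retry wrapper. What follows is a minimal Python sketch under those assumptions, not something that ships with Bigtop: `puppet_apply`, `converge`, and the `health_check` callback are hypothetical names. The exit-code meanings follow the Puppet documentation for `puppet apply --detailed-exitcodes` (0 and 2 mean the run succeeded; 1, 4, and 6 indicate failures).

```python
import subprocess
from typing import Callable, Optional, Sequence

# Exit codes for `puppet apply --detailed-exitcodes`:
#   0 = succeeded, no changes; 2 = succeeded, some resources changed;
#   1 = run failed; 4 = some resources failed; 6 = changes and failures.
SUCCESS_CODES = {0, 2}


def puppet_apply(manifest: str, extra_args: Sequence[str] = ()) -> int:
    """Run one `puppet apply` pass and return its detailed exit code.

    Hypothetical helper: pass whatever extra flags (e.g. a module path)
    your deployment needs via extra_args.
    """
    cmd = ["puppet", "apply", "--detailed-exitcodes", *extra_args, manifest]
    return subprocess.run(cmd).returncode


def converge(
    run_once: Callable[[], int],
    max_attempts: int = 2,
    health_check: Optional[Callable[[], bool]] = None,
) -> bool:
    """Re-run a converge until it succeeds (and, if given, passes a
    post-run health check), giving up after max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        if code in SUCCESS_CODES:
            # Puppet converged; optionally verify nothing broke right after.
            if health_check is None or health_check():
                return True
            print(f"attempt {attempt}: converged but health check failed")
        else:
            print(f"attempt {attempt}: puppet exited with code {code}")
    # Out of attempts: time to go and read the logs.
    return False
```

For example, `converge(lambda: puppet_apply("/path/to/site.pp"))` would make at most two attempts; the manifest path is a placeholder for your own. Keeping the runner injectable (rather than hard-coding the `puppet` call inside `converge`) makes the retry policy easy to test, and passing a `health_check` callable covers Nate's last point: if something fails right after an otherwise successful run, the whole converge is simply run again.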
