Re: [Discuss] automatic daemon restarts

2014-09-17 Thread Derek Martin
On Wed, Sep 17, 2014 at 03:14:10AM +, Edward Ned Harvey (blu) wrote: My production server that costs me $10k/minute for downtime is actually not one server, or located in one datacenter. If you're interested in solving problems at scale, talk to me about Akamai, this is basically all we

Re: [Discuss] automatic daemon restarts

2014-09-17 Thread Bill Ricker
On Wed, Sep 17, 2014 at 11:42 AM, Derek Martin inva...@pizzashack.org wrote: The point is, this is not one-size-fits-all. Right. One-size-fits-all, silver-bullets, and golden-hammers are in the same class as Unicorns and snake-oil. Your mileage WILL vary, as your circumstances vary. ( Some of us

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Edward Ned Harvey (blu)
From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Tom Metro Richard Pieri wrote: Edward Ned Harvey (blu) wrote: An active system will notice mysqld died, recognize that it's not supposed to do that right now, and restart it.

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Michael Tiernan
I'd like to interject a singular term here between these parties and that term is 'caveat'. On 9/16/14 7:42 AM, Edward Ned Harvey (blu) wrote: I would rather receive notification that a production service was *restarted* rather than *is down* *If* and only *IF* it is a restart process that *I*

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Jason Normand
Like in many things, it depends on the application. We recently were considering doing the same thing with the puppet agent. The agent will occasionally get stuck due to network errors and require a restart. This is really a bug in the application, but a fix is not expected any time soon. In

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Richard Pieri
On 9/16/2014 9:12 AM, Jason Normand wrote: this also assuming the monitoring and restart system is intelligent enough to not fall into a rapid fail restart loop. Precisely. I /did/ write blind restarts, didn't I? Yes, I did: Blind restarts can obfuscate this information, can cause damage to
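The difference between a blind restart and one that avoids a rapid fail/restart loop can be sketched roughly as below. This is an illustration only, not code posted by anyone in the thread: the names RestartLimiter and supervise, and the limit of 3 restarts per 60 seconds, are assumptions chosen for the example. The idea is simply to count recent restarts in a sliding time window and, once the limit is hit, stop restarting and alert a human instead.

```python
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window_seconds span.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """

    def __init__(self, max_restarts=3, window_seconds=60.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._stamps = deque()  # times of recent restarts, oldest first

    def allow(self, now):
        """Return True (and record a restart) if one is permitted at time now."""
        # Forget restarts that have aged out of the sliding window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False  # restart loop suspected: leave it down, tell a human
        self._stamps.append(now)
        return True


def supervise(crash_times, limiter):
    """Replay crash timestamps (in seconds) and return the action for each."""
    return ["restart" if limiter.allow(t) else "alert" for t in crash_times]


if __name__ == "__main__":
    # Four crashes in quick succession, then one much later.
    print(supervise([0, 1, 2, 3, 100], RestartLimiter(3, 60.0)))
```

In a real monitor the timestamps would come from a clock such as time.monotonic(), and the "alert" branch is where the daemon would be left down for diagnosis rather than restarted again.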

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Derek Martin
On Tue, Sep 16, 2014 at 11:42:51AM +, Edward Ned Harvey (blu) wrote: I would rather receive notification that a production service was *restarted* rather than *is down* Richard wants to say that's stupid. I not only disagree, I think Richard's position is insulting and ignorantly

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Edward Ned Harvey (blu)
From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Derek Martin 1. An attacker of your site is able to exploit a vulnerability to upload a custom malicious loadable module for your managed service, but cannot otherwise gain

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Gordon Marx
On Sep 16, 2014, at 7:00 PM, Edward Ned Harvey (blu) b...@nedharvey.com wrote: You receive notification that your production server is down, and your customers are being unserved and your business is losing $10k per minute. Are you going to checksum all of your system binaries before

Re: [Discuss] automatic daemon restarts

2014-09-16 Thread Edward Ned Harvey (blu)
From: Gordon Marx [mailto:gcm...@gmail.com] On Sep 16, 2014, at 7:00 PM, Edward Ned Harvey (blu) b...@nedharvey.com wrote: You receive notification that your production server is down, and your customers are being unserved and your business is losing $10k per minute. Are you going to

Re: [Discuss] automatic daemon restarts

2014-09-15 Thread Tom Metro
Richard Pieri wrote: Edward Ned Harvey (blu) wrote: An active system will notice mysqld died, recognize that it's not supposed to do that right now, and restart it. Which is a stupid way to run in production. There's a reason why the daemon died. That reason needs to be identified so that

Re: [Discuss] automatic daemon restarts

2014-09-15 Thread Richard Pieri
On 9/15/2014 4:15 PM, Tom Metro wrote: Not to say your points are invalid, but Netflix would disagree with you. They created a testing tool that intentionally kills random services on their production systems just to test that automated recovery works correctly. Netflix is a highly available