On Wed, Sep 17, 2014 at 03:14:10AM +, Edward Ned Harvey (blu) wrote:
My production server that costs me $10k/minute for downtime is actually not
one server, or located in one datacenter. If you're interested in solving
problems at scale, talk to me about Akamai, this is basically all we
On Wed, Sep 17, 2014 at 11:42 AM, Derek Martin inva...@pizzashack.org wrote:
The point is, this is not one-size-fits-all.
Right. One-size-fits-all, silver-bullets, and golden-hammers are in
the same class as Unicorns and snake-oil. Your mileage WILL vary, as your
circumstances vary.
( Some of us
From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-
bounces+blu=nedharvey@blu.org] On Behalf Of Tom Metro
Richard Pieri wrote:
Edward Ned Harvey (blu) wrote:
An active system will notice mysqld died, recognize that it's not
supposed to do that right now, and restart it.
I'd like to interject a singular term here between these parties and
that term is 'caveat'.
On 9/16/14 7:42 AM, Edward Ned Harvey (blu) wrote:
I would rather receive notification that a production service was
*restarted* rather than *is down*
*If* and only *IF* it is a restart process that *I*
As in many things, it depends on the application. We recently were
considering doing the same thing with the puppet agent. The agent will
occasionally get stuck due to network errors and require a restart. This
is really a bug in the application, but a fix is not expected any time
soon. in
On 9/16/2014 9:12 AM, Jason Normand wrote:
This is also assuming the monitoring and restart system is intelligent enough
to not fall into a rapid fail restart loop.
Precisely. I /did/ write blind restarts, didn't I? Yes, I did:
Blind restarts can obfuscate this information, can cause
damage to
On Tue, Sep 16, 2014 at 11:42:51AM +, Edward Ned Harvey (blu) wrote:
I would rather receive notification that a production service was
*restarted* rather than *is down*
Richard wants to say that's stupid. I not only disagree, I think
Richard's position is insulting and ignorantly
From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-
bounces+blu=nedharvey@blu.org] On Behalf Of Derek Martin
1. An attacker of your site is able to exploit a vulnerability to
upload a custom malicious loadable module for your managed service,
but can not otherwise gain
On Sep 16, 2014, at 7:00 PM, Edward Ned Harvey (blu) b...@nedharvey.com
wrote:
You receive notification that your production server is down, and your
customers are being unserved and your business is losing $10k per minute.
Are you going to checksum all of your system binaries before
From: Gordon Marx [mailto:gcm...@gmail.com]
On Sep 16, 2014, at 7:00 PM, Edward Ned Harvey (blu)
b...@nedharvey.com wrote:
You receive notification that your production server is down, and your
customers are being unserved and your business is losing $10k per minute.
Are you going to
Richard Pieri wrote:
Edward Ned Harvey (blu) wrote:
An active system will notice mysqld died, recognize that it's not
supposed to do that right now, and restart it.
Which is a stupid way to run in production. There's a reason why the
daemon died. That reason needs to be identified so that
On 9/15/2014 4:15 PM, Tom Metro wrote:
Not to say your points are invalid, but Netflix would disagree with you.
They created a testing tool that intentionally kills random services on
their production systems just to test that automated recovery works
correctly.
Netflix is a highly available