On Thu, 04 Feb 2010 17:22:23 -0600
Larry Sheldon larryshel...@cox.net wrote:
On 2/4/2010 5:13 PM, Larry Sheldon wrote:
On 2/4/2010 3:30 PM, Scott Weeks wrote:
A recent organizational change at my company has put someone in charge
who is determined to make things perfect. We are a service
On 2/6/2010 8:12 AM, Mark Smith wrote:
On Thu, 04 Feb 2010 17:22:23 -0600
Larry Sheldon larryshel...@cox.net wrote:
Present the cost and the plan in a public forum or widely distributed
memorandum (including as a minimum everybody that was at the meeting and
everybody in the chain(s) of
A recent organizational change at my company has put someone in charge
who is determined to make things perfect. We are a service provider,
isn't a common occurrence, and the engineer in question has a pristine
track record.
This outage, of a high profile customer, triggered upper management
has dependencies then the main issue has to be resolved.
Hopefully, you've seen that all good things have a dark side,
--- On Thu, 2/4/10, Scott Weeks sur...@mauigateway.com wrote:
From: Scott Weeks sur...@mauigateway.com
Subject: Re: Mitigating human error in the SP
To: nanog@nanog.org
Date: Thursday, February 4, 2010, 10:30 PM
A recent organizational change at my company has put someone in charge
who is determined to make things perfect. We are a service provider
On 2/4/2010 3:30 PM, Scott Weeks wrote:
A recent organizational change at my company has put someone in charge
who is determined to make things perfect. We are a service provider,
isn't a common occurrence, and the engineer in question has a pristine
track record.
This outage, of a high
On 2/4/2010 5:13 PM, Larry Sheldon wrote:
On 2/4/2010 3:30 PM, Scott Weeks wrote:
A recent organizational change at my company has put someone in charge
who is determined to make things perfect. We are a service provider,
isn't a common occurrence, and the engineer in question has a pristine
Reminds me of the saying, nothing is foolproof given a sufficiently talented
fool. I do agree that checklists, peer reviews, parallel turnups, and lab
testing, when used and not jury-rigged, have helped me prepare for issues.
Usually the times I skipped those things are the times I kick myself for not
On Mon, Feb 01, 2010 at 09:46:07PM -0500, Stefan Fouant wrote:
Vijay Gill had some real interesting insights into this in a
presentation he gave back at NANOG 44:
http://www.nanog.org/meetings/nanog44/presentations/Monday/Gill_programatic_N44.pdf
His Blog article on Infrastructure is
On Wed, Feb 3, 2010 at 11:14 AM, Ross Vandegrift r...@kallisti.us wrote:
On Mon, Feb 01, 2010 at 09:46:07PM -0500, Stefan Fouant wrote:
Vijay Gill had some real interesting insights into this in a
presentation he gave back at NANOG 44:
3) Automation interfaces are largely unsupported:
CLI is an automation interface. Combine that with a management server
from which telnet sessions to the router can be managed, and you have
probably the lowest risk automation interface possible. This may force
you into building your own tools,
You can completely implement Vijay's most impressive stuff and simply
move the problem to a different level of abstraction.
No matter what you do, it still comes down to some geek banging on
some plastic thingy. I'm as likely to screw up an Extensible
Entity-Attribute-Relationship as I am an
Otherwise, as Suresh notes, the only way to eliminate human error completely
is to eliminate the presence of humans in the activity.
and, hence by reference.
Automated config deployment / provisioning.
That's the funniest thing I've read all day... ;-)
A little pessimistic rant ;-)
On Tue, 2010-02-02 at 12:26 +, gb10hkzo-na...@yahoo.co.uk wrote:
Nothing in the IT / ISP / Telco world is ever going to be perfect,
far too complex with many dependencies. Yes you might play in your
perfect little labs until the cows come home, but there always
has been and
On Mon, 1 Feb 2010 21:21:52 -0500
Chadwick Sorrell mirot...@gmail.com wrote:
Hello NANOG,
Long time listener, first time caller.
A recent organizational change at my company has put someone in charge
who is determined to make things perfect. We are a service provider,
not an enterprise
On 02/02/2010 02:21, Chadwick Sorrell wrote:
This outage, of a high profile customer, triggered upper management to
react by calling a meeting just days after. Put bluntly, we've been
told Human errors are unacceptable, and they will be completely
eliminated. One is too many.
Leaving the
Humans make errors.
For your upper management to think they can build a foundation of reliability
on the theory that humans won't make errors is self-deceiving.
But that isn't where the story ends. That's where it begins. Your
infrastructure, processes and tools should all be designed
On Tue, Feb 2, 2010 at 9:09 AM, Paul Corrao pcor...@voxeo.com wrote:
Humans make errors.
For your upper management to think they can build a foundation of
reliability on the theory that humans won't make errors is self-deceiving.
But that isn't where the story ends. That's where it
On 2/2/2010 6:26 AM, gb10hkzo-na...@yahoo.co.uk wrote:
Otherwise, as Suresh notes, the only way to eliminate human error
completely is to eliminate the presence of humans in the
activity.
and, hence by reference.
Automated config deployment / provisioning.
That's the funniest thing
We have solved 98% of this with standard configurations and templates.
To deviate from this requires management approval/exception approval after an
evaluation of the business risks.
Automation of config building is not too hard, and certainly things like
peer-groups (cisco) and regular groups
On Feb 2, 2010, at 9:33 AM, Jared Mauch wrote:
We have solved 98% of this with standard configurations and templates.
To deviate from this requires management approval/exception approval
after an evaluation of the business risks.
I would also point Chad to this book: http://bit.ly/cShEIo
Chadwick Sorrell wrote:
This outage, of a high profile customer, triggered upper management to
react by calling a meeting just days after. Put bluntly, we've been
told Human errors are unacceptable, and they will be completely
eliminated. One is too many.
Good, Fast, Cheap - pick any two.
On Tue, Feb 2, 2010 at 12:45 PM, James Downs james.do...@egontech.com wrote:
On Feb 2, 2010, at 9:33 AM, Jared Mauch wrote:
We have solved 98% of this with standard configurations and templates.
To deviate from this requires management approval/exception approval after
an evaluation of the
On 2/2/2010 11:33 AM, Jared Mauch wrote:
We have solved 98% of this with standard configurations and
templates.
To deviate from this requires management approval/exception approval
after an evaluation of the business risks.
Automation of config building is not too hard, and certainly things
Thanks for all the comments!
On Tue, Feb 2, 2010 at 1:01 PM, JC Dill jcdill.li...@gmail.com wrote:
Chadwick Sorrell wrote:
This outage, of a high profile customer, triggered upper management to
react by calling a meeting just days after. Put bluntly, we've been
told Human errors are
Automated config deployment / provisioning. And sanity checking
before deployment.
Easy to say, not so easy to do. For instance, that incorrect port was identified
by a number or name. Theoretically, if an automated tool pulls the number/name
from a database and issues the command, then the
Never said it was, and never said foolproof either. Minimizing the
chance of error is what I'm after - and ssh'ing in + hand typing
configs isn't the way to go.
Use a known good template to provision stuff - and automatically
deploy it, and the chances of human error go down quite a lot. Getting
The actual error happened when someone was troubleshooting a turn-up,
where in the past the customer in question has had their ethertype set
wrong. It wasn't a provisioning problem as much as someone
troubleshooting why it didn't come up with the customer. Ironically,
the NOC was on the
If your manager pretends that they can manage humans without a few
well-worn human factor books on their shelf, quit.
David
On Tue, Feb 2, 2010 at 5:36 PM, Michael Dillon
wavetos...@googlemail.com wrote:
The actual error happened when someone was troubleshooting a turn-up,
where in
On Feb 2, 2010, at 8:36 PM, Suresh Ramasubramanian wrote:
Never said it was, and never said foolproof either. Minimizing the
chance of error is what I'm after - and ssh'ing in + hand typing
configs isn't the way to go.
Use a known good template to provision stuff - and automatically
On Tue, Feb 2, 2010 at 7:51 AM, Chadwick Sorrell mirot...@gmail.com wrote:
This outage, of a high profile customer, triggered upper management to
react by calling a meeting just days after. Put bluntly, we've been
told Human errors are unacceptable, and they will be completely
eliminated.
On Feb 2, 2010, at 10:28 AM, Suresh Ramasubramanian wrote:
Automated config deployment / provisioning. And sanity checking before
deployment.
A lab in which changes can be simulated and rehearsed ahead of time, new OS
revisions tested, etc.
A DCN.
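As a rough illustration of the "known good template plus sanity checking before deployment" idea raised in this thread, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the template text, the port and customer names, the `INVENTORY` dictionary, and the `sanity_check` and `build_config` helpers do not come from any poster's tooling or any vendor's CLI.

```python
from string import Template

# Hypothetical known-good template; the command syntax is illustrative only.
PORT_TEMPLATE = Template(
    "interface $port\n"
    " description $customer\n"
    " no shutdown\n"
)

# Hypothetical inventory: which customer is expected on which port.
INVENTORY = {"ge-0/0/1": "customer-a", "ge-0/0/2": "customer-b"}


def sanity_check(port, customer, inventory):
    """Return a list of problems; an empty list means the change may proceed."""
    problems = []
    if port not in inventory:
        problems.append(f"unknown port {port}")
    elif inventory[port] != customer:
        problems.append(
            f"port {port} belongs to {inventory[port]}, not {customer}"
        )
    return problems


def build_config(port, customer, inventory):
    """Render the template only after the sanity check passes."""
    problems = sanity_check(port, customer, inventory)
    if problems:
        raise ValueError("; ".join(problems))
    return PORT_TEMPLATE.substitute(port=port, customer=customer)
```

A check like this only catches disagreement between the request and the inventory. If the inventory itself is wrong, the tool will render the wrong change without complaint, which is the objection raised elsewhere in the thread.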
-Original Message-
From: Suresh Ramasubramanian [mailto:ops.li...@gmail.com]
Sent: Monday, February 01, 2010 9:29 PM
To: Chadwick Sorrell
Cc: nanog@nanog.org
Subject: Re: Mitigating human error in the SP
On Tue, Feb 2, 2010 at 7:51 AM, Chadwick Sorrell mirot...@gmail.com
On 2/1/2010 6:21 PM, Chadwick Sorrell wrote:
Any other comments on the subject would be appreciated, we would like
to come to our next meeting armed and dangerous.
If upper management believes humans can be required to make no errors, ask
whether they have achieved that ideal for
I'll say as Vijay Gill notes after Stefan posted those two very
interesting links. He's saying much the same that I did - in a great
deal more detail. Fascinating.
http://www.nanog.org/meetings/nanog44/presentations/Monday/Gill_programatic_N44.pdf
His Blog article on Infrastructure is