Re: Mitigating human error in the SP

2010-02-06 Thread Mark Smith
On Thu, 04 Feb 2010 17:22:23 -0600 Larry Sheldon larryshel...@cox.net wrote: On 2/4/2010 5:13 PM, Larry Sheldon wrote: On 2/4/2010 3:30 PM, Scott Weeks wrote: A recent organizational change at my company has put someone in charge who is determined to make things perfect. We are a service

Re: Mitigating human error in the SP

2010-02-06 Thread Larry Sheldon
On 2/6/2010 8:12 AM, Mark Smith wrote: On Thu, 04 Feb 2010 17:22:23 -0600 Larry Sheldon larryshel...@cox.net wrote: Present the cost and the plan in a public forum or widely distributed memorandum (including as a minimum everybody that was at the meeting and everybody in the chain(s) of

Re: Mitigating human error in the SP

2010-02-04 Thread Scott Weeks
A recent organizational change at my company has put someone in charge who is determined to make things perfect. We are a service provider, isn't a common occurrence, and the engineer in question has a pristine track record. This outage, of a high profile customer, triggered upper management

Re: Mitigating human error in the SP

2010-02-04 Thread isabel dias
has dependencies then the main issue has to be resolved. Hopefully, you've seen that all good things have a dark side, --- On Thu, 2/4/10, Scott Weeks sur...@mauigateway.com wrote: From: Scott Weeks sur...@mauigateway.com Subject: Re: Mitigating human error in the SP To: nanog@nanog.org

Re: Mitigating human error in the SP

2010-02-04 Thread Scott Weeks
From: Scott Weeks sur...@mauigateway.com Subject: Re: Mitigating human error in the SP To: nanog@nanog.org Date: Thursday, February 4, 2010, 10:30 PM A recent organizational change at my company has put someone in charge who is determined to make things perfect. We are a service provider

Re: Mitigating human error in the SP

2010-02-04 Thread Larry Sheldon
On 2/4/2010 3:30 PM, Scott Weeks wrote: A recent organizational change at my company has put someone in charge who is determined to make things perfect. We are a service provider, isn't a common occurrence, and the engineer in question has a pristine track record. This outage, of a high

Re: Mitigating human error in the SP

2010-02-04 Thread Larry Sheldon
On 2/4/2010 5:13 PM, Larry Sheldon wrote: On 2/4/2010 3:30 PM, Scott Weeks wrote: A recent organizational change at my company has put someone in charge who is determined to make things perfect. We are a service provider, isn't a common occurrence, and the engineer in question has a pristine

Re: Mitigating human error in the SP

2010-02-03 Thread Brian Raaen
Reminds me of the saying, "nothing is foolproof given a sufficiently talented fool." I do agree that checklists, peer reviews, parallel turnups, and lab testing, when used and not jury-rigged, have helped me prepare for issues. Usually the times I skipped those things are the times I kick myself for not

Re: Mitigating human error in the SP

2010-02-03 Thread Ross Vandegrift
On Mon, Feb 01, 2010 at 09:46:07PM -0500, Stefan Fouant wrote: Vijay Gill had some real interesting insights into this in a presentation he gave back at NANOG 44: http://www.nanog.org/meetings/nanog44/presentations/Monday/Gill_programatic_N44.pdf His Blog article on Infrastructure is

Re: Mitigating human error in the SP

2010-02-03 Thread Christopher Morrow
On Wed, Feb 3, 2010 at 11:14 AM, Ross Vandegrift r...@kallisti.us wrote: On Mon, Feb 01, 2010 at 09:46:07PM -0500, Stefan Fouant wrote: Vijay Gill had some real interesting insights into this in a presentation he gave back at NANOG 44:

Re: Mitigating human error in the SP

2010-02-03 Thread Michael Dillon
3) Automation interfaces are largely unsupported: CLI is an automation interface. Combine that with a management server from which telnet sessions to the router can be managed, and you have probably the lowest risk automation interface possible. This may force you into building your own tools,

Re: Mitigating human error in the SP

2010-02-03 Thread David Hiers
You can completely implement Vijay's most impressive stuff and simply move the problem to a different level of abstraction. No matter what you do, it still comes down to some geek banging on some plastic thingy. I'm as likely to screw up an Extensible Entity-Attribute-Relationship as I am an

Re: Mitigating human error in the SP

2010-02-02 Thread gb10hkzo-nanog
Otherwise, as Suresh notes, the only way to eliminate human error completely is to eliminate the presence of humans in the activity. and, hence by reference. Automated config deployment / provisioning. That's the funniest thing I've read all day... ;-) A little pessimistic rant ;-)

Re: Mitigating human error in the SP

2010-02-02 Thread gordon b slater
On Tue, 2010-02-02 at 12:26 +, gb10hkzo-na...@yahoo.co.uk wrote: Nothing in the IT / ISP / Telco world is ever going to be perfect, far too complex with many dependencies. Yes you might play in your perfect little labs until the cows come home . but there always has been and

Re: Mitigating human error in the SP

2010-02-02 Thread Mark Smith
On Mon, 1 Feb 2010 21:21:52 -0500 Chadwick Sorrell mirot...@gmail.com wrote: Hello NANOG, Long time listener, first time caller. A recent organizational change at my company has put someone in charge who is determined to make things perfect. We are a service provider, not an enterprise

Re: Mitigating human error in the SP

2010-02-02 Thread Nick Hilliard
On 02/02/2010 02:21, Chadwick Sorrell wrote: This outage, of a high profile customer, triggered upper management to react by calling a meeting just days after. Put bluntly, we've been told Human errors are unacceptable, and they will be completely eliminated. One is too many. Leaving the

Re: Mitigating human error in the SP

2010-02-02 Thread Paul Corrao
Humans make errors. For your upper management to think they can build a foundation of reliability on the theory that humans won't make errors is self-deceiving. But that isn't where the story ends. That's where it begins. Your infrastructure, processes and tools should all be designed

Re: Mitigating human error in the SP

2010-02-02 Thread Chadwick Sorrell
On Tue, Feb 2, 2010 at 9:09 AM, Paul Corrao pcor...@voxeo.com wrote: Humans make errors. For your upper management to think they can build a foundation of reliability on the theory that humans won't make errors is self-deceiving. But that isn't where the story ends. That's where it

Re: Mitigating human error in the SP

2010-02-02 Thread Larry Sheldon
On 2/2/2010 6:26 AM, gb10hkzo-na...@yahoo.co.uk wrote: Otherwise, as Suresh notes, the only way to eliminate human error completely is to eliminate the presence of humans in the activity. and, hence by reference. Automated config deployment / provisioning. That's the funniest thing

Re: Mitigating human error in the SP

2010-02-02 Thread Jared Mauch
We have solved 98% of this with standard configurations and templates. To deviate from this requires management approval/exception approval after an evaluation of the business risks. Automation of config building is not too hard, and certainly things like peer-groups (cisco) and regular groups

Re: Mitigating human error in the SP

2010-02-02 Thread James Downs
On Feb 2, 2010, at 9:33 AM, Jared Mauch wrote: We have solved 98% of this with standard configurations and templates. To deviate from this requires management approval/exception approval after an evaluation of the business risks. I would also point Chad to this book: http://bit.ly/cShEIo

Re: Mitigating human error in the SP

2010-02-02 Thread JC Dill
Chadwick Sorrell wrote: This outage, of a high profile customer, triggered upper management to react by calling a meeting just days after. Put bluntly, we've been told Human errors are unacceptable, and they will be completely eliminated. One is too many. Good, Fast, Cheap - pick any two.

Re: Mitigating human error in the SP

2010-02-02 Thread Chadwick Sorrell
On Tue, Feb 2, 2010 at 12:45 PM, James Downs james.do...@egontech.com wrote: On Feb 2, 2010, at 9:33 AM, Jared Mauch wrote: We have solved 98% of this with standard configurations and templates. To deviate from this requires management approval/exception approval after an evaluation of the

Re: Mitigating human error in the SP

2010-02-02 Thread Larry Sheldon
On 2/2/2010 11:33 AM, Jared Mauch wrote: We have solved 98% of this with standard configurations and templates. To deviate from this requires management approval/exception approval after an evaluation of the business risks. Automation of config building is not too hard, and certainly things

Re: Mitigating human error in the SP

2010-02-02 Thread Chadwick Sorrell
Thanks for all the comments! On Tue, Feb 2, 2010 at 1:01 PM, JC Dill jcdill.li...@gmail.com wrote: Chadwick Sorrell wrote: This outage, of a high profile customer, triggered upper management to react by calling a meeting just days after. Put bluntly, we've been told Human errors are

Re: Mitigating human error in the SP

2010-02-02 Thread Michael Dillon
Automated config deployment / provisioning. And sanity checking before deployment. Easy to say, not so easy to do. For instance, that incorrect port was identified by a number or name. Theoretically, if an automated tool pulls the number/name from a database and issues the command, then the

Re: Mitigating human error in the SP

2010-02-02 Thread Suresh Ramasubramanian
Never said it was, and never said foolproof either. Minimizing the chance of error is what I'm after - and ssh'ing in + hand typing configs isn't the way to go. Use a known good template to provision stuff - and automatically deploy it, and the chances of human error go down quite a lot. Getting

Re: Mitigating human error in the SP

2010-02-02 Thread Michael Dillon
The actual error happened when someone was troubleshooting a turn-up, where in the past the customer in question has had their ethertype set wrong. It wasn't a provisioning problem as much as someone troubleshooting why it didn't come up with the customer. Ironically, the NOC was on the

Re: Mitigating human error in the SP

2010-02-02 Thread David Hiers
If your manager pretends that they can manage humans without a few well-worn human factor books on their shelf, quit. David On Tue, Feb 2, 2010 at 5:36 PM, Michael Dillon wavetos...@googlemail.com wrote: The actual error happened when someone was troubleshooting a turn-up, where in

Re: Mitigating human error in the SP

2010-02-02 Thread Steven Bellovin
On Feb 2, 2010, at 8:36 PM, Suresh Ramasubramanian wrote: Never said it was, and never said foolproof either. Minimizing the chance of error is what I'm after - and ssh'ing in + hand typing configs isn't the way to go. Use a known good template to provision stuff - and automatically

Re: Mitigating human error in the SP

2010-02-01 Thread Suresh Ramasubramanian
On Tue, Feb 2, 2010 at 7:51 AM, Chadwick Sorrell mirot...@gmail.com wrote: This outage, of a high profile customer, triggered upper management to react by calling a meeting just days after. Put bluntly, we've been told Human errors are unacceptable, and they will be completely eliminated.

Re: Mitigating human error in the SP

2010-02-01 Thread Dobbins, Roland
On Feb 2, 2010, at 10:28 AM, Suresh Ramasubramanian wrote: Automated config deployment / provisioning. And sanity checking before deployment. A lab in which changes can be simulated and rehearsed ahead of time, new OS revisions tested, etc. A DCN.
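
For anyone skimming the thread, here is a rough sketch of what "known good template plus sanity checking before deployment" could look like in practice. It is only an illustration, not anything a poster described: the template text, the INVENTORY table, the port and customer names, and build_port_config are all made up for the example, and the output is just a string, nothing is pushed to a device.

```python
from string import Template

# Hypothetical known-good template for a customer-facing port (assumption).
PORT_TEMPLATE = Template(
    "interface $port\n"
    " description $customer\n"
    " no shutdown\n"
)

# Hypothetical inventory: which customer each port is assigned to (assumption).
INVENTORY = {"ge-0/0/1": "customer-a", "ge-0/0/2": "customer-b"}


def build_port_config(port: str, customer: str) -> str:
    """Render the template only if the change passes the sanity checks."""
    if port not in INVENTORY:
        raise ValueError(f"unknown port: {port}")
    if INVENTORY[port] != customer:
        raise ValueError(
            f"{port} is assigned to {INVENTORY[port]}, not {customer}"
        )
    return PORT_TEMPLATE.substitute(port=port, customer=customer)
```

The point of the check is that a request naming the wrong port is rejected before any config text exists. It does not help if the inventory itself is wrong, which is the caveat raised elsewhere in the thread.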

RE: Mitigating human error in the SP

2010-02-01 Thread Stefan Fouant
-----Original Message----- From: Suresh Ramasubramanian [mailto:ops.li...@gmail.com] Sent: Monday, February 01, 2010 9:29 PM To: Chadwick Sorrell Cc: nanog@nanog.org Subject: Re: Mitigating human error in the SP On Tue, Feb 2, 2010 at 7:51 AM, Chadwick Sorrell mirot...@gmail.com

Re: Mitigating human error in the SP

2010-02-01 Thread Dave CROCKER
On 2/1/2010 6:21 PM, Chadwick Sorrell wrote: Any other comments on the subject would be appreciated, we would like to come to our next meeting armed and dangerous. If upper management believes humans can be required to make no errors, ask whether they have achieved that ideal for

Re: Mitigating human error in the SP

2010-02-01 Thread Suresh Ramasubramanian
I'll say "as Vijay Gill notes" after Stefan posted those two very interesting links. He's saying much the same that I did - in a great deal more detail. Fascinating. http://www.nanog.org/meetings/nanog44/presentations/Monday/Gill_programatic_N44.pdf His Blog article on Infrastructure is