On 28/07/2016 08:51, David Simmons wrote:
Surely the funniest thing we have ever heard is BT not having single POP diversity?
At the risk of being flamed, we all know that's clearly not the case. And no, Neil has not paid me to say that :)
I'm sure there are a lot of clueful folks at BT who by now know (a) exactly which combination of $bad-things-happening made this possible, and (b) that it [probably] won't happen again.
We've all tested things in the lab where the failover worked beautifully and not one ping was dropped into the bin. Meanwhile, over there, the real world is a special case where Sod, Murphy and their minions sit quietly sharpening their teeth, looking smugly at you and waiting for their time to pounce. I can certainly say that I've had rude surprises on networks I've designed/operated/owned where redundancy hasn't worked properly (or at all), and there's always the unexpected dependency that shows up occasionally. Anyone who hasn't: next time, it could be you.
Now, obvious flaws get flagged up in design reviews / operational checks; it's the multi-layered interdependencies that are hard to figure out ahead of time. Hell, they are sometimes hard to figure out even when you have all of the logs and debug output, have the TAC on board, and know what happened in what order.
I'd actually say that a major failure like this is going to result in a faster resolution for the end customer. If lots of customers are down due to a core issue, it gets immediate attention from a lot of engineering resource, and it has visibility at the top of the organisation, so nobody is going to get moaned at for "working on the major priority 1 outage". A single line down, on the other hand, has to wait in the queue for someone to visit the exchange/cabinet/CP.
Paul.
