Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-23 Thread Tom Hill
On 21/05/18 17:10, Large Hadron Collider wrote:
> I would go as far as to say that Tier 1 is a derogatory designation, but
> I have a beef with Cogent because they're expecting otherwise Tier 1
> IPv6 ISP Hurricane Electric to bow to the altar of Cogent.

Owen, is dat yew?!

-- 
Tom


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-23 Thread Tom Hill
On 18/05/18 14:55, Stephen Satchell wrote:
> What happened when you sent out your last RPQ to the vendors with these
> requirements?

Why bother? There are so few products, with so few vendors, and their
list prices & discount levels are easily researchable in less than a
day. If you thought someone was going to build you a tailored device of
that ilk then you're surely going to need to commit to buying a lot more
than you actually need...

Whilst small-to-medium providers still need to play in the DFZ, they
don't often buy hundreds (let alone thousands) of small edge routers.


-- 
Tom


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-23 Thread Tom Hill
On 19/05/18 21:51, Ben Cannon wrote:
> Isn’t that the ASR9010?  (And before that 7609?)

I can't tell if you're taking the piss or not.

-- 
Tom


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-22 Thread Sebastian Wiesinger
* David Hubbard  [2018-05-16 19:01]:
> I’m curious if anyone who’s used 3356 for transit has found
> shortcomings in how their peering and redundancy is configured, or

From a recent experience I can tell you that a change request to
change a peering from "full table" to "default route only" has so far
resulted in 3+ weeks of conversation, and an outage when they
misconfigured the session without realising it.

A colleague of mine is now trying to send them the exact set
commands required for the Juniper gear they're using.

This is not what I would expect from a carrier like 3356.
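The change described above boils down to replacing the export policy on the provider's side of the session. A minimal Python sketch, assuming the provider's table is modelled as a plain list of prefix strings; `export_routes` and `missing_defaults` are hypothetical helpers for illustration, not Juniper or 3356 tooling. The second function is the sort of post-change check (one default per address family) that might catch a session misconfigured during such a change.

```python
import ipaddress

# Default routes for both address families. Illustrative model only,
# not vendor configuration.
DEFAULTS = {ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_network("::/0")}

def export_routes(rib, mode):
    """Prefixes a provider would announce to the customer.

    rib  -- iterable of prefix strings in the provider's table
    mode -- "full" (announce everything) or "default" (default routes only)
    """
    prefixes = [ipaddress.ip_network(p) for p in rib]
    if mode == "full":
        return prefixes
    if mode == "default":
        # Exact match on the /0 routes only; nothing longer slips through.
        return [p for p in prefixes if p in DEFAULTS]
    raise ValueError(f"unknown mode: {mode!r}")

def missing_defaults(received):
    """Address families for which the customer received no default route."""
    got = {ipaddress.ip_network(p) for p in received}
    return sorted(f"IPv{d.version}" for d in DEFAULTS if d not in got)

rib = ["0.0.0.0/0", "::/0", "192.0.2.0/24", "2001:db8::/32"]
print([str(p) for p in export_routes(rib, "default")])  # ['0.0.0.0/0', '::/0']
print(missing_defaults(["0.0.0.0/0"]))  # ['IPv6']
```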

Regards

Sebastian

-- 
GPG Key: 0x93A0B9CE (F4F6 B1A3 866B 26E9 450A  9D82 58A2 D94A 93A0 B9CE)
'Are you Death?' ... IT'S THE SCYTHE, ISN'T IT? PEOPLE ALWAYS NOTICE THE SCYTHE.
-- Terry Pratchett, The Fifth Elephant


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-22 Thread Ben Cannon
Yep?



-Ben

> On May 21, 2018, at 6:37 PM, Aaron Gould  wrote:
> 
> 9010 and 7609 Small? 
> 
> Aaron
> 
>> On May 19, 2018, at 3:51 PM, Ben Cannon  wrote:
>> 
>> Isn’t that the ASR9010?  (And before that 7609?)
>> 
>> -Ben
>> 
>>>> On May 18, 2018, at 4:20 AM, Tom Hill  wrote:
>>>> 
>>>> On 17/05/18 14:24, Mike Hammett wrote:
>>>> There's some industry hard-on with having a few ginormous routers instead 
>>>> of many smaller ones.
>>> 
>>> "Industry hard-on", ITYM "Greedy vendors".
>>> 
>>> Try finding a 'small' router with a lot of ports (1 & 10GE) for your
>>> customers, and the right features/TCAM/CP performance, for a price that
>>> permits you to buy a lot of them.
>>> 
>>> -- 
>>> Tom
> 


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-22 Thread Mark Tinka


On 19/May/18 22:51, Ben Cannon wrote:

> Isn’t that the ASR9010?  (And before that 7609?)

The ASR9901 comes reasonably close - as close as the MX204 could get
(although 1Gbps ports might be an issue).

Mark.


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-21 Thread Aaron Gould
9010 and 7609 Small? 

Aaron

> On May 19, 2018, at 3:51 PM, Ben Cannon  wrote:
> 
> Isn’t that the ASR9010?  (And before that 7609?)
> 
> -Ben
> 
>>> On May 18, 2018, at 4:20 AM, Tom Hill  wrote:
>>> 
>>> On 17/05/18 14:24, Mike Hammett wrote:
>>> There's some industry hard-on with having a few ginormous routers instead 
>>> of many smaller ones.
>> 
>> "Industry hard-on", ITYM "Greedy vendors".
>> 
>> Try finding a 'small' router with a lot of ports (1 & 10GE) for your
>> customers, and the right features/TCAM/CP performance, for a price that
>> permits you to buy a lot of them.
>> 
>> -- 
>> Tom



Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-21 Thread Robert DeVita
If this is a known issue and has happened before, and point to point circuits 
aren’t affected, you always have the opportunity to diversify your own network 
and get private lines back to Miami, Jax, Atlanta or Dallas to create your own 
diversity, don’t you?

Robert DeVita
Managing Director
Mejeticks
c. 469-441-8864
e. radev...@mejeticks.com
_
From: David Hubbard 
Sent: Wednesday, May 16, 2018 12:03 PM
Subject: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)
To: 


I’m curious if anyone who’s used 3356 for transit has found shortcomings in how 
their peering and redundancy is configured, or what a normal expectation to 
have is. The Tampa Bay market has been completely down for 3356 IP services 
twice so far this year, each for what I’d consider an unacceptable period of 
time (many hours). I’m learning that the entire market is served by just two 
fiber routes, through cities hundreds of miles away in either direction. So, 
basically two fiber cuts, potentially 1000+ miles apart, take the entire 
region down. The most recent occurrence was a week or so ago when a Miami-area 
cut and an Orange, Texas cut (1287 driving miles apart) took IP services down 
for hours. It did not take point-to-point circuits to out-of-market locations 
down, so that suggests they even have the ability to be more redundant and 
simply choose not to.

I feel like it’s not unreasonable to expect more redundancy, or a much smaller 
attack surface given a disgruntled lineman who knows the routes could take an 
entire region down with a planned cut four states apart. Maybe other regions 
are better designed? Or are my expectations unreasonable? I carry three peers 
in that market, so it hasn’t been outage-causing, but I use 3356 in other 
markets too, and have plans for more, but it makes me wonder if I just haven't 
had the pleasure of similar outages elsewhere yet and I should factor that 
expectation into the design. It creates a problem for me in one location where 
I can only get them and Cogent, since Cogent can't be relied on for IPv6 
service, which I need.

Thanks






Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-21 Thread Luca Salvatore via NANOG
To answer your specific question - in the regions where we use 3356 (NYC and
SFO/Bay Area), 3356 have been solid. I’d even say they have fewer issues than
the other usual tier 1 providers... for example, 1299 had a hell of a week
last week around SFO while 3356 was stable.

Can’t comment on what I’d say are small regions like Tampa though.

On Sat, May 19, 2018 at 5:56 PM David Hubbard <dhubb...@dino.hostasaurus.com>
wrote:

> Yes, I do, as stated in my initial email.  My inquiry is about whether
> this level of downtime, and lack of redundancy for a given region, is
> normal for 3356.  There are some markets where diverse paths are not so
> easy to acquire.


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-21 Thread Ben Cannon
Isn’t that the ASR9010?  (And before that 7609?)

-Ben

> On May 18, 2018, at 4:20 AM, Tom Hill  wrote:
> 
>> On 17/05/18 14:24, Mike Hammett wrote:
>> There's some industry hard-on with having a few ginormous routers instead of 
>> many smaller ones.
> 
> "Industry hard-on", ITYM "Greedy vendors".
> 
> Try finding a 'small' router with a lot of ports (1 & 10GE) for your
> customers, and the right features/TCAM/CP performance, for a price that
> permits you to buy a lot of them.
> 
> -- 
> Tom


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-21 Thread Scott Weeks


--- joe...@bogus.com wrote:
From: joel jaeggli 

alcatel/nokia 7750 (L3's newer PE platform) is large but 
not outlandish and they've been deployed for a couple 
years. 
--


More than a couple...  I was using them for MPLS over 10 
years ago.  They're really good.  They also come in different 
sizes, from the itty bitty 7750 SR-1 (2ru) all the way to 
the BFR 7750 SR-12e (22ru).

https://onestore.nokia.com/asset/164728/Nokia_7750_SR_R15-1_Data_Sheet_EN.pdf

scott


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-21 Thread Large Hadron Collider
I would go as far as to say that Tier 1 is a derogatory designation, but 
I have a beef with Cogent because they're expecting otherwise Tier 1 
IPv6 ISP Hurricane Electric to bow to the altar of Cogent.



On 05/20/2018 15:19, Mark Tinka wrote:
>
> On 20/May/18 09:16, Baldur Norddahl wrote:
>
>> The question was if downtime on a transit provider of many hours is
>> unacceptable. I am offering my experience that this happens to all of
>> them. Some of them can have problems that last days not hours. Do not
>> ever assume that a so called "tier 1" network is good as your only
>> transit.
>
> And that is where the sage advice is...
>
> Just because they are "large", "global", "transit-free",
> "international", "Tier this or Tier that", don't think they are beyond
> fault. And more importantly, don't allow your customers to assume they
> are beyond fault, just because you aren't them.
>
> Take control of your situation, especially if you can.
>
> Mark.




Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread Mike Hammett
To circle back to the original post... Level 3 does have multiple routes out of 
Tampa. They just apparently don't use them all for their transit service. Why 
not? 
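One way to put numbers on the Tampa question is to count how many simultaneous link cuts it takes to isolate a market from the rest of the backbone. A brute-force Python sketch follows; the topologies are toy examples and the city pairs are assumptions, not Level 3's actual fiber map, and `reachable` and `min_cuts_to_isolate` are hypothetical helpers.

```python
from collections import deque
from itertools import combinations

def reachable(links, src, failed):
    """Return the set of nodes reachable from src once `failed` links are cut."""
    adj = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue  # this span is down
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {src}, deque([src])
    while queue:  # plain breadth-first search
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def min_cuts_to_isolate(links, market, core):
    """Fewest simultaneous link cuts that stop `market` reaching any node in
    `core`. Brute force over combinations, so only suitable for toy graphs."""
    for k in range(1, len(links) + 1):
        for failed in combinations(links, k):
            if not reachable(links, market, set(failed)) & core:
                return k
    return None  # cannot be isolated (e.g. market is itself in core)

# Hypothetical toy topologies, not a real carrier map.
TWO_ROUTES = [("Tampa", "Miami"), ("Miami", "Atlanta"),
              ("Tampa", "Orange TX"), ("Orange TX", "Houston"),
              ("Atlanta", "Houston")]
FOUR_WAY = [("Tampa", "Miami"), ("Tampa", "Atlanta"),
            ("Tampa", "Houston"), ("Tampa", "Charlotte"),
            ("Miami", "Atlanta"), ("Atlanta", "Charlotte"),
            ("Houston", "Atlanta")]

print(min_cuts_to_isolate(TWO_ROUTES, "Tampa", {"Atlanta", "Houston"}))  # 2
print(min_cuts_to_isolate(FOUR_WAY, "Tampa",
                          {"Miami", "Atlanta", "Houston", "Charlotte"}))  # 4
```

With the two-route toy, two cuts far apart (for example Miami-Atlanta and Orange TX-Houston) are enough; with four diverse adjacencies it takes four. Adding more routers in Tampa without adding physically diverse links leaves these numbers unchanged, and the count only means something if the links really are on separate infrastructure.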




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 





Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread valdis . kletnieks
On Sun, 20 May 2018 09:16:25 +0200, Baldur Norddahl said:

> He is complaining about AS3356 in specific and claiming they COULD
> reroute around it but choose not to. This leads me to assume there are
> alternatives. Two places, Miami and Texas, are mentioned and that a
> double fault, one in Miami and another in Texas would bring down the
> network. I am from Europe, but am I to believe that Miami and Texas (or
> anywhere between those two) are served by only two fiber conduits?

There's a difference between "route around it by flipping some BGP magic" and
"route around it by digging a ditch to a third city".

The fact that other places have other conduits doesn't change the fact that a
given city may only have two physical conduits handy.  Often, there are other
*possible* paths that could be built out, but other providers have looked at
the cost of digging a ditch from the city, out a third path, to their closest
POP, and decided it's not economically feasible.  You can only route across the
fiber that's actually there and lit up.

You're from Europe?  OK, consider this setup:  Andorra.  Two providers, one of
whom backhauls that path all the way to Madrid, and the other that backhauls to
Marseilles. Sure, there are other cities along the way, but there's no fiber path
from where you are to there.  For instance, the fiber path may run from Madrid
to Zaragoza, where it splits 3 ways to Pamplona, Andorra, and Barcelona - but
if Barcelona and Pamplona don't provide alternate paths out to the net, you're
still going to Madrid. Meanwhile, other companies may provide service to lots
of smaller places along the border on the Spain side, and other companies
provide service to lots of places on the French side, but not into Andorra
itself.

If you don't like that, consider any one of the many European cities that are in a
deep river valley, so the only realistic ways to the outside world are
"upstream" and "downstream".

> The question was if downtime on a transit provider of many hours is
> unacceptable. I am offering my experience that this happens to all of
> them. Some of them can have problems that last days not hours. Do not
> ever assume that a so called "tier 1" network is good as your only transit.

The gotcha here is the very high danger that, with only two paths out of the
city, your second and third choices are fate-sharing with that Tier 1.  If you're
in Andorra, and you have 8 providers that share a path through a tunnel to Toulouse,
and another 6 that share a bridge to Barcelona, you still have a problem.

(That, and anybody who buys transit only from one Tier 1 is going to have
a really hard time getting routes to the *rest* of the internet...)





Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread Christopher Morrow
On Sun, May 20, 2018 at 12:33 PM Rubens Kuhl  wrote:
>
> CenturyLink bought Level 3, which bought Global Crossing, which bought
> Impsat; this makes every market unique, for the good and bad of it.
>
> What I have as a customer feeling is that Global Crossing was the most
> quality-minded of the 4, while the other 3 is/were more "take what we give
> you and shut up".
that might be a thing related to the time when GC was around individually
though, right?
they could have been considered a 'boutique' network provider at the time...
The L3/GC merger was ~10 yrs ago? much has changed in the carrier space
since...
being bigger doesn't often make companies higher touch :)


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread Rubens Kuhl
CenturyLink bought Level 3, which bought Global Crossing, which bought
Impsat; this makes every market unique, for the good and bad of it.

What I have as a customer feeling is that Global Crossing was the most
quality-minded of the 4, while the other 3 is/were more "take what we give
you and shut up".


Rubens




Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread joel jaeggli


On 5/17/18 6:24 AM, Mike Hammett wrote:
> I often question why/how people build networks the way they do. There's some 
> industry hard-on with having a few ginormous routers instead of many smaller 
> ones. I've learned that when building Internet Exchanges, the number of 
> networks that don't have BGP edge routers in major markets where they have a 
> presence is quite a bit larger than one would expect. I heard a podcast once 
> (I forget if it was Packet Pushers or Network Collective) postulating that 
> the reason why everything runs back to a few big ass routers is that someone 
> decided to spend a crap-load of money on big ass routers for bragging rights, 
> so now they have to run everything they can through them to A) "prove" their 
> purchase wasn't foolish and B) because they now can't afford to buy anything 
> else. 
There seems to be a bit of overstatement with respect to how large
these are...

alcatel/nokia 7750 (L3's newer PE platform) is large but not outlandish
and they've been deployed for a couple years. it's relatively similar in
capacity to the devices that many of us interconnect with them
using.  Most of their customers probably (though not always) need less FIB
than they need on a PE router.

There is a longer time-scale overhang from the choice to design MPLS
core networks 15-20 years ago, where PE routers have more to do FIB-wise
than the cores do (which may well be larger and simpler, since most of what
they do is label switching). That drives the selection of what hardware
goes in the edge in ways where an IP-only carrier might make different
choices (e.g. this big FIB/queue/buffer router might have been a large
L3 switch).
> There's no reason why Tampa doesn't have a direct L3 adjacency to Miami, 
> Atlanta, Houston, and Charlotte over diverse infrastructure to all four. 
> Obviously there's room to add/drop from that list, but it gets the point 
> across. 
the number of paths available into and out of a market seems somewhat
orthogonal to the number of routers.




Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread Mark Tinka


On 20/May/18 09:16, Baldur Norddahl wrote:

>
> The question was if downtime on a transit provider of many hours is
> unacceptable. I am offering my experience that this happens to all of
> them. Some of them can have problems that last days not hours. Do not
> ever assume that a so called "tier 1" network is good as your only
> transit.

And that is where the sage advice is...

Just because they are "large", "global", "transit-free",
"international", "Tier this or Tier that", don't think they are beyond
fault. And more importantly, don't allow your customers to assume they
are beyond fault, just because you aren't them.

Take control of your situation, especially if you can.

Mark.


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread Mark Tinka


On 19/May/18 22:28, Baldur Norddahl wrote:

> What happened to do not trust anyone? Create your own resiliency by being
> multihomed to as many transits as you can afford.
>
> You need the ability to shut down a transit that is having trouble. It
> happens to all of them.

Agreed.

Mark.


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-20 Thread Baldur Norddahl



On 20/05/2018 at 05.43, valdis.kletni...@vt.edu wrote:

> On Sat, 19 May 2018 22:28:07 +0200, Baldur Norddahl said:
>
>> What happened to do not trust anyone? Create your own resiliency by being
>> multihomed to as many transits as you can afford.
>
> Re-read what David Hubbard said:
>
>> unacceptable period of time (many hours).  I’m learning that the entire
>> market is served by just two fiber routes, through cities hundreds of miles
>> away in either direction.  So, basically two fiber cuts, potentially 1000+
>> miles apart, take the entire region down.
>
> If in fact there are only two fiber conduit approaches to the area,  he's
> basically stuck no matter how many companies sell him bandwidth in those two
> conduits. He can contract with 8 companies to have 4 paths through each
> conduit, and 2 cable cuts *still* leave him dead in the water.


He is complaining about AS3356 specifically and claiming they COULD 
reroute around it but choose not to. This leads me to assume there are 
alternatives. Two places, Miami and Texas, are mentioned, and a 
double fault, one in Miami and another in Texas, would bring down the 
network. I am from Europe, but am I to believe that Miami and Texas (or 
anywhere between those two) are served by only two fiber conduits? That 
would leave several big states connected only two ways.


The question was if downtime on a transit provider of many hours is 
unacceptable. I am offering my experience that this happens to all of 
them. Some of them can have problems that last days not hours. Do not 
ever assume that a so called "tier 1" network is good as your only transit.


Also, a total cut-off from the world is the good kind of trouble they can 
have. That would just lead them to lose a large part of the global 
routing table, and your router will automatically choose one of your other 
transits. The bad kind of trouble is when they have packet loss to a 
few (but important) destinations and your customer thinks it is you that 
is having issues. And basically all you can do about it is to "shutdown" 
the session and wait until they fix the issue.
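The "bad kind of trouble" is only survivable if there is another transit to fall back on. A minimal Python sketch of the decision described above, assuming you already measure packet loss per transit towards a handful of important destinations; the threshold, the transit names and the `transits_to_shut` helper are all hypothetical, and in practice the output would feed an operator or automation that actually shuts the BGP session.

```python
def transits_to_shut(loss, threshold=0.05, keep_at_least=1):
    """Choose transit sessions to shut down because of partial packet loss.

    loss          -- dict: transit name -> {destination: loss fraction 0..1}
    threshold     -- loss above which a destination counts as impaired
    keep_at_least -- never shut so many that fewer than this remain up
    """
    # Worst observed loss per transit, only for transits over the threshold.
    impaired = {
        t: max(per_dest.values())
        for t, per_dest in loss.items()
        if any(v > threshold for v in per_dest.values())
    }
    # Worst first, so a limited budget drops the most broken transit.
    ordered = sorted(impaired, key=impaired.get, reverse=True)
    budget = max(len(loss) - keep_at_least, 0)
    return ordered[:budget]

# Hypothetical measurements: transit-a loses 30% to one important destination.
loss = {
    "transit-a": {"dest-1": 0.30, "dest-2": 0.00},
    "transit-b": {"dest-1": 0.00, "dest-2": 0.01},
    "transit-c": {"dest-1": 0.00, "dest-2": 0.00},
}
print(transits_to_shut(loss))  # ['transit-a']
print(transits_to_shut({"transit-a": {"dest-1": 0.50}}))  # []
```

The `keep_at_least` guard encodes the single-transit problem: with only one transit there is nothing you can safely shut, so the function returns an empty list. If every transit shows the same loss, the fault is probably at the destination, and this naive rule would still shut all but one; a real setup would want to compare transits against each other first.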


I am offering the view that one might consider that kind of downtime 
unacceptable, but it is just a matter of fact that they all have it. The 
two options to avoid it are to buy from a smaller local ISP instead, one 
that has multiple transits, or to have multiple transits yourself and be 
prepared to deal with it.


Regards

Baldur



Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-19 Thread valdis . kletnieks
On Sat, 19 May 2018 22:28:07 +0200, Baldur Norddahl said:
> What happened to do not trust anyone? Create your own resiliency by being
> multihomed to as many transits as you can afford.

Re-read what David Hubbard said:

> unacceptable period of time (many hours).  I’m learning that the entire
> market is served by just two fiber routes, through cities hundreds of miles
> away in either direction.  So, basically two fiber cuts, potentially 1000+
> miles apart, take the entire region down.

If in fact there are only two fiber conduit approaches to the area,  he's
basically stuck no matter how many companies sell him bandwidth in those two
conduits. He can contract with 8 companies to have 4 paths through each
conduit, and 2 cable cuts *still* leave him dead in the water.

(Bonus points for estimating the chances that at least one of those 8 companies
will do one or more of the following: (a) not know which conduit the path will
be in, (b) actively lie about the conduit in order to seal the deal, or (c)
re-provision the path several weeks later into the other conduit.)

And he probably doesn't have the budget to dig a third trench several hundred
miles to a third city...
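The conduit argument above is essentially a shared-risk check: what matters is not how many carriers you buy from but how many physical conduits their paths actually use. A small Python sketch under that framing; the carrier names and conduit labels are made up, `surviving_carriers` and `smallest_fatal_cut` are hypothetical helpers, and the result is only as good as the conduit data the carriers give you, which is exactly the bonus-points caveat above.

```python
from itertools import combinations

def surviving_carriers(carrier_conduits, cut):
    """Carriers that still have at least one path outside the cut conduits.

    carrier_conduits -- dict mapping carrier -> set of conduits its paths use
    cut              -- set of conduits that have been severed
    """
    return {c for c, conduits in carrier_conduits.items() if conduits - cut}

def smallest_fatal_cut(carrier_conduits):
    """Fewest conduit cuts that leave no carrier standing, plus one such cut."""
    conduits = sorted(set().union(*carrier_conduits.values()))
    for k in range(1, len(conduits) + 1):
        for cut in combinations(conduits, k):
            if not surviving_carriers(carrier_conduits, set(cut)):
                return k, cut
    return None  # nothing to cut (no conduits listed)

# Hypothetical: 8 carriers, 4 in each of the two conduits out of town.
contracts = {f"carrier{i}": {"north" if i < 4 else "south"} for i in range(8)}
print(smallest_fatal_cut(contracts))  # (2, ('north', 'south'))

# One carrier that genuinely rides a third trench raises the bar to 3 cuts.
contracts["carrier0"] = {"north", "third"}
print(smallest_fatal_cut(contracts)[0])  # 3
```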





Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-19 Thread David Hubbard
Yes, I do, as stated in my initial email.  My inquiry is about whether this 
level of downtime, and lack of redundancy for a given region, is normal for 
3356.  There are some markets where diverse paths are not so easy to acquire.

From: Robert DeVita <radev...@mejeticks.com>
Sent: Saturday, May 19, 2018 5:36:23 PM
To: David Hubbard; nanog@nanog.org
Subject: Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in 
general)

If this is a known issue and has happened before, and point to point circuits 
aren’t affected, you always have the opportunity to diversify your own network 
and get private lines back to Miami, Jax, Atlanta or Dallas to create your own 
diversity, don’t you?

Robert DeVita
Managing Director
Mejeticks
c. 469-441-8864
e. radev...@mejeticks.com






Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-19 Thread Baldur Norddahl
What happened to do not trust anyone? Create your own resiliency by being
multihomed to as many transits you can afford.

You need the ability to shutdown a transit that is having trouble. It
happens to all of them.
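A minimal sketch of that "shut down the troubled transit" decision, under loudly stated assumptions: the `Transit` record, the `loss_pct` health metric and the 5% threshold are hypothetical, and actually shutting the BGP session is left to whatever router automation or manual procedure you already use. The code only decides which upstreams to drain, and it refuses to drain the last one.

```python
from dataclasses import dataclass


@dataclass
class Transit:
    name: str          # label only, e.g. the upstream's AS number
    loss_pct: float    # hypothetical health metric: packet loss to probe targets via this upstream
    enabled: bool = True


def transits_to_shut(transits, max_loss_pct=5.0):
    """Return the names of enabled transits that should be administratively shut.

    Any enabled transit whose measured loss exceeds max_loss_pct is a candidate.
    If every enabled transit looks bad, the least-lossy one is kept up so this
    rule alone never drains all upstreams.
    """
    enabled = [t for t in transits if t.enabled]
    bad = [t for t in enabled if t.loss_pct > max_loss_pct]
    if bad and len(bad) == len(enabled):
        # Everything is degraded: keep the best of a bad bunch.
        bad = sorted(bad, key=lambda t: t.loss_pct)[1:]
    return [t.name for t in bad]


# Hypothetical numbers: one upstream blackholing during a regional outage.
upstreams = [Transit("AS3356", 100.0), Transit("AS174", 0.3), Transit("AS6939", 0.1)]
print(transits_to_shut(upstreams))  # prints ['AS3356']
```

Keeping at least one upstream up is a design choice: an automated rule that can withdraw every exit turns a measurement glitch into a self-inflicted outage.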

Regards
Baldur


On Wed, 16 May 2018 at 19:02, David Hubbard wrote:

> I’m curious if anyone who’s used 3356 for transit has found shortcomings
> in how their peering and redundancy is configured, or what a normal
> expectation to have is.  The Tampa Bay market has been completely down for
> 3356 IP services twice so far this year, each for what I’d consider an
> unacceptable period of time (many hours).  I’m learning that the entire
> market is served by just two fiber routes, through cities hundreds of miles
> away in either direction.  So, basically two fiber cuts, potentially 1000+
> miles apart, takes the entire region down.  The most recent occurrence was
> a week or so ago when a Miami-area cut and an Orange, Texas cut (1287
> driving miles apart) took IP services down for hours.  It did not take
> point to point circuits to out of market locations down, so that suggests
> they even have the ability to be more redundant and simply choose not to.
>
> I feel like it’s not unreasonable to expect more redundancy, or a much
> smaller attack surface given a disgruntled lineman who knows the routes
> could take an entire region down with a planned cut four states apart.
> Maybe other regions are better designed?  Or are my expectations
> unreasonable?  I carry three peers in that market, so it hasn’t been
> outage-causing, but I use 3356 in other markets too, and have plans for
> more, but it makes me wonder if I just haven't had the pleasure of similar
> outages elsewhere yet and I should factor that expectation into the
> design.  It creates a problem for me in one location where I can only get
> them and Cogent, since Cogent can't be relied on for IPv6 service, which I
> need.
>
> Thanks
>
>
>


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-18 Thread Stephen Satchell

On 05/18/2018 04:20 AM, Tom Hill wrote:
> On 17/05/18 14:24, Mike Hammett wrote:
>> There's some industry hard-on with having a few ginormous routers instead of
>> many smaller ones.
>
> "Industry hard-on", ITYM "Greedy vendors".

I think this view (both versions) is a little over the top. "Never
attribute to malice that which can be explained by stupidity."


The "stupidity" in this instance is poor market analysis, perhaps with 
the market research folks concentrating on large service provider 
customers at the expense of enterprise customers with very, very large 
data traffic needs but fewer ports per location.


They could also be concentrating on the very large providers, working on
the theory that the rate of return on boxes requiring a forklift to
install is higher than the rate of return on the 1U or 2U variety.



> Try finding a 'small' router with a lot of ports (1 & 10GE) for your
> customers, and the right features/TCAM/CP performance, for a price that
> permits you to buy a lot of them.


What happened when you sent out your last RPQ to the vendors with these 
requirements?




Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-18 Thread Tom Hill
On 17/05/18 14:24, Mike Hammett wrote:
> There's some industry hard-on with having a few ginormous routers instead of 
> many smaller ones.

"Industry hard-on", ITYM "Greedy vendors".

Try finding a 'small' router with a lot of ports (1 & 10GE) for your
customers, and the right features/TCAM/CP performance, for a price that
permits you to buy a lot of them.

-- 
Tom


Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-17 Thread Mike Hammett
I often question why/how people build networks the way they do. There's some
industry hard-on with having a few ginormous routers instead of many smaller
ones. I've learned that when building Internet Exchanges, the number of
networks that don't have BGP edge routers in major markets where they have a
presence is quite a bit larger than one would expect. I heard a podcast once (I
forget if it was Packet Pushers or Network Collective) postulating that the
reason everything runs back to a few big-ass routers is that someone
decided to spend a crap-load of money on big-ass routers for bragging rights,
so now they have to run everything they can through them, A) to "prove" their
purchase wasn't foolish and B) because they now can't afford to buy anything
else.

There's no reason why Tampa doesn't have a direct L3 adjacency to Miami,
Atlanta, Houston, and Charlotte over diverse infrastructure to all four.
Obviously there's room to add/drop from that list, but it gets the point
across.
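To make the attack-surface argument concrete, here is a hypothetical Python toy model, not anyone's real fiber map. Assumptions: the remote cities are fully meshed with one another (so the only weak point is the spans out of Tampa), the westward route that was cut near Orange, Texas is abstracted as a single Tampa-Houston span, and Dallas is an arbitrary far-end destination. It brute-forces the smallest number of simultaneous span cuts that separates Tampa from Dallas.

```python
from collections import deque
from itertools import combinations

# Hypothetical model: every remote city is fully meshed with every other one.
CORE = ["Miami", "Atlanta", "Houston", "Charlotte", "Dallas"]
CORE_LINKS = {frozenset(pair) for pair in combinations(CORE, 2)}


def reachable(links, src, dst):
    """Breadth-first search over undirected links; True if dst is reachable from src."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return False


def min_cuts_to_isolate(links, src, dst):
    """Smallest number of simultaneous span cuts that separates src from dst.

    Brute force over combinations, which is fine for a toy graph of a dozen links.
    """
    ordered = sorted(links, key=sorted)
    for k in range(1, len(ordered) + 1):
        for cut in combinations(ordered, k):
            if not reachable(links - set(cut), src, dst):
                return k
    return None  # src and dst cannot be separated (e.g. src == dst)


# As described in the thread: two long-haul routes out of Tampa.
two_routes = CORE_LINKS | {frozenset({"Tampa", "Miami"}), frozenset({"Tampa", "Houston"})}
# Proposed: direct adjacencies to four diverse cities.
four_routes = CORE_LINKS | {
    frozenset({"Tampa", c}) for c in ["Miami", "Atlanta", "Houston", "Charlotte"]
}

print(min_cuts_to_isolate(two_routes, "Tampa", "Dallas"))   # prints 2
print(min_cuts_to_isolate(four_routes, "Tampa", "Dallas"))  # prints 4
```

Brute force is chosen for readability; on a real network map you would compute the same number with a max-flow/min-cut algorithm, since by Menger's theorem it equals the number of edge-disjoint paths between the two sites.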



- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "David Hubbard"  
To: nanog@nanog.org 
Sent: Wednesday, May 16, 2018 11:59:42 AM 
Subject: Curiosity about AS3356 L3/CenturyLink network resiliency (in general) 

I’m curious if anyone who’s used 3356 for transit has found shortcomings in how 
their peering and redundancy is configured, or what a normal expectation to 
have is. The Tampa Bay market has been completely down for 3356 IP services 
twice so far this year, each for what I’d consider an unacceptable period of 
time (many hours). I’m learning that the entire market is served by just two 
fiber routes, through cities hundreds of miles away in either direction. So, 
basically two fiber cuts, potentially 1000+ miles apart, takes the entire 
region down. The most recent occurrence was a week or so ago when a Miami-area 
cut and an Orange, Texas cut (1287 driving miles apart) took IP services down 
for hours. It did not take point to point circuits to out of market locations 
down, so that suggests they even have the ability to be more redundant and 
simply choose not to. 

I feel like it’s not unreasonable to expect more redundancy, or a much smaller 
attack surface given a disgruntled lineman who knows the routes could take an 
entire region down with a planned cut four states apart. Maybe other regions 
are better designed? Or are my expectations unreasonable? I carry three peers 
in that market, so it hasn’t been outage-causing, but I use 3356 in other 
markets too, and have plans for more, but it makes me wonder if I just haven't 
had the pleasure of similar outages elsewhere yet and I should factor that 
expectation into the design. It creates a problem for me in one location where 
I can only get them and Cogent, since Cogent can't be relied on for IPv6 
service, which I need. 

Thanks 





Re: Curiosity about AS3356 L3/CenturyLink network resiliency (in general)

2018-05-16 Thread Mark Tinka


On 16/May/18 18:59, David Hubbard wrote:

> I’m curious if anyone who’s used 3356 for transit has found shortcomings in 
> how their peering and redundancy is configured, or what a normal expectation 
> to have is.  The Tampa Bay market has been completely down for 3356 IP 
> services twice so far this year, each for what I’d consider an unacceptable 
> period of time (many hours).  I’m learning that the entire market is served 
> by just two fiber routes, through cities hundreds of miles away in either 
> direction.  So, basically two fiber cuts, potentially 1000+ miles apart, 
> takes the entire region down.  The most recent occurrence was a week or so 
> ago when a Miami-area cut and an Orange, Texas cut (1287 driving miles apart) 
> took IP services down for hours.  It did not take point to point circuits to 
> out of market locations down, so that suggests they even have the ability to 
> be more redundant and simply choose not to.
>
> I feel like it’s not unreasonable to expect more redundancy, or a much 
> smaller attack surface given a disgruntled lineman who knows the routes could 
> take an entire region down with a planned cut four states apart.  Maybe other 
> regions are better designed?  Or are my expectations unreasonable?  I carry 
> three peers in that market, so it hasn’t been outage-causing, but I use 3356 
> in other markets too, and have plans for more, but it makes me wonder if I 
> just haven't had the pleasure of similar outages elsewhere yet and I should 
> factor that expectation into the design.  It creates a problem for me in one 
> location where I can only get them and Cogent, since Cogent can't be relied 
> on for IPv6 service, which I need.

Is CenturyLink your only option?

Mark.