Re: Facebook post-mortems...

2021-10-06 Thread Bjørn Mork
Masataka Ohta writes: > Bjørn Mork wrote: > >> Removing all DNS servers at the same time is never a good idea, even in >> the situation where you believe they are all failing. > > As I wrote: > > : That facebook use very short expiration period for zone > : data is a separate issue. > > that is a

Re: Better description of what happened

2021-10-06 Thread Curtis Maurand
On 10/5/21 5:51 AM, scott wrote: On 10/5/21 8:39 PM, Michael Thomas wrote: This bit posted by Randy might get lost in the other thread, but it appears that their DNS withdraws BGP routes for prefixes that they can't reach or are flaky it seems. Apparently that goes for the prefixes that

Re: Better description of what happened

2021-10-06 Thread Tom Beecher
By what they have said publicly, the initial trigger point was that all of their datacenters were disconnected from their internal backbone, thus unreachable. Once that occurs, nothing else really matters. Even if the external announcements were not withdrawn, and the edge DNS servers could

Cloudflare contact

2021-10-06 Thread David Brain
Hi, Is there a contact/procedure with Cloudflare to address the issue of users getting CAPTCHAs when accessing sites behind Cloudflare? We are seeing this impacting some of our user base, and it has been persisting for a number of days. Thanks, David. -- David Brain - MCNC - 919.248.1998

Re: Better description of what happened

2021-10-06 Thread Tom Beecher
I mean, at the end of the day they likely designed these systems to be able to handle one or more datacenters being disconnected from the world, and considered a scenario of ALL their datacenters being disconnected from the world so unlikely they chose not to solve for it. Works great, until it

Re: DNS pulling BGP routes?

2021-10-06 Thread Jared Mauch
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment. It doesn’t have to be externally visible like this event for that to be the case. I would say more like Application availability caused the BGP routes to be withdrawn.
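The pattern Jared describes can be sketched in a few lines. This is a minimal, hypothetical Python model, not Facebook's implementation: each anycast site runs a local health check and keeps the service prefix announced only while that check passes. The prefix `192.0.2.53/32`, the `Site` class and the functions `healthy`, `desired_announcements` and `prefix_reachable` are assumptions made for illustration; a real deployment would hand these decisions to a BGP speaker, which is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical anycast service prefix (RFC 5737 documentation range).
ANYCAST_PREFIX = "192.0.2.53/32"


@dataclass
class Site:
    name: str
    dns_process_up: bool     # is the local DNS server process answering?
    backend_reachable: bool  # can this site reach the internal backbone/backends?


def healthy(site: Site) -> bool:
    """Local health check: a site counts as healthy only if it can serve
    AND reach the backends it depends on (the coupling under discussion)."""
    return site.dns_process_up and site.backend_reachable


def desired_announcements(sites: list[Site]) -> dict[str, bool]:
    """For each site, whether it should announce ANYCAST_PREFIX.
    Every site decides independently, from its own local view only."""
    return {s.name: healthy(s) for s in sites}


def prefix_reachable(announcements: dict[str, bool]) -> bool:
    """From outside, the prefix is reachable if at least one site announces it."""
    return any(announcements.values())


# One site loses the backbone: it withdraws and traffic shifts to the others.
partial = [Site("site-a", True, False), Site("site-b", True, True), Site("site-c", True, True)]
# Every site loses the backbone at once: every site withdraws.
total = [Site("site-a", True, False), Site("site-b", True, False), Site("site-c", True, False)]

for label, sites in (("partial", partial), ("total", total)):
    ann = desired_announcements(sites)
    print(label, ann, "reachable:", prefix_reachable(ann))
```

Each site's rule is sensible in isolation. The failure mode raised elsewhere in this thread is a shared fault: when every site loses the backbone at the same moment, every site withdraws, and the prefix disappears even though the DNS processes themselves are still running.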

Re: DNS pulling BGP routes?

2021-10-06 Thread Matthew Petach
On Wed, Oct 6, 2021 at 10:45 AM Michael Thomas wrote: > So if I understand their post correctly, their DNS servers have the > ability to withdraw routes if they determine are sub-optimal (fsvo). I > can certainly understand for the DNS servers to not give answers they > think are unreachable but

Re: DNS pulling BGP routes?

2021-10-06 Thread Jon Lewis
On Wed, 6 Oct 2021, Michael Thomas wrote: So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). I can certainly understand for the DNS servers to not give answers they think are unreachable but there is always

Re: Better description of what happened

2021-10-06 Thread Bjørn Mork
Tom Beecher writes: > Even if the external > announcements were not withdrawn, and the edge DNS servers could provide > stale answers, the IPs those answers provided wouldn't have actually been > reachable Do we actually know this wrt the tools referred to in "the total loss of DNS broke many

NANOG 83 Sneak Peak + Safety Concerns

2021-10-06 Thread Nanog News
Sneak Peek of NANOG 83. Register Now for Incredible Programming, Network Opportunities + More *The anticipation is killing us!* We are only three short weeks away from our next meeting and we cannot wait to see you! NANOG 83 will take place online + in person in Minneapolis, MN, from Nov. 1 - 3.

Re: DNS pulling BGP routes?

2021-10-06 Thread Sabri Berisha
- On Oct 6, 2021, at 10:42 AM, Michael Thomas m...@mtcc.com wrote: Hi, > My guess is that their post while more clear that most doesn't go into > enough detail, but is it me or does it seem like this is a really weird > thing to do? In large environments, it's not uncommon to have DNS

Re: DNS pulling BGP routes?

2021-10-06 Thread J. Hellenthal via NANOG
They most likely sent an update to the DNS servers for TLV DNSSEC and, in an oversight, forgot they needed to null something out of the workbook so as not to touch the BGP instances. I'd hardly believe that would be triggered by the DNS server itself. -- J. Hellenthal The fact that there's a highway

Re: Better description of what happened

2021-10-06 Thread PJ Capelli via NANOG
I probably still have my US Robotics 14.4 in the basement, but it's been a while since I've had access to a POTS line it would work on ... :) pj capelli pjcape...@pm.me "Never to get lost, is not living" - Rebecca Solnit

DNS pulling BGP routes?

2021-10-06 Thread Michael Thomas
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). I can certainly understand for the DNS servers to not give answers they think are unreachable but there is always the problem that they may be partitioned

Re: DNS pulling BGP routes?

2021-10-06 Thread Blake Dunlap
Yes, it really is common to announce sink routes via bgp from destination services / proxies and to have those announcements be dynamically based on service viability. On Wed, Oct 6, 2021, 12:56 Jared Mauch wrote: > This is quite common to tie an underlying service announcement to BGP >

Re: S.Korea broadband firm sues Netflix after traffic surge

2021-10-06 Thread Owen DeLong via NANOG
The bottom line problem is that we have allowed vertical integration to allow the natural monopoly that exists in last mile infrastructure in most locations to be leveraged into an effective full-stack monopoly for those same players. Lack of competition in the last-mile/eyeball space has

Re: DNS pulling BGP routes?

2021-10-06 Thread Jon Lewis
On Wed, 6 Oct 2021, Michael Thomas wrote: On 10/6/21 3:33 PM, Jon Lewis wrote: On Wed, 6 Oct 2021, Michael Thomas wrote:  People have been anycasting DNS server IPs for years (decades?). So, no. But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really

Re: DNS pulling BGP routes?

2021-10-06 Thread Jon Lewis
On Wed, 6 Oct 2021, Michael Thomas wrote: People have been anycasting DNS server IPs for years (decades?). So, no. But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense that they'd pull out if they couldn't

Re: DNS pulling BGP routes?

2021-10-06 Thread Michael Thomas
On 10/6/21 2:33 PM, William Herrin wrote: On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas wrote: So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). The servers' IP addresses are anycasted. When one data

Re: Better description of what happened

2021-10-06 Thread Hugo Slabbert
> > Do we actually know this wrt the tools referred to in "the total loss of > DNS broke many of the tools we’d normally use to investigate and resolve > outages like this."? Those tools aren't necessarily located in any of > the remote data centers, and some of them might even refer to resources

Re: DNS pulling BGP routes?

2021-10-06 Thread William Herrin
On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas wrote: > > So if I understand their post correctly, their DNS servers have the > ability to withdraw routes if they determine are sub-optimal (fsvo). The servers' IP addresses are anycasted. When one data center determines itself to be

Re: DNS pulling BGP routes?

2021-10-06 Thread Jon Lewis
On Wed, 6 Oct 2021, Michael Thomas wrote: On 10/6/21 2:33 PM, William Herrin wrote: On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas wrote: So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). The

Re: DNS pulling BGP routes?

2021-10-06 Thread Michael Thomas
On 10/6/21 2:58 PM, Jon Lewis wrote: On Wed, 6 Oct 2021, Michael Thomas wrote: On 10/6/21 2:33 PM, William Herrin wrote:  On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas wrote:  So if I understand their post correctly, their DNS servers have the  ability to withdraw routes if they

Re: DNS pulling BGP routes?

2021-10-06 Thread Michael Thomas
On 10/6/21 3:33 PM, Jon Lewis wrote: On Wed, 6 Oct 2021, Michael Thomas wrote:  People have been anycasting DNS server IPs for years (decades?). So, no. But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense

Re: DNS pulling BGP routes?

2021-10-06 Thread Mark Tinka
On 10/7/21 00:37, Michael Thomas wrote: Maybe the problem here is that two things happened and the article conflated the two: the DNS infrastructure pulled its routes from the anycast address and something else pulled all of the other routes but wasn't mentioned in the article. The

Re: DNS pulling BGP routes?

2021-10-06 Thread Mark Tinka
On 10/7/21 00:22, Michael Thomas wrote: But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense that they'd pull out if they couldn't contact the backend. But I thought that almost all of their routes to the

Another Perspective - Kentik's View on the Facebook Outage

2021-10-06 Thread Mark Tinka
https://www.kentik.com/blog/facebooks-historic-outage-explained/?mkt_tok=ODY5LVBBRC04ODcAAAF_9diyPhC6WFsJNFpS4z2ggF-DWEc6FksyD3aW8B3am5tUvtTOJDYl2MIgMdsmmqOLTL-BpUugbQHAIXCT677LE0OxHM8Dy-mqRorJQXnAjg4 Mark.

Re: DNS pulling BGP routes?

2021-10-06 Thread Masataka Ohta
Jared Mauch wrote: This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment. Yes, that is a commonly seen mistake with anycast. I would say more like Application availability caused the BGP routes to be withdrawn. Considering

Re: Facebook post-mortems...

2021-10-06 Thread Bjørn Mork
Masataka Ohta writes: > As long as name servers with expired zone data won't serve > request from outside of facebook, whether BGP routes to the > name servers are announced or not is unimportant. I am not convinced this is true. You'd normally serve some semi-static content, especially wrt

Re: Disaster Recovery Process

2021-10-06 Thread Wolfgang Tremmel
And a layer 8 item from me: - put a number (as in money) into the process up to which anything spent by anyone working on the recovery is covered. It has to be a number, because if you write "all costs are covered" it makes the recovery person second-guess if the airplane ticket or spare part he just

Re: Facebook post-mortems...

2021-10-06 Thread Masataka Ohta
Hank Nussbacher wrote: - "it was not possible to access our data centers through our normal means because their networks were down, and second, the total loss of DNS broke many of the internal tools we'd normally use to investigate and resolve outages like this. Our primary and out-of-band

Re: Facebook post-mortems...

2021-10-06 Thread Masataka Ohta
Bjørn Mork wrote: Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing. As I wrote: : That facebook use very short expiration period for zone : data is a separate issue. that is a separate issue. > This is a very hard
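Ohta's "expiration period for zone data" refers to the SOA EXPIRE timer: a secondary that cannot refresh from its primary keeps answering authoritatively until EXPIRE seconds have passed since its last successful refresh, and then stops. The minimal Python sketch below models only that rule. The timer values and the six-hour outage are hypothetical, not Facebook's actual SOA settings, and this thread does not establish whether Facebook's edge servers rely on zone transfers at all.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SoaTimers:
    refresh: int  # seconds between routine refresh checks (not used below)
    retry: int    # seconds between retries after a failed refresh (not used below)
    expire: int   # seconds without a successful refresh before the zone expires


def serves_zone(timers: SoaTimers, seconds_since_last_refresh: int) -> bool:
    """True while a secondary may still answer authoritatively for the zone.
    Only EXPIRE decides this; REFRESH and RETRY only schedule attempts."""
    return seconds_since_last_refresh < timers.expire


# Hypothetical timer sets, for illustration only.
SHORT = SoaTimers(refresh=300, retry=60, expire=1800)      # 30-minute expiry
LONG = SoaTimers(refresh=3600, retry=900, expire=1209600)  # 14-day expiry

OUTAGE = 6 * 3600  # hypothetical six-hour loss of contact with the primary

for label, timers in (("short", SHORT), ("long", LONG)):
    print(label, "still serving after outage:", serves_zone(timers, OUTAGE))
```

With a short EXPIRE, an isolated secondary stops answering long before a multi-hour outage ends; with a long one it keeps serving possibly stale data. That trade-off is what Ohta and Mork are debating, and Ohta treats it as separate from the question of BGP withdrawal.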