Hesham,

Thank you very much for the review and the reference. Your reference is mainly about practices to enhance resilience from the applications perspective.
Section 3.2 of draft-ietf-rtgwg-net2cloud-problem-statement takes the network infrastructure perspective: how to quickly and effectively propagate fault notifications to all the involved entities when a large number of apps are impacted by a site failure or degradation.

Thanks,
Linda

From: Hesham ElBakoury <[email protected]>
Sent: Tuesday, January 31, 2023 11:34 AM
To: Linda Dunbar <[email protected]>
Cc: opsawg <[email protected]>; [email protected]
Subject: Re: Solicit feedback for the cloud site failure impact to forwarding for workloads hosted in Cloud DCs described in draft-ietf-rtgwg-net2cloud-problem-statement

Hi Linda,

I am using Google Cloud. I can't speak for other cloud providers such as MSFT Azure and Amazon AWS. This page describes Google Cloud resilience. I hope it provides useful info for you:
https://cloud.google.com/architecture/disaster-recovery#resilience_and_availability_approach

Hesham

On Mon, Jan 30, 2023, 2:31 PM Linda Dunbar <[email protected]> wrote:

Opsawg,

Section 3.2 of
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/ describes the impact of Cloud site failures on traffic to and from enterprises' workloads hosted in Cloud DCs. We would really appreciate your feedback on this description.

----------
3.2. Site failures and Methods to Minimize Impacts

Site failures include, but are not limited to, a site capacity degradation or an entire site going down, caused by a variety of reasons such as a cut in the fiber connecting to the site or among pods within the site, cooling failures, insufficient backup power, cyber attacks, too many changes outside of the maintenance window, etc. Fiber cuts are not uncommon within a Cloud site or between sites.

As described in RFC7938, Cloud DC BGP might not have an IGP to route around link/node failures within the ASes. When those failure events happen, the Cloud DC GW, which is visible to clients, is still running fine. Therefore, the Client GW can't use BFD to detect the failures.

When a site's capacity degrades or the site goes dark, a massive number of routes need to be changed. The large number of routes switching over to another site can also cause overloading that triggers more failures. In addition, the routes (IP addresses) in a Cloud DC cannot be aggregated nicely, triggering a very large number of BGP UPDATE messages when a failure occurs.

It might be more effective to do a mass reroute, similar to the mass withdraw mechanism defined for EVPN [RFC7432], to signal to remote PE nodes as quickly as possible that a large number of routes have changed.
-------------------------------------

Thank you very much

Linda Dunbar

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
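Editorial illustration (not part of the thread or the draft): the contrast between per-prefix withdrawal and a mass-withdraw style signal discussed in Section 3.2 can be sketched as a toy model. Everything here is an assumption made for illustration: the `RemoteNode` class, the site IDs `site-A` and `site-B`, the /32 prefixes, and the idea of tagging each path with a site ID are hypothetical and are not taken from the draft, RFC7432, or any BGP implementation. The counter tracks withdrawn routes, not BGP UPDATE messages; a real UPDATE can carry several withdrawn prefixes, so the message count would be lower, though it still grows with the number of prefixes when they cannot be aggregated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RemoteNode:
    """Hypothetical remote node that learns prefixes tagged with a site ID."""

    # prefix -> site IDs advertising it, in preference order (assumed model)
    paths: dict[str, list[str]] = field(default_factory=dict)
    # sites marked down by a single site-level withdrawal
    down_sites: set[str] = field(default_factory=set)
    # number of withdrawals this node had to process
    withdrawals_processed: int = 0

    def advertise(self, prefix: str, site: str) -> None:
        self.paths.setdefault(prefix, []).append(site)

    def withdraw_prefix(self, prefix: str, site: str) -> None:
        """Per-prefix withdrawal: one withdrawal per (prefix, site) pair."""
        self.withdrawals_processed += 1
        self.paths[prefix].remove(site)

    def withdraw_site(self, site: str) -> None:
        """Mass-withdraw style: one withdrawal invalidates every path via the site."""
        self.withdrawals_processed += 1
        self.down_sites.add(site)

    def best_site(self, prefix: str) -> str | None:
        """Return the first advertised site that is not marked down."""
        for site in self.paths.get(prefix, []):
            if site not in self.down_sites:
                return site
        return None


def build_node(num_prefixes: int) -> RemoteNode:
    """Populate a node with non-aggregatable /32s reachable via two sites."""
    node = RemoteNode()
    for i in range(num_prefixes):
        prefix = f"10.0.{i // 256}.{i % 256}/32"
        node.advertise(prefix, "site-A")  # primary (hypothetical)
        node.advertise(prefix, "site-B")  # backup (hypothetical)
    return node


def fail_per_prefix(node: RemoteNode, site: str) -> int:
    """Withdraw every prefix of the failed site individually."""
    for prefix in list(node.paths):
        if site in node.paths[prefix]:
            node.withdraw_prefix(prefix, site)
    return node.withdrawals_processed


def fail_mass(node: RemoteNode, site: str) -> int:
    """Signal the site failure once and let the node re-resolve locally."""
    node.withdraw_site(site)
    return node.withdrawals_processed


if __name__ == "__main__":
    print("per-prefix withdrawals:", fail_per_prefix(build_node(1000), "site-A"))
    print("mass-style withdrawals:", fail_mass(build_node(1000), "site-A"))
```

In both cases the remote node ends up forwarding via the backup site; the difference is only how much signalling it must process before it gets there, which is the concern the section raises for sites with a large number of routes.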
