I think I now understand your point. As a problem statement draft, I
would replace the detailed description of the specific proposal with a
more generic "There are proposals to enhance BGP advertisements to
address this problem."
Yours,
Joel
On 8/22/2023 6:34 PM, Linda Dunbar wrote:
Joel,
I see your points. Please see my explanation below quoted by <ld> </ld>.
*From:* Joel Halpern <jmh.dir...@joelhalpern.com>
*Sent:* Monday, August 21, 2023 11:34 PM
*To:* Linda Dunbar <linda.dun...@futurewei.com>
*Cc:* rtgwg-chairs <rtgwg-cha...@ietf.org>;
draft-ietf-rtgwg-net2cloud-problem-statement....@ietf.org; rtgwg@ietf.org
*Subject:* Re: Need your help to make sure the
draft-ietf-rtgwg-net2cloud-problem-statement readability is good.
Thank you Linda. Trimmed the agreements, including acceptable text
from your reply. Leaving the two points that can benefit from a
little more tuning.
Marked <jmh2></jmh2>
Yours,
Joel
On 8/22/2023 12:12 AM, Linda Dunbar wrote:
Similarly, section 3.2 looks like it could apply to any operator. The
reference to the presence or absence of IGPs seems largely irrelevant
to the question of how partial failures of a facility are detected and
dealt with.
[Linda] Two reasons that the site failure described in Section 3.2 does
not apply to other networks:
1. One DC can have many server racks concentrated in a small area, all
of which can fail from one single event. By contrast, a regular
network failure at one location only impacts the routers at that
location, which quickly triggers switching the services to the
protection paths.
2. Regular networks run an IGP, which can quickly propagate internal
failures, such as fiber cuts, to the edge, whereas many DCs don’t run
an IGP.
<jmh>Given that even a data center has to deal with internal failures,
and that even traditional ISPs have to deal with partitioning
failures, I don't think the distinction you are drawing in this
section really exists. If it does, you need to provide stronger
justification. Also, not all public DCs have chosen to use just BGP,
although I grant that many have. I don't think you want to argue that
the folks who have chosen to use BGP are wrong. </jmh>
<ld> Are you referring to Network-Partitioning Failures in Cloud Systems?
Traditional ISPs don’t host end services; they are responsible for
transporting packets; therefore a protection path can reroute packets.
But a Cloud DC site/PoD failure causes all the hosts (prefixes) to
become unreachable. </ld>
<jmh2> If a DC Site fails, the services fail too. Yes, the DC
operator has to reinstantiate them. But that is way outside our
scope. To the degree that they can recover by rerouting to other
instances (whether using anycast or some other trick), it looks just
like routing around failures in other cases, which BGP and IGPs can
do. I am still not seeing how this justifies any special mechanisms.
</jmh2>
<ld>
You are correct that the protection is the same as in regular ISP
networks.
The paragraph is intended to say the following:
When a site failure occurs, many instances can be impacted. When the
impacted instances’ IP prefixes in a Cloud DC are not aggregated
nicely, which is very common, one single site failure can trigger a
huge number of BGP UPDATE messages. Instead of sending the ingress
routers many BGP UPDATE messages, one for each impacted instance,
[METADATA-PATH] proposes one single BGP UPDATE indicating the site
failure. The ingress routers can then switch over all the instances
that are associated with the site.
</ld>
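(Not part of the original thread, but to make the scaling argument concrete: a minimal sketch, assuming a hypothetical site identifier and illustrative prefix counts, comparing the number of UPDATE messages an ingress router would receive with per-prefix withdrawals versus one site-scoped signal. The function names and message tuples are invented for illustration and are not from [METADATA-PATH] itself.)

```python
# Hypothetical sketch: per-prefix withdrawals vs. one site-scoped UPDATE.
# Names and message formats are illustrative, not from [METADATA-PATH].

def per_prefix_withdrawals(site_prefixes):
    """One BGP UPDATE (withdrawal) per impacted, poorly aggregated prefix."""
    return [("WITHDRAW", prefix) for prefix in site_prefixes]

def site_scoped_withdrawal(site_id):
    """A single UPDATE carrying site metadata; the ingress router maps the
    site ID to all prefixes it previously learned from that site."""
    return [("WITHDRAW_SITE", site_id)]

# A failed site hosting 10,000 non-aggregatable host routes (illustrative).
prefixes = [f"198.51.100.{i % 256}/32#{i}" for i in range(10_000)]

print(len(per_prefix_withdrawals(prefixes)))   # 10000 messages
print(len(site_scoped_withdrawal("site-A")))   # 1 message
```

Either way the ingress routers end up rerouting the same traffic; the difference is purely in how many control-plane messages it takes to tell them.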
_______________________________________________
rtgwg mailing list
rtgwg@ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg