Joel,

Thank you very much for the quick feedback.
My questions and replies are inserted below with <ld> </ld> (in purple text).

Linda

From: Joel Halpern <[email protected]>
Sent: Monday, August 21, 2023 6:15 PM
To: Linda Dunbar <[email protected]>
Cc: rtgwg-chairs <[email protected]>; 
[email protected]
Subject: Re: Need your help to make sure the 
draft-ietf-rtgwg-net2cloud-problem-statement readability is good.


Thank you for responding so promptly upon your return from PTO.  (I should take 
more time off myself.)

I will annotate in line, with <jmh></jmh>.  (The current IETF discussion about 
differing markup techniques creating difficulty in following responses 
exemplifies why I have adopted this practice.)

For this version, I will include agreements.  Feel free to remove those for 
followup.

Yours,

Joel
On 8/21/2023 6:45 PM, Linda Dunbar wrote:
Joel,

Thank you very much for the valuable feedback. Sorry, I was on vacation without 
internet last week and just got around to studying your comments.

Changes to the document to address your comments and questions are inserted 
below.

Attached is the document with change bars enabled. Once it is okay with you, we 
will upload.
Linda

From: Joel Halpern <[email protected]>
Sent: Monday, August 14, 2023 3:26 PM
To: Linda Dunbar <[email protected]>
Cc: rtgwg-chairs <[email protected]>
Subject: Re: Need your help to make sure the 
draft-ietf-rtgwg-net2cloud-problem-statement readability is good.

I have read over the draft.  The following comments may be helpful to you.
Major:
If you are going to use the term SD-WAN as central to the definition of 
controller, you need to provide a citation and definition of SD-WAN.
[Linda] As SD-WAN is so widely used, using any vendor's SD-WAN definition is 
inappropriate. How about using Gartner's SD-WAN definition or MEF's SD-WAN? Do 
you have any preference?
Gartner's SD-WAN definition 
(https://www.gartner.com/en/information-technology/glossary/software-defined-wan-sd-wan ):
"SD-WAN provides dynamic, policy-based, application path selection across 
multiple WAN connections and supports service chaining for additional services 
such as WAN optimization and firewalls."

MEF 70.1 defines SD-WAN Services 
(https://www.mef.net/wp-content/uploads/MEF_70.1.pdf ):
"An overlay connectivity service that optimizes transport of IP Packets over 
one or more Underlay Connectivity Services by recognizing applications 
(Application Flows) and determining forwarding behavior by applying Policies to 
them."
<jmh>I slightly prefer the MEF definition, but could live with either one.  My 
concern is merely that there be a definition. </jmh>
<ld> will use MEF's definition then </ld>

Also, if you are going to claim that controller is interchangeable with SD-WAN 
controller you need to explain why.   The definition seems to imply control 
over something very specific, whereas data center controllers, and even SDN 
Controllers, mean something far more general.
[Linda] As the section on "controller" was removed in a previous revision of 
the document, the definition is no longer needed. We can remove the definition 
in the next revision (-29). Does that address your concern?

<jmh>That works fine, thank you. </jmh>
I find the beginning of section 3.1 rather confusing.  You seem to be trying to 
distinguish between classical ISP peering policies and Public Data Center 
peering policies.  But the text simply does not match the broad range of 
practices in either category.    The issues that are then discussed do not seem 
to be specific to Cloud Data Centers.  They look like advice that should be 
vetted with the IDR working group, and apply to many different kinds of 
operators.
[Linda] This document is intended to describe the network-related problems for 
connecting to public (Cloud) DCs and mitigation practices. Several IETF 
solution drafts are being proposed in the relevant WGs, such as in 
draft-ietf-idr-sdwan-edge-discovery, draft-ietf-idr-5g-edge-service-metadata, 
draft-ietf-bess-secure-evpn, draft-dmk-rtgwg-multisegment-sdwan, etc. Having 
one document describing the problems and referencing all the relevant solutions 
being developed by IETF can make it easier for the implementers, even though 
some of the solutions can apply to general provider networks in addition to the 
Public DCs.
<jmh>Many traditional ISPs end up peering with a lot of people.  I think the 
difference is subtler.   I think the argument you are trying to make is "Where 
traditional ISPs view peering as a means to improve network operations, Public 
Data Centers which offer direct peering view that peering as  a tool to get 
more customers able to make good use of their data center.  As such, there is 
pressure to peer more widely and to peer with customers, including those who lack 
the expertise and experience in running complex BGP peering relationships."  
Therefore, the issues you raise are more urgent than they have traditionally 
been, even though they may apply to traditional ISPs to a lesser degree.  Is 
that what you are trying to say? </jmh>

<ld> Yes. Your wording is accurate.  How about changing the paragraph to the 
following?

"Where traditional ISPs view peering as a means to improve network operations, 
Public Cloud DCs offer direct peering to get more customers to use their data 
centers and services. As such, there is pressure to peer more widely and to 
peer with customers, including those who lack the expertise and experience in 
running complex BGP peering relationships. All those can contribute to 
increased BGP peering errors such as capability mismatch, unwanted route leaks, 
missing Keepalives, and errors causing BGP ceases. Capability mismatch can 
cause BGP sessions not to be adequately established.
Those issues are more acute to Cloud DCs than they have traditionally been, 
even though they may apply to traditional ISPs, just to a lesser degree."

</ld>.
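
For illustration only, a minimal Python sketch (prefixes and thresholds are 
invented, not taken from the draft) of the kind of route-leak and max-prefix 
guardrails that such wider peering typically relies on:

# Hypothetical pre-flight check for a customer BGP peering session, showing
# the usual route-leak guardrails: an agreed prefix set and a max-prefix limit.
# All prefixes and thresholds below are invented for the example.
import ipaddress

AGREED_PREFIXES = [ipaddress.ip_network("203.0.113.0/24"),
                   ipaddress.ip_network("198.51.100.0/24")]
MAX_PREFIXES = 100  # alert / reset the session if the peer announces more

def leaked_routes(received):
    """Return announcements that fall outside the agreed prefix set."""
    leaks = []
    for prefix in received:
        net = ipaddress.ip_network(prefix)
        if not any(net.subnet_of(agreed) for agreed in AGREED_PREFIXES):
            leaks.append(prefix)
    return leaks

def check_session(received):
    if len(received) > MAX_PREFIXES:
        return "max-prefix exceeded: reset session and alert the peer"
    leaks = leaked_routes(received)
    if leaks:
        return "possible route leak, rejecting: %s" % leaks
    return "session OK"

print(check_session(["203.0.113.0/25", "192.0.2.0/24"]))
# -> possible route leak, rejecting: ['192.0.2.0/24']

In practice, these guardrails live in the BGP speaker's inbound policy (prefix 
filters and maximum-prefix limits) rather than in application code.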



Similarly, section 3.2 looks like it could apply to any operator.  The 
reference to the presence or absence of IGPs seems largely irrelevant to the 
question of how partial failures of a facility are detected and dealt with.
[Linda] Two reasons why the site failures described in Section 3.2 do not apply 
to other networks:

  *   One Cloud DC can have many server racks concentrated in a small area, all 
of which can fail due to a single event. In contrast, a regular network failure 
at one location only impacts the routers at that location, which quickly triggers 
switching the affected services to protection paths.
  *   Regular networks run an IGP, which can quickly propagate internal failures 
(e.g., fiber cuts) to the edge, whereas many DCs do not run an IGP.
<jmh>Given that even a data center has to deal with internal failures, and that 
even traditional ISPs have to deal with partitioning failures, I don't think 
the distinction you are drawing in this section really exists.  If it does, you 
need to provide stronger justification.  Also, not all public DCs have chosen 
to use just BGP, although I grant that many have. I don't think you want to 
argue that the folks who have chosen to use BGP are wrong.  </jmh>

<ld> Are you referring to Network-Partitioning Failures in Cloud Systems?
Traditional ISPs don't host end services; they are responsible for transporting 
packets, so protection paths can reroute packets around a failure. But a Cloud DC 
site/PoD failure causes all the hosts (prefixes) there to become unreachable. </ld>


Given the issues that have been raised with injecting service information into 
underlay routing, I strongly recommend removing the last paragraph of section 
3.4 (the discussion of "[METADATA-PATH]")
[Linda] [METADATA-PATH] is intended for information exchange between the egress 
routers (i.e., the Cloud GWs) and the ingress routers, not for injection into 
the underlay networks.
How about we make the last sentence more explicit to emphasize that the 
information exchanged is NOT to be injected into the underlay?
"[METADATA-PATH] extends the BGP UPDATE messages for a Cloud GW to propagate 
the edge service-related metrics from Cloud GW to the ingress routers so that 
the ingress routers can incorporate the destination site's capabilities with 
the routing distance in computing the optimal paths."
<jmh>That helps a great deal.   Would you consider adding another sentence: 
"The CATS working group is examining general aspects of this space, and may 
come up with protocol recommendations for this information exchange."?</jmh>
<ld> great suggestion. Added </ld>
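
For illustration only, a small Python sketch of what "incorporating the 
destination site's capabilities with the routing distance" could look like on 
an ingress router; the metric names, values, and weights are invented and are 
not taken from [METADATA-PATH]:

# Hypothetical ingress-router path selection combining routing distance with
# edge-service metadata received from each Cloud GW. Metric names, values,
# and weights are illustrative only.
from dataclasses import dataclass

@dataclass
class CandidateSite:
    name: str
    routing_distance: float     # e.g., BGP/IGP-derived cost to the Cloud GW
    site_load: float            # 0.0 (idle) .. 1.0 (saturated), from metadata
    processing_delay_ms: float  # advertised per-request delay at the site

def composite_cost(site, w_dist=1.0, w_load=10.0, w_delay=0.5):
    # Lower is better; the weights express local policy.
    return (w_dist * site.routing_distance
            + w_load * site.site_load
            + w_delay * site.processing_delay_ms)

def pick_site(candidates):
    return min(candidates, key=composite_cost)

sites = [
    CandidateSite("edge-A", routing_distance=5, site_load=0.9, processing_delay_ms=40),
    CandidateSite("edge-B", routing_distance=8, site_load=0.2, processing_delay_ms=10),
]
print(pick_site(sites).name)  # edge-B, despite the longer routing distance

The weights express local policy; the point is simply that the shortest routing 
distance does not automatically win once the site metadata is taken into account.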


The assumption in section 3.7 that one can and should put geo-location into 
fields in the IP header seems to be quite controversial and difficult to 
justify at this time in a document intended to be an informational RFC.  This 
assumption is again made at the end of section 4.3.
[Linda] It is common for enterprises to instantiate their own virtual CPEs in 
Cloud DC sites. It is also common for those virtual CPEs to add GENEVE 
encapsulation using a vendor-specific GENEVE Option Class 
(https://www.iana.org/assignments/nvo3/nvo3.xhtml ) to carry additional 
proprietary information about the sites and other properties. How about changing 
the text to the following?
"For enterprises that instantiate virtual routers in Cloud DCs, metadata can be 
attached (e.g., GENEVE header or IPv6 optional header) to indicate additional 
properties about the sites where they are instantiated".
<jmh>That does help.  How about s/about the sites/including where useful about 
the sites/ ?</jmh>
<ld> Do you mean "including useful information about the sites"?  I can't 
quite parse the phrase "where useful about the sites".  How about the text 
below?

"For enterprises that instantiate virtual routers in Cloud DCs, metadata can be 
attached (e.g., GENEVE header or IPv6 optional header) to indicate additional 
properties, including useful information about the sites".
</ld>
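
For illustration only, a Python sketch that builds a GENEVE header (RFC 8926) 
carrying one vendor-specific option with site metadata; the Option Class, 
option type, and metadata payload are made-up example values:

# Hypothetical construction of a GENEVE header (RFC 8926) carrying one
# vendor-specific option with site metadata. The Option Class, option type,
# and metadata payload below are made-up example values.
import struct

def geneve_header(vni, option_class, option_type, metadata,
                  protocol_type=0x6558):   # 0x6558 = Transparent Ethernet Bridging
    metadata += b"\x00" * ((-len(metadata)) % 4)   # options are 4-byte aligned
    opt_data_words = len(metadata) // 4            # option Length, in 4-byte words
    option = struct.pack("!HBB", option_class, option_type, opt_data_words) + metadata
    opt_len_words = len(option) // 4               # total options length, in 4-byte words
    ver_optlen = (0 << 6) | opt_len_words          # Ver = 0
    flags = 0x00                                   # O = 0 (not OAM), C = 0 (no critical options)
    vni_field = vni << 8                           # 24-bit VNI followed by 8 reserved bits
    return (struct.pack("!BBH", ver_optlen, flags, protocol_type)
            + struct.pack("!I", vni_field)
            + option)

hdr = geneve_header(vni=0x1234,
                    option_class=0x0123,   # made-up; real use needs an IANA-assigned class
                    option_type=0x01,
                    metadata=b"site=edge-A;load=20")
print(len(hdr), hdr.hex())   # 32-byte header: 8-byte base + 24 bytes of options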

The assertion in section 4.3 that "private VPN networks" provide higher quality 
of service is not inherent.  Customers may pay operators additional fees for 
higher quality services.  But it is not inherent in private VPNs (particularly 
not in L3 MPLS VPNs) that they provide any quality differentiation.
[Linda] How about changing to the following text?
"As premium paid services, traditional private VPNs, including private circuits 
or MPLS-based L2/L3 VPNs, have been widely deployed as an effective way to 
support businesses and organizations that require network performance and 
reliability."
<jmh>Close.  How about starting the sentence "Private VPNs, including private 
circuits or MPLS-based L2/L3 VPNs, when purchased with premium paid services, 
have been deployed as an effective way ..." </jmh>
<ld>  good suggestion. Changed </ld>
Minor:
The introduction reads "it is desirable for enterprises to instantiate 
applications and workloads in locations close to their end users."  This seems 
a much stronger statement than operational reality indicates.  There are some 
enterprises and some applications or workloads where it may be desirable to 
achieve such proximity.  But it is not at all universal.  Similarly, while 
there may be workloads that want to move to follow mobile users, there are many 
workloads where that would be an actively bad idea.  Note that the text in 
section 3.4 seems fine.  I would suggest simply removing the discussion of 
placement from the introduction.
[Linda] The flexibility of placing applications close to end-users is one of 
the key advantages of Cloud DC, especially Edge Clouds. How about changing to 
the following text?
"With the advent of widely available Cloud data centers (DC) providing services 
in diverse geographic locations and advanced tools for monitoring and 
predicting application behaviors, it is desirable for enterprises to 
instantiate applications and workloads in Cloud DCs. Some enterprises prefer 
that their specific applications be located close to their end users, as the 
proximity can improve end-to-end latency and overall user experience."
<jmh>I can live with that. </jmh>


I think the definition of Hybrid Cloud needs tuning.  It is hybrid because they 
are combining resources they own with resources run by third-party providers.  
This does not require "on-premise DCs".  It may be a third-party physical 
premise.  It may be the enterprise premise but distinct from its normal 
operational activities.
[Linda] How about using Google's definition for Hybrid Cloud?
"A hybrid cloud is a mixed computing environment where applications are run 
using a combination of computing, storage, and services in different 
environments-public clouds and private clouds, including on-premises data 
centers or "edge" locations." 
https://cloud.google.com/learn/what-is-hybrid-cloud
<jmh>That works for me. </jmh>

    The above becomes a bigger issue in section 3.5.  The distinction is not 
between on-premise and off-premise, but rather between different control and 
selection regimes.
Please consult with IXP folks on the definition of IXP.  I know some of them 
are distributed, so "a physical location" seems to be wrong.
[Linda] The definition was from an IXP provider. However, we can use 
Wikipedia's definition which is more natural:
"Internet exchange points (IXes or IXPs) are common grounds of IP networking, 
allowing participant Internet service providers (ISPs) to exchange data 
destined for their respective networks." 
https://en.wikipedia.org/wiki/Internet_exchange_point
<jmh>Yes, that definition would be significantly better.  Please make that 
change. </jmh>


The title of section 3.3 is misleading.  As I read the section, the title 
should be "limitations of DNS for location selection" or similar.  It is not a 
general section about techniques to select paths or deployment locations.
[Linda] Okay, changed to "Limitation of DNS-based Cloud DC Location Selection."
<jmh>Thank you. </jmh>

Section 3.4 bullet 1 reads somewhat oddly.  The fact that the differences in 
routing distances are small is not a problem.  If anything, it is a feature.  I 
believe your point is really point 2, that one may want to take into account 
other parameters.   I also note that 3.4 seems to get into topics being dealt 
with in the CATS problem statement and analysis.  You may want to at least point 
at that work, and not try to capture all its nuances here.
[Linda] how about changing to the following text?
"The difference in routing distances to server instances in different edge 
Clouds is relatively small. Therefore, the instance in the Edge Cloud with the 
shortest routing distance from the 5G UPF might not be the best in providing 
the overall low latency service."
<jmh>Thank you.  That helps.  I can live with it. </jmh>
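
A toy numerical illustration of that point (all figures invented):

# Toy illustration of why the Edge Cloud with the shortest routing distance
# from the 5G UPF is not necessarily the one with the lowest overall latency.
# All numbers are invented.
edge_clouds = {
    # name: (network RTT from the UPF in ms, current processing latency in ms)
    "edge-1 (closest)": (2.0, 35.0),   # nearby but heavily loaded
    "edge-2":           (5.0,  8.0),   # slightly farther but lightly loaded
}

for name, (rtt_ms, proc_ms) in edge_clouds.items():
    print("%s: end-to-end ~ %.1f ms" % (name, rtt_ms + proc_ms))
# edge-2 delivers ~13 ms overall, versus ~37 ms at the nearest edge.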


Figure 1 in section 4.1 could use some clarification.  It is unclear if the two 
TN-1 are the same networks, or are intended to be different parts of the tenant 
network.  And similarly for the two TN-2.  It is also unclear why the top 
portion is even included in the figure, since it does not seem to have anything 
to do with the data center connectivity task?  Wouldn't it be simpler to just 
note that the diagram only shows part of the tenant infrastructure, and leave 
out irrelevancies?
[Linda] The two TN-1 are intended to be different parts of one single tenant 
network.  Is adding the following good enough?
"TN: Tenant Network. One TN (e.g., TN-1) can be attached to both vR1 and vR2."
<jmh>While that at least makes meaning of the figure clear, I am still left 
confused as to why the upper part of the figure is needed.</jmh>
<ld> Mainly to show that one Tenant can have some routes reachable via the 
Internet GW and others reachable via the Virtual GW (IPsec), and that routes 
belonging to one Tenant can be interconnected by vRouters. </ld>


I would suggest removing the reference to SDWAN-EDGE-DISCOVERY in section 4.2.  
That is likely not the only way to perform such discovery, and conversely has 
scaling issues if many companies choose to burden BGP with this information as 
it needs to transit multiple ISPs.
[Linda] SDWAN-EDGE-DISCOVERY was derived from the problem (Bullet C) described 
by this document. How about changing to the following text?
"For Approach c), [SDWAN-EDGE-DISCOVERY] describes a mechanism for virtual 
routers to advertise their properties for establishing proper IPsec tunnels 
among them. There could be other approaches developed to address the problem."
<jmh>I can live with that.</jmh>
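
For illustration only, a Python sketch of the kind of per-edge properties such 
a discovery mechanism might advertise and how two endpoints could derive 
mutually supported IPsec parameters from them; the field names and values are 
invented and are not taken from [SDWAN-EDGE-DISCOVERY]:

# Hypothetical advertisement of virtual-router properties and a simple
# negotiation of IPsec parameters both endpoints support. Field names and
# values are illustrative only.
from dataclasses import dataclass, field

@dataclass
class EdgeProperties:
    router_id: str
    public_ip: str
    behind_nat: bool
    encryption: list = field(default_factory=lambda: ["aes-256-gcm"])
    dh_groups: list = field(default_factory=lambda: [19, 20])

def negotiate_ipsec(local, remote):
    ciphers = [c for c in local.encryption if c in remote.encryption]
    groups = [g for g in local.dh_groups if g in remote.dh_groups]
    if not ciphers or not groups:
        return None  # no common parameters; the tunnel cannot be built
    return {
        "peer": remote.public_ip,
        "encryption": ciphers[0],
        "dh_group": groups[0],
        # NAT traversal (UDP encapsulation) if either side is behind a NAT
        "nat_t": local.behind_nat or remote.behind_nat,
    }

a = EdgeProperties("vR1", "192.0.2.10", behind_nat=False,
                   encryption=["aes-256-gcm", "aes-128-gcm"], dh_groups=[19, 20])
b = EdgeProperties("vR2", "203.0.113.7", behind_nat=True,
                   encryption=["aes-128-gcm"], dh_groups=[20])
print(negotiate_ipsec(a, b))
# -> {'peer': '203.0.113.7', 'encryption': 'aes-128-gcm', 'dh_group': 20, 'nat_t': True}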


I suspect that the security considerations should also point out that data 
center operator security practices can affect the overall security posture and 
need to be evaluated by customers.  (This is implied, but not stated 
explicitly, by the last indented paragraph.)
[Linda] Added your suggested wording to the section:
"The Cloud DC operator's security practices can affect the overall security 
posture and need to be evaluated by customers. Many Cloud operators offer 
monitoring services for data stored in Clouds, such as AWS CloudTrail, Azure 
Monitor, and many third-party monitoring tools to improve the visibility of 
data stored in Clouds."
<jmh>Thank you.  I suspect that we want to say more, but since I do not have a 
good suggestion, we can leave it at this.</jmh>


Nits:
For some reason the htmlized version of the Introduction has the initial clause 
in bold face.  This suggests an <em></em> marking in the text that you probably 
do not want.  This happens again at the front of section 4 and section 5.
[Linda] I will fix this in the next version.
Thank you very much.
<jmh>Again, thank you for your efforts. </jmh>


Yours,
Joel
On 8/8/2023 9:03 PM, Linda Dunbar wrote:
Joel,

As I mentioned to you during IETF117, RTGwg chairs asked me to approach you to 
help review the draft-ietf-rtgwg-net2cloud-problem-statement to make sure the 
readability is good before starting the WGLC. I have cleaned the draft after 
IETF117 and addressed the Early Review comments from INTDIR, RTGDIR, OPSDIR, 
SECDIR, TSVART, DNSDIR, and GENART.    
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/
The document describes the network-related problems enterprises face at the 
moment of writing this specification when interconnecting their branch offices 
with dynamic workloads in third-party data centers (DC) (a.k.a. Cloud DCs). The 
Net2Cloud problem statements are mainly for enterprises with traditional VPN 
services who want to leverage those networks (instead of altogether abandoning 
them).
This document also describes the mitigation practices for getting around the 
identified problems.
Thank you very much,
Linda

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
