Łukasz,

Thank you very much for reviewing the document and providing comments.
Please see below for the resolutions to your comments.

Linda

From: Łukasz Bromirski <[email protected]>
Sent: Friday, April 14, 2023 5:38 PM
To: Linda Dunbar <[email protected]>
Cc: David Black <[email protected]>; [email protected]; 
[email protected]; [email protected]
Subject: Re: Tsvart early review of 
draft-ietf-rtgwg-net2cloud-problem-statement-22

Hi Linda, Group,

Let me offer some points related to the latest version of the draft:

1. "DSVPN" - this is Huawei specific term describing VPNs that allow for 
dynamic connections between spokes which itself is 1:1 copy of Cisco DMVPN down 
to use of NHRP and mGRE 
(https://support.huawei.com/enterprise/en/doc/EDOC1100112360/a485316c/overview-of-dsvpn).
 Shouldn't we avoid vendor-specific product/solution names in RFC documents?

It's actually called out again in Section 4.2 later on, alongside a callout of
Cisco's DMVPN (which itself is not defined anywhere).
[Linda] Agree with your point. Is "NHRP [RFC2735] based multi-point VPN" a
better name? Or can you suggest a name to indicate the NHRP-based
multipoint-to-point or multipoint-to-multipoint tunnels among those clients'
own virtual routers?

2.

"3.1: [...] Cloud GWs need to peer with a larger variety of parties, via 
private circuits or IPsec over public internet."

As far as I understand it, the whole Section 3.1 tries to underline the need
for a flexible/resilient BGP implementation, and I agree with that. However,
I'd argue that a lot of cloud-based connections happen via BGP directly over
the internet, not necessarily through private circuits or IPsec. Section 4.2
of the draft even mentions some examples of that use case.

[Linda] Azure's ExpressRoute
(https://azure.microsoft.com/en-us/products/expressroute) and AWS's Direct
Connect (https://aws.amazon.com/directconnect/) are via private circuits and
are widely used. Both limit the number of inbound routes, whether the
connection is via the private circuit or via the Internet.


There's so much focus in the document on only two types of connection: MPLS
VPN or IPsec. The actual use case of connecting your workload to the cloud can
easily be addressed by any type of overlay routing, such as GRE or
VXLAN/GENEVE terminated on the virtual cloud gateway.
[Linda] When the Cloud connection is via the Internet, IPsec is exclusively
used as the outer header. Within the IPsec payload, client routes can be
encapsulated in VXLAN/GENEVE, which is not the focus of the document.

"When inbound routes exceed the maximum routes threshold for a peer, the 
current common practice is generating out of band alerts (e.g., Syslog) via 
management system to the peer, or terminating the BGP session (with cease 
notification messages [RFC 4486] being sent)."

For completeness' sake, shouldn't we explicitly state what the action is in
the first case? Typically, the additional routes above the threshold are
ignored, and this in turn may lead to other reachability problems.
[Linda] At the current time, there is no standard procedure for when inbound
routes exceed the maximum limits or cross certain thresholds. We are planning
to write a standards-track draft in the IDR WG to kick-start the discussion,
covering things like sending notifications when a threshold is crossed,
ignoring routes that are not originated by the clients, or having some kind of
policy for ignoring additional routes. There will be a lot of debate on this
subject; the IDR WG has made many attempts at this in the past, and none has
reached consensus.
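
For concreteness, below is a rough Python sketch of the two behaviors
described above. It is purely illustrative and vendor-neutral: the
warn-then-ignore handling reflects common practice rather than any standard,
and all names and values are made up.

    from enum import Enum

    class Action(Enum):
        WARN_ONLY = "warn"        # alert out of band and ignore excess routes
        TERMINATE = "terminate"   # tear the session down per RFC 4486

    def handle_prefix_count(accepted: int, max_prefixes: int,
                            warn_pct: int, action: Action) -> str:
        """Decide what to do after accepting `accepted` inbound prefixes."""
        if accepted > max_prefixes:
            if action is Action.TERMINATE:
                # Cease NOTIFICATION, subcode 1:
                # "Maximum Number of Prefixes Reached" (RFC 4486)
                return "send NOTIFICATION (Cease/1) and close the BGP session"
            # The case Lukasz asks about: the session stays up, the excess
            # routes are ignored, and reachability gaps can appear silently.
            return "raise Syslog alert; ignore routes above the limit"
        if accepted * 100 >= max_prefixes * warn_pct:
            return "raise Syslog warning: threshold crossed"
        return "accept"

    # e.g., 1200 routes accepted against a 1000-route limit:
    print(handle_prefix_count(1200, 1000, 80, Action.WARN_ONLY))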

"3.4.1: [...] Therefore, the edge Cloud that is the closest doesn't contribute 
much to the overall latency."

How is that a problem?

[Linda] Here is what it is intended to say:

  1.  The difference in routing distances to multiple server instances in
different edge Clouds is relatively small. Therefore, the edge Cloud with the
shortest routing distance might not be the best at minimizing the overall
latency.
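
As a toy illustration of that point (all numbers below are made up): when
routing distances are nearly equal, total latency is dominated by per-site
processing time, so the nearest edge Cloud isn't necessarily the best choice.

    # Network RTT differs by ~2 ms across edge Clouds,
    # but server-side processing differs by much more.
    sites = {
        "edge-A": {"net_ms": 4.0, "processing_ms": 30.0},  # closest but loaded
        "edge-B": {"net_ms": 5.5, "processing_ms": 8.0},
        "edge-C": {"net_ms": 6.0, "processing_ms": 12.0},
    }
    best = min(sites,
               key=lambda s: sites[s]["net_ms"] + sites[s]["processing_ms"])
    print(best)  # "edge-B": slightly farther away, lowest overall latency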


"4.3: [...] However, traditional MPLS-based VPN solutions are sub-optimized for 
dynamically connecting to workloads/applications in cloud DCs."

The whole section says existing MPLS VPNs and/or IPsec tunnels are being used
to connect to Cloud DCs. So how exactly are the "traditional MPLS-based VPNs"
"sub-optimized" if, at the same time, they're the exact means the document
mentions for solving the problem?
[Linda] "sub-optimal" because
The Provider Edge (PE) nodes of the enterprise's VPNs might not have direct 
connections to the third-party cloud DCs used by the enterprise to provide easy 
access to its end users. When the user base changes, the enterprise's 
workloads/applications may be migrated to a new cloud DC location closest to 
the new user base. The existing MPLS VPN provider might not have PEs at the new 
location. Deploying PEs routers at new locations is not trivial, which defeats 
one of the benefits of Clouds' geographically diverse locations allowing 
workloads to be as close to their end-users as possible.

"4.3. [...] The existing MPLS VPN provider might not have PEs at the new 
location. Deploying PEs routers at new locations is not trivial, which defeats 
one of the benefits of Clouds' geographically diverse locations allowing 
workloads to be as close to their end-users as possible."

When reading this literally, I'd say that any SP offering MPLS VPNs will be
more flexible in terms of reach (if it covers a given geography) than the
pretty much fixed and limited number of cloud DCs available. However, I sense
the intent here was to underline the role of "agile" DCs set up by, for
example, the "cloud" stacks of 5G services (and similar services); if so, that
would likely require some clarification to be well understood.
[Linda] Setting up MPLS circuits takes weeks/months.

"4.3. [...] As MPLS VPNs provide more secure and higher quality services, 
choosing a PE closest to the Cloud GW for the IPsec tunnel is desirable to 
minimize the IPsec tunnel distance over the public Internet."

MPLS VPNs provide more secure and higher-quality services... than what?
[Linda] MPLS VPNs utilize private links, and entrance to MPLS VPNs with edge
filters provides additional filtering. These are more secure than the public
Internet.

"4.3. [...] As multiple Cloud DCs are interconnected by the Cloud provider's 
own internal network, the Cloud GW BGP session might advertise all of the 
prefixes of the enterprise's VPC, regardless of which Cloud DC a given prefix 
is actually in. This can result in inefficient routing for the end-to-end data 
path."

That's true, but then we either praise the use of anycast (earlier in the doc)
or claim it's inferior, and instead pollute the routing table (by announcing
more prefixes) or limit visibility (by announcing fewer prefixes). You can't
really have it both ways.
[Linda] The intent of the section is to document the problem and describe a
workaround: to get around this problem, virtual routers in Cloud DCs can be
used to attach metadata (e.g., a GENEVE header or an IPv6 optional header) to
indicate the geo-location of the Cloud DCs.
Can you suggest better text?
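
To illustrate what "attaching metadata" could look like on the wire, here is a
hypothetical Python sketch that packs a geo-location string as a GENEVE TLV
option per RFC 8926, Section 3.5. The option class and type values are made up
for illustration; they are not IANA assignments and not from the draft.

    import struct

    GEO_OPT_CLASS = 0x0103  # assumption: an experimenter's option class
    GEO_OPT_TYPE = 0x01     # assumption: "geo-location" type in that class

    def geneve_geo_option(geo_code: str) -> bytes:
        """Pack one GENEVE option: 2-byte class, 1-byte type, then
        3 reserved bits + a 5-bit length (in 4-byte words of option data)."""
        data = geo_code.encode("ascii")
        data += b"\x00" * (-len(data) % 4)   # pad data to a 4-byte multiple
        length_words = len(data) // 4        # must fit the 5-bit length field
        assert length_words < 32
        return struct.pack("!HBB", GEO_OPT_CLASS, GEO_OPT_TYPE,
                           length_words) + data

    # A virtual router in a US-East Cloud DC tagging its egress packets:
    print(geneve_geo_option("us-east-1").hex())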

"5. As described in [Int-tunnels], IPsec tunnels can introduce MTU problems. 
This document assumes that endpoints manage the appropriate MTU sizes, 
therefore, not requiring VPN PEs to perform the fragmentation when 
encapsulating user payloads in the IPsec packets."

Well, typically no; it's 2023, and while PMTUD is still broken in parts of the
internet that are abusively controlled or censored, the real problem here is
with networks that run above the typical 1500 bytes, which is common for
virtual environments and was likely the reason that text was put in place.
Maybe underlining this would make sense in this paragraph?
[Linda] IETF drafts use plain text, so we can't use underlining. Can you
suggest better wording for this? Thank you.
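
Regarding wording, the underlying arithmetic could be shown along these lines.
This is a back-of-the-envelope sketch: the ESP overhead numbers assume tunnel
mode with AES-GCM plus NAT-T and are illustrative assumptions, not text from
the draft.

    def ipsec_inner_mtu(link_mtu: int = 1500, nat_t: bool = True) -> int:
        """Inner MTU an endpoint should use so VPN PEs never fragment."""
        outer_ip    = 20                  # outer IPv4 header (tunnel mode)
        nat_t_udp   = 8 if nat_t else 0   # UDP encapsulation (RFC 3948)
        esp_header  = 8                   # SPI + sequence number
        esp_iv      = 8                   # AES-GCM IV
        esp_trailer = 2                   # pad length + next header
        esp_icv     = 16                  # integrity check value
        return link_mtu - (outer_ip + nat_t_udp + esp_header +
                           esp_iv + esp_trailer + esp_icv)

    print(ipsec_inner_mtu())      # 1438 on a classic 1500-byte path
    print(ipsec_inner_mtu(9000))  # 8938 on the >1500-byte virtual networks
                                  # Lukasz mentions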

"5.2. IPSec" -> "IPsec"
[Linda] changed.

"5.2. IPSec encap & decap are very processing intensive, which can degrade 
router performance. NAT also adds to the performance burden."

That's why nowadays IPsec is executed in hardware, or in a
"hardware-accelerated" software path (like QAT for pure-x86 workloads), as,
typically, is NAT on the enterprise gear that qualifies as the "PE" so often
mentioned in this document.
[Linda] Is the performance of the "hardware-accelerated" path also impacted by
a larger number of IPsec flows? Can you suggest better wording?

"5.2. [...] When enterprise CPEs or gateways are far away from cloud DC 
gateways or across country/continent boundaries, performance of IPsec tunnels 
over the public Internet can be problematic and unpredictable."

...compared to what? Pure IP routing between the same IPs?
[Linda] Compared with private links.

"7. [...] via Public IP ports which are exposed"

Wouldn't it make sense to use 'interfaces' here? "Ports" has a TCP/UDP Layer 4
connotation.
[Linda] On routers, the term "physical ports" is commonly used, as in Ethernet
ports, OC12 ports, WiFi ports, etc.

"7. [...] Potential risk of augmenting the attack surface with inter-Cloud DC 
connection by means of identity spoofing, man-in-the-middle, eavesdropping or 
DDoS attacks. One example of mitigating such attacks is using DTLS to 
authenticate and encrypt MPLS-in-UDP encapsulation (RFC 7510)."

How is it different from the protection offered by IPsec?
[Linda] This section is about those attacks on the public-facing "interfaces"
that support IPsec.

"7. [...] When IPsec tunnels established from enterprise on-premises CPEs are 
terminated at the Cloud DC gateway where the workloads or applications are 
hosted, traffic to/from an enterprise's workload can be exposed to others 
behind the data center gateway (e.g., exposed to other organizations that have 
workloads in the same data center).

To ensure that traffic to/from workloads is not exposed to unwanted entities, 
IPsec tunnels may go all the way to the workload (servers, or VMs) within the 
DC."

How would that problem statement be different from the DTLS
solution/protection from the beginning of the section?

[Linda] DTLS is at the transport layer; here we are talking about the IP
layer. The answer to that security question is long. As you know, IPsec has
different attack planes than DTLS, at different costs. Are you looking for a
chart that compares this facet? Or can you simply reference the appropriate
RFCs?
--
./

Thank you.
Linda


On 14 Apr 2023, at 19:24, Linda Dunbar <[email protected]> wrote:

David,
We really appreciate your review and comments. Please see below for the
resolutions.
Sorry for the delayed response; I missed yours when I was going through the
comments from other reviewers.

The -23 revision
(https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/)
has addressed the comments from OpsDIR, RTGDIR, DNSDIR, and GENART. Changes
for your comments will be reflected in the -24 revision.

Linda
-----Original Message-----
From: David Black via Datatracker <[email protected]>
Sent: Monday, April 3, 2023 4:13 PM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: Tsvart early review of draft-ietf-rtgwg-net2cloud-problem-statement-22

Reviewer: David Black
Review result: Not Ready

Transport Area Review:

        Dynamic Networks to Hybrid Cloud DCs: Problem Statement and
                           Mitigation Practices
              draft-ietf-rtgwg-net2cloud-problem-statement-22

Reviewer: David L. Black ([email protected])
Date: April 3, 2023
Result: Not Ready

From a Transport Area perspective, there's not a lot of relevant content in
this draft.
Section 5 mentions IPsec tunnels, which raise the usual transport-related 
concerns in dealing with tunnels.  Those concerns can be primarily addressed by 
citing appropriate references, e.g., MTU concerns are discussed in the tunnels 
draft in the intarea WG, and ECN propagation is covered by RFC 6040 plus the 
related update draft for shim headers in the TSVWG working group.  I don't see 
any serious problems here.
[Linda] For the MTU problems introduced by IPsec tunnels, how about adding the
following sentences?
As described in [Int-tunnels], IPsec tunnels can introduce MTU problems. This
document assumes that endpoints manage the appropriate MTU sizes, therefore
not requiring VPN PEs to perform fragmentation when encapsulating user
payloads in IPsec packets.

IPsec tunnels are over the public internet, which doesn't support ECN. Why is
there a need to mention RFC 6040?


OTOH, from a broader perspective, the draft is not a coherent problem
statement: it discusses a plethora of technologies ranging from MPLS to DNS,
often without making any connections among them (e.g., Section 6 identifies
policy management as a requirement, but there's no discussion of policies that
require management elsewhere in the draft).
[Linda] This document describes the network-related problems enterprises face
when interconnecting their branch offices with dynamic workloads in
third-party data centers (a.k.a. Cloud DCs), along with some mitigation
practices. It is a list of technologies ranging from VPN to DNS.


I'm not even sure what the scope of the draft is, e.g.:

a) The abstract states that the draft is "mainly for enterprises that already 
have traditional MPLS services and are interested in leveraging those 
networks," but section
3.4 discusses 5G Edge Clouds, which are rather unlikely to use MPLS.
[Linda] The document is mainly for enterprises that already have traditional
VPN services and are interested in leveraging those networks (instead of
abandoning them altogether). MPLS (which has now been replaced by "VPN" in the
text) is just one example.


b) There are at least three roles for BGP in this draft that are not 
disambiguated - IGP, EGP, and VPN routing protocol for MPLS-based VPNs, e.g., 
EVPN.  Section 4 would be a good place to clarify this by describing the 
Gateway interfaces in detail, including the role of BGP.
[Linda] Connecting to the Cloud needs BGP, but doesn't run IGP or EVPN.
The intent of the draft is to identify future work in BGP.

In its current form, I don't understand the target audience or purpose of this 
draft, especially the head-spinning mixture of topics in section 3, so I cannot 
recommend IETF publication of the draft in its current form.
[Linda] The intent of the document is to lay out current mitigation methods
and additional work on extensions to BGP, such as
https://datatracker.ietf.org/doc/draft-ietf-idr-sdwan-edge-discovery/

Perhaps the draft ought to be focused and organized around extending and/or 
using MPLS and MPLS-based VPNs - much of the material in Sections 4 and 5 would 
be applicable, and some of the worst of section 3's distractions (e.g., 5G, 
DNS) could be avoided or at least scoped to the relevant VPN technologies.
[Linda] The DNS issues introduced by connecting to Cloud DCs were strongly
requested by DNSOps and OpsDIR reviewers.

Thank you very much
Linda



_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
