Dear authors,
Please find below my comments on draft-ietf-rtgwg-net2cloud-problem-statement (I
have included line numbers from idnits to help identify where in the document
each comment applies):
Please update the references below.
== Outdated reference: A later version (-13) exists of
draft-ietf-idr-sdwan-edge-discovery-12
== Outdated reference: A later version (-12) exists of
draft-ietf-opsawg-ntw-attachment-circuit-08
== Outdated reference: A later version (-23) exists of
draft-ietf-idr-5g-edge-service-metadata-16
== Outdated reference: A later version (-15) exists of
draft-ietf-opsawg-teas-attachment-circuit-10
== Outdated reference: A later version (-14) exists of
draft-ietf-add-split-horizon-authority-07
109 Cloud services are generally exposed, on-demand services that claim
110 to be scalable, highly available, and have usage-based billing. Most
Jim> The above sentence is difficult to parse. Do you mean “Cloud services are
generally exposed as on-demand…” rather than “Cloud services are generally
exposed,…”
115 hosts services to many customers.
Jim> s/to/for
137 "edge" locations. <https://cloud.google.com/learn/what-
138 is-hybrid-cloud>.
Jim> Please remove the in-text URL and replace it with a [] reference, either
normative or informative.
144 https://en.wikipedia.org/wiki/Internet_exchange_point.
Jim> Please remove the in-text URL and replace it with a [] reference, either
normative or informative.
186 - If a Cloud Gateway (GW), a BGP speaker, receives from its BGP
187 peer a capability that it does not itself support or recognize,
188 it need to ignore that capability, and the BGP session need not
Jim> As per RFC 5492, it MUST ignore that capability and the BGP session MUST
NOT be terminated. See Section 3 of RFC 5492 and correct the above text.
189 be terminated per [RFC5492]. When receiving a BGP UPDATE with a
190 malformed attribute, the revised BGP error handling procedure
191 in [RFC7606] should be followed instead of session resetting.
Jim> The above paragraph seems confused. The first sentence is about BGP OPEN
and how to handle capabilities, whereas the second sentence is about BGP UPDATE
messages that carry malformed attributes. These are two completely different
things, so I am struggling to understand why they are referenced in the same
paragraph and what exactly they have to do with each other in the context of a
Cloud Gateway. Everything referenced is existing behavior, nothing new, so why
is it here and what are the authors trying to convey? If they are simply trying
to say that a Cloud Gateway should adhere to the procedures specified in RFCs
5492 and 7606, why not say exactly that?
If the authors wish to keep the text, I would suggest a rewrite as follows:
- If a Cloud Gateway (GW), a BGP speaker, receives from its BGP peer a
BGP OPEN message with a capability that it does not support or recognize,
it MUST ignore that capability, and the BGP session MUST NOT be terminated,
as per [RFC5492].
- When receiving a BGP UPDATE message with a malformed attribute, the
revised BGP error handling procedures in [RFC7606] should be followed
instead of resetting the BGP session.
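For illustration only, here is a minimal Python sketch of the RFC 5492
behavior I am referring to: walk the Capabilities Optional Parameter of an
OPEN message, keep the capabilities the speaker understands, and silently
ignore the rest without tearing down the session. It assumes the classic
1-octet Optional Parameter length encoding (not the RFC 9072 extended format),
and the function name and the set of "supported" codes are hypothetical
choices of mine, not anything defined in the draft.

```python
# Hypothetical set of capability codes this illustrative Cloud GW understands.
# Code points are IANA-assigned: 1 = Multiprotocol Extensions (RFC 4760),
# 2 = Route Refresh (RFC 2918), 65 = Four-octet AS Number (RFC 6793).
SUPPORTED_CAPABILITIES = {1, 2, 65}

CAPABILITIES_PARAM_TYPE = 2  # OPEN Optional Parameter type "Capabilities" (RFC 5492)


def parse_capabilities(opt_params: bytes) -> tuple[dict[int, bytes], list[int]]:
    """Walk the Optional Parameters field of a BGP OPEN message.

    Returns (accepted, ignored): recognized capabilities keyed by code, plus the
    codes of capabilities that were ignored because they are not supported.
    Assumes 1-octet parameter lengths (no RFC 9072 extended format).
    """
    accepted: dict[int, bytes] = {}
    ignored: list[int] = []
    i = 0
    while i < len(opt_params):
        if i + 2 > len(opt_params):
            # A real speaker would send an OPEN Message Error NOTIFICATION here.
            raise ValueError("truncated optional parameter header")
        param_type, param_len = opt_params[i], opt_params[i + 1]
        body = opt_params[i + 2 : i + 2 + param_len]
        if len(body) != param_len:
            raise ValueError("truncated optional parameter value")
        i += 2 + param_len
        if param_type != CAPABILITIES_PARAM_TYPE:
            continue  # Other parameter types are out of scope for this sketch.

        j = 0
        while j < len(body):
            if j + 2 > len(body):
                raise ValueError("truncated capability header")
            code, cap_len = body[j], body[j + 1]
            value = body[j + 2 : j + 2 + cap_len]
            if len(value) != cap_len:
                raise ValueError("truncated capability value")
            j += 2 + cap_len
            if code in SUPPORTED_CAPABILITIES:
                accepted[code] = value
            else:
                # RFC 5492, Section 3: MUST ignore the capability; no Unsupported
                # Capability NOTIFICATION is sent and the session stays up.
                ignored.append(code)
    return accepted, ignored


# MP-BGP for IPv4 unicast, an unrecognized experimental code (250), Route Refresh.
caps = bytes([1, 4, 0, 1, 0, 1]) + bytes([250, 2, 0xAB, 0xCD]) + bytes([2, 0])
accepted, ignored = parse_capabilities(bytes([2, len(caps)]) + caps)
print(accepted, ignored)  # {1: b'\x00\x01\x00\x01', 2: b''} [250]
```

The point of the sketch is that the unknown capability only ends up in the
"ignored" list; nothing in that path resets the session, which is exactly
what the draft text should state with MUST/MUST NOT.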
196 - When a Cloud DC eBGP session supports a limited number of
197 routes from external entities, the on-premises DCs need to set
198 up default routes and filter as many routes as practical
199 replacing them with a default in the eBGP advertisement to
200 minimize the number of routes to be exchanged with the Cloud DC
201 eBGP peers.
Jim> I do not understand the above paragraph. Is a Cloud DC different from an
on-premises DC? Who is advertising a default route to whom? The scenario you
are trying to convey above is non-obvious, at least to me, so please clarify.
202 - When a Cloud GW receives inbound routes exceeding the maximum
203 routes threshold for a peer, the currently common practice is
204 generating out-of-band alerts (e.g., Syslog entries) via the
205 management system or terminating the BGP session (with cease
206 notification messages [RFC4486] being sent). Although out of
207 the scope of this document, more discussion is needed in the
208 IETF Inter-Domain Routing (IDR) Working Group for potential in-
209 band or autonomous notification directly to the peers when the
210 inbound routes exceed the maximum routes threshold.
Jim> More explanation is needed here, including a reference to Section 4 of
RFC 4486, which describes the procedure for terminating a peering with a
NOTIFICATION message carrying the Cease error code and an error subcode giving
the reason, e.g., “Maximum Number of Prefixes Reached”.
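To make the RFC 4486 reference concrete, here is a minimal Python sketch of
the NOTIFICATION a Cloud GW could send when it terminates a session because a
peer exceeded the configured prefix limit. The function names and the IPv4
unicast defaults are my own illustrative assumptions; the field layout follows
the RFC 4271 message header and Section 4 of RFC 4486 (Error Code Cease,
Error Subcode 1 "Maximum Number of Prefixes Reached", with the optional
AFI/SAFI/upper-bound Data field).

```python
import struct
from typing import Optional

BGP_MARKER = b"\xff" * 16   # RFC 4271: 16-octet marker, all ones
BGP_NOTIFICATION = 3        # RFC 4271: message type code for NOTIFICATION
ERR_CEASE = 6               # RFC 4271: Error Code "Cease"
SUBCODE_MAX_PREFIXES = 1    # RFC 4486: "Maximum Number of Prefixes Reached"


def max_prefix_notification(afi: int, safi: int, upper_bound: int) -> bytes:
    """Encode a Cease/Maximum-Number-of-Prefixes NOTIFICATION.

    The Data field carries AFI (2 octets), SAFI (1 octet), and the configured
    prefix upper bound (4 octets), as shown in RFC 4486, Section 4.
    """
    data = struct.pack("!HBI", afi, safi, upper_bound)
    body = struct.pack("!BB", ERR_CEASE, SUBCODE_MAX_PREFIXES) + data
    length = len(BGP_MARKER) + 3 + len(body)  # marker + length (2) + type (1)
    return BGP_MARKER + struct.pack("!HB", length, BGP_NOTIFICATION) + body


def check_prefix_limit(received: int, upper_bound: int,
                       afi: int = 1, safi: int = 1) -> Optional[bytes]:
    """Hypothetical policy hook: return the NOTIFICATION to send, or None.

    Defaults (AFI 1, SAFI 1) mean IPv4 unicast and are illustrative only.
    """
    if received > upper_bound:
        return max_prefix_notification(afi, safi, upper_bound)
    return None


msg = check_prefix_limit(received=1001, upper_bound=1000)
print(len(msg))  # 28
```

RFC 4486 says the Data field MAY be included, so a minimal implementation
could send only the error code and subcode; including AFI/SAFI and the limit
simply gives the peer's operator more to go on.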
222 Failures within a Cloud site, which can be a building, a floor, a
223 pod, or a server rack, include capacity degradation or complete out-
224 of-service failure. Here are some events that can trigger a site
225 failure: a) fiber cut for links connecting to the site or among pods
226 within the site; b) cooling failures; c) insufficient backup power
227 during a power failure; d) cyber threat attacks; e) too many changes
228 outside of the maintenance window; etc. A fiber-cut is not uncommon
229 in a Cloud site or between sites.
Jim> I would suggest stating above that the listed events are not an
exhaustive list but just some examples.
244 [RFC7432] specifies a mass withdrawal mechanism for EVPN to signal a
245 large number of routes being changed to remote PE nodes as quickly
246 as possible.
Jim> I am not sure that RFC 7432 is relevant here or why EVPN is even
mentioned. Is there a reason to mention this or should the text simply be
removed?
597 premesis CPEs to a Cloud DC via a private VPN requires the private
Jim> s/premesis/premises
691 necessary. Alternative encapsulations, like SRH (Segment Routing
Jim> Please provide a reference to RFC 8754 (SRH)
695 6. Requirements for Networks Connecting Cloud Data Centers
Jim> Why are there requirements in a problem statement document? Did the WG
discuss splitting these out into a separate document?
Thanks!
Jim
_______________________________________________
rtgwg mailing list -- [email protected]