Hi. The IESG approved the problem statement yesterday. There was one DISCUSS, but after some dialog it was changed to a comment.
Yay!!!! Melinda Shore also had comments as a result of her OPS directorate review; they are also covered below.

The IESG did have some comments, but because they are not "discusses", we only need to consider them and make changes as we see fit. Going through the comments, here are my proposed responses. If these are OK, I'll reissue the document and we'll be done. :-)

I've collected all the responses into this one message. The comments below are taken from the datatracker:
http://datatracker.ietf.org/doc/draft-ietf-nvo3-overlay-problem-statement/ballot/

Adrian Farrel writes:

> Thanks for this document which I believe is a major step towards scoping
> and documenting the real problems in this space. I have a number of
> fairly editorial concerns that I hope you can work through with your AD
> and document shepherd.
>
> ---
>
> In reading this document, I found it difficult to distinguish the
> requirements that arise from the provision of multiple virtual networks
> on a common infrastructure (traffic isolation, address space isolation,
> virtual network creation and configuration) from those that are specific
> to the NVO3 scope (massive scaling, multi-tenancy on individual physical
> servers, no constraints on physical location of hosted services).

I'm not sure this distinction is a distinction or matters. I.e., all of the things listed first (traffic isolation, etc.) are very much requirements and in scope for NVO3, or at least closely related.

Proposed resolution: no changes to text.

> [I-D.ietf-nvo3-framework] is used as a normative reference because it
> defines terminology used in this document.

Will update the document. But it does mean the document won't get published until the framework document is published...

> I would move the definition of "in-band virtual network" from section 2
> to section 5.3 (the only place the term is used) to avoid complicating
> the definitions with concepts that appear to only be applied to L2
> networks.
Proposal: delete the definition and update the text in 5.3. (At one point this term was used more than it is in the current document.)

Old:

   5.3.  802.1 VLANs

   VLANs are a well understood construct in the networking industry,
   providing an L2 service via an in-band L2 Virtual Network. A VLAN is
   an L2 bridging construct that provides the semantics of virtual
   networks mentioned above: a MAC address can be kept unique within a
   VLAN, but it is not necessarily unique across VLANs. Traffic scoped
   within a VLAN (including broadcast and multicast traffic) can be kept
   within the VLAN it originates from. Traffic forwarded from one VLAN
   to another typically involves router (L3) processing. The forwarding
   table look up operation may be keyed on {VLAN, MAC address} tuples.

New (only the first sentence has changed):

   VLANs are a well understood construct in the networking industry,
   providing an L2 service via a physical network in which tenant
   forwarding information is part of the physical network
   infrastructure. A VLAN is an L2 bridging construct ...

> Why is the example of an Overlay Virtual Network in section picked from
> the layer 2 space when this work is supposed to consider only layer 3
> overlays?

The solution will be over Layer 3. The example is one where overlays are used. No need to restrict that example to L3, and SPB is an obvious example to use here...

> OTOH, since this term is not used anywhere in the document, I suggest
> deleting it.

I propose to change the term to just "Overlay Network". That term is used throughout the document. (It turns out that the term "Overlay Virtual Network" is not used elsewhere in the document.) The two terms are in practice used interchangeably.

> I believe section 3.1 could be rewritten without the need to say "cloud"
> or "elastic services". This would be helpful because those marketing
> phrases do not add to the meaning.
> I think the final sentence of the paragraph captures the issues, but
> could be pulled out into a little more explanation of what happens and
> what problems it causes.

This also came up in Melinda's review. I will remove the term "cloud". I think "elastic" is OK.

Old:

   Cloud computing involves on-demand provisioning of resources for
   multi-tenant environments. A common example of cloud computing is the
   public cloud, where a cloud service provider offers elastic services
   to multiple customers over the same infrastructure. In current
   systems, it can be difficult to provision resources for individual
   tenants (e.g., QoS) in such a way that provisioned properties migrate
   automatically when services are dynamically moved around within the
   data center to optimize workloads.

New (the first sentence replaces the two previous sentences):

   Some service providers offer services to multiple customers whereby
   services are dynamic and the resources assigned to support them must
   be able to change quickly as demand changes. In current systems, it
   can be difficult to provision resources for individual tenants (e.g.,
   QoS) in such a way that provisioned properties migrate automatically
   when services are dynamically moved around within the data center to
   optimize workloads.

> Section 5.3 uses the terms C-VLAN, S-VLAN, and B-VLAN, but only C-VLAN
> has been defined.

Rather than define them, I added references to IEEE 802.1Q and 802.1aq (for I-SIDs).

> Section 10 seems to me to be missing the impact that one virtual network
> might be able to have on another (for example by stressing network
> resources to cause undesirable VM mobility, or by consuming shared
> resources to make b/w or CPU unavailable).
> This is a type of self-consuming DoS.

Old:

   In the control plane, the primary security concern is ensuring that
   unauthorized control information is not installed for use in the data
   plane.
   The prevention of the installation of improper control information,
   and other forms of denial of service are also concerns. Hereto, some
   environments may also be concerned about confidentiality of the
   control plane.

New:

   In the control plane, the primary security concern is ensuring that
   an unauthorized party does not compromise the control plane protocol
   in ways that improperly impact the data plane. Some environments may
   also be concerned about confidentiality of the control plane. More
   generally, denial of service concerns may also be a consideration.
   For example, a tenant on one virtual network could consume excessive
   network resources in a way that degrades services for other tenants
   on other virtual networks.

Benoit Claise writes:

> Comment (2013-06-27)
>
> Not much OPS feedback in this draft. I'm dying to see the "Operational
> Requirements submitted for IESG review" chartered item.
>
> Editorial:
> "Tenant Systems" should not be capitalized. Alternatively, you can
> define the term. Please expand ARMD. Explain/Expand: C-VID, B-VID,
> I-VID.

"Tenant Systems" is defined in the framework document (referenced). References for the other terms have been added.

And here is Melinda's feedback from OPS-DIR:

> I was asked to perform an OPS-DIR review of
> draft-ietf-nvo3-overlay-problem-statement.
>
> The document specifically targets multitenancy in large data
> center networks, describing problems arising from that
> scenario and how they may be addressed by overlay networks.
> That this document made it through working group last call
> at all should be seen as a major political accomplishment,
> given the level of rancor in the working group, and much
> respect is due to the chairs and the document authors for
> getting this done.

:-)

> The underlying assumption is that these virtual networks
> will provide traffic isolation.
>
> Minor issues:
>
> Section 3.1: "Cloud computing" - the document would benefit
> from eliminating that terminology and just describing the
> scenario ("Some service providers offer elastic services
> ... "). "Cloud" is imprecise and evocative of marketing
> jargon. We can talk about the need for dynamic provisioning
> more carefully, I think.

Per above, "cloud" will be removed.

> Section 3.2, second sentence: "A VM can be migrated from
> one server to another, [ ... ]." I'm afraid it's servers
> all the way down - may be clearer to say that VMs may be
> migrated between hypervisors.

I think leaving this as "server" is OK. There is no perfect term here. Most folks understand that when you are talking about VMs, they are running on servers or some other physical machine. Even worse, in industry the term "host" rather than "server" is used a lot too.

> An operational consideration for this section (3.2) is that
> there may be state associated with specific data flows to a
> VM that is not on the VM - that's resident on some sort of
> middlebox (firewall, application proxy, accelerator, cache,
> etc.). I tend to think that network state will, in
> practice, be topologically close to the VM, but care must be
> taken.

This comment actually plays into the proposed NSC work...

> Doesn't really matter but it appears that the section header
> for section 3.6 is marked up incorrectly (font and bolding).

Font? Bolding? In an ASCII ID? :-) This is presumably an issue with some tool somewhere (this section title is long and doesn't fit on one line).

> 3.7 is probably one of the clearest descriptions I've seen
> of this issue - well done.

Thanks!

> 10: I'm not sure the security considerations are quite right, or
> at least not the discussion of data plane security issues.
> What are the characteristics of an overlay network that
> differ from a physical network or VPN, and how do they
> impact design decisions for the overlay?
>
> Also, may be worth saying something about data leakage from
> interception of control plane traffic (what inferences can
> be made from changes in topology, etc.?).

I proposed some changes above in response to Adrian's comments, but I don't see what changes the above suggests. For one thing, there is an assumption that overlays are running (primarily) in a data center, where the network is not open and tenants don't have direct access to the underlay. If they did, presumably that would raise a bunch of bigger issues than their ability to see control traffic.

Jari Arkko writes:

> Thank you for writing this document. It is well written and easy to read,
> and documents the space well.
>
> I had one question when reading Section 4. I was wondering why MTU was not
> mentioned, MTU issues being one of the impacts of overlay designs.

I responded:

Well, in the data center, MTU issues are not that big of a problem in practice. Most hardware does jumbo frames (even if officially we don't like to say that, given IEEE's position on jumbo frames). And trying to deal with path MTU discovery and all that just doesn't seem worth it in the overall scheme of things. If you look at some of the solutions being built in this space (NVGRE and VXLAN), they also take a similar approach wrt MTU, i.e., avoid fragmentation.

Also, one thing this document has had to straddle is what is a problem in data centers generally, and what is a problem that results if overlays are the solution. For the latter, that's arguably really for other documents to get into. If one starts documenting all the potential issues introduced when overlays are used, there is a long list of issues, and we've moved beyond a problem statement...

To which Jari responded "OK, thanks".

Joel Jaeggli writes:

> Comment (2013-06-27)
>
> The document is weirdly though non-specifically ipv4-centric. I don't think
> there are any particular fixes to be applied.
> I would observe however that
> address reuse, while common in parallel rfc1918 addressing planes, would
> not I imagine be very common in the numbering plans of ipv6 enabled DCs,
> that the longest possible route is not a /32, and that a significant
> scaling consideration with L3 --> L2 mappings is the duplication between
> the ARP cache and the NDP cache.

Interesting that I somehow didn't call out IPv6 support explicitly. :-) I guess when we say IP, we really mean both these days. I think there is a general assumption that solutions will have to support both IPv4 and IPv6 on the underlay and overlay. I take that as a given.

Stephen Farrell writes:

> The nodes of a virtual network, once running, can look
> after securing their own traffic. That might lead one to
> say that nvo3 traffic isolation doesn't need to consider
> confidentiality. However, if the nodes in a virtual network
> are VMs and if VMs can be moved, then any secrets required
> for the virtual network to secure its traffic will be
> exposed to the underlay during the move.
>
> I'm not clear if this wg will try to address that issue or
> not. Section 10 does say that some environments might be
> concerned about confidentiality but is vague about whether
> or not the wg will work on the topic.
> Such a confidentiality service isn't a panacea of course,
> the underlay components providing the confidentiality
> service could leak the relevant keys, but it could still be
> useful nonetheless. (BTW, I've no idea if it'd make sense
> to have such a service that's separated from whatever
> technology is used to move the VM or not.)
>
> So I was wondering: will the wg actually define such a
> confidentiality service or not? The response is that yes,
> this'll be considered for the requirements documents,
> which is fine.
> Note that I'm not trying to insist on a "yes" answer, even
> though I think that'd be good. Even a "maybe, and that'll
> be answered in the requirements specs before we re-charter"
> would be ok.
> But regardless of the answer, I think it'd be
> good to at least note this issue in the security
> considerations section.

We had some back-and-forth with the IESG on this. This is something that gets decided when we look at solutions, i.e., requirements for solutions. Too early to say that here.

> - 4.1, bullets: I found the use of ingress/egress
> non-intuitive here. You mean ingress to the underlay and
> egress from the underlay, right? It'd be good to explicitly
> say that, though I figured it out eventually (or not, if
> I'm wrong above :-)

I changed "egress NVE" to "egress NVE for the tunneled packet" and added "tunnel" to an earlier sentence.

New:

   The idea behind an overlay is quite straightforward. Each virtual
   network instance is implemented as an overlay. The original packet is
   encapsulated by the first-hop network device, called a Network
   Virtualization Edge (NVE), and tunneled to a remote NVE. The
   encapsulation identifies the destination of the device that will
   perform the decapsulation (i.e., the egress NVE for the tunneled
   packet) before delivering the original packet to the endpoint. The
   rest of the network forwards the packet based on the encapsulation
   header and can be oblivious to the payload that is carried inside.

Ted Lemon writes:

> Comment (2013-06-27)
>
> Minor nit:
>
>    While an overlay-based approach may address some of the
>    "pain points" that were raised in ARMD (e.g., better support for
>    multi-tenancy). Analysis will be needed to understand the scaling
>    tradeoffs of an overlay based approach compared with existing
>    approaches.
>
> I think you want a comma between these two chunks; otherwise it doesn't
> really parse.
>
> In 5.7, trill-fine-labeling is in the RFC editor queue, so I think that
> should be described as completed work, rather than something TRILL is
> investigating.

Good catch!

New:

   TRILL is a network protocol that provides an Ethernet L2 service to
   end systems and is designed to operate over any L2 link type.
   TRILL establishes forwarding paths using IS-IS routing and
   encapsulates traffic within its own TRILL header. TRILL, as
   originally defined, supports only the standard (and limited) 12-bit
   C-VID identifier. Work to extend TRILL to support more than 4094
   VLANs has recently completed and is defined in
   [I-D.ietf-trill-fine-labeling].

> In general this draft is very clearly written, and does a good job of
> analyzing the problem space. Thanks for doing such a good job on it!

Thanks!

Thomas

_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
