Inline at [David-2]. Thanks, --David
From: Linda Dunbar [mailto:[email protected]]
Sent: Thursday, September 18, 2014 5:09 PM
To: Black, David; [email protected]
Cc: Yakov Rekhter
Subject: RE: draft-ietf-nvo3-vm-mobility-issues - Technical WG LC comments

David,

Thank you very much for the detailed comments. Please see the replies inserted below:

-----Original Message-----
From: nvo3 [mailto:[email protected]] On Behalf Of Black, David

-- Section 2.1 Terminology

   In this document the term "Top of Rack Switch (ToR)" is used to
   refer to a switch in a data center that is connected to the servers
   that host VMs.

That's a rather poor choice of terminology, as when there are two or more layers of switching in a rack, a switch at the top of the rack is not actually a "Top of Rack Switch (ToR)" as defined by this draft. Something based on "First Hop" would be better than "Top of Rack" - and see below on whether this draft is assuming external NVEs.

[Linda] Like it or not, the term "Top of Rack Switch (ToR)" is commonly used in data centers, referring to the switches connected to servers. It is true that some servers have blade switches and some servers have virtual switches. The NVE could be on a ToR, a virtual switch, or a blade switch. Those switches and the like make up the "L2 physical domain" attached to an NVE.

[David-2] There’s an important but subtle distinction here. I agree that servers are often connected directly to ToR switches, but there are exceptions. A definition has to be broadly applicable - it’s not sufficient to cover only the common case.

[David-2] Defining the ToR switch as above results in the ToR switch, as defined by this draft, not being the switch at the top of the rack in data centers where there is switching in the rack (blade servers are an example) - in such data centers, the servers are not directly connected to the ToR switches. That’s the problem - the definition needs to be comprehensively applicable, and is flawed when there is switching in the rack.
A different term should be used to designate the switch that is connected to the servers that host VMs, so that it covers all the cases.

Nit: In the figure: "DBCR" -> "DCBR" (twice).

[Linda] Thanks.

I'm concerned about the implied requirement for a DCBR as the means of external data center connectivity, as it implies an IP (L3) forwarding hop at the data center boundary, and there are multi-data-center structures in which that does not happen.

[Linda] The figure shows the most common network design of most DCs today. I am sure you can find some DCs that are not designed this way.

[David-2] There’s only one use of the “DCBR” acronym outside of Section 2.1 - Section 3.4 makes this parenthetical comment: “(the router of the VM's L2-based CUG may be either DCBR or ToR itself).” I suggest deleting that parenthetical comment from Section 3.4 rather than trying to figure out how to generalize it to cover all the cases.

I don't see any mention of software switches in hypervisors in this section. I think this draft is assuming that NVEs are external, i.e., not located in the server - e.g., otherwise Section 3.3 doesn't make much sense. If so, that crucial assumption needs to be stated.

[Linda] We can add a statement saying that NVEs can be on a server as a virtual switch, or on a blade switch.

[David-2] That misses the point. Section 3.3 assumes that the NVE is not in a virtual switch in the server, and I think 3.1 does as well, although 3.1 is somewhat unclear to me. Adding that statement will not suffice.

The term Closed User Group (CUG) appears to be close to, if not the same as, "virtual network" - use of the latter term would be significantly clearer.

[Linda] Yes, CUG → VN.

-- Section 3.1 Usage of VLAN-IDs

This appears to be a discussion of background that doesn't state a problem, hence does not belong in Section 3 (Problem Statement). Perhaps this should be Section 2.2?

[Linda] This section describes the issues with the traditional way of using VLANs to represent L2 VNs.
   To support tens of thousands of virtual networks, the local VID
   associated with client payload under each NVE has to be locally
   significant. Therefore, the same L2-based VN MAY have different
   VLAN-IDs under different NVEs.

[David-2] This assumes that the NVE is not in the server - that assumption needs to be stated, see above.

   Thus when a given VM moves from one non-trivial L2 physical domain
   to another, the VLAN-ID of the traffic from/to the VM in the former
   may be different than in the latter, and thus cannot be assumed to
   stay the same. For data frames traversing the underlay network, if
   the ingress NVE simply encapsulates an outer header onto data frames
   received from VMs and forwards the encapsulated data frames to the
   egress NVE via the underlay network, the egress NVE can't simply
   decapsulate the outer header and send the decapsulated data frames
   to attached VMs, as done by TRILL.

[David-2] If the NVE is in the virtual switch in the server, the NVE absolutely, positively can do exactly that, hence the assumption about NVE location needs to be clearly stated.

Nit:

   This document assumes that within a given non-trivial L2 physical
   domain, traffic from/to VMs that are in that domain and belong to
   the same L2-based CUG MUST have the same VLAN-ID.

That's true on a per-link basis, but not L2-physical-domain-wide, although using constant VLAN-IDs domain-wide is a common practice because it's simple. Perhaps the assumption should be stated on a per-link basis, followed by a "common practice" observation.

[Linda] The section is to emphasize the problem of using a common VLAN-ID for the VNs. How about we use the text above instead?

[David-2] Maybe - I want to see the entire proposed rewrite of Section 3.1 along with any text on NVE location assumptions in order to comment further.
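To make the locally-significant-VID point above concrete, here is an illustrative sketch (not from the draft; the names VNI_BLUE, nve1_vid_to_vni, etc. are hypothetical) of why an egress NVE with its own VLAN-ID table cannot simply decapsulate and forward: the same VN is known by different VLAN-IDs under different NVEs, so the egress NVE has to re-tag from its own mapping.

```python
VNI_BLUE = 5001  # one virtual network, identified in the overlay by its VNI

# Each NVE keeps its own VLAN-ID <-> VNI mapping; a VID is significant
# only on the links below that NVE.
nve1_vid_to_vni = {100: VNI_BLUE}
nve2_vid_to_vni = {200: VNI_BLUE}  # same VN, different local VLAN-ID

def encapsulate(nve_map, frame_vid):
    """Ingress NVE: strip the local VLAN tag, carry the VNI in the outer header."""
    return {"vni": nve_map[frame_vid]}

def decapsulate(nve_map, packet):
    """Egress NVE: re-tag with *its own* local VLAN-ID for that VNI."""
    vni_to_vid = {vni: vid for vid, vni in nve_map.items()}
    return vni_to_vid[packet["vni"]]

pkt = encapsulate(nve1_vid_to_vni, 100)          # enters as VLAN 100 under NVE1
assert decapsulate(nve2_vid_to_vni, pkt) == 200  # exits as VLAN 200 under NVE2
```

A TRILL-style blind decapsulation would emit the frame still carrying VID 100, which is meaningless (or worse, belongs to a different VN) in the egress NVE's L2 physical domain.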
I'm missing the point of the 3rd paragraph, specifically the "contrast" in this text:

   In other words, the VLAN-IDs used by a tagged VM network interface
   are part of the VM's state and cannot be changed when the VM moves
   from one L2 physical domain to another, even though it is possible
   for an entity, such as hypervisor virtual switch, to change the
   VLAN-ID from the value used by NVE to the value expected by the VM
   (in contrast, a VLAN tag assigned by a hypervisor for use with an
   untagged VM network interface can change).

Both hardware and software switches can add, remove and map VLAN-IDs, so the VLAN-ID used by the VM can be limited to the link between the VM and the first switch.

[Linda] Correct. How about changing the text to:

   In other words, the VLAN-IDs used by a tagged VM network interface
   are part of the VM's state and may not be changed when the VM moves
   from one L2 physical domain to another. Therefore, it is necessary
   for an entity, such as hypervisor virtual switch, to change the
   VLAN-ID from the value used by NVE to the value expected by the VM
   (in contrast, a VLAN tag assigned by a hypervisor for use with an
   untagged VM network interface can change).

[David-2] “It is necessary” is too simple. I would add “when the NVE does not use the VLAN-ID expected by the VM” before “it is necessary”.

I don't understand this sentence:

   If the L2 physical domain is extended to include VM tagged
   interfaces, the hypervisor virtual switch, and the DC bridged
   network, then special consideration is needed in assignment of VLAN
   tags for the VMs, the L2 physical domain and other domains into
   which the VM may move.

What is meant by "special consideration"?

[Linda] The “special consideration” should be “having an entity, such as the hypervisor virtual switch, change the VLAN-ID from the value used by the NVE to the value expected by the VM”.

[David-2] I’d suggest merging this sentence with the new sentence proposed above.
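The translation being discussed here (a hypervisor virtual switch rewriting between the NVE-side VLAN-ID and the one a tagged VM interface expects, only when they differ) can be sketched as follows. This is an illustrative simplification; the parameter names vm_vid and nve_vid are assumptions, not draft terminology.

```python
def to_vm(frame_vid, nve_vid, vm_vid):
    """NVE -> VM direction: rewrite the tag only when the NVE-side VID
    differs from the VID the tagged VM interface expects."""
    if frame_vid == nve_vid and nve_vid != vm_vid:
        return vm_vid
    return frame_vid

def to_nve(frame_vid, vm_vid, nve_vid):
    """VM -> NVE direction: the inverse rewrite."""
    if frame_vid == vm_vid and vm_vid != nve_vid:
        return nve_vid
    return frame_vid

# After a move, the new L2 physical domain uses VID 200 under its NVE,
# while the VM still expects its original VID 100 (part of the VM's state).
assert to_vm(200, nve_vid=200, vm_vid=100) == 100
assert to_nve(100, vm_vid=100, nve_vid=200) == 200

# When the NVE happens to use the VID the VM expects, no rewrite occurs.
assert to_vm(100, nve_vid=100, vm_vid=100) == 100
```

This matches David's qualification: the rewrite is needed only "when the NVE does not use the VLAN-ID expected by the VM".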
Nit:

   This document assumes that within a given non-trivial L2 physical
   domain, traffic from/to VMs that are in that domain and belong to
   different L2-based CUGs MUST have different VLAN-IDs.

Again, per-link + "common practice", see above.

-- Section 3.2 Maintaining Connectivity in the Presence of VM Mobility

Again, I don't see a problem stated here. There's also significant overlap between this section and both:

- Section 3.2 of the problem statement draft, and
- Section 3.3 of the framework draft.

[Linda] None of the drafts above mentions that a VM's ARP cache gets flushed once the VM moves to another server.

[David-2] Only for cold migration, as the draft text suggests. The framework draft does mention ARP effects for “hot” mobility, although this refers primarily to use of ARP or RARP:

   Solutions to maintain connectivity while a VM is moved are necessary
   in the case of "hot" mobility. This implies that connectivity among
   VMs is preserved. For instance, for L2 VNs, ARP caches are updated
   accordingly.

What is being added here that isn't already covered in those sections of those drafts?

-- Section 3.3 Layer 2 extension

This text is hard to parse:

   Consider a scenario where a VM that is a member of a given L2-based
   CUG moves from one server to another, and these two servers are in
   different L2 physical domains, where these domains may be located in
   the same or different data centers. In order to enable communication
   between this VM and other VMs of that L2-based CUG, the new L2
   physical domain must become interconnected with the other L2
   physical domain(s) that presently contain the rest of the VMs of
   that CUG, and the interconnect must not violate the L2-based CUG
   requirement to preserve source and destination MAC addresses in the
   Ethernet header of the packets exchanged between this VM and other
   members of that CUG.

I'm missing the point here - two NVEs and an overlay-based encapsulation should solve this.
Also, "must become" and "presently" are wrong, as the interconnect may exist prior to the move. And I think this text assumes an external NVE (i.e., not in the server), see above.

[Linda] This paragraph is to emphasize that VM mobility causes changes to the NVEs that interconnect the VN.

[David-2] Really?? I don’t see the term NVE anywhere in that paragraph. If that was the point, the paragraph needs a complete rewrite. I’d like to see that rewrite before commenting further.

There seems to be significant overlap with Section 3.4 of the problem statement draft, and I'm not clear on what this draft is adding.

-- Section 3.4 Optimal IP Routing

This has massive overlap with Section 3.7 of the problem statement draft. I'm not sure what the text in this draft adds beyond what's already in the problem statement.

-- Section 3.5 Preserving Policies

This appears to be covered (albeit in less detail) in Section 3.3 of the framework draft.

[Linda] Those policies refer to the zone policies, FW policies, etc., that the framework draft hasn't covered.

[David-2] OK, there are policies beyond those that “define connectivity among VMs” that need to be preserved.

Thanks, --David

----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA 01748
+1 (508) 293-7953  FAX: +1 (508) 293-7786
[email protected]<mailto:[email protected]>  Mobile: +1 (978) 394-7754
----------------------------------------------------

_______________________________________________
nvo3 mailing list
[email protected]<mailto:[email protected]>
https://www.ietf.org/mailman/listinfo/nvo3
