Authors (Kireeti, Yakov and Thomas),

This is a good draft - it provides a solid foundation for focusing
discussion on what the server-to-NVE (attach/detach) protocol needs to
do.  I like a lot of the contents; I have a few high-level comments and
some more detailed feedback.

(1) This draft starts out dealing with the attach/detach (server-to-NVE)
protocol and then includes some material on the control protocol for
distributing and managing mapping information on the NVEs.  I suggest
focusing the draft on the attach/detach protocol, removing control
protocol discussion (e.g., Section 3), and minimizing assumptions about
the control protocol (see detailed comments for where I think assumptions
could be minimized).  The result should be more general and more useful.

(2) Section 2.2.3 on detach is trying to cover at least two use cases,
VM live migration and VM removal (e.g., power-off), that probably ought
to be separated.  The current text doesn't really get the live migration
case right, D.4 comes before D.3 for the power-off case, and I think
things get more complex when the live migration detach functionality is
corrected.

(3) Section 2.2.4 appears to assume a specific order of events between
the two servers involved in VM migration.  As those servers are operating
concurrently, that's not a robust assumption, and the NVE functionality
should be specified to not depend on the order of events.

--- Detailed comments by section ---

A pre-disassociate operation is defined in section 2.2.1 but not used 
in the rest of the draft.  Is it actually needed?

-- Section 2.2.2

   A.1:  Validate the authentication (if present).  If not, inform the
         provisioning system, log the error, and stop processing the
         associate message.

This step should also include an optional authorization check, as network
policy may limit which NVEs are allowed to participate in which VNs.

   A.3:  If the VID in the associate message is non-zero, look up <VNID,
         P>.  If the result is zero, or equal to VID, all's well.
         Otherwise, respond to S with an error, and stop processing the
         associate message.

Why is a zero VID lookup result ok for a non-zero VID in the associate
message?  Should the NVE copy the VID from the associate message to the
<VNID,P> entry before responding?
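
To make the suggested A.3 behavior concrete, here's a rough Python
sketch (all names - vid_table, UNASSIGNED, etc. - are invented for
illustration, not from the draft) with the "copy the VID into the
<VNID,P> entry" step made explicit:

```python
UNASSIGNED = 0

vid_table = {}  # maps (vnid, port) -> vid

def process_associate_vid(vnid, port, msg_vid):
    """A.3 sketch: return True if the associate may proceed,
    False on a VID conflict (respond to S with an error)."""
    if msg_vid == UNASSIGNED:
        return True  # zero VID in the message: NVE assigns one later
    current = vid_table.get((vnid, port), UNASSIGNED)
    if current == UNASSIGNED:
        # No VID assigned yet: record the one the server asked for,
        # so a later lookup returns it rather than zero.
        vid_table[(vnid, port)] = msg_vid
        return True
    if current == msg_vid:
        return True  # already consistent; all's well
    return False     # conflict: stop processing the associate message
```

Without the copy-back, a second associate for the same <VNID,P> would
again see a zero lookup result and could silently accept a different VID.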

   A.5:  Communicate with each rNVE device to advertise the VM's
         addresses, and also to get the addresses of other VMs in the
         DCVPN.  Populate the table with the VM's addresses and
         addresses learned from each rNVE.

This assumes that the control protocol actively propagates all address
info, and that no other addresses for the VN are already present in the
NVE.  Neither is a good general assumption, IMHO; in particular, lazy
evaluation is possible (e.g., load address mappings on demand to reduce
the invalidation traffic caused by each mapping change).  I'd suggest
rephrasing to something like:

   A.5:  Use the overlay control protocol to inform the network of the
         VM's addresses and the VM's association with this NVE. 

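To illustrate why that wording is control-protocol neutral: the NVE only
informs the network of the mapping, and whether rNVEs receive it by push
or fetch it on demand is a protocol choice.  A pull-based sketch (all
names hypothetical) where advertisements land in a directory that rNVEs
query lazily:

```python
class MappingDirectory:
    """Pull-based control-protocol example: the lNVE advertises into a
    mapping directory; rNVEs resolve addresses on demand, so no push
    to every peer is required at associate time."""

    def __init__(self):
        self.entries = {}  # (vnid, address) -> advertising NVE

    def advertise(self, vnid, addresses, nve_id):
        """A.5: inform the network that these addresses sit behind nve_id."""
        for addr in addresses:
            self.entries[(vnid, addr)] = nve_id

    def lookup(self, vnid, addr):
        """On-demand resolution by an rNVE (lazy evaluation)."""
        return self.entries.get((vnid, addr))
```

A push-based protocol would implement the same advertise() contract by
flooding to rNVEs instead; the server-to-NVE steps need not care which.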
-- Section 2.2.3

   D.1:  Validate the authentication (if present).  If not, inform the
         provisioning system, log the error, and stop processing the
         associate message.

Like A.1, this should include an optional authorization check, as some
<VNID,P> -> VID mappings may be statically configured and hence not
permit removal.

   D.2:  If the hold time is non-zero, point the VM's addresses in the
         VNID table to the new location of the VM, if known, or to
         "discard", and start a timer for the period of the hold time.
         If hold time is zero, immediately perform step D.4, then go to
         D.3.

This is where the power-off and migration cases start to interact -
the hold time would be zero for power-off and non-zero for migration.
For migration, this change potentially races with an update to the VM's
addresses received via the control protocol, so the VM's addresses may
already point somewhere else if the control protocol's update arrived
before the dissociate (in which case nothing should be done to those
addresses).
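
A race-aware D.2 can be sketched as follows (table layout and the
"local"/"discard" markers are hypothetical): redirect a VM address only
if the VNID table still points at the local attachment, and leave the
entry alone if a control-protocol update already moved it:

```python
LOCAL = "local"      # address still points at the local attachment
DISCARD = "discard"  # drop traffic during the hold time

def dissociate_redirect(vnid_table, vm_addrs, new_location=None):
    """D.2 sketch: repoint the VM's addresses unless the control
    protocol has already updated them to the VM's new NVE."""
    for addr in vm_addrs:
        if vnid_table.get(addr) != LOCAL:
            continue  # control protocol won the race; nothing to do
        vnid_table[addr] = new_location if new_location else DISCARD
```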

   D.3:  Set the VID for <VNID, P> as unassigned.  Respond to S saying
         that the operation was successful.

If there are multiple VMs using the VNID on that port, this
"pulls the rug" out from under the others by disabling their forwarding.
This <VNID,P> -> VID mapping needs a reference count of some form, and
corresponding changes would be needed to A.2 and A.3.  Not using a
reference count may be ok under the assumption that the NVE does not
share ports among VMs (or VSIs/vNICs), but that may not be a good
assumption for an external NVE (e.g., in a ToR switch).
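
A minimal sketch of the reference-counted mapping (all names
illustrative): A.2/A.3 would increment on associate, and D.3 would
decrement, only marking the VID unassigned when the last VM on the port
detaches:

```python
class VidMapping:
    """Reference-counted <VNID,P> -> VID mapping."""

    def __init__(self):
        self.entries = {}  # (vnid, port) -> [vid, refcount]

    def associate(self, vnid, port, vid):
        entry = self.entries.setdefault((vnid, port), [vid, 0])
        if entry[0] != vid:
            raise ValueError("VID conflict")  # A.3 error path
        entry[1] += 1

    def dissociate(self, vnid, port):
        entry = self.entries[(vnid, port)]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[(vnid, port)]  # D.3: VID now unassigned
            return True   # forwarding for <VNID,P> may be torn down
        return False      # other VMs still use this VID; keep forwarding
```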

   D.4:  When the hold timer expires, delete the VM's addresses from the
         VNID table.  Delete any VM-specific network policies associated
         with any of the VM addresses.  If the VNID table is empty after
         deleting the VM's addresses, optionally delete the table and
         any network policies for the VNID.

Well, that's the right thing to do in the power-off case, but not
when the VM has moved and there are other VMs on this NVE (possibly even
the same port) that still need to communicate with the moved VM.  Also,
the power-off case needs to include (at least optionally) informing the
control protocol of the withdrawal of the VM's addresses.

As noted in (2) above, I think it would be clearer if there were separate
versions of 2.2.3 for the migration departure and power-down use cases.
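
Roughly, the split would look like this (the control-protocol object and
its withdraw() method are hypothetical names, not from the draft):

```python
def detach_power_off(vnid_table, vm_addrs, control, vnid):
    """Power-off departure: delete state and (at least optionally)
    withdraw the VM's addresses from the network."""
    for addr in vm_addrs:
        vnid_table.pop(addr, None)     # D.4-style deletion
    control.withdraw(vnid, vm_addrs)   # tell the network the VM is gone

def detach_migration(vnid_table, vm_addrs, new_nve):
    """Migration departure: keep reachability for local VMs by
    repointing the moved VM's addresses rather than deleting them."""
    for addr in vm_addrs:
        vnid_table[addr] = new_nve     # local VMs can still reach it
```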

-- Section 2.2.4

   M.3:  S then gets a request to terminate the VM on S.

   M.4:  Finally, S' gets a request to start up the VM on S'.

Not exactly ;-).

Terminating the VM on S (and destroying its state) before confirming
its startup on S' risks losing the VM entirely if something goes wrong
on S'.  This level of detail isn't necessary - from the point of view
of the network:
- Startup on S' generates an associate request to the NVE for S'.
- The dissociate request from S to its NVE may occur before or after
  that S' associate request.
- The dissociate request from S to its NVE may occur before or after
  control protocol propagation of the results of the S' associate
  request to the NVE for S.
The server-to-NVE functionality should be specified to operate properly
independent of the order of these events.
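
Order independence is achievable with simple handler rules - e.g., treat
a control-protocol update as authoritative and make the dissociate a
no-op if the addresses have already moved.  A sketch (table contents and
the "local" marker are hypothetical) showing both orders converge:

```python
def on_control_update(table, addr, new_nve):
    """Control protocol says the VM is now behind new_nve (authoritative)."""
    table[addr] = new_nve

def on_dissociate(table, addr):
    """Dissociate from S: only clear if the address hasn't already moved."""
    if table.get(addr) == "local":
        table[addr] = "discard"

# Order 1: dissociate from S arrives before the control-protocol update.
t1 = {"vm": "local"}
on_dissociate(t1, "vm")
on_control_update(t1, "vm", "nve-S2")

# Order 2: control-protocol update arrives first.
t2 = {"vm": "local"}
on_control_update(t2, "vm", "nve-S2")
on_dissociate(t2, "vm")

# Either way, the NVE for S ends up pointing the VM at its new location.
```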

   PA.5:  Communicate with each rNVE device to advertise the VM's
      addresses but as non-preferred destinations(*).  Also get the
      addresses of other VMs in the DCVPN.  Populate the table with the
      VM's addresses and addresses learned from each rNVE.

That assumes aggressive push of the new address information by the
control protocol directly to the rNVEs - while a control protocol
may choose to do that, it's not strictly necessary and the interaction
may not be directly between the lNVE and the rNVEs.  Generalizing in
a fashion similar to A.5, I'd suggest something like:

   PA.5:  The overlay control protocol may be used to inform the
      network of the forthcoming change to the VM's addresses
      that will occur when the VM is associated with this NVE.

If this is done, withdrawal of the tentative address changes needs to
be discussed, as VM migrations can abort for a variety of reasons
(e.g., S' may crash during the copy).  This PA.5 step can be skipped
for a control protocol that only does on-demand provisioning of the
address mapping information.
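
The abort handling amounts to tracking tentative advertisements so they
can be withdrawn if the migration never completes.  A sketch (the
control object and its advertise_tentative/withdraw_tentative methods
are invented names):

```python
class TentativeAdverts:
    """Track PA.5 tentative (non-preferred) advertisements so an
    aborted migration can clean up after itself."""

    def __init__(self, control):
        self.control = control
        self.pending = {}  # vm_id -> (vnid, addrs)

    def pre_associate(self, vm_id, vnid, addrs):
        self.control.advertise_tentative(vnid, addrs)
        self.pending[vm_id] = (vnid, addrs)

    def commit(self, vm_id):
        self.pending.pop(vm_id, None)  # associate completed normally

    def abort(self, vm_id):
        entry = self.pending.pop(vm_id, None)
        if entry:
            # e.g., S' crashed during the copy: withdraw the tentative
            # advertisement so stale state doesn't linger in the network.
            self.control.withdraw_tentative(*entry)
```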

-- Section 3

This appears to be entirely about the control protocol and (IMHO)
doesn't fit well with the rest of the draft.

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
[email protected]        Mobile: +1 (978) 394-7754
----------------------------------------------------

_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
