On Mon, Jul 23, 2012 at 12:22 PM, Ivan Pepelnjak <[email protected]> wrote:
> Just because millions of applications misuse a simplistic protocol in a way it was never designed to handle doesn't make it a good idea. Not to mention the total lack of security.
>
> @Xiaohu: how would you distinguish a gratuitous ARP sent from the hypervisor to indicate a VM move from a gratuitous ARP sent by a VM with a misconfigured IP address, or a malicious gratuitous ARP sent by an intruder (physical or virtual)? Unless you can totally control the VM attachment point (= hypervisor switch, unless you're using something like 802.1BR), you cannot trust ARP ... but then if you do control the hypervisor switch, you don't need ARP.
>
> Ivan

Completely agree. ARP is the wrong protocol for any form of trusted signaling. Additionally, IPv6-only networks would present their own set of challenges if we relied on the IPv4 Address Resolution Protocol.

Truman

> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Linda Dunbar
> *Sent:* Monday, July 23, 2012 5:59 PM
> *To:* Xuxiaohu; [email protected]; [email protected]
> *Cc:* [email protected]
> *Subject:* Re: [nvo3] Comments on draft-kompella-nvo3-server2nve
>
> Millions of already-deployed applications use ARP to signal their presence. The widely deployed vMotion makes VMs in a new location send ARP (RARP) to inform the network of their new location. It doesn't hurt to utilize the available messages from applications.
>
> My two cents.
> Linda Dunbar
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Xuxiaohu
> *Sent:* Wednesday, July 18, 2012 9:48 PM
> *To:* [email protected]; [email protected]
> *Cc:* [email protected]
> *Subject:* Re: [nvo3] Comments on draft-kompella-nvo3-server2nve
>
> Does that mean that ARP could also be considered as an option for signaling the VM attachment/detachment event? For example, a gratuitous ARP packet can be interpreted as an attachment event by the NVE that receives such a packet via the NVE-TES interface. Meanwhile, for those L2VPN (e.g., VPLS) or L3VPN overlay approaches which only allow one next hop to be available for a given MAC route or host route in the forwarding table, a gratuitous ARP packet received from a remote NVE could be interpreted as a detachment event by the NVE to which the ARP-sending VM was previously attached. Moreover, if a gratuitous ARP packet triggers the NVE that received it via the NVE-TES interface to generate a MAC route or host route for the ARP-sending VM, then the NVE to which that VM was previously attached could, upon receiving that route, also infer a detachment event for that VM.
>
> Best regards,
>
> Xiaohu
>
> <skipped>
>
> I have a related consideration based on thinking about this further. The network SHOULD NOT rely on dissociate messages always being sent - a server crash at the wrong point during a VM migration may cause a dissociate to be missed (e.g., the VM made it to S', but S crashed before sending the dissociate). More importantly, not relying on the dissociate messages (in particular, not having the inter-NVE control protocol rely on them) helps if one wants to mix hypervisors that support the attach/detach protocol with (existing) ones that don't.
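To make Xiaohu's inference concrete, here is a minimal, purely illustrative sketch (no NVE implements exactly this, and the field layout follows the standard ARP wire format, not anything defined by the draft) of how an NVE might classify a frame arriving on the NVE-TES interface as a gratuitous ARP and treat it as an implied attach. Note that it also illustrates Ivan's objection: nothing in the packet authenticates the sender.

```python
import struct

def parse_arp(frame: bytes):
    """Parse an Ethernet II frame carrying ARP; return a dict or None.

    The 'gratuitous' test (sender IP == target IP) follows common
    practice; whether that should imply an associate event is exactly
    the point under debate in this thread.
    """
    if len(frame) < 42:                      # 14B Ethernet + 28B ARP
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0806:                  # not an ARP frame
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    sender_mac = frame[22:28]
    sender_ip = frame[28:32]
    target_ip = frame[38:42]
    return {
        "op": oper,                          # 1 = request, 2 = reply
        "sender_mac": sender_mac.hex(":"),
        "sender_ip": ".".join(str(b) for b in sender_ip),
        "gratuitous": sender_ip == target_ip,
    }

def infer_event(frame: bytes):
    """Hypothetical NVE hook: map a gratuitous ARP to an implied
    'associate' for the sending VM's addresses. A real NVE would
    also need the authorization checks discussed in the thread."""
    info = parse_arp(frame)
    if info and info["gratuitous"]:
        return ("associate", info["sender_mac"], info["sender_ip"])
    return None
```

Nothing here distinguishes a hypervisor-generated announcement from a misconfigured or malicious VM, which is Ivan's point: the inference only works if the attachment point is already trusted.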
> For existing hypervisors, under suitable restrictions and assuming some advance configuration, "associate" can be inferred from a gratuitous ARP or RARP, but nothing is sent for dissociate. The inference of "associate" won't be possible if things have not been set up to enable the gratuitous ARP or RARP.
>
> Thanks,
> --David
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Kireeti Kompella
> *Sent:* Saturday, July 14, 2012 8:16 PM
> *To:* Black, David
> *Cc:* [email protected]
> *Subject:* Re: [nvo3] Comments on draft-kompella-nvo3-server2nve
>
> Hi David,
>
> Thanks for your detailed comments! More inline.
>
> On Fri, Jul 13, 2012 at 1:01 PM, <[email protected]> wrote:
>
> Authors (Kireeti, Yakov and Thomas),
>
> This is a good draft - it looks like a good foundation to focus discussion around what the server-to-NVE (attach/detach) protocol needs to do. I like a lot of the contents - I have a few high-level comments and some more detailed feedback.
>
> Thanks!
>
> (1) This draft starts out dealing with the attach/detach (server-to-NVE) protocol and then includes some material on the control protocol for distributing and managing mapping information on the NVEs. I suggest focusing the draft on the attach/detach protocol, removing control protocol discussion (e.g., Section 3), and minimizing assumptions about the control protocol (see detailed comments for where I think assumptions could be minimized). The result should be more general and more useful.
>
> About the control plane: it really concerns me that the control plane discussion has not happened so far (not really). ARP doesn't scale; neither does flooding. The goal here is to signal networking parameters: from server (vswitch) to local NVE to remote NVEs to remote servers.
> Fine, call the local-NVE-to-remote-NVEs part the "control plane" -- but that's a critical part of the picture.
>
> What I take from your suggestion is to move the lNVE-to-rNVE part to a different draft; I buy that, especially if there are other mechanisms for doing this that can plug in to this server2nve signaling, so that one can mix and match server2nve signaling and lNVE2rNVE signaling. Does that seem reasonable?
>
> (2) Section 2.2.3 on detach is trying to cover at least a couple of use cases, VM live migration and VM removal (e.g., power-off), that probably want to be separated. The current text really doesn't get the live migration case right, D.4 comes before D.3 for the power-off case, and I think things get more complex when the live migration detach functionality is corrected.
>
> More on this below.
>
> (3) Section 2.2.4 appears to assume a specific order of events between the two servers involved in VM migration. As those servers are operating concurrently, that's not a robust assumption, and the NVE functionality should be specified not to depend on the order of events.
>
> Ordering assumptions weren't intended, so we'll tweak the wording to remove any such implications.
>
> --- Detailed comments by section ---
>
> A pre-disassociate operation is defined in section 2.2.1 but not used in the rest of the draft. Is it actually needed?
>
> Good catch! I'd put that in early, worked out the rest of the details, couldn't figure out a use for it, but forgot to remove it. I'll remove it.
>
> -- Section 2.2.2
>
> A.1: Validate the authentication (if present). If not, inform the provisioning system, log the error, and stop processing the associate message.
> This step should also include an optional authorization check, as network policy may limit which NVEs are allowed to participate in which VNs.
>
> Okay. Authorization locally, or from the provisioning system? (Or either?)
>
> A.3: If the VID in the associate message is non-zero, look up <VNID, P>. If the result is zero, or equal to VID, all's well. Otherwise, respond to S with an error, and stop processing the associate message.
>
> Why is a zero VID lookup result ok for a non-zero VID in the associate message?
>
> It just means there is no mapping yet. With respect to the refcounting suggested below, this is a good place to set it to 1; otherwise, increment it.
>
> Should the NVE copy the VID from the associate message to the <VNID, P> entry before responding?
>
> Good point. Will fix.
>
> A.5: Communicate with each rNVE device to advertise the VM's addresses, and also to get the addresses of other VMs in the DCVPN. Populate the table with the VM's addresses and addresses learned from each rNVE.
>
> This assumes that the control protocol does active propagation of all address info, and assumes that no other addresses for the VN are present in the NVE. Neither of those is a good general assumption, IMHO, and in particular, lazy evaluation is possible (e.g., load address mappings on demand to reduce the amount of invalidation traffic caused by each mapping change).
>
> I'm leery of on-demand/cache-based address mappings and lazy evaluation (love it in general, but not for address mappings). However, you're right: there may be cases where this is a valid approach.
>
> I'd suggest rephrasing to something like:
>
> A.5: Use the overlay control protocol to inform the network of the VM's addresses and the VM's association with this NVE.
>
> Something like that.
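The A.3 semantics and the refcounting agreed on in this exchange can be sketched in a few lines. This is an editor's illustration, not text from the draft: the class and method names are invented, and the "zero result means no mapping yet" rule and the refcount transitions are taken directly from the discussion above.

```python
class NveVnTable:
    """Sketch of the <VNID, P> -> VID mapping with the reference
    count suggested in the thread, so that the last dissociate on
    a shared <VNID, P> is the one that clears the VID."""

    def __init__(self):
        # (vnid, port) -> [vid, refcount]
        self.mappings = {}

    def associate(self, vnid, port, vid):
        # A.3: a missing (or zero-VID) entry means "no mapping yet";
        # a conflicting non-zero VID is an error back to the server S.
        entry = self.mappings.get((vnid, port))
        if entry is None or entry[0] == 0:
            self.mappings[(vnid, port)] = [vid, 1]  # first user: refcount = 1
            return "ok"
        if entry[0] == vid:
            entry[1] += 1                           # another VM shares the mapping
            return "ok"
        return "error: VID conflict"

    def dissociate(self, vnid, port):
        # D.3 with refcounting: decrement, and only remove the mapping
        # when no VM on this port still needs it.
        entry = self.mappings.get((vnid, port))
        if entry is None:
            return "error: no mapping"
        entry[1] -= 1
        if entry[1] == 0:
            del self.mappings[(vnid, port)]
        return "ok"
```

This matches David's ToR-switch concern: with two VMs behind one port, the first dissociate merely decrements the count, so the second VM's forwarding is undisturbed.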
> Will work on text.
>
> -- Section 2.2.3
>
> D.1: Validate the authentication (if present). If not, inform the provisioning system, log the error, and stop processing the dissociate message.
>
> Like A.1, this should include an optional authorization check, as some <VNID, P> -> VID mappings may be statically configured and hence not permit removal.
>
> Okay, will copy the wording from there once we've agreed on it.
>
> D.2: If the hold time is non-zero, point the VM's addresses in the VNID table to the new location of the VM, if known, or to "discard", and start a timer for the period of the hold time. If the hold time is zero, immediately perform step D.4, then go to D.3.
>
> This is where the power-off and migration cases start to interact - the hold time would be zero for power-off, non-zero for detach. For migration, this change potentially races with a change to the VM's addresses received via the control protocol, so the VM's addresses may already point somewhere else if the control protocol did its update before the dissociate (in which case nothing should be done to those addresses).
>
> Definitely worth looking at again, especially with respect to your comments about the order for migration.
>
> With regard to the race condition, I'll send a separate email on that.
>
> D.3: Set the VID for <VNID, P> as unassigned. Respond to S saying that the operation was successful.
>
> If there are multiple VMs using the VNID on that port, this "pulls the rug" out from under the others by disabling their forwarding. This <VNID, P> -> VID mapping needs a reference count of some form, and corresponding changes would be needed to A.2 and A.3.
> Not using a reference count may be ok under the assumption that the NVE does not share ports among VMs (or VSIs/vNICs), but that may not be a good assumption for an external NVE (e.g., in a ToR switch).
>
> Good point! I'll go with refcounting.
>
> D.4: When the hold timer expires, delete the VM's addresses from the VNID table. Delete any VM-specific network policies associated with any of the VM addresses. If the VNID table is empty after deleting the VM's addresses, optionally delete the table and any network policies for the VNID.
>
> Well, that's the right thing to do in the power-off case, but not when the VM has moved and there are other VMs on this NVE (possibly even the same port) that still need to communicate with the moved VM. Also, the power-off case needs to include (at least optionally) informing the control protocol of the withdrawal of the VM's addresses.
>
> See separate email.
>
> As noted in (2) above, I think it would be clearer if there were separate versions of 2.2.3 for the migration-departure and power-down use cases.
>
> Perhaps. Let's get the semantics right first, then see if there are common elements or not.
>
> -- Section 2.2.4
>
> M.3: S then gets a request to terminate the VM on S.
>
> M.4: Finally, S' gets a request to start up the VM on S'.
>
> Not exactly ;-).
>
> Terminating the VM on S (and destroying its state) before confirming its startup on S' risks losing the VM entirely if something goes wrong on S'.
>
> Interesting point. However, if the VM starts on S' without first being stopped on S, then (for some time) both S and S' are running, and I'd think that the results would be unpredictable, especially if the VM is just about to engage in some I/O. However, I'll bow to those who've implemented VM migration and know what they're doing.
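The D.2/D.4 hold-time behavior quoted above lends itself to a short sketch. Again this is an editor's illustration under stated assumptions: the function names are invented, the table is a flat dict from address to next hop, and "discard" stands in for whatever blackhole next hop a real NVE would install.

```python
import threading

def start_dissociate(vnid_table, vm_addrs, hold_time, new_location=None):
    """D.2 sketch: with a non-zero hold time, repoint the VM's
    addresses to the new location if known (else 'discard') and arm
    a timer; with a zero hold time (power-off), delete immediately.
    Returns the armed Timer, or None in the zero-hold-time case."""
    if hold_time == 0:
        expire(vnid_table, vm_addrs)        # D.4 immediately, then D.3
        return None
    target = new_location if new_location is not None else "discard"
    for addr in vm_addrs:
        vnid_table[addr] = target
    timer = threading.Timer(hold_time, expire, args=(vnid_table, vm_addrs))
    timer.start()
    return timer

def expire(vnid_table, vm_addrs):
    """D.4 sketch: on hold-timer expiry, delete the VM's addresses.
    A real NVE would also remove per-VM network policies and,
    per David's comment, must skip addresses the control protocol
    has already repointed to the VM's new NVE."""
    for addr in vm_addrs:
        vnid_table.pop(addr, None)
```

As the thread notes, this naive version still has the race David raises: if the control protocol updates an address before the dissociate arrives, the repointing in D.2 should leave that address alone rather than overwrite it.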
> Perhaps the VM is paused on S and started on S'; if that's successful, the VM is destroyed on S, otherwise the migration is aborted and the VM is continued on S. I'd like to know, as this affects the "tentative address changes" you talk about below, and dealing with migration abort.
>
> This level of detail isn't necessary - from the point of view of the network:
>
> - Startup on S' generates an associate request to the NVE for S'.
> - The dissociate request from S to its NVE may occur before or after that S' associate request.
> - The dissociate request from S to its NVE may occur before or after control protocol propagation of the results of the S' associate request to the NVE for S.
>
> The server-to-NVE functionality should be specified to operate properly independent of the order of these events.
>
> Agreed. Separate email thread to work this out.
>
> PA.5: Communicate with each rNVE device to advertise the VM's addresses, but as non-preferred destinations(*). Also get the addresses of other VMs in the DCVPN. Populate the table with the VM's addresses and addresses learned from each rNVE.
>
> That assumes aggressive push of the new address information by the control protocol directly to the rNVEs - while a control protocol may choose to do that, it's not strictly necessary, and the interaction may not be directly between the lNVE and the rNVEs. Generalizing in a fashion similar to A.5, I'd suggest something like:
>
> PA.5: The overlay control protocol may be used to inform the network of the forthcoming change to the VM's addresses that will occur when the VM is associated with this NVE.
>
> Okay, something like that.
>
> If this is done, withdrawal of the tentative address changes needs to be discussed, as VM migrations can abort for a variety of reasons (e.g., S' may crash during the copy).
> This PA.5 step can be skipped for a control protocol that only does on-demand provisioning of the address mapping information.
>
> Interesting thought. Will follow up once we get the migration "right" (for some value of right).
>
> -- Section 3
>
> This appears to be entirely about the control protocol and (IMHO) doesn't fit well with the rest of the draft.
>
> Will discuss putting this in a separate draft with co-authors.
>
> Thanks again for the detailed comments!
>
> Kireeti.
>
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Distinguished Engineer
> EMC Corporation, 176 South St., Hopkinton, MA 01748
> +1 (508) 293-7953 FAX: +1 (508) 293-7786
> [email protected] Mobile: +1 (978) 394-7754
> ----------------------------------------------------
>
> _______________________________________________
> nvo3 mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/nvo3
>
> --
> Kireeti
