On Mon, Jul 23, 2012 at 12:22 PM, Ivan Pepelnjak <[email protected]> wrote:
> Just because millions of applications misuse a simplistic protocol in a way it was never designed to handle doesn't make it a good idea. Not to mention the total lack of security.
>
> @Xiaohu: how would you distinguish a gratuitous ARP sent from the hypervisor to indicate a VM move from a gratuitous ARP sent by a VM with a misconfigured IP address, or a malicious gratuitous ARP sent by an intruder (physical or virtual)? Unless you can totally control the VM attachment point (= hypervisor switch, unless you're using something like 802.1BR), you cannot trust ARP ... but then if you do control the hypervisor switch, you don't need ARP.
>
> Ivan

Completely agree. ARP is the wrong protocol for any form of trusted signaling. Additionally, IPv6-only networks would present their own set of challenges if we relied on the IPv4 Address Resolution Protocol.

Truman

> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Linda Dunbar
> *Sent:* Monday, July 23, 2012 5:59 PM
> *To:* Xuxiaohu; [email protected]; [email protected]
> *Cc:* [email protected]
> *Subject:* Re: [nvo3] Comments on draft-kompella-nvo3-server2nve
>
> Millions of already-deployed applications use ARP to signal their presence. The widely deployed vMotion makes VMs in a new location send ARP (RARP) to inform the network of their new location. It doesn't hurt to utilize the available messages from applications.
>
> My two cents.
> Linda Dunbar
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Xuxiaohu
> *Sent:* Wednesday, July 18, 2012 9:48 PM
> *To:* [email protected]; [email protected]
> *Cc:* [email protected]
> *Subject:* Re: [nvo3] Comments on draft-kompella-nvo3-server2nve
>
> Does that mean that ARP could also be considered as an option for signaling the VM attachment/detachment event? For example, a gratuitous ARP packet can be interpreted as an attachment event by the NVE that receives such a packet via the NVE-TES interface. Meanwhile, for those L2VPN (e.g., VPLS) or L3VPN overlay approaches which only allow one next hop to be available for a given MAC route or host route in the forwarding table, a gratuitous ARP packet received from a remote NVE could be interpreted as a detachment event by the NVE to which the ARP-sending VM was previously attached. Moreover, if a gratuitous ARP packet triggers the NVE that received it via the NVE-TES interface to generate a MAC route or host route for the ARP-sending VM, then the NVE to which that VM was previously attached could, upon receiving that route, also infer a detachment event for that VM.
>
> Best regards,
>
> Xiaohu
>
> <skipped>
>
> I have a related consideration based on thinking about this further. The network SHOULD NOT rely on dissociate messages always being sent - a server crash at the wrong point during a VM migration may cause a dissociate to be missed (e.g., the VM made it to S', but S crashed before sending the dissociate). More importantly, not relying on the dissociate messages (in particular, not having the inter-NVE control protocol rely on them) helps if one wants to mix hypervisors that support the attach/detach protocol with (existing) ones that don't.
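To make Xiaohu's inference concrete, here is a minimal, purely illustrative sketch (no NVE implements exactly this, and the field layout follows the standard ARP wire format, not anything defined by the draft) of how an NVE might classify a frame arriving on the NVE-TES interface as a gratuitous ARP and treat it as an implied attach. Note that it also illustrates Ivan's objection: nothing in the packet authenticates the sender.

```python
import struct

def parse_arp(frame: bytes):
    """Parse an Ethernet II frame carrying ARP; return a dict or None.

    The 'gratuitous' test (sender IP == target IP) follows common
    practice; whether that should imply an associate event is exactly
    the point under debate in this thread.
    """
    if len(frame) < 42:                      # 14B Ethernet + 28B ARP
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0806:                  # not an ARP frame
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    sender_mac = frame[22:28]
    sender_ip = frame[28:32]
    target_ip = frame[38:42]
    return {
        "op": oper,                          # 1 = request, 2 = reply
        "sender_mac": sender_mac.hex(":"),
        "sender_ip": ".".join(str(b) for b in sender_ip),
        "gratuitous": sender_ip == target_ip,
    }

def infer_event(frame: bytes):
    """Hypothetical NVE hook: map a gratuitous ARP to an implied
    'associate' for the sending VM's addresses. A real NVE would
    also need the authorization checks discussed in the thread."""
    info = parse_arp(frame)
    if info and info["gratuitous"]:
        return ("associate", info["sender_mac"], info["sender_ip"])
    return None
```

Nothing here distinguishes a hypervisor-generated announcement from a misconfigured or malicious VM, which is Ivan's point: the inference only works if the attachment point is already trusted.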
> For existing hypervisors, under suitable restrictions and assuming some advance configuration, "associate" can be inferred from a gratuitous ARP or RARP, but nothing is sent for dissociate. The inference of "associate" won't be possible if things have not been set up to enable the gratuitous ARP or RARP.
>
> Thanks,
> --David
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Kireeti Kompella
> *Sent:* Saturday, July 14, 2012 8:16 PM
> *To:* Black, David
> *Cc:* [email protected]
> *Subject:* Re: [nvo3] Comments on draft-kompella-nvo3-server2nve
>
> Hi David,
>
> Thanks for your detailed comments! More inline.
>
> On Fri, Jul 13, 2012 at 1:01 PM, <[email protected]> wrote:
>
> Authors (Kireeti, Yakov and Thomas),
>
> This is a good draft - it looks like a good foundation to focus discussion around what the server-to-NVE (attach/detach) protocol needs to do. I like a lot of the contents - I have a few high-level comments and some more detailed feedback.
>
> Thanks!
>
> (1) This draft starts out dealing with the attach/detach (server-to-NVE) protocol and then includes some material on the control protocol for distributing and managing mapping information on the NVEs. I suggest focusing the draft on the attach/detach protocol, removing control protocol discussion (e.g., Section 3), and minimizing assumptions about the control protocol (see detailed comments for where I think assumptions could be minimized). The result should be more general and more useful.
>
> About the control plane: it really concerns me that the control plane discussion has not happened so far (not really). ARP doesn't scale; neither does flooding. The goal here is to signal networking parameters: from server (vswitch) to local NVE to remote NVEs to remote servers.
> Fine, call the local-NVE-to-remote-NVEs part the "control plane" -- but that's a critical part of the picture.
>
> What I take from your suggestion is to move the lNVE-to-rNVE part to a different draft; I buy that, especially if there are other mechanisms for doing this that can plug in to this server2nve signaling, so that one can mix and match server2nve signaling and lNVE2rNVE signaling. Does that seem reasonable?
>
> (2) Section 2.2.3 on detach is trying to cover at least a couple of use cases, VM live migration and VM removal (e.g., power-off), that probably want to be separated. The current text really doesn't get the live migration case right, D.4 comes before D.3 for the power-off case, and I think things get more complex when the live migration detach functionality is corrected.
>
> More on this below.
>
> (3) Section 2.2.4 appears to assume a specific order of events between the two servers involved in VM migration. As those servers are operating concurrently, that's not a robust assumption, and the NVE functionality should be specified not to depend on the order of events.
>
> Ordering assumptions weren't intended, so we'll tweak the wording to remove any such implications.
>
> --- Detailed comments by section ---
>
> A pre-disassociate operation is defined in section 2.2.1 but not used in the rest of the draft. Is it actually needed?
>
> Good catch! I'd put that in early, worked out the rest of the details, couldn't figure out a use for it, but forgot to remove it. I'll remove it.
>
> -- Section 2.2.2
>
> A.1: Validate the authentication (if present). If not, inform the provisioning system, log the error, and stop processing the associate message.
> This step should also include an optional authorization check, as network policy may limit which NVEs are allowed to participate in which VNs.
>
> Okay. Authorization locally, or from the provisioning system? (Or either?)
>
> A.3: If the VID in the associate message is non-zero, look up <VNID, P>. If the result is zero, or equal to VID, all's well. Otherwise, respond to S with an error, and stop processing the associate message.
>
> Why is a zero VID lookup result ok for a non-zero VID in the associate message?
>
> It just means there is no mapping yet. With respect to the refcounting suggested below, this is a good place to set it to 1; otherwise, increment it.
>
> Should the NVE copy the VID from the associate message to the <VNID, P> entry before responding?
>
> Good point. Will fix.
>
> A.5: Communicate with each rNVE device to advertise the VM's addresses, and also to get the addresses of other VMs in the DCVPN. Populate the table with the VM's addresses and addresses learned from each rNVE.
>
> This assumes that the control protocol does active propagation of all address info, and assumes that no other addresses for the VN are present in the NVE. Neither of those is a good general assumption, IMHO, and in particular, lazy evaluation is possible (e.g., load address mappings on demand to reduce the amount of invalidation traffic caused by each mapping change).
>
> I'm leery of on-demand/cache-based address mappings and lazy evaluation (love it in general, but not for address mappings). However, you're right: there may be cases where this is a valid approach.
>
> I'd suggest rephrasing to something like:
>
> A.5: Use the overlay control protocol to inform the network of the VM's addresses and the VM's association with this NVE.
>
> Something like that.
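The A.3 semantics and the refcounting agreed on in this exchange can be sketched in a few lines. This is an editor's illustration, not text from the draft: the class and method names are invented, and the "zero result means no mapping yet" rule and the refcount transitions are taken directly from the discussion above.

```python
class NveVnTable:
    """Sketch of the <VNID, P> -> VID mapping with the reference
    count suggested in the thread, so that the last dissociate on
    a shared <VNID, P> is the one that clears the VID."""

    def __init__(self):
        # (vnid, port) -> [vid, refcount]
        self.mappings = {}

    def associate(self, vnid, port, vid):
        # A.3: a missing (or zero-VID) entry means "no mapping yet";
        # a conflicting non-zero VID is an error back to the server S.
        entry = self.mappings.get((vnid, port))
        if entry is None or entry[0] == 0:
            self.mappings[(vnid, port)] = [vid, 1]  # first user: refcount = 1
            return "ok"
        if entry[0] == vid:
            entry[1] += 1                           # another VM shares the mapping
            return "ok"
        return "error: VID conflict"

    def dissociate(self, vnid, port):
        # D.3 with refcounting: decrement, and only remove the mapping
        # when no VM on this port still needs it.
        entry = self.mappings.get((vnid, port))
        if entry is None:
            return "error: no mapping"
        entry[1] -= 1
        if entry[1] == 0:
            del self.mappings[(vnid, port)]
        return "ok"
```

This matches David's ToR-switch concern: with two VMs behind one port, the first dissociate merely decrements the count, so the second VM's forwarding is undisturbed.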
> Will work on text.
>
> -- Section 2.2.3
>
> D.1: Validate the authentication (if present). If not, inform the provisioning system, log the error, and stop processing the dissociate message.
>
> Like A.1, this should include an optional authorization check, as some <VNID, P> -> VID mappings may be statically configured and hence not permit removal.
>
> Okay, will copy the wording from there once we've agreed on it.
>
> D.2: If the hold time is non-zero, point the VM's addresses in the VNID table to the new location of the VM, if known, or to "discard", and start a timer for the period of the hold time. If the hold time is zero, immediately perform step D.4, then go to D.3.
>
> This is where the power-off and migration cases start to interact - the hold time would be zero for power-off, non-zero for detach. For migration, this change potentially races with a change to the VM's addresses received via the control protocol, so the VM's addresses may already point somewhere else if the control protocol did its update before the dissociate (in which case nothing should be done to those addresses).
>
> Definitely worth looking at again, especially with respect to your comments about the order for migration.
>
> With regard to the race condition, I'll send a separate email on that.
>
> D.3: Set the VID for <VNID, P> as unassigned. Respond to S saying that the operation was successful.
>
> If there are multiple VMs using the VNID on that port, this "pulls the rug" out from under the others by disabling their forwarding. This <VNID, P> -> VID mapping needs a reference count of some form, and corresponding changes would be needed to A.2 and A.3.
> Not using a reference count may be ok under the assumption that the NVE does not share ports among VMs (or VSIs/vNICs), but that may not be a good assumption for an external NVE (e.g., in a ToR switch).
>
> Good point! I'll go with refcounting.
>
> D.4: When the hold timer expires, delete the VM's addresses from the VNID table. Delete any VM-specific network policies associated with any of the VM addresses. If the VNID table is empty after deleting the VM's addresses, optionally delete the table and any network policies for the VNID.
>
> Well, that's the right thing to do in the power-off case, but not when the VM has moved and there are other VMs on this NVE (possibly even the same port) that still need to communicate with the moved VM. Also, the power-off case needs to include (at least optionally) informing the control protocol of the withdrawal of the VM's addresses.
>
> See separate email.
>
> As noted in (2) above, I think it would be clearer if there were separate versions of 2.2.3 for the migration-departure and power-down use cases.
>
> Perhaps. Let's get the semantics right first, then see if there are common elements or not.
>
> -- Section 2.2.4
>
> M.3: S then gets a request to terminate the VM on S.
>
> M.4: Finally, S' gets a request to start up the VM on S'.
>
> Not exactly ;-).
>
> Terminating the VM on S (and destroying its state) before confirming its startup on S' risks losing the VM entirely if something goes wrong on S'.
>
> Interesting point. However, if the VM starts on S' without first being stopped on S, then (for some time) both S and S' are running, and I'd think that the results would be unpredictable, especially if the VM is just about to engage in some I/O. However, I'll bow to those who've implemented VM migration and know what they're doing.
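The D.2/D.4 hold-time behavior quoted above lends itself to a short sketch. Again this is an editor's illustration under stated assumptions: the function names are invented, the table is a flat dict from address to next hop, and "discard" stands in for whatever blackhole next hop a real NVE would install.

```python
import threading

def start_dissociate(vnid_table, vm_addrs, hold_time, new_location=None):
    """D.2 sketch: with a non-zero hold time, repoint the VM's
    addresses to the new location if known (else 'discard') and arm
    a timer; with a zero hold time (power-off), delete immediately.
    Returns the armed Timer, or None in the zero-hold-time case."""
    if hold_time == 0:
        expire(vnid_table, vm_addrs)        # D.4 immediately, then D.3
        return None
    target = new_location if new_location is not None else "discard"
    for addr in vm_addrs:
        vnid_table[addr] = target
    timer = threading.Timer(hold_time, expire, args=(vnid_table, vm_addrs))
    timer.start()
    return timer

def expire(vnid_table, vm_addrs):
    """D.4 sketch: on hold-timer expiry, delete the VM's addresses.
    A real NVE would also remove per-VM network policies and,
    per David's comment, must skip addresses the control protocol
    has already repointed to the VM's new NVE."""
    for addr in vm_addrs:
        vnid_table.pop(addr, None)
```

As the thread notes, this naive version still has the race David raises: if the control protocol updates an address before the dissociate arrives, the repointing in D.2 should leave that address alone rather than overwrite it.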
> Perhaps the VM is paused on S and started on S'; if that's successful, the VM is destroyed on S, otherwise the migration is aborted and the VM is continued on S. I'd like to know, as this affects the "tentative address changes" you talk about below, and dealing with migration abort.
>
> This level of detail isn't necessary - from the point of view of the network:
>
> - Startup on S' generates an associate request to the NVE for S'.
> - The dissociate request from S to its NVE may occur before or after that S' associate request.
> - The dissociate request from S to its NVE may occur before or after control protocol propagation of the results of the S' associate request to the NVE for S.
>
> The server-to-NVE functionality should be specified to operate properly independent of the order of these events.
>
> Agreed. Separate email thread to work this out.
>
> PA.5: Communicate with each rNVE device to advertise the VM's addresses, but as non-preferred destinations(*). Also get the addresses of other VMs in the DCVPN. Populate the table with the VM's addresses and addresses learned from each rNVE.
>
> That assumes aggressive push of the new address information by the control protocol directly to the rNVEs - while a control protocol may choose to do that, it's not strictly necessary, and the interaction may not be directly between the lNVE and the rNVEs. Generalizing in a fashion similar to A.5, I'd suggest something like:
>
> PA.5: The overlay control protocol may be used to inform the network of the forthcoming change to the VM's addresses that will occur when the VM is associated with this NVE.
>
> Okay, something like that.
>
> If this is done, withdrawal of the tentative address changes needs to be discussed, as VM migrations can abort for a variety of reasons (e.g., S' may crash during the copy).
> This PA.5 step can be skipped for a control protocol that only does on-demand provisioning of the address mapping information.
>
> Interesting thought. Will follow up once we get the migration "right" (for some value of right).
>
> -- Section 3
>
> This appears to be entirely about the control protocol and (IMHO) doesn't fit well with the rest of the draft.
>
> Will discuss putting this in a separate draft with co-authors.
>
> Thanks again for the detailed comments!
>
> Kireeti.
>
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Distinguished Engineer
> EMC Corporation, 176 South St., Hopkinton, MA 01748
> +1 (508) 293-7953 FAX: +1 (508) 293-7786
> [email protected] Mobile: +1 (978) 394-7754
> ----------------------------------------------------
>
> _______________________________________________
> nvo3 mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/nvo3
>
> --
> Kireeti
