Hi Greg,

See below prefixed with [ag4].
Thanks,
Anoop

On Wed, Nov 21, 2018 at 4:36 PM Greg Mirsky <[email protected]> wrote:

> Hi Anoop,
> apologies for the miss. Is it the last outstanding? Let's bring it to the
> front then.
>
> - What is the benefit of running BFD per VNI between a pair of VTEPs?
>>>>>
>>>> GIM2>> An alternative would be to run CFM between VMs, if there's the
>>>> need to monitor liveliness of the particular VM. Again, this is optional.
>>>>
>>>
>>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one
>>> to monitor the liveliness of VMs.
>>>
>>
> [ag3] I think you missed responding to this. I'm not sure of the value of
> running BFD per VNI between VTEPs. What am I getting that is not covered
> by running a single BFD session with VNI 0 between the VTEPs?
>
> GIM3>> I've misspoken. Non-zero VNI is recommended to be used to
> demultiplex BFD sessions between the same VTEPs. In section 6.1:
> The procedure for demultiplexing
> packets with Your Discriminator equal to 0 is different from
> [RFC5880]. For such packets, the BFD session MUST be identified
> using the inner headers, i.e., the source IP and the destination IP
> present in the IP header carried by the payload of the VXLAN
> encapsulated packet. The VNI of the packet SHOULD be used to derive
> interface-related information for demultiplexing the packet.
>
> Hope that clarifies the use of non-zero VNI in VXLAN encapsulation of a
> BFD control packet.
>

[ag4] This tells me how the VNI is used for BFD packets being sent/received. What is the use case/benefit of doing that? I am creating a special interface with VNI 0 just for BFD. Why do I now need to run BFD on any/all of the other VNIs? As a developer, if I read this spec, should I be building this capability or not?

Basically, what I'm getting at is that I think the draft should recommend using VNI 0. If there is a convincing use case for running BFD over other VNIs serviced by that VTEP, then that needs to be explained. But as I mentioned before, this leads to scaling issues. So, given the scaling issues, it would be good if an implementation only needed to worry about sending BFD messages on VNI 0. (To make these points concrete, short sketches of the encapsulation, the demultiplexing, and the session-limit ideas are appended at the end of this message.)

>
> Regards,
> Greg
>
> On Tue, Nov 20, 2018 at 12:14 PM Anoop Ghanwani <[email protected]>
> wrote:
>
>> Hi Greg,
>>
>> Please see inline prefixed by [ag3].
>>
>> Thanks,
>> Anoop
>>
>> On Fri, Nov 16, 2018 at 5:29 PM Greg Mirsky <[email protected]>
>> wrote:
>>
>>> Hi Anoop,
>>> thank you for the discussion. Please find my responses tagged GIM3>>.
>>> Also, attached the diff and the updated working version of the draft. Hope
>>> we're converging.
>>>
>>> Regards,
>>> Greg
>>>
>>> On Wed, Nov 14, 2018 at 11:00 PM Anoop Ghanwani <[email protected]>
>>> wrote:
>>>
>>>> Hi Greg,
>>>>
>>>> Please see inline prefixed with [ag2].
>>>>
>>>> Thanks,
>>>> Anoop
>>>>
>>>> On Wed, Nov 14, 2018 at 9:45 AM Greg Mirsky <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Anoop,
>>>>> thank you for the expedient response. I am glad that some of my
>>>>> responses have addressed your concerns. Please find followup notes in-line
>>>>> tagged GIM2>>. I've attached the diff to highlight the updates applied in
>>>>> the working version. Let me know if these are acceptable changes.
>>>>>
>>>>> Regards,
>>>>> Greg
>>>>>
>>>>> On Tue, Nov 13, 2018 at 12:30 PM Anoop Ghanwani <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Greg,
>>>>>>
>>>>>> Please see inline prefixed with [ag].
>>>>>>
>>>>>> Thanks,
>>>>>> Anoop
>>>>>>
>>>>>> On Tue, Nov 13, 2018 at 11:34 AM Greg Mirsky <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Anoop,
>>>>>>> many thanks for the thorough review and detailed comments. Please
>>>>>>> find my answers, this time for real, in-line tagged GIM>>.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Greg
>>>>>>>
>>>>>>> On Thu, Nov 8, 2018 at 1:58 AM Anoop Ghanwani <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Here are my comments.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anoop
>>>>>>>>
>>>>>>>> ==
>>>>>>>>
>>>>>>>> Philosophical
>>>>>>>>
>>>>>>>> Since VXLAN is not an IETF standard, should we be defining a
>>>>>>>> standard for running BFD on it? Should we define BFD over Geneve instead,
>>>>>>>> which is the official WG selection? Is that going to be a separate
>>>>>>>> document?
>>>>>>>> GIM>> IS-IS is not on the Standards track either, but that had not
>>>>>>>> prevented the IETF from developing tens of standards-track RFCs using
>>>>>>>> RFC 1142 as the normative reference until RFC 7142 re-classified it as
>>>>>>>> historical. A similar path was followed with IS-IS-TE by publishing
>>>>>>>> RFC 3784 until it was obsoleted by RFC 5305 four years later. I
>>>>>>>> understand that a Down Reference, i.e., using an informational RFC as
>>>>>>>> a normative reference, is not an unusual situation.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> [ag] OK. I'm not an expert on this part, so unless someone else that
>>>>>> is an expert (chairs, AD?) can comment on it, I'll just let it go.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Technical
>>>>>>>>
>>>>>>>> Section 1:
>>>>>>>>
>>>>>>>> This part needs to be rewritten:
>>>>>>>> >>>
>>>>>>>> The individual racks may be part of a different Layer 3 network, or
>>>>>>>> they could be in a single Layer 2 network. The VXLAN segments/overlays
>>>>>>>> are overlaid on top of Layer 3 network. A VM can communicate with
>>>>>>>> another VM only if they are on the same VXLAN segment.
>>>>>>>> >>>
>>>>>>>> It's hard to parse and, given IRB,
>>>>>>>
>>>>>>> GIM>> Would the following text be acceptable:
>>>>>>> OLD TEXT:
>>>>>>> VXLAN is typically deployed in data centers interconnecting
>>>>>>> virtualized hosts, which may be spread across multiple racks. The
>>>>>>> individual racks may be part of a different Layer 3 network, or they
>>>>>>> could be in a single Layer 2 network. The VXLAN segments/overlays
>>>>>>> are overlaid on top of Layer 3 network.
>>>>>>> NEW TEXT:
>>>>>>> VXLAN is typically deployed in data centers interconnecting
>>>>>>> virtualized hosts of a tenant. VXLAN addresses requirements of the
>>>>>>> Layer 2 and Layer 3 data center network infrastructure in the presence
>>>>>>> of VMs in a multi-tenant environment, discussed in section 3 of
>>>>>>> [RFC7348], by providing a Layer 2 overlay scheme on a Layer 3 network.
>>>>>>
>>>>>> [ag] This is a lot better.
>>>>>>
>>>>>>>
>>>>>>> A VM can communicate with another VM only if they are on the same
>>>>>>> VXLAN segment.
>>>>>>>>
>>>>>>>> the last sentence above is wrong.
>>>>>>>
>>>>>>> GIM>> Section 4 in RFC 7348 states:
>>>>>>> Only VMs within the same VXLAN segment can communicate with each other.
>>>>>>
>>>>>> [ag] VMs on different segments can communicate using routing/IRB, so
>>>>>> even RFC 7348 is wrong. Perhaps the text should be modified to say --
>>>>>> "In the absence of a router in the overlay, a VM can communicate...".
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Section 3:
>>>>>>>> >>>
>>>>>>>> Most deployments will have VMs with only L2 capabilities that
>>>>>>>> may not support L3.
>>>>>>>> >>>
>>>>>>>> Are you suggesting most deployments have VMs with no IP
>>>>>>>> addresses/configuration?
>>>>>>>
>>>>>>> GIM>> Would re-word as follows:
>>>>>>> OLD TEXT:
>>>>>>> Most deployments will have VMs with only L2 capabilities that
>>>>>>> may not support L3.
>>>>>>> NEW TEXT:
>>>>>>> Deployments may have VMs with only L2 capabilities that do not
>>>>>>> support L3.
>>>>>>
>>>>>> [ag] I still don't understand this. What does it mean for a VM to
>>>>>> not support L3? No IP address, no default GW, something else?
>>>>>
>>>>> GIM2>> A VM communicates with its VTEP which, in turn, originates the
>>>>> VXLAN tunnel. A VM is not required to have an IP address, as it is the
>>>>> VTEP's IP address that the VM's MAC is associated with. As for the
>>>>> gateway, RFC 7348 discusses the VXLAN gateway as the device that
>>>>> forwards traffic between VXLAN and non-VXLAN domains. Considering all
>>>>> that, would the following change be acceptable:
>>>>> OLD TEXT:
>>>>> Most deployments will have VMs with only L2 capabilities that
>>>>> may not support L3.
>>>>> NEW TEXT:
>>>>> Most deployments will have VMs with only L2 capabilities and not have
>>>>> an IP address assigned.
>>>>
>>>> [ag2] Do you have a reference for this (i.e., that most deployments have
>>>> VMs without an IP address)? Normally I would think VMs would have an IP
>>>> address. It's just that they are segregated into segments and, without an
>>>> intervening router, they are restricted to communicate only within their
>>>> subnet.
>>>
>>> GIM3>> Would the following text be acceptable:
>>>
>>> Deployments might have VMs with only L2 capabilities and not have an IP
>>> address assigned or, in other cases, VMs are assigned an IP address but
>>> are restricted to communicate only within their subnet.
>>>
>>
>> [ag3] Yes, this is better.
>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> >>>
>>>>>>>> Having a hierarchical OAM model helps localize faults though it
>>>>>>>> requires additional consideration.
>>>>>>>> >>>
>>>>>>>> What are the additional considerations?
>>>>>>>
>>>>>>> GIM>> For example, coordination of BFD intervals across the OAM layers.
>>>>>>
>>>>>> [ag] Can we mention them in the draft?
>>>>>>
>>>>>>>
>>>>>>>> Would be useful to add a reference to RFC 8293 in case the reader
>>>>>>>> would like to know more about service nodes.
>>>>>>>
>>>>>>> GIM>> I have to admit that I don't find how RFC 8293, "A Framework
>>>>>>> for Multicast in Network Virtualization over Layer 3", is related to
>>>>>>> this document. Please help with an additional reference to the text of
>>>>>>> the document.
>>>>>>
>>>>>> [ag] The RFC discusses the use of service nodes, which is mentioned here.
>>>>>>
>>>>>>>
>>>>>>>> Section 4
>>>>>>>> >>>
>>>>>>>> Separate BFD sessions can be established between the VTEPs (IP1 and
>>>>>>>> IP2) for monitoring each of the VXLAN tunnels (VNI 100 and 200).
>>>>>>>> >>>
>>>>>>>> IMO, the document should mention that this could lead to scaling
>>>>>>>> issues given that VTEPs can support well in excess of 4K VNIs.
>>>>>>>> Additionally, we should mention that with IRB, a given VNI may not even
>>>>>>>> exist on the destination VTEP. Finally, what is the benefit of doing this?
>>>>>>>> There may be certain corner cases where it's useful (vs. a single BFD
>>>>>>>> session between the VTEPs for all VNIs), but it would be good to explain
>>>>>>>> what those are.
>>>>>>>
>>>>>>> GIM>> Will add text in the Security Considerations section that
>>>>>>> VTEPs should have a limit on the number of BFD sessions.
>>>>>>
>>>>>> [ag] I was hoping for two things:
>>>>>> - A mention about the scalability issue right where per-VNI BFD is
>>>>>> discussed. (Not sure why that is a security issue/consideration.)
>>>>>
>>>>> GIM2>> I've added the following sentence in both places:
>>>>> The implementation SHOULD have a reasonable upper bound on the number
>>>>> of BFD sessions that can be created between the same pair of VTEPs.
>>>>
>>>> [ag2] What are the criteria for determining what is reasonable?
>>>
>>> GIM>> I usually understand that as a requirement to make it controllable,
>>> i.e., to have a configurable limit. Thus it will be up to a network
>>> operator to set the limit.
>>>
>>>>
>>>>
>>>>> - What is the benefit of running BFD per VNI between a pair of VTEPs?
>>>>>
>>>>> GIM2>> An alternative would be to run CFM between VMs, if there's the
>>>>> need to monitor liveliness of the particular VM. Again, this is optional.
>>>>
>>>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one
>>>> to monitor the liveliness of VMs.
>>>
>> [ag3] I think you missed responding to this. I'm not sure of the value
>> of running BFD per VNI between VTEPs. What am I getting that is not
>> covered by running a single BFD session with VNI 0 between the VTEPs?
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Sections 5.1 and 6.1
>>>>>>>>
>>>>>>>> In 5.1 we have
>>>>>>>> >>>
>>>>>>>> The inner MAC frame carrying the BFD payload has the following format:
>>>>>>>> ...
>>>>>>>> Source IP: IP address of the originating VTEP.
>>>>>>>> Destination IP: IP address of the terminating VTEP.
>>>>>>>> >>>
>>>>>>>>
>>>>>>>> In 6.1 we have
>>>>>>>> >>>
>>>>>>>> Since multiple BFD sessions may be running between two
>>>>>>>> VTEPs, there needs to be a mechanism for demultiplexing received BFD
>>>>>>>> packets to the proper session. The procedure for demultiplexing
>>>>>>>> packets with Your Discriminator equal to 0 is different from [RFC5880].
>>>>>>>> For such packets, the BFD session MUST be identified using the inner
>>>>>>>> headers, i.e., the source IP and the destination IP present in the IP
>>>>>>>> header carried by the payload of the VXLAN encapsulated packet.
>>>>>>>> >>>
>>>>>>>> How does this work if the source IP and dest IP are the same as
>>>>>>>> specified in 5.1?
>>>>>>>
>>>>>>> GIM>> You're right, destination and source IP addresses likely are
>>>>>>> the same in this case. Will add that the source UDP port number, along
>>>>>>> with the pair of IP addresses, MUST be used to demux received BFD
>>>>>>> control packets. Would you agree that will be sufficient?
>>>>>>
>>>>>> [ag] Yes, I think that should work.
>>>>>>
>>>>>>>
>>>>>>>> Editorial
>>>>>>>
>>>>>> [ag] Agree with all comments on this section.
>>>>>>
>>>>>>>
>>>>>>>> - Terminology section should be renamed to acronyms.
>>>>>>>
>>>>>>> GIM>> Accepted
>>>>>>>
>>>>>>>> - Document would benefit from a thorough editorial scrub, but maybe
>>>>>>>> that will happen once it gets to the RFC editor.
>>>>>>>>
>>>>>>> GIM>> Will certainly have helpful comments from the ADs and the RFC editor.
>>>>>>>
>>>>>>>>
>>>>>>>> Section 1
>>>>>>>> >>>
>>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348]. provides
>>>>>>>> an encapsulation scheme that allows virtual machines (VMs) to
>>>>>>>> communicate in a data center network.
>>>>>>>> >>>
>>>>>>>> This is not accurate. VXLAN allows you to implement an overlay to
>>>>>>>> decouple the address space of the attached hosts from that of the network.
>>>>>>>
>>>>>>> GIM>> Thank you for the suggested text. Will change as follows:
>>>>>>> OLD TEXT:
>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348]. provides
>>>>>>> an encapsulation scheme that allows virtual machines (VMs) to
>>>>>>> communicate in a data center network.
>>>>>>> NEW TEXT:
>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348] provides
>>>>>>> an encapsulation scheme that allows building an overlay network by
>>>>>>> decoupling the address space of the attached virtual hosts from
>>>>>>> that of the network.
>>>>>>>>
>>>>>>>> Section 7
>>>>>>>>
>>>>>>>> VTEP's -> VTEPs
>>>>>>>
>>>>>>> GIM>> Yes, thank you.
>>>>>>
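==

To make a few of the points above concrete, here are three short Python sketches. They are illustrative only: function and variable names, addresses, and limit values are hypothetical, and none of this is normative draft text.

First, the inner BFD control packet per the Section 5.1 discussion. This is a minimal sketch of the 24-byte mandatory section defined in RFC 5880, section 4.1 (no authentication section), plus the inner addressing the quoted 5.1 text implies:

    import struct

    def build_bfd_control(my_disc, your_disc, state=3, detect_mult=3,
                          min_tx=1000000, min_rx=1000000):
        # Mandatory section of a BFD control packet (RFC 5880, section 4.1).
        # state 3 = Up; diag 0; all flags clear; intervals in microseconds.
        version, diag, length = 1, 0, 24
        return struct.pack("!BBBBIIIII",
                           (version << 5) | diag,
                           state << 6,
                           detect_mult,
                           length,
                           my_disc,
                           your_disc,
                           min_tx,
                           min_rx,
                           0)  # Required Min Echo RX Interval

    # Per the Section 5.1 text, the inner IP header carries the VTEPs' own
    # addresses, so every session between the same two VTEPs looks identical
    # at the inner IP level (example addresses):
    inner_src_ip = "192.0.2.1"   # originating VTEP
    inner_dst_ip = "192.0.2.2"   # terminating VTEP
    inner_udp_dst = 3784         # BFD control port (RFC 5881)
    # The inner UDP source port is then the only per-session variable, which
    # is why it has to participate in demultiplexing (next sketch).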
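Second, receive-side demultiplexing as the Section 6.1 text plus the agreed source-UDP-port fix would imply; the two lookup tables are assumptions of this sketch, not structures from the draft:

    import struct

    sessions_by_disc = {}    # local discriminator -> session state
    sessions_by_tuple = {}   # (inner src IP, inner dst IP, inner UDP sport) -> session

    def demux_bfd(inner_src_ip, inner_dst_ip, inner_udp_sport, bfd_payload):
        # Your Discriminator is bytes 8..11 of the mandatory section.
        (your_disc,) = struct.unpack_from("!I", bfd_payload, 8)
        if your_disc != 0:
            # Standard RFC 5880 demultiplexing.
            return sessions_by_disc.get(your_disc)
        # Bootstrap case (Your Discriminator == 0): the inner IP pair alone is
        # identical for every session between the same two VTEPs, so the inner
        # source UDP port completes the key.
        return sessions_by_tuple.get((inner_src_ip, inner_dst_ip, inner_udp_sport))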
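Third, the "reasonable upper bound" on sessions, expressed as the operator-configurable limit described above; the knob name and its value are made up:

    MAX_BFD_SESSIONS_PER_VTEP_PAIR = 4   # hypothetical operator-configured knob

    sessions_per_peer = {}   # remote VTEP IP -> number of active BFD sessions

    def admit_new_session(remote_vtep_ip):
        count = sessions_per_peer.get(remote_vtep_ip, 0)
        if count >= MAX_BFD_SESSIONS_PER_VTEP_PAIR:
            return False   # refuse rather than grow without bound
        sessions_per_peer[remote_vtep_ip] = count + 1
        return True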
