Hi Anoop, apologies if my explanation was not clear. Non-zero VNIs are recommended to be used by a VTEP to demultiplex a received BFD control packet with a zero Your Discriminator value. BFD control packets with a non-zero Your Discriminator value will be demultiplexed using only that value. As for the special role of VNI 0, Section 7 of the draft states the following: BFD session MAY be established for the reserved VNI 0. One way to aggregate BFD sessions between VTEPs is to establish a BFD session with VNI 0. A VTEP MAY also use VNI 0 to establish a BFD session with a service node. Would you suggest changing the normative language in this text?
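To make the receive-side behavior I am describing concrete, here is a rough, non-normative sketch of the demultiplexing decision (the function name, parameters, and lookup tables are invented for illustration and are not part of the draft):

    # Non-normative sketch of receive-side demultiplexing for BFD over VXLAN.
    # All data structures and names below are hypothetical.
    def demux_bfd_session(vni, inner_src_ip, inner_dst_ip, inner_src_udp_port,
                          your_discriminator, sessions_by_disc, sessions_by_flow):
        """Return the BFD session a received VXLAN-encapsulated BFD control
        packet belongs to, or None if no matching session exists."""
        if your_discriminator != 0:
            # Non-zero Your Discriminator: demultiplex on that value alone,
            # as in RFC 5880.
            return sessions_by_disc.get(your_discriminator)
        # Your Discriminator is zero (e.g., the remote VTEP has not yet
        # learned our discriminator): identify the session from the inner
        # headers. The inner source and destination IP addresses are both
        # VTEP addresses and may coincide for several sessions, so the inner
        # source UDP port is included in the lookup key, per the discussion
        # further down in this thread. The VNI is used to derive
        # interface-related information; VNI 0 identifies the aggregate
        # VTEP-to-VTEP session.
        key = (vni, inner_src_ip, inner_dst_ip, inner_src_udp_port)
        return sessions_by_flow.get(key)
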
Regards, Greg PS. Happy Thanksgiving to All! On Wed, Nov 21, 2018 at 11:00 PM Anoop Ghanwani <[email protected]> wrote: > Hi Greg, > > See below prefixed with [ag4]. > > Thanks, > Anoop > > On Wed, Nov 21, 2018 at 4:36 PM Greg Mirsky <[email protected]> wrote: > >> Hi Anoop, >> apologies for the miss. Is it the last outstanding? Let's bring it to the >> front then. >> >> - What is the benefit of running BFD per VNI between a pair of VTEPs? >>>>>> >>>>> GIM2>> An alternative would be to run CFM between VMs, if there's the >>>>> need to monitor liveliness of the particular VM. Again, this is optional. >>>>> >>>> >>>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one >>>> to monitor the liveliness of VMs. >>>> >>> >> [ag3] I think you missed responding to this. I'm not sure of the value >> of running BFD per VNI between VTEPs. What am I getting that is not >> covered by running a single BFD session with VNI 0 between the VTEPs? >> >> GIM3>> I've misspoken. Non-zero VNI is recommended to be used to >> demultiplex BFD sessions between the same VTEPs. In section 6.1: >> The procedure for demultiplexing >> packets with Your Discriminator equal to 0 is different from >> [RFC5880]. For such packets, the BFD session MUST be identified >> using the inner headers, i.e., the source IP and the destination IP >> present in the IP header carried by the payload of the VXLAN >> encapsulated packet. The VNI of the packet SHOULD be used to derive >> interface-related information for demultiplexing the packet. >> >> Hope that clarifies the use of non-zero VNI in VXLAN encapsulation of a >> BFD control packet. >> > > [ag4] This tells me how the VNI is used for BFD packets being > sent/received. What is the use case/benefit of doing that? I am creating > a special interface with VNI 0 just for BFD. Why do I now need to run BFD > on any/all of the other VNIs? As a developer, if I read this spec, should > I be building this capability or not? Basically what I'm getting at is I > think the draft should recommend using VNI 0. If there is a convincing use > case for running BFD over other VNIs serviced by that VTEP, then that needs > to be explained. But as I mentioned before, this leads to scaling issues. > So given the scaling issues, it would be good if an implementation only > needed to worry about sending BFD messages on VNI 0. > > >> >> Regards, >> Greg >> >> On Tue, Nov 20, 2018 at 12:14 PM Anoop Ghanwani <[email protected]> >> wrote: >> >>> Hi Greg, >>> >>> Please see inline prefixed by [ag3]. >>> >>> Thanks, >>> Anoop >>> >>> On Fri, Nov 16, 2018 at 5:29 PM Greg Mirsky <[email protected]> >>> wrote: >>> >>>> Hi Anoop, >>>> thank you for the discussion. Please find my responses tagged GIM3>>. >>>> Also, attached diff and the updated working version of the draft. Hope >>>> we're converging. >>>> >>>> Regards, >>>> Greg >>>> >>>> On Wed, Nov 14, 2018 at 11:00 PM Anoop Ghanwani <[email protected]> >>>> wrote: >>>> >>>>> Hi Greg, >>>>> >>>>> Please see inline prefixed with [ag2]. >>>>> >>>>> Thanks, >>>>> Anoop >>>>> >>>>> On Wed, Nov 14, 2018 at 9:45 AM Greg Mirsky <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Anoop, >>>>>> thank you for the expedient response. I am glad that some of my >>>>>> responses have addressed your concerns. Please find followup notes >>>>>> in-line >>>>>> tagged GIM2>>. I've attached the diff to highlight the updates applied in >>>>>> the working version. Let me know if these are acceptable changes. 
>>>>>> >>>>>> Regards, >>>>>> Greg >>>>>> >>>>>> On Tue, Nov 13, 2018 at 12:30 PM Anoop Ghanwani < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Hi Greg, >>>>>>> >>>>>>> Please see inline prefixed with [ag]. >>>>>>> >>>>>>> Thanks, >>>>>>> Anoop >>>>>>> >>>>>>> On Tue, Nov 13, 2018 at 11:34 AM Greg Mirsky <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Anoop, >>>>>>>> many thanks for the thorough review and detailed comments. Please >>>>>>>> find my answers, this time for real, in-line tagged GIM>>. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Greg >>>>>>>> >>>>>>>> On Thu, Nov 8, 2018 at 1:58 AM Anoop Ghanwani < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Here are my comments. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Anoop >>>>>>>>> >>>>>>>>> == >>>>>>>>> >>>>>>>>> Philosophical >>>>>>>>> >>>>>>>>> Since VXLAN is not an IETF standard, should we be defining a >>>>>>>>> standard for running BFD on it? Should we define BFD over Geneve >>>>>>>>> instead >>>>>>>>> which is the official WG selection? Is that going to be a separate >>>>>>>>> document? >>>>>>>>> GIM>> IS-IS is not on the Standard track either but that had not >>>>>>>>> prevented IETF from developing tens of standard track RFCs using RFC >>>>>>>>> 1142 >>>>>>>>> as the normative reference until RFC 7142 re-classified it as >>>>>>>>> historical. A >>>>>>>>> similar path was followed with IS-IS-TE by publishing RFC 3784 until >>>>>>>>> it was >>>>>>>>> obsoleted by RFC 5305 four years later. I understand that Down >>>>>>>>> Reference, >>>>>>>>> i.e., using informational RFC as the normative reference, is not an >>>>>>>>> unusual >>>>>>>>> situation. >>>>>>>>> >>>>>>>> >>>>>>> [ag] OK. I'm not an expert on this part so unless someone else that >>>>>>> is an expert (chairs, AD?) can comment on it, I'll just let it go. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Technical >>>>>>>>> >>>>>>>>> Section 1: >>>>>>>>> >>>>>>>>> This part needs to be rewritten: >>>>>>>>> >>> >>>>>>>>> The individual racks may be part of a different Layer 3 network, >>>>>>>>> or they could be in a single Layer 2 network. The VXLAN >>>>>>>>> segments/overlays >>>>>>>>> are overlaid on top of Layer 3 network. A VM can communicate with >>>>>>>>> another >>>>>>>>> VM only if they are on the same VXLAN segment. >>>>>>>>> >>> >>>>>>>>> It's hard to parse and, given IRB, >>>>>>>>> >>>>>>>> GIM>> Would the following text be acceptable: >>>>>>>> OLD TEXT: >>>>>>>> VXLAN is typically deployed in data centers interconnecting >>>>>>>> virtualized hosts, which may be spread across multiple racks. >>>>>>>> The >>>>>>>> individual racks may be part of a different Layer 3 network, or >>>>>>>> they >>>>>>>> could be in a single Layer 2 network. The VXLAN >>>>>>>> segments/overlays >>>>>>>> are overlaid on top of Layer 3 network. >>>>>>>> NEW TEXT: >>>>>>>> VXLAN is typically deployed in data centers interconnecting >>>>>>>> virtualized >>>>>>>> hosts of a tenant. VXLAN addresses requirements of the Layer 2 and >>>>>>>> Layer 3 data center network infrastructure in the presence of VMs >>>>>>>> in >>>>>>>> a multi-tenant environment, discussed in section 3 [RFC7348], by >>>>>>>> providing Layer 2 overlay scheme on a Layer 3 network. >>>>>>>> >>>>>>> >>>>>>> [ag] This is a lot better. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> A VM can communicate with another VM only if they are on the same >>>>>>>> VXLAN segment. >>>>>>>>> >>>>>>>>> the last sentence above is wrong. 
>>>>>>>>> >>>>>>>> GIM>> Section 4 in RFC 7348 states: >>>>>>>> Only VMs within the same VXLAN segment can communicate with each >>>>>>>> other. >>>>>>>> >>>>>>> >>>>>>> [ag] VMs on different segments can communicate using routing/IRB, so >>>>>>> even RFC 7348 is wrong. Perhaps the text should be modified to say -- >>>>>>> "In >>>>>>> the absence of a router in the overlay, a VM can communicate...". >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Section 3: >>>>>>>>> >>> >>>>>>>>> Most deployments will have VMs with only L2 capabilities that >>>>>>>>> may not support L3. >>>>>>>>> >>> >>>>>>>>> Are you suggesting most deployments have VMs with no IP >>>>>>>>> addresses/configuration? >>>>>>>>> >>>>>>>> GIM>> Would re-word as follows: >>>>>>>> OLD TEXT: >>>>>>>> Most deployments will have VMs with only L2 capabilities that >>>>>>>> may not support L3. >>>>>>>> NEW TEXT: >>>>>>>> Deployments may have VMs with only L2 capabilities that do not >>>>>>>> support L3. >>>>>>>> >>>>>>> >>>>>>> [ag] I still don't understand this. What does it mean for a VM to >>>>>>> not support L3? No IP address, no default GW, something else? >>>>>>> >>>>>> GIM2>> A VM communicates with its VTEP which, in turn, originates the VXLAN >>>>>> tunnel. A VM is not required to have an IP address as it is the VTEP's IP address >>>>>> that the VM's MAC is associated with. As for the gateway, RFC 7348 discusses the >>>>>> VXLAN >>>>>> gateway as the device that forwards traffic between VXLAN and non-VXLAN >>>>>> domains. Considering all that, would the following change be acceptable: >>>>>> OLD TEXT: >>>>>> Most deployments will have VMs with only L2 capabilities that >>>>>> may not support L3. >>>>>> NEW TEXT: >>>>>> Most deployments will have VMs with only L2 capabilities and not >>>>>> have an IP address assigned. >>>>>> >>>>> >>>>> [ag2] Do you have a reference for this (i.e. that most deployments >>>>> have VMs without an IP address)? Normally I would think VMs would have an >>>>> IP address. It's just that they are segregated into segments and, without >>>>> an intervening router, they are restricted to communicate only within >>>>> their >>>>> subnet. >>>>> >>>> GIM3>> Would the following text be acceptable: >>>> >>>> Deployments might have VMs with only L2 capabilities and not have an IP >>>> address assigned or, >>>> in other cases, VMs are assigned an IP address but are restricted to >>>> communicate only within their subnet. >>>> >>>> >>> [ag3] Yes, this is better. >>> >>> >>>>>>> >>>>>>>> >>>>>>>>> >>> >>>>>>>>> Having a hierarchical OAM model helps localize faults though it >>>>>>>>> requires additional consideration. >>>>>>>>> >>> >>>>>>>>> What are the additional considerations? >>>>>>>>> >>>>>>>> GIM>> For example, coordination of BFD intervals across the OAM >>>>>>>> layers. >>>>>>>> >>>>>>> >>>>>>> [ag] Can we mention them in the draft? >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> Would be useful to add a reference to RFC 8293 in case the reader >>>>>>>>> would like to know more about service nodes. >>>>>>>>> >>>>>>>> GIM>> I have to admit that I don't find how RFC 8293 A Framework >>>>>>>> for Multicast in Network Virtualization over Layer 3 is related to this >>>>>>>> document. Please help with an additional reference to the text of the >>>>>>>> document. >>>>>>>> >>>>>>> >>>>>>> [ag] The RFC discusses the use of service nodes which is mentioned >>>>>>> here. 
>>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> Section 4 >>>>>>>>> >>> >>>>>>>>> Separate BFD sessions can be established between the VTEPs (IP1 >>>>>>>>> and IP2) for monitoring each of the VXLAN tunnels (VNI 100 and 200). >>>>>>>>> >>> >>>>>>>>> IMO, the document should mention that this could lead to scaling >>>>>>>>> issues given that VTEPs can support well in excess of 4K VNIs. >>>>>>>>> Additionally, we should mention that with IRB, a given VNI may not >>>>>>>>> even >>>>>>>>> exist on the destination VTEP. Finally, what is the benefit of doing >>>>>>>>> this? There may be certain corner cases where it's useful (vs a >>>>>>>>> single BFD >>>>>>>>> session between the VTEPs for all VNIs) but it would be good to >>>>>>>>> explain >>>>>>>>> what those are. >>>>>>>>> >>>>>>>> GIM>> Will add text in the Security Considerations section that >>>>>>>> VTEPs should have a limit on the number of BFD sessions. >>>>>>>> >>>>>>> >>>>>>> [ag] I was hoping for two things: >>>>>>> - A mention about the scalability issue right where per-VNI BFD is >>>>>>> discussed. (Not sure why that is a security issue/consideration.) >>>>>>> >>>>>> GIM2>> I've added the following sentence in both places: >>>>>> The implementation SHOULD have a reasonable upper bound on the number >>>>>> of BFD sessions that can be created between the same pair of VTEPs. >>>>>> >>>>> >>>>> [ag2] What are the criteria for determining what is reasonable? >>>>> >>>> GIM>> I usually understand that as a requirement to make it controllable, >>>> i.e., have a configurable limit. Thus it will be up to a network operator to set >>>> the limit. >>>> >>>>> >>>>> >>>>>> - What is the benefit of running BFD per VNI between a pair of VTEPs? >>>>>>> >>>>>> GIM2>> An alternative would be to run CFM between VMs, if there's the >>>>>> need to monitor liveliness of the particular VM. Again, this is optional. >>>>>> >>>>> >>>>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows >>>>> one to monitor the liveliness of VMs. >>>>> >>>> >>> [ag3] I think you missed responding to this. I'm not sure of the value >>> of running BFD per VNI between VTEPs. What am I getting that is not >>> covered by running a single BFD session with VNI 0 between the VTEPs? >>> >>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> Sections 5.1 and 6.1 >>>>>>>>> >>>>>>>>> In 5.1 we have >>>>>>>>> >>> >>>>>>>>> The inner MAC frame carrying the BFD payload has the >>>>>>>>> following format: >>>>>>>>> ... Source IP: IP address of the originating VTEP. Destination IP: >>>>>>>>> IP address of the terminating VTEP. >>>>>>>>> >>> >>>>>>>>> >>>>>>>>> In 6.1 we have >>>>>>>>> >>> >>>>>>>>> >>>>>>>>> Since multiple BFD sessions may be running between two >>>>>>>>> VTEPs, there needs to be a mechanism for demultiplexing received BFD >>>>>>>>> >>>>>>>>> packets to the proper session. The procedure for demultiplexing >>>>>>>>> packets with Your Discriminator equal to 0 is different from [RFC5880 >>>>>>>>> <https://tools.ietf.org/html/rfc5880>]. >>>>>>>>> >>>>>>>>> *For such packets, the BFD session MUST be identified* >>>>>>>>> >>>>>>>>> *using the inner headers, i.e., the source IP and the destination IP >>>>>>>>> present in the IP header carried by the payload of the VXLAN* >>>>>>>>> >>>>>>>>> *encapsulated packet.* >>>>>>>>> >>>>>>>>> >>>>>>>>> >>> >>>>>>>>> How does this work if the source IP and dest IP are the same as >>>>>>>>> specified in 5.1? >>>>>>>>> >>>>>>>> GIM>> You're right, the destination and source IP addresses likely are >>>>>>>> the same in this case. 
Will add that the source UDP port number, along >>>>>>>> with >>>>>>>> the pair of IP addresses, MUST be used to demux received BFD control >>>>>>>> packets. Would you agree that will be sufficient? >>>>>>>> >>>>>>> >>>>>>> [ag] Yes, I think that should work. >>>>>>> >>>>>>>> >>>>>>>>> Editorial >>>>>>>>> >>>>>>>> >>>>>>> [ag] Agree with all comments on this section. >>>>>>> >>>>>>>> >>>>>>>>> - Terminology section should be renamed to acronyms. >>>>>>>>> >>>>>>>> GIM>> Accepted >>>>>>>> >>>>>>>>> - Document would benefit from a thorough editorial scrub, but >>>>>>>>> maybe that will happen once it gets to the RFC editor. >>>>>>>>> >>>>>>>> GIM>> Will certainly have helpful comments from ADs and RFC editor. >>>>>>>> >>>>>>>>> >>>>>>>>> Section 1 >>>>>>>>> >>> >>>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348 >>>>>>>>> <https://tools.ietf.org/html/rfc7348>]. provides an encapsulation >>>>>>>>> scheme that allows virtual machines (VMs) to communicate in a data >>>>>>>>> center >>>>>>>>> network. >>>>>>>>> >>> >>>>>>>>> This is not accurate. VXLAN allows you to implement an overlay to >>>>>>>>> decouple the address space of the attached hosts from that of the >>>>>>>>> network. >>>>>>>>> >>>>>>>> GIM>> Thank you for the suggested text. Will change as follows: >>>>>>>> OLD TEXT: >>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348]. >>>>>>>> provides >>>>>>>> an encapsulation scheme that allows virtual machines (VMs) to >>>>>>>> communicate in a data center network. >>>>>>>> NEW TEXT: >>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348]. >>>>>>>> provides >>>>>>>> an encapsulation scheme that allows building an overlay network >>>>>>>> by >>>>>>>> decoupling the address space of the attached virtual hosts from >>>>>>>> that of the network. >>>>>>>> >>>>>>>>> >>>>>>>>> Section 7 >>>>>>>>> >>>>>>>>> VTEP's -> VTEPs >>>>>>>>> >>>>>>>> GIM>> Yes, thank you. >>>>>>>> >>>>>>>
