Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On Thu, May 25, 2017 at 12:58 PM, Lizhong Jin wrote:
> Hi Tom,
>
> Sorry for the late reply, I finally got the time to read your document. Yes,
> you are right about the Linux RFS implementation, where RFS is indexed by
> hash value. But for NIC hardware accelerated RFS, that is not the case.
> The flow is indexed not by hash value but by a 5/4/3/2-tuple exact match,
> which improves flow steering performance. As we know, there will be
> collisions when using a hash value. You could refer to a NIC datasheet for
> the details. So if the NIC cannot parse the inner header, it will fail to
> provide the same flow steering as it does currently.
>

Lizhong,

RFS is a specific mechanism that is based on a hash into a flow table; accelerated RFS is HW acceleration for that, but it is still based on a hash into a flow table that maps to a CPU (see https://www.kernel.org/doc/Documentation/networking/scaling.txt). Neither exact match nor doing DPI to get an inner hash for encapsulation adds any value in HW as long as the UDP source port or flow label is set with enough entropy. If your NIC vendor is saying that all this invasive, expensive, and protocol-ossifying DPI helps mechanisms like RFS, then ask them to pony up performance numbers to prove it! ;-)

Tom

> Regards
> Lizhong
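A minimal sketch of the hash-into-a-flow-table idea described in the reply above, assuming a CRC32 stand-in for the real hash and a made-up table size. The names `outer_flow_hash`, `FlowTable`, `record`, and `steer` are hypothetical and do not correspond to kernel or NIC APIs. The point it illustrates is that steering needs only a hash that is consistent per flow, computed here from outer headers alone.

```python
import zlib

TABLE_SIZE = 256    # hypothetical flow-table size, not taken from any NIC


def outer_flow_hash(src_ip, dst_ip, src_port, dst_port, proto=17):
    """Stand-in hash over the outer headers only (CRC32, not a real NIC hash)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key)


class FlowTable:
    """Hash-indexed table mapping a flow hash to the CPU that consumes the flow."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.cpu_for_slot = [None] * size

    def record(self, flow_hash, cpu):
        # Remember which CPU the consumer of this flow was last seen on.
        self.cpu_for_slot[flow_hash % self.size] = cpu

    def steer(self, flow_hash, default_cpu=0):
        # Look up the slot; fall back to a default CPU if nothing is recorded.
        cpu = self.cpu_for_slot[flow_hash % self.size]
        return default_cpu if cpu is None else cpu


if __name__ == "__main__":
    table = FlowTable()
    # The encapsulator is assumed to have put flow entropy in the UDP source port.
    h = outer_flow_hash("2001:db8::1", "2001:db8::2", 51000, 6081)
    table.record(h, cpu=3)
    print(table.steer(h))
```

Two flows that land in the same slot share a CPU, which is the collision cost raised earlier in the thread; the sketch makes no claim about how often that happens on real hardware.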
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
Hi Tom,

Sorry for the late reply, I finally got the time to read your document. Yes, you are right about the Linux RFS implementation, where RFS is indexed by hash value. But for NIC hardware accelerated RFS, that is not the case. The flow is indexed not by hash value but by a 5/4/3/2-tuple exact match, which improves flow steering performance. As we know, there will be collisions when using a hash value. You could refer to a NIC datasheet for the details. So if the NIC cannot parse the inner header, it will fail to provide the same flow steering as it does currently.

Regards
Lizhong
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On 5/15/2017 12:53 PM, Tom Herbert wrote:
> I'm pretty sure the latest versions of all the major OSes are setting
> the flow label

If that hasn't happened, I agree it's likely to happen soon...

> so hopefully the motivation to do DPI will go away.

DPI isn't just based on flow management; it's also done for security purposes. That latter case is going to get worse until the deep info becomes obscured by encryption, IMO.

Joe

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On Mon, May 15, 2017 at 12:02 PM, Joe Touch wrote:
>
> On 5/6/2017 8:24 AM, Tom Herbert wrote:
>> Using the entropy in the UDP port number works perfectly well to get
>> ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE,
>> etc. If the UDP port number weren't good enough then the IPv6 flow
>> label can be used (and that works for _any_ protocol not just UDP!).
>
> If the IPv6 flow is set, intermediate devices really have no business
> doing DPI to infer flows themselves.
>
> I.e., the first thing is to check the flow ID and peek further only when
> that fails (e.g., flow ID is zero).
>

I'm pretty sure the latest versions of all the major OSes are setting the flow label, so hopefully the motivation to do DPI will go away. It's probably up to the switch vendors now to take the flow label as input to ECMP and to complete the deployment of IPv6...

Tom

> Joe
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On 5/6/2017 8:24 AM, Tom Herbert wrote:
> Using the entropy in the UDP port number works perfectly well to get
> ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE,
> etc. If the UDP port number weren't good enough then the IPv6 flow
> label can be used (and that works for _any_ protocol not just UDP!).

If the IPv6 flow is set, intermediate devices really have no business doing DPI to infer flows themselves.

I.e., the first thing is to check the flow ID and peek further only when that fails (e.g., flow ID is zero).

Joe
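The "check the flow ID first, peek further only when it is zero" rule from the message above can be sketched as follows. This is an illustration under stated assumptions, not any switch's actual behaviour: `ecmp_key` and `pick_next_hop` are hypothetical names, CRC32 stands in for whatever hash a device uses, and `fallback_fields` represents whatever deeper fields a device might otherwise extract.

```python
import zlib


def ecmp_key(src_ip, dst_ip, flow_label, fallback_fields=()):
    """Build an ECMP hash key, preferring the 20-bit IPv6 flow label.

    Deeper fields are consulted only when the flow label is zero (unset).
    """
    if flow_label != 0:
        material = (src_ip, dst_ip, flow_label)
    else:
        material = (src_ip, dst_ip) + tuple(fallback_fields)
    return zlib.crc32(repr(material).encode())


def pick_next_hop(key, next_hops):
    # Map the key onto one of the equal-cost next hops.
    return next_hops[key % len(next_hops)]


if __name__ == "__main__":
    hops = ["hop-a", "hop-b", "hop-c"]
    key = ecmp_key("2001:db8::1", "2001:db8::2", flow_label=0x12345)
    print(pick_next_hop(key, hops))
```

With a non-zero flow label the deeper fields never influence the result, so the device does not need protocol-specific parsing for that packet.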
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On Sat, May 6, 2017 at 9:05 AM, Greg Mirsky wrote:
> Hi Tom and Lizhong,
> In the strongest terms, I agree with your view that intermediate nodes should
> not use DPI to do flow steering. Decisions should be based on information
> expressed in the transport layer, not derived from the payload. Otherwise,
> active OAM cannot be viewed as in-band, thus making interpretation of defects
> and performance metrics less accurate.
>

IMO, if OAM requires that network nodes inspect and change data in flight, this would be better served by IPv6 Hop-by-Hop options (modifiable bit set). They are designed for this purpose, eliminate the need for DPI, and work with any protocol, not just a specific UDP encapsulation.

Tom

> Regards,
> Greg
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On Sat, May 6, 2017 at 9:15 AM, lizho.jin wrote:
> Tom, see inline below.
>
> Regards
> Lizhong
>
> On 05/6/2017 23:45, Tom Herbert wrote:
>
> On Sat, May 6, 2017 at 8:37 AM, lizho.jin wrote:
>> I am not referring to RSS, but to RFS with HW acceleration. What I
>> proposed is to use a hash value instead of the 5-tuple to do flow steering.
>>
> RFS works as is also. The only requirement for RFS is that the hash is
> reasonably consistent for a flow. The host should never need to
> reverse engineer the hash a NIC does.
>
> [Lizhong] But the consistency requirement will not be met sometimes. The
> way of generating the source UDP port is privately designed. For example,
> what will be the rule to generate the source UDP port for the first
> TCP/UDP fragment packet? Some may use the 5-tuple while some may use the
> 3-tuple.
>

Or they may use the same port all the time and get no entropy at all. But all the UDP encapsulation drafts say to set the UDP source port with flow entropy, and the reference implementation (Linux) does this automatically for such protocols. A UDP source port without flow entropy is an implementation edge case that I don't think justifies the complexity to solve in hardware. The UDP hash works today across commodity hardware to give us RSS, RPS, and RFS. Note, checksum offload is similarly solved in a protocol-agnostic way, so we don't need explicit support in NICs for that either.

Please see https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf for details.

Tom

> And because of hash collisions, many hardware accelerated RFS
> implementations do not use a hash to select the CPU core, but use the
> 5-tuple to select the CPU core. Meanwhile, some privately designed methods
> of source UDP port generation use a very small port range, which will
> worsen the hash collisions.
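A rough sketch of how an encapsulator might derive the outer UDP source port from the inner flow, as the reply above describes. Everything here is an assumption for illustration: CRC32 stands in for the real flow hash, the 49152-65535 range is one plausible choice, and `inner_flow_hash`, `udp_source_port`, and `distinct_ports` are hypothetical helpers rather than Linux functions.

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535   # assumed ephemeral range for the outer source port


def inner_flow_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Stand-in hash over the inner 5-tuple (CRC32, chosen only for the sketch)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key)


def udp_source_port(flow_hash, port_min=PORT_MIN, port_max=PORT_MAX):
    # Fold the hash into the allowed port range; same flow, same port.
    return port_min + flow_hash % (port_max - port_min + 1)


def distinct_ports(n_flows, port_min, port_max):
    """Count how many different source ports n_flows synthetic flows map to."""
    ports = set()
    for i in range(n_flows):
        h = inner_flow_hash("10.0.0.1", "10.0.0.2", 1024 + i, 80, 6)
        ports.add(udp_source_port(h, port_min, port_max))
    return len(ports)


if __name__ == "__main__":
    print(distinct_ports(1000, PORT_MIN, PORT_MAX))   # wide range
    print(distinct_ports(1000, 49152, 49155))         # very small range
```

The second call can never report more than four distinct ports, which is the "very small port range" concern quoted above: the receiver's hash then has little entropy to work with no matter how good it is.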
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
Tom, see inline below.

Regards
Lizhong

On 05/6/2017 23:45, Tom Herbert wrote:

On Sat, May 6, 2017 at 8:37 AM, lizho.jin wrote:
> I am not referring to RSS, but to RFS with HW acceleration. What I
> proposed is to use a hash value instead of the 5-tuple to do flow steering.
>
RFS works as is also. The only requirement for RFS is that the hash is
reasonably consistent for a flow. The host should never need to
reverse engineer the hash a NIC does.

[Lizhong] But the consistency requirement will not be met sometimes. The way of generating the source UDP port is privately designed. For example, what will be the rule to generate the source UDP port for the first TCP/UDP fragment packet? Some may use the 5-tuple while some may use the 3-tuple. And because of hash collisions, many hardware accelerated RFS implementations do not use a hash to select the CPU core, but use the 5-tuple to select the CPU core. Meanwhile, some privately designed methods of source UDP port generation use a very small port range, which will worsen the hash collisions.

Tom

> Sorry for the misunderstanding.
>
> Regards
> Lizhong
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
Hi Tom and Lizhong,

In the strongest terms, I agree with your view that intermediate nodes should not use DPI to do flow steering. Decisions should be based on information expressed in the transport layer, not derived from the payload. Otherwise, active OAM cannot be viewed as in-band, thus making interpretation of defects and performance metrics less accurate.

Regards,
Greg
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On Sat, May 6, 2017 at 8:37 AM, lizho.jin wrote:
> I am not referring to RSS, but to RFS with HW acceleration. What I
> proposed is to use a hash value instead of the 5-tuple to do flow steering.
>

RFS works as is also. The only requirement for RFS is that the hash is reasonably consistent for a flow. The host should never need to reverse engineer the hash a NIC does.

Tom

> Sorry for the misunderstanding.
>
> Regards
> Lizhong
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
I am not referring to RSS, but to RFS with HW acceleration. What I proposed is to use the hash value instead of the 5-tuple to do flow steering.

Sorry for the misunderstanding.

Regards
Lizhong

On 05/6/2017 23:24, Tom Herbert wrote:
> Using the entropy in the UDP port number works perfectly well to get
> ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE,
> etc. If the UDP port number weren't good enough then the IPv6 flow
> label can be used (and that works for _any_ protocol, not just UDP!).
>
> The goal should be to discourage intermediate devices from doing DPI
> into transport layer payloads. It requires a bunch of protocol
> specific logic and any interpretation may be completely wrong since
> port numbers don't have global meaning (e.g. if a device sees a UDP
> packet destined to port 6081 in the network it may or may not be
> Geneve).
>
> Tom
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
On Fri, May 5, 2017 at 6:39 PM, lizho.jin wrote:
> [Lizhong] I suspect that a 260-byte Geneve header is an overloaded design.
> Since one of the purposes of a NIC parsing the inner header is to get a
> hash value to do flow steering, one way is to define a Geneve TLV, which
> SHOULD be the first one, to carry the hash value of the inner 5-tuple and
> also the hash algorithm. Then the NIC may only need to parse up to the
> first Geneve TLV. Note that the source UDP port cannot serve that purpose,
> since that port number cannot be predicted by the receiver.

Using the entropy in the UDP port number works perfectly well to get ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE, etc. If the UDP port number weren't good enough then the IPv6 flow label can be used (and that works for _any_ protocol, not just UDP!).

The goal should be to discourage intermediate devices from doing DPI into transport layer payloads. It requires a bunch of protocol specific logic and any interpretation may be completely wrong since port numbers don't have global meaning (e.g. if a device sees a UDP packet destined to port 6081 in the network it may or may not be Geneve).

Tom
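A minimal Python sketch of what "entropy in the UDP port number" means for an encapsulator: hash the inner 5-tuple once at the sender and fold the result into the outer UDP source port, so every packet of a flow carries the same port and receivers or transit devices can spread load without parsing the inner headers. The function name `entropy_source_port`, the use of CRC32 as the hash, and the 49152-65535 range are assumptions made for illustration; a real implementation would use whatever flow hash the stack already computes.

```python
import zlib

EPHEMERAL_BASE = 49152   # assumed start of the dynamic port range
EPHEMERAL_COUNT = 16384  # 49152..65535 inclusive

def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        sport: int, dport: int) -> int:
    """Derive an outer UDP source port from the inner 5-tuple (illustrative)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    # CRC32 is a stand-in hash; only per-flow consistency matters here.
    flow_hash = zlib.crc32(key)
    return EPHEMERAL_BASE + (flow_hash % EPHEMERAL_COUNT)
```

The receiver does not need to predict or reverse this value. It only needs the outer 5-tuple to hash consistently for the lifetime of the flow.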
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
Tom, thanks for the reply, see inline below.

Regards
Lizhong

On 05/6/2017 00:14, Tom Herbert wrote:

> [Tom] Skipping header is useful so that transit devices can find the
> inner headers. The fact that there is no way to skip over an IPv6
> extension header chain to find the transport headers of a packet has
> been a source of unhappiness.

[Lizhong] That's correct, and if we do not have any workaround, some devices may fail to get the inner header, just as a packet with too many IPv6 extension headers fails to have its transport header parsed. Currently many chips have this IPv6 extension header limitation.

> [Tom] The parser buffer limit applies to all headers a device wishes
> to inspect (some devices still may have less than 512 byte buffers
> also). The best way to deal with this is to minimize the length of
> headers. Geneve TLVs each have four bytes of overhead so they are less
> compact than other TLVs at similar layer (IP options, TCP options,
> IPv6 options each have two bytes overhead). The tradeoff made here is
> probably to simplify alignment (I really don't see any rationale for
> needing 24 bits to identify options). Bit-fields are still better in
> this regard for being compact since there is no additional overhead
> per each option.

[Lizhong] I suspect that a 260-byte Geneve header is an overloaded design. Since one of the purposes of a NIC parsing the inner header is to get a hash value to do flow steering, one way is to define a Geneve TLV, which SHOULD be the first one, to carry the hash value of the inner 5-tuple and also the hash algorithm. Then the NIC may only need to parse up to the first Geneve TLV. Note that the source UDP port cannot serve that purpose, since that port number cannot be predicted by the receiver.
Re: [nvo3] One comment for draft-dt-nvo3-encap-01
[Lizhong] Total option length will not solve the parser buffer issue. The parser buffer is located before the parser, and for Geneve, implementing a 512-byte buffer is the only way, since the longest Geneve header is 260 bytes. At least in some implementations that I know of, hardware will first receive 512 bytes of each packet and send those 512 bytes to the parser. Then the parser will be able to skip over options to get to the inner payload. Have I misunderstood anything?

[Tom] Skipping header is useful so that transit devices can find the inner headers. The fact that there is no way to skip over an IPv6 extension header chain to find the transport headers of a packet has been a source of unhappiness.

[Tom] The parser buffer limit applies to all headers a device wishes to inspect (some devices still may have less than 512 byte buffers also). The best way to deal with this is to minimize the length of headers. Geneve TLVs each have four bytes of overhead so they are less compact than other TLVs at similar layer (IP options, TCP options, IPv6 options each have two bytes overhead). The tradeoff made here is probably to simplify alignment (I really don't see any rationale for needing 24 bits to identify options). Bit-fields are still better in this regard for being compact since there is no additional overhead per each option.
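The overhead comparison above can be made concrete with a little arithmetic. The Python sketch below assumes the Geneve option space is at most 252 bytes (a 6-bit length field counted in 4-byte words) and deliberately ignores padding and alignment rules, so the numbers are a rough illustration rather than an exact accounting. The helper names `tlv_bytes` and `max_options` are hypothetical.

```python
GENEVE_OPT_SPACE = 63 * 4  # assumed maximum option bytes in one Geneve header

def tlv_bytes(data_lens, per_option_overhead):
    """Total bytes used by a list of TLVs, given the per-option header size."""
    return sum(per_option_overhead + n for n in data_lens)

def max_options(data_len, per_option_overhead, space=GENEVE_OPT_SPACE):
    """How many equal-sized options fit into the available option space."""
    return space // (per_option_overhead + data_len)

# Three options carrying 4 bytes of data each:
geneve_style = tlv_bytes([4, 4, 4], per_option_overhead=4)  # 4-byte option header
ip_style = tlv_bytes([4, 4, 4], per_option_overhead=2)      # 2-byte type/length
bit_fields = tlv_bytes([4, 4, 4], per_option_overhead=0)    # flags in fixed header
```

Under these assumptions the three encodings take 24, 18, and 12 bytes respectively, which is the compactness ordering described in the message.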
[nvo3] One comment for draft-dt-nvo3-encap-01
Hi authors,

One comment for section 7, "Design team recommendations":

   2. Geneve has the total options length that allow skipping over the options for NIC offload operations, and will allow transit devices to view flow information in the inner payload.

[Lizhong] Total option length will not solve the parser buffer issue. The parser buffer is located before the parser, and for Geneve, implementing a 512-byte buffer is the only way, since the longest Geneve header is 260 bytes. At least in some implementations that I know of, hardware will first receive 512 bytes of each packet and send those 512 bytes to the parser. Then the parser will be able to skip over options to get to the inner payload. Have I misunderstood anything?

Regards
Lizhong
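For readers unfamiliar with the field being discussed, here is a minimal Python sketch of how a parser could use the total options length to jump to the inner payload, and where the 260-byte figure comes from. It assumes the Geneve layout of an 8-byte fixed header whose first byte holds a 2-bit version and a 6-bit option length in 4-byte words; the function name `geneve_inner_offset` is hypothetical. The sketch only computes an offset. It does not address the concern raised above, namely that the bytes must already be in the parser buffer before the offset can be used.

```python
GENEVE_BASE_LEN = 8  # fixed part of the Geneve header, in bytes

# Largest possible header: fixed part plus 63 four-byte words of options.
MAX_GENEVE_HEADER_LEN = GENEVE_BASE_LEN + 0x3F * 4

def geneve_inner_offset(header: bytes) -> int:
    """Return the offset of the inner payload from the start of the Geneve header."""
    if len(header) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    first = header[0]
    if first >> 6 != 0:
        raise ValueError("unsupported Geneve version")
    opt_len_words = first & 0x3F  # total options length, in 4-byte units
    return GENEVE_BASE_LEN + opt_len_words * 4
```

With no options the offset is 8; with the length field at its maximum it is 260, matching the longest header mentioned in the comment.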