Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-25 Thread Tom Herbert
On Thu, May 25, 2017 at 12:58 PM, Lizhong Jin  wrote:
> Hi Tom,
>
> Sorry for the late reply, I finally get the time to read your document. Yes,
> you are right for the Linux RFS implementation, where RFS is indexed with
> hash value. But for the NIC hardware accelerated RFS, it is not the case.
> The flow is indexed not by hash value, but 5/4/3/2-tuple exact match which
> will improve the performance flow steering. As we know, there will be
> collision when using hash value. You could refer some NIC datasheet for the
> detail. Then if NIC could not parse the inner header, it will fail to have
> same flow steering as currently doing.
>
Lizhong,

RFS is a specific mechanism that is based on a hash into a flow table;
accelerated RFS is HW acceleration for that but is still based on a
hash into a flow table that maps to CPU (see
https://www.kernel.org/doc/Documentation/networking/scaling.txt).
Neither exact match nor doing DPI to get inner hash for encapsulation
adds any value in HW as long as the UDP source port or flow label is
set with enough entropy. If your NIC vendor is saying that all this
invasive, expensive, and protocol ossifying DPI helps mechanisms like
RFS, then ask them to pony up performance numbers to prove it! ;-)
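
As an illustration only (not part of the original message), the "hash into a flow table that maps to CPU" lookup described above might be sketched as follows. The table size, the use of CRC32 as a stand-in for the hash a NIC reports, and the names `FlowTable`, `record`, and `steer` are all assumptions made for this sketch; it is not the Linux or any vendor's implementation.

```python
import zlib

TABLE_SIZE = 4096  # assumed size; a power of two so the hash can be masked


def flow_hash(src_ip, dst_ip, src_port, dst_port, proto=17):
    """Stand-in for the hash a NIC reports; CRC32 is an assumption here."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key)


class FlowTable:
    """Maps a flow hash to the CPU where the consuming application last ran."""

    def __init__(self, size=TABLE_SIZE):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.mask = size - 1
        self.cpus = [None] * size

    def record(self, h, cpu):
        # Called when the application reads from the flow on a given CPU.
        self.cpus[h & self.mask] = cpu

    def steer(self, h, default_cpu=0):
        # Colliding flows share an entry; they are simply steered together.
        cpu = self.cpus[h & self.mask]
        return default_cpu if cpu is None else cpu


table = FlowTable()
# The outer UDP source port carries the flow entropy; 6081 is the Geneve port.
h = flow_hash("192.0.2.1", "192.0.2.2", 51234, 6081)
table.record(h, cpu=3)
print(table.steer(h))
```

In this sketch a hash collision only means two flows land on the same CPU; no inner-header parsing is needed as long as the outer hash input varies per flow.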

Tom



>
>
> Regards
>
> Lizhong
>
>
> On Sun, May 7, 2017 at 12:32 AM, Tom Herbert  wrote:
>>
>> On Sat, May 6, 2017 at 9:15 AM, lizho.jin  wrote:
>> > Tom, see inline below.
>> >
>> >
>> > Regards
>> > Lizhong
>> >
>> > On 05/6/2017 23:45,Tom Herbert wrote:
>> >
>> > On Sat, May 6, 2017 at 8:37 AM, lizho.jin  wrote:
>> >> I am not referring RSS, but RFS with HW acceleration. What I
>> >>
>> >> proposed is to use hash value instead of 5-tuple to do flow steering.
>> >>
>> > RFS works as is also. The only requirement for RFS is that the hash is
>> > reasonably consistent for a flow. The host should never need to
>> > reverse engineer the hash a NIC does.
>> >
>> > [Lizhong] but the consistent requirement will not be met sometimes. Way
>> > of
>> > generating
>> >
>> > the source UDP port is privately designed. For example, what will be the
>> >
>> > rule to generate the source UDP port for the first TCP/UDP fragment
>> > packet.
>> >
>> > Some may use 5-tuple while some may use 3-tuple.
>> >
>> Or they may use the same port all the time and get no entropy at all.
>> But, all the UDP encapsulation drafts say to set UDP source port with
>> flow entry and the reference implementation (Linux) does this
>> automatically for such protocols. UDP source port without flow entry
>> is an implementation edge case that I don't think justifies the
>> complexity to solve in hardware. UDP hash work today across commodity
>> hardware to give us RSS, RPS, and RFS. Note, checksum offload is
>> similarly solves in a protocol agnostic way so we don't need explicit
>> support in NICs for that either.
>>
>> Please see
>> https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
>> for details.
>>
>> Tom
>>
>> > And because of hash confliction, many hardware accelerated RFS do not
>> >
>> > use hash to select the CPU core, but use 5-tuple to select the CPU core.
>> > While
>> >
>> > some privately designed method of source UDP port generation use very
>> > small
>> > port
>> >
>> > range which will worse the hash confliction.
>> >
>> >
>> >
>> > Tom
>> >
>> >> Sorry for the misunderstanding.
>> >>
>> >>
>> >> Regards
>> >> Lizhong
>> >>
>> >> On 05/6/2017 23:24,Tom Herbert wrote:
>> >>
>> >> On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
>> >>> Tom, thanks for the reply, see inline below.
>> >>>
>> >>> Regards
>> >>> Lizhong
>> >>>
>> >>> On 05/6/2017 00:14,Tom Herbert wrote:
>> >>>
>> >>> [Lizhong] Total option length will not solve the parser buffer issue.
>> >>> The parser buffer is located before parser, and for Geneve, implement
>> >>> 512Byte is the only way since the longest of Geneve header is
>> >>> 260Bytes. At least in some implementations as I know, hardware will
>> >>> firstly receive enough 512Bytes per packets, and send the 512Bytes to
>> >>> parser. Then parse will be able to skip over options to get inner
>> >>> payload. Did I have any misunderstanding?
>> >>>
>> >>> [Tom] Skipping header is useful so that transit devices can find the
>> >>> inner headers. The fact that there is no way to skip over an IPv6
>> >>> extension header chain to find the transport headers of a packet has
>> >>> been a source of unhappiness.
>> >>>
>> >>>
>> >>> [Lizhong] That's correct, and if we have not any working around way,
>> >>>
>> >>> some device may fail to get inner header, just like IPv6 with too many
>> >>>
>> >>> extension headers fails to parse transport header. Currently many
>> >>> chips
>> >>>
>> >>> have this IPv6 extension header limitation.
>> >>>
>> >>>
>> >>> [Tom] The parser buffer limit 

Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-25 Thread Lizhong Jin
Hi Tom,

Sorry for the late reply; I finally found the time to read your document.
Yes, you are right about the Linux RFS implementation, where RFS is indexed
by hash value. But that is not the case for NIC hardware-accelerated RFS.
There the flow is indexed not by hash value but by a 5/4/3/2-tuple exact
match, which improves flow-steering performance. As we know, there will be
collisions when using a hash value. You could refer to some NIC datasheets
for the details. So if the NIC cannot parse the inner header, it will fail to
provide the same flow steering as it does today.



Regards

Lizhong

On Sun, May 7, 2017 at 12:32 AM, Tom Herbert  wrote:

> On Sat, May 6, 2017 at 9:15 AM, lizho.jin  wrote:
> > Tom, see inline below.
> >
> >
> > Regards
> > Lizhong
> >
> > On 05/6/2017 23:45,Tom Herbert wrote:
> >
> > On Sat, May 6, 2017 at 8:37 AM, lizho.jin  wrote:
> >> I am not referring RSS, but RFS with HW acceleration. What I
> >>
> >> proposed is to use hash value instead of 5-tuple to do flow steering.
> >>
> > RFS works as is also. The only requirement for RFS is that the hash is
> > reasonably consistent for a flow. The host should never need to
> > reverse engineer the hash a NIC does.
> >
> > [Lizhong] but the consistent requirement will not be met sometimes. Way
> of
> > generating
> >
> > the source UDP port is privately designed. For example, what will be the
> >
> > rule to generate the source UDP port for the first TCP/UDP fragment
> packet.
> >
> > Some may use 5-tuple while some may use 3-tuple.
> >
> Or they may use the same port all the time and get no entropy at all.
> But, all the UDP encapsulation drafts say to set UDP source port with
> flow entry and the reference implementation (Linux) does this
> automatically for such protocols. UDP source port without flow entry
> is an implementation edge case that I don't think justifies the
> complexity to solve in hardware. UDP hash work today across commodity
> hardware to give us RSS, RPS, and RFS. Note, checksum offload is
> similarly solves in a protocol agnostic way so we don't need explicit
> support in NICs for that either.
>
> Please see
> https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
> for details.
>
> Tom
>
> > And because of hash confliction, many hardware accelerated RFS do not
> >
> > use hash to select the CPU core, but use 5-tuple to select the CPU core.
> > While
> >
> > some privately designed method of source UDP port generation use very
> small
> > port
> >
> > range which will worse the hash confliction.
> >
> >
> >
> > Tom
> >
> >> Sorry for the misunderstanding.
> >>
> >>
> >> Regards
> >> Lizhong
> >>
> >> On 05/6/2017 23:24,Tom Herbert wrote:
> >>
> >> On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
> >>> Tom, thanks for the reply, see inline below.
> >>>
> >>> Regards
> >>> Lizhong
> >>>
> >>> On 05/6/2017 00:14,Tom Herbert wrote:
> >>>
> >>> [Lizhong] Total option length will not solve the parser buffer issue.
> >>> The parser buffer is located before parser, and for Geneve, implement
> >>> 512Byte is the only way since the longest of Geneve header is
> >>> 260Bytes. At least in some implementations as I know, hardware will
> >>> firstly receive enough 512Bytes per packets, and send the 512Bytes to
> >>> parser. Then parse will be able to skip over options to get inner
> >>> payload. Did I have any misunderstanding?
> >>>
> >>> [Tom] Skipping header is useful so that transit devices can find the
> >>> inner headers. The fact that there is no way to skip over an IPv6
> >>> extension header chain to find the transport headers of a packet has
> >>> been a source of unhappiness.
> >>>
> >>>
> >>> [Lizhong] That's correct, and if we have not any working around way,
> >>>
> >>> some device may fail to get inner header, just like IPv6 with too many
> >>>
> >>> extension headers fails to parse transport header. Currently many chips
> >>>
> >>> have this IPv6 extension header limitation.
> >>>
> >>>
> >>> [Tom] The parser buffer limit applies to all headers a device wishes
> >>> to inspect (some devices still may have less than 512 byte buffers
> >>> also). The best way to deal with this is to minimize the length of
> >>> headers. Geneve TLVs each have four bytes of overhead so they are less
> >>> compact that other TLVs at similar layer (IP options, TCP options,
> >>> IPv6 options each have two bytes overhead). The tradeoff made here is
> >>> probably to simply alignment (I really don't see any rationale for
> >>> needing 24 bits to identify options). Bit-fields are still better in
> >>> this regard for being compact since there is no additional overhead
> >>> per each option.
> >>>
> >>>
> >>> [Lizhong] I suspect, a 260Bytes long Geneve header is an overload
> design.
> >>>
> >>> Since one of the purpose of NIC to parse inner header is to get a hash
> 

Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-15 Thread Joe Touch


On 5/15/2017 12:53 PM, Tom Herbert wrote:
> I'm pretty sure the latest versions of all the major OSes are setting
> the flow label 
If that hasn't happened, I agree it's likely to happen soon...

> so hopefully the motivation to do DPI will go away.
DPI isn't just based on flow management; it's also done for security
purposes. That latter case is going to get worse until the deep info
becomes obscured by encryption, IMO.

Joe

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3


Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-15 Thread Tom Herbert
On Mon, May 15, 2017 at 12:02 PM, Joe Touch  wrote:
>
>
> On 5/6/2017 8:24 AM, Tom Herbert wrote:
>> Using the entropy in the UDP port number works perfectly well to get
>> ECMP or RSS  for any UDP encapsulation including Geneve, VXLAN, GUE,
>> etc. If the UDP port number  weren't good enough then the IPv6 flow
>> label can be used (and that works for _any_ protocol not just UDP!).
>
> If the IPv6 flow is set, intermediate devices really have no business
> doing DPI to infer flows themselves.
>
> I.e., the first thing is to check the flow ID and peek further only when
> that fails (e.g., flow ID is zero).
>
I'm pretty sure the latest versions of all the major OSes are setting
the flow label, so hopefully the motivation to do DPI will go away.
It's probably up to the switch vendors now to take the flow label as input
to ECMP and to complete the deployment of IPv6...

Tom

> Joe



Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-15 Thread Joe Touch


On 5/6/2017 8:24 AM, Tom Herbert wrote:
> Using the entropy in the UDP port number works perfectly well to get
> ECMP or RSS  for any UDP encapsulation including Geneve, VXLAN, GUE,
> etc. If the UDP port number  weren't good enough then the IPv6 flow
> label can be used (and that works for _any_ protocol not just UDP!).

If the IPv6 flow label is set, intermediate devices really have no business
doing DPI to infer flows themselves.

I.e., the first thing is to check the flow ID and peek further only when
that fails (e.g., flow ID is zero).
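
As an illustration only (not part of the original message), the "check the flow ID first, peek further only when it is zero" rule might be sketched like this. The function names, the optional `deep_parse` callback, and the use of SHA-256 to pick a path are assumptions made for the sketch, not any switch's actual ECMP logic.

```python
import hashlib


def ecmp_key(src_addr, dst_addr, flow_label, deep_parse=None):
    """Build the ECMP hash input, peeking deeper only if the label is zero."""
    if flow_label != 0:
        # The sender already identified the flow; no DPI is needed.
        return (src_addr, dst_addr, flow_label)
    if deep_parse is not None:
        # Fallback: the hypothetical callback returns transport-layer fields.
        return (src_addr, dst_addr) + tuple(deep_parse())
    return (src_addr, dst_addr)


def pick_path(key, n_paths):
    """Map a key to one of n_paths next hops (hash choice is an assumption)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


key = ecmp_key("2001:db8::1", "2001:db8::2", 0xABCDE)
print(pick_path(key, 4))
```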

Joe



Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-06 Thread Tom Herbert
On Sat, May 6, 2017 at 9:05 AM, Greg Mirsky  wrote:
> Hi Tom and Lizhong,
> I the strongest terms agree with your view that intermediate nodes should
> not use DPI to do flow steering. Decisions should be based on information
> expressed in the transport layer, not derived from the payload. Otherwise,
> active OAM cannot be viewed as in-band thus making interpretation of defects
> and performance metrics less accurate.
>
IMO, if OAM requires that network nodes inspect and change data in
flight, this would be better served by IPv6 Hop-by-Hop options (modifiable
bit set). They are designed for this purpose, eliminate the need for
DPI, and work with any protocol, not just a specific UDP encapsulation.
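
As an illustration only (not part of the original message), the "modifiable bit" can be read straight from the option type octet. In the IPv6 specification the two highest-order bits of an option type give the action for nodes that do not recognize the option, and the third-highest bit says whether the option data may change en route. The function and field names below are hypothetical.

```python
def decode_option_type(opt_type):
    """Split an IPv6 option type octet into its three bit fields."""
    if not 0 <= opt_type <= 0xFF:
        raise ValueError("option type must fit in one octet")
    return {
        # Highest two bits: what to do if the option is not recognized.
        "unrecognized_action": opt_type >> 6,
        # Third-highest bit: 1 means option data may change en route.
        "may_change_en_route": bool(opt_type & 0x20),
        # Remaining five bits of the option type.
        "rest": opt_type & 0x1F,
    }


print(decode_option_type(0x20)["may_change_en_route"])
```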

Tom

> Regards,
> Greg
>
> On Sat, May 6, 2017 at 8:24 AM, Tom Herbert  wrote:
>>
>> On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
>> > Tom, thanks for the reply, see inline below.
>> >
>> > Regards
>> > Lizhong
>> >
>> > On 05/6/2017 00:14,Tom Herbert wrote:
>> >
>> > [Lizhong] Total option length will not solve the parser buffer issue.
>> > The parser buffer is located before parser, and for Geneve, implement
>> > 512Byte is the only way since the longest of Geneve header is
>> > 260Bytes. At least in some implementations as I know, hardware will
>> > firstly receive enough 512Bytes per packets, and send the 512Bytes to
>> > parser. Then parse will be able to skip over options to get inner
>> > payload. Did I have any misunderstanding?
>> >
>> > [Tom] Skipping header is useful so that transit devices can find the
>> > inner headers. The fact that there is no way to skip over an IPv6
>> > extension header chain to find the transport headers of a packet has
>> > been a source of unhappiness.
>> >
>> >
>> > [Lizhong] That's correct, and if we have not any working around way,
>> >
>> > some device may fail to get inner header, just like IPv6 with too many
>> >
>> > extension headers fails to parse transport header. Currently many chips
>> >
>> > have this IPv6 extension header limitation.
>> >
>> >
>> > [Tom] The parser buffer limit applies to all headers a device wishes
>> > to inspect (some devices still may have less than 512 byte buffers
>> > also). The best way to deal with this is to minimize the length of
>> > headers. Geneve TLVs each have four bytes of overhead so they are less
>> > compact that other TLVs at similar layer (IP options, TCP options,
>> > IPv6 options each have two bytes overhead). The tradeoff made here is
>> > probably to simply alignment (I really don't see any rationale for
>> > needing 24 bits to identify options). Bit-fields are still better in
>> > this regard for being compact since there is no additional overhead
>> > per each option.
>> >
>> >
>> > [Lizhong] I suspect, a 260Bytes long Geneve header is an overload
>> > design.
>> >
>> > Since one of the purpose of NIC to parse inner header is to get a hash
>> > value
>> >
>> > to do flow steering, one way is to define a Geneve TLV which SHOULD be
>> >
>> > at the first one to carry the hash value of inner 5-tuple, and also hash
>> > algorithm.
>> >
>> > Then NIC may only need to parse to the first Geneve TLV.
>> >
>> > Note that the source UDP port could not serve that purpose since that
>> > port
>> >
>> > number could not be able to be predicted by the receiver.
>> >
>> Using the entropy in the UDP port number works perfectly well to get
>> ECMP or RSS  for any UDP encapsulation including Geneve, VXLAN, GUE,
>> etc. If the UDP port number  weren't good enough then the IPv6 flow
>> label can be used (and that works for _any_ protocol not just UDP!).
>>
>> The goal should be to discourage intermediate devices from doing DPI
>> into transport layer payloads. It requires a bunch of protocol
>> specific logic and any interpretation may be completely wrong since
>> port numbers don't have global meaning (e.g. if a device see a UDP
>> port destined to port 6081 in the network it may or may not be
>> Geneve).
>>
>> Tom
>>
>> >
>> >
>> >
>>
>
>



Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-06 Thread Tom Herbert
On Sat, May 6, 2017 at 9:15 AM, lizho.jin  wrote:
> Tom, see inline below.
>
>
> Regards
> Lizhong
>
> On 05/6/2017 23:45,Tom Herbert wrote:
>
> On Sat, May 6, 2017 at 8:37 AM, lizho.jin  wrote:
>> I am not referring RSS, but RFS with HW acceleration. What I
>>
>> proposed is to use hash value instead of 5-tuple to do flow steering.
>>
> RFS works as is also. The only requirement for RFS is that the hash is
> reasonably consistent for a flow. The host should never need to
> reverse engineer the hash a NIC does.
>
> [Lizhong] but the consistent requirement will not be met sometimes. Way of
> generating
>
> the source UDP port is privately designed. For example, what will be the
>
> rule to generate the source UDP port for the first TCP/UDP fragment packet.
>
> Some may use 5-tuple while some may use 3-tuple.
>
Or they may use the same port all the time and get no entropy at all.
But all the UDP encapsulation drafts say to set the UDP source port with
flow entropy, and the reference implementation (Linux) does this
automatically for such protocols. A UDP source port without flow entropy
is an implementation edge case that I don't think justifies the
complexity to solve in hardware. UDP hashing works today across commodity
hardware to give us RSS, RPS, and RFS. Note that checksum offload is
similarly solved in a protocol-agnostic way, so we don't need explicit
support in NICs for that either.

Please see 
https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
for details.
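
As an illustration only (not part of the original message, and not the Linux code), setting the outer UDP source port from a hash of the inner flow might look like the sketch below. The port range, the CRC32 hash, and the function names are assumptions made for the sketch.

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535  # assumed range for this sketch


def inner_flow_hash(src_ip, dst_ip, proto, src_port, dst_port):
    """Hash the inner 5-tuple; CRC32 is a stand-in chosen for the sketch."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)


def udp_source_port(h, lo=PORT_MIN, hi=PORT_MAX):
    """Fold a flow hash into the outer UDP source port range."""
    return lo + h % (hi - lo + 1)


h = inner_flow_hash("10.0.0.1", "10.0.0.2", 6, 43210, 443)
port = udp_source_port(h)
print(PORT_MIN <= port <= PORT_MAX)
```

Every packet of the same inner flow gets the same outer source port, so a device that hashes only the outer headers still keeps the flow together.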

Tom

> And because of hash confliction, many hardware accelerated RFS do not
>
> use hash to select the CPU core, but use 5-tuple to select the CPU core.
> While
>
> some privately designed method of source UDP port generation use very small
> port
>
> range which will worse the hash confliction.
>
>
>
> Tom
>
>> Sorry for the misunderstanding.
>>
>>
>> Regards
>> Lizhong
>>
>> On 05/6/2017 23:24,Tom Herbert wrote:
>>
>> On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
>>> Tom, thanks for the reply, see inline below.
>>>
>>> Regards
>>> Lizhong
>>>
>>> On 05/6/2017 00:14,Tom Herbert wrote:
>>>
>>> [Lizhong] Total option length will not solve the parser buffer issue.
>>> The parser buffer is located before parser, and for Geneve, implement
>>> 512Byte is the only way since the longest of Geneve header is
>>> 260Bytes. At least in some implementations as I know, hardware will
>>> firstly receive enough 512Bytes per packets, and send the 512Bytes to
>>> parser. Then parse will be able to skip over options to get inner
>>> payload. Did I have any misunderstanding?
>>>
>>> [Tom] Skipping header is useful so that transit devices can find the
>>> inner headers. The fact that there is no way to skip over an IPv6
>>> extension header chain to find the transport headers of a packet has
>>> been a source of unhappiness.
>>>
>>>
>>> [Lizhong] That's correct, and if we have not any working around way,
>>>
>>> some device may fail to get inner header, just like IPv6 with too many
>>>
>>> extension headers fails to parse transport header. Currently many chips
>>>
>>> have this IPv6 extension header limitation.
>>>
>>>
>>> [Tom] The parser buffer limit applies to all headers a device wishes
>>> to inspect (some devices still may have less than 512 byte buffers
>>> also). The best way to deal with this is to minimize the length of
>>> headers. Geneve TLVs each have four bytes of overhead so they are less
>>> compact that other TLVs at similar layer (IP options, TCP options,
>>> IPv6 options each have two bytes overhead). The tradeoff made here is
>>> probably to simply alignment (I really don't see any rationale for
>>> needing 24 bits to identify options). Bit-fields are still better in
>>> this regard for being compact since there is no additional overhead
>>> per each option.
>>>
>>>
>>> [Lizhong] I suspect, a 260Bytes long Geneve header is an overload design.
>>>
>>> Since one of the purpose of NIC to parse inner header is to get a hash
>>> value
>>>
>>> to do flow steering, one way is to define a Geneve TLV which SHOULD be
>>>
>>> at the first one to carry the hash value of inner 5-tuple, and also hash
>>> algorithm.
>>>
>>> Then NIC may only need to parse to the first Geneve TLV.
>>>
>>> Note that the source UDP port could not serve that purpose since that
>>> port
>>>
>>> number could not be able to be predicted by the receiver.
>>>
>> Using the entropy in the UDP port number works perfectly well to get
>> ECMP or RSS  for any UDP encapsulation including Geneve, VXLAN, GUE,
>> etc. If the UDP port number  weren't good enough then the IPv6 flow
>> label can be used (and that works for _any_ protocol not just UDP!).
>>
>>
>> The goal should be to discourage intermediate devices from doing DPI
>> into transport layer payloads. It requires a bunch of protocol
>> specific logic and any interpretation may be completely 

Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-06 Thread lizho.jin

Tom, see inline below.

Regards
Lizhong

On 05/6/2017 23:45, Tom Herbert wrote:

On Sat, May 6, 2017 at 8:37 AM, lizho.jin  wrote:
> I am not referring RSS, but RFS with HW acceleration. What I
>
> proposed is to use hash value instead of 5-tuple to do flow steering.
>
RFS works as is also. The only requirement for RFS is that the hash is
reasonably consistent for a flow. The host should never need to
reverse engineer the hash a NIC does.

[Lizhong] But the consistency requirement will not always be met. The way of
generating the source UDP port is privately designed. For example, what will
be the rule to generate the source UDP port for the first TCP/UDP fragment
packet? Some may use the 5-tuple while some may use the 3-tuple. And because
of hash collisions, many hardware-accelerated RFS implementations do not use
a hash to select the CPU core, but use the 5-tuple instead. Meanwhile, some
privately designed methods of source UDP port generation use a very small
port range, which worsens the hash collisions.

Tom

> Sorry for the misunderstanding.
>
>
> Regards
> Lizhong
>
> On 05/6/2017 23:24,Tom Herbert wrote:
>
> On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
>> Tom, thanks for the reply, see inline below.
>>
>> Regards
>> Lizhong
>>
>> On 05/6/2017 00:14,Tom Herbert wrote:
>>
>> [Lizhong] Total option length will not solve the parser buffer issue.
>> The parser buffer is located before parser, and for Geneve, implement
>> 512Byte is the only way since the longest of Geneve header is
>> 260Bytes. At least in some implementations as I know, hardware will
>> firstly receive enough 512Bytes per packets, and send the 512Bytes to
>> parser. Then parse will be able to skip over options to get inner
>> payload. Did I have any misunderstanding?
>>
>> [Tom] Skipping header is useful so that transit devices can find the
>> inner headers. The fact that there is no way to skip over an IPv6
>> extension header chain to find the transport headers of a packet has
>> been a source of unhappiness.
>>
>>
>> [Lizhong] That's correct, and if we have not any working around way,
>>
>> some device may fail to get inner header, just like IPv6 with too many
>>
>> extension headers fails to parse transport header. Currently many chips
>>
>> have this IPv6 extension header limitation.
>>
>>
>> [Tom] The parser buffer limit applies to all headers a device wishes
>> to inspect (some devices still may have less than 512 byte buffers
>> also). The best way to deal with this is to minimize the length of
>> headers. Geneve TLVs each have four bytes of overhead so they are less
>> compact that other TLVs at similar layer (IP options, TCP options,
>> IPv6 options each have two bytes overhead). The tradeoff made here is
>> probably to simply alignment (I really don't see any rationale for
>> needing 24 bits to identify options). Bit-fields are still better in
>> this regard for being compact since there is no additional overhead
>> per each option.
>>
>>
>> [Lizhong] I suspect, a 260Bytes long Geneve header is an overload design.
>>
>> Since one of the purpose of NIC to parse inner header is to get a hash
>> value
>>
>> to do flow steering, one way is to define a Geneve TLV which SHOULD be
>>
>> at the first one to carry the hash value of inner 5-tuple, and also hash
>> algorithm.
>>
>> Then NIC may only need to parse to the first Geneve TLV.
>>
>> Note that the source UDP port could not serve that purpose since that port
>>
>> number could not be able to be predicted by the receiver.
>>
> Using the entropy in the UDP port number works perfectly well to get
> ECMP or RSS  for any UDP encapsulation including Geneve, VXLAN, GUE,
> etc. If the UDP port number  weren't good enough then the IPv6 flow
> label can be used (and that works for _any_ protocol not just UDP!).
>
>
> The goal should be to discourage intermediate devices from doing DPI
> into transport layer payloads. It requires a bunch of protocol
> specific logic and any interpretation may be completely wrong since
> port numbers don't have global meaning (e.g. if a device see a UDP
> port destined to port 6081 in the network it may or may not be
> Geneve).
>
> Tom
>
>>
>>
>>





Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-06 Thread Greg Mirsky
Hi Tom and Lizhong,
In the strongest terms, I agree with your view that intermediate nodes should
not use DPI to do flow steering. Decisions should be based on information
expressed in the transport layer, not derived from the payload. Otherwise,
active OAM cannot be viewed as in-band, which makes the interpretation of
defects and performance metrics less accurate.

Regards,
Greg

On Sat, May 6, 2017 at 8:24 AM, Tom Herbert  wrote:

> On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
> > Tom, thanks for the reply, see inline below.
> >
> > Regards
> > Lizhong
> >
> > On 05/6/2017 00:14,Tom Herbert wrote:
> >
> > [Lizhong] Total option length will not solve the parser buffer issue.
> > The parser buffer is located before parser, and for Geneve, implement
> > 512Byte is the only way since the longest of Geneve header is
> > 260Bytes. At least in some implementations as I know, hardware will
> > firstly receive enough 512Bytes per packets, and send the 512Bytes to
> > parser. Then parse will be able to skip over options to get inner
> > payload. Did I have any misunderstanding?
> >
> > [Tom] Skipping header is useful so that transit devices can find the
> > inner headers. The fact that there is no way to skip over an IPv6
> > extension header chain to find the transport headers of a packet has
> > been a source of unhappiness.
> >
> >
> > [Lizhong] That's correct, and if we have not any working around way,
> >
> > some device may fail to get inner header, just like IPv6 with too many
> >
> > extension headers fails to parse transport header. Currently many chips
> >
> > have this IPv6 extension header limitation.
> >
> >
> > [Tom] The parser buffer limit applies to all headers a device wishes
> > to inspect (some devices still may have less than 512 byte buffers
> > also). The best way to deal with this is to minimize the length of
> > headers. Geneve TLVs each have four bytes of overhead so they are less
> > compact that other TLVs at similar layer (IP options, TCP options,
> > IPv6 options each have two bytes overhead). The tradeoff made here is
> > probably to simply alignment (I really don't see any rationale for
> > needing 24 bits to identify options). Bit-fields are still better in
> > this regard for being compact since there is no additional overhead
> > per each option.
> >
> >
> > [Lizhong] I suspect, a 260Bytes long Geneve header is an overload design.
> >
> > Since one of the purpose of NIC to parse inner header is to get a hash
> value
> >
> > to do flow steering, one way is to define a Geneve TLV which SHOULD be
> >
> > at the first one to carry the hash value of inner 5-tuple, and also hash
> > algorithm.
> >
> > Then NIC may only need to parse to the first Geneve TLV.
> >
> > Note that the source UDP port could not serve that purpose since that
> port
> >
> > number could not be able to be predicted by the receiver.
> >
> Using the entropy in the UDP port number works perfectly well to get
> ECMP or RSS  for any UDP encapsulation including Geneve, VXLAN, GUE,
> etc. If the UDP port number  weren't good enough then the IPv6 flow
> label can be used (and that works for _any_ protocol not just UDP!).
>
> The goal should be to discourage intermediate devices from doing DPI
> into transport layer payloads. It requires a bunch of protocol
> specific logic and any interpretation may be completely wrong since
> port numbers don't have global meaning (e.g. if a device see a UDP
> port destined to port 6081 in the network it may or may not be
> Geneve).
>
> Tom
>
> >
> >
> >
>
>


Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-06 Thread Tom Herbert
On Sat, May 6, 2017 at 8:37 AM, lizho.jin  wrote:
> I am not referring RSS, but RFS with HW acceleration. What I
>
> proposed is to use hash value instead of 5-tuple to do flow steering.
>
RFS works as is also. The only requirement for RFS is that the hash is
reasonably consistent for a flow. The host should never need to
reverse engineer the hash a NIC does.

Tom

> Sorry for the misunderstanding.
>
>
> Regards
> Lizhong
>
> On 05/6/2017 23:24,Tom Herbert wrote:
>
> On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
>> Tom, thanks for the reply, see inline below.
>>
>> Regards
>> Lizhong
>>
>> On 05/6/2017 00:14,Tom Herbert wrote:
>>
>> [Lizhong] Total option length will not solve the parser buffer issue.
>> The parser buffer is located before parser, and for Geneve, implement
>> 512Byte is the only way since the longest of Geneve header is
>> 260Bytes. At least in some implementations as I know, hardware will
>> firstly receive enough 512Bytes per packets, and send the 512Bytes to
>> parser. Then parse will be able to skip over options to get inner
>> payload. Did I have any misunderstanding?
>>
>> [Tom] Skipping header is useful so that transit devices can find the
>> inner headers. The fact that there is no way to skip over an IPv6
>> extension header chain to find the transport headers of a packet has
>> been a source of unhappiness.
>>
>>
>> [Lizhong] That's correct, and if we have not any working around way,
>>
>> some device may fail to get inner header, just like IPv6 with too many
>>
>> extension headers fails to parse transport header. Currently many chips
>>
>> have this IPv6 extension header limitation.
>>
>>
>> [Tom] The parser buffer limit applies to all headers a device wishes
>> to inspect (some devices still may have less than 512 byte buffers
>> also). The best way to deal with this is to minimize the length of
>> headers. Geneve TLVs each have four bytes of overhead so they are less
>> compact that other TLVs at similar layer (IP options, TCP options,
>> IPv6 options each have two bytes overhead). The tradeoff made here is
>> probably to simply alignment (I really don't see any rationale for
>> needing 24 bits to identify options). Bit-fields are still better in
>> this regard for being compact since there is no additional overhead
>> per each option.
>>
>>
>> [Lizhong] I suspect a 260-byte Geneve header is an overloaded design.
>> Since one purpose of the NIC parsing the inner header is to get a hash
>> value for flow steering, one way is to define a Geneve TLV, which SHOULD
>> be the first option, to carry the hash value of the inner 5-tuple and
>> also the hash algorithm. Then the NIC may only need to parse up to the
>> first Geneve TLV. Note that the source UDP port could not serve that
>> purpose since that port number cannot be predicted by the receiver.
>>
> Using the entropy in the UDP port number works perfectly well to get
> ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE,
> etc. If the UDP port number weren't good enough then the IPv6 flow
> label can be used (and that works for _any_ protocol not just UDP!).
>
>
> The goal should be to discourage intermediate devices from doing DPI
> into transport layer payloads. It requires a bunch of protocol
> specific logic and any interpretation may be completely wrong since
> port numbers don't have global meaning (e.g. if a device sees a UDP
> packet destined to port 6081 in the network it may or may not be
> Geneve).
>
> Tom
>
>>
>>
>>

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3


Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-06 Thread lizho.jin

I am not referring to RSS, but to RFS with HW acceleration. What I
proposed is to use a hash value instead of the 5-tuple to do flow
steering. Sorry for the misunderstanding.

Regards
Lizhong

On 05/6/2017 23:24,Tom Herbert wrote: 


On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
> Tom, thanks for the reply, see inline below.
>
> Regards
> Lizhong
>
> On 05/6/2017 00:14,Tom Herbert wrote:
>
> [Lizhong] Total option length will not solve the parser buffer issue.
> The parser buffer is located before the parser, and for Geneve,
> implementing 512 bytes is the only way since the longest Geneve header
> is 260 bytes. At least in some implementations that I know of, hardware
> will first receive up to 512 bytes of each packet and send those 512
> bytes to the parser. Then the parser will be able to skip over options
> to get to the inner payload. Did I misunderstand anything?
>
> [Tom] Skipping header is useful so that transit devices can find the
> inner headers. The fact that there is no way to skip over an IPv6
> extension header chain to find the transport headers of a packet has
> been a source of unhappiness.
>
>
> [Lizhong] That's correct, and if we do not have any workaround,
> some devices may fail to reach the inner header, just as IPv6 packets
> with too many extension headers fail to have their transport header
> parsed. Currently many chips have this IPv6 extension header limitation.
>
>
> [Tom] The parser buffer limit applies to all headers a device wishes
> to inspect (some devices still may have less than 512 byte buffers
> also). The best way to deal with this is to minimize the length of
> headers. Geneve TLVs each have four bytes of overhead so they are less
> compact than other TLVs at a similar layer (IP options, TCP options,
> IPv6 options each have two bytes overhead). The tradeoff made here is
> probably to simplify alignment (I really don't see any rationale for
> needing 24 bits to identify options). Bit-fields are still better in
> this regard for being compact since there is no additional overhead
> per each option.
>
>
> [Lizhong] I suspect a 260-byte Geneve header is an overloaded design.
> Since one purpose of the NIC parsing the inner header is to get a hash
> value for flow steering, one way is to define a Geneve TLV, which SHOULD
> be the first option, to carry the hash value of the inner 5-tuple and
> also the hash algorithm. Then the NIC may only need to parse up to the
> first Geneve TLV. Note that the source UDP port could not serve that
> purpose since that port number cannot be predicted by the receiver.
>
Using the entropy in the UDP port number works perfectly well to get
ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE,
etc. If the UDP port number weren't good enough then the IPv6 flow
label can be used (and that works for _any_ protocol not just UDP!).

The goal should be to discourage intermediate devices from doing DPI
into transport layer payloads. It requires a bunch of protocol
specific logic and any interpretation may be completely wrong since
port numbers don't have global meaning (e.g. if a device sees a UDP
packet destined to port 6081 in the network it may or may not be
Geneve).

Tom

>
>
>





Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-06 Thread Tom Herbert
On Fri, May 5, 2017 at 6:39 PM, lizho.jin  wrote:
> Tom, thanks for the reply, see inline below.
>
> Regards
> Lizhong
>
> On 05/6/2017 00:14,Tom Herbert wrote:
>
> [Lizhong] Total option length will not solve the parser buffer issue.
> The parser buffer is located before the parser, and for Geneve,
> implementing 512 bytes is the only way since the longest Geneve header
> is 260 bytes. At least in some implementations that I know of, hardware
> will first receive up to 512 bytes of each packet and send those 512
> bytes to the parser. Then the parser will be able to skip over options
> to get to the inner payload. Did I misunderstand anything?
>
> [Tom] Skipping header is useful so that transit devices can find the
> inner headers. The fact that there is no way to skip over an IPv6
> extension header chain to find the transport headers of a packet has
> been a source of unhappiness.
>
>
> [Lizhong] That's correct, and if we do not have any workaround,
> some devices may fail to reach the inner header, just as IPv6 packets
> with too many extension headers fail to have their transport header
> parsed. Currently many chips have this IPv6 extension header limitation.
>
>
> [Tom] The parser buffer limit applies to all headers a device wishes
> to inspect (some devices still may have less than 512 byte buffers
> also). The best way to deal with this is to minimize the length of
> headers. Geneve TLVs each have four bytes of overhead so they are less
> compact than other TLVs at a similar layer (IP options, TCP options,
> IPv6 options each have two bytes overhead). The tradeoff made here is
> probably to simplify alignment (I really don't see any rationale for
> needing 24 bits to identify options). Bit-fields are still better in
> this regard for being compact since there is no additional overhead
> per each option.
>
>
> [Lizhong] I suspect a 260-byte Geneve header is an overloaded design.
> Since one purpose of the NIC parsing the inner header is to get a hash
> value for flow steering, one way is to define a Geneve TLV, which SHOULD
> be the first option, to carry the hash value of the inner 5-tuple and
> also the hash algorithm. Then the NIC may only need to parse up to the
> first Geneve TLV. Note that the source UDP port could not serve that
> purpose since that port number cannot be predicted by the receiver.
>
Using the entropy in the UDP port number works perfectly well to get
ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE,
etc. If the UDP port number weren't good enough then the IPv6 flow
label can be used (and that works for _any_ protocol not just UDP!).
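
To illustrate the point about source-port entropy, here is a minimal
Python sketch. It is not taken from any particular stack: CRC32 stands
in for whatever hash an implementation actually uses, the port range
49152-65535 is an assumption, and the names flow_hash,
entropy_source_port, and pick_queue are hypothetical.

```python
import ipaddress
import struct
import zlib

# Assumed range for the outer source port (illustrative only).
PORT_MIN, PORT_MAX = 49152, 65535


def flow_hash(src_ip, dst_ip, proto, sport, dport):
    """Hash a 5-tuple. CRC32 is a stand-in for a real stack's hash."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!BHH", proto, sport, dport))
    return zlib.crc32(key)


def entropy_source_port(inner_flow):
    """Encapsulator: fold the inner flow hash into the outer UDP source port."""
    return PORT_MIN + flow_hash(*inner_flow) % (PORT_MAX - PORT_MIN + 1)


def pick_queue(outer_flow, num_queues):
    """Receiver: steer on the outer headers only, no inner-header parsing."""
    return flow_hash(*outer_flow) % num_queues


# Inner TCP flow carried inside a UDP encapsulation to port 6081.
inner = ("10.0.0.1", "10.0.0.2", 6, 12345, 80)
sport = entropy_source_port(inner)
outer = ("192.0.2.1", "192.0.2.2", 17, sport, 6081)
queue = pick_queue(outer, 8)
```

Every packet of the same inner flow gets the same outer source port, so
a hash over the outer headers alone is consistent per flow, which is all
that ECMP or RSS needs.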

The goal should be to discourage intermediate devices from doing DPI
into transport layer payloads. It requires a bunch of protocol
specific logic and any interpretation may be completely wrong since
port numbers don't have global meaning (e.g. if a device sees a UDP
packet destined to port 6081 in the network it may or may not be
Geneve).

Tom

>
>
>



Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-05 Thread lizho.jin

Tom, thanks for the reply, see inline below.

Regards
Lizhong

On 05/6/2017 00:14,Tom Herbert wrote: 


[Lizhong] Total option length will not solve the parser buffer issue.
The parser buffer is located before the parser, and for Geneve,
implementing 512 bytes is the only way since the longest Geneve header
is 260 bytes. At least in some implementations that I know of, hardware
will first receive up to 512 bytes of each packet and send those 512
bytes to the parser. Then the parser will be able to skip over options
to get to the inner payload. Did I misunderstand anything?

[Tom] Skipping header is useful so that transit devices can find the
inner headers. The fact that there is no way to skip over an IPv6
extension header chain to find the transport headers of a packet has
been a source of unhappiness.

[Lizhong] That's correct, and if we do not have any workaround,
some devices may fail to reach the inner header, just as IPv6 packets
with too many extension headers fail to have their transport header
parsed. Currently many chips have this IPv6 extension header limitation.

[Tom] The parser buffer limit applies to all headers a device wishes
to inspect (some devices still may have less than 512 byte buffers
also). The best way to deal with this is to minimize the length of
headers. Geneve TLVs each have four bytes of overhead so they are less
compact than other TLVs at a similar layer (IP options, TCP options,
IPv6 options each have two bytes overhead). The tradeoff made here is
probably to simplify alignment (I really don't see any rationale for
needing 24 bits to identify options). Bit-fields are still better in
this regard for being compact since there is no additional overhead
per each option.

[Lizhong] I suspect a 260-byte Geneve header is an overloaded design.
Since one purpose of the NIC parsing the inner header is to get a hash
value for flow steering, one way is to define a Geneve TLV, which SHOULD
be the first option, to carry the hash value of the inner 5-tuple and
also the hash algorithm. Then the NIC may only need to parse up to the
first Geneve TLV. Note that the source UDP port could not serve that
purpose since that port number cannot be predicted by the receiver.




Re: [nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-05 Thread Tom Herbert
[Lizhong] Total option length will not solve the parser buffer issue.
The parser buffer is located before the parser, and for Geneve,
implementing 512 bytes is the only way since the longest Geneve header
is 260 bytes. At least in some implementations that I know of, hardware
will first receive up to 512 bytes of each packet and send those 512
bytes to the parser. Then the parser will be able to skip over options
to get to the inner payload. Did I misunderstand anything?

[Tom] Skipping header is useful so that transit devices can find the
inner headers. The fact that there is no way to skip over an IPv6
extension header chain to find the transport headers of a packet has
been a source of unhappiness.
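
As an illustration of how a total options length lets a parser skip to
the inner payload, here is a minimal Python sketch. It assumes the
Geneve layout of an 8-byte fixed header whose first byte holds a 2-bit
version and a 6-bit Opt Len counted in 4-byte units, which is also where
the 260-byte maximum comes from (8 + 63 * 4). The function names are
hypothetical.

```python
GENEVE_BASE_LEN = 8  # fixed part of the Geneve header, in bytes


def geneve_header_len(packet: bytes) -> int:
    """Return the full Geneve header length: fixed header plus options."""
    if len(packet) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    # Low 6 bits of the first byte: options length in 4-byte words.
    opt_len_words = packet[0] & 0x3F
    return GENEVE_BASE_LEN + opt_len_words * 4


def inner_payload(packet: bytes) -> bytes:
    """Skip the Geneve header and all options without walking the TLVs."""
    hdr_len = geneve_header_len(packet)
    if len(packet) < hdr_len:
        raise ValueError("options extend past the end of the buffer")
    return packet[hdr_len:]


# Example: Opt Len = 2 words (8 bytes of options), then the inner frame.
pkt = bytes([0x02, 0x00, 0x65, 0x58, 0x00, 0x00, 0x01, 0x00]) + bytes(8) + b"inner"
payload = inner_payload(pkt)
max_len = geneve_header_len(bytes([0x3F]) + bytes(7))
```

The second length check is where the parser buffer concern shows up: the
offset is known after reading one byte, but the bytes at that offset
still have to be inside whatever the hardware handed to the parser.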

[Tom] The parser buffer limit applies to all headers a device wishes
to inspect (some devices still may have less than 512 byte buffers
also). The best way to deal with this is to minimize the length of
headers. Geneve TLVs each have four bytes of overhead so they are less
compact than other TLVs at a similar layer (IP options, TCP options,
IPv6 options each have two bytes overhead). The tradeoff made here is
probably to simplify alignment (I really don't see any rationale for
needing 24 bits to identify options). Bit-fields are still better in
this regard for being compact since there is no additional overhead
per each option.
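
The overhead comparison can be made concrete with a small Python sketch.
The per-option overheads of four and two bytes come from the paragraph
above; the example option sizes, the 2-byte flags field for the
bit-field case, and the function names are assumptions for illustration,
and padding and alignment are ignored.

```python
GENEVE_TLV_OVERHEAD = 4   # option class (16 bits) + type (8) + flags/length (8)
COMPACT_TLV_OVERHEAD = 2  # type (8 bits) + length (8 bits)


def tlv_total(data_lens, overhead):
    """Total bytes for a list of options, each carrying a per-option header."""
    return sum(overhead + n for n in data_lens)


def bitfield_total(data_lens, flags_len=2):
    """Total bytes when one shared flags field marks which fields are present."""
    return flags_len + sum(data_lens)


# Three hypothetical options carrying 4, 4, and 8 bytes of data.
options = [4, 4, 8]
geneve_bytes = tlv_total(options, GENEVE_TLV_OVERHEAD)
compact_bytes = tlv_total(options, COMPACT_TLV_OVERHEAD)
bitfield_bytes = bitfield_total(options)
```

With these numbers the same 16 bytes of option data cost 28 bytes as
Geneve TLVs, 22 bytes as two-byte-header TLVs, and 18 bytes with
bit-fields.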



[nvo3] One comment for draft-dt-nvo3-encap-01

2017-05-03 Thread lizho.jin

Hi authors,

One comment for the section 7 "Design team recommendations":

2. Geneve has the total options length that allow skipping over the
   options for NIC offload operations, and will allow transit devices to
   view flow information in the inner payload.

[Lizhong] Total option length will not solve the parser buffer issue.
The parser buffer is located before the parser, and for Geneve,
implementing 512 bytes is the only way since the longest Geneve header
is 260 bytes. At least in some implementations that I know of, hardware
will first receive up to 512 bytes of each packet and send those 512
bytes to the parser. Then the parser will be able to skip over options
to get to the inner payload. Did I misunderstand anything?

Regards
Lizhong
