Hi Tom,

Sorry for the late reply; I finally found the time to read your document.
Yes, you are right about the Linux RFS implementation, where the flow table
is indexed by hash value. But that is not the case for NIC hardware-accelerated
RFS. There the flow is indexed not by hash value but by an exact match on the
5/4/3/2-tuple, which improves flow steering performance. As we know, hash
values can collide. You could refer to a NIC datasheet for the details. So if
the NIC cannot parse the inner header, it will fail to provide the same flow
steering as it does today.
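
To make the difference concrete, here is a minimal sketch in Python. It is
purely illustrative: the class names, the CRC32 stand-in hash and the tiny
table size are assumptions for the example, not any NIC's or the kernel's
actual design. It shows how two flows that land in the same slot of a
hash-indexed table overwrite each other's CPU, while an exact-match table
keyed on the full tuple keeps them apart.

```python
import zlib
from typing import Dict, List, Optional, Tuple

# (src_ip, dst_ip, protocol, src_port, dst_port)
FlowKey = Tuple[str, str, int, int, int]


def flow_hash(key: FlowKey) -> int:
    """Stand-in hash (CRC32), chosen only because it is deterministic."""
    return zlib.crc32(repr(key).encode())


class HashIndexedTable:
    """Steering table indexed by hash: colliding flows share one slot."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[int]] = [None] * size

    def index(self, key: FlowKey) -> int:
        return flow_hash(key) % len(self.slots)

    def steer(self, key: FlowKey, cpu: int) -> None:
        # Overwrites whatever flow previously owned this slot.
        self.slots[self.index(key)] = cpu

    def lookup(self, key: FlowKey) -> Optional[int]:
        return self.slots[self.index(key)]


class ExactMatchTable:
    """Steering table keyed by the full tuple: distinct flows never collide."""

    def __init__(self) -> None:
        self.rules: Dict[FlowKey, int] = {}

    def steer(self, key: FlowKey, cpu: int) -> None:
        self.rules[key] = cpu

    def lookup(self, key: FlowKey) -> Optional[int]:
        return self.rules.get(key)


def find_collision(
    table: HashIndexedTable, flows: List[FlowKey]
) -> Optional[Tuple[FlowKey, FlowKey]]:
    """Return the first pair of flows that map to the same slot, if any."""
    seen: Dict[int, FlowKey] = {}
    for key in flows:
        idx = table.index(key)
        if idx in seen:
            return seen[idx], key
        seen[idx] = key
    return None


def demo() -> None:
    # Nine flows into eight slots: a collision is guaranteed.
    flows = [("10.0.0.1", "10.0.0.2", 6, 40000 + i, 443) for i in range(9)]
    hashed, exact = HashIndexedTable(size=8), ExactMatchTable()
    pair = find_collision(hashed, flows)
    assert pair is not None
    a, b = pair
    for table in (hashed, exact):
        table.steer(a, 0)  # flow a should go to CPU 0
        table.steer(b, 1)  # flow b should go to CPU 1
    print("hash-indexed: flow a ->", hashed.lookup(a))  # steered to b's CPU
    print("exact-match:  flow a ->", exact.lookup(a))   # still CPU 0


if __name__ == "__main__":
    demo()
```

The exact-match table can only be populated if the device can extract the
tuple it matches on, which is why parsing the inner header matters here.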



Regards

Lizhong

On Sun, May 7, 2017 at 12:32 AM, Tom Herbert <t...@herbertland.com> wrote:

> On Sat, May 6, 2017 at 9:15 AM, lizho.jin <lizho....@gmail.com> wrote:
> > Tom, see inline below.
> >
> >
> > Regards
> > Lizhong
> >
> > On 05/6/2017 23:45,Tom Herbert<t...@herbertland.com> wrote:
> >
> > On Sat, May 6, 2017 at 8:37 AM, lizho.jin <lizho....@gmail.com> wrote:
> >> I am not referring to RSS, but to RFS with HW acceleration. What I
> >> proposed is to use the hash value instead of the 5-tuple to do flow
> >> steering.
> >>
> > RFS works as is also. The only requirement for RFS is that the hash is
> > reasonably consistent for a flow. The host should never need to
> > reverse engineer the hash a NIC does.
> >
> > [Lizhong] But the consistency requirement will not always be met. The way
> > of generating the source UDP port is privately designed. For example, what
> > is the rule for generating the source UDP port for the first TCP/UDP
> > fragment packet? Some may use the 5-tuple while some may use the 3-tuple.
> >
> Or they may use the same port all the time and get no entropy at all.
> But all the UDP encapsulation drafts say to set the UDP source port with
> flow entropy, and the reference implementation (Linux) does this
> automatically for such protocols. A UDP source port without flow entropy
> is an implementation edge case that I don't think justifies the
> complexity of solving it in hardware. UDP hashing works today across
> commodity hardware to give us RSS, RPS, and RFS. Note that checksum
> offload is similarly solved in a protocol-agnostic way, so we don't need
> explicit support in NICs for that either.
>
> Please see
> https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
> for details.
>
> Tom
>
> > And because of hash collisions, many hardware-accelerated RFS
> > implementations do not use the hash to select the CPU core, but use the
> > 5-tuple instead. Meanwhile, some privately designed methods of source UDP
> > port generation use a very small port range, which worsens the hash
> > collisions.
> >
> >
> >
> > Tom
> >
> >> Sorry for the misunderstanding.
> >>
> >>
> >> Regards
> >> Lizhong
> >>
> >> On 05/6/2017 23:24,Tom Herbert<t...@herbertland.com> wrote:
> >>
> >> On Fri, May 5, 2017 at 6:39 PM, lizho.jin <lizho....@gmail.com> wrote:
> >>> Tom, thanks for the reply, see inline below.
> >>>
> >>> Regards
> >>> Lizhong
> >>>
> >>> On 05/6/2017 00:14,Tom Herbert<t...@herbertland.com> wrote:
> >>>
> >>> [Lizhong] Total option length will not solve the parser buffer issue.
> >>> The parser buffer is located before the parser, and for Geneve,
> >>> implementing a 512-byte buffer is the only way, since the longest Geneve
> >>> header is 260 bytes. At least in some implementations that I know of,
> >>> the hardware will first receive 512 bytes of each packet and send those
> >>> 512 bytes to the parser. Then the parser will be able to skip over the
> >>> options to get to the inner payload. Have I misunderstood anything?
> >>>
> >>> [Tom] Skipping headers is useful so that transit devices can find the
> >>> inner headers. The fact that there is no way to skip over an IPv6
> >>> extension header chain to find the transport headers of a packet has
> >>> been a source of unhappiness.
> >>>
> >>>
> >>> [Lizhong] That's correct, and if we do not have any workaround, some
> >>> devices may fail to get the inner header, just as they fail to parse
> >>> the transport header of an IPv6 packet with too many extension headers.
> >>> Currently many chips have this IPv6 extension header limitation.
> >>>
> >>>
> >>> [Tom] The parser buffer limit applies to all headers a device wishes
> >>> to inspect (some devices still may have less than 512 byte buffers
> >>> also). The best way to deal with this is to minimize the length of
> >>> headers. Geneve TLVs each have four bytes of overhead so they are less
> >>> compact than other TLVs at a similar layer (IP options, TCP options,
> >>> IPv6 options each have two bytes of overhead). The tradeoff made here is
> >>> probably to simplify alignment (I really don't see any rationale for
> >>> needing 24 bits to identify options). Bit-fields are still better in
> >>> this regard for being compact since there is no additional overhead
> >>> per option.
> >>>
> >>>
> >>> [Lizhong] I suspect a 260-byte Geneve header is an over-design.
> >>> Since one of the purposes of having the NIC parse the inner header is to
> >>> get a hash value for flow steering, one way is to define a Geneve TLV,
> >>> which SHOULD be the first one, to carry the hash value of the inner
> >>> 5-tuple and also the hash algorithm. Then the NIC may only need to parse
> >>> up to the first Geneve TLV. Note that the source UDP port cannot serve
> >>> that purpose, since that port number cannot be predicted by the receiver.
> >>>
> >> Using the entropy in the UDP port number works perfectly well to get
> >> ECMP or RSS for any UDP encapsulation including Geneve, VXLAN, GUE,
> >> etc. If the UDP port number weren't good enough then the IPv6 flow
> >> label can be used (and that works for _any_ protocol not just UDP!).
> >>
> >>
> >> The goal should be to discourage intermediate devices from doing DPI
> >> into transport layer payloads. It requires a bunch of protocol
> >> specific logic and any interpretation may be completely wrong since
> >> port numbers don't have global meaning (e.g. if a device sees a UDP
> >> packet destined to port 6081 in the network it may or may not be
> >> Geneve).
> >>
> >> Tom
> >>
> >>>
> >>>
> >>>
>
_______________________________________________
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3
