Ben,

Looking into the current OVS behavior w.r.t. IP fragments: based on the key_extract function in datapath/flow.c, it looks like OVS would treat a multi-fragment UDP packet as 2 different flows:

1) Unfragmented packets + first fragments for a given 5-tuple
2) Subsequent fragments (for the src/dst IP/proto 3-tuple)
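To make the two cases above concrete, here is a minimal sketch of that classification. It is a simplified model of the behavior as I read it, not the actual key_extract code: the struct and function names (sketch_flow_key, sketch_packet, sketch_key_extract, sketch_key_equal) are made up for illustration, and the real flow key carries more fields than shown here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IPv4 frag_off bits (RFC 791), in host byte order. */
#define SKETCH_IP_MF     0x2000u  /* "more fragments" flag */
#define SKETCH_IP_OFFSET 0x1fffu  /* fragment offset mask */

/* Hypothetical, trimmed-down flow key (assumption: not the OVS layout). */
struct sketch_flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint8_t  proto;
    uint16_t src_port;  /* stays 0 for later fragments */
    uint16_t dst_port;  /* stays 0 for later fragments */
};

/* Hypothetical summary of an already parsed IPv4 packet. */
struct sketch_packet {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint8_t  proto;
    uint16_t frag_off;  /* flags + offset, host byte order */
    uint16_t src_port;  /* only meaningful when the offset is 0 */
    uint16_t dst_port;
};

/* A non-zero offset means the L4 header is not in this packet. */
static bool sketch_is_later_fragment(const struct sketch_packet *p)
{
    return (p->frag_off & SKETCH_IP_OFFSET) != 0;
}

/* Build the key: ports are filled in only when the L4 header is present,
 * i.e. for unfragmented packets and for first fragments. */
static void sketch_key_extract(const struct sketch_packet *p,
                               struct sketch_flow_key *key)
{
    memset(key, 0, sizeof *key);  /* also zeroes padding for memcmp */
    key->src_ip = p->src_ip;
    key->dst_ip = p->dst_ip;
    key->proto = p->proto;

    if (sketch_is_later_fragment(p))
        return;  /* case 2: 3-tuple only */

    key->src_port = p->src_port;  /* case 1: full 5-tuple */
    key->dst_port = p->dst_port;
}

static bool sketch_key_equal(const struct sketch_flow_key *a,
                             const struct sketch_flow_key *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

In this model, an unfragmented packet and a first fragment (MF set, offset 0) with the same addresses and ports produce equal keys, while every later fragment of the same datagram produces a different key with both ports zero.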
That means the multipath hash calculation is performed twice: once per flow, not once per fragment.

For consistency, the key_extract logic could be changed to group fragmented packets (first + subsequent fragments) together, separately from unfragmented packets. That way, the first fragment would get the same treatment as subsequent fragments (e.g. logging, hashing, mirroring, DSCP rewriting, whatever people do with a flow). A first fragment is very likely to be followed soon by the next fragment of that same flow, so memory-access-wise the flow cache would benefit from hashing these packets to the same slot.

A potential downside or caveat is that logic depending on flow port matching (e.g. ACLs) would no longer match on first fragments, so fragment processing further downstream would have to be evaluated.

I would say it's a potentially disruptive change with unclear benefits (although it would fix the fragment hashing issue, in a different way).

Regards,
Jeroen

-----Original Message-----
From: Ben Pfaff <[email protected]>
Sent: Wednesday, June 5, 2019 3:54 PM
To: Van Bemmel, Jeroen (Nokia - US) <[email protected]>
Cc: Gregory Rose <[email protected]>; [email protected]
Subject: Re: [ovs-dev] Fix for hashing fragmented UDP packets

On Wed, Jun 05, 2019 at 08:34:50PM +0000, Van Bemmel, Jeroen (Nokia - US) wrote:
> Hi Greg, Ben,
>
> I doubt we would see a measurable difference in performance, with the
> additional conditional jump based on the packet flags. That does bring
> up an interesting question: Shouldn't fragmented packets all hash to
> the same single flow, and shouldn't the resulting multipath hash value
> get cached ( for at least 5 secs or so )? Based on our observations it
> looks like the hash is calculated for each individual fragment, which
> would be sub-optimal.

Hmm. That *is* suboptimal. If you figure out anything about why it is not doing better, then please do follow up on it.

> We would still need to exclude ports for the first fragment, in case
> some subsequent fragments arrive after the flow entry disappeared -
> but in theory, the hash could be done once, for the first packet in
> each flow ( if there is space in the flow cache entry )

Yes.

> In our case, it's not only that packets could get reordered due to
> taking different paths - the ECMP destinations are end systems ( like
> an anycast IP ) and reassembly fails because the first packet is sent
> to one host, and the rest of the fragments to another host.
>
> Ben - you are correct that it also applies to TCP and SCTP in theory,
> just that you won't typically see fragments for those protocols. I'll
> prepare a formal patch to fix it for all protocols

Great. Thank you.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
