Hey Ian, Thanks for this patch.
Apart from the issues that Flavio pointed out, I have a few additional comments inline. Cheers, Mark >From: Stokes, Ian >Sent: Friday, January 19, 2018 2:47 PM >To: Flavio Leitner <[email protected]> >Cc: [email protected]; Kavanagh, Mark B <[email protected]> >Subject: RE: [ovs-dev] [RFC PATCH v2 1/1] netdev-dpdk: Fix requested MTU >size validation. > >> Hi Ian, >> >> On Thu, Jan 18, 2018 at 05:32:22PM +0000, Ian Stokes wrote: >> > This commit replaces MTU_TO_FRAME_LEN(mtu) with >> > MTU_TO_MAX_FRAME_LEN(mtu) when validating if an MTU will exceed >> > NETDEV_DPDK_MAX_PKT_LEN in netdev_dpdk_set_mtu(). >> > >> > When setting an MTU we first check if the requested MTU frame size >> > will exceed the maximum packet frame size supported in >> > netdev_dpdk_set_mtu(). The MTU frame length is calculated as MTU + >> > ETHER_HEADER + ETHER_CRC. The MTU for the device will be set at a >> > later stage in dpdk_ethdev_init() using rte_ethdev_set_mtu(mtu). >> >> Perhaps? >> dpdk_eth_dev_init() >> rte_eth_dev_set_mtu() >> >> >> > However when using rte_ethdev_set_mtu(mtu) the calculation used to >> > check that the MTU does not exceed the max frame length for that >> > device varies between DPDK device drivers. For example ixgbe driver >> > calculates MTU frame length as >> > >> > mtu + ETHER_HDR_LEN + ETHER_CRC_LEN >> > >> > i40e driver calculates it as >> > >> > mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2 >> > >> > em driver calculates it as >> > >> > mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE >> > >> > Currently it is possible to set an MTU for a netdev_dpdk device that >> > exceeds the upper limit MTU for that devices DPDK driver. This leads >> > to a segfault. This is because the MTU frame length comparison as is, >> > does not take into account the addition of the vlan tag overhead >> > expected in the drivers. The netdev_dpdk_set_mtu() call will >> > incorrectly succeed but the subsequent dpdk_ethdev_init() will fail >> > before the queues have been created for the DPDK device. This coupled >> > with assumptions regarding reconfiguration requirements for the netdev >> > will lead to a segfault when the rxq is polled for this device. >> > >> > A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when >> > validating a requested MTU in netdev_dpdk_set_mtu(). >> > MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following: >> > >> > mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN) >> > >> > It's use accounts for the potential of up to two vlan headers being Since you're spinning a V3 anyway... ;) <micronit>Its, not "It's"</micronit> >> > included in the overhead required by some devices in MTU frame length >> > at the netdev_dpdk_set_mtu() stage. This allows OVS to flag an error >> > rather than the DPDK driver if the MTU exceeds the max DPDK frame >> > length. OVS can fail gracefully at this point by falling back to a I think 'falling back' here is a bit misleading, as it suggest some kind of rollback, or previous state restoration. In truth, the MTU in this case never changes; it was initialized to 1500, as stays as such, since the new value was rejected. >> > default MTU of 1500 and continue to configure the port. >> > >> > Note: this fix is a work around, a better approach would be if DPDK >> > devices could report the maximum MTU value that can be requested on a >> > per device basis. This capability however is not currently available. >> > A downside of this patch is that the MTU upper limit will be reduced >> > by 8 bytes for DPDK devices that do not need to account for vlan tags >> > in MTU frame length driver calculations e.g. ixgbe devices upper MTU >> > limit is reduced from the OVS point of view from 9710 to 9702. >> > >> > CC: Mark Kavanagh <[email protected]> >> > Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames") >> > Signed-off-by: Ian Stokes <[email protected]> >> > >> > --- >> > v1->v2 >> > * Add note to limitations in DPDK documentation regarding MTU. >> > * Correct MTU calculation in commit message. >> > * Flag that we provision 8 bytes (2 x vlan header) by using >> > MTU_TO_MAX_FRAME_LEN in commit message and code comments. >> > --- >> > Documentation/intro/install/dpdk.rst | 12 ++++++++++++ >> > lib/netdev-dpdk.c | 13 ++++++++++++- >> > 2 files changed, 24 insertions(+), 1 deletions(-) >> > >> > diff --git a/Documentation/intro/install/dpdk.rst >> > b/Documentation/intro/install/dpdk.rst >> > index 3fecb5c..5800096 100644 >> > --- a/Documentation/intro/install/dpdk.rst >> > +++ b/Documentation/intro/install/dpdk.rst >> > @@ -585,6 +585,18 @@ Limitations >> > >> > .. _DPDK release notes: >> > http://dpdk.org/doc/guides/rel_notes/release_17_11.html >> > >> > +- Upper bound MTU: DPDK device drivers differ in how MTU overhead is >> > + calculated e.g. i40e driver includes 2 x vlan headers in MTU >> > +overhead, As a general point, this is technically frame (i.e L2) overhead, whereas MTU is an L3 concept. I'd rephrase it as 'how the L2 overhead for a given MTU value is calculated'. >> > + em driver includes 1 x vlan header, ixgbe driver does not include a >> > +vlan >> > + header in overhead. Currently it is not possible for OVS DPDK to >> > +know what >> > + upper bound MTU value is supported for a given device. As such OVS >> > +DPDK >> > + must provision for the case where the maximum MTU value includes 2 >> > +x vlan >> > + headers. This reduces the upper bound MTU value for devices that do >> > +not >> > + include vlan headers in their overhead by 8 bytes e.g. ixgbe >> > +devices upper >> >> two spaces >> >> > + bound MTU is reduced from 9710 to 9702. This work around is >> > + temporary and is expected to be removed once a method is provided >> > + by DPDK to query the maximum MTU for a given device. Has it been confirmed that DPDK will provide such functionality? If not, you may want to rephrase the previous sentence, so as not to set unrealistic expectations. >> > + >> > Reporting Bugs >> > -------------- >> > >> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index >> > e32c7f6..2ac76e3 100644 >> > --- a/lib/netdev-dpdk.c >> > +++ b/lib/netdev-dpdk.c >> > @@ -2134,7 +2134,18 @@ netdev_dpdk_set_mtu(struct netdev *netdev, int >> > mtu) { >> > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> > >> > - if (MTU_TO_FRAME_LEN(mtu) > NETDEV_DPDK_MAX_PKT_LEN >> > + /* XXX: We need to ensure the requested MTU frame length does not I'd probably rephrase to something along these lines: "ensure that the overall frame length of the requested MTU...". >> > + * surpass the NETDEV_DPDK_MAX_PKT_LEN. DPDK device drivers differ >> > + * in how the MTU frame length is calculated when >> rte_ethdev_set_mtu(mtu) >> > + * is called e.g. i40e driver includes 2 x vlan headers in MTU >> overhead >> > + * the em driver includes 1 x vlan tag in MTU overhead, the ixgbe >> driver >> > + * does not include vlan tags in MTU overhead. As such we should >> use >> > + * MTU_TO_MAX_FRAME_LEN(mtu) which includes an additional 2 x vlan >> headers >> > + * (8 bytes) for comparison. This avoids a failure later with >> > + * rte_ethdev_set_mtu(). This approach should be used until DPDK >> > + provides >> >> rte_eth_dev_set_mtu >> >> > + * a method to retrieve maximum MTU frame length for a given >> device. >> > + */ >> > + if (MTU_TO_MAX_FRAME_LEN(mtu) > NETDEV_DPDK_MAX_PKT_LEN >> > || mtu < ETHER_MIN_MTU) { >> > VLOG_WARN("%s: unsupported MTU %d\n", dev->up.name, mtu); >> > return EINVAL; >> >> We got a bug report for another related MTU issue because in older >> OVS/DPDK, the i40e driver didn't include the 2 VLAN headers overhead, so >> if the packet length is the maximum allowed, it can't forward packets with >> VLAN headers included. So, I agree that this needs improvements on the >> DPDK side to report not only the maximum MTU but also to standardize what >> means MTU. Today we don't know the overhead each PMD will account. >> >> In the hope that the committer can fix the small typos above or if there >> is another spin: >> >> Acked-by: Flavio Leitner <[email protected]> > >Thanks Flavio, > >I agree, we can add this to requirements for DPDK MTU capabilities. > >I'll re-spin a v3 and remove the RFC, once there's an ACK from Mark I'll add >this to the DPDK_MERGE over the weekend. > >Thanks >Ian >> >> Thanks Ian! >> >> >> > -- >> > 1.7.0.7 >> > >> > _______________________________________________ >> > dev mailing list >> > [email protected] >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> -- >> Flavio _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
