On 11/01/2024 15:44, Ilya Maximets wrote:
> On 1/10/24 19:35, Kevin Traynor wrote:
>> +cc some other people who may be interested about OVS upgrading DPDK
>> version.
>>
>> On 10/01/2024 16:52, Ilya Maximets wrote:
>>> On 12/13/23 14:06, David Marchand wrote:
>>>> This commit adds support for DPDK v23.11.
>>>> It updates the CI script and documentation and includes the following
>>>> changes coming from the dpdk-latest branch:
>>>>
>>>> - sparse: Add some compiler intrinsics for DPDK build.
>>>>
>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
>>>>
>>>> - ci: Cache DPDK installed libraries only.
>>>> - ci: Reduce optional libraries in DPDK.
>>>>
>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
>>>>
>>>> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>>>>
>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
>>>>
>>>> Signed-off-by: David Marchand <[email protected]>
>>>> ---
>>>
>>> Hi, Kevin, David, others.
>>>
>>
>> Hi Ilya,
>>
>> Thanks for summarizing the options.
>>
>>> We need to make a decision on this patch as the proposed branching is
>>> only one week away. As far as I understand, there is a problem with
>>> Intel Virtual Function driver (iavf) that deadlocks OVS when VF is added.
>>> The problem is described in https://bugs.dpdk.org/show_bug.cgi?id=1337
>>> (not 1337 at all) and the commit that introduced the issue in DPDK is
>>> known. To the date the issue is not fixed. The potential solution is
>>> to revert the commit from DPDK, bringing back another issue fixed by
>>> aforementioned commit, though that issue seems less severe and, to my
>>> knowledge, we didn't actually experience it in the past.
>>>
>>
>> Agree, non-regression is always better.
>>
>>> There is also a situation around DPDK stable releases.
>>> Since these are normally created after the next major release of DPDK
>>> is out, the time gap between xx.11 and xx.11.1 is 5 months. Which is a
>>> lot, especially for an LTS release, since projects are likely to
>>> migrate to new LTS releases of DPDK and they are likely to discover
>>> bugs that need fixes earlier than in 5 months.
>>>
>>
>> Good feedback. The issue is the LTS follows the DPDK main release in
>> order that fixes are applied in main branch and already have gone
>> through some validation. But maybe there's a more limited xx.11.1
>> version with fixes for reported issues that could be released etc. It's
>> something that would need more discussion.
>>
>> I think it's better to address the current issue and possible future
>> workflow changes separately as much as possible, as they might need
>> different resolutions and the thread could get a bit overloaded. I've
>> just commented on the current issue below for now.
>
> OK. That makes sense. It's hard to solve long term issues in
> a time scramble.
>
>>
>>> With that said, we have a few options for the current patch:
>>>
>>> 0. Accept the patch and do nothing about the issue. Clearly not a good
>>>    option. The argument can be made that the problem was also
>>>    backported to stable DPDK 21.11.5 and 22.11.something, so older OVS
>>>    releases are also affected, i.e. it's kind of not a problem for 3.3
>>>    release of OVS in particular. However, for older releases the users
>>>    can choose to fall back to older stable releases of DPDK. With a
>>>    major version upgrade we are going to introduce breaking changes,
>>>    and there is nowhere to fall back, since going back to 22.11 will
>>>    break features for certain drivers even if DPDK API/ABI that we
>>>    use would have been compatible.
>>>
>>
>> I have reverted the patch that introduced the issue for 21.11.6.
>> Hopefully we can do the same for 22.11.4, and we will have those
>> releases shortly to cover the branches using those LTS's.
>>
>>> 1.
>>>    Accept the patch and document that users will need to revert a
>>>    particular DPDK commit, if they are planning to use VFs on Intel NICs.
>>>    And upgrade to 23.11.1 as soon as it is available, assuming the issue
>>>    will be fixed there.
>>>
>>>    This is not a very user-friendly option. And it is not clear if
>>>    distributions will do that. Also, it's a one-off solution that we may
>>>    have to repeat every year. And it might not be possible for other
>>>    types of issues we may encounter in the future. Also, users will
>>>    have zero validation for the changes they make in DPDK.
>>>
>>> 2. Check if DPDK can make a one-off stable release of 23.11.1 with just this
>>>    patch reverted or the fix implemented. If this can be done before OVS
>>>    release in mid February, that might be acceptable.
>>>
>>>    This will likely mean skipping some validation steps on the DPDK release
>>>    side, so not ideal. However, it is better than asking users to revert
>>>    this patch themselves as they will have zero validation this way.
>>>    This also doesn't address the bigger problem with DPDK stable release
>>>    cadence and making one-off releases every year doesn't sound right.
>>>
>>
>> Quite similar, but I guess 1 is more of an inconvenience for the user to
>> have to revert that patch themselves, especially if they are just using
>> the tar file.
>>
>> I'm not sure if it's Luca who is going to maintain 23.11 LTS, but if
>> he's not available I would be prepared to make a 23.11.1 release with a
>> revert for that issue *if* it's confirmed and agreed by Intel devs.
>
> I see that there were no replies to your questions in the BZ. Should the
> revert patch for a main branch be posted to dpdk-dev?
>
The code has diverged between main and stable branches and the patch
that triggered the issue was a "fix" for another issue, so they may want
to take a different approach on main branch, or debate about API usage
and integration.

>>
>>> 3. Postpone 23.11 to OVS 3.4 and likely just move DPDK upgrades to summer
>>>    releases of OVS.
>>>
>>>    This should address the release cadence problem, since we'll have at
>>>    least one stable release of DPDK before moving to a new major version,
>>>    giving us time to test and report issues. Upgrading to .1 stable
>>>    versions instead of unstable ones seems like a good idea for software
>>>    in general. Obvious downside for this approach is an even longer time
>>>    for new DPDK features to be available for OVS users.
>>>
>>
>> A couple of downsides wrt doing this for current issue:
>> - Possibly users of other DPDK drivers want to use the updated versions
>>   in DPDK 23.11
>> - Some users may have already planned updating to a common DPDK
>>   with/without OVS to 23.11 based on what has been the standard workflow
>>   over last few years
>> - 22.11 will EoL a year before 23.11 so it may mean a user using OVS 3.3
>>   faces more time with an unmaintained DPDK LTS at the backend of their usage
>
> Good point. OVS LTS support is already longer (3 years) than DPDK's (2 years)
> and moving adoption of new DPDK LTS releases to summer releases will make
> the difference even larger, since DPDK versions they are using will last only
> for 10 months. This is not ideal, but we don't have a lot of options, unless
> the options 4 or 5 are happening.
>

We have been doing 3 years maintenance on the last few DPDK LTS releases
as a trial and it has gone well. We didn't want to update docs and then
break a promise in a year or two's time, but I think at this point we
can update the docs to officially state 3 years maintenance.

So the overlap maintenance time is better, but the point about the extra
year still holds, just a bit later on.

>>
>>> Note: Moving release dates for major releases of OVS or DPDK doesn't sound
>>> right and may create more issues than it solves due to release time
>>> alignments with major consumers like OVN, distributions and cluster
>>> management systems. So, not suggesting that.
>>>
>>> <rant>
>>> 4. Revisiting the stable release policy for DPDK LTS releases might be a
>>>    good thing though, since 5 months is an unreasonably long time for a
>>>    fresh release to not receive any bug fixes. This time gap is also
>>>    larger than a time gap between two stable releases of the same series,
>>>    i.e. time between xx.11.1 and xx.11.2 is less than time between xx.11
>>>    and xx.11.1, which doesn't make a lot of sense.
>>>
>>>    I understand a position of DPDK project to not incorporate testing of
>>>    external applications into their release process, since it can't
>>>    possibly test with every application. However, application developers
>>>    can't possibly test every DPDK driver on their own, because upstream
>>>    communities like OVS simply don't have hardware/infrastructure to do
>>>    so. And there is a clear gap in testing and validation on DPDK side,
>>>    i.e. validation performed by DPDK project alone is not sufficient.
>>>    That means that bugs are inevitable and fresh releases of DPDK will
>>>    contain bugs making them unusable for some applications. Hence the
>>>    need for faster process for .1 releases. E.g. have xx.11.1 release in
>>>    the end of January / start of February would be fine. Though the
>>>    timing with different holidays around the world is not good.
>>>
>>>    This option is just a little more sustainable option 2 as it will
>>>    involve proper validation on DPDK side. But again it's not OVS' call
>>>    to make.
>>>
>>> 5. Have bug-free DPDK right out the gate :D. This is obviously not
>>>    happening unless OVS is tightly integrated into DPDK testing and
>>>    validation and all the issues are caught before new version of DPDK
>>>    is released.
>>> </rant>
>>>
>>> I think, option 0 is a no-go. To resolve a current issue at hand for OVS
>>> 3.3 we could go with 1, 2 or 3. Though 2 is not OVS' call to make. Long
>>> term solutions are 3 or 4, as 1 and 2 require solving this problem every
>>> year, depending on us having problems with a new release or not. 5 doesn't
>>> seem like a possible solution at the moment for various reasons.
>>>
>>> Thoughts?
>>>
>>
>> My preference would be 2, as it's the least amount of headaches and
>> change for users.
>
> 2 does sound like the best short term option, I agree. Though it is also
> the one we (OVS community) have the least control over. We're waiting for
> iavf maintainers to confirm the issue and then we're relying on 23.11.1
> release to be made and be made on time. So, the option is getting less
> viable each day.
>

I'm coming to the conclusion that there may not be a quick solution on
the DPDK side to allow for option 2, and it's probably best to just go
with option 1 at this point. That will allow more time before OVS
3.4/DPDK 23.11.1 to debate the API and how and where best to fix it.

David, let us know if you agree? If so, maybe you can send a new version
of the patch with the added documentation. I can help with docs or
discussing further.

>>
>> thanks,
>> Kevin.
>>
>>> We need to make a decision on this by the end of this week.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
>
