On 11/01/2024 15:44, Ilya Maximets wrote:
> On 1/10/24 19:35, Kevin Traynor wrote:
>> +cc some other people who may be interested in OVS upgrading its DPDK
>> version.
>>
>> On 10/01/2024 16:52, Ilya Maximets wrote:
>>> On 12/13/23 14:06, David Marchand wrote:
>>>> This commit adds support for DPDK v23.11.
>>>> It updates the CI script and documentation and includes the following
>>>> changes coming from the dpdk-latest branch:
>>>>
>>>> - sparse: Add some compiler intrinsics for DPDK build.
>>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
>>>>
>>>> - ci: Cache DPDK installed libraries only.
>>>> - ci: Reduce optional libraries in DPDK.
>>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
>>>>
>>>> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
>>>>
>>>> Signed-off-by: David Marchand <[email protected]>
>>>> ---
>>>
>>> Hi, Kevin, David, others.
>>>
>>
>> Hi Ilya,
>>
>> Thanks for summarizing the options.
>>
>>> We need to make a decision on this patch as the proposed branching is
>>> only one week away.  As far as I understand, there is a problem with
>>> Intel Virtual Function driver (iavf) that deadlocks OVS when VF is added.
>>> The problem is described in https://bugs.dpdk.org/show_bug.cgi?id=1337
>>> (not 1337 at all) and the commit that introduced the issue in DPDK is
>>> known.  To date, the issue is not fixed.  A potential solution is
>>> to revert the commit from DPDK, bringing back another issue fixed by
>>> aforementioned commit, though that issue seems less severe and, to my
>>> knowledge, we didn't actually experience it in the past.
>>>
>>
>> Agree, non-regression is always better.
>>
>>> There is also a situation around DPDK stable releases.  Since these are
>>> normally created after the next major release of DPDK is out, the time
>>> gap between xx.11 and xx.11.1 is 5 months, which is a lot, especially
>>> for an LTS release, since projects are likely to migrate to new LTS
>>> releases of DPDK and they are likely to discover bugs that need fixes
>>> earlier than in 5 months.
>>>
>>
>> Good feedback. The reason is that the LTS release follows the DPDK main
>> release so that fixes are applied to the main branch first and have
>> already gone through some validation. But maybe a more limited xx.11.1
>> version with fixes for reported issues could be released, etc. It's
>> something that would need more discussion.
>>
>> I think it's better to address the current issue and possible future
>> workflow changes separately as much as possible, as they might need
>> different resolutions and the thread could get a bit overloaded. I've
>> just commented on the current issue below for now.
> 
> OK.  That makes sense.  It's hard to solve long term issues in
> a time scramble.
> 
>>
>>> With that said, we have a few options for the current patch:
>>>
>>> 0. Accept the patch and do nothing about the issue.  Clearly not a good
>>>    option.  The argument can be made that the problem was also
>>>    backported to stable DPDK 21.11.5 and 22.11.something, so older OVS
>>>    releases are also affected, i.e. it's not really a problem specific
>>>    to the 3.3 release of OVS.  However, for older releases, users
>>>    can choose to fall back to older stable releases of DPDK.  With a
>>>    major version upgrade we are going to introduce breaking changes,
>>>    and there is nowhere to fall back, since going back to 22.11 will
>>>    break features for certain drivers even if the DPDK API/ABI that we
>>>    use were compatible.
>>>
>>
>> I have reverted the patch that introduced the issue for 21.11.6.
>> Hopefully we can do the same for 22.11.4, and we will have those
>> releases shortly to cover the branches using those LTS releases.
>>
>>> 1. Accept the patch and document that users will need to revert a
>>>    particular DPDK commit if they are planning to use VFs on Intel NICs,
>>>    and upgrade to 23.11.1 as soon as it is available, assuming the issue
>>>    will be fixed there.
>>>
>>>    This is not a very user-friendly option.  And it is not clear if
>>>    distributions will do that.  Also, it's a one-off solution that we may
>>>    have to repeat every year.  And it might not be possible for other
>>>    types of issues we may encounter in the future.  Also, users will
>>>    have zero validation for the changes they make in DPDK.
>>>
>>> 2. Check if DPDK can make a one-off stable release of 23.11.1 with just this
>>>    patch reverted or the fix implemented.  If this can be done before OVS
>>>    release in mid February, that might be acceptable.
>>>
>>>    This will likely mean skipping some validation steps on the DPDK release
>>>    side, so not ideal.  However, it is better than asking users to revert
>>>    this patch themselves as they will have zero validation this way.
>>>    This also doesn't address the bigger problem with the DPDK stable release
>>>    cadence, and making one-off releases every year doesn't sound right.
>>>
>>
>> Quite similar, but I guess 1 is more of an inconvenience for the user to
>> have to revert that patch themselves, especially if they are just using
>> the tar file.
>>
>> I'm not sure if it's Luca who is going to maintain 23.11 LTS, but if
>> he's not available I would be prepared to make a 23.11.1 release with a
>> revert for that issue *if* it's confirmed and agreed by Intel devs.
> 
> I see that there were no replies to your questions in the BZ.  Should the
> revert patch for the main branch be posted to dpdk-dev?
> 

The code has diverged between main and stable branches and the patch
that triggered the issue was a "fix" for another issue, so they may want
to take a different approach on the main branch, or debate API usage
and integration.

>>
>>> 3. Postpone 23.11 to OVS 3.4 and likely just move DPDK upgrades to summer
>>>    releases of OVS.
>>>
>>>    This should address the release cadence problem, since we'll have at
>>>    least one stable release of DPDK before moving to a new major version,
>>>    giving us time to test and report issues.  Upgrading to .1 stable versions
>>>    instead of unstable ones seems like a good idea for software in general.
>>>    The obvious downside of this approach is an even longer time for new DPDK
>>>    features to become available to OVS users.
>>>
>>
>> A couple of downsides of doing this for the current issue:
>> - Users of other DPDK drivers may want to use the updated versions
>> in DPDK 23.11
>> - Some users may have already planned to update a common DPDK, with
>> or without OVS, to 23.11 based on what has been the standard workflow
>> over the last few years
>> - 22.11 will reach EoL a year before 23.11, so a user of OVS 3.3 may
>> face more time with an unmaintained DPDK LTS towards the end of their usage
> 
> Good point.  OVS LTS support is already longer (3 years) than DPDK's (2 years)
> and moving adoption of new DPDK LTS releases to summer releases will make
> the difference even larger, since DPDK versions they are using will last only
> for 10 months.  This is not ideal, but we don't have a lot of options unless
> option 4 or 5 happens.
> 

We have been doing 3 years maintenance on last few DPDK LTS releases as
a trial and it has gone well. We didn't want to update docs and then
break a promise in a year or two's time, but I think at this point we
can update the docs to officially state 3 years maintenance.

So the overlap maintenance time is better, but the point about the extra
year still holds, just a bit later on.

>>
>>> Note: Moving release dates for major releases of OVS or DPDK doesn't sound
>>> right and may create more issues than it solves due to release time alignments
>>> with major consumers like OVN, distributions and cluster management systems.
>>> So, I'm not suggesting that.
>>>
>>> <rant>
>>> 4. Revisiting the stable release policy for DPDK LTS releases might be a good
>>>    thing, though, since 5 months is an unreasonably long time for a fresh
>>>    release to not receive any bug fixes.  This time gap is also larger than the
>>>    time gap between two stable releases of the same series, i.e. the time between
>>>    xx.11.1 and xx.11.2 is less than the time between xx.11 and xx.11.1, which
>>>    doesn't make a lot of sense.
>>>
>>>    I understand the position of the DPDK project to not incorporate testing of
>>>    external applications into its release process, since it can't possibly
>>>    test with every application.  However, application developers can't possibly
>>>    test every DPDK driver on their own, because upstream communities like OVS
>>>    simply don't have the hardware/infrastructure to do so.  And there is a clear
>>>    gap in testing and validation on the DPDK side, i.e. validation performed by
>>>    the DPDK project alone is not sufficient.  That means that bugs are inevitable
>>>    and fresh releases of DPDK will contain bugs that make them unusable for some
>>>    applications.  Hence the need for a faster process for .1 releases.  E.g.
>>>    having an xx.11.1 release at the end of January / start of February would be
>>>    fine, though the timing with various holidays around the world is not
>>>    good.
>>>
>>>    This option is just a slightly more sustainable version of option 2, as it
>>>    would involve proper validation on the DPDK side.  But again, it's not OVS'
>>>    call to make.
>>>
>>> 5. Have bug-free DPDK right out of the gate :D.  This is obviously not happening
>>>    unless OVS is tightly integrated into DPDK testing and validation and all
>>>    the issues are caught before a new version of DPDK is released.
>>> </rant>
>>>
>>> I think option 0 is a no-go.  To resolve the current issue at hand for OVS
>>> 3.3 we could go with 1, 2 or 3, though 2 is not OVS' call to make.  Long-term
>>> solutions are 3 or 4, as 1 and 2 require solving this problem every year,
>>> depending on whether we have problems with a new release or not.  5 doesn't
>>> seem like a possible solution at the moment for various reasons.
>>>
>>> Thoughts?
>>>
>>
>> My preference would be 2, as it's the least amount of headaches and
>> change for users.
> 
> 2 does sound like the best short-term option, I agree, though it is also
> the one we (the OVS community) have the least control over.  We're waiting for
> iavf maintainers to confirm the issue and then we're relying on 23.11.1
> release to be made and be made on time.  So, the option is getting less
> viable each day.
> 

I'm coming to the conclusion that there may not be a quick solution on
the DPDK side to allow for option 2, and it's probably best to just go with
option 1 at this point.

That will allow more time before OVS 3.4/DPDK 23.11.1 to debate the API
and where it is best to fix it.

David, can you let us know if you agree? If so, maybe you can send a new
version of the patch with the added documentation. I can help with docs
or discussing further.

>>
>> thanks,
>> Kevin.
>>
>>> We need to make a decision on this by the end of this week.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
> 


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
