2] northd.c: Avoid sending ICMP time exceeded for multicast packets.)

Dumitru Ceara Mon, 17 Apr 2023 02:56:20 -0700

On 4/14/23 18:01, Han Zhou wrote:
> 
> 
> On Fri, Apr 14, 2023 at 2:16 AM Dumitru Ceara <[email protected]
> <mailto:[email protected]>> wrote:
>>
>> On 4/14/23 04:15, Han Zhou wrote:
>> > On Thu, Apr 13, 2023 at 12:44 PM Han Zhou <[email protected]
> <mailto:[email protected]>> wrote:
>> >>
>> >>
>> >>
>> >> On Thu, Apr 13, 2023 at 12:35 PM Dumitru Ceara <[email protected]
> <mailto:[email protected]>> wrote:
>> >>>
>> >>> On 4/13/23 21:21, Han Zhou wrote:
>> >>>> On Thu, Apr 13, 2023 at 8:37 AM Dumitru Ceara <[email protected]
> <mailto:[email protected]>>
>> > wrote:
>> >>>>>
>> >>>>> On 4/13/23 17:34, Han Zhou wrote:
>> >>>>>> On Thu, Apr 13, 2023 at 12:54 AM Dumitru Ceara
> <[email protected] <mailto:[email protected]>>
>> >>>> wrote:
>> >>>>>>>
>> >>>>>>> On 4/13/23 07:07, Han Zhou wrote:
>> >>>>>>>> In RFC1812 section 5.3.1, it is mentioned that:
>> >>>>>>>>
>> >>>>>>>>    If the TTL is reduced to zero (or less), the packet MUST be
>> >>>>>>>>    discarded, and if the destination is not a multicast address
>> > the
>> >>>>>>>>    router MUST send an ICMP Time Exceeded message ...
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> The code itself looks OK but I wonder a bit about the rationale.
>> > Do
>> >>>> you
>> >>>>>>> have an example in which OVN replies with Time Exceeded for
>> > multicast
>> >>>>>>> destinations and that causes issues?
>> >>>>>>>
>> >>>>>>>> So if the destionation is a multicast address the route shouldn't
>> > send
>> >>>>>>>> ICMP Time Exceeded, but the current OVN implementation didn't
>> > check
>> >>>>>>>> multicast and tries to send ICMP regardless. This patch fixes it.
>> >>>>>>>
>> >>>>>>> The statement "if destination is not a multicast address the
> router
>> >>>> MUST
>> >>>>>>> send an ICMP Time Exceeded message" implies that "if destination
>> > is a
>> >>>>>>> multicast address the router MAY or MAY NOT send an ICMP Time
>> > Exceeded
>> >>>>>>> message".  So the fact that OVN sends one is not necessarily
> wrong.
>> >>>>>>
>> >>>>>> For my limited understanding of multicast, sending ICMP time
>> > exceeded is
>> >>>>>> not a good idea. In multicast TTL has special meanings, for
> example:
>> >>>>>>     TTL 0: Restricted to the same host, not transmitted by the
>> > router.
>> >>>>>>     TTL 1: Restricted to the same subnet, not forwarded by the
>> > router.
>> >>>>>> If we send ICMP for such packets, it means for a very common use
>> > case of
>> >>>>>> multicast (ttl = 1, same subnet), we will end up sending ICMP for
>> > every
>> >>>>>> normal packet.
>> >>>>>> In production we saw this happening with a rate higher than 10k
> pps!
>> >>>>>>
>> >>>>>
>> >>>>> Makes sense, thanks for the details!
>> >>>>>
>> >>>>>> So I believe this is the reason behind the "if" statement in the
>> > RFC.
>> >>>> Maybe
>> >>>>>> I should add this rationale in the comment, too.
>> >>>>>>
>> >>>>>
>> >>>>> If you could add some of the details above to the commit log too
>> > then:
>> >>>>>
>> >>>>> Acked-by: Dumitru Ceara <[email protected]
> <mailto:[email protected]>>
>> >>>>
>> >>>> Thanks Dumitru. I applied the series to main.
>> >>>> Han
>> >>>
>> >>> This sounds like a relevant bug fix.  Should this get backported to
>> >>> stable branches too?
>> >>>
>> >> Sound good. I will do that.
>> >
>> > I backported this series to the last release branch-23.03 and the LTS
>> > branch-22.03.
>> > The code base has changed a lot so was mostly manually backported
> instead
>> > of cherry-pick.
>> >
>>
>> Thanks for that but I think we shouldn't skip stable branches in
>> between.  We likely still have users on those that won't upgrade to the
>> latest 23.03 but instead to a new z-stream release (when that happens)
>> of the stable branch.
>>
>> I cherry picked your backports and made sure the tests pass and then
>> pushed them to branches 22.12, 22.09 and 22.06.
> 
> Thanks Dumitru for helping backporting. I was uncertain about whether we
> should continue backporting without skipping branches, as our growing
> number of released branches makes this approach increasingly difficult
> to sustain.
> 
> The release process document
> (https://docs.ovn.org/en/latest/internals/release-process.html
> <https://docs.ovn.org/en/latest/internals/release-process.html>) doesn't
> specify how long we should maintain non-LTS branches. My understanding
> was that we should primarily maintain LTS and the latest releases for
> regular bug fixes, while critical/security fixes would be applied to all
> branches. It appears we have different interpretations of this process.
> While it's beneficial to backport to all branches, I thought this was
> more of a convenience when there weren't too many branches to manage. In
> the worst case, a patch may require manual backporting to each branch if
> new features make the codebase in each branch unique. I also question
> what differentiates an LTS branch if we're backporting to all branches
> anyway. My assumption was that users opting for a non-LTS branch should
> be prepared to upgrade to newer releases, given the "short-term"
> maintenance of these branches. Is my understanding incorrect?
> 
Hi Han,


I changed the subject of the email to better reflect the new discussion.

There's this statement in the release-process document: "LTS releases
will receive bug fixes until the point that another LTS is released. At
that point, the old LTS will receive an additional year of critical and
security fixes."

Which further differentiates an LTS release from a non-LTS release.

> Perhaps we should discuss and formalize our backporting practice going
> forward, as the number of branches will only continue to grow in the
> coming months and years. cc @Mark Michelson
> <mailto:[email protected]>  @Numan Siddique <mailto:[email protected]> 
> 

Checking the OVS/OVN release-process documents again it's indeed not
explicitly specified that non-LTS stable release branches are supposed
to get all bug fixes.  But the OVS document has this addition:

"While branches other than LTS and the latest release are not formally
maintained, the OVS project usually provides stable releases for these
branches for at least 2 years, i.e. stable releases are provided for the
 last 4 release branches.  However, these branches may not include all
the fixes that LTS has in case backporting is not straightforward and
developers are not willing to spend their time on that (this mostly
affects branches that are older than the LTS, because backporting to LTS
implies backporting to all intermediate branches)."

This last part really makes me think that we SHOULD backport to all
intermediate branches: "...  because backporting to LTS
implies backporting to all intermediate branches".

Maybe we should make this rule (or some other backport strategy we
decide to use from this point on) explicit indeed.

CC: Ilya for input from OVS perspective.

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] OVN LTS/non-LTS bug fix backport strategy (was: Re: [PATCH ovn 2/2] northd.c: Avoid sending ICMP time exceeded for multicast packets.)

Reply via email to