On Mon, May 5, 2025 at 2:23 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/5/25 6:35 AM, Han Zhou wrote:
> >
> >
> > On Fri, May 2, 2025 at 3:10 AM Ilya Maximets <i.maxim...@ovn.org
<mailto:i.maxim...@ovn.org>> wrote:
> >>
> >> On 5/1/25 12:02 AM, Han Zhou wrote:
>
> <snip>
>
> >>
> >> Hi, Han.
> >>
> >> That's an interesting issue.  The documentation you're looking for is
RFC 6040
> >> that describes how the ECN bits should be passed around during
encap/decap process.
> >> Specifically:
> >>   https://www.rfc-editor.org/rfc/rfc6040#page-10
> >>
> >> Tunnel implementations in the linux kernel seem to support the same
logic from the
> >> RFC, so it might be possible to avoid exact matches on the ECN bits.
However, there
> >> is no real way for ofproto layer to detect if this is implemented in
the datapath or
> >> not.  Userspace datapath doesn't implement this, Windows datapath
doesn't, we use
> >> raw encap for rte_flow, so that will also likely not support this.  TC
offload I'm
> >> not sure about, I saw some references to ECN in the mlx driver, but
I'm not confident
> >> these are relevant.  Besides, implementing support for this in
userspace datapath may
> >> be problematic from the performance point of view as we'll need to add
extra parsing
> >> logic to the datapath and modify headers on per-packet basis.  Though
it's hard to
> >> tell what the performance impact will be.
> >>
> >> So, if we find a good way of detecting the datapath support, then it
might be a
> >> good improvement to avoid flow-based ECN handling when datapath
supports it.
> >> But it may not be worth implementing this handling in all the
datapaths.
> >
> > Hi Ilya, thanks for the information. For the datapaths that don't have
this ECN
> > handling logic, how would the exact match help? As we can see in the
example I gave,
> > for the packets that have the CE bit changed to 1, the newly generated
megaflow has
> > the same datapath action, which doesn't instruct any ECN processing. I
thought the
> > exact match was just useless, only bringing the performance penalty.
But now that
> > you mentioned some datapaths didn't implement this handling, does that
mean the ECN
> > handling is just broken in those cases? Did I miss anything?
>
> Your inner header is Not-ECT (nw_ecn=0x0) and the outer is ECT(1)
> (nw_ecn=0x1), that corresponds to Not-ECT intersection in the table
> in the RFC after decap.
>

In my case, both the outer and inner header have the ECT bit set, and then
at some point when congestion happens the CE bit is set in the outer header
when the packet is received at the receiving side, and OVS set the CE bit
to the inner header implicitly by its ECN handling logic. If you look at
the megaflow example I gave:

- Before the congestion has the match for the inner header: tos=0/0, it is
wildcarded.
- When congestion happens the match for the inner header became: tos=0x3/0x3

So it is not the invalid case you are referring to.

Best,
Han

> """
>       *  If the inner ECN field is Not-ECT and the outer ECN field is
>          CE, the decapsulator MUST drop the packet.
>
>       *  If the inner ECN field is Not-ECT and the outer ECN field is
>          Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
>          outgoing packet with the ECN field cleared to Not-ECT.
> """
>
> So, for the inner packet with nw_tos=0x0 there are only two options -
> not to update or drop.  That's why you don't see any ECN-related
> datapath action in your case.  But if you set nw_tos to 0x3 on the
> outer header, the packet must be dropped.  And that's the reason for
> having the field in the match - different actions can be taken
> depending on the actual values.  We have a corresponding test in the
> tunnel.at - "tunnel - ECN decapsulation".
>
> However, as you can see in the table in the RFC 6040, this intersection
> is marked with '(!!!)', which means the combination should not be
> possible in a normally functioning network:
>
> """
>    o  Certain combinations of inner and outer ECN fields cannot result
>       from any transition in any current or previous ECN tunneling
>       specification.  These currently unused (CU) combinations are
>       indicated in Figure 4 by '(!!!)' or '(!)', where '(!!!)' means the
>       combination is CU and always potentially dangerous, while '(!)'
>       means it is CU and possibly dangerous.  In these cases,
>       particularly the more dangerous ones, the decapsulator SHOULD log
>       the event and MAY also raise an alarm.
> """
>
> So, the fact that the inner packet is marked as Not-ECN-Capable-Transport
> (Not-Ect, 0x0) and yet something in the network marks the outer header as
> Capable (ECT(1), 0x1) likely means that there is a malfunctioning box
> somewhere between the tunnel endpoints.  This should not happen.  Routers
> should only set full CE (0x3) indicators and only when the packet is
> marked as ECN-capable (0x10 or 0x01).
>
> Note: ECT(1) doesn't mean there is a congestion, it means that the
> transport is capable of reacting to congestion, and only the sender can
> decide to enable that.
>
> Note2: The "ECT" and "CE" bit names are obsolete names from RFC 2481.
> Starting with RFC 3168, bits are not recognized separately, instead we
> have 4 distinct states: Not-ECT, ECT(0), ECT(1), CE.  ECT(1) combination
> had no meaning in obsolete RFCs, because it would mean congestion on a
> non-capable transport.  In RFC 3168 onwards it just means a capable
> transport and practically similar to ECT(0).
> But switching from 00 to 01 by some network appliance in the middle
> didn't make sense in any of these RFCs, AFAICT.
>
> Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to