On 5/5/25 6:35 AM, Han Zhou wrote:
> 
> 
> On Fri, May 2, 2025 at 3:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> On 5/1/25 12:02 AM, Han Zhou wrote:

<snip>

>>
>> Hi, Han.
>>
>> That's an interesting issue.  The documentation you're looking for is
>> RFC 6040 that describes how the ECN bits should be passed around during
>> encap/decap process.  Specifically:
>>   https://www.rfc-editor.org/rfc/rfc6040#page-10
>>
>> Tunnel implementations in the linux kernel seem to support the same
>> logic from the RFC, so it might be possible to avoid exact matches on
>> the ECN bits.  However, there is no real way for ofproto layer to detect
>> if this is implemented in the datapath or not.  Userspace datapath
>> doesn't implement this, Windows datapath doesn't, we use raw encap for
>> rte_flow, so that will also likely not support this.  TC offload I'm not
>> sure about, I saw some references to ECN in the mlx driver, but I'm not
>> confident these are relevant.  Besides, implementing support for this in
>> userspace datapath may be problematic from the performance point of view
>> as we'll need to add extra parsing logic to the datapath and modify
>> headers on per-packet basis.  Though it's hard to tell what the
>> performance impact will be.
>>
>> So, if we find a good way of detecting the datapath support, then it
>> might be a good improvement to avoid flow-based ECN handling when
>> datapath supports it.  But it may not be worth implementing this
>> handling in all the datapaths.
> 
> Hi Ilya, thanks for the information. For the datapaths that don't have
> this ECN handling logic, how would the exact match help? As we can see in
> the example I gave, for the packets that have the CE bit changed to 1,
> the newly generated megaflow has the same datapath action, which doesn't
> instruct any ECN processing. I thought the exact match was just useless,
> only bringing the performance penalty. But now that you mentioned some
> datapaths didn't implement this handling, does that mean the ECN handling
> is just broken in those cases? Did I miss anything?

Your inner header is Not-ECT (nw_ecn=0x0) and the outer is ECT(1)
(nw_ecn=0x1).  In the decapsulation table in the RFC, that combination
falls into a cell where the packet is Not-ECT after decap.

"""
      *  If the inner ECN field is Not-ECT and the outer ECN field is
         CE, the decapsulator MUST drop the packet.

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
         outgoing packet with the ECN field cleared to Not-ECT.
"""

So, for the inner packet with nw_tos=0x0 (Not-ECT) there are only two
options: forward it with the ECN field unchanged, or drop it.  That's why
you don't see any ECN-related datapath action in your case.  But if you
set nw_tos to 0x3 (CE) on the outer header, the packet must be dropped.
And that's the reason for having the field in the match: different
actions can be taken depending on the actual values.  We have a
corresponding test in tunnel.at - "tunnel - ECN decapsulation".
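
As a rough illustration of why the outer ECN value has to be part of the
match, here is a minimal Python sketch of the decapsulation table from
RFC 6040 (Figure 4).  This is not OVS code; the names DECAP, decap_ecn
and DROP are made up for the example.  Each cell holds the resulting
ECN value (or DROP) and the marker the RFC attaches to that cell, if
any.

```python
# ECN codepoints as defined in RFC 3168 (two low bits of the TOS byte).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
DROP = None  # hypothetical sentinel: the decapsulator must drop the packet

# DECAP[inner][outer] -> (resulting ECN or DROP, RFC 6040 marker or "")
DECAP = {
    NOT_ECT: {NOT_ECT: (NOT_ECT, ""), ECT_0: (NOT_ECT, "!!!"),
              ECT_1: (NOT_ECT, "!!!"), CE: (DROP, "!!!")},
    ECT_0:   {NOT_ECT: (ECT_0, ""), ECT_0: (ECT_0, ""),
              ECT_1: (ECT_1, ""), CE: (CE, "")},
    ECT_1:   {NOT_ECT: (ECT_1, ""), ECT_0: (ECT_1, "!"),
              ECT_1: (ECT_1, ""), CE: (CE, "")},
    CE:      {NOT_ECT: (CE, ""), ECT_0: (CE, ""),
              ECT_1: (CE, "!!!"), CE: (CE, "")},
}


def decap_ecn(inner, outer):
    """Look up the decap result for the given inner/outer ECN bits."""
    return DECAP[inner & 0b11][outer & 0b11]
```

With this table, decap_ecn(NOT_ECT, ECT_1) gives (NOT_ECT, "!!!"), i.e.
nothing to rewrite, while decap_ecn(NOT_ECT, CE) gives (DROP, "!!!").
Same inner header, different action, depending only on the outer bits.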

However, as you can see in the table in RFC 6040, this intersection
is marked with '(!!!)', which means the combination should not be
possible in a normally functioning network:

"""
   o  Certain combinations of inner and outer ECN fields cannot result
      from any transition in any current or previous ECN tunneling
      specification.  These currently unused (CU) combinations are
      indicated in Figure 4 by '(!!!)' or '(!)', where '(!!!)' means the
      combination is CU and always potentially dangerous, while '(!)'
      means it is CU and possibly dangerous.  In these cases,
      particularly the more dangerous ones, the decapsulator SHOULD log
      the event and MAY also raise an alarm.
"""

So, the fact that the inner packet is marked as Not-ECN-Capable-Transport
(Not-ECT, 0x0) and yet something in the network marks the outer header as
Capable (ECT(1), 0x1) likely means that there is a malfunctioning box
somewhere between the tunnel endpoints.  This should not happen.  Routers
should only set the full CE (0x3) indicator, and only when the packet is
already marked as ECN-capable (binary 10 or 01, i.e. 0x2 or 0x1).
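
To make the rule above concrete, here is a small Python sketch.  The
helper names (ecn_from_tos, is_expected_router_change) are hypothetical
and only encode the rule as stated in this email: a box in the middle
may leave the ECN field alone, or turn ECT(0)/ECT(1) into CE, and
nothing else.

```python
# ECN codepoints per RFC 3168, keyed by the two low bits of the TOS byte.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_from_tos(tos):
    """Extract the 2-bit ECN field from a TOS / traffic class byte."""
    return tos & 0b11


def is_expected_router_change(before, after):
    """True if a router could legitimately turn 'before' into 'after'.

    Assumption: only "unchanged" and "ECT(0)/ECT(1) -> CE" are allowed,
    following the rule described in the paragraph above.
    """
    if before == after:
        return True
    return after == 0b11 and before in (0b01, 0b10)
```

The case from this thread, Not-ECT rewritten to ECT(1), is rejected:
is_expected_router_change(0b00, 0b01) returns False, while the normal
congestion marking is_expected_router_change(0b10, 0b11) returns True.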

Note: ECT(1) doesn't mean there is congestion, it means that the
transport is capable of reacting to congestion, and only the sender can
decide to enable that.

Note2: The "ECT" and "CE" bit names are obsolete names from RFC 2481.
Starting with RFC 3168, the bits are not interpreted separately, instead
we have 4 distinct states: Not-ECT, ECT(0), ECT(1), CE.  The ECT(1)
combination (01) had no meaning in the obsolete RFCs, because it would
mean congestion on a non-capable transport.  From RFC 3168 onwards it
just means a capable transport and is practically similar to ECT(0).
But switching from 00 to 01 by some network appliance in the middle
didn't make sense in any of these RFCs, AFAICT.
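
For reference, the two readings of the same two bits can be sketched in
Python like this.  The bit positions follow RFC 2481 (ECT is the higher
of the two bits, CE the lower); the function and table names are made
up for the example.

```python
# RFC 2481 view: two independent flag bits.
ECT_BIT = 0b10  # "ECN-Capable Transport"
CE_BIT = 0b01   # "Congestion Experienced"

# RFC 3168 view: one 2-bit field with four codepoints.
RFC3168_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def rfc2481_view(ecn):
    """Decode the ECN field as two separate flags (obsolete reading)."""
    return {"ect": bool(ecn & ECT_BIT), "ce": bool(ecn & CE_BIT)}


def rfc3168_view(ecn):
    """Decode the ECN field as a single codepoint (current reading)."""
    return RFC3168_NAMES[ecn & 0b11]
```

For the value 0b01, rfc2481_view() reports CE set without ECT, which is
the meaningless combination mentioned above, while rfc3168_view()
reports "ECT(1)".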

Best regards, Ilya Maximets.