On 10/18/23 16:24, Vladislav Odintsov wrote:
> Hi Ilya,
> 
> thanks for your response!
> 
>> On 18 Oct 2023, at 15:59, Ilya Maximets via discuss 
>> <ovs-discuss@openvswitch.org> wrote:
>>
>> On 10/17/23 16:30, Vladislav Odintsov via discuss wrote:
>>> Hi,
>>>
>>> I’m testing OVS hardware offload with tc flower with Mellanox/NVidia 
>>> ConnectX-6 Dx smartnic and see next warning in ovs-vswitchd log:
>>>
>>> 2023-10-17T14:23:15.116Z|00386|tc(handler20)|WARN|Kernel flower 
>>> acknowledgment does not match request!  Set dpif_netlink to dbg to see 
>>> which rule caused this error.
>>>
>>> With dpif_netlink debug logs enabled, after this message appears two 
>>> additional lines:
>>>
>>> 2023-10-17T14:23:15.117Z|00387|dpif_netlink(handler20)|DBG|added flow
>>> 2023-10-17T14:23:15.117Z|00388|dpif_netlink(handler20)|DBG|system@ovs-system:
>>>  put[create] ufid:d8a3ab6d-77d1-4574-8bbf-634b01a116f3 
>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x10,src=10.1.0.105,dst=10.1.0.109,ttl=64/0,tp_src=59507/0,tp_dst=6081/0,geneve({class=0x102,type=0x80,len=4,0x60002}),flags(-df+csum+key)),in_port(4),skb_mark(0/0),ct_state(0/0x2f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x3),eth(src=00:00:ba:a4:6e:ad,dst=00:01:ba:a4:6e:ad),eth_type(0x0800),ipv4(src=172.32.2.4/0.0.0.0,dst=172.32.1.4/0.0.0.0,proto=1,tos=0/0x3,ttl=63/0,frag=no),icmp(type=8/0,code=0/0),
>>>  
>>> actions:set(tunnel(tun_id=0xff0011,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_src=59507,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x18000b}),flags(df|csum|key))),4
>>>
>>
>> Could you also enable debug logs for 'tc' module in OVS?
>> It shoudl give more infomation about where exactly is the
>> difference between what OVS asked for and what the kenrel
>> reported back.
>>
>> In general this warning typically signifies a kernel bug,
>> but it could be that OVS doesn't format something correctly
>> as well.
> 
> With enabled tc logs I see mismatches in expected/real keys and actions:
> 
> 2023-10-18T13:33:35.882Z|00118|tc(handler21)|DBG|tc flower compare failed 
> action compare
> Expected Mask:
> 00000000  ff ff 00 00 ff ff ff ff-ff ff ff ff ff ff ff ff
> 00000030  00 00 2f 00 00 00 00 00-00 00 00 00 00 00 00 00
> 00000040  03 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 00000050  00 00 00 00 ff ff ff ff-00 00 00 00 00 00 00 00
> 00000060  00 00 00 00 ff 00 00 00-00 00 00 00 00 00 00 00
> 00000090  00 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff
> 000000c0  ff 00 00 00 ff ff 00 00-ff ff ff ff ff ff ff ff
> 000000d0  08 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 000000e0  ff ff ff 01 ff ff ff ff-00 00 00 00 00 00 00 00
> 
> Received Mask:
> 00000000  ff ff 00 00 ff ff ff ff-ff ff ff ff ff ff ff ff
> 00000030  00 00 2f 00 00 00 00 00-00 00 00 00 00 00 00 00
> 00000040  03 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 00000050  00 00 00 00 ff ff ff ff-00 00 00 00 00 00 00 00
> 00000060  00 00 00 00 ff 00 00 00-00 00 00 00 00 00 00 00
> 00000090  00 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff
> 000000c0  ff 00 00 00 ff ff 00 00-ff ff ff ff ff ff ff ff
> 000000d0  08 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 000000e0  ff ff ff 01 ff ff ff ff-00 00 00 00 00 00 00 00
> 
> Expected Key:
> 00000000  08 06 00 00 ff ff ff ff-ff ff 00 00 ba a4 6e ad
> 00000050  a9 fe 64 01 a9 fe 64 03-00 00 ba a4 6e ad 00 00  <— mismatch in 
> this line
> 00000060  00 00 00 00 01 00 00 00-00 00 00 00 00 00 00 00
> 00000090  00 00 00 00 00 00 00 00-0a 01 00 68 0a 01 00 6d
> 000000c0  00 40 c0 5b 17 c1 00 00-00 00 00 00 00 00 00 10  <— mismatch in 
> this line
> 000000d0  08 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 000000e0  01 02 80 01 00 03 00 02-00 00 00 00 00 00 00 00
> 
> Received Key:
> 00000000  08 06 00 00 ff ff ff ff-ff ff 00 00 ba a4 6e ad
> 00000050  00 00 00 00 a9 fe 64 03-00 00 00 00 00 00 00 00  <— mismatch in 
> this line
> 00000060  00 00 00 00 01 00 00 00-00 00 00 00 00 00 00 00
> 00000090  00 00 00 00 00 00 00 00-0a 01 00 68 0a 01 00 6d
> 000000c0  00 00 00 00 17 c1 00 00-00 00 00 00 00 00 00 10  <— mismatch in 
> this line
> 000000d0  08 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 000000e0  01 02 80 01 00 03 00 02-00 00 00 00 00 00 00 00

These are not very important, it is expected that the kernel clears out
fields that are not coverd by a mask.  We do not have the difference
in the masks and we do not have a diference in the masked keys, so that
is fine.

> 
> Expected Masked Key:
> 00000000  08 06 00 00 ff ff ff ff-ff ff 00 00 ba a4 6e ad
> 00000050  00 00 00 00 a9 fe 64 03-00 00 00 00 00 00 00 00
> 00000060  00 00 00 00 01 00 00 00-00 00 00 00 00 00 00 00
> 00000090  00 00 00 00 00 00 00 00-0a 01 00 68 0a 01 00 6d
> 000000c0  00 00 00 00 17 c1 00 00-00 00 00 00 00 00 00 10
> 000000d0  08 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 000000e0  01 02 80 01 00 03 00 02-00 00 00 00 00 00 00 00
> 
> Received Masked Key:
> 00000000  08 06 00 00 ff ff ff ff-ff ff 00 00 ba a4 6e ad
> 00000050  00 00 00 00 a9 fe 64 03-00 00 00 00 00 00 00 00
> 00000060  00 00 00 00 01 00 00 00-00 00 00 00 00 00 00 00
> 00000090  00 00 00 00 00 00 00 00-0a 01 00 68 0a 01 00 6d
> 000000c0  00 00 00 00 17 c1 00 00-00 00 00 00 00 00 00 10
> 000000d0  08 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
> 000000e0  01 02 80 01 00 03 00 02-00 00 00 00 00 00 00 00
> 
> Action 0 mismatch:

We do have the difference in the actions, that is the main issue here.

>  - Expected Action:
> 0x1000000000000000000000000ff0011c05b17c1004000000a01006d0a010112000000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000010280010018000b00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000
>  - Received Action:
> 0x1000000000000000000000000ff0011000017c1004000000a01006d0a010112000000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000010280010018000b00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000
> 2023-10-18T13:33:35.882Z|00119|tc(handler21)|WARN|Kernel flower 
> acknowledgment does not match request!  Set dpif_netlink to dbg to see which 
> rule caused this error.
> 2023-10-18T13:33:35.882Z|00120|dpif_netlink(handler21)|DBG|added flow
> 2023-10-18T13:33:35.882Z|00121|dpif_netlink(handler21)|DBG|system@ovs-system: 
> put[create] ufid:dc160f96-84ef-4bf7-919a-3729c19382b8 
> recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x10,src=10.1.0.104,dst=10.1.0.109,ttl=64/0,tp_src=49243/0,tp_dst=6081/0,geneve({class=0x102,type=0x80,len=4,0x30002}),flags(-df+csum+key)),in_port(4),skb_mark(0/0),ct_state(0/0x2f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x3),eth(src=00:00:ba:a4:6e:ad,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=169.254.100.1/0.0.0.0,tip=169.254.100.3,op=1,sha=00:00:ba:a4:6e:ad/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00),
>  
> actions:set(tunnel(tun_id=0xff0011,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_src=49243,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x18000b}),flags(df|csum|key))),4
> 
> Is there any documentation or maybe code in OVS (or kernel, etc) to read to 
> understand the reason for this mismatch in more details?
> Or, maybe you have a good next steps to advice?

Unfortunately, that is just a direct hex dump of the tc_action structure:
  
https://github.com/openvswitch/ovs/blob/c29ba54018520f957c48d947325ed50c9442b831/lib/tc.h#L233

The only way to figure out what exactly is wrong here is to find which
bytes in the expected and received actions are different and find which
field in the tc_action structure the difference is in.  That's not fun.

The following patch may make the spotting the difference a little easier:

diff --git a/lib/tc.c b/lib/tc.c
index f85703633..39fe9c5cc 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -3875,12 +3875,13 @@ log_tc_flower_match(const char *msg,
 
         for (int i = 0; i < a->action_count; i++, action_a++, action_b++) {
             if (memcmp(action_a, action_b, sizeof *action_a)) {
-                ds_put_format(&s,
-                              "\nAction %d mismatch:\n - Expected Action: ",
-                              i);
-                ds_put_hex(&s, action_a, sizeof *action_a);
+                ds_put_format(&s, "\nAction %d mismatch:"
+                                  "\n - Expected Action:\n", i);
+                ds_put_sparse_hex_dump(&s, action_a, sizeof *action_a,
+                                       0, false);
                 ds_put_cstr(&s, "\n - Received Action: ");
-                ds_put_hex(&s, action_b, sizeof *action_b);
+                ds_put_sparse_hex_dump(&s, action_b, sizeof *action_b,
+                                       0, false);
             }
         }
     }
---

You may also need to use something like pahole on your OVS binary
to see the exact layout of the structure.


The difference seems to be in these 2 bytes:

0x 1000000000000000000000000ff0011c05b17c1004000000a01006d0a010112
0x 1000000000000000000000000ff0011000017c1004000000a01006d0a010112
                                  ^^^^
So, 16 byte offset within the structure.  Let's guess it is an encap
field.  Then it must be encap.tp_src.  And that checks out, because
0xc05b equals 49243, which is indeed a source port for the tunnel
encapsulation.

So, it seems like for some reason kernel decided to not populate
the tunnel source port in the tunnel key after decapsulation,
even though it was asked to do so.

@Eelco, @Marcelo, do you have some thoughts on that?

> 
>>
>>> The test system is a CentOS 8.4 with installed elrepo mainline kernel 
>>> 6.5.5, OVS 3.1.1 and OVN 22.09.1.
>>
>> 3.1.1 contains some known bugs in TC offloading code, so
>> you may want to try the latest 3.1.3.  Though it's unlikely
>> to be related ot the issue you're facing here.
> 
> I’ve upgraded OVS to 3.1.3 to eliminate the possible known OVS bugs, but this 
> didn’t help.
> Same warnings and mismatches still are reported.
> 
>>
>>> The workload I’m testing is a L3 Gateway for OVN IC (cross-az traffic).
>>>
>>> tc monitor at the same moment outputs next:
>>>
>>> replaced filter dev genev_sys_6081 ingress protocol ip pref 2 flower chain 
>>> 0 handle 0x3
>>>   dst_mac 00:01:ba:a4:6e:ad
>>>   src_mac 00:00:ba:a4:6e:ad
>>>   eth_type ipv4
>>>   ip_proto icmp
>>>   ip_tos 0/0x3
>>>   enc_dst_ip 10.1.0.109
>>>   enc_src_ip 10.1.0.105
>>>   enc_key_id 16
>>>   enc_dst_port 6081
>>>   enc_tos 0
>>>   geneve_opts 0102:80:00060002/ffff:ff:ffffffff
>>>   ip_flags nofrag
>>>   ct_state -trk-new-est
>>>   ct_label 00000000000000000000000000000000/030000000000000000000000000000
>>>   in_hw in_hw_count 2
>>> action order 1: tunnel_key  unset pipe
>>> index 5 ref 1 bind 1
>>> no_percpu
>>> used_hw_stats delayed
>>>
>>> action order 2: tunnel_key  set
>>> src_ip 10.1.0.109
>>> dst_ip 10.1.1.18
>>> key_id 16711697
>>> dst_port 6081

And we can see here, TC only populates the dst_port, not the src_port
into the tunnel key, even though the source port was in the tunnel(set())
action OVS requested.

>>> geneve_opts 0102:80:0018000b
>>> csum
>>> ttl 64 pipe
>>> index 6 ref 1 bind 1
>>> no_percpu
>>> used_hw_stats delayed
>>>
>>> action order 3: mirred (Egress Redirect to device genev_sys_6081) stolen
>>> index 3 ref 1 bind 1
>>> cookie 6daba3d87445d1774b63bf8bf316a101
>>> no_percpu
>>> used_hw_stats delayed
>>>
>>>
>>> Despite of these warnings, the flow is finally offloaded and the traffic 
>>> traverses this gw node well, only first packets of an ICMP sequence reach 
>>> CPU (seen in tcpdump):
>>
>> The warning is a warning.  It doesn't prevent the flow to be installed.
>> Though the installed flow may be incorrect and the traffic may be
>> handled in the wrong way.  Enabling debug logs for tc should show what
>> exacltly is wrong.
>>
>>>
>>> # ovs-appctl dpctl/dump-flows type=offloaded
>>> tunnel(tun_id=0x10,src=10.1.0.107,dst=10.1.0.109,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x50002}),flags(+key)),ct_state(-new-est-rel-rpl-trk),ct_label(0/0x3),recirc_id(0),in_port(4),eth(src=00:00:ba:a4:6e:ad,dst=00:01:ba:a4:6e:ad),eth_type(0x0800),ipv4(proto=1,tos=0/0x3,frag=no),
>>>  packets:3192, bytes:312816, used:1.240s, 
>>> actions:set(tunnel(tun_id=0xff0011,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x18000b}),flags(csum|key))),4
>>> tunnel(tun_id=0xff0011,src=10.1.1.18,dst=10.1.0.109,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xb0018}),flags(+key)),ct_state(-new-est-rel-rpl-trk),ct_label(0/0x3),recirc_id(0),in_port(4),eth(src=00:01:ba:a4:6e:ad,dst=00:00:ba:a4:6e:ad),eth_type(0x0800),ipv4(src=172.32.1.0/255.255.255.0,dst=172.32.0.4,proto=1,tos=0/0x3,ttl=63,frag=no),
>>>  packets:3192, bytes:312816, used:1.240s, 
>>> actions:set(tunnel(tun_id=0x11,src=10.1.0.109,dst=10.1.0.107,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x10002}),flags(csum|key))),set(eth(src=d0:fe:00:00:00:1d,dst=0a:00:66:ec:f7:40)),set(ipv4(ttl=62)),4
>>> tunnel(tun_id=0x10,src=10.1.0.105,dst=10.1.0.109,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x60002}),flags(+key)),ct_state(-new-est-rel-rpl-trk),ct_label(0/0x3),recirc_id(0),in_port(4),eth(src=00:00:ba:a4:6e:ad,dst=00:01:ba:a4:6e:ad),eth_type(0x0800),ipv4(proto=1,tos=0/0x3,frag=no),
>>>  packets:293, bytes:28714, used:1.240s, 
>>> actions:set(tunnel(tun_id=0xff0011,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x18000b}),flags(csum|key))),4
>>> tunnel(tun_id=0xff0011,src=10.1.1.18,dst=10.1.0.109,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xb0018}),flags(+key)),ct_state(-new-est-rel-rpl-trk),ct_label(0/0x3),recirc_id(0),in_port(4),eth(src=00:01:ba:a4:6e:ad,dst=00:00:ba:a4:6e:ad),eth_type(0x0800),ipv4(src=172.32.1.0/255.255.255.0,dst=172.32.2.4,proto=1,tos=0/0x3,ttl=63,frag=no),
>>>  packets:293, bytes:28714, used:1.240s, 
>>> actions:set(tunnel(tun_id=0x17,src=10.1.0.109,dst=10.1.0.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x10002}),flags(csum|key))),set(eth(src=d0:fe:00:00:00:8e,dst=0a:00:40:c2:76:a0)),set(ipv4(ttl=62)),4
>>> tunnel(tun_id=0x10,src=10.1.0.104,dst=10.1.0.109,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002}),flags(+key)),ct_state(-new-est-rel-rpl-trk),ct_label(0/0x3),recirc_id(0),in_port(4),eth(src=00:00:ba:a4:6e:ad,dst=00:01:ba:a4:6e:ad),eth_type(0x0800),ipv4(proto=6,tos=0/0x3,frag=no),
>>>  packets:0, bytes:0, used:never, 
>>> actions:set(tunnel(tun_id=0xff0011,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x18000b}),flags(csum|key))),4
>>> tunnel(tun_id=0xff0011,src=10.1.1.18,dst=10.1.0.109,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xb0018}),flags(+key)),ct_state(-new-est-rel-rpl-trk),ct_label(0/0x3),recirc_id(0),in_port(4),eth(src=00:01:ba:a4:6e:ad,dst=00:00:ba:a4:6e:ad),eth_type(0x0800),ipv4(src=169.254.96.0/255.255.252.0,dst=169.254.99.0,proto=6,tos=0/0x3,ttl=254,frag=no),
>>>  packets:0, bytes:0, used:never, 
>>> actions:set(tunnel(tun_id=0xe,src=10.1.0.109,dst=10.1.0.104,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x20001}),flags(csum|key))),set(eth(src=10:00:ba:a4:6e:ad,dst=02:00:ba:a4:6e:ad)),set(ipv4(ttl=253)),4
>>>
>>>
>>> I’m wonder, whether this is a known issue (I couldn’t find any related 
>>> messages searching in internet).
>>> Could someone give any advice/help with this?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Vladislav Odintsov
>>
>> _______________________________________________
>> discuss mailing list
>> disc...@openvswitch.org <mailto:disc...@openvswitch.org>
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss 
>> <https://mail.openvswitch.org/mailman/listinfo/ovs-discuss>
> 
> 
> Regards,
> Vladislav Odintsov
> 

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to