Hi Numan, I have found steps to reproduce the issue and I believe I know the cause.
The steps are below:
1. Create a switch that includes a localnet port.
```
switch b6fc4114-9974-431e-aab7-c538300fde56 (neutron-27c5d497-b015-40bc-aeae-3cabad59d1bb) (aka gj-test-vlan3)
    port provnet-75041986-086e-425a-b10c-fb23557dbe0a
        type: localnet
        tag: 1670
        addresses: ["unknown"]
    port 72108ae2-168e-40b1-ac3e-13fb6e8bce0b
        type: localport
        addresses: ["fa:16:3e:de:dc:6a 192.168.33.2"]
    port 85762ca1-0999-4fda-8ba3-2c5319eba9f0 (aka gj-test-vlan3-vm-2_gj-test-vlan3_5ff1568b)
        addresses: ["fa:16:3e:f5:99:03 192.168.33.219"]
    port 74fa21c3-0958-480e-9902-48cea597d628 (aka gj-test-vlan3-vm-1_gj-test-vlan3_2329e3a0)
        addresses: ["fa:16:3e:ed:74:d1 192.168.33.120"]
```
2. vm1 is on node-1, and its LSP is shown below.
```
()[root@ovn-tool-0 /]# ovn-nbctl list logical_switch_port gj-test-vlan3-vm-1_gj-test-vlan3_2329e3a0
_uuid               : a9f54a41-17ee-425e-bf18-a3e43e900dd8
addresses           : ["fa:16:3e:ed:74:d1 192.168.33.120"]
dhcpv4_options      : 2a2c6bbb-721d-43c0-8676-33242b770ce2
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="192.168.33.120/24", "neutron:device_id"="67ac6967-07ef-4b27-a7f8-09b2d8a75dfe", "neutron:device_owner"="compute:default-az", "neutron:network_name"=neutron-27c5d497-b015-40bc-aeae-3cabad59d1bb, "neutron:port_name"=gj-test-vlan3-vm-1_gj-test-vlan3_2329e3a0, "neutron:project_id"=fbbc069bef5d4eca828d01592a1f03b3, "neutron:revision_number"="16", "neutron:security_group_ids"="2f26d220-f490-49e9-96d7-b9b72e57cc3b"}
ha_chassis_group    : []
name                : "74fa21c3-0958-480e-9902-48cea597d628"
options             : {requested-chassis=node-1.domain.tld}
parent_name         : []
port_security       : ["fa:16:3e:ed:74:d1 192.168.33.120"]
tag                 : []
tag_request         : []
type                : ""
up                  : true
```
3. Move the above LSP to node-2.
```
ovn-nbctl lsp-set-options a9f54a41-17ee-425e-bf18-a3e43e900dd8 requested-chassis=node-2.domain.tld
```
4. After that, move the LSP back to node-1.
Running the above steps reproduces the issue. (If it does not reproduce, repeat the steps several times.)
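
Since steps 3 and 4 may need to be repeated, here is a minimal sketch that only builds and prints the ovn-nbctl commands for several rounds. The function name `flap_commands` and the default of three rounds are my own choices for illustration. Note that lsp-set-options replaces all options on the port, which is fine here because requested-chassis is the only option set on this LSP.

```python
import shlex


def flap_commands(lsp, home_chassis, other_chassis, rounds=3):
    """Build ovn-nbctl commands that move `lsp` away and back `rounds` times.

    Hypothetical helper: it only builds the command lines, it runs nothing.
    """
    cmds = []
    for _ in range(rounds):
        # Step 3 (move away), then step 4 (move back).
        for chassis in (other_chassis, home_chassis):
            cmds.append(["ovn-nbctl", "lsp-set-options", lsp,
                         f"requested-chassis={chassis}"])
    return cmds


if __name__ == "__main__":
    for cmd in flap_commands("a9f54a41-17ee-425e-bf18-a3e43e900dd8",
                             "node-1.domain.tld", "node-2.domain.tld"):
        print(shlex.join(cmd))
```

The printed lines can be reviewed and then run by hand. It is probably worth pausing between moves so that ovn-controller handles each port binding change separately.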

The cause is as follows:
When step 3 runs, node-1 removes the gj-test-vlan3 datapath from local_datapaths, and therefore also removes all related OpenFlow flows.

When step 4 runs, the gj-test-vlan3 datapath is added back to local_datapaths and the port_binding table is updated. With I-P, the runtime_data engine node runs runtime_data_sb_port_binding_handler, which calls binding_handle_port_binding_changes. Because the updated port is a VIF port, it is handled by handle_updated_vif_lport. The update_ld_localnet_port function, which sets the localnet port on the datapath, is not called on this path, so the localnet port is never set on the re-added gj-test-vlan3 datapath.

So the unexpected OpenFlow flows are generated.
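
To make the sequence above easier to follow, here is a toy model of it. This is not OVN code: the class `ToyChassis`, its methods, and the `refresh_localnet` flag are all hypothetical, and the model assumes the moved VIF is the only local VIF on the datapath. It only shows how an incremental handler that re-adds a datapath without setting its localnet port ends up in a different state from a full recompute.

```python
class ToyChassis:
    """Toy stand-in for one ovn-controller (illustrative names only)."""

    def __init__(self, name, localnet_ports):
        self.name = name
        # datapath -> localnet port name, as it would be known from the SB DB
        self.localnet_ports = localnet_ports
        # datapath -> {"localnet_port": port name or None}
        self.local_datapaths = {}

    def full_recompute(self, bindings):
        """Rebuild everything; bindings maps vif -> (datapath, chassis)."""
        self.local_datapaths = {}
        for datapath, chassis in bindings.values():
            if chassis == self.name:
                self.local_datapaths[datapath] = {
                    "localnet_port": self.localnet_ports.get(datapath)}

    def handle_vif_change(self, datapath, bound_here, refresh_localnet=False):
        """Incremental handling of a single VIF port binding change."""
        if not bound_here:
            # Step 3: the VIF left, so the datapath is no longer local.
            self.local_datapaths.pop(datapath, None)
            return
        # Step 4: the datapath becomes local again.
        ld = self.local_datapaths.setdefault(datapath, {"localnet_port": None})
        if refresh_localnet:
            # Hypothetical flag: also look up the localnet port here.
            ld["localnet_port"] = self.localnet_ports.get(datapath)

    def remote_output(self, datapath):
        """How traffic for remote ports would leave this chassis."""
        ld = self.local_datapaths.get(datapath)
        if ld is None:
            return None
        return "localnet" if ld["localnet_port"] else "tunnel"


if __name__ == "__main__":
    node1 = ToyChassis("node-1", {"gj-test-vlan3": "provnet"})
    node1.full_recompute({"vm1": ("gj-test-vlan3", "node-1")})
    print("initial:", node1.remote_output("gj-test-vlan3"))
    node1.handle_vif_change("gj-test-vlan3", bound_here=False)
    node1.handle_vif_change("gj-test-vlan3", bound_here=True)
    print("after moving away and back:", node1.remote_output("gj-test-vlan3"))
    node1.full_recompute({"vm1": ("gj-test-vlan3", "node-1")})
    print("after full recompute:", node1.remote_output("gj-test-vlan3"))
```

In this model a full recompute restores the localnet port, which would be consistent with a restart of ovn-controller making the issue disappear.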

I reviewed recent ovn-controller patches and found one that should fix the issue: https://github.com/ovn-org/ovn/commit/50b3af8938c93491d429dcabe8f9902f0aa43426

I will test the above patch to verify my guess.

Thanks,
Jun

On 7/19/24 23:01, Numan Siddique wrote:
On Thu, Jul 18, 2024 at 10:17 PM Jun Gu <[email protected]> wrote:

Hi Numan, thanks for your reply. Unfortunately, we have already restarted
ovn-controller, and the issue is gone now. I am currently analyzing the
code to find out which situations can cause the issue. Do you have
any suggestions for me?


The fact that the restart fixed the issue suggests that it was most likely
an I-P issue.
It would be great if you could consider upgrading OVN to the latest
version or the latest LTS version.

I'd suggest deleting the localnet port and re-adding it to see if those
unexpected flows are present or not.
Next time the issue is seen, please run the ovn-appctl command to see
if it fixes the issue.

Numan

Thanks
Jun

On 7/19/24 00:52, Numan Siddique wrote:
On Wed, Jul 17, 2024 at 6:07 AM Jun Gu <[email protected]> wrote:

Hi team,
     The versions are below:
       OVN version: branch-21.09
       OVS version: branch-2.16.2
     We encountered an issue where ovn-controller generates some unexpected
OpenFlow flows in table 37 that forward packets to tunnel ports even though a
localnet port exists on the datapath. The related information is below:
     - Related datapath and ports.
```
switch d9954742-5f9a-440d-8f4a-e9cba08ff5b3 (neutron-102c1351-65a6-4640-8310-c3cd446a5b69) (aka JSNX_APP_VLAN121)
       port b4decafe-1948-4214-b021-393953bf94c3 (aka sc-arm-lnbp-clbatch-13_JSNX_APP_VLAN121_d36c91ab)
           addresses: ["fa:16:3e:32:ca:7e 32.12.121.150"]
       port 87e5ac12-80b8-4477-af59-dfb0fc0a43ff (aka sc-arm-lnbp-pfservice-6_JSNX_APP_VLAN121_109c4c56)
           addresses: ["fa:16:3e:e7:58:37 32.12.121.103"]
       port ad722de8-20ca-4f52-97bf-df29f9512d6e
           type: localport
           addresses: ["fa:16:3e:3d:96:cd 32.12.121.10"]
...
       port 067b57c8-2e50-4a86-a255-31e799a24ac5 (aka sc-arm-lnbp-taebatch-5_JSNX_APP_VLAN121_a029cd93)
           addresses: ["fa:16:3e:e5:0e:10 32.12.121.151"]
       port d2479952-a2e2-4c94-b5d9-5086f5a177ee (aka sc-arm-lnbp-mars-3_JSNX_APP_VLAN121_319305dc)
           addresses: ["fa:16:3e:7c:4e:35 32.12.121.194"]
       port 64ddd42f-6a70-4060-8782-bd5cad7c4983 (aka sc-arm-lnbp-clbatch-8_JSNX_APP_VLAN121_7a4ca62f)
           addresses: ["fa:16:3e:9c:56:aa 32.12.121.213"]
       port provnet-6612e9ff-5364-4788-9b27-0a8c6053bec3
           type: localnet
           tag: 121
           addresses: ["unknown"]
       port f004de9d-5e52-43c7-9094-364a07ac420f (aka sc-arm-lnbp-influxdb-3_JSNX_APP_VLAN121_7527d6dc)
           addresses: ["fa:16:3e:96:6b:45 32.12.121.231"]
```
     - Related openflows.
```
    cookie=0xd9b58719, duration=3803793.502s, table=37, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg15=0x37,metadata=0xd actions=set_field:0xd/0xffffff->tun_id,set_field:0x37->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:13
    cookie=0x997eb173, duration=3803793.502s, table=37, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg15=0x8005,metadata=0xd actions=set_field:0x2->reg15,resubmit(,39),set_field:0x8005->reg15,set_field:0xd/0xffffff->tun_id,set_field:0x8005->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:300,output:14,output:138,output:137,output:299,output:140,output:304,output:303,output:142,output:1,output:302,output:139,output:301,output:13,output:11,output:10,resubmit(,38)
    cookie=0xce52adfe, duration=3803793.502s, table=37, n_packets=2475804, n_bytes=162274574, idle_age=0, priority=100,reg15=0x8000,metadata=0xd actions=set_field:0x2->reg15,resubmit(,39),set_field:0x8000->reg15,set_field:0xd/0xffffff->tun_id,set_field:0x8000->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:300,output:14,output:138,output:137,output:299,output:140,output:304,output:303,output:142,output:1,output:302,output:139,output:301,output:13,output:11,output:10,resubmit(,38)
    cookie=0xb9dee9ea, duration=3803793.502s, table=37, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg15=0x8003,metadata=0xd actions=set_field:0xd/0xffffff->tun_id,set_field:0x8003->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:300,output:14,output:138,output:137,output:299,output:140,output:304,output:303,output:142,output:1,output:302,output:139,output:301,output:13,output:11,output:10,resubmit(,38)
```
     The datapath tunnel_key is 0xd, its localnet port tunnel_key is 0x1,
and its localport tunnel_key is 0x2. Judging from the flows for datapath
0xd, the localnet port was not found when the table 37 flows were
generated, which produced the unexpected flows. The other tables do
recognize the localnet port, so the flows they generate are correct. For
example, here is a flow from table 38:
```
    cookie=0xce52adfe, duration=3803727.055s, table=38, n_packets=2475787, n_bytes=162273290, idle_age=0, priority=100,reg15=0x8000,metadata=0xd actions=set_field:0x1->reg15,resubmit(,39),set_field:0x70->reg13,set_field:0x3->reg15,resubmit(,39),set_field:0x72->reg13,set_field:0x26->reg13,set_field:0x29->reg15,resubmit(,39),set_field:0x39->reg13,set_field:0x33->reg15,resubmit(,39),set_field:0x8000->reg15
```
     Based on the above information, we analyzed the related ovn-controller
code. The issue can occur only if the get_localnet_port function fails
to find the localnet port.
     However, deeper analysis of the code did not yield more useful
information.
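
For checking other chassis for the same symptom, a minimal sketch like the one below may help. It assumes the flow dump has been saved with one flow per line (the way ovs-ofctl prints it), and it only lists the ports that matching flows output to. The function names are my own, and deciding whether those ports are tunnel ports still has to be done against the bridge's port list.

```python
import re


def parse_flow(line):
    """Split one dumped flow into its match fields and its action string."""
    match, _, actions = line.strip().partition(" actions=")
    fields = {}
    for item in match.split(","):
        key, _, value = item.strip().partition("=")
        fields[key] = value
    return fields, actions


def flows_with_output(dump, table, metadata):
    """Return (reg15, [output ports]) for flows in `table` on datapath `metadata`."""
    hits = []
    for line in dump.splitlines():
        if " actions=" not in line:
            continue
        fields, actions = parse_flow(line)
        if fields.get("table") != str(table):
            continue
        if fields.get("metadata") != metadata:
            continue
        ports = [int(p) for p in re.findall(r"output:(\d+)", actions)]
        if ports:
            hits.append((fields.get("reg15"), ports))
    return hits


if __name__ == "__main__":
    import sys
    # Usage (file name and arguments are examples): python3 check.py flows.txt 37 0xd
    with open(sys.argv[1]) as f:
        for reg15, ports in flows_with_output(f.read(), sys.argv[2], sys.argv[3]):
            print(reg15, ports)
```

On a datapath with a localnet port, any table 37 flow reported here for that datapath's metadata value would be worth a closer look.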

Seems to me this could be a bug in the ovn-controller's incremental
processing (I-P) and should be fixed in recent versions.

Can you please run the command - "ovn-appctl -t ovn-controller
recompute" or "ovn-appctl -t ovn-controller inc-engine/recompute"
and see if the unexpected flows are deleted? This would
confirm whether it's an I-P bug or not.

Thanks
Numan





_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev