Hi Numan,

I have found steps to reproduce the issue and I know the cause. The steps are below:

1. Create a switch that includes a localnet port.
```
switch b6fc4114-9974-431e-aab7-c538300fde56 (neutron-27c5d497-b015-40bc-aeae-3cabad59d1bb) (aka gj-test-vlan3)
    port provnet-75041986-086e-425a-b10c-fb23557dbe0a
        type: localnet
        tag: 1670
        addresses: ["unknown"]
    port 72108ae2-168e-40b1-ac3e-13fb6e8bce0b
        type: localport
        addresses: ["fa:16:3e:de:dc:6a 192.168.33.2"]
    port 85762ca1-0999-4fda-8ba3-2c5319eba9f0 (aka gj-test-vlan3-vm-2_gj-test-vlan3_5ff1568b)
        addresses: ["fa:16:3e:f5:99:03 192.168.33.219"]
    port 74fa21c3-0958-480e-9902-48cea597d628 (aka gj-test-vlan3-vm-1_gj-test-vlan3_2329e3a0)
        addresses: ["fa:16:3e:ed:74:d1 192.168.33.120"]
```
2. vm1 is on node-1, and its logical switch port (lsp) is shown below.
```
()[root@ovn-tool-0 /]# ovn-nbctl list logical_switch_port gj-test-vlan3-vm-1_gj-test-vlan3_2329e3a0
_uuid : a9f54a41-17ee-425e-bf18-a3e43e900dd8
addresses : ["fa:16:3e:ed:74:d1 192.168.33.120"]
dhcpv4_options : 2a2c6bbb-721d-43c0-8676-33242b770ce2
dhcpv6_options : []
dynamic_addresses : []
enabled : true
external_ids : {"neutron:cidrs"="192.168.33.120/24", "neutron:device_id"="67ac6967-07ef-4b27-a7f8-09b2d8a75dfe", "neutron:device_owner"="compute:default-az", "neutron:network_name"=neutron-27c5d497-b015-40bc-aeae-3cabad59d1bb, "neutron:port_name"=gj-test-vlan3-vm-1_gj-test-vlan3_2329e3a0, "neutron:project_id"=fbbc069bef5d4eca828d01592a1f03b3, "neutron:revision_number"="16", "neutron:security_group_ids"="2f26d220-f490-49e9-96d7-b9b72e57cc3b"}
ha_chassis_group : []
name : "74fa21c3-0958-480e-9902-48cea597d628"
options : {requested-chassis=node-1.domain.tld}
parent_name : []
port_security : ["fa:16:3e:ed:74:d1 192.168.33.120"]
tag : []
tag_request : []
type : ""
up : true
```
3. Move the above lsp to node-2.
```
ovn-nbctl lsp-set-options a9f54a41-17ee-425e-bf18-a3e43e900dd8 requested-chassis=node-2.domain.tld
```
4. After that, move the lsp back to node-1.

Running the above steps reproduces the issue. (If it does not reproduce, repeat the steps several times.)
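Since the steps may need to be repeated, steps 3 and 4 can be scripted. The following is a minimal sketch that only builds and prints the `ovn-nbctl lsp-set-options` invocations used above and runs nothing. The helper name `flap_commands` and its parameters are made up for illustration, and the sketch assumes `requested-chassis` is the only option on the port, as in the listing above.

```python
import shlex


def flap_commands(lsp, home_chassis, other_chassis, rounds=1):
    """Build the ovn-nbctl calls that move `lsp` away and back `rounds` times.

    Hypothetical helper: it only constructs argument lists, it runs nothing.
    """
    commands = []
    for _ in range(rounds):
        # Step 3 moves the port to the other chassis, step 4 moves it back.
        for chassis in (other_chassis, home_chassis):
            commands.append(
                ["ovn-nbctl", "lsp-set-options", lsp,
                 f"requested-chassis={chassis}"]
            )
    return commands


if __name__ == "__main__":
    for argv in flap_commands(
        "a9f54a41-17ee-425e-bf18-a3e43e900dd8",
        "node-1.domain.tld",
        "node-2.domain.tld",
        rounds=3,
    ):
        # Print each command so it can be reviewed before being run by hand.
        print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed lines can be pasted into a shell on the host that has access to the northbound database.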
The cause is as follows.

When step 3 runs, node-1 removes the datapath of gj-test-vlan3 from local_datapaths, and therefore also removes all related openflows.

When step 4 runs, the datapath of gj-test-vlan3 is added back to local_datapaths and the port_binding table is updated. Under I-P, the runtime_data engine runs the runtime_data_sb_port_binding_handler function, which calls the binding_handle_port_binding_changes function. Because the updated port is a vif port, only the handle_updated_vif_lport function is called. The update_ld_localnet_port function, which sets the localnet port on the datapath, is not called, so the localnet port is never set on the datapath of gj-test-vlan3. As a result, the non-expected openflows are generated.

I reviewed recent ovn-controller patches and found one that should solve the issue: https://github.com/ovn-org/ovn/commit/50b3af8938c93491d429dcabe8f9902f0aa43426
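The sequence described above can be illustrated with a toy model. This is a sketch under stated assumptions, not ovn-controller code: the class `ToyChassis` and its methods are invented for illustration, they only borrow names from the functions mentioned above, and the model reduces a local datapath to a single "localnet port recorded or not" value.

```python
class ToyChassis:
    """Toy model of one chassis' local_datapaths; not ovn-controller code."""

    def __init__(self, localnet_by_datapath):
        # Datapath name -> localnet port name, as known from the southbound DB.
        self.localnet_by_datapath = localnet_by_datapath
        # Datapath name -> localnet port recorded for it (None if never set).
        self.local_datapaths = {}

    def recompute(self, bound_datapaths):
        """Full recompute: rebuild each local datapath, localnet port included."""
        self.local_datapaths = {
            dp: self.localnet_by_datapath.get(dp) for dp in bound_datapaths
        }

    def handle_vif_released(self, dp):
        """Step 3: the VIF moves away, so the datapath is dropped locally."""
        self.local_datapaths.pop(dp, None)

    def handle_updated_vif_lport(self, dp):
        """Step 4: the incremental path re-adds the datapath only.

        The localnet port is left unset, mirroring the gap described above.
        """
        self.local_datapaths.setdefault(dp, None)

    def table37_sends_to_tunnels(self, dp):
        """As described in this thread, a missing localnet port on the
        datapath leads to tunnel-output flows in table 37."""
        return self.local_datapaths[dp] is None


def reproduce():
    """Return the table 37 state after startup, after steps 3+4, after recompute."""
    dp = "gj-test-vlan3"
    node1 = ToyChassis({dp: "provnet-75041986-086e-425a-b10c-fb23557dbe0a"})
    states = []
    node1.recompute({dp})              # initial state, localnet port known
    states.append(node1.table37_sends_to_tunnels(dp))
    node1.handle_vif_released(dp)      # step 3
    node1.handle_updated_vif_lport(dp) # step 4
    states.append(node1.table37_sends_to_tunnels(dp))
    node1.recompute({dp})              # restart or forced recompute
    states.append(node1.table37_sends_to_tunnels(dp))
    return states


if __name__ == "__main__":
    print(reproduce())
```

In this model a full recompute restores the localnet port, which is consistent with the restart having cleared the problem.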
I will verify the above patch to check my guess.

Thanks,
Jun

On 7/19/24 23:01, Numan Siddique wrote:
On Thu, Jul 18, 2024 at 10:17 PM Jun Gu <[email protected]> wrote:

Hi Numan,

Thanks for your reply. Unfortunately, we have restarted ovn-controller and the issue is fixed now. Currently, I am analyzing the code to confirm whether some situation can cause the issue. Do you have any suggestions for me?

The fact that the restart fixed the issue suggests that it was most likely an I-P issue. It would be great if you could consider upgrading OVN to the latest version or the latest LTS version.

I'd suggest deleting the localnet port and re-adding it to see whether those unexpected flows are present or not.

Next time the issue is seen, please run the ovn-appctl command to see if it fixes the issue.

Numan

Thanks
Jun

On 7/19/24 00:52, Numan Siddique wrote:

On Wed, Jul 17, 2024 at 6:07 AM Jun Gu <[email protected]> wrote:

Hi team,

The versions are below:
OVN version: branch-21.09
OVS version: branch-2.16.2

We encountered an issue where ovn-controller generates some non-expected openflows that forward packets to tunnel ports in table 37 when a localnet port exists on the datapath. The related information is below:

- Related datapath and ports.
```
switch d9954742-5f9a-440d-8f4a-e9cba08ff5b3 (neutron-102c1351-65a6-4640-8310-c3cd446a5b69) (aka JSNX_APP_VLAN121)
    port b4decafe-1948-4214-b021-393953bf94c3 (aka sc-arm-lnbp-clbatch-13_JSNX_APP_VLAN121_d36c91ab)
        addresses: ["fa:16:3e:32:ca:7e 32.12.121.150"]
    port 87e5ac12-80b8-4477-af59-dfb0fc0a43ff (aka sc-arm-lnbp-pfservice-6_JSNX_APP_VLAN121_109c4c56)
        addresses: ["fa:16:3e:e7:58:37 32.12.121.103"]
    port ad722de8-20ca-4f52-97bf-df29f9512d6e
        type: localport
        addresses: ["fa:16:3e:3d:96:cd 32.12.121.10"]
    ...
    port 067b57c8-2e50-4a86-a255-31e799a24ac5 (aka sc-arm-lnbp-taebatch-5_JSNX_APP_VLAN121_a029cd93)
        addresses: ["fa:16:3e:e5:0e:10 32.12.121.151"]
    port d2479952-a2e2-4c94-b5d9-5086f5a177ee (aka sc-arm-lnbp-mars-3_JSNX_APP_VLAN121_319305dc)
        addresses: ["fa:16:3e:7c:4e:35 32.12.121.194"]
    port 64ddd42f-6a70-4060-8782-bd5cad7c4983 (aka sc-arm-lnbp-clbatch-8_JSNX_APP_VLAN121_7a4ca62f)
        addresses: ["fa:16:3e:9c:56:aa 32.12.121.213"]
    port provnet-6612e9ff-5364-4788-9b27-0a8c6053bec3
        type: localnet
        tag: 121
        addresses: ["unknown"]
    port f004de9d-5e52-43c7-9094-364a07ac420f (aka sc-arm-lnbp-influxdb-3_JSNX_APP_VLAN121_7527d6dc)
        addresses: ["fa:16:3e:96:6b:45 32.12.121.231"]
```
- Related openflows.
```
cookie=0xd9b58719, duration=3803793.502s, table=37, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg15=0x37,metadata=0xd actions=set_field:0xd/0xffffff->tun_id,set_field:0x37->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:13
cookie=0x997eb173, duration=3803793.502s, table=37, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg15=0x8005,metadata=0xd actions=set_field:0x2->reg15,resubmit(,39),set_field:0x8005->reg15,set_field:0xd/0xffffff->tun_id,set_field:0x8005->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:300,output:14,output:138,output:137,output:299,output:140,output:304,output:303,output:142,output:1,output:302,output:139,output:301,output:13,output:11,output:10,resubmit(,38)
cookie=0xce52adfe, duration=3803793.502s, table=37, n_packets=2475804, n_bytes=162274574, idle_age=0, priority=100,reg15=0x8000,metadata=0xd actions=set_field:0x2->reg15,resubmit(,39),set_field:0x8000->reg15,set_field:0xd/0xffffff->tun_id,set_field:0x8000->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:300,output:14,output:138,output:137,output:299,output:140,output:304,output:303,output:142,output:1,output:302,output:139,output:301,output:13,output:11,output:10,resubmit(,38)
cookie=0xb9dee9ea, duration=3803793.502s, table=37, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg15=0x8003,metadata=0xd actions=set_field:0xd/0xffffff->tun_id,set_field:0x8003->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:300,output:14,output:138,output:137,output:299,output:140,output:304,output:303,output:142,output:1,output:302,output:139,output:301,output:13,output:11,output:10,resubmit(,38)
```
The datapath tunnel_key is 0xd, its localnet port tunnel_key is 0x1, and its localport tunnel_key is 0x2. The openflows for datapath 0xd show that the localnet port was not found when the table 37 flows were generated, which produced the non-expected openflows. Other tables do identify the localnet port, so the openflows they generate are correct. For example, an openflow from table 38:
```
cookie=0xce52adfe, duration=3803727.055s, table=38, n_packets=2475787, n_bytes=162273290, idle_age=0, priority=100,reg15=0x8000,metadata=0xd actions=set_field:0x1->reg15,resubmit(,39),set_field:0x70->reg13,set_field:0x3->reg15,resubmit(,39),set_field:0x72->reg13,set_field:0x26->reg13,set_field:0x29->reg15,resubmit(,39),set_field:0x39->reg13,set_field:0x33->reg15,resubmit(,39),set_field:0x8000->reg15
```
Based on the above information, we analyzed the related ovn-controller code. The issue occurs only under one condition: the get_localnet_port function does not find the localnet port. However, a deeper analysis of the code did not yield more useful information.

Seems to me this could be a bug in ovn-controller's incremental processing (I-P) and should be fixed in recent versions.

Can you please run the command "ovn-appctl -t ovn-controller recompute" or "ovn-appctl -t ovn-controller inc-engine/recompute" and see if the non-expected openflows are deleted? This would confirm whether it's an I-P bug or not.
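Whether a recompute removed the flows can be checked by scanning a saved flow dump taken before and after. The following is a minimal sketch, assuming the dump is plain text in the format quoted in this thread and treating any `output:` action in table 37 for the given datapath as suspicious. The function name `flows_with_output` is made up for illustration; the two sample lines are copied from the dumps in this thread.

```python
import re

SAMPLE_DUMP = """
cookie=0xd9b58719, duration=3803793.502s, table=37, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg15=0x37,metadata=0xd actions=set_field:0xd/0xffffff->tun_id,set_field:0x37->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:13
cookie=0xce52adfe, duration=3803727.055s, table=38, n_packets=2475787, n_bytes=162273290, idle_age=0, priority=100,reg15=0x8000,metadata=0xd actions=set_field:0x1->reg15,resubmit(,39),set_field:0x70->reg13,set_field:0x3->reg15,resubmit(,39),set_field:0x72->reg13,set_field:0x26->reg13,set_field:0x29->reg15,resubmit(,39),set_field:0x39->reg13,set_field:0x33->reg15,resubmit(,39),set_field:0x8000->reg15
"""


def flows_with_output(dump_text, table, metadata):
    """Return (cookie, [ofports]) for flows in `table` that match `metadata`
    and whose actions contain at least one output:N action."""
    hits = []
    for line in dump_text.splitlines():
        line = line.strip()
        if f"table={table}," not in line:
            continue
        # Separate the match fields from the action list.
        match_part, sep, actions = line.partition(" actions=")
        if not sep:
            continue
        # Require an exact metadata value (0xd must not match 0xd1).
        if not re.search(rf"\bmetadata={re.escape(metadata)}\b", match_part):
            continue
        ports = [int(p) for p in re.findall(r"\boutput:(\d+)", actions)]
        if ports:
            cookie = re.search(r"cookie=(0x[0-9a-fA-F]+)", match_part)
            hits.append((cookie.group(1) if cookie else None, ports))
    return hits


if __name__ == "__main__":
    for cookie, ports in flows_with_output(SAMPLE_DUMP, 37, "0xd"):
        print(cookie, ports)
```

If the same scan returns an empty list after the recompute, the flows were removed, which would point to an I-P problem as suggested above.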
Thanks
Numan

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
