Hi,

we've run into an issue where it was possible to create multiple identical routes within one LR (same ip_prefix, nexthop and route table).
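To make "identical" concrete, here is a minimal sketch of how such duplicates could be spotted. It works on plain dicts, not on ovsdbapp rows; the function name, the dict layout and the 5.6.7.0/24 route are made up for illustration only.

```python
from collections import Counter


def duplicate_routes(routes):
    """Return {(ip_prefix, nexthop, route_table): count} for keys seen more than once."""
    counts = Counter(
        # A missing route_table is treated as the empty (main) table.
        (r["ip_prefix"], r["nexthop"], r.get("route_table", ""))
        for r in routes
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data: the same static route created twice, plus one other route.
routes = [
    {"ip_prefix": "1.2.3.4/32", "nexthop": "192.168.0.10"},
    {"ip_prefix": "1.2.3.4/32", "nexthop": "192.168.0.10"},
    {"ip_prefix": "5.6.7.0/24", "nexthop": "192.168.0.10"},
]

print(duplicate_routes(routes))
# {('1.2.3.4/32', '192.168.0.10', ''): 2}
```

An ovsdb index on the same columns would reject the second insert instead of reporting it afterwards.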
Initially the problem started after an OVN upgrade. We use the python ovsdbapp library, and we found a problem in python-ovs, which my colleague Anton describes here:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html
@Terry Wilson, please take a look at this.

The problem itself touches OVN and OVS. Sorry for the long read, but it seems that there are a couple of bugs in different places, some of which this RFC tries to cover.

How the issue was initially reproduced:

1. Assume we have an OVN deployment with (at least) 2 availability zones (utilising the ovn-ic infrastructure).
2. Create a transit switch in the IC NB.
3. Create an LR in each AZ and connect them to the transit switch.
4. Create one logical switch with a VIF port attached to the local OVS and connect this logical switch to the LR (e.g. 192.168.0.1/24).
5. In one AZ, install 2 static routes in the LR with a create command (invoke the next command twice):

    ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 -- add logical_router lr1 static_routes @id

From this point on, several strange behaviours/bugs appear:

1. [possible problem] There is a duplicated route in the NB within a single LR. The lflow is computed as an ECMP group with two identical routes:

    table=11(lr_in_ip_routing ), priority=97 , match=(reg7 == 0 && ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; reg8[16..31] = select(1, 2);)
    table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 1 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
    table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 2 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)

Maybe it would be better to handle such routes somehow, e.g. with an ovsdb index or some logic in ovn-northd?

2. [bug] There is a duplicated route advertisement in the OVN_IC_Southbound:Route table. IMO, this should be fixed by adding a new index to this table on availability_zone, transit_switch, ip_prefix, nexthop and route_table, and by adding logic to check whether the route was already advertised (covered in Patch #7).

3. [bug] The same route is learned over and over. Each ovn-ic iteration in the opposite availability zone adds one more identical route, which creates thousands of identical routes each second. This bug is covered by Patch #7.

4. [possible problem] After multiple routes are learned into the NB in the opposite availability zone, ovn-northd generates ECMP lflows, same as in #1: one in lr_in_ip_routing with select(<thousands of elements>) and thousands of identical records in lr_in_ip_routing_ecmp. OVN allows installing UINT_MAX routes within an ECMP group.

5. [OVS bug?] I'd like someone from the OVS team to look at this. ovn-controller installed a very long OpenFlow group rule (group #3):

    # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
    797824

When I try to dump groups with ovs-ofctl dump-groups br-int, I get the following error on the console:

    # ovs-ofctl dump-groups br-int
    ovs-ofctl: OpenFlow packet receive failed (End of file)

In the ovs-vswitchd log I see the following error, and after this line OVS is restarted:

    2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()

If I issue the command again, sometimes it prints the same error, but sometimes this one (I had another OVN LB on the dev machine, so there are extra groups):

    # ovs-ofctl dump-groups br-int
    NXST_GROUP_DESC reply (xid=0x2): flags=[more]
    group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
    group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
    2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length 56 exceeds remaining buckets data size 40
    NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
    00000000 01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
    00000010 00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
    00000020 a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
    00000030 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
    00000040 00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
    00000050 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
    00000060 00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
    00000070 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
    00000080 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
    00000090 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
    000000a0 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
    000000b0 00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
    000000c0 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
    000000d0 00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
    000000e0 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
    000000f0 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
    00000100 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
    00000110 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
    00000120 00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
    00000130 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
    00000140 00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
    00000150 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
    00000160 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
    00000170 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
    00000180 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
    00000190 00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
    000001a0 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
    000001b0 00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
    000001c0 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
    000001d0 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
    000001e0 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
    000001f0 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
    00000200 00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
    00000210 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|

7. This problem with the group dump raises some questions:
   1. Is there a limit on the number of buckets in a group, or on the length of the group string?
   2. If yes, should OVN limit the number of buckets in a group on its side? (Patches #4 and #6.)

8. I also tried to find out at which size these dump-groups problems begin. I created multiple ECMP routes in OVN in a for-loop and saw that the error from the last example appears starting from 1200 items in a group.
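A rough sanity check of that threshold, as a sketch rather than anything taken from the OVS sources. It assumes that a group description has to fit under a 16-bit OpenFlow length field (which the UINT16_MAX assertion above hints at), that every bucket takes 56 bytes as in the WARN line and the repeating "00 38" fields of the dump, and that the bytes "a9 08" at offset 0x20 are the bucket array length. Header overhead is ignored and all function names are made up.

```python
# Back-of-envelope arithmetic on the numbers in this report. Not OVS code.

UINT16_MAX = 0xFFFF  # widest value a 16-bit length field can hold
BUCKET_LEN = 56      # from "bucket length 56" in the WARN line


def max_buckets(limit=UINT16_MAX, bucket_len=BUCKET_LEN):
    """Upper bound on whole buckets that fit under a 16-bit length."""
    return limit // bucket_len


def leftover(array_len, bucket_len=BUCKET_LEN):
    """Bytes left once as many whole buckets as possible are consumed."""
    return array_len % bucket_len


def unwrapped_bucket_count(array_len, wraps=1, bucket_len=BUCKET_LEN):
    """Bucket count if the real length overflowed 16 bits `wraps` times, else None."""
    true_len = array_len + wraps * (UINT16_MAX + 1)
    if true_len % bucket_len:
        return None
    return true_len // bucket_len


# Assumption: "a9 08" at offset 0x20 of the dump is the bucket array length.
reported_len = 0xA908

print(max_buckets())                         # 1170
print(leftover(reported_len))                # 40
print(unwrapped_bucket_count(reported_len))  # 1943
```

Under these assumptions at most 1170 buckets fit, which is close to the 1200 items where the errors start. The leftover of 40 bytes matches "remaining buckets data size 40" in the WARN line, and the reported length is consistent with a 1943-bucket group whose length wrapped around 16 bits once. Whether this is really what happens is for the OVS folks to confirm.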
I also tried to create 10k buckets, and while this was being configured on my machine the following lines appeared in the log file:

    2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
    2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
    2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting for main to quiesce

When the routes had finished being created, I issued ovs-ofctl dump-groups br-int and got just an error:

    # ovs-ofctl dump-groups br-int
    ovs-ofctl: OpenFlow packet receive failed (End of file)

And OVS crashed. OVS 2.17.3 is used.

My script:

    # cat ./repro.sh
    #!/bin/bash

    count=$1
    echo "Creating ${count} same routes..."
    ovn-nbctl lr-route-del lr1 1.2.3.4/32
    for i in $(seq 1 ${count}); do
        echo $i
        ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip -- add logical-router vpc-FC7D6A54 static_routes @id
    done

Thanks for reading this. I'm ready to provide any additional information to help investigate it.

Vladislav Odintsov (7):
  ic: move routes_ad hmap insert to separate function
  ic: remove orphan ovn interconnection routes
  ic: lookup southbound port_binding only if needed
  actions: limit possible OF group bucket count
  ic: minor code improvements
  northd: limit ECMP group by 1024 members
  ic: prevent advertising/learning multiple same routes

 ic/ovn-ic.c         | 123 ++++++++++++++++++++++++++++------------
 lib/actions.c       |  40 ++++++++++++-
 northd/northd.c     |   2 +-
 ovn-ic-sb.ovsschema |   6 +-
 tests/ovn-ic.at     | 133 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 263 insertions(+), 41 deletions(-)

-- 
2.36.1
