On 12/2/22 18:31, Vladislav Odintsov wrote:
> Hi,
>
> we’ve run into an issue where it was possible to create multiple identical
> routes within an LR (same ip_prefix, nexthop, and route table).
>
> Initially the problem started after an OVN upgrade. We use the Python
> ovsdbapp library,
> and we found a problem in python-ovs, which is described here
> https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html by my
> colleague Anton. @Terry Wilson, please take a look at this.
>
> The problem itself touches both OVN and OVS. Sorry for the long read, but it
> seems that there are a couple of bugs in different places, some of which this
> RFC is meant to cover.
>
> How the issue was initially reproduced:
>
> 1. assume we have (at least) 2-Availability Zone OVN deployment
> (utilising ovn-ic infrastructure).
> 2. create transit switch in IC NB
> 3. create LR in each AZ, connect them to transit switch
> 4. create one logical switch with a VIF port attached to local OVS &
> connect this logical switch to LR (e.g. 192.168.0.1/24)
> 5. in one AZ, install two static routes in the LR with a create command
> (invoke the following command twice):
>
> ovn-nbctl --id=@id create logical-router-static-route \
>     ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 -- \
>     add logical_router lr1 static_routes @id
>
> From this point on, a couple of strange behaviours/bugs appear:
>
> 1. [possible problem] There is a duplicated route in the NB within a
> single LR. The lflow is computed with an ECMP group containing two
> identical routes:
>
> table=11(lr_in_ip_routing ), priority=97 , match=(reg7 == 0 && ip4.dst
> == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1;
> reg8[16..31] = select(1, 2);
> table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 1
> && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1;
> eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
> table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 2
> && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1;
> eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>
> Maybe it would be better to handle such routes in some way, e.g. with an
> ovsdb index or some logic in ovn-northd?
>
> 2. [bug] There is a duplicated route advertisement in the
> OVN_IC_Southbound:Route table. IMO, this should be fixed by adding a
> new index to this table on availability_zone, transit_switch,
> ip_prefix, nexthop and route_table, and by adding logic to check whether
> the route was already advertised (covered in Patch #7).
>
> 3. [bug] The same route is learned over and over. Each ovn-ic iteration
> on the opposite availability zone adds one more copy of the route. It
> creates thousands of identical routes each second. This bug is covered
> by Patch #7.
>
> 4. [possible problem] After multiple routes are learned into the NB on the
> opposite availability zone, ovn-northd generates ECMP lflows. Same as
> in #1: one in lr_in_ip_routing with select(<thousands of elements>)
> and thousands of identical records in lr_in_ip_routing_ecmp. OVN allows
> installing up to UINT_MAX routes within an ECMP group.
>
> 5. [OVS bug?] I'd like someone from the OVS team to look at this.
> ovn-controller installed a very long OpenFlow group rule
> (group #3):
>
> # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
> 797824
>
> When I try to dump groups with ovs-ofctl dump-groups br-int, I get
> the following error in the console:
>
> # ovs-ofctl dump-groups br-int
> ovs-ofctl: OpenFlow packet receive failed (End of file)
>
> In the ovs-vswitchd logs I see the following error, and after this line
> OVS is restarted:
>
> 2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion
> start_ofs <= UINT16_MAX failed in ofpmp_postappend()
This looks like an OVS bug to me. Ilya, what do you think the best way
to fix this is?
>
> If I issue the command again, sometimes it prints the same error, but
> sometimes this one (I had another OVN LB on the dev machine, so there
> are extra groups):
>
> # ovs-ofctl dump-groups br-int
> NXST_GROUP_DESC reply (xid=0x2): flags=[more]
>
> group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
>
> group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
> 2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length
> 56 exceeds remaining buckets data size 40
> NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
> 00000000 01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
> 00000010 00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
> 00000020 a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
> 00000030 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000040 00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
> 00000050 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 00000060 00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 00000070 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
> 00000080 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 00000090 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
> 000000a0 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 000000b0 00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
> 000000c0 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 000000d0 00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 000000e0 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
> 000000f0 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 00000100 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
> 00000110 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000120 00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
> 00000130 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 00000140 00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 00000150 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
> 00000160 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 00000170 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
> 00000180 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000190 00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
> 000001a0 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 000001b0 00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
> 000001c0 00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
> 000001d0 ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
> 000001e0 00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
> 000001f0 ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
> 00000200 00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
> 00000210 00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>
> 7. This problem with dumping groups raises some questions:
> 1. Is there a limit on the bucket count in a group? Or a limit on the
> group string length?
> 2. If yes, should OVN limit the number of buckets in a group on its
> side? (Patches #4 && #6).
>
> 8. I also tried to find the values at which these problems with
> dump-groups begin. I created multiple ECMP routes in OVN in a for-loop
> and saw that the error from the last example appears starting from about
> 1200 items in a group. I tried to create 10k buckets, and while they were
> being configured on my machine, the following lines also appeared in the
> logfile:
>
> 2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting
> for main to quiesce
> 2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting
> for main to quiesce
> 2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting
> for main to quiesce
>
> When the routes had finished being created, I issued ovs-ofctl dump-groups
> br-int and there was just an error:
>
> # ovs-ofctl dump-groups br-int
> ovs-ofctl: OpenFlow packet receive failed (End of file)
>
> And OVS crashed. OVS 2.17.3 is used.
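
A rough back-of-the-envelope check on the numbers above, not a statement about how OVS actually builds the reply. The hex dump shows each bucket starting with 00 38, i.e. a 56-byte bucket, and the failed assertion compares an offset against UINT16_MAX. Assuming the bucket size stays at 56 bytes and that the description of one group has to fit under a 16-bit length, the sketch below estimates how many such buckets fit. The function names and the `overhead` parameter (for headers I have not counted) are hypothetical.

```python
UINT16_MAX = 0xFFFF   # bound used by the assertion in ofpmp_postappend()
BUCKET_LEN = 0x38     # 56 bytes per bucket, read from the hex dump above


def max_buckets(bucket_len=BUCKET_LEN, limit=UINT16_MAX, overhead=0):
    """Estimate how many equally sized buckets fit under `limit` bytes.

    `overhead` is a hypothetical allowance for message and group headers.
    """
    return (limit - overhead) // bucket_len


def fits(bucket_count, bucket_len=BUCKET_LEN, limit=UINT16_MAX, overhead=0):
    """Return True if `bucket_count` buckets stay within `limit` bytes."""
    return overhead + bucket_count * bucket_len <= limit


if __name__ == "__main__":
    print(max_buckets())   # 1170
    print(fits(1024))      # True
    print(fits(1200))      # False
```

Under these assumptions the estimate lands at 1170 buckets, which is in the same range as the ~1200 where the decode errors started, and a 1024-member cap would stay below it.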
>
> My script:
>
> # cat ./repro.sh
> #!/bin/bash
>
> count=$1
>
> echo "Creating ${count} same routes..."
>
> ovn-nbctl lr-route-del lr1 1.2.3.4/32
>
> for i in $(seq 1 "${count}"); do
>     echo "$i"
>     # Create one more copy of the same route and attach it to the router.
>     ovn-nbctl --id=@id create logical-router-static-route \
>         ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip -- \
>         add logical-router vpc-FC7D6A54 static_routes @id
> done
>
> Thanks for reading this, I'm ready to provide any additional information to
> help investigate this.
>
> Vladislav Odintsov (7):
> ic: move routes_ad hmap insert to separate function
> ic: remove orphan ovn interconnection routes
> ic: lookup southbound port_binding only if needed
> actions: limit possible OF group bucket count
> ic: minor code improvements
> northd: limit ECMP group by 1024 members
> ic: prevent advertising/learning multiple same routes
>
> ic/ovn-ic.c | 123 ++++++++++++++++++++++++++++------------
> lib/actions.c | 40 ++++++++++++-
> northd/northd.c | 2 +-
> ovn-ic-sb.ovsschema | 6 +-
> tests/ovn-ic.at | 133 ++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 263 insertions(+), 41 deletions(-)
>
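
To make the intent of "prevent advertising/learning multiple same routes" and "limit ECMP group by 1024 members" concrete, here is a minimal sketch of the idea in Python. It is not the code from the patches (those are in C), and `Route`, `dedup_routes`, `ecmp_members` and `MAX_ECMP_MEMBERS` are hypothetical names. It assumes two routes are "the same" when ip_prefix, nexthop and route_table match as plain strings, without normalizing prefixes.

```python
from typing import Iterable, List, NamedTuple


class Route(NamedTuple):
    """Hypothetical route key: the three fields named in the report."""
    ip_prefix: str
    nexthop: str
    route_table: str = ""


MAX_ECMP_MEMBERS = 1024  # cap taken from the patch title, assumed here


def dedup_routes(routes: Iterable[Route]) -> List[Route]:
    """Drop routes whose (ip_prefix, nexthop, route_table) was already seen.

    The first occurrence is kept and the original order is preserved.
    """
    seen = set()
    unique = []
    for route in routes:
        if route not in seen:
            seen.add(route)
            unique.append(route)
    return unique


def ecmp_members(routes: Iterable[Route],
                 limit: int = MAX_ECMP_MEMBERS) -> List[Route]:
    """Return the deduplicated routes, truncated to at most `limit` entries."""
    return dedup_routes(routes)[:limit]


if __name__ == "__main__":
    dup = [Route("1.2.3.4/32", "192.168.0.10")] * 2
    print(len(dedup_routes(dup)))  # 1
```

With a check like this, the two identical routes from the reproduction collapse into a single entry, so no ECMP group is built for them at all.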