Hi,

we ran into an issue where it is possible to create multiple identical
routes within a single LR (same ip_prefix, nexthop and route table).

Initially the problem started after an OVN upgrade.  We use the Python
ovsdbapp library, and we found a problem in python-ovs, which my colleague
Anton described here:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html
@Terry Wilson, please take a look at this.

The problem itself touches both OVN and OVS.  Sorry for the long read, but it
seems that there are several bugs in different places, some of which this RFC
is meant to cover.

How the issue was initially reproduced:

1. assume we have an OVN deployment with (at least) 2 Availability Zones
   (utilising ovn-ic infrastructure).
2. create a transit switch in IC NB
3. create an LR in each AZ and connect them to the transit switch
4. create one logical switch with a VIF port attached to the local OVS and
   connect this logical switch to the LR (e.g. 192.168.0.1/24)
5. in one AZ, install 2 static routes in the LR with a create command (run
   the following command twice):

   ovn-nbctl --id=@id create logical-router-static-route \
       ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 \
       -- add logical-router lr1 static_routes @id
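
To check whether step 5 really left duplicates in the NB database, the rows
can be grouped by their key columns.  The following is only a minimal sketch
and not part of the patch series: the helper name find_duplicate_routes is
made up for illustration, and it assumes that each input line carries one
route as ip_prefix,nexthop,route_table.

```shell
#!/bin/bash

# Hypothetical helper: print "<count> <row>" for every row that occurs more
# than once on stdin.  Each input line is expected to describe one route as
# "ip_prefix,nexthop,route_table".
find_duplicate_routes() {
    sort | uniq -c |
        awk '$1 > 1 { count = $1; sub(/^ *[0-9]+ /, ""); print count, $0 }'
}

# Demo with made-up rows: the first and third rows are identical.
printf '%s\n' \
    '1.2.3.4/32,192.168.0.10,' \
    '10.0.0.0/24,192.168.0.1,' \
    '1.2.3.4/32,192.168.0.10,' | find_duplicate_routes
# prints: 2 1.2.3.4/32,192.168.0.10,

# Against a live NB database the input could come from something like the
# following (note that this lists the routes of all routers, not only lr1):
#   ovn-nbctl --format=csv --no-headings \
#       --columns=ip_prefix,nexthop,route_table \
#       list Logical_Router_Static_Route | find_duplicate_routes
```

An empty output means that no two rows share the same key columns.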

From this point on, several strange behaviours/bugs appear:

1. [possible problem] There is a duplicated route in the NB within a
   single LR.  The lflows are computed as an ECMP group with two identical
   routes:

   table=11(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; reg8[16..31] = select(1, 2);
   table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 1 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
   table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 2 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)

   Maybe it would be better to handle such routes somehow, e.g. with an
   ovsdb index or some logic in ovn-northd?

2. [bug] There is a duplicated route advertisement in the
   OVN_IC_Southbound:Route table.  IMO, this should be fixed by adding a
   new index to this table on availability_zone, transit_switch,
   ip_prefix, nexthop and route_table, and by adding logic to check
   whether the route was already advertised (covered in Patch #7).

3. [bug] The same route is learned over and over.  Each ovn-ic iteration
   in the opposite availability zone adds one more copy of the same route,
   which creates thousands of identical routes each second.  This bug is
   covered by Patch #7.

4. [possible problem] After multiple routes are learned into the NB in the
   opposite availability zone, ovn-northd generates ECMP lflows, same as
   in #1: one in lr_in_ip_routing with select(<thousands of elements>)
   and thousands of identical records in lr_in_ip_routing_ecmp.  OVN allows
   installing UINT_MAX routes within an ECMP group.

5. [OVS bug?] I'd like someone from the OVS team to take a look at this.
   ovn-controller installed a very long OpenFlow group rule
   (group #3):

   # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
   797824

   When I try to dump groups with ovs-ofctl dump-groups br-int, I get
   the following error in the console:

   # ovs-ofctl dump-groups br-int
   ovs-ofctl: OpenFlow packet receive failed (End of file)

   In the ovs-vswitchd logs I see the following error, and right after
   this line OVS is restarted:

   2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()

   If I issue the command again, sometimes it prints the same error, but
   sometimes this one (I had another OVN LB on the dev machine, so there
   are extra groups):

   # ovs-ofctl dump-groups br-int
   NXST_GROUP_DESC reply (xid=0x2): flags=[more]
    group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
    group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
   2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length 56 exceeds remaining buckets data size 40
   NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
   00000000  01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
   00000010  00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
   00000020  a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
   00000030  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000040  00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
   00000050  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   00000060  00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
   00000070  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
   00000080  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000090  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
   000000a0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   000000b0  00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
   000000c0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   000000d0  00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
   000000e0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
   000000f0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000100  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
   00000110  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000120  00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
   00000130  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   00000140  00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
   00000150  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
   00000160  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000170  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
   00000180  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000190  00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
   000001a0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   000001b0  00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
   000001c0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
   000001d0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   000001e0  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
   000001f0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000200  00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
   00000210  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|

7. This problem with dumping groups raises some questions:
   1. Is there a limit on the number of buckets in a group?  Or a limit on
      the length of the group string?
   2. If yes, should OVN limit the number of buckets in a group on its
      side? (Patches #4 and #6).

8. I've also tried to find out at which values these problems with
   dump-groups begin.  I created multiple ECMP routes in OVN in a for-loop
   and saw that the error from the last example appears starting from
   1200 items in a group.  I tried to create 10k buckets, and while they
   were being configured on my machine, the following lines also showed up
   in the log file:

   2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
   2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
   2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting for main to quiesce

   When the routes finished creating, I issued ovs-ofctl dump-groups br-int
   and got just an error:

   # ovs-ofctl dump-groups br-int
   ovs-ofctl: OpenFlow packet receive failed (End of file)

   And OVS crashed. OVS 2.17.3 is used.

   My script:

# cat ./repro.sh
#!/bin/bash

count=$1

echo "Creating ${count} same routes..."

ovn-nbctl lr-route-del lr1 1.2.3.4/32

for i in $(seq 1 "${count}"); do
    echo "$i"
    ovn-nbctl --id=@id create logical-router-static-route \
        ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip \
        -- add logical-router vpc-FC7D6A54 static_routes @id
done
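
The assertion "start_ofs <= UINT16_MAX" and the warning "bucket length 56
exceeds remaining buckets data size 40" hint that a 16-bit length may be
overflowing.  This is a guess, not a confirmed diagnosis.  A small arithmetic
sketch under two assumptions: every bucket is 56 bytes long, as in the
warning, and the value 0xa908 (43272) at offset 0x20 of the hex dump is the
length of the buckets area (an interpretation of the dump that was not
checked against the OVS sources).  The function names are made up for
illustration.

```shell
#!/bin/bash

# Largest value a 16-bit length can hold (UINT16_MAX from the assertion).
UINT16_MAX=65535
# Bucket length in bytes, as reported in the ofp_group warning.
BUCKET_LEN=56

# Number of whole buckets of BUCKET_LEN bytes that fit into a 16-bit length.
max_buckets() {
    echo $(( UINT16_MAX / BUCKET_LEN ))
}

# Bytes left over when a buckets area of $1 bytes is cut into whole buckets.
leftover_bytes() {
    echo $(( $1 % BUCKET_LEN ))
}

max_buckets            # prints 1170
leftover_bytes 43272   # 43272 == 0xa908; prints 40
```

Under these assumptions about 1170 buckets fit into a 16-bit length, which is
in the same range as the 1200 items at which the errors started, and 43272 is
not a multiple of 56: it leaves exactly the 40 bytes that the warning
mentions.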

Thanks for reading this.  I'm ready to provide any additional information to
help investigate it.

Vladislav Odintsov (7):
  ic: move routes_ad hmap insert to separate function
  ic: remove orphan ovn interconnection routes
  ic: lookup southbound port_binding only if needed
  actions: limit possible OF group bucket count
  ic: minor code improvements
  northd: limit ECMP group by 1024 members
  ic: prevent advertising/learning multiple same routes

 ic/ovn-ic.c         | 123 ++++++++++++++++++++++++++++------------
 lib/actions.c       |  40 ++++++++++++-
 northd/northd.c     |   2 +-
 ovn-ic-sb.ovsschema |   6 +-
 tests/ovn-ic.at     | 133 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 263 insertions(+), 41 deletions(-)

-- 
2.36.1
