On 12/5/22 17:40, Dumitru Ceara wrote:
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> Hi,
>>
>> we’ve run into an issue where it was possible to create multiple identical
>> routes within an LR (same ip_prefix, nexthop, and route table).
>>
>> Initially the problem started after an OVN upgrade.  We use the python
>> ovsdbapp library, and we found a problem in python-ovs, which is described
>> here https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html
>> by my colleague Anton.  @Terry Wilson, please take a look at this.
>>
>> The problem itself touches OVN and OVS.  Sorry for the long read, but it
>> seems that there are a couple of bugs in different places, some of which
>> this RFC is meant to cover.
>>
>> How the issue was initially reproduced:
>>
>> 1. assume we have (at least) 2-Availability Zone OVN deployment
>>    (utilising ovn-ic infrastructure).
>> 2. create transit switch in IC NB
>> 3. create LR in each AZ, connect them to transit switch
>> 4. create one logical switch with a VIF port attached to local OVS &
>>    connect this logical switch to LR (e.g. 192.168.0.1/24)
>> 5. in one AZ, install 2 static routes in the LR with a create command
>>    (run the following command twice):
>>
>>    ovn-nbctl --id=@id create logical-router-static-route \
>>        ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 \
>>        -- add logical_router lr1 static_routes @id
>>
>> From then on, a couple of strange behaviours/bugs appear:
>>
>> 1. [possible problem] There is a duplicated route in the NB within a
>>    single LR.  The lflows are computed as an ECMP group with two
>>    identical routes:
>>
>>    table=11(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; reg8[16..31] = select(1, 2);)
>>    table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 1 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>>    table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 2 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>>
>>    Maybe it’s better to have some kind of handling for such routes,
>>    e.g. an ovsdb index or some logic in ovn-northd?
>>
>> 2. [bug] There is a duplicated route advertisement in the
>>    OVN_IC_Southbound:Route table.  IMO, this should be fixed by adding a
>>    new index to this table on availability_zone, transit_switch,
>>    ip_prefix, nexthop and route_table, and by adding logic to check
>>    whether the route was already advertised (covered in Patch #7).
>>
>> 3. [bug] The same route is learned over and over.  Each ovn-ic
>>    iteration in the opposite availability zone adds one more copy of the
>>    same route, creating thousands of identical routes each second.  This
>>    bug is covered by Patch #7.
>>
>> 4. [possible problem] After multiple routes are learned to the NB in the
>>    opposite availability zone, ovn-northd generates ECMP lflows.  Same as
>>    in #1: one in lr_in_ip_routing with select(<thousands of elements>)
>>    and thousands of identical records in lr_in_ip_routing_ecmp.  OVN
>>    allows installing up to UINT_MAX routes within an ECMP group.
>>
>> 5. [OVS bug?] I'd like someone from the OVS team to look at this.
>>    ovn-controller installed a very long OpenFlow group rule
>>    (group #3):
>>
>>    # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
>>    797824
>>
>>    When I try to dump groups with ovs-ofctl dump-groups br-int, I get
>>    the following error in the console:
>>
>>    # ovs-ofctl dump-groups br-int
>>    ovs-ofctl: OpenFlow packet receive failed (End of file)
>>
>>    In the ovs-vswitchd logs I see the following error, and after this
>>    line OVS is restarted:
>>
>>    2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()
> 
> This looks like an OVS bug to me.  Ilya, what do you think the best way
> to fix this is?
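
Items 2 and 3 of the quoted report propose a unique index on (availability_zone, transit_switch, ip_prefix, nexthop, route_table) plus an "already advertised" check. The Python sketch below is a toy model of why that stops the route multiplication: once rows are keyed on those columns, advertising and learning become idempotent. ICRoute, advertise() and learn_iteration() are hypothetical names, not ovn-ic code, and the route values are illustrative only.

```python
from typing import NamedTuple


class ICRoute(NamedTuple):
    """Hypothetical row key mirroring the proposed index columns."""
    availability_zone: str
    transit_switch: str
    ip_prefix: str
    nexthop: str
    route_table: str


def advertise(ic_sb_routes: set, route: ICRoute) -> bool:
    """Item 2: add a route only if an identical one is not advertised yet."""
    if route in ic_sb_routes:
        return False  # a unique index would reject this duplicate as well
    ic_sb_routes.add(route)
    return True


def learn_iteration(nb_routes: list, ic_sb_routes: set, check_existing: bool) -> None:
    """Item 3: one ovn-ic pass copying advertised routes into the NB."""
    for route in ic_sb_routes:
        if check_existing and route in nb_routes:
            continue  # already learned: skip instead of adding another copy
        nb_routes.append(route)


# Illustrative values only.
route = ICRoute("az1", "ts1", "1.2.3.4/32", "169.254.100.1", "")
sb_routes: set = set()
print(advertise(sb_routes, route), advertise(sb_routes, route))  # True False

for check in (False, True):
    nb: list = []
    for _ in range(3):
        learn_iteration(nb, sb_routes, check)
    print(check, len(nb))  # "False 3", then "True 1"
```

Without the check, every pass appends another copy of the same route, which is the growth described in item 3; with it, repeated passes leave exactly one learned route.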

This might be considered a bug in OVS.  In any case, OVS
should not crash; it should print an error and continue.

I'm not sure what the best way to fix that is; I need to look
closer at the code.

However...

>> 7. This problem with dump-groups raises some questions:
>>    1. Is there a limit on the number of buckets in a group?  Or a limit
>>       on the group string length?
>>    2. If yes, should OVN limit the number of buckets in a group on its
>>       side?  (Patches #4 && #6).

According to the OpenFlow 1.5 spec, there is a limit on the number
of buckets, but it is derived from the maximum bucket id, which
is close to the maximum 32-bit unsigned value.  So, there is no
meaningful limit until you reach the 32-bit range, which is unlikely.
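
To put a number on "close to a 32-bit value", a tiny calculation, assuming OFPG_BUCKET_MAX = 0xffffff00 as defined in the OpenFlow 1.5 enum ofp_group_bucket (worth double-checking against the spec text):

```python
# Usable bucket ids in OpenFlow 1.5 (sketch).
# Assumption: OFPG_BUCKET_MAX = 0xffffff00, per the OF1.5 enum ofp_group_bucket;
# the values above it are reserved (e.g. OFPG_BUCKET_FIRST/LAST/ALL).
OFPG_BUCKET_MAX = 0xFFFFFF00

max_buckets = OFPG_BUCKET_MAX + 1   # ids 0..OFPG_BUCKET_MAX, inclusive
unusable_ids = 2**32 - max_buckets  # 32-bit ids above OFPG_BUCKET_MAX

print(max_buckets, unusable_ids)    # 4294967041 255
```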

But there are other, indirect limits:

1. For the group modification message, the bucket should fit
   into a single OFPT_GROUP_MOD message (struct ofp_group_mod).
   This means that each bucket (including its actions) cannot take
   more than roughly 64K - 24 bytes (a rough figure that doesn't
   account for all the headers).

2. Actions within a bucket cannot exceed 64K bytes (but they
   are more limited by the total bucket size above).

3. In order to be dumpable with OFPMP_GROUP_DESC, each group
   (with all its buckets and their actions) must fit into a
   single OF message, i.e. 64K.  This is because multipart
   messages must contain an integral number of objects, and
   objects cannot be split between messages.  The 'object'
   for OFPMP_GROUP_DESC is a group, so we can't split it at
   the bucket level.
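
A rough Python model of the three limits above, handy for estimating whether a given group can still be dumped. The struct sizes are assumptions based on the OpenFlow 1.5 layouts (ofp_header 8 bytes, group_mod body 16, multipart reply header 16, group_desc entry 16, bucket header 8); action lengths are assumed to be already padded to 8 bytes, bucket properties are ignored, and the function names are hypothetical.

```python
# Rough per-message size checks for OpenFlow 1.5 groups (sketch).
OFP_MAX_MSG_LEN = 0xFFFF        # ofp_header.length is a 16-bit field
GROUP_MOD_HDR = 8 + 16          # ofp_header + ofp_group_mod body = 24 bytes
MP_REPLY_HDR = 8 + 8            # ofp_header + multipart type/flags/pad
GROUP_DESC_HDR = 16             # one group_desc entry, before its buckets
BUCKET_HDR = 8                  # len, action_array_len, bucket_id


def bucket_len(actions_len: int) -> int:
    """Encoded size of one bucket carrying actions_len bytes of actions."""
    return BUCKET_HDR + actions_len


def bucket_fits_group_mod(actions_len: int) -> bool:
    """Limit 1 (and, implicitly, limit 2): one bucket fits one OFPT_GROUP_MOD."""
    return GROUP_MOD_HDR + bucket_len(actions_len) <= OFP_MAX_MSG_LEN


def group_fits_desc_reply(actions_lens: list) -> bool:
    """Limit 3: the whole group, all buckets included, fits one OFPMP_GROUP_DESC reply."""
    total = MP_REPLY_HDR + GROUP_DESC_HDR + sum(bucket_len(a) for a in actions_lens)
    return total <= OFP_MAX_MSG_LEN


def max_dumpable_buckets(actions_len: int) -> int:
    """Largest number of equally sized buckets that can still be dumped."""
    return (OFP_MAX_MSG_LEN - MP_REPLY_HDR - GROUP_DESC_HDR) // bucket_len(actions_len)


print(bucket_fits_group_mod(65503), bucket_fits_group_mod(65504))  # True False
print(max_dumpable_buckets(64))                                     # 909
```

With 64-byte action lists, for example, fewer than a thousand buckets already exhaust a single OFPMP_GROUP_DESC reply, even though each bucket on its own is far below the GROUP_MOD limit.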

So, technically, a group with a very large number of buckets
can be created using OFPT_GROUP_MOD with OFPGC_INSERT_BUCKET,
but it will not be possible to dump that group with OFPMP_GROUP_DESC.
Depending on the size, it might still be possible to get group
stats with OFPMP_GROUP_STATS, since that reply does not contain
the actual buckets, only per-bucket stats, which might be smaller
in total size.
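
The same kind of arithmetic shows why OFPMP_GROUP_STATS can succeed where OFPMP_GROUP_DESC fails: each bucket costs a fixed 16-byte counter in the stats reply, versus an 8-byte header plus its actions in the desc reply. The struct sizes below are assumptions based on the OF1.3+ layouts, and the 2000-bucket group with 64 bytes of actions per bucket is hypothetical.

```python
# Per-group reply sizes: OFPMP_GROUP_STATS vs OFPMP_GROUP_DESC (sketch).
# Assumed struct sizes (OF1.3+ layouts): 16-byte multipart reply header,
# 40-byte group stats entry, 16-byte bucket counter (packet_count and
# byte_count), 16-byte group_desc entry, 8-byte bucket header.
OFP_MAX_MSG_LEN = 0xFFFF
MP_REPLY_HDR = 16
GROUP_STATS_HDR = 40
BUCKET_COUNTER = 16
GROUP_DESC_HDR = 16
BUCKET_HDR = 8


def stats_reply_len(n_buckets: int) -> int:
    """Bytes needed to report one group's stats: a fixed cost per bucket."""
    return MP_REPLY_HDR + GROUP_STATS_HDR + n_buckets * BUCKET_COUNTER


def desc_reply_len(n_buckets: int, actions_len: int) -> int:
    """Bytes needed to dump one group's description, actions included."""
    return MP_REPLY_HDR + GROUP_DESC_HDR + n_buckets * (BUCKET_HDR + actions_len)


max_stats_buckets = (OFP_MAX_MSG_LEN - MP_REPLY_HDR - GROUP_STATS_HDR) // BUCKET_COUNTER

n = 2000  # hypothetical bucket count, 64 bytes of actions per bucket
print(stats_reply_len(n), stats_reply_len(n) <= OFP_MAX_MSG_LEN)       # 32056 True
print(desc_reply_len(n, 64), desc_reply_len(n, 64) <= OFP_MAX_MSG_LEN)  # 144032 False
print(max_stats_buckets)                                                # 4092
```

Per bucket, the stats reply is cheaper whenever the actions are longer than 8 bytes, but it is still bounded: with these assumed sizes, one group's stats stop fitting into a single reply beyond 4092 buckets.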

OVN should definitely check and not create buckets with actions
longer than 64K minus some overhead.  OVN has the same issue with
OpenFlow rules as well, which is currently not handled in any
way.

I'm not sure if limiting the total number of buckets makes sense,
unless we're talking about the 32-bit range.

Processing of very large groups may be a performance concern
for OVS as you saw in the logs, but that's a different story
and can, potentially, be optimized if necessary.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
