On Mon, Oct 24, 2022 at 8:53 AM Ales Musil <[email protected]> wrote:
> Hi Mark, > > please see my comment below. > > > On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <[email protected]> > wrote: > >> An issue was filed where CT zone 0 was assigned to both a logical router >> SNAT and to a logical port. CT zone 0 is typically "reserved" and not >> assigned by ovn-controller; however, since SNAT zones are configurable, >> it is possible for ovn-controller to assign this zone at the CMS's >> request. This accounts for how CT zone 0 can be assigned for SNAT. There >> was also a small bug in the incremental processing that could result in >> a logical port being assigned zone 0. >> >> In the specific issue report, CT zones were restored from the OVSDB when >> ovn-controller started, and the conflicting CT zones were already >> present. ovn-controller dutifully loaded these zones up. But then there >> was nothing that would allow for the conflict to be resolved afterwards. >> >> It is unknown how these conflicts entered into the OVSDB in the first >> place. This change does not purport to prevent conflicts from entering >> the OVSDB. However, it does make the following changes that should >> further safeguard against unwanted behavior: >> >> * ct_zones_runtime_data_handler() now assigns zones starting at >> 1 instead of 0. This makes it use the same range as update_ct_zones(). >> * update_ct_zones() now keeps a new simap for zones that are assigned >> but not due to an SNAT zone request. This allows for us to guarantee >> that when there is a conflict between a previously auto-assigned CT >> zone and a newly-requested zone, we are guaranteed to remove the >> auto-assigned CT zone. >> * The removal of conflicting auto-assigned CT zones is now performed >> before dealing with the newly requested zone. This makes it so that if >> we load conflicting zones from the OVSDB or if there is some issue >> that results in conflicting zones being assigned, we will correct the >> issue. >> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406 >> Signed-off-by: Mark Michelson <[email protected]> >> --- >> controller/ovn-controller.c | 53 ++++++++++++++++++------------------- >> 1 file changed, 26 insertions(+), 27 deletions(-) >> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 8895c7a2b..5fd9bbc40 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -659,7 +659,8 @@ update_ct_zones(const struct shash *binding_lports, >> const char *user; >> struct sset all_users = SSET_INITIALIZER(&all_users); >> struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones); >> - unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)]; >> + unsigned long unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)]; >> + struct simap unreq_snat_zones = SIMAP_INITIALIZER(&unreq_snat_zones); >> >> struct shash_node *shash_node; >> SHASH_FOR_EACH (shash_node, binding_lports) { >> @@ -696,49 +697,46 @@ update_ct_zones(const struct shash *binding_lports, >> bitmap_set0(ct_zone_bitmap, ct_zone->data); >> simap_delete(ct_zones, ct_zone); >> } else if (!simap_find(&req_snat_zones, ct_zone->name)) { >> - bitmap_set1(unreq_snat_zones, ct_zone->data); >> + bitmap_set1(unreq_snat_zones_map, ct_zone->data); >> + simap_put(&unreq_snat_zones, ct_zone->name, ct_zone->data); >> } >> } >> >> /* Prioritize requested CT zones */ >> struct simap_node *snat_req_node; >> SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) { >> - struct simap_node *node = simap_find(ct_zones, >> snat_req_node->name); >> - if (node) { >> - if (node->data == snat_req_node->data) { >> - /* No change to this request, so no action needed */ >> - continue; >> - } else { >> - /* Zone request has changed for this node. delete old >> entry */ >> - bitmap_set0(ct_zone_bitmap, node->data); >> - simap_delete(ct_zones, node); >> - } >> - } >> - >> /* Determine if someone already had this zone auto-assigned. >> * If so, then they need to give up their assignment since >> * that zone is being explicitly requested now. >> */ >> - if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) { >> - struct simap_node *dup; >> - SIMAP_FOR_EACH_SAFE (dup, ct_zones) { >> - if (dup != snat_req_node && dup->data == >> snat_req_node->data) { >> - simap_delete(ct_zones, dup); >> + if (bitmap_is_set(unreq_snat_zones_map, snat_req_node->data)) { >> + struct simap_node *unreq_node; >> + SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) { >> + if (unreq_node->data == snat_req_node->data) { >> break; >> } >> } >> > > This does not seem right, considering the scenario when we have two zones > A=0 and B=0 and at the same time requesting SNAT=0, > the B wouldn't be cleared and there would be conflict. Which makes me > wonder if it's not the scenario of how we got into the same zone scenario > in the first place? > IMO it would make more sense to run the simap_find_and_delete(ct_zones, > unreq_node->name); in the loop without break, to really remove anything > that has the same id. > Having this patch applied the could just run recompute without worrying > how it got into this situation in the first place and it should fix it. > WDYT? > > >> + ovs_assert(unreq_node); >> + >> + simap_find_and_delete(ct_zones, unreq_node->name); >> + >> /* Set this bit to 0 so that if multiple datapaths have >> requested >> * this zone, we don't needlessly double-detect this >> condition. >> */ >> - bitmap_set0(unreq_snat_zones, snat_req_node->data); >> + bitmap_set0(unreq_snat_zones_map, snat_req_node->data); >> } >> >> - add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED, >> - snat_req_node->data, true, >> - snat_req_node->name); >> - >> - bitmap_set1(ct_zone_bitmap, snat_req_node->data); >> - simap_put(ct_zones, snat_req_node->name, snat_req_node->data); >> + struct simap_node *node = simap_find(ct_zones, >> snat_req_node->name); >> + if (node && node->data != snat_req_node->data) { >> + /* Zone request has changed for this node. delete old entry >> and >> + * create new one*/ >> + add_pending_ct_zone_entry(pending_ct_zones, >> CT_ZONE_OF_QUEUED, >> + snat_req_node->data, true, >> + snat_req_node->name); >> + bitmap_set0(ct_zone_bitmap, node->data); >> + bitmap_set1(ct_zone_bitmap, snat_req_node->data); >> + node->data = snat_req_node->data; >> + } >> } >> >> /* xxx This is wasteful to assign a zone to each port--even if no >> @@ -756,6 +754,7 @@ update_ct_zones(const struct shash *binding_lports, >> } >> >> simap_destroy(&req_snat_zones); >> + simap_destroy(&unreq_snat_zones); >> sset_destroy(&all_users); >> } >> >> @@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct engine_node >> *node, void *data) >> >> struct hmap *tracked_dp_bindings = &rt_data->tracked_dp_bindings; >> struct tracked_datapath *tdp; >> - int scan_start = 0; >> + int scan_start = 1; >> >> bool updated = false; >> >> -- >> 2.37.3 >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> > Thanks, > Ales > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > [email protected] IM: amusil > <https://red.ht/sig> > And I have also found a reproducer. If you run "snat-ct-zone with common NAT zone" with the below diff applied you should see multiple zones having zone id=0. It can be also reproduced with your patch so we probably need to adjust that loop. diff --git a/tests/ovn.at b/tests/ovn.at index f8b8db4df..6cf7d93a3 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -32366,6 +32366,9 @@ as hv1 check ovs-vsctl add-br br-phys ovn_attach n1 br-phys 192.168.0.1 check ovs-vsctl add-port br-int ls0-hv -- set Interface ls0-hv external-ids:iface-id=ls0-hv +check ovs-vsctl add-port br-int ls0-hv1 -- set Interface ls0-hv1 external-ids:iface-id=ls0-hv1 +check ovs-vsctl add-port br-int ls0-hv2 -- set Interface ls0-hv2 external-ids:iface-id=ls0-hv2 +check ovs-vsctl set open . external_ids:ovn-monitor-all=true check ovn-nbctl lr-add lr0 @@ -32378,6 +32381,12 @@ check ovn-nbctl lrp-add lr0 lr0-ls0 00:00:00:00:00:01 10.0.0.1 check ovn-nbctl lsp-add ls0 ls0-hv check ovn-nbctl lsp-set-addresses ls0-hv "00:00:00:00:00:02 10.0.0.2" +check ovn-nbctl lsp-add ls0 ls0-hv1 +check ovn-nbctl lsp-set-addresses ls0-hv1 "00:00:00:00:00:03 10.0.0.3" + +check ovn-nbctl lsp-add ls0 ls0-hv2 +check ovn-nbctl lsp-set-addresses ls0-hv2 "00:00:00:00:00:04 10.0.0.4" + check ovn-nbctl ls-add ext check ovn-nbctl lsp-add ext ext-lr0 check ovn-nbctl lsp-set-type ext-lr0 router @@ -32410,7 +32419,7 @@ snat_zone=${snat_zone::-1} check test "$dnat_zone" = "$DNAT_ZONE_REG" check test "$snat_zone" = "$DNAT_ZONE_REG" -check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=666 +check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0 dnat_zone=$(ovs-ofctl dump-flows br-int table=$DNAT_TABLE,metadata=0x${lr0_dp_key} | grep -o zone=.*, | cut -d '=' -f 2) dnat_zone=${dnat_zone::-1} @@ -32422,6 +32431,19 @@ snat_zone=${snat_zone::-1} check test "$dnat_zone" = "$SNAT_ZONE_REG" check test "$snat_zone" = "$SNAT_ZONE_REG" +ovs-appctl -t ovn-controller exit --restart + +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv="0" +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1="0" +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2="0" + +start_daemon ovn-controller +sleep 2 + +ovn-appctl ct-zone-list + OVN_CLEANUP([hv1]) AT_CLEANUP Thanks, Ales -- Ales Musil Senior Software Engineer - OVN Core Red Hat EMEA <https://www.redhat.com> [email protected] IM: amusil <https://red.ht/sig> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
