Hi Mark, please see my comment below.
On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <[email protected]> wrote: > An issue was filed where CT zone 0 was assigned to both a logical router > SNAT and to a logical port. CT zone 0 is typically "reserved" and not > assigned by ovn-controller; however, since SNAT zones are configurable, > it is possible for ovn-controller to assign this zone at the CMS's > request. This accounts for how CT zone 0 can be assigned for SNAT. There > was also a small bug in the incremental processing that could result in > a logical port being assigned zone 0. > > In the specific issue report, CT zones were restored from the OVSDB when > ovn-controller started, and the conflicting CT zones were already > present. ovn-controller dutifully loaded these zones up. But then there > was nothing that would allow for the conflict to be resolved afterwards. > > It is unknown how these conflicts entered into the OVSDB in the first > place. This change does not purport to prevent conflicts from entering > the OVSDB. However, it does make the following changes that should > further safeguard against unwanted behavior: > > * ct_zones_runtime_data_handler() now assigns zones starting at > 1 instead of 0. This makes it use the same range as update_ct_zones(). > * update_ct_zones() now keeps a new simap for zones that are assigned > but not due to an SNAT zone request. This allows for us to guarantee > that when there is a conflict between a previously auto-assigned CT > zone and a newly-requested zone, we are guaranteed to remove the > auto-assigned CT zone. > * The removal of conflicting auto-assigned CT zones is now performed > before dealing with the newly requested zone. This makes it so that if > we load conflicting zones from the OVSDB or if there is some issue > that results in conflicting zones being assigned, we will correct the > issue. > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406 > Signed-off-by: Mark Michelson <[email protected]> > --- > controller/ovn-controller.c | 53 ++++++++++++++++++------------------- > 1 file changed, 26 insertions(+), 27 deletions(-) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 8895c7a2b..5fd9bbc40 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -659,7 +659,8 @@ update_ct_zones(const struct shash *binding_lports, > const char *user; > struct sset all_users = SSET_INITIALIZER(&all_users); > struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones); > - unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)]; > + unsigned long unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)]; > + struct simap unreq_snat_zones = SIMAP_INITIALIZER(&unreq_snat_zones); > > struct shash_node *shash_node; > SHASH_FOR_EACH (shash_node, binding_lports) { > @@ -696,49 +697,46 @@ update_ct_zones(const struct shash *binding_lports, > bitmap_set0(ct_zone_bitmap, ct_zone->data); > simap_delete(ct_zones, ct_zone); > } else if (!simap_find(&req_snat_zones, ct_zone->name)) { > - bitmap_set1(unreq_snat_zones, ct_zone->data); > + bitmap_set1(unreq_snat_zones_map, ct_zone->data); > + simap_put(&unreq_snat_zones, ct_zone->name, ct_zone->data); > } > } > > /* Prioritize requested CT zones */ > struct simap_node *snat_req_node; > SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) { > - struct simap_node *node = simap_find(ct_zones, > snat_req_node->name); > - if (node) { > - if (node->data == snat_req_node->data) { > - /* No change to this request, so no action needed */ > - continue; > - } else { > - /* Zone request has changed for this node. delete old > entry */ > - bitmap_set0(ct_zone_bitmap, node->data); > - simap_delete(ct_zones, node); > - } > - } > - > /* Determine if someone already had this zone auto-assigned. > * If so, then they need to give up their assignment since > * that zone is being explicitly requested now. > */ > - if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) { > - struct simap_node *dup; > - SIMAP_FOR_EACH_SAFE (dup, ct_zones) { > - if (dup != snat_req_node && dup->data == > snat_req_node->data) { > - simap_delete(ct_zones, dup); > + if (bitmap_is_set(unreq_snat_zones_map, snat_req_node->data)) { > + struct simap_node *unreq_node; > + SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) { > + if (unreq_node->data == snat_req_node->data) { > break; > } > } > This does not seem right, considering the scenario when we have two zones A=0 and B=0 and at the same time requesting SNAT=0, the B wouldn't be cleared and there would be conflict. Which makes me wonder if it's not the scenario of how we got into the same zone scenario in the first place? IMO it would make more sense to run the simap_find_and_delete(ct_zones, unreq_node->name); in the loop without break, to really remove anything that has the same id. Having this patch applied the could just run recompute without worrying how it got into this situation in the first place and it should fix it. WDYT? > + ovs_assert(unreq_node); > + > + simap_find_and_delete(ct_zones, unreq_node->name); > + > /* Set this bit to 0 so that if multiple datapaths have > requested > * this zone, we don't needlessly double-detect this > condition. > */ > - bitmap_set0(unreq_snat_zones, snat_req_node->data); > + bitmap_set0(unreq_snat_zones_map, snat_req_node->data); > } > > - add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED, > - snat_req_node->data, true, > - snat_req_node->name); > - > - bitmap_set1(ct_zone_bitmap, snat_req_node->data); > - simap_put(ct_zones, snat_req_node->name, snat_req_node->data); > + struct simap_node *node = simap_find(ct_zones, > snat_req_node->name); > + if (node && node->data != snat_req_node->data) { > + /* Zone request has changed for this node. delete old entry > and > + * create new one*/ > + add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED, > + snat_req_node->data, true, > + snat_req_node->name); > + bitmap_set0(ct_zone_bitmap, node->data); > + bitmap_set1(ct_zone_bitmap, snat_req_node->data); > + node->data = snat_req_node->data; > + } > } > > /* xxx This is wasteful to assign a zone to each port--even if no > @@ -756,6 +754,7 @@ update_ct_zones(const struct shash *binding_lports, > } > > simap_destroy(&req_snat_zones); > + simap_destroy(&unreq_snat_zones); > sset_destroy(&all_users); > } > > @@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct engine_node > *node, void *data) > > struct hmap *tracked_dp_bindings = &rt_data->tracked_dp_bindings; > struct tracked_datapath *tdp; > - int scan_start = 0; > + int scan_start = 1; > > bool updated = false; > > -- > 2.37.3 > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Thanks, Ales -- Ales Musil Senior Software Engineer - OVN Core Red Hat EMEA <https://www.redhat.com> [email protected] IM: amusil <https://red.ht/sig> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
