On Mon, Oct 24, 2022 at 8:53 AM Ales Musil <[email protected]> wrote:

> Hi Mark,
>
> please see my comment below.
>
>
> On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <[email protected]>
> wrote:
>
>> An issue was filed where CT zone 0 was assigned to both a logical router
>> SNAT and to a logical port. CT zone 0 is typically "reserved" and not
>> assigned by ovn-controller; however, since SNAT zones are configurable,
>> it is possible for ovn-controller to assign this zone at the CMS's
>> request. This accounts for how CT zone 0 can be assigned for SNAT. There
>> was also a small bug in the incremental processing that could result in
>> a logical port being assigned zone 0.
>>
>> In the specific issue report, CT zones were restored from the OVSDB when
>> ovn-controller started, and the conflicting CT zones were already
>> present. ovn-controller dutifully loaded these zones up. But then there
>> was nothing that would allow for the conflict to be resolved afterwards.
>>
>> It is unknown how these conflicts entered into the OVSDB in the first
>> place. This change does not purport to prevent conflicts from entering
>> the OVSDB. However, it does make the following changes that should
>> further safeguard against unwanted behavior:
>>
>> * ct_zones_runtime_data_handler() now assigns zones starting at
>>   1 instead of 0. This makes it use the same range as update_ct_zones().
>> * update_ct_zones() now keeps a new simap for zones that are assigned
>>   but not due to an SNAT zone request. This allows for us to guarantee
>>   that when there is a conflict between a previously auto-assigned CT
>>   zone and a newly-requested zone, we are guaranteed to remove the
>>   auto-assigned CT zone.
>> * The removal of conflicting auto-assigned CT zones is now performed
>>   before dealing with the newly requested zone. This makes it so that if
>>   we load conflicting zones from the OVSDB or if there is some issue
>>   that results in conflicting zones being assigned, we will correct the
>>   issue.
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406
>> Signed-off-by: Mark Michelson <[email protected]>
>> ---
>>  controller/ovn-controller.c | 53 ++++++++++++++++++-------------------
>>  1 file changed, 26 insertions(+), 27 deletions(-)
>>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 8895c7a2b..5fd9bbc40 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -659,7 +659,8 @@ update_ct_zones(const struct shash *binding_lports,
>>      const char *user;
>>      struct sset all_users = SSET_INITIALIZER(&all_users);
>>      struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones);
>> -    unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>> +    unsigned long unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)];
>> +    struct simap unreq_snat_zones = SIMAP_INITIALIZER(&unreq_snat_zones);
>>
>>      struct shash_node *shash_node;
>>      SHASH_FOR_EACH (shash_node, binding_lports) {
>> @@ -696,49 +697,46 @@ update_ct_zones(const struct shash *binding_lports,
>>              bitmap_set0(ct_zone_bitmap, ct_zone->data);
>>              simap_delete(ct_zones, ct_zone);
>>          } else if (!simap_find(&req_snat_zones, ct_zone->name)) {
>> -            bitmap_set1(unreq_snat_zones, ct_zone->data);
>> +            bitmap_set1(unreq_snat_zones_map, ct_zone->data);
>> +            simap_put(&unreq_snat_zones, ct_zone->name, ct_zone->data);
>>          }
>>      }
>>
>>      /* Prioritize requested CT zones */
>>      struct simap_node *snat_req_node;
>>      SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
>> -        struct simap_node *node = simap_find(ct_zones,
>> snat_req_node->name);
>> -        if (node) {
>> -            if (node->data == snat_req_node->data) {
>> -                /* No change to this request, so no action needed */
>> -                continue;
>> -            } else {
>> -                /* Zone request has changed for this node. delete old
>> entry */
>> -                bitmap_set0(ct_zone_bitmap, node->data);
>> -                simap_delete(ct_zones, node);
>> -            }
>> -        }
>> -
>>          /* Determine if someone already had this zone auto-assigned.
>>           * If so, then they need to give up their assignment since
>>           * that zone is being explicitly requested now.
>>           */
>> -        if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) {
>> -            struct simap_node *dup;
>> -            SIMAP_FOR_EACH_SAFE (dup, ct_zones) {
>> -                if (dup != snat_req_node && dup->data ==
>> snat_req_node->data) {
>> -                    simap_delete(ct_zones, dup);
>> +        if (bitmap_is_set(unreq_snat_zones_map, snat_req_node->data)) {
>> +            struct simap_node *unreq_node;
>> +            SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) {
>> +                if (unreq_node->data == snat_req_node->data) {
>>                      break;
>>                  }
>>              }
>>
>
> This does not seem right, considering the scenario when we have two zones
> A=0 and B=0 and at the same time requesting SNAT=0,
> the B wouldn't be cleared and there would be conflict. Which makes me
> wonder if it's not the scenario of how we got into the same zone scenario
> in the first place?
> IMO it would make more sense to run the simap_find_and_delete(ct_zones,
> unreq_node->name); in the loop without break, to really remove anything
> that has the same id.
> Having this patch applied the could just run recompute without worrying
> how it got into this situation in the first place and it should fix it.
> WDYT?
>
>
>> +            ovs_assert(unreq_node);
>> +
>> +            simap_find_and_delete(ct_zones, unreq_node->name);
>> +
>>              /* Set this bit to 0 so that if multiple datapaths have
>> requested
>>               * this zone, we don't needlessly double-detect this
>> condition.
>>               */
>> -            bitmap_set0(unreq_snat_zones, snat_req_node->data);
>> +            bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
>>          }
>>
>> -        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
>> -                                  snat_req_node->data, true,
>> -                                  snat_req_node->name);
>> -
>> -        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
>> -        simap_put(ct_zones, snat_req_node->name, snat_req_node->data);
>> +        struct simap_node *node = simap_find(ct_zones,
>> snat_req_node->name);
>> +        if (node && node->data != snat_req_node->data) {
>> +            /* Zone request has changed for this node. delete old entry
>> and
>> +             * create new one*/
>> +            add_pending_ct_zone_entry(pending_ct_zones,
>> CT_ZONE_OF_QUEUED,
>> +                                      snat_req_node->data, true,
>> +                                      snat_req_node->name);
>> +            bitmap_set0(ct_zone_bitmap, node->data);
>> +            bitmap_set1(ct_zone_bitmap, snat_req_node->data);
>> +            node->data = snat_req_node->data;
>> +        }
>>      }
>>
>>      /* xxx This is wasteful to assign a zone to each port--even if no
>> @@ -756,6 +754,7 @@ update_ct_zones(const struct shash *binding_lports,
>>      }
>>
>>      simap_destroy(&req_snat_zones);
>> +    simap_destroy(&unreq_snat_zones);
>>      sset_destroy(&all_users);
>>  }
>>
>> @@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct engine_node
>> *node, void *data)
>>
>>      struct hmap *tracked_dp_bindings = &rt_data->tracked_dp_bindings;
>>      struct tracked_datapath *tdp;
>> -    int scan_start = 0;
>> +    int scan_start = 1;
>>
>>      bool updated = false;
>>
>> --
>> 2.37.3
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Thanks,
> Ales
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA <https://www.redhat.com>
>
> [email protected]    IM: amusil
> <https://red.ht/sig>
>


And I have also found a reproducer.
If you run "snat-ct-zone with common NAT zone" with the below diff applied
you should see
multiple zones having zone id=0. It can be also reproduced with your patch
so we probably need to adjust that loop.

diff --git a/tests/ovn.at b/tests/ovn.at
index f8b8db4df..6cf7d93a3 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -32366,6 +32366,9 @@ as hv1
 check ovs-vsctl add-br br-phys
 ovn_attach n1 br-phys 192.168.0.1
 check ovs-vsctl add-port br-int ls0-hv -- set Interface ls0-hv
external-ids:iface-id=ls0-hv
+check ovs-vsctl add-port br-int ls0-hv1 -- set Interface ls0-hv1
external-ids:iface-id=ls0-hv1
+check ovs-vsctl add-port br-int ls0-hv2 -- set Interface ls0-hv2
external-ids:iface-id=ls0-hv2
+check ovs-vsctl set open . external_ids:ovn-monitor-all=true

 check ovn-nbctl lr-add lr0

@@ -32378,6 +32381,12 @@ check ovn-nbctl lrp-add lr0 lr0-ls0
00:00:00:00:00:01 10.0.0.1
 check ovn-nbctl lsp-add ls0 ls0-hv
 check ovn-nbctl lsp-set-addresses ls0-hv "00:00:00:00:00:02 10.0.0.2"

+check ovn-nbctl lsp-add ls0 ls0-hv1
+check ovn-nbctl lsp-set-addresses ls0-hv1 "00:00:00:00:00:03 10.0.0.3"
+
+check ovn-nbctl lsp-add ls0 ls0-hv2
+check ovn-nbctl lsp-set-addresses ls0-hv2 "00:00:00:00:00:04 10.0.0.4"
+
 check ovn-nbctl ls-add ext
 check ovn-nbctl lsp-add ext ext-lr0
 check ovn-nbctl lsp-set-type ext-lr0 router
@@ -32410,7 +32419,7 @@ snat_zone=${snat_zone::-1}
 check test "$dnat_zone" = "$DNAT_ZONE_REG"
 check test "$snat_zone" = "$DNAT_ZONE_REG"

-check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=666
+check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0

 dnat_zone=$(ovs-ofctl dump-flows br-int
table=$DNAT_TABLE,metadata=0x${lr0_dp_key} | grep -o zone=.*, | cut -d '='
-f 2)
 dnat_zone=${dnat_zone::-1}
@@ -32422,6 +32431,19 @@ snat_zone=${snat_zone::-1}
 check test "$dnat_zone" = "$SNAT_ZONE_REG"
 check test "$snat_zone" = "$SNAT_ZONE_REG"

+ovs-appctl -t ovn-controller exit --restart
+
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv="0"
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1="0"
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2="0"
+
+start_daemon ovn-controller
+sleep 2
+
+ovn-appctl ct-zone-list
+
 OVN_CLEANUP([hv1])
 AT_CLEANUP

Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

[email protected]    IM: amusil
<https://red.ht/sig>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to