On 9/14/21 1:27 PM, Mark Gray wrote:
On 13/09/2021 21:41, Mark Michelson wrote:
Hi Mark,

Of all the patches in this series, this is the one I am least sure about.
Thanks for looking over these.


It seems simple enough, but it's not clear to me

1) Why would the number of nat entries on an ovn_datapath differ from
what's in the database? Since this series doesn't introduce any "real"
incremental processing, we should be recreating the ovn_datapath on
every iteration. And on each iteration, we should have a fresh view of
the database, so why would the values differ?
2) Why is this only an issue in destroy_nat_entries()? There are several
other places in the code where we iterate over the number of nat entries
using od->nbr->n_nat. Why are those not affected?
3) How does this patch address that issue? The patch just sets the
ovn_datapath's n_nat_entries to be the same as the database's count,
which according to the commit message, is not always accurate.

Can you provide some clarity here? Maybe tell of a situation where
od->nbr->n_nat is incorrect?

You are absolutely correct as the details in the commit message are not
sufficient to understand. In fact, I had a hard time remembering the
reason myself!

IIRC, on each iteration, we need to call en_runtime_run(). This needs to
initialize the runtime data .. but .. it also needs to destroy the
runtime data from the previous run. This may not be the best way to do
it, but the I-P framework doesn't have a convenient way to clear data
between runs.

Consider an example in which we have completed iteration x, and started
iteration x+1. If we add a NAT entry between iteration x and iteration
x+1, od->nbr->n_nat will contain the updated value of n_nat as the
northbound record will have been updated. However, as we have not yet
(re)initialized the nat entries in od, od->n_nat_entries will be equal
to the previous value. These two values could then be different which
makes this loop invalid. Does this make sense? If so, shall I update the
commit message? How would you like me to proceed?

OK, this makes a lot more sense now. It makes sense that the issue specifically would be seen when destroying datapaths and not at any other point.

Here is my suggestion for how to proceed:
1) Update the commit message to give an explanation similar to what you've given here.

2) Correct the finding that I just found right now and that I mention down below.



On 9/3/21 8:21 AM, Mark Gray wrote:
destroy_nat_entries() iterates over nat_entries using the
number of NAT entries from the NB database. When using I-P,
this behaviour is incorrect as the number of NAT entries in
the NBDB may differ from that stored in 'struct ovn_datapath'
causing unexpected behaviour.

Signed-off-by: Mark Gray <[email protected]>
---
   northd/northd.c | 5 ++++-
   1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/northd/northd.c b/northd/northd.c
index 724baa3994d5..829c4479f14b 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -605,6 +605,8 @@ struct ovn_datapath {
/* NAT entries configured on the router. */
       struct ovn_nat *nat_entries;
+    size_t n_nat_entries;
+

n_nat_entries is declared and never used in this function.

       bool has_distributed_nat;
/* Set of nat external ips on the router. */
@@ -789,6 +791,7 @@ init_nat_entries(struct ovn_datapath *od)
               od->has_distributed_nat = true;
           }
       }
+    od->n_nat_entries = od->nbr->n_nat;
   }
static void
@@ -802,7 +805,7 @@ destroy_nat_entries(struct ovn_datapath *od)
       destroy_lport_addresses(&od->dnat_force_snat_addrs);
       destroy_lport_addresses(&od->lb_force_snat_addrs);
- for (size_t i = 0; i < od->nbr->n_nat; i++) {
+    for (size_t i = 0; i < od->n_nat_entries; i++) {
           destroy_lport_addresses(&od->nat_entries[i].ext_addrs);
       }
   }




_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to