Re: [ovs-dev] [Patch ovn] docs: Typo. Remove duplicated "to" in ovn-sb.xml.

2024-04-23 Thread Han Zhou
On Tue, Apr 23, 2024 at 2:50 AM Martin Kalcok 
wrote:
>
> Signed-off-by: Martin Kalcok 
> ---
>  ovn-sb.xml | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index f9fb6c304..bf4689f12 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -1456,7 +1456,7 @@
>
>  ct_dnat sends the packet through the DNAT zone
in
>  connection tracking table to unDNAT any packet that was
DNATed in
> -the opposite direction.  The packet is then automatically
sent to
> +the opposite direction.  The packet is then automatically
sent
>  to the next tables as if followed by next;
action.
>  The next tables will see the changes in the packet caused by
>  the connection tracker.
> @@ -1498,7 +1498,7 @@
>  ct_dnat_in_czone sends the packet through the
common
>  NAT zone (used for both DNAT and SNAT) in connection
tracking table
>  to unDNAT any packet that was DNATed in the opposite
direction.
> -The packet is then automatically sent to to the next tables
as if
> +The packet is then automatically sent to the next tables as
if
>  followed by next; action.  The next tables will
see
>  the changes in the packet caused by the connection tracker.
>
> --
> 2.40.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Martin. Applied to main.

Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd: Fix the comment about route priorities.

2024-04-22 Thread Han Zhou
The current comments are obviously conflicting.  Fix them according to the
current implementation: a static route overrides a src-ip route.

Signed-off-by: Han Zhou 
---
 northd/northd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/northd/northd.c b/northd/northd.c
index 331d9c2677b8..dec1eb3679f5 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -271,7 +271,7 @@ static bool default_acl_drop;
  * Route offsets implement logic to prioritize traffic for routes with
  * same ip_prefix values:
  *  -  connected route overrides static one;
- *  -  static route overrides connected route. */
+ *  -  static route overrides src-ip route. */
 #define ROUTE_PRIO_OFFSET_MULTIPLIER 3
 #define ROUTE_PRIO_OFFSET_STATIC 1
 #define ROUTE_PRIO_OFFSET_CONNECTED 2
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
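
A quick illustration of how these offsets interact: assuming northd combines
them with the prefix length roughly as prefix_len * ROUTE_PRIO_OFFSET_MULTIPLIER
+ offset (the exact formula lives elsewhere in northd.c and is not shown in this
patch), the sketch below shows why a connected route wins over a static route,
which in turn wins over a src-ip route with the same ip_prefix. The
route_priority() helper and the example prefix length are invented for
illustration only.

    /* Hedged sketch, not the actual northd.c code.  Assumes the priority
     * formula prefix_len * ROUTE_PRIO_OFFSET_MULTIPLIER + offset, with
     * src-ip routes getting offset 0. */
    #include <stdio.h>

    #define ROUTE_PRIO_OFFSET_MULTIPLIER 3
    #define ROUTE_PRIO_OFFSET_STATIC 1
    #define ROUTE_PRIO_OFFSET_CONNECTED 2

    static int
    route_priority(int prefix_len, int offset)
    {
        return prefix_len * ROUTE_PRIO_OFFSET_MULTIPLIER + offset;
    }

    int
    main(void)
    {
        int plen = 24;   /* e.g. three routes sharing a /24 ip_prefix */

        printf("src-ip:    %d\n", route_priority(plen, 0));
        printf("static:    %d\n", route_priority(plen, ROUTE_PRIO_OFFSET_STATIC));
        printf("connected: %d\n", route_priority(plen, ROUTE_PRIO_OFFSET_CONNECTED));
        /* Prints 72, 73, 74: connected > static > src-ip for the same prefix. */
        return 0;
    }

The absolute values do not matter; only their relative order does when flows
for routes with the same ip_prefix compete.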


Re: [ovs-dev] [PATCH ovn] controller: Remove the ovn-set-local-ip option.

2024-04-22 Thread Han Zhou
On Fri, Apr 19, 2024 at 4:51 AM Dumitru Ceara  wrote:
>
> On 4/19/24 13:14, Ales Musil wrote:
> > The local_ip should be present for a chassis with a single encap whenever
> > we configure its interface in OvS. Not having the local_ip can lead to
> > traffic being dropped on the other side of the tunnel because the source
> > IP might be different; this is more likely to happen in pure IPv6
> > deployments.
> >
> > Remove the option: with the local_ip now enforced
> > also for a single encap, it became "true" in all scenarios and is not
> > needed anymore.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-570
> > Signed-off-by: Ales Musil 
> > ---
>
> Hi Han,
>
> When you have time would you mind double checking this in case we missed
> some scenario?
>

Thanks Ales and Dumitru. I wanted to do the same even when I was working on
commit 41eefcb280. I kept the default behavior because setting local_ip
would require incoming tunnel packets' destination IP to match the
local_ip, which is stricter than the old default settings, and I wasn't
sure whether any existing user depended on the old behavior. Thinking about it
more carefully, that does not seem possible for OVN, because the ovn-encap-ip
used as the local_ip is always the one shared with other chassis through the SB
DB. So now I think it is safe to change the default behavior.

Acked-by: Han Zhou 

> Thanks,
> Dumitru
>
> >  NEWS|  3 +++
> >  controller/encaps.c | 31 +++
> >  controller/ovn-controller.8.xml | 14 +---
> >  tests/ovn-controller.at | 38 +++--
> >  4 files changed, 39 insertions(+), 47 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 141f1831c..9adf6a31c 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -13,6 +13,9 @@ Post v24.03.0
> >  "lflow-stage-to-oftable STAGE_NAME" that converts stage name into
OpenFlow
> >  table id.
> >- Rename the ovs-sandbox script to ovn-sandbox.
> > +  - Remove "ovn-set-local-ip" config option from vswitchd
> > +external-ids, the option is no longer needed as it became
effectively
> > +"true" for all scenarios.
> >
> >  OVN v24.03.0 - 01 Mar 2024
> >  --
> > diff --git a/controller/encaps.c b/controller/encaps.c
> > index a9cb604b8..b5ef66371 100644
> > --- a/controller/encaps.c
> > +++ b/controller/encaps.c
> > @@ -208,11 +208,12 @@ out:
> >  static void
> >  tunnel_add(struct tunnel_ctx *tc, const struct sbrec_sb_global *sbg,
> > const char *new_chassis_id, const struct sbrec_encap *encap,
> > -   bool must_set_local_ip, const char *local_ip,
> > +   const char *local_ip,
> > const struct ovsrec_open_vswitch_table *ovs_table)
> >  {
> >  struct smap options = SMAP_INITIALIZER(&options);
> >  smap_add(&options, "remote_ip", encap->ip);
> > +smap_add(&options, "local_ip", local_ip);
> >  smap_add(&options, "key", "flow");
> >  const char *dst_port = smap_get(&encap->options, "dst_port");
> >  const char *csum = smap_get(&encap->options, "csum");
> > @@ -239,7 +240,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct
sbrec_sb_global *sbg,
> >  const struct ovsrec_open_vswitch *cfg =
> >  ovsrec_open_vswitch_table_first(ovs_table);
> >
> > -bool set_local_ip = must_set_local_ip;
> >  if (cfg) {
> >  /* If the tos option is configured, get it */
> >  const char *encap_tos =
> > @@ -259,19 +259,10 @@ tunnel_add(struct tunnel_ctx *tc, const struct
sbrec_sb_global *sbg,
> >  if (encap_df) {
> >  smap_add(&options, "df_default", encap_df);
> >  }
> > -
> > -if (!set_local_ip) {
> > -/* If ovn-set-local-ip option is configured, get it */
> > -set_local_ip =
> > -get_chassis_external_id_value_bool(
> > -&cfg->external_ids, tc->this_chassis->name,
> > -"ovn-set-local-ip", false);
> > -}
> >  }
> >
> >  /* Add auth info if ipsec is enabled. */
> >  if (sbg->ipsec) {
> > -set_local_ip = true;
> >  smap_add(, "remote_name", new_chassis_id);
> >
> >  /* Force NAT-T traversal via configuration */
> > @@ -290,10 +281,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct
sbrec_sb_global *sbg,
> >  }
> >  }
> 

Re: [ovs-dev] [ANN] Primary OVS branch renamed as main development branch as main.

2024-04-10 Thread Han Zhou
On Wed, Apr 10, 2024 at 6:52 AM Simon Horman  wrote:
>
> Hi,
>
> I would like to announce that the primary development branch for OvS
> has been renamed main.
>
> The rename occurred a little earlier today.
>
> OVS is currently hosted on GitHub. We can expect the following behaviour
> after the rename:
>
> * GitHub pull requests against master should have been automatically
>   re-homed on main.
> * GitHub Issues should not to be affected - the test issue I
>   created had no association with a branch
> * URLs accessed via the GitHub web UI are automatically renamed
> * Clones may also rename their primary branch - you may
>   get a notification about this in the Web UI
>
> As a result of this change it may be necessary to update your local git
> configuration for checked out branches.
>
> For example:
> # Fetch origin: new remote main branch; remote master branch is deleted
> git fetch -tp origin
> # Rename local branch
> git branch -m master main
> # Update local main branch to use remote main branch as it's upstream
> git branch --set-upstream-to=origin/main main
>
> If you have an automation that fetches the master branch then please
> update the automation to fetch main. If your automation is fetching
> main and falling back to master, then it should now be safe to
> remove the fallback.
>
> This change is in keeping with OVS's recently adopted policy of using
> the inclusive naming word list v1 [1, 2].
>
> [1] df5e5cf4318a ("Documentation: Add section on inclusive language.")
> [2] https://inclusivenaming.org/word-lists/
>
> Kind regards,
> Simon
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Simon. Shall this be announced to ovs-announce as well?

Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] netdev-offload: make netdev-offload-tc work with flow-restore-wait

2024-04-09 Thread Han Zhou
On Thu, Mar 21, 2024 at 10:05 AM Ilya Maximets  wrote:
>
> On 3/14/24 06:23, Han Zhou wrote:
> >
> >
> > On Fri, Apr 22, 2022 at 1:41 AM Eelco Chaudron <echau...@redhat.com> wrote:
> >>
> >>
> >>
> >> On 15 Apr 2022, at 13:25, wenx05124...@163.com  wrote:
> >>
> >> > From: wenxu <we...@chinatelecom.cn>
> >> >
> >> > The netdev-offload in tc mode can't work with flow-restore-wait.
> >> > When the vswitchd restart with flow-restore-wait, the tc qdisc
> >> > will be delete in netdev_set_flow_api_enabled. The netdev flow
> >> > api can be enabled after the flow-restore-wait flag removing.
> >> >
> >> > Signed-off-by: wenxu >
> >
> > Hi, I found this patch useful, but it seems to have been inactive for a long
time. I hope we can revive and update it.
> >
> > Regardless of the issues pointed out by Eelco, this patch works well
for traffic not going through tunnels, but for tunnelled traffic, e.g.
geneve traffic, I found that even with the patch, when OVS starts, the
ingress qdisc for the genev_sys_6081 device is deleted, so the traffic is
still broken even with flow-restore-wait set. I haven't yet found where in the
code it gets deleted. Any hint/insight would be appreciated.
>
> I'm not sure what is going on here, but the tunnel offload is special
> as it works via egress qdisc on a bridge port.  I wonder if this one
> is getting messed with on restart.  Just a guess.
>

Thanks Ilya for the hints at the last OVN meeting. I checked again, and the
reason why it didn't work for the tunnelled traffic is now clear. The egress
qdisc on the bridge port was preserved. The problem was only that
the genev_sys_6081 interface was recreated when OVS started, because my
test environment was using an old version of OVS that didn't have the
patch:
b5313a8ceca8 ("ofproto: Fix re-creation of tunnel backing interfaces on
restart.")

After applying that patch, genev_sys_6081 is not recreated, the
ingress qdisc is preserved for flow-restore-wait, and the tunnelled
traffic keeps working during an OVS restart.

So, Ilya, what's your feedback on this patch and the comments?

Thanks,
Han

> Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-controller.at: Fix flaky test "ofctrl wait before clearing flows".

2024-04-04 Thread Han Zhou
On Thu, Apr 4, 2024 at 10:03 AM Mark Michelson  wrote:
>
> Thanks for the fix, Han.
>
> Acked-by: Mark Michelson 

Thanks Mark. Applied to main and backported.

Han

>
> On 4/4/24 02:47, Han Zhou wrote:
> > Fixes: bbf2f941965a ("ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.")
> > Signed-off-by: Han Zhou 
> > ---
> >   tests/ovn-controller.at | 11 ++-
> >   1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index 3202f0beff46..f2c792c9cdf6 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -2325,14 +2325,15 @@ AT_CHECK_UNQUOTED([echo $lflow_run_1], [0],
[$lflow_run_2
> >   ])
> >
> >   # Restart OVS this time. Flows should be reinstalled without waiting.
> > +# Set the wait-before-clear to a large value (60s) to make the test
more reliable.
> > +check ovs-vsctl set open .
external_ids:ovn-ofctrl-wait-before-clear=60000
> > +check ovn-nbctl --wait=hv sync
> > +
> >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> >
> > -# Sync to make sure ovn-controller is given enough time to install the
flows.
> > -check ovn-nbctl --wait=hv sync
> > -
> > -# Flow should be installed without any extra waiting.
> > -AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF
2.2.2.2], [0], [ignore])
> > +# Flow should be installed without waiting for another 60s.
> > +OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep
-vF 2.2.2.2])
> >
> >   check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
> >   -- ls-lb-add ls1 lb3
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovn-controller.at: Fix flaky test "ofctrl wait before clearing flows".

2024-04-04 Thread Han Zhou
Fixes: bbf2f941965a ("ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.")
Signed-off-by: Han Zhou 
---
 tests/ovn-controller.at | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 3202f0beff46..f2c792c9cdf6 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -2325,14 +2325,15 @@ AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], 
[$lflow_run_2
 ])
 
 # Restart OVS this time. Flows should be reinstalled without waiting.
+# Set the wait-before-clear to a large value (60s) to make the test more 
reliable.
+check ovs-vsctl set open . external_ids:ovn-ofctrl-wait-before-clear=60000
+check ovn-nbctl --wait=hv sync
+
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif 
-vunixctl
 
-# Sync to make sure ovn-controller is given enough time to install the flows.
-check ovn-nbctl --wait=hv sync
-
-# Flow should be installed without any extra waiting.
-AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], 
[0], [ignore])
+# Flow should be installed without waiting for another 60s.
+OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 
2.2.2.2])
 
 check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
 -- ls-lb-add ls1 lb3
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.

2024-04-03 Thread Han Zhou
On Tue, Apr 2, 2024 at 11:11 PM Han Zhou  wrote:
>
>
>
> On Tue, Apr 2, 2024 at 10:48 PM Han Zhou  wrote:
> >
> >
> >
> > On Thu, Mar 28, 2024 at 1:29 PM Mark Michelson 
wrote:
> > >
> > > Thanks Han,
> > >
> > > Acked-by: Mark Michelson 
> >
> > Thanks Mark. Applied to main.
>
> Also backported down to branch-23.06
>
Sorry, I found that the test is not stable in CI, although I am not able to
reproduce the failure in my local environment.
With a quick look I realized that "check ovn-nbctl --wait=hv sync" may not
be sufficient to ensure the flows are installed after OVS restart. I will
try to find a way to fix this tomorrow.

Thanks,
Han

> Han
> >
> > Han
> > >
> > > On 3/28/24 02:58, Han Zhou wrote:
> > > > The ovn-ofctrl-wait-before-clear setting is designed to minimize
> > > > downtime during the initial start-up of the ovn-controller. For this
> > > > purpose, the ovn-controller should wait only once upon entering the
> > > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent
reconnections
> > > > to the OVS, such as those occurring during an OVS restart/upgrade,
> > > > should not trigger this wait. However, the current implementation
always
> > > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state,
which
> > > > can inadvertently delay flow installations during OVS
restart/upgrade,
> > > > potentially causing more harm than good. (The extent of the impact
> > > > varies based on the method used to restart OVS, including whether
flow
> > > > save/restore tools and the flow-restore-wait feature are employed.)
> > > >
> > > > This patch avoids the unnecessary wait after the initial one.
> > > >
> > > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear
to reduce down time during upgrade.")
> > > > Signed-off-by: Han Zhou 
> > > > ---
> > > > v2: Addressed Mark's comments - made test case more reliable.
> > > >
> > > >   controller/ofctrl.c | 1 -
> > > >   tests/ovn-controller.at | 9 +++--
> > > >   2 files changed, 7 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > > index 6ca2ea4ce63d..6a2564604aa1 100644
> > > > --- a/controller/ofctrl.c
> > > > +++ b/controller/ofctrl.c
> > > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
> > > >   if (!wait_before_clear_time ||
> > > >   (wait_before_clear_expire &&
> > > >time_msec() >= wait_before_clear_expire)) {
> > > > -wait_before_clear_expire = 0;
> > > >   state = S_CLEAR_FLOWS;
> > > >   return;
> > > >   }
> > > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > > > index fdcc5aab2bcf..3202f0beff46 100644
> > > > --- a/tests/ovn-controller.at
> > > > +++ b/tests/ovn-controller.at
> > > > @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller
coverage/read-counter lflow_run)
> > > >   AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
> > > >   ])
> > > >
> > > > -# Restart OVS this time, and wait until flows are reinstalled
> > > > +# Restart OVS this time. Flows should be reinstalled without
waiting.
> > > >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > > >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> > > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 |
grep -vF 2.2.2.2])
> > > > +
> > > > +# Sync to make sure ovn-controller is given enough time to install
the flows.
> > > > +check ovn-nbctl --wait=hv sync
> > > > +
> > > > +# Flow should be installed without any extra waiting.
> > > > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep
-vF 2.2.2.2], [0], [ignore])
> > > >
> > > >   check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
> > > >   -- ls-lb-add ls1 lb3
> > >
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.

2024-04-03 Thread Han Zhou
On Tue, Apr 2, 2024 at 10:48 PM Han Zhou  wrote:
>
>
>
> On Thu, Mar 28, 2024 at 1:29 PM Mark Michelson 
wrote:
> >
> > Thanks Han,
> >
> > Acked-by: Mark Michelson 
>
> Thanks Mark. Applied to main.

Also backported down to branch-23.06

Han
>
> Han
> >
> > On 3/28/24 02:58, Han Zhou wrote:
> > > The ovn-ofctrl-wait-before-clear setting is designed to minimize
> > > downtime during the initial start-up of the ovn-controller. For this
> > > purpose, the ovn-controller should wait only once upon entering the
> > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
> > > to the OVS, such as those occurring during an OVS restart/upgrade,
> > > should not trigger this wait. However, the current implementation always
> > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
> > > can inadvertently delay flow installations during OVS restart/upgrade,
> > > potentially causing more harm than good. (The extent of the impact
> > > varies based on the method used to restart OVS, including whether flow
> > > save/restore tools and the flow-restore-wait feature are employed.)
> > >
> > > This patch avoids the unnecessary wait after the initial one.
> > >
> > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to
reduce down time during upgrade.")
> > > Signed-off-by: Han Zhou 
> > > ---
> > > v2: Addressed Mark's comments - made test case more reliable.
> > >
> > >   controller/ofctrl.c | 1 -
> > >   tests/ovn-controller.at | 9 +++--
> > >   2 files changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > index 6ca2ea4ce63d..6a2564604aa1 100644
> > > --- a/controller/ofctrl.c
> > > +++ b/controller/ofctrl.c
> > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
> > >   if (!wait_before_clear_time ||
> > >   (wait_before_clear_expire &&
> > >time_msec() >= wait_before_clear_expire)) {
> > > -wait_before_clear_expire = 0;
> > >   state = S_CLEAR_FLOWS;
> > >   return;
> > >   }
> > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > > index fdcc5aab2bcf..3202f0beff46 100644
> > > --- a/tests/ovn-controller.at
> > > +++ b/tests/ovn-controller.at
> > > @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller
coverage/read-counter lflow_run)
> > >   AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
> > >   ])
> > >
> > > -# Restart OVS this time, and wait until flows are reinstalled
> > > +# Restart OVS this time. Flows should be reinstalled without waiting.
> > >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 |
grep -vF 2.2.2.2])
> > > +
> > > +# Sync to make sure ovn-controller is given enough time to install
the flows.
> > > +check ovn-nbctl --wait=hv sync
> > > +
> > > +# Flow should be installed without any extra waiting.
> > > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF
2.2.2.2], [0], [ignore])
> > >
> > >   check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
> > >   -- ls-lb-add ls1 lb3
> >
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.

2024-04-02 Thread Han Zhou
On Thu, Mar 28, 2024 at 1:29 PM Mark Michelson  wrote:
>
> Thanks Han,
>
> Acked-by: Mark Michelson 

Thanks Mark. Applied to main.

Han
>
> On 3/28/24 02:58, Han Zhou wrote:
> > The ovn-ofctrl-wait-before-clear setting is designed to minimize
> > downtime during the initial start-up of the ovn-controller. For this
> > purpose, the ovn-controller should wait only once upon entering the
> > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
> > to the OVS, such as those occurring during an OVS restart/upgrade,
> > should not trigger this wait. However, the current implementation always
> > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
> > can inadvertently delay flow installations during OVS restart/upgrade,
> > potentially causing more harm than good. (The extent of the impact
> > varies based on the method used to restart OVS, including whether flow
> > save/restore tools and the flow-restore-wait feature are employed.)
> >
> > This patch avoids the unnecessary wait after the initial one.
> >
> > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to
reduce down time during upgrade.")
> > Signed-off-by: Han Zhou 
> > ---
> > v2: Addressed Mark's comments - made test case more reliable.
> >
> >   controller/ofctrl.c | 1 -
> >   tests/ovn-controller.at | 9 +++--
> >   2 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > index 6ca2ea4ce63d..6a2564604aa1 100644
> > --- a/controller/ofctrl.c
> > +++ b/controller/ofctrl.c
> > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
> >   if (!wait_before_clear_time ||
> >   (wait_before_clear_expire &&
> >time_msec() >= wait_before_clear_expire)) {
> > -wait_before_clear_expire = 0;
> >   state = S_CLEAR_FLOWS;
> >   return;
> >   }
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index fdcc5aab2bcf..3202f0beff46 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller
coverage/read-counter lflow_run)
> >   AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
> >   ])
> >
> > -# Restart OVS this time, and wait until flows are reinstalled
> > +# Restart OVS this time. Flows should be reinstalled without waiting.
> >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep
-vF 2.2.2.2])
> > +
> > +# Sync to make sure ovn-controller is given enough time to install the
flows.
> > +check ovn-nbctl --wait=hv sync
> > +
> > +# Flow should be installed without any extra waiting.
> > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF
2.2.2.2], [0], [ignore])
> >
> >   check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
> >   -- ls-lb-add ls1 lb3
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.

2024-03-28 Thread Han Zhou
On Wed, Mar 20, 2024 at 4:07 PM Han Zhou  wrote:
>
>
>
> On Mon, Mar 18, 2024 at 11:27 AM Mark Michelson 
wrote:
> >
> > Hi Han,
> >
> > I have a comment below
> >
> > On 3/5/24 01:27, Han Zhou wrote:
> > > The ovn-ofctrl-wait-before-clear setting is designed to minimize
> > > downtime during the initial start-up of the ovn-controller. For this
> > > purpose, the ovn-controller should wait only once upon entering the
> > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
> > > to the OVS, such as those occurring during an OVS restart/upgrade,
> > > should not trigger this wait. However, the current implementation always
> > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
> > > can inadvertently delay flow installations during OVS restart/upgrade,
> > > potentially causing more harm than good. (The extent of the impact
> > > varies based on the method used to restart OVS, including whether flow
> > > save/restore tools and the flow-restore-wait feature are employed.)
> > >
> > > This patch avoids the unnecessary wait after the initial one.
> > >
> > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to
reduce down time during upgrade.")
> > > Signed-off-by: Han Zhou 
> > > ---
> > >   controller/ofctrl.c | 1 -
> > >   tests/ovn-controller.at | 9 +++--
> > >   2 files changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > index f14cd79a8dbb..0d72ecbaa167 100644
> > > --- a/controller/ofctrl.c
> > > +++ b/controller/ofctrl.c
> > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
> > >   if (!wait_before_clear_time ||
> > >   (wait_before_clear_expire &&
> > >time_msec() >= wait_before_clear_expire)) {
> > > -wait_before_clear_expire = 0;
> > >   state = S_CLEAR_FLOWS;
> > >   return;
> > >   }
> > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > > index 37f1ded1bd26..b65e11722cbb 100644
> > > --- a/tests/ovn-controller.at
> > > +++ b/tests/ovn-controller.at
> > > @@ -2284,10 +2284,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller
coverage/read-counter lflow_run)
> > >   AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
> > >   ])
> > >
> > > -# Restart OVS this time, and wait until flows are reinstalled
> > > +# Restart OVS this time. Flows should be reinstalled without waiting.
> > >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 |
grep -vF 2.2.2.2])
> > > +
> > > +# Sleep for 3s, which is long enough for the flows to be installed,
but
> > > +# shorter than the wait-before-clear (5s), to make sure the flows
are installed
> > > +# without waiting.
> > > +sleep 3
> >
> > This change makes me nervous. The comment makes sense. However, I worry
> > that on slow or loaded systems, relying on the flows to be written
> > within 3 seconds may not always work out.
> >
> > If there were a way to peek into the ofctrl state machine and check that
> > we have moved off of S_WAIT_BEFORE_CLEAR by this point, that might work
> > better. But that is something that is hard to justify exposing.
> >
> > I came up with this possible idea:
> >   * set wait-before-clear to a time longer than OVS_CTL_TIMEOUT (e.g. 60
> > seconds)
> >   * Restart ovs
> >   * Use OVS_WAIT_UNTIL(...), just like the test used to do.
> >
> > This way, we get plenty of opportunities to ensure the flows were
> > written. In most cases, this probably will actually be quicker than the
> > 3 second sleep added in this patch. However, if it takes longer than 3
> > seconds, then the test can still pass. If the flows get written
> > properly, then we know ovn-controller did not wait for the
> > wait-before-clear time.
> >
>
> Hi Mark, thanks for your comment! I agree with you that sleeping for 3s is
not very reliable. Your suggestion looks better, but I think there is still
a potential problem. The approach assumes that ovn-controller will always
apply the new settings of ofctrl-wait-before-clear. That is true for the
current implementation, but there is nothing preventing us from removing
this logic, so that ovn-controller ignores any ofctrl-wait-before-

[ovs-dev] [PATCH ovn v2] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.

2024-03-28 Thread Han Zhou
The ovn-ofctrl-wait-before-clear setting is designed to minimize
downtime during the initial start-up of the ovn-controller. For this
purpose, the ovn-controller should wait only once upon entering the
S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
to the OVS, such as those occurring during an OVS restart/upgrade,
should not trigger this wait. However, the current implementation always
waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
can inadvertently delay flow installations during OVS restart/upgrade,
potentially causing more harm than good. (The extent of the impact
varies based on the method used to restart OVS, including whether flow
save/restore tools and the flow-restore-wait feature are employed.)

This patch avoids the unnecessary wait after the initial one.

Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce 
down time during upgrade.")
Signed-off-by: Han Zhou 
---
v2: Addressed Mark's comments - made test case more reliable.

 controller/ofctrl.c | 1 -
 tests/ovn-controller.at | 9 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 6ca2ea4ce63d..6a2564604aa1 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
 if (!wait_before_clear_time ||
 (wait_before_clear_expire &&
  time_msec() >= wait_before_clear_expire)) {
-wait_before_clear_expire = 0;
 state = S_CLEAR_FLOWS;
 return;
 }
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index fdcc5aab2bcf..3202f0beff46 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller 
coverage/read-counter lflow_run)
 AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
 ])
 
-# Restart OVS this time, and wait until flows are reinstalled
+# Restart OVS this time. Flows should be reinstalled without waiting.
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif 
-vunixctl
-OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 
2.2.2.2])
+
+# Sync to make sure ovn-controller is given enough time to install the flows.
+check ovn-nbctl --wait=hv sync
+
+# Flow should be installed without any extra waiting.
+AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], 
[0], [ignore])
 
 check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
 -- ls-lb-add ls1 lb3
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
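
To see why dropping that single reset makes the wait happen only once, here is
a hedged, self-contained sketch of the check in run_S_WAIT_BEFORE_CLEAR(). Only
the condition visible in the ofctrl.c hunk above comes from the patch; the
branch that arms the timer is an assumption about the surrounding code that the
diff does not show.

    /* Hedged sketch of the S_WAIT_BEFORE_CLEAR decision, not the real
     * ofctrl.c.  The key point: with the reset removed, the expire timestamp
     * stays in the past after the first wait, so re-entering the state later
     * (e.g. after an OVS restart) clears flows immediately. */
    #include <stdbool.h>
    #include <stdio.h>

    static long long wait_before_clear_time = 8000;  /* configured wait, ms */
    static long long wait_before_clear_expire = 0;   /* 0 = timer not armed */

    static bool
    should_clear_now(long long now_ms)
    {
        if (!wait_before_clear_time ||
            (wait_before_clear_expire && now_ms >= wait_before_clear_expire)) {
            /* v2 change: do NOT reset wait_before_clear_expire to 0 here. */
            return true;            /* -> S_CLEAR_FLOWS */
        }
        if (!wait_before_clear_expire) {
            /* First entry: arm the timer and keep waiting (assumed logic). */
            wait_before_clear_expire = now_ms + wait_before_clear_time;
        }
        return false;               /* stay in S_WAIT_BEFORE_CLEAR */
    }

    int
    main(void)
    {
        printf("first start, t=0:     %d\n", should_clear_now(0));      /* 0: waits */
        printf("after expiry, t=9000: %d\n", should_clear_now(9000));   /* 1: clears */
        printf("OVS restart, t=20000: %d\n", should_clear_now(20000));  /* 1: no new wait */
        return 0;
    }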


Re: [ovs-dev] [PATCH ovn] controller: Track individual address set constants.

2024-03-27 Thread Han Zhou
On Wed, Mar 27, 2024 at 12:46 AM Ales Musil  wrote:
>
> On Wed, Mar 27, 2024 at 7:14 AM Han Zhou  wrote:
>
> >
> >
> > On Tue, Mar 19, 2024 at 9:45 AM Ales Musil  wrote:
> > >
> > >
> > >
> > > On Tue, Mar 19, 2024 at 5:43 PM Ales Musil  wrote:
> > >>
> > >> Instead of tracking address set per struct expr_constant_set track it
> > >> per individual struct expr_constant. This allows more fine grained
> > >> control for I-P processing of address sets in controller. It helps
with
> > >> scenarios like matching on two address sets in one expression e.g.
> > >> "ip4.src == {$as1, $as2}". This allows any addition or removal of
> > >> individual addresses from the set to be incrementally processed instead
> > >> of reprocessing all the flows.
> > >>
> > >> This unfortunately doesn't help with the following flows:
> > >> "ip4.src == $as1 && ip4.dst == $as2"
> > >> "ip4.src == $as1 || ip4.dst == $as2"
> > >>
> > >> The memory impact should be minimal as there is only increase of 8
bytes
> > >> per the struct expr_constant.
> > >>
> > >> Signed-off-by: Ales Musil 
> >
> > Thanks Ales for the improvement! The approach looks good to me. Please
see
> > some comments below:
> >
> >
> >
> Hi Han,
>
> thank you for the review.
>
>
> >
> > >> ---
> > >>  controller/lflow.c  |  4 +-
> > >>  include/ovn/actions.h   |  2 +-
> > >>  include/ovn/expr.h  | 46 ++--
> > >>  lib/actions.c   | 20 -
> > >>  lib/expr.c  | 95
+
> > >>  tests/ovn-controller.at | 14 +++---
> > >>  6 files changed, 83 insertions(+), 98 deletions(-)
> > >>
> > >> diff --git a/controller/lflow.c b/controller/lflow.c
> > >> index 895d17d19..730dc879d 100644
> > >> --- a/controller/lflow.c
> > >> +++ b/controller/lflow.c
> > >> @@ -278,7 +278,7 @@ lflow_handle_changed_flows(struct lflow_ctx_in
> > *l_ctx_in,
> > >>  }
> > >>
> > >>  static bool
> > >> -as_info_from_expr_const(const char *as_name, const union
expr_constant
> > *c,
> > >> +as_info_from_expr_const(const char *as_name, const struct
> > expr_constant *c,
> > >>  struct addrset_info *as_info)
> > >>  {
> > >>  as_info->name = as_name;
> > >> @@ -714,7 +714,7 @@ lflow_handle_addr_set_update(const char *as_name,
> > >>  if (as_diff->deleted) {
> > >>  struct addrset_info as_info;
> > >>  for (size_t i = 0; i < as_diff->deleted->n_values; i++)
{
> > >> -union expr_constant *c =
_diff->deleted->values[i];
> > >> +struct expr_constant *c =
_diff->deleted->values[i];
> > >>  if (!as_info_from_expr_const(as_name, c, _info))
{
> > >>  continue;
> > >>  }
> > >> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> > >> index dcacbb1ff..1e20f9b81 100644
> > >> --- a/include/ovn/actions.h
> > >> +++ b/include/ovn/actions.h
> > >> @@ -238,7 +238,7 @@ struct ovnact_next {
> > >>  struct ovnact_load {
> > >>  struct ovnact ovnact;
> > >>  struct expr_field dst;
> > >> -union expr_constant imm;
> > >> +struct expr_constant imm;
> > >>  };
> > >>
> > >>  /* OVNACT_MOVE, OVNACT_EXCHANGE. */
> > >> diff --git a/include/ovn/expr.h b/include/ovn/expr.h
> > >> index c48f82398..e54edb5bf 100644
> > >> --- a/include/ovn/expr.h
> > >> +++ b/include/ovn/expr.h
> > >> @@ -368,7 +368,7 @@ bool expr_relop_from_token(enum lex_type type,
enum
> > expr_relop *relop);
> > >>  struct expr {
> > >>  struct ovs_list node;   /* In parent EXPR_T_AND or EXPR_T_OR
> > if any. */
> > >>  enum expr_type type;/* Expression type. */
> > >> -char *as_name;  /* Address set name. Null if it is
not
> > an
> > >> +const char *as_name;/* Address set name. Null if it is
not
> > an
> > >> address set. */
> > >>
> > >>  

Re: [ovs-dev] [PATCH ovn] controller: Track individual address set constants.

2024-03-27 Thread Han Zhou
On Tue, Mar 19, 2024 at 9:45 AM Ales Musil  wrote:
>
>
>
> On Tue, Mar 19, 2024 at 5:43 PM Ales Musil  wrote:
>>
>> Instead of tracking address set per struct expr_constant_set track it
>> per individual struct expr_constant. This allows more fine grained
>> control for I-P processing of address sets in controller. It helps with
>> scenarios like matching on two address sets in one expression e.g.
>> "ip4.src == {$as1, $as2}". This allows any addition or removal of
>> individual addresses from the set to be incrementally processed instead
>> of reprocessing all the flows.
>>
>> This unfortunately doesn't help with the following flows:
>> "ip4.src == $as1 && ip4.dst == $as2"
>> "ip4.src == $as1 || ip4.dst == $as2"
>>
>> The memory impact should be minimal as there is only increase of 8 bytes
>> per the struct expr_constant.
>>
>> Signed-off-by: Ales Musil 

Thanks Ales for the improvement! The approach looks good to me. Please see
some comments below:

>> ---
>>  controller/lflow.c  |  4 +-
>>  include/ovn/actions.h   |  2 +-
>>  include/ovn/expr.h  | 46 ++--
>>  lib/actions.c   | 20 -
>>  lib/expr.c  | 95 +
>>  tests/ovn-controller.at | 14 +++---
>>  6 files changed, 83 insertions(+), 98 deletions(-)
>>
>> diff --git a/controller/lflow.c b/controller/lflow.c
>> index 895d17d19..730dc879d 100644
>> --- a/controller/lflow.c
>> +++ b/controller/lflow.c
>> @@ -278,7 +278,7 @@ lflow_handle_changed_flows(struct lflow_ctx_in
*l_ctx_in,
>>  }
>>
>>  static bool
>> -as_info_from_expr_const(const char *as_name, const union expr_constant
*c,
>> +as_info_from_expr_const(const char *as_name, const struct expr_constant
*c,
>>  struct addrset_info *as_info)
>>  {
>>  as_info->name = as_name;
>> @@ -714,7 +714,7 @@ lflow_handle_addr_set_update(const char *as_name,
>>  if (as_diff->deleted) {
>>  struct addrset_info as_info;
>>  for (size_t i = 0; i < as_diff->deleted->n_values; i++) {
>> -union expr_constant *c = _diff->deleted->values[i];
>> +struct expr_constant *c = _diff->deleted->values[i];
>>  if (!as_info_from_expr_const(as_name, c, _info)) {
>>  continue;
>>  }
>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>> index dcacbb1ff..1e20f9b81 100644
>> --- a/include/ovn/actions.h
>> +++ b/include/ovn/actions.h
>> @@ -238,7 +238,7 @@ struct ovnact_next {
>>  struct ovnact_load {
>>  struct ovnact ovnact;
>>  struct expr_field dst;
>> -union expr_constant imm;
>> +struct expr_constant imm;
>>  };
>>
>>  /* OVNACT_MOVE, OVNACT_EXCHANGE. */
>> diff --git a/include/ovn/expr.h b/include/ovn/expr.h
>> index c48f82398..e54edb5bf 100644
>> --- a/include/ovn/expr.h
>> +++ b/include/ovn/expr.h
>> @@ -368,7 +368,7 @@ bool expr_relop_from_token(enum lex_type type, enum
expr_relop *relop);
>>  struct expr {
>>  struct ovs_list node;   /* In parent EXPR_T_AND or EXPR_T_OR if
any. */
>>  enum expr_type type;/* Expression type. */
>> -char *as_name;  /* Address set name. Null if it is not
an
>> +const char *as_name;/* Address set name. Null if it is not
an
>> address set. */
>>
>>  union {
>> @@ -505,40 +505,42 @@ enum expr_constant_type {
>>  };
>>
>>  /* A string or integer constant (one must know which from context). */
>> -union expr_constant {
>> -/* Integer constant.
>> - *
>> - * The width of a constant isn't always clear, e.g. if you write
"1",
>> - * there's no way to tell whether you mean for that to be a 1-bit
constant
>> - * or a 128-bit constant or somewhere in between. */
>> -struct {
>> -union mf_subvalue value;
>> -union mf_subvalue mask; /* Only initialized if 'masked'. */
>> -bool masked;
>> -
>> -enum lex_format format; /* From the constant's lex_token. */
>> -};
>> +struct expr_constant {
>> +const char *as_name;
>>
>> -/* Null-terminated string constant. */
>> -char *string;
>> +union {
>> +/* Integer constant.
>> + *
>> + * The width of a constant isn't always clear, e.g. if you
write "1",
>> + * there's no way to tell whether you mean for that to be a
1-bit
>> + * constant or a 128-bit constant or somewhere in between. */
>> +struct {
>> +union mf_subvalue value;
>> +union mf_subvalue mask; /* Only initialized if 'masked'. */
>> +bool masked;
>> +
>> +enum lex_format format; /* From the constant's lex_token. */
>> +};
>> +
>> +/* Null-terminated string constant. */
>> +char *string;
>> +};
>>  };
>>
>>  bool expr_constant_parse(struct lexer *,
>>   const struct expr_field *,
>> - union expr_constant *);
>> -void 
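
As a rough illustration of the union-to-struct change discussed above (the
types below are simplified stand-ins, not the real struct expr_constant), the
sketch shows how each constant carrying its own address-set name lets a handler
reprocess only the constants belonging to the updated set, e.g. for a match
like ip4.src == {$as1, $as2}:

    /* Hedged illustration only: each constant remembers which address set it
     * came from (or NULL), so two sets used in one match can be diffed
     * independently. */
    #include <stdio.h>
    #include <string.h>

    struct example_constant {
        const char *as_name;   /* address set this value came from, or NULL */
        unsigned int value;    /* stand-in for the real value/string union */
    };

    int
    main(void)
    {
        /* ip4.src == {$as1, $as2} expands into constants from two sets. */
        struct example_constant values[] = {
            { "as1", 0x0a000001 },   /* 10.0.0.1 from $as1 */
            { "as1", 0x0a000002 },   /* 10.0.0.2 from $as1 */
            { "as2", 0xc0a80001 },   /* 192.168.0.1 from $as2 */
        };

        /* An incremental handler can skip constants of unchanged sets instead
         * of reprocessing the whole flow. */
        const char *updated_set = "as2";
        for (size_t i = 0; i < sizeof values / sizeof values[0]; i++) {
            if (values[i].as_name && !strcmp(values[i].as_name, updated_set)) {
                printf("reprocess value 0x%x from $%s\n",
                       values[i].value, values[i].as_name);
            }
        }
        return 0;
    }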

Re: [ovs-dev] [PATCH v2 1/5] ovsdb: raft: Avoid transferring leadership to unavailable servers.

2024-03-26 Thread Han Zhou
 all
> + * the append requests queued up for them before the
leadership
> +     * transfer message or their connection is broken and we
will not
> + * transfer anyway. */
> +threshold = 0;
> +}
> +goto retry;

Thanks Ilya. It seems the retry could try an earlier failed server (e.g.
one for which raft_send_to_conn() returned false) one or two more times,
but it should be fine because the number of servers is very small anyway.
So this looks good to me.

Acked-by: Han Zhou 

> +}
> +
> +free(servers);
>  }
>
>  /* Send a RemoveServerRequest to the rest of the servers in the cluster.
> --
> 2.44.0
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] controller: Fix ofctrl memory usage underflow.

2024-03-26 Thread Han Zhou
On Wed, Mar 20, 2024 at 12:48 PM Mark Michelson  wrote:
>
> Thanks Ales, looks good to me.
>
> Acked-by: Mark Michelson 
>

Thanks Ales and Mark. I applied to main and backported down to branch-23.06.

Han

> On 3/19/24 11:57, Ales Musil wrote:
> > The memory usage would be increased for size of sb_addrset_ref
> > struct, but decreased for the size of the struct + the name.
> > That would slowly lead to underflows in some cases.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-507
> > Signed-off-by: Ales Musil 
> > ---
> >   controller/ofctrl.c | 10 --
> >   1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > index f14cd79a8..0ef3b8366 100644
> > --- a/controller/ofctrl.c
> > +++ b/controller/ofctrl.c
> > @@ -1112,6 +1112,12 @@ sb_to_flow_size(const struct sb_to_flow *stf)
> >   return sizeof *stf;
> >   }
> >
> > +static size_t
> > +sb_addrset_ref_size(const struct sb_addrset_ref *sar)
> > +{
> > +return sizeof *sar + strlen(sar->name) + 1;
> > +}
> > +
> >   static struct sb_to_flow *
> >   sb_to_flow_find(struct hmap *uuid_flow_table, const struct uuid
*sb_uuid)
> >   {
> > @@ -1181,8 +1187,8 @@ link_flow_to_sb(struct ovn_desired_flow_table
*flow_table,
> >   }
> >   if (!found) {
> >   sar = xmalloc(sizeof *sar);
> > -mem_stats.sb_flow_ref_usage += sizeof *sar;
> >   sar->name = xstrdup(as_info->name);
> > +mem_stats.sb_flow_ref_usage += sb_addrset_ref_size(sar);
> >   hmap_init(>as_ip_to_flow_map);
> >   ovs_list_insert(>addrsets, >list_node);
> >   }
> > @@ -1568,7 +1574,7 @@ remove_flows_from_sb_to_flow(struct
ovn_desired_flow_table *flow_table,
> >   free(itfn);
> >   }
> >   hmap_destroy(>as_ip_to_flow_map);
> > -mem_stats.sb_flow_ref_usage -= (sizeof *sar +
strlen(sar->name) + 1);
> > +mem_stats.sb_flow_ref_usage -= sb_addrset_ref_size(sar);
> >   free(sar->name);
> >   free(sar);
> >   }
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
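
The fix above works by charging and refunding the memory-usage counter through
one size helper, so the two sides can never drift apart. A hedged,
self-contained illustration of that accounting pattern follows; the names
(addrset_ref, addrset_ref_size, usage) are invented for the example and are not
the ofctrl.c symbols.

    /* Symmetric accounting: always add and subtract via the same helper. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct addrset_ref {
        char *name;
    };

    static size_t
    addrset_ref_size(const struct addrset_ref *ref)
    {
        return sizeof *ref + strlen(ref->name) + 1;
    }

    int
    main(void)
    {
        size_t usage = 0;

        struct addrset_ref *ref = malloc(sizeof *ref);
        ref->name = strdup("as_allowed_v4");
        usage += addrset_ref_size(ref);       /* charge: struct + name + NUL */

        usage -= addrset_ref_size(ref);       /* refund with the same helper */
        free(ref->name);
        free(ref);

        printf("usage after add+remove: %zu\n", usage);  /* 0, no underflow */
        return 0;
    }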


Re: [ovs-dev] [PATCH 5/5] ovsdb: raft: Fix inability to join after leadership change round trip.

2024-03-25 Thread Han Zhou
uster.at
> index 482e4e02d..9d8b4d06a 100644
> --- a/tests/ovsdb-cluster.at
> +++ b/tests/ovsdb-cluster.at
> @@ -525,6 +525,59 @@ for i in $(seq $n); do
>  OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid])
>  done
>
> +AT_CLEANUP
> +
> +AT_SETUP([OVSDB cluster - leadership change before replication while
joining])
> +AT_KEYWORDS([ovsdb server negative unix cluster join])
> +
> +n=5
> +AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db
dnl
> +  $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [],
[stderr])
> +cid=$(ovsdb-tool db-cid s1.db)
> +schema_name=$(ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema)
> +for i in $(seq 2 $n); do
> +AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft
unix:s1.raft])
> +done
> +
> +on_exit 'kill $(cat *.pid)'
> +on_exit "
> +  for i in \$(ls $(pwd)/s[[0-$n]]); do
> +ovs-appctl --timeout 1 -t \$i cluster/status $schema_name;
> +  done
> +"
> +
> +dnl Starting servers one by one asking all exisitng servers to transfer
> +dnl leadership right after starting to add a server.  Joining server will
> +dnl need to find a new leader that will also transfer leadership.
> +dnl This will continue until the same server will not become a leader
> +dnl for the second time and will be able to add a new server.
> +for i in $(seq $n); do
> +dnl Make sure that all already started servers joined the cluster.
> +for j in $(seq $((i - 1)) ); do
> +AT_CHECK([ovsdb_client_wait unix:s$j.ovsdb $schema_name
connected])
> +done
> +for j in $(seq $((i - 1)) ); do
> +OVS_WAIT_UNTIL([ovs-appctl -t "$(pwd)"/s$j \
> +  cluster/failure-test \
> +transfer-leadership-after-starting-to-add \
> +| grep -q "engaged"])
> +done
> +
> +AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off \
> +   --detach --no-chdir --log-file=s$i.log \
> +   --pidfile=s$i.pid --unixctl=s$i \
> +   --remote=punix:s$i.ovsdb s$i.db])
> +done
> +
> +dnl Make sure that all servers joined the cluster.
> +for i in $(seq $n); do
> +AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
> +done
> +
> +for i in $(seq $n); do
> +OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid])
> +done
> +
>  AT_CLEANUP
>
>
> --
> 2.43.0
>

Thanks Ilya. Looks good to me.

Acked-by: Han Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/5] ovsdb: raft: Fix assertion when 1-node cluster looses leadership.

2024-03-25 Thread Han Zhou
On Fri, Mar 15, 2024 at 1:15 PM Ilya Maximets  wrote:
>
> Some of the failure tests can make a single-node cluster to
> loose leadership.  In this case the next raft_run() will
> trigger election with a pre-vore enabled.  This is causing

s/pre-vore/pre-vote

> an assertion when this server attempts to vote for itself.
>
> Fix that by not using pre-voting if the is only one server.

s/the/there

>
> A new failure test introduced in later commit triggers this
> assertion every time.
>
> Fixes: 85634fd58004 ("ovsdb: raft: Support pre-vote mechanism to deal
with disruptive server.")
> Signed-off-by: Ilya Maximets 

Thanks for the fix.
Acked-by: Han Zhou 

> ---
>  ovsdb/raft.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> index 237d7ebf5..c41419052 100644
> --- a/ovsdb/raft.c
> +++ b/ovsdb/raft.c
> @@ -2083,7 +2083,7 @@ raft_run(struct raft *raft)
>  raft_start_election(raft, true, false);
>  }
>  } else {
> -raft_start_election(raft, true, false);
> +raft_start_election(raft, hmap_count(>servers) > 1,
false);
>  }
>
>  }
> --
> 2.43.0
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/5] ovsdb: raft: Fix permanent joining state on a cluster member.

2024-03-25 Thread Han Zhou
 * join request and will commit addition of this server
ourselves. */
> +VLOG_INFO_RL(, "elected as leader while joining");
> +raft->joining = false;
> +}
> +
>  struct raft_server *s;
>  HMAP_FOR_EACH (s, hmap_node, >servers) {
>  raft_server_init_leader(raft, s);
> @@ -2968,12 +2981,12 @@ raft_update_commit_index(struct raft *raft,
uint64_t new_commit_index)
>  }
>
>  while (raft->commit_index < new_commit_index) {
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
>  uint64_t index = ++raft->commit_index;
>  const struct raft_entry *e = raft_get_entry(raft, index);
>
>  if (raft_entry_has_data(e)) {
>  struct raft_command *cmd = raft_find_command_by_eid(raft,
>eid);
> -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
5);
>
>  if (cmd) {
>  if (!cmd->index && raft->role == RAFT_LEADER) {
> @@ -3017,6 +3030,35 @@ raft_update_commit_index(struct raft *raft,
uint64_t new_commit_index)
>   * reallocate raft->entries, which would invalidate 'e', so
>   * this case must be last, after the one for 'e->data'. */
>  raft_run_reconfigure(raft);
> +} else if (e->servers &&
!raft_has_uncommitted_configuration(raft)) {
> +struct ovsdb_error *error;
> +struct raft_server *s;
> +struct hmap servers;
> +
> +    error = raft_servers_from_json(e->servers, );
> +ovs_assert(!error);
> +HMAP_FOR_EACH (s, hmap_node, ) {
> +struct raft_server *server = raft_find_server(raft,
>sid);
> +
> +if (server && server->phase == RAFT_PHASE_COMMITTING) {
> +/* This server lost leadership while committing
> + * seever 's', but it was committed later by a

nit: typo s/seever/server

Acked-by: Han Zhou 

> + * new leader. */
> +server->phase = RAFT_PHASE_STABLE;
> +}
> +
> +if (raft->joining && uuid_equals(>sid, >sid)) {
> +/* Leadership change happened before previous leader
> + * could commit the change of a servers list, but it
> + * was replicated and a new leader committed it. */
> +VLOG_INFO_RL(,
> +"added to configuration without reply "
> +"(eid: "UUID_FMT", commit index: %"PRIu64")",
> +UUID_ARGS(>eid), index);
> +raft->joining = false;
> +}
> +}
> +raft_servers_destroy();
>  }
>  }
>
> diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
> index 481afc08b..482e4e02d 100644
> --- a/tests/ovsdb-cluster.at
> +++ b/tests/ovsdb-cluster.at
> @@ -473,6 +473,59 @@ done
>
>  AT_CLEANUP
>
> +AT_SETUP([OVSDB cluster - leadership change after replication while
joining])
> +AT_KEYWORDS([ovsdb server negative unix cluster join])
> +
> +n=5
> +AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db
dnl
> +  $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [],
[stderr])
> +cid=$(ovsdb-tool db-cid s1.db)
> +schema_name=$(ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema)
> +for i in $(seq 2 $n); do
> +AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft
unix:s1.raft])
> +done
> +
> +on_exit 'kill $(cat *.pid)'
> +on_exit "
> +  for i in \$(ls $(pwd)/s[[0-$n]]); do
> +ovs-appctl --timeout 1 -t \$i cluster/status $schema_name;
> +  done
> +"
> +
> +dnl Starting servers one by one asking all exisitng servers to transfer
> +dnl leadership after append reply forcing the joining server to try
another
> +dnl one that will also transfer leadership.  Since transfer is happening
> +dnl after the servers update is replicated to other servers, one of the
> +dnl other servers will actually commit it.  It may be a new leader from
> +dnl one of the old members or the new joining server itself.
> +for i in $(seq $n); do
> +dnl Make sure that all already started servers joined the cluster.
> +for j in $(seq $((i - 1)) ); do
> +AT_CHECK([ovsdb_client_wait unix:s$j.ovsdb $schema_name
connected])
> +done
> +for j in $(seq $((i - 1)) ); do
> +OVS_WAIT_UNTIL([ovs-appctl -t "$(pwd)"/s$j \
> +  cluster/failure-test \
> +
 transfer-leadership-after-sending-append-request \
> +| grep -q "engaged"])
> +done
> +
> +AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off \
> +   --detach --no-chdir --log-file=s$i.log \
> +   --pidfile=s$i.pid --unixctl=s$i \
> +   --remote=punix:s$i.ovsdb s$i.db])
> +done
> +
> +dnl Make sure that all servers joined the cluster.
> +for i in $(seq $n); do
> +AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
> +done
> +
> +for i in $(seq $n); do
> +OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid])
> +done
> +
> +AT_CLEANUP
>
>
>  OVS_START_SHELL_HELPERS
> --
> 2.43.0
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/5] ovsdb: raft: Fix time intervals for multitasking while joining.

2024-03-25 Thread Han Zhou
On Fri, Mar 15, 2024 at 1:15 PM Ilya Maximets  wrote:
>
> While joining, ovsdb-server may not wake up for a duration of a join
> timer, which is 1 second and is by default 3x larger than a heartbeat
> timer.  This is causing unnecessary warnings from the cooperative
> multitasking module that thinks that we missed the heartbeat time by
> a lot.
>
> Use join timer (1000) instead while joining.
>
> Fixes: d4a15647b917 ("ovsdb: raft: Enable cooperative multitasking.")
> Signed-off-by: Ilya Maximets 
> ---
>
> CC: Frode Nordahl 
>
>  ovsdb/raft.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> index 25f462431..57e27bf73 100644
> --- a/ovsdb/raft.c
> +++ b/ovsdb/raft.c
> @@ -2126,10 +2126,11 @@ raft_run(struct raft *raft)
>  raft_reset_ping_timer(raft);
>  }
>
> +uint64_t interval = raft->joining
> +? 1000 :
RAFT_TIMER_THRESHOLD(raft->election_timer);

nit: the hardcoded joining timer value 1000 is used in at least 3 places, so
it's probably better to define a macro for it.

Acked-by: Han Zhou 

>  cooperative_multitasking_set(
>  &raft_run_cb, (void *) raft, time_msec(),
> -RAFT_TIMER_THRESHOLD(raft->election_timer)
> -+ RAFT_TIMER_THRESHOLD(raft->election_timer) / 10, "raft_run");
> +interval + interval / 10, "raft_run");
>
>  /* Do this only at the end; if we did it as soon as we set
raft->left or
>   * raft->failed in handling the RemoveServerReply, then it could
easily
> --
> 2.43.0
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
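
The nit above is about replacing the literal 1000 ms joining interval with a
named constant. A minimal, self-contained sketch of that suggestion follows;
the macro name RAFT_JOIN_TIMER_MS and the placeholder threshold value are
invented for illustration and are not from ovsdb/raft.c.

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Invented name; the real code currently hardcodes 1000 at its call
     * sites. */
    #define RAFT_JOIN_TIMER_MS 1000

    int
    main(void)
    {
        bool joining = true;
        /* Stand-in for RAFT_TIMER_THRESHOLD(raft->election_timer). */
        uint64_t election_threshold = 750;

        uint64_t interval = joining ? RAFT_JOIN_TIMER_MS : election_threshold;

        /* Matches the "interval + interval / 10" slack used in the hunk. */
        printf("multitasking budget: %" PRIu64 " ms\n", interval + interval / 10);
        return 0;
    }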


Re: [ovs-dev] [PATCH 1/5] ovsdb: raft: Randomize leadership transfer.

2024-03-25 Thread Han Zhou
On Tue, Mar 19, 2024 at 12:05 AM Felix Huettner via dev <
ovs-dev@openvswitch.org> wrote:
>
> On Mon, Mar 18, 2024 at 05:52:12PM +0100, Ilya Maximets wrote:
> > On 3/18/24 17:15, Felix Huettner wrote:
> > > On Fri, Mar 15, 2024 at 09:14:49PM +0100, Ilya Maximets wrote:
> > >> Each cluster member typically always transfers leadership to the same
> > >> other member, which is the first in their list of servers.  This may
> > >> result in two servers in a 3-node cluster to transfer leadership to
> > >> each other and never to the third one.
> > >>
> > >> Randomizing the selection to make the load more evenly distributed.
> > >
> > > Hi Ilya,
> >
> > Hi, Felix.  Thanks for the comments!
> >
> > >
> > > just out of curiosity: since basically only one of the 3 members is
> > > active at any point in time, is balancing the load even relevant. It
> > > will always only be on one of the 3 members anyway.
> > It is not very important, I agree.  What I observed in practice is
> > that sometimes if, for example, compactions happen in approximately
> > similar time, the server we transfer the leadership to may send it
> > right back, while the first server is busy compacting.  This is
> > less of a problem today as well, since we have parallel compaction,
> > but it may still be annoying if that happens every time.
> >
> > I'm mostly making this patch for the purpose of better testing below.
> >
> > >
> > >>
> > >> This also makes cluster failure tests cover more scenarios as servers
> > >> will transfer leadership to servers they didn't before.  This is
> > >> important especially for cluster joining tests.
> > >>
> > >> Ideally, we would transfer to a random server with a highest apply
> > >> index, but not trying to implement this for now.
> > >>
> > >> Signed-off-by: Ilya Maximets 
> > >> ---
> > >>  ovsdb/raft.c | 6 +-
> > >>  1 file changed, 5 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> > >> index f463afcb3..25f462431 100644
> > >> --- a/ovsdb/raft.c
> > >> +++ b/ovsdb/raft.c
> > >> @@ -1261,8 +1261,12 @@ raft_transfer_leadership(struct raft *raft,
const char *reason)
> > >>  return;
> > >>  }
> > >>
> > >> +size_t n = hmap_count(&raft->servers) * 3;
> > >> +struct raft_server *s;
> > >> -HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
> > >> +
> > >> +while (n--) {
> > >> +s = CONTAINER_OF(hmap_random_node(&raft->servers),
> > >> + struct raft_server, hmap_node);
> > >>  if (!uuid_equals(&raft->sid, &s->sid)
> > >>  && s->phase == RAFT_PHASE_STABLE) {
> > >>  struct raft_conn *conn = raft_find_conn_by_sid(raft,
&s->sid);
> > >
> > > i think this has the risk of never selecting one server out of the
list of
> > > cluster members. Suppose you have a 3 node cluster where one of the
> > > members is down. In this case there is a single member the leadership
> > > can be transferred to.
> > > This means a single iteration of the while loop has a chance of
2/3
> > > of selecting a member that cannot be used. Over the 9 iterations this
> > > would give a chance of (2/3)^9 of always choosing an
> > > inappropriate member. This equals a chance of about 2.6% of never
> > > selecting the single appropriate target member.
> > >
> > > Could we instead rather start iterating the hmap at some random
offset?
> > > That should give a similar result while giving the guarantee that we
> > > visit each member once.
> >
> > I don't think it's very important, because we're transferring leadership
> > without verifying if the other side is alive and we're not checking if
we
> > actually transferred it or not.  So, retries are basically just for the
> > server not hitting itself or servers that didn't join yet.  We will
transfer
> > to the first other server that already joined regardless of the
iteration
> > method.
> >
> > The way to mostly fix the issue with dead servers is, as I mentioned, to
> > transfer only to the servers with the highest apply index, i.e. the
servers
> > that acknowledged the most amount of changes.  This will ensure that we
> > at least heard something from the server in the recent past.  But that's
> > a separate feature to implement.
> >
> > Also, the leadership transfer is just an optimization to speed up
elections,
> > so it's not necessary for correctness for this operation to be
successful.
> > If we fail to transfer or transfer to the dead server, the rest of the
> > cluster will notice the absence of the leader and initiate election by
> > the timeout.
> >
> > Best regards, Ilya Maximets.
>
> Hi Ilya,
>
> thanks for the clarifications.
> Then I guess the 2.6% chance is not too relevant.
>

Thanks for the patch and the discussion.
I still have a concern regarding the retry logic. Maybe it is not a
critical problem if the function fails to select any server to transfer to,
but I'd still prefer this function to be more predictable. For example, we
can store the servers of the hmap in an array first and then pick randomly
among the eligible ones, so that every candidate is considered.
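
A minimal stand-alone sketch of that idea (hypothetical types and names, not
the actual ovsdb/raft.c code): collect the eligible servers into an array
first, then pick one at random, so the choice stays random while every
candidate is guaranteed to be considered.

#include <stddef.h>
#include <stdlib.h>

struct candidate {
    int id;          /* stand-in for struct raft_server */
    int eligible;    /* stand-in for "stable, joined and not ourselves" */
};

/* Returns a random eligible candidate, or NULL if there is none. */
static const struct candidate *
pick_random_eligible(const struct candidate *servers, size_t n)
{
    const struct candidate **eligible = malloc(n * sizeof *eligible);
    const struct candidate *picked = NULL;
    size_t n_eligible = 0;

    if (!eligible) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        if (servers[i].eligible) {
            eligible[n_eligible++] = &servers[i];
        }
    }
    if (n_eligible) {
        picked = eligible[rand() % n_eligible];
    }
    free(eligible);
    return picked;
}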

Re: [ovs-dev] [PATCH ovn] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.

2024-03-20 Thread Han Zhou
On Mon, Mar 18, 2024 at 11:27 AM Mark Michelson  wrote:
>
> Hi Han,
>
> I have a comment below
>
> On 3/5/24 01:27, Han Zhou wrote:
> > The ovn-ofctrl-wait-before-clear setting is designed to minimize
> > downtime during the initial start-up of the ovn-controller. For this
> > purpose, the ovn-controller should wait only once upon entering the
> > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
> > to the OVS, such as those occurring during an OVS restart/upgrade,
> > should not trigger this wait. However, the current implementation always
> > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
> > can inadvertently delay flow installations during OVS restart/upgrade,
> > potentially causing more harm than good. (The extent of the impact
> > varies based on the method used to restart OVS, including whether flow
> > save/restore tools and the flow-restore-wait feature are employed.)
> >
> > This patch avoids the unnecessary wait after the initial one.
> >
> > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to
reduce down time during upgrade.")
> > Signed-off-by: Han Zhou 
> > ---
> >   controller/ofctrl.c | 1 -
> >   tests/ovn-controller.at | 9 +++--
> >   2 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > index f14cd79a8dbb..0d72ecbaa167 100644
> > --- a/controller/ofctrl.c
> > +++ b/controller/ofctrl.c
> > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
> >   if (!wait_before_clear_time ||
> >   (wait_before_clear_expire &&
> >time_msec() >= wait_before_clear_expire)) {
> > -wait_before_clear_expire = 0;
> >   state = S_CLEAR_FLOWS;
> >   return;
> >   }
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index 37f1ded1bd26..b65e11722cbb 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -2284,10 +2284,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller
coverage/read-counter lflow_run)
> >   AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
> >   ])
> >
> > -# Restart OVS this time, and wait until flows are reinstalled
> > +# Restart OVS this time. Flows should be reinstalled without waiting.
> >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep
-vF 2.2.2.2])
> > +
> > +# Sleep for 3s, which is long enough for the flows to be installed, but
> > +# shorter than the wait-before-clear (5s), to make sure the flows are
installed
> > +# without waiting.
> > +sleep 3
>
> This change makes me nervous. The comment makes sense. However, I worry
> that on slow or loaded systems, relying on the flows to be written
> within 3 seconds may not always work out.
>
> If there were a way to peek into the ofctrl state machine and check that
> we have moved off of S_WAIT_BEFORE_CLEAR by this point, that might work
> better. But that is something that is hard to justify exposing.
>
> I came up with this possible idea:
>   * set wait-before-clear to a time longer than OVS_CTL_TIMEOUT (e.g. 60
> seconds)
>   * Restart ovs
>   * Use OVS_WAIT_UNTIL(...), just like the test used to do.
>
> This way, we get plenty of opportunities to ensure the flows were
> written. In most cases, this probably will actually be quicker than the
> 3 second sleep added in this patch. However, if it takes longer than 3
> seconds, then the test can still pass. If the flows get written
> properly, then we know ovn-controller did not wait for the
> wait-before-clear time.
>

Hi Mark, thanks for your comment! I agree with you that sleep for 3s is not
very reliable. Your suggestion looks better, but I think there is still a
potential problem. The approach assumes that ovn-controller will always
apply the new settings of ofctrl-wait-before-clear. It is true for the
current implementation, but there is nothing preventing us from removing
this logic, so that ovn-controller ignores any ofctrl-wait-before-clear
setting changes after startup. In fact, it is more reasonable to ignore it.
So, let's assume ovn-controller doesn't take care of the changes of the
settings. In this test case, the setting is initially 5s when
ovn-controller starts, and later changing it to 60s doesn't take effect and
ovn-controller still uses the 5s value. And then let's assume
ovn-controller still waits

Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix busy loop when ofctrl is disconnected.

2024-03-20 Thread Han Zhou
On Wed, Mar 20, 2024 at 2:56 AM Dumitru Ceara  wrote:
>
> On 3/20/24 06:41, Han Zhou wrote:
> > ovn-controller runs at 100% cpu when OVS exits. This is because the
> > ofctrl_run is not called while ofctrl_wait is always called in the main
> > loop. Because of the missing ofctrl_run, it doesn't even detect that the
> > ofctrl connection is disconnected.
> >
> > This patch fixes the issue by always giving a chance to run ofctrl_run
> > as long as ofctrl_wait is called.
> >
> > Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and
meter IDs to 16bit.")
> > Fixes: 94cbc59dc0f1 ("ovn-controller: Fix use of dangling pointers in
I-P runtime_data.")
> > Signed-off-by: Han Zhou 
> > ---
>
> Thanks for the patch, Han!  I tested it in a sandbox and it fixes the
> issue.  Looks good to me:
>
> Acked-by: Dumitru Ceara 
>
Thanks Dumitru. I applied to main and backported down to branch-23.06.

> Regards,
> Dumitru
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovn-controller: Fix busy loop when ofctrl is disconnected.

2024-03-19 Thread Han Zhou
ovn-controller runs at 100% cpu when OVS exits. This is because the
ofctrl_run is not called while ofctrl_wait is always called in the main
loop. Because of the missing ofctrl_run, it doesn't even detect that the
ofctrl connection is disconnected.

This patch fixes the issue by always giving a chance to run ofctrl_run
as long as ofctrl_wait is called.

Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter IDs 
to 16bit.")
Fixes: 94cbc59dc0f1 ("ovn-controller: Fix use of dangling pointers in I-P 
runtime_data.")
Signed-off-by: Han Zhou 
---
 controller/ofctrl.c | 2 +-
 controller/ovn-controller.c | 9 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index f14cd79a8dbb..1a2d2be791eb 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -787,7 +787,7 @@ ofctrl_run(const struct ovsrec_bridge *br_int,
 
 rconn_run(swconn);
 
-if (!rconn_is_connected(swconn)) {
+if (!rconn_is_connected(swconn) || !pending_ct_zones) {
 return reconnected;
 }
 
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 1c9960c708bf..c9ff5967a2af 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -5736,10 +5736,11 @@ main(int argc, char *argv[])
 }
 }
 
-if (br_int && ovs_feature_set_discovered()) {
+if (br_int) {
 ct_zones_data = engine_get_data(&en_ct_zones);
-if (ct_zones_data && ofctrl_run(br_int, ovs_table,
-&ct_zones_data->pending)) {
+if (ofctrl_run(br_int, ovs_table,
+   ct_zones_data ? &ct_zones_data->pending
+ : NULL)) {
 static struct vlog_rate_limit rl
 = VLOG_RATE_LIMIT_INIT(1, 1);
 
@@ -5748,7 +5749,7 @@ main(int argc, char *argv[])
 engine_set_force_recompute(true);
 }
 
-if (chassis) {
+if (chassis && ovs_feature_set_discovered()) {
 encaps_run(ovs_idl_txn, br_int,
sbrec_chassis_table_get(ovnsb_idl_loop.idl),
chassis,
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] netdev-offload: make netdev-offload-tc work with flow-restore-wait

2024-03-13 Thread Han Zhou
On Fri, Apr 22, 2022 at 1:41 AM Eelco Chaudron  wrote:
>
>
>
> On 15 Apr 2022, at 13:25, wenx05124...@163.com wrote:
>
> > From: wenxu 
> >
> > The netdev-offload in tc mode can't work with flow-restore-wait.
> > When the vswitchd restart with flow-restore-wait, the tc qdisc
> > will be delete in netdev_set_flow_api_enabled. The netdev flow
> > api can be enabled after the flow-restore-wait flag removing.
> >
> > Signed-off-by: wenxu 

Hi, I found this patch useful, but it seems inactive for a long time. I
hope we can revive and update it.

Regardless of the issues pointed out by Eelco, this patch works well for
traffic not going through tunnels, but for tunnelled traffic, e.g. geneve
traffic, I found that even with the patch, when OVS starts, the ingress
qdisc for the genev_sys_6081 device is deleted, so the traffic is still
broken even with flow-restore-wait set. I haven't yet found where in the code
it could be deleted. Any hint/insight would be appreciated.

> > ---
> >  lib/netdev-linux.c | 16 
> >  vswitchd/bridge.c  | 18 ++
> >  2 files changed, 18 insertions(+), 16 deletions(-)
> >
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index 9d12502..b6315e3 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -2784,17 +2784,17 @@ netdev_linux_set_policing(struct netdev
*netdev_, uint32_t kbits_rate,
> >  goto out;
> >  }
> >
> > -/* Remove any existing ingress qdisc. */
> > -error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS);
> > -if (error) {
> > -VLOG_WARN_RL(&rl, "%s: removing policing failed: %s",
> > - netdev_name, ovs_strerror(error));
> > -goto out;
> > -}
> > -
> >  if (kbits_rate || kpkts_rate) {
> >  const char *cls_name = "matchall";
> >
> > +/* Remove any existing ingress qdisc. */
> > +error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS);
> > +if (error) {
> > +VLOG_WARN_RL(&rl, "%s: removing policing failed: %s",
> > + netdev_name, ovs_strerror(error));
> > +goto out;
> > +}
>
> Are we sure we are not breaking a corner case here where something might
already have been configured, and we are not cleaning it up?
>
> I.e. what if we configured some rate limiting, and now we unconfigure it,
the values are all zero, and it will not be removed?
>

Agree. This is incorrect. I think it is better to use the approach from v1,
to avoid configuring qos when flow-restore-wait=true, hopefully good enough
for the short period before flow restore is done.
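
A rough stand-alone illustration of that v1-style guard (hypothetical helper
names, not the real netdev-linux code): leave the existing ingress qdisc
untouched while flow-restore-wait is set, and apply the requested policing
only after the restore phase has finished.

#include <stdbool.h>
#include <stdio.h>

static bool flow_restore_wait = true;   /* would come from other_config */

/* Stand-in for the policing setup path. */
static void
set_policing(const char *dev, unsigned int kbits_rate)
{
    if (flow_restore_wait) {
        printf("%s: flow restore pending, leaving ingress qdisc alone\n", dev);
        return;
    }
    printf("%s: (re)installing ingress policing at %u kbit/s\n",
           dev, kbits_rate);
}

int
main(void)
{
    set_policing("genev_sys_6081", 1000);   /* skipped during restore */
    flow_restore_wait = false;              /* restore finished */
    set_policing("genev_sys_6081", 1000);   /* applied now */
    return 0;
}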

> > +
> >  error = tc_add_del_qdisc(ifindex, true, 0, TC_INGRESS);
> >  if (error) {
> >  VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s",
> > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> > index e328d8e..d8843a7 100644
> > --- a/vswitchd/bridge.c
> > +++ b/vswitchd/bridge.c
> > @@ -3288,8 +3288,17 @@ bridge_run(void)
> >  }
> >  cfg = ovsrec_open_vswitch_first(idl);
> >
> > +/* Once the value of flow-restore-wait is false, we no longer
should
> > + * check its value from the database. */
> > +if (cfg && ofproto_get_flow_restore_wait()) {
> > +ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config,
> > +"flow-restore-wait", false));
> > +}
> > +
> >  if (cfg) {
> > -netdev_set_flow_api_enabled(&cfg->other_config);
> > +if (!ofproto_get_flow_restore_wait()) {
> > +netdev_set_flow_api_enabled(&cfg->other_config);
>
> See my previous comment here, this needs more input:
>
> However, the main problem with this patch might be the following
statement in the documentation:
>
> “The default value is false. Changing this value requires
restarting the daemon”
>
> I’ve dealt with a case in the past that was failing because I enabled
offload but did not restart. Unfortunately, I can not remember the details,
I think it had something to do with feature detection. I’ve copied on some
more folks who might know more details about this requirement.
>
>
Based on my test, this seems to be fine, but I am not sure if I am just
lucky with my environment.

Thanks,
Han

>
> > +}
> >  dpdk_init(&cfg->other_config);
> >  userspace_tso_init(&cfg->other_config);
> >  }
> > @@ -3300,13 +3309,6 @@ bridge_run(void)
> >   * returns immediately. */
> >  bridge_init_ofproto(cfg);
> >
> > -/* Once the value of flow-restore-wait is false, we no longer
should
> > - * check its value from the database. */
> > -if (cfg && ofproto_get_flow_restore_wait()) {
> > -ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config,
> > -"flow-restore-wait", false));
> > -}
> > -
> >  bridge_run__();
> >
> >  /* Re-configure SSL.  We do this on every trip through the main
loop,
>
> ___
> dev mailing list
> 

Re: [ovs-dev] [PATCH] ofproto-dpif: Fix vxlan with different name del/add failed.

2024-03-13 Thread Han Zhou
Thanks Tao for fixing this. I think the title can be more generic because
this problem and its fix apply to all tunnel types rather than just VXLAN.

On Tue, Mar 12, 2024 at 7:04 AM Tao Liu  wrote:
>
> Reproduce:
>   ovs-vsctl add-port br-int p0 \
> -- set interface p0 type=vxlan options:remote_ip=10.10.10.1
>
>   sleep 2
>
>   ovs-vsctl --if-exists del-port p0 \
> -- add-port br-int p1 \
> -- set interface p1 type=vxlan options:remote_ip=10.10.10.1
>   ovs-vsctl: Error detected while setting up 'p1': could not add
>   network device p1 to ofproto (File exists).
>
> vswitchd log:
>   bridge|INFO|bridge br-int: added interface p0 on port 1106
>   bridge|INFO|bridge br-int: deleted interface p0 on port 1106
>   tunnel|WARN|p1: attempting to add tunnel port with same config as port
'p0' (::->10.10.10.1, key=0, legacy_l2, dp port=122)
>   ofproto|WARN|br-int: could not add port p1 (File exists)
>   bridge|WARN|could not add network device p1 to ofproto (File exists)
>
> CallTrace:
>   bridge_reconfigure
> bridge_del_ports
>   port_destroy
> iface_destroy__
>   netdev_remove <-- netdev p0 removed
> bridge_delete_or_reconfigure_ports
>   OFPROTO_PORT_FOR_EACH
> ofproto_port_dump_next
>   port_dump_next
>   port_query_by_name<-- netdev_shash do not contain p0
> ofproto_port_del<-- p0 do not del in ofproto
> bridge_add_ports
>   bridge_add_ports__
> iface_create
>   iface_do_create
> ofproto_port_add<-- p1 add failed
>
> Fixes: fe83f81df977 ("netdev: Remove netdev from global shash when the
user is changing interface configuration.")
> Signed-off-by: Tao Liu 
> ---
>  ofproto/ofproto-dpif.c | 13 +
>  tests/tunnel.at| 12 
>  2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index f59d69c4d..0cac3050d 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -3905,14 +3905,19 @@ port_query_by_name(const struct ofproto
*ofproto_, const char *devname,
>
>  if (sset_contains(&ofproto->ghost_ports, devname)) {
>  const char *type = netdev_get_type_from_name(devname);
> +const struct ofport *ofport =
> +shash_find_data(&ofproto->up.port_by_name,
devname);
> +if (!type && ofport && ofport->netdev) {
> +type = netdev_get_type(ofport->netdev);
> +}
>
>  /* We may be called before ofproto->up.port_by_name is populated
with
>   * the appropriate ofport.  For this reason, we must get the
name and
> - * type from the netdev layer directly. */
> + * type from the netdev layer directly.
> + * When a port deleted, the corresponding netdev is also removed
from
> + * netdev_shash. netdev_get_type_from_name returns NULL in such
case.
> + * We should try to get type from ofport->netdev. */

nit: this comment is better to be moved to the above where we are trying to
get the type from ofport.

Otherwise looks good to me:
Acked-by: Han Zhou 
Tested-by: Han Zhou 

>  if (type) {
> -const struct ofport *ofport;
> -
> -ofport = shash_find_data(&ofproto->up.port_by_name, devname);
>  ofproto_port->ofp_port = ofport ? ofport->ofp_port :
OFPP_NONE;
>  ofproto_port->name = xstrdup(devname);
>  ofproto_port->type = xstrdup(type);
> diff --git a/tests/tunnel.at b/tests/tunnel.at
> index 71e7c2df4..9d539ee6f 100644
> --- a/tests/tunnel.at
> +++ b/tests/tunnel.at
> @@ -1269,6 +1269,18 @@ OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])]
>  AT_CLEANUP
>
> +AT_SETUP([tunnel - re-create port with different name])
> +OVS_VSWITCHD_START(
> +  [add-port br0 p0 -- set int p0 type=vxlan
options:remote_ip=10.10.10.1])
> +
> +AT_CHECK([ovs-vsctl --if-exists del-port p0 -- \
> +  add-port br0 p1 -- \
> +  set int p1 type=vxlan options:remote_ip=10.10.10.1])
> +
> +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])]
> +AT_CLEANUP
> +
>  AT_SETUP([tunnel - SRV6 basic])
>  OVS_VSWITCHD_START([add-port br0 p1 -- set Interface p1 type=dummy \
>  ofport_request=1 \
> --
> 2.31.1
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] bridge: Retry tunnel port addition for conflict.

2024-03-10 Thread Han Zhou
On Fri, Mar 8, 2024 at 12:17 AM Tao Liu  wrote:
>
>
> On 3/7/24 5:39 AM, Ilya Maximets wrote:
> > On 2/27/24 20:14, Han Zhou wrote:
> >> For kernel datapath, when a new tunnel port is created in the same
> >> transaction in which an old tunnel port with the same tunnel
> >> configuration is deleted, the new tunnel port creation will fail and
> >> left in an error state. This can be easily reproduced in OVN by
> >> triggering a chassis deletion and addition with the same encap in the
> >> same SB DB transaction, such as:
> >>
> >> ovn-sbctl chassis-add aa geneve 1.2.3.4
> >> ovn-sbctl chassis-del aa -- chassis-add bb 1.2.3.4
> >>
> >> ovs-vsctl show | grep error
> >> error: "could not add network device ovn-bb-0 to ofproto (File
exists)"
> >>
> >> Related logs in OVS:
> >> —
> >> 2024-02-23T05:41:49.978Z|405933|bridge|INFO|bridge br-int: deleted
interface ovn-aa-0 on port 113
> >> 2024-02-23T05:41:49.989Z|405935|tunnel|WARN|ovn-bb-0: attempting
to add tunnel port with same config as port 'ovn-aa-0' (::->1.2.3.4,
key=flow, legacy_l2, dp port=9)
> >> 2024-02-23T05:41:49.989Z|405936|ofproto|WARN|br-int: could not add
port ovn-bb-0 (File exists)
> >> 2024-02-23T05:41:49.989Z|405937|bridge|WARN|could not add network
device ovn-bb-0 to ofproto (File exists)
> >> —
> >
> > Hi, Han.  Thanks for the patch!
> >
> >>
> >> Depending on when there are other OVSDB changes, it may take a long
time
> >> for the device to be added successfully, triggered by the next OVS
> >> iteration.
> >>
> >> (note: native tunnel ports do not have this problem)
> >
> > I don't think this is correct.  The code path is common for both system
> > and native tunnels.  I can reproduce the issues in a sandbox with:
> >
> > $ make -j8 sandbox SANDBOXFLAGS="\-\-dummy='system'"
> > [tutorial]$ ovs-vsctl add-port br0 tunnel_port \
> >  -- set Interface tunnel_port \
> > type=geneve options:remote_ip=flow
options:key=123
> > [tutorial]$ ovs-vsctl del-port tunnel_port \
> >  -- add-port br0 tunnel_port2 \
> >  -- set Interface tunnel_port2 \
> > type=geneve options:remote_ip=flow
options:key=123
> > ovs-vsctl: Error detected while setting up 'tunnel_port2':
> > could not add network device tunnel_port2 to ofproto (File exists).
> > See ovs-vswitchd log for details.
> >
> > The same should work in a testsuite as well, i.e. we should be able to
> > create a test for this scenario.
> >
> > Note: The --dummy=system prevents OVS from replacing tunnel ports with
> >dummy ones.
> >

Thanks Ilya for the correction! --dummy=system is very helpful.

> >>
> >> The problem is how the tunnel port deletion and creation are handled.
In
> >> bridge_reconfigure(), port deletion is handled before creation, to
avoid
> >> such resource conflict. However, for kernel tunnel ports, the real
clean
> >> up is performed at the end of the bridge_reconfigure() in the:
> >> bridge_run__()->ofproto_run()->...->ofproto_dpif:port_destruct()
> >>
> >> We cannot call bridge_run__() at an earlier point before all
> >> reconfigurations are done, so this patch tries a generic approach to
> >> just re-run the bridge_reconfigure() when there are any port creations
> >> encountered "File exists" error, which indicates a possible resource
> >> conflict may have happened due to a later deleted port, and retry may
> >> succeed.
> >>
> >> Signed-off-by: Han Zhou 
> >> ---
> >> This is RFC because I am not sure if there is a better way to solve
the problem
> >> more specifically by executing the port_destruct for the old port
before trying
> >> to create the new port. The fix may be more complex though.
> >
> > I don't think re-trying is a good approach in general.  We should likely
> > just destroy the tnl_port structure right away, similarly to how we
clean
> > up stp, lldp and bundles in ofproto_port_unregister().  Maybe we can add
> > a new callback similar to bundle_remove() and call tnl_port_del() from
it?
> > (I didn't try, so I'm not 100% sure this will not cause any issues.)
> >
> > What do you think?
> >
> > Best regards, Ilya Maximets.
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.o
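
A tiny stand-alone illustration of the alternative sketched above (entirely
hypothetical types, not the real ofproto code): release the tunnel
bookkeeping from a per-port unregister hook, so that a new port with the same
tunnel config added in the same reconfiguration pass no longer collides with
the stale entry.

#include <stdio.h>

struct port {
    const char *name;
    int has_tunnel;              /* stand-in for the ofproto-dpif tnl_port */
};

static void
tunnel_del(struct port *p)       /* stand-in for tnl_port_del() */
{
    if (p->has_tunnel) {
        p->has_tunnel = 0;
        printf("%s: tunnel config released\n", p->name);
    }
}

/* Hook invoked when the port is unregistered, analogous to how bundles,
 * stp and lldp are cleaned up eagerly, rather than waiting for the
 * deferred destruct in the next ofproto_run(). */
static void
port_unregister(struct port *p)
{
    tunnel_del(p);
}

int
main(void)
{
    struct port old_port = { "ovn-aa-0", 1 };

    port_unregister(&old_port);  /* frees the config before ovn-bb-0 is added */
    return 0;
}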

[ovs-dev] [PATCH ovn] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once.

2024-03-04 Thread Han Zhou
The ovn-ofctrl-wait-before-clear setting is designed to minimize
downtime during the initial start-up of the ovn-controller. For this
purpose, the ovn-controller should wait only once upon entering the
S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
to the OVS, such as those occurring during an OVS restart/upgrade,
should not trigger this wait. However, the current implementation always
waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
can inadvertently delay flow installations during OVS restart/upgrade,
potentially causing more harm than good. (The extent of the impact
varies based on the method used to restart OVS, including whether flow
save/restore tools and the flow-restore-wait feature are employed.)

This patch avoids the unnecessary wait after the initial one.

Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce 
down time during upgrade.")
Signed-off-by: Han Zhou 
---
 controller/ofctrl.c | 1 -
 tests/ovn-controller.at | 9 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index f14cd79a8dbb..0d72ecbaa167 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
 if (!wait_before_clear_time ||
 (wait_before_clear_expire &&
  time_msec() >= wait_before_clear_expire)) {
-wait_before_clear_expire = 0;
 state = S_CLEAR_FLOWS;
 return;
 }
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 37f1ded1bd26..b65e11722cbb 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -2284,10 +2284,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller 
coverage/read-counter lflow_run)
 AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
 ])
 
-# Restart OVS this time, and wait until flows are reinstalled
+# Restart OVS this time. Flows should be reinstalled without waiting.
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif 
-vunixctl
-OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 
2.2.2.2])
+
+# Sleep for 3s, which is long enough for the flows to be installed, but
+# shorter than the wait-before-clear (5s), to make sure the flows are installed
+# without waiting.
+sleep 3
+AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], 
[0], [ignore])
 
 check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
 -- ls-lb-add ls1 lb3
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] encaps: Support backward compatibility for tunnel chassis id change.

2024-03-01 Thread Han Zhou
On Fri, Mar 1, 2024 at 2:26 AM Dumitru Ceara  wrote:
>
> On 2/16/24 07:52, Han Zhou wrote:
> > In commit 41eefcb2807d, the format of external_ids:ovn-chassis-id for
> > tunnels was modified to include the local encapsulation IP. This change
> > can lead to the recreation of tunnels during an upgrade, potentially
> > disrupting the dataplane temporarily, especially in large-scale
> > environments.
> >
> > This patch resolves the issue by recognizing the previous format. Thus,
> > if the only modification is to the ID format, the tunnel will not be
> > recreated. Instead, the external_ids will be updated directly. This
> > approach ensures that the upgrade process is non-disruptive.
> >
> > Fixes: 41eefcb2807d ("encaps: Create separate tunnels for multiple
local encap IPs.")
> > Signed-off-by: Han Zhou 
> > ---
>
> Hi Han,
>
> >  controller/encaps.c | 83 -
> >  tests/ovn-controller.at | 44 ++
> >  2 files changed, 110 insertions(+), 17 deletions(-)
> >
> > diff --git a/controller/encaps.c b/controller/encaps.c
> > index 28237f6191c8..1d0d47523e77 100644
> > --- a/controller/encaps.c
> > +++ b/controller/encaps.c
> > @@ -104,40 +104,62 @@ encaps_tunnel_id_create(const char *chassis_id,
const char *remote_encap_ip,
> >   '%', local_encap_ip);
> >  }
> >
> > +/*
> > + * The older version of encaps_tunnel_id_create, which doesn't include
> > + * local_encap_ip in the ID. This is used for backward compatibility
support.
> > + */
> > +static char *
> > +encaps_tunnel_id_create_old(const char *chassis_id,
>
> Nit: I'd call this encaps_tunnel_id_create_legacy().

Ack

>
> > +const char *remote_encap_ip)
> > +{
> > +return xasprintf("%s%c%s", chassis_id, '@', remote_encap_ip);
> > +}
> > +
> >  /*
> >   * Parses a 'tunnel_id' of the form @%.
> >   * If the 'chassis_id' argument is not NULL the function will allocate
memory
> >   * and store the chassis_name part of the tunnel-id at '*chassis_id'.
> >   * Same for remote_encap_ip and local_encap_ip.
> > + *
> > + * The old form @ is also supported for
backward
> > + * compatibility during upgrade.
> >   */
> >  bool
> >  encaps_tunnel_id_parse(const char *tunnel_id, char **chassis_id,
> > char **remote_encap_ip, char **local_encap_ip)
> >  {
> > -/* Find the @.  Fail if there is no @ or if any part is empty. */
> > -const char *d = strchr(tunnel_id, '@');
> > -if (d == tunnel_id || !d || !d[1]) {
> > -return false;
> > +char *tokstr = xstrdup(tunnel_id);
> > +char *saveptr = NULL;
> > +bool ret = false;
> > +
> > +char *token_chassis = strtok_r(tokstr, "@", &saveptr);
> > +if (!token_chassis) {
> > +goto out;
> >  }
> >
> > -/* Find the %.  Fail if there is no % or if any part is empty. */
> > -const char *d2 = strchr(d + 1, '%');
> > -if (d2 == d + 1 || !d2 || !d2[1]) {
> > -return false;
> > +char *token_remote_ip = strtok_r(NULL, "%", &saveptr);
> > +if (!token_remote_ip) {
> > +goto out;
> >  }
> >
> > +char *token_local_ip = strtok_r(NULL, "", &saveptr);
> > +
> >  if (chassis_id) {
> > -*chassis_id = xmemdup0(tunnel_id, d - tunnel_id);
> > +*chassis_id = xstrdup(token_chassis);
> >  }
> > -
> >  if (remote_encap_ip) {
> > -*remote_encap_ip = xmemdup0(d + 1, d2 - (d + 1));
> > +*remote_encap_ip = xstrdup(token_remote_ip);
> >  }
> > -
> >  if (local_encap_ip) {
> > -*local_encap_ip = xstrdup(d2 + 1);
> > +/* To support backward compatibility during upgrade, ignore
local ip if
> > + * it is not encoded in the tunnel id yet. */
> > +*local_encap_ip = token_local_ip ? xstrdup(token_local_ip) :
NULL;
>
> This can be simplified to:
>
> *local_encap_ip = nullable_xstrdup(token_local_ip);

Ack

>
> >  }
> > -return true;
> > +
> > +ret = true;
> > +out:
> > +free(tokstr);
> > +return ret;
> >  }
>
> I think I'd use strsep instead, seems a tiny bit cleaner.  I see
> encaps_tunnel_id_match() also uses strtok_r() so I'll leave it up
> to you to decide whether we should follow the same style or not.
>
> I'll leave the strsep() alternative here just in case:
>
> bool
> enca
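
For reference, a stand-alone sketch of what a strsep()-based parser could
look like (an illustration only; the snippet attached above was cut off by
the archive). It splits chassis@remote_ip[%local_ip] and treats a missing
local IP as the legacy format.

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static bool
tunnel_id_parse_sketch(const char *tunnel_id, char **chassis_id,
                       char **remote_ip, char **local_ip)
{
    char *copy = strdup(tunnel_id);
    char *p = copy;

    char *chassis = strsep(&p, "@");
    char *remote = strsep(&p, "%");
    char *local = p;                  /* NULL for the legacy format. */
    bool ok = chassis && *chassis && remote && *remote;

    if (ok) {
        if (chassis_id) {
            *chassis_id = strdup(chassis);
        }
        if (remote_ip) {
            *remote_ip = strdup(remote);
        }
        if (local_ip) {
            *local_ip = local ? strdup(local) : NULL;
        }
    }
    free(copy);
    return ok;
}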

[ovs-dev] [RFC] bridge: Retry tunnel port addition for conflict.

2024-02-27 Thread Han Zhou
For kernel datapath, when a new tunnel port is created in the same
transaction in which an old tunnel port with the same tunnel
configuration is deleted, the new tunnel port creation will fail and be
left in an error state. This can be easily reproduced in OVN by
triggering a chassis deletion and addition with the same encap in the
same SB DB transaction, such as:

ovn-sbctl chassis-add aa geneve 1.2.3.4
ovn-sbctl chassis-del aa -- chassis-add bb 1.2.3.4

ovs-vsctl show | grep error
error: "could not add network device ovn-bb-0 to ofproto (File exists)"

Related logs in OVS:
—
2024-02-23T05:41:49.978Z|405933|bridge|INFO|bridge br-int: deleted interface 
ovn-aa-0 on port 113
2024-02-23T05:41:49.989Z|405935|tunnel|WARN|ovn-bb-0: attempting to add 
tunnel port with same config as port 'ovn-aa-0' (::->1.2.3.4, key=flow, 
legacy_l2, dp port=9)
2024-02-23T05:41:49.989Z|405936|ofproto|WARN|br-int: could not add port 
ovn-bb-0 (File exists)
2024-02-23T05:41:49.989Z|405937|bridge|WARN|could not add network device 
ovn-bb-0 to ofproto (File exists)
—

Depending on when there are other OVSDB changes, it may take a long time
for the device to be added successfully, triggered by the next OVS
iteration.

(note: native tunnel ports do not have this problem)

The problem is how the tunnel port deletion and creation are handled. In
bridge_reconfigure(), port deletion is handled before creation, to avoid
such resource conflict. However, for kernel tunnel ports, the real clean
up is performed at the end of the bridge_reconfigure() in the:
bridge_run__()->ofproto_run()->...->ofproto_dpif:port_destruct()

We cannot call bridge_run__() at an earlier point before all
reconfigurations are done, so this patch tries a generic approach to
just re-run the bridge_reconfigure() when any port creation encounters a
encountered "File exists" error, which indicates a possible resource
conflict may have happened due to a later deleted port, and retry may
succeed.

Signed-off-by: Han Zhou 
---
This is RFC because I am not sure if there is a better way to solve the problem
more specifically by executing the port_destruct for the old port before trying
to create the new port. The fix may be more complex though.

 tests/tunnel.at   |  1 +
 vswitchd/bridge.c | 47 ---
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/tests/tunnel.at b/tests/tunnel.at
index 71e7c2df4eae..d570c78790c7 100644
--- a/tests/tunnel.at
+++ b/tests/tunnel.at
@@ -1059,6 +1059,7 @@ AT_CHECK([ovs-vsctl add-port br0 p2 -- set Interface p2 
type=vxlan \
 
 AT_CHECK([grep 'p2: could not set configuration (File exists)' 
ovs-vswitchd.log | sed "s/^.*\(p2:.*\)$/\1/"], [0],
   [p2: could not set configuration (File exists)
+p2: could not set configuration (File exists)
 ])
 
 OVS_VSWITCHD_STOP(["/p2: VXLAN-GBP, and non-VXLAN-GBP tunnels can't be 
configured on the same dst_port/d
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 95a65fcdcd5e..9057da98e6c0 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -278,7 +278,7 @@ static void bridge_delete_ofprotos(void);
 static void bridge_delete_or_reconfigure_ports(struct bridge *);
 static void bridge_del_ports(struct bridge *,
  const struct shash *wanted_ports);
-static void bridge_add_ports(struct bridge *,
+static bool bridge_add_ports(struct bridge *,
  const struct shash *wanted_ports);
 
 static void bridge_configure_datapath_id(struct bridge *);
@@ -333,8 +333,8 @@ static void mirror_refresh_stats(struct mirror *);
 
 static void iface_configure_lacp(struct iface *,
  struct lacp_member_settings *);
-static bool iface_create(struct bridge *, const struct ovsrec_interface *,
- const struct ovsrec_port *);
+static int iface_create(struct bridge *, const struct ovsrec_interface *,
+const struct ovsrec_port *);
 static bool iface_is_internal(const struct ovsrec_interface *iface,
   const struct ovsrec_bridge *br);
 static const char *iface_get_type(const struct ovsrec_interface *,
@@ -858,7 +858,9 @@ datapath_reconfigure(const struct ovsrec_open_vswitch *cfg)
 }
 }
 
-static void
+/* Returns true if any ports addition failed and may need retry. Otherwise
+ * return false. */
+static bool
 bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
 {
 struct sockaddr_in *managers;
@@ -943,8 +945,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch 
*ovs_cfg)
 
 config_ofproto_types(&ovs_cfg->other_config);
 
+bool need_retry = false;
 HMAP_FOR_EACH (br, node, &all_bridges) {
-bridge_add_ports(br, &br->wanted_ports);
+if (bridge_add_ports(br, &br->wanted_ports)) {
+need_retry = true;
+}
 shash_destroy(&br->wanted_ports);
 }
 
@@ -1003,6 +1008,7 @@ br

Re: [ovs-dev] [PATCH ovn] controller: ofctrl: Use index for meter lookups.

2024-02-26 Thread Han Zhou
On Mon, Feb 26, 2024 at 9:44 AM Ilya Maximets  wrote:
>
> Currently, ovn-controller attempts to sync all the meters on each
> ofctrl_put() call.  And the complexity of this logic is quadratic
> because for each desired meter we perform a full scan of all the
> rows in the Southbound Meter table in order to lookup a matching
> meter.  This is very inefficient.  In a setup with 25K meters this
> operation takes anywhere from 30 to 60 seconds to perform.  All
> that time ovn-controller is blocked and doesn't process any updates.
> So, addition of new ports in such a setup becomes very slow.
>
> The meter lookup is performed by name and we have an index for it
> in the database schema.  Might as well use it.
>
> Using the index for lookup reduces complexity to O(n * log n).
> And the time to process port addition on the same setup drops down
> to just 100 - 300 ms.
>
> We are still iterating over all the desired meters while they can
> probably be processed incrementally instead.  But using an index
> is a simpler fix for now.
>
> Fixes: 885655e16e63 ("controller: reconfigure ovs meters for ovn meters")
> Fixes: 999e1adfb572 ("ovn: Support configuring meters through SB Meter
table.")
> Reported-at: https://issues.redhat.com/browse/FDP-399
> Signed-off-by: Ilya Maximets 
> ---
>
> This is a "performance bug", so the decision to backport this or not
> is on maintainers.  But it is severe enough, IMO.
>

Thanks Ilya. The fix looks good to me. And I think it is ok to backport,
since the change is simple enough.

Acked-by: Han Zhou 

Just curious, how would the OVS perform with this large number of meters?

Thanks,
Han
>  controller/ofctrl.c | 37 ++---
>  controller/ofctrl.h |  2 +-
>  controller/ovn-controller.c |  4 +++-
>  3 files changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index cb460a2a4..f14cd79a8 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -2257,18 +2257,29 @@ ofctrl_meter_bands_erase(struct
ovn_extend_table_info *entry,
>  }
>  }
>
> +static const struct sbrec_meter *
> +sb_meter_lookup_by_name(struct ovsdb_idl_index *sbrec_meter_by_name,
> +const char *name)
> +{
> +const struct sbrec_meter *sb_meter;
> +struct sbrec_meter *index_row;
> +
> +index_row = sbrec_meter_index_init_row(sbrec_meter_by_name);
> +sbrec_meter_index_set_name(index_row, name);
> +sb_meter = sbrec_meter_index_find(sbrec_meter_by_name, index_row);
> +sbrec_meter_index_destroy_row(index_row);
> +
> +return sb_meter;
> +}
> +
>  static void
>  ofctrl_meter_bands_sync(struct ovn_extend_table_info *m_existing,
> -const struct sbrec_meter_table *meter_table,
> +struct ovsdb_idl_index *sbrec_meter_by_name,
>  struct ovs_list *msgs)
>  {
>  const struct sbrec_meter *sb_meter;
> -SBREC_METER_TABLE_FOR_EACH (sb_meter, meter_table) {
> -if (!strcmp(m_existing->name, sb_meter->name)) {
> -break;
> -}
> -}
>
> +sb_meter = sb_meter_lookup_by_name(sbrec_meter_by_name,
m_existing->name);
>  if (sb_meter) {
>  /* OFPMC13_ADD or OFPMC13_MODIFY */
>  ofctrl_meter_bands_update(sb_meter, m_existing, msgs);
> @@ -2280,16 +2291,12 @@ ofctrl_meter_bands_sync(struct
ovn_extend_table_info *m_existing,
>
>  static void
>  add_meter(struct ovn_extend_table_info *m_desired,
> -  const struct sbrec_meter_table *meter_table,
> +  struct ovsdb_idl_index *sbrec_meter_by_name,
>struct ovs_list *msgs)
>  {
>  const struct sbrec_meter *sb_meter;
> -SBREC_METER_TABLE_FOR_EACH (sb_meter, meter_table) {
> -if (!strcmp(m_desired->name, sb_meter->name)) {
> -break;
> -}
> -}
>
> +sb_meter = sb_meter_lookup_by_name(sbrec_meter_by_name,
m_desired->name);
>  if (!sb_meter) {
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>  VLOG_ERR_RL(&rl, "could not find meter named \"%s\"",
m_desired->name);
> @@ -2656,7 +2663,7 @@ ofctrl_put(struct ovn_desired_flow_table
*lflow_table,
> struct ovn_desired_flow_table *pflow_table,
> struct shash *pending_ct_zones,
> struct hmap *pending_lb_tuples,
> -   const struct sbrec_meter_table *meter_table,
> +   struct ovsdb_idl_index *sbrec_meter_by_name,
> uint64_t req_cfg,
> bool lflows_changed,
> bool pflows_changed)
> @@ -2733,10 +2740,10 @@ ofctrl_put(struct 

Re: [ovs-dev] [PATCH ovn] physical: Don't reset encap ID across pipelines.

2024-02-20 Thread Han Zhou
On Fri, Feb 16, 2024 at 2:50 PM Numan Siddique  wrote:
>
> On Mon, Feb 12, 2024 at 2:49 PM Han Zhou  wrote:
> >
> > The MFF_LOG_ENCAP_ID register was defined to save the encap ID and avoid
> > changing across pipelines, but in the function
> > load_logical_ingress_metadata it was reset unconditionally. Because of
> > this, the encap selection doesn't work when traffic traverses L3
> > boundaries. This patch fixes it by ensuring the register is loaded only
> > in the flows of table 0, which is where packets from VIFs enter the
> > pipeline for the first time.
> >
> > Fixes: 17b6a12fa286 ("ovn-controller: Support VIF-based local encap IPs
selection.")
> > Signed-off-by: Han Zhou 
>
> Acked-by: Numan Siddique 
>
> Numan
>

Thanks Numan. Applied to main and backported to branch-24.03.

Han
> > ---
> >  controller/physical.c | 28 ---
> >  tests/ovn.at  | 83 ++-
> >  2 files changed, 98 insertions(+), 13 deletions(-)
> >
> > diff --git a/controller/physical.c b/controller/physical.c
> > index c32642d2c69b..7ef259da44b1 100644
> > --- a/controller/physical.c
> > +++ b/controller/physical.c
> > @@ -73,7 +73,8 @@ load_logical_ingress_metadata(const struct
sbrec_port_binding *binding,
> >const struct zone_ids *zone_ids,
> >size_t n_encap_ips,
> >const char **encap_ips,
> > -  struct ofpbuf *ofpacts_p);
> > +  struct ofpbuf *ofpacts_p,
> > +  bool load_encap_id);
> >  static int64_t get_vxlan_port_key(int64_t port_key);
> >
> >  /* UUID to identify OF flows not associated with ovsdb rows. */
> > @@ -689,7 +690,7 @@ put_replace_chassis_mac_flows(const struct simap
*ct_zones,
> >  ofpact_put_STRIP_VLAN(ofpacts_p);
> >  }
> >  load_logical_ingress_metadata(localnet_port, &zone_ids, 0,
NULL,
> > -  ofpacts_p);
> > +  ofpacts_p, true);
> >  replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
> >  replace_mac->mac = router_port_mac;
> >
> > @@ -1047,7 +1048,8 @@ load_logical_ingress_metadata(const struct
sbrec_port_binding *binding,
> >const struct zone_ids *zone_ids,
> >size_t n_encap_ips,
> >const char **encap_ips,
> > -  struct ofpbuf *ofpacts_p)
> > +  struct ofpbuf *ofpacts_p,
> > +  bool load_encap_id)
> >  {
> >  put_zones_ofpacts(zone_ids, ofpacts_p);
> >
> > @@ -1057,13 +1059,15 @@ load_logical_ingress_metadata(const struct
sbrec_port_binding *binding,
> >  put_load(dp_key, MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
> >  put_load(port_key, MFF_LOG_INPORT, 0, 32, ofpacts_p);
> >
> > -/* Default encap_id 0. */
> > -size_t encap_id = 0;
> > -if (encap_ips && binding->encap) {
> > -encap_id = encap_ip_to_id(n_encap_ips, encap_ips,
> > -  binding->encap->ip);
> > +if (load_encap_id) {
> > +/* Default encap_id 0. */
> > +size_t encap_id = 0;
> > +if (encap_ips && binding->encap) {
> > +encap_id = encap_ip_to_id(n_encap_ips, encap_ips,
> > +  binding->encap->ip);
> > +}
> > +put_load(encap_id, MFF_LOG_ENCAP_ID, 16, 16, ofpacts_p);
> >  }
> > -put_load(encap_id, MFF_LOG_ENCAP_ID, 16, 16, ofpacts_p);
> >  }
> >
> >  static const struct sbrec_port_binding *
> > @@ -1108,7 +1112,7 @@ setup_rarp_activation_strategy(const struct
sbrec_port_binding *binding,
> >  match_set_dl_type(&match, htons(ETH_TYPE_RARP));
> >  match_set_in_port(&match, ofport);
> >
> > -load_logical_ingress_metadata(binding, zone_ids, 0, NULL,
&ofpacts);
> > +load_logical_ingress_metadata(binding, zone_ids, 0, NULL,
&ofpacts, true);
> >
> >  encode_controller_op(ACTION_OPCODE_ACTIVATION_STRATEGY_RARP,
> >   NX_CTLR_NO_METER, );
> > @@ -1522,7 +1526,7 @@ consider_port_binding(struct ovsdb_idl_index
*sbrec_port_binding_by_name,
> >  put_load(0, MFF_LOG_CT_ZONE, 0, 16, ofpacts_p);
> >  struct zone_ids peer_zones = get_zone_ids(peer, ct_zones);
> >  load_logical_ingress_metadata(pe

[ovs-dev] [PATCH ovn] encaps: Support backward compatibility for tunnel chassis id change.

2024-02-15 Thread Han Zhou
In commit 41eefcb2807d, the format of external_ids:ovn-chassis-id for
tunnels was modified to include the local encapsulation IP. This change
can lead to the recreation of tunnels during an upgrade, potentially
disrupting the dataplane temporarily, especially in large-scale
environments.

This patch resolves the issue by recognizing the previous format. Thus,
if the only modification is to the ID format, the tunnel will not be
recreated. Instead, the external_ids will be updated directly. This
approach ensures that the upgrade process is non-disruptive.

Fixes: 41eefcb2807d ("encaps: Create separate tunnels for multiple local encap 
IPs.")
Signed-off-by: Han Zhou 
---
 controller/encaps.c | 83 -
 tests/ovn-controller.at | 44 ++
 2 files changed, 110 insertions(+), 17 deletions(-)

diff --git a/controller/encaps.c b/controller/encaps.c
index 28237f6191c8..1d0d47523e77 100644
--- a/controller/encaps.c
+++ b/controller/encaps.c
@@ -104,40 +104,62 @@ encaps_tunnel_id_create(const char *chassis_id, const 
char *remote_encap_ip,
  '%', local_encap_ip);
 }
 
+/*
+ * The older version of encaps_tunnel_id_create, which doesn't include
+ * local_encap_ip in the ID. This is used for backward compatibility support.
+ */
+static char *
+encaps_tunnel_id_create_old(const char *chassis_id,
+const char *remote_encap_ip)
+{
+return xasprintf("%s%c%s", chassis_id, '@', remote_encap_ip);
+}
+
 /*
  * Parses a 'tunnel_id' of the form @%.
  * If the 'chassis_id' argument is not NULL the function will allocate memory
  * and store the chassis_name part of the tunnel-id at '*chassis_id'.
  * Same for remote_encap_ip and local_encap_ip.
+ *
+ * The old form @ is also supported for backward
+ * compatibility during upgrade.
  */
 bool
 encaps_tunnel_id_parse(const char *tunnel_id, char **chassis_id,
char **remote_encap_ip, char **local_encap_ip)
 {
-/* Find the @.  Fail if there is no @ or if any part is empty. */
-const char *d = strchr(tunnel_id, '@');
-if (d == tunnel_id || !d || !d[1]) {
-return false;
+char *tokstr = xstrdup(tunnel_id);
+char *saveptr = NULL;
+bool ret = false;
+
+char *token_chassis = strtok_r(tokstr, "@", &saveptr);
+if (!token_chassis) {
+goto out;
 }
 
-/* Find the %.  Fail if there is no % or if any part is empty. */
-const char *d2 = strchr(d + 1, '%');
-if (d2 == d + 1 || !d2 || !d2[1]) {
-return false;
+char *token_remote_ip = strtok_r(NULL, "%", &saveptr);
+if (!token_remote_ip) {
+goto out;
 }
 
+char *token_local_ip = strtok_r(NULL, "", &saveptr);
+
 if (chassis_id) {
-*chassis_id = xmemdup0(tunnel_id, d - tunnel_id);
+*chassis_id = xstrdup(token_chassis);
 }
-
 if (remote_encap_ip) {
-*remote_encap_ip = xmemdup0(d + 1, d2 - (d + 1));
+*remote_encap_ip = xstrdup(token_remote_ip);
 }
-
 if (local_encap_ip) {
-*local_encap_ip = xstrdup(d2 + 1);
+/* To support backward compatibility during upgrade, ignore local ip if
+ * it is not encoded in the tunnel id yet. */
+*local_encap_ip = token_local_ip ? xstrdup(token_local_ip) : NULL;
 }
-return true;
+
+ret = true;
+out:
+free(tokstr);
+return ret;
 }
 
 /*
@@ -145,6 +167,10 @@ encaps_tunnel_id_parse(const char *tunnel_id, char 
**chassis_id,
  *  @%
  * contains 'chassis_id' and, if specified, the given 'remote_encap_ip' and
  * 'local_encap_ip'. Returns false otherwise.
+ *
+ * The old format @ is also supported for backward
+ * compatibility during upgrade, and the local_encap_ip matching is ignored in
+ * that case.
  */
 bool
 encaps_tunnel_id_match(const char *tunnel_id, const char *chassis_id,
@@ -166,8 +192,10 @@ encaps_tunnel_id_match(const char *tunnel_id, const char 
*chassis_id,
 }
 
 char *token_local_ip = strtok_r(NULL, "", &saveptr);
-if (local_encap_ip &&
-(!token_local_ip || strcmp(token_local_ip, local_encap_ip))) {
+if (!token_local_ip) {
+/* It is old format. To support backward compatibility during upgrade,
+ * just ignore local_ip. */
+} else if (local_encap_ip && strcmp(token_local_ip, local_encap_ip)) {
 goto out;
 }
 
@@ -189,6 +217,7 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 const char *dst_port = smap_get(&encap->options, "dst_port");
 const char *csum = smap_get(&encap->options, "csum");
 char *tunnel_entry_id = NULL;
+char *tunnel_entry_id_old = NULL;
 
 /*
  * Since a chassis may have multiple encap-ip, we can't just add the
@@ -198,6 +227,8 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
  */
 tunnel_entry_id = encaps_tunnel_id_create(new_chassis_id, encap->ip,
  

Re: [ovs-dev] [PATCH ovn v1 4/4] northd: lflow-mgr: Allocate DP reference counters on a second use.

2024-02-12 Thread Han Zhou
har *actions, const char *io_port,
>  const char *ctrl_meter,
> @@ -674,9 +673,9 @@ lflow_table_add_lflow(struct lflow_table *lflow_table,
>
>  hash_lock = lflow_hash_lock(_table->entries, hash);
>  struct ovn_lflow *lflow =
> -do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> - dp_bitmap_len, hash, stage,
> - priority, match, actions,
> +do_ovn_lflow_add(lflow_table,
> + od ? ods_size(od->datapaths) : dp_bitmap_len,
> + hash, stage, priority, match, actions,
>   io_port, ctrl_meter, stage_hint, where);
>
>  if (lflow_ref) {
> @@ -702,17 +701,24 @@ lflow_table_add_lflow(struct lflow_table
*lflow_table,
>  ovs_assert(lrn->dpgrp_bitmap_len == dp_bitmap_len);
>  size_t index;
>  BITMAP_FOR_EACH_1 (index, dp_bitmap_len, dp_bitmap) {
> -dp_refcnt_use(>dp_refcnts_map, index);
> +/* Allocate a reference counter only if already
used. */
> +if (bitmap_is_set(lflow->dpg_bitmap, index)) {
> +dp_refcnt_use(>dp_refcnts_map, index);
> +}
>  }
>  } else {
> -dp_refcnt_use(>dp_refcnts_map, lrn->dp_index);
> +/* Allocate a reference counter only if already used. */
> +if (bitmap_is_set(lflow->dpg_bitmap, lrn->dp_index)) {
> +dp_refcnt_use(>dp_refcnts_map, lrn->dp_index);
> +}
>  }
>  }
>  lrn->linked = true;
>  }
>
> -lflow_hash_unlock(hash_lock);
> +ovn_dp_group_add_with_reference(lflow, od, dp_bitmap, dp_bitmap_len);
>
> +lflow_hash_unlock(hash_lock);
>  }
>
>  void
> @@ -946,9 +952,7 @@ ovn_lflow_destroy(struct lflow_table *lflow_table,
struct ovn_lflow *lflow)
>  }
>
>  static struct ovn_lflow *
> -do_ovn_lflow_add(struct lflow_table *lflow_table,
> - const struct ovn_datapath *od,
> - const unsigned long *dp_bitmap, size_t dp_bitmap_len,
> +do_ovn_lflow_add(struct lflow_table *lflow_table, size_t dp_bitmap_len,
>   uint32_t hash, enum ovn_stage stage, uint16_t priority,
>   const char *match, const char *actions,
>   const char *io_port, const char *ctrl_meter,
> @@ -959,14 +963,11 @@ do_ovn_lflow_add(struct lflow_table *lflow_table,
>  struct ovn_lflow *old_lflow;
>  struct ovn_lflow *lflow;
>
> -size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
> -ovs_assert(bitmap_len);
> +ovs_assert(dp_bitmap_len);
>
>  old_lflow = ovn_lflow_find(_table->entries, stage,
> priority, match, actions, ctrl_meter,
hash);
>  if (old_lflow) {
> -ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> -bitmap_len);
>  return old_lflow;
>  }
>
> @@ -974,14 +975,12 @@ do_ovn_lflow_add(struct lflow_table *lflow_table,
>  /* While adding new logical flows we're not setting single datapath,
but
>   * collecting a group.  'od' will be updated later for all flows
with only
>   * one datapath in a group, so it could be hashed correctly. */
> -ovn_lflow_init(lflow, NULL, bitmap_len, stage, priority,
> +ovn_lflow_init(lflow, NULL, dp_bitmap_len, stage, priority,
> xstrdup(match), xstrdup(actions),
> io_port ? xstrdup(io_port) : NULL,
> nullable_xstrdup(ctrl_meter),
> ovn_lflow_hint(stage_hint), where);
>
> -ovn_dp_group_add_with_reference(lflow, od, dp_bitmap, bitmap_len);
> -
>  if (parallelization_state != STATE_USE_PARALLELIZATION) {
>  hmap_insert(_table->entries, >hmap_node, hash);
>  } else {
> @@ -1350,8 +1349,10 @@ dp_refcnt_use(struct hmap *dp_refcnts_map, size_t
dp_index)
>  struct dp_refcnt *dp_refcnt = dp_refcnt_find(dp_refcnts_map,
dp_index);
>
>  if (!dp_refcnt) {
> -dp_refcnt = xzalloc(sizeof *dp_refcnt);
> +dp_refcnt = xmalloc(sizeof *dp_refcnt);
>  dp_refcnt->dp_index = dp_index;
> +/* Allocation is happening on the second (!) use. */
> +dp_refcnt->refcnt = 1;
>
>  hmap_insert(dp_refcnts_map, _refcnt->key_node, dp_index);
>  }
> --
> 2.43.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Ilya and Numan.
Acked-by: Han Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v1 3/4] northd: Fix lflow ref node's reference counting.

2024-02-12 Thread Han Zhou
On Thu, Feb 8, 2024 at 1:50 PM  wrote:
>
> From: Numan Siddique 
>
> When the lflows in an lflow_ref are unlinked by calling
> lflow_ref_unlink_lflows(lflow_ref), the dp_ref counter
> for each lflow in the lflow_ref is decremented (by calling
> dp_refcnt_release()),  but it is not incremented later
> when the same lflow is linked back to the lflow_ref.
>
> This patch fixes it.
>
> Fixes: a623606052ea ("northd: Refactor lflow management into a separate
module.")
> Signed-off-by: Numan Siddique 
> ---
>  northd/lflow-mgr.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> index 61729e9039..df62cd6ab4 100644
> --- a/northd/lflow-mgr.c
> +++ b/northd/lflow-mgr.c
> @@ -690,19 +690,24 @@ lflow_table_add_lflow(struct lflow_table
*lflow_table,
>  if (lrn->dpgrp_lflow) {
>  lrn->dpgrp_bitmap = bitmap_clone(dp_bitmap,
dp_bitmap_len);
>  lrn->dpgrp_bitmap_len = dp_bitmap_len;
> +} else {
> +lrn->dp_index = od->index;
> +}
> +ovs_list_insert(>referenced_by, >ref_list_node);
> +hmap_insert(_ref->lflow_ref_nodes, >ref_node,
hash);
> +}
>
> +if (!lrn->linked) {
> +if (lrn->dpgrp_lflow) {
> +ovs_assert(lrn->dpgrp_bitmap_len == dp_bitmap_len);
>  size_t index;
>  BITMAP_FOR_EACH_1 (index, dp_bitmap_len, dp_bitmap) {
>  dp_refcnt_use(>dp_refcnts_map, index);
>  }
>  } else {
> -lrn->dp_index = od->index;
>  dp_refcnt_use(>dp_refcnts_map, lrn->dp_index);
>  }
> -ovs_list_insert(>referenced_by, >ref_list_node);
> -hmap_insert(_ref->lflow_ref_nodes, >ref_node,
hash);
>  }
> -
>  lrn->linked = true;
>  }
>
> --
> 2.43.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Numan for the fix.
Acked-by: Han Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v1 2/4] northd: Don't add ARP request responder flows for NAT multiple times.

2024-02-12 Thread Han Zhou
ec_nat *nat = nat_entry->nb;
>
>  /* Check if the ovn port has a network configured on which we
could
>   * expect ARP requests/NS for the SNAT external_ip.
>   */
>  if (nat_entry_is_v6(nat_entry)) {
> -if (!lr_stateful_rec ||
> -!sset_contains(_stateful_rec->lb_ips->ips_v6,
> +if (!sset_contains(_stateful_rec->lb_ips->ips_v6,
> nat->external_ip)) {
>  build_lswitch_rport_arp_req_flow(
>  nat->external_ip, AF_INET6, sw_op, sw_od, 80, lflows,
>  stage_hint, lflow_ref);
>  }
>  } else {
> -if (!lr_stateful_rec ||
> -!sset_contains(_stateful_rec->lb_ips->ips_v4,
> +if (!sset_contains(_stateful_rec->lb_ips->ips_v4,
> nat->external_ip)) {
>  build_lswitch_rport_arp_req_flow(
>  nat->external_ip, AF_INET, sw_op, sw_od, 80, lflows,
> diff --git a/northd/northd.h b/northd/northd.h
> index b5c175929e..3f1cd83413 100644
> --- a/northd/northd.h
> +++ b/northd/northd.h
> @@ -293,6 +293,7 @@ struct ovn_datapath {
>  struct ovn_datapath **ls_peers;
>  size_t n_ls_peers;
>  size_t n_allocated_ls_peers;
> +struct sset router_ips; /* Router port IPs except the IPv6 LLAs. */
>
>  /* Logical switch data. */
>  struct ovn_port **router_ports;
> --
> 2.43.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Numan.
Acked-by: Han Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v1 1/4] northd: Don't add lr_out_delivery default drop flow for each lrp.

2024-02-12 Thread Han Zhou
On Thu, Feb 8, 2024 at 1:49 PM  wrote:
>
> From: Numan Siddique 
>
> The default drop flow in lr_out_delivery stage is generated
> for every router port of a logical router.  This results in the
> lflow_table_add_lflow() being called multiple times for the
> same match and actions and the ovn_lflow to have multiple
> dp_refcnts.  Fix this by generating this lflow only once for
> each router.
>
> Fixes: 27a92cc272aa ("northd: make default drops explicit")
> Signed-off-by: Numan Siddique 
> ---
>  northd/northd.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index a174a4dcd1..a5d5e67117 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -13470,9 +13470,6 @@ build_egress_delivery_flows_for_lrouter_port(
>  ds_put_format(match, "outport == %s", op->json_key);
>  ovn_lflow_add(lflows, op->od, S_ROUTER_OUT_DELIVERY, 100,
>ds_cstr(match), "output;", lflow_ref);
> -
> -ovn_lflow_add_default_drop(lflows, op->od, S_ROUTER_OUT_DELIVERY,
> -   lflow_ref);
>  }
>
>  static void
> @@ -14838,9 +14835,9 @@ lrouter_check_nat_entry(const struct ovn_datapath
*od,
>  }
>
>  /* NAT, Defrag and load balancing. */
> -static void build_lr_nat_defrag_and_lb_default_flows(struct ovn_datapath
*od,
> -struct lflow_table
*lflows,
> -struct lflow_ref
*lflow_ref)
> +static void build_lr_nat_defrag_and_lb_default_flows(
> +struct ovn_datapath *od, struct lflow_table *lflows,
> +struct lflow_ref *lflow_ref)
>  {
>  ovs_assert(od->nbr);
>
> @@ -14866,6 +14863,12 @@ static void
build_lr_nat_defrag_and_lb_default_flows(struct ovn_datapath *od,
>   * packet would go through conntrack - which is not required. */
>  ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 120, "nd_ns", "next;",
>lflow_ref);
> +
> +/* Default drop rule in lr_out_delivery stage.  See
> + * build_egress_delivery_flows_for_lrouter_port() which adds a rule
> + * for each router port. */
> +ovn_lflow_add_default_drop(lflows, od, S_ROUTER_OUT_DELIVERY,
> +   lflow_ref);
>  }
>
>  static void
> --
> 2.43.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Numan. The patch LGTM except that the function
build_lr_nat_defrag_and_lb_default_flows() doesn't seem to be the right
place to add the flow. I'd either add a new function, or just call the
ovn_lflow_add_default_drop directly in
build_lswitch_and_lrouter_iterate_by_lr for this flow.
With this addressed:

Acked-by: Han Zhou 

Regards,
Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] physical: Don't reset encap ID across pipelines.

2024-02-12 Thread Han Zhou
The MFF_LOG_ENCAP_ID register was defined to save the encap ID and avoid
changing across pipelines, but in the function
load_logical_ingress_metadata it was reset unconditionally. Because of
this, the encap selection doesn't work when traffic traverses L3
boundaries. This patch fixes it by ensuring the register is loaded only
in the flows of table 0, which is where packets from VIFs enter the
pipeline for the first time.

Fixes: 17b6a12fa286 ("ovn-controller: Support VIF-based local encap IPs 
selection.")
Signed-off-by: Han Zhou 
---
 controller/physical.c | 28 ---
 tests/ovn.at  | 83 ++-
 2 files changed, 98 insertions(+), 13 deletions(-)

diff --git a/controller/physical.c b/controller/physical.c
index c32642d2c69b..7ef259da44b1 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -73,7 +73,8 @@ load_logical_ingress_metadata(const struct sbrec_port_binding 
*binding,
   const struct zone_ids *zone_ids,
   size_t n_encap_ips,
   const char **encap_ips,
-  struct ofpbuf *ofpacts_p);
+  struct ofpbuf *ofpacts_p,
+  bool load_encap_id);
 static int64_t get_vxlan_port_key(int64_t port_key);
 
 /* UUID to identify OF flows not associated with ovsdb rows. */
@@ -689,7 +690,7 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,
 ofpact_put_STRIP_VLAN(ofpacts_p);
 }
 load_logical_ingress_metadata(localnet_port, _ids, 0, NULL,
-  ofpacts_p);
+  ofpacts_p, true);
 replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
 replace_mac->mac = router_port_mac;
 
@@ -1047,7 +1048,8 @@ load_logical_ingress_metadata(const struct 
sbrec_port_binding *binding,
   const struct zone_ids *zone_ids,
   size_t n_encap_ips,
   const char **encap_ips,
-  struct ofpbuf *ofpacts_p)
+  struct ofpbuf *ofpacts_p,
+  bool load_encap_id)
 {
 put_zones_ofpacts(zone_ids, ofpacts_p);
 
@@ -1057,13 +1059,15 @@ load_logical_ingress_metadata(const struct 
sbrec_port_binding *binding,
 put_load(dp_key, MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
 put_load(port_key, MFF_LOG_INPORT, 0, 32, ofpacts_p);
 
-/* Default encap_id 0. */
-size_t encap_id = 0;
-if (encap_ips && binding->encap) {
-encap_id = encap_ip_to_id(n_encap_ips, encap_ips,
-  binding->encap->ip);
+if (load_encap_id) {
+/* Default encap_id 0. */
+size_t encap_id = 0;
+if (encap_ips && binding->encap) {
+encap_id = encap_ip_to_id(n_encap_ips, encap_ips,
+  binding->encap->ip);
+}
+put_load(encap_id, MFF_LOG_ENCAP_ID, 16, 16, ofpacts_p);
 }
-put_load(encap_id, MFF_LOG_ENCAP_ID, 16, 16, ofpacts_p);
 }
 
 static const struct sbrec_port_binding *
@@ -1108,7 +1112,7 @@ setup_rarp_activation_strategy(const struct 
sbrec_port_binding *binding,
 match_set_dl_type(, htons(ETH_TYPE_RARP));
 match_set_in_port(, ofport);
 
-load_logical_ingress_metadata(binding, zone_ids, 0, NULL, );
+load_logical_ingress_metadata(binding, zone_ids, 0, NULL, , true);
 
 encode_controller_op(ACTION_OPCODE_ACTIVATION_STRATEGY_RARP,
  NX_CTLR_NO_METER, );
@@ -1522,7 +1526,7 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 put_load(0, MFF_LOG_CT_ZONE, 0, 16, ofpacts_p);
 struct zone_ids peer_zones = get_zone_ids(peer, ct_zones);
 load_logical_ingress_metadata(peer, _zones, n_encap_ips,
-  encap_ips, ofpacts_p);
+  encap_ips, ofpacts_p, false);
 put_load(0, MFF_LOG_FLAGS, 0, 32, ofpacts_p);
 put_load(0, MFF_LOG_OUTPORT, 0, 32, ofpacts_p);
 for (int i = 0; i < MFF_N_LOG_REGS; i++) {
@@ -1739,7 +1743,7 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 uint32_t ofpacts_orig_size = ofpacts_p->size;
 
 load_logical_ingress_metadata(binding, _ids, n_encap_ips,
-  encap_ips, ofpacts_p);
+  encap_ips, ofpacts_p, true);
 
 if (!strcmp(binding->type, "localport")) {
 /* mark the packet as incoming from a localport */
diff --git a/tests/ovn.at b/tests/ovn.at
index 902dd3793b92..30748c96e1c6 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -30426,7 +30426,7 @@ AT_CLEANUP
 
 
 OVN_FOR_EACH_NORTHD([
-AT_SETUP([multiple encap ips selection based on VIF's encap_ip])
+A

Re: [ovs-dev] [PATCH ovn v6 00/13] northd lflow incremental processing

2024-02-06 Thread Han Zhou
On Mon, Feb 5, 2024 at 7:47 PM Numan Siddique  wrote:
>
> On Mon, Feb 5, 2024 at 9:41 PM Han Zhou  wrote:
> >
> > On Mon, Feb 5, 2024 at 4:12 PM Numan Siddique  wrote:
> > >
> > > On Mon, Feb 5, 2024 at 5:54 PM Han Zhou  wrote:
> > > >
> > > > On Mon, Feb 5, 2024 at 10:15 AM Ilya Maximets 
> > wrote:
> > > > >
> > > > > On 2/5/24 15:45, Ilya Maximets wrote:
> > > > > > On 2/5/24 11:34, Ilya Maximets wrote:
> > > > > >> On 2/5/24 09:23, Dumitru Ceara wrote:
> > > > > >>> On 2/5/24 08:13, Han Zhou wrote:
> > > > > >>>> On Sun, Feb 4, 2024 at 9:26 PM Numan Siddique 
> > wrote:
> > > > > >>>>>
> > > > > >>>>> On Sun, Feb 4, 2024 at 9:53 PM Han Zhou 
wrote:
> > > > > >>>>>>
> > > > > >>>>>> On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets <
> > i.maxim...@ovn.org>
> > > > wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>>  35 files changed, 9681 insertions(+), 4645
deletions(-)
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> I had another look at this series and acked the
remaining
> > > > > >>>> patches.  I
> > > > > >>>>>>>>>> just had some minor comments that can be easily fixed
when
> > > > > >>>> applying
> > > > > >>>>>> the
> > > > > >>>>>>>>>> patches to the main branch.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Thanks for all the work on this!  It was a very large
> > change
> > > > but
> > > > > >>>> it
> > > > > >>>>>>>>>> improves northd performance significantly.  I just
hope we
> > > > don't
> > > > > >>>>>>>>>> introduce too many bugs.  Hopefully the time we have
until
> > > > release
> > > > > >>>>>> will
> > > > > >>>>>>>>>> allow us to further test this change on the 24.03
branch.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Regards,
> > > > > >>>>>>>>>> Dumitru
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thanks a lot Dumitru and Han for the reviews and
patience.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> I addressed the comments and applied the patches to
main and
> > > > also
> > > > > >>>> to
> > > > > >>>>>>>> branch-24.03.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> @Han - I know you wanted to take another look in to v6.
 I
> > > > didn't
> > > > > >>>> want
> > > > > >>>>>> to
> > > > > >>>>>>>> delay further as branch-24.03 was created.  I'm more than
> > happy
> > > > to
> > > > > >>>>>> submit
> > > > > >>>>>>>> follow up patches if you have any comments to address.
> > Please
> > > > let
> > > > > >>>> me
> > > > > >>>>>> know.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Hi Numan,
> > > > > >>>>>>>>
> > > > > >>>>>>>> I was writing the reply and saw your email just now.
Thanks
> > a lot
> > > > > >>>> for
> > > > > >>>>>>>> taking a huge effort to achieve the great optimization. I
> > only
> > > > left
> > > > > >>>> one
> > > > > >>>>>

Re: [ovs-dev] [PATCH ovn v6 00/13] northd lflow incremental processing

2024-02-05 Thread Han Zhou
On Mon, Feb 5, 2024 at 4:12 PM Numan Siddique  wrote:
>
> On Mon, Feb 5, 2024 at 5:54 PM Han Zhou  wrote:
> >
> > On Mon, Feb 5, 2024 at 10:15 AM Ilya Maximets 
wrote:
> > >
> > > On 2/5/24 15:45, Ilya Maximets wrote:
> > > > On 2/5/24 11:34, Ilya Maximets wrote:
> > > >> On 2/5/24 09:23, Dumitru Ceara wrote:
> > > >>> On 2/5/24 08:13, Han Zhou wrote:
> > > >>>> On Sun, Feb 4, 2024 at 9:26 PM Numan Siddique 
wrote:
> > > >>>>>
> > > >>>>> On Sun, Feb 4, 2024 at 9:53 PM Han Zhou  wrote:
> > > >>>>>>
> > > >>>>>> On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets <
i.maxim...@ovn.org>
> > wrote:
> > > >>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>>  35 files changed, 9681 insertions(+), 4645 deletions(-)
> > > >>>>>>>>>>
> > > >>>>>>>>>> I had another look at this series and acked the remaining
> > > >>>> patches.  I
> > > >>>>>>>>>> just had some minor comments that can be easily fixed when
> > > >>>> applying
> > > >>>>>> the
> > > >>>>>>>>>> patches to the main branch.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks for all the work on this!  It was a very large
change
> > but
> > > >>>> it
> > > >>>>>>>>>> improves northd performance significantly.  I just hope we
> > don't
> > > >>>>>>>>>> introduce too many bugs.  Hopefully the time we have until
> > release
> > > >>>>>> will
> > > >>>>>>>>>> allow us to further test this change on the 24.03 branch.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Regards,
> > > >>>>>>>>>> Dumitru
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks a lot Dumitru and Han for the reviews and patience.
> > > >>>>>>>>>
> > > >>>>>>>>> I addressed the comments and applied the patches to main and
> > also
> > > >>>> to
> > > >>>>>>>> branch-24.03.
> > > >>>>>>>>>
> > > >>>>>>>>> @Han - I know you wanted to take another look in to v6.  I
> > didn't
> > > >>>> want
> > > >>>>>> to
> > > >>>>>>>> delay further as branch-24.03 was created.  I'm more than
happy
> > to
> > > >>>>>> submit
> > > >>>>>>>> follow up patches if you have any comments to address.
Please
> > let
> > > >>>> me
> > > >>>>>> know.
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Hi Numan,
> > > >>>>>>>>
> > > >>>>>>>> I was writing the reply and saw your email just now. Thanks
a lot
> > > >>>> for
> > > >>>>>>>> taking a huge effort to achieve the great optimization. I
only
> > left
> > > >>>> one
> > > >>>>>>>> comment on the implicit dependency left for the en_lrnat ->
> > > >>>> en_lflow.
> > > >>>>>> Feel
> > > >>>>>>>> free to address it with a followup and no need to block the
> > > >>>> branching.
> > > >>>>>> And
> > > >>>>>>>> take my Ack for the series with that addressed.
> > > >>>>>>>>
> > > >>>>>>>> Acked-by: Han Zhou 
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Hi, Numan, Dumitru and Han.
> > > >>>>>>>
> > > >>>>>>> I see a huge negative performance impact, most likely from
this
> > set,
> > 

Re: [ovs-dev] [PATCH ovn v6 00/13] northd lflow incremental processing

2024-02-05 Thread Han Zhou
On Mon, Feb 5, 2024 at 10:15 AM Ilya Maximets  wrote:
>
> On 2/5/24 15:45, Ilya Maximets wrote:
> > On 2/5/24 11:34, Ilya Maximets wrote:
> >> On 2/5/24 09:23, Dumitru Ceara wrote:
> >>> On 2/5/24 08:13, Han Zhou wrote:
> >>>> On Sun, Feb 4, 2024 at 9:26 PM Numan Siddique  wrote:
> >>>>>
> >>>>> On Sun, Feb 4, 2024 at 9:53 PM Han Zhou  wrote:
> >>>>>>
> >>>>>> On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets 
wrote:
> >>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>  35 files changed, 9681 insertions(+), 4645 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> I had another look at this series and acked the remaining
> >>>> patches.  I
> >>>>>>>>>> just had some minor comments that can be easily fixed when
> >>>> applying
> >>>>>> the
> >>>>>>>>>> patches to the main branch.
> >>>>>>>>>>
> >>>>>>>>>> Thanks for all the work on this!  It was a very large change
but
> >>>> it
> >>>>>>>>>> improves northd performance significantly.  I just hope we
don't
> >>>>>>>>>> introduce too many bugs.  Hopefully the time we have until
release
> >>>>>> will
> >>>>>>>>>> allow us to further test this change on the 24.03 branch.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Dumitru
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks a lot Dumitru and Han for the reviews and patience.
> >>>>>>>>>
> >>>>>>>>> I addressed the comments and applied the patches to main and
also
> >>>> to
> >>>>>>>> branch-24.03.
> >>>>>>>>>
> >>>>>>>>> @Han - I know you wanted to take another look in to v6.  I
didn't
> >>>> want
> >>>>>> to
> >>>>>>>> delay further as branch-24.03 was created.  I'm more than happy
to
> >>>>>> submit
> >>>>>>>> follow up patches if you have any comments to address.  Please
let
> >>>> me
> >>>>>> know.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Numan,
> >>>>>>>>
> >>>>>>>> I was writing the reply and saw your email just now. Thanks a lot
> >>>> for
> >>>>>>>> taking a huge effort to achieve the great optimization. I only
left
> >>>> one
> >>>>>>>> comment on the implicit dependency left for the en_lrnat ->
> >>>> en_lflow.
> >>>>>> Feel
> >>>>>>>> free to address it with a followup and no need to block the
> >>>> branching.
> >>>>>> And
> >>>>>>>> take my Ack for the series with that addressed.
> >>>>>>>>
> >>>>>>>> Acked-by: Han Zhou 
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi, Numan, Dumitru and Han.
> >>>>>>>
> >>>>>>> I see a huge negative performance impact, most likely from this
set,
> >>>> on
> >>>>>>> ovn-heater's cluster-density tests.  The memory consumption on
northd
> >>>
> >>> Thanks for reporting this, Ilya!
> >>>
> >>>>>>> jumped about 4x and it constantly recomputes due to failures of
> >>>> port_group
> >>>>>>> handler:
> >>>>>>>
> >>>>>>> 2024-02-03T11:09:12.441Z|01680|inc_proc_eng|INFO|node: lflow,
> >>>> recompute
> >>>>>> (failed handler for input port_group) took 9762ms
> >>>>>>> 2024-02-03T11:09:12.444Z|01681|timeval|WARN|Unreasonably long
9898ms
> >>>> poll
> >>>>>> interval (5969ms user, 1786ms system)
> >>>>>>> ...
> >>>>>>> 2024-02-03T11:09:23.770Z|01690|in

Re: [ovs-dev] [PATCH ovn v6 00/13] northd lflow incremental processing

2024-02-04 Thread Han Zhou
On Sun, Feb 4, 2024 at 9:26 PM Numan Siddique  wrote:
>
> On Sun, Feb 4, 2024 at 9:53 PM Han Zhou  wrote:
> >
> > On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets  wrote:
> > >
> > > >>>
> > > >>> >  35 files changed, 9681 insertions(+), 4645 deletions(-)
> > > >>>
> > > >>> I had another look at this series and acked the remaining
patches.  I
> > > >>> just had some minor comments that can be easily fixed when
applying
> > the
> > > >>> patches to the main branch.
> > > >>>
> > > >>> Thanks for all the work on this!  It was a very large change but
it
> > > >>> improves northd performance significantly.  I just hope we don't
> > > >>> introduce too many bugs.  Hopefully the time we have until release
> > will
> > > >>> allow us to further test this change on the 24.03 branch.
> > > >>>
> > > >>> Regards,
> > > >>> Dumitru
> > > >>
> > > >>
> > > >>
> > > >> Thanks a lot Dumitru and Han for the reviews and patience.
> > > >>
> > > >> I addressed the comments and applied the patches to main and also
to
> > > > branch-24.03.
> > > >>
> > > >> @Han - I know you wanted to take another look in to v6.  I didn't
want
> > to
> > > > delay further as branch-24.03 was created.  I'm more than happy to
> > submit
> > > > follow up patches if you have any comments to address.  Please let
me
> > know.
> > > >>
> > > >
> > > > Hi Numan,
> > > >
> > > > I was writing the reply and saw your email just now. Thanks a lot
for
> > > > taking a huge effort to achieve the great optimization. I only left
one
> > > > comment on the implicit dependency left for the en_lrnat ->
en_lflow.
> > Feel
> > > > free to address it with a followup and no need to block the
branching.
> > And
> > > > take my Ack for the series with that addressed.
> > > >
> > > > Acked-by: Han Zhou 
> > >
> > >
> > > Hi, Numan, Dumitru and Han.
> > >
> > > I see a huge negative performance impact, most likely from this set,
on
> > > ovn-heater's cluster-density tests.  The memory consumption on northd
> > > jumped about 4x and it constantly recomputes due to failures of
port_group
> > > handler:
> > >
> > > 2024-02-03T11:09:12.441Z|01680|inc_proc_eng|INFO|node: lflow,
recompute
> > (failed handler for input port_group) took 9762ms
> > > 2024-02-03T11:09:12.444Z|01681|timeval|WARN|Unreasonably long 9898ms
poll
> > interval (5969ms user, 1786ms system)
> > > ...
> > > 2024-02-03T11:09:23.770Z|01690|inc_proc_eng|INFO|node: lflow,
recompute
> > (failed handler for input port_group) took 9014ms
> > > 2024-02-03T11:09:23.773Z|01691|timeval|WARN|Unreasonably long 9118ms
poll
> > interval (5376ms user, 1515ms system)
> > > ...
> > > 2024-02-03T11:09:36.692Z|01699|inc_proc_eng|INFO|node: lflow,
recompute
> > (failed handler for input port_group) took 10695ms
> > > 2024-02-03T11:09:36.696Z|01700|timeval|WARN|Unreasonably long 10890ms
> > poll interval (6085ms user, 2745ms system)
> > > ...
> > > 2024-02-03T11:09:49.133Z|01708|inc_proc_eng|INFO|node: lflow,
recompute
> > (failed handler for input port_group) took 9985ms
> > > 2024-02-03T11:09:49.137Z|01709|timeval|WARN|Unreasonably long 10108ms
> > poll interval (5521ms user, 2440ms system)
> > >
> > > That increases 95% ovn-installed latency in 500node cluster-density
from
> > > 3.6 seconds last week to 21.5 seconds this week.
> > >
> > > I think, this should be a release blocker.
> > >
> > > Memory usage is also very concerning.  Unfortunately it is not tied
to the
> > > cluster-density test.  The same 4-5x RSS jump is also seen in other
test
> > > like density-heavy.  Last week RSS of ovn-northd in cluster-density
500
> > node
> > > was between 1.5 and 2.5 GB, this week we have a range between 5.5 and
8.5
> > GB.
> > >
> > > I would consider this as a release blocker as well.
> > >
> > >
> > > I don't have direct evidence that this particular series is a
culprit, but
> > > it looks like the most likely candidate.  I can dig more into
> > investigation
> > > on Monday.
> > >
> > > Best regards, Ilya Max

Re: [ovs-dev] [PATCH ovn v6 00/13] northd lflow incremental processing

2024-02-04 Thread Han Zhou
On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets  wrote:
>
> >>>
> >>> >  35 files changed, 9681 insertions(+), 4645 deletions(-)
> >>>
> >>> I had another look at this series and acked the remaining patches.  I
> >>> just had some minor comments that can be easily fixed when applying
the
> >>> patches to the main branch.
> >>>
> >>> Thanks for all the work on this!  It was a very large change but it
> >>> improves northd performance significantly.  I just hope we don't
> >>> introduce too many bugs.  Hopefully the time we have until release
will
> >>> allow us to further test this change on the 24.03 branch.
> >>>
> >>> Regards,
> >>> Dumitru
> >>
> >>
> >>
> >> Thanks a lot Dumitru and Han for the reviews and patience.
> >>
> >> I addressed the comments and applied the patches to main and also to
> > branch-24.03.
> >>
> >> @Han - I know you wanted to take another look in to v6.  I didn't want
to
> > delay further as branch-24.03 was created.  I'm more than happy to
submit
> > follow up patches if you have any comments to address.  Please let me
know.
> >>
> >
> > Hi Numan,
> >
> > I was writing the reply and saw your email just now. Thanks a lot for
> > taking a huge effort to achieve the great optimization. I only left one
> > comment on the implicit dependency left for the en_lrnat -> en_lflow.
Feel
> > free to address it with a followup and no need to block the branching.
And
> > take my Ack for the series with that addressed.
> >
> > Acked-by: Han Zhou 
>
>
> Hi, Numan, Dumitru and Han.
>
> I see a huge negative performance impact, most likely from this set, on
> ovn-heater's cluster-density tests.  The memory consumption on northd
> jumped about 4x and it constantly recomputes due to failures of port_group
> handler:
>
> 2024-02-03T11:09:12.441Z|01680|inc_proc_eng|INFO|node: lflow, recompute
(failed handler for input port_group) took 9762ms
> 2024-02-03T11:09:12.444Z|01681|timeval|WARN|Unreasonably long 9898ms poll
interval (5969ms user, 1786ms system)
> ...
> 2024-02-03T11:09:23.770Z|01690|inc_proc_eng|INFO|node: lflow, recompute
(failed handler for input port_group) took 9014ms
> 2024-02-03T11:09:23.773Z|01691|timeval|WARN|Unreasonably long 9118ms poll
interval (5376ms user, 1515ms system)
> ...
> 2024-02-03T11:09:36.692Z|01699|inc_proc_eng|INFO|node: lflow, recompute
(failed handler for input port_group) took 10695ms
> 2024-02-03T11:09:36.696Z|01700|timeval|WARN|Unreasonably long 10890ms
poll interval (6085ms user, 2745ms system)
> ...
> 2024-02-03T11:09:49.133Z|01708|inc_proc_eng|INFO|node: lflow, recompute
(failed handler for input port_group) took 9985ms
> 2024-02-03T11:09:49.137Z|01709|timeval|WARN|Unreasonably long 10108ms
poll interval (5521ms user, 2440ms system)
>
> That increases 95% ovn-installed latency in 500node cluster-density from
> 3.6 seconds last week to 21.5 seconds this week.
>
> I think, this should be a release blocker.
>
> Memory usage is also very concerning.  Unfortunately it is not tied to the
> cluster-density test.  The same 4-5x RSS jump is also seen in other test
> like density-heavy.  Last week RSS of ovn-northd in cluster-density 500
node
> was between 1.5 and 2.5 GB, this week we have a range between 5.5 and 8.5
GB.
>
> I would consider this as a release blocker as well.
>
>
> I don't have direct evidence that this particular series is a culprit, but
> it looks like the most likely candidate.  I can dig more into
investigation
> on Monday.
>
> Best regards, Ilya Maximets.

Thanks Ilya for reporting this. The 95% latency and 4x RSS increases are a
little surprising to me. I did test this series with my scale test scripts
for recompute performance regression, and saw a 10+% increase in latency. I
even dug a little into it and noticed a ~5% increase caused by the hmap used
to maintain the lflows in each lflow_ref. This was discussed in the code
review for an earlier version (v2/v3). Overall it didn't look too bad: we
now handle the most common scenarios incrementally, and it is reasonable to
pay some cost for maintaining the references/index needed for incremental
processing. I wondered if my test scenario was too simple (it didn't include
LBs) to expose the problems, so today I did another test that included an LB
group with 1k LBs applied to 100 node-LS & GR, plus another 1K LBs per
node-LS & GR (101K LBs in total). I did see more performance penalty, but
still within ~20%, while for memory I didn't notice a significant increase
(<10%). I believe I am missing some specific scenario that had the big
impact in the ovn-heater tests. Please share if you dig out more clues.

Thanks,
Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 2/2] northd: Explicitly handle SNAT for ICMP need frag.

2024-02-01 Thread Han Zhou
On Thu, Feb 1, 2024 at 5:30 PM Han Zhou  wrote:
>
>
>
> On Mon, Jan 29, 2024 at 1:36 AM Dumitru Ceara  wrote:
> >
> > On 1/29/24 07:20, Ales Musil wrote:
> > > Considering following topology:
> > > client - sw0 - lrp0 - lr - lrp1 - sw1 - server
> > > sw0 in subnet 192.168.0.0/24
> > > sw1 in subnet 172.168.0.0/24
> > > SNAT configured for client
> > > gateway_mtu=1400 configured for lrp0
> > >
> > > If we send UDP traffic from client to server
> > > and server responds with packet bigger than 1400
> > > the following sequence will happen:
> > >
> > > 1) Packet is coming into lr via lrp1
> > > 2) unSNAT
> > > 3) Routing, the outport will be set to lrp0
> > > 4) Check for packet larger will fail
> > > 5) We will generate ICMP need frag
> > >
> > > However, the last step is wrong from the server
> > > perspective. The ICMP message will have IP source
> > > address = lrp1 IP address. Which means that SNAT won't
> > > happen because the source is not within the sw0 subnet,
> > > but the inner packet has sw0 subnet address, because it
> > > was unSNATted. This results in server ignoring the ICMP
> > > message because server never sent any packet to the
> > > sw0 subnet.
> > >
> > > In order to prevent this issue perform SNAT for the
> > > ICMP packet. Because the packet is related to already
> > > existing connection we just need to perform
> > > ct_commit_nat(snat) action.
> > >
> > > This is achieved with addition of the following flow for
> > > "lr_in_larger_pkts" stage (the flow for IPv6 is the in
> > > regard to the addition):
> > >
> > > match=(inport == "INPORT" && outport == "OUTPORT" && ip4 &&
REGBIT_PKT_LARGER && REGBIT_EGRESS_LOOPBACK == 0 && ct.trk && ct.rpl &&
ct.dnat), action=(icmp4_error {flags.icmp_snat = 1; REGBIT_EGRESS_LOOPBACK
= 1; REGBIT_PKT_LARGER = 0; eth.dst = ETH_DST; ip4.dst = ip4.src; ip4.src =
IP_SRC; ip.ttl = 255; icmp4.type = 3; /* Destination Unreachable. */
icmp4.code = 4; /* Frag Needed and DF was Set. */ icmp4.frag_mtu = 1500;
next(pipeline=ingress, table=0); };)
> > >
> > > Also, add flow to "lr_out_post_snat" stage:
> > >
> > > match=(icmp && flags.icmp_snat == 1), action=(ct_commit_nat(snat);)
> > >
> > > Partially revert 0e49f49c73d6 ("northd: Allow need frag to be SNATed")
> > > which attempted to fix the same issue in a wrong way.
> > >
> > > Also add feature flag for the updated ct_commit_nat action.
> > > In case there is an update of northd to newer version before all
> > > controllers are updated.
> > >
> > > Fixes: 0e49f49c73d6 ("northd: Allow need frag to be SNATed")
> > > Reported-at: https://issues.redhat.com/browse/FDP-134
> > > Reported-at: https://issues.redhat.com/browse/FDP-159
> > > Signed-off-by: Ales Musil 
> > > Acked-by: Dumitru Ceara 
> > > ---
> > > v3: Rebase on top of current main.
> > > v2: Rebase on top of current main.
> > > Squash the 2/3 and 3/3 from previous version to single commit.
> > > Add ack from Dumitru.
> > > ---
> >
> > Hi Ales,
> >
> > Before accepting this patch I'd like to try to clarify one thing that
> > was flagged as a potential issue by Numan (ct.dnat), please see below.
> >
> > >  controller/chassis.c |   8 ++
> > >  include/ovn/features.h   |   1 +
> > >  include/ovn/logical-fields.h |   3 +
> > >  lib/logical-fields.c |   4 +
> > >  northd/northd.c  | 192
---
> > >  northd/northd.h  |   1 +
> > >  tests/ovn-northd.at  | 118 ++---
> > >  tests/ovn.at |   6 +-
> > >  tests/system-ovn-kmod.at |   3 +-
> > >  9 files changed, 214 insertions(+), 122 deletions(-)
> > >
> > > diff --git a/controller/chassis.c b/controller/chassis.c
> > > index a6f13ccc4..ba2e57238 100644
> > > --- a/controller/chassis.c
> > > +++ b/controller/chassis.c
> > > @@ -370,6 +370,7 @@ chassis_build_other_config(const struct
ovs_chassis_cfg *ovs_cfg,
> > >  smap_replace(config, OVN_FEATURE_CT_LB_RELATED, "true");
> > >  smap_replace(config, OVN_FEATURE_FDB_TIMESTAMP, "true");
> > >  smap_replace(config, OVN_FEATURE_LS_DPG

Re: [ovs-dev] [PATCH ovn v3 2/2] northd: Explicitly handle SNAT for ICMP need frag.

2024-02-01 Thread Han Zhou
On Mon, Jan 29, 2024 at 1:36 AM Dumitru Ceara  wrote:
>
> On 1/29/24 07:20, Ales Musil wrote:
> > Considering following topology:
> > client - sw0 - lrp0 - lr - lrp1 - sw1 - server
> > sw0 in subnet 192.168.0.0/24
> > sw1 in subnet 172.168.0.0/24
> > SNAT configured for client
> > gateway_mtu=1400 configured for lrp0
> >
> > If we send UDP traffic from client to server
> > and server responds with packet bigger than 1400
> > the following sequence will happen:
> >
> > 1) Packet is coming into lr via lrp1
> > 2) unSNAT
> > 3) Routing, the outport will be set to lrp0
> > 4) Check for packet larger will fail
> > 5) We will generate ICMP need frag
> >
> > However, the last step is wrong from the server
> > perspective. The ICMP message will have IP source
> > address = lrp1 IP address. Which means that SNAT won't
> > happen because the source is not within the sw0 subnet,
> > but the inner packet has sw0 subnet address, because it
> > was unSNATted. This results in server ignoring the ICMP
> > message because server never sent any packet to the
> > sw0 subnet.
> >
> > In order to prevent this issue perform SNAT for the
> > ICMP packet. Because the packet is related to already
> > existing connection we just need to perform
> > ct_commit_nat(snat) action.
> >
> > This is achieved with addition of the following flow for
> > "lr_in_larger_pkts" stage (the flow for IPv6 is the in
> > regard to the addition):
> >
> > match=(inport == "INPORT" && outport == "OUTPORT" && ip4 &&
REGBIT_PKT_LARGER && REGBIT_EGRESS_LOOPBACK == 0 && ct.trk && ct.rpl &&
ct.dnat), action=(icmp4_error {flags.icmp_snat = 1; REGBIT_EGRESS_LOOPBACK
= 1; REGBIT_PKT_LARGER = 0; eth.dst = ETH_DST; ip4.dst = ip4.src; ip4.src =
IP_SRC; ip.ttl = 255; icmp4.type = 3; /* Destination Unreachable. */
icmp4.code = 4; /* Frag Needed and DF was Set. */ icmp4.frag_mtu = 1500;
next(pipeline=ingress, table=0); };)
> >
> > Also, add flow to "lr_out_post_snat" stage:
> >
> > match=(icmp && flags.icmp_snat == 1), action=(ct_commit_nat(snat);)
> >
> > Partially revert 0e49f49c73d6 ("northd: Allow need frag to be SNATed")
> > which attempted to fix the same issue in a wrong way.
> >
> > Also add feature flag for the updated ct_commit_nat action.
> > In case there is an update of northd to newer version before all
> > controllers are updated.
> >
> > Fixes: 0e49f49c73d6 ("northd: Allow need frag to be SNATed")
> > Reported-at: https://issues.redhat.com/browse/FDP-134
> > Reported-at: https://issues.redhat.com/browse/FDP-159
> > Signed-off-by: Ales Musil 
> > Acked-by: Dumitru Ceara 
> > ---
> > v3: Rebase on top of current main.
> > v2: Rebase on top of current main.
> > Squash the 2/3 and 3/3 from previous version to single commit.
> > Add ack from Dumitru.
> > ---
>
> Hi Ales,
>
> Before accepting this patch I'd like to try to clarify one thing that
> was flagged as a potential issue by Numan (ct.dnat), please see below.
>
> >  controller/chassis.c |   8 ++
> >  include/ovn/features.h   |   1 +
> >  include/ovn/logical-fields.h |   3 +
> >  lib/logical-fields.c |   4 +
> >  northd/northd.c  | 192 ---
> >  northd/northd.h  |   1 +
> >  tests/ovn-northd.at  | 118 ++---
> >  tests/ovn.at |   6 +-
> >  tests/system-ovn-kmod.at |   3 +-
> >  9 files changed, 214 insertions(+), 122 deletions(-)
> >
> > diff --git a/controller/chassis.c b/controller/chassis.c
> > index a6f13ccc4..ba2e57238 100644
> > --- a/controller/chassis.c
> > +++ b/controller/chassis.c
> > @@ -370,6 +370,7 @@ chassis_build_other_config(const struct
ovs_chassis_cfg *ovs_cfg,
> >  smap_replace(config, OVN_FEATURE_CT_LB_RELATED, "true");
> >  smap_replace(config, OVN_FEATURE_FDB_TIMESTAMP, "true");
> >  smap_replace(config, OVN_FEATURE_LS_DPG_COLUMN, "true");
> > +smap_replace(config, OVN_FEATURE_CT_COMMIT_NAT_V2, "true");
> >  }
> >
> >  /*
> > @@ -509,6 +510,12 @@ chassis_other_config_changed(const struct
ovs_chassis_cfg *ovs_cfg,
> >  return true;
> >  }
> >
> > +if (!smap_get_bool(_rec->other_config,
> > +   OVN_FEATURE_CT_COMMIT_NAT_V2,
> > +   false)) {
> > +return true;
> > +}
> > +
> >  return false;
> >  }
> >
> > @@ -640,6 +647,7 @@ update_supported_sset(struct sset *supported)
> >  sset_add(supported, OVN_FEATURE_CT_LB_RELATED);
> >  sset_add(supported, OVN_FEATURE_FDB_TIMESTAMP);
> >  sset_add(supported, OVN_FEATURE_LS_DPG_COLUMN);
> > +sset_add(supported, OVN_FEATURE_CT_COMMIT_NAT_V2);
> >  }
> >
> >  static void
> > diff --git a/include/ovn/features.h b/include/ovn/features.h
> > index 2c47ab766..08f1d8288 100644
> > --- a/include/ovn/features.h
> > +++ b/include/ovn/features.h
> > @@ -27,6 +27,7 @@
> >  #define OVN_FEATURE_CT_LB_RELATED "ovn-ct-lb-related"
> >  #define OVN_FEATURE_FDB_TIMESTAMP "fdb-timestamp"
> >  #define OVN_FEATURE_LS_DPG_COLUMN 

Re: [ovs-dev] [PATCH ovn] tests: Fix grep warning

2024-01-30 Thread Han Zhou
On Tue, Jan 30, 2024 at 3:35 AM Dumitru Ceara  wrote:
>
> On 1/30/24 08:59, Ales Musil wrote:
> > On Tue, Jan 30, 2024 at 8:58 AM Ales Musil  wrote:
> >
> >> The Fedora version of grep (grep (GNU grep) 3.11) complains
> >> about the syntax grep "output\:": grep: warning: stray \ before :
> >>
> >> Remove the \ which works also for Ubuntu grep version
> >> (grep (GNU grep) 3.7).
> >>
> >>
> > I forgot to add the Fixes tag.
> >
> > Fixes: 17b6a12fa286 ("ovn-controller: Support VIF-based local encap IPs
> > selection.")
> >
> >
> >> Signed-off-by: Ales Musil 
> >> ---
>
> Thanks, I added the "fixes" tag and a dot at the end of the commit
> summary then pushed this to main.
>
> Regards,
> Dumitru

Thanks Ales and Dumitru for the fix.
Han
>
> >>  tests/ovn.at | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/tests/ovn.at b/tests/ovn.at
> >> index 62966752f..cf87b9ad4 100644
> >> --- a/tests/ovn.at
> >> +++ b/tests/ovn.at
> >> @@ -30435,7 +30435,7 @@ check_packet_tunnel() {
> >>  as $hv
> >>  echo "vif$src -> vif$dst should go through tunnel $local_encap_ip
->
> >> $remote_encap_ip"
> >>  tunnel_ofport=$(ovs-vsctl --bare --column=ofport find interface
> >> options:local_ip=$local_encap_ip options:remote_ip=$remote_encap_ip)
> >> -AT_CHECK([test $(ovs-appctl ofproto/trace br-int in_port=vif$src
> >> $packet | grep "output\:" | awk -F ':' '{ print $2 }') ==
$tunnel_ofport])
> >> +AT_CHECK([test $(ovs-appctl ofproto/trace br-int in_port=vif$src
> >> $packet | grep "output:" | awk -F ':' '{ print $2 }') ==
$tunnel_ofport])
> >>  }
> >>
> >>  for i in 1 2; do
> >> --
> >> 2.43.0
> >>
> >>
> >
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 3/3] ovn-controller: Support VIF-based local encap IPs selection.

2024-01-29 Thread Han Zhou
On Mon, Jan 29, 2024 at 2:41 AM Ales Musil  wrote:
>
>
>
> On Fri, Jan 26, 2024 at 8:05 PM Han Zhou  wrote:
>>
>>
>>
>> On Thu, Jan 25, 2024 at 10:54 PM Ales Musil  wrote:
>> >
>> >
>> >
>> > On Fri, Jan 26, 2024 at 4:07 AM Han Zhou  wrote:
>> >>
>> >>
>> >>
>> >> On Tue, Jan 23, 2024 at 5:29 AM Ales Musil  wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Jan 17, 2024 at 6:48 AM Han Zhou  wrote:
>> >> >>
>> >> >> Commit dd527a283cd8 partially supported multiple encap IPs. It
supported
>> >> >> remote encap IP selection based on the destination VIF's encap_ip
>> >> >> configuration. This patch adds the support for selecting local
encap IP
>> >> >> based on the source VIF's encap_ip configuration.
>> >> >>
>> >> >> Co-authored-by: Lei Huang 
>> >> >> Signed-off-by: Lei Huang 
>> >> >> Signed-off-by: Han Zhou 
>> >> >> ---
>> >> >
>> >> >
>> >> > Hi Han and Lei,
>> >> >
>> >> > thank you for the patch, I have a couple of comments/questions down
below.
>> >>
>> >>
>> >> Thanks Ales.
>> >>
>> >> >
>> >> >
>> >> >>  NEWS|   3 +
>> >> >>  controller/chassis.c|   2 +-
>> >> >>  controller/local_data.c |   2 +-
>> >> >>  controller/local_data.h |   2 +-
>> >> >>  controller/ovn-controller.8.xml |  30 ++-
>> >> >>  controller/ovn-controller.c |  49 
>> >> >>  controller/physical.c   | 134
++--
>> >> >>  controller/physical.h   |   2 +
>> >> >>  include/ovn/logical-fields.h|   4 +-
>> >> >>  ovn-architecture.7.xml  |  18 -
>> >> >>  tests/ovn.at|  51 +++-
>> >> >>  11 files changed, 243 insertions(+), 54 deletions(-)
>> >> >>
>> >> >> diff --git a/NEWS b/NEWS
>> >> >> index 5f267b4c64cc..5a3eed608617 100644
>> >> >> --- a/NEWS
>> >> >> +++ b/NEWS
>> >> >> @@ -14,6 +14,9 @@ Post v23.09.0
>> >> >>- ovn-northd-ddlog has been removed.
>> >> >>- A new LSP option "enable_router_port_acl" has been added to
enable
>> >> >>  conntrack for the router port whose peer is l3dgw_port if set
it true.
>> >> >> +  - Support selecting encapsulation IP based on the
source/destination VIF's
>> >> >> +settting. See ovn-controller(8) 'external_ids:ovn-encap-ip'
for more
>> >> >> +details.
>> >> >>
>> >> >>  OVN v23.09.0 - 15 Sep 2023
>> >> >>  --
>> >> >> diff --git a/controller/chassis.c b/controller/chassis.c
>> >> >> index a6f13ccc42d5..55f2beb37674 100644
>> >> >> --- a/controller/chassis.c
>> >> >> +++ b/controller/chassis.c
>> >> >> @@ -61,7 +61,7 @@ struct ovs_chassis_cfg {
>> >> >>
>> >> >>  /* Set of encap types parsed from the 'ovn-encap-type'
external-id. */
>> >> >>  struct sset encap_type_set;
>> >> >> -/* Set of encap IPs parsed from the 'ovn-encap-type'
external-id. */
>> >> >> +/* Set of encap IPs parsed from the 'ovn-encap-ip'
external-id. */
>> >> >>  struct sset encap_ip_set;
>> >> >>  /* Interface type list formatted in the OVN-SB Chassis
required format. */
>> >> >>  struct ds iface_types;
>> >> >> diff --git a/controller/local_data.c b/controller/local_data.c
>> >> >> index a9092783958f..8606414f8728 100644
>> >> >> --- a/controller/local_data.c
>> >> >> +++ b/controller/local_data.c
>> >> >> @@ -514,7 +514,7 @@ chassis_tunnels_destroy(struct hmap
*chassis_tunnels)
>> >> >>   */
>> >> >>  struct chassis_tunnel *
>> >> >>  chassis_tunnel_find(const struct hmap *chassis_tunnels, const
char *chassis_id,
>> >> >> -char *remote_encap_ip, char *local_encap_ip)
>>

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-29 Thread Han Zhou
On Mon, Jan 29, 2024 at 7:11 PM Numan Siddique  wrote:

> On Thu, Jan 25, 2024 at 1:08 AM Han Zhou  wrote:
> >
> > On Thu, Jan 11, 2024 at 7:32 AM  wrote:
> > >
> > > From: Numan Siddique 
> > >
> > > ovn_lflow_add() and other related functions/macros are now moved
> > > into a separate module - lflow-mgr.c.  This module maintains a
> > > table 'struct lflow_table' for the logical flows.  lflow table
> > > maintains a hmap to store the logical flows.
> > >
> > > It also maintains the logical switch and router dp groups.
> > >
> > > Previous commits which added lflow incremental processing for
> > > the VIF logical ports, stored the references to
> > > the logical ports' lflows using 'struct lflow_ref_list'.  This
> > > struct is renamed to 'struct lflow_ref' and is part of lflow-mgr.c.
> > > It is  modified a bit to store the resource to lflow references.
> > >
> > > Example usage of 'struct lflow_ref'.
> > >
> > > 'struct ovn_port' maintains 2 instances of lflow_ref.  i,e
> > >
> > > struct ovn_port {
> > >...
> > >...
> > >struct lflow_ref *lflow_ref;
> > >struct lflow_ref *stateful_lflow_ref;
> >
> > Hi Numan,
> >
> > In addition to the lock discussion with you and Dumitru, I still want to
> > discuss another thing of this patch regarding the second lflow_ref:
> > stateful_lflow_ref.
> > I understand that you added this to achieve finer grained I-P especially
> > for router ports. I am wondering how much performance gain is from this.
> > For my understanding it shouldn't matter much since each ovn_port should
> be
> > associated with a very limited number of lflows. Could you provide more
> > insight/data on this? I think it would be better to keep things simple
> > (i.e. one object, one lflow_ref list) unless the benefit is obvious.
> >
> > I am also trying to run another performance regression test for
> recompute,
> > since I am a little concerned about the DP refcnt hmap associated with
> each
> > lflow. I understand it's necessary to handle the duplicated lflow cases,
> > but it looks heavy and removes the opportunities for more efficient
> bitmap
> > operations. Let me know if you have already had evaluated its
> performance.
> >
>
> Hi Han,
>
> I did some testing on a large scaled OVN database.
> The NB database has
>  - 1000 logical switches
>  - 500 routers
>  - 35253 load balancers
>  - Most of these load balancers are associated with all the
> logical switches and routers.
>
> When I run this command for example
>- ovn-nbctl set load_balancer 0d647ff9-4e49-4570-a05d-db670873b7ef
> options:foo=bar
>
> It results in changes to all the logical routers. And the function
> lflow_handle_lr_stateful_changes() is called.
> If you see this function, for each changed router, it also loops
> through the router ports and calls
>  - build_lbnat_lflows_iterate_by_lrp()
>  - build_lbnat_lflows_iterate_by_lsp()
>
> With your suggestion we also need to call
> build_lswitch_and_lrouter_iterate_by_lsp() and
> build_lbnat_lflows_iterate_by_lrp().
>
> I measured the number of lflows referenced in op->lflow_ref and
> op->stateful_lflow_ref
> for each of the logical switch port  and router port pair.  Total
> lflows in lflow_ref
> (of all router ports and their peer switch ports) were 23000 and total
> lflows in stateful_lflow_ref
> were just 4000.
>
> So with just one lflow_ref (as per suggestion) a small update to load
> balancer like above
> would result in generating 27000 logical flows as compared to just 4000.
>
> I think it has a considerable cost in terms of CPU.  And perhaps it
> would matter more
> when ovn-northd runs in a DPU.  My preference would be to have a
> separate lflow ref
> for stateful flows.
>
> Thanks
> Numan


Thanks for the data points! It makes sense to use separate lists.
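
To make the trade-off concrete, here is a minimal sketch of the idea. All
type and function names below are simplified stand-ins for illustration
only, not the actual northd code:

/* Simplified stand-ins, for illustration only. */
struct lflow_ref;                                /* Generated lflow refs. */
void lflow_ref_clear(struct lflow_ref *);
void build_general_lflows(void *op, struct lflow_ref *);   /* ~23k flows. */
void build_stateful_lflows(void *op, struct lflow_ref *);  /* ~4k flows.  */

struct ovn_port_sketch {
    struct lflow_ref *lflow_ref;          /* General per-port lflows.   */
    struct lflow_ref *stateful_lflow_ref; /* LB/NAT (stateful) lflows.  */
};

/* With two lists, a stateful-only change (e.g. a small load balancer
 * update) clears and rebuilds just the smaller list; the general list is
 * left untouched. */
void
handle_lr_stateful_change_sketch(struct ovn_port_sketch *op)
{
    lflow_ref_clear(op->stateful_lflow_ref);
    build_stateful_lflows(op, op->stateful_lflow_ref);
}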

Regards,
Han

>
>
>
> > Thanks,
> > Han
> >
> > > };
> > >
> > > All the logical flows generated by
> > > build_lswitch_and_lrouter_iterate_by_lsp() uses the
> ovn_port->lflow_ref.
> > >
> > > All the logical flows generated by build_lsp_lflows_for_lbnats()
> > > uses the ovn_port->stateful_lflow_ref.
> > >
> > > When handling the ovn_port changes incrementally, the lflows referenced
> > > in 'struct ovn_port' are cleared and regenerated and synced to the
> > > SB logical flows.
> > >
> > > eg.
> > >
> 

Re: [ovs-dev] [PATCH ovn 3/3] ovn-controller: Support VIF-based local encap IPs selection.

2024-01-26 Thread Han Zhou
On Thu, Jan 25, 2024 at 10:54 PM Ales Musil  wrote:
>
>
>
> On Fri, Jan 26, 2024 at 4:07 AM Han Zhou  wrote:
>>
>>
>>
>> On Tue, Jan 23, 2024 at 5:29 AM Ales Musil  wrote:
>> >
>> >
>> >
>> > On Wed, Jan 17, 2024 at 6:48 AM Han Zhou  wrote:
>> >>
>> >> Commit dd527a283cd8 partially supported multiple encap IPs. It
supported
>> >> remote encap IP selection based on the destination VIF's encap_ip
>> >> configuration. This patch adds the support for selecting local encap
IP
>> >> based on the source VIF's encap_ip configuration.
>> >>
>> >> Co-authored-by: Lei Huang 
>> >> Signed-off-by: Lei Huang 
>> >> Signed-off-by: Han Zhou 
>> >> ---
>> >
>> >
>> > Hi Han and Lei,
>> >
>> > thank you for the patch, I have a couple of comments/questions down
below.
>>
>>
>> Thanks Ales.
>>
>> >
>> >
>> >>  NEWS|   3 +
>> >>  controller/chassis.c|   2 +-
>> >>  controller/local_data.c |   2 +-
>> >>  controller/local_data.h |   2 +-
>> >>  controller/ovn-controller.8.xml |  30 ++-
>> >>  controller/ovn-controller.c |  49 
>> >>  controller/physical.c   | 134
++--
>> >>  controller/physical.h   |   2 +
>> >>  include/ovn/logical-fields.h|   4 +-
>> >>  ovn-architecture.7.xml  |  18 -
>> >>  tests/ovn.at|  51 +++-
>> >>  11 files changed, 243 insertions(+), 54 deletions(-)
>> >>
>> >> diff --git a/NEWS b/NEWS
>> >> index 5f267b4c64cc..5a3eed608617 100644
>> >> --- a/NEWS
>> >> +++ b/NEWS
>> >> @@ -14,6 +14,9 @@ Post v23.09.0
>> >>- ovn-northd-ddlog has been removed.
>> >>- A new LSP option "enable_router_port_acl" has been added to
enable
>> >>  conntrack for the router port whose peer is l3dgw_port if set it
true.
>> >> +  - Support selecting encapsulation IP based on the
source/destination VIF's
>> >> +settting. See ovn-controller(8) 'external_ids:ovn-encap-ip' for
more
>> >> +details.
>> >>
>> >>  OVN v23.09.0 - 15 Sep 2023
>> >>  --
>> >> diff --git a/controller/chassis.c b/controller/chassis.c
>> >> index a6f13ccc42d5..55f2beb37674 100644
>> >> --- a/controller/chassis.c
>> >> +++ b/controller/chassis.c
>> >> @@ -61,7 +61,7 @@ struct ovs_chassis_cfg {
>> >>
>> >>  /* Set of encap types parsed from the 'ovn-encap-type'
external-id. */
>> >>  struct sset encap_type_set;
>> >> -/* Set of encap IPs parsed from the 'ovn-encap-type'
external-id. */
>> >> +/* Set of encap IPs parsed from the 'ovn-encap-ip' external-id.
*/
>> >>  struct sset encap_ip_set;
>> >>  /* Interface type list formatted in the OVN-SB Chassis required
format. */
>> >>  struct ds iface_types;
>> >> diff --git a/controller/local_data.c b/controller/local_data.c
>> >> index a9092783958f..8606414f8728 100644
>> >> --- a/controller/local_data.c
>> >> +++ b/controller/local_data.c
>> >> @@ -514,7 +514,7 @@ chassis_tunnels_destroy(struct hmap
*chassis_tunnels)
>> >>   */
>> >>  struct chassis_tunnel *
>> >>  chassis_tunnel_find(const struct hmap *chassis_tunnels, const char
*chassis_id,
>> >> -char *remote_encap_ip, char *local_encap_ip)
>> >> +char *remote_encap_ip, const char
*local_encap_ip)
>> >
>> >
>> > nit: Unrelated change.
>>
>>
>> Ack

Hi Ales, sorry, I just realized this change is in fact related: it is needed
because of the const char * array introduced in this patch to store the
parsed encap_ips, and const char * makes sense here since the function
should never modify the string.

>>
>> >
>> >
>> >>  {
>> >>  /*
>> >>   * If the specific encap_ip is given, look for the chassisid_ip
entry,
>> >> diff --git a/controller/local_data.h b/controller/local_data.h
>> >> index bab95bcc3824..ca3905bd20e6 100644
>> >> --- a/controller/local_data.h
>> >> +++ b/controller/local_data.h
>> >> @@ -150,7 +150,7 @@ bool local_nonvi

Re: [ovs-dev] [PATCH ovn v5 05/16] northd: Add a new engine 'lr_stateful' to manage lr's stateful data.

2024-01-26 Thread Han Zhou
On Thu, Jan 25, 2024 at 8:22 AM Numan Siddique  wrote:
>
> On Tue, Jan 23, 2024 at 2:40 AM Han Zhou  wrote:
> >
> > On Mon, Jan 22, 2024 at 7:11 PM Numan Siddique  wrote:
> > >
> > > On Mon, Jan 22, 2024 at 4:03 PM Han Zhou  wrote:
> > > >
> > > > On Mon, Jan 22, 2024 at 9:18 AM Numan Siddique 
wrote:
> > > > >
> > > > > Hi Han,
> > > > >
> > > > > Thanks for the reviews.
> > > > > PSB.
> > > > >
> > > > > Thanks
> > > > > Numan
> > > > >
> > > > > On Mon, Jan 22, 2024 at 2:23 AM Han Zhou  wrote:
> > > > > >
> > > > > > On Thu, Jan 11, 2024 at 7:30 AM  wrote:
> > > > > > >
> > > > > > > From: Numan Siddique 
> > > > > > >
> > > > > > > This new engine now maintains the load balancer and NAT data
of a
> > > > > > > logical router which was earlier part of northd engine node
data.
> > > > > > > The main inputs to this engine are:
> > > > > > >- northd node
> > > > > > >- lr_nat node
> > > > > > >- lb_data node
> > > > > > >
> > > > > > > A record for each logical router is maintained in the
> > > > 'lr_stateful_table'
> > > > > > > hmap table and this record
> > > > > > >- stores the lb related data
> > > > > > >- embeds the 'lr_nat' record.
> > > > > > >
> > > > > > > This engine node becomes an input to 'lflow' node.
> > > > > > >
> > > > > > > Signed-off-by: Numan Siddique 
> > > > > > > ---
> > > > > > >  lib/stopwatch-names.h|   1 +
> > > > > > >  northd/automake.mk   |   2 +
> > > > > > >  northd/en-lflow.c|   4 +
> > > > > > >  northd/en-lr-nat.h   |   3 +
> > > > > > >  northd/en-lr-stateful.c  | 641
> > +++
> > > > > > >  northd/en-lr-stateful.h  | 105 ++
> > > > > > >  northd/en-sync-sb.c  |  49 +--
> > > > > > >  northd/inc-proc-northd.c |  13 +-
> > > > > > >  northd/northd.c  | 711
> > > > ---
> > > > > > >  northd/northd.h  | 139 +++-
> > > > > > >  northd/ovn-northd.c  |   1 +
> > > > > > >  tests/ovn-northd.at  |  62 
> > > > > > >  12 files changed, 1213 insertions(+), 518 deletions(-)
> > > > > > >  create mode 100644 northd/en-lr-stateful.c
> > > > > > >  create mode 100644 northd/en-lr-stateful.h
> > > > > > >
> > > > > > > diff --git a/lib/stopwatch-names.h b/lib/stopwatch-names.h
> > > > > > > index 782d64320a..e5e41fbfd8 100644
> > > > > > > --- a/lib/stopwatch-names.h
> > > > > > > +++ b/lib/stopwatch-names.h
> > > > > > > @@ -30,5 +30,6 @@
> > > > > > >  #define PORT_GROUP_RUN_STOPWATCH_NAME "port_group_run"
> > > > > > >  #define SYNC_METERS_RUN_STOPWATCH_NAME "sync_meters_run"
> > > > > > >  #define LR_NAT_RUN_STOPWATCH_NAME "lr_nat_run"
> > > > > > > +#define LR_STATEFUL_RUN_STOPWATCH_NAME "lr_stateful"
> > > > > > >
> > > > > > >  #endif
> > > > > > > diff --git a/northd/automake.mk b/northd/automake.mk
> > > > > > > index a477105470..b886356c9c 100644
> > > > > > > --- a/northd/automake.mk
> > > > > > > +++ b/northd/automake.mk
> > > > > > > @@ -26,6 +26,8 @@ northd_ovn_northd_SOURCES = \
> > > > > > > northd/en-lb-data.h \
> > > > > > > northd/en-lr-nat.c \
> > > > > > > northd/en-lr-nat.h \
> > > > > > > +   northd/en-lr-stateful.c \
> > > > > > > +   northd/en-lr-stateful.h \
> > > > > > > northd/inc-proc-northd.c \
> > > > > > > northd/inc-proc-northd.h \
> > > > > > > northd/ipam.c \
> > > > > > > diff --git a/northd/en-lflow.

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-26 Thread Han Zhou
On Fri, Jan 26, 2024 at 10:26 AM Han Zhou  wrote:
>
>
>
> On Fri, Jan 26, 2024 at 7:53 AM Aaron Conole  wrote:
> >
> > Han Zhou  writes:
> >
> > > On Thu, Jan 25, 2024 at 12:55 PM Aaron Conole 
wrote:
> > >>
> > >> From: Kevin Sprague 
> > >>
> > >> During normal operations, it is useful to understand when a
particular flow
> > >> gets removed from the system. This can be useful when debugging
performance
> > >> issues tied to ofproto flow changes, trying to determine deployed
traffic
> > >> patterns, or while debugging dynamic systems where ports come and go.
> > >>
> > >> Prior to this change, there was a lack of visibility around flow
expiration.
> > >> The existing debugging infrastructure could tell us when a flow was
added to
> > >> the datapath, but not when it was removed or why.
> > >>
> > >> This change introduces a USDT probe at the point where the
revalidator
> > >> determines that the flow should be removed.  Additionally, we track
the
> > >> reason for the flow eviction and provide that information as well.
With
> > >> this change, we can track the complete flow lifecycle for the netlink
> > >> datapath by hooking the upcall tracepoint in kernel, the flow put
USDT, and
> > >> the revaldiator USDT, letting us watch as flows are added and
removed from
> > >> the kernel datapath.
> > >>
> > >> This change only enables this information via USDT probe, so it
won't be
> > >> possible to access this information any other way (see:
> > >> Documentation/topics/usdt-probes.rst).
> > >>
> > >> Also included is a script
(utilities/usdt-scripts/flow_reval_monitor.py)
> > >> which serves as a demonstration of how the new USDT probe might be
used
> > >> going forward.
> > >>
> > >> Signed-off-by: Kevin Sprague 
> > >> Co-authored-by: Aaron Conole 
> > >> Signed-off-by: Aaron Conole 
> > >
> > > Thanks Aaron for taking care of this patch. I saw you resolved most
of my comments for the v6 of the original patch:
> > >
https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401220.html
> > >
> > > But it seems my last comment was missed:
> > > ===
> > >
> > > I do notice a counter in my patch doesn't have a
> > > counterpart in this patch. In revalidator_sweep__(), I have:
> > > if (purge) {
> > > result = UKEY_DELETE;
> > > +COVERAGE_INC(upcall_flow_del_purge);
> > >
> > > Would it be good to add one (e.g. FDR_PURGE) here, too?
> > >
> > > ===
> > > Could you check if this can be added?
> > > If this is merged I can rebase my patch on top of this.
> >
> > Sorry I didn't reply to this.
> >
> > I'm not sure it makes sense to add the probe for purge, specifically as
> > the purge is only done in two cases:
> >
> > 1. The threads are being stopped (which should never occur after
> >initialization unless the vswitchd is being stopped / killed)
> >
> > 2. An admin runs a command to purge the revalidators (which isn't a
> >recommended procedure as it can cause lots of really weird side
> >effects and we only use it as a debug tool).
> >
> > Did I understand the case enough?  I didn't reread the patch you're
> > proposing, so I might be misunderstanding something.
>
> I believe your understanding is correct. However, I think it would be
good to cover all the reasons for DP flow deletion, and "purge" seems to be
the only case missing now.
> Although purge is less likely to happen in production, it is still
possible, probably in some weird scenarios. Would it be good to add it for
completeness, if there is no harm?
>
> Thanks,
> Han
>
That being said, I don't have a strong opinion on this. So I give my ack in
either case:
Acked-by: Han Zhou 

> >
> > > Thanks,
> > > Han
> > >
> > >>
> > >> ---
> > >>  Documentation/topics/usdt-probes.rst |   1 +
> > >>  ofproto/ofproto-dpif-upcall.c|  42 +-
> > >>  utilities/automake.mk|   3 +
> > >>  utilities/usdt-scripts/flow_reval_monitor.py | 653
+++
> > >>  4 files changed, 693 insertions(+), 6 deletions(-)
> > >>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
> > >>
> > >> diff --git a/Documentation/t

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-26 Thread Han Zhou
On Fri, Jan 26, 2024 at 7:53 AM Aaron Conole  wrote:
>
> Han Zhou  writes:
>
> > On Thu, Jan 25, 2024 at 12:55 PM Aaron Conole 
wrote:
> >>
> >> From: Kevin Sprague 
> >>
> >> During normal operations, it is useful to understand when a particular
flow
> >> gets removed from the system. This can be useful when debugging
performance
> >> issues tied to ofproto flow changes, trying to determine deployed
traffic
> >> patterns, or while debugging dynamic systems where ports come and go.
> >>
> >> Prior to this change, there was a lack of visibility around flow
expiration.
> >> The existing debugging infrastructure could tell us when a flow was
added to
> >> the datapath, but not when it was removed or why.
> >>
> >> This change introduces a USDT probe at the point where the revalidator
> >> determines that the flow should be removed.  Additionally, we track the
> >> reason for the flow eviction and provide that information as well.
With
> >> this change, we can track the complete flow lifecycle for the netlink
> >> datapath by hooking the upcall tracepoint in kernel, the flow put
USDT, and
> >> the revaldiator USDT, letting us watch as flows are added and removed
from
> >> the kernel datapath.
> >>
> >> This change only enables this information via USDT probe, so it won't
be
> >> possible to access this information any other way (see:
> >> Documentation/topics/usdt-probes.rst).
> >>
> >> Also included is a script
(utilities/usdt-scripts/flow_reval_monitor.py)
> >> which serves as a demonstration of how the new USDT probe might be used
> >> going forward.
> >>
> >> Signed-off-by: Kevin Sprague 
> >> Co-authored-by: Aaron Conole 
> >> Signed-off-by: Aaron Conole 
> >
> > Thanks Aaron for taking care of this patch. I saw you resolved most of
my comments for the v6 of the original patch:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401220.html
> >
> > But it seems my last comment was missed:
> > ===
> >
> > I do notice a counter in my patch doesn't have a
> > counterpart in this patch. In revalidator_sweep__(), I have:
> > if (purge) {
> > result = UKEY_DELETE;
> > +COVERAGE_INC(upcall_flow_del_purge);
> >
> > Would it be good to add one (e.g. FDR_PURGE) here, too?
> >
> > ===
> > Could you check if this can be added?
> > If this is merged I can rebase my patch on top of this.
>
> Sorry I didn't reply to this.
>
> I'm not sure it makes sense to add the probe for purge, specifically as
> the purge is only done in two cases:
>
> 1. The threads are being stopped (which should never occur after
>initialization unless the vswitchd is being stopped / killed)
>
> 2. An admin runs a command to purge the revalidators (which isn't a
>recommended procedure as it can cause lots of really weird side
>effects and we only use it as a debug tool).
>
> Did I understand the case enough?  I didn't reread the patch you're
> proposing, so I might be misunderstanding something.

I believe your understanding is correct. However, I think it would be good
to cover all the reasons for DP flow deletion, and "purge" seems to be the
only case missing now.
Although purge is less likely to happen in production, it is still possible,
probably in some weird scenarios. Would it be good to add it for
completeness, if there is no harm?

Thanks,
Han

>
> > Thanks,
> > Han
> >
> >>
> >> ---
> >>  Documentation/topics/usdt-probes.rst |   1 +
> >>  ofproto/ofproto-dpif-upcall.c|  42 +-
> >>  utilities/automake.mk|   3 +
> >>  utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
> >>  4 files changed, 693 insertions(+), 6 deletions(-)
> >>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
> >>
> >> diff --git a/Documentation/topics/usdt-probes.rst
b/Documentation/topics/usdt-probes.rst
> >> index e527f43bab..a8da9bb1f7 100644
> >> --- a/Documentation/topics/usdt-probes.rst
> >> +++ b/Documentation/topics/usdt-probes.rst
> >> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
> >>  - dpif_recv:recv_upcall
> >>  - main:poll_block
> >>  - main:run_start
> >> +- revalidate:flow_result
> >>  - revalidate_ukey\_\_:entry
> >>  - revalidate_ukey\_\_:exit
> >>  - udpif_revalidator:start_dump
> >> diff --git a/ofproto/ofproto-dpi

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-25 Thread Han Zhou
On Thu, Jan 25, 2024 at 12:55 PM Aaron Conole  wrote:
>
> From: Kevin Sprague 
>
> During normal operations, it is useful to understand when a particular
flow
> gets removed from the system. This can be useful when debugging
performance
> issues tied to ofproto flow changes, trying to determine deployed traffic
> patterns, or while debugging dynamic systems where ports come and go.
>
> Prior to this change, there was a lack of visibility around flow
expiration.
> The existing debugging infrastructure could tell us when a flow was added
to
> the datapath, but not when it was removed or why.
>
> This change introduces a USDT probe at the point where the revalidator
> determines that the flow should be removed.  Additionally, we track the
> reason for the flow eviction and provide that information as well.  With
> this change, we can track the complete flow lifecycle for the netlink
> datapath by hooking the upcall tracepoint in kernel, the flow put USDT,
and
> the revaldiator USDT, letting us watch as flows are added and removed from
> the kernel datapath.
>
> This change only enables this information via USDT probe, so it won't be
> possible to access this information any other way (see:
> Documentation/topics/usdt-probes.rst).
>
> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
> which serves as a demonstration of how the new USDT probe might be used
> going forward.
>
> Signed-off-by: Kevin Sprague 
> Co-authored-by: Aaron Conole 
> Signed-off-by: Aaron Conole 

Thanks Aaron for taking care of this patch. I saw you resolved most of my
comments for the v6 of the original patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401220.html

But it seems my last comment was missed:
===

I do notice a counter in my patch doesn't have a
counterpart in this patch. In revalidator_sweep__(), I have:
if (purge) {
result = UKEY_DELETE;
+COVERAGE_INC(upcall_flow_del_purge);

Would it be good to add one (e.g. FDR_PURGE) here, too?

===
Could you check if this can be added?
If this is merged I can rebase my patch on top of this.
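To illustrate what I mean (a rough sketch only -- I haven't adapted the
variable names to your v8 code, so please treat them as placeholders):

    /* In revalidator_sweep__(), where the purge path already decides to
     * delete the ukey. */
    if (purge) {
        result = UKEY_DELETE;
        del_reason = FDR_PURGE;                /* hypothetical new value */
        COVERAGE_INC(upcall_flow_del_purge);   /* the counter from my patch */
    }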

Thanks,
Han

>
> ---
>  Documentation/topics/usdt-probes.rst |   1 +
>  ofproto/ofproto-dpif-upcall.c|  42 +-
>  utilities/automake.mk|   3 +
>  utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>  4 files changed, 693 insertions(+), 6 deletions(-)
>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>
> diff --git a/Documentation/topics/usdt-probes.rst
b/Documentation/topics/usdt-probes.rst
> index e527f43bab..a8da9bb1f7 100644
> --- a/Documentation/topics/usdt-probes.rst
> +++ b/Documentation/topics/usdt-probes.rst
> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>  - dpif_recv:recv_upcall
>  - main:poll_block
>  - main:run_start
> +- revalidate:flow_result
>  - revalidate_ukey\_\_:entry
>  - revalidate_ukey\_\_:exit
>  - udpif_revalidator:start_dump
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index b5cbeed878..97d75833f7 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -269,6 +269,18 @@ enum ukey_state {
>  };
>  #define N_UKEY_STATES (UKEY_DELETED + 1)
>
> +enum flow_del_reason {
> +FDR_REVALIDATE = 0, /* The flow was revalidated. */
> +FDR_FLOW_IDLE,  /* The flow went unused and was deleted. */
> +FDR_TOO_EXPENSIVE,  /* The flow was too expensive to revalidate.
*/
> +FDR_FLOW_WILDCARDED,/* The flow needed a narrower wildcard mask.
*/
> +FDR_BAD_ODP_FIT,/* The flow had a bad ODP flow fit. */
> +FDR_NO_OFPROTO, /* The flow didn't have an associated
ofproto. */
> +FDR_XLATION_ERROR,  /* There was an error translating the flow.
*/
> +FDR_AVOID_CACHING,  /* Flow deleted to avoid caching. */
> +FDR_FLOW_LIMIT, /* All flows being killed. */
> +};
> +
>  /* 'udpif_key's are responsible for tracking the little bit of state
udpif
>   * needs to do flow expiration which can't be pulled directly from the
>   * datapath.  They may be created by any handler or revalidator thread
at any
> @@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, struct
udpif_key *ukey,
>  static enum reval_result
>  revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey,
>uint16_t tcp_flags, struct ofpbuf *odp_actions,
> -  struct recirc_refs *recircs, struct xlate_cache
*xcache)
> +  struct recirc_refs *recircs, struct xlate_cache
*xcache,
> +  enum flow_del_reason *reason)
>  {
>  struct xlate_out *xoutp;
>  struct netflow *netflow;
> @@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif, const
struct udpif_key *ukey,
>  netflow = NULL;
>
>  if (xlate_ukey(udpif, ukey, tcp_flags, )) {
> +*reason = FDR_XLATION_ERROR;
>  goto exit;
>  }
>  xoutp = 
>

Re: [ovs-dev] [PATCH ovn 3/3] ovn-controller: Support VIF-based local encap IPs selection.

2024-01-25 Thread Han Zhou
On Tue, Jan 23, 2024 at 5:29 AM Ales Musil  wrote:
>
>
>
> On Wed, Jan 17, 2024 at 6:48 AM Han Zhou  wrote:
>>
>> Commit dd527a283cd8 partially supported multiple encap IPs. It supported
>> remote encap IP selection based on the destination VIF's encap_ip
>> configuration. This patch adds the support for selecting local encap IP
>> based on the source VIF's encap_ip configuration.
>>
>> Co-authored-by: Lei Huang 
>> Signed-off-by: Lei Huang 
>> Signed-off-by: Han Zhou 
>> ---
>
>
> Hi Han and Lei,
>
> thank you for the patch, I have a couple of comments/questions down below.


Thanks Ales.

>
>
>>  NEWS|   3 +
>>  controller/chassis.c|   2 +-
>>  controller/local_data.c |   2 +-
>>  controller/local_data.h |   2 +-
>>  controller/ovn-controller.8.xml |  30 ++-
>>  controller/ovn-controller.c |  49 
>>  controller/physical.c   | 134 ++--
>>  controller/physical.h   |   2 +
>>  include/ovn/logical-fields.h|   4 +-
>>  ovn-architecture.7.xml  |  18 -
>>  tests/ovn.at|  51 +++-
>>  11 files changed, 243 insertions(+), 54 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 5f267b4c64cc..5a3eed608617 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -14,6 +14,9 @@ Post v23.09.0
>>- ovn-northd-ddlog has been removed.
>>- A new LSP option "enable_router_port_acl" has been added to enable
>>  conntrack for the router port whose peer is l3dgw_port if set it
true.
>> +  - Support selecting encapsulation IP based on the source/destination
VIF's
>> +settting. See ovn-controller(8) 'external_ids:ovn-encap-ip' for more
>> +details.
>>
>>  OVN v23.09.0 - 15 Sep 2023
>>  --
>> diff --git a/controller/chassis.c b/controller/chassis.c
>> index a6f13ccc42d5..55f2beb37674 100644
>> --- a/controller/chassis.c
>> +++ b/controller/chassis.c
>> @@ -61,7 +61,7 @@ struct ovs_chassis_cfg {
>>
>>  /* Set of encap types parsed from the 'ovn-encap-type' external-id.
*/
>>  struct sset encap_type_set;
>> -/* Set of encap IPs parsed from the 'ovn-encap-type' external-id. */
>> +/* Set of encap IPs parsed from the 'ovn-encap-ip' external-id. */
>>  struct sset encap_ip_set;
>>  /* Interface type list formatted in the OVN-SB Chassis required
format. */
>>  struct ds iface_types;
>> diff --git a/controller/local_data.c b/controller/local_data.c
>> index a9092783958f..8606414f8728 100644
>> --- a/controller/local_data.c
>> +++ b/controller/local_data.c
>> @@ -514,7 +514,7 @@ chassis_tunnels_destroy(struct hmap *chassis_tunnels)
>>   */
>>  struct chassis_tunnel *
>>  chassis_tunnel_find(const struct hmap *chassis_tunnels, const char
*chassis_id,
>> -char *remote_encap_ip, char *local_encap_ip)
>> +char *remote_encap_ip, const char *local_encap_ip)
>
>
> nit: Unrelated change.


Ack

>
>
>>  {
>>  /*
>>   * If the specific encap_ip is given, look for the chassisid_ip
entry,
>> diff --git a/controller/local_data.h b/controller/local_data.h
>> index bab95bcc3824..ca3905bd20e6 100644
>> --- a/controller/local_data.h
>> +++ b/controller/local_data.h
>> @@ -150,7 +150,7 @@ bool local_nonvif_data_handle_ovs_iface_changes(
>>  struct chassis_tunnel *chassis_tunnel_find(const struct hmap
*chassis_tunnels,
>> const char *chassis_id,
>> char *remote_encap_ip,
>> -   char *local_encap_ip);
>> +   const char *local_encap_ip);
>
>
> Same as above.


Ack

>
>
>>
>>  bool get_chassis_tunnel_ofport(const struct hmap *chassis_tunnels,
>> const char *chassis_name,
>> diff --git a/controller/ovn-controller.8.xml
b/controller/ovn-controller.8.xml
>> index efa65e3fd927..5ebef048d721 100644
>> --- a/controller/ovn-controller.8.xml
>> +++ b/controller/ovn-controller.8.xml
>> @@ -176,10 +176,32 @@
>>
>>external_ids:ovn-encap-ip
>>
>> -The IP address that a chassis should use to connect to this node
>> -using encapsulation types specified by
>> -external_ids:ovn-encap-type. Multiple
encapsulation IPs
>> -may be specified with a comma-separated list.

Re: [ovs-dev] [PATCH ovn 1/4] rbac: MAC_Bindings can only be updated by the inserting chassis.

2024-01-25 Thread Han Zhou
On Mon, Jan 22, 2024 at 6:36 AM Ales Musil  wrote:
>
> On Mon, Jan 22, 2024 at 9:09 AM Felix Huettner via dev <
> ovs-dev@openvswitch.org> wrote:
>
> > On Fri, Jan 19, 2024 at 04:33:28PM -0500, Mark Michelson wrote:
> > > With this change, a chassis may only update MAC Binding records that
it
> > > has created. We achieve this by adding a "chassis_name" column to the
> > > MAC_Binding table, and having the chassis insert its name into this
> > > column when creating a new MAC_Binding. The "chassis_name" is now part
> > > of the rbac_auth structure for the MAC_Binding table.
> >
> > Hi Mark,
> >
> > i am concerned that this will negatively impact MAC_Bindings for LRPs
> > with multiple gateway chassis.
> >
> > Suppose a MAC_Binding is first learned by an LRP currently residing on
> > chassis1. The LRP then failovers to chassis2 and chassis1 is potentially
> > even
> > removed completely. In this case the ovn-controller on chassis2 would no
> > longer be allowed to update the timestamp column. This would break the
> > arp refresh mechanism.
> >
> > In this case the MAC_Binding would need to expire first, causing northd
> > to removed it. Afterwards chassis2 would be allowed to insert a new
> > record with its own chassis name.
> >
> > I honestly did not try out this case so i am not fully sure if this
> > issue realy exists or if i have a missunderstanding somewhere.
> >
> > Thanks
> > Felix
> >
> >
> Hi Mark and Felix,
>
> I personally don't see the ability to not refresh as an issue, the MAC
> binding would age out and the node could create a new one. However, it
will
> still produce errors when the remote chassis tries to update the timestamp
> of MAC binding owned by someone else.
>
> There is another issue that I'm more concerned about and that's in case
the
> aging is not enabled at all. After failover the MAC binding might not be
> updated at all. Similar issue applies to MAM bindings distributed across
> many chassis. One will own it and only that chassis can update MAC address
> when anything changes which it might never do.

This is indeed a fundamental problem. Even if aging is configured, it is
still a problem. Let's say aging is set to 5 min: when the IP is moved to a
different chassis, it will not work for up to 5 min, which is unacceptable.

In fact the below test case fails with this patch:
81. ovn.at:5232: testing IP relocation using GARP request --
parallelization=yes -- ovn_monitor_all=yes ...

>
> To solve that we would need duplicates per chassis, basically the same MAC
> binding row, but with different "owners". This goes in hand with having OF
> only for MAC bindings owned by current chassis and nothing else. Does that
> make sense?
>

If each chassis has OF flows only for MAC bindings owned by itself, there is
no point in having the MAC_Binding table in the SB DB, right?
But that doesn't work, since the MAC_Binding table works as the ARP
cache/neighbor table for a distributed router, so we need a central place to
share this information. Otherwise, when an IP moves from one chassis to
another, how would the other chassis know? (This is the same scenario as
test case 81 above, and similarly there will be a problem for DGP failover.)
However, I don't have a good solution either. Need to think more about it.

Thanks,
Han

> All the above unfortunately applies also to FDB.
>
> Thanks,
> Ales
>
>
> > > ---
> > >  controller/pinctrl.c | 51

> > >  northd/ovn-northd.c  |  2 +-
> > >  ovn-sb.ovsschema |  7 +++---
> > >  ovn-sb.xml   |  3 +++
> > >  4 files changed, 45 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > > index 4992eab08..a00cdceea 100644
> > > --- a/controller/pinctrl.c
> > > +++ b/controller/pinctrl.c
> > > @@ -180,6 +180,7 @@ struct pinctrl {
> > >  bool mac_binding_can_timestamp;
> > >  bool fdb_can_timestamp;
> > >  bool dns_supports_ovn_owned;
> > > +bool mac_binding_has_chassis_name;
> > >  };
> > >
> > >  static struct pinctrl pinctrl;
> > > @@ -204,7 +205,8 @@ static void run_put_mac_bindings(
> > >  struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >  struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
> > >  struct ovsdb_idl_index *sbrec_port_binding_by_key,
> > > -struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip)
> > > +struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip,
> > > +const struct sbrec_chassis *chassis)
> > >  OVS_REQUIRES(pinctrl_mutex);
> > >  static void wait_put_mac_bindings(struct ovsdb_idl_txn
*ovnsb_idl_txn);
> > >  static void send_mac_binding_buffered_pkts(struct rconn *swconn)
> > > @@ -3591,6 +3593,13 @@ pinctrl_update(const struct ovsdb_idl *idl,
const
> > char *br_int_name)
> > >  notify_pinctrl_handler();
> > >  }
> > >
> > > +bool mac_binding_has_chassis_name =
> > > +sbrec_server_has_mac_binding_table_col_chassis_name(idl);
> > > +if (mac_binding_has_chassis_name !=
> > 

Re: [ovs-dev] [PATCH ovn v5 00/16] northd lflow incremental processing

2024-01-24 Thread Han Zhou
On Thu, Jan 11, 2024 at 7:28 AM  wrote:
>
> From: Numan Siddique 
>
> This patch series adds incremental processing in the lflow engine
> node to handle changes to northd and other engine nodes.
> Changed related to load balancers and NAT are mainly handled in
> this patch series.
>
> This patch series can also be found here -
https://github.com/numansiddique/ovn/tree/northd_lbnatacl_lflow/v5
>
> Prior to this patch series, most of the changes to northd engine
> resulted in full recomputation of logical flows.  This series
> aims to improve the performance of ovn-northd by adding the I-P
> support.  In order to add this support, some of the northd engine
> node data (from struct ovn_datapath) is split and moved over to
> new engine nodes - mainly related to load balancers, NAT and ACLs.
>
> Below are the scale testing results done with these patches applied
> using ovn-heater.  The test ran the scenario  -
> ocp-500-density-heavy.yml [1].
>
> With all the lflow I-P patches applied, the resuts are:
>
> ----------------------------------------------------------------------------------------------------------------------
>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)    Total (s)     Count  Failed
> ----------------------------------------------------------------------------------------------------------------------
> Iteration Total       0.136883   1.129016    1.192001    1.204167    1.212728   0.665017    83.127099     125    0
> Namespace.add_ports   0.005216   0.005736    0.007034    0.015486    0.018978   0.006211    0.776373      125    0
> WorkerNode.bind_port  0.035030   0.046082    0.052469    0.058293    0.060311   0.045973    11.493259     250    0
> WorkerNode.ping_port  0.005057   0.006727    1.047692    1.069253    1.071336   0.266896    66.724094     250    0
> ----------------------------------------------------------------------------------------------------------------------
>
> The results with the present main are:
>
> ----------------------------------------------------------------------------------------------------------------------
>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)    Total (s)     Count  Failed
> ----------------------------------------------------------------------------------------------------------------------
> Iteration Total       0.135491   2.223805    3.311270    3.339078    3.345346   1.729172    216.146495    125    0
> Namespace.add_ports   0.005380   0.005744    0.006819    0.018773    0.020800   0.006292    0.786532      125    0
> WorkerNode.bind_port  0.034179   0.046055    0.053488    0.058801    0.071043   0.046117    11.529311     250    0
> WorkerNode.ping_port  0.004956   0.006952    3.086952    3.191743    3.192807   0.791544    197.886026    250    0
> ----------------------------------------------------------------------------------------------------------------------
>
> Please see the link [2] which has a high level description of the
> changes done in this patch series.
>
>
> [1] -
https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> [2] -
https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410053.html
>
> v4 -> v5
> ---
>* Rebased to latest main and resolved the conflicts.
>
>* Addressed the review comments from Han in patch 15 (and in p8).
Removed the
>  assert if SB dp group is missing and handled it by returning false
>  so that lflow engine recomputes.  Added test cases to cover this
>  scenario for both lflows (p8) and SB load balancers (p15) .

Thanks, Numan. I went through this version of the series. I tried my best to
review it in detail, but I can't say I examined every line of the changes.
The major comments are about the implicit dependency related to p4, p5, p6,
and p7, and some pending discussions for p8 (for which I am also going to do
more performance tests). For the rest of the patches, please consider them as:
Acked-by: Han Zhou 

Thanks,
Han

>
> v3 -> v4
> ---
>* Addressed most of the review comments from Dumitru and Han.
>
>* Found a couple of bugs in v3 patch 9 -
>  "northd: Refactor lflow management into a separate module."
>  and addressed them in v4.
>  To brief  the issue, if a logical flow L(M, 

Re: [ovs-dev] [PATCH ovn v5 11/16] northd: Handle lb changes in lflow engine.

2024-01-24 Thread Han Zhou
ovn_lflow_add_with_dp_group() macro. */

nit: the sentence is a little confusing. Probably more clear to say:
indicates whether the lflow was added with a dp_group using the
ovn_lflow_add_with_dp_group() macro.
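i.e. something like:

    /* Indicates whether the lflow was added with a dp_group using the
     * ovn_lflow_add_with_dp_group() macro. */
    bool dpgrp_lflow;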

> +bool dpgrp_lflow;
> +/* dpgrp bitmap and bitmap length.  Valid only of dpgrp_lflow is
true. */
> +unsigned long *dpgrp_bitmap;
> +size_t dpgrp_bitmap_len;
> +
> +/* Index id of the datapath this lflow_ref_node belongs to.
> + * Valid only if dpgrp_lflow is false. */
>  size_t dp_index;
>
>  /* Indicates if the lflow_ref_node for an lflow - L(M, A) is linked
> @@ -429,9 +437,19 @@ lflow_ref_unlink_lflows(struct lflow_ref *lflow_ref)

The comment on this function needs to be updated for the change below: it
now also clears all the DP bits of an lflow when the lflow_ref_node was
generated from a DP group directly.

>  struct lflow_ref_node *lrn;
>
>  LIST_FOR_EACH (lrn, lflow_list_node, _ref->lflows_ref_list) {
> -if (dec_dp_refcnt(>lflow->dp_refcnts_map,
> -  lrn->dp_index)) {
> -    bitmap_set0(lrn->lflow->dpg_bitmap, lrn->dp_index);
> +if (lrn->dpgrp_lflow) {
> +size_t index;
> +BITMAP_FOR_EACH_1 (index, lrn->dpgrp_bitmap_len,
> +   lrn->dpgrp_bitmap) {
> +if (dec_dp_refcnt(>lflow->dp_refcnts_map, index)) {
> +bitmap_set0(lrn->lflow->dpg_bitmap, lrn->dp_index);

This is wrong: it should use "index" instead of lrn->dp_index here. It is
fixed in a future patch, but it should actually be fixed in this patch.
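i.e. that branch should look like (sketch):

    BITMAP_FOR_EACH_1 (index, lrn->dpgrp_bitmap_len, lrn->dpgrp_bitmap) {
        if (dec_dp_refcnt(&lrn->lflow->dp_refcnts_map, index)) {
            bitmap_set0(lrn->lflow->dpg_bitmap, index);
        }
    }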

With these addressed:
Acked-by: Han Zhou 

Thanks,
Han

> +}
> +}
> +} else {
> +if (dec_dp_refcnt(>lflow->dp_refcnts_map,
> +  lrn->dp_index)) {
> +bitmap_set0(lrn->lflow->dpg_bitmap, lrn->dp_index);
> +}
>  }
>
>  lrn->linked = false;
> @@ -502,18 +520,26 @@ lflow_table_add_lflow(struct lflow_table
*lflow_table,
>   io_port, ctrl_meter, stage_hint, where);
>
>  if (lflow_ref) {
> -/* lflow referencing is only supported if 'od' is not NULL. */
> -ovs_assert(od);
> -
>  struct lflow_ref_node *lrn =
>  lflow_ref_node_find(_ref->lflow_ref_nodes, lflow,
hash);
>  if (!lrn) {
>  lrn = xzalloc(sizeof *lrn);
>  lrn->lflow = lflow;
> -lrn->dp_index = od->index;
> +lrn->dpgrp_lflow = !od;
> +if (lrn->dpgrp_lflow) {
> +lrn->dpgrp_bitmap = bitmap_clone(dp_bitmap,
dp_bitmap_len);
> +lrn->dpgrp_bitmap_len = dp_bitmap_len;
> +
> +size_t index;
> +BITMAP_FOR_EACH_1 (index, dp_bitmap_len, dp_bitmap) {
> +inc_dp_refcnt(>dp_refcnts_map, index);
> +}
> +} else {
> +lrn->dp_index = od->index;
> +inc_dp_refcnt(>dp_refcnts_map, lrn->dp_index);
> +}
>  ovs_list_insert(_ref->lflows_ref_list,
>  >lflow_list_node);
> -inc_dp_refcnt(>dp_refcnts_map, lrn->dp_index);
>  ovs_list_insert(>referenced_by, >ref_list_node);
>
>  hmap_insert(_ref->lflow_ref_nodes, >ref_node,
hash);
> @@ -1257,5 +1283,8 @@ lflow_ref_node_destroy(struct lflow_ref_node *lrn,
>  }
>  ovs_list_remove(>lflow_list_node);
>  ovs_list_remove(>ref_list_node);
> +if (lrn->dpgrp_lflow) {
> +bitmap_free(lrn->dpgrp_bitmap);
> +}
>  free(lrn);
>  }
> diff --git a/northd/northd.c b/northd/northd.c
> index 08732abbfa..6225dfe541 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -7477,7 +7477,7 @@ build_lb_rules_pre_stateful(struct lflow_table
*lflows,
>  ovn_lflow_add_with_dp_group(
>  lflows, lb_dps->nb_ls_map, ods_size(ls_datapaths),
>  S_SWITCH_IN_PRE_STATEFUL, 120, ds_cstr(match),
ds_cstr(action),
> ->nlb->header_, NULL);
> +>nlb->header_, lb_dps->lflow_ref);
>  }
>  }
>
> @@ -7922,7 +7922,7 @@ build_lb_rules(struct lflow_table *lflows, struct
ovn_lb_datapaths *lb_dps,
>  }
>
>  build_lb_affinity_ls_flows(lflows, lb_dps, lb_vip, ls_datapaths,
> -   NULL);
> +   lb_dps->lflow_ref);
>
>  unsigned long *dp_non_meter = NULL;
>  bool build_non_meter = false;
> @@ -7946,7 

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Wed, Jan 24, 2024 at 10:07 PM Han Zhou  wrote:
>
>
>
> On Thu, Jan 11, 2024 at 7:32 AM  wrote:
> >
> > From: Numan Siddique 
> >
> > ovn_lflow_add() and other related functions/macros are now moved
> > into a separate module - lflow-mgr.c.  This module maintains a
> > table 'struct lflow_table' for the logical flows.  lflow table
> > maintains a hmap to store the logical flows.
> >
> > It also maintains the logical switch and router dp groups.
> >
> > Previous commits which added lflow incremental processing for
> > the VIF logical ports, stored the references to
> > the logical ports' lflows using 'struct lflow_ref_list'.  This
> > struct is renamed to 'struct lflow_ref' and is part of lflow-mgr.c.
> > It is  modified a bit to store the resource to lflow references.
> >
> > Example usage of 'struct lflow_ref'.
> >
> > 'struct ovn_port' maintains 2 instances of lflow_ref.  i,e
> >
> > struct ovn_port {
> >...
> >...
> >struct lflow_ref *lflow_ref;
> >struct lflow_ref *stateful_lflow_ref;
>
> Hi Numan,
>
> In addition to the lock discussion with you and Dumitru, I still want to
discuss another thing of this patch regarding the second lflow_ref:
stateful_lflow_ref.
> I understand that you added this to achieve finer grained I-P especially
for router ports. I am wondering how much performance gain is from this.
For my understanding it shouldn't matter much since each ovn_port should be
associated with a very limited number of lflows. Could you provide more
insight/data on this? I think it would be better to keep things simple
(i.e. one object, one lflow_ref list) unless the benefit is obvious.
>
> I am also trying to run another performance regression test for
recompute, since I am a little concerned about the DP refcnt hmap
associated with each lflow. I understand it's necessary to handle the
duplicated lflow cases, but it looks heavy and removes the opportunities
for more efficient bitmap operations. Let me know if you have already had
evaluated its performance.
>
> Thanks,
> Han
>

Sorry that I forgot to mention another minor comment for the struct
lflow_ref_node. It would be helpful to add more comments about the typical
life cycle of the lflow_ref_node, so that it is easier to understand why an
hmap is used in lflow_ref, and why 'linked' is needed in lflow_ref_node. To
my understanding the life cycle is something like:
1. It is created and linked in lflow_table_add_lflow().
2. It is unlinked when handling a change of the object that references it.
3. It may be re-linked when handling the same object change.
4. It is used to sync the lflow change (e.g. adding/removing DPs) to the SB.
5. It is destroyed after syncing an unlinked lflow_ref to the SB.

I think it would help people understand the code more easily. What do you
think?
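Rendered as a comment above the struct definition it could look roughly like
this (wording up to you):

    /* Typical life cycle of an lflow_ref_node:
     *  1. Created and linked in lflow_table_add_lflow().
     *  2. Unlinked when a change of the object that references it is
     *     handled.
     *  3. Possibly re-linked while handling the same object change.
     *  4. Used to sync the lflow change (e.g. adding/removing DPs) to
     *     the SB.
     *  5. Destroyed after syncing an unlinked lflow_ref to the SB. */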

Thanks,
Han

> > };
> >
> > All the logical flows generated by
> > build_lswitch_and_lrouter_iterate_by_lsp() uses the ovn_port->lflow_ref.
> >
> > All the logical flows generated by build_lsp_lflows_for_lbnats()
> > uses the ovn_port->stateful_lflow_ref.
> >
> > When handling the ovn_port changes incrementally, the lflows referenced
> > in 'struct ovn_port' are cleared and regenerated and synced to the
> > SB logical flows.
> >
> > eg.
> >
> > lflow_ref_clear_lflows(op->lflow_ref);
> > build_lswitch_and_lrouter_iterate_by_lsp(op, ...);
> > lflow_ref_sync_lflows_to_sb(op->lflow_ref, ...);
> >
> > This patch does few more changes:
> >   -  Logical flows are now hashed without the logical
> >  datapaths.  If a logical flow is referenced by just one
> >  datapath, we don't rehash it.
> >
> >   -  The synthetic 'hash' column of sbrec_logical_flow now
> >  doesn't use the logical datapath.  This means that
> >  when ovn-northd is updated/upgraded and has this commit,
> >  all the logical flows with 'logical_datapath' column
> >  set will get deleted and re-added causing some disruptions.
> >
> >   -  With the commit [1] which added I-P support for logical
> >  port changes, multiple logical flows with same match 'M'
> >  and actions 'A' are generated and stored without the
> >  dp groups, which was not the case prior to
> >  that patch.
> >  One example to generate these lflows is:
> >  ovn-nbctl lsp-set-addresses sw0p1 "MAC1 IP1"
> >  ovn-nbctl lsp-set-addresses sw1p1 "MAC1 IP1"
> >  ovn-nbctl lsp-set-addresses sw2p1 "MAC1 IP1"
> >
> >  Now with this patch we go back to the earlier way.  i.e
> >  one logica

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Thu, Jan 11, 2024 at 7:32 AM  wrote:
>
> From: Numan Siddique 
>
> ovn_lflow_add() and other related functions/macros are now moved
> into a separate module - lflow-mgr.c.  This module maintains a
> table 'struct lflow_table' for the logical flows.  lflow table
> maintains a hmap to store the logical flows.
>
> It also maintains the logical switch and router dp groups.
>
> Previous commits which added lflow incremental processing for
> the VIF logical ports, stored the references to
> the logical ports' lflows using 'struct lflow_ref_list'.  This
> struct is renamed to 'struct lflow_ref' and is part of lflow-mgr.c.
> It is  modified a bit to store the resource to lflow references.
>
> Example usage of 'struct lflow_ref'.
>
> 'struct ovn_port' maintains 2 instances of lflow_ref.  i,e
>
> struct ovn_port {
>...
>...
>struct lflow_ref *lflow_ref;
>struct lflow_ref *stateful_lflow_ref;

Hi Numan,

In addition to the lock discussion with you and Dumitru, I still want to
discuss another aspect of this patch regarding the second lflow_ref:
stateful_lflow_ref.
I understand that you added this to achieve finer-grained I-P, especially
for router ports. I am wondering how much performance gain comes from this.
To my understanding it shouldn't matter much, since each ovn_port should be
associated with a very limited number of lflows. Could you provide more
insight/data on this? I think it would be better to keep things simple
(i.e. one object, one lflow_ref list) unless the benefit is obvious.
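In other words, the simpler shape I have in mind is something like (sketch
only, assuming the numbers don't justify the split):

    struct ovn_port {
        ...
        /* One reference list per port: lflows generated by both
         * build_lswitch_and_lrouter_iterate_by_lsp() and
         * build_lsp_lflows_for_lbnats() would be tracked here. */
        struct lflow_ref *lflow_ref;
    };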

I am also trying to run another performance regression test for recompute,
since I am a little concerned about the DP refcnt hmap associated with each
lflow. I understand it's necessary to handle the duplicated lflow cases,
but it looks heavy and removes the opportunities for more efficient bitmap
operations. Let me know if you have already evaluated its performance.

Thanks,
Han

> };
>
> All the logical flows generated by
> build_lswitch_and_lrouter_iterate_by_lsp() uses the ovn_port->lflow_ref.
>
> All the logical flows generated by build_lsp_lflows_for_lbnats()
> uses the ovn_port->stateful_lflow_ref.
>
> When handling the ovn_port changes incrementally, the lflows referenced
> in 'struct ovn_port' are cleared and regenerated and synced to the
> SB logical flows.
>
> eg.
>
> lflow_ref_clear_lflows(op->lflow_ref);
> build_lswitch_and_lrouter_iterate_by_lsp(op, ...);
> lflow_ref_sync_lflows_to_sb(op->lflow_ref, ...);
>
> This patch does few more changes:
>   -  Logical flows are now hashed without the logical
>  datapaths.  If a logical flow is referenced by just one
>  datapath, we don't rehash it.
>
>   -  The synthetic 'hash' column of sbrec_logical_flow now
>  doesn't use the logical datapath.  This means that
>  when ovn-northd is updated/upgraded and has this commit,
>  all the logical flows with 'logical_datapath' column
>  set will get deleted and re-added causing some disruptions.
>
>   -  With the commit [1] which added I-P support for logical
>  port changes, multiple logical flows with same match 'M'
>  and actions 'A' are generated and stored without the
>  dp groups, which was not the case prior to
>  that patch.
>  One example to generate these lflows is:
>  ovn-nbctl lsp-set-addresses sw0p1 "MAC1 IP1"
>  ovn-nbctl lsp-set-addresses sw1p1 "MAC1 IP1"
>  ovn-nbctl lsp-set-addresses sw2p1 "MAC1 IP1"
>
>  Now with this patch we go back to the earlier way.  i.e
>  one logical flow with logical_dp_groups set.
>
>   -  With this patch any updates to a logical port which
>  doesn't result in new logical flows will not result in
>  deletion and addition of same logical flows.
>  Eg.
>  ovn-nbctl set logical_switch_port sw0p1 external_ids:foo=bar
>  will be a no-op to the SB logical flow table.
>
> [1] - 8bbd678("northd: Incremental processing of VIF additions in 'lflow'
node.")
>
> Signed-off-by: Numan Siddique 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Wed, Jan 24, 2024 at 8:39 PM Numan Siddique  wrote:
>
> On Wed, Jan 24, 2024 at 10:53 PM Han Zhou  wrote:
> >
> > On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara  wrote:
> > >
> > > On 1/24/24 06:01, Han Zhou wrote:
> > > > On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara 
wrote:
> > > >>
> > > >> On 1/11/24 16:31, num...@ovn.org wrote:
> > > >>> +
> > > >>> +void
> > > >>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
> > > >>> +  const struct ovn_datapath *od,
> > > >>> +  const unsigned long *dp_bitmap, size_t
> > > > dp_bitmap_len,
> > > >>> +  enum ovn_stage stage, uint16_t priority,
> > > >>> +  const char *match, const char *actions,
> > > >>> +  const char *io_port, const char
*ctrl_meter,
> > > >>> +  const struct ovsdb_idl_row *stage_hint,
> > > >>> +  const char *where,
> > > >>> +  struct lflow_ref *lflow_ref)
> > > >>> +OVS_EXCLUDED(fake_hash_mutex)
> > > >>> +{
> > > >>> +struct ovs_mutex *hash_lock;
> > > >>> +uint32_t hash;
> > > >>> +
> > > >>> +ovs_assert(!od ||
> > > >>> +   ovn_stage_to_datapath_type(stage) ==
> > > > ovn_datapath_get_type(od));
> > > >>> +
> > > >>> +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> > > >>> + ovn_stage_get_pipeline(stage),
> > > >>> + priority, match,
> > > >>> + actions);
> > > >>> +
> > > >>> +hash_lock = lflow_hash_lock(_table->entries, hash);
> > > >>> +struct ovn_lflow *lflow =
> > > >>> +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> > > >>> + dp_bitmap_len, hash, stage,
> > > >>> + priority, match, actions,
> > > >>> + io_port, ctrl_meter, stage_hint, where);
> > > >>> +
> > > >>> +if (lflow_ref) {
> > > >>> +/* lflow referencing is only supported if 'od' is not
NULL.
> > */
> > > >>> +ovs_assert(od);
> > > >>> +
> > > >>> +struct lflow_ref_node *lrn =
> > > >>> +lflow_ref_node_find(_ref->lflow_ref_nodes,
lflow,
> > > > hash);
> > > >>> +if (!lrn) {
> > > >>> +lrn = xzalloc(sizeof *lrn);
> > > >>> +lrn->lflow = lflow;
> > > >>> +lrn->dp_index = od->index;
> > > >>> +ovs_list_insert(_ref->lflows_ref_list,
> > > >>> +>lflow_list_node);
> > > >>> +inc_dp_refcnt(>dp_refcnts_map, lrn->dp_index);
> > > >>> +ovs_list_insert(>referenced_by,
> > > > >ref_list_node);
> > > >>> +
> > > >>> +hmap_insert(_ref->lflow_ref_nodes,
>ref_node,
> > > > hash);
> > > >>> +}
> > > >>> +
> > > >>> +lrn->linked = true;
> > > >>> +}
> > > >>> +
> > > >>> +lflow_hash_unlock(hash_lock);
> > > >>> +
> > > >>> +}
> > > >>> +
> > > >>
> > > >> This part is not thread safe.
> > > >>
> > > >> If two threads try to add logical flows that have different hashes
and
> > > >> lflow_ref is not NULL we're going to have a race condition when
> > > >> inserting to the _ref->lflow_ref_nodes hash map because the
two
> > > >> threads will take different locks.
> > > >>
> > > >
> > > > I think it is safe because a lflow_ref is always associated with an
> > object,
> > > > e.g. port, datapath, lb, etc., and lflow generation for a single
such
> > > > object is never executed in parallel, which is how the parallel
lflow
> > build
> > > > is designed.
> > 

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara  wrote:
>
> On 1/24/24 06:01, Han Zhou wrote:
> > On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara  wrote:
> >>
> >> On 1/11/24 16:31, num...@ovn.org wrote:
> >>> +
> >>> +void
> >>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
> >>> +  const struct ovn_datapath *od,
> >>> +  const unsigned long *dp_bitmap, size_t
> > dp_bitmap_len,
> >>> +  enum ovn_stage stage, uint16_t priority,
> >>> +  const char *match, const char *actions,
> >>> +  const char *io_port, const char *ctrl_meter,
> >>> +  const struct ovsdb_idl_row *stage_hint,
> >>> +  const char *where,
> >>> +  struct lflow_ref *lflow_ref)
> >>> +OVS_EXCLUDED(fake_hash_mutex)
> >>> +{
> >>> +struct ovs_mutex *hash_lock;
> >>> +uint32_t hash;
> >>> +
> >>> +ovs_assert(!od ||
> >>> +   ovn_stage_to_datapath_type(stage) ==
> > ovn_datapath_get_type(od));
> >>> +
> >>> +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> >>> + ovn_stage_get_pipeline(stage),
> >>> + priority, match,
> >>> + actions);
> >>> +
> >>> +hash_lock = lflow_hash_lock(_table->entries, hash);
> >>> +struct ovn_lflow *lflow =
> >>> +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> >>> + dp_bitmap_len, hash, stage,
> >>> + priority, match, actions,
> >>> + io_port, ctrl_meter, stage_hint, where);
> >>> +
> >>> +if (lflow_ref) {
> >>> +/* lflow referencing is only supported if 'od' is not NULL.
*/
> >>> +ovs_assert(od);
> >>> +
> >>> +struct lflow_ref_node *lrn =
> >>> +lflow_ref_node_find(_ref->lflow_ref_nodes, lflow,
> > hash);
> >>> +if (!lrn) {
> >>> +lrn = xzalloc(sizeof *lrn);
> >>> +lrn->lflow = lflow;
> >>> +lrn->dp_index = od->index;
> >>> +ovs_list_insert(_ref->lflows_ref_list,
> >>> +>lflow_list_node);
> >>> +inc_dp_refcnt(>dp_refcnts_map, lrn->dp_index);
> >>> +ovs_list_insert(>referenced_by,
> > >ref_list_node);
> >>> +
> >>> +hmap_insert(_ref->lflow_ref_nodes, >ref_node,
> > hash);
> >>> +}
> >>> +
> >>> +lrn->linked = true;
> >>> +}
> >>> +
> >>> +lflow_hash_unlock(hash_lock);
> >>> +
> >>> +}
> >>> +
> >>
> >> This part is not thread safe.
> >>
> >> If two threads try to add logical flows that have different hashes and
> >> lflow_ref is not NULL we're going to have a race condition when
> >> inserting to the _ref->lflow_ref_nodes hash map because the two
> >> threads will take different locks.
> >>
> >
> > I think it is safe because a lflow_ref is always associated with an
object,
> > e.g. port, datapath, lb, etc., and lflow generation for a single such
> > object is never executed in parallel, which is how the parallel lflow
build
> > is designed.
> > Does it make sense?
>
> It happens that it's safe in this current patch set because indeed we
> always process individual ports, datapaths, lbs, etc, in the same
> thread.  However, this code (lflow_table_add_lflow()) is generic and
> there's nothing (not even a comment) that would warn developers in the
> future about the potential race if the lflow_ref is shared.
>
> I spoke to Numan offline a bit about this and I think the current plan
> is to leave it as is and add proper locking as a follow up (or in v7).
> But I think we still need a clear comment here warning users about this.
>  Maybe we should add a comment where the lflow_ref structure is defined
too.
>
> What do you think?

I totally agree with you about adding comments to explain the thread-safety
considerations and making it clear that the lflow_ref should always be
associated with the object that is being processed by the thread.
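For example, something along these lines next to the lflow_ref definition
(exact wording to be refined):

    /* Thread safety: an lflow_ref is owned by the object (port, datapath,
     * lb, ...) it is embedded in, and the parallel lflow build guarantees
     * that a single object is processed by only one thread.  Therefore a
     * lflow_ref must not be passed to lflow_table_add_lflow() from more
     * than one thread at the same time. */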
With regard 

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-23 Thread Han Zhou
On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara  wrote:
>
> On 1/11/24 16:31, num...@ovn.org wrote:
> > +
> > +void
> > +lflow_table_add_lflow(struct lflow_table *lflow_table,
> > +  const struct ovn_datapath *od,
> > +  const unsigned long *dp_bitmap, size_t
dp_bitmap_len,
> > +  enum ovn_stage stage, uint16_t priority,
> > +  const char *match, const char *actions,
> > +  const char *io_port, const char *ctrl_meter,
> > +  const struct ovsdb_idl_row *stage_hint,
> > +  const char *where,
> > +  struct lflow_ref *lflow_ref)
> > +OVS_EXCLUDED(fake_hash_mutex)
> > +{
> > +struct ovs_mutex *hash_lock;
> > +uint32_t hash;
> > +
> > +ovs_assert(!od ||
> > +   ovn_stage_to_datapath_type(stage) ==
ovn_datapath_get_type(od));
> > +
> > +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> > + ovn_stage_get_pipeline(stage),
> > + priority, match,
> > + actions);
> > +
> > +hash_lock = lflow_hash_lock(_table->entries, hash);
> > +struct ovn_lflow *lflow =
> > +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> > + dp_bitmap_len, hash, stage,
> > + priority, match, actions,
> > + io_port, ctrl_meter, stage_hint, where);
> > +
> > +if (lflow_ref) {
> > +/* lflow referencing is only supported if 'od' is not NULL. */
> > +ovs_assert(od);
> > +
> > +struct lflow_ref_node *lrn =
> > +lflow_ref_node_find(_ref->lflow_ref_nodes, lflow,
hash);
> > +if (!lrn) {
> > +lrn = xzalloc(sizeof *lrn);
> > +lrn->lflow = lflow;
> > +lrn->dp_index = od->index;
> > +ovs_list_insert(_ref->lflows_ref_list,
> > +>lflow_list_node);
> > +inc_dp_refcnt(>dp_refcnts_map, lrn->dp_index);
> > +ovs_list_insert(>referenced_by,
>ref_list_node);
> > +
> > +hmap_insert(_ref->lflow_ref_nodes, >ref_node,
hash);
> > +}
> > +
> > +lrn->linked = true;
> > +}
> > +
> > +lflow_hash_unlock(hash_lock);
> > +
> > +}
> > +
>
> This part is not thread safe.
>
> If two threads try to add logical flows that have different hashes and
> lflow_ref is not NULL we're going to have a race condition when
> inserting to the _ref->lflow_ref_nodes hash map because the two
> threads will take different locks.
>

I think it is safe because a lflow_ref is always associated with an object,
e.g. port, datapath, lb, etc., and lflow generation for a single such
object is never executed in parallel, which is how the parallel lflow build
is designed.
Does it make sense?

Thanks,
Han

> That might corrupt the map.
>
> I guess, if we don't want to cause more performance degradation we
> should maintain as many lflow_ref instances as we do hash_locks, i.e.,
> LFLOW_HASH_LOCK_MASK + 1.  Will that even be possible?
>
> Wdyt?
>
> Regards,
> Dumitru
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v5 03/16] northd: Move router ports SB PB options sync to sync_to_sb_pb node.

2024-01-21 Thread Han Zhou
-appctl -t ovn-northd inc-engine/clear-stats
> @@ -11127,16 +11250,20 @@ check ovn-nbctl lrp-add lr0 lr0-sw0
00:00:00:00:ff:01 10.0.0.1/24
>  # for northd engine there will be both recompute and compute
>  # first it will be recompute to handle lr0-sw0 and then a compute
>  # for the SB port binding change.
> -check_engine_stats northd recompute nocompute
> +check_engine_stats northd recompute compute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  ovn-nbctl lsp-add sw0 sw0-lr0
>  ovn-nbctl lsp-set-type sw0-lr0 router
>  ovn-nbctl lsp-set-addresses sw0-lr0 00:00:00:00:ff:01
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lsp-set-options sw0-lr0 router-port=lr0-sw0
> -check_engine_stats northd recompute nocompute
> +check_engine_stats northd recompute compute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  ovn-nbctl ls-add public
>  ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.100/24
> @@ -11158,7 +11285,9 @@ ovn-nbctl --wait=hv lrp-set-gateway-chassis
lr0-public hv1 20
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl set logical_router_port lr0-sw0 options:foo=bar
>  check_engine_stats northd recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Do checks for NATs.
>  # Add a NAT. This should not result in recompute of both northd and lflow
> @@ -11167,6 +11296,7 @@ check as northd ovn-appctl -t ovn-northd
inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-nat-add lr0 dnat_and_snat  172.168.0.110
10.0.0.4
>  check_engine_stats northd recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Update the NAT options column
> @@ -11174,6 +11304,7 @@ check as northd ovn-appctl -t ovn-northd
inc-engine/clear-stats
>  check ovn-nbctl --wait=sb set NAT . options:foo=bar
>  check_engine_stats northd recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Update the NAT external_ip column
> @@ -11181,6 +11312,7 @@ check as northd ovn-appctl -t ovn-northd
inc-engine/clear-stats
>  check ovn-nbctl --wait=sb set NAT . external_ip=172.168.0.120
>  check_engine_stats northd recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Update the NAT logical_ip column
> @@ -11188,6 +11320,7 @@ check as northd ovn-appctl -t ovn-northd
inc-engine/clear-stats
>  check ovn-nbctl --wait=sb set NAT . logical_ip=10.0.0.10
>  check_engine_stats northd recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Update the NAT type
> @@ -11195,13 +11328,15 @@ check as northd ovn-appctl -t ovn-northd
inc-engine/clear-stats
>  check ovn-nbctl --wait=sb set NAT . type=snat
>  check_engine_stats northd recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Create a dnat_and_snat NAT with external_mac and logical_port
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-nat-add lr0 dnat_and_snat 172.168.0.110
10.0.0.4 sw0p1 30:54:00:00:00:03
> -check_engine_stats northd recompute nocompute
> +check_engine_stats northd recompute compute
>  check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  nat2_uuid=$(ovn-nbctl --bare --columns _uuid find nat
logical_ip=10.0.0.4)
> @@ -11210,6 +11345,7 @@ check as northd ovn-appctl -t ovn-northd
inc-engine/clear-stats
>  check ovn-nbctl --wait=sb set NAT $nat2_uuid
external_mac='"30:54:00:00:00:04"'
>  check_engine_stats northd recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Create a load balancer and add the lb vip as NAT
> @@ -11223,31 +11359,35 @@ check ovn-nbctl lr-lb-add lr0 lb2
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-nat-add lr0 dnat_and_snat 172.168.0.140
10.0.0.20
>  check_engine_stats northd recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-nat-add lr0 dnat_and_snat 172.168.0.150
10.0.0.41
>  check_engine_stats northd recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-nat-del lr0 dnat_and_snat 172.168.0.150
>  check_engine_stats northd recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-nat-del lr0 dnat_and_snat 172.168.0.140
>  check_engine_stats northd recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  # Delete the NAT
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb clear logical_router lr0 nat
> -check_engine_stats northd recompute nocompute
> +check_engine_stats northd recompute compute
>  check_engine_stats lflow recompute nocompute
>  check_engine_stats sync_to_sb_pb recompute nocompute
>  CHECK_NO_CHANGE_AFTER_RECOMPUTE
> @@ -11256,12 +11396,16 @@ CHECK_NO_CHANGE_AFTER_RECOMPUTE
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-policy-add lr0  10 "ip4.src == 10.0.0.3"
reroute 172.168.0.101,172.168.0.102
>  check_engine_stats northd recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
>  check ovn-nbctl --wait=sb lr-policy-del lr0  10 "ip4.src == 10.0.0.3"
>  check_engine_stats northd recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
>  check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
>
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
> --
> 2.43.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Acked-by: Han Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v5 02/16] tests: Add a couple of tests in ovn-northd for I-P.

2024-01-21 Thread Han Zhou
stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +# Update the NAT external_ip column
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb set NAT . external_ip=172.168.0.120
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +# Update the NAT logical_ip column
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb set NAT . logical_ip=10.0.0.10
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +# Update the NAT type
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb set NAT . type=snat
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +# Create a dnat_and_snat NAT with external_mac and logical_port
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb lr-nat-add lr0 dnat_and_snat 172.168.0.110
10.0.0.4 sw0p1 30:54:00:00:00:03
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +nat2_uuid=$(ovn-nbctl --bare --columns _uuid find nat
logical_ip=10.0.0.4)
> +
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb set NAT $nat2_uuid
external_mac='"30:54:00:00:00:04"'
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +# Create a load balancer and add the lb vip as NAT
> +check ovn-nbctl lb-add lb1 172.168.0.140 10.0.0.20
> +check ovn-nbctl lb-add lb2 172.168.0.150:80 10.0.0.40:8080
> +check ovn-nbctl lr-lb-add lr0 lb1
> +check ovn-nbctl lr-lb-add lr0 lb2
> +
> +# lflow engine should recompute since the nat ip 172.168.0.140
> +# is a lb vip.
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb lr-nat-add lr0 dnat_and_snat 172.168.0.140
10.0.0.20
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb lr-nat-add lr0 dnat_and_snat 172.168.0.150
10.0.0.41
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb lr-nat-del lr0 dnat_and_snat 172.168.0.150
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb lr-nat-del lr0 dnat_and_snat 172.168.0.140
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +# Delete the NAT
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb clear logical_router lr0 nat
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +check_engine_stats sync_to_sb_pb recompute nocompute
> +CHECK_NO_CHANGE_AFTER_RECOMPUTE
> +
> +# Create router Policy
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb lr-policy-add lr0  10 "ip4.src == 10.0.0.3"
reroute 172.168.0.101,172.168.0.102
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +
> +check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
> +check ovn-nbctl --wait=sb lr-policy-del lr0  10 "ip4.src == 10.0.0.3"
> +check_engine_stats northd recompute nocompute
> +check_engine_stats lflow recompute nocompute
> +
> +OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
> --
> 2.43.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Acked-by: Han Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v5 01/16] northd: Refactor the northd change tracking.

2024-01-21 Thread Han Zhou
ue(sbmc_unknown,
> -op->sb);
> -}
> -
> -/* Sync the newly added flows to SB. */
> -struct lflow_ref_node *lfrn;
> -LIST_FOR_EACH (lfrn, lflow_list_node, >lflows) {
> -sync_lsp_lflows_to_sb(ovnsb_txn, lflow_input, lflows,
> -  lfrn->lflow);
> -}
> +/* Sync the new flows to SB. */
> +struct lflow_ref_node *lfrn;
> +LIST_FOR_EACH (lfrn, lflow_list_node, >lflows) {
> +sync_lsp_lflows_to_sb(ovnsb_txn, lflow_input, lflows,
> +  lfrn->lflow);
>  }
> +}
>
> -bool ls_has_only_router_ports = (ls_change->od->n_router_ports &&
> - (ls_change->od->n_router_ports
==
> -
 hmap_count(_change->od->ports)));
> -
> -if (ls_change->had_only_router_ports !=
ls_has_only_router_ports) {
> -/* There are lflows related to router ports that depends on
whether
> - * there are switch ports on the logical switch (see
> - * build_lswitch_rport_arp_req_flow() for more details).
Since this
> - * dependency changed, we need to regenerate lflows for each
router
> - * port on this logical switch. */
> -for (size_t i = 0; i < ls_change->od->n_router_ports; i++) {
> -op = ls_change->od->router_ports[i];
> -
> -/* Delete old lflows. */
> -if (!delete_lflow_for_lsp(op, "affected router",
> -
 lflow_input->sbrec_logical_flow_table,
> -  lflows)) {
> -return false;
> -}
> +HMAPX_FOR_EACH (hmapx_node, _lsps->created) {
> +op = hmapx_node->data;
> +/* Make sure 'op' is an lsp and not lrp. */
> +ovs_assert(op->nbsp);
>
> -/* Generate new lflows. */
> -struct ds match = DS_EMPTY_INITIALIZER;
> -struct ds actions = DS_EMPTY_INITIALIZER;
> -build_lswitch_and_lrouter_iterate_by_lsp(op,
> -lflow_input->ls_ports, lflow_input->lr_ports,
> -lflow_input->meter_groups, , , lflows);
> -ds_destroy();
> -ds_destroy();
> -
> -/* Sync the new flows to SB. */
> -struct lflow_ref_node *lfrn;
> -LIST_FOR_EACH (lfrn, lflow_list_node, >lflows) {
> -sync_lsp_lflows_to_sb(ovnsb_txn, lflow_input, lflows,
> -  lfrn->lflow);
> -}
> +const struct sbrec_multicast_group *sbmc_flood =
> +mcast_group_lookup(lflow_input->sbrec_mcast_group_by_name_dp,
> +   MC_FLOOD, op->od->sb);
> +const struct sbrec_multicast_group *sbmc_flood_l2 =
> +mcast_group_lookup(lflow_input->sbrec_mcast_group_by_name_dp,
> +   MC_FLOOD_L2, op->od->sb);
> +const struct sbrec_multicast_group *sbmc_unknown =
> +mcast_group_lookup(lflow_input->sbrec_mcast_group_by_name_dp,
> +   MC_UNKNOWN, op->od->sb);
> +
> +struct ds match = DS_EMPTY_INITIALIZER;
> +struct ds actions = DS_EMPTY_INITIALIZER;
> +build_lswitch_and_lrouter_iterate_by_lsp(op,
lflow_input->ls_ports,
> +
 lflow_input->lr_ports,
> +
 lflow_input->meter_groups,
> +, ,
> +lflows);
> +ds_destroy();
> +ds_destroy();
> +
> +/* Update SB multicast groups for the new port. */
> +if (!sbmc_flood) {
> +sbmc_flood = create_sb_multicast_group(ovnsb_txn,
> +op->od->sb, MC_FLOOD, OVN_MCAST_FLOOD_TUNNEL_KEY);
> +}
> +sbrec_multicast_group_update_ports_addvalue(sbmc_flood, op->sb);
> +
> +if (!sbmc_flood_l2) {
> +sbmc_flood_l2 = create_sb_multicast_group(ovnsb_txn,
> +op->od->sb, MC_FLOOD_L2,
> +OVN_MCAST_FLOOD_L2_TUNNEL_KEY);
> +}
> +sbrec_multicast_group_update_ports_addvalue(sbmc_flood_l2,
op->sb);
> +
> +if (op->has_unknown) {
> +if (!sbmc_unknown) {
> +sbmc_unknown = create_sb_multicast_group(ovnsb_txn,
> +op->od->sb, MC_UNKNOWN,
> +OVN_MCAST_UNKNOWN_TUNNEL_KEY);
>

Re: [ovs-dev] [PATCH ovn v5 04/16] northd: Add a new engine 'lr_nat' to manage lr NAT data.

2024-01-21 Thread Han Zhou
On Thu, Jan 11, 2024 at 7:29 AM  wrote:
>
> From: Numan Siddique 
>
> This new engine now maintains the NAT related data for each
> logical router which was earlier maintained by the northd
> engine node in the 'struct ovn_datapath'.  The input to
> this engine node is 'northd'.
>
> A record for each logical router (lr_nat_record) is maintained
> in the 'lr_nats' hmap table which stores the lr's NAT dat.
>
> 'northd' engine now reports logical routers changed due to NATs
> in its tracking data.  'lr_nat' engine node makes use of
> this tracked data in its northd change handler to update the
> NAT data.
>
> This engine node becomes an input to 'lflow' node.
>
> Signed-off-by: Numan Siddique 
> ---
>  lib/ovn-util.c   |   6 +-
>  lib/ovn-util.h   |   2 +-
>  lib/stopwatch-names.h|   1 +
>  northd/automake.mk   |   2 +
>  northd/en-lflow.c|   5 +
>  northd/en-lr-nat.c   | 423 
>  northd/en-lr-nat.h   | 127 +
>  northd/en-northd.c   |   4 +
>  northd/en-sync-sb.c  |   6 +-
>  northd/inc-proc-northd.c |   6 +
>  northd/northd.c  | 589 ---
>  northd/northd.h  |  46 +--
>  northd/ovn-northd.c  |   1 +
>  tests/ovn-northd.at  |  46 ++-
>  14 files changed, 885 insertions(+), 379 deletions(-)
>  create mode 100644 northd/en-lr-nat.c
>  create mode 100644 northd/en-lr-nat.h
>
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index 6ef9cac7f2..c8b89cc216 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -385,7 +385,7 @@ extract_sbrec_binding_first_mac(const struct
sbrec_port_binding *binding,
>  }
>
>  bool
> -lport_addresses_is_empty(struct lport_addresses *laddrs)
> +lport_addresses_is_empty(const struct lport_addresses *laddrs)
>  {
>  return !laddrs->n_ipv4_addrs && !laddrs->n_ipv6_addrs;
>  }
> @@ -395,6 +395,10 @@ destroy_lport_addresses(struct lport_addresses
*laddrs)
>  {
>  free(laddrs->ipv4_addrs);
>  free(laddrs->ipv6_addrs);
> +laddrs->ipv4_addrs = NULL;
> +laddrs->ipv6_addrs = NULL;
> +laddrs->n_ipv4_addrs = 0;
> +laddrs->n_ipv6_addrs = 0;
>  }
>
>  /* Returns a string of the IP address of 'laddrs' that overlaps with
'ip_s'.
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index aa0b3b2fb4..d245d57d56 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -112,7 +112,7 @@ bool extract_sbrec_binding_first_mac(const struct
sbrec_port_binding *binding,
>  bool extract_lrp_networks__(char *mac, char **networks, size_t
n_networks,
>  struct lport_addresses *laddrs);
>
> -bool lport_addresses_is_empty(struct lport_addresses *);
> +bool lport_addresses_is_empty(const struct lport_addresses *);
>  void destroy_lport_addresses(struct lport_addresses *);
>  const char *find_lport_address(const struct lport_addresses *laddrs,
> const char *ip_s);
> diff --git a/lib/stopwatch-names.h b/lib/stopwatch-names.h
> index 4e93c1dc14..782d64320a 100644
> --- a/lib/stopwatch-names.h
> +++ b/lib/stopwatch-names.h
> @@ -29,5 +29,6 @@
>  #define LFLOWS_TO_SB_STOPWATCH_NAME "lflows_to_sb"
>  #define PORT_GROUP_RUN_STOPWATCH_NAME "port_group_run"
>  #define SYNC_METERS_RUN_STOPWATCH_NAME "sync_meters_run"
> +#define LR_NAT_RUN_STOPWATCH_NAME "lr_nat_run"
>
>  #endif
> diff --git a/northd/automake.mk b/northd/automake.mk
> index 5d77ca67b7..a477105470 100644
> --- a/northd/automake.mk
> +++ b/northd/automake.mk
> @@ -24,6 +24,8 @@ northd_ovn_northd_SOURCES = \
> northd/en-sync-from-sb.h \
> northd/en-lb-data.c \
> northd/en-lb-data.h \
> +   northd/en-lr-nat.c \
> +   northd/en-lr-nat.h \
> northd/inc-proc-northd.c \
> northd/inc-proc-northd.h \
> northd/ipam.c \
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index 6ba26006e0..e4f875ef7c 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -19,6 +19,7 @@
>  #include 
>
>  #include "en-lflow.h"
> +#include "en-lr-nat.h"
>  #include "en-northd.h"
>  #include "en-meters.h"
>
> @@ -40,6 +41,9 @@ lflow_get_input_data(struct engine_node *node,
>  engine_get_input_data("port_group", node);
>  struct sync_meters_data *sync_meters_data =
>  engine_get_input_data("sync_meters", node);
> +struct ed_type_lr_nat_data *lr_nat_data =
> +engine_get_input_data("lr_nat", node);
> +
>  lflow_input->nbrec_bfd_table =
>  EN_OVSDB_GET(engine_get_input("NB_bfd", node));
>  lflow_input->sbrec_bfd_table =
> @@ -61,6 +65,7 @@ lflow_get_input_data(struct engine_node *node,
>  lflow_input->ls_ports = _data->ls_ports;
>  lflow_input->lr_ports = _data->lr_ports;
>  lflow_input->ls_port_groups = _data->ls_port_groups;
> +lflow_input->lr_nats = _nat_data->lr_nats;
>  lflow_input->meter_groups = _meters_data->meter_groups;
>  lflow_input->lb_datapaths_map = _data->lb_datapaths_map;
>  lflow_input->svc_monitor_map = 

Re: [ovs-dev] OVN 24.03 Soft Freeze 19 January

2024-01-18 Thread Han Zhou
On Tue, Jan 16, 2024 at 5:45 AM Mark Michelson  wrote:
>
> Hi everyone,
>
> The soft freeze for OVN 24.03 is this Friday, 19 January, 2024. Please
> ensure that any patches that introduce new features are posted to the
> mailing list by that date if you wish to have them included in OVN 24.03.
>
> If you have a feature patch that you wish to be included in 24.03,
> please post a link to the series in a reply to this mail. The OVN team
> will do our best to review the desired changes.
>
> After soft freeze, we will create the OVN 24.03 branch on 2 February. At
> that point, no new feature patches will be merged into the 24.03 branch.
> However, bug fixes will still be eligible for inclusion.
>
> Thank you,
> Mark Michelson
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Request for this feature:
https://patchwork.ozlabs.org/project/ovn/list/?series=390763

Thanks,
Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/3] encaps: Create separate tunnels for multiple local encap IPs.

2024-01-16 Thread Han Zhou
On Tue, Jan 16, 2024 at 9:48 PM Han Zhou  wrote:
>
> In commit dd527a283cd8, it created separate tunnels for different remote
> encap IPs of the same remote chassis, but when the current chassis is
> the one that has multiple encap IPs configured, it only uses the first
> encap IP. This patch creates separate tunnels taking into consideration
> the multiple encap IPs in current chassis and sets corresponding
> local_ip for each tunnel interface in such cases.
>
> Co-authored-by: Lei Huang 
> Signed-off-by: Lei Huang 

Oops, I made a silly mistake in Lei's email address, which should be
l...@nvidia.com instead. Sorry for that.

Han

> Signed-off-by: Han Zhou 
> ---
>  controller/bfd.c|   4 +-
>  controller/encaps.c | 158 ++--
>  controller/encaps.h |   9 ++-
>  controller/lflow.c  |   2 +-
>  controller/local_data.c |  14 ++--
>  controller/local_data.h |   5 +-
>  controller/physical.c   |  28 +++
>  controller/pinctrl.c|   2 +-
>  tests/ovn-ipsec.at  |  49 -
>  tests/ovn.at|  88 +-
>  10 files changed, 192 insertions(+), 167 deletions(-)
>
> diff --git a/controller/bfd.c b/controller/bfd.c
> index cf011e382c6c..f24bfd063888 100644
> --- a/controller/bfd.c
> +++ b/controller/bfd.c
> @@ -75,7 +75,7 @@ bfd_calculate_active_tunnels(const struct ovsrec_bridge
*br_int,
>  char *chassis_name = NULL;
>
>  if (encaps_tunnel_id_parse(id, _name,
> -   NULL)) {
> +   NULL, NULL)) {
>  if (!sset_contains(active_tunnels,
> chassis_name)) {
>  sset_add(active_tunnels,
chassis_name);
> @@ -204,7 +204,7 @@ bfd_run(const struct ovsrec_interface_table
*interface_table,
>
>  sset_add(, port_name);
>
> -if (encaps_tunnel_id_parse(tunnel_id, _name, NULL)) {
> +if (encaps_tunnel_id_parse(tunnel_id, _name, NULL,
NULL)) {
>  if (sset_contains(_chassis, chassis_name)) {
>  sset_add(_ifaces, port_name);
>  }
> diff --git a/controller/encaps.c b/controller/encaps.c
> index 1f6e667a606c..28237f6191c8 100644
> --- a/controller/encaps.c
> +++ b/controller/encaps.c
> @@ -32,10 +32,9 @@ VLOG_DEFINE_THIS_MODULE(encaps);
>  /*
>   * Given there could be multiple tunnels with different IPs to the same
>   * chassis we annotate the external_ids:ovn-chassis-id in tunnel port
with
> - * OVN_MVTEP_CHASSISID_DELIM. The external_id key
> + * @%. The external_id key
>   * "ovn-chassis-id" is kept for backward compatibility.
>   */
> -#defineOVN_MVTEP_CHASSISID_DELIM '@'
>  #define OVN_TUNNEL_ID "ovn-chassis-id"
>
>  static char *current_br_int_name = NULL;
> @@ -95,72 +94,93 @@ tunnel_create_name(struct tunnel_ctx *tc, const char
*chassis_id)
>  }
>
>  /*
> - * Returns a tunnel-id of the form 'chassis_id'-delimiter-'encap_ip'.
> + * Returns a tunnel-id of the form chassis_id@remote_encap_ip
%local_encap_ip.
>   */
>  char *
> -encaps_tunnel_id_create(const char *chassis_id, const char *encap_ip)
> +encaps_tunnel_id_create(const char *chassis_id, const char
*remote_encap_ip,
> +const char *local_encap_ip)
>  {
> -return xasprintf("%s%c%s", chassis_id, OVN_MVTEP_CHASSISID_DELIM,
> - encap_ip);
> +return xasprintf("%s%c%s%c%s", chassis_id, '@', remote_encap_ip,
> + '%', local_encap_ip);
>  }
>
>  /*
> - * Parses a 'tunnel_id' of the form .
> + * Parses a 'tunnel_id' of the form @%.
>   * If the 'chassis_id' argument is not NULL the function will allocate
memory
>   * and store the chassis_name part of the tunnel-id at '*chassis_id'.
> - * If the 'encap_ip' argument is not NULL the function will allocate
memory
> - * and store the encapsulation IP part of the tunnel-id at '*encap_ip'.
> + * Same for remote_encap_ip and local_encap_ip.
>   */
>  bool
>  encaps_tunnel_id_parse(const char *tunnel_id, char **chassis_id,
> -   char **encap_ip)
> +   char **remote_encap_ip, char **local_encap_ip)
>  {
> -/* Find the delimiter.  Fail if there is no delimiter or if

> - * or  is the empty string.*/
> -const char *d = strchr(tunnel_id, OVN_MVTEP_CHASSISID_DELIM);
> +/* Find the @.  Fail if there is no @ or if any part is empty. */
> +const char *d = strchr(tunnel_id, '@');
>  if (d == tunne

[ovs-dev] [PATCH ovn 3/3] ovn-controller: Support VIF-based local encap IPs selection.

2024-01-16 Thread Han Zhou
Commit dd527a283cd8 partially supported multiple encap IPs. It supported
remote encap IP selection based on the destination VIF's encap_ip
configuration. This patch adds the support for selecting local encap IP
based on the source VIF's encap_ip configuration.
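
For illustration (a sketch with hypothetical names, not part of the patch):

  # The chassis advertises two local encapsulation IPs.
  ovs-vsctl set Open_vSwitch . external_ids:ovn-encap-ip="10.0.0.1,10.0.0.2"

  # Pin a VIF to one of them; packets originating from this VIF are then
  # sent out through the tunnel whose local_ip matches.
  ovs-vsctl set Interface vif1 external_ids:encap-ip="10.0.0.2"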

Co-authored-by: Lei Huang 
Signed-off-by: Lei Huang 
Signed-off-by: Han Zhou 
---
 NEWS|   3 +
 controller/chassis.c|   2 +-
 controller/local_data.c |   2 +-
 controller/local_data.h |   2 +-
 controller/ovn-controller.8.xml |  30 ++-
 controller/ovn-controller.c |  49 
 controller/physical.c   | 134 ++--
 controller/physical.h   |   2 +
 include/ovn/logical-fields.h|   4 +-
 ovn-architecture.7.xml  |  18 -
 tests/ovn.at|  51 +++-
 11 files changed, 243 insertions(+), 54 deletions(-)

diff --git a/NEWS b/NEWS
index 5f267b4c64cc..5a3eed608617 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,9 @@ Post v23.09.0
   - ovn-northd-ddlog has been removed.
   - A new LSP option "enable_router_port_acl" has been added to enable
 conntrack for the router port whose peer is l3dgw_port if set it true.
+  - Support selecting encapsulation IP based on the source/destination VIF's
+setting. See ovn-controller(8) 'external_ids:ovn-encap-ip' for more
+details.
 
 OVN v23.09.0 - 15 Sep 2023
 --
diff --git a/controller/chassis.c b/controller/chassis.c
index a6f13ccc42d5..55f2beb37674 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -61,7 +61,7 @@ struct ovs_chassis_cfg {
 
 /* Set of encap types parsed from the 'ovn-encap-type' external-id. */
 struct sset encap_type_set;
-/* Set of encap IPs parsed from the 'ovn-encap-type' external-id. */
+/* Set of encap IPs parsed from the 'ovn-encap-ip' external-id. */
 struct sset encap_ip_set;
 /* Interface type list formatted in the OVN-SB Chassis required format. */
 struct ds iface_types;
diff --git a/controller/local_data.c b/controller/local_data.c
index a9092783958f..8606414f8728 100644
--- a/controller/local_data.c
+++ b/controller/local_data.c
@@ -514,7 +514,7 @@ chassis_tunnels_destroy(struct hmap *chassis_tunnels)
  */
 struct chassis_tunnel *
 chassis_tunnel_find(const struct hmap *chassis_tunnels, const char *chassis_id,
-char *remote_encap_ip, char *local_encap_ip)
+char *remote_encap_ip, const char *local_encap_ip)
 {
 /*
  * If the specific encap_ip is given, look for the chassisid_ip entry,
diff --git a/controller/local_data.h b/controller/local_data.h
index bab95bcc3824..ca3905bd20e6 100644
--- a/controller/local_data.h
+++ b/controller/local_data.h
@@ -150,7 +150,7 @@ bool local_nonvif_data_handle_ovs_iface_changes(
 struct chassis_tunnel *chassis_tunnel_find(const struct hmap *chassis_tunnels,
const char *chassis_id,
char *remote_encap_ip,
-   char *local_encap_ip);
+   const char *local_encap_ip);
 
 bool get_chassis_tunnel_ofport(const struct hmap *chassis_tunnels,
const char *chassis_name,
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index efa65e3fd927..5ebef048d721 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -176,10 +176,32 @@
 
   external_ids:ovn-encap-ip
   
-The IP address that a chassis should use to connect to this node
-using encapsulation types specified by
-external_ids:ovn-encap-type. Multiple encapsulation IPs
-may be specified with a comma-separated list.
+
+  The IP address that a chassis should use to connect to this node
+  using encapsulation types specified by
+  external_ids:ovn-encap-type. Multiple encapsulation IPs
+  may be specified with a comma-separated list.
+
+
+  In scenarios where multiple encapsulation IPs are present, distinct
+  tunnels are established for each remote chassis. These tunnels are
+  differentiated by setting unique options:local_ip and
+  options:remote_ip values in the tunnel interface. When
+  transmitting a packet to a remote chassis, the selection of local_ip
+  is guided by the Interface:external_ids:encap-ip from
+  the local OVSDB, corresponding to the VIF originating the packet, if
+  specified. The Interface:external_ids:encap-ip setting
+  of the VIF is also populated to the Port_Binding
+  table in the OVN SB database via the encap column.
+  Consequently, when a remote chassis needs to send a packet to a
+  port-binding associated with this VIF, it utilizes the tunnel with
+  the a

[ovs-dev] [PATCH ovn 2/3] encaps: Create separate tunnels for multiple local encap IPs.

2024-01-16 Thread Han Zhou
In commit dd527a283cd8, it created separate tunnels for different remote
encap IPs of the same remote chassis, but when the current chassis is
the one that has multiple encap IPs configured, it only uses the first
encap IP. This patch creates separate tunnels taking into consideration
the multiple encap IPs in current chassis and sets corresponding
local_ip for each tunnel interface in such cases.
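
As an illustrative sketch (chassis names and IPs made up): if the local
chassis has ovn-encap-ip="10.0.0.1,10.0.0.2" and remote chassis hv2
advertises encap IP 10.0.0.3, two tunnel ports are created towards hv2,
identified by external_ids:ovn-chassis-id of the form
chassis@remote_encap_ip%local_encap_ip:

  hv2@10.0.0.3%10.0.0.1  ->  options:local_ip="10.0.0.1", options:remote_ip="10.0.0.3"
  hv2@10.0.0.3%10.0.0.2  ->  options:local_ip="10.0.0.2", options:remote_ip="10.0.0.3"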

Co-authored-by: Lei Huang 
Signed-off-by: Lei Huang 
Signed-off-by: Han Zhou 
---
 controller/bfd.c|   4 +-
 controller/encaps.c | 158 ++--
 controller/encaps.h |   9 ++-
 controller/lflow.c  |   2 +-
 controller/local_data.c |  14 ++--
 controller/local_data.h |   5 +-
 controller/physical.c   |  28 +++
 controller/pinctrl.c|   2 +-
 tests/ovn-ipsec.at  |  49 -
 tests/ovn.at|  88 +-
 10 files changed, 192 insertions(+), 167 deletions(-)

diff --git a/controller/bfd.c b/controller/bfd.c
index cf011e382c6c..f24bfd063888 100644
--- a/controller/bfd.c
+++ b/controller/bfd.c
@@ -75,7 +75,7 @@ bfd_calculate_active_tunnels(const struct ovsrec_bridge 
*br_int,
 char *chassis_name = NULL;
 
 if (encaps_tunnel_id_parse(id, _name,
-   NULL)) {
+   NULL, NULL)) {
 if (!sset_contains(active_tunnels,
chassis_name)) {
 sset_add(active_tunnels, chassis_name);
@@ -204,7 +204,7 @@ bfd_run(const struct ovsrec_interface_table 
*interface_table,
 
 sset_add(, port_name);
 
-if (encaps_tunnel_id_parse(tunnel_id, _name, NULL)) {
+if (encaps_tunnel_id_parse(tunnel_id, _name, NULL, NULL)) {
 if (sset_contains(_chassis, chassis_name)) {
 sset_add(_ifaces, port_name);
 }
diff --git a/controller/encaps.c b/controller/encaps.c
index 1f6e667a606c..28237f6191c8 100644
--- a/controller/encaps.c
+++ b/controller/encaps.c
@@ -32,10 +32,9 @@ VLOG_DEFINE_THIS_MODULE(encaps);
 /*
  * Given there could be multiple tunnels with different IPs to the same
  * chassis we annotate the external_ids:ovn-chassis-id in tunnel port with
- * OVN_MVTEP_CHASSISID_DELIM. The external_id key
+ * @%. The external_id key
  * "ovn-chassis-id" is kept for backward compatibility.
  */
-#defineOVN_MVTEP_CHASSISID_DELIM '@'
 #define OVN_TUNNEL_ID "ovn-chassis-id"
 
 static char *current_br_int_name = NULL;
@@ -95,72 +94,93 @@ tunnel_create_name(struct tunnel_ctx *tc, const char 
*chassis_id)
 }
 
 /*
- * Returns a tunnel-id of the form 'chassis_id'-delimiter-'encap_ip'.
+ * Returns a tunnel-id of the form chassis_id@remote_encap_ip%local_encap_ip.
  */
 char *
-encaps_tunnel_id_create(const char *chassis_id, const char *encap_ip)
+encaps_tunnel_id_create(const char *chassis_id, const char *remote_encap_ip,
+const char *local_encap_ip)
 {
-return xasprintf("%s%c%s", chassis_id, OVN_MVTEP_CHASSISID_DELIM,
- encap_ip);
+return xasprintf("%s%c%s%c%s", chassis_id, '@', remote_encap_ip,
+ '%', local_encap_ip);
 }
 
 /*
- * Parses a 'tunnel_id' of the form .
+ * Parses a 'tunnel_id' of the form @%.
  * If the 'chassis_id' argument is not NULL the function will allocate memory
  * and store the chassis_name part of the tunnel-id at '*chassis_id'.
- * If the 'encap_ip' argument is not NULL the function will allocate memory
- * and store the encapsulation IP part of the tunnel-id at '*encap_ip'.
+ * Same for remote_encap_ip and local_encap_ip.
  */
 bool
 encaps_tunnel_id_parse(const char *tunnel_id, char **chassis_id,
-   char **encap_ip)
+   char **remote_encap_ip, char **local_encap_ip)
 {
-/* Find the delimiter.  Fail if there is no delimiter or if 
- * or  is the empty string.*/
-const char *d = strchr(tunnel_id, OVN_MVTEP_CHASSISID_DELIM);
+/* Find the @.  Fail if there is no @ or if any part is empty. */
+const char *d = strchr(tunnel_id, '@');
 if (d == tunnel_id || !d || !d[1]) {
 return false;
 }
 
+/* Find the %.  Fail if there is no % or if any part is empty. */
+const char *d2 = strchr(d + 1, '%');
+if (d2 == d + 1 || !d2 || !d2[1]) {
+return false;
+}
+
 if (chassis_id) {
 *chassis_id = xmemdup0(tunnel_id, d - tunnel_id);
 }
-if (encap_ip) {
-*encap_ip = xstrdup(d + 1);
+
+if (remote_encap_ip) {
+*remote_encap_ip = xmemdup0(d + 1, d2 - (d + 1));
+}
+
+if (local_encap_ip) {
+*local_encap_ip = xstrdup(d2 + 1);
 }
 return true;
 }
 
 /*
- * Returns true if 'tunnel_id' contains 'chassis_id' and, if specified, the
- * g

[ovs-dev] [PATCH ovn 1/3] encaps: Refactor the naming related to tunnels.

2024-01-16 Thread Han Zhou
Rename vars and structs to reflect the fact that there can be multiple
tunnels for each individual chassis. Also update the documentation of
external_ids:ovn-encap-ip.

Signed-off-by: Han Zhou 
---
 controller/encaps.c | 77 -
 controller/ovn-controller.8.xml |  3 +-
 2 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/controller/encaps.c b/controller/encaps.c
index b69d725843e9..1f6e667a606c 100644
--- a/controller/encaps.c
+++ b/controller/encaps.c
@@ -31,10 +31,12 @@ VLOG_DEFINE_THIS_MODULE(encaps);
 
 /*
  * Given there could be multiple tunnels with different IPs to the same
- * chassis we annotate the ovn-chassis-id with
- * OVN_MVTEP_CHASSISID_DELIM.
+ * chassis we annotate the external_ids:ovn-chassis-id in tunnel port with
+ * OVN_MVTEP_CHASSISID_DELIM. The external_id key
+ * "ovn-chassis-id" is kept for backward compatibility.
  */
 #defineOVN_MVTEP_CHASSISID_DELIM '@'
+#define OVN_TUNNEL_ID "ovn-chassis-id"
 
 static char *current_br_int_name = NULL;
 
@@ -55,8 +57,9 @@ encaps_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 
 /* Enough context to create a new tunnel, using tunnel_add(). */
 struct tunnel_ctx {
-/* Maps from a chassis name to "struct chassis_node *". */
-struct shash chassis;
+/* Maps from a tunnel-id (stored in external_ids:ovn-chassis-id) to
+ * "struct tunnel_node *". */
+struct shash tunnel;
 
 /* Names of all ports in the bridge, to allow checking uniqueness when
  * adding a new tunnel. */
@@ -68,7 +71,7 @@ struct tunnel_ctx {
 const struct sbrec_chassis *this_chassis;
 };
 
-struct chassis_node {
+struct tunnel_node {
 const struct ovsrec_port *port;
 const struct ovsrec_bridge *bridge;
 };
@@ -104,7 +107,7 @@ encaps_tunnel_id_create(const char *chassis_id, const char 
*encap_ip)
 /*
  * Parses a 'tunnel_id' of the form .
  * If the 'chassis_id' argument is not NULL the function will allocate memory
- * and store the chassis-id part of the tunnel-id at '*chassis_id'.
+ * and store the chassis_name part of the tunnel-id at '*chassis_id'.
  * If the 'encap_ip' argument is not NULL the function will allocate memory
  * and store the encapsulation IP part of the tunnel-id at '*encap_ip'.
  */
@@ -169,7 +172,7 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 
 /*
  * Since a chassis may have multiple encap-ip, we can't just add the
- * chassis name as as the "ovn-chassis-id" for the port; we use the
+ * chassis name as the OVN_TUNNEL_ID for the port; we use the
  * combination of the chassis_name and the encap-ip to identify
  * a specific tunnel to the chassis.
  */
@@ -260,25 +263,25 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 }
 }
 
-/* If there's an existing chassis record that does not need any change,
+/* If there's an existing tunnel record that does not need any change,
  * keep it.  Otherwise, create a new record (if there was an existing
  * record, the new record will supplant it and encaps_run() will delete
  * it). */
-struct chassis_node *chassis = shash_find_data(>chassis,
-   tunnel_entry_id);
-if (chassis
-&& chassis->port->n_interfaces == 1
-&& !strcmp(chassis->port->interfaces[0]->type, encap->type)
-&& smap_equal(>port->interfaces[0]->options, )) {
-shash_find_and_delete(>chassis, tunnel_entry_id);
-free(chassis);
+struct tunnel_node *tunnel = shash_find_data(>tunnel,
+ tunnel_entry_id);
+if (tunnel
+&& tunnel->port->n_interfaces == 1
+&& !strcmp(tunnel->port->interfaces[0]->type, encap->type)
+&& smap_equal(>port->interfaces[0]->options, )) {
+shash_find_and_delete(>tunnel, tunnel_entry_id);
+free(tunnel);
 goto exit;
 }
 
 /* Choose a name for the new port.  If we're replacing an old port, reuse
  * its name, otherwise generate a new, unique name. */
-char *port_name = (chassis
-   ? xstrdup(chassis->port->name)
+char *port_name = (tunnel
+   ? xstrdup(tunnel->port->name)
: tunnel_create_name(tc, new_chassis_id));
 if (!port_name) {
 VLOG_WARN("Unable to allocate unique name for '%s' tunnel",
@@ -294,7 +297,7 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 struct ovsrec_port *port = ovsrec_port_insert(tc->ovs_txn);
 ovsrec_port_set_name(port, port_name);
 ovsrec_port_set_interfaces(port, , 1);
-const struct smap id = SMAP_CONST1(, "ovn-chassis-id", tunnel_entry_id);
+const struct smap id = SMAP_CONST1(, OVN

[ovs-dev] [PATCH ovn 0/3] Support VIF-based local encap IPs selection.

2024-01-16 Thread Han Zhou
Han Zhou (3):
  encaps: Refactor the naming related to tunnels.
  encaps: Create separate tunnels for multiple local encap IPs.
  ovn-controller: Support VIF-based local encap IPs selection.

 NEWS|   3 +
 controller/bfd.c|   4 +-
 controller/chassis.c|   2 +-
 controller/encaps.c | 233 +---
 controller/encaps.h |   9 +-
 controller/lflow.c  |   2 +-
 controller/local_data.c |  14 +-
 controller/local_data.h |   5 +-
 controller/ovn-controller.8.xml |  29 +++-
 controller/ovn-controller.c |  49 +++
 controller/physical.c   | 150 +---
 controller/physical.h   |   2 +
 controller/pinctrl.c|   2 +-
 include/ovn/logical-fields.h|   4 +-
 ovn-architecture.7.xml  |  18 ++-
 tests/ovn-ipsec.at  |  49 ---
 tests/ovn.at| 137 ---
 17 files changed, 463 insertions(+), 249 deletions(-)

-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] pinctrl: Directly retrieve desired port_binding MAC.

2024-01-10 Thread Han Zhou
On Wed, Jan 10, 2024 at 11:26 AM Mark Michelson  wrote:
>
> A static analyzer determined that if pb->n_mac was 0, then the c_addrs
> lport_addresses struct would never be initialized. We would then use
> and attempt to free uninitialized memory.
>
> In reality, pb->n_mac should always be 1. This is because the port
binding is a
> representation of a northbound logical router port. Logical router ports
do
> not contain an array of MACs like the southbound port bindings do.
Instead,
> they have a single MAC string that is always guaranteed to be non-NULL.
This
> string is copied into the port binding's MAC array. Therefore, a
southbound
> port binding that comes from a logical router port will always have n_mac
set
> to 1.
>
> How do we know this is a logical router port? The ports iterated over in
this
> function must have IPv6 prefix delegation configured on them. Only
northbound
> logical router ports have this option available.
>
> To silence the static analyzer, this change directly retrieves pb->mac[0]
> instead of iterating over the pb->mac array. As a safeguard, we ensure
> that the port binding has only one MAC before attempting to access it.
> This is based on the off chance that something other than northd has
> inserted the port binding into the southbound database.
>
> Reported-at: https://issues.redhat.com/browse/FDP-224
>
> Signed-off-by: Mark Michelson 
> ---
>  controller/pinctrl.c | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index 12055a675..49a56cf81 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -1286,11 +1286,25 @@ fill_ipv6_prefix_state(struct ovsdb_idl_txn
*ovnsb_idl_txn,
>  continue;
>  }
>
> +/* To reach this point, the port binding must be a logical router
> + * port. LRPs are configured with a single MAC that is always
non-NULL.
> + * Therefore, as long as we are working with a port_binding that
was
> + * inserted into the southbound database by northd, we can always
> + * safely extract pb->mac[0] since it will be non-NULL.
> + *
> + * However, if a port_binding was inserted by someone else, then
we
> + * need to double-check our assumption first.
> + */
> +if (pb->n_mac != 1) {
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
5);
> +VLOG_ERR_RL(, "Port binding "UUID_FMT" has %"PRIuSIZE"
MACs "
> +"instead of 1", UUID_ARGS(>header_.uuid),
> +pb->n_mac);
> +continue;
> +}
>  struct lport_addresses c_addrs;
> -for (size_t j = 0; j < pb->n_mac; j++) {
> -if (extract_lsp_addresses(pb->mac[j], _addrs)) {
> -        break;
> -}
> +if (!extract_lsp_addresses(pb->mac[0], _addrs)) {
> +continue;
>  }
>
>  pfd = shash_find_data(_prefixd, pb->logical_port);
> --
> 2.40.1
>

Thanks Mark.

Acked-by: Han Zhou 

> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] pinctrl: Directly retrieve desired port_binding MAC.

2024-01-09 Thread Han Zhou
On Tue, Jan 9, 2024 at 10:32 AM Mark Michelson  wrote:
>
> A static analyzer determined that if pb->n_mac was 0, then the c_addrs
> lport_addresses struct would never be initialized. We would then use
> and attempt to free uninitialized memory.
>
> In reality, pb->n_mac will always be 1. This is because the port binding
is a
> representation of a northbound logical router port. Logical router ports
do
> not contain an array of MACs like the southbound port bindings do.
Instead,
> they have a single MAC string that is always guaranteed to be non-NULL.
This
> string is copied into the port binding's MAC array. Therefore, a
southbound
> port binding that comes from a logical router port will always have n_mac
set
> to 1.
>
> How do we know this is a logical router port? The ports iterated over in
this
> function must have IPv6 prefix delegation configured on them. Only
northbound
> logical router ports have this option available.
>
> To silence the static analyzer, this change directly retrieves pb->mac[0]
> instead of iterating over the pb->mac array. It also adds an assertion
that
> pb->n_mac == 1. This arguably makes the code's intent more clear as
> well.
>
> Reported-at: https://issues.redhat.com/browse/FDP-224
>
> Signed-off-by: Mark Michelson 
> ---
>  controller/pinctrl.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index 12055a675..d5957ad69 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -1286,11 +1286,15 @@ fill_ipv6_prefix_state(struct ovsdb_idl_txn
*ovnsb_idl_txn,
>  continue;
>  }
>
> +/* To reach this point, the port binding must be a logical router
> + * port. LRPs are configured with a single MAC that is always
non-NULL.
> + * Therefore, we can always safely extract pb->mac[0] since it
will be
> + * non-NULL
> + */
> +ovs_assert(pb->n_mac == 1);

Thanks Mark for the fix. I think we shouldn't use assert here, since the DB
state is not controlled by ovn-controller, and it is not good to crash
ovn-controller when the mac column is cleared by an external tool for
whatever reason. Assertions should be reserved for assumptions we are 100%
sure hold from the developer's point of view, i.e. ones that can only be
violated by a bug in the current component. In this case I think it is better
to print an error and skip, unless we want to do more complex error handling.

Regards,
Han

>  struct lport_addresses c_addrs;
> -for (size_t j = 0; j < pb->n_mac; j++) {
> -if (extract_lsp_addresses(pb->mac[j], _addrs)) {
> -break;
> -}
> +if (!extract_lsp_addresses(pb->mac[0], _addrs)) {
> +continue;
>  }
>
>  pfd = shash_find_data(_prefixd, pb->logical_port);
> --
> 2.40.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 16/16] northd: Add northd change handler for sync_to_sb_lb node.

2024-01-08 Thread Han Zhou
On Mon, Jan 8, 2024 at 3:52 PM Numan Siddique  wrote:
>
> On Fri, Jan 5, 2024 at 11:36 AM Numan Siddique  wrote:
> >
> > On Tue, Dec 19, 2023 at 2:37 AM Han Zhou  wrote:
> > >
> > > On Mon, Nov 27, 2023 at 6:39 PM  wrote:
> > > >
> > > > From: Numan Siddique 
> > > >
> > > > Any changes to northd engine node due to load balancers
> > > > are now handled in 'sync_to_sb_lb' node to sync the changed
> > > > load balancers to SB load balancers.
> > > >
> > > > The logic to sync the SB load balancers is changed a bit and it
> > > > now mimics the SB lflow sync.
> > > >
> > > > Below are the scale testing results done with all the patches
applied
> > > > in this series using ovn-heater.  The test ran the scenario  -
> > > > ocp-500-density-heavy.yml [1].
> > > >
> > > > The results are:
> > > >
> > > >
> > >
---
> > > >                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)   Total (s)   Count  Failed
> > > > -----------------------------------------------------------------------------------------------------------------
> > > > Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017   83.127099   125    0
> > > > Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211   0.776373    125    0
> > > > WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973   11.493259   250    0
> > > > WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896   66.724094   250    0
> > > > -----------------------------------------------------------------------------------------------------------------
> > > >
> > > > The results with the present main [2] are:
> > > >
> > > >
> > >
---
> > > >                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)   Total (s)    Count  Failed
> > > > ------------------------------------------------------------------------------------------------------------------
> > > > Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172   216.146495   125    0
> > > > Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292   0.786532     125    0
> > > > WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117   11.529311    250    0
> > > > WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544   197.886026   250    0
> > > > ------------------------------------------------------------------------------------------------------------------
> > > >
> > > > [1] -
> > >
https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> > > > [2] - 2a12cda890a7("controller, northd: Wait for cleanup before
replying
> > > to exit")
> > > >
> > > > Signed-off-by: Numan Siddique 
> > > > ---
> > > >  northd/en-sync-sb.c  | 445
+--
> > > >  northd/inc-proc-northd.c |   1 +
> > > >  northd/lflow-mgr.c   | 196 ++---
> > > >  northd/lflow-mgr.h   |  23 +-
> > > >  northd/northd.c  | 238 -
> > > >  northd/northd.h  |   6 -
> > > >  tests/ovn-northd.at  | 103 +
> > > >  7 files changed, 585 insertions(+), 427 deletions(-)
> > > >
> > > > dif

Re: [ovs-dev] [PATCH ovn v2] northd: Support CIDR-based MAC binding aging threshold.

2023-12-29 Thread Han Zhou
On Thu, Dec 21, 2023 at 11:30 AM Mark Michelson  wrote:
>
> Thanks for the updates, Han, and apologies for the long time it took
> before someone reviewed this.
>
> Acked-by: Mark Michelson 
>
> I think there is potential for optimization here, specifically
> incremental processing of mac binding age thresholds and caching the
> minimum aging threshold between northd runs. However, I suspect the
> number of CIDRs specified for a given datapath will be low. Therefore, I
> think we can hold off on optimizations until a use case with large
> numbers of CIDRs shows itself.
>

Thanks Mark. I applied it to main.

Regards,
Han
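
As a concrete illustration of the CIDR-based thresholds in the patch quoted
below (hypothetical values, assuming the comma-separated "[CIDR:]THRESHOLD"
entry format that the parser describes):

  ovn-nbctl set Logical_Router lr0 \
      options:mac_binding_age_threshold="300,10.0.0.0/8:0,10.0.10.0/24:60"

This would age out MAC bindings after 300 seconds by default, disable aging
for 10.0.0.0/8 (threshold 0), and age the 10.0.10.0/24 subset of that range
after 60 seconds, with the longest prefix match winning for overlapping
ranges.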

> On 11/7/23 23:36, Han Zhou wrote:
> > Enhance MAC_Binding aging to allow CIDR-based threshold configurations.
> > This enables distinct threshold settings for different IP ranges,
> > applying the longest prefix matching for overlapping ranges.
> >
> > A common use case involves setting a default threshold for all IPs,
while
> > disabling aging for a specific range and potentially excluding a subset
> > within that range.
> >
> > Signed-off-by: Han Zhou 
> > ---
> > v2: Addressed comments from Ilya, Ales and Dumitru:
> >- Add NEWS.
> >- Use strtok_r in parse_aging_threshold.
> >- Do not call parse_aging_threshold in en_fdb_aging_run.
> >- Use size_t for array size.
> >- Add non-masked ipv6 in test case.
> >
> >   NEWS|   2 +
> >   northd/aging.c  | 291 +---
> >   northd/aging.h  |   3 +
> >   northd/northd.c |  11 +-
> >   ovn-nb.xml  |  63 ++-
> >   tests/ovn.at|  60 ++
> >   6 files changed, 409 insertions(+), 21 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 30f6edb282ca..74f0303280ca 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -5,6 +5,8 @@ Post v23.09.0
> >   connection method and doesn't require additional probing.
> >   external_ids:ovn-openflow-probe-interval configuration option for
> >   ovn-controller no longer matters and is ignored.
> > +  - Support CIDR based MAC binding aging threshold. See ovn-nb(5) for
> > +'mac_binding_age_threshold' for more details.
> >
> >   OVN v23.09.0 - 15 Sep 2023
> >   --
> > diff --git a/northd/aging.c b/northd/aging.c
> > index f626c72c8ca3..cdf5f4464e10 100644
> > --- a/northd/aging.c
> > +++ b/northd/aging.c
> > @@ -47,12 +47,250 @@ aging_waker_schedule_next_wake(struct aging_waker
*waker, int64_t next_wake_ms)
> >   }
> >   }
> >
> > +struct threshold_entry {
> > +union {
> > +ovs_be32 ipv4;
> > +struct in6_addr ipv6;
> > +} prefix;
> > +bool is_v4;
> > +unsigned int plen;
> > +unsigned int threshold;
> > +};
> > +
> > +/* Contains CIDR-based aging threshold configuration parsed from
> > + * "Logical_Router:options:mac_binding_age_threshold".
> > + *
> > + * This struct is also used for non-CIDR-based threshold, e.g. the
ones from
> > + * "NB_Global:other_config:fdb_age_threshold" for the common
aging_context
> > + * interface.
> > + *
> > + * - The arrays `v4_entries` and `v6_entries` are populated with
parsed entries
> > + *   for IPv4 and IPv6 CIDRs, respectively, along with their associated
> > + *   thresholds.  Entries within these arrays are sorted by prefix
length,
> > + *   starting with the longest.
> > + *
> > + * - If a threshold is provided without an accompanying prefix, it's
captured
> > + *   in `default_threshold`.  In cases with multiple unprefixed
thresholds,
> > + *   `default_threshold` will only store the last one.  */
> > +struct threshold_config {
> > +struct threshold_entry *v4_entries;
> > +size_t n_v4_entries;
> > +struct threshold_entry *v6_entries;
> > +size_t n_v6_entries;
> > +unsigned int default_threshold;
> > +};
> > +
> > +static int
> > +compare_entries_by_prefix_length(const void *a, const void *b)
> > +{
> > +const struct threshold_entry *entry_a = a;
> > +const struct threshold_entry *entry_b = b;
> > +
> > +return entry_b->plen - entry_a->plen;
> > +}
> > +
> > +/* Parse an ENTRY in the threshold option, with the format:
> > + * [CIDR:]THRESHOLD
> > + *
> > + * Returns true if successful, false if failed. */
> > +static bool
> > +parse_threshold_entry(const char *str, struct threshold_entry *entry)
> > +{
> > +char *colon_ptr;
> > 

Re: [ovs-dev] [PATCH ovn v3 16/16] northd: Add northd change handler for sync_to_sb_lb node.

2023-12-18 Thread Han Zhou
On Mon, Nov 27, 2023 at 6:39 PM  wrote:
>
> From: Numan Siddique 
>
> Any changes to northd engine node due to load balancers
> are now handled in 'sync_to_sb_lb' node to sync the changed
> load balancers to SB load balancers.
>
> The logic to sync the SB load balancers is changed a bit and it
> now mimics the SB lflow sync.
>
> Below are the scale testing results done with all the patches applied
> in this series using ovn-heater.  The test ran the scenario  -
> ocp-500-density-heavy.yml [1].
>
> The results are:
>
>
---
>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)   Total (s)   Count  Failed
> -----------------------------------------------------------------------------------------------------------------
> Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017   83.127099   125    0
> Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211   0.776373    125    0
> WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973   11.493259   250    0
> WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896   66.724094   250    0
> -----------------------------------------------------------------------------------------------------------------
>
> The results with the present main [2] are:
>
>
---
>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)   Total (s)    Count  Failed
> ------------------------------------------------------------------------------------------------------------------
> Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172   216.146495   125    0
> Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292   0.786532     125    0
> WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117   11.529311    250    0
> WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544   197.886026   250    0
> ------------------------------------------------------------------------------------------------------------------
>
> [1] -
https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> [2] - 2a12cda890a7("controller, northd: Wait for cleanup before replying
to exit")
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/en-sync-sb.c  | 445 +--
>  northd/inc-proc-northd.c |   1 +
>  northd/lflow-mgr.c   | 196 ++---
>  northd/lflow-mgr.h   |  23 +-
>  northd/northd.c  | 238 -
>  northd/northd.h  |   6 -
>  tests/ovn-northd.at  | 103 +
>  7 files changed, 585 insertions(+), 427 deletions(-)
>
> diff --git a/northd/en-sync-sb.c b/northd/en-sync-sb.c
> index 73b30272c4..62c5dbd20f 100644
> --- a/northd/en-sync-sb.c
> +++ b/northd/en-sync-sb.c
> @@ -30,6 +30,7 @@
>  #include "lib/ovn-nb-idl.h"
>  #include "lib/ovn-sb-idl.h"
>  #include "lib/ovn-util.h"
> +#include "lflow-mgr.h"
>  #include "northd.h"
>
>  #include "openvswitch/vlog.h"
> @@ -53,6 +54,38 @@ static void build_port_group_address_set(const struct
nbrec_port_group *,
>   struct svec *ipv4_addrs,
>   struct svec *ipv6_addrs);
>
> +struct sb_lb_table;
> +struct sb_lb_record;
> +static void sb_lb_table_init(struct sb_lb_table *);
> +static void sb_lb_table_clear(struct sb_lb_table *);
> +static struct sb_lb_record *sb_lb_table_find(struct hmap *sb_lbs,
> + const struct uuid *);
> +static void sb_lb_table_build_and_sync(struct sb_lb_table *,
> +struct ovsdb_idl_txn *ovnsb_txn,
> +const struct sbrec_load_balancer_table *,
> +const struct
sbrec_logical_dp_group_table *,
> +struct hmap *lb_dps_map,
> +struct ovn_datapaths *ls_datapaths,
> +struct ovn_datapaths *lr_datapaths,
> +struct chassis_features *);
> +static void 

Re: [ovs-dev] [PATCH ovn v3 15/16] northd: Add a noop handler for northd SB mac binding.

2023-12-18 Thread Han Zhou
On Mon, Dec 18, 2023 at 12:22 PM Numan Siddique  wrote:
>
> On Wed, Dec 13, 2023 at 9:57 AM Dumitru Ceara  wrote:
> >
> > On 11/28/23 03:38, num...@ovn.org wrote:
> > > From: Numan Siddique 
> > >
> > > Signed-off-by: Numan Siddique 
> > > ---
> > >  northd/inc-proc-northd.c | 9 -
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
> > > index 7b1c6597e2..28f397ff39 100644
> > > --- a/northd/inc-proc-northd.c
> > > +++ b/northd/inc-proc-northd.c
> > > @@ -185,7 +185,6 @@ void inc_proc_northd_init(struct ovsdb_idl_loop
*nb,
> > >  engine_add_input(_northd, _sb_mirror, NULL);
> > >  engine_add_input(_northd, _sb_meter, NULL);
> > >  engine_add_input(_northd, _sb_datapath_binding, NULL);
> > > -engine_add_input(_northd, _sb_mac_binding, NULL);
> > >  engine_add_input(_northd, _sb_dns, NULL);
> > >  engine_add_input(_northd, _sb_ha_chassis_group, NULL);
> > >  engine_add_input(_northd, _sb_ip_multicast, NULL);
> > > @@ -196,6 +195,14 @@ void inc_proc_northd_init(struct ovsdb_idl_loop
*nb,
> > >  engine_add_input(_northd, _global_config,
> > >   northd_global_config_handler);
> > >
> > > +/* northd engine node uses the sb mac binding table to
> > > + * cleanup mac binding entries for deleted logical ports
> > > + * and datapaths. Any update to to SB mac binding doesn't
> > > + * change the northd engine node state or data.  Hence
> > > + * it is ok to add a noop_handler here. */
> > > +engine_add_input(_northd, _sb_mac_binding,
> > > + engine_noop_handler);
> > > +
> >
> > Isn't this just a case of "ovn-northd" is not really interested in
> > change tracking for SB.MAC_Binding?  Can't we instead just disable
> > alerting, ovsdb_idl_omit_alert(..), for all SBREC_MAC_BINDING columns
> > like we do for other SB tables (lflow, multicast_group, meter,
> > portt_group, logical_dp_group)?
>
> I thought about that.  But mac_binding_ageing engine node also depends
> on SB mac_binding and it results in full recompute (no handler for it).
>
This is a reasonable consideration. In this case I would capture it in the
comment that explains why the noop handler is used.

> If @Ales Musil  can confirm that the mac_binding_ageing node doesn't need
> to handle SB mac_binding changes,  then I'm fine with your suggestion.
>
I tend to believe that mac_binding_aging node doesn't really need to handle
mac_binding changes. The aging node processing should be triggered solely
by the aging timer. @Ales Musil  may help confirm.

Thanks,
Han

> Thanks
> Numan
>
> >
> > >  engine_add_input(_northd, _sb_port_binding,
> > >   northd_sb_port_binding_handler);
> > >  engine_add_input(_northd, _nb_logical_switch,
> >
> > Regards,
> > Dumitru
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 14/16] northd: Add I-P for NB_Global and SB_Global.

2023-12-18 Thread Han Zhou
On Wed, Dec 13, 2023 at 6:47 AM Dumitru Ceara  wrote:
>
> On 11/28/23 03:37, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > A new engine node "global_config" is added which handles the changes
> > to NB_Global and SB_Global tables.  It also creates these rows if
> > not present.
> >
> > Without the I-P, any changes to the options column of these tables
> > result in recompute of 'northd' and 'lflow' engine nodes.

This is not true. Common updates to NB_Global and SB_Global, such as nb_cfg
and timestamp changes, or external_ids changes populated by ovn-k8s, will
not trigger a recompute.
Could you be more specific about which recomputes are avoided by this patch?
I can see, for example, that the IPSec option change is handled with I-P, but
such changes are really rare. It seems most other option changes and chassis
feature changes would still trigger a recompute with this patch.

> >
> > Signed-off-by: Numan Siddique 
> > ---
> >  northd/aging.c|  21 +-
> >  northd/automake.mk|   2 +
> >  northd/en-global-config.c | 588 ++
> >  northd/en-global-config.h |  65 +
> >  northd/en-lflow.c |  11 +-
> >  northd/en-northd.c|  52 ++--
> >  northd/en-northd.h|   2 +-
> >  northd/en-sync-sb.c   |  22 +-
> >  northd/inc-proc-northd.c  |  38 ++-
> >  northd/northd.c   | 230 +++
> >  northd/northd.h   |  24 +-
> >  tests/ovn-northd.at   | 256 +++--
> >  12 files changed, 1014 insertions(+), 297 deletions(-)
>
> I think most of the options values we interpret in the the NB_Global and
> SB_Global tables don't usually change (or don't change often).
>
> Doesn't it make sense to not have "proper I-P" for these tables and
> instead enhance northd_nb_nb_global_handler() to:
>
> - if one of the well known NB_Global/SB_Global options change trigger a
> recompute of the northd node
> - if one of the other options change then do a plain NB -> SB options sync
>
> I hope that can be done in way less than "1014 insertions(+), 297
> deletions(-)".
>
> What do you think?
>

+1. And I am even questioning "well known NB_Global/SB_Global options
change". Are there really such changes that need to be done frequently in
production?

Besides, I have a comment on the function check_nb_options_out_of_sync():
why not simply check all options by comparing the two SMAPs? Otherwise it
would be easy to miss the check for a newly added option. The function name
also suggests it checks all options rather than a selected few, and the
criteria for that selection are not clear.

Thanks,
Han
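
For illustration only, the wholesale comparison suggested above might be
sketched as follows (using the usual generated IDL setter name; this is an
assumption, not the actual patch):

    /* Sync all NB_Global options to SB_Global when anything differs, instead
     * of checking a hand-picked subset of keys. */
    if (!smap_equal(&nb->options, &sb->options)) {
        sbrec_sb_global_set_options(sb, &nb->options);
    }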


> Thanks,
> Dumitru
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 12/16] northd: Add lr_stateful handler for lflow engine node.

2023-12-18 Thread Han Zhou
On Mon, Nov 27, 2023 at 6:38 PM  wrote:
>
> From: Numan Siddique 
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/en-lflow.c|  29 +++
>  northd/en-lflow.h|   1 +
>  northd/en-lr-stateful.c  |  39 -
>  northd/en-lr-stateful.h  |  26 +++
>  northd/inc-proc-northd.c |   3 +-
>  northd/northd.c  | 370 ++-
>  northd/northd.h  |  10 ++
>  tests/ovn-northd.at  |  48 ++---
>  8 files changed, 336 insertions(+), 190 deletions(-)
>
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index 13284b5556..09748f570b 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -164,6 +164,35 @@ lflow_port_group_handler(struct engine_node *node,
void *data OVS_UNUSED)
>  return true;
>  }
>
> +bool
> +lflow_lr_stateful_handler(struct engine_node *node, void *data)
> +{
> +struct ed_type_lr_stateful *lr_sful_data =
> +engine_get_input_data("lr_stateful", node);
> +
> +if (!lr_stateful_has_tracked_data(_sful_data->trk_data)
> +|| lr_sful_data->trk_data.vip_nats_changed) {
> +return false;
> +}
> +
> +const struct engine_context *eng_ctx = engine_get_context();
> +struct lflow_data *lflow_data = data;
> +
> +struct lflow_input lflow_input;
> +lflow_get_input_data(node, _input);
> +
> +if (!lflow_handle_lr_stateful_changes(eng_ctx->ovnsb_idl_txn,
> +  _sful_data->trk_data,
> +  _input,
> +  lflow_data->lflow_table)) {
> +return false;
> +}
> +
> +engine_set_node_state(node, EN_UPDATED);
> +
> +return true;
> +}
> +
>  void *en_lflow_init(struct engine_node *node OVS_UNUSED,
>   struct engine_arg *arg OVS_UNUSED)
>  {
> diff --git a/northd/en-lflow.h b/northd/en-lflow.h
> index f7325c56b1..1d813a2a29 100644
> --- a/northd/en-lflow.h
> +++ b/northd/en-lflow.h
> @@ -20,5 +20,6 @@ void *en_lflow_init(struct engine_node *node, struct
engine_arg *arg);
>  void en_lflow_cleanup(void *data);
>  bool lflow_northd_handler(struct engine_node *, void *data);
>  bool lflow_port_group_handler(struct engine_node *, void *data);
> +bool lflow_lr_stateful_handler(struct engine_node *, void *data);
>
>  #endif /* EN_LFLOW_H */
> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
> index a54749ad93..8e025f057e 100644
> --- a/northd/en-lr-stateful.c
> +++ b/northd/en-lr-stateful.c
> @@ -39,6 +39,7 @@
>  #include "lib/ovn-sb-idl.h"
>  #include "lib/ovn-util.h"
>  #include "lib/stopwatch-names.h"
> +#include "lflow-mgr.h"
>  #include "northd.h"
>
>  VLOG_DEFINE_THIS_MODULE(en_lr_stateful);
> @@ -81,7 +82,7 @@ static void remove_lrouter_lb_reachable_ips(struct
lr_stateful_record *,
>  enum
lb_neighbor_responder_mode,
>  const struct sset *lb_ips_v4,
>  const struct sset
*lb_ips_v6);
> -static void lr_stateful_build_vip_nats(struct lr_stateful_record *);
> +static bool lr_stateful_build_vip_nats(struct lr_stateful_record *);
>
>  /* 'lr_stateful' engine node manages the NB logical router LB data.
>   */
> @@ -110,6 +111,7 @@ en_lr_stateful_clear_tracked_data(void *data_)
>  struct ed_type_lr_stateful *data = (struct ed_type_lr_stateful *)
data_;
>
>  hmapx_clear(>trk_data.crupdated);
> +data->trk_data.vip_nats_changed = false;
>  }
>
>  void
> @@ -190,6 +192,10 @@ lr_stateful_lb_data_handler(struct engine_node
*node, void *data_)
>
>  /* Add the lr_sful_rec rec to the tracking data. */
>  hmapx_add(>trk_data.crupdated, lr_sful_rec);
> +
> +if (!sset_is_empty(_sful_rec->vip_nats)) {
> +data->trk_data.vip_nats_changed = true;
> +}
>  continue;
>  }
>
> @@ -298,7 +304,9 @@ lr_stateful_lb_data_handler(struct engine_node *node,
void *data_)
>   * vip nats. */
>  HMAPX_FOR_EACH (hmapx_node, >trk_data.crupdated) {
>  lr_sful_rec = hmapx_node->data;
> -lr_stateful_build_vip_nats(lr_sful_rec);
> +if (lr_stateful_build_vip_nats(lr_sful_rec)) {
> +data->trk_data.vip_nats_changed = true;
> +}
>  lr_sful_rec->has_lb_vip = od_has_lb_vip(lr_sful_rec->od);
>  }
>
> @@ -335,8 +343,13 @@ lr_stateful_lr_nat_handler(struct engine_node *node,
void *data_)
>  lrnat_rec,
>  input_data.lb_datapaths_map,
>
 input_data.lbgrp_datapaths_map);
> +if (!sset_is_empty(_sful_rec->vip_nats)) {
> +data->trk_data.vip_nats_changed = true;
> +}
>  } else {
> -lr_stateful_build_vip_nats(lr_sful_rec);
> +if (lr_stateful_build_vip_nats(lr_sful_rec)) {
> +

Re: [ovs-dev] [PATCH ovn v3 11/16] northd: Handle lb changes in lflow engine.

2023-12-17 Thread Han Zhou
On Mon, Nov 27, 2023 at 6:38 PM  wrote:
>
> From: Numan Siddique 
>
> Since northd tracked data has the changed lb data, northd
> engine handler for lflow engine now handles the lb changes
> incrementally.  All the lflows generated for each lb is
> stored in the ovn_lb_datapaths->lflow_ref and this is used
> similar to how we handle ovn_port changes.
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/en-lflow.c   | 11 +++---
>  northd/lflow-mgr.c  | 41 ++-
>  northd/lflow-mgr.h  |  3 ++
>  northd/northd.c | 95 +
>  northd/northd.h | 28 +
>  tests/ovn-northd.at | 30 ++
>  6 files changed, 186 insertions(+), 22 deletions(-)
>
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index 65d2f45ebc..13284b5556 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -125,11 +125,6 @@ lflow_northd_handler(struct engine_node *node,
>  return false;
>  }
>
> -/* Fall back to recompute if load balancers have changed. */
> -if (northd_has_lbs_in_tracked_data(_data->trk_data)) {
> -return false;
> -}
> -
>  const struct engine_context *eng_ctx = engine_get_context();
>  struct lflow_data *lflow_data = data;
>
> @@ -142,6 +137,12 @@ lflow_northd_handler(struct engine_node *node,
>  return false;
>  }
>
> +if (!lflow_handle_northd_lb_changes(eng_ctx->ovnsb_idl_txn,
> +_data->trk_data.trk_lbs,
> +_input, lflow_data->lflow_table)) {
> +return false;
> +}
> +
>  engine_set_node_state(node, EN_UPDATED);
>  return true;
>  }
> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> index 08962e9172..d779e7e087 100644
> --- a/northd/lflow-mgr.c
> +++ b/northd/lflow-mgr.c
> @@ -105,6 +105,10 @@ static void ovn_dp_group_add_with_reference(struct
ovn_lflow *,
>  size_t bitmap_len);
>
>  static void unlink_lflows_from_datapath(struct lflow_ref *);
> +static void unlink_lflows_from_all_datapaths(struct lflow_ref *,
> + size_t n_ls_datapaths,
> + size_t n_lr_datapaths);
> +
>  static void lflow_ref_sync_lflows_to_sb__(struct lflow_ref  *,
>  struct lflow_table *,
>  struct ovsdb_idl_txn *ovnsb_txn,
> @@ -394,6 +398,15 @@ lflow_ref_clear_lflows(struct lflow_ref *lflow_ref)
>  unlink_lflows_from_datapath(lflow_ref);
>  }
>
> +void
> +lflow_ref_clear_lflows_for_all_dps(struct lflow_ref *lflow_ref,
> +   size_t n_ls_datapaths,
> +   size_t n_lr_datapaths)
> +{
> +unlink_lflows_from_all_datapaths(lflow_ref, n_ls_datapaths,
> + n_lr_datapaths);
> +}
> +
>  void
>  lflow_ref_clear_and_sync_lflows(struct lflow_ref *lflow_ref,
>  struct lflow_table *lflow_table,
> @@ -462,7 +475,9 @@ lflow_table_add_lflow(struct lflow_table *lflow_table,
>  /*  lflow_ref_node for this lflow doesn't exist yet.  Add
it. */
>  struct lflow_ref_node *ref_node = xzalloc(sizeof *ref_node);
>  ref_node->lflow = lflow;
> -ref_node->dp_index = od->index;
> +if (od) {
> +ref_node->dp_index = od->index;
> +}
>  ovs_list_insert(_ref->lflows_ref_list,
>  _node->ref_list_node);
>
> @@ -1047,6 +1062,30 @@ unlink_lflows_from_datapath(struct lflow_ref
*lflow_ref)
>  }
>  }
>
> +static void
> +unlink_lflows_from_all_datapaths(struct lflow_ref *lflow_ref,
> + size_t n_ls_datapaths,
> + size_t n_lr_datapaths)
> +{
> +struct lflow_ref_node *ref_node;
> +struct ovn_lflow *lflow;
> +LIST_FOR_EACH (ref_node, ref_list_node, _ref->lflows_ref_list)
{
> +size_t n_datapaths;
> +size_t index;
> +
> +lflow = ref_node->lflow;
> +if (ovn_stage_to_datapath_type(lflow->stage) == DP_SWITCH) {
> +n_datapaths = n_ls_datapaths;
> +} else {
> +n_datapaths = n_lr_datapaths;
> +}
> +
> +BITMAP_FOR_EACH_1 (index, n_datapaths, lflow->dpg_bitmap) {
> +bitmap_set0(lflow->dpg_bitmap, index);
> +}

It would be way more efficient to replace the loop by ULLONG_SET0.

Thanks,
Han
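
For context, a sketch of the kind of wholesale clear being suggested, shown
here with OVS's generic bitmap helper purely for illustration (the exact
macro or helper to use is up to the patch author):

    /* Clear all datapath bits in one call instead of iterating over each
     * set bit individually. */
    bitmap_set_multiple(lflow->dpg_bitmap, 0, n_datapaths, false);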

> +}
> +}
> +
>  static void
>  lflow_ref_sync_lflows_to_sb__(struct lflow_ref  *lflow_ref,
>  struct lflow_table *lflow_table,
> diff --git a/northd/lflow-mgr.h b/northd/lflow-mgr.h
> index 02b74aa131..c65cd70e71 100644
> --- a/northd/lflow-mgr.h
> +++ b/northd/lflow-mgr.h
> @@ -54,6 +54,9 @@ struct lflow_ref *lflow_ref_alloc(const char *res_name);
>  void lflow_ref_destroy(struct lflow_ref *);
>  void lflow_ref_reset(struct lflow_ref *lflow_ref);
>  

Re: [ovs-dev] [PATCH ovn v2 07/18] northd: Generate logical router's LB and NAT flows using lr_lbnat_data.

2023-12-14 Thread Han Zhou
On Tue, Nov 14, 2023 at 10:42 PM Han Zhou  wrote:
>
>
>
> On Thu, Oct 26, 2023 at 11:16 AM  wrote:
> >
> > From: Numan Siddique 
> >
> > Previous commits added new engine nodes to store logical router's lb
> > and NAT data.  Make use of the data stored by these engine nodes
> > to generate logical flows related to router's LBs and NATs.
> >
> > Signed-off-by: Numan Siddique 
> > ---
> >  northd/en-lflow.c  |   3 -
> >  northd/en-lr-lb-nat-data.h |   4 +
> >  northd/inc-proc-northd.c   |   1 -
> >  northd/northd.c| 752 -
> >  northd/northd.h|   1 -
> >  5 files changed, 496 insertions(+), 265 deletions(-)
> >
> > diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> > index 9cb0ead3f0..229f4be1d0 100644
> > --- a/northd/en-lflow.c
> > +++ b/northd/en-lflow.c
> > @@ -42,8 +42,6 @@ lflow_get_input_data(struct engine_node *node,
> >  engine_get_input_data("port_group", node);
> >  struct sync_meters_data *sync_meters_data =
> >  engine_get_input_data("sync_meters", node);
> > -struct ed_type_lr_nat_data *lr_nat_data =
> > -engine_get_input_data("lr_nat", node);
> >  struct ed_type_lr_lb_nat_data *lr_lb_nat_data =
> >  engine_get_input_data("lr_lb_nat_data", node);
> >
> > @@ -68,7 +66,6 @@ lflow_get_input_data(struct engine_node *node,
> >  lflow_input->ls_ports = _data->ls_ports;
> >  lflow_input->lr_ports = _data->lr_ports;
> >  lflow_input->ls_port_groups = _data->ls_port_groups;
> > -lflow_input->lr_nats = _nat_data->lr_nats;
> >  lflow_input->lr_lbnats = _lb_nat_data->lr_lbnats;
> >  lflow_input->meter_groups = _meters_data->meter_groups;
> >  lflow_input->lb_datapaths_map = _data->lb_datapaths_map;
> > diff --git a/northd/en-lr-lb-nat-data.h b/northd/en-lr-lb-nat-data.h
> > index 9029aee339..ffe41cad73 100644
> > --- a/northd/en-lr-lb-nat-data.h
> > +++ b/northd/en-lr-lb-nat-data.h
> > @@ -56,6 +56,10 @@ struct lr_lb_nat_data_table {
> >  #define LR_LB_NAT_DATA_TABLE_FOR_EACH(LR_LB_NAT_REC, TABLE) \
> >  HMAP_FOR_EACH (LR_LB_NAT_REC, key_node, &(TABLE)->entries)
> >
> > +#define LR_LB_NAT_DATA_TABLE_FOR_EACH_IN_P(LR_LB_NAT_REC, JOBID,
TABLE) \
> > +HMAP_FOR_EACH_IN_PARALLEL (LR_LB_NAT_REC, key_node, JOBID, \
> > +   &(TABLE)->entries)
> > +
> >  struct lr_lb_nat_data_tracked_data {
> >  /* Created or updated logical router with LB data. */
> >  struct hmapx crupdated; /* Stores 'struct lr_lb_nat_data_record'.
*/
> > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
> > index 369a151fa3..84627070a8 100644
> > --- a/northd/inc-proc-northd.c
> > +++ b/northd/inc-proc-northd.c
> > @@ -228,7 +228,6 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
> >  engine_add_input(_lflow, _sb_igmp_group, NULL);
> >  engine_add_input(_lflow, _northd, lflow_northd_handler);
> >  engine_add_input(_lflow, _port_group,
lflow_port_group_handler);
> > -engine_add_input(_lflow, _lr_nat, NULL);
> >  engine_add_input(_lflow, _lr_lb_nat_data, NULL);
> >
> >  engine_add_input(_sync_to_sb_addr_set, _nb_address_set,
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 24df14c0de..1877cbc7df 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -8854,18 +8854,14 @@ build_lrouter_groups(struct hmap *lr_ports,
struct ovs_list *lr_list)
> >   */
> >  static void
> >  build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port *op,
> > -   uint32_t priority,
> > -   struct ovn_datapath *od,
> > -   const struct lr_nat_table
*lr_nats,
> > -   struct hmap *lflows)
> > +uint32_t priority,
> > +const struct ovn_datapath *od,
> > +const struct lr_nat_record
*lrnat_rec,
> > +struct hmap *lflows)
> >  {
> >  struct ds eth_src = DS_EMPTY_INITIALIZER;
> >  struct ds match = DS_EMPTY_INITIALIZER;
> >
> > -const struct lr_nat_record *lrnat_rec = lr_nat_table_find_by_index(
> > -lr_nats, op->od->index);
> > -ovs_assert(lrnat_rec);
> >

Re: [ovs-dev] [PATCH ovn v2 00/18] northd lflow incremental processing

2023-11-16 Thread Han Zhou
On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique  wrote:
>
> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou  wrote:
> >
> > On Thu, Oct 26, 2023 at 11:12 AM  wrote:
> > >
> > > From: Numan Siddique 
> > >
> > > This patch series adds incremental processing in the lflow engine
> > > node to handle changes to northd and other engine nodes.
> > > Changes related to load balancers and NAT are mainly handled in
> > > this patch series.
> > >
> > > This patch series can also be found here -
> > https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
> >
> > Thanks Numan for the great improvement!
>
> Hi Han,
>
> Thanks for the review comments.  I understand it is hard to review
> so many patches, especially related to I-P.
> I appreciate the time spent on it and  the feedback.
>
> >
> > I spent some time these days reviewing the series and am now at patch 10. I
> > need to take a break and so I just sent the comments I had so far.
> >
> > I also did scale testing for northd with
> > https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
> > chassis setup, and noticed that the recompute latency has increased 20%
> > after the series. I think in general it is expected to have some
> > performance penalty for the recompute case because of the new indexes
> > created for I-P. However, the patch 10 "northd: Refactor lflow
management
> > into a separate module." alone introduces a 10% latency increase, which
> > necessitates more investigation.
>
> Before sending out the series I  did some testing on recomputes with a
> large OVN NB DB and SB DB
> (from a 500 node ovn-heater density heavy run).
> I'm aware of the increase in recomputes.  And I did some more testing
> today as well.
>
> In my testing,  I can see that the increase in latency is due to the
> new engine nodes added (lr_lbnat mainly)
> and due to housekeeping of the lflow references.  I do not see any
> increase due to the new lflow-mgr.c added in patch 10.
>
> I compared patch 9 and patch 10 separately and there is no difference
> in lflow engine node recompute time between patch 9 and 10.
>

My results were different. My test profile simulates the ovn-k8s topology
(central mode, not IC) with 500 nodes, 50 LSPs/node, no LBs, and a small
number of ACLs and PGs.
(
https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_1pg
)

The test results for ovn-northd recompute time are:
main: 1118.3
p9: 1129.5
p10: 1243.4 ==> 10% more than p9
p18: 1357.6

I am not sure if the p10 increase is related to the hash change or
something else.

> Below are the results with the time taken for the mentioned engine
> nodes in msecs for a recompute for some of the individual patches in
> the series.
>
>
> The sample OVN DBs have
>
> 
> Resource  Total
> ---
> Logical switches1001
> 
> Logical routers  501
> 
> Router ports 1501
> 
> Switch ports 29501
> 
> Load balancers35253
> 
> Load balancer group 1
> 
> SB Logical flows268349
> 
> SB DB groups  509
> 
>
> There is one load balancer group which has all the load balancers and
> it is associated with all the logical switches and routers.
>
> Below is the time taken for each engine node in msec
>
> -
> Engine nodes | lb_data | northd  | lr_lbnat  | lflow  |
> -
> ovn-northd-main  | 358  | 2455| x | 2082 |
> -
> ovn-northd-p1| 373   | 2476| x | 2170   |
> -
> ovn-northd-p5| 367   | 2413| x | 2195   |
> -
> ovn-northd-p6| 354   | 688 | 1815  | 2442|
> -
> ovn-northd-p7| 357   | 683 | 1778  | 2806|
> -
> ovn-northd-p9| 352   | 682 | 1781  | 2781|
> -
> ovn-northd-p10   | 365  | 838 | 1707  | 

Re: [ovs-dev] [PATCH ovn v2 00/18] northd lflow incremental processing

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:12 AM  wrote:
>
> From: Numan Siddique 
>
> This patch series adds incremental processing in the lflow engine
> node to handle changes to northd and other engine nodes.
> Changed related to load balancers and NAT are mainly handled in
> this patch series.
>
> This patch series can also be found here -
https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1

Thanks Numan for the great improvement!

I spent some time these days reviewing the series and am now at patch 10. I
need to take a break, so I just sent the comments I had so far.

I also did scale testing for northd with
https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
chassis setup, and noticed that the recompute latency has increased 20%
after the series. I think in general it is expected to have some
performance penalty for the recompute case because of the new indexes
created for I-P. However, patch 10 "northd: Refactor lflow management
into a separate module." alone introduces a 10% latency increase, which
necessitates more investigation.
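(For anyone reproducing such measurements: a full recompute can be forced
and the per-engine-node timings read back over the unixctl interface,
roughly as below, assuming the inc-engine/recompute and stopwatch/show
commands are available in the build under test:

  ovn-appctl -t ovn-northd inc-engine/recompute
  ovn-appctl -t ovn-northd stopwatch/show

The absolute numbers of course depend on the host, so only the relative
differences between branches are meaningful.)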

>
> Prior to this patch series, most of the changes to northd engine
> resulted in full recomputation of logical flows.  This series
> aims to improve the performance of ovn-northd by adding the I-P
> support.

I'd like to clarify "most of the changes" a little. I think we should focus
on the most impactful changes that happen in a cloud environment. I don't
think it is realistic to achieve "most of the changes" in I-P because it is
too complex (even with this series we still handle a very small part of the
DB schema incrementally), but it may be realistic to handle the most
impactful changes, which are the most frequent changes in the cloud,
incrementally. Before this series, we could handle regular VIF changes and
(some of) the related address-set and port-group updates incrementally.
Those are the most common changes in a cloud, resulting from pod/VM
changes. The next target, then, is LB changes related to pods/VMs (i.e.
the LB backends), which I believe is the main goal of this series. Is my
understanding correct?

While reviewing the patches, I sometimes felt a little lost because it is
hard to correlate some of the changes with the above goal. I believe there
is a reason for every change, but I am not sure they are all directly
related to the goal of LB-backend-related I-P. I understand that there are
other aspects of LB that can be incrementally processed. But if the changes
could be reduced to what is necessary for this goal, it would help the
review, and we might be able to merge them sooner and add the less
impactful I-P later, step by step. I guess I will have a clearer picture
when I finish the rest of the patches, but it would be helpful if you could
add more guidance in this cover letter.

Thanks,
Han

>  In order to add this support, some of the northd engine
> node data (from struct ovn_datapath) is split and moved over to
> new engine nodes - mainly related to load balancers, NAT and ACLs.
>
> Below are the scale testing results done with these patches applied
> using ovn-heater.  The test ran the scenario  -
> ocp-500-density-heavy.yml [1].
>
> With all the lflow I-P patches applied, the resuts are:
>
>
---
> Min (s) Median (s)  90%ile (s)
 99%ile (s)  Max (s) Mean (s)Total (s)   Count
Failed
>
---
> Iteration Total 0.1368831.1290161.192001
 1.2041671.2127280.66501783.127099   125 0
> Namespace.add_ports 0.0052160.0057360.007034
 0.0154860.0189780.0062110.776373125 0
> WorkerNode.bind_port0.0350300.0460820.052469
 0.0582930.0603110.04597311.493259   250 0
> WorkerNode.ping_port0.0050570.0067271.047692
 1.0692531.0713360.26689666.724094   250 0
>
---
>
> The results with the present main are:
>
>
---
> Min (s) Median (s)  90%ile (s)
 99%ile (s)  Max (s) Mean (s)Total (s)   Count
Failed
>
---
> Iteration Total 0.1354912.2238053.311270
 3.339078

Re: [ovs-dev] [PATCH ovn v2 09/18] northd: Add a new node ls_lbacls.

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:17 AM  wrote:
>
> From: Numan Siddique 
>
> This new engine now maintains the load balancer and ACL data of a
> logical switch which was earlier part of northd engine node data.
> The main inputs to this engine are:
> - northd node
> - NB logical switch node
> - Port group node
>
> A record for each logical switch is maintained in the 'ls_lbacls'
> hmap table and this record stores the below data which was earlier
> part of 'struct ovn_datapath'.
>
> - bool has_stateful_acl;
> - bool has_lb_vip;
> - bool has_acls;
> - uint64_t max_acl_tier;

The node name suggests it contains the LBs and ACLs of each LS, but it is
actually about the stateful-related flags. So I think it would be better
to rename it to avoid confusion.
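For example, something along these lines -- the name and exact layout are
only a suggestion to illustrate the point, not actual code:

/* Per-LS stateful flags, split out of 'struct ovn_datapath'.
 * (Sketch only; assumes "openvswitch/hmap.h", <stdbool.h>, <stdint.h>.) */
struct ls_stateful_record {
    struct hmap_node key_node;  /* Indexed on the datapath index. */

    bool has_stateful_acl;
    bool has_lb_vip;
    bool has_acls;
    uint64_t max_acl_tier;
};

That way the node name describes what the data is (per-LS stateful flags)
rather than where it happens to come from (LBs and ACLs).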

>
> This engine node becomes an input to 'lflow' node.
>
> Signed-off-by: Numan Siddique 
> ---
>  lib/stopwatch-names.h  |   1 +
>  northd/automake.mk |   2 +
>  northd/en-lflow.c  |   4 +
>  northd/en-lr-lb-nat-data.c |   8 +-
>  northd/en-lr-lb-nat-data.h |   2 +
>  northd/en-ls-lb-acls.c | 527 +
>  northd/en-ls-lb-acls.h |  88 +++
>  northd/en-port-group.h |   3 +
>  northd/inc-proc-northd.c   |   9 +
>  northd/northd.c| 271 +--
>  northd/northd.h|   7 +-
>  11 files changed, 776 insertions(+), 146 deletions(-)
>  create mode 100644 northd/en-ls-lb-acls.c
>  create mode 100644 northd/en-ls-lb-acls.h
>
> diff --git a/lib/stopwatch-names.h b/lib/stopwatch-names.h
> index 7d85acdaea..8b0018a593 100644
> --- a/lib/stopwatch-names.h
> +++ b/lib/stopwatch-names.h
> @@ -34,5 +34,6 @@
>  #define SYNC_METERS_RUN_STOPWATCH_NAME "sync_meters_run"
>  #define LR_NAT_RUN_STOPWATCH_NAME "lr_nat_run"
>  #define LR_LB_NAT_DATA_RUN_STOPWATCH_NAME "lr_lb_nat_data"
> +#define LS_LBACLS_RUN_STOPWATCH_NAME "lr_lb_acls"
>
>  #endif
> diff --git a/northd/automake.mk b/northd/automake.mk
> index 4116c487df..4593654726 100644
> --- a/northd/automake.mk
> +++ b/northd/automake.mk
> @@ -28,6 +28,8 @@ northd_ovn_northd_SOURCES = \
> northd/en-lr-nat.h \
> northd/en-lr-lb-nat-data.c \
> northd/en-lr-lb-nat-data.h \
> +   northd/en-ls-lb-acls.c \
> +   northd/en-ls-lb-acls.h \
> northd/inc-proc-northd.c \
> northd/inc-proc-northd.h \
> northd/ipam.c \
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index 229f4be1d0..648a477916 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -21,6 +21,7 @@
>  #include "en-lflow.h"
>  #include "en-lr-nat.h"
>  #include "en-lr-lb-nat-data.h"
> +#include "en-ls-lb-acls.h"
>  #include "en-northd.h"
>  #include "en-meters.h"
>
> @@ -44,6 +45,8 @@ lflow_get_input_data(struct engine_node *node,
>  engine_get_input_data("sync_meters", node);
>  struct ed_type_lr_lb_nat_data *lr_lb_nat_data =
>  engine_get_input_data("lr_lb_nat_data", node);
> +struct ed_type_ls_lbacls *ls_lbacls_data =
> +engine_get_input_data("ls_lbacls", node);
>
>  lflow_input->nbrec_bfd_table =
>  EN_OVSDB_GET(engine_get_input("NB_bfd", node));
> @@ -67,6 +70,7 @@ lflow_get_input_data(struct engine_node *node,
>  lflow_input->lr_ports = _data->lr_ports;
>  lflow_input->ls_port_groups = _data->ls_port_groups;
>  lflow_input->lr_lbnats = _lb_nat_data->lr_lbnats;
> +lflow_input->ls_lbacls = _lbacls_data->ls_lbacls;
>  lflow_input->meter_groups = _meters_data->meter_groups;
>  lflow_input->lb_datapaths_map = _data->lb_datapaths_map;
>  lflow_input->svc_monitor_map = _data->svc_monitor_map;
> diff --git a/northd/en-lr-lb-nat-data.c b/northd/en-lr-lb-nat-data.c
> index 19b638ce0b..d816d2321d 100644
> --- a/northd/en-lr-lb-nat-data.c
> +++ b/northd/en-lr-lb-nat-data.c
> @@ -299,9 +299,11 @@ lr_lb_nat_data_lb_data_handler(struct engine_node
*node, void *data_)
>  if (!hmapx_is_empty(>tracked_data.crupdated)) {
>  struct hmapx_node *hmapx_node;
>  /* For all the modified lr_lb_nat_data records (re)build the
> - * vip nats. */
> + * vip nats and re-evaluate 'has_lb_vip'. */
>  HMAPX_FOR_EACH (hmapx_node, >tracked_data.crupdated) {
> -lr_lb_nat_data_build_vip_nats(hmapx_node->data);
> +lr_lbnat_rec = hmapx_node->data;
> +lr_lb_nat_data_build_vip_nats(lr_lbnat_rec);
> +lr_lbnat_rec->has_lb_vip = od_has_lb_vip(lr_lbnat_rec->od);

Changes in this module should belong to an earlier patch instead of this
one?

>  }
>
>  data->tracked = true;
> @@ -523,6 +525,8 @@ lr_lb_nat_data_record_init(struct
lr_lb_nat_data_record *lr_lbnat_rec,
>  if (!nbr->n_nat) {
>  lr_lb_nat_data_build_vip_nats(lr_lbnat_rec);
>  }
> +
> +lr_lbnat_rec->has_lb_vip = od_has_lb_vip(lr_lbnat_rec->od);
>  }
>
>  static struct lr_lb_nat_data_input
> diff --git a/northd/en-lr-lb-nat-data.h b/northd/en-lr-lb-nat-data.h
> index 

Re: [ovs-dev] [PATCH ovn v2 08/18] northd: Don't commit dhcp response flows in the conntrack.

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:16 AM  wrote:
>
> From: Numan Siddique 
>
> This is not required.
>
Thanks Numan for the fix. Could you provide a little more detail why this
is not required:
- Is it a bug fix? What's the impact?
- Shall we update the ovn-northd documentation for the related lflow
changes?
- Is there a reason this is in the I-P patch series?

Thanks,
Han

> Signed-off-by: Numan Siddique 
> ---
>  northd/northd.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 1877cbc7df..c8a224d3cd 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -9223,9 +9223,7 @@ build_dhcpv4_options_flows(struct ovn_port *op,
>  >nbsp->dhcpv4_options->options, "lease_time");
>  ovs_assert(server_id && server_mac && lease_time);
>  const char *dhcp_actions =
> -(op->od->has_stateful_acl || op->od->has_lb_vip)
> - ? REGBIT_ACL_VERDICT_ALLOW" = 1; ct_commit; next;"
> - : REGBIT_ACL_VERDICT_ALLOW" = 1; next;";
> +REGBIT_ACL_VERDICT_ALLOW" = 1; next;";
>  ds_clear();
>  ds_put_format(, "outport == %s && eth.src == %s "
>"&& ip4.src == %s && udp && udp.src == 67 "
> @@ -9308,9 +9306,7 @@ build_dhcpv6_options_flows(struct ovn_port *op,
>  ipv6_string_mapped(server_ip, );
>
>  const char *dhcp6_actions =
> -(op->od->has_stateful_acl || op->od->has_lb_vip)
> -? REGBIT_ACL_VERDICT_ALLOW" = 1; ct_commit;
next;"
> -: REGBIT_ACL_VERDICT_ALLOW" = 1; next;";
> +REGBIT_ACL_VERDICT_ALLOW" = 1; next;";
>  ds_clear();
>  ds_put_format(, "outport == %s && eth.src == %s "
>"&& ip6.src == %s && udp && udp.src == 547
"
> --
> 2.41.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 07/18] northd: Generate logical router's LB and NAT flows using lr_lbnat_data.

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:16 AM  wrote:
>
> From: Numan Siddique 
>
> Previous commits added new engine nodes to store logical router's lb
> and NAT data.  Make use of the data stored by these engine nodes
> to generate logical flows related to router's LBs and NATs.
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/en-lflow.c  |   3 -
>  northd/en-lr-lb-nat-data.h |   4 +
>  northd/inc-proc-northd.c   |   1 -
>  northd/northd.c| 752 -
>  northd/northd.h|   1 -
>  5 files changed, 496 insertions(+), 265 deletions(-)
>
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index 9cb0ead3f0..229f4be1d0 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -42,8 +42,6 @@ lflow_get_input_data(struct engine_node *node,
>  engine_get_input_data("port_group", node);
>  struct sync_meters_data *sync_meters_data =
>  engine_get_input_data("sync_meters", node);
> -struct ed_type_lr_nat_data *lr_nat_data =
> -engine_get_input_data("lr_nat", node);
>  struct ed_type_lr_lb_nat_data *lr_lb_nat_data =
>  engine_get_input_data("lr_lb_nat_data", node);
>
> @@ -68,7 +66,6 @@ lflow_get_input_data(struct engine_node *node,
>  lflow_input->ls_ports = &northd_data->ls_ports;
>  lflow_input->lr_ports = &northd_data->lr_ports;
>  lflow_input->ls_port_groups = &pg_data->ls_port_groups;
> -lflow_input->lr_nats = &lr_nat_data->lr_nats;
>  lflow_input->lr_lbnats = &lr_lb_nat_data->lr_lbnats;
>  lflow_input->meter_groups = &sync_meters_data->meter_groups;
>  lflow_input->lb_datapaths_map = &northd_data->lb_datapaths_map;
> diff --git a/northd/en-lr-lb-nat-data.h b/northd/en-lr-lb-nat-data.h
> index 9029aee339..ffe41cad73 100644
> --- a/northd/en-lr-lb-nat-data.h
> +++ b/northd/en-lr-lb-nat-data.h
> @@ -56,6 +56,10 @@ struct lr_lb_nat_data_table {
>  #define LR_LB_NAT_DATA_TABLE_FOR_EACH(LR_LB_NAT_REC, TABLE) \
>  HMAP_FOR_EACH (LR_LB_NAT_REC, key_node, &(TABLE)->entries)
>
> +#define LR_LB_NAT_DATA_TABLE_FOR_EACH_IN_P(LR_LB_NAT_REC, JOBID, TABLE) \
> +HMAP_FOR_EACH_IN_PARALLEL (LR_LB_NAT_REC, key_node, JOBID, \
> +   &(TABLE)->entries)
> +
>  struct lr_lb_nat_data_tracked_data {
>  /* Created or updated logical router with LB data. */
>  struct hmapx crupdated; /* Stores 'struct lr_lb_nat_data_record'. */
> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
> index 369a151fa3..84627070a8 100644
> --- a/northd/inc-proc-northd.c
> +++ b/northd/inc-proc-northd.c
> @@ -228,7 +228,6 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
>  engine_add_input(&en_lflow, &en_sb_igmp_group, NULL);
>  engine_add_input(&en_lflow, &en_northd, lflow_northd_handler);
>  engine_add_input(&en_lflow, &en_port_group,
lflow_port_group_handler);
> -engine_add_input(&en_lflow, &en_lr_nat, NULL);
>  engine_add_input(&en_lflow, &en_lr_lb_nat_data, NULL);
>
>  engine_add_input(&en_sync_to_sb_addr_set, &en_nb_address_set,
> diff --git a/northd/northd.c b/northd/northd.c
> index 24df14c0de..1877cbc7df 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -8854,18 +8854,14 @@ build_lrouter_groups(struct hmap *lr_ports,
struct ovs_list *lr_list)
>   */
>  static void
>  build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port *op,
> -   uint32_t priority,
> -   struct ovn_datapath *od,
> -   const struct lr_nat_table
*lr_nats,
> -   struct hmap *lflows)
> +uint32_t priority,
> +const struct ovn_datapath *od,
> +const struct lr_nat_record
*lrnat_rec,
> +struct hmap *lflows)
>  {
>  struct ds eth_src = DS_EMPTY_INITIALIZER;
>  struct ds match = DS_EMPTY_INITIALIZER;
>
> -const struct lr_nat_record *lrnat_rec = lr_nat_table_find_by_index(
> -lr_nats, op->od->index);
> -ovs_assert(lrnat_rec);
> -
>  /* Self originated ARP requests/RARP/ND need to be flooded to the L2
domain
>   * (except on router ports).  Determine that packets are self
originated
>   * by also matching on source MAC. Matching on ingress port is not
> @@ -8952,7 +8948,8 @@ lrouter_port_ipv6_reachable(const struct ovn_port
*op,
>   */
>  static void
>  build_lswitch_rport_arp_req_flow(const char *ips,
> -int addr_family, struct ovn_port *patch_op, struct ovn_datapath *od,
> +int addr_family, struct ovn_port *patch_op,
> +const struct ovn_datapath *od,
>  uint32_t priority, struct hmap *lflows,
>  const struct ovsdb_idl_row *stage_hint)
>  {
> @@ -8993,8 +8990,6 @@ static void
>  build_lswitch_rport_arp_req_flows(struct ovn_port *op,
>struct ovn_datapath *sw_od,
>struct ovn_port *sw_op,
> 

Re: [ovs-dev] [PATCH ovn v2 05/18] northd: Add a new engine 'lr-nat' to manage lr NAT data.

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:15 AM  wrote:
>
> From: Numan Siddique 
>
> This new engine now maintains the NAT related data for each
> logical router which was earlier maintained by the northd
> engine node in the 'struct ovn_datapath'.  Main inputs to
> this engine node are:
>- northd
>- NB logical router
>
> A record for each logical router is maintained in the 'lr_nats'
> hmap table and this record
>   - stores the ovn_nat's

It seems the sentence is incomplete.

>
> Handlers are also added to handle the changes to both these
> inputs.  This engine node becomes an input to 'lflow' node.
> This essentially decouples the lr NAT data from the northd
> engine node.
>
> Signed-off-by: Numan Siddique 
> ---
>  lib/ovn-util.c   |   6 +-
>  lib/ovn-util.h   |   2 +-
>  lib/stopwatch-names.h|   1 +
>  northd/automake.mk   |   2 +
>  northd/en-lflow.c|   5 +
>  northd/en-lr-nat.c   | 498 +
>  northd/en-lr-nat.h   | 134 ++
>  northd/en-sync-sb.c  |  11 +-
>  northd/inc-proc-northd.c |   9 +
>  northd/northd.c  | 514 ++-
>  northd/northd.h  |  32 ++-
>  tests/ovn-northd.at  |  18 ++
>  12 files changed, 877 insertions(+), 355 deletions(-)
>  create mode 100644 northd/en-lr-nat.c
>  create mode 100644 northd/en-lr-nat.h
>
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index 33105202f2..05e635a6b4 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -395,7 +395,7 @@ extract_sbrec_binding_first_mac(const struct
sbrec_port_binding *binding,
>  }
>
>  bool
> -lport_addresses_is_empty(struct lport_addresses *laddrs)
> +lport_addresses_is_empty(const struct lport_addresses *laddrs)
>  {
>  return !laddrs->n_ipv4_addrs && !laddrs->n_ipv6_addrs;
>  }
> @@ -405,6 +405,10 @@ destroy_lport_addresses(struct lport_addresses
*laddrs)
>  {
>  free(laddrs->ipv4_addrs);
>  free(laddrs->ipv6_addrs);
> +laddrs->ipv4_addrs = NULL;
> +laddrs->ipv6_addrs = NULL;
> +laddrs->n_ipv4_addrs = 0;
> +laddrs->n_ipv6_addrs = 0;
>  }
>
>  /* Returns a string of the IP address of 'laddrs' that overlaps with
'ip_s'.
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index bff50dbde9..5805415885 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -112,7 +112,7 @@ bool extract_sbrec_binding_first_mac(const struct
sbrec_port_binding *binding,
>  bool extract_lrp_networks__(char *mac, char **networks, size_t
n_networks,
>  struct lport_addresses *laddrs);
>
> -bool lport_addresses_is_empty(struct lport_addresses *);
> +bool lport_addresses_is_empty(const struct lport_addresses *);
>  void destroy_lport_addresses(struct lport_addresses *);
>  const char *find_lport_address(const struct lport_addresses *laddrs,
> const char *ip_s);
> diff --git a/lib/stopwatch-names.h b/lib/stopwatch-names.h
> index 3452cc71cf..0a16da211e 100644
> --- a/lib/stopwatch-names.h
> +++ b/lib/stopwatch-names.h
> @@ -32,5 +32,6 @@
>  #define LFLOWS_TO_SB_STOPWATCH_NAME "lflows_to_sb"
>  #define PORT_GROUP_RUN_STOPWATCH_NAME "port_group_run"
>  #define SYNC_METERS_RUN_STOPWATCH_NAME "sync_meters_run"
> +#define LR_NAT_RUN_STOPWATCH_NAME "lr_nat_run"
>
>  #endif
> diff --git a/northd/automake.mk b/northd/automake.mk
> index cf622fc3c9..ae367a2a8b 100644
> --- a/northd/automake.mk
> +++ b/northd/automake.mk
> @@ -24,6 +24,8 @@ northd_ovn_northd_SOURCES = \
> northd/en-sync-from-sb.h \
> northd/en-lb-data.c \
> northd/en-lb-data.h \
> +   northd/en-lr-nat.c \
> +   northd/en-lr-nat.h \
> northd/inc-proc-northd.c \
> northd/inc-proc-northd.h \
> northd/ipam.c \
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index 96d03b7ada..22f398d419 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -19,6 +19,7 @@
>  #include 
>
>  #include "en-lflow.h"
> +#include "en-lr-nat.h"
>  #include "en-northd.h"
>  #include "en-meters.h"
>
> @@ -40,6 +41,9 @@ lflow_get_input_data(struct engine_node *node,
>  engine_get_input_data("port_group", node);
>  struct sync_meters_data *sync_meters_data =
>  engine_get_input_data("sync_meters", node);
> +struct ed_type_lr_nat_data *lr_nat_data =
> +engine_get_input_data("lr_nat", node);
> +
>  lflow_input->nbrec_bfd_table =
>  EN_OVSDB_GET(engine_get_input("NB_bfd", node));
>  lflow_input->sbrec_bfd_table =
> @@ -61,6 +65,7 @@ lflow_get_input_data(struct engine_node *node,
>  lflow_input->ls_ports = &northd_data->ls_ports;
>  lflow_input->lr_ports = &northd_data->lr_ports;
>  lflow_input->ls_port_groups = &pg_data->ls_port_groups;
> +lflow_input->lr_nats = &lr_nat_data->lr_nats;
>  lflow_input->meter_groups = &sync_meters_data->meter_groups;
>  lflow_input->lb_datapaths_map = &northd_data->lb_datapaths_map;
>  lflow_input->svc_monitor_map = &northd_data->svc_monitor_map;
> diff --git a/northd/en-lr-nat.c 

Re: [ovs-dev] [PATCH ovn v2 04/18] northd: Move router ports SB PB options sync to sync_to_sb_pb node.

2023-11-14 Thread Han Zhou
On Tue, Nov 14, 2023 at 9:40 PM Han Zhou  wrote:
>
>
>
> On Thu, Oct 26, 2023 at 11:15 AM  wrote:
> >
> > From: Numan Siddique 
> >
> > It also moves the logical router port IPv6 prefix delegation
> > updates to "sync-from-sb" engine node.
> >
> > Signed-off-by: Numan Siddique 
> > ---
> >  northd/en-northd.c  |   2 +-
> >  northd/en-sync-sb.c |   3 +-
> >  northd/northd.c | 283 ++--
> >  northd/northd.h |   6 +-
> >  tests/ovn-northd.at |  31 -
> >  5 files changed, 198 insertions(+), 127 deletions(-)
> >
> > diff --git a/northd/en-northd.c b/northd/en-northd.c
> > index 96c2ce9f69..13e731cad9 100644
> > --- a/northd/en-northd.c
> > +++ b/northd/en-northd.c
> > @@ -189,7 +189,7 @@ northd_sb_port_binding_handler(struct engine_node
*node,
> >  northd_get_input_data(node, _data);
> >
> >  if (!northd_handle_sb_port_binding_changes(
> > -input_data.sbrec_port_binding_table, >ls_ports)) {
> > +input_data.sbrec_port_binding_table, >ls_ports,
>lr_ports)) {
> >  return false;
> >  }
> >
> > diff --git a/northd/en-sync-sb.c b/northd/en-sync-sb.c
> > index 2540fcfb97..a14c609acd 100644
> > --- a/northd/en-sync-sb.c
> > +++ b/northd/en-sync-sb.c
> > @@ -288,7 +288,8 @@ en_sync_to_sb_pb_run(struct engine_node *node, void
*data OVS_UNUSED)
> >  const struct engine_context *eng_ctx = engine_get_context();
> >  struct northd_data *northd_data = engine_get_input_data("northd",
node);
> >
> > -sync_pbs(eng_ctx->ovnsb_idl_txn, _data->ls_ports);
> > +sync_pbs(eng_ctx->ovnsb_idl_txn, _data->ls_ports,
> > + _data->lr_ports);
> >  engine_set_node_state(node, EN_UPDATED);
> >  }
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 9ce1b2cb5a..c9c7045755 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -3419,6 +3419,9 @@ ovn_port_update_sbrec(struct ovsdb_idl_txn
*ovnsb_txn,
> >  {
> >  sbrec_port_binding_set_datapath(op->sb, op->od->sb);
> >  if (op->nbrp) {
> > +/* Note: SB port binding options for router ports are set in
> > + * sync_pbs(). */
> > +
> >  /* If the router is for l3 gateway, it resides on a chassis
> >   * and its port type is "l3gateway". */
> >  const char *chassis_name = smap_get(>od->nbr->options,
"chassis");
> > @@ -3430,15 +3433,11 @@ ovn_port_update_sbrec(struct ovsdb_idl_txn
*ovnsb_txn,
> >  sbrec_port_binding_set_type(op->sb, "patch");
> >  }
> >
> > -struct smap new;
> > -smap_init();
> >  if (is_cr_port(op)) {
> >  ovs_assert(sbrec_chassis_by_name);
> >  ovs_assert(sbrec_chassis_by_hostname);
> >  ovs_assert(sbrec_ha_chassis_grp_by_name);
> >  ovs_assert(active_ha_chassis_grps);
> > -const char *redirect_type = smap_get(>nbrp->options,
> > - "redirect-type");
> >
> >  if (op->nbrp->ha_chassis_group) {
> >  if (op->nbrp->n_gateway_chassis) {
> > @@ -3480,49 +3479,8 @@ ovn_port_update_sbrec(struct ovsdb_idl_txn
*ovnsb_txn,
> >  /* Delete the legacy gateway_chassis from the pb. */
> >  sbrec_port_binding_set_gateway_chassis(op->sb, NULL,
0);
> >  }
> > -smap_add(, "distributed-port", op->nbrp->name);
> > -
> > -bool always_redirect =
> > -!op->od->has_distributed_nat &&
> > -!l3dgw_port_has_associated_vtep_lports(op->l3dgw_port);
> > -
> > -if (redirect_type) {
> > -smap_add(, "redirect-type", redirect_type);
> > -/* XXX Why can't we enable always-redirect when
redirect-type
> > - * is bridged? */
> > -if (!strcmp(redirect_type, "bridged")) {
> > -always_redirect = false;
> > -}
> > -}
> > -
> > -if (always_redirect) {
> > -smap_add(, "always-redirect", "true");
> > -}
> > -} else {
> > -if (op->peer) {
> > -

Re: [ovs-dev] [PATCH ovn v2 04/18] northd: Move router ports SB PB options sync to sync_to_sb_pb node.

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:15 AM  wrote:
>
> From: Numan Siddique 
>
> It also moves the logical router port IPv6 prefix delegation
> updates to "sync-from-sb" engine node.
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/en-northd.c  |   2 +-
>  northd/en-sync-sb.c |   3 +-
>  northd/northd.c | 283 ++--
>  northd/northd.h |   6 +-
>  tests/ovn-northd.at |  31 -
>  5 files changed, 198 insertions(+), 127 deletions(-)
>
> diff --git a/northd/en-northd.c b/northd/en-northd.c
> index 96c2ce9f69..13e731cad9 100644
> --- a/northd/en-northd.c
> +++ b/northd/en-northd.c
> @@ -189,7 +189,7 @@ northd_sb_port_binding_handler(struct engine_node
*node,
>  northd_get_input_data(node, _data);
>
>  if (!northd_handle_sb_port_binding_changes(
> -input_data.sbrec_port_binding_table, >ls_ports)) {
> +input_data.sbrec_port_binding_table, >ls_ports,
>lr_ports)) {
>  return false;
>  }
>
> diff --git a/northd/en-sync-sb.c b/northd/en-sync-sb.c
> index 2540fcfb97..a14c609acd 100644
> --- a/northd/en-sync-sb.c
> +++ b/northd/en-sync-sb.c
> @@ -288,7 +288,8 @@ en_sync_to_sb_pb_run(struct engine_node *node, void
*data OVS_UNUSED)
>  const struct engine_context *eng_ctx = engine_get_context();
>  struct northd_data *northd_data = engine_get_input_data("northd",
node);
>
> -sync_pbs(eng_ctx->ovnsb_idl_txn, _data->ls_ports);
> +sync_pbs(eng_ctx->ovnsb_idl_txn, _data->ls_ports,
> + _data->lr_ports);
>  engine_set_node_state(node, EN_UPDATED);
>  }
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 9ce1b2cb5a..c9c7045755 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -3419,6 +3419,9 @@ ovn_port_update_sbrec(struct ovsdb_idl_txn
*ovnsb_txn,
>  {
>  sbrec_port_binding_set_datapath(op->sb, op->od->sb);
>  if (op->nbrp) {
> +/* Note: SB port binding options for router ports are set in
> + * sync_pbs(). */
> +
>  /* If the router is for l3 gateway, it resides on a chassis
>   * and its port type is "l3gateway". */
>  const char *chassis_name = smap_get(>od->nbr->options,
"chassis");
> @@ -3430,15 +3433,11 @@ ovn_port_update_sbrec(struct ovsdb_idl_txn
*ovnsb_txn,
>  sbrec_port_binding_set_type(op->sb, "patch");
>  }
>
> -struct smap new;
> -smap_init();
>  if (is_cr_port(op)) {
>  ovs_assert(sbrec_chassis_by_name);
>  ovs_assert(sbrec_chassis_by_hostname);
>  ovs_assert(sbrec_ha_chassis_grp_by_name);
>  ovs_assert(active_ha_chassis_grps);
> -const char *redirect_type = smap_get(>nbrp->options,
> - "redirect-type");
>
>  if (op->nbrp->ha_chassis_group) {
>  if (op->nbrp->n_gateway_chassis) {
> @@ -3480,49 +3479,8 @@ ovn_port_update_sbrec(struct ovsdb_idl_txn
*ovnsb_txn,
>  /* Delete the legacy gateway_chassis from the pb. */
>  sbrec_port_binding_set_gateway_chassis(op->sb, NULL, 0);
>  }
> -smap_add(, "distributed-port", op->nbrp->name);
> -
> -bool always_redirect =
> -!op->od->has_distributed_nat &&
> -!l3dgw_port_has_associated_vtep_lports(op->l3dgw_port);
> -
> -if (redirect_type) {
> -smap_add(, "redirect-type", redirect_type);
> -/* XXX Why can't we enable always-redirect when
redirect-type
> - * is bridged? */
> -if (!strcmp(redirect_type, "bridged")) {
> -always_redirect = false;
> -}
> -}
> -
> -if (always_redirect) {
> -smap_add(, "always-redirect", "true");
> -}
> -} else {
> -if (op->peer) {
> -smap_add(, "peer", op->peer->key);
> -if (op->nbrp->ha_chassis_group ||
> -op->nbrp->n_gateway_chassis) {
> -char *redirect_name =
> -ovn_chassis_redirect_name(op->nbrp->name);
> -smap_add(, "chassis-redirect-port",
redirect_name);
> -free(redirect_name);
> -}
> -}
> -if (chassis_name) {
> -smap_add(, "l3gateway-chassis", chassis_name);
> -}
> -}
> -
> -const char *ipv6_pd_list = smap_get(>sb->options,
> -"ipv6_ra_pd_list");
> -if (ipv6_pd_list) {
> -smap_add(, "ipv6_ra_pd_list", ipv6_pd_list);
>  }
>
> -sbrec_port_binding_set_options(op->sb, );
> -smap_destroy();
> -
>  sbrec_port_binding_set_parent_port(op->sb, NULL);
>  sbrec_port_binding_set_tag(op->sb, NULL, 0);
>
> @@ -4752,12 +4710,14 @@ check_sb_lb_duplicates(const struct
sbrec_load_balancer_table *table)
>  return 

Re: [ovs-dev] [PATCH ovn v2 02/18] northd: Track ovn_datapaths in northd engine track data.

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:14 AM  wrote:
>
> From: Numan Siddique 
>
> northd engine tracked data now also stores the logical switches
> and logical routers that got updated due to the changed load balancers.
>
> Eg 1.  For this command 'ovn-nbctl ls-lb-add sw0 lb1 -- lr-lb-add lr0
> lb1', northd engine tracking data will store 'sw0' and 'lr0'.
>
> Eg 2.  If load balancer lb1 is already associated with 'sw0' and 'lr0'
> then for this command 'ovn-nbctl set load_balancer 
> vips:10.0.0.10=20.0.0.20', northd engine tracking data will store
> 'sw0' and 'lr0'.
>
> An upcoming commit will make use of this tracked data.
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/northd.c | 34 +-
>  northd/northd.h | 12 
>  2 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index df22a9c658..9ce1b2cb5a 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -5146,6 +5146,8 @@ destroy_northd_data_tracked_changes(struct
northd_data *nd)
>  struct northd_tracked_data *trk_changes = >trk_northd_changes;
>  destroy_tracked_ovn_ports(_changes->trk_ovn_ports);
>  destroy_tracked_lbs(_changes->trk_lbs);
> +hmapx_clear(_changes->ls_with_changed_lbs.crupdated);
> +hmapx_clear(_changes->lr_with_changed_lbs.crupdated);
>  nd->change_tracked = false;
>  }
>
> @@ -5158,6 +5160,8 @@ init_northd_tracked_data(struct northd_data *nd)
>  hmapx_init(_changes->trk_ovn_ports.deleted);
>  hmapx_init(_changes->trk_lbs.crupdated);
>  hmapx_init(_changes->trk_lbs.deleted);
> +hmapx_init(_changes->ls_with_changed_lbs.crupdated);
> +hmapx_init(_changes->lr_with_changed_lbs.crupdated);
>  }
>
>  static void
> @@ -5169,6 +5173,8 @@ destroy_northd_tracked_data(struct northd_data *nd)
>  hmapx_destroy(_changes->trk_ovn_ports.deleted);
>  hmapx_destroy(_changes->trk_lbs.crupdated);
>  hmapx_destroy(_changes->trk_lbs.deleted);
> +hmapx_destroy(_changes->ls_with_changed_lbs.crupdated);
> +hmapx_destroy(_changes->lr_with_changed_lbs.crupdated);
>  }
>
>
> @@ -5179,7 +5185,10 @@ northd_has_tracked_data(struct northd_tracked_data
*trk_nd_changes)
>  || !hmapx_is_empty(_nd_changes->trk_ovn_ports.updated)
>  || !hmapx_is_empty(_nd_changes->trk_ovn_ports.deleted)
>  || !hmapx_is_empty(_nd_changes->trk_lbs.crupdated)
> -|| !hmapx_is_empty(_nd_changes->trk_lbs.deleted));
> +|| !hmapx_is_empty(_nd_changes->trk_lbs.deleted)
> +||
!hmapx_is_empty(_nd_changes->ls_with_changed_lbs.crupdated)
> +||
!hmapx_is_empty(_nd_changes->lr_with_changed_lbs.crupdated)
> +);
>  }
>
>  bool
> @@ -5188,6 +5197,8 @@ northd_has_only_ports_in_tracked_data(
>  {
>  return (hmapx_is_empty(_nd_changes->trk_lbs.crupdated)
>  && hmapx_is_empty(_nd_changes->trk_lbs.deleted)
> +&&
hmapx_is_empty(_nd_changes->ls_with_changed_lbs.crupdated)
> +&&
hmapx_is_empty(_nd_changes->lr_with_changed_lbs.crupdated)
>  && (!hmapx_is_empty(_nd_changes->trk_ovn_ports.created)
>  || !hmapx_is_empty(_nd_changes->trk_ovn_ports.updated)
>  || !hmapx_is_empty(_nd_changes->trk_ovn_ports.deleted)));
> @@ -5828,6 +5839,9 @@ northd_handle_lb_data_changes(struct
tracked_lb_data *trk_lb_data,
> lb_dps->nb_ls_map) {
>  od = ls_datapaths->array[index];
>  init_lb_for_datapath(od);
> +
> +/* Add the ls datapath to the northd tracked data. */
> +hmapx_add(_changes->ls_with_changed_lbs.crupdated, od);
>  }
>
>  hmap_remove(lb_datapaths_map, _dps->hmap_node);
> @@ -5909,6 +5923,9 @@ northd_handle_lb_data_changes(struct
tracked_lb_data *trk_lb_data,
>
>  /* Re-evaluate 'od->has_lb_vip' */
>  init_lb_for_datapath(od);
> +
> +/* Add the ls datapath to the northd tracked data. */
> +hmapx_add(_changes->ls_with_changed_lbs.crupdated, od);
>  }
>
>  LIST_FOR_EACH (codlb, list_node, _lb_data->crupdated_lr_lbs) {
> @@ -5954,6 +5971,9 @@ northd_handle_lb_data_changes(struct
tracked_lb_data *trk_lb_data,
>
>  /* Re-evaluate 'od->has_lb_vip' */
>  init_lb_for_datapath(od);
> +
> +/* Add the lr datapath to the northd tracked data. */
> +hmapx_add(_changes->lr_with_changed_lbs.crupdated, od);
>  }
>
>  HMAP_FOR_EACH (clb, hmap_node, _lb_data->crupdated_lbs) {
> @@ -5968,6 +5988,9 @@ northd_handle_lb_data_changes(struct
tracked_lb_data *trk_lb_data,
>  od = ls_datapaths->array[index];
>  /* Re-evaluate 'od->has_lb_vip' */
>  init_lb_for_datapath(od);
> +
> +/* Add the ls datapath to the northd tracked data. */
> +hmapx_add(_changes->ls_with_changed_lbs.crupdated, od);
>  }
>
>  BITMAP_FOR_EACH_1 (index, ods_size(lr_datapaths),
> @@ -5991,6 +6014,9 @@ 

Re: [ovs-dev] [PATCH ovn v2 01/18] northd: Refactor the northd change tracking.

2023-11-14 Thread Han Zhou
On Thu, Oct 26, 2023 at 11:14 AM  wrote:
>
> From: Numan Siddique 
>
> northd engine tracking data now has the following tracking data
>   - changed ovn_ports (right now only changed logical switch ports are
> tracked.)
>   - changed load balancers.
>
> This separation becomes easier to add lflow handling for these
> changes in lflow northd engine handler.  This patch doesn't
> handle the load balancer changes in lflow handler.  It will
> be handled in upcoming commits.
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/en-lflow.c   |  11 +-
>  northd/en-northd.c  |  13 +-
>  northd/en-sync-sb.c |  10 +-
>  northd/northd.c | 446 
>  northd/northd.h |  63 ---
>  tests/ovn-northd.at |  10 +-
>  6 files changed, 313 insertions(+), 240 deletions(-)
>
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index 2b84fef0ef..96d03b7ada 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -108,8 +108,8 @@ lflow_northd_handler(struct engine_node *node,
>  return false;
>  }
>
> -/* Fall back to recompute if lb related data has changed. */
> -if (northd_data->lb_changed) {
> +/* Fall back to recompute if load balancers have changed. */
> +if
(northd_has_lbs_in_tracked_data(_data->trk_northd_changes)) {
>  return false;
>  }
>
> @@ -119,13 +119,14 @@ lflow_northd_handler(struct engine_node *node,
>  struct lflow_input lflow_input;
>  lflow_get_input_data(node, _input);
>
> -if (!lflow_handle_northd_ls_changes(eng_ctx->ovnsb_idl_txn,
> -_data->tracked_ls_changes,
> -_input,
_data->lflows)) {
> +if (!lflow_handle_northd_port_changes(eng_ctx->ovnsb_idl_txn,
> +
 _data->trk_northd_changes.trk_ovn_ports,
> +_input, _data->lflows)) {
>  return false;
>  }
>
>  engine_set_node_state(node, EN_UPDATED);
> +
>  return true;
>  }
>
> diff --git a/northd/en-northd.c b/northd/en-northd.c
> index aa0f20f0c2..96c2ce9f69 100644
> --- a/northd/en-northd.c
> +++ b/northd/en-northd.c
> @@ -230,15 +230,16 @@ northd_lb_data_handler(struct engine_node *node,
void *data)
> >ls_datapaths,
> >lr_datapaths,
> >lb_datapaths_map,
> -   >lb_group_datapaths_map)) {
> +   >lb_group_datapaths_map,
> +   >trk_northd_changes)) {
>  return false;
>  }
>
> -/* Indicate the depedendant engine nodes that load balancer/group
> - * related data has changed (including association to logical
> - * switch/router). */
> -nd->lb_changed = true;
> -engine_set_node_state(node, EN_UPDATED);
> +if (northd_has_lbs_in_tracked_data(>trk_northd_changes)) {
> +nd->change_tracked = true;
> +engine_set_node_state(node, EN_UPDATED);
> +}
> +
>  return true;
>  }
>
> diff --git a/northd/en-sync-sb.c b/northd/en-sync-sb.c
> index 2ec3bf54f8..2540fcfb97 100644
> --- a/northd/en-sync-sb.c
> +++ b/northd/en-sync-sb.c
> @@ -236,7 +236,8 @@ sync_to_sb_lb_northd_handler(struct engine_node
*node, void *data OVS_UNUSED)
>  {
>  struct northd_data *nd = engine_get_input_data("northd", node);
>
> -if (!nd->change_tracked || nd->lb_changed) {
> +if (!nd->change_tracked ||
> +northd_has_lbs_in_tracked_data(>trk_northd_changes)) {
>  /* Return false if no tracking data or if lbs changed. */
>  return false;
>  }
> @@ -306,11 +307,14 @@ sync_to_sb_pb_northd_handler(struct engine_node
*node, void *data OVS_UNUSED)
>  }
>
>  struct northd_data *nd = engine_get_input_data("northd", node);
> -if (!nd->change_tracked) {
> +if (!nd->change_tracked ||
> +northd_has_lbs_in_tracked_data(>trk_northd_changes)) {
> +/* Return false if no tracking data or if lbs changed. */
>  return false;
>  }
>
> -if (!sync_pbs_for_northd_ls_changes(>tracked_ls_changes)) {
> +if (!sync_pbs_for_northd_changed_ovn_ports(
> +>trk_northd_changes.trk_ovn_ports)) {
>  return false;
>  }
>
> diff --git a/northd/northd.c b/northd/northd.c
> index f8b046d83e..df22a9c658 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -4894,21 +4894,20 @@ sync_pbs(struct ovsdb_idl_txn *ovnsb_idl_txn,
struct hmap *ls_ports)
>  }
>
>  /* Sync the SB Port bindings for the added and updated logical switch
ports
> - *  of the tracked logical switches (from the northd engine node). */
> + * of the tracked northd engine data. */
>  bool
> -sync_pbs_for_northd_ls_changes(struct tracked_ls_changes *ls_changes)
> +sync_pbs_for_northd_changed_ovn_ports( struct tracked_ovn_ports
*trk_ovn_ports)
>  {
> -struct ls_change *ls_change;
> -LIST_FOR_EACH (ls_change, list_node, 

Re: [ovs-dev] [PATCH ovn] northd: Support CIDR-based MAC binding aging threshold.

2023-11-07 Thread Han Zhou
On Tue, Nov 7, 2023 at 6:12 PM Han Zhou  wrote:
>
>
>
> On Tue, Nov 7, 2023 at 8:06 AM Dumitru Ceara  wrote:
> >
> > On 11/6/23 08:19, Han Zhou wrote:
> > > On Sun, Nov 5, 2023 at 10:59 PM Ales Musil  wrote:
> > >>
> > >>
> > >>
> > >> On Sat, Nov 4, 2023 at 5:45 AM Han Zhou  wrote:
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Nov 3, 2023 at 1:08 AM Ales Musil  wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tue, Oct 24, 2023 at 9:36 PM Han Zhou  wrote:
> > >>>>>
> > >>>>> Enhance MAC_Binding aging to allow CIDR-based threshold
> > > configurations.
> > >>>>> This enables distinct threshold settings for different IP ranges,
> > >>>>> applying the longest prefix matching for overlapping ranges.
> > >>>>>
> > >>>>> A common use case involves setting a default threshold for all
IPs,
> > > while
> > >>>>> disabling aging for a specific range and potentially excluding a
> > > subset
> > >>>>> within that range.
> > >>>>>
> > >>>>> Signed-off-by: Han Zhou 
> > >>>>> ---
> > >>>>
> > >>>>
> > >>>> Hi Han,
> > >>>>
> > >>>> thank you for the patch, I have a couple of comments down below.
> > >>>
> > >>> Thanks for your review!
> > >>>
> >
> >
> > Hi Han,
> >
> > Thanks for the patch!  Not a very in-depth review but I left a bunch of
> > minor comments and shared my opinion on the configuration format.
> >
>
> Thanks for your review!
>
> > Regards,
> > Dumitru
> >
> > >>>>
> > >>>>>
> > >>>>>  northd/aging.c  | 297
> > > +---
> > >>>>>  northd/aging.h  |   3 +
> > >>>>>  northd/northd.c |  11 +-
> > >>>>>  ovn-nb.xml  |  63 +-
> > >>>>>  tests/ovn.at|  60 ++
> > >>>>>  5 files changed, 413 insertions(+), 21 deletions(-)
> > >>>>>
> > >>>>> diff --git a/northd/aging.c b/northd/aging.c
> > >>>>> index f626c72c8ca3..e5868211a63b 100644
> > >>>>> --- a/northd/aging.c
> > >>>>> +++ b/northd/aging.c
> > >>>>> @@ -47,12 +47,253 @@ aging_waker_schedule_next_wake(struct
> > > aging_waker *waker, int64_t next_wake_ms)
> > >>>>>  }
> > >>>>>  }
> > >>>>>
> > >>>>> +struct threshold_entry {
> > >>>>> +union {
> > >>>>> +ovs_be32 ipv4;
> > >>>>> +struct in6_addr ipv6;
> > >>>>> +} prefix;
> > >>>>> +bool is_v4;
> > >>>>
> > >>>>
> > >>>> We can avoid the is_v4 whatsover by storing the address as mapped
v4
> > > in 'in6_addr', I saw the concern and we can still have separate
arrays for
> > > v4 and v6. The parsing IMO becomes much simpler if we use the mapped
v4 for
> > > both see down below.
> > >>>>
> > >>>
> > >>> I did consider using in6_addr, but it would make the
> > > find_threshold_for_ip() slightly more tedious because for every v4
entry,
> > > even if stored in a separate array, we will still need to call
> > > in6_addr_get_mapped_ipv4() to convert it. While using the union we can
> > > directly access what we need (ipv4 or ipv6).
> > >>
> > >>
> > >> We wouldn't have to actually call in6_addr_get_mapped_ipv4() for the
v4
> > > variant, that's why I suggested that approach. We can adjust the mask
and
> > > then use the same logic as for ipv6 (entry->plen += 96 should do the
trick).
> > >>
> > >
> > > Understood, and my reply below was about that:
> > >>> The search may be simplified a little in find_threshold_for_ip()
with
> > > adjusted prefix for v4, only if we combine the two arrays to a single
> > > array. Otherwise we will need to have a function to abstract and
reuse the
> > > ipv6 matching logic, while the ipv4 matching is really just 3 lines
of code
> > > :)

[ovs-dev] [PATCH ovn v2] northd: Support CIDR-based MAC binding aging threshold.

2023-11-07 Thread Han Zhou
Enhance MAC_Binding aging to allow CIDR-based threshold configurations.
This enables distinct threshold settings for different IP ranges,
applying the longest prefix matching for overlapping ranges.

A common use case involves setting a default threshold for all IPs, while
disabling aging for a specific range and potentially excluding a subset
within that range.
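
For example, something like:

  ovn-nbctl set Logical_Router lr0 \
      options:mac_binding_age_threshold="300;10.0.0.0/8:0;10.0.100.0/24:60"

would age out MAC bindings after 300 seconds by default, never age out
bindings in 10.0.0.0/8 (0 disables aging), except for 10.0.100.0/24, whose
bindings would age out after 60 seconds (longest prefix match wins).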

Signed-off-by: Han Zhou 
---
v2: Addressed comments from Ilya, Ales and Dumitru:
  - Add NEWS.
  - Use strtok_r in parse_aging_threshold.
  - Do not call parse_aging_threshold in en_fdb_aging_run.
  - Use size_t for array size.
  - Add non-masked ipv6 in test case.

 NEWS|   2 +
 northd/aging.c  | 291 +---
 northd/aging.h  |   3 +
 northd/northd.c |  11 +-
 ovn-nb.xml  |  63 ++-
 tests/ovn.at|  60 ++
 6 files changed, 409 insertions(+), 21 deletions(-)

diff --git a/NEWS b/NEWS
index 30f6edb282ca..74f0303280ca 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,8 @@ Post v23.09.0
 connection method and doesn't require additional probing.
 external_ids:ovn-openflow-probe-interval configuration option for
 ovn-controller no longer matters and is ignored.
+  - Support CIDR based MAC binding aging threshold. See ovn-nb(5) for
+'mac_binding_age_threshold' for more details.
 
 OVN v23.09.0 - 15 Sep 2023
 --
diff --git a/northd/aging.c b/northd/aging.c
index f626c72c8ca3..cdf5f4464e10 100644
--- a/northd/aging.c
+++ b/northd/aging.c
@@ -47,12 +47,250 @@ aging_waker_schedule_next_wake(struct aging_waker *waker, 
int64_t next_wake_ms)
 }
 }
 
+struct threshold_entry {
+union {
+ovs_be32 ipv4;
+struct in6_addr ipv6;
+} prefix;
+bool is_v4;
+unsigned int plen;
+unsigned int threshold;
+};
+
+/* Contains CIDR-based aging threshold configuration parsed from
+ * "Logical_Router:options:mac_binding_age_threshold".
+ *
+ * This struct is also used for non-CIDR-based threshold, e.g. the ones from
+ * "NB_Global:other_config:fdb_age_threshold" for the common aging_context
+ * interface.
+ *
+ * - The arrays `v4_entries` and `v6_entries` are populated with parsed entries
+ *   for IPv4 and IPv6 CIDRs, respectively, along with their associated
+ *   thresholds.  Entries within these arrays are sorted by prefix length,
+ *   starting with the longest.
+ *
+ * - If a threshold is provided without an accompanying prefix, it's captured
+ *   in `default_threshold`.  In cases with multiple unprefixed thresholds,
+ *   `default_threshold` will only store the last one.  */
+struct threshold_config {
+struct threshold_entry *v4_entries;
+size_t n_v4_entries;
+struct threshold_entry *v6_entries;
+size_t n_v6_entries;
+unsigned int default_threshold;
+};
+
+static int
+compare_entries_by_prefix_length(const void *a, const void *b)
+{
+const struct threshold_entry *entry_a = a;
+const struct threshold_entry *entry_b = b;
+
+return entry_b->plen - entry_a->plen;
+}
+
+/* Parse an ENTRY in the threshold option, with the format:
+ * [CIDR:]THRESHOLD
+ *
+ * Returns true if successful, false if failed. */
+static bool
+parse_threshold_entry(const char *str, struct threshold_entry *entry)
+{
+char *colon_ptr;
+unsigned int value;
+const char *threshold_str;
+
+colon_ptr = strrchr(str, ':');
+if (!colon_ptr) {
+threshold_str = str;
+entry->plen = 0;
+} else {
+threshold_str = colon_ptr + 1;
+}
+
+if (!str_to_uint(threshold_str, 10, &value)) {
+return false;
+}
+entry->threshold = value;
+
+if (!colon_ptr) {
+return true;
+}
+
+/* ":" was found, so parse the string before ":" as a cidr. */
+char ip_cidr[128];
+ovs_strzcpy(ip_cidr, str, MIN(colon_ptr - str + 1, sizeof ip_cidr));
+char *error = ip_parse_cidr(ip_cidr, &entry->prefix.ipv4, &entry->plen);
+if (!error) {
+entry->is_v4 = true;
+return true;
+}
+free(error);
+error = ipv6_parse_cidr(ip_cidr, &entry->prefix.ipv6, &entry->plen);
+if (!error) {
+entry->is_v4 = false;
+return true;
+}
+free(error);
+return false;
+}
+
+static void
+threshold_config_destroy(struct threshold_config *config)
+{
+free(config->v4_entries);
+free(config->v6_entries);
+config->v4_entries = config->v6_entries = NULL;
+config->n_v4_entries = config->n_v6_entries = 0;
+config->default_threshold = 0;
+}
+
+/* Parse the threshold option string, which has the format:
+ * ENTRY[;ENTRY[...]]
+ *
+ * For the exact format of ENTRY, refer to the function
+ * `parse_threshold_entry`.
+ *
+ * The parsed data is populated to the struct threshold_config.
+ * See the comments of struct threshold_config for details.
+ *
+ * Return Values:
+ * - Returns `false` if the input does not match the expected format.
+ *   Consequently, no entries

Re: [ovs-dev] [PATCH ovn] northd: Support CIDR-based MAC binding aging threshold.

2023-11-07 Thread Han Zhou
On Tue, Nov 7, 2023 at 8:06 AM Dumitru Ceara  wrote:
>
> On 11/6/23 08:19, Han Zhou wrote:
> > On Sun, Nov 5, 2023 at 10:59 PM Ales Musil  wrote:
> >>
> >>
> >>
> >> On Sat, Nov 4, 2023 at 5:45 AM Han Zhou  wrote:
> >>>
> >>>
> >>>
> >>> On Fri, Nov 3, 2023 at 1:08 AM Ales Musil  wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Oct 24, 2023 at 9:36 PM Han Zhou  wrote:
> >>>>>
> >>>>> Enhance MAC_Binding aging to allow CIDR-based threshold
> > configurations.
> >>>>> This enables distinct threshold settings for different IP ranges,
> >>>>> applying the longest prefix matching for overlapping ranges.
> >>>>>
> >>>>> A common use case involves setting a default threshold for all IPs,
> > while
> >>>>> disabling aging for a specific range and potentially excluding a
> > subset
> >>>>> within that range.
> >>>>>
> >>>>> Signed-off-by: Han Zhou 
> >>>>> ---
> >>>>
> >>>>
> >>>> Hi Han,
> >>>>
> >>>> thank you for the patch, I have a couple of comments down below.
> >>>
> >>> Thanks for your review!
> >>>
>
>
> Hi Han,
>
> Thanks for the patch!  Not a very in-depth review but I left a bunch of
> minor comments and shared my opinion on the configuration format.
>

Thanks for your review!

> Regards,
> Dumitru
>
> >>>>
> >>>>>
> >>>>>  northd/aging.c  | 297
> > +---
> >>>>>  northd/aging.h  |   3 +
> >>>>>  northd/northd.c |  11 +-
> >>>>>  ovn-nb.xml  |  63 +-
> >>>>>  tests/ovn.at|  60 ++
> >>>>>  5 files changed, 413 insertions(+), 21 deletions(-)
> >>>>>
> >>>>> diff --git a/northd/aging.c b/northd/aging.c
> >>>>> index f626c72c8ca3..e5868211a63b 100644
> >>>>> --- a/northd/aging.c
> >>>>> +++ b/northd/aging.c
> >>>>> @@ -47,12 +47,253 @@ aging_waker_schedule_next_wake(struct
> > aging_waker *waker, int64_t next_wake_ms)
> >>>>>  }
> >>>>>  }
> >>>>>
> >>>>> +struct threshold_entry {
> >>>>> +union {
> >>>>> +ovs_be32 ipv4;
> >>>>> +struct in6_addr ipv6;
> >>>>> +} prefix;
> >>>>> +bool is_v4;
> >>>>
> >>>>
> >>>> We can avoid the is_v4 whatsover by storing the address as mapped v4
> > in 'in6_addr', I saw the concern and we can still have separate arrays
for
> > v4 and v6. The parsing IMO becomes much simpler if we use the mapped v4
for
> > both see down below.
> >>>>
> >>>
> >>> I did consider using in6_addr, but it would make the
> > find_threshold_for_ip() slightly more tedious because for every v4
entry,
> > even if stored in a separate array, we will still need to call
> > in6_addr_get_mapped_ipv4() to convert it. While using the union we can
> > directly access what we need (ipv4 or ipv6).
> >>
> >>
> >> We wouldn't have to actually call in6_addr_get_mapped_ipv4() for the v4
> > variant, that's why I suggested that approach. We can adjust the mask
and
> > then use the same logic as for ipv6 (entry->plen += 96 should do the
trick).
> >>
> >
> > Understood, and my reply below was about that:
> >>> The search may be simplified a little in find_threshold_for_ip() with
> > adjusted prefix for v4, only if we combine the two arrays to a single
> > array. Otherwise we will need to have a function to abstract and reuse
the
> > ipv6 matching logic, while the ipv4 matching is really just 3 lines of
code
> > :)
> >
> >>>
> >>> The is_v4 member was needed when using a single array, but not
necessary
> > any more when storing v4 and v6 entries separately, but I kept it just
for
> > convenience, so that I don't need to use another arg in
> > parse_threshold_entry() to return the AF type. Using in6_addr can avoid
the
> > extra arg, too, but similarly, we would have to always call the macro
> > IN6_IS_ADDR_V4MAPPED, which doesn't seem cleaner than the is_v4 field to
> > me. Using ip46_parse_cidr does save several lines of co

Re: [ovs-dev] [PATCH ovn] physical: Fix else-if typo.

2023-11-06 Thread Han Zhou
On Mon, Nov 6, 2023 at 9:05 AM Mark Michelson  wrote:
>
> Good catch, Han.
>
> Acked-by: Mark Michelson 

Thanks Mark. Applied to main.

Han
>
> On 11/3/23 01:40, Han Zhou wrote:
> > It is not causing any problem so far, but fix it to avoid potential
> > problem in the future.
> >
> > Fixes: c6b20c9940ed ("ovn-ic: Support IGMP/MLD in multi-AZ
deployments.")
> > Signed-off-by: Han Zhou 
> > ---
> >   controller/physical.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/controller/physical.c b/controller/physical.c
> > index 72e88a203a66..41c38abc4a58 100644
> > --- a/controller/physical.c
> > +++ b/controller/physical.c
> > @@ -2043,7 +2043,7 @@ consider_mc_group(struct ovsdb_idl_index
*sbrec_port_binding_by_name,
> >   local_output_pb(port->tunnel_key, _ofpacts);
> >   local_output_pb(port->tunnel_key,
_ofpacts_ramp);
> >   }
> > -} if (!strcmp(port->type, "remote")) {
> > +} else if (!strcmp(port->type, "remote")) {
> >   if (port->chassis) {
> >   put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32,
> >_ofpacts);
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Support CIDR-based MAC binding aging threshold.

2023-11-05 Thread Han Zhou
On Sun, Nov 5, 2023 at 10:59 PM Ales Musil  wrote:
>
>
>
> On Sat, Nov 4, 2023 at 5:45 AM Han Zhou  wrote:
>>
>>
>>
>> On Fri, Nov 3, 2023 at 1:08 AM Ales Musil  wrote:
>> >
>> >
>> >
>> > On Tue, Oct 24, 2023 at 9:36 PM Han Zhou  wrote:
>> >>
>> >> Enhance MAC_Binding aging to allow CIDR-based threshold
configurations.
>> >> This enables distinct threshold settings for different IP ranges,
>> >> applying the longest prefix matching for overlapping ranges.
>> >>
>> >> A common use case involves setting a default threshold for all IPs,
while
>> >> disabling aging for a specific range and potentially excluding a
subset
>> >> within that range.
>> >>
>> >> Signed-off-by: Han Zhou 
>> >> ---
>> >
>> >
>> > Hi Han,
>> >
>> > thank you for the patch, I have a couple of comments down below.
>>
>> Thanks for your review!
>>
>> >
>> >>
>> >>  northd/aging.c  | 297
+---
>> >>  northd/aging.h  |   3 +
>> >>  northd/northd.c |  11 +-
>> >>  ovn-nb.xml  |  63 +-
>> >>  tests/ovn.at|  60 ++
>> >>  5 files changed, 413 insertions(+), 21 deletions(-)
>> >>
>> >> diff --git a/northd/aging.c b/northd/aging.c
>> >> index f626c72c8ca3..e5868211a63b 100644
>> >> --- a/northd/aging.c
>> >> +++ b/northd/aging.c
>> >> @@ -47,12 +47,253 @@ aging_waker_schedule_next_wake(struct
aging_waker *waker, int64_t next_wake_ms)
>> >>  }
>> >>  }
>> >>
>> >> +struct threshold_entry {
>> >> +union {
>> >> +ovs_be32 ipv4;
>> >> +struct in6_addr ipv6;
>> >> +} prefix;
>> >> +bool is_v4;
>> >
>> >
>> > We can avoid the is_v4 whatsover by storing the address as mapped v4
in 'in6_addr', I saw the concern and we can still have separate arrays for
v4 and v6. The parsing IMO becomes much simpler if we use the mapped v4 for
both see down below.
>> >
>>
>> I did consider using in6_addr, but it would make the
find_threshold_for_ip() slightly more tedious because for every v4 entry,
even if stored in a separate array, we will still need to call
in6_addr_get_mapped_ipv4() to convert it. While using the union we can
directly access what we need (ipv4 or ipv6).
>
>
> We wouldn't have to actually call in6_addr_get_mapped_ipv4() for the v4
variant, that's why I suggested that approach. We can adjust the mask and
then use the same logic as for ipv6 (entry->plen += 96 should do the trick).
>

Understood, and my reply below was about that:
>> The search may be simplified a little in find_threshold_for_ip() with
adjusted prefix for v4, only if we combine the two arrays to a single
array. Otherwise we will need to have a function to abstract and reuse the
ipv6 matching logic, while the ipv4 matching is really just 3 lines of code
:)

>>
>> The is_v4 member was needed when using a single array, but not necessary
any more when storing v4 and v6 entries separately, but I kept it just for
convenience, so that I don't need to use another arg in
parse_threshold_entry() to return the AF type. Using in6_addr can avoid the
extra arg, too, but similarly, we would have to always call the macro
IN6_IS_ADDR_V4MAPPED, which doesn't seem cleaner than the is_v4 field to
me. Using ip46_parse_cidr does save several lines of code, but that code
snippet itself is quite small. So with all these being considered, I think
either approach doesn't really make too much difference. I am totally fine
to change it if you still think a single in6_addr is better than the union.
>>
>> >>
>> >> +unsigned int plen;
>> >> +unsigned int threshold;
>> >
>> >
>> > I personally prefer sized integers.
>> >
>> I think it may be better to match the type of related helper functions.
>> For plen, it is ip[v6]_parse_cidr().
>> For threshold, it is str_to_uint(), or even smap_get_uint() returns
unsigned int.
>>
>> >>
>> >> +};
>> >> +
>> >> +/* Contains CIDR-based aging threshold configuration parsed from
>> >> + * "Logical_Router:options:mac_binding_age_threshold".
>> >> + *
>> >> + * This struct is also used for non-CIDR-based threshold, e.g. the
ones from
>> >> + * "NB_Global:other_config:fdb_age_threshold" for the common
aging_context
>> >> + * interface.
>

Re: [ovs-dev] [PATCH ovn] northd: Support CIDR-based MAC binding aging threshold.

2023-11-03 Thread Han Zhou
On Fri, Nov 3, 2023 at 1:08 AM Ales Musil  wrote:
>
>
>
> On Tue, Oct 24, 2023 at 9:36 PM Han Zhou  wrote:
>>
>> Enhance MAC_Binding aging to allow CIDR-based threshold configurations.
>> This enables distinct threshold settings for different IP ranges,
>> applying the longest prefix matching for overlapping ranges.
>>
>> A common use case involves setting a default threshold for all IPs, while
>> disabling aging for a specific range and potentially excluding a subset
>> within that range.
>>
>> Signed-off-by: Han Zhou 
>> ---
>
>
> Hi Han,
>
> thank you for the patch, I have a couple of comments down below.

Thanks for your review!

>
>>
>>  northd/aging.c  | 297 +---
>>  northd/aging.h  |   3 +
>>  northd/northd.c |  11 +-
>>  ovn-nb.xml  |  63 +-
>>  tests/ovn.at|  60 ++
>>  5 files changed, 413 insertions(+), 21 deletions(-)
>>
>> diff --git a/northd/aging.c b/northd/aging.c
>> index f626c72c8ca3..e5868211a63b 100644
>> --- a/northd/aging.c
>> +++ b/northd/aging.c
>> @@ -47,12 +47,253 @@ aging_waker_schedule_next_wake(struct aging_waker
*waker, int64_t next_wake_ms)
>>  }
>>  }
>>
>> +struct threshold_entry {
>> +union {
>> +ovs_be32 ipv4;
>> +struct in6_addr ipv6;
>> +} prefix;
>> +bool is_v4;
>
>
> We can avoid the is_v4 whatsover by storing the address as mapped v4 in
'in6_addr', I saw the concern and we can still have separate arrays for v4
and v6. The parsing IMO becomes much simpler if we use the mapped v4 for
both see down below.
>

I did consider using in6_addr, but it would make the
find_threshold_for_ip() slightly more tedious because for every v4 entry,
even if stored in a separate array, we will still need to call
in6_addr_get_mapped_ipv4() to convert it. While using the union we can
directly access what we need (ipv4 or ipv6).
The is_v4 member was needed when using a single array, but not necessary
any more when storing v4 and v6 entries separately, but I kept it just for
convenience, so that I don't need to use another arg in
parse_threshold_entry() to return the AF type. Using in6_addr can avoid the
extra arg, too, but similarly, we would have to always call the macro
IN6_IS_ADDR_V4MAPPED, which doesn't seem cleaner than the is_v4 field to
me. Using ip46_parse_cidr does save several lines of code, but that code
snippet itself is quite small. So with all these being considered, I think
either approach doesn't really make too much difference. I am totally fine
to change it if you still think a single in6_addr is better than the union.
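For comparison, the v4 half of the lookup with the union and per-family arrays
is roughly the following. This is a simplified sketch built on the struct
threshold_entry and struct threshold_config fields from the quoted patch, not
the actual aging.c code:

/* Sketch: longest-prefix lookup over the v4 array.  The array is sorted
 * by prefix length, longest first, so the first match wins. */
static unsigned int
find_threshold_for_ipv4(const struct threshold_config *cfg, ovs_be32 ip)
{
    for (int i = 0; i < cfg->n_v4_entries; i++) {
        const struct threshold_entry *e = &cfg->v4_entries[i];
        /* The few lines of v4 matching: build the mask and compare. */
        ovs_be32 mask = htonl(e->plen ? 0xffffffffu << (32 - e->plen) : 0);

        if (!((ip ^ e->prefix.ipv4) & mask)) {
            return e->threshold;
        }
    }
    return cfg->default_threshold;
}

The IPv6 variant walks v6_entries with the same structure, which is the small
duplication that the single-array alternative would avoid at the cost of the
mapped-v4 handling discussed earlier in the thread.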

>>
>> +unsigned int plen;
>> +unsigned int threshold;
>
>
> I personally prefer sized integers.
>
I think it may be better to match the type of related helper functions.
For plen, it is ip[v6]_parse_cidr().
For threshold, it is str_to_uint(), or even smap_get_uint() returns
unsigned int.

>>
>> +};
>> +
>> +/* Contains CIDR-based aging threshold configuration parsed from
>> + * "Logical_Router:options:mac_binding_age_threshold".
>> + *
>> + * This struct is also used for non-CIDR-based threshold, e.g. the ones from
>> + * "NB_Global:other_config:fdb_age_threshold" for the common aging_context
>> + * interface.
>> + *
>> + * - The arrays `v4_entries` and `v6_entries` are populated with parsed entries
>> + *   for IPv4 and IPv6 CIDRs, respectively, along with their associated
>> + *   thresholds.  Entries within these arrays are sorted by prefix length,
>> + *   starting with the longest.
>> + *
>> + * - If a threshold is provided without an accompanying prefix, it's captured
>> + *   in `default_threshold`.  In cases with multiple unprefixed thresholds,
>> + *   `default_threshold` will only store the last one.  */
>> +struct threshold_config {
>> +struct threshold_entry *v4_entries;
>> +int n_v4_entries;
>> +struct threshold_entry *v6_entries;
>> +int n_v6_entries;
>> +unsigned int default_threshold;
>> +};
>> +
>> +static int
>> +compare_entries_by_prefix_length(const void *a, const void *b)
>> +{
>> +const struct threshold_entry *entry_a = a;
>> +const struct threshold_entry *entry_b = b;
>> +
>> +return entry_b->plen - entry_a->plen;
>> +}
>> +
>> +/* Parse an ENTRY in the threshold option, with the format:
>> + * [CIDR:]THRESHOLD
>> + *
>> + * Returns true if successful, false if failed. */
>> +static bool
>> +parse_threshold_entry(const char *str, struct threshold_entry *entry)
>
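To make the format above concrete, an illustrative configuration (the values
are made up, and the comma used to separate entries here is an assumption; the
ovn-nb.xml change in the patch is the authoritative description of the syntax):

    options:mac_binding_age_threshold="192.168.10.0/24:60,192.168.0.0/16:0,300"

With entries sorted longest prefix first, a binding for 192.168.10.5 would age
out after 60 seconds, other addresses in 192.168.0.0/16 would never age (a
threshold of 0 disables aging), and everything else would fall back to the
unprefixed default of 300 seconds.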

[ovs-dev] [PATCH ovn] physical: Fix else-if typo.

2023-11-02 Thread Han Zhou
It is not causing any problems so far, but fix it to avoid potential
problems in the future.

Fixes: c6b20c9940ed ("ovn-ic: Support IGMP/MLD in multi-AZ deployments.")
Signed-off-by: Han Zhou 
---
 controller/physical.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/controller/physical.c b/controller/physical.c
index 72e88a203a66..41c38abc4a58 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -2043,7 +2043,7 @@ consider_mc_group(struct ovsdb_idl_index *sbrec_port_binding_by_name,
 local_output_pb(port->tunnel_key, _ofpacts);
 local_output_pb(port->tunnel_key, _ofpacts_ramp);
 }
-} if (!strcmp(port->type, "remote")) {
+} else if (!strcmp(port->type, "remote")) {
 if (port->chassis) {
 put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32,
  _ofpacts);
-- 
2.38.1
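
For context, a minimal standalone example (schematic, not the OVN code) of why
"} if (...)" and "} else if (...)" are not equivalent in general, even though
they happen to behave the same in consider_mc_group():

#include <stdio.h>
#include <string.h>

static void
without_else(const char *type)
{
    if (!strcmp(type, "patch")) {
        printf("patch\n");
    } if (!strcmp(type, "remote")) {    /* evaluated even after a match */
        printf("remote\n");
    }
}

static void
with_else(const char *type)
{
    if (!strcmp(type, "patch")) {
        printf("patch\n");
    } else if (!strcmp(type, "remote")) {   /* skipped once "patch" matched */
        printf("remote\n");
    }
}

int
main(void)
{
    without_else("patch");  /* prints "patch" */
    with_else("patch");     /* prints "patch" */
    /* The outputs only diverge when one input can satisfy both tests, or
     * when the first branch changes state that the second test reads.
     * Neither happens with these port types, which is why the typo was
     * harmless, but the "else" states the intended mutual exclusion. */
    return 0;
}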

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Support CIDR-based MAC binding aging threshold.

2023-11-02 Thread Han Zhou
On Wed, Nov 1, 2023 at 11:25 AM Ilya Maximets  wrote:
>
> On 10/26/23 18:51, Han Zhou wrote:
> >
> >
> > On Wed, Oct 25, 2023 at 11:15 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>
> >> On 10/24/23 21:35, Han Zhou wrote:
> >> > Enhance MAC_Binding aging to allow CIDR-based threshold configurations.
> >> > This enables distinct threshold settings for different IP ranges,
> >> > applying the longest prefix matching for overlapping ranges.
> >> >
> >> > A common use case involves setting a default threshold for all IPs, while
> >> > disabling aging for a specific range and potentially excluding a subset
> >> > within that range.
> >> >
> >> > Signed-off-by: Han Zhou <hz...@ovn.org>
> >> > ---
> >> >  northd/aging.c  | 297 +---
> >> >  northd/aging.h  |   3 +
> >> >  northd/northd.c |  11 +-
> >> >  ovn-nb.xml  |  63 +-
> >> >  tests/ovn.at|  60 ++
> >> >  5 files changed, 413 insertions(+), 21 deletions(-)
> >> >
> >> > diff --git a/northd/aging.c b/northd/aging.c
> >> > index f626c72c8ca3..e5868211a63b 100644
> >> > --- a/northd/aging.c
> >> > +++ b/northd/aging.c
> >> > @@ -47,12 +47,253 @@ aging_waker_schedule_next_wake(struct aging_waker *waker, int64_t next_wake_ms)
> >> >  }
> >> >  }
> >> >
> >> > +struct threshold_entry {
> >> > +union {
> >> > +ovs_be32 ipv4;
> >> > +struct in6_addr ipv6;
> >> > +} prefix;
> >> > +bool is_v4;
> >>
> >> Hi, Han.  Thanks for the patch!
> >>
> >> Not a full review, but I wonder if it would be cleaner to replace
> >> all the structure members above with a single 'struct in6_addr prefix;'
> >> and store ipv4 addresses as ipv6-mapped ipv4.  This will allow to use
> >> a single array for storing the entries as well.  May save some lines
> >> of code.
> >>
> >> What do you think?
> >
> > Thanks Ilya for the review. In fact, using a common in6_addr in a single
> > array was my first attempt, but then I realized that if there are both IPv4
> > and IPv6 entries in the array, each MAC binding check, say for an IPv4
> > binding, may have to go through all the IPv6 entries unnecessarily. If
> > there is a huge number of MAC bindings to check, the time may be doubled.
> > Probably this doesn't have a big impact, but using two arrays didn't seem
> > to add too many lines of code, so I went with the current *safe* approach.
> > Let me know if you have a strong preference for the alternative, and I am
> > happy to change it.
>
> That makes sense.
>
> Though, if performance here is a concern, we probably should have an
> actual I-P for MAC binding/FDB aging.  It should be possible to keep
> track of current entries in a heap or a skiplist based on the expected
> removal time and adjust whenever actual Sb tables change and not walk
> over the whole table each time.
>
> Another idea to keep in mind is to use a classifier instead of array,
> but this will only make sense with a large number of different CIDRs
> in the configuration, otherwise the overhead of the classifier will
> likely be higher than simple array traversal.
>
> With the lack of I-P, I think, it's OK to have the entries in separate
> arrays for now.
>
Cool. I'd say performance is probably not a big concern right now, though I
was also trying to avoid adding an obvious performance penalty. I agree
that when performance becomes a real concern we should do I-P.
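
A very rough sketch of the removal-time-ordered tracking idea mentioned above;
this is purely hypothetical and not implemented by the patch, with a sorted
array standing in for the heap or skiplist:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct aging_node {
    int64_t expire_at_ms;   /* row timestamp plus the matching threshold */
    const char *row_id;     /* stand-in for a MAC_Binding/FDB row reference */
};

static int
compare_expiry(const void *a_, const void *b_)
{
    const struct aging_node *a = a_;
    const struct aging_node *b = b_;

    return (a->expire_at_ms > b->expire_at_ms)
           - (a->expire_at_ms < b->expire_at_ms);
}

int
main(void)
{
    struct aging_node nodes[] = {
        { 5000, "row-a" }, { 1000, "row-b" }, { 3000, "row-c" },
    };
    int64_t now = 2500;
    size_t i = 0;

    qsort(nodes, 3, sizeof nodes[0], compare_expiry);

    /* On wake-up, only the front of the queue needs to be examined. */
    while (i < 3 && nodes[i].expire_at_ms <= now) {
        printf("expire %s\n", nodes[i].row_id);  /* would delete the Sb row */
        i++;
    }
    if (i < 3) {
        printf("next wake-up at %lld ms\n", (long long) nodes[i].expire_at_ms);
    }
    return 0;
}

An incremental implementation would reinsert nodes whenever the corresponding
Sb rows or thresholds change, instead of re-sorting everything as this toy
example does.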

> >
> >>
> >> Also, this patch, probably, needs a NEWS entry.
> >
> > I was wondering if this change is big enough to add one. Now I will add
> > it since you brought it up :).
>
> It's a new user-facing API.  Should be in the NEWS, IMO.
>

OK, how about the below change:


diff --git a/NEWS b/NEWS
index 425dfe0a84e1..6352990ae8f5 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,7 @@
 Post v23.09.0
 -
+  - Support CIDR based MAC binding aging threshold. See man ovn-nb for
+'mac_binding_age_threshold' for more details.

 OVN v23.09.0 - 15 Sep 2023
 --
--

I can add it in v2 or when merging if there are no other comments.

Thanks,
Han

> >
> > Thanks,
> > Han
> >
> >>
> >> Best regards, Ilya Maximets.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


  1   2   3   4   5   6   7   8   9   10   >