Re: [ovs-dev] [PATCH v3] route-table: Filter route changes by interface.

2024-03-26 Thread Cheng Li
On Tue, Mar 26, 2024 at 06:42:02PM +0100, Ilya Maximets wrote:
> On 3/24/24 13:16, Cheng Li wrote:
> > When the ovs host is also a kubernetes node, pod creation/deletion may
> > trigger route changes.  As a result, ovs runs route_table_reset().
> > As ovs does not care about the kubernetes pod routes,
> > route_table_reset() is not necessary.
> > 
> > Signed-off-by: Cheng Li 
> > ---
> 
> Hi, Cheng Li.  Thanks for the patch!
> 
> I'm a little confused by the use case though.  Could you explain
> a bit more why this is a problem (route dump is relatively fast,
> unless there are millions of routes) and how this change helps?
> 
> AFAIU, routes will not change much in kubernetes environment once
> the pod is initially created and configured and the port creation
> will trigger route cache reset anyway.
> 

Hi Ilya, thanks for reviewing this patch.

When calico is used as the kubernetes CNI and IPinIP overlay mode[1] is
enabled, each time a pod is created/deleted, a route (dev tunl0) is
advertised across all cluster nodes.
```
# ip monitor route
10.233.75.61 via 11.46.8.90 dev tunl0 proto bird onlink
```

If we have a large cluster, route updates may happen at high
frequency, which triggers lots of route_table_reset() calls.

In route_table_reset(), all ovs route items are first deleted, then the
latest kernel route items are added.  There is a time gap between the
old route items all being deleted and the new route items being ready.
During this gap, upcall/revalidation generates bad datapath flows.  As
ovs does not care about kubernetes route changes, adding a filter seems
to be the simplest way to resolve this issue.

[1] https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip
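
To make the idea concrete, here is a rough sketch of the shape such a
filter could take in lib/route-table.c.  This is not the actual patch;
the helper route_iface_is_relevant() is hypothetical:
```
/* Sketch only: skip cache invalidation for route updates on interfaces
 * that OVS does not track (e.g. calico's "tunl0"), instead of running
 * route_table_reset() on every pod route change. */
static void
route_table_change(const struct route_table_msg *change, void *aux OVS_UNUSED)
{
    if (change && !route_iface_is_relevant(change->rd.ifname)) {
        return;                     /* Pod route churn: nothing to do. */
    }
    route_table_valid = false;      /* Forces a reset on the next run. */
}
```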

> And we will need to reset the cache when new interfaces are
> added/removed from the filter, because otherwise we'll have
> stale routes in there and the cache may become inconsistent.
> 

Makes sense, will fix in the next version.

> Best regards, Ilya Maximets


Re: [ovs-dev] [PATCH v2 1/5] ovsdb: raft: Avoid transferring leadership to unavailable servers.

2024-03-26 Thread Han Zhou
On Tue, Mar 26, 2024 at 10:26 AM Ilya Maximets  wrote:
>
> Current implementation of the leadership transfer just shoots the
> leadership in the general direction of the first stable server in the
> configuration.  It doesn't check if the server was active recently or
> even that the connection is established.  This may result in sending
> leadership to a disconnected or otherwise unavailable server.
>
> Such behavior should not cause log truncation or any other correctness
> issues because the destination server would have all the append
> requests queued up or the connection will be dropped by the leader.
> In a worst case we will have a leader-less cluster until the next
> election timer fires up.  Other servers will notice the absence of the
> leader and will trigger a new leader election normally.
> However, the potential wait for the election timer is not good as
> real-world setups may have high values configured.
>
> Fix that by trying to transfer to servers that we know have applied
> the most changes, i.e., have the highest 'match_index'.  Such servers
> replied to the most recent append requests, so they have the highest
> chances of being healthy.  Choose a random starting point in the
> list of such servers so we don't transfer to the same server every
> single time.  This slightly improves load distribution but, most
> importantly, increases robustness of our test suite, making it cover
> more cases.  Also check that the message was actually sent without
> immediate failure.
>
> If we fail to transfer to any server with the highest index, try to
> just transfer to any other server that is not behind majority and then
> just any other server that is connected.  We did actually send them
> all the updates (if the connection is open), they just didn't reply
> yet for one reason or another.  It should be better than leaving the
> cluster without a leader.
>
> Note that there is always a chance that transfer will fail, since
> we're not waiting for it to be acknowledged (and must not wait).
> In this case, normal election will be triggered after the election
> timer fires up.
>
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
> Signed-off-by: Ilya Maximets 
> ---
>
> CC: Felix Huettner 
>
>  ovsdb/raft.c | 48 
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> index f463afcb3..b171da345 100644
> --- a/ovsdb/raft.c
> +++ b/ovsdb/raft.c
> @@ -1261,10 +1261,30 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
>  return;
>  }
>
> -struct raft_server *s;
> +struct raft_server **servers, *s;
> +uint64_t threshold = 0;
> +size_t n = 0, start, i;
> +
> +servers = xmalloc(hmap_count(&raft->servers) * sizeof *servers);
> +
>  HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
> -if (!uuid_equals(&raft->sid, &s->sid)
> -&& s->phase == RAFT_PHASE_STABLE) {
> +if (uuid_equals(&raft->sid, &s->sid)
> +|| s->phase != RAFT_PHASE_STABLE) {
> +continue;
> +}
> +if (s->match_index > threshold) {
> +threshold = s->match_index;
> +}
> +servers[n++] = s;
> +}
> +
> +start = n ? random_range(n) : 0;
> +
> +retry:
> +for (i = 0; i < n; i++) {
> +s = servers[(start + i) % n];
> +
> +if (s->match_index >= threshold) {
>  struct raft_conn *conn = raft_find_conn_by_sid(raft, &s->sid);
>  if (!conn) {
>  continue;
> @@ -1280,7 +1300,10 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
>  .term = raft->term,
>  }
>  };
> -raft_send_to_conn(raft, &rpc, conn);
> +
> +if (!raft_send_to_conn(raft, &rpc, conn)) {
> +continue;
> +}
>
>  raft_record_note(raft, "transfer leadership",
>   "transferring leadership to %s because %s",
> @@ -1288,6 +1311,23 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
>  break;
>  }
>  }
> +
> +if (n && i == n && threshold) {
> +if (threshold > raft->commit_index) {
> +/* Failed to transfer to servers with the highest 'match_index'.
> + * Try other servers that are not behind the majority. */
> +threshold = raft->commit_index;
> +} else {
> +/* Try any other server.  It is safe, because they either have all
> + * the append requests queued up for them before the leadership
> + * transfer message or their connection is broken and we will not
> + * transfer anyway. */
> +threshold = 0;
> +}
> +goto retry;

Thanks Ilya.  It seems the retry could try the earlier failed servers
(e.g. the ones for which raft_send_to_conn() returned false) one or two
more times, but it should be fine.

Re: [ovs-dev] [PATCH ovn v3 6/8] utilities/docker: Fix up container build.

2024-03-26 Thread Mark Michelson

On 3/25/24 05:09, Ales Musil wrote:



On Mon, Mar 25, 2024 at 9:47 AM Dumitru Ceara wrote:


On 3/22/24 19:34, Mark Michelson wrote:
 > On 3/21/24 19:15, Dumitru Ceara wrote:
 >> Most of the steps were inaccurate.  Instead, use latest Ubuntu, use
 >> OVS from the submodule inside the OVN repo.
 >>
 >> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
 >> ---
 >>   utilities/docker/Makefile  |  4 ++--
 >>   utilities/docker/debian/Dockerfile |  2 +-
 >>   utilities/docker/debian/build.sh   |  2 ++
 >>   utilities/docker/install_ovn.sh| 31 ++
 >>   4 files changed, 19 insertions(+), 20 deletions(-)
 >>
 >> diff --git a/utilities/docker/Makefile b/utilities/docker/Makefile
 >> index 57e95651cb..aad9c3482c 100644
 >> --- a/utilities/docker/Makefile
 >> +++ b/utilities/docker/Makefile
 >> @@ -1,5 +1,5 @@
 >> -#export OVN_BRANCH=master
 >> -#export OVN_VERSION=2.12
 >> +#export OVN_BRANCH=main
 >> +#export OVN_VERSION=24.03.90
 >
 > Is this something that we should update with each major release
of OVN?
 > If so, I could probably alter my release script to include updating
 > utilities/docker/Makefile as part of the release patches.
 >

It's just a comment and I guess the original intention was to show users
how to set this up.  But, back to your question, if it's not a lot of
work, automatically changing this on release would be nice.

Also, it might make sense to set up a CI job that actually runs this in
CI, maybe periodically.  I can try to find some time to do that at some
point in the future.  Wdyt?



Hi Mark, Dumitru,

from a different perspective, I would be interested to see whether this
is still used by anyone. We have other methods of running OVN in
containers that are up to date and maintained. It might be that it works
perfectly, is still used, and doesn't need any attention, but it would
probably still be useful to hear from the community whether that's the
case. IMO we shouldn't run CI on something just for the sake of running CI.


Does that make sense?


First off, with regards to this patch, I think it can be merged as-is, 
with no further action. If we want to continue this discussion, we 
shouldn't let it hold up the patch.


When it comes to your questions, Ales, they're really hard to answer :)

Is anyone using utilities/docker? I have no idea. I'd be willing to bet
most CMSes are using their own containerization methods instead of the
in-tree method. However, I don't know if there are people who use it
for development purposes. I know I tend to use ovn-fake-multinode during
development/testing, but that's just me.


When you mention having other methods of running OVN in containers, are
you referring to the content in utilities/containers? I agree that when
it comes to CI, we probably don't need the content in utilities/docker.
But it appears to have been written as a general-purpose method of
creating versioned container builds of OVN, rather than specifically for
CI. The content in utilities/docker is also quite out of date, I would
imagine, since it hasn't been updated since cluster support was added in
January 2020. There is also documentation throughout the tree that
refers to the content in utilities/docker. If a new user were to
download the OVN source and browse the documentation, they would try to
use the Makefile in utilities/docker.


In my opinion, the best thing to do is to merge the two containerization 
methods together into a single unified method. We can then use this for 
our CI, meaning we will keep it up to date and add necessary features as 
we see fit. This way we aren't keeping a "dead" method alive in the tree 
for no reason, and we won't be adding unnecessary CI either.


What do you think?




 >>   #export DISTRO=debian
 >>   #export GITHUB_SRC=https://github.com/ovn-org/ovn.git

 >>   #export DOCKER_REPO=ovn-org/ovn
 >> diff --git a/utilities/docker/debian/Dockerfile b/utilities/docker/debian/Dockerfile
 >> index 366ad6d4f3..a89ef46c9f 100644
 >> --- a/utilities/docker/debian/Dockerfile
 >> +++ b/utilities/docker/debian/Dockerfile
 >> @@ -1,4 +1,4 @@
 >> -FROM ubuntu:16.04
 >> +FROM ubuntu:22.04
 >>   MAINTAINER "Aliasgar Ginwala" <aginw...@ebay.com>
 >>     ARG OVN_BRANCH
 >> diff --git a/utilities/docker/debian/build.sh b/utilities/docker/debian/build.sh
 >> index 57ace5f505..6edb5b85e4 100755
 >> --- a/utilities/docker/debian/build.sh
 >> +++ b/utilities/docker/debian/build.sh
 >> @@ -12,6 +12,8 @@
 >>   # See the License for the specific language governing permissions and
 >>   # limitations under the License.
 >>   +set -e
 >> +
 >>   OVN_BRANCH=$1
 >>   GITHUB_SRC=$2
 

Re: [ovs-dev] [PATCH v3] route-table: Filter route changes by interface.

2024-03-26 Thread Ilya Maximets
On 3/24/24 13:16, Cheng Li wrote:
> When the ovs host is also a kubernetes node, pod creation/deletion may
> trigger route changes.  As a result, ovs runs route_table_reset().
> As ovs does not care about the kubernetes pod routes,
> route_table_reset() is not necessary.
> 
> Signed-off-by: Cheng Li 
> ---

Hi, Cheng Li.  Thanks for the patch!

I'm a little confused by the use case though.  Could you explain
a bit more why this is a problem (route dump is relatively fast,
unless there are millions of routes) and how this change helps?

AFAIU, routes will not change much in kubernetes environment once
the pod is initially created and configured and the port creation
will trigger route cache reset anyway.

And we will need to reset the cache when new interfaces are
added/removed from the filter, because otherwise we'll have
stale routes in there and the cache may become inconsistent.

Best regards, Ilya Maximets.


[ovs-dev] [PATCH v2 5/5] ovsdb: raft: Fix inability to join after leadership change round trip.

2024-03-26 Thread Ilya Maximets
Consider the following sequence of events:

 1. Cluster with 2 nodes - A and B.  A is a leader.
 2. C connects to A and sends a join request.
 3. A sends an append request to C.  C is in CATCHUP phase for A.
 4. A loses leadership to B.  Sends join failure notification to C.
 5. C sends append reply to A.
 6. A discards append reply (not leader).
 7. B loses leadership back to A.
 8. C sends a new join request to A.
 9. A replies with failure (already in progress).
 10. GoTo step 8.

At this point A is waiting for an append reply that it already
discarded at step 6 and fails all the new attempts of C to join with
'already in progress' verdict.  C stays forever in a joining state
and in a CATCHUP phase from A's perspective.

This is a similar case to a sudden disconnect from a leader fixed in
commit 999ba294fb4f ("ovsdb: raft: Fix inability to join the cluster
after interrupted attempt."), but since we're not disconnecting, the
servers are not getting destroyed.

Fix that by destroying all the servers that are not yet part of the
configuration after leadership is lost.  This way, server C will be
able to simply re-start the joining process from scratch.

New failure test command is added in order to simulate leadership
change before we receive the append reply, so it gets discarded.
New cluster test is added to exercise this scenario.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Reported-at: https://github.com/ovn-org/ovn/issues/235
Acked-by: Han Zhou 
Signed-off-by: Ilya Maximets 
---
 ovsdb/raft.c   | 16 -
 tests/ovsdb-cluster.at | 53 ++
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 0c171b754..d81a1758a 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -81,6 +81,7 @@ enum raft_failure_test {
 FT_STOP_RAFT_RPC,
 FT_TRANSFER_LEADERSHIP,
 FT_TRANSFER_LEADERSHIP_AFTER_SEND_APPEND_REQ,
+FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD,
 };
 static enum raft_failure_test failure_test;
 
@@ -2740,15 +2741,22 @@ raft_become_follower(struct raft *raft)
  * new configuration.  Our AppendEntries processing will properly update
  * the server configuration later, if necessary.
  *
+ * However, since we're sending replies about a failure to add, those new
+ * servers have to be cleaned up.  Otherwise, they will be stuck in a
+ * 'CATCHUP' phase in case this server regains leadership before they join
+ * through the current new leader.  They are not yet in 'raft->servers',
+ * so not part of the shared configuration.
+ *
  * Also we do not complete commands here, as they can still be completed
  * if their log entries have already been replicated to other servers.
  * If the entries were actually committed according to the new leader, our
  * AppendEntries processing will complete the corresponding commands.
  */
 struct raft_server *s;
-HMAP_FOR_EACH (s, hmap_node, &raft->add_servers) {
+HMAP_FOR_EACH_POP (s, hmap_node, &raft->add_servers) {
 raft_send_add_server_reply__(raft, &s->sid, s->address, false,
  RAFT_SERVER_LOST_LEADERSHIP);
+raft_server_destroy(s);
 }
 if (raft->remove_server) {
 raft_send_remove_server_reply__(raft, &raft->remove_server->sid,
@@ -4023,6 +4031,10 @@ raft_handle_add_server_request(struct raft *raft,
  "to cluster "CID_FMT, s->nickname, SID_ARGS(>sid),
  rq->address, CID_ARGS(>cid));
 raft_send_append_request(raft, s, 0, "initialize new server");
+
+if (failure_test == FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD) {
+failure_test = FT_TRANSFER_LEADERSHIP;
+}
 }
 
 static void
@@ -5148,6 +5160,8 @@ raft_unixctl_failure_test(struct unixctl_conn *conn OVS_UNUSED,
 } else if (!strcmp(test,
"transfer-leadership-after-sending-append-request")) {
 failure_test = FT_TRANSFER_LEADERSHIP_AFTER_SEND_APPEND_REQ;
+} else if (!strcmp(test, "transfer-leadership-after-starting-to-add")) {
+failure_test = FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD;
 } else if (!strcmp(test, "transfer-leadership")) {
 failure_test = FT_TRANSFER_LEADERSHIP;
 } else if (!strcmp(test, "clear")) {
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 482e4e02d..9d8b4d06a 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -525,6 +525,59 @@ for i in $(seq $n); do
 OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid])
 done
 
+AT_CLEANUP
+
+AT_SETUP([OVSDB cluster - leadership change before replication while joining])
+AT_KEYWORDS([ovsdb server negative unix cluster join])
+
+n=5
+AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db dnl
+  $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
+cid=$(ovsdb-tool db-cid s1.db)
+schema_name=$(ovsdb-tool schema-name 

[ovs-dev] [PATCH v2 3/5] ovsdb: raft: Fix permanent joining state on a cluster member.

2024-03-26 Thread Ilya Maximets
Consider the following chain of events:

 1. Have a cluster with 2 members - A and B.  A is a leader.
 2. C connects to A, sends a request to join the cluster.
 3. A catches up C, creates an update for the 'servers' list and sends
it to B and C to apply.  This entry is not committed yet.
 4. Before B or C can reply, A loses leadership for some reason.
 5. A sends a joining failure message to C, C remains in joining state.
 6. Both B and C have the new version of 'servers', so they recognize
    each other as valid cluster members.
 7. B initiates a vote, C (or A) replies and B becomes a new leader.
 8. B has a new list of servers.  B commits it.  C becomes a committed
    cluster member.
 9. A and C receive heartbeats with a new commit index and C is now
    a committed cluster member for all A, B and C.

However, at the end of this process, C is still in joining state
as it never received a successful reply for a join request, and C is
still in a COMMITTING phase for A.  So, C skips some parts of the RAFT
life cycle and A will refuse to transfer leadership to C if something
happens in the future.

More interestingly, B can actually transfer leadership to C and vote
for it.  A will vote for it just fine as well.  After that, C becomes
a new cluster leader while still in joining state.  In this state C
will not commit any changes.  So, we have a seemingly stable cluster
that doesn't commit any changes!  E.g.:

  s3
  Address: unix:s3.raft
  Status: joining cluster
  Remotes for joining: unix:s3.raft unix:s2.raft unix:s1.raft
  Role: leader
  Term: 4
  Leader: self
  Vote: self

  Last Election started 30095 ms ago, reason: leadership_transfer
  Last Election won: 30093 ms ago
  Election timer: 1000
  Log: [2, 7]
  Entries not yet committed: 2
  Entries not yet applied: 6
  Connections: ->s1 ->s2 <-s1 <-s2
  Disconnections: 0
  Servers:
s3 (60cf at unix:s3.raft) (self) next_index=7 match_index=6
s2 (46aa at unix:s2.raft) next_index=7 match_index=6 last msg 58 ms ago
s1 (28f7 at unix:s1.raft) next_index=7 match_index=6 last msg 59 ms ago

Fix the first scenario by examining server changes in committed log
entries.  This way server A can transition C to a STABLE phase and
server C can find itself in the committed list of servers and move out
from a joining state.  This is similar to completing commands without
receiving an explicit reply or after the role change from leader to
follower.

The second scenario with a leader in a joining state can be fixed
when the joining server becomes leader.  New leader's log is getting
committed automatically and all servers transition into STABLE phase
for it, but it should also move on from a joining state, since it
leads the cluster now.
It is also possible that B transfers leadership to C before the list
of servers is marked as committed on other servers.  In this case
C will commit its own addition to the cluster configuration.

The added test usually triggers both scenarios, but it will trigger at
least one of them.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou 
Signed-off-by: Ilya Maximets 
---
 ovsdb/raft.c   | 44 ++-
 tests/ovsdb-cluster.at | 53 ++
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index ec3a0ff66..5d7e88732 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -386,6 +386,7 @@ static void raft_get_servers_from_log(struct raft *, enum vlog_level);
 static void raft_get_election_timer_from_log(struct raft *);
 
 static bool raft_handle_write_error(struct raft *, struct ovsdb_error *);
+static bool raft_has_uncommitted_configuration(const struct raft *);
 
 static void raft_run_reconfigure(struct raft *);
 
@@ -2848,6 +2849,18 @@ raft_become_leader(struct raft *raft)
 raft_reset_election_timer(raft);
 raft_reset_ping_timer(raft);
 
+if (raft->joining) {
+/* It is possible that the server committing this one to the list of
+ * servers lost leadership before the entry is committed but after
+ * it was already replicated to majority of servers.  In this case
+ * other servers will recognize this one as a valid cluster member
+ * and may transfer leadership to it and vote for it.  This way
+ * we're becoming a cluster leader without receiving reply for a
+ * join request and will commit addition of this server ourselves. */
+VLOG_INFO_RL(&rl, "elected as leader while joining");
+raft->joining = false;
+}
+
 struct raft_server *s;
 HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
 raft_server_init_leader(raft, s);
@@ -3006,12 +3019,12 @@ raft_update_commit_index(struct raft *raft, uint64_t new_commit_index)
 }
 
 while (raft->commit_index < new_commit_index) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
 uint64_t index = 

[ovs-dev] [PATCH v2 4/5] ovsdb: raft: Fix assertion when 1-node cluster loses leadership.

2024-03-26 Thread Ilya Maximets
Some of the failure tests can make a single-node cluster
lose leadership.  In this case the next raft_run() will
trigger election with a pre-vote enabled.  This is causing
an assertion when this server attempts to vote for itself.

Fix that by not using pre-voting if there is only one server.

A new failure test introduced in later commit triggers this
assertion every time.

Fixes: 85634fd58004 ("ovsdb: raft: Support pre-vote mechanism to deal with disruptive server.")
Acked-by: Han Zhou 
Signed-off-by: Ilya Maximets 
---
 ovsdb/raft.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 5d7e88732..0c171b754 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -2120,7 +2120,7 @@ raft_run(struct raft *raft)
 raft_start_election(raft, true, false);
 }
 } else {
-raft_start_election(raft, true, false);
+raft_start_election(raft, hmap_count(&raft->servers) > 1, false);
 }
 
 }
-- 
2.44.0



[ovs-dev] [PATCH v2 2/5] ovsdb: raft: Fix time intervals for multitasking while joining.

2024-03-26 Thread Ilya Maximets
While joining, ovsdb-server may not wake up for the duration of the join
timer, which is 1 second and is by default 3x larger than the heartbeat
timer.  This causes unnecessary warnings from the cooperative
multitasking module, which thinks that we missed the heartbeat time by
a lot.

Use the join timer (1000 ms) instead while joining.

Fixes: d4a15647b917 ("ovsdb: raft: Enable cooperative multitasking.")
Acked-by: Han Zhou 
Signed-off-by: Ilya Maximets 
---

CC: Frode Nordahl 

 ovsdb/raft.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index b171da345..ec3a0ff66 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -280,6 +280,7 @@ struct raft {
 /* Used for joining a cluster. */
 bool joining; /* Attempting to join the cluster? */
 struct sset remote_addresses; /* Addresses to try to find other servers. */
+#define RAFT_JOIN_TIMEOUT_MS 1000
 long long int join_timeout;   /* Time to re-send add server request. */
 
 /* Used for leaving a cluster. */
@@ -1083,7 +1084,7 @@ raft_open(struct ovsdb_log *log, struct raft **raftp)
 raft_start_election(raft, false, false);
 }
 } else {
-raft->join_timeout = time_msec() + 1000;
+raft->join_timeout = time_msec() + RAFT_JOIN_TIMEOUT_MS;
 }
 
 raft_reset_ping_timer(raft);
@@ -2128,7 +2129,7 @@ raft_run(struct raft *raft)
 }
 
 if (raft->joining && time_msec() >= raft->join_timeout) {
-raft->join_timeout = time_msec() + 1000;
+raft->join_timeout = time_msec() + RAFT_JOIN_TIMEOUT_MS;
 LIST_FOR_EACH (conn, list_node, &raft->conns) {
 raft_send_add_server_request(raft, conn);
 }
@@ -2162,10 +2163,12 @@ raft_run(struct raft *raft)
 raft_reset_ping_timer(raft);
 }
 
+uint64_t interval = raft->joining
+? RAFT_JOIN_TIMEOUT_MS
+: RAFT_TIMER_THRESHOLD(raft->election_timer);
 cooperative_multitasking_set(
 &raft_run_cb, (void *) raft, time_msec(),
-RAFT_TIMER_THRESHOLD(raft->election_timer)
-+ RAFT_TIMER_THRESHOLD(raft->election_timer) / 10, "raft_run");
+interval + interval / 10, "raft_run");
 
 /* Do this only at the end; if we did it as soon as we set raft->left or
  * raft->failed in handling the RemoveServerReply, then it could easily
-- 
2.44.0



[ovs-dev] [PATCH v2 1/5] ovsdb: raft: Avoid transferring leadership to unavailable servers.

2024-03-26 Thread Ilya Maximets
Current implementation of the leadership transfer just shoots the
leadership in the general direction of the first stable server in the
configuration.  It doesn't check if the server was active recently or
even that the connection is established.  This may result in sending
leadership to a disconnected or otherwise unavailable server.

Such behavior should not cause log truncation or any other correctness
issues because the destination server would have all the append
requests queued up or the connection will be dropped by the leader.
In a worst case we will have a leader-less cluster until the next
election timer fires up.  Other servers will notice the absence of the
leader and will trigger a new leader election normally.
However, the potential wait for the election timer is not good as
real-world setups may have high values configured.

Fix that by trying to transfer to servers that we know have applied
the most changes, i.e., have the highest 'match_index'.  Such servers
replied to the most recent append requests, so they have the highest
chances of being healthy.  Choose a random starting point in the
list of such servers so we don't transfer to the same server every
single time.  This slightly improves load distribution but, most
importantly, increases robustness of our test suite, making it cover
more cases.  Also check that the message was actually sent without
immediate failure.

If we fail to transfer to any server with the highest index, try to
just transfer to any other server that is not behind majority and then
just any other server that is connected.  We did actually send them
all the updates (if the connection is open), they just didn't reply
yet for one reason or another.  It should be better than leaving the
cluster without a leader.

Note that there is always a chance that transfer will fail, since
we're not waiting for it to be acknowledged (and must not wait).
In this case, normal election will be triggered after the election
timer fires up.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Ilya Maximets 
---

CC: Felix Huettner 

 ovsdb/raft.c | 48 
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index f463afcb3..b171da345 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1261,10 +1261,30 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
 return;
 }
 
-struct raft_server *s;
+struct raft_server **servers, *s;
+uint64_t threshold = 0;
+size_t n = 0, start, i;
+
+servers = xmalloc(hmap_count(&raft->servers) * sizeof *servers);
+
 HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
-if (!uuid_equals(&raft->sid, &s->sid)
-&& s->phase == RAFT_PHASE_STABLE) {
+if (uuid_equals(&raft->sid, &s->sid)
+|| s->phase != RAFT_PHASE_STABLE) {
+continue;
+}
+if (s->match_index > threshold) {
+threshold = s->match_index;
+}
+servers[n++] = s;
+}
+
+start = n ? random_range(n) : 0;
+
+retry:
+for (i = 0; i < n; i++) {
+s = servers[(start + i) % n];
+
+if (s->match_index >= threshold) {
 struct raft_conn *conn = raft_find_conn_by_sid(raft, &s->sid);
 if (!conn) {
 continue;
@@ -1280,7 +1300,10 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
 .term = raft->term,
 }
 };
-raft_send_to_conn(raft, &rpc, conn);
+
+if (!raft_send_to_conn(raft, &rpc, conn)) {
+continue;
+}
 
 raft_record_note(raft, "transfer leadership",
  "transferring leadership to %s because %s",
@@ -1288,6 +1311,23 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
 break;
 }
 }
+
+if (n && i == n && threshold) {
+if (threshold > raft->commit_index) {
+/* Failed to transfer to servers with the highest 'match_index'.
+ * Try other servers that are not behind the majority. */
+threshold = raft->commit_index;
+} else {
+/* Try any other server.  It is safe, because they either have all
+ * the append requests queued up for them before the leadership
+ * transfer message or their connection is broken and we will not
+ * transfer anyway. */
+threshold = 0;
+}
+goto retry;
+}
+
+free(servers);
 }
 
 /* Send a RemoveServerRequest to the rest of the servers in the cluster.
-- 
2.44.0



[ovs-dev] [PATCH v2 0/5] ovsdb: raft: Fixes for cluster joining state.

2024-03-26 Thread Ilya Maximets


Issues discovered while working on:
  https://github.com/ovn-org/ovn/issues/235


Version 2:
  * Replaced the first patch with an implementation of leadership
transfer to servers that applied the most changes. [Felix, Han]
  * Fixed some typos. [Han]
  * Added a MACRO for the join timeout value. [Han]
  * Added Ack's from Han to patches 2-5.


Ilya Maximets (5):
  ovsdb: raft: Avoid transferring leadership to unavailable servers.
  ovsdb: raft: Fix time intervals for multitasking while joining.
  ovsdb: raft: Fix permanent joining state on a cluster member.
  ovsdb: raft: Fix assertion when 1-node cluster loses leadership.
  ovsdb: raft: Fix inability to join after leadership change round trip.

 ovsdb/raft.c   | 121 +
 tests/ovsdb-cluster.at | 106 
 2 files changed, 216 insertions(+), 11 deletions(-)

-- 
2.44.0



Re: [ovs-dev] [PATCH ovn v2 2/2] missed during test simplification.

2024-03-26 Thread 0-day Robot
Bleep bloop.  Greetings Jacob Tanenbaum, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject summary should start with a capital.
Subject: missed during test simplification.
Lines checked: 41, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


[ovs-dev] [PATCH ovn v2 2/2] missed during test simplification.

2024-03-26 Thread Jacob Tanenbaum
The testing was changed to remove the hardcoded table numbers from
flows.  Two references to table=28 were missed; this patch corrects that
and adds the correct variable name to the flow.

Fixes: f614335abca1 ("tests: Remove table numbers from "action parsing".")
Signed-off-by: Jacob Tanenbaum 
---
v2: addressed formatting issues seen by the day-0 bot; an additional
reference to table=28 was identified

diff --git a/tests/ovn.at b/tests/ovn.at
index f04b74210..d269077ac 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -22070,7 +22070,7 @@ check_row_count Port_Binding 1 logical_port=sw0-vir virtual_parent=sw0-p1
 wait_for_ports_up sw0-vir
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test 2 = `cat hv1/ovn-controller.log | grep "pinctrl received  packet-in" | \
-grep opcode=BIND_VPORT | grep OF_Table_ID=28 | wc -l`])
+grep opcode=BIND_VPORT | grep OF_Table_ID=$(ovn-debug lflow-stage-to-oftable ls_in_arp_rsp) | wc -l`])
 
 wait_row_count Port_Binding 1 logical_port=sw0-vir6 chassis=$hv1_ch_uuid
 check_row_count Port_Binding 1 logical_port=sw0-vir6 virtual_parent=sw0-p1
@@ -37684,7 +37684,7 @@ tpa=$(ip_to_hex 10 0 0 10)
 send_garp 1 1 $eth_src $eth_dst $spa $tpa
 
 OVS_WAIT_UNTIL([test 1 = `cat hv1/ovn-controller.log | grep "pinctrl received  packet-in" | \
-grep opcode=BIND_VPORT | grep OF_Table_ID=28 | wc -l`])
+grep opcode=BIND_VPORT | grep OF_Table_ID=$(ovn-debug lflow-stage-to-oftable ls_in_arp_rsp) | wc -l`])
 
 sleep_controller hv1
 
-- 
2.42.0



[ovs-dev] [PATCH ovn v2 1/2] Merge QoS logical pipelines.

2024-03-26 Thread Jacob Tanenbaum
Currently there are 2 QoS pipelines for ingress (ls_in_qos_mark,
ls_in_qos_meter) and egress (ls_out_qos_mark, ls_out_qos_meter).  This is
not necessary, as there are no actions across the two pipelines that
depend on each other.  The two pipelines can be merged.

Reported-at: https://issues.redhat.com/browse/FDP-397
Signed-off-by: Jacob Tanenbaum 
---
v2: addressed formatting from the day-0 bot
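
For context, a single NB QoS rule can carry both a DSCP action and a
bandwidth limit; after this change both are rendered from one stage per
direction instead of two.  An illustrative rule using the ovn-nbctl
qos-add syntax (switch, port, and values are made up):
```
ovn-nbctl qos-add sw0 from-lport 200 'inport == "lp0" && ip4' dscp=12 rate=10000 burst=1000
```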

diff --git a/northd/northd.c b/northd/northd.c
index 1839b7d8b..d5aab756f 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -3484,7 +3484,7 @@ build_lb_vip_actions(const struct ovn_northd_lb *lb,
 }
 
 if (reject) {
-int stage = ls_dp ? ovn_stage_get_table(S_SWITCH_OUT_QOS_MARK)
+int stage = ls_dp ? ovn_stage_get_table(S_SWITCH_OUT_QOS)
   : ovn_stage_get_table(S_ROUTER_OUT_SNAT);
 ds_clear(action);
 ds_put_format(action, "reg0 = 0; reject { outport <-> inport; "
@@ -6705,7 +6705,7 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
 "/* eth.dst <-> eth.src; ip.dst <-> ip.src; is implicit. */ "
 "outport <-> inport; next(pipeline=%s,table=%d); };",
 ingress ? "egress" : "ingress",
-ingress ? ovn_stage_get_table(S_SWITCH_OUT_QOS_MARK)
+ingress ? ovn_stage_get_table(S_SWITCH_OUT_QOS)
 : ovn_stage_get_table(S_SWITCH_IN_L2_LKUP));
 
 ovn_lflow_metered(lflows, od, stage, 1000,
@@ -7081,24 +7081,39 @@ build_qos(struct ovn_datapath *od, struct lflow_table *lflows,
   struct lflow_ref *lflow_ref) {
 struct ds action = DS_EMPTY_INITIALIZER;
 
-ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS_MARK, 0, "1", "next;",
+ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS, 0, "1", "next;",
   lflow_ref);
-ovn_lflow_add(lflows, od, S_SWITCH_OUT_QOS_MARK, 0, "1", "next;",
-  lflow_ref);
-ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS_METER, 0, "1", "next;",
-  lflow_ref);
-ovn_lflow_add(lflows, od, S_SWITCH_OUT_QOS_METER, 0, "1", "next;",
+ovn_lflow_add(lflows, od, S_SWITCH_OUT_QOS, 0, "1", "next;",
   lflow_ref);
 
 for (size_t i = 0; i < od->nbs->n_qos_rules; i++) {
 struct nbrec_qos *qos = od->nbs->qos_rules[i];
 bool ingress = !strcmp(qos->direction, "from-lport") ? true :false;
-enum ovn_stage stage = ingress ? S_SWITCH_IN_QOS_MARK : S_SWITCH_OUT_QOS_MARK;
+enum ovn_stage stage = ingress ? S_SWITCH_IN_QOS : S_SWITCH_OUT_QOS;
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
 int64_t rate = 0;
 int64_t burst = 0;
 
 ds_clear(&action);
+for (size_t n = 0; n < qos->n_bandwidth; n++) {
+if (!strcmp(qos->key_bandwidth[n], "rate")) {
+rate = qos->value_bandwidth[n];
+} else if (!strcmp(qos->key_bandwidth[n], "burst")) {
+burst = qos->value_bandwidth[n];
+}
+}
+if (rate) {
+stage = ingress ? S_SWITCH_IN_QOS : S_SWITCH_OUT_QOS;
+if (burst) {
+ds_put_format(&action,
+  "set_meter(%"PRId64", %"PRId64"); ",
+  rate, burst);
+} else {
+ds_put_format(&action,
+  "set_meter(%"PRId64"); ",
+  rate);
+}
+}
 for (size_t j = 0; j < qos->n_action; j++) {
 if (!strcmp(qos->key_action[j], "dscp")) {
 if (qos->value_action[j] > QOS_MAX_DSCP) {
@@ -7115,43 +7130,11 @@ build_qos(struct ovn_datapath *od, struct lflow_table 
*lflows,
   qos->value_action[j]);
 }
 }
-
-if (action.length) {
 ds_put_cstr(&action, "next;");
-ovn_lflow_add_with_hint(lflows, od, stage, qos->priority,
-qos->match, ds_cstr(&action),
-&qos->header_, lflow_ref);
-}
-
-for (size_t n = 0; n < qos->n_bandwidth; n++) {
-if (!strcmp(qos->key_bandwidth[n], "rate")) {
-rate = qos->value_bandwidth[n];
-} else if (!strcmp(qos->key_bandwidth[n], "burst")) {
-burst = qos->value_bandwidth[n];
-}
-}
-if (rate) {
-stage = ingress ? S_SWITCH_IN_QOS_METER : S_SWITCH_OUT_QOS_METER;
-ds_clear(&action);
-if (burst) {
-ds_put_format(&action,
-  "set_meter(%"PRId64", %"PRId64"); next;",
-  rate, burst);
-} else {
-ds_put_format(&action,
-  "set_meter(%"PRId64"); next;",
-  rate);
-}
-
-/* Ingress and Egress QoS Meter Table.
- *
- * We limit the bandwidth of this flow by adding a meter table.
- */
 

Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.

2024-03-26 Thread Mike Pattrick
On Mon, Mar 25, 2024 at 10:04 PM Zhangweiwei  wrote:
>
> The ethernet addresses of the two ICMP request packets are indeed
> different.  One is the original packet and the other is modified.  It is
> expected behavior according to the code.
> Actually, when a packet sent by port A is changed by the flow table and
> then sent back to itself, we expect to capture this packet.  However,
> when this packet is changed and sent to another port, should we still
> capture the packet on port A?

Currently in ovs-tcpdump we capture the packet on ingress and then on
egress if it is modified.  You could create the mirror without
ovs-tcpdump and only set the select_dst_port option, but not the
select_src_port one.  select_src_port is checked during ingress and
select_dst_port during egress.
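
For illustration, a mirror that captures only on egress could be created
along these lines (a sketch using the port names from the report below;
untested):
```
ovs-vsctl -- --id=@dst get port tapVm71 \
          -- --id=@out get port mitapVm71 \
          -- --id=@m create mirror name=egress-only select-dst-port=@dst output-port=@out \
          -- set bridge vds1-br mirrors=@m
```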

Hope this helps,
M

>
> [root@localhost infiniband]# ovs-tcpdump -i tapVm71 -nnvve
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 17, length 64
> 09:36:52.822232 68:05:ca:21:d6:e5 > 52:54:00:67:d5:61, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 22101, offset 0, flags [none], 
> proto ICMP (1), length 84)
> 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 15498, seq 17, length 64
> 09:36:53.862137 52:54:00:67:d5:61 > 68:05:ca:21:d6:e5, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 64, id 26518, offset 0, flags [DF], proto 
> ICMP (1), length 84)
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 18, length 64
> 09:36:53.862139 68:05:ca:21:d6:e5 > 52:54:00:9a:bf:ed, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 26518, offset 0, flags [DF], proto 
> ICMP (1), length 84)
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 18, length 64
> 09:36:53.862230 68:05:ca:21:d6:e5 > 52:54:00:67:d5:61, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 22176, offset 0, flags [none], 
> proto ICMP (1), length 84)
> 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 15498, seq 18, length 64
>
> -----Original Message-----
> From: Mike Pattrick [mailto:m...@redhat.com]
> Sent: March 25, 2024 22:26
> To: zhangweiwei (RD) 
> Cc: d...@openvswitch.org
> Subject: Re: [PATCH] ofproto-dpif-upcall: Don't mirror packets that
> aren't modified.
>
> On Mon, Mar 25, 2024 at 3:48 AM Zhangweiwei  wrote:
> >
> > Hi,
> > I have tried this patch, however, there are still some issues when the 
> > packets contents are changed across recirculation. On the follow example, 
> > packets are modified in recirc_id(0) after mirror, the mirror context 
> > reset. Therefore, there are two ICMP request packets are mirrored on port 
> > mitapVm71.
> >
> > In the following example, ICMP packets ared sent from port(11) to
> > port(14), [root@localhost ~]# ovs-appctl dpif/dump-flows vds1-br
> > ct_state(-new-est-rel-rpl-inv-trk),recirc_id(0),in_port(11),packet_typ
> > e(ns=0,id=0),eth(src=52:54:00:67:d5:61,dst=68:05:ca:21:d6:e5),eth_type
> > (0x0800),ipv4(src=11.11.70.1,dst=1.1.70.2,proto=1,ttl=64,frag=no),
> > packets:431, bytes:42238, used:0.574s,
> > actions:10,set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:9a:bf:ed)),set(i
> > pv4(ttl=63)),ct(zone=6),recirc(0x3e8)
> > ct_state(+est-rel-rpl),recirc_id(0x3e8),in_port(11),packet_type(ns=0,i
> > d=0),eth_type(0x0800),ipv4(frag=no), packets:430, bytes:42140,
> > used:0.574s, actions:10,14
> >
> > ct_state(-new+est-rel+rpl-inv+trk),recirc_id(0x3e9),in_port(14),packet
> > _type(ns=0,id=0),eth(src=52:54:00:9a:bf:ed,dst=68:05:ca:21:d6:e5),eth_
> > type(0x0800),ipv4(dst=11.11.70.1,proto=1,ttl=64,frag=no), packets:431,
> > bytes:42238, used:0.574s,
> > actions:set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:67:d5:61)),set(ipv4
> > (ttl=63)),11,10
> > ct_state(-trk),recirc_id(0),in_port(14),packet_type(ns=0,id=0),eth(src
> > =52:54:00:9a:bf:ed),eth_type(0x0800),ipv4(src=1.1.70.2,proto=1,frag=no
> > ), packets:431, bytes:42238, used:0.574s,
> > actions:ct(zone=6),recirc(0x3e9)
> >
> > [root@localhost ~]# ovs-appctl dpif/show
> > netdev@ovs-netdev: hit:2552 missed:3019
> >   vds1-br:
> > mitapVm71 14/10: (system)
> > tapVm71 5/11: (dpdkvhostuserclient: configured_rx_queues=1, 
> > configured_tx_queues=1, mtu=1500, requested_rx_queues=1, 
> > requested_tx_queues=1)
> > tapVm72 6/14: (dpdkvhostuserclient: configured_rx_queues=1,
> > configured_tx_queues=1, mtu=1500, requested_rx_queues=1,
> > requested_tx_queues=1)
> >
> > [root@localhost ~]# ovs-tcpdump -i tapVm71
> > 14:38:53.702142 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2014, length 64
> > 14:38:53.702143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2014, length 64
> > 14:38:53.702185 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483,
> > seq 2014, length 64
> > 14:38:54.742141 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2015, length 64
> > 14:38:54.742143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2015, length 64
> > 14:38:54.742183 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483,
> > seq 2015, length 64
> > 14:38:55.782142 IP 11.11.70.1 > 

Re: [ovs-dev] [PATCH v13 5/6] tests: system-traffic: Add coverage for drop action.

2024-03-26 Thread Ilya Maximets
On 3/22/24 15:34, Eelco Chaudron wrote:
> 
> 
> On 22 Mar 2024, at 14:54, Eric Garver wrote:
> 
>> Exercise the drop action in the datapath.  This specific test triggers
>> an xlate_error.
>>
>> For the kernel datapath skb drop reasons can then be seen while this
>> test runs.
>>
>>  # perf trace -e skb:kfree_skb --filter="reason >= 0x3"
>>0.000 ping/1275884 skb:kfree_skb(skbaddr: 0x8acd76546000, \
>>   location: 0xc0ee3634, protocol: 2048, reason: 196611)
>>
>> Signed-off-by: Eric Garver 
> 
> Thank you, Eric, for implementing this final change. With the ACK below,
> the entire series appears satisfactory to me.
> 
> I'll hold off on applying the series until Ilya has had another opportunity
> to review it, considering his comments on a previous revision. Once he
> approves, I'll proceed with applying the series.

Thanks, Eelco and Eric!

I plan to take another look at this revision this week.

Best regards, Ilya Maximets.

> 
> Cheers,
> 
> Eelco
> 
> Acked-by: Eelco Chaudron 
> 



[ovs-dev] [PATCH ovn] tests: Add macro for checking flows after recompute.

2024-03-26 Thread Xavier Simonart
The macro CHECK_FLOWS_AFTER_RECOMPUTE dumps the OpenFlow flows, then
recomputes, then dumps the OpenFlow flows again, and finally compares
both sets of flows.  The test fails if the flows differ.
As of now, the macro cannot be used in all tests: many tests would fail
because I+P does not properly remove flows when the last logical port of
a datapath is deleted.
Signed-off-by: Xavier Simonart 
---
 tests/ovn-macros.at | 44 
 1 file changed, 44 insertions(+)

diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index ed93764d3..11377f616 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -10,6 +10,50 @@ m4_define([OVN_CLEANUP_VSWITCH],[
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 ])
 
+# DUMP_FLOWS(sbox, output_file)
+# Dump openflows to output_file for sbox
+m4_define([DUMP_FLOWS], [
+sbox=$1
+output_file=$2
+as $sbox
+ovs-ofctl dump-flows br-int |
+  sed 's/cookie=0x[[^,]]*/cookie=xx/g' |
+  sed 's/duration=[[^,]]*/duration=xx/g' |
+  sed 's/idle_age=[[^,]]*/idle_age=xx/g' |
+  sed 's/, hard_age=[[^,]]*//g' |
+  sed 's/n_bytes=[[^,]]*/n_bytes=xx/g' |
+  sed 's/n_packets=[[^,]]*/n_packets=xx/g' |
+  sed 's/conjunction([[^,]]*/conjunction(xx/g' |
+  sort > $output_file
+])
+
+m4_define([CHECK_FLOWS_AFTER_RECOMPUTE], [
+hv=$1
+sbox=$2
+# Make sure I+P has finalized its job before getting flows and comparing them after recompute.
+# Some tests have northd and ovn-nb ovsdb stopped, so avoid ovn-nbctl for those.
+if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then
+# Do wait twice to handle some potential race conditions
+check ovn-nbctl --wait=hv sync
+check ovn-nbctl --wait=hv sync
+fi
+
+as $sbox
+if test "$hv" != "vtep"; then
+  # Get flows before and after recompute
+  DUMP_FLOWS([$sbox], [flows-$hv-1])
+
+  check ovn-appctl -t ovn-controller recompute
+  # The recompute might cause some sb changes. Let controller catch up.
+  if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then
+  check ovn-nbctl --wait=hv sync
+  fi
+  DUMP_FLOWS([$sbox], [flows-$hv-2])
+  diff flows-$hv-1 flows-$hv-2 > flow-diff
+  AT_CHECK([test $(diff flows-$hv-1 flows-$hv-2 | wc -l) == 0])
+fi
+])
+
 # OVN_CLEANUP_CONTROLLER(sbox)
 #
 # Gracefully terminate ovn-controller in the specified
-- 
2.31.1
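
For illustration, a hypothetical call site in a test, following the
macro's argument order as defined above (hypervisor name, then sandbox):
```
# Verify that a full recompute yields the same OpenFlow flows as I+P.
CHECK_FLOWS_AFTER_RECOMPUTE([hv1], [hv1])
```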

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] automake: Make system tests dependent on ovn-macros.

2024-03-26 Thread Xavier Simonart
So the system testsuite will be rebuilt when ovn-macros.at is changed.

Signed-off-by: Xavier Simonart 
---
 tests/automake.mk | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/automake.mk b/tests/automake.mk
index f6f0f0e33..1fdc89835 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -21,6 +21,7 @@ EXTRA_DIST += \
 COMMON_MACROS_AT = \
tests/ovsdb-macros.at \
tests/ovs-macros.at \
+   tests/ovn-macros.at \
tests/ofproto-macros.at
 
 TESTSUITE_AT = \
-- 
2.31.1



[ovs-dev] [PATCH ovn v2 1/3] tests: Add macros to pause controller updates.

2024-03-26 Thread Xavier Simonart
Such macros can then be used, for instance, to create a condition where
the sb is seen as read-only by ovn-controller.

Signed-off-by: Xavier Simonart 
---
 tests/ovn-macros.at | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index ed93764d3..bcfd6a521 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -220,12 +220,14 @@ ovn_start_northd() {
 # options are accepted to adjust that:
 #   --backup-northd Start a backup northd.
 #   --backup-northd=paused  Start the backup northd in the paused state.
+#   --use-tcp-to-sb Use tcp to connect to sb.
 ovn_start () {
 local backup_northd=false
 local backup_northd_options=
 case $1 in
 --backup-northd) backup_northd=true; shift ;;
--backup-northd=paused) backup_northd=true; backup_northd_options=--paused; shift ;;
+--use-tcp-to-sb) use_tcp=true; shift ;;
 esac
 local AZ=$1
 local msg_prefix=${AZ:+$AZ: }
@@ -246,7 +248,13 @@ ovn_start () {
 ovn_start_northd $backup_northd_options backup $AZ
 fi
 
-if test X$HAVE_OPENSSL = Xyes; then
+if test $use_tcp; then
+# Create the SB DB ptcp connection.
+ovn-sbctl \
+-- --id=@c create connection \
+target=\"ptcp:0:127.0.0.1\" \
+-- add SB_Global . connections @c
+elif test X$HAVE_OPENSSL = Xyes; then
 # Create the SB DB pssl+RBAC connection.
 ovn-sbctl \
 -- --id=@c create connection \
@@ -973,6 +981,21 @@ wake_up_ovsdb() {
   AT_CHECK([kill -CONT $(cat $1/ovsdb-server.pid)])
 }
 
+stop_ovsdb_controller_updates() {
+  TCP_PORT=$1
+  echo Stopping updates from ovn-controller to ovsdb using port $TCP_PORT
+  on_exit 'nft list tables | grep ovn-test && nft delete table ip ovn-test'
+  nft add table ip ovn-test
+  nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }'
+  nft add rule ip ovn-test INPUT tcp dport $TCP_PORT counter drop
+}
+restart_ovsdb_controller_updates() {
+  TCP_PORT=$1
+  echo Restarting updates from ovn-controller to ovsdb
+  nft list ruleset | grep $TCP_PORT
+  nft delete table ip ovn-test
+}
+
 trim_zeros() {
 sed 's/\(00\)\{1,\}$//'
 }
-- 
2.31.1
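
For context, a sketch of how these helpers fit together (the SB TCP port
is assumed to come from PARSE_LISTENING_PORT, as in the next patch of
this series):
```
PARSE_LISTENING_PORT([$ovs_base/ovn-sb/ovsdb-server.log], [TCP_PORT])
stop_ovsdb_controller_updates $TCP_PORT
# ... make changes while ovn-controller sees the SB as read-only ...
restart_ovsdb_controller_updates $TCP_PORT
```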



[ovs-dev] [PATCH ovn v2 2/3] pinctrl: Fix missing MAC_Bindings.

2024-03-26 Thread Xavier Simonart
Pinctrl is responsible for creating MAC_Bindings on peer router datapaths.
However, when the sb was read-only, this did not happen.
This caused the test "neighbor update on same HV" to fail in a flaky way.

Signed-off-by: Xavier Simonart 

---
v2: - Fix userspace tests
- Replace iptables by nftables based on Ales's feedback.
- Move stop/restart_ovsdb_controller_updates to ovn-macros.
- Rebase on origin/main.
---
 controller/pinctrl.c |   2 +-
 tests/system-ovn.at  | 114 +++
 2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 2d3595cd2..f75b04696 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -4711,7 +4711,7 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn,
 garp_rarp->announce_time = time_msec() + 1000;
 garp_rarp->backoff = 1000; /* msec. */
 }
-} else {
+} else if (ovnsb_idl_txn) {
 add_garp_rarp(name, laddrs->ea,
   laddrs->ipv4_addrs[i].addr,
   binding_rec->datapath->tunnel_key,
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 54d913c0b..26bb331b6 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -12208,3 +12208,117 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port 
patch-.*/d
 /connection dropped.*/d"])
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([MAC_Bindings updates on read-only sb])
+ovn_start --use-tcp-to-sb
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-int])
+
+PARSE_LISTENING_PORT([$ovs_base/ovn-sb/ovsdb-server.log], [TCP_PORT])
+
+# Use tcp to connect to sb
+ovs-vsctl \
+-- set Open_vSwitch . external-ids:system-id=hv1 \
+-- set Open_vSwitch . external-ids:ovn-remote=tcp:127.0.0.1:$TCP_PORT \
+-- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
+-- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
+-- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+
+# Start ovn-controller
+start_daemon ovn-controller
+
+# Logical network:
+# A public switch (pub) with a localnet port connected to two LRs (lr0 and lr1)
+# each with a distributed gateway port.
+# Two VMs: lp0 on sw0 connected to lr0
+#  lp1 on sw1 connected to lr1
+#
+# This test adds a floating IP on one VM and checks that the MAC_Binding entries are updated properly.
+
+# Create logical switches
+check ovn-nbctl ls-add sw0
+check ovn-nbctl ls-add sw1
+check ovn-nbctl ls-add pub
+
+# Create localnet port on public switch
+check ovn-nbctl lsp-add pub ln-pub
+check ovn-nbctl lsp-set-type ln-pub localnet
+check ovn-nbctl lsp-set-addresses ln-pub unknown
+check ovn-nbctl lsp-set-options ln-pub network_name=phys
+
+# Create logical routers and connect them to public switch
+AT_CHECK([(ovn-nbctl create Logical_Router name=lr0;
+   ovn-nbctl create Logical_Router name=lr1) | uuidfilt], [0], [<0>
+<1>
+])
+check ovn-nbctl lrp-add lr0 lr0-pub f0:00:00:00:00:01 172.24.4.220/24
+check ovn-nbctl lsp-add pub pub-lr0 -- set Logical_Switch_Port pub-lr0 \
+type=router options:router-port=lr0-pub options:nat-addresses="router" addresses="router"
+check ovn-nbctl lrp-add lr1 lr1-pub f0:00:00:00:01:01 172.24.4.221/24
+check ovn-nbctl lsp-add pub pub-lr1 -- set Logical_Switch_Port pub-lr1 \
+type=router options:router-port=lr1-pub options:nat-addresses="router" addresses="router"
+
+check ovn-nbctl lrp-set-gateway-chassis lr0-pub hv1 10
+check ovn-nbctl lrp-set-gateway-chassis lr1-pub hv1 10
+
+# Connect sw0 and sw1 to lr0 and lr1
+check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.254/24
+check ovn-nbctl lsp-add sw0 sw0-lr0 -- set Logical_Switch_Port sw0-lr0 type=router \
+options:router-port=lr0-sw0 addresses="router"
+check ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
+check ovn-nbctl lsp-add sw1 sw1-lr1 -- set Logical_Switch_Port sw1-lr1 type=router \
+options:router-port=lr1-sw1 addresses="router"
+
+ADD_BR([br-phys])
+check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+
+check ovs-vsctl add-port br-int vif0 \
+  -- set Interface vif0 external-ids:iface-id=lp0 \
+  -- set Interface vif0 type=internal
+
+check ovn-nbctl lsp-add sw0 lp0
+check ovn-nbctl lsp-add sw1 lp1
+check ovn-nbctl lsp-set-addresses lp0 "50:54:00:00:00:01 10.0.0.10"
+check ovn-nbctl lsp-set-addresses lp1 "50:54:00:00:00:02 20.0.0.10"
+
+OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lp0` = xup])
+ovn-nbctl --wait=hv sync
+
+# Stopping updates to sb.
+# By temporarily stopping updates from the controller to sb, we are making sb read-only.
+# We can't just pause sb to make it read-only, as we expect sb to still handle northd changes.
+stop_ovsdb_controller_updates $TCP_PORT
+
+# Adding lp1 : this will make sb read-only
+check ovs-vsctl add-port br-int vif1 \
+  -- set Interface 

[ovs-dev] [PATCH ovn v2 3/3] pinctrl: Fixed 100% cpu on ovs connection loss.

2024-03-26 Thread Xavier Simonart
This issue happens, for instance, when running the test
"ovn-controller - Chassis other_config".

Signed-off-by: Xavier Simonart 

---
v2: Amend subject summary.
Rebase on origin/main.
---
 controller/pinctrl.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f75b04696..ec6c7549b 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -3627,13 +3627,14 @@ pinctrl_handler(void *arg_)
 
 rconn_run_wait(swconn);
 rconn_recv_wait(swconn);
-send_garp_rarp_wait(send_garp_rarp_time);
-ipv6_ra_wait(send_ipv6_ra_time);
-ip_mcast_querier_wait(send_mcast_query_time);
-svc_monitors_wait(svc_monitors_next_run_time);
-ipv6_prefixd_wait(send_prefixd_time);
-bfd_monitor_wait(bfd_time);
-
+if (rconn_is_connected(swconn)) {
+send_garp_rarp_wait(send_garp_rarp_time);
+ipv6_ra_wait(send_ipv6_ra_time);
+ip_mcast_querier_wait(send_mcast_query_time);
+svc_monitors_wait(svc_monitors_next_run_time);
+ipv6_prefixd_wait(send_prefixd_time);
+bfd_monitor_wait(bfd_time);
+}
 seq_wait(pinctrl_handler_seq, new_seq);
 
 latch_wait(>pinctrl_thread_exit);
-- 
2.31.1



Re: [ovs-dev] [PATCH ovn v2] acl-log: Properly log the "pass" verdict.

2024-03-26 Thread Ales Musil
On Mon, Mar 25, 2024 at 6:48 PM Mark Michelson  wrote:

> The "pass" verdict was not explicitly defined in the list of verdicts
> for ACL logging. This resulted in logs saying "Syntax error at `pass'
> unknown verdict."
>
> This change adds the "pass" verdict explicitly so that it shows up as a
> proper log in ovn-controller.
>
> Reported-at: https://issues.redhat.com/browse/FDP-442
> Signed-off-by: Mark Michelson 
> ---
>  lib/acl-log.c | 4 +++-
>  lib/acl-log.h | 1 +
>  lib/actions.c | 2 ++
>  tests/ovn.at  | 3 +++
>  4 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/lib/acl-log.c b/lib/acl-log.c
> index 9530dd763..b3eb4bbd0 100644
> --- a/lib/acl-log.c
> +++ b/lib/acl-log.c
> @@ -34,7 +34,9 @@ log_verdict_to_string(uint8_t verdict)
>  return "drop";
>  } else if (verdict == LOG_VERDICT_REJECT) {
>  return "reject";
> -} else {
> +} else if (verdict == LOG_VERDICT_PASS) {
> +return "pass";
> +} else  {
>  return "";
>  }
>  }
> diff --git a/lib/acl-log.h b/lib/acl-log.h
> index da7fa2f02..3973a8e0b 100644
> --- a/lib/acl-log.h
> +++ b/lib/acl-log.h
> @@ -33,6 +33,7 @@ enum log_verdict {
>  LOG_VERDICT_ALLOW,
>  LOG_VERDICT_DROP,
>  LOG_VERDICT_REJECT,
> +LOG_VERDICT_PASS,
>  LOG_VERDICT_UNKNOWN = UINT8_MAX
>  };
>
> diff --git a/lib/actions.c b/lib/actions.c
> index 71615fc53..29584a189 100644
> --- a/lib/actions.c
> +++ b/lib/actions.c
> @@ -3596,6 +3596,8 @@ parse_log_arg(struct action_context *ctx, struct ovnact_log *log)
>          log->verdict = LOG_VERDICT_REJECT;
>      } else if (lexer_match_id(ctx->lexer, "allow")) {
>          log->verdict = LOG_VERDICT_ALLOW;
> +    } else if (lexer_match_id(ctx->lexer, "pass")) {
> +        log->verdict = LOG_VERDICT_PASS;
>      } else {
>          lexer_syntax_error(ctx->lexer, "unknown verdict");
>          return;
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 4d0c7ad53..f272749aa 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -1847,6 +1847,9 @@ log(name="test1", verdict=drop, severity=info, meter="meter1");
>  log(verdict=drop);
>  formats as log(verdict=drop, severity=info);
>  encodes as controller(userdata=00.00.00.07.00.00.00.00.01.06)
> +log(verdict=pass);
> +formats as log(verdict=pass, severity=info);
> +encodes as controller(userdata=00.00.00.07.00.00.00.00.03.06)
>  log(verdict=bad_verdict, severity=info);
>  Syntax error at `bad_verdict' unknown verdict.
>  log(verdict=drop, severity=bad_severity);
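
(Decoding the expected encodings above: in the controller(userdata=...)
action, the next-to-last byte is the verdict -- 01 for drop and now 03 for
pass, matching the enum order added in acl-log.h -- and the last byte is
the severity, 06 for info.)
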
> --
> 2.44.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>

Looks good to me, thanks.

Acked-by: Ales Musil 

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.com

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/2] missed during test simplification

2024-03-26 Thread Ales Musil
On Mon, Mar 25, 2024 at 5:59 PM Jacob Tanenbaum  wrote:

Hi Jacob,

thank you for the patch. Please address the 0-day bot concerns in v2.

> commit 1807499 changed the tests to get the table numbers themselves.
>

We use the following tag to indicate that the commit is fixing some
previous one:

Fixes: 180749965eda ("tests: Use the ovn-debug binary to determine table
numbers.")

This tag goes before the Signed-off-by and you can easily get it by running:

git log -1 --pretty=format:"Fixes: %h (\"%s\")" --abbrev=12 <commit>


> The table number was left in here. This patch corrects that and removes
> the table number.
>
> Signed-off-by: Jacob Tanenbaum 
> ---
>  tests/ovn.at | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index f04b74210..762b12a7b 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -37684,7 +37684,7 @@ tpa=$(ip_to_hex 10 0 0 10)
>  send_garp 1 1 $eth_src $eth_dst $spa $tpa
>
>  OVS_WAIT_UNTIL([test 1 = `cat hv1/ovn-controller.log | grep "pinctrl received  packet-in" | \
> -grep opcode=BIND_VPORT | grep OF_Table_ID=28 | wc -l`])
> +grep opcode=BIND_VPORT | grep OF_Table_ID=$(ovn-debug lflow-stage-to-oftable ls_in_arp_rsp) | wc -l`])
>
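
(For context: ovn-debug lflow-stage-to-oftable maps a logical-flow stage
name to its current OpenFlow table number, so tests no longer hard-code
numbers like 28.)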

There is one more at
https://github.com/ovn-org/ovn/blob/dc52bf70cb7e066fdb84d88622d7f380eda18e8c/tests/ovn.at#L22074,
please address that one too.


>
>  sleep_controller hv1
>
> --
> 2.42.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.com

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/2] Merge QoS logical pipelines

2024-03-26 Thread Ales Musil
On Mon, Mar 25, 2024 at 5:59 PM Jacob Tanenbaum  wrote:

Hi Jacob,

thank you for the patch. The subject should end with a dot, and there is one
more thing
down below.

> Currently there are 2 QoS pipelines for ingress (ls_in_qos_mark,
> ls_in_qos_meter) and egress (ls_out_qos_mark, ls_out_qos_meter). This is
> not necessary as there are no actions across the two pipelines that
> depend on each other. The two pipelines can be merged.
>
> https://issues.redhat.com/browse/FDP-397


Should be:
Reported-at: https://issues.redhat.com/browse/FDP-397


>
>
> Signed-off-by: Jacob Tanenbaum 
> ---
>  northd/northd.c | 65 +
>  northd/northd.h | 46 +++-
>  tests/ovn-northd.at | 24 -
>  tests/ovn.at|  9 +++
>  4 files changed, 62 insertions(+), 82 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 1839b7d8b..d5aab756f 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -3484,7 +3484,7 @@ build_lb_vip_actions(const struct ovn_northd_lb *lb,
>      }
>
>      if (reject) {
> -        int stage = ls_dp ? ovn_stage_get_table(S_SWITCH_OUT_QOS_MARK)
> +        int stage = ls_dp ? ovn_stage_get_table(S_SWITCH_OUT_QOS)
>                            : ovn_stage_get_table(S_ROUTER_OUT_SNAT);
>          ds_clear(action);
>          ds_put_format(action, "reg0 = 0; reject { outport <-> inport; "
> @@ -6705,7 +6705,7 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
>          "/* eth.dst <-> eth.src; ip.dst <-> ip.src; is implicit. */ "
>          "outport <-> inport; next(pipeline=%s,table=%d); };",
>          ingress ? "egress" : "ingress",
> -        ingress ? ovn_stage_get_table(S_SWITCH_OUT_QOS_MARK)
> +        ingress ? ovn_stage_get_table(S_SWITCH_OUT_QOS)
>                  : ovn_stage_get_table(S_SWITCH_IN_L2_LKUP));
>
>      ovn_lflow_metered(lflows, od, stage, 1000,
> @@ -7081,24 +7081,39 @@ build_qos(struct ovn_datapath *od, struct lflow_table *lflows,
>            struct lflow_ref *lflow_ref) {
>      struct ds action = DS_EMPTY_INITIALIZER;
>
> -    ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS_MARK, 0, "1", "next;",
> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS, 0, "1", "next;",
>                    lflow_ref);
> -    ovn_lflow_add(lflows, od, S_SWITCH_OUT_QOS_MARK, 0, "1", "next;",
> -                  lflow_ref);
> -    ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS_METER, 0, "1", "next;",
> -                  lflow_ref);
> -    ovn_lflow_add(lflows, od, S_SWITCH_OUT_QOS_METER, 0, "1", "next;",
> +    ovn_lflow_add(lflows, od, S_SWITCH_OUT_QOS, 0, "1", "next;",
>                    lflow_ref);
>
>      for (size_t i = 0; i < od->nbs->n_qos_rules; i++) {
>          struct nbrec_qos *qos = od->nbs->qos_rules[i];
>          bool ingress = !strcmp(qos->direction, "from-lport") ? true : false;
> -        enum ovn_stage stage = ingress ? S_SWITCH_IN_QOS_MARK : S_SWITCH_OUT_QOS_MARK;
> +        enum ovn_stage stage = ingress ? S_SWITCH_IN_QOS : S_SWITCH_OUT_QOS;
>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>          int64_t rate = 0;
>          int64_t burst = 0;
>
>          ds_clear(&action);
> +        for (size_t n = 0; n < qos->n_bandwidth; n++) {
> +            if (!strcmp(qos->key_bandwidth[n], "rate")) {
> +                rate = qos->value_bandwidth[n];
> +            } else if (!strcmp(qos->key_bandwidth[n], "burst")) {
> +                burst = qos->value_bandwidth[n];
> +            }
> +        }
> +        if (rate) {
> +            stage = ingress ? S_SWITCH_IN_QOS : S_SWITCH_OUT_QOS;
> +            if (burst) {
> +                ds_put_format(&action,
> +                              "set_meter(%"PRId64", %"PRId64"); ",
> +                              rate, burst);
> +            } else {
> +                ds_put_format(&action,
> +                              "set_meter(%"PRId64"); ",
> +                              rate);
> +            }
> +        }
>          for (size_t j = 0; j < qos->n_action; j++) {
>              if (!strcmp(qos->key_action[j], "dscp")) {
>                  if (qos->value_action[j] > QOS_MAX_DSCP) {
> @@ -7115,43 +7130,11 @@ build_qos(struct ovn_datapath *od, struct lflow_table *lflows,
>                                qos->value_action[j]);
>              }
>          }
> -
> -        if (action.length) {
>          ds_put_cstr(&action, "next;");
> -        ovn_lflow_add_with_hint(lflows, od, stage, qos->priority,
> -                                qos->match, ds_cstr(&action),
> -                                &qos->header_, lflow_ref);
> -        }
> -
> -        for (size_t n = 0; n < qos->n_bandwidth; n++) {
> -            if (!strcmp(qos->key_bandwidth[n], "rate")) {
> -                rate = qos->value_bandwidth[n];
> -            } else if (!strcmp(qos->key_bandwidth[n], "burst")) {
> -                burst = qos->value_bandwidth[n];
> -            }
> -        }
> -
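
For illustration: with the two stages merged, a QoS rule that sets both a
bandwidth limit and a DSCP mark now produces one logical flow whose action
carries both effects, e.g. (made-up values) `set_meter(10000, 1000);
ip.dscp = 10; next;`, instead of two flows split across the former
*_qos_meter and *_qos_mark stages.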

Re: [ovs-dev] [PATCH ovn] controller: Fix ofctrl memory usage underflow.

2024-03-26 Thread Han Zhou
On Wed, Mar 20, 2024 at 12:48 PM Mark Michelson  wrote:
>
> Thanks Ales, looks good to me.
>
> Acked-by: Mark Michelson 
>

Thanks Ales and Mark. I applied to main and backported down to branch-23.06.

Han

> On 3/19/24 11:57, Ales Musil wrote:
> > The memory usage would be increased by the size of the sb_addrset_ref
> > struct, but decreased by the size of the struct plus the name.
> > That would slowly lead to underflows in some cases.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-507
> > Signed-off-by: Ales Musil 
> > ---
> >   controller/ofctrl.c | 10 ++++++++--
> >   1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > index f14cd79a8..0ef3b8366 100644
> > --- a/controller/ofctrl.c
> > +++ b/controller/ofctrl.c
> > @@ -1112,6 +1112,12 @@ sb_to_flow_size(const struct sb_to_flow *stf)
> >       return sizeof *stf;
> >   }
> >
> > +static size_t
> > +sb_addrset_ref_size(const struct sb_addrset_ref *sar)
> > +{
> > +    return sizeof *sar + strlen(sar->name) + 1;
> > +}
> > +
> >   static struct sb_to_flow *
> >   sb_to_flow_find(struct hmap *uuid_flow_table, const struct uuid *sb_uuid)
> >   {
> > @@ -1181,8 +1187,8 @@ link_flow_to_sb(struct ovn_desired_flow_table *flow_table,
> >       }
> >       if (!found) {
> >           sar = xmalloc(sizeof *sar);
> > -        mem_stats.sb_flow_ref_usage += sizeof *sar;
> >           sar->name = xstrdup(as_info->name);
> > +        mem_stats.sb_flow_ref_usage += sb_addrset_ref_size(sar);
> >           hmap_init(&sar->as_ip_to_flow_map);
> >           ovs_list_insert(&stf->addrsets, &sar->list_node);
> >       }
> > @@ -1568,7 +1574,7 @@ remove_flows_from_sb_to_flow(struct ovn_desired_flow_table *flow_table,
> >               free(itfn);
> >           }
> >           hmap_destroy(&sar->as_ip_to_flow_map);
> > -        mem_stats.sb_flow_ref_usage -= (sizeof *sar + strlen(sar->name) + 1);
> > +        mem_stats.sb_flow_ref_usage -= sb_addrset_ref_size(sar);
> >           free(sar->name);
> >           free(sar);
> >       }
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
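
The broader pattern behind the fix: compute an object's accounted size in
exactly one helper, call it from both the increment and the decrement
paths, and only after every field that contributes to the size has been
set.  A standalone sketch (illustrative names, not the ofctrl code):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct tracked {
    char *name;
};

static uint64_t mem_usage;   /* bytes currently accounted for */

/* Single source of truth for the accounted size. */
static size_t
tracked_size(const struct tracked *t)
{
    return sizeof *t + strlen(t->name) + 1;   /* struct + name + NUL */
}

static struct tracked *
tracked_create(const char *name)
{
    struct tracked *t = malloc(sizeof *t);
    t->name = strdup(name);           /* set name *before* sizing it */
    mem_usage += tracked_size(t);
    return t;
}

static void
tracked_destroy(struct tracked *t)
{
    mem_usage -= tracked_size(t);     /* same helper: sizes always match */
    free(t->name);
    free(t);
}
```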