Re: [ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly

2023-07-18 Thread Florian Westphal
Jakub Kicinski  wrote:
> On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> > With the OVS upcall, the original ct in the skb will be dropped, and when
> > the skb comes back from userspace it has to create a new ct again through
> > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> > 
> > However, the new ct will not be able to have the exp as the original ct
> > has taken it away from the hash table in nf_ct_find_expectation(). This
> > will cause some flow never to be matched, like:
> > 
> >   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
> >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
> >   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> > 
> > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> > matched.
> > 
> > OVS conntrack works around this by adding its own exp lookup function to
> > not remove the exp from the hash table and saving the exp and its master
> > info to the flow keys instead of create a real ct. But this way doesn't
> > work for TC act_ct.
> > 
> > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> > allows both OVS conntrack and TC act_ct to have a simple and clear fix
> > for this problem in the patch 2/3 and 3/3.
> 
> Florian, Pablo, any opinion on these?

Sorry for the silence.  I dislike moving tc/ovs artifacts into
the conntrack core.

But so far I could not come up with a counter-proposal,
and it doesn't look too outrageous.

Let me look at it again later today (its early morning here)
and I will get back to this in my late evening.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly

2023-07-18 Thread Jakub Kicinski
On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> With the OVS upcall, the original ct in the skb will be dropped, and when
> the skb comes back from userspace it has to create a new ct again through
> nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> 
> However, the new ct will not be able to have the exp as the original ct
> has taken it away from the hash table in nf_ct_find_expectation(). This
> will cause some flow never to be matched, like:
> 
>   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> 
> if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> matched.
> 
> OVS conntrack works around this by adding its own exp lookup function to
> not remove the exp from the hash table and saving the exp and its master
> info to the flow keys instead of create a real ct. But this way doesn't
> work for TC act_ct.
> 
> The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> allows both OVS conntrack and TC act_ct to have a simple and clear fix
> for this problem in the patch 2/3 and 3/3.

Florian, Pablo, any opinion on these? 
Would you prefer to take them via netfilter?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Lookup route policy before ip routing

2023-07-18 Thread wangchuanlei
Hi, Dumitru,

Thank you for reply and the suggestions!
Base on your suggestions, i have a thought below. 
On 7/14/23 11:19, wangchuanlei wrote:
> Hi, Dumitru
> 

> Hi, wangchuanlei,

>   Thank you for review, i don't have performance measurements yet.
> In real physical switch, if flows hit dynimic/static route, or policy 
> route, the flows will skip other route entries, it's more reasonable on logic.
> 

>I see Cisco and Juniper routers do that, yes. However...

> For ovn, in table of ip route, there is no default route entry , but 
> route policy has, if i add one default route entry on route, for 
> unknown flows, it will through default route entry, then it won't hit 
> any route policy, but it will go to next table, so, we will not know where 
> the flows will go, it may makes the network confused.
> 
> What about add a default route entry on router, and delete the default route 
> policy on router?
> And if flows hit route entry except the default route entry, we set a 
> register, the if we detect the register is set on policy table, we 
> skip the reroute or drop action in policy table. In reverse, if we 
> don't hit any route entry, the it go to policy table, if it don't hit any 
> policy entry, we should drop it.
> 
> This idea will make the  ovn-kubernetes test fail, but it's more reasonable.
> What do you think? waiting your reply!

> It's more than breaking ovn-kubernetes, it also breaks the original use case 
> for this feature:

> https://github.com/ovn-org/ovn/commit/df4f37ea7f824d410b5682a879d50ced8b0fa71c

> It's clear that the original intention of the author was to override the 
> routing decision (after it was made):

>  To achieve this, a new table is introduced in the ingress pipeline of the
>  Logical-router. The new table is between the ???IP Routing??? and the 
> ???ARP/ND
>  resolution??? table. This way, PBR can override routing decisions and 
> provide a
>  different next-hop.

> To change the order of tables breaks the current "contract" OVN currently 
> provides.

> 
> Anyone else have ideas about this?

>I see two options (maybe there are more):

> a. we add a new table all together (maybe Logical_Router_Forward_Filter) and 
> insert rules before the routing stages.
This way may makes the flow tables more complex, because there is already have 
route table and policy table.

> b. add a new "Logical_Router_Policy.options:stage" option (or a better
> name) and support "pre-routing"/"post-routing".  If we default its value to 
> "post-routing" then we ensure backwards compatibility.
This options still needs a default route entry if we want go to the policy 
table.
Based your ideas, what about add the options to Logical_Router, like: 
Logical_Router:options:stage(or a better name) and support 
"pre-routing"/"post-routing". Set the default value to "post-routing".
Thus,
(1) stage:"post-routing"
We change nothing, all flows no changed.
(2)  stage:"pre-routing"
We would't change the table order any more. But we add a default route enty 
which macth is 1, priority is 0, and action is set a register ,then go to next 
table. 
In policy table if the register is set, which is hit none route entry, we 
need lookup policy table, otherwise go to the next table. This way is backwards 
compatible.

How do you think?

Thanks,
wangchuanlei

> 
> Best regards!
> wangchuanlei
> 
>> Hi wangchuanlei,
> 
>> Thanks for the patch!
> 
>> On 7/10/23 10:46, wangchuanlei wrote:
>> If there is no route in table ip_routing, the route policy item 
>> in table policy is useless.
> 
>> I'm sorry but I disagree with this.  We can always add a default route that 
>> matches all traffic that's not matched by any other route.
> 
>> Afterwards, in the policy stage we can re-route/drop accordingly.
> 
>> Because route policy has a higher priority than ip routing, so 
>> moddify the table id of IP_ROUTING and POLICY is a little better.By 
>> this way, there is no need reroute any more, this should be more 
>> effienct.
> 
>> I can see how this can be slightly more efficient in slow path but I'm quite 
>> confident the difference won't be significant.  Do you have performance 
>> measurements to indicate the opposite?
> 
> 
>> The real problem with just swapping the two pipelines is that we change 
>> behavior completely.  CMS out there (ovn-kube, neutron, etc.) rely on the 
>> fact that policies are applied after routing.  We can't just decide to 
>> change this and break compatibility.
> 
>> I'm quite sure the ovn-kubernetes tests also fail because of this change in 
>> behavior:
> 
>> https://github.com/ovsrobot/ovn/actions/runs/5506310703/jobs/10035411
>> 109
> 
>> It also makes OVN PBR tests fail:
> 
>> https://github.com/ovsrobot/ovn/actions/runs/5506310686
> 
>>
>> Signed-off-by: wangchuanlei 
> 
>> Regards,
>> Dumitru
> 
>> ---
>>  northd/northd.c | 14 ++
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c index
>> 

[ovs-dev] [PATCH ovn 4/5] tests: fixed "ECMP IPv6 symmetric reply"

2023-07-18 Thread Xavier Simonart
Signed-off-by: Xavier Simonart 
---
 tests/system-ovn.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 3ece31b0e..31329e74e 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -6365,7 +6365,7 @@ AT_CHECK([ovs-appctl dpctl/flush-conntrack])
 
 # Change bob1 L2 address anche check the reply is properly updated.
 ovn-nbctl set Logical_Router_Port R2_ext mac='"00:00:10:01:02:04"'
-ovn-nbctl set Logical_Switch_Port r2-ext \
+ovn-nbctl --wait=hv set Logical_Switch_Port r2-ext \
  type=router options:router-port=R2_ext addresses='"00:00:10:01:02:04"'
 
 NS_CHECK_EXEC([bob1], [nc -6 -z fd01::2 80], [0])
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 0/5] Unit and System tests fixes

2023-07-18 Thread Xavier Simonart
Xavier Simonart (5):
  tests: fixed another flake in "send gratuitous ARP for NAT rules on HA
distributed router"
  tests: increased bfd-mult to 15
  tests: fixed missing HAVE_SCAPY
  tests: fixed "ECMP IPv6 symmetric reply"
  tests: fixed "ARP replies for SNAT external ips"

 tests/ovn.at| 30 --
 tests/system-ovn.at |  2 +-
 2 files changed, 17 insertions(+), 15 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 2/5] tests: increased bfd-mult to 15

2023-07-18 Thread Xavier Simonart
Multiple tests using BFD showed a flaky behavior on busy systems as BFD
went down after three seconds.

- 4 HV, 1 LS, 1 LR, packet test with HA distributed router gateway port
- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port
- 1 LR with HA distributed router gateway port
- external logical port

Signed-off-by: Xavier Simonart 
---
 tests/ovn.at | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index 496403f36..b729595a1 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -11311,6 +11311,7 @@ check ovs-vsctl -- add-port br-int ext1-vif1 -- \
 options:rxq_pcap=ext1/vif1-rx.pcap \
 ofport-request=1
 
+ovn-nbctl --wait=hv set NB_Global . options:"bfd-mult"=15
 # Pre-populate the hypervisors' ARP tables so that we don't lose any
 # packets for ARP resolution (native tunneling doesn't queue packets
 # for ARP resolution).
@@ -11570,6 +11571,7 @@ check ovs-vsctl -- add-port br-int ext1-vif1 -- \
 options:rxq_pcap=ext1/vif1-rx.pcap \
 ofport-request=1
 
+ovn-nbctl --wait=hv set NB_Global . options:"bfd-mult"=15
 # Pre-populate the hypervisors' ARP tables so that we don't lose any
 # packets for ARP resolution (native tunneling doesn't queue packets
 # for ARP resolution).
@@ -13584,12 +13586,12 @@ for chassis in gw1 hv1 hv2; do
 ])
 done
 ovn-nbctl remove NB_Global . options "bfd-min-rx"
-ovn-nbctl --wait=hv set NB_Global . options:"bfd-mult"=5
+ovn-nbctl --wait=hv set NB_Global . options:"bfd-mult"=15
 for chassis in gw1 hv1 hv2; do
 echo "checking gw2 -> $chassis"
 OVS_WAIT_UNTIL([
 bfd_cfg=$(ovs-vsctl --bare --columns bfd find Interface 
name=ovn-$chassis-0)
-test "$bfd_cfg" = "enable=true min_tx=1500 mult=5"
+test "$bfd_cfg" = "enable=true min_tx=1500 mult=15"
 ])
 done
 
@@ -19430,6 +19432,8 @@ sim_add hv1
 sim_add hv2
 sim_add hv3
 
+ovn-nbctl --wait=hv set NB_Global . options:"bfd-mult"=15
+
 ovn-nbctl ls-add ls1
 ovn-nbctl lsp-add ls1 ls1-lp1 \
 -- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01 10.0.0.4 ae70::4"
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 1/5] tests: fixed another flake in "send gratuitous ARP for NAT rules on HA distributed router"

2023-07-18 Thread Xavier Simonart
Signed-off-by: Xavier Simonart 
---
 tests/ovn.at | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index f2148faac..496403f36 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -13874,6 +13874,9 @@ AT_CHECK([grep $garp hv3_br_phys_tx | sort], [0], [])
 
 # at this point, we invert the priority of the gw chassis between hv2 and hv3
 
+# Reset hv2/br-phys_n1 before inverting gw chassis, as otherwise packet might 
be reset (while not resetting hv1/snoopvif
+# because not yet sent)
+as hv2 reset_pcap_file br-phys_n1 hv2/br-phys_n1
 ovn-nbctl --wait=hv \
   --id=@gc0 create Gateway_Chassis \
 name=outside_gw1 chassis_name=hv2 priority=1 -- \
@@ -13881,7 +13884,6 @@ ovn-nbctl --wait=hv \
 name=outside_gw2 chassis_name=hv3 priority=10 -- \
   set Logical_Router_Port lrp0 'gateway_chassis=[@gc0,@gc1]'
 
-as hv2 reset_pcap_file br-phys_n1 hv2/br-phys_n1
 as hv3 reset_pcap_file br-phys_n1 hv3/br-phys_n1
 as hv1 reset_pcap_file snoopvif hv1/snoopvif
 
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 3/5] tests: fixed missing HAVE_SCAPY

2023-07-18 Thread Xavier Simonart
Signed-off-by: Xavier Simonart 
---
 tests/ovn.at | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/ovn.at b/tests/ovn.at
index b729595a1..1e9375ad2 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -36323,6 +36323,7 @@ AT_CLEANUP
 
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([FDB aging])
+AT_SKIP_IF([test $HAVE_SCAPY = no])
 ovn_start
 
 net_add n1
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 5/5] tests: fixed "ARP replies for SNAT external ips"

2023-07-18 Thread Xavier Simonart
Signed-off-by: Xavier Simonart 
---
 tests/ovn.at | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index 1e9375ad2..71d4a5091 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -29789,23 +29789,18 @@ test_arp_response () {
 arp_reply=${src_mac}${router_mac}08060001080006040002${router_mac}
 arp_reply=${arp_reply}${router_ip}${src_mac}${src_ip}
 
-OVS_WAIT_UNTIL([
-test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/phys1-tx.pcap | 
wc -l) -ge 1
-])
-
 # there is a small race where gw-router-port can be running on both
 # hvs and both of them can reply to the arp request ending up with
 # two arp replies in the pcap file. This is transitory not significant
-# in a real deployment that can trigger a fail here. Let's use
-# greq -q instead.
-AT_CHECK([$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/phys1-tx.pcap | \
-grep -q $arp_reply], [0])
+# in a real deployment that can trigger a fail here. Let's check that
+# we received the expected arp_reply, ignoring other packets such as
+# duplicates and garps.
+echo $arp_reply > expected_out
+OVN_CHECK_PACKETS_CONTAIN([hv2/phys1-tx.pcap], [expected_out])
 
 # $gw phys1-n1 should see the response because $gw ovn-controller responds
 # to arp request.
-AT_CHECK([$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" 
$gw/br-phys_n1-tx.pcap | \
-grep -c $arp_reply], [0], [1
-])
+OVN_CHECK_PACKETS_CONTAIN([$gw/br-phys_n1-tx.pcap], [expected_out])
 
 # $nongw1 and $nongw1 phys1-n1 should not see the response.
 $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" $nongw1/br-phys_n1-tx.pcap
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] vswitchd: Wait for a bridge exit before replying to exit unixctl.

2023-07-18 Thread Ilya Maximets
On 7/18/23 15:40, Eelco Chaudron wrote:
> 
> 
> On 18 Jul 2023, at 14:40, Ilya Maximets wrote:
> 
>> Before the cleanup option, the bridge_exit() call was fairly fast,
>> because it didn't include any particularly long operations.  However,
>> with the cleanup flag, this function destroys a lot of datapath
>> resources freeing a lot of memory, waiting on RCU and talking to
>> the kernel.  That may take a noticeable amount of time, especially
>> on a busy system or under profilers/sanitizers.  However, the unixctl
>> 'exit' command replies instantly without waiting for any work to
>> actually be done.  This may cause system test failures or other
>> issues where scripts expect ovs-vswitchd to exit or destroy all the
>> datapath resources shortly after appctl call.
>>
>> Fix that by waiting for the bridge_exit() before replying to the user.
>> At least, all the datapath resources will actually be destroyed by
>> the time ovs-appctl exits.
>>
>> Also moving a structure from stack to global.  Seems cleaner this way.
>>
>> Since we're not replying right away and it's technically possible
>> to have multiple clients requesting exit at the same time, storing
>> connections in an array.
>>
>> Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' 
>> command")
>> Signed-off-by: Ilya Maximets 
> 
> Thanks Ilya for the v2, and catching the exit_args.exiting update :)
> 
> Acked-by: Eelco Chaudron 
> 

Thanks!  Applied and backported down to 2.17 since it's fixing test issues.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ofproto-dpif-mirror: Add support for pre-selection filter

2023-07-18 Thread Mike Pattrick
Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.

I benchmarked this with two setups. A netlink based test with two veth
pairs connected to a single bridge, and a netdev based test involving a
mix of DPDK nics, and netdev-linux interfaces. Both tests involved
saturating the link with iperf3 and then sending an icmp ping every
second. I then measured the throughput on the link with no mirroring,
icmp pre-selected mirroring, and full mirroring. The results, below,
indicate a significant reduction to impact of mirroring when only a
subset of the traffic on a port is selected for collection.

 Test No Mirror | No Filter |   Filter  | No Filter |  Filter  |
+---+---+---+---+--+
netlink | 39.0 Gbps | 36.1 Gbps | 38.2 Gbps | 7%|2%|
netdev  | 7.39 Gbps | 4.95 Gbps | 6.24 Gbps |33%|   15%|

The ratios above are the percent reduction in total throughput when
mirroring is used either with or without a filter.

Signed-off-by: Mike Pattrick 
---
 Documentation/ref/ovs-tcpdump.8.rst |  4 ++
 NEWS|  2 +
 lib/flow.h  | 11 ++
 ofproto/ofproto-dpif-mirror.c   | 60 +++--
 ofproto/ofproto-dpif-mirror.h   |  9 -
 ofproto/ofproto-dpif-xlate.c|  6 ++-
 ofproto/ofproto-dpif.c  |  2 +-
 ofproto/ofproto.h   |  3 ++
 tests/ofproto-dpif.at   | 43 +
 utilities/ovs-tcpdump.in| 13 ++-
 vswitchd/bridge.c   |  8 
 vswitchd/vswitch.ovsschema  |  4 +-
 vswitchd/vswitch.xml|  6 +++
 13 files changed, 160 insertions(+), 11 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..9163c8a5e 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,6 +61,10 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror flows that match the provided filter.
+
 See Also
 
 
diff --git a/NEWS b/NEWS
index 7a852427e..26797ca56 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,7 @@
 Post-v3.2.0
 
+   - ovs-tcpdump:
+ * Added new --filter parameter to apply pre-selection on mirrored flows.
 
 
 v3.2.0 - xx xxx 
diff --git a/lib/flow.h b/lib/flow.h
index a9d026e1c..c2e67dfc5 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -939,6 +939,17 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(>masks,
+>masks,
+src->masks.map);
+}
+
 static inline bool is_ct_valid(const struct flow *flow,
const struct flow_wildcards *mask,
struct flow_wildcards *wc)
diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..e4174a564 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -21,6 +21,7 @@
 #include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
+#include "ofproto-dpif-trace.h"
 #include "vlan-bitmap.h"
 #include "openvswitch/vlog.h"
 
@@ -57,6 +58,10 @@ struct mirror {
 struct hmapx srcs;  /* Contains "struct mbundle*"s. */
 struct hmapx dsts;  /* Contains "struct mbundle*"s. */
 
+/* Filter criteria. */
+struct miniflow *filter;
+struct minimask *mask;
+
 /* This is accessed by handler threads assuming RCU protection (see
  * mirror_get()), but can be manipulated by mirror_set() without any
  * explicit synchronization. */
@@ -212,7 +217,9 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
struct ofbundle **dsts, size_t n_dsts,
unsigned long *src_vlans, struct ofbundle *out_bundle,
uint16_t snaplen,
-   uint16_t out_vlan)
+   uint16_t out_vlan,
+   const char *filter,
+   const struct ofproto *ofproto)
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
@@ -285,6 +292,33 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 mirror->out_vlan = 

Re: [ovs-dev] [PATCH v5] ofproto-dpif-upcall: Mirror packets that are modified

2023-07-18 Thread Ilya Maximets
On 7/17/23 20:47, Mike Pattrick wrote:
> Currently OVS keeps track of which mirrors that each packet has been
> sent to for the purpose of deduplication. However, this doesn't consider
> that openflow rules can make significant changes to packets after
> ingress.
> 
> For example, OVN can create OpenFlow rules that turn an echo request
> into an echo response by flipping source/destination addresses and
> setting the ICMP type to Reply. When a mirror is configured, only the
> request gets mirrored even though a response is recieved.
> 
> This can cause a false impression of the actual traffic on wire if
> someone inspects the mirror and doesn't see an echo reply even though
> one has been sent.
> 
> This patch resets the mirrors every time a packet is modified, so
> mirrors will recieve every copy of a packet that is sent for output.

Nit: s/recieve/receive/ in two places above.

> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2155579
> Signed-off-by: Mike Pattrick 
> ---
>  ofproto/ofproto-dpif-xlate.c | 97 
>  tests/ofproto-dpif.at|  6 +--
>  2 files changed, 100 insertions(+), 3 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 4928ea99c..5563ac292 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -7046,6 +7046,101 @@ xlate_ofpact_unroll_xlate(struct xlate_ctx *ctx,
>   "cookie=%#"PRIx64, a->rule_table_id, a->rule_cookie);
>  }
>  
> +/* Reset the mirror context if we modify the packet and would like to mirror
> + * the new copy. */
> +static void
> +reset_mirror_ctx(struct xlate_ctx *ctx, const struct flow *flow,
> + const struct ofpact *a)
> +{
> +switch (a->type) {
> +case OFPACT_STRIP_VLAN:
> +case OFPACT_PUSH_VLAN:
> +case OFPACT_SET_ETH_SRC:
> +case OFPACT_SET_ETH_DST:
> +case OFPACT_PUSH_MPLS:
> +case OFPACT_POP_MPLS:
> +case OFPACT_SET_MPLS_LABEL:
> +case OFPACT_SET_MPLS_TC:
> +case OFPACT_SET_MPLS_TTL:
> +case OFPACT_DEC_MPLS_TTL:
> +case OFPACT_DEC_NSH_TTL:
> +case OFPACT_DEC_TTL:
> +case OFPACT_SET_VLAN_VID:
> +case OFPACT_SET_VLAN_PCP:
> +case OFPACT_ENCAP:
> +case OFPACT_DECAP:
> +case OFPACT_NAT:
> +ctx->mirrors = 0;
> +return;

An empty line needed after the case body here and below.

> +case OFPACT_SET_FIELD: {
> +const struct ofpact_set_field *set_field;
> +const struct mf_field *mf;
> +
> +set_field = ofpact_get_SET_FIELD(a);
> +mf = set_field->field;
> +if (mf_are_prereqs_ok(mf, flow, ctx->wc)) {

We probably shouldn't change wildcards in this function.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 6/8] northd: Handle load balancer group changes for a logical switch.

2023-07-18 Thread Ilya Maximets
On 7/18/23 15:17, Ilya Maximets wrote:
> On 7/18/23 05:32, Han Zhou wrote:
>>
>>
>> On Fri, Jul 7, 2023 at 1:55 PM mailto:num...@ovn.org>> 
>> wrote:
>>>
>>> From: Numan Siddique mailto:num...@ovn.org>>
>>>
>>> For every a given load balancer group 'A', northd engine data maintains
>>> a bitmap of datapaths associated to this lb group.  So when lb group 'A'
>>> gets associated to a logical switch 's1', the bitmap index of 's1' is set
>>> in its bitmap.
>>>
>>> In order to handle the load balancer group changes incrementally for a
>>> logical switch, we need to set and clear the bitmap bits accordingly.
>>> And this patch does it.
>>>
>>> Signed-off-by: Numan Siddique mailto:num...@ovn.org>>
>>> ---
>>>  lib/lb.c            |  45 +++---
>>>  lib/lb.h            |  33 ++
>>>  northd/northd.c     | 109 ++--
>>>  tests/ovn-northd.at  |   6 +--
>>>  4 files changed, 150 insertions(+), 43 deletions(-)
>>>
>>> diff --git a/lib/lb.c b/lib/lb.c
>>> index b2b6473c1d..a0426cc42e 100644
>>> --- a/lib/lb.c
>>> +++ b/lib/lb.c
>>> @@ -1100,7 +1100,7 @@ ovn_lb_datapaths_add_ls(struct ovn_lb_datapaths 
>>> *lb_dps, size_t n,
>>>
>>>  void
>>>  ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths *lb_dps, size_t n,
>>> -                        struct ovn_datapath **ods)
>>> +                           struct ovn_datapath **ods)
>>>  {
>>>      for (size_t i = 0; i < n; i++) {
>>>          if (bitmap_is_set(lb_dps->nb_ls_map, ods[i]->index)) {
>>> @@ -1112,14 +1112,14 @@ ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths 
>>> *lb_dps, size_t n,
>>>
>>>  struct ovn_lb_group_datapaths *
>>>  ovn_lb_group_datapaths_create(const struct ovn_lb_group *lb_group,
>>> -                              size_t max_ls_datapaths,
>>> -                              size_t max_lr_datapaths)
>>> +                              size_t n_ls_datapaths,
>>> +                              size_t n_lr_datapaths)
>>>  {
>>>      struct ovn_lb_group_datapaths *lb_group_dps =
>>>          xzalloc(sizeof *lb_group_dps);
>>>      lb_group_dps->lb_group = lb_group;
>>> -    lb_group_dps->ls = xmalloc(max_ls_datapaths * sizeof 
>>> *lb_group_dps->ls);
>>> -    lb_group_dps->lr = xmalloc(max_lr_datapaths * sizeof 
>>> *lb_group_dps->lr);
>>> +    lb_group_dps->nb_ls_map = bitmap_allocate(n_ls_datapaths);
>>> +    lb_group_dps->nb_lr_map = bitmap_allocate(n_lr_datapaths);
>>>
>>>      return lb_group_dps;
>>>  }
>>> @@ -1127,8 +1127,8 @@ ovn_lb_group_datapaths_create(const struct 
>>> ovn_lb_group *lb_group,
>>>  void
>>>  ovn_lb_group_datapaths_destroy(struct ovn_lb_group_datapaths *lb_group_dps)
>>>  {
>>> -    free(lb_group_dps->ls);
>>> -    free(lb_group_dps->lr);
>>> +    bitmap_free(lb_group_dps->nb_lr_map);
>>> +    bitmap_free(lb_group_dps->nb_ls_map);
>>>      free(lb_group_dps);
>>>  }
>>>
>>> @@ -1146,3 +1146,34 @@ ovn_lb_group_datapaths_find(const struct hmap 
>>> *lb_group_dps_map,
>>>      }
>>>      return NULL;
>>>  }
>>> +
>>> +void
>>> +ovn_lb_group_datapaths_add_ls(struct ovn_lb_group_datapaths *lbg_dps, 
>>> size_t n,
>>> +                              struct ovn_datapath **ods)
>>> +{
>>> +    for (size_t i = 0; i < n; i++) {
>>> +        if (!bitmap_is_set(lbg_dps->nb_ls_map, ods[i]->index)) {
>>> +            bitmap_set1(lbg_dps->nb_ls_map, ods[i]->index);
>>> +            lbg_dps->n_nb_ls++;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +void
>>> +ovn_lb_group_datapaths_remove_ls(struct ovn_lb_group_datapaths *lbg_dps,
>>> +                                 size_t n, struct ovn_datapath **ods)
>>> +{
>>> +    for (size_t i = 0; i < n; i++) {
>>> +        if (bitmap_is_set(lbg_dps->nb_ls_map, ods[i]->index)) {
>>> +            bitmap_set0(lbg_dps->nb_ls_map, ods[i]->index);
>>> +            lbg_dps->n_nb_ls--;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +void
>>> +ovn_lb_group_datapaths_add_lr(struct ovn_lb_group_datapaths *lbg_dps,
>>> +                              struct ovn_datapath *lr)
>>> +{
>>> +    bitmap_set1(lbg_dps->nb_lr_map, lr->index);
>>> +}
>>> diff --git a/lib/lb.h b/lib/lb.h
>>> index ac3a7f..08723e8ef3 100644
>>> --- a/lib/lb.h
>>> +++ b/lib/lb.h
>>> @@ -19,6 +19,7 @@
>>>
>>>  #include 
>>>  #include 
>>> +#include "lib/bitmap.h"
>>>  #include "lib/smap.h"
>>>  #include "openvswitch/hmap.h"
>>>  #include "ovn-util.h"
>>> @@ -179,6 +180,9 @@ void ovn_lb_datapaths_add_lr(struct ovn_lb_datapaths *, 
>>> size_t n,
>>>                               struct ovn_datapath **);
>>>  void ovn_lb_datapaths_add_ls(struct ovn_lb_datapaths *, size_t n,
>>>                               struct ovn_datapath **);
>>> +void ovn_lb_datapaths_set_ls(struct ovn_lb_datapaths *, size_t n,
>>> +                             struct ovn_datapath **);
>>> +
>>>  void ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths *, size_t n,
>>>                                  struct ovn_datapath **);
>>>
>>> @@ -188,10 +192,11 @@ struct ovn_lb_group_datapaths {
>>>      

Re: [ovs-dev] [PATCH] ofproto-dpif-mirror: Add support for pre-selection filter

2023-07-18 Thread Mike Pattrick
On Wed, Mar 1, 2023 at 10:14 AM David Marchand
 wrote:
>
> Hi Mike,
>
> This patch will need some rebasing.
>
>
> On Tue, Nov 1, 2022 at 5:27 PM Mike Pattrick  wrote:
> >
> > Currently a bridge mirror will collect all packets and tools like
> > ovs-tcpdump can apply additional filters after they have already been
> > duplicated by vswitchd. This can result in inefficient collection.
> >
> > This patch adds support to apply pre-selection to bridge mirrors, which
> > can limit which packets are mirrored based on flow metadata. This
> > significantly improves overall vswitchd performance during mirroring if
> > only a subset of traffic is required.
> >
> > I bencharked this with two setups. A netlink based test with two veth
>
> benchmarked*
>
> > pairs connected to a single bridge, and a netdev based test involving a
> > mix of DPDK nics, and netdev-linux interfaces. Both tests involved
> > saturating the link with iperf3 and then sending an icmp ping every
> > second. I then measured the throughput on the link with no mirroring,
> > icmp pre-selected mirroring, and full mirroring. The results, below,
> > indicate a significant reduction to impact of mirroring when only a
> > subset of the traffic on a port is selected for collection.
> >
> >  Test No Filter |   Filter  | Ratio | No Mirror |
> > +---+---+---+---+
> > netlink | 36.1 Gbps | 38.2 Gbps | 350 % | 39.0 Gbps |
> > netdev  | 4.95 Gbps | 6.24 Gbps | 126 % | 7.39 Gbps |
> >
> > Signed-off-by: Mike Pattrick 
>
> This looks really nice.
>
> I just have some trouble with "Ratio" in the table above.
> I tried to do different combinations of those numbers.. but I can't
> find a match.
> Could you detail what it is?

Looking back at the patch, this is actually a kind of silly equation,
but the 350 comes from:

(NM - NF) / (NM - F)

So the ratio between the differences. And then to make matters worse,
I rounded the raw transfer speeds after calculating that ratio. When
preparing the patch I had meant to replace this with just the ratio
between filter and no filter, but I obviously missed the 350. The
netdev number is just NF/F.

When I resubmit, I'll just include the ratios when filtering and not filtering.

>
>
> Some small comments for this first pass, I did not run this patch yet.
>
>
> > ---
> >  Documentation/ref/ovs-tcpdump.8.rst |  4 +++
> >  NEWS|  2 ++
> >  lib/flow.h  | 11 ++
> >  ofproto/ofproto-dpif-mirror.c   | 53 +++--
> >  ofproto/ofproto-dpif-mirror.h   |  9 +++--
> >  ofproto/ofproto-dpif-xlate.c|  4 ++-
> >  ofproto/ofproto-dpif.c  |  2 +-
> >  ofproto/ofproto.h   |  3 ++
> >  tests/ofproto-dpif.at   | 43 +++
> >  utilities/ovs-tcpdump.in| 13 +--
> >  vswitchd/bridge.c   |  2 ++
> >  vswitchd/vswitch.ovsschema  |  4 ++-
> >  vswitchd/vswitch.xml|  6 
> >  13 files changed, 147 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
> > b/Documentation/ref/ovs-tcpdump.8.rst
> > index b9f8cdf6f..9163c8a5e 100644
> > --- a/Documentation/ref/ovs-tcpdump.8.rst
> > +++ b/Documentation/ref/ovs-tcpdump.8.rst
> > @@ -61,6 +61,10 @@ Options
> >
> >If specified, mirror all ports (optional).
> >
> > +* ``--filter ``
> > +
> > +  If specified, only mirror flows that match the provided filter.
> > +
> >  See Also
> >  
> >
> > diff --git a/NEWS b/NEWS
> > index ff77ee404..d4eee89dd 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -23,6 +23,8 @@ Post-v3.0.0
> > bug and CVE fixes addressed since its release.
> > If a user wishes to benefit from these fixes it is recommended to 
> > use
> > DPDK 21.11.2.
> > +- ovs-tcpdump:
> > +  * Added new --filter parameter to apply pre-selection on mirrored 
> > flows.
>
> This feature affects the kernel datapath too (iow, move this entry out
> of the userspace datapath entries, by removing one indent level).
>
>
> > diff --git a/lib/flow.h b/lib/flow.h
> > index c647ad83c..403288ee5 100644
> > --- a/lib/flow.h
> > +++ b/lib/flow.h
> > @@ -938,6 +938,17 @@ flow_union_with_miniflow(struct flow *dst, const 
> > struct miniflow *src)
> >  flow_union_with_miniflow_subset(dst, src, src->map);
> >  }
> >
> > +/* Perform a bitwise OR of minimsdk 'src' mask data with the equivalent
>
> minimask*
>
>
> > + * fields in 'dst', storing the result in 'dst'. */
> > +static inline void
> > +flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
> > +   const struct minimask *src)
> > +{
> > +flow_union_with_miniflow_subset(>masks,
> > +>masks,
> > +src->masks.map);
> > +}
> > +
> >  static inline bool is_ct_valid(const struct flow *flow,
> > const struct 

Re: [ovs-dev] [PATCH v2] ovs-tcpdump: Bugfix-of-ovs-tcpdump

2023-07-18 Thread Mike Pattrick
On Mon, Jul 10, 2023 at 11:13 PM Simon Jones  wrote:
>
> From: simon 
>
> Fix bug of ovs-tcpdump, which will cause megaflow action wrong.
>
> As use ovs-tcpdump will add mipxxx NIC, and this NIC has IPv6 address by
> default.
> For vxlan topology, mipxxx will be treated as tunnel port, and will got
> error actions.
>
> For detail discuss, refer:
> https://github.com/batmancn/github-work/blob/main/ovs-bugfix-ovs-tcpdump.md
>
> Signed-off-by: simon 

I was able to confirm a variety of strange behaviours with these
autoconf IPv6 addresses in a vxlan setup. And this patch fixed those
issues.

Acked-by: Mike Pattrick 

> ---
> utilities/ovs-tcpdump.in | 4 
> 1 file changed, 4 insertions(+)
> diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
> index 420c11eb8..4cbd9a5d3 100755
> --- a/utilities/ovs-tcpdump.in
> +++ b/utilities/ovs-tcpdump.in
> @@ -96,6 +96,10 @@ def _install_dst_if_linux(tap_name, mtu_value=None):
>  *(['ip', 'link', 'set', 'dev', str(tap_name), 'up']))
>  pipe.wait()
>
> +pipe = _doexec(
> +*(['ip', '-6', 'addr', 'flush', 'dev', str(tap_name)]))
> +pipe.wait()
> +
>
>  def _remove_dst_if_linux(tap_name):
>  _doexec(
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: Use helpers from committed connections.

2023-07-18 Thread Aaron Conole
Viacheslav Galaktionov  writes:

> On 7/14/23 19:11, Ilya Maximets wrote:
>> On 7/13/23 11:26, Viacheslav Galaktionov via dev wrote:
>>> Currently, if the user wants to track related connections, they have to
>>> specify a helper in all CT actions, which contradicts the behaviour
>>> described in the documentation.
>>>
>>> Fix this by using the helper committed along with the connection whenever
>>> a given CT action does not specify a helper of its own.
>>>
>>> Signed-off-by: Viacheslav Galaktionov 
>>> 
>>> Acked-by: Ivan Malov 
>>> ---
>>>   lib/conntrack.c | 4 
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>>> index 4375c03e2..d505ffed1 100644
>>> --- a/lib/conntrack.c
>>> +++ b/lib/conntrack.c
>>> @@ -1323,6 +1323,10 @@ process_one(struct conntrack *ct, struct dp_packet 
>>> *pkt,
>>>   }
>>>   }
>>>   +if (conn && helper == NULL) {
>>> +helper = conn->alg;
>>> +}
>>> +
>> Hi, Viacheslav.  Thanks for the fix!
>>
>> We have tests in the system-traffic-userspace testsuite that should cover
>> use of alg for FTP, for example.  Do you know why they are not triggering
>> the issue?  Could you add a test that demonstrates it?
>>
>> Best regards, Ilya Maximets.
> Hi Ilya!
>
> Sorry, it's been a while since I actually implemented this fix, so one
> important detail is missing from the patch's description: this fix is
> needed when the L7 server is running on a non-standard port. For example,
> my testing involved a mock FTP server running on a random port, so this
> problem popped up quickly.

We should be able to do this in the standard test suite using a system
traffic test.  Right now, they use wget ftp://x/ and I guess we
could modify test-l7 as follows:

---
diff --git a/tests/test-l7.py b/tests/test-l7.py
index 32a77392c6..4c3fa4519c 100755
--- a/tests/test-l7.py
+++ b/tests/test-l7.py
@@ -86,6 +86,8 @@ def main():
 description='Run basic application servers.')
 parser.add_argument('proto', default='http', nargs='?',
 help='protocol to serve (%s)' % protocols)
+parser.add_argument('port', default=0, nargs='?',
+help="Port number")
 args = parser.parse_args()
 
 if args.proto not in protocols:
@@ -95,6 +97,8 @@ def main():
 constructor = SERVERS[args.proto][0]
 handler = SERVERS[args.proto][1]
 port = SERVERS[args.proto][2]
+if args.port != 0:
+port = args.port
 srv = constructor(('', port), handler)
 srv.serve_forever()
 
---

Then we can run tests on non-standard ports and confuse the
get_alg_ctl_type(), etc.

> As I understand, there is a certain degree of guessing which L7 protocol
> is used by a particular connection, see the get_alg_ctl_type function.
> This function appears to look at a packet's src and dst ports and compare
> them to standard FTP/TFTP ports. If it worked perfectly, I'd expect it
> to be possible to completely omit the alg arguments from the flows used
> in tests. However, doing so breaks the tests. I haven't had the time to
> figure out why.
>
> It's clear that the helper that was committed along with the connection
> is largely (or completely) ignored in the code. This means that only
> the helper specified in the ct action can influence the result of this
> "guess", i.e. that if an action doesn't specify a helper, then it's all
> up to the packet's ports. This is what my patch aims to fix by letting
> the connection's helper play its part in this decision process.

+1

> Adding a test for this isn't exactly trivial since the test suite can
> use only standard ports right now. I managed to reproduce the issue on
> my local machine but this required me to comment out some waits and
> adjust some hardcoded constants, which probably isn't suitable for
> inclusion into the main repo.

The above patch should allow for testing on non-standard ports.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] vswitchd: Wait for a bridge exit before replying to exit unixctl.

2023-07-18 Thread Eelco Chaudron



On 18 Jul 2023, at 14:40, Ilya Maximets wrote:

> Before the cleanup option, the bridge_exit() call was fairly fast,
> because it didn't include any particularly long operations.  However,
> with the cleanup flag, this function destroys a lot of datapath
> resources freeing a lot of memory, waiting on RCU and talking to
> the kernel.  That may take a noticeable amount of time, especially
> on a busy system or under profilers/sanitizers.  However, the unixctl
> 'exit' command replies instantly without waiting for any work to
> actually be done.  This may cause system test failures or other
> issues where scripts expect ovs-vswitchd to exit or destroy all the
> datapath resources shortly after appctl call.
>
> Fix that by waiting for the bridge_exit() before replying to the user.
> At least, all the datapath resources will actually be destroyed by
> the time ovs-appctl exits.
>
> Also moving a structure from stack to global.  Seems cleaner this way.
>
> Since we're not replying right away and it's technically possible
> to have multiple clients requesting exit at the same time, storing
> connections in an array.
>
> Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' 
> command")
> Signed-off-by: Ilya Maximets 

Thanks Ilya for the v2, and catching the exit_args.exiting update :)

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-controller: Detect and use L4_SYM dp-hash if available.

2023-07-18 Thread Dumitru Ceara
On 7/18/23 12:14, Han Zhou wrote:
> On Mon, Jul 17, 2023 at 9:51 PM Dumitru Ceara  wrote:
>>
>> On 7/14/23 08:41, Ales Musil wrote:
>>> On Thu, Jul 13, 2023 at 4:39 PM Dumitru Ceara  wrote:
>>>
 Regular dp-hash is not a canonical L4 hash (at least with the netlink
 datapath).  If the datapath supports l4 symmetrical dp-hash use that
> one
 instead.

 Reported-at: https://github.com/ovn-org/ovn/issues/112
 Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2188679
 Signed-off-by: Dumitru Ceara 

>>>
>>> Hi Dumitru,
>>>
>>>
 ---
  include/ovn/features.h |  2 ++
  lib/actions.c  |  6 ++
  lib/features.c | 49 +-
  3 files changed, 47 insertions(+), 10 deletions(-)

 diff --git a/include/ovn/features.h b/include/ovn/features.h
 index de8f1f5489..3bf536127f 100644
 --- a/include/ovn/features.h
 +++ b/include/ovn/features.h
 @@ -34,12 +34,14 @@ enum ovs_feature_support_bits {
  OVS_CT_ZERO_SNAT_SUPPORT_BIT,
  OVS_DP_METER_SUPPORT_BIT,
  OVS_CT_TUPLE_FLUSH_BIT,
 +OVS_DP_HASH_L4_SYM_BIT,
  };

  enum ovs_feature_value {
  OVS_CT_ZERO_SNAT_SUPPORT = (1 << OVS_CT_ZERO_SNAT_SUPPORT_BIT),
  OVS_DP_METER_SUPPORT = (1 << OVS_DP_METER_SUPPORT_BIT),
  OVS_CT_TUPLE_FLUSH_SUPPORT = (1 << OVS_CT_TUPLE_FLUSH_BIT),
 +OVS_DP_HASH_L4_SYM_SUPPORT = (1 << OVS_DP_HASH_L4_SYM_BIT),
  };

  void ovs_feature_support_destroy(void);
 diff --git a/lib/actions.c b/lib/actions.c
 index 037172e606..9d52cb75a8 100644
 --- a/lib/actions.c
 +++ b/lib/actions.c
 @@ -1625,6 +1625,12 @@ encode_SELECT(const struct ovnact_select
> *select,
  struct ds ds = DS_EMPTY_INITIALIZER;
  ds_put_format(, "type=select,selection_method=dp_hash");

 +if (ovs_feature_is_supported(OVS_DP_HASH_L4_SYM_SUPPORT)) {
 +/* Select dp-hash l4_symmetric by setting the upper 32bits of
 + * selection_method_param to 1: */

>>>
>>> This comment is a bit unfortunate, because it reads like you want to set
>>> all bits of the upper half
>>> to 1 e.g. 0x. Maybe change it to: "selection_method_param to
> value
>>> 1 (1 << 32)." during merge, wdyt?
>>>
>>>
>>
>> Makes sense, thanks for the suggestion!  I changed it accordingly.
>>
 +ds_put_cstr(, ",selection_method_param=0x1");
 +}
 +
  struct mf_subfield sf = expr_resolve_field(>res_field);

  for (size_t bucket_id = 0; bucket_id < select->n_dsts;
> bucket_id++) {
 diff --git a/lib/features.c b/lib/features.c
 index 97c9c86f00..d24e8f6c5c 100644
 --- a/lib/features.c
 +++ b/lib/features.c
 @@ -21,6 +21,7 @@
  #include "lib/dirs.h"
  #include "socket-util.h"
  #include "lib/vswitch-idl.h"
 +#include "odp-netlink.h"
  #include "openvswitch/vlog.h"
  #include "openvswitch/ofpbuf.h"
  #include "openvswitch/rconn.h"
 @@ -33,20 +34,48 @@ VLOG_DEFINE_THIS_MODULE(features);

  #define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5

 +/* Parses 'cap_name' from 'ovs_capabilities' and returns whether the
 + * type of capability is supported or not. */
 +typedef bool ovs_feature_parse_func(const struct smap
> *ovs_capabilities,
 +const char *cap_name);
 +
  struct ovs_feature {
  enum ovs_feature_value value;
  const char *name;
 +ovs_feature_parse_func *parse;
  };

 +static bool
 +bool_parser(const struct smap *ovs_capabilities, const char *cap_name)
 +{
 +return smap_get_bool(ovs_capabilities, cap_name, false);
 +}
 +
 +static bool
 +dp_hash_l4_sym_support_parser(const struct smap *ovs_capabilities,
 +  const char *cap_name OVS_UNUSED)
 +{
 +int max_hash_alg = smap_get_int(ovs_capabilities, "max_hash_alg",
> 0);
 +
 +return max_hash_alg == OVS_HASH_ALG_SYM_L4;
 +}
 +
  static struct ovs_feature all_ovs_features[] = {
  {
  .value = OVS_CT_ZERO_SNAT_SUPPORT,
 -.name = "ct_zero_snat"
 +.name = "ct_zero_snat",
 +.parse = bool_parser,
  },
  {
  .value = OVS_CT_TUPLE_FLUSH_SUPPORT,
 -.name = "ct_flush"
 -}
 +.name = "ct_flush",
 +.parse = bool_parser,
 +},
 +{
 +.value = OVS_DP_HASH_L4_SYM_SUPPORT,
 +.name = "dp_hash_l4_sym_support",
 +.parse = dp_hash_l4_sym_support_parser,
 +},
  };

  /* A bitmap of OVS features that have been detected as 'supported'. */
 @@ -65,6 +94,7 @@ ovs_feature_is_valid(enum ovs_feature_value feature)
  case OVS_CT_ZERO_SNAT_SUPPORT:
  case OVS_DP_METER_SUPPORT:
  case 

Re: [ovs-dev] [PATCH ovn v2 6/8] northd: Handle load balancer group changes for a logical switch.

2023-07-18 Thread Ilya Maximets
On 7/18/23 05:32, Han Zhou wrote:
> 
> 
> On Fri, Jul 7, 2023 at 1:55 PM mailto:num...@ovn.org>> wrote:
>>
>> From: Numan Siddique mailto:num...@ovn.org>>
>>
>> For every a given load balancer group 'A', northd engine data maintains
>> a bitmap of datapaths associated to this lb group.  So when lb group 'A'
>> gets associated to a logical switch 's1', the bitmap index of 's1' is set
>> in its bitmap.
>>
>> In order to handle the load balancer group changes incrementally for a
>> logical switch, we need to set and clear the bitmap bits accordingly.
>> And this patch does it.
>>
>> Signed-off-by: Numan Siddique mailto:num...@ovn.org>>
>> ---
>>  lib/lb.c            |  45 +++---
>>  lib/lb.h            |  33 ++
>>  northd/northd.c     | 109 ++--
>>  tests/ovn-northd.at  |   6 +--
>>  4 files changed, 150 insertions(+), 43 deletions(-)
>>
>> diff --git a/lib/lb.c b/lib/lb.c
>> index b2b6473c1d..a0426cc42e 100644
>> --- a/lib/lb.c
>> +++ b/lib/lb.c
>> @@ -1100,7 +1100,7 @@ ovn_lb_datapaths_add_ls(struct ovn_lb_datapaths 
>> *lb_dps, size_t n,
>>
>>  void
>>  ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths *lb_dps, size_t n,
>> -                        struct ovn_datapath **ods)
>> +                           struct ovn_datapath **ods)
>>  {
>>      for (size_t i = 0; i < n; i++) {
>>          if (bitmap_is_set(lb_dps->nb_ls_map, ods[i]->index)) {
>> @@ -1112,14 +1112,14 @@ ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths 
>> *lb_dps, size_t n,
>>
>>  struct ovn_lb_group_datapaths *
>>  ovn_lb_group_datapaths_create(const struct ovn_lb_group *lb_group,
>> -                              size_t max_ls_datapaths,
>> -                              size_t max_lr_datapaths)
>> +                              size_t n_ls_datapaths,
>> +                              size_t n_lr_datapaths)
>>  {
>>      struct ovn_lb_group_datapaths *lb_group_dps =
>>          xzalloc(sizeof *lb_group_dps);
>>      lb_group_dps->lb_group = lb_group;
>> -    lb_group_dps->ls = xmalloc(max_ls_datapaths * sizeof *lb_group_dps->ls);
>> -    lb_group_dps->lr = xmalloc(max_lr_datapaths * sizeof *lb_group_dps->lr);
>> +    lb_group_dps->nb_ls_map = bitmap_allocate(n_ls_datapaths);
>> +    lb_group_dps->nb_lr_map = bitmap_allocate(n_lr_datapaths);
>>
>>      return lb_group_dps;
>>  }
>> @@ -1127,8 +1127,8 @@ ovn_lb_group_datapaths_create(const struct 
>> ovn_lb_group *lb_group,
>>  void
>>  ovn_lb_group_datapaths_destroy(struct ovn_lb_group_datapaths *lb_group_dps)
>>  {
>> -    free(lb_group_dps->ls);
>> -    free(lb_group_dps->lr);
>> +    bitmap_free(lb_group_dps->nb_lr_map);
>> +    bitmap_free(lb_group_dps->nb_ls_map);
>>      free(lb_group_dps);
>>  }
>>
>> @@ -1146,3 +1146,34 @@ ovn_lb_group_datapaths_find(const struct hmap 
>> *lb_group_dps_map,
>>      }
>>      return NULL;
>>  }
>> +
>> +void
>> +ovn_lb_group_datapaths_add_ls(struct ovn_lb_group_datapaths *lbg_dps, 
>> size_t n,
>> +                              struct ovn_datapath **ods)
>> +{
>> +    for (size_t i = 0; i < n; i++) {
>> +        if (!bitmap_is_set(lbg_dps->nb_ls_map, ods[i]->index)) {
>> +            bitmap_set1(lbg_dps->nb_ls_map, ods[i]->index);
>> +            lbg_dps->n_nb_ls++;
>> +        }
>> +    }
>> +}
>> +
>> +void
>> +ovn_lb_group_datapaths_remove_ls(struct ovn_lb_group_datapaths *lbg_dps,
>> +                                 size_t n, struct ovn_datapath **ods)
>> +{
>> +    for (size_t i = 0; i < n; i++) {
>> +        if (bitmap_is_set(lbg_dps->nb_ls_map, ods[i]->index)) {
>> +            bitmap_set0(lbg_dps->nb_ls_map, ods[i]->index);
>> +            lbg_dps->n_nb_ls--;
>> +        }
>> +    }
>> +}
>> +
>> +void
>> +ovn_lb_group_datapaths_add_lr(struct ovn_lb_group_datapaths *lbg_dps,
>> +                              struct ovn_datapath *lr)
>> +{
>> +    bitmap_set1(lbg_dps->nb_lr_map, lr->index);
>> +}
>> diff --git a/lib/lb.h b/lib/lb.h
>> index ac3a7f..08723e8ef3 100644
>> --- a/lib/lb.h
>> +++ b/lib/lb.h
>> @@ -19,6 +19,7 @@
>>
>>  #include 
>>  #include 
>> +#include "lib/bitmap.h"
>>  #include "lib/smap.h"
>>  #include "openvswitch/hmap.h"
>>  #include "ovn-util.h"
>> @@ -179,6 +180,9 @@ void ovn_lb_datapaths_add_lr(struct ovn_lb_datapaths *, 
>> size_t n,
>>                               struct ovn_datapath **);
>>  void ovn_lb_datapaths_add_ls(struct ovn_lb_datapaths *, size_t n,
>>                               struct ovn_datapath **);
>> +void ovn_lb_datapaths_set_ls(struct ovn_lb_datapaths *, size_t n,
>> +                             struct ovn_datapath **);
>> +
>>  void ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths *, size_t n,
>>                                  struct ovn_datapath **);
>>
>> @@ -188,10 +192,11 @@ struct ovn_lb_group_datapaths {
>>      const struct ovn_lb_group *lb_group;
>>
>>      /* Datapaths to which 'lb_group' is applied. */
>> -    size_t n_ls;
>> -    struct ovn_datapath **ls;
>> -    size_t n_lr;
>> -    

[ovs-dev] [PATCH v2] vswitchd: Wait for a bridge exit before replying to exit unixctl.

2023-07-18 Thread Ilya Maximets
Before the cleanup option, the bridge_exit() call was fairly fast,
because it didn't include any particularly long operations.  However,
with the cleanup flag, this function destroys a lot of datapath
resources freeing a lot of memory, waiting on RCU and talking to
the kernel.  That may take a noticeable amount of time, especially
on a busy system or under profilers/sanitizers.  However, the unixctl
'exit' command replies instantly without waiting for any work to
actually be done.  This may cause system test failures or other
issues where scripts expect ovs-vswitchd to exit or destroy all the
datapath resources shortly after appctl call.

Fix that by waiting for the bridge_exit() before replying to the user.
At least, all the datapath resources will actually be destroyed by
the time ovs-appctl exits.

Also moving a structure from stack to global.  Seems cleaner this way.

Since we're not replying right away and it's technically possible
to have multiple clients requesting exit at the same time, storing
connections in an array.

Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' 
command")
Signed-off-by: Ilya Maximets 
---

Version 2:
  - Added handling of multiple connections.

 vswitchd/ovs-vswitchd.c | 46 -
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index a244d2f70..273af9f5d 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -68,19 +68,19 @@ static unixctl_cb_func ovs_vswitchd_exit;
 static char *parse_options(int argc, char *argv[], char **unixctl_path);
 OVS_NO_RETURN static void usage(void);
 
-struct ovs_vswitchd_exit_args {
-bool *exiting;
-bool *cleanup;
-};
+static struct ovs_vswitchd_exit_args {
+struct unixctl_conn **conns;
+size_t n_conns;
+bool exiting;
+bool cleanup;
+} exit_args;
 
 int
 main(int argc, char *argv[])
 {
-char *unixctl_path = NULL;
 struct unixctl_server *unixctl;
+char *unixctl_path = NULL;
 char *remote;
-bool exiting, cleanup;
-struct ovs_vswitchd_exit_args exit_args = {, };
 int retval;
 
 set_program_name(argv[0]);
@@ -111,14 +111,12 @@ main(int argc, char *argv[])
 exit(EXIT_FAILURE);
 }
 unixctl_command_register("exit", "[--cleanup]", 0, 1,
- ovs_vswitchd_exit, _args);
+ ovs_vswitchd_exit, NULL);
 
 bridge_init(remote);
 free(remote);
 
-exiting = false;
-cleanup = false;
-while (!exiting) {
+while (!exit_args.exiting) {
 OVS_USDT_PROBE(main, run_start);
 memory_run();
 if (memory_should_report()) {
@@ -137,16 +135,22 @@ main(int argc, char *argv[])
 bridge_wait();
 unixctl_server_wait(unixctl);
 netdev_wait();
-if (exiting) {
+if (exit_args.exiting) {
 poll_immediate_wake();
 }
 OVS_USDT_PROBE(main, poll_block);
 poll_block();
 if (should_service_stop()) {
-exiting = true;
+exit_args.exiting = true;
 }
 }
-bridge_exit(cleanup);
+bridge_exit(exit_args.cleanup);
+
+for (size_t i = 0; i < exit_args.n_conns; i++) {
+unixctl_command_reply(exit_args.conns[i], NULL);
+}
+free(exit_args.conns);
+
 unixctl_server_destroy(unixctl);
 service_stop();
 vlog_disable_async();
@@ -304,10 +308,14 @@ usage(void)
 
 static void
 ovs_vswitchd_exit(struct unixctl_conn *conn, int argc,
-  const char *argv[], void *exit_args_)
+  const char *argv[], void *args OVS_UNUSED)
 {
-struct ovs_vswitchd_exit_args *exit_args = exit_args_;
-*exit_args->exiting = true;
-*exit_args->cleanup = argc == 2 && !strcmp(argv[1], "--cleanup");
-unixctl_command_reply(conn, NULL);
+exit_args.n_conns++;
+exit_args.conns = xrealloc(exit_args.conns,
+   exit_args.n_conns * sizeof *exit_args.conns);
+exit_args.conns[exit_args.n_conns - 1] = conn;
+exit_args.exiting = true;
+if (!exit_args.cleanup) {
+exit_args.cleanup = argc == 2 && !strcmp(argv[1], "--cleanup");
+}
 }
-- 
2.40.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] OpenFlow: Add extn to set conntrack entries limit per zone.

2023-07-18 Thread Ales Musil
On Fri, May 5, 2023 at 3:20 PM Ilya Maximets  wrote:

> On 5/5/23 14:06, Naveen Yerramneni wrote:
> > Hi IIya,
> >
> > Thanks for the review.
> >
> > One question, once vswitchd DB schema is updated to store zone level
> limits. OVN controller should directly update the configuration in the
> database , right ?
>
> Right.  ovn-controller already has a database connection,
> so that should not be a problem.
>
> Best regards, Ilya Maximets.
>
> >
> > Thanks,
> > Naveen
> >
> > On 04-May-2023, at 3:33 PM, Ilya Maximets  > wrote:
> >
> > On 3/30/23 10:17, Naveen Yerramneni wrote:
> >
> > Add OpenFlow extn to set conntrack entries limit per zone.
> > This extn will be used in future to set the zone level limit for
> > drop zones used by OVN.
> >
> > Signed-off-by: Naveen Yerramneni  >
> > Reviewed-by: Simon Horman  simon.hor...@corigine.com>>
> > ---
> > Notes:
> >  v1 -> v2
> >  - Fix memory leak and added logs
> >  v2 -> v3
> >  - Addressed nits
> >  v3 -> v4
> >  - Updated change description
> >
> > NEWS   |  2 ++
> > include/openflow/nicira-ext.h  | 10 ++
> > include/openvswitch/ofp-msgs.h |  4 
> > lib/ofp-bundle.c   |  1 +
> > lib/ofp-print.c| 11 +++
> > lib/rconn.c|  1 +
> > ofproto/ofproto-dpif.c | 21 +
> > ofproto/ofproto-provider.h |  4 
> > ofproto/ofproto.c  | 25 +
> > tests/ofp-print.at  | 10
> ++
> > tests/ovs-ofctl.at  | 12
> 
> > utilities/ovs-ofctl.8.in    |  5
> +
> > utilities/ovs-ofctl.c  | 34
> ++
> > 13 files changed, 140 insertions(+)
> >
> > diff --git a/NEWS b/NEWS
> > index fe6055a27..f6ae60856 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -32,6 +32,8 @@ v3.1.0 - xx xxx 
> >- OpenFlow:
> >  * New OpenFlow extension NXT_CT_FLUSH to flush connections
> matching
> >the specified fields.
> > + * New OpenFlow extension NXT_CT_SET_ZONE_LIMIT to set
> conntrack table
> > +   limit at zone level.
> >
> >
> > Hi, Naveen.  Sorry for the late reply, but I don't think this
> functionality
> > should be implemented within OpenFlow interface.
> >
> > OpenFlow is not great for configurations that should preserve the
> state on
> > re-start, for example.  CT_FLUSH is reasonable to implement via
> OpenFlow,
> > because it is a one-shot stateless command.  But limits are
> stateful.  We
> > should be able to request a current value and save and restore the
> > configuration on re-start with ovs-save script.  This is not ideal.
> >
> > What we should do instead is to naturally extend the existing
> database
> > configuration that already supports per-zone configuration of
> conntrack
> > timeouts.  That will be more organic within OVS and will solve the
> problem
> > with configuration persistence.
> >
> > See the CT_Zone and CT_Timeout_Policy database columns.
> >
> > Best regards, Ilya Maximets.
> >
> >
> >
>
>
Hi Naveen,

it has been a while since the last update. Do you still plan to work on
this?
We have a request for this feature in OVN, please let me know if not and
I'll
continue the work myself.

Thank you.
Best regards,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.comIM: amusil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-controller: Detect and use L4_SYM dp-hash if available.

2023-07-18 Thread Han Zhou
On Mon, Jul 17, 2023 at 9:51 PM Dumitru Ceara  wrote:
>
> On 7/14/23 08:41, Ales Musil wrote:
> > On Thu, Jul 13, 2023 at 4:39 PM Dumitru Ceara  wrote:
> >
> >> Regular dp-hash is not a canonical L4 hash (at least with the netlink
> >> datapath).  If the datapath supports l4 symmetrical dp-hash use that
one
> >> instead.
> >>
> >> Reported-at: https://github.com/ovn-org/ovn/issues/112
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2188679
> >> Signed-off-by: Dumitru Ceara 
> >>
> >
> > Hi Dumitru,
> >
> >
> >> ---
> >>  include/ovn/features.h |  2 ++
> >>  lib/actions.c  |  6 ++
> >>  lib/features.c | 49 +-
> >>  3 files changed, 47 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/include/ovn/features.h b/include/ovn/features.h
> >> index de8f1f5489..3bf536127f 100644
> >> --- a/include/ovn/features.h
> >> +++ b/include/ovn/features.h
> >> @@ -34,12 +34,14 @@ enum ovs_feature_support_bits {
> >>  OVS_CT_ZERO_SNAT_SUPPORT_BIT,
> >>  OVS_DP_METER_SUPPORT_BIT,
> >>  OVS_CT_TUPLE_FLUSH_BIT,
> >> +OVS_DP_HASH_L4_SYM_BIT,
> >>  };
> >>
> >>  enum ovs_feature_value {
> >>  OVS_CT_ZERO_SNAT_SUPPORT = (1 << OVS_CT_ZERO_SNAT_SUPPORT_BIT),
> >>  OVS_DP_METER_SUPPORT = (1 << OVS_DP_METER_SUPPORT_BIT),
> >>  OVS_CT_TUPLE_FLUSH_SUPPORT = (1 << OVS_CT_TUPLE_FLUSH_BIT),
> >> +OVS_DP_HASH_L4_SYM_SUPPORT = (1 << OVS_DP_HASH_L4_SYM_BIT),
> >>  };
> >>
> >>  void ovs_feature_support_destroy(void);
> >> diff --git a/lib/actions.c b/lib/actions.c
> >> index 037172e606..9d52cb75a8 100644
> >> --- a/lib/actions.c
> >> +++ b/lib/actions.c
> >> @@ -1625,6 +1625,12 @@ encode_SELECT(const struct ovnact_select
*select,
> >>  struct ds ds = DS_EMPTY_INITIALIZER;
> >>  ds_put_format(, "type=select,selection_method=dp_hash");
> >>
> >> +if (ovs_feature_is_supported(OVS_DP_HASH_L4_SYM_SUPPORT)) {
> >> +/* Select dp-hash l4_symmetric by setting the upper 32bits of
> >> + * selection_method_param to 1: */
> >>
> >
> > This comment is a bit unfortunate, because it reads like you want to set
> > all bits of the upper half
> > to 1 e.g. 0x. Maybe change it to: "selection_method_param to
value
> > 1 (1 << 32)." during merge, wdyt?
> >
> >
>
> Makes sense, thanks for the suggestion!  I changed it accordingly.
>
> >> +ds_put_cstr(, ",selection_method_param=0x1");
> >> +}
> >> +
> >>  struct mf_subfield sf = expr_resolve_field(>res_field);
> >>
> >>  for (size_t bucket_id = 0; bucket_id < select->n_dsts;
bucket_id++) {
> >> diff --git a/lib/features.c b/lib/features.c
> >> index 97c9c86f00..d24e8f6c5c 100644
> >> --- a/lib/features.c
> >> +++ b/lib/features.c
> >> @@ -21,6 +21,7 @@
> >>  #include "lib/dirs.h"
> >>  #include "socket-util.h"
> >>  #include "lib/vswitch-idl.h"
> >> +#include "odp-netlink.h"
> >>  #include "openvswitch/vlog.h"
> >>  #include "openvswitch/ofpbuf.h"
> >>  #include "openvswitch/rconn.h"
> >> @@ -33,20 +34,48 @@ VLOG_DEFINE_THIS_MODULE(features);
> >>
> >>  #define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
> >>
> >> +/* Parses 'cap_name' from 'ovs_capabilities' and returns whether the
> >> + * type of capability is supported or not. */
> >> +typedef bool ovs_feature_parse_func(const struct smap
*ovs_capabilities,
> >> +const char *cap_name);
> >> +
> >>  struct ovs_feature {
> >>  enum ovs_feature_value value;
> >>  const char *name;
> >> +ovs_feature_parse_func *parse;
> >>  };
> >>
> >> +static bool
> >> +bool_parser(const struct smap *ovs_capabilities, const char *cap_name)
> >> +{
> >> +return smap_get_bool(ovs_capabilities, cap_name, false);
> >> +}
> >> +
> >> +static bool
> >> +dp_hash_l4_sym_support_parser(const struct smap *ovs_capabilities,
> >> +  const char *cap_name OVS_UNUSED)
> >> +{
> >> +int max_hash_alg = smap_get_int(ovs_capabilities, "max_hash_alg",
0);
> >> +
> >> +return max_hash_alg == OVS_HASH_ALG_SYM_L4;
> >> +}
> >> +
> >>  static struct ovs_feature all_ovs_features[] = {
> >>  {
> >>  .value = OVS_CT_ZERO_SNAT_SUPPORT,
> >> -.name = "ct_zero_snat"
> >> +.name = "ct_zero_snat",
> >> +.parse = bool_parser,
> >>  },
> >>  {
> >>  .value = OVS_CT_TUPLE_FLUSH_SUPPORT,
> >> -.name = "ct_flush"
> >> -}
> >> +.name = "ct_flush",
> >> +.parse = bool_parser,
> >> +},
> >> +{
> >> +.value = OVS_DP_HASH_L4_SYM_SUPPORT,
> >> +.name = "dp_hash_l4_sym_support",
> >> +.parse = dp_hash_l4_sym_support_parser,
> >> +},
> >>  };
> >>
> >>  /* A bitmap of OVS features that have been detected as 'supported'. */
> >> @@ -65,6 +94,7 @@ ovs_feature_is_valid(enum ovs_feature_value feature)
> >>  case OVS_CT_ZERO_SNAT_SUPPORT:
> >>  case OVS_DP_METER_SUPPORT:
> >>  case OVS_CT_TUPLE_FLUSH_SUPPORT:
> >> +case OVS_DP_HASH_L4_SYM_SUPPORT:

[ovs-dev] [PATCH ovn 1/3] tests: run system tests also with monitor-all=true

2023-07-18 Thread Xavier Simonart
Before this patch, each system test was supposed to run four times:
- with and without northd-parallelization
- with and without monitor-all
However, while the titles of the test were updated, system tests were never
run with monitor-all=true (i.e. they were run twice w/o monitor-all).

This is now fixed, and the four flavors of the tssts are run.

Signed-off-by: Xavier Simonart 
---
 tests/system-dpdk-macros.at  | 4 
 tests/system-kmod-macros.at  | 3 +++
 tests/system-userspace-macros.at | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
index 8843baa25..440908af7 100644
--- a/tests/system-dpdk-macros.at
+++ b/tests/system-dpdk-macros.at
@@ -47,6 +47,10 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
dnl Add bridges, ports, etc.
OVS_WAIT_WHILE([ip link show br0])
AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| 
uuidfilt])], [0], [$2])
+
+   if test OVN_MONITOR_ALL = yes; then
+ovs-vsctl set open . external_ids:ovn-monitor-all=true
+   fi
 ])
 
 # OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 787a59c97..6f6670199 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -27,6 +27,9 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
_OVS_VSWITCHD_START([])
dnl Add bridges, ports, etc.
AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| 
uuidfilt])], [0], [$2])
+   if test OVN_MONITOR_ALL = yes; then
+ovs-vsctl set open . external_ids:ovn-monitor-all=true
+   fi
 ])
 
 # OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
index 86ed22622..73ca2cce3 100644
--- a/tests/system-userspace-macros.at
+++ b/tests/system-userspace-macros.at
@@ -19,6 +19,9 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
dnl Add bridges, ports, etc.
OVS_WAIT_WHILE([ip link show br0])
AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| 
uuidfilt])], [0], [$2])
+   if test OVN_MONITOR_ALL = yes; then
+ovs-vsctl set open . external_ids:ovn-monitor-all=true
+   fi
 ])
 
 # OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 3/3] ovn-controller: remove un-necessary "trying to release" warnings

2023-07-18 Thread Xavier Simonart
When a port is added and deleted in the same loop by ovn-controller
this was causing the following warning
if_status|WARN|Trying to release unknown interface
We now avoid the warning in that ascenario.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=252
Signed-off-by: Xavier Simonart 
---
 controller/binding.c| 12 ++--
 controller/if-status.c  | 12 
 controller/if-status.h  |  2 ++
 tests/ofproto-macros.at |  4 ++--
 tests/ovn.at| 27 +++
 5 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index f619aba2e..bc7adb048 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -2682,7 +2682,13 @@ handle_deleted_lport(const struct sbrec_port_binding *pb,
 if (ld) {
 remove_pb_from_local_datapath(pb,
   b_ctx_out, ld);
-if_status_mgr_release_iface(b_ctx_out->if_mgr, pb->logical_port);
+/* Only try to release the port if it was ever claimed.
+ * If a port was added and deleted within the same ovn-controller loop,
+ * it is seen as never claimed.
+ */
+if (if_status_is_port_claimed(b_ctx_out->if_mgr, pb->logical_port)) {
+if_status_mgr_release_iface(b_ctx_out->if_mgr, pb->logical_port);
+}
 return;
 }
 
@@ -2706,7 +2712,9 @@ handle_deleted_lport(const struct sbrec_port_binding *pb,
 remove_pb_from_local_datapath(pb, b_ctx_out,
   ld);
 }
-if_status_mgr_release_iface(b_ctx_out->if_mgr, pb->logical_port);
+if (if_status_is_port_claimed(b_ctx_out->if_mgr, pb->logical_port)) {
+if_status_mgr_release_iface(b_ctx_out->if_mgr, pb->logical_port);
+}
 }
 }
 
diff --git a/controller/if-status.c b/controller/if-status.c
index b45208746..faf4e1f67 100644
--- a/controller/if-status.c
+++ b/controller/if-status.c
@@ -820,3 +820,15 @@ if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
 simap_increase(usage, "if_status_mgr_ifaces_state_usage-KB",
ROUND_UP(ifaces_state_usage, 1024) / 1024);
 }
+
+bool
+if_status_is_port_claimed(const struct if_status_mgr *mgr,
+  const char *iface_id)
+{
+struct ovs_iface *iface = shash_find_data(>ifaces, iface_id);
+if (!iface || (iface->state > OIF_INSTALLED)) {
+return false;
+} else {
+return true;
+}
+}
diff --git a/controller/if-status.h b/controller/if-status.h
index 15624bcfa..9714f6d8d 100644
--- a/controller/if-status.h
+++ b/controller/if-status.h
@@ -62,5 +62,7 @@ uint16_t if_status_mgr_iface_get_mtu(const struct 
if_status_mgr *mgr,
  const char *iface_id);
 bool if_status_mgr_iface_update(const struct if_status_mgr *mgr,
 const struct ovsrec_interface *iface_rec);
+bool if_status_is_port_claimed(const struct if_status_mgr *mgr,
+   const char *iface_id);
 
 # endif /* controller/if-status.h */
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index b0101330f..07ef1d092 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -250,11 +250,11 @@ m4_define([OVS_VSWITCHD_START],
 
 # check_logs scans through all *.log files (except '*.log' and 
'*testsuite.log')
 # and reports all WARN, ERR, EMER log entries.  User can add custom sed filters
-# in $1.
+# in $1 and select folder with $2.
 m4_divert_push([PREPARE_TESTS])
 check_logs () {
 local logs
-for log in *.log; do
+for log in ./$2/*.log; do
 case ${log} in # (
 '*.log'|*testsuite.log) ;; # (
 *) logs="${logs} ${log}" ;;
diff --git a/tests/ovn.at b/tests/ovn.at
index f2148faac..f57e0f263 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -36568,3 +36568,30 @@ OVS_WAIT_UNTIL([test $(as hv-1 ovs-vsctl list qos | 
grep -c linux-htb) -eq 1])
 
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([Add and delete port])
+ovn_start
+
+net_add n1
+
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.11
+
+check ovn-nbctl ls-add ls0
+check ovn-nbctl --wait=hv lsp-add ls0 lsp1
+as hv1 check ovs-vsctl -- add-port br-int lsp1 -- \
+set Interface lsp1 external-ids:iface-id=lsp1
+
+wait_for_ports_up
+sleep_controller hv1
+check ovn-nbctl --wait=sb lsp-add ls0 lsp0
+check ovn-nbctl --wait=sb lsp-del lsp0
+wake_up_controller hv1
+AT_CHECK([check_logs [""] hv1])
+OVN_CLEANUP([hv1])
+
+AT_CLEANUP
+])
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 2/3] ovn-controller: avoid monitoring wrong chassis

2023-07-18 Thread Xavier Simonart
To register a chassis to ovn-sb, ovn-controller create a "name-uuid"
for the chassis, which it will send to ovn-sb. In return, ovn-sb creates
the uuid based on that "name".
Using this "name-uuid" as a uuid in a monitor request to ovn-sb results in
monitoring a non-existing chassis.
The system returns on track as soon as ovn-controller receives the chassis
update from ovn-sb, as it will then send a new monitor request to ovn-sb,
with the proper chassis uuid.
However, if some ports were already in ovn-sb when it receives the monitor 
request
with the wrong uuid, ovn-sb will notify ovn-controller to delete all those 
ports.
This cause the following sequence in ovn-controller
- notification of a new port
- deletion of that port
- notification of a new port

We now prevent sending wrong uuid to ovn-sb.

This issue was causing test "testing load-balancer template IPv6" to fail
in a flaky way, due to the following warning:
if_status|WARN|Trying to release unknown interface vm2

Signed-off-by: Xavier Simonart 
---
 controller/chassis.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index ce88541ba..d3c944973 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -831,7 +831,11 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 }
 
 ovs_chassis_cfg_destroy(_cfg);
-return chassis_rec;
+if (existed) {
+return chassis_rec;
+} else {
+return NULL;
+}
 }
 
 bool
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 0/3] Unexpected warning "Trying to release unknown

2023-07-18 Thread Xavier Simonart
Xavier Simonart (3):
  tests: run system tests also with monitor-all=true
  ovn-controller: avoid monitoring wrong chassis
  ovn-controller: remove un-necessary "trying to release" warnings

 controller/binding.c | 12 ++--
 controller/chassis.c |  6 +-
 controller/if-status.c   | 12 
 controller/if-status.h   |  2 ++
 tests/ofproto-macros.at  |  4 ++--
 tests/ovn.at | 27 +++
 tests/system-dpdk-macros.at  |  4 
 tests/system-kmod-macros.at  |  3 +++
 tests/system-userspace-macros.at |  3 +++
 9 files changed, 68 insertions(+), 5 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovsdb: raft: Support pre-vote mechanism to deal with disruptive server.

2023-07-18 Thread Han Zhou
When a server becomes unstable due to system overloading or intermittent
partitioning, it may miss some heartbeats and then starts election with
a new term, which would disrupt the otherwise healthy cluster formed by
the rest of the healthy nodes.  Such situation may exist for a long time
until the "flapping" server is shutdown or recovered completely, which
can severely impact the availability of the cluster. The pre-vote
mechanism introduced in the raft paper section 9.6 can prevent such
problems. This patch implements the pre-vote mechanism.

Note: during the upgrade, since the old version doesn't recognize the
new optional field in the vote rpc (and the ovsdb_parse_finish validates
that all fields in the jsonrpc are parsed), an error log may be noticed
on old nodes if an upgraded node happens to become candidate first and
vote for itself, and the vote request will be discarded. If this happens
before enough nodes complete the upgrade, the vote from the upgraded
node may not reach the quorum. This results in re-election, and any old
nodes should be able to vote and get elected as leader. So, in unlucky
cases there can be more leader elections happening during the upgrade.

Signed-off-by: Han Zhou 
---
v1 -> v2: Address comments from Ilya.

 NEWS   |  7 +++-
 ovsdb/raft-rpc.c   | 22 ++-
 ovsdb/raft-rpc.h   |  3 ++
 ovsdb/raft.c   | 88 ++
 tests/ovsdb-cluster.at | 42 
 5 files changed, 135 insertions(+), 27 deletions(-)

diff --git a/NEWS b/NEWS
index 7a852427e517..32b6acfbf9d9 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,11 @@
 Post-v3.2.0
 
-
+   - OVSDB:
+ * Support pre-vote mechanism in RAFT that protects the cluster against
+   disruptive servers (section 9.6 of the original RAFT paper).  Upgrading
+   from older version is supported but it may trigger more leader elections
+   during the process, and error logs complaining unrecognized fields may
+   be observed on old nodes.
 
 v3.2.0 - xx xxx 
 
diff --git a/ovsdb/raft-rpc.c b/ovsdb/raft-rpc.c
index dd14d81091fc..1529efbb7412 100644
--- a/ovsdb/raft-rpc.c
+++ b/ovsdb/raft-rpc.c
@@ -283,6 +283,10 @@ raft_vote_request_to_jsonrpc(const struct 
raft_vote_request *rq,
 json_object_put(args, "leadership_transfer",
 json_boolean_create(true));
 }
+if (rq->is_prevote) {
+json_object_put(args, "is_prevote",
+json_boolean_create(true));
+}
 }
 
 static void
@@ -294,6 +298,8 @@ raft_vote_request_from_jsonrpc(struct ovsdb_parser *p,
 rq->last_log_term = raft_parse_required_uint64(p, "last_log_term");
 rq->leadership_transfer
 = raft_parse_optional_boolean(p, "leadership_transfer") == 1;
+rq->is_prevote
+= raft_parse_optional_boolean(p, "is_prevote") == 1;
 }
 
 static void
@@ -305,6 +311,9 @@ raft_format_vote_request(const struct raft_vote_request 
*rq, struct ds *s)
 if (rq->leadership_transfer) {
 ds_put_cstr(s, " leadership_transfer=true");
 }
+if (rq->is_prevote) {
+ds_put_cstr(s, " is_prevote=true");
+}
 }
 
 /* raft_vote_reply. */
@@ -326,6 +335,9 @@ raft_vote_reply_to_jsonrpc(const struct raft_vote_reply 
*rpy,
 {
 raft_put_uint64(args, "term", rpy->term);
 json_object_put_format(args, "vote", UUID_FMT, UUID_ARGS(>vote));
+if (rpy->is_prevote) {
+json_object_put(args, "is_prevote", json_boolean_create(true));
+}
 }
 
 static void
@@ -334,6 +346,7 @@ raft_vote_reply_from_jsonrpc(struct ovsdb_parser *p,
 {
 rpy->term = raft_parse_required_uint64(p, "term");
 rpy->vote = raft_parse_required_uuid(p, "vote");
+rpy->is_prevote = raft_parse_optional_boolean(p, "is_prevote") == 1;
 }
 
 static void
@@ -341,6 +354,9 @@ raft_format_vote_reply(const struct raft_vote_reply *rpy, 
struct ds *s)
 {
 ds_put_format(s, " term=%"PRIu64, rpy->term);
 ds_put_format(s, " vote="SID_FMT, SID_ARGS(>vote));
+if (rpy->is_prevote) {
+ds_put_cstr(s, " is_prevote=true");
+}
 }
 
 /* raft_add_server_request */
@@ -1007,8 +1023,10 @@ raft_rpc_get_vote(const union raft_rpc *rpc)
 case RAFT_RPC_BECOME_LEADER:
 return NULL;
 
-case RAFT_RPC_VOTE_REPLY:
-return _vote_reply_cast(rpc)->vote;
+case RAFT_RPC_VOTE_REPLY: {
+const struct raft_vote_reply *rpy = raft_vote_reply_cast(rpc);
+return rpy->is_prevote ? NULL : >vote;
+}
 
 default:
 OVS_NOT_REACHED();
diff --git a/ovsdb/raft-rpc.h b/ovsdb/raft-rpc.h
index 221f24d00128..7677c35b4e04 100644
--- a/ovsdb/raft-rpc.h
+++ b/ovsdb/raft-rpc.h
@@ -125,12 +125,15 @@ struct raft_vote_request {
 uint64_t last_log_index; /* Index of candidate's last log entry. */
 uint64_t last_log_term;  /* Term of candidate's last log entry. */
 bool leadership_transfer;  /* True to override minimum election timeout. */
+bool