Re: [ovs-dev] [PATCH v4] ofproto-dpif-upcall: Mirror packets that are modified

2023-07-16 Thread Mike Pattrick
On Fri, Jul 14, 2023 at 7:31 AM Ilya Maximets  wrote:
>
> On 7/11/23 14:46, Eelco Chaudron wrote:
> >
> >
> > On 10 Jul 2023, at 17:34, Mike Pattrick wrote:
> >
> >> Currently OVS keeps track of which mirrors that each packet has been
> >> sent to for the purpose of deduplication. However, this doesn't consider
> >> that openflow rules can make significant changes to packets after
> >> ingress.
> >>
> >> For example, OVN can create OpenFlow rules that turn an echo request
> >> into an echo response by flipping source/destination addresses and
> >> setting the ICMP type to Reply. When a mirror is configured, only the
> >> request gets mirrored even though a response is recieved.
> >>
> >> This can cause a false impression of the actual traffic on wire if
> >> someone inspects the mirror and doesn't see an echo reply even though
> >> one has been sent.
> >>
> >> This patch resets the mirrors every time a packet is modified, so
> >> mirrors will recieve every copy of a packet that is sent for output.
> >>
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2155579
> >> Signed-off-by: Mike Pattrick 
> >
> > Thanks Mike for fixing my v3 comments. This patch looks good to me.
> >
> > Acked-by: Eelco Chaudron 
>
> Hi, Mike and Eelco.  The patch seems fine in general, but there are
> some inconsistencies, see below.
>
> >
> >> ---
> >>  v2: Refactored all code into a single function
> >>  v3: Cleaned up a code change that was incorrectly retained in v2 but
> >>  not needed
> >>  v4: Removed the default case from reset_mirror_ctx()
> >> ---
> >>  ofproto/ofproto-dpif-xlate.c | 86 
> >>  tests/ofproto-dpif.at|  6 +--
> >>  2 files changed, 89 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >> index 29f4daa63..aba0cdb6e 100644
> >> --- a/ofproto/ofproto-dpif-xlate.c
> >> +++ b/ofproto/ofproto-dpif-xlate.c
> >> @@ -6989,6 +6989,90 @@ xlate_ofpact_unroll_xlate(struct xlate_ctx *ctx,
> >>   "cookie=%#"PRIx64, a->rule_table_id, a->rule_cookie);
> >>  }
> >>
> >> +/* Reset the mirror context if we modify the packet and would like to 
> >> mirror
> >> + * the new copy. */
> >> +static void
> >> +reset_mirror_ctx(struct xlate_ctx *ctx, const struct flow *flow, uint8_t 
> >> type)
> >> +{
> >> +switch (type) {
> >> +case OFPACT_STRIP_VLAN:
> >> +case OFPACT_PUSH_VLAN:
> >> +case OFPACT_SET_ETH_SRC:
> >> +case OFPACT_SET_ETH_DST:
> >> +case OFPACT_PUSH_MPLS:
> >> +case OFPACT_POP_MPLS:
> >> +case OFPACT_SET_MPLS_LABEL:
> >> +case OFPACT_SET_MPLS_TC:
> >> +case OFPACT_SET_MPLS_TTL:
> >> +case OFPACT_DEC_MPLS_TTL:
> >> +case OFPACT_DEC_NSH_TTL:
> >> +case OFPACT_DEC_TTL:
> >> +case OFPACT_SET_FIELD:
>
> We clear the mask unconditionally here.  For other similar actions
> below we check prerequisites, but not for this one.  Seems a bit
> inconsistent.

We could add the set field prereq check here.

> Also, existence of the OF action doesn't actually mean that packet
> will be changed.  Either due to prerequisite checks or due to the
> value being changed to the same value.  The actual packet changes
> can only be detected after commit_odp_actions().
>
> Maybe that's OK to mirror duplicates in this case, I'm not sure.

Since the mirror action only activates on output action, there are a
limited number of scenarios where this could happen. Specifically when
the user configures a mirror on multiple interfaces and has one of
these redundant actions, or a mirror on a single interface but
redundant modification actions result in sending duplicate packets out
on it multiple times. It's debatable whether we should mirror a
duplicate packet in that case, failure to could be interpreted as an
error if the user who may be expecting to receive a copy of all
ingress and egress packets even if packet modifications have no
effect.

For example, broken OF flows may be inadvertently sending multiple
copies of a packet out an interface that should be modified. The user
in this case would see those packets on the remote end, but may be
confused as to why they only see one copy of the packet on the local
end when using a tool like ovs-tcpdump.

>
> >> +case OFPACT_SET_VLAN_VID:
> >> +case OFPACT_SET_VLAN_PCP:
> >> +case OFPACT_ENCAP:
> >> +case OFPACT_DECAP:
> >> +case OFPACT_NAT:
>
> The NAT is tricky, because on it's own it doesn't change the packet.
> NAT is applied during the CT action and CT action always forks the
> pipeline.  So, the packet will remain unchanged in the current and
> it will be modified in the forked pipeline.  But we're clearing the
> list of mirrors for both here.

From my understanding, the NAT action is evaluated inside the
recursion, but it is applied outside of the recursion and the table
action also takes effect outside of the recursion. I may be completely
wrong here, but if that's the case, then this acts 

Re: [ovs-dev] [PATCH ovn v2 5/8] northd: Handle load balancer changes for a logical switch.

2023-07-16 Thread Han Zhou
On Fri, Jul 7, 2023 at 1:54 PM  wrote:
>
> From: Numan Siddique 
>
> Logical switch change handler of the 'northd' engine node
> now also handles changes to load balancers.
>
> Signed-off-by: Numan Siddique 
> ---
>  lib/lb.c|  17 +++-
>  lib/lb.h|   2 +
>  northd/northd.c | 212 +---
>  northd/northd.h |   9 ++
>  tests/ovn-northd.at |   8 +-
>  5 files changed, 191 insertions(+), 57 deletions(-)
>
> diff --git a/lib/lb.c b/lib/lb.c
> index a51fe225fa..b2b6473c1d 100644
> --- a/lib/lb.c
> +++ b/lib/lb.c
> @@ -1091,7 +1091,22 @@ ovn_lb_datapaths_add_ls(struct ovn_lb_datapaths
*lb_dps, size_t n,
>  struct ovn_datapath **ods)
>  {
>  for (size_t i = 0; i < n; i++) {
> -bitmap_set1(lb_dps->nb_ls_map, ods[i]->index);
> +if (!bitmap_is_set(lb_dps->nb_ls_map, ods[i]->index)) {
> +bitmap_set1(lb_dps->nb_ls_map, ods[i]->index);
> +lb_dps->n_nb_ls++;
> +}
> +}
> +}
> +
> +void
> +ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths *lb_dps, size_t n,
> +struct ovn_datapath **ods)
> +{
> +for (size_t i = 0; i < n; i++) {
> +if (bitmap_is_set(lb_dps->nb_ls_map, ods[i]->index)) {
> +bitmap_set0(lb_dps->nb_ls_map, ods[i]->index);
> +lb_dps->n_nb_ls--;
> +}
>  }
>  }
>
> diff --git a/lib/lb.h b/lib/lb.h
> index 74905c73b7..ac3a7f 100644
> --- a/lib/lb.h
> +++ b/lib/lb.h
> @@ -179,6 +179,8 @@ void ovn_lb_datapaths_add_lr(struct ovn_lb_datapaths
*, size_t n,
>   struct ovn_datapath **);
>  void ovn_lb_datapaths_add_ls(struct ovn_lb_datapaths *, size_t n,
>   struct ovn_datapath **);
> +void ovn_lb_datapaths_remove_ls(struct ovn_lb_datapaths *, size_t n,
> +struct ovn_datapath **);
>
>  struct ovn_lb_group_datapaths {
>  struct hmap_node hmap_node;
> diff --git a/northd/northd.c b/northd/northd.c
> index fc98ab7bfd..bf02190f7c 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -853,7 +853,8 @@ ovn_datapath_destroy(struct hmap *datapaths, struct
ovn_datapath *od)
>  ovn_ls_port_group_destroy(>nb_pgs);
>  destroy_mcast_info_for_datapath(od);
>  destroy_ports_for_datapath(od);
> -
> +free(od->lb_uuids);
> +free(od->lb_group_uuids);
>  free(od);
>  }
>  }
> @@ -4080,6 +4081,48 @@ build_lb_vip_actions(const struct ovn_northd_lb
*lb,
>  return reject;
>  }
>
> +static void
> +associate_ls_lbs(struct ovn_datapath *ls_od, struct hmap
*lb_datapaths_map)
> +{
> +struct ovn_lb_datapaths *lb_dps;
> +
> +free(ls_od->lb_uuids);
> +ls_od->lb_uuids = xcalloc(ls_od->nbs->n_load_balancer,
> +  sizeof *ls_od->lb_uuids);
> +ls_od->n_lb_uuids = ls_od->nbs->n_load_balancer;
> +for (size_t i = 0; i < ls_od->nbs->n_load_balancer; i++) {
> +const struct uuid *lb_uuid =
> +_od->nbs->load_balancer[i]->header_.uuid;
> +lb_dps = ovn_lb_datapaths_find(lb_datapaths_map, lb_uuid);
> +ovs_assert(lb_dps);
> +ovn_lb_datapaths_add_ls(lb_dps, 1, _od);
> +ls_od->lb_uuids[i] = *lb_uuid;
> +}
> +}
> +
> +static void
> +associate_ls_lb_groups(struct ovn_datapath *ls_od,
> +   struct hmap *lb_group_datapaths_map)
> +{
> +const struct nbrec_load_balancer_group *nbrec_lb_group;
> +struct ovn_lb_group_datapaths *lb_group_dps;
> +
> +free(ls_od->lb_group_uuids);
> +ls_od->lb_group_uuids = xcalloc(ls_od->nbs->n_load_balancer_group,
> +sizeof *ls_od->lb_group_uuids);
> +ls_od->n_lb_group_uuids = ls_od->nbs->n_load_balancer_group;
> +for (size_t i = 0; i < ls_od->nbs->n_load_balancer_group; i++) {
> +nbrec_lb_group = ls_od->nbs->load_balancer_group[i];
> +const struct uuid *lb_group_uuid = _lb_group->header_.uuid;
> +lb_group_dps =
> +ovn_lb_group_datapaths_find(lb_group_datapaths_map,
> +lb_group_uuid);
> +ovs_assert(lb_group_dps);
> +ovn_lb_group_datapaths_add_ls(lb_group_dps, 1, _od);
> +ls_od->lb_group_uuids[i] = *lb_group_uuid;
> +}
> +}
> +
>  static void
>  build_lb_datapaths(const struct hmap *lbs, const struct hmap *lb_groups,
> struct ovn_datapaths *ls_datapaths,
> @@ -4115,24 +4158,8 @@ build_lb_datapaths(const struct hmap *lbs, const
struct hmap *lb_groups,
>  if (!od->nbs) {
>  continue;
>  }
> -
> -for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
> -const struct uuid *lb_uuid =
> ->nbs->load_balancer[i]->header_.uuid;
> -lb_dps = ovn_lb_datapaths_find(lb_datapaths_map, lb_uuid);
> -ovs_assert(lb_dps);
> -ovn_lb_datapaths_add_ls(lb_dps, 1, );
> -}
> -
> -for 

[ovs-dev] 答复: [PATCH] [PATCH] userspace: support vxlan and geneve tunnel tso

2023-07-16 Thread Dexia Li via dev
Ok,thanks for your opinions, I will try to correct it.


Best regards,Dexia

-邮件原件-
发件人: Ilya Maximets  
发送时间: 2023年7月15日 0:33
收件人: Dexia Li ; ovs-dev@openvswitch.org
抄送: i.maxim...@ovn.org
主题: Re: [ovs-dev] [PATCH] [PATCH] userspace: support vxlan and geneve tunnel tso

On 7/13/23 13:01, Dexia Li via dev wrote:
> Signed-off-by: Dexia Li 
> ---
>  lib/dp-packet.h |  7 +++---
>  lib/netdev-dpdk.c   | 45 +-
>  lib/netdev-native-tnl.c | 48 +
>  lib/netdev.c| 17 +--
>  4 files changed, 97 insertions(+), 20 deletions(-)

Hi, Dexia Li.  Thanks for the patch!

It's lacking a few important parts though.  First of all, as you may notice, 
most of the tests failed in CI.  This is because it must be possible to build 
the code without DPDK.  More generally, DPDK-specific code should not appear in 
files that do not have DPDK in the name.
In the exception of dp-packet headers, since we add rte_mbuf as part of the 
dp-packet impleentation.  It means that there should be a generic 
implementation for this functionality that is using DPDK under the hood, if 
available.

Also, not all devices support segmentation of encapsulated packets.
It should be checked that particular netdev supports offloads of this type 
before packets can be send to it.  And we may need a generic software fallback 
in case device doesn't support this functionality.
There should be a way to send these packets to netdev-linux or dpdk vhost-user 
ports, for example, left physical DPDK ports that do not support this kind of 
offloading.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct

2023-07-16 Thread Xin Long
With the following flows, the packets will be dropped if OVS TC offload is
enabled.

  'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'

In the 1st flow, it finds the exp from the hashtable and removes it then
creates the ct with this exp in act_ct. However, in the 2nd flow it goes
to the OVS upcall at the 1st time. When the skb comes back from userspace,
it has to create the ct again without exp(the exp was removed last time).
With no 'rel' set in the ct, the 3rd flow can never get matched.

In OVS conntrack, it works around it by adding its own exp lookup function
ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating
a real ct, it only updates its keys with the exp and its master info. So
when the skb comes back, the exp is still in the hashtable.

However, we can't do this trick in act_ct, as tc flower match is using a
real ct, and passing the exp and its master info to flower parsing via
tc_skb_cb is also not possible (tc_skb_cb size is not big enough).

The simple and clear fix is to not remove the exp at the 1st flow, namely,
not set IPS_CONFIRMED in tmpl when commit is not set in act_ct.

Reported-by: Shuang Li 
Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index abc71a06d634..7c652d14528b 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1238,7 +1238,8 @@ static int tcf_ct_fill_params(struct net *net,
}
}
 
-   __set_bit(IPS_CONFIRMED_BIT, >status);
+   if (p->ct_action & TCA_CT_ACT_COMMIT)
+   __set_bit(IPS_CONFIRMED_BIT, >status);
return 0;
 err:
nf_ct_put(p->tmpl);
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2023-07-16 Thread Xin Long
By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
from the hashtable when lookup, we can simplify the exp processing code a
lot in openvswitch conntrack.

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 78 +
 1 file changed, 10 insertions(+), 68 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 331730fd3580..fa955e892210 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct 
sw_flow_key *key,
return 0;
 }
 
-static struct nf_conntrack_expect *
-ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
-  u16 proto, const struct sk_buff *skb)
-{
-   struct nf_conntrack_tuple tuple;
-   struct nf_conntrack_expect *exp;
-
-   if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
))
-   return NULL;
-
-   exp = __nf_ct_expect_find(net, zone, );
-   if (exp) {
-   struct nf_conntrack_tuple_hash *h;
-
-   /* Delete existing conntrack entry, if it clashes with the
-* expectation.  This can happen since conntrack ALGs do not
-* check for clashes between (new) expectations and existing
-* conntrack entries.  nf_conntrack_in() will check the
-* expectations only if a conntrack entry can not be found,
-* which can lead to OVS finding the expectation (here) in the
-* init direction, but which will not be removed by the
-* nf_conntrack_in() call, if a matching conntrack entry is
-* found instead.  In this case all init direction packets
-* would be reported as new related packets, while reply
-* direction packets would be reported as un-related
-* established packets.
-*/
-   h = nf_conntrack_find_get(net, zone, );
-   if (h) {
-   struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
-
-   nf_ct_delete(ct, 0, 0);
-   nf_ct_put(ct);
-   }
-   }
-
-   return exp;
-}
-
 /* This replicates logic from nf_conntrack_core.c that is not exported. */
 static enum ip_conntrack_info
 ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
@@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 const struct ovs_conntrack_info *info,
 struct sk_buff *skb)
 {
-   struct nf_conntrack_expect *exp;
-
-   /* If we pass an expected packet through nf_conntrack_in() the
-* expectation is typically removed, but the packet could still be
-* lost in upcall processing.  To prevent this from happening we
-* perform an explicit expectation lookup.  Expected connections are
-* always new, and will be passed through conntrack only when they are
-* committed, as it is OK to remove the expectation at that time.
-*/
-   exp = ovs_ct_expect_find(net, >zone, info->family, skb);
-   if (exp) {
-   u8 state;
-
-   /* NOTE: New connections are NATted and Helped only when
-* committed, so we are not calling into NAT here.
-*/
-   state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
-   __ovs_ct_update_key(key, state, >zone, exp->master);
-   } else {
-   struct nf_conn *ct;
-   int err;
+   struct nf_conn *ct;
+   int err;
 
-   err = __ovs_ct_lookup(net, key, info, skb);
-   if (err)
-   return err;
+   err = __ovs_ct_lookup(net, key, info, skb);
+   if (err)
+   return err;
 
-   ct = (struct nf_conn *)skb_nfct(skb);
-   if (ct)
-   nf_ct_deliver_cached_events(ct);
-   }
+   ct = (struct nf_conn *)skb_nfct(skb);
+   if (ct)
+   nf_ct_deliver_cached_events(ct);
 
return 0;
 }
@@ -1460,7 +1401,8 @@ int ovs_ct_copy_action(struct net *net, const struct 
nlattr *attr,
if (err)
goto err_free_ct;
 
-   __set_bit(IPS_CONFIRMED_BIT, _info.ct->status);
+   if (ct_info.commit)
+   __set_bit(IPS_CONFIRMED_BIT, _info.ct->status);
return 0;
 err_free_ct:
__ovs_ct_free_action(_info);
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation

2023-07-16 Thread Xin Long
Currently nf_conntrack_in() calling nf_ct_find_expectation() will
remove the exp from the hash table. However, in some scenario, we
expect the exp not to be removed when the created ct will not be
confirmed, like in OVS and TC conntrack in the following patches.

This patch allows exp not to be removed by setting IPS_CONFIRMED
in the status of the tmpl.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_expect.h | 2 +-
 net/netfilter/nf_conntrack_core.c   | 2 +-
 net/netfilter/nf_conntrack_expect.c | 4 ++--
 net/netfilter/nft_ct.c  | 2 ++
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_expect.h 
b/include/net/netfilter/nf_conntrack_expect.h
index cf0d81be5a96..165e7a03b8e9 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -100,7 +100,7 @@ nf_ct_expect_find_get(struct net *net,
 struct nf_conntrack_expect *
 nf_ct_find_expectation(struct net *net,
   const struct nf_conntrack_zone *zone,
-  const struct nf_conntrack_tuple *tuple);
+  const struct nf_conntrack_tuple *tuple, bool unlink);
 
 void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp,
u32 portid, int report);
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 992393102d5f..9f6f2e643575 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1756,7 +1756,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
cnet = nf_ct_pernet(net);
if (cnet->expect_count) {
spin_lock_bh(_conntrack_expect_lock);
-   exp = nf_ct_find_expectation(net, zone, tuple);
+   exp = nf_ct_find_expectation(net, zone, tuple, !tmpl || 
nf_ct_is_confirmed(tmpl));
if (exp) {
/* Welcome, Mr. Bond.  We've been expecting you... */
__set_bit(IPS_EXPECTED_BIT, >status);
diff --git a/net/netfilter/nf_conntrack_expect.c 
b/net/netfilter/nf_conntrack_expect.c
index 96948e98ec53..81ca348915c9 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -171,7 +171,7 @@ EXPORT_SYMBOL_GPL(nf_ct_expect_find_get);
 struct nf_conntrack_expect *
 nf_ct_find_expectation(struct net *net,
   const struct nf_conntrack_zone *zone,
-  const struct nf_conntrack_tuple *tuple)
+  const struct nf_conntrack_tuple *tuple, bool unlink)
 {
struct nf_conntrack_net *cnet = nf_ct_pernet(net);
struct nf_conntrack_expect *i, *exp = NULL;
@@ -211,7 +211,7 @@ nf_ct_find_expectation(struct net *net,
 !refcount_inc_not_zero(>master->ct_general.use)))
return NULL;
 
-   if (exp->flags & NF_CT_EXPECT_PERMANENT) {
+   if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink) {
refcount_inc(>use);
return exp;
} else if (del_timer(>timeout)) {
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 38958e067aa8..e87fd4314c68 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -262,6 +262,7 @@ static void nft_ct_set_zone_eval(const struct nft_expr 
*expr,
regs->verdict.code = NF_DROP;
return;
}
+   __set_bit(IPS_CONFIRMED_BIT, >status);
}
 
nf_ct_set(skb, ct, IP_CT_NEW);
@@ -368,6 +369,7 @@ static bool nft_ct_tmpl_alloc_pcpu(void)
return false;
}
 
+   __set_bit(IPS_CONFIRMED_BIT, >status);
per_cpu(nft_ct_pcpu_template, cpu) = tmp;
}
 
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly

2023-07-16 Thread Xin Long
With the OVS upcall, the original ct in the skb will be dropped, and when
the skb comes back from userspace it has to create a new ct again through
nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().

However, the new ct will not be able to have the exp as the original ct
has taken it away from the hash table in nf_ct_find_expectation(). This
will cause some flow never to be matched, like:

  'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'

if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
matched.

OVS conntrack works around this by adding its own exp lookup function to
not remove the exp from the hash table and saving the exp and its master
info to the flow keys instead of create a real ct. But this way doesn't
work for TC act_ct.

The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
allows both OVS conntrack and TC act_ct to have a simple and clear fix
for this problem in the patch 2/3 and 3/3.

Xin Long (3):
  netfilter: allow exp not to be removed in nf_ct_find_expectation
  net: sched: set IPS_CONFIRMED in tmpl status only when commit is set
in act_ct
  openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set
in conntrack

 include/net/netfilter/nf_conntrack_expect.h |  2 +-
 net/netfilter/nf_conntrack_core.c   |  2 +-
 net/netfilter/nf_conntrack_expect.c |  4 +-
 net/netfilter/nft_ct.c  |  2 +
 net/openvswitch/conntrack.c | 78 +++--
 net/sched/act_ct.c  |  3 +-
 6 files changed, 18 insertions(+), 73 deletions(-)

-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 1/2] netdev-dpdk: negotiate delivery of per-packet Rx metadata

2023-07-16 Thread Ivan Malov via dev

Hi Ilya,

A quick follow-up from me: I made a new version of this patch [1].
[1] 
https://patchwork.ozlabs.org/project/openvswitch/patch/20230716115720.6789-1-ivan.ma...@arknetworks.am/

Thank you.

On Fri, 14 Jul 2023, Ilya Maximets wrote:


On 6/30/23 04:46, Ivan Malov wrote:

This may be required by some PMDs in offload scenarios.

Signed-off-by: Ivan Malov 


Hi, Ivan.  Thanks for the patch!
I suppose, it can be considered as a bug fix.  Could you add
a Fixes tag to the commit message?


---
 lib/netdev-dpdk.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 63dac689e..d9d1b43f6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -492,6 +492,9 @@ struct netdev_dpdk {

 /* Array of vhost rxq states, see vring_state_changed. */
 bool *vhost_rxq_enabled;
+
+/* Ensures that Rx metadata delivery is configured only once. */
+bool rx_metadata_delivery_configured;
 );

 PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1187,6 +1190,42 @@ dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) 
OVS_REQUIRES(dev->mutex)
 }
 }

+static void
+dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev)
+{
+uint64_t rx_metadata = 0;
+int ret;
+
+if (dev->rx_metadata_delivery_configured) {
+return;
+}
+
+/* For the fallback offload (non-"transfer" rules) */
+rx_metadata |= RTE_ETH_RX_METADATA_USER_MARK;
+/* For the full offload ("transfer" rules) */


This doesn't seem to be correct.  This flag is only needed for
tunnel offload.  Should also be under experimental api ifdef?


+rx_metadata |= RTE_ETH_RX_METADATA_TUNNEL_ID;
+
+ret = rte_eth_rx_metadata_negotiate(dev->port_id, _metadata);
+if (ret == 0) {
+if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) {
+VLOG_DBG("The NIC will not provide per-packet USER_MARK on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+}
+if (!(rx_metadata & RTE_ETH_RX_METADATA_TUNNEL_ID)) {
+VLOG_DBG("The NIC will not provide per-packet TUNNEL_ID on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+}
+} else if (ret == -ENOTSUP) {
+VLOG_DBG("Rx metadata negotiate procedure is not supported on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+} else {
+VLOG_WARN("Cannot negotiate Rx metadata on port "
+  DPDK_PORT_ID_FMT, dev->port_id);


For all the logs above, use the following format instead:

   VLOG_*("%s: ...", netdev_get_name(>up));

For the last two, you may use:

   VLOG(ret == -ENOTSUP ? VLL_DBG : VLL_WARN,
"%s: Cannot negotiate Rx metadata: %s",
netdev_get_name(>up), rte_strerror(-ret));


+}
+
+dev->rx_metadata_delivery_configured = true;
+}
+
 static int
 dpdk_eth_dev_init(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
@@ -1200,6 +1239,16 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
  RTE_ETH_RX_OFFLOAD_TCP_CKSUM |
  RTE_ETH_RX_OFFLOAD_IPV4_CKSUM;

+/*
+ * Full tunnel offload requires that tunnel ID metadata be
+ * delivered with "miss" packets from the hardware to the
+ * PMD. The same goes for megaflow mark metadata which is
+ * used in MARK + RSS offload scenario.
+ *
+ * Request delivery of such metadata.
+ */
+dpdk_eth_dev_init_rx_metadata(dev);


It's a bit strange that we request metadata regardless of hw-offload
being enabled.  Should this be under netdev_is_flow_api_enabled() ?


+
 rte_eth_dev_info_get(dev->port_id, );

 if (strstr(info.driver_name, "vf") != NULL) {
@@ -1382,6 +1431,8 @@ common_construct(struct netdev *netdev, dpdk_port_t 
port_no,
 /* Initilize the hardware offload flags to 0 */
 dev->hw_ol_features = 0;

+dev->rx_metadata_delivery_configured = false;
+
 dev->flags = NETDEV_UP | NETDEV_PROMISC;

 ovs_list_push_back(_list, >list_node);




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 1/1] netdev-dpdk: negotiate delivery of per-packet Rx metadata

2023-07-16 Thread Ivan Malov via dev
This may be required by some PMDs in offload scenarios.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ivan Malov 
---
 v2: add missing experimental api ifdef

 lib/netdev-dpdk.c | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index aa87ee546..ea4cc6977 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -512,6 +512,9 @@ struct netdev_dpdk {
 
 /* Array of vhost rxq states, see vring_state_changed. */
 bool *vhost_rxq_enabled;
+
+/* Ensures that Rx metadata delivery is configured only once. */
+bool rx_metadata_delivery_configured;
 );
 
 PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1220,6 +1223,45 @@ dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) 
OVS_REQUIRES(dev->mutex)
 }
 }
 
+static void
+dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev)
+{
+uint64_t rx_metadata = 0;
+int ret;
+
+if (dev->rx_metadata_delivery_configured) {
+return;
+}
+
+/* For the fallback offload (non-"transfer" rules) */
+rx_metadata |= RTE_ETH_RX_METADATA_USER_MARK;
+
+#ifdef ALLOW_EXPERIMENTAL_API
+/* For the tunnel offload  */
+rx_metadata |= RTE_ETH_RX_METADATA_TUNNEL_ID;
+#endif /* ALLOW_EXPERIMENTAL_API */
+
+ret = rte_eth_rx_metadata_negotiate(dev->port_id, _metadata);
+if (ret == 0) {
+if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) {
+VLOG_DBG("%s: The NIC will not provide per-packet USER_MARK",
+ netdev_get_name(>up));
+}
+#ifdef ALLOW_EXPERIMENTAL_API
+if (!(rx_metadata & RTE_ETH_RX_METADATA_TUNNEL_ID)) {
+VLOG_DBG("%s: The NIC will not provide per-packet TUNNEL_ID",
+ netdev_get_name(>up));
+}
+#endif /* ALLOW_EXPERIMENTAL_API */
+} else {
+VLOG(ret == -ENOTSUP ? VLL_DBG : VLL_WARN,
+ "%s: Cannot negotiate Rx metadata: %s",
+ netdev_get_name(>up), rte_strerror(-ret));
+}
+
+dev->rx_metadata_delivery_configured = true;
+}
+
 static int
 dpdk_eth_dev_init(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
@@ -1233,6 +1275,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
  RTE_ETH_RX_OFFLOAD_TCP_CKSUM |
  RTE_ETH_RX_OFFLOAD_IPV4_CKSUM;
 
+if (netdev_is_flow_api_enabled()) {
+/*
+ * Full tunnel offload requires that tunnel ID metadata be
+ * delivered with "miss" packets from the hardware to the
+ * PMD. The same goes for megaflow mark metadata which is
+ * used in MARK + RSS offload scenario.
+ *
+ * Request delivery of such metadata.
+ */
+dpdk_eth_dev_init_rx_metadata(dev);
+}
+
 rte_eth_dev_info_get(dev->port_id, );
 
 if (strstr(info.driver_name, "vf") != NULL) {
@@ -1421,6 +1475,8 @@ common_construct(struct netdev *netdev, dpdk_port_t 
port_no,
 /* Initilize the hardware offload flags to 0 */
 dev->hw_ol_features = 0;
 
+dev->rx_metadata_delivery_configured = false;
+
 dev->flags = NETDEV_UP | NETDEV_PROMISC;
 
 ovs_list_push_back(_list, >list_node);
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/1] netdev-dpdk: negotiate delivery of per-packet Rx metadata

2023-07-16 Thread Ivan Malov via dev
This may be required by some PMDs in offload scenarios.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ivan Malov 
---
 lib/netdev-dpdk.c | 54 +++
 1 file changed, 54 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index aa87ee546..3d39a3afd 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -512,6 +512,9 @@ struct netdev_dpdk {
 
 /* Array of vhost rxq states, see vring_state_changed. */
 bool *vhost_rxq_enabled;
+
+/* Ensures that Rx metadata delivery is configured only once. */
+bool rx_metadata_delivery_configured;
 );
 
 PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1220,6 +1223,43 @@ dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) 
OVS_REQUIRES(dev->mutex)
 }
 }
 
+static void
+dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev)
+{
+uint64_t rx_metadata = 0;
+int ret;
+
+if (dev->rx_metadata_delivery_configured) {
+return;
+}
+
+/* For the fallback offload (non-"transfer" rules) */
+rx_metadata |= RTE_ETH_RX_METADATA_USER_MARK;
+
+#ifdef ALLOW_EXPERIMENTAL_API
+/* For the tunnel offload  */
+rx_metadata |= RTE_ETH_RX_METADATA_TUNNEL_ID;
+#endif /* ALLOW_EXPERIMENTAL_API */
+
+ret = rte_eth_rx_metadata_negotiate(dev->port_id, _metadata);
+if (ret == 0) {
+if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) {
+VLOG_DBG("%s: The NIC will not provide per-packet USER_MARK",
+ netdev_get_name(>up));
+}
+if (!(rx_metadata & RTE_ETH_RX_METADATA_TUNNEL_ID)) {
+VLOG_DBG("%s: The NIC will not provide per-packet TUNNEL_ID",
+ netdev_get_name(>up));
+}
+} else {
+VLOG(ret == -ENOTSUP ? VLL_DBG : VLL_WARN,
+ "%s: Cannot negotiate Rx metadata: %s",
+ netdev_get_name(>up), rte_strerror(-ret));
+}
+
+dev->rx_metadata_delivery_configured = true;
+}
+
 static int
 dpdk_eth_dev_init(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
@@ -1233,6 +1273,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
  RTE_ETH_RX_OFFLOAD_TCP_CKSUM |
  RTE_ETH_RX_OFFLOAD_IPV4_CKSUM;
 
+if (netdev_is_flow_api_enabled()) {
+/*
+ * Full tunnel offload requires that tunnel ID metadata be
+ * delivered with "miss" packets from the hardware to the
+ * PMD. The same goes for megaflow mark metadata which is
+ * used in MARK + RSS offload scenario.
+ *
+ * Request delivery of such metadata.
+ */
+dpdk_eth_dev_init_rx_metadata(dev);
+}
+
 rte_eth_dev_info_get(dev->port_id, );
 
 if (strstr(info.driver_name, "vf") != NULL) {
@@ -1421,6 +1473,8 @@ common_construct(struct netdev *netdev, dpdk_port_t 
port_no,
 /* Initilize the hardware offload flags to 0 */
 dev->hw_ol_features = 0;
 
+dev->rx_metadata_delivery_configured = false;
+
 dev->flags = NETDEV_UP | NETDEV_PROMISC;
 
 ovs_list_push_back(_list, >list_node);
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 1/2] netdev-dpdk: negotiate delivery of per-packet Rx metadata

2023-07-16 Thread Ivan Malov via dev

Hi Ilya,

Thanks for reviewing this. PSB.

On Fri, 14 Jul 2023, Ilya Maximets wrote:


On 6/30/23 04:46, Ivan Malov wrote:

This may be required by some PMDs in offload scenarios.

Signed-off-by: Ivan Malov 


Hi, Ivan.  Thanks for the patch!
I suppose, it can be considered as a bug fix.  Could you add
a Fixes tag to the commit message?


---
 lib/netdev-dpdk.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 63dac689e..d9d1b43f6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -492,6 +492,9 @@ struct netdev_dpdk {

 /* Array of vhost rxq states, see vring_state_changed. */
 bool *vhost_rxq_enabled;
+
+/* Ensures that Rx metadata delivery is configured only once. */
+bool rx_metadata_delivery_configured;
 );

 PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1187,6 +1190,42 @@ dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) 
OVS_REQUIRES(dev->mutex)
 }
 }

+static void
+dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev)
+{
+uint64_t rx_metadata = 0;
+int ret;
+
+if (dev->rx_metadata_delivery_configured) {
+return;
+}
+
+/* For the fallback offload (non-"transfer" rules) */
+rx_metadata |= RTE_ETH_RX_METADATA_USER_MARK;
+/* For the full offload ("transfer" rules) */


This doesn't seem to be correct.  This flag is only needed for
tunnel offload.  Should also be under experimental api ifdef?


My guess, it's comment which is incorrect. I'm going to fix it, yes.
But, regarding the experimental api ifdef, according to review [1]
from Eli, it might be unneeded. Furthermore, the dpdk api being
invoked is no longer experimental. What do you think?

[1] 
https://patchwork.ozlabs.org/project/openvswitch/patch/20220720121823.2497727-4-ivan.ma...@oktetlabs.ru/#2935847




+rx_metadata |= RTE_ETH_RX_METADATA_TUNNEL_ID;
+
+ret = rte_eth_rx_metadata_negotiate(dev->port_id, _metadata);
+if (ret == 0) {
+if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) {
+VLOG_DBG("The NIC will not provide per-packet USER_MARK on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+}
+if (!(rx_metadata & RTE_ETH_RX_METADATA_TUNNEL_ID)) {
+VLOG_DBG("The NIC will not provide per-packet TUNNEL_ID on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+}
+} else if (ret == -ENOTSUP) {
+VLOG_DBG("Rx metadata negotiate procedure is not supported on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+} else {
+VLOG_WARN("Cannot negotiate Rx metadata on port "
+  DPDK_PORT_ID_FMT, dev->port_id);


For all the logs above, use the following format instead:

   VLOG_*("%s: ...", netdev_get_name(>up));

For the last two, you may use:

   VLOG(ret == -ENOTSUP ? VLL_DBG : VLL_WARN,
"%s: Cannot negotiate Rx metadata: %s",
netdev_get_name(>up), rte_strerror(-ret));


+}
+
+dev->rx_metadata_delivery_configured = true;
+}
+
 static int
 dpdk_eth_dev_init(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
@@ -1200,6 +1239,16 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
  RTE_ETH_RX_OFFLOAD_TCP_CKSUM |
  RTE_ETH_RX_OFFLOAD_IPV4_CKSUM;

+/*
+ * Full tunnel offload requires that tunnel ID metadata be
+ * delivered with "miss" packets from the hardware to the
+ * PMD. The same goes for megaflow mark metadata which is
+ * used in MARK + RSS offload scenario.
+ *
+ * Request delivery of such metadata.
+ */
+dpdk_eth_dev_init_rx_metadata(dev);


It's a bit strange that we request metadata regardless of hw-offload
being enabled.  Should this be under netdev_is_flow_api_enabled() ?


+
 rte_eth_dev_info_get(dev->port_id, );

 if (strstr(info.driver_name, "vf") != NULL) {
@@ -1382,6 +1431,8 @@ common_construct(struct netdev *netdev, dpdk_port_t 
port_no,
 /* Initilize the hardware offload flags to 0 */
 dev->hw_ol_features = 0;

+dev->rx_metadata_delivery_configured = false;
+
 dev->flags = NETDEV_UP | NETDEV_PROMISC;

 ovs_list_push_back(_list, >list_node);




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev