Re: [ovs-dev] [PATCH] ofproto: fix use-after-free for "ofproto".

2022-01-17 Thread 贺鹏
Hi, yunjian and Ilya,

this patch reminds me of the discussion we have in March last year, and
during the discussion, you have spotted this
thread-safety issue of uuid map. Unfortunately, in that email, you did not
reply to the mailist, so I cannot give a link to
the email. I attach it as a reference.

I quote it here:

"
That is an interesting example here...
I can't help but notice that this function is typically called
from different handler or pmd threads and modified by the main thread
while upcalls enabled.  And hmap is not a thread-safe structure.
I guess, we have another possible problem here.  We need to protect
at least this hmap and the other one with rw lock or something...
Am I right or am I missing something?  What else we need to protect?
"

And this patch fixes exactly the problem you stated.



wangyunjian via dev  于2022年1月8日周六 17:18写道:

> Friendly ping.
>
> > -Original Message-
> > From: wangyunjian
> > Sent: Friday, December 3, 2021 7:25 PM
> > To: d...@openvswitch.org; i.maxim...@ovn.org
> > Cc: dingxiaoxiong ; xudingke
> > ; wangyunjian ; Justin
> > Pettit 
> > Subject: [ovs-dev] [PATCH] ofproto: fix use-after-free for "ofproto".
> >
> > When handler threads lookup a "ofproto" and use it, main thread maybe
> > remove and free the "ofproto" at the same time. The "ofproto" has not
> > been protected well, which can lead to an OVS crash.
> >
> > This patch fixes this by making the "ofproto" lookup RCU-safe by using
> > cmap instead of hmap and moving remove "ofproto" call before
> > xlate_txn_commit().
> >
> > (gdb) bt
> >   hmap_next (hmap.h:398)
> >   seq_wake_waiters (seq.c:326)
> >   seq_change_protected (seq.c:134)
> >   seq_change (seq.c:144)
> >   ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
> >   process_upcall (ofproto_dpif_upcall.c:1782)
> >   recv_upcalls (ofproto_dpif_upcall.c:1026)
> >   udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
> >   ovsthread_wrapper (ovs_thread.c:734)
> >
> > Cc: Justin Pettit 
> > Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to
> user
> > action cookie.")
> > Signed-off-by: Yunjian Wang 
> > ---
> >  ofproto/ofproto-dpif.c | 12 ++--
> >  ofproto/ofproto-dpif.h |  2 +-
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> > index cba49a99e..aa8d32f81 100644
> > --- a/ofproto/ofproto-dpif.c
> > +++ b/ofproto/ofproto-dpif.c
> > @@ -216,8 +216,7 @@ static struct hmap all_ofproto_dpifs_by_name =
> >
> > HMAP_INITIALIZER(_ofproto_dpifs_by_name);
> >
> >  /* All existing ofproto_dpif instances, indexed by ->uuid. */
> > -static struct hmap all_ofproto_dpifs_by_uuid =
> > -
> > HMAP_INITIALIZER(_ofproto_dpifs_by_uuid);
> > +static struct cmap all_ofproto_dpifs_by_uuid = CMAP_INITIALIZER;
> >
> >  static bool ofproto_use_tnl_push_pop = true;
> >  static void ofproto_unixctl_init(void);
> > @@ -1682,7 +1681,7 @@ construct(struct ofproto *ofproto_)
> >  hmap_insert(_ofproto_dpifs_by_name,
> >  >all_ofproto_dpifs_by_name_node,
> >  hash_string(ofproto->up.name, 0));
> > -hmap_insert(_ofproto_dpifs_by_uuid,
> > +cmap_insert(_ofproto_dpifs_by_uuid,
> >  >all_ofproto_dpifs_by_uuid_node,
> >  uuid_hash(>uuid));
> >  memset(>stats, 0, sizeof ofproto->stats);
> > @@ -1778,12 +1777,13 @@ destruct(struct ofproto *ofproto_, bool del)
> >  ofproto->backer->need_revalidate = REV_RECONFIGURE;
> >  xlate_txn_start();
> >  xlate_remove_ofproto(ofproto);
> > +cmap_remove(_ofproto_dpifs_by_uuid,
> > +>all_ofproto_dpifs_by_uuid_node,
> > +uuid_hash(>uuid));
>

I guess some comments are needed here, you actually make use of
the rcu_synchronize in the xlate_txn_commit to avoid access to the
ofproto from other thread through uuid map.


> >  xlate_txn_commit();
> >
> >  hmap_remove(_ofproto_dpifs_by_name,
> >  >all_ofproto_dpifs_by_name_node);
> > -hmap_remove(_ofproto_dpifs_by_uuid,
> > ->all_ofproto_dpifs_by_uuid_node);
> >
> >  OFPROTO_FOR_EACH_TABLE (table, >up) {
> >  CLS_FOR_EACH (rule, up.cr, >cls) {
> > @@ -5781,7 +5781,7 @@ ofproto_dpif_lookup_by_uuid(const struct uuid
> > *uuid)
> >  {
> >  struct ofproto_dpif *ofproto;
> >
> > -HMAP_FOR_EACH_WITH_HASH (ofproto,
> > all_ofproto_dpifs_by_uuid_node,
> > +CMAP_FOR_EACH_WITH_HASH (ofproto,
> > all_ofproto_dpifs_by_uuid_node,
> >   uuid_hash(uuid),
> > _ofproto_dpifs_by_uuid) {
> >  if (uuid_equals(>uuid, uuid)) {
> >  return ofproto;
> > diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
> > index 191cfcb0d..fba03f2cc 100644
> > --- a/ofproto/ofproto-dpif.h
> > +++ b/ofproto/ofproto-dpif.h
> > @@ -295,7 +295,7 @@ struct ofproto_dpif {
> >  struct hmap_node all_ofproto_dpifs_by_name_node;
> >
> >  /* In 'all_ofproto_dpifs_by_uuid'. */
> > -

Re: [ovs-dev] [PATCH] netdev-dpdk: prepare for tso offload for tx copy packet

2022-01-17 Thread 贺鹏
Hi, Harold,

I am curious how this bug is found as I have an unsolved bug that I believe
it's related to
some memory issues and may be related to NIC's problem.


Harold Huang  于2022年1月13日周四 16:24写道:

> From: Harold Huang 
>
> When one flow is output to multiple egress ports, OVS copy the packets
> and send the copy packets to the intermediate ports. The original packets
> is sent to the last port. If the intermediate port is a dpdk port, the copy
> packets should also be prepared for tso offload.
>
> Fixes: 29cf9c1b3b ("userspace: Add TCP Segmentation Offload support")
> Signed-off-by: Harold Huang 
> ---
>  lib/netdev-dpdk.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6782d3e8f..83029405e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2737,10 +2737,11 @@ dpdk_pktmbuf_alloc(struct rte_mempool *mp,
> uint32_t data_len)
>  }
>
>  static struct dp_packet *
> -dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet
> *pkt_orig)
> +dpdk_copy_dp_packet_to_mbuf(struct netdev_dpdk *dev, struct dp_packet
> *pkt_orig)
>  {
>  struct rte_mbuf *mbuf_dest;
>  struct dp_packet *pkt_dest;
> +struct rte_mempool *mp = dev->dpdk_mp->mp;
>  uint32_t pkt_len;
>
>  pkt_len = dp_packet_size(pkt_orig);
> @@ -2761,11 +2762,9 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp,
> struct dp_packet *pkt_orig)
>  memcpy(_dest->l2_pad_size, _orig->l2_pad_size,
> sizeof(struct dp_packet) - offsetof(struct dp_packet,
> l2_pad_size));
>
> -if (mbuf_dest->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
> -mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest)
> -- (char *)dp_packet_eth(pkt_dest);
> -mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest)
> -- (char *) dp_packet_l3(pkt_dest);
> +if (!netdev_dpdk_prep_hwol_packet(dev, mbuf_dest)) {
>

perhaps check userspace_tso_enabled()?


> +rte_pktmbuf_free(mbuf_dest);
> +return NULL;
>  }
>
>  return pkt_dest;
> @@ -2813,7 +2812,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid,
> struct dp_packet_batch *batch)
>  continue;
>  }
>
> -pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp,
> packet);
> +pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev, packet);
>  if (OVS_UNLIKELY(!pkts[txcnt])) {
>  dropped = cnt - i;
>  break;
> --
> 2.27.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


-- 
hepeng
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy get/set cmd for netdev datapath

2022-01-17 Thread wenxu








From: Aaron Conole 
Date: 2022-01-17 23:51:55
To:  we...@ucloud.cn
Cc:  i.maxim...@ovn.org,pvale...@redhat.com,d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy 
get/set cmd for netdev datapath>Aaron Conole  writes:
>
>> we...@ucloud.cn writes:
>>
>>> From: wenxu 
>>>
>>> Now, the default timeout policy for netdev datapath is hard codeing. In
>>> some case show or modify is needed.
>>> Add command for get/set default timeout policy. Using like this:
>>>
>>> ovs-appctl dpctl/ct-get-default-tp [dp]
>>> ovs-appctl dpctl/ct-set-default-tp [dp] policies
>>>
>>> Signed-off-by: wenxu 
>>> ---
>>
>> So far looks good.  Just one minor comment (and one conflict) that might be 
>> able to be addressed
>
>After over an hour of loops, I did manage to produce one failure with
>your test case:
>
>--- -  2022-01-17 10:31:12.740151293 -0500
>+++ 
>/home/aconole/git/ovs2/ovs/tests/system-userspace-testsuite.dir/at-groups/94/stdout
>2022-01-17 10:31:12.739245956 -0500
>@@ -1,3 +1,2 @@
>-icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
> udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5
>
>
>I think the test suite is "stable" but it might be better to use
>something like 'ovs-ofctl monitor' to watch for events.  Maybe you have
>tried this already?
>
>The errant 'sleep's can result in failures like this.
This ‘sleep’ is just following the testcase [conntrack - zone-based timeout 
policy].  Maybe it is enough?

>
>>>  NEWS |  4 +++
>>>  lib/conntrack-tp.c   | 11 +++
>>>  lib/conntrack-tp.h   |  2 ++
>>>  lib/ct-dpif.c| 56 
>>>  lib/ct-dpif.h|  9 ++
>>>  lib/dpctl.c  | 69 
>>> 
>>>  lib/dpif-netdev.c| 25 +++
>>>  lib/dpif-netlink.c   |  2 ++
>>>  lib/dpif-provider.h  |  8 +
>>>  tests/system-kmod-macros.at  | 10 ++
>>>  tests/system-traffic.at  | 67 
>>> ++
>>>  tests/system-userspace-macros.at |  7 
>>>  12 files changed, 270 insertions(+)
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 553a57d..f9fc04a 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -39,6 +39,10 @@ Post-v2.16.0
>>> is mainly for the benefit of OVN load balancing configurations.
>>> - Ingress policing on Linux now uses 'matchall' classifier instead of
>>>   'basic', if available.
>>> +   - ovs-appctl dpctl/:
>>> + * New commands 'ct-set-default-tp' and
>>> +   'ct-set-default-tp' that allows to get or configure
>>> +   netdev datapath ct default timeout policy.
>>
>> I had a conflict with this section on apply.  It isn't a huge deal.
>>
>>> diff --git a/lib/conntrack-tp.h b/lib/conntrack-tp.h
>>> index 4d411d1..07dcb4e 100644
>>> --- a/lib/conntrack-tp.h
>>> +++ b/lib/conntrack-tp.h
>>> @@ -710,6 +729,43 @@ ct_dpif_free_zone_limits(struct ovs_list *zone_limits)
>>>  }
>>>  }
>>>  
>>> +
>>> +/* Parses a specification of a timeout policy from 's' into '*tp'
>>> + * .  Returns true on success.  Otherwise, returns false and
>>> + * and puts the error message in 'ds'. */
>>
>> The '.' in the above is ending the sentence on the previous line, and
>> the world 'and' appears twice in a row.
>>
>> I think it should read:
>>
>>/* Parses a specification of a timeout policy from 's' into '*tp'.
>> * Returns true on success.  Otherwise, returns false and puts the
>> * error message in 'ds'. */
>




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ofproto-dpif: fix packet_execute_prepare

2022-01-17 Thread 0-day Robot
Bleep bloop.  Greetings 贺鹏, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Peng He 
Lines checked: 33, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 2/3] conntrack: prefer dst port range during unique tuple search

2022-01-17 Thread wenxu
From: wenxu 

This commit splits the nested loop used to search the unique ports for
the reverse tuple.
It affects only the dnat action, giving more precedence to the dnat
range, similarly to the kernel dp, instead of searching through the
default ephemeral source range for each destination port.

Signed-off-by: wenxu 
---
 lib/conntrack.c | 61 +++--
 1 file changed, 46 insertions(+), 15 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 44f99f3..dc29c9b 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2397,6 +2397,22 @@ next_addr_in_range_guarded(union ct_addr *curr, union 
ct_addr *min,
 return exhausted;
 }
 
+static bool
+nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn,
+  ovs_be16 *port, uint16_t curr, uint16_t min,
+  uint16_t max)
+{
+FOR_EACH_PORT_IN_RANGE(curr, min, max) {
+*port = htons(curr);
+if (!conn_lookup(ct, _conn->rev_key,
+ time_msec(), NULL, NULL)) {
+return true;
+}
+}
+
+return false;
+}
+
 /* This function tries to get a unique tuple.
  * Every iteration checks that the reverse tuple doesn't
  * collide with any existing one.
@@ -2411,9 +2427,11 @@ next_addr_in_range_guarded(union ct_addr *curr, union 
ct_addr *min,
  *
  * In case of DNAT:
  *- For each dst IP address in the range (if any).
- *- For each dport in range (if any).
- * - Try to find a source port in the ephemeral range
- *   (after testing the port used by the sender).
+ *- For each dport in range (if any) tries to find
+ *  an unique tuple.
+ *- Eventually, if the previous attempt fails,
+ *  tries to find a source port in the ephemeral
+ *  range (after testing the port used by the sender).
  *
  * If none can be found, return exhaustion to the caller. */
 static bool
@@ -2424,10 +2442,10 @@ nat_get_unique_tuple(struct conntrack *ct, const struct 
conn *conn,
 union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0},
   guard_addr = {0};
 uint32_t hash = nat_range_hash(conn, ct->hash_basis, nat_info);
+uint16_t min_sport, max_sport, curr_sport, orig_sport;
+uint16_t min_dport, max_dport, curr_dport, orig_dport;
 bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
  conn->key.nw_proto == IPPROTO_UDP;
-uint16_t min_dport, max_dport, curr_dport;
-uint16_t min_sport, max_sport, curr_sport;
 
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
@@ -2439,9 +2457,9 @@ nat_get_unique_tuple(struct conntrack *ct, const struct 
conn *conn,
  * we can stop once we reach it. */
 guard_addr = curr_addr;
 
-set_sport_range(nat_info, >key, hash, _sport,
+set_sport_range(nat_info, >key, hash, _sport,
 _sport, _sport);
-set_dport_range(nat_info, >key, hash, _dport,
+set_dport_range(nat_info, >key, hash, _dport,
 _dport, _dport);
 
 another_round:
@@ -2457,17 +2475,30 @@ another_round:
 goto next_addr;
 }
 
-FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
-nat_conn->rev_key.src.port = htons(curr_dport);
-FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
-nat_conn->rev_key.dst.port = htons(curr_sport);
-if (!conn_lookup(ct, _conn->rev_key,
- time_msec(), NULL, NULL)) {
-return true;
-}
+curr_sport = orig_sport;
+curr_dport = orig_dport;
+
+nat_conn->rev_key.src.port = htons(curr_dport);
+nat_conn->rev_key.dst.port = htons(curr_sport);
+
+bool found = false;
+if (nat_info->nat_action & NAT_ACTION_DST_PORT) {
+found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.src.port,
+  curr_dport, min_dport, max_dport);
+if (!found) {
+nat_conn->rev_key.src.port = htons(orig_dport);
 }
 }
 
+if (!found) {
+found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.dst.port,
+  curr_sport, min_sport, max_sport);
+}
+
+if (found) {
+return true;
+}
+
 /* Check if next IP is in range and respin. Otherwise, notify
  * exhaustion to the caller. */
 next_addr:
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 3/3] conntrack: limit port clash resolution attempts

2022-01-17 Thread wenxu
From: wenxu 

In case almost or all available ports are taken, clash resolution can
take a very long time, resulting in pmd hang in conntrack.

This can happen when many to-be-natted hosts connect to same
destination:port (e.g. a proxy) and all connections pass the same SNAT.

Pick a random offset in the acceptable range, then try ever smaller
number of adjacent port numbers, until either the limit is reached or a
useable port was found.  This results in at most 248 attempts
(128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
instead of 64000+.

And if thenumber of ip address will limit the max attempts and which
will lead the total attempts under 248.

Signed-off-by: wenxu 
---
 lib/conntrack.c | 71 ++---
 1 file changed, 63 insertions(+), 8 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index dc29c9b..248cd45 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2400,19 +2400,50 @@ next_addr_in_range_guarded(union ct_addr *curr, union 
ct_addr *min,
 static bool
 nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn,
   ovs_be16 *port, uint16_t curr, uint16_t min,
-  uint16_t max)
+  uint16_t max, unsigned int attempts)
 {
+unsigned int i = 0;
+
 FOR_EACH_PORT_IN_RANGE(curr, min, max) {
-*port = htons(curr);
-if (!conn_lookup(ct, _conn->rev_key,
- time_msec(), NULL, NULL)) {
-return true;
+if (i++ < attempts) {
+*port = htons(curr);
+if (!conn_lookup(ct, _conn->rev_key,
+ time_msec(), NULL, NULL)) {
+return true;
+}
+} else {
+break;
 }
 }
 
 return false;
 }
 
+static void
+nat_get_attempts_range(union ct_addr *min_addr, union ct_addr *max_addr,
+   uint16_t dl_type, unsigned int *max_attempts,
+   unsigned int *min_attempts)
+{
+uint32_t range_addr;
+
+if (dl_type == htons(ETH_TYPE_IP)) {
+range_addr = ntohl(max_addr->ipv4) - ntohl(min_addr->ipv4) + 1;
+} else {
+range_addr = nat_ipv6_addrs_delta(_addr->ipv6,
+  _addr->ipv6) + 1;
+}
+
+*max_attempts = 128 / range_addr;
+if (*max_attempts < 1) {
+*max_attempts = 1;
+}
+
+*min_attempts = 16 / range_addr;
+if (*min_attempts < 2) {
+*min_attempts = 2;
+}
+}
+
 /* This function tries to get a unique tuple.
  * Every iteration checks that the reverse tuple doesn't
  * collide with any existing one.
@@ -2446,6 +2477,9 @@ nat_get_unique_tuple(struct conntrack *ct, const struct 
conn *conn,
 uint16_t min_dport, max_dport, curr_dport, orig_dport;
 bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
  conn->key.nw_proto == IPPROTO_UDP;
+unsigned int attempts, max_attempts, min_attempts;
+uint16_t range_src, range_dst, range_max;
+bool found;
 
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
@@ -2462,6 +2496,13 @@ nat_get_unique_tuple(struct conntrack *ct, const struct 
conn *conn,
 set_dport_range(nat_info, >key, hash, _dport,
 _dport, _dport);
 
+range_src = max_sport - min_sport + 1;
+range_dst = max_dport - min_dport + 1;
+range_max = range_src > range_dst ? range_src : range_dst;
+
+nat_get_attempts_range(_addr, _addr, conn->key.dl_type,
+   _attempts, _attempts);
+
 another_round:
 store_addr_to_key(_addr, _conn->rev_key,
   nat_info->nat_action);
@@ -2481,10 +2522,16 @@ another_round:
 nat_conn->rev_key.src.port = htons(curr_dport);
 nat_conn->rev_key.dst.port = htons(curr_sport);
 
-bool found = false;
+attempts = range_max;
+if (attempts > max_attempts) {
+attempts = max_attempts;
+}
+
+another_port_round:
+found = false;
 if (nat_info->nat_action & NAT_ACTION_DST_PORT) {
 found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.src.port,
-  curr_dport, min_dport, max_dport);
+  curr_dport, min_dport, max_dport, attempts);
 if (!found) {
 nat_conn->rev_key.src.port = htons(orig_dport);
 }
@@ -2492,13 +2539,21 @@ another_round:
 
 if (!found) {
 found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.dst.port,
-  curr_sport, min_sport, max_sport);
+  curr_sport, min_sport, max_sport, attempts);
 }
 
 if (found) {
 return true;
 }
 
+if (attempts < range_max && attempts >= min_attempts) {
+attempts /= 2;
+curr_dport = min_dport + (random_uint32() % range_dst);
+curr_sport = min_sport + (random_uint32() % range_src);
+
+goto another_port_round;
+}
+
 /* Check if next IP 

[ovs-dev] [PATCH v10 1/3] conntrack: select correct sport range for well-known origin sport

2022-01-17 Thread wenxu
From: wenxu 

Like the kernel datapath. The sport nat range for well-konwn origin
sport should limit in the well-known ports.

Signed-off-by: wenxu 
Acked-by: Paolo Valerio 
---
 lib/conntrack.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 33a1a92..44f99f3 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2265,8 +2265,16 @@ set_sport_range(const struct nat_action_info_t *ni, 
const struct conn_key *k,
 if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) ||
 ((ni->nat_action & NAT_ACTION_DST))) {
 *curr = ntohs(k->src.port);
-*min = MIN_NAT_EPHEMERAL_PORT;
-*max = MAX_NAT_EPHEMERAL_PORT;
+if (*curr < 512) {
+*min = 1;
+*max = 511;
+} else if (*curr < 1024) {
+*min = 600;
+*max = 1023;
+} else {
+*min = MIN_NAT_EPHEMERAL_PORT;
+*max = MAX_NAT_EPHEMERAL_PORT;
+}
 } else {
 *min = ni->min_port;
 *max = ni->max_port;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev v8 1/2] ipf: add ipf context

2022-01-17 Thread 贺鹏
Hi,

this patch can pass all the system-userspace testsuite if one of my another
patch
gets merged.

So maybe review this patch "[ovs-dev] ofproto-dpif: fix
packet_execute_prepare" first.


Peng He  于2022年1月18日周二 11:28写道:

> From: Peng He 
>
> ipf_postprocess will emit packets into the datapath pipeline ignoring
> the conntrack context, this might casuse weird issues when a packet
> batch has less space to hold all the fragments belonging to single
> packet.
>
> Given the below ruleest and consider sending a 64K ICMP packet which
> is splitted into 64 fragments.
>
> priority=1,action=drop
> priority=10,arp,action=normal
> priority=100,in_port=1,ct_state=-trk,icmp,action=ct(zone=9,table=0)
> priority=100,in_port=1,ct_state=+new+trk,icmp,action=ct(zone=9,commit),2
> priority=100,in_port=1,ct_state=-new+est+trk,icmp,action=2
> priority=100,in_port=2,ct_state=-trk,icmp,action=ct(table=0,zone=9)
> priority=100,in_port=2,ct_state=+trk+est-new,icmp,action=1
>
> Batch 1:
> the first 32 packets will be buffered in the ipf preprocessing, nothing
> more proceeds.
>
> Batch 2:
> the second 32 packets succeed the fragment reassembly and goes to ct
> and ipf_post will emits the first 32 packets due to the limit of batch
> size.
>
> the first 32 packets goes to the datapath again due to the
> recirculation, and again buffered at ipf preprocessing before ct commit,
> then the ovs tries to call ct commit as well as ipf_postprocessing which
> emits
> the last 32 packets, in this case the last 32 packets will follow
> the current actions which will be sent to port 2 directly without
> recirculation and going to ipf preprocssing and ct commit again.
>
> This will cause the first 32 packets never get the chance to
> reassemble and evevntually this large ICMP packets fail to transmit.
>
> this patch fixes this issue by adding firstly ipf context to avoid
> ipf_postprocessing emits packets in the wrong context. Then by
> re-executing the action list again to emit the last 32 packets
> in the right context to correctly transmitting multiple fragments.
>
> This might be one of the root cause why the OVS log warns:
>
> "
> skipping output to input port on bridge br-int while processing
> "
>
> As a pkt might enter into different ipf context.
>
> One implementation trick of implementing ipf_ctx_eq, is to use ipf_list's
> key instead of the first frag's metadata, this can reduce quite a lot of
> cache misses as at the time of calling ipf_ctx_eq, ipf_list is cache-hot.
> On x86_64, this trick saves 2% of CPU usage of ipf processing.
>
> Signed-off-by: Peng He 
> Co-authored-by: Mike Pattrick 
> ---
>  lib/conntrack.c |  9 ---
>  lib/conntrack.h | 15 ++--
>  lib/dpif-netdev.c   | 52 +
>  lib/ipf.c   | 39 ---
>  lib/ipf.h   | 11 +++--
>  tests/system-traffic.at | 33 ++
>  tests/test-conntrack.c  |  6 ++---
>  7 files changed, 135 insertions(+), 30 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 33a1a9295..72999f1ae 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1434,14 +1434,14 @@ process_one(struct conntrack *ct, struct dp_packet
> *pkt,
>   * connection tables.  'setmark', if not NULL, should point to a two
>   * elements array containing a value and a mask to set the connection
> mark.
>   * 'setlabel' behaves similarly for the connection label.*/
> -int
> +bool
>  conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
>ovs_be16 dl_type, bool force, bool commit, uint16_t
> zone,
>const uint32_t *setmark,
>const struct ovs_key_ct_labels *setlabel,
>ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper,
>const struct nat_action_info_t *nat_action_info,
> -  long long now, uint32_t tp_id)
> +  long long now, uint32_t tp_id, struct ipf_ctx *ipf_ctx)
>  {
>  ipf_preprocess_conntrack(ct->ipf, pkt_batch, now, dl_type, zone,
>   ct->hash_basis);
> @@ -1468,9 +1468,8 @@ conntrack_execute(struct conntrack *ct, struct
> dp_packet_batch *pkt_batch,
>  }
>  }
>
> -ipf_postprocess_conntrack(ct->ipf, pkt_batch, now, dl_type);
> -
> -return 0;
> +return ipf_postprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, \
> + now, dl_type);
>  }
>
>  void
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index 9553b188a..211efad3f 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -64,6 +64,7 @@
>  struct dp_packet_batch;
>
>  struct conntrack;
> +struct ipf_ctx;
>
>  union ct_addr {
>  ovs_be32 ipv4;
> @@ -88,13 +89,13 @@ struct nat_action_info_t {
>  struct conntrack *conntrack_init(void);
>  void conntrack_destroy(struct conntrack *);
>
> -int conntrack_execute(struct conntrack *ct, struct dp_packet_batch
> *pkt_batch,

[ovs-dev] ofproto-dpif: fix packet_execute_prepare

2022-01-17 Thread Peng He
In the commit 1df7f7aac8c976690167261fec9a50d832ef9e7e, the packet
metadata's in_port has been changed from ofp port into odp port,
however, the odp->flow->in_port is still ofp_port. This patch changes
the odp->flow->in_port into odp_port.

Fixes: 1df7f7aac8 ("ofproto-dpif: Restore packet metadata when a
continuation is resumed.")
Signed-off-by: Peng He 
---
 ofproto/ofproto-dpif.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index cba49a99e..72de67c63 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4973,6 +4973,7 @@ packet_execute_prepare(struct ofproto *ofproto_,
 ofproto_dpif_set_packet_odp_port(ofproto, opo->flow->in_port.ofp_port,
  opo->packet);
 
+opo->flow->in_port = opo->packet->md.in_port;
 ofproto_dpif_packet_out_delete(aux);
 opo->aux = execute;
 }
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev v8 2/2] ipf: ipf_preprocess_conntrack should also consider ipf_ctx

2022-01-17 Thread 0-day Robot
Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Peng He 
Lines checked: 91, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev v8 1/2] ipf: add ipf context

2022-01-17 Thread 0-day Robot
Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He  needs to sign off.
ERROR: Co-author Mike Pattrick  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Peng He 
Lines checked: 421, Warnings: 1, Errors: 2


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev v8 2/2] ipf: ipf_preprocess_conntrack should also consider ipf_ctx

2022-01-17 Thread Peng He
considering a multi-thread PMD setting, when the frags are reassembled
in one PMD, another thread might call *ipf_execute_reass_pkts* and 'steal'
the reassembled packets into its ipf ctx, then this reassembled
packet will enter into another ipf context and causes errors.

This happends when there are multiple CT zones, and frags are
reassembled in ct(zone=X) might be 'stealed' into the ct(zone=Y).

Signed-off-by: Peng He 
---
 lib/conntrack.c |  2 +-
 lib/ipf.c   | 10 --
 lib/ipf.h   |  1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 72999f1ae..aafa6b536 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1443,7 +1443,7 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
   const struct nat_action_info_t *nat_action_info,
   long long now, uint32_t tp_id, struct ipf_ctx *ipf_ctx)
 {
-ipf_preprocess_conntrack(ct->ipf, pkt_batch, now, dl_type, zone,
+ipf_preprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, now, dl_type, zone,
  ct->hash_basis);
 
 struct dp_packet *packet;
diff --git a/lib/ipf.c b/lib/ipf.c
index 2ff04c615..031164bb1 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -1135,7 +1135,8 @@ ipf_send_expired_frags(struct ipf *ipf, struct 
dp_packet_batch *pb,
 /* Adds a reassmebled packet to a packet batch to be processed by the caller.
  */
 static void
-ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb)
+ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb,
+   struct ipf_ctx *ctx)
 {
 if (ovs_list_is_empty(>reassembled_pkt_list)) {
 return;
@@ -1145,6 +1146,10 @@ ipf_execute_reass_pkts(struct ipf *ipf, struct 
dp_packet_batch *pb)
 struct reassembled_pkt *rp, *next;
 
 LIST_FOR_EACH_SAFE (rp, next, rp_list_node, >reassembled_pkt_list) {
+if (ctx && !ipf_ctx_eq(rp->list, ctx, rp->pkt)) {
+continue;
+}
+
 if (!rp->list->reass_execute_ctx &&
 ipf_dp_packet_batch_add(pb, rp->pkt, false)) {
 rp->list->reass_execute_ctx = rp->pkt;
@@ -1246,6 +1251,7 @@ ipf_post_execute_reass_pkts(struct ipf *ipf,
  * be added to the batch to be sent through conntrack. */
 void
 ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb,
+ struct ipf_ctx *ipf_ctx,
  long long now, ovs_be16 dl_type,
  uint16_t zone, uint32_t hash_basis)
 {
@@ -1254,7 +1260,7 @@ ipf_preprocess_conntrack(struct ipf *ipf, struct 
dp_packet_batch *pb,
 }
 
 if (ipf_get_enabled(ipf) || atomic_count_get(>nfrag)) {
-ipf_execute_reass_pkts(ipf, pb);
+ipf_execute_reass_pkts(ipf, pb, ipf_ctx);
 }
 }
 
diff --git a/lib/ipf.h b/lib/ipf.h
index 7bbf453af..8974f15ae 100644
--- a/lib/ipf.h
+++ b/lib/ipf.h
@@ -49,6 +49,7 @@ struct ipf_ctx {
 struct ipf *ipf_init(void);
 void ipf_destroy(struct ipf *ipf);
 void ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb,
+  struct ipf_ctx *ctx,
   long long now, ovs_be16 dl_type, uint16_t zone,
   uint32_t hash_basis);
 
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev v8 1/2] ipf: add ipf context

2022-01-17 Thread Peng He
From: Peng He 

ipf_postprocess will emit packets into the datapath pipeline ignoring
the conntrack context, this might casuse weird issues when a packet
batch has less space to hold all the fragments belonging to single
packet.

Given the below ruleest and consider sending a 64K ICMP packet which
is splitted into 64 fragments.

priority=1,action=drop
priority=10,arp,action=normal
priority=100,in_port=1,ct_state=-trk,icmp,action=ct(zone=9,table=0)
priority=100,in_port=1,ct_state=+new+trk,icmp,action=ct(zone=9,commit),2
priority=100,in_port=1,ct_state=-new+est+trk,icmp,action=2
priority=100,in_port=2,ct_state=-trk,icmp,action=ct(table=0,zone=9)
priority=100,in_port=2,ct_state=+trk+est-new,icmp,action=1

Batch 1:
the first 32 packets will be buffered in the ipf preprocessing, nothing
more proceeds.

Batch 2:
the second 32 packets succeed the fragment reassembly and goes to ct
and ipf_post will emits the first 32 packets due to the limit of batch
size.

the first 32 packets goes to the datapath again due to the
recirculation, and again buffered at ipf preprocessing before ct commit,
then the ovs tries to call ct commit as well as ipf_postprocessing which emits
the last 32 packets, in this case the last 32 packets will follow
the current actions which will be sent to port 2 directly without
recirculation and going to ipf preprocssing and ct commit again.

This will cause the first 32 packets never get the chance to
reassemble and evevntually this large ICMP packets fail to transmit.

this patch fixes this issue by adding firstly ipf context to avoid
ipf_postprocessing emits packets in the wrong context. Then by
re-executing the action list again to emit the last 32 packets
in the right context to correctly transmitting multiple fragments.

This might be one of the root cause why the OVS log warns:

"
skipping output to input port on bridge br-int while processing
"

As a pkt might enter into different ipf context.

One implementation trick of implementing ipf_ctx_eq, is to use ipf_list's
key instead of the first frag's metadata, this can reduce quite a lot of
cache misses as at the time of calling ipf_ctx_eq, ipf_list is cache-hot.
On x86_64, this trick saves 2% of CPU usage of ipf processing.

Signed-off-by: Peng He 
Co-authored-by: Mike Pattrick 
---
 lib/conntrack.c |  9 ---
 lib/conntrack.h | 15 ++--
 lib/dpif-netdev.c   | 52 +
 lib/ipf.c   | 39 ---
 lib/ipf.h   | 11 +++--
 tests/system-traffic.at | 33 ++
 tests/test-conntrack.c  |  6 ++---
 7 files changed, 135 insertions(+), 30 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 33a1a9295..72999f1ae 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1434,14 +1434,14 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
  * connection tables.  'setmark', if not NULL, should point to a two
  * elements array containing a value and a mask to set the connection mark.
  * 'setlabel' behaves similarly for the connection label.*/
-int
+bool
 conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
   ovs_be16 dl_type, bool force, bool commit, uint16_t zone,
   const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper,
   const struct nat_action_info_t *nat_action_info,
-  long long now, uint32_t tp_id)
+  long long now, uint32_t tp_id, struct ipf_ctx *ipf_ctx)
 {
 ipf_preprocess_conntrack(ct->ipf, pkt_batch, now, dl_type, zone,
  ct->hash_basis);
@@ -1468,9 +1468,8 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
 }
 }
 
-ipf_postprocess_conntrack(ct->ipf, pkt_batch, now, dl_type);
-
-return 0;
+return ipf_postprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, \
+ now, dl_type);
 }
 
 void
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 9553b188a..211efad3f 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -64,6 +64,7 @@
 struct dp_packet_batch;
 
 struct conntrack;
+struct ipf_ctx;
 
 union ct_addr {
 ovs_be32 ipv4;
@@ -88,13 +89,13 @@ struct nat_action_info_t {
 struct conntrack *conntrack_init(void);
 void conntrack_destroy(struct conntrack *);
 
-int conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
-  ovs_be16 dl_type, bool force, bool commit, uint16_t zone,
-  const uint32_t *setmark,
-  const struct ovs_key_ct_labels *setlabel,
-  ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper,
-  const struct nat_action_info_t *nat_action_info,
-  long long now, uint32_t tp_id);
+bool conntrack_execute(struct conntrack *ct, 

[ovs-dev] [PATCH ovn] northd: Fix missing space in log warnings.

2022-01-17 Thread James Troup
Add a space between 'static route' and the UUID in logs.

Signed-off-by: James Troup 
---
 northd/northd.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

e.g.

2022-01-17T19:55:27.701Z|1601438|ovn_northd|WARN|Address family doesn't match 
between 'ip_prefix' 0.0.0.0/0 and 'nexthop' 2620:2d:4000:2000::1 in static 
routefb68fe04-ca8e-4885-a01f-9e7ac6b337e5

diff --git a/northd/northd.c b/northd/northd.c
index c714227b2..22e783ff6 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8872,7 +8872,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct 
hmap *ports,
 if (valid_nexthop) {
 if (!ip46_parse_cidr(route->nexthop, , )) {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-VLOG_WARN_RL(, "bad 'nexthop' %s in static route"
+VLOG_WARN_RL(, "bad 'nexthop' %s in static route "
  UUID_FMT, route->nexthop,
  UUID_ARGS(>header_.uuid));
 return NULL;
@@ -8880,7 +8880,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct 
hmap *ports,
 if ((IN6_IS_ADDR_V4MAPPED() && plen != 32) ||
 (!IN6_IS_ADDR_V4MAPPED() && plen != 128)) {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-VLOG_WARN_RL(, "bad next hop mask %s in static route"
+VLOG_WARN_RL(, "bad next hop mask %s in static route "
  UUID_FMT, route->nexthop,
  UUID_ARGS(>header_.uuid));
 return NULL;
@@ -8891,7 +8891,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct 
hmap *ports,
 struct in6_addr prefix;
 if (!ip46_parse_cidr(route->ip_prefix, , )) {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-VLOG_WARN_RL(, "bad 'ip_prefix' %s in static route"
+VLOG_WARN_RL(, "bad 'ip_prefix' %s in static route "
  UUID_FMT, route->ip_prefix,
  UUID_ARGS(>header_.uuid));
 return NULL;
@@ -8902,7 +8902,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct 
hmap *ports,
 if (IN6_IS_ADDR_V4MAPPED() != IN6_IS_ADDR_V4MAPPED()) {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
 VLOG_WARN_RL(, "Address family doesn't match between 
'ip_prefix'"
- " %s and 'nexthop' %s in static route"UUID_FMT,
+ " %s and 'nexthop' %s in static route "UUID_FMT,
  route->ip_prefix, route->nexthop,
  UUID_ARGS(>header_.uuid));
 return NULL;
-- 
2.32.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS "soft freeze" for 2.17 is in effect.

2022-01-17 Thread Ilya Maximets
On 1/4/22 00:37, Ilya Maximets wrote:
> Hi.  As described in Documentation/internals/release-process.rst, we are
> in a "soft freeze" state:
> 
>During the freeze, we ask committers to refrain from applying patches that
>add new features unless those patches were already being publicly discussed
>and reviewed before the freeze began.  Bug fixes are welcome at any time.
>Please propose and discuss exceptions on ovs-dev.
> 
> We should branch for version 2.17 in two weeks from now, on Monday, Jan 17.

I'm a bit behind the schedule.  It's already way past the business hours for
me and I still have a few patches in the queue to apply.  So, I'm finishing my
work for today and will continue tomorrow.  Sorry for the inconvenience.

Branch creation date shifts for about a day for now.  I think, the actual
release date should not be affected by that and should still be on Feb 17,
unless there are any concerns.

Best regards, Ilya Maximets.

> With that, release should be on Thursday, Feb 17.
> 
> We do have a fair amount of new features on a mailing list which are already
> reviewed but not accepted yet.  For the next 2 weeks before branching I will
> mostly work on re-checking and applying those patches.
> 
> Best regards, Ilya Maximets.
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 0/5] adding USDT points to OVS

2022-01-17 Thread Ilya Maximets
On 12/22/21 10:15, Eelco Chaudron wrote:
> This patchset introduces User Statically Defined Trace (USDT)
> points to OVS. It included the general infrastructure,
> documentation, and some example scripts.
> 
> This idea was introduced during the OVS conference 2021, more
> information about the presentation can be found here:
> http://www.openvswitch.org/support/ovscon2021/#T1
> 
> While waiting for the official conference video post, the
> pre-recorded video can be found here: https://youtu.be/NrO0YbdTvbg
> 
> Eelco Chaudron (5):
>   configure: add --enable-usdt option to enable USDT probes
>   openvswitch: define the OVS_STATIC_TRACE() macro
>   Documentation: add USDT documentation and bpftrace example
>   utilities: add upcall USDT probe and associated script
>   utilities: add netlink flow operation USDT probes and upcall_cost script

Thanks, Eelco!  Very cool traces and scripts!

And thanks, Paolo, for review!
Applied.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 0/5] dpif-netdev: Hash-based Tx packet steering

2022-01-17 Thread Maxime Coquelin




On 1/17/22 18:44, Ilya Maximets wrote:

On 1/5/22 09:19, Maxime Coquelin wrote:

This series introduces a new hash-based Tx packets steering
mode alognside existing XPS and static modes. The goal is
to provide a mode where all the transmit queues are used,
whatever the number of PMD threads. This may be used with
Vhost-user ports, where the guest application driving the
Virtio device expects packets to be distributed on all the
queues.

As a preliminary step, in order to be able to validate the
feature at OVS level, the two first patches introduce
per-queue basic statistics for Vhost-user and dummy ports.
These patches are complementary to David's patch [0] adding
per-queue statistics to DPDK ports using xstats.

The series also introduces a dpif-netdev test for tx
steering, using dummy-pmd ports only, instead of the more
complex Vhost-user test in first revision.

Changes in v5:
==
- Fix indents (David)
- Fix/remove comments (David)
- Fix grammar in documentation (David)
- Rebase and applied David's R-by's

Maxime Coquelin (5):
   netdev-dpdk: Introduce per rxq/txq Vhost-user statistics.
   netdev-dummy: Introduce per rxq/txq statistics.
   dpif-netdev: Introduce Tx queue mode.
   dpif-netdev: Introduce hash-based Tx packet steering mode.
   dpif-netdev.at: Add test for Tx packets steering.

  Documentation/automake.mk |   1 +
  Documentation/topics/index.rst|   1 +
  .../topics/userspace-tx-steering.rst  |  73 +
  lib/dpif-netdev.c | 142 ++---
  lib/netdev-dpdk.c | 147 --
  lib/netdev-dummy.c|  87 +--
  tests/dpif-netdev.at  |  67 
  tests/ofproto.at  |   3 +-
  8 files changed, 477 insertions(+), 44 deletions(-)
  create mode 100644 Documentation/topics/userspace-tx-steering.rst



Thanks, Maxime and David!

With some cosmetic changes I applied the series, except for the
first patch (vhost-user statistics), because it causes performance
degradation for me.  I'll reply with details to that patch.


Thanks, I agree with skipping the Vhost-user stats patch for now.


And I just noticed that we're missing the documentation update
for vswitchd/vswitch.xml.  Could you, please, update it with
a new tx-steering option and send as a separate patch?


I just sent the documentation patch adding the new PMD option.

Regards,
Maxime


Best regards, Ilya Maximets.



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] vswitchd.xml: Add missing tx-steering PMD option.

2022-01-17 Thread Maxime Coquelin
This patch documents PMD's other_config:tx-steering option.

Signed-off-by: Maxime Coquelin 
---
 vswitchd/vswitch.xml | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 026b5e2ca..ef7f8f2c8 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3375,6 +3375,28 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch 
options:peer=p1 \
 
 This option may only be used with dpdk VF representors.
   
+
+  
+
+  Specifies the Tx steering mode for the interface.
+
+
+  thread enables static txq mapping when the number of txq
+  is greater or equal than the number of PMD threads, and XPS mode if
+  lower than the number of PMD threads.
+
+
+  hash enables hash-based Tx steering, which distributes
+  the packets on all the transmit queues based on their 5-tuples
+  hashes.
+
+
+  Defaults to thread.
+
+  
+
 
 
 
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v9 3/3] conntrack: limit port clash resolution attempts

2022-01-17 Thread Paolo Valerio
Hi wenxu,

I think we're almost there.
There are some improvements still possible, but could be included in
later patches.
Theoretically, we should perform the first lookup with the orig_ports
and then start with random sub ranges. The current patch doesn't do
that, but we can leave this as a further improvement.

we...@ucloud.cn writes:

> From: wenxu 
>
> In case almost or all available ports are taken, clash resolution can
> take a very long time, resulting in pmd hang in conntrack.
>
> This can happen when many to-be-natted hosts connect to same
> destination:port (e.g. a proxy) and all connections pass the same SNAT.
>
> Pick a random offset in the acceptable range, then try ever smaller
> number of adjacent port numbers, until either the limit is reached or a
> useable port was found.  This results in at most 248 attempts
> (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
> instead of 64000+.
>
> And if thenumber of ip address will limit the max attempts and which
> will lead the total attempts under 248.
>
> Signed-off-by: wenxu 
> ---
>  lib/conntrack.c | 60 
> ++---
>  1 file changed, 53 insertions(+), 7 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index dc29c9b..4079a35 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2400,13 +2400,18 @@ next_addr_in_range_guarded(union ct_addr *curr, union 
> ct_addr *min,
>  static bool
>  nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn,
>ovs_be16 *port, uint16_t curr, uint16_t min,
> -  uint16_t max)
> +  uint16_t max, unsigned int *round,
> +  unsigned int attempts)
>  {
>  FOR_EACH_PORT_IN_RANGE(curr, min, max) {
> -*port = htons(curr);
> -if (!conn_lookup(ct, _conn->rev_key,
> - time_msec(), NULL, NULL)) {
> -return true;
> +if ((*round)++ < attempts) {
> +*port = htons(curr);
> +if (!conn_lookup(ct, _conn->rev_key,
> + time_msec(), NULL, NULL)) {
> +return true;
> +}
> +} else {
> +break;
>  }
>  }
>  
> @@ -2446,6 +2451,10 @@ nat_get_unique_tuple(struct conntrack *ct, const 
> struct conn *conn,
>  uint16_t min_dport, max_dport, curr_dport, orig_dport;
>  bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
>   conn->key.nw_proto == IPPROTO_UDP;
> +unsigned int attempts, max_attempts, min_attempts;
> +uint16_t range_src, range_dst, range_max;
> +uint32_t range_addr;
> +unsigned int i;
>  
>  min_addr = nat_info->min_addr;
>  max_addr = nat_info->max_addr;
> @@ -2462,6 +2471,24 @@ nat_get_unique_tuple(struct conntrack *ct, const 
> struct conn *conn,
>  set_dport_range(nat_info, >key, hash, _dport,
>  _dport, _dport);
>  
> +range_src = max_sport - min_sport + 1;
> +range_dst = max_dport - min_dport + 1;
> +range_max = range_src > range_dst ? range_src : range_dst;
> +if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
> +range_addr = ntohl(max_addr.ipv4) - ntohl(min_addr.ipv4) + 1;
> +} else {
> +range_addr = nat_ipv6_addrs_delta(_info->min_addr.ipv6,
> +  _info->max_addr.ipv6) + 1;
> +}
> +max_attempts = 128 / range_addr;
> +if (max_attempts < 1) {
> +max_attempts = 1;
> +}
> +min_attempts = 16 / range_addr;
> +if (min_attempts < 2) {
> +min_attempts = 2;
> +}
> +

can you put the code of this hunk in a function? I think it could
improve a bit the readability.
Please use min_addr and max_addr instead of nat_info when calculating
the addr ranges.

>  another_round:
>  store_addr_to_key(_addr, _conn->rev_key,
>nat_info->nat_action);
> @@ -2481,24 +2508,43 @@ another_round:
>  nat_conn->rev_key.src.port = htons(curr_dport);
>  nat_conn->rev_key.dst.port = htons(curr_sport);
>  
> +attempts = range_max;
> +if (attempts > max_attempts) {
> +attempts = max_attempts;
> +}
> +
> +another_port_round:
> +i = 0;
> +
>  bool found = false;
>  if (nat_info->nat_action & NAT_ACTION_DST_PORT) {
>  found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.src.port,
> -  curr_dport, min_dport, max_dport);
> +  curr_dport, min_dport, max_dport, ,

is there any reason you pass  instead of using a local variable in
nat_get_unique_l4()?

> +  attempts);
>  if (!found) {
> +i = 0;
>  nat_conn->rev_key.src.port = htons(orig_dport);
>  }
>  }
>  
>  if (!found) {
>  found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.dst.port,
> -  curr_sport, min_sport, max_sport);
> + 

Re: [ovs-dev] [PATCH v5 1/5] netdev-dpdk: Introduce per rxq/txq Vhost-user statistics.

2022-01-17 Thread Ilya Maximets
On 1/5/22 09:19, Maxime Coquelin wrote:
> Hash-based Tx steering feature will enable steering Tx
> packets on transmit queues based on their hashes. In order
> to test the feature, it is needed to be able to get the
> per-queue statistics for Vhost-user ports.
> 
> This patch introduces "bytes", "packets" and "error"
> per-queue custom statistics for Vhost-user ports.
> 
> Suggested-by David Marchand 
> Signed-off-by: Maxime Coquelin 
> Reviewed-by: David Marchand 
> ---
>  lib/netdev-dpdk.c | 147 +++---
>  1 file changed, 138 insertions(+), 9 deletions(-)

Hi, Maxime.

Thanks for the patch; and I really think that it's an important
feature for debugging performance issues in a real-world setups.

However, it causes a performance drop by about 2-2.5% for me
with the VM-VM bidirectional traffic with 2 PMD threads.

The reason is the existing stats_lock.  Unfortunately, in current
code, we're taking the same stats_lock on both rx and tx paths,
and since rx and tx are likely performed by different threads at
the same time, they are frequently locking each other.

Under this circumstances even a slight increase of a critical
section causes a visible performance drop.

One of the possible solutions might be to split the stats_lock
in two (one for rx stats and one for tx stats).  We also should
split or re-align to different cache lines rx and tx fields
of the generic struct netdev_stats, or count all the stats on the
per-queue basis.
Quick prototype of such a solution gives an extra 2-3% performance
boost over the current master and reduces the impact of extra
stats in this patch to a minimum.

I'll polish and submit my prototype code sometime later.
For now, I think, we won't be able to accept this change for 2.17,
since some more development is needed to avoid regression.

There is also a memory leak in this code, but that can be easily
fixed:

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0e1efefe3..f8680a058 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1549,6 +1549,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
 dev->vhost_id = NULL;
 rte_free(dev->vhost_rxq_enabled);
 
+free(dev->vhost_rxq_stats);
+free(dev->vhost_txq_stats);
+
 common_destruct(dev);
 
 ovs_mutex_unlock(_mutex);
---

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 0/5] dpif-netdev: Hash-based Tx packet steering

2022-01-17 Thread Ilya Maximets
On 1/5/22 09:19, Maxime Coquelin wrote:
> This series introduces a new hash-based Tx packets steering
> mode alognside existing XPS and static modes. The goal is
> to provide a mode where all the transmit queues are used,
> whatever the number of PMD threads. This may be used with
> Vhost-user ports, where the guest application driving the
> Virtio device expects packets to be distributed on all the
> queues.
> 
> As a preliminary step, in order to be able to validate the
> feature at OVS level, the two first patches introduce
> per-queue basic statistics for Vhost-user and dummy ports.
> These patches are complementary to David's patch [0] adding
> per-queue statistics to DPDK ports using xstats.
> 
> The series also introduces a dpif-netdev test for tx
> steering, using dummy-pmd ports only, instead of the more
> complex Vhost-user test in first revision.
> 
> Changes in v5:
> ==
> - Fix indents (David)
> - Fix/remove comments (David)
> - Fix grammar in documentation (David)
> - Rebase and applied David's R-by's
> 
> Maxime Coquelin (5):
>   netdev-dpdk: Introduce per rxq/txq Vhost-user statistics.
>   netdev-dummy: Introduce per rxq/txq statistics.
>   dpif-netdev: Introduce Tx queue mode.
>   dpif-netdev: Introduce hash-based Tx packet steering mode.
>   dpif-netdev.at: Add test for Tx packets steering.
> 
>  Documentation/automake.mk |   1 +
>  Documentation/topics/index.rst|   1 +
>  .../topics/userspace-tx-steering.rst  |  73 +
>  lib/dpif-netdev.c | 142 ++---
>  lib/netdev-dpdk.c | 147 --
>  lib/netdev-dummy.c|  87 +--
>  tests/dpif-netdev.at  |  67 
>  tests/ofproto.at  |   3 +-
>  8 files changed, 477 insertions(+), 44 deletions(-)
>  create mode 100644 Documentation/topics/userspace-tx-steering.rst
> 

Thanks, Maxime and David!

With some cosmetic changes I applied the series, except for the
first patch (vhost-user statistics), because it causes performance
degradation for me.  I'll reply with details to that patch.

And I just noticed that we're missing the documentation update
for vswitchd/vswitch.xml.  Could you, please, update it with
a new tx-steering option and send as a separate patch?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 ovn 0/3] northd: fix IPv6-PD with northd IP rework

2022-01-17 Thread Numan Siddique
On Fri, Jan 14, 2022 at 1:17 PM Mark Michelson  wrote:
>
> Thanks, Lorenzo.
>
> Acked-by: Mark Michelson 

Thanks Lorenzo and Mark.

I applied the patch series to the main branch and branch-21.12

Numan

>
> On 1/13/22 19:12, Lorenzo Bianconi wrote:
> > Changes since v1:
> > - fix dhcpd issue on ubuntu env
> > - enable IPv6 PD testing by default in gh workload
> >
> > Lorenzo Bianconi (3):
> >northd: fix IPv6-PD with northd IP rework
> >test: replace dibbler with dhcpd
> >Add isc-dhcp-server to github workload
> >
> >   .github/workflows/test.yml |  2 +-
> >   northd/ovn-northd.c|  3 ++-
> >   tests/atlocal.in   |  4 ++--
> >   tests/system-ovn.at| 39 --
> >   4 files changed, 21 insertions(+), 27 deletions(-)
> >
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy get/set cmd for netdev datapath

2022-01-17 Thread Aaron Conole
Aaron Conole  writes:

> we...@ucloud.cn writes:
>
>> From: wenxu 
>>
>> Now, the default timeout policy for netdev datapath is hard codeing. In
>> some case show or modify is needed.
>> Add command for get/set default timeout policy. Using like this:
>>
>> ovs-appctl dpctl/ct-get-default-tp [dp]
>> ovs-appctl dpctl/ct-set-default-tp [dp] policies
>>
>> Signed-off-by: wenxu 
>> ---
>
> So far looks good.  Just one minor comment (and one conflict) that might be 
> able to be addressed

After over an hour of loops, I did manage to produce one failure with
your test case:

--- -   2022-01-17 10:31:12.740151293 -0500
+++ 
/home/aconole/git/ovs2/ovs/tests/system-userspace-testsuite.dir/at-groups/94/stdout
 2022-01-17 10:31:12.739245956 -0500
@@ -1,3 +1,2 @@
-icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
 
udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5


I think the test suite is "stable" but it might be better to use
something like 'ovs-ofctl monitor' to watch for events.  Maybe you have
tried this already?

The errant 'sleep's can result in failures like this.

WDYT?

>>  NEWS |  4 +++
>>  lib/conntrack-tp.c   | 11 +++
>>  lib/conntrack-tp.h   |  2 ++
>>  lib/ct-dpif.c| 56 
>>  lib/ct-dpif.h|  9 ++
>>  lib/dpctl.c  | 69 
>> 
>>  lib/dpif-netdev.c| 25 +++
>>  lib/dpif-netlink.c   |  2 ++
>>  lib/dpif-provider.h  |  8 +
>>  tests/system-kmod-macros.at  | 10 ++
>>  tests/system-traffic.at  | 67 ++
>>  tests/system-userspace-macros.at |  7 
>>  12 files changed, 270 insertions(+)
>>
>> diff --git a/NEWS b/NEWS
>> index 553a57d..f9fc04a 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -39,6 +39,10 @@ Post-v2.16.0
>> is mainly for the benefit of OVN load balancing configurations.
>> - Ingress policing on Linux now uses 'matchall' classifier instead of
>>   'basic', if available.
>> +   - ovs-appctl dpctl/:
>> + * New commands 'ct-set-default-tp' and
>> +   'ct-set-default-tp' that allows to get or configure
>> +   netdev datapath ct default timeout policy.
>
> I had a conflict with this section on apply.  It isn't a huge deal.
>
>> diff --git a/lib/conntrack-tp.h b/lib/conntrack-tp.h
>> index 4d411d1..07dcb4e 100644
>> --- a/lib/conntrack-tp.h
>> +++ b/lib/conntrack-tp.h
>> @@ -710,6 +729,43 @@ ct_dpif_free_zone_limits(struct ovs_list *zone_limits)
>>  }
>>  }
>>  
>> +
>> +/* Parses a specification of a timeout policy from 's' into '*tp'
>> + * .  Returns true on success.  Otherwise, returns false and
>> + * and puts the error message in 'ds'. */
>
> The '.' in the above is ending the sentence on the previous line, and
> the world 'and' appears twice in a row.
>
> I think it should read:
>
>/* Parses a specification of a timeout policy from 's' into '*tp'.
> * Returns true on success.  Otherwise, returns false and puts the
> * error message in 'ds'. */

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] system-traffic.at: Do not use ranges with broadcast address.

2022-01-17 Thread Aaron Conole
Paolo Valerio  writes:

> turn a bunch of test ranges from 10.1.1.240-10.1.1.255 to
> 10.1.1.240-10.1.1.254. 10.1.1.255 is the broadcast address for
> 10.1.1.0/24 and can be picked to SNAT packets causing the subsequent
> failure of the test.
>
> Fixes: 9ac0aadab9f9 ("conntrack: Add support for NAT.")
> Fixes: e32cd4c6292e ("conntrack: ignore port for ICMP/ICMPv6 NAT.")
> Signed-off-by: Paolo Valerio 
> ---

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy get/set cmd for netdev datapath

2022-01-17 Thread Aaron Conole
we...@ucloud.cn writes:

> From: wenxu 
>
> Now, the default timeout policy for netdev datapath is hard codeing. In
> some case show or modify is needed.
> Add command for get/set default timeout policy. Using like this:
>
> ovs-appctl dpctl/ct-get-default-tp [dp]
> ovs-appctl dpctl/ct-set-default-tp [dp] policies
>
> Signed-off-by: wenxu 
> ---

So far looks good.  Just one minor comment (and one conflict) that might be 
able to be addressed

>  NEWS |  4 +++
>  lib/conntrack-tp.c   | 11 +++
>  lib/conntrack-tp.h   |  2 ++
>  lib/ct-dpif.c| 56 
>  lib/ct-dpif.h|  9 ++
>  lib/dpctl.c  | 69 
> 
>  lib/dpif-netdev.c| 25 +++
>  lib/dpif-netlink.c   |  2 ++
>  lib/dpif-provider.h  |  8 +
>  tests/system-kmod-macros.at  | 10 ++
>  tests/system-traffic.at  | 67 ++
>  tests/system-userspace-macros.at |  7 
>  12 files changed, 270 insertions(+)
>
> diff --git a/NEWS b/NEWS
> index 553a57d..f9fc04a 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -39,6 +39,10 @@ Post-v2.16.0
> is mainly for the benefit of OVN load balancing configurations.
> - Ingress policing on Linux now uses 'matchall' classifier instead of
>   'basic', if available.
> +   - ovs-appctl dpctl/:
> + * New commands 'ct-set-default-tp' and
> +   'ct-set-default-tp' that allows to get or configure
> +   netdev datapath ct default timeout policy.

I had a conflict with this section on apply.  It isn't a huge deal.

> diff --git a/lib/conntrack-tp.h b/lib/conntrack-tp.h
> index 4d411d1..07dcb4e 100644
> --- a/lib/conntrack-tp.h
> +++ b/lib/conntrack-tp.h
> @@ -710,6 +729,43 @@ ct_dpif_free_zone_limits(struct ovs_list *zone_limits)
>  }
>  }
>  
> +
> +/* Parses a specification of a timeout policy from 's' into '*tp'
> + * .  Returns true on success.  Otherwise, returns false and
> + * and puts the error message in 'ds'. */

The '.' in the above is ending the sentence on the previous line, and
the world 'and' appears twice in a row.

I think it should read:

   /* Parses a specification of a timeout policy from 's' into '*tp'.
* Returns true on success.  Otherwise, returns false and puts the
* error message in 'ds'. */

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] conntrack: Fix FTP NAT when TCP but not IP offload is supported

2022-01-17 Thread Mike Pattrick
On Fri, Jan 14, 2022 at 4:13 PM Flavio Leitner  wrote:
>
> On Fri, Jan 14, 2022 at 03:50:52PM -0500, Mike Pattrick wrote:
> > On Fri, Jan 14, 2022 at 3:33 PM Flavio Leitner  wrote:
> > >
> > >
> > > Hi Mike,
> > >
> > > Thanks for working on this issue.
> > >
> > > On Fri, Jan 14, 2022 at 10:45:35AM -0500, Mike Pattrick wrote:
> > > > Formerly when userspace TSO was enabled but with a non-DKDK interface
> > > > without support IP checksum offloading, FTP NAT connections would fail
> > > > if the packet length changed. This can happen if the packets length
> > > > changes during L7 NAT translation, predominantly with FTP.
> > > >
> > > > Now we correct the IP header checksum if hwol is disabled or if DPDK
> > > > will not handle the IP checksum. This fixes the conntrack - IPv4 FTP
> > > > Passive with DNAT" test when run with check-system-tso.
> > > >
> > > > Reported-by: Flavio Leitner 
> > >
> > > Actually, this was initially reported by Ilya.
> > >
> > > > Signed-off-by: Mike Pattrick 
> > > > ---
> > > >  lib/conntrack.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > > > index 33a1a9295..1b8a26ac2 100644
> > > > --- a/lib/conntrack.c
> > > > +++ b/lib/conntrack.c
> > > > @@ -3402,7 +3402,8 @@ handle_ftp_ctl(struct conntrack *ct, const struct 
> > > > conn_lookup_ctx *ctx,
> > > >  }
> > > >  if (seq_skew) {
> > > >  ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
> > > > -if (!dp_packet_hwol_is_ipv4(pkt)) {
> > > > +if (!dp_packet_hwol_is_ipv4(pkt) ||
> > > > +!dp_packet_ip_checksum_valid(pkt)) {
> > > >  l3_hdr->ip_csum = 
> > > > recalc_csum16(l3_hdr->ip_csum,
> > > >  
> > > > l3_hdr->ip_tot_len,
> > > >  htons(ip_len));
> > >
> > > The problem is that the current code doesn't include IPv4 csum
> > > handling as required by the Linux software ports.
> > >
> > > The patch above resolves the unit test issue because non-DPDK
> > > interfaces will not flag the packet with good IP csum, and then
> > > the csum is updated accordingly. However, a packet coming from
> > > a physical DPDK port can have that flag set by the PMD, then if
> > > it goes through that part the IP csum is not updated, which
> > > will cause a problem if that packet is sent out over a Linux
> > > software port later.

I was curious about the performance impact of using csum() instead of
recalc_csum16(). While setting up this benchmark, I also noticed some
ways that we could improve recalc_csum16(). For example, making the
function inline and unrolling the while loop.

In my test, Performed 2^32 checksum updates with both the original,
and my updated recalc_csum16() function, and then 2^32 full ipv4
header checksum calculations with the csum function(). I also tested
both with and without clearing the cpu cache between calls.

I found that the optimized recalc_csum16() was 1.5x faster then the
current implementation, and the current recalc_csum16() implementation
was only 2x faster then a full header calculation with csum().

Given these results, and the fact that any time we update the header,
we will usually be affecting more then two bytes anyways, I think the
performance improvement from using recalc_csum16() instead of csum()
isn't so fantastic.

-M

> >
> > This is a good point, I was trying to get to a happy medium without
> > repurposing the patchset that you had previously submitted. That way
> > this patch could be available immediately, and your more thorough TSO
> > patchset could be applied after.
>
> I see what you did, and I appreciate that. My concern is that it's
> not unusual to have packets moving between dpdk and linux ports, so
> we might have to visit this issue again, though the unit test is not
> failing.
>
> > I had also prepared a larger patch as part of this work, but scrapped
> > it because it was duplicating a lot of the work you had previously
> > done. But if you prefer the more substantial solution, we can scrap
> > this patch and I can submit the larger one.
>
> Feel free to duplicate any parts of that RFC if that makes sense.
> I think it's not duplicating anything. It's more as doing the
> pieces that are more pressing first.
>
> Let's see the larger one and decide. If it is not suitable, then
> maybe as a potential work around we can ignore offloading at that
> point and always calculate full IP csum there.
>
> fbl
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS DPDK MFEX Unit Tests Failing

2022-01-17 Thread David Marchand
On Wed, Jan 12, 2022 at 5:44 PM Ilya Maximets  wrote:
> Thanks for the report.  And thanks, David, for checking,
> I can not verify the change right now, I just wanted to rant for a
> bit about system-dpdk testsuite being highly unstable.  It depends
> on the system configuration too much.  Today I touched my iommu
> configuration and now all the tests are failing due to tons of
> virtual address hint warnings:
>
> +2022-01-12T16:25:40.190Z|00023|dpdk|WARN|EAL: WARNING! Base virtual address 
> hint (0x15000 != 0x7f7855c9) not respected!
> +2022-01-12T16:25:40.190Z|00024|dpdk|WARN|EAL:This may cause issues with 
> mapping memory into secondary processes
> +2022-01-12T16:25:40.191Z|00025|dpdk|WARN|EAL: WARNING! Base virtual address 
> hint (0x1b000 != 0x7f7855e0c000) not respected!
> +2022-01-12T16:25:40.191Z|00026|dpdk|WARN|EAL:This may cause issues with 
> mapping memory into secondary processes
> +2022-01-12T16:25:40.191Z|00027|dpdk|WARN|EAL: WARNING! Base virtual address 
> hint (0x24000c000 != 0x7f704000) not respected!
> +2022-01-12T16:25:40.191Z|00028|dpdk|WARN|EAL:This may cause issues with 
> mapping memory into secondary processes
>
> Maybe we should just disable all DPDK warnings?

About disabling all warnings, I am unsure.
As a DPDK contributor, this test helps me to catch unwanted DPDK logs
from the core libraries...

No strong opinion.


> BTW, DPDK should probably not emit a warning about secondary process
> for applications with disabled multiprocessing.

For this log here, that's something to fix/enhance.
There was a patch from Anatoly that made those logs debug level
(unless a base-virtaddr option was passed). I can revive the thread.
That would hide this issue, but the memory allocator still expects mp
to be available, regardless of --in-memory or rte_mp_disable calls.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v8] Encap & Decap actions for MPLS packet type.

2022-01-17 Thread Varghese, Martin (Nokia - IN/Bangalore)
> This version looks much better with PT_MPLS_MC.  With a few cosmetic changes, 
> I applied the patch.

HI Ilya,
Thanks for adding the missing pieces and merging.

> One non-cosmetic change though that I made is I added the case for the 
> PT_MPLS_MC to the second switch in the following code:

> I think, it's needed there, but, please, let me know, if that's not the case.
It is needed . I missed it.

@echau...@redhat.com, Thanks again for guiding through.

Btw, Due to some reason I have not received the email in my gmail account.

Regards,
Martin
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif: Fix memory leak in dpif/show-dp-features appctl.

2022-01-17 Thread Ilya Maximets
On 1/17/22 10:14, Eelco Chaudron wrote:
> 
> 
> On 17 Jan 2022, at 2:08, Ilya Maximets wrote:
> 
>> Fixes: a98b700db617 ("ofproto: Add appctl command to show Datapath features")
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> Change looks good to me...
> 
> Acked-by: Eelco Chaudron 
> 

Thanks!  Applied and backported down to 2.13.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 1/2] ofproto-dpif: trigger revalidation when ipfix changes

2022-01-17 Thread Eelco Chaudron


On 15 Jan 2022, at 3:00, lic121 wrote:

>>
>>
>> On 14 Jan 2022, at 10:38, lic121 wrote:
>>


 On 14 Jan 2022, at 9:58, lic121 wrote:

>>
>>
>> On 9 Jan 2022, at 14:44, lic121 wrote:
>>
>>> Currently, ipfix creation/delection don't trigger dpif backer
>>> revalidation. This is not expected, as we need the revalidation
>>> to commit ipfix into xlate. So that xlate can generate ipfix
>>> actions.
>>>
>>> Signed-off-by: lic121 
>>> ---
>>>  ofproto/ofproto-dpif.c | 5 +
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>>> index bc3df8e..1cdef18 100644
>>> --- a/ofproto/ofproto-dpif.c
>>> +++ b/ofproto/ofproto-dpif.c
>>> @@ -2306,6 +2306,7 @@ set_ipfix(
>>>  {
>>>  struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
>>>  struct dpif_ipfix *di = ofproto->ipfix;
>>> +struct dpif_ipfix *old_ipfix = ofproto->ipfix;
>>>  bool has_options = bridge_exporter_options || 
>>> flow_exporters_options;
>>>  bool new_di = false;
>>>
>>> @@ -2335,6 +2336,10 @@ set_ipfix(
>>>  }
>>>  }
>>>
>>> +if (old_ipfix != ofproto->ipfix) {
>>
>> This only works if ipfix was not configured earlier or disabled, i.e., 
>> ofproto->ipfix was/is NULL.
>> If this was your intention, you could just have done “if (new_di || 
>> !ofproto->ipfix)”.
>>
>> However, I think there can also be changed in the configuration that 
>> requires a revalidate, what do you think? For example, 
>> enabling/disabling ingress/egress sampling.
>> In this case dpif_ipfix_set_options() can be changed so it will return 
>> true if any configuration changes.
>>
> Actually, I had ever thought the same thing as you. But at last I didn't 
> do as that, for 3 reasons:
> 1. I checked the history commit of ipfix, seems its not active in last 
> 2-3 years. So I guess not so many ovs users are using ipfix feature.
> 2. In xlate_xbridge_set, the place where checks the change flag, it 
> doesn't check the ipfix options changes as well.
> https://github.com/openvswitch/ovs/blob/eb1ab5357b6f3755f0ef1ee6d341ce24398d3bc1/ofproto/ofproto-dpif-xlate.c#L975-L978

 But here it just set the pointer as a reference, it does not take care of 
 acting on configuration changes, or do I miss something?

>>> Assume that we checks the ipfix options changes in set_ipfix():
>>> 1. set_ipfix(ofproto, new_option,...) {
>>>if (ofproto->ipfix->options != new_options) {
>>>   ipfix->options = new_options;
>>>   ofproto->backer->need_revalidate = REV_RECONFIGURE;
>>>}
>>>}
>>>
>>> 2. in xlate_xbridge_set, which is under revalidate context.
>>> xlate_xbridge_set() {
>>> ...
>>> if (xbridge->ipfix != ipfix) { // here the ipfix options has changed, 
>>> but the "if test" will not aware that
>>> dpif_ipfix_unref(xbridge->ipfix);
>>> xbridge->ipfix = dpif_ipfix_ref(ipfix);
>>
>> But here the pointer does not change, so no need to update the reference to 
>> it.
>> The actual configuration is taken in 
>> xlate_sample_action()/xlate_sample_action() used when creating/updating 
>> rules by the revalidator, kicked in by the succeeding udpif_revalidate() 
>> call.
>>
> After reading the code carefully again, I think you are right.
>
> I would like to divide the things into two parts.
> Part #1 is to resolve unneccesary revalidate cased by set_lldp(). Part #2 is 
> the to take care of the ipfix/lldp content change more that enabling/diabling.
> My team uses neither ipfix nor lldp, the problem we met is that 
> bridge_reconfigure() already trigger udpif revalidate because of lldp 
> problem(set_lldp() always trigger udpif revalidate).
> So I would like to focus on the first part, but leave the second part for 
> users who really want to use ipfix/lldp features.
> Please let me know your think, thanks.

I see no reason why you could not split this patch up into two individual 
patches where the first one is sent out earlier than the other.

>>> }
>>> }
>>>
> 3. My main patch is the second one in this series, this patch is here 
> because it breaks two test cases. So I didn't spend muhc time on the 
> ipfix issue.

 See comments on the second patch.

>>> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>>> +}
>>> +
>>>  return 0;
>>>  }
>>>
>>> --
>>> 1.8.3.1
>>>
>>> ___
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>


>>
>>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Update to use RTE_ETH namespace defines.

2022-01-17 Thread Eelco Chaudron



On 13 Jan 2022, at 17:23, Kevin Traynor wrote:

> This patch updates OVS to use DPDK RTE_ETH namespaces.
>
> DPDK commit 295968d17407 ("ethdev: add namespace") [0] added RTE_ETH
> namespaces for ethdev enums and macros in DPDK 21.11.
>
> As compatibility for the older names was kept and they were not officially
> deprecated in DPDK 21.11, there was no impact to OVS and OVS did not have
> to be updated.
>
> In future DPDK releases the older names will be deprecated and that will
> cause build warnings for OVS. They may also be removed from DPDK at some
> point.
>
> There is no immediate need to update OVS to use the new namespaces while
> DPDK 21.11 is being used but at the same time there is no need to wait
> until it becomes an issue either. So might as well align with the
> updated names in DPDK 21.11.
>
> [0] 
> http://git.dpdk.org/dpdk/commit/?id=295968d1740760337e16b0d7914875c5cac52850
>
> Signed-off-by: Kevin Traynor 
>
> ---

Changes look good to me, and my pv(p) test all pass.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto: fix ipfix not always sampling on egress

2022-01-17 Thread Adrian Moreno
We are currently requiring in_port to be a valid port number for ipfix
sampling even if the sampling is done on the output port (egress).

This restriction is not really needed and can affect pipelines that
intentionally set the in_port to OFPP_NONE during flow processing. For
instance, OVN does this, see:

cfa547821 Fix ovn-controller generated packets from getting dropped for
reject ACL action.

This patch skips ipfix sampling only if both (ingress and egress) ports
are invalid.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2016346
Signed-off-by: Adrian Moreno 
---
 ofproto/ofproto-dpif-xlate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index cafcd014a..189276bc1 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3272,7 +3272,9 @@ compose_ipfix_action(struct xlate_ctx *ctx, odp_port_t 
output_odp_port)
 struct dpif_ipfix *ipfix = ctx->xbridge->ipfix;
 odp_port_t tunnel_out_port = ODPP_NONE;
 
-if (!ipfix || ctx->xin->flow.in_port.ofp_port == OFPP_NONE) {
+if (!ipfix ||
+(output_odp_port == ODPP_NONE &&
+ ctx->xin->flow.in_port.ofp_port == OFPP_NONE)) {
 return;
 }
 
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] system-traffic.at: Do not use ranges with broadcast address.

2022-01-17 Thread Eelco Chaudron



On 16 Jan 2022, at 22:36, Paolo Valerio wrote:

> turn a bunch of test ranges from 10.1.1.240-10.1.1.255 to
> 10.1.1.240-10.1.1.254. 10.1.1.255 is the broadcast address for
> 10.1.1.0/24 and can be picked to SNAT packets causing the subsequent
> failure of the test.
>
> Fixes: 9ac0aadab9f9 ("conntrack: Add support for NAT.")
> Fixes: e32cd4c6292e ("conntrack: ignore port for ICMP/ICMPv6 NAT.")
> Signed-off-by: Paolo Valerio 
>

Changes look good, and tess passes without any failures.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif: Fix memory leak in dpif/show-dp-features appctl.

2022-01-17 Thread Eelco Chaudron



On 17 Jan 2022, at 2:08, Ilya Maximets wrote:

> Fixes: a98b700db617 ("ofproto: Add appctl command to show Datapath features")
> Signed-off-by: Ilya Maximets 
> ---

Change looks good to me...

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev