Re: [ovs-dev] [PATCH] ofproto: fix use-after-free for "ofproto".
Hi, yunjian and Ilya, this patch reminds me of the discussion we have in March last year, and during the discussion, you have spotted this thread-safety issue of uuid map. Unfortunately, in that email, you did not reply to the mailist, so I cannot give a link to the email. I attach it as a reference. I quote it here: " That is an interesting example here... I can't help but notice that this function is typically called from different handler or pmd threads and modified by the main thread while upcalls enabled. And hmap is not a thread-safe structure. I guess, we have another possible problem here. We need to protect at least this hmap and the other one with rw lock or something... Am I right or am I missing something? What else we need to protect? " And this patch fixes exactly the problem you stated. wangyunjian via dev 于2022年1月8日周六 17:18写道: > Friendly ping. > > > -Original Message- > > From: wangyunjian > > Sent: Friday, December 3, 2021 7:25 PM > > To: d...@openvswitch.org; i.maxim...@ovn.org > > Cc: dingxiaoxiong ; xudingke > > ; wangyunjian ; Justin > > Pettit > > Subject: [ovs-dev] [PATCH] ofproto: fix use-after-free for "ofproto". > > > > When handler threads lookup a "ofproto" and use it, main thread maybe > > remove and free the "ofproto" at the same time. The "ofproto" has not > > been protected well, which can lead to an OVS crash. > > > > This patch fixes this by making the "ofproto" lookup RCU-safe by using > > cmap instead of hmap and moving remove "ofproto" call before > > xlate_txn_commit(). > > > > (gdb) bt > > hmap_next (hmap.h:398) > > seq_wake_waiters (seq.c:326) > > seq_change_protected (seq.c:134) > > seq_change (seq.c:144) > > ofproto_dpif_send_async_msg (ofproto_dpif.c:263) > > process_upcall (ofproto_dpif_upcall.c:1782) > > recv_upcalls (ofproto_dpif_upcall.c:1026) > > udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945) > > ovsthread_wrapper (ovs_thread.c:734) > > > > Cc: Justin Pettit > > Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to > user > > action cookie.") > > Signed-off-by: Yunjian Wang > > --- > > ofproto/ofproto-dpif.c | 12 ++-- > > ofproto/ofproto-dpif.h | 2 +- > > 2 files changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > > index cba49a99e..aa8d32f81 100644 > > --- a/ofproto/ofproto-dpif.c > > +++ b/ofproto/ofproto-dpif.c > > @@ -216,8 +216,7 @@ static struct hmap all_ofproto_dpifs_by_name = > > > > HMAP_INITIALIZER(_ofproto_dpifs_by_name); > > > > /* All existing ofproto_dpif instances, indexed by ->uuid. */ > > -static struct hmap all_ofproto_dpifs_by_uuid = > > - > > HMAP_INITIALIZER(_ofproto_dpifs_by_uuid); > > +static struct cmap all_ofproto_dpifs_by_uuid = CMAP_INITIALIZER; > > > > static bool ofproto_use_tnl_push_pop = true; > > static void ofproto_unixctl_init(void); > > @@ -1682,7 +1681,7 @@ construct(struct ofproto *ofproto_) > > hmap_insert(_ofproto_dpifs_by_name, > > >all_ofproto_dpifs_by_name_node, > > hash_string(ofproto->up.name, 0)); > > -hmap_insert(_ofproto_dpifs_by_uuid, > > +cmap_insert(_ofproto_dpifs_by_uuid, > > >all_ofproto_dpifs_by_uuid_node, > > uuid_hash(>uuid)); > > memset(>stats, 0, sizeof ofproto->stats); > > @@ -1778,12 +1777,13 @@ destruct(struct ofproto *ofproto_, bool del) > > ofproto->backer->need_revalidate = REV_RECONFIGURE; > > xlate_txn_start(); > > xlate_remove_ofproto(ofproto); > > +cmap_remove(_ofproto_dpifs_by_uuid, > > +>all_ofproto_dpifs_by_uuid_node, > > +uuid_hash(>uuid)); > I guess some comments are needed here, you actually make use of the rcu_synchronize in the xlate_txn_commit to avoid access to the ofproto from other thread through uuid map. > > xlate_txn_commit(); > > > > hmap_remove(_ofproto_dpifs_by_name, > > >all_ofproto_dpifs_by_name_node); > > -hmap_remove(_ofproto_dpifs_by_uuid, > > ->all_ofproto_dpifs_by_uuid_node); > > > > OFPROTO_FOR_EACH_TABLE (table, >up) { > > CLS_FOR_EACH (rule, up.cr, >cls) { > > @@ -5781,7 +5781,7 @@ ofproto_dpif_lookup_by_uuid(const struct uuid > > *uuid) > > { > > struct ofproto_dpif *ofproto; > > > > -HMAP_FOR_EACH_WITH_HASH (ofproto, > > all_ofproto_dpifs_by_uuid_node, > > +CMAP_FOR_EACH_WITH_HASH (ofproto, > > all_ofproto_dpifs_by_uuid_node, > > uuid_hash(uuid), > > _ofproto_dpifs_by_uuid) { > > if (uuid_equals(>uuid, uuid)) { > > return ofproto; > > diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h > > index 191cfcb0d..fba03f2cc 100644 > > --- a/ofproto/ofproto-dpif.h > > +++ b/ofproto/ofproto-dpif.h > > @@ -295,7 +295,7 @@ struct ofproto_dpif { > > struct hmap_node all_ofproto_dpifs_by_name_node; > > > > /* In 'all_ofproto_dpifs_by_uuid'. */ > > -
Re: [ovs-dev] [PATCH] netdev-dpdk: prepare for tso offload for tx copy packet
Hi, Harold, I am curious how this bug is found as I have an unsolved bug that I believe it's related to some memory issues and may be related to NIC's problem. Harold Huang 于2022年1月13日周四 16:24写道: > From: Harold Huang > > When one flow is output to multiple egress ports, OVS copy the packets > and send the copy packets to the intermediate ports. The original packets > is sent to the last port. If the intermediate port is a dpdk port, the copy > packets should also be prepared for tso offload. > > Fixes: 29cf9c1b3b ("userspace: Add TCP Segmentation Offload support") > Signed-off-by: Harold Huang > --- > lib/netdev-dpdk.c | 13 ++--- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 6782d3e8f..83029405e 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -2737,10 +2737,11 @@ dpdk_pktmbuf_alloc(struct rte_mempool *mp, > uint32_t data_len) > } > > static struct dp_packet * > -dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet > *pkt_orig) > +dpdk_copy_dp_packet_to_mbuf(struct netdev_dpdk *dev, struct dp_packet > *pkt_orig) > { > struct rte_mbuf *mbuf_dest; > struct dp_packet *pkt_dest; > +struct rte_mempool *mp = dev->dpdk_mp->mp; > uint32_t pkt_len; > > pkt_len = dp_packet_size(pkt_orig); > @@ -2761,11 +2762,9 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, > struct dp_packet *pkt_orig) > memcpy(_dest->l2_pad_size, _orig->l2_pad_size, > sizeof(struct dp_packet) - offsetof(struct dp_packet, > l2_pad_size)); > > -if (mbuf_dest->ol_flags & RTE_MBUF_F_TX_L4_MASK) { > -mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest) > -- (char *)dp_packet_eth(pkt_dest); > -mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest) > -- (char *) dp_packet_l3(pkt_dest); > +if (!netdev_dpdk_prep_hwol_packet(dev, mbuf_dest)) { > perhaps check userspace_tso_enabled()? > +rte_pktmbuf_free(mbuf_dest); > +return NULL; > } > > return pkt_dest; > @@ -2813,7 +2812,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, > struct dp_packet_batch *batch) > continue; > } > > -pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp, > packet); > +pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev, packet); > if (OVS_UNLIKELY(!pkts[txcnt])) { > dropped = cnt - i; > break; > -- > 2.27.0 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > -- hepeng ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy get/set cmd for netdev datapath
From: Aaron Conole Date: 2022-01-17 23:51:55 To: we...@ucloud.cn Cc: i.maxim...@ovn.org,pvale...@redhat.com,d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy get/set cmd for netdev datapath>Aaron Conole writes: > >> we...@ucloud.cn writes: >> >>> From: wenxu >>> >>> Now, the default timeout policy for netdev datapath is hard codeing. In >>> some case show or modify is needed. >>> Add command for get/set default timeout policy. Using like this: >>> >>> ovs-appctl dpctl/ct-get-default-tp [dp] >>> ovs-appctl dpctl/ct-set-default-tp [dp] policies >>> >>> Signed-off-by: wenxu >>> --- >> >> So far looks good. Just one minor comment (and one conflict) that might be >> able to be addressed > >After over an hour of loops, I did manage to produce one failure with >your test case: > >--- - 2022-01-17 10:31:12.740151293 -0500 >+++ >/home/aconole/git/ovs2/ovs/tests/system-userspace-testsuite.dir/at-groups/94/stdout >2022-01-17 10:31:12.739245956 -0500 >@@ -1,3 +1,2 @@ >-icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5 > udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5 > > >I think the test suite is "stable" but it might be better to use >something like 'ovs-ofctl monitor' to watch for events. Maybe you have >tried this already? > >The errant 'sleep's can result in failures like this. This ‘sleep’ is just following the testcase [conntrack - zone-based timeout policy]. Maybe it is enough? > >>> NEWS | 4 +++ >>> lib/conntrack-tp.c | 11 +++ >>> lib/conntrack-tp.h | 2 ++ >>> lib/ct-dpif.c| 56 >>> lib/ct-dpif.h| 9 ++ >>> lib/dpctl.c | 69 >>> >>> lib/dpif-netdev.c| 25 +++ >>> lib/dpif-netlink.c | 2 ++ >>> lib/dpif-provider.h | 8 + >>> tests/system-kmod-macros.at | 10 ++ >>> tests/system-traffic.at | 67 >>> ++ >>> tests/system-userspace-macros.at | 7 >>> 12 files changed, 270 insertions(+) >>> >>> diff --git a/NEWS b/NEWS >>> index 553a57d..f9fc04a 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -39,6 +39,10 @@ Post-v2.16.0 >>> is mainly for the benefit of OVN load balancing configurations. >>> - Ingress policing on Linux now uses 'matchall' classifier instead of >>> 'basic', if available. >>> + - ovs-appctl dpctl/: >>> + * New commands 'ct-set-default-tp' and >>> + 'ct-set-default-tp' that allows to get or configure >>> + netdev datapath ct default timeout policy. >> >> I had a conflict with this section on apply. It isn't a huge deal. >> >>> diff --git a/lib/conntrack-tp.h b/lib/conntrack-tp.h >>> index 4d411d1..07dcb4e 100644 >>> --- a/lib/conntrack-tp.h >>> +++ b/lib/conntrack-tp.h >>> @@ -710,6 +729,43 @@ ct_dpif_free_zone_limits(struct ovs_list *zone_limits) >>> } >>> } >>> >>> + >>> +/* Parses a specification of a timeout policy from 's' into '*tp' >>> + * . Returns true on success. Otherwise, returns false and >>> + * and puts the error message in 'ds'. */ >> >> The '.' in the above is ending the sentence on the previous line, and >> the world 'and' appears twice in a row. >> >> I think it should read: >> >>/* Parses a specification of a timeout policy from 's' into '*tp'. >> * Returns true on success. Otherwise, returns false and puts the >> * error message in 'ds'. */ > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ofproto-dpif: fix packet_execute_prepare
Bleep bloop. Greetings 贺鹏, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: ERROR: Author Peng He needs to sign off. WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He Lines checked: 33, Warnings: 1, Errors: 1 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v10 2/3] conntrack: prefer dst port range during unique tuple search
From: wenxu This commit splits the nested loop used to search the unique ports for the reverse tuple. It affects only the dnat action, giving more precedence to the dnat range, similarly to the kernel dp, instead of searching through the default ephemeral source range for each destination port. Signed-off-by: wenxu --- lib/conntrack.c | 61 +++-- 1 file changed, 46 insertions(+), 15 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 44f99f3..dc29c9b 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2397,6 +2397,22 @@ next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min, return exhausted; } +static bool +nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn, + ovs_be16 *port, uint16_t curr, uint16_t min, + uint16_t max) +{ +FOR_EACH_PORT_IN_RANGE(curr, min, max) { +*port = htons(curr); +if (!conn_lookup(ct, _conn->rev_key, + time_msec(), NULL, NULL)) { +return true; +} +} + +return false; +} + /* This function tries to get a unique tuple. * Every iteration checks that the reverse tuple doesn't * collide with any existing one. @@ -2411,9 +2427,11 @@ next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min, * * In case of DNAT: *- For each dst IP address in the range (if any). - *- For each dport in range (if any). - * - Try to find a source port in the ephemeral range - * (after testing the port used by the sender). + *- For each dport in range (if any) tries to find + * an unique tuple. + *- Eventually, if the previous attempt fails, + * tries to find a source port in the ephemeral + * range (after testing the port used by the sender). * * If none can be found, return exhaustion to the caller. */ static bool @@ -2424,10 +2442,10 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0}, guard_addr = {0}; uint32_t hash = nat_range_hash(conn, ct->hash_basis, nat_info); +uint16_t min_sport, max_sport, curr_sport, orig_sport; +uint16_t min_dport, max_dport, curr_dport, orig_dport; bool pat_proto = conn->key.nw_proto == IPPROTO_TCP || conn->key.nw_proto == IPPROTO_UDP; -uint16_t min_dport, max_dport, curr_dport; -uint16_t min_sport, max_sport, curr_sport; min_addr = nat_info->min_addr; max_addr = nat_info->max_addr; @@ -2439,9 +2457,9 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, * we can stop once we reach it. */ guard_addr = curr_addr; -set_sport_range(nat_info, >key, hash, _sport, +set_sport_range(nat_info, >key, hash, _sport, _sport, _sport); -set_dport_range(nat_info, >key, hash, _dport, +set_dport_range(nat_info, >key, hash, _dport, _dport, _dport); another_round: @@ -2457,17 +2475,30 @@ another_round: goto next_addr; } -FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) { -nat_conn->rev_key.src.port = htons(curr_dport); -FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) { -nat_conn->rev_key.dst.port = htons(curr_sport); -if (!conn_lookup(ct, _conn->rev_key, - time_msec(), NULL, NULL)) { -return true; -} +curr_sport = orig_sport; +curr_dport = orig_dport; + +nat_conn->rev_key.src.port = htons(curr_dport); +nat_conn->rev_key.dst.port = htons(curr_sport); + +bool found = false; +if (nat_info->nat_action & NAT_ACTION_DST_PORT) { +found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.src.port, + curr_dport, min_dport, max_dport); +if (!found) { +nat_conn->rev_key.src.port = htons(orig_dport); } } +if (!found) { +found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.dst.port, + curr_sport, min_sport, max_sport); +} + +if (found) { +return true; +} + /* Check if next IP is in range and respin. Otherwise, notify * exhaustion to the caller. */ next_addr: -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v10 3/3] conntrack: limit port clash resolution attempts
From: wenxu In case almost or all available ports are taken, clash resolution can take a very long time, resulting in pmd hang in conntrack. This can happen when many to-be-natted hosts connect to same destination:port (e.g. a proxy) and all connections pass the same SNAT. Pick a random offset in the acceptable range, then try ever smaller number of adjacent port numbers, until either the limit is reached or a useable port was found. This results in at most 248 attempts (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset) instead of 64000+. And if thenumber of ip address will limit the max attempts and which will lead the total attempts under 248. Signed-off-by: wenxu --- lib/conntrack.c | 71 ++--- 1 file changed, 63 insertions(+), 8 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index dc29c9b..248cd45 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2400,19 +2400,50 @@ next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min, static bool nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn, ovs_be16 *port, uint16_t curr, uint16_t min, - uint16_t max) + uint16_t max, unsigned int attempts) { +unsigned int i = 0; + FOR_EACH_PORT_IN_RANGE(curr, min, max) { -*port = htons(curr); -if (!conn_lookup(ct, _conn->rev_key, - time_msec(), NULL, NULL)) { -return true; +if (i++ < attempts) { +*port = htons(curr); +if (!conn_lookup(ct, _conn->rev_key, + time_msec(), NULL, NULL)) { +return true; +} +} else { +break; } } return false; } +static void +nat_get_attempts_range(union ct_addr *min_addr, union ct_addr *max_addr, + uint16_t dl_type, unsigned int *max_attempts, + unsigned int *min_attempts) +{ +uint32_t range_addr; + +if (dl_type == htons(ETH_TYPE_IP)) { +range_addr = ntohl(max_addr->ipv4) - ntohl(min_addr->ipv4) + 1; +} else { +range_addr = nat_ipv6_addrs_delta(_addr->ipv6, + _addr->ipv6) + 1; +} + +*max_attempts = 128 / range_addr; +if (*max_attempts < 1) { +*max_attempts = 1; +} + +*min_attempts = 16 / range_addr; +if (*min_attempts < 2) { +*min_attempts = 2; +} +} + /* This function tries to get a unique tuple. * Every iteration checks that the reverse tuple doesn't * collide with any existing one. @@ -2446,6 +2477,9 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, uint16_t min_dport, max_dport, curr_dport, orig_dport; bool pat_proto = conn->key.nw_proto == IPPROTO_TCP || conn->key.nw_proto == IPPROTO_UDP; +unsigned int attempts, max_attempts, min_attempts; +uint16_t range_src, range_dst, range_max; +bool found; min_addr = nat_info->min_addr; max_addr = nat_info->max_addr; @@ -2462,6 +2496,13 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, set_dport_range(nat_info, >key, hash, _dport, _dport, _dport); +range_src = max_sport - min_sport + 1; +range_dst = max_dport - min_dport + 1; +range_max = range_src > range_dst ? range_src : range_dst; + +nat_get_attempts_range(_addr, _addr, conn->key.dl_type, + _attempts, _attempts); + another_round: store_addr_to_key(_addr, _conn->rev_key, nat_info->nat_action); @@ -2481,10 +2522,16 @@ another_round: nat_conn->rev_key.src.port = htons(curr_dport); nat_conn->rev_key.dst.port = htons(curr_sport); -bool found = false; +attempts = range_max; +if (attempts > max_attempts) { +attempts = max_attempts; +} + +another_port_round: +found = false; if (nat_info->nat_action & NAT_ACTION_DST_PORT) { found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.src.port, - curr_dport, min_dport, max_dport); + curr_dport, min_dport, max_dport, attempts); if (!found) { nat_conn->rev_key.src.port = htons(orig_dport); } @@ -2492,13 +2539,21 @@ another_round: if (!found) { found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.dst.port, - curr_sport, min_sport, max_sport); + curr_sport, min_sport, max_sport, attempts); } if (found) { return true; } +if (attempts < range_max && attempts >= min_attempts) { +attempts /= 2; +curr_dport = min_dport + (random_uint32() % range_dst); +curr_sport = min_sport + (random_uint32() % range_src); + +goto another_port_round; +} + /* Check if next IP
[ovs-dev] [PATCH v10 1/3] conntrack: select correct sport range for well-known origin sport
From: wenxu Like the kernel datapath. The sport nat range for well-konwn origin sport should limit in the well-known ports. Signed-off-by: wenxu Acked-by: Paolo Valerio --- lib/conntrack.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 33a1a92..44f99f3 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2265,8 +2265,16 @@ set_sport_range(const struct nat_action_info_t *ni, const struct conn_key *k, if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) || ((ni->nat_action & NAT_ACTION_DST))) { *curr = ntohs(k->src.port); -*min = MIN_NAT_EPHEMERAL_PORT; -*max = MAX_NAT_EPHEMERAL_PORT; +if (*curr < 512) { +*min = 1; +*max = 511; +} else if (*curr < 1024) { +*min = 600; +*max = 1023; +} else { +*min = MIN_NAT_EPHEMERAL_PORT; +*max = MAX_NAT_EPHEMERAL_PORT; +} } else { *min = ni->min_port; *max = ni->max_port; -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [ovs-dev v8 1/2] ipf: add ipf context
Hi, this patch can pass all the system-userspace testsuite if one of my another patch gets merged. So maybe review this patch "[ovs-dev] ofproto-dpif: fix packet_execute_prepare" first. Peng He 于2022年1月18日周二 11:28写道: > From: Peng He > > ipf_postprocess will emit packets into the datapath pipeline ignoring > the conntrack context, this might casuse weird issues when a packet > batch has less space to hold all the fragments belonging to single > packet. > > Given the below ruleest and consider sending a 64K ICMP packet which > is splitted into 64 fragments. > > priority=1,action=drop > priority=10,arp,action=normal > priority=100,in_port=1,ct_state=-trk,icmp,action=ct(zone=9,table=0) > priority=100,in_port=1,ct_state=+new+trk,icmp,action=ct(zone=9,commit),2 > priority=100,in_port=1,ct_state=-new+est+trk,icmp,action=2 > priority=100,in_port=2,ct_state=-trk,icmp,action=ct(table=0,zone=9) > priority=100,in_port=2,ct_state=+trk+est-new,icmp,action=1 > > Batch 1: > the first 32 packets will be buffered in the ipf preprocessing, nothing > more proceeds. > > Batch 2: > the second 32 packets succeed the fragment reassembly and goes to ct > and ipf_post will emits the first 32 packets due to the limit of batch > size. > > the first 32 packets goes to the datapath again due to the > recirculation, and again buffered at ipf preprocessing before ct commit, > then the ovs tries to call ct commit as well as ipf_postprocessing which > emits > the last 32 packets, in this case the last 32 packets will follow > the current actions which will be sent to port 2 directly without > recirculation and going to ipf preprocssing and ct commit again. > > This will cause the first 32 packets never get the chance to > reassemble and evevntually this large ICMP packets fail to transmit. > > this patch fixes this issue by adding firstly ipf context to avoid > ipf_postprocessing emits packets in the wrong context. Then by > re-executing the action list again to emit the last 32 packets > in the right context to correctly transmitting multiple fragments. > > This might be one of the root cause why the OVS log warns: > > " > skipping output to input port on bridge br-int while processing > " > > As a pkt might enter into different ipf context. > > One implementation trick of implementing ipf_ctx_eq, is to use ipf_list's > key instead of the first frag's metadata, this can reduce quite a lot of > cache misses as at the time of calling ipf_ctx_eq, ipf_list is cache-hot. > On x86_64, this trick saves 2% of CPU usage of ipf processing. > > Signed-off-by: Peng He > Co-authored-by: Mike Pattrick > --- > lib/conntrack.c | 9 --- > lib/conntrack.h | 15 ++-- > lib/dpif-netdev.c | 52 + > lib/ipf.c | 39 --- > lib/ipf.h | 11 +++-- > tests/system-traffic.at | 33 ++ > tests/test-conntrack.c | 6 ++--- > 7 files changed, 135 insertions(+), 30 deletions(-) > > diff --git a/lib/conntrack.c b/lib/conntrack.c > index 33a1a9295..72999f1ae 100644 > --- a/lib/conntrack.c > +++ b/lib/conntrack.c > @@ -1434,14 +1434,14 @@ process_one(struct conntrack *ct, struct dp_packet > *pkt, > * connection tables. 'setmark', if not NULL, should point to a two > * elements array containing a value and a mask to set the connection > mark. > * 'setlabel' behaves similarly for the connection label.*/ > -int > +bool > conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, >ovs_be16 dl_type, bool force, bool commit, uint16_t > zone, >const uint32_t *setmark, >const struct ovs_key_ct_labels *setlabel, >ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper, >const struct nat_action_info_t *nat_action_info, > - long long now, uint32_t tp_id) > + long long now, uint32_t tp_id, struct ipf_ctx *ipf_ctx) > { > ipf_preprocess_conntrack(ct->ipf, pkt_batch, now, dl_type, zone, > ct->hash_basis); > @@ -1468,9 +1468,8 @@ conntrack_execute(struct conntrack *ct, struct > dp_packet_batch *pkt_batch, > } > } > > -ipf_postprocess_conntrack(ct->ipf, pkt_batch, now, dl_type); > - > -return 0; > +return ipf_postprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, \ > + now, dl_type); > } > > void > diff --git a/lib/conntrack.h b/lib/conntrack.h > index 9553b188a..211efad3f 100644 > --- a/lib/conntrack.h > +++ b/lib/conntrack.h > @@ -64,6 +64,7 @@ > struct dp_packet_batch; > > struct conntrack; > +struct ipf_ctx; > > union ct_addr { > ovs_be32 ipv4; > @@ -88,13 +89,13 @@ struct nat_action_info_t { > struct conntrack *conntrack_init(void); > void conntrack_destroy(struct conntrack *); > > -int conntrack_execute(struct conntrack *ct, struct dp_packet_batch > *pkt_batch,
[ovs-dev] ofproto-dpif: fix packet_execute_prepare
In the commit 1df7f7aac8c976690167261fec9a50d832ef9e7e, the packet metadata's in_port has been changed from ofp port into odp port, however, the odp->flow->in_port is still ofp_port. This patch changes the odp->flow->in_port into odp_port. Fixes: 1df7f7aac8 ("ofproto-dpif: Restore packet metadata when a continuation is resumed.") Signed-off-by: Peng He --- ofproto/ofproto-dpif.c | 1 + 1 file changed, 1 insertion(+) diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index cba49a99e..72de67c63 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -4973,6 +4973,7 @@ packet_execute_prepare(struct ofproto *ofproto_, ofproto_dpif_set_packet_odp_port(ofproto, opo->flow->in_port.ofp_port, opo->packet); +opo->flow->in_port = opo->packet->md.in_port; ofproto_dpif_packet_out_delete(aux); opo->aux = execute; } -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [ovs-dev v8 2/2] ipf: ipf_preprocess_conntrack should also consider ipf_ctx
Bleep bloop. Greetings Peng He, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: ERROR: Author Peng He needs to sign off. WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He Lines checked: 91, Warnings: 1, Errors: 1 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [ovs-dev v8 1/2] ipf: add ipf context
Bleep bloop. Greetings Peng He, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: ERROR: Author Peng He needs to sign off. ERROR: Co-author Mike Pattrick needs to sign off. WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He Lines checked: 421, Warnings: 1, Errors: 2 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [ovs-dev v8 2/2] ipf: ipf_preprocess_conntrack should also consider ipf_ctx
considering a multi-thread PMD setting, when the frags are reassembled in one PMD, another thread might call *ipf_execute_reass_pkts* and 'steal' the reassembled packets into its ipf ctx, then this reassembled packet will enter into another ipf context and causes errors. This happends when there are multiple CT zones, and frags are reassembled in ct(zone=X) might be 'stealed' into the ct(zone=Y). Signed-off-by: Peng He --- lib/conntrack.c | 2 +- lib/ipf.c | 10 -- lib/ipf.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 72999f1ae..aafa6b536 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -1443,7 +1443,7 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, const struct nat_action_info_t *nat_action_info, long long now, uint32_t tp_id, struct ipf_ctx *ipf_ctx) { -ipf_preprocess_conntrack(ct->ipf, pkt_batch, now, dl_type, zone, +ipf_preprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, now, dl_type, zone, ct->hash_basis); struct dp_packet *packet; diff --git a/lib/ipf.c b/lib/ipf.c index 2ff04c615..031164bb1 100644 --- a/lib/ipf.c +++ b/lib/ipf.c @@ -1135,7 +1135,8 @@ ipf_send_expired_frags(struct ipf *ipf, struct dp_packet_batch *pb, /* Adds a reassmebled packet to a packet batch to be processed by the caller. */ static void -ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb) +ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb, + struct ipf_ctx *ctx) { if (ovs_list_is_empty(>reassembled_pkt_list)) { return; @@ -1145,6 +1146,10 @@ ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb) struct reassembled_pkt *rp, *next; LIST_FOR_EACH_SAFE (rp, next, rp_list_node, >reassembled_pkt_list) { +if (ctx && !ipf_ctx_eq(rp->list, ctx, rp->pkt)) { +continue; +} + if (!rp->list->reass_execute_ctx && ipf_dp_packet_batch_add(pb, rp->pkt, false)) { rp->list->reass_execute_ctx = rp->pkt; @@ -1246,6 +1251,7 @@ ipf_post_execute_reass_pkts(struct ipf *ipf, * be added to the batch to be sent through conntrack. */ void ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, + struct ipf_ctx *ipf_ctx, long long now, ovs_be16 dl_type, uint16_t zone, uint32_t hash_basis) { @@ -1254,7 +1260,7 @@ ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, } if (ipf_get_enabled(ipf) || atomic_count_get(>nfrag)) { -ipf_execute_reass_pkts(ipf, pb); +ipf_execute_reass_pkts(ipf, pb, ipf_ctx); } } diff --git a/lib/ipf.h b/lib/ipf.h index 7bbf453af..8974f15ae 100644 --- a/lib/ipf.h +++ b/lib/ipf.h @@ -49,6 +49,7 @@ struct ipf_ctx { struct ipf *ipf_init(void); void ipf_destroy(struct ipf *ipf); void ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, + struct ipf_ctx *ctx, long long now, ovs_be16 dl_type, uint16_t zone, uint32_t hash_basis); -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [ovs-dev v8 1/2] ipf: add ipf context
From: Peng He ipf_postprocess will emit packets into the datapath pipeline ignoring the conntrack context, this might casuse weird issues when a packet batch has less space to hold all the fragments belonging to single packet. Given the below ruleest and consider sending a 64K ICMP packet which is splitted into 64 fragments. priority=1,action=drop priority=10,arp,action=normal priority=100,in_port=1,ct_state=-trk,icmp,action=ct(zone=9,table=0) priority=100,in_port=1,ct_state=+new+trk,icmp,action=ct(zone=9,commit),2 priority=100,in_port=1,ct_state=-new+est+trk,icmp,action=2 priority=100,in_port=2,ct_state=-trk,icmp,action=ct(table=0,zone=9) priority=100,in_port=2,ct_state=+trk+est-new,icmp,action=1 Batch 1: the first 32 packets will be buffered in the ipf preprocessing, nothing more proceeds. Batch 2: the second 32 packets succeed the fragment reassembly and goes to ct and ipf_post will emits the first 32 packets due to the limit of batch size. the first 32 packets goes to the datapath again due to the recirculation, and again buffered at ipf preprocessing before ct commit, then the ovs tries to call ct commit as well as ipf_postprocessing which emits the last 32 packets, in this case the last 32 packets will follow the current actions which will be sent to port 2 directly without recirculation and going to ipf preprocssing and ct commit again. This will cause the first 32 packets never get the chance to reassemble and evevntually this large ICMP packets fail to transmit. this patch fixes this issue by adding firstly ipf context to avoid ipf_postprocessing emits packets in the wrong context. Then by re-executing the action list again to emit the last 32 packets in the right context to correctly transmitting multiple fragments. This might be one of the root cause why the OVS log warns: " skipping output to input port on bridge br-int while processing " As a pkt might enter into different ipf context. One implementation trick of implementing ipf_ctx_eq, is to use ipf_list's key instead of the first frag's metadata, this can reduce quite a lot of cache misses as at the time of calling ipf_ctx_eq, ipf_list is cache-hot. On x86_64, this trick saves 2% of CPU usage of ipf processing. Signed-off-by: Peng He Co-authored-by: Mike Pattrick --- lib/conntrack.c | 9 --- lib/conntrack.h | 15 ++-- lib/dpif-netdev.c | 52 + lib/ipf.c | 39 --- lib/ipf.h | 11 +++-- tests/system-traffic.at | 33 ++ tests/test-conntrack.c | 6 ++--- 7 files changed, 135 insertions(+), 30 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 33a1a9295..72999f1ae 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -1434,14 +1434,14 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, * connection tables. 'setmark', if not NULL, should point to a two * elements array containing a value and a mask to set the connection mark. * 'setlabel' behaves similarly for the connection label.*/ -int +bool conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, ovs_be16 dl_type, bool force, bool commit, uint16_t zone, const uint32_t *setmark, const struct ovs_key_ct_labels *setlabel, ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper, const struct nat_action_info_t *nat_action_info, - long long now, uint32_t tp_id) + long long now, uint32_t tp_id, struct ipf_ctx *ipf_ctx) { ipf_preprocess_conntrack(ct->ipf, pkt_batch, now, dl_type, zone, ct->hash_basis); @@ -1468,9 +1468,8 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, } } -ipf_postprocess_conntrack(ct->ipf, pkt_batch, now, dl_type); - -return 0; +return ipf_postprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, \ + now, dl_type); } void diff --git a/lib/conntrack.h b/lib/conntrack.h index 9553b188a..211efad3f 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -64,6 +64,7 @@ struct dp_packet_batch; struct conntrack; +struct ipf_ctx; union ct_addr { ovs_be32 ipv4; @@ -88,13 +89,13 @@ struct nat_action_info_t { struct conntrack *conntrack_init(void); void conntrack_destroy(struct conntrack *); -int conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, - ovs_be16 dl_type, bool force, bool commit, uint16_t zone, - const uint32_t *setmark, - const struct ovs_key_ct_labels *setlabel, - ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper, - const struct nat_action_info_t *nat_action_info, - long long now, uint32_t tp_id); +bool conntrack_execute(struct conntrack *ct,
[ovs-dev] [PATCH ovn] northd: Fix missing space in log warnings.
Add a space between 'static route' and the UUID in logs. Signed-off-by: James Troup --- northd/northd.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) e.g. 2022-01-17T19:55:27.701Z|1601438|ovn_northd|WARN|Address family doesn't match between 'ip_prefix' 0.0.0.0/0 and 'nexthop' 2620:2d:4000:2000::1 in static routefb68fe04-ca8e-4885-a01f-9e7ac6b337e5 diff --git a/northd/northd.c b/northd/northd.c index c714227b2..22e783ff6 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -8872,7 +8872,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *ports, if (valid_nexthop) { if (!ip46_parse_cidr(route->nexthop, , )) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); -VLOG_WARN_RL(, "bad 'nexthop' %s in static route" +VLOG_WARN_RL(, "bad 'nexthop' %s in static route " UUID_FMT, route->nexthop, UUID_ARGS(>header_.uuid)); return NULL; @@ -8880,7 +8880,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *ports, if ((IN6_IS_ADDR_V4MAPPED() && plen != 32) || (!IN6_IS_ADDR_V4MAPPED() && plen != 128)) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); -VLOG_WARN_RL(, "bad next hop mask %s in static route" +VLOG_WARN_RL(, "bad next hop mask %s in static route " UUID_FMT, route->nexthop, UUID_ARGS(>header_.uuid)); return NULL; @@ -8891,7 +8891,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *ports, struct in6_addr prefix; if (!ip46_parse_cidr(route->ip_prefix, , )) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); -VLOG_WARN_RL(, "bad 'ip_prefix' %s in static route" +VLOG_WARN_RL(, "bad 'ip_prefix' %s in static route " UUID_FMT, route->ip_prefix, UUID_ARGS(>header_.uuid)); return NULL; @@ -8902,7 +8902,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *ports, if (IN6_IS_ADDR_V4MAPPED() != IN6_IS_ADDR_V4MAPPED()) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); VLOG_WARN_RL(, "Address family doesn't match between 'ip_prefix'" - " %s and 'nexthop' %s in static route"UUID_FMT, + " %s and 'nexthop' %s in static route "UUID_FMT, route->ip_prefix, route->nexthop, UUID_ARGS(>header_.uuid)); return NULL; -- 2.32.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] OVS "soft freeze" for 2.17 is in effect.
On 1/4/22 00:37, Ilya Maximets wrote: > Hi. As described in Documentation/internals/release-process.rst, we are > in a "soft freeze" state: > >During the freeze, we ask committers to refrain from applying patches that >add new features unless those patches were already being publicly discussed >and reviewed before the freeze began. Bug fixes are welcome at any time. >Please propose and discuss exceptions on ovs-dev. > > We should branch for version 2.17 in two weeks from now, on Monday, Jan 17. I'm a bit behind the schedule. It's already way past the business hours for me and I still have a few patches in the queue to apply. So, I'm finishing my work for today and will continue tomorrow. Sorry for the inconvenience. Branch creation date shifts for about a day for now. I think, the actual release date should not be affected by that and should still be on Feb 17, unless there are any concerns. Best regards, Ilya Maximets. > With that, release should be on Thursday, Feb 17. > > We do have a fair amount of new features on a mailing list which are already > reviewed but not accepted yet. For the next 2 weeks before branching I will > mostly work on re-checking and applying those patches. > > Best regards, Ilya Maximets. > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 0/5] adding USDT points to OVS
On 12/22/21 10:15, Eelco Chaudron wrote: > This patchset introduces User Statically Defined Trace (USDT) > points to OVS. It included the general infrastructure, > documentation, and some example scripts. > > This idea was introduced during the OVS conference 2021, more > information about the presentation can be found here: > http://www.openvswitch.org/support/ovscon2021/#T1 > > While waiting for the official conference video post, the > pre-recorded video can be found here: https://youtu.be/NrO0YbdTvbg > > Eelco Chaudron (5): > configure: add --enable-usdt option to enable USDT probes > openvswitch: define the OVS_STATIC_TRACE() macro > Documentation: add USDT documentation and bpftrace example > utilities: add upcall USDT probe and associated script > utilities: add netlink flow operation USDT probes and upcall_cost script Thanks, Eelco! Very cool traces and scripts! And thanks, Paolo, for review! Applied. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5 0/5] dpif-netdev: Hash-based Tx packet steering
On 1/17/22 18:44, Ilya Maximets wrote: On 1/5/22 09:19, Maxime Coquelin wrote: This series introduces a new hash-based Tx packets steering mode alognside existing XPS and static modes. The goal is to provide a mode where all the transmit queues are used, whatever the number of PMD threads. This may be used with Vhost-user ports, where the guest application driving the Virtio device expects packets to be distributed on all the queues. As a preliminary step, in order to be able to validate the feature at OVS level, the two first patches introduce per-queue basic statistics for Vhost-user and dummy ports. These patches are complementary to David's patch [0] adding per-queue statistics to DPDK ports using xstats. The series also introduces a dpif-netdev test for tx steering, using dummy-pmd ports only, instead of the more complex Vhost-user test in first revision. Changes in v5: == - Fix indents (David) - Fix/remove comments (David) - Fix grammar in documentation (David) - Rebase and applied David's R-by's Maxime Coquelin (5): netdev-dpdk: Introduce per rxq/txq Vhost-user statistics. netdev-dummy: Introduce per rxq/txq statistics. dpif-netdev: Introduce Tx queue mode. dpif-netdev: Introduce hash-based Tx packet steering mode. dpif-netdev.at: Add test for Tx packets steering. Documentation/automake.mk | 1 + Documentation/topics/index.rst| 1 + .../topics/userspace-tx-steering.rst | 73 + lib/dpif-netdev.c | 142 ++--- lib/netdev-dpdk.c | 147 -- lib/netdev-dummy.c| 87 +-- tests/dpif-netdev.at | 67 tests/ofproto.at | 3 +- 8 files changed, 477 insertions(+), 44 deletions(-) create mode 100644 Documentation/topics/userspace-tx-steering.rst Thanks, Maxime and David! With some cosmetic changes I applied the series, except for the first patch (vhost-user statistics), because it causes performance degradation for me. I'll reply with details to that patch. Thanks, I agree with skipping the Vhost-user stats patch for now. And I just noticed that we're missing the documentation update for vswitchd/vswitch.xml. Could you, please, update it with a new tx-steering option and send as a separate patch? I just sent the documentation patch adding the new PMD option. Regards, Maxime Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] vswitchd.xml: Add missing tx-steering PMD option.
This patch documents PMD's other_config:tx-steering option. Signed-off-by: Maxime Coquelin --- vswitchd/vswitch.xml | 22 ++ 1 file changed, 22 insertions(+) diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 026b5e2ca..ef7f8f2c8 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -3375,6 +3375,28 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ This option may only be used with dpdk VF representors. + + + + Specifies the Tx steering mode for the interface. + + + thread enables static txq mapping when the number of txq + is greater or equal than the number of PMD threads, and XPS mode if + lower than the number of PMD threads. + + + hash enables hash-based Tx steering, which distributes + the packets on all the transmit queues based on their 5-tuples + hashes. + + + Defaults to thread. + + + -- 2.34.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v9 3/3] conntrack: limit port clash resolution attempts
Hi wenxu, I think we're almost there. There are some improvements still possible, but could be included in later patches. Theoretically, we should perform the first lookup with the orig_ports and then start with random sub ranges. The current patch doesn't do that, but we can leave this as a further improvement. we...@ucloud.cn writes: > From: wenxu > > In case almost or all available ports are taken, clash resolution can > take a very long time, resulting in pmd hang in conntrack. > > This can happen when many to-be-natted hosts connect to same > destination:port (e.g. a proxy) and all connections pass the same SNAT. > > Pick a random offset in the acceptable range, then try ever smaller > number of adjacent port numbers, until either the limit is reached or a > useable port was found. This results in at most 248 attempts > (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset) > instead of 64000+. > > And if thenumber of ip address will limit the max attempts and which > will lead the total attempts under 248. > > Signed-off-by: wenxu > --- > lib/conntrack.c | 60 > ++--- > 1 file changed, 53 insertions(+), 7 deletions(-) > > diff --git a/lib/conntrack.c b/lib/conntrack.c > index dc29c9b..4079a35 100644 > --- a/lib/conntrack.c > +++ b/lib/conntrack.c > @@ -2400,13 +2400,18 @@ next_addr_in_range_guarded(union ct_addr *curr, union > ct_addr *min, > static bool > nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn, >ovs_be16 *port, uint16_t curr, uint16_t min, > - uint16_t max) > + uint16_t max, unsigned int *round, > + unsigned int attempts) > { > FOR_EACH_PORT_IN_RANGE(curr, min, max) { > -*port = htons(curr); > -if (!conn_lookup(ct, _conn->rev_key, > - time_msec(), NULL, NULL)) { > -return true; > +if ((*round)++ < attempts) { > +*port = htons(curr); > +if (!conn_lookup(ct, _conn->rev_key, > + time_msec(), NULL, NULL)) { > +return true; > +} > +} else { > +break; > } > } > > @@ -2446,6 +2451,10 @@ nat_get_unique_tuple(struct conntrack *ct, const > struct conn *conn, > uint16_t min_dport, max_dport, curr_dport, orig_dport; > bool pat_proto = conn->key.nw_proto == IPPROTO_TCP || > conn->key.nw_proto == IPPROTO_UDP; > +unsigned int attempts, max_attempts, min_attempts; > +uint16_t range_src, range_dst, range_max; > +uint32_t range_addr; > +unsigned int i; > > min_addr = nat_info->min_addr; > max_addr = nat_info->max_addr; > @@ -2462,6 +2471,24 @@ nat_get_unique_tuple(struct conntrack *ct, const > struct conn *conn, > set_dport_range(nat_info, >key, hash, _dport, > _dport, _dport); > > +range_src = max_sport - min_sport + 1; > +range_dst = max_dport - min_dport + 1; > +range_max = range_src > range_dst ? range_src : range_dst; > +if (conn->key.dl_type == htons(ETH_TYPE_IP)) { > +range_addr = ntohl(max_addr.ipv4) - ntohl(min_addr.ipv4) + 1; > +} else { > +range_addr = nat_ipv6_addrs_delta(_info->min_addr.ipv6, > + _info->max_addr.ipv6) + 1; > +} > +max_attempts = 128 / range_addr; > +if (max_attempts < 1) { > +max_attempts = 1; > +} > +min_attempts = 16 / range_addr; > +if (min_attempts < 2) { > +min_attempts = 2; > +} > + can you put the code of this hunk in a function? I think it could improve a bit the readability. Please use min_addr and max_addr instead of nat_info when calculating the addr ranges. > another_round: > store_addr_to_key(_addr, _conn->rev_key, >nat_info->nat_action); > @@ -2481,24 +2508,43 @@ another_round: > nat_conn->rev_key.src.port = htons(curr_dport); > nat_conn->rev_key.dst.port = htons(curr_sport); > > +attempts = range_max; > +if (attempts > max_attempts) { > +attempts = max_attempts; > +} > + > +another_port_round: > +i = 0; > + > bool found = false; > if (nat_info->nat_action & NAT_ACTION_DST_PORT) { > found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.src.port, > - curr_dport, min_dport, max_dport); > + curr_dport, min_dport, max_dport, , is there any reason you pass instead of using a local variable in nat_get_unique_l4()? > + attempts); > if (!found) { > +i = 0; > nat_conn->rev_key.src.port = htons(orig_dport); > } > } > > if (!found) { > found = nat_get_unique_l4(ct, nat_conn, _conn->rev_key.dst.port, > - curr_sport, min_sport, max_sport); > +
Re: [ovs-dev] [PATCH v5 1/5] netdev-dpdk: Introduce per rxq/txq Vhost-user statistics.
On 1/5/22 09:19, Maxime Coquelin wrote: > Hash-based Tx steering feature will enable steering Tx > packets on transmit queues based on their hashes. In order > to test the feature, it is needed to be able to get the > per-queue statistics for Vhost-user ports. > > This patch introduces "bytes", "packets" and "error" > per-queue custom statistics for Vhost-user ports. > > Suggested-by David Marchand > Signed-off-by: Maxime Coquelin > Reviewed-by: David Marchand > --- > lib/netdev-dpdk.c | 147 +++--- > 1 file changed, 138 insertions(+), 9 deletions(-) Hi, Maxime. Thanks for the patch; and I really think that it's an important feature for debugging performance issues in a real-world setups. However, it causes a performance drop by about 2-2.5% for me with the VM-VM bidirectional traffic with 2 PMD threads. The reason is the existing stats_lock. Unfortunately, in current code, we're taking the same stats_lock on both rx and tx paths, and since rx and tx are likely performed by different threads at the same time, they are frequently locking each other. Under this circumstances even a slight increase of a critical section causes a visible performance drop. One of the possible solutions might be to split the stats_lock in two (one for rx stats and one for tx stats). We also should split or re-align to different cache lines rx and tx fields of the generic struct netdev_stats, or count all the stats on the per-queue basis. Quick prototype of such a solution gives an extra 2-3% performance boost over the current master and reduces the impact of extra stats in this patch to a minimum. I'll polish and submit my prototype code sometime later. For now, I think, we won't be able to accept this change for 2.17, since some more development is needed to avoid regression. There is also a memory leak in this code, but that can be easily fixed: diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0e1efefe3..f8680a058 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1549,6 +1549,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev) dev->vhost_id = NULL; rte_free(dev->vhost_rxq_enabled); +free(dev->vhost_rxq_stats); +free(dev->vhost_txq_stats); + common_destruct(dev); ovs_mutex_unlock(_mutex); --- Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5 0/5] dpif-netdev: Hash-based Tx packet steering
On 1/5/22 09:19, Maxime Coquelin wrote: > This series introduces a new hash-based Tx packets steering > mode alognside existing XPS and static modes. The goal is > to provide a mode where all the transmit queues are used, > whatever the number of PMD threads. This may be used with > Vhost-user ports, where the guest application driving the > Virtio device expects packets to be distributed on all the > queues. > > As a preliminary step, in order to be able to validate the > feature at OVS level, the two first patches introduce > per-queue basic statistics for Vhost-user and dummy ports. > These patches are complementary to David's patch [0] adding > per-queue statistics to DPDK ports using xstats. > > The series also introduces a dpif-netdev test for tx > steering, using dummy-pmd ports only, instead of the more > complex Vhost-user test in first revision. > > Changes in v5: > == > - Fix indents (David) > - Fix/remove comments (David) > - Fix grammar in documentation (David) > - Rebase and applied David's R-by's > > Maxime Coquelin (5): > netdev-dpdk: Introduce per rxq/txq Vhost-user statistics. > netdev-dummy: Introduce per rxq/txq statistics. > dpif-netdev: Introduce Tx queue mode. > dpif-netdev: Introduce hash-based Tx packet steering mode. > dpif-netdev.at: Add test for Tx packets steering. > > Documentation/automake.mk | 1 + > Documentation/topics/index.rst| 1 + > .../topics/userspace-tx-steering.rst | 73 + > lib/dpif-netdev.c | 142 ++--- > lib/netdev-dpdk.c | 147 -- > lib/netdev-dummy.c| 87 +-- > tests/dpif-netdev.at | 67 > tests/ofproto.at | 3 +- > 8 files changed, 477 insertions(+), 44 deletions(-) > create mode 100644 Documentation/topics/userspace-tx-steering.rst > Thanks, Maxime and David! With some cosmetic changes I applied the series, except for the first patch (vhost-user statistics), because it causes performance degradation for me. I'll reply with details to that patch. And I just noticed that we're missing the documentation update for vswitchd/vswitch.xml. Could you, please, update it with a new tx-steering option and send as a separate patch? Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 ovn 0/3] northd: fix IPv6-PD with northd IP rework
On Fri, Jan 14, 2022 at 1:17 PM Mark Michelson wrote: > > Thanks, Lorenzo. > > Acked-by: Mark Michelson Thanks Lorenzo and Mark. I applied the patch series to the main branch and branch-21.12 Numan > > On 1/13/22 19:12, Lorenzo Bianconi wrote: > > Changes since v1: > > - fix dhcpd issue on ubuntu env > > - enable IPv6 PD testing by default in gh workload > > > > Lorenzo Bianconi (3): > >northd: fix IPv6-PD with northd IP rework > >test: replace dibbler with dhcpd > >Add isc-dhcp-server to github workload > > > > .github/workflows/test.yml | 2 +- > > northd/ovn-northd.c| 3 ++- > > tests/atlocal.in | 4 ++-- > > tests/system-ovn.at| 39 -- > > 4 files changed, 21 insertions(+), 27 deletions(-) > > > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy get/set cmd for netdev datapath
Aaron Conole writes: > we...@ucloud.cn writes: > >> From: wenxu >> >> Now, the default timeout policy for netdev datapath is hard codeing. In >> some case show or modify is needed. >> Add command for get/set default timeout policy. Using like this: >> >> ovs-appctl dpctl/ct-get-default-tp [dp] >> ovs-appctl dpctl/ct-set-default-tp [dp] policies >> >> Signed-off-by: wenxu >> --- > > So far looks good. Just one minor comment (and one conflict) that might be > able to be addressed After over an hour of loops, I did manage to produce one failure with your test case: --- - 2022-01-17 10:31:12.740151293 -0500 +++ /home/aconole/git/ovs2/ovs/tests/system-userspace-testsuite.dir/at-groups/94/stdout 2022-01-17 10:31:12.739245956 -0500 @@ -1,3 +1,2 @@ -icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5 udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5 I think the test suite is "stable" but it might be better to use something like 'ovs-ofctl monitor' to watch for events. Maybe you have tried this already? The errant 'sleep's can result in failures like this. WDYT? >> NEWS | 4 +++ >> lib/conntrack-tp.c | 11 +++ >> lib/conntrack-tp.h | 2 ++ >> lib/ct-dpif.c| 56 >> lib/ct-dpif.h| 9 ++ >> lib/dpctl.c | 69 >> >> lib/dpif-netdev.c| 25 +++ >> lib/dpif-netlink.c | 2 ++ >> lib/dpif-provider.h | 8 + >> tests/system-kmod-macros.at | 10 ++ >> tests/system-traffic.at | 67 ++ >> tests/system-userspace-macros.at | 7 >> 12 files changed, 270 insertions(+) >> >> diff --git a/NEWS b/NEWS >> index 553a57d..f9fc04a 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -39,6 +39,10 @@ Post-v2.16.0 >> is mainly for the benefit of OVN load balancing configurations. >> - Ingress policing on Linux now uses 'matchall' classifier instead of >> 'basic', if available. >> + - ovs-appctl dpctl/: >> + * New commands 'ct-set-default-tp' and >> + 'ct-set-default-tp' that allows to get or configure >> + netdev datapath ct default timeout policy. > > I had a conflict with this section on apply. It isn't a huge deal. > >> diff --git a/lib/conntrack-tp.h b/lib/conntrack-tp.h >> index 4d411d1..07dcb4e 100644 >> --- a/lib/conntrack-tp.h >> +++ b/lib/conntrack-tp.h >> @@ -710,6 +729,43 @@ ct_dpif_free_zone_limits(struct ovs_list *zone_limits) >> } >> } >> >> + >> +/* Parses a specification of a timeout policy from 's' into '*tp' >> + * . Returns true on success. Otherwise, returns false and >> + * and puts the error message in 'ds'. */ > > The '.' in the above is ending the sentence on the previous line, and > the world 'and' appears twice in a row. > > I think it should read: > >/* Parses a specification of a timeout policy from 's' into '*tp'. > * Returns true on success. Otherwise, returns false and puts the > * error message in 'ds'. */ ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] system-traffic.at: Do not use ranges with broadcast address.
Paolo Valerio writes: > turn a bunch of test ranges from 10.1.1.240-10.1.1.255 to > 10.1.1.240-10.1.1.254. 10.1.1.255 is the broadcast address for > 10.1.1.0/24 and can be picked to SNAT packets causing the subsequent > failure of the test. > > Fixes: 9ac0aadab9f9 ("conntrack: Add support for NAT.") > Fixes: e32cd4c6292e ("conntrack: ignore port for ICMP/ICMPv6 NAT.") > Signed-off-by: Paolo Valerio > --- Acked-by: Aaron Conole ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6] conntrack: support default timeout policy get/set cmd for netdev datapath
we...@ucloud.cn writes: > From: wenxu > > Now, the default timeout policy for netdev datapath is hard codeing. In > some case show or modify is needed. > Add command for get/set default timeout policy. Using like this: > > ovs-appctl dpctl/ct-get-default-tp [dp] > ovs-appctl dpctl/ct-set-default-tp [dp] policies > > Signed-off-by: wenxu > --- So far looks good. Just one minor comment (and one conflict) that might be able to be addressed > NEWS | 4 +++ > lib/conntrack-tp.c | 11 +++ > lib/conntrack-tp.h | 2 ++ > lib/ct-dpif.c| 56 > lib/ct-dpif.h| 9 ++ > lib/dpctl.c | 69 > > lib/dpif-netdev.c| 25 +++ > lib/dpif-netlink.c | 2 ++ > lib/dpif-provider.h | 8 + > tests/system-kmod-macros.at | 10 ++ > tests/system-traffic.at | 67 ++ > tests/system-userspace-macros.at | 7 > 12 files changed, 270 insertions(+) > > diff --git a/NEWS b/NEWS > index 553a57d..f9fc04a 100644 > --- a/NEWS > +++ b/NEWS > @@ -39,6 +39,10 @@ Post-v2.16.0 > is mainly for the benefit of OVN load balancing configurations. > - Ingress policing on Linux now uses 'matchall' classifier instead of > 'basic', if available. > + - ovs-appctl dpctl/: > + * New commands 'ct-set-default-tp' and > + 'ct-set-default-tp' that allows to get or configure > + netdev datapath ct default timeout policy. I had a conflict with this section on apply. It isn't a huge deal. > diff --git a/lib/conntrack-tp.h b/lib/conntrack-tp.h > index 4d411d1..07dcb4e 100644 > --- a/lib/conntrack-tp.h > +++ b/lib/conntrack-tp.h > @@ -710,6 +729,43 @@ ct_dpif_free_zone_limits(struct ovs_list *zone_limits) > } > } > > + > +/* Parses a specification of a timeout policy from 's' into '*tp' > + * . Returns true on success. Otherwise, returns false and > + * and puts the error message in 'ds'. */ The '.' in the above is ending the sentence on the previous line, and the world 'and' appears twice in a row. I think it should read: /* Parses a specification of a timeout policy from 's' into '*tp'. * Returns true on success. Otherwise, returns false and puts the * error message in 'ds'. */ ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1] conntrack: Fix FTP NAT when TCP but not IP offload is supported
On Fri, Jan 14, 2022 at 4:13 PM Flavio Leitner wrote: > > On Fri, Jan 14, 2022 at 03:50:52PM -0500, Mike Pattrick wrote: > > On Fri, Jan 14, 2022 at 3:33 PM Flavio Leitner wrote: > > > > > > > > > Hi Mike, > > > > > > Thanks for working on this issue. > > > > > > On Fri, Jan 14, 2022 at 10:45:35AM -0500, Mike Pattrick wrote: > > > > Formerly when userspace TSO was enabled but with a non-DKDK interface > > > > without support IP checksum offloading, FTP NAT connections would fail > > > > if the packet length changed. This can happen if the packets length > > > > changes during L7 NAT translation, predominantly with FTP. > > > > > > > > Now we correct the IP header checksum if hwol is disabled or if DPDK > > > > will not handle the IP checksum. This fixes the conntrack - IPv4 FTP > > > > Passive with DNAT" test when run with check-system-tso. > > > > > > > > Reported-by: Flavio Leitner > > > > > > Actually, this was initially reported by Ilya. > > > > > > > Signed-off-by: Mike Pattrick > > > > --- > > > > lib/conntrack.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/lib/conntrack.c b/lib/conntrack.c > > > > index 33a1a9295..1b8a26ac2 100644 > > > > --- a/lib/conntrack.c > > > > +++ b/lib/conntrack.c > > > > @@ -3402,7 +3402,8 @@ handle_ftp_ctl(struct conntrack *ct, const struct > > > > conn_lookup_ctx *ctx, > > > > } > > > > if (seq_skew) { > > > > ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew; > > > > -if (!dp_packet_hwol_is_ipv4(pkt)) { > > > > +if (!dp_packet_hwol_is_ipv4(pkt) || > > > > +!dp_packet_ip_checksum_valid(pkt)) { > > > > l3_hdr->ip_csum = > > > > recalc_csum16(l3_hdr->ip_csum, > > > > > > > > l3_hdr->ip_tot_len, > > > > htons(ip_len)); > > > > > > The problem is that the current code doesn't include IPv4 csum > > > handling as required by the Linux software ports. > > > > > > The patch above resolves the unit test issue because non-DPDK > > > interfaces will not flag the packet with good IP csum, and then > > > the csum is updated accordingly. However, a packet coming from > > > a physical DPDK port can have that flag set by the PMD, then if > > > it goes through that part the IP csum is not updated, which > > > will cause a problem if that packet is sent out over a Linux > > > software port later. I was curious about the performance impact of using csum() instead of recalc_csum16(). While setting up this benchmark, I also noticed some ways that we could improve recalc_csum16(). For example, making the function inline and unrolling the while loop. In my test, Performed 2^32 checksum updates with both the original, and my updated recalc_csum16() function, and then 2^32 full ipv4 header checksum calculations with the csum function(). I also tested both with and without clearing the cpu cache between calls. I found that the optimized recalc_csum16() was 1.5x faster then the current implementation, and the current recalc_csum16() implementation was only 2x faster then a full header calculation with csum(). Given these results, and the fact that any time we update the header, we will usually be affecting more then two bytes anyways, I think the performance improvement from using recalc_csum16() instead of csum() isn't so fantastic. -M > > > > This is a good point, I was trying to get to a happy medium without > > repurposing the patchset that you had previously submitted. That way > > this patch could be available immediately, and your more thorough TSO > > patchset could be applied after. > > I see what you did, and I appreciate that. My concern is that it's > not unusual to have packets moving between dpdk and linux ports, so > we might have to visit this issue again, though the unit test is not > failing. > > > I had also prepared a larger patch as part of this work, but scrapped > > it because it was duplicating a lot of the work you had previously > > done. But if you prefer the more substantial solution, we can scrap > > this patch and I can submit the larger one. > > Feel free to duplicate any parts of that RFC if that makes sense. > I think it's not duplicating anything. It's more as doing the > pieces that are more pressing first. > > Let's see the larger one and decide. If it is not suitable, then > maybe as a potential work around we can ignore offloading at that > point and always calculate full IP csum there. > > fbl > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] OVS DPDK MFEX Unit Tests Failing
On Wed, Jan 12, 2022 at 5:44 PM Ilya Maximets wrote: > Thanks for the report. And thanks, David, for checking, > I can not verify the change right now, I just wanted to rant for a > bit about system-dpdk testsuite being highly unstable. It depends > on the system configuration too much. Today I touched my iommu > configuration and now all the tests are failing due to tons of > virtual address hint warnings: > > +2022-01-12T16:25:40.190Z|00023|dpdk|WARN|EAL: WARNING! Base virtual address > hint (0x15000 != 0x7f7855c9) not respected! > +2022-01-12T16:25:40.190Z|00024|dpdk|WARN|EAL:This may cause issues with > mapping memory into secondary processes > +2022-01-12T16:25:40.191Z|00025|dpdk|WARN|EAL: WARNING! Base virtual address > hint (0x1b000 != 0x7f7855e0c000) not respected! > +2022-01-12T16:25:40.191Z|00026|dpdk|WARN|EAL:This may cause issues with > mapping memory into secondary processes > +2022-01-12T16:25:40.191Z|00027|dpdk|WARN|EAL: WARNING! Base virtual address > hint (0x24000c000 != 0x7f704000) not respected! > +2022-01-12T16:25:40.191Z|00028|dpdk|WARN|EAL:This may cause issues with > mapping memory into secondary processes > > Maybe we should just disable all DPDK warnings? About disabling all warnings, I am unsure. As a DPDK contributor, this test helps me to catch unwanted DPDK logs from the core libraries... No strong opinion. > BTW, DPDK should probably not emit a warning about secondary process > for applications with disabled multiprocessing. For this log here, that's something to fix/enhance. There was a patch from Anatoly that made those logs debug level (unless a base-virtaddr option was passed). I can revive the thread. That would hide this issue, but the memory allocator still expects mp to be available, regardless of --in-memory or rte_mp_disable calls. -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v8] Encap & Decap actions for MPLS packet type.
> This version looks much better with PT_MPLS_MC. With a few cosmetic changes, > I applied the patch. HI Ilya, Thanks for adding the missing pieces and merging. > One non-cosmetic change though that I made is I added the case for the > PT_MPLS_MC to the second switch in the following code: > I think, it's needed there, but, please, let me know, if that's not the case. It is needed . I missed it. @echau...@redhat.com, Thanks again for guiding through. Btw, Due to some reason I have not received the email in my gmail account. Regards, Martin ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto-dpif: Fix memory leak in dpif/show-dp-features appctl.
On 1/17/22 10:14, Eelco Chaudron wrote: > > > On 17 Jan 2022, at 2:08, Ilya Maximets wrote: > >> Fixes: a98b700db617 ("ofproto: Add appctl command to show Datapath features") >> Signed-off-by: Ilya Maximets >> --- > > Change looks good to me... > > Acked-by: Eelco Chaudron > Thanks! Applied and backported down to 2.13. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1 1/2] ofproto-dpif: trigger revalidation when ipfix changes
On 15 Jan 2022, at 3:00, lic121 wrote: >> >> >> On 14 Jan 2022, at 10:38, lic121 wrote: >> On 14 Jan 2022, at 9:58, lic121 wrote: >> >> >> On 9 Jan 2022, at 14:44, lic121 wrote: >> >>> Currently, ipfix creation/delection don't trigger dpif backer >>> revalidation. This is not expected, as we need the revalidation >>> to commit ipfix into xlate. So that xlate can generate ipfix >>> actions. >>> >>> Signed-off-by: lic121 >>> --- >>> ofproto/ofproto-dpif.c | 5 + >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c >>> index bc3df8e..1cdef18 100644 >>> --- a/ofproto/ofproto-dpif.c >>> +++ b/ofproto/ofproto-dpif.c >>> @@ -2306,6 +2306,7 @@ set_ipfix( >>> { >>> struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_); >>> struct dpif_ipfix *di = ofproto->ipfix; >>> +struct dpif_ipfix *old_ipfix = ofproto->ipfix; >>> bool has_options = bridge_exporter_options || >>> flow_exporters_options; >>> bool new_di = false; >>> >>> @@ -2335,6 +2336,10 @@ set_ipfix( >>> } >>> } >>> >>> +if (old_ipfix != ofproto->ipfix) { >> >> This only works if ipfix was not configured earlier or disabled, i.e., >> ofproto->ipfix was/is NULL. >> If this was your intention, you could just have done “if (new_di || >> !ofproto->ipfix)”. >> >> However, I think there can also be changed in the configuration that >> requires a revalidate, what do you think? For example, >> enabling/disabling ingress/egress sampling. >> In this case dpif_ipfix_set_options() can be changed so it will return >> true if any configuration changes. >> > Actually, I had ever thought the same thing as you. But at last I didn't > do as that, for 3 reasons: > 1. I checked the history commit of ipfix, seems its not active in last > 2-3 years. So I guess not so many ovs users are using ipfix feature. > 2. In xlate_xbridge_set, the place where checks the change flag, it > doesn't check the ipfix options changes as well. > https://github.com/openvswitch/ovs/blob/eb1ab5357b6f3755f0ef1ee6d341ce24398d3bc1/ofproto/ofproto-dpif-xlate.c#L975-L978 But here it just set the pointer as a reference, it does not take care of acting on configuration changes, or do I miss something? >>> Assume that we checks the ipfix options changes in set_ipfix(): >>> 1. set_ipfix(ofproto, new_option,...) { >>>if (ofproto->ipfix->options != new_options) { >>> ipfix->options = new_options; >>> ofproto->backer->need_revalidate = REV_RECONFIGURE; >>>} >>>} >>> >>> 2. in xlate_xbridge_set, which is under revalidate context. >>> xlate_xbridge_set() { >>> ... >>> if (xbridge->ipfix != ipfix) { // here the ipfix options has changed, >>> but the "if test" will not aware that >>> dpif_ipfix_unref(xbridge->ipfix); >>> xbridge->ipfix = dpif_ipfix_ref(ipfix); >> >> But here the pointer does not change, so no need to update the reference to >> it. >> The actual configuration is taken in >> xlate_sample_action()/xlate_sample_action() used when creating/updating >> rules by the revalidator, kicked in by the succeeding udpif_revalidate() >> call. >> > After reading the code carefully again, I think you are right. > > I would like to divide the things into two parts. > Part #1 is to resolve unneccesary revalidate cased by set_lldp(). Part #2 is > the to take care of the ipfix/lldp content change more that enabling/diabling. > My team uses neither ipfix nor lldp, the problem we met is that > bridge_reconfigure() already trigger udpif revalidate because of lldp > problem(set_lldp() always trigger udpif revalidate). > So I would like to focus on the first part, but leave the second part for > users who really want to use ipfix/lldp features. > Please let me know your think, thanks. I see no reason why you could not split this patch up into two individual patches where the first one is sent out earlier than the other. >>> } >>> } >>> > 3. My main patch is the second one in this series, this patch is here > because it breaks two test cases. So I didn't spend muhc time on the > ipfix issue. See comments on the second patch. >>> +ofproto->backer->need_revalidate = REV_RECONFIGURE; >>> +} >>> + >>> return 0; >>> } >>> >>> -- >>> 1.8.3.1 >>> >>> ___ >>> dev mailing list >>> d...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> >> >> ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-dpdk: Update to use RTE_ETH namespace defines.
On 13 Jan 2022, at 17:23, Kevin Traynor wrote: > This patch updates OVS to use DPDK RTE_ETH namespaces. > > DPDK commit 295968d17407 ("ethdev: add namespace") [0] added RTE_ETH > namespaces for ethdev enums and macros in DPDK 21.11. > > As compatibility for the older names was kept and they were not officially > deprecated in DPDK 21.11, there was no impact to OVS and OVS did not have > to be updated. > > In future DPDK releases the older names will be deprecated and that will > cause build warnings for OVS. They may also be removed from DPDK at some > point. > > There is no immediate need to update OVS to use the new namespaces while > DPDK 21.11 is being used but at the same time there is no need to wait > until it becomes an issue either. So might as well align with the > updated names in DPDK 21.11. > > [0] > http://git.dpdk.org/dpdk/commit/?id=295968d1740760337e16b0d7914875c5cac52850 > > Signed-off-by: Kevin Traynor > > --- Changes look good to me, and my pv(p) test all pass. Acked-by: Eelco Chaudron ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ofproto: fix ipfix not always sampling on egress
We are currently requiring in_port to be a valid port number for ipfix sampling even if the sampling is done on the output port (egress). This restriction is not really needed and can affect pipelines that intentionally set the in_port to OFPP_NONE during flow processing. For instance, OVN does this, see: cfa547821 Fix ovn-controller generated packets from getting dropped for reject ACL action. This patch skips ipfix sampling only if both (ingress and egress) ports are invalid. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2016346 Signed-off-by: Adrian Moreno --- ofproto/ofproto-dpif-xlate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index cafcd014a..189276bc1 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -3272,7 +3272,9 @@ compose_ipfix_action(struct xlate_ctx *ctx, odp_port_t output_odp_port) struct dpif_ipfix *ipfix = ctx->xbridge->ipfix; odp_port_t tunnel_out_port = ODPP_NONE; -if (!ipfix || ctx->xin->flow.in_port.ofp_port == OFPP_NONE) { +if (!ipfix || +(output_odp_port == ODPP_NONE && + ctx->xin->flow.in_port.ofp_port == OFPP_NONE)) { return; } -- 2.34.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] system-traffic.at: Do not use ranges with broadcast address.
On 16 Jan 2022, at 22:36, Paolo Valerio wrote: > turn a bunch of test ranges from 10.1.1.240-10.1.1.255 to > 10.1.1.240-10.1.1.254. 10.1.1.255 is the broadcast address for > 10.1.1.0/24 and can be picked to SNAT packets causing the subsequent > failure of the test. > > Fixes: 9ac0aadab9f9 ("conntrack: Add support for NAT.") > Fixes: e32cd4c6292e ("conntrack: ignore port for ICMP/ICMPv6 NAT.") > Signed-off-by: Paolo Valerio > Changes look good, and tess passes without any failures. Acked-by: Eelco Chaudron ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto-dpif: Fix memory leak in dpif/show-dp-features appctl.
On 17 Jan 2022, at 2:08, Ilya Maximets wrote: > Fixes: a98b700db617 ("ofproto: Add appctl command to show Datapath features") > Signed-off-by: Ilya Maximets > --- Change looks good to me... Acked-by: Eelco Chaudron ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev