Re: [ovs-dev] [PATCH 0/6] Add support for DPDK meter HW offload

2023-01-09 Thread Nole Zhang


> -Original Message-
> From: Eli Britstein 
> Sent: 2023年1月9日 17:32
> To: Nole Zhang ; d...@openvswitch.org
> Cc: Eelco Chaudron ; Ilya Maximets
> ; Chaoyong He ; oss-
> drivers 
> Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> 
> 
> 
> >-Original Message-
> >From: Nole Zhang 
> >Sent: Monday, 9 January 2023 11:23
> >To: Eli Britstein ; d...@openvswitch.org
> >Cc: Eelco Chaudron ; Ilya Maximets
> >; Chaoyong He ; oss-
> >drivers 
> >Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> >
> >External email: Use caution opening links or attachments
> >
> >
> >> -Original Message-
> >> From: Eli Britstein 
> >> Sent: 2023年1月8日 15:27
> >> To: Nole Zhang ; d...@openvswitch.org
> >> Cc: Eelco Chaudron ; Ilya Maximets
> >> ; Chaoyong He ; oss-
> >> drivers 
> >> Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> >>
> >>
> >>
> >> >-Original Message-
> >> >From: Nole Zhang 
> >> >Sent: Friday, 6 January 2023 11:28
> >> >To: Eli Britstein ; d...@openvswitch.org
> >> >Cc: Eelco Chaudron ; Ilya Maximets
> >> >; Chaoyong He ;
> oss-
> >> >drivers ; Nole Zhang
> >> >
> >> >Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> >> >
> >> >External email: Use caution opening links or attachments
> >> >
> >> >
> >> >> -Original Message-
> >> >> From: Eli Britstein 
> >> >> Sent: 2022年12月26日 18:04
> >> >> To: Simon Horman ;
> >> d...@openvswitch.org
> >> >> Cc: Eelco Chaudron ; Ilya Maximets
> >> >> ; Chaoyong He ;
> >oss-
> >> >> drivers ; Nole Zhang
> >> >> 
> >> >> Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> >> >>
> >> >> Dpif-netdev should not implement internal HW offload details. If
> >> >> need to "apply on all ports", it needs to be done in offload layer.
> >> >> However, in arch level, there is a problem with the proposed series.
> >> >> It will create a meter object per port, while in SW it is one
> >> >> object, that can be shared between multiple flows, on different ports.
> >> >
> >> >In dpif-netdev, it doesn't relate with implement internal HW offload
> >> >details, I just try to add the meter to the PMD if the PMD support
> >> >the meter
> >> offload.
> >> [Eli Britstein] your loops over ports, not over PMDs. See in [1], for
> >> example in
> >> dpif_netdev_offload_meter_set():
> >> +HMAP_FOR_EACH (port, node, >ports) {
> >> +dev = port->netdev;
> >> Am I wrong?
> >
> >Thanks for your notice, yes, as ovs code, it will add the meter in different
> port.
> >
> >As our design,  for different port, if the different port has the same
> >PMD with the same meter id, it will just  add the meter successfully
> >once in the dpdk and it can achieve sharing the same NIC different vf.
> >
> >If I add the judge for the PMD, for different PMD, just add the meter
> >once, do you think it is ok?
> No. Even if you improve the code to create "once". Suppose the PF is port 0,
> and 2 VF representors are ports 1,2.
> If the meter is created on port 1, using it with ports 0,2 is illegal from 
> DPDK
> generic point of view. It might be supported depending on specific PMD
> support.
> What will happen if port 1 is detached from OVS for example?
> I think it has to be on the "proxy" port, see below.

Thanks for your reply.

Yes, as your said, if I want support meter offload, the best way maybe is the 
proxy port, but
the commit seems doesn't make a further progress for a long time.

I am also rookie in the public community, in this situation, if I want to 
upstream this func,
I just wait or what can I do anything?

> >
> >>
> >> Other than that, there is already a convenient API to traverse ports
> >> for offload - netdev_ports_traverse().
> >
> >Ok, thanks, I will investigate it.
> >
> >> >
> >> >No, it will create a meter object per PMD not per port, so the meter
> >> >can share the same NIC different vf,  different NIC can't share  the
> >> >meter, it is same with ovs-tc meter offload
> >> [Eli Britstein] no, it will create an object per port, as this is your 
> >> code.
> >> To create a shared object for all the VFs in the same NIC, need to
> >> use the "proxy" port. Such work has started in [2].
> >>
> >> [1]
> >>
> >https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatc
> h
> >>
> >work.ozlabs.org%2Fproject%2Fopenvswitch%2Fpatch%2F20221216155054.
> 9&
> >dat
> >>
> >a=05%7C01%7Celibr%40nvidia.com%7C6ecc84b885334be08f7d08daf2230e
> 2a%
> >7C43
> >>
> >083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638088529626823112%7
> CUn
> >known%
> >>
> >7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haW
> wi
> >LCJX
> >>
> >VCI6Mn0%3D%7C3000%7C%7C%7C=rcM52oDqpL%2F%2BewlX8ZRZS
> TL
> >mJslYBK7SI
> >> 4c9BX3ocY8%3D=0
> >> 86464-3-simon.hor...@corigine.com/
> >> [2]
> >>
> >https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatc
> h
> >>
> >work.ozlabs.org%2Fproject%2Fopenvswitch%2Fpatch%2F20220720121823.
> 2&
> >dat
> >>
> 

Re: [ovs-dev] [PATCH v5 1/1] userspace: Add SRv6 tunnel support.

2023-01-09 Thread Nobuhiro MIKI
On 2023/01/07 4:14, Ilya Maximets wrote:
> On 1/6/23 08:58, Nobuhiro MIKI wrote:
>> On 2023/01/03 7:59, Ilya Maximets wrote:
>>> On 10/11/22 08:11, Nobuhiro MIKI wrote:
 SRv6 (Segment Routing IPv6) tunnel vport is responsible
 for encapsulation and decapsulation the inner packets with
 IPv6 header and an extended header called SRH
 (Segment Routing Header). See spec in:

 https://datatracker.ietf.org/doc/html/rfc8754

 This patch implements SRv6 tunneling in userspace datapath.
 It uses `remote_ip` and `local_ip` options as with existing
 tunnel protocols. It also adds a dedicated `srv6_segs` option
 to define a sequence of routers called segment list.

 Signed-off-by: Nobuhiro MIKI 
>>>
>>> Hi.  Thanks for the patch and sorry for the late reply.
>>> See some coments inline.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
>> Hi Ilya,
>>
>> Thanks for your kind review.
>> I have written comments on the points that need to be discussed.
>> I'll fix the other parts including the code format in the next patch.
>>
 +netdev_srv6_pop_header(struct dp_packet *packet)
 +{
 +struct pkt_metadata *md = >md;
 +struct flow_tnl *tnl = >tunnel;
 +struct srv6_base_hdr *srh;
 +int hlen = ETH_HEADER_LEN + IPV6_HEADER_LEN;
>>>
>>> We assume here that packet doesn't have any other extension
>>> headers.  Is that always true?  Should we also check that
>>> the next header is actually a routing header?  And if it is
>>> a type 4 header?
>>>
>>
>> Other extension headers can also be inserted. And since those extension
>> headers are in no particular order (although there is a recommended
>> order), we need to loop through all the extension headers to make sure
>> SRH is present. We also need to check that it is type 4.
> 
> We have a function that partially parses extension headers
> in lib/flow.c - parse_ipv6_ext_hdrs().  It advances the
> data pointer to the first non-extension header and collects
> some data along the way.  We may probably enhance it to
> additionally return a pointer to the RH/SRH header if present.
> 

Thanks! I think it works.

>>
 +
 +srh = ALIGNED_CAST(struct srv6_base_hdr *,
 +   (char *) dp_packet_data(packet) + hlen);
 +if (srh->segments_left > 0) {
 +VLOG_WARN_RL(_rl, "invalid srv6 segments_left=%d\n",
 + srh->segments_left);
>>>
>>> I suppose, this means that we do not support receiving
>>> packets with a segment list specified.  It might be OK,
>>> but the limitation should be documented somewhere.
>>>
>>> Is there a reason to not accept such packets?  i.e. not
>>> really decap the packet, but swap the destination IPs.
>>>
>>
>> I recognize the question as to whether or not to implement SRv6 End
>> function. I think it would be appropriate to reject the packet at this
>> time for the following two reasons:
>>
>> * We can be sure that all packets going out to type=srv6 vport are decapped.
>>   This is consistent with existing tunneling because it is the same behavior.
> 
> I don't think there should be a problem if the packet is not
> actually decaped, because OVS will fully re-parse the packet
> from scratch after executing the decap() action.  So, I guess,
> we should be fine here.  But, of course, I didn't test.
> 
>> * We can take the option to accept the packet which is segments_left > 0
>>   for SRv6 End function in the future.
> 
> OK, sure.  If you can add a note about this part not being
> supported in the FAQ, that can be enough for now.
> 

I did not know about re-parse.
At this time, I will state in the FAQ that it is unsupported now.

 diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c
 index 050eafa6b..33313d8fd 100644
 --- a/lib/tnl-ports.c
 +++ b/lib/tnl-ports.c
 @@ -126,7 +126,7 @@ map_insert(odp_port_t port, struct eth_addr mac, 
 struct in6_addr *addr,
 /* XXX: No fragments support. */
match.wc.masks.nw_frag = FLOW_NW_FRAG_MASK;

 -/* 'tp_port' is zero for GRE tunnels. In this case it
 +/* 'tp_port' is zero for GRE and SRv6 tunnels. In this case it
 * doesn't make sense to match on UDP port numbers. */
>>>
>>> Currently to detect that a tunnel port should receive a packet
>>> we're matching on the protocol and the port number.
>>>
>>> GRE is an exception, since it's another IP in IP protocol, but
>>> it has a distinct protocol number and a special header, so it's
>>> simple to detect.
>>>
>>> However, with SRv6 we have a generic IPv6 header with generic
>>> IPIP or IPV6 network protocol number.  It looks like OVS will
>>> intercept any other IPIP or IPv6|IPv6 packets, try to decapsuate
>>> them as if they are SRv6 and drop if they are not.
>>> This miht be an unwanted behavior.
>>>
>>> I wonder if we actually need to match on the exstence of the
>>> routing extension header.  For that we'll need parts of the
>>> following 

Re: [ovs-dev] [PATCH ovn] ovn-ic: Only monitor useful tables and columns.

2023-01-09 Thread Mark Michelson

Thanks Dumitru,

Acked-by: Mark Michelson 

I applied this change to main.

On 12/1/22 11:24, Dumitru Ceara wrote:

Signed-off-by: Dumitru Ceara 
---
  ic/ovn-ic.c | 109 +---
  1 file changed, 104 insertions(+), 5 deletions(-)

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index e5c193d9dc..45daa337ce 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -1866,13 +1866,112 @@ main(int argc, char *argv[])
  struct ovsdb_idl_loop ovnisb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER(
  ovsdb_idl_create(ovn_ic_sb_db, _idl_class, true, true));
  
-/* ovn-nb db. XXX: add only needed tables and columns */

+/* ovn-nb db. */
  struct ovsdb_idl_loop ovnnb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER(
-ovsdb_idl_create(ovnnb_db, _idl_class, true, true));
-
-/* ovn-sb db. XXX: add only needed tables and columns */
+ovsdb_idl_create(ovnnb_db, _idl_class, false, true));
+
+ovsdb_idl_add_table(ovnnb_idl_loop.idl, _table_nb_global);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl, _nb_global_col_name);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl, _nb_global_col_options);
+
+ovsdb_idl_add_table(ovnnb_idl_loop.idl,
+_table_logical_router_static_route);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_static_route_col_route_table);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_static_route_col_ip_prefix);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_static_route_col_nexthop);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_static_route_col_external_ids);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_static_route_col_options);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_static_route_col_policy);
+
+ovsdb_idl_add_table(ovnnb_idl_loop.idl, _table_logical_router);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_col_name);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_col_static_routes);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_col_ports);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_col_options);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_col_external_ids);
+
+ovsdb_idl_add_table(ovnnb_idl_loop.idl, _table_logical_router_port);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_port_col_name);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_port_col_networks);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_port_col_external_ids);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_router_port_col_options);
+
+ovsdb_idl_add_table(ovnnb_idl_loop.idl, _table_logical_switch);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_col_name);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_col_ports);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_col_other_config);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_col_external_ids);
+
+ovsdb_idl_add_table(ovnnb_idl_loop.idl, _table_logical_switch_port);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_name);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_addresses);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_options);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_type);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_up);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_addresses);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_enabled);
+ovsdb_idl_add_column(ovnnb_idl_loop.idl,
+ _logical_switch_port_col_external_ids);
+
+/* ovn-sb db. */
  struct ovsdb_idl_loop ovnsb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER(
-ovsdb_idl_create(ovnsb_db, _idl_class, true, true));
+ovsdb_idl_create(ovnsb_db, _idl_class, false, true));
+
+ovsdb_idl_add_table(ovnsb_idl_loop.idl, _table_chassis);
+ovsdb_idl_add_column(ovnsb_idl_loop.idl, _chassis_col_encaps);
+ovsdb_idl_add_column(ovnsb_idl_loop.idl, _chassis_col_name);
+ovsdb_idl_add_column(ovnsb_idl_loop.idl, _chassis_col_hostname);
+ovsdb_idl_add_column(ovnsb_idl_loop.idl, 

Re: [ovs-dev] [PATCH] cirrus: Update to use FreeBSD 12.4.

2023-01-09 Thread Ilya Maximets
On 1/9/23 17:19, Aaron Conole wrote:
> Ilya Maximets  writes:
> 
>> 12.4 was released in December.  That means that 12.3
>> will become unavailable in a near future.  Updating.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> Acked-by: Aaron Conole 
> 

Thanks!  Applied to all supported branches.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] actions: Clarify the NAT type for ovnact_ct_nat

2023-01-09 Thread Mark Michelson

THanks Ales and Dumitru, I pushed this to main.

On 12/9/22 09:28, Dumitru Ceara wrote:

On 11/30/22 10:47, Ales Musil wrote:

The encode_ct_nat took a bool specifying if the NAT
type is source or destination. However that doesn't work
with ct_commit_nat which has the NAT type unspecified.
Add enum that allows to differentiate between those
to make it clearer which type of NAT should be applied.

Signed-off-by: Ales Musil 
---


Looks good to me, thanks for the improvement!

Acked-by: Dumitru Ceara 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] system-dpdk: Fix error message in ping vhost-user ports.

2023-01-09 Thread Ilya Maximets
On 1/7/23 11:06, David Marchand wrote:
> On Fri, Jan 6, 2023 at 11:57 AM Eelco Chaudron  wrote:
>>
>> In some environments, ovs-vswitchd gets shutdown before the pkill of
>> testpmd has been completed, which results in the following error messages:
>>
>>   Removing port 'dpdkvhostuser0' while vhost device still attached.
>>   To restore connectivity after re-adding of port, VM on socket '' must be 
>> restarted.
>>
>> This patch will wait for the socket disconnect to be handled by the
>> vhost-user before shutting down OVS.
>>
>> Signed-off-by: Eelco Chaudron 
>> Signed-off-by: David Marchand 
>> Co-authored-by: David Marchand 
> 
> Just confirming that the patch lgtm.

Thanks, Eelco and David!  Applied.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] netdev-dpdk: Free mbufs in bulk.

2023-01-09 Thread Ilya Maximets
rte_pktmbuf_free_bulk() function was introduced in 19.11 and became
stable in 21.11.  Use it to free arrays of mbufs instead of freeing
packets one by one.

In simple V2V testing with 64B packets, 2 PMD threads and bidirectional
traffic this change improves performance by 3.5 - 4.5 %.

Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 5e2d64651..28a04bb7f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2286,13 +2286,8 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int 
qid,
 }
 
 if (OVS_UNLIKELY(nb_tx != cnt)) {
-/* Free buffers, which we couldn't transmit, one at a time (each
- * packet could come from a different mempool) */
-int i;
-
-for (i = nb_tx; i < cnt; i++) {
-rte_pktmbuf_free(pkts[i]);
-}
+/* Free buffers, which we couldn't transmit. */
+rte_pktmbuf_free_bulk([nb_tx], cnt - nb_tx);
 }
 
 return cnt - nb_tx;
@@ -2768,9 +2763,7 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
 }
 
 pkts = (struct rte_mbuf **) batch->packets;
-for (int i = 0; i < vhost_batch_cnt; i++) {
-rte_pktmbuf_free(pkts[i]);
-}
+rte_pktmbuf_free_bulk(pkts, vhost_batch_cnt);
 
 return 0;
 }
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 1/2] netdev-dpdk: Add per virtqueue statistics.

2023-01-09 Thread Ilya Maximets
On 1/9/23 10:31, Maxime Coquelin wrote:
> 
> 
> On 1/6/23 16:58, David Marchand wrote:
>> The DPDK vhost-user library maintains more granular per queue stats
>> which can replace what OVS was providing for vhost-user ports.
>>
>> The benefits for OVS:
>> - OVS can skip parsing packet sizes on the rx side,
>> - dev->stats_lock won't be taken in rx/tx code unless some packet is
>>    dropped,
>> - vhost-user is aware of which packets are transmitted to the guest,
>>    so per *transmitted* packet size stats can be reported,
>> - more internal stats from vhost-user may be exposed, without OVS
>>    needing to understand them,
>>
>> Note: the vhost-user library does not provide global stats for a port.
>> The proposed implementation is to have the global stats (exposed via
>> netdev_get_stats()) computed by querying and aggregating all per queue
>> stats.
>> Since per queue stats are exposed via another netdev ops
>> (netdev_get_custom_stats()), this may lead to some race and small
>> discrepancies.
>> This issue might already affect other netdev classes.
>>
>> Example:
>> $ ovs-vsctl get interface vhost4 statistics |
>>    sed -e 's#[{}]##g' -e 's#, #\n#g' |
>>    grep -v =0$
>> rx_1_to_64_packets=12
>> rx_256_to_511_packets=15
>> rx_65_to_127_packets=21
>> rx_broadcast_packets=15
>> rx_bytes=7497
>> rx_multicast_packets=33
>> rx_packets=48
>> rx_q0_good_bytes=242
>> rx_q0_good_packets=3
>> rx_q0_guest_notifications=3
>> rx_q0_multicast_packets=3
>> rx_q0_size_65_127_packets=2
>> rx_q0_undersize_packets=1
>> rx_q1_broadcast_packets=15
>> rx_q1_good_bytes=7255
>> rx_q1_good_packets=45
>> rx_q1_guest_notifications=45
>> rx_q1_multicast_packets=30
>> rx_q1_size_256_511_packets=15
>> rx_q1_size_65_127_packets=19
>> rx_q1_undersize_packets=11
>> tx_1_to_64_packets=36
>> tx_256_to_511_packets=45
>> tx_65_to_127_packets=63
>> tx_broadcast_packets=45
>> tx_bytes=22491
>> tx_multicast_packets=99
>> tx_packets=144
>> tx_q0_broadcast_packets=30
>> tx_q0_good_bytes=14994
>> tx_q0_good_packets=96
>> tx_q0_guest_notifications=96
>> tx_q0_multicast_packets=66
>> tx_q0_size_256_511_packets=30
>> tx_q0_size_65_127_packets=42
>> tx_q0_undersize_packets=24
>> tx_q1_broadcast_packets=15
>> tx_q1_good_bytes=7497
>> tx_q1_good_packets=48
>> tx_q1_guest_notifications=48
>> tx_q1_multicast_packets=33
>> tx_q1_size_256_511_packets=15
>> tx_q1_size_65_127_packets=21
>> tx_q1_undersize_packets=12
>>
>> Reviewed-by: Maxime Coquelin 
>> Signed-off-by: David Marchand 
>> ---
>> Changes since v6:
>> - fixed tx_retries,
>> - dropped netdev_dpdk_vhost_update_[rt]x_counters helpers. This makes
>>    it clearer that a lock can be taken in the vhost rx/tx code and this
>>    aligns with the dpdk rx/tx code too,
>>
>> Changes since v5:
>> - added missing dev->stats_lock acquire in netdev_dpdk_vhost_get_stats,
>> - changed netdev_dpdk_vhost_update_[rt]x_counters to take dev->stats_lock
>>    only when some packets got dropped in OVS. Since the rx side won't
>>    take the lock unless some QoS configuration is in place, this change
>>    will likely have the same effect as separating stats_lock into rx/tx
>>    dedicated locks. Testing shows a slight (around 1%) improvement of
>>    performance for some V2V setup,
>>
>> Changes since v3:
>> - rebased to master now that v22.11 landed,
>> - fixed error code in stats helper when vhost port is not "running",
>> - shortened rx/tx stats macro names,
>>
>> Changes since RFC v2:
>> - dropped the experimental api check (now that the feature is marked
>>    stable in DPDK),
>> - moved netdev_dpdk_get_carrier() forward declaration next to the
>>    function needing it,
>> - used per q stats for netdev_get_stats() and removed OVS per packet
>>    size accounting logic,
>> - fixed small packets counter (see rx_undersized_errors hack),
>> - added more Tx stats,
>> - added unit tests,
>>
>> ---
>>   lib/netdev-dpdk.c    | 447 ++-
>>   tests/system-dpdk.at |  33 +++-
>>   2 files changed, 348 insertions(+), 132 deletions(-)
>>
> 
> My R-by was already set from previous revision, but just to confirm this
> new version is good to me:
> 
> Reviewed-by: Maxime Coquelin 


Thanks, Maxime and David!  Applied.

Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 0day-bot system is shutting down for system update

2023-01-09 Thread Michael Santana
Hi all,

We are shutting down 0day-bot for today for a system update. We will be
back online later by the end of the day EST.

Best
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/5] Documentation: Fix link to Netperf.

2023-01-09 Thread David Marchand
netperf.org was shut down in favor of some HP related resources.

Signed-off-by: David Marchand 
---
 Documentation/howto/qos.rst | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/howto/qos.rst b/Documentation/howto/qos.rst
index 376ec2514b..7d625e0019 100644
--- a/Documentation/howto/qos.rst
+++ b/Documentation/howto/qos.rst
@@ -59,10 +59,10 @@ is participating in an OVS bridge, no IP address can be 
assigned on `eth0`.
 
 The second host, named Measurement Host, can be any host capable of measuring
 throughput from a VM. For this guide, we use `netperf
-`__, a free tool for testing the rate at which one host
-can send to another. The Measurement Host has only a single NIC, `eth0`, which
-is connected to the Data Network. `eth0` has an IP address that can reach any
-VM on `host1`.
+`__, a free tool for testing the rate
+at which one host can send to another. The Measurement Host has only a single
+NIC, `eth0`, which is connected to the Data Network. `eth0` has an IP address
+that can reach any VM on `host1`.
 
 Two VMs
 ~~~
-- 
2.39.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 5/5] Documentation: Remove reference to RST online editor.

2023-01-09 Thread David Marchand
rst.ninjs.org is not available anymore, but there are alternatives
listed in this doc.

Signed-off-by: David Marchand 
---
 Documentation/internals/contributing/documentation-style.rst | 4 
 1 file changed, 4 deletions(-)

diff --git a/Documentation/internals/contributing/documentation-style.rst 
b/Documentation/internals/contributing/documentation-style.rst
index 045cdf6967..2eec4c4d26 100644
--- a/Documentation/internals/contributing/documentation-style.rst
+++ b/Documentation/internals/contributing/documentation-style.rst
@@ -423,10 +423,6 @@ Helpful Tools
 There are a number of tools, online and offline, which can be used to preview
 documents are you edit them:
 
-- `rst.ninjs.org `__
-
-  An online rST editor/previewer
-
 - `ReText `__
 
   A simple but powerful editor for Markdown and reStructuredText. ReText is
-- 
2.39.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/5] Documentation: Fix link to AppVeyor.

2023-01-09 Thread David Marchand
Sphinx linkcheck complains with:

Warning, treated as error:
.../Documentation/intro/install/windows.rst:1093:broken link:
www.appveyor.com ()

Add a https scheme in link to AppVeyor website.

Signed-off-by: David Marchand 
---
 Documentation/intro/install/windows.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/intro/install/windows.rst 
b/Documentation/intro/install/windows.rst
index 44fc6ae379..78f60f35ac 100644
--- a/Documentation/intro/install/windows.rst
+++ b/Documentation/intro/install/windows.rst
@@ -1090,9 +1090,9 @@ To stop and delete the services, run:
 Windows CI Service
 --
 
-`AppVeyor `__ provides a free Windows autobuild service for
-open source projects.  Open vSwitch has integration with AppVeyor for
-continuous build.  A developer can build test his changes for Windows by
+`AppVeyor `__ provides a free Windows autobuild
+service for open source projects.  Open vSwitch has integration with AppVeyor
+for continuous build.  A developer can build test his changes for Windows by
 logging into appveyor.com using a github account, creating a new project by
 linking it to his development repository in github and triggering a new build.
 
-- 
2.39.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 4/5] Documentation: Remove link to obsolete sources.

2023-01-09 Thread David Marchand
This archive website disappeared.
On the other hand, the link to an obsolete dpif-provider man page
probably did not provide much info and we can simply mention the current
file.

Signed-off-by: David Marchand 
---
 Documentation/topics/windows.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/topics/windows.rst b/Documentation/topics/windows.rst
index c5b34c85fb..7b10de43c1 100644
--- a/Documentation/topics/windows.rst
+++ b/Documentation/topics/windows.rst
@@ -190,8 +190,8 @@ the kernel datapath and was ported independently of the 
kernel datapath effort.
 
 As explained in the OVS porting design document [7]_, DPIF is the portion of
 userspace that interfaces with the kernel portion of the OVS. The interface
-that each DPIF provider has to implement is defined in ``dpif-provider.h``
-[3]_.  Though each platform is allowed to have its own implementation of the
+that each DPIF provider has to implement is defined in ``dpif-provider.h``.
+Though each platform is allowed to have its own implementation of the
 DPIF provider, it was found, via community feedback, that it is desired to
 share code whenever possible. Thus, the DPIF provider for OVS on Hyper-V shares
 code with the DPIF provider on Linux. This interface is implemented in
@@ -501,7 +501,6 @@ References
 
 .. [1] Hyper-V Extensible Switch 
https://msdn.microsoft.com/windows/hardware/drivers/network/hyper-v-extensible-switch
 .. [2] Hyper-V Extensible Switch Extensions 
https://msdn.microsoft.com/windows/hardware/drivers/network/hyper-v-extensible-switch-extensions
-.. [3] DPIF Provider 
http://openvswitch.sourcearchive.com/documentation/1.1.0-1/dpif-provider_8h_source.html
 .. [4] Hyper-V Extensible Switch Components 
https://msdn.microsoft.com/windows/hardware/drivers/network/hyper-v-extensible-switch-components
 .. [5] Windows Filtering Platform 
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366510(v=vs.85).aspx
 .. [6] IP Helper 
https://msdn.microsoft.com/windows/hardware/drivers/network/ip-helper
-- 
2.39.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/5] Documentation: Fix link to iproute2 git repository.

2023-01-09 Thread David Marchand
iproute2 git repositories were splitted and moved around v4.15 [1].
It is time to fix the link in OVS documentation.

1: https://lore.kernel.org/netdev/20180129082052.0eb85e9b@xeon-e3/

Signed-off-by: David Marchand 
---
 Documentation/topics/testing.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/topics/testing.rst b/Documentation/topics/testing.rst
index bc41b217a5..5f6940b84d 100644
--- a/Documentation/topics/testing.rst
+++ b/Documentation/topics/testing.rst
@@ -448,7 +448,7 @@ datapath testsuite.
   an updated iproute2 utilities package.  The package is available from
   the Linux kernel organization open source git repositories.
 
-  https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
+  https://git.kernel.org/pub/scm/network/iproute2/iproute2.git
 
 .. _testing-static-analysis:
 
-- 
2.39.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v9 0/8] Support 2+ controllers on the same vswitchd

2023-01-09 Thread Mark Michelson

Hi Ihar,

I gave the series another look (specifically at the changes made between 
v8 and v9).


Given the conversation after v8, I went ahead and added my Acks to any 
of the patches that did not already have them. I then pushed the series 
to main.


Thanks a bunch for your hard work and patience on this.

On 1/6/23 15:03, Ihar Hrachyshka wrote:

This series adds support to run multiple ovn-controller instances using
the same vswitchd instance. This may be used to reuse a single host
level vswitchd installation to run multiple CMS (e.g. k8s and
openstack), each having its own OVN stack running on a separate
integration bridge.

This setup may, in some instances, simplify administration of the
system, since the admin no longer needs to maintain separate vswitchd
installations (e.g. in separate containers). This is also helpful when
running different datapath types for the mixed setup.

v1: initial series
v2: change tunnel port naming scheme: include "chassis index" instead of
 its name for source chassis.
v2: formatting adjustments.
v3: fixed build due to ovs_abort missing arguments.
v3: added documentation to CLI and system-id-override file.
v3: added documentation for chassis specific db config options.
v3: documented the ability to run multiple controllers on the same host,
 while mentioning that this support is highly experimental.
v3: updated NEWS file to include the note about the new experimental
 issue.
v3: rebased.
v4: fixed a memory leak in get_chassis_idx.
v5: actually fix the leak...
v6: fix race condition in new test cases where ssl db configuration was
 removed before ovn-controller has a chance to read it from db,
 making it fail to start and process ports
v7: addresses Mark's comments from v6, specifically:
 - 1/7: Clean up allocated index on exit
 - 1/7: Remove hardcoded 16 with sizeof(CHASSIS_IDX_PREFIX) - 1
 - 1/7: Explain in comments why the first chassis uses an empty
   string index and not “0”
 - 2/7: Document that requested-chassis should use unique chassis
   names, not hostnames
 - 2/7: Document that unique own-bridges should be used for multiple
   co-hosted controllers
 - 2/7: Reworked logic for get_chassis_external_value functions to
   avoid duplication with smap_get_ functions
 - 2/7: Use wait_column in tests
 - 3/7: Add information about the directory where system-id-override
   file is located
 - 3/7: Use FILE * functions from stdio.h instead of Unix lower level
   equivalents
 - 3/7: Use wait_column in tests
 - 4/7: Also validate that CLI takes precedence over override file,
   not only db
 - 4/7: Use wait_column in tests
 - 5/7: - (no changes)
 - 6/7: Move start_virtual_controller to ovn-macros.at to avoid code
   duplication
 - 6/7: Added comments explaining why test cases running two
   controllers use the same sandbox name for both
 - 7/7: - (no changes)
v8:
 - 1/7: fixed test case for ovn-chassis-idx population
 - 1/7: fixed SIGSEGV errors in chassis_cleanup for chassis index
v9:
 - 5/8: fixed tunnel ports cleanup test (it wasn't catching
regressions)
 - 6/8: include "Don't delete patch ports that don't belong to
br-int" patch similar to the one for tunnel ports
 - 8/8: explicitly mention in the documentation that stateful ACLs
won't work as-is with the new feature.

Ihar Hrachyshka (8):
   Include "chassis index" into tunnel port name
   Support ovn-...- specific global ovsdb options
   Allow to override system-id via file
   Support passing chassis name via CLI
   Don't touch tunnel ports from a different br-int
   Don't delete patch ports that don't belong to br-int
   Add connectivity test for 2 controllers on the same host
   Document experimental support for co-hosted controllers

  NEWS|   2 +
  controller/chassis.c| 280 -
  controller/chassis.h|  15 +-
  controller/encaps.c |  76 +++
  controller/encaps.h |   1 -
  controller/ovn-controller.8.xml |  36 +++-
  controller/ovn-controller.c | 186 -
  controller/patch.c  |  42 +++-
  controller/patch.h  |   2 +-
  controller/physical.c   |   2 +-
  lib/ovn-util.c  |  79 +++
  lib/ovn-util.h  |  26 +++
  ovn-nb.xml  |   8 +
  tests/automake.mk   |   1 +
  tests/ovn-macros.at |  36 +++-
  tests/ovn.at| 360 
  tests/ovs-macros.at |   2 +
  17 files changed, 992 insertions(+), 162 deletions(-)



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] cirrus: Update to use FreeBSD 12.4.

2023-01-09 Thread Aaron Conole
Ilya Maximets  writes:

> 12.4 was released in December.  That means that 12.3
> will become unavailable in a near future.  Updating.
>
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/2] dpif-netdev: Add load based PMD sleeping.

2023-01-09 Thread Robin Jarry
Kevin Traynor, Jan 06, 2023 at 15:59:
> Sleep for an incremental amount of time if none of the Rx queues
> assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
> on an polling iteration of the PMD.
>
> Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
> sleep time to zero (i.e. no sleep).
>
> Sleep time will be increased on each iteration where the low load
> conditions remain up to a total of the max sleep time which is set
> by the user e.g:
> ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500
>
> The default pmd-maxsleep value is 0, which means that no sleeps
> will occur and the default behaviour is unchanged from previously.
>
> Also add new stats to pmd-perf-show to get visibility of operation
> e.g.
> ...
>- sleep iterations:   153994  ( 76.8 % of iterations)
>Sleep time:   9159399  us ( 46 us/iteration avg.)
> ...
>
> Signed-off-by: Kevin Traynor 

Hi Kevin,

For the record, here are a few numbers that were gathered on a HP DL360
Gen9 server (Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz) with and without
this patch series applied.

Single socket, Physical to physical test, 2 cores in pmd-cpu-mask, power
measurement with pcm-power:

++++--+-+
|| Reference: | Powersave: | pmd-maxsleep | Power off   |
|| disabled   || 500us| unused cores|
|| c-states   | C6 enabled | C6 enabled   | (X remaining)   |
++++--+-+
| No OvS | 33 W   | 11.30W | N/A  | 2 cores online  |
||||  | All OFF: 11.30W |
++++--+-+
| No traffic | 37W| 26.5W  | 12W  | 12W |
| 0 PPS  |||  | |
++++--+-+
| Idle   | 37W| 26.5W  | 12W  | 12W |
| 1k pps |||  | |
++++--+-+
| Medium | 37W| 27W| 15-20W   | 15-20W  |
| 1 Mpps |||  | |
++++--+-+
| High   | 38W| 28W| 28W  | 28W |
| 14 Mpps|||  | |
++++--+-+

> diff --git a/Documentation/topics/dpdk/pmd.rst 
> b/Documentation/topics/dpdk/pmd.rst
> index 9006fd40f..89f6b3052 100644
> --- a/Documentation/topics/dpdk/pmd.rst
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -325,4 +325,55 @@ reassignment due to PMD Auto Load Balance. For example, 
> this could be set
>  (in min) such that a reassignment is triggered at most every few hours.
>  
> +PMD Power Saving (Experimental)
> +---
> +
> +PMD threads constantly poll Rx queues which are assigned to them. In order to
> +reduce the CPU cycles they use, they can sleep for small periods of time
> +when there is no load or very-low load on all the Rx queues they poll.
> +
> +This can be enabled by setting the max requested sleep time (in microseconds)
> +for a PMD thread::
> +
> +$ ovs-vsctl set open_vswitch . other_config:pmd-maxsleep=500
> +
> +Non-zero values will be rounded up to the nearest 10 microseconds to avoid
> +requesting very small sleep times.
> +
> +With a non-zero max value a PMD may request to sleep by an incrementing 
> amount
> +of time up to the maximum time. If at any point the threshold of at least 
> half
> +a batch of packets (i.e. 16) is received from an Rx queue that the PMD is
> +polling is met, the requested sleep time will be reset to 0. At that point no
> +sleeps will occur until the no/low load conditions return.
> +
> +Sleeping in a PMD thread will mean there is a period of time when the PMD
> +thread will not process packets. Sleep times requested are not guaranteed
> +and can differ significantly depending on system configuration. The actual
> +time not processing packets will be determined by the sleep and processor
> +wake-up times and should be tested with each system configuration.
> +
> +Sleep time statistics for 10 secs can be seen with::
> +
> +$ ovs-appctl dpif-netdev/pmd-stats-clear \
> +&& sleep 10 && ovs-appctl dpif-netdev/pmd-perf-show
> +
> +Example output, showing that during the last 10 seconds, 76.8% of iterations
> +had a sleep of some length. The total amount of sleep time was 9.15 seconds 
> and
> +the average sleep time per iteration was 46 microseconds::
> +
> +   - sleep iterations:   153994  ( 76.8 % of iterations)
> +   Sleep time:   9159399  us ( 46 

Re: [ovs-dev] [PATCH ovn] northd: Drop packets destined to router owned NAT IP for DGP.

2023-01-09 Thread Mark Michelson
Hi Han, thanks for this change and thanks for putting in the work to 
debug the issue and to explain how the problem manifests.


I only have a couple of small comments below.

On 12/19/22 19:05, Han Zhou wrote:

When a packet enters LR pipeline from a distributed gateway port with
destination IP being a SNAT IP, it goes through the unSNAT stage and
it is possible that the unSNAT fails to convert the dst IP when no
conntrack entries are accociated with the packet. In this case, the
packet is rerouted to the same DGP, and results in recirc loop in
datapath. The packet would finally be dropped either due to ttl or
the recirc limit, but it would have created unnecessary cost.

To reproduce the problem, simply configure SNAT on a LR with the SNAT IP
being the DGP's IP, and then send a packet from external (DGP's LS) to
the SNAT IP. Kernel logs like below will be seen:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

DP flow dump would also show plenty of flows related to this packet,
each with a different ttl match, indicating the packet has been looped
many times.

Commit 802f9275ac (ovn-northd: Drop IP packets destined to router owned
IPs (after NAT)) already added flows to drop packets failed unSNAT for
Gateway Routers. It added flows with a low priority (2) to drop the
packets that fail ARP resolve, to avoid triggering ARP request for the
SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve
flows are added for thoses NAT IPs so that the packets can continue the
pipeline and possibly redirect to redirect chassis. So, because of these
ARP resolve flows, even the packets failed unSNAT would continue the
pipeline and won't hit the low priority (2) flows, thus not get dropped.

To fix the problem, for each of the ARP resolve flow added for the DGP
NAT IPs, a higher priority (150) flow is added to check if the packet's
inport is the DGP (same as the outport), then drop the packet directly.

Test cases are updated to cover both Gateway Router and DGP scenarios,
with packets from both directions (uplink and downlink).

Reported-by: Krzysztof Klimonda 
Reported-at: 
https://patchwork.ozlabs.org/project/ovn/patch/20210816085206.69170-1-kklimo...@syntaxhighlighted.com/
Reported-by: Frode Nordahl 
Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718
Reported-by: Roberto Bartzen Acosta 
Reported-at: https://github.com/ovn-org/ovn/issues/134
Reported-by: Syed Ammad Ali 
Reported-at: https://github.com/ovn-org/ovn/issues/153
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html
Signed-off-by: Han Zhou 
---
  northd/northd.c | 17 
  northd/ovn-northd.8.xml | 29 +
  tests/ovn.at| 92 ++---
  3 files changed, 116 insertions(+), 22 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 4751feab4f3c..ebd153b020fa 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -14341,6 +14341,23 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od, struct hmap *lflows,
  sset_add(_entries, nat->external_ip);
  } else {
  if (!sset_contains(_entries, nat->external_ip)) {
+/* Drop packets coming in from external but still has
+ * destination IP equals to the NAT external IP, to avoid loop.
+ * The packets must have went through DNAT/unSNAT stage but
+ * failed to convert the destination. */
+ds_clear(match);
+ds_put_format(
+match, "inport == %s && outport == %s && ip%s.dst == %s",
+l3dgw_port->json_key, l3dgw_port->json_key,
+is_v6 ? "6" : "4", nat->external_ip);
+ovn_lflow_add_with_hint(lflows, od,
+S_ROUTER_IN_ARP_RESOLVE,
+150, ds_cstr(match),
+"drop;",


With Adrian Moreno's series to sample all packet drops, we should not be 
using a bare "drop;" action in ovn-northd any more. Instead, call the 
debug_drop_action() function to fill in the proper drop action.



+>header_);
+/* Now for packets coming from other (downlink) LRPs, allow ARP
+ * resolve for the NAT IP, so that such packets can be
+ * forwarded for E/W NAT. */
  ds_clear(match);
  ds_put_format(
  match, "outport == %s && %s == %s",
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 25a742c904c6..e06f49b26ea4 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -4262,13 +4262,17 @@ outport = P
For each row in the NAT table with IPv4 address
A in the  column of
-   table, a priority-100
-  flow with the match outport === P 
-  reg0 == A has 

Re: [ovs-dev] [PATCH] Documentation: Fix links in the DPDK guide on physical ports.

2023-01-09 Thread Ilya Maximets
On 1/9/23 16:41, David Marchand wrote:
> On Fri, Jan 6, 2023 at 6:31 PM Ilya Maximets  wrote:
>>
>> On 1/6/23 08:59, David Marchand wrote:
>>> On Thu, Jan 5, 2023 at 8:34 PM Ilya Maximets  wrote:

 The text enclosed in '<...>' supposed to be an actual link and not the
 name of the link.  This generates incorrect links that lead nowhere.
>>>
>>> Indeed.
>>> I was wondering if you have access to openvswitch.org http logs, to
>>> catch other dead links, perhaps?
>>
>> I found these by running 'make docs-check' locally.  The result might
>> depend on the sphinx version though as I do not remember seeing these
>> errors in the past.  I have Sphinx 4.4.0 right now.
>>
>> There is also 'make check-docs' (horrible naming :D ) that invokes the
>> full link check and it can detect broken links, redirects or unavailable
> 
> Wah... nicely hidden check :-).
> 
>> resources as it actually queries these links.  For example, we currently
>> have a broken link to iproute2 git in the docs...
> 
> Can't we fix as:
>  -  https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
> +  https://git.kernel.org/pub/scm/network/iproute2/iproute2.git
> 
> ?

Feel free to submit a patch. :)

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Documentation: Fix links in the DPDK guide on physical ports.

2023-01-09 Thread David Marchand
On Fri, Jan 6, 2023 at 6:31 PM Ilya Maximets  wrote:
>
> On 1/6/23 08:59, David Marchand wrote:
> > On Thu, Jan 5, 2023 at 8:34 PM Ilya Maximets  wrote:
> >>
> >> The text enclosed in '<...>' supposed to be an actual link and not the
> >> name of the link.  This generates incorrect links that lead nowhere.
> >
> > Indeed.
> > I was wondering if you have access to openvswitch.org http logs, to
> > catch other dead links, perhaps?
>
> I found these by running 'make docs-check' locally.  The result might
> depend on the sphinx version though as I do not remember seeing these
> errors in the past.  I have Sphinx 4.4.0 right now.
>
> There is also 'make check-docs' (horrible naming :D ) that invokes the
> full link check and it can detect broken links, redirects or unavailable

Wah... nicely hidden check :-).

> resources as it actually queries these links.  For example, we currently
> have a broken link to iproute2 git in the docs...

Can't we fix as:
 -  https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
+  https://git.kernel.org/pub/scm/network/iproute2/iproute2.git

?

-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/2] dpif-netdev: Add load based PMD sleeping.

2023-01-09 Thread David Marchand
On Fri, Jan 6, 2023 at 4:00 PM Kevin Traynor  wrote:
>
> Sleep for an incremental amount of time if none of the Rx queues
> assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
> on an polling iteration of the PMD.
>
> Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
> sleep time to zero (i.e. no sleep).
>
> Sleep time will be increased on each iteration where the low load
> conditions remain up to a total of the max sleep time which is set
> by the user e.g:
> ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500
>
> The default pmd-maxsleep value is 0, which means that no sleeps
> will occur and the default behaviour is unchanged from previously.
>
> Also add new stats to pmd-perf-show to get visibility of operation
> e.g.
> ...
>- sleep iterations:   153994  ( 76.8 % of iterations)
>Sleep time:   9159399  us ( 46 us/iteration avg.)
> ...
>
> Signed-off-by: Kevin Traynor 
> ---
>  Documentation/topics/dpdk/pmd.rst | 51 
>  NEWS  |  3 ++
>  lib/dpif-netdev-perf.c| 24 +---
>  lib/dpif-netdev-perf.h|  5 ++-
>  lib/dpif-netdev.c | 64 +--
>  tests/pmd.at  | 46 ++
>  vswitchd/vswitch.xml  | 26 +
>  7 files changed, 209 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/pmd.rst 
> b/Documentation/topics/dpdk/pmd.rst
> index 9006fd40f..89f6b3052 100644
> --- a/Documentation/topics/dpdk/pmd.rst
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -325,4 +325,55 @@ reassignment due to PMD Auto Load Balance. For example, 
> this could be set
>  (in min) such that a reassignment is triggered at most every few hours.
>
> +PMD Power Saving (Experimental)
> +---

I would stick to: "PMD load based sleeping"
The powersaving comes from some external configuration that this patch
does not cover.

Maybe you could mention something about c-states, but it seems out of
OVS scope itself.


> +
> +PMD threads constantly poll Rx queues which are assigned to them. In order to
> +reduce the CPU cycles they use, they can sleep for small periods of time
> +when there is no load or very-low load on all the Rx queues they poll.
> +
> +This can be enabled by setting the max requested sleep time (in microseconds)
> +for a PMD thread::
> +
> +$ ovs-vsctl set open_vswitch . other_config:pmd-maxsleep=500
> +
> +Non-zero values will be rounded up to the nearest 10 microseconds to avoid
> +requesting very small sleep times.
> +
> +With a non-zero max value a PMD may request to sleep by an incrementing 
> amount
> +of time up to the maximum time. If at any point the threshold of at least 
> half
> +a batch of packets (i.e. 16) is received from an Rx queue that the PMD is
> +polling is met, the requested sleep time will be reset to 0. At that point no
> +sleeps will occur until the no/low load conditions return.
> +
> +Sleeping in a PMD thread will mean there is a period of time when the PMD
> +thread will not process packets. Sleep times requested are not guaranteed
> +and can differ significantly depending on system configuration. The actual
> +time not processing packets will be determined by the sleep and processor
> +wake-up times and should be tested with each system configuration.
> +
> +Sleep time statistics for 10 secs can be seen with::
> +
> +$ ovs-appctl dpif-netdev/pmd-stats-clear \
> +&& sleep 10 && ovs-appctl dpif-netdev/pmd-perf-show
> +
> +Example output, showing that during the last 10 seconds, 76.8% of iterations
> +had a sleep of some length. The total amount of sleep time was 9.15 seconds 
> and
> +the average sleep time per iteration was 46 microseconds::
> +
> +   - sleep iterations:   153994  ( 76.8 % of iterations)
> +   Sleep time:   9159399  us ( 46 us/iteration avg.)
> +
> +.. note::
> +
> +If there is a sudden spike of packets while the PMD thread is sleeping 
> and
> +the processor is in a low-power state it may result in some lost packets 
> or
> +extra latency before the PMD thread returns to processing packets at full
> +rate.
> +
> +.. note::
> +
> +Default Linux kernel hrtimer resolution is set to 50 microseconds so this
> +will add overhead to requested sleep time.
> +
>  .. _ovs-vswitchd(8):
>  http://openvswitch.org/support/dist-docs/ovs-vswitchd.8.html
> diff --git a/NEWS b/NEWS
> index 2f6ededfe..54d97825e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -31,4 +31,7 @@ Post-v3.0.0
>   * Add '-secs' argument to appctl 'dpif-netdev/pmd-rxq-show' to show
> the pmd usage of an Rx queue over a configurable time period.
> + * Add new experiemental PMD load based sleeping feature. PMD threads can

*experimental


> +   request to sleep up to a user configured 'pmd-maxsleep' value under no
> +   and low load conditions.
>
>
> diff --git 

[ovs-dev] [PATCH ovn v2] northd: Add flag for CT related

2023-01-09 Thread Ales Musil
In order to be backward compatible add feature flag
that ensures that the CT related flows are skipped
if needed.

Signed-off-by: Ales Musil 
---
v2: Fix the wrong * position.
---
 controller/chassis.c   |   7 +++
 include/ovn/features.h |   1 +
 northd/northd.c|  51 +--
 northd/northd.h|   1 +
 tests/ovn-northd.at| 109 +++--
 5 files changed, 149 insertions(+), 20 deletions(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index 685d9b2ae..98f8da2be 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -352,6 +352,7 @@ chassis_build_other_config(const struct ovs_chassis_cfg 
*ovs_cfg,
 smap_replace(config, OVN_FEATURE_PORT_UP_NOTIF, "true");
 smap_replace(config, OVN_FEATURE_CT_NO_MASKED_LABEL, "true");
 smap_replace(config, OVN_FEATURE_MAC_BINDING_TIMESTAMP, "true");
+smap_replace(config, OVN_FEATURE_CT_LB_RELATED, "true");
 }
 
 /*
@@ -469,6 +470,12 @@ chassis_other_config_changed(const struct ovs_chassis_cfg 
*ovs_cfg,
 return true;
 }
 
+if (!smap_get_bool(_rec->other_config,
+   OVN_FEATURE_CT_LB_RELATED,
+   false)) {
+return true;
+}
+
 return false;
 }
 
diff --git a/include/ovn/features.h b/include/ovn/features.h
index 679f67457..5bcd68739 100644
--- a/include/ovn/features.h
+++ b/include/ovn/features.h
@@ -24,6 +24,7 @@
 #define OVN_FEATURE_PORT_UP_NOTIF  "port-up-notif"
 #define OVN_FEATURE_CT_NO_MASKED_LABEL "ct-no-masked-label"
 #define OVN_FEATURE_MAC_BINDING_TIMESTAMP "mac-binding-timestamp"
+#define OVN_FEATURE_CT_LB_RELATED "ovn-ct-lb-related"
 
 /* OVS datapath supported features.  Based on availability OVN might generate
  * different types of openflows.
diff --git a/northd/northd.c b/northd/northd.c
index 4751feab4..72f7118a8 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -447,6 +447,15 @@ build_chassis_features(const struct northd_input 
*input_data,
 chassis_features->mac_binding_timestamp) {
 chassis_features->mac_binding_timestamp = false;
 }
+
+bool ct_lb_related =
+smap_get_bool(>other_config,
+  OVN_FEATURE_CT_LB_RELATED,
+  false);
+if (!ct_lb_related &&
+chassis_features->ct_lb_related) {
+chassis_features->ct_lb_related = false;
+}
 }
 }
 
@@ -6786,14 +6795,17 @@ build_acls(struct ovn_datapath *od, const struct 
chassis_features *features,
  * a dynamically negotiated FTP data channel), but will allow
  * related traffic such as an ICMP Port Unreachable through
  * that's generated from a non-listening UDP port.  */
+const char *ct_acl_action = features->ct_lb_related
+? "ct_commit_nat;"
+: "next;";
 ds_clear();
 ds_put_format(, "!ct.est && ct.rel && !ct.new%s && %s == 0",
   use_ct_inv_match ? " && !ct.inv" : "",
   ct_blocked_match);
 ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX - 3,
-  ds_cstr(), "ct_commit_nat;");
+  ds_cstr(), ct_acl_action);
 ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX - 3,
-  ds_cstr(), "ct_commit_nat;");
+  ds_cstr(), ct_acl_action);
 
 /* Ingress and Egress ACL Table (Priority 65532).
  *
@@ -10484,9 +10496,11 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
*lb_vip,
struct hmap *lflows,
struct ds *match, struct ds *action,
const struct shash *meter_groups,
-   bool ct_lb_mark)
+   const struct chassis_features *features)
 {
-const char *ct_natted = ct_lb_mark ? "ct_mark.natted" : "ct_label.natted";
+const char *ct_natted = features->ct_no_masked_label
+? "ct_mark.natted"
+: "ct_label.natted";
 char *skip_snat_new_action = NULL;
 char *skip_snat_est_action = NULL;
 char *new_match;
@@ -10497,7 +10511,7 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
*lb_vip,
 
 bool reject = build_lb_vip_actions(lb_vip, vips_nb, action,
lb->selection_fields, false,
-   ct_lb_mark);
+   features->ct_no_masked_label);
 bool drop = !!strncmp(ds_cstr(action), "ct_lb", strlen("ct_lb"));
 if (!drop) {
 /* Remove the trailing ");". */
@@ -10519,9 +10533,11 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
*lb_vip,
 }
 
 if (lb->skip_snat) {
+const char *skip_snat = features->ct_lb_related && !drop
+? "; skip_snat);"
+

Re: [ovs-dev] [PATCH v3 1/2] util: Add non quiesce xnanosleep.

2023-01-09 Thread David Marchand
On Fri, Jan 6, 2023 at 3:59 PM Kevin Traynor  wrote:
>
> xnanosleep forces the thread into quiesce state in anticipation that
> it will be sleeping for a considerable time and that the thread may
> need to quiesce before the sleep is finished.
>
> In some cases, a very short sleep may be requested and in that case
> the overhead of going to into quiesce state may be unnecessary.
>
> To allow for those cases add a xnanosleep_no_quiesce() variant.
>
> Suggested-by: Ilya Maximets 
> Signed-off-by: Kevin Traynor 
> ---
>  lib/util.c | 19 +++
>  lib/util.h |  1 +
>  2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/lib/util.c b/lib/util.c
> index 1195c7982..0daf06e8f 100644
> --- a/lib/util.c
> +++ b/lib/util.c
> @@ -2372,9 +2372,7 @@ xsleep(unsigned int seconds)
>  }
>
> -/* High resolution sleep. */
> -void
> -xnanosleep(uint64_t nanoseconds)
> +static void
> +__xnanosleep(uint64_t nanoseconds)
>  {

According to coding style:
"Do not use names that begin with _. If you need a name for “internal
use only”, use __ as a suffix instead of a prefix."

So it should be xnanosleep__.


> -ovsrcu_quiesce_start();
>  #ifndef _WIN32
>  int retval;
> @@ -2404,7 +2402,20 @@ xnanosleep(uint64_t nanoseconds)
>  }
>  #endif
> +}
> +
> +/* High resolution sleep with thread quiesce. */
> +void
> +xnanosleep(uint64_t nanoseconds) {

And I think { should be on next line.



> +ovsrcu_quiesce_start();
> +__xnanosleep(nanoseconds);
>  ovsrcu_quiesce_end();
>  }
>
> +/* High resolution sleep without thread quiesce. */
> +void
> +xnanosleep_no_quiesce(uint64_t nanoseconds) {

Idem.


> +__xnanosleep(nanoseconds);
> +}
> +
>  /* Determine whether standard output is a tty or not. This is useful to 
> decide
>   * whether to use color output or not when --color option for utilities is 
> set
> diff --git a/lib/util.h b/lib/util.h
> index 9ff84b3dc..f35f33021 100644
> --- a/lib/util.h
> +++ b/lib/util.h
> @@ -594,4 +594,5 @@ ovs_u128_is_superset(ovs_u128 super, ovs_u128 sub)
>  void xsleep(unsigned int seconds);
>  void xnanosleep(uint64_t nanoseconds);
> +void xnanosleep_no_quiesce(uint64_t nanoseconds);
>
>  bool is_stdout_a_tty(void);

With this fixed,
Reviewed-by: David Marchand 


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH branch-2.17] tc: Add support for TCA_STATS_PKT64

2023-01-09 Thread Ilya Maximets
On 1/9/23 09:27, Eelco Chaudron wrote:
> 
> 
> On 6 Jan 2023, at 22:15, Mike Pattrick wrote:
> 
>> Currently tc offload flow packet counters will roll over every ~4
>> billion packets. This is because the packet counter in struct
>> tc_stats provided by TCA_STATS_BASIC is a 32bit integer.
>>
>> Now we check for the optional TCA_STATS_PKT64 attribute which provides
>> the full 64bit packet counter if the 32bit one has rolled over. Because
>> the TCA_STATS_PKT64 attribute may appear multiple times in a netlink
>> message, the method of parsing attributes was changed.
>>
>> Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1776816
>> Signed-off-by: Mike Pattrick 
> 
> Changes look good to me! Thanks for back porting to 2.17.
> 
> Acked-by: Eelco Chaudron 

Applied.  Thanks!

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] feature: Add flag for CT related

2023-01-09 Thread 0-day Robot
Bleep bloop.  Greetings Ales Musil, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Inappropriate spacing in pointer declaration
WARNING: Line lacks whitespace around operator
#123 FILE: northd/northd.c:10536:
const char* skip_snat = features->ct_lb_related && !drop

ERROR: Inappropriate spacing in pointer declaration
WARNING: Line lacks whitespace around operator
#137 FILE: northd/northd.c:10673:
const char* force_snat = features->ct_lb_related && !drop

Lines checked: 372, Warnings: 2, Errors: 2


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] feature: Add flag for CT related

2023-01-09 Thread Ales Musil
In order to be backward compatible add feature flag
that ensures that the CT related flows are skipped
if needed.

Signed-off-by: Ales Musil 
---
 controller/chassis.c   |   7 +++
 include/ovn/features.h |   1 +
 northd/northd.c|  51 +--
 northd/northd.h|   1 +
 tests/ovn-northd.at| 109 +++--
 5 files changed, 149 insertions(+), 20 deletions(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index 685d9b2ae..98f8da2be 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -352,6 +352,7 @@ chassis_build_other_config(const struct ovs_chassis_cfg 
*ovs_cfg,
 smap_replace(config, OVN_FEATURE_PORT_UP_NOTIF, "true");
 smap_replace(config, OVN_FEATURE_CT_NO_MASKED_LABEL, "true");
 smap_replace(config, OVN_FEATURE_MAC_BINDING_TIMESTAMP, "true");
+smap_replace(config, OVN_FEATURE_CT_LB_RELATED, "true");
 }
 
 /*
@@ -469,6 +470,12 @@ chassis_other_config_changed(const struct ovs_chassis_cfg 
*ovs_cfg,
 return true;
 }
 
+if (!smap_get_bool(_rec->other_config,
+   OVN_FEATURE_CT_LB_RELATED,
+   false)) {
+return true;
+}
+
 return false;
 }
 
diff --git a/include/ovn/features.h b/include/ovn/features.h
index 679f67457..5bcd68739 100644
--- a/include/ovn/features.h
+++ b/include/ovn/features.h
@@ -24,6 +24,7 @@
 #define OVN_FEATURE_PORT_UP_NOTIF  "port-up-notif"
 #define OVN_FEATURE_CT_NO_MASKED_LABEL "ct-no-masked-label"
 #define OVN_FEATURE_MAC_BINDING_TIMESTAMP "mac-binding-timestamp"
+#define OVN_FEATURE_CT_LB_RELATED "ovn-ct-lb-related"
 
 /* OVS datapath supported features.  Based on availability OVN might generate
  * different types of openflows.
diff --git a/northd/northd.c b/northd/northd.c
index 4751feab4..289d292e4 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -447,6 +447,15 @@ build_chassis_features(const struct northd_input 
*input_data,
 chassis_features->mac_binding_timestamp) {
 chassis_features->mac_binding_timestamp = false;
 }
+
+bool ct_lb_related =
+smap_get_bool(>other_config,
+  OVN_FEATURE_CT_LB_RELATED,
+  false);
+if (!ct_lb_related &&
+chassis_features->ct_lb_related) {
+chassis_features->ct_lb_related = false;
+}
 }
 }
 
@@ -6786,14 +6795,17 @@ build_acls(struct ovn_datapath *od, const struct 
chassis_features *features,
  * a dynamically negotiated FTP data channel), but will allow
  * related traffic such as an ICMP Port Unreachable through
  * that's generated from a non-listening UDP port.  */
+const char *ct_acl_action = features->ct_lb_related
+? "ct_commit_nat;"
+: "next;";
 ds_clear();
 ds_put_format(, "!ct.est && ct.rel && !ct.new%s && %s == 0",
   use_ct_inv_match ? " && !ct.inv" : "",
   ct_blocked_match);
 ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX - 3,
-  ds_cstr(), "ct_commit_nat;");
+  ds_cstr(), ct_acl_action);
 ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX - 3,
-  ds_cstr(), "ct_commit_nat;");
+  ds_cstr(), ct_acl_action);
 
 /* Ingress and Egress ACL Table (Priority 65532).
  *
@@ -10484,9 +10496,11 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
*lb_vip,
struct hmap *lflows,
struct ds *match, struct ds *action,
const struct shash *meter_groups,
-   bool ct_lb_mark)
+   const struct chassis_features *features)
 {
-const char *ct_natted = ct_lb_mark ? "ct_mark.natted" : "ct_label.natted";
+const char *ct_natted = features->ct_no_masked_label
+? "ct_mark.natted"
+: "ct_label.natted";
 char *skip_snat_new_action = NULL;
 char *skip_snat_est_action = NULL;
 char *new_match;
@@ -10497,7 +10511,7 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
*lb_vip,
 
 bool reject = build_lb_vip_actions(lb_vip, vips_nb, action,
lb->selection_fields, false,
-   ct_lb_mark);
+   features->ct_no_masked_label);
 bool drop = !!strncmp(ds_cstr(action), "ct_lb", strlen("ct_lb"));
 if (!drop) {
 /* Remove the trailing ");". */
@@ -10519,9 +10533,11 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
*lb_vip,
 }
 
 if (lb->skip_snat) {
+const char* skip_snat = features->ct_lb_related && !drop
+? "; skip_snat);"
+: "";
 

Re: [ovs-dev] [PATCH v7 2/2] netdev-dpdk: Drop coverage counter for vhost IRQs.

2023-01-09 Thread Maxime Coquelin




On 1/6/23 16:58, David Marchand wrote:

The vhost library now provides finegrained statistics for guest
notifications:
- notifications for buffer reclaim by the guest,
- notifications for buffer availability to the guest,

Example before this patch:
$ ovs-appctl coverage/show |
   grep vhost_notification
vhost_notification 0.0/sec 0.000/sec2.0283/sec   total: 7302

$ ovs-vsctl get interface vhost4 statistics |
   sed -e 's#[{}]##g' -e 's#, #\n#g' |
   grep guest_notifications
rx_q0_guest_notifications=66
tx_q0_guest_notifications=7236

Signed-off-by: David Marchand 
---
  lib/netdev-dpdk.c | 9 -
  1 file changed, 9 deletions(-)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/6] Add support for DPDK meter HW offload

2023-01-09 Thread Eli Britstein via dev


>-Original Message-
>From: Nole Zhang 
>Sent: Monday, 9 January 2023 11:23
>To: Eli Britstein ; d...@openvswitch.org
>Cc: Eelco Chaudron ; Ilya Maximets
>; Chaoyong He ; oss-
>drivers 
>Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
>
>External email: Use caution opening links or attachments
>
>
>> -Original Message-
>> From: Eli Britstein 
>> Sent: 2023年1月8日 15:27
>> To: Nole Zhang ; d...@openvswitch.org
>> Cc: Eelco Chaudron ; Ilya Maximets
>> ; Chaoyong He ; oss-
>> drivers 
>> Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
>>
>>
>>
>> >-Original Message-
>> >From: Nole Zhang 
>> >Sent: Friday, 6 January 2023 11:28
>> >To: Eli Britstein ; d...@openvswitch.org
>> >Cc: Eelco Chaudron ; Ilya Maximets
>> >; Chaoyong He ; oss-
>> >drivers ; Nole Zhang
>> >
>> >Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
>> >
>> >External email: Use caution opening links or attachments
>> >
>> >
>> >> -Original Message-
>> >> From: Eli Britstein 
>> >> Sent: 2022年12月26日 18:04
>> >> To: Simon Horman ;
>> d...@openvswitch.org
>> >> Cc: Eelco Chaudron ; Ilya Maximets
>> >> ; Chaoyong He ;
>oss-
>> >> drivers ; Nole Zhang
>> >> 
>> >> Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
>> >>
>> >> Dpif-netdev should not implement internal HW offload details. If
>> >> need to "apply on all ports", it needs to be done in offload layer.
>> >> However, in arch level, there is a problem with the proposed series.
>> >> It will create a meter object per port, while in SW it is one
>> >> object, that can be shared between multiple flows, on different ports.
>> >
>> >In dpif-netdev, it doesn't relate with implement internal HW offload
>> >details, I just try to add the meter to the PMD if the PMD support
>> >the meter
>> offload.
>> [Eli Britstein] your loops over ports, not over PMDs. See in [1], for
>> example in
>> dpif_netdev_offload_meter_set():
>> +HMAP_FOR_EACH (port, node, >ports) {
>> +dev = port->netdev;
>> Am I wrong?
>
>Thanks for your notice, yes, as ovs code, it will add the meter in different 
>port.
>
>As our design,  for different port, if the different port has the same PMD with
>the same meter id, it will just  add the meter successfully once in the dpdk 
>and
>it can achieve sharing the same NIC different vf.
>
>If I add the judge for the PMD, for different PMD, just add the meter once, do
>you think it is ok?
No. Even if you improve the code to create "once". Suppose the PF is port 0, 
and 2 VF representors are ports 1,2.
If the meter is created on port 1, using it with ports 0,2 is illegal from DPDK 
generic point of view. It might be supported depending on specific PMD support.
What will happen if port 1 is detached from OVS for example?
I think it has to be on the "proxy" port, see below.
>
>>
>> Other than that, there is already a convenient API to traverse ports
>> for offload - netdev_ports_traverse().
>
>Ok, thanks, I will investigate it.
>
>> >
>> >No, it will create a meter object per PMD not per port, so the meter
>> >can share the same NIC different vf,  different NIC can't share  the
>> >meter, it is same with ovs-tc meter offload
>> [Eli Britstein] no, it will create an object per port, as this is your code.
>> To create a shared object for all the VFs in the same NIC, need to use
>> the "proxy" port. Such work has started in [2].
>>
>> [1]
>>
>https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatch
>>
>work.ozlabs.org%2Fproject%2Fopenvswitch%2Fpatch%2F20221216155054.9&
>dat
>>
>a=05%7C01%7Celibr%40nvidia.com%7C6ecc84b885334be08f7d08daf2230e2a%
>7C43
>>
>083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638088529626823112%7CUn
>known%
>>
>7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwi
>LCJX
>>
>VCI6Mn0%3D%7C3000%7C%7C%7C=rcM52oDqpL%2F%2BewlX8ZRZSTL
>mJslYBK7SI
>> 4c9BX3ocY8%3D=0
>> 86464-3-simon.hor...@corigine.com/
>> [2]
>>
>https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatch
>>
>work.ozlabs.org%2Fproject%2Fopenvswitch%2Fpatch%2F20220720121823.2&
>dat
>>
>a=05%7C01%7Celibr%40nvidia.com%7C6ecc84b885334be08f7d08daf2230e2a%
>7C43
>>
>083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638088529626823112%7CUn
>known%
>>
>7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwi
>LCJX
>>
>VCI6Mn0%3D%7C3000%7C%7C%7C=%2B1ahxjeQoGyaQJHh9mKv91e5
>OH46qUc1VMb
>> 2wiPxOFI%3D=0
>> 497727-4-ivan.ma...@oktetlabs.ru/
>
>Ok, thanks for your notice, I will investigate it, but the commit seems cant be
>accepted.
It can't be accepted as it was incomplete and premature. I think that work must 
be completed and integrated before introducing meter offloads.

>>
>> >>
>> >> >-Original Message-
>> >> >From: Simon Horman 
>> >> >Sent: Friday, 16 December 2022 17:51
>> >> >To: d...@openvswitch.org
>> >> >Cc: Eelco Chaudron ; Ilya Maximets
>> >> >; Eli Britstein ; Chaoyong
>> >> >He ; oss-driv...@corigine.com; Peng
>> >> >Zhang ; Simon Horman
>> >> 
>> >> >Subject: [PATCH 

Re: [ovs-dev] [PATCH v7 1/2] netdev-dpdk: Add per virtqueue statistics.

2023-01-09 Thread Maxime Coquelin




On 1/6/23 16:58, David Marchand wrote:

The DPDK vhost-user library maintains more granular per queue stats
which can replace what OVS was providing for vhost-user ports.

The benefits for OVS:
- OVS can skip parsing packet sizes on the rx side,
- dev->stats_lock won't be taken in rx/tx code unless some packet is
   dropped,
- vhost-user is aware of which packets are transmitted to the guest,
   so per *transmitted* packet size stats can be reported,
- more internal stats from vhost-user may be exposed, without OVS
   needing to understand them,

Note: the vhost-user library does not provide global stats for a port.
The proposed implementation is to have the global stats (exposed via
netdev_get_stats()) computed by querying and aggregating all per queue
stats.
Since per queue stats are exposed via another netdev ops
(netdev_get_custom_stats()), this may lead to some race and small
discrepancies.
This issue might already affect other netdev classes.

Example:
$ ovs-vsctl get interface vhost4 statistics |
   sed -e 's#[{}]##g' -e 's#, #\n#g' |
   grep -v =0$
rx_1_to_64_packets=12
rx_256_to_511_packets=15
rx_65_to_127_packets=21
rx_broadcast_packets=15
rx_bytes=7497
rx_multicast_packets=33
rx_packets=48
rx_q0_good_bytes=242
rx_q0_good_packets=3
rx_q0_guest_notifications=3
rx_q0_multicast_packets=3
rx_q0_size_65_127_packets=2
rx_q0_undersize_packets=1
rx_q1_broadcast_packets=15
rx_q1_good_bytes=7255
rx_q1_good_packets=45
rx_q1_guest_notifications=45
rx_q1_multicast_packets=30
rx_q1_size_256_511_packets=15
rx_q1_size_65_127_packets=19
rx_q1_undersize_packets=11
tx_1_to_64_packets=36
tx_256_to_511_packets=45
tx_65_to_127_packets=63
tx_broadcast_packets=45
tx_bytes=22491
tx_multicast_packets=99
tx_packets=144
tx_q0_broadcast_packets=30
tx_q0_good_bytes=14994
tx_q0_good_packets=96
tx_q0_guest_notifications=96
tx_q0_multicast_packets=66
tx_q0_size_256_511_packets=30
tx_q0_size_65_127_packets=42
tx_q0_undersize_packets=24
tx_q1_broadcast_packets=15
tx_q1_good_bytes=7497
tx_q1_good_packets=48
tx_q1_guest_notifications=48
tx_q1_multicast_packets=33
tx_q1_size_256_511_packets=15
tx_q1_size_65_127_packets=21
tx_q1_undersize_packets=12

Reviewed-by: Maxime Coquelin 
Signed-off-by: David Marchand 
---
Changes since v6:
- fixed tx_retries,
- dropped netdev_dpdk_vhost_update_[rt]x_counters helpers. This makes
   it clearer that a lock can be taken in the vhost rx/tx code and this
   aligns with the dpdk rx/tx code too,

Changes since v5:
- added missing dev->stats_lock acquire in netdev_dpdk_vhost_get_stats,
- changed netdev_dpdk_vhost_update_[rt]x_counters to take dev->stats_lock
   only when some packets got dropped in OVS. Since the rx side won't
   take the lock unless some QoS configuration is in place, this change
   will likely have the same effect as separating stats_lock into rx/tx
   dedicated locks. Testing shows a slight (around 1%) improvement of
   performance for some V2V setup,

Changes since v3:
- rebased to master now that v22.11 landed,
- fixed error code in stats helper when vhost port is not "running",
- shortened rx/tx stats macro names,

Changes since RFC v2:
- dropped the experimental api check (now that the feature is marked
   stable in DPDK),
- moved netdev_dpdk_get_carrier() forward declaration next to the
   function needing it,
- used per q stats for netdev_get_stats() and removed OVS per packet
   size accounting logic,
- fixed small packets counter (see rx_undersized_errors hack),
- added more Tx stats,
- added unit tests,

---
  lib/netdev-dpdk.c| 447 ++-
  tests/system-dpdk.at |  33 +++-
  2 files changed, 348 insertions(+), 132 deletions(-)



My R-by was already set from previous revision, but just to confirm this
new version is good to me:

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/6] Add support for DPDK meter HW offload

2023-01-09 Thread Nole Zhang


> -Original Message-
> From: Eli Britstein 
> Sent: 2023年1月8日 15:27
> To: Nole Zhang ; d...@openvswitch.org
> Cc: Eelco Chaudron ; Ilya Maximets
> ; Chaoyong He ; oss-
> drivers 
> Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> 
> 
> 
> >-Original Message-
> >From: Nole Zhang 
> >Sent: Friday, 6 January 2023 11:28
> >To: Eli Britstein ; d...@openvswitch.org
> >Cc: Eelco Chaudron ; Ilya Maximets
> >; Chaoyong He ; oss-
> >drivers ; Nole Zhang
> >
> >Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> >
> >External email: Use caution opening links or attachments
> >
> >
> >> -Original Message-
> >> From: Eli Britstein 
> >> Sent: 2022年12月26日 18:04
> >> To: Simon Horman ;
> d...@openvswitch.org
> >> Cc: Eelco Chaudron ; Ilya Maximets
> >> ; Chaoyong He ; oss-
> >> drivers ; Nole Zhang
> >> 
> >> Subject: RE: [PATCH 0/6] Add support for DPDK meter HW offload
> >>
> >> Dpif-netdev should not implement internal HW offload details. If need
> >> to "apply on all ports", it needs to be done in offload layer.
> >> However, in arch level, there is a problem with the proposed series.
> >> It will create a meter object per port, while in SW it is one object,
> >> that can be shared between multiple flows, on different ports.
> >
> >In dpif-netdev, it doesn't relate with implement internal HW offload
> >details, I just try to add the meter to the PMD if the PMD support the meter
> offload.
> [Eli Britstein] your loops over ports, not over PMDs. See in [1], for example 
> in
> dpif_netdev_offload_meter_set():
> +HMAP_FOR_EACH (port, node, >ports) {
> +dev = port->netdev;
> Am I wrong?

Thanks for your notice, yes, as ovs code, it will add the meter in different 
port.

As our design,  for different port, if the different port has the same PMD with
the same meter id, it will just  add the meter successfully once in the dpdk and
it can achieve sharing the same NIC different vf.

If I add the judge for the PMD, for different PMD, just add the meter once, do
you think it is ok?

> 
> Other than that, there is already a convenient API to traverse ports for
> offload - netdev_ports_traverse().

Ok, thanks, I will investigate it.

> >
> >No, it will create a meter object per PMD not per port, so the meter
> >can share the same NIC different vf,  different NIC can't share  the
> >meter, it is same with ovs-tc meter offload
> [Eli Britstein] no, it will create an object per port, as this is your code.
> To create a shared object for all the VFs in the same NIC, need to use the
> "proxy" port. Such work has started in [2].
> 
> [1]
> http://patchwork.ozlabs.org/project/openvswitch/patch/20221216155054.9
> 86464-3-simon.hor...@corigine.com/
> [2]
> http://patchwork.ozlabs.org/project/openvswitch/patch/20220720121823.2
> 497727-4-ivan.ma...@oktetlabs.ru/

Ok, thanks for your notice, I will investigate it, but the commit seems cant be 
accepted.
> 
> >>
> >> >-Original Message-
> >> >From: Simon Horman 
> >> >Sent: Friday, 16 December 2022 17:51
> >> >To: d...@openvswitch.org
> >> >Cc: Eelco Chaudron ; Ilya Maximets
> >> >; Eli Britstein ; Chaoyong He
> >> >; oss-driv...@corigine.com; Peng Zhang
> >> >; Simon Horman
> >> 
> >> >Subject: [PATCH 0/6] Add support for DPDK meter HW offload
> >> >
> >> >External email: Use caution opening links or attachments
> >> >
> >> >
> >> >Hi,
> >> >
> >> >this series adds support for DPDK meter HW offload
> >> >
> >> >* Patch 1/6: Add netdev provider API for HW offload of DPDK meters
> >> >* Patch 2/6: Add DPIF API to offload OpenFlow meters to DPDK
> >> >* Patch 3/6: Implement netdev provider API for HW offload of DPDK
> >> >meters
> >> >* Patch 4/6: Add more DPDK meter algorithms
> >> >* Patch 4/6: Add support for meter action ti DPDK HW offload
> >> >* Patch 4/6: Add CI builds with ALLOW_EXPERIMENTAL_API
> >> >
> >> >Peng Zhang (6):
> >> >  netdev-offload: Add DPDK meter offload API
> >> >  dpif-netdev: Offloading meter with DPDK
> >> >  netdev-offload-dpdk: Implement meter offload API for DPDK
> >> >  netdev-dpdk: add meter algorithms
> >> >  netdev-dpdk-offload: Add support for meter action
> >> >  ci: add the opts about ALLOW_EXPERIMENTAL_API
> >> >
> >> > .ci/linux-build.sh   |   4 +
> >> > .github/workflows/build-and-test.yml |  31 
> >> > Documentation/howto/dpdk.rst |   5 +-
> >> > lib/dpif-netdev.c| 102 +++
> >> > lib/netdev-dpdk.c| 243 +++
> >> > lib/netdev-dpdk.h|  41 +
> >> > lib/netdev-offload-dpdk.c| 101 +++
> >> > lib/netdev-offload-provider.h|  30 
> >> > lib/netdev-offload.c |  59 +++
> >> > lib/netdev-offload.h |   9 +
> >> > 10 files changed, 623 insertions(+), 2 deletions(-)
> >> >
> >> >--
> >> >2.30.2

___
dev mailing list
d...@openvswitch.org

Re: [ovs-dev] [PATCH branch-2.17] tc: Add support for TCA_STATS_PKT64

2023-01-09 Thread Eelco Chaudron



On 6 Jan 2023, at 22:15, Mike Pattrick wrote:

> Currently tc offload flow packet counters will roll over every ~4
> billion packets. This is because the packet counter in struct
> tc_stats provided by TCA_STATS_BASIC is a 32bit integer.
>
> Now we check for the optional TCA_STATS_PKT64 attribute which provides
> the full 64bit packet counter if the 32bit one has rolled over. Because
> the TCA_STATS_PKT64 attribute may appear multiple times in a netlink
> message, the method of parsing attributes was changed.
>
> Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1776816
> Signed-off-by: Mike Pattrick 

Changes look good to me! Thanks for back porting to 2.17.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] [ovs-dev v4] dpctl: Add support to count upcall packets

2023-01-09 Thread wangchuanlei
On 6 Jan 2023, at 11:37, Ilya Maximets wrote:

> On 1/6/23 11:12, Eelco Chaudron wrote:
>>
>>
>> On 6 Jan 2023, at 10:49, wangchuanlei wrote:
>>
 On 1/5/23 16:40, Eelco Chaudron wrote:
>
>
> On 5 Jan 2023, at 2:52, wangchuanlei wrote:
>
>> Add support to count upall packets, when kmod of openvswitch 
>> upcall to count the number of packets for upcall succeed and 
>> failed, which is a better way to see how many packets upcalled on every 
>> interfaces.
>>
>> Signed-off-by: wangchuanlei >
>
> The code works, and I see no other problems, so I???m ok to ACK it.
>
> However, before I do, I do think we should get some feedback on how the 
> stats are displayed.
>
> This is what it looks like now:
>
>   port 0: my (internal)
> RX packets:10 errors:0 dropped:0 overruns:0 frame:0
> upcall success:1 upcall fail:0
> TX packets:0 errors:0 dropped:0 aborted:0 carrier:0
> collisions:0
> RX bytes:1230  TX bytes:0
>
>
> It???s a bit confusing with the space in the name. Should we maybe 
> separate upcall stats completely?
> Something like:
>
>   port 0: my (internal)
> RX packets:10 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 aborted:0 carrier:0
> collisions:0
> RX bytes:1230  TX bytes:0
> UPCALL packets:1 failed:0
>
> Also, note I used ???packets??? and ???failed??? to be more in line with 
> existing stats.

 I like the 'UPCALL ...' approach better.  We may even go further and have 
 'UPCALL packets:X dropped:Y', i.e. 'dropped' instead of 'failed'.  Not 
 sure.
>>>
>>> Yes, 'UPCALL ...' approach better, may be 'UPCALL packets:X errors:Y' is 
>>> better, after all, failed/dropped/errors means the kernel datapath stopped 
>>> upcall to userspce when it find errors in the info of packet.
>>
>> Ok, so let???s do the UPCALL, and I also think error would fit best.
>
> +1
> 'UPCALL packets:X errors:Y' indeed looks better.
>
> One other thought here is that 'success' and 'failure' are meaningful 
> from the kernel side of things, but a bit ambiguous from the OVS point 
> of view, because upcall may fail somewhere in the flow translation in 
> userspace and packet will be dropped, but the counter will say 'upcall 
> success', which is a bit misleading.  'errors' from that point of view 
> is a bit better term.  Not ideal, but better than other options.
>
>>
>
> And maybe not even display it when the feature is not supported?

 It should be OK to print these as long as we correctly print '?'
 on ports that do not support that counter.  Did you test that?
>>>
>>> I already tested it, if datapath didn't support the feature, it will print 
>>> '?'.
>>
>> ACK, did test this also.
>
> OK, good.  Thanks!
>
>>

> Anyone any thoughts!?

 This patch is missing the stats propagation to the database in the 
 iface_refresh_stats().  That should be added.
>>>
>>> Ok!
>>>

 OTOH, a point can be made that these stats should not be part of the main 
 netdev_stats and hence should not be reported along side of them at all.  
 It might be better to report them via 'custom stats' mechanism, since not 
 all the ports will have them and it might be difficult to add support for 
 upcall counting in that form to userspace datapath, so it will always 
 report question marks.  Custom stats will be accessed via database or via 
 OpenFlow 1.4+ version of dump-ports request.

 Don't have a strong opinion on this though.
>>>
>>> Userspace datapath is not count in dpctl command, we can do not include 
>>> this patch ?
>
> As I mentioned above, it might be hard to track upcalls this way for 
> userpsace datapath.  The reason is that upcalls are happening on the 
> datapath level (dpif-netdev), but statistics are queried from the 
> netdev level.  Getting upcall stats from the dpif-netdev down to 
> netdev stats might be tricky from both the logical and performance 
> point of view.  So, I think, it's OK to always report question marks 
> for ports in userspace datapath for now.  No extra changes required.
>
>>>
>>> Any ideas???
>>
>> I guess you can just add them in iface_refresh_stats() ???#define 
>> IFACE_STATS??? definition, and they will be included if not set the 
>> UINT64_MAX. Or do I miss anything?
>
> Yes, we only need to update the IFACE_STATS definition.  We should 
> probably alter the names of the counters though the same way we agreed 
> to do for the dpctl output, i.e. report 'upcall_success'
> as 'upcall_packets' and 'upcall_fail' as 'upcall_errors'.
> Doe that make sense?

> Yes it does.
I agree, Thank you for review, Eelco & Ilya!
Best regards!
wangchuanlei

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev