Re: [ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-12-03 Thread Ilya Maximets
On 11/22/21 14:37, Eli Britstein via dev wrote:
> 
> On 11/22/2021 3:19 PM, Sriharsha Basavapatna wrote:
>> Hi Eli,
>>
>> On Sun, Nov 21, 2021 at 12:03 PM Eli Britstein via dev
>>  wrote:
>>> Hi Harsha,
>>>
>>> It's a clever idea, though have some problems in the implementation. PSB.
>> Thanks, please see my response below.
>>>
>>> On 11/20/2021 11:20 AM, Sriharsha Basavapatna wrote:
 The hw_miss_packet_recover() API results in performance degradation, for
 ports that are either not offload capable or do not support this specific
 offload API.

 For example, in the test configuration shown below, the vhost-user port
 does not support offloads and the VF port doesn't support hw_miss offload
 API. But because tunnel offload needs to be configured in other bridges
 (br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.

   br-vhost    br-vxlan    br-phy
 vhost-user<-->VF    VF-Rep<-->VxLAN   uplink-port

 For every packet between the VF and the vhost-user ports, hw_miss API is
 called even though it is not supported by the ports involved. This leads
 to significant performance drop (~3x in some cases; both cycles and pps).

 To fix this, return EOPNOTSUPP when this API fails for a device that
>>> "To fix" -> "To improve"
 doesn't support it and avoid this API on that port for subsequent packets.

 Signed-off-by: Sriharsha Basavapatna 
 ---
    lib/dpif-netdev-private.h |  2 +-
    lib/dpif-netdev.c | 29 +
    lib/netdev-offload-dpdk.c |  9 +++--
    3 files changed, 29 insertions(+), 11 deletions(-)




 diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
 index 9fee7570a..8bd2e 100644
 --- a/lib/netdev-offload-dpdk.c
 +++ b/lib/netdev-offload-dpdk.c
 @@ -2292,11 +2292,16 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct 
 netdev *netdev,
    odp_port_t vport_odp;
    int ret = 0;

 -    if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
 -  &rte_restore_info, NULL)) {
 +    ret = netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
 +    &rte_restore_info, NULL);
 +    if (ret) {
    /* This function is called for every packet, and in most cases 
 there
     * will be no restore info from the HW, thus error is expected.
     */
 +    VLOG_DBG_RL(&rl, "flow_get_restore_info failed: %d\n", ret);
>>> It is likely that most of the packets will have an "error" when calling
>>> this API, if there is nothing to get. See the comment. There was no
>>> print on purpose.
>> I added it for debugging, will remove.
 +    if (ret == -EOPNOTSUPP) {
>>> How can we guarantee EOPNOTSUPP is only if indeed not supported and not
>>> that the PMD returned it (for this packet only)?
>> PMDs must return some other errors for per-packet issues (e.g EINVAL
>> for invalid packet format etc). EOPNOTSUPP indicates lack of API
>> support. It is already being used to indicate that, as seen in
>> netdev-offload.c::hw_miss_packet_recover(). Similarly, a PMD that
>> doesn't support this API should return EOPNOTSUPP.
> 
> OK. maybe it's better than to push a comment in DPDK to clarify this. 
> Currently it just says:
> 
>  *   0 on success, a negative errno value otherwise and rte_errno is set.

+1
The clarification in DPDK docs would be great.

> 
>>
>>  return (flow_api && flow_api->hw_miss_packet_recover)
>>  ? flow_api->hw_miss_packet_recover(netdev, packet)
>>  : EOPNOTSUPP;
>>> Maybe it is better to have another API to query (might also need changes
>>> in dpdk).
>> I don't see the need here, as explained above.

I think that idea of this patch is fine and we should implement the solution
on OVS's side.  However, I also think that we do need the DPDK API change.
For example, if the port actually supports HW miss recover, but user doesn't
have tunnels configured, or flows are not offloadable (e.g. has a destination
on a virtual port).  In these cases OVS will still waste a lot of time calling
the recovery API, going deep inside the DPDK PMD driver basically for nothing.
So, these cases will experience significant performance degradation similarly
to what you see right now for ports that doesn't support miss recovery.

Ideally, that should be a separate mbuf flag that application can check and
make a decision to execute an expensive API call.  I see that sfc driver 
allocates
a dynamic mbuf flag for the purpose of HW miss recover API, but mlx5 driver
re-uses same flags as for the partial offload and I'm not sure if that can
be changed.  But something needs to be changed.  It's essential for the
application to be able to make a decision on a per-packet basis without making
expensive API calls.

Fo

Re: [ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-11-22 Thread Eli Britstein via dev


On 11/22/2021 3:19 PM, Sriharsha Basavapatna wrote:

Hi Eli,

On Sun, Nov 21, 2021 at 12:03 PM Eli Britstein via dev
 wrote:

Hi Harsha,

It's a clever idea, though have some problems in the implementation. PSB.

Thanks, please see my response below.


On 11/20/2021 11:20 AM, Sriharsha Basavapatna wrote:

The hw_miss_packet_recover() API results in performance degradation, for
ports that are either not offload capable or do not support this specific
offload API.

For example, in the test configuration shown below, the vhost-user port
does not support offloads and the VF port doesn't support hw_miss offload
API. But because tunnel offload needs to be configured in other bridges
(br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.

  br-vhostbr-vxlanbr-phy
vhost-user<-->VFVF-Rep<-->VxLAN   uplink-port

For every packet between the VF and the vhost-user ports, hw_miss API is
called even though it is not supported by the ports involved. This leads
to significant performance drop (~3x in some cases; both cycles and pps).

To fix this, return EOPNOTSUPP when this API fails for a device that

"To fix" -> "To improve"

doesn't support it and avoid this API on that port for subsequent packets.

Signed-off-by: Sriharsha Basavapatna 
---
   lib/dpif-netdev-private.h |  2 +-
   lib/dpif-netdev.c | 29 +
   lib/netdev-offload-dpdk.c |  9 +++--
   3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h
index 4593649bd..e2a6a9d3a 100644
--- a/lib/dpif-netdev-private.h
+++ b/lib/dpif-netdev-private.h
@@ -46,7 +46,7 @@ dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd,

   int
   dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no,
+  void *port,

void * -> struct tx_port *. use a forward declaration.


 struct dp_packet *packet,
 struct dp_netdev_flow **flow);

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 69d7ec26e..207b1961c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -434,6 +434,7 @@ struct tx_port {
   long long last_used;
   struct hmap_node node;
   long long flush_time;
+bool hw_miss_api_supported;
   struct dp_packet_batch output_pkts;
   struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
   };
@@ -6972,6 +6973,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread 
*pmd,
   tx->port = port;
   tx->qid = -1;
   tx->flush_time = 0LL;
+tx->hw_miss_api_supported = true;
   dp_packet_batch_init(&tx->output_pkts);

   hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
@@ -7327,22 +7329,28 @@ static struct tx_port * pmd_send_port_cache_lookup(

   inline int
   dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no OVS_UNUSED,
+  void *port,

don't omit OVS_UNUSED. it is for compiling without ALLOW_EXPERIMENTAL_API

Ok.

 struct dp_packet *packet,
 struct dp_netdev_flow **flow)
   {
-struct tx_port *p OVS_UNUSED;
+struct tx_port *p = port;

no need for this local variable, you get it from the function arguments

The declaration of dp_netdev_hw_flow() in dpif-netdev-private.h can't
see 'struct tx_port' since it is defined in dpif_netdev.c. So it needs
to be a void * argument.


In the H file, use forward declaration, like this:

struct tx_port;

void foo(struct tx_port *port);

Then, in the C file this stack variable can be removed.


   uint32_t mark;

   #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
   /* Restore the packet if HW processing was terminated before completion. 
*/
-p = pmd_send_port_cache_lookup(pmd, port_no);
-if (OVS_LIKELY(p)) {
+if (OVS_LIKELY(p) && p->hw_miss_api_supported) {
   int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);

-if (err && err != EOPNOTSUPP) {
-COVERAGE_INC(datapath_drop_hw_miss_recover);
-return -1;
+if (err) {
+if (err != EOPNOTSUPP) {
+COVERAGE_INC(datapath_drop_hw_miss_recover);
+return -1;
+} else {
+/* API unsupported by the port; avoid subsequent calls. */
+VLOG_DBG("hw_miss_api unsupported: port: %d",
+ p->port->port_no);
+p->hw_miss_api_supported = false;
+}
   }
   }
   #endif
@@ -7394,6 +7402,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
   uint16_t tcp_flags;
   size_t map_cnt = 0;
   bool batch_enable = true;
+struct tx_port *port = NULL;
+
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
+port = pmd_send_port_cache_lookup(pmd, port_no);
+#endif

   pmd_perf_update_counter(&pmd->perf_stats,

Re: [ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-11-22 Thread Sriharsha Basavapatna via dev
Hi Eli,

On Sun, Nov 21, 2021 at 12:03 PM Eli Britstein via dev
 wrote:
>
> Hi Harsha,
>
> It's a clever idea, though have some problems in the implementation. PSB.

Thanks, please see my response below.
>
>
> On 11/20/2021 11:20 AM, Sriharsha Basavapatna wrote:
> > The hw_miss_packet_recover() API results in performance degradation, for
> > ports that are either not offload capable or do not support this specific
> > offload API.
> >
> > For example, in the test configuration shown below, the vhost-user port
> > does not support offloads and the VF port doesn't support hw_miss offload
> > API. But because tunnel offload needs to be configured in other bridges
> > (br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.
> >
> >  br-vhostbr-vxlanbr-phy
> > vhost-user<-->VFVF-Rep<-->VxLAN   uplink-port
> >
> > For every packet between the VF and the vhost-user ports, hw_miss API is
> > called even though it is not supported by the ports involved. This leads
> > to significant performance drop (~3x in some cases; both cycles and pps).
> >
> > To fix this, return EOPNOTSUPP when this API fails for a device that
> "To fix" -> "To improve"
> > doesn't support it and avoid this API on that port for subsequent packets.
> >
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >   lib/dpif-netdev-private.h |  2 +-
> >   lib/dpif-netdev.c | 29 +
> >   lib/netdev-offload-dpdk.c |  9 +++--
> >   3 files changed, 29 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h
> > index 4593649bd..e2a6a9d3a 100644
> > --- a/lib/dpif-netdev-private.h
> > +++ b/lib/dpif-netdev-private.h
> > @@ -46,7 +46,7 @@ dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd,
> >
> >   int
> >   dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> > -  odp_port_t port_no,
> > +  void *port,
>
> void * -> struct tx_port *. use a forward declaration.
>
> > struct dp_packet *packet,
> > struct dp_netdev_flow **flow);
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 69d7ec26e..207b1961c 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -434,6 +434,7 @@ struct tx_port {
> >   long long last_used;
> >   struct hmap_node node;
> >   long long flush_time;
> > +bool hw_miss_api_supported;
> >   struct dp_packet_batch output_pkts;
> >   struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
> >   };
> > @@ -6972,6 +6973,7 @@ dp_netdev_add_port_tx_to_pmd(struct 
> > dp_netdev_pmd_thread *pmd,
> >   tx->port = port;
> >   tx->qid = -1;
> >   tx->flush_time = 0LL;
> > +tx->hw_miss_api_supported = true;
> >   dp_packet_batch_init(&tx->output_pkts);
> >
> >   hmap_insert(&pmd->tx_ports, &tx->node, 
> > hash_port_no(tx->port->port_no));
> > @@ -7327,22 +7329,28 @@ static struct tx_port * pmd_send_port_cache_lookup(
> >
> >   inline int
> >   dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> > -  odp_port_t port_no OVS_UNUSED,
> > +  void *port,
> don't omit OVS_UNUSED. it is for compiling without ALLOW_EXPERIMENTAL_API
Ok.
> > struct dp_packet *packet,
> > struct dp_netdev_flow **flow)
> >   {
> > -struct tx_port *p OVS_UNUSED;
> > +struct tx_port *p = port;
> no need for this local variable, you get it from the function arguments
The declaration of dp_netdev_hw_flow() in dpif-netdev-private.h can't
see 'struct tx_port' since it is defined in dpif_netdev.c. So it needs
to be a void * argument.
> >   uint32_t mark;
> >
> >   #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
> >   /* Restore the packet if HW processing was terminated before 
> > completion. */
> > -p = pmd_send_port_cache_lookup(pmd, port_no);
> > -if (OVS_LIKELY(p)) {
> > +if (OVS_LIKELY(p) && p->hw_miss_api_supported) {
> >   int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
> >
> > -if (err && err != EOPNOTSUPP) {
> > -COVERAGE_INC(datapath_drop_hw_miss_recover);
> > -return -1;
> > +if (err) {
> > +if (err != EOPNOTSUPP) {
> > +COVERAGE_INC(datapath_drop_hw_miss_recover);
> > +return -1;
> > +} else {
> > +/* API unsupported by the port; avoid subsequent calls. */
> > +VLOG_DBG("hw_miss_api unsupported: port: %d",
> > + p->port->port_no);
> > +p->hw_miss_api_supported = false;
> > +}
> >   }
> >   }
> >   #endif
> > @@ -7394,6 +7402,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
> >   uint16_t tcp_flags;
> >   size_t map_cnt = 0;
> >   bool batch_enable = true;
> > +struct tx_port *port = NULL;
> > +
> > +#ifdef ALLOW_

Re: [ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-11-20 Thread Eli Britstein via dev

Hi Harsha,

It's a clever idea, though have some problems in the implementation. PSB.


On 11/20/2021 11:20 AM, Sriharsha Basavapatna wrote:

The hw_miss_packet_recover() API results in performance degradation, for
ports that are either not offload capable or do not support this specific
offload API.

For example, in the test configuration shown below, the vhost-user port
does not support offloads and the VF port doesn't support hw_miss offload
API. But because tunnel offload needs to be configured in other bridges
(br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.

 br-vhostbr-vxlanbr-phy
vhost-user<-->VFVF-Rep<-->VxLAN   uplink-port

For every packet between the VF and the vhost-user ports, hw_miss API is
called even though it is not supported by the ports involved. This leads
to significant performance drop (~3x in some cases; both cycles and pps).

To fix this, return EOPNOTSUPP when this API fails for a device that

"To fix" -> "To improve"

doesn't support it and avoid this API on that port for subsequent packets.

Signed-off-by: Sriharsha Basavapatna 
---
  lib/dpif-netdev-private.h |  2 +-
  lib/dpif-netdev.c | 29 +
  lib/netdev-offload-dpdk.c |  9 +++--
  3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h
index 4593649bd..e2a6a9d3a 100644
--- a/lib/dpif-netdev-private.h
+++ b/lib/dpif-netdev-private.h
@@ -46,7 +46,7 @@ dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd,
  
  int

  dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no,
+  void *port,


void * -> struct tx_port *. use a forward declaration.


struct dp_packet *packet,
struct dp_netdev_flow **flow);
  
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c

index 69d7ec26e..207b1961c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -434,6 +434,7 @@ struct tx_port {
  long long last_used;
  struct hmap_node node;
  long long flush_time;
+bool hw_miss_api_supported;
  struct dp_packet_batch output_pkts;
  struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
  };
@@ -6972,6 +6973,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread 
*pmd,
  tx->port = port;
  tx->qid = -1;
  tx->flush_time = 0LL;
+tx->hw_miss_api_supported = true;
  dp_packet_batch_init(&tx->output_pkts);
  
  hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));

@@ -7327,22 +7329,28 @@ static struct tx_port * pmd_send_port_cache_lookup(
  
  inline int

  dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no OVS_UNUSED,
+  void *port,

don't omit OVS_UNUSED. it is for compiling without ALLOW_EXPERIMENTAL_API

struct dp_packet *packet,
struct dp_netdev_flow **flow)
  {
-struct tx_port *p OVS_UNUSED;
+struct tx_port *p = port;

no need for this local variable, you get it from the function arguments

  uint32_t mark;
  
  #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */

  /* Restore the packet if HW processing was terminated before completion. 
*/
-p = pmd_send_port_cache_lookup(pmd, port_no);
-if (OVS_LIKELY(p)) {
+if (OVS_LIKELY(p) && p->hw_miss_api_supported) {
  int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
  
-if (err && err != EOPNOTSUPP) {

-COVERAGE_INC(datapath_drop_hw_miss_recover);
-return -1;
+if (err) {
+if (err != EOPNOTSUPP) {
+COVERAGE_INC(datapath_drop_hw_miss_recover);
+return -1;
+} else {
+/* API unsupported by the port; avoid subsequent calls. */
+VLOG_DBG("hw_miss_api unsupported: port: %d",
+ p->port->port_no);
+p->hw_miss_api_supported = false;
+}
  }
  }
  #endif
@@ -7394,6 +7402,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
  uint16_t tcp_flags;
  size_t map_cnt = 0;
  bool batch_enable = true;
+struct tx_port *port = NULL;
+
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
+port = pmd_send_port_cache_lookup(pmd, port_no);
+#endif
  
  pmd_perf_update_counter(&pmd->perf_stats,

  md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
@@ -7420,7 +7433,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
  }
  
  if (netdev_flow_api && recirc_depth == 0) {

-if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
+if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port, packet, &flow))) {
  /* Packet restoration failed and it was dropped, do not
   * continue processing.
   *

Re: [ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-11-20 Thread 0-day Robot
Bleep bloop.  Greetings Sriharsha Basavapatna, I am a robot and I have tried 
out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


build:
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
   -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall 
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security 
-Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align 
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes 
-Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror   -g 
-O2 -MT lib/dpif-netdev-lookup.lo -MD -MP -MF $depbase.Tpo -c -o 
lib/dpif-netdev-lookup.lo lib/dpif-netdev-lookup.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include 
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare 
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter 
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition 
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow 
-Werror -Werror -g -O2 -MT lib/dpif-netdev-lookup.lo -MD -MP -MF 
lib/.deps/dpif-netdev-lookup.Tpo -c lib/dpif-netdev-lookup.c -o 
lib/dpif-netdev-lookup.o
depbase=`echo lib/dpif-netdev-lookup-autovalidator.lo | sed 
's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
   -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall 
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security 
-Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align 
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes 
-Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror   -g 
-O2 -MT lib/dpif-netdev-lookup-autovalidator.lo -MD -MP -MF $depbase.Tpo -c -o 
lib/dpif-netdev-lookup-autovalidator.lo lib/dpif-netdev-lookup-autovalidator.c 
&&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include 
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare 
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter 
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition 
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow 
-Werror -Werror -g -O2 -MT lib/dpif-netdev-lookup-autovalidator.lo -MD -MP -MF 
lib/.deps/dpif-netdev-lookup-autovalidator.Tpo -c 
lib/dpif-netdev-lookup-autovalidator.c -o lib/dpif-netdev-lookup-autovalidator.o
depbase=`echo lib/dpif-netdev-lookup-generic.lo | sed 
's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
   -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall 
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security 
-Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align 
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes 
-Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror   -g 
-O2 -MT lib/dpif-netdev-lookup-generic.lo -MD -MP -MF $depbase.Tpo -c -o 
lib/dpif-netdev-lookup-generic.lo lib/dpif-netdev-lookup-generic.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include 
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare 
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter 
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition 
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow 
-Werror -Werror -g -O2 -MT lib/dpif-netdev-lookup-generic.lo -MD -MP -MF 
lib/.deps/dpif-netdev-lookup-generic.Tpo -c lib/dpif-netdev-lookup-generic.c -o 
lib/dpif-netdev-lookup-generic.o
depbase=`echo lib/dpif-netdev.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
   -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall 
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security 
-Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align 
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes 
-Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror   -g 
-O2 -MT lib/dpif-netdev.lo -MD -MP -MF $depbase.Tpo -c -o lib/dpif-netdev.lo 
lib/dpif-netdev.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include 
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare 
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter 
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition 
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow 
-Werror -We

[ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-11-20 Thread Sriharsha Basavapatna via dev
The hw_miss_packet_recover() API results in performance degradation, for
ports that are either not offload capable or do not support this specific
offload API.

For example, in the test configuration shown below, the vhost-user port
does not support offloads and the VF port doesn't support hw_miss offload
API. But because tunnel offload needs to be configured in other bridges
(br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.

br-vhostbr-vxlanbr-phy
vhost-user<-->VFVF-Rep<-->VxLAN   uplink-port

For every packet between the VF and the vhost-user ports, hw_miss API is
called even though it is not supported by the ports involved. This leads
to significant performance drop (~3x in some cases; both cycles and pps).

To fix this, return EOPNOTSUPP when this API fails for a device that
doesn't support it and avoid this API on that port for subsequent packets.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev-private.h |  2 +-
 lib/dpif-netdev.c | 29 +
 lib/netdev-offload-dpdk.c |  9 +++--
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h
index 4593649bd..e2a6a9d3a 100644
--- a/lib/dpif-netdev-private.h
+++ b/lib/dpif-netdev-private.h
@@ -46,7 +46,7 @@ dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd,
 
 int
 dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no,
+  void *port,
   struct dp_packet *packet,
   struct dp_netdev_flow **flow);
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 69d7ec26e..207b1961c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -434,6 +434,7 @@ struct tx_port {
 long long last_used;
 struct hmap_node node;
 long long flush_time;
+bool hw_miss_api_supported;
 struct dp_packet_batch output_pkts;
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
@@ -6972,6 +6973,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread 
*pmd,
 tx->port = port;
 tx->qid = -1;
 tx->flush_time = 0LL;
+tx->hw_miss_api_supported = true;
 dp_packet_batch_init(&tx->output_pkts);
 
 hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
@@ -7327,22 +7329,28 @@ static struct tx_port * pmd_send_port_cache_lookup(
 
 inline int
 dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no OVS_UNUSED,
+  void *port,
   struct dp_packet *packet,
   struct dp_netdev_flow **flow)
 {
-struct tx_port *p OVS_UNUSED;
+struct tx_port *p = port;
 uint32_t mark;
 
 #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
 /* Restore the packet if HW processing was terminated before completion. */
-p = pmd_send_port_cache_lookup(pmd, port_no);
-if (OVS_LIKELY(p)) {
+if (OVS_LIKELY(p) && p->hw_miss_api_supported) {
 int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
 
-if (err && err != EOPNOTSUPP) {
-COVERAGE_INC(datapath_drop_hw_miss_recover);
-return -1;
+if (err) {
+if (err != EOPNOTSUPP) {
+COVERAGE_INC(datapath_drop_hw_miss_recover);
+return -1;
+} else {
+/* API unsupported by the port; avoid subsequent calls. */
+VLOG_DBG("hw_miss_api unsupported: port: %d",
+ p->port->port_no);
+p->hw_miss_api_supported = false;
+}
 }
 }
 #endif
@@ -7394,6 +7402,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 uint16_t tcp_flags;
 size_t map_cnt = 0;
 bool batch_enable = true;
+struct tx_port *port = NULL;
+
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
+port = pmd_send_port_cache_lookup(pmd, port_no);
+#endif
 
 pmd_perf_update_counter(&pmd->perf_stats,
 md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
@@ -7420,7 +7433,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 
 if (netdev_flow_api && recirc_depth == 0) {
-if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
+if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port, packet, &flow))) {
 /* Packet restoration failed and it was dropped, do not
  * continue processing.
  */
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 9fee7570a..8bd2e 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -2292,11 +2292,16 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct 
netdev *netdev,
 odp_port_t vport_odp;
 int ret = 0;
 
-if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
-  &rte_restore_info, NULL