Re: [ovs-dev] [RFC v2 PATCH 4/5] dpif-netdev: Skip encap action during datapath execution

2020-05-28 Thread Sriharsha Basavapatna via dev
On Tue, May 26, 2020 at 7:12 PM Eli Britstein  wrote:
>
>
> On 5/18/2020 10:27 PM, Sriharsha Basavapatna wrote:
> > In this patch we check whether action processing (apart from the OUTPUT
> > action) should be skipped for a given dp_netdev_flow. Specifically, if
> > the action is TNL_PUSH and it has been offloaded to HW, we do not push
> > the tunnel header in SW; the datapath only executes the OUTPUT action.
> > The packet is encapsulated in HW during transmit.
> This commit should come before the actual offloading patch. Currently, the
> version after the offload patch but before this skip does not work.

Sure, I'll also move the get_stats() patch before the offloading patch.

> >
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >   lib/dpif-netdev.c | 247 +-
> >   1 file changed, 224 insertions(+), 23 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 4315f237c..07d16ad61 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -112,6 +112,7 @@ COVERAGE_DEFINE(datapath_drop_recirc_error);
> >   COVERAGE_DEFINE(datapath_drop_invalid_port);
> >   COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
> >   COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
> > +COVERAGE_DEFINE(datapath_skip_tunnel_push);
> >
> >   /* Protects against changes to 'dp_netdevs'. */
> >   static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
> > @@ -547,6 +548,7 @@ struct dp_netdev_flow {
> >*/
> >   bool partial_actions_offloaded;
> >   odp_port_t  egress_offload_port;
> > +struct ovs_mutex partial_action_offload_mutex;
> Locking in the datapath should be avoided. Furthermore, the mutex is in
> struct dp_netdev_flow. There is an instance of this struct for each pmd
> thread, and the flow put is done on a mega-flow, so different pmd threads
> will have different instances of this mutex. This locking scheme doesn't
> resolve the issues from v1 about the lack of synchronization between the
> datapath and hw offload.

I understand locking is not preferred, but I wanted to start with a
mutex to get the functionality right and possibly replace it with
better synchronization primitives later (that's why it is currently
wrapped in separate lock() and unlock() functions). The lock correctly
synchronizes when the offload thread is processing the same pmd flow
(dp_netdev_flow) that is running in the datapath. But your point about
a different pmd thread is right; we will then have to synchronize at
the mega-flow level.

Having said that, if it is not synchronized (as in patchset v1) and the
tunnel header is pushed in SW, the HW wouldn't push the tunnel header
again, since the packet is already encapsulated: the encapsulated packet
doesn't match any flow in HW, so the action wouldn't be executed there,
and there is no double-encap issue. I've also confirmed this with
testing (debug code that ignores a successfully offloaded flow and still
pushes the header in SW). The only cost is that a few packets (at the
beginning of the session) that could have been offloaded end up
consuming SW cycles, but packets eventually get offloaded. That's not
bad considering that the trade-off is the additional complexity of
synchronization code.

Based on this, I'm going to revert this part of the code (i.e., go back
to no-sync), and I'll add inline comments to explain it.
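
To make the intended datapath behavior concrete, here's a minimal,
self-contained sketch of the skip-encap check (simplified stand-in types
and hypothetical names like flow_stub/execute_action, not the actual
dpif-netdev code):

/*
 * A minimal sketch of the skip-encap decision (NOT the patch code):
 * if the flow's partial actions (TNL_PUSH) were offloaded to HW, the SW
 * tunnel push is skipped and only the OUTPUT action is executed; HW
 * encapsulates the packet on transmit.
 */
#include <stdbool.h>
#include <stdio.h>

enum action_type { ACTION_TNL_PUSH, ACTION_OUTPUT };

struct flow_stub {
    bool partial_actions_offloaded;   /* set by the offload thread */
};

static void
execute_action(const struct flow_stub *flow, enum action_type type)
{
    if (type == ACTION_TNL_PUSH && flow->partial_actions_offloaded) {
        /* HW will push the tunnel header on transmit; skip it in SW.
         * If this flag is read as a stale 'false' (the no-sync case),
         * the worst case is one extra SW encap; the already-encapsulated
         * packet then simply doesn't match the offloaded flow in HW. */
        printf("skip SW tnl_push (offloaded)\n");
        return;
    }
    printf(type == ACTION_TNL_PUSH ? "push tunnel header in SW\n"
                                   : "output packet to egress port\n");
}

int
main(void)
{
    struct flow_stub flow = { .partial_actions_offloaded = true };

    execute_action(&flow, ACTION_TNL_PUSH);   /* skipped */
    execute_action(&flow, ACTION_OUTPUT);     /* still executed in SW */
    return 0;
}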

>
> >
> >   /* Statistics. */
> >   struct dp_netdev_flow_stats stats;
> > @@ -801,7 +803,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
> > bool should_steal,
> > const struct flow *flow,
> > const struct nlattr *actions,
> > -  size_t actions_len);
> > +  size_t actions_len,
> > +  const struct dp_netdev_flow *dp_flow);
> >   static void dp_netdev_input(struct dp_netdev_pmd_thread *,
> >   struct dp_packet_batch *, odp_port_t port_no);
> >   static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
> > @@ -2361,17 +2364,159 @@ dp_netdev_append_flow_offload(struct dp_flow_offload_item *offload)
> >   ovs_mutex_unlock(&dp_flow_offload.mutex);
> >   }
> >
> > +/*
> > + * Mapping structure to map ufid to a partial offload egress device.
> > + * Synchronization: accessed only in the context of the offload thread.
> > + */
> > +struct ufid_to_egdev_data {
> > +const struct cmap_node node;   /* link to cmap */
> > +ovs_u128 mega_ufid;/* mega-ufid being mapped */
> > +odp_port_t egress_port_num;/* Port number mapped to */
> > +struct dp_netdev_flow *flow;   /* flow that maps to this ufid */
> > +};
> Mapping code for offloads should be done in the offload layer and not in
> dpif. Though the partial offload (mark+rss) does the same, I think it
> was wrong for that case too.

Re: [ovs-dev] [RFC v2 PATCH 4/5] dpif-netdev: Skip encap action during datapath execution

2020-05-26 Thread Eli Britstein



On 5/18/2020 10:27 PM, Sriharsha Basavapatna wrote:

In this patch we check whether action processing (apart from the OUTPUT
action) should be skipped for a given dp_netdev_flow. Specifically, if
the action is TNL_PUSH and it has been offloaded to HW, we do not push
the tunnel header in SW; the datapath only executes the OUTPUT action.
The packet is encapsulated in HW during transmit.
This commit should come before the actual offloading patch. Currently, the
version after the offload patch but before this skip does not work.


Signed-off-by: Sriharsha Basavapatna 
---
  lib/dpif-netdev.c | 247 +-
  1 file changed, 224 insertions(+), 23 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 4315f237c..07d16ad61 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -112,6 +112,7 @@ COVERAGE_DEFINE(datapath_drop_recirc_error);
  COVERAGE_DEFINE(datapath_drop_invalid_port);
  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
+COVERAGE_DEFINE(datapath_skip_tunnel_push);
  
  /* Protects against changes to 'dp_netdevs'. */

  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -547,6 +548,7 @@ struct dp_netdev_flow {
   */
  bool partial_actions_offloaded;
  odp_port_t  egress_offload_port;
+struct ovs_mutex partial_action_offload_mutex;
Locking in the datapath should be avoided. Furthermore, the mutex is in
struct dp_netdev_flow. There is an instance of this struct for each pmd
thread, and the flow put is done on a mega-flow, so different pmd threads
will have different instances of this mutex. This locking scheme doesn't
resolve the issues from v1 about the lack of synchronization between the
datapath and hw offload.


  
  /* Statistics. */

  struct dp_netdev_flow_stats stats;
@@ -801,7 +803,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
bool should_steal,
const struct flow *flow,
const struct nlattr *actions,
-  size_t actions_len);
+  size_t actions_len,
+  const struct dp_netdev_flow *dp_flow);
  static void dp_netdev_input(struct dp_netdev_pmd_thread *,
  struct dp_packet_batch *, odp_port_t port_no);
  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
@@ -2361,17 +2364,159 @@ dp_netdev_append_flow_offload(struct dp_flow_offload_item *offload)
  ovs_mutex_unlock(&dp_flow_offload.mutex);
  }
  
+/*

+ * Mapping structure to map ufid to a partial offload egress device.
+ * Synchronization: accessed only in the context of the offload thread.
+ */
+struct ufid_to_egdev_data {
+const struct cmap_node node;   /* link to cmap */
+ovs_u128 mega_ufid;/* mega-ufid being mapped */
+odp_port_t egress_port_num;/* Port number mapped to */
+struct dp_netdev_flow *flow;   /* flow that maps to this ufid */
+};
Mapping code for offloads should be done in the offload layer and not in
dpif. Though the partial offload (mark+rss) does the same, I think it
was wrong for that case too.

+
+/*
+ * A mapping from mega-ufid to partial-offload egress-device.
+ */
+static struct cmap ufid_to_egdev = CMAP_INITIALIZER;
+
+static uint32_t
+ufid_to_egdev_refcnt(const ovs_u128 *mega_ufid)
+{
+size_t hash = hash_bytes(mega_ufid, sizeof *mega_ufid, 0);
+struct ufid_to_egdev_data *data;
+uint32_t refcnt = 0;
+
+CMAP_FOR_EACH_WITH_HASH (data, node, hash, &ufid_to_egdev) {
+if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
+refcnt++;
+}
+}
+
+return refcnt;
+}
+
+/* Find egdev_data @(ufid, flow) */
+static struct ufid_to_egdev_data *
+ufid_to_egdev_data_find(const ovs_u128 *ufid,
+const struct dp_netdev_flow *flow)
+{
+size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
+struct ufid_to_egdev_data *data;
+
+CMAP_FOR_EACH_WITH_HASH (data, node, hash, &ufid_to_egdev) {
+if (data->flow == flow && ovs_u128_equals(*ufid, data->mega_ufid)) {
+return data;
+}
+}
+
+return NULL;
+}
+
+/* Map the given pair of mega-ufid and flow to the given port. Returns 0
+ * when the mapping is created initially in the context of a flow. For
+ * subsequent calls, if it is a new flow with the same mega-ufid, creates
+ * a mapping entry but returns EEXIST (i.e., at least one other flow with
+ * the same ufid exists in the table). If it is an already mapped mega-ufid
+ * & flow pair, returns EEXIST.
+ */
  static int
-partial_offload_egress_flow_del(struct dp_netdev_pmd_thread *pmd,
-struct dp_netdev_flow *flow)
+map_ufid_to_egdev(const ovs_u128 *mega_ufid,
+  const struct dp_netdev_flow *flow,
+  odp_port_t egress_port_num)
+{
+
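
To illustrate the mapping semantics described in the comment above,
here's a small self-contained sketch (simplified stand-ins, not the
patch code: a flat table instead of the cmap, a 64-bit value instead of
ovs_u128, and hypothetical names ufid_refcnt/map_ufid):

/*
 * Sketch of the mega-ufid -> egress-device mapping semantics: each
 * (mega-ufid, flow) pair gets its own entry, the ref-count is the number
 * of entries sharing the same mega-ufid, and only the first flow mapping
 * a given mega-ufid gets 0; later flows (or repeat calls) get EEXIST.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ENTRIES 16

struct egdev_entry {
    uint64_t mega_ufid;      /* simplified 64-bit ufid */
    const void *flow;        /* identity of the pmd flow */
    uint32_t egress_port;
};

static struct egdev_entry table[MAX_ENTRIES];
static int n_entries;

static uint32_t
ufid_refcnt(uint64_t ufid)
{
    uint32_t refcnt = 0;

    for (int i = 0; i < n_entries; i++) {
        if (table[i].mega_ufid == ufid) {
            refcnt++;
        }
    }
    return refcnt;
}

/* Returns 0 for the first flow that maps 'ufid'; EEXIST otherwise. */
static int
map_ufid(uint64_t ufid, const void *flow, uint32_t port)
{
    for (int i = 0; i < n_entries; i++) {
        if (table[i].mega_ufid == ufid && table[i].flow == flow) {
            return EEXIST;           /* this pair is already mapped */
        }
    }
    if (n_entries == MAX_ENTRIES) {
        return ENOSPC;
    }
    table[n_entries++] = (struct egdev_entry) { ufid, flow, port };
    return ufid_refcnt(ufid) > 1 ? EEXIST : 0;
}

int
main(void)
{
    int flow_a, flow_b;

    printf("%d\n", map_ufid(0x1234, &flow_a, 3)); /* 0: first mapping  */
    printf("%d\n", map_ufid(0x1234, &flow_b, 3)); /* EEXIST: same ufid */
    printf("%d\n", map_ufid(0x1234, &flow_a, 3)); /* EEXIST: same pair */
    return 0;
}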