Re: [ovs-dev] [PATCH] dpif-netdev-unixctl.man: document bond-show command

2020-06-29 Thread Vishal Deep Ajmera via dev
>  .
>  .IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]"
>  Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current
> usage.
> +.
> +.IP "\fBdpif-netdev/bond-show\fR [\fIdp\fR]"
> +When "other_config:lb-output-action" is set to "true", the userspace
> datapath
> +handles the load balancing of bonds directly instead of depending on flow
> +recirculation (only in balance-tcp mode).
> +
> +When this is the case, the above command prints the load-balancing
> information
> +of the bonds configured in datapath \fIdp\fR showing the slave associated
> with
> +each bucket (hash).

Hi Adrian,

Thanks for the patch. It looks ok to me.

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-06-18 Thread Vishal Deep Ajmera via dev
>
> Hi.  Thanks for the new version it works fine.  In order to speed things
> up I prepared an incremental patch with one bug fix and some changes
> that I think are necessary to keep the code clean.  Don't be afraid of
> the patch size, there are very few functional changes, most of other
> changes are style fixes or readability improvements.
>
> This patch + my incremental was reviewed by Eelco and tested by Adrian.
> I'll add proper tags while applying to master.
>
> So, what I'm proposing for you is to take a look at the incremental patch
> and if it's OK for you, I could squash it into your v13 and apply to master.
>
> What do you think?
>
> Best regards, Ilya Maximets.
>
> Incremental patch itself:
> --
> Summary:
>   * Dropped 'dp->bond_mutex' while removing stale ports.  We're not
> touching bonds there at all.
>   * Added new argument 'update' to dp_netdev_add_bond_tx_to_pmd() to
> avoid unnecessary re-allocation of the same slaves during
> reconfiguration.  We might also want to avoid adding tx bonds to
> PMDs that doesn't poll anything inside dpif_netdev_bond_add(), but
> this might be done later.
>   * Added actual error codes to dpif_netdev_bond_*() functions.
> They are not used right now, but it's better to have them anyway.
>   * Refined API around dpif_bond_stats_get().  Now it always clears
> n_bytes array regardless of success.  Memory clearing removed from
> the caller.
>   * dpif_bond_*() functions should not check dpif and dpif_class
> pointers.  Rewritten to be more like other code.  Error codes
> must be positive and we're usually using EOPNOTSUPP instead of
> ENOTSUP for unsupported operations.
>   * bond/show output changed to reflect actual values, not the
> configured ones.  Dropped printing of recirc_id if recirc_id is not
> supported (actually if lb_output is not supported).
>   * Fixed wrong array size in bond_add_lb_output_buckets().
> MASK mistakenly used as an array size.
>   * Reorganized compose_output_action__() to make it more readable
> and not fetch hash algo if not needed. (incremental diff is big,
> but it mostly restores previous code :) )
>   * Removed is_lb_output_action_supported from the ofproto-provider
> API.  This was not a pretty solution.  Replaced with new
> ovs_lb_output_action_supported() function called from the
> ofproto/bond.c.  For this purpose validation of lb-output-action
> knob moved down to bond_reconfigure() out of bridge.c.
>   * 'lb-output-action' replaced with 'lb_output action' in docs and
> comments and in the output of bond/show to make things more
> readable and easier to understand.
>   * Tabs in output of bond/show command replaced with spaces.
> OVS doesn't use tabs in the output anywhere.
>   * Random style fixes, improvements.
>
Hi Ilya,

The changes look ok to me.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-06-15 Thread Vishal Deep Ajmera via dev
>
> Hi.  Thanks for the new version it works fine.  In order to speed things
> up I prepared an incremental patch with one bug fix and some changes
> that I think are necessary to keep the code clean.  Don't be afraid of
> the patch size, there are very few functional changes, most of other
> changes are style fixes or readability improvements.
>
> This patch + my incremental was reviewed by Eelco and tested by Adrian.
> I'll add proper tags while applying to master.
>
> So, what I'm proposing for you is to take a look at the incremental patch
> and if it's OK for you, I could squash it into your v13 and apply to master.
>
> What do you think?
>
Thanks Ilya. I will have a look at it and get back if I have any comments. I 
plan to finish it in next couple of days.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v13 0/1] Balance-tcp bond mode optimization

2020-05-22 Thread Vishal Deep Ajmera via dev
v12->v13:
 Addressed comments from Ilya, Eelco and Matteo (offline review).
 Fixed issues with the patch found by Matteo during testing.
 Rebased to OVS master.

v11->v12:
 Addressed most of comments from Ilya and Eelco.
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367832.html
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367842.html
 Rebased to OVS master.

v10->v11:
 Addressed Ben and Ilya's comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366626.html
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367306.html
 Using cmap instead of hmap and removed use of bond cache for pmds.

v9->v10:
 Rebased to OVS master.

v8->v9:
 Applied Ben's patch for fixing sparse and type errors.
 Addressed review comments from Ben.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/366101.html

v7->v8:
 Removed hash action for balance-tcp mode.
 Removed bond-only pmd reload action.
 Rebased to OVS master.

v6->v7:
 Fixed issue reported by Matteo for bond/show.

v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Ilya Maximets 
CC: Eelco Chaudron 
CC: Matteo Croce 
CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Ben Pfaff 
CC: Ian Stokes 
CC: David Marchand 

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 413 +++---
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |  11 +
 lib/dpif.c|  49 +++
 lib/dpif.h|  13 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|  14 +
 ofproto/bond.c|  97 -
 ofproto/bond.h|   5 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   3 +-
 ofproto/ofproto-dpif-xlate.c  |  32 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  10 +-
 ofproto/ofproto-provider.h|   3 +
 ofproto/ofproto.c |  10 +
 ofproto/ofproto.h |   1 +
 tests/lacp.at |   9 +
 tests/odp.at  |   1 +
 vswitchd/bridge.c |  13 +
 vswitchd/vswitch.xml  |  23 ++
 22 files changed, 675 insertions(+), 71 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v13 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-05-22 Thread Vishal Deep Ajmera via dev
een
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.345.5928.86
| 10 4.095.4332.61
| 4003.445.3555.25
| 1k 3.315.2558.41
| 10k2.464.4379.78
| 100k   2.324.2783.59
| 500k   2.294.2384.57
+--+
mpps: million packets per second.
packet size 64 bytes.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Matteo Croce 
CC: Eelco Chaudron 
CC: Ilya Maximets 
CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Ben Pfaff 
CC: Ian Stokes 
CC: David Marchand 
Signed-off-by: Vishal Deep Ajmera 
---
 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 413 +++---
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |  11 +
 lib/dpif.c|  49 +++
 lib/dpif.h|  13 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|  14 +
 ofproto/bond.c|  97 -
 ofproto/bond.h|   5 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   3 +-
 ofproto/ofproto-dpif-xlate.c  |  32 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  10 +-
 ofproto/ofproto-provider.h|   3 +
 ofproto/ofproto.c |  10 +
 ofproto/ofproto.h |   1 +
 tests/lacp.at |   9 +
 tests/odp.at  |   1 +
 vswitchd/bridge.c |  13 +
 vswitchd/vswitch.xml  |  23 ++
 22 files changed, 675 insertions(+), 71 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index f7c3b2e..cc41bbe 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -1023,6 +1023,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
OVS_ACTION_ATTR_DROP,  /* u32 xlate_error. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id. */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 51c8885..c55d452 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -111,6 +111,7 @@ COVERAGE_DEFINE(datapath_drop_tunnel_push_error);
 COVERAGE_DEFINE(datapath_drop_tunnel_pop_error);
 COVERAGE_DEFINE(datapath_drop_recirc_error);
 COVERAGE_DEFINE(datapath_drop_invalid_port);
+COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
 
@@ -310,6 +311,7 @@ struct pmd_auto_lb {
  *
  *dp_netdev_mutex (global)
  *port_mutex
+ *bond_mutex
  *non_pmd_mutex
  */
 struct dp_netdev {
@@ -377,6 +379,10 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+
+/* Bonds. */
+struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */
+struct cmap tx_bonds; /* Contains 'struct tx_bond'. */
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -608,6 +614,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets'. */
+struct slave_entry {
+odp_port_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'tx_bonds'. */
+struct tx_bond {
+struct cmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -740,6 +760,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Protects updates of 'tx_bonds'. */
+/* Map of 'tx

Re: [ovs-dev] [PATCH] ofproto: Fix for frequent invalidation of flows due to mismatch in mask bits

2020-04-20 Thread Vishal Deep Ajmera via dev
> >
> > Any thoughts on this? Should I write an equivalent of flow_wc_map() and
> use
> > it in xlate_wc_finish()
> > or we use a stricter version i.e. flow_wildcards_init_for_packet() and
use
> > in revalidate_ukey__?
> 
> My big problem for review here is that I don't understand the motivation
> for moving the code around.  Sure, it *works*, but why?  Why did you
> think to do it?

The reason to move the code from xlate_wc_finish() to revalidate_ukey__() is
as it caused quite a few unit tests to fail.
After looking at the failures I notice that Megaflow is not matching the
'expected' value.

For e.g. failure for test 0738 is:

-Megaflow:
recirc_id=0,eth,ip,tun_id=0,tun_src=1.1.1.1,tun_dst=2.2.2.2,tun_tos=3,tun_fl
ags=-df-csum-key,in_port=1,nw_ecn=3,nw_frag=no
+Megaflow:
recirc_id=0,eth,ip,tun_src=1.1.1.1,tun_dst=2.2.2.2,tun_tos=3,tun_flags=-df-c
sum-key,in_port=1,nw_ecn=3,nw_frag=no

flow_wildcards_init_for_packet() unwildcards 'tun_id' only when either
FLOW_TNL_F_KEY is set in tunnel flags or tun->ip_dst is not set & tun_id is
non-zero.

However the 'expected' Megaflow itself is not correct in this case as
tun_flags = -key and tun_dst is set, so tun_id should have been wildcarded
but it is not.

Let me know if it makes sense.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v12 0/1] Balance-tcp bond mode optimization

2020-04-06 Thread Vishal Deep Ajmera via dev
> > Hi Matteo,
> >
> > Thanks for the patch. But I fail to understand why there is a memory leak.
> > In fact when I tested in my setup, the test ran without any leak.
>
> The main part is:
> '_REFILL_' loop removes packets from the original batch and
> dp_execute_output_action()
> doesn't free them if 'should_steal' equals to false.  Since,
> dp_execute_output_action()
> returns success, '_REFILL_' doesn't put packet back to original batch 
> leaking
> it.
>
> You need lb_output to not be the last action to reproduce.
>
Thanks Ilya.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v12 0/1] Balance-tcp bond mode optimization

2020-04-06 Thread Vishal Deep Ajmera via dev
Hi Matteo,

Thanks for the patch. But I fail to understand why there is a memory leak.
In fact when I tested in my setup, the test ran without any leak.

The functions dp_execute_lb_output_action and dp_execute_output_action, both
return 'false' if current batch is non-empty.

   case OVS_ACTION_ATTR_LB_OUTPUT:
if (dp_execute_lb_output_action(pmd, packets_, should_steal,
nl_attr_get_u32(a))) {
return;
} else {
COVERAGE_ADD(datapath_drop_invalid_port,
 dp_packet_batch_size(packets_));
}
break;

And the caller to the function i.e. dp_execute_cb() will then execute
dp_packet_delete_batch(packets_, should_steal); which is last line of this
function.

   case OVS_ACTION_ATTR_DROP:
case __OVS_ACTION_ATTR_MAX:
OVS_NOT_REACHED();
}

dp_packet_delete_batch(packets_, should_steal);

If the should_steal is correctly set to true (for e.g. in case this is the
last action), then packet buffer will be freed. Am I missing something here?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v12 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-03-11 Thread Vishal Deep Ajmera via dev
improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.345.5928.86
| 10 4.095.4332.61
| 4003.445.3555.25
| 1k 3.315.2558.41
| 10k2.464.4379.78
| 100k   2.324.2783.59
| 500k   2.294.2384.57
+--+
mpps: million packets per second.
packet size 64 bytes.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Ben Pfaff 
CC: Ilya Maximets 
CC: Ian Stokes 
CC: David Marchand 
CC: Matteo Croce 
CC: Eelco Chaudron 

Signed-off-by: Vishal Deep Ajmera 
---
 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 399 +++---
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |  11 +
 lib/dpif.c|  49 +++
 lib/dpif.h|  13 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  68 +++-
 ofproto/bond.h|   5 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   3 +-
 ofproto/ofproto-dpif-xlate.c  |  32 +-
 ofproto/ofproto-dpif.c|  30 ++
 ofproto/ofproto-dpif.h|   8 +-
 ofproto/ofproto-provider.h|   3 +
 ofproto/ofproto.c |  10 +
 ofproto/ofproto.h |   1 +
 tests/lacp.at |   9 +
 vswitchd/bridge.c |  13 +
 vswitchd/vswitch.xml  |  23 ++
 21 files changed, 632 insertions(+), 56 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 2f0c655..8073d39 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -1021,6 +1021,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
OVS_ACTION_ATTR_DROP,  /* u32 xlate_error. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id. */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d393aab..68f3a95 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -309,6 +309,7 @@ struct pmd_auto_lb {
  *
  *dp_netdev_mutex (global)
  *port_mutex
+ *bond_mutex
  *non_pmd_mutex
  */
 struct dp_netdev {
@@ -376,6 +377,12 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+
+/* Bonds.
+ *
+ */
+struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */
+struct cmap tx_bonds; /* Contains 'struct tx_bond'. */
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -607,6 +614,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets'. */
+struct slave_entry {
+odp_port_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'tx_bonds'. */
+struct tx_bond {
+struct cmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -739,6 +760,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Protects updates of 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread
+ * and read by the pmd thread. */
+struct cmap tx_bonds;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
  * ports (that support push_tunnel/pop_tunnel), the other contains ports
  * with at least one txq (that support send).  A port can be in both.
@@ -830,6 +856,10 @@ static

[ovs-dev] [PATCH v12 0/1] Balance-tcp bond mode optimization

2020-03-11 Thread Vishal Deep Ajmera via dev
v11->v12:
 Addressed most of comments from Ilya and Eelco.
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367832.html
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367842.html
 Rebased to OVS master.

v10->v11:
 Addressed Ben and Ilya's comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366626.html
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367306.html
 Using cmap instead of hmap and removed use of bond cache for pmds.

v9->v10:
 Rebased to OVS master.

v8->v9:
 Applied Ben's patch for fixing sparse and type errors.
 Addressed review comments from Ben.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/366101.html

v7->v8:
 Removed hash action for balance-tcp mode.
 Removed bond-only pmd reload action.
 Rebased to OVS master.

v6->v7:
 Fixed issue reported by Matteo for bond/show.

v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Ben Pfaff 
CC: Ilya Maximets 
CC: Ian Stokes 
CC: David Marchand 
CC: Matteo Croce 
CC: Eelco Chaudron 

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 399 +++---
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |  11 +
 lib/dpif.c|  49 +++
 lib/dpif.h|  13 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  68 +++-
 ofproto/bond.h|   5 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   3 +-
 ofproto/ofproto-dpif-xlate.c  |  32 +-
 ofproto/ofproto-dpif.c|  30 ++
 ofproto/ofproto-dpif.h|   8 +-
 ofproto/ofproto-provider.h|   3 +
 ofproto/ofproto.c |  10 +
 ofproto/ofproto.h |   1 +
 tests/lacp.at |   9 +
 vswitchd/bridge.c |  13 +
 vswitchd/vswitch.xml  |  23 ++
 21 files changed, 632 insertions(+), 56 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH repost] ofproto: Add support to watch controller port liveness in fast-failover group

2020-03-01 Thread Vishal Deep Ajmera via dev
> > Currently fast-failover group does not support checking liveness of
> controller
> > port (OFPP_CONTROLLER). However this feature can be useful for selecting
> > alternate pipeline when controller connection itself is down for e.g.
> > by using local DHCP server to reply for any DHCP request originating
from
> > VMs.
> >
> > This patch adds the support for watching controller port liveness in
fast-
> > failover group. Controller port is considered live when atleast one
> > of-connection is alive.
> >
> > Example usage:
> >
> > ovs-ofctl add-group br-int 'group_id=1234,type=ff,
> >   bucket=watch_port:CONTROLLER,actions:,
> >   bucket=watch_port:1,actions:
> >
> 
> Hi Ben, any comments on this patch?
> 

Hi Ben,

Can you help reviewing this patch and merge if it looks ok? We (from
Ericsson) are currently working on an SDN feature requiring this support.

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-02-12 Thread Vishal Deep Ajmera via dev
> > I am inclined to use option 2 to have per-pmd cmap of bonds however,
> > I see that even in this option we will need to have one cmap (or hmap)
> > of bonds at global datapath level. This will take care of
> > reconfigure_pmd_threads
> > scenario. New PMDs needs to be configured with bond cmap from global
> map.
>
> OK. Make sense for a newly created threads.
>
Hi Ilya,

I have posted new patch addressing your comments. Let me know if it looks ok 
now.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH repost] ofproto: Add support to watch controller port liveness in fast-failover group

2020-02-12 Thread Vishal Deep Ajmera via dev


> 
> Currently fast-failover group does not support checking liveness of
controller
> port (OFPP_CONTROLLER). However this feature can be useful for selecting
> alternate pipeline when controller connection itself is down for e.g.
> by using local DHCP server to reply for any DHCP request originating from
> VMs.
> 
> This patch adds the support for watching controller port liveness in fast-
> failover group. Controller port is considered live when atleast one
> of-connection is alive.
> 
> Example usage:
> 
> ovs-ofctl add-group br-int 'group_id=1234,type=ff,
>   bucket=watch_port:CONTROLLER,actions:,
>   bucket=watch_port:1,actions:
> 

Hi Ben, any comments on this patch?

Warm Regards,
Vishal Ajmera


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Fix for frequent invalidation of flows due to mismatch in mask bits

2020-02-12 Thread Vishal Deep Ajmera via dev
> To me it looks like we will need an equivalent of flow_wc_map() which is
> less stricter than
> flow_wildcards_init_for_packet() if we want to add this check in
> xlate_wc_finish().
> 
> Let me know if the current solution has issues. I can try to move this
check
> to
> xlate_wc_finish() otherwise.

Hi Ben,

Any thoughts on this? Should I write an equivalent of flow_wc_map() and use
it in xlate_wc_finish()
or we use a stricter version i.e. flow_wildcards_init_for_packet() and use
in revalidate_ukey__?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v11 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-02-11 Thread Vishal Deep Ajmera via dev
improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.345.5928.86
| 10 4.095.4332.61
| 4003.445.3555.25
| 1k 3.315.2558.41
| 10k2.464.4379.78
| 100k   2.324.2783.59
| 500k   2.294.2384.57
+--+
mpps: million packets per second.
packet size 64 bytes.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Ben Pfaff 
CC: Ilya Maximets 
CC: Ian Stokes 
CC: David Marchand 
CC: Matteo Croce 
---
 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 399 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   9 +
 lib/dpif.c|  49 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  58 +++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  31 +-
 ofproto/ofproto-dpif.c|  24 ++
 ofproto/ofproto-dpif.h|   9 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   5 +
 vswitchd/vswitch.xml  |  23 ++
 18 files changed, 587 insertions(+), 57 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 2f0c655..7604a2a 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -1021,6 +1021,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
OVS_ACTION_ATTR_DROP,  /* u32 xlate_error. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d393aab..f550192 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -376,6 +377,12 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct cmap tx_bonds; /* Contains "struct tx_bond"s. */
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -607,6 +614,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets' */
+struct slave_entry {
+odp_port_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'tx_bonds'. */
+struct tx_bond {
+struct cmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -739,6 +760,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Mutex for 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread,
+ * read/written by the pmd thread. */
+struct cmap tx_bonds OVS_GUARDED;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
  * ports (that support push_tunnel/pop_tunnel), the other contains ports
  * with at least one txq (that support send).  A port can be in both.
@@ -830,6 +856,12 @@ static void dp_netdev_del_rxq_from_pmd(struct 
dp_netdev_pmd_thread *pmd,
 static int
 dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,

[ovs-dev] [PATCH v11 0/1] Balance-tcp bond mode optimization

2020-02-11 Thread Vishal Deep Ajmera via dev
v10->v11:
 Addressed Ben and Ilya's comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366626.html
 https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367306.html
 Using cmap instead of hmap and removed use of bond cache for pmds.

v9->v10:
 Rebased to OVS master.

v8->v9:
 Applied Ben's patch for fixing sparse and type errors.
 Addressed review comments from Ben.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/366101.html

v7->v8:
 Removed hash action for balance-tcp mode.
 Removed bond-only pmd reload action.
 Rebased to OVS master.

v6->v7:
 Fixed issue reported by Matteo for bond/show.

v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Ben Pfaff 
CC: Ilya Maximets 
CC: Ian Stokes 
CC: David Marchand 
CC: Matteo Croce 

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 399 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   9 +
 lib/dpif.c|  49 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  58 +++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  31 +-
 ofproto/ofproto-dpif.c|  24 ++
 ofproto/ofproto-dpif.h|   9 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   5 +
 vswitchd/vswitch.xml  |  23 ++
 18 files changed, 587 insertions(+), 57 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-02-06 Thread Vishal Deep Ajmera via dev
> >
> > So, the root cause of all the issues in this patch, in my understanding,
> > is the fact that you need to collect statistics for all the bond hashes
> > in order to be able to rebalance traffic.  This forces you to have access
> > to PMD local caches.
> >
> > The basic idea how to overcome this issue is to not have PMD local bond
> > cache, but have an RCU-protected data structure instead.
> >
> > Memory layout scheme in current patch consists of 3 layers:
> >   1. Global hash map for all bonds. One for the whole datapath.
> >  Protected by dp->bond_mutex.
> >   2. Hash map of all the bonds. One for each PMD thread.
> >  Protected by pmd->bond_mutex.
> >   3. Hash map of all the bonds. Local copy that could be used
> >  lockless, but only by the owning PMD thread.
> >
> > Suggested layout #1:
> >   Single global concurrent hash map (cmap) for all bonds. One for the 
> > whole
> >   datapath.  Bond addition/deletion protected by the dp->bond_mutex.
> >   Reads are lockless since protected by RCU.  Statistics updates must be
> >   fully atomic (i.e. atomic_add_relaxed()).
> >
> > Suggested layout #2:
> >   One cmap for each PMD thread (no global one).  Bond addition/deletion
> >   protected by the pmd->bond_mutex.  Reads are lockless since protected
> >   by RCU.  Statistics updates should be atomic in terms of reads and 
> > writes.
> >   (non_atomic_ullong_add() function could be used).
> >   (This is similar to how we store per-PMD flow tables.)
> >
> > #1 will consume a lot less memory, but could scale worse in case of too 
> > many
> > threads trying to send traffic to the same bond port.  #2 might be a bit
> > faster and more scalable in terms of performance, but less efficient in
> > memory consumption and might be slower in terms of response to slave
> updates
> > since we have to update hash maps on all the threads.
> > Both solutions doesn't require any reconfiguration of running PMD threads.
> >
> > Note that to update cmap entry, you will likely need to prepare the new 
> > cmap
> > node and use a cmap_replace().  Statistics copy in this case might be a 
> > bit
> > tricky because you may lost part of additions within the period while PMD
> > threads are still using the old entry. To avoid that, statistics copying
> > should be RCU-postponed.  However, I'm not sure if we need highly accurate
> > stats there.
> >

Hi Ilya,

I am inclined to use option 2 to have per-pmd cmap of bonds however,
I see that even in this option we will need to have one cmap (or hmap)
of bonds at global datapath level. This will take care of 
reconfigure_pmd_threads
scenario. New PMDs needs to be configured with bond cmap from global map.

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Fix for frequent invalidation of flows due to mismatch in mask bits

2020-02-04 Thread Vishal Deep Ajmera via dev
> Thanks for reviewing the patch. When I use the same fix in
xlate_wc_finish()
> I get several unit test failures: 0738  0756  0763  0768  2247  2251  2255
> 2256.
> 
> 0738:
> -Megaflow:
>
recirc_id=0,eth,ip,tun_id=0,tun_src=1.1.1.1,tun_dst=2.2.2.2,tun_tos=3,tun_fl
> ags=-df-csum-key,in_port=1,nw_ecn=3,nw_frag=no
> +Megaflow:
>
recirc_id=0,eth,ip,tun_src=1.1.1.1,tun_dst=2.2.2.2,tun_tos=3,tun_flags=-df-c
> sum-key,in_port=1,nw_ecn=3,nw_frag=no
> 
> 0756:
> -Megaflow:
>
recirc_id=0,eth,ip,tun_id=0,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos=0,tun_fl
> ags=+df-
> csum+key,tun_metadata0,tun_metadata1=NP,tun_metadata2=NP,in_port=1,n
> w_frag=no
> +Megaflow:
>
recirc_id=0,eth,ip,tun_id=0,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos=0,tun_fl
> ags=+df-csum+key,tun_metadata0,in_port=1,nw_frag=no
> 
> 0763:
> +2020-01-23T06:05:58.695Z|2|odp_util(revalidator9)|WARN|unexpected
> L3
> matching with masked Ethertype 0x800/0
> +2020-01-23T06:05:58.695Z|3|odp_util(revalidator9)|WARN|the flow
> mask in
> error is:
>
skb_priority(0),tunnel(tun_id=0x,src=255.255.255.255,dst=255
>
.255.255.255,tos=0xff,ttl=0,flags(df|csum|key)),skb_mark(0),ct_state(0),ct_z
>
one(0),ct_mark(0),ct_label(0),recirc_id(0x),dp_hash(0),in_port(42949
>
67295),packet_type(ns=65535,id=0x),eth_type(0x),ipv4(src=0.0.0.0,dst
> =0.0.0.0,proto=0,tos=0x3,ttl=0,frag=),icmp(type=0,code=0), for the
> following flow key:
>
packet_type=(1,0x800),tun_id=0x1c8,tun_src=1.1.2.92,tun_dst=1.1.2.88,tun_ipv
>
6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=3,tun_ttl=64,t
> un_erspan_ver=0,tun_flags=key,in_port=3,nw_src=30.0.0.1,nw_dst=30.0.0.2,n
> w_p
> roto=1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> 
> And likewise. Any idea why these failures? It looks to me that the patch
> wild-carded more mask bits then expected.

Hi Ben,

To me it looks like we will need an equivalent of flow_wc_map() which is
less stricter than
flow_wildcards_init_for_packet() if we want to add this check in
xlate_wc_finish().

Let me know if the current solution has issues. I can try to move this check
to
xlate_wc_finish() otherwise.

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Add support to watch controller port liveness in fast-failover group

2020-02-03 Thread Vishal Deep Ajmera via dev
> >
> > Please add an item to NEWS.
> >
> > OVS is currently in "soft freeze" for the next release.  We should fork
> > for the next release at the end of the week.  Therefore, please resubmit
> > your patch next week.
> 
> Thanks Ben. Sure I will post new patch after fork is done.
> 
Hi Ben,

I have reposted the patch in mailing list.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH repost] ofproto: Add support to watch controller port liveness in fast-failover group

2020-02-03 Thread Vishal Deep Ajmera via dev
Currently fast-failover group does not support checking liveness of controller
port (OFPP_CONTROLLER). However this feature can be useful for selecting
alternate pipeline when controller connection itself is down for e.g.
by using local DHCP server to reply for any DHCP request originating from VMs.

This patch adds the support for watching controller port liveness in fast-
failover group. Controller port is considered live when atleast one
of-connection is alive.

Example usage:

ovs-ofctl add-group br-int 'group_id=1234,type=ff,
  bucket=watch_port:CONTROLLER,actions:,
  bucket=watch_port:1,actions:

Signed-off-by: Vishal Deep Ajmera 
---
 NEWS |  1 +
 lib/ofp-group.c  |  3 ++-
 ofproto/ofproto-dpif-xlate.c |  5 -
 ofproto/ofproto-dpif.c   | 10 ++
 ofproto/ofproto-dpif.h   |  3 +++
 ofproto/ofproto.c|  3 ++-
 6 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 9bbe71d..354291a 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,7 @@ Post-v2.13.0
- OpenFlow:
  * The OpenFlow ofp_desc/serial_num may now be configured by setting the
value of other-config:dp-sn in the Bridge table.
+ * Added support to watch CONTROLLER port status in fast failover group.
 
 
 v2.13.0 - xx xxx 
diff --git a/lib/ofp-group.c b/lib/ofp-group.c
index b675e80..bf0f8af 100644
--- a/lib/ofp-group.c
+++ b/lib/ofp-group.c
@@ -660,7 +660,8 @@ parse_bucket_str(struct ofputil_bucket *bucket, char *str_,
 } else if (!strcasecmp(key, "watch_port")) {
 if (!ofputil_port_from_string(value, port_map, >watch_port)
 || (ofp_to_u16(bucket->watch_port) >= ofp_to_u16(OFPP_MAX)
-&& bucket->watch_port != OFPP_ANY)) {
+&& bucket->watch_port != OFPP_ANY
+&& bucket->watch_port != OFPP_CONTROLLER)) {
 error = xasprintf("%s: invalid watch_port", value);
 }
 } else if (!strcasecmp(key, "watch_group")) {
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 0b45ecf..adf57a5 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1888,9 +1888,12 @@ bucket_is_alive(const struct xlate_ctx *ctx,
 
 return (!ofputil_bucket_has_liveness(bucket)
 || (bucket->watch_port != OFPP_ANY
+   && bucket->watch_port != OFPP_CONTROLLER
&& odp_port_is_alive(ctx, bucket->watch_port))
 || (bucket->watch_group != OFPG_ANY
-   && group_is_alive(ctx, bucket->watch_group, depth + 1)));
+   && group_is_alive(ctx, bucket->watch_group, depth + 1))
+|| (bucket->watch_port == OFPP_CONTROLLER
+   && ofproto_is_alive(>xbridge->ofproto->up)));
 }
 
 static void
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 0222ec8..65f214a 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -1798,6 +1798,7 @@ run(struct ofproto *ofproto_)
 {
 struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
 uint64_t new_seq, new_dump_seq;
+bool is_connected;
 
 if (mbridge_need_revalidate(ofproto->mbridge)) {
 ofproto->backer->need_revalidate = REV_RECONFIGURE;
@@ -1866,6 +1867,15 @@ run(struct ofproto *ofproto_)
 ofproto->backer->need_revalidate = REV_MCAST_SNOOPING;
 }
 
+/* Check if controller connection is toggled. */
+is_connected = ofproto_is_alive(>up);
+if (ofproto->is_controller_connected != is_connected) {
+ofproto->is_controller_connected = is_connected;
+/* Trigger revalidation as fast failover group monitoring
+ * controller port may need to check liveness again. */
+ofproto->backer->need_revalidate = REV_RECONFIGURE;
+}
+
 new_dump_seq = seq_read(udpif_dump_seq(ofproto->backer->udpif));
 if (ofproto->dump_seq != new_dump_seq) {
 struct rule *rule, *next_rule;
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index c9d5df3..aee61d6 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -342,6 +342,9 @@ struct ofproto_dpif {
 struct guarded_list ams;  /* Contains "struct ofproto_async_msgs"s. */
 struct seq *ams_seq;  /* For notifying 'ams' reception. */
 uint64_t ams_seqno;
+
+bool is_controller_connected; /* True if any controller admitted this
+   * switch connection. */
 };
 
 struct ofproto_dpif *ofproto_dpif_lookup_by_name(const char *name);
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index e259128..0fbd6c3 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -1906,7 +1906,8 @@ ofproto_wait(struct ofproto *p)
 bool
 ofproto_is_alive(const struct ofproto *p)
 {
-  

Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-30 Thread Vishal Deep Ajmera via dev
>
> So, the root cause of all the issues in this patch, in my understanding,
> is the fact that you need to collect statistics for all the bond hashes
> in order to be able to rebalance traffic.  This forces you to have access
> to PMD local caches.
>
> The basic idea how to overcome this issue is to not have PMD local bond
> cache, but have an RCU-protected data structure instead.
>
> Memory layout scheme in current patch consists of 3 layers:
>   1. Global hash map for all bonds. One for the whole datapath.
>  Protected by dp->bond_mutex.
>   2. Hash map of all the bonds. One for each PMD thread.
>  Protected by pmd->bond_mutex.
>   3. Hash map of all the bonds. Local copy that could be used
>  lockless, but only by the owning PMD thread.
>
> Suggested layout #1:
>   Single global concurrent hash map (cmap) for all bonds. One for the whole
>   datapath.  Bond addition/deletion protected by the dp->bond_mutex.
>   Reads are lockless since protected by RCU.  Statistics updates must be
>   fully atomic (i.e. atomic_add_relaxed()).
>
> Suggested layout #2:
>   One cmap for each PMD thread (no global one).  Bond addition/deletion
>   protected by the pmd->bond_mutex.  Reads are lockless since protected
>   by RCU.  Statistics updates should be atomic in terms of reads and writes.
>   (non_atomic_ullong_add() function could be used).
>   (This is similar to how we store per-PMD flow tables.)
>
> #1 will consume a lot less memory, but could scale worse in case of too many
> threads trying to send traffic to the same bond port.  #2 might be a bit
> faster and more scalable in terms of performance, but less efficient in
> memory consumption and might be slower in terms of response to slave updates
> since we have to update hash maps on all the threads.
> Both solutions doesn't require any reconfiguration of running PMD threads.
>
> Note that to update cmap entry, you will likely need to prepare the new cmap
> node and use a cmap_replace().  Statistics copy in this case might be a bit
> tricky because you may lost part of additions within the period while PMD
> threads are still using the old entry. To avoid that, statistics copying
> should be RCU-postponed.  However, I'm not sure if we need highly accurate
> stats there.
>
> Any thoughts and suggestion (alternative solutions) are welcome to discuss.

Thanks Ilya. I will have a look at it and implement it in the next patch-set.

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Fix for frequent invalidation of flows due to mismatch in mask bits

2020-01-23 Thread Vishal Deep Ajmera via dev
 
> This is clever!  I think that this kind of approach is the right one.
> 
> I don't yet understand why this should be moved from xlate_wc_finish()
> to revalidate_ukey__().  Can you help me understand that?

Hi Ben,

Thanks for reviewing the patch. When I use the same fix in xlate_wc_finish()
I get several unit test failures: 0738  0756  0763  0768  2247  2251  2255
2256.

0738:
-Megaflow:
recirc_id=0,eth,ip,tun_id=0,tun_src=1.1.1.1,tun_dst=2.2.2.2,tun_tos=3,tun_fl
ags=-df-csum-key,in_port=1,nw_ecn=3,nw_frag=no
+Megaflow:
recirc_id=0,eth,ip,tun_src=1.1.1.1,tun_dst=2.2.2.2,tun_tos=3,tun_flags=-df-c
sum-key,in_port=1,nw_ecn=3,nw_frag=no

0756:
-Megaflow:
recirc_id=0,eth,ip,tun_id=0,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos=0,tun_fl
ags=+df-csum+key,tun_metadata0,tun_metadata1=NP,tun_metadata2=NP,in_port=1,n
w_frag=no
+Megaflow:
recirc_id=0,eth,ip,tun_id=0,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos=0,tun_fl
ags=+df-csum+key,tun_metadata0,in_port=1,nw_frag=no

0763:
+2020-01-23T06:05:58.695Z|2|odp_util(revalidator9)|WARN|unexpected L3
matching with masked Ethertype 0x800/0
+2020-01-23T06:05:58.695Z|3|odp_util(revalidator9)|WARN|the flow mask in
error is:
skb_priority(0),tunnel(tun_id=0x,src=255.255.255.255,dst=255
.255.255.255,tos=0xff,ttl=0,flags(df|csum|key)),skb_mark(0),ct_state(0),ct_z
one(0),ct_mark(0),ct_label(0),recirc_id(0x),dp_hash(0),in_port(42949
67295),packet_type(ns=65535,id=0x),eth_type(0x),ipv4(src=0.0.0.0,dst
=0.0.0.0,proto=0,tos=0x3,ttl=0,frag=),icmp(type=0,code=0), for the
following flow key:
packet_type=(1,0x800),tun_id=0x1c8,tun_src=1.1.2.92,tun_dst=1.1.2.88,tun_ipv
6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=3,tun_ttl=64,t
un_erspan_ver=0,tun_flags=key,in_port=3,nw_src=30.0.0.1,nw_dst=30.0.0.2,nw_p
roto=1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0

And likewise. Any idea why these failures? It looks to me that the patch
wild-carded more mask bits then expected.

Regards,
Vishal
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-20 Thread Vishal Deep Ajmera via dev
> >
> > Thanks for the patch!
> >
> > I have a few minor stylistic suggestions, see below.
> >
> > I'd like to hear Ilya's opinion on this.
>
> Thanks Ben for review!  I'll take another look at this patch in a next
> couple of days.
>
> >

Hi Ilya, Ben,

Let me know if you have any comments. Can we get this patch reviewed/merged 
before the branching is done?

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto: Fix for frequent invalidation of flows due to mismatch in mask bits

2020-01-16 Thread Vishal Deep Ajmera via dev
The wildcard bits in the installed megaflow entry could be different from
the bits originally generated by the ofproto layer. Datapath implementation
wildcards those match fields which are not present in the incoming packet
before installing the flow.

When the revalidator thread validates a megaflow, it will first query the
ofproto layer to get the wildcard bits and then compare it against the
wildcard bits in the megaflow. If the bits are different the entry will be
removed. A subsequent packet will again result in the same megaflow entry
being installed only for it to be removed by the revalidator thread. This
cycle will continue and will significantly degrade performance.

An earlier patch fixing the issue for MPLS and VLAN was sent.
However similar problem now appears for IPv6 datapath flows.

This patch addresses the issue in a generic way.

Signed-off-by: Vishal Deep Ajmera 
---
 ofproto/ofproto-dpif-upcall.c | 14 +-
 ofproto/ofproto-dpif-xlate.c  | 20 
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 409286a..1371486 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2169,7 +2169,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 {
 struct xlate_out *xoutp;
 struct netflow *netflow;
-struct flow_wildcards dp_mask, wc;
+struct flow_wildcards dp_mask, wc, wc_from_flow;
 enum reval_result result;
 struct reval_context ctx = {
 .odp_actions = odp_actions,
@@ -2215,6 +2215,18 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 goto exit;
 }
 
+/* Clear flow wildcard bits for fields which are not present
+ * in the original packet header. These wildcards may get set
+ * due to push/set_field actions. This results into frequent
+ * invalidation of datapath flows by revalidator thread. */
+
+/* Create wc mask based on incoming packet. */
+flow_wildcards_init_for_packet(_from_flow,
+   );
+
+/* Clear mask fields in ctx which are not relevant for packet. */
+flow_wildcards_and(ctx.wc, _from_flow, ctx.wc);
+
 /* Do not modify if any bit is wildcarded by the installed datapath flow,
  * but not the newly revalidated wildcard mask (wc), i.e., if revalidation
  * tells that the datapath flow is now too generic and must be narrowed
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 4407f9c..42fdb9f 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -7363,26 +7363,6 @@ xlate_wc_finish(struct xlate_ctx *ctx)
 ctx->wc->masks.tp_src = 0;
 ctx->wc->masks.tp_dst = 0;
 }
-
-/* Clear flow wildcard bits for fields which are not present
- * in the original packet header. These wildcards may get set
- * due to push/set_field actions. This results into frequent
- * invalidation of datapath flows by revalidator thread. */
-
-/* Clear mpls label wc bits if original packet is non-mpls. */
-if (!eth_type_mpls(ctx->xin->upcall_flow->dl_type)) {
-for (i = 0; i < FLOW_MAX_MPLS_LABELS; i++) {
-ctx->wc->masks.mpls_lse[i] = 0;
-}
-}
-/* Clear vlan header wc bits if original packet does not have
- * vlan header. */
-for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
-if (!eth_type_vlan(ctx->xin->upcall_flow->vlans[i].tpid)) {
-ctx->wc->masks.vlans[i].tpid = 0;
-ctx->wc->masks.vlans[i].tci = 0;
-}
-}
 }
 
 /* Translates the flow, actions, or rule in 'xin' into datapath actions in
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Add support to watch controller port liveness in fast-failover group

2020-01-14 Thread Vishal Deep Ajmera via dev
> 
> Thanks for the new feature.
> 
> Please add an item to NEWS.
> 
> OVS is currently in "soft freeze" for the next release.  We should fork
> for the next release at the end of the week.  Therefore, please resubmit
> your patch next week.

Thanks Ben. Sure I will post new patch after fork is done.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto: Add support to watch controller port liveness in fast-failover group

2020-01-12 Thread Vishal Deep Ajmera via dev
Currently fast-failover group does not support checking liveness of controller
port (OFPP_CONTROLLER). However this feature can be useful for selecting
alternate pipeline when controller connection itself is down for e.g.
by using local DHCP server to reply for any DHCP request originating from VMs.

This patch adds the support for watching controller port liveness in fast-
failover group. Controller port is considered live when atleast one
of-connection is alive.

Example usage:

ovs-ofctl add-group br-int 'group_id=1234,type=ff,
  bucket=watch_port:CONTROLLER,actions:,
  bucket=watch_port:1,actions:

Signed-off-by: Vishal Deep Ajmera 
---
 lib/ofp-group.c  |  3 ++-
 ofproto/ofproto-dpif-xlate.c |  5 -
 ofproto/ofproto-dpif.c   | 10 ++
 ofproto/ofproto-dpif.h   |  3 +++
 ofproto/ofproto.c|  3 ++-
 5 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/lib/ofp-group.c b/lib/ofp-group.c
index b675e80..bf0f8af 100644
--- a/lib/ofp-group.c
+++ b/lib/ofp-group.c
@@ -660,7 +660,8 @@ parse_bucket_str(struct ofputil_bucket *bucket, char *str_,
 } else if (!strcasecmp(key, "watch_port")) {
 if (!ofputil_port_from_string(value, port_map, >watch_port)
 || (ofp_to_u16(bucket->watch_port) >= ofp_to_u16(OFPP_MAX)
-&& bucket->watch_port != OFPP_ANY)) {
+&& bucket->watch_port != OFPP_ANY
+&& bucket->watch_port != OFPP_CONTROLLER)) {
 error = xasprintf("%s: invalid watch_port", value);
 }
 } else if (!strcasecmp(key, "watch_group")) {
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 4407f9c..461e8aa 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1884,9 +1884,12 @@ bucket_is_alive(const struct xlate_ctx *ctx,
 
 return (!ofputil_bucket_has_liveness(bucket)
 || (bucket->watch_port != OFPP_ANY
+   && bucket->watch_port != OFPP_CONTROLLER
&& odp_port_is_alive(ctx, bucket->watch_port))
 || (bucket->watch_group != OFPG_ANY
-   && group_is_alive(ctx, bucket->watch_group, depth + 1)));
+   && group_is_alive(ctx, bucket->watch_group, depth + 1))
+|| (bucket->watch_port == OFPP_CONTROLLER
+   && ofproto_is_alive(>xbridge->ofproto->up)));
 }
 
 static void
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index d3cb392..818a077 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -1802,6 +1802,7 @@ run(struct ofproto *ofproto_)
 {
 struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
 uint64_t new_seq, new_dump_seq;
+bool is_connected;
 
 if (mbridge_need_revalidate(ofproto->mbridge)) {
 ofproto->backer->need_revalidate = REV_RECONFIGURE;
@@ -1870,6 +1871,15 @@ run(struct ofproto *ofproto_)
 ofproto->backer->need_revalidate = REV_MCAST_SNOOPING;
 }
 
+/* Check if controller connection is toggled. */
+is_connected = ofproto_is_alive(>up);
+if (ofproto->is_controller_connected != is_connected) {
+ofproto->is_controller_connected = is_connected;
+/* Trigger revalidation as fast failover group monitoring
+ * controller port may need to check liveness again. */
+ofproto->backer->need_revalidate = REV_RECONFIGURE;
+}
+
 new_dump_seq = seq_read(udpif_dump_seq(ofproto->backer->udpif));
 if (ofproto->dump_seq != new_dump_seq) {
 struct rule *rule, *next_rule;
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index c9d5df3..aee61d6 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -342,6 +342,9 @@ struct ofproto_dpif {
 struct guarded_list ams;  /* Contains "struct ofproto_async_msgs"s. */
 struct seq *ams_seq;  /* For notifying 'ams' reception. */
 uint64_t ams_seqno;
+
+bool is_controller_connected; /* True if any controller admitted this
+   * switch connection. */
 };
 
 struct ofproto_dpif *ofproto_dpif_lookup_by_name(const char *name);
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 08830d8..a155fad 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -1899,7 +1899,8 @@ ofproto_wait(struct ofproto *p)
 bool
 ofproto_is_alive(const struct ofproto *p)
 {
-return connmgr_has_controllers(p->connmgr);
+return (connmgr_has_controllers(p->connmgr)
+&& connmgr_is_any_controller_admitted(p->connmgr));
 }
 
 /* Adds some memory usage statistics for 'ofproto' into 'usage', for use with
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-07 Thread Vishal Deep Ajmera via dev
improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.475.7328.21
| 10 4.175.3528.45
| 4003.525.3953.01
| 1k 3.415.2553.78
| 10k2.534.5780.44
| 100k   2.324.2783.59
| 500k   2.324.2783.59
+--+
mpps: million packets per second.
packet size 64 bytes.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 502 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   9 +
 lib/dpif.c|  49 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  56 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  42 +-
 ofproto/ofproto-dpif.c|  24 ++
 ofproto/ofproto-dpif.h|   9 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  23 +
 18 files changed, 695 insertions(+), 62 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 2f0c655..7604a2a 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -1021,6 +1021,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
OVS_ACTION_ATTR_DROP,  /* u32 xlate_error. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2421821..78f9cf9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -377,6 +378,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -607,6 +613,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets' */
+struct slave_entry {
+odp_port_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -718,7 +738,6 @@ struct dp_netdev_pmd_thread {
 atomic_bool wait_for_reload;/* Can we busy wait for the next reload? */
 atomic_bool reload_tx_qid;  /* Do we need to reload static_tx_qid? */
 atomic_bool exit;   /* For terminating the pmd thread. */
-
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
@@ -739,6 +758,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Mutex for 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread,
+ * read/written by the pmd thread. */
+struct hmap tx_bonds OVS_GUARDED;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunne

[ovs-dev] [PATCH v10 0/1] Balance-tcp bond mode optimization

2020-01-07 Thread Vishal Deep Ajmera via dev
v9->v10:
 Rebased to OVS master.

v8->v9:
 Applied Ben's patch for fixing sparse and type errors.
 Addressed review comments from Ben.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/366101.html

v7->v8:
 Removed hash action for balance-tcp mode.
 Removed bond-only pmd reload action.
 Rebased to OVS master.

v6->v7:
 Fixed issue reported by Matteo for bond/show.

v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 502 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   9 +
 lib/dpif.c|  49 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  56 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  42 +-
 ofproto/ofproto-dpif.c|  24 ++
 ofproto/ofproto-dpif.h|   9 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  23 +
 18 files changed, 695 insertions(+), 62 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v9 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-07 Thread Vishal Deep Ajmera via dev
>
> I'm not sure what's happened here but this patch really really fails to
> apply:
>
> Applying: Avoid dp_hash recirculation for balance-tcp bond selection 
> mode
> error: patch failed:
> datapath/linux/compat/include/linux/openvswitch.h:994
> error: datapath/linux/compat/include/linux/openvswitch.h: patch does
> not apply
> error: patch failed: lib/dpif-netdev.c:7045
> error: lib/dpif-netdev.c: patch does not apply
> error: patch failed: lib/dpif.h:905
> error: lib/dpif.h: patch does not apply
> error: patch failed: lib/odp-execute.c:990
> error: lib/odp-execute.c: patch does not apply
> error: patch failed: ofproto/ofproto-dpif-ipfix.c:3016
> error: ofproto/ofproto-dpif-ipfix.c: patch does not apply
> error: patch failed: ofproto/ofproto-dpif.c:1584
> error: ofproto/ofproto-dpif.c: patch does not apply
> error: patch failed: ofproto/ofproto-dpif.h:199
> error: ofproto/ofproto-dpif.h: patch does not apply
> Patch failed at 0001 Avoid dp_hash recirculation for balance-tcp bond
> selection mode
> hint: Use 'git am --show-current-patch' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
>
> Maybe it needs a rebase or maybe there's something else wrong somehow.

Hi Ben,

I have sent v10 version of patch after rebasing with ovs-master. There were 
multiple conflicts and I have resolved them now.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 0/1] Balance-tcp bond mode optimization

2019-12-19 Thread Vishal Deep Ajmera via dev
v8->v9:
 Applied Ben's patch for fixing sparse and type errors.
 Addressed review comments from Ben.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/366101.html

v7->v8:
 Removed hash action for balance-tcp mode.
 Removed bond-only pmd reload action.
 Rebased to OVS master.

v6->v7:
 Fixed issue reported by Matteo for bond/show.

v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 502 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   9 +
 lib/dpif.c|  49 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  56 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  42 +-
 ofproto/ofproto-dpif.c|  24 ++
 ofproto/ofproto-dpif.h|   8 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  23 +
 18 files changed, 695 insertions(+), 61 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-12-19 Thread Vishal Deep Ajmera via dev
improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.475.7328.21
| 10 4.175.3528.45
| 4003.525.3953.01
| 1k 3.415.2553.78
| 10k2.534.5780.44
| 100k   2.324.2783.59
| 500k   2.324.2783.59
+--+
mpps: million packets per second.
packet size 64 bytes.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 502 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   9 +
 lib/dpif.c|  49 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  56 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  42 +-
 ofproto/ofproto-dpif.c|  24 ++
 ofproto/ofproto-dpif.h|   8 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  23 +
 18 files changed, 695 insertions(+), 61 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 778827f..a06ae7f 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -994,6 +994,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3f21211..fa9ff73 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -366,6 +367,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -596,6 +602,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets' */
+struct slave_entry {
+odp_port_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -707,7 +727,6 @@ struct dp_netdev_pmd_thread {
 atomic_bool wait_for_reload;/* Can we busy wait for the next reload? */
 atomic_bool reload_tx_qid;  /* Do we need to reload static_tx_qid? */
 atomic_bool exit;   /* For terminating the pmd thread. */
-
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
@@ -728,6 +747,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Mutex for 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread,
+ * read/written by the pmd thread. */
+struct hmap tx_bonds OVS_GUARDED;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
  * ports (that support push_tunnel/po

Re: [ovs-dev] [PATCH v8 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-12-18 Thread Vishal Deep Ajmera via dev
> 
> I took a look.  The underlying issue is that code here mixes integers,
> ofp_port_t, and odp_port_t.  OVS uses "sparse" annotations to keep these
> from being confused, since they are different in important ways.  I
> spent some time working through the types here and appended a patch that
> fixes them up.  I also made a bunch of style updates throughout the
> code, which might obscure the relevant changes a bit; my apologies.
> 
> It seems odd that dpif_bond_*() succeed if the dpif doesn't support
> them.  Shouldn't they return an error by default, instead of 0?
> 

I will fix this in next revision. May be not to call these APIs when 
datapath do not support them.

> Is there a reason to allow users to turn this off?  What is the downside
> of enabling it?  Why is it disabled by default?  In general, OVS should
> optimize itself to the extent it can rather than relying on a
> knowledgeable user to tweak it.  If it's necessary to make it
> configurable, then the documentation (in vswitch.xml) should explain why
> one would want to turn it on or off and what the default is.
> 
> This introduces a new capabilities flag, which should be documented with
> the other capabilities in vswitch.xml.
> 

Do we read these capabilities in the ofproto layer somewhere ? Or is it for
display and documentation purpose? Trying to see if I can use same name
for 'key' in here.

> I'm appending a suggested diff to fold into your patch.
> 
> I did not do a full technical review for correctness.  I'll do that on
> the next revision.

Thanks Ben for fixing the issues.  I will include your patch in next
revision.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v8 0/1] Balance-tcp bond mode optimization

2019-12-17 Thread Vishal Deep Ajmera via dev
v7->v8:
 Removed hash action for balance-tcp mode.
 Removed bond-only pmd reload action.
 Rebased to OVS master.

v6->v7:
 Fixed issue reported by Matteo for bond/show.

v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 503 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  50 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  43 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  10 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 688 insertions(+), 59 deletions(-)

-- 
1.9.1

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v8 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-12-17 Thread Vishal Deep Ajmera via dev
improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.475.7328.21
| 10 4.175.3528.45
| 4003.525.3953.01
| 1k 3.415.2553.78
| 10k2.534.5780.44
| 100k   2.324.2783.59
| 500k   2.324.2783.59
+--+
mpps: million packets per second.
packet size 64 bytes.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
---
 datapath/linux/compat/include/linux/openvswitch.h |   1 +
 lib/dpif-netdev.c | 503 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 +++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  50 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  43 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  10 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 688 insertions(+), 59 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 778827f..a06ae7f 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -994,6 +994,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3f21211..9a2befd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -366,6 +367,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -596,6 +602,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets' */
+struct slave_entry {
+uint32_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -707,7 +727,6 @@ struct dp_netdev_pmd_thread {
 atomic_bool wait_for_reload;/* Can we busy wait for the next reload? */
 atomic_bool reload_tx_qid;  /* Do we need to reload static_tx_qid? */
 atomic_bool exit;   /* For terminating the pmd thread. */
-
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
@@ -728,6 +747,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Mutex for 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread,
+ * read/written by the pmd thread. */
+struct hmap tx_bonds OVS_GUARDED;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
  * ports (that support push_tunnel/pop_tunnel), the othe

Re: [ovs-dev] [PATCH] ofproto: Fix for frequent invalidation of mega flows for push actions

2019-10-09 Thread Vishal Deep Ajmera via dev
 
> Thanks, applied to master.

Thanks Ben.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto: Fix for frequent invalidation of mega flows for push actions

2019-10-09 Thread Vishal Deep Ajmera via dev
When a packet is processed by the slow path and the matching OpenFlow
rule has actions like push_mpls/set_field and push_vlan/set_field, the
ofproto layer un-wildcards the MPLS and VLAN match fields in the megaflow
entry that it plans to install. However, when the megaflow entry is
actually installed, all protocol match fields that are not present in the
packet are wildcarded. Thus, the wildcard bits in the installed megaflow
entry could be different from the bits originally generated by the ofproto
layer.

When the revalidator thread validates a megaflow, it will first query the
ofproto layer to get the wildcard bits and then compare it against the
wildcard bits in the megaflow. If the bits are different the entry will be
removed.  A subsequent packet will again result in the same megaflow entry
being installed only for it to be removed by the revalidator thread. This
cycle will continue and will significantly degrade performance.

This patch fixes the issue by wildcarding flow fields which are not present
in the incoming packet.

Signed-off-by: Vishal Deep Ajmera 
---
 ofproto/ofproto-dpif-xlate.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 44f856d..f92cb62 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -7343,6 +7343,26 @@ xlate_wc_finish(struct xlate_ctx *ctx)
 ctx->wc->masks.tp_src = 0;
 ctx->wc->masks.tp_dst = 0;
 }
+
+/* Clear flow wildcard bits for fields which are not present
+ * in the original packet header. These wildcards may get set
+ * due to push/set_field actions. This results into frequent
+ * invalidation of datapath flows by revalidator thread. */
+
+/* Clear mpls label wc bits if original packet is non-mpls. */
+if (!eth_type_mpls(ctx->xin->upcall_flow->dl_type)) {
+for (i = 0; i < FLOW_MAX_MPLS_LABELS; i++) {
+ctx->wc->masks.mpls_lse[i] = 0;
+}
+}
+/* Clear vlan header wc bits if original packet does not have
+ * vlan header. */
+for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+if (!eth_type_vlan(ctx->xin->upcall_flow->vlans[i].tpid)) {
+ctx->wc->masks.vlans[i].tpid = 0;
+ctx->wc->masks.vlans[i].tci = 0;
+}
+}
 }
 
 /* Translates the flow, actions, or rule in 'xin' into datapath actions in
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH branch-2.6] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-29 Thread Vishal Deep Ajmera via dev
Hi Darrell, Ben

Sent v2 patch for branch 2.6. It will also apply cleanly to branch 2.7. If 
looks ok, kindly merge.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH branch-2.6 v2] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-29 Thread Vishal Deep Ajmera via dev
From: Darrell Ball 

The ICMPv4 error data L4 length check was found to be too strict for TCP,
expecting a minimum of 20 rather than 8 bytes.  This worked by
hapenstance for other inner protocols.  The approach is to explicitly
handle the ICMPv4 error data L4 length check and to do this for all
supported inner protocols in the same way.  Making the code common
between protocols also allows the existing ICMPv4 related UDP tests to
cover TCP and ICMP inner protocol cases.
Note that ICMPv6 does not have an 8 byte limit for error L4 data.

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
Reported-by: Vishal Deep Ajmera 
Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Vishal Deep Ajmera 
Signed-off-by: Darrell Ball 

---
 lib/conntrack.c | 35 ---
 lib/packets.h   |  3 +++
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 8abaf7e..d59083e 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -664,11 +664,12 @@ check_l4_icmp6(const struct conn_key *key, const void 
*data, size_t size,
 }
 
 static inline bool
-extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
+extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
 const struct tcp_header *tcp = data;
 
-if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
 return false;
 }
 
@@ -680,11 +681,12 @@ extract_l4_tcp(struct conn_key *key, const void *data, 
size_t size)
 }
 
 static inline bool
-extract_l4_udp(struct conn_key *key, const void *data, size_t size)
+extract_l4_udp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
 const struct udp_header *udp = data;
 
-if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
 return false;
 }
 
@@ -696,7 +698,8 @@ extract_l4_udp(struct conn_key *key, const void *data, 
size_t size)
 }
 
 static inline bool extract_l4(struct conn_key *key, const void *data,
-  size_t size, bool *related, const void *l3);
+  size_t size, bool *related, const void *l3,
+  size_t *chk_len);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -728,11 +731,11 @@ reverse_icmp_type(uint8_t type)
  * possible */
 static inline int
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *related, size_t *chk_len)
 {
 const struct icmp_header *icmp = data;
 
-if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
 return false;
 }
 
@@ -783,8 +786,9 @@ extract_l4_icmp(struct conn_key *key, const void *data, 
size_t size,
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
+size_t check_len = ICMP_ERROR_DATA_L4_LEN;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, _len);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -872,7 +876,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, 
size_t size,
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, NULL);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -897,21 +901,22 @@ extract_l4_icmp6(struct conn_key *key, const void *data, 
size_t size,
  * an ICMP or ICMP6 header. *
  *
  * If 'related' is NULL, it means that we're already parsing a header nested
- * in an ICMP error.  In this case, we skip checksum and length validation. */
+ * in an ICMP error.  In this case, we skip the checksum and some length
+ * validations. */
 static inline bool
 extract_l4(struct conn_key *key, const void *data, size_t size, bool *related,
-   const void *l3)
+   const void *l3, size_t *chk_len)
 {
 if (key->nw_proto == IPPROTO_TCP) {
 return (!related || check_l4_tcp(key, data, size, l3))
-   && extract_l4_tcp(key, data, size);
+   && extract_l4_tcp(key, data, size, chk_len);
 } else if (key->nw_proto == IPPROTO_UDP) {
 return (!related || check_l4_udp(key, data, size, l3))
-   && extract_l4_udp(key, data, size);
+   && extract_l4_udp(key, data, size, chk_len);
 } else if (key->dl_type == htons(ETH_TYPE_IP)
&& key->nw_proto == IPPROTO_ICMP) {
 re

Re: [ovs-dev] [patch v2] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-27 Thread Vishal Deep Ajmera via dev
> 
> Thanks Darrell. I have sent patches for branch 2.8 and 2.9.
> For branches before 2.7 & 2.6 it is giving quite a few conflicts.
> Can you please have a look at it?
> 
Hi Darrell, Ben
I have tried manual merge on OVS 2.6 branch and sent a patch.
Can you please review and apply it on branch?

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH branch-2.6] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-27 Thread Vishal Deep Ajmera via dev
From: Darrell Ball 

The ICMPv4 error data L4 length check was found to be too strict for TCP,
expecting a minimum of 20 rather than 8 bytes.  This worked by
hapenstance for other inner protocols.  The approach is to explicitly
handle the ICMPv4 error data L4 length check and to do this for all
supported inner protocols in the same way.  Making the code common
between protocols also allows the existing ICMPv4 related UDP tests to
cover TCP and ICMP inner protocol cases.
Note that ICMPv6 does not have an 8 byte limit for error L4 data.

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
Reported-by: Vishal Deep Ajmera 
Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Vishal Deep Ajmera 
Signed-off-by: Darrell Ball 
Signed-off-by: Ben Pfaff 
(cherry picked from commit 6c2a93064afe8d812e4506880d1fd8f96108f92a)

Conflicts:
lib/conntrack.c
---
 lib/conntrack.c | 35 ---
 lib/packets.h   |  3 +++
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 8abaf7e..d59083e 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -664,11 +664,12 @@ check_l4_icmp6(const struct conn_key *key, const void 
*data, size_t size,
 }
 
 static inline bool
-extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
+extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
 const struct tcp_header *tcp = data;
 
-if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
 return false;
 }
 
@@ -680,11 +681,12 @@ extract_l4_tcp(struct conn_key *key, const void *data, 
size_t size)
 }
 
 static inline bool
-extract_l4_udp(struct conn_key *key, const void *data, size_t size)
+extract_l4_udp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
 const struct udp_header *udp = data;
 
-if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
 return false;
 }
 
@@ -696,7 +698,8 @@ extract_l4_udp(struct conn_key *key, const void *data, 
size_t size)
 }
 
 static inline bool extract_l4(struct conn_key *key, const void *data,
-  size_t size, bool *related, const void *l3);
+  size_t size, bool *related, const void *l3,
+  size_t *chk_len);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -728,11 +731,11 @@ reverse_icmp_type(uint8_t type)
  * possible */
 static inline int
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *related, size_t *chk_len)
 {
 const struct icmp_header *icmp = data;
 
-if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
 return false;
 }
 
@@ -783,8 +786,9 @@ extract_l4_icmp(struct conn_key *key, const void *data, 
size_t size,
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
+size_t check_len = ICMP_ERROR_DATA_L4_LEN;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, _len);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -872,7 +876,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, 
size_t size,
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, NULL);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -897,21 +901,22 @@ extract_l4_icmp6(struct conn_key *key, const void *data, 
size_t size,
  * an ICMP or ICMP6 header. *
  *
  * If 'related' is NULL, it means that we're already parsing a header nested
- * in an ICMP error.  In this case, we skip checksum and length validation. */
+ * in an ICMP error.  In this case, we skip the checksum and some length
+ * validations. */
 static inline bool
 extract_l4(struct conn_key *key, const void *data, size_t size, bool *related,
-   const void *l3)
+   const void *l3, size_t *chk_len)
 {
 if (key->nw_proto == IPPROTO_TCP) {
 return (!related || check_l4_tcp(key, data, size, l3))
-   && extract_l4_tcp(key, data, size);
+   && extract_l4_tcp(key, data, size, chk_len);
 } else if (key->nw_proto == IPPROTO_UDP) {
 return (!related || check_l4_udp(key, data, size, l3))
-   && extract_l4_udp(key, data, size);
+   && extract_l4_udp(key, data, size, ch

Re: [ovs-dev] [patch v2] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-23 Thread Vishal Deep Ajmera via dev

> 
> Thanks
> This is eligible to go back to 2.6; it should apply cleanly back to 2.9; I
> can look into the remaining ones,
> unless Vishal would like to do those.

Thanks Darrell. I have sent patches for branch 2.8 and 2.9.
For branches before 2.7 & 2.6 it is giving quite a few conflicts.
Can you please have a look at it?

Hi Ben, 

Can you please apply the patch on branch 2.10 & 2.11. It should apply
cleanly on them.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH branch-2.9] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-23 Thread Vishal Deep Ajmera via dev
From: Darrell Ball 

The ICMPv4 error data L4 length check was found to be too strict for TCP,
expecting a minimum of 20 rather than 8 bytes.  This worked by
hapenstance for other inner protocols.  The approach is to explicitly
handle the ICMPv4 error data L4 length check and to do this for all
supported inner protocols in the same way.  Making the code common
between protocols also allows the existing ICMPv4 related UDP tests to
cover TCP and ICMP inner protocol cases.
Note that ICMPv6 does not have an 8 byte limit for error L4 data.

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
Reported-by: Vishal Deep Ajmera 
Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Vishal Deep Ajmera 
Signed-off-by: Darrell Ball 
Signed-off-by: Ben Pfaff 
(cherry picked from commit 6c2a93064afe8d812e4506880d1fd8f96108f92a)
---
 lib/conntrack.c | 41 -
 lib/packets.h   |  3 +++
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 48c54d3..d874551 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1712,9 +1712,10 @@ check_l4_icmp6(const struct conn_key *key, const void 
*data, size_t size,
 }
 
 static inline bool
-extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
+extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
 return false;
 }
 
@@ -1727,9 +1728,10 @@ extract_l4_tcp(struct conn_key *key, const void *data, 
size_t size)
 }
 
 static inline bool
-extract_l4_udp(struct conn_key *key, const void *data, size_t size)
+extract_l4_udp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
 return false;
 }
 
@@ -1743,7 +1745,7 @@ extract_l4_udp(struct conn_key *key, const void *data, 
size_t size)
 
 static inline bool extract_l4(struct conn_key *key, const void *data,
   size_t size, bool *related, const void *l3,
-  bool validate_checksum);
+  bool validate_checksum, size_t *chk_len);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -1775,9 +1777,9 @@ reverse_icmp_type(uint8_t type)
  * possible */
 static inline int
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *related, size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
 return false;
 }
 
@@ -1828,8 +1830,9 @@ extract_l4_icmp(struct conn_key *key, const void *data, 
size_t size,
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
+size_t check_len = ICMP_ERROR_DATA_L4_LEN;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, _len);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1916,7 +1919,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, 
size_t size,
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, NULL);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1941,26 +1944,29 @@ extract_l4_icmp6(struct conn_key *key, const void 
*data, size_t size,
  * an ICMP or ICMP6 header.
  *
  * If 'related' is NULL, it means that we're already parsing a header nested
- * in an ICMP error.  In this case, we skip checksum and length validation. */
+ * in an ICMP error.  In this case, we skip the checksum and some length
+ * validations. */
 static inline bool
 extract_l4(struct conn_key *key, const void *data, size_t size, bool *related,
-   const void *l3, bool validate_checksum)
+   const void *l3, bool validate_checksum, size_t *chk_len)
 {
 if (key->nw_proto == IPPROTO_TCP) {
 return (!related || check_l4_tcp(key, data, size, l3,
-validate_checksum)) && extract_l4_tcp(key, data, size);
+validate_checksum))
+   && extract_l4_tcp(key, data, size, chk_len);
 } else if (key->nw_proto == IPPROTO_UDP) {
 return (!related || check_l4_udp(key, data, size, l3,
-validate_checksum)) && extract_l4_udp(key, data, size);
+validate_checksum))
+   &&

[ovs-dev] [PATCH branch-2.8] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-23 Thread Vishal Deep Ajmera via dev
From: Darrell Ball 

The ICMPv4 error data L4 length check was found to be too strict for TCP,
expecting a minimum of 20 rather than 8 bytes.  This worked by
hapenstance for other inner protocols.  The approach is to explicitly
handle the ICMPv4 error data L4 length check and to do this for all
supported inner protocols in the same way.  Making the code common
between protocols also allows the existing ICMPv4 related UDP tests to
cover TCP and ICMP inner protocol cases.
Note that ICMPv6 does not have an 8 byte limit for error L4 data.

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
Reported-by: Vishal Deep Ajmera 
Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Vishal Deep Ajmera 
Signed-off-by: Darrell Ball 
Signed-off-by: Ben Pfaff 
(cherry picked from commit 6c2a93064afe8d812e4506880d1fd8f96108f92a)

Conflicts:
lib/conntrack.c
---
 lib/conntrack.c | 41 -
 lib/packets.h   |  3 +++
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 3e9e633..eda0df3 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1589,11 +1589,12 @@ check_l4_icmp6(const struct conn_key *key, const void 
*data, size_t size,
 }
 
 static inline bool
-extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
+extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
 const struct tcp_header *tcp = data;
 
-if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
 return false;
 }
 
@@ -1605,11 +1606,12 @@ extract_l4_tcp(struct conn_key *key, const void *data, 
size_t size)
 }
 
 static inline bool
-extract_l4_udp(struct conn_key *key, const void *data, size_t size)
+extract_l4_udp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
 const struct udp_header *udp = data;
 
-if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
 return false;
 }
 
@@ -1622,7 +1624,7 @@ extract_l4_udp(struct conn_key *key, const void *data, 
size_t size)
 
 static inline bool extract_l4(struct conn_key *key, const void *data,
   size_t size, bool *related, const void *l3,
-  bool validate_checksum);
+  bool validate_checksum, size_t *chk_len);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -1654,11 +1656,11 @@ reverse_icmp_type(uint8_t type)
  * possible */
 static inline int
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *related, size_t *chk_len)
 {
 const struct icmp_header *icmp = data;
 
-if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
 return false;
 }
 
@@ -1708,8 +1710,9 @@ extract_l4_icmp(struct conn_key *key, const void *data, 
size_t size,
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
+size_t check_len = ICMP_ERROR_DATA_L4_LEN;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, _len);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1797,7 +1800,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, 
size_t size,
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, NULL);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1822,26 +1825,29 @@ extract_l4_icmp6(struct conn_key *key, const void 
*data, size_t size,
  * an ICMP or ICMP6 header. **
  *
  * If 'related' is NULL, it means that we're already parsing a header nested
- * in an ICMP error.  In this case, we skip checksum and length validation. */
+ * in an ICMP error.  In this case, we skip the checksum and some length
+ * validations. */
 static inline bool
 extract_l4(struct conn_key *key, const void *data, size_t size, bool *related,
-   const void *l3, bool validate_checksum)
+   const void *l3, bool validate_checksum, size_t *chk_len)
 {
 if (key->nw_proto == IPPROTO_TCP) {
 return (!related || check_l4_tcp(key, data, size, l3,
-validate_checksum)) && extract_l4_tcp(key, data, size);
+validate_checksum))
+   && extract_l4_tcp(key, data, size, chk_len);
 } else if (key->nw_proto == IPPROTO_UDP) {
 return (!related || che

Re: [ovs-dev] [Branch 2.9 Patch] compat: Fixup ipv6 fragmentation on 4.9.135+ kernels

2019-09-20 Thread Vishal Deep Ajmera via dev

> >
> > I'll be able to provide the patches but you will need to test them
> > since I no longer have an
> > environment that will build branches older than 2.8.  I'll post
> > patches for 2.8, 2.7 and 2.6
> > in a bit.
> 
> The patches are posted Vishal, please test and review.
> 
Thanks Greg. I will apply the patches and compile the package on target
system.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [Branch 2.9 Patch] compat: Fixup ipv6 fragmentation on 4.9.135+ kernels

2019-09-19 Thread Vishal Deep Ajmera via dev
> > Upstream commit 648700f76b03 ("inet: frags: use rhashtables...") changed
> > how ipv6 fragmentation is implemented.  This patch was backported to
> > the upstream stable 4.9.x kernel starting at 4.9.135.
> >
> > This patch creates the compatibility layer changes required to both
> > compile and also operate correctly with ipv6 fragmentation on these
> > kernels. Check if the inet_frags 'rnd' field is present to key on
> > whether the upstream patch is present.  Also update Travis to the
> > latest 4.9 kernel release so that this patch is compile tested.
> >
> > Passes Travis:
> > https://travis-ci.org/gvrose8192/ovs-experimental/builds/478033409
> >
Hi Ben, Greg,

We are using OVS 2.6 branch and facing same issue with kernel 4.4.177
(ubuntu-4.4.0.148). Looks like this field 'rnd' is not present in inet_frags
structure here.

Is it possible to get a backport of this patch on 2.6  & 2.8 branch as well
?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 0/1] Balance-tcp bond mode optimization

2019-09-19 Thread Vishal Deep Ajmera via dev
> 
> All unit test passed, plus others I did.
> 
> Tested-by: Matteo Croce 
> 
Thank you Matteo for your efforts in testing this patch.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 0/1] Balance-tcp bond mode optimization

2019-09-17 Thread Vishal Deep Ajmera via dev

> >
> Let me check this in my setup. I always used 'netdev' bridges for testing my
> patch.
> May be I need to be check for data path support in the display function as
> well.

Hi,
I have sent v7 version of patch fixing this issue.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v7 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-09-17 Thread Vishal Deep Ajmera via dev
ed idea, the following perf improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.535.8929.96
| 10 4.165.8941.51
| 4003.555.5556.22
| 1k 3.445.4558.30
| 10k2.504.6385.34
| 100k   2.294.2786.15
| 500k   2.254.2789.23
+--+
mpps: million packets per second.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 515 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 ++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  52 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  39 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 700 insertions(+), 60 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 65a003a..20467f9 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -734,6 +734,7 @@ enum ovs_hash_alg {
OVS_HASH_ALG_L4,
 #ifndef __KERNEL__
OVS_HASH_ALG_SYM_L4,
+   OVS_HASH_ALG_L4_RSS,
 #endif
__OVS_HASH_MAX
 };
@@ -989,6 +990,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a88a78f..05a0ca3 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -366,6 +367,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -596,6 +602,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets' */
+struct slave_entry {
+uint32_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -708,6 +728,11 @@ struct dp_netdev_pmd_thread {
 atomic_bool reload_tx_qid;  /* Do we need to reload static_tx_qid? */
 atomic_bool exit;   /* For terminating the pmd thread. */
 
+atomic_bool reload_bond_cache;  /* Do we need to load tx bond cache?
+ * Note: This flag is decoupled from 
'reload'
+ * flag otherwise full pmd reload will 
become
+ * frequent and costly everytime bond
+ * rebalancing is done. */
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread

[ovs-dev] [PATCH v7 0/1] Balance-tcp bond mode optimization

2019-09-17 Thread Vishal Deep Ajmera via dev
v6->v7:
 Fixed issue reported by Matteo for bond/show.

v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 515 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 ++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  52 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  39 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   6 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 700 insertions(+), 60 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 0/1] Balance-tcp bond mode optimization

2019-09-14 Thread Vishal Deep Ajmera via dev
> >
> > I confirm a decent performance improvement with DPDK and balance-tcp
> bonding:
> >
> > lb-output-action=false
> >
> > rx: 740 Mbps 1446 kpps
> >
> > lb-output-action=true
> >
> > rx: 860 Mbps 1680 kpps
> >
> > I'm running a very simple test with a tweaked version of testpmd which
> > generates 256 L4 flows, I guess that with much flows the improvement
> > is way higher.
> >
> > Tested-by: Matteo Croce 
> >

Thank you Matteo for testing this patch.

> I found an issue with the patch. It's not 100% reproducible, but sometimes 
> the
> option gets enabled regardless of the bonding type and the configuration.
> This breaks the unit tests, as the bond/show output is wrong:
>
> # ovs-vsctl add-br br0
> [63129.589500] device ovs-system entered promiscuous mode [63129.591802]
> device br0 entered promiscuous mode
>
> # ovs-vsctl add-bond br0 bond dummy0 dummy1 -- set Port bond lacp=active
> bond-mode=active-backup [63150.700542] device dummy1 entered
> promiscuous mode [63150.700892] device dummy0 entered promiscuous mode
>
Let me check this in my setup. I always used 'netdev' bridges for testing my 
patch.
May be I need to be check for data path support in the display function as 
well.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v6 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-09-10 Thread Vishal Deep Ajmera via dev
ed idea, the following perf improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.535.8929.96
| 10 4.165.8941.51
| 4003.555.5556.22
| 1k 3.445.4558.30
| 10k2.504.6385.34
| 100k   2.294.2786.15
| 500k   2.254.2789.23
+--+
mpps: million packets per second.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 515 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 ++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  52 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  39 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 698 insertions(+), 60 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 65a003a..20467f9 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -734,6 +734,7 @@ enum ovs_hash_alg {
OVS_HASH_ALG_L4,
 #ifndef __KERNEL__
OVS_HASH_ALG_SYM_L4,
+   OVS_HASH_ALG_L4_RSS,
 #endif
__OVS_HASH_MAX
 };
@@ -989,6 +990,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a88a78f..05a0ca3 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -366,6 +367,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -596,6 +602,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets' */
+struct slave_entry {
+uint32_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -708,6 +728,11 @@ struct dp_netdev_pmd_thread {
 atomic_bool reload_tx_qid;  /* Do we need to reload static_tx_qid? */
 atomic_bool exit;   /* For terminating the pmd thread. */
 
+atomic_bool reload_bond_cache;  /* Do we need to load tx bond cache?
+ * Note: This flag is decoupled from 
'reload'
+ * flag otherwise full pmd reload will 
become
+ * frequent and costly everytime bond
+ * rebalancing is done. */
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this

[ovs-dev] [PATCH v6 0/1] Balance-tcp bond mode optimization

2019-09-10 Thread Vishal Deep Ajmera via dev
v5->v6:
 Addressed comments from Ilya Maximets.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362001.html
 Rebased to OVS master.

v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 515 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 ++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  52 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  39 +-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 698 insertions(+), 60 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v1] conntrack: Fix ICMPV4 error data L4 length check.

2019-08-28 Thread Vishal Deep Ajmera
Thanks Darrell. Patch looks ok to me.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v1] conntrack: Fix ICMPV4 error data L4 length check.

2019-08-28 Thread Vishal Deep Ajmera
That is interesting
i just tried applying on top of tree and I see that the git applies some 
changes (2 lines)
in extract_l4_icmp6() rather the intended extract_l4_icmp() as in the patch I 
sent out.
My guess is that the surrounding lines are identical in the 2 functions and I 
had other
patches in the same branch shifting the patch downward, hence git applied the 
changes
to extract_l4_icmp6() rather than extract_l4_icmp()

I'll make the changes on a clean branch and resend.

Thanks. I applied this patch and looks ok to me.

JTBC, the 8 byte ICMP error data L4 length restriction is only for V4.
ICMP6 does not have this restriction; see https://tools.ietf.org/html/rfc4443

In my opinion, we should limit the check to < 8 bytes even in case of ICMPv6 as 
that is all
is required from the TCP header to extract port numbers and aligns it with 
ICMPv4.
Specially because RFC is not mandating minimum size for L4 header in case of 
ICMPv6.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-27 Thread Vishal Deep Ajmera
Thanks Ilya for comments. I will address them in the next patch-set.

Warm Regards,
Vishal Ajmera

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-27 Thread Vishal Deep Ajmera
> 
> Hi Vishal,
> 
> I quickly tested your patch on two servers with 2 ixgbe cards each linked via 
> a
> juniper switch with LACP.
> With testpmd using a single core the switching rate raised from ~2.0 Mpps to
> ~2.4+ Mpps, so I read at least a +20% gain.
> 
> Please add an example command on how to enable it in the commit message,
> e.g.
> 
>   ovs-vsctl set port bond0 other_config:opt-bond-tcp=true

Thanks Matteo for testing the patch and sharing results. I will add example in 
the commit message for next patch-set.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v1] conntrack: Fix ICMPV4 error data L4 length check.

2019-08-27 Thread Vishal Deep Ajmera
Hi Darrell,

Thanks for the patch. When I applied the patch to latest master, 
I see that we take care of length check (< 8) only for ICMPv6 and 
not for ICMPv4. We need to do it for ICMPv4 as well.

Also, we are already using 'related' to skip or not to skip length check.

 * If 'related' is NULL, it means that we're already parsing a header nested
 * in an ICMP error.  In this case, we skip checksum and length validation.

However we continue to validate length in extract_l4_tcp (<8 or <20). 
I understand that check for minimum 8 bytes header is needed to make
sure we can extract tcp port numbers.

Can we instead try to converge all checks at one place and still take care 
of nested header? In my opinion it will simplify the code.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: Correct length check for tcp packet inside ICMP data.

2019-08-26 Thread Vishal Deep Ajmera
Hi Darrell,

Can we get this merged into master as well as backports to affected branches?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-25 Thread Vishal Deep Ajmera
Hi Ilya, Ben,

Any thoughts on patch-set v5? Rebalancing is now supported with optimized tcp 
bond implementation.

Warm Regards,
Vishal Ajmera

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org  boun...@openvswitch.org> On Behalf Of Vishal Deep Ajmera
> Sent: Friday, August 9, 2019 9:34 AM
> To: Ilya Maximets ; d...@openvswitch.org
> Cc: Manohar Krishnappa Chidambaraswamy ; Nitin
> Katiyar 
> Subject: Re: [ovs-dev] [PATCH v3] Avoid dp_hash recirculation for balance-tcp
> bond selection mode
> 
> 
> > Comments for v2 wasn't addressed and still valid:
> > https://protect2.fireeye.com/url?k=c85828c4-94d1f2d4-c858685f-
> > 0cc47ad93e2a-
> >
> cd974aa1a210bc2b=1=https%3A%2F%2Fmail.openvswitch.org%2Fpiperma
> > il%2Fovs-dev%2F2019-July%2F360452.html
> 
> Patch-set v5 set for review. It now includes bond_rebalance support.
> 
> A new atomic flag reload_bond_cache is introduced to allow pmd to reload only
> bond cache in order to apply rebalancing. This does not require full pmd 
> reload
> (including port/queue reloads, dfc flush etc.). Also rebalancing can be a 
> frequent
> operation hence it will save pmd from frequent full reloads minimizing traffic
> impact.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] conntrack: Correct length check for tcp packet inside ICMP data.

2019-08-23 Thread Vishal Deep Ajmera
An ICMP packet with type destination or host not reachable also carries
28 bytes of ICMP data field. This data field contains IP header and TCP
header (partial first 8 bytes) of the original packet for which ICMP
is being generated.

Conntrack module when processing these ICMP packets checks for TCP header
length (20 bytes). Since TCP header is partial the length check fails and
packet is erroneously dropped.

This patch fixes length check for TCP header when processing ICMP data
fields.

Signed-off-by: Vishal Deep Ajmera 
---
 lib/conntrack.c | 14 +++---
 lib/packets.h   |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5f60fea..0618fdd 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1513,10 +1513,18 @@ check_l4_icmp6(const struct conn_key *key, const void 
*data, size_t size,
 return validate_checksum ? checksum_valid(key, data, size, l3) : true;
 }
 
+/* If related is NULL, we are parsing nested TCP header  inside ICMP packet.
+ * Only 8 bytes of TCP header is required by RFC to be present in such case.
+ */
 static inline bool
-extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
+extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
+   bool *related)
 {
-if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
+if (!related) {
+if (size < ICMP_L4_DATA_LEN) {
+return false;
+}
+} else if (size < TCP_HEADER_LEN) {
 return false;
 }
 
@@ -1750,7 +1758,7 @@ extract_l4(struct conn_key *key, const void *data, size_t 
size, bool *related,
 {
 if (key->nw_proto == IPPROTO_TCP) {
 return (!related || check_l4_tcp(key, data, size, l3,
-validate_checksum)) && extract_l4_tcp(key, data, size);
+  validate_checksum)) && extract_l4_tcp(key, data, size, related);
 } else if (key->nw_proto == IPPROTO_UDP) {
 return (!related || check_l4_udp(key, data, size, l3,
 validate_checksum)) && extract_l4_udp(key, data, size);
diff --git a/lib/packets.h b/lib/packets.h
index a4bee38..2bc65c9 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -886,6 +886,7 @@ struct tcp_header {
 ovs_be16 tcp_urg;
 };
 BUILD_ASSERT_DECL(TCP_HEADER_LEN == sizeof(struct tcp_header));
+#define ICMP_L4_DATA_LEN 8
 
 /* Connection states.
  *
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-13 Thread Vishal Deep Ajmera


> 
> Hi,
> 
> why not a static function in the header file? So it gets inlined.
> 
> Regards,
> --
> Matteo Croce
> per aspera ad upstream

Thanks Matteo for looking into this patch-set. Yes I agree. I will address your 
suggestion in the next revision.

Warm Regards,
Vishal
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-08 Thread Vishal Deep Ajmera
 
> Comments for v2 wasn't addressed and still valid:
> https://protect2.fireeye.com/url?k=c85828c4-94d1f2d4-c858685f-
> 0cc47ad93e2a-
> cd974aa1a210bc2b=1=https%3A%2F%2Fmail.openvswitch.org%2Fpiperma
> il%2Fovs-dev%2F2019-July%2F360452.html

Patch-set v5 set for review. It now includes bond_rebalance support.

A new atomic flag reload_bond_cache is introduced to allow pmd to 
reload only bond cache in order to apply rebalancing. This does not require
full pmd reload (including port/queue reloads, dfc flush etc.). Also rebalancing
can be a frequent operation hence it will save pmd from frequent full reloads
minimizing traffic impact.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5] Balance-tcp bond mode optimization

2019-08-08 Thread Vishal Deep Ajmera
v4->v5:
 Support for stats per hash bucket.
 Support for dynamic load balancing.
 Rebased to OVS Master.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 528 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 ++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  52 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  65 ++-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 736 insertions(+), 61 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-08-08 Thread Vishal Deep Ajmera
-
With a prototype of the proposed idea, the following perf improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.535.8929.96
| 10 4.165.8941.51
| 4003.555.5556.22
| 1k 3.445.4558.30
| 10k2.504.6385.34
| 100k   2.294.2786.15
| 500k   2.254.2789.23
+--+
mpps: million packets per second.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 528 --
 lib/dpif-netlink.c|   3 +
 lib/dpif-provider.h   |   8 +
 lib/dpif.c|  48 ++
 lib/dpif.h|   7 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  52 ++-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  65 ++-
 ofproto/ofproto-dpif.c|  32 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 736 insertions(+), 61 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 65a003a..6dafcfb 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -734,6 +734,7 @@ enum ovs_hash_alg {
OVS_HASH_ALG_L4,
 #ifndef __KERNEL__
OVS_HASH_ALG_SYM_L4,
+OVS_HASH_ALG_L4_RSS,
 #endif
__OVS_HASH_MAX
 };
@@ -989,6 +990,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d0a1c58..9db2a73 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -366,6 +367,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -596,6 +602,20 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct tx_bond 'slave_buckets' */
+struct slave_entry {
+uint32_t slave_id;
+atomic_ullong n_packets;
+atomic_ullong n_bytes;
+};
+
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+struct slave_entry slave_buckets[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -708,6 +728,11 @@ struct dp_netdev_pmd_thread {
 atomic_bool reload_tx_qid;  /* Do we need to reload static_tx_qid? */
 atomic_bool exit;   /* For terminating the pmd thread. */
 
+atomic_bool reload_bond_cache;  /* Do we need to load tx bond cache?
+ * Note: This flag is decoupled from 
'reload'
+ * flag otherwise full pmd reload will 
become
+ * frequent and costly everytime bond
+ * rebalancing is done. */
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;   

Re: [ovs-dev] [PATCH] flow: Wildcard UDP ports when using SYMMETRIC_L4 hash for select groups.

2019-07-17 Thread Vishal Deep Ajmera
> 
> Applied to master, thanks!

Thanks Ben.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-07-15 Thread Vishal Deep Ajmera
-
With a prototype of the proposed idea, the following perf improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.535.8929.96
| 10 4.165.8941.51
| 4003.555.5556.22
| 1k 3.445.4558.30
| 10k2.504.6385.34
| 100k   2.294.2786.15
| 500k   2.254.2789.23
+--+
mpps: million packets per second.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 462 --
 lib/dpif-netlink.c|   2 +
 lib/dpif-provider.h   |   5 +
 lib/dpif.c|  36 ++
 lib/dpif.h|   5 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  35 +-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  65 ++-
 ofproto/ofproto-dpif.c|  31 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 640 insertions(+), 55 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 65a003a..6dafcfb 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -734,6 +734,7 @@ enum ovs_hash_alg {
OVS_HASH_ALG_L4,
 #ifndef __KERNEL__
OVS_HASH_ALG_SYM_L4,
+OVS_HASH_ALG_L4_RSS,
 #endif
__OVS_HASH_MAX
 };
@@ -989,6 +990,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 6b99a3c..9f67325 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -78,6 +78,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -383,6 +384,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -613,6 +619,13 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+uint32_t slave_map[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -745,6 +758,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Mutex for 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread,
+ * read/written by the pmd thread. */
+struct hmap tx_bonds OVS_GUARDED;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
  * ports (that support push_tunnel/pop_tunnel), the other contains ports
  * with at least one txq (that support send).  A port can be in both.
@@ -757,6 +775,8 @@ struct dp_netdev_pmd_thread {
  * other instance will only be accessed by its own pmd thread. */
 struct hmap tnl_port_cache;
 struct hmap send_port_cache;
+/* These are thread-local copies of 'tx_bonds' */
+struct hmap bond_cache;
 
 /* Keep track of detailed PMD perf

[ovs-dev] [PATCH v4] Balance-tcp bond mode optimization

2019-07-15 Thread Vishal Deep Ajmera
v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

v3->v4:
 Addressed Ilya Maximets comments.
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360452.html

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 462 --
 lib/dpif-netlink.c|   2 +
 lib/dpif-provider.h   |   5 +
 lib/dpif.c|  36 ++
 lib/dpif.h|   5 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  35 +-
 ofproto/bond.h|   9 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  65 ++-
 ofproto/ofproto-dpif.c|  31 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 18 files changed, 640 insertions(+), 55 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] flow: Wildcard UDP ports when using SYMMETRIC_L4 hash for select groups.

2019-07-15 Thread Vishal Deep Ajmera
> 
> This patch wildcards UDP ports when using select group with SYMMETRIC_L4
> hash function.
> 

Hi Ben,

Any thoughts on this patch ? This looks like a bug in OVS master.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] flow: Wildcard UDP ports when using SYMMETRIC_L4 hash for select groups.

2019-07-09 Thread Vishal Deep Ajmera
UDP source and destination ports are not used to derive the hash index
used for selecting the bucket in case of SYMMETRIC_L4 hash based select
groups. However, they are un-wildcarded in the megaflow entry match criteria.
This results in distinct megaflow entry being created for each pair of UDP
source and destination ports unnecessarily and causes significant performance
deterioration when the megaflow cache limit is reached.

This patch wildcards UDP ports when using select group with SYMMETRIC_L4
hash function.

Signed-off-by: Vishal Deep Ajmera 
CC: Jan Scheurich 
---
 lib/flow.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/flow.c b/lib/flow.c
index de93704..95da7d4 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -2478,7 +2478,12 @@ flow_mask_hash_fields(const struct flow *flow, struct 
flow_wildcards *wc,
 }
 if (is_ip_any(flow)) {
 memset(>masks.nw_proto, 0xff, sizeof wc->masks.nw_proto);
-flow_unwildcard_tp_ports(flow, wc);
+/* Unwildcard port only for non-UDP packets as udp port
+ * numbers are not used in hash calculations.
+ */
+if (flow->nw_proto != IPPROTO_UDP) {
+flow_unwildcard_tp_ports(flow, wc);
+}
 }
 for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
 wc->masks.vlans[i].tci |= htons(VLAN_VID_MASK | VLAN_CFI);
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Preserve ofport number for failed ofport across restarts

2019-07-09 Thread Vishal Deep Ajmera
> 
> I've been taking a look at this patch for the last few minutes.  It 
> introduces a lot
> of mechanism for the use case.  Did you consider any simpler mechanisms to
> achieve the same effect?  What prevented them from working?
> 
I agree Ben. This change does bring some complexity to the code. A simpler 
solution would be to leave the ofport number in ovsdb record of the interface 
(instead of setting it to -1). But issue was when the port delete happens we do 
not see this record to be able to free the ofport number. Is there a way to 
handle this?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-07-09 Thread Vishal Deep Ajmera
-
With a prototype of the proposed idea, the following perf improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.535.8929.96
| 10 4.165.8941.51
| 4003.555.5556.22
| 1k 3.445.4558.30
| 10k2.504.6385.34
| 100k   2.294.2786.15
| 500k   2.254.2789.23
+--+
mpps: million packets per second.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 491 --
 lib/dpif-provider.h   |   5 +
 lib/dpif.c|  36 ++
 lib/dpif.h|   5 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  34 +-
 ofproto/bond.h|   8 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  67 ++-
 ofproto/ofproto-dpif.c|  31 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 16 files changed, 666 insertions(+), 55 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 65a003a..6dafcfb 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -734,6 +734,7 @@ enum ovs_hash_alg {
OVS_HASH_ALG_L4,
 #ifndef __KERNEL__
OVS_HASH_ALG_SYM_L4,
+OVS_HASH_ALG_L4_RSS,
 #endif
__OVS_HASH_MAX
 };
@@ -989,6 +990,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f4b59e4..600a79e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -384,6 +385,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -614,6 +620,13 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+uint32_t slave_map[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -708,6 +721,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Mutex for 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread,
+ * read/written by the pmd thread. */
+struct hmap tx_bonds OVS_GUARDED;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
  * ports (that support push_tunnel/pop_tunnel), the other contains ports
  * with at least one txq (that support send).  A port can be in both.
@@ -720,6 +738,8 @@ struct dp_netdev_pmd_thread {
  * other instance will only be accessed by its own pmd thread. */
 struct hmap tnl_port_cache;
 struct hmap send_port_cache;
+/* These are thread-local copies of 'tx_bonds' */
+struct hmap bond_cache;
 
 /* Keep track of detailed PMD performance statistics. */
 struct pmd_perf_stats perf_stats;
@@ -799,6 +819,12 @@ static

[ovs-dev] [PATCH v3] Balance-tcp bond mode optimization

2019-07-09 Thread Vishal Deep Ajmera
v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

v2->v3:
 Rebased to OVS master.
 Fixed git merge issue.

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 491 --
 lib/dpif-provider.h   |   5 +
 lib/dpif.c|  36 ++
 lib/dpif.h|   5 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  34 +-
 ofproto/bond.h|   8 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  67 ++-
 ofproto/ofproto-dpif.c|  31 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 16 files changed, 666 insertions(+), 55 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-07-09 Thread Vishal Deep Ajmera
> 
> Thanks for the patch.
> 
> I wasn't able to apply this patch.  In two places, the patch was corrupted 
> (the
> patch didn't properly represent page-break characters in the source code but
> transformed them into blank lines).  I fixed that up after figuring out what 
> was
> going on, but after that there were patch rejects.
> 
> It might be a good idea to resubmit this as a Github pull request.
> 
Hi Ben,

I have some issues sending ovs patches through git send-email. I am trying to 
get it fixed. I will send the updated & rebased patch soon.

Apologies for the trouble.

Warm Regards,
Vishal
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-06-25 Thread Vishal Deep Ajmera
V2 patch- set is posted for review.

> 
> One more thing:
> Despite of usual OVS bonding, this implementation doesn't support
> shifting the load between ports. Am I right?
> This could be an issue, because few heavy flows could be mapped to
> a single port, while other ports will be underloaded. This will
> be a bad case for tunnelling where we have only few heavy flows.
> As I understood, this version of bonding doesn't support any load
> statistics.
> [manu] Yes that’s correct. This implementation does not yet support
> accumulation of per slave stats (in "struct bond_entry"). Since load 
> balancing is
> done without using the dp_hashed flows, rule level stats can't be used and
> bond_rebalance() won't take effect. I was planning to add per-slave stats
> collection/accumulation in OVS_ACTION_ATTR_LB_OUTPUT handling. This will
> be done in another patch set.
> 
I will address this in V3 patch set.

> Best regards, Ilya Maximets.
> 
> > Problem:
> > 
> > In OVS-DPDK, flows with output over a bond interface of type 
> “balance-tcp”
> > (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer 
> into
> > "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> > forwarded to the bond member port based on 8-bits of the datapath hash
> > value computed through dp_hash. This causes performance degradation in
> the
> > following ways:
> >
> > 1. L4-Hash computation in software is CPU intensive, it consumes
> > considerable CPU cycles of the PMD.
> 
> RSS is in use in most cases in current master and 2.9. Details below.
> [manu] OK Thanx. I was working on an earlier version of OVS and didn’t notice 
> it
> while porting to master.
> 
> >
> > 2. The recirculation of the packet implies another lookup of the 
> packet’s
> > flow key in the exact match cache (EMC) and potentially Megaflow 
> classifier
> > (DPCLS). This is the biggest cost factor.
> >
> > 3. The recirculated packets have a new “RSS” hash and compete with the
> > original packets for the scarce number of EMC slots. This implies more
> > EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
> >
> > 4. The 256 extra megaflow entries per bond for dp_hash bond selection 
> put
> > additional load on the revalidation threads.
> >
> > Owing to this performance degradation, deployments stick to 
> “balance-slb”
> > bond mode even though it does not do active-active load balancing for
> > VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same
> > source MAC address.
> >
> > Proposed optimization:
> > --
> > This proposal has 2 main optimizations in balance-tcp handling at 
> egress.
> >
> > 1. When feasible, re-use the existing L4 RSS-hash of the packet for bond
> > selection instead of computing another L4-hash in software.
> 
> This is already done. See commit
> 95a6cb3497c3 ("odp-execute: Reuse rss hash in OVS_ACTION_ATTR_HASH.")
> 
> It was done a year ago and, currently, if RSS is available it's used for
> OVS_ACTION_ATTR_HASH while balanced bonding handling.
> 
> So, at least, you should reword a lot of RSS related comments around the
> code.
> [manu] With this I think OVS_ACTION_ATTR_HASH can be reused and only
> OVS_ACTION_ATTR_RECIRC action can be replaced with
> OVS_ACTION_ATTR_LB_OUTPUT.
> So it will be "HASH + LB-OUTPUT" instead of existing "HASH + RECIRC".
> Will evaluate this and then send v2 diffs.

Included this in V2 patch set.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-06-25 Thread Vishal Deep Ajmera
 - slave 1

Performance improvement:

With a prototype of the proposed idea, the following perf improvement is seen
with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
is even more enhanced (due to reduced number of flows).

1 VM:
*
+--+
| mpps |
+--+
| Flows  master  with-opt.   %delta|
+--+
| 1  4.535.8929.96
| 10 4.165.8941.51
| 4003.555.5556.22
| 1k 3.445.4558.30
| 10k2.504.6385.34
| 100k   2.294.2786.15
| 500k   2.254.2789.23
+--+
mpps: million packets per second.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Vishal Deep Ajmera 

CC: Jan Scheurich 
CC: Venkatesan Pradeep 
CC: Nitin Katiyar 
---
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 491 --
 lib/dpif-provider.h   |   5 +
 lib/dpif.c|  36 ++
 lib/dpif.h|   5 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  34 +-
 ofproto/bond.h|   8 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  67 ++-
 ofproto/ofproto-dpif.c|  31 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 16 files changed, 666 insertions(+), 55 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 65a003a..6dafcfb 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -734,6 +734,7 @@ enum ovs_hash_alg {
OVS_HASH_ALG_L4,
 #ifndef __KERNEL__
OVS_HASH_ALG_SYM_L4,
+OVS_HASH_ALG_L4_RSS,
 #endif
__OVS_HASH_MAX
 };
@@ -989,6 +990,7 @@ enum ovs_action_attr {
 #ifndef __KERNEL__
OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */
+   OVS_ACTION_ATTR_LB_OUTPUT, /* bond-id */
 #endif
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 4e73f96..b38c259 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,6 +79,7 @@
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -384,6 +385,11 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+/* Bonds.
+ *
+ * Any lookup into 'bonds' requires taking 'bond_mutex'. */
+struct ovs_mutex bond_mutex;
+struct hmap bonds;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -612,6 +618,13 @@ struct tx_port {
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+/* Contained by struct dp_netdev_pmd_thread's 'bond_cache' or 'tx_bonds'. */
+struct tx_bond {
+struct hmap_node node;
+uint32_t bond_id;
+uint32_t slave_map[BOND_BUCKETS];
+};
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -706,6 +719,11 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
+struct ovs_mutex bond_mutex;/* Mutex for 'tx_bonds'. */
+/* Map of 'tx_bond's used for transmission.  Written by the main thread,
+ * read/written by the pmd thread. */
+struct hmap tx_bonds OVS_GUARDED;
+
 /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
  * ports (that support push_tunnel/pop_tunnel), the other contains ports
  * with at least one txq (that support send).  A port can be in both.
@@ -718,6 +736,8 @@ struct dp_netdev_pmd_thread {
  * other instance will only be accessed by its own pmd thread. */
 struct hmap tnl_port_cache;
 struct hmap send_port_cache;
+/* These are thread-local copies of 'tx_bonds' */
+struct hmap bond_cache;
 
 /* Keep track of detailed PMD performance statistics. */
 s

[ovs-dev] [PATCH v2] Balance-tcp bond mode optimization

2019-06-25 Thread Vishal Deep Ajmera
v1->v2:
 Updated datapath action to hash + lb-output.
 Updated throughput test observations.
 Rebased to OVS master.

Vishal Deep Ajmera (1):
  Avoid dp_hash recirculation for balance-tcp bond selection mode

 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/dpif-netdev.c | 491 --
 lib/dpif-provider.h   |   5 +
 lib/dpif.c|  36 ++
 lib/dpif.h|   5 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c|   4 +
 ofproto/bond.c|  34 +-
 ofproto/bond.h|   8 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  |  67 ++-
 ofproto/ofproto-dpif.c|  31 ++
 ofproto/ofproto-dpif.h|  12 +-
 tests/lacp.at |   9 +
 vswitchd/bridge.c |   4 +
 vswitchd/vswitch.xml  |  10 +
 16 files changed, 666 insertions(+), 55 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Preserve ofport number for failed ofport across restarts

2019-06-13 Thread Vishal Deep Ajmera


> 
> This patch fixes both the above issue by retaining the ofport number till 
> ofport
> entry exists in the OVSDB.
> 
> Warm Regards,
> Vishal Ajmera

Hi Ben,

Did you get a chance to review this patch ?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Preserve ofport number for failed ofport across restarts

2019-06-06 Thread Vishal Deep Ajmera
> >
> > This patch reserves the ofport port number for the duration the ofport
> > entry persists in the ovsdb. Once the entry is deleted from ovsdb the
> > ofport number is recirculated on LRU basis.
> >
> > Signed-off-by: Vishal Deep Ajmera 
> 
> This is an interesting patch and I like the idea.  The implementation does add
> some conceptual complexity.
> 
> Can you say some more about the scenario that this covers?  It isn't one that 
> I've
> heard come up very much in the past.  Does it arise in real situations that 
> you've
> encountered?

Hi Ben,

Thanks for looking into this patch. We have 2 scenarios in our deployment where 
we encountered issue.

Scenario 1: (SDN Controller - ODL, OpenStack - Pike)

Assume we have 4 ports in ovs on "netdev" bridge br-int: 
tap0 (1)
tap1 (2)
vhu1 (3)
vhu2 (4)
() denotes the ofport number allocated to the port by OVS.

If the host compute is rebooted, the tap devices tap0 and tap1 are not 
recreated 
(or atleast not immediately). When OVS restarts the corresponding tap 
interfaces 
are in error state: "could not open network device tap0 (No such device)".  
Unfortunately due to this error, OVS sets the ofport field to -1 in OVSDB which 
means these ofport numbers (1) and (2) are now available to be allocated to any 
new port. At this time, SDN controller will need to remove all the flows 
pertaining to tap ports as ofport numbers are no longer same. 

A peculiar behavior we observed with OpenStack Pike was that when compute was 
restarted, it also triggered deletes and adds of all VHU ports (and not as a 
single transaction). This also resulted in change of existing vhu's ofport 
number.

After compute reboot:

tap0 (-1) -> "could not open network device tap0 (No such device)"
tap1 (-1) -> "could not open network device tap1 (No such device)"
vhu1 (3)
vhu2 (4)

Then a delete/add of vhu port changes the ofport allocation as follows:
tap0 (-1)
tap1 (-1)
vhu1 (1)
vhu2 (2)

Things got messed up here as SDN controller on one hand trying to remove 
flows for tap0 /tap1 and on the other also modifying flows for vhu1 and vhu2. 
Unfortunately both tap0 and vhu1 has ofport (1) and tap1 and vhu2 has ofport 
(2). 
Until the time flows are properly cleaned by SDN controller we end up in stale 
flows of tap ports being used for vhu ports. If OVS avoids giving same ofport 
number (1) and (2) to existing VHU ports or any new port, this situation could 
be avoided.

Scenario 2: (SDN Controller - ODL, OpenStack - Pike)
We have another scenario in our deployment, where in we always start OVS with 
DPDK enabled. But if DPDK initialization fails due to any error, we fall back to
non-dpdk mode and start OVS. This is to make sure we still manage to have 
connectivity with the computes. But this causes all VHU's and DPDK ports to 
fail and
frees up their ofport numbers. Once we correct the underlying DPDK issue and
start OVS again with dpdk mode, SDN controller is now required to reconstruct 
(and 
not just replay) all the flows which it had earlier as the ofport numbers are 
not guaranteed to be same again.

This patch fixes both the above issue by retaining the ofport number till ofport
entry exists in the OVSDB.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto: Preserve ofport number for failed ofport across restarts

2019-06-04 Thread Vishal Deep Ajmera
Current OVS implementation frees the ofport number allocated for
a given ofport if the underlying interface could not get initialized
due to any error. However it does not delete the entry of ofport from
ovsdb. This means any new ofport can get this ofport number anytime
in future. When the underlying interface issue is resolved for the
ofport it gets a new ofport number from the pool. This has an
undesired effect where any SDN controller will have to modify all the
flows pertaining this ofport due to change in ofport number and
reprogram the flows in the switch.

This patch reserves the ofport port number for the duration the ofport
entry persists in the ovsdb. Once the entry is deleted from ovsdb the
ofport number is recirculated on LRU basis.

Signed-off-by: Vishal Deep Ajmera 
---
 ofproto/ofproto.c |  17 +--
 ofproto/ofproto.h |   2 +
 vswitchd/bridge.c | 145 +++---
 3 files changed, 132 insertions(+), 32 deletions(-)

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 1d6fc00..df77859 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -190,8 +190,6 @@ static void reinit_ports(struct ofproto *);
 
 static long long int ofport_get_usage(const struct ofproto *,
   ofp_port_t ofp_port);
-static void ofport_set_usage(struct ofproto *, ofp_port_t ofp_port,
- long long int last_used);
 static void ofport_remove_usage(struct ofproto *, ofp_port_t ofp_port);
 
 /* Ofport usage.
@@ -2264,10 +2262,23 @@ static ofp_port_t
 alloc_ofp_port(struct ofproto *ofproto, const char *netdev_name)
 {
 uint16_t port_idx;
+struct ofport *port;
 
 port_idx = simap_get(>ofp_requests, netdev_name);
 port_idx = port_idx ? port_idx : UINT16_MAX;
 
+/* Check if port_idx is not allocated to any other interface.
+ * If available, then reserve for given netdev and skip
+ * allocation algorithm. */
+if (port_idx != UINT16_MAX) {
+port = ofproto_get_port(ofproto, u16_to_ofp(port_idx));
+if ((!port) || (port->netdev != NULL &&
+!strcmp(netdev_name, netdev_get_name(port->netdev {
+ofport_set_usage(ofproto, u16_to_ofp(port_idx), LLONG_MAX);
+return u16_to_ofp(port_idx);
+}
+}
+
 if (port_idx >= ofproto->max_ports
 || ofport_get_usage(ofproto, u16_to_ofp(port_idx)) == LLONG_MAX) {
 uint16_t lru_ofport = 0, end_port_no = ofproto->alloc_port_no;
@@ -2577,7 +2588,7 @@ ofport_get_usage(const struct ofproto *ofproto, 
ofp_port_t ofp_port)
 return 0;
 }
 
-static void
+void
 ofport_set_usage(struct ofproto *ofproto, ofp_port_t ofp_port,
  long long int last_used)
 {
diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index 6e4afff..c236e0b 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -313,6 +313,8 @@ const char *ofproto_port_open_type(const struct ofproto *,
const char *port_type);
 int ofproto_port_add(struct ofproto *, struct netdev *, ofp_port_t *ofp_portp);
 int ofproto_port_del(struct ofproto *, ofp_port_t ofp_port);
+void ofport_set_usage(struct ofproto *, ofp_port_t ofp_port,
+  long long int last_used);
 void ofproto_port_set_config(struct ofproto *, ofp_port_t ofp_port,
  const struct smap *cfg);
 int ofproto_port_get_stats(const struct ofport *, struct netdev_stats *stats);
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 0702cc6..4852ef0 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -137,6 +137,8 @@ struct bridge {
 
 /* Used during reconfiguration. */
 struct shash wanted_ports;
+struct hmap failed_ports;   /* List of ports for which interface
+ * add failed. */
 
 /* Synthetic local port if necessary. */
 struct ovsrec_port synth_local_port;
@@ -273,10 +275,17 @@ static bool port_is_bond_fake_iface(const struct port *);
 static unixctl_cb_func qos_unixctl_show_types;
 static unixctl_cb_func qos_unixctl_show;
 
-static struct port *port_create(struct bridge *, const struct ovsrec_port *);
-static void port_del_ifaces(struct port *);
+static struct port *port_create(struct bridge *, const struct ovsrec_port *,
+bool stale);
+static void port_del_ifaces(struct port *, bool stale);
 static void port_destroy(struct port *);
-static struct port *port_lookup(const struct bridge *, const char *name);
+static struct port *port_lookup(const struct bridge *, const char *name,
+bool stale);
+
+static void failed_port_destroy(struct port *);
+static void failed_iface_insert(struct port *port,
+const struct ovsrec_interface *iface_cfg);
+
 static void port_configure(struct port *);
 static struct lacp_settings *port_configure_lacp(struct port *,
  

Re: [ovs-dev] [PATCH v4] Support for match & set ICMPv6 reserved and options type fields

2019-02-04 Thread Vishal Deep Ajmera
> > Currently OVS supports all ARP protocol fields as OXM match fields to
> > implement the relevant ARP procedures for IPv4. This includes support
> > for matching copying and setting ARP fields. In IPv6 ARP has been
> > replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
> > advertisement and neighbor solicitation.
> >
> > The support for ICMPv6 fields in OVS is not complete for the use cases
> > equivalent to ARP in IPv4. OVS lacks support for matching, copying and
> > setting the “ND option type” and “ND reserved” fields. Without these
> > user cannot implement all ICMPv6 ND procedures for IPv6 support.
> >
> > This commit adds additional OXM fields to OVS for ICMPv6 “ND option
> > type“ and ICMPv6 “ND reserved” using the OXM extension mechanism. This
> > allows support for parsing these fields from an ICMPv6 packet header
> > and extending the OpenFlow protocol with specifications for these new
> > OXM fields for matching, copying and setting.
> >
> > Signed-off-by: Vishal Deep Ajmera 
> > Co-authored-by: Ashvin Lakshmikantha
> > 
> > Signed-off-by: Ashvin Lakshmikantha
> > 
> 
> Applied to master, thanks!

Thanks Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] Support for match & set ICMPv6 reserved and options type fields

2019-02-03 Thread Vishal Deep Ajmera
> 
> Currently OVS supports all ARP protocol fields as OXM match fields to
> implement the relevant ARP procedures for IPv4. This includes support
> for matching copying and setting ARP fields. In IPv6 ARP has been
> replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
> advertisement and neighbor solicitation.
> 
> The support for ICMPv6 fields in OVS is not complete for the use cases
> equivalent to ARP in IPv4. OVS lacks support for matching, copying and
> setting the “ND option type” and “ND reserved” fields. Without these user
> cannot implement all ICMPv6 ND procedures for IPv6 support.
> 
> This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
> and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
> support for parsing these fields from an ICMPv6 packet header and extending
> the OpenFlow protocol with specifications for these new OXM fields for
> matching, copying and setting.
> 
Hi Ben,

Does the changes in V4 look ok?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v3 2/2] conntrack: Exclude l2 padding in 'conn_key_extract()'.

2019-02-01 Thread Vishal Deep Ajmera
> 
> 'conn_key_extract()' in userspace conntrack is including L2
> (Ethernet) pad bytes for both L3 and L4 sizes. One problem is any packet with
> non-zero L2 padding can incorrectly fail L4 checksum validation.
> 
> This patch fixes conn_key_extract() by ignoring L2 pad bytes.
> 

Thanks Darrell for the patch. It looks fine. Can we get this in for upcoming 
OVS release ? 
Also need a backport till OVS branch v2.6 if possible.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] Support for match & set ICMPv6 reserved and options type fields

2019-01-28 Thread Vishal Deep Ajmera
Currently OVS supports all ARP protocol fields as OXM match fields to
implement the relevant ARP procedures for IPv4. This includes support
for matching copying and setting ARP fields. In IPv6 ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.

The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these user
cannot implement all ICMPv6 ND procedures for IPv6 support.

This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
support for parsing these fields from an ICMPv6 packet header and extending
the OpenFlow protocol with specifications for these new OXM fields for
matching, copying and setting.

Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Ashvin Lakshmikantha 
Signed-off-by: Ashvin Lakshmikantha 
---
 NEWS  |   2 +
 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   8 ++
 include/openvswitch/flow.h|   6 +-
 include/openvswitch/match.h   |   3 +
 include/openvswitch/meta-flow.h   |  28 +
 lib/flow.c|  58 +--
 lib/match.c   |  21 
 lib/meta-flow.c   |  38 +++
 lib/meta-flow.xml |  12 +++
 lib/nx-match.c|   8 ++
 lib/odp-execute.c |  32 ++
 lib/odp-util.c| 120 +-
 lib/packets.c |  33 ++
 lib/packets.h |   3 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 tests/odp.at  |   1 +
 tests/ofproto.at  |   4 +-
 18 files changed, 365 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index 4985dba..0fa530c 100644
--- a/NEWS
+++ b/NEWS
@@ -36,6 +36,8 @@ v2.11.0 - xx xxx 
  * Add support for Auto load balancing of PMDs (experimental)
  * Added new per-port configurable option to manage EMC:
'other_config:emc-enable'.
+ * ICMPv6 ND enhancements: support for match and set ND options type
+   and reserved fields.
- Add 'symmetric_l3' hash function.
- OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
- ovs-vswitchd:
diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 3592594..e159a1d 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -71,6 +71,7 @@ OXM_CLASSES = {"NXM_OF_":(0,  0x, 
'extension'),
"OXM_OF_":(0,  0x8000, 'standard'),
"OXM_OF_PKT_REG": (0,  0x8001, 'standard'),
"ONFOXM_ET_": (0x4f4e4600, 0x, 'standard'),
+   "ERICOXM_OF_":(0,  0x1000, 'extension'),
 
# This is the experimenter OXM class for Nicira, which is the
# one that OVS would be using instead of NXM_OF_ and NXM_NX_
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 9b087f1..d5aa09d 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -375,6 +375,7 @@ enum ovs_key_attr {
 #ifndef __KERNEL__
/* Only used within userspace data path. */
OVS_KEY_ATTR_PACKET_TYPE,  /* be32 packet type */
+   OVS_KEY_ATTR_ND_EXTENSIONS, /* struct ovs_key_nd_extensions */
 #endif
 
__OVS_KEY_ATTR_MAX
@@ -489,6 +490,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
+#ifndef __KERNEL__
+struct ovs_key_nd_extensions {
+__be32  nd_reserved;
+__u8nd_options_type;
+};
+#endif
+
 #define OVS_CT_LABELS_LEN_32   4
 #define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index 5d2cf09..57b6c92 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -144,7 +144,8 @@ struct flow {
 struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
 struct eth_addr arp_sha;/* ARP/ND source hardware address. */
 struct eth_addr arp_tha;/* ARP/ND target hardware address. */
-ovs_be16 tcp_flags; /* TCP flags. With L3 to avoid matching L4. */
+ovs_be16 tcp_flags; /* TCP flags/ICMPv6 ND options type.
+ * With L3 to avoid matching L4. */
 ovs_be16 pa

[ovs-dev] [PATCH v4] Enhancing ICMPv6 support

2019-01-28 Thread Vishal Deep Ajmera
This patch adds support for match and set ICMPv6
"reserved" and "nd options type" fields.

v1->v2: Fixed compiler and sparse warnings.
v2->v3: Updated NEWS, simplified miniflow_extract for ICMPv6,
updated usage for tcp_flags and igmpgroup_ipv4.
v3->v4: Added parsing test in odp.at. Rebased to latest master.

Vishal Deep Ajmera (1):
  Support for match & set ICMPv6 reserved and options type fields

 NEWS  |   2 +
 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   8 ++
 include/openvswitch/flow.h|   6 +-
 include/openvswitch/match.h   |   3 +
 include/openvswitch/meta-flow.h   |  28 +
 lib/flow.c|  58 +--
 lib/match.c   |  21 
 lib/meta-flow.c   |  38 +++
 lib/meta-flow.xml |  12 +++
 lib/nx-match.c|   8 ++
 lib/odp-execute.c |  32 ++
 lib/odp-util.c| 120 +-
 lib/packets.c |  33 ++
 lib/packets.h |   3 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 tests/odp.at  |   1 +
 tests/ofproto.at  |   4 +-
 18 files changed, 365 insertions(+), 14 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] conntrack: Exclude l2 pad bytes from L4 checksum validation

2019-01-28 Thread Vishal Deep Ajmera
Userspace conntrack implementation in OVS is including L2 (Ethernet) pad bytes
as well for L4 checksum validation. This is a bug and any packet with non-zero
L2 padding bytes will be treated as invalid packet by conntrack due to checksum
failures.

This patch fixes the conntrack implementation by ignoring L2 pad bytes for L4
checksum validation.

Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Venkatesan Pradeep 
Co-authored-by: Nitin Katiyar 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 

---
 lib/conntrack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index f732b9e..1f6dccf 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1973,8 +1973,8 @@ conn_key_extract(struct conntrack *ct, struct dp_packet 
*pkt, ovs_be16 dl_type,
 if (!hwol_bad_l4_csum) {
 bool  hwol_good_l4_csum = dp_packet_l4_checksum_valid(pkt);
 /* Validate the checksum only when hwol is not supported. */
-if (extract_l4(>key, l4, tail - l4, >icmp_related, l3,
-   !hwol_good_l4_csum)) {
+if (extract_l4(>key, l4, (tail - l4) - 
dp_packet_l2_pad_size(pkt),
+   >icmp_related, l3, !hwol_good_l4_csum)) {
 ctx->hash = conn_key_hash(>key, ct->hash_basis);
 return true;
 }
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] Support for match & set ICMPv6 reserved and options type fields

2019-01-23 Thread Vishal Deep Ajmera


> -Original Message-
> From: Ben Pfaff 
> Sent: Saturday, January 19, 2019 12:04 AM
> To: Vishal Deep Ajmera 
> Cc: d...@openvswitch.org; Ashvin Lakshmikantha
> 
> Subject: Re: [ovs-dev] [PATCH v2] Support for match & set ICMPv6 reserved and
> options type fields
> 
> On Thu, Jan 17, 2019 at 11:14:26AM +, Vishal Deep Ajmera wrote:
> > Currently OVS supports all ARP protocol fields as OXM match fields to
> > implement the relevant ARP procedures for IPv4. This includes support
> > for matching copying and setting ARP fields. In IPv6 ARP has been
> > replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
> > advertisement and neighbor solicitation.
> >
> > The support for ICMPv6 fields in OVS is not complete for the use cases
> > equivalent to ARP in IPv4. OVS lacks support for matching, copying and
> > setting the “ND option type” and “ND reserved” fields. Without these
> > user cannot implement all ICMPv6 ND procedures for IPv6 support.
> >
> > This commit adds additional OXM fields to OVS for ICMPv6 “ND option
> > type“ and ICMPv6 “ND reserved” using the OXM extension mechanism. This
> > allows support for parsing these fields from an ICMPv6 packet header
> > and extending the OpenFlow protocol with specifications for these new
> > OXM fields for matching, copying and setting.
> >
> > Signed-off-by: Vishal Deep Ajmera 
> > Co-authored-by: Ashvin Lakshmikantha
> > 
> > Signed-off-by: Ashvin Lakshmikantha
> > 
> 
> Thanks for working to make OVS better!
> 
> It looks like miniflow_extract() calls data_pull() for the RSO flags field 
> without
> first checking to see whether the message is long enough.
> This is dangerous.
>
Thanks Ben for reviewing this patch.

I have refactored this code addressing your comments in V3.
 
> This cast should not be needed:
> 
> rso_flags = (uint32_t *) data_pull(,
>, sizeof(uint32_t));
> 
> The code for populating opt_type[0] and opt_type[1] into the miniflow is
> confusing.  It looks like only one of these can be nonzero, and if either one 
> is
> present then we put it in the same spot (in the tcp_flags)?  If that's so, 
> then why
> bother distinguishing them during parsing?  (And how should flow_compose_l4()
> know where to put them?
> Currently it doesn't bother with them at all.)
> 
Fixed in V3.

> The comments on struct flow should describe the new uses of tcp_flags and
> igmp_group_ip4.
> 
Fixed in V3.

> Please add an item to describe the new feature in NEWS.
>
Fixed in V3.
 
> Please add parsing tests to odp.at.
> 
I will add parsing test in subsequent patch V4.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] Support for match & set ICMPv6 reserved and options type fields

2019-01-23 Thread Vishal Deep Ajmera
Currently OVS supports all ARP protocol fields as OXM match fields to
implement the relevant ARP procedures for IPv4. This includes support
for matching copying and setting ARP fields. In IPv6 ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.

The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these user
cannot implement all ICMPv6 ND procedures for IPv6 support.

This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
support for parsing these fields from an ICMPv6 packet header and extending
the OpenFlow protocol with specifications for these new OXM fields for
matching, copying and setting.

Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Ashvin Lakshmikantha 
Signed-off-by: Ashvin Lakshmikantha 
---
 NEWS  |   2 +
 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   8 ++
 include/openvswitch/flow.h|   6 +-
 include/openvswitch/match.h   |   3 +
 include/openvswitch/meta-flow.h   |  28 +
 lib/flow.c|  58 +--
 lib/match.c   |  21 
 lib/meta-flow.c   |  38 +++
 lib/meta-flow.xml |  12 +++
 lib/nx-match.c|   8 ++
 lib/odp-execute.c |  32 ++
 lib/odp-util.c| 120 +-
 lib/packets.c |  33 ++
 lib/packets.h |   3 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 tests/ofproto.at  |   4 +-
 17 files changed, 364 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index 358c9b9..6ffac90 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,8 @@ Post-v2.10.0
  * Add option for simple round-robin based Rxq to PMD assignment.
It can be set with pmd-rxq-assign.
  * Add support for DPDK 18.11
+ * ICMPv6 ND enhancements: support for match and set ND options type
+   and reserved fields.
- Add 'symmetric_l3' hash function.
- OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
- ovs-vswitchd:
diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 3592594..e159a1d 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -71,6 +71,7 @@ OXM_CLASSES = {"NXM_OF_":(0,  0x, 
'extension'),
"OXM_OF_":(0,  0x8000, 'standard'),
"OXM_OF_PKT_REG": (0,  0x8001, 'standard'),
"ONFOXM_ET_": (0x4f4e4600, 0x, 'standard'),
+   "ERICOXM_OF_":(0,  0x1000, 'extension'),
 
# This is the experimenter OXM class for Nicira, which is the
# one that OVS would be using instead of NXM_OF_ and NXM_NX_
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 9b087f1..d5aa09d 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -375,6 +375,7 @@ enum ovs_key_attr {
 #ifndef __KERNEL__
/* Only used within userspace data path. */
OVS_KEY_ATTR_PACKET_TYPE,  /* be32 packet type */
+   OVS_KEY_ATTR_ND_EXTENSIONS, /* struct ovs_key_nd_extensions */
 #endif
 
__OVS_KEY_ATTR_MAX
@@ -489,6 +490,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
+#ifndef __KERNEL__
+struct ovs_key_nd_extensions {
+__be32  nd_reserved;
+__u8nd_options_type;
+};
+#endif
+
 #define OVS_CT_LABELS_LEN_32   4
 #define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index 5d2cf09..57b6c92 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -144,7 +144,8 @@ struct flow {
 struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
 struct eth_addr arp_sha;/* ARP/ND source hardware address. */
 struct eth_addr arp_tha;/* ARP/ND target hardware address. */
-ovs_be16 tcp_flags; /* TCP flags. With L3 to avoid matching L4. */
+ovs_be16 tcp_flags; /* TCP flags/ICMPv6 ND options type.
+ * With L3 to avoid matching L4. */
 ovs_be16 pad2;  /* Pad to 64 bits. */
 struct ovs_key_nsh nsh; /* Network Servic

[ovs-dev] [PATCH v3] Enhancing ICMPv6 support

2019-01-23 Thread Vishal Deep Ajmera
This patch adds support for match and set ICMPv6
"reserved" and "nd options type" fields.

v1->v2: Fixed compiler and sparse warnings.
v2->v3: Updated NEWS, simplified miniflow_extract for ICMPv6,
updated usage for tcp_flags and igmpgroup_ipv4.

Vishal Deep Ajmera (1):
  Support for match & set ICMPv6 reserved and options type fields

 NEWS  |   2 +
 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   8 ++
 include/openvswitch/flow.h|   6 +-
 include/openvswitch/match.h   |   3 +
 include/openvswitch/meta-flow.h   |  28 +
 lib/flow.c|  58 +--
 lib/match.c   |  21 
 lib/meta-flow.c   |  38 +++
 lib/meta-flow.xml |  12 +++
 lib/nx-match.c|   8 ++
 lib/odp-execute.c |  32 ++
 lib/odp-util.c| 120 +-
 lib/packets.c |  33 ++
 lib/packets.h |   3 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 tests/ofproto.at  |   4 +-
 17 files changed, 364 insertions(+), 14 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Support for match & set ICMPv6 reserved and options type fields

2019-01-17 Thread Vishal Deep Ajmera

> -Original Message-
> From: Ben Pfaff 
> Sent: Thursday, January 17, 2019 6:11 AM
> To: Vishal Deep Ajmera 
> Cc: d...@openvswitch.org; Ashvin Lakshmikantha
> 
> Subject: Re: [ovs-dev] [PATCH] Support for match & set ICMPv6 reserved and
> options type fields
> 
> On Fri, Jan 11, 2019 at 08:32:48AM +, Vishal Deep Ajmera wrote:
> > Currently OVS supports all ARP protocol fields as OXM match fields to
> > implement the relevant ARP procedures for IPv4. This includes support
> > for matching copying and setting ARP fields. In IPv6 ARP has been
> > replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
> > advertisement and neighbor solicitation.
> >
> > The support for ICMPv6 fields in OVS is not complete for the use cases
> > equivalent to ARP in IPv4. OVS lacks support for matching, copying and
> > setting the “ND option type” and “ND reserved” fields. Without these user
> > cannot implement all ICMPv6 ND procedures for IPv6 support.
> >
> > This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
> > and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
> > support for parsing these fields from an ICMPv6 packet header and extending
> > the OpenFlow protocol with specifications for these new OXM fields for
> > matching, copying and setting.
> >
> > Signed-off-by: Vishal Deep Ajmera 
> > Co-authored-by: Ashvin Lakshmikantha
> 
> > Signed-off-by: Ashvin Lakshmikantha 
> 
> Please fix up the compiler and "sparse" warnings:
>

Thanks Ben for reviewing the patch. I have sent V2 version after fixing 
compiler and sparse warnings.
 
> ../ofproto/ofproto-dpif-sflow.c:969:13: error: enumeration value
> 'OVS_KEY_ATTR_ND_EXTENSIONS' not explicitly handled in switch [-Werror,-
> Wswitch-enum]
> 
> ../lib/match.c:1084:1: error: symbol 'match_set_nd_reserved' redeclared with
> different type (originally declared at ../include/openvswitch/match.h:222) -
> incompatible argument 2 (different base types)
> 
> ../lib/meta-flow.c:924:21: error: incorrect type in assignment (different base
> types)
> ../lib/meta-flow.c:924:21:expected restricted ovs_be32 [usertype] be32
> ../lib/meta-flow.c:924:21:got unsigned int
> ../lib/meta-flow.c:1275:43: error: incorrect type in argument 2 (different 
> base
> types)
> ../lib/meta-flow.c:1275:43:expected unsigned int [unsigned] [usertype]
> 
> ../lib/meta-flow.c:1275:43:got restricted ovs_be32 const [usertype] be32
> ../lib/meta-flow.c:1691:43: error: incorrect type in argument 1 (different 
> base
> types)
> ../lib/meta-flow.c:1691:43:expected unsigned int [unsigned] [usertype] x
> ../lib/meta-flow.c:1691:43:got restricted ovs_be32 const [usertype] be32
> 
> ../lib/nx-match.c:995:33: error: incorrect type in argument 4 (different base
> types)
> ../lib/nx-match.c:995:33:expected restricted ovs_be32 [addressable]
> [usertype] value
> ../lib/nx-match.c:995:33:got unsigned int
> 
> ../lib/odp-util.c:4056:34: error: incorrect type in argument 1 (different base
> types)
> ../lib/odp-util.c:4056:34:expected unsigned int [unsigned] [usertype] x
> ../lib/odp-util.c:4056:34:got restricted ovs_be32 [usertype] 
> ../lib/odp-util.c:5964:31: error: incorrect type in argument 1 (different base
> types)
> ../lib/odp-util.c:5964:31:expected restricted ovs_be32 [usertype] x
> ../lib/odp-util.c:5964:31:got restricted ovs_be16 const [usertype] 
> tcp_flags
> ../lib/odp-util.c:5969:45: error: incorrect type in assignment (different base
> types)
> ../lib/odp-util.c:5969:45:expected restricted ovs_be32 [usertype] 
> nd_reserved
> ../lib/odp-util.c:5969:45:got unsigned int
> ../lib/odp-util.c:6605:60: error: incorrect type in argument 1 (different base
> types)
> ../lib/odp-util.c:6605:60:expected unsigned int [unsigned] [usertype] x
> ../lib/odp-util.c:6605:60:got restricted ovs_be32 const [usertype] 
> nd_reserved
> ../lib/odp-util.c:7526:25: error: incorrect type in assignment (different base
> types)
> ../lib/odp-util.c:7526:25:expected restricted ovs_be32 [usertype] 
> nd_reserved
> ../lib/odp-util.c:7526:25:got unsigned int
> ../lib/odp-util.c:7534:39: error: incorrect type in argument 1 (different base
> types)
> ../lib/odp-util.c:7534:39:expected unsigned int [unsigned] [usertype] x
> ../lib/odp-util.c:7534:39:got restricted ovs_be32 const [usertype] 
> nd_reserved
> ../lib/odp-util.c:7599:34: error: incorrect type in argument 1 (different base
> types)
> ../lib/odp-util.c:7599:34:expected restricted ovs_be16 [usertype] x
> ../lib/odp-util.c:7599:34:got int
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] Support for match & set ICMPv6 reserved and options type fields

2019-01-17 Thread Vishal Deep Ajmera
Currently OVS supports all ARP protocol fields as OXM match fields to
implement the relevant ARP procedures for IPv4. This includes support
for matching copying and setting ARP fields. In IPv6 ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.

The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these user
cannot implement all ICMPv6 ND procedures for IPv6 support.

This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
support for parsing these fields from an ICMPv6 packet header and extending
the OpenFlow protocol with specifications for these new OXM fields for
matching, copying and setting.

Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Ashvin Lakshmikantha 
Signed-off-by: Ashvin Lakshmikantha 
---
 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   8 ++
 include/openvswitch/match.h   |   3 +
 include/openvswitch/meta-flow.h   |  28 +
 lib/flow.c|  46 -
 lib/match.c   |  21 
 lib/meta-flow.c   |  38 +++
 lib/meta-flow.xml |  12 +++
 lib/nx-match.c|   8 ++
 lib/odp-execute.c |  32 ++
 lib/odp-util.c| 120 +-
 lib/packets.c |  33 ++
 lib/packets.h |   3 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 tests/ofproto.at  |   4 +-
 15 files changed, 349 insertions(+), 9 deletions(-)

diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 3592594..e159a1d 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -71,6 +71,7 @@ OXM_CLASSES = {"NXM_OF_":(0,  0x, 
'extension'),
"OXM_OF_":(0,  0x8000, 'standard'),
"OXM_OF_PKT_REG": (0,  0x8001, 'standard'),
"ONFOXM_ET_": (0x4f4e4600, 0x, 'standard'),
+   "ERICOXM_OF_":(0,  0x1000, 'extension'),
 
# This is the experimenter OXM class for Nicira, which is the
# one that OVS would be using instead of NXM_OF_ and NXM_NX_
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 9b087f1..d5aa09d 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -375,6 +375,7 @@ enum ovs_key_attr {
 #ifndef __KERNEL__
/* Only used within userspace data path. */
OVS_KEY_ATTR_PACKET_TYPE,  /* be32 packet type */
+   OVS_KEY_ATTR_ND_EXTENSIONS, /* struct ovs_key_nd_extensions */
 #endif
 
__OVS_KEY_ATTR_MAX
@@ -489,6 +490,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
+#ifndef __KERNEL__
+struct ovs_key_nd_extensions {
+__be32  nd_reserved;
+__u8nd_options_type;
+};
+#endif
+
 #define OVS_CT_LABELS_LEN_32   4
 #define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
diff --git a/include/openvswitch/match.h b/include/openvswitch/match.h
index e8c80dd..05ecee7 100644
--- a/include/openvswitch/match.h
+++ b/include/openvswitch/match.h
@@ -219,6 +219,9 @@ void match_set_nd_target(struct match *, const struct 
in6_addr *);
 void match_set_nd_target_masked(struct match *, const struct in6_addr *,
 const struct in6_addr *);
 
+void match_set_nd_reserved(struct match *, ovs_be32);
+void match_set_nd_options_type(struct match *, uint8_t);
+
 bool match_equal(const struct match *, const struct match *);
 uint32_t match_hash(const struct match *, uint32_t basis);
 
diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index ffd8945..65d99f4 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -1796,6 +1796,34 @@ enum OVS_PACKED_ENUM mf_field_id {
  */
 MFF_ND_TLL,
 
+/* "nd_reserved".
+ *
+ * The reserved field in IPv6 Neighbor Discovery message.
+ *
+ * Type: be32.
+ * Maskable: no.
+ * Formatting: decimal.
+ * Prerequisites: ND.
+ * Access: read/write.
+ * NXM: none.
+ * OXM: ERICOXM_OF_ICMPV6_ND_RESERVED(1) since v2.11.
+ */
+MFF_ND_RESERVED,
+
+/* "nd_options_type".
+ *
+ * The type of the option in IPv6 Neighbor Discovery messa

[ovs-dev] [PATCH v2] Enhancing ICMPv6 support

2019-01-17 Thread Vishal Deep Ajmera
This patch adds support for match and set ICMPv6
"reserved" and "nd options type" fields.

v1->v2: Fixed compiler and sparse warnings.

Vishal Deep Ajmera (1):
  Support for match & set ICMPv6 reserved and options type fields

 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   8 ++
 include/openvswitch/match.h   |   3 +
 include/openvswitch/meta-flow.h   |  28 +
 lib/flow.c|  46 -
 lib/match.c   |  21 
 lib/meta-flow.c   |  38 +++
 lib/meta-flow.xml |  12 +++
 lib/nx-match.c|   8 ++
 lib/odp-execute.c |  32 ++
 lib/odp-util.c| 120 +-
 lib/packets.c |  33 ++
 lib/packets.h |   3 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 tests/ofproto.at  |   4 +-
 15 files changed, 349 insertions(+), 9 deletions(-)

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] IPv6: Add support for match and set nd_options_type and reserved fields.

2018-09-25 Thread Vishal Deep Ajmera
> >
> > That is good to know, but the inner struct is not the main issue as I see 
> > it.
> > The main issue is that the change breaks the ABI.  This change must be
> > implemented in some way that does not break the ABI.
> 
> Hi Ben,
> Are there any guidelines or mechanism available to extend such structures 
> which
> are defined in openvswitch.h with new fields for supporting new features?
> 
> We primarily need this feature for 'netdev' datapath and so kernel
> implementation need not change.
> 
Hi,

Can you help us in this regard? How do I extend this data structure to include 
more fields namely 'nd_options_type' and 'reserved'? As I mentioned we need 
this support only for netdev data path. Is it possible using some #defines?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] IPv6: Add support for match and set nd_options_type and reserved fields.

2018-09-18 Thread Vishal Deep Ajmera


> > Thanks Ben. Yes, the inner structure is not required. Earlier we
> > defined it as union and later changed it to structure. Instead those
> > fields can be simply defined outside of inner structure. I will fix this in 
> > v2
> patch.
> 
> That is good to know, but the inner struct is not the main issue as I see it.
> The main issue is that the change breaks the ABI.  This change must be
> implemented in some way that does not break the ABI.

Hi Ben,
Are there any guidelines or mechanism available to extend such structures which 
are 
defined in openvswitch.h with new fields for supporting new features?

We primarily need this feature for 'netdev' datapath and so kernel 
implementation need
not change.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] IPv6: Add support for match and set nd_options_type and reserved fields.

2018-09-18 Thread Vishal Deep Ajmera
> 
> Thanks for working on making OVS better support IPv6.
> 
> The following change stood out to me.  It appears to break ABI compatibility
> in the kernel datapath.  Is there some reason it's OK?  I especially don't
> understand why it adds a nested inner struct.
> 
Thanks Ben. Yes, the inner structure is not required. Earlier we defined it as 
union and 
later changed it to structure. Instead those fields can be simply defined 
outside of inner 
structure. I will fix this in v2 patch.

Warm Regards,
Vishal
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] IPv6: Add support for match and set nd_options_type and reserved fields.

2018-09-18 Thread Vishal Deep Ajmera
Currently OVS supports all ARP protocol fields as OXM match fields
to implement the relevant ARP procedures for IPv4. This includes support
for matching copying and setting ARP fields. In IPv6 ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.

The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these user
cannot implement all ICMPv6 ND procedures for IPv6 support.

This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
and ICMPv6 “ND reserved” using the OXM extension mechanism.

This allows support for parsing these fields from an ICMPv6 packet header
and extending the OpenFlow protocol with specifications for these new OXM
fields for matching, copying and setting.

Signed-off-by: Ashvin Lakshmikantha 
Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Vishal Deep Ajmera 
---
 build-aux/extract-ofp-fields  |  2 +-
 datapath/linux/compat/include/linux/openvswitch.h | 10 +++--
 include/openvswitch/match.h   |  3 ++
 include/openvswitch/meta-flow.h   | 28 
 lib/flow.c| 54 ---
 lib/match.c   | 16 +++
 lib/meta-flow.c   | 37 
 lib/meta-flow.xml | 13 ++
 lib/nx-match.c|  8 
 lib/odp-execute.c | 21 +++--
 lib/odp-util.c| 19 +++-
 lib/packets.c | 36 +--
 lib/packets.h |  3 +-
 tests/odp.at  | 14 +++---
 tests/ofproto-dpif.at |  8 ++--
 tests/ofproto.at  |  4 +-
 16 files changed, 234 insertions(+), 42 deletions(-)

diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 3592594..3558601 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -71,7 +71,7 @@ OXM_CLASSES = {"NXM_OF_":(0,  0x, 
'extension'),
"OXM_OF_":(0,  0x8000, 'standard'),
"OXM_OF_PKT_REG": (0,  0x8001, 'standard'),
"ONFOXM_ET_": (0x4f4e4600, 0x, 'standard'),
-
+   "ERICOXM_OF_":(0,  0x1000, 'extension'),
# This is the experimenter OXM class for Nicira, which is the
# one that OVS would be using instead of NXM_OF_ and NXM_NX_
# if OVS didn't have those grandfathered in.  It is currently
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index aaeb034..9c7077f 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -484,9 +484,13 @@ struct ovs_key_arp {
 };
 
 struct ovs_key_nd {
-   __be32  nd_target[4];
-   __u8nd_sll[ETH_ALEN];
-   __u8nd_tll[ETH_ALEN];
+__be32  nd_reserved;
+__be32  nd_target[4];
+__u8nd_options_type;
+struct {
+   __u8nd_sll[ETH_ALEN];
+   __u8nd_tll[ETH_ALEN];
+};
 };
 
 #define OVS_CT_LABELS_LEN_32   4
diff --git a/include/openvswitch/match.h b/include/openvswitch/match.h
index e8c80dd..800c55d 100644
--- a/include/openvswitch/match.h
+++ b/include/openvswitch/match.h
@@ -68,6 +68,8 @@ void match_zero_wildcarded_fields(struct match *);
 void match_set_dp_hash(struct match *, uint32_t value);
 void match_set_dp_hash_masked(struct match *, uint32_t value, uint32_t mask);
 
+void match_set_nd_reserved(struct match *, uint32_t value);
+
 void match_set_recirc_id(struct match *, uint32_t value);
 
 void match_set_conj_id(struct match *, uint32_t value);
@@ -199,6 +201,7 @@ void match_set_nw_frag(struct match *, uint8_t nw_frag);
 void match_set_nw_frag_masked(struct match *, uint8_t nw_frag, uint8_t mask);
 void match_set_icmp_type(struct match *, uint8_t);
 void match_set_icmp_code(struct match *, uint8_t);
+void match_set_nd_options_type(struct match *, uint8_t);
 void match_set_arp_sha(struct match *, const struct eth_addr);
 void match_set_arp_sha_masked(struct match *,
   const struct eth_addr arp_sha,
diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 627c8a3..9ef23df 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -1796,6 +1796,34 @@ enum OVS_PACKED_ENUM mf_field_id {
  */
 MFF_ND_TLL,
 
+/* "nd_reserved".
+ *
+ * The reserved field in

Re: [ovs-dev] OVS DPDK: dpdk_merge pull request for master

2018-08-27 Thread Vishal Deep Ajmera
Thanks Ian and Ben.

Warm Regards,
Vishal Ajmera

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org  boun...@openvswitch.org> On Behalf Of Ben Pfaff
> Sent: Tuesday, August 28, 2018 2:00 AM
> To: Ian Stokes 
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK: dpdk_merge pull request for master
> 
> On Mon, Aug 27, 2018 at 08:17:25PM +0100, Ian Stokes wrote:
> > Hi Ben,
> >
> > The following changes since commit
> 418a7a84245f5fbe589dd1267463fc9ba27a1dd6:
> >
> >   ofproto-dpif-trace: Make -generate send packets to controller again.
> > (2018-08-27 09:35:21 -0700)
> >
> > are available in the git repository at:
> >
> >   https://github.com/istokes/ovs dpdk_merge
> >
> > for you to fetch changes up to
> 9b4f08cdcaf253175edda088683bdd3db9e4c097:
> >
> >   dpif-netdev: Avoid reordering of packets in a batch with same megaflow
> > (2018-08-27 17:48:23 +0100)
> 
> Merged, thanks!
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto-dpif-upcall: Fix for flow limit issue in revalidator

2018-08-02 Thread Vishal Deep Ajmera
When the revalidator thread takes a long time to dump data path
flows (e.g. due to busy CPU), it reduces the maximum limit for
new flows that can be added. This results in more upcalls for
packets which do not find data path flows and temporarily reduces
overall throughput. When the situation improves and the revalidator
gets enough CPU cycles, it should increase the flow limit allowing
more flows to get inserted.

Currently the flow limit does not increase if the existing number of
flows is less than 2000 and does not allow any new flows due to
incorrect condition check. This results in a permanent drop in
performance in OVS with no automatic recovery.

This patch fixes the conditional check for increasing flow limit.

Signed-off-by: Vishal Deep Ajmera 
---
 ofproto/ofproto-dpif-upcall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 85f5792..607 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -931,8 +931,8 @@ udpif_revalidator(void *arg)
 flow_limit /= duration / 1000;
 } else if (duration > 1300) {
 flow_limit = flow_limit * 3 / 4;
-} else if (duration < 1000 && n_flows > 2000
-   && flow_limit < n_flows * 1000 / duration) {
+} else if (duration < 1000 &&
+   flow_limit < n_flows * 1000 / duration) {
 flow_limit += 1000;
 }
 flow_limit = MIN(ofproto_flow_limit, MAX(flow_limit, 1000));
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev, v5] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-27 Thread Vishal Deep Ajmera
> Have you tried solution with direct pushing of all the packets to
> 'flow_map' and per-flow batching only at the end of 'dp_netdev_input__()'?
> 
> Is it really slower?
> If you have some performance data, I'd like to see it, because it looks
> like code should be simpler without this optimization.

I do not have any comparison data. However, for best case where all 
packets have a hit in EMC, this will add to the cycles since an extra loop for 
batch needs to be run in dp_netdev_input__() after emc processing.

> 
> Also, I'm thinking, maybe it'll be better to keep the receive order
> values somewhere in the packet batch itself modifying the
> 'dp_packet_batch' structure and API. Using this we may restore the
> original order inside the per-flow batches before executing actions and
> completely avoid any modifications in processing related code.
> Maybe, this solution will look more clean and easy to understand.

I could not try the solution suggested by you. However I have addressed your 
code review comments in v6 patch. Also, v6 patch is rebased with dfc/smc 
feature.

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v6] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-27 Thread Vishal Deep Ajmera
OVS reads packets in batches from a given port and packets in the
batch are subjected to potentially 3 levels of lookups to identify
the datapath megaflow entry (or flow) associated with the packet.
Each megaflow entry has a dedicated buffer in which packets that match
the flow classification criteria are collected. This buffer helps OVS
perform batch processing for all packets associated with a given flow.

Each packet in the received batch is first subjected to lookup in the
Exact Match Cache (EMC). Each EMC entry will point to a flow. If the
EMC lookup is successful, the packet is moved from the rx batch to the
per-flow buffer.

Packets that did not match any EMC entry are rearranged in the rx batch
at the beginning and are now subjected to a lookup in the megaflow cache.
Packets that match a megaflow cache entry are *appended* to the per-flow
buffer.

Packets that do not match any megaflow entry are subjected to slow-path
processing through the upcall mechanism. This cannot change the order of
packets as by definition upcall processing is only done for packets
without matching megaflow entry.

The EMC entry match fields encompass all potentially significant header
fields, typically more than specified in the associated flow's match
criteria. Hence, multiple EMC entries can point to the same flow. Given
that per-flow batching happens at each lookup stage, packets belonging
to the same megaflow can get re-ordered because some packets match EMC
entries while others do not.

The following example can illustrate the issue better. Consider
following batch of packets (labelled P1 to P8) associated with a single
TCP connection and associated with a single flow. Let us assume that
packets with just the ACK bit set in TCP flags have been received in a
prior batch also and a corresponding EMC entry exists.

1. P1 (TCP Flag: ACK)
2. P2 (TCP Flag: ACK)
3. P3 (TCP Flag: ACK)
4. P4 (TCP Flag: ACK, PSH)
5. P5 (TCP Flag: ACK)
6. P6 (TCP Flag: ACK)
7. P7 (TCP Flag: ACK)
8. P8 (TCP Flag: ACK)

The megaflow classification criteria does not include TCP flags while
the EMC match criteria does. Thus, all packets other than P4 match
the existing EMC entry and are moved to the per-flow packet batch.
Subsequently, packet P4 is moved to the same per-flow packet batch as
a result of the megaflow lookup. Though the packets have all been
correctly classified as being associated with the same flow, the
packet order has not been preserved because of the per-flow batching
performed during the EMC lookup stage. This packet re-ordering has
performance implications for TCP applications.

This patch preserves the packet ordering by performing the per-flow
batching after both the EMC and megaflow lookups are complete. As an
optimization, packets are flow-batched in emc processing till any
packet in the batch has an EMC miss.

A new flow map is maintained to keep the original order of packet
along with flow information. Post fastpath processing, packets from
flow map are *appended* to per-flow buffer.

Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Venkatesan Pradeep 
---
 lib/dpif-netdev.c | 125 +-
 1 file changed, 106 insertions(+), 19 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 13a20f0..3ea25e2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -244,6 +244,13 @@ struct dpcls_rule {
 /* 'flow' must be the last field, additional space is allocated here. */
 };
 
+/* Data structure to keep packet order till fastpath processing. */
+struct dp_packet_flow_map {
+struct dp_packet *packet;
+struct dp_netdev_flow *flow;
+uint16_t tcp_flags;
+};
+
 static void dpcls_init(struct dpcls *);
 static void dpcls_destroy(struct dpcls *);
 static void dpcls_sort_subtable_vector(struct dpcls *);
@@ -5774,6 +5781,19 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
 packet_batch_per_flow_update(batch, pkt, tcp_flags);
 }
 
+static inline void
+packet_enqueue_to_flow_map(struct dp_packet *packet,
+   struct dp_netdev_flow *flow,
+   uint16_t tcp_flags,
+   struct dp_packet_flow_map *flow_map,
+   size_t index)
+{
+struct dp_packet_flow_map *map = _map[index];
+map->flow = flow;
+map->packet = packet;
+map->tcp_flags = tcp_flags;
+}
+
 /* SMC lookup function for a batch of packets.
  * By doing batching SMC lookup, we can use prefetch
  * to hide memory access latency.
@@ -5783,8 +5803,9 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
 struct netdev_flow_key *keys,
 struct netdev_flow_key **missed_keys,
 struct dp_packet_batch *packets_,
-struct packet_batch_per_flow batches[],
-size_t *n_batches, const int cnt)
+const int cnt,
+struct dp_packet_flow_map *flow_map,
+uint8_t *index_map)
 {
 

Re: [ovs-dev] [PATCH v5] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-13 Thread Vishal Deep Ajmera
Hi Ian, Ilya,

If there are no more comments, can I request to please include the fix
in this week's pull request?

Warm Regards,
Vishal Ajmera
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


  1   2   >