[ovs-dev] [PATCH] lacp: Add support to recognize LACP Marker RX PDUs.

2019-11-12 Thread Nitin katiyar via dev
OVS does not support the LACP Marker protocol. Typically, ToR switches
send a LACP Marker PDU when restarting LACP negotiation following a link
flap or LACP timeout.

When a LACP Marker PDU is received, OVS logs the following warning and
drops the packet:
“lacp(pmdXXX)|WARN|bond-prv: received an unparsable LACP PDU.”

As the above message is logged around the same time the link flap or
LACP down events are logged, it gives a misleading impression that the
reception of an unparsable LACP PDU is the reason for the LACP down
event.

The proposed patch does not add support for the LACP Marker protocol.
It simply recognizes LACP Marker packets, drops them and logs a clear
message indicating that a Marker packet was received. A counter to
track the number of such packets received is also added.
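For reference, both LACP and Marker PDUs are IEEE 802.1AX Slow Protocols
frames (Ethernet type 0x8809) whose first payload octet is the subtype:
1 for LACP, 2 for Marker. A minimal sketch of the classification this patch
performs (illustrative only; the names below are not the OVS structs):

#include <stddef.h>
#include <stdint.h>

#define SLOW_SUBTYPE_LACP   1   /* Link Aggregation Control Protocol. */
#define SLOW_SUBTYPE_MARKER 2   /* Link Aggregation Marker Protocol. */

/* Returns the Slow Protocols subtype of 'payload', -1 if malformed,
 * or 0 if the subtype is unknown. */
static int
classify_slow_pdu(const uint8_t *payload, size_t len)
{
    if (len < 1) {
        return -1;
    }
    return (payload[0] == SLOW_SUBTYPE_LACP
            || payload[0] == SLOW_SUBTYPE_MARKER) ? payload[0] : 0;
}

With this distinction in place, a Marker PDU can be counted and dropped
quietly instead of being reported as an unparsable LACP PDU.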

Signed-off-by: Nitin katiyar 
---
 lib/lacp.c | 49 +
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index 16c823b..705d88f 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -86,6 +86,12 @@ BUILD_ASSERT_DECL(LACP_PDU_LEN == sizeof(struct lacp_pdu));
 
 /* Implementation. */
 
+enum pdu_subtype {
+SUBTYPE_UNUSED = 0,
+SUBTYPE_LACP,   /* Link Aggregation Control Protocol. */
+SUBTYPE_MARKER, /* Link Aggregation Marker Protocol. */
+};
+
 enum slave_status {
 LACP_CURRENT,   /* Current State.  Partner up to date. */
 LACP_EXPIRED,   /* Expired State.  Partner out of date. */
@@ -130,6 +136,7 @@ struct slave {
 
 uint32_t count_rx_pdus; /* dot3adAggPortStatsLACPDUsRx */
 uint32_t count_rx_pdus_bad; /* dot3adAggPortStatsIllegalRx */
+uint32_t count_rx_pdus_marker;  /* dot3adAggPortStatsMarkerPDUsRx */
 uint32_t count_tx_pdus; /* dot3adAggPortStatsLACPDUsTx */
 uint32_t count_link_expired;/* Num of times link expired */
 uint32_t count_link_defaulted;  /* Num of times link defaulted */
@@ -183,12 +190,13 @@ compose_lacp_pdu(const struct lacp_info *actor,
 pdu->collector_delay = htons(0);
 }
 
-/* Parses 'b' which represents a packet containing a LACP PDU.  This function
- * returns NULL if 'b' is malformed, or does not represent a LACP PDU format
+/* Parses 'p' which represents a packet containing a LACP PDU. This function
+ * returns NULL if 'p' is malformed, or does not represent a LACP PDU format
  * supported by OVS.  Otherwise, it returns a pointer to the lacp_pdu contained
- * within 'b'. */
+ * within 'p'. It also returns the subtype of PDU. */
+
 static const struct lacp_pdu *
-parse_lacp_packet(const struct dp_packet *p)
+parse_lacp_packet(const struct dp_packet *p, enum pdu_subtype *subtype)
 {
 const struct lacp_pdu *pdu;
 
@@ -198,8 +206,13 @@ parse_lacp_packet(const struct dp_packet *p)
 if (pdu && pdu->subtype == 1
 && pdu->actor_type == 1 && pdu->actor_len == 20
 && pdu->partner_type == 2 && pdu->partner_len == 20) {
+*subtype = SUBTYPE_LACP;
 return pdu;
-} else {
+} else if (pdu && pdu->subtype == SUBTYPE_MARKER) {
+*subtype = SUBTYPE_MARKER;
+return NULL;
+} else {
+*subtype = SUBTYPE_UNUSED;
 return NULL;
 }
 }
@@ -336,6 +349,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 long long int tx_rate;
 struct slave *slave;
 bool lacp_may_enable = false;
+enum pdu_subtype subtype;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -344,11 +358,20 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 }
 slave->count_rx_pdus++;
 
-pdu = parse_lacp_packet(packet);
-if (!pdu) {
-slave->count_rx_pdus_bad++;
-VLOG_WARN_RL(&rl, "%s: received an unparsable LACP PDU.", lacp->name);
-goto out;
+pdu = parse_lacp_packet(packet, &subtype);
+switch (subtype) {
+case SUBTYPE_LACP:
+break;
+case SUBTYPE_MARKER:
+slave->count_rx_pdus_marker++;
+VLOG_DBG("%s: received a LACP marker PDU.", lacp->name);
+goto out;
+case SUBTYPE_UNUSED:
+default:
+slave->count_rx_pdus_bad++;
+VLOG_WARN_RL(&rl, "%s: received an unparsable LACP PDU.",
+ lacp->name);
+goto out;
 }
 
 /* On some NICs L1 state reporting is slow. In case LACP packets are
@@ -367,7 +390,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 
 slave->ntt_actor = pdu->partner;
 
-/* Update our information about our partner if it's out of date.  This may
+/* Update our information about our partner if it's out of date. This may
  * cause priorities to change so re-calculate attached status of all
  * slaves.  */
if (memcmp(&slave->partner, &pdu->actor, sizeof pdu->actor)) {
@@ -1054,9 +1077,11 @@ lacp_print_stats(struct d

Re: [ovs-dev] [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.

2019-11-11 Thread Nitin Katiyar via dev
Hi Guys,
Can you please review the patch? It has been on the mailing list for a long time.

Regards,
Nitin

> -Original Message-
> From: Nitin Katiyar
> Sent: Friday, October 04, 2019 11:06 AM
> To: 'ovs-dev@openvswitch.org' 
> Cc: Anju Thomas ; 'Ilya Maximets'
> ; Ben Pfaff 
> Subject: RE: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval
> in PMD main loop.
> 
> Hi,
> Can you please review the following patch and provide the feedback?
> 
> Regards,
> Nitin
> 
> > -Original Message-
> > From: Nitin Katiyar
> > Sent: Wednesday, August 28, 2019 2:00 PM
> > To: ovs-dev@openvswitch.org
> > Cc: Anju Thomas ; Ilya Maximets
> > 
> > Subject: RE: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed
> > interval in PMD main loop.
> >
> > Hi,
> > Please provide your feedback.
> >
> > Regards,
> > Nitin
> >
> > > -Original Message-
> > > From: Nitin Katiyar
> > > Sent: Thursday, August 22, 2019 10:24 PM
> > > To: ovs-dev@openvswitch.org
> > > Cc: Nitin Katiyar ; Anju Thomas
> > > 
> > > Subject: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed
> > > interval in PMD main loop.
> > >
> > > Each PMD updates the global sequence number for RCU synchronization
> > > purpose with other OVS threads. This is done at every 1025th
> > > iteration in PMD main loop.
> > >
> > > If the PMD thread is responsible for polling large number of queues
> > > that are carrying traffic, it spends a lot of time processing
> > > packets and this results in significant delay in performing the
> > > housekeeping
> > activities.
> > >
> > > If the OVS main thread is waiting to synchronize with the PMD
> > > threads and if those threads delay performing housekeeping
> > > activities for more than 3 sec then LACP processing will be impacted
> > > and it will lead to LACP flaps. Similarly, other controls protocols
> > > run by OVS main thread are
> > impacted.
> > >
> > > For e.g. a PMD thread polling 200 ports/queues with average of 1600
> > > processing cycles per packet with batch size of 32 may take 1024
> > > (200 * 1600 * 32) cycles per iteration. In system with 2.0 GHz CPU
> > > it means more than 5 ms per iteration. So, for 1024 iterations to
> > > complete it would be more than 5 seconds.
> > >
> > > This gets worse when there are PMD threads which are less loaded.
> > > It reduces possibility of getting mutex lock in ovsrcu_try_quiesce()
> > > by heavily loaded PMD and next attempt to quiesce would be after
> > > 1024
> > iterations.
> > >
> > > With this patch, PMD RCU synchronization will be performed after
> > > fixed interval instead after a fixed number of iterations. This will
> > > ensure that even if the packet processing load is high the RCU
> > > synchronization will not be delayed long.
> > >
> > > Co-authored-by: Anju Thomas 
> > > Signed-off-by: Anju Thomas 
> > > Signed-off-by: Nitin Katiyar 
> > > ---
> > >  lib/dpif-netdev.c | 22 ++
> > >  1 file changed, 22 insertions(+)
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> > > d0a1c58..63b6cb9
> > > 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -228,6 +228,9 @@ struct dfc_cache {
> > >   * and used during rxq to pmd assignment. */  #define
> > > PMD_RXQ_INTERVAL_MAX 6
> > >
> > > +/* Time in microseconds to try RCU quiescing. */ #define
> > > +PMD_RCU_QUIESCE_INTERVAL 1LL
> > > +
> > >  struct dpcls {
> > >  struct cmap_node node;  /* Within
> dp_netdev_pmd_thread.classifiers
> > > */
> > >  odp_port_t in_port;
> > > @@ -751,6 +754,9 @@ struct dp_netdev_pmd_thread {
> > >
> > >  /* Set to true if the pmd thread needs to be reloaded. */
> > >  bool need_reload;
> > > +
> > > +/* Next time when PMD should try RCU quiescing. */
> > > +long long next_rcu_quiesce;
> > >  };
> > >
> > >  /* Interface to netdev-based datapath. */ @@ -5486,6 +5492,8 @@
> > reload:
> > >  pmd->intrvl_tsc_prev = 0;
> > >  atomic_store_relaxed(>intrvl_cycles, 0);
> > >  cycles_counter_update(s);
> > > +
> > > +pmd->next_rcu_quiesce = pmd->ctx.now +
> > > PMD_RCU_QUIE

Re: [ovs-dev] [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.

2019-10-03 Thread Nitin Katiyar via dev
Hi,
Can you please review the following patch and provide your feedback?

Regards,
Nitin

> -Original Message-
> From: Nitin Katiyar
> Sent: Wednesday, August 28, 2019 2:00 PM
> To: ovs-dev@openvswitch.org
> Cc: Anju Thomas ; Ilya Maximets
> 
> Subject: RE: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval
> in PMD main loop.
> 
> Hi,
> Please provide your feedback.
> 
> Regards,
> Nitin
> 
> > -Original Message-
> > From: Nitin Katiyar
> > Sent: Thursday, August 22, 2019 10:24 PM
> > To: ovs-dev@openvswitch.org
> > Cc: Nitin Katiyar ; Anju Thomas
> > 
> > Subject: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed
> > interval in PMD main loop.
> >
> > Each PMD updates the global sequence number for RCU synchronization
> > purpose with other OVS threads. This is done at every 1025th iteration
> > in PMD main loop.
> >
> > If the PMD thread is responsible for polling large number of queues
> > that are carrying traffic, it spends a lot of time processing packets
> > and this results in significant delay in performing the housekeeping
> activities.
> >
> > If the OVS main thread is waiting to synchronize with the PMD threads
> > and if those threads delay performing housekeeping activities for more
> > than 3 sec then LACP processing will be impacted and it will lead to
> > LACP flaps. Similarly, other controls protocols run by OVS main thread are
> impacted.
> >
> > For e.g. a PMD thread polling 200 ports/queues with average of 1600
> > processing cycles per packet with batch size of 32 may take 1024
> > (200 * 1600 * 32) cycles per iteration. In system with 2.0 GHz CPU it
> > means more than 5 ms per iteration. So, for 1024 iterations to
> > complete it would be more than 5 seconds.
> >
> > This gets worse when there are PMD threads which are less loaded.
> > It reduces possibility of getting mutex lock in ovsrcu_try_quiesce()
> > by heavily loaded PMD and next attempt to quiesce would be after 1024
> iterations.
> >
> > With this patch, PMD RCU synchronization will be performed after fixed
> > interval instead after a fixed number of iterations. This will ensure
> > that even if the packet processing load is high the RCU
> > synchronization will not be delayed long.
> >
> > Co-authored-by: Anju Thomas 
> > Signed-off-by: Anju Thomas 
> > Signed-off-by: Nitin Katiyar 
> > ---
> >  lib/dpif-netdev.c | 22 ++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> > d0a1c58..63b6cb9
> > 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -228,6 +228,9 @@ struct dfc_cache {
> >   * and used during rxq to pmd assignment. */  #define
> > PMD_RXQ_INTERVAL_MAX 6
> >
> > +/* Time in microseconds to try RCU quiescing. */ #define
> > +PMD_RCU_QUIESCE_INTERVAL 1LL
> > +
> >  struct dpcls {
> >  struct cmap_node node;  /* Within dp_netdev_pmd_thread.classifiers
> > */
> >  odp_port_t in_port;
> > @@ -751,6 +754,9 @@ struct dp_netdev_pmd_thread {
> >
> >  /* Set to true if the pmd thread needs to be reloaded. */
> >  bool need_reload;
> > +
> > +/* Next time when PMD should try RCU quiescing. */
> > +long long next_rcu_quiesce;
> >  };
> >
> >  /* Interface to netdev-based datapath. */ @@ -5486,6 +5492,8 @@
> reload:
> >  pmd->intrvl_tsc_prev = 0;
> >  atomic_store_relaxed(>intrvl_cycles, 0);
> >  cycles_counter_update(s);
> > +
> > +pmd->next_rcu_quiesce = pmd->ctx.now +
> > PMD_RCU_QUIESCE_INTERVAL;
> >  /* Protect pmd stats from external clearing while polling. */
> >  ovs_mutex_lock(>perf_stats.stats_mutex);
> >  for (;;) {
> > @@ -5493,6 +5501,17 @@ reload:
> >
> >  pmd_perf_start_iteration(s);
> >
> > +/* Do RCU synchronization at fixed interval instead of doing it
> > + * at fixed number of iterations. This ensures that synchronization
> > + * would not be delayed long even at high load of packet
> > + * processing. */
> > +if (pmd->ctx.now > pmd->next_rcu_quiesce) {
> > +if (!ovsrcu_try_quiesce()) {
> > +pmd->next_rcu_quiesce =
> > +pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> > +}
> > +}
> > +
> >  for (i = 0; i < poll_cnt; i++) {
> >
> > 

[ovs-dev] [PATCH V3] packets: Fix using outdated RSS hash after MPLS decapsulation.

2019-08-28 Thread Nitin Katiyar
When a packet is received, the RSS hash is calculated if it is not
already available. The Exact Match Cache (EMC) entry is then looked up
using this RSS hash.

When a MPLS encapsulated packet is received, the MPLS header is popped
and the packet is recirculated. Since the RSS hash has not been
invalidated here, the EMC lookup for all decapsulated packets will hit
the same entry even though these packets will have different tuple
values. This degrades performance severely as different inner packets
from the same MPLS tunnel would hit the same EMC entry.

This patch invalidates RSS hash (by resetting offload flags) after MPLS
header is popped.
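A small self-contained sketch of the effect (illustrative only; 'struct pkt'
and 'emc_lookup_hash' are made-up names standing in for the dp_packet RSS
helpers and the EMC lookup path, not the actual datapath code):

#include <stdint.h>

struct pkt {
    uint32_t rss_hash;      /* hash taken from the outer headers by the NIC */
    int rss_valid;          /* cleared by the reset-offload call in pop_mpls() */
};

static uint32_t
emc_lookup_hash(struct pkt *p, uint32_t inner_tuple_hash)
{
    if (p->rss_valid) {
        /* Without the fix: still valid after pop_mpls(), so the stale
         * outer-header hash is reused for every inner flow. */
        return p->rss_hash;
    }
    /* With the fix: recomputed from the decapsulated (inner) headers after
     * recirculation, so different inner flows get different EMC entries. */
    p->rss_hash = inner_tuple_hash;
    p->rss_valid = 1;
    return p->rss_hash;
}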

Signed-off-by: Nitin Katiyar 
---
 lib/packets.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/packets.c b/lib/packets.c
index ab0b1a3..12053df 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -411,6 +411,10 @@ pop_mpls(struct dp_packet *packet, ovs_be16 ethtype)
 /* Shift the l2 header forward. */
 memmove((char*)dp_packet_data(packet) + MPLS_HLEN, 
dp_packet_data(packet), len);
 dp_packet_resize_l2_5(packet, -MPLS_HLEN);
+
+/* Invalidate offload flags as they are not valid after
+ * decapsulation of MPLS header. */
+dp_packet_reset_offload(packet);
 }
 }
 
-- 
1.9.1



Re: [ovs-dev] [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.

2019-08-28 Thread Nitin Katiyar
Hi,
Please provide your feedback.

Regards,
Nitin

> -Original Message-
> From: Nitin Katiyar
> Sent: Thursday, August 22, 2019 10:24 PM
> To: ovs-dev@openvswitch.org
> Cc: Nitin Katiyar ; Anju Thomas
> 
> Subject: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval in
> PMD main loop.
> 
> Each PMD updates the global sequence number for RCU synchronization
> purpose with other OVS threads. This is done at every 1025th iteration in
> PMD main loop.
> 
> If the PMD thread is responsible for polling large number of queues that are
> carrying traffic, it spends a lot of time processing packets and this results 
> in
> significant delay in performing the housekeeping activities.
> 
> If the OVS main thread is waiting to synchronize with the PMD threads and if
> those threads delay performing housekeeping activities for more than 3 sec
> then LACP processing will be impacted and it will lead to LACP flaps. 
> Similarly,
> other controls protocols run by OVS main thread are impacted.
> 
> For e.g. a PMD thread polling 200 ports/queues with average of 1600
> processing cycles per packet with batch size of 32 may take 1024
> (200 * 1600 * 32) cycles per iteration. In system with 2.0 GHz CPU it means
> more than 5 ms per iteration. So, for 1024 iterations to complete it would be
> more than 5 seconds.
> 
> This gets worse when there are PMD threads which are less loaded.
> It reduces possibility of getting mutex lock in ovsrcu_try_quiesce() by 
> heavily
> loaded PMD and next attempt to quiesce would be after 1024 iterations.
> 
> With this patch, PMD RCU synchronization will be performed after fixed
> interval instead after a fixed number of iterations. This will ensure that 
> even if
> the packet processing load is high the RCU synchronization will not be delayed
> long.
> 
> Co-authored-by: Anju Thomas 
> Signed-off-by: Anju Thomas 
> Signed-off-by: Nitin Katiyar 
> ---
>  lib/dpif-netdev.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index d0a1c58..63b6cb9
> 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -228,6 +228,9 @@ struct dfc_cache {
>   * and used during rxq to pmd assignment. */  #define
> PMD_RXQ_INTERVAL_MAX 6
> 
> +/* Time in microseconds to try RCU quiescing. */ #define
> +PMD_RCU_QUIESCE_INTERVAL 1LL
> +
>  struct dpcls {
>  struct cmap_node node;  /* Within dp_netdev_pmd_thread.classifiers
> */
>  odp_port_t in_port;
> @@ -751,6 +754,9 @@ struct dp_netdev_pmd_thread {
> 
>  /* Set to true if the pmd thread needs to be reloaded. */
>  bool need_reload;
> +
> +/* Next time when PMD should try RCU quiescing. */
> +long long next_rcu_quiesce;
>  };
> 
>  /* Interface to netdev-based datapath. */ @@ -5486,6 +5492,8 @@ reload:
>  pmd->intrvl_tsc_prev = 0;
>  atomic_store_relaxed(>intrvl_cycles, 0);
>  cycles_counter_update(s);
> +
> +pmd->next_rcu_quiesce = pmd->ctx.now +
> PMD_RCU_QUIESCE_INTERVAL;
>  /* Protect pmd stats from external clearing while polling. */
>  ovs_mutex_lock(>perf_stats.stats_mutex);
>  for (;;) {
> @@ -5493,6 +5501,17 @@ reload:
> 
>  pmd_perf_start_iteration(s);
> 
> +/* Do RCU synchronization at fixed interval instead of doing it
> + * at fixed number of iterations. This ensures that synchronization
> + * would not be delayed long even at high load of packet
> + * processing. */
> +if (pmd->ctx.now > pmd->next_rcu_quiesce) {
> +if (!ovsrcu_try_quiesce()) {
> +pmd->next_rcu_quiesce =
> +pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> +}
> +}
> +
>  for (i = 0; i < poll_cnt; i++) {
> 
>  if (!poll_list[i].rxq_enabled) { @@ -5527,6 +5546,8 @@ reload:
>  dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
>  if (!ovsrcu_try_quiesce()) {
>  emc_cache_slow_sweep(&((pmd->flow_cache).emc_cache));
> +pmd->next_rcu_quiesce =
> +pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
>  }
> 
>  for (i = 0; i < poll_cnt; i++) { @@ -5986,6 +6007,7 @@
> dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct
> dp_netdev *dp,
>  pmd->ctx.last_rxq = NULL;
>  pmd_thread_ctx_time_update(pmd);
>  pmd->next_optimization = pmd->ctx.now +
> DPCLS_OPTIMIZATION_INTERVAL;
> +pmd->next_rcu_quiesce = pmd->ctx.now +
> PMD_RCU_QUIESCE_INTERVAL;
>  pmd->rxq_next_cycle_store = pmd->ctx.now + PMD_RXQ_INTERVAL_LEN;
>  hmap_init(>poll_list);
>  hmap_init(>tx_ports);
> --
> 1.9.1



Re: [ovs-dev] [PATCH] packets: Fix using outdated RSS hash after MPLS decapsulation.

2019-08-28 Thread Nitin Katiyar



> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Wednesday, August 28, 2019 12:44 PM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org
> Cc: Stokes, Ian 
> Subject: Re: [ovs-dev] [PATCH] packets: Fix using outdated RSS hash after
> MPLS decapsulation.
> 
> On 28.08.2019 10:03, Nitin Katiyar wrote:
> >
> >
> >> -Original Message-
> >> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >> Sent: Tuesday, August 27, 2019 5:37 PM
> >> To: Nitin Katiyar ;
> >> ovs-dev@openvswitch.org
> >> Cc: Stokes, Ian 
> >> Subject: Re: [ovs-dev] [PATCH] packets: Fix using outdated RSS hash
> >> after MPLS decapsulation.
> >>
> >> Hi.
> >>
> >> Since this is not a first version of a patch, the subject prefix
> >> should be [PATCH v2]. For the next version, please, use [PATCH v3].
> >> This way it's much easier to recognize/track patches in a list or mailbox.
> > Hi,
> > Since the title got of the patch got changed so I didn't make it V2. I 
> > wasn't
> sure to use V2 as with new title it becomes new thread. Anyway, I will be
> sending the new patch with V3.
> >
> >>
> >> Hint: It's generally speeds up review process if relevant people are
> >>   in CC list while sending the patch. Not everyone subscribed for
> >>   all the messages in a list.
> > Thanks for suggestion. I will follow it in future.
> >
> >>
> >> The code looks good in general.
> >> Few comments inline.
> >>
> >> Best regards, Ilya Maximets.
> >>
> >> On 16.08.2019 21:54, Nitin Katiyar wrote:
> >>> When a packet is received, the RSS hash is calculated if it is not
> >>> already available. The Exact Match Cache (EMC) entry is then looked
> >>> up using this RSS hash.
> >>>
> >>> When a MPLS encapsulated packet is received, the MPLS header is
> >>> popped and the packet is recirculated. Since the RSS hash has not
> >>> been invalidated here, the EMC lookup will hit the same entry as
> >>> that before recirculation. This degrades performance severely.
> >>
> >> Above sentences are not correct. It's stated that the collision
> >> happens with the previous pass of the same packet, but it happens
> >> with different packets from the same MPLS tunnel. This should be fixed.
> > It will actually collide with the entry of packet before de-capsulation. I 
> > have
> rephrased it as follows:
> >
> > When a MPLS encapsulated packet is received, the MPLS header is
> popped
> > and the packet is recirculated. Since the RSS hash has not been
> > invalidated here, the EMC lookup for decapsulated packets will hit the
> > same entry as that before recirculation (i.e. entry of the packet with
> > MPLS encapsulation). This degrades performance severely as different
> > inner packets from the same MPLS tunnel would hit the same EMC entry.
> >
> > Let me know if it is more clear now.
> 
> But dpif_netdev_packet_get_rss_hash() mixes the recirculation depth into
> RSS hash on each packet pass through the datapath. So, after recirculation RSS
> hash should be different from the hash of encapsulated packet. and where
> should be no collisions.
> I already wrote about this in my review to v1 and you said: "If there are
> multiple flows are sent across same tunnel (with same MPLS label) then they
> will all lead to same hash as external tuple and recirculation id remains same
> for them."  That makes sense, but it's opposite to what you're saying now. I'm
> confused.
Ah okay, I misinterpreted your previous comment. You are correct. I have
modified it as below:
When a MPLS encapsulated packet is received, the MPLS header is popped
and the packet is recirculated. Since the RSS hash has not been
invalidated here, the EMC lookup for all decapsulated packets will hit
the same entry even though these packets will have different tuple
values. This degrades performance severely as different inner packets
from the same MPLS tunnel would hit the same EMC entry.

Thanks for correcting this.

Regards,
Nitin
> 
> Best regards, Ilya Maximets.
> 
> >
> > Thanks,
> > Nitin
> >>
> >>>
> >>> This patch invalidates RSS hash (by resetting offload flags) after
> >>> MPLS header is popped.
> >>>
> >>> Signed-off-by: Nitin Katiyar 
> >>> ---
> >>>  lib/packets.c | 4 
> >>>  1 file changed, 4 insertions(+)
> >>>
>

Re: [ovs-dev] [PATCH] packets: Fix using outdated RSS hash after MPLS decapsulation.

2019-08-28 Thread Nitin Katiyar



> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, August 27, 2019 5:37 PM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org
> Cc: Stokes, Ian 
> Subject: Re: [ovs-dev] [PATCH] packets: Fix using outdated RSS hash after
> MPLS decapsulation.
> 
> Hi.
> 
> Since this is not a first version of a patch, the subject prefix should be 
> [PATCH
> v2]. For the next version, please, use [PATCH v3].
> This way it's much easier to recognize/track patches in a list or mailbox.
Hi,
Since the title of the patch changed, I didn't make it V2. I wasn't sure
whether to use V2 because, with the new title, it becomes a new thread.
Anyway, I will be sending the new patch as V3.

> 
> Hint: It's generally speeds up review process if relevant people are
>   in CC list while sending the patch. Not everyone subscribed for
>   all the messages in a list.
Thanks for the suggestion. I will follow it in the future.

> 
> The code looks good in general.
> Few comments inline.
> 
> Best regards, Ilya Maximets.
> 
> On 16.08.2019 21:54, Nitin Katiyar wrote:
> > When a packet is received, the RSS hash is calculated if it is not
> > already available. The Exact Match Cache (EMC) entry is then looked up
> > using this RSS hash.
> >
> > When a MPLS encapsulated packet is received, the MPLS header is popped
> > and the packet is recirculated. Since the RSS hash has not been
> > invalidated here, the EMC lookup will hit the same entry as that
> > before recirculation. This degrades performance severely.
> 
> Above sentences are not correct. It's stated that the collision happens with
> the previous pass of the same packet, but it happens with different packets
> from the same MPLS tunnel. This should be fixed.
It will actually collide with the entry of the packet before decapsulation. I
have rephrased it as follows:

When a MPLS encapsulated packet is received, the MPLS header is popped
and the packet is recirculated. Since the RSS hash has not been
invalidated here, the EMC lookup for decapsulated packets will hit the
same entry as that before recirculation (i.e. entry of the packet with
MPLS encapsulation). This degrades performance severely as different
inner packets from the same MPLS tunnel would hit the same EMC entry.

Let me know if it is more clear now.

Thanks,
Nitin
> 
> >
> > This patch invalidates RSS hash (by resetting offload flags) after
> > MPLS header is popped.
> >
> > Signed-off-by: Nitin Katiyar 
> > ---
> >  lib/packets.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/lib/packets.c b/lib/packets.c index ab0b1a3..db3f6d0
> > 100644
> > --- a/lib/packets.c
> > +++ b/lib/packets.c
> > @@ -404,6 +404,10 @@ pop_mpls(struct dp_packet *packet, ovs_be16
> ethtype)
> >  struct mpls_hdr *mh = dp_packet_l2_5(packet);
> >  size_t len = packet->l2_5_ofs;
> >
> > +/* Invalidate offload flags as they are not valid after
> > + * decapsulation of MPLS header. */
> > +dp_packet_reset_offload(packet);
> 
> Maybe it's better to move this code down to 'dp_packet_resize_l2_5' at the
> end of this function to have similar changes closer?
> 
Sure, I will move it to the end of the function.
> > +
> >  set_ethertype(packet, ethtype);
> >  if (get_16aligned_be32(>mpls_lse) & htonl(MPLS_BOS_MASK)) {
> >  dp_packet_set_l2_5(packet, NULL);
> >


Re: [ovs-dev] [PATCH] packets: Fix using outdated RSS hash after MPLS decapsulation.

2019-08-27 Thread Nitin Katiyar
Hi,
Gentle reminder. Please provide your feedback.

Regards,
Nitin

> -Original Message-
> From: Nitin Katiyar
> Sent: Saturday, August 17, 2019 12:25 AM
> To: ovs-dev@openvswitch.org
> Cc: Nitin Katiyar 
> Subject: [PATCH] packets: Fix using outdated RSS hash after MPLS
> decapsulation.
> 
> When a packet is received, the RSS hash is calculated if it is not already
> available. The Exact Match Cache (EMC) entry is then looked up using this RSS
> hash.
> 
> When a MPLS encapsulated packet is received, the MPLS header is popped
> and the packet is recirculated. Since the RSS hash has not been invalidated
> here, the EMC lookup will hit the same entry as that before recirculation. 
> This
> degrades performance severely.
> 
> This patch invalidates RSS hash (by resetting offload flags) after MPLS header
> is popped.
> 
> Signed-off-by: Nitin Katiyar 
> ---
>  lib/packets.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/packets.c b/lib/packets.c index ab0b1a3..db3f6d0 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -404,6 +404,10 @@ pop_mpls(struct dp_packet *packet, ovs_be16
> ethtype)
>  struct mpls_hdr *mh = dp_packet_l2_5(packet);
>  size_t len = packet->l2_5_ofs;
> 
> +/* Invalidate offload flags as they are not valid after
> + * decapsulation of MPLS header. */
> +dp_packet_reset_offload(packet);
> +
>  set_ethertype(packet, ethtype);
>  if (get_16aligned_be32(>mpls_lse) & htonl(MPLS_BOS_MASK)) {
>  dp_packet_set_l2_5(packet, NULL);
> --
> 1.9.1



[ovs-dev] [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.

2019-08-22 Thread Nitin Katiyar
Each PMD updates the global sequence number for RCU synchronization
purposes with the other OVS threads. This is done at every 1025th
iteration of the PMD main loop.

If the PMD thread is responsible for polling a large number of queues
that are carrying traffic, it spends a lot of time processing packets
and this results in significant delay in performing the housekeeping
activities.

If the OVS main thread is waiting to synchronize with the PMD threads
and those threads delay performing housekeeping activities for more
than 3 seconds, then LACP processing will be impacted and it will lead
to LACP flaps. Similarly, other control protocols run by the OVS main
thread are impacted.

For example, a PMD thread polling 200 ports/queues with an average of
1600 processing cycles per packet and a batch size of 32 may take
10,240,000 (200 * 1600 * 32) cycles per iteration. On a system with a
2.0 GHz CPU that is more than 5 ms per iteration, so completing 1024
iterations takes more than 5 seconds.
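Written out, the arithmetic behind those figures is:

\[
200 \times 1600 \times 32 = 10{,}240{,}000\ \text{cycles/iteration},\qquad
\frac{10{,}240{,}000\ \text{cycles}}{2.0\times10^{9}\ \text{Hz}} \approx 5.12\ \text{ms},\qquad
1024 \times 5.12\ \text{ms} \approx 5.2\ \text{s}.
\]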

This gets worse when there are PMD threads which are less loaded: they
reduce the chance of a heavily loaded PMD getting the mutex lock in
ovsrcu_try_quiesce(), and its next attempt to quiesce would be after
another 1024 iterations.

With this patch, PMD RCU synchronization is performed at a fixed time
interval instead of after a fixed number of iterations. This ensures
that even if the packet processing load is high, the RCU
synchronization will not be delayed for long.

Co-authored-by: Anju Thomas 
Signed-off-by: Anju Thomas 
Signed-off-by: Nitin Katiyar 
---
 lib/dpif-netdev.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d0a1c58..63b6cb9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -228,6 +228,9 @@ struct dfc_cache {
  * and used during rxq to pmd assignment. */
 #define PMD_RXQ_INTERVAL_MAX 6
 
+/* Time in microseconds to try RCU quiescing. */
+#define PMD_RCU_QUIESCE_INTERVAL 10000LL
+
 struct dpcls {
 struct cmap_node node;  /* Within dp_netdev_pmd_thread.classifiers */
 odp_port_t in_port;
@@ -751,6 +754,9 @@ struct dp_netdev_pmd_thread {
 
 /* Set to true if the pmd thread needs to be reloaded. */
 bool need_reload;
+
+/* Next time when PMD should try RCU quiescing. */
+long long next_rcu_quiesce;
 };
 
 /* Interface to netdev-based datapath. */
@@ -5486,6 +5492,8 @@ reload:
 pmd->intrvl_tsc_prev = 0;
atomic_store_relaxed(&pmd->intrvl_cycles, 0);
 cycles_counter_update(s);
+
+pmd->next_rcu_quiesce = pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
 /* Protect pmd stats from external clearing while polling. */
ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
 for (;;) {
@@ -5493,6 +5501,17 @@ reload:
 
 pmd_perf_start_iteration(s);
 
+/* Do RCU synchronization at fixed interval instead of doing it
+ * at fixed number of iterations. This ensures that synchronization
+ * would not be delayed long even at high load of packet
+ * processing. */
+if (pmd->ctx.now > pmd->next_rcu_quiesce) {
+if (!ovsrcu_try_quiesce()) {
+pmd->next_rcu_quiesce =
+pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
+}
+}
+
 for (i = 0; i < poll_cnt; i++) {
 
 if (!poll_list[i].rxq_enabled) {
@@ -5527,6 +5546,8 @@ reload:
 dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
 if (!ovsrcu_try_quiesce()) {
 emc_cache_slow_sweep(&((pmd->flow_cache).emc_cache));
+pmd->next_rcu_quiesce =
+pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
 }
 
 for (i = 0; i < poll_cnt; i++) {
@@ -5986,6 +6007,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, 
struct dp_netdev *dp,
 pmd->ctx.last_rxq = NULL;
 pmd_thread_ctx_time_update(pmd);
 pmd->next_optimization = pmd->ctx.now + DPCLS_OPTIMIZATION_INTERVAL;
+pmd->next_rcu_quiesce = pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
 pmd->rxq_next_cycle_store = pmd->ctx.now + PMD_RXQ_INTERVAL_LEN;
hmap_init(&pmd->poll_list);
hmap_init(&pmd->tx_ports);
-- 
1.9.1



[ovs-dev] [PATCH] packets: Fix using outdated RSS hash after MPLS decapsulation.

2019-08-16 Thread Nitin Katiyar
When a packet is received, the RSS hash is calculated if it is not
already available. The Exact Match Cache (EMC) entry is then looked up
using this RSS hash.

When a MPLS encapsulated packet is received, the MPLS header is popped
and the packet is recirculated. Since the RSS hash has not been
invalidated here, the EMC lookup will hit the same entry as that before
recirculation. This degrades performance severely.

This patch invalidates RSS hash (by resetting offload flags) after MPLS
header is popped.

Signed-off-by: Nitin Katiyar 
---
 lib/packets.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/packets.c b/lib/packets.c
index ab0b1a3..db3f6d0 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -404,6 +404,10 @@ pop_mpls(struct dp_packet *packet, ovs_be16 ethtype)
 struct mpls_hdr *mh = dp_packet_l2_5(packet);
 size_t len = packet->l2_5_ofs;
 
+/* Invalidate offload flags as they are not valid after
+ * decapsulation of MPLS header. */
+dp_packet_reset_offload(packet);
+
 set_ethertype(packet, ethtype);
if (get_16aligned_be32(&mh->mpls_lse) & htonl(MPLS_BOS_MASK)) {
 dp_packet_set_l2_5(packet, NULL);
-- 
1.9.1



Re: [ovs-dev] [PATCH] Improve MPLSoGRE performance by reducing EMC hash collisions.

2019-08-16 Thread Nitin Katiyar
Hi Ilya,
Please see my response inline. I will be sending a new patch after
incorporating some of your comments.

Regards,
Nitin

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Wednesday, August 14, 2019 5:45 PM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org
> Cc: Stokes, Ian ; Simon Horman
> ; Jan Scheurich
> ; Ben Pfaff 
> Subject: Re: [ovs-dev] [PATCH] Improve MPLSoGRE performance by reducing
> EMC hash collisions.
> 
> On 14.08.2019 11:33, Nitin Katiyar wrote:
> >
> >
> >> -Original Message-
> >> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >> Sent: Wednesday, August 14, 2019 1:42 PM
> >> To: Nitin Katiyar ;
> >> ovs-dev@openvswitch.org
> >> Cc: Stokes, Ian 
> >> Subject: Re: [ovs-dev] [PATCH] Improve MPLSoGRE performance by
> >> reducing EMC hash collisions.
> >>
> >> On 13.08.2019 20:49, Nitin Katiyar wrote:
> >>> When a packet is received, the RSS hash is calculated if it is not
> >>> already available. The Exact Match Cache (EMC) entry is then looked
> >>> up using this RSS hash.
> >>>
> >>> When a MPLSoGRE encapsulated packet is received, the GRE header is
> >>> popped, the RSS hash is invalidated and the packet is recirculated
> >>> for further processing. When the recirculated packet is processed,
> >>> the MPLS header is popped and the packet is recirculated again.
> >>> Since the RSS hash has not been invalidated here, the EMC lookup
> >>> will hit the same entry as that after the first recirculation. This
> >>> degrades performance severely.
> >>
> > Hi Ilya,
> >> Hmm. Function 'dpif_netdev_packet_get_rss_hash()' accounts
> >> recirculation depth into the hash to avoid such cases. Why this doesn't
> work for you?
> > This will not very for multiple inner flows. If there are multiple flows 
> > are sent
> across same tunnel (with same MPLS label) then they will all lead to same hash
> as external tuple and recirculation id remains same for them. Let me know if I
> am wrong.
> 
> OK. I see. Collision is not with the previous pass of this packet but with 
> other
> packets from the same MPLS tunnel. This needs to be made clear in the
> commit message.
> 
> And, IIUC, this issue is not related to MPLSoGRE specifically, it's just an 
> issue
> with any MPLS. I think, you need to re-write the commit message to be more
> general and, probably, rename the patch to something like:
> "packets: Invalidate offloads after MPLS decapsulation."
> or
> "packets: Fix using outdated RSS hash after MPLS decapsulation."
> etc.
> 
Yes, this is for MPLS. Will update the patch title.
> Another thing:
> It looks like we need to do the same for NSH decap. What do you think?
I will check this and take it separately.
> 
> Note:
> This change will force hash re-calculation in case of output to balanced
> bonding.
Yes, this should be okay as packets are recirculated after the pop action and
a new hash should be calculated. Anyway, the hash is recomputed with the new
recirc-id. This will give better distribution over the bond.
> 
> Also, It's better to not mention EMC in a common 'lib/packets.c' file.
> I think that it's enough to just say that we need to invalidate offloads since
> they are not valid anymore after decapsulation.
Okay.
> 
> Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] Do RCU synchronization at fixed interval in PMD main loop.

2019-08-14 Thread Nitin Katiyar



> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, August 13, 2019 8:49 PM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org
> Cc: Anju Thomas ; Stokes, Ian
> 
> Subject: Re: [ovs-dev] [PATCH] Do RCU synchronization at fixed interval in
> PMD main loop.
> 
> On 07.08.2019 17:21, Nitin Katiyar wrote:
> > Each PMD updates the global sequence number for RCU synchronization
> > purpose with other OVS threads. This is done at every 1025th iteration
> > in PMD main loop.
> >
> > If the PMD thread is responsible for polling large number of queues
> > that are carrying traffic, it spends a lot of time processing packets
> > and this results in significant delay in performing the housekeeping
> > activities.
> >
> > If the OVS main thread is waiting to synchronize with the PMD threads
> > and if those threads delay performing housekeeping activities for more
> > than 3 sec then LACP processing will be impacted and it will lead to
> > LACP flaps. Similarly, other controls protocols run by OVS main thread
> > are impacted.
> >
> > For e.g. a PMD thread polling 200 ports/queues with average of 1600
> > processing cycles per packet with batch size of 32 may take 1024
> > (200 * 1600 * 32) cycles per iteration. In system with 2.0 GHz CPU it
> > means more than 5 ms per iteration. So, for 1024 iterations to
> > complete it would be more than 5 seconds.
> >
> > This gets worse when there are PMD threads which are less loaded.
> > It reduces possibility of getting mutex lock in ovsrcu_try_quiesce()
> > by heavily loaded PMD and next attempt to quiesce would be after 1024
> > iterations.
> >
> > With this patch, PMD RCU synchronization will be performed after fixed
> > interval instead after a fixed number of iterations. This will ensure
> > that even if the packet processing load is high the RCU
> > synchronization will not be delayed long.
> >
> > Co-authored-by: Anju Thomas 
> >
> > Signed-off-by: Nitin Katiyar 
> > Signed-off-by: Anju Thomas 
> > ---
> 
> 
> Hi. Thanks for working on this.
> 
> Your setup seems a bit strange to me (thread will be able to handle only 6Kpps
> per port), but I'm tend to agree that RCU synchronization needs to be fixed.
> Not sure about implementation. At least, I'd suggest using time instead of TSC
> for measurements.
> 
> Each PMD thread has 'ctx.now' time context measured in microseconds, so
> you may schedule next synchronization to exactly ctx.now + 10ms.  See
> 'next_optimization'
> and 'rxq_next_cycle_store' as a reference.  This will save few instructions 
> and
> also will help to avoid issues in non-DPDK (e.g. netdev-afxdp) case.
Thanks for reviewing it. I will look into it and send an updated patch.
> 
> BTW, It's better to add 'dpif-netdev:' prefix to the patch name.
I will add this too.

Regards,
Nitin
> 
> Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] Improve MPLSoGRE performance by reducing EMC hash collisions.

2019-08-14 Thread Nitin Katiyar



> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Wednesday, August 14, 2019 1:42 PM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org
> Cc: Stokes, Ian 
> Subject: Re: [ovs-dev] [PATCH] Improve MPLSoGRE performance by reducing
> EMC hash collisions.
> 
> On 13.08.2019 20:49, Nitin Katiyar wrote:
> > When a packet is received, the RSS hash is calculated if it is not
> > already available. The Exact Match Cache (EMC) entry is then looked up
> > using this RSS hash.
> >
> > When a MPLSoGRE encapsulated packet is received, the GRE header is
> > popped, the RSS hash is invalidated and the packet is recirculated for
> > further processing. When the recirculated packet is processed, the
> > MPLS header is popped and the packet is recirculated again. Since the
> > RSS hash has not been invalidated here, the EMC lookup will hit the
> > same entry as that after the first recirculation. This degrades
> > performance severely.
> 
Hi Ilya,
> Hmm. Function 'dpif_netdev_packet_get_rss_hash()' accounts recirculation
> depth into the hash to avoid such cases. Why this doesn't work for you?
This will not work for multiple inner flows. If multiple flows are sent
across the same tunnel (with the same MPLS label) then they will all lead to
the same hash, as the external tuple and recirculation id remain the same for
them. Let me know if I am wrong.
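To make that concrete, here is a simplified, self-contained sketch of the
hash derivation being discussed (an illustration of the behaviour of
dpif_netdev_packet_get_rss_hash(), not its actual code):

#include <stdint.h>

/* The hash cached from the outer headers is only mixed with the
 * recirculation depth on later passes. Packets of different inner flows
 * arriving over the same MPLS tunnel share both inputs, so they still end
 * up with the same EMC hash unless the cached hash is invalidated. */
static uint32_t
emc_hash_sketch(uint32_t cached_outer_rss, uint32_t recirc_depth)
{
    uint32_t hash = cached_outer_rss;        /* same for the whole tunnel */

    if (recirc_depth) {
        hash ^= recirc_depth * 0x9e3779b1u;  /* same depth for all of them */
    }
    return hash;                             /* identical -> same EMC entry */
}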

Regards,
Nitin
> 
> Best regards, Ilya Maximets.


[ovs-dev] [PATCH] Improve MPLSoGRE performance by reducing EMC hash collisions.

2019-08-13 Thread Nitin Katiyar
When a packet is received, the RSS hash is calculated if it is not
already available. The Exact Match Cache (EMC) entry is then looked up
using this RSS hash.

When a MPLSoGRE encapsulated packet is received, the GRE header is
popped, the RSS hash is invalidated and the packet is recirculated for
further processing. When the recirculated packet is processed, the MPLS
header is popped and the packet is recirculated again. Since the RSS
hash has not been invalidated here, the EMC lookup will hit the same
entry as that after the first recirculation. This degrades performance
severely.

This patch invalidates the RSS hash after the MPLS header is popped.

Signed-off-by: Nitin Katiyar 
---
 lib/packets.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/packets.c b/lib/packets.c
index ab0b1a3..35fa3c7 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -404,6 +404,11 @@ pop_mpls(struct dp_packet *packet, ovs_be16 ethtype)
 struct mpls_hdr *mh = dp_packet_l2_5(packet);
 size_t len = packet->l2_5_ofs;
 
+/* Invalidate the hash so that it is recomputed using inner packet
+ * header after recirculation. Else it would cause EMC collision for
+ * packets recirculated after popping mpls header. */
+dp_packet_reset_offload(packet);
+
 set_ethertype(packet, ethtype);
if (get_16aligned_be32(&mh->mpls_lse) & htonl(MPLS_BOS_MASK)) {
 dp_packet_set_l2_5(packet, NULL);
-- 
1.9.1



Re: [ovs-dev] [PATCH] Do RCU synchronization at fixed interval in PMD main loop.

2019-08-12 Thread Nitin Katiyar
Hi,
A gentle reminder. Please review and provide your feedback.

Regards,
Nitin

> -Original Message-
> From: Nitin Katiyar
> Sent: Wednesday, August 07, 2019 7:52 PM
> To: ovs-dev@openvswitch.org
> Cc: Nitin Katiyar ; Anju Thomas
> 
> Subject: [PATCH] Do RCU synchronization at fixed interval in PMD main loop.
> 
> Each PMD updates the global sequence number for RCU synchronization
> purpose with other OVS threads. This is done at every 1025th iteration in
> PMD main loop.
> 
> If the PMD thread is responsible for polling large number of queues that are
> carrying traffic, it spends a lot of time processing packets and this results 
> in
> significant delay in performing the housekeeping activities.
> 
> If the OVS main thread is waiting to synchronize with the PMD threads and if
> those threads delay performing housekeeping activities for more than 3 sec
> then LACP processing will be impacted and it will lead to LACP flaps. 
> Similarly,
> other controls protocols run by OVS main thread are impacted.
> 
> For e.g. a PMD thread polling 200 ports/queues with average of 1600
> processing cycles per packet with batch size of 32 may take 1024
> (200 * 1600 * 32) cycles per iteration. In system with 2.0 GHz CPU it means
> more than 5 ms per iteration. So, for 1024 iterations to complete it would be
> more than 5 seconds.
> 
> This gets worse when there are PMD threads which are less loaded.
> It reduces possibility of getting mutex lock in ovsrcu_try_quiesce() by 
> heavily
> loaded PMD and next attempt to quiesce would be after 1024 iterations.
> 
> With this patch, PMD RCU synchronization will be performed after fixed
> interval instead after a fixed number of iterations. This will ensure that 
> even if
> the packet processing load is high the RCU synchronization will not be delayed
> long.
> 
> Co-authored-by: Anju Thomas 
> 
> Signed-off-by: Nitin Katiyar 
> Signed-off-by: Anju Thomas 
> ---
>  lib/dpif-netdev-perf.c | 16   lib/dpif-netdev-perf.h | 17
> +
>  lib/dpif-netdev.c  | 27 +++
>  3 files changed, 44 insertions(+), 16 deletions(-)
> 
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index
> e7ed49e..c888e5d 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -43,22 +43,6 @@ uint64_t iter_cycle_threshold;
> 
>  static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> 
> -#ifdef DPDK_NETDEV
> -static uint64_t
> -get_tsc_hz(void)
> -{
> -return rte_get_tsc_hz();
> -}
> -#else
> -/* This function is only invoked from PMD threads which depend on DPDK.
> - * A dummy function is sufficient when building without DPDK_NETDEV. */ -
> static uint64_t
> -get_tsc_hz(void)
> -{
> -return 1;
> -}
> -#endif
> -
>  /* Histogram functions. */
> 
>  static void
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index
> 244813f..3f2ee1c 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -187,6 +187,23 @@ struct pmd_perf_stats {
>  char *log_reason;
>  };
> 
> +#ifdef DPDK_NETDEV
> +static inline uint64_t
> +get_tsc_hz(void)
> +{
> +return rte_get_tsc_hz();
> +}
> +#else
> +/* This function is only invoked from PMD threads which depend on DPDK.
> + * A dummy function is sufficient when building without DPDK_NETDEV. */
> +static inline uint64_t
> +get_tsc_hz(void)
> +{
> +return 1;
> +}
> +#endif
> +
> +
>  #ifdef __linux__
>  static inline uint64_t
>  rdtsc_syscall(struct pmd_perf_stats *s) diff --git a/lib/dpif-netdev.c 
> b/lib/dpif-
> netdev.c index d0a1c58..c3d6835 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -751,6 +751,9 @@ struct dp_netdev_pmd_thread {
> 
>  /* Set to true if the pmd thread needs to be reloaded. */
>  bool need_reload;
> +
> +/* Last time (in tsc) when PMD was last quiesced */
> +uint64_t last_rcu_quiesced;
>  };
> 
>  /* Interface to netdev-based datapath. */ @@ -5445,6 +5448,7 @@
> pmd_thread_main(void *f_)
>  int poll_cnt;
>  int i;
>  int process_packets = 0;
> +uint64_t rcu_quiesce_interval = 0;
> 
>  poll_list = NULL;
> 
> @@ -5486,6 +5490,13 @@ reload:
>  pmd->intrvl_tsc_prev = 0;
>  atomic_store_relaxed(>intrvl_cycles, 0);
>  cycles_counter_update(s);
> +
> +if (get_tsc_hz() > 1) {
> +/* Calculate ~10 ms interval. */
> +rcu_quiesce_interval = get_tsc_hz() / 100;
> +pmd->last_rcu_quiesced = cycles_counter_get(s);
> +}
> +
>  /* Protect pmd stats fro

[ovs-dev] [PATCH] Do RCU synchronization at fixed interval in PMD main loop.

2019-08-07 Thread Nitin Katiyar
Each PMD updates the global sequence number for RCU synchronization
purposes with the other OVS threads. This is done at every 1025th
iteration of the PMD main loop.

If the PMD thread is responsible for polling a large number of queues
that are carrying traffic, it spends a lot of time processing packets
and this results in significant delay in performing the housekeeping
activities.

If the OVS main thread is waiting to synchronize with the PMD threads
and those threads delay performing housekeeping activities for more
than 3 seconds, then LACP processing will be impacted and it will lead
to LACP flaps. Similarly, other control protocols run by the OVS main
thread are impacted.

For example, a PMD thread polling 200 ports/queues with an average of
1600 processing cycles per packet and a batch size of 32 may take
10,240,000 (200 * 1600 * 32) cycles per iteration. On a system with a
2.0 GHz CPU that is more than 5 ms per iteration, so completing 1024
iterations takes more than 5 seconds.

This gets worse when there are PMD threads which are less loaded: they
reduce the chance of a heavily loaded PMD getting the mutex lock in
ovsrcu_try_quiesce(), and its next attempt to quiesce would be after
another 1024 iterations.

With this patch, PMD RCU synchronization is performed at a fixed time
interval instead of after a fixed number of iterations. This ensures
that even if the packet processing load is high, the RCU
synchronization will not be delayed for long.

Co-authored-by: Anju Thomas 

Signed-off-by: Nitin Katiyar 
Signed-off-by: Anju Thomas 
---
 lib/dpif-netdev-perf.c | 16 
 lib/dpif-netdev-perf.h | 17 +
 lib/dpif-netdev.c  | 27 +++
 3 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index e7ed49e..c888e5d 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -43,22 +43,6 @@ uint64_t iter_cycle_threshold;
 
 static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600);
 
-#ifdef DPDK_NETDEV
-static uint64_t
-get_tsc_hz(void)
-{
-return rte_get_tsc_hz();
-}
-#else
-/* This function is only invoked from PMD threads which depend on DPDK.
- * A dummy function is sufficient when building without DPDK_NETDEV. */
-static uint64_t
-get_tsc_hz(void)
-{
-return 1;
-}
-#endif
-
 /* Histogram functions. */
 
 static void
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 244813f..3f2ee1c 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -187,6 +187,23 @@ struct pmd_perf_stats {
 char *log_reason;
 };
 
+#ifdef DPDK_NETDEV
+static inline uint64_t
+get_tsc_hz(void)
+{
+return rte_get_tsc_hz();
+}
+#else
+/* This function is only invoked from PMD threads which depend on DPDK.
+ * A dummy function is sufficient when building without DPDK_NETDEV. */
+static inline uint64_t
+get_tsc_hz(void)
+{
+return 1;
+}
+#endif
+
+
 #ifdef __linux__
 static inline uint64_t
 rdtsc_syscall(struct pmd_perf_stats *s)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d0a1c58..c3d6835 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -751,6 +751,9 @@ struct dp_netdev_pmd_thread {
 
 /* Set to true if the pmd thread needs to be reloaded. */
 bool need_reload;
+
+/* Last time (in tsc) when PMD was last quiesced */
+uint64_t last_rcu_quiesced;
 };
 
 /* Interface to netdev-based datapath. */
@@ -5445,6 +5448,7 @@ pmd_thread_main(void *f_)
 int poll_cnt;
 int i;
 int process_packets = 0;
+uint64_t rcu_quiesce_interval = 0;
 
 poll_list = NULL;
 
@@ -5486,6 +5490,13 @@ reload:
 pmd->intrvl_tsc_prev = 0;
atomic_store_relaxed(&pmd->intrvl_cycles, 0);
 cycles_counter_update(s);
+
+if (get_tsc_hz() > 1) {
+/* Calculate ~10 ms interval. */
+rcu_quiesce_interval = get_tsc_hz() / 100;
+pmd->last_rcu_quiesced = cycles_counter_get(s);
+}
+
 /* Protect pmd stats from external clearing while polling. */
ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
 for (;;) {
@@ -5493,6 +5504,19 @@ reload:
 
 pmd_perf_start_iteration(s);
 
+/* Do RCU synchronization at fixed interval instead of doing it
+ * at fixed number of iterations. This ensures that synchronization
+ * would not be delayed long even at high load of packet
+ * processing. */
+
+if (rcu_quiesce_interval &&
+((cycles_counter_get(s) - pmd->last_rcu_quiesced) >
+ rcu_quiesce_interval)) {
+if (!ovsrcu_try_quiesce()) {
+pmd->last_rcu_quiesced = cycles_counter_get(s);
+}
+}
+
 for (i = 0; i < poll_cnt; i++) {
 
 if (!poll_list[i].rxq_enabled) {
@@ -5527,6 +5551,9 @@ reload:
 dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
 if (!ovsrcu_try_quiesce()) {
 emc_cache_slow_sweep(&((pmd->flow_cache).emc_cache));

[ovs-dev] [PATCH v3 2/2] Don't send or receive LACP PDUs when carrier state of slave is down

2019-06-09 Thread Nitin Katiyar
Fortville NICs (or their drivers) can get into an inconsistent state,
in which the NIC can actually transmit and receive packets even
though they report "PHY down". In such a state, OVS can exchange and
process LACP messages and enable a LACP slave. However, further packet
exchange over the slave fails because OVS sees that the PHY is down.

This commit fixes the problem by making OVS ignore received LACP PDUs
and suppress transmitting LACP PDUs when carrier is down. In addition,
when a LACP PDU is received with carrier down, this commit triggers
rechecking the carrier status (by incrementing the connectivity sequence
number) to ensure that it is updated as quickly as possible.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c | 35 ++-
 lib/lacp.h |  3 ++-
 ofproto/ofproto-dpif.c | 14 +++---
 3 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index e768012..16c823b 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -122,6 +122,7 @@ struct slave {
 
 enum slave_status status; /* Slave status. */
 bool attached;/* Attached. Traffic may flow. */
+bool carrier_up;  /* Carrier state of link. */
 struct lacp_info partner; /* Partner information. */
 struct lacp_info ntt_actor;   /* Used to decide if we Need To Transmit. */
 struct timer tx;  /* Next message transmission timer. */
@@ -350,6 +351,16 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 goto out;
 }
 
+/* On some NICs L1 state reporting is slow. In case LACP packets are
+ * received while carrier (L1) state is still down, drop the LACP PDU and
+ * trigger re-checking of L1 state. */
+if (!slave->carrier_up) {
+VLOG_INFO_RL(&rl, "%s: carrier state is DOWN,"
+ " dropping received LACP PDU.", slave->name);
+seq_change(connectivity_seq_get());
+goto out;
+}
+
 slave->status = LACP_CURRENT;
 tx_rate = lacp->fast ? LACP_FAST_TIME_TX : LACP_SLOW_TIME_TX;
 timer_set_duration(>rx, LACP_RX_MULTIPLIER * tx_rate);
@@ -456,7 +467,8 @@ lacp_slave_unregister(struct lacp *lacp, const void *slave_)
 /* This function should be called whenever the carrier status of 'slave_' has
  * changed.  If 'lacp' is null, this function has no effect.*/
 void
-lacp_slave_carrier_changed(const struct lacp *lacp, const void *slave_)
+lacp_slave_carrier_changed(const struct lacp *lacp, const void *slave_,
+   bool carrier_up)
 OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
@@ -473,7 +485,11 @@ lacp_slave_carrier_changed(const struct lacp *lacp, const 
void *slave_)
 if (slave->status == LACP_CURRENT || slave->lacp->active) {
 slave_set_expired(slave);
 }
-slave->count_carrier_changed++;
+
+if (slave->carrier_up != carrier_up) {
+slave->carrier_up = carrier_up;
+slave->count_carrier_changed++;
+}
 
 out:
 lacp_unlock();
@@ -498,11 +514,18 @@ lacp_slave_may_enable(const struct lacp *lacp, const void 
*slave_)
 {
 if (lacp) {
 struct slave *slave;
-bool ret;
+bool ret = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
-ret = slave ? slave_may_enable__(slave) : false;
+if (slave) {
+/* It is only called when carrier is up. So, enable slave's
+ * carrier state if it is currently down. */
+if (!slave->carrier_up) {
+slave->carrier_up = true;
+}
+ret = slave_may_enable__(slave);
+}
 lacp_unlock();
 return ret;
 } else {
@@ -820,7 +843,9 @@ slave_get_priority(struct slave *slave, struct lacp_info 
*priority)
 static bool
 slave_may_tx(const struct slave *slave) OVS_REQUIRES(mutex)
 {
-return slave->lacp->active || slave->status != LACP_DEFAULTED;
+/* Check for L1 state as well as LACP state. */
+return (slave->carrier_up) && ((slave->lacp->active) ||
+(slave->status != LACP_DEFAULTED));
 }
 
 static struct slave *
diff --git a/lib/lacp.h b/lib/lacp.h
index 0dfaef0..d731ae9 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -60,7 +60,8 @@ struct lacp_slave_settings {
 void lacp_slave_register(struct lacp *, void *slave_,
  const struct lacp_slave_settings *);
 void lacp_slave_unregister(struct lacp *, const void *slave);
-void lacp_slave_carrier_changed(const struct lacp *, const void *slave);
+void lacp_slave_carrier_changed(const struct lacp *, const void *slave,
+bool carrier_up);
 bool lacp_slave_may_enable(const struct lacp *, const void *slave);
 bool lacp_slave_is_current(const struct lacp *, const void *

[ovs-dev] [PATCH v7 1/2] Avoid packet drop on LACP bond after link up

2019-06-09 Thread Nitin Katiyar
Problem:

The OVS state machine that enables and disables bond slaves runs in 
the OVS main thread. The OVS code that processes received LACP packets
runs in a different thread. Until now, when the latter processes a LACP
PDU that should enable a slave, the slave was only enabled when the
main thread was able to run the state machine. In some cases this led
to delays of up to 350ms when the main thread was busy or not scheduled,
which led to corresponding delays in which packets were dropped due to
the bond-admissibility check.

Fix:

When a LACP PDU is received, evaluate whether LACP slave can be enabled
(slave_may_enable()) and set LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means L1 state is UP and
LACP-SYNC is done and it is waiting for the main thread to enable the
slave. Relax the check in bond_check_admissibility() to check for both
"enable" and "may_enable" of the LACP slave. This would avoid dropping
of packets until the main thread enables the slave from bundle_run().
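
To make the intended flow concrete, here is a rough sketch of the datapath
side described above (the real call site is in ofproto-dpif-xlate.c per the
diffstat; bond_slave_set_may_enable() is a hypothetical stand-in for whatever
setter the bond module exposes, and is not taken from this patch):

#include "lacp.h"
#include "ofproto/bond.h"

/* Handler/PMD thread: process the LACP PDU and record the verdict so that
 * bond_check_admissibility() can start accepting traffic before the main
 * thread gets around to running the LACP state machine. */
static void
handle_lacp_special(struct lacp *lacp, struct bond *bond, void *port,
                    const struct dp_packet *packet)
{
    bool may_enable = lacp_process_packet(lacp, port, packet);

    bond_slave_set_may_enable(bond, port, may_enable);  /* hypothetical */
}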

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c   | 10 +-
 lib/lacp.h   |  2 +-
 ofproto/bond.c   | 22 +++---
 ofproto/ofproto-dpif-xlate.c | 10 +-
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..e768012 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,7 +325,7 @@ lacp_is_active(const struct lacp *lacp) OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function should be
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
-void
+bool
 lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,14 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 slave->partner = pdu->actor;
 }
 
+/* Evaluate may_enable here to avoid dropping of packets till main thread
+ * sets may_enable to true. */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). */
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..0dfaef0 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,7 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index d2a8b1f..c5d5f2c 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -794,6 +794,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock(&rwlock);
 slave = bond_slave_lookup(bond, slave_);
@@ -811,7 +812,11 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/* To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for the
+ * main thread to run LACP state machine and enable the slave. */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fallback_ab) {
@@ -847,8 +852,6 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 /* Drop all packets which arrive on backup slaves.  This is similar to
  * how Linux bonding handles active-backup bonds. */
 if (bond->active_slave != slave) {
-

Re: [ovs-dev] [PATCH v2 2/2] Don't send or receive LACP PDUs when carrier state of slave is down

2019-06-09 Thread Nitin Katiyar



> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Friday, June 07, 2019 11:03 PM
> To: Nitin Katiyar 
> Cc: ovs-dev@openvswitch.org; Manohar Krishnappa Chidambaraswamy
> 
> Subject: Re: [ovs-dev] [PATCH v2 2/2] Don't send or receive LACP PDUs when
> carrier state of slave is down
> 
> On Thu, Jun 06, 2019 at 06:34:21AM +, Nitin Katiyar wrote:
> >
> >
> > > -Original Message-
> > > From: Ben Pfaff [mailto:b...@ovn.org]
> > > Sent: Wednesday, June 05, 2019 12:03 AM
> > > To: Nitin Katiyar 
> > > Cc: ovs-dev@openvswitch.org; Manohar Krishnappa Chidambaraswamy
> > > 
> > > Subject: Re: [ovs-dev] [PATCH v2 2/2] Don't send or receive LACP
> > > PDUs when carrier state of slave is down
> > >
> > > On Sun, Jun 02, 2019 at 04:14:34PM +, Nitin Katiyar wrote:
> > > > Problem:
> > > > 
> > > > On certain Fortville NICs it has been observed that PHY UP
> > > > detection can get delayed (sometimes up to 4-5 secs). When the
> > > > driver fails to fetch PHY status as UP even though its actually
> > > > UP, LACP packets can get exchanged and LACP slave brought UP. In
> > > > such a case, the remote end would start sending traffic on that
> > > > slave, but OVS drops it, as the bond-slave is not yet enabled due to PHY
> DOWN.
> > > >
> > > > Fix:
> > > > 
> > > > The main intention here is delay LACP negotiation until carrier
> > > > (PHY) status is read as UP.
> > > >
> > > > 1. When carrier state is DOWN, do not send any LACP PDUs and
> > > >   drop any received LACP PDUs.
> > > >
> > > > 2. When LACP PDU is received, trigger re-checking
> > > >   of carrier-state (in port_run()) by incrementing the connectivity
> > > >   sequence number to find out the true carrier state as fast as
> > > >   possible.
> > >
> > > I had to read this patch and the above description a few times.
> > > Maybe I understand it now.  Please let me try to re-state it and let
> > > me know if I've got it right.
> > >
> > > Fortville NICs (or perhaps their drivers) do not consistently send
> > > an immediate notification when the PHY comes up.  However, packets can
> still be received.
> > > Until now, OVS drops traffic until the notification that the PHY is up 
> > > arrives.
> > > This patch changes OVS to respond to receiving a LACP packet on a
> > > device, for which it has not yet been notified that carrier is up,
> > > by immediately polling carrier state.  This polling can bypass the
> > > Fortville delayed notification and allow OVS to bring up the bond slave
> faster.
> > It is the other way around. Until now, OVS (and the connected switch) may
> > see LACP in the negotiated state while OVS still has the link/carrier state
> > as DOWN because of a NIC/driver issue.
> > In this case the NIC/driver did not report the link as UP for a few seconds
> > even though the link had come up and started carrying traffic. This brought
> > the LACP slave up, the remote end started sending traffic on that slave, and
> > OVS dropped the packets because the slave status was down.
> > To avoid this situation we propose this patch, which stops sending and
> > receiving LACP PDUs until the link is reported UP. In short, we delay LACP
> > negotiation until the carrier status is UP. Additionally, if a PMD receives
> > a LACP PDU while the carrier state is down, it triggers a recheck of that
> > state by changing the connectivity sequence number.
> 
> OK, let me try again.  Is the following correct?
> 
> --8<--cut here-->8--
> 
> Fortville NICs (or their drivers) can get into an inconsistent state, in 
> which the
> NIC can actually transmit and receive packets even though they report "PHY
> down".  In such a state, OVS can exchange and process LACP messages and
> enable a bond slave.  However, further packet exchange over the slave fails
> because OVS sees that the PHY is down.
> 
> This commit fixes the problem by making OVS ignore received LACP PDUs and
> suppress transmitting LACP PDUs when carrier is down.  In addition, when a
> LACP PDU is received with carrier down, this commit triggers rechecking the
> carrier status (by incrementing the connectivity sequence
> number) to ensure that it is updated as quickly as possible.
This is correct, Ben. I will update the patch description.

Regards,
Nitin


Re: [ovs-dev] [PATCH v6 1/2] Avoid packet drop on LACP bond after link up

2019-06-06 Thread Nitin Katiyar



> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, June 04, 2019 11:39 PM
> To: Nitin Katiyar 
> Cc: ovs-dev@openvswitch.org; Manohar Krishnappa Chidambaraswamy
> 
> Subject: Re: [ovs-dev] [PATCH v6 1/2] Avoid packet drop on LACP bond after
> link up
> 
> On Sun, Jun 02, 2019 at 04:06:34PM +, Nitin Katiyar wrote:
> > Problem:
> > 
> > During port DOWN->UP of link (slave) in a LACP bond, after receiving
> > the LACP PDU with SYNC set for both actor and partner, the bond-slave
> > remains "disabled" until OVS main thread runs LACP state machine and
> > eventually "enables" the bond-slave. With this, we have observed
> > delays in the order of 350ms and packets are dropped in OVS due to
> > bond-admissibility check (packets received on slave in "disabled" state are
> dropped).
> 
> I had to think about this a bit.  Is the following alternate statement of the
> problem also correct?
Yes, that's correct. Do you want me to update the patch with the following in 
the problem statement?
> 
> The OVS state machine that enables and disables bond slaves runs in the OVS
> main thread.  The OVS code that processes received LACP packets runs in a
> different thread.  Until now, when the latter processes a LACP PDU that
> should enable a slave, the slave was only enabled when the main thread was
> able to run the state machine.   In some cases this led to delays of up to
> 350ms when the main thread was busy or not scheduled, which led to
> corresponding delays in which packets were dropped due to the bond-
> admissibility check.


Re: [ovs-dev] [PATCH v2 2/2] Don't send or receive LACP PDUs when carrier state of slave is down

2019-06-06 Thread Nitin Katiyar



> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Wednesday, June 05, 2019 12:03 AM
> To: Nitin Katiyar 
> Cc: ovs-dev@openvswitch.org; Manohar Krishnappa Chidambaraswamy
> 
> Subject: Re: [ovs-dev] [PATCH v2 2/2] Don't send or receive LACP PDUs when
> carrier state of slave is down
> 
> On Sun, Jun 02, 2019 at 04:14:34PM +, Nitin Katiyar wrote:
> > Problem:
> > 
> > On certain Fortville NICs it has been observed that PHY UP detection
> > can get delayed (sometimes up to 4-5 secs). When the driver fails to
> > fetch PHY status as UP even though its actually UP, LACP packets can
> > get exchanged and LACP slave brought UP. In such a case, the remote
> > end would start sending traffic on that slave, but OVS drops it, as
> > the bond-slave is not yet enabled due to PHY DOWN.
> >
> > Fix:
> > 
> > The main intention here is delay LACP negotiation until carrier (PHY)
> > status is read as UP.
> >
> > 1. When carrier state is DOWN, do not send any LACP PDUs and
> >   drop any received LACP PDUs.
> >
> > 2. When LACP PDU is received, trigger re-checking
> >   of carrier-state (in port_run()) by incrementing the connectivity
> >   sequence number to find out the true carrier state as fast as
> >   possible.
> 
> I had to read this patch and the above description a few times.  Maybe I
> understand it now.  Please let me try to re-state it and let me know if I've 
> got it
> right.
> 
> Fortville NICs (or perhaps their drivers) do not consistently send an 
> immediate
> notification when the PHY comes up.  However, packets can still be received.
> Until now, OVS drops traffic until the notification that the PHY is up 
> arrives.
> This patch changes OVS to respond to receiving a LACP packet on a device, for
> which it has not yet been notified that carrier is up, by immediately polling
> carrier state.  This polling can bypass the Fortville delayed notification and
> allow OVS to bring up the bond slave faster.
It is the other way around. Until now, OVS (and the connected switch) may see 
LACP in the negotiated state while OVS still has the link/carrier state as DOWN 
because of a NIC/driver issue.
In this case the NIC/driver did not report the link as UP for a few seconds 
even though the link had come up and started carrying traffic. This brought the 
LACP slave up, the remote end started sending traffic on that slave, and OVS 
dropped the packets because the slave status was down.
To avoid this situation we propose this patch, which stops sending and 
receiving LACP PDUs until the link is reported UP. In short, we delay LACP 
negotiation until the carrier status is UP. Additionally, if a PMD receives a 
LACP PDU while the carrier state is down, it triggers a recheck of that state 
by changing the connectivity sequence number.


[ovs-dev] [PATCH v2 2/2] Don't send or receive LACP PDUs when carrier state of slave is down

2019-06-02 Thread Nitin Katiyar
Problem:

On certain Fortville NICs it has been observed that PHY UP detection can
get delayed (sometimes up to 4-5 secs). When the driver fails to fetch
PHY status as UP even though it's actually UP, LACP packets can get
exchanged and LACP slave brought UP. In such a case, the remote end
would start sending traffic on that slave, but OVS drops it, as the
bond-slave is not yet enabled due to PHY DOWN.

Fix:

The main intention here is to delay LACP negotiation until the carrier (PHY)
status is read as UP.

1. When the carrier state is DOWN, do not send any LACP PDUs and
  drop any received LACP PDUs. (The ofproto-dpif side that feeds the
  carrier state into lib/lacp is sketched below.)

2. When a LACP PDU is received, trigger re-checking
  of the carrier state (in port_run()) by incrementing the connectivity
  sequence number to find out the true carrier state as fast as
  possible.
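
The diffstat shows ofproto/ofproto-dpif.c changing as well, so that the
carrier state reaches lib/lacp through the new lacp_slave_carrier_changed()
argument. That call site is not quoted in this mail; the following is only a
hedged sketch of what it conceptually looks like (member names are assumptions
based on existing ofproto-dpif structures):

/* Meant to live in ofproto/ofproto-dpif.c, e.g. invoked from port_run(). */
static void
port_update_lacp_carrier(struct ofport_dpif *port)
{
    bool carrier_up = netdev_get_carrier(port->up.netdev);

    if (port->bundle && port->bundle->lacp) {
        /* Third argument added by this patch. */
        lacp_slave_carrier_changed(port->bundle->lacp, port, carrier_up);
    }
}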

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c | 35 ++-
 lib/lacp.h |  3 ++-
 ofproto/ofproto-dpif.c | 14 +++---
 3 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index e768012..16c823b 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -122,6 +122,7 @@ struct slave {
 
 enum slave_status status; /* Slave status. */
 bool attached;/* Attached. Traffic may flow. */
+bool carrier_up;  /* Carrier state of link. */
 struct lacp_info partner; /* Partner information. */
 struct lacp_info ntt_actor;   /* Used to decide if we Need To Transmit. */
 struct timer tx;  /* Next message transmission timer. */
@@ -350,6 +351,16 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 goto out;
 }
 
+/* On some NICs L1 state reporting is slow. In case LACP packets are
+ * received while carrier (L1) state is still down, drop the LACP PDU and
+ * trigger re-checking of L1 state. */
+if (!slave->carrier_up) {
+VLOG_INFO_RL(&rl, "%s: carrier state is DOWN,"
+ " dropping received LACP PDU.", slave->name);
+seq_change(connectivity_seq_get());
+goto out;
+}
+
 slave->status = LACP_CURRENT;
 tx_rate = lacp->fast ? LACP_FAST_TIME_TX : LACP_SLOW_TIME_TX;
 timer_set_duration(&slave->rx, LACP_RX_MULTIPLIER * tx_rate);
@@ -456,7 +467,8 @@ lacp_slave_unregister(struct lacp *lacp, const void *slave_)
 /* This function should be called whenever the carrier status of 'slave_' has
  * changed.  If 'lacp' is null, this function has no effect.*/
 void
-lacp_slave_carrier_changed(const struct lacp *lacp, const void *slave_)
+lacp_slave_carrier_changed(const struct lacp *lacp, const void *slave_,
+   bool carrier_up)
 OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
@@ -473,7 +485,11 @@ lacp_slave_carrier_changed(const struct lacp *lacp, const 
void *slave_)
 if (slave->status == LACP_CURRENT || slave->lacp->active) {
 slave_set_expired(slave);
 }
-slave->count_carrier_changed++;
+
+if (slave->carrier_up != carrier_up) {
+slave->carrier_up = carrier_up;
+slave->count_carrier_changed++;
+}
 
 out:
 lacp_unlock();
@@ -498,11 +514,18 @@ lacp_slave_may_enable(const struct lacp *lacp, const void 
*slave_)
 {
 if (lacp) {
 struct slave *slave;
-bool ret;
+bool ret = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
-ret = slave ? slave_may_enable__(slave) : false;
+if (slave) {
+/* It is only called when carrier is up. So, enable slave's
+ * carrier state if it is currently down. */
+if (!slave->carrier_up) {
+slave->carrier_up = true;
+}
+ret = slave_may_enable__(slave);
+}
 lacp_unlock();
 return ret;
 } else {
@@ -820,7 +843,9 @@ slave_get_priority(struct slave *slave, struct lacp_info 
*priority)
 static bool
 slave_may_tx(const struct slave *slave) OVS_REQUIRES(mutex)
 {
-return slave->lacp->active || slave->status != LACP_DEFAULTED;
+/* Check for L1 state as well as LACP state. */
+return (slave->carrier_up) && ((slave->lacp->active) ||
+(slave->status != LACP_DEFAULTED));
 }
 
 static struct slave *
diff --git a/lib/lacp.h b/lib/lacp.h
index 0dfaef0..d731ae9 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -60,7 +60,8 @@ struct lacp_slave_settings {
 void lacp_slave_register(struct lacp *, void *slave_,
  const struct lacp_slave_settings *);
 void lacp_slave_unregister(struct lacp *, const void *slave);
-void lacp_slave_carrier_changed(const struct lacp *, const void *slave);
+void lacp_slave_carrier_changed(const struct lacp *, const void *slave,
+bool carrier_up);
 bool lacp_slave_may_en

[ovs-dev] [PATCH v6 1/2] Avoid packet drop on LACP bond after link up

2019-06-02 Thread Nitin Katiyar
Problem:

During port DOWN->UP of link (slave) in a LACP bond, after receiving the
LACP PDU with SYNC set for both actor and partner, the bond-slave remains
"disabled" until OVS main thread runs LACP state machine and eventually
"enables" the bond-slave. With this, we have observed delays in the order
of 350ms and packets are dropped in OVS due to bond-admissibility check
(packets received on slave in "disabled" state are dropped).

Fix:

When a LACP PDU is received, evaluate whether LACP slave can be enabled
(slave_may_enable()) and set LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means L1 state is UP and
LACP-SYNC is done and it is waiting for the main thread to enable the
slave. Relax the check in bond_check_admissibility() to check for both
"enable" and "may_enable" of the LACP slave. This would avoid dropping
of packets until the main thread enables the slave from bundle_run().

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c   | 10 +-
 lib/lacp.h   |  2 +-
 ofproto/bond.c   | 22 +++---
 ofproto/ofproto-dpif-xlate.c | 10 +-
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..e768012 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,7 +325,7 @@ lacp_is_active(const struct lacp *lacp) OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function should be
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
-void
+bool
 lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,14 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 slave->partner = pdu->actor;
 }
 
+/* Evaluate may_enable here to avoid dropping of packets till main thread
+ * sets may_enable to true. */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). */
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..0dfaef0 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,7 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index d2a8b1f..c5d5f2c 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -794,6 +794,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock(&rwlock);
 slave = bond_slave_lookup(bond, slave_);
@@ -811,7 +812,11 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/* To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for the
+ * main thread to run LACP state machine and enable the slave. */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fallback_ab) {
@@ -847,8 +852,6 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 /* Drop all packets which arrive on backup slaves.  This is similar to
  * how Linux bonding handles active-backup bonds. */
 if (bond->active_slave != slave) {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5

[ovs-dev] [PATCH 2/2] Do not send or receive LACP PDUs when carrier state of slave is down

2019-03-05 Thread Nitin Katiyar
Problem:

On certain Fortville NICs it has been observed that PHY UP detection can
get delayed (sometimes up to 4-5 secs). When the driver fails to fetch
PHY status as UP even though it's actually UP, LACP packets can get
exchanged and LACP slave brought UP. In such a case, the remote end
would start sending traffic on that slave, but OVS drops it, as the
bond-slave is not yet enabled due to PHY DOWN.

Fix:

The main intention here is to delay LACP negotiation until the carrier (PHY)
status is read as UP.

1. In port_run()/bundle_run(), cache the carrier status in
  "struct ofport_dpif"/"struct bond_slave". (A sketch of the accessor that
  lib/lacp uses to read this cached state follows below.)

2. When the carrier state is DOWN, do not send any LACPDUs and
  drop any received LACPDUs.

3. When the LACP state changes or a LACPDU is received, trigger re-checking
  of the carrier state (in port_run()) by incrementing the connectivity
  sequence number to find out the true carrier state as fast as
  possible.
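
The diff below calls bond_slave_get_carrier(), but the mail truncates before
its definition. A minimal sketch of such an accessor, assuming a cached
'carrier_up' field in struct bond_slave and reusing bond.c's existing rwlock
and bond_slave_lookup() helper (details may differ from the actual patch):

/* Meant to live in ofproto/bond.c.  Returns the cached carrier state for
 * 'slave_', or false if the slave is unknown.  CONST_CAST drops the const
 * qualifier only so the non-const lookup helper can be reused. */
bool
bond_slave_get_carrier(const void *bond_, const void *slave_)
{
    struct bond *bond = CONST_CAST(struct bond *, bond_);
    bool carrier_up = false;

    ovs_rwlock_rdlock(&rwlock);
    struct bond_slave *slave = bond_slave_lookup(bond, slave_);
    if (slave) {
        carrier_up = slave->carrier_up;   /* assumed cached field */
    }
    ovs_rwlock_unlock(&rwlock);

    return carrier_up;
}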

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c   | 40 +++-
 lib/lacp.h   |  8 
 ofproto/bond.c   | 30 ++
 ofproto/bond.h   |  3 +++
 ofproto/ofproto-dpif-xlate.c |  1 +
 ofproto/ofproto-dpif.c   | 10 +++---
 6 files changed, 76 insertions(+), 16 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index e768012..0259098 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -33,6 +33,7 @@
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
 #include "util.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(lacp);
 
@@ -148,7 +149,7 @@ static void slave_get_actor(struct slave *, struct 
lacp_info *actor)
 OVS_REQUIRES(mutex);
 static void slave_get_priority(struct slave *, struct lacp_info *priority)
 OVS_REQUIRES(mutex);
-static bool slave_may_tx(const struct slave *)
+static bool slave_may_tx(const void *bond, const struct slave *)
 OVS_REQUIRES(mutex);
 static struct slave *slave_lookup(const struct lacp *, const void *slave)
 OVS_REQUIRES(mutex);
@@ -326,14 +327,15 @@ lacp_is_active(const struct lacp *lacp) 
OVS_EXCLUDED(mutex)
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
 bool
-lacp_process_packet(struct lacp *lacp, const void *slave_,
-const struct dp_packet *packet)
+lacp_process_packet(struct lacp *lacp, const void *bond,
+const void *slave_, const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
 {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool carrier_up = false;
 bool lacp_may_enable = false;
 
 lacp_lock();
@@ -350,6 +352,17 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 goto out;
 }
 
+/* On some NICs L1 state reporting is slow. In case LACP packets are
+ * received while carrier (L1) state is still down, drop the LACP PDU and
+ * trigger re-checking of L1 state. */
+carrier_up = bond_slave_get_carrier(bond, slave->aux);
+if (!carrier_up) {
+VLOG_INFO_RL(&rl, "%s: carrier state is DOWN,"
+ " dropping received LACP PDU.", slave->name);
+seq_change(connectivity_seq_get());
+goto out;
+}
+
 slave->status = LACP_CURRENT;
 tx_rate = lacp->fast ? LACP_FAST_TIME_TX : LACP_SLOW_TIME_TX;
 timer_set_duration(&slave->rx, LACP_RX_MULTIPLIER * tx_rate);
@@ -529,7 +542,8 @@ lacp_slave_is_current(const struct lacp *lacp, const void 
*slave_)
 
 /* This function should be called periodically to update 'lacp'. */
 void
-lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
+lacp_run(struct lacp *lacp, const void *bond,
+ lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
@@ -559,7 +573,7 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 HMAP_FOR_EACH (slave, node, &lacp->slaves) {
 struct lacp_info actor;
 
-if (!slave_may_tx(slave)) {
+if (!slave_may_tx(bond, slave)) {
 continue;
 }
 
@@ -588,13 +602,13 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 
 /* Causes poll_block() to wake up when lacp_run() needs to be called again. */
 void
-lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex)
+lacp_wait(struct lacp *lacp, const void *bond) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
 lacp_lock();
 HMAP_FOR_EACH (slave, node, &lacp->slaves) {
-if (slave_may_tx(slave)) {
+if (slave_may_tx(bond, slave)) {
 timer_wait(&slave->tx);
 }
 
@@ -818,9 +832,17 @@ slave_get_priority(struct slave *slave, struct lacp_info 
*priority)
 }
 
 static bool
-slave_may_tx(const struct slave *slave

[ovs-dev] [PATCH v5 1/2] Avoid packet drop on LACP bond after link up

2019-03-05 Thread Nitin Katiyar
Problem:

During port DOWN->UP of link (slave) in a LACP bond, after receiving the
LACP PDU with SYNC set for both actor and partner, the bond-slave remains
"disabled" until OVS main thread runs LACP state machine and eventually
"enables" the bond-slave. With this, we have observed delays in the order
of 350ms and packets are dropped in OVS due to bond-admissibility check
(packets received on slave in "disabled" state are dropped).

Fix:

When a LACP PDU is received, evaluate whether LACP slave can be enabled
(slave_may_enable()) and set LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means L1 state is UP and
LACP-SYNC is done and it is waiting for the main thread to enable the
slave. Relax the check in bond_check_admissibility() to check for both
"enable" and "may_enable" of the LACP slave. This would avoid dropping
of packets until the main thread enables the slave from bundle_run().

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c   | 10 +-
 lib/lacp.h   |  2 +-
 ofproto/bond.c   | 22 +++---
 ofproto/ofproto-dpif-xlate.c | 10 +-
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..e768012 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,7 +325,7 @@ lacp_is_active(const struct lacp *lacp) OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function should be
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
-void
+bool
 lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,14 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 slave->partner = pdu->actor;
 }
 
+/* Evaluate may_enable here to avoid dropping of packets till main thread
+ * sets may_enable to true. */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). */
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..0dfaef0 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,7 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index d2a8b1f..c5d5f2c 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -794,6 +794,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock(&rwlock);
 slave = bond_slave_lookup(bond, slave_);
@@ -811,7 +812,11 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/* To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for the
+ * main thread to run LACP state machine and enable the slave. */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fallback_ab) {
@@ -847,8 +852,6 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 /* Drop all packets which arrive on backup slaves.  This is similar to
  * how Linux bonding handles active-backup bonds. */
 if (bond->active_slave != slave) {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5

Re: [ovs-dev] [PATCH v4 2/2] Avoid packet drop on LACP bond after link up

2019-03-02 Thread Nitin Katiyar
Hi,

> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Saturday, March 02, 2019 12:15 AM
> To: Nitin Katiyar 
> Cc: ovs-dev@openvswitch.org; Manohar Krishnappa Chidambaraswamy
> 
> Subject: Re: [ovs-dev] [PATCH v4 2/2] Avoid packet drop on LACP bond after
> link up
> 
> I don't entirely understand the problem.  It seems like a driver bug.
> Why isn't the bug being fixed?
I agree it is a driver/firmware bug, but the OVS implementation also doesn't 
account for link state, which causes traffic drops in the case of a bond.
> 
> Patches 1 and 2 have the same title.  It would be better if they were 
> different.
Both issues surfaced in the same scenario, so we decided to have two patches. 
We can send them as two differently titled patches next time.
> 
> [more below]
> 
> On Thu, Feb 28, 2019 at 01:56:20PM +, Nitin Katiyar wrote:
> > +/*
> > + * On some NICs L1 state reporting is slow. In case LACP packets are
> > + * received while carrier (L1) state is still down, drop the LACPDU and
> > + * trigger re-checking of L1 state.
> > + */
> > +carrier_up = bond_slave_get_carrier(bond, slave->aux);
> > +if (!carrier_up) {
> > +seq_change(connectivity_seq_get());
> > +
> > +VLOG_INFO_RL(&rl, "carrier still DOWN - conn seq changed for %s, "
> > +  "dropping packet\n", slave->name);
> 
> It's going to be hard for a user to understand what this means, especially the
> "conn seq changed" part.  The user also doesn't know the context, so it's not
> clear what "carrier still DOWN" means--still down after what?  Can you please
> rephrase the log message so that it's clear what happened and why it's
> unusual?
Sure, I will rephrase it.
> 
> Please put the name of the lacp object at the beginning of the message.
> 
> Please don't add \n at the end of a log message; the logger does that itself.
> 
> In bond_slave_get_carrier(), please use CONST_CAST for casting away const.
I will update all three of these in the next patch.

Thanks,
Nitin
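
For reference, a log call that incorporates the three points above might look
like the following (illustrative only; the wording eventually used in the
respin differs slightly):

static void
log_lacp_pdu_dropped(const struct slave *slave)
{
    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);

    /* Slave name first, no trailing newline, rate limiter passed by address. */
    VLOG_INFO_RL(&rl, "%s: carrier state is DOWN, dropping received LACP PDU.",
                 slave->name);
}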


[ovs-dev] [PATCH v4 2/2] Avoid packet drop on LACP bond after link up

2019-02-28 Thread Nitin Katiyar
Problem:

On certain Fortville NICs it has been observed that PHY UP detection can
get delayed (sometimes up to 4-5 secs). When the driver fails to fetch
PHY status as UP even though it's actually UP, LACP packets can get
exchanged and LACP slave brought UP. In such a case, the remote end
would start sending traffic on that slave, but OVS drops it, as the
bond-slave is not yet enabled due to PHY DOWN.

Fix:

The main intention here is to delay LACP negotiation until the carrier (PHY)
status is read as UP.

1. In port_run()/bundle_run(), cache the carrier status in
  "struct ofport_dpif"/"struct bond_slave".

2. When carrier state is DOWN, do not send any LACPDUs and
  drop any received LACPDUs.

3. When LACP state changes or LACPDU is received, trigger re-checking
  of carrier-state (in port_run()) by incrementing the connectivity
  sequence number to find out the true carrier state as fast as
  possible.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c   | 44 +++-
 lib/lacp.h   |  8 
 ofproto/bond.c   | 30 ++
 ofproto/bond.h   |  3 +++
 ofproto/ofproto-dpif-xlate.c |  1 +
 ofproto/ofproto-dpif.c   | 10 +++---
 6 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index e768012..2bfa267 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -33,6 +33,7 @@
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
 #include "util.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(lacp);
 
@@ -148,7 +149,7 @@ static void slave_get_actor(struct slave *, struct 
lacp_info *actor)
 OVS_REQUIRES(mutex);
 static void slave_get_priority(struct slave *, struct lacp_info *priority)
 OVS_REQUIRES(mutex);
-static bool slave_may_tx(const struct slave *)
+static bool slave_may_tx(const void *bond, const struct slave *)
 OVS_REQUIRES(mutex);
 static struct slave *slave_lookup(const struct lacp *, const void *slave)
 OVS_REQUIRES(mutex);
@@ -326,14 +327,15 @@ lacp_is_active(const struct lacp *lacp) 
OVS_EXCLUDED(mutex)
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
 bool
-lacp_process_packet(struct lacp *lacp, const void *slave_,
-const struct dp_packet *packet)
+lacp_process_packet(struct lacp *lacp, const void *bond,
+const void *slave_, const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
 {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool carrier_up = false;
 bool lacp_may_enable = false;
 
 lacp_lock();
@@ -350,6 +352,21 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 goto out;
 }
 
+/*
+ * On some NICs L1 state reporting is slow. In case LACP packets are
+ * received while carrier (L1) state is still down, drop the LACPDU and
+ * trigger re-checking of L1 state.
+ */
+carrier_up = bond_slave_get_carrier(bond, slave->aux);
+if (!carrier_up) {
+seq_change(connectivity_seq_get());
+
+VLOG_INFO_RL(&rl, "carrier still DOWN - conn seq changed for %s, "
+  "dropping packet\n", slave->name);
+
+goto out;
+}
+
 slave->status = LACP_CURRENT;
 tx_rate = lacp->fast ? LACP_FAST_TIME_TX : LACP_SLOW_TIME_TX;
 timer_set_duration(&slave->rx, LACP_RX_MULTIPLIER * tx_rate);
@@ -529,7 +546,8 @@ lacp_slave_is_current(const struct lacp *lacp, const void 
*slave_)
 
 /* This function should be called periodically to update 'lacp'. */
 void
-lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
+lacp_run(struct lacp *lacp, const void *bond,
+ lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
@@ -559,7 +577,7 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 HMAP_FOR_EACH (slave, node, &lacp->slaves) {
 struct lacp_info actor;
 
-if (!slave_may_tx(slave)) {
+if (!slave_may_tx(bond, slave)) {
 continue;
 }
 
@@ -588,13 +606,13 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 
 /* Causes poll_block() to wake up when lacp_run() needs to be called again. */
 void
-lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex)
+lacp_wait(struct lacp *lacp, const void *bond) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
 lacp_lock();
 HMAP_FOR_EACH (slave, node, >slaves) {
-if (slave_may_tx(slave)) {
+if (slave_may_tx(bond, slave)) {
 timer_wait(>tx);
 }
 
@@ -818,9 +836,17 @@ slave_get_priority(struct slave *slave, struct lacp_info 
*priority)
 }
 
 static bool
-slave_may_tx(

[ovs-dev] [PATCH v4 1/2] Avoid packet drop on LACP bond after link up

2019-02-28 Thread Nitin Katiyar
Problem:

During port DOWN->UP of link (slave) in a LACP bond, after receiving the
LACPDU with SYNC set for both actor and partner, the bond-slave remains
"disabled" until OVS main thread runs LACP state machine and eventually
"enables" the bond-slave. With this, we have observed delays in the order
of 350ms and packets are dropped in OVS due to bond-admissibility check
(packets received on slave in "disabled" state are dropped).

Fix:

When a LACPDU is received, evaluate whether LACP slave can be enabled
(slave_may_enable()) and set LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means L1 state is UP and
LACP-SYNC is done and it is waiting for the main thread to enable the
slave. Relax the check in bond_check_admissibility() to check for both
"enable" and "may_enable" of the LACP slave. This would avoid dropping
of packets until the main thread enables the slave from bundle_run().

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c   | 10 +-
 lib/lacp.h   |  2 +-
 ofproto/bond.c   | 16 +---
 ofproto/ofproto-dpif-xlate.c | 10 +-
 4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..e768012 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,7 +325,7 @@ lacp_is_active(const struct lacp *lacp) OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function should be
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
-void
+bool
 lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,14 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 slave->partner = pdu->actor;
 }
 
+/* Evaluate may_enable here to avoid dropping of packets till main thread
+ * sets may_enable to true. */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). */
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..0dfaef0 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,7 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index d2a8b1f..cd6fd33 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -794,6 +794,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock(&rwlock);
 slave = bond_slave_lookup(bond, slave_);
@@ -811,7 +812,11 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/* To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for the
+ * main thread to run LACP state machine and enable the slave. */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fallback_ab) {
@@ -847,8 +852,6 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 /* Drop all packets which arrive on backup slaves.  This is similar to
  * how Linux bonding handles active-backup bonds. */
 if (bond->active_slave != slave) {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5

[ovs-dev] [PATCH 1/2] Avoid packet drop on LACP bond after link up

2019-02-27 Thread Nitin Katiyar
Problem:

During port DOWN->UP of link (slave) in a LACP bond, after receiving the
LACPDU with SYNC set for both actor and partner, the bond-slave remains
"disabled" until OVS main thread runs LACP state machine and eventually
"enables" the bond-slave. With this, we have observed delays in the order
of 350ms and packets are dropped in OVS due to bond-admissibility check
(packets received on slave in "disabled" state are dropped).

Fix:

When a LACPDU is received, evaluate whether LACP slave can be enabled
(slave_may_enable()) and set LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means L1 state is UP and
LACP-SYNC is done and it is waiting for the main thread to enable the
slave. Relax the check in bond_check_admissibility() to check for both
"enable" and "may_enable" of the LACP slave. This would avoid dropping
of packets until the main thread enables the slave from bundle_run().

Signed-off-by: Nitin Katiyar 
Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
---
 lib/lacp.c   | 12 ++--
 lib/lacp.h   |  3 ++-
 ofproto/bond.c   | 16 +---
 ofproto/ofproto-dpif-xlate.c | 11 ++-
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..7ac85f9 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,8 +325,8 @@ lacp_is_active(const struct lacp *lacp) OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function should be
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
-void
-lacp_process_packet(struct lacp *lacp, const void *slave_,
+bool
+lacp_process_packet(struct lacp *lacp, const void *bond, const void *slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
 {
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,14 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 slave->partner = pdu->actor;
 }
 
+/* Evaluate may_enable here to avoid dropping of packets till main thread
+ * sets may_enable to true. */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). */
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..1505c2c 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,8 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *bond,
+ const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index d2a8b1f..cd6fd33 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -794,6 +794,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock(&rwlock);
 slave = bond_slave_lookup(bond, slave_);
@@ -811,7 +812,11 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/* To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for the
+ * main thread to run LACP state machine and enable the slave. */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fallback_ab) {
@@ -847,8 +852,6 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 /* Drop all packets which arrive on backup slaves.  This is similar to
  * how Linux bonding handles active-backup bonds.

[ovs-dev] [PATCH 2/2] Avoid packet drop on LACP bond after link up

2019-02-27 Thread Nitin Katiyar
Problem:

On certain Fortville NICs it has been observed that PHY UP detection can
get delayed (sometimes up to 4-5 secs). When the driver fails to fetch
PHY status as UP even though it's actually UP, LACP packets can get
exchanged and LACP slave brought UP. In such a case, the remote end
would start sending traffic on that slave, but OVS drops it, as the
bond-slave is not yet enabled due to PHY DOWN.

Fix:

The main intention here is to delay LACP negotiation until the carrier (PHY)
status is read as UP.

1. In port_run()/bundle_run(), cache the carrier status in
  "struct ofport_dpif"/"struct bond_slave".

2. When carrier state is DOWN, do not send any LACPDUs and
  drop any received LACPDUs.

3. When LACP state changes or LACPDU is received, trigger re-checking
  of carrier-state (in port_run()) by incrementing the connectivity
  sequence number to find out the true carrier state as fast as
  possible.

Signed-off-by: Nitin Katiyar 
Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
---
 lib/lacp.c | 40 +---
 lib/lacp.h |  4 ++--
 ofproto/bond.c | 30 ++
 ofproto/bond.h |  3 +++
 ofproto/ofproto-dpif.c | 10 +++---
 tests/ofproto-dpif.at  |  3 +++
 6 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index 7ac85f9..359b6ce 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -33,6 +33,7 @@
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
 #include "util.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(lacp);
 
@@ -148,7 +149,7 @@ static void slave_get_actor(struct slave *, struct 
lacp_info *actor)
 OVS_REQUIRES(mutex);
 static void slave_get_priority(struct slave *, struct lacp_info *priority)
 OVS_REQUIRES(mutex);
-static bool slave_may_tx(const struct slave *)
+static bool slave_may_tx(const void *bond, const struct slave *)
 OVS_REQUIRES(mutex);
 static struct slave *slave_lookup(const struct lacp *, const void *slave)
 OVS_REQUIRES(mutex);
@@ -334,6 +335,7 @@ lacp_process_packet(struct lacp *lacp, const void *bond, 
const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool carrier_up = false;
 bool lacp_may_enable = false;
 
 lacp_lock();
@@ -350,6 +352,21 @@ lacp_process_packet(struct lacp *lacp, const void *bond, 
const void *slave_,
 goto out;
 }
 
+/*
+ * On some NICs L1 state reporting is slow. In case LACP packets are
+ * received while carrier (L1) state is still down, drop the LACPDU and
+ * trigger re-checking of L1 state.
+ */
+carrier_up = bond_slave_get_carrier(bond, slave_);
+if (!carrier_up) {
+seq_change(connectivity_seq_get());
+
+VLOG_INFO_RL(&rl, "carrier still DOWN - conn seq changed for %s, "
+  "dropping packet\n", slave->name);
+
+goto out;
+}
+
 slave->status = LACP_CURRENT;
 tx_rate = lacp->fast ? LACP_FAST_TIME_TX : LACP_SLOW_TIME_TX;
 timer_set_duration(&slave->rx, LACP_RX_MULTIPLIER * tx_rate);
@@ -529,7 +546,8 @@ lacp_slave_is_current(const struct lacp *lacp, const void 
*slave_)
 
 /* This function should be called periodically to update 'lacp'. */
 void
-lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
+lacp_run(struct lacp *lacp, const void *bond,
+ lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
@@ -559,7 +577,7 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 HMAP_FOR_EACH (slave, node, &lacp->slaves) {
 struct lacp_info actor;
 
-if (!slave_may_tx(slave)) {
+if (!slave_may_tx(bond, slave)) {
 continue;
 }
 
@@ -588,13 +606,13 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 
 /* Causes poll_block() to wake up when lacp_run() needs to be called again. */
 void
-lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex)
+lacp_wait(struct lacp *lacp, const void *bond) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
 lacp_lock();
 HMAP_FOR_EACH (slave, node, &lacp->slaves) {
-if (slave_may_tx(slave)) {
+if (slave_may_tx(bond, slave)) {
 timer_wait(&slave->tx);
 }
 
@@ -818,9 +836,17 @@ slave_get_priority(struct slave *slave, struct lacp_info 
*priority)
 }
 
 static bool
-slave_may_tx(const struct slave *slave) OVS_REQUIRES(mutex)
+slave_may_tx(const void *bond, const struct slave *slave)
+OVS_REQUIRES(mutex)
 {
-return slave->lacp->active || slave->status != LACP_DEFAULTED;
+bool carrier_up = true;
+
+if (bond) {
+carrier_up = bond_slave_get_carrier(bond, slave->aux);
+}
+
+return carrier_up &&
+(slave->lacp->active || slave->status != LACP_DEFAULTED)

[ovs-dev] [PATCH v3 2/2] Avoid packet drop on LACP bond after link up

2019-02-26 Thread Nitin Katiyar
Problem:

On certain Fortville NICs it has been observed that PHY UP detection can
get delayed (sometimes up to 4-5 secs). When the driver fails to fetch
PHY status as UP even though it's actually UP, LACP packets can get
exchanged and LACP slave brought UP. In such a case, the remote end
would start sending traffic on that slave, but OVS drops it, as the
bond-slave is not yet enabled due to PHY DOWN.

Fix:

The main intention here is to delay LACP negotiation until the carrier (PHY)
status is read as UP.

1. In port_run()/bundle_run(), cache the carrier status in
  "struct ofport_dpif"/"struct bond_slave".

2. When carrier state is DOWN, do not send any LACPDUs and
  drop any received LACPDUs.

3. When LACP state changes or LACPDU is received, trigger re-checking
  of carrier-state (in port_run()) by incrementing the connectivity
  sequence number to find out the true carrier state as fast as
  possible.

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c | 44 +++-
 lib/lacp.h |  8 
 ofproto/bond.c | 30 ++
 ofproto/bond.h |  3 +++
 ofproto/ofproto-dpif.c | 10 +++---
 5 files changed, 79 insertions(+), 16 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index e768012..2bfa267 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -33,6 +33,7 @@
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
 #include "util.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(lacp);
 
@@ -148,7 +149,7 @@ static void slave_get_actor(struct slave *, struct 
lacp_info *actor)
 OVS_REQUIRES(mutex);
 static void slave_get_priority(struct slave *, struct lacp_info *priority)
 OVS_REQUIRES(mutex);
-static bool slave_may_tx(const struct slave *)
+static bool slave_may_tx(const void *bond, const struct slave *)
 OVS_REQUIRES(mutex);
 static struct slave *slave_lookup(const struct lacp *, const void *slave)
 OVS_REQUIRES(mutex);
@@ -326,14 +327,15 @@ lacp_is_active(const struct lacp *lacp) 
OVS_EXCLUDED(mutex)
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
 bool
-lacp_process_packet(struct lacp *lacp, const void *slave_,
-const struct dp_packet *packet)
+lacp_process_packet(struct lacp *lacp, const void *bond,
+const void *slave_, const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
 {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool carrier_up = false;
 bool lacp_may_enable = false;
 
 lacp_lock();
@@ -350,6 +352,21 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 goto out;
 }
 
+/*
+ * On some NICs L1 state reporting is slow. In case LACP packets are
+ * received while carrier (L1) state is still down, drop the LACPDU and
+ * trigger re-checking of L1 state.
+ */
+carrier_up = bond_slave_get_carrier(bond, slave->aux);
+if (!carrier_up) {
+seq_change(connectivity_seq_get());
+
+VLOG_INFO_RL(&rl, "carrier still DOWN - conn seq changed for %s, "
+  "dropping packet\n", slave->name);
+
+goto out;
+}
+
 slave->status = LACP_CURRENT;
 tx_rate = lacp->fast ? LACP_FAST_TIME_TX : LACP_SLOW_TIME_TX;
 timer_set_duration(&slave->rx, LACP_RX_MULTIPLIER * tx_rate);
@@ -529,7 +546,8 @@ lacp_slave_is_current(const struct lacp *lacp, const void 
*slave_)
 
 /* This function should be called periodically to update 'lacp'. */
 void
-lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
+lacp_run(struct lacp *lacp, const void *bond,
+ lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
@@ -559,7 +577,7 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 HMAP_FOR_EACH (slave, node, &lacp->slaves) {
 struct lacp_info actor;
 
-if (!slave_may_tx(slave)) {
+if (!slave_may_tx(bond, slave)) {
 continue;
 }
 
@@ -588,13 +606,13 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 
 /* Causes poll_block() to wake up when lacp_run() needs to be called again. */
 void
-lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex)
+lacp_wait(struct lacp *lacp, const void *bond) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
 lacp_lock();
 HMAP_FOR_EACH (slave, node, >slaves) {
-if (slave_may_tx(slave)) {
+if (slave_may_tx(bond, slave)) {
 timer_wait(>tx);
 }
 
@@ -818,9 +836,17 @@ slave_get_priority(struct slave *slave, struct lacp_info 
*priority)
 }
 
 static bool
-slave_may_tx(const struct slave *slave) OVS_REQUIRES(mutex)
+slave_may_tx(

[ovs-dev] [PATCH v3 1/2] Avoid packet drop on LACP bond after link up

2019-02-26 Thread Nitin Katiyar
Problem:

During port DOWN->UP of a link (slave) in a LACP bond, after receiving the
LACPDU with SYNC set for both actor and partner, the bond-slave remains
"disabled" until the OVS main thread runs the LACP state machine and
eventually "enables" the bond-slave. With this, we have observed delays on
the order of 350 ms, and packets are dropped in OVS by the
bond-admissibility check (packets received on a slave in "disabled" state
are dropped).

Fix:

When a LACPDU is received, evaluate whether the LACP slave can be enabled
(slave_may_enable()) and set the LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means the L1 state is UP and
LACP-SYNC is done, and the slave is only waiting for the main thread to
enable it. Relax the check in bond_check_admissibility() to check for both
"enabled" and "may_enable" of the LACP slave. This avoids dropping packets
until the main thread enables the slave from bundle_run().
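
Below is a minimal standalone C sketch of the relaxed admissibility
decision described above.  It is not the actual OVS code: the names
(struct bond_slave_state, admissibility(), the VERDICT_* enum) are
placeholders; only the "enabled || may_enable" condition comes from the
patch.

#include <stdbool.h>
#include <stdio.h>

enum bond_verdict_sketch { VERDICT_ACCEPT, VERDICT_DROP };

struct bond_slave_state {
    bool enabled;      /* Set by the main thread from bundle_run(). */
    bool may_enable;   /* Set from the datapath thread when a LACPDU arrives. */
};

/* Relaxed admissibility for a negotiated LACP bond: accept traffic when the
 * slave is already enabled, or when LACP says it may be enabled but the
 * main thread has not yet caught up. */
static enum bond_verdict_sketch admissibility(const struct bond_slave_state *s)
{
    return (s->enabled || s->may_enable) ? VERDICT_ACCEPT : VERDICT_DROP;
}

int main(void)
{
    struct bond_slave_state s = { .enabled = false, .may_enable = true };

    printf("verdict: %s\n",
           admissibility(&s) == VERDICT_ACCEPT ? "accept" : "drop");
    return 0;
}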

Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 lib/lacp.c   | 10 +-
 lib/lacp.h   |  2 +-
 ofproto/bond.c   | 16 +---
 ofproto/ofproto-dpif-xlate.c | 11 ++-
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..e768012 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,7 +325,7 @@ lacp_is_active(const struct lacp *lacp) OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function should be
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
-void
+bool
 lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,14 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 slave->partner = pdu->actor;
 }
 
+/* Evaluate may_enable here to avoid dropping of packets till main thread
+ * sets may_enable to true. */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). */
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..0dfaef0 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,7 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index d2a8b1f..cd6fd33 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -794,6 +794,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock();
 slave = bond_slave_lookup(bond, slave_);
@@ -811,7 +812,11 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/* To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for the
+ * main thread to run LACP state machine and enable the slave. */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fallback_ab) {
@@ -847,8 +852,6 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 /* Drop all packets which arrive on backup slaves.  This is similar to
  * how Linux bonding handles active-backup bonds. */
 if (bond->active_slave != slave) {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5

Re: [ovs-dev] [PATCH v2 1/2] Avoid packet drop on LACP bond after link up

2019-02-25 Thread Nitin Katiyar
Hi Ben,
Thanks for checking it. I will send the new patch.

Regards,
Nitin

> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Monday, February 25, 2019 11:23 PM
> To: Nitin Katiyar 
> Cc: ovs-dev@openvswitch.org; Manohar Krishnappa Chidambaraswamy
> 
> Subject: Re: [ovs-dev] [PATCH v2 1/2] Avoid packet drop on LACP bond after
> link up
> 
> Thanks for working to make OVS better!  This sounds like a useful
> improvement to me.
> 
> This patch causes a new warning:
> ../lib/lacp.c:329:52: error: unused parameter 'bond' [-Werror,-Wunused-
> parameter]
> 
> Will you please fix that and resubmit?


[ovs-dev] [PATCH v2 2/2] Avoid packet drop on LACP bond after link up

2019-02-24 Thread Nitin Katiyar
Problem:

On certain Fortville NICs it has been observed that PHY UP detection can
get delayed (sometimes up to 4-5 seconds). While the driver still reports
the PHY as DOWN even though it is actually UP, LACP packets can be
exchanged and the LACP slave brought UP. In such a case, the remote end
starts sending traffic on that slave, but OVS drops it because the
bond-slave is not yet enabled while the PHY is reported DOWN.

Fix:

The main intention here is to delay LACP negotiation until the carrier
(PHY) status is read as UP.

1. In port_run()/bundle_run(), cache the carrier status in
  "struct ofport_dpif"/"struct bond_slave".

2. When the carrier state is DOWN, do not send any LACPDUs and
  drop any received LACPDUs.

3. When the LACP state changes or a LACPDU is received, trigger a
  re-check of the carrier state (in port_run()) by incrementing the
  connectivity sequence number, so that the true carrier state is
  learned as fast as possible.

Signed-off-by: Nitin Katiyar 
Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
---
 lib/lacp.c | 40 +---
 lib/lacp.h |  4 ++--
 ofproto/bond.c | 30 ++
 ofproto/bond.h |  3 +++
 ofproto/ofproto-dpif.c | 10 +++---
 tests/ofproto-dpif.at  |  3 +++
 6 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index 7ac85f9..359b6ce 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -33,6 +33,7 @@
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
 #include "util.h"
+#include "ofproto/bond.h"
 
 VLOG_DEFINE_THIS_MODULE(lacp);
 
@@ -148,7 +149,7 @@ static void slave_get_actor(struct slave *, struct 
lacp_info *actor)
 OVS_REQUIRES(mutex);
 static void slave_get_priority(struct slave *, struct lacp_info *priority)
 OVS_REQUIRES(mutex);
-static bool slave_may_tx(const struct slave *)
+static bool slave_may_tx(const void *bond, const struct slave *)
 OVS_REQUIRES(mutex);
 static struct slave *slave_lookup(const struct lacp *, const void *slave)
 OVS_REQUIRES(mutex);
@@ -334,6 +335,7 @@ lacp_process_packet(struct lacp *lacp, const void *bond, 
const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool carrier_up = false;
 bool lacp_may_enable = false;
 
 lacp_lock();
@@ -350,6 +352,21 @@ lacp_process_packet(struct lacp *lacp, const void *bond, 
const void *slave_,
 goto out;
 }
 
+/*
+ * On some NICs L1 state reporting is slow. In case LACP packets are
+ * received while carrier (L1) state is still down, drop the LACPDU and
+ * trigger re-checking of L1 state.
+ */
+carrier_up = bond_slave_get_carrier(bond, slave_);
+if (!carrier_up) {
+seq_change(connectivity_seq_get());
+
+VLOG_INFO_RL(, "carrier still DOWN - conn seq changed for %s, "
+  "dropping packet\n", slave->name);
+
+goto out;
+}
+
 slave->status = LACP_CURRENT;
 tx_rate = lacp->fast ? LACP_FAST_TIME_TX : LACP_SLOW_TIME_TX;
 timer_set_duration(>rx, LACP_RX_MULTIPLIER * tx_rate);
@@ -529,7 +546,8 @@ lacp_slave_is_current(const struct lacp *lacp, const void 
*slave_)
 
 /* This function should be called periodically to update 'lacp'. */
 void
-lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
+lacp_run(struct lacp *lacp, const void *bond,
+ lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
@@ -559,7 +577,7 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 HMAP_FOR_EACH (slave, node, >slaves) {
 struct lacp_info actor;
 
-if (!slave_may_tx(slave)) {
+if (!slave_may_tx(bond, slave)) {
 continue;
 }
 
@@ -588,13 +606,13 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) 
OVS_EXCLUDED(mutex)
 
 /* Causes poll_block() to wake up when lacp_run() needs to be called again. */
 void
-lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex)
+lacp_wait(struct lacp *lacp, const void *bond) OVS_EXCLUDED(mutex)
 {
 struct slave *slave;
 
 lacp_lock();
 HMAP_FOR_EACH (slave, node, >slaves) {
-if (slave_may_tx(slave)) {
+if (slave_may_tx(bond, slave)) {
 timer_wait(>tx);
 }
 
@@ -818,9 +836,17 @@ slave_get_priority(struct slave *slave, struct lacp_info 
*priority)
 }
 
 static bool
-slave_may_tx(const struct slave *slave) OVS_REQUIRES(mutex)
+slave_may_tx(const void *bond, const struct slave *slave)
+OVS_REQUIRES(mutex)
 {
-return slave->lacp->active || slave->status != LACP_DEFAULTED;
+bool carrier_up = true;
+
+if (bond) {
+carrier_up = bond_slave_get_carrier(bond, slave->aux);
+}
+
+return carrier_up &&
+(slave->lacp->active || slave->status != LACP_DEFAULTED)

[ovs-dev] [PATCH v2 1/2] Avoid packet drop on LACP bond after link up

2019-02-24 Thread Nitin Katiyar
Problem:

During port DOWN->UP of a link (slave) in a LACP bond, after receiving the
LACPDU with SYNC set for both actor and partner, the bond-slave remains
"disabled" until the OVS main thread runs the LACP state machine and
eventually "enables" the bond-slave. With this, we have observed delays on
the order of 350 ms, and packets are dropped in OVS by the
bond-admissibility check (packets received on a slave in "disabled" state
are dropped).

Fix:

When a LACPDU is received, evaluate whether the LACP slave can be enabled
(slave_may_enable()) and set the LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means the L1 state is UP and
LACP-SYNC is done, and the slave is only waiting for the main thread to
enable it. Relax the check in bond_check_admissibility() to check for both
"enabled" and "may_enable" of the LACP slave. This avoids dropping packets
until the main thread enables the slave from bundle_run().

Signed-off-by: Nitin Katiyar 
Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Co-authored-by: Manohar Krishnappa Chidambaraswamy 
---
 lib/lacp.c   | 12 ++--
 lib/lacp.h   |  3 ++-
 ofproto/bond.c   | 16 +---
 ofproto/ofproto-dpif-xlate.c | 11 ++-
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..7ac85f9 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,8 +325,8 @@ lacp_is_active(const struct lacp *lacp) OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function should be
  * called on all packets received on 'slave_' with Ethernet Type ETH_TYPE_LACP.
  */
-void
-lacp_process_packet(struct lacp *lacp, const void *slave_,
+bool
+lacp_process_packet(struct lacp *lacp, const void *bond, const void *slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
 {
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,14 @@ lacp_process_packet(struct lacp *lacp, const void *slave_,
 slave->partner = pdu->actor;
 }
 
+/* Evaluate may_enable here to avoid dropping of packets till main thread
+ * sets may_enable to true. */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). */
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..1505c2c 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,8 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *bond,
+ const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index d2a8b1f..cd6fd33 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -794,6 +794,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock();
 slave = bond_slave_lookup(bond, slave_);
@@ -811,7 +812,11 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/* To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for the
+ * main thread to run LACP state machine and enable the slave. */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fallback_ab) {
@@ -847,8 +852,6 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 /* Drop all packets which arrive on backup slaves.  This is similar to
  * how Linux bonding handles active-backup bonds.

Re: [ovs-dev] [PATCH] LACP-Rx packets are not captured in ovs-tcpdump.

2019-02-14 Thread Nitin Katiyar



> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Thursday, February 14, 2019 10:58 PM
> To: Nitin Katiyar 
> Cc: ovs-dev@openvswitch.org; Manohar Krishnappa Chidambaraswamy
> 
> Subject: Re: [ovs-dev] [PATCH] LACP-Rx packets are not captured in ovs-
> tcpdump.
> 
> On Wed, Feb 13, 2019 at 10:51:18AM +, Nitin Katiyar wrote:
> > Mirroring received LACP packets to help in debugging LACP issues.
> >
> > Co-authored-by: Manohar Krishnappa Chidambaraswamy
> 
> > Signed-off-by: Manohar Krishnappa Chidambaraswamy
> 
> > Signed-off-by: Nitin Katiyar 
> > ---
> >  ofproto/ofproto-dpif-xlate.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/ofproto/ofproto-dpif-xlate.c
> > b/ofproto/ofproto-dpif-xlate.c index acd4817..111f36e 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -3308,6 +3308,7 @@ process_special(struct xlate_ctx *ctx, const
> struct xport *xport)
> >  } else if (xport->xbundle && xport->xbundle->lacp
> > && flow->dl_type == htons(ETH_TYPE_LACP)) {
> >  if (packet) {
> > +mirror_ingress_packet(ctx);
> >  lacp_process_packet(xport->xbundle->lacp, xport->ofport, 
> > packet);
> >  }
> >  slow = SLOW_LACP;
> 
> As far as I can tell this will get called from xlate_action() anyway:
Yeah, you are right. I can drop this patch.
> 
> if (!xin->frozen_state && process_special(, in_port)) {
> /* process_special() did all the processing for this packet.
>  *
>  * We do not perform special processing on thawed packets, since that
>  * was done before they were frozen and should not be redone. */
> mirror_ingress_packet();
> } else if (in_port && in_port->xbundle
> 
> Why is LACP more worthy of mirroring than the other protocols that
> process_special() handles?



[ovs-dev] [PATCH] LACP-Rx packets are not captured in ovs-tcpdump.

2019-02-13 Thread Nitin Katiyar
Mirroring received LACP packets to help in debugging LACP issues.

Co-authored-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Manohar Krishnappa Chidambaraswamy 
Signed-off-by: Nitin Katiyar 
---
 ofproto/ofproto-dpif-xlate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index acd4817..111f36e 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3308,6 +3308,7 @@ process_special(struct xlate_ctx *ctx, const struct xport 
*xport)
 } else if (xport->xbundle && xport->xbundle->lacp
&& flow->dl_type == htons(ETH_TYPE_LACP)) {
 if (packet) {
+mirror_ingress_packet(ctx);
 lacp_process_packet(xport->xbundle->lacp, xport->ofport, packet);
 }
 slow = SLOW_LACP;
-- 
1.9.1



Re: [ovs-dev] [patch v4 2/2] conntrack: Exclude l2 padding in 'conn_key_extract()'.

2019-02-12 Thread Nitin Katiyar
Hi,
We would like to get it backported at least as far back as OVS 2.6.

Regards,
Nitin

> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, February 12, 2019 9:13 AM
> To: Darrell Ball 
> Cc: d...@openvswitch.org; Nitin Katiyar 
> Subject: Re: [ovs-dev] [patch v4 2/2] conntrack: Exclude l2 padding in
> 'conn_key_extract()'.
> 
> On Mon, Feb 04, 2019 at 04:23:07PM -0800, Darrell Ball wrote:
> > 'conn_key_extract()' in userspace conntrack is including L2
> > (Ethernet) pad bytes for both L3 and L4 sizes. One problem is any
> > packet with non-zero L2 padding can incorrectly fail L4 checksum
> > validation.
> >
> > This patch fixes conn_key_extract() by ignoring L2 pad bytes.
> >
> > Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
> > CC: Daniele Di Proietto 
> > Co-authored-by: Vishal Deep Ajmera 
> > Co-authored-by: Venkatesan Pradeep
> 
> > Co-authored-by: Nitin Katiyar 
> > Signed-off-by: Vishal Deep Ajmera 
> > Signed-off-by: Venkatesan Pradeep 
> > Signed-off-by: Nitin Katiyar 
> > Signed-off-by: Darrell Ball 
> 
> Thanks.
> 
> I applied this series to master.  Please let me know what branches I should
> backport it to.


[ovs-dev] [PATCH v6] Adding support for PMD auto load balancing

2019-01-15 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed only at specific instances: port addition,
port deletion, upon a reassignment request via the CLI, etc.

Because the traffic pattern can change over time, this can cause uneven
load among the PMDs, resulting in lower overall throughput.

This patch adds support for automatic load balancing of PMDs based on the
measured load of their RX queues. Each PMD measures the processing load
for each of its associated queues every 10 seconds. If the aggregated PMD
load reaches 95% for 6 consecutive intervals, the PMD considers itself
overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by the OVS main thread. The dry-run does NOT change the existing
queue-to-PMD assignments.

If the resulting mapping from the dry-run indicates an improved load
distribution, the actual reassignment is performed.

Automatic rebalancing is disabled by default and has to be enabled via a
configuration option. The interval (in minutes) between two consecutive
rebalancings can also be configured via the CLI; the default is 1 minute.

The following example commands can be used to set the auto-lb parameters:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
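
Below is a minimal standalone C sketch of the overload detection described
above.  It is not the actual patch code: only the 95% threshold and the
6-interval window are taken from the defaults above; the type and function
names are illustrative placeholders.

#include <stdbool.h>
#include <stdio.h>

#define LOAD_THRESHOLD_PCT   95
#define OVERLOAD_INTERVALS    6

struct pmd_load_tracker {
    unsigned consecutive_high;   /* Back-to-back samples at or above threshold. */
};

/* Feed one 10-second load sample (0-100%).  Returns true once the PMD
 * should be considered overloaded, i.e. a rebalance dry-run may be useful. */
static bool pmd_sample_load(struct pmd_load_tracker *t, unsigned load_pct)
{
    if (load_pct >= LOAD_THRESHOLD_PCT) {
        t->consecutive_high++;
    } else {
        t->consecutive_high = 0;     /* Any low sample resets the streak. */
    }
    return t->consecutive_high >= OVERLOAD_INTERVALS;
}

int main(void)
{
    struct pmd_load_tracker t = { 0 };
    unsigned samples[] = { 97, 96, 99, 95, 98, 94, 96, 97, 95, 96, 99, 98 };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (pmd_sample_load(&t, samples[i])) {
            printf("overloaded after sample %zu (load %u%%)\n", i, samples[i]);
            break;
        }
    }
    return 0;
}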

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
---
 Documentation/topics/dpdk/pmd.rst |  63 +++
 NEWS  |   1 +
 lib/dpif-netdev.c | 383 ++
 vswitchd/vswitch.xml  |  41 
 4 files changed, 488 insertions(+)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index dd9172d..b0e19d7 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -183,3 +183,66 @@ or can be triggered by using::
In addition, the output of ``pmd-rxq-show`` was modified to include
Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
stats was not available.
+
+Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
+---
+
+Cycle or utilization based allocation of Rx queues to PMDs gives efficient
+load distribution but it is not adaptive to change in traffic pattern
+occurring over the time. This causes uneven load among the PMDs which results
+in overall lower throughput.
+
+To address this automatic load balancing of PMDs can be set by::
+
+$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
+
+If pmd-auto-lb is set to true AND cycle based assignment is enabled then auto
+load balancing of PMDs is enabled provided there are 2 or more non-isolated
+PMDs and at least one of these PMDs is polling more than one RX queue. So,
+following conditions need to be met to have Auto Load balancing enabled:
+
+1. cycle based assignment of RX queues to PMD is enabled.
+2. pmd-auto-lb is set to true.
+3. There are two or more non-isolated PMDs present.
+4. And at least one of the non-isolated PMD has more than one queue polling.
+
+If any of above is not met PMD Auto Load Balancing is disabled.
+
+Once auto load balancing is set, each non-isolated PMD measures the processing
+load for each of its associated queues every 10 seconds. If the aggregated PMD
+load reaches 95% for 6 consecutive intervals then PMD considers itself to be
+overloaded.
+
+If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
+performed by OVS main thread. The dry-run does NOT change the existing queue
+to PMD assignments.
+
+If the resultant mapping of dry-run indicates an improved distribution of the
+load then the actual reassignment will be performed.
+
+.. note::
+
+PMD Auto Load Balancing doesn't currently work if queues are assigned
+cross NUMA as actual processing load could get worse after assignment
+as compared to what dry run predicts.
+
+The minimum time between 2 consecutive PMD auto load balancing iterations can
+also be configured by::
+
+$ ovs-vsctl set open_vswitch .\
+other_config:pmd-auto-lb-rebal-interval=""
+
+where  is a value in minutes. The default interval is 1 minute
+and setting it to 0 will also result in default value i.e. 1 min.
+
+A user can use this option to avoid frequent trigger of Auto Load Balancing of
+PMDs. For e.g. set this (in min) such that it occurs once in few hours or a day
+or a week.
+
+.. note::
+In some scenarios it may not be desired to have Auto Load Balancing
+triggerred. For example, if traffic profile for specific RX queue is
+changing dramatically very frequently which in turn thrashes CPU cache
+due to changes required in d

Re: [ovs-dev] [PATCH v5] Adding support for PMD auto load balancing

2019-01-15 Thread Nitin Katiyar


> -Original Message-
> From: Ian Stokes [mailto:ian.sto...@intel.com]
> Sent: Tuesday, January 15, 2019 5:06 AM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org;
> Kevin Traynor ; Ilya Maximets
> 
> Subject: Re: [ovs-dev] [PATCH v5] Adding support for PMD auto load
> balancing
> 
> On 1/14/2019 10:44 AM, Nitin Katiyar wrote:
> > Port rx queues that have not been statically assigned to PMDs are
> > currently assigned based on periodically sampled load measurements.
> > The assignment is performed at specific instances – port addition,
> > port deletion, upon reassignment request via CLI etc.
> >
> > Due to change in traffic pattern over time it can cause uneven load
> > among the PMDs and thus resulting in lower overall throughout.
> >
> > This patch enables the support of auto load balancing of PMDs based on
> > measured load of RX queues. Each PMD measures the processing load for
> > each of its associated queues every 10 seconds. If the aggregated PMD
> > load reaches 95% for 6 consecutive intervals then PMD considers itself to
> be overloaded.
> >
> > If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
> > performed by OVS main thread. The dry-run does NOT change the existing
> > queue to PMD assignments.
> >
> > If the resultant mapping of dry-run indicates an improved distribution
> > of the load then the actual reassignment will be performed.
> >
> > The automatic rebalancing will be disabled by default and has to be
> > enabled via configuration option. The interval (in minutes) between
> > two consecutive rebalancing can also be configured via CLI, default is
> > 1 min.
> >
> > Following example commands can be used to set the auto-lb params:
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
> >
> Thanks for the patch Nitin. A few comments below.
> 
> On An aside, there was discussion if this could be part of OVS 2.11 from the
> community call last week. Although this is a v5 I believe it has been under
> review and testing from the folks at Red Hat however I don't see any acks to
> date.
> 
> What are peoples thoughts?
> 
> This change seems quite contained and doesn't interfere with default cases
> where rxq isolation, round robin or cycle based assignment is used.
> 
> In testing the previous balancing still work work fine and the new load
> balancing works well also although I have queries on default values which are
> specific to use cases discussed below.
> 
> Do people feel there is any reason to hold off merging if the issues below are
> addressed and there are no other concerns?
> 
> > Co-authored-by: Rohith Basavaraja 
> > Co-authored-by: Venkatesan Pradeep
> 
> > Signed-off-by: Rohith Basavaraja 
> > Signed-off-by: Venkatesan Pradeep 
> > Signed-off-by: Nitin Katiyar 
> > ---
> >   Documentation/topics/dpdk/pmd.rst |  41 +
> >   NEWS  |   1 +
> >   lib/dpif-netdev.c | 379
> ++
> >   vswitchd/vswitch.xml  |  41 +
> >   4 files changed, 462 insertions(+)
> >
> > diff --git a/Documentation/topics/dpdk/pmd.rst
> > b/Documentation/topics/dpdk/pmd.rst
> > index dd9172d..c273b40 100644
> > --- a/Documentation/topics/dpdk/pmd.rst
> > +++ b/Documentation/topics/dpdk/pmd.rst
> > @@ -183,3 +183,44 @@ or can be triggered by using::
> >  In addition, the output of ``pmd-rxq-show`` was modified to include
> >  Rx queue utilization of the PMD as a percentage. Prior to this, 
> > tracking of
> >  stats was not available.
> > +
> > +Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
> > +---
> > +
> > +Cycle or utilization based allocation of Rx queues to PMDs gives
> > +efficient load distribution but it is not adaptive to change in
> > +traffic pattern occuring
> Minor typo above, 'occuring' -> 'occurring'
I will update it.
> > +over the time. This causes uneven load among the PMDs which results
> > +in overall lower throughput.
> > +
> > +To address this automatic load balancing of PMDs can be set by::
> > +
> > +$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> > +
> > +If pmd-auto-lb is set to true AND cycle based assignment is enabled
> > +then auto load balancing of PMDs is enabled provided there are 2 or
> > +more non-isolated 

Re: [ovs-dev] [PATCH v5] Adding support for PMD auto load balancing

2019-01-15 Thread Nitin Katiyar


> -Original Message-
> From: Federico Iezzi [mailto:fie...@redhat.com]
> Sent: Monday, January 14, 2019 8:54 PM
> To: Nitin Katiyar 
> Cc: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v5] Adding support for PMD auto load
> balancing
> 
> Maybe it's a bit late for this series, but would be possible in a future
> enhancement to have a user parameter to set a different value for
> ALB_PMD_LOAD_THRESHOLD?
> 
Hi,
Yes, it can be done as an enhancement in the future.

Regards,
Nitin
> Regards,
> Federico
> 
> 
> On Mon, 14 Jan 2019 at 11:56, Nitin Katiyar 
> wrote:
> >
> > Port rx queues that have not been statically assigned to PMDs are
> > currently assigned based on periodically sampled load measurements.
> > The assignment is performed at specific instances – port addition,
> > port deletion, upon reassignment request via CLI etc.
> >
> > Due to change in traffic pattern over time it can cause uneven load
> > among the PMDs and thus resulting in lower overall throughout.
> >
> > This patch enables the support of auto load balancing of PMDs based on
> > measured load of RX queues. Each PMD measures the processing load for
> > each of its associated queues every 10 seconds. If the aggregated PMD
> > load reaches 95% for 6 consecutive intervals then PMD considers itself to
> be overloaded.
> >
> > If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
> > performed by OVS main thread. The dry-run does NOT change the existing
> > queue to PMD assignments.
> >
> > If the resultant mapping of dry-run indicates an improved distribution
> > of the load then the actual reassignment will be performed.
> >
> > The automatic rebalancing will be disabled by default and has to be
> > enabled via configuration option. The interval (in minutes) between
> > two consecutive rebalancing can also be configured via CLI, default is
> > 1 min.
> >
> > Following example commands can be used to set the auto-lb params:
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
> >
> > Co-authored-by: Rohith Basavaraja 
> > Co-authored-by: Venkatesan Pradeep
> 
> > Signed-off-by: Rohith Basavaraja 
> > Signed-off-by: Venkatesan Pradeep 
> > Signed-off-by: Nitin Katiyar 
> > ---
> >  Documentation/topics/dpdk/pmd.rst |  41 +
> >  NEWS  |   1 +
> >  lib/dpif-netdev.c | 379
> ++
> >  vswitchd/vswitch.xml  |  41 +
> >  4 files changed, 462 insertions(+)
> >
> > diff --git a/Documentation/topics/dpdk/pmd.rst
> > b/Documentation/topics/dpdk/pmd.rst
> > index dd9172d..c273b40 100644
> > --- a/Documentation/topics/dpdk/pmd.rst
> > +++ b/Documentation/topics/dpdk/pmd.rst
> > @@ -183,3 +183,44 @@ or can be triggered by using::
> > In addition, the output of ``pmd-rxq-show`` was modified to include
> > Rx queue utilization of the PMD as a percentage. Prior to this, 
> > tracking of
> > stats was not available.
> > +
> > +Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
> > +---
> > +
> > +Cycle or utilization based allocation of Rx queues to PMDs gives
> > +efficient load distribution but it is not adaptive to change in
> > +traffic pattern occuring over the time. This causes uneven load among
> > +the PMDs which results in overall lower throughput.
> > +
> > +To address this automatic load balancing of PMDs can be set by::
> > +
> > +$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> > +
> > +If pmd-auto-lb is set to true AND cycle based assignment is enabled
> > +then auto load balancing of PMDs is enabled provided there are 2 or
> > +more non-isolated PMDs and at least one of these PMDs is polling more
> than one RX queue.
> > +
> > +Once auto load balancing is set, each non-isolated PMD measures the
> > +processing load for each of its associated queues every 10 seconds.
> > +If the aggregated PMD load reaches 95% for 6 consecutive intervals
> > +then PMD considers itself to be overloaded.
> > +
> > +If any PMD is overloaded, a dry-run of the PMD assignment algorithm
> > +is perf

[ovs-dev] [PATCH v5] Adding support for PMD auto load balancing

2019-01-14 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed only at specific instances: port addition,
port deletion, upon a reassignment request via the CLI, etc.

Because the traffic pattern can change over time, this can cause uneven
load among the PMDs, resulting in lower overall throughput.

This patch adds support for automatic load balancing of PMDs based on the
measured load of their RX queues. Each PMD measures the processing load
for each of its associated queues every 10 seconds. If the aggregated PMD
load reaches 95% for 6 consecutive intervals, the PMD considers itself
overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by the OVS main thread. The dry-run does NOT change the existing
queue-to-PMD assignments.

If the resulting mapping from the dry-run indicates an improved load
distribution, the actual reassignment is performed.

Automatic rebalancing is disabled by default and has to be enabled via a
configuration option. The interval (in minutes) between two consecutive
rebalancings can also be configured via the CLI; the default is 1 minute.

The following example commands can be used to set the auto-lb parameters:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
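
Below is a hypothetical standalone C sketch of how a dry-run result might
be accepted only when it improves the distribution sufficiently.  Only the
25% figure mirrors the ALB_ACCEPTABLE_IMPROVEMENT default visible in the
diff; the variance-based comparison used here is an illustrative choice
and may differ from the metric the patch actually uses.

#include <stdbool.h>
#include <stdio.h>

#define ACCEPTABLE_IMPROVEMENT_PCT 25

/* Population variance of per-PMD load percentages. */
static double load_variance(const unsigned *loads, size_t n)
{
    double mean = 0.0, var = 0.0;

    for (size_t i = 0; i < n; i++) {
        mean += loads[i];
    }
    mean /= n;
    for (size_t i = 0; i < n; i++) {
        double d = loads[i] - mean;
        var += d * d;
    }
    return var / n;
}

/* Accept the dry-run assignment only if it reduces the load variance by at
 * least ACCEPTABLE_IMPROVEMENT_PCT percent. */
static bool dry_run_is_better(const unsigned *cur, const unsigned *dry, size_t n)
{
    double cur_var = load_variance(cur, n);
    double dry_var = load_variance(dry, n);

    if (cur_var == 0.0) {
        return false;               /* Already perfectly balanced. */
    }
    return (cur_var - dry_var) * 100.0 >= ACCEPTABLE_IMPROVEMENT_PCT * cur_var;
}

int main(void)
{
    unsigned cur[] = { 96, 40 };    /* Current per-PMD loads. */
    unsigned dry[] = { 70, 66 };    /* Loads predicted by the dry-run. */

    printf("reassign: %s\n", dry_run_is_better(cur, dry, 2) ? "yes" : "no");
    return 0;
}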

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
---
 Documentation/topics/dpdk/pmd.rst |  41 +
 NEWS  |   1 +
 lib/dpif-netdev.c | 379 ++
 vswitchd/vswitch.xml  |  41 +
 4 files changed, 462 insertions(+)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index dd9172d..c273b40 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -183,3 +183,44 @@ or can be triggered by using::
In addition, the output of ``pmd-rxq-show`` was modified to include
Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
stats was not available.
+
+Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
+---
+
+Cycle or utilization based allocation of Rx queues to PMDs gives efficient
+load distribution but it is not adaptive to change in traffic pattern occuring
+over the time. This causes uneven load among the PMDs which results in overall
+lower throughput.
+
+To address this automatic load balancing of PMDs can be set by::
+
+$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
+
+If pmd-auto-lb is set to true AND cycle based assignment is enabled then auto
+load balancing of PMDs is enabled provided there are 2 or more non-isolated
+PMDs and at least one of these PMDs is polling more than one RX queue.
+
+Once auto load balancing is set, each non-isolated PMD measures the processing
+load for each of its associated queues every 10 seconds. If the aggregated PMD
+load reaches 95% for 6 consecutive intervals then PMD considers itself to be
+overloaded.
+
+If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
+performed by OVS main thread. The dry-run does NOT change the existing queue
+to PMD assignments.
+
+If the resultant mapping of dry-run indicates an improved distribution of the
+load then the actual reassignment will be performed.
+
+The minimum time between 2 consecutive PMD auto load balancing iterations can
+also be configured by::
+
+$ ovs-vsctl set open_vswitch .\
+other_config:pmd-auto-lb-rebal-interval=""
+
+where  is a value in minutes. The default interval is 1 minute
+and setting it to 0 will also result in default value i.e. 1 min.
+
+A user can use this option to avoid frequent trigger of auto load balancing of
+PMDs. For e.g. set this (in min) such that it occurs once in few hours or a day
+or a week.
diff --git a/NEWS b/NEWS
index 2de844f..0e9fcb1 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,7 @@ Post-v2.10.0
  * Add option for simple round-robin based Rxq to PMD assignment.
It can be set with pmd-rxq-assign.
  * Add support for DPDK 18.11
+ * Add support for Auto load balancing of PMDs (experimental)
- Add 'symmetric_l3' hash function.
- OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
- ovs-vswitchd:
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1564db9..c1757ab 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,12 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ALB_ACCEPTABLE_IMPROVEMENT   25
+#define ALB_PMD_LOAD_THRESHOLD   95
+#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#define MIN_T

[ovs-dev] [PATCH v4] Adding support for PMD auto load balancing

2019-01-10 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed only at specific instances: port addition,
port deletion, upon a reassignment request via the CLI, etc.

Because the traffic pattern can change over time, this can cause uneven
load among the PMDs, resulting in lower overall throughput.

This patch adds support for automatic load balancing of PMDs based on the
measured load of their RX queues. Each PMD measures the processing load
for each of its associated queues every 10 seconds. If the aggregated PMD
load reaches 95% for 6 consecutive intervals, the PMD considers itself
overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by the OVS main thread. The dry-run does NOT change the existing
queue-to-PMD assignments.

If the resulting mapping from the dry-run indicates an improved load
distribution, the actual reassignment is performed.

Automatic rebalancing is disabled by default and has to be enabled via a
configuration option. The interval (in minutes) between two consecutive
rebalancings can also be configured via the CLI; the default is 1 minute.

The following example commands can be used to set the auto-lb parameters:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebal-interval="5"

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
---
 Documentation/topics/dpdk/pmd.rst |  41 +
 NEWS  |   1 +
 lib/dpif-netdev.c | 378 ++
 vswitchd/vswitch.xml  |  41 +
 4 files changed, 461 insertions(+)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index dd9172d..c273b40 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -183,3 +183,44 @@ or can be triggered by using::
In addition, the output of ``pmd-rxq-show`` was modified to include
Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
stats was not available.
+
+Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
+---
+
+Cycle or utilization based allocation of Rx queues to PMDs gives efficient
+load distribution but it is not adaptive to change in traffic pattern occuring
+over the time. This causes uneven load among the PMDs which results in overall
+lower throughput.
+
+To address this automatic load balancing of PMDs can be set by::
+
+$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
+
+If pmd-auto-lb is set to true AND cycle based assignment is enabled then auto
+load balancing of PMDs is enabled provided there are 2 or more non-isolated
+PMDs and at least one of these PMDs is polling more than one RX queue.
+
+Once auto load balancing is set, each non-isolated PMD measures the processing
+load for each of its associated queues every 10 seconds. If the aggregated PMD
+load reaches 95% for 6 consecutive intervals then PMD considers itself to be
+overloaded.
+
+If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
+performed by OVS main thread. The dry-run does NOT change the existing queue
+to PMD assignments.
+
+If the resultant mapping of dry-run indicates an improved distribution of the
+load then the actual reassignment will be performed.
+
+The minimum time between 2 consecutive PMD auto load balancing iterations can
+also be configured by::
+
+$ ovs-vsctl set open_vswitch .\
+other_config:pmd-auto-lb-rebal-interval=""
+
+where  is a value in minutes. The default interval is 1 minute
+and setting it to 0 will also result in default value i.e. 1 min.
+
+A user can use this option to avoid frequent trigger of auto load balancing of
+PMDs. For e.g. set this (in min) such that it occurs once in few hours or a day
+or a week.
diff --git a/NEWS b/NEWS
index 2de844f..0e9fcb1 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,7 @@ Post-v2.10.0
  * Add option for simple round-robin based Rxq to PMD assignment.
It can be set with pmd-rxq-assign.
  * Add support for DPDK 18.11
+ * Add support for Auto load balancing of PMDs (experimental)
- Add 'symmetric_l3' hash function.
- OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
- ovs-vswitchd:
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1564db9..81e9cbf 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,12 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ALB_ACCEPTABLE_IMPROVEMENT   25
+#define ALB_PMD_LOAD_THRESHOLD   95
+#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#define MIN_T

[ovs-dev] [PATCH v3] Adding support for PMD auto load balancing

2019-01-07 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed only at specific instances: port addition,
port deletion, upon a reassignment request via the CLI, etc.

Because the traffic pattern can change over time, this can cause uneven
load among the PMDs, resulting in lower overall throughput.

This patch adds support for automatic load balancing of PMDs based on the
measured load of their RX queues. Each PMD measures the processing load
for each of its associated queues every 10 seconds. If the aggregated PMD
load reaches 95% for 6 consecutive intervals, the PMD considers itself
overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by the OVS main thread. The dry-run does NOT change the existing
queue-to-PMD assignments.

If the resulting mapping from the dry-run indicates an improved load
distribution, the actual reassignment is performed.

Automatic rebalancing is disabled by default and has to be enabled via a
configuration option. The interval (in minutes) between two consecutive
rebalancings can also be configured via the CLI; the default is 1 minute.

The following example commands can be used to set the auto-lb parameters:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
---
 Documentation/topics/dpdk/pmd.rst |  42 +
 NEWS  |   1 +
 lib/dpif-netdev.c | 388 +-
 vswitchd/vswitch.xml  |  40 
 4 files changed, 463 insertions(+), 8 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index dd9172d..8670a77 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -183,3 +183,45 @@ or can be triggered by using::
In addition, the output of ``pmd-rxq-show`` was modified to include
Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
stats was not available.
+
+Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
+---
+
+Cycle or utilization based allocation of Rx queues to PMDs gives efficient
+load distribution but it is not adaptive to change in traffic pattern occuring
+over the time. This causes uneven load among the PMDs which results in overall
+lower throughput.
+
+To address this automatic load balancing of PMDs can be set by::
+
+$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
+
+If pmd-auto-lb is set to true AND cycle based assignment is enabled then auto
+load balancing of PMDs is enabled provided there are 2 or more non-isolated
+PMDs and at least one of these PMDs is polling more than one RX queue.
+
+Once auto load balancing is set, each non-isolated PMD measures the processing
+load for each of its associated queues every 10 seconds. If the aggregated PMD
+load reaches 95% for 6 consecutive intervals then PMD considers itself to be
+overloaded.
+
+If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
+performed by OVS main thread. The dry-run does NOT change the existing queue
+to PMD assignments.
+
+If the resultant mapping of dry-run indicates an improved distribution of the
+load then the actual reassignment will be performed.
+
+The minimum time between 2 consecutive PMD auto load balancing iterations can
+also be configured by::
+$ ovs-vsctl set open_vswitch .\
+other_config:pmd-auto-lb-rebalance-intvl=""
+
+where:
+
+-  is a value in minutes. The default interval is 1 minute and
+setting it to 0 will also result in default value i.e. 1 min.
+
+A user can use this option to avoid frequent trigger of auto load balancing of
+PMDs. For e.g. set this (in min) such that it occurs once in few hours or a day
+or a week.
diff --git a/NEWS b/NEWS
index 2de844f..0e9fcb1 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,7 @@ Post-v2.10.0
  * Add option for simple round-robin based Rxq to PMD assignment.
It can be set with pmd-rxq-assign.
  * Add support for DPDK 18.11
+ * Add support for Auto load balancing of PMDs (experimental)
- Add 'symmetric_l3' hash function.
- OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
- ovs-vswitchd:
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1564db9..c9e5105 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,12 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ALB_ACCEPTABLE_IMPROVEMENT   25
+#define ALB_PMD_LOAD_THRESHOLD   95
+#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#d

Re: [ovs-dev] [PATCH] Adding support for PMD auto load balancing

2019-01-06 Thread Nitin Katiyar


From: Gowrishankar Muthukrishnan [mailto:gmuth...@redhat.com]
Sent: Saturday, January 05, 2019 3:24 PM
To: Nitin Katiyar 
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] Adding support for PMD auto load balancing

Hi,
Thanks for reviewing it. Some of these are addressed in v2 and the rest I
will try to address in the next version.


 VLOG_DEFINE_THIS_MODULE(dpif_netdev);

Minor suggestions on naming variable/macros follow as below, as coding itself 
is documentation IMO :).

+/* Auto Load Balancing Defaults */
+#define ACCEPT_IMPROVE_DEFAULT   (25)

Instead, how about LB_ACCEPTABLE_IMPROVEMENT ?
prefixing global variables/macros with what you would use for, would always 
help reading code.
Sure, I will change it to ALB_*

+#define PMD_LOAD_THRE_DEFAULT(95)

LB_PMD_LOAD_THRESOLD ?

+#define PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */

LB_PMD_ POLL_REBALANCE_INTERVAL ?

+#define MIN_TO_MSEC  6
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */
 #define MAX_RECIRC_DEPTH 6
@@ -288,6 +294,13 @@ struct dp_meter {
 struct dp_meter_band bands[];
 };

+struct pmd_auto_lb {
+bool auto_lb_conf;//enable-disable auto load balancing
+bool is_enabled;  //auto_lb current status
+uint64_t rebalance_intvl;
+uint64_t rebalance_poll_timer;
+};
+
 /* Datapath based on the network device interface from netdev.h.
  *
  *
@@ -368,6 +381,7 @@ struct dp_netdev {
 uint64_t last_tnl_conf_seq;

 struct conntrack conntrack;
+struct pmd_auto_lb pmd_alb;
 };

 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -439,6 +453,10 @@ struct dp_netdev_rxq {
   particular core. */
 unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+struct dp_netdev_pmd_thread *dry_run_pmd;
+   /* During auto lb trigger, pmd thread
+  associated with this q during dry
+  run. */

/* pmd thread that execute(or dry-run) this queue in auto load balance period */
 This is removed in v2
 bool is_vhost; /* Is rxq of a vhost port. */

 /* Counters of cycles spent successfully polling and processing pkts. */
@@ -682,6 +700,12 @@ struct dp_netdev_pmd_thread {
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
 struct hmap poll_list OVS_GUARDED;
+
+/* List of rx queues got associated during
+   pmd load balance dry run. These queues are

"during dry run of pmd auto load balance. These queues ..."
Removed in v2

+   not polled by pmd. */
+struct hmap dry_poll_list OVS_GUARDED;
+
 /* Map of 'tx_port's used for transmission.  Written by the main thread,
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
@@ -702,6 +726,11 @@ struct dp_netdev_pmd_thread {
 /* Keep track of detailed PMD performance statistics. */
 struct pmd_perf_stats perf_stats;

+/* Some stats from previous iteration used by automatic pmd
+   load balance logic. */

/* stats from previous iteration during auto rebalance of pmds*/
Yes, already taken care of it.

+uint64_t prev_stats[PMD_N_STATS];
+atomic_count pmd_overloaded;
+
 /* Set to true if the pmd thread needs to be reloaded. */
 bool need_reload;
 };
@@ -764,7 +793,8 @@ static void dp_netdev_del_port_tx_from_pmd(struct 
dp_netdev_pmd_thread *pmd,
struct tx_port *tx)
 OVS_REQUIRES(pmd->port_mutex);
 static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
- struct dp_netdev_rxq *rxq)
+ struct dp_netdev_rxq *rxq,
+ bool dry_run)
 OVS_REQUIRES(pmd->port_mutex);
 static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd,
struct rxq_poll *poll)
@@ -792,9 +822,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
  enum rxq_cycles_counter_type type);
 static void
 dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
-   unsigned long long cycles);
+unsigned long long cycles,
+unsigned idx);
 static uint64_t
-dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx);
+dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx,
+unsigned idx);
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
bool purge);
@@ -3734,6 +3766,49 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op 
**o

Re: [ovs-dev] [PATCH v2] Adding support for PMD auto load balancing

2019-01-06 Thread Nitin Katiyar


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Friday, January 04, 2019 10:26 PM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org
> Cc: Rohith Basavaraja ; Venkatesan Pradeep
> 
> Subject: Re: [PATCH v2] Adding support for PMD auto load balancing
> 
> On 01/04/2019 02:56 PM, Kevin Traynor wrote:
> > On 01/04/2019 12:39 PM, Nitin Katiyar wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: Kevin Traynor [mailto:ktray...@redhat.com]
> >>> Sent: Friday, January 04, 2019 1:48 AM
> >>> To: Nitin Katiyar ;
> >>> ovs-dev@openvswitch.org
> >>> Cc: Rohith Basavaraja ; Venkatesan
> >>> Pradeep 
> >>> Subject: Re: [PATCH v2] Adding support for PMD auto load balancing
> >>>
> >>> On 01/03/2019 12:36 PM, Nitin Katiyar wrote:
> >>>> Port rx queues that have not been statically assigned to PMDs are
> >>>> currently assigned based on periodically sampled load measurements.
> >>>> The assignment is performed at specific instances – port addition,
> >>>> port deletion, upon reassignment request via CLI etc.
> >>>>
> >>>> Due to change in traffic pattern over time it can cause uneven load
> >>>> among the PMDs and thus resulting in lower overall throughout.
> >>>>
> >>>> This patch enables the support of auto load balancing of PMDs based
> >>>> on measured load of RX queues. Each PMD measures the processing
> >>>> load for each of its associated queues every 10 seconds. If the
> >>>> aggregated PMD load reaches 95% for 6 consecutive intervals then
> >>>> PMD considers itself to
> >>> be overloaded.
> >>>>
> >>>> If any PMD is overloaded, a dry-run of the PMD assignment algorithm
> >>>> is performed by OVS main thread. The dry-run does NOT change the
> >>>> existing queue to PMD assignments.
> >>>>
> >>>> If the resultant mapping of dry-run indicates an improved
> >>>> distribution of the load then the actual reassignment will be performed.
> >>>>
> >>>> The automatic rebalancing will be disabled by default and has to be
> >>>> enabled via configuration option. The interval (in minutes) between
> >>>> two consecutive rebalancing can also be configured via CLI, default
> >>>> is
> >>>> 1 min.
> >>>>
> >>>> Following example commands can be used to set the auto-lb params:
> >>>> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> >>>> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-
> intvl="5"
> >>>>
> >>>
> >>> Hi Nitin, thanks for v2. Not full review yet but sending some comments
> below.
> >>>
> 
> Additional minor comment below, thanks.
Thanks again Kevin.
> 
> >>> Maybe you can put some of the above into a new section below this
> >>> http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue
> >>> -
> >>> assigment-to-pmd-threads
> >> Sure, I will update that too.
> >>>
> >>> I also think this feature should be experimental until it has been
> >>> road tested a bit more.
> >>>
> >>>> Co-authored-by: Rohith Basavaraja 
> >>>> Co-authored-by: Venkatesan Pradeep
> >>> 
> >>>> Signed-off-by: Rohith Basavaraja 
> >>>> Signed-off-by: Venkatesan Pradeep
> 
> >>>> Signed-off-by: Nitin Katiyar 
> >>>> ---
> >>>>  lib/dpif-netdev.c| 403
> >>> +--
> >>>>  vswitchd/vswitch.xml |  30 
> >>>>  2 files changed, 424 insertions(+), 9 deletions(-)
> >>>>
> >>>
> >>> There seems to be windows style line endings in the patch? or
> >>> something else has gone wrong for me.
> >>>
> >>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> >>>> 1564db9..8db106f 100644
> >>>> --- a/lib/dpif-netdev.c
> >>>> +++ b/lib/dpif-netdev.c
> >>>> @@ -80,6 +80,12 @@
> >>>>
> >>>>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> >>>>
> >>>> +/* Auto Load Balancing Defaults */
> >>>> +#define ACCEPT_IMPROVE_DEFAULT   (25)
> >>

Re: [ovs-dev] [PATCH v2] Adding support for PMD auto load balancing

2019-01-04 Thread Nitin Katiyar


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Friday, January 04, 2019 1:48 AM
> To: Nitin Katiyar ; ovs-dev@openvswitch.org
> Cc: Rohith Basavaraja ; Venkatesan Pradeep
> 
> Subject: Re: [PATCH v2] Adding support for PMD auto load balancing
> 
> On 01/03/2019 12:36 PM, Nitin Katiyar wrote:
> > Port rx queues that have not been statically assigned to PMDs are
> > currently assigned based on periodically sampled load measurements.
> > The assignment is performed at specific instances – port addition,
> > port deletion, upon reassignment request via CLI etc.
> >
> > Due to change in traffic pattern over time it can cause uneven load
> > among the PMDs and thus resulting in lower overall throughout.
> >
> > This patch enables the support of auto load balancing of PMDs based on
> > measured load of RX queues. Each PMD measures the processing load for
> > each of its associated queues every 10 seconds. If the aggregated PMD
> > load reaches 95% for 6 consecutive intervals then PMD considers itself to
> be overloaded.
> >
> > If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
> > performed by OVS main thread. The dry-run does NOT change the existing
> > queue to PMD assignments.
> >
> > If the resultant mapping of dry-run indicates an improved distribution
> > of the load then the actual reassignment will be performed.
> >
> > The automatic rebalancing will be disabled by default and has to be
> > enabled via configuration option. The interval (in minutes) between
> > two consecutive rebalancing can also be configured via CLI, default is
> > 1 min.
> >
> > Following example commands can be used to set the auto-lb params:
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
> >
> 
> Hi Nitin, thanks for v2. Not full review yet but sending some comments below.
> 
> Maybe you can put some of the above into a new section below this
> http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue-
> assigment-to-pmd-threads
Sure, I will update that too.
> 
> I also think this feature should be experimental until it has been road 
> tested a
> bit more.
> 
> > Co-authored-by: Rohith Basavaraja 
> > Co-authored-by: Venkatesan Pradeep
> 
> > Signed-off-by: Rohith Basavaraja 
> > Signed-off-by: Venkatesan Pradeep 
> > Signed-off-by: Nitin Katiyar 
> > ---
> >  lib/dpif-netdev.c| 403
> +--
> >  vswitchd/vswitch.xml |  30 
> >  2 files changed, 424 insertions(+), 9 deletions(-)
> >
> 
> There seems to be windows style line endings in the patch? or something else
> has gone wrong for me.
> 
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> > 1564db9..8db106f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -80,6 +80,12 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> >
> > +/* Auto Load Balancing Defaults */
> > +#define ACCEPT_IMPROVE_DEFAULT   (25)
> > +#define PMD_LOAD_THRE_DEFAULT(95)
> 
> Probably you should remove the brackets above to be consistent with the
> others below and in the rest of the file.
> 
> > +#define PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> > +#define MIN_TO_MSEC  6
> > +
> >  #define FLOW_DUMP_MAX_BATCH 50
> >  /* Use per thread recirc_depth to prevent recirculation loop. */
> > #define MAX_RECIRC_DEPTH 6 @@ -288,6 +294,13 @@ struct dp_meter {
> >  struct dp_meter_band bands[];
> >  };
> >
> > +struct pmd_auto_lb {
> > +bool auto_lb_conf;/* enable-disable auto load balancing */
> 
> I'm not sure what '_conf' is short for? maybe it should be something like
> 'auto_lb_requested'
Sure
> 
> > +bool is_enabled;  /* auto_lb current status */
> 
> Comments should be of style /* Sentence case. */
> http://docs.openvswitch.org/en/latest/internals/contributing/coding-
> style/#comments
> 
Thanks for providing the link. I will update it in the next version.
> 
> > +uint64_t rebalance_intvl;
> > +uint64_t rebalance_poll_timer;
> > +};
> > +
> >  /* Datapath based on the network device interface from netdev.h.
> >   *
> >   *
> > @@ -368,6 +381,7 @@ struct dp_netdev {
> >  uint64_t last_tnl_conf_seq;
> >
> >  struct conntrack conntrack;
> > +struct pmd_auto_lb pmd_alb;
> >  };
> >
> >  static 

[ovs-dev] [PATCH v2] Adding support for PMD auto load balancing

2019-01-03 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.

Changes in traffic pattern over time can cause uneven load among
the PMDs, resulting in lower overall throughput.

This patch adds support for automatic load balancing of PMDs based on the
measured load of RX queues. Each PMD measures the processing load for each
of its associated queues every 10 seconds. If the aggregated PMD load reaches
95% for 6 consecutive intervals, the PMD considers itself to be overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by the OVS main thread. The dry-run does NOT change the existing
queue-to-PMD assignments.

If the resultant mapping of the dry-run indicates an improved distribution
of the load, then the actual reassignment will be performed.

Automatic rebalancing is disabled by default and has to be enabled via a
configuration option. The interval (in minutes) between two consecutive
rebalances can also be configured via CLI; the default is 1 minute.

The following example commands can be used to set the auto-lb parameters:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
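
To make the detection step above concrete, here is a minimal sketch of how a
PMD could turn busy/idle cycle counters into a load percentage and count
consecutive overloaded intervals. It only illustrates the 95%/6-interval
criterion; alb_pmd_state, alb_pmd_check_overload() and both macros are
hypothetical names, not code from this patch.

#include <stdbool.h>
#include <stdint.h>

#define ALB_LOAD_THRESHOLD     95  /* Percent. */
#define ALB_OVERLOAD_INTERVALS 6   /* Consecutive 10-second intervals. */

struct alb_pmd_state {
    uint64_t prev_busy_cycles;      /* Busy cycles at the previous sample. */
    uint64_t prev_idle_cycles;      /* Idle cycles at the previous sample. */
    unsigned overloaded_intervals;  /* Consecutive intervals above threshold. */
};

/* Returns true once the PMD has spent at least ALB_LOAD_THRESHOLD percent of
 * its cycles busy for ALB_OVERLOAD_INTERVALS consecutive sampling intervals. */
static bool
alb_pmd_check_overload(struct alb_pmd_state *s,
                       uint64_t busy_cycles, uint64_t idle_cycles)
{
    uint64_t busy = busy_cycles - s->prev_busy_cycles;
    uint64_t idle = idle_cycles - s->prev_idle_cycles;
    uint64_t total = busy + idle;

    s->prev_busy_cycles = busy_cycles;
    s->prev_idle_cycles = idle_cycles;

    if (total && busy * 100 / total >= ALB_LOAD_THRESHOLD) {
        if (s->overloaded_intervals < ALB_OVERLOAD_INTERVALS) {
            s->overloaded_intervals++;
        }
    } else {
        s->overloaded_intervals = 0;   /* Streak broken; start over. */
    }
    return s->overloaded_intervals >= ALB_OVERLOAD_INTERVALS;
}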

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
---
 lib/dpif-netdev.c| 403 +--
 vswitchd/vswitch.xml |  30 
 2 files changed, 424 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1564db9..8db106f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,12 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ACCEPT_IMPROVE_DEFAULT   (25)
+#define PMD_LOAD_THRE_DEFAULT(95)
+#define PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#define MIN_TO_MSEC  6
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */
 #define MAX_RECIRC_DEPTH 6
@@ -288,6 +294,13 @@ struct dp_meter {
 struct dp_meter_band bands[];
 };
 
+struct pmd_auto_lb {
+bool auto_lb_conf;/* enable-disable auto load balancing */
+bool is_enabled;  /* auto_lb current status */
+uint64_t rebalance_intvl;
+uint64_t rebalance_poll_timer;
+};
+
 /* Datapath based on the network device interface from netdev.h.
  *
  *
@@ -368,6 +381,7 @@ struct dp_netdev {
 uint64_t last_tnl_conf_seq;
 
 struct conntrack conntrack;
+struct pmd_auto_lb pmd_alb;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread {
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
 struct hmap poll_list OVS_GUARDED;
+
 /* Map of 'tx_port's used for transmission.  Written by the main thread,
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
@@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread {
 /* Keep track of detailed PMD performance statistics. */
 struct pmd_perf_stats perf_stats;
 
+/* Some stats from previous iteration used by automatic pmd
+   load balance logic. */
+uint64_t prev_stats[PMD_N_STATS];
+atomic_count pmd_overloaded;
+
 /* Set to true if the pmd thread needs to be reloaded. */
 bool need_reload;
 };
@@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
  enum rxq_cycles_counter_type type);
 static void
 dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
-   unsigned long long cycles);
+unsigned long long cycles,
+unsigned idx);
 static uint64_t
-dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx);
+dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx,
+unsigned idx);
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
bool purge);
@@ -3734,6 +3756,51 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op 
**ops, size_t n_ops,
 }
 }
 
+/* Enable/Disable PMD auto load balancing */
+static void
+set_pmd_auto_lb(struct dp_netdev *dp)
+{
+unsigned int cnt = 0;
+struct dp_netdev_pmd_thread *pmd;
+struct pmd_auto_lb * pmd_alb = &dp->pmd_alb;
+
+bool enable = false;
+bool pmd_rxq_assign_cyc = dp->pmd_rxq_assign_cyc;
+
+/* Ensure that there is at least 2 non-isolated PMDs and
+ * one of the PMD is polling more than one rxq
+ */
+CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
+if (pmd->c

[ovs-dev] [PATCH] Adding support for PMD auto load balancing

2018-12-21 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.

Changes in traffic pattern over time can cause uneven load among
the PMDs, resulting in lower overall throughput.

This patch adds support for automatic load balancing of PMDs based on the
measured load of RX queues. Each PMD measures the processing load for each
of its associated queues every 10 seconds. If the aggregated PMD load reaches
95% for 6 consecutive intervals, the PMD considers itself to be overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by the OVS main thread. The dry-run does NOT change the existing
queue-to-PMD assignments.

If the resultant mapping of the dry-run indicates an improved distribution
of the load, then the actual reassignment will be performed.

Automatic rebalancing is disabled by default and has to be enabled via a
configuration option. The interval (in minutes) between two consecutive
rebalances can also be configured via CLI; the default is 1 minute.

The following example commands can be used to set the auto-lb parameters:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
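
The "improved distribution" check above is easiest to picture as comparing the
spread of per-PMD load before and after the dry run. The sketch below only
illustrates that idea against the 25% default that appears further down in the
diff (ACCEPT_IMPROVE_DEFAULT); load_variance() and dry_run_is_improvement()
are hypothetical names, not the functions used by this patch.

#include <stdbool.h>
#include <stdint.h>

#define ALB_MIN_IMPROVEMENT_PCT 25  /* Illustrative; mirrors ACCEPT_IMPROVE_DEFAULT. */

/* Population variance of per-PMD load percentages. */
static uint64_t
load_variance(const uint64_t *load, int n)
{
    uint64_t sum = 0, sq_diff = 0, mean;
    int i;

    for (i = 0; i < n; i++) {
        sum += load[i];
    }
    mean = n ? sum / n : 0;
    for (i = 0; i < n; i++) {
        uint64_t d = load[i] > mean ? load[i] - mean : mean - load[i];
        sq_diff += d * d;
    }
    return n ? sq_diff / n : 0;
}

/* Accept the dry-run mapping only if it reduces the load variance across
 * PMDs by at least ALB_MIN_IMPROVEMENT_PCT percent. */
static bool
dry_run_is_improvement(const uint64_t *cur_load, const uint64_t *est_load,
                       int n_pmds)
{
    uint64_t cur_var = load_variance(cur_load, n_pmds);
    uint64_t est_var = load_variance(est_load, n_pmds);

    return cur_var && est_var < cur_var
           && (cur_var - est_var) * 100 / cur_var >= ALB_MIN_IMPROVEMENT_PCT;
}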

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
---
 lib/dpif-netdev.c | 464 +-
 1 file changed, 425 insertions(+), 39 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1564db9..b25ff77 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,12 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ACCEPT_IMPROVE_DEFAULT   (25)
+#define PMD_LOAD_THRE_DEFAULT(95)
+#define PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#define MIN_TO_MSEC  6
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */
 #define MAX_RECIRC_DEPTH 6
@@ -288,6 +294,13 @@ struct dp_meter {
 struct dp_meter_band bands[];
 };
 
+struct pmd_auto_lb {
+bool auto_lb_conf;//enable-disable auto load balancing
+bool is_enabled;  //auto_lb current status
+uint64_t rebalance_intvl;
+uint64_t rebalance_poll_timer;
+};
+
 /* Datapath based on the network device interface from netdev.h.
  *
  *
@@ -368,6 +381,7 @@ struct dp_netdev {
 uint64_t last_tnl_conf_seq;
 
 struct conntrack conntrack;
+struct pmd_auto_lb pmd_alb;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -439,6 +453,10 @@ struct dp_netdev_rxq {
   particular core. */
 unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+struct dp_netdev_pmd_thread *dry_run_pmd;
+   /* During auto lb trigger, pmd thread
+  associated with this q during dry
+  run. */
 bool is_vhost; /* Is rxq of a vhost port. */
 
 /* Counters of cycles spent successfully polling and processing pkts. */
@@ -682,6 +700,12 @@ struct dp_netdev_pmd_thread {
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
 struct hmap poll_list OVS_GUARDED;
+
+/* List of rx queues got associated during
+   pmd load balance dry run. These queues are
+   not polled by pmd. */
+struct hmap dry_poll_list OVS_GUARDED;
+
 /* Map of 'tx_port's used for transmission.  Written by the main thread,
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
@@ -702,6 +726,11 @@ struct dp_netdev_pmd_thread {
 /* Keep track of detailed PMD performance statistics. */
 struct pmd_perf_stats perf_stats;
 
+/* Some stats from previous iteration used by automatic pmd
+   load balance logic. */
+uint64_t prev_stats[PMD_N_STATS];
+atomic_count pmd_overloaded;
+
 /* Set to true if the pmd thread needs to be reloaded. */
 bool need_reload;
 };
@@ -764,7 +793,8 @@ static void dp_netdev_del_port_tx_from_pmd(struct 
dp_netdev_pmd_thread *pmd,
struct tx_port *tx)
 OVS_REQUIRES(pmd->port_mutex);
 static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
- struct dp_netdev_rxq *rxq)
+ struct dp_netdev_rxq *rxq,
+ bool dry_run)
 OVS_REQUIRES(pmd->port_mutex);
 static void dp_netd

Re: [ovs-dev] [PATCH] RFC for support of PMD Auto load balancing

2018-10-28 Thread Nitin Katiyar
Thanks Kevin for reviewing it. I will look into your comments and send the new
version for review.

I would like to clarify that the load is sampled every 10 seconds, and a dry run
is triggered only if the criterion for triggering it (i.e. load threshold and/or
drops) is met for 6 consecutive iterations. This is why rxq->overloading_pmd is
required.
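
As a rough illustration of that ordering (assumed, not taken from the patch),
the main-thread side can be pictured as: skip until the rebalance interval
elapses, then attempt the dry run only if some PMD has marked itself
overloaded. The function and parameter names below are hypothetical.

#include <stdbool.h>

/* Hypothetical main-thread poll: PMDs mark themselves overloaded from the
 * datapath side; the main thread only attempts a dry run once the rebalance
 * interval has elapsed and at least one PMD reports overload. */
static void
alb_maybe_rebalance(long long int now_ms, long long int *next_check_ms,
                    long long int interval_ms, bool any_pmd_overloaded)
{
    if (now_ms < *next_check_ms) {
        return;                     /* Rebalance interval not yet reached. */
    }
    *next_check_ms = now_ms + interval_ms;

    if (!any_pmd_overloaded) {
        return;                     /* Load criterion not met; nothing to do. */
    }

    /* ... perform the dry-run assignment here and reassign only if it
     * improves the distribution ... */
}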

Regards,
Nitin

-Original Message-
From: Kevin Traynor [mailto:ktray...@redhat.com] 
Sent: Friday, October 26, 2018 3:53 PM
To: Nitin Katiyar ; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] RFC for support of PMD Auto load balancing

Hi Nitin,

Thanks for your work on this and sharing the RFC. Initial comments below.

On 10/11/2018 08:59 PM, Nitin Katiyar wrote:
> Port rx queues that have not been statically assigned to PMDs are 
> currently assigned based on periodically sampled load measurements.
> The assignment is performed at specific instances – port addition, 
> port deletion, upon reassignment request via CLI etc.
> 
> Over time, changes in traffic pattern can cause uneven load among the
> PMDs, resulting in lower overall throughput.
> 
> This patch enables the support of auto load balancing of PMDs based on 
> measured load of RX queues. Each PMD measures the processing load for 
> each of its associated queues every 10 seconds. If the aggregated PMD 
> load exceeds a configured threshold for 6 consecutive intervals and if 
> there are receive packet drops at the NIC the PMD considers itself to be 
> overloaded.
> 
> If any PMD considers itself to be overloaded, a dry-run of the PMD 
> assignment algorithm is performed by OVS main thread. The dry-run does 
> NOT change the existing queue to PMD assignments.
> 
> If the resultant mapping of dry-run indicates an improved distribution 
> of the load then the actual reassignment will be performed. The 
> automatic rebalancing will be disabled by default and has to be 
> enabled via configuration option. Load thresholds, improvement factor 
> etc are also configurable.
> 
> Following example commands can be used to set the auto-lb params:
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-thresh="80"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-min-improvement="5"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-drop-check="true"
> 

As you mentioned in a follow-up, this will never be perfect, so there needs to be
a way for the user to limit it happening even if the criteria above are met.
Something like allowing the user to set a max number of rebalances for some
time period, e.g. pmd-auto-lb-max-num and pmd-auto-lb-time.
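
A minimal sketch of the suggested cap, assuming hypothetical pmd-auto-lb-max-num
and pmd-auto-lb-time options that bound the number of rebalances per time
window; neither option exists, and the code below only illustrates the idea.

#include <stdbool.h>

/* Allow at most 'max_num' rebalances within any 'window_ms' period, no
 * matter how often the overload criterion fires. */
struct alb_rate_limit {
    unsigned max_num;           /* e.g. from a pmd-auto-lb-max-num option. */
    long long int window_ms;    /* e.g. from a pmd-auto-lb-time option. */
    long long int window_start; /* Start of the current accounting window. */
    unsigned count;             /* Rebalances done in the current window. */
};

static bool
alb_rebalance_allowed(struct alb_rate_limit *rl, long long int now_ms)
{
    if (now_ms - rl->window_start >= rl->window_ms) {
        rl->window_start = now_ms;  /* Start a new accounting window. */
        rl->count = 0;
    }
    if (rl->count >= rl->max_num) {
        return false;               /* Cap reached; skip this rebalance. */
    }
    rl->count++;
    return true;
}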

> Co-authored-by: Rohith Basavaraja 
> 
> Signed-off-by: Nitin Katiyar 
> Signed-off-by: Rohith Basavaraja 
> ---
>  lib/dpif-netdev.c | 589 
> +++---
>  1 file changed, 561 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 
> e322f55..28593cc 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -80,6 +80,26 @@
>  
>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>  
> +/* Auto Load Balancing Defaults */
> +#define ACCEPT_IMPROVE_DEFAULT   (25)
> +#define PMD_LOAD_THRE_DEFAULT(99)
> +#define PMD_AUTO_LB_DISABLE  false
> +#define SKIP_DROP_CHECK_DEFAULT  false
> +
> +//TODO: Should we make it configurable??
> +#define PMD_MIN_NUM_DROPS(1)
> +#define PMD_MIN_NUM_QFILLS   (1)

It seems like a very small default. This would indicate that one dropped pkt or 
one vhost q full is considered enough to do a dry run.
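
For context, the drop/qfill criterion being discussed can be pictured as a
per-interval delta against the previous sample, gated by those minimum counts.
This is an illustration only; the function below is a hypothetical name, not
code from the patch.

#include <stdbool.h>

#define PMD_MIN_NUM_DROPS  1   /* The minimum-count defaults discussed above. */
#define PMD_MIN_NUM_QFILLS 1

/* True if enough new NIC drops or vhost queue-full events were seen since the
 * previous 10-second sample to count toward the overload criterion. */
static bool
rxq_saw_new_pressure(unsigned long long drops, unsigned long long *prev_drops,
                     unsigned int qfills, unsigned int *prev_qfills)
{
    bool pressure = (drops - *prev_drops >= PMD_MIN_NUM_DROPS)
                    || (qfills - *prev_qfills >= PMD_MIN_NUM_QFILLS);

    *prev_drops = drops;
    *prev_qfills = qfills;
    return pressure;
}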

> +#define PMD_REBALANCE_POLL_TIMER_INTERVAL 6
> +
> +extern uint32_t log_q_thr;
> +
> +static bool pmd_auto_lb = PMD_AUTO_LB_DISABLE; static bool 
> +auto_lb_skip_drop_check = SKIP_DROP_CHECK_DEFAULT; static float 
> +auto_lb_pmd_load_ther = PMD_LOAD_THRE_DEFAULT; static unsigned int 
> +auto_lb_accept_improve = ACCEPT_IMPROVE_DEFAULT; static long long int 
> +pmd_rebalance_poll_timer = 0;
> +

I think these can be in 'struct dp_netdev' like the other config items instead 
of globals

> +
>  #define FLOW_DUMP_MAX_BATCH 50
>  /* Use per thread recirc_depth to prevent recirculation loop. */  
> #define MAX_RECIRC_DEPTH 6 @@ -393,6 +413,8 @@ enum 
> rxq_cycles_counter_type {
> interval. */
>  RXQ_CYCLES_PROC_HIST,   /* Total cycles of all intervals that are 
> used
> during rxq to pmd assignment. */
> +RXQ_CYCLES_IDLE_CURR,   /* Cycles spent in idling. */
> +RXQ_CYCLES_IDLE_HIST,   /* Total cycles of all idle intervals. */

I'm not sure if it's really needed to meas

Re: [ovs-dev] [PATCH] RFC for support of PMD Auto load balancing

2018-10-22 Thread Nitin Katiyar
Hi,
Gentle reminder for review.

Regards,
Nitin

-Original Message-
From: Nitin Katiyar 
Sent: Friday, October 12, 2018 10:49 AM
To: ovs-dev@openvswitch.org
Cc: Rohith Basavaraja 
Subject: RE: [PATCH] RFC for support of PMD Auto load balancing

Hi,
I forgot to mention that this patch does not handle frequent rescheduling of rx
queues caused by auto load balancing. That is something we had identified, and
changes need to be made to dampen the frequent rescheduling of rx queues across
PMDs.

Regards,
Nitin

-Original Message-
From: Nitin Katiyar
Sent: Friday, October 12, 2018 1:30 AM
To: ovs-dev@openvswitch.org
Cc: Nitin Katiyar ; Rohith Basavaraja 

Subject: [PATCH] RFC for support of PMD Auto load balancing

Port rx queues that have not been statically assigned to PMDs are currently 
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port 
deletion, upon reassignment request via CLI etc.

Over time, changes in traffic pattern can cause uneven load among the PMDs,
resulting in lower overall throughput.

This patch enables the support of auto load balancing of PMDs based on measured 
load of RX queues. Each PMD measures the processing load for each of its 
associated queues every 10 seconds. If the aggregated PMD load exceeds a 
configured threshold for 6 consecutive intervals and if there are receive 
packet drops at the NIC the PMD considers itself to be overloaded.

If any PMD considers itself to be overloaded, a dry-run of the PMD assignment 
algorithm is performed by OVS main thread. The dry-run does NOT change the 
existing queue to PMD assignments.

If the resultant mapping of dry-run indicates an improved distribution of the 
load then the actual reassignment will be performed. The automatic rebalancing 
will be disabled by default and has to be enabled via configuration option. 
Load thresholds, improvement factor etc are also configurable.

Following example commands can be used to set the auto-lb params:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-thresh="80"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-min-improvement="5"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-drop-check="true"

Co-authored-by: Rohith Basavaraja 

Signed-off-by: Nitin Katiyar 
Signed-off-by: Rohith Basavaraja 
---
 lib/dpif-netdev.c | 589 +++---
 1 file changed, 561 insertions(+), 28 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e322f55..28593cc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,26 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ACCEPT_IMPROVE_DEFAULT   (25)
+#define PMD_LOAD_THRE_DEFAULT(99)
+#define PMD_AUTO_LB_DISABLE  false
+#define SKIP_DROP_CHECK_DEFAULT  false
+
+//TODO: Should we make it configurable??
+#define PMD_MIN_NUM_DROPS(1)
+#define PMD_MIN_NUM_QFILLS   (1)
+#define PMD_REBALANCE_POLL_TIMER_INTERVAL 6
+
+extern uint32_t log_q_thr;
+
+static bool pmd_auto_lb = PMD_AUTO_LB_DISABLE; static bool 
+auto_lb_skip_drop_check = SKIP_DROP_CHECK_DEFAULT; static float 
+auto_lb_pmd_load_ther = PMD_LOAD_THRE_DEFAULT; static unsigned int 
+auto_lb_accept_improve = ACCEPT_IMPROVE_DEFAULT; static long long int 
+pmd_rebalance_poll_timer = 0;
+
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */  #define 
MAX_RECIRC_DEPTH 6 @@ -393,6 +413,8 @@ enum rxq_cycles_counter_type {
interval. */
 RXQ_CYCLES_PROC_HIST,   /* Total cycles of all intervals that are used
during rxq to pmd assignment. */
+RXQ_CYCLES_IDLE_CURR,   /* Cycles spent in idling. */
+RXQ_CYCLES_IDLE_HIST,   /* Total cycles of all idle intervals. */
 RXQ_N_CYCLES
 };
 
@@ -429,6 +451,14 @@ static struct ovsthread_once offload_thread_once
 
 #define XPS_TIMEOUT 50LL/* In microseconds. */
 
+typedef struct {
+unsigned long long prev_drops;
+} q_drops;
+typedef struct {
+unsigned int num_vhost_qfill;
+unsigned int prev_num_vhost_qfill;
+} vhost_qfill;
+
 /* Contained by struct dp_netdev_port's 'rxqs' member.  */  struct 
dp_netdev_rxq {
 struct dp_netdev_port *port;
@@ -439,6 +469,10 @@ struct dp_netdev_rxq {
   particular core. */
 unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+struct dp_netdev_pmd_thread *dry_run_pmd;
+   /* During auto lb trigger, pmd thread
+  associated with this q during dry
+  run. */
 bool is_vhost;  

Re: [ovs-dev] [PATCH] RFC for support of PMD Auto load balancing

2018-10-11 Thread Nitin Katiyar
Hi,
I forgot to mention that this patch does not handle frequent rescheduling of rx
queues caused by auto load balancing. That is something we had identified, and
changes need to be made to dampen the frequent rescheduling of rx queues across
PMDs.

Regards,
Nitin

-Original Message-
From: Nitin Katiyar 
Sent: Friday, October 12, 2018 1:30 AM
To: ovs-dev@openvswitch.org
Cc: Nitin Katiyar ; Rohith Basavaraja 

Subject: [PATCH] RFC for support of PMD Auto load balancing

Port rx queues that have not been statically assigned to PMDs are currently 
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port 
deletion, upon reassignment request via CLI etc.

Over time, changes in traffic pattern can cause uneven load among the PMDs,
resulting in lower overall throughput.

This patch enables the support of auto load balancing of PMDs based on measured 
load of RX queues. Each PMD measures the processing load for each of its 
associated queues every 10 seconds. If the aggregated PMD load exceeds a 
configured threshold for 6 consecutive intervals and if there are receive 
packet drops at the NIC the PMD considers itself to be overloaded.

If any PMD considers itself to be overloaded, a dry-run of the PMD assignment 
algorithm is performed by OVS main thread. The dry-run does NOT change the 
existing queue to PMD assignments.

If the resultant mapping of dry-run indicates an improved distribution of the 
load then the actual reassignment will be performed. The automatic rebalancing 
will be disabled by default and has to be enabled via configuration option. 
Load thresholds, improvement factor etc are also configurable.

Following example commands can be used to set the auto-lb params:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-thresh="80"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-min-improvement="5"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-drop-check="true"

Co-authored-by: Rohith Basavaraja 

Signed-off-by: Nitin Katiyar 
Signed-off-by: Rohith Basavaraja 
---
 lib/dpif-netdev.c | 589 +++---
 1 file changed, 561 insertions(+), 28 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e322f55..28593cc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,26 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ACCEPT_IMPROVE_DEFAULT   (25)
+#define PMD_LOAD_THRE_DEFAULT(99)
+#define PMD_AUTO_LB_DISABLE  false
+#define SKIP_DROP_CHECK_DEFAULT  false
+
+//TODO: Should we make it configurable??
+#define PMD_MIN_NUM_DROPS(1)
+#define PMD_MIN_NUM_QFILLS   (1)
+#define PMD_REBALANCE_POLL_TIMER_INTERVAL 6
+
+extern uint32_t log_q_thr;
+
+static bool pmd_auto_lb = PMD_AUTO_LB_DISABLE; static bool 
+auto_lb_skip_drop_check = SKIP_DROP_CHECK_DEFAULT; static float 
+auto_lb_pmd_load_ther = PMD_LOAD_THRE_DEFAULT; static unsigned int 
+auto_lb_accept_improve = ACCEPT_IMPROVE_DEFAULT; static long long int 
+pmd_rebalance_poll_timer = 0;
+
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */  #define 
MAX_RECIRC_DEPTH 6 @@ -393,6 +413,8 @@ enum rxq_cycles_counter_type {
interval. */
 RXQ_CYCLES_PROC_HIST,   /* Total cycles of all intervals that are used
during rxq to pmd assignment. */
+RXQ_CYCLES_IDLE_CURR,   /* Cycles spent in idling. */
+RXQ_CYCLES_IDLE_HIST,   /* Total cycles of all idle intervals. */
 RXQ_N_CYCLES
 };
 
@@ -429,6 +451,14 @@ static struct ovsthread_once offload_thread_once
 
 #define XPS_TIMEOUT 50LL/* In microseconds. */
 
+typedef struct {
+unsigned long long prev_drops;
+} q_drops;
+typedef struct {
+unsigned int num_vhost_qfill;
+unsigned int prev_num_vhost_qfill;
+} vhost_qfill;
+
 /* Contained by struct dp_netdev_port's 'rxqs' member.  */  struct 
dp_netdev_rxq {
 struct dp_netdev_port *port;
@@ -439,6 +469,10 @@ struct dp_netdev_rxq {
   particular core. */
 unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+struct dp_netdev_pmd_thread *dry_run_pmd;
+   /* During auto lb trigger, pmd thread
+  associated with this q during dry
+  run. */
 bool is_vhost; /* Is rxq of a vhost port. */
 
 /* Counters of cycles spent successfully polling and processing pkts. */ 
@@ -446,6 +480,16 @@ struct dp_netdev_rxq {
 /* We store PMD_RXQ_INTERVAL_MAX intervals of data for an rxq and then
 

[ovs-dev] [PATCH] RFC for support of PMD Auto load balancing

2018-10-11 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.

Over time, changes in traffic pattern can cause uneven load among the PMDs,
resulting in lower overall throughput.

This patch adds support for automatic load balancing of PMDs based
on the measured load of RX queues. Each PMD measures the processing load for
each of its associated queues every 10 seconds. If the aggregated PMD load
exceeds a configured threshold for 6 consecutive intervals, and if there are
receive packet drops at the NIC, the PMD considers itself to be overloaded.

If any PMD considers itself to be overloaded, a dry-run of the PMD
assignment algorithm is performed by the OVS main thread. The dry-run
does NOT change the existing queue-to-PMD assignments.

If the resultant mapping of the dry-run indicates an improved distribution
of the load, then the actual reassignment will be performed. Automatic
rebalancing is disabled by default and has to be enabled via a
configuration option. Load thresholds, the improvement factor, etc. are also
configurable.

The following example commands can be used to set the auto-lb parameters:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-thresh="80"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-min-improvement="5"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-drop-check="true"
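
For illustration, such other_config knobs are typically read through the OVS
smap helpers. The sketch below is hypothetical (the struct and function are not
from this RFC); only the option names and the 99/25 defaults come from the
defines in the diff that follows.

#include <stdbool.h>
#include "smap.h"    /* OVS string-map helpers: smap_get_bool(), smap_get_int(). */

struct alb_config {
    bool enabled;               /* pmd-auto-lb */
    bool drop_check;            /* pmd-auto-lb-drop-check */
    int load_thresh_pct;        /* pmd-auto-lb-thresh */
    int min_improvement_pct;    /* pmd-auto-lb-min-improvement */
};

/* Reads the knobs shown above from a bridge's other_config column.  Defaults
 * for the two integer knobs mirror PMD_LOAD_THRE_DEFAULT (99) and
 * ACCEPT_IMPROVE_DEFAULT (25); the boolean defaults are illustrative. */
static void
alb_read_config(const struct smap *other_config, struct alb_config *cfg)
{
    cfg->enabled = smap_get_bool(other_config, "pmd-auto-lb", false);
    cfg->drop_check = smap_get_bool(other_config, "pmd-auto-lb-drop-check",
                                    false);
    cfg->load_thresh_pct = smap_get_int(other_config, "pmd-auto-lb-thresh", 99);
    cfg->min_improvement_pct = smap_get_int(other_config,
                                            "pmd-auto-lb-min-improvement", 25);
}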

Co-authored-by: Rohith Basavaraja 

Signed-off-by: Nitin Katiyar 
Signed-off-by: Rohith Basavaraja 
---
 lib/dpif-netdev.c | 589 +++---
 1 file changed, 561 insertions(+), 28 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e322f55..28593cc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,26 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ACCEPT_IMPROVE_DEFAULT   (25)
+#define PMD_LOAD_THRE_DEFAULT(99)
+#define PMD_AUTO_LB_DISABLE  false
+#define SKIP_DROP_CHECK_DEFAULT  false
+
+//TODO: Should we make it configurable??
+#define PMD_MIN_NUM_DROPS(1)
+#define PMD_MIN_NUM_QFILLS   (1)
+#define PMD_REBALANCE_POLL_TIMER_INTERVAL 6
+
+extern uint32_t log_q_thr;
+
+static bool pmd_auto_lb = PMD_AUTO_LB_DISABLE;
+static bool auto_lb_skip_drop_check = SKIP_DROP_CHECK_DEFAULT;
+static float auto_lb_pmd_load_ther = PMD_LOAD_THRE_DEFAULT;
+static unsigned int auto_lb_accept_improve = ACCEPT_IMPROVE_DEFAULT;
+static long long int pmd_rebalance_poll_timer = 0;
+
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */
 #define MAX_RECIRC_DEPTH 6
@@ -393,6 +413,8 @@ enum rxq_cycles_counter_type {
interval. */
 RXQ_CYCLES_PROC_HIST,   /* Total cycles of all intervals that are used
during rxq to pmd assignment. */
+RXQ_CYCLES_IDLE_CURR,   /* Cycles spent in idling. */
+RXQ_CYCLES_IDLE_HIST,   /* Total cycles of all idle intervals. */
 RXQ_N_CYCLES
 };
 
@@ -429,6 +451,14 @@ static struct ovsthread_once offload_thread_once
 
 #define XPS_TIMEOUT 50LL/* In microseconds. */
 
+typedef struct {
+unsigned long long prev_drops;
+} q_drops;
+typedef struct {
+unsigned int num_vhost_qfill;
+unsigned int prev_num_vhost_qfill;
+} vhost_qfill;
+
 /* Contained by struct dp_netdev_port's 'rxqs' member.  */
 struct dp_netdev_rxq {
 struct dp_netdev_port *port;
@@ -439,6 +469,10 @@ struct dp_netdev_rxq {
   particular core. */
 unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+struct dp_netdev_pmd_thread *dry_run_pmd;
+   /* During auto lb trigger, pmd thread
+  associated with this q during dry
+  run. */
 bool is_vhost; /* Is rxq of a vhost port. */
 
 /* Counters of cycles spent successfully polling and processing pkts. */
@@ -446,6 +480,16 @@ struct dp_netdev_rxq {
 /* We store PMD_RXQ_INTERVAL_MAX intervals of data for an rxq and then
sum them to yield the cycles used for an rxq. */
 atomic_ullong cycles_intrvl[PMD_RXQ_INTERVAL_MAX];
+
+/* Following param are used to determine the load on the PMD
+ * for automatic load balance
+ */
+atomic_ullong idle_intrvl[PMD_RXQ_INTERVAL_MAX];
+union {
+q_drops rxq_drops;
+vhost_qfill rxq_vhost_qfill;
+} rxq_drops_or_qfill;
+atomic_uint   overloading_pmd;
 };
 
 /* A port in a netdev-based datapath. */
@@ -682,6 +726,12 @@ struct dp_n

Re: [ovs-dev] [PATCH v2 1/2] Fix packet drops on LACP bond after link up

2018-08-14 Thread Nitin Katiyar
Hi Ben,
I checked it. There were 2 patches sent by Manohar earlier. The first patch (v1)
had some whitespace issues, because of which the merge failed. So Manohar sent a
v2 of that one, but did not include the second patch again. I will be sending
both patches again on the latest baseline.

Thanks and Regards,
Nitin

-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Monday, August 13, 2018 9:33 PM
To: Nitin Katiyar 
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v2 1/2] Fix packet drops on LACP bond after link 
up

No.  There were two comments but neither one received a response:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348943.html
https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348944.html

On Mon, Aug 13, 2018 at 11:22:54AM +, Nitin Katiyar wrote:
> Ben,
> I am following up on the patch which Manohar had sent earlier. Can it be 
> merged?
> 
> Regards,
> Nitin
> 
> -Original Message-
> From: Manohar Krishnappa Chidambaraswamy 
> [mailto:manohar.krishnappa.chidambarasw...@ericsson.com] 
> Sent: Monday, June 25, 2018 2:49 PM
> To: Ben Pfaff 
> Cc: d...@openvswitch.org; Nitin Katiyar 
> Subject: Re: [ovs-dev] [PATCH v2 1/2] Fix packet drops on LACP bond after 
> link up
> 
> Hi Ben,
> 
> Does this patch apply without issues?
> 
> Would you be able to look at 2/2 of this series as well?
> 
> Thanx
> Manu
> 
> On 18/06/18, 2:05 PM, "ovs-dev-boun...@openvswitch.org on behalf of Manohar 
> Krishnappa Chidambaraswamy"  manohar.krishnappa.chidambarasw...@ericsson.com> wrote:
> 
> Ben,
> 
> Here are the v2 diffs. Hope this applies without any issue.
>     
> Thanx
> Manu
> 
> Signed-off-by: Manohar K C
> 
> CC: Jan Scheurich 
> CC: Nitin Katiyar 
> ---
> v1 1/2: https://patchwork.ozlabs.org/patch/915285/
> v2 1/2: Rebased to master
> 
>  lib/lacp.c   | 14 --
>  lib/lacp.h   |  3 ++-
>  ofproto/bond.c   | 18 +++---
>  ofproto/ofproto-dpif-xlate.c | 13 -
>  4 files changed, 41 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/lacp.c b/lib/lacp.c
> index d6b36aa..9e43e06 100644
> --- a/lib/lacp.c
> +++ b/lib/lacp.c
> @@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp 
> *, const void *slave)
>  OVS_REQUIRES(mutex);
>  static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
>  OVS_REQUIRES(mutex);
> +static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
>  
>  static unixctl_cb_func lacp_unixctl_show;
>  static unixctl_cb_func lacp_unixctl_show_stats;
> @@ -324,8 +325,8 @@ lacp_is_active(const struct lacp *lacp) 
> OVS_EXCLUDED(mutex)
>  /* Processes 'packet' which was received on 'slave_'.  This function 
> should be
>   * called on all packets received on 'slave_' with Ethernet Type 
> ETH_TYPE_LACP.
>   */
> -void
> -lacp_process_packet(struct lacp *lacp, const void *slave_,
> +bool
> +lacp_process_packet(struct lacp *lacp, const void *bond, const void 
> *slave_,
>  const struct dp_packet *packet)
>  OVS_EXCLUDED(mutex)
>  {
> @@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void 
> *slave_,
>  const struct lacp_pdu *pdu;
>  long long int tx_rate;
>  struct slave *slave;
> +bool lacp_may_enable = false;
>  
>  lacp_lock();
>  slave = slave_lookup(lacp, slave_);
> @@ -362,8 +364,16 @@ lacp_process_packet(struct lacp *lacp, const void 
> *slave_,
>  slave->partner = pdu->actor;
>  }
>  
> +/*
> + * Evaluate may_enable here to avoid dropping of packets till main 
> thread
> + * sets may_enable to true.
> + */
> +lacp_may_enable = slave_may_enable__(slave);
> +
>  out:
>  lacp_unlock();
> +
> +return lacp_may_enable;
>  }
>  
>  /* Returns the lacp_status of the given 'lacp' object (which may be 
> NULL). */
> diff --git a/lib/lacp.h b/lib/lacp.h
> index f35cff5..1505c2c 100644
> --- a/lib/lacp.h
> +++ b/lib/lacp.h
> @@ -46,7 +46,8 @@ struct lacp *lacp_ref(const struct lacp *);
>  void lacp_configure(struct lacp *, const struct lacp_settings *);
>  bool lacp_is_active(const struct lacp *);
>  
> -void lacp_process_packet(struct lacp *, const void *slave,
> +bool lacp_process_packet(struct lacp *, const void *bond,
> + cons

Re: [ovs-dev] [PATCH v2 1/2] Fix packet drops on LACP bond after link up

2018-08-13 Thread Nitin Katiyar
Ben,
I am following up on the patch which Manohar had sent earlier. Can it be merged?

Regards,
Nitin

-Original Message-
From: Manohar Krishnappa Chidambaraswamy 
[mailto:manohar.krishnappa.chidambarasw...@ericsson.com] 
Sent: Monday, June 25, 2018 2:49 PM
To: Ben Pfaff 
Cc: d...@openvswitch.org; Nitin Katiyar 
Subject: Re: [ovs-dev] [PATCH v2 1/2] Fix packet drops on LACP bond after link 
up

Hi Ben,

Does this patch apply without issues?

Would you be able to look at 2/2 of this series as well?

Thanx
Manu

On 18/06/18, 2:05 PM, "ovs-dev-boun...@openvswitch.org on behalf of Manohar 
Krishnappa Chidambaraswamy"  wrote:

Ben,

Here are the v2 diffs. Hope this applies without any issue.

Thanx
Manu

Signed-off-by: Manohar K C

CC: Jan Scheurich 
    CC: Nitin Katiyar 
---
v1 1/2: https://patchwork.ozlabs.org/patch/915285/
v2 1/2: Rebased to master

 lib/lacp.c   | 14 --
 lib/lacp.h   |  3 ++-
 ofproto/bond.c   | 18 +++---
 ofproto/ofproto-dpif-xlate.c | 13 -
 4 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index d6b36aa..9e43e06 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
 static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
 static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,8 +325,8 @@ lacp_is_active(const struct lacp *lacp) 
OVS_EXCLUDED(mutex)
 /* Processes 'packet' which was received on 'slave_'.  This function 
should be
  * called on all packets received on 'slave_' with Ethernet Type 
ETH_TYPE_LACP.
  */
-void
-lacp_process_packet(struct lacp *lacp, const void *slave_,
+bool
+lacp_process_packet(struct lacp *lacp, const void *bond, const void 
*slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
 {
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void 
*slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,16 @@ lacp_process_packet(struct lacp *lacp, const void 
*slave_,
 slave->partner = pdu->actor;
 }
 
+/*
+ * Evaluate may_enable here to avoid dropping of packets till main 
thread
+ * sets may_enable to true.
+ */
+lacp_may_enable = slave_may_enable__(slave);
+
 out:
 lacp_unlock();
+
+return lacp_may_enable;
 }
 
 /* Returns the lacp_status of the given 'lacp' object (which may be NULL). 
*/
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..1505c2c 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,8 @@ struct lacp *lacp_ref(const struct lacp *);
 void lacp_configure(struct lacp *, const struct lacp_settings *);
 bool lacp_is_active(const struct lacp *);
 
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *bond,
+ const void *slave,
  const struct dp_packet *packet);
 enum lacp_status lacp_status(const struct lacp *);
 
diff --git a/ofproto/bond.c b/ofproto/bond.c
index f87cdba..5fc1e0e 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -777,6 +777,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
 {
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 ovs_rwlock_rdlock(&rwlock);
 slave = bond_slave_lookup(bond, slave_);
@@ -794,7 +795,13 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
  * drop all incoming traffic except if lacp_fallback_ab is enabled. */
 switch (bond->lacp_status) {
 case LACP_NEGOTIATED:
-verdict = slave->enabled ? BV_ACCEPT : BV_DROP;
+/*
+ * To reduce packet-drops due to delay in enabling of slave (post
+ * LACP-SYNC), from main thread, check for may_enable as well.
+ * When may_enable is TRUE, it means LACP is UP and waiting for
+ * the main thread to run LACP state machine and enable the slave.
+ */
+verdict = (slave->enabled || slave->may_enable) ? BV_ACCEPT : 
BV_DROP;
 goto out;
 case LACP_CONFIGURED:
 if (!bond->lacp_fall

Re: [ovs-dev] Conntrack performance drop in OVS 2.8

2018-07-04 Thread Nitin Katiyar
Hi,
Any suggestions/pointers on the following?

Regards,
Nitin

From: Nitin Katiyar
Sent: Friday, June 29, 2018 3:00 PM
To: ovs-dev@openvswitch.org
Subject: Conntrack performance drop in OVS 2.8

Hi,
The performance of OVS 2.8 (with DPDK 17.05.02) with a conntrack configuration
has dropped significantly (especially for the single-flow case) compared to OVS
2.6 (with DPDK 16.11.4).

Following is the comparison between 2.6.2 and 2.8.2.
PKT Size   # of Flows   MPPS (OVS 2.6)   MPPS (OVS 2.8)   % Drop

64         1            3.37             1.86             44.71
128        1            3.09             1.74             43.52
256        1            2.66             1.15             56.84
64         1            1.73             1.51             13.03
128        1            1.68             1.46             12.65
256        1            1.55             1.34             13.60


OVS is configured with 2 DPDK ports (10G - Intel 82599) bonded in bond-slb mode
and 1 VHU port. The VM is running testpmd, which echoes the UDP packets.

I used the following OF rules:

ovs-ofctl add-flow br-int "table=0, priority=10,ct_state=-trk,udp,actions=ct(table=1)"
ovs-ofctl add-flow br-int "table=1, priority=1000,ct_state=+new+trk,udp,in_port=10,actions=strip_vlan,ct(commit),output:101"

ovs-ofctl add-flow br-int "table=1, priority=900,ct_state=+est+trk,in_port=10,actions=strip_vlan,output:101"
ovs-ofctl add-flow br-int "table=1, priority=900,ct_state=+est+trk,in_port=101,actions=push_vlan:0x8100,mod_vlan_vid:4044,output:10"

There are 2 bridges configured in OVS, connected through patch ports.
Port 10 above is the patch port and 101 is the VHU port. OVS is configured with
2 DPDK ports (10G - Intel 82599) bonded in bond-slb mode and 1 VHU port. The VM
is running testpmd, which echoes the UDP packets.

The generator (running on a different server) is sending UDP traffic, varying
the UDP source port.

Has anyone else experienced similar behavior?

Regards,
Nitin






I have a simple configuration of the SUT, with one VM running testpmd echoing
traffic on top of OVS.


Re: [ovs-dev] 64Byte packet performance regression on 2.9 from 2.7

2018-07-02 Thread Nitin Katiyar
Hi,
I had tested 2.8.1/2 earlier, which uses DPDK 17.05.01 or 17.05.02, and found
around a 10% drop for UDP traffic. OVS 2.7.4 gave similar results to OVS 2.6.2
(DPDK 16.11.4). I was using Intel Niantic 82599 NICs for testing.

Regards,
Nitin

-Original Message-
From: Ilya Maximets [mailto:i.maxim...@samsung.com] 
Sent: Monday, July 02, 2018 10:25 PM
To: Shahaji Bhosle 
Cc: Jan Scheurich ; Jay Ding 
; Kevin Traynor ; Manasa Mudireddy 
; Nitin Katiyar ; 
Randy Schacher ; Stokes, Ian 
; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] 64Byte packet performance regression on 2.9 from 2.7

Sure, you need to collect perf records for the same binary, i.e. built with the 
same compiler options (and on the same machine), to make them useful.

Unfortunately, I have no setup to test your case right now.
Data for 2.8 could help bisecting the issue.

On 02.07.2018 18:04, Shahaji Bhosle wrote:
> Hi Ilya,
> Thanks for the reply.
> For performance traffic testing we are running with -O2. You are right about 
> the perf report; when we were running with perf record we had set "-g -O0". Do 
> you need us to run with just "-g -O2" and give you the profile, or any other 
> optimization setting.
> Do you have a test setup for running 64B packets, and see the difference 
> between 2.7 and 2.9? On our side we are trying to get 2.8 to work so we can 
> give you an intermediate data point. Please let us know what we can do to 
> help you debug this.
> Thanks, Shahaji
> 
> 
> On Mon, Jul 2, 2018 at 10:55 AM, Ilya Maximets  <mailto:i.maxim...@samsung.com>> wrote:
> 
> Hi.
> Sorry for late response.
> 
> Looking at your perf data, I see functions like "dp_packet_batch_size"
> consuming ~0.5 - 0.7 % of time. Are you building with all compiler
> optimizations disabled? Otherwise there should be no such symbols in
> the perf report. They should be completely inlined.
> 
> Best regards, Ilya Maximets.
> 
> On 27.06.2018 04:48, Shahaji Bhosle wrote:
> > Hi Ilya,
> > Just wanted to check if you found anything interesting. Or anything we 
> can try. Thanks, Shahaji
> > 
> > On Wed, Jun 20, 2018 at 9:01 AM, Shahaji Bhosle 
> mailto:shahaji.bho...@broadcom.com> 
> <mailto:shahaji.bho...@broadcom.com <mailto:shahaji.bho...@broadcom.com>>> 
> wrote:
> > 
> >     Thanks Ilya, 
> >      Sorry for the confusion with the number, we used to get some 
> different numbers on both ports so were recording it per port. You have to 
> compare it with the two port number
> > 
> >               CPU mask        Mpps
> >     17.11 testpmd     6 queue 0xfe    21.5 + 21.5
> >     OvS 2.9+DPDK17.11 6 queue 0xfe    15.5 + 15.5
> >     16.11 testpmd     6 queue 0xfe    21.5 + 21.5
> >     OvS 2.7+DPDK16.11 6 queue 0xfe    17.4+17.4
> > 
> > 
> >     Thanks, Shahaji
> > 
> >     On Wed, Jun 20, 2018 at 8:34 AM, Ilya Maximets 
> mailto:i.maxim...@samsung.com> 
> <mailto:i.maxim...@samsung.com <mailto:i.maxim...@samsung.com>>> wrote:
> > 
> >         Ok, I'll look at the data later.
> > 
> >         But your testpmd results are much lower than OVS results. 
> 21.5Mpps for testpmd
> >         versus 33.8Mpps for OVS. OVS should work slower than testpmd, 
> because it performs
> >         a lot of parsing and processing while testpmd does not.
> >         You probably tested testpmd in deifferent environment or 
> allocated less amount
> >         of resources for PMD htreads. Could you please recheck?
> > 
> >         What is your OVS configuration (pmd-cpu-mask, n_rxqs etc.)?
> >         And what is your testpmd command-line?
> > 
> >         On 20.06.2018 14:54, Shahaji Bhosle wrote:
> >         > Thanks Ilya,
> >         > Attaching the two perf reports...We did run testpmd on its 
> own, there were no red flags there. In some of the cases like flowgen 17.11 
> performs much better than 16.11, but for the macswap case, the numbers are 
> below. Let me know if you cannot see the attached perf reports. I can just 
> cut and paste them in the email if attachment does not work. Sorry I am not 
> sure I can post these on any outside servers. Let me know
> >         > Thanks, Shahaji
> >         > 
>         > *DPDK on Maia (macswap)*      *Rings*         *Mpps*          *Cycles/Packet*
>         > 17.11 testpmd                 6 queue         21.5 + 21.5     60
>         >                               1 queue         10.4+1

[ovs-dev] Conntrack performance drop in OVS 2.8

2018-06-29 Thread Nitin Katiyar
Hi,
The performance of OVS 2.8 (with DPDK 17.05.02) with a conntrack configuration
has dropped significantly (especially for the single-flow case) compared to OVS
2.6 (with DPDK 16.11.4).

Following is the comparison between 2.6.2 and 2.8.2.
PKT Size   # of Flows   MPPS (OVS 2.6)   MPPS (OVS 2.8)   % Drop

64         1            3.37             1.86             44.71
128        1            3.09             1.74             43.52
256        1            2.66             1.15             56.84
64         1            1.73             1.51             13.03
128        1            1.68             1.46             12.65
256        1            1.55             1.34             13.60


OVS is configured with 2 DPDK ports (10G - Intel 82599) bonded in bond-slb mode
and 1 VHU port. The VM is running testpmd, which echoes the UDP packets.

I used the following OF rules:

ovs-ofctl add-flow br-int "table=0, priority=10,ct_state=-trk,udp,actions=ct(table=1)"
ovs-ofctl add-flow br-int "table=1, priority=1000,ct_state=+new+trk,udp,in_port=10,actions=strip_vlan,ct(commit),output:101"

ovs-ofctl add-flow br-int "table=1, priority=900,ct_state=+est+trk,in_port=10,actions=strip_vlan,output:101"
ovs-ofctl add-flow br-int "table=1, priority=900,ct_state=+est+trk,in_port=101,actions=push_vlan:0x8100,mod_vlan_vid:4044,output:10"

There are 2 bridges configured in OVS, connected through patch ports.
Port 10 above is the patch port and 101 is the VHU port. OVS is configured with
2 DPDK ports (10G - Intel 82599) bonded in bond-slb mode and 1 VHU port. The VM
is running testpmd, which echoes the UDP packets.

The generator (running on a different server) is sending UDP traffic, varying
the UDP source port.

Has anyone else experienced similar behavior?

Regards,
Nitin






I have a simple configuration of the SUT, with one VM running testpmd echoing
traffic on top of OVS.


Re: [ovs-dev] 64Byte packet performance regression on 2.9 from 2.7

2018-06-18 Thread Nitin Katiyar
Hi,
We also experienced degradation from OVS 2.6/2.7 to OVS 2.8.2 (with DPDK 17.05.02).
The drop is larger for the 64-byte packet size (~8-10%), even with a higher number
of flows. I tried OVS 2.8 with DPDK 17.11 and it improved for larger packet sizes,
but the 64-byte size is still a concern.

Regards,
Nitin

-Original Message-
From: Ilya Maximets [mailto:i.maxim...@samsung.com] 
Sent: Monday, June 18, 2018 1:32 PM
To: ovs-dev@openvswitch.org; shahaji.bho...@broadcom.com
Subject: Re: [ovs-dev] 64Byte packet performance regression on 2.9 from 2.7

CC: Shahaji Bhosle

Sorry, missed you in CC list.

Best regards, Ilya Maximets.

On 15.06.2018 10:44, Ilya Maximets wrote:
>> Hi,
>> I just upgraded from OvS 2.7 + DPDK 16.11 to OvS2.9 + DPDK 17.11 and 
>> running into performance issue with 64 Byte packet rate. One 
>> interesting thing that I notice that even at very light load from 
>> IXIA the processing cycles on all the PMD threads run close to 100% 
>> of the cpu cycle on 2.9 OvS, but on OvS 2.7 even under full load the 
>> processing cycles remain at 75% of the cpu cycles.
>>
>> Attaching the FlameGraphs of both the versions, the only thing that 
>> pops out to me is the new way invoking netdev_send() is on 2.9 is 
>> being invoked via  dp_netdev_pmd_flush_output_packets() which seems 
>> to be adding another ~11% to the whole rx to tx path.
>>
>> I also did try the tx-flush-interval to 50 and more it does seem to 
>> help, but not significant enough to match the 2.7 performance.
>>
>>
>> Any help or ideas would be really great. Thanks, Shahaji
> 
> Hello, Shahaji.
> Could you, please, describe your testing scenario in more details? 
> Also, mail-list filters attachments, so they are not available. You 
> need to publish them somewhere else or write in text format inside the letter.
> 
> About the performance itself: Some performance degradation because of 
> output batching is expected for tests with low number of flows or 
> simple PHY-PHY tests. It was mainly targeted for cases with relatively 
> large number of flows, for amortizing of vhost-user penalties 
> (PHY-VM-PHY, VM-VM cases), OVS bonding cases.
> 
> If your test involves vhost-user ports, then you should also consider 
> vhost-user performance regression in stable DPDK 17.11 because of 
> fixes for CVE-2018-1059. Related bug:
>   https://dpdk.org/tracker/show_bug.cgi?id=48
> 
> It'll be good if you'll be able to test OVS 2.8 + DPDK 17.05. There
> were too many changes since 2.7. It'll be hard to track down the root cause.
> 
> Best regards, Ilya Maximets.
> 



Re: [ovs-dev] [PATCH v1 1/2] Fix packet drops on LACP bond after link up

2018-06-11 Thread Nitin Katiyar
Hi,
Changes look good to me.

Regards,
Nitin

-Original Message-
From: Manohar Krishnappa Chidambaraswamy 
Sent: Tuesday, June 05, 2018 4:07 PM
To: d...@openvswitch.org
Cc: Nitin Katiyar 
Subject: Re: [ovs-dev] [PATCH v1 1/2] Fix packet drops on LACP bond after link 
up

Hi,

Could someone take a look at this patch? We hit this issue (drops) on a 
customer setup and appreciate your comments/suggestions.

Thanx
Manu

On 17/05/18, 3:26 PM, "ovs-dev-boun...@openvswitch.org on behalf of Manohar 
Krishnappa Chidambaraswamy"  wrote:

1/2: Fix packet drops on LACP bond after link up

Problem:

During a port DOWN->UP transition of a link (slave) in a LACP bond, after
receiving the LACPDU with SYNC set for both actor and partner, the bond-slave
remains "disabled" until the OVS main thread runs the LACP state machine and
eventually "enables" the bond-slave. With this, we have observed delays in the
order of 350 ms, and packets are dropped in OVS due to the bond-admissibility
check (packets received on a slave in "disabled" state are dropped).

Fix:

When a LACPDU is received, evaluate whether the LACP slave can be enabled
(slave_may_enable()) and set the LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means the L1 state is UP and
LACP-SYNC is done, and the slave is waiting for the main thread to enable it.
Relax the check in bond_check_admissibility() to check for both
"enabled" and "may_enable" of the LACP slave. This avoids dropping
packets until the main thread enables the slave from bundle_run().

Signed-off-by: Manohar K C

CC: Jan Scheurich 
CC: Nitin Katiyar 

---

v1 2/2: Avoid LACP negotiation when NIC driver reports PHY status as DOWN.

lib/lacp.c   | 14 --
lib/lacp.h   |  3 ++-
ofproto/bond.c   | 18 +++---
ofproto/ofproto-dpif-xlate.c | 13 -
4 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index 8353746..8d97ad5 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -154,6 +154,7 @@ static struct slave *slave_lookup(const struct lacp *, 
const void *slave)
 OVS_REQUIRES(mutex);
static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
+static bool slave_may_enable__(struct slave *slave) OVS_REQUIRES(mutex);
static unixctl_cb_func lacp_unixctl_show;
static unixctl_cb_func lacp_unixctl_show_stats;
@@ -324,8 +325,8 @@ lacp_is_active(const struct lacp *lacp) 
OVS_EXCLUDED(mutex)
/* Processes 'packet' which was received on 'slave_'.  This function should 
be
  * called on all packets received on 'slave_' with Ethernet Type 
ETH_TYPE_LACP.
  */
-void
-lacp_process_packet(struct lacp *lacp, const void *slave_,
+bool
+lacp_process_packet(struct lacp *lacp, const void *bond, const void 
*slave_,
 const struct dp_packet *packet)
 OVS_EXCLUDED(mutex)
{
@@ -333,6 +334,7 @@ lacp_process_packet(struct lacp *lacp, const void 
*slave_,
 const struct lacp_pdu *pdu;
 long long int tx_rate;
 struct slave *slave;
+bool lacp_may_enable = false;
 lacp_lock();
 slave = slave_lookup(lacp, slave_);
@@ -362,8 +364,16 @@ lacp_process_packet(struct lacp *lacp, const void 
*slave_,
 slave->partner = pdu->actor;
 }
+/*
+ * Evaluate may_enable here to avoid dropping of packets till main 
thread
+ * sets may_enable to true.
+ */
+lacp_may_enable = slave_may_enable__(slave);
+
out:
 lacp_unlock();
+
+return lacp_may_enable;
}
/* Returns the lacp_status of the given 'lacp' object (which may be NULL). 
*/
diff --git a/lib/lacp.h b/lib/lacp.h
index f35cff5..1505c2c 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -46,7 +46,8 @@ struct lacp *lacp_ref(const struct lacp *);
void lacp_configure(struct lacp *, const struct lacp_settings *);
bool lacp_is_active(const struct lacp *);
-void lacp_process_packet(struct lacp *, const void *slave,
+bool lacp_process_packet(struct lacp *, const void *bond,
+ const void *slave,
  const struct dp_packet *packet);
enum lacp_status lacp_status(const struct lacp *);
diff --git a/ofproto/bond.c b/ofproto/bond.c
index 11d28e1..7ca3687 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -777,6 +777,7 @@ bond_check_admissibility(struct bond *bond, const void 
*slave_,
{
 enum bond_verdict verdict = BV_DROP;
 struct bond_slave *slave;
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 ovs_rwlock_rdlock(&rwlock);
  

[ovs-dev] [PATCH] Adding cli for displaying LACP counters

2018-04-18 Thread Nitin Katiyar
Currently OVS does not provide any command to display LACP statistics,
which makes it difficult to debug LACP issues. Here we propose
to display various statistics about LACP PDUs and slave state changes.

Sample output:

ovs_lacp # ovs-appctl lacp/stats-show
 bond-prv statistics 

slave: dpdk0:
RX PDUs: 128
RX Bad PDUs: 0
TX PDUs: 5
Link Expired: 2
Link Defaulted: 1
Carrier Status Changed: 0

Signed-off-by: Nitin Katiyar <nitin.kati...@ericsson.com>
---
 NEWS   |  2 ++
 lib/lacp.c | 79 --
 vswitchd/ovs-vswitchd.8.in |  6 
 3 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 77c2b43..bdbe7e2 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,8 @@ Post-v2.9.0
default it always accepts names and in interactive use it displays them;
use --names or --no-names to override.  See ovs-ofctl(8) for details.
- ovs-vsctl: New commands "add-bond-iface" and "del-bond-iface".
+   - ovs-appctl:
+ * New command "lacp/show-stats"
- OpenFlow:
  * OFPT_ROLE_STATUS is now available in OpenFlow 1.3.
- Linux kernel 4.14
diff --git a/lib/lacp.c b/lib/lacp.c
index fdefeda..8353746 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -127,9 +127,12 @@ struct slave {
 struct timer tx;  /* Next message transmission timer. */
 struct timer rx;  /* Expected message receive timer. */
 
-uint32_t count_rx_pdus;   /* dot3adAggPortStatsLACPDUsRx */
-uint32_t count_rx_pdus_bad;   /* dot3adAggPortStatsIllegalRx */
-uint32_t count_tx_pdus;   /* dot3adAggPortStatsLACPDUsTx */
+uint32_t count_rx_pdus; /* dot3adAggPortStatsLACPDUsRx */
+uint32_t count_rx_pdus_bad; /* dot3adAggPortStatsIllegalRx */
+uint32_t count_tx_pdus; /* dot3adAggPortStatsLACPDUsTx */
+uint32_t count_link_expired;/* Num of times link expired */
+uint32_t count_link_defaulted;  /* Num of times link defaulted */
+uint32_t count_carrier_changed; /* Num of times link status changed */
 };
 
 static struct ovs_mutex mutex;
@@ -153,6 +156,7 @@ static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
+static unixctl_cb_func lacp_unixctl_show_stats;
 
 /* Populates 'pdu' with a LACP PDU comprised of 'actor' and 'partner'. */
 static void
@@ -206,6 +210,8 @@ lacp_init(void)
 {
 unixctl_command_register("lacp/show", "[port]", 0, 1,
  lacp_unixctl_show, NULL);
+unixctl_command_register("lacp/show-stats", "[port]", 0, 1,
+ lacp_unixctl_show_stats, NULL);
 }
 
 static void
@@ -459,6 +465,7 @@ lacp_slave_carrier_changed(const struct lacp *lacp, const void *slave_)
 if (slave->status == LACP_CURRENT || slave->lacp->active) {
 slave_set_expired(slave);
 }
+slave->count_carrier_changed++;
 
 out:
 lacp_unlock();
@@ -525,8 +532,10 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
 
 if (slave->status == LACP_CURRENT) {
 slave_set_expired(slave);
+slave->count_link_expired++;
 } else if (slave->status == LACP_EXPIRED) {
 slave_set_defaulted(slave);
+slave->count_link_defaulted++;
 }
 if (slave->status != old_status) {
 seq_change(connectivity_seq_get());
@@ -994,6 +1003,40 @@ lacp_print_details(struct ds *ds, struct lacp *lacp) OVS_REQUIRES(mutex)
 }
 
 static void
+lacp_print_stats(struct ds *ds, struct lacp *lacp) OVS_REQUIRES(mutex)
+{
+struct shash slave_shash = SHASH_INITIALIZER(&slave_shash);
+const struct shash_node **sorted_slaves = NULL;
+
+struct slave *slave;
+int i;
+
+ds_put_format(ds, " %s statistics \n", lacp->name);
+
+HMAP_FOR_EACH (slave, node, &lacp->slaves) {
+shash_add(&slave_shash, slave->name, slave);
+}
+sorted_slaves = shash_sort(&slave_shash);
+
+for (i = 0; i < shash_count(&slave_shash); i++) {
+slave = sorted_slaves[i]->data;
+ds_put_format(ds, "\nslave: %s:\n", slave->name);
+ds_put_format(ds, "\tRX PDUs: %u\n", slave->count_rx_pdus);
+ds_put_format(ds, "\tRX Bad PDUs: %u\n", slave->count_rx_pdus_bad);
+ds_put_format(ds, "\tTX PDUs: %u\n", slave->count_tx_pdus);
+ds_put_format(ds, "\tLink Expired: %u\n",
+  slave->count_link_expired);
+ds_put_format(ds, "\tLink Defaulted: %u\n",
+  slave->count_link_defaulted);
+ds_put_format(ds, "\tCarrier Status Changed: %u\n",
+  slave->count_carrier_changed

Re: [ovs-dev] [PATCH] Adding cli for displaying LACP counters

2018-04-12 Thread Nitin Katiyar

Hi Ben,
Thanks for reviewing it.
Will update the documentation and NEWS item soon.

Regards,
Nitin
-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Sunday, April 01, 2018 6:06 AM
To: Nitin Katiyar <nitin.kati...@ericsson.com>
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] Adding cli for displaying LACP counters

On Thu, Mar 29, 2018 at 02:16:34AM +0530, Nitin Katiyar wrote:
> Currently OVS does not provide any command to display stats for LACP 
> without which it is difficult to debug LACP issues. Here we propose to 
> display various statistics about LACP PDUs and slave state change.

Thanks for adding this feature.  The implementation looks good.

Please add documentation for this feature in ovs-vswitchd(8).

Please add a NEWS item.

Thanks,

Ben.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] Adding cli for displaying LACP counters

2018-03-28 Thread Nitin Katiyar
Currently, OVS does not provide any command to display LACP statistics,
which makes LACP issues difficult to debug. This patch adds a command to
display various statistics about LACP PDUs and slave state changes.

Sample output:

ovs_lacp # ovs-appctl lacp/show-stats
 bond-prv statistics 

slave: dpdk0:
RX PDUs: 128
RX Bad PDUs: 0
TX PDUs: 5
Link Expired: 2
Link Defaulted: 1
Carrier Status Changed: 0

Signed-off-by: Nitin Katiyar <nitin.kati...@ericsson.com>
---
 lib/lacp.c | 79 +++---
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/lib/lacp.c b/lib/lacp.c
index fdefeda..8353746 100644
--- a/lib/lacp.c
+++ b/lib/lacp.c
@@ -127,9 +127,12 @@ struct slave {
 struct timer tx;  /* Next message transmission timer. */
 struct timer rx;  /* Expected message receive timer. */
 
-uint32_t count_rx_pdus;   /* dot3adAggPortStatsLACPDUsRx */
-uint32_t count_rx_pdus_bad;   /* dot3adAggPortStatsIllegalRx */
-uint32_t count_tx_pdus;   /* dot3adAggPortStatsLACPDUsTx */
+uint32_t count_rx_pdus; /* dot3adAggPortStatsLACPDUsRx */
+uint32_t count_rx_pdus_bad; /* dot3adAggPortStatsIllegalRx */
+uint32_t count_tx_pdus; /* dot3adAggPortStatsLACPDUsTx */
+uint32_t count_link_expired;/* Num of times link expired */
+uint32_t count_link_defaulted;  /* Num of times link defaulted */
+uint32_t count_carrier_changed; /* Num of times link status changed */
 };
 
 static struct ovs_mutex mutex;
@@ -153,6 +156,7 @@ static bool info_tx_equal(struct lacp_info *, struct lacp_info *)
 OVS_REQUIRES(mutex);
 
 static unixctl_cb_func lacp_unixctl_show;
+static unixctl_cb_func lacp_unixctl_show_stats;
 
 /* Populates 'pdu' with a LACP PDU comprised of 'actor' and 'partner'. */
 static void
@@ -206,6 +210,8 @@ lacp_init(void)
 {
 unixctl_command_register("lacp/show", "[port]", 0, 1,
  lacp_unixctl_show, NULL);
+unixctl_command_register("lacp/show-stats", "[port]", 0, 1,
+ lacp_unixctl_show_stats, NULL);
 }
 
 static void
@@ -459,6 +465,7 @@ lacp_slave_carrier_changed(const struct lacp *lacp, const void *slave_)
 if (slave->status == LACP_CURRENT || slave->lacp->active) {
 slave_set_expired(slave);
 }
+slave->count_carrier_changed++;
 
 out:
 lacp_unlock();
@@ -525,8 +532,10 @@ lacp_run(struct lacp *lacp, lacp_send_pdu *send_pdu) OVS_EXCLUDED(mutex)
 
 if (slave->status == LACP_CURRENT) {
 slave_set_expired(slave);
+slave->count_link_expired++;
 } else if (slave->status == LACP_EXPIRED) {
 slave_set_defaulted(slave);
+slave->count_link_defaulted++;
 }
 if (slave->status != old_status) {
 seq_change(connectivity_seq_get());
@@ -994,6 +1003,40 @@ lacp_print_details(struct ds *ds, struct lacp *lacp) OVS_REQUIRES(mutex)
 }
 
 static void
+lacp_print_stats(struct ds *ds, struct lacp *lacp) OVS_REQUIRES(mutex)
+{
+struct shash slave_shash = SHASH_INITIALIZER(&slave_shash);
+const struct shash_node **sorted_slaves = NULL;
+
+struct slave *slave;
+int i;
+
+ds_put_format(ds, " %s statistics \n", lacp->name);
+
+HMAP_FOR_EACH (slave, node, &lacp->slaves) {
+shash_add(&slave_shash, slave->name, slave);
+}
+sorted_slaves = shash_sort(&slave_shash);
+
+for (i = 0; i < shash_count(&slave_shash); i++) {
+slave = sorted_slaves[i]->data;
+ds_put_format(ds, "\nslave: %s:\n", slave->name);
+ds_put_format(ds, "\tRX PDUs: %u\n", slave->count_rx_pdus);
+ds_put_format(ds, "\tRX Bad PDUs: %u\n", slave->count_rx_pdus_bad);
+ds_put_format(ds, "\tTX PDUs: %u\n", slave->count_tx_pdus);
+ds_put_format(ds, "\tLink Expired: %u\n",
+  slave->count_link_expired);
+ds_put_format(ds, "\tLink Defaulted: %u\n",
+  slave->count_link_defaulted);
+ds_put_format(ds, "\tCarrier Status Changed: %u\n",
+  slave->count_carrier_changed);
+}
+
+shash_destroy(&slave_shash);
+free(sorted_slaves);
+}
+
+static void
 lacp_unixctl_show(struct unixctl_conn *conn, int argc, const char *argv[],
   void *aux OVS_UNUSED) OVS_EXCLUDED(mutex)
 {
@@ -1021,6 +1064,36 @@ out:
 lacp_unlock();
 }
 
+static void
+lacp_unixctl_show_stats(struct unixctl_conn *conn,
+  int argc,
+  const char *argv[],
+  void *aux OVS_UNUSED) OVS_EXCLUDED(mutex)
+{
+struct ds ds = DS_EMPTY_INITIALIZER;
+struct lacp *lacp;
+
+lacp_lock();
+if (argc > 1) {
+lacp = l

Re: [ovs-dev] [PATCH] netdev-dpdk: defer MTU set after interface start

2017-11-24 Thread Nitin Katiyar
Hi,
Is it being called after the device has started? If so, *dev_mtu_set() will not
serve the purpose, since that callback expects the device to be stopped first.
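For context: this constraint means a runtime MTU change would have to bracket
rte_eth_dev_set_mtu() with a stop/start of the port. A minimal sketch,
assuming the constraint holds for the PMD in question (illustration only,
not the patch under discussion):

#include <rte_ethdev.h>

/* Minimal sketch, assuming the PMD requires the port to be stopped before
 * its MTU can change: quiesce the port, apply the MTU, then restart. */
static int
set_mtu_with_restart(uint16_t port_id, uint16_t new_mtu)
{
    int err;

    rte_eth_dev_stop(port_id);                 /* Quiesce RX/TX first. */

    err = rte_eth_dev_set_mtu(port_id, new_mtu);
    if (err) {
        /* Port is left stopped; the caller decides whether to restart
         * with the previous MTU. */
        return err;
    }

    return rte_eth_dev_start(port_id);         /* Resume with the new MTU. */
}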

Regards,
Nitin
-Original Message-
From: Stokes, Ian [mailto:ian.sto...@intel.com] 
Sent: Friday, November 24, 2017 1:57 PM
To: Matteo Croce ; d...@openvswitch.org; Kavanagh, Mark B 

Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: defer MTU set after interface start

> Some PMD assumes that the RX buffers are already allocated when 
> setting the device MTU, because the RX buffer size depends on the MTU.
> This worked until 67fe6d635193 added a call to rte_eth_dev_set_mtu() 
> in the init code, which would set the MTU before the RX buffer 
> allocation, triggering a segmentation fault with some PMD:
> 
> Stack trace of thread 20680:
> #0  0x559464396534 qede_set_mtu (ovs-vswitchd)
> #1  0x559464323c41 rte_eth_dev_set_mtu (ovs-vswitchd)
> #2  0x5594645f5e85 dpdk_eth_dev_queue_setup (ovs-vswitchd)
> #3  0x5594645f8ae6 netdev_dpdk_reconfigure (ovs-vswitchd)
> #4  0x55946452225c reconfigure_datapath (ovs-vswitchd)
> #5  0x559464522d07 do_add_port (ovs-vswitchd)
> #6  0x559464522e8d dpif_netdev_port_add (ovs-vswitchd)
> #7  0x559464528dde dpif_port_add (ovs-vswitchd)
> #8  0x5594644dc0e0 port_add (ovs-vswitchd)
> #9  0x5594644d2ab1 ofproto_port_add (ovs-vswitchd)
> #10 0x5594644c0a85 bridge_add_ports__ (ovs-vswitchd)
> #11 0x5594644c26e8 bridge_reconfigure (ovs-vswitchd)
> #12 0x5594644c5c49 bridge_run (ovs-vswitchd)
> #13 0x5594642d4155 main (ovs-vswitchd)
> #14 0x7f0e1444bc05 __libc_start_main (libc.so.6)
> #15 0x5594642d8328 _start (ovs-vswitchd)
> 
> A possible solution could be to move the first call to
> rte_eth_dev_set_mtu() just after the device start instead of
> dpdk_eth_dev_queue_setup() which, by the way, set the MTU multiple 
> times as the call to rte_eth_dev_set_mtu() was in a loop.
> 
> CC: Mark Kavanagh 
> Fixes: 67fe6d635193 ("netdev-dpdk: use rte_eth_dev_set_mtu.")
> Signed-off-by: Matteo Croce 
> ---
>  lib/netdev-dpdk.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> 76e79be25..229aa4a76 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -750,13 +750,6 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>  break;
>  }
> 
> -diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);
> -if (diag) {
> -VLOG_ERR("Interface %s MTU (%d) setup error: %s",
> -dev->up.name, dev->mtu, rte_strerror(-diag));
> -break;
> -}
> -
>  for (i = 0; i < n_txq; i++) {
>  diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
>dev->socket_id, NULL);
> @@ -849,6 +842,13 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>  return -diag;
>  }
> 
> +diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);
> +if (diag) {
> +VLOG_ERR("Interface %s MTU (%d) setup error: %s",
> +dev->up.name, dev->mtu, rte_strerror(-diag));
> +return -diag;
> +}
> +

In my testing I didn't see any issues here, but I would like to flag it to our
MTU guru (Mark K).

Any opinion on this? From the API description it looks like it is OK.


>  rte_eth_promiscuous_enable(dev->port_id);
>  rte_eth_allmulticast_enable(dev->port_id);
> 
> --
> 2.13.6
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Proposal for enabling dp_hash irrespective of OF version

2017-09-25 Thread Nitin Katiyar
Hi,
I have a proposal to add a provision for using "dp_hash" as the selection
method (which can currently be used only with OF1.5-based controllers),
irrespective of the OF version being used by the controller.

The link to the document is:
https://docs.google.com/document/d/13Jiwbs0atV_Y8Vatj6SmQeB_qdM-AyHu8uhtsO6P-_8/edit?usp=sharing

Let me know your comments.

Regards,
Nitin

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] MTU in i40e dpdk driver

2017-09-18 Thread Nitin Katiyar
Hi,
Yes, a VLAN tag is configured for the VHU port, so traffic from the VM is
tagged. Why is this different from the 10G (ixgbe) driver? It should accept
packets that match the configured MTU. Is this expected behavior with the
i40e driver?

Thanks,
Nitin

-Original Message-
From: Kavanagh, Mark B [mailto:mark.b.kavan...@intel.com] 
Sent: Monday, September 18, 2017 7:44 PM
To: Nitin Katiyar <nitin.kati...@ericsson.com>; ovs-dev@openvswitch.org
Subject: RE: [ovs-dev] MTU in i40e dpdk driver

>From: Nitin Katiyar [mailto:nitin.kati...@ericsson.com]
>Sent: Monday, September 18, 2017 3:02 PM
>To: Kavanagh, Mark B <mark.b.kavan...@intel.com>; 
>ovs-dev@openvswitch.org
>Subject: RE: [ovs-dev] MTU in i40e dpdk driver
>
>Hi,
>It is set to 2140.

That should accommodate a max packet length of 2158 (i.e. MTU + ETHER_HDR (14B) 
+ ETHER_CRC (4B)).

Is the VM inside a VLAN by any chance? The presence of a VLAN tag would account 
for the additional 4B.
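A quick back-of-the-envelope check of that arithmetic (illustration only, not
OVS code):

/* Frame-size arithmetic for an interface MTU of 2140: the untagged maximum
 * is 2158 bytes on the wire, and a single VLAN tag pushes it to 2162, which
 * matches the 4-byte margin by which the ping failed. */
#include <stdio.h>

#define ETHER_HDR_LEN 14
#define ETHER_CRC_LEN  4
#define VLAN_TAG_LEN   4

int
main(void)
{
    unsigned int mtu = 2140;
    unsigned int untagged = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;  /* 2158 */
    unsigned int tagged = untagged + VLAN_TAG_LEN;                /* 2162 */

    printf("max untagged frame: %u bytes\n", untagged);
    printf("max VLAN-tagged frame: %u bytes\n", tagged);
    return 0;
}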

-Mark

>
>compute-0-4:~# ovs-vsctl get Interface dpdk1 mtu
>2140
>
>Regards,
>Nitin
>
>-Original Message-
>From: Kavanagh, Mark B [mailto:mark.b.kavan...@intel.com]
>Sent: Monday, September 18, 2017 7:26 PM
>To: Nitin Katiyar <nitin.kati...@ericsson.com>; ovs-dev@openvswitch.org
>Subject: Re: [ovs-dev] MTU in i40e dpdk driver
>
>>From: ovs-dev-boun...@openvswitch.org
>>[mailto:ovs-dev-boun...@openvswitch.org]
>>On Behalf Of Nitin Katiyar
>>Sent: Monday, September 18, 2017 2:05 PM
>>To: ovs-dev@openvswitch.org
>>Subject: [ovs-dev] MTU in i40e dpdk driver
>>
>>Hi,
>>We are using OVS-DPDK (2.6 version) with Fortville NIC (configured in 
>>25G
>>mode) being used as dpdk port. The setup involves 2 VMs running on 2 
>>different computes (destination VM in compute with 10G NIC while 
>>originating VM is in compute with Fortville NIC). All the interfaces 
>>in the path are configured with MTU of 2140.
>>
>>While pinging with size of 2112 (IP packet of 2140 bytes) we found 
>>that ping response does not reach originating VM (i.e on compute with 
>>Fortville
>NIC) .
>>DPDK interface does not show any drop but we don't see any ping 
>>response received at DPDK port (verified using port-mirroring). We 
>>also don't see any rule in ovs dpctl for ping response. If we increase 
>>the MTU of DPDK interface by 4 bytes  or reduce the ping size by 4 
>>bytes then it
>works.
>>
>>The same configuration works between 10G NICs on both sides.
>>
>>Is it a known issue with i40 dpdk driver?
>
>Hi Nitin,
>
>What is the MTU of the DPDK ports in this setup?
>
>   ovs-vsctl get Interface  mtu
>
>Thanks,
>Mark
>
>>
>>Regards,
>>Nitin
>>___
>>dev mailing list
>>d...@openvswitch.org
>>https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] MTU in i40e dpdk driver

2017-09-18 Thread Nitin Katiyar
Hi,
It is set to 2140.

compute-0-4:~# ovs-vsctl get Interface dpdk1 mtu
2140

Regards,
Nitin

-Original Message-
From: Kavanagh, Mark B [mailto:mark.b.kavan...@intel.com] 
Sent: Monday, September 18, 2017 7:26 PM
To: Nitin Katiyar <nitin.kati...@ericsson.com>; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] MTU in i40e dpdk driver

>From: ovs-dev-boun...@openvswitch.org 
>[mailto:ovs-dev-boun...@openvswitch.org]
>On Behalf Of Nitin Katiyar
>Sent: Monday, September 18, 2017 2:05 PM
>To: ovs-dev@openvswitch.org
>Subject: [ovs-dev] MTU in i40e dpdk driver
>
>Hi,
>We are using OVS-DPDK (2.6 version) with Fortville NIC (configured in 
>25G
>mode) being used as dpdk port. The setup involves 2 VMs running on 2 
>different computes (destination VM in compute with 10G NIC while 
>originating VM is in compute with Fortville NIC). All the interfaces in 
>the path are configured with MTU of 2140.
>
>While pinging with size of 2112 (IP packet of 2140 bytes) we found that 
>ping response does not reach originating VM (i.e on compute with Fortville 
>NIC) .
>DPDK interface does not show any drop but we don't see any ping 
>response received at DPDK port (verified using port-mirroring). We also 
>don't see any rule in ovs dpctl for ping response. If we increase the 
>MTU of DPDK interface by 4 bytes  or reduce the ping size by 4 bytes then it 
>works.
>
>The same configuration works between 10G NICs on both sides.
>
>Is it a known issue with i40 dpdk driver?

Hi Nitin,

What is the MTU of the DPDK ports in this setup?

ovs-vsctl get Interface  mtu

Thanks,
Mark

>
>Regards,
>Nitin
>___
>dev mailing list
>d...@openvswitch.org
>https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] MTU in i40e dpdk driver

2017-09-18 Thread Nitin Katiyar
Hi,
We are using OVS-DPDK (version 2.6) with a Fortville NIC (configured in 25G
mode) as a dpdk port. The setup involves two VMs running on two different
computes (the destination VM on a compute with a 10G NIC, the originating VM
on a compute with the Fortville NIC). All the interfaces in the path are
configured with an MTU of 2140.

While pinging with a payload size of 2112 (an IP packet of 2140 bytes), we
found that the ping response does not reach the originating VM (i.e. the
compute with the Fortville NIC). The DPDK interface does not show any drops,
but we do not see any ping response received at the DPDK port (verified using
port mirroring), and there is no rule in ovs dpctl for the ping response. If
we increase the MTU of the DPDK interface by 4 bytes, or reduce the ping size
by 4 bytes, it works.

The same configuration works between 10G NICs on both sides.

Is this a known issue with the i40e DPDK driver?

Regards,
Nitin
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev