Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-26 Thread David Miller
From: Jarod Wilson 
Date: Fri, 24 May 2019 09:49:28 -0400

> Once in a while, with just the right timing, 802.3ad slaves will fail to
> properly initialize, winding up in a weird state, with a partner system
> mac address of 00:00:00:00:00:00. This started happening after a fix to
> properly track link_failure_count tracking, where an 802.3ad slave that
> reported itself as link up in the miimon code, but wasn't able to get a
> valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
> BOND_LINK_DOWN. That was the proper thing to do for the general "my link
> went down" case, but has created a link initialization race that can put
> the interface in this odd state.
> 
> The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
> if the link has never been up (last_link_up == 0), so the link state
> doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
> in this case, it simply hasn't been up yet, and this prevents the
> unnecessary state change from DOWN to FAIL and getting stuck in an init
> failure w/o a partner mac.
> 
> Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking")
> Tested-by: Heesoon Kim 
> Signed-off-by: Jarod Wilson 

Applied and queued up for -stable.


Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread Jarod Wilson

On 5/24/19 6:38 PM, Mahesh Bandewar (महेश बंडेवार) wrote:

On Fri, May 24, 2019 at 2:17 PM Jay Vosburgh  wrote:


Jarod Wilson  wrote:


Once in a while, with just the right timing, 802.3ad slaves will fail to
properly initialize, winding up in a weird state, with a partner system
mac address of 00:00:00:00:00:00. This started happening after a fix to
properly track link_failure_count tracking, where an 802.3ad slave that
reported itself as link up in the miimon code, but wasn't able to get a
valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
BOND_LINK_DOWN. That was the proper thing to do for the general "my link
went down" case, but has created a link initialization race that can put
the interface in this odd state.



Are there any notification consequences because of this change?


No, there shouldn't be, it just makes initial link-up cleaner, 
everything during runtime once the link is initialized should remain the 
same.


--
Jarod Wilson
ja...@redhat.com


Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread Jarod Wilson

On 5/24/19 5:16 PM, Jay Vosburgh wrote:

Jarod Wilson  wrote:


Once in a while, with just the right timing, 802.3ad slaves will fail to
properly initialize, winding up in a weird state, with a partner system
mac address of 00:00:00:00:00:00. This started happening after a fix to
properly track link_failure_count tracking, where an 802.3ad slave that
reported itself as link up in the miimon code, but wasn't able to get a
valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
BOND_LINK_DOWN. That was the proper thing to do for the general "my link
went down" case, but has created a link initialization race that can put
the interface in this odd state.


Reading back in the git history, the ultimate cause of this
"weird state" appears to be devices that assert NETDEV_UP prior to
actually being able to supply sane speed/duplex values, correct?

Presuming that this is the case, I don't see that there's much
else to be done here, and so:

Acked-by: Jay Vosburgh 


Correct, we've got an miimon "device is up", but still can't get speed 
and/or duplex in this case.


--
Jarod Wilson
ja...@redhat.com


Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread महेश बंडेवार
On Fri, May 24, 2019 at 2:17 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >Once in a while, with just the right timing, 802.3ad slaves will fail to
> >properly initialize, winding up in a weird state, with a partner system
> >mac address of 00:00:00:00:00:00. This started happening after a fix to
> >properly track link_failure_count tracking, where an 802.3ad slave that
> >reported itself as link up in the miimon code, but wasn't able to get a
> >valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
> >BOND_LINK_DOWN. That was the proper thing to do for the general "my link
> >went down" case, but has created a link initialization race that can put
> >the interface in this odd state.
>
Are there any notification consequences because of this change?

>Reading back in the git history, the ultimate cause of this
> "weird state" appears to be devices that assert NETDEV_UP prior to
> actually being able to supply sane speed/duplex values, correct?
>
> Presuming that this is the case, I don't see that there's much
> else to be done here, and so:
>
> Acked-by: Jay Vosburgh 
>
> >The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
> >if the link has never been up (last_link_up == 0), so the link state
> >doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
> >in this case, it simply hasn't been up yet, and this prevents the
> >unnecessary state change from DOWN to FAIL and getting stuck in an init
> >failure w/o a partner mac.
> >
> >Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking")
> >CC: Jay Vosburgh 
> >CC: Veaceslav Falico 
> >CC: Andy Gospodarek 
> >CC: "David S. Miller" 
> >CC: net...@vger.kernel.org
> >Tested-by: Heesoon Kim 
> >Signed-off-by: Jarod Wilson 
>
>
>
> >---
> > drivers/net/bonding/bond_main.c | 15 ++-
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c 
> >b/drivers/net/bonding/bond_main.c
> >index 062fa7e3af4c..407f4095a37a 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long 
> >event,
> >   case NETDEV_CHANGE:
> >   /* For 802.3ad mode only:
> >* Getting invalid Speed/Duplex values here will put slave
> >-   * in weird state. So mark it as link-fail for the time
> >-   * being and let link-monitoring (miimon) set it right when
> >-   * correct speeds/duplex are available.
> >+   * in weird state. Mark it as link-fail if the link was
> >+   * previously up or link-down if it hasn't yet come up, and
> >+   * let link-monitoring (miimon) set it right when correct
> >+   * speeds/duplex are available.
> >*/
> >   if (bond_update_speed_duplex(slave) &&
> >-  BOND_MODE(bond) == BOND_MODE_8023AD)
> >-  slave->link = BOND_LINK_FAIL;
> >+  BOND_MODE(bond) == BOND_MODE_8023AD) {
> >+  if (slave->last_link_up)
> >+  slave->link = BOND_LINK_FAIL;
> >+  else
> >+  slave->link = BOND_LINK_DOWN;
> >+  }
> >
> >   if (BOND_MODE(bond) == BOND_MODE_8023AD)
> >   bond_3ad_adapter_speed_duplex_changed(slave);
> >--
> >2.20.1
> >


Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread Jay Vosburgh
Jarod Wilson  wrote:

>Once in a while, with just the right timing, 802.3ad slaves will fail to
>properly initialize, winding up in a weird state, with a partner system
>mac address of 00:00:00:00:00:00. This started happening after a fix to
>properly track link_failure_count tracking, where an 802.3ad slave that
>reported itself as link up in the miimon code, but wasn't able to get a
>valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
>BOND_LINK_DOWN. That was the proper thing to do for the general "my link
>went down" case, but has created a link initialization race that can put
>the interface in this odd state.

Reading back in the git history, the ultimate cause of this
"weird state" appears to be devices that assert NETDEV_UP prior to
actually being able to supply sane speed/duplex values, correct?

Presuming that this is the case, I don't see that there's much
else to be done here, and so:

Acked-by: Jay Vosburgh 

>The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
>if the link has never been up (last_link_up == 0), so the link state
>doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
>in this case, it simply hasn't been up yet, and this prevents the
>unnecessary state change from DOWN to FAIL and getting stuck in an init
>failure w/o a partner mac.
>
>Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking")
>CC: Jay Vosburgh 
>CC: Veaceslav Falico 
>CC: Andy Gospodarek 
>CC: "David S. Miller" 
>CC: net...@vger.kernel.org
>Tested-by: Heesoon Kim 
>Signed-off-by: Jarod Wilson 



>---
> drivers/net/bonding/bond_main.c | 15 ++-
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 062fa7e3af4c..407f4095a37a 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long event,
>   case NETDEV_CHANGE:
>   /* For 802.3ad mode only:
>* Getting invalid Speed/Duplex values here will put slave
>-   * in weird state. So mark it as link-fail for the time
>-   * being and let link-monitoring (miimon) set it right when
>-   * correct speeds/duplex are available.
>+   * in weird state. Mark it as link-fail if the link was
>+   * previously up or link-down if it hasn't yet come up, and
>+   * let link-monitoring (miimon) set it right when correct
>+   * speeds/duplex are available.
>*/
>   if (bond_update_speed_duplex(slave) &&
>-  BOND_MODE(bond) == BOND_MODE_8023AD)
>-  slave->link = BOND_LINK_FAIL;
>+  BOND_MODE(bond) == BOND_MODE_8023AD) {
>+  if (slave->last_link_up)
>+  slave->link = BOND_LINK_FAIL;
>+  else
>+  slave->link = BOND_LINK_DOWN;
>+  }
> 
>   if (BOND_MODE(bond) == BOND_MODE_8023AD)
>   bond_3ad_adapter_speed_duplex_changed(slave);
>-- 
>2.20.1
>


[PATCH net] bonding/802.3ad: fix slave link initialization transition states

2019-05-24 Thread Jarod Wilson
Once in a while, with just the right timing, 802.3ad slaves will fail to
properly initialize, winding up in a weird state, with a partner system
mac address of 00:00:00:00:00:00. This started happening after a fix to
properly track link_failure_count tracking, where an 802.3ad slave that
reported itself as link up in the miimon code, but wasn't able to get a
valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
BOND_LINK_DOWN. That was the proper thing to do for the general "my link
went down" case, but has created a link initialization race that can put
the interface in this odd state.

The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
if the link has never been up (last_link_up == 0), so the link state
doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
in this case, it simply hasn't been up yet, and this prevents the
unnecessary state change from DOWN to FAIL and getting stuck in an init
failure w/o a partner mac.

Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking")
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: net...@vger.kernel.org
Tested-by: Heesoon Kim 
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 062fa7e3af4c..407f4095a37a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long event,
case NETDEV_CHANGE:
/* For 802.3ad mode only:
 * Getting invalid Speed/Duplex values here will put slave
-* in weird state. So mark it as link-fail for the time
-* being and let link-monitoring (miimon) set it right when
-* correct speeds/duplex are available.
+* in weird state. Mark it as link-fail if the link was
+* previously up or link-down if it hasn't yet come up, and
+* let link-monitoring (miimon) set it right when correct
+* speeds/duplex are available.
 */
if (bond_update_speed_duplex(slave) &&
-   BOND_MODE(bond) == BOND_MODE_8023AD)
-   slave->link = BOND_LINK_FAIL;
+   BOND_MODE(bond) == BOND_MODE_8023AD) {
+   if (slave->last_link_up)
+   slave->link = BOND_LINK_FAIL;
+   else
+   slave->link = BOND_LINK_DOWN;
+   }
 
if (BOND_MODE(bond) == BOND_MODE_8023AD)
bond_3ad_adapter_speed_duplex_changed(slave);
-- 
2.20.1