Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states
From: Jarod Wilson Date: Fri, 24 May 2019 09:49:28 -0400 > Once in a while, with just the right timing, 802.3ad slaves will fail to > properly initialize, winding up in a weird state, with a partner system > mac address of 00:00:00:00:00:00. This started happening after a fix to > properly track link_failure_count tracking, where an 802.3ad slave that > reported itself as link up in the miimon code, but wasn't able to get a > valid speed/duplex, started getting set to BOND_LINK_FAIL instead of > BOND_LINK_DOWN. That was the proper thing to do for the general "my link > went down" case, but has created a link initialization race that can put > the interface in this odd state. > > The simple fix is to instead set the slave link to BOND_LINK_DOWN again, > if the link has never been up (last_link_up == 0), so the link state > doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed > in this case, it simply hasn't been up yet, and this prevents the > unnecessary state change from DOWN to FAIL and getting stuck in an init > failure w/o a partner mac. > > Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking") > Tested-by: Heesoon Kim > Signed-off-by: Jarod Wilson Applied and queued up for -stable.
Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states
On 5/24/19 6:38 PM, Mahesh Bandewar (महेश बंडेवार) wrote: On Fri, May 24, 2019 at 2:17 PM Jay Vosburgh wrote: Jarod Wilson wrote: Once in a while, with just the right timing, 802.3ad slaves will fail to properly initialize, winding up in a weird state, with a partner system mac address of 00:00:00:00:00:00. This started happening after a fix to properly track link_failure_count tracking, where an 802.3ad slave that reported itself as link up in the miimon code, but wasn't able to get a valid speed/duplex, started getting set to BOND_LINK_FAIL instead of BOND_LINK_DOWN. That was the proper thing to do for the general "my link went down" case, but has created a link initialization race that can put the interface in this odd state. Are there any notification consequences because of this change? No, there shouldn't be, it just makes initial link-up cleaner, everything during runtime once the link is initialized should remain the same. -- Jarod Wilson ja...@redhat.com
Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states
On 5/24/19 5:16 PM, Jay Vosburgh wrote: Jarod Wilson wrote: Once in a while, with just the right timing, 802.3ad slaves will fail to properly initialize, winding up in a weird state, with a partner system mac address of 00:00:00:00:00:00. This started happening after a fix to properly track link_failure_count tracking, where an 802.3ad slave that reported itself as link up in the miimon code, but wasn't able to get a valid speed/duplex, started getting set to BOND_LINK_FAIL instead of BOND_LINK_DOWN. That was the proper thing to do for the general "my link went down" case, but has created a link initialization race that can put the interface in this odd state. Reading back in the git history, the ultimate cause of this "weird state" appears to be devices that assert NETDEV_UP prior to actually being able to supply sane speed/duplex values, correct? Presuming that this is the case, I don't see that there's much else to be done here, and so: Acked-by: Jay Vosburgh Correct, we've got an miimon "device is up", but still can't get speed and/or duplex in this case. -- Jarod Wilson ja...@redhat.com
Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states
On Fri, May 24, 2019 at 2:17 PM Jay Vosburgh wrote: > > Jarod Wilson wrote: > > >Once in a while, with just the right timing, 802.3ad slaves will fail to > >properly initialize, winding up in a weird state, with a partner system > >mac address of 00:00:00:00:00:00. This started happening after a fix to > >properly track link_failure_count tracking, where an 802.3ad slave that > >reported itself as link up in the miimon code, but wasn't able to get a > >valid speed/duplex, started getting set to BOND_LINK_FAIL instead of > >BOND_LINK_DOWN. That was the proper thing to do for the general "my link > >went down" case, but has created a link initialization race that can put > >the interface in this odd state. > Are there any notification consequences because of this change? >Reading back in the git history, the ultimate cause of this > "weird state" appears to be devices that assert NETDEV_UP prior to > actually being able to supply sane speed/duplex values, correct? > > Presuming that this is the case, I don't see that there's much > else to be done here, and so: > > Acked-by: Jay Vosburgh > > >The simple fix is to instead set the slave link to BOND_LINK_DOWN again, > >if the link has never been up (last_link_up == 0), so the link state > >doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed > >in this case, it simply hasn't been up yet, and this prevents the > >unnecessary state change from DOWN to FAIL and getting stuck in an init > >failure w/o a partner mac. > > > >Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking") > >CC: Jay Vosburgh > >CC: Veaceslav Falico > >CC: Andy Gospodarek > >CC: "David S. Miller" > >CC: net...@vger.kernel.org > >Tested-by: Heesoon Kim > >Signed-off-by: Jarod Wilson > > > > >--- > > drivers/net/bonding/bond_main.c | 15 ++- > > 1 file changed, 10 insertions(+), 5 deletions(-) > > > >diff --git a/drivers/net/bonding/bond_main.c > >b/drivers/net/bonding/bond_main.c > >index 062fa7e3af4c..407f4095a37a 100644 > >--- a/drivers/net/bonding/bond_main.c > >+++ b/drivers/net/bonding/bond_main.c > >@@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long > >event, > > case NETDEV_CHANGE: > > /* For 802.3ad mode only: > >* Getting invalid Speed/Duplex values here will put slave > >- * in weird state. So mark it as link-fail for the time > >- * being and let link-monitoring (miimon) set it right when > >- * correct speeds/duplex are available. > >+ * in weird state. Mark it as link-fail if the link was > >+ * previously up or link-down if it hasn't yet come up, and > >+ * let link-monitoring (miimon) set it right when correct > >+ * speeds/duplex are available. > >*/ > > if (bond_update_speed_duplex(slave) && > >- BOND_MODE(bond) == BOND_MODE_8023AD) > >- slave->link = BOND_LINK_FAIL; > >+ BOND_MODE(bond) == BOND_MODE_8023AD) { > >+ if (slave->last_link_up) > >+ slave->link = BOND_LINK_FAIL; > >+ else > >+ slave->link = BOND_LINK_DOWN; > >+ } > > > > if (BOND_MODE(bond) == BOND_MODE_8023AD) > > bond_3ad_adapter_speed_duplex_changed(slave); > >-- > >2.20.1 > >
Re: [PATCH net] bonding/802.3ad: fix slave link initialization transition states
Jarod Wilson wrote: >Once in a while, with just the right timing, 802.3ad slaves will fail to >properly initialize, winding up in a weird state, with a partner system >mac address of 00:00:00:00:00:00. This started happening after a fix to >properly track link_failure_count tracking, where an 802.3ad slave that >reported itself as link up in the miimon code, but wasn't able to get a >valid speed/duplex, started getting set to BOND_LINK_FAIL instead of >BOND_LINK_DOWN. That was the proper thing to do for the general "my link >went down" case, but has created a link initialization race that can put >the interface in this odd state. Reading back in the git history, the ultimate cause of this "weird state" appears to be devices that assert NETDEV_UP prior to actually being able to supply sane speed/duplex values, correct? Presuming that this is the case, I don't see that there's much else to be done here, and so: Acked-by: Jay Vosburgh >The simple fix is to instead set the slave link to BOND_LINK_DOWN again, >if the link has never been up (last_link_up == 0), so the link state >doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed >in this case, it simply hasn't been up yet, and this prevents the >unnecessary state change from DOWN to FAIL and getting stuck in an init >failure w/o a partner mac. > >Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking") >CC: Jay Vosburgh >CC: Veaceslav Falico >CC: Andy Gospodarek >CC: "David S. Miller" >CC: net...@vger.kernel.org >Tested-by: Heesoon Kim >Signed-off-by: Jarod Wilson >--- > drivers/net/bonding/bond_main.c | 15 ++- > 1 file changed, 10 insertions(+), 5 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 062fa7e3af4c..407f4095a37a 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long event, > case NETDEV_CHANGE: > /* For 802.3ad mode only: >* Getting invalid Speed/Duplex values here will put slave >- * in weird state. So mark it as link-fail for the time >- * being and let link-monitoring (miimon) set it right when >- * correct speeds/duplex are available. >+ * in weird state. Mark it as link-fail if the link was >+ * previously up or link-down if it hasn't yet come up, and >+ * let link-monitoring (miimon) set it right when correct >+ * speeds/duplex are available. >*/ > if (bond_update_speed_duplex(slave) && >- BOND_MODE(bond) == BOND_MODE_8023AD) >- slave->link = BOND_LINK_FAIL; >+ BOND_MODE(bond) == BOND_MODE_8023AD) { >+ if (slave->last_link_up) >+ slave->link = BOND_LINK_FAIL; >+ else >+ slave->link = BOND_LINK_DOWN; >+ } > > if (BOND_MODE(bond) == BOND_MODE_8023AD) > bond_3ad_adapter_speed_duplex_changed(slave); >-- >2.20.1 >
[PATCH net] bonding/802.3ad: fix slave link initialization transition states
Once in a while, with just the right timing, 802.3ad slaves will fail to properly initialize, winding up in a weird state, with a partner system mac address of 00:00:00:00:00:00. This started happening after a fix to properly track link_failure_count tracking, where an 802.3ad slave that reported itself as link up in the miimon code, but wasn't able to get a valid speed/duplex, started getting set to BOND_LINK_FAIL instead of BOND_LINK_DOWN. That was the proper thing to do for the general "my link went down" case, but has created a link initialization race that can put the interface in this odd state. The simple fix is to instead set the slave link to BOND_LINK_DOWN again, if the link has never been up (last_link_up == 0), so the link state doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed in this case, it simply hasn't been up yet, and this prevents the unnecessary state change from DOWN to FAIL and getting stuck in an init failure w/o a partner mac. Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking") CC: Jay Vosburgh CC: Veaceslav Falico CC: Andy Gospodarek CC: "David S. Miller" CC: net...@vger.kernel.org Tested-by: Heesoon Kim Signed-off-by: Jarod Wilson --- drivers/net/bonding/bond_main.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 062fa7e3af4c..407f4095a37a 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long event, case NETDEV_CHANGE: /* For 802.3ad mode only: * Getting invalid Speed/Duplex values here will put slave -* in weird state. So mark it as link-fail for the time -* being and let link-monitoring (miimon) set it right when -* correct speeds/duplex are available. +* in weird state. Mark it as link-fail if the link was +* previously up or link-down if it hasn't yet come up, and +* let link-monitoring (miimon) set it right when correct +* speeds/duplex are available. */ if (bond_update_speed_duplex(slave) && - BOND_MODE(bond) == BOND_MODE_8023AD) - slave->link = BOND_LINK_FAIL; + BOND_MODE(bond) == BOND_MODE_8023AD) { + if (slave->last_link_up) + slave->link = BOND_LINK_FAIL; + else + slave->link = BOND_LINK_DOWN; + } if (BOND_MODE(bond) == BOND_MODE_8023AD) bond_3ad_adapter_speed_duplex_changed(slave); -- 2.20.1