https://lists.ubuntu.com/archives/kernel-team/2019-November/105446.html

** Description changed:

+ == Justification ==
+ From the well explained commit message:
+ 
+ Since de77ecd4ef02 ("bonding: improve link-status update in
+ mii-monitoring"), the bonding driver has utilized two separate variables
+ to indicate the next link state a particular slave should transition to.
+ Each is used to communicate to a different portion of the link state
+ change commit logic; one to the bond_miimon_commit function itself, and
+ another to the state transition logic.
+ 
+       Unfortunately, the two variables can become unsynchronized,
+ resulting in incorrect link state transitions within bonding.  This can
+ cause slaves to become stuck in an incorrect link state until a
+ subsequent carrier state transition.
+ 
+       The issue occurs when a special case in bond_slave_netdev_event
+ sets slave->link directly to BOND_LINK_FAIL.  On the next pass through
+ bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
+ case will set the proposed next state (link_new_state) to BOND_LINK_UP,
+ but the new_link to BOND_LINK_DOWN.  The setting of the final link state
+ from new_link comes after that from link_new_state, and so the slave
+ will end up incorrectly in _DOWN state.
+ 
+         Resolve this by combining the two variables into one.
+ 
+ == Fixes ==
+ * 1899bb32 (bonding: fix state transition issue in link monitoring)
+ 
+ This patch can be cherry-picked into E/F
+ 
+ For older releases like B/D, it will need to be backported as they are
+ missing the slave_err() printk macro added in 5237ff79 (bonding: add
+ slave_foo printk macros) as well as the commit to replace netdev_err()
+ with slave_err() in e2a7420d (bonding/main: convert to using slave
+ printk macros)
+ 
+ For Xenial, the commit that causes this issue, de77ecd4, does not exist.
+ 
+ == Test ==
+ Test kernels can be found here:
+ https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/
+ 
+ The X-hwe and Disco kernels were tested by the bug reporter, Aleksei;
+ the patched kernels work as expected.
+ 
+ == Regression Potential ==
+ Low.
+ This patch just unifiy the variable used in link state change commit
+ logic to prevent the occurance of an incorrect state. And the changes
+ are limited to the bonding driver itself.
+ 
+ (Although include/net/bonding.h is used by other drivers, the changes
+ to that file only affect the bond_main.c driver)
+ 
+ 
+ == Original Bug Report ==
  There's an issue with bonding driver in the current ubuntu kernels.
  Sometimes one link gets stuck in a weird state.
  It was fixed upstream with the patch
  https://www.spinics.net/lists/netdev/msg609506.html
  Commit 1899bb325149e481de31a4f32b59ea6f24e176ea.
  
  We see this bug with linux 4.15 (ubuntu xenial, hwe kernel), but it
  should be reproducible with other current kernel versions.

** Description changed:

  == Justification ==
  From the well explained commit message:
  
  Since de77ecd4ef02 ("bonding: improve link-status update in
  mii-monitoring"), the bonding driver has utilized two separate variables
  to indicate the next link state a particular slave should transition to.
  Each is used to communicate to a different portion of the link state
  change commit logic; one to the bond_miimon_commit function itself, and
  another to the state transition logic.
  
-       Unfortunately, the two variables can become unsynchronized,
+  Unfortunately, the two variables can become unsynchronized,
  resulting in incorrect link state transitions within bonding.  This can
  cause slaves to become stuck in an incorrect link state until a
  subsequent carrier state transition.
  
-       The issue occurs when a special case in bond_slave_netdev_event
+  The issue occurs when a special case in bond_slave_netdev_event
  sets slave->link directly to BOND_LINK_FAIL.  On the next pass through
  bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
  case will set the proposed next state (link_new_state) to BOND_LINK_UP,
  but the new_link to BOND_LINK_DOWN.  The setting of the final link state
  from new_link comes after that from link_new_state, and so the slave
  will end up incorrectly in _DOWN state.
  
-         Resolve this by combining the two variables into one.
+  Resolve this by combining the two variables into one.
  
  == Fixes ==
  * 1899bb32 (bonding: fix state transition issue in link monitoring)
  
  This patch can be cherry-picked into E/F
  
  For older releases like B/D, it will need to be backported as they are
  missing the slave_err() printk macro added in 5237ff79 (bonding: add
  slave_foo printk macros) as well as the commit to replace netdev_err()
  with slave_err() in e2a7420d (bonding/main: convert to using slave
  printk macros)
  
  For Xenial, the commit that causes this issue, de77ecd4, does not exist.
  
  == Test ==
  Test kernels can be found here:
  https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/
  
  The X-hwe and Disco kernels were tested by the bug reporter, Aleksei;
  the patched kernels work as expected.
  
  == Regression Potential ==
  Low.
- This patch just unifiy the variable used in link state change commit
- logic to prevent the occurance of an incorrect state. And the changes
+ This patch just unifies the variables used in the link state change commit
+ logic to prevent the occurrence of an incorrect state. The changes
  are limited to the bonding driver itself.
  
  (Although include/net/bonding.h is used by other drivers, the changes
  to that file only affect the bond_main.c driver)
- 
  
  == Original Bug Report ==
  There's an issue with bonding driver in the current ubuntu kernels.
  Sometimes one link gets stuck in a weird state.
  It was fixed upstream with the patch
  https://www.spinics.net/lists/netdev/msg609506.html
  Commit 1899bb325149e481de31a4f32b59ea6f24e176ea.
  
  We see this bug with linux 4.15 (ubuntu xenial, hwe kernel), but it
  should be reproducible with other current kernel versions.
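
The desynchronization described in the quoted commit message can be
illustrated with a toy C model. This is only a minimal sketch, not the
bonding driver's code: toy_slave, commit_two_vars and commit_one_var are
hypothetical names, and the two disagreeing proposals are assumed to have
been set already (as the commit message says bond_miimon_inspect does)
rather than derived from real carrier events.

```c
/* Toy link states. The names echo the kernel's BOND_LINK_* values,
 * but this enum and everything below is illustrative only. */
enum link_state { LINK_NOCHANGE = -1, LINK_UP, LINK_FAIL, LINK_DOWN };

struct toy_slave {
    enum link_state link;           /* current link state */
    enum link_state link_new_state; /* proposal for the state transition logic */
    enum link_state new_link;       /* proposal for the commit function (pre-fix only) */
};

/* Pre-fix model: two proposals are applied one after the other.
 * The value taken from new_link is applied last, so it wins. */
static enum link_state commit_two_vars(struct toy_slave *s)
{
    if (s->link_new_state != LINK_NOCHANGE)
        s->link = s->link_new_state;
    if (s->new_link != LINK_NOCHANGE)
        s->link = s->new_link;
    return s->link;
}

/* Post-fix model: a single proposal, so there is nothing to disagree with. */
static enum link_state commit_one_var(struct toy_slave *s)
{
    if (s->link_new_state != LINK_NOCHANGE)
        s->link = s->link_new_state;
    return s->link;
}
```

With a slave in LINK_FAIL whose link_new_state is LINK_UP and whose
new_link is LINK_DOWN, commit_two_vars() leaves it in LINK_DOWN even
though carrier is up, which mirrors the stuck state reported in this bug.
With the same starting state, commit_one_var() ends in LINK_UP.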

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1852077

Title:
  Backport: bonding: fix state transition issue in link monitoring

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
