https://lists.ubuntu.com/archives/kernel-team/2019-November/105446.html
** Description changed:
+ == Justification ==
+ From the well-explained commit message:
+
+ Since de77ecd4ef02 ("bonding: improve link-status update in
+ mii-monitoring"), the bonding driver has utilized two separate variables
+ to indicate the next link state a particular slave should transition to.
+ Each is used to communicate to a different portion of the link state
+ change commit logic; one to the bond_miimon_commit function itself, and
+ another to the state transition logic.
+
+ Unfortunately, the two variables can become unsynchronized,
+ resulting in incorrect link state transitions within bonding. This can
+ cause slaves to become stuck in an incorrect link state until a
+ subsequent carrier state transition.
+
+ The issue occurs when a special case in bond_slave_netdev_event
+ sets slave->link directly to BOND_LINK_FAIL. On the next pass through
+ bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
+ case will set the proposed next state (link_new_state) to BOND_LINK_UP,
+ but the new_link to BOND_LINK_DOWN. The setting of the final link state
+ from new_link comes after that from link_new_state, and so the slave
+ will end up incorrectly in _DOWN state.
+
+ Resolve this by combining the two variables into one.
+
+ == Fixes ==
+ * 1899bb32 (bonding: fix state transition issue in link monitoring)
+
+ This patch can be cherry-picked into E/F.
+
+ For older releases like B/D, it needs to be backported, as they are
+ missing the slave_err() printk macro added in 5237ff79 (bonding: add
+ slave_foo printk macros), as well as the commit that replaces netdev_err()
+ with slave_err() in e2a7420d (bonding/main: convert to using slave
+ printk macros).
+
+ For Xenial, the commit that causes this issue, de77ecd4, does not exist.
+
+ == Test ==
+ Test kernels can be found here:
+ https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/
+
+ The X-hwe and Disco kernels were tested by the bug reporter, Aleksei;
+ the patched kernels work as expected.
+
+ == Regression Potential ==
+ Low.
+ This patch only combines the two variables used in the link state change
+ commit logic into one, to prevent an incorrect state from occurring. The
+ changes are limited to the bonding driver itself.
+
+ (Although include/net/bonding.h is used by other drivers, the changes to
+ that file only affect the bond_main.c driver.)
+
+
+ == Original Bug Report ==
There's an issue with the bonding driver in the current Ubuntu kernels.
Sometimes one link gets stuck in an incorrect state.
It was fixed upstream with the patch https://www.spinics.net/lists/netdev/msg609506.html
(commit 1899bb325149e481de31a4f32b59ea6f24e176ea).
We see this bug with Linux 4.15 (Ubuntu Xenial, HWE kernel), but it
should be reproducible with other current kernel versions.
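As a rough illustration of the failure described in the Justification, the following is a hypothetical, heavily simplified C model; it is not the bonding driver's code. Every name in it (enum link_state, struct slave_two_vars, inspect_two_vars() and so on) is invented for this sketch. Only the two field names link_new_state and new_link, and the order in which they are applied, come from the commit message. The pre-fix inspect step hard-codes the values the commit message reports for the pass after carrier up; it does not model how new_link comes to hold BOND_LINK_DOWN.

```c
#include <assert.h>

/* Hypothetical stand-ins for the bonding link states.
 * LINK_NOCHANGE means "no transition proposed". */
enum link_state { LINK_NOCHANGE = -1, LINK_UP, LINK_FAIL, LINK_DOWN, LINK_BACK };

/* Pre-fix model: two separate "next state" variables. */
struct slave_two_vars {
	enum link_state link;           /* current state */
	enum link_state link_new_state; /* read by the state transition logic */
	enum link_state new_link;       /* read by the miimon commit step */
};

/* Inspect pass for a slave in FAIL whose carrier is up again. The two
 * values written here are the ones the commit message reports. */
static void inspect_two_vars(struct slave_two_vars *s, int carrier_up)
{
	if (s->link == LINK_FAIL && carrier_up) {
		s->link_new_state = LINK_UP;
		s->new_link = LINK_DOWN;
	}
}

/* Commit: link_new_state is applied first, then new_link,
 * so the last writer (new_link) wins. */
static void commit_two_vars(struct slave_two_vars *s)
{
	s->link = s->link_new_state;
	if (s->new_link != LINK_NOCHANGE)
		s->link = s->new_link;
	assert(s->link != LINK_NOCHANGE);
}

/* Post-fix model: a single proposed next state. */
struct slave_one_var {
	enum link_state link;
	enum link_state link_new_state;
};

static void inspect_one_var(struct slave_one_var *s, int carrier_up)
{
	s->link_new_state = LINK_NOCHANGE;   /* reset once per pass */
	if (s->link == LINK_FAIL && carrier_up)
		s->link_new_state = LINK_UP; /* recovered before downdelay expired */
}

static void commit_one_var(struct slave_one_var *s)
{
	if (s->link_new_state != LINK_NOCHANGE)
		s->link = s->link_new_state;
	assert(s->link != LINK_NOCHANGE);
}

/* Both scenarios start from the special case: link forced to FAIL,
 * then the carrier comes back up. */
enum link_state final_state_two_vars(void)
{
	struct slave_two_vars s = { LINK_FAIL, LINK_FAIL, LINK_NOCHANGE };
	inspect_two_vars(&s, 1);
	commit_two_vars(&s);
	return s.link; /* LINK_DOWN: the slave ends up in the wrong state */
}

enum link_state final_state_one_var(void)
{
	struct slave_one_var s = { LINK_FAIL, LINK_NOCHANGE };
	inspect_one_var(&s, 1);
	commit_one_var(&s);
	return s.link; /* LINK_UP */
}
```

With two variables, the outcome depends on which one the commit path writes last. With a single proposed state there is nothing to fall out of sync, which is the idea behind combining the two variables into one.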
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1852077
Title:
Backport: bonding: fix state transition issue in link monitoring
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077/+subscriptions
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs