[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-06-11 Thread Nivedita Singhvi
Jeff, please do provide your logs and whatever other information you can share from your error case; any piece of info will help here. I do not yet have a repro environment myself. I suspect that most of the changes which seem to help or fix the issue are simply changing the timing enough to

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-06-11 Thread Jeffrey Honig
We are also seeing this running trusty with the HWE kernel (i.e. 4.4) in which we have upgraded to the upstream i40e drivers. When we started using i40e 2.0.26 we found that we needed to add pre-up sleep 15 for bond0 and this seems to work all the time. However, when using i40e 2.3.6 or 2.4.6

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-06-11 Thread Nivedita Singhvi
I would have thought this would be the relevant patch: bonding: speed/duplex update at NETDEV_UP event Mahesh Bandewar authored and davem330 committed on Sep 28, 2017 1 parent b5c7d4e commit 4d2c0cda07448ea6980f00102dc3964eb25e241c However, it was first available in v4.15-rc1. At least as far

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-06-07 Thread Nivedita Singhvi
Hi Joseph, We're continuing the investigation into this issue, and I was wondering if you and Nobuto could share the last point you had reached and/or the next step you were going to take. From what I can summarize (please confirm/correct): * Artful (4.13.*) kernels (with any Artful

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-30 Thread Jay Vosburgh
We've seen a similar-sounding issue in the past, but couldn't get it tracked down to the root cause. Is it possible to enable some instrumentation in the /etc/network/interfaces and obtain some data on a failing occurrence? What we've used in the past is adding something like pre-up echo 'file

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-29 Thread Nobuto Murata
#146~lp1753662ThreeCommits is better at some level (around 40% failure rate to 20%). Failure rate: 187/470 (39.8%), 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 Failure rate: 87/222 (39.2%), 4.4.0-120-generic #144-Ubuntu SMP Thu Apr 5 14:11:49 UTC 2018 Failure rate: 138/712

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-27 Thread Joseph Salisbury
I built a test kernel with the two commits pointed out by Jay. The test kernel also required commit f307668bfc as a prereq. The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 -- You received this bug notification because you are a member of Ubuntu Bugs,

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-27 Thread Joseph Salisbury
Thanks for the pointer, Jay! I'll build a Xenial test kernel with these two commits and post a link to it.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-26 Thread Jay Vosburgh
I would suggest testing commit de77ecd4ef02ca783f7762e04e92b3d0964be66b Author: Mahesh Bandewar Date: Mon Mar 27 11:37:33 2017 -0700 bonding: improve link-status update in mii-monitoring and commit d94708a553022bf012fa95af10532a134eeb5a52 Author: WANG Cong

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-16 Thread Nobuto Murata
Just for the record, up-to-date numbers after the weekend. Failure rate: 167/422 (39.6%), 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 Failure rate: 87/222 (39.2%), 4.4.0-120-generic #144-Ubuntu SMP Thu Apr 5 14:11:49 UTC 2018 Failure rate: 117/726 (16.1%),

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-15 Thread Nobuto Murata
Ok, we have some numbers with the new host. Failure rate: 45/112 (40.2%), 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 Failure rate: 87/222 (39.2%), 4.4.0-120-generic #144-Ubuntu SMP Thu Apr 5 14:11:49 UTC 2018 Failure rate: 117/726 (16.1%), 4.4.0-040400-generic
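One rough way to judge whether two of these failure rates really differ is to put a confidence interval around each. The sketch below is an illustration only: the Wilson score interval and the helper names are assumptions of this note, not something the thread participants used. The counts are the ones quoted in the message above.

```python
import math

def failure_rate(failures, runs):
    """Fraction of reboots on which the bond failed to come up."""
    return failures / runs

def wilson_interval(failures, runs, z=1.96):
    """Approximate 95% Wilson score interval for a failure proportion."""
    p = failures / runs
    denom = 1.0 + z * z / runs
    center = (p + z * z / (2.0 * runs)) / denom
    half = z * math.sqrt(p * (1.0 - p) / runs + z * z / (4.0 * runs * runs)) / denom
    return center - half, center + half

# Counts as reported in the thread: (failures, reboots)
results = {
    "4.4.0-119-generic": (45, 112),
    "4.4.0-120-generic": (87, 222),
    "4.4.0-040400-generic": (117, 726),
}

for kernel, (failed, runs) in results.items():
    low, high = wilson_interval(failed, runs)
    print(f"{kernel}: {100 * failure_rate(failed, runs):.1f}% "
          f"(95% interval {100 * low:.1f}% to {100 * high:.1f}%)")
```

If two intervals do not overlap, the difference is unlikely to be noise from a small sample; overlapping intervals suggest more reboots are needed before calling one kernel better.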

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-15 Thread Nobuto Murata
Just for the record, I'm using the attached rc.local for testing. ** Attachment added: "rc.local" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5116629/+files/rc.local

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-13 Thread Nobuto Murata
I ran HWE 4.13 just to make sure the result is the same as the previous host. And as we confirmed before, the issue is not reproducible with HWE 4.13. [HWE 4.13] Failure rate: 0/407 (0.0%), 4.13.0-38-generic #43~16.04.1-Ubuntu SMP Wed Mar 14 17:48:43 UTC 2018 > We should first confirm that

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-13 Thread Nobuto Murata
I ran HWE 4.13 just to make sure the result is the same as the previous host. And as we confirmed before, the issue is not reproducible with HWE 4.13. [stock xenial] Failure rate: 117/726 (16.1%), 4.4.0-040400-generic #201803261439 SMP Mon Mar 26 14:43:35 UTC 2018 [HWE 4.13] Failure rate: 0/407

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-13 Thread Joseph Salisbury
Thanks for testing. I would have thought 4.4 with the Artful configs would not have had the bug if a config change is the fix. We know: 4.12-rc4 with Artful configs is good. 4.12-rc4 with Xenial configs is bad. Any kernel with Xenial configs is bad. It is possible a patch in combination to a

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-12 Thread Nobuto Murata
FWIW, I tried PCI hot-plugging as a way to iterate faster without rebooting. https://paste.ubuntu.com/p/qDVkMcTYPQ/ However, the issue wasn't reproducible with hot-plugging. Rebooting is the easiest reproduction so far.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-12 Thread Nobuto Murata
A 4.4 kernel using the Artful configs didn't make much difference. Failure rate: 117/726 (16.1%), 4.4.0-040400-generic #201803261439 SMP Mon Mar 26 14:43:35 UTC 2018 I will let stock 4.4 and 4.13 HWE run as well, to establish the occurrence rate on this host.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-11 Thread Nobuto Murata
FWIW, a kernel trace happens with the kernel in: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/comments/77 But I will let it run anyway since I'm not sure whether it affects the testing. [5.999557] rtc_cmos 00:00: setting system clock to 2018-04-11 15:52:23 UTC

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-10 Thread Nobuto Murata
Finally got a machine up and running. Will resume testing shortly.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-03 Thread Nobuto Murata
The new environment is not yet fully up for testing. ETA is the end of this week.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-02 Thread Joseph Salisbury
Just curious if you had a chance to test the kernel posted in #77? I compared the configs between these two kernels: v4.12-rc3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12-rc3/ v4.12-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12-rc4/ Nothing sticks out as a fix in v4.12-rc4.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-26 Thread Joseph Salisbury
Thanks for the update. I'll start comparing the configs between Xenial and Artful to see if the change that caused this sticks out.

Re: [Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-26 Thread Nobuto Murata
On Tue, Mar 27, 2018 at 0:11, Joseph Salisbury wrote: > I built a 4.4 kernel using the Artful configs, it can be downloaded from: > http://kernel.ubuntu.com/~jsalisbury/lp1753662 Thanks. I just lost access to the machine today. So I have to use another host in a different

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-26 Thread Joseph Salisbury
I built a 4.4 kernel using the Artful configs, it can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-24 Thread Nobuto Murata
4.12-rc4 kernel with Xenial configs looks bad. [xenial configs] v4.12-rc3 #201803161156 - relatively bad (36 of 249 - 14.5%) v4.12.0-041200rc3 #201803191316 - relatively bad (21 of 150 - 14.0%) ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 v4.12.0-041200rc3 #20180324 - relatively bad (60 of 499 -

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-22 Thread Joseph Salisbury
The HWE kernel was built with the Artful configs. I restarted the bisect using the Xenial configs and marking 4.12-rc4 as good and 4.12-rc3 as bad. We should re-test that to confirm we are going down the right path. I built a 4.12-rc4 kernel with Xenial configs, which can be downloaded from:

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-21 Thread Nobuto Murata
BTW, have we set the baseline of "good" in this bisection with xenial configs? > 4.13.0-36(xenial HW) - good (0 of 119 - 0%) Does HWE kernel man with xenial configs? Or was it built with the source release config i.e. artful?

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-21 Thread Nobuto Murata
Correction: Does HWE kernel mean it's with xenial configs? Or was it built with the source release config i.e. artful?

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-21 Thread Nobuto Murata
55cbdaf6399de16b61d40d49b6c8bb739a877dea looks bad. [xenial configs] v4.12-rc3 #201803161156 - relatively bad (36 of 249 - 14.5%) v4.12.0-041200rc3 #201803191316 - relatively bad (21 of 150 - 14.0%) ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 v4.12.0-041200rc3 #20180324 - relatively bad (60 of

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-21 Thread Nobuto Murata
I was pretty occupied today, so I'm going to test 55cbdaf6399de16b61d40d49b6c8bb739a877dea now and report back tomorrow morning my time. [xenial configs] v4.12-rc3 #201803161156 - relatively bad (36 of 249 - 14.5%) v4.12.0-041200rc3 #201803191316 - relatively bad (21 of 150 - 14.0%)

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-20 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: 55cbdaf6399de16b61d40d49b6c8bb739a877dea The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-20 Thread Nobuto Murata
ea094f3c830a67f252677aacba5d04ebcf55c4d9 looks bad. [xenial configs] v4.12-rc3 #201803161156 - relatively bad (36 of 249 - 14.5%) v4.12.0-041200rc3 #201803191316 relatively bad (21 of 150 - 14.0%) ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 v4.12.0-041200rc3 #20180324 relatively bad (6 of 56 -

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-19 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: ea094f3c830a67f252677aacba5d04ebcf55c4d9 The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-19 Thread Nobuto Murata
ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 looks bad. [xenial configs] v4.12-rc3 #201803161156 - relatively bad (36 of 249 - 14.5%) v4.12.0-041200rc3 #201803191316 relatively bad (21 of 150 - 14.0%) ff5a20169b98d84ad8d7f99f27c5ebbb008204d6

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-19 Thread Nobuto Murata
ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 looks bad. [xenial configs] v4.12-rc3 #201803161156 - relatively bad (36 of 249 - 14.5%) v4.12.0-041200rc3 #201803191316 relatively bad (21 of 150 - 14.0%)

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-19 Thread Joseph Salisbury
I restarted the bisect. This time using Xenial configs and not Artful configs. I built the first test kernel, up to the following commit: ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 You've tested this SHA1 in

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-17 Thread Joseph Salisbury
It's good that the v4.12-rc3 with xenial configs was bad. It means we should use Xenial configs when performing the bisect and not Artful configs. I'll kick off another bisect and post the first test kernel.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-17 Thread Nobuto Murata
Ok, we see some differences with the three kernels. How do we want to proceed from here? v4.12-rc3 - bad (24 of 90 - 26.6%) http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12-rc3/ v4.12-rc3 #201803151851 - relatively good (2 of 151 - 1.3%)

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-16 Thread Joseph Salisbury
Sorry, the correct link for the new 4.12-rc3 kernel with Xenial configs is: http://kernel.ubuntu.com/~jsalisbury/lp1753662/v4.12-rc3-xenial-configs/

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-16 Thread Joseph Salisbury
I built another v4.12-rc3 test kernel, this time with Xenial configs instead of Artful configs. This test kernel can be downloaded from: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc3-xenial-configs Can you see if this kernel exhibits the bug?

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-16 Thread Nobuto Murata
** Attachment added: "bond_check_xenial_4.12.0-041200rc3-generic_201803151851.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5081137/+files/bond_check_xenial_4.12.0-041200rc3-generic_201803151851.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-16 Thread Nobuto Murata
The new build of v4.12-rc3 is a good build (2 of 151). 4.12.0-041200rc3-generic #201803151851 v4.12-rc3 - bad (24 of 90 - 26.6%) http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc3/ v4.12-rc3 - relatively good (2 of 151 - 1.3%) http://kernel.ubuntu.com/~jsalisbury/lp1753662/v4.12-rc3/ So

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-15 Thread Nobuto Murata
v4.12-rc4 is good, 1 of 146. Going to test v4.12-rc3.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-15 Thread Joseph Salisbury
A re-build of v4.12-rc3 is now available here: http://kernel.ubuntu.com/~jsalisbury/lp1753662/v4.12-rc3 Can you confirm that this kernel is bad and contains the bug?

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-15 Thread Joseph Salisbury
Yes, sorry, this is a "Reverse" bisect, so v4.12-rc4 should be good and not bad. I'm also going to build a v4.12-rc3 kernel with my configs to confirm it's bad. I'll post that shortly.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-15 Thread Nobuto Murata
> We may have went wrong somewhere in the bisect. However, just to be sure, I > built a v4.12-rc4 test kernel. This kernel should be bad and contain the bug. > If it does not, it may be due to the configs I'm using to build the test > kernels. I'm not following since I thought we tested that

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-15 Thread Joseph Salisbury
The reverse bisect reported the following commit as the fix, but I'm doubtful since it's an i915 commit: commit 4681ee21d62cfed4364e09ec50ee8e88185dd628 Author: Joonas Lahtinen Date: Thu May 18 11:49:39 2017 +0300 drm/i915: Do not sync RCU during shrinking

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-15 Thread Nobuto Murata
** Attachment added: "bond_check_xenial_4.12.0-041200rc1_201803141835.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5080093/+files/bond_check_xenial_4.12.0-041200rc1_201803141835.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-15 Thread Nobuto Murata
4681ee21d62cfed4364e09ec50ee8e88185dd628 looks good. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12.0-041200rc1 #201803141835 - relatively good (1 of 113 - 0.9%) 4681ee21d62cfed4364e09ec50ee8e88185dd628

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-14 Thread Nobuto Murata
171d8b9363725e122b164e6b9ef2acf2f751e387 looks good. The next test is with 4681ee21d62cfed4364e09ec50ee8e88185dd628. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12.0-041200rc1 #201803141333 - relatively good (1 of 217 -

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-14 Thread Chris Gregan
** Tags added: cdo-qa-blocker

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-14 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: 4681ee21d62cfed4364e09ec50ee8e88185dd628 The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-14 Thread Nobuto Murata
The test is still in progress, but so far 171d8b9363725e122b164e6b9ef2acf2f751e387 looks good (0 of 21). Since I already downloaded the kernel locally, please go ahead and build the next one. Thanks,

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-14 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: 171d8b9363725e122b164e6b9ef2acf2f751e387 The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-14 Thread Nobuto Murata
d38162e4b5c643733792f32be4ea107c831827b4 looks good. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12.0-041200rc1 #201803131457 - relatively good (1 of 93 - 1.1%) d38162e4b5c643733792f32be4ea107c831827b4 v4.12.0-041200rc3

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-13 Thread Nobuto Murata
The test is still in progress, but so far d38162e4b5c643733792f32be4ea107c831827b4 looks good (1 of 37). Since I already downloaded the kernel locally, please go ahead and build the next one. Thanks,

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-13 Thread Joseph Salisbury
Yes that is the correct kernel. The mainline-build-one script uses the 'git describe' command to come up with the name. That command returns the closest git tag and not the one that contains it. So in the case of commit d38162e4b5c643733792f32be4ea107c831827b4: git describe

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-13 Thread Nobuto Murata
Oh wait, > I built the next test kernel, up to the following commit: > d38162e4b5c643733792f32be4ea107c831827b4 > > The test kernel can be downloaded from: > http://kernel.ubuntu.com/~jsalisbury/lp1753662 d38162e4b5c643733792f32be4ea107c831827b4 looks in-between v4.12-rc3 and rc4 which is

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-13 Thread Nobuto Murata
@Joseph, Will do. Just as a possibility, I could build a kernel on the host if that's helpful, because the host is already reserved for this testing and has hundreds of GBs of memory and many CPU cores. If you have a pointer on how to replicate your build process, that would be great.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-13 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: d38162e4b5c643733792f32be4ea107c831827b4 The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-12 Thread Nobuto Murata
25f480e89a022d382ddc5badc23b49426e89eabc looks good. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12.0-041200rc3 #201803121355 - relatively good (1 of 252 - 0.4%) 25f480e89a022d382ddc5badc23b49426e89eabc

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-12 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: 25f480e89a022d382ddc5badc23b49426e89eabc The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-09 Thread Nobuto Murata
400129f0a3ae989c30b37104bbc23b35c9d7a9a4 looks good. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12.0-041200rc3 #201803090724 - relatively good (2 of 77 - 2.6%) 400129f0a3ae989c30b37104bbc23b35c9d7a9a4

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: 400129f0a3ae989c30b37104bbc23b35c9d7a9a4 The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Nobuto Murata
0bb230399fd337cc9a838d47a0c9ec3433aa612e seems good. I'm ready for the next test. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12.0-041200rc3 #201803081620 - relatively good (1 of 36 - 2.8%)

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Nobuto Murata
** Attachment added: "bond_check_xenial_hwe_4.13_full.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5073884/+files/bond_check_xenial_hwe_4.13_full.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Nobuto Murata
I ran xenial HWE overnight; the result was 0/119. The next test is with: v4.12.0-041200rc3 #201803081620 - ? 0bb230399fd337cc9a838d47a0c9ec3433aa612e 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%)

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Joseph Salisbury
I built the next test kernel, up to the following commit: 0bb230399fd337cc9a838d47a0c9ec3433aa612e The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Nobuto Murata
4.12.0-041200rc3-generic #201803080803 looks good. Please proceed to the next one. I will test it tomorrow my time, which is about 12 hours from now. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12.0-041200rc3 - good (0

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Nobuto Murata
So far 0 of 6 with 4.12.0-041200rc3-generic #201803080803. But I will keep it running for a while to see if it becomes close to 30% or 0%.
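To see why 0 of 6 is not yet conclusive, one can estimate how likely a clean run would be if the kernel still had the bug. The sketch below is illustrative only: it assumes reboots are independent and that a "bad" kernel fails at the roughly 25% to 30% baseline reported elsewhere in this thread. The function names are made up for this note.

```python
import math

def prob_no_failures(rate, reboots):
    """Chance of seeing zero failures in `reboots` independent reboots
    when each reboot fails with probability `rate`."""
    return (1.0 - rate) ** reboots

def reboots_for_confidence(rate, alpha=0.05):
    """Smallest number of clean reboots after which a kernel that really
    fails at `rate` would be clean throughout with probability <= alpha."""
    return math.ceil(math.log(alpha) / math.log(1.0 - rate))

for rate in (0.25, 0.30):
    print(f"assumed failure rate {rate:.0%}: "
          f"P(0 failures in 6 reboots) = {prob_no_failures(rate, 6):.3f}, "
          f"clean reboots needed for 95% confidence = {reboots_for_confidence(rate)}")
```

Under these assumptions a handful of clean reboots can easily happen by chance on a bad kernel, which is consistent with the decision to keep the test running.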

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Joseph Salisbury
I just noticed I forgot to paste the SHA1 for the test kernel posted in comment #29: ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 Do you have results from that kernel? Once you do, I'll update the bisect and build the next kernel.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Nobuto Murata
Ok, 25% - 30% seems to be the baseline. I'd like to make sure v4.13 is really 0% in a longer-running test, but will do the bisection of v4.12-rc3 and v4.12-rc4 first. 4.4.0-116(xenial) - bad (9 of 31 - 29.0%) v4.12-rc2 - bad (15 of 53 - 28.3%) v4.12-rc3 - bad (24 of 90 - 26.6%) v4.12-rc4 - relatively

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-08 Thread Joseph Salisbury
I started a kernel bisect between v4.12-rc3 and v4.12-rc4. The kernel bisect will require testing of about 7-10 test kernels. I built the first test kernel, up to the following commit: The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test that

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-07 Thread Nobuto Murata
With rc2 result. It looks like there is a noticeable difference between v4.12-rc3 and v4.12-rc4. @Joseph, can you please start looking into diffs? I'm keeping one dedicated node just for this testing, so I can run the same script one by one for more bisections. v4.12-rc1 - bad (3 of 3) v4.12-rc2

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-07 Thread Nobuto Murata
With rc3, will test rc2 next. v4.12-rc1 - bad (3 out of 3) v4.12-rc3 - mixture result (24 out of 90) v4.12-rc4 - relatively good (1 out of 70) v4.12 - relatively good (5 out of 68) v4.13 - good (0 out of 41) ** Attachment added: "bond_check_xenial_mainline_4.12-rc3_full.log"

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-07 Thread Nobuto Murata
I have let rc4 run for hours. v4.12-rc1 - bad (3 out of 3) v4.12-rc4 - relatively good (1 out of 70) v4.12 - relatively good (5 out of 68) v4.13 - good (0 out of 41) I will let rc3 run during my night. ** Attachment added: "bond_check_xenial_mainline_4.12-rc4_full.log"

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-07 Thread Joseph Salisbury
Do you happen to have results from any of the other release candidates, such as 4.12-rc4?

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-07 Thread Nobuto Murata
For the record, v4.12-rc1 - bad (3 out of 3) v4.12 - relatively good (5 out of 68) v4.13 - good (0 out of 41) ** Attachment added: "bond_check_xenial_mainline_4.13_full.log"

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-07 Thread Nobuto Murata
** Attachment added: "bond_check_xenial_mainline_4.12_full.log.gz" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5071492/+files/bond_check_xenial_mainline_4.12_full.log.gz

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
I ran an overnight test with v4.12 just to make sure it really fixed the issue. The failure still happened sometimes, but far less frequently. We may need to test "good" cases for longer, since there may be more than one relevant patch. Anyway, the current status is: v4.12-rc1 with i40e 2.1.14 - bad (3 out of 3) v4.12

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
Correction. I thought v4.12-rc1 had i40e 2.1.7 because of: https://github.com/torvalds/linux/commit/15990832cd3e7e8904f8dacdabfa33adb9a836d6 But the log output shows it actually has 2.1.14. So the correct status is: v4.12-rc1 with i40e 2.1.14 - bad v4.12 with i40e 2.1.14 - good

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
v4.12-rc1 with i40e 2.1.7 - bad v4.12 with i40e 2.1.14 - good I'm running out of time. So more bisections are for tomorrow. ** Attachment added: "bond_check_xenial_mainline_4.12-rc1.log"

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Joseph Salisbury
If v4.12-rc1 is still bad, we would need to test some of the other release candidates, such as rc2, rc3, rc4, etc. Once we have the last bad and first good, I'll start the reverse bisect and build a kernel.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
v4.11 with i40e 1.6.27 - bad v4.12 with i40e 2.1.14 - good next: v4.12-rc1 with i40e 2.1.7 - ? ** Attachment added: "bond_check_xenial_mainline_4.12.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5070835/+files/bond_check_xenial_mainline_4.12.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
Not reproducible with 4.13-rc1 after 5 reboots. Status: 4.10 - bad; 4.11 - bad; 4.13-rc1 - good. The next is 4.12. ** Attachment added: "bond_check_xenial_mainline_4.13-rc1.log"

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
Reproducible with: v4.11 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/ The next is v4.13-rc1. ** Attachment added: "bond_check_xenial_mainline_4.11.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5070785/+files/bond_check_xenial_mainline_4.11.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
Reproducible with: 4.10 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/ The next test will be with: v4.11 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/ ** Attachment added: "bond_check_xenial_mainline_4.10.log"

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Joseph Salisbury
To narrow it down further, can you also test the following kernels: v4.11 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/ v4.13-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
Thanks, I was running some tests with existing HWE kernels in xenial repo like linux-image-4.8.0-58-generic, linux-image-4.10.0-42-generic and linux-image-4.13.0-36-generic. It looks like 4.10 is the last bad one and 4.13 is the first good one. Let me double-check with those two: 4.10 Final:

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Joseph Salisbury
To perform a "Reverse" bisect, we need to identify the last kernel version that had the bug and the first kernel version that does not. Can you test the following upstream kernels: 4.4 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-wily/ 4.6 Final:
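The "reverse" bisect described here can be sketched in Python as below. This is only an illustration, not the tooling used in this bug: the function name `reverse_bisect` and the `is_fixed` callback are hypothetical, and the sketch assumes the candidate kernels are listed oldest to newest and that every kernel after the first good one is also good. The sample outcomes are the ones reported elsewhere in this thread (4.10, 4.11 and v4.12-rc1 bad; v4.12 and 4.13-rc1 good).

```python
from typing import Callable, Optional, Sequence, Tuple


def reverse_bisect(
    versions: Sequence[str],
    is_fixed: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last_bad, first_good) for versions ordered oldest to newest.

    Assumes the bug is present in every version before some point and
    absent from that point on (the reverse of a regression bisect).
    """
    lo, hi = 0, len(versions)
    # Invariant: versions[:lo] are known bad, versions[hi:] are known good.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid          # fix is at mid or earlier
        else:
            lo = mid + 1      # fix is after mid
    last_bad = versions[lo - 1] if lo > 0 else None
    first_good = versions[lo] if lo < len(versions) else None
    return last_bad, first_good


# Outcomes as reported in this thread; True means "not reproducible".
reported = {
    "v4.10": False,
    "v4.11": False,
    "v4.12-rc1": False,
    "v4.12": True,
    "v4.13-rc1": True,
}
candidates = list(reported)  # dicts keep insertion order: oldest to newest

last_bad, first_good = reverse_bisect(candidates, lambda v: reported[v])
print(last_bad, first_good)
```

In practice `is_fixed` would mean booting the candidate kernel several times and checking the bond state, so each probe is expensive and an intermittent failure can mislabel a kernel as good.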

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
Test record for the xenial default kernel. ** Attachment added: "bond_check_xenial.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5070691/+files/bond_check_xenial.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
For the record of testing. ** Attachment added: "bond_check_xenial_c15e07b02bf0.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5070633/+files/bond_check_xenial_c15e07b02bf0.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Nobuto Murata
The kernel with c15e07b02bf0 didn't make a difference on the race condition. The issue is still reproducible. Let me know when you need my testing again with different kernels. So far, I'm using rc.local below to reboot the same node multiple times. #!/bin/sh exec >> /root/bond_check.log

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Joseph Salisbury
If that commit doesn't fix the issue, we can perform a "Reverse" bisect between 4.4 and 4.13 to find the fix.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Joseph Salisbury
There is one LACP commit that sticks out between v4.4 and v4.13: c15e07b02bf0 ("team: loadbalance: push lacpdus to exact delivery") I built a Xenial test kernel with this commit. The test kernel can be downloaded from: http://kernel.ubuntu.com/~jsalisbury/lp1753662 Can you test this kernel and

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-06 Thread Joseph Salisbury
** Changed in: linux (Ubuntu) Importance: Undecided => High ** Also affects: linux (Ubuntu Xenial) Importance: Undecided Status: New ** Changed in: linux (Ubuntu Xenial) Status: New => Triaged ** Changed in: linux (Ubuntu Xenial) Importance: Undecided => High ** Changed

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-05 Thread Nobuto Murata
This might be related (not exactly the same): https://sourceforge.net/p/e1000/bugs/524/ One reporter there says 1.6.42 fixed his issue. It looks like Intel has around 10 releases between 1.4.25 and 2.1.14, so bisecting the driver may not be practical.

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-05 Thread Nobuto Murata
** Attachment added: "interfaces.50-cloud-init.cfg" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5070467/+files/interfaces.50-cloud-init.cfg

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-05 Thread Nobuto Murata
** Attachment added: "bond2_status.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5070466/+files/bond2_status.txt

[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-03-05 Thread Nobuto Murata
** Attachment added: "bond1_status.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+attachment/5070465/+files/bond1_status.txt
