[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free
As several different forums are discussing this issue, I'm using this LP bug to continue the investigation into the current manifestation of this bug (after the 4.15 kernel). I suspect it's in one of the other places not fixed, as my colleague Dan stated a while ago.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1711407

Title: unregister_netdevice: waiting for lo to become free

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/+subscriptions
-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free
We are definitely seeing a problem on kernels after 4.15.0-159-generic, which is the last known good kernel. 5.3* kernels are affected, but I do not have data on the most recent upstream.
[Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count
** Tags added: sts
[Bug 1711407] Re: unregister_netdevice: waiting for lo to become free
Is anyone still seeing a similar issue on current mainline?

** Tags added: sts
[Bug 1939106] Re: The aufs storage-driver is no longer supported in docker.io_20.10.7-0ubuntu1~20.04.1
** Tags added: seg
[Bug 1922997] Re: P8 node entei unable to boot with 4.15.0-141.145~16.04.1
** Tags added: sts
[Bug 1900438] Re: Bcache bypasse writeback on caching device with fragmentation
** Description changed:

SRU Justification:

[Impact]

This bug in bcache affects I/O performance on all versions of the kernel [correct versions affected]. It is particularly negative on ceph if used with bcache.

Write I/O latency would suddenly go to around 1 second from around 10 ms when hitting this issue, and would easily be stuck there for hours or even days, which is especially bad for a ceph-on-bcache architecture. This would make ceph extremely slow and make the entire cloud almost unusable.

The root cause is that the dirty buckets had reached the 70 percent threshold, causing all writes to go directly to the backing HDD device. That might be fine if there actually were a lot of dirty data, but this happens when dirty data has not even reached 10 percent, due to high memory fragmentation. What makes it worse is that the writeback rate might still be at the minimum value (8) because the writeback percent has not been reached, so it takes ages for bcache to reclaim enough dirty buckets to get itself out of this situation.

[Fix]

* 71dda2a5625f31bc3410cb69c3d31376a2b66f28 "bcache: consider the fragmentation when update the writeback rate"

The current way to calculate the writeback rate only considers the dirty sectors. This usually works fine when memory fragmentation is not high, but it gives an unreasonably low writeback rate in the situation where a few dirty sectors have consumed a lot of dirty buckets. In some cases, the dirty buckets reached CUTOFF_WRITEBACK_SYNC (i.e., writeback stopped) while the dirty data (sectors) had not even reached the writeback_percent threshold (i.e., writeback started).

In that situation, the writeback rate will still be the minimum value (8*512 = 4KB/s), so all writes get stuck in a non-writeback mode because of the slow writeback.

The fix accelerates the rate in 3 stages with different aggressiveness:
- the first stage starts when the dirty buckets percent rises above BCH_WRITEBACK_FRAGMENT_THRESHOLD_LOW (50),
- the second at BCH_WRITEBACK_FRAGMENT_THRESHOLD_MID (57),
- the third at BCH_WRITEBACK_FRAGMENT_THRESHOLD_HIGH (64).

By default, the first stage tries to write back the amount of dirty data in one bucket (on average) in (1 / (dirty_buckets_percent - 50)) seconds, the second stage in (1 / (dirty_buckets_percent - 57)) * 100 milliseconds, and the third stage in (1 / (dirty_buckets_percent - 64)) milliseconds.

The initial rate at each stage can be controlled by 3 configurable parameters:

writeback_rate_fp_term_{low|mid|high}

They are by default 1, 10, 1000, chosen based on testing and production data, detailed below.

A. When it comes to the low stage, it is still far from the 70% threshold, so we only want to give it a little push by setting the term to 1. This means the initial rate will be 170 if the fragment is 6 (calculated as bucket_size/fragment); this rate is very small, but still much more reasonable than the minimum of 8. For a production bcache with a non-heavy workload, if the cache device is bigger than 1 TB, it may take hours to consume 1% of the buckets, so it is very possible to reclaim enough dirty buckets in this stage and avoid entering the next stage.

B. If the dirty buckets ratio didn't turn around during the first stage, it comes to the mid stage, where it is necessary to be more aggressive than the low stage, so the initial rate is chosen to be 10 times that of the low stage, which means 1700 as the initial rate if the fragment is 6. This is a normal rate we usually see for a normal workload when writeback happens because of writeback_percent.
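The staged boost described in the justification can be sketched numerically. The following is a rough Python model of the arithmetic in the commit message, not the kernel's actual code: the function and variable names are mine, and the example values assume a 1024-sector bucket (which makes the "170 if the fragment is 6" example come out, since 1024/6 ≈ 170).

```python
# Rough model of the three-stage fragmentation boost described above.
# Names and stage-boundary choices are illustrative, not kernel code.

THRESH_LOW, THRESH_MID, THRESH_HIGH = 50, 57, 64        # dirty-bucket %
FP_TERM_LOW, FP_TERM_MID, FP_TERM_HIGH = 1, 10, 1000    # default fp terms

def boosted_rate(min_rate, bucket_size, fragment, dirty_bucket_pct):
    """Return the writeback rate (sectors/s) after the fragmentation boost.

    `fragment` models how many dirty buckets each unit of dirty data pins;
    a high fragment means few dirty sectors have consumed many buckets.
    """
    if dirty_bucket_pct <= THRESH_LOW:
        return min_rate                        # no boost below 50%
    if dirty_bucket_pct <= THRESH_MID:
        term = FP_TERM_LOW * (dirty_bucket_pct - THRESH_LOW)
    elif dirty_bucket_pct <= THRESH_HIGH:
        term = FP_TERM_MID * (dirty_bucket_pct - THRESH_MID)
    else:
        term = FP_TERM_HIGH * (dirty_bucket_pct - THRESH_HIGH)
    # one bucket's worth of dirty data is roughly bucket_size / fragment;
    # the stage term scales how fast that amount should be written back
    return max(min_rate, (bucket_size // fragment) * term)
```

With bucket_size=1024 and fragment=6, entering the low stage at 51% gives 170 (vs the old minimum of 8), entering the mid stage at 58% gives 1700, and the high stage at 65% gives 170000, matching the magnitudes quoted in the description.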
[Bug 1914807] Re: rack can't contact region, deployments fails
We are resorting to rebooting maas every day just so we can avoid hitting this and having deployments fail.
[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, >z_sa_hdl)) failed
** Tags added: seg
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Tags added: sts
[Bug 1521173] Re: AER: Corrected error received: id=00e0
Seen this as well, although I don't believe it's causing any problems that we know of; right now it looks like it's only noise in the logs.
[Bug 1889556] Re: grub-install failure does not fail package upgrade (and does not roll back to matching modules)
** Tags added: sts
[Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Some of the 4.15 kernels fixed:
Bionic linux kernel: 4.15.0-109.110
Bionic linux-aws kernel: 4.15.0-1077.81
Xenial linux-hwe kernel: 4.15.0-107.108~16.04.1
Xenial linux-gcp kernel: 4.15.0-1078.88~16.04.1
[Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Packages tested:
linux-gcp (4.15.0-1078.88~16.04.1) xenial
linux-hwe (4.15.0-107.108~16.04.1) xenial
linux-gcp-4.15 (4.15.0-1078.88) bionic
linux (4.15.0-107.108) bionic
[Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Tested.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
[Bug 1882039] Re: The thread level parallelism would be a bottleneck when searching for the shared pmd by using hugetlbfs
** Changed in: linux (Ubuntu Bionic)
   Importance: Medium => High
** Changed in: linux (Ubuntu Bionic)
   Status: Triaged => In Progress
** Changed in: linux (Ubuntu Eoan)
   Status: Triaged => In Progress
** Changed in: linux (Ubuntu Bionic)
   Assignee: (unassigned) => Gavin Guo (mimi0213kimo)
** Changed in: linux (Ubuntu Focal)
   Status: Triaged => In Progress
** Changed in: linux (Ubuntu Focal)
   Importance: Medium => High
** Changed in: linux (Ubuntu Eoan)
   Importance: Medium => High
** Changed in: linux (Ubuntu)
   Importance: Medium => High
** Changed in: linux (Ubuntu Eoan)
   Assignee: (unassigned) => Gavin Guo (mimi0213kimo)
** Changed in: linux (Ubuntu Focal)
   Assignee: (unassigned) => Gavin Guo (mimi0213kimo)
[Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
Note that the fixes for all the series above have already been released, i.e., from Ubuntu-4.15.0-73.82.
[Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
Could anyone hitting this bug confirm it is a DUP of LP bug 1852077 and that the latest releases fix this issue? The handling of the state changes/updates got tangled here because this was not simply marked as a DUP and closed. I will close this next week otherwise.

** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Released
** Changed in: linux (Ubuntu Bionic)
   Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu Disco)
   Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu Eoan)
   Status: Fix Committed => Fix Released
[Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
The test kernel has been tested successfully so far by the original reporter, and has fixed the Docker breakage and so on.
[Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed
** Changed in: linux (Ubuntu)
   Status: Confirmed => In Progress
** Changed in: linux (Ubuntu)
   Status: In Progress => Invalid
[Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
An SRU request has been submitted. If anyone would like to test, there are test images up on: https://people.canonical.com/~nivedita/ipvlan-test-fix-278887/ — you can 'wget' the files and then 'dpkg -i' the modules, linux-image, and modules-extra debs, in that order, and reboot.
[Bug 1879658] [NEW] Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Public bug reported:

[IMPACT]

Setting an MTU larger than the default 1500 results in an error on the recent (4.15.0-92+) Bionic/Xenial -hwe kernels when attempting to create ipvlan interfaces:

# ip link add test0 mtu 9000 link eno1 type ipvlan mode l2
RTNETLINK answers: Invalid argument

This breaks Docker and other applications which use a jumbo MTU (9000) when using ipvlans.

The bug is caused by the following recent commit to Bionic & Xenial-hwe, pulled in via the stable patchset below, which enforces a strict min/max MTU when MTUs are being set up via rtnetlink for ipvlans:

Breaking commit:
---
Ubuntu-hwe-4.15.0-92.93~16.04.1
* Bionic update: upstream stable patchset 2020-02-21 (LP: #1864261)
* net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()

The above patch applies checks of dev->min_mtu and dev->max_mtu to prevent a malicious user from crashing the kernel with a bad value. It built on the original patchset that centralized min/max MTU checking from various subsystems of the networking kernel. However, in that patchset, max_mtu had not been set to the largest phys (64K) or jumbo (9000 bytes) size, and defaults to 1500. The recent commit above, which enforces strict bounds checking for MTU size, exposes the bug of the max MTU not being set correctly for the ipvlan driver (this had previously been fixed in the bonding and teaming drivers).

Fix:
---
This was fixed in the upstream kernel as of v4.18-rc2 for ipvlans, but was not backported to Bionic along with the other patches. The missing commit in the Bionic backport:

ipvlan: use ETH_MAX_MTU as max mtu
commit 548feb33c598dfaf9f8e066b842441ac49b84a8a

[Test Case]

1. Install any kernel earlier than 4.15.0-92 (Bionic/Xenial-hwe)
2. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2
   (where eno1 is the physical interface you are adding the ipvlan on)
3. # ip link
   ...
   14: test1@eno1: mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000
   ...
   // check that your test1 ipvlan is created with mtu 9000
4. Install the 4.15.0-92 kernel or later
5. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2
   RTNETLINK answers: Invalid argument
6. With the above fix commit backported to Xenial-hwe/Bionic, jumbo MTU ipvlan creation works again, identical to before -92.

[Regression Potential]

This commit is in upstream mainline as of v4.18-rc2, and hence is already in Cosmic and later, i.e. all post-Bionic releases currently, so there is low regression potential here. It only impacts ipvlan functionality, not other networking subsystems, so core systems should not be affected. It takes effect at interface setup, so it either works or it doesn't. The patch is trivial. It only impacts Bionic/Xenial-hwe versions from 4.15.0-92 onwards (where the latent bug got exposed).

** Affects: linux (Ubuntu)
   Importance: Critical
   Status: Incomplete

** Affects: linux (Ubuntu Bionic)
   Importance: Critical
   Status: Incomplete

** Tags: bionic sts

** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical
** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New
** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Critical
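The interaction described above can be shown with a toy model: the hardened rtnetlink path rejects any requested MTU outside [min_mtu, max_mtu], so the outcome hinges entirely on the max_mtu the driver advertises. This is an illustration in Python, not kernel code; only the constant values mirror the kernel's.

```python
# Toy model of the rtnetlink MTU bounds check and the one-line driver fix.
ETH_MIN_MTU = 68        # kernel minimum
ETH_DATA_LEN = 1500     # default max_mtu -- what broken Bionic ipvlan kept
ETH_MAX_MTU = 0xFFFF    # 65535 -- what commit 548feb33c598 sets for ipvlan
EINVAL = 22

def set_mtu(requested_mtu, min_mtu=ETH_MIN_MTU, max_mtu=ETH_DATA_LEN):
    """Return 0 on success or -EINVAL, like the hardened rtnetlink path."""
    if requested_mtu < min_mtu or requested_mtu > max_mtu:
        return -EINVAL
    return 0

# Broken Bionic kernels: max_mtu left at the 1500 default, so a jumbo
# MTU is rejected at link creation ("RTNETLINK answers: Invalid argument"):
assert set_mtu(9000) == -EINVAL
# With the backported fix (max_mtu = ETH_MAX_MTU) the same request passes:
assert set_mtu(9000, max_mtu=ETH_MAX_MTU) == 0
```

This is why the failure only appeared once the strict check landed: the too-small max_mtu had been latent in the ipvlan driver all along.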
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
The issue we reported is easily avoided by specifying the primary port as the active interface of the bond. On netplan-using systems: add the directive "primary: $interface" (e.g. "primary: p94s0f0") to the "parameters:" section of the netplan config file.
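For instance, a minimal netplan sketch of that workaround (the interface names here are illustrative; match them to your hardware):

```yaml
network:
  version: 2
  ethernets:
    enp94s0f0: {}
    enp94s0f1d1: {}
  bonds:
    bond0:
      interfaces: [enp94s0f0, enp94s0f1d1]
      parameters:
        mode: active-backup
        primary: enp94s0f0   # pin the primary port as the active interface
```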
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hello, diarmuid, Re: original issue report, were you able to resolve your issue? Please let us know.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
We are closing this LP bug for now as we aren't able to reproduce it in-house, and we cannot get access to a live testing repro env at this time. Here is what we know:

- There seems to be different performance for some tests when the NIC is configured in active-backup bonding mode, between the case when the active interface is the primary port and when it is the secondary port, i.e.:
  Primary port: enp94s0f0 // when this is the active, works fine
  Secondary port: enp94s0f1d1 // when this is the active, more drops

- Switch info: 2 x Fortigate 1024D switches; each machine is connected to both

- NIC info:
  root@u072:~# lspci | grep BCM57416
  01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
  # ethtool -i enp1s0f0np0
  driver: bnxt_en
  version: 1.10.0
  firmware-version: 214.0.253.1/pkg 21.40.25.31

- Our attempt at a reproducer (the issue was initially reported in the production env via graphical monitoring):
  mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
  good system = ~ 0% drops
  bad systems = ~ 8% drops

We are not seeing NIC stats drops, nor UDP kernel drops, so it's not clear where the packet is being dropped, whether it's being dropped silently somewhere (?), or whether that's a red herring and an mtr test issue, and what's seen in production is something else. If someone can reproduce this, or something similar, or if we manage to, we will re-open this bug or file a new one.
[Bug 1811963] Re: Sporadic problems with X710 (i40e) and bonding where one interface is shown as "state DOWN" and without LOWER_UP
Hi Malte, was this issue resolved for you? There are several other possibilities for what it could be; if it's still a problem with current mainline, please let us know.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Edwin, do you happen to notice any IPv6, LLDP, or other link-local traffic on the interfaces (including the backup interface)?

The MTR loss % is purely a capture of its packets transmitted and responses received, so for that UDP MTR test, this is saying that UDP packets were lost somewhere. The NIC does not show any drops via ethtool -S stats, but I'm still hunting down the right pair of before/after counters. Other than the tpa_abort counts, there were no errors that I saw. I can't tell what a tpa_abort means for the frame - is it purely a failure to coalesce, or does it end up dropping packets at some point in that functionality? I'm assuming not, as whatever the reason, those would be counted as drops, I hope, and printed in the interface stats.

I'll attach all the stats here once I get them sorted out. I thought I had a clean diff of before and after from the tester, but after looking through it, I don't think the file I have is from before/after the mtr test, as there was negligible UDP traffic. I'll try to get clarification from the reporter.

Note that when primary= is used to configure which interface is primary, and the primary port is the active interface for the bond, no problems are seen (and that works deterministically to set the correct active interface).
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Additional observations. MAAS is being used to deploy the system and configure the bond interface and its settings.

MAAS allows you to specify which is the primary interface, with the other being the backup, for the active-backup bonding mode. However, this does not appear to be working: it's not passing along a primary primitive, for instance in the netplan yaml, or otherwise resulting in the setting being honored (still need to confirm).

MAAS also allows you to enter a MAC address for the bond interface; if one is not supplied, by default it will use the MAC address of the "primary" interface, as configured. MAAS then populates /etc/netplan/50-cloud-init.yaml, including a macaddr= line with that default, and netplan passes it along to systemd-networkd.

The bonding kernel, however, will use as the active interface whichever interface is first attached to the bond (i.e., whichever completes getting attached to the bond interface first) in the absence of a primary= directive. The bonding kernel will, though, use the supplied MAC address as an override.

So let's say the active interface was configured in MAAS to be f0, and its MAC is used as the MAC address of the bond, but f1 (the second port of the NIC) actually gets attached to the bond first and is used as the active interface. We then have a situation where f0 = backup, f1 = active, and bond0 is using the MAC of f0. While this should work, there is potential for problems depending on the circumstances. It's likely this has nothing to do with our current issue, but it's here for completeness. We will see if we can test/confirm.
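The race described above can be sketched as a toy model: absent a primary= directive, the bonding driver's active slave is simply whichever interface finishes enslaving first, independent of which interface's MAC was borrowed for the bond. The names and structure below are illustrative only, not driver code.

```python
# Toy model of active-slave selection for an active-backup bond.
def pick_active(attach_order, primary=None):
    """Return the active slave: the primary if configured, else the
    first interface that finished attaching to the bond."""
    if primary and primary in attach_order:
        return primary
    return attach_order[0]  # first to finish enslaving wins

# MAAS intended f0 to be primary (and used f0's MAC for bond0), but only
# the MAC made it into the config; f1 happened to attach first:
assert pick_active(["enp94s0f1d1", "enp94s0f0"]) == "enp94s0f1d1"
# With primary actually passed through, f0 is active regardless of order:
assert pick_active(["enp94s0f1d1", "enp94s0f0"],
                   primary="enp94s0f0") == "enp94s0f0"
```

This is the f0 = backup, f1 = active, bond0-wearing-f0's-MAC situation described in the comment.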
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Edwin, let me know if you can get in touch with me via the contact email on my Launchpad page. Thanks for all the help!
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
** Attachment added: "ethtool -S for inactive interface enp94s0f0" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+attachment/5327556/+files/ethtool-S-enp94s0f0
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
ethtool-enp94s0f0 --
Settings for enp94s0f0:
	Supported ports: [ FIBRE ]
	Supported link modes: 1baseT/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes: Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 1Mb/s
	Duplex: Full
	Port: FIBRE
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: off
	Supports Wake-on: g
	Wake-on: d
	Current message level: 0x (0)
	Link detected: yes

ethtool-i-enp94s0f0 --
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31
expansion-rom-version:
bus-info: :5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

ethtool-c-enp94s0f0 --
Coalesce parameters for enp94s0f0:
Adaptive RX: off  TX: off
stats-block-usecs: 100
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 10
rx-frames: 15
rx-usecs-irq: 1
rx-frames-irq: 1
tx-usecs: 28
tx-frames: 30
tx-usecs-irq: 2
tx-frames-irq: 2
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

ethtool-g-enp94s0f0 --
Ring parameters for enp94s0f0:
Pre-set maximums:
RX: 2047
RX Mini: 0
RX Jumbo: 8191
TX: 2047
Current hardware settings:
RX: 511
RX Mini: 0
RX Jumbo: 2044
TX: 511

ethtool-k-enp94s0f0 --
Features for enp94s0f0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: on
tls-hw-record: off [fixed]
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
"Bad" System/NIC:
NIC: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
System: Dell
Kernel: 5.3.0-28-generic #30~18.04.1-Ubuntu
(Note: this issue has been seen on prior kernels as well; upgraded to the latest to see if various problems were resolved.)
Attaching stats/config files from the NICs on this system (the one seeing the issue).
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Good System/Good NIC (all configurations work)
Comparison NIC: NetXtreme II BCM57000 10 Gigabit Ethernet QLogic 57000
System: Dell
Kernel: 5.0.0-25-generic #26~18.04.1-Ubuntu

/proc/net/bonding/bond0 ---
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp5s0f1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp5s0f1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:00:00:00:73:e2
Slave queue ID: 0

Slave Interface: enp5s0f0
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:00:00:00:73:e0
Slave queue ID: 0

/etc/netplan/50-cloud-init.yaml ---
network:
    bonds:
        bond0:
            addresses:
            - 00.00.235.182/25
            gateway4: 00.00.235.129
            interfaces:
            - enp5s0f0
            - enp5s0f1
            macaddress: 00:00:00:00:73:e0
            mtu: 9000
            nameservers:
                addresses:
                - 00.00.235.172
                - 00.00.235.171
                search:
                - maas
            parameters:
                down-delay: 0
                gratuitious-arp: 1
                mii-monitor-interval: 100
                mode: active-backup
                transmit-hash-policy: layer2
                up-delay: 0
    ethernets:
        ...(snip)..
        enp5s0f0:
            match:
                macaddress: 00:00:00:00:73:e0
            mtu: 9000
            set-name: enp5s0f0
        enp5s0f1:
            match:
                macaddress: 00:00:00:00:73:e2
            mtu: 9000
            set-name: enp5s0f1
    version: 2
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
"Bad" Configuration for active-backup mode:

$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp94s0f1d1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp94s0f1d1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 4c:d9:8f:48:08:da
Slave queue ID: 0

Slave Interface: enp94s0f0
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 4c:d9:8f:48:08:d9
Slave queue ID: 0
---
$ cat uname-rv
5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020
---
Scrubbed /etc/netplan/50-cloud-init.yaml:
network:
    bonds:
        bond0:
            addresses:
            - 0.0.235.177/25
            gateway4: 0.0.235.129
            interfaces:
            - enp94s0f0
            - enp94s0f1d1
            macaddress: 00:00:00:48:08:00
            mtu: 9000
            nameservers:
                addresses:
                - 0.0.235.171
                - 0.0.235.172
                search:
                - maas
            parameters:
                down-delay: 0
                gratuitious-arp: 1
                mii-monitor-interval: 100
                mode: active-backup
                transmit-hash-policy: layer2
                up-delay: 0
    ethernets:
        eno1:
            match:
                macaddress: 00:00:00:76:6e:ca
            mtu: 1500
            set-name: eno1
        eno2:
            match:
                macaddress: 00:00:00:76:6e:cb
            mtu: 1500
            set-name: eno2
        enp94s0f0:
            match:
                macaddress: 00:00:00:48:08:00
            mtu: 9000
            set-name: enp94s0f0
        enp94s0f1d1:
            match:
                macaddress: 00:00:00:48:08:da
            mtu: 9000
            set-name: enp94s0f1d1
    version: 2
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
We have narrowed it down to a flaw in a specific configuration setting on this NIC, so we're comparing the good and bad configurations now.

Primary port: enp94s0f0
Secondary port: enp94s0f1d1

A] Good config for fault-tolerance (active-backup) bonding mode:
-- Primary port = active interface; Secondary port = backup

B] Bad config for fault-tolerance (active-backup) bonding mode:
-- Primary port = backup interface; Secondary port = active

We are consistently able to reproduce a drop rate difference with UDP pkts, for the above good/bad cases:

Good Case UDP MTR Test Result --
mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
Start: 2020-02-10T10:14:01+
HOST: hostname           Loss%   Snt  Last   Avg  Best  Wrst StDev
  1.|-- nn.nn.nnn.nnn     0.0%    60   0.3   0.2   0.2   0.3   0.0

Bad Case UDP MTR Test Result ---
mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
Start: 2020-02-10T14:10:52+
HOST: hostname           Loss%   Snt  Last   Avg  Best  Wrst StDev
  1.|-- nn.nn.nnn.nnn     8.3%    60   0.3   0.3   0.2   0.4   0.0
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
The second port on the NIC definitely works as the active interface in an active-backup bonding configuration on the other NICs. At the moment, it's only this particular NIC that is seeing this problem that we know of.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hello Edwin, here is more information on the issue we are seeing with dropped packets and other connectivity issues on this NIC. The problem is *only* seen when the second port on the NIC is chosen as the active interface of an active-backup configuration. So on the "bad" system, with the interfaces:

enp94s0f0   -> when chosen as active, all OK
enp94s0f1d1 -> when chosen as active, not OK

I'll see if the reporters can confirm that on the "good" systems there was no problem when the second interface is active.
[Bug 1860217] Re: dpkg-reconfigure clamav-daemon in infinite loop
Impact: production servers are not provisioning -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1860217 Title: dpkg-reconfigure clamav-daemon in infinite loop To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/clamav/+bug/1860217/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1860217] Re: dpkg-reconfigure clamav-daemon in infinite loop
I have reproduced it on Xenial.

$ cat /etc/lsb-release
..
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"

$ uname -rv
4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019

$ dpkg -l | grep clam
ii clamav           0.102.1+dfsg-0ubuntu0.16.04.2
ii clamav-base      0.102.1+dfsg-0ubuntu0.16.04.2
ii clamav-daemon    0.102.1+dfsg-0ubuntu0.16.04.2
ii clamav-freshclam 0.102.1+dfsg-0ubuntu0.16.04.2
ii clamdscan        0.102.1+dfsg-0ubuntu0.16.04.2
ii libclamav9       0.102.1+dfsg-0ubuntu0.16.04.2

$ sudo DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true dpkg-reconfigure clamav-daemon
(hangs)

$ ps -fe | grep clamav
root   20256 19801  0 07:54 pts/1 00:00:00 sudo DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true dpkg-reconfigure clamav-daemon
root   20257 20256 70 07:54 pts/1 00:00:10 /usr/bin/perl -w /usr/sbin/dpkg-reconfigure clamav-daemon
root   20306 20257 23 07:54 pts/1 00:00:03 /bin/sh /var/lib/dpkg/info/clamav-daemon.config reconfigure 0.102.1+dfsg-0ubuntu0.16.04.2
ubuntu 20647 17343  0 07:54 pts/0 00:00:00 grep --color=none clamav

$ sudo strace -p 20257
...
read(8, "METAGET clamav-daemon/LogFile va"..., 8192) = 36
write(7, "0 /var/log/clamav/clamav.log\n", 29) = 29
read(8, "INPUT low clamav-daemon/LogTime\n", 8192) = 32
write(7, "30 question skipped\n", 20) = 20
read(8, "GO \n", 8192) = 4
write(7, "0 ok\n", 5) = 5
...
^Cread(8,
strace: Process 20257 detached

** Tags added: sts

** Changed in: clamav (Ubuntu)
   Importance: Undecided => High
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hey Edwin, sorry, I didn't see your last question. I'll try to confirm; I've seen loss in both directions, but it's not yet clear whether that's significant enough. For example, TCP traffic is retransmitted, so it could be segments lost outgoing or ACKs lost incoming: 4407 retransmitted TCP segments and 130 TCP timeouts in stats collected about 5 minutes apart -- which isn't a sufficient sample size, so we're trying to get a new collection of stats and logs using the netperf TCP_RR test. Note that in our case we're more concerned about (and have more solid data on) latency issues than dropped packets (which I expect some of with heavy network testing). For example, netperf TCP_RR performance is about 70-78% of the older systems' for 1/1 request/response byte sizes, as well as for the 64/64, 100/200, and 128/8192 sizes. I'll update here as soon as we have more data from the production environment.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
> NICs between systems? Are OS / kernel and driver
> versions the same on both systems?

Yes, identical distro release, kernel, and most of the software stack (I have not obtained and examined the full sw stack). Configuration of networking settings is also the same.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
> There are more than one variable at play here.
> Does the problem follow the NIC if you swap the
> NICs between systems? Are OS / kernel and driver
> versions the same on both systems?

Unfortunately, I've not been able to get them to try permutations or swaps as yet, as this is still a production system/environment. I'll try and obtain more information about it.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Thanks very much for helping on this, Edwin! Please let me know if there's anything specific you need. I'm asking them to disable any IPv6 and LLDP traffic in their environment, then retest and collect information again. Also, I'd like to disable TPA; would this be at all useful: modprobe bnx disable_tpa=1 ?
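For what it's worth, a hedged sketch of an alternative: as far as I know, bnxt_en has no disable_tpa module parameter (that's from the bnx2x driver); on bnxt_en, TPA (the hardware LRO/GRO aggregation behind the tpa_aborts counters in the ethtool -S dumps) should instead be toggled per interface via ethtool feature flags. The interface name and the dry-run wrapper below are illustrative only:

```shell
#!/bin/sh
# Sketch: disable TPA (hardware aggregation) on a bnxt_en interface.
# Assumption: TPA corresponds to the lro and rx-gro-hw ethtool features.
IFACE="${1:-enp94s0f0}"   # hypothetical interface name
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands rather than run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run ethtool -K "$IFACE" lro off        # disable hardware LRO (TPA)
run ethtool -K "$IFACE" rx-gro-hw off  # disable hardware GRO aggregation
run ethtool -k "$IFACE"                # verify the resulting feature flags
```

Set DRY_RUN=0 on a real system; the change is not persistent across reboots, so it would need to go into the boot-time network configuration to stick.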
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
> The mtr packet loss is an interesting result. What mtr options did you use? Is this a UDP or ICMP test?

The mtr command was:

mtr --no-dns --report --report-cycles 60 $IP_ADDR

so ICMP was going out.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
** Attachment added: "active interface ethtool-S" https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1853638/+attachment/5324070/+files/ethtool-S-enp94s0f0
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
** Attachment added: "backup interface ethtool-S" https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1853638/+attachment/5324071/+files/ethtool-S-enp94s0f1d1
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Note that iperf was identical, whereas netperf and mtr showed up differences (so it's possibly sporadic as well, not continuous):

1. iperf tcp test --
   GoodSystem ... 9.84 Gbits/sec
   BadSystem1 ... 8.37 Gbits/sec
   BadSystem2 ... 9.85 Gbits/sec

2. iperf udp test --
   GoodSystem ... 1.05 Mbits/sec
   BadSystem2 ... 1.05 Mbits/sec

3. mtr ping test ---
   GoodSystem ... 0.0% Loss; 0.2 Avg; 0.1 Best; 0.9 Worst; 0.1 StdDev
   BadSystem2 ... 11.7% Loss; 0.1 Avg; 0.1 Best; 0.2 Worst; 0.0 StdDev

4. netperf tcp_rr 1/1 bytes --
   GoodSystem ... 17921.83 t/sec
   BadSystem1 ... 13912.45 t/sec
   BadSystem2 ...

5. netperf tcp_rr 64/64 bytes --
   GoodSystem ... 16987.48 t/sec
   BadSystem1 ... 13355.93 t/sec
   BadSystem2 ...

6. netperf tcp_rr 128/8192 bytes ---
   GoodSystem ... 2396.45 t/sec
   BadSystem1 ... 1678.54 t/sec
   BadSystem2 ...
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hello, Edwin. We have two separate users/customers filing reports, and I can answer for one of them; I'll ask the original poster separately to reply as well. With respect to one of these situations, this is the following system:

Dell PowerEdge R440/0XP8V5, BIOS 2.2.11 06/14/2019

Note that a similar system does not have any issues:

Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.3.4 11/08/2016

So the NIC in the "bad" environment is:

BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
Product Name: Broadcom Adv. Dual 10G SFP+ Ethernet

The NIC in the "good" environment is:

Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:1006]
Product Name: QLogic 57810 10 Gigabit Ethernet

I'll have to scrub some files and see what I can attach; apologies, I'll have it here by tmrw. Unfortunately, we don't have an easy reproducer: a single iperf and netperf test (both UDP and TCP) showed identical results from both the "good" and "bad" environments. What we have is an identical kernel, network configuration, and stack, with the "bad" system showing double or triple the latency to the systems from a remote server. I'll have more information for you shortly regarding the exact k8 cmd.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
(active interface)
> cat ethtool-S-enp94s0f1d1 | grep abort
[0]: tpa_aborts: 19775497
[1]: tpa_aborts: 26758635
[2]: tpa_aborts: 12008147
[3]: tpa_aborts: 15829167
[4]: tpa_aborts: 25099500
[5]: tpa_aborts: 3292554
[6]: tpa_aborts: 2863692
[7]: tpa_aborts: 20224692

(backup interface)
> cat ethtool-S-enp94s0f0 | grep abort
[0]: tpa_aborts: 3158584
[1]: tpa_aborts: 1670319
[2]: tpa_aborts: 1749371
[3]: tpa_aborts: 1454301
[4]: tpa_aborts: 123020
[5]: tpa_aborts: 1403509
[6]: tpa_aborts: 1298383
[7]: tpa_aborts: 1858753

Netted out from the previous capture, there were:
*f0 = 2014 tpa_aborts
*d1 = 1118473 tpa_aborts

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical
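To net out per-queue deltas like the ones above from two ethtool -S snapshots, a small awk sketch (the snapshot file names and the tiny fallback sample data are illustrative only):

```shell
#!/bin/sh
# Sketch: compute per-queue tpa_aborts increases between two `ethtool -S` snapshots.
# On a real system, capture the snapshots around a test run, e.g.:
#   ethtool -S enp94s0f0 > snap-before ; <run test> ; ethtool -S enp94s0f0 > snap-after
BEFORE="${1:-snap-before}"
AFTER="${2:-snap-after}"

# Create tiny illustrative snapshots if the real files are absent.
[ -f "$BEFORE" ] || printf '[0]: tpa_aborts: 100\n[1]: tpa_aborts: 250\n' > "$BEFORE"
[ -f "$AFTER" ]  || printf '[0]: tpa_aborts: 1100\n[1]: tpa_aborts: 350\n' > "$AFTER"

# First pass (NR == FNR) records the "before" counts keyed by queue index;
# second pass prints the per-queue increase.
awk '/tpa_aborts/ {
        gsub(/[^0-9]/, "", $1)               # strip "[", "]", ":" from the queue index
        if (NR == FNR) before[$1] = $3
        else printf "queue %s: +%d tpa_aborts\n", $1, $3 - before[$1]
     }' "$BEFORE" "$AFTER"
```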
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
We suspect this is a device (hw/fw) issue, not NetworkManager or the kernel (bnxt_en driver). I've added the kernel task for the driver impact (just in case, for now); this is really to eliminate all other causes and confirm whether the device is the root cause.

NIC -
Product Name: Broadcom Adv. Dual 10G SFP+ Ethernet
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)

NIC Driver/FW ---
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31
expansion-rom-version:
bus-info: :5e:00.1
supports-statistics: yes

Kernel -
5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019
(appears to be an issue on all kernel versions)

Environment Configuration -
active-backup bonding mode (having the active backup up *might* potentially be the problem, but it might just be the device itself). The exact same distro, kernel, applications, and configuration work fine with a different NIC (Broadcom 10g bnx2x).

There were quite a few total tpa_abort stat counts (1118473) during a 2 minute iperf test. Hoping to get more information from other users seeing the same issue.
[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
I have reports of the same device appearing to drop packets and incur a greater number of retransmissions under certain circumstances, which we're still trying to nail down. I'm using this bug for now until it is proven to be a different problem. This is causing issues in a production environment.

** Changed in: network-manager (Ubuntu)
   Status: New => Confirmed

** Changed in: network-manager (Ubuntu)
   Importance: Undecided => Critical

** Tags added: sts

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New
[Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
Fix has been committed to B, D, E. I've manually updated this bug for now (it was not formally DUP'd to LP Bug 1852077).

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => High
** Changed in: linux (Ubuntu Eoan)
   Importance: Undecided => High
** Changed in: linux (Ubuntu Disco)
   Importance: Undecided => High
** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High
** Changed in: linux (Ubuntu Bionic)
   Status: New => Fix Committed
** Changed in: linux (Ubuntu Disco)
   Status: New => Fix Committed
** Changed in: linux (Ubuntu Eoan)
   Status: New => Fix Committed

-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1834322 Title: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1834322/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1843044] Re: firefox crashes on a FIPS enabled machine
We have multiple reports of the latest Firefox not working with FIPS due to the above, so we would like to prioritize determining how to fix this. We are trying to determine the best approach to take, given the Mozilla team's direction to keep the default behavior of the nss library the same (checking the fips_enabled flag) and to behave differently only if built with an env variable, rather than going with Vineetha's submitted patch.

To get FF into FIPS mode, I suspect on Bionic we will need this as well:

Bug 1531267: "FIPS mode should be enabled automatically if the system is in FIPS mode" -- fixed in nss version 3.43. (On Linux, even if /proc/sys/crypto/fips_enabled is 1, one needs to enable the database's FIPS mode with modutil.)

On Bionic the nss package version was 2:3.35, which does not have that fix (Eoan has 2:3.45). -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1843044 Title: firefox crashes on a FIPS enabled machine To manage notifications about this bug go to: https://bugs.launchpad.net/firefox/+bug/1843044/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
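For reference, a sketch of the manual modutil step mentioned above; the database path is a placeholder (Firefox keeps its NSS database under the profile directory), and the dry-run wrapper is there so the commands can be inspected before being run for real:

```shell
#!/bin/sh
# Sketch: enable FIPS mode on an NSS database with modutil.
# Assumption: the profile path below is a hypothetical Firefox profile.
DBDIR="${1:-sql:$HOME/.mozilla/firefox/example.default}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run modutil -dbdir "$DBDIR" -fips true     # switch the softoken into FIPS mode
run modutil -dbdir "$DBDIR" -chkfips true  # verify: should report FIPS mode enabled
```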
[Bug 1843044] Re: firefox crashes on a FIPS enabled machine
** Changed in: firefox (Ubuntu)
   Status: New => Confirmed

** Changed in: firefox (Ubuntu)
   Importance: Undecided => High
[Bug 1288196] Re: MAC address of bonding interface is randomly picked
Stephane,

Any ideas on this, and how to push forward with a permanent solution?
[Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
FWIW, the fix has been committed to -stable:

"bonding: fix state transition issue in link monitoring"
Commit: 1899bb325149e481de31a4f32b59ea6f24e176ea
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/bonding?id=1899bb325149e481de31a4f32b59ea6f24e176ea
[Bug 1852077] Re: Backport: bonding: fix state transition issue in link monitoring
FWIW, the fix has been committed to -stable:

"bonding: fix state transition issue in link monitoring"
Commit: 1899bb325149e481de31a4f32b59ea6f24e176ea
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/bonding?id=1899bb325149e481de31a4f32b59ea6f24e176ea

** Tags added: sts
[Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
This is being handled as a DUP of LP Bug 1852077:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077

** Changed in: linux (Ubuntu)
   Status: Expired => In Progress

** Tags added: sts

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
   Status: In Progress

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
   Status: New
[Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
There is a test kernel (from that LP bug) available at:
https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/
[Bug 1852077] Re: Backport: bonding: fix state transition issue in link monitoring
Still waiting on these patches being committed to all the Ubuntu trees. Any ETA? Is this waiting on being picked up via -stable?
[Bug 1830226] Re: systemtap currently broken in xenial
This version of systemtap needs to depend on kernel 4.4.0-143 #169 or later. I'm assuming we can do that?
[Bug 1846535] Re: cloud-init 19.2.36 fails with python exception "Not all expected physical devices present ..." during bionic image deployment from MAAS
Bumped up the importance due to more reports of production environments hitting this problem (apparently; not yet fully confirmed).

** Changed in: cloud-init (Ubuntu)
   Importance: Undecided => Critical

** Tags added: sts
[Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
This issue has been tested and successfully verified. Verification successful!

"...test appliance built with 4.15.0-58 was unusable ... hundreds of "BUG: non-zero pgtables_bytes on freeing mm: -16384" in syslog, RestAPI interface timeouts, failed to produce FFDC data using sosreport. Build with 4.15.0-60.67 displays none of these behaviors ... smoke test completed successfully."

** Tags added: verification-done-bionic

** Changed in: linux (Ubuntu Bionic)
   Status: Fix Committed => Fix Released
[Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
** Tags added: sts

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Xenial)
   Importance: High => Critical

** Changed in: linux (Ubuntu Bionic)
   Importance: High => Critical

** Changed in: linux (Ubuntu Disco)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Eoan)
   Importance: Undecided => Critical
[Bug 1840704] Re: ZFS kernel modules lack debug symbols
** Tags added: sts

** Tags added: linux

** Changed in: linux (Ubuntu)
   Importance: Undecided => High
[Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
I'll update here once the kernel is uploaded.
[Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
I unduped it for test process clarity. Trying to get the relevant people to test the fix.
[Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
** This bug is no longer a duplicate of bug 1837664
   Bionic update: upstream stable patchset 2019-07-23
[Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
*** This bug is a duplicate of bug 1837664 ***
    https://bugs.launchpad.net/bugs/1837664

I'm not sure this bug should be DUP'd to the stable-release bug. It might confuse the verification and handling triggers, perhaps? We will need to make sure the fix is tested once it is uploaded.
[Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
*** This bug is a duplicate of bug 1837664 ***
    https://bugs.launchpad.net/bugs/1837664

I'll unDUP it unless the kernel team says otherwise in IRC.
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
Verified on Xenial.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
As the test kernel with the backported Xenial fix has been up for almost 2 months now, I'm submitting the SRU for Xenial, although I have not received feedback from the original reporter or others.

The backported patch for Xenial varies slightly from the cherry-picked patch for B, C. My testing has been successful (see the original testing information in the description).
[Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
** Tags added: sts
[Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
** Tags added: sts
[Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
** Tags added: sts

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic verification-done-cosmic

** Tags removed: verification-done-cosmic
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Tags added: sts
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
Bionic and Cosmic kernels were successfully tested. I've updated the tags.

** Tags removed: verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic
[Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
Late update, but the original reporter did test the proposed kernel on systems able to reproduce the problem, and the tests were successful.

We do not yet have a way of reproducing this on Xenial (i.e., any 4.4 kernel). I'm still leaving this open as an issue; once we can reproduce and test it there, I will update here and push an SRU for Xenial as well.
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
A 4.4 test kernel with the fix backported is available at:
https://people.canonical.com/~nivedita/geneve-xenial-test/
if anyone wishes to validate the 4.4 Xenial fix.
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
Resubmitted SRU for B, C for this kernel cycle.
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
Submitted SRU request for Bionic, Cosmic. Huge thanks for the testing, Matthew!
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Tags added: cosmic xenial
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Description changed:

  SRU Justification

  Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically.

  Fix:
  Fixed by upstream commit in v5.0:
  Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7
  "geneve: correctly handle ipv6.disable module parameter"
- Hence available in Disco and later; required in X,B,C
- Cherry picked and tested successfully for X, B, C.
+ Hence available in Disco and later; required in X,B,C.

  Testcase:
  1. Boot with "ipv6.disable=1"
  2. Then try and create a geneve tunnel using:
-# ovs-vsctl add-br br1
-# ovs-vsctl add-port br1 geneve1 -- set interface geneve1
-  type=geneve options:remote_ip=192.168.x.z // ip of the other host
+ # ovs-vsctl add-br br1
+ # ovs-vsctl add-port br1 geneve1 -- set interface geneve1
+   type=geneve options:remote_ip=192.168.x.z // ip of the other host

  Regression Potential: Low, only geneve tunnels when ipv6 dynamically
  disabled; current status is it doesn't work at all.

  Other Info:
  * Mainline commit msg includes reference to a fix for
-   non-metadata tunnels (infrastructure is not yet in
-   our tree prior to Disco), hence not being included
-   at this time under this case.
+   non-metadata tunnels (infrastructure is not yet in
+   our tree prior to Disco), hence not being included
+   at this time under this case.

-   At this time, all geneve tunnels created as above
-   are metadata-enabled.
-
+   At this time, all geneve tunnels created as above
+   are metadata-enabled.

  ---

  [Impact]

  When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error:

  "ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)."

  [Fix]

  There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels).
  "geneve: correctly handle ipv6.disable module parameter"
  Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

  This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet.

  [Test Case]

  (Best to do this on a kvm guest VM so as not to interfere with your system's networking)

  1. On any Ubuntu Xenial kernel, disable ipv6. This example
-    is shown with the4.15.0-23-generic kernel (which differs
+    is shown with the 4.15.0-23-generic kernel (which differs
     slightly from 4.4.x in symptoms):
     - Edit /etc/default/grub to add the line:
       GRUB_CMDLINE_LINUX="ipv6.disable=1"
     - # update-grub
     - Reboot

  2. Install OVS
     # apt install openvswitch-switch

  3. Create a Geneve tunnel
     # ovs-vsctl add-br br1
     # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
     (where remote_ip is the IP of the other host)

  You will see the following error message:
  "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details."

  From /var/log/openvswitch/ovs-vswitchd.log you will see:
  "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol"

  You will notice from the "ifconfig" output that the device genev_sys_6081 is not created.

  If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to the br1 and pinging each host.

  On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, and no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and the ping test won't work.

  With the fixed test kernel, the interfaces and tunnel are created successfully.
  [Regression Potential]

  * Low -- affects the geneve driver only, and when ipv6 is disabled, and since it doesn't work in that case at all, this fix gets the tunnel up and running for the common case.

  [Other Info]

  * Analysis

  Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. Currently, however, what's in the implementation requires support for ipv6 for metadata-based tunnels (which geneve is), rather than:

  a) ipv4 + metadata   // whether ipv6 compiled or dynamically disabled
  b) ipv4 + metadata + ipv6

  What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

    bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
    bool metadata = geneve->collect_md;
    ...
  #if IS_ENABLED(CONFIG_IPV6)
    geneve->sock6 = NULL;
    if (ipv6 || metadata)
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Description changed:

+ SRU Justification
+
+ Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically.
+
+ Fix:
+ Fixed by upstream commit in v5.0:
+ Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7
+ "geneve: correctly handle ipv6.disable module parameter"
+
+ Hence available in Disco and later; required in X,B,C
+ Cherry picked and tested successfully for X, B, C.
+
+ Testcase:
+ 1. Boot with "ipv6.disable=1"
+ 2. Then try and create a geneve tunnel using:
+# ovs-vsctl add-br br1
+# ovs-vsctl add-port br1 geneve1 -- set interface geneve1
+  type=geneve options:remote_ip=192.168.x.z // ip of the other host
+
+ Regression Potential: Low, only geneve tunnels when ipv6 dynamically
+ disabled, current status is it doesn't work at all.
+
+ Other Info:
+ * Mainline commit msg includes reference to a fix for
+   non-metadata tunnels (infrastructure is not yet in
+   our tree prior to Disco), hence not being included
+   at this time under this case.
+
+   At this time, all geneve tunnels created as above
+   are metadata-enabled.
+
+ ---

  [Impact]

  When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error:

  "ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)."

  [Fix]

  There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels).

  "geneve: correctly handle ipv6.disable module parameter"
  Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

  This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet.

  [Test Case]

  (Best to do this on a kvm guest VM so as not to interfere with your system's networking)

  1. On any Ubuntu Xenial kernel, disable ipv6.
  This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms):
     - Edit /etc/default/grub to add the line:
       GRUB_CMDLINE_LINUX="ipv6.disable=1"
     - # update-grub
     - Reboot

  2. Install OVS
     # apt install openvswitch-switch

  3. Create a Geneve tunnel
     # ovs-vsctl add-br br1
     # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
     (where remote_ip is the IP of the other host)

  You will see the following error message:
  "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details."

  From /var/log/openvswitch/ovs-vswitchd.log you will see:
  "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol"

  You will notice from the "ifconfig" output that the device genev_sys_6081 is not created.

  If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to the br1 and pinging each host.

  On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, and no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and the ping test won't work.

  With the fixed test kernel, the interfaces and tunnel are created successfully.

  [Regression Potential]

  * Low -- affects the geneve driver only, and when ipv6 is disabled, and since it doesn't work in that case at all, this fix gets the tunnel up and running for the common case.

  [Other Info]

  * Analysis

  Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle.
  Currently, however, what's in the implementation requires support for ipv6 for metadata-based tunnels (which geneve is), rather than:

  a) ipv4 + metadata   // whether ipv6 compiled or dynamically disabled
  b) ipv4 + metadata + ipv6

  What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

    bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
    bool metadata = geneve->collect_md;
    ...
  #if IS_ENABLED(CONFIG_IPV6)
    geneve->sock6 = NULL;
    if (ipv6 || metadata)
        ret = geneve_sock_add(geneve, true);
  #endif
    if (!ret && (!ipv6 || metadata))
        ret = geneve_sock_add(geneve, false);

  CONFIG_IPV6 is enabled, and IPv6 is disabled at boot; but even though ipv6 is false, metadata is always true for a geneve open, as it is set unconditionally in ovs. In /lib/dpif_netlink_rtnl.c:

    case OVS_VPORT_TYPE_GENEVE:
        nl_msg_put_flag(, IFLA_GENEVE_COLLECT_METADATA);

  The second argument of geneve_sock_add is a boolean
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Description changed:

  [Impact]

  When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error:

  "ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)."

  [Fix]

  There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels).

  "geneve: correctly handle ipv6.disable module parameter"
  Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

  This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet.

  [Test Case]

  (Best to do this on a kvm guest VM so as not to interfere with your system's networking)

  1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms):
     - Edit /etc/default/grub to add the line:
       GRUB_CMDLINE_LINUX="ipv6.disable=1"
     - # update-grub
     - Reboot

  2. Install OVS
     # apt install openvswitch-switch

  3. Create a Geneve tunnel
     # ovs-vsctl add-br br1
     # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
     (where remote_ip is the IP of the other host)

  You will see the following error message:
  "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details."

  From /var/log/openvswitch/ovs-vswitchd.log you will see:
  "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol"

  You will notice from the "ifconfig" output that the device genev_sys_6081 is not created.

  If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully.
  You can see that it is working properly by adding an IP to the br1 and pinging each host.

  On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, and no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and the ping test won't work.

  With the fixed test kernel, the interfaces and tunnel are created successfully.

  [Regression Potential]

- * Low -- affects the geneve driver only, and when ipv6 is
- disabled, and since it doesn't work in that case at all,
- this fix gets the tunnel up and running for the common case.
-
+ * Low -- affects the geneve driver only, and when ipv6 is
+   disabled, and since it doesn't work in that case at all,
+   this fix gets the tunnel up and running for the common case.

  [Other Info]

  * Analysis

  Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. Currently, however, what's in the implementation requires support for ipv6 for metadata-based tunnels (which geneve is), rather than:

  a) ipv4 + metadata   // whether ipv6 compiled or dynamically disabled
  b) ipv4 + metadata + ipv6

  What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

    bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
    bool metadata = geneve->collect_md;
    ...
  #if IS_ENABLED(CONFIG_IPV6)
    geneve->sock6 = NULL;
    if (ipv6 || metadata)
        ret = geneve_sock_add(geneve, true);
  #endif
    if (!ret && (!ipv6 || metadata))
        ret = geneve_sock_add(geneve, false);

  CONFIG_IPV6 is enabled, and IPv6 is disabled at boot; but even though ipv6 is false, metadata is always true for a geneve open, as it is set unconditionally in ovs. In /lib/dpif_netlink_rtnl.c:

    case OVS_VPORT_TYPE_GENEVE:
        nl_msg_put_flag(, IFLA_GENEVE_COLLECT_METADATA);

  The second argument of geneve_sock_add is a boolean value indicating whether it's an ipv6 address family socket or not, and we thus incorrectly pass a true value rather than false.
The current "|| metadata" check is unnecessary and incorrectly sends the tunnel creation code down the ipv6 path, which subsequently fails when the code expects an ipv6 family socket.

* This issue exists in all versions of the kernel up to the present mainline and net-next trees.

* Testing with a trivial patch to remove that check, and making changes similar to those made for vxlan (which had the same issue), has been successful. Patches for various versions to be attached here soon.

* Example Versions (bug exists in all versions of Ubuntu
- and mainline):
+ and mainline)
+
+ Update: This has been patched upstream after original description filed
+ here, fix available in v5.0 mainline and Disco
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Changed in: linux (Ubuntu Disco)
Status: In Progress => Fix Released

** Description changed:

[Impact]
When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error:

"ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)."

[Fix]
- There is an upstream commit for this in v5.0 mainline.
+ There is an upstream commit for this in v5.0 mainline.

"geneve: correctly handle ipv6.disable module parameter"
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

- This fix is needed on all our series: X, C, B, D. It is identical
+ This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical
to the fix we implemented and tested internally with, but
- had not pushed upstream yet.
-
+ had not pushed upstream yet.

[Test Case]
(Best to do this on a kvm guest VM so as not to interfere with your system's networking)

1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms):
   - Edit /etc/default/grub to add the line:
     GRUB_CMDLINE_LINUX="ipv6.disable=1"
   - # update-grub
   - Reboot

2. Install OVS
   # apt install openvswitch-switch

3. Create a Geneve tunnel
   # ovs-vsctl add-br br1
   # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
   (where remote_ip is the IP of the other host)

You will see the following error message:
"ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details."

From /var/log/openvswitch/ovs-vswitchd.log you will see:
"2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol"

You will notice from the "ifconfig" output that the device genev_sys_6081 is not created.

If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to the br1 and pinging each host.

On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and the ping test won't work.

- With the fixed test kernel, the interfaces and tunnel
+ With the fixed test kernel, the interfaces and tunnel
is created successfully.
-

[Other Info]

* Analysis
Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. Currently, however, the implementation requires support for ipv6 for metadata-based tunnels (which geneve is), rather than supporting either of:

a) ipv4 + metadata // whether ipv6 compiled or dynamically disabled
b) ipv4 + metadata + ipv6

What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
bool metadata = geneve->collect_md;
...
#if IS_ENABLED(CONFIG_IPV6)
geneve->sock6 = NULL;
if (ipv6 || metadata)
        ret = geneve_sock_add(geneve, true);
#endif
if (!ret && (!ipv6 || metadata))
        ret = geneve_sock_add(geneve, false);

CONFIG_IPV6 is enabled and IPv6 is disabled at boot, but even though ipv6 is false, metadata is always true for a geneve open, as it is set unconditionally in ovs, in /lib/dpif_netlink_rtnl.c:

case OVS_VPORT_TYPE_GENEVE:
        nl_msg_put_flag(, IFLA_GENEVE_COLLECT_METADATA);

The second argument of geneve_sock_add is a boolean value indicating whether it's an ipv6 address family socket or not, and we thus incorrectly pass a true value rather than false. The current "|| metadata" check is unnecessary and incorrectly sends the tunnel creation code down the ipv6 path, which subsequently fails when the code expects an ipv6 family socket.

* This issue exists in all versions of the kernel up to the present mainline and net-next trees.

* Testing with a trivial patch to remove that check, and making changes similar to those made for vxlan (which had the same issue), has been successful. Patches for various versions to be attached here soon.

* Example Versions (bug exists in all versions of Ubuntu and mainline):

$ uname -r
4.4.0-135-generic
$ lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04
$ dpkg -l | grep openvswitch-switch
ii openvswitch-switch 2.5.4-0ubuntu0.16.04.1

** Description changed:

[Impact]
When attempting to
[Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
We had tested the patch discussed above internally, with success, although we have limited testing (opening up a geneve tunnel between 2 kvm guests). Jiri has now pushed an identical patch upstream, which is available in the v5.0 kernel and later.

"geneve: correctly handle ipv6.disable module parameter"
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

Although I do not have testing validation from the original poster, since it has been committed upstream, I'm going to go ahead and get the SRU request started.

** Changed in: linux (Ubuntu)
Status: Triaged => In Progress

** Changed in: linux (Ubuntu)
Importance: Medium => High

** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Cosmic)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Disco)
Importance: High
Status: In Progress

** Changed in: linux (Ubuntu Cosmic)
Status: New => In Progress

** Changed in: linux (Ubuntu Disco)
Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Cosmic)
Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Xenial)
Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Xenial)
Status: New => In Progress

** Changed in: linux (Ubuntu Cosmic)
Importance: Undecided => High

** Changed in: linux (Ubuntu Xenial)
Importance: Undecided => High

** Description changed:

[Impact]
- When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in
- an OS environment with open vswitch, where ipv6 has been disabled,
+ When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in
+ an OS environment with open vswitch, where ipv6 has been disabled,
the create fails with the error :
- "ovs-vsctl: Error detected while setting up 'geneve0': could not
- add network device geneve0 to ofproto (Address family not supported
+ "ovs-vsctl: Error detected while setting up 'geneve0': could not
+ add network device geneve0 to ofproto (Address family not supported
by protocol)."
-
+ [Fix]
+ There is an upstream commit for this in v5.0 mainline.
+
+ "geneve: correctly handle ipv6.disable module parameter"
+ Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7
+
+ This fix is needed on all our series: X, C, B, D
+
+ [Test Case]
- (Best to do this on a kvm guest VM so as not to interfere with
- your system's networking)
+ (Best to do this on a kvm guest VM so as not to interfere with
+ your system's networking)

1. On any Ubuntu Xenial kernel, disable ipv6. This example
- is shown with the 4.15.0-23-generic kernel (which differs
- slightly from 4.4.x in symptoms):
+ is shown with the 4.15.0-23-generic kernel (which differs
+ slightly from 4.4.x in symptoms):
+
- Edit /etc/default/grub to add the line:
- GRUB_CMDLINE_LINUX="ipv6.disable=1"
+ GRUB_CMDLINE_LINUX="ipv6.disable=1"
- # update-grub
- Reboot
-
2. Install OVS
# apt install openvswitch-switch

3. Create a Geneve tunnel
# ovs-vsctl add-br br1
- # ovs-vsctl add-port br1 geneve1 -- set interface geneve1
+ # ovs-vsctl add-port br1 geneve1 -- set interface geneve1
type=geneve options:remote_ip=192.168.x.z
(where remote_ip is the IP of the other host)

You will see the following error message:
- "ovs-vsctl: Error detected while setting up 'geneve1'.
+ "ovs-vsctl: Error detected while setting up 'geneve1'.
See ovs-vswitchd log for details."

From /var/log/openvswitch/ovs-vswitchd.log you will see:
- "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system:
- failed to add geneve1 as port: Address family not supported
+ "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system:
+ failed to add geneve1 as port: Address family not supported
by protocol"

- You will notice from the "ifconfig" output that the device
+ You will notice from the "ifconfig" output that the device
genev_sys_6081 is not created.

- If you do not disable IPv6 (remove ipv6.disable=1 from
- /etc/default/grub + update-grub + reboot), the same
- 'ovs-vsctl add-port' command completes successfully.
- You can see that it is working properly by adding an
- IP to the br1 and pinging each host.
+ If you do not disable IPv6 (remove ipv6.disable=1 from
+ /etc/default/grub + update-grub + reboot), the same
+ 'ovs-vsctl add-port' command completes successfully.
+ You can see that it is working properly by adding an
+ IP to the br1 and pinging each host.

- On kernel 4.4 (4.4.0-128-generic), the error message doesn't
- happen using the 'ovs-vsctl add-port' command, no warning is
- shown in ovs-vswitchd.log, but the device genev_sys_6081 is
+ On kernel 4.4 (4.4.0-128-generic), the error message doesn't
+ happen
[Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
I have installed and booted to this kernel, and ensured no new regression was introduced, although I cannot repro the issue.

** Tags removed: 4.15.0-24-generic cosmic kernel verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic

** Description changed:

[Impact]
The i40e driver can get stalled on tx timeouts. This can happen when DCB is enabled on the connected switch. This can also trigger a second situation, when a tx timeout occurs before the recovery of a previous timeout has completed due to CPU load, which is not handled correctly. This leads to networking delays, drops, and application timeouts and hangs. Note that the first tx timeout cause is just one of the ways to end up in the second situation.

This issue was seen on a heavily loaded Kafka broker node running
- the 4.15.0-38-generic kernel on Xenial.
+ the 4.15.0-38-generic kernel on Xenial.

Symptoms include messages in the kernel log of the form:

---
[4733544.982116] i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
[4733544.982119] i40e :18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6

With the test kernel provided in this LP bug, which had these two commits compiled in, the problem has not been seen again, and the node has been running successfully for several months:

- "i40e: Fix for Tx timeouts when interface is brought up if
- DCB is enabled"
+ "i40e: Fix for Tx timeouts when interface is brought up if
+ DCB is enabled"
Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee

"i40e: prevent overlapping tx_timeout recover"
Commit: d5585b7b6846a6d0f9517afe57be3843150719da

* The first commit is already in Disco, Cosmic
* The second commit is already in Disco
* Bionic needs both patches and Cosmic needs the second

[Test Case]
* We are considering the case of both issues above occurring.
* Seen by reporter on a Kafka broker node with heavy traffic.
- * Not easy to reproduce as it requires something like the
- following example environment and heavy load:
+ * Not easy to reproduce as it requires something like the
+ following example environment and heavy load:

+ Kernel: 4.15.0-38-generic
+ Network driver: i40e
+ version: 2.1.14-k
+ firmware-version: 6.00 0x800034e6 18.3.6
+ NIC: Intel 40Gb XL710
+ DCB enabled

[Regression Potential]
Low, as the first only impacts the i40e DCB environment, and has
- been running for several months in production-load testing
+ been running for several months in production-load testing
successfully.

--- Original Description

Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13 to the Kernel 4.15.0-24-generic.

On a "Dell PowerEdge R330" server with a network adapter "Intel Ethernet Converged Network Adapter X710-DA2" (driver i40e), the network card no longer works and permanently displays these three lines:

[ 98.012098] i40e :01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[ 98.012119] i40e :01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8
[ 98.012125] i40e :01:00.0 enp1s0f0: tx_timeout recovery unsuccessful

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1779756

Title:
Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
** Changed in: linux (Ubuntu)
Status: In Progress => Fix Released
[Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
I'm still trying to confirm this for Xenial.
[Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
Submitted patches for SRU.

** Description changed:

[Impact]
Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64.

This is currently still an issue on the 4.15 kernel (Xenial -hwe
- and Bionic kernels).
+ and Bionic kernels).

It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have the fix).

[Fix]
The following commit fixes this issue (as identified by Lihong Yang in discussion with the Intel i40e team):

"i40e: Fix the number of queues available to be mapped for use"
Commit: bc6d33c8d93f520e97a8c6330b8910053d4f

+ It requires the following commit as well:
+
+ "i40e: Do not allow use more TC queue pairs than MSI-X vectors exist"
+ Commit: 1563f2d2e01242f05dd523ffd56fe104bc1afd58

[Test Case]
1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel
   i40e driver version: 2.1.14-k
   Any system with > 64 CPUs

2. For any queue 0 - 63, you can read/set tx xps:

   echo > /sys/class/net/eth2/queues/tx-63/xps_cpus
   echo $?
   0
   cat /sys/class/net/eth2/queues/tx-63/xps_cpus
   00,,

   But for any queue number > 63, we see this error:

   echo > /sys/class/net/eth2/queues/tx-64/xps_cpus
   echo: write error: Invalid argument
   cat /sys/class/net/eth2/queues/tx-64/xps_cpus
   cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument
[Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
I am not sure we could deterministically provoke the issue. At the very least, to ensure no other regression was introduced, I would run it under heavy network load. The environment in question which saw the issue had network load, contention for cpus, and several other issues occur.

The basic environment is:

1. For any 25Gb NIC/chipset that requires the 4.4 bnxt_en_bpo driver, set its 2 ports/interfaces up in bonding mode as follows:

   bond-lacp-rate fast
   bond-master bond0
   bond-miimon 100
   bond-mode 802.3ad
   bond-xmit-hash-policy layer3+4
   mtu 9000

2. Run any heavy TCP network load test over the systems (e.g. iperf, netperf, file transfer, etc.)

3. Theoretically, it would appear that if the number of tx ring descriptors were lower, then it would be more likely to hit this (not successfully proven by testing here), but you can lower it and see if that helps:

   # ethtool -G eno49 tx 128  // for example

I am not sure if that helps, Scott. I'll try and work up more specific steps, but I cannot guarantee you will see the issue.
[Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
Will be submitting the SRU request early next week; trying to get it into this next kernel release cycle.

** Changed in: linux (Ubuntu)
Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Bionic)
Status: Confirmed => In Progress

** Changed in: linux (Ubuntu)
Status: Confirmed => In Progress
[Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Just briefly wanted to say that this is one we've discussed at length -- we may not be able to get someone who has the right NIC to test with in time. I'm sanity-checking the kernel, but that does not exercise the key change here. We may need to assume verification-done for our purposes here.
[Bug 1820948] [NEW] i40e xps management broken when > 64 queues/cpus
Public bug reported:

[Impact]
Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64.

This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels).

It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have the fix).

[Fix]
The following commit fixes this issue (as identified by Lihong Yang in discussion with the Intel i40e team):

"i40e: Fix the number of queues available to be mapped for use"
Commit: bc6d33c8d93f520e97a8c6330b8910053d4f

[Test Case]
1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel
   i40e driver version: 2.1.14-k
   Any system with > 64 CPUs

2. For any queue 0 - 63, you can read/set tx xps:

   echo > /sys/class/net/eth2/queues/tx-63/xps_cpus
   echo $?
   0
   cat /sys/class/net/eth2/queues/tx-63/xps_cpus
   00,,

   But for any queue number > 63, we see this error:

   echo > /sys/class/net/eth2/queues/tx-64/xps_cpus
   echo: write error: Invalid argument
   cat /sys/class/net/eth2/queues/tx-64/xps_cpus
   cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument

** Affects: linux (Ubuntu)
Importance: High
Status: Confirmed

** Affects: linux (Ubuntu Bionic)
Importance: High
Assignee: Nivedita Singhvi (niveditasinghvi)
Status: Confirmed

** Tags: bionic

** Also affects: linux (Ubuntu Bionic)
Importance: Undecided
Status: New

** Changed in: linux (Ubuntu)
Status: New => Confirmed

** Changed in: linux (Ubuntu Bionic)
Status: New => Confirmed

** Changed in: linux (Ubuntu)
Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)
[Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
It's been reported by an external reporter and reproduced internally.
[Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
Submitted SRU request.
[Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
** Tags added: bionic cosmic
[Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
** Description changed:

- Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13
- to the Kernel 4.15.0-24-generic.
+ [Impact]
+ The i40e driver can get stalled on tx timeouts. This can happen when
+ DCB is enabled on the connected switch. This can also trigger a
+ second situation when a tx timeout occurs before the recovery of
+ a previous timeout has completed due to CPU load, which is not
+ handled correctly. This leads to networking delays, drops and
+ application timeouts and hangs. Note that the first tx timeout
+ cause is just one of the ways to end up in the second situation.
+
+ This issue was seen on a heavily loaded Kafka broker node running
+ the 4.15.0-38-generic kernel on Xenial.
+
+ Symptoms include messages in the kernel log of the form:
+
+ ---
+ [4733544.982116] i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
+ [4733544.982119] i40e :18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6
+
+ With the test kernel provided in this LP bug which had these
+ two commits compiled in, the problem has not been seen again,
+ and has been running successfully for several months:
+
+ "i40e: Fix for Tx timeouts when interface is brought up if
+ DCB is enabled"
+ Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee
+
+ "i40e: prevent overlapping tx_timeout recover"
+ Commit: d5585b7b6846a6d0f9517afe57be3843150719da
+
+ * The first commit is already in Disco, Cosmic
+ * The second commit is already in Disco
+ * Bionic needs both patches and Cosmic needs the second
+
+ [Test Case]
+ * We are considering the case of both issues above occurring.
+ * Seen by reporter on a Kafka broker node with heavy traffic.
+ * Not easy to reproduce as it requires something like the
+ following example environment and heavy load:
+
+ Kernel: 4.15.0-38-generic
+ Network driver: i40e
+ version: 2.1.14-k
+ firmware-version: 6.00 0x800034e6 18.3.6
+ NIC: Intel 40Gb XL710
+ DCB enabled
+
+ [Regression Potential]
+ Low, as the first only impacts i40e DCB environment, and has
+ been running for several months in production-load testing
+ successfully.
+
+ --- Original Description
+
+ Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13 to the Kernel 4.15.0-24-generic.

On a "Dell PowerEdge R330" server with a network adapter "Intel Ethernet Converged Network Adapter X710-DA2" (driver i40e) the network card no longer works and permanently displays these three lines :

- [ 98.012098] i40e :01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[ 98.012119] i40e :01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8
[ 98.012125] i40e :01:00.0 enp1s0f0: tx_timeout recovery unsuccessful
[Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
** Changed in: linux (Ubuntu Bionic)
Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Cosmic)
Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Bionic)
Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Cosmic)
Status: Confirmed => In Progress