[Kernel-packages] [Bug 1908108] [NEW] tg3: transmit timed out, resetting
Public bug reported:

On a deploy of kubernetes, we're seeing a machine have issues with its
tg3 driven nics. We see:

Dec 14 07:44:08 juju-fcf29c-0-lxd-1 kernel: [ 1496.772960] tg3 :02:00.1 eth1: transmit timed out, resetting

Around that time, we have issues with services losing network
connections.

A juju crashdump with logs is available here:
https://oil-jenkins.canonical.com/artifacts/c15028dc-46fa-4f08-8895-55e9d500c362/generated/generated/kubernetes/juju-crashdump-kubernetes-2020-12-14-07.48.49.tar.gz

syslog is at kubernetes-master_0/var/log/syslog

this is on focal:
[0.00] kernel: Linux version 5.4.0-58-generic (buildd@lcy01-amd64-004) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 (Ubuntu 5.4.0-58.64-generic 5.4.73)

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Description changed:

+ A juju crashdump with logs is available here:
+ https://oil-jenkins.canonical.com/artifacts/c15028dc-46fa-4f08-8895-55e9d500c362/generated/generated/kubernetes/juju-crashdump-kubernetes-2020-12-14-07.48.49.tar.gz
+
+ syslog is at kubernetes-master_0/var/log/syslog

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1908108

Title:
  tg3: transmit timed out, resetting

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1908108/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
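To correlate the service disconnects with the resets, it may help to pull every tg3 timeout out of the syslog with its timestamp and interface. The following is a minimal sketch, assuming the log lines follow the same layout as the single line quoted in the report; the names TIMEOUT_RE, TxTimeout and find_tx_timeouts are made up for this example and are not part of any tool mentioned in the bug.

```python
import re
from typing import Iterable, List, NamedTuple

# Pattern modelled only on the one log line quoted in the report.
# Other kernels or syslog configurations may format the message differently.
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>[\d.]+)\]\s+tg3\s+(?P<pci>\S+)\s+(?P<iface>\S+):\s+"
    r"transmit timed out, resetting"
)


class TxTimeout(NamedTuple):
    """One 'transmit timed out' event (hypothetical helper type)."""
    stamp: str     # syslog timestamp, e.g. "Dec 14 07:44:08"
    host: str      # hostname field of the syslog line
    uptime: float  # seconds since boot, from the kernel's [ ... ] prefix
    pci: str       # PCI address exactly as printed in the log
    iface: str     # network interface name, e.g. "eth1"


def find_tx_timeouts(lines: Iterable[str]) -> List[TxTimeout]:
    """Return every tg3 transmit-timeout event found in the given lines."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            hits.append(TxTimeout(
                match["stamp"], match["host"], float(match["uptime"]),
                match["pci"], match["iface"],
            ))
    return hits
```

Passing an open file object for kubernetes-master_0/var/log/syslog from the crashdump as `lines` would give a list that can be compared against the times the services lost their connections.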
Re: [Kernel-packages] [Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout
@Ryan we do not test Xenial or Disco

On Thu, Aug 22, 2019 at 7:41 PM Ryan Harper <1784...@bugs.launchpad.net> wrote:

> Finally, I did verify xenial proposed with our original test. I had
> over 100 installs with no issue.
>
> @Jason
>
> Have you had any runs on Xenial or Disco? (or do you not test those)?
>
> --
> You received this bug notification because you are a member of Canonical
> Field Critical, which is subscribed to a duplicate bug report (1796292).
> https://bugs.launchpad.net/bugs/1784665
>
> Title:
>   bcache: bch_allocator_thread(): hung task timeout
>
> Status in linux package in Ubuntu:
>   Fix Committed
> Status in linux source package in Xenial:
>   Fix Committed
> Status in linux source package in Bionic:
>   New
> Status in linux source package in Disco:
>   Fix Committed
> Status in linux source package in Eoan:
>   Fix Committed
>
> Bug description:
>   [Impact]
>
>   bcache_allocator() can call the following:
>
>    bch_allocator_thread()
>     -> bch_prio_write()
>        -> bch_bucket_alloc()
>           -> wait on &ca->set->bucket_wait
>
>   But the wake up event on bucket_wait is supposed to come from
>   bch_allocator_thread() itself causing a deadlock.
>
>   [Test Case]
>
>   This is a simple script that can easily trigger the deadlock condition:
>   https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh
>
>   A better test case has been also provided in bug 1796292 (duplicate of
>   this bug):
>
>   https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh
>
>   [Fix]
>
>   Fix by making the call to bch_prio_write() non-blocking, so that
>   bch_allocator_thread() never waits on itself. Moreover, make sure to
>   wake up the garbage collector thread when bch_prio_write() is failing
>   to allocate buckets to increase the chance of freeing up more buckets.
>
>   In addition to that it would be safe to also import other upstream
>   bcache fixes (all clean cherry picks):
>
>   7e865eba00a3df2dc8c4746173a8ca1c1c7f042e bcache: fix potential deadlock in cached_def_free()
>   80265d8dfd77792e133793cef44a21323aac2908 bcache: acquire bch_register_lock later in cached_dev_free()
>   ce4c3e19e5201424357a0c82176633b32a98d2ec bcache: Replace bch_read_string_list() by __sysfs_match_string()
>   ecb37ce9baac653cc09e2b631393dde3df82979f bcache: Move couple of functions to sysfs.c
>   04cbc21137bfa4d7b8771a5b14f3d6c9b2aee671 bcache: Move couple of string arrays to sysfs.c
>   5f2b18ec8e1643410a2369f06888951cdedea0bf bcache: Fix a compiler warning in bcache_device_init()
>   20d3a518713e394efa5a899c84574b4b79ec5098 bcache: Reduce the number of sparse complaints about lock imbalances
>   42361469ae84c851e40cb1f94c8c9a14cdd94039 bcache: Suppress more warnings about set-but-not-used variables
>   f0d3814090ac77de94c42b7124c37ece23629197 bcache: Remove an unused variable
>   47344e330eabc1515cbe6061eb337100a3ab6d37 bcache: Fix kernel-doc warnings
>   9dfbdec7b7fea1ff1b7b5d5d12980dbc7dca46c7 bcache: Annotate switch fall-through
>   4a4e443835a43a79113cc237c472c0d268eb1e1c bcache: Add __printf annotation to __bch_check_keys()
>   fd01991d5c20098c5c1ffc4dca6c821cc60a2f74 bcache: Fix indentation
>   ca71df31661a0518ed58a1a59cf1993962153ebb bcache: fix using of loop variable in memory shrink
>   f3641c3abd1da978ee969b0203b71b86ec1bfa93 bcache: fix error return value in memory shrink
>   688892b3bc05e25da94866e32210e5f503f16f69 bcache: fix incorrect sysfs output value of strip size
>   09a44ca2114737e0932257619c16a2b50c7807f1 bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
>   c4dc2497d50d9c6fb16aa0d07b6a14f3b2adb1e0 bcache: fix high CPU occupancy during journal
>   a728eacbbdd229d1d903e46261c57d5206f87a4a bcache: add journal statistic
>   616486ab52ab7f9739b066d958bdd20e65aefd74 bcache: fix writeback target calc on large devices
>   1f0ffa67349c56ea54c03ccfd1e073c990e7411e bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
>   eb8cbb6df38f6e5124a3d5f1f8a3dbf519537c60 bcache: improve bcache_reboot()
>   9951379b0ca88c95876ad9778b9099e19a95d566 bcache: never writeback a discard operation
>
>   [Regression Potential]
>
>   The upstream fixes are all clean cherry picks from stable (most of
>   them are small cleanups), so regression potential is minimal.
>
>   The only special patch is "UBUNTU: SAUCE: bcache: fix deadlock in
>   bcache_allocator()" that is addressing the main deadlock bug (that
>   seems to be a mainline bug - not fixed yet). We should spend more time
>   trying to reproduce this deadlock with a mainline kernel and post the
>   patch to the LKML for review / feedback.
>
>   However, considering that this patch seems to fix/prevent the specific
>   deadlock problem reported in this bug (tested on the affected
>   platform) it can be considered safe to apply it.
>
>   [Original Bug Report]
>
>   $ cat /proc/version_signature
>   Ubuntu 4.15.0-29.31-generic 4.15.18
>
>   $ lsb_release
[Kernel-packages] [Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout
** Changed in: linux (Ubuntu Bionic)
       Status: Fix Committed => New

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1784665

Title:
  bcache: bch_allocator_thread(): hung task timeout

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  New
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed

Bug description:
  [Impact]

  bcache_allocator() can call the following:

   bch_allocator_thread()
    -> bch_prio_write()
       -> bch_bucket_alloc()
          -> wait on &ca->set->bucket_wait

  But the wake up event on bucket_wait is supposed to come from
  bch_allocator_thread() itself causing a deadlock.

  [Test Case]

  This is a simple script that can easily trigger the deadlock condition:
  https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh

  A better test case has been also provided in bug 1796292 (duplicate of
  this bug):

  https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh

  [Fix]

  Fix by making the call to bch_prio_write() non-blocking, so that
  bch_allocator_thread() never waits on itself. Moreover, make sure to
  wake up the garbage collector thread when bch_prio_write() is failing
  to allocate buckets to increase the chance of freeing up more buckets.

  In addition to that it would be safe to also import other upstream
  bcache fixes (all clean cherry picks):

  7e865eba00a3df2dc8c4746173a8ca1c1c7f042e bcache: fix potential deadlock in cached_def_free()
  80265d8dfd77792e133793cef44a21323aac2908 bcache: acquire bch_register_lock later in cached_dev_free()
  ce4c3e19e5201424357a0c82176633b32a98d2ec bcache: Replace bch_read_string_list() by __sysfs_match_string()
  ecb37ce9baac653cc09e2b631393dde3df82979f bcache: Move couple of functions to sysfs.c
  04cbc21137bfa4d7b8771a5b14f3d6c9b2aee671 bcache: Move couple of string arrays to sysfs.c
  5f2b18ec8e1643410a2369f06888951cdedea0bf bcache: Fix a compiler warning in bcache_device_init()
  20d3a518713e394efa5a899c84574b4b79ec5098 bcache: Reduce the number of sparse complaints about lock imbalances
  42361469ae84c851e40cb1f94c8c9a14cdd94039 bcache: Suppress more warnings about set-but-not-used variables
  f0d3814090ac77de94c42b7124c37ece23629197 bcache: Remove an unused variable
  47344e330eabc1515cbe6061eb337100a3ab6d37 bcache: Fix kernel-doc warnings
  9dfbdec7b7fea1ff1b7b5d5d12980dbc7dca46c7 bcache: Annotate switch fall-through
  4a4e443835a43a79113cc237c472c0d268eb1e1c bcache: Add __printf annotation to __bch_check_keys()
  fd01991d5c20098c5c1ffc4dca6c821cc60a2f74 bcache: Fix indentation
  ca71df31661a0518ed58a1a59cf1993962153ebb bcache: fix using of loop variable in memory shrink
  f3641c3abd1da978ee969b0203b71b86ec1bfa93 bcache: fix error return value in memory shrink
  688892b3bc05e25da94866e32210e5f503f16f69 bcache: fix incorrect sysfs output value of strip size
  09a44ca2114737e0932257619c16a2b50c7807f1 bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
  c4dc2497d50d9c6fb16aa0d07b6a14f3b2adb1e0 bcache: fix high CPU occupancy during journal
  a728eacbbdd229d1d903e46261c57d5206f87a4a bcache: add journal statistic
  616486ab52ab7f9739b066d958bdd20e65aefd74 bcache: fix writeback target calc on large devices
  1f0ffa67349c56ea54c03ccfd1e073c990e7411e bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
  eb8cbb6df38f6e5124a3d5f1f8a3dbf519537c60 bcache: improve bcache_reboot()
  9951379b0ca88c95876ad9778b9099e19a95d566 bcache: never writeback a discard operation

  [Regression Potential]

  The upstream fixes are all clean cherry picks from stable (most of
  them are small cleanups), so regression potential is minimal.

  The only special patch is "UBUNTU: SAUCE: bcache: fix deadlock in
  bcache_allocator()" that is addressing the main deadlock bug (that
  seems to be a mainline bug - not fixed yet). We should spend more time
  trying to reproduce this deadlock with a mainline kernel and post the
  patch to the LKML for review / feedback.

  However, considering that this patch seems to fix/prevent the specific
  deadlock problem reported in this bug (tested on the affected
  platform) it can be considered safe to apply it.

  [Original Bug Report]

  $ cat /proc/version_signature
  Ubuntu 4.15.0-29.31-generic 4.15.18

  $ lsb_release -rd
  Description: Ubuntu Cosmic Cuttlefish (development branch)
  Release: 18.10

  $ apt-cache policy linux-image-`uname -r`
  linux-image-4.15.0-29-generic:
    Installed: 4.15.0-29.31
    Candidate: 4.15.0-29.31
    Version table:
   *** 4.15.0-29.31 500
          500 http://archive.ubuntu.com/ubuntu cosmic/main amd64 Packages
          100 /var/lib/dpkg/status

  3) mkfs.ext4 /dev/bcache0 returns successful creating an ext4
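The self-wait described under [Impact] and the non-blocking change described under [Fix] can be illustrated with a toy model. This is only a sketch under loud assumptions: it is plain single-threaded Python, not kernel code and not the SAUCE patch; the class ToyCache, its fields, and the way the "garbage collector" frees a bucket synchronously are all invented for illustration. Only the function names bch_prio_write(), bch_bucket_alloc() and the idea of waking the garbage collector come from the bug description.

```python
from collections import deque


class ToyCache:
    """Toy stand-in for a bcache cache set (hypothetical, not kernel code)."""

    def __init__(self, free_buckets, reclaimable):
        self.free = deque(free_buckets)        # buckets ready to hand out
        self.reclaimable = deque(reclaimable)  # buckets GC could free
        self.gc_wakeups = 0

    def bucket_alloc(self, wait):
        """Models bch_bucket_alloc(): hand out a bucket or report failure."""
        if self.free:
            return self.free.popleft()
        if wait:
            # In the report, the caller sleeps on bucket_wait here. If the
            # caller is the allocator thread, the only thread that would
            # issue the wake-up is the one that is asleep.
            raise RuntimeError("allocator would wait on itself (deadlock)")
        return None  # non-blocking path: let the caller decide what to do

    def wake_gc(self):
        """Toy garbage collector: frees one bucket synchronously."""
        self.gc_wakeups += 1
        if self.reclaimable:
            self.free.append(self.reclaimable.popleft())

    def prio_write(self, wait):
        """Models bch_prio_write(): needs one bucket to write priorities."""
        bucket = self.bucket_alloc(wait)
        if bucket is None:
            # As in [Fix]: on allocation failure, wake the garbage collector
            # to improve the odds that a later attempt succeeds.
            self.wake_gc()
            return False
        return True

    def allocator_step(self):
        """One pass of the allocator loop using the non-blocking call."""
        return self.prio_write(wait=False)
```

With no free buckets, a blocking call from the allocator can never make progress, while the non-blocking variant fails once, wakes the collector, and succeeds on the next pass. The real patch deals with locking and wait queues that this model leaves out entirely.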
[Kernel-packages] [Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout
** Attachment added: "spinda.maas-curtin_config.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1784665/+attachment/5284072/+files/spinda.maas-curtin_config.txt

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1784665

Title:
  bcache: bch_allocator_thread(): hung task timeout

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
[Kernel-packages] [Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout
We're still seeing a bcache timeout failure during curtin install:

2019-08-22T10:16:40+00:00 spinda cloud-init[1604]: finish: cmd-install/stage-partitioning/builtin/cmd-block-meta/clear-holders: FAIL: removing previous storage devices
2019-08-22T10:16:40+00:00 spinda cloud-init[1604]: TIMED BLOCK_META: 1203.679

I attached the rsyslog from a unit that failed.

Linux version 4.15.0-59-generic (buildd@lgw01-amd64-035) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #66-Ubuntu SMP Wed Aug 14 10:56:44 UTC 2019 (Ubuntu 4.15.0-59.66-generic 4.15.18)

** Attachment added: "messages"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1784665/+attachment/5284071/+files/messages

** Tags removed: verification-done-bionic
** Tags added: verification-failed-bionic

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1784665

Title:
  bcache: bch_allocator_thread(): hung task timeout

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Re: [Kernel-packages] [Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures
This is difficult for us to test in our lab because we are using MAAS,
and we hit this during MAAS deployments of nodes, so we would need MAAS
images built with these kernels. Additionally, this doesn't reproduce
every time; it shows up in maybe 1 in 4 test runs. It may be best to
find a way to reproduce this outside of MAAS.

On Wed, Jul 3, 2019 at 11:16 AM Andrea Righi wrote:

> From a kernel perspective this big slowness on shutting down a bcache
> volume might be caused by a locking / race condition issue. If I read
> correctly this problem has been reproduced in bionic (and in xenial we
> even got a kernel oops - it looks like caused by a NULL pointer
> dereference). I would try to address these issues separately.
>
> About bionic it would be nice to test this commit (also mentioned by
> @elmo in comment #28):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb8cbb6df38f6e5124a3d5f1f8a3dbf519537c60
>
> Moreover, even if we didn't get an explicit NULL pointer dereference
> with bionic, I think it would be interesting to test also the following
> fixes:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a4b732a248d12cbdb46999daf0bf288c011335eb
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f0ffa67349c56ea54c03ccfd1e073c990e7411e
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9951379b0ca88c95876ad9778b9099e19a95d566
>
> I've already backported all of them and applied to the latest bionic
> kernel. A test kernel is available here:
>
> https://kernel.ubuntu.com/~arighi/LP-1796292/
>
> If it doesn't cost too much it would be great to do a test with it. In
> the meantime I'll try to reproduce the problem locally. Thanks in
> advance!
>
> --
> You received this bug notification because you are a member of Canonical
> Field High, which is subscribed to the bug report.
> https://bugs.launchpad.net/bugs/1796292
>
> Title:
>   Tight timeout for bcache removal causes spurious failures
>
> Status in curtin:
>   Fix Released
> Status in linux package in Ubuntu:
>   Confirmed
> Status in linux source package in Bionic:
>   New
> Status in linux source package in Cosmic:
>   New
> Status in linux source package in Disco:
>   New
> Status in linux source package in Eoan:
>   Confirmed
>
> Bug description:
>   I've had a number of deployment faults where curtin would report
>   Timeout exceeded for removal of /sys/fs/bcache/xxx when doing a
>   mass-deployment of 30+ nodes. Upon retrying the node would usually
>   deploy fine. Experimentally I've set the timeout ridiculously high,
>   and it seems I'm getting no faults with this. I'm wondering if the
>   timeout for removal is set too tight, or might need to be made
>   configurable.
>
>   --- curtin/util.py~ 2018-05-18 18:40:48.0 +
>   +++ curtin/util.py 2018-10-05 09:40:06.807390367 +
>   @@ -263,7 +263,7 @@
>        return _subp(*args, **kwargs)
>
>
>   -def wait_for_removal(path, retries=[1, 3, 5, 7]):
>   +def wait_for_removal(path, retries=[1, 3, 5, 7, 1200, 1200]):
>        if not path:
>            raise ValueError('wait_for_removal: missing path parameter')
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/curtin/+bug/1796292/+subscriptions
>

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1796292

Title:
  Tight timeout for bcache removal causes spurious failures

Status in curtin:
  Fix Released
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  New
Status in linux source package in Disco:
  New
Status in linux source package in Eoan:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/curtin/+bug/1796292/+subscriptions
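For readers unfamiliar with the retry list in the diff, the behaviour under discussion can be sketched as follows. This is a minimal sketch, not curtin's actual implementation: only the function name, the `retries` parameter, the ValueError and the "Timeout exceeded for removal of" wording appear in the report. The `sleep` and `exists` parameters are hypothetical hooks added here so the function can be exercised without a real /sys/fs/bcache path, and the use of OSError for the timeout is an assumption.

```python
import os
import time


def wait_for_removal(path, retries=(1, 3, 5, 7),
                     sleep=time.sleep, exists=os.path.exists):
    """Wait until `path` disappears, sleeping retries[i] seconds between checks.

    Sketch only. `sleep` and `exists` are hypothetical injection points for
    testing and are not taken from the curtin source.
    """
    if not path:
        raise ValueError('wait_for_removal: missing path parameter')

    for delay in retries:
        if not exists(path):
            return  # the path is gone, nothing left to wait for
        sleep(delay)

    # One last check after the final sleep before giving up.
    if exists(path):
        raise OSError('Timeout exceeded for removal of %s' % path)
```

With the default schedule the total wait is 1 + 3 + 5 + 7 = 16 seconds; the reporter's experimental change appends two 1200 second waits, which is why it hides the failure rather than explaining it. A tuple is used for the default here instead of the list in the diff purely to avoid a mutable default argument; making the schedule configurable, as the report suggests, would amount to passing a different `retries` value.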
[Kernel-packages] [Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures
** Tags added: cdo-qa foundations-engine

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1796292

Title:
  Tight timeout for bcache removal causes spurious failures

Status in curtin:
  New
Status in linux package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/curtin/+bug/1796292/+subscriptions
[Kernel-packages] [Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures
This occurs on a target machine during MAAS install. Apport is not
collected in this case.

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1796292

Title:
  Tight timeout for bcache removal causes spurious failures

Status in curtin:
  New
Status in linux package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/curtin/+bug/1796292/+subscriptions
[Kernel-packages] [Bug 1797581] Re: Composing a VM in MAAS with exactly 2048 MB RAM causes the VM to kernel panic
@Christian

- release: bionic
- seabios: 1.10.2-1ubuntu1
- qemu: 1:2.11+dfsg-1ubuntu7.10
- libvirt: 4.0.0-1ubuntu8.8
- ovmf - this is a uefi thing right? we're not using it.
- kernel
  2019-03-18T12:17:11+00:00 elastic-2 kernel: [0.00] Linux version 4.15.0-46-generic (buildd@lgw01-amd64-038) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 (Ubuntu 4.15.0-46.49-generic 4.15.18)

I don't have copies of the binaries from this run - it was from daily
maas images:

2019-03-18T12:03:37.296671+00:00 leafeon maas.import-images: [info] Region downloading image descriptions from 'http://images.maas.io/ephemeral-v3/daily/'.

I don't see anything in the logs to indicate an ID number for the
kernel, initrd, or image coming there.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1797581

Title:
  Composing a VM in MAAS with exactly 2048 MB RAM causes the VM to
  kernel panic

Status in MAAS:
  Incomplete
Status in linux package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  Using latest MAAS master, I'm unable to compose a VM over the UI
  successfully when composed with 2048 MB of RAM.

  By that I mean that the VM is created, but it fails with a kernel
  panic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1797581/+subscriptions
[Kernel-packages] [Bug 1797581] Re: Composing a VM in MAAS with exactly 2048 MB RAM causes the VM to kernel panic
Bumped to field-high as we ran into this again in testing. We have a workaround, but it's to not use 2G VM's, which is really silly and hard to remember when we go and add new deployments, especially because the failure mode is not obvious at all. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1797581 Title: Composing a VM in MAAS with exactly 2048 MB RAM causes the VM to kernel panic Status in MAAS: Incomplete Status in linux package in Ubuntu: Confirmed Status in qemu package in Ubuntu: Confirmed Bug description: Using latest MAAS master, I'm unable to compose a VM over the UI successfully when composed with 2048 MB of RAM. By that I mean that the VM is created, but it fails with a kernel panic. To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1797581/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
Re: [Kernel-packages] [Bug 1820287] Re: kernel panic during pxe boot on DL360 gen9
This happens only sporadically. If it happens, is there some keyboard sequence I can use to dump more information, or is the system totally frozen at this point? Jason On Sat, Mar 16, 2019 at 11:35 AM Kai-Heng Feng wrote: > Would it be possible to get earlier trace? > > -- > You received this bug notification because you are subscribed to the bug > report. > https://bugs.launchpad.net/bugs/1820287 > > Title: > kernel panic during pxe boot on DL360 gen9 > > Status in linux package in Ubuntu: > Confirmed > > Bug description: > A machine in our test lab kernel panic'd during PXE boot from MAAS. > > It was running 4.15.0-46-generic #49-Ubuntu > > I've attached a screenshot of the call trace. > > To manage notifications about this bug go to: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820287/+subscriptions > -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1820287 Title: kernel panic during pxe boot on DL360 gen9 Status in linux package in Ubuntu: Confirmed Bug description: A machine in our test lab kernel panic'd during PXE boot from MAAS. It was running 4.15.0-46-generic #49-Ubuntu I've attached a screenshot of the call trace. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820287/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1820287] Re: kernel panic during pxe boot on DL360 gen9
I can't get logs from the system because it's kernel panic'd. ** Changed in: linux (Ubuntu) Status: Incomplete => New ** Changed in: linux (Ubuntu) Status: New => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1820287 Title: kernel panic during pxe boot on DL360 gen9 Status in linux package in Ubuntu: Confirmed Bug description: A machine in our test lab kernel panic'd during PXE boot from MAAS. It was running 4.15.0-46-generic #49-Ubuntu I've attached a screenshot of the call trace. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820287/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1820287] [NEW] kernel panic during pxe boot on DL360 gen9
Public bug reported: A machine in our test lab kernel panic'd during PXE boot from MAAS. It was running 4.15.0-46-generic #49-Ubuntu I've attached a screenshot of the call trace. ** Affects: linux (Ubuntu) Importance: Undecided Status: New ** Tags: cdo-qa foundations-engine ** Attachment added: "kernel panic beartic" https://bugs.launchpad.net/bugs/1820287/+attachment/5246430/+files/kernel%20panic%20beartic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1820287 Title: kernel panic during pxe boot on DL360 gen9 Status in linux package in Ubuntu: New Bug description: A machine in our test lab kernel panic'd during PXE boot from MAAS. It was running 4.15.0-46-generic #49-Ubuntu I've attached a screenshot of the call trace. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820287/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1772490] Re: 'Deploying' timed out after 40 minutes / Failedbcache: register_bcache() error
*** This bug is a duplicate of bug 1768893 *** https://bugs.launchpad.net/bugs/1768893 ** This bug has been marked a duplicate of bug 1768893 installation on several nodes failed with errors relating to dmsetup remove of ceph devices. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1772490 Title: 'Deploying' timed out after 40 minutes / Failedbcache: register_bcache() error Status in curtin: Invalid Status in MAAS: Invalid Status in linux package in Ubuntu: Incomplete Bug description: We have a few runs over the weekend failed to deploy with maas 2.3.3. May 21 11:33:50 swoobat maas.node: [info] geodude: Status transition from DEPLOYING to FAILED_DEPLOYMENT May 21 11:33:50 swoobat maas.node: [error] geodude: Marking node failed: Node operation 'Deploying' timed out after 40 minutes. https://solutions.qa.canonical.com/#/qa/testRun/67dae845-b22e- 4de1-9b30-0ecb28eb3c35 To manage notifications about this bug go to: https://bugs.launchpad.net/curtin/+bug/1772490/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1759445] Re: kernel panic when trying to reboot in bionic
After updating firmware on the servers, we can't reproduce it at all anymore. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic Status in MAAS: Invalid Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Incomplete Bug description: cpe_foundation test deployment of Bionic failed. After some investigation, it looks like the nodes deployed and installed bionic, but never came back from a reboot. Accessing the ILO console of a node in question (all nodes failed), it revealed a kernel panic (attached) To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1759445/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1759445] Re: kernel panic when trying to reboot in bionic
So far we've only been able to produce this by doing bionic deploys. One thing that stands out in the rsyslog for bionic deploys is this failure: http://paste.ubuntu.com/p/y8xXc7PYjp/ Apr 2 17:48:35 leafeon blkdeactivate[1782]: /sbin/blkdeactivate: line 345: /bin/sort: No such file or directory Could it be related? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic Status in MAAS: Invalid Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Incomplete Bug description: cpe_foundation test deployment of Bionic failed. After some investigation, it looks like the nodes deployed and installed bionic, but never came back from a reboot. Accessing the ILO console of a node in question (all nodes failed), it revealed a kernel panic (attached) To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1759445/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1759445] Re: kernel panic when trying to reboot in bionic
We reproduced it again... looking to try the testing now. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic Status in MAAS: Invalid Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Incomplete Bug description: cpe_foundation test deployment of Bionic failed. After some investigation, it looks like the nodes deployed and installed bionic, but never came back from a reboot. Accessing the ILO console of a node in question (all nodes failed), it revealed a kernel panic (attached) To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1759445/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1759445] Re: kernel panic when trying to reboot in bionic
We can no longer reproduce this. ** Changed in: linux (Ubuntu Bionic) Status: Triaged => Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic Status in MAAS: Invalid Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Incomplete Bug description: cpe_foundation test deployment of Bionic failed. After some investigation, it looks like the nodes deployed and installed bionic, but never came back from a reboot. Accessing the ILO console of a node in question (all nodes failed), it revealed a kernel panic (attached) To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1759445/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1759445] Re: kernel panic when trying to reboot in bionic
** Tags added: foundations-engine -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic Status in MAAS: Invalid Status in linux package in Ubuntu: Triaged Status in linux source package in Bionic: Triaged Bug description: cpe_foundation test deployment of Bionic failed. After some investigation, it looks like the nodes deployed and installed bionic, but never came back from a reboot. Accessing the ILO console of a node in question (all nodes failed), it revealed a kernel panic (attached) To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1759445/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1759445] Re: Bionic due to kernel panic
This bug is a kernel panic when rebooting at the end of a MAAS deployment of bionic; there is no way to run apport-collect. ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed ** Summary changed: - Bionic due to kernel panic + kernel panic when trying to reboot in bionic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic Status in MAAS: Invalid Status in linux package in Ubuntu: Confirmed Bug description: cpe_foundation test deployment of Bionic failed. After some investigation, it looks like the nodes deployed and installed bionic, but never came back from a reboot. Accessing the ILO console of a node in question (all nodes failed), it revealed a kernel panic (attached) To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1759445/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1742505] Re: gre_sys set to default 1472 when using path_mtu > 1500 with ovs 2.8.x
@james-page When will the 2.8.1 release be?

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1742505

Title:
  gre_sys set to default 1472 when using path_mtu > 1500 with ovs 2.8.x

Status in Ubuntu Cloud Archive:
  In Progress
Status in Ubuntu Cloud Archive pike series:
  In Progress
Status in Ubuntu Cloud Archive queens series:
  In Progress
Status in neutron:
  Invalid
Status in linux package in Ubuntu:
  Confirmed
Status in openvswitch package in Ubuntu:
  In Progress
Status in linux source package in Artful:
  Confirmed
Status in openvswitch source package in Artful:
  In Progress
Status in linux source package in Bionic:
  Confirmed
Status in openvswitch source package in Bionic:
  In Progress

Bug description:
  [Impact]
  OpenStack Clouds using GRE overlay tunnels with > 1500 MTU's will
  observe packet fragmentation/networking issues for traffic in overlay
  networks.

  [Test Case]
  Deploy OpenStack Pike (xenial + pike UCA or artful)
  Create tenant networks using GRE segmentation
  Boot instances
  Instance networking will be broken/slow

  gre_sys devices will be set to mtu=1472 on hypervisor hosts.

  [Regression Potential]
  Minimal; the fix to OVS works around an issue for GRE tunnel port
  setup via rtnetlink by performing a second request once the gre
  device is set up to set the MTU to a high value (65000).

  [Original Bug Report]
  Setup:
  Pike neutron 11.0.2-0ubuntu1.1~cloud0
  OVS 2.8.0
  Jumbo frames settings per: https://docs.openstack.org/mitaka/networking-guide/config-mtu.html
  global_physnet_mtu = 9000
  path_mtu = 9000

  Symptoms:
  gre_sys MTU is 1472
  Instances with MTUs > 1500 fail to communicate across GRE

  Temporary Workaround:
  ifconfig gre_sys mtu 9000
  Note: When ovs rebuilds tunnels, such as on a restart, gre_sys MTU is
  set back to default 1472.

  Note: downgrading from OVS 2.8.0 to 2.6.1 resolves the issue.

  Previous behavior:
  With Ocata or Pike and OVS 2.6.x
  gre_sys MTU defaults to 65490
  It remains at 65490 through restarts.

  This may be related to some combination of the following changes in
  OVS which seem to imply MTUs must be set in the ovs database for
  tunnel interfaces and patches:
  https://github.com/openvswitch/ovs/commit/8c319e8b73032e06c7dd1832b3b31f8a1189dcd1
  https://github.com/openvswitch/ovs/commit/3a414a0a4f1901ba015ec80b917b9fb206f3c74f
  https://github.com/openvswitch/ovs/blob/6355db7f447c8e83efbd4971cca9265f5e0c8531/datapath/vport-internal_dev.c#L186

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1742505/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
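Since the gre_sys MTU in bug 1742505 silently falls back to 1472 whenever OVS rebuilds its tunnels, a small check on the hypervisor can help spot the regression. The sketch below is only an illustration, not part of the OVS fix: it reads the MTU from the standard sysfs file /sys/class/net/<iface>/mtu and compares it with the configured path_mtu. The function names and the `sysfs_root` parameter are hypothetical, and treating "device MTU below path_mtu" as the failure condition is an assumption drawn from the symptoms listed in the report.

```python
import os


def read_mtu(iface, sysfs_root='/sys/class/net'):
    """Return the MTU of `iface` as an int, read from sysfs."""
    with open(os.path.join(sysfs_root, iface, 'mtu')) as f:
        return int(f.read().strip())


def mtu_too_small(iface, path_mtu, sysfs_root='/sys/class/net'):
    """True if the device MTU is below the configured path_mtu.

    Assumption: a tunnel device whose MTU is lower than path_mtu
    (e.g. 1472 vs 9000) is the broken state described in the report.
    """
    return read_mtu(iface, sysfs_root) < path_mtu
```

On an affected host, mtu_too_small('gre_sys', 9000) would be expected to return True after an OVS restart, and False on OVS 2.6.x where the report says the device keeps an MTU of 65490.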
[Kernel-packages] [Bug 1737640] Re: /usr/sbin/fanctl: arithmetic expression: expecting primary | unconfigured interfaces cause ifup failures
Testing on arm64, the workaround of adding xenial-proposed via maas doesn't work - the newer ubuntu-fan package isn't being installed http://paste.ubuntu.com/26178859/ I don't know how that can be, since the repo is being added (or should be added) before juju installs ubuntu-fan. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to ubuntu-fan in Ubuntu. https://bugs.launchpad.net/bugs/1737640 Title: /usr/sbin/fanctl: arithmetic expression: expecting primary | unconfigured interfaces cause ifup failures Status in juju: Triaged Status in ubuntu-fan package in Ubuntu: Confirmed Status in ubuntu-fan source package in Xenial: Fix Committed Bug description: I'm seeing this error as the status of multiple containers in my deploy: http://paste.ubuntu.com/26166720/ I can't connect to the parent machines anymore either - it seems networking is totally hosed on the machines. This is with juju 2.3.1. To manage notifications about this bug go to: https://bugs.launchpad.net/juju/+bug/1737640/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1737640] Re: /usr/sbin/fanctl: arithmetic expression: expecting primary | unconfigured interfaces cause ifup failures
I just tested this also and can verify it fixed it in the environment/test where it was originally reported as broken. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to ubuntu-fan in Ubuntu. https://bugs.launchpad.net/bugs/1737640 Title: /usr/sbin/fanctl: arithmetic expression: expecting primary | unconfigured interfaces cause ifup failures Status in juju: Triaged Status in ubuntu-fan package in Ubuntu: Confirmed Status in ubuntu-fan source package in Xenial: Fix Committed Bug description: I'm seeing this error as the status of multiple containers in my deploy: http://paste.ubuntu.com/26166720/ I can't connect to the parent machines anymore either - it seems networking is totally hosed on the machines. This is with juju 2.3.1. To manage notifications about this bug go to: https://bugs.launchpad.net/juju/+bug/1737640/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
Re: [Kernel-packages] [Bug 1641593] Re: unable to enable iommu on HPE Proliant Gen9 server
After some more testing, it looks like the systems without the firmware update are not stable.

I can sometimes, but not always, get them to boot using either 4.4.0-59-generic #80 or 4.4.0-62-generic #83, but once they're up, they don't last long. The longest I've seen is about 40 minutes before getting "sd 0:0:2:0: rejecting I/O to offline device" errors and /dev/sda going offline. I can get it to go offline quicker - almost immediately - by doing "cat /dev/urandom > /dev/sdb".

The two systems with the firmware updates both reliably boot up and stay up using either 4.4.0-59-generic #80 or 4.4.0-62-generic #83, and haven't gone offline yet from the "cat /dev/urandom > /dev/sdb" test. I will leave them running overnight.

On Mon, Jan 30, 2017 at 6:51 PM, Jason Hobbs <jason.ho...@canonical.com> wrote:

> So, I appear to have spoken too soon on exactly what fixes this.
>
> We have two systems being tested with 4.4.0-62-generic #83 - one with the
> firmware update and one without.
>
> The one with the firmware updates has been up for over 6 hours now without
> any issues.
>
> The one without firmware updates has been up for 40 minutes and is getting
> I/O errors now.
>
> I'm also seeing a system with 4.4.0-59-generic #80 and no firmware updates
> boot up with iommu enabled, I will see how long it stays up..
>
> I'll also test with 4.4.0-59-generic #80 and the firmware updates.
>
> On Mon, Jan 30, 2017 at 6:13 PM, Jason Hobbs <jason.ho...@canonical.com>
> wrote:
>
>> We found testing with the latest Xenial kernel (4.4.0.62.65) from
>> https://launchpad.net/~canonical-kernel-
>> team/+archive/ubuntu/ppa/+build/11278866 fixes this issue - no firmware
>> updates required. We did also test with just the latest firmware
>> updates, and that did not fix the issue. Latest firmware + 4.4.0.62.65
>> also works.
>>
>> --
>> You received this bug notification because you are subscribed to the bug
>> report.
>> https://bugs.launchpad.net/bugs/1641593 >> >> Title: >> unable to enable iommu on HPE Proliant Gen9 server >> >> Status in linux package in Ubuntu: >> Incomplete >> >> Bug description: >> I'm using MAAS to enable the following kernel flags on install/boot: >> >> iommu=pt intel_iommu=on >> >> in order to be able to passthrough SR-IOV VF functions to KVM guess; >> however when these options are enabled, the servers fail to install >> (see attached screenshot). >> >> The install eventually fails - it looks like the writes back to one of >> the disks starts to fail for some reason. >> >> Servers are targeted with Xenial and the release 4.4 kernel (no HWE). >> >> Here's the LSHW output from the system: >> http://pastebin.ubuntu.com/23875929/ >> >> To manage notifications about this bug go to: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1641593 >> /+subscriptions >> > > -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1641593 Title: unable to enable iommu on HPE Proliant Gen9 server Status in linux package in Ubuntu: Incomplete Bug description: I'm using MAAS to enable the following kernel flags on install/boot: iommu=pt intel_iommu=on in order to be able to passthrough SR-IOV VF functions to KVM guess; however when these options are enabled, the servers fail to install (see attached screenshot). The install eventually fails - it looks like the writes back to one of the disks starts to fail for some reason. Servers are targeted with Xenial and the release 4.4 kernel (no HWE). 
Here's the LSHW output from the system: http://pastebin.ubuntu.com/23875929/ To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1641593/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
Re: [Kernel-packages] [Bug 1641593] Re: unable to enable iommu on HPE Proliant Gen9 server
So, I appear to have spoken too soon on exactly what fixes this. We have two systems being tested with 4.4.0-62-generic #83 - one with the firmware update and one without. The one with the firmware updates has been up for over 6 hours now without any issues. The one without firmware updates has been up for 40 minutes and is getting I/O errors now. I'm also seeing a system with 4.4.0-59-generic #80 and no firmware updates boot up with iommu enabled, I will see how long it stays up.. I'll also test with 4.4.0-59-generic #80 and the firmware updates. On Mon, Jan 30, 2017 at 6:13 PM, Jason Hobbs <jason.ho...@canonical.com> wrote: > We found testing with the latest Xenial kernel (4.4.0.62.65) from > https://launchpad.net/~canonical-kernel- > team/+archive/ubuntu/ppa/+build/11278866 fixes this issue - no firmware > updates required. We did also test with just the latest firmware > updates, and that did not fix the issue. Latest firmware + 4.4.0.62.65 > also works. > > -- > You received this bug notification because you are subscribed to the bug > report. > https://bugs.launchpad.net/bugs/1641593 > > Title: > unable to enable iommu on HPE Proliant Gen9 server > > Status in linux package in Ubuntu: > Incomplete > > Bug description: > I'm using MAAS to enable the following kernel flags on install/boot: > > iommu=pt intel_iommu=on > > in order to be able to passthrough SR-IOV VF functions to KVM guess; > however when these options are enabled, the servers fail to install > (see attached screenshot). > > The install eventually fails - it looks like the writes back to one of > the disks starts to fail for some reason. > > Servers are targeted with Xenial and the release 4.4 kernel (no HWE). 
> > Here's the LSHW output from the system: > http://pastebin.ubuntu.com/23875929/ > > To manage notifications about this bug go to: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/ > 1641593/+subscriptions > -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1641593 Title: unable to enable iommu on HPE Proliant Gen9 server Status in linux package in Ubuntu: Incomplete Bug description: I'm using MAAS to enable the following kernel flags on install/boot: iommu=pt intel_iommu=on in order to be able to passthrough SR-IOV VF functions to KVM guess; however when these options are enabled, the servers fail to install (see attached screenshot). The install eventually fails - it looks like the writes back to one of the disks starts to fail for some reason. Servers are targeted with Xenial and the release 4.4 kernel (no HWE). Here's the LSHW output from the system: http://pastebin.ubuntu.com/23875929/ To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1641593/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1641593] Re: unable to enable iommu on HPE Proliant Gen9 server
We found testing with the latest Xenial kernel (4.4.0.62.65) from https://launchpad.net/~canonical-kernel- team/+archive/ubuntu/ppa/+build/11278866 fixes this issue - no firmware updates required. We did also test with just the latest firmware updates, and that did not fix the issue. Latest firmware + 4.4.0.62.65 also works. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1641593 Title: unable to enable iommu on HPE Proliant Gen9 server Status in linux package in Ubuntu: Incomplete Bug description: I'm using MAAS to enable the following kernel flags on install/boot: iommu=pt intel_iommu=on in order to be able to passthrough SR-IOV VF functions to KVM guess; however when these options are enabled, the servers fail to install (see attached screenshot). The install eventually fails - it looks like the writes back to one of the disks starts to fail for some reason. Servers are targeted with Xenial and the release 4.4 kernel (no HWE). Here's the LSHW output from the system: http://pastebin.ubuntu.com/23875929/ To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1641593/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1641593] Re: unable to enable iommu on HPE Proliant Gen9 server
** Description changed: I'm using MAAS to enable the following kernel flags on install/boot: iommu=pt intel_iommu=on in order to be able to passthrough SR-IOV VF functions to KVM guess; however when these options are enabled, the servers fail to install (see attached screenshot). The install eventually fails - it looks like the writes back to one of the disks starts to fail for some reason. Servers are targeted with Xenial and the release 4.4 kernel (no HWE). + + Here's the LSHW output from the system: + http://pastebin.ubuntu.com/23875929/ -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1641593 Title: unable to enable iommu on HPE Proliant Gen9 server Status in linux package in Ubuntu: Incomplete Bug description: I'm using MAAS to enable the following kernel flags on install/boot: iommu=pt intel_iommu=on in order to be able to passthrough SR-IOV VF functions to KVM guess; however when these options are enabled, the servers fail to install (see attached screenshot). The install eventually fails - it looks like the writes back to one of the disks starts to fail for some reason. Servers are targeted with Xenial and the release 4.4 kernel (no HWE). Here's the LSHW output from the system: http://pastebin.ubuntu.com/23875929/ To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1641593/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
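When reproducing bug 1641593 it is easy to lose track of which test machine actually booted with the IOMMU flags. As a rough aid, the sketch below parses a kernel command line (for example the contents of /proc/cmdline) and reports whether both `iommu=pt` and `intel_iommu=on` from the bug description are present. The helper names are hypothetical, and the parser is deliberately simple: it splits on whitespace and does not handle quoted values that contain spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition('=')
        params[key] = value if sep else True
    return params


def iommu_requested(cmdline):
    """True if both flags from the bug description are on the command line."""
    params = parse_cmdline(cmdline)
    return params.get('iommu') == 'pt' and params.get('intel_iommu') == 'on'
```

On a running system this could be called as iommu_requested(open('/proc/cmdline').read()). Note that it only confirms the flags were passed, not that the IOMMU was actually enabled by the kernel.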
[Kernel-packages] [Bug 1618572] Re: apt-key add fails in overlayfs
** Tags added: oil

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1618572

Title:
  apt-key add fails in overlayfs

Status in cloud-init:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Committed

Bug description:
  Sending a custom APT config to cloud-init fails to:

  1. add keys
  2. configure sources
  3. configure additional repository.

  The same config is being sent to curtin, and curtin doesn't seem to
  fail (curtin install log http://paste.ubuntu.com/23112826/ just in
  case).

  config sent by maas = http://pastebin.ubuntu.com/23112834/
  cloud-init.log = http://paste.ubuntu.com/23112820/
  cloud-init-output.log = http://paste.ubuntu.com/23112822/
  sources.list = http://paste.ubuntu.com/23112824/

  ubuntu@node03:/var/log$ ls -l /etc/apt/sources.list.d/
  total 0

  ubuntu@node03:/var/log$ sudo apt-get update
  Hit:2 http://us.archive.ubuntu.com/ubuntu yakkety-updates InRelease
  Get:3 http://us.archive.ubuntu.com/ubuntu yakkety-backports InRelease [92.2 kB]
  Err:2 http://us.archive.ubuntu.com/ubuntu yakkety-updates InRelease
    The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32
  Ign:3 http://us.archive.ubuntu.com/ubuntu yakkety-backports InRelease
  Hit:4 http://us.archive.ubuntu.com/ubuntu yakkety-security InRelease
  Get:1 http://us.archive.ubuntu.com/ubuntu yakkety InRelease [247 kB]
  Err:4 http://us.archive.ubuntu.com/ubuntu yakkety-security InRelease
    The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32
  Err:1 http://us.archive.ubuntu.com/ubuntu yakkety InRelease
    The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32
  Fetched 339 kB in 0s (388 kB/s)
  Reading package lists... Error!
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://us.archive.ubuntu.com/ubuntu yakkety-updates InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32 W: GPG error: http://us.archive.ubuntu.com/ubuntu yakkety-backports InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32 W: The repository 'http://us.archive.ubuntu.com/ubuntu yakkety-backports InRelease' is not signed. N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use. N: See apt-secure(8) manpage for repository creation and user configuration details. W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://us.archive.ubuntu.com/ubuntu yakkety-security InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32 W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. 
GPG error: http://us.archive.ubuntu.com/ubuntu yakkety InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32 W: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/yakkety/InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32 W: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/yakkety-updates/InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32 W: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/yakkety-security/InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 40976EAF437D05B5 NO_PUBKEY 3B4FE6ACC0B21F32 W: Some index files failed to download. They have been ignored, or old ones used instead. E: Problem renaming the file /var/cache/apt/srcpkgcache.bin.3HKvbX to /var/cache/apt/srcpkgcache.bin - rename (116: Stale file handle) E: Problem renaming the file /var/cache/apt/pkgcache.bin.d0JUHJ to /var/cache/apt/pkgcache.bin - rename (116: Stale file handle) W: You may want to run apt-get update to correct these problems E: The package cache file is corrupted To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1618572/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to :
[Kernel-packages] [Bug 1464442] Re: installing or upgrading libc6 in Trusty removes all content from /tmp directory
Steve's suggested workaround: # dpkg-divert --rename --add /sbin/telinit # cat > /sbin/telinit #!/bin/sh exit 0 ^D # apt-get install [...] # dpkg-divert --rename --remove /sbin/telinit -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1464442 Title: installing or upgrading libc6 in Trusty removes all content from /tmp directory Status in linux package in Ubuntu: Invalid Status in upstart package in Ubuntu: Triaged Bug description: We are seeing an issue with installation of the dkms package during a curtin installation which ends up with the /tmp directory being wiped clean. This is very bad for curtin as it saves critical installation files in /tmp. It turns out that it's the act of upgrading libc6, which is triggered as a result of installing dependencies, that removes the content of /tmp. For example, installation of gcc has the same effect, since it ends up with libc6 being upgraded. The only way that this won't be recreated is if the latest libc6 is already installed. This problem does not exist in precise. It can also be recreated by installing the .deb file for any version in trusty including 2.17. ubuntu@host:~$ ls /tmp tmpHHbRkP ubuntu@sirrush:~$ sudo apt-get install libc6 sudo: unable to resolve host sirrush Reading package lists... Done Building dependency tree Reading state information... Done The following extra packages will be installed: libc-dev-bin libc6-dev Suggested packages: glibc-doc Recommended packages: manpages-dev The following packages will be upgraded: libc-dev-bin libc6 libc6-dev 3 upgraded, 0 newly installed, 0 to remove and 148 not upgraded. Need to get 6,714 kB of archives. After this operation, 6,144 B disk space will be freed. Do you want to continue? 
[Y/n] y Get:1 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libc6-dev amd64 2.19-0ubuntu6.6 [1,910 kB] Get:2 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libc-dev-bin amd64 2.19-0ubuntu6.6 [68.9 kB] Get:3 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libc6 amd64 2.19-0ubuntu6.6 [4,735 kB] Fetched 6,714 kB in 0s (18.5 MB/s) Preconfiguring packages ... (Reading database ... 57798 files and directories currently installed.) Preparing to unpack .../libc6-dev_2.19-0ubuntu6.6_amd64.deb ... Unpacking libc6-dev:amd64 (2.19-0ubuntu6.6) over (2.19-0ubuntu6.3) ... Preparing to unpack .../libc-dev-bin_2.19-0ubuntu6.6_amd64.deb ... Unpacking libc-dev-bin (2.19-0ubuntu6.6) over (2.19-0ubuntu6.3) ... Preparing to unpack .../libc6_2.19-0ubuntu6.6_amd64.deb ... Unpacking libc6:amd64 (2.19-0ubuntu6.6) over (2.19-0ubuntu6.3) ... Processing triggers for man-db (2.6.7.1-1) ... Setting up libc6:amd64 (2.19-0ubuntu6.6) ... Setting up libc-dev-bin (2.19-0ubuntu6.6) ... Setting up libc6-dev:amd64 (2.19-0ubuntu6.6) ... Processing triggers for libc-bin (2.19-0ubuntu6.3) ... ubuntu@host:~$ ls /tmp ubuntu@host:~$ This is very recreatable. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1464442/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1444003] Re: BUG: soft lockup - CPU#6 stuck for 22s! [systemd-udevd:166]
** Changed in: linux (Ubuntu) Status: Incomplete => New
https://bugs.launchpad.net/bugs/1444003
Title: BUG: soft lockup - CPU#6 stuck for 22s! [systemd-udevd:166]
Status in linux package in Ubuntu: Incomplete
Bug description: Seeing a number of failed deployments on the SM15K. This is from the console log for a failed deployment this morning:
[ 3310.700695] Call Trace:
[ 3310.700698] [81057180] ? try_to_free_pmd_page+0x50/0x50
[ 3310.700700] [8105709fff
[ 3310.700707] [810590ed] change_page_attr_set_clr+0x38d/0x4a0
[ 3310.700709] [a020a000] ? 0xa0209fff
[ 3310.700711] [810596cf] set_memory_ro+0x2f/0x40
[ 3310.700714] [8171bad4] set_section_ro_nx+0x3a/0x71
[ 3310.700716] [810e25a8] load_module+0x12c8/0x1b40
[ 3310.700719] [810de040] ? store_uevent+0x40/0x40
[ 3310.700722] [810e2f96] SyS_finit_module+0x86/0xb0
[ 3310.700725] [8173263d] system_call_fastpath+0x1a/0x1f
[ 3310.700745] Code: 1d a9 29 00 3b 05 8f 90 c3 00 89 c2 0f 8d 25 fe ff ff 48 98 49 8b 4d 00 4p - CPU#2 stuck for 23s! [systemd-udevd:159]
[ 3338.633651] Modules linked in: cryptd(+) e1000(+) ahci libahci
[ 3338.633653] CPU: 2 PID: 159 Comm: systemd-udevd Not tainted 3.13.0-49-generic #81-Ubuntu
[ 3338.633654] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0015 10/29/2012
[ 3338.633655] task: 88022f361800 ti: 8800be95c000 task.ti: 8800be95c000
[ 3338.633657] RIP: 0010:[810dc4ba] [810dc4ba] smp_call_function_many+0x26a/0x2d0
[ 3338.633658] RSP: 0018:8800be95db60 EFLAGS: 0202
[ 3338.633659] RAX: RBX: 88023fc93fc8 R09: 0004
[ 3338.633661] R10: 88023fc93fc8 R11: 880234259c28 R12: 061c
[ 3338.633662] R13: 8802341caf80 R14: 8802342eb180 R15:
[ 3338.633663] FS: 7f569e8ce880() GS:88023fc8() knlGS:
[ 3338.633664] CS: 0010 DS: ES: CR0: 80050033
[ 3338.633665] CR2: 02736138 CR3: be95b000 CR4: 000407e0
[ 3338.633666] Stack:
[ 3338.633669] 88023fc93fe8 00013f80 8800be95dbe8 8105c6e0
[ 3338.633671] 0101 0012 8105c6e0 fff8105c6e0] ? rbt_memtype_copy_nth_element+0xa0/0xa0
[ 3338.633680] [8105c6e0] ? rbt_memtype_copy_nth_element+0xa0/0xa0
[ 3338.633682] [810dc67d] on_each_cpu+0x2d/0x60
[ 3338.633685] [8105ccdd] flush_tlb_kernel_range+0x6d/0x70
[ 3338.633687] [81187555] __purge_vmap_area_lazy+0x335/0x430
[ 3338.633690] [811877b2] vm_unmap_aliases+0x162/0x180
[ 3338.633693] [81058e2e] change_page_attr_set_clr+0xce/0x4a0
[ 3338.633696] [81725ad1] ? __schedule+0x381/0x7d0
[ 3338.633699] [81059243] set_memory_x+0x43/0x50
[ 3338.633702] [ff ? store_uevent+0x40/0x40
[ 3338.633711] [810e2f96] SyS_finit_module+0x86/0xb0
[ 3338.633714] [8173263d] system_call_fastpath+0x1a/0x1f
[ 3338.633734] Code: 1d a9 29 00 3b 05 8f 90 c3 00 89 c2 0f 8d 25 fe ff ff 48 98 49 8b 4d 00 48 03 0c c5 20 37 d1 81 f6 41 20 01 74 cb 0f 1f 00 f3 90 f6 41 20 01 75 f8 eb be 0f b6 4d d0 48 8b 55 c0 44 89 ef 48 8b
[ 3338.701651] BUG: soft lockup - CPU#6 stuck for 22s! [systemd-udevd:166]
[ 3338.701654] Modules linked in: cryptd(+) e1000(+) ahci libahci
[ 3338.701655] CPU: 6 P
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
I don't believe there are plans to fix this against Saucy. ** Changed in: linux (Ubuntu Saucy) Status: Confirmed => Incomplete -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Fix Released Status in Open Compute Project: Fix Released Status in “linux” package in Ubuntu: Fix Released Status in “linux” source package in Saucy: Incomplete Bug description: The OCPv3 Roadrunner machine has been fully enabled and passes certification testing. When testing ipmitool locally I'm able to set up the BMC and users, etc. When using MAAS, MAAS is able to set up the BMC network information (I see that it changes that), but it appears to fail to set a username and password. If I try to use the username and password as defined in the MAAS GUI, it fails. Therefore commissioning and juju bootstrapping the node has to be done manually (by physically pushing the power button). If I use the username/password I've set on the BMC I can see that MAAS fails to set the username 'maas' and the password as defined in the MAAS GUI. Since the commissioning/enlisting process is temporary and I'm not sure how to log in to this phase to gather data, troubleshooting tips are welcome.
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
** Changed in: linux (Ubuntu) Status: Confirmed => Fix Released ** Changed in: opencompute Status: Confirmed => Fix Released -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Fix Released Status in Open Compute Project: Fix Released Status in “linux” package in Ubuntu: Fix Released Status in “linux” source package in Saucy: Confirmed
[Kernel-packages] [Bug 1292927] [NEW] temp bug - ignore for now!
Private bug reported: please ignore for now - filling out details in a minute ** Affects: linux (Ubuntu) Importance: Undecided Status: New ** Information type changed from Public to Private -- https://bugs.launchpad.net/bugs/1292927 Title: temp bug - ignore for now! Status in “linux” package in Ubuntu: New Bug description: please ignore for now - filling out details in a minute
[Kernel-packages] [Bug 1282329] Re: juju requires cpu-checker which is unavailable on arm64/ppc64el
** Tags added: server-hwe -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to cpu-checker in Ubuntu. https://bugs.launchpad.net/bugs/1282329 Title: juju requires cpu-checker which is unavailable on arm64/ppc64el Status in juju-core: Invalid Status in “cpu-checker” package in Ubuntu: Fix Released Bug description: I'm testing out deploying charms to an arm64 target using the manual provider. After hacking juju to recognize the arch, and hacking in the necessary tools, I reach the following failure: dannf@laptop:~$ juju add-machine ssh:arm64.dannf -v verbose is deprecated with the current meaning, use show-log 2014-02-19 23:55:44 INFO juju api.go:231 connecting to API addresses: [bootstrap.dannf:17070] 2014-02-19 23:55:44 INFO juju apiclient.go:118 state/api: dialing wss://bootstrap.dannf:17070/ 2014-02-19 23:55:44 INFO juju apiclient.go:128 state/api: connection established 2014-02-19 23:55:44 INFO juju.environs.manual init.go:156 initialising arm64.dannf, user 2014-02-19 23:55:44 INFO juju.environs.manual init.go:167 ubuntu user is already initialised 2014-02-19 23:55:44 INFO juju.environs.manual provisioner.go:260 addresses for arm64.dannf: [192.168.1.117 public:arm64.dannf] 2014-02-19 23:55:44 INFO juju.environs.manual init.go:29 Checking if arm64.dannf is already provisioned 2014-02-19 23:55:44 INFO juju.environs.manual init.go:46 arm64.dannf is not provisioned 2014-02-19 23:55:44 INFO juju.environs.manual init.go:55 Detecting series and characteristics on arm64.dannf 2014-02-19 23:55:45 INFO juju.environs.manual init.go:118 series: trusty, characteristics: arch=arm64 cpu-cores=1 mem=16062M Logging to /var/log/cloud-init-output.log on remote host Running apt-get update Installing package: git Installing package: cpu-checker 2014-02-19 23:56:23 ERROR juju.environs.manual provisioner.go:78 provisioning failed, removing machine 2: exit status 1 2014-02-19 23:56:23 ERROR juju.cmd supercommand.go:294 exit status 1 
The issue here is that cpu-checker is not available for arm64 (or ppc64el) in the archive.
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
** Tags added: server-hwe -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Fix Committed Status in Open Compute Project: Confirmed Status in “linux” package in Ubuntu: Confirmed Status in “linux” source package in Saucy: Confirmed
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
Hey Dustin - I reassigned to David since I'm not sure who will be testing it. David/Samantha/Rod - please reassign to whoever is doing the test! ** Changed in: maas Status: Triaged => In Progress ** Changed in: maas Assignee: Jason Hobbs (jason-hobbs) => David Duffey (david-duffey) -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: In Progress Status in Open Compute Project: Confirmed Status in “linux” package in Ubuntu: Confirmed Status in “linux” source package in Saucy: Confirmed
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
Cool David - let me know how it works out. The branch is otherwise complete/reviewed and ready to land. -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Triaged Status in Open Compute Project: Confirmed Status in “linux” package in Ubuntu: Confirmed Status in “linux” source package in Saucy: Confirmed
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
** Changed in: maas Milestone: None => 14.04 -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Triaged Status in Open Compute Project: Confirmed Status in “linux” package in Ubuntu: Confirmed Status in “linux” source package in Saucy: Confirmed
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
** Branch linked: lp:~jason-hobbs/maas/lp-1210393 -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Triaged Status in Open Compute Project: Confirmed Status in “linux” package in Ubuntu: Confirmed Status in “linux” source package in Saucy: Confirmed
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
I've posted a branch with a fix to lp:~jason-hobbs/maas/lp-1210393. I've manually tested this, but for lack of access, not on OCPv3 Roadrunner. -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Triaged Status in Open Compute Project: Confirmed Status in “linux” package in Ubuntu: Confirmed Status in “linux” source package in Saucy: Confirmed
[Kernel-packages] [Bug 1210393] Re: MAAS ipmi fails on OCPv3 Roadrunner
I've started work on a patch to fix this. It will find either an existing maas user, or will find the first disabled user with an empty username. If it can't find either it will bail and give up on automatic IPMI config. -- https://bugs.launchpad.net/bugs/1210393 Title: MAAS ipmi fails on OCPv3 Roadrunner Status in MAAS: Triaged Status in Open Compute Project: Confirmed Status in “linux” package in Ubuntu: Confirmed Status in “linux” source package in Saucy: Confirmed
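The selection rule described in the comment above (reuse an existing maas user, otherwise take the first disabled user with an empty username, otherwise give up) could be sketched roughly as follows. This is only an illustration, not the code from the linked branch: the `BmcUser` record and the `pick_maas_slot` function are hypothetical names, and the way the user list is read from the BMC is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BmcUser:
    """One BMC user slot (hypothetical shape, not the MAAS data model)."""
    slot: int
    name: str
    enabled: bool


def pick_maas_slot(users: Iterable[BmcUser], wanted: str = "maas") -> Optional[int]:
    """Return the slot to configure, or None if automatic setup should bail."""
    users = list(users)
    # First choice: a slot that already holds the wanted user name.
    for user in users:
        if user.name == wanted:
            return user.slot
    # Second choice: the first disabled slot with an empty user name.
    for user in users:
        if not user.enabled and user.name == "":
            return user.slot
    # Neither exists: the caller gives up on automatic IPMI config.
    return None


if __name__ == "__main__":
    slots = [
        BmcUser(slot=1, name="", enabled=True),
        BmcUser(slot=2, name="admin", enabled=True),
        BmcUser(slot=3, name="", enabled=False),
    ]
    print(pick_maas_slot(slots))
```

Returning `None` rather than raising keeps the "bail" case an ordinary outcome that the caller can log before falling back to manual IPMI setup.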