*** This bug is a duplicate of bug 1784665 ***
https://bugs.launchpad.net/bugs/1784665
This bug was fixed in the package linux - 5.2.0-13.14
---
linux (5.2.0-13.14) eoan; urgency=medium
* eoan/linux: 5.2.0-13.14 -proposed tracker (LP: #1840261)
* NULL pointer dereference whe
*** This bug is a duplicate of bug 1784665 ***
https://bugs.launchpad.net/bugs/1784665
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
*** This bug is a duplicate of bug 1784665 ***
https://bugs.launchpad.net/bugs/1784665
** This bug has been marked a duplicate of bug 1784665
bcache: bch_allocator_thread(): hung task timeout
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribe
On Mon, Aug 5, 2019 at 1:19 PM Ryan Harper
wrote:
>
>
> On Mon, Aug 5, 2019 at 8:01 AM Andrea Righi
> wrote:
>
>> Ryan, I've uploaded a new test kernel with the fix mentioned in the
>> comment before:
>>
>> https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-56.62~lp1796292+4/
>>
>> I've perform
On Mon, Aug 5, 2019 at 8:01 AM Andrea Righi
wrote:
> Ryan, I've uploaded a new test kernel with the fix mentioned in the
> comment before:
>
> https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-56.62~lp1796292+4/
>
> I've performed over 100 installations using curtin-nvme.sh
> (install_count = 1
Ryan, I've uploaded a new test kernel with the fix mentioned in the
comment before:
https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-56.62~lp1796292+4/
I've performed over 100 installations using curtin-nvme.sh
(install_count = 100), no hung task timeout. I'll run other stress tests
to make su
Some additional info about the deadlock:
crash> bt 16588
PID: 16588 TASK: 9ffd7f332b00 CPU: 1 COMMAND: "bcache_allocato"
[exception RIP: bch_crc64+57]
RIP: c093b2c9 RSP: ab9585767e28 RFLAGS: 0286
RAX: f1f51403756de2bd RBX: RCX: 0
After some help from Ryan (on IRC) I've been able to run the last
reproducer script and trigger the same trace. Now I should be able to
collect all the information that I need and hopefully post a new test
kernel (fixed for real...) soon.
Trying the first kernel without the change event sauce also fails:
[ 532.823594] bcache: run_cache_set() invalidating existing data
[ 532.828876] bcache: register_cache() registered cache device nvme0n1p2
[ 532.869716] bcache: register_bdev() registered backing device vda1
[ 532.994355] bcache
I tried the +3 kernel first, and I got 3 installs and then this hang:
[ 549.828710] bcache: run_cache_set() invalidating existing data
[ 549.836485] bcache: register_cache() registered cache device nvme1n1p2
[ 549.937486] bcache: register_bdev() registered backing device vdg
[ 550.018855] bca
Ryan, unfortunately the last reproducer script is giving me a lot of
errors and I'm still trying to figure out how to make it run to the end
(or at least to a point where it starts running some bcache commands).
In the meantime (as anticipated on IRC) I've uploaded a test kernel
reverting the patc
Reproducer script
** Attachment added: "curtin-nvme.sh"
https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh
On Thu, Aug 1, 2019 at 10:15 AM Andrea Righi
wrote:
> Thanks Ryan, this is very interesting:
>
> [ 259.411486] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
> [ 259.537070] bcache: register_bcache() error /dev/vdg: device already
> registered (emitt
Thanks Ryan, this is very interesting:
[ 259.411486] bcache: register_bcache() error /dev/vdg: device already
registered (emitting change event)
[ 259.537070] bcache: register_bcache() error /dev/vdg: device already
registered (emitting change event)
[ 259.797830] bcache: register_bcache() error
ubuntu@ubuntu:~$ uname -r
4.15.0-56-generic
ubuntu@ubuntu:~$ cat /proc/version
Linux version 4.15.0-56-generic (arighi@kathleen) (gcc version 7.4.0 (Ubuntu
7.4.0-1ubuntu1~18.04.1)) #62~lp1796292 SMP Thu Aug 1 07:45:21 UTC 2019
This failed on the second install while running bcache-super-show /dev
I've uploaded a new test kernel based on the latest bionic kernel from
master-next:
https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-56.62~lp1796292/
In addition to that I've backported all the recent upstream bcache fixes
and applied my proposed fix for the potential deadlock in
bch_allocator
Escalated to Field Critical as it now happens often enough to block our
ability to test proposed product releases. We are unable to test
openstack-next at the moment because our test runs fail due to this bug.
** Tags added: cscc
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1796292
Title:
Tight timeout for bcache removal causes spurious failures
To manage notifications about this bug go to:
https://bugs
The newer kernel survived about 16 runs and then hit this:
[ 2137.810559] md: md0: resync done.
[ 2296.795633] INFO: task python3:11639 blocked for more than 120 seconds.
[ 2296.800320] Tainted: P O 4.15.0-55-generic
#60+lp1796292+1
[ 2296.805097] "echo 0 > /proc/sys/kernel/hun
Andrea, thanks for the updated kernels.
On the first one, I got 23 installs before I ran into an issue; I'll
test the newer kernel next.
https://paste.ubuntu.com/p/2B4Kk3wbvQ/
[ 5436.870482] BUG: unable to handle kernel NULL pointer dereference at
09b8
[ 5436.873374] IP: cache_set_
... and, just in case, I've also uploaded a test kernel based on the latest
bionic master-next plus a bunch of extra bcache fixes:
https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-55.60+lp1796292+1/
If the previous kernel is still buggy, it'd be nice to also try this one.
Hi Ryan, I've uploaded a new test kernel:
https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-54.58+lp1796292/
This one is based on 4.15.0-54.58 and specifically addresses the
bch_bucket_alloc() problem (with this patch applied:
https://lore.kernel.org/lkml/20190710093117.GA2792@xps-13/T/#u).
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: linux (Ubuntu Bionic)
Status: New => Confirmed
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: linux (Ubuntu Cosmic)
Status: New => Confirmed
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: linux (Ubuntu Disco)
Status: New => Confirmed
Good news! I've been able to reproduce the bch_bucket_alloc() hung task
issue locally, using the test case from bug 1784665.
I think we're hitting the same problem now. I'll do more tests and will
keep you updated.
Thanks a ton for the tests, Ryan! Well, at least the hung task timeout
trace is different, so we're making some progress.
With the new kernel it seems that we're stuck in bch_bucket_alloc().
I've identified other upstream fixes that could help to prevent this
problem.
If you're willing to do few mo
** Changed in: curtin
Assignee: Terry Rudd (terrykrudd) => Andrea Righi (arighi)
Without the patch, I can reproduce the hang fairly frequently, within one
or two loops; it fails like this:
[ 1069.711956] bcache: cancel_writeback_rate_update_dwork() give up waiting for
dc->writeback_write_update to quit
[ 1088.583986] INFO: task kworker/0:2:436 blocked for more than 120 secon
I've set up our integration test that runs the CDO-QA bcache/ceph
setup.
On the updated kernel I got through 10 loops of the deployment before it
produced a stack trace:
http://paste.ubuntu.com/p/zVrtvKBfCY/
[ 3939.846908] bcache: bch_cached_dev_attach() Caching vdd as bcache5 on set
275985b3-da58-41f8
This is difficult for us to test in our lab because we are using MAAS, and
we hit this during MAAS deployments of nodes, so we would need MAAS images
built with these kernels. Additionally, this doesn't reproduce every time;
it happens in maybe 1 out of 4 test runs. It may be best to find a way to reproduce this
From a kernel perspective, this big slowness in shutting down a bcache
volume might be caused by a locking / race condition issue. If I read
correctly, this problem has been reproduced in bionic (and in xenial we
even got a kernel oops, apparently caused by a NULL pointer
dereference). I would t
The Canonical kernel team has this item queued in the hotlist to work on. I
am assigning it to myself to accelerate the work.
** Changed in: curtin
Assignee: (unassigned) => Terry Rudd (terrykrudd)
** Also affects: linux (Ubuntu Bionic)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Eoan)
Importance: Undecided
Status: Confirmed
** Also affects: linux (Ubuntu Disco)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Cosmic)
Im
On Mon, Jun 3, 2019 at 2:05 PM Andrey Grebennikov <
agrebennikov1...@gmail.com> wrote:
> Is there an estimate on getting this package in bionic-updates please?
>
We are starting an SRU of curtin this week. SRUs take at least 7 days
from when they hit -proposed, possibly longer depending on test
Is there an estimate on getting this package in bionic-updates please?
This bug is believed to be fixed in curtin in version 19.1. If this is
still a problem for you, please make a comment and set the state back to
New.
Thank you.
** Changed in: curtin
Status: New => Fix Released
This script *should* trigger the issue on Bionic GA:
https://pastebin.ubuntu.com/p/WdKGbMWnM6/
Try it with both the GA and HWE bionic kernels; the commit in HWE should trigger it.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launc
I was looking into kernel commits and I came across this:
https://github.com/torvalds/linux/commit/fadd94e05c02afec7b70b0b14915624f1782f578
So, as far as I understand, it deals with the issue of a manual
device detach during writeback clean-up causing a deadlock. The
timeline makes sens
** Tags added: cdo-qa foundations-engine
@jhobbs
Here is the script that cleans up bcache devices on recommission:
https://pastebin.ubuntu.com/p/6WCGvM4Q32/
On Wed, May 8, 2019 at 11:55 PM Trent Lloyd
wrote:
> I have been running into this (curtin 18.1-17-gae48e86f-
> 0ubuntu1~16.04.1)
>
> I think this commit basically agrees with my thoughts but I just wanted
> to share them explicitly in case they are interesting
>
> (1) If you *unregister* the ca
Xenial GA kernel bcache unregister oops:
http://paste.ubuntu.com/p/BzfHFjzZ8y/
I have been running into this (curtin 18.1-17-gae48e86f-0ubuntu1~16.04.1)
I think this commit basically agrees with my thoughts but I just wanted
to share them explicitly in case they are interesting
(1) If you *unregister* the cache device from the backing device, it
first has to purge all the
This occurs on a target machine during a MAAS install. Apport data is not
collected in this case.
** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed
Adding affects linux package
** Also affects: linux (Ubuntu)
Importance: Undecided
Status: New