Some additional info about the deadlock:
crash> bt 16588
PID: 16588 TASK: ffff9ffd7f332b00 CPU: 1 COMMAND: "bcache_allocato"
[exception RIP: bch_crc64+57]
RIP: ffffffffc093b2c9 RSP: ffffab9585767e28 RFLAGS: 00000286
RAX: f1f51403756de2bd RBX: 0000000000000000 RCX: 0000000000000065
RDX: 0000000000000065 RSI: ffff9ffd63980000 RDI: ffff9ffd63925346
RBP: ffffab9585767e28 R8: ffffffffc093db60 R9: ffffab9585739000
R10: 000000000000007f R11: 000000001ffef001 R12: 0000000000000000
R13: 0000000000000008 R14: ffff9ffd63900000 R15: ffff9ffd683d0000
CS: 0010 SS: 0018
#0 [ffffab9585767e30] bch_prio_write at ffffffffc09325c0 [bcache]
#1 [ffffab9585767eb0] bch_allocator_thread at ffffffffc091bdc5 [bcache]
#2 [ffffab9585767f08] kthread at ffffffffa80b2481
#3 [ffffab9585767f50] ret_from_fork at ffffffffa8a00205
crash> bt 14658
PID: 14658 TASK: ffff9ffd7a9f0000 CPU: 0 COMMAND: "python3"
#0 [ffffab958380bb48] __schedule at ffffffffa89ae441
#1 [ffffab958380bbe8] schedule at ffffffffa89aea7c
#2 [ffffab958380bbf8] bch_bucket_alloc at ffffffffc091c370 [bcache]
#3 [ffffab958380bc68] __bch_bucket_alloc_set at ffffffffc091c5ce [bcache]
#4 [ffffab958380bcb8] bch_bucket_alloc_set at ffffffffc091c66e [bcache]
#5 [ffffab958380bcf8] __uuid_write at ffffffffc0931b69 [bcache]
#6 [ffffab958380bda0] bch_uuid_write at ffffffffc0931f76 [bcache]
#7 [ffffab958380bdc0] __cached_dev_store at ffffffffc0937c08 [bcache]
#8 [ffffab958380be20] bch_cached_dev_store at ffffffffc0938309 [bcache]
#9 [ffffab958380be50] sysfs_kf_write at ffffffffa830c97c
#10 [ffffab958380be60] kernfs_fop_write at ffffffffa830c3e5
#11 [ffffab958380bea0] __vfs_write at ffffffffa827e5bb
#12 [ffffab958380beb0] vfs_write at ffffffffa827e781
#13 [ffffab958380bee8] sys_write at ffffffffa827e9fc
#14 [ffffab958380bf30] do_syscall_64 at ffffffffa8003b03
#15 [ffffab958380bf50] entry_SYSCALL_64_after_hwframe at ffffffffa8a00081
RIP: 00007faffc7bd154 RSP: 00007ffe307cbc88 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007faffc7bd154
RDX: 0000000000000008 RSI: 00000000011ce7f0 RDI: 0000000000000003
RBP: 00007faffccb86c0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 00000000011ce7f0 R15: 0000000000f33e60
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
In this case the task "python3" (pid 14658) is stuck in a wait inside
bch_bucket_alloc() that never completes. The task that should wake
"python3" from this wait is "bcache_allocator" (pid 16588), but the
wake-up never happens, because bcache_allocator is itself stuck in this
"retry_invalidate" busy loop:
static int bch_allocator_thread(void *arg)
{
	...
retry_invalidate:
		allocator_wait(ca, ca->set->gc_mark_valid &&
			       !ca->invalidate_needs_gc);
		invalidate_buckets(ca);

		/*
		 * Now, we write their new gens to disk so we can start writing
		 * new stuff to them:
		 */
		allocator_wait(ca, !atomic_read(&ca->set->prio_blocked));
		if (CACHE_SYNC(&ca->set->sb)) {
			/*
			 * This could deadlock if an allocation with a btree
			 * node locked ever blocked - having the btree node
			 * locked would block garbage collection, but here we're
			 * waiting on garbage collection before we invalidate
			 * and free anything.
			 *
			 * But this should be safe since the btree code always
			 * uses btree_check_reserve() before allocating now, and
			 * if it fails it blocks without btree nodes locked.
			 */
			if (!fifo_full(&ca->free_inc))
				goto retry_invalidate;

			if (bch_prio_write(ca, false) < 0) {
				ca->invalidate_needs_gc = 1;
				wake_up_gc(ca->set);
				goto retry_invalidate;
			}
		}
	}
	...
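
For reference, the wait that blocks "python3" inside bch_bucket_alloc()
looks roughly like this (paraphrased from drivers/md/bcache/alloc.c; the
exact code differs slightly across kernel versions):

	do {
		/*
		 * Sleep until the allocator thread refills ca->free[]
		 * and wakes us up via ca->set->bucket_wait.
		 */
		prepare_to_wait(&ca->set->bucket_wait, &w,
				TASK_UNINTERRUPTIBLE);

		mutex_unlock(&ca->set->bucket_lock);
		schedule();
		mutex_lock(&ca->set->bucket_lock);
	} while (!fifo_pop(&ca->free[reserve], r));

	finish_wait(&ca->set->bucket_wait, &w);

So "python3" can only make progress if the allocator thread manages to
push buckets into ca->free[] and issue that wake-up, which never happens
while the allocator spins in retry_invalidate.
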
The exact code path is this: bch_prio_write() fails because the
bch_bucket_alloc() call it makes fails (no free buckets); the allocator
then wakes up the garbage collector (hoping it can free up some buckets)
and jumps back to retry_invalidate. Apparently that is not enough:
bch_prio_write() keeps failing over and over again (still no buckets
available), and the allocator can never break out of the busy loop =>
deadlock.
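
To illustrate the failing step (a simplified sketch, not the exact
upstream code): bch_prio_write() needs to allocate buckets from the
RESERVE_PRIO pool to write out the new priorities/gens, and with
wait=false it has to bail out as soon as that allocation fails:

	/* simplified sketch of the allocation loop in bch_prio_write(ca, wait) */
	for (i = prio_buckets(ca) - 1; i >= 0; --i) {
		...
		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
		if (bucket == -1) {
			/*
			 * Out of free buckets: with wait == false we can
			 * only give up and report the failure to the caller.
			 */
			return -ENOMEM;
		}
		...
	}
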
Looking more closely at the code, it seems safe to resume the
bcache_allocator main loop when bch_prio_write() fails (still keeping the
wake-up event to the garbage collector), instead of jumping back to
retry_invalidate. This should give the allocator a better chance to free
up some buckets, possibly preventing the "out of buckets" deadlock
condition.
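
A minimal sketch of that idea, against the excerpt above (illustrative
only; the actual patch may end up looking different): drop the goto, so
a bch_prio_write() failure still kicks the garbage collector but lets
the allocator fall through to its main loop:

		if (bch_prio_write(ca, false) < 0) {
			/*
			 * Out of free buckets: still poke the garbage
			 * collector, but don't spin on retry_invalidate;
			 * fall through to the main allocator loop so the
			 * already-invalidated buckets in free_inc can be
			 * handed out (and bucket_wait waiters woken up).
			 */
			ca->invalidate_needs_gc = 1;
			wake_up_gc(ca->set);
		}
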
I'm currently testing a kernel with this change; if I can't trigger any
deadlock in the next hour or so, I'll upload a new test kernel.