Patches posted to kernel-team mailing list [1].

[1] https://lists.ubuntu.com/archives/kernel-team/2018-October/096072.html
    [SRU Xenial][PATCH 0/2] Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)
** Description changed:

  [Impact]

   * Detaching a virtio-scsi disk in a Xenial guest can cause a
     CPU soft lockup in the guest (and take 100% CPU in the host).

   * It may prevent further progress on other tasks that
     depend on resources locked earlier in the SCSI target
     removal stack, and/or impact other SCSI functionality.

   * The fix resolves a corner case in the requests counter
     in the virtio SCSI target, which impacts a downstream
     (SAUCE) patch in the virtio-scsi target removal handler
-    that depends on the requests counter.
+    that depends on the requests counter value being zero.

  [Test Case]

   * See LP #1798110 (this bug)'s comment #3 (too long for
     this section -- synthetic case with GDB+QEMU) and
     comment #4 (organic test case in cloud instance).

  [Regression Potential]

   * It seems low -- this only affects the SCSI command requeue
     path with regard to the reference counter, which only has
     a real chance of causing problems in our downstream patch
     (which is now passing this test case).

   * The other, less serious issue would be decrementing the
     counter to a negative (< 0) value, which is not possible
     with this driver logic (see commit message), because the
     reqs counter is always incremented before calling
     virtscsi_queuecommand(), where this decrement operation
     is inserted.

  [Original Description]

  A customer reported a CPU soft lockup on the Trusty HWE kernel from
  Xenial when detaching a virtio-scsi drive, and provided a crashdump
  that shows 2 things:

  1) The soft locked up CPU is waiting for another CPU to finish
     something, and that does not happen because the other CPU is
     infinitely looping in virtscsi_target_destroy().

  2) The loop happens because the 'tgt->reqs' counter is non-zero,
     and that probably happened due to a missing decrement in the
     SCSI command requeue path, exercised when the virtio ring is full.

  The reported problem itself happens because of a downstream/SAUCE
  patch, coupled with the problem of the missing decrement for the
  reqs counter.

  Introducing a decrement in the SCSI command requeue path resolves
  the problem, verified synthetically with QEMU+GDB and with the
  test-case/loop provided by the customer as a problem reproducer.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1798110

Title:
  xenial: virtio-scsi: CPU soft lockup due to loop in
  virtscsi_target_destroy()
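For readers who want the counter imbalance spelled out, here is a minimal userspace sketch of the behaviour described in this bug. It is not the kernel code or the posted patch. All model_* names are hypothetical, the "ring full" condition is a plain flag, and the destroy wait is bounded so that the sketch terminates (the handler described above loops until the counter reaches zero). It only models the accounting: the counter is incremented before queueing, and the requeue path either does or does not give the reference back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-target state; only 'reqs' is modelled. */
struct model_target {
    atomic_int reqs; /* number of requests accounted as in flight */
};

enum model_status { MODEL_QUEUED, MODEL_REQUEUE };

/*
 * Models the submit path: the counter is incremented first, then the
 * command is queued. If the ring is full the command is handed back for
 * requeue. With 'fixed' set, the requeue path drops the reference it took.
 */
static enum model_status model_submit(struct model_target *tgt,
                                      bool ring_full, bool fixed)
{
    atomic_fetch_add(&tgt->reqs, 1);
    if (ring_full) {
        if (fixed)
            atomic_fetch_sub(&tgt->reqs, 1); /* the missing decrement */
        return MODEL_REQUEUE;
    }
    return MODEL_QUEUED;
}

/* Models command completion: one reference is released per command. */
static void model_complete(struct model_target *tgt)
{
    atomic_fetch_sub(&tgt->reqs, 1);
}

/*
 * Models the target removal wait: succeed only once 'reqs' is zero.
 * Bounded by max_spins (an assumption of this sketch) so it can report
 * failure instead of spinning forever.
 */
static bool model_target_destroy(struct model_target *tgt, int max_spins)
{
    assert(max_spins > 0);
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load(&tgt->reqs) == 0)
            return true;
    }
    return false;
}

/* One command: first attempt hits a full ring, the retry is queued. */
static bool model_run(bool fixed)
{
    struct model_target tgt;

    atomic_init(&tgt.reqs, 0);
    if (model_submit(&tgt, true, fixed) == MODEL_REQUEUE)
        model_submit(&tgt, false, fixed);
    model_complete(&tgt);
    return model_target_destroy(&tgt, 1000);
}
```

Calling model_run(false) leaves one stale reference after the only command has completed, so the destroy wait never sees zero. model_run(true) balances the counter and the wait returns immediately.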
