Organic Testcase: It usually reproduces the problem in the 1st or 2nd iteration. With the fix, the problem did not reproduce in 35 iterations.
** Description changed:

- (I'll add the SRU template + testing steps and post to ML shortly.)
+ [Impact]
+
+  * Detaching virtio-scsi disk in Xenial guest can cause
+    CPU soft lockup in guest (and take 100% CPU in host).
+
+  * It may prevent further progress on other tasks that
+    depend on resources locked earlier in the SCSI target
+    removal stack, and/or impact other SCSI functionality.
+
+  * The fix resolves a corner case in the requests counter
+    in the virtio SCSI target, which impacts a downstream
+    (SAUCE) patch in the virtio-scsi target removal handler
+    that depends on the requests counter.
+
+ [Test Case]
+
+  * See LP #1798110 (this bug)'s comment #3 (too long for
+    this section -- synthetic case with GDB+QEMU) and
+    comment #4 (organic test case in cloud instance).
+
+ [Regression Potential]
+
+  * It seems low -- this only affects the SCSI command requeue
+    path with regards to the reference counter, which is only
+    used with real chance of problems in our downstream patch
+    (which is now passing this testcase).
+
+  * The other less serious issue would be decrementing it to
+    a negative / < 0 value, which is not possible with this
+    driver logic (see commit message), because the reqs counter
+    is always incremented before calling virtscsi_queuecommand(),
+    where this decrement operation is inserted.
+
+ [Original Description]

  A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
  when detaching a virtio-scsi drive, and provided a crashdump that shows
  2 things:

  1) The soft locked up CPU is waiting for another CPU to finish something,
  and that does not happen because the other CPU is infinitely looping in
  virtscsi_target_destroy().

  2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
  probably happened due to a missing decrement in SCSI command requeue
  path, exercised when the virtio ring is full.
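The counter imbalance described in 2) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: `struct target_model`, `queuecommand_model()`, `complete_model()`, `destroy_would_spin()` and `leftover_reqs()` are hypothetical names, the "ring" is reduced to a count of free slots, and the `with_fix` flag stands in for the added decrement on the requeue path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-target state; only the counter is modeled. */
struct target_model {
    int reqs;        /* in-flight requests, playing the role of tgt->reqs */
    int free_slots;  /* simplified "virtio ring": number of free entries */
};

/* Try to queue one command. Returns false when the ring is full and the
 * caller would have to requeue the command later. */
static bool queuecommand_model(struct target_model *tgt, bool with_fix)
{
    tgt->reqs++;                 /* incremented before the queue attempt */
    if (tgt->free_slots == 0) {
        if (with_fix)
            tgt->reqs--;         /* decrement on the requeue path (the fix) */
        return false;            /* command will be retried */
    }
    tgt->free_slots--;
    return true;
}

/* Completion of a queued command: frees a ring slot and drops the count. */
static void complete_model(struct target_model *tgt)
{
    assert(tgt->reqs > 0);       /* always incremented first, so never negative */
    tgt->reqs--;
    tgt->free_slots++;
}

/* Assumed shape of the removal handler's condition: it keeps waiting
 * for as long as the counter is non-zero. */
bool destroy_would_spin(int reqs)
{
    return reqs != 0;
}

/* Scenario: a one-slot ring and two commands. The second command hits a
 * full ring, is requeued, and succeeds after the first one completes.
 * Returns the counter value once both commands have completed. */
int leftover_reqs(bool with_fix)
{
    struct target_model tgt = { .reqs = 0, .free_slots = 1 };

    queuecommand_model(&tgt, with_fix);                 /* A: queued */
    bool queued = queuecommand_model(&tgt, with_fix);   /* B: ring full */
    complete_model(&tgt);                               /* A completes */
    if (!queued)
        queuecommand_model(&tgt, with_fix);             /* B: retried, queued */
    complete_model(&tgt);                               /* B completes */
    return tgt.reqs;
}
```

In this model, `leftover_reqs(false)` leaves the counter at 1 after all commands have completed, so a removal handler that waits for zero would never finish, while `leftover_reqs(true)` returns 0.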
  The reported problem itself happens because of a downstream/SAUCE patch,
  coupled with the problem of the missing decrement for the reqs counter.

  Introducing a decrement in the SCSI command requeue path resolves the
  problem, verified synthetically with QEMU+GDB and with test-case/loop
  provided by the customer as problem reproducer.

-- 
https://bugs.launchpad.net/bugs/1798110

Title:
  xenial: virtio-scsi: CPU soft lockup due to loop in
  virtscsi_target_destroy()
