------- Comment From [email protected] 2020-05-28 12:14 EDT-------
(In reply to comment #18)
>
> I did a little research, and it is indeed possible to disable WBT during
> runtime, its just a bit un-userfriendly for lots of disks, so I wrote a
> quick`n`dirty udev-rule file.
>
> Is it possible for you to retry this test with WBT disabled? I think it
> might prove useful to know whether we can handle the occasional
> queue-slowdown with that feature disabled, and it should further narrow down
> what components are affected (IOW, if it runs fine with WBT disabled,
> someone needs to audit the WBT code, and such).
>
> Here is the udev-rule to disable WBT for every block-device on the system:
>
> t35lp49 (1) ~ # cat /etc/udev/rules.d/99-disable-blk-wbt.rules
> #
> # Possible workaround for LTC 185913:
> # Disable the Writeback Throttling Feature.
> # Main interface is 'queue/wbt_lat_usec'. When writing '0' into it, the
> # feature should be disabled for the queue in question
> #
> # Rule Syntax: https://www.freedesktop.org/software/systemd/man/udev.html
> #
>
> SUBSYSTEM!="block", GOTO="wbt_disable_end"
> ACTION=="remove", GOTO="wbt_disable_end"
>
> TEST{0644}=="queue/wbt_lat_usec", ATTR{queue/wbt_lat_usec}!="0", ATTR{queue/wbt_lat_usec}="0"
>
> LABEL="wbt_disable_end"
>
> Its a bit coarse (like I said, quick`n`dirty), but as I don't really know
> your storage-setup, this should catch everything. You can also test whether
> it worked before the test with `cat /sys/class/block/*/queue/wbt_lat_usec`;
> it should display '0'.
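
On a system with many disks, the `cat` check suggested in the quoted comment prints one value per queue without saying which device it belongs to. The following is a minimal sketch of the same check that names the devices still reporting a non-zero `wbt_lat_usec`. The function name `wbt_still_enabled` and its `sysfs_root` parameter are made up for illustration, and the sketch assumes, as the quoted comment does, that a value of '0' means WBT is disabled for that queue.

```python
import glob
import os


def wbt_still_enabled(sysfs_root="/sys/class/block"):
    """Return {device: value} for every queue whose wbt_lat_usec is not '0'.

    sysfs_root is a hypothetical parameter so the check can be pointed at a
    test directory; on a real system the default path is the one from the
    quoted comment.
    """
    enabled = {}
    pattern = os.path.join(sysfs_root, "*", "queue", "wbt_lat_usec")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <root>/<device>/queue/wbt_lat_usec
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as attr:
                value = attr.read().strip()
        except OSError:
            # The device went away or the attribute is unreadable; skip it.
            continue
        if value != "0":
            enabled[device] = value
    return enabled


if __name__ == "__main__":
    leftover = wbt_still_enabled()
    for device, value in leftover.items():
        print(f"{device}: wbt_lat_usec={value}")
    if not leftover:
        print("no queue reports a non-zero wbt_lat_usec")
```

Devices without a `queue/wbt_lat_usec` attribute (partitions, for example) are simply not matched by the glob, so they do not show up in the result.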

The last re-run we did with the udev rule in place, and therefore with WBT
disabled, no longer crashed, whereas before the crash was fairly consistent.
We still see timeouts and unit resets in the midlayer, but otherwise the
system stays up (still not perfect, but better than crashing outright).

So there is at least some evidence that the problem lies in the interaction
between the SCSI midlayer, the block layer, and WBT when requests go through
SCSI EH.

-- 
https://bugs.launchpad.net/bugs/1881109

Title:
  [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be
  triggered by scsi errors.
