Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On Thu, Apr 08, 2021 at 06:02:24PM +, Luis Chamberlain wrote: > On Sat, Apr 03, 2021 at 08:25:38PM +, Luis Chamberlain wrote: > > So creating say 1000 random files in /lib/firmware on a freshly created > > btrfs partition helps reproduce: > > > > mkfs.btrfs /dev/whatever > > mount /dev/wahtever /lib/firmware > > # Put it on /etc/fstab too > > > > Generate 1000 random files on it: > > > > ``` > > for n in {1..1000}; do > > > > dd if=/dev/urandom of=/lib/firmware/file$( printf %03d "$n" ).bin bs=1 > > count=$((RANDOM + 1024 )) > > done > > ``` > > > > Then reboot, upon reboot do: > > > > modprobe test_firmware > > echo 1 > /sys/devices/virtual/misc/test_firmware/config_enable_resume_test > > systemctl suspend > > > > If its a guest wake it up: > > > > virsh dompmwakeup domidofguest > > This happens because: > > btrfs_lookup() --> ... --> > > btrfs_search_slot() --> read_block_for_search() --> > > --> read_tree_block() --> btree_read_extent_buffer_pages() --> > > --> submit_one_bio() --> btrfs_submit_metadata_bio() --> > > --> btrfsic_submit_bio() --> submit_bio() > --> this completes and then > --> wait_on_page_locked() on the first page > --> never returns > > > I also managed to reproduce this easily with XFS as well, so this is not > a btrfs thing as I suspected. It does not happen with ext4 though. > However I think that's just by chance, it should still be prone to the > same issue. > > Either way, I'm dusting off my patches for fs freeze as I believe that > should fix this problem. I am not sure if we want a stop gap hack like > the one I posted in the meantime... I don't think so. I rather fix this > well with the series I'll post for fs freeze. Give me a bit of time and > I'll CC you on the patches. Low and behold, as I suspectd, my old VFS fsfreeze / thaw patch series this. However I should note that I needed to add remove also the WQ_FREEZABLE from fs as well, which was missing in my patch series, and which Jan Kara had pointed out. However, the VFS freeze work is quite a bit of work which we are still discussing, so in the meantime, I think we have no other option but to put the stop-gap patch I suggested with the usermode helper lock. I will just modify it a bit more. I'll also post my fs freeze work now again, but I'll note that it still requires some more work to address everything which we have discussed in the community. I'll post the patches as I think others may be interested in the progress of that. Luis
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On Sat, Apr 03, 2021 at 08:25:38PM +, Luis Chamberlain wrote: > So creating say 1000 random files in /lib/firmware on a freshly created > btrfs partition helps reproduce: > > mkfs.btrfs /dev/whatever > mount /dev/wahtever /lib/firmware > # Put it on /etc/fstab too > > Generate 1000 random files on it: > > ``` > for n in {1..1000}; do > > dd if=/dev/urandom of=/lib/firmware/file$( printf %03d "$n" ).bin bs=1 > count=$((RANDOM + 1024 )) > done > ``` > > Then reboot, upon reboot do: > > modprobe test_firmware > echo 1 > /sys/devices/virtual/misc/test_firmware/config_enable_resume_test > systemctl suspend > > If its a guest wake it up: > > virsh dompmwakeup domidofguest This happens because: btrfs_lookup() --> ... --> btrfs_search_slot() --> read_block_for_search() --> --> read_tree_block() --> btree_read_extent_buffer_pages() --> --> submit_one_bio() --> btrfs_submit_metadata_bio() --> --> btrfsic_submit_bio() --> submit_bio() --> this completes and then --> wait_on_page_locked() on the first page --> never returns I also managed to reproduce this easily with XFS as well, so this is not a btrfs thing as I suspected. It does not happen with ext4 though. However I think that's just by chance, it should still be prone to the same issue. Either way, I'm dusting off my patches for fs freeze as I believe that should fix this problem. I am not sure if we want a stop gap hack like the one I posted in the meantime... I don't think so. I rather fix this well with the series I'll post for fs freeze. Give me a bit of time and I'll CC you on the patches. Luis
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On 03/04/2021 23:04, Luis Chamberlain wrote: OK this fixes it but this just shows that likely the thaw'ing allows a race to take place which we didn't expect. I'll do some more digging for a proper fix. I can indeed confirm that this fixes the stall. This however does not seem to be the (complete) solution. Instead I now get a kernel crash message (see below) for every firmware location tried to read during resume. This might be intentional for debugging purposes during testing. This is identical for ext4 and btrfs. If the firmware file can not be found during caching on suspend, the reads are still attempted again during resume. This leads to multiple of those crash messages (for different firmware locations) during resume if the firmware file is not present, even for drivers properly requesting caching. So if this patch is to go in (those crashes would really help with getting the si2168 patches in…), I think you have to make sure that even for non-existent firmware files, no read is ever attempted on resume. Which means to set up the caching even if the initial request_firmware() failed and to store knowledge about failed caching attempts to not retry these reads on resume. Lukas [ cut here ] WARNING: CPU: 0 PID: 662 at fs/kernel_read_file.c:161 kernel_read_file_from_path_initns+0x11c/0x140 Modules linked in: test_firmware nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common kvm_amd vmwgfx ccp kvm ttm drm_kms_helper snd_pcm joydev snd_timer snd e1000 irqbypass cec soundcore vboxguest i2c_piix4 pcspkr drm fuse zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw ata_generic pata_acpi video CPU: 0 PID: 662 Comm: systemd-sleep Not tainted 5.12.0-rc5+ #2 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:kernel_read_file_from_path_initns+0x11c/0x140 Code: ff ff 4c 89 e7 89 44 24 10 e8 50 07 fc ff e8 fb 1c d8 ff 44 8b 44 24 10 48 83 c4 28 44 89 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 80 3d b4 10 7a 01 00 75 e3 e9 5b 5e 83 00 e8 cf 1c d8 ff 45 RSP: 0018:a096c0b9fb90 EFLAGS: 00010286 RAX: fff5 RBX: RCX: bd85d688 RDX: bd85d688 RSI: 0297 RDI: bd85d680 RBP: a096c0b9fbe0 R08: fff5 R09: bd85d688 R10: R11: R12: 8976ca811000 R13: a096c0b9fc00 R14: 7fff R15: 0001 FS: 7f4718de3b40() GS:8979cfc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 5594afeca4f8 CR3: 000106a62000 CR4: 000506f0 Call Trace: ? snprintf+0x39/0x40 fw_get_filesystem_firmware+0xe2/0x270 _request_firmware+0x21e/0x500 request_firmware+0x32/0x50 test_firmware_resume.cold+0x4e/0xb2 [test_firmware] ? platform_pm_suspend+0x40/0x40 dpm_run_callback+0x4c/0x120 device_resume+0xa7/0x200 dpm_resume+0xce/0x2c0 dpm_resume_end+0xd/0x20 suspend_devices_and_enter+0x195/0x750 pm_suspend.cold+0x329/0x374 state_store+0x71/0xd0 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x108/0x180 vfs_write+0x1b8/0x270 ksys_write+0x4f/0xc0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f4719a527a7 Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:7ffee4d18e18 EFLAGS: 0246 ORIG_RAX: 0001 RAX: ffda RBX: 0004 RCX: 7f4719a527a7 RDX: 0004 RSI: 7ffee4d18f00 RDI: 0004 RBP: 7ffee4d18f00 R08: 5626b555c710 R09: 7f4719ae84e0 R10: 7f4719ae83e0 R11: 0246 R12: 0004 R13: 5626b5558650 R14: 0004 R15: 7f4719b25700 ---[ end trace 7f7ef0dc067dd714 ]--- Trying to do direct read when not available test_firmware test_firmware: loading /lib/firmware/updates/5.12.0-rc5+/test-firmware.bin failed with error -11 [ cut here ]
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
Great to hear that you now succeeded in reproducing the problem. On 03/04/2021 22:25, Luis Chamberlain wrote: On Sat, Apr 03, 2021 at 12:24:07PM +0200, Lukas Middendorf wrote: One further thing I noticed which might be problematic in rare cases: According to the kernel debug messages, the firmware-loader does not attempt to cache the firmware during suspend, Correct, the goal with this test was to purposely *not* call for the firmware prior to suspend, and instead *race* (do the wrong thing) at resume. It shouldn't really stall... fail yes, but stall, that seems fishy. I understood that. if the previous call to request_firmware() has failed (file not present; call made during previous resume). In my opinion it should attempt to cache the firmware on suspend even in this case Yes, that is the *proper* way to do use the firmware API, but we want to reproduce the stall, and so we have to recreate the issue you reported by doing something bad. Should firmware_request_cache() always be called, or only instead of request_firmware() ? In case request_firmware is called on its own, and the firmware file is not present, it might still race on the next resume. request_firmware seems to be meant to include the cache request for the next suspend, but it apparently *does not* if firmware loading fails due to a missing file. I think this is something that should either be changed or properly documented in the API documentation. Did you also try to create a random test-firmware.bin (I used 1M from /dev/urandom) instead of an empty /lib/firmware ? No, right now I want to to just focus on fixing the stall you saw. This is a second nuance of the stall I saw: The firmware file (in this case test-firmware.bin) is present but not in cache. If you run ls -lR /lib/firmware > /dev/null before systemctl suspend your reproduction steps will very likely not stall. If you in addition run dd if=/dev/urandom of=/lib/firmware/test-firmware.bin bs=1K count=64 during the preparation of /lib/firmware (in addition to or instead of your for loop), then it will always stall (as long as the file content is not read before suspend). One stall is happening while the directory content is being listed (to find that the file is not present), the second stall is happening while actually trying to read the file. Those two stalls likely have the same root cause, but it might be different. I just want to make sure you have covered all cases. You might be better off just reposting your patches with the respective Reviewed-by tags and pestering your maintainer. I will try to be a little bit more insistent this time. Is "just repost" the usual way to handle if patches are ignored? You can repost, v2 just add my reviewed-by. Those patches are indeed correct as they are calling for the firmware prior to resume. Maybe clarify that. OK, will do. But indeed there is also another issue which you reported which needs to be fixed, and for that thanks so much for you patience! I'll be looking into this! Also thank you for your effort. Lukas
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On Sat, Apr 03, 2021 at 08:25:38PM +, Luis Chamberlain wrote: > So creating say 1000 random files in /lib/firmware on a freshly created > btrfs partition helps reproduce: > > mkfs.btrfs /dev/whatever > mount /dev/wahtever /lib/firmware > # Put it on /etc/fstab too > > Generate 1000 random files on it: > > ``` > for n in {1..1000}; do > > dd if=/dev/urandom of=/lib/firmware/file$( printf %03d "$n" ).bin bs=1 > count=$((RANDOM + 1024 )) > done > ``` > > Then reboot, upon reboot do: > > modprobe test_firmware > echo 1 > /sys/devices/virtual/misc/test_firmware/config_enable_resume_test > systemctl suspend OK this fixes it but this just shows that likely the thaw'ing allows a race to take place which we didn't expect. I'll do some more digging for a proper fix. diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c index 90d255fbdd9b..b9c37eefab35 100644 --- a/fs/kernel_read_file.c +++ b/fs/kernel_read_file.c @@ -4,6 +4,7 @@ #include #include #include +#include /** * kernel_read_file() - read file contents into a kernel buffer @@ -156,17 +157,25 @@ int kernel_read_file_from_path_initns(const char *path, loff_t offset, if (!path || !*path) return -EINVAL; + ret = usermodehelper_read_trylock(); + if (WARN_ON(ret)) { + pr_warn_once("Trying to do direct read when not available"); + return ret; + } task_lock(&init_task); get_fs_root(init_task.fs, &root); task_unlock(&init_task); file = file_open_root(root.dentry, root.mnt, path, O_RDONLY, 0); path_put(&root); - if (IS_ERR(file)) + if (IS_ERR(file)) { + usermodehelper_read_unlock(); return PTR_ERR(file); + } ret = kernel_read_file(file, offset, buf, buf_size, file_size, id); fput(file); + usermodehelper_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(kernel_read_file_from_path_initns);
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On Sat, Apr 03, 2021 at 12:24:07PM +0200, Lukas Middendorf wrote: > > On 02/04/2021 20:02, Luis Chamberlain wrote: > > No sorry, I dropped the ball on this but I managed to now spawn up the > > virtual guests where I was doing development to reproduce this. Give > > me some time and I will zero in on this now. > > > > For now what I have is the following to test this, I next will work > > on the userspace part. > > I can report that your patch for test_firmware works (applied to current > master from linus d93a0d4; I get some offsets and had to adjust some > whitespace for it to apply). With that module (and resume test enabled) I > get freezes at resume in the same cases that would also cause problems with > si2168. Beautiful! > I'm testing this on bare metal F34 beta with / on a btrfs. I'm using nvidia > driver again to make sure the system does not otherwise use any firmware > from /usr/lib/firmware (confirmed with kernel debug messages for > firmware_loader). My si2168 is not plugged in. OK I was not able to reproduce but the details below also clarify why, so I now was able to reproduce! So I don't need to specifically test with this distro, but others may do so. > I tested it with a normal population of files in /usr/lib/firmware , without > test-firmware.bin and also with a random 1MiB file in place. I tested after > a reboot so it does not do caching. > With /usr/lib/firmware on a separate ext4 partition I can confirm with dmesg > that the test_firmware suspend test actually works (does not freeze). > With /usr/lib/firmware on btrfs it fails in both cases (with and without the > firmware files). > With caching (first suspend with the ext4 partition mounted, then a second > suspend without) it does not freeze even with the firmware on btrfs. OK thanks for confirming! > One further thing I noticed which might be problematic in rare cases: > According to the kernel debug messages, the firmware-loader does not attempt > to cache the firmware during suspend, Correct, the goal with this test was to purposely *not* call for the firmware prior to suspend, and instead *race* (do the wrong thing) at resume. It shouldn't really stall... fail yes, but stall, that seems fishy. > if the previous call to > request_firmware() has failed (file not present; call made during previous > resume). In my opinion it should attempt to cache the firmware on suspend > even in this case Yes, that is the *proper* way to do use the firmware API, but we want to reproduce the stall, and so we have to recreate the issue you reported by doing something bad. > (If I remember correctly, firmware_request_cache also > works without the file being present). In case some low-memory condition has > caused the file system cache to lose the information about the file being > non-present (or the file has been written after the initial attempt and is > no longer in the file system cache), this might lead to freezes even for > well-behaved drivers in case they reattempt to do request_firmware() on > resume. > If the firmware is found during resume, it is cached on further suspends. Right. > Given how long it took me to narrow down this problem in this (for me) > reliably reproducible case, something like this happening at random would be > almost impossible to debug/locate and might actually happen frequently in > the wild. OK thanks for the report! > On 03/04/2021 00:19, Luis Chamberlain wrote: > > Lukas, can you share your /etc/fstab ? > > This is the core part (everything else is unmounted), I shortened the UUIDs. > The ext4 mount of course is also unmounted when I want it to fail, with it > in place it reliably not-fails. > > UUID=<1> / btrfs subvol=linux1 0 0 > UUID=<2> /usr/lib/firmware ext4defaults 1 1 > UUID=<1> /home/lukas btrfs subvol=linux1-f34-lukas 0 0 > tmpfs /home/lukas/.cache tmpfs size=16g,gid=lukas,uid=lukas,mode=700 0 0 > > > Also, how long do you stay in the boot before you try to suspend? > > During my reproduction sessions usually only shortly 1-5min, but I think I > have seen this also after a slightly longer time. I can try to let it sit > for longer if you think that is important. No, I was looking to see how easy it is to reproduce right after a fresh boot, now that I can reproduce I confirm it happens right away. > On 03/04/2021 00:58, Luis Chamberlain wrote: > > On Fri, Apr 2, 2021 at 3:19 PM Luis Chamberlain wrote: > > > > > > Lukas, can you share your /etc/fstab ? Also, how long do you stay in > > > the boot before you try to suspend? > > > > OK I cannot reproduce the issue with the modified patch I sent to > > test_firmware, which if you enable config_enable_resume_test will > > trigger a request_firmware() on resume, thus trying to mimic the race > > you note. To test this you can simply use a loopback filesystem for > > your /lib/firmware and create a btrfs filesystem for it, and then run: > > > > echo 1 > /sys/devices/virtual/misc/tes
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
Hi Luis, I now succeeded in reproducing this in VirtualBox running a F34 minimal installation with / on btrfs (with default firmware files). I can send you the virtual disk file if you want it. With the kernel sourcecode and the compiled and installed kernel it currently is 21.8GiB. On 03/04/2021 12:24, Lukas Middendorf wrote: On 03/04/2021 00:58, Luis Chamberlain wrote: Can you provide kernel logs for where you are seeing things get stuck at? I have dumped the output of the serial console with all outputs cranked to the max. In [1] is the output with test_firmware loaded and suspend test running. This leads to a freeze. However it is not completely dead. While it does not visually react, at least from time to time there still are some messages from the kernel and the cursor is still blinking. In [2] is a (successful) suspend and resume cycle with test_firmware not loaded. Lukas [1] https://gist.github.com/midluk/05716f714f778d26dc02771245df0ac9 [2] https://gist.github.com/midluk/a03eb953b6cf097688b8be2e0cd387fd
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On 02/04/2021 20:02, Luis Chamberlain wrote: No sorry, I dropped the ball on this but I managed to now spawn up the virtual guests where I was doing development to reproduce this. Give me some time and I will zero in on this now. For now what I have is the following to test this, I next will work on the userspace part. I can report that your patch for test_firmware works (applied to current master from linus d93a0d4; I get some offsets and had to adjust some whitespace for it to apply). With that module (and resume test enabled) I get freezes at resume in the same cases that would also cause problems with si2168. I'm testing this on bare metal F34 beta with / on a btrfs. I'm using nvidia driver again to make sure the system does not otherwise use any firmware from /usr/lib/firmware (confirmed with kernel debug messages for firmware_loader). My si2168 is not plugged in. I tested it with a normal population of files in /usr/lib/firmware , without test-firmware.bin and also with a random 1MiB file in place. I tested after a reboot so it does not do caching. With /usr/lib/firmware on a separate ext4 partition I can confirm with dmesg that the test_firmware suspend test actually works (does not freeze). With /usr/lib/firmware on btrfs it fails in both cases (with and without the firmware files). With caching (first suspend with the ext4 partition mounted, then a second suspend without) it does not freeze even with the firmware on btrfs. One further thing I noticed which might be problematic in rare cases: According to the kernel debug messages, the firmware-loader does not attempt to cache the firmware during suspend, if the previous call to request_firmware() has failed (file not present; call made during previous resume). In my opinion it should attempt to cache the firmware on suspend even in this case (If I remember correctly, firmware_request_cache also works without the file being present). In case some low-memory condition has caused the file system cache to lose the information about the file being non-present (or the file has been written after the initial attempt and is no longer in the file system cache), this might lead to freezes even for well-behaved drivers in case they reattempt to do request_firmware() on resume. If the firmware is found during resume, it is cached on further suspends. Given how long it took me to narrow down this problem in this (for me) reliably reproducible case, something like this happening at random would be almost impossible to debug/locate and might actually happen frequently in the wild. On 03/04/2021 00:19, Luis Chamberlain wrote: Lukas, can you share your /etc/fstab ? This is the core part (everything else is unmounted), I shortened the UUIDs. The ext4 mount of course is also unmounted when I want it to fail, with it in place it reliably not-fails. UUID=<1> / btrfs subvol=linux1 0 0 UUID=<2> /usr/lib/firmware ext4defaults 1 1 UUID=<1> /home/lukas btrfs subvol=linux1-f34-lukas 0 0 tmpfs /home/lukas/.cache tmpfs size=16g,gid=lukas,uid=lukas,mode=700 0 0 Also, how long do you stay in the boot before you try to suspend? During my reproduction sessions usually only shortly 1-5min, but I think I have seen this also after a slightly longer time. I can try to let it sit for longer if you think that is important. On 03/04/2021 00:58, Luis Chamberlain wrote: On Fri, Apr 2, 2021 at 3:19 PM Luis Chamberlain wrote: Lukas, can you share your /etc/fstab ? Also, how long do you stay in the boot before you try to suspend? OK I cannot reproduce the issue with the modified patch I sent to test_firmware, which if you enable config_enable_resume_test will trigger a request_firmware() on resume, thus trying to mimic the race you note. To test this you can simply use a loopback filesystem for your /lib/firmware and create a btrfs filesystem for it, and then run: echo 1 > /sys/devices/virtual/misc/test_firmware/config_enable_resume_test systemctl suspend Then resume. You should see "resume test" print on dmesg. I keep my /lib/firmware/ empty and still, nothing. Did you also try to create a random test-firmware.bin (I used 1M from /dev/urandom) instead of an empty /lib/firmware ? If the directory is completely empty, it also does not freeze for me. If the directory is empty, any attempt to access its content can likely be directly served from cache, even if the actual directory has never been accessed before, as long as /lib (which is a symlink to /usr/lib on fedora) has been accessed (which will likely always be true). So I have to further add to my previous findings that "firmware directory is not completely empty" is a further prerequisite for it to fail. Can you provide kernel logs for where you are seeing things get stuck at? The log does not have any entries from resume. For the attempts where it freezes the last entry in journalctl is systemd-sleep[5050]:
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On Fri, Apr 2, 2021 at 3:19 PM Luis Chamberlain wrote: > > Lukas, can you share your /etc/fstab ? Also, how long do you stay in > the boot before you try to suspend? OK I cannot reproduce the issue with the modified patch I sent to test_firmware, which if you enable config_enable_resume_test will trigger a request_firmware() on resume, thus trying to mimic the race you note. To test this you can simply use a loopback filesystem for your /lib/firmware and create a btrfs filesystem for it, and then run: echo 1 > /sys/devices/virtual/misc/test_firmware/config_enable_resume_test systemctl suspend Then resume. You should see "resume test" print on dmesg. I keep my /lib/firmware/ empty and still, nothing. Can you provide kernel logs for where you are seeing things get stuck at? Note that I had mentioned the races on suspend/resume do exist for any journaling filesystem, but this typically happens if you are doing ongoing writes. I suppose you are *not* doing writes and your filesystem is idle. As such without kernel logs I cannot be sure what the issue is, but at this point after the initial testing I've done I don't suspect this is a firmware API issue. You might be better off just reposting your patches with the respective Reviewed-by tags and pestering your maintainer. PS. If you want to test this on a guest you bring the guest back up with virsh dompmwakeup. Luis
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
Lukas, can you share your /etc/fstab ? Also, how long do you stay in the boot before you try to suspend? Luis
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
On Thu, Apr 01, 2021 at 04:59:47PM +0200, Lukas Middendorf wrote: > Hello Luis, > > > On 18/08/2020 16:37, Luis Chamberlain wrote: > > On Tue, Aug 18, 2020 at 12:04:51AM +0200, Lukas Middendorf wrote: > > > On 17/08/2020 17:20, Luis Chamberlain wrote: > > > > This helps, thanks so much, now we'll have to write a reproducer, thanks > > > > for the report!! > > > > > > Will you do it yourself or do you expect me to do anything for this? > > > > I meant to imply that we'd do this, now that we understand the problem. > > Thanks > > for your report! > > any news on this issue? Did you succeed with reproducing this at your end? No sorry, I dropped the ball on this but I managed to now spawn up the virtual guests where I was doing development to reproduce this. Give me some time and I will zero in on this now. For now what I have is the following to test this, I next will work on the userspace part. diff --git a/lib/test_firmware.c b/lib/test_firmware.c index 9fee2b93a8d1..f9e67fc4145a 100644 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -33,6 +34,8 @@ static DEFINE_MUTEX(test_fw_mutex); static const struct firmware *test_firmware; +static struct platform_device *pdev; + struct test_batched_req { u8 idx; int rc; @@ -53,6 +56,9 @@ struct test_batched_req { * @sync_direct: when the sync trigger is used if this is true * request_firmware_direct() will be used instead. * @send_uevent: whether or not to send a uevent for async requests + * @enable_resume_test: if @senable_resume is true this will enable a test to + * issue a request_firmware() upon resume. This is useful to test resume + * after suspend filesystem races. * @num_requests: number of requests to try per test case. This is trigger * specific. * @reqs: stores all requests information @@ -91,6 +97,7 @@ struct test_config { bool into_buf; bool sync_direct; bool send_uevent; + bool enable_resume_test; u8 num_requests; u8 read_fw_idx; @@ -184,6 +191,7 @@ static int __test_firmware_config_init(void) test_fw_config->send_uevent = true; test_fw_config->into_buf = false; test_fw_config->sync_direct = false; + test_fw_config->enable_resume_test = false; test_fw_config->req_firmware = request_firmware; test_fw_config->test_result = 0; test_fw_config->reqs = NULL; @@ -257,6 +265,9 @@ static ssize_t config_show(struct device *dev, len += scnprintf(buf+len, PAGE_SIZE - len, "sync_direct:\t\t%s\n", test_fw_config->sync_direct ? "true" : "false"); + len += scnprintf(buf+len, PAGE_SIZE - len, + "enable_resume_test:\t\t%s\n", + test_fw_config->enable_resume_test ? "true" : "false"); len += scnprintf(buf+len, PAGE_SIZE - len, "read_fw_idx:\t%u\n", test_fw_config->read_fw_idx); @@ -422,6 +433,22 @@ static ssize_t config_sync_direct_show(struct device *dev, } static DEVICE_ATTR_RW(config_sync_direct); +static ssize_t config_enable_resume_test_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return test_dev_config_update_bool(buf, count, + &test_fw_config->enable_resume_test); +} + +static ssize_t config_enable_resume_test_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + return test_dev_config_show_bool(buf, test_fw_config->enable_resume_test); +} +static DEVICE_ATTR_RW(config_enable_resume_test); + static ssize_t config_send_uevent_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) @@ -929,6 +956,7 @@ static struct attribute *test_dev_attrs[] = { TEST_FW_DEV_ATTR(config_into_buf), TEST_FW_DEV_ATTR(config_sync_direct), TEST_FW_DEV_ATTR(config_send_uevent), + TEST_FW_DEV_ATTR(config_enable_resume_test), TEST_FW_DEV_ATTR(config_read_fw_idx), /* These don't use the config at all - they could be ported! */ @@ -958,6 +986,81 @@ static struct miscdevice test_fw_misc_device = { .groups = test_dev_groups, }; +static int __maybe_unused test_firmware_suspend(struct device *dev) +{ + return 0; +} + + +static int __maybe_unused test_firmware_resume(struct device *dev) +{ + int rc; + + if (!test_fw_config->enable_resume_test) + return 0; + + pr_info("resume test, loading '%s'\n", test_fw_config->name); + + mutex_lock(&test_fw_mutex); + release_firmware(test_
Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
Hello Luis, On 18/08/2020 16:37, Luis Chamberlain wrote: On Tue, Aug 18, 2020 at 12:04:51AM +0200, Lukas Middendorf wrote: On 17/08/2020 17:20, Luis Chamberlain wrote: This helps, thanks so much, now we'll have to write a reproducer, thanks for the report!! Will you do it yourself or do you expect me to do anything for this? I meant to imply that we'd do this, now that we understand the problem. Thanks for your report! any news on this issue? Did you succeed with reproducing this at your end? I retested this with 5.12-rc5 and everything still seems unchanged. I still get freezes on resume under the same circumstances described previously if /usr/lib/firmware is on btrfs. My patches to si2168.c were not merged. The documentation of request_firmware() seems unchanged and does not reflect the fact that prerequisites exist for making it safe to call during resume callbacks. Lukas