Re: [PATCH v8 3/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
Hi,

On Thu, 1 Dec 2022, Ricardo Ribalda wrote:

> On Thu, 1 Dec 2022 at 14:22, 'Oliver Neukum' via Chromeos Kdump wrote:
> >
> > On 01.12.22 14:03, Ricardo Ribalda wrote:
> > > This patchset does not modify this behaviour. It simply fixes the
> > > stall for kexec().
> > >
> > > The patch that introduced the stall:
> > > 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
> > > in .shutdown")
> >
> > That patch is problematic. I would go as far as saying that
> > it needs to be reverted.
>
> It fixes a real issue. We have not had any complaints until we tried
> to kexec in the platform.
> I won't recommend reverting it until we have an alternative implementation.
>
> kexec is far less common than suspend/reboot.

I've posted an alternative to the ALSA list that reverts the problematic
patch and fixes the problem the patch was originally addressing in a
different way:

https://mailman.alsa-project.org/pipermail/alsa-devel/2022-December/209776.html

No changes outside sound/soc/ are needed with this approach.

Br, Kai
Re: [PATCH v8 3/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
On Thu, 01 Dec 2022 14:22:12 +0100, Oliver Neukum wrote:
>
> On 01.12.22 14:03, Ricardo Ribalda wrote:
>
> Hi,
>
> > This patchset does not modify this behaviour. It simply fixes the
> > stall for kexec().
> >
> > The patch that introduced the stall:
> > 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
> > in .shutdown")
>
> That patch is problematic. I would go as far as saying that
> it needs to be reverted.

... or fixed.

> > was sent as a generalised version of:
> > https://github.com/thesofproject/linux/pull/3388
> >
> > AFAIK, we would need a similar patch for every single board which
> > I am not sure is doable in a reasonable timeframe.
> >
> > In the meantime this seems like a decent compromise. Yes, a
> > misbehaving userspace can still stall during suspend, but that was
> > not introduced in this patch.
>
> Well, I mean if you know what is wrong then I'd say at least return to
> a sanely broken state.
>
> The whole approach is wrong. You need to be able to deal with user
> space talking to removed devices by returning an error and keeping
> the resources associated with the open file allocated until
> user space calls close()

As I already mentioned in another thread, if the user-space action has
to be cut off, we just need to call snd_card_disconnect() instead,
without sync.

A quick hack would be like below (totally untested and might be wrong,
though).

Anyway, Ricardo, please stop spinning too frequently; v8 in a few days
is way too much, and now the recipient list became unmanageable.
Let's give people some time to review and consider a better solution
first.


thanks,

Takashi

-- 8< --
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -475,7 +475,7 @@ EXPORT_SYMBOL(snd_sof_device_remove);
 int snd_sof_device_shutdown(struct device *dev)
 {
 	struct snd_sof_dev *sdev = dev_get_drvdata(dev);
-	struct snd_sof_pdata *pdata = sdev->pdata;
+	struct snd_soc_component *component;
 
 	if (IS_ENABLED(CONFIG_SND_SOC_SOF_PROBE_WORK_QUEUE))
 		cancel_work_sync(&sdev->probe_work);
@@ -484,9 +484,9 @@ int snd_sof_device_shutdown(struct device *dev)
 	 * make sure clients and machine driver(s) are unregistered to force
 	 * all userspace devices to be closed prior to the DSP shutdown sequence
 	 */
-	sof_unregister_clients(sdev);
-
-	snd_sof_machine_unregister(sdev, pdata);
+	component = snd_soc_lookup_component(sdev->dev, NULL);
+	if (component && component->card && component->card->snd_card)
+		snd_card_disconnect(component->card->snd_card);
 
 	if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)
 		return snd_sof_shutdown(sdev);
Re: [PATCH v8 3/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
Hi Oliver

On Thu, 1 Dec 2022 at 14:22, 'Oliver Neukum' via Chromeos Kdump wrote:
>
> On 01.12.22 14:03, Ricardo Ribalda wrote:
>
> Hi,
>
> > This patchset does not modify this behaviour. It simply fixes the
> > stall for kexec().
> >
> > The patch that introduced the stall:
> > 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
> > in .shutdown")
>
> That patch is problematic. I would go as far as saying that
> it needs to be reverted.
>

It fixes a real issue. We have not had any complaints until we tried
to kexec in the platform.
I won't recommend reverting it until we have an alternative implementation.

kexec is far less common than suspend/reboot.

> > was sent as a generalised version of:
> > https://github.com/thesofproject/linux/pull/3388
> >
> > AFAIK, we would need a similar patch for every single board which
> > I am not sure is doable in a reasonable timeframe.
> >
> > In the meantime this seems like a decent compromise. Yes, a
> > misbehaving userspace can still stall during suspend, but that was
> > not introduced in this patch.
>
> Well, I mean if you know what is wrong then I'd say at least return to
> a sanely broken state.
>
> The whole approach is wrong. You need to be able to deal with user
> space talking to removed devices by returning an error and keeping
> the resources associated with the open file allocated until
> user space calls close()

In general, the whole shutdown is broken for all the subsystems ;). It
is a complicated issue. Users handling fds, devices with DMAs in the
middle of an operation, dma fences.

Unfortunately I am not familiar enough with the sound subsystem to make
a proper patch for this.

>
>         Regards
>                 Oliver

--
Ricardo Ribalda
Re: [PATCH v8 3/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
On 01.12.22 14:03, Ricardo Ribalda wrote:

Hi,

> This patchset does not modify this behaviour. It simply fixes the
> stall for kexec().
>
> The patch that introduced the stall:
> 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
> in .shutdown")

That patch is problematic. I would go as far as saying that
it needs to be reverted.

> was sent as a generalised version of:
> https://github.com/thesofproject/linux/pull/3388
>
> AFAIK, we would need a similar patch for every single board which
> I am not sure is doable in a reasonable timeframe.
>
> In the meantime this seems like a decent compromise. Yes, a
> misbehaving userspace can still stall during suspend, but that was
> not introduced in this patch.

Well, I mean if you know what is wrong then I'd say at least return to
a sanely broken state.

The whole approach is wrong. You need to be able to deal with user
space talking to removed devices by returning an error and keeping
the resources associated with the open file allocated until
user space calls close()

        Regards
                Oliver
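
For readers following the thread, the pattern Oliver describes can be sketched as a small user-space model in C. This is only an illustration under stated assumptions: every name below (model_dev, model_file, model_open() and so on) is hypothetical and is not a kernel or ALSA API. The point it tries to show is that disconnect never waits for user space; it only marks the device as gone, subsequent I/O fails with -ENODEV, and the resources tied to an open file stay allocated until the last close().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
	bool disconnected;   /* set once the device has been removed */
	int open_files;      /* number of file handles still open */
	bool resources_live; /* stands in for buffers tied to open files */
};

struct model_file {
	struct model_dev *dev;
};

static void model_dev_init(struct model_dev *dev)
{
	dev->disconnected = false;
	dev->open_files = 0;
	dev->resources_live = true;
}

/* open() is refused once the device has been disconnected. */
static int model_open(struct model_dev *dev, struct model_file *file)
{
	if (dev->disconnected)
		return -ENODEV;
	file->dev = dev;
	dev->open_files++;
	return 0;
}

/* I/O on a disconnected device returns an error instead of blocking. */
static int model_write(struct model_file *file)
{
	if (file->dev->disconnected)
		return -ENODEV;
	return 0;
}

/*
 * Disconnect does not wait for user space. It flips the flag and
 * releases resources right away only if nobody holds the device open.
 */
static void model_disconnect(struct model_dev *dev)
{
	dev->disconnected = true;
	if (dev->open_files == 0)
		dev->resources_live = false;
}

/* The last close() after a disconnect releases the resources. */
static void model_close(struct model_file *file)
{
	struct model_dev *dev = file->dev;

	dev->open_files--;
	if (dev->disconnected && dev->open_files == 0)
		dev->resources_live = false;
	file->dev = NULL;
}
```

In this model a frozen process that never calls model_close() only keeps its own resources pinned; it cannot stall model_disconnect(), which is the property being asked for in the shutdown path.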
Re: [PATCH v8 3/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
Hi Oliver

Thanks for your review

On Thu, 1 Dec 2022 at 13:29, Oliver Neukum wrote:
>
> On 01.12.22 12:08, Ricardo Ribalda wrote:
> > If we are shutting down due to kexec and the userspace is frozen, the
> > system will stall forever waiting for userspace to complete.
> >
> > Do not wait for the clients to complete in that case.
>
> Hi,
>
> I am afraid I have to state that this approach is bad in every case,
> not just this corner case. It basically means that user space can stall
> the kernel for an arbitrary amount of time. And we cannot have that.
>
>         Regards
>                 Oliver

This patchset does not modify this behaviour. It simply fixes the
stall for kexec().

The patch that introduced the stall:
83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
in .shutdown")

was sent as a generalised version of:
https://github.com/thesofproject/linux/pull/3388

AFAIK, we would need a similar patch for every single board which
I am not sure is doable in a reasonable timeframe.

In the meantime this seems like a decent compromise. Yes, a
misbehaving userspace can still stall during suspend, but that was
not introduced in this patch.

Regards!

--
Ricardo Ribalda
Re: [PATCH v8 3/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
On 01.12.22 12:08, Ricardo Ribalda wrote:
> If we are shutting down due to kexec and the userspace is frozen, the
> system will stall forever waiting for userspace to complete.
>
> Do not wait for the clients to complete in that case.

Hi,

I am afraid I have to state that this approach is bad in every case,
not just this corner case. It basically means that user space can stall
the kernel for an arbitrary amount of time. And we cannot have that.

        Regards
                Oliver
[PATCH v8 3/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
If we are shutting down due to kexec and the userspace is frozen, the
system will stall forever waiting for userspace to complete.

Do not wait for the clients to complete in that case.

This fixes:

[   84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done.
[  246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds.
[  246.819035] Call Trace:
[  246.821782]
[  246.824186]  __schedule+0x5f9/0x1263
[  246.828231]  schedule+0x87/0xc5
[  246.831779]  snd_card_disconnect_sync+0xb5/0x127
...
[  246.889249]  snd_sof_device_shutdown+0xb4/0x150
[  246.899317]  pci_device_shutdown+0x37/0x61
[  246.903990]  device_shutdown+0x14c/0x1d6
[  246.908391]  kernel_kexec+0x45/0xb9

And:

[  246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds.
[  246.927709] Call Trace:
[  246.930461]
[  246.932819]  __schedule+0x5f9/0x1263
[  246.936855]  ? fsnotify_grab_connector+0x5c/0x70
[  246.942045]  schedule+0x87/0xc5
[  246.945567]  schedule_timeout+0x49/0xf3
[  246.949877]  wait_for_completion+0x86/0xe8
[  246.954463]  snd_card_free+0x68/0x89
...
[  247.001080]  platform_device_unregister+0x12/0x35

Cc: sta...@vger.kernel.org
Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
Signed-off-by: Ricardo Ribalda
---
 sound/soc/sof/core.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c
index 3e6141d03770..9587b6a85103 100644
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -9,6 +9,8 @@
 //
 
 #include
+#include
+#include
 #include
 #include
 #include
@@ -484,9 +486,10 @@ int snd_sof_device_shutdown(struct device *dev)
 	 * make sure clients and machine driver(s) are unregistered to force
 	 * all userspace devices to be closed prior to the DSP shutdown sequence
 	 */
-	sof_unregister_clients(sdev);
-
-	snd_sof_machine_unregister(sdev, pdata);
+	if (!(kexec_in_progress() && pm_freezing())) {
+		sof_unregister_clients(sdev);
+		snd_sof_machine_unregister(sdev, pdata);
+	}
 
 	if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)
 		return snd_sof_shutdown(sdev);
--
2.39.0.rc0.267.gcb52ba06e7-goog-b4-0.11.0-dev-696ae
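
As a reading aid, the condition in the hunk above can be written out as a standalone predicate. This is a minimal sketch, assuming the two inputs simply mirror what kexec_in_progress() and pm_freezing() report in the patch; the helper name should_unregister_clients() is hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the check in the patch: returns true when
 * clients and machine drivers should still be unregistered at shutdown.
 * Unregistration is skipped only when a kexec is in progress AND user
 * space is being frozen, since a frozen process can never close its fds.
 */
static bool should_unregister_clients(bool kexec_in_progress, bool freezing)
{
	return !(kexec_in_progress && freezing);
}
```

A normal reboot, and a kexec with user space still running, keep the existing behaviour; only the combination of both flags takes the new path. This also matches the objection raised in the replies: the wait on user space remains for every other shutdown path.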