Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

2020-05-22 Thread Luis Chamberlain
On Fri, May 22, 2020 at 03:52:44PM -0500, Alex Elder wrote:
> On 5/22/20 12:28 AM, Luis Chamberlain wrote:
> > OK thanks. Can the user be affected by this crash? If so how? Can
> > we recover ? Is that always guaranteed?
> 
> We can't guarantee anything about recovering from a crash of
> an independent entity.  But by design, recovery from a modem
> crash is possible and is supposed to leave Linux in a
> consistent state.  A modem crash is likely to be observable
> to the user.

Thanks this helps, I'll drop this patch!

 Luis


Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

2020-05-22 Thread Alex Elder

On 5/22/20 12:28 AM, Luis Chamberlain wrote:

On Tue, May 19, 2020 at 05:34:13PM -0500, Alex Elder wrote:

On 5/15/20 4:28 PM, Luis Chamberlain wrote:

This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.

Using a taint flag allows us to annotate when this happens clearly.


I don't fully understand what this is meant to do, so I can't
fully assess whether it's the right thing to do.


It is meant to taint the kernel to ensure it is clear that something
critically bad has happened with the device firmware, it crashed, and
recovery may or may not happen, we are not 100% certain.


But in this particular place in the IPA code, the *modem* has
crashed.  And the IPA driver is not responsible for modem
firmware, remoteproc is.


Oi vei. So the device it depends on has crashed.


Yes, more or less.  It could be considered a little more
nuanced than even that, but I won't get into it here.


The IPA driver *can* be responsible for loading some other
firmware, but even in that case, it only happens on initial
boot, and it's basically assumed to never crash.


OK is this an issue which we can recover from? If for the slightest bit
this can affect users it is something we should inform them over.


If the IPA driver successfully loads this firmware, it should
be assumed to never crash.  So in that respect, there is no
recovery required.

Again, the modem (with which the IPA hardware and driver
interacts) can crash, or it can be shut down intentionally.
And in either case it can recover, automatically or manually.
But all of that (and the modem's firmware loading) is the
responsibility of the remoteproc subsystem, not IPA.


This patch set is missing uevents for these issues, but I just added
support for this.


So regardless of whether this module_firmware_crashed() call is
appropriate in some places, I believe it should not be used here.


OK thanks. Can the user be affected by this crash? If so how? Can
we recover ? Is that always guaranteed?


We can't guarantee anything about recovering from a crash of
an independent entity.  But by design, recovery from a modem
crash is possible and is supposed to leave Linux in a
consistent state.  A modem crash is likely to be observable
to the user.

I'll repeat that I don't believe the new call you inserted
in the IPA driver belongs there.

-Alex



   Luis





Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

2020-05-21 Thread Luis Chamberlain
On Tue, May 19, 2020 at 05:34:13PM -0500, Alex Elder wrote:
> On 5/15/20 4:28 PM, Luis Chamberlain wrote:
> > This makes use of the new module_firmware_crashed() to help
> > annotate when firmware for device drivers crash. When firmware
> > crashes devices can sometimes become unresponsive, and recovery
> > sometimes requires a driver unload / reload and in the worst cases
> > a reboot.
> > 
> > Using a taint flag allows us to annotate when this happens clearly.
> 
> I don't fully understand what this is meant to do, so I can't
> fully assess whether it's the right thing to do.

It is meant to taint the kernel to ensure it is clear that something
critically bad has happened with the device firmware, it crashed, and
recovery may or may not happen, we are not 100% certain.
> 
> But in this particular place in the IPA code, the *modem* has
> crashed.  And the IPA driver is not responsible for modem
> firmware, remoteproc is.

Oi vei. So the device it depends on has crashed.

> The IPA driver *can* be responsible for loading some other
> firmware, but even in that case, it only happens on initial
> boot, and it's basically assumed to never crash.

OK is this an issue which we can recover from? If for the slightest bit
this can affect users it is something we should inform them over.

This patch set is missing uevents for these issues, but I just added
support for this.

> So regardless of whether this module_firmware_crashed() call is
> appropriate in some places, I believe it should not be used here.

OK thanks. Can the user be affected by this crash? If so how? Can
we recover ? Is that always guaranteed?

  Luis


Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

2020-05-19 Thread Alex Elder

On 5/15/20 4:28 PM, Luis Chamberlain wrote:

This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.

Using a taint flag allows us to annotate when this happens clearly.


I don't fully understand what this is meant to do, so I can't
fully assess whether it's the right thing to do.

But in this particular place in the IPA code, the *modem* has
crashed.  And the IPA driver is not responsible for modem
firmware, remoteproc is.

The IPA driver *can* be responsible for loading some other
firmware, but even in that case, it only happens on initial
boot, and it's basically assumed to never crash.

So regardless of whether this module_firmware_crashed() call is
appropriate in some places, I believe it should not be used here.

-Alex



Cc: Alex Elder 
Signed-off-by: Luis Chamberlain 
---
  drivers/net/ipa/ipa_modem.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
index ed10818dd99f..1790b87446ed 100644
--- a/drivers/net/ipa/ipa_modem.c
+++ b/drivers/net/ipa/ipa_modem.c
@@ -285,6 +285,7 @@ static void ipa_modem_crashed(struct ipa *ipa)
struct device *dev = >pdev->dev;
int ret;
  
+	module_firmware_crashed();

ipa_endpoint_modem_pause_all(ipa, true);
  
  	ipa_endpoint_modem_hol_block_clear_all(ipa);






Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

2020-05-15 Thread Rafael Aquini
On Fri, May 15, 2020 at 09:28:41PM +, Luis Chamberlain wrote:
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
> 
> Using a taint flag allows us to annotate when this happens clearly.
> 
> Cc: Alex Elder 
> Signed-off-by: Luis Chamberlain 
> ---
>  drivers/net/ipa/ipa_modem.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
> index ed10818dd99f..1790b87446ed 100644
> --- a/drivers/net/ipa/ipa_modem.c
> +++ b/drivers/net/ipa/ipa_modem.c
> @@ -285,6 +285,7 @@ static void ipa_modem_crashed(struct ipa *ipa)
>   struct device *dev = >pdev->dev;
>   int ret;
>  
> + module_firmware_crashed();
>   ipa_endpoint_modem_pause_all(ipa, true);
>  
>   ipa_endpoint_modem_hol_block_clear_all(ipa);
> -- 
> 2.26.2
> 
Acked-by: Rafael Aquini