Re: [PATCH v6 4/5] PCI: only return true when dev io state is really changed

2020-10-03 Thread Ethan Zhao
Bjorn,

On Sat, Oct 3, 2020 at 1:29 AM Bjorn Helgaas  wrote:
>
> [+cc Sinan]
>
> On Wed, Sep 30, 2020 at 03:05:36AM -0400, Ethan Zhao wrote:
> > When uncorrectable error happens, AER driver and DPC driver interrupt
> > handlers likely call
> >
> >pcie_do_recovery()
> >->pci_walk_bus()
> >  ->report_frozen_detected()
> >
> > with pci_channel_io_frozen the same time.
> >If pci_dev_set_io_state() return true even if the original state is
> > pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> > the error detecting and recovery procedure one after another.
> >The result is the recovery flow mixed between AER and DPC.
> > So simplify the pci_dev_set_io_state() function to only return true
> > when dev->error_state is changed.
> >
> > Signed-off-by: Ethan Zhao 
> > Tested-by: Wen Jin 
> > Tested-by: Shanshan Zhang 
> > Reviewed-by: Alexandru Gagniuc 
> > Reviewed-by: Andy Shevchenko 
> > ---
> > Changnes:
> >  v2: revise description and code according to suggestion from Andy.
> >  v3: change code to simpler.
> >  v4: no change.
> >  v5: no change.
> >  v6: no change.
> >
> >  drivers/pci/pci.h | 37 +
> >  1 file changed, 5 insertions(+), 32 deletions(-)
> >
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 455b32187abd..f2beeaeda321 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -359,39 +359,12 @@ struct pci_sriov {
> >  static inline bool pci_dev_set_io_state(struct pci_dev *dev,
> >   pci_channel_state_t new)
> >  {
> > - bool changed = false;
> > -
> >   device_lock_assert(>dev);
> > - switch (new) {
> > - case pci_channel_io_perm_failure:
> > - switch (dev->error_state) {
> > - case pci_channel_io_frozen:
> > - case pci_channel_io_normal:
> > - case pci_channel_io_perm_failure:
> > - changed = true;
> > - break;
> > - }
> > - break;
> > - case pci_channel_io_frozen:
> > - switch (dev->error_state) {
> > - case pci_channel_io_frozen:
> > - case pci_channel_io_normal:
> > - changed = true;
> > - break;
> > - }
> > - break;
> > - case pci_channel_io_normal:
> > - switch (dev->error_state) {
> > - case pci_channel_io_frozen:
> > - case pci_channel_io_normal:
> > - changed = true;
> > - break;
> > - }
> > - break;
> > - }
> > - if (changed)
> > - dev->error_state = new;
> > - return changed;
> > + if (dev->error_state == new)
> > + return false;
> > +
> > + dev->error_state = new;
> > + return true;
> >  }
>
> IIUC this changes the behavior of the function, but it's difficult to
> analyze because it does a lot of simplification at the same time.
>
> Please consider the following, which is intended to simplify the
> function while preserving the behavior (but please verify; it's been a
> long time since I looked at this).  Then maybe see how your patch
> could be done on top of this?
>
> Alternatively, come up with your own simplification patch + the
> functionality change.
>
>
> commit 983d9b1f8177 ("PCI/ERR: Simplify pci_dev_set_io_state()")
> Author: Bjorn Helgaas 
> Date:   Tue May 19 12:28:57 2020 -0500
>
> PCI/ERR: Simplify pci_dev_set_io_state()
>
> Truth table:
>
>   requested new state
>   current  --
>   statenormal frozen perm_failure
>     +  -  -  
>   normal|  normal frozen perm_failure
>   frozen|  normal frozen perm_failure
>   perm_failure  |  perm_failure*  perm_failure*  perm_failure
>
>   * "not changed", returns false
>
> No functional change intended.
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 6d3f75867106..81408552f7c9 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -358,39 +358,21 @@ struct pci_sriov {
>  static inline bool pci_dev_set_io_state(struct pci_dev *dev,
> pci_channel_state_t new)
>  {
> -   bool changed = false;
> -
> device_lock_assert(>dev);
> -   switch (new) {
> -   case pci_channel_io_perm_failure:
> -   switch (dev->error_state) {
> -   case pci_channel_io_frozen:
> -   case pci_channel_io_normal:
> -   case pci_channel_io_perm_failure:
> -   changed = true;
> -   break;
> -   }
> -   break;
> -   case pci_channel_io_frozen:
> -   switch (dev->error_state) {
> -   case 

Re: [PATCH v6 4/5] PCI: only return true when dev io state is really changed

2020-10-02 Thread Ethan Zhao
Sinan,

On Sat, Oct 3, 2020 at 12:08 AM Sinan Kaya  wrote:
>
> On 9/30/2020 3:05 AM, Ethan Zhao wrote:
> > When uncorrectable error happens, AER driver and DPC driver interrupt
> > handlers likely call
> >
> >pcie_do_recovery()
> >->pci_walk_bus()
> >  ->report_frozen_detected()
> >
> > with pci_channel_io_frozen the same time.
>
> We need some more data on this. If DPC is supported by HW, errors
> should be triggered by DPC not AER.
>
> If I remember right, there is a register that tells which AER errors
> should be handled by DPC.

When uncorrectable errors happen, non-fatal severity level, AER and
DPC could be triggered at the same time.


Thanks,
Ethan

>


Re: [PATCH v6 4/5] PCI: only return true when dev io state is really changed

2020-10-02 Thread Bjorn Helgaas
[+cc Sinan]

On Wed, Sep 30, 2020 at 03:05:36AM -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> 
>pcie_do_recovery()
>->pci_walk_bus()
>  ->report_frozen_detected()
> 
> with pci_channel_io_frozen the same time.
>If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
>The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.
> 
> Signed-off-by: Ethan Zhao 
> Tested-by: Wen Jin 
> Tested-by: Shanshan Zhang 
> Reviewed-by: Alexandru Gagniuc 
> Reviewed-by: Andy Shevchenko 
> ---
> Changnes:
>  v2: revise description and code according to suggestion from Andy.
>  v3: change code to simpler.
>  v4: no change.
>  v5: no change.
>  v6: no change.
> 
>  drivers/pci/pci.h | 37 +
>  1 file changed, 5 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 455b32187abd..f2beeaeda321 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -359,39 +359,12 @@ struct pci_sriov {
>  static inline bool pci_dev_set_io_state(struct pci_dev *dev,
>   pci_channel_state_t new)
>  {
> - bool changed = false;
> -
>   device_lock_assert(>dev);
> - switch (new) {
> - case pci_channel_io_perm_failure:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - case pci_channel_io_perm_failure:
> - changed = true;
> - break;
> - }
> - break;
> - case pci_channel_io_frozen:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - changed = true;
> - break;
> - }
> - break;
> - case pci_channel_io_normal:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - changed = true;
> - break;
> - }
> - break;
> - }
> - if (changed)
> - dev->error_state = new;
> - return changed;
> + if (dev->error_state == new)
> + return false;
> +
> + dev->error_state = new;
> + return true;
>  }

IIUC this changes the behavior of the function, but it's difficult to
analyze because it does a lot of simplification at the same time.

Please consider the following, which is intended to simplify the
function while preserving the behavior (but please verify; it's been a
long time since I looked at this).  Then maybe see how your patch
could be done on top of this?

Alternatively, come up with your own simplification patch + the
functionality change.


commit 983d9b1f8177 ("PCI/ERR: Simplify pci_dev_set_io_state()")
Author: Bjorn Helgaas 
Date:   Tue May 19 12:28:57 2020 -0500

PCI/ERR: Simplify pci_dev_set_io_state()

Truth table:

  requested new state
  current  --
  statenormal frozen perm_failure
    +  -  -  
  normal|  normal frozen perm_failure
  frozen|  normal frozen perm_failure
  perm_failure  |  perm_failure*  perm_failure*  perm_failure

  * "not changed", returns false

No functional change intended.

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f75867106..81408552f7c9 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -358,39 +358,21 @@ struct pci_sriov {
 static inline bool pci_dev_set_io_state(struct pci_dev *dev,
pci_channel_state_t new)
 {
-   bool changed = false;
-
device_lock_assert(>dev);
-   switch (new) {
-   case pci_channel_io_perm_failure:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   case pci_channel_io_perm_failure:
-   changed = true;
-   break;
-   }
-   break;
-   case pci_channel_io_frozen:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   changed = true;
-   break;
-   }
-   break;
-   case pci_channel_io_normal:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-  

Re: [PATCH v6 4/5] PCI: only return true when dev io state is really changed

2020-10-02 Thread Sinan Kaya
On 9/30/2020 3:05 AM, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> 
>pcie_do_recovery()
>->pci_walk_bus()
>  ->report_frozen_detected()
> 
> with pci_channel_io_frozen the same time.

We need some more data on this. If DPC is supported by HW, errors
should be triggered by DPC not AER.

If I remember right, there is a register that tells which AER errors
should be handled by DPC.



[PATCH v6 4/5] PCI: only return true when dev io state is really changed

2020-09-30 Thread Ethan Zhao
When uncorrectable error happens, AER driver and DPC driver interrupt
handlers likely call

   pcie_do_recovery()
   ->pci_walk_bus()
 ->report_frozen_detected()

with pci_channel_io_frozen the same time.
   If pci_dev_set_io_state() return true even if the original state is
pci_channel_io_frozen, that will cause AER or DPC handler re-enter
the error detecting and recovery procedure one after another.
   The result is the recovery flow mixed between AER and DPC.
So simplify the pci_dev_set_io_state() function to only return true
when dev->error_state is changed.

Signed-off-by: Ethan Zhao 
Tested-by: Wen Jin 
Tested-by: Shanshan Zhang 
Reviewed-by: Alexandru Gagniuc 
Reviewed-by: Andy Shevchenko 
---
Changnes:
 v2: revise description and code according to suggestion from Andy.
 v3: change code to simpler.
 v4: no change.
 v5: no change.
 v6: no change.

 drivers/pci/pci.h | 37 +
 1 file changed, 5 insertions(+), 32 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 455b32187abd..f2beeaeda321 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -359,39 +359,12 @@ struct pci_sriov {
 static inline bool pci_dev_set_io_state(struct pci_dev *dev,
pci_channel_state_t new)
 {
-   bool changed = false;
-
device_lock_assert(>dev);
-   switch (new) {
-   case pci_channel_io_perm_failure:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   case pci_channel_io_perm_failure:
-   changed = true;
-   break;
-   }
-   break;
-   case pci_channel_io_frozen:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   changed = true;
-   break;
-   }
-   break;
-   case pci_channel_io_normal:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   changed = true;
-   break;
-   }
-   break;
-   }
-   if (changed)
-   dev->error_state = new;
-   return changed;
+   if (dev->error_state == new)
+   return false;
+
+   dev->error_state = new;
+   return true;
 }
 
 static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
-- 
2.18.4