RE: [PATCH] PCI: layerscape: Change back to the default error response behavior

2020-10-11 Thread Z.q. Hou
Hi Rob and Kishon,

> -----Original Message-----
> From: Rob Herring 
> Sent: September 30, 2020 23:08
> To: Kishon Vijay Abraham I 
> Cc: Z.q. Hou ; PCI ;
> linux-kernel@vger.kernel.org; linux-arm-kernel
> ; Lorenzo Pieralisi
> ; Bjorn Helgaas ; M.h.
> Lian ; Roy Zang ; Mingkai
> Hu ; Leo Li 
> Subject: Re: [PATCH] PCI: layerscape: Change back to the default error
> response behavior
> 
> On Wed, Sep 30, 2020 at 8:29 AM Kishon Vijay Abraham I 
> wrote:
> >
> > Hi Hou,
> >
> > On 29/09/20 6:43 pm, Zhiqiang Hou wrote:
> > > From: Hou Zhiqiang 
> > >
> > > In the current error response behavior, it will send a SLVERR
> > > response to device's internal AXI slave system interface when the
> > > PCIe controller experiences an erroneous completion (UR, CA and CT)
> > > from an external completer for its outbound non-posted request,
> > > which will result in SError and crash the kernel directly.
> > > This patch change back it to the default behavior to increase the
> > > robustness of the kernel. In the default behavior, it always sends
> > > an OKAY response to the internal AXI slave interface when the
> > > controller gets these erroneous completions. And the AER driver will
> > > report and try to recover these errors.
> >
> > I don't think not forwarding any error interrupts is a good idea.
> 
> Interrupts would be fine. Abort/SError is not. I think it is pretty clear
> what the correct behavior is for config accesses.

I agree with Rob.

> 
> > Maybe
> > you could disable it while reading configuration space registers
> > (vendorID and deviceID) and then enable error forwarding back?
> 
> To add to the locking (or lack of) problems in config accesses?

If we take this approach, then during the window of a CFG access, MEM_rd
errors will also not be forwarded, so it's not a reliable mechanism for users.
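
To make the window concrete, here is a minimal userspace sketch, not driver
code. It assumes error forwarding is one global switch that cannot tell a CFG
read from a MEM_rd; every `model_` name is hypothetical and exists only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request types; only these two matter for the argument. */
enum model_req { MODEL_CFG_RD, MODEL_MEM_RD };

/* Hypothetical controller state: one switch, like a single register write. */
struct model_ctl {
	bool forward_errors;
};

/* Would a failed completion for this request be forwarded as SLVERR? */
static bool model_error_forwarded(const struct model_ctl *ctl,
				  enum model_req req)
{
	assert(ctl != NULL);
	(void)req;	/* the switch does not look at the request type */
	return ctl->forward_errors;
}

/*
 * A config read wrapped in disable/enable. The MEM_rd check in the middle
 * stands in for a read issued by another CPU while the window is open.
 * Returns whether that MEM_rd error would have been forwarded.
 */
static bool model_mem_rd_error_seen_during_cfg(struct model_ctl *ctl)
{
	bool seen;

	assert(ctl != NULL);
	ctl->forward_errors = false;			/* window opens */
	seen = model_error_forwarded(ctl, MODEL_MEM_RD);	/* concurrent MEM_rd */
	ctl->forward_errors = true;			/* window closes */
	return seen;
}
```

Calling model_mem_rd_error_seen_during_cfg() on a controller that starts with
forwarding enabled returns false: the MEM_rd error is dropped even though
forwarding is on before and after the call.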

Thanks,
Zhiqiang

> 
> Rob


Re: [PATCH] PCI: layerscape: Change back to the default error response behavior

2020-09-30 Thread Kishon Vijay Abraham I
Hi,

On 30/09/20 8:37 pm, Rob Herring wrote:
> On Wed, Sep 30, 2020 at 8:29 AM Kishon Vijay Abraham I  wrote:
>>
>> Hi Hou,
>>
>> On 29/09/20 6:43 pm, Zhiqiang Hou wrote:
>>> From: Hou Zhiqiang 
>>>
>>> In the current error response behavior, it will send a SLVERR response
>>> to device's internal AXI slave system interface when the PCIe controller
>>> experiences an erroneous completion (UR, CA and CT) from an external
>>> completer for its outbound non-posted request, which will result in
>>> SError and crash the kernel directly.
>>> This patch change back it to the default behavior to increase the
>>> robustness of the kernel. In the default behavior, it always sends an
>>> OKAY response to the internal AXI slave interface when the controller
>>> gets these erroneous completions. And the AER driver will report and
>>> try to recover these errors.
>>
>> I don't think not forwarding any error interrupts is a good idea.
> 
> Interrupts would be fine. Abort/SError is not. I think it is pretty
> clear what the correct behavior is for config accesses.

IIUC $patch prevents SError in all cases. Don't UR, CA and CT all send
SLVERR, which results in an Abort, and isn't that what is being prevented
here? Maybe I'm wrong here; Hou can confirm.

Thanks
Kishon


Re: [PATCH] PCI: layerscape: Change back to the default error response behavior

2020-09-30 Thread Rob Herring
On Wed, Sep 30, 2020 at 8:29 AM Kishon Vijay Abraham I  wrote:
>
> Hi Hou,
>
> On 29/09/20 6:43 pm, Zhiqiang Hou wrote:
> > From: Hou Zhiqiang 
> >
> > In the current error response behavior, it will send a SLVERR response
> > to device's internal AXI slave system interface when the PCIe controller
> > experiences an erroneous completion (UR, CA and CT) from an external
> > completer for its outbound non-posted request, which will result in
> > SError and crash the kernel directly.
> > This patch change back it to the default behavior to increase the
> > robustness of the kernel. In the default behavior, it always sends an
> > OKAY response to the internal AXI slave interface when the controller
> > gets these erroneous completions. And the AER driver will report and
> > try to recover these errors.
>
> I don't think not forwarding any error interrupts is a good idea.

Interrupts would be fine. Abort/SError is not. I think it is pretty
clear what the correct behavior is for config accesses.

> Maybe
> you could disable it while reading configuration space registers
> (vendorID and deviceID) and then enable error forwarding back?

To add to the locking (or lack of) problems in config accesses?

Rob


Re: [PATCH] PCI: layerscape: Change back to the default error response behavior

2020-09-30 Thread Kishon Vijay Abraham I
Hi Hou,

On 29/09/20 6:43 pm, Zhiqiang Hou wrote:
> From: Hou Zhiqiang 
> 
> In the current error response behavior, it will send a SLVERR response
> to device's internal AXI slave system interface when the PCIe controller
> experiences an erroneous completion (UR, CA and CT) from an external
> completer for its outbound non-posted request, which will result in
> SError and crash the kernel directly.
> This patch change back it to the default behavior to increase the
> robustness of the kernel. In the default behavior, it always sends an
> OKAY response to the internal AXI slave interface when the controller
> gets these erroneous completions. And the AER driver will report and
> try to recover these errors.

I don't think not forwarding any error interrupts is a good idea. Maybe
you could disable it while reading configuration space registers
(vendorID and deviceID) and then enable error forwarding back?

Thanks
Kishon
> 
> Signed-off-by: Hou Zhiqiang 
> ---
>  drivers/pci/controller/dwc/pci-layerscape.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> index f24f79a70d9a..e92ab8a77046 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -30,8 +30,6 @@
>  
>  /* PEX Internal Configuration Registers */
>  #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */
> -#define PCIE_ABSERR  0x8d0 /* Bridge Slave Error Response Register */
> -#define PCIE_ABSERR_SETTING  0x9401 /* Forward error of non-posted request */
>  
>  #define PCIE_IATU_NUM 6
>  
> @@ -123,14 +121,6 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
>   return 1;
>  }
>  
> -/* Forward error response of outbound non-posted requests */
> -static void ls_pcie_fix_error_response(struct ls_pcie *pcie)
> -{
> - struct dw_pcie *pci = pcie->pci;
> -
> - iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
> -}
> -
>  static int ls_pcie_host_init(struct pcie_port *pp)
>  {
>   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> @@ -142,7 +132,6 @@ static int ls_pcie_host_init(struct pcie_port *pp)
>    * dw_pcie_setup_rc() will reconfigure the outbound windows.
>    */
>   ls_pcie_disable_outbound_atus(pcie);
> - ls_pcie_fix_error_response(pcie);
>  
>   dw_pcie_dbi_ro_wr_en(pci);
>   ls_pcie_clear_multifunction(pcie);
> 


RE: [PATCH] PCI: layerscape: Change back to the default error response behavior

2020-09-29 Thread Z.q. Hou
Hi Bjorn,

Thanks a lot for your comments!

> -----Original Message-----
> From: Bjorn Helgaas 
> Sent: September 29, 2020 23:03
> To: Z.q. Hou 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; lorenzo.pieral...@arm.com;
> r...@kernel.org; bhelg...@google.com; M.h. Lian
> ; Roy Zang ; Mingkai Hu
> ; Leo Li 
> Subject: Re: [PATCH] PCI: layerscape: Change back to the default error
> response behavior
> 
> On Tue, Sep 29, 2020 at 09:13:28PM +0800, Zhiqiang Hou wrote:
> > From: Hou Zhiqiang 
> >
> > In the current error response behavior, it will send a SLVERR response
> > to device's internal AXI slave system interface when the PCIe
> > controller experiences an erroneous completion (UR, CA and CT) from an
> > external completer for its outbound non-posted request, which will
> > result in SError and crash the kernel directly.
> 
> Possible wording:
> 
>   As currently configured, when the PCIe controller receives a
>   Completion with UR or CA status, or a Completion Timeout occurs, it
>   sends a SLVERR response to the internal AXI slave system interface,
>   which results in SError and a kernel crash.
> 
> Please add a blank line between paragraphs, and s/This patch change back
> it/Change it/ below.
> 
> > This patch change back it to the default behavior to increase the
> > robustness of the kernel. In the default behavior, it always sends an
> > OKAY response to the internal AXI slave interface when the controller
> > gets these erroneous completions. And the AER driver will report and
> > try to recover these errors.
> 
> This reverts 84d897d69938 ("PCI: layerscape: Change default error response
> behavior"), so please mention that in the commit log, probably as:
> 
> Fixes: 84d897d69938 ("PCI: layerscape: Change default error response
> behavior")
> 
> Maybe it also needs a stable tag, e.g., v4.15+?

Thanks for your good suggestions! Will fix in v2.

> 
> Since this is a pure revert, whatever problem 84d897d69938 fixed must now
> be fixed in some other way.  Otherwise, this revert would just be
> reintroducing the problem fixed by 84d897d69938.
> 
> This commit log should mention what that other fix is.
> 
> AER is only a reporting mechanism, it is asynchronous to the instruction
> stream, and it's optional (may not be implemented in the hardware, and may
> not be supported by the kernel), so I'm not super convinced that it can be the
> answer to this problem.
>

Commit 84d897d69938 ("PCI: layerscape: Change default error response
behavior") doesn't fix any issue. It just enables a feature of the DesignWare
PCIe IP that allows an error response to be sent to the AXI slave interface,
which is not enabled on any other platform with the DWC IP. As mentioned in
that commit, it will also send an OKAY response to the AXI slave interface for
an erroneous completion of a non-posted transaction, including CFG and MEM_rd
transactions. However, upstream won't support platforms that abort on CFG
accesses, so we have to change back to the default error response behavior and
accept that MEM_rd errors are not forwarded, just like on other DWC IP
platforms.
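
A minimal sketch of what the default (OKAY) behavior means for a config read
of the vendor/device ID. It assumes the controller hands back all-ones data
for a failed completion; that data value and every `model_` name below are
assumptions made for illustration, not taken from the driver or the DWC
documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ALL_ONES 0xffffffffu	/* assumed data for a failed read */

/*
 * Hypothetical config read of the vendor/device ID dword.
 * id_reg == NULL stands for "no device answered" (UR or timeout).
 * Always returns true: with the default behavior the AXI response is
 * OKAY, so the CPU sees a normal read and no abort.
 */
static bool model_cfg_read_id(const uint32_t *id_reg, uint32_t *val)
{
	assert(val != NULL);
	*val = id_reg ? *id_reg : MODEL_ALL_ONES;
	return true;
}

/* Caller-side check: vendor ID 0xffff is invalid, so nothing is there. */
static bool model_device_present(uint32_t id)
{
	return (id & 0xffffu) != 0xffffu;
}
```

With this model the caller detects a missing device from the returned data
instead of from an abort, which is the property wanted for CFG accesses. A
MEM_rd has no reserved data value, so a failed MEM_rd cannot be detected this
way.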

As I recall, SError is also an asynchronous abort, so it too is only a
reporting mechanism. In contrast with AER, it makes the kernel crash. So
neither of these two mechanisms can ensure data integrity. Generally the
upper-layer data transfer protocol has its own mechanism to ensure data
integrity, so this is not an issue for most users. If someone really wants a
kernel crash on a MEM_rd error, they can enable this in their local code.

Thanks,
Zhiqiang
 
> > Signed-off-by: Hou Zhiqiang 
> > ---
> >  drivers/pci/controller/dwc/pci-layerscape.c | 11 ---
> >  1 file changed, 11 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> > index f24f79a70d9a..e92ab8a77046 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > @@ -30,8 +30,6 @@
> >
> >  /* PEX Internal Configuration Registers */
> >  #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */
> > -#define PCIE_ABSERR 0x8d0 /* Bridge Slave Error Response Register */
> > -#define PCIE_ABSERR_SETTING 0x9401 /* Forward error of non-posted request */
> >
> >  #define PCIE_IATU_NUM  6
> >
> > @@ -123,14 +121,6 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
> > return 1;
> >  }
> >
> > -/* Forward error response of outbound non-posted reque

Re: [PATCH] PCI: layerscape: Change back to the default error response behavior

2020-09-29 Thread Bjorn Helgaas
On Tue, Sep 29, 2020 at 09:13:28PM +0800, Zhiqiang Hou wrote:
> From: Hou Zhiqiang 
> 
> In the current error response behavior, it will send a SLVERR response
> to device's internal AXI slave system interface when the PCIe controller
> experiences an erroneous completion (UR, CA and CT) from an external
> completer for its outbound non-posted request, which will result in
> SError and crash the kernel directly.

Possible wording:

  As currently configured, when the PCIe controller receives a
  Completion with UR or CA status, or a Completion Timeout occurs, it
  sends a SLVERR response to the internal AXI slave system interface,
  which results in SError and a kernel crash.

Please add a blank line between paragraphs, and
s/This patch change back it/Change it/ below.

> This patch change back it to the default behavior to increase the
> robustness of the kernel. In the default behavior, it always sends an
> OKAY response to the internal AXI slave interface when the controller
> gets these erroneous completions. And the AER driver will report and
> try to recover these errors.

This reverts 84d897d69938 ("PCI: layerscape: Change default error
response behavior"), so please mention that in the commit log,
probably as:

Fixes: 84d897d69938 ("PCI: layerscape: Change default error response behavior")

Maybe it also needs a stable tag, e.g., v4.15+?

Since this is a pure revert, whatever problem 84d897d69938 fixed must
now be fixed in some other way.  Otherwise, this revert would just be
reintroducing the problem fixed by 84d897d69938.

This commit log should mention what that other fix is.

AER is only a reporting mechanism, it is asynchronous to the
instruction stream, and it's optional (may not be implemented in the
hardware, and may not be supported by the kernel), so I'm not super
convinced that it can be the answer to this problem.
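
The "asynchronous to the instruction stream" point can be sketched in
userspace as follows. This is a toy model under stated assumptions, not the
AER driver: the `model_` names are hypothetical, the all-ones read data is an
assumption, and the three bit positions follow the AER Uncorrectable Error
Status register (Completion Timeout bit 14, Completer Abort bit 15,
Unsupported Request bit 20).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_UNC_COMP_TIME	(1u << 14)	/* Completion Timeout */
#define MODEL_UNC_COMP_ABORT	(1u << 15)	/* Completer Abort */
#define MODEL_UNC_UNSUP		(1u << 20)	/* Unsupported Request */

enum model_cpl { MODEL_CPL_OK, MODEL_CPL_UR, MODEL_CPL_CA, MODEL_CPL_CT };

/* Hypothetical port: errors are only latched here for a later handler. */
struct model_port {
	uint32_t uncor_status;
};

/*
 * Outbound read with the default (OKAY) response. The caller gets data
 * back immediately in every case; an error only sets a status bit.
 */
static uint32_t model_read(struct model_port *p, enum model_cpl cpl,
			   uint32_t good_data)
{
	assert(p != NULL);
	switch (cpl) {
	case MODEL_CPL_UR:
		p->uncor_status |= MODEL_UNC_UNSUP;
		return 0xffffffffu;
	case MODEL_CPL_CA:
		p->uncor_status |= MODEL_UNC_COMP_ABORT;
		return 0xffffffffu;
	case MODEL_CPL_CT:
		p->uncor_status |= MODEL_UNC_COMP_TIME;
		return 0xffffffffu;
	default:
		return good_data;
	}
}

/* Runs some time later: report what was latched, then clear it. */
static uint32_t model_aer_service(struct model_port *p)
{
	uint32_t seen;

	assert(p != NULL);
	seen = p->uncor_status;
	p->uncor_status = 0;
	return seen;
}
```

Between model_read() returning and model_aer_service() running, the caller
has already consumed the returned value. If AER is absent or unsupported,
model_aer_service() simply never runs and the latched bit is never reported.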

> Signed-off-by: Hou Zhiqiang 
> ---
>  drivers/pci/controller/dwc/pci-layerscape.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> index f24f79a70d9a..e92ab8a77046 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -30,8 +30,6 @@
>  
>  /* PEX Internal Configuration Registers */
>  #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */
> -#define PCIE_ABSERR  0x8d0 /* Bridge Slave Error Response Register */
> -#define PCIE_ABSERR_SETTING  0x9401 /* Forward error of non-posted request */
>  
>  #define PCIE_IATU_NUM 6
>  
> @@ -123,14 +121,6 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
>   return 1;
>  }
>  
> -/* Forward error response of outbound non-posted requests */
> -static void ls_pcie_fix_error_response(struct ls_pcie *pcie)
> -{
> - struct dw_pcie *pci = pcie->pci;
> -
> - iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
> -}
> -
>  static int ls_pcie_host_init(struct pcie_port *pp)
>  {
>   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> @@ -142,7 +132,6 @@ static int ls_pcie_host_init(struct pcie_port *pp)
>    * dw_pcie_setup_rc() will reconfigure the outbound windows.
>    */
>   ls_pcie_disable_outbound_atus(pcie);
> - ls_pcie_fix_error_response(pcie);
>  
>   dw_pcie_dbi_ro_wr_en(pci);
>   ls_pcie_clear_multifunction(pcie);
> -- 
> 2.17.1
>