Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-08 Thread Scott Wood
On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Tuesday, September 30, 2014 2:36 AM
> > To: Guenter Roeck
> > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; linuxppc-
> > d...@lists.ozlabs.org; linux-kernel@vger.kernel.org; Jojy G Varghese;
> > Guenter Roeck; Jia Hongtao-B38951
> > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> > exception on E500MC / E5500
> > 
> > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > > From: Jojy G Varghese 
> > >
> > > For E500MC and E5500, a machine check exception in pci(e) memory space
> > > crashes the kernel.
> > >
> > > Testing shows that the MCAR(U) register is zero on a MC exception for
> > > the E5500 core. At the same time, DEAR register has been found to have the
> > > address of the faulty load address during an MC exception for this core.
> > >
> > > This fix changes the current behavior to fixup the result register and
> > > instruction pointers in the case of a load operation on a faulty PCI
> > > address.
> > >
> > > The changes are:
> > > - Added the hook to pci machine check handling to the e500mc machine check
> > >   exception handler.
> > > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > >   As mentioned above, this is necessary because the E5500 core does not
> > >   report the fault address in the MCAR register.
> > >
> > > Cc: Scott Wood 
> > > Signed-off-by: Jojy G Varghese  [Guenter Roeck:
> > > updated description]
> > > Signed-off-by: Guenter Roeck 
> > > Signed-off-by: Guenter Roeck 
> > > ---
> > >  arch/powerpc/kernel/traps.c   | 3 ++-
> > >  arch/powerpc/sysdev/fsl_pci.c | 5 +
> > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > > index 0dc43f9..ecb709b 100644
> > > --- a/arch/powerpc/kernel/traps.c
> > > +++ b/arch/powerpc/kernel/traps.c
> > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > >   int recoverable = 1;
> > >
> > >   if (reason & MCSR_LD) {
> > > - recoverable = fsl_rio_mcheck_exception(regs);
> > > + recoverable = fsl_rio_mcheck_exception(regs) ||
> > > + fsl_pci_mcheck_exception(regs);
> > >   if (recoverable == 1)
> > >   goto silent_out;
> > >   }
> > > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > > index c507767..bdb956b 100644
> > > --- a/arch/powerpc/sysdev/fsl_pci.c
> > > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > >  #endif
> > >   addr += mfspr(SPRN_MCAR);
> > >
> > > +#ifdef CONFIG_E5500_CPU
> > > + if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > > + addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > #endif
> > 
> > Kconfig tells you what hardware is supported, not what hardware you're
> > actually running on.
> > 
> > Jia Hongtao, do you know anything about this issue?  Is there an erratum?
> 
> Sorry for the late response, I just returned from my vacation.
> I don't know about this issue.
> 
> > What chips are affected by the erratum covered by
> > <http://patchwork.ozlabs.org/patch/240239/>?
> 
> MPC8544, MPC8548, MPC8572 are affected by this erratum.

What is the erratum number?

> I checked P4080, which uses e500mc, and no such erratum was found.

What is the erratum behavior, and how does it differ from the problem
that Jojy and Guenter are trying to solve?

-Scott
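Scott's remark about Kconfig can be illustrated outside the kernel: `CONFIG_E5500_CPU` only says the kernel was built with E5500 support, and a multiplatform image can still boot on an e500mc. A runtime check would look at the processor version register (PVR) instead. The sketch below is a user-space model, not kernel code: SPR reads are replaced by plain parameters, the helper names `pvr_is_e5500` and `mcheck_fault_addr` are hypothetical, and the PVR version values (0x8023 for e500mc, 0x8024 for e5500) are as we recall them from `arch/powerpc/include/asm/reg.h` and should be checked there. It keeps the patch's EPCR[ICM] condition and deliberately does not settle whether DEAR is a trustworthy address source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PVR version values; verify against arch/powerpc/include/asm/reg.h. */
#define PVR_VER_E500MC 0x8023u
#define PVR_VER_E5500  0x8024u

/* The upper 16 bits of the PVR identify the core version. */
static inline uint32_t pvr_ver(uint32_t pvr)
{
	return (pvr >> 16) & 0xffffu;
}

/* Hypothetical helper: decide at run time which core we are on. */
static bool pvr_is_e5500(uint32_t pvr)
{
	return pvr_ver(pvr) == PVR_VER_E5500;
}

/*
 * Hypothetical model of the address selection in the quoted
 * fsl_pci_mcheck_exception() hunk. MCARU:MCAR form a 64-bit physical
 * address. dear_phys stands in for DEAR after the kernel has translated
 * it to a physical address (vmalloc_to_pfn() in the patch).
 */
static uint64_t mcheck_fault_addr(uint32_t pvr, bool epcr_icm,
				  uint32_t mcaru, uint32_t mcar,
				  uint64_t dear_phys)
{
	uint64_t addr = ((uint64_t)mcaru << 32) | mcar;

	/* The patch used #ifdef CONFIG_E5500_CPU here; test the core instead. */
	if (pvr_is_e5500(pvr) && epcr_icm)
		addr = dear_phys;
	return addr;
}
```

With this shape, the same kernel binary would take the DEAR path only on an e5500 running in 64-bit interrupt mode, and keep using MCAR(U) everywhere else.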


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-08 Thread Hongtao Jia


> -Original Message-
> From: Wood Scott-B07421
> Sent: Thursday, October 09, 2014 7:48 AM
> To: Jia Hongtao-B38951
> Cc: Guenter Roeck; Benjamin Herrenschmidt; Paul Mackerras; Michael
> Ellerman; linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org;
> Jojy G Varghese; Guenter Roeck
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
>
> On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> >
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, September 30, 2014 2:36 AM
> > > To: Guenter Roeck
> > > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman;
> > > linuxppc-d...@lists.ozlabs.org; linux-kernel@vger.kernel.org; Jojy G
> > > Varghese; Guenter Roeck; Jia Hongtao-B38951
> > > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine
> > > check exception on E500MC / E5500
> > >
> > > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > > > From: Jojy G Varghese <jo...@juniper.net>
> > > >
> > > > For E500MC and E5500, a machine check exception in pci(e) memory
> > > > space crashes the kernel.
> > > >
> > > > Testing shows that the MCAR(U) register is zero on a MC exception
> > > > for the E5500 core. At the same time, DEAR register has been found
> > > > to have the address of the faulty load address during an MC
> > > > exception for this core.
> > > >
> > > > This fix changes the current behavior to fixup the result register
> > > > and instruction pointers in the case of a load operation on a
> > > > faulty PCI address.
> > > >
> > > > The changes are:
> > > > - Added the hook to pci machine check handling to the e500mc
> > > >   machine check exception handler.
> > > > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > > >   As mentioned above, this is necessary because the E5500 core does
> > > >   not report the fault address in the MCAR register.
> > > >
> > > > Cc: Scott Wood <scottw...@freescale.com>
> > > > Signed-off-by: Jojy G Varghese <jo...@juniper.net>
> > > > [Guenter Roeck: updated description]
> > > > Signed-off-by: Guenter Roeck <gro...@juniper.net>
> > > > Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> > > > ---
> > > >  arch/powerpc/kernel/traps.c   | 3 ++-
> > > >  arch/powerpc/sysdev/fsl_pci.c | 5 +
> > > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > > > index 0dc43f9..ecb709b 100644
> > > > --- a/arch/powerpc/kernel/traps.c
> > > > +++ b/arch/powerpc/kernel/traps.c
> > > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > > > int recoverable = 1;
> > > >
> > > > if (reason & MCSR_LD) {
> > > > -   recoverable = fsl_rio_mcheck_exception(regs);
> > > > +   recoverable = fsl_rio_mcheck_exception(regs) ||
> > > > +   fsl_pci_mcheck_exception(regs);
> > > > if (recoverable == 1)
> > > > goto silent_out;
> > > > }
> > > > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > > > index c507767..bdb956b 100644
> > > > --- a/arch/powerpc/sysdev/fsl_pci.c
> > > > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > > >  #endif
> > > > addr += mfspr(SPRN_MCAR);
> > > >
> > > > +#ifdef CONFIG_E5500_CPU
> > > > +   if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > > > +   addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > > #endif
> > >
> > > Kconfig tells you what hardware is supported, not what hardware
> > > you're actually running on.
> > >
> > > Jia Hongtao, do you know anything about this issue?  Is there an
> > > erratum?
> >
> > Sorry for the late response, I just returned from my vacation.
> > I don't know about this issue.
> >
> > > What chips are affected by the erratum covered by
> > > <http://patchwork.ozlabs.org/patch/240239/>?
> >
> > MPC8544, MPC8548, MPC8572 are affected by this erratum.
>
> What is the erratum number?

The erratum number differs from chip to chip:
MPC8544: PCIe 4
MPC8548: PCI-Ex 39
MPC8572: PCI-Ex 3

>
> > I checked P4080, which uses e500mc, and no such erratum was found.
>
> What is the erratum behavior, and how does it differ from the problem
> that Jojy and Guenter are trying to solve?

Here is the description of the erratum:

When its link goes down, the PCI Express controller clears all outstanding
transactions with an error indicator and sends a link down exception to the
interrupt controller if PEX_PME_MES_DISR[LDDD] = 0. If, however, any
transactions are sent to the controller after the link down event, they will
be accepted by the controller and wait for the link to come back up before
starting any timeout counters (e.g. completion timeout). There is no
mechanism to cancel the new transactions short of a device HRESET.

For e500mc, the behavior Jojy and Guenter describe looks like the same
erratum as on e500, but I am not 100% sure.

For e5500, I don't quite understand it yet.

 
 -Scott
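The recovery the patch relies on ("fixup the result register and instruction pointers") is what lets the CPU get past a PCI load that will not complete normally, instead of re-executing it over and over as Guenter reports in this thread. A rough user-space model of that idea: decode a D-form zero-extending integer load, fill the destination register with all-ones of the access width (the conventional value software sees for a failed PCI read), adjust RA for the update forms, and step the program counter past the 4-byte instruction. `struct fake_regs` and `fixup_load` are hypothetical stand-ins for `struct pt_regs` and the in-kernel handler, and the all-ones convention is an assumption about the handler's behavior, not something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few pt_regs fields the fixup touches. */
struct fake_regs {
	uint64_t gpr[32];
	uint64_t nip;	/* address of the faulting instruction */
};

/* D-form fields, using Power ISA big-endian bit numbering. */
static unsigned int op_primary(uint32_t inst) { return inst >> 26; }
static unsigned int op_rt(uint32_t inst) { return (inst >> 21) & 0x1f; }
static unsigned int op_ra(uint32_t inst) { return (inst >> 16) & 0x1f; }
static int16_t op_d(uint32_t inst) { return (int16_t)(inst & 0xffff); }

/*
 * Returns true if the instruction was a load we know how to fake.
 * On success, RT gets all-ones of the access width, update forms
 * add the displacement to RA, and nip moves past the instruction.
 */
static bool fixup_load(struct fake_regs *regs, uint32_t inst)
{
	unsigned int rt = op_rt(inst), ra = op_ra(inst);
	uint64_t ones;
	bool update = false;

	switch (op_primary(inst)) {
	case 32: ones = 0xffffffffu; break;                /* lwz  */
	case 33: ones = 0xffffffffu; update = true; break; /* lwzu */
	case 34: ones = 0xffu; break;                      /* lbz  */
	case 35: ones = 0xffu; update = true; break;       /* lbzu */
	case 40: ones = 0xffffu; break;                    /* lhz  */
	case 41: ones = 0xffffu; update = true; break;     /* lhzu */
	default: return false;	/* not a load this model handles */
	}

	regs->gpr[rt] = ones;
	if (update)
		regs->gpr[ra] += (uint64_t)(int64_t)op_d(inst);
	regs->nip += 4;
	return true;
}
```

A real handler would also cover the indexed forms (lwzx, lbzx, and so on) and has to fetch the instruction safely from user or kernel memory; both are left out of this sketch.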
 


RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-07 Thread Hongtao Jia


> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, October 01, 2014 8:44 AM
> To: Guenter Roeck
> Cc: Jojy Varghese; Benjamin Herrenschmidt; Paul Mackerras; Michael
> Ellerman; linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org;
> Guenter Roeck; Jia Hongtao-B38951
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
> 
> On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
> > On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> > > On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> > > >
> > > > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> > > >
> > > > >Those are errors related to PCIe hotplug, and are seen with
> > > > >unexpected PCIe device removals (triggered, for example, by
> > > > >removing power from a PCIe adapter).
> > > > >The behavior we see on E5500 is quite similar to the same
> > > > >behavior on E500:
> > > > >If unhandled, the CPU keeps executing the same instruction over
> > > > >and over again if there is an error on a PCIe access and thus
> > > > >stalls. I don't know if this is considered an erratum or expected
> > > > >behavior, but it is one we have to address since we have to be
> > > > >able to handle that condition.
> > >
> > > The reason I ask is that the handling for e500 was described as an
> > > erratum workaround.  If it is an erratum it would be nice to know
> > > the erratum number and the full list of affected chips.
> > >
> > My understanding, which may be wrong, was that this is expected
> > behavior, at least for E5500. I actually thought I had seen it
> > somewhere in the specification (response to PCIe errors), but I don't
> > recall where exactly.
> >
> > At least for my part I am not aware of an erratum.
> 
> Jia Hongtao, can you comment here?

I did not find any related erratum either.

> 
> > > > >Ultimately, we'll want to implement PCIe error handlers for the
> > > > >affected drivers, but that will be a next step.
> > >
> > > For now can we at least print a ratelimited error message?  I don't
> > > like the idea of silently ignoring these errors.  I suppose it's a
> > > separate issue from extending the workaround to cover e500mc, though.
> > >
> > I don't really like the idea of printing an error message pretty much
> > each time when an unexpected hotplug event occurs.
> 
> Unexpected events seem like the sort of thing you'd want to log, but my
> concern is that this might not be the only cause of PCI errors.
> 
> -Scott
> 



RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-10-07 Thread Hongtao Jia


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, September 30, 2014 2:36 AM
> To: Guenter Roeck
> Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; linuxppc-
> d...@lists.ozlabs.org; linux-kernel@vger.kernel.org; Jojy G Varghese;
> Guenter Roeck; Jia Hongtao-B38951
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
> 
> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > From: Jojy G Varghese 
> >
> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > crashes the kernel.
> >
> > Testing shows that the MCAR(U) register is zero on a MC exception for
> > the E5500 core. At the same time, DEAR register has been found to have the
> > address of the faulty load address during an MC exception for this core.
> >
> > This fix changes the current behavior to fixup the result register and
> > instruction pointers in the case of a load operation on a faulty PCI
> > address.
> >
> > The changes are:
> > - Added the hook to pci machine check handling to the e500mc machine check
> >   exception handler.
> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >   As mentioned above, this is necessary because the E5500 core does not
> >   report the fault address in the MCAR register.
> >
> > Cc: Scott Wood 
> > Signed-off-by: Jojy G Varghese  [Guenter Roeck:
> > updated description]
> > Signed-off-by: Guenter Roeck 
> > Signed-off-by: Guenter Roeck 
> > ---
> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >  arch/powerpc/sysdev/fsl_pci.c | 5 +
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 0dc43f9..ecb709b 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > int recoverable = 1;
> >
> > if (reason & MCSR_LD) {
> > -   recoverable = fsl_rio_mcheck_exception(regs);
> > +   recoverable = fsl_rio_mcheck_exception(regs) ||
> > +   fsl_pci_mcheck_exception(regs);
> > if (recoverable == 1)
> > goto silent_out;
> > }
> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index c507767..bdb956b 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >  #endif
> > addr += mfspr(SPRN_MCAR);
> >
> > +#ifdef CONFIG_E5500_CPU
> > +   if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > +   addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> #endif
> 
> Kconfig tells you what hardware is supported, not what hardware you're
> actually running on.
> 
> Jia Hongtao, do you know anything about this issue?  Is there an erratum?

Sorry for the late response, I just returned from my vacation.
I don't know about this issue.

> What chips are affected by the erratum covered by
> <http://patchwork.ozlabs.org/patch/240239/>?

MPC8544, MPC8548, MPC8572 are affected by this erratum.
I checked P4080, which uses e500mc, and no such erratum was found.

> 
> Can we rely on DEAR or is this just a side effect of likely having taken
> a TLB miss for the address recently?  Perhaps we should use the
> instruction emulation to determine the effective address instead.
> 
> Guenter, is this patch intended to deal with an erratum or are you
> covering up legitimate errors?
> 
> -Scott
> 
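Scott's alternative to trusting DEAR, recomputing the effective address from the faulting instruction, can be sketched the same way. For a D-form load the Power ISA defines EA = (RA|0) + EXTS(D): the RA field contributes zero when it is 0 and the register's value otherwise, plus the sign-extended 16-bit displacement. The name `dform_ea` and the plain register array are hypothetical; in the kernel the result would still have to be translated to a physical address before being compared against the PCI outbound windows.

```c
#include <stdint.h>

/*
 * Hypothetical helper: effective address of a D-form load or store.
 * gpr[] is a snapshot of the general-purpose registers at the fault.
 */
static uint64_t dform_ea(const uint64_t gpr[32], uint32_t inst)
{
	unsigned int ra = (inst >> 16) & 0x1f;
	int64_t d = (int16_t)(inst & 0xffff);	/* EXTS(D) */
	uint64_t base = ra ? gpr[ra] : 0;	/* (RA|0) */

	return base + (uint64_t)d;
}
```

Indexed (X-form) loads such as lwzx would use (RA|0) + (RB) instead, so a complete emulation path needs to decode both forms.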

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Scott Wood
On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
> On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> > On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> > > 
> > > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> > > 
> > > >Those are errors related to PCIe hotplug, and are seen with unexpected
> > > >PCIe device removals (triggered, for example, by removing power from a
> > > >PCIe adapter).
> > > >The behavior we see on E5500 is quite similar to the same behavior on
> > > >E500:
> > > >If unhandled, the CPU keeps executing the same instruction over and
> > > >over again if there is an error on a PCIe access and thus stalls. I
> > > >don't know if this is considered an erratum or expected behavior, but
> > > >it is one we have to address since we have to be able to handle that
> > > >condition.
> > 
> > The reason I ask is that the handling for e500 was described as an
> > erratum workaround.  If it is an erratum it would be nice to know the
> > erratum number and the full list of affected chips.
> > 
> My understanding, which may be wrong, was that this is expected behavior,
> at least for E5500. I actually thought I had seen it somewhere in the
> specification (response to PCIe errors), but I don't recall where exactly.
> 
> At least for my part I am not aware of an erratum.

Jia Hongtao, can you comment here?

> > > >Ultimately, we'll want to implement PCIe error handlers for the
> > > >affected drivers, but that will be a next step.
> > 
> > For now can we at least print a ratelimited error message?  I don't like
> > the idea of silently ignoring these errors.  I suppose it's a separate
> > issue from extending the workaround to cover e500mc, though.
> > 
> I don't really like the idea of printing an error message pretty much each
> time when an unexpected hotplug event occurs.

Unexpected events seem like the sort of thing you'd want to log, but my
concern is that this might not be the only cause of PCI errors.

-Scott
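Scott's request for a ratelimited message, as a middle ground between silently ignoring the error and logging every hotplug event, maps onto the kernel's interval/burst rate limiting (`struct ratelimit_state`, used by helpers such as `pr_warn_ratelimited()`). A simplified user-space model of that policy allows at most `burst` messages per `interval` and counts what was suppressed. The type and function names are hypothetical, time is passed in explicitly so the sketch is deterministic, and the kernel's own implementation differs in detail (locking, reporting of suppressed counts).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified analogue of struct ratelimit_state. */
struct rl_state {
	uint64_t interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	uint64_t begin;		/* start of the current window */
	unsigned int printed;	/* messages allowed in this window */
	unsigned int missed;	/* messages suppressed in this window */
	bool started;
};

/* Returns true if a message may be printed at time `now`. */
static bool rl_allow(struct rl_state *rs, uint64_t now)
{
	if (!rs->started || now - rs->begin >= rs->interval) {
		/* Open a new window; a real logger might report rs->missed here. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With a burst of 2 per 100 ticks, a storm of machine checks from one surprise removal would produce two log lines per window rather than one per fault, while a genuinely different PCI error later on would still be reported.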




Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Jojy Varghese


On 9/30/14 8:50 AM, "Guenter Roeck"  wrote:

>On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
>> On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
>> > 
>> > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
>> > 
>> > >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
>> > >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>> > >> > From: Jojy G Varghese 
>> > >> > 
>> > >> > For E500MC and E5500, a machine check exception in pci(e) memory
>>space
>> > >> > crashes the kernel.
>> > >> > 
>> > >> > Testing shows that the MCAR(U) register is zero on a MC
>>exception for
>> > >>the
>> > >> > E5500 core. At the same time, DEAR register has been found to
>>have the
>> > >> > address of the faulty load address during an MC exception for
>>this
>> > >>core.
>> > >> > 
>> > >> > This fix changes the current behavior to fixup the result register
>> > >> > and instruction pointers in the case of a load operation on a faulty
>> > >> > PCI address.
>> > >> > 
>> > >> > The changes are:
>> > >> > - Added the hook to pci machine check handling to the e500mc machine check
>> > >> >   exception handler.
>> > >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
>> > >> >   As mentioned above, this is necessary because the E5500 core does not
>> > >> >   report the fault address in the MCAR register.
>> > >> > 
>> > >> > Cc: Scott Wood 
>> > >> > Signed-off-by: Jojy G Varghese 
>> > >> > [Guenter Roeck: updated description]
>> > >> > Signed-off-by: Guenter Roeck 
>> > >> > Signed-off-by: Guenter Roeck 
>> > >> > ---
>> > >> >  arch/powerpc/kernel/traps.c   | 3 ++-
>> > >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +
>> > >> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> > >> > 
>> > >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> > >> > index 0dc43f9..ecb709b 100644
>> > >> > --- a/arch/powerpc/kernel/traps.c
>> > >> > +++ b/arch/powerpc/kernel/traps.c
>> > >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>> > >> >   int recoverable = 1;
>> > >> >  
>> > >> >   if (reason & MCSR_LD) {
>> > >> > - recoverable = fsl_rio_mcheck_exception(regs);
>> > >> > + recoverable = fsl_rio_mcheck_exception(regs) ||
>> > >> > + fsl_pci_mcheck_exception(regs);
>> > >> >   if (recoverable == 1)
>> > >> >   goto silent_out;
>> > >> >   }
>> > >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > index c507767..bdb956b 100644
>> > >> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>> > >> >  #endif
>> > >> >   addr += mfspr(SPRN_MCAR);
>> > >> >  
>> > >> > +#ifdef CONFIG_E5500_CPU
>> > >> > + if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>> > >> > + addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
>> > >> > +#endif
>> > >> 
>> > >> Kconfig tells you what hardware is supported, not what hardware you're
>> > >> actually running on.
>> 
>> Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
>> it is used for selecting GCC optimization settings.  You could have
>> CONFIG_GENERIC_CPU instead.
>> 
>> And the subject says "E500MC / E5500", not just "E5500". :-)
>> 
>> > >Hi Scott,
>> > >
>> > >Good point. Jojy, guess we'll have to check if the CPU is actually an
>> > >E5500.
>> > >Can you look into that ?
>> > 
>> > 
>> > "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
>> > we use a runtime method of determining the cpu type (cpu_spec's cpu_name
>> > for example)?
>> 
>> Yes, if there's a bug to be worked around, and we don't want to apply
>> the workaround unconditionally, you should use PVR to determine whether
>> you're running on an affected core.
>> 
>> > >> Can we rely on DEAR or is this just a side effect of likely having taken
>> > >> a TLB miss for the address recently?  Perhaps we should use the
>> > >> instruction emulation to determine the effective address instead.
>> > >> 
>> > >> Guenter, is this patch intended to deal with an erratum or are you
>> > >> covering up legitimate errors?
>> > >> 
>> >
>> > >Those are errors related to PCIe hotplug, and are seen with unexpected
>> > >PCIe device removals (triggered, for example, by removing power from a
>> > >PCIe adapter).
>> > >The behavior we see on E5500 is quite similar to the same behavior on
>> > >E500: If unhandled, the CPU keeps executing the same instruction over and
>> > >over again if there is an error on a PCIe access and thus stalls. I don't
>> > >know if this is considered an erratum or expected behavior, but it is one
>> > >we have to address since we have to be able to handle that condition.
>> 
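Guenter's observation that the CPU keeps re-executing the same instruction is why the patch fixes up both the result register and the instruction pointer: returning from the machine check without advancing NIP simply retries the faulting load. Below is a minimal userspace sketch of that recovery step, assuming the destination register is filled with all-ones (the value a PCI read from an absent device conventionally returns) and using a made-up PCI window. fake_regs, pci_window and fixup_failed_load() are hypothetical stand-ins, not the fsl_pci.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few pt_regs fields used here. */
struct fake_regs {
	uint64_t gpr[32];
	uint64_t nip;
};

/* Hypothetical PCI memory window; the real code checks the controller's. */
struct pci_window {
	uint64_t start;
	uint64_t size;
};

static bool in_window(const struct pci_window *w, uint64_t addr)
{
	return addr >= w->start && addr - w->start < w->size;
}

/*
 * Pretend the load returned all-ones and step past the 4-byte
 * instruction so it is not re-executed on return. 'rt' comes from a
 * 5-bit instruction field, so it is always < 32.
 */
static bool fixup_failed_load(struct fake_regs *regs, unsigned int rt,
			      uint64_t addr, const struct pci_window *w)
{
	if (!in_window(w, addr))
		return false;  /* not a PCI access: leave it unrecovered */
	regs->gpr[rt] = 0xffffffffu;
	regs->nip += 4;
	return true;
}
```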
>> The reason I ask is that the handling for e500 was 

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Jojy Varghese


On 9/30/14 1:17 PM, "Scott Wood"  wrote:

>On Tue, 2014-09-30 at 20:15 +, Jojy Varghese wrote:
>> 
>> On 9/30/14 8:50 AM, "Guenter Roeck"  wrote:
>> 
>> >On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
>> >> Which specific chip and revision did you see this on?  What is the value
>> >> in MCSR?
>> >> 
>> >Jojy can answer that, at least for P5020. We have seen it on P5040 as
>> >well, though, so it is not just limited to one chip/revision.
>> 
>> The specifics are:
>> PVR: 0x80240012
>> Instruction that causes the MC exception: lwbrx
>> The faulty load address is also present in RB. So we could change the
>> logic to use that instead of DEAR. What I don't know is if there are
>> other cases which escape the current logic.
>
>Could you find out what MCSR was when that happened?  I'm most
>interested in whether MAV was set, but the other bits would be
>interesting as well.

MCSR=a000 (Load Error Report)
>
>-Scott
>
>
Thanks
Jojy
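Scott asked whether MAV was set. Here is a small decoder for the reported value, assuming "a000" means 0x0000a000 and that the e500mc-family bit values below match the MCSR_* definitions in arch/powerpc/include/asm/reg_booke.h (worth double-checking against the header and the core reference manual). Under those assumptions the value has LD and LDG set but MAV clear, which would be consistent with MCAR reading back as zero. mcsr_decode() and struct mcsr_info are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* e500mc-family MCSR bits (assumed to match asm/reg_booke.h). */
#define MCSR_MAV 0x00080000u /* MCAR address valid */
#define MCSR_MEA 0x00040000u /* MCAR holds an effective address */
#define MCSR_LD  0x00008000u /* error on a load */
#define MCSR_ST  0x00004000u /* error on a store */
#define MCSR_LDG 0x00002000u /* error on a guarded load */

struct mcsr_info {
	bool load;
	bool store;
	bool guarded_load;
	bool mcar_valid;
	bool mcar_is_ea;
};

/* Split a raw MCSR value into the flags discussed in the thread. */
static struct mcsr_info mcsr_decode(uint32_t mcsr)
{
	struct mcsr_info info = {
		.load         = (mcsr & MCSR_LD) != 0,
		.store        = (mcsr & MCSR_ST) != 0,
		.guarded_load = (mcsr & MCSR_LDG) != 0,
		.mcar_valid   = (mcsr & MCSR_MAV) != 0,
		.mcar_is_ea   = (mcsr & MCSR_MEA) != 0,
	};
	return info;
}
```

If MAV really is clear for these events, MCAR/MCARU would not be expected to hold a meaningful address, which bears on whether DEAR, or the instruction itself, should be the source of the address instead.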



Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Scott Wood
On Tue, 2014-09-30 at 20:15 +, Jojy Varghese wrote:
> 
> On 9/30/14 8:50 AM, "Guenter Roeck"  wrote:
> 
> >On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> >> Which specific chip and revision did you see this on?  What is the value
> >> in MCSR?
> >> 
> >Jojy can answer that, at least for P5020. We have seen it on P5040 as
> >well, though, so it is not just limited to one chip/revision.
> 
> The specifics are:
> PVR: 0x80240012
> Instruction that causes the MC exception: lwbrx
> The faulty load address is also present in RB. So we could change the
> logic to use that instead of DEAR. What I don't know is if there are
> other cases which escape the current logic.

Could you find out what MCSR was when that happened?  I'm most
interested in whether MAV was set, but the other bits would be
interesting as well.

-Scott
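Following Scott's earlier point about checking the PVR at run time instead of relying on CONFIG_E5500_CPU, here is a sketch of how the PVR Jojy reported (0x80240012) breaks down. The version/revision split mirrors the PVR_VER()/PVR_REV() macros in arch/powerpc/include/asm/reg.h, and 0x8024 is the e5500 version ID (PVR_VER_E5500) there. pvr_is_e5500() is a hypothetical helper; in the kernel its input would come from mfspr(SPRN_PVR).

```c
#include <stdbool.h>
#include <stdint.h>

#define PVR_VER_E5500 0x8024 /* e5500 core version ID */

/* Upper 16 bits: core version; lower 16 bits: revision. */
static uint16_t pvr_ver(uint32_t pvr) { return (uint16_t)(pvr >> 16); }
static uint16_t pvr_rev(uint32_t pvr) { return (uint16_t)(pvr & 0xffff); }

/* Hypothetical run-time check, independent of the CONFIG_*_CPU choice. */
static bool pvr_is_e5500(uint32_t pvr)
{
	return pvr_ver(pvr) == PVR_VER_E5500;
}
```

Whether e500mc cores need the same treatment is still the open question raised about the patch subject, so the sketch matches e5500 only.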




Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Guenter Roeck
On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> > 
> > On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> > 
> > >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> > >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > >> > From: Jojy G Varghese 
> > >> > 
> > >> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > >> > crashes the kernel.
> > >> > 
> > >> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> > >> > E5500 core. At the same time, DEAR register has been found to have the
> > >> > address of the faulty load address during an MC exception for this core.
> > >> > 
> > >> > This fix changes the current behavior to fixup the result register
> > >> > and instruction pointers in the case of a load operation on a faulty
> > >> > PCI address.
> > >> > 
> > >> > The changes are:
> > >> > - Added the hook to pci machine check handling to the e500mc machine check
> > >> >   exception handler.
> > >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > >> >   As mentioned above, this is necessary because the E5500 core does not
> > >> >   report the fault address in the MCAR register.
> > >> > 
> > >> > Cc: Scott Wood 
> > >> > Signed-off-by: Jojy G Varghese 
> > >> > [Guenter Roeck: updated description]
> > >> > Signed-off-by: Guenter Roeck 
> > >> > Signed-off-by: Guenter Roeck 
> > >> > ---
> > >> >  arch/powerpc/kernel/traps.c   | 3 ++-
> > >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +
> > >> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >> > 
> > >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > >> > index 0dc43f9..ecb709b 100644
> > >> > --- a/arch/powerpc/kernel/traps.c
> > >> > +++ b/arch/powerpc/kernel/traps.c
> > >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > >> >int recoverable = 1;
> > >> >  
> > >> >if (reason & MCSR_LD) {
> > >> > -  recoverable = fsl_rio_mcheck_exception(regs);
> > >> > +  recoverable = fsl_rio_mcheck_exception(regs) ||
> > >> > +  fsl_pci_mcheck_exception(regs);
> > >> >if (recoverable == 1)
> > >> >goto silent_out;
> > >> >}
> > >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > >> > index c507767..bdb956b 100644
> > >> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > >> >  #endif
> > >> >addr += mfspr(SPRN_MCAR);
> > >> >  
> > >> > +#ifdef CONFIG_E5500_CPU
> > >> > +  if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > >> > +  addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > >> > +#endif
> > >> 
> > >> Kconfig tells you what hardware is supported, not what hardware you're
> > >> actually running on.
> 
> Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
> it is used for selecting GCC optimization settings.  You could have
> CONFIG_GENERIC_CPU instead.
> 
> And the subject says "E500MC / E5500", not just "E5500". :-)
> 
> > >Hi Scott,
> > >
> > >Good point. Jojy, guess we'll have to check if the CPU is actually an
> > >E5500.
> > >Can you look into that ?
> > 
> > 
> > "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
> > we use a runtime method of determining the cpu type (cpu_spec's cpu_name
> > for example)?
> 
> Yes, if there's a bug to be worked around, and we don't want to apply
> the workaround unconditionally, you should use PVR to determine whether
> you're running on an affected core.
> 
> > >> Can we rely on DEAR or is this just a side effect of likely having taken
> > >> a TLB miss for the address recently?  Perhaps we should use the
> > >> instruction emulation to determine the effective address instead.
> > >> 
> > >> Guenter, is this patch intended to deal with an erratum or are you
> > >> covering up legitimate errors?
> > >> 
> >
> > >Those are errors related to PCIe hotplug, and are seen with unexpected
> > >PCIe device removals (triggered, for example, by removing power from a
> > >PCIe adapter).
> > >The behavior we see on E5500 is quite similar to the same behavior on
> > >E500: If unhandled, the CPU keeps executing the same instruction over and
> > >over again if there is an error on a PCIe access and thus stalls. I don't
> > >know if this is considered an erratum or expected behavior, but it is one
> > >we have to address since we have to be able to handle that condition.
> 
> The reason I ask is that the handling for e500 was described as an
> erratum workaround.  If it is an erratum it would be nice to know the
> erratum number and the full list of affected chips.
> 
My understanding, which may be wrong, was that 

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Guenter Roeck
On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
 On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
  
  On 9/29/14 12:06 PM, Guenter Roeck li...@roeck-us.net wrote:
  
  On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
   On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
From: Jojy G Varghese jo...@juniper.net

For E500MC and E5500, a machine check exception in pci(e) memory space
crashes the kernel.

Testing shows that the MCAR(U) register is zero on a MC exception for the
E5500 core. At the same time, DEAR register has been found to have the
address of the faulty load address during an MC exception for this core.

This fix changes the current behavior to fixup the result register
and instruction pointers in the case of a load operation on a faulty
PCI address.

The changes are:
- Added the hook to pci machine check handling to the e500mc machine check
  exception handler.
- For the E5500 core, load faulting address from SPRN_DEAR register.
  As mentioned above, this is necessary because the E5500 core does not
  report the fault address in the MCAR register.

Cc: Scott Wood scottw...@freescale.com
Signed-off-by: Jojy G Varghese jo...@juniper.net
[Guenter Roeck: updated description]
Signed-off-by: Guenter Roeck gro...@juniper.net
Signed-off-by: Guenter Roeck li...@roeck-us.net
---
 arch/powerpc/kernel/traps.c   | 3 ++-
 arch/powerpc/sysdev/fsl_pci.c | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0dc43f9..ecb709b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
   int recoverable = 1;
 
   if (reason & MCSR_LD) {
-  recoverable = fsl_rio_mcheck_exception(regs);
+  recoverable = fsl_rio_mcheck_exception(regs) ||
+  fsl_pci_mcheck_exception(regs);
   if (recoverable == 1)
   goto silent_out;
   }
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index c507767..bdb956b 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
 #endif
   addr += mfspr(SPRN_MCAR);
 
+#ifdef CONFIG_E5500_CPU
+  if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
+  addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
+#endif
   
   Kconfig tells you what hardware is supported, not what hardware you're
   actually running on.
 
 Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
 it is used for selecting GCC optimization settings.  You could have
 CONFIG_GENERIC_CPU instead.
 
 And the subject says "E500MC / E5500", not just "E5500". :-)
 
  Hi Scott,
  
  Good point. Jojy, guess we'll have to check if the CPU is actually an
  E5500.
  Can you look into that ?
  
  
  "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
  we use a runtime method of determining the cpu type (cpu_spec's cpu_name
  for example)?
 
 Yes, if there's a bug to be worked around, and we don't want to apply
 the workaround unconditionally, you should use PVR to determine whether
 you're running on an affected core.
 
   Can we rely on DEAR or is this just a side effect of likely having taken
   a TLB miss for the address recently?  Perhaps we should use the
   instruction emulation to determine the effective address instead.
   
   Guenter, is this patch intended to deal with an erratum or are you
   covering up legitimate errors?
   
 
  Those are errors related to PCIe hotplug, and are seen with unexpected
  PCIe device removals (triggered, for example, by removing power from a
  PCIe adapter).
  The behavior we see on E5500 is quite similar to the same behavior on
  E500: If unhandled, the CPU keeps executing the same instruction over and
  over again if there is an error on a PCIe access and thus stalls. I don't
  know if this is considered an erratum or expected behavior, but it is one
  we have to address since we have to be able to handle that condition.
 
 The reason I ask is that the handling for e500 was described as an
 erratum workaround.  If it is an erratum it would be nice to know the
 erratum number and the full list of affected chips.
 
My understanding, which may be wrong, was that this is expected behavior,
at least for E5500. I actually thought I had seen it somewhere in the
specification (response to PCIe errors), but I don't recall where exactly.

At least for my part I am not aware of an erratum.

  Ultimately, we'll want to implement PCIe error handlers for the
  affected drivers, but that will be a next step.
 
 

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-30 Thread Scott Wood
On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
 On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
  On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
   
   On 9/29/14 12:06 PM, Guenter Roeck li...@roeck-us.net wrote:
   
   Those are errors related to PCIe hotplug, and are seen with unexpected
   PCIe device removals (triggered, for example, by removing power from a
   PCIe adapter).
   The behavior we see on E5500 is quite similar to the same behavior on
   E500: If unhandled, the CPU keeps executing the same instruction over and
   over again if there is an error on a PCIe access and thus stalls. I don't
   know if this is considered an erratum or expected behavior, but it is one
   we have to address since we have to be able to handle that condition.
  
  The reason I ask is that the handling for e500 was described as an
  erratum workaround.  If it is an erratum it would be nice to know the
  erratum number and the full list of affected chips.
  
 My understanding, which may be wrong, was that this is expected behavior,
 at least for E5500. I actually thought I had seen it somewhere in the
 specification (response to PCIe errors), but I don't recall where exactly.
 
 At least for my part I am not aware of an erratum.

Jia Hongtao, can you comment here?

   Ultimately, we'll want to implement PCIe error handlers for the
   affected drivers, but that will be a next step.
  
  For now can we at least print a ratelimited error message?  I don't like
  the idea of silently ignoring these errors.  I suppose it's a separate
  issue from extending the workaround to cover e500mc, though.
  
 I don't really like the idea of printing an error message pretty much each
 time an unexpected hotplug event occurs.

Unexpected events seem like the sort of thing you'd want to log, but my
concern is that this might not be the only cause of PCI errors.

-Scott




Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Jojy Varghese


On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:

>On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
>> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>> > From: Jojy G Varghese 
>> > 
>> > For E500MC and E5500, a machine check exception in pci(e) memory space
>> > crashes the kernel.
>> > 
>> > Testing shows that the MCAR(U) register is zero on a MC exception for the
>> > E5500 core. At the same time, DEAR register has been found to have the
>> > address of the faulty load address during an MC exception for this core.
>> > 
>> > This fix changes the current behavior to fixup the result register
>> > and instruction pointers in the case of a load operation on a faulty
>> > PCI address.
>> > 
>> > The changes are:
>> > - Added the hook to pci machine check handling to the e500mc machine check
>> >   exception handler.
>> > - For the E5500 core, load faulting address from SPRN_DEAR register.
>> >   As mentioned above, this is necessary because the E5500 core does not
>> >   report the fault address in the MCAR register.
>> > 
>> > Cc: Scott Wood 
>> > Signed-off-by: Jojy G Varghese 
>> > [Guenter Roeck: updated description]
>> > Signed-off-by: Guenter Roeck 
>> > Signed-off-by: Guenter Roeck 
>> > ---
>> >  arch/powerpc/kernel/traps.c   | 3 ++-
>> >  arch/powerpc/sysdev/fsl_pci.c | 5 +
>> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> > index 0dc43f9..ecb709b 100644
>> > --- a/arch/powerpc/kernel/traps.c
>> > +++ b/arch/powerpc/kernel/traps.c
>> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>> >int recoverable = 1;
>> >  
>> >if (reason & MCSR_LD) {
>> > -  recoverable = fsl_rio_mcheck_exception(regs);
>> > +  recoverable = fsl_rio_mcheck_exception(regs) ||
>> > +  fsl_pci_mcheck_exception(regs);
>> >if (recoverable == 1)
>> >goto silent_out;
>> >}
>> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>> > index c507767..bdb956b 100644
>> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>> >  #endif
>> >addr += mfspr(SPRN_MCAR);
>> >  
>> > +#ifdef CONFIG_E5500_CPU
>> > +  if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>> > +  addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
>> > +#endif
>> 
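The traps.c hunk above chains the two handlers with ||. Because || short-circuits and always yields exactly 0 or 1, fsl_pci_mcheck_exception() is consulted only when fsl_rio_mcheck_exception() returns 0, and the existing "recoverable == 1" test keeps working even if a handler returns some other nonzero value. A toy model, with hypothetical stand-in handlers that count their calls:

```c
/*
 * Hypothetical stand-ins for the RIO and PCI handlers: each returns
 * nonzero if it claims (recovers) the fault and counts its calls.
 */
static int rio_calls, pci_calls;

static int fake_rio_handler(int rio_claims)
{
	rio_calls++;
	return rio_claims;
}

static int fake_pci_handler(int pci_claims)
{
	pci_calls++;
	return pci_claims;
}

/* Same shape as the patched code: PCI runs only if RIO declined. */
static int dispatch(int rio_claims, int pci_claims)
{
	return fake_rio_handler(rio_claims) || fake_pci_handler(pci_claims);
}
```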
>> Kconfig tells you what hardware is supported, not what hardware you're
>> actually running on.
>> 
>Hi Scott,
>
>Good point. Jojy, guess we'll have to check if the CPU is actually an
>E5500.
>Can you look into that ?


"/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
we use a runtime method of determining the cpu type (cpu_spec's cpu_name
for example)?


>
>> Jia Hongtao, do you know anything about this issue?  Is there an
>> erratum?  What chips are affected by the erratum covered by
>> ?
>> 
>We already have and use the above patch(es) in our kernel. It works fine
>for E500 (P2020), but does not address E5500 (P5020/P5040).
>
>> Can we rely on DEAR or is this just a side effect of likely having taken
>> a TLB miss for the address recently?  Perhaps we should use the
>> instruction emulation to determine the effective address instead.
>> 
>> Guenter, is this patch intended to deal with an erratum or are you
>> covering up legitimate errors?
>> 
>Those are errors related to PCIe hotplug, and are seen with unexpected
>PCIe device removals (triggered, for example, by removing power from a
>PCIe adapter).
>The behavior we see on E5500 is quite similar to the same behavior on
>E500: If unhandled, the CPU keeps executing the same instruction over and
>over again if there is an error on a PCIe access and thus stalls. I don't
>know if this is considered an erratum or expected behavior, but it is one
>we have to address since we have to be able to handle that condition.
>Ultimately, we'll want to implement PCIe error handlers for the affected
>drivers, but that will be a next step.

According to the spec, MCAR is supposed to hold the faulty data address,
but for the E5500 core we found that MCAR is zero. You are right that the
DEAR entry could be the result of a TLB miss, but that's the register we
could rely on.
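On the DEAR path in the patch: PFN_PHYS(vmalloc_to_pfn(ea)) yields the physical address of the start of the page containing ea, not the exact faulting address, because the offset within the page is dropped. That is likely harmless for a page-granular "is this inside a PCI window" check, but worth knowing if the exact address is ever needed. A userspace illustration, assuming 4 KiB pages; pg_phys_base() and pg_phys_exact() are hypothetical helpers, and in the kernel the PFN would come from vmalloc_to_pfn().

```c
#include <stdint.h>

#define PG_SHIFT 12                               /* assumption: 4 KiB pages */
#define PG_MASK  ((UINT64_C(1) << PG_SHIFT) - 1)  /* offset-within-page bits */

/* What PFN_PHYS(pfn) computes: the physical address of the page start. */
static uint64_t pg_phys_base(uint64_t pfn)
{
	return pfn << PG_SHIFT;
}

/* Page start plus the offset of the effective address within its page. */
static uint64_t pg_phys_exact(uint64_t pfn, uint64_t ea)
{
	return pg_phys_base(pfn) | (ea & PG_MASK);
}
```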

What do you mean by "instruction emulation"? Are you suggesting that we
examine the RD, RS registers for the instruction?
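One way to read Scott's "instruction emulation" suggestion (an interpretation, not something the thread spells out) is to fetch the instruction at NIP, decode it, and recompute its effective address from the saved GPRs instead of trusting DEAR. For the lwbrx Jojy reported, an X-form load with primary opcode 31 and extended opcode 534, the effective address is (RA|0) + RB, which fits the observation that the faulting address shows up in RB when RA is 0. A userspace sketch that handles lwbrx only; all helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_X_FORM  31   /* primary opcode shared by X-form loads */
#define XO_LWBRX  534   /* extended opcode for lwbrx */

/* Field extractors; PowerPC numbers instruction bits 0..31 from the MSB. */
static unsigned int insn_op(uint32_t insn) { return insn >> 26; }
static unsigned int insn_rt(uint32_t insn) { return (insn >> 21) & 0x1f; }
static unsigned int insn_ra(uint32_t insn) { return (insn >> 16) & 0x1f; }
static unsigned int insn_rb(uint32_t insn) { return (insn >> 11) & 0x1f; }
static unsigned int insn_xo(uint32_t insn) { return (insn >> 1) & 0x3ff; }

/*
 * If 'insn' is lwbrx, store its effective address and target register.
 * EA = (RA|0) + RB: a zero RA field means the value 0, not r0.
 * A real emulator would also have to cover the other load forms.
 */
static bool lwbrx_ea(uint32_t insn, const uint64_t gpr[32],
		     uint64_t *ea, unsigned int *rt)
{
	unsigned int ra;

	if (insn_op(insn) != OP_X_FORM || insn_xo(insn) != XO_LWBRX)
		return false;
	ra = insn_ra(insn);
	*ea = (ra ? gpr[ra] : 0) + gpr[insn_rb(insn)];
	*rt = insn_rt(insn);
	return true;
}
```

For example, 0x7d201c2c encodes lwbrx r9,0,r3, so its effective address is simply the value in r3.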



>
>Please let me know if you have a better solution to address this problem.
>
>Thanks,
>Guenter


Thanks
Jojy



Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Scott Wood
On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
> 
> On 9/29/14 12:06 PM, "Guenter Roeck"  wrote:
> 
> >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> >> > From: Jojy G Varghese 
> >> > 
> >> > For E500MC and E5500, a machine check exception in pci(e) memory space
> >> > crashes the kernel.
> >> > 
> >> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> >> > E5500 core. At the same time, DEAR register has been found to have the
> >> > address of the faulty load address during an MC exception for this core.
> >> > 
> >> > This fix changes the current behavior to fixup the result register
> >> > and instruction pointers in the case of a load operation on a faulty
> >> > PCI address.
> >> > 
> >> > The changes are:
> >> > - Added the hook to pci machine check handling to the e500mc machine check
> >> >   exception handler.
> >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >> >   As mentioned above, this is necessary because the E5500 core does not
> >> >   report the fault address in the MCAR register.
> >> > 
> >> > Cc: Scott Wood 
> >> > Signed-off-by: Jojy G Varghese 
> >> > [Guenter Roeck: updated description]
> >> > Signed-off-by: Guenter Roeck 
> >> > Signed-off-by: Guenter Roeck 
> >> > ---
> >> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +
> >> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >> > index 0dc43f9..ecb709b 100644
> >> > --- a/arch/powerpc/kernel/traps.c
> >> > +++ b/arch/powerpc/kernel/traps.c
> >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> >> >  int recoverable = 1;
> >> >  
> >> >  if (reason & MCSR_LD) {
> >> > -recoverable = fsl_rio_mcheck_exception(regs);
> >> > +recoverable = fsl_rio_mcheck_exception(regs) ||
> >> > +fsl_pci_mcheck_exception(regs);
> >> >  if (recoverable == 1)
> >> >  goto silent_out;
> >> >  }
> >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c
> >>b/arch/powerpc/sysdev/fsl_pci.c
> >> > index c507767..bdb956b 100644
> >> > --- a/arch/powerpc/sysdev/fsl_pci.c
> >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs
> >>*regs)
> >> >  #endif
> >> >  addr += mfspr(SPRN_MCAR);
> >> >  
> >> > +#ifdef CONFIG_E5500_CPU
> >> > +if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> >> > +addr = PFN_PHYS(vmalloc_to_pfn((void 
> >> > *)mfspr(SPRN_DEAR)));
> >> > +#endif
> >> 
> >> Kconfig tells you what hardware is supported, not what hardware you're
> >> actually running on.

Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
it is used for selecting GCC optimization settings.  You could have
CONFIG_GENERIC_CPU instead.

And the subject says "E500MC / E5500", not just "E5500". :-)

> >Hi Scott,
> >
> >Good point. Jojy, guess we'll have to check if the CPU is actually an
> >E5500.
> >Can you look into that ?
> 
> 
> "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
> we use a runtime method of determining the cpu type (cpu_spec's cpu_name
> for
> example).  

Yes, if there's a bug to be worked around, and we don't want to apply
the workaround unconditionally, you should use PVR to determine whether
you're running on an affected core.

> >> Can we rely on DEAR or is this just a side effect of likely having taken
> >> a TLB miss for the address recently?  Perhaps we should use the
> >> instruction emulation to determine the effective address instead.
> >> 
> >> Guenter, is this patch intended to deal with an erratum or are you
> >> covering up legitimate errors?
> >> 
>
> >Those are errors related to PCIe hotplug, and are seen with unexpected
> >PCIe
> >device removals (triggered, for example, by removing power from a PCIe
> >adapter).
> >The behavior we see on E5500 is quite similar to the same behavior on
> >E500:
> >If unhandled, the CPU keeps executing the same instruction over and over
> >again
> >if there is an error on a PCIe access and thus stalls. I don't know if
> >this
> >is considered an erratum or expected behavior, but it is one we have to
> >address
> >since we have to be able to handle that condition. 

The reason I ask is that the handling for e500 was described as an
erratum workaround.  If it is an erratum it would be nice to know the
erratum number and the full list of affected chips.

> >Ultimately, we'll want
> >to
> >implement PCIe error handlers for the affected drivers, but that will be
> >a next
> >step.

For now can we at least print a ratelimited error message?  I don't like
the idea of silently ignoring these errors.  I suppose it's a separate
issue from extending the workaround to cover e500mc, 

Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Scott Wood
On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> From: Jojy G Varghese <jo...@juniper.net>
> 
> For E500MC and E5500, a machine check exception in pci(e) memory space
> crashes the kernel.
> 
> Testing shows that the MCAR(U) register is zero on a MC exception for the
> E5500 core. At the same time, DEAR register has been found to have the
> address of the faulty load address during an MC exception for this core.
> 
> This fix changes the current behavior to fixup the result register
> and instruction pointers in the case of a load operation on a faulty
> PCI address.
> 
> The changes are:
> - Added the hook to pci machine check handling to the e500mc machine check
>   exception handler.
> - For the E5500 core, load faulting address from SPRN_DEAR register.
>   As mentioned above, this is necessary because the E5500 core does not
>   report the fault address in the MCAR register.
> 
> Cc: Scott Wood <scottw...@freescale.com>
> Signed-off-by: Jojy G Varghese <jo...@juniper.net>
> [Guenter Roeck: updated description]
> Signed-off-by: Guenter Roeck <gro...@juniper.net>
> Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> ---
>  arch/powerpc/kernel/traps.c   | 3 ++-
>  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 0dc43f9..ecb709b 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>  	int recoverable = 1;
>  
>  	if (reason & MCSR_LD) {
> -		recoverable = fsl_rio_mcheck_exception(regs);
> +		recoverable = fsl_rio_mcheck_exception(regs) ||
> +			fsl_pci_mcheck_exception(regs);
>  		if (recoverable == 1)
>  			goto silent_out;
>  	}
> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> index c507767..bdb956b 100644
> --- a/arch/powerpc/sysdev/fsl_pci.c
> +++ b/arch/powerpc/sysdev/fsl_pci.c
> @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>  #endif
>  	addr += mfspr(SPRN_MCAR);
>  
> +#ifdef CONFIG_E5500_CPU
> +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> +#endif

Kconfig tells you what hardware is supported, not what hardware you're
actually running on.

Jia Hongtao, do you know anything about this issue?  Is there an
erratum?  What chips are affected by the erratum covered by
http://patchwork.ozlabs.org/patch/240239/?

Can we rely on DEAR or is this just a side effect of likely having taken
a TLB miss for the address recently?  Perhaps we should use the
instruction emulation to determine the effective address instead.

Guenter, is this patch intended to deal with an erratum or are you
covering up legitimate errors?

-Scott


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Guenter Roeck
On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > From: Jojy G Varghese <jo...@juniper.net>
> > 
> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > crashes the kernel.
> > 
> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> > E5500 core. At the same time, DEAR register has been found to have the
> > address of the faulty load address during an MC exception for this core.
> > 
> > This fix changes the current behavior to fixup the result register
> > and instruction pointers in the case of a load operation on a faulty
> > PCI address.
> > 
> > The changes are:
> > - Added the hook to pci machine check handling to the e500mc machine check
> >   exception handler.
> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >   As mentioned above, this is necessary because the E5500 core does not
> >   report the fault address in the MCAR register.
> > 
> > Cc: Scott Wood <scottw...@freescale.com>
> > Signed-off-by: Jojy G Varghese <jo...@juniper.net>
> > [Guenter Roeck: updated description]
> > Signed-off-by: Guenter Roeck <gro...@juniper.net>
> > Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> > ---
> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 0dc43f9..ecb709b 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> >  	int recoverable = 1;
> >  
> >  	if (reason & MCSR_LD) {
> > -		recoverable = fsl_rio_mcheck_exception(regs);
> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > +			fsl_pci_mcheck_exception(regs);
> >  		if (recoverable == 1)
> >  			goto silent_out;
> >  	}
> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index c507767..bdb956b 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >  #endif
> >  	addr += mfspr(SPRN_MCAR);
> >  
> > +#ifdef CONFIG_E5500_CPU
> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > +#endif
> 
> Kconfig tells you what hardware is supported, not what hardware you're
> actually running on.
> 
Hi Scott,

Good point. Jojy, guess we'll have to check if the CPU is actually an E5500.
Can you look into that ?

> Jia Hongtao, do you know anything about this issue?  Is there an
> erratum?  What chips are affected by the erratum covered by
> http://patchwork.ozlabs.org/patch/240239/?
> 
We already have and use the above patch(es) in our kernel. It works fine
for E500 (P2020), but does not address E5500 (P5020/P5040).

> Can we rely on DEAR or is this just a side effect of likely having taken
> a TLB miss for the address recently?  Perhaps we should use the
> instruction emulation to determine the effective address instead.
> 
> Guenter, is this patch intended to deal with an erratum or are you
> covering up legitimate errors?
> 
Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
device removals (triggered, for example, by removing power from a PCIe adapter).
The behavior we see on E5500 is quite similar to the same behavior on E500:
If unhandled, the CPU keeps executing the same instruction over and over again
if there is an error on a PCIe access and thus stalls. I don't know if this
is considered an erratum or expected behavior, but it is one we have to address
since we have to be able to handle that condition. Ultimately, we'll want to
implement PCIe error handlers for the affected drivers, but that will be a next
step.
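Since a machine check on these loads returns to the same instruction, recovery amounts to completing the load in software and stepping past it, which is what "fixup the result register and instruction pointers" in the patch description refers to. The userspace model below assumes the faulted load should read as all ones (what a PCI read with no responding device typically returns) and uses a hypothetical `struct fake_regs` in place of `struct pt_regs`. It handles only a few integer load forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct pt_regs fields used here. */
struct fake_regs {
	uint64_t gpr[32];
	uint64_t nip;
};

/* Access width in bytes for the loads modelled here, or 0 if unhandled. */
static unsigned load_width(uint32_t insn)
{
	switch (insn >> 26) {
	case 32: case 33: return 4;		/* lwz, lwzu */
	case 34: case 35: return 1;		/* lbz, lbzu */
	case 40: case 41: return 2;		/* lhz, lhzu */
	case 31:
		switch ((insn >> 1) & 0x3ff) {
		case 23:  return 4;		/* lwzx */
		case 87:  return 1;		/* lbzx */
		case 279: return 2;		/* lhzx */
		}
		break;
	}
	return 0;
}

/*
 * Complete a faulted load as if the device returned all ones, then step
 * past it so the machine check does not return to the same instruction.
 * Returns false (leave the error fatal) for instructions not handled.
 */
static bool fixup_faulted_load(struct fake_regs *regs, uint32_t insn)
{
	unsigned width = load_width(insn);
	unsigned op = insn >> 26;
	unsigned rt = (insn >> 21) & 0x1f;
	unsigned ra = (insn >> 16) & 0x1f;

	assert(regs);
	if (width == 0)
		return false;

	regs->gpr[rt] = (1ull << (width * 8)) - 1;
	/* D-form update loads (lwzu, lbzu, lhzu) also write the EA back to RA. */
	if (op == 33 || op == 35 || op == 41)
		regs->gpr[ra] += (uint64_t)(int64_t)(int16_t)(insn & 0xffff);
	regs->nip += 4;
	return true;
}
```

Anything the decoder does not recognise is left fatal, which avoids silently skipping an instruction whose side effects were not reproduced.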

Please let me know if you have a better solution to address this problem.

Thanks,
Guenter


Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

2014-09-29 Thread Scott Wood
On Mon, 2014-09-29 at 23:03 +, Jojy Varghese wrote:
 
> On 9/29/14 12:06 PM, "Guenter Roeck" <li...@roeck-us.net> wrote:
> 
> >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> >> > From: Jojy G Varghese <jo...@juniper.net>
> >> > 
> >> > For E500MC and E5500, a machine check exception in pci(e) memory space
> >> > crashes the kernel.
> >> > 
> >> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> >> > E5500 core. At the same time, DEAR register has been found to have the
> >> > address of the faulty load address during an MC exception for this core.
> >> > 
> >> > This fix changes the current behavior to fixup the result register
> >> > and instruction pointers in the case of a load operation on a faulty
> >> > PCI address.
> >> > 
> >> > The changes are:
> >> > - Added the hook to pci machine check handling to the e500mc machine check
> >> >   exception handler.
> >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >> >   As mentioned above, this is necessary because the E5500 core does not
> >> >   report the fault address in the MCAR register.
> >> > 
> >> > Cc: Scott Wood <scottw...@freescale.com>
> >> > Signed-off-by: Jojy G Varghese <jo...@juniper.net>
> >> > [Guenter Roeck: updated description]
> >> > Signed-off-by: Guenter Roeck <gro...@juniper.net>
> >> > Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> >> > ---
> >> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> >> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >> > index 0dc43f9..ecb709b 100644
> >> > --- a/arch/powerpc/kernel/traps.c
> >> > +++ b/arch/powerpc/kernel/traps.c
> >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> >> >  	int recoverable = 1;
> >> >  
> >> >  	if (reason & MCSR_LD) {
> >> > -		recoverable = fsl_rio_mcheck_exception(regs);
> >> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> >> > +			fsl_pci_mcheck_exception(regs);
> >> >  		if (recoverable == 1)
> >> >  			goto silent_out;
> >> >  	}
> >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> >> > index c507767..bdb956b 100644
> >> > --- a/arch/powerpc/sysdev/fsl_pci.c
> >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >> >  #endif
> >> >  	addr += mfspr(SPRN_MCAR);
> >> >  
> >> > +#ifdef CONFIG_E5500_CPU
> >> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> >> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> >> > +#endif
> >> 
> >> Kconfig tells you what hardware is supported, not what hardware you're
> >> actually running on.

Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
it is used for selecting GCC optimization settings.  You could have
CONFIG_GENERIC_CPU instead.

And the subject says "E500MC / E5500", not just "E5500". :-)

> >Hi Scott,
> >
> >Good point. Jojy, guess we'll have to check if the CPU is actually an E5500.
> >Can you look into that ?
> 
> 
> "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
> we use a runtime method of determining the cpu type (cpu_spec's cpu_name,
> for example)?

Yes, if there's a bug to be worked around, and we don't want to apply
the workaround unconditionally, you should use PVR to determine whether
you're running on an affected core.
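A runtime check keyed on the PVR version field, rather than on CONFIG_E5500_CPU, could look like the sketch below. The PVR_VER_E5500 value (0x8024) reflects our reading of the kernel's asm/reg.h and should be checked against the header. `core_needs_dear_fallback()` is a hypothetical name; in the kernel the argument would come from mfspr(SPRN_PVR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Version field: upper 16 bits of PVR, like the kernel's PVR_VER(). */
#define PVR_VER(pvr) (((pvr) >> 16) & 0xFFFF)

/* Assumed value from the kernel's asm/reg.h; verify against the header. */
#define PVR_VER_E5500 0x8024

static_assert(PVR_VER(0x80240010u) == PVR_VER_E5500,
	      "PVR_VER extracts the upper half");

/*
 * Hypothetical policy: take the DEAR fallback only on cores confirmed to
 * leave MCAR at zero, which in this thread so far means e5500 only.
 */
static bool core_needs_dear_fallback(uint32_t pvr)
{
	switch (PVR_VER(pvr)) {
	case PVR_VER_E5500:
		return true;
	default:
		return false;
	}
}
```

Keying on PVR also works when the kernel was built with CONFIG_GENERIC_CPU, since it does not depend on which CPU optimisation option was selected.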

> >> Can we rely on DEAR or is this just a side effect of likely having taken
> >> a TLB miss for the address recently?  Perhaps we should use the
> >> instruction emulation to determine the effective address instead.
> >> 
> >> Guenter, is this patch intended to deal with an erratum or are you
> >> covering up legitimate errors?
> >> 
>
> >Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
> >device removals (triggered, for example, by removing power from a PCIe
> >adapter).
> >The behavior we see on E5500 is quite similar to the same behavior on E500:
> >If unhandled, the CPU keeps executing the same instruction over and over
> >again if there is an error on a PCIe access and thus stalls. I don't know
> >if this is considered an erratum or expected behavior, but it is one we
> >have to address since we have to be able to handle that condition. 

The reason I ask is that the handling for e500 was described as an
erratum workaround.  If it is an erratum it would be nice to know the
erratum number and the full list of affected chips.

> >Ultimately, we'll want to implement PCIe error handlers for the affected
> >drivers, but that will be a next step.

For now can we at least print a ratelimited error message?  I don't like
the idea of silently ignoring these errors.  I suppose it's a separate
issue from extending the workaround to cover e500mc, though.
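A ratelimited message of the kind requested above bounds how many reports are printed per time window. The sketch below models the usual interval/burst scheme in userspace. `struct ratelimit`, `ratelimit_ok()` and `report_pci_mcheck()` are hypothetical names, and time is passed in as an abstract tick count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Userspace model of an interval/burst limiter, similar in spirit to the
 * kernel's struct ratelimit_state. "now" is in arbitrary ticks.
 */
struct ratelimit {
	uint64_t interval;	/* length of one window */
	unsigned burst;		/* messages allowed per window */
	uint64_t begin;		/* start of the current window */
	unsigned printed;	/* messages emitted in this window */
	unsigned missed;	/* messages suppressed in this window */
	bool started;
};

static bool ratelimit_ok(struct ratelimit *rs, uint64_t now)
{
	assert(rs);
	if (!rs->started || now - rs->begin >= rs->interval) {
		if (rs->missed)
			fprintf(stderr, "%u message(s) suppressed\n", rs->missed);
		rs->started = true;
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Hypothetical report for a recovered PCI(e) load machine check. */
static void report_pci_mcheck(struct ratelimit *rs, uint64_t now, uint64_t addr)
{
	if (ratelimit_ok(rs, now))
		fprintf(stderr, "PCI(e) machine check on load from 0x%llx, faking result\n",
			(unsigned long long)addr);
}
```

In the kernel itself, pr_err_ratelimited() already provides this behavior, so a real handler would more likely call it than carry its own limiter.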

> According to the spec, MCAR is supposed to hold the faulty data address,
> but for the e5500 core we found that MCAR is zero.

Which specific chip and revision did you see this on?  What is the value
in MCSR?

> You are right that DEAR
