Re: EDAC PCIe errors when scanning the bus

2014-03-20 Thread Valentin Longchamp
Hello Johannes,

On 03/19/2014 04:54 PM, Johannes Thumshirn wrote:
 On Wed, Mar 19, 2014 at 01:46:37PM +0100, Valentin Longchamp wrote:
 Hello,

 We have a board that is based on Freescale's P2041 SoC. The board has 2 PCIe
 buses with this topology:

 PCIe 0 --- PEX8505 switch --- 4 network devices
 PCIe 2 --- FPGA

 On 3.10.33 + a subset of the Freescale SDK 1.4 patches, both PCIe buses work
 well and we are able to use the devices on them.

 However, for each bus I keep getting EDAC PCIe errors at the very first stage of
 bus enumeration (please see the attached kernel log, with some debug output from
 arch/powerpc/kernel/pci-common.c and drivers/pci/probe.c).

 My current understanding of the situation is as follows: since PCI_PROBE_NORMAL
 is used, pcibios_scan_phb() calls pci_scan_child_bus(), which calls
 pci_scan_slot() on the bus for each of the 32 slots. The first pci_scan_slot()
 succeeds and discovers the P2041's PCIe controller. Each of the other 31
 pci_scan_slot() calls generates an EDAC PCIe error, triggered by the
 configuration read transaction that reads the vendor ID of a hypothetical device
 on the bus. This is consistent with what the EDAC error handler reports (all 31
 reports are the same):

 PCIE error(s) detected
 PCIE ERR_DR register: 0x0002

 ICCA bit is set: Access to an illegal configuration space from
 PEX_CONFIG_ADDR/PEX_CONFIG_DATA was detected.

 PCIE ERR_CAP_STAT register: 0x8001

 To is set: Transaction originated from PEX_CONFIG_ADDR/PEX_CONFIG_DATA.

 PCIE ERR_CAP_R0 register: 0x0800

 FMT: 0b00, TYPE: 0b00100 (Config read I guess)

 PCIE ERR_CAP_R1 register: 0x
 PCIE ERR_CAP_R2 register: 0x
 PCIE ERR_CAP_R3 register: 0x

 Afterwards, pci_scan_child_bus() calls pcibios_fixup_bus() (maybe that helps?).
 From there, since the P2041's PCIe controller is a bridge, pci_scan_bridge() is
 called for this bus and all the devices are detected without any configuration
 transaction causing EDAC errors.

 Has anyone already observed such behavior? Why do these initial transactions
 generate an error? What would be a possible fix to avoid the transaction errors
 from these 31 (unneeded?) pci_scan_slot() calls on the initial bus?

 
 I've encountered similar problems on a P4080-based design (mine has additional
 machine checks that cause an oops). I haven't solved it yet, so unfortunately I
 can't offer you a fix. But I was told there are some errata workarounds that
 could more or less have an impact on PCIe behavior. Could you show me the output
 of U-Boot's errata command?

Here is the output of the errata command:

 = errata
 Work-around for Erratum CPU-A003999 enabled
 Work-around for Erratum DDR-A003473 enabled
 Work-around for Erratum ESDHC111 enabled
 Work-around for Erratum DDR-A003 enabled
 Work-around for Erratum A004510 enabled
 Work-around for Erratum SRIO-A004034 enabled
 Work-around for Erratum A004849 is not enabled
 Work-around for Erratum A004580 is not enabled
 Work-around for Erratum USB14 enabled

 
 I'm especially interested in whether the workarounds for A-004580 and A-004849 are in place.
 

So neither is enabled; I am going to fix that. Surprisingly, A-004580 is not
defined for the P2041 in U-Boot even though it appears in the P2041's errata
sheet, so I had to enable it myself.

However, while I expect that enabling the workarounds for these two errata is
good for the system, I don't expect it to solve the PCIe EDAC problem.

Thank you for the input.

Valentin
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: EDAC PCIe errors when scanning the bus

2014-03-19 Thread Johannes Thumshirn
On Wed, Mar 19, 2014 at 01:46:37PM +0100, Valentin Longchamp wrote:
 Hello,

 We have a board that is based on Freescale's P2041 SoC. The board has 2 PCIe
 buses with this topology:

 PCIe 0 --- PEX8505 switch --- 4 network devices
 PCIe 2 --- FPGA

 On 3.10.33 + a subset of the Freescale SDK 1.4 patches, both PCIe buses work
 well and we are able to use the devices on them.

 However, for each bus I keep getting EDAC PCIe errors at the very first stage of
 bus enumeration (please see the attached kernel log, with some debug output from
 arch/powerpc/kernel/pci-common.c and drivers/pci/probe.c).

 My current understanding of the situation is as follows: since PCI_PROBE_NORMAL
 is used, pcibios_scan_phb() calls pci_scan_child_bus(), which calls
 pci_scan_slot() on the bus for each of the 32 slots. The first pci_scan_slot()
 succeeds and discovers the P2041's PCIe controller. Each of the other 31
 pci_scan_slot() calls generates an EDAC PCIe error, triggered by the
 configuration read transaction that reads the vendor ID of a hypothetical device
 on the bus. This is consistent with what the EDAC error handler reports (all 31
 reports are the same):

  PCIE error(s) detected
  PCIE ERR_DR register: 0x0002

 ICCA bit is set: Access to an illegal configuration space from
 PEX_CONFIG_ADDR/PEX_CONFIG_DATA was detected.

  PCIE ERR_CAP_STAT register: 0x8001

 To is set: Transaction originated from PEX_CONFIG_ADDR/PEX_CONFIG_DATA.

  PCIE ERR_CAP_R0 register: 0x0800

 FMT: 0b00, TYPE: 0b00100 (Config read I guess)

  PCIE ERR_CAP_R1 register: 0x
  PCIE ERR_CAP_R2 register: 0x
  PCIE ERR_CAP_R3 register: 0x

 Afterwards, pci_scan_child_bus() calls pcibios_fixup_bus() (maybe that helps?).
 From there, since the P2041's PCIe controller is a bridge, pci_scan_bridge() is
 called for this bus and all the devices are detected without any configuration
 transaction causing EDAC errors.

 Has anyone already observed such behavior? Why do these initial transactions
 generate an error? What would be a possible fix to avoid the transaction errors
 from these 31 (unneeded?) pci_scan_slot() calls on the initial bus?

 Best Regards,

 Valentin


Hi Valentin,

I've encountered similar problems on a P4080-based design (mine has additional
machine checks that cause an oops). I haven't solved it yet, so unfortunately I
can't offer you a fix. But I was told there are some errata workarounds that
could more or less have an impact on PCIe behavior. Could you show me the output
of U-Boot's errata command?

I'm especially interested in whether the workarounds for A-004580 and A-004849 are in place.

Johannes


RE: EDAC PCIe errors when scanning the bus

2014-03-19 Thread Rajat Jain
Hello,

 -----Original Message-----
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Valentin Longchamp
 Sent: Wednesday, March 19, 2014 5:47 AM
 To: linuxppc-dev@lists.ozlabs.org; linux-...@vger.kernel.org
 Subject: EDAC PCIe errors when scanning the bus
 
 Hello,
 
 We have a board that is based on Freescale's P2041 SoC. The board has 2
 PCIe buses with this topology:
 
 PCIe 0 --- PEX8505 switch --- 4 network devices
 PCIe 2 --- FPGA
 
 On 3.10.33 + a subset of the Freescale SDK 1.4 patches, both PCIe buses
 work well and we are able to use the devices on them.
 
 However, for each bus I keep getting EDAC PCIe errors at the very first
 stage of bus enumeration (please see the attached kernel log, with some
 debug output from arch/powerpc/kernel/pci-common.c and
 drivers/pci/probe.c).
 
 My current understanding of the situation is as follows: since
 PCI_PROBE_NORMAL is used, pcibios_scan_phb() calls pci_scan_child_bus(),
 which calls pci_scan_slot() on the bus for each of the 32 slots. The first
 pci_scan_slot() succeeds and discovers the P2041's PCIe controller. Each
 of the other 31 pci_scan_slot() calls generates an EDAC PCIe error,
 triggered by the configuration read transaction that reads the vendor ID
 of a hypothetical device on the bus. This is consistent with what the EDAC
 error handler reports (all 31 reports are the same):
 
  PCIE error(s) detected
  PCIE ERR_DR register: 0x0002
 
 ICCA bit is set: Access to an illegal configuration space from
 PEX_CONFIG_ADDR/PEX_CONFIG_DATA was detected.
 
  PCIE ERR_CAP_STAT register: 0x8001
 
 To is set: Transaction originated from PEX_CONFIG_ADDR/PEX_CONFIG_DATA.
 
  PCIE ERR_CAP_R0 register: 0x0800
 
 FMT: 0b00, TYPE: 0b00100 (Config read I guess)
 
  PCIE ERR_CAP_R1 register: 0x
  PCIE ERR_CAP_R2 register: 0x
  PCIE ERR_CAP_R3 register: 0x
 
 Afterwards, pci_scan_child_bus() calls pcibios_fixup_bus() (maybe that
 helps?).
 From there, since the P2041's PCIe controller is a bridge,
 pci_scan_bridge() is called for this bus and all the devices are detected
 without any configuration transaction causing EDAC errors.
 
 Has anyone already observed such behavior? Why do these initial
 transactions generate an error? What would be a possible fix to avoid the
 transaction errors from these 31 (unneeded?) pci_scan_slot() calls on the
 initial bus?

I see this too on my P5020-based platform. No fix yet; for now I am disabling
EDAC.

Thanks,

Rajat

 
 Best Regards,
 
 Valentin