Re: EDAC PCIe errors when scanning the bus
Hello Johannes,

On 03/19/2014 04:54 PM, Johannes Thumshirn wrote:
> On Wed, Mar 19, 2014 at 01:46:37PM +0100, Valentin Longchamp wrote:
>> Hello,
>>
>> We have a board that is based on Freescale's P2041 SoC. The board has
>> 2 PCIe buses with this topology:
>>
>> PCIe 0 --- PEX8505 switch --- 4 network devices
>> PCIe 2 --- FPGA
>>
>> On 3.10.33 + a subset of the Freescale SDK 1.4 patches, both PCIe
>> buses work well and we are able to use the devices on them.
>>
>> For each bus, however, I keep getting EDAC PCIe errors at the very
>> first stage of bus enumeration (please see the attached kernel log,
>> with some debug output from arch/powerpc/kernel/pci-common.c and
>> drivers/pci/probe.c).
>>
>> My current understanding of the situation is as follows: since
>> PCI_PROBE_NORMAL is used, pcibios_scan_phb() calls
>> pci_scan_child_bus(), which does a pci_scan_slot() on the bus for 32
>> slots. The first pci_scan_slot() is successful and discovers the
>> P2041's PCIe Controller. Each of the 31 other pci_scan_slot() calls
>> generates an EDAC PCIe error, which is triggered by the configuration
>> read transaction that tries to read the vendor ID of a hypothetical
>> device on the bus.
>>
>> This is consistent with what is reported by the EDAC error handler
>> (all 31 reports are the same):
>>
>> PCIE error(s) detected
>> PCIE ERR_DR register: 0x0002
>>   ICCA bit is set: Access to an illegal configuration space from
>>   PEX_CONFIG_ADDR/PEX_CONFIG_DATA was detected.
>> PCIE ERR_CAP_STAT register: 0x8001
>>   To is set: Transaction originated from
>>   PEX_CONFIG_ADDR/PEX_CONFIG_DATA.
>> PCIE ERR_CAP_R0 register: 0x0800
>>   FMT: 0b00, TYPE: 0b00100 (Config read I guess)
>> PCIE ERR_CAP_R1 register: 0x
>> PCIE ERR_CAP_R2 register: 0x
>> PCIE ERR_CAP_R3 register: 0x
>>
>> Afterwards, pci_scan_child_bus() calls pcibios_fixup_bus() (maybe
>> that helps?). From here, since the P2041's PCIe Controller is a
>> bridge, pci_scan_bridge() is called for this bus and all the devices
>> are detected without any configuration transaction causing EDAC
>> errors.
>>
>> Has anyone already observed such behavior? Why do these initial
>> transactions generate an error? What would be a possible fix to avoid
>> these transaction errors for these 31 (unneeded?) pci_scan_slot()
>> calls on the initial bus?
>
> I've encountered similar problems on a P4080 based design (mine has
> additional machine checks that cause an oops). I haven't solved it
> yet, so I unfortunately can't offer you a fix. But I was told there
> are some errata workarounds that could more or less have an impact on
> PCIe behavior. Could you show me the output of U-Boot's errata
> command?

Here is the output for the errata command:

= errata
Work-around for Erratum CPU-A003999 enabled
Work-around for Erratum DDR-A003473 enabled
Work-around for Erratum ESDHC111 enabled
Work-around for Erratum DDR-A003 enabled
Work-around for Erratum A004510 enabled
Work-around for Erratum SRIO-A004034 enabled
Work-around for Erratum A004849 is not enabled
Work-around for Erratum A004580 is not enabled
Work-around for Erratum USB14 enabled

> Especially whether the workarounds for A-004580 and A-004849 are in
> place.

So neither is enabled; I am going to fix that. Surprisingly, A-004580
is not defined for the P2041 in U-Boot even though it is also present in
the P2041's errata sheet, so I had to enable it myself.

However, I expect that enabling the workarounds for these two errata is
good for the system but will not solve the PCIe EDAC problem.

Thank you for the input.

Valentin
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
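The enumeration sequence described in the report (32 slot probes on the root bus, of which only the first finds a device) can be illustrated with a toy model. This is a minimal sketch in Python, not kernel code: `scan_root_bus` and `fake_config_read` are hypothetical names, the bus is simulated, and the all-ones return value is assumed as the "no device" marker. Only the slot/function packing mirrors the kernel's `PCI_DEVFN()` macro.

```python
# Toy model of the root-bus scan described in the report; NOT kernel code.
PCI_SLOTS_PER_BUS = 32
NO_DEVICE = 0xFFFFFFFF  # assumed all-ones result when nothing answers


def pci_devfn(slot, func):
    """Pack slot and function the way the kernel's PCI_DEVFN() macro does."""
    return ((slot & 0x1F) << 3) | (func & 0x07)


def scan_root_bus(read_vendor_id):
    """Probe function 0 of every slot; return (found_slots, empty_slots)."""
    found, empty = [], []
    for slot in range(PCI_SLOTS_PER_BUS):
        value = read_vendor_id(pci_devfn(slot, 0))
        (empty if value == NO_DEVICE else found).append(slot)
    return found, empty


def fake_config_read(devfn):
    """Hypothetical bus: only slot 0 (the PCIe controller) responds."""
    # 0x1957 is Freescale's PCI vendor ID; the device ID half is a placeholder.
    return 0x00001957 if devfn == pci_devfn(0, 0) else NO_DEVICE


found, empty = scan_root_bus(fake_config_read)
print(f"responding slots: {found}, probes with no responder: {len(empty)}")
```

In this model the 31 empty probes correspond to the 31 configuration reads that the report associates with EDAC errors; the model says nothing about why the controller flags them.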
Re: EDAC PCIe errors when scanning the bus
On Wed, Mar 19, 2014 at 01:46:37PM +0100, Valentin Longchamp wrote:
> Hello,
>
> We have a board that is based on Freescale's P2041 SoC. The board has
> 2 PCIe buses with this topology:
>
> PCIe 0 --- PEX8505 switch --- 4 network devices
> PCIe 2 --- FPGA
>
> On 3.10.33 + a subset of the Freescale SDK 1.4 patches, both PCIe
> buses work well and we are able to use the devices on them.
>
> For each bus, however, I keep getting EDAC PCIe errors at the very
> first stage of bus enumeration (please see the attached kernel log,
> with some debug output from arch/powerpc/kernel/pci-common.c and
> drivers/pci/probe.c).
>
> My current understanding of the situation is as follows: since
> PCI_PROBE_NORMAL is used, pcibios_scan_phb() calls
> pci_scan_child_bus(), which does a pci_scan_slot() on the bus for 32
> slots. The first pci_scan_slot() is successful and discovers the
> P2041's PCIe Controller. Each of the 31 other pci_scan_slot() calls
> generates an EDAC PCIe error, which is triggered by the configuration
> read transaction that tries to read the vendor ID of a hypothetical
> device on the bus.
>
> This is consistent with what is reported by the EDAC error handler
> (all 31 reports are the same):
>
> PCIE error(s) detected
> PCIE ERR_DR register: 0x0002
>   ICCA bit is set: Access to an illegal configuration space from
>   PEX_CONFIG_ADDR/PEX_CONFIG_DATA was detected.
> PCIE ERR_CAP_STAT register: 0x8001
>   To is set: Transaction originated from
>   PEX_CONFIG_ADDR/PEX_CONFIG_DATA.
> PCIE ERR_CAP_R0 register: 0x0800
>   FMT: 0b00, TYPE: 0b00100 (Config read I guess)
> PCIE ERR_CAP_R1 register: 0x
> PCIE ERR_CAP_R2 register: 0x
> PCIE ERR_CAP_R3 register: 0x
>
> Afterwards, pci_scan_child_bus() calls pcibios_fixup_bus() (maybe
> that helps?). From here, since the P2041's PCIe Controller is a
> bridge, pci_scan_bridge() is called for this bus and all the devices
> are detected without any configuration transaction causing EDAC
> errors.
>
> Has anyone already observed such behavior? Why do these initial
> transactions generate an error? What would be a possible fix to avoid
> these transaction errors for these 31 (unneeded?) pci_scan_slot()
> calls on the initial bus?
>
> Best Regards,
>
> Valentin

Hi Valentin,

I've encountered similar problems on a P4080 based design (mine has
additional machine checks that cause an oops). I haven't solved it yet,
so I unfortunately can't offer you a fix. But I was told there are some
errata workarounds that could more or less have an impact on PCIe
behavior.

Could you show me the output of U-Boot's errata command? Especially
whether the workarounds for A-004580 and A-004849 are in place.

Johannes
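The guess in the quoted log that "FMT: 0b00, TYPE: 0b00100" denotes a configuration read can be checked against the TLP header encodings of the PCIe base specification. The sketch below is a small lookup under stated assumptions: it uses the 2-bit Fmt form that the log prints, covers only a few request types, and takes the two fields as already-extracted inputs, because the thread does not show how they are packed into ERR_CAP_R0. The name `describe_tlp` is hypothetical.

```python
# Request types keyed by the 5-bit TLP Type field (subset only).
REQUEST_TYPES = {
    0b00000: "Memory",
    0b00010: "I/O",
    0b00100: "Configuration Type 0",
    0b00101: "Configuration Type 1",
}


def describe_tlp(fmt, tlp_type):
    """Name a request TLP from its 2-bit Fmt and 5-bit Type fields."""
    if tlp_type not in REQUEST_TYPES:
        return "type not covered by this table"
    # The upper Fmt bit says whether the TLP carries data: for these
    # request types, data present means a write, absent means a read.
    direction = "write" if fmt & 0b10 else "read"
    return f"{REQUEST_TYPES[tlp_type]} {direction}"


# Values reported in the EDAC output quoted above.
print(describe_tlp(0b00, 0b00100))
```

With the values from the log this prints "Configuration Type 0 read", which agrees with the "Config read" guess in the report.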
RE: EDAC PCIe errors when scanning the bus
Hello,

> -----Original Message-----
> From: linux-pci-ow...@vger.kernel.org
> [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Valentin Longchamp
> Sent: Wednesday, March 19, 2014 5:47 AM
> To: linuxppc-dev@lists.ozlabs.org; linux-...@vger.kernel.org
> Subject: EDAC PCIe errors when scanning the bus
>
> Hello,
>
> We have a board that is based on Freescale's P2041 SoC. The board has
> 2 PCIe buses with this topology:
>
> PCIe 0 --- PEX8505 switch --- 4 network devices
> PCIe 2 --- FPGA
>
> On 3.10.33 + a subset of the Freescale SDK 1.4 patches, both PCIe
> buses work well and we are able to use the devices on them.
>
> For each bus, however, I keep getting EDAC PCIe errors at the very
> first stage of bus enumeration (please see the attached kernel log,
> with some debug output from arch/powerpc/kernel/pci-common.c and
> drivers/pci/probe.c).
>
> My current understanding of the situation is as follows: since
> PCI_PROBE_NORMAL is used, pcibios_scan_phb() calls
> pci_scan_child_bus(), which does a pci_scan_slot() on the bus for 32
> slots. The first pci_scan_slot() is successful and discovers the
> P2041's PCIe Controller. Each of the 31 other pci_scan_slot() calls
> generates an EDAC PCIe error, which is triggered by the configuration
> read transaction that tries to read the vendor ID of a hypothetical
> device on the bus.
>
> This is consistent with what is reported by the EDAC error handler
> (all 31 reports are the same):
>
> PCIE error(s) detected
> PCIE ERR_DR register: 0x0002
>   ICCA bit is set: Access to an illegal configuration space from
>   PEX_CONFIG_ADDR/PEX_CONFIG_DATA was detected.
> PCIE ERR_CAP_STAT register: 0x8001
>   To is set: Transaction originated from
>   PEX_CONFIG_ADDR/PEX_CONFIG_DATA.
> PCIE ERR_CAP_R0 register: 0x0800
>   FMT: 0b00, TYPE: 0b00100 (Config read I guess)
> PCIE ERR_CAP_R1 register: 0x
> PCIE ERR_CAP_R2 register: 0x
> PCIE ERR_CAP_R3 register: 0x
>
> Afterwards, pci_scan_child_bus() calls pcibios_fixup_bus() (maybe
> that helps?). From here, since the P2041's PCIe Controller is a
> bridge, pci_scan_bridge() is called for this bus and all the devices
> are detected without any configuration transaction causing EDAC
> errors.
>
> Has anyone already observed such behavior? Why do these initial
> transactions generate an error? What would be a possible fix to avoid
> these transaction errors for these 31 (unneeded?) pci_scan_slot()
> calls on the initial bus?

I see this too on my P5020-based platform. No fix yet; for now I am
disabling EDAC.

Thanks,

Rajat

> Best Regards,
>
> Valentin