Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
On 7/1/2011 1:00 PM, Richard A Lary wrote: On 7/1/2011 12:02 PM, Jon Mason wrote: On Fri, Jul 1, 2011 at 1:30 PM, Richard A Laryrl...@linux.vnet.ibm.com wrote: On 7/1/2011 8:24 AM, Jon Mason wrote: I recently sent out a number of patches to migrate drivers calling `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This function takes uses a PCI-E capability offset that was determined by calling pci_find_capability during the PCI bus walking. In response to one of the patches, James Smart posted: The reason is due to an issue on PPC platforms whereby use of pdev-is_pcie and pci_is_pcie() will erroneously fail under some conditions, but explicit search for the capability struct via pci_find_capability() is always successful. I expect this to be due a shadowing of pci config space in the hal/platform that isn't sufficiently built up. We detected this issue while testing AER/EEH, and are functional only if the pci_find_capability() option is used. See http://marc.info/?l=linux-scsim=130946649427828w=2 for the whole post. Based on his description above pci_pcie_cap andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally equivalent. If this is not safe, then the PCI bus walking code is most likely busted on EEH enabled PPC systems (and that is a BIG problem). Can anyone confirm this is still an issue? Jon, I applied the following debug patch to lpfc driver in a 2.6.32 distro kernel ( I had this one handy, I can try with mainline later today ) --- drivers/scsi/lpfc/lpfc_init.c | 10 10 + 0 - 0 ! 1 file changed, 10 insertions(+) Index: b/drivers/scsi/lpfc/lpfc_init.c === --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb pci_try_set_mwi(pdev); pci_save_state(pdev); + printk(KERN_WARNING pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n, + pdev-is_pcie, + pdev-pcie_cap, + pdev-pcie_type); + + if (pci_is_pcie(pdev)) + printk(KERN_WARNING pcicap: true\n); + else + printk(KERN_WARNING pcicap: false\n); + /* PCIe EEH recovery on powerpc platforms needs fundamental reset */ if (pci_find_capability(pdev, PCI_CAP_ID_EXP)) pdev-needs_freset = 1; This is output upon driver load on an IBM Power 7 model 8233-E8B server. dmesg | grep pcicap Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27 PDT 2011 pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false It would appear that the pcie information is not set in pci_dev structure for this device at the time the driver is being initialized during boot. Thanks for trying this. Can you confirm that the other devices in the system have this issue as well (or show that it is isolated to the lpr device)? You can add printks in set_pcie_port_type() to verify what is being set on bus walking and to see when it is being called with respect to when it is being populated by firmware. Jon, I will give this suggestion a try and post results On Power PC platforms, set_pcie_port_type() is not called. On Power PC, pci_dev structure is initialized by of_create_pci_dev(). However, the structure member pcie_cap is NOT computed nor set in this function. The information used to populate pci_dev comes from the Power PC device_tree passed to the OS by Open Firmware. Based upon standing Power PC design, we cannot support patches which replace pci_find_capability(pdef, PCI_CAP_ID_EXP) with pci_is_pcie(pdev) on Power PC platforms. -rich ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
On 7/5/2011 9:18 AM, Jon Mason wrote: On Tue, Jul 5, 2011 at 10:41 AM, Richard A Lary rl...@linux.vnet.ibm.com wrote: On 7/1/2011 1:00 PM, Richard A Lary wrote: On 7/1/2011 12:02 PM, Jon Mason wrote: On Fri, Jul 1, 2011 at 1:30 PM, Richard A Laryrl...@linux.vnet.ibm.com wrote: On 7/1/2011 8:24 AM, Jon Mason wrote: I recently sent out a number of patches to migrate drivers calling `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This function takes uses a PCI-E capability offset that was determined by calling pci_find_capability during the PCI bus walking. In response to one of the patches, James Smart posted: The reason is due to an issue on PPC platforms whereby use of pdev-is_pcie and pci_is_pcie() will erroneously fail under some conditions, but explicit search for the capability struct via pci_find_capability() is always successful. I expect this to be due a shadowing of pci config space in the hal/platform that isn't sufficiently built up. We detected this issue while testing AER/EEH, and are functional only if the pci_find_capability() option is used. See http://marc.info/?l=linux-scsim=130946649427828w=2 for the whole post. Based on his description above pci_pcie_cap andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally equivalent. If this is not safe, then the PCI bus walking code is most likely busted on EEH enabled PPC systems (and that is a BIG problem). Can anyone confirm this is still an issue? Jon, I applied the following debug patch to lpfc driver in a 2.6.32 distro kernel ( I had this one handy, I can try with mainline later today ) --- drivers/scsi/lpfc/lpfc_init.c | 10 10 + 0 - 0 ! 1 file changed, 10 insertions(+) Index: b/drivers/scsi/lpfc/lpfc_init.c === --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb pci_try_set_mwi(pdev); pci_save_state(pdev); + printk(KERN_WARNING pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n, + pdev-is_pcie, + pdev-pcie_cap, + pdev-pcie_type); + + if (pci_is_pcie(pdev)) + printk(KERN_WARNING pcicap: true\n); + else + printk(KERN_WARNING pcicap: false\n); + /* PCIe EEH recovery on powerpc platforms needs fundamental reset */ if (pci_find_capability(pdev, PCI_CAP_ID_EXP)) pdev-needs_freset = 1; This is output upon driver load on an IBM Power 7 model 8233-E8B server. dmesg | grep pcicap Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27 PDT 2011 pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false It would appear that the pcie information is not set in pci_dev structure for this device at the time the driver is being initialized during boot. Thanks for trying this. Can you confirm that the other devices in the system have this issue as well (or show that it is isolated to the lpr device)? You can add printks in set_pcie_port_type() to verify what is being set on bus walking and to see when it is being called with respect to when it is being populated by firmware. Jon, I will give this suggestion a try and post results On Power PC platforms, set_pcie_port_type() is not called. On Power PC, pci_dev structure is initialized by of_create_pci_dev(). However, the structure member pcie_cap is NOT computed nor set in this function. Yes, it is. of_create_pci_dev() calls set_pcie_port_type() http://lxr.linux.no/linux+v2.6.39/arch/powerpc/kernel/pci_of_scan.c#L144 That function sets pdev-pcie_cap http://lxr.linux.no/linux+v2.6.39/drivers/pci/probe.c#L896 So, it should be set. It looks like there is a bug in of_create_pci_dev, as set_pcie_port_type is being called BEFORE the BARs are setup. If you move set_pcie_port_type prior to pci_device_add (perhaps even after), then I bet the issue is resolved. The claim above was based upon observation that with this patch applied to a 2.6.32 kernel, the printk output did not appear in dmesg upon boot. static void set_pcie_hotplug_bridge(struct pci_dev *pdev) @@ -774,6 +780,8 @@ int pci_setup_device(struct pci_dev *dev u8 hdr_type; struct pci_slot *slot; + printk(KERN_WARNING pcicap: setup_device %p\n, dev); + if (pci_read_config_byte(dev, PCI_HEADER_TYPE, hdr_type)) return -EIO; I can make no claim about my understanding of pci device initialization on Power PC, so I have added Anton Blanchard to the cc list. Perhaps Anton can explain why pcie_cap is always 0 on Power PC. -rich The information used to populate pci_dev comes from the Power PC device_tree passed to the OS by Open Firmware. Based upon standing Power PC design, we cannot support patches which replace pci_find_capability(pdef, PCI_CAP_ID_EXP
Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
On 7/5/2011 10:22 AM, Richard A Lary wrote: On 7/5/2011 9:18 AM, Jon Mason wrote: On Tue, Jul 5, 2011 at 10:41 AM, Richard A Lary rl...@linux.vnet.ibm.com wrote: On 7/1/2011 1:00 PM, Richard A Lary wrote: On 7/1/2011 12:02 PM, Jon Mason wrote: On Fri, Jul 1, 2011 at 1:30 PM, Richard A Laryrl...@linux.vnet.ibm.com wrote: On 7/1/2011 8:24 AM, Jon Mason wrote: I recently sent out a number of patches to migrate drivers calling `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This function takes uses a PCI-E capability offset that was determined by calling pci_find_capability during the PCI bus walking. In response to one of the patches, James Smart posted: The reason is due to an issue on PPC platforms whereby use of pdev-is_pcie and pci_is_pcie() will erroneously fail under some conditions, but explicit search for the capability struct via pci_find_capability() is always successful. I expect this to be due a shadowing of pci config space in the hal/platform that isn't sufficiently built up. We detected this issue while testing AER/EEH, and are functional only if the pci_find_capability() option is used. See http://marc.info/?l=linux-scsim=130946649427828w=2 for the whole post. Based on his description above pci_pcie_cap andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally equivalent. If this is not safe, then the PCI bus walking code is most likely busted on EEH enabled PPC systems (and that is a BIG problem). Can anyone confirm this is still an issue? Jon, I applied the following debug patch to lpfc driver in a 2.6.32 distro kernel ( I had this one handy, I can try with mainline later today ) --- drivers/scsi/lpfc/lpfc_init.c | 10 10 + 0 - 0 ! 1 file changed, 10 insertions(+) Index: b/drivers/scsi/lpfc/lpfc_init.c === --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb pci_try_set_mwi(pdev); pci_save_state(pdev); + printk(KERN_WARNING pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n, + pdev-is_pcie, + pdev-pcie_cap, + pdev-pcie_type); + + if (pci_is_pcie(pdev)) + printk(KERN_WARNING pcicap: true\n); + else + printk(KERN_WARNING pcicap: false\n); + /* PCIe EEH recovery on powerpc platforms needs fundamental reset */ if (pci_find_capability(pdev, PCI_CAP_ID_EXP)) pdev-needs_freset = 1; This is output upon driver load on an IBM Power 7 model 8233-E8B server. dmesg | grep pcicap Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27 PDT 2011 pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false It would appear that the pcie information is not set in pci_dev structure for this device at the time the driver is being initialized during boot. Thanks for trying this. Can you confirm that the other devices in the system have this issue as well (or show that it is isolated to the lpr device)? You can add printks in set_pcie_port_type() to verify what is being set on bus walking and to see when it is being called with respect to when it is being populated by firmware. Jon, I will give this suggestion a try and post results On Power PC platforms, set_pcie_port_type() is not called. On Power PC, pci_dev structure is initialized by of_create_pci_dev(). However, the structure member pcie_cap is NOT computed nor set in this function. Yes, it is. of_create_pci_dev() calls set_pcie_port_type() http://lxr.linux.no/linux+v2.6.39/arch/powerpc/kernel/pci_of_scan.c#L144 That function sets pdev-pcie_cap http://lxr.linux.no/linux+v2.6.39/drivers/pci/probe.c#L896 So, it should be set. It looks like there is a bug in of_create_pci_dev, as set_pcie_port_type is being called BEFORE the BARs are setup. If you move set_pcie_port_type prior to pci_device_add (perhaps even after), then I bet the issue is resolved. The claim above was based upon observation that with this patch applied to a 2.6.32 kernel, the printk output did not appear in dmesg upon boot. static void set_pcie_hotplug_bridge(struct pci_dev *pdev) @@ -774,6 +780,8 @@ int pci_setup_device(struct pci_dev *dev u8 hdr_type; struct pci_slot *slot; + printk(KERN_WARNING pcicap: setup_device %p\n, dev); + if (pci_read_config_byte(dev, PCI_HEADER_TYPE, hdr_type)) return -EIO; I can make no claim about my understanding of pci device initialization on Power PC, so I have added Anton Blanchard to the cc list. Perhaps Anton can explain why pcie_cap is always 0 on Power PC. I pulled down 3.0-rc6 kernel, I will build and test the pci_is_pcie(pdev) patch in lpfc driver now that set_pcie_port_type() is called from of_create_pci_dev(). -rich The information used to populate pci_dev comes from the Power PC device_tree passed
Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
On 7/5/2011 1:34 PM, Richard A Lary wrote: On 7/5/2011 10:22 AM, Richard A Lary wrote: On 7/5/2011 9:18 AM, Jon Mason wrote: On Tue, Jul 5, 2011 at 10:41 AM, Richard A Lary rl...@linux.vnet.ibm.com wrote: On 7/1/2011 1:00 PM, Richard A Lary wrote: On 7/1/2011 12:02 PM, Jon Mason wrote: On Fri, Jul 1, 2011 at 1:30 PM, Richard A Laryrl...@linux.vnet.ibm.com wrote: On 7/1/2011 8:24 AM, Jon Mason wrote: I recently sent out a number of patches to migrate drivers calling `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This function takes uses a PCI-E capability offset that was determined by calling pci_find_capability during the PCI bus walking. In response to one of the patches, James Smart posted: The reason is due to an issue on PPC platforms whereby use of pdev-is_pcie and pci_is_pcie() will erroneously fail under some conditions, but explicit search for the capability struct via pci_find_capability() is always successful. I expect this to be due a shadowing of pci config space in the hal/platform that isn't sufficiently built up. We detected this issue while testing AER/EEH, and are functional only if the pci_find_capability() option is used. See http://marc.info/?l=linux-scsim=130946649427828w=2 for the whole post. Based on his description above pci_pcie_cap andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally equivalent. If this is not safe, then the PCI bus walking code is most likely busted on EEH enabled PPC systems (and that is a BIG problem). Can anyone confirm this is still an issue? Jon, I applied the following debug patch to lpfc driver in a 2.6.32 distro kernel ( I had this one handy, I can try with mainline later today ) --- drivers/scsi/lpfc/lpfc_init.c | 10 10 + 0 - 0 ! 1 file changed, 10 insertions(+) Index: b/drivers/scsi/lpfc/lpfc_init.c === --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb pci_try_set_mwi(pdev); pci_save_state(pdev); + printk(KERN_WARNING pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n, + pdev-is_pcie, + pdev-pcie_cap, + pdev-pcie_type); + + if (pci_is_pcie(pdev)) + printk(KERN_WARNING pcicap: true\n); + else + printk(KERN_WARNING pcicap: false\n); + /* PCIe EEH recovery on powerpc platforms needs fundamental reset */ if (pci_find_capability(pdev, PCI_CAP_ID_EXP)) pdev-needs_freset = 1; This is output upon driver load on an IBM Power 7 model 8233-E8B server. dmesg | grep pcicap Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27 PDT 2011 pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false It would appear that the pcie information is not set in pci_dev structure for this device at the time the driver is being initialized during boot. Thanks for trying this. Can you confirm that the other devices in the system have this issue as well (or show that it is isolated to the lpr device)? You can add printks in set_pcie_port_type() to verify what is being set on bus walking and to see when it is being called with respect to when it is being populated by firmware. Jon, I will give this suggestion a try and post results On Power PC platforms, set_pcie_port_type() is not called. On Power PC, pci_dev structure is initialized by of_create_pci_dev(). However, the structure member pcie_cap is NOT computed nor set in this function. Yes, it is. of_create_pci_dev() calls set_pcie_port_type() http://lxr.linux.no/linux+v2.6.39/arch/powerpc/kernel/pci_of_scan.c#L144 That function sets pdev-pcie_cap http://lxr.linux.no/linux+v2.6.39/drivers/pci/probe.c#L896 So, it should be set. It looks like there is a bug in of_create_pci_dev, as set_pcie_port_type is being called BEFORE the BARs are setup. If you move set_pcie_port_type prior to pci_device_add (perhaps even after), then I bet the issue is resolved. The claim above was based upon observation that with this patch applied to a 2.6.32 kernel, the printk output did not appear in dmesg upon boot. static void set_pcie_hotplug_bridge(struct pci_dev *pdev) @@ -774,6 +780,8 @@ int pci_setup_device(struct pci_dev *dev u8 hdr_type; struct pci_slot *slot; + printk(KERN_WARNING pcicap: setup_device %p\n, dev); + if (pci_read_config_byte(dev, PCI_HEADER_TYPE, hdr_type)) return -EIO; I can make no claim about my understanding of pci device initialization on Power PC, so I have added Anton Blanchard to the cc list. Perhaps Anton can explain why pcie_cap is always 0 on Power PC. I pulled down 3.0-rc6 kernel, I will build and test the pci_is_pcie(pdev) patch in lpfc driver now that set_pcie_port_type() is called from of_create_pci_dev(). Jon, I applied the debug patches mentioned above
Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
On 7/1/2011 8:24 AM, Jon Mason wrote: I recently sent out a number of patches to migrate drivers calling `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This function takes uses a PCI-E capability offset that was determined by calling pci_find_capability during the PCI bus walking. In response to one of the patches, James Smart posted: The reason is due to an issue on PPC platforms whereby use of pdev-is_pcie and pci_is_pcie() will erroneously fail under some conditions, but explicit search for the capability struct via pci_find_capability() is always successful. I expect this to be due a shadowing of pci config space in the hal/platform that isn't sufficiently built up. We detected this issue while testing AER/EEH, and are functional only if the pci_find_capability() option is used. See http://marc.info/?l=linux-scsim=130946649427828w=2 for the whole post. Based on his description above pci_pcie_cap andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally equivalent. If this is not safe, then the PCI bus walking code is most likely busted on EEH enabled PPC systems (and that is a BIG problem). Can anyone confirm this is still an issue? Jon, I applied the following debug patch to lpfc driver in a 2.6.32 distro kernel ( I had this one handy, I can try with mainline later today ) --- drivers/scsi/lpfc/lpfc_init.c | 10 10 +0 - 0 ! 1 file changed, 10 insertions(+) Index: b/drivers/scsi/lpfc/lpfc_init.c === --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb pci_try_set_mwi(pdev); pci_save_state(pdev); + printk(KERN_WARNING pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n, + pdev-is_pcie, + pdev-pcie_cap, + pdev-pcie_type); + + if (pci_is_pcie(pdev)) + printk(KERN_WARNING pcicap: true\n); + else + printk(KERN_WARNING pcicap: false\n); + /* PCIe EEH recovery on powerpc platforms needs fundamental reset */ if (pci_find_capability(pdev, PCI_CAP_ID_EXP)) pdev-needs_freset = 1; This is output upon driver load on an IBM Power 7 model 8233-E8B server. dmesg | grep pcicap Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27 PDT 2011 pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false It would appear that the pcie information is not set in pci_dev structure for this device at the time the driver is being initialized during boot. -rich ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
On 7/1/2011 12:02 PM, Jon Mason wrote: On Fri, Jul 1, 2011 at 1:30 PM, Richard A Laryrl...@linux.vnet.ibm.com wrote: On 7/1/2011 8:24 AM, Jon Mason wrote: I recently sent out a number of patches to migrate drivers calling `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This function takes uses a PCI-E capability offset that was determined by calling pci_find_capability during the PCI bus walking. In response to one of the patches, James Smart posted: The reason is due to an issue on PPC platforms whereby use of pdev-is_pcie and pci_is_pcie() will erroneously fail under some conditions, but explicit search for the capability struct via pci_find_capability() is always successful. I expect this to be due a shadowing of pci config space in the hal/platform that isn't sufficiently built up. We detected this issue while testing AER/EEH, and are functional only if the pci_find_capability() option is used. See http://marc.info/?l=linux-scsim=130946649427828w=2 for the whole post. Based on his description above pci_pcie_cap andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally equivalent. If this is not safe, then the PCI bus walking code is most likely busted on EEH enabled PPC systems (and that is a BIG problem). Can anyone confirm this is still an issue? Jon, I applied the following debug patch to lpfc driver in a 2.6.32 distro kernel ( I had this one handy, I can try with mainline later today ) --- drivers/scsi/lpfc/lpfc_init.c | 10 10 +0 - 0 ! 1 file changed, 10 insertions(+) Index: b/drivers/scsi/lpfc/lpfc_init.c === --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb pci_try_set_mwi(pdev); pci_save_state(pdev); + printk(KERN_WARNING pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n, + pdev-is_pcie, + pdev-pcie_cap, + pdev-pcie_type); + + if (pci_is_pcie(pdev)) + printk(KERN_WARNING pcicap: true\n); + else + printk(KERN_WARNING pcicap: false\n); + /* PCIe EEH recovery on powerpc platforms needs fundamental reset */ if (pci_find_capability(pdev, PCI_CAP_ID_EXP)) pdev-needs_freset = 1; This is output upon driver load on an IBM Power 7 model 8233-E8B server. dmesg | grep pcicap Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27 PDT 2011 pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false pcicap: is_pcie=0 pci_cap=0 pcie_type=0 pcicap: false It would appear that the pcie information is not set in pci_dev structure for this device at the time the driver is being initialized during boot. Thanks for trying this. Can you confirm that the other devices in the system have this issue as well (or show that it is isolated to the lpr device)? You can add printks in set_pcie_port_type() to verify what is being set on bus walking and to see when it is being called with respect to when it is being populated by firmware. Jon, I will give this suggestion a try and post results -rich ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/3] [repost] powerpc/eeh: Display eeh error location for bus and device
From: Richard A Lary rl...@linux.vnet.ibm.com For adapters which have devices under a PCIe switch/bridge it is informative to display information for both the PCIe switch/bridge and the device on which the bus error was detected. rebased to powerpc-next Signed-off-by: Richard A Lary rl...@linux.vnet.ibm.com --- --- arch/powerpc/platforms/pseries/eeh_driver.c | 22 13 +9 - 0 ! 1 file changed, 13 insertions(+), 9 deletions(-) Index: b/arch/powerpc/platforms/pseries/eeh_driver.c === --- a/arch/powerpc/platforms/pseries/eeh_driver.c +++ b/arch/powerpc/platforms/pseries/eeh_driver.c @@ -328,7 +328,7 @@ struct pci_dn * handle_eeh_events (struc struct pci_bus *frozen_bus; int rc = 0; enum pci_ers_result result = PCI_ERS_RESULT_NONE; - const char *location, *pci_str, *drv_str; + const char *location, *pci_str, *drv_str, *bus_pci_str, *bus_drv_str; frozen_dn = find_device_pe(event-dn); if (!frozen_dn) { @@ -364,13 +364,8 @@ struct pci_dn * handle_eeh_events (struc frozen_pdn = PCI_DN(frozen_dn); frozen_pdn-eeh_freeze_count++; - if (frozen_pdn-pcidev) { - pci_str = pci_name (frozen_pdn-pcidev); - drv_str = pcid_name (frozen_pdn-pcidev); - } else { - pci_str = eeh_pci_name(event-dev); - drv_str = pcid_name (event-dev); - } + pci_str = eeh_pci_name(event-dev); + drv_str = pcid_name(event-dev); if (frozen_pdn-eeh_freeze_count EEH_MAX_ALLOWED_FREEZES) goto excess_failures; @@ -378,8 +373,17 @@ struct pci_dn * handle_eeh_events (struc printk(KERN_WARNING EEH: This PCI device has failed %d times in the last hour:\n, frozen_pdn-eeh_freeze_count); + + if (frozen_pdn-pcidev) { + bus_pci_str = pci_name(frozen_pdn-pcidev); + bus_drv_str = pcid_name(frozen_pdn-pcidev); + printk(KERN_WARNING + EEH: Bus location=%s driver=%s pci addr=%s\n, + location, bus_drv_str, bus_pci_str); + } + printk(KERN_WARNING - EEH: location=%s driver=%s pci addr=%s\n, + EEH: Device location=%s driver=%s pci addr=%s\n, location, drv_str, pci_str); /* Walk the various device drivers attached to this slot through ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 0/3] powerpc/eeh: Bug fix and enhancement
This patch set contains three patches, all to eeh.c, which fix a issue seen on PCIe adapters with devices below PCIe bridge/switch, gracefully handle case where driver sets 'needs_freset' for non-PCIe adapter, and display location information for both PCIe bus and device for the device which detected an error. Patches were tested on P5/P6/P7 platforms on a variety of PCI, PCI-X and PCI-E adapters with no regressions. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/3] [PATCH 1/3]powerpc/eeh: Propagate needs_freset flag to device at PE
From: Richard A Lary rl...@linux.vnet.ibm.com For multifunction adapters with a PCI bridge or switch as the device at the Partitionable Endpoint(PE), if one or more devices below PE sets dev-needs_freset, that value will be set for the PE device. In other words, if any device below PE requires a fundamental reset the PE will request a fundamental reset. Signed-off-by: Richard A Lary rl...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/eeh.c | 4842 +6 - 0 ! 1 file changed, 42 insertions(+), 6 deletions(-) Index: b/arch/powerpc/platforms/pseries/eeh.c === --- a/arch/powerpc/platforms/pseries/eeh.c +++ b/arch/powerpc/platforms/pseries/eeh.c @@ -448,6 +448,39 @@ void eeh_clear_slot (struct device_node raw_spin_unlock_irqrestore(confirm_error_lock, flags); } +void __eeh_set_pe_freset(struct device_node *parent, unsigned int *freset) +{ + struct device_node *dn; + + for_each_child_of_node(parent, dn) { + if (PCI_DN(dn)) { + + struct pci_dev *dev = PCI_DN(dn)-pcidev; + + if (dev dev-driver) + *freset |= dev-needs_freset; + + __eeh_set_pe_freset(dn, freset); + } + } +} + +void eeh_set_pe_freset(struct device_node *dn, unsigned int *freset) +{ + struct pci_dev *dev; + dn = find_device_pe(dn); + + /* Back up one, since config addrs might be shared */ + if (!pcibios_find_pci_bus(dn) PCI_DN(dn-parent)) + dn = dn-parent; + + dev = PCI_DN(dn)-pcidev; + if (dev) + *freset |= dev-needs_freset; + + __eeh_set_pe_freset(dn, freset); +} + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -736,18 +769,21 @@ int pcibios_set_pcie_reset_state(struct /** * rtas_set_slot_reset -- assert the pci #RST line for 1/4 second * @pdn: pci device node to be reset. - * - * Return 0 if success, else a non-zero value. */ static void __rtas_set_slot_reset(struct pci_dn *pdn) { - struct pci_dev *dev = pdn-pcidev; + unsigned int freset = 0; - /* Determine type of EEH reset required by device, -* default hot reset or fundamental reset + /* Determine type of EEH reset required for +* Partitionable Endpoint, a hot-reset (1) +* or a fundamental reset (3). +* A fundamental reset required by any device under +* Partitionable Endpoint trumps hot-reset. */ - if (dev dev-needs_freset) + eeh_set_pe_freset(pdn-node, freset); + + if (freset) rtas_pci_slot_reset(pdn, 3); else rtas_pci_slot_reset(pdn, 1); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/3] powerpc/eeh: Handle functional reset on non-PCIe device
From: Richard A Lary rl...@linux.vnet.ibm.com Fundamental reset is an optional reset type supported only by PCIe adapters. Handle the unexpected case where a non-PCIe device has requested a fundamental reset. Try hot-reset as a fallback to handle this case. Signed-off-by: Richard A Lary rl...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/eeh.c | 2116 +5 - 0 ! 1 file changed, 16 insertions(+), 5 deletions(-) Index: b/arch/powerpc/platforms/pseries/eeh.c === --- a/arch/powerpc/platforms/pseries/eeh.c +++ b/arch/powerpc/platforms/pseries/eeh.c @@ -725,15 +725,26 @@ rtas_pci_slot_reset(struct pci_dn *pdn, if (pdn-eeh_pe_config_addr) config_addr = pdn-eeh_pe_config_addr; - rc = rtas_call(ibm_set_slot_reset,4,1, NULL, + rc = rtas_call(ibm_set_slot_reset, 4, 1, NULL, config_addr, BUID_HI(pdn-phb-buid), BUID_LO(pdn-phb-buid), state); - if (rc) - printk (KERN_WARNING EEH: Unable to reset the failed slot, -(%d) #RST=%d dn=%s\n, - rc, state, pdn-node-full_name); + + /* Fundamental-reset not supported on this PE, try hot-reset */ + if (rc == -8 state == 3) { + state = 1; + rc = rtas_call(ibm_set_slot_reset, 4, 1, NULL, + config_addr, + BUID_HI(pdn-phb-buid), + BUID_LO(pdn-phb-buid), + state); + if (rc) + printk(KERN_WARNING + EEH: Unable to reset the failed slot, +(%d) #RST=%d dn=%s\n, + rc, state, pdn-node-full_name); + } } /** ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/3] powerpc/eeh: Handle functional reset on non-PCIe device
From: Richard A Lary rl...@linux.vnet.ibm.com For adapters which have devices under a PCIe switch/bridge it is informative to display information for both the PCIe switch/bridge and the device on which the bus error was detected. Signed-off-by: Richard A Lary rl...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/eeh_driver.c | 24 14 +10 -0 ! 1 file changed, 14 insertions(+), 10 deletions(-) Index: b/arch/powerpc/platforms/pseries/eeh_driver.c === --- a/arch/powerpc/platforms/pseries/eeh_driver.c +++ b/arch/powerpc/platforms/pseries/eeh_driver.c @@ -328,7 +328,7 @@ struct pci_dn * handle_eeh_events (struc struct pci_bus *frozen_bus; int rc = 0; enum pci_ers_result result = PCI_ERS_RESULT_NONE; - const char *location, *pci_str, *drv_str; + const char *location, *pci_str, *drv_str, *bus_pci_str, *bus_drv_str; frozen_dn = find_device_pe(event-dn); if (!frozen_dn) { @@ -364,22 +364,26 @@ struct pci_dn * handle_eeh_events (struc frozen_pdn = PCI_DN(frozen_dn); frozen_pdn-eeh_freeze_count++; - if (frozen_pdn-pcidev) { - pci_str = pci_name (frozen_pdn-pcidev); - drv_str = pcid_name (frozen_pdn-pcidev); - } else { - pci_str = eeh_pci_name(event-dev); - drv_str = pcid_name (event-dev); - } - + pci_str = eeh_pci_name(event-dev); + drv_str = pcid_name(event-dev); + if (frozen_pdn-eeh_freeze_count EEH_MAX_ALLOWED_FREEZES) goto excess_failures; printk(KERN_WARNING EEH: This PCI device has failed %d times in the last hour:\n, frozen_pdn-eeh_freeze_count); + + if (frozen_pdn-pcidev) { + bus_pci_str = pci_name(frozen_pdn-pcidev); + bus_drv_str = pcid_name(frozen_pdn-pcidev); + printk(KERN_WARNING + EEH: Bus location=%s driver=%s pci addr=%s\n, + location, bus_drv_str, bus_pci_str); + } + printk(KERN_WARNING - EEH: location=%s driver=%s pci addr=%s\n, + EEH: Device location=%s driver=%s pci addr=%s\n, location, drv_str, pci_str); /* Walk the various device drivers attached to this slot through ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/3] powerpc/eeh: Display eeh error location for bus and device (resend)
From: Richard A Lary rl...@linux.vnet.ibm.com For adapters which have devices under a PCIe switch/bridge it is informative to display information for both the PCIe switch/bridge and the device on which the bus error was detected. Signed-off-by: Richard A Lary rl...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/eeh_driver.c | 24 14 +10 -0 ! 1 file changed, 14 insertions(+), 10 deletions(-) Index: b/arch/powerpc/platforms/pseries/eeh_driver.c === --- a/arch/powerpc/platforms/pseries/eeh_driver.c +++ b/arch/powerpc/platforms/pseries/eeh_driver.c @@ -328,7 +328,7 @@ struct pci_dn * handle_eeh_events (struc struct pci_bus *frozen_bus; int rc = 0; enum pci_ers_result result = PCI_ERS_RESULT_NONE; - const char *location, *pci_str, *drv_str; + const char *location, *pci_str, *drv_str, *bus_pci_str, *bus_drv_str; frozen_dn = find_device_pe(event-dn); if (!frozen_dn) { @@ -364,22 +364,26 @@ struct pci_dn * handle_eeh_events (struc frozen_pdn = PCI_DN(frozen_dn); frozen_pdn-eeh_freeze_count++; - if (frozen_pdn-pcidev) { - pci_str = pci_name (frozen_pdn-pcidev); - drv_str = pcid_name (frozen_pdn-pcidev); - } else { - pci_str = eeh_pci_name(event-dev); - drv_str = pcid_name (event-dev); - } - + pci_str = eeh_pci_name(event-dev); + drv_str = pcid_name(event-dev); + if (frozen_pdn-eeh_freeze_count EEH_MAX_ALLOWED_FREEZES) goto excess_failures; printk(KERN_WARNING EEH: This PCI device has failed %d times in the last hour:\n, frozen_pdn-eeh_freeze_count); + + if (frozen_pdn-pcidev) { + bus_pci_str = pci_name(frozen_pdn-pcidev); + bus_drv_str = pcid_name(frozen_pdn-pcidev); + printk(KERN_WARNING + EEH: Bus location=%s driver=%s pci addr=%s\n, + location, bus_drv_str, bus_pci_str); + } + printk(KERN_WARNING - EEH: location=%s driver=%s pci addr=%s\n, + EEH: Device location=%s driver=%s pci addr=%s\n, location, drv_str, pci_str); /* Walk the various device drivers attached to this slot through ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/1] powerpc/eeh: Add support for ibm,configure-pe RTAS call
On 4/6/2011 8:35 PM, Benjamin Herrenschmidt wrote: On Wed, 2011-04-06 at 15:50 -0700, Richard A Lary wrote: From: Richard A. Laryrl...@linux.vnet.ibm.com Added support for ibm,configure-pe RTAS call introduced with PAPR 2.2. Care to tell us a bit about the difference ? :-) There's nothing obvious in the code Also you added calls to rtas_configure_bridge() and eeh_restore_bars() to eeh_slot_error_Detail(), that might want some explanation as well. I was directed by Platform Firmware team to use the new configure-pe call on platforms which support it. Other than that, all I can tell you is what PAPR+ 2.5 has to say: --- Section 7.3.12.2 ibm,configure-pe This call has about the same semantics as the ibm,configure-bridge RTAS call, except that it: 1. Has the additional semantics of bypassing the configuration process if the PE has previously not been reset by the platform as a result of entering the EEH Stopped State. 2. Configures all the configurations spaces within the PE, including those of the endpoint devices within the PE (see Requirement R1–7.3.12.2–3). Thus, this RTAS call can be made at the beginning of any EEH processing. --- The addition of calls to rtas_configure_bridge() and eeh_restore_bars() were also added as directed by Platform Firmware team. Platform Firmware advised these calls were required to allow access to all devices under the Partitionable Endpoint by gather_pci_data() on PCIe adapters which contain PCIe Switches/Bridges. I have verified in testing that the addition of rtas_configure_bridge() and eeh_restore_bars() do indeed allow gather_pci_data() to read config space on every device under the PE, which returned all FF's prior to addition of those calls. Cheers, Ben. Signed-off-by: Richard A. Laryrl...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/eeh.c | 13 12 +1 - 0 ! 1 file changed, 12 insertions(+), 1 deletion(-) Index: b/arch/powerpc/platforms/pseries/eeh.c === --- a/arch/powerpc/platforms/pseries/eeh.c +++ b/arch/powerpc/platforms/pseries/eeh.c @@ -95,6 +95,7 @@ static int ibm_slot_error_detail; static int ibm_get_config_addr_info; static int ibm_get_config_addr_info2; static int ibm_configure_bridge; +static int ibm_configure_pe; int eeh_subsystem_enabled; EXPORT_SYMBOL(eeh_subsystem_enabled); @@ -263,6 +264,8 @@ void eeh_slot_error_detail(struct pci_dn pci_regs_buf[0] = 0; rtas_pci_enable(pdn, EEH_THAW_MMIO); + rtas_configure_bridge(pdn); + eeh_restore_bars(pdn); loglen = gather_pci_data(pdn, pci_regs_buf, EEH_PCI_REGS_LOG_LEN); rtas_slot_error_detail(pdn, severity, pci_regs_buf, loglen); @@ -896,6 +899,7 @@ void rtas_configure_bridge(struct pci_dn *pdn) { int config_addr; + int token; int rc; /* Use PE configuration address, if present */ @@ -903,7 +907,13 @@ rtas_configure_bridge(struct pci_dn *pdn if (pdn-eeh_pe_config_addr) config_addr = pdn-eeh_pe_config_addr; - rc = rtas_call(ibm_configure_bridge,3,1, NULL, + /* Use new configure-pe function, if supported */ + if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) + token = ibm_configure_pe; + else + token = ibm_configure_bridge; + + rc = rtas_call(token, 3, 1, NULL, config_addr, BUID_HI(pdn-phb-buid), BUID_LO(pdn-phb-buid)); @@ -1079,6 +1089,7 @@ void __init eeh_init(void) ibm_get_config_addr_info = rtas_token(ibm,get-config-addr-info); ibm_get_config_addr_info2 = rtas_token(ibm,get-config-addr-info2); ibm_configure_bridge = rtas_token (ibm,configure-bridge); + ibm_configure_pe = rtas_token(ibm,configure-pe); if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) return; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/1] powerpc/eeh: Add support for ibm,configure-pe RTAS call
From: Richard A. Lary rl...@linux.vnet.ibm.com Added support for ibm,configure-pe RTAS call introduced with PAPR 2.2. Signed-off-by: Richard A. Lary rl...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/eeh.c | 1312 +1 - 0 ! 1 file changed, 12 insertions(+), 1 deletion(-) Index: b/arch/powerpc/platforms/pseries/eeh.c === --- a/arch/powerpc/platforms/pseries/eeh.c +++ b/arch/powerpc/platforms/pseries/eeh.c @@ -95,6 +95,7 @@ static int ibm_slot_error_detail; static int ibm_get_config_addr_info; static int ibm_get_config_addr_info2; static int ibm_configure_bridge; +static int ibm_configure_pe; int eeh_subsystem_enabled; EXPORT_SYMBOL(eeh_subsystem_enabled); @@ -263,6 +264,8 @@ void eeh_slot_error_detail(struct pci_dn pci_regs_buf[0] = 0; rtas_pci_enable(pdn, EEH_THAW_MMIO); + rtas_configure_bridge(pdn); + eeh_restore_bars(pdn); loglen = gather_pci_data(pdn, pci_regs_buf, EEH_PCI_REGS_LOG_LEN); rtas_slot_error_detail(pdn, severity, pci_regs_buf, loglen); @@ -896,6 +899,7 @@ void rtas_configure_bridge(struct pci_dn *pdn) { int config_addr; + int token; int rc; /* Use PE configuration address, if present */ @@ -903,7 +907,13 @@ rtas_configure_bridge(struct pci_dn *pdn if (pdn-eeh_pe_config_addr) config_addr = pdn-eeh_pe_config_addr; - rc = rtas_call(ibm_configure_bridge,3,1, NULL, + /* Use new configure-pe function, if supported */ + if (ibm_configure_pe != RTAS_UNKNOWN_SERVICE) + token = ibm_configure_pe; + else + token = ibm_configure_bridge; + + rc = rtas_call(token, 3, 1, NULL, config_addr, BUID_HI(pdn-phb-buid), BUID_LO(pdn-phb-buid)); @@ -1079,6 +1089,7 @@ void __init eeh_init(void) ibm_get_config_addr_info = rtas_token(ibm,get-config-addr-info); ibm_get_config_addr_info2 = rtas_token(ibm,get-config-addr-info2); ibm_configure_bridge = rtas_token (ibm,configure-bridge); + ibm_configure_pe = rtas_token(ibm,configure-pe); if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) return; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev