Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
Thierry Reding writes:
...
>
> In case of Tegra, dev actually points to the root port. Now if I read
> the above code correctly, highest_pcie_bridge will still be NULL in that
> case, which in turn will return NULL from pci_find_pcie_root_port(). But
> shouldn't it really return dev?
>
> The patch that I used to fix the issue is this:
>
> --->8---
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2c712dcfd37d..dd56c1c05614 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -514,7 +514,7 @@ EXPORT_SYMBOL(pci_find_resource);
>   */
>  struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  {
> -	struct pci_dev *bridge, *highest_pcie_bridge = NULL;
> +	struct pci_dev *bridge, *highest_pcie_bridge = dev;
>
>  	bridge = pci_upstream_bridge(dev);
>  	while (bridge && pci_is_pcie(bridge)) {
> --->8---
>
> That works correctly if this function ends up being called on the PCIe
> root port, though perhaps that's not what this function is supposed to
> do. It's somewhat unclear from the kerneldoc what the function should
> be doing when called on a root port device itself.

That also works for me on powerpc (oops reported up thread).

cheers
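
To make the open question about the quoted one-line change concrete, here is a minimal userspace sketch of it. Everything in it is an assumption made for illustration: struct model_dev, enum model_pcie_type and model_find_root_port_from_dev() are hypothetical stand-ins, not the kernel's struct pci_dev or helpers, and the loop body is modelled on the walk discussed in this thread rather than copied from pci.c.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
enum model_pcie_type {
	MODEL_ENDPOINT,
	MODEL_ROOT_PORT,
	MODEL_RC_INTEGRATED_ENDPOINT,
};

struct model_dev {
	struct model_dev *upstream;	/* NULL when nothing sits above the device */
	int is_pcie;			/* models pci_is_pcie() */
	enum model_pcie_type type;	/* models pci_pcie_type() */
};

/*
 * Variant from the quoted patch: the search starts at dev itself, so the
 * result is never NULL before the type check and a Root Port finds itself.
 * Assumes dev is non-NULL, as the callers in this thread do.
 */
struct model_dev *model_find_root_port_from_dev(struct model_dev *dev)
{
	struct model_dev *bridge, *highest = dev;

	bridge = dev->upstream;
	while (bridge && bridge->is_pcie) {
		highest = bridge;
		bridge = bridge->upstream;
	}

	/* Unchanged type check from the pre-fix code. */
	if (highest->type != MODEL_ROOT_PORT)
		return NULL;

	return highest;
}
```

Under these assumptions, a call on a Root Port returns that Root Port, while a call on an integrated endpoint with nothing upstream still returns NULL. Whether returning dev for a Root Port is the intended contract is exactly what the message above leaves open.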
Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
On 2017/8/17 4:59, David Miller wrote:
> From: Bjorn Helgaas
> Date: Wed, 16 Aug 2017 15:02:37 -0500
>
>> Your fix looks right to me.
>
> Someone please submit this fix formally because this change is now in
> Linus's tree.
>

I will send it.

> Thank you.
>
Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
From: Bjorn Helgaas
Date: Wed, 16 Aug 2017 15:02:37 -0500

> Your fix looks right to me.

Someone please submit this fix formally because this change is now in
Linus's tree.

Thank you.
Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
On Wed, Aug 16, 2017 at 09:33:03PM +0200, Thierry Reding wrote: > On Tue, Aug 15, 2017 at 12:03:31PM -0500, Bjorn Helgaas wrote: > > On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote: > > > Eric report a oops when booting the system after applying > > > the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."): > > > ... > > > > > It looks like the pci_find_pcie_root_port() was trying to > > > find the Root Port for the PCI device which is the Root > > > Port already, it will return NULL and trigger the problem, > > > so check the highest_pcie_bridge to fix thie problem. > > > > The problem was actually with a Root Complex Integrated Endpoint that > > has no upstream PCIe device: > > > > 00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04) > > Subsystem: Intel Corporation Device 0e2a > > Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- > > Stepping- SERR- FastB2B- DisINTx- > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > > SERR- > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, > > MSI 00 > > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s > > <64ns, L1 <1us > > ExtTag- RBE- FLReset- > > DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ > > Unsupported+ > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ > > MaxPayload 128 bytes, MaxReadReq 128 bytes > > I've started seeing this crash on Tegra K1 as well. 
Here's the device > for which it oopses: > > 00:02.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1) > (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ > Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > SERR- Latency: 0, Cache Line Size: 64 bytes > Interrupt: pin A routed to IRQ 391 > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 > I/O behind bridge: 1000-1fff [size=4K] > Memory behind bridge: 1300-130f [size=1M] > Prefetchable memory behind bridge: 2000-200f > [size=1M] > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- > BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [40] Subsystem: NVIDIA Corporation TegraK1 PCIe x1 > Bridge > Capabilities: [48] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > PME(D0+,D1+,D2+,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+ > Address: 00fcf000 Data: > Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed- > Mapping Address Base: fee0 > Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00 > DevCap: MaxPayload 128 bytes, PhantFunc 0 > ExtTag+ RBE+ > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- > Unsupported- > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- > TransPend- > LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s, Exit > Latency L0s <512ns > ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ > DLActive+ BWMgmt+ ABWMgmt- > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- > Surprise- > Slot #0, PowerLimit 
0.000W; Interlock- NoCompl- > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- > HPIrq- LinkChg- > Control: AttnInd Off, PwrInd On, Power- Interlock- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ > Interlock- > Changed: MRL- PresDet+ LinkState+ > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ > CRSVisible- > RootCap: CRSVisible- > RootSta: PME ReqID , PMEStatus- PMEPending- > DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, > OBFF Not Supported ARIFwd- > AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, > OBFF Disabled ARIFwd- > AtomicOpsCtl: ReqEn- EgressBlck- >
Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
On Tue, Aug 15, 2017 at 12:03:31PM -0500, Bjorn Helgaas wrote:
> On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> > Eric report a oops when booting the system after applying
> > the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> > ...
>
> > It looks like the pci_find_pcie_root_port() was trying to
> > find the Root Port for the PCI device which is the Root
> > Port already, it will return NULL and trigger the problem,
> > so check the highest_pcie_bridge to fix thie problem.
>
> The problem was actually with a Root Complex Integrated Endpoint that
> has no upstream PCIe device:
>
>   00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
>         Subsystem: Intel Corporation Device 0e2a
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>         Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE- FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes

I've started seeing this crash on Tegra K1 as well. Here's the device
for which it oopses:

  00:02.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 391
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 1000-1fff [size=4K]
        Memory behind bridge: 1300-130f [size=1M]
        Prefetchable memory behind bridge: 2000-200f [size=1M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
        BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Subsystem: NVIDIA Corporation TegraK1 PCIe x1 Bridge
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
                Address: 00fcf000 Data:
        Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
                Mapping Address Base: fee0
        Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s, Exit Latency L0s <512ns
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID , PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
                         AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Kernel driver in use: pcieport

> > Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
>
> This also
>
> Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5
> Completion erratum")
>
> which added pci_find_pcie_root_port(). Prior to this
Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
From: Bjorn Helgaas
Date: Tue, 15 Aug 2017 12:03:31 -0500

> On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
>> Eric report a oops when booting the system after applying
>> the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
>> ...
>
>> It looks like the pci_find_pcie_root_port() was trying to
>> find the Root Port for the PCI device which is the Root
>> Port already, it will return NULL and trigger the problem,
>> so check the highest_pcie_bridge to fix thie problem.
>
> The problem was actually with a Root Complex Integrated Endpoint that
> has no upstream PCIe device:
>
>   00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
>         Subsystem: Intel Corporation Device 0e2a
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>         Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE- FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
>
>> Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
>
> This also
>
> Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5
> Completion erratum")
>
> which added pci_find_pcie_root_port(). Prior to this Relaxed Ordering
> series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
> only applied to non-integrated endpoints, so we didn't trip over the
> bug.
...
> I think structuring the fix as follows is a little more readable:
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc3456dc1..587cd7623ed8 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c

I've integrated all of this feedback and the other Fixes: tag and applied
it to 'net', thanks.
Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> Eric report a oops when booting the system after applying
> the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> ...

> It looks like the pci_find_pcie_root_port() was trying to
> find the Root Port for the PCI device which is the Root
> Port already, it will return NULL and trigger the problem,
> so check the highest_pcie_bridge to fix thie problem.

The problem was actually with a Root Complex Integrated Endpoint that
has no upstream PCIe device:

  00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
        Subsystem: Intel Corporation Device 0e2a
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes

> Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")

This also

Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")

which added pci_find_pcie_root_port(). Prior to this Relaxed Ordering
series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
only applied to non-integrated endpoints, so we didn't trip over the
bug.

> Reported-by: Eric Dumazet
> Signed-off-by: Eric Dumazet
> Signed-off-by: Ding Tianhong
> ---
>  drivers/pci/pci.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..7e2022f 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  		bridge = pci_upstream_bridge(bridge);
>  	}
>
> -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> +	if (highest_pcie_bridge &&
> +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
>  		return NULL;
>
>  	return highest_pcie_bridge;
> --

I think structuring the fix as follows is a little more readable:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc3456dc1..587cd7623ed8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,10 +522,11 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 		bridge = pci_upstream_bridge(bridge);
 	}

-	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
-		return NULL;
+	if (highest_pcie_bridge &&
+	    pci_pcie_type(highest_pcie_bridge) == PCI_EXP_TYPE_ROOT_PORT)
+		return highest_pcie_bridge;

-	return highest_pcie_bridge;
+	return NULL;
 }
 EXPORT_SYMBOL(pci_find_pcie_root_port);
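
The effect of the restructured check can be sketched outside the kernel. This is only an illustrative model under stated assumptions: struct model_dev, enum model_pcie_type, model_find_root_port() and model_check() are hypothetical names standing in for struct pci_dev, pci_pcie_type() and pci_find_pcie_root_port(), and the loop body is modelled on the upstream walk discussed in this thread rather than copied from pci.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
enum model_pcie_type {
	MODEL_ENDPOINT,
	MODEL_ROOT_PORT,
	MODEL_RC_INTEGRATED_ENDPOINT,
};

struct model_dev {
	struct model_dev *upstream;	/* NULL when nothing sits above the device */
	int is_pcie;			/* models pci_is_pcie() */
	enum model_pcie_type type;	/* models pci_pcie_type() */
};

/* Same shape as the restructured fix: succeed only for an existing Root Port. */
struct model_dev *model_find_root_port(struct model_dev *dev)
{
	struct model_dev *bridge, *highest = NULL;

	bridge = dev->upstream;
	while (bridge && bridge->is_pcie) {
		highest = bridge;
		bridge = bridge->upstream;
	}

	if (highest && highest->type == MODEL_ROOT_PORT)
		return highest;

	return NULL;	/* no upstream PCIe bridge, or the top one is not a Root Port */
}

/* Exercises the three topologies that come up in this thread. */
void model_check(void)
{
	struct model_dev port = { NULL, 1, MODEL_ROOT_PORT };
	struct model_dev nic = { &port, 1, MODEL_ENDPOINT };
	struct model_dev rciep = { NULL, 1, MODEL_RC_INTEGRATED_ENDPOINT };

	assert(model_find_root_port(&nic) == &port);	/* endpoint below a Root Port */
	assert(model_find_root_port(&rciep) == NULL);	/* nothing upstream: NULL, no deref */
	assert(model_find_root_port(&port) == NULL);	/* Root Port itself has no upstream */
}
```

In this model the loop never runs for a device with nothing upstream, so the pointer stays NULL and the added NULL test is what prevents the type lookup from dereferencing it. A caller would still need to handle a NULL return.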
[PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
Eric report a oops when booting the system after applying
the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):

[4.241029] BUG: unable to handle kernel NULL pointer dereference at 0050
[4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[4.253011] PGD 0
[4.253011] P4D 0
[4.253011]
[4.258013] Oops: [#1] SMP DEBUG_PAGEALLOC
[4.262015] Modules linked in:
[4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
[4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
[4.295003] RAX: RBX: a2ee36bae000 RCX: 0006
[4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: a2ee36bae000
[4.310013] RBP: a51ec0007b58 R08: 0001 R09:
[4.317001] R10: R11: R12: a51ec0007ad0
[4.324005] R13: a2ee36bae098 R14: 0002 R15: a2ee37204818
[4.331002] FS: () GS:a2ee3fcc() knlGS:
[4.339002] CS: 0010 DS: ES: CR0: 80050033
[4.345001] CR2: 0050 CR3: 00401000f000 CR4: 001406e0
[4.351002] Call Trace:
[4.354012] ? pci_configure_device+0x19f/0x570
[4.359002] ? pci_conf1_read+0xb8/0xf0
[4.363002] ? raw_pci_read+0x23/0x40
[4.366011] ? pci_read+0x2c/0x30
[4.370014] ? pci_read_config_word+0x67/0x70
[4.374012] pci_device_add+0x28/0x230
[4.378012] ? pci_vpd_f0_read+0x50/0x80
[4.382014] pci_scan_single_device+0x96/0xc0
[4.386012] pci_scan_slot+0x79/0xf0
[4.389001] pci_scan_child_bus+0x31/0x180
[4.394014] acpi_pci_root_create+0x1c6/0x240
[4.398013] pci_acpi_scan_root+0x15f/0x1b0
[4.402012] acpi_pci_root_add+0x2e6/0x400
[4.406012] ? acpi_evaluate_integer+0x37/0x60
[4.411002] acpi_bus_attach+0xdf/0x200
[4.415002] acpi_bus_attach+0x6a/0x200
[4.418014] acpi_bus_attach+0x6a/0x200
[4.422013] acpi_bus_scan+0x38/0x70
[4.426011] acpi_scan_init+0x10c/0x271
[4.429001] acpi_init+0x2fa/0x348
[4.433004] ? acpi_sleep_proc_init+0x2d/0x2d
[4.437001] do_one_initcall+0x43/0x169
[4.441001] kernel_init_freeable+0x1d0/0x258
[4.445003] ? rest_init+0xe0/0xe0
[4.449001] kernel_init+0xe/0x150
== cut here =

It looks like the pci_find_pcie_root_port() was trying to
find the Root Port for the PCI device which is the Root
Port already, it will return NULL and trigger the problem,
so check the highest_pcie_bridge to fix thie problem.

Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Reported-by: Eric Dumazet
Signed-off-by: Eric Dumazet
Signed-off-by: Ding Tianhong
---
 drivers/pci/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..7e2022f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 		bridge = pci_upstream_bridge(bridge);
 	}

-	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
+	if (highest_pcie_bridge &&
+	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
 		return NULL;

 	return highest_pcie_bridge;
--
1.8.3.1