Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On 2013/11/20 9:20, Bjorn Helgaas wrote: > On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang wrote: > [bhelgaas: changelog, tag for stable] > Reported-by: David Bulkow > Reported-by: Mika Westerberg > Signed-off-by: Yinghai Lu > Signed-off-by: Bjorn Helgaas > CC: sta...@vger.kernel.org # v2.6.32+ Hi Bjorn, This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed" So this patch needs to back port to 2.6.32+ ? >>> >>> 928bea9 might have made it more visible, but the underlying problem is that >>> we enable the device once in the probe path, and disable it twice in the >>> remove path. That problem exists in 2.6.32.61: >>> >>> pcie_portdrv_probe# .probe() method >>> pcie_port_device_register >>> pci_enable_device <-- enable >>> >>> pcie_portdrv_remove # .remove() method >>> pcie_port_device_remove >>> pci_disable_device<-- disable #1 >>> pci_disable_device <-- disable #2 >> >> During assign unassigned resources, we also enable the port device, >> >> fs_initcall(pcibios_assign_resources); >> pci_assign_unassigned_resources; >> pci_enable_bridges() >> pci_enable_device() >> >> >> So I think before the commit 928bea9 , the pci bridge device enable and >> disable is symmetrical. >> After the commit 928bea9, we only enable bridge once, but still remove twice. > > The port driver should be symmetrical, regardless of what happens > outside it. We have to be able to bind/unbind/bind/unbind > indefinitely. > > Do you think the patch is a problem for current upstream, or are you > just saying it doesn't need to be backported as far as 2.6.32? I > frankly don't care that much if those old kernels pick it up or not. > All I'm saying is that the problem this fixes is present that far > back. > > I'm not really interested in doing a lot more digging about ancient > kernels, unless you think it's going to break something if applied to > them. As you said, this is not a big problem, it's ok for current upstream, and for 3.4 3.5 3.10 stable tree, it seems just a enable and disable symmetry problem. I don't find anything unsafe about this even though when we unbind pcie port driver, the pcie port maybe still enable. I have no objection to backport it to ancient kernels. :) Thanks! Yijing. > > Bjorn > > -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang wrote: [bhelgaas: changelog, tag for stable] Reported-by: David Bulkow Reported-by: Mika Westerberg Signed-off-by: Yinghai Lu Signed-off-by: Bjorn Helgaas CC: sta...@vger.kernel.org # v2.6.32+ >>> >>> Hi Bjorn, >>>This issue in X86 seems to be introduced after commit 928bea9 "PCI: >>> Delay enabling bridges until they're needed" >>> So this patch needs to back port to 2.6.32+ ? >> >> 928bea9 might have made it more visible, but the underlying problem is that >> we enable the device once in the probe path, and disable it twice in the >> remove path. That problem exists in 2.6.32.61: >> >> pcie_portdrv_probe# .probe() method >> pcie_port_device_register >> pci_enable_device <-- enable >> >> pcie_portdrv_remove # .remove() method >> pcie_port_device_remove >> pci_disable_device<-- disable #1 >> pci_disable_device <-- disable #2 > > During assign unassigned resources, we also enable the port device, > > fs_initcall(pcibios_assign_resources); > pci_assign_unassigned_resources; > pci_enable_bridges() > pci_enable_device() > > > So I think before the commit 928bea9 , the pci bridge device enable and > disable is symmetrical. > After the commit 928bea9, we only enable bridge once, but still remove twice. The port driver should be symmetrical, regardless of what happens outside it. We have to be able to bind/unbind/bind/unbind indefinitely. Do you think the patch is a problem for current upstream, or are you just saying it doesn't need to be backported as far as 2.6.32? I frankly don't care that much if those old kernels pick it up or not. All I'm saying is that the problem this fixes is present that far back. I'm not really interested in doing a lot more digging about ancient kernels, unless you think it's going to break something if applied to them. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
>>> [bhelgaas: changelog, tag for stable] >>> Reported-by: David Bulkow >>> Reported-by: Mika Westerberg >>> Signed-off-by: Yinghai Lu >>> Signed-off-by: Bjorn Helgaas >>> CC: sta...@vger.kernel.org # v2.6.32+ >> >> Hi Bjorn, >>This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay >> enabling bridges until they're needed" >> So this patch needs to back port to 2.6.32+ ? > > 928bea9 might have made it more visible, but the underlying problem is that > we enable the device once in the probe path, and disable it twice in the > remove path. That problem exists in 2.6.32.61: > > pcie_portdrv_probe# .probe() method > pcie_port_device_register > pci_enable_device <-- enable > > pcie_portdrv_remove # .remove() method > pcie_port_device_remove > pci_disable_device<-- disable #1 > pci_disable_device <-- disable #2 During assign unassigned resources, we also enable the port device, fs_initcall(pcibios_assign_resources); pci_assign_unassigned_resources; pci_enable_bridges() pci_enable_device() So I think before the commit 928bea9 , the pci bridge device enable and disable is symmetrical. After the commit 928bea9, we only enable bridge once, but still remove twice. Thanks! Yijing. -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Nov 19, 2013 at 09:54:58AM +0800, Yijing Wang wrote: > > The pcie_portdrv .probe() method calls pci_enable_device() once, in > > pcie_port_device_register(), but the .remove() method calls > > pci_disable_device() twice, in pcie_port_device_remove() and in > > pcie_portdrv_remove(). > > > > That causes a "disabling already-disabled device" warning when removing a > > PCIe port device. This happens all the time when removing Thunderbolt > > devices, but is also easy to reproduce with, e.g., > > "echo :00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind" > > > > This patch removes the disable from pcie_portdrv_remove(). > > > > [bhelgaas: changelog, tag for stable] > > Reported-by: David Bulkow > > Reported-by: Mika Westerberg > > Signed-off-by: Yinghai Lu > > Signed-off-by: Bjorn Helgaas > > CC: sta...@vger.kernel.org # v2.6.32+ > > Hi Bjorn, >This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay > enabling bridges until they're needed" > So this patch needs to back port to 2.6.32+ ? 928bea9 might have made it more visible, but the underlying problem is that we enable the device once in the probe path, and disable it twice in the remove path. That problem exists in 2.6.32.61: pcie_portdrv_probe# .probe() method pcie_port_device_register pci_enable_device <-- enable pcie_portdrv_remove # .remove() method pcie_port_device_remove pci_disable_device <-- disable #1 pci_disable_device <-- disable #2 Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Mon, Nov 18, 2013 at 06:33:43PM -0700, Bjorn Helgaas wrote: > I fixed the changelog (the extra disable was actually added by d899871936, > not by dc5351784e) and put the patch below in my for-linus branch. I'll > ask Linus to pull it later this week. > > Sorry for the delay, and thanks for the reminder. Thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Mon, Nov 18, 2013 at 06:33:43PM -0700, Bjorn Helgaas wrote: I fixed the changelog (the extra disable was actually added by d899871936, not by dc5351784e) and put the patch below in my for-linus branch. I'll ask Linus to pull it later this week. Sorry for the delay, and thanks for the reminder. Thanks! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Nov 19, 2013 at 09:54:58AM +0800, Yijing Wang wrote: The pcie_portdrv .probe() method calls pci_enable_device() once, in pcie_port_device_register(), but the .remove() method calls pci_disable_device() twice, in pcie_port_device_remove() and in pcie_portdrv_remove(). That causes a disabling already-disabled device warning when removing a PCIe port device. This happens all the time when removing Thunderbolt devices, but is also easy to reproduce with, e.g., echo :00:1c.3 /sys/bus/pci/drivers/pcieport/unbind This patch removes the disable from pcie_portdrv_remove(). [bhelgaas: changelog, tag for stable] Reported-by: David Bulkow david.bul...@stratus.com Reported-by: Mika Westerberg mika.westerb...@linux.intel.com Signed-off-by: Yinghai Lu ying...@kernel.org Signed-off-by: Bjorn Helgaas bhelg...@google.com CC: sta...@vger.kernel.org # v2.6.32+ Hi Bjorn, This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay enabling bridges until they're needed So this patch needs to back port to 2.6.32+ ? 928bea9 might have made it more visible, but the underlying problem is that we enable the device once in the probe path, and disable it twice in the remove path. That problem exists in 2.6.32.61: pcie_portdrv_probe# .probe() method pcie_port_device_register pci_enable_device -- enable pcie_portdrv_remove # .remove() method pcie_port_device_remove pci_disable_device -- disable #1 pci_disable_device -- disable #2 Bjorn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
[bhelgaas: changelog, tag for stable] Reported-by: David Bulkow david.bul...@stratus.com Reported-by: Mika Westerberg mika.westerb...@linux.intel.com Signed-off-by: Yinghai Lu ying...@kernel.org Signed-off-by: Bjorn Helgaas bhelg...@google.com CC: sta...@vger.kernel.org # v2.6.32+ Hi Bjorn, This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay enabling bridges until they're needed So this patch needs to back port to 2.6.32+ ? 928bea9 might have made it more visible, but the underlying problem is that we enable the device once in the probe path, and disable it twice in the remove path. That problem exists in 2.6.32.61: pcie_portdrv_probe# .probe() method pcie_port_device_register pci_enable_device -- enable pcie_portdrv_remove # .remove() method pcie_port_device_remove pci_disable_device-- disable #1 pci_disable_device -- disable #2 During assign unassigned resources, we also enable the port device, fs_initcall(pcibios_assign_resources); pci_assign_unassigned_resources; pci_enable_bridges() pci_enable_device() So I think before the commit 928bea9 , the pci bridge device enable and disable is symmetrical. After the commit 928bea9, we only enable bridge once, but still remove twice. Thanks! Yijing. -- Thanks! Yijing -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang wangyij...@huawei.com wrote: [bhelgaas: changelog, tag for stable] Reported-by: David Bulkow david.bul...@stratus.com Reported-by: Mika Westerberg mika.westerb...@linux.intel.com Signed-off-by: Yinghai Lu ying...@kernel.org Signed-off-by: Bjorn Helgaas bhelg...@google.com CC: sta...@vger.kernel.org # v2.6.32+ Hi Bjorn, This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay enabling bridges until they're needed So this patch needs to back port to 2.6.32+ ? 928bea9 might have made it more visible, but the underlying problem is that we enable the device once in the probe path, and disable it twice in the remove path. That problem exists in 2.6.32.61: pcie_portdrv_probe# .probe() method pcie_port_device_register pci_enable_device -- enable pcie_portdrv_remove # .remove() method pcie_port_device_remove pci_disable_device-- disable #1 pci_disable_device -- disable #2 During assign unassigned resources, we also enable the port device, fs_initcall(pcibios_assign_resources); pci_assign_unassigned_resources; pci_enable_bridges() pci_enable_device() So I think before the commit 928bea9 , the pci bridge device enable and disable is symmetrical. After the commit 928bea9, we only enable bridge once, but still remove twice. The port driver should be symmetrical, regardless of what happens outside it. We have to be able to bind/unbind/bind/unbind indefinitely. Do you think the patch is a problem for current upstream, or are you just saying it doesn't need to be backported as far as 2.6.32? I frankly don't care that much if those old kernels pick it up or not. All I'm saying is that the problem this fixes is present that far back. I'm not really interested in doing a lot more digging about ancient kernels, unless you think it's going to break something if applied to them. Bjorn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On 2013/11/20 9:20, Bjorn Helgaas wrote: On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang wangyij...@huawei.com wrote: [bhelgaas: changelog, tag for stable] Reported-by: David Bulkow david.bul...@stratus.com Reported-by: Mika Westerberg mika.westerb...@linux.intel.com Signed-off-by: Yinghai Lu ying...@kernel.org Signed-off-by: Bjorn Helgaas bhelg...@google.com CC: sta...@vger.kernel.org # v2.6.32+ Hi Bjorn, This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay enabling bridges until they're needed So this patch needs to back port to 2.6.32+ ? 928bea9 might have made it more visible, but the underlying problem is that we enable the device once in the probe path, and disable it twice in the remove path. That problem exists in 2.6.32.61: pcie_portdrv_probe# .probe() method pcie_port_device_register pci_enable_device -- enable pcie_portdrv_remove # .remove() method pcie_port_device_remove pci_disable_device-- disable #1 pci_disable_device -- disable #2 During assign unassigned resources, we also enable the port device, fs_initcall(pcibios_assign_resources); pci_assign_unassigned_resources; pci_enable_bridges() pci_enable_device() So I think before the commit 928bea9 , the pci bridge device enable and disable is symmetrical. After the commit 928bea9, we only enable bridge once, but still remove twice. The port driver should be symmetrical, regardless of what happens outside it. We have to be able to bind/unbind/bind/unbind indefinitely. Do you think the patch is a problem for current upstream, or are you just saying it doesn't need to be backported as far as 2.6.32? I frankly don't care that much if those old kernels pick it up or not. All I'm saying is that the problem this fixes is present that far back. I'm not really interested in doing a lot more digging about ancient kernels, unless you think it's going to break something if applied to them. As you said, this is not a big problem, it's ok for current upstream, and for 3.4 3.5 3.10 stable tree, it seems just a enable and disable symmetry problem. I don't find anything unsafe about this even though when we unbind pcie port driver, the pcie port maybe still enable. I have no objection to backport it to ancient kernels. :) Thanks! Yijing. Bjorn -- Thanks! Yijing -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
> The pcie_portdrv .probe() method calls pci_enable_device() once, in > pcie_port_device_register(), but the .remove() method calls > pci_disable_device() twice, in pcie_port_device_remove() and in > pcie_portdrv_remove(). > > That causes a "disabling already-disabled device" warning when removing a > PCIe port device. This happens all the time when removing Thunderbolt > devices, but is also easy to reproduce with, e.g., > "echo :00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind" > > This patch removes the disable from pcie_portdrv_remove(). > > [bhelgaas: changelog, tag for stable] > Reported-by: David Bulkow > Reported-by: Mika Westerberg > Signed-off-by: Yinghai Lu > Signed-off-by: Bjorn Helgaas > CC: sta...@vger.kernel.org# v2.6.32+ Hi Bjorn, This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed" So this patch needs to back port to 2.6.32+ ? > --- > drivers/pci/pcie/portdrv_pci.c |1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c > index cd1e57e51aa7..0d8fdc48e642 100644 > --- a/drivers/pci/pcie/portdrv_pci.c > +++ b/drivers/pci/pcie/portdrv_pci.c > @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev, > static void pcie_portdrv_remove(struct pci_dev *dev) > { > pcie_port_device_remove(dev); > - pci_disable_device(dev); > } > > static int error_detected_iter(struct device *device, void *data) > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > . > -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Fri, Nov 15, 2013 at 01:52:35PM +0200, Mika Westerberg wrote: > On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote: > > On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu wrote: > > > On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas > > > wrote: > > >> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever > > >> wrote: > > >>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas > > >>> wrote: > > On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: > > > On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: > > > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever > > > > wrote: > > > > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro > > > > > Linux > > > > > crashes a few seconds later. Using > > > > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove > > > > > to remove a bridge two levels above the device triggers the fault > > > > > immediately: > > > > We save a pci_dev pointer in the pci_pme_list, which of course has a > > longer lifetime than the pci_dev itself, but we don't acquire a > > reference > > on it, so I suspect the pci_dev got released before we got around to > > doing the pci_pme_list_scan(). > > > > Andreas, can you try the patch below? It's against v3.12-rc2, but it > > should apply to v3.11, too. > > >>> > > >>> I have tested your patch against 3.11 where it solves the problem. > > >>> Thanks! > > >>> > > >>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only > > >>> get the following warning (and no crash): > > >>> > > >>> tg3 :0a:00.0: PME# disabled > > >>> pcieport :09:00.0: PME# disabled > > >>> pciehp :09:00.0:pcie24: unloading service driver pciehp > > >>> pci_bus :0a: dev 00, dec refcount to 0 > > >>> pci_bus :0a: dev 00, released physical slot 9 > > >>> [ cut here ] > > >>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 > > >>> pci_disable_device+0x84/0x90() > > >>> Device pcieport > > >>> disabling already-disabled device > > >>> ... > > > > >>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . > > >> > > >> This is "PCI: Delay enabling bridges until they're needed" by Yinghai. > > > > > > that double disabling should be addressed by: > > > > > > https://lkml.org/lkml/2013/4/25/608 > > > > > > [PATCH] PCI: Remove duplicate pci_disable_device for pcie port > > > > I'll look at that patch again. I had some questions about it the > > first time, but perhaps it makes more sense after 928bea9648 has been > > applied. > > Bjorn, > > Are there any plans to apply the above patch? > > I'm seeing that warning on all my TBT test machines: > > [ 122.914180] pcieport :06:05.0: PME# disabled > [ 122.915386] [ cut here ] > [ 122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 > pci_disable_device+0x7c/0x90() > [ 122.917589] Device pcieport > [ 122.917589] disabling already-disabled device I fixed the changelog (the extra disable was actually added by d899871936, not by dc5351784e) and put the patch below in my for-linus branch. I'll ask Linus to pull it later this week. Sorry for the delay, and thanks for the reminder. Bjorn PCI: Remove duplicate pci_disable_device() from pcie_portdrv_remove() From: Yinghai Lu The pcie_portdrv .probe() method calls pci_enable_device() once, in pcie_port_device_register(), but the .remove() method calls pci_disable_device() twice, in pcie_port_device_remove() and in pcie_portdrv_remove(). That causes a "disabling already-disabled device" warning when removing a PCIe port device. This happens all the time when removing Thunderbolt devices, but is also easy to reproduce with, e.g., "echo :00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind" This patch removes the disable from pcie_portdrv_remove(). [bhelgaas: changelog, tag for stable] Reported-by: David Bulkow Reported-by: Mika Westerberg Signed-off-by: Yinghai Lu Signed-off-by: Bjorn Helgaas CC: sta...@vger.kernel.org # v2.6.32+ --- drivers/pci/pcie/portdrv_pci.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index cd1e57e51aa7..0d8fdc48e642 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev, static void pcie_portdrv_remove(struct pci_dev *dev) { pcie_port_device_remove(dev); - pci_disable_device(dev); } static int error_detected_iter(struct device *device, void *data) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Fri, Nov 15, 2013 at 01:52:35PM +0200, Mika Westerberg wrote: On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote: On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu ying...@kernel.org wrote: On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com wrote: On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com wrote: On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. I have tested your patch against 3.11 where it solves the problem. Thanks! Unfortunately I could not reproduce the problem in 3.12-rc5. I only get the following warning (and no crash): tg3 :0a:00.0: PME# disabled pcieport :09:00.0: PME# disabled pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: dev 00, dec refcount to 0 pci_bus :0a: dev 00, released physical slot 9 [ cut here ] WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90() Device pcieport disabling already-disabled device ... Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port I'll look at that patch again. I had some questions about it the first time, but perhaps it makes more sense after 928bea9648 has been applied. Bjorn, Are there any plans to apply the above patch? I'm seeing that warning on all my TBT test machines: [ 122.914180] pcieport :06:05.0: PME# disabled [ 122.915386] [ cut here ] [ 122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 pci_disable_device+0x7c/0x90() [ 122.917589] Device pcieport [ 122.917589] disabling already-disabled device I fixed the changelog (the extra disable was actually added by d899871936, not by dc5351784e) and put the patch below in my for-linus branch. I'll ask Linus to pull it later this week. Sorry for the delay, and thanks for the reminder. Bjorn PCI: Remove duplicate pci_disable_device() from pcie_portdrv_remove() From: Yinghai Lu ying...@kernel.org The pcie_portdrv .probe() method calls pci_enable_device() once, in pcie_port_device_register(), but the .remove() method calls pci_disable_device() twice, in pcie_port_device_remove() and in pcie_portdrv_remove(). That causes a disabling already-disabled device warning when removing a PCIe port device. This happens all the time when removing Thunderbolt devices, but is also easy to reproduce with, e.g., echo :00:1c.3 /sys/bus/pci/drivers/pcieport/unbind This patch removes the disable from pcie_portdrv_remove(). [bhelgaas: changelog, tag for stable] Reported-by: David Bulkow david.bul...@stratus.com Reported-by: Mika Westerberg mika.westerb...@linux.intel.com Signed-off-by: Yinghai Lu ying...@kernel.org Signed-off-by: Bjorn Helgaas bhelg...@google.com CC: sta...@vger.kernel.org # v2.6.32+ --- drivers/pci/pcie/portdrv_pci.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index cd1e57e51aa7..0d8fdc48e642 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev, static void pcie_portdrv_remove(struct pci_dev *dev) { pcie_port_device_remove(dev); - pci_disable_device(dev); } static int error_detected_iter(struct device *device, void *data) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
The pcie_portdrv .probe() method calls pci_enable_device() once, in pcie_port_device_register(), but the .remove() method calls pci_disable_device() twice, in pcie_port_device_remove() and in pcie_portdrv_remove(). That causes a disabling already-disabled device warning when removing a PCIe port device. This happens all the time when removing Thunderbolt devices, but is also easy to reproduce with, e.g., echo :00:1c.3 /sys/bus/pci/drivers/pcieport/unbind This patch removes the disable from pcie_portdrv_remove(). [bhelgaas: changelog, tag for stable] Reported-by: David Bulkow david.bul...@stratus.com Reported-by: Mika Westerberg mika.westerb...@linux.intel.com Signed-off-by: Yinghai Lu ying...@kernel.org Signed-off-by: Bjorn Helgaas bhelg...@google.com CC: sta...@vger.kernel.org# v2.6.32+ Hi Bjorn, This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay enabling bridges until they're needed So this patch needs to back port to 2.6.32+ ? --- drivers/pci/pcie/portdrv_pci.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index cd1e57e51aa7..0d8fdc48e642 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev, static void pcie_portdrv_remove(struct pci_dev *dev) { pcie_port_device_remove(dev); - pci_disable_device(dev); } static int error_detected_iter(struct device *device, void *data) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ . -- Thanks! Yijing -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote: > On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu wrote: > > On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas wrote: > >> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever > >> wrote: > >>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas > >>> wrote: > On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: > > On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: > > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever > > > wrote: > > > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro > > > > Linux > > > > crashes a few seconds later. Using > > > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove > > > > to remove a bridge two levels above the device triggers the fault > > > > immediately: > > We save a pci_dev pointer in the pci_pme_list, which of course has a > longer lifetime than the pci_dev itself, but we don't acquire a reference > on it, so I suspect the pci_dev got released before we got around to > doing the pci_pme_list_scan(). > > Andreas, can you try the patch below? It's against v3.12-rc2, but it > should apply to v3.11, too. > >>> > >>> I have tested your patch against 3.11 where it solves the problem. Thanks! > >>> > >>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only > >>> get the following warning (and no crash): > >>> > >>> tg3 :0a:00.0: PME# disabled > >>> pcieport :09:00.0: PME# disabled > >>> pciehp :09:00.0:pcie24: unloading service driver pciehp > >>> pci_bus :0a: dev 00, dec refcount to 0 > >>> pci_bus :0a: dev 00, released physical slot 9 > >>> [ cut here ] > >>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 > >>> pci_disable_device+0x84/0x90() > >>> Device pcieport > >>> disabling already-disabled device > >>> ... > > >>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . > >> > >> This is "PCI: Delay enabling bridges until they're needed" by Yinghai. > > > > that double disabling should be addressed by: > > > > https://lkml.org/lkml/2013/4/25/608 > > > > [PATCH] PCI: Remove duplicate pci_disable_device for pcie port > > I'll look at that patch again. I had some questions about it the > first time, but perhaps it makes more sense after 928bea9648 has been > applied. Bjorn, Are there any plans to apply the above patch? I'm seeing that warning on all my TBT test machines: [ 122.914180] pcieport :06:05.0: PME# disabled [ 122.915386] [ cut here ] [ 122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 pci_disable_device+0x7c/0x90() [ 122.917589] Device pcieport [ 122.917589] disabling already-disabled device [ 122.918681] Modules linked in: [ 122.920803] CPU: 0 PID: 1060 Comm: kworker/0:2 Not tainted 3.12.0 #193 [ 122.921877] Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013 [ 122.922989] Workqueue: kacpi_hotplug hotplug_event_work [ 122.924097] 0009 88006de81ab0 817ca961 88006de81af8 [ 122.925241] 88006de81ae8 810445c8 88006ea15800 88006ea15800 [ 122.926385] 81c5ac80 88006ea14098 88006eb35c28 88006de81b48 [ 122.927519] Call Trace: [ 122.928626] [] dump_stack+0x45/0x56 [ 122.929757] [] warn_slowpath_common+0x78/0xa0 [ 122.930884] [] warn_slowpath_fmt+0x47/0x50 [ 122.932003] [] ? do_pci_disable_device+0x4d/0x60 [ 122.933116] [] pci_disable_device+0x7c/0x90 [ 122.934235] [] pcie_portdrv_remove+0x15/0x20 [ 122.935345] [] pci_device_remove+0x28/0x60 [ 122.936442] [] __device_release_driver+0x64/0xd0 [ 122.937543] [] device_release_driver+0x1e/0x30 [ 122.938636] [] bus_remove_device+0xf7/0x140 [ 122.939718] [] device_del+0x135/0x1d0 [ 122.940806] [] pci_stop_bus_device+0x94/0xa0 [ 122.941890] [] pci_stop_bus_device+0x3b/0xa0 [ 122.942957] [] pci_stop_and_remove_bus_device+0xd/0x20 [ 122.944004] [] trim_stale_devices+0x62/0xc0 [ 122.945034] [] trim_stale_devices+0xab/0xc0 [ 122.946042] [] trim_stale_devices+0xab/0xc0 [ 122.947034] [] acpiphp_check_bridge+0x7e/0xd0 [ 122.948036] [] hotplug_event+0xf2/0x230 [ 122.949042] [] ? acpi_os_release_object+0x9/0xd [ 122.950054] [] hotplug_event_work+0x22/0x60 [ 122.951067] [] process_one_work+0x17a/0x430 [ 122.952084] [] worker_thread+0x119/0x390 [ 122.953095] [] ? manage_workers.isra.25+0x2a0/0x2a0 [ 122.954107] [] kthread+0xbb/0xc0 [ 122.955115] [] ? kthread_create_on_node+0x110/0x110 [ 122.956136] [] ret_from_fork+0x7c/0xb0 [ 122.957141] [] ? kthread_create_on_node+0x110/0x110 [ 122.958145] ---[ end trace a0dcbb3b178e4755 ]--- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote: On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu ying...@kernel.org wrote: On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com wrote: On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com wrote: On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. I have tested your patch against 3.11 where it solves the problem. Thanks! Unfortunately I could not reproduce the problem in 3.12-rc5. I only get the following warning (and no crash): tg3 :0a:00.0: PME# disabled pcieport :09:00.0: PME# disabled pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: dev 00, dec refcount to 0 pci_bus :0a: dev 00, released physical slot 9 [ cut here ] WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90() Device pcieport disabling already-disabled device ... Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port I'll look at that patch again. I had some questions about it the first time, but perhaps it makes more sense after 928bea9648 has been applied. Bjorn, Are there any plans to apply the above patch? I'm seeing that warning on all my TBT test machines: [ 122.914180] pcieport :06:05.0: PME# disabled [ 122.915386] [ cut here ] [ 122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 pci_disable_device+0x7c/0x90() [ 122.917589] Device pcieport [ 122.917589] disabling already-disabled device [ 122.918681] Modules linked in: [ 122.920803] CPU: 0 PID: 1060 Comm: kworker/0:2 Not tainted 3.12.0 #193 [ 122.921877] Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013 [ 122.922989] Workqueue: kacpi_hotplug hotplug_event_work [ 122.924097] 0009 88006de81ab0 817ca961 88006de81af8 [ 122.925241] 88006de81ae8 810445c8 88006ea15800 88006ea15800 [ 122.926385] 81c5ac80 88006ea14098 88006eb35c28 88006de81b48 [ 122.927519] Call Trace: [ 122.928626] [817ca961] dump_stack+0x45/0x56 [ 122.929757] [810445c8] warn_slowpath_common+0x78/0xa0 [ 122.930884] [81044637] warn_slowpath_fmt+0x47/0x50 [ 122.932003] [812deb3d] ? do_pci_disable_device+0x4d/0x60 [ 122.933116] [812debcc] pci_disable_device+0x7c/0x90 [ 122.934235] [812ebfb5] pcie_portdrv_remove+0x15/0x20 [ 122.935345] [812e0318] pci_device_remove+0x28/0x60 [ 122.936442] [81424f24] __device_release_driver+0x64/0xd0 [ 122.937543] [81424fae] device_release_driver+0x1e/0x30 [ 122.938636] [81424837] bus_remove_device+0xf7/0x140 [ 122.939718] [81421575] device_del+0x135/0x1d0 [ 122.940806] [812db4c4] pci_stop_bus_device+0x94/0xa0 [ 122.941890] [812db46b] pci_stop_bus_device+0x3b/0xa0 [ 122.942957] [812db5cd] pci_stop_and_remove_bus_device+0xd/0x20 [ 122.944004] [812f3992] trim_stale_devices+0x62/0xc0 [ 122.945034] [812f39db] trim_stale_devices+0xab/0xc0 [ 122.946042] [812f39db] trim_stale_devices+0xab/0xc0 [ 122.947034] [812f3dbe] acpiphp_check_bridge+0x7e/0xd0 [ 122.948036] [812f4bf2] hotplug_event+0xf2/0x230 [ 122.949042] [8130dcf3] ? acpi_os_release_object+0x9/0xd [ 122.950054] [812f4d52] hotplug_event_work+0x22/0x60 [ 122.951067] [8105da2a] process_one_work+0x17a/0x430 [ 122.952084] [8105e619] worker_thread+0x119/0x390 [ 122.953095] [8105e500] ? manage_workers.isra.25+0x2a0/0x2a0 [ 122.954107] [810647bb] kthread+0xbb/0xc0 [ 122.955115] [81064700] ? kthread_create_on_node+0x110/0x110 [ 122.956136] [817db3fc] ret_from_fork+0x7c/0xb0 [ 122.957141] [81064700] ?
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Wed, Oct 30, 2013 at 12:57 AM, Yijing Wang wrote: Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . >>> >>> This is "PCI: Delay enabling bridges until they're needed" by Yinghai. >> >> that double disabling should be addressed by: >> >> https://lkml.org/lkml/2013/4/25/608 >> >> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port > > Hi Yinghai and Bjorn, >I found a related issue in the latest Bjorn/pci-next branch. > > Now if we remove the pcie port device in the system, there is a warning > occured. > It seems introduced after commit 928bea9 "PCI: Delay enabling bridges until > they're needed". > > [ 2124.129478] [ cut here ] > [ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 > pci_disable_device+0x90/0xa0() > [ 2124.129492] Device pcieport > [ 2124.129492] disabling already-disabled device yes. that is pcie port driver problem. | 928bea delay bridge enabling, and it's pci_is_enabled checking will prevent | pci bridge get enabled second time, so enable_cnt will be only 1. instead of | 2. if we enable bridge at first and then pcie port driver. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Wed, Oct 30, 2013 at 12:57 AM, Yijing Wang wangyij...@huawei.com wrote: Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port Hi Yinghai and Bjorn, I found a related issue in the latest Bjorn/pci-next branch. Now if we remove the pcie port device in the system, there is a warning occured. It seems introduced after commit 928bea9 PCI: Delay enabling bridges until they're needed. [ 2124.129478] [ cut here ] [ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 pci_disable_device+0x90/0xa0() [ 2124.129492] Device pcieport [ 2124.129492] disabling already-disabled device yes. that is pcie port driver problem. | 928bea delay bridge enabling, and it's pci_is_enabled checking will prevent | pci bridge get enabled second time, so enable_cnt will be only 1. instead of | 2. if we enable bridge at first and then pcie port driver. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
>>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . >> >> This is "PCI: Delay enabling bridges until they're needed" by Yinghai. > > that double disabling should be addressed by: > > https://lkml.org/lkml/2013/4/25/608 > > [PATCH] PCI: Remove duplicate pci_disable_device for pcie port Hi Yinghai and Bjorn, I found a related issue in the latest Bjorn/pci-next branch. Now if we remove the pcie port device in the system, there is a warning occured. It seems introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed". [ 2124.129478] [ cut here ] [ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 pci_disable_device+0x90/0xa0() [ 2124.129492] Device pcieport [ 2124.129492] disabling already-disabled device [ 2124.129494] Modules linked in: binfmt_misc cpufreq_conservative cpufreq_userspace cpufreq_powersave loop bnx2 igb kvm_intel kvm dca i2c_algo_bit ptp pps_core i2c_i801 i7core_edac iTCO_wdt iTCO_vendor_support lpc_ich mfd_core edac_core acpi_cpufreq serio_raw sg button pcspkr microcode autofs4 processor thermal_sys scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh ata_generic ata_piix megaraid_sas [ 2124.129530] CPU: 3 PID: 7 Comm: kworker/u49:0 Not tainted 3.12.0-rc2-2.10-desktop+ #22 [ 2124.129533] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011 [ 2124.129540] Workqueue: sysfsd sysfs_schedule_callback_work [ 2124.129543] 0009 880532cd9bb8 8162f51c 0007 [ 2124.129547] 880532cd9c08 880532cd9bf8 8105060c 0286 [ 2124.129552] 8805329f2000 81c67bc0 8805329f2000 [ 2124.129556] Call Trace: [ 2124.129564] [] dump_stack+0x55/0x86 [ 2124.129572] [] warn_slowpath_common+0x8c/0xc0 [ 2124.129576] [] warn_slowpath_fmt+0x46/0x50 [ 2124.129580] [] pci_disable_device+0x90/0xa0 [ 2124.129587] [] pcie_portdrv_remove+0x1e/0x30 [ 2124.129592] [] pci_device_remove+0x46/0xc0 [ 2124.129598] [] __device_release_driver+0x7f/0xf0 [ 2124.129602] [] device_release_driver+0x2c/0x40 [ 2124.129606] [] bus_remove_device+0x108/0x170 [ 2124.129610] [] device_del+0x130/0x1c0 [ 2124.129614] [] pci_stop_bus_device+0x9c/0xb0 [ 2124.129618] [] pci_stop_and_remove_bus_device+0x16/0x30 [ 2124.129622] [] remove_callback+0x29/0x40 [ 2124.129625] [] sysfs_schedule_callback_work+0x18/0x80 [ 2124.129632] [] process_one_work+0x17d/0x470 [ 2124.129635] [] worker_thread+0x122/0x380 [ 2124.129639] [] ? rescuer_thread+0x330/0x330 [ 2124.129643] [] kthread+0xc0/0xd0 [ 2124.129647] [] ? kthread_create_on_node+0x130/0x130 [ 2124.129653] [] ret_from_fork+0x7c/0xb0 [ 2124.129657] [] ? kthread_create_on_node+0x130/0x130 [ 2124.129660] ---[ end trace 5b020b35ec6adb4c ]--- > > Thanks > > Yinghai > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > . > -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port Hi Yinghai and Bjorn, I found a related issue in the latest Bjorn/pci-next branch. Now if we remove the pcie port device in the system, there is a warning occured. It seems introduced after commit 928bea9 PCI: Delay enabling bridges until they're needed. [ 2124.129478] [ cut here ] [ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 pci_disable_device+0x90/0xa0() [ 2124.129492] Device pcieport [ 2124.129492] disabling already-disabled device [ 2124.129494] Modules linked in: binfmt_misc cpufreq_conservative cpufreq_userspace cpufreq_powersave loop bnx2 igb kvm_intel kvm dca i2c_algo_bit ptp pps_core i2c_i801 i7core_edac iTCO_wdt iTCO_vendor_support lpc_ich mfd_core edac_core acpi_cpufreq serio_raw sg button pcspkr microcode autofs4 processor thermal_sys scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh ata_generic ata_piix megaraid_sas [ 2124.129530] CPU: 3 PID: 7 Comm: kworker/u49:0 Not tainted 3.12.0-rc2-2.10-desktop+ #22 [ 2124.129533] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011 [ 2124.129540] Workqueue: sysfsd sysfs_schedule_callback_work [ 2124.129543] 0009 880532cd9bb8 8162f51c 0007 [ 2124.129547] 880532cd9c08 880532cd9bf8 8105060c 0286 [ 2124.129552] 8805329f2000 81c67bc0 8805329f2000 [ 2124.129556] Call Trace: [ 2124.129564] [8162f51c] dump_stack+0x55/0x86 [ 2124.129572] [8105060c] warn_slowpath_common+0x8c/0xc0 [ 2124.129576] [810506f6] warn_slowpath_fmt+0x46/0x50 [ 2124.129580] [81345ef0] pci_disable_device+0x90/0xa0 [ 2124.129587] [8135524e] pcie_portdrv_remove+0x1e/0x30 [ 2124.129592] [813491d6] pci_device_remove+0x46/0xc0 [ 2124.129598] [814139cf] __device_release_driver+0x7f/0xf0 [ 2124.129602] [81413c5c] device_release_driver+0x2c/0x40 [ 2124.129606] [81413368] bus_remove_device+0x108/0x170 [ 2124.129610] [814102f0] device_del+0x130/0x1c0 [ 2124.129614] [813427dc] pci_stop_bus_device+0x9c/0xb0 [ 2124.129618] [81342986] pci_stop_and_remove_bus_device+0x16/0x30 [ 2124.129622] [8134a599] remove_callback+0x29/0x40 [ 2124.129625] [811fd7d8] sysfs_schedule_callback_work+0x18/0x80 [ 2124.129632] [8106bbbd] process_one_work+0x17d/0x470 [ 2124.129635] [8106c342] worker_thread+0x122/0x380 [ 2124.129639] [8106c220] ? rescuer_thread+0x330/0x330 [ 2124.129643] [81073330] kthread+0xc0/0xd0 [ 2124.129647] [81073270] ? kthread_create_on_node+0x130/0x130 [ 2124.129653] [8163dbec] ret_from_fork+0x7c/0xb0 [ 2124.129657] [81073270] ? kthread_create_on_node+0x130/0x130 [ 2124.129660] ---[ end trace 5b020b35ec6adb4c ]--- Thanks Yinghai -- To unsubscribe from this list: send the line unsubscribe linux-pci in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- Thanks! Yijing -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Wed, Oct 16, 2013 at 2:21 PM, Bjorn Helgaas wrote: > On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: >> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: >> > [+cc Rafael, Mika, Kirill, linux-pci] >> > >> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever >> > wrote: >> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux >> > > crashes a few seconds later. Using >> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove >> > > to remove a bridge two levels above the device triggers the fault >> > > immediately: >> > >> > There have been significant changes in acpiphp related to Thunderbolt >> > since v3.11. >> >> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. >> I'd be surprised if acpiphp makes a difference here. > > Yeah, you're right; I wasn't paying attention. > > We save a pci_dev pointer in the pci_pme_list, which of course has a > longer lifetime than the pci_dev itself, but we don't acquire a reference > on it, so I suspect the pci_dev got released before we got around to > doing the pci_pme_list_scan(). > > Andreas, can you try the patch below? It's against v3.12-rc2, but it > should apply to v3.11, too. > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index ad7fc72..8b0a2f3 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work) > pci_pme_wakeup(pme_dev->dev, NULL); > } else { > list_del(_dev->list); > + pci_dev_put(pme_dev->dev); > kfree(pme_dev); > } > } > @@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) > GFP_KERNEL); > if (!pme_dev) > goto out; > - pme_dev->dev = dev; > + pme_dev->dev = pci_dev_get(dev); > mutex_lock(_pme_list_mutex); > list_add(_dev->list, _pme_list); > if (list_is_singular(_pme_list)) > @@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) > list_for_each_entry(pme_dev, _pme_list, list) { > if (pme_dev->dev == dev) { > list_del(_dev->list); > + pci_dev_put(pme_dev->dev); > kfree(pme_dev); > break; > } The patch above covered up the problem, but is incorrect. The topology is: 08:00.0 PCI bridge to [bus 09-0a] Thunderbolt Upstream Port 09:00.0 PCI bridge to [bus 0a]Thunderbolt Downstream Port 0a:00.0 tg3 NIC and the sequence leading to the crash is (edited for brevity): remove_store(08:00.0) pci_stop_and_remove_bus_device(08:00.0) # Upstream Port pci_stop_bus_device(08:00.0) pci_stop_bus_device(09:00.0)# Downstream Port pci_stop_bus_device(0a:00.0) # tg3 device pci_stop_dev(0a:00.0) pci_pme_active(0a:00.0, false)# remove from pci_pme_list device_del(0a:00.0) device_release_driver tg3_remove_one unregister_netdev dev->netdev_ops->ndo_stop # tg3_close tg3_close pci_wake_from_d3 pci_pme_active(dev, true) # add to pci_pme_list pci_remove_bus_device(08:00.0) pci_remove_bus_device(09:00.0) pci_remove_bus_device(0a:00.0) pci_destroy_dev(0a:00.0) put_device(0a:00.0) # drop last tg3 pci_dev reference ... pci_release_dev # pci_dev release function kfree(0a:00.0) ... pci_pme_list_scan 0a:00.0 still on list => use-after-free The patch above avoids the crash by acquiring a reference when adding to pci_pme_list, so when pci_destroy_dev() drops the reference, it's not the *last* reference, so the pci_dev is not released. The problem is that the reference acquired when we add to pci_pme_list will *never* be dropped, so we leaked the pci_dev. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Wed, Oct 16, 2013 at 2:21 PM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: [+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. Yeah, you're right; I wasn't paying attention. We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ad7fc72..8b0a2f3 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work) pci_pme_wakeup(pme_dev-dev, NULL); } else { list_del(pme_dev-list); + pci_dev_put(pme_dev-dev); kfree(pme_dev); } } @@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) GFP_KERNEL); if (!pme_dev) goto out; - pme_dev-dev = dev; + pme_dev-dev = pci_dev_get(dev); mutex_lock(pci_pme_list_mutex); list_add(pme_dev-list, pci_pme_list); if (list_is_singular(pci_pme_list)) @@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) list_for_each_entry(pme_dev, pci_pme_list, list) { if (pme_dev-dev == dev) { list_del(pme_dev-list); + pci_dev_put(pme_dev-dev); kfree(pme_dev); break; } The patch above covered up the problem, but is incorrect. The topology is: 08:00.0 PCI bridge to [bus 09-0a] Thunderbolt Upstream Port 09:00.0 PCI bridge to [bus 0a]Thunderbolt Downstream Port 0a:00.0 tg3 NIC and the sequence leading to the crash is (edited for brevity): remove_store(08:00.0) pci_stop_and_remove_bus_device(08:00.0) # Upstream Port pci_stop_bus_device(08:00.0) pci_stop_bus_device(09:00.0)# Downstream Port pci_stop_bus_device(0a:00.0) # tg3 device pci_stop_dev(0a:00.0) pci_pme_active(0a:00.0, false)# remove from pci_pme_list device_del(0a:00.0) device_release_driver tg3_remove_one unregister_netdev dev-netdev_ops-ndo_stop # tg3_close tg3_close pci_wake_from_d3 pci_pme_active(dev, true) # add to pci_pme_list pci_remove_bus_device(08:00.0) pci_remove_bus_device(09:00.0) pci_remove_bus_device(0a:00.0) pci_destroy_dev(0a:00.0) put_device(0a:00.0) # drop last tg3 pci_dev reference ... pci_release_dev # pci_dev release function kfree(0a:00.0) ... pci_pme_list_scan 0a:00.0 still on list = use-after-free The patch above avoids the crash by acquiring a reference when adding to pci_pme_list, so when pci_destroy_dev() drops the reference, it's not the *last* reference, so the pci_dev is not released. The problem is that the reference acquired when we add to pci_pme_list will *never* be dropped, so we leaked the pci_dev. Bjorn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
> Sorry, I didn't understand this. Is this supposed to be an > explanation of how 928bea fixes the oops that Andreas saw? If so, can > you be a little more explicit about when the pci_dev got freed and > when pci_pme_list_scan() walked the list and accessed the freed area? I did some more debugging and it seems that 928bea is innocent after all. I added some debugging statements to pci_pme_active. The additional delay seems to make the oops easier to trigger and I can now replicate it up to https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5137a2ee2007d9cbbbeebd14abe08357a079b607 which makes much more sense. Here is what's going on (in 3.11). First of all pci_pme_activate is only ever called with false as the second paramter during boot. Now when I unplug the adapter, the first call is: [] dump_stack+0x54/0x8d [] pci_pme_active+0x30/0x210 [] ? pci_read+0x2c/0x30 (this should be pci_stop_dev imho) [] pci_stop_bus_device+0x4e/0xa0 [] pci_stop_bus_device+0x3b/0xa0 [] pci_stop_bus_device+0x3b/0xa0 [] pci_stop_and_remove_bus_device+0x12/0x20 [] pciehp_unconfigure_device+0xa8/0x1b0 [] pciehp_disable_slot+0x68/0x200 [] pciehp_power_thread+0x83/0xf0 [] process_one_work+0x178/0x470 [] worker_thread+0x121/0x3a0 [] ? manage_workers.isra.21+0x2b0/0x2b0 [] kthread+0xc0/0xd0 [] ? SyS_unshare+0x220/0x280 [] ? kthread_create_on_node+0x120/0x120 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x120/0x120 tg3 :0a:00.0: PME# disabled This is still fine. But then it gets interesting. The next call is: [] dump_stack+0x54/0x8d [] pci_pme_active+0x30/0x210 [] __pci_enable_wake+0x65/0x160 [] pci_wake_from_d3+0x25/0x40 [] tg3_power_down+0x29/0x40 [tg3] [] tg3_close+0x10c/0x1d0 [tg3] [] __dev_close_many+0x85/0xd0 [] dev_close_many+0x8b/0x100 [] rollback_registered_many+0xd8/0x250 [] rollback_registered+0x2d/0x40 [] unregister_netdevice_queue+0x58/0xb0 [] unregister_netdev+0x1c/0x30 [] tg3_remove_one+0x6b/0x120 [tg3] [] pci_device_remove+0x3b/0xb0 [] __device_release_driver+0x7f/0xf0 [] device_release_driver+0x23/0x30 [] bus_remove_device+0xf4/0x170 [] device_del+0x135/0x1d0 [] pci_stop_bus_device+0x94/0xa0 [] pci_stop_bus_device+0x3b/0xa0 [] pci_stop_bus_device+0x3b/0xa0 [] pci_stop_and_remove_bus_device+0x12/0x20 [] pciehp_unconfigure_device+0xa8/0x1b0 [] pciehp_disable_slot+0x68/0x200 [] pciehp_power_thread+0x83/0xf0 [] process_one_work+0x178/0x470 [] worker_thread+0x121/0x3a0 [] ? manage_workers.isra.21+0x2b0/0x2b0 [] kthread+0xc0/0xd0 [] ? SyS_unshare+0x220/0x280 [] ? kthread_create_on_node+0x120/0x120 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x120/0x120 tg3 :0a:00.0: PME# enabled On removal tg3 calls pci_wake_from_d3 to enable/disable wake-on-lan. This then calls pci_pme_activate(dev, true) for a device which is about to be deleted. The linked commit does no longer call pci_wake_from_d3, which "fixes" the problem. Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
Sorry, I didn't understand this. Is this supposed to be an explanation of how 928bea fixes the oops that Andreas saw? If so, can you be a little more explicit about when the pci_dev got freed and when pci_pme_list_scan() walked the list and accessed the freed area? I did some more debugging and it seems that 928bea is innocent after all. I added some debugging statements to pci_pme_active. The additional delay seems to make the oops easier to trigger and I can now replicate it up to https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5137a2ee2007d9cbbbeebd14abe08357a079b607 which makes much more sense. Here is what's going on (in 3.11). First of all pci_pme_activate is only ever called with false as the second paramter during boot. Now when I unplug the adapter, the first call is: [814b1cc7] dump_stack+0x54/0x8d [812ae970] pci_pme_active+0x30/0x210 [813bf2bc] ? pci_read+0x2c/0x30 (this should be pci_stop_dev imho) [812ac8ae] pci_stop_bus_device+0x4e/0xa0 [812ac89b] pci_stop_bus_device+0x3b/0xa0 [812ac89b] pci_stop_bus_device+0x3b/0xa0 [812aca02] pci_stop_and_remove_bus_device+0x12/0x20 [812c4698] pciehp_unconfigure_device+0xa8/0x1b0 [812c3ff8] pciehp_disable_slot+0x68/0x200 [812c4213] pciehp_power_thread+0x83/0xf0 [8107b5b8] process_one_work+0x178/0x470 [8107bf81] worker_thread+0x121/0x3a0 [8107be60] ? manage_workers.isra.21+0x2b0/0x2b0 [81082d80] kthread+0xc0/0xd0 [8106] ? SyS_unshare+0x220/0x280 [81082cc0] ? kthread_create_on_node+0x120/0x120 [814c07ec] ret_from_fork+0x7c/0xb0 [81082cc0] ? kthread_create_on_node+0x120/0x120 tg3 :0a:00.0: PME# disabled This is still fine. But then it gets interesting. The next call is: [814b1cc7] dump_stack+0x54/0x8d [812ae970] pci_pme_active+0x30/0x210 [812aebb5] __pci_enable_wake+0x65/0x160 [812aecd5] pci_wake_from_d3+0x25/0x40 [a0511c29] tg3_power_down+0x29/0x40 [tg3] [a0511d4c] tg3_close+0x10c/0x1d0 [tg3] [813d67b5] __dev_close_many+0x85/0xd0 [813d68cb] dev_close_many+0x8b/0x100 [813d8dd8] rollback_registered_many+0xd8/0x250 [813d8f7d] rollback_registered+0x2d/0x40 [813da828] unregister_netdevice_queue+0x58/0xb0 [813da89c] unregister_netdev+0x1c/0x30 [a050104b] tg3_remove_one+0x6b/0x120 [tg3] [812b1b0b] pci_device_remove+0x3b/0xb0 [81371c1f] __device_release_driver+0x7f/0xf0 [81371cb3] device_release_driver+0x23/0x30 [81371484] bus_remove_device+0xf4/0x170 [8136df45] device_del+0x135/0x1d0 [812ac8f4] pci_stop_bus_device+0x94/0xa0 [812ac89b] pci_stop_bus_device+0x3b/0xa0 [812ac89b] pci_stop_bus_device+0x3b/0xa0 [812aca02] pci_stop_and_remove_bus_device+0x12/0x20 [812c4698] pciehp_unconfigure_device+0xa8/0x1b0 [812c3ff8] pciehp_disable_slot+0x68/0x200 [812c4213] pciehp_power_thread+0x83/0xf0 [8107b5b8] process_one_work+0x178/0x470 [8107bf81] worker_thread+0x121/0x3a0 [8107be60] ? manage_workers.isra.21+0x2b0/0x2b0 [81082d80] kthread+0xc0/0xd0 [8106] ? SyS_unshare+0x220/0x280 [81082cc0] ? kthread_create_on_node+0x120/0x120 [814c07ec] ret_from_fork+0x7c/0xb0 [81082cc0] ? kthread_create_on_node+0x120/0x120 tg3 :0a:00.0: PME# enabled On removal tg3 calls pci_wake_from_d3 to enable/disable wake-on-lan. This then calls pci_pme_activate(dev, true) for a device which is about to be deleted. The linked commit does no longer call pci_wake_from_d3, which fixes the problem. Andreas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 11:13 PM, Yinghai Lu wrote: > On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas wrote: >> > Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is "PCI: Delay enabling bridges until they're needed" by Yinghai. >>> >>> that double disabling should be addressed by: >>> >>> https://lkml.org/lkml/2013/4/25/608 >>> >>> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port >> >> I'll look at that patch again. I had some questions about it the >> first time, but perhaps it makes more sense after 928bea9648 has been >> applied. >> >> Andreas originally reported a GPF oops in pci_pme_list_scan(). I >> posted a refcounting patch, which made the problem go away, but I >> can't explain why, and I don't want to apply it without understanding >> that. Decoding his oops shows this: >> >> 24: 0f 1f 00 nopl (%rax) >> 27: 48 8b 50 10 mov0x10(%rax),%rdx >> 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx <-- trapping instruction >> 2f: 48 85 d2 test %rdx,%rdx >> >> %rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer, >> which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is >> POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev >> that has already been freed. >> >> pci_pme_list_scan() holds pci_pme_list_mutex while it traverses >> pci_pme_list, and the pci_stop_and_remove_bus_device() path removes >> the pci_dev by calling pci_pme_active(), which also holds >> pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can >> see a pci_dev that has already been freed. >> >> If I understand Andreas correctly, 928bea9648 also fixes the crash, >> even without my refcounting change. Can you explain why? > > 928bea will make the dev->enable_cnt increase wrongly, as we have > pci_enable_device for child >pci_enable_bridge for parent > pci_enable_bridge for grandparent >pci_enable_device for grandparent >pci_enable_device for parent >pci_enable_brdige for grandparent > pci_enable_device for grandparent. > ... > > in that case grandprent will be enabled two times, and will enable_cnt will > have > extra increase. > > so later pci_disable_device will not really call do_pci_disable_device > do the really work, as enable_cnt still big. > > solution could be: > let pci_enable_bridge call __pci_enable_device. > and __pci_enable_device will not call pci_enable_bridge. Sorry, I didn't understand this. Is this supposed to be an explanation of how 928bea fixes the oops that Andreas saw? If so, can you be a little more explicit about when the pci_dev got freed and when pci_pme_list_scan() walked the list and accessed the freed area? Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 11:13 PM, Yinghai Lu ying...@kernel.org wrote: On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas bhelg...@google.com wrote: Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port I'll look at that patch again. I had some questions about it the first time, but perhaps it makes more sense after 928bea9648 has been applied. Andreas originally reported a GPF oops in pci_pme_list_scan(). I posted a refcounting patch, which made the problem go away, but I can't explain why, and I don't want to apply it without understanding that. Decoding his oops shows this: 24: 0f 1f 00 nopl (%rax) 27: 48 8b 50 10 mov0x10(%rax),%rdx 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx -- trapping instruction 2f: 48 85 d2 test %rdx,%rdx %rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer, which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev that has already been freed. pci_pme_list_scan() holds pci_pme_list_mutex while it traverses pci_pme_list, and the pci_stop_and_remove_bus_device() path removes the pci_dev by calling pci_pme_active(), which also holds pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can see a pci_dev that has already been freed. If I understand Andreas correctly, 928bea9648 also fixes the crash, even without my refcounting change. Can you explain why? 928bea will make the dev-enable_cnt increase wrongly, as we have pci_enable_device for child pci_enable_bridge for parent pci_enable_bridge for grandparent pci_enable_device for grandparent pci_enable_device for parent pci_enable_brdige for grandparent pci_enable_device for grandparent. ... in that case grandprent will be enabled two times, and will enable_cnt will have extra increase. so later pci_disable_device will not really call do_pci_disable_device do the really work, as enable_cnt still big. solution could be: let pci_enable_bridge call __pci_enable_device. and __pci_enable_device will not call pci_enable_bridge. Sorry, I didn't understand this. Is this supposed to be an explanation of how 928bea fixes the oops that Andreas saw? If so, can you be a little more explicit about when the pci_dev got freed and when pci_pme_list_scan() walked the list and accessed the freed area? Bjorn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 10:13 PM, Yinghai Lu wrote: > On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas wrote: >> > Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is "PCI: Delay enabling bridges until they're needed" by Yinghai. >>> >>> that double disabling should be addressed by: >>> >>> https://lkml.org/lkml/2013/4/25/608 >>> >>> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port >> >> I'll look at that patch again. I had some questions about it the >> first time, but perhaps it makes more sense after 928bea9648 has been >> applied. >> >> Andreas originally reported a GPF oops in pci_pme_list_scan(). I >> posted a refcounting patch, which made the problem go away, but I >> can't explain why, and I don't want to apply it without understanding >> that. Decoding his oops shows this: >> >> 24: 0f 1f 00 nopl (%rax) >> 27: 48 8b 50 10 mov0x10(%rax),%rdx >> 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx <-- trapping instruction >> 2f: 48 85 d2 test %rdx,%rdx >> >> %rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer, >> which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is >> POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev >> that has already been freed. >> >> pci_pme_list_scan() holds pci_pme_list_mutex while it traverses >> pci_pme_list, and the pci_stop_and_remove_bus_device() path removes >> the pci_dev by calling pci_pme_active(), which also holds >> pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can >> see a pci_dev that has already been freed. >> >> If I understand Andreas correctly, 928bea9648 also fixes the crash, >> even without my refcounting change. Can you explain why? > > 928bea will make the dev->enable_cnt increase wrongly, as we have > pci_enable_device for child >pci_enable_bridge for parent > pci_enable_bridge for grandparent >pci_enable_device for grandparent >pci_enable_device for parent >pci_enable_brdige for grandparent > pci_enable_device for grandparent. ===> looks like i read the code > wrongly. This one is skipped as we have pci_is_enabled checking. 928bea delay bridge enabling, and it's pci_is_enabled checking will prevent pci bridge get enabled second time, so enable_cnt will be only 1. instead of 2. if we enable bridge at first and then pcie port driver. Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas wrote: > Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . >>> >>> This is "PCI: Delay enabling bridges until they're needed" by Yinghai. >> >> that double disabling should be addressed by: >> >> https://lkml.org/lkml/2013/4/25/608 >> >> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port > > I'll look at that patch again. I had some questions about it the > first time, but perhaps it makes more sense after 928bea9648 has been > applied. > > Andreas originally reported a GPF oops in pci_pme_list_scan(). I > posted a refcounting patch, which made the problem go away, but I > can't explain why, and I don't want to apply it without understanding > that. Decoding his oops shows this: > > 24: 0f 1f 00 nopl (%rax) > 27: 48 8b 50 10 mov0x10(%rax),%rdx > 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx <-- trapping instruction > 2f: 48 85 d2 test %rdx,%rdx > > %rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer, > which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is > POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev > that has already been freed. > > pci_pme_list_scan() holds pci_pme_list_mutex while it traverses > pci_pme_list, and the pci_stop_and_remove_bus_device() path removes > the pci_dev by calling pci_pme_active(), which also holds > pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can > see a pci_dev that has already been freed. > > If I understand Andreas correctly, 928bea9648 also fixes the crash, > even without my refcounting change. Can you explain why? 928bea will make the dev->enable_cnt increase wrongly, as we have pci_enable_device for child pci_enable_bridge for parent pci_enable_bridge for grandparent pci_enable_device for grandparent pci_enable_device for parent pci_enable_brdige for grandparent pci_enable_device for grandparent. ... in that case grandprent will be enabled two times, and will enable_cnt will have extra increase. so later pci_disable_device will not really call do_pci_disable_device do the really work, as enable_cnt still big. solution could be: let pci_enable_bridge call __pci_enable_device. and __pci_enable_device will not call pci_enable_bridge. Thanks Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu wrote: > On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas wrote: >> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever >> wrote: >>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: > On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever > > wrote: > > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux > > > crashes a few seconds later. Using > > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove > > > to remove a bridge two levels above the device triggers the fault > > > immediately: We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. >>> >>> I have tested your patch against 3.11 where it solves the problem. Thanks! >>> >>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only >>> get the following warning (and no crash): >>> >>> tg3 :0a:00.0: PME# disabled >>> pcieport :09:00.0: PME# disabled >>> pciehp :09:00.0:pcie24: unloading service driver pciehp >>> pci_bus :0a: dev 00, dec refcount to 0 >>> pci_bus :0a: dev 00, released physical slot 9 >>> [ cut here ] >>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 >>> pci_disable_device+0x84/0x90() >>> Device pcieport >>> disabling already-disabled device >>> ... >>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . >> >> This is "PCI: Delay enabling bridges until they're needed" by Yinghai. > > that double disabling should be addressed by: > > https://lkml.org/lkml/2013/4/25/608 > > [PATCH] PCI: Remove duplicate pci_disable_device for pcie port I'll look at that patch again. I had some questions about it the first time, but perhaps it makes more sense after 928bea9648 has been applied. Andreas originally reported a GPF oops in pci_pme_list_scan(). I posted a refcounting patch, which made the problem go away, but I can't explain why, and I don't want to apply it without understanding that. Decoding his oops shows this: 24: 0f 1f 00 nopl (%rax) 27: 48 8b 50 10 mov0x10(%rax),%rdx 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx <-- trapping instruction 2f: 48 85 d2 test %rdx,%rdx %rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer, which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev that has already been freed. pci_pme_list_scan() holds pci_pme_list_mutex while it traverses pci_pme_list, and the pci_stop_and_remove_bus_device() path removes the pci_dev by calling pci_pme_active(), which also holds pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can see a pci_dev that has already been freed. If I understand Andreas correctly, 928bea9648 also fixes the crash, even without my refcounting change. Can you explain why? Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu ying...@kernel.org wrote: On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com wrote: On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com wrote: On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. I have tested your patch against 3.11 where it solves the problem. Thanks! Unfortunately I could not reproduce the problem in 3.12-rc5. I only get the following warning (and no crash): tg3 :0a:00.0: PME# disabled pcieport :09:00.0: PME# disabled pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: dev 00, dec refcount to 0 pci_bus :0a: dev 00, released physical slot 9 [ cut here ] WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90() Device pcieport disabling already-disabled device ... Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port I'll look at that patch again. I had some questions about it the first time, but perhaps it makes more sense after 928bea9648 has been applied. Andreas originally reported a GPF oops in pci_pme_list_scan(). I posted a refcounting patch, which made the problem go away, but I can't explain why, and I don't want to apply it without understanding that. Decoding his oops shows this: 24: 0f 1f 00 nopl (%rax) 27: 48 8b 50 10 mov0x10(%rax),%rdx 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx -- trapping instruction 2f: 48 85 d2 test %rdx,%rdx %rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer, which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev that has already been freed. pci_pme_list_scan() holds pci_pme_list_mutex while it traverses pci_pme_list, and the pci_stop_and_remove_bus_device() path removes the pci_dev by calling pci_pme_active(), which also holds pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can see a pci_dev that has already been freed. If I understand Andreas correctly, 928bea9648 also fixes the crash, even without my refcounting change. Can you explain why? Bjorn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas bhelg...@google.com wrote: Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port I'll look at that patch again. I had some questions about it the first time, but perhaps it makes more sense after 928bea9648 has been applied. Andreas originally reported a GPF oops in pci_pme_list_scan(). I posted a refcounting patch, which made the problem go away, but I can't explain why, and I don't want to apply it without understanding that. Decoding his oops shows this: 24: 0f 1f 00 nopl (%rax) 27: 48 8b 50 10 mov0x10(%rax),%rdx 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx -- trapping instruction 2f: 48 85 d2 test %rdx,%rdx %rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer, which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev that has already been freed. pci_pme_list_scan() holds pci_pme_list_mutex while it traverses pci_pme_list, and the pci_stop_and_remove_bus_device() path removes the pci_dev by calling pci_pme_active(), which also holds pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can see a pci_dev that has already been freed. If I understand Andreas correctly, 928bea9648 also fixes the crash, even without my refcounting change. Can you explain why? 928bea will make the dev-enable_cnt increase wrongly, as we have pci_enable_device for child pci_enable_bridge for parent pci_enable_bridge for grandparent pci_enable_device for grandparent pci_enable_device for parent pci_enable_brdige for grandparent pci_enable_device for grandparent. ... in that case grandprent will be enabled two times, and will enable_cnt will have extra increase. so later pci_disable_device will not really call do_pci_disable_device do the really work, as enable_cnt still big. solution could be: let pci_enable_bridge call __pci_enable_device. and __pci_enable_device will not call pci_enable_bridge. Thanks Yinghai -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 24, 2013 at 10:13 PM, Yinghai Lu ying...@kernel.org wrote: On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas bhelg...@google.com wrote: Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port I'll look at that patch again. I had some questions about it the first time, but perhaps it makes more sense after 928bea9648 has been applied. Andreas originally reported a GPF oops in pci_pme_list_scan(). I posted a refcounting patch, which made the problem go away, but I can't explain why, and I don't want to apply it without understanding that. Decoding his oops shows this: 24: 0f 1f 00 nopl (%rax) 27: 48 8b 50 10 mov0x10(%rax),%rdx 2b:* 48 8b 52 38 mov0x38(%rdx),%rdx -- trapping instruction 2f: 48 85 d2 test %rdx,%rdx %rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer, which we put in %rdx. The oops says %rdx = 6b6b6b6b6b6b6b6b, which is POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev that has already been freed. pci_pme_list_scan() holds pci_pme_list_mutex while it traverses pci_pme_list, and the pci_stop_and_remove_bus_device() path removes the pci_dev by calling pci_pme_active(), which also holds pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can see a pci_dev that has already been freed. If I understand Andreas correctly, 928bea9648 also fixes the crash, even without my refcounting change. Can you explain why? 928bea will make the dev-enable_cnt increase wrongly, as we have pci_enable_device for child pci_enable_bridge for parent pci_enable_bridge for grandparent pci_enable_device for grandparent pci_enable_device for parent pci_enable_brdige for grandparent pci_enable_device for grandparent. === looks like i read the code wrongly. This one is skipped as we have pci_is_enabled checking. 928bea delay bridge enabling, and it's pci_is_enabled checking will prevent pci bridge get enabled second time, so enable_cnt will be only 1. instead of 2. if we enable bridge at first and then pcie port driver. Yinghai -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas wrote: > [+cc Yinghai] > > On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever > wrote: >> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas wrote: >>> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: > [+cc Rafael, Mika, Kirill, linux-pci] > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever > wrote: > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux > > crashes a few seconds later. Using > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove > > to remove a bridge two levels above the device triggers the fault > > immediately: > > There have been significant changes in acpiphp related to Thunderbolt > since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. >>> >>> Yeah, you're right; I wasn't paying attention. >>> >>> We save a pci_dev pointer in the pci_pme_list, which of course has a >>> longer lifetime than the pci_dev itself, but we don't acquire a reference >>> on it, so I suspect the pci_dev got released before we got around to >>> doing the pci_pme_list_scan(). >>> >>> Andreas, can you try the patch below? It's against v3.12-rc2, but it >>> should apply to v3.11, too. >> >> I have tested your patch against 3.11 where it solves the problem. Thanks! >> >> Unfortunately I could not reproduce the problem in 3.12-rc5. I only >> get the following warning (and no crash): >> >> tg3 :0a:00.0: PME# disabled >> pcieport :09:00.0: PME# disabled >> pciehp :09:00.0:pcie24: unloading service driver pciehp >> pci_bus :0a: dev 00, dec refcount to 0 >> pci_bus :0a: dev 00, released physical slot 9 >> [ cut here ] >> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 >> pci_disable_device+0x84/0x90() >> Device pcieport >> disabling already-disabled device >> Modules linked in: >> btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus >> vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp >> coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel >> videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops >> aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul >> ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media >> applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid >> pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux >> lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc >> snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core >> apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata >> xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core >> usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button >> i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core >> CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30 >> Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS >> MBP101.88Z.00EE.B03.1212211437 12/21/2012 >> Workqueue: sysfsd sysfs_schedule_callback_work >> 0009 88044c021c00 814c4288 88044c021c48 >> 88044c021c38 81061b7d 880458a5c000 8187c5c0 >> 880458a5c000 880458a5b098 88044c021c98 >> Call Trace: >> [] dump_stack+0x54/0x8d >> [] warn_slowpath_common+0x7d/0xa0 >> [] warn_slowpath_fmt+0x4c/0x50 >> [] ? do_pci_disable_device+0x52/0x60 >> [] ? acpi_pci_irq_disable+0x4c/0x8d >> [] pci_disable_device+0x84/0x90 >> [] pcie_portdrv_remove+0x1a/0x20 >> [] pci_device_remove+0x3b/0xb0 >> [] __device_release_driver+0x7f/0xf0 >> [] device_release_driver+0x23/0x30 >> [] bus_remove_device+0x108/0x180 >> [] device_del+0x135/0x1d0 >> [] pci_stop_bus_device+0x94/0xa0 >> [] pci_stop_bus_device+0x3b/0xa0 >> [] pci_stop_and_remove_bus_device+0x12/0x20 >> [] remove_callback+0x25/0x40 >> [] sysfs_schedule_callback_work+0x14/0x80 >> [] process_one_work+0x178/0x470 >> [] worker_thread+0x121/0x3a0 >> [] ? manage_workers.isra.21+0x2b0/0x2b0 >> [] kthread+0xc0/0xd0 >> [] ? kthread_create_on_node+0x120/0x120 >> [] ret_from_fork+0x7c/0xb0 >> [] ? kthread_create_on_node+0x120/0x120 >> ---[ end trace b39a15fa94fbb2a2 ]--- >> >> >> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . > > This is "PCI: Delay enabling bridges until they're needed" by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port Thanks Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever wrote: > On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas wrote: >> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: >>> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: >>> > [+cc Rafael, Mika, Kirill, linux-pci] >>> > >>> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever >>> > wrote: >>> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux >>> > > crashes a few seconds later. Using >>> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove >>> > > to remove a bridge two levels above the device triggers the fault >>> > > immediately: >>> > >>> > There have been significant changes in acpiphp related to Thunderbolt >>> > since v3.11. >>> >>> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. >>> I'd be surprised if acpiphp makes a difference here. >> >> Yeah, you're right; I wasn't paying attention. >> >> We save a pci_dev pointer in the pci_pme_list, which of course has a >> longer lifetime than the pci_dev itself, but we don't acquire a reference >> on it, so I suspect the pci_dev got released before we got around to >> doing the pci_pme_list_scan(). >> >> Andreas, can you try the patch below? It's against v3.12-rc2, but it >> should apply to v3.11, too. > > I have tested your patch against 3.11 where it solves the problem. Thanks! Hi Andreas, sorry for the delay here. I'm still trying to understand exactly why my patch fixes the problem, since I don't see a relevant refcounting change between v3.11 and v3.12-rc5. And I don't actually see the hole yet from inspection. It seems like we should be safe even without my patch. But maybe it's a case of releasing the pci_bus before releasing a pci_dev on the bus. I thought we recently fixed a hole there, but maybe not. I'll look more carefully at that path. Can I trouble you to collect a complete dmesg log from v3.11 without my patch? Maybe if I stare long enough at that and the lspci you supplied, I can figure out what's going on. If you were really gung-ho, you could add instrumentation to print out the pci_dev and pci_bus pointers as we enumerate them. My guess is that we'd see one of those pointers in the GPF register dump. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com wrote: On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: [+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. Yeah, you're right; I wasn't paying attention. We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. I have tested your patch against 3.11 where it solves the problem. Thanks! Hi Andreas, sorry for the delay here. I'm still trying to understand exactly why my patch fixes the problem, since I don't see a relevant refcounting change between v3.11 and v3.12-rc5. And I don't actually see the hole yet from inspection. It seems like we should be safe even without my patch. But maybe it's a case of releasing the pci_bus before releasing a pci_dev on the bus. I thought we recently fixed a hole there, but maybe not. I'll look more carefully at that path. Can I trouble you to collect a complete dmesg log from v3.11 without my patch? Maybe if I stare long enough at that and the lspci you supplied, I can figure out what's going on. If you were really gung-ho, you could add instrumentation to print out the pci_dev and pci_bus pointers as we enumerate them. My guess is that we'd see one of those pointers in the GPF register dump. Bjorn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com wrote: [+cc Yinghai] On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com wrote: On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: [+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. Yeah, you're right; I wasn't paying attention. We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. I have tested your patch against 3.11 where it solves the problem. Thanks! Unfortunately I could not reproduce the problem in 3.12-rc5. I only get the following warning (and no crash): tg3 :0a:00.0: PME# disabled pcieport :09:00.0: PME# disabled pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: dev 00, dec refcount to 0 pci_bus :0a: dev 00, released physical slot 9 [ cut here ] WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90() Device pcieport disabling already-disabled device Modules linked in: btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30 Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS MBP101.88Z.00EE.B03.1212211437 12/21/2012 Workqueue: sysfsd sysfs_schedule_callback_work 0009 88044c021c00 814c4288 88044c021c48 88044c021c38 81061b7d 880458a5c000 8187c5c0 880458a5c000 880458a5b098 88044c021c98 Call Trace: [814c4288] dump_stack+0x54/0x8d [81061b7d] warn_slowpath_common+0x7d/0xa0 [81061bec] warn_slowpath_fmt+0x4c/0x50 [812bdd92] ? do_pci_disable_device+0x52/0x60 [813097f3] ? acpi_pci_irq_disable+0x4c/0x8d [812bde24] pci_disable_device+0x84/0x90 [812cc62a] pcie_portdrv_remove+0x1a/0x20 [812bfcdb] pci_device_remove+0x3b/0xb0 [81381caf] __device_release_driver+0x7f/0xf0 [81381d43] device_release_driver+0x23/0x30 [813814d8] bus_remove_device+0x108/0x180 [8137de75] device_del+0x135/0x1d0 [812ba394] pci_stop_bus_device+0x94/0xa0 [812ba33b] pci_stop_bus_device+0x3b/0xa0 [812ba4a2] pci_stop_and_remove_bus_device+0x12/0x20 [812c15c5] remove_callback+0x25/0x40 [81212ad4] sysfs_schedule_callback_work+0x14/0x80 [8107c9e8] process_one_work+0x178/0x470 [8107d3b1] worker_thread+0x121/0x3a0 [8107d290] ? manage_workers.isra.21+0x2b0/0x2b0 [810840f0] kthread+0xc0/0xd0 [81084030] ? kthread_create_on_node+0x120/0x120 [814d2dfc] ret_from_fork+0x7c/0xb0 [81084030] ? kthread_create_on_node+0x120/0x120 ---[ end trace b39a15fa94fbb2a2 ]--- Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. that double disabling should be addressed by: https://lkml.org/lkml/2013/4/25/608 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port Thanks Yinghai -- To unsubscribe from this list: send the line unsubscribe
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
[+cc Yinghai] On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever wrote: > On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas wrote: >> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: >>> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: >>> > [+cc Rafael, Mika, Kirill, linux-pci] >>> > >>> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever >>> > wrote: >>> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux >>> > > crashes a few seconds later. Using >>> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove >>> > > to remove a bridge two levels above the device triggers the fault >>> > > immediately: >>> > >>> > There have been significant changes in acpiphp related to Thunderbolt >>> > since v3.11. >>> >>> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. >>> I'd be surprised if acpiphp makes a difference here. >> >> Yeah, you're right; I wasn't paying attention. >> >> We save a pci_dev pointer in the pci_pme_list, which of course has a >> longer lifetime than the pci_dev itself, but we don't acquire a reference >> on it, so I suspect the pci_dev got released before we got around to >> doing the pci_pme_list_scan(). >> >> Andreas, can you try the patch below? It's against v3.12-rc2, but it >> should apply to v3.11, too. > > I have tested your patch against 3.11 where it solves the problem. Thanks! > > Unfortunately I could not reproduce the problem in 3.12-rc5. I only > get the following warning (and no crash): > > tg3 :0a:00.0: PME# disabled > pcieport :09:00.0: PME# disabled > pciehp :09:00.0:pcie24: unloading service driver pciehp > pci_bus :0a: dev 00, dec refcount to 0 > pci_bus :0a: dev 00, released physical slot 9 > [ cut here ] > WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 > pci_disable_device+0x84/0x90() > Device pcieport > disabling already-disabled device > Modules linked in: > btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus > vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp > coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel > videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops > aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul > ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media > applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid > pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux > lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc > snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core > apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata > xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core > usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button > i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core > CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30 > Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS > MBP101.88Z.00EE.B03.1212211437 12/21/2012 > Workqueue: sysfsd sysfs_schedule_callback_work > 0009 88044c021c00 814c4288 88044c021c48 > 88044c021c38 81061b7d 880458a5c000 8187c5c0 > 880458a5c000 880458a5b098 88044c021c98 > Call Trace: > [] dump_stack+0x54/0x8d > [] warn_slowpath_common+0x7d/0xa0 > [] warn_slowpath_fmt+0x4c/0x50 > [] ? do_pci_disable_device+0x52/0x60 > [] ? acpi_pci_irq_disable+0x4c/0x8d > [] pci_disable_device+0x84/0x90 > [] pcie_portdrv_remove+0x1a/0x20 > [] pci_device_remove+0x3b/0xb0 > [] __device_release_driver+0x7f/0xf0 > [] device_release_driver+0x23/0x30 > [] bus_remove_device+0x108/0x180 > [] device_del+0x135/0x1d0 > [] pci_stop_bus_device+0x94/0xa0 > [] pci_stop_bus_device+0x3b/0xa0 > [] pci_stop_and_remove_bus_device+0x12/0x20 > [] remove_callback+0x25/0x40 > [] sysfs_schedule_callback_work+0x14/0x80 > [] process_one_work+0x178/0x470 > [] worker_thread+0x121/0x3a0 > [] ? manage_workers.isra.21+0x2b0/0x2b0 > [] kthread+0xc0/0xd0 > [] ? kthread_create_on_node+0x120/0x120 > [] ret_from_fork+0x7c/0xb0 > [] ? kthread_create_on_node+0x120/0x120 > ---[ end trace b39a15fa94fbb2a2 ]--- > > > Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is "PCI: Delay enabling bridges until they're needed" by Yinghai. Yinghai, please comment. > From this commit on the pci_pme_list_scan crash disappears and the > warning appears. > > Since this commit seems to just mask the problem I went ahead and > tested your patch on 3.12-rc5 as well. It seems to work (not crash) > but the warning is still there. > > The above warning was triggered by removing the 08 bridge via sysfs. > The same warning can be triggered by unplugging the adapter (dmesg > below). The ethernet card is removed immediately. The bridges follow > 15 seconds later together with the warning. The topology is: > 06:03.0 -- 08 -- 09 -- 0a
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
[+cc Yinghai] On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com wrote: On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: [+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. Yeah, you're right; I wasn't paying attention. We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. I have tested your patch against 3.11 where it solves the problem. Thanks! Unfortunately I could not reproduce the problem in 3.12-rc5. I only get the following warning (and no crash): tg3 :0a:00.0: PME# disabled pcieport :09:00.0: PME# disabled pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: dev 00, dec refcount to 0 pci_bus :0a: dev 00, released physical slot 9 [ cut here ] WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90() Device pcieport disabling already-disabled device Modules linked in: btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30 Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS MBP101.88Z.00EE.B03.1212211437 12/21/2012 Workqueue: sysfsd sysfs_schedule_callback_work 0009 88044c021c00 814c4288 88044c021c48 88044c021c38 81061b7d 880458a5c000 8187c5c0 880458a5c000 880458a5b098 88044c021c98 Call Trace: [814c4288] dump_stack+0x54/0x8d [81061b7d] warn_slowpath_common+0x7d/0xa0 [81061bec] warn_slowpath_fmt+0x4c/0x50 [812bdd92] ? do_pci_disable_device+0x52/0x60 [813097f3] ? acpi_pci_irq_disable+0x4c/0x8d [812bde24] pci_disable_device+0x84/0x90 [812cc62a] pcie_portdrv_remove+0x1a/0x20 [812bfcdb] pci_device_remove+0x3b/0xb0 [81381caf] __device_release_driver+0x7f/0xf0 [81381d43] device_release_driver+0x23/0x30 [813814d8] bus_remove_device+0x108/0x180 [8137de75] device_del+0x135/0x1d0 [812ba394] pci_stop_bus_device+0x94/0xa0 [812ba33b] pci_stop_bus_device+0x3b/0xa0 [812ba4a2] pci_stop_and_remove_bus_device+0x12/0x20 [812c15c5] remove_callback+0x25/0x40 [81212ad4] sysfs_schedule_callback_work+0x14/0x80 [8107c9e8] process_one_work+0x178/0x470 [8107d3b1] worker_thread+0x121/0x3a0 [8107d290] ? manage_workers.isra.21+0x2b0/0x2b0 [810840f0] kthread+0xc0/0xd0 [81084030] ? kthread_create_on_node+0x120/0x120 [814d2dfc] ret_from_fork+0x7c/0xb0 [81084030] ? kthread_create_on_node+0x120/0x120 ---[ end trace b39a15fa94fbb2a2 ]--- Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d . This is PCI: Delay enabling bridges until they're needed by Yinghai. Yinghai, please comment. From this commit on the pci_pme_list_scan crash disappears and the warning appears. Since this commit seems to just mask the problem I went ahead and tested your patch on 3.12-rc5 as well. It seems to work (not crash) but the warning is still there. The above warning
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: > On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: > > [+cc Rafael, Mika, Kirill, linux-pci] > > > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever > > wrote: > > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux > > > crashes a few seconds later. Using > > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove > > > to remove a bridge two levels above the device triggers the fault > > > immediately: > > > > There have been significant changes in acpiphp related to Thunderbolt > > since v3.11. > > Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. > I'd be surprised if acpiphp makes a difference here. Yeah, you're right; I wasn't paying attention. We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ad7fc72..8b0a2f3 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work) pci_pme_wakeup(pme_dev->dev, NULL); } else { list_del(_dev->list); + pci_dev_put(pme_dev->dev); kfree(pme_dev); } } @@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) GFP_KERNEL); if (!pme_dev) goto out; - pme_dev->dev = dev; + pme_dev->dev = pci_dev_get(dev); mutex_lock(_pme_list_mutex); list_add(_dev->list, _pme_list); if (list_is_singular(_pme_list)) @@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) list_for_each_entry(pme_dev, _pme_list, list) { if (pme_dev->dev == dev) { list_del(_dev->list); + pci_dev_put(pme_dev->dev); kfree(pme_dev); break; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote: On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: [+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. Yeah, you're right; I wasn't paying attention. We save a pci_dev pointer in the pci_pme_list, which of course has a longer lifetime than the pci_dev itself, but we don't acquire a reference on it, so I suspect the pci_dev got released before we got around to doing the pci_pme_list_scan(). Andreas, can you try the patch below? It's against v3.12-rc2, but it should apply to v3.11, too. diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ad7fc72..8b0a2f3 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work) pci_pme_wakeup(pme_dev-dev, NULL); } else { list_del(pme_dev-list); + pci_dev_put(pme_dev-dev); kfree(pme_dev); } } @@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) GFP_KERNEL); if (!pme_dev) goto out; - pme_dev-dev = dev; + pme_dev-dev = pci_dev_get(dev); mutex_lock(pci_pme_list_mutex); list_add(pme_dev-list, pci_pme_list); if (list_is_singular(pci_pme_list)) @@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) list_for_each_entry(pme_dev, pci_pme_list, list) { if (pme_dev-dev == dev) { list_del(pme_dev-list); + pci_dev_put(pme_dev-dev); kfree(pme_dev); break; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: > [+cc Rafael, Mika, Kirill, linux-pci] > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever > wrote: > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux > > crashes a few seconds later. Using > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove > > to remove a bridge two levels above the device triggers the fault > > immediately: > > There have been significant changes in acpiphp related to Thunderbolt > since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. (Whine whine Intel continuing to refuse to provide documentation for a widely shipped part whine whine) -- Matthew Garrett | mj...@srcf.ucam.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
[+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever wrote: > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux > crashes a few seconds later. Using > echo 1 > /sys/bus/pci/devices/:08:00.0/remove > to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Any chance you can try reproduce this problem on a current kernel, e.g., v3.12-rc5? If it still happens, can you collect a complete dmesg log, acpidump, and "lspci -vv" output, and attach them to a new http://bugzilla.kernel.org report? Since you're doing a remove two levels above the Thunderbolt device, and it looks like pciehp is handling this part, you might be seeing something new, but the info above will still be a good start in looking at it. Bjorn > pciehp :09:00.0:pcie24: unloading service driver pciehp > pci_bus :0a: busn_res: [bus 0a] is released > pci_bus :09: busn_res: [bus 09-0a] is released > general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC > > Workqueue: events pci_pme_list_scan > task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000 > RIP: 0010:[] [] > pci_pme_list_scan+0x3c/0xe0 > RSP: 0018:88044ac63e10 EFLAGS: 00010202 > RAX: 88045601e7b0 RBX: 8187b070 RCX: > RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8 > RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0 > R10: 0001f9e0c287afc0 R11: R12: 880453250ca8 > R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8 > FS: () GS:88046d04() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0 > Stack: > 8187afc0 88044a920e40 88044ac63e68 8107ddd6 > 6d053d00 88046d053d00 88046d053d18 > 88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8 > Call Trace: > [] process_one_work+0x176/0x470 > [] worker_thread+0x11b/0x3a0 > [] ? manage_workers.isra.21+0x2b0/0x2b0 > [] kthread+0xc0/0xd0 > [] ? kthread_create_on_node+0x110/0x110 > [] ret_from_fork+0x7c/0xb0 > [] ? kthread_create_on_node+0x110/0x110 > Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81 > 0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 <48> > 8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7 > RIP [] pci_pme_list_scan+0x3c/0xe0 > RSP > ---[ end trace 3905f90a7dacf7b3 ]--- > > The offending line is: > (gdb) list *(pci_pme_list_scan+0x3c) > 0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551). > 1546if (!list_empty(_pme_list)) { > 1547list_for_each_entry_safe(pme_dev, n, > _pme_list, list) { > 1548if (pme_dev->dev->pme_poll) { > 1549struct pci_dev *bridge; > 1550 > 1551bridge = pme_dev->dev->bus->self; > 1552/* > 1553 * If bridge is in low power state, > the > 1554 * configuration space of > subordinate devices > 1555 * may be not accessible > If I read the disassembly correctly then the deref of bus seems to > cause the oops. > > An almost identical bug was reported (and fixed) some time ago: > http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html > > Andreas > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 > /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: busn_res: [bus 0a] is released pci_bus :09: busn_res: [bus 09-0a] is released general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC Workqueue: events pci_pme_list_scan task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000 RIP: 0010:[] [] pci_pme_list_scan+0x3c/0xe0 RSP: 0018:88044ac63e10 EFLAGS: 00010202 RAX: 88045601e7b0 RBX: 8187b070 RCX: RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8 RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0 R10: 0001f9e0c287afc0 R11: R12: 880453250ca8 R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8 FS: () GS:88046d04() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0 Stack: 8187afc0 88044a920e40 88044ac63e68 8107ddd6 6d053d00 88046d053d00 88046d053d18 88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8 Call Trace: [] process_one_work+0x176/0x470 [] worker_thread+0x11b/0x3a0 [] ? manage_workers.isra.21+0x2b0/0x2b0 [] kthread+0xc0/0xd0 [] ? kthread_create_on_node+0x110/0x110 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x110/0x110 Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81 0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 <48> 8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7 RIP [] pci_pme_list_scan+0x3c/0xe0 RSP ---[ end trace 3905f90a7dacf7b3 ]--- The offending line is: (gdb) list *(pci_pme_list_scan+0x3c) 0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551). 1546if (!list_empty(_pme_list)) { 1547list_for_each_entry_safe(pme_dev, n, _pme_list, list) { 1548if (pme_dev->dev->pme_poll) { 1549struct pci_dev *bridge; 1550 1551bridge = pme_dev->dev->bus->self; 1552/* 1553 * If bridge is in low power state, the 1554 * configuration space of subordinate devices 1555 * may be not accessible If I read the disassembly correctly then the deref of bus seems to cause the oops. An almost identical bug was reported (and fixed) some time ago: http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: busn_res: [bus 0a] is released pci_bus :09: busn_res: [bus 09-0a] is released general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC Workqueue: events pci_pme_list_scan task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000 RIP: 0010:[812bdc8c] [812bdc8c] pci_pme_list_scan+0x3c/0xe0 RSP: 0018:88044ac63e10 EFLAGS: 00010202 RAX: 88045601e7b0 RBX: 8187b070 RCX: RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8 RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0 R10: 0001f9e0c287afc0 R11: R12: 880453250ca8 R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8 FS: () GS:88046d04() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0 Stack: 8187afc0 88044a920e40 88044ac63e68 8107ddd6 6d053d00 88046d053d00 88046d053d18 88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8 Call Trace: [8107ddd6] process_one_work+0x176/0x470 [8107e79b] worker_thread+0x11b/0x3a0 [8107e680] ? manage_workers.isra.21+0x2b0/0x2b0 [810855e0] kthread+0xc0/0xd0 [81085520] ? kthread_create_on_node+0x110/0x110 [814f4c2c] ret_from_fork+0x7c/0xb0 [81085520] ? kthread_create_on_node+0x110/0x110 Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81 0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 48 8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7 RIP [812bdc8c] pci_pme_list_scan+0x3c/0xe0 RSP 88044ac63e10 ---[ end trace 3905f90a7dacf7b3 ]--- The offending line is: (gdb) list *(pci_pme_list_scan+0x3c) 0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551). 1546if (!list_empty(pci_pme_list)) { 1547list_for_each_entry_safe(pme_dev, n, pci_pme_list, list) { 1548if (pme_dev-dev-pme_poll) { 1549struct pci_dev *bridge; 1550 1551bridge = pme_dev-dev-bus-self; 1552/* 1553 * If bridge is in low power state, the 1554 * configuration space of subordinate devices 1555 * may be not accessible If I read the disassembly correctly then the deref of bus seems to cause the oops. An almost identical bug was reported (and fixed) some time ago: http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html Andreas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
[+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Any chance you can try reproduce this problem on a current kernel, e.g., v3.12-rc5? If it still happens, can you collect a complete dmesg log, acpidump, and lspci -vv output, and attach them to a new http://bugzilla.kernel.org report? Since you're doing a remove two levels above the Thunderbolt device, and it looks like pciehp is handling this part, you might be seeing something new, but the info above will still be a good start in looking at it. Bjorn pciehp :09:00.0:pcie24: unloading service driver pciehp pci_bus :0a: busn_res: [bus 0a] is released pci_bus :09: busn_res: [bus 09-0a] is released general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC Workqueue: events pci_pme_list_scan task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000 RIP: 0010:[812bdc8c] [812bdc8c] pci_pme_list_scan+0x3c/0xe0 RSP: 0018:88044ac63e10 EFLAGS: 00010202 RAX: 88045601e7b0 RBX: 8187b070 RCX: RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8 RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0 R10: 0001f9e0c287afc0 R11: R12: 880453250ca8 R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8 FS: () GS:88046d04() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0 Stack: 8187afc0 88044a920e40 88044ac63e68 8107ddd6 6d053d00 88046d053d00 88046d053d18 88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8 Call Trace: [8107ddd6] process_one_work+0x176/0x470 [8107e79b] worker_thread+0x11b/0x3a0 [8107e680] ? manage_workers.isra.21+0x2b0/0x2b0 [810855e0] kthread+0xc0/0xd0 [81085520] ? kthread_create_on_node+0x110/0x110 [814f4c2c] ret_from_fork+0x7c/0xb0 [81085520] ? kthread_create_on_node+0x110/0x110 Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81 0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 48 8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7 RIP [812bdc8c] pci_pme_list_scan+0x3c/0xe0 RSP 88044ac63e10 ---[ end trace 3905f90a7dacf7b3 ]--- The offending line is: (gdb) list *(pci_pme_list_scan+0x3c) 0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551). 1546if (!list_empty(pci_pme_list)) { 1547list_for_each_entry_safe(pme_dev, n, pci_pme_list, list) { 1548if (pme_dev-dev-pme_poll) { 1549struct pci_dev *bridge; 1550 1551bridge = pme_dev-dev-bus-self; 1552/* 1553 * If bridge is in low power state, the 1554 * configuration space of subordinate devices 1555 * may be not accessible If I read the disassembly correctly then the deref of bus seems to cause the oops. An almost identical bug was reported (and fixed) some time ago: http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html Andreas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote: [+cc Rafael, Mika, Kirill, linux-pci] On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever andreas.noe...@gmail.com wrote: When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux crashes a few seconds later. Using echo 1 /sys/bus/pci/devices/:08:00.0/remove to remove a bridge two levels above the device triggers the fault immediately: There have been significant changes in acpiphp related to Thunderbolt since v3.11. Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. I'd be surprised if acpiphp makes a difference here. (Whine whine Intel continuing to refuse to provide documentation for a widely shipped part whine whine) -- Matthew Garrett | mj...@srcf.ucam.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/