Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Yijing Wang
On 2013/11/20 9:20, Bjorn Helgaas wrote:
> On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang  wrote:
> [bhelgaas: changelog, tag for stable]
> Reported-by: David Bulkow 
> Reported-by: Mika Westerberg 
> Signed-off-by: Yinghai Lu 
> Signed-off-by: Bjorn Helgaas 
> CC: sta...@vger.kernel.org  # v2.6.32+

 Hi Bjorn,
This issue in X86 seems to be introduced after commit 928bea9 "PCI: 
 Delay enabling bridges until they're needed"
 So this patch needs to back port to 2.6.32+ ?
>>>
>>> 928bea9 might have made it more visible, but the underlying problem is that
>>> we enable the device once in the probe path, and disable it twice in the
>>> remove path.  That problem exists in 2.6.32.61:
>>>
>>>   pcie_portdrv_probe# .probe() method
>>> pcie_port_device_register
>>>   pci_enable_device <-- enable
>>>
>>>   pcie_portdrv_remove   # .remove() method
>>> pcie_port_device_remove
>>>   pci_disable_device<-- disable #1
>>> pci_disable_device  <-- disable #2
>>
>> During assign unassigned resources, we also enable the port device,
>>
>> fs_initcall(pcibios_assign_resources);
>>   pci_assign_unassigned_resources;
>> pci_enable_bridges()
>>   pci_enable_device()
>>
>>
>> So I think before the commit 928bea9 , the pci bridge device enable and 
>> disable is symmetrical.
>> After the commit 928bea9, we only enable bridge once, but still remove twice.
> 
> The port driver should be symmetrical, regardless of what happens
> outside it.  We have to be able to bind/unbind/bind/unbind
> indefinitely.
> 
> Do you think the patch is a problem for current upstream, or are you
> just saying it doesn't need to be backported as far as 2.6.32?  I
> frankly don't care that much if those old kernels pick it up or not.
> All I'm saying is that the problem this fixes is present that far
> back.
> 
> I'm not really interested in doing a lot more digging about ancient
> kernels, unless you think it's going to break something if applied to
> them.

As you said, this is not a big problem, it's ok for current upstream,
and for 3.4 3.5 3.10 stable tree, it seems just a enable and disable symmetry 
problem.
I don't find anything unsafe about this even though when we unbind pcie port 
driver,
the pcie port maybe still enable. I have no objection to backport it to ancient 
kernels. :)

Thanks!
Yijing.


> 
> Bjorn
> 
> 


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Bjorn Helgaas
On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang  wrote:
 [bhelgaas: changelog, tag for stable]
 Reported-by: David Bulkow 
 Reported-by: Mika Westerberg 
 Signed-off-by: Yinghai Lu 
 Signed-off-by: Bjorn Helgaas 
 CC: sta...@vger.kernel.org  # v2.6.32+
>>>
>>> Hi Bjorn,
>>>This issue in X86 seems to be introduced after commit 928bea9 "PCI: 
>>> Delay enabling bridges until they're needed"
>>> So this patch needs to back port to 2.6.32+ ?
>>
>> 928bea9 might have made it more visible, but the underlying problem is that
>> we enable the device once in the probe path, and disable it twice in the
>> remove path.  That problem exists in 2.6.32.61:
>>
>>   pcie_portdrv_probe# .probe() method
>> pcie_port_device_register
>>   pci_enable_device <-- enable
>>
>>   pcie_portdrv_remove   # .remove() method
>> pcie_port_device_remove
>>   pci_disable_device<-- disable #1
>> pci_disable_device  <-- disable #2
>
> During assign unassigned resources, we also enable the port device,
>
> fs_initcall(pcibios_assign_resources);
>   pci_assign_unassigned_resources;
> pci_enable_bridges()
>   pci_enable_device()
>
>
> So I think before the commit 928bea9 , the pci bridge device enable and 
> disable is symmetrical.
> After the commit 928bea9, we only enable bridge once, but still remove twice.

The port driver should be symmetrical, regardless of what happens
outside it.  We have to be able to bind/unbind/bind/unbind
indefinitely.

Do you think the patch is a problem for current upstream, or are you
just saying it doesn't need to be backported as far as 2.6.32?  I
frankly don't care that much if those old kernels pick it up or not.
All I'm saying is that the problem this fixes is present that far
back.

I'm not really interested in doing a lot more digging about ancient
kernels, unless you think it's going to break something if applied to
them.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Yijing Wang
>>> [bhelgaas: changelog, tag for stable]
>>> Reported-by: David Bulkow 
>>> Reported-by: Mika Westerberg 
>>> Signed-off-by: Yinghai Lu 
>>> Signed-off-by: Bjorn Helgaas 
>>> CC: sta...@vger.kernel.org  # v2.6.32+
>>
>> Hi Bjorn,
>>This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay 
>> enabling bridges until they're needed"
>> So this patch needs to back port to 2.6.32+ ?
> 
> 928bea9 might have made it more visible, but the underlying problem is that
> we enable the device once in the probe path, and disable it twice in the
> remove path.  That problem exists in 2.6.32.61:
> 
>   pcie_portdrv_probe# .probe() method
> pcie_port_device_register
>   pci_enable_device <-- enable
> 
>   pcie_portdrv_remove   # .remove() method
> pcie_port_device_remove
>   pci_disable_device<-- disable #1
> pci_disable_device  <-- disable #2

During assign unassigned resources, we also enable the port device,

fs_initcall(pcibios_assign_resources);
  pci_assign_unassigned_resources;
pci_enable_bridges()
  pci_enable_device()


So I think before the commit 928bea9 , the pci bridge device enable and disable 
is symmetrical.
After the commit 928bea9, we only enable bridge once, but still remove twice.

Thanks!
Yijing.


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Bjorn Helgaas
On Tue, Nov 19, 2013 at 09:54:58AM +0800, Yijing Wang wrote:
> > The pcie_portdrv .probe() method calls pci_enable_device() once, in
> > pcie_port_device_register(), but the .remove() method calls
> > pci_disable_device() twice, in pcie_port_device_remove() and in
> > pcie_portdrv_remove().
> > 
> > That causes a "disabling already-disabled device" warning when removing a
> > PCIe port device.  This happens all the time when removing Thunderbolt
> > devices, but is also easy to reproduce with, e.g.,
> > "echo :00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"
> > 
> > This patch removes the disable from pcie_portdrv_remove().
> > 
> > [bhelgaas: changelog, tag for stable]
> > Reported-by: David Bulkow 
> > Reported-by: Mika Westerberg 
> > Signed-off-by: Yinghai Lu 
> > Signed-off-by: Bjorn Helgaas 
> > CC: sta...@vger.kernel.org  # v2.6.32+
> 
> Hi Bjorn,
>This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay 
> enabling bridges until they're needed"
> So this patch needs to back port to 2.6.32+ ?

928bea9 might have made it more visible, but the underlying problem is that
we enable the device once in the probe path, and disable it twice in the
remove path.  That problem exists in 2.6.32.61:

  pcie_portdrv_probe# .probe() method
    pcie_port_device_register
      pci_enable_device             <-- enable

  pcie_portdrv_remove   # .remove() method
    pcie_port_device_remove
      pci_disable_device            <-- disable #1
    pci_disable_device              <-- disable #2

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Mika Westerberg
On Mon, Nov 18, 2013 at 06:33:43PM -0700, Bjorn Helgaas wrote:
> I fixed the changelog (the extra disable was actually added by d899871936,
> not by dc5351784e) and put the patch below in my for-linus branch.  I'll
> ask Linus to pull it later this week.
> 
> Sorry for the delay, and thanks for the reminder.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Mika Westerberg
On Mon, Nov 18, 2013 at 06:33:43PM -0700, Bjorn Helgaas wrote:
 I fixed the changelog (the extra disable was actually added by d899871936,
 not by dc5351784e) and put the patch below in my for-linus branch.  I'll
 ask Linus to pull it later this week.
 
 Sorry for the delay, and thanks for the reminder.

Thanks!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Bjorn Helgaas
On Tue, Nov 19, 2013 at 09:54:58AM +0800, Yijing Wang wrote:
  The pcie_portdrv .probe() method calls pci_enable_device() once, in
  pcie_port_device_register(), but the .remove() method calls
  pci_disable_device() twice, in pcie_port_device_remove() and in
  pcie_portdrv_remove().
  
  That causes a disabling already-disabled device warning when removing a
  PCIe port device.  This happens all the time when removing Thunderbolt
  devices, but is also easy to reproduce with, e.g.,
  echo :00:1c.3  /sys/bus/pci/drivers/pcieport/unbind
  
  This patch removes the disable from pcie_portdrv_remove().
  
  [bhelgaas: changelog, tag for stable]
  Reported-by: David Bulkow david.bul...@stratus.com
  Reported-by: Mika Westerberg mika.westerb...@linux.intel.com
  Signed-off-by: Yinghai Lu ying...@kernel.org
  Signed-off-by: Bjorn Helgaas bhelg...@google.com
  CC: sta...@vger.kernel.org  # v2.6.32+
 
 Hi Bjorn,
This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay 
 enabling bridges until they're needed
 So this patch needs to back port to 2.6.32+ ?

928bea9 might have made it more visible, but the underlying problem is that
we enable the device once in the probe path, and disable it twice in the
remove path.  That problem exists in 2.6.32.61:

  pcie_portdrv_probe# .probe() method
    pcie_port_device_register
      pci_enable_device             -- enable

  pcie_portdrv_remove   # .remove() method
    pcie_port_device_remove
      pci_disable_device            -- disable #1
    pci_disable_device              -- disable #2

Bjorn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Yijing Wang
 [bhelgaas: changelog, tag for stable]
 Reported-by: David Bulkow david.bul...@stratus.com
 Reported-by: Mika Westerberg mika.westerb...@linux.intel.com
 Signed-off-by: Yinghai Lu ying...@kernel.org
 Signed-off-by: Bjorn Helgaas bhelg...@google.com
 CC: sta...@vger.kernel.org  # v2.6.32+

 Hi Bjorn,
This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay 
 enabling bridges until they're needed
 So this patch needs to back port to 2.6.32+ ?
 
 928bea9 might have made it more visible, but the underlying problem is that
 we enable the device once in the probe path, and disable it twice in the
 remove path.  That problem exists in 2.6.32.61:
 
   pcie_portdrv_probe# .probe() method
 pcie_port_device_register
   pci_enable_device -- enable
 
   pcie_portdrv_remove   # .remove() method
 pcie_port_device_remove
   pci_disable_device-- disable #1
 pci_disable_device  -- disable #2

During assign unassigned resources, we also enable the port device,

fs_initcall(pcibios_assign_resources);
  pci_assign_unassigned_resources;
pci_enable_bridges()
  pci_enable_device()


So I think before the commit 928bea9 , the pci bridge device enable and disable 
is symmetrical.
After the commit 928bea9, we only enable bridge once, but still remove twice.

Thanks!
Yijing.


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Bjorn Helgaas
On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang wangyij...@huawei.com wrote:
 [bhelgaas: changelog, tag for stable]
 Reported-by: David Bulkow david.bul...@stratus.com
 Reported-by: Mika Westerberg mika.westerb...@linux.intel.com
 Signed-off-by: Yinghai Lu ying...@kernel.org
 Signed-off-by: Bjorn Helgaas bhelg...@google.com
 CC: sta...@vger.kernel.org  # v2.6.32+

 Hi Bjorn,
This issue in X86 seems to be introduced after commit 928bea9 PCI: 
 Delay enabling bridges until they're needed
 So this patch needs to back port to 2.6.32+ ?

 928bea9 might have made it more visible, but the underlying problem is that
 we enable the device once in the probe path, and disable it twice in the
 remove path.  That problem exists in 2.6.32.61:

   pcie_portdrv_probe# .probe() method
 pcie_port_device_register
   pci_enable_device -- enable

   pcie_portdrv_remove   # .remove() method
 pcie_port_device_remove
   pci_disable_device-- disable #1
 pci_disable_device  -- disable #2

 During assign unassigned resources, we also enable the port device,

 fs_initcall(pcibios_assign_resources);
   pci_assign_unassigned_resources;
 pci_enable_bridges()
   pci_enable_device()


 So I think before the commit 928bea9 , the pci bridge device enable and 
 disable is symmetrical.
 After the commit 928bea9, we only enable bridge once, but still remove twice.

The port driver should be symmetrical, regardless of what happens
outside it.  We have to be able to bind/unbind/bind/unbind
indefinitely.

Do you think the patch is a problem for current upstream, or are you
just saying it doesn't need to be backported as far as 2.6.32?  I
frankly don't care that much if those old kernels pick it up or not.
All I'm saying is that the problem this fixes is present that far
back.

I'm not really interested in doing a lot more digging about ancient
kernels, unless you think it's going to break something if applied to
them.

Bjorn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-19 Thread Yijing Wang
On 2013/11/20 9:20, Bjorn Helgaas wrote:
 On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang wangyij...@huawei.com wrote:
 [bhelgaas: changelog, tag for stable]
 Reported-by: David Bulkow david.bul...@stratus.com
 Reported-by: Mika Westerberg mika.westerb...@linux.intel.com
 Signed-off-by: Yinghai Lu ying...@kernel.org
 Signed-off-by: Bjorn Helgaas bhelg...@google.com
 CC: sta...@vger.kernel.org  # v2.6.32+

 Hi Bjorn,
This issue in X86 seems to be introduced after commit 928bea9 PCI: 
 Delay enabling bridges until they're needed
 So this patch needs to back port to 2.6.32+ ?

 928bea9 might have made it more visible, but the underlying problem is that
 we enable the device once in the probe path, and disable it twice in the
 remove path.  That problem exists in 2.6.32.61:

   pcie_portdrv_probe# .probe() method
 pcie_port_device_register
   pci_enable_device -- enable

   pcie_portdrv_remove   # .remove() method
 pcie_port_device_remove
   pci_disable_device-- disable #1
 pci_disable_device  -- disable #2

 During assign unassigned resources, we also enable the port device,

 fs_initcall(pcibios_assign_resources);
   pci_assign_unassigned_resources;
 pci_enable_bridges()
   pci_enable_device()


 So I think before the commit 928bea9 , the pci bridge device enable and 
 disable is symmetrical.
 After the commit 928bea9, we only enable bridge once, but still remove twice.
 
 The port driver should be symmetrical, regardless of what happens
 outside it.  We have to be able to bind/unbind/bind/unbind
 indefinitely.
 
 Do you think the patch is a problem for current upstream, or are you
 just saying it doesn't need to be backported as far as 2.6.32?  I
 frankly don't care that much if those old kernels pick it up or not.
 All I'm saying is that the problem this fixes is present that far
 back.
 
 I'm not really interested in doing a lot more digging about ancient
 kernels, unless you think it's going to break something if applied to
 them.

As you said, this is not a big problem, it's ok for current upstream,
and for 3.4 3.5 3.10 stable tree, it seems just a enable and disable symmetry 
problem.
I don't find anything unsafe about this even though when we unbind pcie port 
driver,
the pcie port maybe still enable. I have no objection to backport it to ancient 
kernels. :)

Thanks!
Yijing.


 
 Bjorn
 
 


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-18 Thread Yijing Wang
> The pcie_portdrv .probe() method calls pci_enable_device() once, in
> pcie_port_device_register(), but the .remove() method calls
> pci_disable_device() twice, in pcie_port_device_remove() and in
> pcie_portdrv_remove().
> 
> That causes a "disabling already-disabled device" warning when removing a
> PCIe port device.  This happens all the time when removing Thunderbolt
> devices, but is also easy to reproduce with, e.g.,
> "echo :00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"
> 
> This patch removes the disable from pcie_portdrv_remove().
> 
> [bhelgaas: changelog, tag for stable]
> Reported-by: David Bulkow 
> Reported-by: Mika Westerberg 
> Signed-off-by: Yinghai Lu 
> Signed-off-by: Bjorn Helgaas 
> CC: sta...@vger.kernel.org# v2.6.32+

Hi Bjorn,
   This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay 
enabling bridges until they're needed"
So this patch needs to back port to 2.6.32+ ?

> ---
>  drivers/pci/pcie/portdrv_pci.c |1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index cd1e57e51aa7..0d8fdc48e642 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
>   pcie_port_device_remove(dev);
> - pci_disable_device(dev);
>  }
>  
>  static int error_detected_iter(struct device *device, void *data)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> .
> 


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-18 Thread Bjorn Helgaas
On Fri, Nov 15, 2013 at 01:52:35PM +0200, Mika Westerberg wrote:
> On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote:
> > On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu  wrote:
> > > On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas  
> > > wrote:
> > >> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever 
> > >>  wrote:
> > >>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas  
> > >>> wrote:
> >  On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
> > > On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever 
> > > >  wrote:
> > > > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro 
> > > > > Linux
> > > > > crashes a few seconds later. Using
> > > > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
> > > > > to remove a bridge two levels above the device triggers the fault 
> > > > > immediately:
> > 
> >  We save a pci_dev pointer in the pci_pme_list, which of course has a
> >  longer lifetime than the pci_dev itself, but we don't acquire a 
> >  reference
> >  on it, so I suspect the pci_dev got released before we got around to
> >  doing the pci_pme_list_scan().
> > 
> >  Andreas, can you try the patch below?  It's against v3.12-rc2, but it
> >  should apply to v3.11, too.
> > >>>
> > >>> I have tested your patch against 3.11 where it solves the problem. 
> > >>> Thanks!
> > >>>
> > >>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only
> > >>> get the following warning (and no crash):
> > >>>
> > >>> tg3 :0a:00.0: PME# disabled
> > >>> pcieport :09:00.0: PME# disabled
> > >>> pciehp :09:00.0:pcie24: unloading service driver pciehp
> > >>> pci_bus :0a: dev 00, dec refcount to 0
> > >>> pci_bus :0a: dev 00, released physical slot 9
> > >>> [ cut here ]
> > >>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
> > >>> pci_disable_device+0x84/0x90()
> > >>> Device pcieport
> > >>> disabling already-disabled device
> > >>> ...
> > 
> > >>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
> > >>
> > >> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
> > >
> > > that double disabling should be addressed by:
> > >
> > > https://lkml.org/lkml/2013/4/25/608
> > >
> > > [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
> > 
> > I'll look at that patch again.  I had some questions about it the
> > first time, but perhaps it makes more sense after 928bea9648 has been
> > applied.
> 
> Bjorn,
> 
> Are there any plans to apply the above patch?
> 
> I'm seeing that warning on all my TBT test machines:
> 
> [  122.914180] pcieport :06:05.0: PME# disabled
> [  122.915386] [ cut here ]
> [  122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 
> pci_disable_device+0x7c/0x90()
> [  122.917589] Device pcieport
> [  122.917589] disabling already-disabled device

I fixed the changelog (the extra disable was actually added by d899871936,
not by dc5351784e) and put the patch below in my for-linus branch.  I'll
ask Linus to pull it later this week.

Sorry for the delay, and thanks for the reminder.

Bjorn


PCI: Remove duplicate pci_disable_device() from pcie_portdrv_remove()

From: Yinghai Lu 

The pcie_portdrv .probe() method calls pci_enable_device() once, in
pcie_port_device_register(), but the .remove() method calls
pci_disable_device() twice, in pcie_port_device_remove() and in
pcie_portdrv_remove().

That causes a "disabling already-disabled device" warning when removing a
PCIe port device.  This happens all the time when removing Thunderbolt
devices, but is also easy to reproduce with, e.g.,
"echo :00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"

This patch removes the disable from pcie_portdrv_remove().

[bhelgaas: changelog, tag for stable]
Reported-by: David Bulkow 
Reported-by: Mika Westerberg 
Signed-off-by: Yinghai Lu 
Signed-off-by: Bjorn Helgaas 
CC: sta...@vger.kernel.org  # v2.6.32+
---
 drivers/pci/pcie/portdrv_pci.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index cd1e57e51aa7..0d8fdc48e642 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
pcie_port_device_remove(dev);
-   pci_disable_device(dev);
 }
 
 static int error_detected_iter(struct device *device, void *data)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-18 Thread Bjorn Helgaas
On Fri, Nov 15, 2013 at 01:52:35PM +0200, Mika Westerberg wrote:
 On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote:
  On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu ying...@kernel.org wrote:
   On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com 
   wrote:
   On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever 
   andreas.noe...@gmail.com wrote:
   On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com 
   wrote:
   On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
   On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever 
andreas.noe...@gmail.com wrote:
 When I unplug the Thunderbolt ethernet adapter on my MacBookPro 
 Linux
 crashes a few seconds later. Using
 echo 1  /sys/bus/pci/devices/:08:00.0/remove
 to remove a bridge two levels above the device triggers the fault 
 immediately:
  
   We save a pci_dev pointer in the pci_pme_list, which of course has a
   longer lifetime than the pci_dev itself, but we don't acquire a 
   reference
   on it, so I suspect the pci_dev got released before we got around to
   doing the pci_pme_list_scan().
  
   Andreas, can you try the patch below?  It's against v3.12-rc2, but it
   should apply to v3.11, too.
  
   I have tested your patch against 3.11 where it solves the problem. 
   Thanks!
  
   Unfortunately I could not reproduce the problem in 3.12-rc5. I only
   get the following warning (and no crash):
  
   tg3 :0a:00.0: PME# disabled
   pcieport :09:00.0: PME# disabled
   pciehp :09:00.0:pcie24: unloading service driver pciehp
   pci_bus :0a: dev 00, dec refcount to 0
   pci_bus :0a: dev 00, released physical slot 9
   [ cut here ]
   WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
   pci_disable_device+0x84/0x90()
   Device pcieport
   disabling already-disabled device
   ...
  
   Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
  
   This is PCI: Delay enabling bridges until they're needed by Yinghai.
  
   that double disabling should be addressed by:
  
   https://lkml.org/lkml/2013/4/25/608
  
   [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
  
  I'll look at that patch again.  I had some questions about it the
  first time, but perhaps it makes more sense after 928bea9648 has been
  applied.
 
 Bjorn,
 
 Are there any plans to apply the above patch?
 
 I'm seeing that warning on all my TBT test machines:
 
 [  122.914180] pcieport :06:05.0: PME# disabled
 [  122.915386] [ cut here ]
 [  122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 
 pci_disable_device+0x7c/0x90()
 [  122.917589] Device pcieport
 [  122.917589] disabling already-disabled device

I fixed the changelog (the extra disable was actually added by d899871936,
not by dc5351784e) and put the patch below in my for-linus branch.  I'll
ask Linus to pull it later this week.

Sorry for the delay, and thanks for the reminder.

Bjorn


PCI: Remove duplicate pci_disable_device() from pcie_portdrv_remove()

From: Yinghai Lu ying...@kernel.org

The pcie_portdrv .probe() method calls pci_enable_device() once, in
pcie_port_device_register(), but the .remove() method calls
pci_disable_device() twice, in pcie_port_device_remove() and in
pcie_portdrv_remove().

That causes a disabling already-disabled device warning when removing a
PCIe port device.  This happens all the time when removing Thunderbolt
devices, but is also easy to reproduce with, e.g.,
echo :00:1c.3  /sys/bus/pci/drivers/pcieport/unbind

This patch removes the disable from pcie_portdrv_remove().

[bhelgaas: changelog, tag for stable]
Reported-by: David Bulkow david.bul...@stratus.com
Reported-by: Mika Westerberg mika.westerb...@linux.intel.com
Signed-off-by: Yinghai Lu ying...@kernel.org
Signed-off-by: Bjorn Helgaas bhelg...@google.com
CC: sta...@vger.kernel.org  # v2.6.32+
---
 drivers/pci/pcie/portdrv_pci.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index cd1e57e51aa7..0d8fdc48e642 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
pcie_port_device_remove(dev);
-   pci_disable_device(dev);
 }
 
 static int error_detected_iter(struct device *device, void *data)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-18 Thread Yijing Wang
 The pcie_portdrv .probe() method calls pci_enable_device() once, in
 pcie_port_device_register(), but the .remove() method calls
 pci_disable_device() twice, in pcie_port_device_remove() and in
 pcie_portdrv_remove().
 
 That causes a disabling already-disabled device warning when removing a
 PCIe port device.  This happens all the time when removing Thunderbolt
 devices, but is also easy to reproduce with, e.g.,
 echo :00:1c.3  /sys/bus/pci/drivers/pcieport/unbind
 
 This patch removes the disable from pcie_portdrv_remove().
 
 [bhelgaas: changelog, tag for stable]
 Reported-by: David Bulkow david.bul...@stratus.com
 Reported-by: Mika Westerberg mika.westerb...@linux.intel.com
 Signed-off-by: Yinghai Lu ying...@kernel.org
 Signed-off-by: Bjorn Helgaas bhelg...@google.com
 CC: sta...@vger.kernel.org# v2.6.32+

Hi Bjorn,
   This issue in X86 seems to be introduced after commit 928bea9 PCI: Delay 
enabling bridges until they're needed
So this patch needs to back port to 2.6.32+ ?

 ---
  drivers/pci/pcie/portdrv_pci.c |1 -
  1 file changed, 1 deletion(-)
 
 diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
 index cd1e57e51aa7..0d8fdc48e642 100644
 --- a/drivers/pci/pcie/portdrv_pci.c
 +++ b/drivers/pci/pcie/portdrv_pci.c
 @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
  static void pcie_portdrv_remove(struct pci_dev *dev)
  {
   pcie_port_device_remove(dev);
 - pci_disable_device(dev);
  }
  
  static int error_detected_iter(struct device *device, void *data)
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 
 .
 


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-15 Thread Mika Westerberg
On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote:
> On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu  wrote:
> > On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas  wrote:
> >> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever  
> >> wrote:
> >>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas  
> >>> wrote:
>  On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
> > On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever 
> > >  wrote:
> > > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro 
> > > > Linux
> > > > crashes a few seconds later. Using
> > > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
> > > > to remove a bridge two levels above the device triggers the fault 
> > > > immediately:
> 
>  We save a pci_dev pointer in the pci_pme_list, which of course has a
>  longer lifetime than the pci_dev itself, but we don't acquire a reference
>  on it, so I suspect the pci_dev got released before we got around to
>  doing the pci_pme_list_scan().
> 
>  Andreas, can you try the patch below?  It's against v3.12-rc2, but it
>  should apply to v3.11, too.
> >>>
> >>> I have tested your patch against 3.11 where it solves the problem. Thanks!
> >>>
> >>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only
> >>> get the following warning (and no crash):
> >>>
> >>> tg3 :0a:00.0: PME# disabled
> >>> pcieport :09:00.0: PME# disabled
> >>> pciehp :09:00.0:pcie24: unloading service driver pciehp
> >>> pci_bus :0a: dev 00, dec refcount to 0
> >>> pci_bus :0a: dev 00, released physical slot 9
> >>> [ cut here ]
> >>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
> >>> pci_disable_device+0x84/0x90()
> >>> Device pcieport
> >>> disabling already-disabled device
> >>> ...
> 
> >>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
> >>
> >> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
> >
> > that double disabling should be addressed by:
> >
> > https://lkml.org/lkml/2013/4/25/608
> >
> > [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
> 
> I'll look at that patch again.  I had some questions about it the
> first time, but perhaps it makes more sense after 928bea9648 has been
> applied.

Bjorn,

Are there any plans to apply the above patch?

I'm seeing that warning on all my TBT test machines:

[  122.914180] pcieport :06:05.0: PME# disabled
[  122.915386] [ cut here ]
[  122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 
pci_disable_device+0x7c/0x90()
[  122.917589] Device pcieport
[  122.917589] disabling already-disabled device
[  122.918681] Modules linked in:
[  122.920803] CPU: 0 PID: 1060 Comm: kworker/0:2 Not tainted 3.12.0 #193
[  122.921877] Hardware name:  /D33217CK, BIOS 
GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
[  122.922989] Workqueue: kacpi_hotplug hotplug_event_work
[  122.924097]  0009 88006de81ab0 817ca961 
88006de81af8
[  122.925241]  88006de81ae8 810445c8 88006ea15800 
88006ea15800
[  122.926385]  81c5ac80 88006ea14098 88006eb35c28 
88006de81b48
[  122.927519] Call Trace:
[  122.928626]  [] dump_stack+0x45/0x56
[  122.929757]  [] warn_slowpath_common+0x78/0xa0
[  122.930884]  [] warn_slowpath_fmt+0x47/0x50
[  122.932003]  [] ? do_pci_disable_device+0x4d/0x60
[  122.933116]  [] pci_disable_device+0x7c/0x90
[  122.934235]  [] pcie_portdrv_remove+0x15/0x20
[  122.935345]  [] pci_device_remove+0x28/0x60
[  122.936442]  [] __device_release_driver+0x64/0xd0
[  122.937543]  [] device_release_driver+0x1e/0x30
[  122.938636]  [] bus_remove_device+0xf7/0x140
[  122.939718]  [] device_del+0x135/0x1d0
[  122.940806]  [] pci_stop_bus_device+0x94/0xa0
[  122.941890]  [] pci_stop_bus_device+0x3b/0xa0
[  122.942957]  [] pci_stop_and_remove_bus_device+0xd/0x20
[  122.944004]  [] trim_stale_devices+0x62/0xc0
[  122.945034]  [] trim_stale_devices+0xab/0xc0
[  122.946042]  [] trim_stale_devices+0xab/0xc0
[  122.947034]  [] acpiphp_check_bridge+0x7e/0xd0
[  122.948036]  [] hotplug_event+0xf2/0x230
[  122.949042]  [] ? acpi_os_release_object+0x9/0xd
[  122.950054]  [] hotplug_event_work+0x22/0x60
[  122.951067]  [] process_one_work+0x17a/0x430
[  122.952084]  [] worker_thread+0x119/0x390
[  122.953095]  [] ? manage_workers.isra.25+0x2a0/0x2a0
[  122.954107]  [] kthread+0xbb/0xc0
[  122.955115]  [] ? kthread_create_on_node+0x110/0x110
[  122.956136]  [] ret_from_fork+0x7c/0xb0
[  122.957141]  [] ? kthread_create_on_node+0x110/0x110
[  122.958145] ---[ end trace a0dcbb3b178e4755 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-15 Thread Mika Westerberg
On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote:
 On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu ying...@kernel.org wrote:
  On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com wrote:
  On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com 
  wrote:
  On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com 
  wrote:
  On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
  On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
   On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever 
   andreas.noe...@gmail.com wrote:
When I unplug the Thunderbolt ethernet adapter on my MacBookPro 
Linux
crashes a few seconds later. Using
echo 1  /sys/bus/pci/devices/:08:00.0/remove
to remove a bridge two levels above the device triggers the fault 
immediately:
 
  We save a pci_dev pointer in the pci_pme_list, which of course has a
  longer lifetime than the pci_dev itself, but we don't acquire a reference
  on it, so I suspect the pci_dev got released before we got around to
  doing the pci_pme_list_scan().
 
  Andreas, can you try the patch below?  It's against v3.12-rc2, but it
  should apply to v3.11, too.
 
  I have tested your patch against 3.11 where it solves the problem. Thanks!
 
  Unfortunately I could not reproduce the problem in 3.12-rc5. I only
  get the following warning (and no crash):
 
  tg3 :0a:00.0: PME# disabled
  pcieport :09:00.0: PME# disabled
  pciehp :09:00.0:pcie24: unloading service driver pciehp
  pci_bus :0a: dev 00, dec refcount to 0
  pci_bus :0a: dev 00, released physical slot 9
  [ cut here ]
  WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
  pci_disable_device+0x84/0x90()
  Device pcieport
  disabling already-disabled device
  ...
 
  Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
 
  This is PCI: Delay enabling bridges until they're needed by Yinghai.
 
  that double disabling should be addressed by:
 
  https://lkml.org/lkml/2013/4/25/608
 
  [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
 
 I'll look at that patch again.  I had some questions about it the
 first time, but perhaps it makes more sense after 928bea9648 has been
 applied.

Bjorn,

Are there any plans to apply the above patch?

I'm seeing that warning on all my TBT test machines:

[  122.914180] pcieport :06:05.0: PME# disabled
[  122.915386] [ cut here ]
[  122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 
pci_disable_device+0x7c/0x90()
[  122.917589] Device pcieport
[  122.917589] disabling already-disabled device
[  122.918681] Modules linked in:
[  122.920803] CPU: 0 PID: 1060 Comm: kworker/0:2 Not tainted 3.12.0 #193
[  122.921877] Hardware name:  /D33217CK, BIOS 
GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
[  122.922989] Workqueue: kacpi_hotplug hotplug_event_work
[  122.924097]  0009 88006de81ab0 817ca961 
88006de81af8
[  122.925241]  88006de81ae8 810445c8 88006ea15800 
88006ea15800
[  122.926385]  81c5ac80 88006ea14098 88006eb35c28 
88006de81b48
[  122.927519] Call Trace:
[  122.928626]  [817ca961] dump_stack+0x45/0x56
[  122.929757]  [810445c8] warn_slowpath_common+0x78/0xa0
[  122.930884]  [81044637] warn_slowpath_fmt+0x47/0x50
[  122.932003]  [812deb3d] ? do_pci_disable_device+0x4d/0x60
[  122.933116]  [812debcc] pci_disable_device+0x7c/0x90
[  122.934235]  [812ebfb5] pcie_portdrv_remove+0x15/0x20
[  122.935345]  [812e0318] pci_device_remove+0x28/0x60
[  122.936442]  [81424f24] __device_release_driver+0x64/0xd0
[  122.937543]  [81424fae] device_release_driver+0x1e/0x30
[  122.938636]  [81424837] bus_remove_device+0xf7/0x140
[  122.939718]  [81421575] device_del+0x135/0x1d0
[  122.940806]  [812db4c4] pci_stop_bus_device+0x94/0xa0
[  122.941890]  [812db46b] pci_stop_bus_device+0x3b/0xa0
[  122.942957]  [812db5cd] pci_stop_and_remove_bus_device+0xd/0x20
[  122.944004]  [812f3992] trim_stale_devices+0x62/0xc0
[  122.945034]  [812f39db] trim_stale_devices+0xab/0xc0
[  122.946042]  [812f39db] trim_stale_devices+0xab/0xc0
[  122.947034]  [812f3dbe] acpiphp_check_bridge+0x7e/0xd0
[  122.948036]  [812f4bf2] hotplug_event+0xf2/0x230
[  122.949042]  [8130dcf3] ? acpi_os_release_object+0x9/0xd
[  122.950054]  [812f4d52] hotplug_event_work+0x22/0x60
[  122.951067]  [8105da2a] process_one_work+0x17a/0x430
[  122.952084]  [8105e619] worker_thread+0x119/0x390
[  122.953095]  [8105e500] ? manage_workers.isra.25+0x2a0/0x2a0
[  122.954107]  [810647bb] kthread+0xbb/0xc0
[  122.955115]  [81064700] ? kthread_create_on_node+0x110/0x110
[  122.956136]  [817db3fc] ret_from_fork+0x7c/0xb0
[  122.957141]  [81064700] ? 

Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-31 Thread Yinghai Lu
On Wed, Oct 30, 2013 at 12:57 AM, Yijing Wang  wrote:
 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
>>>
>>> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
>>
>> that double disabling should be addressed by:
>>
>> https://lkml.org/lkml/2013/4/25/608
>>
>> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
>
> Hi Yinghai and Bjorn,
>I found a related issue in the latest Bjorn/pci-next branch.
>
> Now if we remove the pcie port device in the system, there is a warning 
> occured.
> It seems introduced after commit 928bea9 "PCI: Delay enabling bridges until 
> they're needed".
>
> [ 2124.129478] [ cut here ]
> [ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 
> pci_disable_device+0x90/0xa0()
> [ 2124.129492] Device pcieport
> [ 2124.129492] disabling already-disabled device

yes. that is pcie port driver problem.

| 928bea delay bridge enabling, and it's pci_is_enabled checking will prevent
| pci bridge get enabled second time, so enable_cnt will be only 1. instead of
| 2. if we enable bridge at first and then pcie port driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-31 Thread Yinghai Lu
On Wed, Oct 30, 2013 at 12:57 AM, Yijing Wang wangyij...@huawei.com wrote:
 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is PCI: Delay enabling bridges until they're needed by Yinghai.

 that double disabling should be addressed by:

 https://lkml.org/lkml/2013/4/25/608

 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

 Hi Yinghai and Bjorn,
I found a related issue in the latest Bjorn/pci-next branch.

 Now if we remove the pcie port device in the system, there is a warning 
 occured.
 It seems introduced after commit 928bea9 PCI: Delay enabling bridges until 
 they're needed.

 [ 2124.129478] [ cut here ]
 [ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 
 pci_disable_device+0x90/0xa0()
 [ 2124.129492] Device pcieport
 [ 2124.129492] disabling already-disabled device

yes. that is pcie port driver problem.

| 928bea delay bridge enabling, and it's pci_is_enabled checking will prevent
| pci bridge get enabled second time, so enable_cnt will be only 1. instead of
| 2. if we enable bridge at first and then pcie port driver.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-30 Thread Yijing Wang
>>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
>>
>> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
> 
> that double disabling should be addressed by:
> 
> https://lkml.org/lkml/2013/4/25/608
> 
> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

Hi Yinghai and Bjorn,
   I found a related issue in the latest Bjorn/pci-next branch.

Now if we remove the pcie port device in the system, there is a warning occured.
It seems introduced after commit 928bea9 "PCI: Delay enabling bridges until 
they're needed".

[ 2124.129478] [ cut here ]
[ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 
pci_disable_device+0x90/0xa0()
[ 2124.129492] Device pcieport
[ 2124.129492] disabling already-disabled device
[ 2124.129494] Modules linked in: binfmt_misc cpufreq_conservative 
cpufreq_userspace cpufreq_powersave loop bnx2 igb kvm_intel kvm dca 
i2c_algo_bit ptp pps_core i2c_i801 i7core_edac iTCO_wdt iTCO_vendor_support 
lpc_ich mfd_core edac_core acpi_cpufreq serio_raw sg button pcspkr microcode 
autofs4 processor thermal_sys scsi_dh_rdac scsi_dh_alua scsi_dh_emc 
scsi_dh_hp_sw scsi_dh ata_generic ata_piix megaraid_sas
[ 2124.129530] CPU: 3 PID: 7 Comm: kworker/u49:0 Not tainted 
3.12.0-rc2-2.10-desktop+ #22
[ 2124.129533] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285
  /BC11BTSA  , BIOS CTSAV036 04/27/2011
[ 2124.129540] Workqueue: sysfsd sysfs_schedule_callback_work
[ 2124.129543]  0009 880532cd9bb8 8162f51c 
0007
[ 2124.129547]  880532cd9c08 880532cd9bf8 8105060c 
0286
[ 2124.129552]  8805329f2000 81c67bc0 8805329f2000 

[ 2124.129556] Call Trace:
[ 2124.129564]  [] dump_stack+0x55/0x86
[ 2124.129572]  [] warn_slowpath_common+0x8c/0xc0
[ 2124.129576]  [] warn_slowpath_fmt+0x46/0x50
[ 2124.129580]  [] pci_disable_device+0x90/0xa0
[ 2124.129587]  [] pcie_portdrv_remove+0x1e/0x30
[ 2124.129592]  [] pci_device_remove+0x46/0xc0
[ 2124.129598]  [] __device_release_driver+0x7f/0xf0
[ 2124.129602]  [] device_release_driver+0x2c/0x40
[ 2124.129606]  [] bus_remove_device+0x108/0x170
[ 2124.129610]  [] device_del+0x130/0x1c0
[ 2124.129614]  [] pci_stop_bus_device+0x9c/0xb0
[ 2124.129618]  [] pci_stop_and_remove_bus_device+0x16/0x30
[ 2124.129622]  [] remove_callback+0x29/0x40
[ 2124.129625]  [] sysfs_schedule_callback_work+0x18/0x80
[ 2124.129632]  [] process_one_work+0x17d/0x470
[ 2124.129635]  [] worker_thread+0x122/0x380
[ 2124.129639]  [] ? rescuer_thread+0x330/0x330
[ 2124.129643]  [] kthread+0xc0/0xd0
[ 2124.129647]  [] ? kthread_create_on_node+0x130/0x130
[ 2124.129653]  [] ret_from_fork+0x7c/0xb0
[ 2124.129657]  [] ? kthread_create_on_node+0x130/0x130
[ 2124.129660] ---[ end trace 5b020b35ec6adb4c ]---


> 
> Thanks
> 
> Yinghai
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-30 Thread Yijing Wang
 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is PCI: Delay enabling bridges until they're needed by Yinghai.
 
 that double disabling should be addressed by:
 
 https://lkml.org/lkml/2013/4/25/608
 
 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

Hi Yinghai and Bjorn,
   I found a related issue in the latest Bjorn/pci-next branch.

Now if we remove the pcie port device in the system, there is a warning occured.
It seems introduced after commit 928bea9 PCI: Delay enabling bridges until 
they're needed.

[ 2124.129478] [ cut here ]
[ 2124.129490] WARNING: CPU: 3 PID: 7 at drivers/pci/pci.c:1424 
pci_disable_device+0x90/0xa0()
[ 2124.129492] Device pcieport
[ 2124.129492] disabling already-disabled device
[ 2124.129494] Modules linked in: binfmt_misc cpufreq_conservative 
cpufreq_userspace cpufreq_powersave loop bnx2 igb kvm_intel kvm dca 
i2c_algo_bit ptp pps_core i2c_i801 i7core_edac iTCO_wdt iTCO_vendor_support 
lpc_ich mfd_core edac_core acpi_cpufreq serio_raw sg button pcspkr microcode 
autofs4 processor thermal_sys scsi_dh_rdac scsi_dh_alua scsi_dh_emc 
scsi_dh_hp_sw scsi_dh ata_generic ata_piix megaraid_sas
[ 2124.129530] CPU: 3 PID: 7 Comm: kworker/u49:0 Not tainted 
3.12.0-rc2-2.10-desktop+ #22
[ 2124.129533] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285
  /BC11BTSA  , BIOS CTSAV036 04/27/2011
[ 2124.129540] Workqueue: sysfsd sysfs_schedule_callback_work
[ 2124.129543]  0009 880532cd9bb8 8162f51c 
0007
[ 2124.129547]  880532cd9c08 880532cd9bf8 8105060c 
0286
[ 2124.129552]  8805329f2000 81c67bc0 8805329f2000 

[ 2124.129556] Call Trace:
[ 2124.129564]  [8162f51c] dump_stack+0x55/0x86
[ 2124.129572]  [8105060c] warn_slowpath_common+0x8c/0xc0
[ 2124.129576]  [810506f6] warn_slowpath_fmt+0x46/0x50
[ 2124.129580]  [81345ef0] pci_disable_device+0x90/0xa0
[ 2124.129587]  [8135524e] pcie_portdrv_remove+0x1e/0x30
[ 2124.129592]  [813491d6] pci_device_remove+0x46/0xc0
[ 2124.129598]  [814139cf] __device_release_driver+0x7f/0xf0
[ 2124.129602]  [81413c5c] device_release_driver+0x2c/0x40
[ 2124.129606]  [81413368] bus_remove_device+0x108/0x170
[ 2124.129610]  [814102f0] device_del+0x130/0x1c0
[ 2124.129614]  [813427dc] pci_stop_bus_device+0x9c/0xb0
[ 2124.129618]  [81342986] pci_stop_and_remove_bus_device+0x16/0x30
[ 2124.129622]  [8134a599] remove_callback+0x29/0x40
[ 2124.129625]  [811fd7d8] sysfs_schedule_callback_work+0x18/0x80
[ 2124.129632]  [8106bbbd] process_one_work+0x17d/0x470
[ 2124.129635]  [8106c342] worker_thread+0x122/0x380
[ 2124.129639]  [8106c220] ? rescuer_thread+0x330/0x330
[ 2124.129643]  [81073330] kthread+0xc0/0xd0
[ 2124.129647]  [81073270] ? kthread_create_on_node+0x130/0x130
[ 2124.129653]  [8163dbec] ret_from_fork+0x7c/0xb0
[ 2124.129657]  [81073270] ? kthread_create_on_node+0x130/0x130
[ 2124.129660] ---[ end trace 5b020b35ec6adb4c ]---


 
 Thanks
 
 Yinghai
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 .
 


-- 
Thanks!
Yijing

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-28 Thread Bjorn Helgaas
On Wed, Oct 16, 2013 at 2:21 PM, Bjorn Helgaas  wrote:
> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
>> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
>> > [+cc Rafael, Mika, Kirill, linux-pci]
>> >
>> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
>> >  wrote:
>> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
>> > > crashes a few seconds later. Using
>> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
>> > > to remove a bridge two levels above the device triggers the fault 
>> > > immediately:
>> >
>> > There have been significant changes in acpiphp related to Thunderbolt
>> > since v3.11.
>>
>> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
>> I'd be surprised if acpiphp makes a difference here.
>
> Yeah, you're right; I wasn't paying attention.
>
> We save a pci_dev pointer in the pci_pme_list, which of course has a
> longer lifetime than the pci_dev itself, but we don't acquire a reference
> on it, so I suspect the pci_dev got released before we got around to
> doing the pci_pme_list_scan().
>
> Andreas, can you try the patch below?  It's against v3.12-rc2, but it
> should apply to v3.11, too.
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ad7fc72..8b0a2f3 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work)
> pci_pme_wakeup(pme_dev->dev, NULL);
> } else {
> list_del(_dev->list);
> +   pci_dev_put(pme_dev->dev);
> kfree(pme_dev);
> }
> }
> @@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
>   GFP_KERNEL);
> if (!pme_dev)
> goto out;
> -   pme_dev->dev = dev;
> +   pme_dev->dev = pci_dev_get(dev);
> mutex_lock(_pme_list_mutex);
> list_add(_dev->list, _pme_list);
> if (list_is_singular(_pme_list))
> @@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
> list_for_each_entry(pme_dev, _pme_list, list) {
> if (pme_dev->dev == dev) {
> list_del(_dev->list);
> +   pci_dev_put(pme_dev->dev);
> kfree(pme_dev);
> break;
> }

The patch above covered up the problem, but is incorrect.  The topology is:

  08:00.0 PCI bridge to [bus 09-0a] Thunderbolt Upstream Port
  09:00.0 PCI bridge to [bus 0a]Thunderbolt Downstream Port
  0a:00.0 tg3 NIC

and the sequence leading to the crash is (edited for brevity):

  remove_store(08:00.0)
pci_stop_and_remove_bus_device(08:00.0) # Upstream Port
  pci_stop_bus_device(08:00.0)
pci_stop_bus_device(09:00.0)# Downstream Port
  pci_stop_bus_device(0a:00.0)  # tg3 device
pci_stop_dev(0a:00.0)
  pci_pme_active(0a:00.0, false)# remove from pci_pme_list
  device_del(0a:00.0)
device_release_driver
  tg3_remove_one
unregister_netdev
  dev->netdev_ops->ndo_stop # tg3_close
  tg3_close
pci_wake_from_d3
  pci_pme_active(dev, true) # add to pci_pme_list
  pci_remove_bus_device(08:00.0)
pci_remove_bus_device(09:00.0)
  pci_remove_bus_device(0a:00.0)
pci_destroy_dev(0a:00.0)
  put_device(0a:00.0)   # drop last tg3
pci_dev reference
  ...
  pci_release_dev   # pci_dev release function
kfree(0a:00.0)
  ...
  pci_pme_list_scan
0a:00.0 still on list => use-after-free

The patch above avoids the crash by acquiring a reference when adding
to pci_pme_list, so when pci_destroy_dev() drops the reference, it's
not the *last* reference, so the pci_dev is not released.

The problem is that the reference acquired when we add to pci_pme_list
will *never* be dropped, so we leaked the pci_dev.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-28 Thread Bjorn Helgaas
On Wed, Oct 16, 2013 at 2:21 PM, Bjorn Helgaas bhelg...@google.com wrote:
 On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
 On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
  [+cc Rafael, Mika, Kirill, linux-pci]
 
  On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
  andreas.noe...@gmail.com wrote:
   When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
   crashes a few seconds later. Using
   echo 1  /sys/bus/pci/devices/:08:00.0/remove
   to remove a bridge two levels above the device triggers the fault 
   immediately:
 
  There have been significant changes in acpiphp related to Thunderbolt
  since v3.11.

 Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
 I'd be surprised if acpiphp makes a difference here.

 Yeah, you're right; I wasn't paying attention.

 We save a pci_dev pointer in the pci_pme_list, which of course has a
 longer lifetime than the pci_dev itself, but we don't acquire a reference
 on it, so I suspect the pci_dev got released before we got around to
 doing the pci_pme_list_scan().

 Andreas, can you try the patch below?  It's against v3.12-rc2, but it
 should apply to v3.11, too.

 diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
 index ad7fc72..8b0a2f3 100644
 --- a/drivers/pci/pci.c
 +++ b/drivers/pci/pci.c
 @@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work)
 pci_pme_wakeup(pme_dev-dev, NULL);
 } else {
 list_del(pme_dev-list);
 +   pci_dev_put(pme_dev-dev);
 kfree(pme_dev);
 }
 }
 @@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
   GFP_KERNEL);
 if (!pme_dev)
 goto out;
 -   pme_dev-dev = dev;
 +   pme_dev-dev = pci_dev_get(dev);
 mutex_lock(pci_pme_list_mutex);
 list_add(pme_dev-list, pci_pme_list);
 if (list_is_singular(pci_pme_list))
 @@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
 list_for_each_entry(pme_dev, pci_pme_list, list) {
 if (pme_dev-dev == dev) {
 list_del(pme_dev-list);
 +   pci_dev_put(pme_dev-dev);
 kfree(pme_dev);
 break;
 }

The patch above covered up the problem, but is incorrect.  The topology is:

  08:00.0 PCI bridge to [bus 09-0a] Thunderbolt Upstream Port
  09:00.0 PCI bridge to [bus 0a]Thunderbolt Downstream Port
  0a:00.0 tg3 NIC

and the sequence leading to the crash is (edited for brevity):

  remove_store(08:00.0)
pci_stop_and_remove_bus_device(08:00.0) # Upstream Port
  pci_stop_bus_device(08:00.0)
pci_stop_bus_device(09:00.0)# Downstream Port
  pci_stop_bus_device(0a:00.0)  # tg3 device
pci_stop_dev(0a:00.0)
  pci_pme_active(0a:00.0, false)# remove from pci_pme_list
  device_del(0a:00.0)
device_release_driver
  tg3_remove_one
unregister_netdev
  dev-netdev_ops-ndo_stop # tg3_close
  tg3_close
pci_wake_from_d3
  pci_pme_active(dev, true) # add to pci_pme_list
  pci_remove_bus_device(08:00.0)
pci_remove_bus_device(09:00.0)
  pci_remove_bus_device(0a:00.0)
pci_destroy_dev(0a:00.0)
  put_device(0a:00.0)   # drop last tg3
pci_dev reference
  ...
  pci_release_dev   # pci_dev release function
kfree(0a:00.0)
  ...
  pci_pme_list_scan
0a:00.0 still on list = use-after-free

The patch above avoids the crash by acquiring a reference when adding
to pci_pme_list, so when pci_destroy_dev() drops the reference, it's
not the *last* reference, so the pci_dev is not released.

The problem is that the reference acquired when we add to pci_pme_list
will *never* be dropped, so we leaked the pci_dev.

Bjorn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-26 Thread Andreas Noever
> Sorry, I didn't understand this.  Is this supposed to be an
> explanation of how 928bea fixes the oops that Andreas saw?  If so, can
> you be a little more explicit about when the pci_dev got freed and
> when pci_pme_list_scan() walked the list and accessed the freed area?

I did some more debugging and it seems that 928bea is innocent after
all. I added some debugging statements to pci_pme_active. The
additional delay seems to make the oops easier to trigger and I can
now replicate it up to
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5137a2ee2007d9cbbbeebd14abe08357a079b607
which makes much more sense.

Here is what's going on (in 3.11). First of all pci_pme_activate is
only ever called with false as the second paramter during boot. Now
when I unplug the adapter, the first call is:
 [] dump_stack+0x54/0x8d
 [] pci_pme_active+0x30/0x210
 [] ? pci_read+0x2c/0x30 (this should be pci_stop_dev imho)
 [] pci_stop_bus_device+0x4e/0xa0
 [] pci_stop_bus_device+0x3b/0xa0
 [] pci_stop_bus_device+0x3b/0xa0
 [] pci_stop_and_remove_bus_device+0x12/0x20
 [] pciehp_unconfigure_device+0xa8/0x1b0
 [] pciehp_disable_slot+0x68/0x200
 [] pciehp_power_thread+0x83/0xf0
 [] process_one_work+0x178/0x470
 [] worker_thread+0x121/0x3a0
 [] ? manage_workers.isra.21+0x2b0/0x2b0
 [] kthread+0xc0/0xd0
 [] ? SyS_unshare+0x220/0x280
 [] ? kthread_create_on_node+0x120/0x120
 [] ret_from_fork+0x7c/0xb0
 [] ? kthread_create_on_node+0x120/0x120
tg3 :0a:00.0: PME# disabled

This is still fine. But then it gets interesting. The next call is:
 [] dump_stack+0x54/0x8d
 [] pci_pme_active+0x30/0x210
 [] __pci_enable_wake+0x65/0x160
 [] pci_wake_from_d3+0x25/0x40
 [] tg3_power_down+0x29/0x40 [tg3]
 [] tg3_close+0x10c/0x1d0 [tg3]
 [] __dev_close_many+0x85/0xd0
 [] dev_close_many+0x8b/0x100
 [] rollback_registered_many+0xd8/0x250
 [] rollback_registered+0x2d/0x40
 [] unregister_netdevice_queue+0x58/0xb0
 [] unregister_netdev+0x1c/0x30
 [] tg3_remove_one+0x6b/0x120 [tg3]
 [] pci_device_remove+0x3b/0xb0
 [] __device_release_driver+0x7f/0xf0
 [] device_release_driver+0x23/0x30
 [] bus_remove_device+0xf4/0x170
 [] device_del+0x135/0x1d0
 [] pci_stop_bus_device+0x94/0xa0
 [] pci_stop_bus_device+0x3b/0xa0
 [] pci_stop_bus_device+0x3b/0xa0
 [] pci_stop_and_remove_bus_device+0x12/0x20
 [] pciehp_unconfigure_device+0xa8/0x1b0
 [] pciehp_disable_slot+0x68/0x200
 [] pciehp_power_thread+0x83/0xf0
 [] process_one_work+0x178/0x470
 [] worker_thread+0x121/0x3a0
 [] ? manage_workers.isra.21+0x2b0/0x2b0
 [] kthread+0xc0/0xd0
 [] ? SyS_unshare+0x220/0x280
 [] ? kthread_create_on_node+0x120/0x120
 [] ret_from_fork+0x7c/0xb0
 [] ? kthread_create_on_node+0x120/0x120
tg3 :0a:00.0: PME# enabled

On removal tg3 calls pci_wake_from_d3 to enable/disable wake-on-lan.
This then calls pci_pme_activate(dev, true) for a device which is
about to be deleted. The linked commit does no longer call
pci_wake_from_d3, which "fixes" the problem.

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-26 Thread Andreas Noever
 Sorry, I didn't understand this.  Is this supposed to be an
 explanation of how 928bea fixes the oops that Andreas saw?  If so, can
 you be a little more explicit about when the pci_dev got freed and
 when pci_pme_list_scan() walked the list and accessed the freed area?

I did some more debugging and it seems that 928bea is innocent after
all. I added some debugging statements to pci_pme_active. The
additional delay seems to make the oops easier to trigger and I can
now replicate it up to
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5137a2ee2007d9cbbbeebd14abe08357a079b607
which makes much more sense.

Here is what's going on (in 3.11). First of all pci_pme_activate is
only ever called with false as the second paramter during boot. Now
when I unplug the adapter, the first call is:
 [814b1cc7] dump_stack+0x54/0x8d
 [812ae970] pci_pme_active+0x30/0x210
 [813bf2bc] ? pci_read+0x2c/0x30 (this should be pci_stop_dev imho)
 [812ac8ae] pci_stop_bus_device+0x4e/0xa0
 [812ac89b] pci_stop_bus_device+0x3b/0xa0
 [812ac89b] pci_stop_bus_device+0x3b/0xa0
 [812aca02] pci_stop_and_remove_bus_device+0x12/0x20
 [812c4698] pciehp_unconfigure_device+0xa8/0x1b0
 [812c3ff8] pciehp_disable_slot+0x68/0x200
 [812c4213] pciehp_power_thread+0x83/0xf0
 [8107b5b8] process_one_work+0x178/0x470
 [8107bf81] worker_thread+0x121/0x3a0
 [8107be60] ? manage_workers.isra.21+0x2b0/0x2b0
 [81082d80] kthread+0xc0/0xd0
 [8106] ? SyS_unshare+0x220/0x280
 [81082cc0] ? kthread_create_on_node+0x120/0x120
 [814c07ec] ret_from_fork+0x7c/0xb0
 [81082cc0] ? kthread_create_on_node+0x120/0x120
tg3 :0a:00.0: PME# disabled

This is still fine. But then it gets interesting. The next call is:
 [814b1cc7] dump_stack+0x54/0x8d
 [812ae970] pci_pme_active+0x30/0x210
 [812aebb5] __pci_enable_wake+0x65/0x160
 [812aecd5] pci_wake_from_d3+0x25/0x40
 [a0511c29] tg3_power_down+0x29/0x40 [tg3]
 [a0511d4c] tg3_close+0x10c/0x1d0 [tg3]
 [813d67b5] __dev_close_many+0x85/0xd0
 [813d68cb] dev_close_many+0x8b/0x100
 [813d8dd8] rollback_registered_many+0xd8/0x250
 [813d8f7d] rollback_registered+0x2d/0x40
 [813da828] unregister_netdevice_queue+0x58/0xb0
 [813da89c] unregister_netdev+0x1c/0x30
 [a050104b] tg3_remove_one+0x6b/0x120 [tg3]
 [812b1b0b] pci_device_remove+0x3b/0xb0
 [81371c1f] __device_release_driver+0x7f/0xf0
 [81371cb3] device_release_driver+0x23/0x30
 [81371484] bus_remove_device+0xf4/0x170
 [8136df45] device_del+0x135/0x1d0
 [812ac8f4] pci_stop_bus_device+0x94/0xa0
 [812ac89b] pci_stop_bus_device+0x3b/0xa0
 [812ac89b] pci_stop_bus_device+0x3b/0xa0
 [812aca02] pci_stop_and_remove_bus_device+0x12/0x20
 [812c4698] pciehp_unconfigure_device+0xa8/0x1b0
 [812c3ff8] pciehp_disable_slot+0x68/0x200
 [812c4213] pciehp_power_thread+0x83/0xf0
 [8107b5b8] process_one_work+0x178/0x470
 [8107bf81] worker_thread+0x121/0x3a0
 [8107be60] ? manage_workers.isra.21+0x2b0/0x2b0
 [81082d80] kthread+0xc0/0xd0
 [8106] ? SyS_unshare+0x220/0x280
 [81082cc0] ? kthread_create_on_node+0x120/0x120
 [814c07ec] ret_from_fork+0x7c/0xb0
 [81082cc0] ? kthread_create_on_node+0x120/0x120
tg3 :0a:00.0: PME# enabled

On removal tg3 calls pci_wake_from_d3 to enable/disable wake-on-lan.
This then calls pci_pme_activate(dev, true) for a device which is
about to be deleted. The linked commit does no longer call
pci_wake_from_d3, which fixes the problem.

Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-25 Thread Bjorn Helgaas
On Thu, Oct 24, 2013 at 11:13 PM, Yinghai Lu  wrote:
> On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas  wrote:
>>
> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
>>>
>>> that double disabling should be addressed by:
>>>
>>> https://lkml.org/lkml/2013/4/25/608
>>>
>>> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
>>
>> I'll look at that patch again.  I had some questions about it the
>> first time, but perhaps it makes more sense after 928bea9648 has been
>> applied.
>>
>> Andreas originally reported a GPF oops in pci_pme_list_scan().  I
>> posted a refcounting patch, which made the problem go away, but I
>> can't explain why, and I don't want to apply it without understanding
>> that.  Decoding his oops shows this:
>>
>>   24: 0f 1f 00 nopl   (%rax)
>>   27: 48 8b 50 10   mov0x10(%rax),%rdx
>>   2b:* 48 8b 52 38   mov0x38(%rdx),%rdx <-- trapping instruction
>>   2f: 48 85 d2 test   %rdx,%rdx
>>
>> %rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer,
>> which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
>> POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev
>> that has already been freed.
>>
>> pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
>> pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
>> the pci_dev by calling pci_pme_active(), which also holds
>> pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
>> see a pci_dev that has already been freed.
>>
>> If I understand Andreas correctly, 928bea9648 also fixes the crash,
>> even without my refcounting change.  Can you explain why?
>
> 928bea will make the dev->enable_cnt increase wrongly, as we have
> pci_enable_device for child
>pci_enable_bridge for parent
>  pci_enable_bridge for grandparent
>pci_enable_device for grandparent
>pci_enable_device for parent
>pci_enable_brdige for grandparent
>  pci_enable_device for grandparent.
> ...
>
> in that case grandprent will be enabled two times, and will enable_cnt will 
> have
> extra increase.
>
> so later pci_disable_device will not really call do_pci_disable_device
> do the really work, as enable_cnt still big.
>
> solution could be:
> let pci_enable_bridge call __pci_enable_device.
> and __pci_enable_device will not call pci_enable_bridge.

Sorry, I didn't understand this.  Is this supposed to be an
explanation of how 928bea fixes the oops that Andreas saw?  If so, can
you be a little more explicit about when the pci_dev got freed and
when pci_pme_list_scan() walked the list and accessed the freed area?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-25 Thread Bjorn Helgaas
On Thu, Oct 24, 2013 at 11:13 PM, Yinghai Lu ying...@kernel.org wrote:
 On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas bhelg...@google.com wrote:

 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is PCI: Delay enabling bridges until they're needed by Yinghai.

 that double disabling should be addressed by:

 https://lkml.org/lkml/2013/4/25/608

 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

 I'll look at that patch again.  I had some questions about it the
 first time, but perhaps it makes more sense after 928bea9648 has been
 applied.

 Andreas originally reported a GPF oops in pci_pme_list_scan().  I
 posted a refcounting patch, which made the problem go away, but I
 can't explain why, and I don't want to apply it without understanding
 that.  Decoding his oops shows this:

   24: 0f 1f 00 nopl   (%rax)
   27: 48 8b 50 10   mov0x10(%rax),%rdx
   2b:* 48 8b 52 38   mov0x38(%rdx),%rdx -- trapping instruction
   2f: 48 85 d2 test   %rdx,%rdx

 %rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer,
 which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
 POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev
 that has already been freed.

 pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
 pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
 the pci_dev by calling pci_pme_active(), which also holds
 pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
 see a pci_dev that has already been freed.

 If I understand Andreas correctly, 928bea9648 also fixes the crash,
 even without my refcounting change.  Can you explain why?

 928bea will make the dev-enable_cnt increase wrongly, as we have
 pci_enable_device for child
pci_enable_bridge for parent
  pci_enable_bridge for grandparent
pci_enable_device for grandparent
pci_enable_device for parent
pci_enable_brdige for grandparent
  pci_enable_device for grandparent.
 ...

 in that case grandprent will be enabled two times, and will enable_cnt will 
 have
 extra increase.

 so later pci_disable_device will not really call do_pci_disable_device
 do the really work, as enable_cnt still big.

 solution could be:
 let pci_enable_bridge call __pci_enable_device.
 and __pci_enable_device will not call pci_enable_bridge.

Sorry, I didn't understand this.  Is this supposed to be an
explanation of how 928bea fixes the oops that Andreas saw?  If so, can
you be a little more explicit about when the pci_dev got freed and
when pci_pme_list_scan() walked the list and accessed the freed area?

Bjorn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-24 Thread Yinghai Lu
On Thu, Oct 24, 2013 at 10:13 PM, Yinghai Lu  wrote:
> On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas  wrote:
>>
> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
>>>
>>> that double disabling should be addressed by:
>>>
>>> https://lkml.org/lkml/2013/4/25/608
>>>
>>> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
>>
>> I'll look at that patch again.  I had some questions about it the
>> first time, but perhaps it makes more sense after 928bea9648 has been
>> applied.
>>
>> Andreas originally reported a GPF oops in pci_pme_list_scan().  I
>> posted a refcounting patch, which made the problem go away, but I
>> can't explain why, and I don't want to apply it without understanding
>> that.  Decoding his oops shows this:
>>
>>   24: 0f 1f 00 nopl   (%rax)
>>   27: 48 8b 50 10   mov0x10(%rax),%rdx
>>   2b:* 48 8b 52 38   mov0x38(%rdx),%rdx <-- trapping instruction
>>   2f: 48 85 d2 test   %rdx,%rdx
>>
>> %rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer,
>> which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
>> POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev
>> that has already been freed.
>>
>> pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
>> pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
>> the pci_dev by calling pci_pme_active(), which also holds
>> pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
>> see a pci_dev that has already been freed.
>>
>> If I understand Andreas correctly, 928bea9648 also fixes the crash,
>> even without my refcounting change.  Can you explain why?
>
> 928bea will make the dev->enable_cnt increase wrongly, as we have
> pci_enable_device for child
>pci_enable_bridge for parent
>  pci_enable_bridge for grandparent
>pci_enable_device for grandparent
>pci_enable_device for parent
>pci_enable_brdige for grandparent
>  pci_enable_device for grandparent.  ===> looks like i read the code 
> wrongly. This one is skipped as we have pci_is_enabled checking.

928bea delay bridge enabling, and it's pci_is_enabled checking will prevent
pci bridge get enabled second time, so enable_cnt will be only 1. instead of
2. if we enable bridge at first and then pcie port driver.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-24 Thread Yinghai Lu
On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas  wrote:
>
 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
>>>
>>> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
>>
>> that double disabling should be addressed by:
>>
>> https://lkml.org/lkml/2013/4/25/608
>>
>> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
>
> I'll look at that patch again.  I had some questions about it the
> first time, but perhaps it makes more sense after 928bea9648 has been
> applied.
>
> Andreas originally reported a GPF oops in pci_pme_list_scan().  I
> posted a refcounting patch, which made the problem go away, but I
> can't explain why, and I don't want to apply it without understanding
> that.  Decoding his oops shows this:
>
>   24: 0f 1f 00 nopl   (%rax)
>   27: 48 8b 50 10   mov0x10(%rax),%rdx
>   2b:* 48 8b 52 38   mov0x38(%rdx),%rdx <-- trapping instruction
>   2f: 48 85 d2 test   %rdx,%rdx
>
> %rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer,
> which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
> POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev
> that has already been freed.
>
> pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
> pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
> the pci_dev by calling pci_pme_active(), which also holds
> pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
> see a pci_dev that has already been freed.
>
> If I understand Andreas correctly, 928bea9648 also fixes the crash,
> even without my refcounting change.  Can you explain why?

928bea will make the dev->enable_cnt increase wrongly, as we have
pci_enable_device for child
   pci_enable_bridge for parent
 pci_enable_bridge for grandparent
   pci_enable_device for grandparent
   pci_enable_device for parent
   pci_enable_brdige for grandparent
 pci_enable_device for grandparent.
...

in that case grandprent will be enabled two times, and will enable_cnt will have
extra increase.

so later pci_disable_device will not really call do_pci_disable_device
do the really work, as enable_cnt still big.

solution could be:
let pci_enable_bridge call __pci_enable_device.
and __pci_enable_device will not call pci_enable_bridge.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-24 Thread Bjorn Helgaas
On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu  wrote:
> On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas  wrote:
>> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever  
>> wrote:
>>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas  wrote:
 On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever 
> >  wrote:
> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
> > > crashes a few seconds later. Using
> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
> > > to remove a bridge two levels above the device triggers the fault 
> > > immediately:

 We save a pci_dev pointer in the pci_pme_list, which of course has a
 longer lifetime than the pci_dev itself, but we don't acquire a reference
 on it, so I suspect the pci_dev got released before we got around to
 doing the pci_pme_list_scan().

 Andreas, can you try the patch below?  It's against v3.12-rc2, but it
 should apply to v3.11, too.
>>>
>>> I have tested your patch against 3.11 where it solves the problem. Thanks!
>>>
>>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only
>>> get the following warning (and no crash):
>>>
>>> tg3 :0a:00.0: PME# disabled
>>> pcieport :09:00.0: PME# disabled
>>> pciehp :09:00.0:pcie24: unloading service driver pciehp
>>> pci_bus :0a: dev 00, dec refcount to 0
>>> pci_bus :0a: dev 00, released physical slot 9
>>> [ cut here ]
>>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
>>> pci_disable_device+0x84/0x90()
>>> Device pcieport
>>> disabling already-disabled device
>>> ...

>>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
>>
>> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
>
> that double disabling should be addressed by:
>
> https://lkml.org/lkml/2013/4/25/608
>
> [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

I'll look at that patch again.  I had some questions about it the
first time, but perhaps it makes more sense after 928bea9648 has been
applied.

Andreas originally reported a GPF oops in pci_pme_list_scan().  I
posted a refcounting patch, which made the problem go away, but I
can't explain why, and I don't want to apply it without understanding
that.  Decoding his oops shows this:

  24: 0f 1f 00 nopl   (%rax)
  27: 48 8b 50 10   mov0x10(%rax),%rdx
  2b:* 48 8b 52 38   mov0x38(%rdx),%rdx <-- trapping instruction
  2f: 48 85 d2 test   %rdx,%rdx

%rax is the pci_dev pointer, so 0x10(%rax) is the dev->bus pointer,
which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
POISON_FREE, so I think we loaded dev->bus out of a struct pci_dev
that has already been freed.

pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
the pci_dev by calling pci_pme_active(), which also holds
pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
see a pci_dev that has already been freed.

If I understand Andreas correctly, 928bea9648 also fixes the crash,
even without my refcounting change.  Can you explain why?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-24 Thread Bjorn Helgaas
On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu ying...@kernel.org wrote:
 On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com wrote:
 On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever andreas.noe...@gmail.com 
 wrote:
 On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote:
 On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
 On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
  On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever 
  andreas.noe...@gmail.com wrote:
   When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
   crashes a few seconds later. Using
   echo 1  /sys/bus/pci/devices/:08:00.0/remove
   to remove a bridge two levels above the device triggers the fault 
   immediately:

 We save a pci_dev pointer in the pci_pme_list, which of course has a
 longer lifetime than the pci_dev itself, but we don't acquire a reference
 on it, so I suspect the pci_dev got released before we got around to
 doing the pci_pme_list_scan().

 Andreas, can you try the patch below?  It's against v3.12-rc2, but it
 should apply to v3.11, too.

 I have tested your patch against 3.11 where it solves the problem. Thanks!

 Unfortunately I could not reproduce the problem in 3.12-rc5. I only
 get the following warning (and no crash):

 tg3 :0a:00.0: PME# disabled
 pcieport :09:00.0: PME# disabled
 pciehp :09:00.0:pcie24: unloading service driver pciehp
 pci_bus :0a: dev 00, dec refcount to 0
 pci_bus :0a: dev 00, released physical slot 9
 [ cut here ]
 WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
 pci_disable_device+0x84/0x90()
 Device pcieport
 disabling already-disabled device
 ...

 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is PCI: Delay enabling bridges until they're needed by Yinghai.

 that double disabling should be addressed by:

 https://lkml.org/lkml/2013/4/25/608

 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

I'll look at that patch again.  I had some questions about it the
first time, but perhaps it makes more sense after 928bea9648 has been
applied.

Andreas originally reported a GPF oops in pci_pme_list_scan().  I
posted a refcounting patch, which made the problem go away, but I
can't explain why, and I don't want to apply it without understanding
that.  Decoding his oops shows this:

  24: 0f 1f 00 nopl   (%rax)
  27: 48 8b 50 10   mov0x10(%rax),%rdx
  2b:* 48 8b 52 38   mov0x38(%rdx),%rdx -- trapping instruction
  2f: 48 85 d2 test   %rdx,%rdx

%rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer,
which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev
that has already been freed.

pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
the pci_dev by calling pci_pme_active(), which also holds
pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
see a pci_dev that has already been freed.

If I understand Andreas correctly, 928bea9648 also fixes the crash,
even without my refcounting change.  Can you explain why?

Bjorn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-24 Thread Yinghai Lu
On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas bhelg...@google.com wrote:

 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is PCI: Delay enabling bridges until they're needed by Yinghai.

 that double disabling should be addressed by:

 https://lkml.org/lkml/2013/4/25/608

 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

 I'll look at that patch again.  I had some questions about it the
 first time, but perhaps it makes more sense after 928bea9648 has been
 applied.

 Andreas originally reported a GPF oops in pci_pme_list_scan().  I
 posted a refcounting patch, which made the problem go away, but I
 can't explain why, and I don't want to apply it without understanding
 that.  Decoding his oops shows this:

   24: 0f 1f 00 nopl   (%rax)
   27: 48 8b 50 10   mov0x10(%rax),%rdx
   2b:* 48 8b 52 38   mov0x38(%rdx),%rdx -- trapping instruction
   2f: 48 85 d2 test   %rdx,%rdx

 %rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer,
 which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
 POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev
 that has already been freed.

 pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
 pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
 the pci_dev by calling pci_pme_active(), which also holds
 pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
 see a pci_dev that has already been freed.

 If I understand Andreas correctly, 928bea9648 also fixes the crash,
 even without my refcounting change.  Can you explain why?

928bea will make the dev-enable_cnt increase wrongly, as we have
pci_enable_device for child
   pci_enable_bridge for parent
 pci_enable_bridge for grandparent
   pci_enable_device for grandparent
   pci_enable_device for parent
   pci_enable_brdige for grandparent
 pci_enable_device for grandparent.
...

in that case grandprent will be enabled two times, and will enable_cnt will have
extra increase.

so later pci_disable_device will not really call do_pci_disable_device
do the really work, as enable_cnt still big.

solution could be:
let pci_enable_bridge call __pci_enable_device.
and __pci_enable_device will not call pci_enable_bridge.

Thanks

Yinghai
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-24 Thread Yinghai Lu
On Thu, Oct 24, 2013 at 10:13 PM, Yinghai Lu ying...@kernel.org wrote:
 On Thu, Oct 24, 2013 at 8:33 PM, Bjorn Helgaas bhelg...@google.com wrote:

 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is PCI: Delay enabling bridges until they're needed by Yinghai.

 that double disabling should be addressed by:

 https://lkml.org/lkml/2013/4/25/608

 [PATCH] PCI: Remove duplicate pci_disable_device for pcie port

 I'll look at that patch again.  I had some questions about it the
 first time, but perhaps it makes more sense after 928bea9648 has been
 applied.

 Andreas originally reported a GPF oops in pci_pme_list_scan().  I
 posted a refcounting patch, which made the problem go away, but I
 can't explain why, and I don't want to apply it without understanding
 that.  Decoding his oops shows this:

   24: 0f 1f 00 nopl   (%rax)
   27: 48 8b 50 10   mov0x10(%rax),%rdx
   2b:* 48 8b 52 38   mov0x38(%rdx),%rdx -- trapping instruction
   2f: 48 85 d2 test   %rdx,%rdx

 %rax is the pci_dev pointer, so 0x10(%rax) is the dev-bus pointer,
 which we put in %rdx.  The oops says %rdx = 6b6b6b6b6b6b6b6b, which is
 POISON_FREE, so I think we loaded dev-bus out of a struct pci_dev
 that has already been freed.

 pci_pme_list_scan() holds pci_pme_list_mutex while it traverses
 pci_pme_list, and the pci_stop_and_remove_bus_device() path removes
 the pci_dev by calling pci_pme_active(), which also holds
 pci_pme_list_mutex, so I don't understand how pci_pme_list_scan() can
 see a pci_dev that has already been freed.

 If I understand Andreas correctly, 928bea9648 also fixes the crash,
 even without my refcounting change.  Can you explain why?

 928bea will make the dev-enable_cnt increase wrongly, as we have
 pci_enable_device for child
pci_enable_bridge for parent
  pci_enable_bridge for grandparent
pci_enable_device for grandparent
pci_enable_device for parent
pci_enable_brdige for grandparent
  pci_enable_device for grandparent.  === looks like i read the code 
 wrongly. This one is skipped as we have pci_is_enabled checking.

928bea delay bridge enabling, and it's pci_is_enabled checking will prevent
pci bridge get enabled second time, so enable_cnt will be only 1. instead of
2. if we enable bridge at first and then pcie port driver.

Yinghai
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-23 Thread Yinghai Lu
On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas  wrote:
> [+cc Yinghai]
>
> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever
>  wrote:
>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas  wrote:
>>> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
 On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
 > [+cc Rafael, Mika, Kirill, linux-pci]
 >
 > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
 >  wrote:
 > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
 > > crashes a few seconds later. Using
 > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
 > > to remove a bridge two levels above the device triggers the fault 
 > > immediately:
 >
 > There have been significant changes in acpiphp related to Thunderbolt
 > since v3.11.

 Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
 I'd be surprised if acpiphp makes a difference here.
>>>
>>> Yeah, you're right; I wasn't paying attention.
>>>
>>> We save a pci_dev pointer in the pci_pme_list, which of course has a
>>> longer lifetime than the pci_dev itself, but we don't acquire a reference
>>> on it, so I suspect the pci_dev got released before we got around to
>>> doing the pci_pme_list_scan().
>>>
>>> Andreas, can you try the patch below?  It's against v3.12-rc2, but it
>>> should apply to v3.11, too.
>>
>> I have tested your patch against 3.11 where it solves the problem. Thanks!
>>
>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only
>> get the following warning (and no crash):
>>
>> tg3 :0a:00.0: PME# disabled
>> pcieport :09:00.0: PME# disabled
>> pciehp :09:00.0:pcie24: unloading service driver pciehp
>> pci_bus :0a: dev 00, dec refcount to 0
>> pci_bus :0a: dev 00, released physical slot 9
>> [ cut here ]
>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
>> pci_disable_device+0x84/0x90()
>> Device pcieport
>> disabling already-disabled device
>> Modules linked in:
>>  btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus
>> vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp
>> coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel
>> videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops
>> aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul
>> ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media
>> applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid
>> pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux
>> lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc
>> snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core
>> apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata
>> xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core
>>  usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button
>> i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core
>> CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30
>> Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
>> MBP101.88Z.00EE.B03.1212211437 12/21/2012
>> Workqueue: sysfsd sysfs_schedule_callback_work
>>  0009 88044c021c00 814c4288 88044c021c48
>>  88044c021c38 81061b7d 880458a5c000 8187c5c0
>>  880458a5c000 880458a5b098  88044c021c98
>> Call Trace:
>>  [] dump_stack+0x54/0x8d
>>  [] warn_slowpath_common+0x7d/0xa0
>>  [] warn_slowpath_fmt+0x4c/0x50
>>  [] ? do_pci_disable_device+0x52/0x60
>>  [] ? acpi_pci_irq_disable+0x4c/0x8d
>>  [] pci_disable_device+0x84/0x90
>>  [] pcie_portdrv_remove+0x1a/0x20
>>  [] pci_device_remove+0x3b/0xb0
>>  [] __device_release_driver+0x7f/0xf0
>>  [] device_release_driver+0x23/0x30
>>  [] bus_remove_device+0x108/0x180
>>  [] device_del+0x135/0x1d0
>>  [] pci_stop_bus_device+0x94/0xa0
>>  [] pci_stop_bus_device+0x3b/0xa0
>>  [] pci_stop_and_remove_bus_device+0x12/0x20
>>  [] remove_callback+0x25/0x40
>>  [] sysfs_schedule_callback_work+0x14/0x80
>>  [] process_one_work+0x178/0x470
>>  [] worker_thread+0x121/0x3a0
>>  [] ? manage_workers.isra.21+0x2b0/0x2b0
>>  [] kthread+0xc0/0xd0
>>  [] ? kthread_create_on_node+0x120/0x120
>>  [] ret_from_fork+0x7c/0xb0
>>  [] ? kthread_create_on_node+0x120/0x120
>> ---[ end trace b39a15fa94fbb2a2 ]---
>>
>>
>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
>
> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.

that double disabling should be addressed by:

https://lkml.org/lkml/2013/4/25/608

[PATCH] PCI: Remove duplicate pci_disable_device for pcie port

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-23 Thread Bjorn Helgaas
On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever
 wrote:
> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas  wrote:
>> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
>>> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
>>> > [+cc Rafael, Mika, Kirill, linux-pci]
>>> >
>>> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
>>> >  wrote:
>>> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
>>> > > crashes a few seconds later. Using
>>> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
>>> > > to remove a bridge two levels above the device triggers the fault 
>>> > > immediately:
>>> >
>>> > There have been significant changes in acpiphp related to Thunderbolt
>>> > since v3.11.
>>>
>>> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
>>> I'd be surprised if acpiphp makes a difference here.
>>
>> Yeah, you're right; I wasn't paying attention.
>>
>> We save a pci_dev pointer in the pci_pme_list, which of course has a
>> longer lifetime than the pci_dev itself, but we don't acquire a reference
>> on it, so I suspect the pci_dev got released before we got around to
>> doing the pci_pme_list_scan().
>>
>> Andreas, can you try the patch below?  It's against v3.12-rc2, but it
>> should apply to v3.11, too.
>
> I have tested your patch against 3.11 where it solves the problem. Thanks!

Hi Andreas, sorry for the delay here.  I'm still trying to understand
exactly why my patch fixes the problem, since I don't see a relevant
refcounting change between v3.11 and v3.12-rc5.  And I don't actually
see the hole yet from inspection.  It seems like we should be safe
even without my patch.

But maybe it's a case of releasing the pci_bus before releasing a
pci_dev on the bus.  I thought we recently fixed a hole there, but
maybe not.  I'll look more carefully at that path.

Can I trouble you to collect a complete dmesg log from v3.11 without
my patch?  Maybe if I stare long enough at that and the lspci you
supplied, I can figure out what's going on.  If you were really
gung-ho, you could add instrumentation to print out the pci_dev and
pci_bus pointers as we enumerate them.  My guess is that we'd see one
of those pointers in the GPF register dump.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-23 Thread Bjorn Helgaas
On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever
andreas.noe...@gmail.com wrote:
 On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote:
 On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
 On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
  [+cc Rafael, Mika, Kirill, linux-pci]
 
  On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
  andreas.noe...@gmail.com wrote:
   When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
   crashes a few seconds later. Using
   echo 1  /sys/bus/pci/devices/:08:00.0/remove
   to remove a bridge two levels above the device triggers the fault 
   immediately:
 
  There have been significant changes in acpiphp related to Thunderbolt
  since v3.11.

 Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
 I'd be surprised if acpiphp makes a difference here.

 Yeah, you're right; I wasn't paying attention.

 We save a pci_dev pointer in the pci_pme_list, which of course has a
 longer lifetime than the pci_dev itself, but we don't acquire a reference
 on it, so I suspect the pci_dev got released before we got around to
 doing the pci_pme_list_scan().

 Andreas, can you try the patch below?  It's against v3.12-rc2, but it
 should apply to v3.11, too.

 I have tested your patch against 3.11 where it solves the problem. Thanks!

Hi Andreas, sorry for the delay here.  I'm still trying to understand
exactly why my patch fixes the problem, since I don't see a relevant
refcounting change between v3.11 and v3.12-rc5.  And I don't actually
see the hole yet from inspection.  It seems like we should be safe
even without my patch.

But maybe it's a case of releasing the pci_bus before releasing a
pci_dev on the bus.  I thought we recently fixed a hole there, but
maybe not.  I'll look more carefully at that path.

Can I trouble you to collect a complete dmesg log from v3.11 without
my patch?  Maybe if I stare long enough at that and the lspci you
supplied, I can figure out what's going on.  If you were really
gung-ho, you could add instrumentation to print out the pci_dev and
pci_bus pointers as we enumerate them.  My guess is that we'd see one
of those pointers in the GPF register dump.

Bjorn
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-23 Thread Yinghai Lu
On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas bhelg...@google.com wrote:
 [+cc Yinghai]

 On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever
 andreas.noe...@gmail.com wrote:
 On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote:
 On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
 On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
  [+cc Rafael, Mika, Kirill, linux-pci]
 
  On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
  andreas.noe...@gmail.com wrote:
   When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
   crashes a few seconds later. Using
   echo 1  /sys/bus/pci/devices/:08:00.0/remove
   to remove a bridge two levels above the device triggers the fault 
   immediately:
 
  There have been significant changes in acpiphp related to Thunderbolt
  since v3.11.

 Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
 I'd be surprised if acpiphp makes a difference here.

 Yeah, you're right; I wasn't paying attention.

 We save a pci_dev pointer in the pci_pme_list, which of course has a
 longer lifetime than the pci_dev itself, but we don't acquire a reference
 on it, so I suspect the pci_dev got released before we got around to
 doing the pci_pme_list_scan().

 Andreas, can you try the patch below?  It's against v3.12-rc2, but it
 should apply to v3.11, too.

 I have tested your patch against 3.11 where it solves the problem. Thanks!

 Unfortunately I could not reproduce the problem in 3.12-rc5. I only
 get the following warning (and no crash):

 tg3 :0a:00.0: PME# disabled
 pcieport :09:00.0: PME# disabled
 pciehp :09:00.0:pcie24: unloading service driver pciehp
 pci_bus :0a: dev 00, dec refcount to 0
 pci_bus :0a: dev 00, released physical slot 9
 [ cut here ]
 WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
 pci_disable_device+0x84/0x90()
 Device pcieport
 disabling already-disabled device
 Modules linked in:
  btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus
 vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp
 coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel
 videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops
 aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul
 ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media
 applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid
 pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux
 lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc
 snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core
 apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata
 xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core
  usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button
 i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core
 CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30
 Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
 MBP101.88Z.00EE.B03.1212211437 12/21/2012
 Workqueue: sysfsd sysfs_schedule_callback_work
  0009 88044c021c00 814c4288 88044c021c48
  88044c021c38 81061b7d 880458a5c000 8187c5c0
  880458a5c000 880458a5b098  88044c021c98
 Call Trace:
  [814c4288] dump_stack+0x54/0x8d
  [81061b7d] warn_slowpath_common+0x7d/0xa0
  [81061bec] warn_slowpath_fmt+0x4c/0x50
  [812bdd92] ? do_pci_disable_device+0x52/0x60
  [813097f3] ? acpi_pci_irq_disable+0x4c/0x8d
  [812bde24] pci_disable_device+0x84/0x90
  [812cc62a] pcie_portdrv_remove+0x1a/0x20
  [812bfcdb] pci_device_remove+0x3b/0xb0
  [81381caf] __device_release_driver+0x7f/0xf0
  [81381d43] device_release_driver+0x23/0x30
  [813814d8] bus_remove_device+0x108/0x180
  [8137de75] device_del+0x135/0x1d0
  [812ba394] pci_stop_bus_device+0x94/0xa0
  [812ba33b] pci_stop_bus_device+0x3b/0xa0
  [812ba4a2] pci_stop_and_remove_bus_device+0x12/0x20
  [812c15c5] remove_callback+0x25/0x40
  [81212ad4] sysfs_schedule_callback_work+0x14/0x80
  [8107c9e8] process_one_work+0x178/0x470
  [8107d3b1] worker_thread+0x121/0x3a0
  [8107d290] ? manage_workers.isra.21+0x2b0/0x2b0
  [810840f0] kthread+0xc0/0xd0
  [81084030] ? kthread_create_on_node+0x120/0x120
  [814d2dfc] ret_from_fork+0x7c/0xb0
  [81084030] ? kthread_create_on_node+0x120/0x120
 ---[ end trace b39a15fa94fbb2a2 ]---


 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

 This is PCI: Delay enabling bridges until they're needed by Yinghai.

that double disabling should be addressed by:

https://lkml.org/lkml/2013/4/25/608

[PATCH] PCI: Remove duplicate pci_disable_device for pcie port

Thanks

Yinghai
--
To unsubscribe from this list: send the line unsubscribe 

Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-22 Thread Bjorn Helgaas
[+cc Yinghai]

On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever
 wrote:
> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas  wrote:
>> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
>>> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
>>> > [+cc Rafael, Mika, Kirill, linux-pci]
>>> >
>>> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
>>> >  wrote:
>>> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
>>> > > crashes a few seconds later. Using
>>> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
>>> > > to remove a bridge two levels above the device triggers the fault 
>>> > > immediately:
>>> >
>>> > There have been significant changes in acpiphp related to Thunderbolt
>>> > since v3.11.
>>>
>>> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
>>> I'd be surprised if acpiphp makes a difference here.
>>
>> Yeah, you're right; I wasn't paying attention.
>>
>> We save a pci_dev pointer in the pci_pme_list, which of course has a
>> longer lifetime than the pci_dev itself, but we don't acquire a reference
>> on it, so I suspect the pci_dev got released before we got around to
>> doing the pci_pme_list_scan().
>>
>> Andreas, can you try the patch below?  It's against v3.12-rc2, but it
>> should apply to v3.11, too.
>
> I have tested your patch against 3.11 where it solves the problem. Thanks!
>
> Unfortunately I could not reproduce the problem in 3.12-rc5. I only
> get the following warning (and no crash):
>
> tg3 :0a:00.0: PME# disabled
> pcieport :09:00.0: PME# disabled
> pciehp :09:00.0:pcie24: unloading service driver pciehp
> pci_bus :0a: dev 00, dec refcount to 0
> pci_bus :0a: dev 00, released physical slot 9
> [ cut here ]
> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
> pci_disable_device+0x84/0x90()
> Device pcieport
> disabling already-disabled device
> Modules linked in:
>  btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus
> vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp
> coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel
> videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops
> aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul
> ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media
> applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid
> pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux
> lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc
> snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core
> apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata
> xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core
>  usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button
> i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core
> CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30
> Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
> MBP101.88Z.00EE.B03.1212211437 12/21/2012
> Workqueue: sysfsd sysfs_schedule_callback_work
>  0009 88044c021c00 814c4288 88044c021c48
>  88044c021c38 81061b7d 880458a5c000 8187c5c0
>  880458a5c000 880458a5b098  88044c021c98
> Call Trace:
>  [] dump_stack+0x54/0x8d
>  [] warn_slowpath_common+0x7d/0xa0
>  [] warn_slowpath_fmt+0x4c/0x50
>  [] ? do_pci_disable_device+0x52/0x60
>  [] ? acpi_pci_irq_disable+0x4c/0x8d
>  [] pci_disable_device+0x84/0x90
>  [] pcie_portdrv_remove+0x1a/0x20
>  [] pci_device_remove+0x3b/0xb0
>  [] __device_release_driver+0x7f/0xf0
>  [] device_release_driver+0x23/0x30
>  [] bus_remove_device+0x108/0x180
>  [] device_del+0x135/0x1d0
>  [] pci_stop_bus_device+0x94/0xa0
>  [] pci_stop_bus_device+0x3b/0xa0
>  [] pci_stop_and_remove_bus_device+0x12/0x20
>  [] remove_callback+0x25/0x40
>  [] sysfs_schedule_callback_work+0x14/0x80
>  [] process_one_work+0x178/0x470
>  [] worker_thread+0x121/0x3a0
>  [] ? manage_workers.isra.21+0x2b0/0x2b0
>  [] kthread+0xc0/0xd0
>  [] ? kthread_create_on_node+0x120/0x120
>  [] ret_from_fork+0x7c/0xb0
>  [] ? kthread_create_on_node+0x120/0x120
> ---[ end trace b39a15fa94fbb2a2 ]---
>
>
> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

This is "PCI: Delay enabling bridges until they're needed" by Yinghai.

Yinghai, please comment.

> From this commit on the pci_pme_list_scan crash disappears and the
> warning appears.
>
> Since this commit seems to just mask the problem I went ahead and
> tested your patch on 3.12-rc5 as well. It seems to work (not crash)
> but the warning is still there.
>
> The above warning was triggered by removing the 08 bridge via sysfs.
> The same warning can be triggered by unplugging the adapter (dmesg
> below). The ethernet card is removed immediately. The bridges follow
> 15 seconds later together with the warning. The topology is:
> 06:03.0 -- 08 -- 09 -- 0a 

Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-22 Thread Bjorn Helgaas
[+cc Yinghai]

On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever
andreas.noe...@gmail.com wrote:
 On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas bhelg...@google.com wrote:
 On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
 On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
  [+cc Rafael, Mika, Kirill, linux-pci]
 
  On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
  andreas.noe...@gmail.com wrote:
   When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
   crashes a few seconds later. Using
   echo 1  /sys/bus/pci/devices/:08:00.0/remove
   to remove a bridge two levels above the device triggers the fault 
   immediately:
 
  There have been significant changes in acpiphp related to Thunderbolt
  since v3.11.

 Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
 I'd be surprised if acpiphp makes a difference here.

 Yeah, you're right; I wasn't paying attention.

 We save a pci_dev pointer in the pci_pme_list, which of course has a
 longer lifetime than the pci_dev itself, but we don't acquire a reference
 on it, so I suspect the pci_dev got released before we got around to
 doing the pci_pme_list_scan().

 Andreas, can you try the patch below?  It's against v3.12-rc2, but it
 should apply to v3.11, too.

 I have tested your patch against 3.11 where it solves the problem. Thanks!

 Unfortunately I could not reproduce the problem in 3.12-rc5. I only
 get the following warning (and no crash):

 tg3 :0a:00.0: PME# disabled
 pcieport :09:00.0: PME# disabled
 pciehp :09:00.0:pcie24: unloading service driver pciehp
 pci_bus :0a: dev 00, dec refcount to 0
 pci_bus :0a: dev 00, released physical slot 9
 [ cut here ]
 WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
 pci_disable_device+0x84/0x90()
 Device pcieport
 disabling already-disabled device
 Modules linked in:
  btusb bluetooth joydev hid_apple bcm5974 nls_utf8 nls_cp437 hfsplus
 vfat fat snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp
 coretemp kvm_intel kvm cfg80211 uvcvideo crc32_pclmul crc32c_intel
 videobuf2_vmalloc ghash_clmulni_intel aesni_intel videobuf2_memops
 aes_x86_64 glue_helper videobuf2_core tg3 videodev lrw gf128mul
 ablk_helper iTCO_wdt hid_generic iTCO_vendor_support cryptd media
 applesmc input_polldev usbhid ptp microcode snd_hda_codec_cirrus hid
 pps_core libphy rfkill i2c_i801 pcspkr snd_hda_intel apple_gmux
 lib80211 snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm snd_page_alloc
 snd_timer mei_me snd mei processor soundcore lpc_ich evdev mfd_core
 apple_bl ac battery ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata
 xhci_hcd ehci_pci sdhci_pci ehci_hcd sdhci scsi_mod mmc_core
  usbcore usb_common nouveau mxm_wmi wmi ttm i915 video button
 i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core
 CPU: 0 PID: 122 Comm: kworker/u16:5 Not tainted 3.12.0-1-dirty #30
 Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
 MBP101.88Z.00EE.B03.1212211437 12/21/2012
 Workqueue: sysfsd sysfs_schedule_callback_work
  0009 88044c021c00 814c4288 88044c021c48
  88044c021c38 81061b7d 880458a5c000 8187c5c0
  880458a5c000 880458a5b098  88044c021c98
 Call Trace:
  [814c4288] dump_stack+0x54/0x8d
  [81061b7d] warn_slowpath_common+0x7d/0xa0
  [81061bec] warn_slowpath_fmt+0x4c/0x50
  [812bdd92] ? do_pci_disable_device+0x52/0x60
  [813097f3] ? acpi_pci_irq_disable+0x4c/0x8d
  [812bde24] pci_disable_device+0x84/0x90
  [812cc62a] pcie_portdrv_remove+0x1a/0x20
  [812bfcdb] pci_device_remove+0x3b/0xb0
  [81381caf] __device_release_driver+0x7f/0xf0
  [81381d43] device_release_driver+0x23/0x30
  [813814d8] bus_remove_device+0x108/0x180
  [8137de75] device_del+0x135/0x1d0
  [812ba394] pci_stop_bus_device+0x94/0xa0
  [812ba33b] pci_stop_bus_device+0x3b/0xa0
  [812ba4a2] pci_stop_and_remove_bus_device+0x12/0x20
  [812c15c5] remove_callback+0x25/0x40
  [81212ad4] sysfs_schedule_callback_work+0x14/0x80
  [8107c9e8] process_one_work+0x178/0x470
  [8107d3b1] worker_thread+0x121/0x3a0
  [8107d290] ? manage_workers.isra.21+0x2b0/0x2b0
  [810840f0] kthread+0xc0/0xd0
  [81084030] ? kthread_create_on_node+0x120/0x120
  [814d2dfc] ret_from_fork+0x7c/0xb0
  [81084030] ? kthread_create_on_node+0x120/0x120
 ---[ end trace b39a15fa94fbb2a2 ]---


 Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .

This is PCI: Delay enabling bridges until they're needed by Yinghai.

Yinghai, please comment.

 From this commit on the pci_pme_list_scan crash disappears and the
 warning appears.

 Since this commit seems to just mask the problem I went ahead and
 tested your patch on 3.12-rc5 as well. It seems to work (not crash)
 but the warning is still there.

 The above warning 

Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-16 Thread Bjorn Helgaas
On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
> > [+cc Rafael, Mika, Kirill, linux-pci]
> > 
> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
> >  wrote:
> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
> > > crashes a few seconds later. Using
> > > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
> > > to remove a bridge two levels above the device triggers the fault 
> > > immediately:
> > 
> > There have been significant changes in acpiphp related to Thunderbolt
> > since v3.11.
> 
> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. 
> I'd be surprised if acpiphp makes a difference here.

Yeah, you're right; I wasn't paying attention.

We save a pci_dev pointer in the pci_pme_list, which of course has a
longer lifetime than the pci_dev itself, but we don't acquire a reference
on it, so I suspect the pci_dev got released before we got around to
doing the pci_pme_list_scan().

Andreas, can you try the patch below?  It's against v3.12-rc2, but it
should apply to v3.11, too.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ad7fc72..8b0a2f3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work)
pci_pme_wakeup(pme_dev->dev, NULL);
} else {
list_del(_dev->list);
+   pci_dev_put(pme_dev->dev);
kfree(pme_dev);
}
}
@@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
  GFP_KERNEL);
if (!pme_dev)
goto out;
-   pme_dev->dev = dev;
+   pme_dev->dev = pci_dev_get(dev);
mutex_lock(_pme_list_mutex);
list_add(_dev->list, _pme_list);
if (list_is_singular(_pme_list))
@@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
list_for_each_entry(pme_dev, _pme_list, list) {
if (pme_dev->dev == dev) {
list_del(_dev->list);
+   pci_dev_put(pme_dev->dev);
kfree(pme_dev);
break;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-16 Thread Bjorn Helgaas
On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
 On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
  [+cc Rafael, Mika, Kirill, linux-pci]
  
  On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
  andreas.noe...@gmail.com wrote:
   When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
   crashes a few seconds later. Using
   echo 1  /sys/bus/pci/devices/:08:00.0/remove
   to remove a bridge two levels above the device triggers the fault 
   immediately:
  
  There have been significant changes in acpiphp related to Thunderbolt
  since v3.11.
 
 Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. 
 I'd be surprised if acpiphp makes a difference here.

Yeah, you're right; I wasn't paying attention.

We save a pci_dev pointer in the pci_pme_list, which of course has a
longer lifetime than the pci_dev itself, but we don't acquire a reference
on it, so I suspect the pci_dev got released before we got around to
doing the pci_pme_list_scan().

Andreas, can you try the patch below?  It's against v3.12-rc2, but it
should apply to v3.11, too.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ad7fc72..8b0a2f3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work)
pci_pme_wakeup(pme_dev-dev, NULL);
} else {
list_del(pme_dev-list);
+   pci_dev_put(pme_dev-dev);
kfree(pme_dev);
}
}
@@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
  GFP_KERNEL);
if (!pme_dev)
goto out;
-   pme_dev-dev = dev;
+   pme_dev-dev = pci_dev_get(dev);
mutex_lock(pci_pme_list_mutex);
list_add(pme_dev-list, pci_pme_list);
if (list_is_singular(pci_pme_list))
@@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
list_for_each_entry(pme_dev, pci_pme_list, list) {
if (pme_dev-dev == dev) {
list_del(pme_dev-list);
+   pci_dev_put(pme_dev-dev);
kfree(pme_dev);
break;
}
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-14 Thread Matthew Garrett
On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
> [+cc Rafael, Mika, Kirill, linux-pci]
> 
> On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
>  wrote:
> > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
> > crashes a few seconds later. Using
> > echo 1 > /sys/bus/pci/devices/:08:00.0/remove
> > to remove a bridge two levels above the device triggers the fault 
> > immediately:
> 
> There have been significant changes in acpiphp related to Thunderbolt
> since v3.11.

Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. 
I'd be surprised if acpiphp makes a difference here.

(Whine whine Intel continuing to refuse to provide documentation for a 
widely shipped part whine whine)

-- 
Matthew Garrett | mj...@srcf.ucam.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-14 Thread Bjorn Helgaas
[+cc Rafael, Mika, Kirill, linux-pci]

On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
 wrote:
> When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
> crashes a few seconds later. Using
> echo 1 > /sys/bus/pci/devices/:08:00.0/remove
> to remove a bridge two levels above the device triggers the fault immediately:

There have been significant changes in acpiphp related to Thunderbolt
since v3.11.  Any chance you can try reproduce this problem on a
current kernel, e.g., v3.12-rc5?  If it still happens, can you collect
a complete dmesg log, acpidump, and "lspci -vv" output, and attach
them to a new http://bugzilla.kernel.org report?

Since you're doing a remove two levels above the Thunderbolt device,
and it looks like pciehp is handling this part, you might be seeing
something new, but the info above will still be a good start in
looking at it.

Bjorn

> pciehp :09:00.0:pcie24: unloading service driver pciehp
> pci_bus :0a: busn_res: [bus 0a] is released
> pci_bus :09: busn_res: [bus 09-0a] is released
> general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> 
> Workqueue: events pci_pme_list_scan
> task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000
> RIP: 0010:[]  [] 
> pci_pme_list_scan+0x3c/0xe0
> RSP: 0018:88044ac63e10  EFLAGS: 00010202
> RAX: 88045601e7b0 RBX: 8187b070 RCX: 
> RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8
> RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0
> R10: 0001f9e0c287afc0 R11:  R12: 880453250ca8
> R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8
> FS:  () GS:88046d04() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0
> Stack:
>  8187afc0 88044a920e40 88044ac63e68 8107ddd6
>  6d053d00  88046d053d00 88046d053d18
>  88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8
> Call Trace:
>  [] process_one_work+0x176/0x470
>  [] worker_thread+0x11b/0x3a0
>  [] ? manage_workers.isra.21+0x2b0/0x2b0
>  [] kthread+0xc0/0xd0
>  [] ? kthread_create_on_node+0x110/0x110
>  [] ret_from_fork+0x7c/0xb0
>  [] ? kthread_create_on_node+0x110/0x110
> Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81
> 0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 <48>
> 8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7
> RIP  [] pci_pme_list_scan+0x3c/0xe0
>  RSP 
> ---[ end trace 3905f90a7dacf7b3 ]---
>
> The offending line is:
> (gdb) list *(pci_pme_list_scan+0x3c)
> 0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551).
> 1546if (!list_empty(_pme_list)) {
> 1547list_for_each_entry_safe(pme_dev, n,
> _pme_list, list) {
> 1548if (pme_dev->dev->pme_poll) {
> 1549struct pci_dev *bridge;
> 1550
> 1551bridge = pme_dev->dev->bus->self;
> 1552/*
> 1553 * If bridge is in low power state, 
> the
> 1554 * configuration space of
> subordinate devices
> 1555 * may be not accessible
> If I read the disassembly correctly then the deref of bus seems to
> cause the oops.
>
> An almost identical bug was reported (and fixed) some time ago:
> http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html
>
> Andreas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-14 Thread Andreas Noever
When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
crashes a few seconds later. Using
echo 1 > /sys/bus/pci/devices/:08:00.0/remove
to remove a bridge two levels above the device triggers the fault immediately:

pciehp :09:00.0:pcie24: unloading service driver pciehp
pci_bus :0a: busn_res: [bus 0a] is released
pci_bus :09: busn_res: [bus 09-0a] is released
general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC

Workqueue: events pci_pme_list_scan
task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000
RIP: 0010:[]  [] pci_pme_list_scan+0x3c/0xe0
RSP: 0018:88044ac63e10  EFLAGS: 00010202
RAX: 88045601e7b0 RBX: 8187b070 RCX: 
RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8
RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0
R10: 0001f9e0c287afc0 R11:  R12: 880453250ca8
R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8
FS:  () GS:88046d04() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0
Stack:
 8187afc0 88044a920e40 88044ac63e68 8107ddd6
 6d053d00  88046d053d00 88046d053d18
 88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8
Call Trace:
 [] process_one_work+0x176/0x470
 [] worker_thread+0x11b/0x3a0
 [] ? manage_workers.isra.21+0x2b0/0x2b0
 [] kthread+0xc0/0xd0
 [] ? kthread_create_on_node+0x110/0x110
 [] ret_from_fork+0x7c/0xb0
 [] ? kthread_create_on_node+0x110/0x110
Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81
0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 <48>
8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7
RIP  [] pci_pme_list_scan+0x3c/0xe0
 RSP 
---[ end trace 3905f90a7dacf7b3 ]---

The offending line is:
(gdb) list *(pci_pme_list_scan+0x3c)
0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551).
1546if (!list_empty(_pme_list)) {
1547list_for_each_entry_safe(pme_dev, n,
_pme_list, list) {
1548if (pme_dev->dev->pme_poll) {
1549struct pci_dev *bridge;
1550
1551bridge = pme_dev->dev->bus->self;
1552/*
1553 * If bridge is in low power state, the
1554 * configuration space of
subordinate devices
1555 * may be not accessible
If I read the disassembly correctly then the deref of bus seems to
cause the oops.

An almost identical bug was reported (and fixed) some time ago:
http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-14 Thread Andreas Noever
When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
crashes a few seconds later. Using
echo 1  /sys/bus/pci/devices/:08:00.0/remove
to remove a bridge two levels above the device triggers the fault immediately:

pciehp :09:00.0:pcie24: unloading service driver pciehp
pci_bus :0a: busn_res: [bus 0a] is released
pci_bus :09: busn_res: [bus 09-0a] is released
general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC

Workqueue: events pci_pme_list_scan
task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000
RIP: 0010:[812bdc8c]  [812bdc8c] pci_pme_list_scan+0x3c/0xe0
RSP: 0018:88044ac63e10  EFLAGS: 00010202
RAX: 88045601e7b0 RBX: 8187b070 RCX: 
RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8
RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0
R10: 0001f9e0c287afc0 R11:  R12: 880453250ca8
R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8
FS:  () GS:88046d04() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0
Stack:
 8187afc0 88044a920e40 88044ac63e68 8107ddd6
 6d053d00  88046d053d00 88046d053d18
 88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8
Call Trace:
 [8107ddd6] process_one_work+0x176/0x470
 [8107e79b] worker_thread+0x11b/0x3a0
 [8107e680] ? manage_workers.isra.21+0x2b0/0x2b0
 [810855e0] kthread+0xc0/0xd0
 [81085520] ? kthread_create_on_node+0x110/0x110
 [814f4c2c] ret_from_fork+0x7c/0xb0
 [81085520] ? kthread_create_on_node+0x110/0x110
Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81
0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 48
8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7
RIP  [812bdc8c] pci_pme_list_scan+0x3c/0xe0
 RSP 88044ac63e10
---[ end trace 3905f90a7dacf7b3 ]---

The offending line is:
(gdb) list *(pci_pme_list_scan+0x3c)
0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551).
1546if (!list_empty(pci_pme_list)) {
1547list_for_each_entry_safe(pme_dev, n,
pci_pme_list, list) {
1548if (pme_dev-dev-pme_poll) {
1549struct pci_dev *bridge;
1550
1551bridge = pme_dev-dev-bus-self;
1552/*
1553 * If bridge is in low power state, the
1554 * configuration space of
subordinate devices
1555 * may be not accessible
If I read the disassembly correctly then the deref of bus seems to
cause the oops.

An almost identical bug was reported (and fixed) some time ago:
http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html

Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-14 Thread Bjorn Helgaas
[+cc Rafael, Mika, Kirill, linux-pci]

On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
andreas.noe...@gmail.com wrote:
 When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
 crashes a few seconds later. Using
 echo 1  /sys/bus/pci/devices/:08:00.0/remove
 to remove a bridge two levels above the device triggers the fault immediately:

There have been significant changes in acpiphp related to Thunderbolt
since v3.11.  Any chance you can try reproduce this problem on a
current kernel, e.g., v3.12-rc5?  If it still happens, can you collect
a complete dmesg log, acpidump, and lspci -vv output, and attach
them to a new http://bugzilla.kernel.org report?

Since you're doing a remove two levels above the Thunderbolt device,
and it looks like pciehp is handling this part, you might be seeing
something new, but the info above will still be a good start in
looking at it.

Bjorn

 pciehp :09:00.0:pcie24: unloading service driver pciehp
 pci_bus :0a: busn_res: [bus 0a] is released
 pci_bus :09: busn_res: [bus 09-0a] is released
 general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
 
 Workqueue: events pci_pme_list_scan
 task: 88044b0b8000 ti: 88044ac62000 task.ti: 88044ac62000
 RIP: 0010:[812bdc8c]  [812bdc8c] 
 pci_pme_list_scan+0x3c/0xe0
 RSP: 0018:88044ac63e10  EFLAGS: 00010202
 RAX: 88045601e7b0 RBX: 8187b070 RCX: 
 RDX: 6b6b6b6b6b6b6b6b RSI: 88044ac63da0 RDI: 880453250ca8
 RBP: 88044ac63e20 R08: 88044ac63da0 R09: 0001f9e0c287afc0
 R10: 0001f9e0c287afc0 R11:  R12: 880453250ca8
 R13: 88046d053d00 R14: 88046d058200 R15: 8187afc8
 FS:  () GS:88046d04() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7fd301d57000 CR3: 0280d000 CR4: 001407e0
 Stack:
  8187afc0 88044a920e40 88044ac63e68 8107ddd6
  6d053d00  88046d053d00 88046d053d18
  88044a920e70 88044b0b8000 88044a920e40 88044ac63ec8
 Call Trace:
  [8107ddd6] process_one_work+0x176/0x470
  [8107e79b] worker_thread+0x11b/0x3a0
  [8107e680] ? manage_workers.isra.21+0x2b0/0x2b0
  [810855e0] kthread+0xc0/0xd0
  [81085520] ? kthread_create_on_node+0x110/0x110
  [814f4c2c] ret_from_fork+0x7c/0xb0
  [81085520] ? kthread_create_on_node+0x110/0x110
 Code: 54 53 e8 f8 c6 22 00 4c 8b 25 01 d4 5b 00 49 81 fc 70 b0 87 81
 0f 84 98 00 00 00 49 8b 1c 24 4c 89 e7 eb 36 0f 1f 00 48 8b 50 10 48
 8b 52 38 48 85 d2 74 07 8b 4a 78 85 c9 75 0a 31 f6 48 89 c7
 RIP  [812bdc8c] pci_pme_list_scan+0x3c/0xe0
  RSP 88044ac63e10
 ---[ end trace 3905f90a7dacf7b3 ]---

 The offending line is:
 (gdb) list *(pci_pme_list_scan+0x3c)
 0x812bdc8c is in pci_pme_list_scan (drivers/pci/pci.c:1551).
 1546if (!list_empty(pci_pme_list)) {
 1547list_for_each_entry_safe(pme_dev, n,
 pci_pme_list, list) {
 1548if (pme_dev-dev-pme_poll) {
 1549struct pci_dev *bridge;
 1550
 1551bridge = pme_dev-dev-bus-self;
 1552/*
 1553 * If bridge is in low power state, 
 the
 1554 * configuration space of
 subordinate devices
 1555 * may be not accessible
 If I read the disassembly correctly then the deref of bus seems to
 cause the oops.

 An almost identical bug was reported (and fixed) some time ago:
 http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/01165.html

 Andreas
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-10-14 Thread Matthew Garrett
On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
 [+cc Rafael, Mika, Kirill, linux-pci]
 
 On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
 andreas.noe...@gmail.com wrote:
  When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
  crashes a few seconds later. Using
  echo 1  /sys/bus/pci/devices/:08:00.0/remove
  to remove a bridge two levels above the device triggers the fault 
  immediately:
 
 There have been significant changes in acpiphp related to Thunderbolt
 since v3.11.

Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe. 
I'd be surprised if acpiphp makes a difference here.

(Whine whine Intel continuing to refuse to provide documentation for a 
widely shipped part whine whine)

-- 
Matthew Garrett | mj...@srcf.ucam.org
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/