Re: [PATCH 1/8] PCI: Add pcibios_stop_dev()

2013-07-05 Thread Bjorn Helgaas
On Thu, Jul 4, 2013 at 8:57 PM, Gavin Shan sha...@linux.vnet.ibm.com wrote:
 When stopping and removing one specific PCI device, the platform
 might need take some actions. One example is that EEH already had
 eeh cache and eeh device attached to the PCI device, and we need
 release eeh cache and device during the time. The patch introduces
 hook pcibios_stop_dev() for the purpose.

 Cc: Bjorn Helgaas bhelg...@google.com
 Cc: linux-...@vger.kernel.org
 Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
 ---
  drivers/pci/probe.c  |4 
  drivers/pci/remove.c |2 ++
  include/linux/pci.h  |1 +
  3 files changed, 7 insertions(+), 0 deletions(-)

 diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
 index 70f10fa..7167dc4 100644
 --- a/drivers/pci/probe.c
 +++ b/drivers/pci/probe.c
 @@ -1669,6 +1669,10 @@ void __weak pcibios_remove_bus(struct pci_bus *bus)
  {
  }

 +void __weak pcibios_stop_dev(struct pci_dev *dev)
 +{
 +}
 +
  struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 struct pci_ops *ops, void *sysdata, struct list_head 
 *resources)
  {
 diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
 index 8fc54b7..e329efc 100644
 --- a/drivers/pci/remove.c
 +++ b/drivers/pci/remove.c
 @@ -21,6 +21,8 @@ static void pci_stop_dev(struct pci_dev *dev)
  {
 pci_pme_active(dev, false);

 +   pcibios_stop_dev(dev);

We already have pcibios_release_device() (just merged for v3.11).
Would that work for you?  I haven't seen your whole series, so I can't
tell whether you need something different.

 if (dev-is_added) {
 pci_proc_detach_device(dev);
 pci_remove_sysfs_dev_files(dev);
 diff --git a/include/linux/pci.h b/include/linux/pci.h
 index 3a24e4f..40df783 100644
 --- a/include/linux/pci.h
 +++ b/include/linux/pci.h
 @@ -696,6 +696,7 @@ int no_pci_devices(void);
  void pcibios_resource_survey_bus(struct pci_bus *bus);
  void pcibios_add_bus(struct pci_bus *bus);
  void pcibios_remove_bus(struct pci_bus *bus);
 +void pcibios_stop_dev(struct pci_dev *dev);
  void pcibios_fixup_bus(struct pci_bus *);
  int __must_check pcibios_enable_device(struct pci_dev *, int mask);
  /* Architecture specific versions may override this (weak) */
 --
 1.7.5.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/8] PCI: Add pcibios_stop_dev()

2013-07-05 Thread Benjamin Herrenschmidt
On Fri, 2013-07-05 at 12:49 -0600, Bjorn Helgaas wrote:
 We already have pcibios_release_device() (just merged for v3.11).
 Would that work for you?  I haven't seen your whole series, so I can't
 tell whether you need something different.

Maybe... Gavin's original hook applies when the device is removed
from the tree, your new hook when the refcount expires and the
structure is about to be freed...

I *suspect* that's fine but I'll need to have a closer look (or wait
for Gavin to be back from vacation :-)

BTW. One thing we do in EEH is that if we get an error and the driver
for the device doesn't implement recovery, we simulate a hotplug.

IE. We unplug the device (currently removing it), reset it (or the
bus it's on, whatever applies, which lifts the fence established by
the EEH hardware), and re-plug it.

The current method really removes the device from the PCI subsystem,
and plugs it back. IE. We rescan the slot, and might even end up
re-assigning resources, which I'm not too fan about. It's also unclear
what quirks gets called in that re-plug case, etc...

I'm wondering whether it might be better to keep the pci_dev around,
just mark it blocked for user access, unbind the driver, reset, restore
the BARs to their original values, unblock and re-bind.

What's your take on this ?

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/8] PCI: Add pcibios_stop_dev()

2013-07-05 Thread Bjorn Helgaas
On Fri, Jul 5, 2013 at 4:36 PM, Benjamin Herrenschmidt
b...@kernel.crashing.org wrote:
 On Fri, 2013-07-05 at 12:49 -0600, Bjorn Helgaas wrote:
 We already have pcibios_release_device() (just merged for v3.11).
 Would that work for you?  I haven't seen your whole series, so I can't
 tell whether you need something different.

 Maybe... Gavin's original hook applies when the device is removed
 from the tree, your new hook when the refcount expires and the
 structure is about to be freed...

Yep, pcibios_release_device() is definitely at a different point.  I
just want to avoid exposing too many points in the device lifetime to
the arch, because it makes it so much harder to refactor things.

 I *suspect* that's fine but I'll need to have a closer look (or wait
 for Gavin to be back from vacation :-)

 BTW. One thing we do in EEH is that if we get an error and the driver
 for the device doesn't implement recovery, we simulate a hotplug.

 IE. We unplug the device (currently removing it), reset it (or the
 bus it's on, whatever applies, which lifts the fence established by
 the EEH hardware), and re-plug it.

 The current method really removes the device from the PCI subsystem,
 and plugs it back. IE. We rescan the slot, and might even end up
 re-assigning resources, which I'm not too fan about. It's also unclear
 what quirks gets called in that re-plug case, etc...

 I'm wondering whether it might be better to keep the pci_dev around,
 just mark it blocked for user access, unbind the driver, reset, restore
 the BARs to their original values, unblock and re-bind.

Personally, I like the full hotplug approach, even though it will
probably reassign resources, because it uses the standard paths that
should be better-tested, and it's clear what should happen.
Reassigning resources should be OK because we already have to do that
for the non-error handling hot-add case.

I know we have paths where we save config space, do a reset, and
restore config space.  I don't like those because some quirks aren't
applied and I don't believe restoring config space is sufficient to
make the device safe to use.  There could be internal state that we
aren't restoring.  It's also possible that a firmware update means the
device is different than it was before the reset, e.g., BARs could be
different types or sizes.

Bjorn
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/8] PCI: Add pcibios_stop_dev()

2013-07-05 Thread Benjamin Herrenschmidt
On Fri, 2013-07-05 at 16:49 -0600, Bjorn Helgaas wrote:
 I know we have paths where we save config space, do a reset, and
 restore config space.  I don't like those because some quirks aren't
 applied and I don't believe restoring config space is sufficient to
 make the device safe to use.  There could be internal state that we
 aren't restoring.  It's also possible that a firmware update means the
 device is different than it was before the reset, e.g., BARs could be
 different types or sizes.

True. Ok, I'll stick to the actual hotplug path then, you make
good points.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 1/8] PCI: Add pcibios_stop_dev()

2013-07-04 Thread Gavin Shan
When stopping and removing one specific PCI device, the platform
might need take some actions. One example is that EEH already had
eeh cache and eeh device attached to the PCI device, and we need
release eeh cache and device during the time. The patch introduces
hook pcibios_stop_dev() for the purpose.

Cc: Bjorn Helgaas bhelg...@google.com
Cc: linux-...@vger.kernel.org
Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 drivers/pci/probe.c  |4 
 drivers/pci/remove.c |2 ++
 include/linux/pci.h  |1 +
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 70f10fa..7167dc4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1669,6 +1669,10 @@ void __weak pcibios_remove_bus(struct pci_bus *bus)
 {
 }
 
+void __weak pcibios_stop_dev(struct pci_dev *dev)
+{
+}
+
 struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
struct pci_ops *ops, void *sysdata, struct list_head *resources)
 {
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 8fc54b7..e329efc 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -21,6 +21,8 @@ static void pci_stop_dev(struct pci_dev *dev)
 {
pci_pme_active(dev, false);
 
+   pcibios_stop_dev(dev);
+
if (dev-is_added) {
pci_proc_detach_device(dev);
pci_remove_sysfs_dev_files(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3a24e4f..40df783 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -696,6 +696,7 @@ int no_pci_devices(void);
 void pcibios_resource_survey_bus(struct pci_bus *bus);
 void pcibios_add_bus(struct pci_bus *bus);
 void pcibios_remove_bus(struct pci_bus *bus);
+void pcibios_stop_dev(struct pci_dev *dev);
 void pcibios_fixup_bus(struct pci_bus *);
 int __must_check pcibios_enable_device(struct pci_dev *, int mask);
 /* Architecture specific versions may override this (weak) */
-- 
1.7.5.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/8] PCI: Add pcibios_stop_dev()

2013-07-04 Thread Benjamin Herrenschmidt
On Fri, 2013-07-05 at 10:57 +0800, Gavin Shan wrote:
 When stopping and removing one specific PCI device, the platform
 might need take some actions. One example is that EEH already had
 eeh cache and eeh device attached to the PCI device, and we need
 release eeh cache and device during the time. The patch introduces
 hook pcibios_stop_dev() for the purpose.

Bjorn, any objection ? Ack ? Nack ? :-)

I'd like to put that in my tree, it's part of a series that fixes a
number of bugs with our hotplug and EEH code which I'd like to send
to Linus soonish.

Cheers,
Ben.

 Cc: Bjorn Helgaas bhelg...@google.com
 Cc: linux-...@vger.kernel.org
 Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
 ---
  drivers/pci/probe.c  |4 
  drivers/pci/remove.c |2 ++
  include/linux/pci.h  |1 +
  3 files changed, 7 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
 index 70f10fa..7167dc4 100644
 --- a/drivers/pci/probe.c
 +++ b/drivers/pci/probe.c
 @@ -1669,6 +1669,10 @@ void __weak pcibios_remove_bus(struct pci_bus *bus)
  {
  }
  
 +void __weak pcibios_stop_dev(struct pci_dev *dev)
 +{
 +}
 +
  struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
   struct pci_ops *ops, void *sysdata, struct list_head *resources)
  {
 diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
 index 8fc54b7..e329efc 100644
 --- a/drivers/pci/remove.c
 +++ b/drivers/pci/remove.c
 @@ -21,6 +21,8 @@ static void pci_stop_dev(struct pci_dev *dev)
  {
   pci_pme_active(dev, false);
  
 + pcibios_stop_dev(dev);
 +
   if (dev-is_added) {
   pci_proc_detach_device(dev);
   pci_remove_sysfs_dev_files(dev);
 diff --git a/include/linux/pci.h b/include/linux/pci.h
 index 3a24e4f..40df783 100644
 --- a/include/linux/pci.h
 +++ b/include/linux/pci.h
 @@ -696,6 +696,7 @@ int no_pci_devices(void);
  void pcibios_resource_survey_bus(struct pci_bus *bus);
  void pcibios_add_bus(struct pci_bus *bus);
  void pcibios_remove_bus(struct pci_bus *bus);
 +void pcibios_stop_dev(struct pci_dev *dev);
  void pcibios_fixup_bus(struct pci_bus *);
  int __must_check pcibios_enable_device(struct pci_dev *, int mask);
  /* Architecture specific versions may override this (weak) */


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev