Re: [driver-core PATCH v6 6/9] driver core: Probe devices asynchronously instead of the driver

2018-11-27 Thread Dan Williams
On Tue, Nov 27, 2018 at 9:58 AM Alexander Duyck
 wrote:
>
> On Mon, 2018-11-26 at 18:48 -0800, Dan Williams wrote:
> > On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> >  wrote:
> > >
> > > Probe devices asynchronously instead of the driver. This results in us
> > > seeing the same behavior if the device is registered before the driver or
> > > after. This way we can avoid serializing the initialization should the
> > > driver not be loaded until after the devices have already been added.
> > >
> > > The motivation behind this is that if we have a set of devices that
> > > take a significant amount of time to load we can greatly reduce the
> > > time to load by processing them in parallel instead of one at a time.
> > > In addition, each device can exist on a different node so placing a
> > > single thread on one CPU to initialize all of the devices for a given
> > > driver can result in poor performance on a system with multiple nodes.
> >
> > Do you have numbers on effects of this change individually? Is this
> > change necessary for the libnvdimm init speedup, or is it independent?
>
> It depends on the case. I was using X86_PMEM_LEGACY_DEVICE to spawn a
> couple of 32GB persistent memory devices. I had to use this patch and
> the async_probe option to get them loading in parallel versus serial as
> the driver load order is a bit different.
>
> Basically, as long as all the necessary drivers are loaded for libnvdimm
> you are good; however, if the device can get probed before the driver is
> loaded you run into issues, as the loading will be serialized without
> this patch.

I think we could achieve the same with something like the following:

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 77f188cd8023..66c9827efdb4 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3718,5 +3718,6 @@ static __exit void nfit_exit(void)

 module_init(nfit_init);
 module_exit(nfit_exit);
+MODULE_SOFTDEP("pre: nd_pmem");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");

...to ensure that the pmem driver is loaded and ready to service
devices before they start being discovered.
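
For systems where rebuilding the module isn't practical, the same
pre-dependency can be expressed from userspace with a modprobe.d fragment
(module names here are taken from this thread; treat the file path as an
example):

```
# /etc/modprobe.d/nfit.conf -- userspace equivalent of the MODULE_SOFTDEP
# above; modprobe will load nd_pmem before nfit whenever nfit is
# requested by name (softdeps are honored by modprobe, not raw insmod)
softdep nfit pre: nd_pmem
```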

>
> > > I am using the driver_data member of the device struct to store the driver
> > > pointer while we wait on the deferred probe call. This should be safe
> > > to do as the value will either be set to NULL on a failed probe or
> > > driver load followed by unload, or the driver value itself will be set
> > > on a successful driver load. In addition I have used the async_probe
> > > flag to add additional protection as it will be cleared if someone
> > > overwrites the driver_data member as a part of loading the driver.
> >
> > I would not put it past a device-driver to call dev_get_drvdata()
> > before dev_set_drvdata(), to check "has this device already been
> > initialized". So I don't think it is safe to assume that the core can
> > stash this information in ->driver_data. Why not put this
> > infrastructure in struct device_private?
>
> The data should be cleared before we even get to the probe call so I am
> not sure that is something we would need to worry about.

Yes it "should", but I have the sense that I have seen code that looks
at dev_get_drvdata() != NULL when it really should be looking at
dev->driver. Maybe not in leaf drivers, but bus code.

> As far as why I didn't use device_private, it was mostly just for the
> sake of space savings. I only had to add one bit to an existing
> bitfield to make the async_probe approach work, and the drvdata just
> seemed like the obvious place to put the deferred driver.

It seems device_private already has deferred_probe data, why not async_probe?
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [driver-core PATCH v6 6/9] driver core: Probe devices asynchronously instead of the driver

2018-11-27 Thread Alexander Duyck
On Mon, 2018-11-26 at 18:48 -0800, Dan Williams wrote:
> On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
>  wrote:
> > 
> > Probe devices asynchronously instead of the driver. This results in us
> > seeing the same behavior if the device is registered before the driver or
> > after. This way we can avoid serializing the initialization should the
> > driver not be loaded until after the devices have already been added.
> > 
> > The motivation behind this is that if we have a set of devices that
> > take a significant amount of time to load we can greatly reduce the time to
> > load by processing them in parallel instead of one at a time. In addition,
> > each device can exist on a different node so placing a single thread on one
> > CPU to initialize all of the devices for a given driver can result in poor
> > performance on a system with multiple nodes.
> 
> Do you have numbers on effects of this change individually? Is this
> change necessary for the libnvdimm init speedup, or is it independent?

It depends on the case. I was using X86_PMEM_LEGACY_DEVICE to spawn a
couple of 32GB persistent memory devices. I had to use this patch and
the async_probe option to get them loading in parallel versus serial as
the driver load order is a bit different.

Basically, as long as all the necessary drivers are loaded for libnvdimm
you are good; however, if the device can get probed before the driver is
loaded you run into issues, as the loading will be serialized without
this patch.

> > I am using the driver_data member of the device struct to store the driver
> > pointer while we wait on the deferred probe call. This should be safe to do
> > as the value will either be set to NULL on a failed probe or driver load
> > followed by unload, or the driver value itself will be set on a successful
> > driver load. In addition I have used the async_probe flag to add additional
> > protection as it will be cleared if someone overwrites the driver_data
> > member as a part of loading the driver.
> 
> I would not put it past a device-driver to call dev_get_drvdata()
> before dev_set_drvdata(), to check "has this device already been
> initialized". So I don't think it is safe to assume that the core can
> stash this information in ->driver_data. Why not put this
> infrastructure in struct device_private?

The data should be cleared before we even get to the probe call so I am
not sure that is something we would need to worry about.

As far as why I didn't use device_private, it was mostly just for the
sake of space savings. I only had to add one bit to an existing
bitfield to make the async_probe approach work, and the drvdata just
seemed like the obvious place to put the deferred driver.



Re: [driver-core PATCH v6 6/9] driver core: Probe devices asynchronously instead of the driver

2018-11-26 Thread Dan Williams
On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
 wrote:
>
> Probe devices asynchronously instead of the driver. This results in us
> seeing the same behavior if the device is registered before the driver or
> after. This way we can avoid serializing the initialization should the
> driver not be loaded until after the devices have already been added.
>
> The motivation behind this is that if we have a set of devices that
> take a significant amount of time to load we can greatly reduce the time to
> load by processing them in parallel instead of one at a time. In addition,
> each device can exist on a different node so placing a single thread on one
> CPU to initialize all of the devices for a given driver can result in poor
> performance on a system with multiple nodes.

Do you have numbers on effects of this change individually? Is this
change necessary for the libnvdimm init speedup, or is it independent?

> I am using the driver_data member of the device struct to store the driver
> pointer while we wait on the deferred probe call. This should be safe to do
> as the value will either be set to NULL on a failed probe or driver load
> followed by unload, or the driver value itself will be set on a successful
> driver load. In addition I have used the async_probe flag to add additional
> protection as it will be cleared if someone overwrites the driver_data
> member as a part of loading the driver.

I would not put it past a device-driver to call dev_get_drvdata()
before dev_set_drvdata(), to check "has this device already been
initialized". So I don't think it is safe to assume that the core can
stash this information in ->driver_data. Why not put this
infrastructure in struct device_private?


Re: [driver-core PATCH v6 6/9] driver core: Probe devices asynchronously instead of the driver

2018-11-08 Thread Bart Van Assche
On Thu, 2018-11-08 at 10:07 -0800, Alexander Duyck wrote:
> Probe devices asynchronously instead of the driver. This results in us
> seeing the same behavior if the device is registered before the driver or
> after. This way we can avoid serializing the initialization should the
> driver not be loaded until after the devices have already been added.

Reviewed-by: Bart Van Assche 



[driver-core PATCH v6 6/9] driver core: Probe devices asynchronously instead of the driver

2018-11-08 Thread Alexander Duyck
Probe devices asynchronously instead of the driver. This results in us
seeing the same behavior if the device is registered before the driver or
after. This way we can avoid serializing the initialization should the
driver not be loaded until after the devices have already been added.

The motivation behind this is that if we have a set of devices that
take a significant amount of time to load we can greatly reduce the time to
load by processing them in parallel instead of one at a time. In addition,
each device can exist on a different node so placing a single thread on one
CPU to initialize all of the devices for a given driver can result in poor
performance on a system with multiple nodes.

I am using the driver_data member of the device struct to store the driver
pointer while we wait on the deferred probe call. This should be safe to do
as the value will either be set to NULL on a failed probe or driver load
followed by unload, or the driver value itself will be set on a successful
driver load. In addition I have used the async_probe flag to add additional
protection as it will be cleared if someone overwrites the driver_data
member as a part of loading the driver.

Signed-off-by: Alexander Duyck 
---
 drivers/base/bus.c |   23 ++--
 drivers/base/dd.c  |   68 
 include/linux/device.h |   10 ++-
 3 files changed, 80 insertions(+), 21 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 8a630f9bd880..0cd2eadd0816 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -606,17 +606,6 @@ static ssize_t uevent_store(struct device_driver *drv, const char *buf,
 }
 static DRIVER_ATTR_WO(uevent);
 
-static void driver_attach_async(void *_drv, async_cookie_t cookie)
-{
-   struct device_driver *drv = _drv;
-   int ret;
-
-   ret = driver_attach(drv);
-
-   pr_debug("bus: '%s': driver %s async attach completed: %d\n",
-drv->bus->name, drv->name, ret);
-}
-
 /**
  * bus_add_driver - Add a driver to the bus.
  * @drv: driver.
@@ -649,15 +638,9 @@ int bus_add_driver(struct device_driver *drv)
 
	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
if (drv->bus->p->drivers_autoprobe) {
-   if (driver_allows_async_probing(drv)) {
-   pr_debug("bus: '%s': probing driver %s asynchronously\n",
-   drv->bus->name, drv->name);
-   async_schedule(driver_attach_async, drv);
-   } else {
-   error = driver_attach(drv);
-   if (error)
-   goto out_unregister;
-   }
+   error = driver_attach(drv);
+   if (error)
+   goto out_unregister;
}
module_add_driver(drv->owner, drv);
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ed19cf0d6f9a..f4e84d639c69 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -808,6 +808,7 @@ static int __device_attach(struct device *dev, bool allow_async)
ret = 1;
else {
dev->driver = NULL;
+   dev_set_drvdata(dev, NULL);
ret = 0;
}
} else {
@@ -925,6 +926,48 @@ int device_driver_attach(struct device_driver *drv, struct device *dev)
return ret;
 }
 
+static inline struct device_driver *dev_get_drv_async(const struct device *dev)
+{
+   return dev->async_probe ? dev->driver_data : NULL;
+}
+
+static inline void dev_set_drv_async(struct device *dev,
+struct device_driver *drv)
+{
+   /*
+* Set async_probe to true indicating we are waiting for this data to be
+* loaded as a potential driver.
+*/
+   dev->driver_data = drv;
+   dev->async_probe = true;
+}
+
+static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
+{
+   struct device *dev = _dev;
+   struct device_driver *drv;
+
+   __device_driver_lock(dev, dev->parent);
+
+   /*
+* If someone attempted to bind a driver either successfully or
+* unsuccessfully before we got here we should just skip the driver
+* probe call.
+*/
+   drv = dev_get_drv_async(dev);
+   if (drv && !dev->driver)
+   driver_probe_device(drv, dev);
+
+   /* We made our attempt at an async_probe, clear the flag */
+   dev->async_probe = false;
+
+   __device_driver_unlock(dev, dev->parent);
+
+   put_device(dev);
+
+   dev_dbg(dev, "async probe completed\n");
+}
+
 static int __driver_attach(struct device *dev, void *data)
 {
struct device_driver *drv = data;
@@ -952,6 +995,25 @@ static int __driver_attach(struct device *dev, void *data)
return ret;
} /* ret > 0 means positive match */
 
+   if (driver_allows_async_probing(drv)) {
+