Re: [RFC PATCH 1/9] kernel.h: add do_empty() macro
On 4/18/20 11:41 AM, Randy Dunlap wrote:
> --- linux-next-20200327.orig/include/linux/kernel.h
> +++ linux-next-20200327/include/linux/kernel.h
> @@ -40,6 +40,14 @@
>  #define READ			0
>  #define WRITE			1
>
> +/*
> + * When using -Wextra, an "if" statement followed by an empty block
> + * (containing only a ';') produces a warning from gcc:
> + *   warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
> + * Replace the empty body with do_empty() to silence this warning.
> + */
> +#define do_empty() do { } while (0)
> +
>  /**
>   * ARRAY_SIZE - get the number of elements in array @arr
>   * @arr: array to be sized

I'm less than enthusiastic about introducing a new macro to suppress "empty body" warnings. Anyone who encounters code in which this macro is used will have to look up its definition to learn what it does.

Has it been considered to suppress empty-body warnings by changing the empty bodies from ";" into "{}"?

Thanks,

Bart.

___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [Ksummit-discuss] [PATCH v2 0/3] Maintainer Entry Profiles
On 9/12/19 8:34 AM, Joe Perches wrote:
> On Thu, 2019-09-12 at 14:31 +0100, Bart Van Assche wrote:
>> On 9/11/19 5:40 PM, Martin K. Petersen wrote:
>>> * The patch must compile without warnings (make C=1 CF="-D__CHECK_ENDIAN__")
>>>   and does not incur any zeroday test robot complaints.
>>
>> How about adding W=1 to that make command?
>
> That's rather too compiler version dependent and new
> warnings frequently get introduced by new compiler versions.

I've never observed this myself. If a new compiler warning is added to gcc and it produces warnings that are not useful for kernel code, usually Linus or someone else is quick to suppress that warning.

Another argument in favor of W=1 is that the formatting of kernel-doc headers is checked only if W=1 is passed to make.

Bart.
Re: [Ksummit-discuss] [PATCH v2 0/3] Maintainer Entry Profiles
On 9/11/19 5:40 PM, Martin K. Petersen wrote:
> * Do not use custom To: and Cc: for individual patches. We want to see the
>   whole series, even patches that potentially need to go through a different
>   subsystem tree.

Hi Martin,

Thanks for having written this summary. This is very helpful.

For the above paragraph, should it be clarified whether that requirement applies to mailing list e-mail addresses only or also to individual e-mail addresses? When using git send-email it is easy to end up with different cc-lists per patch.

> * The patch must compile without warnings (make C=1 CF="-D__CHECK_ENDIAN__")
>   and does not incur any zeroday test robot complaints.

How about adding W=1 to that make command? How about existing drivers that trigger tons of endianness warnings, e.g. qla2xxx? How about requiring that no new warnings are introduced?

> * The patch must have a commit message that describes, comprehensively and in
>   plain English, what the patch does.

How about making this requirement more detailed and requiring that not only what has been changed is documented but also why that change has been made?

> * Patches which have been obsoleted by a later version will be set to
>   "Superceded" status.

Did you perhaps mean "Superseded"?

Thanks,

Bart.
Re: [Ksummit-discuss] [PATCH v2 0/3] Maintainer Entry Profiles
On 9/11/19 4:48 PM, Dan Williams wrote:
> At last year's Plumbers Conference I proposed the Maintainer Entry
> Profile as a document that a maintainer can provide to set contributor
> expectations and provide fodder for a discussion between maintainers
> about the merits of different maintainer policies.
>
> For those that did not attend, the goal of the Maintainer Entry Profile,
> and the Maintainer Handbook more generally, is to provide a desk
> reference for maintainers both new and experienced. The session
> introduction was:
>
>     The first rule of kernel maintenance is that there are no hard and
>     fast rules. That state of affairs is both a blessing and a curse. It
>     has served the community well to be adaptable to the different
>     people and different problem spaces that inhabit the kernel
>     community. However, that variability also leads to inconsistent
>     experiences for contributors, little to no guidance for new
>     contributors, and unnecessary stress on current maintainers. There
>     are quite a few people who have been around long enough to make
>     enough mistakes that they have gained some hard-earned proficiency.
>     However, if the kernel community expects to keep growing it needs to
>     be able both to scale the maintainers it has and to ramp up new ones
>     without necessarily letting them make a decade's worth of mistakes
>     to learn the ropes.
>
> To be clear, the proposed document does not impose or suggest new
> rules. Instead it provides an outlet to document the unwritten rules
> and policies in effect for each subsystem, and that each subsystem
> might decide differently for whatever reason.

Any maintainer who reads this might interpret it as an encouragement to establish custom policies. I think one of the conclusions of the Linux Plumbers 2019 edition is that too much diversity is bad and that we need more uniformity across kernel subsystems with regard to what is expected from patch contributors.

I would appreciate it if a summary of https://linuxplumbersconf.org/event/4/contributions/554/attachments/353/584/Reflections__Kernel_Summit_2019.pdf were integrated into the maintainer handbook.

Thanks,

Bart.
Re: [driver-core PATCH v6 9/9] libnvdimm: Schedule device registration on node local to the device
On Tue, 2018-11-27 at 12:50 -0800, Dan Williams wrote:
> Thanks Bart, so tying this back to Alex's patches, does the ordering
> problem that Alex's patches solve impact the SCSI case? I'm looking
> for something like "SCSI depends on asynchronous probing and without
> 'driver core: Establish clear order of operations for deferred probe
> and remove' probing is often needlessly serialized". I.e. does it
> suffer from the same platform problem that libnvdimm ran into, where
> its local async probing implementation was hindered by the driver
> core? (+Martin)

Hi Dan,

Patch 6/9 reduces the time needed to scan SCSI LUNs significantly. The only way to realize that speedup is by enabling more concurrency. That's why I think that patch 6/9 is a significant driver core improvement.

Bart.
Re: [driver-core PATCH v6 9/9] libnvdimm: Schedule device registration on node local to the device
On Tue, 2018-11-27 at 11:34 -0800, Dan Williams wrote:
> On Tue, Nov 27, 2018 at 10:04 AM Alexander Duyck wrote:
>> On Mon, 2018-11-26 at 18:21 -0800, Dan Williams wrote:
>>> On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck wrote:
>>>>
>>>> Force the device registration for nvdimm devices to be closer to the
>>>> actual device. This is achieved by using either the NUMA node ID of
>>>> the region, or of the parent. By doing this we can have everything
>>>> above the region based on the region, and everything below the region
>>>> based on the nvdimm bus.
>>>>
>>>> By guaranteeing NUMA locality I see an improvement of as high as 25%
>>>> for per-node init of a system with 12TB of persistent memory.
>>>
>>> It seems the speed-up is achieved with just patches 1, 2, and 9 from
>>> this series, correct? I wouldn't want to hold up that benefit while
>>> the driver-core bits are debated.
>>
>> Actually patch 6 ends up impacting things for persistent memory as
>> well. The problem is that all the async calls to add interfaces only do
>> anything if the driver is already loaded. So there are cases such as
>> the X86_PMEM_LEGACY_DEVICE case where the memory regions end up still
>> being serialized because the devices are added before the driver.
>
> Ok, but is the patch 6 change generally useful outside of the
> libnvdimm case? Yes, local hacks like MODULE_SOFTDEP are terrible for
> global problems, but what I'm trying to tease out is whether this change
> benefits other async probing subsystems outside of libnvdimm, SCSI
> perhaps? Bart can you chime in with the benefits you see so it's clear
> to Greg that the driver-core changes are a generic improvement?

Hi Dan,

For SCSI asynchronous probing is really important because when scanning SAN LUNs there is plenty of potential for concurrency due to the network delay.

I think the following quote provides the information you are looking for: "This patch reduces the time needed for loading the scsi_debug kernel module with parameters delay=0 and max_luns=256 from 0.7s to 0.1s. In other words, this specific test runs about seven times faster." Source: https://www.spinics.net/lists/linux-scsi/msg124457.html

Best regards,

Bart.
Re: [driver-core PATCH v5 5/9] driver core: Establish clear order of operations for deferred probe and remove
On 11/22/18 5:23 PM, Rong Chen wrote:
> The kbuild robot printed all warnings and the below warning was found
> in patch 5/9:
>
> "include/linux/device.h:1056: warning: Function parameter or member
> 'async_probe' not described in 'device'"

Hi Rong,

Thanks for having taken a look. That's indeed something that must be fixed in patch 5/9.

Bart.
Re: [driver-core PATCH v6 2/9] async: Add support for queueing on specific NUMA node
On 11/11/18 12:33 PM, Greg KH wrote:
> When I see new major patches show up from an Intel developer without
> _any_ other review from anyone else, directed at me, I get suspicious
> it is happening again.

Hi Greg,

Please note that I reviewed this patch four days ago. See also https://lore.kernel.org/lkml/1541720197.196084.231.ca...@acm.org/.

Thanks,

Bart.
Re: [driver-core PATCH v6 6/9] driver core: Probe devices asynchronously instead of the driver
On Thu, 2018-11-08 at 10:07 -0800, Alexander Duyck wrote:
> Probe devices asynchronously instead of the driver. This results in us
> seeing the same behavior if the device is registered before the driver or
> after. This way we can avoid serializing the initialization should the
> driver not be loaded until after the devices have already been added.

Reviewed-by: Bart Van Assche
Re: [driver-core PATCH v6 5/9] driver core: Establish clear order of operations for deferred probe and remove
On Thu, 2018-11-08 at 10:07 -0800, Alexander Duyck wrote:
> One change I made in addition is I replaced the use of "bool X:1" to define
> the bitfield with a "u8 X:1" setup in order to resolve some checkpatch
> warnings.

If you have to repost this patch series, please remove the above from the patch description since I think this part is no longer relevant. Anyway:

Reviewed-by: Bart Van Assche
Re: [driver-core PATCH v5 5/9] driver core: Establish clear order of operations for deferred probe and remove
On Tue, 2018-11-06 at 17:34 -0800, Joe Perches wrote:
> On Tue, 2018-11-06 at 15:48 -0800, Bart Van Assche wrote:
>> On Mon, 2018-11-05 at 13:12 -0800, Alexander Duyck wrote:
>>> One change I made in addition is I replaced the use of "bool X:1" to
>>> define the bitfield with a "u8 X:1" setup in order to resolve some
>>> checkpatch warnings.
>>
>> Please use "bool X:1" instead of "u8 X:1". I think it was a bad idea to
>> make checkpatch complain about "bool X:1" since "bool X:1" should only be
>> avoided in structures for which alignment must be architecture-independent.
>> For struct device it is fine if member alignment differs per architecture.
>> Additionally, changing "bool X:1" into "u8 X:1" will reduce performance on
>> architectures that cannot do byte addressing.
>
> I generally agree. But the checkpatch warning _could_
> be useful in those cases where alignment should be
> architecture-independent.
>
> Any suggestion on how to improve the message?

It would be great if a heuristic could be developed that recognizes structs for which the data layout must be architecture-independent. If such a heuristic could be developed it could be used to only warn about "bool X:n" for such structures.

Bart.
Re: [driver-core PATCH v6 2/9] async: Add support for queueing on specific NUMA node
On Thu, 2018-11-08 at 10:06 -0800, Alexander Duyck wrote:
> Introduce four new variants of the async_schedule_ functions that allow
> scheduling on a specific NUMA node.
>
> The first two functions, async_schedule_near and
> async_schedule_near_domain, end up mapping to async_schedule and
> async_schedule_domain, but provide NUMA node specific functionality. They
> replace the original functions, which were moved to inline function
> definitions that call the new functions while passing NUMA_NO_NODE.
>
> The second two functions are async_schedule_dev and
> async_schedule_dev_domain, which provide NUMA specific functionality when
> passing a device as the data member and that device has a NUMA node other
> than NUMA_NO_NODE.
>
> The main motivation behind this is to address the need to be able to
> schedule device specific init work on specific NUMA nodes in order to
> improve performance of memory initialization.

Reviewed-by: Bart Van Assche
Re: [driver-core PATCH v5 2/9] async: Add support for queueing on specific NUMA node
On Mon, 2018-11-05 at 13:11 -0800, Alexander Duyck wrote:
> +/**
> + * async_schedule_domain - schedule a function for asynchronous execution within a certain domain
> + * @func: function to execute asynchronously
> + * @data: data pointer to pass to the function
> + * @domain: the domain
> + *
> + * Returns an async_cookie_t that may be used for checkpointing later.
> + * @domain may be used in the async_synchronize_*_domain() functions to
> + * wait within a certain synchronization domain rather than globally. A
> + * synchronization domain is specified via @domain.
> + * Note: This function may be called from atomic or non-atomic contexts.
> + */

Please leave out "A synchronization domain is specified via @domain." since that text is redundant due to "@domain: the domain".

> +/**
> + * async_schedule_dev_domain - A device specific version of async_schedule_domain
> + * @func: function to execute asynchronously
> + * @dev: device argument to be passed to function
> + * @domain: the domain
> + *
> + * Returns an async_cookie_t that may be used for checkpointing later.
> + * @dev is used as both the argument for the function and to provide NUMA
> + * context for where to run the function. By doing this we can try to
> + * provide for the best possible outcome by operating on the device on the
> + * CPUs closest to the device.
> + * @domain may be used in the async_synchronize_*_domain() functions to
> + * wait within a certain synchronization domain rather than globally. A
> + * synchronization domain is specified via @domain.
> + * Note: This function may be called from atomic or non-atomic contexts.
> + */

Same comment here: I think "A synchronization domain is specified via @domain." is redundant.

> +/**
> + * async_schedule_node_domain - NUMA specific version of async_schedule_domain
> + * @func: function to execute asynchronously
> + * @data: data pointer to pass to the function
> + * @node: NUMA node that we want to schedule this on or close to
> + * @domain: the domain
> + *
> + * Returns an async_cookie_t that may be used for checkpointing later.
> + * @domain may be used in the async_synchronize_*_domain() functions to
> + * wait within a certain synchronization domain rather than globally. A
> + * synchronization domain is specified via @domain. Note: This function
> + * may be called from atomic or non-atomic contexts.
> + *
> + * The node requested will be honored on a best effort basis. If the node
> + * has no CPUs associated with it then the work is distributed among all
> + * available CPUs.
> + */

Same comment here: I think that also in the above "A synchronization domain is specified via @domain." is redundant.

Otherwise this patch looks fine to me.

Bart.
Re: [driver-core PATCH v5 9/9] libnvdimm: Schedule device registration on node local to the device
On Mon, 2018-11-05 at 13:12 -0800, Alexander Duyck wrote:
> This patch is meant to force the device registration for nvdimm devices to
> be closer to the actual device. This is achieved by using either the NUMA
> node ID of the region, or of the parent. By doing this we can have
> everything above the region based on the region, and everything below the
> region based on the nvdimm bus.
>
> By guaranteeing NUMA locality I see an improvement of as high as 25% for
> per-node init of a system with 12TB of persistent memory.

Thank you for having included the performance numbers.

Reviewed-by: Bart Van Assche
Re: [driver-core PATCH v5 8/9] PM core: Use new async_schedule_dev command
On Mon, 2018-11-05 at 13:12 -0800, Alexander Duyck wrote:
> This change makes it so that we use the device specific version of the
> async_schedule commands to defer various tasks related to power management.
> By doing this we should see a slight improvement in performance as any
> device that is sensitive to latency/locality in the setup will now be
> initializing on the node closest to the device.

Reviewed-by: Bart Van Assche
Re: [driver-core PATCH v5 7/9] driver core: Attach devices on CPU local to device node
On Mon, 2018-11-05 at 13:12 -0800, Alexander Duyck wrote:
> This change makes it so that we call the asynchronous probe routines on a
> CPU local to the device node. By doing this we should be able to improve
> our initialization time significantly as we can avoid having to access the
> device from a remote node which may introduce higher latency.
>
> For example, in the case of initializing memory for NVDIMM this can have a
> significant impact as initializing 3TB on a remote node can take up to 39
> seconds while initializing it on a local node only takes 23 seconds. It is
> situations like this where we will see the biggest improvement.
>
> Signed-off-by: Alexander Duyck
> ---
>  drivers/base/dd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 2fdfe45bb6ea..cf7681309ee3 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -834,7 +834,7 @@ static int __device_attach(struct device *dev, bool allow_async)
>  		dev_dbg(dev, "scheduling asynchronous probe\n");
>  		get_device(dev);
>  		dev->async_probe = true;
> -		async_schedule(__device_attach_async_helper, dev);
> +		async_schedule_dev(__device_attach_async_helper, dev);
>  	} else {
>  		pm_request_idle(dev);
>  	}
> @@ -992,7 +992,7 @@ static int __driver_attach(struct device *dev, void *data)
>  	if (!dev->driver) {
>  		get_device(dev);
>  		dev_set_drv_async(dev, drv);
> -		async_schedule(__driver_attach_async_helper, dev);
> +		async_schedule_dev(__driver_attach_async_helper, dev);
>  	}
>  	device_unlock(dev);
>  	return 0;

Reviewed-by: Bart Van Assche
Re: [driver-core PATCH v5 6/9] driver core: Probe devices asynchronously instead of the driver
On Mon, 2018-11-05 at 13:12 -0800, Alexander Duyck wrote:
> diff --git a/include/linux/device.h b/include/linux/device.h
> index fc7091d436c2..4d9a39769081 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> [ ... ]
> +static inline struct device_driver *dev_get_drv_async(const struct device *dev)
> +{
> +	return dev->async_probe ? dev->driver_data : NULL;
> +}
> +
> +static inline void dev_set_drv_async(struct device *dev,
> +				     struct device_driver *drv)
> +{
> +	/*
> +	 * Set async_probe to true indicating we are waiting for this data to be
> +	 * loaded as a potential driver.
> +	 */
> +	dev->driver_data = drv;
> +	dev->async_probe = true;
> +}

Since these two new functions are only used in the driver core, please move their definition into the driver core.

Thanks,

Bart.
Re: [driver-core PATCH v5 5/9] driver core: Establish clear order of operations for deferred probe and remove
On Tue, 2018-11-06 at 12:10 +0800, kbuild test robot wrote:
> Hi Alexander,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on driver-core/master]
>
> url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/Add-NUMA-aware-async_schedule-calls/20181106-093800
> reproduce: make htmldocs
>
> All warnings (new ones prefixed by >>):
>
>    include/net/mac80211.h:1001: warning: Function parameter or member
>    'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
> [ ... ]

There are plenty of references in this report to header files not touched by patch 5/9 in this series. I assume that this report indicates a bug in the 0-day testing infrastructure?

Bart.
Re: [driver-core PATCH v5 5/9] driver core: Establish clear order of operations for deferred probe and remove
On Mon, 2018-11-05 at 13:12 -0800, Alexander Duyck wrote:
> One change I made in addition is I replaced the use of "bool X:1" to define
> the bitfield with a "u8 X:1" setup in order to resolve some checkpatch
> warnings.

Please use "bool X:1" instead of "u8 X:1". I think it was a bad idea to make checkpatch complain about "bool X:1" since "bool X:1" should only be avoided in structures for which alignment must be architecture-independent. For struct device it is fine if member alignment differs per architecture. Additionally, changing "bool X:1" into "u8 X:1" will reduce performance on architectures that cannot do byte addressing.

>  static void __device_release_driver(struct device *dev, struct device *parent)
>  {
> -	struct device_driver *drv;
> +	struct device_driver *drv = dev->driver;
>
> -	drv = dev->driver;
> -	if (drv) {
> -		while (device_links_busy(dev)) {
> -			__device_driver_unlock(dev, parent);
> +	/*
> +	 * In the event that we are asked to release the driver on an
> +	 * interface that is still waiting on a probe we can just terminate
> +	 * the probe by setting async_probe to false. When the async call
> +	 * is finally completed it will see this state and just exit.
> +	 */
> +	dev->async_probe = false;
> +	if (!drv)
> +		return;
>
> -			device_links_unbind_consumers(dev);
> +	while (device_links_busy(dev)) {
> +		__device_driver_unlock(dev, parent);
>
> -			__device_driver_lock(dev, parent);
> -			/*
> -			 * A concurrent invocation of the same function might
> -			 * have released the driver successfully while this one
> -			 * was waiting, so check for that.
> -			 */
> -			if (dev->driver != drv)
> -				return;
> -		}
> +		device_links_unbind_consumers(dev);
>
> -		pm_runtime_get_sync(dev);
> -		pm_runtime_clean_up_links(dev);
> +		__device_driver_lock(dev, parent);
> +		/*
> +		 * A concurrent invocation of the same function might
> +		 * have released the driver successfully while this one
> +		 * was waiting, so check for that.
> +		 */
> +		if (dev->driver != drv)
> +			return;
> +	}
>
> -		driver_sysfs_remove(dev);
> +	pm_runtime_get_sync(dev);
> +	pm_runtime_clean_up_links(dev);
>
> -		if (dev->bus)
> -			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> -						     BUS_NOTIFY_UNBIND_DRIVER,
> -						     dev);
> +	driver_sysfs_remove(dev);
>
> -		pm_runtime_put_sync(dev);
> +	if (dev->bus)
> +		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> +					     BUS_NOTIFY_UNBIND_DRIVER,
> +					     dev);
>
> -		if (dev->bus && dev->bus->remove)
> -			dev->bus->remove(dev);
> -		else if (drv->remove)
> -			drv->remove(dev);
> +	pm_runtime_put_sync(dev);
>
> -		device_links_driver_cleanup(dev);
> -		arch_teardown_dma_ops(dev);
> +	if (dev->bus && dev->bus->remove)
> +		dev->bus->remove(dev);
> +	else if (drv->remove)
> +		drv->remove(dev);
>
> -		devres_release_all(dev);
> -		dev->driver = NULL;
> -		dev_set_drvdata(dev, NULL);
> -		if (dev->pm_domain && dev->pm_domain->dismiss)
> -			dev->pm_domain->dismiss(dev);
> -		pm_runtime_reinit(dev);
> -		dev_pm_set_driver_flags(dev, 0);
> +	device_links_driver_cleanup(dev);
> +	arch_teardown_dma_ops(dev);
> +
> +	devres_release_all(dev);
> +	dev->driver = NULL;
> +	dev_set_drvdata(dev, NULL);
> +	if (dev->pm_domain && dev->pm_domain->dismiss)
> +		dev->pm_domain->dismiss(dev);
> +	pm_runtime_reinit(dev);
> +	dev_pm_set_driver_flags(dev, 0);
>
> -		klist_remove(&dev->p->knode_driver);
> -		device_pm_check_callbacks(dev);
> -		if (dev->bus)
> -			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> -						     BUS_NOTIFY_UNBOUND_DRIVER,
> -						     dev);
> +	klist_remove(&dev->p->knode_driver);
> +	device_pm_check_callbacks(dev);
> +	if (dev->bus)
> +		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> +					     BUS_NOTIFY_UNBOUND_DRIVER,
> +					     dev);
>
> -		kobject_uevent(&dev->kobj, KOBJ_UNBIND);
> -	}
> +
Re: [driver-core PATCH v5 4/9] driver core: Move async_synchronize_full call
On Tue, 2018-11-06 at 08:18 -0800, Alexander Duyck wrote:
> On Mon, 2018-11-05 at 17:04 -0800, Bart Van Assche wrote:
>> On Mon, 2018-11-05 at 13:11 -0800, Alexander Duyck wrote:
>>> This patch moves the async_synchronize_full call out of
>>> __device_release_driver and into driver_detach.
>>>
>>> The idea behind this is that the async_synchronize_full call will only
>>> guarantee that any existing async operations are flushed. This doesn't
>>> do anything to guarantee that a hotplug event that may occur while we
>>> are doing the release of the driver will not be asynchronously
>>> scheduled.
>>>
>>> By moving this into the driver_detach path we can avoid potential
>>> deadlocks as we aren't holding the device lock at this point and we
>>> should not have the driver we want to flush loaded so the flush will
>>> take care of any asynchronous events the driver we are detaching might
>>> have scheduled.
>>>
>>> Signed-off-by: Alexander Duyck
>>> ---
>>>  drivers/base/dd.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>>> index 76c40fe69463..e74cefeb5b69 100644
>>> --- a/drivers/base/dd.c
>>> +++ b/drivers/base/dd.c
>>> @@ -975,9 +975,6 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>>>
>>>  	drv = dev->driver;
>>>  	if (drv) {
>>> -		if (driver_allows_async_probing(drv))
>>> -			async_synchronize_full();
>>> -
>>>  		while (device_links_busy(dev)) {
>>>  			__device_driver_unlock(dev, parent);
>>>
>>> @@ -1087,6 +1084,9 @@ void driver_detach(struct device_driver *drv)
>>>  	struct device_private *dev_prv;
>>>  	struct device *dev;
>>>
>>> +	if (driver_allows_async_probing(drv))
>>> +		async_synchronize_full();
>>> +
>>>  	for (;;) {
>>>  		spin_lock(&drv->p->klist_devices.k_lock);
>>>  		if (list_empty(&drv->p->klist_devices.k_list)) {
>>
>> Have you considered moving that async_synchronize_full() call into
>> bus_remove_driver()? Verifying the correctness of this patch requires
>> checking whether the async_synchronize_full() comes after the
>> klist_remove(&drv->p->knode_bus) call. That verification is easier when
>> the async_synchronize_full() call occurs in bus_remove_driver() instead
>> of in driver_detach().
>
> I considered it, however it ends up being more symmetric to have us take
> care of synchronizing things in driver_detach since after this patch set
> we are scheduling things asynchronously in driver_attach.
>
> Also I don't think it would be any great risk simply because calling
> driver_detach with the driver still associated with the bus would be a
> blatant error, as it could easily lead to issues where you unbind a
> driver but have it get hotplugged to a device while that is going on.

Thanks for the additional clarification. Since I'm fine with this patch:

Reviewed-by: Bart Van Assche
Re: [driver-core PATCH v5 4/9] driver core: Move async_synchronize_full call
On Mon, 2018-11-05 at 13:11 -0800, Alexander Duyck wrote:
> This patch moves the async_synchronize_full call out of
> __device_release_driver and into driver_detach.
>
> The idea behind this is that the async_synchronize_full call will only
> guarantee that any existing async operations are flushed. This doesn't do
> anything to guarantee that a hotplug event that may occur while we are
> doing the release of the driver will not be asynchronously scheduled.
>
> By moving this into the driver_detach path we can avoid potential
> deadlocks as we aren't holding the device lock at this point and we
> should not have the driver we want to flush loaded so the flush will
> take care of any asynchronous events the driver we are detaching might
> have scheduled.
>
> Signed-off-by: Alexander Duyck
> ---
>  drivers/base/dd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 76c40fe69463..e74cefeb5b69 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -975,9 +975,6 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>
>  	drv = dev->driver;
>  	if (drv) {
> -		if (driver_allows_async_probing(drv))
> -			async_synchronize_full();
> -
>  		while (device_links_busy(dev)) {
>  			__device_driver_unlock(dev, parent);
>
> @@ -1087,6 +1084,9 @@ void driver_detach(struct device_driver *drv)
>  	struct device_private *dev_prv;
>  	struct device *dev;
>
> +	if (driver_allows_async_probing(drv))
> +		async_synchronize_full();
> +
>  	for (;;) {
>  		spin_lock(&drv->p->klist_devices.k_lock);
>  		if (list_empty(&drv->p->klist_devices.k_list)) {

Have you considered moving that async_synchronize_full() call into bus_remove_driver()? Verifying the correctness of this patch requires checking whether the async_synchronize_full() comes after the klist_remove(&drv->p->knode_bus) call. That verification is easier when the async_synchronize_full() call occurs in bus_remove_driver() instead of in driver_detach().

Thanks,

Bart.
Re: [driver-core PATCH v5 0/9] Add NUMA aware async_schedule calls
On Mon, 2018-11-05 at 13:11 -0800, Alexander Duyck wrote:
> This patch set provides functionality that will help to improve the
> locality of the async_schedule calls used to provide deferred
> initialization.

Hi Alexander,

Is this patch series perhaps available in a public git tree? That would make it easier for me to test my sd changes on top of your patches than applying this patch series myself on kernel v4.20-rc1.

Thanks,

Bart.
Re: [driver-core PATCH v5 1/9] workqueue: Provide queue_work_node to queue work near a given NUMA node
On Mon, 2018-11-05 at 13:11 -0800, Alexander Duyck wrote:
> +/**
> + * workqueue_select_cpu_near - Select a CPU based on NUMA node
> + * @node: NUMA node ID that we want to bind a CPU from

select?

> +	/* If CPU is valid return that, otherwise just defer */
> +	return (cpu < nr_cpu_ids) ? cpu : WORK_CPU_UNBOUND;

Please leave out the superfluous parentheses if this patch series would
have to be reposted. Anyway:

Reviewed-by: Bart Van Assche
Re: [PATCH v7 02/13] PCI/P2PDMA: Add sysfs group to display p2pmem stats
On Tue, 2018-09-25 at 12:15 -0600, Logan Gunthorpe wrote:
> On 2018-09-25 11:29 a.m., Bart Van Assche wrote:
> > On Tue, 2018-09-25 at 10:22 -0600, Logan Gunthorpe wrote:
> > > @@ -83,9 +132,14 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
> > >
> > >  	pdev->p2pdma = p2p;
> > >
> > > +	error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
> > > +	if (error)
> > > +		goto out_pool_destroy;
> > > +
> > >  	return 0;
> > >
> > >  out_pool_destroy:
> > > +	pdev->p2pdma = NULL;
> > >  	gen_pool_destroy(p2p->pool);
> > >  out:
> > >  	devm_kfree(&pdev->dev, p2p);
> >
> > This doesn't look right to me. Shouldn't devm_remove_action() be called
> > instead of devm_kfree() if sysfs_create_group() fails?
>
> That makes no sense to me. We are reversing a devm_kzalloc() not a
> custom action

In case what I wrote was not clear: both devm_kzalloc() and
devm_add_action_or_reset() have to be reversed if sysfs_create_group()
fails. devm_add_action_or_reset() calls devres_add(). The latter function
adds an element to the dev->devres_head list. So I think that only calling
devm_kfree() if sysfs_create_group() fails will lead to corruption of the
dev->devres_head list.

Bart.
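To spell out the fix I have in mind (a sketch only; I am assuming the
cleanup action registered earlier via devm_add_action_or_reset() is called
pci_p2pdma_release and takes pdev as its data argument — adjust to the
actual names in the patch):

```diff
 out_pool_destroy:
 	pdev->p2pdma = NULL;
 	gen_pool_destroy(p2p->pool);
+	/* undo the devres_add() performed by devm_add_action_or_reset() */
+	devm_remove_action(&pdev->dev, pci_p2pdma_release, pdev);
 out:
 	devm_kfree(&pdev->dev, p2p);
```

This of course only applies to error labels that are reached after the
action has been registered; earlier failures would jump past the
devm_remove_action() call.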
Re: [PATCH v7 03/13] PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
On Tue, 2018-09-25 at 10:22 -0600, Logan Gunthorpe wrote:
> +int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
> +		enum dma_data_direction dir)
> +{
> +	struct dev_pagemap *pgmap;
> +	struct scatterlist *s;
> +	phys_addr_t paddr;
> +	int i;
> +
> +	/*
> +	 * p2pdma mappings are not compatible with devices that use
> +	 * dma_virt_ops. If the upper layers do the right thing
> +	 * this should never happen because it will be prevented
> +	 * by the check in pci_p2pdma_add_client()
> +	 */
> +	if (WARN_ON_ONCE(IS_ENABLED(CONFIG_DMA_VIRT_OPS) &&
> +			 dev->dma_ops == &dma_virt_ops))
> +		return 0;

Are you assuming that the compiler will optimize out the
dev->dma_ops == &dma_virt_ops test if CONFIG_DMA_VIRT_OPS=n such that no
reference to the dma_virt_ops symbol appears in the object file? Are you
sure all compilers and compiler versions that are used to build the Linux
kernel will do that?

Thanks,

Bart.
Re: [PATCH v7 02/13] PCI/P2PDMA: Add sysfs group to display p2pmem stats
On Tue, 2018-09-25 at 10:22 -0600, Logan Gunthorpe wrote:
> @@ -83,9 +132,14 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
>
>  	pdev->p2pdma = p2p;
>
> +	error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
> +	if (error)
> +		goto out_pool_destroy;
> +
>  	return 0;
>
>  out_pool_destroy:
> +	pdev->p2pdma = NULL;
>  	gen_pool_destroy(p2p->pool);
>  out:
>  	devm_kfree(&pdev->dev, p2p);

This doesn't look right to me. Shouldn't devm_remove_action() be called
instead of devm_kfree() if sysfs_create_group() fails?

Thanks,

Bart.
Re: [PATCH v2 08/10] nvme-pci: Add support for P2P memory in requests
On Thu, 2018-03-01 at 15:58 +, Stephen Bates wrote:
> > Any plans adding the capability to nvme-rdma? Should be
> > straight-forward... In theory, the use-case would be rdma backend
> > fabric behind. Shouldn't be hard to test either...
>
> Nice idea Sagi. Yes we have been starting to look at that. Though again we
> would probably want to impose the "attached to the same PCIe switch" rule
> which might be less common to satisfy in initiator systems.
>
> Down the road I would also like to discuss the best way to use this P2P
> framework to facilitate copies between NVMe namespaces (on both PCIe and
> fabric attached namespaces) without having to expose the CMB up to user
> space. Wasn't something like that done in the SCSI world at some point
> Martin?

Are you perhaps referring to the following patch series: "Copy Offload"
(https://www.spinics.net/lists/linux-scsi/msg74680.html /
https://lwn.net/Articles/592094/)? I will contact Martin off-list, and in
case he doesn't have the time to revive that patch series himself I will
free up some time to work on this.

Bart.
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Wed, 2018-01-24 at 19:56 -0800, Dan Williams wrote:
> The get_user_pages_longterm() api was recently added as a stop-gap
> measure to prevent applications from growing dependencies on the
> ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
> with no ongoing coordination with the filesystem. This 'longterm'
> pinning is also problematic for the non-DAX VMA case where the core-mm
> needs a time bounded way to revoke a pin and manipulate the physical
> pages. While existing RDMA applications have already grown the
> assumption that they can pin page-cache pages indefinitely, the fact
> that we are breaking this assumption for filesystem-dax presents an
> opportunity to deprecate the 'indefinite pin' mechanisms and move to a
> general interface that supports pin revocation.
>
> While RDMA may grow an explicit Infiniband-verb for this 'memory
> registration with lease' semantic, it seems that this problem is
> bigger than just RDMA. At LSF/MM it would be useful to have a
> discussion between fs, mm, dax, and RDMA folks about addressing this
> problem at the core level.
>
> Particular people that would be useful to have in attendance are
> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).

Is on-demand paging sufficient as a solution for your use case or do you
perhaps need something different? See also
https://www.openfabrics.org/images/eventpresos/workshops2013/2013_Workshop_Tues_0930_liss_odp.pdf

Thanks,

Bart.
Re: [RFC 01/16] NOVA: Documentation
On Thu, 2017-08-03 at 00:48 -0700, Steven Swanson wrote:
> +### DAX Support
> +
> +Supporting DAX efficiently is a core feature of NOVA and one of the
> +challenges in designing NOVA is reconciling DAX support, which aims to
> +avoid file system intervention when file data changes, and other
> +features that require such intervention.
> +
> +NOVA's philosophy with respect to DAX is that when a program uses DAX
> +mmap to modify a file, the program must take full responsibility for
> +that data and NOVA must ensure that the memory will behave as expected.
> +At other times, the file system provides protection. This approach has
> +several implications:
> +
> +1. Implementing `msync()` in user space works fine.
> +
> +2. While a file is mmap'd, it is not protected by NOVA's RAID-style
> +parity mechanism, because protecting it would be too expensive. When
> +the file is unmapped and/or during file system recovery, protection is
> +restored.
> +
> +3. The snapshot mechanism must be careful about the order in which it
> +adds pages to the file's snapshot image.

Hello Steven,

Thank you for having shared this very interesting work. After having read
the NOVA paper and patch 01/16 I have a question for you. Does the above
mean that COW is disabled for writable mmap-ed files? If so, what is the
reason behind this? Is there a fundamental issue that does not allow
implementing COW for writable mmap-ed files? Or have you perhaps tried to
implement this and was the performance not sufficient? Please note that
I'm neither a filesystem nor a persistent memory expert.

Thanks,

Bart.
Re: dm: enable opt-out of device-mapper dax support
On Tue, 2017-08-01 at 13:59 -0700, Dan Williams wrote:
> On Tue, Aug 1, 2017 at 12:45 PM, Dan Williams wrote:
> > On Tue, Aug 1, 2017 at 12:02 PM, Mike Snitzer wrote:
> > > On Tue, Aug 01 2017 at 2:12pm -0400, Dan Williams wrote:
> > > [ ... ]
> > >
> > > I'm questioning the need to have yet another Kbuild CONFIG option. If
> > > the user has enabled CONFIG_BLK_DEV_PMEM and CONFIG_FS_DAX (DAX already
> > > gets selected by CONFIG_FS_DAX) then shouldn't the DM capabilities just
> > > be enabled?
> > >
> > > Guess I'm just skeptical of: why do we want to move to a model where
> > > users need to opt-in to DM support for DAX?
> > >
> > > I also _really_ don't like each target's DAX support being colocated in
> > > drivers/md/dm-dax.c
> > >
> > > This all looks and feels like a serious step backwards.
> >
> > Ok, you want ifdef'd sections of DAX code in each target and make
> > DM_DAX a silent option that gets enabled with BLK_DEV_PMEM, anything
> > else?
>
> Actually, no, I was thrown off by Bart's suggestion to move code
> around. I can handle this all by deleting "select DAX" and adding more
> stubbed out dax helpers.

Hello Mike and Dan,

How about one *-dax.c file per *.c dm file that has to be modified to add
DAX support? I think that approach would avoid collocation of code for
different targets in a single dm-dax.c file and would also avoid that
#ifdef CONFIG_DAX statements have to be added. This approach is orthogonal
to the removal of CONFIG_DM_DAX.

Thanks,

Bart.
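To make the suggestion concrete, the build rules I have in mind would look
roughly like this in drivers/md/Makefile (a hypothetical sketch; the file
and object names below are made up for illustration and do not match the
actual dm build rules):

```make
# Each target that grows DAX support gets a companion <target>-dax.c
# that is only compiled in when CONFIG_DAX is enabled; the target's
# core code stays free of #ifdef CONFIG_DAX blocks.
dm-linear-y              := dm-linear-core.o
dm-linear-$(CONFIG_DAX)  += dm-linear-dax.o
obj-$(CONFIG_BLK_DEV_DM) += dm-linear.o
```

With that layout, a DAX=n build simply never compiles the *-dax.c files,
and the core files need at most a few stubbed-out helper declarations.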
Re: [resend PATCH v2 11/33] dm: add dax_device and dax_operations support
On Mon, 2017-04-17 at 12:09 -0700, Dan Williams wrote:
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index b7767da50c26..1de8372d9459 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -200,6 +200,7 @@ config BLK_DEV_DM_BUILTIN
>  config BLK_DEV_DM
>  	tristate "Device mapper support"
>  	select BLK_DEV_DM_BUILTIN
> +	select DAX
>  	---help---
>  	  Device-mapper is a low level volume manager. It works by allowing
>  	  people to specify mappings for ranges of logical sectors. Various

(replying to an e-mail of three months ago)

Hello Dan,

While building a v4.12 kernel I noticed that enabling device mapper
support now unconditionally enables DAX. I think there are plenty of
systems that use dm but do not need DAX. Have you considered reworking
this such that, instead of dm selecting DAX, DAX support is only enabled
in dm if CONFIG_DAX is enabled?

Thanks,

Bart.
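One way to express that dependency, sketched as a Kconfig fragment
(untested; the DM_DAX_SUPPORT symbol name is made up for illustration):

```
config BLK_DEV_DM
	tristate "Device mapper support"
	select BLK_DEV_DM_BUILTIN
	---help---
	  Device-mapper is a low level volume manager.

config DM_DAX_SUPPORT
	def_bool y
	depends on BLK_DEV_DM && DAX
```

The dm code would then guard its dax paths with stubbed-out helpers keyed
off CONFIG_DM_DAX_SUPPORT, so a dm-only configuration no longer drags in
the DAX core.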
Re: Enabling peer to peer device transactions for PCIe devices
On 11/23/2016 09:13 AM, Logan Gunthorpe wrote:
> IMO any memory that has been registered for a P2P transaction should be
> locked from being evicted. So if there's a get_user_pages call it needs
> to be pinned until the put_page. The main issue being with the RDMA
> case: handling an eviction when a chunk of memory has been registered as
> an MR would be very tricky. The MR may be relied upon by another host
> and the kernel would have to inform user-space the MR was invalid then
> user-space would have to tell the remote application.

Hello Logan,

Are you aware that the Linux kernel already supports ODP (On Demand
Paging)? See also the output of git grep -nHi on.demand.paging. See also
https://www.openfabrics.org/images/eventpresos/workshops2014/DevWorkshop/presos/Tuesday/pdf/04_ODP_update.pdf.

Bart.
Re: [PATCH v4 2/5] mmc: move 'parent' tracking to mmc_blk_data
On 06/21/2016 10:46 PM, Dan Williams wrote:
> In preparation for the removal of 'driverfs_dev' from 'struct gendisk',
> carry this data in mmc_blk_data. It is used for registration of parent
> disks and partitions.
>
> Cc: Ulf Hansson <ulf.hans...@linaro.org>
> Reported-by: Bart Van Assche <bart.vanass...@sandisk.com>
> Signed-off-by: Dan Williams <dan.j.willi...@intel.com>
> ---
>  drivers/mmc/card/block.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> index 383184743f9a..d1733424bf6e 100644
> --- a/drivers/mmc/card/block.c
> +++ b/drivers/mmc/card/block.c
> @@ -93,6 +93,7 @@ static DEFINE_SPINLOCK(mmc_blk_lock);
>   */
>  struct mmc_blk_data {
>  	spinlock_t	lock;
> +	struct device	*parent;
>  	struct gendisk	*disk;
>  	struct mmc_queue queue;
>  	struct list_head part;
> @@ -2270,7 +2271,7 @@ again:
>  	md->disk->fops = &mmc_bdops;
>  	md->disk->private_data = md;
>  	md->disk->queue = md->queue.queue;
> -	md->disk->driverfs_dev = parent;
> +	md->parent = parent;
>  	set_disk_ro(md->disk, md->read_only || default_ro);
>  	md->disk->flags = GENHD_FL_EXT_DEVT;
>  	if (area_type & (MMC_BLK_DATA_AREA_RPMB | MMC_BLK_DATA_AREA_BOOT))
> @@ -2458,7 +2459,7 @@ static int mmc_add_disk(struct mmc_blk_data *md)
>  	int ret;
>  	struct mmc_card *card = md->queue.card;
>
> -	add_disk(md->disk);
> +	device_add_disk(md->parent, md->disk);
>  	md->force_ro.show = force_ro_show;
>  	md->force_ro.store = force_ro_store;
>  	sysfs_attr_init(&md->force_ro.attr);

What will the impact of this patch be on code that accesses driverfs_dev,
such as printk_all_partitions()? Will this patch hurt bisectability?

Thanks,

Bart.
Re: [PATCH v4 3/5] um: track 'parent' device in a local variable
On 06/21/2016 10:47 PM, Dan Williams wrote:
> In preparation for the removal of 'driverfs_dev' from 'struct gendisk'
> use a local variable to track the parented vs un-parented case in
> ubd_disk_register().
>
> Cc: Jeff Dike <jd...@addtoit.com>
> Cc: Richard Weinberger <rich...@nod.at>
> Reported-by: Bart Van Assche <bart.vanass...@sandisk.com>
> Signed-off-by: Dan Williams <dan.j.willi...@intel.com>
> ---
>  arch/um/drivers/ubd_kern.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
> index ef6b4d960bad..8ec7b4112f55 100644
> --- a/arch/um/drivers/ubd_kern.c
> +++ b/arch/um/drivers/ubd_kern.c
> @@ -801,6 +801,7 @@ static void ubd_device_release(struct device *dev)
>  static int ubd_disk_register(int major, u64 size, int unit,
>  		struct gendisk **disk_out)
>  {
> +	struct device *dev = NULL;
>  	struct gendisk *disk;
>
>  	disk = alloc_disk(1 << UBD_SHIFT);
> @@ -823,12 +824,12 @@ static int ubd_disk_register(int major, u64 size, int unit,
>  		ubd_devs[unit].pdev.dev.release = ubd_device_release;
>  		dev_set_drvdata(&ubd_devs[unit].pdev.dev, &ubd_devs[unit]);
>  		platform_device_register(&ubd_devs[unit].pdev);
> -		disk->driverfs_dev = &ubd_devs[unit].pdev.dev;
> +		dev = &ubd_devs[unit].pdev.dev;
>  	}
>
>  	disk->private_data = &ubd_devs[unit];
>  	disk->queue = ubd_devs[unit].queue;
> -	add_disk(disk);
> +	device_add_disk(dev, disk);
>
>  	*disk_out = disk;
>  	return 0;

Hello Dan,

The Reported-by tag is intended to give credit to people who find bugs in
the upstream kernel. What I reported was a bug not in the upstream kernel
but in a previous version of this patch series, so I think the
"Reported-by" tag can be left out of this patch. Additionally, please
consider using a more descriptive name than "dev", e.g. "parent".

Thanks,

Bart.