[RFC PATCH v6 2/2] Implement lock_device_hotplug_sysfs() by breaking active protection

2014-05-09 Thread Li Zhong
This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism
to break active protection.

The active protection is there to keep the file alive by blocking
deletion while operations are on-going in the file.  This blocking
creates a dependency loop when an operation running off a sysfs knob
ends up grabbing a lock which may be held while removing the said sysfs
knob.

So the problem here is the order of s_active, and the series of hotplug
related locks.

commit 5e33bc41 solves it by taking out the first of the series of
hoplug related locks, device_hotplug_lock, with a try lock. And if that
try lock fails, it returns a restart syscall error, and drops s_active
temporarily to allow the other process to remove the sysfs knob.

This doesn't help with lockdep warnings reported against s_active and
other hotplug related locks in the series.

This patch instead tries to take s_active out of the lock dependency
graph using the active protection breaking mechanism.

lock_device_hotplug_sysfs() function name is kept here, three more
arguments are added, dev, attr, to help find the kernfs_node
corresponding to the sysfs knob (which should always be able to be
found, WARN if not); kn_out is used to record the found kernfs_node for
use in unlock. unlock_device_hotplug_sysfs() is created to help
cleanup the operations(get reference, break active, etc) done in
lock_device_hotplug_sysfs(), as well as unlock the lock.

As we break the active protection here, the device removing process
might remove the sysfs entries after that point. So after we grab the
device_hotplug_lock, we can check whether that really happens. If so,
we should abort and not invoke the online/offline callbacks anymore.
lockdep assertion is added in try_offline_node() to make sure
device_hotplug_lock is grabbed while removing cpu or memory.

As devcie_hotplug_lock is used to protect hotplug operations with multiple
subsystems. So there might be devices that won't be removed together with
devices of other subsystems, and thus device_hotplug_lock is not needed.
In this case, their specific online/offline callbacks needs to check whether
the device has been removed.

Cc: Tejun Heo 
Cc: Rafael J. Wysocki 
Cc: Greg Kroah-Hartman 
Cc: Toshi Kani 
Signed-off-by: Li Zhong 
---
 drivers/base/core.c| 93 +-
 drivers/base/memory.c  |  5 +--
 include/linux/device.h |  7 +++-
 mm/memory_hotplug.c|  2 ++
 4 files changed, 95 insertions(+), 12 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 20da3ad..329f3b4 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -61,14 +61,87 @@ void unlock_device_hotplug(void)
mutex_unlock(_hotplug_lock);
 }
 
-int lock_device_hotplug_sysfs(void)
+void device_hotplug_assert_held(void)
 {
-   if (mutex_trylock(_hotplug_lock))
-   return 0;
+   lockdep_assert_held(_hotplug_lock);
+}
+
+/*
+ * device_hotplug_lock is used to protect hotplugs which may involve multiple
+ * subsystems. This _sysfs() version is created to solve the deadlock with
+ * s_active of the device attribute file, which could be used to online/offline
+ * the device.
+ *
+ * Returns 0 on success. -ENODEV if the @dev is removed after we break the
+ * active protection. (We should always be able to find the kernfs_node
+ * related to the attribute, WARN if not, and also return -ENODEV).
+ *
+ * When success(return 0), the found kernfs_node is passed out in @kn_out,
+ * else pass NULL to @kn_out.
+ */
+int lock_device_hotplug_sysfs(struct device *dev,
+ struct device_attribute *attr,
+ struct kernfs_node **kn_out)
+{
+   struct kernfs_node *kn;
+
+   *kn_out = kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+
+   if (WARN_ON_ONCE(!kn))
+   return -ENODEV;
+
+   /*
+* Break active protection here to avoid deadlocks with device
+* removing process, which tries to remove sysfs entries including this
+* "online" attribute while holding some hotplug related locks.
+*
+* @dev needs to be protected here, or it could go away any time after
+* dropping active protection.
+*/
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
+
+   /*
+* For devices which need device_hoplug_lock to protect the
+* online/offline operation, i.e. it might be removed together with
+* devices in some other subsystems, we can check here whether the
+* device has been removed, so we don't call device_{on|off}line
+* against removed device. lockdep assertion is added in
+* try_offline_node() to make sure the lock is held to remove cpu
+* or memory.
+*
+* If some devices won't be removed together with devices in other
+

[RFC PATCH v6 1/2 ] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-05-09 Thread Li Zhong
While auditing the usage of lock_device_hotplug_sysfs() for implementing
it in another way in following patch, it seems to me that the code here
is to add/remove device, and the files probe/release themselves won't be
removed.

lock_device_hotplug_sysfs() is used to solve the deadlocks of s_active of
the "online" file and the hotplug related locks.

So if one process trying to remove the device, the sysfs files for this device,
including "online" will be removed after device_hotplug_lock is grabbed.
So for another process which might concurrently write to the "online" file,
after it grabs s_active of the "online", it couldn't directly lock
device_hotplug_lock, instead, in *_sysfs() version, try lock is used, and if
it fails, the whole syscall is restarted.

But here the s_active of "probe/release" are not involved in the above process,
as they are not to be removed. So we could use lock_device_hotplug() here.

Cc: Tejun Heo 
Cc: Rafael J. Wysocki 
Cc: Greg Kroah-Hartman 
Cc: Toshi Kani 
Signed-off-by: Li Zhong 
---
 drivers/base/cpu.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc..9483225 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -88,9 +88,7 @@ static ssize_t cpu_probe_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_probe(buf, count);
 
@@ -106,9 +104,7 @@ static ssize_t cpu_release_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_release(buf, count);
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-05-09 Thread Li Zhong
viOn Tue, 2014-04-22 at 11:34 +0800, Li Zhong wrote:
> On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> > Hello,
> > 
> > On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> > 
> > Proper /** function comment would be nice.
> 
> Ok, will try to write some in next version.
> 
> > 
> > > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > > +struct device_attribute *attr)
> > 
> > I can see why you did this but let's please not require the user of
> > this function to see how the thing is working internally.  Let's
> > return int and keep track of (or look up again) the kernfs_node
> > internally.
> 
> Ok, it also makes the prototype of lock and unlock look more consistent
> and comfortable. 

When trying to do an new version of the patch, I find that if the device
is really removed, then we couldn't look up using the parent, and
attribute name again in unlock. So I guess maybe I could add one more
argument, e.g. kn_out,:q to track this kernfs node. 

Code will be posted soon for your review. 

Thanks, Zhong

> 
> > 
> > >  {
> > ...
> > > + /*
> > > +  * We assume device_hotplug_lock must be acquired before removing
> > 
> > Is this assumption true?  If so, can we add lockdep assertions in
> > places to verify and enforce this?  If not, aren't we just feeling
> > good when the reality is broken?
> 
> It seems not true ... I think there are devices that don't have the
> online/offline concept, we just need to add it, remove it, like ethernet
> cards. 
> 
> Maybe we could change the comments above, like:
>   /* We assume device_hotplug_lock must be acquired before 
>* removing devices, which have online/offline sysfs knob, 
>* and some locks are needed to serialize the online/offline
>* callbacks and device removing. ...
> ? 
> 
> And we could add lockdep assertions in cpu and memory related code? e.g.
> remove_memory(), unregister_cpu()
> 
> Currently, remove_memory() has comments for the function:
> 
>  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
>  * and online/offline operations before this call, as required by
>  * try_offline_node().
>  */
> 
> maybe it could be removed with the lockdep assertion.
> 
> > ...
> > 
> > Function comment please.
> 
> OK. 
> 
> Thanks, Zhong
> 
> > > +void unlock_device_hotplug_sysfs(struct device *dev,
> > > +  struct kernfs_node *kn)
> > > +{
> > > + unlock_device_hotplug();
> > > + kernfs_unbreak_active_protection(kn);
> > > + put_device(dev);
> > > + kernfs_put(kn);
> > >  }
> > 
> > Thanks.
> > 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-05-09 Thread Li Zhong
viOn Tue, 2014-04-22 at 11:34 +0800, Li Zhong wrote:
 On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
  Hello,
  
  On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
  
  Proper /** function comment would be nice.
 
 Ok, will try to write some in next version.
 
  
   +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
   +struct device_attribute *attr)
  
  I can see why you did this but let's please not require the user of
  this function to see how the thing is working internally.  Let's
  return int and keep track of (or look up again) the kernfs_node
  internally.
 
 Ok, it also makes the prototype of lock and unlock look more consistent
 and comfortable. 

When trying to do an new version of the patch, I find that if the device
is really removed, then we couldn't look up using the parent, and
attribute name again in unlock. So I guess maybe I could add one more
argument, e.g. kn_out,:q to track this kernfs node. 

Code will be posted soon for your review. 

Thanks, Zhong

 
  
{
  ...
   + /*
   +  * We assume device_hotplug_lock must be acquired before removing
  
  Is this assumption true?  If so, can we add lockdep assertions in
  places to verify and enforce this?  If not, aren't we just feeling
  good when the reality is broken?
 
 It seems not true ... I think there are devices that don't have the
 online/offline concept, we just need to add it, remove it, like ethernet
 cards. 
 
 Maybe we could change the comments above, like:
   /* We assume device_hotplug_lock must be acquired before 
* removing devices, which have online/offline sysfs knob, 
* and some locks are needed to serialize the online/offline
* callbacks and device removing. ...
 ? 
 
 And we could add lockdep assertions in cpu and memory related code? e.g.
 remove_memory(), unregister_cpu()
 
 Currently, remove_memory() has comments for the function:
 
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations before this call, as required by
  * try_offline_node().
  */
 
 maybe it could be removed with the lockdep assertion.
 
  ...
  
  Function comment please.
 
 OK. 
 
 Thanks, Zhong
 
   +void unlock_device_hotplug_sysfs(struct device *dev,
   +  struct kernfs_node *kn)
   +{
   + unlock_device_hotplug();
   + kernfs_unbreak_active_protection(kn);
   + put_device(dev);
   + kernfs_put(kn);
}
  
  Thanks.
  
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v6 1/2 ] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-05-09 Thread Li Zhong
While auditing the usage of lock_device_hotplug_sysfs() for implementing
it in another way in following patch, it seems to me that the code here
is to add/remove device, and the files probe/release themselves won't be
removed.

lock_device_hotplug_sysfs() is used to solve the deadlocks of s_active of
the online file and the hotplug related locks.

So if one process trying to remove the device, the sysfs files for this device,
including online will be removed after device_hotplug_lock is grabbed.
So for another process which might concurrently write to the online file,
after it grabs s_active of the online, it couldn't directly lock
device_hotplug_lock, instead, in *_sysfs() version, try lock is used, and if
it fails, the whole syscall is restarted.

But here the s_active of probe/release are not involved in the above process,
as they are not to be removed. So we could use lock_device_hotplug() here.

Cc: Tejun Heo t...@kernel.org
Cc: Rafael J. Wysocki rafael.j.wyso...@intel.com
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: Toshi Kani toshi.k...@hp.com
Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/cpu.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc..9483225 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -88,9 +88,7 @@ static ssize_t cpu_probe_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_probe(buf, count);
 
@@ -106,9 +104,7 @@ static ssize_t cpu_release_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_release(buf, count);
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v6 2/2] Implement lock_device_hotplug_sysfs() by breaking active protection

2014-05-09 Thread Li Zhong
This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism
to break active protection.

The active protection is there to keep the file alive by blocking
deletion while operations are on-going in the file.  This blocking
creates a dependency loop when an operation running off a sysfs knob
ends up grabbing a lock which may be held while removing the said sysfs
knob.

So the problem here is the order of s_active, and the series of hotplug
related locks.

commit 5e33bc41 solves it by taking out the first of the series of
hoplug related locks, device_hotplug_lock, with a try lock. And if that
try lock fails, it returns a restart syscall error, and drops s_active
temporarily to allow the other process to remove the sysfs knob.

This doesn't help with lockdep warnings reported against s_active and
other hotplug related locks in the series.

This patch instead tries to take s_active out of the lock dependency
graph using the active protection breaking mechanism.

lock_device_hotplug_sysfs() function name is kept here, three more
arguments are added, dev, attr, to help find the kernfs_node
corresponding to the sysfs knob (which should always be able to be
found, WARN if not); kn_out is used to record the found kernfs_node for
use in unlock. unlock_device_hotplug_sysfs() is created to help
cleanup the operations(get reference, break active, etc) done in
lock_device_hotplug_sysfs(), as well as unlock the lock.

As we break the active protection here, the device removing process
might remove the sysfs entries after that point. So after we grab the
device_hotplug_lock, we can check whether that really happens. If so,
we should abort and not invoke the online/offline callbacks anymore.
lockdep assertion is added in try_offline_node() to make sure
device_hotplug_lock is grabbed while removing cpu or memory.

As devcie_hotplug_lock is used to protect hotplug operations with multiple
subsystems. So there might be devices that won't be removed together with
devices of other subsystems, and thus device_hotplug_lock is not needed.
In this case, their specific online/offline callbacks needs to check whether
the device has been removed.

Cc: Tejun Heo t...@kernel.org
Cc: Rafael J. Wysocki rafael.j.wyso...@intel.com
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: Toshi Kani toshi.k...@hp.com
Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/core.c| 93 +-
 drivers/base/memory.c  |  5 +--
 include/linux/device.h |  7 +++-
 mm/memory_hotplug.c|  2 ++
 4 files changed, 95 insertions(+), 12 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 20da3ad..329f3b4 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -61,14 +61,87 @@ void unlock_device_hotplug(void)
mutex_unlock(device_hotplug_lock);
 }
 
-int lock_device_hotplug_sysfs(void)
+void device_hotplug_assert_held(void)
 {
-   if (mutex_trylock(device_hotplug_lock))
-   return 0;
+   lockdep_assert_held(device_hotplug_lock);
+}
+
+/*
+ * device_hotplug_lock is used to protect hotplugs which may involve multiple
+ * subsystems. This _sysfs() version is created to solve the deadlock with
+ * s_active of the device attribute file, which could be used to online/offline
+ * the device.
+ *
+ * Returns 0 on success. -ENODEV if the @dev is removed after we break the
+ * active protection. (We should always be able to find the kernfs_node
+ * related to the attribute, WARN if not, and also return -ENODEV).
+ *
+ * When success(return 0), the found kernfs_node is passed out in @kn_out,
+ * else pass NULL to @kn_out.
+ */
+int lock_device_hotplug_sysfs(struct device *dev,
+ struct device_attribute *attr,
+ struct kernfs_node **kn_out)
+{
+   struct kernfs_node *kn;
+
+   *kn_out = kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
+
+   if (WARN_ON_ONCE(!kn))
+   return -ENODEV;
+
+   /*
+* Break active protection here to avoid deadlocks with device
+* removing process, which tries to remove sysfs entries including this
+* online attribute while holding some hotplug related locks.
+*
+* @dev needs to be protected here, or it could go away any time after
+* dropping active protection.
+*/
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
+
+   /*
+* For devices which need device_hoplug_lock to protect the
+* online/offline operation, i.e. it might be removed together with
+* devices in some other subsystems, we can check here whether the
+* device has been removed, so we don't call device_{on|off}line
+* against removed device. lockdep assertion is added in
+* try_offline_node() to make sure the lock is held to remove cpu

Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 09:59 -0400, Alan Stern wrote:
> On Fri, 25 Apr 2014, Li Zhong wrote:
> 
> > > No, this isn't self removal. The driver-attribute (not device-attribute)
> > > store operation simply grabs a lock that is also held while the driver
> > > is being deregistered at module unload. Taking a reference to the module
> > > in this case will prevent deregistration while store is running.
> > > 
> > > But it seems like this can be solved for usb-serial by simply not
> > > holding the lock while deregistering.
> > 
> > I didn't look carefully about this lock. 
> > 
> > But I'm not sure whether there are such requirements for driver
> > attributes:
> > 
> > some lock needs be grabbed in the driver attributes store callbacks, and
> > the same lock also needs to be grabbed during driver unregister. 
> 
> In this case, the lock does _not_ need to be grabbed during driver 
> unregister.  The driver grabs the lock, but it doesn't need to.

OK.

> 
> > If we have such requirements currently or in the future, I think they
> > could all be solved by breaking active protection after get the module
> > reference.
> 
> No!  That would be very bad.
> 
> Unloading modules is quite different from unbinding drivers.  After the
> driver is unbound, its attribute callback routines can continue to run.  
> But after a driver module has been unloaded, its attribute callback 
> routines _cannot_ run because they aren't present in memory any more.
> 
> If we allowed a module to be unloaded while one of its callbacks was 
> running (because active protection was broken), imagine what would 
> happen...

I don't think the module could be unloaded after we increased the module
reference counter. 

Thanks, Zhong

> 
> Alan Stern
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 09:54 -0400, Alan Stern wrote:
> On Fri, 25 Apr 2014, Li Zhong wrote:
> 
> > > I don't get why try_module_get() matters here.  We can't call into
> > > ->store if the object at hand is already destroyed and the underlying
> > > module can't go away if the target device is still alive.
> > > try_module_get() doesn't actually protect the object.  Why does that
> > > matter?  This is self removal, right?  Can you please take a look at
> > > kernfs_remove_self()?
> > 
> > This is about one process writing something to driver attributes, and
> > one process trying to unload this driver. 
> > 
> > I think try_module_get() could detect whether the driver is being
> > unloaded, and if not, prevent it from being unloaded, so it could
> > protect the object here by not allow the driver to be unloaded.
> 
> That isn't how try_module_get() works.  If the module is being 
> unloaded, try_module_get() simply fails.  It does not prevent the 
> module from being unloaded -- that's why its name begins with "try".

Yes, I know that. What I said above is for the case when
try_module_get() detects the driver is NOT being unloaded, and increases
the reference counter. 

Thanks, Zhong
> 
> Alan Stern
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 14:47 +0200, Rafael J. Wysocki wrote:
> On 4/25/2014 3:46 AM, Li Zhong wrote:
> > On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
> >> On 4/24/2014 10:59 AM, Li Zhong wrote:
> >>> On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
> >>>> On 4/23/2014 4:23 PM, Tejun Heo wrote:
> >>>>> Hello, Rafael.
> >>>> Hi,
> >>>>
> >>>>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
> >>>>>> Can you please elaborate a bit?
> >>>>> Because it can get involved in larger locking dependency issues by
> >>>>> joining dependency graphs of two otherwise largely disjoint
> >>>>> subsystems.  It has potential to create possible deadlocks which don't
> >>>>> need to exist.
> >>>> Well, I do my best not to add unnecessary locks if that's what you mean.
> >>>>
> >>>>>> It is there to protect hotplug operations involving multiple devices
> >>>>>> (in different subsystems) from racing with each other.  Why exactly
> >>>>>> is it bad?
> >>>>> But why would different subsystems, say cpu and memory, use the same
> >>>>> lock?  Wouldn't those subsystems already have proper locking inside
> >>>>> their own subsystems?
> >>>> That locking is not sufficient.
> >>>>
> >>>>> Why add this additional global lock across multiple subsystems?
> >>>> That basically is because of how eject works when it is triggered via 
> >>>> ACPI.
> >>>>
> >>>> It is signaled for a device at the top of a subtree.  It may be a
> >>>> container of some sort and the eject involves everything below that
> >>>> device in the ACPI namespace.  That may involve multiple subsystem
> >>>> (CPUs, memory, PCI host bridge, etc.).
> >>>>
> >>>> We do that in two steps, offline (which can fail) and eject proper
> >>>> (which can't fail and makes all of the involved devices go away). All
> >>>> that has to be done in one go with respect to the sysfs-triggered
> >>>> offline/online and that's why the lock is there.
> >>> Thank you for the education. It's more clear to me now why we need this
> >>> lock.
> >>>
> >>> I still have some small questions about when this lock is needed:
> >>>
> >>> I could understand sysfs-triggered online is not acceptable when
> >>> removing devices in multiple subsystems. But maybe concurrent offline
> >>> and remove(with proper per subsystem locks) seems not harmful?
> >>>
> >>> And if we just want to remove some devices in a specific subsystem, e.g.
> >>> like writing /cpu/release, if it just wants to offline and remove some
> >>> cpus, then maybe we don't require the device_hotplug_lock to be taken?
> >> No and no.
> >>
> >> If the offline phase fails for any device in the subtree, we roll back
> >> the operation
> >> and online devices that have already been offlined by it.  Also the ACPI
> >> hot-addition
> >> needs to acquire device_hotplug_lock so that it doesn't race with ejects
> >> and so
> >> that lock needs to be taken by sysfs-triggered offline too.
> > I can understand that hot-addition needs the device_hotplug lock, but
> > still not very clear about the offline.
> >
> > I guess your are describing following scenario:
> >
> > user A: (trying remove cpu 1 and memory 1-10)
> >
> > lock_device_hotplug
> > offline cpu with cpu locks  -- successful
> > offline memories with memory locks  -- failed, e.g. for memory8
> > online cpu and memory with their locks
> > unlock_device_hotplug
> 
> What about if all is successful and CPU1 is gone before 
> device_hotplug_lock is released?

You mean user B will try to offline an already removed cpu1? But I think
the cpu subsys locks should be able to handle such situation?

> 
> > user B: (trying offline cpu 1)
> >
> > offline cpu with cpu locks
> >
> > But I don't see any problem for user B not taking the device_hotplug
> > lock. The result may be different for user B to take or not take the
> > lock. But I think it could be seen as concurrent online/offline for cpu1
> > under cpu hotplug locks, which just depends on which is executed last?
> >
> > Or did I miss something here?
> 
> Yes, yo

Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 08:28 -0400, Tejun Heo wrote:
> On Fri, Apr 25, 2014 at 09:56:10AM +0800, Li Zhong wrote:
> > > A nests cpu subsys mutex under s_active of the online node.  B nests
> > > s_active of the online node under the cpu subsys mutex.  What am I
> > > missing?
> > 
> > From the above two chain, I think the problem is cpu_subsys_mutex and
> > s_active(online), which is the deadlock we are trying to solve in patch
> 
> Yeap, that one was what I was thinking about.
> 
> > #2. I seems to me s_active(release) here doesn't have lock issues? 
> 
> And your patch doesn't change the situation for online.  

I'm a little confused here...

To clarify for this patch(#1): 

Currently(before patch#2), the s_active(online) and cpu subsys mutex
deadlock is solved by lock_device_hotplug_sysfs(). (lockdep warnings are
still there, so we want to change the implementation of
lock_device_hotplug_sysfs() in patch #2 to take out of s_active_online,
instead of device_hotplug_lock, which is grabbed before cpu subsys
mutex). 

But here, for the probe/release, I think we really don't need to use
this _sysfs() version. 

So it is replaced with lock_device_hotplug() directly to avoid two
unnecessary conversions of lock_device_hotplug_sysfs(), that otherwise
need to be done in patch #2.

Thanks, Zhong

> Alright,
> thanks a lot for the explanation.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 12:15 +0200, Johan Hovold wrote:
> On Fri, Apr 25, 2014 at 10:16:57AM +0800, Li Zhong wrote:
> > On Thu, 2014-04-24 at 16:52 +0200, Johan Hovold wrote:
> 
> > > No, this isn't self removal. The driver-attribute (not device-attribute)
> > > store operation simply grabs a lock that is also held while the driver
> > > is being deregistered at module unload. Taking a reference to the module
> > > in this case will prevent deregistration while store is running.
> > > 
> > > But it seems like this can be solved for usb-serial by simply not
> > > holding the lock while deregistering.
> > 
> > I didn't look carefully about this lock. 
> > 
> > But I'm not sure whether there are such requirements for driver
> > attributes:
> > 
> > some lock needs be grabbed in the driver attributes store callbacks, and
> > the same lock also needs to be grabbed during driver unregister. 
> > 
> > If we have such requirements currently or in the future, I think they
> > could all be solved by breaking active protection after get the module
> > reference.
> 
> Yes, if we find any such cases, but I don't think it should be done
> generally (as in one of your earlier posts).

OK, maybe we only need to reconsider this only if we see some more
similar lockdep warnings in the future. 

> 
> Active protection is usually exactly what prevents the driver from being
> deregistered, and would only need to be broken to avoid any ABBA
> deadlocks from being reported by lockdep (the module reference would be
> enough to prevent the actual deadlock).

Yes, maybe try to get the module reference is not bad before writing to
driver attributes, as it doesn't make much sense to really call the
callbacks for the driver attribute if the driver is being unload.

And after we get the reference, it is safe for us to break the active.
But if we don't have such real cases(lockdep warnings), we actually
don't need to break it. 

Thanks, Zhong

> 
> Thanks,
> Johan
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 12:15 +0200, Johan Hovold wrote:
 On Fri, Apr 25, 2014 at 10:16:57AM +0800, Li Zhong wrote:
  On Thu, 2014-04-24 at 16:52 +0200, Johan Hovold wrote:
 
   No, this isn't self removal. The driver-attribute (not device-attribute)
   store operation simply grabs a lock that is also held while the driver
   is being deregistered at module unload. Taking a reference to the module
   in this case will prevent deregistration while store is running.
   
   But it seems like this can be solved for usb-serial by simply not
   holding the lock while deregistering.
  
  I didn't look carefully about this lock. 
  
  But I'm not sure whether there are such requirements for driver
  attributes:
  
  some lock needs be grabbed in the driver attributes store callbacks, and
  the same lock also needs to be grabbed during driver unregister. 
  
  If we have such requirements currently or in the future, I think they
  could all be solved by breaking active protection after get the module
  reference.
 
 Yes, if we find any such cases, but I don't think it should be done
 generally (as in one of your earlier posts).

OK, maybe we only need to reconsider this only if we see some more
similar lockdep warnings in the future. 

 
 Active protection is usually exactly what prevents the driver from being
 deregistered, and would only need to be broken to avoid any ABBA
 deadlocks from being reported by lockdep (the module reference would be
 enough to prevent the actual deadlock).

Yes, maybe try to get the module reference is not bad before writing to
driver attributes, as it doesn't make much sense to really call the
callbacks for the driver attribute if the driver is being unload.

And after we get the reference, it is safe for us to break the active.
But if we don't have such real cases(lockdep warnings), we actually
don't need to break it. 

Thanks, Zhong

 
 Thanks,
 Johan
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 08:28 -0400, Tejun Heo wrote:
 On Fri, Apr 25, 2014 at 09:56:10AM +0800, Li Zhong wrote:
   A nests cpu subsys mutex under s_active of the online node.  B nests
   s_active of the online node under the cpu subsys mutex.  What am I
   missing?
  
  From the above two chain, I think the problem is cpu_subsys_mutex and
  s_active(online), which is the deadlock we are trying to solve in patch
 
 Yeap, that one was what I was thinking about.
 
  #2. I seems to me s_active(release) here doesn't have lock issues? 
 
 And your patch doesn't change the situation for online.  

I'm a little confused here...

To clarify for this patch(#1): 

Currently(before patch#2), the s_active(online) and cpu subsys mutex
deadlock is solved by lock_device_hotplug_sysfs(). (lockdep warnings are
still there, so we want to change the implementation of
lock_device_hotplug_sysfs() in patch #2 to take out of s_active_online,
instead of device_hotplug_lock, which is grabbed before cpu subsys
mutex). 

But here, for the probe/release, I think we really don't need to use
this _sysfs() version. 

So it is replaced with lock_device_hotplug() directly to avoid two
unnecessary conversions of lock_device_hotplug_sysfs(), that otherwise
need to be done in patch #2.

Thanks, Zhong

 Alright,
 thanks a lot for the explanation.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 14:47 +0200, Rafael J. Wysocki wrote:
 On 4/25/2014 3:46 AM, Li Zhong wrote:
  On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
  On 4/24/2014 10:59 AM, Li Zhong wrote:
  On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
  On 4/23/2014 4:23 PM, Tejun Heo wrote:
  Hello, Rafael.
  Hi,
 
  On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
  Can you please elaborate a bit?
  Because it can get involved in larger locking dependency issues by
  joining dependency graphs of two otherwise largely disjoint
  subsystems.  It has potential to create possible deadlocks which don't
  need to exist.
  Well, I do my best not to add unnecessary locks if that's what you mean.
 
  It is there to protect hotplug operations involving multiple devices
  (in different subsystems) from racing with each other.  Why exactly
  is it bad?
  But why would different subsystems, say cpu and memory, use the same
  lock?  Wouldn't those subsystems already have proper locking inside
  their own subsystems?
  That locking is not sufficient.
 
  Why add this additional global lock across multiple subsystems?
  That basically is because of how eject works when it is triggered via 
  ACPI.
 
  It is signaled for a device at the top of a subtree.  It may be a
  container of some sort and the eject involves everything below that
  device in the ACPI namespace.  That may involve multiple subsystem
  (CPUs, memory, PCI host bridge, etc.).
 
  We do that in two steps, offline (which can fail) and eject proper
  (which can't fail and makes all of the involved devices go away). All
  that has to be done in one go with respect to the sysfs-triggered
  offline/online and that's why the lock is there.
  Thank you for the education. It's more clear to me now why we need this
  lock.
 
  I still have some small questions about when this lock is needed:
 
  I could understand sysfs-triggered online is not acceptable when
  removing devices in multiple subsystems. But maybe concurrent offline
  and remove(with proper per subsystem locks) seems not harmful?
 
  And if we just want to remove some devices in a specific subsystem, e.g.
  like writing /cpu/release, if it just wants to offline and remove some
  cpus, then maybe we don't require the device_hotplug_lock to be taken?
  No and no.
 
  If the offline phase fails for any device in the subtree, we roll back
  the operation
  and online devices that have already been offlined by it.  Also the ACPI
  hot-addition
  needs to acquire device_hotplug_lock so that it doesn't race with ejects
  and so
  that lock needs to be taken by sysfs-triggered offline too.
  I can understand that hot-addition needs the device_hotplug lock, but
  still not very clear about the offline.
 
  I guess your are describing following scenario:
 
  user A: (trying remove cpu 1 and memory 1-10)
 
  lock_device_hotplug
  offline cpu with cpu locks  -- successful
  offline memories with memory locks  -- failed, e.g. for memory8
  online cpu and memory with their locks
  unlock_device_hotplug
 
 What about if all is successful and CPU1 is gone before 
 device_hotplug_lock is released?

You mean user B will try to offline an already removed cpu1? But I think
the cpu subsys locks should be able to handle such situation?

 
  user B: (trying offline cpu 1)
 
  offline cpu with cpu locks
 
  But I don't see any problem for user B not taking the device_hotplug
  lock. The result may be different for user B to take or not take the
  lock. But I think it could be seen as concurrent online/offline for cpu1
  under cpu hotplug locks, which just depends on which is executed last?
 
  Or did I miss something here?
 
 Yes, you could do offline in parallel with user A without taking 
 device_hotplug_lock, but the result may be surprising to user B then.
 
 With device _hotplug_lock user B will always see CPU1 off line (or gone) 
 after his offline in this scenario, while without taking the lock user B 
 may sometimes see CPU1 on line after his offline.  I don't think that's 
 a good thing.

That seems complicated after some more thinking. 

I think I missed something when describing the steps for A. I think the
initial online/offline state needs to be recorded by offline operations
in A, so the rollback could be done based on the initial state. 

If adding the above, then 
1) B offline cpu 1 before A offline cpu 1 
A could see the initial state of cpu1 as offline, and the rollback would
not put cpu1 online again. 

In the code, I think the check is done at 
if (!cpu_online(cpu))
return -EINVAL;
So the pn-put_online is kept as false. 

So the result is cpu1 offline.

2) B offline cpu 1 after A offline cpu1 
then the rollback would online cpu1 

2.1) B offline cpu1 after A rollback
The result is cpu1 offline, good.

2.2) B offline cpu1 before A rollback

B would see a -EINVAL error, and the result is cpu1 online. 

I guess this is the case you mentioned

Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 09:54 -0400, Alan Stern wrote:
 On Fri, 25 Apr 2014, Li Zhong wrote:
 
   I don't get why try_module_get() matters here.  We can't call into
   -store if the object at hand is already destroyed and the underlying
   module can't go away if the target device is still alive.
   try_module_get() doesn't actually protect the object.  Why does that
   matter?  This is self removal, right?  Can you please take a look at
   kernfs_remove_self()?
  
  This is about one process writing something to driver attributes, and
  one process trying to unload this driver. 
  
  I think try_module_get() could detect whether the driver is being
  unloaded, and if not, prevent it from being unloaded, so it could
  protect the object here by not allow the driver to be unloaded.
 
 That isn't how try_module_get() works.  If the module is being 
 unloaded, try_module_get() simply fails.  It does not prevent the 
 module from being unloaded -- that's why its name begins with try.

Yes, I know that. What I said above is for the case when
try_module_get() detects the driver is NOT being unloaded, and increases
the reference counter. 

Thanks, Zhong
 
 Alan Stern
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 09:59 -0400, Alan Stern wrote:
 On Fri, 25 Apr 2014, Li Zhong wrote:
 
   No, this isn't self removal. The driver-attribute (not device-attribute)
   store operation simply grabs a lock that is also held while the driver
   is being deregistered at module unload. Taking a reference to the module
   in this case will prevent deregistration while store is running.
   
   But it seems like this can be solved for usb-serial by simply not
   holding the lock while deregistering.
  
  I didn't look carefully about this lock. 
  
  But I'm not sure whether there are such requirements for driver
  attributes:
  
  some lock needs be grabbed in the driver attributes store callbacks, and
  the same lock also needs to be grabbed during driver unregister. 
 
 In this case, the lock does _not_ need to be grabbed during driver 
 unregister.  The driver grabs the lock, but it doesn't need to.

OK.

 
  If we have such requirements currently or in the future, I think they
  could all be solved by breaking active protection after get the module
  reference.
 
 No!  That would be very bad.
 
 Unloading modules is quite different from unbinding drivers.  After the
 driver is unbound, its attribute callback routines can continue to run.  
 But after a driver module has been unloaded, its attribute callback 
 routines _cannot_ run because they aren't present in memory any more.
 
 If we allowed a module to be unloaded while one of its callbacks was 
 running (because active protection was broken), imagine what would 
 happen...

I don't think the module could be unloaded after we increased the module
reference counter. 

Thanks, Zhong

 
 Alan Stern
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 16:52 +0200, Johan Hovold wrote:
> On Thu, Apr 24, 2014 at 10:35:17AM -0400, Tejun Heo wrote:
> > On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
> > > On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> > > > cc'ing Li Zhong who's working on a simliar issue in the following
> > > > thread and quoting whole body.
> > > > 
> > > >   http://thread.gmane.org/gmane.linux.kernel/1680706
> > > > 
> > > > Li, this is another variation of the same problem.  Maybe this can be
> > > > covered by your work too?
> > > 
> > > It seems to me that it is about write something to driver attribute, and
> > > driver unloading. If so, maybe it's not easy to reuse the help functions
> > > created for device attribute, and device removing.
> > > 
> > > But I guess the idea to break the active protection could still be
> > > applied here:
> > > 
> > > Maybe we could try_module_get() here (like the other option suggested by
> > > Johan?), and break active protection if we could get the module,
> > > something like below? 
> > 
> > I don't get why try_module_get() matters here.  We can't call into
> > ->store if the object at hand is already destroyed and the underlying
> > module can't go away if the target device is still alive.
> > try_module_get() doesn't actually protect the object.  Why does that
> > matter?  This is self removal, right?  Can you please take a look at
> > kernfs_remove_self()?
> 
> No, this isn't self removal. The driver-attribute (not device-attribute)
> store operation simply grabs a lock that is also held while the driver
> is being deregistered at module unload. Taking a reference to the module
> in this case will prevent deregistration while store is running.
> 
> But it seems like this can be solved for usb-serial by simply not
> holding the lock while deregistering.

I didn't look carefully about this lock. 

But I'm not sure whether there are such requirements for driver
attributes:

some lock needs be grabbed in the driver attributes store callbacks, and
the same lock also needs to be grabbed during driver unregister. 

If we have such requirements currently or in the future, I think they
could all be solved by breaking active protection after get the module
reference.

Thanks, Zhong

> 
> Johan
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 10:35 -0400, Tejun Heo wrote:
> On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
> > On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> > > cc'ing Li Zhong who's working on a simliar issue in the following
> > > thread and quoting whole body.
> > > 
> > >   http://thread.gmane.org/gmane.linux.kernel/1680706
> > > 
> > > Li, this is another variation of the same problem.  Maybe this can be
> > > covered by your work too?
> > 
> > It seems to me that it is about write something to driver attribute, and
> > driver unloading. If so, maybe it's not easy to reuse the help functions
> > created for device attribute, and device removing.
> > 
> > But I guess the idea to break the active protection could still be
> > applied here:
> > 
> > Maybe we could try_module_get() here (like the other option suggested by
> > Johan?), and break active protection if we could get the module,
> > something like below? 
> 
> I don't get why try_module_get() matters here.  We can't call into
> ->store if the object at hand is already destroyed and the underlying
> module can't go away if the target device is still alive.
> try_module_get() doesn't actually protect the object.  Why does that
> matter?  This is self removal, right?  Can you please take a look at
> kernfs_remove_self()?

This is about one process writing something to driver attributes, and
one process trying to unload this driver. 

I think try_module_get() could detect whether the driver is being
unloaded, and if not, prevent it from being unloaded, so it could
protect the object here by not allow the driver to be unloaded.

And if the driver is being unloaded, we could abort the write operation
directly. 

Thanks, Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 10:32 -0400, Tejun Heo wrote:
> Hello,
> 
> On Thu, Apr 24, 2014 at 04:37:23PM +0800, Li Zhong wrote:
> > On Wed, 2014-04-23 at 10:39 -0400, Tejun Heo wrote:
> > After thinking it harder, I still couldn't see ABBA here ... 
> > 
> > the active protection taken here is for "probe/release" which will not
> > be waited for removing something like "online" under cpu#? Or my
> > assumption that s_active for different files are different locks are
> > completely wrong? Or I missed something else? 
> 
> I'm probably confused about the locking.  I was thinking a scenario
> like the following.
> 
> A. CPU on/offline
> 
>grabs s_active protection of online node
>grabs cpu subsys mutex
>perform on/offline
>releases cpu subsys mutex
>releases s_active protection of online node

so the chain here is s_active(online) -> cpu_subsys_mutex

> 
> B. CPU release
> 
>grabs s_active protection of release node
>grabs cpu subsys mutex
>performs removal of the CPU
>removes the online node
>releases cpu subsys mutex
>releases s_active protection of release node

and the chain here is s_active(release) -> cpu_subsys_mutex ->
s_active(online)

> 
> A nests cpu subsys mutex under s_active of the online node.  B nests
> s_active of the online node under the cpu subsys mutex.  What am I
> missing?

>From the above two chain, I think the problem is cpu_subsys_mutex and
s_active(online), which is the deadlock we are trying to solve in patch
#2. I seems to me s_active(release) here doesn't have lock issues? 

Thanks, Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
> On 4/24/2014 10:59 AM, Li Zhong wrote:
> > On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
> >> On 4/23/2014 4:23 PM, Tejun Heo wrote:
> >>> Hello, Rafael.
> >> Hi,
> >>
> >>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
> >>>> Can you please elaborate a bit?
> >>> Because it can get involved in larger locking dependency issues by
> >>> joining dependency graphs of two otherwise largely disjoint
> >>> subsystems.  It has potential to create possible deadlocks which don't
> >>> need to exist.
> >> Well, I do my best not to add unnecessary locks if that's what you mean.
> >>
> >>>> It is there to protect hotplug operations involving multiple devices
> >>>> (in different subsystems) from racing with each other.  Why exactly
> >>>> is it bad?
> >>> But why would different subsystems, say cpu and memory, use the same
> >>> lock?  Wouldn't those subsystems already have proper locking inside
> >>> their own subsystems?
> >> That locking is not sufficient.
> >>
> >>> Why add this additional global lock across multiple subsystems?
> >> That basically is because of how eject works when it is triggered via ACPI.
> >>
> >> It is signaled for a device at the top of a subtree.  It may be a
> >> container of some sort and the eject involves everything below that
> >> device in the ACPI namespace.  That may involve multiple subsystem
> >> (CPUs, memory, PCI host bridge, etc.).
> >>
> >> We do that in two steps, offline (which can fail) and eject proper
> >> (which can't fail and makes all of the involved devices go away). All
> >> that has to be done in one go with respect to the sysfs-triggered
> >> offline/online and that's why the lock is there.
> > Thank you for the education. It's more clear to me now why we need this
> > lock.
> >
> > I still have some small questions about when this lock is needed:
> >
> > I could understand sysfs-triggered online is not acceptable when
> > removing devices in multiple subsystems. But maybe concurrent offline
> > and remove(with proper per subsystem locks) seems not harmful?
> >
> > And if we just want to remove some devices in a specific subsystem, e.g.
> > like writing /cpu/release, if it just wants to offline and remove some
> > cpus, then maybe we don't require the device_hotplug_lock to be taken?
> 
> No and no.
> 
> If the offline phase fails for any device in the subtree, we roll back 
> the operation
> and online devices that have already been offlined by it.  Also the ACPI 
> hot-addition
> needs to acquire device_hotplug_lock so that it doesn't race with ejects 
> and so
> that lock needs to be taken by sysfs-triggered offline too.

I can understand that hot-addition needs the device_hotplug lock, but
still not very clear about the offline. 

I guess your are describing following scenario: 

user A: (trying remove cpu 1 and memory 1-10)

lock_device_hotplug
offline cpu with cpu locks  -- successful
offline memories with memory locks  -- failed, e.g. for memory8
online cpu and memory with their locks
unlock_device_hotplug

user B: (trying offline cpu 1)

offline cpu with cpu locks

But I don't see any problem for user B not taking the device_hotplug
lock. The result may be different for user B to take or not take the
lock. But I think it could be seen as concurrent online/offline for cpu1
under cpu hotplug locks, which just depends on which is executed last? 

Or did I miss something here? 

Thanks, Zhong

> Thanks,
> Rafael
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-24 Thread Li Zhong
On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
> On 4/23/2014 4:23 PM, Tejun Heo wrote:
> > Hello, Rafael.
> 
> Hi,
> 
> > On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
> >> Can you please elaborate a bit?
> > Because it can get involved in larger locking dependency issues by
> > joining dependency graphs of two otherwise largely disjoint
> > subsystems.  It has potential to create possible deadlocks which don't
> > need to exist.
> 
> Well, I do my best not to add unnecessary locks if that's what you mean.
> 
> >> It is there to protect hotplug operations involving multiple devices
> >> (in different subsystems) from racing with each other.  Why exactly
> >> is it bad?
> > But why would different subsystems, say cpu and memory, use the same
> > lock?  Wouldn't those subsystems already have proper locking inside
> > their own subsystems?
> 
> That locking is not sufficient.
> 
> > Why add this additional global lock across multiple subsystems?
> 
> That basically is because of how eject works when it is triggered via ACPI.
> 
> It is signaled for a device at the top of a subtree.  It may be a 
> container of some sort and the eject involves everything below that 
> device in the ACPI namespace.  That may involve multiple subsystem 
> (CPUs, memory, PCI host bridge, etc.).
> 
> We do that in two steps, offline (which can fail) and eject proper 
> (which can't fail and makes all of the involved devices go away). All 
> that has to be done in one go with respect to the sysfs-triggered 
> offline/online and that's why the lock is there.

Thank you for the education. It's more clear to me now why we need this
lock. 

I still have some small questions about when this lock is needed:

I could understand sysfs-triggered online is not acceptable when
removing devices in multiple subsystems. But maybe concurrent offline
and remove(with proper per subsystem locks) seems not harmful? 

And if we just want to remove some devices in a specific subsystem, e.g.
like writing /cpu/release, if it just wants to offline and remove some
cpus, then maybe we don't require the device_hotplug_lock to be taken? 

Thanks, Zhong
> 
> Thanks,
> Rafael
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-24 Thread Li Zhong
On Wed, 2014-04-23 at 10:39 -0400, Tejun Heo wrote:
> On Wed, Apr 23, 2014 at 10:00:26AM +0800, Li Zhong wrote:
> > If you remove cpu0, then the cpu0 directory will be removed, together
> > with the "online" file in the directory, while some other process might
> > be writing 0 or 1 to it.
> > 
> > But here, for the probe/release, take "release" for example, if user
> > writes something that stands for cpu0 to it, the cpu0 will be removed, 
> > and the cpu0 directory and the files under it will be removed. But
> > "release" itself is not removed.
> > 
> > They are attributes of cpu_subsys, not of some specific cpus.
> 
> OIC, so the file itself which triggers removal doesn't get removed.
> Hmmm... but can't you still fall into deadlock?  If on/offline takes
> the same lock under active protection which is also taken while
> removing the cpu files, it doesn't really matter whether the release
> file itself is removed in the process or not.  You can still have ABBA
> deadlock.  What am I missing here?

After thinking it harder, I still couldn't see ABBA here ... 

the active protection taken here is for "probe/release" which will not
be waited for removing something like "online" under cpu#? Or my
assumption that s_active for different files are different locks are
completely wrong? Or I missed something else? 

Thanks, Zhong
> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-24 Thread Li Zhong
On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> cc'ing Li Zhong who's working on a simliar issue in the following
> thread and quoting whole body.
> 
>   http://thread.gmane.org/gmane.linux.kernel/1680706
> 
> Li, this is another variation of the same problem.  Maybe this can be
> covered by your work too?

It seems to me that it is about write something to driver attribute, and
driver unloading. If so, maybe it's not easy to reuse the help functions
created for device attribute, and device removing.

But I guess the idea to break the active protection could still be
applied here:

Maybe we could try_module_get() here (like the other option suggested by
Johan?), and break active protection if we could get the module,
something like below? 


diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 83e910a..6ce27e0 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -69,9 +69,24 @@ static ssize_t drv_attr_store(struct kobject *kobj, struct 
attribute *attr,
struct driver_attribute *drv_attr = to_drv_attr(attr);
struct driver_private *drv_priv = to_driver(kobj);
ssize_t ret = -EIO;
+   struct kernfs_node *kn;
+
+   if (!try_module_get(drv_priv->driver->owner))
+   return ret;
+
+   kn = kernfs_find_and_get(kobj->sd, attr->name);
+   if (WARN_ON_ONCE(!kn))
+   return ret;
+
+   kernfs_break_active_protection(kn);
 
if (drv_attr->store)
ret = drv_attr->store(drv_priv->driver, buf, count);
+
+   kernfs_unbreak_active_protection(kn);
+   kernfs_put(kn);
+
+   module_put(drv_priv->driver->owner);
return ret;
 }
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-24 Thread Li Zhong
On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
 cc'ing Li Zhong who's working on a simliar issue in the following
 thread and quoting whole body.
 
   http://thread.gmane.org/gmane.linux.kernel/1680706
 
 Li, this is another variation of the same problem.  Maybe this can be
 covered by your work too?

It seems to me that it is about write something to driver attribute, and
driver unloading. If so, maybe it's not easy to reuse the help functions
created for device attribute, and device removing.

But I guess the idea to break the active protection could still be
applied here:

Maybe we could try_module_get() here (like the other option suggested by
Johan?), and break active protection if we could get the module,
something like below? 


diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 83e910a..6ce27e0 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -69,9 +69,24 @@ static ssize_t drv_attr_store(struct kobject *kobj, struct 
attribute *attr,
struct driver_attribute *drv_attr = to_drv_attr(attr);
struct driver_private *drv_priv = to_driver(kobj);
ssize_t ret = -EIO;
+   struct kernfs_node *kn;
+
+   if (!try_module_get(drv_priv-driver-owner))
+   return ret;
+
+   kn = kernfs_find_and_get(kobj-sd, attr-name);
+   if (WARN_ON_ONCE(!kn))
+   return ret;
+
+   kernfs_break_active_protection(kn);
 
if (drv_attr-store)
ret = drv_attr-store(drv_priv-driver, buf, count);
+
+   kernfs_unbreak_active_protection(kn);
+   kernfs_put(kn);
+
+   module_put(drv_priv-driver-owner);
return ret;
 }
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-24 Thread Li Zhong
On Wed, 2014-04-23 at 10:39 -0400, Tejun Heo wrote:
 On Wed, Apr 23, 2014 at 10:00:26AM +0800, Li Zhong wrote:
  If you remove cpu0, then the cpu0 directory will be removed, together
  with the online file in the directory, while some other process might
  be writing 0 or 1 to it.
  
  But here, for the probe/release, take release for example, if user
  writes something that stands for cpu0 to it, the cpu0 will be removed, 
  and the cpu0 directory and the files under it will be removed. But
  release itself is not removed.
  
  They are attributes of cpu_subsys, not of some specific cpus.
 
 OIC, so the file itself which triggers removal doesn't get removed.
 Hmmm... but can't you still fall into deadlock?  If on/offline takes
 the same lock under active protection which is also taken while
 removing the cpu files, it doesn't really matter whether the release
 file itself is removed in the process or not.  You can still have ABBA
 deadlock.  What am I missing here?

After thinking it harder, I still couldn't see ABBA here ... 

the active protection taken here is for probe/release which will not
be waited for removing something like online under cpu#? Or my
assumption that s_active for different files are different locks are
completely wrong? Or I missed something else? 

Thanks, Zhong
 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-24 Thread Li Zhong
On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
 On 4/23/2014 4:23 PM, Tejun Heo wrote:
  Hello, Rafael.
 
 Hi,
 
  On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
  Can you please elaborate a bit?
  Because it can get involved in larger locking dependency issues by
  joining dependency graphs of two otherwise largely disjoint
  subsystems.  It has potential to create possible deadlocks which don't
  need to exist.
 
 Well, I do my best not to add unnecessary locks if that's what you mean.
 
  It is there to protect hotplug operations involving multiple devices
  (in different subsystems) from racing with each other.  Why exactly
  is it bad?
  But why would different subsystems, say cpu and memory, use the same
  lock?  Wouldn't those subsystems already have proper locking inside
  their own subsystems?
 
 That locking is not sufficient.
 
  Why add this additional global lock across multiple subsystems?
 
 That basically is because of how eject works when it is triggered via ACPI.
 
 It is signaled for a device at the top of a subtree.  It may be a 
 container of some sort and the eject involves everything below that 
 device in the ACPI namespace.  That may involve multiple subsystem 
 (CPUs, memory, PCI host bridge, etc.).
 
 We do that in two steps, offline (which can fail) and eject proper 
 (which can't fail and makes all of the involved devices go away). All 
 that has to be done in one go with respect to the sysfs-triggered 
 offline/online and that's why the lock is there.

Thank you for the education. It's more clear to me now why we need this
lock. 

I still have some small questions about when this lock is needed:

I could understand sysfs-triggered online is not acceptable when
removing devices in multiple subsystems. But maybe concurrent offline
and remove(with proper per subsystem locks) seems not harmful? 

And if we just want to remove some devices in a specific subsystem, e.g.
like writing /cpu/release, if it just wants to offline and remove some
cpus, then maybe we don't require the device_hotplug_lock to be taken? 

Thanks, Zhong
 
 Thanks,
 Rafael
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
 On 4/24/2014 10:59 AM, Li Zhong wrote:
  On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
  On 4/23/2014 4:23 PM, Tejun Heo wrote:
  Hello, Rafael.
  Hi,
 
  On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
  Can you please elaborate a bit?
  Because it can get involved in larger locking dependency issues by
  joining dependency graphs of two otherwise largely disjoint
  subsystems.  It has potential to create possible deadlocks which don't
  need to exist.
  Well, I do my best not to add unnecessary locks if that's what you mean.
 
  It is there to protect hotplug operations involving multiple devices
  (in different subsystems) from racing with each other.  Why exactly
  is it bad?
  But why would different subsystems, say cpu and memory, use the same
  lock?  Wouldn't those subsystems already have proper locking inside
  their own subsystems?
  That locking is not sufficient.
 
  Why add this additional global lock across multiple subsystems?
  That basically is because of how eject works when it is triggered via ACPI.
 
  It is signaled for a device at the top of a subtree.  It may be a
  container of some sort and the eject involves everything below that
  device in the ACPI namespace.  That may involve multiple subsystem
  (CPUs, memory, PCI host bridge, etc.).
 
  We do that in two steps, offline (which can fail) and eject proper
  (which can't fail and makes all of the involved devices go away). All
  that has to be done in one go with respect to the sysfs-triggered
  offline/online and that's why the lock is there.
  Thank you for the education. It's more clear to me now why we need this
  lock.
 
  I still have some small questions about when this lock is needed:
 
  I could understand sysfs-triggered online is not acceptable when
  removing devices in multiple subsystems. But maybe concurrent offline
  and remove(with proper per subsystem locks) seems not harmful?
 
  And if we just want to remove some devices in a specific subsystem, e.g.
  like writing /cpu/release, if it just wants to offline and remove some
  cpus, then maybe we don't require the device_hotplug_lock to be taken?
 
 No and no.
 
 If the offline phase fails for any device in the subtree, we roll back 
 the operation
 and online devices that have already been offlined by it.  Also the ACPI 
 hot-addition
 needs to acquire device_hotplug_lock so that it doesn't race with ejects 
 and so
 that lock needs to be taken by sysfs-triggered offline too.

I can understand that hot-addition needs the device_hotplug lock, but
still not very clear about the offline. 

I guess your are describing following scenario: 

user A: (trying remove cpu 1 and memory 1-10)

lock_device_hotplug
offline cpu with cpu locks  -- successful
offline memories with memory locks  -- failed, e.g. for memory8
online cpu and memory with their locks
unlock_device_hotplug

user B: (trying offline cpu 1)

offline cpu with cpu locks

But I don't see any problem for user B not taking the device_hotplug
lock. The result may be different for user B to take or not take the
lock. But I think it could be seen as concurrent online/offline for cpu1
under cpu hotplug locks, which just depends on which is executed last? 

Or did I miss something here? 

Thanks, Zhong

 Thanks,
 Rafael
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 10:32 -0400, Tejun Heo wrote:
 Hello,
 
 On Thu, Apr 24, 2014 at 04:37:23PM +0800, Li Zhong wrote:
  On Wed, 2014-04-23 at 10:39 -0400, Tejun Heo wrote:
  After thinking it harder, I still couldn't see ABBA here ... 
  
  the active protection taken here is for probe/release which will not
  be waited for removing something like online under cpu#? Or my
  assumption that s_active for different files are different locks are
  completely wrong? Or I missed something else? 
 
 I'm probably confused about the locking.  I was thinking a scenario
 like the following.
 
 A. CPU on/offline
 
grabs s_active protection of online node
grabs cpu subsys mutex
perform on/offline
releases cpu subsys mutex
releases s_active protection of online node

so the chain here is s_active(online) - cpu_subsys_mutex

 
 B. CPU release
 
grabs s_active protection of release node
grabs cpu subsys mutex
performs removal of the CPU
removes the online node
releases cpu subsys mutex
releases s_active protection of release node

and the chain here is s_active(release) - cpu_subsys_mutex -
s_active(online)

 
 A nests cpu subsys mutex under s_active of the online node.  B nests
 s_active of the online node under the cpu subsys mutex.  What am I
 missing?

From the above two chain, I think the problem is cpu_subsys_mutex and
s_active(online), which is the deadlock we are trying to solve in patch
#2. I seems to me s_active(release) here doesn't have lock issues? 

Thanks, Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 10:35 -0400, Tejun Heo wrote:
 On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
  On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
   cc'ing Li Zhong who's working on a simliar issue in the following
   thread and quoting whole body.
   
 http://thread.gmane.org/gmane.linux.kernel/1680706
   
   Li, this is another variation of the same problem.  Maybe this can be
   covered by your work too?
  
  It seems to me that it is about write something to driver attribute, and
  driver unloading. If so, maybe it's not easy to reuse the help functions
  created for device attribute, and device removing.
  
  But I guess the idea to break the active protection could still be
  applied here:
  
  Maybe we could try_module_get() here (like the other option suggested by
  Johan?), and break active protection if we could get the module,
  something like below? 
 
 I don't get why try_module_get() matters here.  We can't call into
 -store if the object at hand is already destroyed and the underlying
 module can't go away if the target device is still alive.
 try_module_get() doesn't actually protect the object.  Why does that
 matter?  This is self removal, right?  Can you please take a look at
 kernfs_remove_self()?

This is about one process writing something to driver attributes, and
one process trying to unload this driver. 

I think try_module_get() could detect whether the driver is being
unloaded, and if not, prevent it from being unloaded, so it could
protect the object here by not allow the driver to be unloaded.

And if the driver is being unloaded, we could abort the write operation
directly. 

Thanks, Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-24 Thread Li Zhong
On Thu, 2014-04-24 at 16:52 +0200, Johan Hovold wrote:
 On Thu, Apr 24, 2014 at 10:35:17AM -0400, Tejun Heo wrote:
  On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
   On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
cc'ing Li Zhong who's working on a simliar issue in the following
thread and quoting whole body.

  http://thread.gmane.org/gmane.linux.kernel/1680706

Li, this is another variation of the same problem.  Maybe this can be
covered by your work too?
   
   It seems to me that it is about write something to driver attribute, and
   driver unloading. If so, maybe it's not easy to reuse the help functions
   created for device attribute, and device removing.
   
   But I guess the idea to break the active protection could still be
   applied here:
   
   Maybe we could try_module_get() here (like the other option suggested by
   Johan?), and break active protection if we could get the module,
   something like below? 
  
  I don't get why try_module_get() matters here.  We can't call into
  -store if the object at hand is already destroyed and the underlying
  module can't go away if the target device is still alive.
  try_module_get() doesn't actually protect the object.  Why does that
  matter?  This is self removal, right?  Can you please take a look at
  kernfs_remove_self()?
 
 No, this isn't self removal. The driver-attribute (not device-attribute)
 store operation simply grabs a lock that is also held while the driver
 is being deregistered at module unload. Taking a reference to the module
 in this case will prevent deregistration while store is running.
 
 But it seems like this can be solved for usb-serial by simply not
 holding the lock while deregistering.

I didn't look carefully about this lock. 

But I'm not sure whether there are such requirements for driver
attributes:

some lock needs be grabbed in the driver attributes store callbacks, and
the same lock also needs to be grabbed during driver unregister. 

If we have such requirements currently or in the future, I think they
could all be solved by breaking active protection after get the module
reference.

Thanks, Zhong

 
 Johan
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-23 Thread Li Zhong
On Wed, 2014-04-23 at 12:58 +0200, Rafael J. Wysocki wrote:
> On Wednesday, April 23, 2014 01:03:42 PM Li Zhong wrote:
> > On Tue, 2014-04-22 at 16:44 -0400, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
> > > > > Is this assumption true?  If so, can we add lockdep assertions in
> > > > > places to verify and enforce this?  If not, aren't we just feeling
> > > > > good when the reality is broken?
> > > > 
> > > > It seems not true ... I think there are devices that don't have the
> > > > online/offline concept, we just need to add it, remove it, like ethernet
> > > > cards. 
> > > > 
> > > > Maybe we could change the comments above, like:
> > > > /* We assume device_hotplug_lock must be acquired before 
> > > >  * removing devices, which have online/offline sysfs knob, 
> > > >  * and some locks are needed to serialize the online/offline
> > > >  * callbacks and device removing. ...
> > > > ? 
> > > > 
> > > > And we could add lockdep assertions in cpu and memory related code? e.g.
> > > > remove_memory(), unregister_cpu()
> > > > 
> > > > Currently, remove_memory() has comments for the function:
> > > > 
> > > >  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> > > >  * and online/offline operations before this call, as required by
> > > >  * try_offline_node().
> > > >  */
> > > > 
> > > > maybe it could be removed with the lockdep assertion.
> > > 
> > > I'm confused about the overall locking scheme.  What's the role of
> > > device_hotplug_lock?  Is that solely to prevent the sysfs deadlock
> > > issue?  Or does it serve other synchronization purposes depending on
> > > the specific subsystem?  If the former, the lock no longer needs to
> > > exist.  The only thing necessary would be synchronization between
> > > device_del() deleting the sysfs file and the unbreak helper invoking
> > > device-specific callback.  If the latter, we probably should change
> > > that.  Sharing hotplug lock across multiple subsystems through driver
> > > core sounds like a pretty bad idea.
> > 
> > I think it's the latter.
> 
> Actually, no, this is not the case if I understand you correctly.

Oh, Sorry, I didn't read carefully. Yes, it's not specific subsystem.
After seeing your reply, I understand it is for protecting device hot
remove involving multiple subsystems.

> 
> > I think device_{on|off}line is better to be
> > done in some sort of lock which prevents the device from being removed,
> > including some preparation work that needs be done before device_del().
> 
> Quite frankly, you should be confident that you understand the code you're
> trying to modify or please don't touch it.
> 
> I'll have a deeper look at this issue later today or tomorrow and will get
> back to you then.

Ok, thank you, 
Zhong

> 
> Thanks!
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-23 Thread Li Zhong
On Wed, 2014-04-23 at 12:54 +0200, Rafael J. Wysocki wrote:
> On Wednesday, April 23, 2014 09:50:32 AM Li Zhong wrote:
> > On Tue, 2014-04-22 at 12:11 +0200, Rafael J. Wysocki wrote:
> > > On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
> > > > On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> > > > > Hello,
> > > > > 
> > > > > On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> > > > > 
> > > > > Proper /** function comment would be nice.
> > > > 
> > > > Ok, will try to write some in next version.
> > > > 
> > > > > 
> > > > > > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > > > > > +  struct device_attribute 
> > > > > > *attr)
> > > > > 
> > > > > I can see why you did this but let's please not require the user of
> > > > > this function to see how the thing is working internally.  Let's
> > > > > return int and keep track of (or look up again) the kernfs_node
> > > > > internally.
> > > > 
> > > > Ok, it also makes the prototype of lock and unlock look more consistent
> > > > and comfortable. 
> > > > 
> > > > > 
> > > > > >  {
> > > > > ...
> > > > > > +   /*
> > > > > > +* We assume device_hotplug_lock must be acquired before 
> > > > > > removing
> > > > > 
> > > > > Is this assumption true?  If so, can we add lockdep assertions in
> > > > > places to verify and enforce this?  If not, aren't we just feeling
> > > > > good when the reality is broken?
> > > > 
> > > > It seems not true ... I think there are devices that don't have the
> > > > online/offline concept, we just need to add it, remove it, like ethernet
> > > > cards. 
> > > 
> > > Well, I haven't been following this closely (I was travelling, sorry), but
> > > there certainly are devices without online/offline.  That currently is 
> > > only
> > > present for CPUs, memory blocks and ACPI containers (if I remember 
> > > correctly).
> > > 
> > > > 
> > > > Maybe we could change the comments above, like:
> > > > /* We assume device_hotplug_lock must be acquired before 
> > > >  * removing devices, which have online/offline sysfs knob, 
> > > >  * and some locks are needed to serialize the online/offline
> > > >  * callbacks and device removing. ...
> > > > ? 
> > > 
> > > Lockdep assertions would be better than this in my opinion.
> > 
> > This is talking about the lock required in the other process, the device
> > removing process, e.g. that in remove_memory() below. So I guess no
> > lockdep assertions needed here. Or I misunderstand your point? 
> 
> I mean if you assume certain lock to be held somewhere, it is better to use
> lockdep annotations to express that assumption, because that will cause users
> to *see* the problem when it happens.

OK, I see, I think you were suggesting the same thing as Tejun, just I
misunderstood it. 

Thanks, Zhong

> 
> Thanks!
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-23 Thread Li Zhong
On Wed, 2014-04-23 at 12:54 +0200, Rafael J. Wysocki wrote:
 On Wednesday, April 23, 2014 09:50:32 AM Li Zhong wrote:
  On Tue, 2014-04-22 at 12:11 +0200, Rafael J. Wysocki wrote:
   On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
 Hello,
 
 On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
 
 Proper /** function comment would be nice.

Ok, will try to write some in next version.

 
  +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
  +  struct device_attribute 
  *attr)
 
 I can see why you did this but let's please not require the user of
 this function to see how the thing is working internally.  Let's
 return int and keep track of (or look up again) the kernfs_node
 internally.

Ok, it also makes the prototype of lock and unlock look more consistent
and comfortable. 

 
   {
 ...
  +   /*
  +* We assume device_hotplug_lock must be acquired before 
  removing
 
 Is this assumption true?  If so, can we add lockdep assertions in
 places to verify and enforce this?  If not, aren't we just feeling
 good when the reality is broken?

It seems not true ... I think there are devices that don't have the
online/offline concept, we just need to add it, remove it, like ethernet
cards. 
   
   Well, I haven't been following this closely (I was travelling, sorry), but
   there certainly are devices without online/offline.  That currently is 
   only
   present for CPUs, memory blocks and ACPI containers (if I remember 
   correctly).
   

Maybe we could change the comments above, like:
/* We assume device_hotplug_lock must be acquired before 
 * removing devices, which have online/offline sysfs knob, 
 * and some locks are needed to serialize the online/offline
 * callbacks and device removing. ...
? 
   
   Lockdep assertions would be better than this in my opinion.
  
  This is talking about the lock required in the other process, the device
  removing process, e.g. that in remove_memory() below. So I guess no
  lockdep assertions needed here. Or I misunderstand your point? 
 
 I mean if you assume certain lock to be held somewhere, it is better to use
 lockdep annotations to express that assumption, because that will cause users
 to *see* the problem when it happens.

OK, I see, I think you were suggesting the same thing as Tejun, just I
misunderstood it. 

Thanks, Zhong

 
 Thanks!
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-23 Thread Li Zhong
On Wed, 2014-04-23 at 12:58 +0200, Rafael J. Wysocki wrote:
 On Wednesday, April 23, 2014 01:03:42 PM Li Zhong wrote:
  On Tue, 2014-04-22 at 16:44 -0400, Tejun Heo wrote:
   Hello,
   
   On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
 Is this assumption true?  If so, can we add lockdep assertions in
 places to verify and enforce this?  If not, aren't we just feeling
 good when the reality is broken?

It seems not true ... I think there are devices that don't have the
online/offline concept, we just need to add it, remove it, like ethernet
cards. 

Maybe we could change the comments above, like:
/* We assume device_hotplug_lock must be acquired before 
 * removing devices, which have online/offline sysfs knob, 
 * and some locks are needed to serialize the online/offline
 * callbacks and device removing. ...
? 

And we could add lockdep assertions in cpu and memory related code? e.g.
remove_memory(), unregister_cpu()

Currently, remove_memory() has comments for the function:

 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations before this call, as required by
 * try_offline_node().
 */

maybe it could be removed with the lockdep assertion.
   
   I'm confused about the overall locking scheme.  What's the role of
   device_hotplug_lock?  Is that solely to prevent the sysfs deadlock
   issue?  Or does it serve other synchronization purposes depending on
   the specific subsystem?  If the former, the lock no longer needs to
   exist.  The only thing necessary would be synchronization between
   device_del() deleting the sysfs file and the unbreak helper invoking
   device-specific callback.  If the latter, we probably should change
   that.  Sharing hotplug lock across multiple subsystems through driver
   core sounds like a pretty bad idea.
  
  I think it's the latter.
 
 Actually, no, this is not the case if I understand you correctly.

Oh, Sorry, I didn't read carefully. Yes, it's not specific subsystem.
After seeing your reply, I understand it is for protecting device hot
remove involving multiple subsystems.

 
  I think device_{on|off}line is better to be
  done in some sort of lock which prevents the device from being removed,
  including some preparation work that needs be done before device_del().
 
 Quite frankly, you should be confident that you understand the code you're
 trying to modify or please don't touch it.
 
 I'll have a deeper look at this issue later today or tomorrow and will get
 back to you then.

Ok, thank you, 
Zhong

 
 Thanks!
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-22 Thread Li Zhong
On Tue, 2014-04-22 at 16:44 -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
> > > Is this assumption true?  If so, can we add lockdep assertions in
> > > places to verify and enforce this?  If not, aren't we just feeling
> > > good when the reality is broken?
> > 
> > It seems not true ... I think there are devices that don't have the
> > online/offline concept, we just need to add it, remove it, like ethernet
> > cards. 
> > 
> > Maybe we could change the comments above, like:
> > /* We assume device_hotplug_lock must be acquired before 
> >  * removing devices, which have online/offline sysfs knob, 
> >  * and some locks are needed to serialize the online/offline
> >  * callbacks and device removing. ...
> > ? 
> > 
> > And we could add lockdep assertions in cpu and memory related code? e.g.
> > remove_memory(), unregister_cpu()
> > 
> > Currently, remove_memory() has comments for the function:
> > 
> >  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> >  * and online/offline operations before this call, as required by
> >  * try_offline_node().
> >  */
> > 
> > maybe it could be removed with the lockdep assertion.
> 
> I'm confused about the overall locking scheme.  What's the role of
> device_hotplug_lock?  Is that solely to prevent the sysfs deadlock
> issue?  Or does it serve other synchronization purposes depending on
> the specific subsystem?  If the former, the lock no longer needs to
> exist.  The only thing necessary would be synchronization between
> device_del() deleting the sysfs file and the unbreak helper invoking
> device-specific callback.  If the latter, we probably should change
> that.  Sharing hotplug lock across multiple subsystems through driver
> core sounds like a pretty bad idea.

I think it's the latter. I think device_{on|off}line is better to be
done in some sort of lock which prevents the device from being removed,
including some preparation work that needs be done before device_del().

Thanks, Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-22 Thread Li Zhong
On Tue, 2014-04-22 at 16:40 -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Apr 22, 2014 at 10:29:37AM +0800, Li Zhong wrote:
> > The probe/release files are attribute files for cpu subsys, looks like
> > following in sysfs dirs
> > 
> > $ cd /sys/devices/system/cpu/
> > $ ls -l
> > total 0
> > drwxr-xr-x. 7 root root 0 Apr 17 19:00 cpu0
> > drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu1
> > drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu10
> > ..
> > drwxr-xr-x. 3 root root 0 Apr 20 08:00 cpufreq
> > drwxr-xr-x. 2 root root 0 Apr 20 08:00 cpuidle
> > -rw---. 1 root root 65536 Apr 21 00:28 dscr_default
> > -r--r--r--. 1 root root 65536 Apr 21 00:28 kernel_max
> > -r--r--r--. 1 root root 65536 Apr 21 00:28 offline
> > -r--r--r--. 1 root root 65536 Sep  4  2014 online
> > -r--r--r--. 1 root root 65536 Apr 21 00:28 possible
> > drwxr-xr-x. 2 root root 0 Apr 20 08:00 power
> > -r--r--r--. 1 root root 65536 Apr 17 20:46 present
> > --w---. 1 root root 65536 Apr 21 00:28 probe<-  
> > --w---. 1 root root 65536 Apr 21 00:28 release  <-
> > -rw---. 1 root root 65536 Apr 21 00:28 subcores_per_core
> > -rw-r--r--. 1 root root 65536 Apr 21 00:28 uevent
> > 
> > From the code, it seems cpu subsys won't be unregistered, and it doesn't
> > make sense to remove all the cpus in the system. 
> 
> I don't think I'm following you.  Are you saying that no files which
> are protected under the hotplug lock that cpu subsys uses are removed
> during offlining?

Sorry, Maybe I didn't say it clearly. 

There are files under cpu#, e.g. 

$ cd /sys/devices/system/cpu/cpu0
$ ls
cache dscr   node1pmc1  pmc5   smt_snooze_delay
uevent
cpuidle   mmcr0  online   pmc2  pmc6   spurr
crash_notes   mmcr1  physical_id  pmc3  power  subsystem
crash_notes_size  mmcra  pir  pmc4  purr   topology

If you remove cpu0, then the cpu0 directory will be removed, together
with the "online" file in the directory, while some other process might
be writing 0 or 1 to it.

But here, for the probe/release, take "release" for example, if user
writes something that stands for cpu0 to it, the cpu0 will be removed, 
and the cpu0 directory and the files under it will be removed. But
"release" itself is not removed.

They are attributes of cpu_subsys, not of some specific cpus.

Hopes the above makes things a bit clearer. 

Thanks, Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-22 Thread Li Zhong
On Tue, 2014-04-22 at 12:11 +0200, Rafael J. Wysocki wrote:
> On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
> > On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> > > 
> > > Proper /** function comment would be nice.
> > 
> > Ok, will try to write some in next version.
> > 
> > > 
> > > > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > > > +  struct device_attribute 
> > > > *attr)
> > > 
> > > I can see why you did this but let's please not require the user of
> > > this function to see how the thing is working internally.  Let's
> > > return int and keep track of (or look up again) the kernfs_node
> > > internally.
> > 
> > Ok, it also makes the prototype of lock and unlock look more consistent
> > and comfortable. 
> > 
> > > 
> > > >  {
> > > ...
> > > > +   /*
> > > > +* We assume device_hotplug_lock must be acquired before 
> > > > removing
> > > 
> > > Is this assumption true?  If so, can we add lockdep assertions in
> > > places to verify and enforce this?  If not, aren't we just feeling
> > > good when the reality is broken?
> > 
> > It seems not true ... I think there are devices that don't have the
> > online/offline concept, we just need to add it, remove it, like ethernet
> > cards. 
> 
> Well, I haven't been following this closely (I was travelling, sorry), but
> there certainly are devices without online/offline.  That currently is only
> present for CPUs, memory blocks and ACPI containers (if I remember correctly).
> 
> > 
> > Maybe we could change the comments above, like:
> > /* We assume device_hotplug_lock must be acquired before 
> >  * removing devices, which have online/offline sysfs knob, 
> >  * and some locks are needed to serialize the online/offline
> >  * callbacks and device removing. ...
> > ? 
> 
> Lockdep assertions would be better than this in my opinion.

This is talking about the lock required in the other process, the device
removing process, e.g. that in remove_memory() below. So I guess no
lockdep assertions needed here. Or I misunderstand your point? 

> > 
> > And we could add lockdep assertions in cpu and memory related code? e.g.
> > remove_memory(), unregister_cpu()
> > 
> > Currently, remove_memory() has comments for the function:
> > 
> >  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> >  * and online/offline operations before this call, as required by
> >  * try_offline_node().
> >  */
> > 
> > maybe it could be removed with the lockdep assertion.
> 
> No, please don't remove it.  It is there to explain where the locking 
> requirement
> comes from.

OK, I see. I think I'll just add lockdep assertions, and keep the
comments there. 

Thanks, Zhong

> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-22 Thread Li Zhong
On Tue, 2014-04-22 at 12:11 +0200, Rafael J. Wysocki wrote:
 On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
  On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
   Hello,
   
   On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
   
   Proper /** function comment would be nice.
  
  Ok, will try to write some in next version.
  
   
+struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
+  struct device_attribute 
*attr)
   
   I can see why you did this but let's please not require the user of
   this function to see how the thing is working internally.  Let's
   return int and keep track of (or look up again) the kernfs_node
   internally.
  
  Ok, it also makes the prototype of lock and unlock look more consistent
  and comfortable. 
  
   
 {
   ...
+   /*
+* We assume device_hotplug_lock must be acquired before 
removing
   
   Is this assumption true?  If so, can we add lockdep assertions in
   places to verify and enforce this?  If not, aren't we just feeling
   good when the reality is broken?
  
  It seems not true ... I think there are devices that don't have the
  online/offline concept, we just need to add it, remove it, like ethernet
  cards. 
 
 Well, I haven't been following this closely (I was travelling, sorry), but
 there certainly are devices without online/offline.  That currently is only
 present for CPUs, memory blocks and ACPI containers (if I remember correctly).
 
  
  Maybe we could change the comments above, like:
  /* We assume device_hotplug_lock must be acquired before 
   * removing devices, which have online/offline sysfs knob, 
   * and some locks are needed to serialize the online/offline
   * callbacks and device removing. ...
  ? 
 
 Lockdep assertions would be better than this in my opinion.

This is talking about the lock required in the other process, the device
removing process, e.g. that in remove_memory() below. So I guess no
lockdep assertions needed here. Or I misunderstand your point? 

  
  And we could add lockdep assertions in cpu and memory related code? e.g.
  remove_memory(), unregister_cpu()
  
  Currently, remove_memory() has comments for the function:
  
   * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
   * and online/offline operations before this call, as required by
   * try_offline_node().
   */
  
  maybe it could be removed with the lockdep assertion.
 
 No, please don't remove it.  It is there to explain where the locking 
 requirement
 comes from.

OK, I see. I think I'll just add lockdep assertions, and keep the
comments there. 

Thanks, Zhong

 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-22 Thread Li Zhong
On Tue, 2014-04-22 at 16:40 -0400, Tejun Heo wrote:
 Hello,
 
 On Tue, Apr 22, 2014 at 10:29:37AM +0800, Li Zhong wrote:
  The probe/release files are attribute files for cpu subsys, looks like
  following in sysfs dirs
  
  $ cd /sys/devices/system/cpu/
  $ ls -l
  total 0
  drwxr-xr-x. 7 root root 0 Apr 17 19:00 cpu0
  drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu1
  drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu10
  ..
  drwxr-xr-x. 3 root root 0 Apr 20 08:00 cpufreq
  drwxr-xr-x. 2 root root 0 Apr 20 08:00 cpuidle
  -rw---. 1 root root 65536 Apr 21 00:28 dscr_default
  -r--r--r--. 1 root root 65536 Apr 21 00:28 kernel_max
  -r--r--r--. 1 root root 65536 Apr 21 00:28 offline
  -r--r--r--. 1 root root 65536 Sep  4  2014 online
  -r--r--r--. 1 root root 65536 Apr 21 00:28 possible
  drwxr-xr-x. 2 root root 0 Apr 20 08:00 power
  -r--r--r--. 1 root root 65536 Apr 17 20:46 present
  --w---. 1 root root 65536 Apr 21 00:28 probe-  
  --w---. 1 root root 65536 Apr 21 00:28 release  -
  -rw---. 1 root root 65536 Apr 21 00:28 subcores_per_core
  -rw-r--r--. 1 root root 65536 Apr 21 00:28 uevent
  
  From the code, it seems cpu subsys won't be unregistered, and it doesn't
  make sense to remove all the cpus in the system. 
 
 I don't think I'm following you.  Are you saying that no files which
 are protected under the hotplug lock that cpu subsys uses are removed
 during offlining?

Sorry, Maybe I didn't say it clearly. 

There are files under cpu#, e.g. 

$ cd /sys/devices/system/cpu/cpu0
$ ls
cache dscr   node1pmc1  pmc5   smt_snooze_delay
uevent
cpuidle   mmcr0  online   pmc2  pmc6   spurr
crash_notes   mmcr1  physical_id  pmc3  power  subsystem
crash_notes_size  mmcra  pir  pmc4  purr   topology

If you remove cpu0, then the cpu0 directory will be removed, together
with the online file in the directory, while some other process might
be writing 0 or 1 to it.

But here, for the probe/release, take release for example, if user
writes something that stands for cpu0 to it, the cpu0 will be removed, 
and the cpu0 directory and the files under it will be removed. But
release itself is not removed.

They are attributes of cpu_subsys, not of some specific cpus.

Hopes the above makes things a bit clearer. 

Thanks, Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-22 Thread Li Zhong
On Tue, 2014-04-22 at 16:44 -0400, Tejun Heo wrote:
 Hello,
 
 On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
   Is this assumption true?  If so, can we add lockdep assertions in
   places to verify and enforce this?  If not, aren't we just feeling
   good when the reality is broken?
  
  It seems not true ... I think there are devices that don't have the
  online/offline concept, we just need to add it, remove it, like ethernet
  cards. 
  
  Maybe we could change the comments above, like:
  /* We assume device_hotplug_lock must be acquired before 
   * removing devices, which have online/offline sysfs knob, 
   * and some locks are needed to serialize the online/offline
   * callbacks and device removing. ...
  ? 
  
  And we could add lockdep assertions in cpu and memory related code? e.g.
  remove_memory(), unregister_cpu()
  
  Currently, remove_memory() has comments for the function:
  
   * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
   * and online/offline operations before this call, as required by
   * try_offline_node().
   */
  
  maybe it could be removed with the lockdep assertion.
 
 I'm confused about the overall locking scheme.  What's the role of
 device_hotplug_lock?  Is that solely to prevent the sysfs deadlock
 issue?  Or does it serve other synchronization purposes depending on
 the specific subsystem?  If the former, the lock no longer needs to
 exist.  The only thing necessary would be synchronization between
 device_del() deleting the sysfs file and the unbreak helper invoking
 device-specific callback.  If the latter, we probably should change
 that.  Sharing hotplug lock across multiple subsystems through driver
 core sounds like a pretty bad idea.

I think it's the latter. I think device_{on|off}line is better to be
done in some sort of lock which prevents the device from being removed,
including some preparation work that needs be done before device_del().

Thanks, Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-21 Thread Li Zhong
On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> 
> Proper /** function comment would be nice.

Ok, will try to write some in next version.

> 
> > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > +  struct device_attribute *attr)
> 
> I can see why you did this but let's please not require the user of
> this function to see how the thing is working internally.  Let's
> return int and keep track of (or look up again) the kernfs_node
> internally.

Ok, it also makes the prototype of lock and unlock look more consistent
and comfortable. 

> 
> >  {
> ...
> > +   /*
> > +* We assume device_hotplug_lock must be acquired before removing
> 
> Is this assumption true?  If so, can we add lockdep assertions in
> places to verify and enforce this?  If not, aren't we just feeling
> good when the reality is broken?

It seems not true ... I think there are devices that don't have the
online/offline concept, we just need to add it, remove it, like ethernet
cards. 

Maybe we could change the comments above, like:
/* We assume device_hotplug_lock must be acquired before 
 * removing devices, which have online/offline sysfs knob, 
 * and some locks are needed to serialize the online/offline
 * callbacks and device removing. ...
? 

And we could add lockdep assertions in cpu and memory related code? e.g.
remove_memory(), unregister_cpu()

Currently, remove_memory() has comments for the function:

 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations before this call, as required by
 * try_offline_node().
 */

maybe it could be removed with the lockdep assertion.

> ...
> 
> Function comment please.

OK. 

Thanks, Zhong

> > +void unlock_device_hotplug_sysfs(struct device *dev,
> > +struct kernfs_node *kn)
> > +{
> > +   unlock_device_hotplug();
> > +   kernfs_unbreak_active_protection(kn);
> > +   put_device(dev);
> > +   kernfs_put(kn);
> >  }
> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-21 Thread Li Zhong
On Mon, 2014-04-21 at 18:38 -0400, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 21, 2014 at 05:20:59PM +0800, Li Zhong wrote:
> > While auditing the usage of lock_device_hotplug_sysfs() for implementing
> > it in another way in following patch, it seems to me that the code here
> > is to add/remove device, and the files probe/release for cpu bus
> > themselves won't be removed. 
> > 
> > So it seems to me there is no s_active related deadlock here, and we
> > could just use lock_device_hotplug().
> 
> It may still cause issue if offlining ends up removing sysfs files or
> gets involved with the same lock used during cpu hot[un]plug
> operations (e.g. offlining racing against cpu hotunplug) and offlining
> a cpu does remove files.  Has this change been tested?

The probe/release files are attribute files for cpu subsys, looks like
following in sysfs dirs

$ cd /sys/devices/system/cpu/
$ ls -l
total 0
drwxr-xr-x. 7 root root 0 Apr 17 19:00 cpu0
drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu1
drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu10
..
drwxr-xr-x. 3 root root 0 Apr 20 08:00 cpufreq
drwxr-xr-x. 2 root root 0 Apr 20 08:00 cpuidle
-rw---. 1 root root 65536 Apr 21 00:28 dscr_default
-r--r--r--. 1 root root 65536 Apr 21 00:28 kernel_max
-r--r--r--. 1 root root 65536 Apr 21 00:28 offline
-r--r--r--. 1 root root 65536 Sep  4  2014 online
-r--r--r--. 1 root root 65536 Apr 21 00:28 possible
drwxr-xr-x. 2 root root 0 Apr 20 08:00 power
-r--r--r--. 1 root root 65536 Apr 17 20:46 present
--w---. 1 root root 65536 Apr 21 00:28 probe<-  
--w---. 1 root root 65536 Apr 21 00:28 release  <-
-rw---. 1 root root 65536 Apr 21 00:28 subcores_per_core
-rw-r--r--. 1 root root 65536 Apr 21 00:28 uevent

>From the code, it seems cpu subsys won't be unregistered, and it doesn't
make sense to remove all the cpus in the system. 

Thanks, Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-21 Thread Li Zhong
This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism 
to break active protection. 

The active protection is there to keep the file alive by blocking
deletion while operations are on-going in the file.  This blocking
creates a dependency loop when an operation running off a sysfs knob
ends up grabbing a lock which may be held while removing the said sysfs
knob.

So the problem here is the order of s_active, and the series of hotplug
related locks. 

commit 5e33bc41 solves it by taking out the first of the series of
hoplug related locks, device_hotplug_lock, with a try lock. And if that
try lock fails, it returns a restart syscall error, and drops s_active
temporarily to allow the other process to remove the sysfs knob.

This doesn't help with lockdep warnings reported against s_active and
other hotplug related locks in the series.

This patch instead tries to take s_active out of the lock dependency
graph using the active protection breaking mechanism. 

lock_device_hotplug_sysfs() function name is kept here, two more
arguments are added, dev, attr, to help find the kernfs_node
corresponding to the sysfs knob (which should always be able to be
found, WARN if not). The found kernfs_node is recorded as the return
value, to be released in unlock_device_hotplug_sysfs().

As we break the active protection here, the device removing process
might remove the sysfs entries after that point. So after we grab the
device_hotplug_lock, we need check whether that really happens. If so,
we should abort and not invoke the online/offline callbacks anymore. In
this case, NULL is returned to the caller, so it could return with
ENODEV.

Signed-off-by: Li Zhong 
---
 drivers/base/core.c| 63 ++
 drivers/base/memory.c  |  9 
 include/linux/device.h |  5 +++-
 3 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..4d37a2b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -61,14 +61,55 @@ void unlock_device_hotplug(void)
mutex_unlock(_hotplug_lock);
 }
 
-int lock_device_hotplug_sysfs(void)
+struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
+  struct device_attribute *attr)
 {
-   if (mutex_trylock(_hotplug_lock))
-   return 0;
+   struct kernfs_node *kn;
+
+   kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+
+   if (WARN_ON_ONCE(!kn))
+   return NULL;
+
+   /*
+* Break active protection here to avoid deadlocks with device
+* removing process, which tries to remove sysfs entries including this
+* "online" attribute while holding some hotplug related locks.
+*
+* @dev needs to be protected here, or it could go away any time after
+* dropping active protection.
+*/
+
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
+
+   /*
+* We assume device_hotplug_lock must be acquired before removing
+* device, we can check here whether the device has been removed, so
+* we don't call device_{on|off}line against removed device.
+*/
 
-   /* Avoid busy looping (5 ms of sleep should do). */
-   msleep(5);
-   return restart_syscall();
+   if (!dev->kobj.sd) {
+   /* device_del() already called on @dev, we should also abort */
+   unlock_device_hotplug();
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+   kernfs_put(kn);
+   return NULL;
+   }
+
+   return kn;
+}
+
+void unlock_device_hotplug_sysfs(struct device *dev,
+struct kernfs_node *kn)
+{
+   unlock_device_hotplug();
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+   kernfs_put(kn);
 }
 
 #ifdef CONFIG_BLOCK
@@ -439,17 +480,19 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, );
if (ret < 0)
return ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   kn = lock_device_hotplug_sysfs(dev, attr);
+   if (!kn)
+   return -ENODEV;
 
ret = val ? device_online(dev) : device_offline(dev);
-   unlock_device_hotplug();
+   unlock_device_hotplug_sysfs(dev, kn);
+
return ret < 0 ? ret : count;
 }
 static DEVICE_ATTR_RW(online);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..c2b66d4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -320,10 +320,11 @@ store_mem_state(struct device *dev,
 {
struct memory_block *mem = to_memory_bl

[RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-21 Thread Li Zhong
While auditing the usage of lock_device_hotplug_sysfs() for implementing
it in another way in following patch, it seems to me that the code here
is to add/remove device, and the files probe/release for cpu bus
themselves won't be removed. 

So it seems to me there is no s_active related deadlock here, and we
could just use lock_device_hotplug().

Signed-off-by: Li Zhong 
---
 drivers/base/cpu.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc..9483225 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -88,9 +88,7 @@ static ssize_t cpu_probe_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_probe(buf, count);
 
@@ -106,9 +104,7 @@ static ssize_t cpu_release_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_release(buf, count);
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-21 Thread Li Zhong
While auditing the usage of lock_device_hotplug_sysfs() for implementing
it in another way in following patch, it seems to me that the code here
is to add/remove device, and the files probe/release for cpu bus
themselves won't be removed. 

So it seems to me there is no s_active related deadlock here, and we
could just use lock_device_hotplug().

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/cpu.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc..9483225 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -88,9 +88,7 @@ static ssize_t cpu_probe_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_probe(buf, count);
 
@@ -106,9 +104,7 @@ static ssize_t cpu_release_store(struct device *dev,
ssize_t cnt;
int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   lock_device_hotplug();
 
cnt = arch_cpu_release(buf, count);
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-21 Thread Li Zhong
This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism 
to break active protection. 

The active protection is there to keep the file alive by blocking
deletion while operations are on-going in the file.  This blocking
creates a dependency loop when an operation running off a sysfs knob
ends up grabbing a lock which may be held while removing the said sysfs
knob.

So the problem here is the order of s_active, and the series of hotplug
related locks. 

commit 5e33bc41 solves it by taking out the first of the series of
hoplug related locks, device_hotplug_lock, with a try lock. And if that
try lock fails, it returns a restart syscall error, and drops s_active
temporarily to allow the other process to remove the sysfs knob.

This doesn't help with lockdep warnings reported against s_active and
other hotplug related locks in the series.

This patch instead tries to take s_active out of the lock dependency
graph using the active protection breaking mechanism. 

lock_device_hotplug_sysfs() function name is kept here, two more
arguments are added, dev, attr, to help find the kernfs_node
corresponding to the sysfs knob (which should always be able to be
found, WARN if not). The found kernfs_node is recorded as the return
value, to be released in unlock_device_hotplug_sysfs().

As we break the active protection here, the device removing process
might remove the sysfs entries after that point. So after we grab the
device_hotplug_lock, we need check whether that really happens. If so,
we should abort and not invoke the online/offline callbacks anymore. In
this case, NULL is returned to the caller, so it could return with
ENODEV.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/core.c| 63 ++
 drivers/base/memory.c  |  9 
 include/linux/device.h |  5 +++-
 3 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..4d37a2b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -61,14 +61,55 @@ void unlock_device_hotplug(void)
mutex_unlock(device_hotplug_lock);
 }
 
-int lock_device_hotplug_sysfs(void)
+struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
+  struct device_attribute *attr)
 {
-   if (mutex_trylock(device_hotplug_lock))
-   return 0;
+   struct kernfs_node *kn;
+
+   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
+
+   if (WARN_ON_ONCE(!kn))
+   return NULL;
+
+   /*
+* Break active protection here to avoid deadlocks with device
+* removing process, which tries to remove sysfs entries including this
+* online attribute while holding some hotplug related locks.
+*
+* @dev needs to be protected here, or it could go away any time after
+* dropping active protection.
+*/
+
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
+
+   /*
+* We assume device_hotplug_lock must be acquired before removing
+* device, we can check here whether the device has been removed, so
+* we don't call device_{on|off}line against removed device.
+*/
 
-   /* Avoid busy looping (5 ms of sleep should do). */
-   msleep(5);
-   return restart_syscall();
+   if (!dev-kobj.sd) {
+   /* device_del() already called on @dev, we should also abort */
+   unlock_device_hotplug();
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+   kernfs_put(kn);
+   return NULL;
+   }
+
+   return kn;
+}
+
+void unlock_device_hotplug_sysfs(struct device *dev,
+struct kernfs_node *kn)
+{
+   unlock_device_hotplug();
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+   kernfs_put(kn);
 }
 
 #ifdef CONFIG_BLOCK
@@ -439,17 +480,19 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, val);
if (ret  0)
return ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   kn = lock_device_hotplug_sysfs(dev, attr);
+   if (!kn)
+   return -ENODEV;
 
ret = val ? device_online(dev) : device_offline(dev);
-   unlock_device_hotplug();
+   unlock_device_hotplug_sysfs(dev, kn);
+
return ret  0 ? ret : count;
 }
 static DEVICE_ATTR_RW(online);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..c2b66d4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -320,10 +320,11 @@ store_mem_state(struct device *dev,
 {
struct memory_block *mem

Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

2014-04-21 Thread Li Zhong
On Mon, 2014-04-21 at 18:38 -0400, Tejun Heo wrote:
 Hello,
 
 On Mon, Apr 21, 2014 at 05:20:59PM +0800, Li Zhong wrote:
  While auditing the usage of lock_device_hotplug_sysfs() for implementing
  it in another way in following patch, it seems to me that the code here
  is to add/remove device, and the files probe/release for cpu bus
  themselves won't be removed. 
  
  So it seems to me there is no s_active related deadlock here, and we
  could just use lock_device_hotplug().
 
 It may still cause issue if offlining ends up removing sysfs files or
 gets involved with the same lock used during cpu hot[un]plug
 operations (e.g. offlining racing against cpu hotunplug) and offlining
 a cpu does remove files.  Has this change been tested?

The probe/release files are attribute files for cpu subsys, looks like
following in sysfs dirs

$ cd /sys/devices/system/cpu/
$ ls -l
total 0
drwxr-xr-x. 7 root root 0 Apr 17 19:00 cpu0
drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu1
drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu10
..
drwxr-xr-x. 3 root root 0 Apr 20 08:00 cpufreq
drwxr-xr-x. 2 root root 0 Apr 20 08:00 cpuidle
-rw---. 1 root root 65536 Apr 21 00:28 dscr_default
-r--r--r--. 1 root root 65536 Apr 21 00:28 kernel_max
-r--r--r--. 1 root root 65536 Apr 21 00:28 offline
-r--r--r--. 1 root root 65536 Sep  4  2014 online
-r--r--r--. 1 root root 65536 Apr 21 00:28 possible
drwxr-xr-x. 2 root root 0 Apr 20 08:00 power
-r--r--r--. 1 root root 65536 Apr 17 20:46 present
--w---. 1 root root 65536 Apr 21 00:28 probe-  
--w---. 1 root root 65536 Apr 21 00:28 release  -
-rw---. 1 root root 65536 Apr 21 00:28 subcores_per_core
-rw-r--r--. 1 root root 65536 Apr 21 00:28 uevent

From the code, it seems cpu subsys won't be unregistered, and it doesn't
make sense to remove all the cpus in the system. 

Thanks, Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-21 Thread Li Zhong
On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
 Hello,
 
 On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
 
 Proper /** function comment would be nice.

Ok, will try to write some in next version.

 
  +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
  +  struct device_attribute *attr)
 
 I can see why you did this but let's please not require the user of
 this function to see how the thing is working internally.  Let's
 return int and keep track of (or look up again) the kernfs_node
 internally.

Ok, it also makes the prototype of lock and unlock look more consistent
and comfortable. 

 
   {
 ...
  +   /*
  +* We assume device_hotplug_lock must be acquired before removing
 
 Is this assumption true?  If so, can we add lockdep assertions in
 places to verify and enforce this?  If not, aren't we just feeling
 good when the reality is broken?

It seems not true ... I think there are devices that don't have the
online/offline concept, we just need to add it, remove it, like ethernet
cards. 

Maybe we could change the comments above, like:
/* We assume device_hotplug_lock must be acquired before 
 * removing devices, which have online/offline sysfs knob, 
 * and some locks are needed to serialize the online/offline
 * callbacks and device removing. ...
? 

And we could add lockdep assertions in cpu and memory related code? e.g.
remove_memory(), unregister_cpu()

Currently, remove_memory() has comments for the function:

 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations before this call, as required by
 * try_offline_node().
 */

maybe it could be removed with the lockdep assertion.

 ...
 
 Function comment please.

OK. 

Thanks, Zhong

  +void unlock_device_hotplug_sysfs(struct device *dev,
  +struct kernfs_node *kn)
  +{
  +   unlock_device_hotplug();
  +   kernfs_unbreak_active_protection(kn);
  +   put_device(dev);
  +   kernfs_put(kn);
   }
 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks

2014-04-18 Thread Li Zhong
On Thu, 2014-04-17 at 11:17 -0400, Tejun Heo wrote:
> Hello,
> 
> On Thu, Apr 17, 2014 at 02:50:44PM +0800, Li Zhong wrote:
> > This patch tries to solve the device hot remove locking issues in a
> > different way from commit 5e33bc41, as kernfs already has a mechanism 
> > to break active protection. 
> > 
> > The problem here is the order of s_active, and series of hotplug related
> > lock. 
> 
> It prolly deservse more detailed explanation of the deadlock along
> with how 5e33bc41 ("$SUBJ") tried to solve it.  The active protetion
> is there to keep the file alive by blocking deletion while operations
> are on-going in the file.  This blocking creates a dependency loop
> when an operation running off a sysfs knob ends up grabbing a lock
> which may be held while removing the said sysfs knob.

OK, I'll try to add these and something more in next version.

> 
> > +   kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > +   if (WARN_ON_ONCE(!kn))
> > +   return -ENODEV;
> > +
> > +   /*
> > +* Break active protection here to avoid deadlocks with device
> > +* removing process, which tries to remove sysfs entries including this
> > +* "online" attribute while holding some hotplug related locks.
> > +*
> > +* @dev needs to be protected here, or it could go away any time after
> > +* dropping active protection. But it is still unreasonable/unsafe to
> > +* online/offline a device after it being removed. Fortunately, there
> 
> I think this is something driver layer proper should provide
> synchronization for.  It shouldn't be difficult to synchronize this
> function against device_del(), right?  And, please note that @dev is
> guaranteed to have not been removed (at least hasn't gone through attr
> removal) upto this point.

Ok, I think what we need here is the check below, after getting
device_hotplug_lock, and abort this function if device already removed.
We should allow device_del() to remove the device in the other process,
which is why we are breaking the active protection. 

> 
> > +* are some checks in online/offline knobs. Like cpu, it checks cpu
> > +* present/online mask before doing the real work.
> > +*/
> > +
> > +   get_device(dev);
> > +   kernfs_break_active_protection(kn);
> > +
> > +   lock_device_hotplug();
> > +
> > +   /*
> > +* If we assume device_hotplug_lock must be acquired before removing
> > +* device, we may try to find a way to check whether the device has
> > +* been removed here, so we don't call device_{on|off}line against
> > +* removed device.
> > +*/
> 
> Yeah, let's please fix this.

OK, I guess we can check whether dev->kobj.sd becomes NULL. If so, it
means the device has already been deleted by device_del().

> 
> > ret = val ? device_online(dev) : device_offline(dev);
> > unlock_device_hotplug();
> > +
> > +   kernfs_unbreak_active_protection(kn);
> > +   put_device(dev);
> > +
> > +   kernfs_put(kn);
> > +
> > return ret < 0 ? ret : count;
> >  }
> >  static DEVICE_ATTR_RW(online);
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index bece691..0d2f3a5 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
> >  {
> > struct memory_block *mem = to_memory_block(dev);
> > int ret, online_type;
> > +   struct kernfs_node *kn;
> >  
> > -   ret = lock_device_hotplug_sysfs();
> > -   if (ret)
> > -   return ret;
> > +   kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > +   if (WARN_ON_ONCE(!kn))
> > +   return -ENODEV;
> > +
> > +   /* refer to comments in online_store() for more information */
> > +   get_device(dev);
> > +   kernfs_break_active_protection(kn);
> > +
> > +   lock_device_hotplug();
> >  
> > if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
> > online_type = ONLINE_KERNEL;
> > @@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
> >  err:
> > unlock_device_hotplug();
> >  
> > +   kernfs_unbreak_active_protection(kn);
> > +   put_device(dev);
> > +
> > +   kernfs_put(kn);
> 
> There are other users of lock_device_hotplug_sysfs().  We probably
> want to audit them and convert them too, preferably with helper
> routines so that they don't end up duplicating the complexity?

I see, I guess I could keep lock_device_hotplug_sysfs(), just replace it
with the implementation here; and provide a new
unlock_device_hotplug_sysfs(), which will do the unlock, unbreak, and
puts. 

I'll try to get the code ready sometime next week for your review.

Thanks, Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks

2014-04-18 Thread Li Zhong
On Thu, 2014-04-17 at 11:17 -0400, Tejun Heo wrote:
 Hello,
 
 On Thu, Apr 17, 2014 at 02:50:44PM +0800, Li Zhong wrote:
  This patch tries to solve the device hot remove locking issues in a
  different way from commit 5e33bc41, as kernfs already has a mechanism 
  to break active protection. 
  
  The problem here is the order of s_active, and series of hotplug related
  lock. 
 
 It prolly deservse more detailed explanation of the deadlock along
 with how 5e33bc41 ($SUBJ) tried to solve it.  The active protetion
 is there to keep the file alive by blocking deletion while operations
 are on-going in the file.  This blocking creates a dependency loop
 when an operation running off a sysfs knob ends up grabbing a lock
 which may be held while removing the said sysfs knob.

OK, I'll try to add these and something more in next version.

 
  +   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
  +   if (WARN_ON_ONCE(!kn))
  +   return -ENODEV;
  +
  +   /*
  +* Break active protection here to avoid deadlocks with device
  +* removing process, which tries to remove sysfs entries including this
  +* online attribute while holding some hotplug related locks.
  +*
  +* @dev needs to be protected here, or it could go away any time after
  +* dropping active protection. But it is still unreasonable/unsafe to
  +* online/offline a device after it being removed. Fortunately, there
 
 I think this is something driver layer proper should provide
 synchronization for.  It shouldn't be difficult to synchronize this
 function against device_del(), right?  And, please note that @dev is
 guaranteed to have not been removed (at least hasn't gone through attr
 removal) upto this point.

Ok, I think what we need here is the check below, after getting
device_hotplug_lock, and abort this function if device already removed.
We should allow device_del() to remove the device in the other process,
which is why we are breaking the active protection. 

 
  +* are some checks in online/offline knobs. Like cpu, it checks cpu
  +* present/online mask before doing the real work.
  +*/
  +
  +   get_device(dev);
  +   kernfs_break_active_protection(kn);
  +
  +   lock_device_hotplug();
  +
  +   /*
  +* If we assume device_hotplug_lock must be acquired before removing
  +* device, we may try to find a way to check whether the device has
  +* been removed here, so we don't call device_{on|off}line against
  +* removed device.
  +*/
 
 Yeah, let's please fix this.

OK, I guess we can check whether dev-kobj.sd becomes NULL. If so, it
means the device has already been deleted by device_del().

 
  ret = val ? device_online(dev) : device_offline(dev);
  unlock_device_hotplug();
  +
  +   kernfs_unbreak_active_protection(kn);
  +   put_device(dev);
  +
  +   kernfs_put(kn);
  +
  return ret  0 ? ret : count;
   }
   static DEVICE_ATTR_RW(online);
  diff --git a/drivers/base/memory.c b/drivers/base/memory.c
  index bece691..0d2f3a5 100644
  --- a/drivers/base/memory.c
  +++ b/drivers/base/memory.c
  @@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
   {
  struct memory_block *mem = to_memory_block(dev);
  int ret, online_type;
  +   struct kernfs_node *kn;
   
  -   ret = lock_device_hotplug_sysfs();
  -   if (ret)
  -   return ret;
  +   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
  +   if (WARN_ON_ONCE(!kn))
  +   return -ENODEV;
  +
  +   /* refer to comments in online_store() for more information */
  +   get_device(dev);
  +   kernfs_break_active_protection(kn);
  +
  +   lock_device_hotplug();
   
  if (!strncmp(buf, online_kernel, min_t(int, count, 13)))
  online_type = ONLINE_KERNEL;
  @@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
   err:
  unlock_device_hotplug();
   
  +   kernfs_unbreak_active_protection(kn);
  +   put_device(dev);
  +
  +   kernfs_put(kn);
 
 There are other users of lock_device_hotplug_sysfs().  We probably
 want to audit them and convert them too, preferably with helper
 routines so that they don't end up duplicating the complexity?

I see, I guess I could keep lock_device_hotplug_sysfs(), just replace it
with the implementation here; and provide a new
unlock_device_hotplug_sysfs(), which will do the unlock, unbreak, and
puts. 

I'll try to get the code ready sometime next week for your review.

Thanks, Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks

2014-04-17 Thread Li Zhong
This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism 
to break active protection. 

The problem here is the order of s_active, and series of hotplug related
lock. 

This patch takes s_active out of the lock dependency graph, so it won't
dead lock with any hotplug realted locks.

Signed-off-by: Li Zhong 
---
 drivers/base/core.c   | 37 ++---
 drivers/base/memory.c | 18 +++---
 2 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..1b96cb9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,17 +439,48 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, );
if (ret < 0)
return ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+   if (WARN_ON_ONCE(!kn))
+   return -ENODEV;
+
+   /*
+* Break active protection here to avoid deadlocks with device
+* removing process, which tries to remove sysfs entries including this
+* "online" attribute while holding some hotplug related locks.
+*
+* @dev needs to be protected here, or it could go away any time after
+* dropping active protection. But it is still unreasonable/unsafe to
+* online/offline a device after it being removed. Fortunately, there
+* are some checks in online/offline knobs. Like cpu, it checks cpu
+* present/online mask before doing the real work.
+*/
+
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
+
+   /*
+* If we assume device_hotplug_lock must be acquired before removing
+* device, we may try to find a way to check whether the device has
+* been removed here, so we don't call device_{on|off}line against
+* removed device.
+*/
 
ret = val ? device_online(dev) : device_offline(dev);
unlock_device_hotplug();
+
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+
+   kernfs_put(kn);
+
return ret < 0 ? ret : count;
 }
 static DEVICE_ATTR_RW(online);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..0d2f3a5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
 {
struct memory_block *mem = to_memory_block(dev);
int ret, online_type;
+   struct kernfs_node *kn;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+   if (WARN_ON_ONCE(!kn))
+   return -ENODEV;
+
+   /* refer to comments in online_store() for more information */
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
 
if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
online_type = ONLINE_KERNEL;
@@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
 err:
unlock_device_hotplug();
 
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+
+   kernfs_put(kn);
+
if (ret)
return ret;
return count;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks

2014-04-17 Thread Li Zhong
This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism 
to break active protection. 

The problem here is the order of s_active, and series of hotplug related
lock. 

This patch takes s_active out of the lock dependency graph, so it won't
dead lock with any hotplug realted locks.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/core.c   | 37 ++---
 drivers/base/memory.c | 18 +++---
 2 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..1b96cb9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,17 +439,48 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, val);
if (ret  0)
return ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
+   if (WARN_ON_ONCE(!kn))
+   return -ENODEV;
+
+   /*
+* Break active protection here to avoid deadlocks with device
+* removing process, which tries to remove sysfs entries including this
+* online attribute while holding some hotplug related locks.
+*
+* @dev needs to be protected here, or it could go away any time after
+* dropping active protection. But it is still unreasonable/unsafe to
+* online/offline a device after it being removed. Fortunately, there
+* are some checks in online/offline knobs. Like cpu, it checks cpu
+* present/online mask before doing the real work.
+*/
+
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
+
+   /*
+* If we assume device_hotplug_lock must be acquired before removing
+* device, we may try to find a way to check whether the device has
+* been removed here, so we don't call device_{on|off}line against
+* removed device.
+*/
 
ret = val ? device_online(dev) : device_offline(dev);
unlock_device_hotplug();
+
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+
+   kernfs_put(kn);
+
return ret  0 ? ret : count;
 }
 static DEVICE_ATTR_RW(online);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..0d2f3a5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
 {
struct memory_block *mem = to_memory_block(dev);
int ret, online_type;
+   struct kernfs_node *kn;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
+   if (WARN_ON_ONCE(!kn))
+   return -ENODEV;
+
+   /* refer to comments in online_store() for more information */
+   get_device(dev);
+   kernfs_break_active_protection(kn);
+
+   lock_device_hotplug();
 
if (!strncmp(buf, online_kernel, min_t(int, count, 13)))
online_type = ONLINE_KERNEL;
@@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
 err:
unlock_device_hotplug();
 
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+
+   kernfs_put(kn);
+
if (ret)
return ret;
return count;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-16 Thread Li Zhong
On Wed, 2014-04-16 at 11:17 -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 16, 2014 at 09:41:40AM +0800, Li Zhong wrote:
> > > If so, that is
> > > an actually possible deadlock, no?
> > 
> > Yes, but it seems to me that it is solved in commit 5e33bc41, which uses
> > lock_device_hotplug_sysfs() to return a restart syscall error if not
> > able to try lock the device_hotplug_lock. That also requires the device
> > removing code path to take the device_hotplug_lock. 
> 
> But that patch only takes out device_hotplug_lock out of the
> dependency graph and does nothing for cpu_add_remove_lock.  It seems
> to be that there still is a deadlock condition involving s_active and
> cpu_add_remove_lock.  Am I missing something here?

It seems to me cpu_add_remove_lock is always taken after
device_hotplug_lock.

So if cpu_add_remove_lock has been acquired by device removing process,
then it means the other online/offline process couldn't successfully try
lock device_hotplug_lock, and will release s_active with a restart
syscall error;

if cpu_add_remove_lock has been acquired by online/offline process, then
it should already hold device_hotlug_lock, and keeps the device removing
process waiting at device_hotplug_lock. So online/offline process could
release the lock, and finally release s_active soon. 

But after some further thinking, I seem to understand your point.
s_active has lock order problem with the other series of hotplug related
locks, so it's better to take s_active out of the dependency chain,
rather than the first of the other series of locks? like you suggested
below.

> 
> Now that kernfs has a proper mechanism to deal with it, wouldn't it
> make more sense to replace 5e33bc41 with prper s_active protection
> breaking?

I'll try this way and send you the code for review.

Thanks,
Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-16 Thread Li Zhong
On Wed, 2014-04-16 at 11:17 -0400, Tejun Heo wrote:
 Hello,
 
 On Wed, Apr 16, 2014 at 09:41:40AM +0800, Li Zhong wrote:
   If so, that is
   an actually possible deadlock, no?
  
  Yes, but it seems to me that it is solved in commit 5e33bc41, which uses
  lock_device_hotplug_sysfs() to return a restart syscall error if not
  able to try lock the device_hotplug_lock. That also requires the device
  removing code path to take the device_hotplug_lock. 
 
 But that patch only takes out device_hotplug_lock out of the
 dependency graph and does nothing for cpu_add_remove_lock.  It seems
 to be that there still is a deadlock condition involving s_active and
 cpu_add_remove_lock.  Am I missing something here?

It seems to me cpu_add_remove_lock is always taken after
device_hotplug_lock.

So if cpu_add_remove_lock has been acquired by device removing process,
then it means the other online/offline process couldn't successfully try
lock device_hotplug_lock, and will release s_active with a restart
syscall error;

if cpu_add_remove_lock has been acquired by online/offline process, then
it should already hold device_hotlug_lock, and keeps the device removing
process waiting at device_hotplug_lock. So online/offline process could
release the lock, and finally release s_active soon. 

But after some further thinking, I seem to understand your point.
s_active has lock order problem with the other series of hotplug related
locks, so it's better to take s_active out of the dependency chain,
rather than the first of the other series of locks? like you suggested
below.

 
 Now that kernfs has a proper mechanism to deal with it, wouldn't it
 make more sense to replace 5e33bc41 with prper s_active protection
 breaking?

I'll try this way and send you the code for review.

Thanks,
Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v3] memory-hotplug: Update documentation to hide information about SECTIONS and remove end_phys_index

2014-04-15 Thread Li Zhong
Seems we all agree that information about SECTION, e.g. section size,
sections per memory block should be kept as kernel internals, and not
exposed to userspace.

This patch updates Documentation/memory-hotplug.txt to refer to memory
blocks instead of memory sections where appropriate and added a
paragraph to explain that memory blocks are made of memory sections.
The documentation update is mostly provided by Nathan.

Also, as end_phys_index in code is actually not the end section id, but
the end memory block id, which should always be the same as phys_index.
So it is removed here.

Signed-off-by: Li Zhong 
Reviewed-by: Zhang Yanfei 
---
v3: memort->memory, pointed out by Yanfei

 Documentation/memory-hotplug.txt |  125 +++---
 drivers/base/memory.c|   12 
 2 files changed, 61 insertions(+), 76 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 58340d5..1aa239f 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -88,16 +88,21 @@ phase by hand.
 
 1.3. Unit of Memory online/offline operation
 
-Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole memory
-into chunks of the same size. The chunk is called a "section". The size of
-a section is architecture dependent. For example, power uses 16MiB, ia64 uses
-1GiB. The unit of online/offline operation is "one section". (see Section 3.)
+Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
+into chunks of the same size. These chunks are called "sections". The size of
+a memory section is architecture dependent. For example, power uses 16MiB, ia64
+uses 1GiB.
 
-To determine the size of sections, please read this file:
+Memory sections are combined into chunks referred to as "memory blocks". The
+size of a memory block is architecture dependent and represents the logical
+unit upon which memory online/offline operations are to be performed. The
+default size of a memory block is the same as memory section size unless an
+architecture specifies otherwise. (see Section 3.)
+
+To determine the size (in bytes) of a memory block please read this file:
 
 /sys/devices/system/memory/block_size_bytes
 
-This file shows the size of sections in byte.
 
 ---
 2. Kernel Configuration
@@ -123,42 +128,35 @@ config options.
 (CONFIG_ACPI_CONTAINER).
 This option can be kernel module too.
 
+
 
-4 sysfs files for memory hotplug
+3 sysfs files for memory hotplug
 
-All sections have their device information in sysfs.  Each section is part of
-a memory block under /sys/devices/system/memory as
+All memory blocks have their device information in sysfs.  Each memory block
+is described under /sys/devices/system/memory as
 
 /sys/devices/system/memory/memoryXXX
-(XXX is the section id.)
+(XXX is the memory block id.)
 
-Now, XXX is defined as (start_address_of_section / section_size) of the first
-section contained in the memory block.  The files 'phys_index' and
-'end_phys_index' under each directory report the beginning and end section id's
-for the memory block covered by the sysfs directory.  It is expected that all
+For the memory block covered by the sysfs directory.  It is expected that all
 memory sections in this range are present and no memory holes exist in the
 range. Currently there is no way to determine if there is a memory hole, but
 the existence of one should not affect the hotplug capabilities of the memory
 block.
 
-For example, assume 1GiB section size. A device for a memory starting at
+For example, assume 1GiB memory block size. A device for a memory starting at
 0x1 is /sys/device/system/memory/memory4
 (0x1 / 1Gib = 4)
 This device covers address range [0x1 ... 0x14000)
 
-Under each section, you can see 4 or 5 files, the end_phys_index file being
-a recent addition and not present on older kernels.
+Under each memory block, you can see 4 files:
 
-/sys/devices/system/memory/memoryXXX/start_phys_index
-/sys/devices/system/memory/memoryXXX/end_phys_index
+/sys/devices/system/memory/memoryXXX/phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index'  : read-only and contains section id of the first section
-   in the memory block, same as XXX.
-'end_phys_index'  : read-only and contains section id of the last section
-   in the memory block.
+'phys_index'  : read-only and contains memory block id, same as XXX.
 'state'   : read-write
 at read:  contains online/offline state of memory.
 at write: user can specify "online_kernel",
@@ -185,6 +183,7 @@ For example:
 A backlink will also be created:
 /sys/de

Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-15 Thread Li Zhong
On Tue, 2014-04-15 at 10:50 -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Apr 15, 2014 at 10:44:37AM +0800, Li Zhong wrote:
> > / *
> >   * This process might deadlock with another process trying to 
> >   * remove this device:
> >   * This process holding the s_active of "online" attribute, and tries 
> >   * to online/offline the device with some locks protecting hotplug.
> >   * Device removing process holding some locks protecting hotplug, and 
> >   * tries to remove the "online" attribute, waiting for the s_active to
> >   * be released. 
> >   *
> >   * The deadlock described above should be solved with
> >   * lock_device_hotplug_sysfs(). We temporarily drop the active 
> >   * protection here to avoid some lockdep warnings. 
> >   *
> >   * If device_hotplug_lock is forgotten to be used when removing
> >   * device(possibly some very simple device even don't need this lock?),
> >   * @dev could go away any time after dropping the active protection. 
> >   * So increase its ref count before dropping active protection. 
> >   * Though invoking device_{on|off}line() on a removed device seems
> >   * unreasonable, it should be less disastrous than playing with freed
> >   * @dev. Also, we might be able to have some mechanism abort 
> >   * device_{on|off}line() if @dev already removed.
> >   */
> 
> Hmmm... I'm not sure I fully understand the problem.  Does the code
> ever try to remove "online" while holding cpu_add_remove_lock and,
> when written 0, online knob grabs cpu_add_remove_lock?  

Yes.

In acpi_processor_remove(), cpu_maps_update_begin() is called to hold
cpu_add_remove_lock, and then arch_unregister_cpu calls
unregister_cpu(), which will try to remove dir cpu1 including "online".

while written 0 to online, cpu_down() will also try to grab
cpu_add_remove_lock with cpu_maps_update_begin().

> If so, that is
> an actually possible deadlock, no?

Yes, but it seems to me that it is solved in commit 5e33bc41, which uses
lock_device_hotplug_sysfs() to return a restart syscall error if not
able to try lock the device_hotplug_lock. That also requires the device
removing code path to take the device_hotplug_lock. 

Thanks, Zhong

> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-15 Thread Li Zhong
On Tue, 2014-04-15 at 10:50 -0400, Tejun Heo wrote:
 Hello,
 
 On Tue, Apr 15, 2014 at 10:44:37AM +0800, Li Zhong wrote:
  / *
* This process might deadlock with another process trying to 
* remove this device:
* This process holding the s_active of online attribute, and tries 
* to online/offline the device with some locks protecting hotplug.
* Device removing process holding some locks protecting hotplug, and 
* tries to remove the online attribute, waiting for the s_active to
* be released. 
*
* The deadlock described above should be solved with
* lock_device_hotplug_sysfs(). We temporarily drop the active 
* protection here to avoid some lockdep warnings. 
*
* If device_hotplug_lock is forgotten to be used when removing
* device(possibly some very simple device even don't need this lock?),
* @dev could go away any time after dropping the active protection. 
* So increase its ref count before dropping active protection. 
* Though invoking device_{on|off}line() on a removed device seems
* unreasonable, it should be less disastrous than playing with freed
* @dev. Also, we might be able to have some mechanism abort 
* device_{on|off}line() if @dev already removed.
*/
 
 Hmmm... I'm not sure I fully understand the problem.  Does the code
 ever try to remove online while holding cpu_add_remove_lock and,
 when written 0, online knob grabs cpu_add_remove_lock?  

Yes.

In acpi_processor_remove(), cpu_maps_update_begin() is called to hold
cpu_add_remove_lock, and then arch_unregister_cpu calls
unregister_cpu(), which will try to remove dir cpu1 including online.

while written 0 to online, cpu_down() will also try to grab
cpu_add_remove_lock with cpu_maps_update_begin().

 If so, that is
 an actually possible deadlock, no?

Yes, but it seems to me that it is solved in commit 5e33bc41, which uses
lock_device_hotplug_sysfs() to return a restart syscall error if not
able to try lock the device_hotplug_lock. That also requires the device
removing code path to take the device_hotplug_lock. 

Thanks, Zhong

 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v3] memory-hotplug: Update documentation to hide information about SECTIONS and remove end_phys_index

2014-04-15 Thread Li Zhong
Seems we all agree that information about SECTION, e.g. section size,
sections per memory block should be kept as kernel internals, and not
exposed to userspace.

This patch updates Documentation/memory-hotplug.txt to refer to memory
blocks instead of memory sections where appropriate and added a
paragraph to explain that memory blocks are made of memory sections.
The documentation update is mostly provided by Nathan.

Also, as end_phys_index in code is actually not the end section id, but
the end memory block id, which should always be the same as phys_index.
So it is removed here.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Reviewed-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
---
v3: memort-memory, pointed out by Yanfei

 Documentation/memory-hotplug.txt |  125 +++---
 drivers/base/memory.c|   12 
 2 files changed, 61 insertions(+), 76 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 58340d5..1aa239f 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -88,16 +88,21 @@ phase by hand.
 
 1.3. Unit of Memory online/offline operation
 
-Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole memory
-into chunks of the same size. The chunk is called a section. The size of
-a section is architecture dependent. For example, power uses 16MiB, ia64 uses
-1GiB. The unit of online/offline operation is one section. (see Section 3.)
+Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
+into chunks of the same size. These chunks are called sections. The size of
+a memory section is architecture dependent. For example, power uses 16MiB, ia64
+uses 1GiB.
 
-To determine the size of sections, please read this file:
+Memory sections are combined into chunks referred to as memory blocks. The
+size of a memory block is architecture dependent and represents the logical
+unit upon which memory online/offline operations are to be performed. The
+default size of a memory block is the same as memory section size unless an
+architecture specifies otherwise. (see Section 3.)
+
+To determine the size (in bytes) of a memory block please read this file:
 
 /sys/devices/system/memory/block_size_bytes
 
-This file shows the size of sections in byte.
 
 ---
 2. Kernel Configuration
@@ -123,42 +128,35 @@ config options.
 (CONFIG_ACPI_CONTAINER).
 This option can be kernel module too.
 
+
 
-4 sysfs files for memory hotplug
+3 sysfs files for memory hotplug
 
-All sections have their device information in sysfs.  Each section is part of
-a memory block under /sys/devices/system/memory as
+All memory blocks have their device information in sysfs.  Each memory block
+is described under /sys/devices/system/memory as
 
 /sys/devices/system/memory/memoryXXX
-(XXX is the section id.)
+(XXX is the memory block id.)
 
-Now, XXX is defined as (start_address_of_section / section_size) of the first
-section contained in the memory block.  The files 'phys_index' and
-'end_phys_index' under each directory report the beginning and end section id's
-for the memory block covered by the sysfs directory.  It is expected that all
+For the memory block covered by the sysfs directory.  It is expected that all
 memory sections in this range are present and no memory holes exist in the
 range. Currently there is no way to determine if there is a memory hole, but
 the existence of one should not affect the hotplug capabilities of the memory
 block.
 
-For example, assume 1GiB section size. A device for a memory starting at
+For example, assume 1GiB memory block size. A device for a memory starting at
 0x1 is /sys/device/system/memory/memory4
 (0x1 / 1Gib = 4)
 This device covers address range [0x1 ... 0x14000)
 
-Under each section, you can see 4 or 5 files, the end_phys_index file being
-a recent addition and not present on older kernels.
+Under each memory block, you can see 4 files:
 
-/sys/devices/system/memory/memoryXXX/start_phys_index
-/sys/devices/system/memory/memoryXXX/end_phys_index
+/sys/devices/system/memory/memoryXXX/phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index'  : read-only and contains section id of the first section
-   in the memory block, same as XXX.
-'end_phys_index'  : read-only and contains section id of the last section
-   in the memory block.
+'phys_index'  : read-only and contains memory block id, same as XXX.
 'state'   : read-write
 at read:  contains online/offline state of memory.
 at write: user can specify online_kernel,
@@ -185,6 +183,7 @@ For example:
 A backlink will also be created:
 /sys/devices/system

Re: [RFC PATCH v2] memory-hotplug: Update documentation to hide information about SECTIONS and remove end_phys_index

2014-04-14 Thread Li Zhong
On Mon, 2014-04-14 at 17:13 +0800, Zhang Yanfei wrote:
> On 04/14/2014 04:43 PM, Li Zhong wrote:
> > Seems we all agree that information about SECTION, e.g. section size,
> > sections per memory block should be kept as kernel internals, and not
> > exposed to userspace.
> > 
> > This patch updates Documentation/memory-hotplug.txt to refer to memory
> > blocks instead of memory sections where appropriate and added a
> > paragraph to explain that memory blocks are made of memory sections.
> > The documentation update is mostly provided by Nathan.
> > 
> > Also, as end_phys_index in code is actually not the end section id, but
> > the end memory block id, which should always be the same as phys_index.
> > So it is removed here.
> > 
> > Signed-off-by: Li Zhong 
> 
> Reviewed-by: Zhang Yanfei 
> 
> Still the nitpick there.

Ao.. Will fix it in next version.

Thanks, Zhong

> 
> > ---
> >  Documentation/memory-hotplug.txt |  125 
> > +++---
> >  drivers/base/memory.c|   12 
> >  2 files changed, 61 insertions(+), 76 deletions(-)
> > 
> > diff --git a/Documentation/memory-hotplug.txt 
> > b/Documentation/memory-hotplug.txt
> > index 58340d5..1aa239f 100644
> > --- a/Documentation/memory-hotplug.txt
> > +++ b/Documentation/memory-hotplug.txt
> > @@ -88,16 +88,21 @@ phase by hand.
> >  
> >  1.3. Unit of Memory online/offline operation
> >  
> > -Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole 
> > memory
> > -into chunks of the same size. The chunk is called a "section". The size of
> > -a section is architecture dependent. For example, power uses 16MiB, ia64 
> > uses
> > -1GiB. The unit of online/offline operation is "one section". (see Section 
> > 3.)
> > +Memory hotplug uses SPARSEMEM memory model which allows memory to be 
> > divided
> > +into chunks of the same size. These chunks are called "sections". The size 
> > of
> > +a memory section is architecture dependent. For example, power uses 16MiB, 
> > ia64
> > +uses 1GiB.
> >  
> > -To determine the size of sections, please read this file:
> > +Memory sections are combined into chunks referred to as "memory blocks". 
> > The
> > +size of a memory block is architecture dependent and represents the logical
> > +unit upon which memory online/offline operations are to be performed. The
> > +default size of a memory block is the same as memory section size unless an
> > +architecture specifies otherwise. (see Section 3.)
> > +
> > +To determine the size (in bytes) of a memory block please read this file:
> >  
> >  /sys/devices/system/memory/block_size_bytes
> >  
> > -This file shows the size of sections in byte.
> >  
> >  ---
> >  2. Kernel Configuration
> > @@ -123,42 +128,35 @@ config options.
> >  (CONFIG_ACPI_CONTAINER).
> >  This option can be kernel module too.
> >  
> > +
> >  
> > -4 sysfs files for memory hotplug
> > +3 sysfs files for memory hotplug
> >  
> > -All sections have their device information in sysfs.  Each section is part 
> > of
> > -a memory block under /sys/devices/system/memory as
> > +All memory blocks have their device information in sysfs.  Each memory 
> > block
> > +is described under /sys/devices/system/memory as
> >  
> >  /sys/devices/system/memory/memoryXXX
> > -(XXX is the section id.)
> > +(XXX is the memory block id.)
> >  
> > -Now, XXX is defined as (start_address_of_section / section_size) of the 
> > first
> > -section contained in the memory block.  The files 'phys_index' and
> > -'end_phys_index' under each directory report the beginning and end section 
> > id's
> > -for the memory block covered by the sysfs directory.  It is expected that 
> > all
> > +For the memory block covered by the sysfs directory.  It is expected that 
> > all
> >  memory sections in this range are present and no memory holes exist in the
> >  range. Currently there is no way to determine if there is a memory hole, 
> > but
> >  the existence of one should not affect the hotplug capabilities of the 
> > memory
> >  block.
> >  
> > -For example, assume 1GiB section size. A device for a memory starting at
> > +For example, assume 1GiB memory block size. A device for a memory starting 
> > at
> >  0x1 

Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-14 Thread Li Zhong
On Mon, 2014-04-14 at 16:13 -0400, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 14, 2014 at 03:47:29PM +0800, Li Zhong wrote:
> > @@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct 
> > device_attribute *attr,
> >  {
> > bool val;
> > int ret;
> > +   struct kernfs_node *kn;
> >  
> > ret = strtobool(buf, );
> > if (ret < 0)
> > @@ -448,7 +449,19 @@ static ssize_t online_store(struct device *dev, struct 
> > device_attribute *attr,
> > if (ret)
> > return ret;
> >  
> > +   kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > +   if (WARN_ON_ONCE(!kn))
> > +   goto out;
> > +
> > +   get_device(dev);
> > +   kernfs_break_active_protection(kn);
> > ret = val ? device_online(dev) : device_offline(dev);
> > +   kernfs_unbreak_active_protection(kn);
> > +   put_device(dev);
> > +
> > +   kernfs_put(kn);
> > +
> > +out:
> > unlock_device_hotplug();
> > return ret < 0 ? ret : count;
> >  }
> 
> Can you please add comment explainin why this is being down?  As it
> currently stands, it's quite a bit of complexity without any
> indication what's going on why.  Also, if device_hotplug is locked, is
> it really necessary to get @dev?  Can it go away inbetween?  The code
> snippet looks weird because getting @dev indicates that the device
> might go away without doing it but then it proceeds to invoke
> device_{on|off}line() which probably isn't safe to invoke on a removed
> device.

Hi Tejun, 

I tried to write some draft words here... (I'm not good at writing
it...) Could you please help to have a review and comment? thanks.

/ *
  * This process might deadlock with another process trying to 
  * remove this device:
  * This process holding the s_active of "online" attribute, and tries 
  * to online/offline the device with some locks protecting hotplug.
  * Device removing process holding some locks protecting hotplug, and 
  * tries to remove the "online" attribute, waiting for the s_active to
  * be released. 
  *
  * The deadlock described above should be solved with
  * lock_device_hotplug_sysfs(). We temporarily drop the active 
  * protection here to avoid some lockdep warnings. 
  *
  * If device_hotplug_lock is forgotten to be used when removing
  * device(possibly some very simple device even don't need this lock?),
  * @dev could go away any time after dropping the active protection. 
  * So increase its ref count before dropping active protection. 
  * Though invoking device_{on|off}line() on a removed device seems
  * unreasonable, it should be less disastrous than playing with freed
  * @dev. Also, we might be able to have some mechanism abort 
  * device_{on|off}line() if @dev already removed.
  */

Thanks, Zhong
> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v2] memory-hotplug: Update documentation to hide information about SECTIONS and remove end_phys_index

2014-04-14 Thread Li Zhong
Seems we all agree that information about SECTION, e.g. section size,
sections per memory block should be kept as kernel internals, and not
exposed to userspace.

This patch updates Documentation/memory-hotplug.txt to refer to memory
blocks instead of memory sections where appropriate and added a
paragraph to explain that memory blocks are made of memory sections.
The documentation update is mostly provided by Nathan.

Also, as end_phys_index in code is actually not the end section id, but
the end memory block id, which should always be the same as phys_index.
So it is removed here.

Signed-off-by: Li Zhong 
---
 Documentation/memory-hotplug.txt |  125 +++---
 drivers/base/memory.c|   12 
 2 files changed, 61 insertions(+), 76 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 58340d5..1aa239f 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -88,16 +88,21 @@ phase by hand.
 
 1.3. Unit of Memory online/offline operation
 
-Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole memory
-into chunks of the same size. The chunk is called a "section". The size of
-a section is architecture dependent. For example, power uses 16MiB, ia64 uses
-1GiB. The unit of online/offline operation is "one section". (see Section 3.)
+Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
+into chunks of the same size. These chunks are called "sections". The size of
+a memory section is architecture dependent. For example, power uses 16MiB, ia64
+uses 1GiB.
 
-To determine the size of sections, please read this file:
+Memory sections are combined into chunks referred to as "memory blocks". The
+size of a memory block is architecture dependent and represents the logical
+unit upon which memory online/offline operations are to be performed. The
+default size of a memory block is the same as memory section size unless an
+architecture specifies otherwise. (see Section 3.)
+
+To determine the size (in bytes) of a memory block please read this file:
 
 /sys/devices/system/memory/block_size_bytes
 
-This file shows the size of sections in byte.
 
 ---
 2. Kernel Configuration
@@ -123,42 +128,35 @@ config options.
 (CONFIG_ACPI_CONTAINER).
 This option can be kernel module too.
 
+
 
-4 sysfs files for memory hotplug
+3 sysfs files for memory hotplug
 
-All sections have their device information in sysfs.  Each section is part of
-a memory block under /sys/devices/system/memory as
+All memory blocks have their device information in sysfs.  Each memory block
+is described under /sys/devices/system/memory as
 
 /sys/devices/system/memory/memoryXXX
-(XXX is the section id.)
+(XXX is the memory block id.)
 
-Now, XXX is defined as (start_address_of_section / section_size) of the first
-section contained in the memory block.  The files 'phys_index' and
-'end_phys_index' under each directory report the beginning and end section id's
-for the memory block covered by the sysfs directory.  It is expected that all
+For the memory block covered by the sysfs directory.  It is expected that all
 memory sections in this range are present and no memory holes exist in the
 range. Currently there is no way to determine if there is a memory hole, but
 the existence of one should not affect the hotplug capabilities of the memory
 block.
 
-For example, assume 1GiB section size. A device for a memory starting at
+For example, assume 1GiB memory block size. A device for a memory starting at
 0x1 is /sys/device/system/memory/memory4
 (0x1 / 1Gib = 4)
 This device covers address range [0x1 ... 0x14000)
 
-Under each section, you can see 4 or 5 files, the end_phys_index file being
-a recent addition and not present on older kernels.
+Under each memory block, you can see 4 files:
 
-/sys/devices/system/memory/memoryXXX/start_phys_index
-/sys/devices/system/memory/memoryXXX/end_phys_index
+/sys/devices/system/memory/memoryXXX/phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index'  : read-only and contains section id of the first section
-   in the memory block, same as XXX.
-'end_phys_index'  : read-only and contains section id of the last section
-   in the memory block.
+'phys_index'  : read-only and contains memory block id, same as XXX.
 'state'   : read-write
 at read:  contains online/offline state of memory.
 at write: user can specify "online_kernel",
@@ -185,6 +183,7 @@ For example:
 A backlink will also be created:
 /sys/devices/system/memory/memory9/node0 -> ../../node/node0
 
+
 -

[RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-14 Thread Li Zhong
I noticed following lockdep warning when trying acpi hot-remove cpus:

[84154.204080] ==
[84154.204080] [ INFO: possible circular locking dependency detected ]
[84154.204080] 3.14.0-next-20140408+ #24 Tainted: GW
[84154.204080] ---
[84154.204080] bash/777 is trying to acquire lock:
[84154.204080]  (cpu_add_remove_lock){+.+.+.}, at: []
cpu_maps_update_begin+0x17/0x20
[84154.213203] 
[84154.213203] but task is already holding lock:
[84154.213203]  (s_active#79){.+}, at: []
kernfs_fop_write+0xe4/0x190
[84154.213203] 
[84154.213203] which lock already depends on the new lock.
[84154.213203] 
[84154.213203] 
[84154.213203] the existing dependency chain (in reverse order) is:
[84154.213203] 
-> #3 (s_active#79){.+}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] __kernfs_remove+0x250/0x310
[84154.213203][] kernfs_remove_by_name_ns
+0x50/0xc0
[84154.213203][] sysfs_remove_file_ns
+0x15/0x20
[84154.213203][] device_remove_file+0x19/0x20
[84154.213203][] device_remove_attrs+0x33/0x80
[84154.213203][] device_del+0x127/0x1c0
[84154.213203][] device_unregister+0x22/0x60
[84154.213203][] unregister_cpu+0x1e/0x40
[84154.213203][] arch_unregister_cpu+0x23/0x30
[84154.213203][] acpi_processor_remove
+0x8d/0xb2
[84154.213203][] acpi_bus_trim+0x5a/0x8d
[84154.213203][] acpi_device_hotplug
+0x1a8/0x3ec
[84154.213203][] acpi_hotplug_work_fn
+0x1f/0x2b
[84154.213203][] process_one_work+0x1eb/0x6b0
[84154.213203][] worker_thread+0x11b/0x370
[84154.213203][] kthread+0xe4/0x100
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #2 (cpu_hotplug.lock#2){+.+.+.}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] mutex_lock_nested+0x50/0x3c0
[84154.213203][] cpu_hotplug_begin+0x4f/0x80
[84154.213203][] _cpu_up+0x3f/0x160
[84154.213203][] cpu_up+0x69/0x80
[84154.213203][] smp_init+0x60/0x8c
[84154.213203][] kernel_init_freeable
+0x126/0x23b
[84154.213203][] kernel_init+0xe/0xf0
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #1 (cpu_hotplug.lock){++}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] cpu_hotplug_begin+0x41/0x80
[84154.213203][] _cpu_up+0x3f/0x160
[84154.213203][] cpu_up+0x69/0x80
[84154.213203][] smp_init+0x60/0x8c
[84154.213203][] kernel_init_freeable
+0x126/0x23b
[84154.213203][] kernel_init+0xe/0xf0
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #0 (cpu_add_remove_lock){+.+.+.}:
[84154.213203][] __lock_acquire+0x1f2a/0x1f60
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] mutex_lock_nested+0x50/0x3c0
[84154.213203][] cpu_maps_update_begin
+0x17/0x20
[84154.213203][] cpu_down+0x1d/0x50
[84154.213203][] cpu_subsys_offline+0x14/0x20
[84154.213203][] device_offline+0xad/0xd0
[84154.213203][] online_store+0x42/0x80
[84154.213203][] dev_attr_store+0x18/0x30
[84154.213203][] sysfs_kf_write+0x49/0x60
[84154.213203][] kernfs_fop_write+0x109/0x190
[84154.213203][] vfs_write+0xbe/0x1c0
[84154.213203][] SyS_write+0x52/0xb0
[84154.213203][] tracesys+0xd0/0xd5
[84154.213203] 
[84154.213203] other info that might help us debug this:
[84154.213203] 
[84154.213203] Chain exists of:
  cpu_add_remove_lock --> cpu_hotplug.lock#2 --> s_active#79

[84154.213203]  Possible unsafe locking scenario:
[84154.213203] 
[84154.213203]CPU0CPU1
[84154.213203]
[84154.213203]   lock(s_active#79);
[84154.213203]lock(cpu_hotplug.lock#2);
[84154.213203]lock(s_active#79);
[84154.213203]   lock(cpu_add_remove_lock);
[84154.213203] 
[84154.213203]  *** DEADLOCK ***
.

The deadlock itself seems already fixed in commit 5e33bc41. 

As Tejun suggested, to avoid this lockdep warning,
kernfs_break_active_protection() is used before online/offline callbacks
to take the s_active lock out of the dependency chain during
online/offline operations. 

Signed-off-by: Li Zhong 
---
 drivers/base/core.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..b313ad7 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, );
if (ret < 0)
@@ -448,7 +449,19 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
if (ret)
 

Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-14 Thread Li Zhong
On Mon, 2014-04-14 at 16:13 -0400, Tejun Heo wrote:
 Hello,
 
 On Mon, Apr 14, 2014 at 03:47:29PM +0800, Li Zhong wrote:
  @@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct 
  device_attribute *attr,
   {
  bool val;
  int ret;
  +   struct kernfs_node *kn;
   
  ret = strtobool(buf, val);
  if (ret  0)
  @@ -448,7 +449,19 @@ static ssize_t online_store(struct device *dev, struct 
  device_attribute *attr,
  if (ret)
  return ret;
   
  +   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
  +   if (WARN_ON_ONCE(!kn))
  +   goto out;
  +
  +   get_device(dev);
  +   kernfs_break_active_protection(kn);
  ret = val ? device_online(dev) : device_offline(dev);
  +   kernfs_unbreak_active_protection(kn);
  +   put_device(dev);
  +
  +   kernfs_put(kn);
  +
  +out:
  unlock_device_hotplug();
  return ret  0 ? ret : count;
   }
 
 Can you please add comment explainin why this is being down?  As it
 currently stands, it's quite a bit of complexity without any
 indication what's going on why.  Also, if device_hotplug is locked, is
 it really necessary to get @dev?  Can it go away inbetween?  The code
 snippet looks weird because getting @dev indicates that the device
 might go away without doing it but then it proceeds to invoke
 device_{on|off}line() which probably isn't safe to invoke on a removed
 device.

Hi Tejun, 

I tried to write some draft words here... (I'm not good at writing
it...) Could you please help to have a review and comment? thanks.

/ *
  * This process might deadlock with another process trying to 
  * remove this device:
  * This process holding the s_active of online attribute, and tries 
  * to online/offline the device with some locks protecting hotplug.
  * Device removing process holding some locks protecting hotplug, and 
  * tries to remove the online attribute, waiting for the s_active to
  * be released. 
  *
  * The deadlock described above should be solved with
  * lock_device_hotplug_sysfs(). We temporarily drop the active 
  * protection here to avoid some lockdep warnings. 
  *
  * If device_hotplug_lock is forgotten to be used when removing
  * device(possibly some very simple device even don't need this lock?),
  * @dev could go away any time after dropping the active protection. 
  * So increase its ref count before dropping active protection. 
  * Though invoking device_{on|off}line() on a removed device seems
  * unreasonable, it should be less disastrous than playing with freed
  * @dev. Also, we might be able to have some mechanism abort 
  * device_{on|off}line() if @dev already removed.
  */

Thanks, Zhong
 
 Thanks.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-hotplug: Update documentation to hide information about SECTIONS and remove end_phys_index

2014-04-14 Thread Li Zhong
On Mon, 2014-04-14 at 17:13 +0800, Zhang Yanfei wrote:
 On 04/14/2014 04:43 PM, Li Zhong wrote:
  Seems we all agree that information about SECTION, e.g. section size,
  sections per memory block should be kept as kernel internals, and not
  exposed to userspace.
  
  This patch updates Documentation/memory-hotplug.txt to refer to memory
  blocks instead of memory sections where appropriate and added a
  paragraph to explain that memory blocks are made of memory sections.
  The documentation update is mostly provided by Nathan.
  
  Also, as end_phys_index in code is actually not the end section id, but
  the end memory block id, which should always be the same as phys_index.
  So it is removed here.
  
  Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
 
 Reviewed-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
 
 Still the nitpick there.

Ao.. Will fix it in next version.

Thanks, Zhong

 
  ---
   Documentation/memory-hotplug.txt |  125 
  +++---
   drivers/base/memory.c|   12 
   2 files changed, 61 insertions(+), 76 deletions(-)
  
  diff --git a/Documentation/memory-hotplug.txt 
  b/Documentation/memory-hotplug.txt
  index 58340d5..1aa239f 100644
  --- a/Documentation/memory-hotplug.txt
  +++ b/Documentation/memory-hotplug.txt
  @@ -88,16 +88,21 @@ phase by hand.
   
   1.3. Unit of Memory online/offline operation
   
  -Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole 
  memory
  -into chunks of the same size. The chunk is called a section. The size of
  -a section is architecture dependent. For example, power uses 16MiB, ia64 
  uses
  -1GiB. The unit of online/offline operation is one section. (see Section 
  3.)
  +Memory hotplug uses SPARSEMEM memory model which allows memory to be 
  divided
  +into chunks of the same size. These chunks are called sections. The size 
  of
  +a memory section is architecture dependent. For example, power uses 16MiB, 
  ia64
  +uses 1GiB.
   
  -To determine the size of sections, please read this file:
  +Memory sections are combined into chunks referred to as memory blocks. 
  The
  +size of a memory block is architecture dependent and represents the logical
  +unit upon which memory online/offline operations are to be performed. The
  +default size of a memory block is the same as memory section size unless an
  +architecture specifies otherwise. (see Section 3.)
  +
  +To determine the size (in bytes) of a memory block please read this file:
   
   /sys/devices/system/memory/block_size_bytes
   
  -This file shows the size of sections in byte.
   
   ---
   2. Kernel Configuration
  @@ -123,42 +128,35 @@ config options.
   (CONFIG_ACPI_CONTAINER).
   This option can be kernel module too.
   
  +
   
  -4 sysfs files for memory hotplug
  +3 sysfs files for memory hotplug
   
  -All sections have their device information in sysfs.  Each section is part 
  of
  -a memory block under /sys/devices/system/memory as
  +All memory blocks have their device information in sysfs.  Each memory 
  block
  +is described under /sys/devices/system/memory as
   
   /sys/devices/system/memory/memoryXXX
  -(XXX is the section id.)
  +(XXX is the memory block id.)
   
  -Now, XXX is defined as (start_address_of_section / section_size) of the 
  first
  -section contained in the memory block.  The files 'phys_index' and
  -'end_phys_index' under each directory report the beginning and end section 
  id's
  -for the memory block covered by the sysfs directory.  It is expected that 
  all
  +For the memory block covered by the sysfs directory.  It is expected that 
  all
   memory sections in this range are present and no memory holes exist in the
   range. Currently there is no way to determine if there is a memory hole, 
  but
   the existence of one should not affect the hotplug capabilities of the 
  memory
   block.
   
  -For example, assume 1GiB section size. A device for a memory starting at
  +For example, assume 1GiB memory block size. A device for a memory starting 
  at
   0x1 is /sys/device/system/memory/memory4
   (0x1 / 1Gib = 4)
   This device covers address range [0x1 ... 0x14000)
   
  -Under each section, you can see 4 or 5 files, the end_phys_index file being
  -a recent addition and not present on older kernels.
  +Under each memory block, you can see 4 files:
   
  -/sys/devices/system/memory/memoryXXX/start_phys_index
  -/sys/devices/system/memory/memoryXXX/end_phys_index
  +/sys/devices/system/memory/memoryXXX/phys_index
   /sys/devices/system/memory/memoryXXX/phys_device
   /sys/devices/system/memory/memoryXXX/state
   /sys/devices/system/memory/memoryXXX/removable
   
  -'phys_index'  : read-only and contains section id of the first section
  -   in the memory block, same as XXX.
  -'end_phys_index'  : read-only and contains section id

[RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

2014-04-14 Thread Li Zhong
 suggested, to avoid this lockdep warning,
kernfs_break_active_protection() is used before online/offline callbacks
to take the s_active lock out of the dependency chain during
online/offline operations. 

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/core.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..b313ad7 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, val);
if (ret  0)
@@ -448,7 +449,19 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
if (ret)
return ret;
 
+   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
+   if (WARN_ON_ONCE(!kn))
+   goto out;
+
+   get_device(dev);
+   kernfs_break_active_protection(kn);
ret = val ? device_online(dev) : device_offline(dev);
+   kernfs_unbreak_active_protection(kn);
+   put_device(dev);
+
+   kernfs_put(kn);
+
+out:
unlock_device_hotplug();
return ret  0 ? ret : count;
 }



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v2] memory-hotplug: Update documentation to hide information about SECTIONS and remove end_phys_index

2014-04-14 Thread Li Zhong
Seems we all agree that information about SECTION, e.g. section size,
sections per memory block should be kept as kernel internals, and not
exposed to userspace.

This patch updates Documentation/memory-hotplug.txt to refer to memory
blocks instead of memory sections where appropriate and added a
paragraph to explain that memory blocks are made of memory sections.
The documentation update is mostly provided by Nathan.

Also, as end_phys_index in code is actually not the end section id, but
the end memory block id, which should always be the same as phys_index.
So it is removed here.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 Documentation/memory-hotplug.txt |  125 +++---
 drivers/base/memory.c|   12 
 2 files changed, 61 insertions(+), 76 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 58340d5..1aa239f 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -88,16 +88,21 @@ phase by hand.
 
 1.3. Unit of Memory online/offline operation
 
-Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole memory
-into chunks of the same size. The chunk is called a section. The size of
-a section is architecture dependent. For example, power uses 16MiB, ia64 uses
-1GiB. The unit of online/offline operation is one section. (see Section 3.)
+Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
+into chunks of the same size. These chunks are called sections. The size of
+a memory section is architecture dependent. For example, power uses 16MiB, ia64
+uses 1GiB.
 
-To determine the size of sections, please read this file:
+Memory sections are combined into chunks referred to as memory blocks. The
+size of a memory block is architecture dependent and represents the logical
+unit upon which memory online/offline operations are to be performed. The
+default size of a memory block is the same as memory section size unless an
+architecture specifies otherwise. (see Section 3.)
+
+To determine the size (in bytes) of a memory block please read this file:
 
 /sys/devices/system/memory/block_size_bytes
 
-This file shows the size of sections in byte.
 
 ---
 2. Kernel Configuration
@@ -123,42 +128,35 @@ config options.
 (CONFIG_ACPI_CONTAINER).
 This option can be kernel module too.
 
+
 
-4 sysfs files for memory hotplug
+3 sysfs files for memory hotplug
 
-All sections have their device information in sysfs.  Each section is part of
-a memory block under /sys/devices/system/memory as
+All memory blocks have their device information in sysfs.  Each memory block
+is described under /sys/devices/system/memory as
 
 /sys/devices/system/memory/memoryXXX
-(XXX is the section id.)
+(XXX is the memory block id.)
 
-Now, XXX is defined as (start_address_of_section / section_size) of the first
-section contained in the memory block.  The files 'phys_index' and
-'end_phys_index' under each directory report the beginning and end section id's
-for the memory block covered by the sysfs directory.  It is expected that all
+For the memory block covered by the sysfs directory.  It is expected that all
 memory sections in this range are present and no memory holes exist in the
 range. Currently there is no way to determine if there is a memory hole, but
 the existence of one should not affect the hotplug capabilities of the memory
 block.
 
-For example, assume 1GiB section size. A device for a memory starting at
+For example, assume 1GiB memory block size. A device for a memory starting at
 0x1 is /sys/device/system/memory/memory4
 (0x1 / 1Gib = 4)
 This device covers address range [0x1 ... 0x14000)
 
-Under each section, you can see 4 or 5 files, the end_phys_index file being
-a recent addition and not present on older kernels.
+Under each memory block, you can see 4 files:
 
-/sys/devices/system/memory/memoryXXX/start_phys_index
-/sys/devices/system/memory/memoryXXX/end_phys_index
+/sys/devices/system/memory/memoryXXX/phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index'  : read-only and contains section id of the first section
-   in the memory block, same as XXX.
-'end_phys_index'  : read-only and contains section id of the last section
-   in the memory block.
+'phys_index'  : read-only and contains memory block id, same as XXX.
 'state'   : read-write
 at read:  contains online/offline state of memory.
 at write: user can specify online_kernel,
@@ -185,6 +183,7 @@ For example:
 A backlink will also be created:
 /sys/devices/system/memory/memory9/node0 - ../../node/node0
 
+
 
 4. Physical memory

[RFC PATCH v2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-10 Thread Li Zhong
I noticed following lockdep warning when trying acpi hot-remove cpus:

[84154.204080] ==
[84154.204080] [ INFO: possible circular locking dependency detected ]
[84154.204080] 3.14.0-next-20140408+ #24 Tainted: GW
[84154.204080] ---
[84154.204080] bash/777 is trying to acquire lock:
[84154.204080]  (cpu_add_remove_lock){+.+.+.}, at: []
cpu_maps_update_begin+0x17/0x20
[84154.213203] 
[84154.213203] but task is already holding lock:
[84154.213203]  (s_active#79){.+}, at: []
kernfs_fop_write+0xe4/0x190
[84154.213203] 
[84154.213203] which lock already depends on the new lock.
[84154.213203] 
[84154.213203] 
[84154.213203] the existing dependency chain (in reverse order) is:
[84154.213203] 
-> #3 (s_active#79){.+}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] __kernfs_remove+0x250/0x310
[84154.213203][] kernfs_remove_by_name_ns
+0x50/0xc0
[84154.213203][] sysfs_remove_file_ns
+0x15/0x20
[84154.213203][] device_remove_file+0x19/0x20
[84154.213203][] device_remove_attrs+0x33/0x80
[84154.213203][] device_del+0x127/0x1c0
[84154.213203][] device_unregister+0x22/0x60
[84154.213203][] unregister_cpu+0x1e/0x40
[84154.213203][] arch_unregister_cpu+0x23/0x30
[84154.213203][] acpi_processor_remove
+0x8d/0xb2
[84154.213203][] acpi_bus_trim+0x5a/0x8d
[84154.213203][] acpi_device_hotplug
+0x1a8/0x3ec
[84154.213203][] acpi_hotplug_work_fn
+0x1f/0x2b
[84154.213203][] process_one_work+0x1eb/0x6b0
[84154.213203][] worker_thread+0x11b/0x370
[84154.213203][] kthread+0xe4/0x100
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #2 (cpu_hotplug.lock#2){+.+.+.}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] mutex_lock_nested+0x50/0x3c0
[84154.213203][] cpu_hotplug_begin+0x4f/0x80
[84154.213203][] _cpu_up+0x3f/0x160
[84154.213203][] cpu_up+0x69/0x80
[84154.213203][] smp_init+0x60/0x8c
[84154.213203][] kernel_init_freeable
+0x126/0x23b
[84154.213203][] kernel_init+0xe/0xf0
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #1 (cpu_hotplug.lock){++}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] cpu_hotplug_begin+0x41/0x80
[84154.213203][] _cpu_up+0x3f/0x160
[84154.213203][] cpu_up+0x69/0x80
[84154.213203][] smp_init+0x60/0x8c
[84154.213203][] kernel_init_freeable
+0x126/0x23b
[84154.213203][] kernel_init+0xe/0xf0
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #0 (cpu_add_remove_lock){+.+.+.}:
[84154.213203][] __lock_acquire+0x1f2a/0x1f60
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] mutex_lock_nested+0x50/0x3c0
[84154.213203][] cpu_maps_update_begin
+0x17/0x20
[84154.213203][] cpu_down+0x1d/0x50
[84154.213203][] cpu_subsys_offline+0x14/0x20
[84154.213203][] device_offline+0xad/0xd0
[84154.213203][] online_store+0x42/0x80
[84154.213203][] dev_attr_store+0x18/0x30
[84154.213203][] sysfs_kf_write+0x49/0x60
[84154.213203][] kernfs_fop_write+0x109/0x190
[84154.213203][] vfs_write+0xbe/0x1c0
[84154.213203][] SyS_write+0x52/0xb0
[84154.213203][] tracesys+0xd0/0xd5
[84154.213203] 
[84154.213203] other info that might help us debug this:
[84154.213203] 
[84154.213203] Chain exists of:
  cpu_add_remove_lock --> cpu_hotplug.lock#2 --> s_active#79

[84154.213203]  Possible unsafe locking scenario:
[84154.213203] 
[84154.213203]CPU0CPU1
[84154.213203]
[84154.213203]   lock(s_active#79);
[84154.213203]lock(cpu_hotplug.lock#2);
[84154.213203]lock(s_active#79);
[84154.213203]   lock(cpu_add_remove_lock);
[84154.213203] 
[84154.213203]  *** DEADLOCK ***
.

The deadlock itself seems already fixed in commit 5e33bc41. 

As Tejun suggested, to avoid this lockdep warning,
kernfs_break_active_protection() is used before online/offline callbacks
to take the s_active lock out of the dependency chain during
online/offline operations. 

Signed-off-by: Li Zhong 
---
 drivers/base/core.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..2b9f68e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, );
if (ret < 0)
@@ -448,7 +449,15 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
if (ret)
 

[RFC PATCH] Suppress a device hot remove related lockdep warning

2014-04-10 Thread Li Zhong
I noticed following lockdep warning when trying acpi hot-remove cpus:

[84154.204080] ==
[84154.204080] [ INFO: possible circular locking dependency detected ]
[84154.204080] 3.14.0-next-20140408+ #24 Tainted: GW
[84154.204080] ---
[84154.204080] bash/777 is trying to acquire lock:
[84154.204080]  (cpu_add_remove_lock){+.+.+.}, at: [] 
cpu_maps_update_begin+0x17/0x20
[84154.213203] 
[84154.213203] but task is already holding lock:
[84154.213203]  (s_active#79){.+}, at: [] 
kernfs_fop_write+0xe4/0x190
[84154.213203] 
[84154.213203] which lock already depends on the new lock.
[84154.213203] 
[84154.213203] 
[84154.213203] the existing dependency chain (in reverse order) is:
[84154.213203] 
-> #3 (s_active#79){.+}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] __kernfs_remove+0x250/0x310
[84154.213203][] kernfs_remove_by_name_ns+0x50/0xc0
[84154.213203][] sysfs_remove_file_ns+0x15/0x20
[84154.213203][] device_remove_file+0x19/0x20
[84154.213203][] device_remove_attrs+0x33/0x80
[84154.213203][] device_del+0x127/0x1c0
[84154.213203][] device_unregister+0x22/0x60
[84154.213203][] unregister_cpu+0x1e/0x40
[84154.213203][] arch_unregister_cpu+0x23/0x30
[84154.213203][] acpi_processor_remove+0x8d/0xb2
[84154.213203][] acpi_bus_trim+0x5a/0x8d
[84154.213203][] acpi_device_hotplug+0x1a8/0x3ec
[84154.213203][] acpi_hotplug_work_fn+0x1f/0x2b
[84154.213203][] process_one_work+0x1eb/0x6b0
[84154.213203][] worker_thread+0x11b/0x370
[84154.213203][] kthread+0xe4/0x100
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #2 (cpu_hotplug.lock#2){+.+.+.}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] mutex_lock_nested+0x50/0x3c0
[84154.213203][] cpu_hotplug_begin+0x4f/0x80
[84154.213203][] _cpu_up+0x3f/0x160
[84154.213203][] cpu_up+0x69/0x80
[84154.213203][] smp_init+0x60/0x8c
[84154.213203][] kernel_init_freeable+0x126/0x23b
[84154.213203][] kernel_init+0xe/0xf0
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #1 (cpu_hotplug.lock){++}:
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] cpu_hotplug_begin+0x41/0x80
[84154.213203][] _cpu_up+0x3f/0x160
[84154.213203][] cpu_up+0x69/0x80
[84154.213203][] smp_init+0x60/0x8c
[84154.213203][] kernel_init_freeable+0x126/0x23b
[84154.213203][] kernel_init+0xe/0xf0
[84154.213203][] ret_from_fork+0x7c/0xb0
[84154.213203] 
-> #0 (cpu_add_remove_lock){+.+.+.}:
[84154.213203][] __lock_acquire+0x1f2a/0x1f60
[84154.213203][] lock_acquire+0x9b/0x1d0
[84154.213203][] mutex_lock_nested+0x50/0x3c0
[84154.213203][] cpu_maps_update_begin+0x17/0x20
[84154.213203][] cpu_down+0x1d/0x50
[84154.213203][] cpu_subsys_offline+0x14/0x20
[84154.213203][] device_offline+0xad/0xd0
[84154.213203][] online_store+0x42/0x80
[84154.213203][] dev_attr_store+0x18/0x30
[84154.213203][] sysfs_kf_write+0x49/0x60
[84154.213203][] kernfs_fop_write+0x109/0x190
[84154.213203][] vfs_write+0xbe/0x1c0
[84154.213203][] SyS_write+0x52/0xb0
[84154.213203][] tracesys+0xd0/0xd5
[84154.213203] 
[84154.213203] other info that might help us debug this:
[84154.213203] 
[84154.213203] Chain exists of:
  cpu_add_remove_lock --> cpu_hotplug.lock#2 --> s_active#79

[84154.213203]  Possible unsafe locking scenario:
[84154.213203] 
[84154.213203]CPU0CPU1
[84154.213203]
[84154.213203]   lock(s_active#79);
[84154.213203]lock(cpu_hotplug.lock#2);
[84154.213203]lock(s_active#79);
[84154.213203]   lock(cpu_add_remove_lock);
[84154.213203] 
[84154.213203]  *** DEADLOCK ***
.

The deadlock itself seems already fixed in commit 5e33bc41. 

This patch uses DEVICE_ATTR_IGNORE_LOCKDEP for "online" attr to suppress
this lockdep warning. But I'm a little afraid it might also hide
(future) potential real dead lock scenarios? 

Signed-off-by: Li Zhong 
---
 drivers/base/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..98e3e09 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -452,7 +452,9 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
unlock_device_hotplug();
return ret < 0 ? ret : count;
 }
-static DEVICE_ATTR_RW(online);
+
+static DEVICE_ATTR_IGNORE_LOCKDEP(online, (S_IWUSR | S_IRUGO),
+ online_show, online_store);
 
 int device_add_groups(s

[RFC PATCH] Suppress a device hot remove related lockdep warning

2014-04-10 Thread Li Zhong
 DEVICE_ATTR_IGNORE_LOCKDEP for online attr to suppress
this lockdep warning. But I'm a little afraid it might also hide
(future) potential real dead lock scenarios? 

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..98e3e09 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -452,7 +452,9 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
unlock_device_hotplug();
return ret  0 ? ret : count;
 }
-static DEVICE_ATTR_RW(online);
+
+static DEVICE_ATTR_IGNORE_LOCKDEP(online, (S_IWUSR | S_IRUGO),
+ online_show, online_store);
 
 int device_add_groups(struct device *dev, const struct attribute_group 
**groups)
 {




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v2] Use kernfs_break_active_protection() for device online store callbacks

2014-04-10 Thread Li Zhong
 suggested, to avoid this lockdep warning,
kernfs_break_active_protection() is used before online/offline callbacks
to take the s_active lock out of the dependency chain during
online/offline operations. 

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/core.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..2b9f68e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 {
bool val;
int ret;
+   struct kernfs_node *kn;
 
ret = strtobool(buf, val);
if (ret  0)
@@ -448,7 +449,15 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
if (ret)
return ret;
 
+   kn = kernfs_find_and_get(dev-kobj.sd, attr-attr.name);
+   if (WARN_ON_ONCE(!kn))
+   goto out;
+
+   kernfs_break_active_protection(kn);
+
ret = val ? device_online(dev) : device_offline(dev);
+   kernfs_unbreak_active_protection(kn);
+out:
unlock_device_hotplug();
return ret  0 ? ret : count;
 }


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-09 Thread Li Zhong
On Wed, 2014-04-09 at 12:39 -0500, Nathan Fontenot wrote:
> On 04/08/2014 02:47 PM, Dave Hansen wrote:
> > 
> > That document really needs to be updated to stop referring to sections
> > (at least in the descriptions of the user interface).  We can not change
> > the units of phys_index/end_phys_index without also changing
> > block_size_bytes.
> > 
> 
> Here is a first pass at updating the documentation.
> 
> I have tried to update the documentation to refer to memory blocks instead
> of memory sections where appropriate and added a paragraph to explain
> that memory blocks are mode of memory sections.
> 
> Thoughts?

If we all agree to hide the information about sections, then I think we
also need to update the section id's used for phys_index/end_phys_index,
something like following on top of yours?

--
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 92d15e2..9fbb025 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -138,10 +138,7 @@ is described under /sys/devices/system/memory as
 /sys/devices/system/memory/memoryXXX
 (XXX is the memory block id.)
 
-Now, XXX is defined as (start_address_of_section / section_size) of the first
-section contained in the memory block.  The files 'phys_index' and
-'end_phys_index' under each directory report the beginning and end section id's
-for the memory block covered by the sysfs directory.  It is expected that all
+For the memory block covered by the sysfs directory.  It is expected that all
 memory sections in this range are present and no memory holes exist in the
 range. Currently there is no way to determine if there is a memory hole, but
 the existence of one should not affect the hotplug capabilities of the memory
@@ -155,16 +152,14 @@ This device covers address range [0x1 ... 
0x14000)
 Under each memory block, you can see 4 or 5 files, the end_phys_index file
 being a recent addition and not present on older kernels.
 
-/sys/devices/system/memory/memoryXXX/start_phys_index
+/sys/devices/system/memory/memoryXXX/phys_index
 /sys/devices/system/memory/memoryXXX/end_phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index'  : read-only and contains section id of the first section
-   in the memory block, same as XXX.
-'end_phys_index'  : read-only and contains section id of the last section
-   in the memory block.
+'phys_index'  : read-only and contains memory block id, same as XXX.
+'end_phys_index'  : read-only and contains memory block id, same as XXX.
 'state'   : read-write
 at read:  contains online/offline state of memory.
 at write: user can specify "online_kernel",
--

Not sure whether it is proper to remove end_phys_index, too? 

Thanks, 
Zhong

 

 
> 
> -Nathan
> ---
>  Documentation/memory-hotplug.txt |  113 
> ---
>  1 file changed, 59 insertions(+), 54 deletions(-)
> 
> Index: linux/Documentation/memory-hotplug.txt
> ===
> --- linux.orig/Documentation/memory-hotplug.txt
> +++ linux/Documentation/memory-hotplug.txt
> @@ -88,16 +88,21 @@ phase by hand.
> 
>  1.3. Unit of Memory online/offline operation
>  
> -Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole 
> memory
> -into chunks of the same size. The chunk is called a "section". The size of
> -a section is architecture dependent. For example, power uses 16MiB, ia64 uses
> -1GiB. The unit of online/offline operation is "one section". (see Section 3.)
> +Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
> +into chunks of the same size. These chunks are called "sections". The size of
> +a memory section is architecture dependent. For example, power uses 16MiB, 
> ia64
> +uses 1GiB.
> +
> +Memory sections are combined into chunks referred to as "memory blocks". The
> +size of a memory block is architecture dependent and represents the logical
> +unit upon which memory online/offline operations are to be performed. The
> +default size of a memory block is the same as memory section size unless an
> +architecture specifies otherwise. (see Section 3.)
> 
> -To determine the size of sections, please read this file:
> +To determine the size (in bytes) of a memory block please read this file:
> 
>  /sys/devices/system/memory/block_size_bytes
> 
> -This file shows the size of sections in byte.
> 
>  ---
>  2. Kernel Configuration
> @@ -123,14 +128,15 @@ config options.
>  (CONFIG_ACPI_CONTAINER).
>  This option can be kernel module too.
> 
> +
>  
> -4 sysfs files for memory hotplug
> +3 sysfs files for memory hotplug
>  
> -All sections have their device information in 

Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-09 Thread Li Zhong
On Wed, 2014-04-09 at 08:49 -0700, Dave Hansen wrote:
> On 04/09/2014 02:20 AM, Li Zhong wrote:
> > Or do you mean we don't need to expose any information related to
> > SECTION to userspace? 
> 
> Right, we don't need to expose sections themselves to userspace.  Do we?
> 
OK, I agree with that. 

Yanfei, I recall you once expressed your preference for section
numbers? 

Thanks, Zhong


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-09 Thread Li Zhong
On Tue, 2014-04-08 at 12:47 -0700, Dave Hansen wrote:
> On 04/08/2014 11:23 AM, Nathan Fontenot wrote:
> > On 04/08/2014 11:13 AM, Dave Hansen wrote:
> >> On 04/08/2014 01:27 AM, Li Zhong wrote:
> >>> If Dave and others don't have further objections, it seems this small
> >>> userspace incompatibility could be accepted by most of us, and I don't
> >>> need to make a version 2. 
> >>
> >> Let me ask another question then.  What are the units of
> >> phys_index/end_phys_index?  How do we expose those units to userspace?
> >>
> > 
> > The documentation for these files just states that the files contain
> > the first and last section id of memory in the memory block for
> > phys_index and end_phys_index respectively.
> > 
> > I'm not sure the values have ever been units of anything, at least not
> > that I remember.
> 
> 
> 
> There are two units.  SECTION_SIZE, which is completely internal to the
> kernel, and block_size_bytes which used to be the same as SECTION_SIZE,
> but is not now.  Which one of those two is phys_index/end_phys_index in,
> and if it is in terms of SECTION_SIZE like this patch proposes, how do
> we tell userspace how large SECTION_SIZE is?

With this patch, I think we could still tell how large SECTION_SIZE is.
block_size_bytes tells us the size of the block, and
end_phys_index-phys_index+1, tells us the numbers of sections in the
block, and then we could know the SECTION_SIZE by a division. 

Not that obvious, but if needed, we could easily add a file telling us
section_size or sections_per_block. 

> 
> block_size_bytes is supposed to tell you how large the sections are.  In
> the case where we lumped a bunch of sections together, we also bumped up
> block_size_bytes.  That's why we currently divide the *ACTUAL* section
> number in phys_index/end_phys_index by block_size_bytes.
> 
> That document really needs to be updated to stop referring to sections
> (at least in the descriptions of the user interface).  We can not change
> the units of phys_index/end_phys_index without also changing
> block_size_bytes.

How about adding a new section_size_bytes together with this patch? 

Or do you mean we don't need to expose any information related to
SECTION to userspace? 

Thanks, 
Zhong


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-09 Thread Li Zhong
On Tue, 2014-04-08 at 12:47 -0700, Dave Hansen wrote:
 On 04/08/2014 11:23 AM, Nathan Fontenot wrote:
  On 04/08/2014 11:13 AM, Dave Hansen wrote:
  On 04/08/2014 01:27 AM, Li Zhong wrote:
  If Dave and others don't have further objections, it seems this small
  userspace incompatibility could be accepted by most of us, and I don't
  need to make a version 2. 
 
  Let me ask another question then.  What are the units of
  phys_index/end_phys_index?  How do we expose those units to userspace?
 
  
  The documentation for these files just states that the files contain
  the first and last section id of memory in the memory block for
  phys_index and end_phys_index respectively.
  
  I'm not sure the values have ever been units of anything, at least not
  that I remember.
 
 sigh
 
 There are two units.  SECTION_SIZE, which is completely internal to the
 kernel, and block_size_bytes which used to be the same as SECTION_SIZE,
 but is not now.  Which one of those two is phys_index/end_phys_index in,
 and if it is in terms of SECTION_SIZE like this patch proposes, how do
 we tell userspace how large SECTION_SIZE is?

With this patch, I think we could still tell how large SECTION_SIZE is.
block_size_bytes tells us the size of the block, and
end_phys_index-phys_index+1, tells us the numbers of sections in the
block, and then we could know the SECTION_SIZE by a division. 

Not that obvious, but if needed, we could easily add a file telling us
section_size or sections_per_block. 

 
 block_size_bytes is supposed to tell you how large the sections are.  In
 the case where we lumped a bunch of sections together, we also bumped up
 block_size_bytes.  That's why we currently divide the *ACTUAL* section
 number in phys_index/end_phys_index by block_size_bytes.
 
 That document really needs to be updated to stop referring to sections
 (at least in the descriptions of the user interface).  We can not change
 the units of phys_index/end_phys_index without also changing
 block_size_bytes.

How about adding a new section_size_bytes together with this patch? 

Or do you mean we don't need to expose any information related to
SECTION to userspace? 

Thanks, 
Zhong


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-09 Thread Li Zhong
On Wed, 2014-04-09 at 08:49 -0700, Dave Hansen wrote:
 On 04/09/2014 02:20 AM, Li Zhong wrote:
  Or do you mean we don't need to expose any information related to
  SECTION to userspace? 
 
 Right, we don't need to expose sections themselves to userspace.  Do we?
 
OK, I agree with that. 

Yanfei, I recall you once expressed your preference for section
numbers? 

Thanks, Zhong


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-09 Thread Li Zhong
On Wed, 2014-04-09 at 12:39 -0500, Nathan Fontenot wrote:
 On 04/08/2014 02:47 PM, Dave Hansen wrote:
  
  That document really needs to be updated to stop referring to sections
  (at least in the descriptions of the user interface).  We can not change
  the units of phys_index/end_phys_index without also changing
  block_size_bytes.
  
 
 Here is a first pass at updating the documentation.
 
 I have tried to update the documentation to refer to memory blocks instead
 of memory sections where appropriate and added a paragraph to explain
 that memory blocks are mode of memory sections.
 
 Thoughts?

If we all agree to hide the information about sections, then I think we
also need to update the section id's used for phys_index/end_phys_index,
something like following on top of yours?

--
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 92d15e2..9fbb025 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -138,10 +138,7 @@ is described under /sys/devices/system/memory as
 /sys/devices/system/memory/memoryXXX
 (XXX is the memory block id.)
 
-Now, XXX is defined as (start_address_of_section / section_size) of the first
-section contained in the memory block.  The files 'phys_index' and
-'end_phys_index' under each directory report the beginning and end section id's
-for the memory block covered by the sysfs directory.  It is expected that all
+For the memory block covered by the sysfs directory.  It is expected that all
 memory sections in this range are present and no memory holes exist in the
 range. Currently there is no way to determine if there is a memory hole, but
 the existence of one should not affect the hotplug capabilities of the memory
@@ -155,16 +152,14 @@ This device covers address range [0x1 ... 
0x14000)
 Under each memory block, you can see 4 or 5 files, the end_phys_index file
 being a recent addition and not present on older kernels.
 
-/sys/devices/system/memory/memoryXXX/start_phys_index
+/sys/devices/system/memory/memoryXXX/phys_index
 /sys/devices/system/memory/memoryXXX/end_phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index'  : read-only and contains section id of the first section
-   in the memory block, same as XXX.
-'end_phys_index'  : read-only and contains section id of the last section
-   in the memory block.
+'phys_index'  : read-only and contains memory block id, same as XXX.
+'end_phys_index'  : read-only and contains memory block id, same as XXX.
 'state'   : read-write
 at read:  contains online/offline state of memory.
 at write: user can specify online_kernel,
--

Not sure whether it is proper to remove end_phys_index, too? 

Thanks, 
Zhong

 

 
 
 -Nathan
 ---
  Documentation/memory-hotplug.txt |  113 
 ---
  1 file changed, 59 insertions(+), 54 deletions(-)
 
 Index: linux/Documentation/memory-hotplug.txt
 ===
 --- linux.orig/Documentation/memory-hotplug.txt
 +++ linux/Documentation/memory-hotplug.txt
 @@ -88,16 +88,21 @@ phase by hand.
 
  1.3. Unit of Memory online/offline operation
  
 -Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole 
 memory
 -into chunks of the same size. The chunk is called a section. The size of
 -a section is architecture dependent. For example, power uses 16MiB, ia64 uses
 -1GiB. The unit of online/offline operation is one section. (see Section 3.)
 +Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
 +into chunks of the same size. These chunks are called sections. The size of
 +a memory section is architecture dependent. For example, power uses 16MiB, 
 ia64
 +uses 1GiB.
 +
 +Memory sections are combined into chunks referred to as memory blocks. The
 +size of a memory block is architecture dependent and represents the logical
 +unit upon which memory online/offline operations are to be performed. The
 +default size of a memory block is the same as memory section size unless an
 +architecture specifies otherwise. (see Section 3.)
 
 -To determine the size of sections, please read this file:
 +To determine the size (in bytes) of a memory block please read this file:
 
  /sys/devices/system/memory/block_size_bytes
 
 -This file shows the size of sections in byte.
 
  ---
  2. Kernel Configuration
 @@ -123,14 +128,15 @@ config options.
  (CONFIG_ACPI_CONTAINER).
  This option can be kernel module too.
 
 +
  
 -4 sysfs files for memory hotplug
 +3 sysfs files for memory hotplug
  
 -All sections have their device information in sysfs.  Each section is part of
 -a memory block under /sys/devices/system/memory as

Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-08 Thread Li Zhong
On Fri, 2014-04-04 at 10:29 +0900, Yasuaki Ishimatsu wrote:
> (2014/04/02 17:56), Li Zhong wrote:
> > I noticed the phys_index and end_phys_index under
> > /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
> > (for the test machine, one memory block has 8 sections, that is
> >   sections_per_block equals 8)
> >
> > # cd /sys/devices/system/memory/memory100/
> > # cat phys_index end_phys_index
> > 0064
> > 0064
> >
> > Seems they should reflect the start/end section number respectively, which
> > also matches what is said in Documentation/memory-hotplug.txt
> >
> > This patch tries to modify that so the two files could show the start/end
> > section number of the memory block.
> >
> > After this change, output of the above example looks like:
> >
> > # cat phys_index end_phys_index
> > 0320
> > 0327
> >
> > Signed-off-by: Li Zhong 
> > ---
> 
> Good catch! This is a bug. So I think the bug should be fixed.
> 
> Reviewed-by: Yasuaki Ishimatsu 

Thank you all for the review. 

If Dave and others don't have further objections, it seems this small
userspace incompatibility could be accepted by most of us, and I don't
need to make a version 2. 

Thanks, Zhong 

> Thanks,
> Yasuaki Ishimatsu
> 
> >   drivers/base/memory.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index bece691..b10f2fa 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct device 
> > *dev,
> > struct memory_block *mem = to_memory_block(dev);
> > unsigned long phys_index;
> >
> > -   phys_index = mem->start_section_nr / sections_per_block;
> > +   phys_index = mem->start_section_nr;
> > return sprintf(buf, "%08lx\n", phys_index);
> >   }
> >
> > @@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device 
> > *dev,
> > struct memory_block *mem = to_memory_block(dev);
> > unsigned long phys_index;
> >
> > -   phys_index = mem->end_section_nr / sections_per_block;
> > +   phys_index = mem->end_section_nr;
> > return sprintf(buf, "%08lx\n", phys_index);
> >   }
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-08 Thread Li Zhong
On Fri, 2014-04-04 at 10:29 +0900, Yasuaki Ishimatsu wrote:
 (2014/04/02 17:56), Li Zhong wrote:
  I noticed the phys_index and end_phys_index under
  /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
  (for the test machine, one memory block has 8 sections, that is
sections_per_block equals 8)
 
  # cd /sys/devices/system/memory/memory100/
  # cat phys_index end_phys_index
  0064
  0064
 
  Seems they should reflect the start/end section number respectively, which
  also matches what is said in Documentation/memory-hotplug.txt
 
  This patch tries to modify that so the two files could show the start/end
  section number of the memory block.
 
  After this change, output of the above example looks like:
 
  # cat phys_index end_phys_index
  0320
  0327
 
  Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
  ---
 
 Good catch! This is a bug. So I think the bug should be fixed.
 
 Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Thank you all for the review. 

If Dave and others don't have further objections, it seems this small
userspace incompatibility could be accepted by most of us, and I don't
need to make a version 2. 

Thanks, Zhong 

 Thanks,
 Yasuaki Ishimatsu
 
drivers/base/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/base/memory.c b/drivers/base/memory.c
  index bece691..b10f2fa 100644
  --- a/drivers/base/memory.c
  +++ b/drivers/base/memory.c
  @@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct device 
  *dev,
  struct memory_block *mem = to_memory_block(dev);
  unsigned long phys_index;
 
  -   phys_index = mem-start_section_nr / sections_per_block;
  +   phys_index = mem-start_section_nr;
  return sprintf(buf, %08lx\n, phys_index);
}
 
  @@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device 
  *dev,
  struct memory_block *mem = to_memory_block(dev);
  unsigned long phys_index;
 
  -   phys_index = mem-end_section_nr / sections_per_block;
  +   phys_index = mem-end_section_nr;
  return sprintf(buf, %08lx\n, phys_index);
}
 
 
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-03 Thread Li Zhong
On Thu, 2014-04-03 at 11:06 +0800, Zhang Yanfei wrote:
> On 04/03/2014 10:37 AM, Li Zhong wrote:
> > On Thu, 2014-04-03 at 09:37 +0800, Zhang Yanfei wrote:
> >> Add ccing
> >>
> >> On 04/02/2014 04:56 PM, Li Zhong wrote:
> >>> I noticed the phys_index and end_phys_index under 
> >>> /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
> >>> (for the test machine, one memory block has 8 sections, that is 
> >>>  sections_per_block equals 8)
> >>>
> >>> # cd /sys/devices/system/memory/memory100/
> >>> # cat phys_index end_phys_index 
> >>> 0064
> >>> 0064
> >>>
> >>> Seems they should reflect the start/end section number respectively, 
> >>> which 
> >>> also matches what is said in Documentation/memory-hotplug.txt
> >>
> > Hi Yanfei, 
> > 
> > Thanks for the review. 
> > 
> >> Indeed. I've noticed this before. The value in 'end_phys_index' doesn't
> >> match what it really means. But, the name itself is vague, it looks like
> >> it is the index of some page frame. (we keep this name for compatibility?)
> > 
> > I guess so, Dave just reminded me that the RFC would also break
> > userspace..
> > 
> > And now my plan is: 
> >  leave the code unchanged
> >  update the document, state the end_phys_index/phys_index are the same,
> > and means the memory block index
> 
> Ah. I doubt whether there is userspace tool which is using the two sysfiles?
> for example, the memory100 directory itself can tell us which block it is.
> So why there is the two files under it give the same meaning.
> 
> If there is userspace tool using the two files, does it use 'end_phys_index'
> in the correct way? That said, if a userspace tool knows what the 
> 'end_phys_index'
> really mean, does it still need it since we have 'phys_index' with the same 
> value?

For the end_phys_index, I totally agree with you. If somebody tries to
use it, say as the end section number, he should finally be able to find
that the value is not what he expects. But who knows ...

For phys_index, I guess it is also reasonable for some userspace tool to
just loop for each memoryXXX directory under /sys/devices/system/memory,
and use the phys_index as the memory block index for the directory,
instead of extracting the XXX from the directory name memoryXXX

Don't know whether there are some generic rules for handling such kind
of compatibility issues...


> >  [optional] create two new files start_sec_nr, end_sec_nr if needed
> 
> These two are the really meaningful sysfiles for userspace, IMO.

OK, I see. 

> 
> > 
> > Do you have any other suggestions? 
> 
> No. I think we should wait for other guys to comment more.

OK, let's wait.

Thanks, Zhong

> 
> Thanks.
> 
> > 
> > Thanks, Zhong
> > 
> >>
> >> The corresponding member in struct memory_block is:
> >>
> >> struct memory_block {
> >> unsigned long start_section_nr;
> >> unsigned long end_section_nr;
> >> ...
> >>
> >> The two members seem to have the right name, and have the right value in 
> >> kernel.
> >>
> >>
> >>>
> >>> This patch tries to modify that so the two files could show the start/end
> >>> section number of the memory block.
> >>>
> >>> After this change, output of the above example looks like:
> >>>
> >>> # cat phys_index end_phys_index 
> >>> 0320
> >>> 0327
> >>>
> >>> Signed-off-by: Li Zhong 
> >>> ---
> >>>  drivers/base/memory.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> >>> index bece691..b10f2fa 100644
> >>> --- a/drivers/base/memory.c
> >>> +++ b/drivers/base/memory.c
> >>> @@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct 
> >>> device *dev,
> >>>   struct memory_block *mem = to_memory_block(dev);
> >>>   unsigned long phys_index;
> >>>  
> >>> - phys_index = mem->start_section_nr / sections_per_block;
> >>> + phys_index = mem->start_section_nr;
> >>>   return sprintf(buf, "%08lx\n", phys_index);
> >>>  }
> >>>  
> >>> @@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device 
> >>> *dev,
> >>>   struct memory_block *mem = to_memory_block(dev);
> >>>   unsigned long phys_index;
> >>>  
> >>> - phys_index = mem->end_section_nr / sections_per_block;
> >>> + phys_index = mem->end_section_nr;
> >>>   return sprintf(buf, "%08lx\n", phys_index);
> >>>  }
> >>>  
> >>>
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >>> the body of a message to majord...@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>> Please read the FAQ at  http://www.tux.org/lkml/
> >>>
> >>
> >>
> > 
> > 
> > .
> > 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-03 Thread Li Zhong
On Thu, 2014-04-03 at 11:06 +0800, Zhang Yanfei wrote:
 On 04/03/2014 10:37 AM, Li Zhong wrote:
  On Thu, 2014-04-03 at 09:37 +0800, Zhang Yanfei wrote:
  Add ccing
 
  On 04/02/2014 04:56 PM, Li Zhong wrote:
  I noticed the phys_index and end_phys_index under 
  /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
  (for the test machine, one memory block has 8 sections, that is 
   sections_per_block equals 8)
 
  # cd /sys/devices/system/memory/memory100/
  # cat phys_index end_phys_index 
  0064
  0064
 
  Seems they should reflect the start/end section number respectively, 
  which 
  also matches what is said in Documentation/memory-hotplug.txt
 
  Hi Yanfei, 
  
  Thanks for the review. 
  
  Indeed. I've noticed this before. The value in 'end_phys_index' doesn't
  match what it really means. But, the name itself is vague, it looks like
  it is the index of some page frame. (we keep this name for compatibility?)
  
  I guess so, Dave just reminded me that the RFC would also break
  userspace..
  
  And now my plan is: 
   leave the code unchanged
   update the document, state the end_phys_index/phys_index are the same,
  and means the memory block index
 
 Ah. I doubt whether there is userspace tool which is using the two sysfiles?
 for example, the memory100 directory itself can tell us which block it is.
 So why there is the two files under it give the same meaning.
 
 If there is userspace tool using the two files, does it use 'end_phys_index'
 in the correct way? That said, if a userspace tool knows what the 
 'end_phys_index'
 really mean, does it still need it since we have 'phys_index' with the same 
 value?

For the end_phys_index, I totally agree with you. If somebody tries to
use it, say as the end section number, he should finally be able to find
that the value is not what he expects. But who knows ...

For phys_index, I guess it is also reasonable for some userspace tool to
just loop for each memoryXXX directory under /sys/devices/system/memory,
and use the phys_index as the memory block index for the directory,
instead of extracting the XXX from the directory name memoryXXX

Don't know whether there are some generic rules for handling such kind
of compatibility issues...


   [optional] create two new files start_sec_nr, end_sec_nr if needed
 
 These two are the really meaningful sysfiles for userspace, IMO.

OK, I see. 

 
  
  Do you have any other suggestions? 
 
 No. I think we should wait for other guys to comment more.

OK, let's wait.

Thanks, Zhong

 
 Thanks.
 
  
  Thanks, Zhong
  
 
  The corresponding member in struct memory_block is:
 
  struct memory_block {
  unsigned long start_section_nr;
  unsigned long end_section_nr;
  ...
 
  The two members seem to have the right name, and have the right value in 
  kernel.
 
 
 
  This patch tries to modify that so the two files could show the start/end
  section number of the memory block.
 
  After this change, output of the above example looks like:
 
  # cat phys_index end_phys_index 
  0320
  0327
 
  Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
  ---
   drivers/base/memory.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/base/memory.c b/drivers/base/memory.c
  index bece691..b10f2fa 100644
  --- a/drivers/base/memory.c
  +++ b/drivers/base/memory.c
  @@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct 
  device *dev,
struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
   
  - phys_index = mem-start_section_nr / sections_per_block;
  + phys_index = mem-start_section_nr;
return sprintf(buf, %08lx\n, phys_index);
   }
   
  @@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device 
  *dev,
struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
   
  - phys_index = mem-end_section_nr / sections_per_block;
  + phys_index = mem-end_section_nr;
return sprintf(buf, %08lx\n, phys_index);
   }
   
 
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
 
 
  
  
  .
  
 
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-02 Thread Li Zhong
On Thu, 2014-04-03 at 09:37 +0800, Zhang Yanfei wrote:
> Add ccing
> 
> On 04/02/2014 04:56 PM, Li Zhong wrote:
> > I noticed the phys_index and end_phys_index under 
> > /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
> > (for the test machine, one memory block has 8 sections, that is 
> >  sections_per_block equals 8)
> > 
> > # cd /sys/devices/system/memory/memory100/
> > # cat phys_index end_phys_index 
> > 0064
> > 0064
> > 
> > Seems they should reflect the start/end section number respectively, which 
> > also matches what is said in Documentation/memory-hotplug.txt
> 
Hi Yanfei, 

Thanks for the review. 

> Indeed. I've noticed this before. The value in 'end_phys_index' doesn't
> match what it really means. But, the name itself is vague, it looks like
> it is the index of some page frame. (we keep this name for compatibility?)

I guess so, Dave just reminded me that the RFC would also break
userspace..

And now my plan is: 
 leave the code unchanged
 update the document, state the end_phys_index/phys_index are the same,
and means the memory block index
 [optional] create two new files start_sec_nr, end_sec_nr if needed

Do you have any other suggestions? 

Thanks, Zhong

> 
> The corresponding member in struct memory_block is:
> 
> struct memory_block {
> unsigned long start_section_nr;
> unsigned long end_section_nr;
> ...
> 
> The two members seem to have the right name, and have the right value in 
> kernel.
> 
> 
> > 
> > This patch tries to modify that so the two files could show the start/end
> > section number of the memory block.
> > 
> > After this change, output of the above example looks like:
> > 
> > # cat phys_index end_phys_index 
> > 0320
> > 0327
> > 
> > Signed-off-by: Li Zhong 
> > ---
> >  drivers/base/memory.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index bece691..b10f2fa 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct device 
> > *dev,
> > struct memory_block *mem = to_memory_block(dev);
> > unsigned long phys_index;
> >  
> > -   phys_index = mem->start_section_nr / sections_per_block;
> > +   phys_index = mem->start_section_nr;
> > return sprintf(buf, "%08lx\n", phys_index);
> >  }
> >  
> > @@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device 
> > *dev,
> > struct memory_block *mem = to_memory_block(dev);
> > unsigned long phys_index;
> >  
> > -   phys_index = mem->end_section_nr / sections_per_block;
> > +   phys_index = mem->end_section_nr;
> > return sprintf(buf, "%08lx\n", phys_index);
> >  }
> >  
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-02 Thread Li Zhong
On Wed, 2014-04-02 at 09:09 -0700, Dave Hansen wrote:
> On 04/02/2014 01:56 AM, Li Zhong wrote:
> > I noticed the phys_index and end_phys_index under 
> > /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
> > (for the test machine, one memory block has 8 sections, that is 
> >  sections_per_block equals 8)
> > 
> > # cd /sys/devices/system/memory/memory100/
> > # cat phys_index end_phys_index 
> > 0064
> > 0064
> > 
> > Seems they should reflect the start/end section number respectively, which 
> > also matches what is said in Documentation/memory-hotplug.txt
> 
> This changes a user-visible interface.  Won't this break userspace?

Hi Dave, 

Hmm, I think so... Thank you for the reminder

Do you have some better ideas to fix this? Maybe we should leave the
code unchanged, just change the corresponding document (just it seems
that the end_phys_index will always be the same as phys_index)? And if
the section numbers are really needed, then we could create two new
files later, e.g. start_section_nr, end_section_nr? 

Thanks, 
Zhong
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-02 Thread Li Zhong
I noticed the phys_index and end_phys_index under 
/sys/devices/system/memory/memoryXXX/ have the same value, e.g.
(for the test machine, one memory block has 8 sections, that is 
 sections_per_block equals 8)

# cd /sys/devices/system/memory/memory100/
# cat phys_index end_phys_index 
0064
0064

Seems they should reflect the start/end section number respectively, which 
also matches what is said in Documentation/memory-hotplug.txt

This patch tries to modify that so the two files could show the start/end
section number of the memory block.

After this change, output of the above example looks like:

# cat phys_index end_phys_index 
0320
0327

Signed-off-by: Li Zhong 
---
 drivers/base/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..b10f2fa 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct device *dev,
struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
-   phys_index = mem->start_section_nr / sections_per_block;
+   phys_index = mem->start_section_nr;
return sprintf(buf, "%08lx\n", phys_index);
 }
 
@@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device *dev,
struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
-   phys_index = mem->end_section_nr / sections_per_block;
+   phys_index = mem->end_section_nr;
return sprintf(buf, "%08lx\n", phys_index);
 }
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-02 Thread Li Zhong
I noticed the phys_index and end_phys_index under 
/sys/devices/system/memory/memoryXXX/ have the same value, e.g.
(for the test machine, one memory block has 8 sections, that is 
 sections_per_block equals 8)

# cd /sys/devices/system/memory/memory100/
# cat phys_index end_phys_index 
0064
0064

Seems they should reflect the start/end section number respectively, which 
also matches what is said in Documentation/memory-hotplug.txt

This patch tries to modify that so the two files could show the start/end
section number of the memory block.

After this change, output of the above example looks like:

# cat phys_index end_phys_index 
0320
0327

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..b10f2fa 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct device *dev,
struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
-   phys_index = mem-start_section_nr / sections_per_block;
+   phys_index = mem-start_section_nr;
return sprintf(buf, %08lx\n, phys_index);
 }
 
@@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device *dev,
struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
-   phys_index = mem-end_section_nr / sections_per_block;
+   phys_index = mem-end_section_nr;
return sprintf(buf, %08lx\n, phys_index);
 }
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-02 Thread Li Zhong
On Wed, 2014-04-02 at 09:09 -0700, Dave Hansen wrote:
 On 04/02/2014 01:56 AM, Li Zhong wrote:
  I noticed the phys_index and end_phys_index under 
  /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
  (for the test machine, one memory block has 8 sections, that is 
   sections_per_block equals 8)
  
  # cd /sys/devices/system/memory/memory100/
  # cat phys_index end_phys_index 
  0064
  0064
  
  Seems they should reflect the start/end section number respectively, which 
  also matches what is said in Documentation/memory-hotplug.txt
 
 This changes a user-visible interface.  Won't this break userspace?

Hi Dave, 

Hmm, I think so... Thank you for the reminder

Do you have some better ideas to fix this? Maybe we should leave the
code unchanged, just change the corresponding document (just it seems
that the end_phys_index will always be the same as phys_index)? And if
the section numbers are really needed, then we could create two new
files later, e.g. start_section_nr, end_section_nr? 

Thanks, 
Zhong
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] memory driver: make phys_index/end_phys_index reflect the start/end section number

2014-04-02 Thread Li Zhong
On Thu, 2014-04-03 at 09:37 +0800, Zhang Yanfei wrote:
 Add ccing
 
 On 04/02/2014 04:56 PM, Li Zhong wrote:
  I noticed the phys_index and end_phys_index under 
  /sys/devices/system/memory/memoryXXX/ have the same value, e.g.
  (for the test machine, one memory block has 8 sections, that is 
   sections_per_block equals 8)
  
  # cd /sys/devices/system/memory/memory100/
  # cat phys_index end_phys_index 
  0064
  0064
  
  Seems they should reflect the start/end section number respectively, which 
  also matches what is said in Documentation/memory-hotplug.txt
 
Hi Yanfei, 

Thanks for the review. 

 Indeed. I've noticed this before. The value in 'end_phys_index' doesn't
 match what it really means. But, the name itself is vague, it looks like
 it is the index of some page frame. (we keep this name for compatibility?)

I guess so, Dave just reminded me that the RFC would also break
userspace..

And now my plan is: 
 leave the code unchanged
 update the document, state the end_phys_index/phys_index are the same,
and means the memory block index
 [optional] create two new files start_sec_nr, end_sec_nr if needed

Do you have any other suggestions? 

Thanks, Zhong

 
 The corresponding member in struct memory_block is:
 
 struct memory_block {
 unsigned long start_section_nr;
 unsigned long end_section_nr;
 ...
 
 The two members seem to have the right name, and have the right value in 
 kernel.
 
 
  
  This patch tries to modify that so the two files could show the start/end
  section number of the memory block.
  
  After this change, output of the above example looks like:
  
  # cat phys_index end_phys_index 
  0320
  0327
  
  Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
  ---
   drivers/base/memory.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/drivers/base/memory.c b/drivers/base/memory.c
  index bece691..b10f2fa 100644
  --- a/drivers/base/memory.c
  +++ b/drivers/base/memory.c
  @@ -114,7 +114,7 @@ static ssize_t show_mem_start_phys_index(struct device 
  *dev,
  struct memory_block *mem = to_memory_block(dev);
  unsigned long phys_index;
   
  -   phys_index = mem-start_section_nr / sections_per_block;
  +   phys_index = mem-start_section_nr;
  return sprintf(buf, %08lx\n, phys_index);
   }
   
  @@ -124,7 +124,7 @@ static ssize_t show_mem_end_phys_index(struct device 
  *dev,
  struct memory_block *mem = to_memory_block(dev);
  unsigned long phys_index;
   
  -   phys_index = mem-end_section_nr / sections_per_block;
  +   phys_index = mem-end_section_nr;
  return sprintf(buf, %08lx\n, phys_index);
   }
   
  
  
  --
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
  
 
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH ] workqueue: add args to workqueue lockdep name

2014-02-13 Thread Li Zhong
Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in
process_one_work().

Maybe #fmt plus #args could be used as the lock_name to give some more
information for some fmt string like the above. 

__builtin_constant_p() check is removed (as there seems no good way to
check all the variables in args list). However, by removing the check,
it only adds two additional "s for those constants.

Some lockdep name examples printed out after the change:

lockdep namewq->name

"events_long"   events_long
"%s"("khelper") khelper
"xfs-data/%s"mp->m_fsname   xfs-data/dm-3

Signed-off-by: Li Zhong 
---
 include/linux/workqueue.h |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 594521b..704f4f6 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -419,10 +419,7 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, 
int max_active,
static struct lock_class_key __key; \
const char *__lock_name;\
\
-   if (__builtin_constant_p(fmt))  \
-   __lock_name = (fmt);\
-   else\
-   __lock_name = #fmt; \
+   __lock_name = #fmt#args;\
\
__alloc_workqueue_key((fmt), (flags), (max_active), \
  &__key, __lock_name, ##args); \



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH ] workqueue: add args to workqueue lockdep name

2014-02-13 Thread Li Zhong
Tommi noticed a 'funny' lock class name: %s#5 from a lock acquired in
process_one_work().

Maybe #fmt plus #args could be used as the lock_name to give some more
information for some fmt string like the above. 

__builtin_constant_p() check is removed (as there seems no good way to
check all the variables in args list). However, by removing the check,
it only adds two additional s for those constants.

Some lockdep name examples printed out after the change:

lockdep namewq-name

events_long   events_long
%s(khelper) khelper
xfs-data/%smp-m_fsname   xfs-data/dm-3

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 include/linux/workqueue.h |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 594521b..704f4f6 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -419,10 +419,7 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, 
int max_active,
static struct lock_class_key __key; \
const char *__lock_name;\
\
-   if (__builtin_constant_p(fmt))  \
-   __lock_name = (fmt);\
-   else\
-   __lock_name = #fmt; \
+   __lock_name = #fmt#args;\
\
__alloc_workqueue_key((fmt), (flags), (max_active), \
  __key, __lock_name, ##args); \



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lockdep: strange %s#5 lock name

2014-02-12 Thread Li Zhong
On Tue, 2014-02-11 at 10:27 -0500, Tejun Heo wrote:
> On Tue, Feb 11, 2014 at 12:00:36PM +0100, Peter Zijlstra wrote:
> > > Looks good to me.  Can you please post the patch with SOB?
> > 
> > ---
> > Subject: workqueue: Fix workqueue lockdep name
> > 
> > Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in
> > process_one_work(). It turns out that commit b196be89cdc14 forgot to
> > change the lockdep_init_map() when it changed the @lock_name argument
> > from a string to a format.
> > 
> > Fixes: b196be89cdc14 ("workqueue: make alloc_workqueue() take printf fmt 
> > and args for name")
> > Reported-by: Tommi Rantala 
> > Signed-off-by: Peter Zijlstra 
> 
> Applied to wq/for-3.14-fixes.

Hi Tejun, Peter,

I found following lockdep warning caused by this commit in next-tree:

[5.251993] [ cut here ]
[5.252019] WARNING: CPU: 0 PID: 221 at kernel/locking/lockdep.c:710 
__lock_acquire+0x1761/0x1f60()
[5.252019] Modules linked in: e1000
[5.252019] CPU: 0 PID: 221 Comm: lvm Not tainted 3.14.0-rc2-next-20140212 #1
[5.252019] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[5.252019]  0009 880118e91938 8155fe12 
880118e91978
[5.252019]  8105c195 880118e91958 81eb33d0 
0002
[5.252019]  880118dd2318   
880118e91988
[5.252019] Call Trace:
[5.252019]  [] dump_stack+0x19/0x1b
[5.252019]  [] warn_slowpath_common+0x85/0xb0
[5.252019]  [] warn_slowpath_null+0x1a/0x20
[5.252019]  [] __lock_acquire+0x1761/0x1f60
[5.252019]  [] ? mark_held_locks+0xae/0x120
[5.252019]  [] ? debug_check_no_locks_freed+0x8e/0x160
[5.252019]  [] ? lockdep_init_map+0xac/0x600
[5.252019]  [] lock_acquire+0x9a/0x120
[5.252019]  [] ? flush_workqueue+0x5/0x750
[5.252019]  [] flush_workqueue+0x109/0x750
[5.252019]  [] ? flush_workqueue+0x5/0x750
[5.252019]  [] ? _raw_spin_unlock_irq+0x30/0x40
[5.252019]  [] ? srcu_reschedule+0xe0/0xf0
[5.252019]  [] dm_suspend+0xe9/0x1e0
[5.252019]  [] dev_suspend+0x1e3/0x270
[5.252019]  [] ? table_load+0x350/0x350
[5.252019]  [] ctl_ioctl+0x26c/0x510
[5.252019]  [] ? __lock_acquire+0x41c/0x1f60
[5.252019]  [] ? vtime_account_user+0x98/0xb0
[5.252019]  [] dm_ctl_ioctl+0x13/0x20
[5.252019]  [] do_vfs_ioctl+0x88/0x570
[5.252019]  [] ? __fget_light+0x129/0x150
[5.252019]  [] SyS_ioctl+0x91/0xb0
[5.252019]  [] tracesys+0xcf/0xd4
[5.252019] ---[ end trace ff1fa506f34be3bc ]---

It seems to me that when the second time alloc_workqueue() is called
from the same code path, it would have two locks with the same key, but
not the same >name, which doesn't meet lockdep's assumption.

Maybe we could use the #fmt plus #args as the lock_name to avoid the
above issue? It could also give some more information for some string
like %s#5. Draft code attached below.

I removed __builtin_constant_p() check (I don't know how to check all
the args). However, by removing the check, it only adds two additional
two "s for those constants.

Some lockdep name examples I printed out after the change:

lockdep namewq->name

"events_long"   events_long
"%s"("khelper") khelper
"xfs-data/%s"mp->m_fsname   xfs-data/dm-3
...

A little bit ugly, not sure whether we could generate some better names
here?

Thanks, 
Zhong

---
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 594521b..704f4f6 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -419,10 +419,7 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, 
int max_active,
static struct lock_class_key __key; \
const char *__lock_name;\
\
-   if (__builtin_constant_p(fmt))  \
-   __lock_name = (fmt);\
-   else\
-   __lock_name = #fmt; \
+   __lock_name = #fmt#args;\
\
__alloc_workqueue_key((fmt), (flags), (max_active), \
  &__key, __lock_name, ##args); \
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 861d8dd..82ef9f3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4202,7 +4202,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char 
*fmt,
INIT_LIST_HEAD(>flusher_overflow);
INIT_LIST_HEAD(>maydays);
 
-   lockdep_init_map(>lockdep_map, wq->name, key, 0);
+   lockdep_init_map(>lockdep_map, lock_name, key, 0);
INIT_LIST_HEAD(>list);
 

Re: lockdep: strange %s#5 lock name

2014-02-12 Thread Li Zhong
On Tue, 2014-02-11 at 10:27 -0500, Tejun Heo wrote:
 On Tue, Feb 11, 2014 at 12:00:36PM +0100, Peter Zijlstra wrote:
   Looks good to me.  Can you please post the patch with SOB?
  
  ---
  Subject: workqueue: Fix workqueue lockdep name
  
  Tommi noticed a 'funny' lock class name: %s#5 from a lock acquired in
  process_one_work(). It turns out that commit b196be89cdc14 forgot to
  change the lockdep_init_map() when it changed the @lock_name argument
  from a string to a format.
  
  Fixes: b196be89cdc14 (workqueue: make alloc_workqueue() take printf fmt 
  and args for name)
  Reported-by: Tommi Rantala tt.rant...@gmail.com
  Signed-off-by: Peter Zijlstra pet...@infradead.org
 
 Applied to wq/for-3.14-fixes.

Hi Tejun, Peter,

I found following lockdep warning caused by this commit in next-tree:

[5.251993] [ cut here ]
[5.252019] WARNING: CPU: 0 PID: 221 at kernel/locking/lockdep.c:710 
__lock_acquire+0x1761/0x1f60()
[5.252019] Modules linked in: e1000
[5.252019] CPU: 0 PID: 221 Comm: lvm Not tainted 3.14.0-rc2-next-20140212 #1
[5.252019] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[5.252019]  0009 880118e91938 8155fe12 
880118e91978
[5.252019]  8105c195 880118e91958 81eb33d0 
0002
[5.252019]  880118dd2318   
880118e91988
[5.252019] Call Trace:
[5.252019]  [8155fe12] dump_stack+0x19/0x1b
[5.252019]  [8105c195] warn_slowpath_common+0x85/0xb0
[5.252019]  [8105c1da] warn_slowpath_null+0x1a/0x20
[5.252019]  [810a1721] __lock_acquire+0x1761/0x1f60
[5.252019]  [8109ec2e] ? mark_held_locks+0xae/0x120
[5.252019]  [8109ef4e] ? debug_check_no_locks_freed+0x8e/0x160
[5.252019]  [810a264c] ? lockdep_init_map+0xac/0x600
[5.252019]  [810a251a] lock_acquire+0x9a/0x120
[5.252019]  [810793f5] ? flush_workqueue+0x5/0x750
[5.252019]  [810794f9] flush_workqueue+0x109/0x750
[5.252019]  [810793f5] ? flush_workqueue+0x5/0x750
[5.252019]  [81566890] ? _raw_spin_unlock_irq+0x30/0x40
[5.252019]  [810b7720] ? srcu_reschedule+0xe0/0xf0
[5.252019]  [81405889] dm_suspend+0xe9/0x1e0
[5.252019]  [8140a853] dev_suspend+0x1e3/0x270
[5.252019]  [8140a670] ? table_load+0x350/0x350
[5.252019]  [8140b40c] ctl_ioctl+0x26c/0x510
[5.252019]  [810a03dc] ? __lock_acquire+0x41c/0x1f60
[5.252019]  [810923d8] ? vtime_account_user+0x98/0xb0
[5.252019]  [8140b6c3] dm_ctl_ioctl+0x13/0x20
[5.252019]  [811986c8] do_vfs_ioctl+0x88/0x570
[5.252019]  [811a5579] ? __fget_light+0x129/0x150
[5.252019]  [81198c41] SyS_ioctl+0x91/0xb0
[5.252019]  [8157049d] tracesys+0xcf/0xd4
[5.252019] ---[ end trace ff1fa506f34be3bc ]---

It seems to me that when the second time alloc_workqueue() is called
from the same code path, it would have two locks with the same key, but
not the same wq-name, which doesn't meet lockdep's assumption.

Maybe we could use the #fmt plus #args as the lock_name to avoid the
above issue? It could also give some more information for some string
like %s#5. Draft code attached below.

I removed __builtin_constant_p() check (I don't know how to check all
the args). However, by removing the check, it only adds two additional
two s for those constants.

Some lockdep name examples I printed out after the change:

lockdep namewq-name

events_long   events_long
%s(khelper) khelper
xfs-data/%smp-m_fsname   xfs-data/dm-3
...

A little bit ugly, not sure whether we could generate some better names
here?

Thanks, 
Zhong

---
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 594521b..704f4f6 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -419,10 +419,7 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, 
int max_active,
static struct lock_class_key __key; \
const char *__lock_name;\
\
-   if (__builtin_constant_p(fmt))  \
-   __lock_name = (fmt);\
-   else\
-   __lock_name = #fmt; \
+   __lock_name = #fmt#args;\
\
__alloc_workqueue_key((fmt), (flags), (max_active), \
  __key, __lock_name, ##args); \
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 861d8dd..82ef9f3 

Re: [RFC PATCH next]module: Fix mod->mkobj.kobj potentially freed too early

2013-08-22 Thread Li Zhong
On Thu, 2013-08-22 at 16:30 +0930, Rusty Russell wrote:
> Greg KH  writes:
> > On Wed, Aug 21, 2013 at 05:49:58PM +0800, Li Zhong wrote:
> > >  struct kobj_type module_ktype = {
> > > + .release   =module_kobj_release,
> > >   .sysfs_ops =_sysfs_ops,
> > >  };
> >
> > Wait, as there is no release function here for the kobject (a different
> > problem), why is the deferred release function causing any problems?
> > There is no release function to call, so what is causing the oops?
> 
> Because DEBUG_KOBJECT_RELEASE does the kobject_put() sometime later,
> which is what causes the oops.
> 
> Since kobjects don't have an owner field, AFAICT someone *could* grab
> one in a module which is unloading, then put it after unload.   So this
> fixes a real bug, albeit not one seen in the real world.
> 
> Applied,

Oh, thank you, Rusty. 

I just sent out another version... which fix it in another way as Greg
suggested, could you please also help to take a look at it?

Thanks, Zhong

> Rusty.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v2 next]module: Fix mod->mkobj.kobj potentially freed too early

2013-08-22 Thread Li Zhong
DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

After some investigation, it seems the reason is:
The mod->mkobj.kobj(a01600d0 below) is freed together with mod
itself by module_free(mod, mod->module_core) in free_module(). However,
its children still hold references to it, as the delay caused by
DEBUG_KOBJECT_RELEASE. So when the child('holders' below) tries to
decrease the reference count to its parent in kobject_del(), BUG happens
as it tries to access already freed memory.

v2: 
Greg suggested to make the kobject as a pointer. But it seems a little
hard to make kobj(kobject) embedded in mkobj(module_kobject) a pointer,
as that seems to cause getting the mkobj from the kobj impossible --
to_module_kobject() is used in several places to derive mkobj from its
member kobj. 

So in this version, I made the whole mkobj(module_kobject) as a pointer
in mod(module).

The mkobj is then allocated in mod_sysfs_init(), and freed in its member
kobj's release function -- it seems that there is no mkobj usage after
its kobj is released? 

[ 1844.175287] kobject: 'holders' (88007c1f1600): kobject_release, parent 
a01600d0 (delayed)
[ 1844.178991] kobject: 'notes' (8800370b2a00): kobject_release, parent 
a01600d0 (delayed)
[ 1845.180118] kobject: 'holders' (88007c1f1600): kobject_cleanup, parent 
a01600d0
[ 1845.182130] kobject: 'holders' (88007c1f1600): auto cleanup kobject_del
[ 1845.184120] BUG: unable to handle kernel paging request at a01601d0
[ 1845.185026] IP: [] kobject_put+0x11/0x60
[ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
[ 1845.185026] Oops:  [#1] PREEMPT 
[ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
[ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G   O 
3.11.0-rc6-next-20130819+ #1
[ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1845.185026] Workqueue: events kobject_delayed_cleanup
[ 1845.185026] task: 88007ca51f00 ti: 88007ca5c000 task.ti: 
88007ca5c000
[ 1845.185026] RIP: 0010:[]  [] 
kobject_put+0x11/0x60
[ 1845.185026] RSP: 0018:88007ca5dd08  EFLAGS: 00010282
[ 1845.185026] RAX: 2000 RBX: a01600d0 RCX: 8177d638
[ 1845.185026] RDX: 88007ca5dc18 RSI:  RDI: a01600d0
[ 1845.185026] RBP: 88007ca5dd18 R08: 824e9810 R09: 
[ 1845.185026] R10: 8800 R11: dead4ead0001 R12: 81a95040
[ 1845.185026] R13: 88007b27a960 R14: 88007c1f1600 R15: 
[ 1845.185026] FS:  () GS:81a23000() 
knlGS:
[ 1845.185026] CS:  0010 DS:  ES:  CR0: 8005003b
[ 1845.185026] CR2: a01601d0 CR3: 37207000 CR4: 06b0
[ 1845.185026] Stack:
[ 1845.185026]  88007c1f1600 88007c1f1600 88007ca5dd38 
812cdb7e
[ 1845.185026]   88007c1f1640 88007ca5dd68 
812cdbfe
[ 1845.185026]  88007c974800 88007c1f1640 88007ff61a00 

[ 1845.185026] Call Trace:
[ 1845.185026]  [] kobject_del+0x2e/0x40
[ 1845.185026]  [] kobject_delayed_cleanup+0x6e/0x1d0
[ 1845.185026]  [] process_one_work+0x1e5/0x670
[ 1845.185026]  [] ? process_one_work+0x183/0x670
[ 1845.185026]  [] worker_thread+0x113/0x370
[ 1845.185026]  [] ? rescuer_thread+0x290/0x290
[ 1845.185026]  [] kthread+0xda/0xe0
[ 1845.185026]  [] ? _raw_spin_unlock_irq+0x30/0x60
[ 1845.185026]  [] ? kthread_create_on_node+0x130/0x130
[ 1845.185026]  [] ret_from_fork+0x7a/0xb0
[ 1845.185026]  [] ? kthread_create_on_node+0x130/0x130
[ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff 
ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d  87 
00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84 
[ 1845.185026] RIP  [] kobject_put+0x11/0x60
[ 1845.185026]  RSP 
[ 1845.185026] CR2: a01601d0
[ 1845.185026] ---[ end trace 49a70afd109f5653 ]---

Signed-off-by: Li Zhong 
---
 drivers/base/core.c|  2 +-
 drivers/base/module.c  |  4 ++--
 include/linux/module.h |  2 +-
 kernel/module.c| 37 +
 kernel/params.c| 19 +--
 5 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 09a99d6..b8a1fc6 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1629,7 +1629,7 @@ struct device *__root_device_register(const char *name, 
struct module *owner)
 
 #ifdef CONFIG_MODULES  /* gotta find a "cleaner" way to do this */
if (owner) {
-   struct module_kobject *mk = >mkobj;
+   struct module_kobject *mk = owner->mkobj;
 
err = sysfs_create_link(>dev.kobj, >kobj, "module");
if (err) {
diff --git a/drivers/base/module.c b/drivers/base/module.c
index db930d3..7b7bd5a 100644
--- a/dr

[RFC PATCH v2 next]module: Fix mod-mkobj.kobj potentially freed too early

2013-08-22 Thread Li Zhong
DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

After some investigation, it seems the reason is:
The mod-mkobj.kobj(a01600d0 below) is freed together with mod
itself by module_free(mod, mod-module_core) in free_module(). However,
its children still hold references to it, as the delay caused by
DEBUG_KOBJECT_RELEASE. So when the child('holders' below) tries to
decrease the reference count to its parent in kobject_del(), BUG happens
as it tries to access already freed memory.

v2: 
Greg suggested to make the kobject as a pointer. But it seems a little
hard to make kobj(kobject) embedded in mkobj(module_kobject) a pointer,
as that seems to cause getting the mkobj from the kobj impossible --
to_module_kobject() is used in several places to derive mkobj from its
member kobj. 

So in this version, I made the whole mkobj(module_kobject) as a pointer
in mod(module).

The mkobj is then allocated in mod_sysfs_init(), and freed in its member
kobj's release function -- it seems that there is no mkobj usage after
its kobj is released? 

[ 1844.175287] kobject: 'holders' (88007c1f1600): kobject_release, parent 
a01600d0 (delayed)
[ 1844.178991] kobject: 'notes' (8800370b2a00): kobject_release, parent 
a01600d0 (delayed)
[ 1845.180118] kobject: 'holders' (88007c1f1600): kobject_cleanup, parent 
a01600d0
[ 1845.182130] kobject: 'holders' (88007c1f1600): auto cleanup kobject_del
[ 1845.184120] BUG: unable to handle kernel paging request at a01601d0
[ 1845.185026] IP: [812cda81] kobject_put+0x11/0x60
[ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
[ 1845.185026] Oops:  [#1] PREEMPT 
[ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
[ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G   O 
3.11.0-rc6-next-20130819+ #1
[ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1845.185026] Workqueue: events kobject_delayed_cleanup
[ 1845.185026] task: 88007ca51f00 ti: 88007ca5c000 task.ti: 
88007ca5c000
[ 1845.185026] RIP: 0010:[812cda81]  [812cda81] 
kobject_put+0x11/0x60
[ 1845.185026] RSP: 0018:88007ca5dd08  EFLAGS: 00010282
[ 1845.185026] RAX: 2000 RBX: a01600d0 RCX: 8177d638
[ 1845.185026] RDX: 88007ca5dc18 RSI:  RDI: a01600d0
[ 1845.185026] RBP: 88007ca5dd18 R08: 824e9810 R09: 
[ 1845.185026] R10: 8800 R11: dead4ead0001 R12: 81a95040
[ 1845.185026] R13: 88007b27a960 R14: 88007c1f1600 R15: 
[ 1845.185026] FS:  () GS:81a23000() 
knlGS:
[ 1845.185026] CS:  0010 DS:  ES:  CR0: 8005003b
[ 1845.185026] CR2: a01601d0 CR3: 37207000 CR4: 06b0
[ 1845.185026] Stack:
[ 1845.185026]  88007c1f1600 88007c1f1600 88007ca5dd38 
812cdb7e
[ 1845.185026]   88007c1f1640 88007ca5dd68 
812cdbfe
[ 1845.185026]  88007c974800 88007c1f1640 88007ff61a00 

[ 1845.185026] Call Trace:
[ 1845.185026]  [812cdb7e] kobject_del+0x2e/0x40
[ 1845.185026]  [812cdbfe] kobject_delayed_cleanup+0x6e/0x1d0
[ 1845.185026]  [81063a45] process_one_work+0x1e5/0x670
[ 1845.185026]  [810639e3] ? process_one_work+0x183/0x670
[ 1845.185026]  [810642b3] worker_thread+0x113/0x370
[ 1845.185026]  [810641a0] ? rescuer_thread+0x290/0x290
[ 1845.185026]  [8106bfba] kthread+0xda/0xe0
[ 1845.185026]  [814ff0f0] ? _raw_spin_unlock_irq+0x30/0x60
[ 1845.185026]  [8106bee0] ? kthread_create_on_node+0x130/0x130
[ 1845.185026]  [8150751a] ret_from_fork+0x7a/0xb0
[ 1845.185026]  [8106bee0] ? kthread_create_on_node+0x130/0x130
[ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff 
ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d f6 87 
00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84 
[ 1845.185026] RIP  [812cda81] kobject_put+0x11/0x60
[ 1845.185026]  RSP 88007ca5dd08
[ 1845.185026] CR2: a01601d0
[ 1845.185026] ---[ end trace 49a70afd109f5653 ]---

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 drivers/base/core.c|  2 +-
 drivers/base/module.c  |  4 ++--
 include/linux/module.h |  2 +-
 kernel/module.c| 37 +
 kernel/params.c| 19 +--
 5 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 09a99d6..b8a1fc6 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1629,7 +1629,7 @@ struct device *__root_device_register(const char *name, 
struct module *owner)
 
 #ifdef CONFIG_MODULES  /* gotta find a cleaner way to do this */
if (owner) {
-   struct module_kobject *mk = owner-mkobj

Re: [RFC PATCH next]module: Fix mod-mkobj.kobj potentially freed too early

2013-08-22 Thread Li Zhong
On Thu, 2013-08-22 at 16:30 +0930, Rusty Russell wrote:
 Greg KH gre...@linuxfoundation.org writes:
  On Wed, Aug 21, 2013 at 05:49:58PM +0800, Li Zhong wrote:
struct kobj_type module_ktype = {
   + .release   =module_kobj_release,
 .sysfs_ops =module_sysfs_ops,
};
 
  Wait, as there is no release function here for the kobject (a different
  problem), why is the deferred release function causing any problems?
  There is no release function to call, so what is causing the oops?
 
 Because DEBUG_KOBJECT_RELEASE does the kobject_put() sometime later,
 which is what causes the oops.
 
 Since kobjects don't have an owner field, AFAICT someone *could* grab
 one in a module which is unloading, then put it after unload.   So this
 fixes a real bug, albeit not one seen in the real world.
 
 Applied,

Oh, thank you, Rusty. 

I just sent out another version... which fix it in another way as Greg
suggested, could you please also help to take a look at it?

Thanks, Zhong

 Rusty.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH next]module: Fix mod->mkobj.kobj potentially freed too early

2013-08-21 Thread Li Zhong
On Wed, 2013-08-21 at 21:03 -0700, Greg KH wrote:
> On Thu, Aug 22, 2013 at 10:34:06AM +0800, Li Zhong wrote:
> > On Wed, 2013-08-21 at 09:18 -0700, Greg KH wrote:
> > > On Wed, Aug 21, 2013 at 05:49:58PM +0800, Li Zhong wrote:
> > > > DEBUG_KOBJECT_RELEASE helps to find the issue attached below.
> > > > 
> > > > After some investigation, it seems the reason is:
> > > > The mod->mkobj.kobj(a01600d0 below) is freed together with mod
> > > > itself in free_module(). However, its children still hold references to
> > > > it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
> > > > child(holders below) tries to decrease the reference count to its parent
> > > > in kobject_del(), BUG happens as it tries to access already freed 
> > > > memory.
> > > 
> > > Ah, thanks for tracking this down.  I had seen this in my local testing,
> > > but wasn't able to figure out the offending code.
> > > 
> > > > This patch tries to fix it by waiting for the mod->mkobj.kobj to be
> > > > really released in the module removing process (and some error code
> > > > paths).
> > > 
> > > Nasty, we should just be freeing the structure in the release function,
> > > why doesn't that work?
> > 
> > Hi Greg,
> > 
> > It seems I didn't describe it clearly in the previous mail. I'm trying
> > to do it better below: 
> > 
> > mod->mkobj.kobj is embedded in module_kobject(not a pointer):
> > struct module_kobject {
> > struct kobject kobj;
> > ...
> > 
> > and allocated with the module memory. So we could see the parent below
> > a01600d0 is between MODULES_VADDR (a000) and
> > MODULES_END(ff00). 
> > 
> > It seem to me that the mkobj.kobj is freed by module_free(mod,
> > mod->module_core). 
> 
> Ick, you are right.  If a kobject is being embedded in an object, it
> should control the lifespan of the object, not somewhere else like is
> happening here.
> 
> The best solution for this is to make the kobject a pointer, not
> embedded in the structure, that will fix this issue, right?

Yes, I think so. I'll try to write a fix using this way, thanks for your
suggestion.

Thanks, Zhong

> 
> thanks,
> 
> greg k-h
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH next]module: Fix mod->mkobj.kobj potentially freed too early

2013-08-21 Thread Li Zhong
On Wed, 2013-08-21 at 09:18 -0700, Greg KH wrote:
> On Wed, Aug 21, 2013 at 05:49:58PM +0800, Li Zhong wrote:
> > DEBUG_KOBJECT_RELEASE helps to find the issue attached below.
> > 
> > After some investigation, it seems the reason is:
> > The mod->mkobj.kobj(a01600d0 below) is freed together with mod
> > itself in free_module(). However, its children still hold references to
> > it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
> > child(holders below) tries to decrease the reference count to its parent
> > in kobject_del(), BUG happens as it tries to access already freed memory.
> 
> Ah, thanks for tracking this down.  I had seen this in my local testing,
> but wasn't able to figure out the offending code.
> 
> > This patch tries to fix it by waiting for the mod->mkobj.kobj to be
> > really released in the module removing process (and some error code
> > paths).
> 
> Nasty, we should just be freeing the structure in the release function,
> why doesn't that work?

Hi Greg,

It seems I didn't describe it clearly in the previous mail. I'm trying
to do it better below: 

mod->mkobj.kobj is embedded in module_kobject(not a pointer):
struct module_kobject {
struct kobject kobj;
...

and allocated with the module memory. So we could see the parent below
a01600d0 is between MODULES_VADDR (a000) and
MODULES_END(ff00). 

It seem to me that the mkobj.kobj is freed by module_free(mod,
mod->module_core). 

So in free_module(), we need delay the call of module_free(), until
mkobj.kobj finishes its cleanup. 

> > [ 1844.175287] kobject: 'holders' (88007c1f1600): kobject_release, 
> > parent a01600d0 (delayed)
> > [ 1844.178991] kobject: 'notes' (8800370b2a00): kobject_release, parent 
> > a01600d0 (delayed)
> > [ 1845.180118] kobject: 'holders' (88007c1f1600): kobject_cleanup, 
> > parent a01600d0
> > [ 1845.182130] kobject: 'holders' (88007c1f1600): auto cleanup 
> > kobject_del
> > [ 1845.184120] BUG: unable to handle kernel paging request at 
> > a01601d0
> > [ 1845.185026] IP: [] kobject_put+0x11/0x60
> > [ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
> > [ 1845.185026] Oops:  [#1] PREEMPT 
> > [ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: 
> > kprobe_example]
> > [ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G   O 
> > 3.11.0-rc6-next-20130819+ #1
> > [ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > [ 1845.185026] Workqueue: events kobject_delayed_cleanup
> > [ 1845.185026] task: 88007ca51f00 ti: 88007ca5c000 task.ti: 
> > 88007ca5c000
> > [ 1845.185026] RIP: 0010:[]  [] 
> > kobject_put+0x11/0x60
> > [ 1845.185026] RSP: 0018:88007ca5dd08  EFLAGS: 00010282
> > [ 1845.185026] RAX: 2000 RBX: a01600d0 RCX: 
> > 8177d638
> > [ 1845.185026] RDX: 88007ca5dc18 RSI:  RDI: 
> > a01600d0
> > [ 1845.185026] RBP: 88007ca5dd18 R08: 824e9810 R09: 
> > 
> > [ 1845.185026] R10: 8800 R11: dead4ead0001 R12: 
> > 81a95040
> > [ 1845.185026] R13: 88007b27a960 R14: 88007c1f1600 R15: 
> > 
> > [ 1845.185026] FS:  () GS:81a23000() 
> > knlGS:
> > [ 1845.185026] CS:  0010 DS:  ES:  CR0: 8005003b
> > [ 1845.185026] CR2: a01601d0 CR3: 37207000 CR4: 
> > 06b0
> > [ 1845.185026] Stack:
> > [ 1845.185026]  88007c1f1600 88007c1f1600 88007ca5dd38 
> > 812cdb7e
> > [ 1845.185026]   88007c1f1640 88007ca5dd68 
> > 812cdbfe
> > [ 1845.185026]  88007c974800 88007c1f1640 88007ff61a00 
> > 
> > [ 1845.185026] Call Trace:
> > [ 1845.185026]  [] kobject_del+0x2e/0x40
> > [ 1845.185026]  [] kobject_delayed_cleanup+0x6e/0x1d0
> > [ 1845.185026]  [] process_one_work+0x1e5/0x670
> > [ 1845.185026]  [] ? process_one_work+0x183/0x670
> > [ 1845.185026]  [] worker_thread+0x113/0x370
> > [ 1845.185026]  [] ? rescuer_thread+0x290/0x290
> > [ 1845.185026]  [] kthread+0xda/0xe0
> > [ 1845.185026]  [] ? _raw_spin_unlock_irq+0x30/0x60
> > [ 1845.185026]  [] ? kthread_create_on_node+0x130/0x130
> > [ 1845.185026]  [] ret_from_fork+0x7a/0xb0
> > [ 1845.185026]  [] ? kthread_create_on_node+0x130/0x130
> > [ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff 
> > ff ff 66 0f 1f 44 00 00 55 48

[RFC PATCH next]module: Fix mod->mkobj.kobj potentially freed too early

2013-08-21 Thread Li Zhong
DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

After some investigation, it seems the reason is:
The mod->mkobj.kobj(a01600d0 below) is freed together with mod
itself in free_module(). However, its children still hold references to
it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
child(holders below) tries to decrease the reference count to its parent
in kobject_del(), BUG happens as it tries to access already freed memory.

This patch tries to fix it by waiting for the mod->mkobj.kobj to be
really released in the module removing process (and some error code
paths).

[ 1844.175287] kobject: 'holders' (88007c1f1600): kobject_release, parent 
a01600d0 (delayed)
[ 1844.178991] kobject: 'notes' (8800370b2a00): kobject_release, parent 
a01600d0 (delayed)
[ 1845.180118] kobject: 'holders' (88007c1f1600): kobject_cleanup, parent 
a01600d0
[ 1845.182130] kobject: 'holders' (88007c1f1600): auto cleanup kobject_del
[ 1845.184120] BUG: unable to handle kernel paging request at a01601d0
[ 1845.185026] IP: [] kobject_put+0x11/0x60
[ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
[ 1845.185026] Oops:  [#1] PREEMPT 
[ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
[ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G   O 
3.11.0-rc6-next-20130819+ #1
[ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1845.185026] Workqueue: events kobject_delayed_cleanup
[ 1845.185026] task: 88007ca51f00 ti: 88007ca5c000 task.ti: 
88007ca5c000
[ 1845.185026] RIP: 0010:[]  [] 
kobject_put+0x11/0x60
[ 1845.185026] RSP: 0018:88007ca5dd08  EFLAGS: 00010282
[ 1845.185026] RAX: 2000 RBX: a01600d0 RCX: 8177d638
[ 1845.185026] RDX: 88007ca5dc18 RSI:  RDI: a01600d0
[ 1845.185026] RBP: 88007ca5dd18 R08: 824e9810 R09: 
[ 1845.185026] R10: 8800 R11: dead4ead0001 R12: 81a95040
[ 1845.185026] R13: 88007b27a960 R14: 88007c1f1600 R15: 
[ 1845.185026] FS:  () GS:81a23000() 
knlGS:
[ 1845.185026] CS:  0010 DS:  ES:  CR0: 8005003b
[ 1845.185026] CR2: a01601d0 CR3: 37207000 CR4: 06b0
[ 1845.185026] Stack:
[ 1845.185026]  88007c1f1600 88007c1f1600 88007ca5dd38 
812cdb7e
[ 1845.185026]   88007c1f1640 88007ca5dd68 
812cdbfe
[ 1845.185026]  88007c974800 88007c1f1640 88007ff61a00 

[ 1845.185026] Call Trace:
[ 1845.185026]  [] kobject_del+0x2e/0x40
[ 1845.185026]  [] kobject_delayed_cleanup+0x6e/0x1d0
[ 1845.185026]  [] process_one_work+0x1e5/0x670
[ 1845.185026]  [] ? process_one_work+0x183/0x670
[ 1845.185026]  [] worker_thread+0x113/0x370
[ 1845.185026]  [] ? rescuer_thread+0x290/0x290
[ 1845.185026]  [] kthread+0xda/0xe0
[ 1845.185026]  [] ? _raw_spin_unlock_irq+0x30/0x60
[ 1845.185026]  [] ? kthread_create_on_node+0x130/0x130
[ 1845.185026]  [] ret_from_fork+0x7a/0xb0
[ 1845.185026]  [] ? kthread_create_on_node+0x130/0x130
[ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff 
ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d  87 
00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84 
[ 1845.185026] RIP  [] kobject_put+0x11/0x60
[ 1845.185026]  RSP 
[ 1845.185026] CR2: a01601d0
[ 1845.185026] ---[ end trace 49a70afd109f5653 ]---

Signed-off-by: Li Zhong 
---
 include/linux/module.h |  1 +
 kernel/module.c| 14 +++---
 kernel/params.c|  7 +++
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 504035f..05f2447 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -42,6 +42,7 @@ struct module_kobject {
struct module *mod;
struct kobject *drivers_dir;
struct module_param_attrs *mp;
+   struct completion *kobj_completion;
 };
 
 struct module_attribute {
diff --git a/kernel/module.c b/kernel/module.c
index 2069158..b5e2baa 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1611,6 +1611,14 @@ static void module_remove_modinfo_attrs(struct module 
*mod)
kfree(mod->modinfo_attrs);
 }
 
+static void mod_kobject_put(struct module *mod)
+{
+   DECLARE_COMPLETION_ONSTACK(c);
+   mod->mkobj.kobj_completion = 
+   kobject_put(>mkobj.kobj);
+   wait_for_completion();
+}
+
 static int mod_sysfs_init(struct module *mod)
 {
int err;
@@ -1638,7 +1646,7 @@ static int mod_sysfs_init(struct module *mod)
err = kobject_init_and_add(>mkobj.kobj, _ktype, NULL,
   "%s", mod->name);
if (err)
-   kobject_put(>mkobj.kobj);
+   mod_kobject_put(mod);
 
/* delay uevent until full sy

[RFC PATCH next]module: Fix mod-mkobj.kobj potentially freed too early

2013-08-21 Thread Li Zhong
DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

After some investigation, it seems the reason is:
The mod-mkobj.kobj(a01600d0 below) is freed together with mod
itself in free_module(). However, its children still hold references to
it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
child(holders below) tries to decrease the reference count to its parent
in kobject_del(), BUG happens as it tries to access already freed memory.

This patch tries to fix it by waiting for the mod-mkobj.kobj to be
really released in the module removing process (and some error code
paths).

[ 1844.175287] kobject: 'holders' (88007c1f1600): kobject_release, parent 
a01600d0 (delayed)
[ 1844.178991] kobject: 'notes' (8800370b2a00): kobject_release, parent 
a01600d0 (delayed)
[ 1845.180118] kobject: 'holders' (88007c1f1600): kobject_cleanup, parent 
a01600d0
[ 1845.182130] kobject: 'holders' (88007c1f1600): auto cleanup kobject_del
[ 1845.184120] BUG: unable to handle kernel paging request at a01601d0
[ 1845.185026] IP: [812cda81] kobject_put+0x11/0x60
[ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
[ 1845.185026] Oops:  [#1] PREEMPT 
[ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
[ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G   O 
3.11.0-rc6-next-20130819+ #1
[ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1845.185026] Workqueue: events kobject_delayed_cleanup
[ 1845.185026] task: 88007ca51f00 ti: 88007ca5c000 task.ti: 
88007ca5c000
[ 1845.185026] RIP: 0010:[812cda81]  [812cda81] 
kobject_put+0x11/0x60
[ 1845.185026] RSP: 0018:88007ca5dd08  EFLAGS: 00010282
[ 1845.185026] RAX: 2000 RBX: a01600d0 RCX: 8177d638
[ 1845.185026] RDX: 88007ca5dc18 RSI:  RDI: a01600d0
[ 1845.185026] RBP: 88007ca5dd18 R08: 824e9810 R09: 
[ 1845.185026] R10: 8800 R11: dead4ead0001 R12: 81a95040
[ 1845.185026] R13: 88007b27a960 R14: 88007c1f1600 R15: 
[ 1845.185026] FS:  () GS:81a23000() 
knlGS:
[ 1845.185026] CS:  0010 DS:  ES:  CR0: 8005003b
[ 1845.185026] CR2: a01601d0 CR3: 37207000 CR4: 06b0
[ 1845.185026] Stack:
[ 1845.185026]  88007c1f1600 88007c1f1600 88007ca5dd38 
812cdb7e
[ 1845.185026]   88007c1f1640 88007ca5dd68 
812cdbfe
[ 1845.185026]  88007c974800 88007c1f1640 88007ff61a00 

[ 1845.185026] Call Trace:
[ 1845.185026]  [812cdb7e] kobject_del+0x2e/0x40
[ 1845.185026]  [812cdbfe] kobject_delayed_cleanup+0x6e/0x1d0
[ 1845.185026]  [81063a45] process_one_work+0x1e5/0x670
[ 1845.185026]  [810639e3] ? process_one_work+0x183/0x670
[ 1845.185026]  [810642b3] worker_thread+0x113/0x370
[ 1845.185026]  [810641a0] ? rescuer_thread+0x290/0x290
[ 1845.185026]  [8106bfba] kthread+0xda/0xe0
[ 1845.185026]  [814ff0f0] ? _raw_spin_unlock_irq+0x30/0x60
[ 1845.185026]  [8106bee0] ? kthread_create_on_node+0x130/0x130
[ 1845.185026]  [8150751a] ret_from_fork+0x7a/0xb0
[ 1845.185026]  [8106bee0] ? kthread_create_on_node+0x130/0x130
[ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff 
ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d f6 87 
00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84 
[ 1845.185026] RIP  [812cda81] kobject_put+0x11/0x60
[ 1845.185026]  RSP 88007ca5dd08
[ 1845.185026] CR2: a01601d0
[ 1845.185026] ---[ end trace 49a70afd109f5653 ]---

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 include/linux/module.h |  1 +
 kernel/module.c| 14 +++---
 kernel/params.c|  7 +++
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 504035f..05f2447 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -42,6 +42,7 @@ struct module_kobject {
struct module *mod;
struct kobject *drivers_dir;
struct module_param_attrs *mp;
+   struct completion *kobj_completion;
 };
 
 struct module_attribute {
diff --git a/kernel/module.c b/kernel/module.c
index 2069158..b5e2baa 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1611,6 +1611,14 @@ static void module_remove_modinfo_attrs(struct module 
*mod)
kfree(mod-modinfo_attrs);
 }
 
+static void mod_kobject_put(struct module *mod)
+{
+   DECLARE_COMPLETION_ONSTACK(c);
+   mod-mkobj.kobj_completion = c;
+   kobject_put(mod-mkobj.kobj);
+   wait_for_completion(c);
+}
+
 static int mod_sysfs_init(struct module *mod)
 {
int err;
@@ -1638,7 +1646,7 @@ static int mod_sysfs_init(struct module *mod

Re: [RFC PATCH next]module: Fix mod-mkobj.kobj potentially freed too early

2013-08-21 Thread Li Zhong
On Wed, 2013-08-21 at 09:18 -0700, Greg KH wrote:
 On Wed, Aug 21, 2013 at 05:49:58PM +0800, Li Zhong wrote:
  DEBUG_KOBJECT_RELEASE helps to find the issue attached below.
  
  After some investigation, it seems the reason is:
  The mod-mkobj.kobj(a01600d0 below) is freed together with mod
  itself in free_module(). However, its children still hold references to
  it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
  child(holders below) tries to decrease the reference count to its parent
  in kobject_del(), BUG happens as it tries to access already freed memory.
 
 Ah, thanks for tracking this down.  I had seen this in my local testing,
 but wasn't able to figure out the offending code.
 
  This patch tries to fix it by waiting for the mod-mkobj.kobj to be
  really released in the module removing process (and some error code
  paths).
 
 Nasty, we should just be freeing the structure in the release function,
 why doesn't that work?

Hi Greg,

It seems I didn't describe it clearly in the previous mail. I'm trying
to do it better below: 

mod-mkobj.kobj is embedded in module_kobject(not a pointer):
struct module_kobject {
struct kobject kobj;
...

and allocated with the module memory. So we could see the parent below
a01600d0 is between MODULES_VADDR (a000) and
MODULES_END(ff00). 

It seem to me that the mkobj.kobj is freed by module_free(mod,
mod-module_core). 

So in free_module(), we need delay the call of module_free(), until
mkobj.kobj finishes its cleanup. 

  [ 1844.175287] kobject: 'holders' (88007c1f1600): kobject_release, 
  parent a01600d0 (delayed)
  [ 1844.178991] kobject: 'notes' (8800370b2a00): kobject_release, parent 
  a01600d0 (delayed)
  [ 1845.180118] kobject: 'holders' (88007c1f1600): kobject_cleanup, 
  parent a01600d0
  [ 1845.182130] kobject: 'holders' (88007c1f1600): auto cleanup 
  kobject_del
  [ 1845.184120] BUG: unable to handle kernel paging request at 
  a01601d0
  [ 1845.185026] IP: [812cda81] kobject_put+0x11/0x60
  [ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
  [ 1845.185026] Oops:  [#1] PREEMPT 
  [ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: 
  kprobe_example]
  [ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G   O 
  3.11.0-rc6-next-20130819+ #1
  [ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
  [ 1845.185026] Workqueue: events kobject_delayed_cleanup
  [ 1845.185026] task: 88007ca51f00 ti: 88007ca5c000 task.ti: 
  88007ca5c000
  [ 1845.185026] RIP: 0010:[812cda81]  [812cda81] 
  kobject_put+0x11/0x60
  [ 1845.185026] RSP: 0018:88007ca5dd08  EFLAGS: 00010282
  [ 1845.185026] RAX: 2000 RBX: a01600d0 RCX: 
  8177d638
  [ 1845.185026] RDX: 88007ca5dc18 RSI:  RDI: 
  a01600d0
  [ 1845.185026] RBP: 88007ca5dd18 R08: 824e9810 R09: 
  
  [ 1845.185026] R10: 8800 R11: dead4ead0001 R12: 
  81a95040
  [ 1845.185026] R13: 88007b27a960 R14: 88007c1f1600 R15: 
  
  [ 1845.185026] FS:  () GS:81a23000() 
  knlGS:
  [ 1845.185026] CS:  0010 DS:  ES:  CR0: 8005003b
  [ 1845.185026] CR2: a01601d0 CR3: 37207000 CR4: 
  06b0
  [ 1845.185026] Stack:
  [ 1845.185026]  88007c1f1600 88007c1f1600 88007ca5dd38 
  812cdb7e
  [ 1845.185026]   88007c1f1640 88007ca5dd68 
  812cdbfe
  [ 1845.185026]  88007c974800 88007c1f1640 88007ff61a00 
  
  [ 1845.185026] Call Trace:
  [ 1845.185026]  [812cdb7e] kobject_del+0x2e/0x40
  [ 1845.185026]  [812cdbfe] kobject_delayed_cleanup+0x6e/0x1d0
  [ 1845.185026]  [81063a45] process_one_work+0x1e5/0x670
  [ 1845.185026]  [810639e3] ? process_one_work+0x183/0x670
  [ 1845.185026]  [810642b3] worker_thread+0x113/0x370
  [ 1845.185026]  [810641a0] ? rescuer_thread+0x290/0x290
  [ 1845.185026]  [8106bfba] kthread+0xda/0xe0
  [ 1845.185026]  [814ff0f0] ? _raw_spin_unlock_irq+0x30/0x60
  [ 1845.185026]  [8106bee0] ? kthread_create_on_node+0x130/0x130
  [ 1845.185026]  [8150751a] ret_from_fork+0x7a/0xb0
  [ 1845.185026]  [8106bee0] ? kthread_create_on_node+0x130/0x130
  [ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff 
  ff ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d 
  f6 87 00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84 
  [ 1845.185026] RIP  [812cda81] kobject_put+0x11/0x60
  [ 1845.185026]  RSP 88007ca5dd08
  [ 1845.185026] CR2: a01601d0
  [ 1845.185026] ---[ end trace 49a70afd109f5653 ]---
  
  Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
  ---
   include/linux/module.h

Re: [RFC PATCH next]module: Fix mod-mkobj.kobj potentially freed too early

2013-08-21 Thread Li Zhong
On Wed, 2013-08-21 at 21:03 -0700, Greg KH wrote:
 On Thu, Aug 22, 2013 at 10:34:06AM +0800, Li Zhong wrote:
  On Wed, 2013-08-21 at 09:18 -0700, Greg KH wrote:
   On Wed, Aug 21, 2013 at 05:49:58PM +0800, Li Zhong wrote:
DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

After some investigation, it seems the reason is:
The mod-mkobj.kobj(a01600d0 below) is freed together with mod
itself in free_module(). However, its children still hold references to
it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
child(holders below) tries to decrease the reference count to its parent
in kobject_del(), BUG happens as it tries to access already freed 
memory.
   
   Ah, thanks for tracking this down.  I had seen this in my local testing,
   but wasn't able to figure out the offending code.
   
This patch tries to fix it by waiting for the mod-mkobj.kobj to be
really released in the module removing process (and some error code
paths).
   
   Nasty, we should just be freeing the structure in the release function,
   why doesn't that work?
  
  Hi Greg,
  
  It seems I didn't describe it clearly in the previous mail. I'm trying
  to do it better below: 
  
  mod-mkobj.kobj is embedded in module_kobject(not a pointer):
  struct module_kobject {
  struct kobject kobj;
  ...
  
  and allocated with the module memory. So we could see the parent below
  a01600d0 is between MODULES_VADDR (a000) and
  MODULES_END(ff00). 
  
  It seem to me that the mkobj.kobj is freed by module_free(mod,
  mod-module_core). 
 
 Ick, you are right.  If a kobject is being embedded in an object, it
 should control the lifespan of the object, not somewhere else like is
 happening here.
 
 The best solution for this is to make the kobject a pointer, not
 embedded in the structure, that will fix this issue, right?

Yes, I think so. I'll try to write a fix using this way, thanks for your
suggestion.

Thanks, Zhong

 
 thanks,
 
 greg k-h
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >