Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Toshi Kani
On Wed, 2013-08-28 at 10:12 +0800, Gu Zheng wrote:
> On 08/28/2013 05:38 AM, Toshi Kani wrote:
 :
> >>
> >> What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like 
> >> the
> >> attached one(With a preliminary test, it also can make the splat go 
> >> away).:)
> > 
> > I am curious how msleep(10) & restart_syscall() work in the change
> > below.  Doesn't the msleep() make s_active held longer time, which can
> > lead the thread holding device_hotplug_lock to wait it for deletion?
> 
> Yes, but it can avoid busy waiting. 

I know, but it's kinda unfortunate to sleep with s_active held in this
situation.  But I am fine with the 5ms Rafael used in this latest
patchset since this is a rare case anyway.

> > Also, does restart_syscall() release s_active and reopen this file
> > again?
> 
> Sure, it just set a TIF_SIGPENDING flag and return an -ERESTARTNOINTR error, 
> s_active/file
> will be released/closed in the failed path. And when do_signal() catches the 
> -ERESTARTNOINTR,
> it will change the regs to restart the syscall.

I see.  This is a clever functionality. 

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Rafael J. Wysocki
On Wednesday, August 28, 2013 08:24:22 AM Tejun Heo wrote:
> Hello, Rafael.
> 
> On Tue, Aug 27, 2013 at 11:45:46PM +0200, Rafael J. Wysocki wrote:
> > I've thought about that a bit over the last several hours and I'm still
> > thinking that that patch is a bit overkill, because it will trigger the
> > restart_syscall() for all cases when device_hotplug_lock is locked, even
> > if they can't lead to any deadlocks.  The only deadlockish situation is
> > when device *removal* is in progress when store_online(), for example,
> > is called.
> > 
> > So to address that particular situation without adding too much overhead for
> > other cases, I've come up with the appended patch (untested for now).
> > 
> > This is how it is supposed to work.
> > 
> > There are three "lock levels" for device hotplug, "normal", "remove"
> > and "weak".  The difference is related to how __lock_device_hotplug()
> > works.  Namely, if device hotplug is currently locked, that function
> > will either block or return false, depending on the "current lock
> > level" and its argument (the "new lock level").  The rules here are
> > that false is returned immediately if the "current lock level" is
> > "remove" and the "new lock level" is "weak".  The function blocks
> > for all other combinations of the two.
> 
> I think this is going way too far and a massive overkill in terms of
> complexity, which is way more important than for doing some
> restart_syscall loops during some rare sysfs file access, which isn't
> gonna be that common anyway.  Fancy locking always sucks.  The root
> cause of the problem is file removal ref draining from sysfs side and
> a proper solution should be implemented there.  Adding all this
> complexity to device hotplug lock won't solve problems involving other
> locks while obfuscating what's going on through all the complexity.
> Also, when sysfs is updated with a proper solution, a simpler
> workaround from device hotplug side would be far easier to convert to
> the new solution than this, which over time is likely to grow
> interesting uses.
> 
> Let's *please* stick to something simple.

OK, I'll use restart_syscall() approach in the simplest form, then.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Tejun Heo
Hello, Rafael.

On Tue, Aug 27, 2013 at 11:45:46PM +0200, Rafael J. Wysocki wrote:
> I've thought about that a bit over the last several hours and I'm still
> thinking that that patch is a bit overkill, because it will trigger the
> restart_syscall() for all cases when device_hotplug_lock is locked, even
> if they can't lead to any deadlocks.  The only deadlockish situation is
> when device *removal* is in progress when store_online(), for example,
> is called.
> 
> So to address that particular situation without adding too much overhead for
> other cases, I've come up with the appended patch (untested for now).
> 
> This is how it is supposed to work.
> 
> There are three "lock levels" for device hotplug, "normal", "remove"
> and "weak".  The difference is related to how __lock_device_hotplug()
> works.  Namely, if device hotplug is currently locked, that function
> will either block or return false, depending on the "current lock
> level" and its argument (the "new lock level").  The rules here are
> that false is returned immediately if the "current lock level" is
> "remove" and the "new lock level" is "weak".  The function blocks
> for all other combinations of the two.

I think this is going way too far and a massive overkill in terms of
complexity, which is way more important than for doing some
restart_syscall loops during some rare sysfs file access, which isn't
gonna be that common anyway.  Fancy locking always sucks.  The root
cause of the problem is file removal ref draining from sysfs side and
a proper solution should be implemented there.  Adding all this
complexity to device hotplug lock won't solve problems involving other
locks while obfuscating what's going on through all the complexity.
Also, when sysfs is updated with a proper solution, a simpler
workaround from device hotplug side would be far easier to convert to
the new solution than this, which over time is likely to grow
interesting uses.

Let's *please* stick to something simple.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Gu Zheng
Hi Rafael,

On 08/28/2013 05:45 AM, Rafael J. Wysocki wrote:

> On Tuesday, August 27, 2013 02:36:44 PM Tejun Heo wrote:
>> Hello,
>>
[...]
> 
> I've thought about that a bit over the last several hours and I'm still
> thinking that that patch is a bit overkill, because it will trigger the
> restart_syscall() for all cases when device_hotplug_lock is locked, even
> if they can't lead to any deadlocks.  The only deadlockish situation is
> when device *removal* is in progress when store_online(), for example,
> is called.
> 
> So to address that particular situation without adding too much overhead for
> other cases, I've come up with the appended patch (untested for now).
> 
> This is how it is supposed to work.
> 
> There are three "lock levels" for device hotplug, "normal", "remove"
> and "weak".  The difference is related to how __lock_device_hotplug()
> works.  Namely, if device hotplug is currently locked, that function
> will either block or return false, depending on the "current lock
> level" and its argument (the "new lock level").  The rules here are
> that false is returned immediately if the "current lock level" is
> "remove" and the "new lock level" is "weak".  The function blocks
> for all other combinations of the two.
> 
> There are two functions supposed to use device hotplug "lock levels"
> other than "normal": store_online() and acpi_scan_hot_remove().
> Everybody else is supposed to use "normal" (well, there are more
> potential users of "weak" in drivers/base/memory.c).
> 
> acpi_scan_hot_remove() uses the "remove" lock level to indicate
> that it is going to remove devices while holding device hotplug
> locked.  In turn, store_online() uses the "weak" lock level so
> that it doesn't block when devices are being removed with device
> hotplug locked, because that may lead to a deadlock.
> 
> show_online() actually doesn't need to lock device hotplug, but
> it is useful to serialize it with respect to device_offline()
> and device_online() (in case user space attempts to run them
> concurrently).

Yeah. I tested this one on latest kernel tree, it does make the splat go away.
Looking forward to the ACPI part one.:)

Regards,
Gu

> 
> ---
>  drivers/acpi/scan.c|4 +-
>  drivers/base/core.c|   72 
> ++---
>  include/linux/device.h |   25 -
>  3 files changed, 83 insertions(+), 18 deletions(-)
> 
> Index: linux-pm/drivers/base/core.c
> ===
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -49,6 +49,55 @@ static struct kobject *dev_kobj;
>  struct kobject *sysfs_dev_char_kobj;
>  struct kobject *sysfs_dev_block_kobj;
>  
> +static struct {
> + struct task_struct *holder;
> + enum dev_hotplug_lock_type type;
> + struct mutex lock; /* Synchronizes accesses to holder and type */
> + wait_queue_head_t wait_queue;
> +} device_hotplug = {
> + .holder = NULL,
> + .type = DEV_HOTPLUG_LOCK_NONE,
> + .lock = __MUTEX_INITIALIZER(device_hotplug.lock),
> + .wait_queue = __WAIT_QUEUE_HEAD_INITIALIZER(device_hotplug.wait_queue),
> +};
> +
> +bool __lock_device_hotplug(enum dev_hotplug_lock_type type)
> +{
> + DEFINE_WAIT(wait);
> + bool ret = true;
> +
> + mutex_lock(_hotplug.lock);
> + for (;;) {
> + prepare_to_wait(_hotplug.wait_queue, ,
> + TASK_UNINTERRUPTIBLE);
> + if (!device_hotplug.holder) {
> + device_hotplug.holder = current;
> + device_hotplug.type = type;
> + break;
> + } else if (type == DEV_HOTPLUG_LOCK_WEAK
> + && device_hotplug.type == DEV_HOTPLUG_LOCK_REMOVE) {
> + ret = false;
> + break;
> + }
> + mutex_unlock(_hotplug.lock);
> + schedule();
> + mutex_lock(_hotplug.lock);
> + }
> + finish_wait(_hotplug.wait_queue, );
> + mutex_unlock(_hotplug.lock);
> + return ret;
> +}
> +
> +void unlock_device_hotplug(void)
> +{
> + mutex_lock(_hotplug.lock);
> + BUG_ON(device_hotplug.holder != current);
> + device_hotplug.holder = NULL;
> + device_hotplug.type = DEV_HOTPLUG_LOCK_NONE;
> + wake_up(_hotplug.wait_queue);
> + mutex_unlock(_hotplug.lock);
> +}
> +
>  #ifdef CONFIG_BLOCK
>  static inline int device_is_not_partition(struct device *dev)
>  {
> @@ -408,9 +457,10 @@ static ssize_t show_online(struct device
>  {
>   bool val;
>  
> - lock_device_hotplug();
> + /* Serialize against device_online() and device_offline(). */
> + device_lock(dev);
>   val = !dev->offline;
> - unlock_device_hotplug();
> + device_unlock(dev);
>   return sprintf(buf, "%u\n", val);
>  }
>  
> @@ -424,7 +474,11 @@ static ssize_t store_online(struct devic
>   if (ret < 0)
>   return ret;
>  
> - 

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Gu Zheng
Hi Rafael,

On 08/28/2013 05:45 AM, Rafael J. Wysocki wrote:

 On Tuesday, August 27, 2013 02:36:44 PM Tejun Heo wrote:
 Hello,

[...]
 
 I've thought about that a bit over the last several hours and I'm still
 thinking that that patch is a bit overkill, because it will trigger the
 restart_syscall() for all cases when device_hotplug_lock is locked, even
 if they can't lead to any deadlocks.  The only deadlockish situation is
 when device *removal* is in progress when store_online(), for example,
 is called.
 
 So to address that particular situation without adding too much overhead for
 other cases, I've come up with the appended patch (untested for now).
 
 This is how it is supposed to work.
 
 There are three lock levels for device hotplug, normal, remove
 and weak.  The difference is related to how __lock_device_hotplug()
 works.  Namely, if device hotplug is currently locked, that function
 will either block or return false, depending on the current lock
 level and its argument (the new lock level).  The rules here are
 that false is returned immediately if the current lock level is
 remove and the new lock level is weak.  The function blocks
 for all other combinations of the two.
 
 There are two functions supposed to use device hotplug lock levels
 other than normal: store_online() and acpi_scan_hot_remove().
 Everybody else is supposed to use normal (well, there are more
 potential users of weak in drivers/base/memory.c).
 
 acpi_scan_hot_remove() uses the remove lock level to indicate
 that it is going to remove devices while holding device hotplug
 locked.  In turn, store_online() uses the weak lock level so
 that it doesn't block when devices are being removed with device
 hotplug locked, because that may lead to a deadlock.
 
 show_online() actually doesn't need to lock device hotplug, but
 it is useful to serialize it with respect to device_offline()
 and device_online() (in case user space attempts to run them
 concurrently).

Yeah. I tested this one on latest kernel tree, it does make the splat go away.
Looking forward to the ACPI part one.:)

Regards,
Gu

 
 ---
  drivers/acpi/scan.c|4 +-
  drivers/base/core.c|   72 
 ++---
  include/linux/device.h |   25 -
  3 files changed, 83 insertions(+), 18 deletions(-)
 
 Index: linux-pm/drivers/base/core.c
 ===
 --- linux-pm.orig/drivers/base/core.c
 +++ linux-pm/drivers/base/core.c
 @@ -49,6 +49,55 @@ static struct kobject *dev_kobj;
  struct kobject *sysfs_dev_char_kobj;
  struct kobject *sysfs_dev_block_kobj;
  
 +static struct {
 + struct task_struct *holder;
 + enum dev_hotplug_lock_type type;
 + struct mutex lock; /* Synchronizes accesses to holder and type */
 + wait_queue_head_t wait_queue;
 +} device_hotplug = {
 + .holder = NULL,
 + .type = DEV_HOTPLUG_LOCK_NONE,
 + .lock = __MUTEX_INITIALIZER(device_hotplug.lock),
 + .wait_queue = __WAIT_QUEUE_HEAD_INITIALIZER(device_hotplug.wait_queue),
 +};
 +
 +bool __lock_device_hotplug(enum dev_hotplug_lock_type type)
 +{
 + DEFINE_WAIT(wait);
 + bool ret = true;
 +
 + mutex_lock(device_hotplug.lock);
 + for (;;) {
 + prepare_to_wait(device_hotplug.wait_queue, wait,
 + TASK_UNINTERRUPTIBLE);
 + if (!device_hotplug.holder) {
 + device_hotplug.holder = current;
 + device_hotplug.type = type;
 + break;
 + } else if (type == DEV_HOTPLUG_LOCK_WEAK
 +  device_hotplug.type == DEV_HOTPLUG_LOCK_REMOVE) {
 + ret = false;
 + break;
 + }
 + mutex_unlock(device_hotplug.lock);
 + schedule();
 + mutex_lock(device_hotplug.lock);
 + }
 + finish_wait(device_hotplug.wait_queue, wait);
 + mutex_unlock(device_hotplug.lock);
 + return ret;
 +}
 +
 +void unlock_device_hotplug(void)
 +{
 + mutex_lock(device_hotplug.lock);
 + BUG_ON(device_hotplug.holder != current);
 + device_hotplug.holder = NULL;
 + device_hotplug.type = DEV_HOTPLUG_LOCK_NONE;
 + wake_up(device_hotplug.wait_queue);
 + mutex_unlock(device_hotplug.lock);
 +}
 +
  #ifdef CONFIG_BLOCK
  static inline int device_is_not_partition(struct device *dev)
  {
 @@ -408,9 +457,10 @@ static ssize_t show_online(struct device
  {
   bool val;
  
 - lock_device_hotplug();
 + /* Serialize against device_online() and device_offline(). */
 + device_lock(dev);
   val = !dev-offline;
 - unlock_device_hotplug();
 + device_unlock(dev);
   return sprintf(buf, %u\n, val);
  }
  
 @@ -424,7 +474,11 @@ static ssize_t store_online(struct devic
   if (ret  0)
   return ret;
  
 - lock_device_hotplug();
 + if (!__lock_device_hotplug(DEV_HOTPLUG_LOCK_WEAK)) {
 + /* Avoid 

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Tejun Heo
Hello, Rafael.

On Tue, Aug 27, 2013 at 11:45:46PM +0200, Rafael J. Wysocki wrote:
 I've thought about that a bit over the last several hours and I'm still
 thinking that that patch is a bit overkill, because it will trigger the
 restart_syscall() for all cases when device_hotplug_lock is locked, even
 if they can't lead to any deadlocks.  The only deadlockish situation is
 when device *removal* is in progress when store_online(), for example,
 is called.
 
 So to address that particular situation without adding too much overhead for
 other cases, I've come up with the appended patch (untested for now).
 
 This is how it is supposed to work.
 
 There are three lock levels for device hotplug, normal, remove
 and weak.  The difference is related to how __lock_device_hotplug()
 works.  Namely, if device hotplug is currently locked, that function
 will either block or return false, depending on the current lock
 level and its argument (the new lock level).  The rules here are
 that false is returned immediately if the current lock level is
 remove and the new lock level is weak.  The function blocks
 for all other combinations of the two.

I think this is going way too far and a massive overkill in terms of
complexity, which is way more important than for doing some
restart_syscall loops during some rare sysfs file access, which isn't
gonna be that common anyway.  Fancy locking always sucks.  The root
cause of the problem is file removal ref draining from sysfs side and
a proper solution should be implemented there.  Adding all this
complexity to device hotplug lock won't solve problems involving other
locks while obfuscating what's going on through all the complexity.
Also, when sysfs is updated with a proper solution, a simpler
workaround from device hotplug side would be far easier to convert to
the new solution than this, which over time is likely to grow
interesting uses.

Let's *please* stick to something simple.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Rafael J. Wysocki
On Wednesday, August 28, 2013 08:24:22 AM Tejun Heo wrote:
 Hello, Rafael.
 
 On Tue, Aug 27, 2013 at 11:45:46PM +0200, Rafael J. Wysocki wrote:
  I've thought about that a bit over the last several hours and I'm still
  thinking that that patch is a bit overkill, because it will trigger the
  restart_syscall() for all cases when device_hotplug_lock is locked, even
  if they can't lead to any deadlocks.  The only deadlockish situation is
  when device *removal* is in progress when store_online(), for example,
  is called.
  
  So to address that particular situation without adding too much overhead for
  other cases, I've come up with the appended patch (untested for now).
  
  This is how it is supposed to work.
  
  There are three lock levels for device hotplug, normal, remove
  and weak.  The difference is related to how __lock_device_hotplug()
  works.  Namely, if device hotplug is currently locked, that function
  will either block or return false, depending on the current lock
  level and its argument (the new lock level).  The rules here are
  that false is returned immediately if the current lock level is
  remove and the new lock level is weak.  The function blocks
  for all other combinations of the two.
 
 I think this is going way too far and a massive overkill in terms of
 complexity, which is way more important than for doing some
 restart_syscall loops during some rare sysfs file access, which isn't
 gonna be that common anyway.  Fancy locking always sucks.  The root
 cause of the problem is file removal ref draining from sysfs side and
 a proper solution should be implemented there.  Adding all this
 complexity to device hotplug lock won't solve problems involving other
 locks while obfuscating what's going on through all the complexity.
 Also, when sysfs is updated with a proper solution, a simpler
 workaround from device hotplug side would be far easier to convert to
 the new solution than this, which over time is likely to grow
 interesting uses.
 
 Let's *please* stick to something simple.

OK, I'll use restart_syscall() approach in the simplest form, then.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Toshi Kani
On Wed, 2013-08-28 at 10:12 +0800, Gu Zheng wrote:
 On 08/28/2013 05:38 AM, Toshi Kani wrote:
 :
 
  What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like 
  the
  attached one(With a preliminary test, it also can make the splat go 
  away).:)
  
  I am curious how msleep(10)  restart_syscall() work in the change
  below.  Doesn't the msleep() make s_active held longer time, which can
  lead the thread holding device_hotplug_lock to wait it for deletion?
 
 Yes, but it can avoid busy waiting. 

I know, but it's kinda unfortunate to sleep with s_active held in this
situation.  But I am fine with the 5ms Rafael used in this latest
patchset since this is a rare case anyway.

  Also, does restart_syscall() release s_active and reopen this file
  again?
 
 Sure, it just set a TIF_SIGPENDING flag and return an -ERESTARTNOINTR error, 
 s_active/file
 will be released/closed in the failed path. And when do_signal() catches the 
 -ERESTARTNOINTR,
 it will change the regs to restart the syscall.

I see.  This is a clever functionality. 

Thanks,
-Toshi


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Gu Zheng
Hi Toshi,

On 08/28/2013 05:38 AM, Toshi Kani wrote:

> On Tue, 2013-08-27 at 17:21 +0800, Gu Zheng wrote:
>> Hi Rafael,
>>
>> On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:
>>
>>> On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
>> Hi Rafael,
>>>
>>> [...]
>>>

 OK, so the patch below is quick and dirty and overkill, but it should make 
 the
 splat go away at least.
>>>
>>> And if this patch does make the splat go away for you, please also test the
>>> appended one (Tejun, thanks for the hint!).
>>>
>>> I'll address the ACPI part differently later.
>>
>> What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
>> attached one(With a preliminary test, it also can make the splat go away).:)
> 
> I am curious how msleep(10) & restart_syscall() work in the change
> below.  Doesn't the msleep() make s_active held longer time, which can
> lead the thread holding device_hotplug_lock to wait it for deletion?

Yes, but it can avoid busy waiting. 

> Also, does restart_syscall() release s_active and reopen this file
> again?

Sure, it just set a TIF_SIGPENDING flag and return an -ERESTARTNOINTR error, 
s_active/file
will be released/closed in the failed path. And when do_signal() catches the 
-ERESTARTNOINTR,
it will change the regs to restart the syscall.

Thanks,
Gu

> 
> @@ -408,9 +408,13 @@ static ssize_t show_online(struct device *dev,
> struct device_attribute *attr,
>  {
> bool val;
> 
> -   lock_device_hotplug();
> +   if (!read_lock_device_hotplug()) {
> +   msleep(10);
> +   return restart_syscall();
> +   }
> +
> 
> Thanks,
> -Toshi
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Toshi Kani
On Tue, 2013-08-27 at 17:21 +0800, Gu Zheng wrote:
> Hi Rafael,
> 
> On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:
> 
> > On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
> >> On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
> >>> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
>  Hi Rafael,
> > 
> > [...]
> > 
> >>
> >> OK, so the patch below is quick and dirty and overkill, but it should make 
> >> the
> >> splat go away at least.
> > 
> > And if this patch does make the splat go away for you, please also test the
> > appended one (Tejun, thanks for the hint!).
> > 
> > I'll address the ACPI part differently later.
> 
> What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
> attached one(With a preliminary test, it also can make the splat go away).:)

I am curious how msleep(10) & restart_syscall() work in the change
below.  Doesn't the msleep() make s_active held longer time, which can
lead the thread holding device_hotplug_lock to wait it for deletion?
Also, does restart_syscall() release s_active and reopen this file
again?

@@ -408,9 +408,13 @@ static ssize_t show_online(struct device *dev,
struct device_attribute *attr,
 {
bool val;

-   lock_device_hotplug();
+   if (!read_lock_device_hotplug()) {
+   msleep(10);
+   return restart_syscall();
+   }
+

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Rafael J. Wysocki
On Tuesday, August 27, 2013 02:36:44 PM Tejun Heo wrote:
> Hello,
> 
> On Tue, Aug 27, 2013 at 05:21:44PM +0800, Gu Zheng wrote:
> > >> OK, so the patch below is quick and dirty and overkill, but it should 
> > >> make the
> > >> splat go away at least.
> > > 
> > > And if this patch does make the splat go away for you, please also test 
> > > the
> > > appended one (Tejun, thanks for the hint!).
> > > 
> > > I'll address the ACPI part differently later.
> > 
> > What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like 
> > the
> > attached one(With a preliminary test, it also can make the splat go away).:)
> 
> Hmmm.. I don't get it.  How is introducing another rwlock whic may
> cause the operation, even reading the status, to fail randomly a
> better option?

That's more about replacing an existing mutex with an rwsem, but I don't think
it helps anyway.  And as I said before, the acpi_eject_store() case is entirely
separate in my opinion.

> It's harier and more brittle.  We probably want to
> implement better solution in sysfs for files which interact with
> device addition / removal paths but for now I think Rafael's patch is
> the right direction.

I've thought about that a bit over the last several hours and I'm still
thinking that that patch is a bit overkill, because it will trigger the
restart_syscall() for all cases when device_hotplug_lock is locked, even
if they can't lead to any deadlocks.  The only deadlockish situation is
when device *removal* is in progress when store_online(), for example,
is called.

So to address that particular situation without adding too much overhead for
other cases, I've come up with the appended patch (untested for now).

This is how it is supposed to work.

There are three "lock levels" for device hotplug, "normal", "remove"
and "weak".  The difference is related to how __lock_device_hotplug()
works.  Namely, if device hotplug is currently locked, that function
will either block or return false, depending on the "current lock
level" and its argument (the "new lock level").  The rules here are
that false is returned immediately if the "current lock level" is
"remove" and the "new lock level" is "weak".  The function blocks
for all other combinations of the two.

There are two functions supposed to use device hotplug "lock levels"
other than "normal": store_online() and acpi_scan_hot_remove().
Everybody else is supposed to use "normal" (well, there are more
potential users of "weak" in drivers/base/memory.c).

acpi_scan_hot_remove() uses the "remove" lock level to indicate
that it is going to remove devices while holding device hotplug
locked.  In turn, store_online() uses the "weak" lock level so
that it doesn't block when devices are being removed with device
hotplug locked, because that may lead to a deadlock.

show_online() actually doesn't need to lock device hotplug, but
it is useful to serialize it with respect to device_offline()
and device_online() (in case user space attempts to run them
concurrently).

---
 drivers/acpi/scan.c|4 +-
 drivers/base/core.c|   72 ++---
 include/linux/device.h |   25 -
 3 files changed, 83 insertions(+), 18 deletions(-)

Index: linux-pm/drivers/base/core.c
===
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -49,6 +49,55 @@ static struct kobject *dev_kobj;
 struct kobject *sysfs_dev_char_kobj;
 struct kobject *sysfs_dev_block_kobj;
 
+static struct {
+   struct task_struct *holder;
+   enum dev_hotplug_lock_type type;
+   struct mutex lock; /* Synchronizes accesses to holder and type */
+   wait_queue_head_t wait_queue;
+} device_hotplug = {
+   .holder = NULL,
+   .type = DEV_HOTPLUG_LOCK_NONE,
+   .lock = __MUTEX_INITIALIZER(device_hotplug.lock),
+   .wait_queue = __WAIT_QUEUE_HEAD_INITIALIZER(device_hotplug.wait_queue),
+};
+
+bool __lock_device_hotplug(enum dev_hotplug_lock_type type)
+{
+   DEFINE_WAIT(wait);
+   bool ret = true;
+
+   mutex_lock(_hotplug.lock);
+   for (;;) {
+   prepare_to_wait(_hotplug.wait_queue, ,
+   TASK_UNINTERRUPTIBLE);
+   if (!device_hotplug.holder) {
+   device_hotplug.holder = current;
+   device_hotplug.type = type;
+   break;
+   } else if (type == DEV_HOTPLUG_LOCK_WEAK
+   && device_hotplug.type == DEV_HOTPLUG_LOCK_REMOVE) {
+   ret = false;
+   break;
+   }
+   mutex_unlock(_hotplug.lock);
+   schedule();
+   mutex_lock(_hotplug.lock);
+   }
+   finish_wait(_hotplug.wait_queue, );
+   mutex_unlock(_hotplug.lock);
+   return ret;
+}
+
+void unlock_device_hotplug(void)
+{
+   mutex_lock(_hotplug.lock);
+   

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Tejun Heo
Hello,

On Tue, Aug 27, 2013 at 05:21:44PM +0800, Gu Zheng wrote:
> >> OK, so the patch below is quick and dirty and overkill, but it should make 
> >> the
> >> splat go away at least.
> > 
> > And if this patch does make the splat go away for you, please also test the
> > appended one (Tejun, thanks for the hint!).
> > 
> > I'll address the ACPI part differently later.
> 
> What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
> attached one(With a preliminary test, it also can make the splat go away).:)

Hmmm.. I don't get it.  How is introducing another rwlock whic may
cause the operation, even reading the status, to fail randomly a
better option?  It's harier and more brittle.  We probably want to
implement better solution in sysfs for files which interact with
device addition / removal paths but for now I think Rafael's patch is
the right direction.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

> On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
>> On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
>>> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,
> 
> [...]
> 
>>
>> OK, so the patch below is quick and dirty and overkill, but it should make 
>> the
>> splat go away at least.
> 
> And if this patch does make the splat go away for you, please also test the
> appended one (Tejun, thanks for the hint!).
> 
> I'll address the ACPI part differently later.

What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
attached one(With a preliminary test, it also can make the splat go away).:)

Regards,
Gu

> 
[...]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


>From f1682ceaef4105f75f4d6a0bb8e77c8a5dde365b Mon Sep 17 00:00:00 2001
From: Gu Zheng 
Date: Tue, 27 Aug 2013 17:59:55 +0900
Subject: [PATCH] acpi: fix removal lock dep


Signed-off-by: Gu Zheng 
---
 drivers/acpi/scan.c|   43 ++-
 drivers/acpi/sysfs.c   |7 +--
 drivers/base/core.c|   45 -
 drivers/base/memory.c  |5 +++--
 include/linux/device.h |8 ++--
 5 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 8a46c92..bb41760 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -36,7 +36,7 @@ bool acpi_force_hot_remove;
 static const char *dummy_hid = "device";
 
 static LIST_HEAD(acpi_bus_id_list);
-static DEFINE_MUTEX(acpi_scan_lock);
+static DECLARE_RWSEM(acpi_scan_rwsem);
 static LIST_HEAD(acpi_scan_handlers_list);
 DEFINE_MUTEX(acpi_device_lock);
 LIST_HEAD(acpi_wakeup_device_list);
@@ -49,13 +49,13 @@ struct acpi_device_bus_id{
 
 void acpi_scan_lock_acquire(void)
 {
-   mutex_lock(_scan_lock);
+   down_write(_scan_rwsem);
 }
 EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
 
 void acpi_scan_lock_release(void)
 {
-   mutex_unlock(_scan_lock);
+   up_write(_scan_rwsem);
 }
 EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
 
@@ -207,7 +207,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
return -EINVAL;
}
 
-   lock_device_hotplug();
+   device_hotplug_begin();
 
/*
 * Carry out two passes here and ignore errors in the first pass,
@@ -240,7 +240,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
acpi_bus_online_companions, NULL,
NULL, NULL);
 
-   unlock_device_hotplug();
+   device_hotplug_end();
 
put_device(>dev);
return -EBUSY;
@@ -252,7 +252,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
 
acpi_bus_trim(device);
 
-   unlock_device_hotplug();
+   device_hotplug_end();
 
/* Device node has been unregistered. */
put_device(>dev);
@@ -308,7 +308,7 @@ static void acpi_bus_device_eject(void *context)
struct acpi_scan_handler *handler;
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
 
-   mutex_lock(_scan_lock);
+   acpi_scan_lock_acquire();
 
acpi_bus_get_device(handle, );
if (!device)
@@ -334,7 +334,7 @@ static void acpi_bus_device_eject(void *context)
}
 
  out:
-   mutex_unlock(_scan_lock);
+   acpi_scan_lock_release();
return;
 
  err_out:
@@ -349,8 +349,8 @@ static void acpi_scan_bus_device_check(acpi_handle handle, 
u32 ost_source)
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
int error;
 
-   mutex_lock(_scan_lock);
-   lock_device_hotplug();
+   acpi_scan_lock_acquire();
+   device_hotplug_begin();
 
if (ost_source != ACPI_NOTIFY_BUS_CHECK) {
acpi_bus_get_device(handle, );
@@ -376,9 +376,9 @@ static void acpi_scan_bus_device_check(acpi_handle handle, 
u32 ost_source)
kobject_uevent(>dev.kobj, KOBJ_ONLINE);
 
  out:
-   unlock_device_hotplug();
+   device_hotplug_end();
acpi_evaluate_hotplug_ost(handle, ost_source, ost_code, NULL);
-   mutex_unlock(_scan_lock);
+   acpi_scan_lock_release();
 }
 
 static void acpi_scan_bus_check(void *context)
@@ -469,15 +469,14 @@ void acpi_bus_hot_remove_device(void *context)
acpi_handle handle = device->handle;
int error;
 
-   mutex_lock(_scan_lock);
+   acpi_scan_lock_acquire();
 
error = acpi_scan_hot_remove(device);
if (error && handle)
acpi_evaluate_hotplug_ost(handle, ej_event->event,
  ACPI_OST_SC_NON_SPECIFIC_FAILURE,
   

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,
 
 [...]
 

 OK, so the patch below is quick and dirty and overkill, but it should make 
 the
 splat go away at least.
 
 And if this patch does make the splat go away for you, please also test the
 appended one (Tejun, thanks for the hint!).
 
 I'll address the ACPI part differently later.

What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
attached one(With a preliminary test, it also can make the splat go away).:)

Regards,
Gu

 
[...]
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 


From f1682ceaef4105f75f4d6a0bb8e77c8a5dde365b Mon Sep 17 00:00:00 2001
From: Gu Zheng guz.f...@cn.fujitsu.com
Date: Tue, 27 Aug 2013 17:59:55 +0900
Subject: [PATCH] acpi: fix removal lock dep


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/acpi/scan.c|   43 ++-
 drivers/acpi/sysfs.c   |7 +--
 drivers/base/core.c|   45 -
 drivers/base/memory.c  |5 +++--
 include/linux/device.h |8 ++--
 5 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 8a46c92..bb41760 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -36,7 +36,7 @@ bool acpi_force_hot_remove;
 static const char *dummy_hid = device;
 
 static LIST_HEAD(acpi_bus_id_list);
-static DEFINE_MUTEX(acpi_scan_lock);
+static DECLARE_RWSEM(acpi_scan_rwsem);
 static LIST_HEAD(acpi_scan_handlers_list);
 DEFINE_MUTEX(acpi_device_lock);
 LIST_HEAD(acpi_wakeup_device_list);
@@ -49,13 +49,13 @@ struct acpi_device_bus_id{
 
 void acpi_scan_lock_acquire(void)
 {
-   mutex_lock(acpi_scan_lock);
+   down_write(acpi_scan_rwsem);
 }
 EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
 
 void acpi_scan_lock_release(void)
 {
-   mutex_unlock(acpi_scan_lock);
+   up_write(acpi_scan_rwsem);
 }
 EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
 
@@ -207,7 +207,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
return -EINVAL;
}
 
-   lock_device_hotplug();
+   device_hotplug_begin();
 
/*
 * Carry out two passes here and ignore errors in the first pass,
@@ -240,7 +240,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
acpi_bus_online_companions, NULL,
NULL, NULL);
 
-   unlock_device_hotplug();
+   device_hotplug_end();
 
put_device(device-dev);
return -EBUSY;
@@ -252,7 +252,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
 
acpi_bus_trim(device);
 
-   unlock_device_hotplug();
+   device_hotplug_end();
 
/* Device node has been unregistered. */
put_device(device-dev);
@@ -308,7 +308,7 @@ static void acpi_bus_device_eject(void *context)
struct acpi_scan_handler *handler;
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
 
-   mutex_lock(acpi_scan_lock);
+   acpi_scan_lock_acquire();
 
acpi_bus_get_device(handle, device);
if (!device)
@@ -334,7 +334,7 @@ static void acpi_bus_device_eject(void *context)
}
 
  out:
-   mutex_unlock(acpi_scan_lock);
+   acpi_scan_lock_release();
return;
 
  err_out:
@@ -349,8 +349,8 @@ static void acpi_scan_bus_device_check(acpi_handle handle, 
u32 ost_source)
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
int error;
 
-   mutex_lock(acpi_scan_lock);
-   lock_device_hotplug();
+   acpi_scan_lock_acquire();
+   device_hotplug_begin();
 
if (ost_source != ACPI_NOTIFY_BUS_CHECK) {
acpi_bus_get_device(handle, device);
@@ -376,9 +376,9 @@ static void acpi_scan_bus_device_check(acpi_handle handle, 
u32 ost_source)
kobject_uevent(device-dev.kobj, KOBJ_ONLINE);
 
  out:
-   unlock_device_hotplug();
+   device_hotplug_end();
acpi_evaluate_hotplug_ost(handle, ost_source, ost_code, NULL);
-   mutex_unlock(acpi_scan_lock);
+   acpi_scan_lock_release();
 }
 
 static void acpi_scan_bus_check(void *context)
@@ -469,15 +469,14 @@ void acpi_bus_hot_remove_device(void *context)
acpi_handle handle = device-handle;
int error;
 
-   mutex_lock(acpi_scan_lock);
+   acpi_scan_lock_acquire();
 
error = acpi_scan_hot_remove(device);
if (error  handle)
acpi_evaluate_hotplug_ost(handle, ej_event-event,
 

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Tejun Heo
Hello,

On Tue, Aug 27, 2013 at 05:21:44PM +0800, Gu Zheng wrote:
  OK, so the patch below is quick and dirty and overkill, but it should make 
  the
  splat go away at least.
  
  And if this patch does make the splat go away for you, please also test the
  appended one (Tejun, thanks for the hint!).
  
  I'll address the ACPI part differently later.
 
 What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
 attached one(With a preliminary test, it also can make the splat go away).:)

Hmmm.. I don't get it.  How is introducing another rwlock whic may
cause the operation, even reading the status, to fail randomly a
better option?  It's harier and more brittle.  We probably want to
implement better solution in sysfs for files which interact with
device addition / removal paths but for now I think Rafael's patch is
the right direction.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Rafael J. Wysocki
On Tuesday, August 27, 2013 02:36:44 PM Tejun Heo wrote:
 Hello,
 
 On Tue, Aug 27, 2013 at 05:21:44PM +0800, Gu Zheng wrote:
   OK, so the patch below is quick and dirty and overkill, but it should 
   make the
   splat go away at least.
   
   And if this patch does make the splat go away for you, please also test 
   the
   appended one (Tejun, thanks for the hint!).
   
   I'll address the ACPI part differently later.
  
  What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like 
  the
  attached one(With a preliminary test, it also can make the splat go away).:)
 
 Hmmm.. I don't get it.  How is introducing another rwlock whic may
 cause the operation, even reading the status, to fail randomly a
 better option?

That's more about replacing an existing mutex with an rwsem, but I don't think
it helps anyway.  And as I said before, the acpi_eject_store() case is entirely
separate in my opinion.

 It's harier and more brittle.  We probably want to
 implement better solution in sysfs for files which interact with
 device addition / removal paths but for now I think Rafael's patch is
 the right direction.

I've thought about that a bit over the last several hours and I'm still
thinking that that patch is a bit overkill, because it will trigger the
restart_syscall() for all cases when device_hotplug_lock is locked, even
if they can't lead to any deadlocks.  The only deadlockish situation is
when device *removal* is in progress when store_online(), for example,
is called.

So to address that particular situation without adding too much overhead for
other cases, I've come up with the appended patch (untested for now).

This is how it is supposed to work.

There are three lock levels for device hotplug, normal, remove
and weak.  The difference is related to how __lock_device_hotplug()
works.  Namely, if device hotplug is currently locked, that function
will either block or return false, depending on the current lock
level and its argument (the new lock level).  The rules here are
that false is returned immediately if the current lock level is
remove and the new lock level is weak.  The function blocks
for all other combinations of the two.

There are two functions supposed to use device hotplug lock levels
other than normal: store_online() and acpi_scan_hot_remove().
Everybody else is supposed to use normal (well, there are more
potential users of weak in drivers/base/memory.c).

acpi_scan_hot_remove() uses the remove lock level to indicate
that it is going to remove devices while holding device hotplug
locked.  In turn, store_online() uses the weak lock level so
that it doesn't block when devices are being removed with device
hotplug locked, because that may lead to a deadlock.

show_online() actually doesn't need to lock device hotplug, but
it is useful to serialize it with respect to device_offline()
and device_online() (in case user space attempts to run them
concurrently).

---
 drivers/acpi/scan.c|4 +-
 drivers/base/core.c|   72 ++---
 include/linux/device.h |   25 -
 3 files changed, 83 insertions(+), 18 deletions(-)

Index: linux-pm/drivers/base/core.c
===
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -49,6 +49,55 @@ static struct kobject *dev_kobj;
 struct kobject *sysfs_dev_char_kobj;
 struct kobject *sysfs_dev_block_kobj;
 
+static struct {
+   struct task_struct *holder;
+   enum dev_hotplug_lock_type type;
+   struct mutex lock; /* Synchronizes accesses to holder and type */
+   wait_queue_head_t wait_queue;
+} device_hotplug = {
+   .holder = NULL,
+   .type = DEV_HOTPLUG_LOCK_NONE,
+   .lock = __MUTEX_INITIALIZER(device_hotplug.lock),
+   .wait_queue = __WAIT_QUEUE_HEAD_INITIALIZER(device_hotplug.wait_queue),
+};
+
+bool __lock_device_hotplug(enum dev_hotplug_lock_type type)
+{
+   DEFINE_WAIT(wait);
+   bool ret = true;
+
+   mutex_lock(device_hotplug.lock);
+   for (;;) {
+   prepare_to_wait(device_hotplug.wait_queue, wait,
+   TASK_UNINTERRUPTIBLE);
+   if (!device_hotplug.holder) {
+   device_hotplug.holder = current;
+   device_hotplug.type = type;
+   break;
+   } else if (type == DEV_HOTPLUG_LOCK_WEAK
+device_hotplug.type == DEV_HOTPLUG_LOCK_REMOVE) {
+   ret = false;
+   break;
+   }
+   mutex_unlock(device_hotplug.lock);
+   schedule();
+   mutex_lock(device_hotplug.lock);
+   }
+   finish_wait(device_hotplug.wait_queue, wait);
+   mutex_unlock(device_hotplug.lock);
+   return ret;
+}
+
+void unlock_device_hotplug(void)
+{
+   mutex_lock(device_hotplug.lock);
+   BUG_ON(device_hotplug.holder != current);
+  

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Toshi Kani
On Tue, 2013-08-27 at 17:21 +0800, Gu Zheng wrote:
 Hi Rafael,
 
 On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:
 
  On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
  On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
  On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
  Hi Rafael,
  
  [...]
  
 
  OK, so the patch below is quick and dirty and overkill, but it should make 
  the
  splat go away at least.
  
  And if this patch does make the splat go away for you, please also test the
  appended one (Tejun, thanks for the hint!).
  
  I'll address the ACPI part differently later.
 
 What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
 attached one(With a preliminary test, it also can make the splat go away).:)

I am curious how msleep(10)  restart_syscall() work in the change
below.  Doesn't the msleep() make s_active held longer time, which can
lead the thread holding device_hotplug_lock to wait it for deletion?
Also, does restart_syscall() release s_active and reopen this file
again?

@@ -408,9 +408,13 @@ static ssize_t show_online(struct device *dev,
struct device_attribute *attr,
 {
bool val;

-   lock_device_hotplug();
+   if (!read_lock_device_hotplug()) {
+   msleep(10);
+   return restart_syscall();
+   }
+

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Gu Zheng
Hi Toshi,

On 08/28/2013 05:38 AM, Toshi Kani wrote:

 On Tue, 2013-08-27 at 17:21 +0800, Gu Zheng wrote:
 Hi Rafael,

 On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,

 [...]


 OK, so the patch below is quick and dirty and overkill, but it should make 
 the
 splat go away at least.

 And if this patch does make the splat go away for you, please also test the
 appended one (Tejun, thanks for the hint!).

 I'll address the ACPI part differently later.

 What about changing device_hotplug_lock and acpi_scan_lock to rwsem? like the
 attached one(With a preliminary test, it also can make the splat go away).:)
 
 I am curious how msleep(10)  restart_syscall() work in the change
 below.  Doesn't the msleep() make s_active held longer time, which can
 lead the thread holding device_hotplug_lock to wait it for deletion?

Yes, but it can avoid busy waiting. 

 Also, does restart_syscall() release s_active and reopen this file
 again?

Sure, it just set a TIF_SIGPENDING flag and return an -ERESTARTNOINTR error, 
s_active/file
will be released/closed in the failed path. And when do_signal() catches the 
-ERESTARTNOINTR,
it will change the regs to restart the syscall.

Thanks,
Gu

 
 @@ -408,9 +408,13 @@ static ssize_t show_online(struct device *dev,
 struct device_attribute *attr,
  {
 bool val;
 
 -   lock_device_hotplug();
 +   if (!read_lock_device_hotplug()) {
 +   msleep(10);
 +   return restart_syscall();
 +   }
 +
 
 Thanks,
 -Toshi
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

> On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
>> On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
>>> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,
> 
> [...]
> 
>>
>> OK, so the patch below is quick and dirty and overkill, but it should make 
>> the
>> splat go away at least.
> 
> And if this patch does make the splat go away for you, please also test the
> appended one (Tejun, thanks for the hint!).

Yes, this one works too, and as expected, the ACPI part is still there.

Thanks,
Gu

==  

[ INFO: possible circular locking dependency detected ] 

3.11.0-rc6-fix-refeal-fix-01+ #171 Tainted: GF  

--- 

kworker/0:1/96 is trying to acquire lock:   

 (s_active#245){.+}, at: [] sysfs_addrm_finish+0x3b/0x70  



but task is already holding lock:   

 (device_hotplug_lock){+.+.+.}, at: [] 
lock_device_hotplug+0x17/0x20  


which lock already depends on the new lock. 





the existing dependency chain (in reverse order) is:



-> #2 (device_hotplug_lock){+.+.+.}:

   [] validate_chain+0x70c/0x870  

   [] __lock_acquire+0x36f/0x5f0  

   [] lock_acquire+0xa0/0x130 

   [] mutex_lock_nested+0x7b/0x3b0

   [] lock_device_hotplug+0x17/0x20   

   [] acpi_scan_bus_device_check+0x33/0x10f   

   [] acpi_scan_device_check+0x13/0x15

   [] acpi_os_execute_deferred+0x27/0x34  

   [] process_one_work+0x1e8/0x560

   [] worker_thread+0x120/0x3a0   

   [] kthread+0xee/0x100  

   [] ret_from_fork+0x7c/0xb0 



-> #1 (acpi_scan_lock){+.+.+.}: 

   [] validate_chain+0x70c/0x870  

   [] __lock_acquire+0x36f/0x5f0  

   [] lock_acquire+0xa0/0x130 

   [] mutex_lock_nested+0x7b/0x3b0

   [] acpi_eject_store+0x88/0x170 

   [] dev_attr_store+0x20/0x30

   [] sysfs_write_file+0xe6/0x170 

   [] vfs_write+0xc8/0x170

   [] SyS_write+0x62/0xb0 

   [] system_call_fastpath+0x16/0x1b  


 

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng
>>>[] device_remove_file+0x17/0x20
>>> 
>>>          
>>>    [] device_remove_attrs+0x2a/0xe0   
>>> 
>>>  
>>>[] device_del+0x12b/0x1f0  
>>> 
>>>  
>>>[] device_unregister+0x22/0x60 
>>> 
>>>  
>>>[] remove_memory_block+0x6d/0xa0   
>>> 
>>>  
>>>[] unregister_memory_section+0x1f/0x30 
>>> 
>>>  
>>>[] __remove_pages+0xc8/0x150   
>>> 
>>>  
>>>[] arch_remove_memory+0x94/0xd0
>>> 
>>>  
>>>[] remove_memory+0x6f/0x90 
>>> 
>>>  
>>>[] acpi_memory_remove_memory+0x7e/0xa3 
>>> 
>>>  
>>>[] acpi_memory_device_remove+0x27/0x33 
>>> 
>>>  
>>>[] acpi_bus_device_detach+0x3d/0x5e
>>> 
>>>  
>>>[] acpi_ns_walk_namespace+0xd6/0x17d   
>>> 
>>>  
>>>[] acpi_walk_namespace+0x8a/0xc4   
>>> 
>>>  
>>>[] acpi_bus_trim+0x33/0x7a 
>>> 
>>>  
>>>[] acpi_scan_hot_remove+0x160/0x281
>>> 
>>>  
>>
>> --> device_hotplug_lock is already held by show_online() at this point (or if
>> it is not held, that function will return -EBUSY after attempting to grab
>> device_remove_rwsem for reading), so acpi_scan_hot_remove() will wait for it
>> to be released before calling acpi_bus_trim().
>>
>> So the situation in which one thread holds s_active for reading and blocks on
>> device_hotplug_lock while another thread is holding it over device removal is
>> clearly impossible to me.
>>
>> So I'm not sure how device_hotplug_lock is still involved?
> 
> Well, it is not involved, but lockdep doesn't see that anyway.  Bummer.

Yeah, I also guess this is a wrong detection, but I'm not familiar with this 
filed.
If it is really so, the lockdep guys maybe need to fix it, rather than altering 
an overkill
solution to accommodate it. What's your opinion?
In fact, I think "[PATCH] driver core / ACPI: Avoid device removal locking 
problems" is
a good solution.:) 

Regards,
Gu

> 
>>>[] acpi_bus_hot_remove_device+0x37/0x73
>>> 
>>>  
>>>[] acpi_os_execute_deferred+0x27/0x34  
>>> 
>>>  
>>>[] process_one_work+0x1e8/0x560
>>> 
>>>  
>>>[] worker_thread+0x120/0x3a0   
>>>

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 10:43 PM, Rafael J. Wysocki wrote:

> On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
>> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
>>> Hi Rafael,
>>
>> Hi,
>>
>>> On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:
>>>
 From: Rafael J. Wysocki 

 There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
 around the acpi_bus_trim() call in acpi_scan_hot_remove() which
 generally removes devices (it removes ACPI device objects at least,
 but it may also remove "physical" device objects through .detach()
 callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
 attributes are removed under these locks and to remove those
 attributes it is necessary to hold the s_active references of their
 directory entries for writing.

 On the other hand, the execution of a .show() or .store() callback
 from a sysfs attribute is carried out with that attribute's s_active
 reference held for reading.  Consequently, if any device sysfs
 attribute that may be removed from within acpi_scan_hot_remove()
 through acpi_bus_trim() has a .store() or .show() callback which
 acquires either acpi_scan_lock or device_hotplug_lock, the execution
 of that callback may deadlock with the removal of the attribute.
 [Unfortunately, the "online" device attribute of CPUs and memory
 blocks and the "eject" attribute of ACPI device objects are affected
 by this issue.]

 To avoid those deadlocks introduce a new protection mechanism that
 can be used by the device sysfs attributes in question.  Namely,
 if a device sysfs attribute's .store() or .show() callback routine
 is about to acquire device_hotplug_lock or acpi_scan_lock, it can
 first execute read_lock_device_remove() and return an error code if
 that function returns false.  If true is returned, the lock in
 question may be acquired and read_unlock_device_remove() must be
 called.  [This mechanism is implemented by means of an additional
 rwsem in drivers/base/core.c.]

 Make the affected sysfs attributes in the driver core and ACPI core
 use read_lock_device_remove() and read_unlock_device_remove() as
 described above.

 Signed-off-by: Rafael J. Wysocki 
 Reported-by: Gu Zheng 
>>>
>>> I'm sorry to forget to mention that the original reporter is
>>> Yasuaki Ishimatsu . I continued
>>> the investigation and found more issues.
>>>
>>> We tested this patch on kernel 3.11-rc6, but it seems that the
>>> issue is still there. Detail info as following.
>>
>> Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
>> because we'll need to take acpi_scan_lock during system suspend for PCI hot
>> remove to work and that's under pm_mutex.  So I wonder if we can simply
>> drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
>> a side note, because dropping it won't help here.
>>
>> Now ->
>>
>>> ==  
>>> 
>>>  
>>> [ INFO: possible circular locking dependency detected ] 
>>> 
>>>  
>>> 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF
>>> 
>>>  
>>> --- 
>>> 
>>>  
>>> kworker/0:2/754 is trying to acquire lock:  
>>> 
>>>  
>>>  (s_active#73){.+}, at: [] 
>>> sysfs_addrm_finish+0x3b/0x70
>>> 
>>> 
>>> 
>>>  
>>> but task is already holding lock:   
>>> 
>>>  
>>>  (mem_sysfs_mutex){+.+.+.}, at: [] 
>>> remove_memory_block+0x1d/0xa0   
>>> 
>>> 
>>> 
>>>  
>>> which lock already depends on the new lock. 
>>>   

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Rafael J. Wysocki
On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
> On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
> > On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
> > > Hi Rafael,

[...]

> 
> OK, so the patch below is quick and dirty and overkill, but it should make the
> splat go away at least.

And if this patch does make the splat go away for you, please also test the
appended one (Tejun, thanks for the hint!).

I'll address the ACPI part differently later.

> ---
>  drivers/acpi/scan.c |3 ++-
>  drivers/base/core.c |   36 
>  2 files changed, 22 insertions(+), 17 deletions(-)
> 
> Index: linux-pm/drivers/base/core.c
> ===
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -49,6 +49,18 @@ static struct kobject *dev_kobj;
>  struct kobject *sysfs_dev_char_kobj;
>  struct kobject *sysfs_dev_block_kobj;
>  
> +static DEFINE_MUTEX(device_hotplug_lock);
> +
> +void lock_device_hotplug(void)
> +{
> + mutex_lock(_hotplug_lock);
> +}
> +
> +void unlock_device_hotplug(void)
> +{
> + mutex_unlock(_hotplug_lock);
> +}
> +
>  #ifdef CONFIG_BLOCK
>  static inline int device_is_not_partition(struct device *dev)
>  {
> @@ -408,9 +420,11 @@ static ssize_t show_online(struct device
>  {
>   bool val;
>  
> - lock_device_hotplug();
> + if (!mutex_trylock(_hotplug_lock))
> + return -EAGAIN;
> +
>   val = !dev->offline;
> - unlock_device_hotplug();
> + mutex_unlock(_hotplug_lock);
>   return sprintf(buf, "%u\n", val);
>  }
>  
> @@ -424,9 +438,11 @@ static ssize_t store_online(struct devic
>   if (ret < 0)
>   return ret;
>  
> - lock_device_hotplug();
> + if (!mutex_trylock(_hotplug_lock))
> + return -EAGAIN;
> +
>   ret = val ? device_online(dev) : device_offline(dev);
> - unlock_device_hotplug();
> + mutex_unlock(_hotplug_lock);
>   return ret < 0 ? ret : count;
>  }
>  
> @@ -1479,18 +1495,6 @@ EXPORT_SYMBOL_GPL(put_device);
>  EXPORT_SYMBOL_GPL(device_create_file);
>  EXPORT_SYMBOL_GPL(device_remove_file);
>  
> -static DEFINE_MUTEX(device_hotplug_lock);
> -
> -void lock_device_hotplug(void)
> -{
> - mutex_lock(_hotplug_lock);
> -}
> -
> -void unlock_device_hotplug(void)
> -{
> - mutex_unlock(_hotplug_lock);
> -}
> -
>  static int device_check_offline(struct device *dev, void *not_used)
>  {
>   int ret;
> Index: linux-pm/drivers/acpi/scan.c
> ===
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -510,7 +510,8 @@ acpi_eject_store(struct device *d, struc
>   if (ACPI_FAILURE(status) || !acpi_device->flags.ejectable)
>   return -ENODEV;
>  
> - mutex_lock(_scan_lock);
> + if (!mutex_trylock(_scan_lock))
> + return -EAGAIN;
>  
>   if (acpi_device->flags.eject_pending) {
>   /* ACPI eject notification event. */

Thanks,
Rafael


---
 drivers/base/core.c |   45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/base/core.c
===
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -49,6 +49,28 @@ static struct kobject *dev_kobj;
 struct kobject *sysfs_dev_char_kobj;
 struct kobject *sysfs_dev_block_kobj;
 
+static DEFINE_MUTEX(device_hotplug_lock);
+
+void lock_device_hotplug(void)
+{
+   mutex_lock(_hotplug_lock);
+}
+
+void unlock_device_hotplug(void)
+{
+   mutex_unlock(_hotplug_lock);
+}
+
+static int try_to_lock_device_hotplug(void)
+{
+   if (!mutex_trylock(_hotplug_lock)) {
+   /* Avoid busy waiting, 10 ms of sleep should do. */
+   msleep(10);
+   return restart_syscall();
+   }
+   return 0;
+}
+
 #ifdef CONFIG_BLOCK
 static inline int device_is_not_partition(struct device *dev)
 {
@@ -407,8 +429,12 @@ static ssize_t show_online(struct device
   char *buf)
 {
bool val;
+   int ret;
+
+   ret = try_to_lock_device_hotplug();
+   if (ret)
+   return ret;
 
-   lock_device_hotplug();
val = !dev->offline;
unlock_device_hotplug();
return sprintf(buf, "%u\n", val);
@@ -424,7 +450,10 @@ static ssize_t store_online(struct devic
if (ret < 0)
return ret;
 
-   lock_device_hotplug();
+   ret = try_to_lock_device_hotplug();
+   if (ret)
+   return ret;
+
ret = val ? device_online(dev) : device_offline(dev);
unlock_device_hotplug();
return ret < 0 ? ret : count;
@@ -1479,18 +1508,6 @@ EXPORT_SYMBOL_GPL(put_device);
 EXPORT_SYMBOL_GPL(device_create_file);
 EXPORT_SYMBOL_GPL(device_remove_file);
 
-static DEFINE_MUTEX(device_hotplug_lock);
-
-void 

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Rafael J. Wysocki
On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
> > Hi Rafael,
> 
> Hi,
> 
> > On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:
> > 
> > > From: Rafael J. Wysocki 
> > > 
> > > There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
> > > around the acpi_bus_trim() call in acpi_scan_hot_remove() which
> > > generally removes devices (it removes ACPI device objects at least,
> > > but it may also remove "physical" device objects through .detach()
> > > callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
> > > attributes are removed under these locks and to remove those
> > > attributes it is necessary to hold the s_active references of their
> > > directory entries for writing.
> > > 
> > > On the other hand, the execution of a .show() or .store() callback
> > > from a sysfs attribute is carried out with that attribute's s_active
> > > reference held for reading.  Consequently, if any device sysfs
> > > attribute that may be removed from within acpi_scan_hot_remove()
> > > through acpi_bus_trim() has a .store() or .show() callback which
> > > acquires either acpi_scan_lock or device_hotplug_lock, the execution
> > > of that callback may deadlock with the removal of the attribute.
> > > [Unfortunately, the "online" device attribute of CPUs and memory
> > > blocks and the "eject" attribute of ACPI device objects are affected
> > > by this issue.]
> > > 
> > > To avoid those deadlocks introduce a new protection mechanism that
> > > can be used by the device sysfs attributes in question.  Namely,
> > > if a device sysfs attribute's .store() or .show() callback routine
> > > is about to acquire device_hotplug_lock or acpi_scan_lock, it can
> > > first execute read_lock_device_remove() and return an error code if
> > > that function returns false.  If true is returned, the lock in
> > > question may be acquired and read_unlock_device_remove() must be
> > > called.  [This mechanism is implemented by means of an additional
> > > rwsem in drivers/base/core.c.]
> > > 
> > > Make the affected sysfs attributes in the driver core and ACPI core
> > > use read_lock_device_remove() and read_unlock_device_remove() as
> > > described above.
> > > 
> > > Signed-off-by: Rafael J. Wysocki 
> > > Reported-by: Gu Zheng 
> > 
> > I'm sorry to forget to mention that the original reporter is
> > Yasuaki Ishimatsu . I continued
> > the investigation and found more issues.
> > 
> > We tested this patch on kernel 3.11-rc6, but it seems that the
> > issue is still there. Detail info as following.
> 
> Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
> because we'll need to take acpi_scan_lock during system suspend for PCI hot
> remove to work and that's under pm_mutex.  So I wonder if we can simply
> drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
> a side note, because dropping it won't help here.
> 
> Now ->
> 
> > ==  
> > 
> >  
> > [ INFO: possible circular locking dependency detected ] 
> > 
> >  
> > 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF
> > 
> >  
> > --- 
> > 
> >  
> > kworker/0:2/754 is trying to acquire lock:  
> > 
> >  
> >  (s_active#73){.+}, at: [] 
> > sysfs_addrm_finish+0x3b/0x70
> > 
> > 
> > 
> >  
> > but task is already holding lock:   
> > 
> >  
> >  (mem_sysfs_mutex){+.+.+.}, at: [] 
> > remove_memory_block+0x1d/0xa0   
> > 
> > 
> > 
> >  
> > which lock already depends on the new lock. 
> > 

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Rafael J. Wysocki
On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
> Hi Rafael,

Hi,

> On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki 
> > 
> > There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
> > around the acpi_bus_trim() call in acpi_scan_hot_remove() which
> > generally removes devices (it removes ACPI device objects at least,
> > but it may also remove "physical" device objects through .detach()
> > callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
> > attributes are removed under these locks and to remove those
> > attributes it is necessary to hold the s_active references of their
> > directory entries for writing.
> > 
> > On the other hand, the execution of a .show() or .store() callback
> > from a sysfs attribute is carried out with that attribute's s_active
> > reference held for reading.  Consequently, if any device sysfs
> > attribute that may be removed from within acpi_scan_hot_remove()
> > through acpi_bus_trim() has a .store() or .show() callback which
> > acquires either acpi_scan_lock or device_hotplug_lock, the execution
> > of that callback may deadlock with the removal of the attribute.
> > [Unfortunately, the "online" device attribute of CPUs and memory
> > blocks and the "eject" attribute of ACPI device objects are affected
> > by this issue.]
> > 
> > To avoid those deadlocks introduce a new protection mechanism that
> > can be used by the device sysfs attributes in question.  Namely,
> > if a device sysfs attribute's .store() or .show() callback routine
> > is about to acquire device_hotplug_lock or acpi_scan_lock, it can
> > first execute read_lock_device_remove() and return an error code if
> > that function returns false.  If true is returned, the lock in
> > question may be acquired and read_unlock_device_remove() must be
> > called.  [This mechanism is implemented by means of an additional
> > rwsem in drivers/base/core.c.]
> > 
> > Make the affected sysfs attributes in the driver core and ACPI core
> > use read_lock_device_remove() and read_unlock_device_remove() as
> > described above.
> > 
> > Signed-off-by: Rafael J. Wysocki 
> > Reported-by: Gu Zheng 
> 
> I'm sorry to forget to mention that the original reporter is
> Yasuaki Ishimatsu . I continued
> the investigation and found more issues.
> 
> We tested this patch on kernel 3.11-rc6, but it seems that the
> issue is still there. Detail info as following.

Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
because we'll need to take acpi_scan_lock during system suspend for PCI hot
remove to work and that's under pm_mutex.  So I wonder if we can simply
drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
a side note, because dropping it won't help here.

Now ->

> ==
>   
>  
> [ INFO: possible circular locking dependency detected ]   
>   
>  
> 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF  
>   
>  
> ---   
>   
>  
> kworker/0:2/754 is trying to acquire lock:
>   
>  
>  (s_active#73){.+}, at: [] sysfs_addrm_finish+0x3b/0x70 
>   
>  
>   
>   
>  
> but task is already holding lock: 
>   
>  
>  (mem_sysfs_mutex){+.+.+.}, at: [] 
> remove_memory_block+0x1d/0xa0 
>   
>   
>   
>  
> which lock already depends on the new lock.   
>   
>  
>   
>   
>  

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Rafael J. Wysocki
On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,

Hi,

 On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:
 
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
  
  There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
  around the acpi_bus_trim() call in acpi_scan_hot_remove() which
  generally removes devices (it removes ACPI device objects at least,
  but it may also remove physical device objects through .detach()
  callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
  attributes are removed under these locks and to remove those
  attributes it is necessary to hold the s_active references of their
  directory entries for writing.
  
  On the other hand, the execution of a .show() or .store() callback
  from a sysfs attribute is carried out with that attribute's s_active
  reference held for reading.  Consequently, if any device sysfs
  attribute that may be removed from within acpi_scan_hot_remove()
  through acpi_bus_trim() has a .store() or .show() callback which
  acquires either acpi_scan_lock or device_hotplug_lock, the execution
  of that callback may deadlock with the removal of the attribute.
  [Unfortunately, the online device attribute of CPUs and memory
  blocks and the eject attribute of ACPI device objects are affected
  by this issue.]
  
  To avoid those deadlocks introduce a new protection mechanism that
  can be used by the device sysfs attributes in question.  Namely,
  if a device sysfs attribute's .store() or .show() callback routine
  is about to acquire device_hotplug_lock or acpi_scan_lock, it can
  first execute read_lock_device_remove() and return an error code if
  that function returns false.  If true is returned, the lock in
  question may be acquired and read_unlock_device_remove() must be
  called.  [This mechanism is implemented by means of an additional
  rwsem in drivers/base/core.c.]
  
  Make the affected sysfs attributes in the driver core and ACPI core
  use read_lock_device_remove() and read_unlock_device_remove() as
  described above.
  
  Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
  Reported-by: Gu Zheng guz.f...@cn.fujitsu.com
 
 I'm sorry to forget to mention that the original reporter is
 Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com. I continued
 the investigation and found more issues.
 
 We tested this patch on kernel 3.11-rc6, but it seems that the
 issue is still there. Detail info as following.

Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
because we'll need to take acpi_scan_lock during system suspend for PCI hot
remove to work and that's under pm_mutex.  So I wonder if we can simply
drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
a side note, because dropping it won't help here.

Now -

 ==
   
  
 [ INFO: possible circular locking dependency detected ]   
   
  
 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF  
   
  
 ---   
   
  
 kworker/0:2/754 is trying to acquire lock:
   
  
  (s_active#73){.+}, at: [8121062b] sysfs_addrm_finish+0x3b/0x70 
   
  
   
   
  
 but task is already holding lock: 
   
  
  (mem_sysfs_mutex){+.+.+.}, at: [813b949d] 
 remove_memory_block+0x1d/0xa0 
   
   
   
  
 which lock already depends on the new lock.   
   
  
   
   
  

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Rafael J. Wysocki
On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
  Hi Rafael,
 
 Hi,
 
  On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:
  
   From: Rafael J. Wysocki rafael.j.wyso...@intel.com
   
   There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
   around the acpi_bus_trim() call in acpi_scan_hot_remove() which
   generally removes devices (it removes ACPI device objects at least,
   but it may also remove physical device objects through .detach()
   callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
   attributes are removed under these locks and to remove those
   attributes it is necessary to hold the s_active references of their
   directory entries for writing.
   
   On the other hand, the execution of a .show() or .store() callback
   from a sysfs attribute is carried out with that attribute's s_active
   reference held for reading.  Consequently, if any device sysfs
   attribute that may be removed from within acpi_scan_hot_remove()
   through acpi_bus_trim() has a .store() or .show() callback which
   acquires either acpi_scan_lock or device_hotplug_lock, the execution
   of that callback may deadlock with the removal of the attribute.
   [Unfortunately, the online device attribute of CPUs and memory
   blocks and the eject attribute of ACPI device objects are affected
   by this issue.]
   
   To avoid those deadlocks introduce a new protection mechanism that
   can be used by the device sysfs attributes in question.  Namely,
   if a device sysfs attribute's .store() or .show() callback routine
   is about to acquire device_hotplug_lock or acpi_scan_lock, it can
   first execute read_lock_device_remove() and return an error code if
   that function returns false.  If true is returned, the lock in
   question may be acquired and read_unlock_device_remove() must be
   called.  [This mechanism is implemented by means of an additional
   rwsem in drivers/base/core.c.]
   
   Make the affected sysfs attributes in the driver core and ACPI core
   use read_lock_device_remove() and read_unlock_device_remove() as
   described above.
   
   Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
   Reported-by: Gu Zheng guz.f...@cn.fujitsu.com
  
  I'm sorry to forget to mention that the original reporter is
  Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com. I continued
  the investigation and found more issues.
  
  We tested this patch on kernel 3.11-rc6, but it seems that the
  issue is still there. Detail info as following.
 
 Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
 because we'll need to take acpi_scan_lock during system suspend for PCI hot
 remove to work and that's under pm_mutex.  So I wonder if we can simply
 drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
 a side note, because dropping it won't help here.
 
 Now -
 
  ==  
  
   
  [ INFO: possible circular locking dependency detected ] 
  
   
  3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF
  
   
  --- 
  
   
  kworker/0:2/754 is trying to acquire lock:  
  
   
   (s_active#73){.+}, at: [8121062b] 
  sysfs_addrm_finish+0x3b/0x70
  
  
  
   
  but task is already holding lock:   
  
   
   (mem_sysfs_mutex){+.+.+.}, at: [813b949d] 
  remove_memory_block+0x1d/0xa0   
  
  
  
   
  which lock already depends on the new lock. 
  
   

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Rafael J. Wysocki
On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
  On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
   Hi Rafael,

[...]

 
 OK, so the patch below is quick and dirty and overkill, but it should make the
 splat go away at least.

And if this patch does make the splat go away for you, please also test the
appended one (Tejun, thanks for the hint!).

I'll address the ACPI part differently later.

 ---
  drivers/acpi/scan.c |3 ++-
  drivers/base/core.c |   36 
  2 files changed, 22 insertions(+), 17 deletions(-)
 
 Index: linux-pm/drivers/base/core.c
 ===
 --- linux-pm.orig/drivers/base/core.c
 +++ linux-pm/drivers/base/core.c
 @@ -49,6 +49,18 @@ static struct kobject *dev_kobj;
  struct kobject *sysfs_dev_char_kobj;
  struct kobject *sysfs_dev_block_kobj;
  
 +static DEFINE_MUTEX(device_hotplug_lock);
 +
 +void lock_device_hotplug(void)
 +{
 + mutex_lock(device_hotplug_lock);
 +}
 +
 +void unlock_device_hotplug(void)
 +{
 + mutex_unlock(device_hotplug_lock);
 +}
 +
  #ifdef CONFIG_BLOCK
  static inline int device_is_not_partition(struct device *dev)
  {
 @@ -408,9 +420,11 @@ static ssize_t show_online(struct device
  {
   bool val;
  
 - lock_device_hotplug();
 + if (!mutex_trylock(device_hotplug_lock))
 + return -EAGAIN;
 +
   val = !dev-offline;
 - unlock_device_hotplug();
 + mutex_unlock(device_hotplug_lock);
   return sprintf(buf, %u\n, val);
  }
  
 @@ -424,9 +438,11 @@ static ssize_t store_online(struct devic
   if (ret  0)
   return ret;
  
 - lock_device_hotplug();
 + if (!mutex_trylock(device_hotplug_lock))
 + return -EAGAIN;
 +
   ret = val ? device_online(dev) : device_offline(dev);
 - unlock_device_hotplug();
 + mutex_unlock(device_hotplug_lock);
   return ret  0 ? ret : count;
  }
  
 @@ -1479,18 +1495,6 @@ EXPORT_SYMBOL_GPL(put_device);
  EXPORT_SYMBOL_GPL(device_create_file);
  EXPORT_SYMBOL_GPL(device_remove_file);
  
 -static DEFINE_MUTEX(device_hotplug_lock);
 -
 -void lock_device_hotplug(void)
 -{
 - mutex_lock(device_hotplug_lock);
 -}
 -
 -void unlock_device_hotplug(void)
 -{
 - mutex_unlock(device_hotplug_lock);
 -}
 -
  static int device_check_offline(struct device *dev, void *not_used)
  {
   int ret;
 Index: linux-pm/drivers/acpi/scan.c
 ===
 --- linux-pm.orig/drivers/acpi/scan.c
 +++ linux-pm/drivers/acpi/scan.c
 @@ -510,7 +510,8 @@ acpi_eject_store(struct device *d, struc
   if (ACPI_FAILURE(status) || !acpi_device-flags.ejectable)
   return -ENODEV;
  
 - mutex_lock(acpi_scan_lock);
 + if (!mutex_trylock(acpi_scan_lock))
 + return -EAGAIN;
  
   if (acpi_device-flags.eject_pending) {
   /* ACPI eject notification event. */

Thanks,
Rafael


---
 drivers/base/core.c |   45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/base/core.c
===
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -49,6 +49,28 @@ static struct kobject *dev_kobj;
 struct kobject *sysfs_dev_char_kobj;
 struct kobject *sysfs_dev_block_kobj;
 
+static DEFINE_MUTEX(device_hotplug_lock);
+
+void lock_device_hotplug(void)
+{
+   mutex_lock(device_hotplug_lock);
+}
+
+void unlock_device_hotplug(void)
+{
+   mutex_unlock(device_hotplug_lock);
+}
+
+static int try_to_lock_device_hotplug(void)
+{
+   if (!mutex_trylock(device_hotplug_lock)) {
+   /* Avoid busy waiting, 10 ms of sleep should do. */
+   msleep(10);
+   return restart_syscall();
+   }
+   return 0;
+}
+
 #ifdef CONFIG_BLOCK
 static inline int device_is_not_partition(struct device *dev)
 {
@@ -407,8 +429,12 @@ static ssize_t show_online(struct device
   char *buf)
 {
bool val;
+   int ret;
+
+   ret = try_to_lock_device_hotplug();
+   if (ret)
+   return ret;
 
-   lock_device_hotplug();
val = !dev-offline;
unlock_device_hotplug();
return sprintf(buf, %u\n, val);
@@ -424,7 +450,10 @@ static ssize_t store_online(struct devic
if (ret  0)
return ret;
 
-   lock_device_hotplug();
+   ret = try_to_lock_device_hotplug();
+   if (ret)
+   return ret;
+
ret = val ? device_online(dev) : device_offline(dev);
unlock_device_hotplug();
return ret  0 ? ret : count;
@@ -1479,18 +1508,6 @@ EXPORT_SYMBOL_GPL(put_device);
 EXPORT_SYMBOL_GPL(device_create_file);
 EXPORT_SYMBOL_GPL(device_remove_file);
 
-static DEFINE_MUTEX(device_hotplug_lock);
-
-void lock_device_hotplug(void)
-{
-   

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 10:43 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,

 Hi,

 On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:

 From: Rafael J. Wysocki rafael.j.wyso...@intel.com

 There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
 around the acpi_bus_trim() call in acpi_scan_hot_remove() which
 generally removes devices (it removes ACPI device objects at least,
 but it may also remove physical device objects through .detach()
 callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
 attributes are removed under these locks and to remove those
 attributes it is necessary to hold the s_active references of their
 directory entries for writing.

 On the other hand, the execution of a .show() or .store() callback
 from a sysfs attribute is carried out with that attribute's s_active
 reference held for reading.  Consequently, if any device sysfs
 attribute that may be removed from within acpi_scan_hot_remove()
 through acpi_bus_trim() has a .store() or .show() callback which
 acquires either acpi_scan_lock or device_hotplug_lock, the execution
 of that callback may deadlock with the removal of the attribute.
 [Unfortunately, the online device attribute of CPUs and memory
 blocks and the eject attribute of ACPI device objects are affected
 by this issue.]

 To avoid those deadlocks introduce a new protection mechanism that
 can be used by the device sysfs attributes in question.  Namely,
 if a device sysfs attribute's .store() or .show() callback routine
 is about to acquire device_hotplug_lock or acpi_scan_lock, it can
 first execute read_lock_device_remove() and return an error code if
 that function returns false.  If true is returned, the lock in
 question may be acquired and read_unlock_device_remove() must be
 called.  [This mechanism is implemented by means of an additional
 rwsem in drivers/base/core.c.]

 Make the affected sysfs attributes in the driver core and ACPI core
 use read_lock_device_remove() and read_unlock_device_remove() as
 described above.

 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Reported-by: Gu Zheng guz.f...@cn.fujitsu.com

 I'm sorry to forget to mention that the original reporter is
 Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com. I continued
 the investigation and found more issues.

 We tested this patch on kernel 3.11-rc6, but it seems that the
 issue is still there. Detail info as following.

 Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
 because we'll need to take acpi_scan_lock during system suspend for PCI hot
 remove to work and that's under pm_mutex.  So I wonder if we can simply
 drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
 a side note, because dropping it won't help here.

 Now -

 ==  
 
  
 [ INFO: possible circular locking dependency detected ] 
 
  
 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF
 
  
 --- 
 
  
 kworker/0:2/754 is trying to acquire lock:  
 
  
  (s_active#73){.+}, at: [8121062b] 
 sysfs_addrm_finish+0x3b/0x70
 
 
 
  
 but task is already holding lock:   
 
  
  (mem_sysfs_mutex){+.+.+.}, at: [813b949d] 
 remove_memory_block+0x1d/0xa0   
 
 
 
  
 which lock already depends on the new lock. 
 
  
 
   

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng
 and blocks on
 device_hotplug_lock while another thread is holding it over device removal is
 clearly impossible to me.

 So I'm not sure how device_hotplug_lock is still involved?
 
 Well, it is not involved, but lockdep doesn't see that anyway.  Bummer.

Yeah, I also guess this is a wrong detection, but I'm not familiar with this 
filed.
If it is really so, the lockdep guys maybe need to fix it, rather than altering 
an overkill
solution to accommodate it. What's your opinion?
In fact, I think [PATCH] driver core / ACPI: Avoid device removal locking 
problems is
a good solution.:) 

Regards,
Gu

 
[8131b6fc] acpi_bus_hot_remove_device+0x37/0x73
 
  
[81315dac] acpi_os_execute_deferred+0x27/0x34  
 
  
[8106bec8] process_one_work+0x1e8/0x560
 
  
[8106d0a0] worker_thread+0x120/0x3a0   
 
  
[81073b5e] kthread+0xee/0x100  
 
  
[815a605c] ret_from_fork+0x7c/0xb0 
 
  
 
 
  
 other info that might help us debug this:   
 
  
 
 
  
 Chain exists of:
 
  
   s_active#73 -- pm_mutex -- mem_sysfs_mutex  
 
  
 
 
  
  Possible unsafe locking scenario:  
 
  
 
 
  
CPU0CPU1 
 
  
 
 
  
   lock(mem_sysfs_mutex);
 
  
lock(pm_mutex);  
 
  
lock(mem_sysfs_mutex);   
 
  
   lock(s_active#73);
 
  
 
 
  
  *** DEADLOCK ***   
 
  
 
 OK, so the patch below is quick and dirty and overkill, but it should make the
 splat go away at least.
 
 Thanks,
 Rafael
 
 
 ---
  drivers/acpi/scan.c |3 ++-
  drivers/base/core.c |   36 
  2 files changed, 22 insertions(+), 17 deletions(-)
 
 Index: linux-pm/drivers/base/core.c

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,
 
 [...]
 

 OK, so the patch below is quick and dirty and overkill, but it should make 
 the
 splat go away at least.
 
 And if this patch does make the splat go away for you, please also test the
 appended one (Tejun, thanks for the hint!).

Yes, this one works too, and as expected, the ACPI part is still there.

Thanks,
Gu

==  

[ INFO: possible circular locking dependency detected ] 

3.11.0-rc6-fix-refeal-fix-01+ #171 Tainted: GF  

--- 

kworker/0:1/96 is trying to acquire lock:   

 (s_active#245){.+}, at: [8121062b] sysfs_addrm_finish+0x3b/0x70  



but task is already holding lock:   

 (device_hotplug_lock){+.+.+.}, at: [813a16b7] 
lock_device_hotplug+0x17/0x20  


which lock already depends on the new lock. 





the existing dependency chain (in reverse order) is:



- #2 (device_hotplug_lock){+.+.+.}:

   [810ba88c] validate_chain+0x70c/0x870  

   [810bad5f] __lock_acquire+0x36f/0x5f0  

   [810bb080] lock_acquire+0xa0/0x130 

   [8159779b] mutex_lock_nested+0x7b/0x3b0

   [813a16b7] lock_device_hotplug+0x17/0x20   

   [8131c131] acpi_scan_bus_device_check+0x33/0x10f   

   [8131c220] acpi_scan_device_check+0x13/0x15

   [81315dac] acpi_os_execute_deferred+0x27/0x34  

   [8106bec8] process_one_work+0x1e8/0x560

   [8106d0a0] worker_thread+0x120/0x3a0   

   [81073b5e] kthread+0xee/0x100  

   [815a5fdc] ret_from_fork+0x7c/0xb0 



- #1 (acpi_scan_lock){+.+.+.}: 

   [810ba88c] validate_chain+0x70c/0x870  

   [810bad5f] __lock_acquire+0x36f/0x5f0  

   [810bb080] lock_acquire+0xa0/0x130 

   [8159779b] mutex_lock_nested+0x7b/0x3b0

   [8131a58a] acpi_eject_store+0x88/0x170 

   [813a0f40] dev_attr_store+0x20/0x30

   [8120ed96] sysfs_write_file+0xe6/0x170 

   [81195bc8] vfs_write+0xc8/0x170
 

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-25 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki 
> 
> There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
> around the acpi_bus_trim() call in acpi_scan_hot_remove() which
> generally removes devices (it removes ACPI device objects at least,
> but it may also remove "physical" device objects through .detach()
> callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
> attributes are removed under these locks and to remove those
> attributes it is necessary to hold the s_active references of their
> directory entries for writing.
> 
> On the other hand, the execution of a .show() or .store() callback
> from a sysfs attribute is carried out with that attribute's s_active
> reference held for reading.  Consequently, if any device sysfs
> attribute that may be removed from within acpi_scan_hot_remove()
> through acpi_bus_trim() has a .store() or .show() callback which
> acquires either acpi_scan_lock or device_hotplug_lock, the execution
> of that callback may deadlock with the removal of the attribute.
> [Unfortunately, the "online" device attribute of CPUs and memory
> blocks and the "eject" attribute of ACPI device objects are affected
> by this issue.]
> 
> To avoid those deadlocks introduce a new protection mechanism that
> can be used by the device sysfs attributes in question.  Namely,
> if a device sysfs attribute's .store() or .show() callback routine
> is about to acquire device_hotplug_lock or acpi_scan_lock, it can
> first execute read_lock_device_remove() and return an error code if
> that function returns false.  If true is returned, the lock in
> question may be acquired and read_unlock_device_remove() must be
> called.  [This mechanism is implemented by means of an additional
> rwsem in drivers/base/core.c.]
> 
> Make the affected sysfs attributes in the driver core and ACPI core
> use read_lock_device_remove() and read_unlock_device_remove() as
> described above.
> 
> Signed-off-by: Rafael J. Wysocki 
> Reported-by: Gu Zheng 

I'm sorry to forget to mention that the original reporter is
Yasuaki Ishimatsu . I continued
the investigation and found more issues.

We tested this patch on kernel 3.11-rc6, but it seems that the
issue is still there. Detail info as following.

Thanks,
Gu

==  

 
[ INFO: possible circular locking dependency detected ] 

 
3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF

 
--- 

 
kworker/0:2/754 is trying to acquire lock:  

 
 (s_active#73){.+}, at: [] sysfs_addrm_finish+0x3b/0x70   

 


 
but task is already holding lock:   

 
 (mem_sysfs_mutex){+.+.+.}, at: [] 
remove_memory_block+0x1d/0xa0   



 
which lock already depends on the new lock. 

 


 


 
the existing dependency chain (in reverse order) is:

 


 
-> #4 (mem_sysfs_mutex){+.+.+.}:   

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-25 Thread Greg Kroah-Hartman
On Sun, Aug 25, 2013 at 10:09:47PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
> around the acpi_bus_trim() call in acpi_scan_hot_remove() which
> generally removes devices (it removes ACPI device objects at least,
> but it may also remove "physical" device objects through .detach()
> callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
> attributes are removed under these locks and to remove those
> attributes it is necessary to hold the s_active references of their
> directory entries for writing.
> 
> On the other hand, the execution of a .show() or .store() callback
> from a sysfs attribute is carried out with that attribute's s_active
> reference held for reading.  Consequently, if any device sysfs
> attribute that may be removed from within acpi_scan_hot_remove()
> through acpi_bus_trim() has a .store() or .show() callback which
> acquires either acpi_scan_lock or device_hotplug_lock, the execution
> of that callback may deadlock with the removal of the attribute.
> [Unfortunately, the "online" device attribute of CPUs and memory
> blocks and the "eject" attribute of ACPI device objects are affected
> by this issue.]
> 
> To avoid those deadlocks introduce a new protection mechanism that
> can be used by the device sysfs attributes in question.  Namely,
> if a device sysfs attribute's .store() or .show() callback routine
> is about to acquire device_hotplug_lock or acpi_scan_lock, it can
> first execute read_lock_device_remove() and return an error code if
> that function returns false.  If true is returned, the lock in
> question may be acquired and read_unlock_device_remove() must be
> called.  [This mechanism is implemented by means of an additional
> rwsem in drivers/base/core.c.]
> 
> Make the affected sysfs attributes in the driver core and ACPI core
> use read_lock_device_remove() and read_unlock_device_remove() as
> described above.
> 
> Signed-off-by: Rafael J. Wysocki 
> Reported-by: Gu Zheng 

Acked-by: Greg Kroah-Hartman 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-25 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
around the acpi_bus_trim() call in acpi_scan_hot_remove() which
generally removes devices (it removes ACPI device objects at least,
but it may also remove "physical" device objects through .detach()
callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
attributes are removed under these locks and to remove those
attributes it is necessary to hold the s_active references of their
directory entries for writing.

On the other hand, the execution of a .show() or .store() callback
from a sysfs attribute is carried out with that attribute's s_active
reference held for reading.  Consequently, if any device sysfs
attribute that may be removed from within acpi_scan_hot_remove()
through acpi_bus_trim() has a .store() or .show() callback which
acquires either acpi_scan_lock or device_hotplug_lock, the execution
of that callback may deadlock with the removal of the attribute.
[Unfortunately, the "online" device attribute of CPUs and memory
blocks and the "eject" attribute of ACPI device objects are affected
by this issue.]

To avoid those deadlocks introduce a new protection mechanism that
can be used by the device sysfs attributes in question.  Namely,
if a device sysfs attribute's .store() or .show() callback routine
is about to acquire device_hotplug_lock or acpi_scan_lock, it can
first execute read_lock_device_remove() and return an error code if
that function returns false.  If true is returned, the lock in
question may be acquired and read_unlock_device_remove() must be
called.  [This mechanism is implemented by means of an additional
rwsem in drivers/base/core.c.]

Make the affected sysfs attributes in the driver core and ACPI core
use read_lock_device_remove() and read_unlock_device_remove() as
described above.

Signed-off-by: Rafael J. Wysocki 
Reported-by: Gu Zheng 
---

For 3.12, applies on top of linux-pm.git/linux-next.

Thanks,
Rafael

---
 drivers/acpi/scan.c|8 
 drivers/base/core.c|   29 +
 include/linux/device.h |4 
 3 files changed, 41 insertions(+)

Index: linux-pm/drivers/base/core.c
===
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -408,7 +408,11 @@ static ssize_t show_online(struct device
 {
bool val;
 
+   if (!read_lock_device_remove())
+   return -EBUSY;
+
lock_device_hotplug();
+   read_unlock_device_remove();
val = !dev->offline;
unlock_device_hotplug();
return sprintf(buf, "%u\n", val);
@@ -424,7 +428,11 @@ static ssize_t store_online(struct devic
if (ret < 0)
return ret;
 
+   if (!read_lock_device_remove())
+   return -EBUSY;
+
lock_device_hotplug();
+   read_unlock_device_remove();
ret = val ? device_online(dev) : device_offline(dev);
unlock_device_hotplug();
return ret < 0 ? ret : count;
@@ -1479,8 +1487,29 @@ EXPORT_SYMBOL_GPL(put_device);
 EXPORT_SYMBOL_GPL(device_create_file);
 EXPORT_SYMBOL_GPL(device_remove_file);
 
+static DECLARE_RWSEM(device_remove_rwsem);
 static DEFINE_MUTEX(device_hotplug_lock);
 
+bool __must_check read_lock_device_remove(void)
+{
+   return down_read_trylock(_remove_rwsem);
+}
+
+void read_unlock_device_remove(void)
+{
+   up_read(_remove_rwsem);
+}
+
+void device_remove_begin(void)
+{
+   down_write(_remove_rwsem);
+}
+
+void device_remove_end(void)
+{
+   up_write(_remove_rwsem);
+}
+
 void lock_device_hotplug(void)
 {
mutex_lock(_hotplug_lock);
Index: linux-pm/include/linux/device.h
===
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -893,6 +893,10 @@ static inline bool device_supports_offli
return dev->bus && dev->bus->offline && dev->bus->online;
 }
 
+extern bool __must_check read_lock_device_remove(void);
+extern void read_unlock_device_remove(void);
+extern void device_remove_begin(void);
+extern void device_remove_end(void);
 extern void lock_device_hotplug(void);
 extern void unlock_device_hotplug(void);
 extern int device_offline(struct device *dev);
Index: linux-pm/drivers/acpi/scan.c
===
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -288,6 +288,7 @@ static void acpi_bus_device_eject(void *
struct acpi_scan_handler *handler;
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
 
+   device_remove_begin();
mutex_lock(_scan_lock);
 
acpi_bus_get_device(handle, );
@@ -315,6 +316,7 @@ static void acpi_bus_device_eject(void *
 
  out:
mutex_unlock(_scan_lock);
+   device_remove_end();
return;
 
  err_out:
@@ -449,6 +451,7 @@ void acpi_bus_hot_remove_device(void *co
acpi_handle handle = 

[PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-25 Thread Rafael J. Wysocki
From: Rafael J. Wysocki rafael.j.wyso...@intel.com

There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
around the acpi_bus_trim() call in acpi_scan_hot_remove() which
generally removes devices (it removes ACPI device objects at least,
but it may also remove physical device objects through .detach()
callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
attributes are removed under these locks and to remove those
attributes it is necessary to hold the s_active references of their
directory entries for writing.

On the other hand, the execution of a .show() or .store() callback
from a sysfs attribute is carried out with that attribute's s_active
reference held for reading.  Consequently, if any device sysfs
attribute that may be removed from within acpi_scan_hot_remove()
through acpi_bus_trim() has a .store() or .show() callback which
acquires either acpi_scan_lock or device_hotplug_lock, the execution
of that callback may deadlock with the removal of the attribute.
[Unfortunately, the online device attribute of CPUs and memory
blocks and the eject attribute of ACPI device objects are affected
by this issue.]

To avoid those deadlocks introduce a new protection mechanism that
can be used by the device sysfs attributes in question.  Namely,
if a device sysfs attribute's .store() or .show() callback routine
is about to acquire device_hotplug_lock or acpi_scan_lock, it can
first execute read_lock_device_remove() and return an error code if
that function returns false.  If true is returned, the lock in
question may be acquired and read_unlock_device_remove() must be
called.  [This mechanism is implemented by means of an additional
rwsem in drivers/base/core.c.]

Make the affected sysfs attributes in the driver core and ACPI core
use read_lock_device_remove() and read_unlock_device_remove() as
described above.

Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
Reported-by: Gu Zheng guz.f...@cn.fujitsu.com
---

For 3.12, applies on top of linux-pm.git/linux-next.

Thanks,
Rafael

---
 drivers/acpi/scan.c|8 
 drivers/base/core.c|   29 +
 include/linux/device.h |4 
 3 files changed, 41 insertions(+)

Index: linux-pm/drivers/base/core.c
===
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -408,7 +408,11 @@ static ssize_t show_online(struct device
 {
bool val;
 
+   if (!read_lock_device_remove())
+   return -EBUSY;
+
lock_device_hotplug();
+   read_unlock_device_remove();
val = !dev-offline;
unlock_device_hotplug();
return sprintf(buf, %u\n, val);
@@ -424,7 +428,11 @@ static ssize_t store_online(struct devic
if (ret  0)
return ret;
 
+   if (!read_lock_device_remove())
+   return -EBUSY;
+
lock_device_hotplug();
+   read_unlock_device_remove();
ret = val ? device_online(dev) : device_offline(dev);
unlock_device_hotplug();
return ret  0 ? ret : count;
@@ -1479,8 +1487,29 @@ EXPORT_SYMBOL_GPL(put_device);
 EXPORT_SYMBOL_GPL(device_create_file);
 EXPORT_SYMBOL_GPL(device_remove_file);
 
+static DECLARE_RWSEM(device_remove_rwsem);
 static DEFINE_MUTEX(device_hotplug_lock);
 
+bool __must_check read_lock_device_remove(void)
+{
+   return down_read_trylock(device_remove_rwsem);
+}
+
+void read_unlock_device_remove(void)
+{
+   up_read(device_remove_rwsem);
+}
+
+void device_remove_begin(void)
+{
+   down_write(device_remove_rwsem);
+}
+
+void device_remove_end(void)
+{
+   up_write(device_remove_rwsem);
+}
+
 void lock_device_hotplug(void)
 {
mutex_lock(device_hotplug_lock);
Index: linux-pm/include/linux/device.h
===
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -893,6 +893,10 @@ static inline bool device_supports_offli
return dev-bus  dev-bus-offline  dev-bus-online;
 }
 
+extern bool __must_check read_lock_device_remove(void);
+extern void read_unlock_device_remove(void);
+extern void device_remove_begin(void);
+extern void device_remove_end(void);
 extern void lock_device_hotplug(void);
 extern void unlock_device_hotplug(void);
 extern int device_offline(struct device *dev);
Index: linux-pm/drivers/acpi/scan.c
===
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -288,6 +288,7 @@ static void acpi_bus_device_eject(void *
struct acpi_scan_handler *handler;
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
 
+   device_remove_begin();
mutex_lock(acpi_scan_lock);
 
acpi_bus_get_device(handle, device);
@@ -315,6 +316,7 @@ static void acpi_bus_device_eject(void *
 
  out:
mutex_unlock(acpi_scan_lock);
+   device_remove_end();
return;
 
  

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-25 Thread Greg Kroah-Hartman
On Sun, Aug 25, 2013 at 10:09:47PM +0200, Rafael J. Wysocki wrote:
 From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
 around the acpi_bus_trim() call in acpi_scan_hot_remove() which
 generally removes devices (it removes ACPI device objects at least,
 but it may also remove physical device objects through .detach()
 callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
 attributes are removed under these locks and to remove those
 attributes it is necessary to hold the s_active references of their
 directory entries for writing.
 
 On the other hand, the execution of a .show() or .store() callback
 from a sysfs attribute is carried out with that attribute's s_active
 reference held for reading.  Consequently, if any device sysfs
 attribute that may be removed from within acpi_scan_hot_remove()
 through acpi_bus_trim() has a .store() or .show() callback which
 acquires either acpi_scan_lock or device_hotplug_lock, the execution
 of that callback may deadlock with the removal of the attribute.
 [Unfortunately, the online device attribute of CPUs and memory
 blocks and the eject attribute of ACPI device objects are affected
 by this issue.]
 
 To avoid those deadlocks introduce a new protection mechanism that
 can be used by the device sysfs attributes in question.  Namely,
 if a device sysfs attribute's .store() or .show() callback routine
 is about to acquire device_hotplug_lock or acpi_scan_lock, it can
 first execute read_lock_device_remove() and return an error code if
 that function returns false.  If true is returned, the lock in
 question may be acquired and read_unlock_device_remove() must be
 called.  [This mechanism is implemented by means of an additional
 rwsem in drivers/base/core.c.]
 
 Make the affected sysfs attributes in the driver core and ACPI core
 use read_lock_device_remove() and read_unlock_device_remove() as
 described above.
 
 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Reported-by: Gu Zheng guz.f...@cn.fujitsu.com

Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-25 Thread Gu Zheng
Hi Rafael,

On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:

 From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
 around the acpi_bus_trim() call in acpi_scan_hot_remove() which
 generally removes devices (it removes ACPI device objects at least,
 but it may also remove physical device objects through .detach()
 callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
 attributes are removed under these locks and to remove those
 attributes it is necessary to hold the s_active references of their
 directory entries for writing.
 
 On the other hand, the execution of a .show() or .store() callback
 from a sysfs attribute is carried out with that attribute's s_active
 reference held for reading.  Consequently, if any device sysfs
 attribute that may be removed from within acpi_scan_hot_remove()
 through acpi_bus_trim() has a .store() or .show() callback which
 acquires either acpi_scan_lock or device_hotplug_lock, the execution
 of that callback may deadlock with the removal of the attribute.
 [Unfortunately, the online device attribute of CPUs and memory
 blocks and the eject attribute of ACPI device objects are affected
 by this issue.]
 
 To avoid those deadlocks introduce a new protection mechanism that
 can be used by the device sysfs attributes in question.  Namely,
 if a device sysfs attribute's .store() or .show() callback routine
 is about to acquire device_hotplug_lock or acpi_scan_lock, it can
 first execute read_lock_device_remove() and return an error code if
 that function returns false.  If true is returned, the lock in
 question may be acquired and read_unlock_device_remove() must be
 called.  [This mechanism is implemented by means of an additional
 rwsem in drivers/base/core.c.]
 
 Make the affected sysfs attributes in the driver core and ACPI core
 use read_lock_device_remove() and read_unlock_device_remove() as
 described above.
 
 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Reported-by: Gu Zheng guz.f...@cn.fujitsu.com

I'm sorry to forget to mention that the original reporter is
Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com. I continued
the investigation and found more issues.

We tested this patch on kernel 3.11-rc6, but it seems that the
issue is still there. Detail info as following.

Thanks,
Gu

==  

 
[ INFO: possible circular locking dependency detected ] 

 
3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF

 
--- 

 
kworker/0:2/754 is trying to acquire lock:  

 
 (s_active#73){.+}, at: [8121062b] sysfs_addrm_finish+0x3b/0x70   

 


 
but task is already holding lock:   

 
 (mem_sysfs_mutex){+.+.+.}, at: [813b949d] 
remove_memory_block+0x1d/0xa0   



 
which lock already depends on the new lock. 

 


 


 
the existing dependency chain (in reverse order) is: