Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-08 Thread Rafael J. Wysocki
On Tuesday, September 08, 2015 12:40:08 PM Peter Zijlstra wrote:
> On Mon, Sep 07, 2015 at 11:33:21PM +0200, Rafael J. Wysocki wrote:
> > On Monday, September 07, 2015 11:11:19 AM Jiang Liu wrote:
> > Peter, Ingo, some help from a lockdep expert is needed.
> > 
> > We have a splat that almost certainly is a false positive (the original report
> > is here http://marc.info/?l=linux-kernel&m=144109156901959&w=4) and no ideas
> > how to make it go away.  Can you please have a look and advise?
> 
> I can't even find the relevant code :/
> 
> From that email I get kernfs_fop_write() which calls
> kernfs_get_active(), but that does _NOT_ call cpu_up(), so that
> callchain is shite.
> 
> The actual lockdep splat is also not really helpful, and is spraying
> names over: acpi, device, sysfs and kernfs (do we really need that many
> layers of obfuscation for a simple file?)
> 
> So, please, start by explaining the thing proper such that simple people
> like me know what to look for.

OK, I'll try to get to that later this week.

Or maybe Jiang Liu can beat me to doing that. :-)

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-08 Thread Peter Zijlstra
On Mon, Sep 07, 2015 at 11:33:21PM +0200, Rafael J. Wysocki wrote:
> On Monday, September 07, 2015 11:11:19 AM Jiang Liu wrote:
> Peter, Ingo, some help from a lockdep expert is needed.
> 
> We have a splat that almost certainly is a false positive (the original report
> is here http://marc.info/?l=linux-kernel&m=144109156901959&w=4) and no ideas
> how to make it go away.  Can you please have a look and advise?

I can't even find the relevant code :/

From that email I get kernfs_fop_write() which calls
kernfs_get_active(), but that does _NOT_ call cpu_up(), so that
callchain is shite.

The actual lockdep splat is also not really helpful, and is spraying
names over: acpi, device, sysfs and kernfs (do we really need that many
layers of obfuscation for a simple file?)

So, please, start by explaining the thing proper such that simple people
like me know what to look for.


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-07 Thread Rafael J. Wysocki
On Monday, September 07, 2015 11:11:19 AM Jiang Liu wrote:
> On 2015/9/4 22:16, Rafael J. Wysocki wrote:
> > Hi,
> > 
> > On Fri, Sep 4, 2015 at 9:20 AM, Jiang Liu  wrote:
> >> On 2015/9/4 4:08, Rafael J. Wysocki wrote:
> >>> Hi Tejun,
> >>>
> >>> On Thu, Sep 3, 2015 at 6:19 PM, Tejun Heo  wrote:
> >>>> Hello, Rafael.
> >>>>
> >>>> On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
> >>>>> So acpi_device_hotplug() calls lock_device_hotplug() which simply
> >>>>> acquires device_hotplug_lock.  It is held throughout the entire
> >>>>> hot-add/hot-remove code path.
> >>>>>
> >>>>> Writing anything to /sys/devices/system/cpu/cpux/online goes through
> >>>>> online_store() in drivers/base/core.c and that does
> >>>>> lock_device_hotplug_sysfs() which then attempts to acquire
> >>>>> device_hotplug_lock using mutex_trylock().  And it only calls
> >>>>> either device_online() or device_offline() if it ends up with the
> >>>>> lock held.
> >>>>>
> >>>>> Quite frankly, I don't see how these particular two code paths can
> >>>>> deadlock in any way.
> >>>>>
> >>>>> So either a third code path is involved which is not executed
> >>>>> under device_hotplug_lock, or lockdep needs to be told to actually
> >>>>> take device_hotplug_lock into account in this case IMO.
> >>>>
> >>>> Hmm... all sysfs rw functions are protected from removal.  I.e., by
> >>>> default, removal of a sysfs file drains in-flight rw operations, so
> >>>> the hot plug path grabs a lock and then tries to remove a file and
> >>>> writing to the online file makes the file's write method try to
> >>>> grab the same lock.  It deadlocks if the hotunplug path already has
> >>>> the lock and is trying to drain the online file for removal.
> >>>
> >>> My point is that you cannot get into that situation.  If hotplug
> >>> already holds device_hotplug_lock, the write to "online" will end up
> >>> doing restart_syscall().
> >>>
> >>> If the "online" code path is holding the lock, hotplug cannot acquire
> >>> it and cannot proceed.
> >>>
> >>> Am I missing anything?
> >> Hi Rafael,
> >> I think you are right. The lock_device_hotplug_sysfs() has
> >> already provided a solution for such a deadlock scenario. And there's
> >> another related code path at boot as:
> >> smp_init()
> >> ->cpu_up()
> >> ->cpu_hotplug_begin()
> >> So it seems to be a false alarm. Any way to teach lockdep
> >> about this to get rid of the false alarm?
> > 
> > Well, maybe we could call lock_device_hotplug() from that code path too?
> Hi Rafael,
>   Adding lock_device_hotplug() to smp_init() doesn't solve the
> issue. So it seems to be a lockdep false alarm, and I don't know
> how to get rid of such a lockdep false alarm :(

Peter, Ingo, some help from a lockdep expert is needed.

We have a splat that almost certainly is a false positive (the original report
is here http://marc.info/?l=linux-kernel&m=144109156901959&w=4) and no ideas
how to make it go away.  Can you please have a look and advise?

Rafael



Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-06 Thread Jiang Liu
On 2015/9/4 22:16, Rafael J. Wysocki wrote:
> Hi,
> 
> On Fri, Sep 4, 2015 at 9:20 AM, Jiang Liu  wrote:
>> On 2015/9/4 4:08, Rafael J. Wysocki wrote:
>>> Hi Tejun,
>>>
>>> On Thu, Sep 3, 2015 at 6:19 PM, Tejun Heo  wrote:
>>>> Hello, Rafael.
>>>>
>>>> On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
>>>>> So acpi_device_hotplug() calls lock_device_hotplug() which simply
>>>>> acquires device_hotplug_lock.  It is held throughout the entire
>>>>> hot-add/hot-remove code path.
>>>>>
>>>>> Writing anything to /sys/devices/system/cpu/cpux/online goes through
>>>>> online_store() in drivers/base/core.c and that does
>>>>> lock_device_hotplug_sysfs() which then attempts to acquire
>>>>> device_hotplug_lock using mutex_trylock().  And it only calls
>>>>> either device_online() or device_offline() if it ends up with the
>>>>> lock held.
>>>>>
>>>>> Quite frankly, I don't see how these particular two code paths can
>>>>> deadlock in any way.
>>>>>
>>>>> So either a third code path is involved which is not executed
>>>>> under device_hotplug_lock, or lockdep needs to be told to actually
>>>>> take device_hotplug_lock into account in this case IMO.
>>>>
>>>> Hmm... all sysfs rw functions are protected from removal.  I.e., by
>>>> default, removal of a sysfs file drains in-flight rw operations, so
>>>> the hot plug path grabs a lock and then tries to remove a file and
>>>> writing to the online file makes the file's write method try to
>>>> grab the same lock.  It deadlocks if the hotunplug path already has
>>>> the lock and is trying to drain the online file for removal.
>>>
>>> My point is that you cannot get into that situation.  If hotplug
>>> already holds device_hotplug_lock, the write to "online" will end up
>>> doing restart_syscall().
>>>
>>> If the "online" code path is holding the lock, hotplug cannot acquire
>>> it and cannot proceed.
>>>
>>> Am I missing anything?
>> Hi Rafael,
>> I think you are right. The lock_device_hotplug_sysfs() has
>> already provided a solution for such a deadlock scenario. And there's
>> another related code path at boot as:
>> smp_init()
>> ->cpu_up()
>> ->cpu_hotplug_begin()
>> So it seems to be a false alarm. Any way to teach lockdep
>> about this to get rid of the false alarm?
> 
> Well, maybe we could call lock_device_hotplug() from that code path too?
Hi Rafael,
Adding lock_device_hotplug() to smp_init() doesn't solve the
issue. So it seems to be a lockdep false alarm, and I don't know
how to get rid of such a lockdep false alarm :(
Thanks!
Gerry


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-04 Thread Rafael J. Wysocki
Hi,

On Fri, Sep 4, 2015 at 9:20 AM, Jiang Liu  wrote:
> On 2015/9/4 4:08, Rafael J. Wysocki wrote:
>> Hi Tejun,
>>
>> On Thu, Sep 3, 2015 at 6:19 PM, Tejun Heo  wrote:
>>> Hello, Rafael.
>>>
>>> On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
>>>> So acpi_device_hotplug() calls lock_device_hotplug() which simply
>>>> acquires device_hotplug_lock.  It is held throughout the entire
>>>> hot-add/hot-remove code path.
>>>>
>>>> Writing anything to /sys/devices/system/cpu/cpux/online goes through
>>>> online_store() in drivers/base/core.c and that does
>>>> lock_device_hotplug_sysfs() which then attempts to acquire
>>>> device_hotplug_lock using mutex_trylock().  And it only calls
>>>> either device_online() or device_offline() if it ends up with the
>>>> lock held.
>>>>
>>>> Quite frankly, I don't see how these particular two code paths can
>>>> deadlock in any way.
>>>>
>>>> So either a third code path is involved which is not executed
>>>> under device_hotplug_lock, or lockdep needs to be told to actually
>>>> take device_hotplug_lock into account in this case IMO.
>>>
>>> Hmm... all sysfs rw functions are protected from removal.  I.e., by
>>> default, removal of a sysfs file drains in-flight rw operations, so
>>> the hot plug path grabs a lock and then tries to remove a file and
>>> writing to the online file makes the file's write method try to
>>> grab the same lock.  It deadlocks if the hotunplug path already has
>>> the lock and is trying to drain the online file for removal.
>>
>> My point is that you cannot get into that situation.  If hotplug
>> already holds device_hotplug_lock, the write to "online" will end up
>> doing restart_syscall().
>>
>> If the "online" code path is holding the lock, hotplug cannot acquire
>> it and cannot proceed.
>>
>> Am I missing anything?
> Hi Rafael,
> I think you are right. The lock_device_hotplug_sysfs() has
> already provided a solution for such a deadlock scenario. And there's
> another related code path at boot as:
> smp_init()
> ->cpu_up()
> ->cpu_hotplug_begin()
> So it seems to be a false alarm. Any way to teach lockdep
> about this to get rid of the false alarm?

Well, maybe we could call lock_device_hotplug() from that code path too?

Thanks,
Rafael


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-04 Thread Jiang Liu
On 2015/9/4 4:08, Rafael J. Wysocki wrote:
> Hi Tejun,
> 
> On Thu, Sep 3, 2015 at 6:19 PM, Tejun Heo  wrote:
>> Hello, Rafael.
>>
>> On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
>>> So acpi_device_hotplug() calls lock_device_hotplug() which simply
>>> acquires device_hotplug_lock.  It is held throughout the entire
>>> hot-add/hot-remove code path.
>>>
>>> Writing anything to /sys/devices/system/cpu/cpux/online goes through
>>> online_store() in drivers/base/core.c and that does
>>> lock_device_hotplug_sysfs() which then attempts to acquire
>>> device_hotplug_lock using mutex_trylock().  And it only calls
>>> either device_online() or device_offline() if it ends up with the
>>> lock held.
>>>
>>> Quite frankly, I don't see how these particular two code paths can
>>> deadlock in any way.
>>>
>>> So either a third code path is involved which is not executed
>>> under device_hotplug_lock, or lockdep needs to be told to actually
>>> take device_hotplug_lock into account in this case IMO.
>>
>> Hmm... all sysfs rw functions are protected from removal.  I.e., by
>> default, removal of a sysfs file drains in-flight rw operations, so
>> the hot plug path grabs a lock and then tries to remove a file and
>> writing to the online file makes the file's write method try to
>> grab the same lock.  It deadlocks if the hotunplug path already has
>> the lock and is trying to drain the online file for removal.
> 
> My point is that you cannot get into that situation.  If hotplug
> already holds device_hotplug_lock, the write to "online" will end up
> doing restart_syscall().
> 
> If the "online" code path is holding the lock, hotplug cannot acquire
> it and cannot proceed.
> 
> Am I missing anything?
Hi Rafael,
I think you are right. The lock_device_hotplug_sysfs() has
already provided a solution for such a deadlock scenario. And there's
another related code path at boot as:
smp_init()
->cpu_up()
->cpu_hotplug_begin()
So it seems to be a false alarm. Any way to teach lockdep
about this to get rid of the false alarm?
Thanks!
Gerry

> 
> Thanks,
> Rafael
> 


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-03 Thread Rafael J. Wysocki
Hi Tejun,

On Thu, Sep 3, 2015 at 6:19 PM, Tejun Heo  wrote:
> Hello, Rafael.
>
> On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
>> So acpi_device_hotplug() calls lock_device_hotplug() which simply
>> acquires device_hotplug_lock.  It is held throughout the entire
>> hot-add/hot-remove code path.
>>
>> Writing anything to /sys/devices/system/cpu/cpux/online goes through
>> online_store() in drivers/base/core.c and that does
>> lock_device_hotplug_sysfs() which then attempts to acquire
>> device_hotplug_lock using mutex_trylock().  And it only calls
>> either device_online() or device_offline() if it ends up with the
>> lock held.
>>
>> Quite frankly, I don't see how these particular two code paths can
>> deadlock in any way.
>>
>> So either a third code path is involved which is not executed
>> under device_hotplug_lock, or lockdep needs to be told to actually
>> take device_hotplug_lock into account in this case IMO.
>
> Hmm... all sysfs rw functions are protected from removal.  I.e., by
> default, removal of a sysfs file drains in-flight rw operations, so
> the hot plug path grabs a lock and then tries to remove a file and
> writing to the online file makes the file's write method try to
> grab the same lock.  It deadlocks if the hotunplug path already has
> the lock and is trying to drain the online file for removal.

My point is that you cannot get into that situation.  If hotplug
already holds device_hotplug_lock, the write to "online" will end up
doing restart_syscall().

If the "online" code path is holding the lock, hotplug cannot acquire
it and cannot proceed.
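
To make the two cases above concrete, here is a rough userspace sketch of the
trylock-and-back-out behaviour.  It is not the kernel code: a pthread mutex
stands in for device_hotplug_lock, MODEL_RESTART is an arbitrary stand-in for
whatever restart_syscall() returns, and every model_* name is made up for
illustration only.

```c
/* Userspace model only, NOT kernel code.  All model_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>

/* Arbitrary stand-in for the "please restart the syscall" return value. */
#define MODEL_RESTART (-1)

/* Stand-in for device_hotplug_lock. */
static pthread_mutex_t model_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hot-add/hot-remove side: waits until the lock is free. */
static void model_lock_device_hotplug(void)
{
	int err = pthread_mutex_lock(&model_hotplug_lock);

	assert(err == 0);
}

static void model_unlock_device_hotplug(void)
{
	int err = pthread_mutex_unlock(&model_hotplug_lock);

	assert(err == 0);
}

/* sysfs side: never waits for the lock, backs out if it is taken. */
static int model_lock_device_hotplug_sysfs(void)
{
	if (pthread_mutex_trylock(&model_hotplug_lock) == 0)
		return 0;

	return MODEL_RESTART;
}

/* Model of a write to "online": only acts with the lock held. */
static int model_online_store(int *onlined)
{
	int ret = model_lock_device_hotplug_sysfs();

	if (ret)
		return ret;	/* hotplug holds the lock: back out */

	*onlined = 1;		/* stand-in for device_online() */
	model_unlock_device_hotplug();
	return 0;
}
```

In this model a write to "online" that arrives while hotplug holds the lock
returns MODEL_RESTART without ever sleeping on the lock, and a write that
arrives when the lock is free completes and releases it again.
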

Am I missing anything?

Thanks,
Rafael


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-03 Thread Tejun Heo
Hello, Rafael.

On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
> So acpi_device_hotplug() calls lock_device_hotplug() which simply
> acquires device_hotplug_lock.  It is held throughout the entire
> hot-add/hot-remove code path.
> 
> Writing anything to /sys/devices/system/cpu/cpux/online goes through
> online_store() in drivers/base/core.c and that does
> lock_device_hotplug_sysfs() which then attempts to acquire
> device_hotplug_lock using mutex_trylock().  And it only calls
> either device_online() or device_offline() if it ends up with the
> lock held.
> 
> Quite frankly, I don't see how these particular two code paths can
> deadlock in any way.
> 
> So either a third code path is involved which is not executed
> under device_hotplug_lock, or lockdep needs to be told to actually
> take device_hotplug_lock into account in this case IMO.

Hmm... all sysfs rw functions are protected from removal.  I.e., by
default, removal of a sysfs file drains in-flight rw operations, so
the hot plug path grabs a lock and then tries to remove a file and
writing to the online file makes the file's write method try to
grab the same lock.  It deadlocks if the hotunplug path already has
the lock and is trying to drain the online file for removal.

The same problem exists for "delete" files but that's already handled
from device core side.
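
The drain-on-removal interaction described above can be sketched as a small
single-threaded model.  This is only an illustration under simplifying
assumptions, not the kernfs implementation: the in-flight count is a plain
integer, the hotplug lock is a boolean, and all model_* names are
hypothetical.

```c
/* Userspace model only, NOT kernel code.  All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct model_file {
	int active;		/* in-flight read/write operations */
	bool deactivated;	/* set once removal has started */
};

/* Stand-in for the lock held by the hot-unplug path. */
static bool model_hotplug_lock_held;

/* A read/write op pins the file first; this fails once removal began. */
static bool model_get_active(struct model_file *f)
{
	if (f->deactivated)
		return false;
	f->active++;
	return true;
}

/* Called when the read/write op returns, successfully or not. */
static void model_put_active(struct model_file *f)
{
	assert(f->active > 0);
	f->active--;
}

/* Removal: refuse new ops, then report whether draining must still wait. */
static bool model_remove_must_wait(struct model_file *f)
{
	f->deactivated = true;
	return f->active > 0;
}

/* Non-blocking attempt to take the hotplug lock. */
static bool model_trylock_hotplug(void)
{
	if (model_hotplug_lock_held)
		return false;
	model_hotplug_lock_held = true;
	return true;
}
```

If a writer pins the file and then sleeps on the lock while the lock holder
waits in removal for the in-flight count to reach zero, neither side can make
progress.  If the writer only tries the lock and drops its pin on failure,
the count reaches zero and removal can finish.
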

Thanks.

-- 
tejun


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-03 Thread Rafael J. Wysocki
Hi Tejun,

On Thu, Sep 3, 2015 at 6:19 PM, Tejun Heo wrote:
> Hello, Rafael.
>
> On Thu, Sep 03, 2015 at 02:58:16AM +0200, Rafael J. Wysocki wrote:
>> So acpi_device_hotplug() calls lock_device_hotplug() which simply
>> acquires device_hotplug_lock.  It is held throughout the entire
>> hot-add/hot-remove code path.
>>
>> Writing anything to /sys/devices/system/cpu/cpux/online goes through
>> online_store() in drivers/base/core.c and that does
>> lock_device_hotplug_sysfs() which then attempts to acquire
>> device_hotplug_lock using mutex_trylock().  And it only calls
>> either device_online() or device_offline() if it ends up with the
>> lock held.
>>
>> Quite frankly, I don't see how these particular two code paths can
>> deadlock in any way.
>>
>> So either a third code path is involved which is not executed
>> under device_hotplug_lock, or lockdep needs to be told to actually
>> take device_hotplug_lock into account in this case IMO.
>
> Hmm... all sysfs rw functions are protected from removal, i.e. by
> default, removal of a sysfs file drains in-flight rw operations.  So
> the hotplug path grabs a lock and then tries to remove a file, while
> writing to the online file makes the file's write method try to
> grab the same lock.  It deadlocks if the hot-unplug path already holds
> the lock and is trying to drain the online file for removal.

My point is that you cannot get into that situation.  If hotplug
already holds device_hotplug_lock, the write to "online" will end up
doing restart_syscall().

If the "online" code path is holding the lock, hotplug cannot acquire
it and cannot proceed.

Am I missing anything?

Thanks,
Rafael


Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-02 Thread Rafael J. Wysocki
On Wednesday, September 02, 2015 12:14:45 PM Tejun Heo wrote:
> On Tue, Sep 01, 2015 at 03:12:34PM +0800, Jiang Liu wrote:
> > Hi Rafael and Tejun,
> > Running CPU hotplug tests triggers a lockdep warning
> > as follows. The two possible deadlock paths are:
> > 1) echo x > /sys/devices/system/cpu/cpux/online
> >    ->kernfs_fop_write()
> >      ->kernfs_get_active()
> > 1.a)   ->rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
> >      ->cpu_up()
> > 1.b)   ->cpu_hotplug_begin()[lock_map_acquire(&cpu_hotplug.dep_map)]
> > 2) hardware triggers hotplug events
> >    ->acpi_device_hotplug()
> >      ->acpi_processor_remove()
> > 2.a)   ->cpu_hotplug_begin()[lock_map_acquire(&cpu_hotplug.dep_map)]
> >        ->unregister_cpu()
> >          ->device_del()
> >            ->kernfs_remove_by_name_ns()
> >              ->__kernfs_remove()
> >                ->kernfs_drain()
> > 2.b)             ->rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_)
> > 
> > So there is a possible deadlock scenario among 1.a, 1.b, 2.a and 2.b.
> > I'm not familiar with kernfs, so could you please help to comment:
> > 1) whether this is a real deadlock issue?
> 
> Yes, it seems to be.  It's highly unlikely but still possible.

Hmm.

So acpi_device_hotplug() calls lock_device_hotplug() which simply
acquires device_hotplug_lock.  It is held throughout the entire
hot-add/hot-remove code path.

Writing anything to /sys/devices/system/cpu/cpux/online goes through
online_store() in drivers/base/core.c and that does
lock_device_hotplug_sysfs() which then attempts to acquire
device_hotplug_lock using mutex_trylock().  And it only calls
either device_online() or device_offline() if it ends up with the
lock held.

Quite frankly, I don't see how these particular two code paths can
deadlock in any way.

So either a third code path is involved which is not executed
under device_hotplug_lock, or lockdep needs to be told to actually
take device_hotplug_lock into account in this case IMO.
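
To make the trylock argument above concrete, here is a minimal userspace
sketch in C.  It is not the kernel code: hotplug_lock, MODEL_RESTART and
all model_* functions are hypothetical stand-ins, with a C11 atomic_flag
playing the role of device_hotplug_lock and a negative EAGAIN standing in
for what restart_syscall() would report.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for device_hotplug_lock. */
static atomic_flag hotplug_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the result of restart_syscall(). */
#define MODEL_RESTART (-EAGAIN)

/* Models mutex_trylock(): true if the lock was free and is now ours. */
static bool model_trylock(void)
{
	/* test_and_set returns the previous value; false means it was free. */
	return !atomic_flag_test_and_set(&hotplug_lock);
}

/* Models lock_device_hotplug(): waits (here: spins) for the lock. */
static void model_lock_hotplug(void)
{
	while (!model_trylock())
		;
}

static void model_unlock_hotplug(void)
{
	atomic_flag_clear(&hotplug_lock);
}

/* Models lock_device_hotplug_sysfs(): never waits for the lock. */
static int model_lock_hotplug_sysfs(void)
{
	return model_trylock() ? 0 : MODEL_RESTART;
}

/* Models online_store(): changes state only with the lock held. */
static int model_online_store(int *online, int val)
{
	int ret = model_lock_hotplug_sysfs();

	if (ret)
		return ret;	/* back off; the caller retries the write */
	*online = val;
	model_unlock_hotplug();
	return 0;
}
```

In this model a write made while the hotplug path holds the lock returns
MODEL_RESTART immediately and leaves the state untouched, which is the
property the argument relies on.  The sketch says nothing about what
lockdep tracks for the real locks.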

> > 2) any recommended way to get it fixed?
> 
> This usually happens with "delete" files and it's worked around by
> performing special self-removal on the file before actually removing
> the device.  I suppose on/offline files would need to turn off
> active_protection with kernfs_[un]break_active_protection() which
> should probably grow sysfs and device layer wrappers.

Thanks,
Rafael



Re: Possible deadlock related to CPU hotplug and kernfs

2015-09-02 Thread Tejun Heo
On Tue, Sep 01, 2015 at 03:12:34PM +0800, Jiang Liu wrote:
> Hi Rafael and Tejun,
> Running CPU hotplug tests triggers a lockdep warning
> as follows. The two possible deadlock paths are:
> 1) echo x > /sys/devices/system/cpu/cpux/online
>    ->kernfs_fop_write()
>      ->kernfs_get_active()
> 1.a)   ->rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
>      ->cpu_up()
> 1.b)   ->cpu_hotplug_begin()[lock_map_acquire(&cpu_hotplug.dep_map)]
> 2) hardware triggers hotplug events
>    ->acpi_device_hotplug()
>      ->acpi_processor_remove()
> 2.a)   ->cpu_hotplug_begin()[lock_map_acquire(&cpu_hotplug.dep_map)]
>        ->unregister_cpu()
>          ->device_del()
>            ->kernfs_remove_by_name_ns()
>              ->__kernfs_remove()
>                ->kernfs_drain()
> 2.b)             ->rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_)
> 
> So there is a possible deadlock scenario among 1.a, 1.b, 2.a and 2.b.
> I'm not familiar with kernfs, so could you please help to comment:
> 1) whether this is a real deadlock issue?

Yes, it seems to be.  It's highly unlikely but still possible.

> 2) any recommended way to get it fixed?

This usually happens with "delete" files and it's worked around by
performing special self-removal on the file before actually removing
the device.  I suppose on/offline files would need to turn off
active_protection with kernfs_[un]break_active_protection() which
should probably grow sysfs and device layer wrappers.
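
As a rough illustration of the active-protection idea, the single-threaded
C model below tracks in-flight operations with a plain counter.  All names
(struct model_node and the model_* functions) are hypothetical; the real
counterparts would be kernfs_get_active(), kernfs_put_active(),
kernfs_drain() and kernfs_[un]break_active_protection(), whose internals
are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical model of a kernfs node's active-reference state. */
struct model_node {
	int active;		/* in-flight read/write operations */
	bool deactivated;	/* set once removal has started */
};

/* A file operation takes an active reference before running. */
static bool model_get_active(struct model_node *kn)
{
	if (kn->deactivated)
		return false;	/* node is going away, refuse new users */
	kn->active++;
	return true;
}

static void model_put_active(struct model_node *kn)
{
	kn->active--;
}

/* Start removal; returns true if draining would have to wait. */
static bool model_remove_would_block(struct model_node *kn)
{
	kn->deactivated = true;
	return kn->active > 0;
}

/* The running operation gives up its reference so removal can proceed. */
static void model_break_active_protection(struct model_node *kn)
{
	model_put_active(kn);
}

/* Restore the reference so the normal put on exit stays balanced. */
static void model_unbreak_active_protection(struct model_node *kn)
{
	kn->active++;
}
```

In the model, removal blocks while a write method still holds its active
reference, and stops blocking once that method has broken active
protection before taking any lock the removal path may already hold.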

Thanks.

-- 
tejun


Possible deadlock related to CPU hotplug and kernfs

2015-09-01 Thread Jiang Liu
Hi Rafael and Tejun,
Running CPU hotplug tests triggers a lockdep warning
as follows. The two possible deadlock paths are:
1) echo x > /sys/devices/system/cpu/cpux/online
   ->kernfs_fop_write()
     ->kernfs_get_active()
1.a)   ->rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
     ->cpu_up()
1.b)   ->cpu_hotplug_begin()[lock_map_acquire(&cpu_hotplug.dep_map)]
2) hardware triggers hotplug events
   ->acpi_device_hotplug()
     ->acpi_processor_remove()
2.a)   ->cpu_hotplug_begin()[lock_map_acquire(&cpu_hotplug.dep_map)]
       ->unregister_cpu()
         ->device_del()
           ->kernfs_remove_by_name_ns()
             ->__kernfs_remove()
               ->kernfs_drain()
2.b)             ->rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_)

So there is a possible deadlock scenario among 1.a, 1.b, 2.a and 2.b.
I'm not familiar with kernfs, so could you please help to comment:
1) whether this is a real deadlock issue?
2) any recommended way to get it fixed?
Thanks!
Gerry
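
The inversion between the two paths can be sketched with a toy lock-order
graph in C.  This is only an illustration of the general idea behind
lock-order checking, not how lockdep is implemented; the enum values, the
edge[] table and both functions are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical lock classes for the two paths described above. */
enum { S_ACTIVE, CPU_HOTPLUG, NR_LOCKS };

/* edge[a][b] is true once lock b has been taken while holding lock a. */
static bool edge[NR_LOCKS][NR_LOCKS];

/* Depth-first search: can 'to' be reached from 'from' via known edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record that 'acquired' was taken while 'held' was held.  Returns true
 * if an existing chain already leads from 'acquired' back to 'held',
 * i.e. the new dependency closes a cycle.
 */
static bool record_dependency(int held, int acquired)
{
	bool seen[NR_LOCKS] = { false };
	bool cycle = reachable(acquired, held, seen);

	edge[held][acquired] = true;
	return cycle;
}
```

Recording path 1 as record_dependency(S_ACTIVE, CPU_HOTPLUG) reports no
cycle; recording path 2 afterwards as record_dependency(CPU_HOTPLUG,
S_ACTIVE) does.  Such a graph knows nothing about an outer lock that
might serialize the two paths.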

Full lockdep warnings:
[  310.309391] [ INFO: possible circular locking dependency detected ]
[  310.316462] 4.2.0-rc8+ #7 Not tainted
[  310.320613] ---
[  310.327684] kworker/u288:3/388 is trying to acquire lock:
[  310.333780]  (s_active#97){.+}, at: kernfs_remove_by_name_ns+0x49/0xb0
[  310.343885]
[  310.343885] but task is already holding lock:
[  310.350466]  (cpu_hotplug.lock#2){+.+.+.}, at: cpu_hotplug_begin+0x7b/0xc0
[  310.360564]
[  310.360564] which lock already depends on the new lock.
[  310.360564]
[  310.369766]
[  310.369766] the existing dependency chain (in reverse order) is:
[  310.378198]
[  310.378198] -> #3 (cpu_hotplug.lock#2){+.+.+.}:
[  310.383821]        lock_acquire+0xdd/0x2a0
[  310.390591]        mutex_lock_nested+0x70/0x3e0
[  310.397847]        cpu_hotplug_begin+0x7b/0xc0
[  310.405004]        _cpu_up+0x31/0x140
[  310.411285]        cpu_up+0x7c/0xa0
[  310.417362]        smp_init+0x86/0x88
[  310.423647]        kernel_init_freeable+0x171/0x286
[  310.431292]        kernel_init+0xe/0xe0
[  310.437771]        ret_from_fork+0x3f/0x70
[  310.444540]
[  310.444540] -> #2 (cpu_hotplug.lock){++}:
[  310.449957]        lock_acquire+0xdd/0x2a0
[  310.456714]        cpu_hotplug_begin+0x6d/0xc0
[  310.463871]        _cpu_up+0x31/0x140
[  310.470143]        cpu_up+0x7c/0xa0
[  310.476228]        smp_init+0x86/0x88
[  310.482509]        kernel_init_freeable+0x171/0x286
[  310.490153]        kernel_init+0xe/0xe0
[  310.496628]        ret_from_fork+0x3f/0x70
[  310.503393]
[  310.503393] -> #1 (cpu_add_remove_lock){+.+.+.}:
[  310.509099]        lock_acquire+0xdd/0x2a0
[  310.515866]        __might_fault+0x84/0xb0
[  310.522635]        kernfs_fop_write+0x8f/0x190
[  310.529793]        __vfs_write+0x28/0xe0
[  310.536368]        vfs_write+0xac/0x1a0
[  310.542833]        SyS_write+0x49/0xb0
[  310.549212]        entry_SYSCALL_64_fastpath+0x16/0x7a
[  310.557149]
[  310.557149] -> #0 (s_active#97){.+}:
[  310.562135]        __lock_acquire+0x21b9/0x21c0
[  310.569391]        lock_acquire+0xdd/0x2a0
[  310.576159]        __kernfs_remove+0x231/0x330
[  310.583318]        kernfs_remove_by_name_ns+0x49/0xb0
[  310.591154]        sysfs_remove_file_ns+0x15/0x20
[  310.598594]        device_remove_attrs+0x3e/0x80
[  310.605948]        device_del+0x138/0x270
[  310.612617]        device_unregister+0x22/0x70
[  310.619767]        unregister_cpu+0x39/0x60
[  310.626622]        arch_unregister_cpu+0x23/0x30
[  310.633974]        acpi_processor_remove+0x91/0xca
[  310.641524]        acpi_bus_trim+0x5a/0x8d
[  310.648292]        acpi_bus_trim+0x38/0x8d
[  310.655060]        acpi_scan_device_not_present+0x1d/0x3d
[  310.663312]        acpi_scan_bus_check+0x29/0xa2
[  310.670654]        acpi_device_hotplug+0x99/0x3fa
[  310.678103]        acpi_hotplug_work_fn+0x1f/0x2b
[  310.68]        process_one_work+0x1f1/0x7c0
[  310.692814]        worker_thread+0x69/0x480
[  310.699677]        kthread+0x11f/0x140
[  310.706046]        ret_from_fork+0x3f/0x70
[  310.712815]
[  310.712815] other info that might help us debug this:
[  310.712815]
[  310.721907] Chain exists of:
[  310.721907]   s_active#97 --> cpu_hotplug.lock --> cpu_hotplug.lock#2
[  310.721907]
[  310.731680]  Possible unsafe locking scenario:
[  310.731680]
[  310.738413]        CPU0                    CPU1
[  310.743562]        ----                    ----
[  310.748710]   lock(cpu_hotplug.lock#2);
[  310.753261]                                lock(cpu_hotplug.lock);
[  310.760382]                                lock(cpu_hotplug.lock#2);
[  310.767755]   lock(s_active#97);
[  310.771625]
[  310.771625]  *** DEADLOCK ***
[  310.771625]
[  310.778382] 7 locks held by kworker/u288:3/388:
[  310.783530]  #0:  ("kacpi_hotplug"){.+.+.+}, at: []
