Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread David Hildenbrand
On 17.08.2018 10:20, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand  wrote:
>>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>> a) device_lock()
>> b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order. Looking at 1., this order is not always
>> satisfied when calling device_online() - essentially we simply don't take
>> one of both locks anymore - and fixing this would require us to
>> take the mem_hotplug_lock in core driver code (online_store()), which
>> sounds wrong.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>>mem_hotplug_begin() and the calls device_online() - which takes
>>device_lock() - .online does no longer call mem_hotplug_begin(), so
>>effectively calls online_pages() without mem_hotplug_lock. onlining/
>>offlining of pages is no longer serialised across different devices.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>>onlining memory during add_memory() does not take care of that. (I
>>didn't follow how strictly this is needed, but there seems to be a
>>reason because it is documented at device_online() and
>>device_offline()).
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>>(and device_online()) without locks. This was introduced after
>>30467e0b3be. And skimming over the code, I assume it could need some
>>more care in regards to locking.
>>
>> ACPI code already holds the device_hotplug_lock, and as we are
>> effectively hotplugging memory block devices, requiring to hold that
>> lock does not sound too wrong, although not chosen in [1], as
>> "I don't think resolving a locking dependency is appropriate by
>>  just serializing them with another lock."
>> I think this is the cleanest solution.
>>
>> Requiring add_memory()/add_memory_resource() to be called under
>> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
>> online_pages/offline_pages() fixes 1. and 3.
>>
>> Fixup all callers of add_memory/add_memory_resource to hold the lock if
>> not already done.
>>
>> So this is essentially a revert of 30467e0b3be, implementation of what
>> was suggested in [1] by Vitaly, applied to the current tree.
>>
>> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>> 2015-February/065324.html
>>
>> This patch is partly based on a patch by Vitaly Kuznetsov.
>>
>> Signed-off-by: David Hildenbrand 
>> ---
>>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>>  drivers/acpi/acpi_memhotplug.c|  1 +
>>  drivers/base/memory.c | 18 +-
>>  drivers/hv/hv_balloon.c   |  4 +++
>>  drivers/s390/char/sclp_cmd.c  |  3 ++
>>  drivers/xen/balloon.c |  3 ++
>>  mm/memory_hotplug.c   | 42 ++-
>>  7 files changed, 55 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
>> b/arch/powerpc/platforms/powernv/memtrace.c
>> index 51dc398ae3f7..4c2737a33020 100644
>> --- a/arch/powerpc/platforms/powernv/memtrace.c
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>> int i, ret = 0;
>> struct memtrace_entry *ent;
>>
>> +   /* add_memory() requires device_hotplug_lock */
>> +   lock_device_hotplug();
>> for (i = memtrace_array_nr - 1; i >= 0; i--) {
>> ent = _array[i];
>>
>> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>> pr_info("Added trace memory back to node %d\n", ent->nid);
>> ent->size = ent->start = ent->nid = -1;
>> }
>> +   unlock_device_hotplug();
>> if (ret)
>> return ret;
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 6b0d3ef7309c..e7a4c7900967 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
>> acpi_memory_device *mem_device)
>> if (node < 0)
>> node = memory_add_physaddr_to_nid(info->start_addr);
>>
>> +   /* we already hold the device_hotplug lock at this point */
>> result = add_memory(node, info->start_addr, info->length);
>>
>> /*
> 
> A very minor nit here: I would 

Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand  wrote:
>
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order. Looking at 1., this order is not always
> satisfied when calling device_online() - essentially we simply don't take
> one of both locks anymore - and fixing this would require us to
> take the mem_hotplug_lock in core driver code (online_store()), which
> sounds wrong.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
>mem_hotplug_begin() and the calls device_online() - which takes
>device_lock() - .online does no longer call mem_hotplug_begin(), so
>effectively calls online_pages() without mem_hotplug_lock. onlining/
>offlining of pages is no longer serialised across different devices.
>
> 2. device_online() should be called under device_hotplug_lock, however
>onlining memory during add_memory() does not take care of that. (I
>didn't follow how strictly this is needed, but there seems to be a
>reason because it is documented at device_online() and
>device_offline()).
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>(and device_online()) without locks. This was introduced after
>30467e0b3be. And skimming over the code, I assume it could need some
>more care in regards to locking.
>
> ACPI code already holds the device_hotplug_lock, and as we are
> effectively hotplugging memory block devices, requiring to hold that
> lock does not sound too wrong, although not chosen in [1], as
> "I don't think resolving a locking dependency is appropriate by
>  just serializing them with another lock."
> I think this is the cleanest solution.
>
> Requiring add_memory()/add_memory_resource() to be called under
> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
> online_pages/offline_pages() fixes 1. and 3.
>
> Fixup all callers of add_memory/add_memory_resource to hold the lock if
> not already done.
>
> So this is essentially a revert of 30467e0b3be, implementation of what
> was suggested in [1] by Vitaly, applied to the current tree.
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
> 2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
>
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>  drivers/acpi/acpi_memhotplug.c|  1 +
>  drivers/base/memory.c | 18 +-
>  drivers/hv/hv_balloon.c   |  4 +++
>  drivers/s390/char/sclp_cmd.c  |  3 ++
>  drivers/xen/balloon.c |  3 ++
>  mm/memory_hotplug.c   | 42 ++-
>  7 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..4c2737a33020 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -206,6 +206,8 @@ static int memtrace_online(void)
> int i, ret = 0;
> struct memtrace_entry *ent;
>
> +   /* add_memory() requires device_hotplug_lock */
> +   lock_device_hotplug();
> for (i = memtrace_array_nr - 1; i >= 0; i--) {
> ent = _array[i];
>
> @@ -244,6 +246,7 @@ static int memtrace_online(void)
> pr_info("Added trace memory back to node %d\n", ent->nid);
> ent->size = ent->start = ent->nid = -1;
> }
> +   unlock_device_hotplug();
> if (ret)
> return ret;
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..e7a4c7900967 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> +   /* we already hold the device_hotplug lock at this point */
> result = add_memory(node, info->start_addr, info->length);
>
> /*

A very minor nit here: I would say "device_hotplug_lock is already
held at this point" in the comment (I sort of don't like to say "we"
in code comments as it is not particularly clear what 

[PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread David Hildenbrand
There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
a) device_lock()
b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took b),
followed by a), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order. Looking at 1., this order is not always
satisfied when calling device_online() - essentially we simply don't take
one of both locks anymore - and fixing this would require us to
take the mem_hotplug_lock in core driver code (online_store()), which
sounds wrong.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and the calls device_online() - which takes
   device_lock() - .online does no longer call mem_hotplug_begin(), so
   effectively calls online_pages() without mem_hotplug_lock. onlining/
   offlining of pages is no longer serialised across different devices.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that. (I
   didn't follow how strictly this is needed, but there seems to be a
   reason because it is documented at device_online() and
   device_offline()).

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   (and device_online()) without locks. This was introduced after
   30467e0b3be. And skimming over the code, I assume it could need some
   more care in regards to locking.

ACPI code already holds the device_hotplug_lock, and as we are
effectively hotplugging memory block devices, requiring to hold that
lock does not sound too wrong, although not chosen in [1], as
"I don't think resolving a locking dependency is appropriate by
 just serializing them with another lock."
I think this is the cleanest solution.

Requiring add_memory()/add_memory_resource() to be called under
device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
online_pages/offline_pages() fixes 1. and 3.

Fixup all callers of add_memory/add_memory_resource to hold the lock if
not already done.

So this is essentially a revert of 30467e0b3be, implementation of what
was suggested in [1] by Vitaly, applied to the current tree.

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c |  3 ++
 drivers/acpi/acpi_memhotplug.c|  1 +
 drivers/base/memory.c | 18 +-
 drivers/hv/hv_balloon.c   |  4 +++
 drivers/s390/char/sclp_cmd.c  |  3 ++
 drivers/xen/balloon.c |  3 ++
 mm/memory_hotplug.c   | 42 ++-
 7 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..4c2737a33020 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -206,6 +206,8 @@ static int memtrace_online(void)
int i, ret = 0;
struct memtrace_entry *ent;
 
+   /* add_memory() requires device_hotplug_lock */
+   lock_device_hotplug();
for (i = memtrace_array_nr - 1; i >= 0; i--) {
ent = _array[i];
 
@@ -244,6 +246,7 @@ static int memtrace_online(void)
pr_info("Added trace memory back to node %d\n", ent->nid);
ent->size = ent->start = ent->nid = -1;
}
+   unlock_device_hotplug();
if (ret)
return ret;
 
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..e7a4c7900967 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
+   /* we already hold the device_hotplug lock at this point */
result = add_memory(node, info->start_addr, info->length);
 
/*
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..f60507f994df 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -341,19 +341,11 @@ store_mem_state(struct device *dev,
goto err;
}
 
-   /*
-* Memory hotplug needs to hold mem_hotplug_begin() for probe to find
-* the correct memory block to online before doing device_online(dev),
-* which