Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
On 17.08.2018 10:20, Rafael J. Wysocki wrote: > On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand wrote: >> >> There seem to be some problems as result of 30467e0b3be ("mm, hotplug: >> fix concurrent memory hot-add deadlock"), which tried to fix a possible >> lock inversion reported and discussed in [1] due to the two locks >> a) device_lock() >> b) mem_hotplug_lock >> >> While add_memory() first takes b), followed by a) during >> bus_probe_device(), onlining of memory from user space first took b), >> followed by a), exposing a possible deadlock. >> >> In [1], and it was decided to not make use of device_hotplug_lock, but >> rather to enforce a locking order. Looking at 1., this order is not always >> satisfied when calling device_online() - essentially we simply don't take >> one of both locks anymore - and fixing this would require us to >> take the mem_hotplug_lock in core driver code (online_store()), which >> sounds wrong. >> >> The problems I spotted related to this: >> >> 1. Memory block device attributes: While .state first calls >>mem_hotplug_begin() and the calls device_online() - which takes >>device_lock() - .online does no longer call mem_hotplug_begin(), so >>effectively calls online_pages() without mem_hotplug_lock. onlining/ >>offlining of pages is no longer serialised across different devices. >> >> 2. device_online() should be called under device_hotplug_lock, however >>onlining memory during add_memory() does not take care of that. (I >>didn't follow how strictly this is needed, but there seems to be a >>reason because it is documented at device_online() and >>device_offline()). >> >> In addition, I think there is also something wrong about the locking in >> >> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() >>(and device_online()) without locks. This was introduced after >>30467e0b3be. And skimming over the code, I assume it could need some >>more care in regards to locking. >> >> ACPI code already holds the device_hotplug_lock, and as we are >> effectively hotplugging memory block devices, requiring to hold that >> lock does not sound too wrong, although not chosen in [1], as >> "I don't think resolving a locking dependency is appropriate by >> just serializing them with another lock." >> I think this is the cleanest solution. >> >> Requiring add_memory()/add_memory_resource() to be called under >> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in >> online_pages/offline_pages() fixes 1. and 3. >> >> Fixup all callers of add_memory/add_memory_resource to hold the lock if >> not already done. >> >> So this is essentially a revert of 30467e0b3be, implementation of what >> was suggested in [1] by Vitaly, applied to the current tree. >> >> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ >> 2015-February/065324.html >> >> This patch is partly based on a patch by Vitaly Kuznetsov. >> >> Signed-off-by: David Hildenbrand >> --- >> arch/powerpc/platforms/powernv/memtrace.c | 3 ++ >> drivers/acpi/acpi_memhotplug.c| 1 + >> drivers/base/memory.c | 18 +- >> drivers/hv/hv_balloon.c | 4 +++ >> drivers/s390/char/sclp_cmd.c | 3 ++ >> drivers/xen/balloon.c | 3 ++ >> mm/memory_hotplug.c | 42 ++- >> 7 files changed, 55 insertions(+), 19 deletions(-) >> >> diff --git a/arch/powerpc/platforms/powernv/memtrace.c >> b/arch/powerpc/platforms/powernv/memtrace.c >> index 51dc398ae3f7..4c2737a33020 100644 >> --- a/arch/powerpc/platforms/powernv/memtrace.c >> +++ b/arch/powerpc/platforms/powernv/memtrace.c >> @@ -206,6 +206,8 @@ static int memtrace_online(void) >> int i, ret = 0; >> struct memtrace_entry *ent; >> >> + /* add_memory() requires device_hotplug_lock */ >> + lock_device_hotplug(); >> for (i = memtrace_array_nr - 1; i >= 0; i--) { >> ent = _array[i]; >> >> @@ -244,6 +246,7 @@ static int memtrace_online(void) >> pr_info("Added trace memory back to node %d\n", ent->nid); >> ent->size = ent->start = ent->nid = -1; >> } >> + unlock_device_hotplug(); >> if (ret) >> return ret; >> >> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c >> index 6b0d3ef7309c..e7a4c7900967 100644 >> --- a/drivers/acpi/acpi_memhotplug.c >> +++ b/drivers/acpi/acpi_memhotplug.c >> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct >> acpi_memory_device *mem_device) >> if (node < 0) >> node = memory_add_physaddr_to_nid(info->start_addr); >> >> + /* we already hold the device_hotplug lock at this point */ >> result = add_memory(node, info->start_addr, info->length); >> >> /* > > A very minor nit here: I would
Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand wrote: > > There seem to be some problems as result of 30467e0b3be ("mm, hotplug: > fix concurrent memory hot-add deadlock"), which tried to fix a possible > lock inversion reported and discussed in [1] due to the two locks > a) device_lock() > b) mem_hotplug_lock > > While add_memory() first takes b), followed by a) during > bus_probe_device(), onlining of memory from user space first took b), > followed by a), exposing a possible deadlock. > > In [1], and it was decided to not make use of device_hotplug_lock, but > rather to enforce a locking order. Looking at 1., this order is not always > satisfied when calling device_online() - essentially we simply don't take > one of both locks anymore - and fixing this would require us to > take the mem_hotplug_lock in core driver code (online_store()), which > sounds wrong. > > The problems I spotted related to this: > > 1. Memory block device attributes: While .state first calls >mem_hotplug_begin() and the calls device_online() - which takes >device_lock() - .online does no longer call mem_hotplug_begin(), so >effectively calls online_pages() without mem_hotplug_lock. onlining/ >offlining of pages is no longer serialised across different devices. > > 2. device_online() should be called under device_hotplug_lock, however >onlining memory during add_memory() does not take care of that. (I >didn't follow how strictly this is needed, but there seems to be a >reason because it is documented at device_online() and >device_offline()). > > In addition, I think there is also something wrong about the locking in > > 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() >(and device_online()) without locks. This was introduced after >30467e0b3be. And skimming over the code, I assume it could need some >more care in regards to locking. > > ACPI code already holds the device_hotplug_lock, and as we are > effectively hotplugging memory block devices, requiring to hold that > lock does not sound too wrong, although not chosen in [1], as > "I don't think resolving a locking dependency is appropriate by > just serializing them with another lock." > I think this is the cleanest solution. > > Requiring add_memory()/add_memory_resource() to be called under > device_hotplug_lock fixes 2., taking the mem_hotplug_lock in > online_pages/offline_pages() fixes 1. and 3. > > Fixup all callers of add_memory/add_memory_resource to hold the lock if > not already done. > > So this is essentially a revert of 30467e0b3be, implementation of what > was suggested in [1] by Vitaly, applied to the current tree. > > [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ > 2015-February/065324.html > > This patch is partly based on a patch by Vitaly Kuznetsov. > > Signed-off-by: David Hildenbrand > --- > arch/powerpc/platforms/powernv/memtrace.c | 3 ++ > drivers/acpi/acpi_memhotplug.c| 1 + > drivers/base/memory.c | 18 +- > drivers/hv/hv_balloon.c | 4 +++ > drivers/s390/char/sclp_cmd.c | 3 ++ > drivers/xen/balloon.c | 3 ++ > mm/memory_hotplug.c | 42 ++- > 7 files changed, 55 insertions(+), 19 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/memtrace.c > b/arch/powerpc/platforms/powernv/memtrace.c > index 51dc398ae3f7..4c2737a33020 100644 > --- a/arch/powerpc/platforms/powernv/memtrace.c > +++ b/arch/powerpc/platforms/powernv/memtrace.c > @@ -206,6 +206,8 @@ static int memtrace_online(void) > int i, ret = 0; > struct memtrace_entry *ent; > > + /* add_memory() requires device_hotplug_lock */ > + lock_device_hotplug(); > for (i = memtrace_array_nr - 1; i >= 0; i--) { > ent = _array[i]; > > @@ -244,6 +246,7 @@ static int memtrace_online(void) > pr_info("Added trace memory back to node %d\n", ent->nid); > ent->size = ent->start = ent->nid = -1; > } > + unlock_device_hotplug(); > if (ret) > return ret; > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > index 6b0d3ef7309c..e7a4c7900967 100644 > --- a/drivers/acpi/acpi_memhotplug.c > +++ b/drivers/acpi/acpi_memhotplug.c > @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct > acpi_memory_device *mem_device) > if (node < 0) > node = memory_add_physaddr_to_nid(info->start_addr); > > + /* we already hold the device_hotplug lock at this point */ > result = add_memory(node, info->start_addr, info->length); > > /* A very minor nit here: I would say "device_hotplug_lock is already held at this point" in the comment (I sort of don't like to say "we" in code comments as it is not particularly clear what
[PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
There seem to be some problems as result of 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"), which tried to fix a possible lock inversion reported and discussed in [1] due to the two locks a) device_lock() b) mem_hotplug_lock While add_memory() first takes b), followed by a) during bus_probe_device(), onlining of memory from user space first took b), followed by a), exposing a possible deadlock. In [1], and it was decided to not make use of device_hotplug_lock, but rather to enforce a locking order. Looking at 1., this order is not always satisfied when calling device_online() - essentially we simply don't take one of both locks anymore - and fixing this would require us to take the mem_hotplug_lock in core driver code (online_store()), which sounds wrong. The problems I spotted related to this: 1. Memory block device attributes: While .state first calls mem_hotplug_begin() and the calls device_online() - which takes device_lock() - .online does no longer call mem_hotplug_begin(), so effectively calls online_pages() without mem_hotplug_lock. onlining/ offlining of pages is no longer serialised across different devices. 2. device_online() should be called under device_hotplug_lock, however onlining memory during add_memory() does not take care of that. (I didn't follow how strictly this is needed, but there seems to be a reason because it is documented at device_online() and device_offline()). In addition, I think there is also something wrong about the locking in 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() (and device_online()) without locks. This was introduced after 30467e0b3be. And skimming over the code, I assume it could need some more care in regards to locking. ACPI code already holds the device_hotplug_lock, and as we are effectively hotplugging memory block devices, requiring to hold that lock does not sound too wrong, although not chosen in [1], as "I don't think resolving a locking dependency is appropriate by just serializing them with another lock." I think this is the cleanest solution. Requiring add_memory()/add_memory_resource() to be called under device_hotplug_lock fixes 2., taking the mem_hotplug_lock in online_pages/offline_pages() fixes 1. and 3. Fixup all callers of add_memory/add_memory_resource to hold the lock if not already done. So this is essentially a revert of 30467e0b3be, implementation of what was suggested in [1] by Vitaly, applied to the current tree. [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ 2015-February/065324.html This patch is partly based on a patch by Vitaly Kuznetsov. Signed-off-by: David Hildenbrand --- arch/powerpc/platforms/powernv/memtrace.c | 3 ++ drivers/acpi/acpi_memhotplug.c| 1 + drivers/base/memory.c | 18 +- drivers/hv/hv_balloon.c | 4 +++ drivers/s390/char/sclp_cmd.c | 3 ++ drivers/xen/balloon.c | 3 ++ mm/memory_hotplug.c | 42 ++- 7 files changed, 55 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 51dc398ae3f7..4c2737a33020 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -206,6 +206,8 @@ static int memtrace_online(void) int i, ret = 0; struct memtrace_entry *ent; + /* add_memory() requires device_hotplug_lock */ + lock_device_hotplug(); for (i = memtrace_array_nr - 1; i >= 0; i--) { ent = _array[i]; @@ -244,6 +246,7 @@ static int memtrace_online(void) pr_info("Added trace memory back to node %d\n", ent->nid); ent->size = ent->start = ent->nid = -1; } + unlock_device_hotplug(); if (ret) return ret; diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 6b0d3ef7309c..e7a4c7900967 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) if (node < 0) node = memory_add_physaddr_to_nid(info->start_addr); + /* we already hold the device_hotplug lock at this point */ result = add_memory(node, info->start_addr, info->length); /* diff --git a/drivers/base/memory.c b/drivers/base/memory.c index c8a1cb0b6136..f60507f994df 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -341,19 +341,11 @@ store_mem_state(struct device *dev, goto err; } - /* -* Memory hotplug needs to hold mem_hotplug_begin() for probe to find -* the correct memory block to online before doing device_online(dev), -* which