Re: [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
On Mon, Oct 12, 2020 at 02:53:21PM +0200, David Hildenbrand wrote: >virtio-mem soon wants to use offline_and_remove_memory() memory that >exceeds a single Linux memory block (memory_block_size_bytes()). Let's >remove that restriction. > >Let's remember the old state and try to restore that if anything goes >wrong. While re-onlining can, in general, fail, it's highly unlikely to >happen (usually only when a notifier fails to allocate memory, and these >are rather rare). > >This will be used by virtio-mem to offline+remove memory ranges that are >bigger than a single memory block - for example, with a device block >size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory >block size of 128MB. > >While we could compress the state into 2 bit, using 8 bit is much >easier. > >This handling is similar, but different to acpi_scan_try_to_offline(): > >a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG >optimization is still relevant - it should only apply to ZONE_NORMAL >(where we have no guarantees). If relevant, we can always add it. > >b) acpi_scan_try_to_offline() simply onlines all memory in case >something goes wrong. It doesn't restore previous online type. Let's do >that, so we won't overwrite what e.g., user space configured. > >Cc: "Michael S. Tsirkin" >Cc: Jason Wang >Cc: Pankaj Gupta >Cc: Michal Hocko >Cc: Oscar Salvador >Cc: Wei Yang >Cc: Andrew Morton >Signed-off-by: David Hildenbrand Looks good to me. Reviewed-by: Wei Yang >--- > mm/memory_hotplug.c | 105 +--- > 1 file changed, 89 insertions(+), 16 deletions(-) > >diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >index b44d4c7ba73b..217080ca93e5 100644 >--- a/mm/memory_hotplug.c >+++ b/mm/memory_hotplug.c >@@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size) > } > EXPORT_SYMBOL_GPL(remove_memory); > >+static int try_offline_memory_block(struct memory_block *mem, void *arg) >+{ >+ uint8_t online_type = MMOP_ONLINE_KERNEL; >+ uint8_t **online_types = arg; >+ struct page *page; >+ int rc; >+ >+ /* >+ * Sense the online_type via the zone of the memory block. Offlining >+ * with multiple zones within one memory block will be rejected >+ * by offlining code ... so we don't care about that. >+ */ >+ page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr)); >+ if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE) >+ online_type = MMOP_ONLINE_MOVABLE; >+ >+ rc = device_offline(>dev); >+ /* >+ * Default is MMOP_OFFLINE - change it only if offlining succeeded, >+ * so try_reonline_memory_block() can do the right thing. >+ */ >+ if (!rc) >+ **online_types = online_type; >+ >+ (*online_types)++; >+ /* Ignore if already offline. */ >+ return rc < 0 ? rc : 0; >+} >+ >+static int try_reonline_memory_block(struct memory_block *mem, void *arg) >+{ >+ uint8_t **online_types = arg; >+ int rc; >+ >+ if (**online_types != MMOP_OFFLINE) { >+ mem->online_type = **online_types; >+ rc = device_online(>dev); >+ if (rc < 0) >+ pr_warn("%s: Failed to re-online memory: %d", >+ __func__, rc); >+ } >+ >+ /* Continue processing all remaining memory blocks. */ >+ (*online_types)++; >+ return 0; >+} >+ > /* >- * Try to offline and remove a memory block. Might take a long time to >- * finish in case memory is still in use. Primarily useful for memory devices >- * that logically unplugged all memory (so it's no longer in use) and want to >- * offline + remove the memory block. >+ * Try to offline and remove memory. Might take a long time to finish in case >+ * memory is still in use. Primarily useful for memory devices that logically >+ * unplugged all memory (so it's no longer in use) and want to offline + >remove >+ * that memory. > */ > int offline_and_remove_memory(int nid, u64 start, u64 size) > { >- struct memory_block *mem; >- int rc = -EINVAL; >+ const unsigned long mb_count = size / memory_block_size_bytes(); >+ uint8_t *online_types, *tmp; >+ int rc; > > if (!IS_ALIGNED(start, memory_block_size_bytes()) || >- size != memory_block_size_bytes()) >- return rc; >+ !IS_ALIGNED(size, memory_block_size_bytes()) || !size) >+ return -EINVAL; >+ >+ /* >+ * We'll remember the old online type of each memory block, so we can >+ * try to revert whatever we did when offlining one memory block fails >+ * after offlining some others succeeded. >+ */ >+ online_types = kmalloc_array(mb_count, sizeof(*online_types), >+ GFP_KERNEL); >+ if (!online_types) >+ return -ENOMEM; >+ /* >+ * Initialize all states to MMOP_OFFLINE, so when we abort processing
Re: [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
On Mon, Oct 12, 2020 at 02:53:21PM +0200, David Hildenbrand wrote: > virtio-mem soon wants to use offline_and_remove_memory() memory that > exceeds a single Linux memory block (memory_block_size_bytes()). Let's > remove that restriction. > > Let's remember the old state and try to restore that if anything goes > wrong. While re-onlining can, in general, fail, it's highly unlikely to > happen (usually only when a notifier fails to allocate memory, and these > are rather rare). > > This will be used by virtio-mem to offline+remove memory ranges that are > bigger than a single memory block - for example, with a device block > size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory > block size of 128MB. > > While we could compress the state into 2 bit, using 8 bit is much > easier. > > This handling is similar, but different to acpi_scan_try_to_offline(): > > a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG > optimization is still relevant - it should only apply to ZONE_NORMAL > (where we have no guarantees). If relevant, we can always add it. > > b) acpi_scan_try_to_offline() simply onlines all memory in case > something goes wrong. It doesn't restore previous online type. Let's do > that, so we won't overwrite what e.g., user space configured. > > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: Pankaj Gupta > Cc: Michal Hocko > Cc: Oscar Salvador > Cc: Wei Yang > Cc: Andrew Morton > Signed-off-by: David Hildenbrand Could I get some acks from mm folks for this one? The rest can go in through my tree I guess ... Andrew? Thanks! > --- > mm/memory_hotplug.c | 105 +--- > 1 file changed, 89 insertions(+), 16 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index b44d4c7ba73b..217080ca93e5 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size) > } > EXPORT_SYMBOL_GPL(remove_memory); > > +static int try_offline_memory_block(struct memory_block *mem, void *arg) > +{ > + uint8_t online_type = MMOP_ONLINE_KERNEL; > + uint8_t **online_types = arg; > + struct page *page; > + int rc; > + > + /* > + * Sense the online_type via the zone of the memory block. Offlining > + * with multiple zones within one memory block will be rejected > + * by offlining code ... so we don't care about that. > + */ > + page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr)); > + if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE) > + online_type = MMOP_ONLINE_MOVABLE; > + > + rc = device_offline(>dev); > + /* > + * Default is MMOP_OFFLINE - change it only if offlining succeeded, > + * so try_reonline_memory_block() can do the right thing. > + */ > + if (!rc) > + **online_types = online_type; > + > + (*online_types)++; > + /* Ignore if already offline. */ > + return rc < 0 ? rc : 0; > +} > + > +static int try_reonline_memory_block(struct memory_block *mem, void *arg) > +{ > + uint8_t **online_types = arg; > + int rc; > + > + if (**online_types != MMOP_OFFLINE) { > + mem->online_type = **online_types; > + rc = device_online(>dev); > + if (rc < 0) > + pr_warn("%s: Failed to re-online memory: %d", > + __func__, rc); > + } > + > + /* Continue processing all remaining memory blocks. */ > + (*online_types)++; > + return 0; > +} > + > /* > - * Try to offline and remove a memory block. Might take a long time to > - * finish in case memory is still in use. Primarily useful for memory devices > - * that logically unplugged all memory (so it's no longer in use) and want to > - * offline + remove the memory block. > + * Try to offline and remove memory. Might take a long time to finish in case > + * memory is still in use. Primarily useful for memory devices that logically > + * unplugged all memory (so it's no longer in use) and want to offline + > remove > + * that memory. > */ > int offline_and_remove_memory(int nid, u64 start, u64 size) > { > - struct memory_block *mem; > - int rc = -EINVAL; > + const unsigned long mb_count = size / memory_block_size_bytes(); > + uint8_t *online_types, *tmp; > + int rc; > > if (!IS_ALIGNED(start, memory_block_size_bytes()) || > - size != memory_block_size_bytes()) > - return rc; > + !IS_ALIGNED(size, memory_block_size_bytes()) || !size) > + return -EINVAL; > + > + /* > + * We'll remember the old online type of each memory block, so we can > + * try to revert whatever we did when offlining one memory block fails > + * after offlining some others succeeded. > + */ > + online_types = kmalloc_array(mb_count, sizeof(*online_types), > + GFP_KERNEL);
[PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
virtio-mem soon wants to use offline_and_remove_memory() memory that exceeds a single Linux memory block (memory_block_size_bytes()). Let's remove that restriction. Let's remember the old state and try to restore that if anything goes wrong. While re-onlining can, in general, fail, it's highly unlikely to happen (usually only when a notifier fails to allocate memory, and these are rather rare). This will be used by virtio-mem to offline+remove memory ranges that are bigger than a single memory block - for example, with a device block size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory block size of 128MB. While we could compress the state into 2 bit, using 8 bit is much easier. This handling is similar, but different to acpi_scan_try_to_offline(): a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG optimization is still relevant - it should only apply to ZONE_NORMAL (where we have no guarantees). If relevant, we can always add it. b) acpi_scan_try_to_offline() simply onlines all memory in case something goes wrong. It doesn't restore previous online type. Let's do that, so we won't overwrite what e.g., user space configured. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Pankaj Gupta Cc: Michal Hocko Cc: Oscar Salvador Cc: Wei Yang Cc: Andrew Morton Signed-off-by: David Hildenbrand --- mm/memory_hotplug.c | 105 +--- 1 file changed, 89 insertions(+), 16 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index b44d4c7ba73b..217080ca93e5 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size) } EXPORT_SYMBOL_GPL(remove_memory); +static int try_offline_memory_block(struct memory_block *mem, void *arg) +{ + uint8_t online_type = MMOP_ONLINE_KERNEL; + uint8_t **online_types = arg; + struct page *page; + int rc; + + /* +* Sense the online_type via the zone of the memory block. Offlining +* with multiple zones within one memory block will be rejected +* by offlining code ... so we don't care about that. +*/ + page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr)); + if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE) + online_type = MMOP_ONLINE_MOVABLE; + + rc = device_offline(>dev); + /* +* Default is MMOP_OFFLINE - change it only if offlining succeeded, +* so try_reonline_memory_block() can do the right thing. +*/ + if (!rc) + **online_types = online_type; + + (*online_types)++; + /* Ignore if already offline. */ + return rc < 0 ? rc : 0; +} + +static int try_reonline_memory_block(struct memory_block *mem, void *arg) +{ + uint8_t **online_types = arg; + int rc; + + if (**online_types != MMOP_OFFLINE) { + mem->online_type = **online_types; + rc = device_online(>dev); + if (rc < 0) + pr_warn("%s: Failed to re-online memory: %d", + __func__, rc); + } + + /* Continue processing all remaining memory blocks. */ + (*online_types)++; + return 0; +} + /* - * Try to offline and remove a memory block. Might take a long time to - * finish in case memory is still in use. Primarily useful for memory devices - * that logically unplugged all memory (so it's no longer in use) and want to - * offline + remove the memory block. + * Try to offline and remove memory. Might take a long time to finish in case + * memory is still in use. Primarily useful for memory devices that logically + * unplugged all memory (so it's no longer in use) and want to offline + remove + * that memory. */ int offline_and_remove_memory(int nid, u64 start, u64 size) { - struct memory_block *mem; - int rc = -EINVAL; + const unsigned long mb_count = size / memory_block_size_bytes(); + uint8_t *online_types, *tmp; + int rc; if (!IS_ALIGNED(start, memory_block_size_bytes()) || - size != memory_block_size_bytes()) - return rc; + !IS_ALIGNED(size, memory_block_size_bytes()) || !size) + return -EINVAL; + + /* +* We'll remember the old online type of each memory block, so we can +* try to revert whatever we did when offlining one memory block fails +* after offlining some others succeeded. +*/ + online_types = kmalloc_array(mb_count, sizeof(*online_types), +GFP_KERNEL); + if (!online_types) + return -ENOMEM; + /* +* Initialize all states to MMOP_OFFLINE, so when we abort processing in +* try_offline_memory_block(), we'll skip all unprocessed blocks in +* try_reonline_memory_block(). +*/ + memset(online_types, MMOP_OFFLINE, mb_count);