Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-17 Thread David Hildenbrand
On 17.04.19 15:31, Michal Hocko wrote:
> On Wed 17-04-19 15:24:47, David Hildenbrand wrote:
>> On 17.04.19 15:12, Michal Hocko wrote:
>>> On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
 __add_pages() doesn't add the memory resource, so __remove_pages()
 shouldn't remove it. Let's factor it out. Especially as it is a special
 case for memory used as system memory, added via add_memory() and
 friends.

 We now remove the resource after removing the sections instead of doing
 it the other way around. I don't think this change is problematic.

 add_memory()
register memory resource
arch_add_memory()

 remove_memory
arch_remove_memory()
release memory resource

 While at it, explain why we ignore errors and that it only happeny if
 we remove memory in a different granularity as we added it.
>>>
>>> OK, I agree that the symmetry is good in general and it certainly makes
>>> sense here as well. But does it make sense to pick up this particular
>>> part without larger considerations of add vs. remove apis? I have a
>>> strong feeling this wouldn't be the only thing to care about. In other
>>> words does this help future changes or it is more likely to cause more
>>> code conflicts with other features being developed right now?
>>
>> I am planning to
>>
>> 1. factor out memory block device handling, so features like sub-section
>> add/remove are easier to add internally. Move it to the user that wants
>> it. Clean up all the mess we have right now due to supporting memory
>> block devices that span several sections.
>>
>> 2. Make sure that any arch_add_pages() and friends clean up properly if
>> they fail instead of indicating failure but leaving some partially added
>> memory lying around.
>>
>> 3. Clean up node handling regarding to memory hotplug/unplug. Especially
>> don't allow to offline/remove memory spanning several nodes etc.
> 
> Yes, this all sounds sane to me.
> 
>> IOW, in order to properly clean up memory block device handling and
>> prepare for more changes people are interested in (e.g. sub-section add
>> of device memory), this is the right thing to do. The other parts are
>> bigger changes.
> 
> This would be really valuable to have in the cover. Beause there is so
> much to clean up in this mess but making random small cleanups without a
> larger plan tends to step on others toes more than being useful.

I agree, let's discuss the bigger plan I have in mind

1. arch_add_memory()/arch_remove_memory() don't deal with memory block
devices. add_memory()/remove_memory()/online_pages()/offline_pages() do.

2. add_memory()/remove_memory()/online_pages()/offline_pages()
- Only work on memory block device alignment/granularity
- Only work on single nodes.
- Only work on single zones.

3. mem->nid correctly indicates if
- Memory block devices belongs to single node / no node / multiple nodes
- Fast and reliable way to detect
remove_memory()/online_pages()/offline_pages() being called with
multiple nodes.

4. arch_remove_memory() and friends never fail. Removing of memory
always succeeds. This allows better error handling when adding of memory
fails. We will move some parts from CONFIG_MEMORY_HOTREMOVE to
CONFIG_MEMORY_HOTPLUG, so we can use them to clean up if adding of
memory fails.

5. Remove all accesses to struct_page from memory removal path. Pages
might never have been initialized, they should not be touched.

All other features I see on the horizon (vmemmap on added memory,
sub-section hot-add) would mainly be centered around
arch_add_memory()/arch_remove_memory(), which would not have to deal
with any special cases around memory block device memory.

Do you agree, do you have any other points/things in mind you consider
important?

-- 

Thanks,

David / dhildenb


Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-17 Thread Michal Hocko
On Wed 17-04-19 15:24:47, David Hildenbrand wrote:
> On 17.04.19 15:12, Michal Hocko wrote:
> > On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
> >> __add_pages() doesn't add the memory resource, so __remove_pages()
> >> shouldn't remove it. Let's factor it out. Especially as it is a special
> >> case for memory used as system memory, added via add_memory() and
> >> friends.
> >>
> >> We now remove the resource after removing the sections instead of doing
> >> it the other way around. I don't think this change is problematic.
> >>
> >> add_memory()
> >>register memory resource
> >>arch_add_memory()
> >>
> >> remove_memory
> >>arch_remove_memory()
> >>release memory resource
> >>
> >> While at it, explain why we ignore errors and that it only happeny if
> >> we remove memory in a different granularity as we added it.
> > 
> > OK, I agree that the symmetry is good in general and it certainly makes
> > sense here as well. But does it make sense to pick up this particular
> > part without larger considerations of add vs. remove apis? I have a
> > strong feeling this wouldn't be the only thing to care about. In other
> > words does this help future changes or it is more likely to cause more
> > code conflicts with other features being developed right now?
> 
> I am planning to
> 
> 1. factor out memory block device handling, so features like sub-section
> add/remove are easier to add internally. Move it to the user that wants
> it. Clean up all the mess we have right now due to supporting memory
> block devices that span several sections.
> 
> 2. Make sure that any arch_add_pages() and friends clean up properly if
> they fail instead of indicating failure but leaving some partially added
> memory lying around.
> 
> 3. Clean up node handling regarding to memory hotplug/unplug. Especially
> don't allow to offline/remove memory spanning several nodes etc.

Yes, this all sounds sane to me.

> IOW, in order to properly clean up memory block device handling and
> prepare for more changes people are interested in (e.g. sub-section add
> of device memory), this is the right thing to do. The other parts are
> bigger changes.

This would be really valuable to have in the cover. Beause there is so
much to clean up in this mess but making random small cleanups without a
larger plan tends to step on others toes more than being useful.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-17 Thread David Hildenbrand
On 17.04.19 15:12, Michal Hocko wrote:
> On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
>> __add_pages() doesn't add the memory resource, so __remove_pages()
>> shouldn't remove it. Let's factor it out. Especially as it is a special
>> case for memory used as system memory, added via add_memory() and
>> friends.
>>
>> We now remove the resource after removing the sections instead of doing
>> it the other way around. I don't think this change is problematic.
>>
>> add_memory()
>>  register memory resource
>>  arch_add_memory()
>>
>> remove_memory
>>  arch_remove_memory()
>>  release memory resource
>>
>> While at it, explain why we ignore errors and that it only happeny if
>> we remove memory in a different granularity as we added it.
> 
> OK, I agree that the symmetry is good in general and it certainly makes
> sense here as well. But does it make sense to pick up this particular
> part without larger considerations of add vs. remove apis? I have a
> strong feeling this wouldn't be the only thing to care about. In other
> words does this help future changes or it is more likely to cause more
> code conflicts with other features being developed right now?

I am planning to

1. factor out memory block device handling, so features like sub-section
add/remove are easier to add internally. Move it to the user that wants
it. Clean up all the mess we have right now due to supporting memory
block devices that span several sections.

2. Make sure that any arch_add_pages() and friends clean up properly if
they fail instead of indicating failure but leaving some partially added
memory lying around.

3. Clean up node handling regarding to memory hotplug/unplug. Especially
don't allow to offline/remove memory spanning several nodes etc.

IOW, in order to properly clean up memory block device handling and
prepare for more changes people are interested in (e.g. sub-section add
of device memory), this is the right thing to do. The other parts are
bigger changes.

-- 

Thanks,

David / dhildenb


Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-17 Thread Michal Hocko
On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
> __add_pages() doesn't add the memory resource, so __remove_pages()
> shouldn't remove it. Let's factor it out. Especially as it is a special
> case for memory used as system memory, added via add_memory() and
> friends.
> 
> We now remove the resource after removing the sections instead of doing
> it the other way around. I don't think this change is problematic.
> 
> add_memory()
>   register memory resource
>   arch_add_memory()
> 
> remove_memory
>   arch_remove_memory()
>   release memory resource
> 
> While at it, explain why we ignore errors and that it only happeny if
> we remove memory in a different granularity as we added it.

OK, I agree that the symmetry is good in general and it certainly makes
sense here as well. But does it make sense to pick up this particular
part without larger considerations of add vs. remove apis? I have a
strong feeling this wouldn't be the only thing to care about. In other
words does this help future changes or it is more likely to cause more
code conflicts with other features being developed right now?

> Cc: Andrew Morton 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: David Hildenbrand 
> Cc: Pavel Tatashin 
> Cc: Wei Yang 
> Cc: Qian Cai 
> Cc: Arun KS 
> Cc: Mathieu Malaterre 
> Signed-off-by: David Hildenbrand 
> ---
>  mm/memory_hotplug.c | 34 --
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4970ff658055..696ed7ee5e28 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned long 
> phys_start_pfn,
>   if (is_dev_zone(zone)) {
>   if (altmap)
>   map_offset = vmem_altmap_offset(altmap);
> - } else {
> - resource_size_t start, size;
> -
> - start = phys_start_pfn << PAGE_SHIFT;
> - size = nr_pages * PAGE_SIZE;
> -
> - ret = release_mem_region_adjustable(_resource, start,
> - size);
> - if (ret) {
> - resource_size_t endres = start + size - 1;
> -
> - pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
> - , , ret);
> - }
>   }
>  
>   clear_zone_contiguous(zone);
> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> +static void __release_memory_resource(u64 start, u64 size)
> +{
> + int ret;
> +
> + /*
> +  * When removing memory in the same granularity as it was added,
> +  * this function never fails. It might only fail if resources
> +  * have to be adjusted or split. We'll ignore the error, as
> +  * removing of memory cannot fail.
> +  */
> + ret = release_mem_region_adjustable(_resource, start, size);
> + if (ret) {
> + resource_size_t endres = start + size - 1;
> +
> + pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
> + , , ret);
> + }
> +}
> +
>  /**
>   * remove_memory
>   * @nid: the node ID
> @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>   memblock_remove(start, size);
>  
>   arch_remove_memory(nid, start, size, NULL);
> + __release_memory_resource(start, size);
>  
>   try_offline_node(nid);
>  
> -- 
> 2.17.2
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-17 Thread Oscar Salvador
On Tue, 2019-04-09 at 12:01 +0200, David Hildenbrand wrote:
> __add_pages() doesn't add the memory resource, so __remove_pages()
> shouldn't remove it. Let's factor it out. Especially as it is a
> special
> case for memory used as system memory, added via add_memory() and
> friends.

I would call the special case the other way, aka: zone_device hooking
into hotplug path.

> 
> We now remove the resource after removing the sections instead of
> doing
> it the other way around. I don't think this change is problematic.
> 
> add_memory()
>   register memory resource
>   arch_add_memory()
> 
> remove_memory
>   arch_remove_memory()
>   release memory resource
> 
> While at it, explain why we ignore errors and that it only happeny if
> we remove memory in a different granularity as we added it.

In the future we may want to allow drivers to hook directly into
arch_add_memory()/arch_remove_memory(), and this will lead to different
granularity in hot_add/hot_remove operations. 

At least that was one of the conclusions I drew from the last vmemmap-
patchset.
So, we will have to see how we can handle those kind of errors.

> 
> Cc: Andrew Morton 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: David Hildenbrand 
> Cc: Pavel Tatashin 
> Cc: Wei Yang 
> Cc: Qian Cai 
> Cc: Arun KS 
> Cc: Mathieu Malaterre 
> Signed-off-by: David Hildenbrand 

Besides what Andrew pointed out about the types of start,size, I do not
see anything wrong:

Reviewed-by: Oscar Salvador 

> ---
>  mm/memory_hotplug.c | 34 --
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4970ff658055..696ed7ee5e28 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned
> long phys_start_pfn,
>   if (is_dev_zone(zone)) {
>   if (altmap)
>   map_offset = vmem_altmap_offset(altmap);
> - } else {
> - resource_size_t start, size;
> -
> - start = phys_start_pfn << PAGE_SHIFT;
> - size = nr_pages * PAGE_SIZE;
> -
> - ret = release_mem_region_adjustable(_resource,
> start,
> - size);
> - if (ret) {
> - resource_size_t endres = start + size - 1;
> -
> - pr_warn("Unable to release resource <%pa-
> %pa> (%d)\n",
> - , , ret);
> - }
>   }
>  
>   clear_zone_contiguous(zone);
> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> +static void __release_memory_resource(u64 start, u64 size)
> +{
> + int ret;
> +
> + /*
> +  * When removing memory in the same granularity as it was
> added,
> +  * this function never fails. It might only fail if
> resources
> +  * have to be adjusted or split. We'll ignore the error, as
> +  * removing of memory cannot fail.
> +  */
> + ret = release_mem_region_adjustable(_resource, start,
> size);
> + if (ret) {
> + resource_size_t endres = start + size - 1;
> +
> + pr_warn("Unable to release resource <%pa-%pa>
> (%d)\n",
> + , , ret);
> + }
> +}
> +
>  /**
>   * remove_memory
>   * @nid: the node ID
> @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start,
> u64 size)
>   memblock_remove(start, size);
>  
>   arch_remove_memory(nid, start, size, NULL);
> + __release_memory_resource(start, size);
>  
>   try_offline_node(nid);
>  
-- 
Oscar Salvador
SUSE L3


Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-17 Thread David Hildenbrand
On 17.04.19 05:37, Andrew Morton wrote:
> On Wed, 10 Apr 2019 10:07:24 +0200 David Hildenbrand  wrote:
> 
>> Care to fixup both u64 to resource_size_t? Or should I send a patch?
>> Whatever you prefer.
> 
> Please send along a fixup.

Will do!

> 
> This patch series has no evidence of having been reviewed :(.  Can you
> suggest who could help us out here?

Usually I would say Oscar and Michal, but they seem to be fairly busy. I
guess only the first patch is a real change, the other three patches are
mostly function prototype changes, handling errors in a nicer way.

Thanks!

-- 

Thanks,

David / dhildenb


Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-16 Thread Andrew Morton
On Wed, 10 Apr 2019 10:07:24 +0200 David Hildenbrand  wrote:

> Care to fixup both u64 to resource_size_t? Or should I send a patch?
> Whatever you prefer.

Please send along a fixup.

This patch series has no evidence of having been reviewed :(.  Can you
suggest who could help us out here?



Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-10 Thread David Hildenbrand
On 10.04.19 00:41, Andrew Morton wrote:
> On Tue,  9 Apr 2019 12:01:45 +0200 David Hildenbrand  wrote:
> 
>> __add_pages() doesn't add the memory resource, so __remove_pages()
>> shouldn't remove it. Let's factor it out. Especially as it is a special
>> case for memory used as system memory, added via add_memory() and
>> friends.
>>
>> We now remove the resource after removing the sections instead of doing
>> it the other way around. I don't think this change is problematic.
>>
>> add_memory()
>>  register memory resource
>>  arch_add_memory()
>>
>> remove_memory
>>  arch_remove_memory()
>>  release memory resource
>>
>> While at it, explain why we ignore errors and that it only happeny if
>> we remove memory in a different granularity as we added it.
> 
> Seems sane.
> 
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>>  }
>>  EXPORT_SYMBOL(try_offline_node);
>>  
>> +static void __release_memory_resource(u64 start, u64 size)
>> +{
>> +int ret;
>> +
>> +/*
>> + * When removing memory in the same granularity as it was added,
>> + * this function never fails. It might only fail if resources
>> + * have to be adjusted or split. We'll ignore the error, as
>> + * removing of memory cannot fail.
>> + */
>> +ret = release_mem_region_adjustable(_resource, start, size);
>> +if (ret) {
>> +resource_size_t endres = start + size - 1;
>> +
>> +pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
>> +, , ret);
>> +}
>> +}
> 
> The types seem confused here.  Should `start' and `size' be
> resource_size_t?  Or maybe phys_addr_t.

Hmm, right now it has the same prototype as register_memory_resource. I
guess using resource_size_t is the right thing to do.

> 
> release_mem_region_adjustable() takes resource_size_t's.
> 
> Is %pa the way to print a resource_size_t?  I guess it happens to work
> because resource_size_t happens to map onto phys_addr_t, which isn't
> ideal.

Documentation/core-api/printk-formats.rst

"
%pa[p]  0x01234567 or 0x0123456789abcdef

For printing a phys_addr_t type (and its derivatives, such as
resource_size_t) ...
"


Care to fixup both u64 to resource_size_t? Or should I send a patch?
Whatever you prefer.

Thanks!

-- 

Thanks,

David / dhildenb


Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-09 Thread Andrew Morton
On Tue,  9 Apr 2019 12:01:45 +0200 David Hildenbrand  wrote:

> __add_pages() doesn't add the memory resource, so __remove_pages()
> shouldn't remove it. Let's factor it out. Especially as it is a special
> case for memory used as system memory, added via add_memory() and
> friends.
> 
> We now remove the resource after removing the sections instead of doing
> it the other way around. I don't think this change is problematic.
> 
> add_memory()
>   register memory resource
>   arch_add_memory()
> 
> remove_memory
>   arch_remove_memory()
>   release memory resource
> 
> While at it, explain why we ignore errors and that it only happeny if
> we remove memory in a different granularity as we added it.

Seems sane.

> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> +static void __release_memory_resource(u64 start, u64 size)
> +{
> + int ret;
> +
> + /*
> +  * When removing memory in the same granularity as it was added,
> +  * this function never fails. It might only fail if resources
> +  * have to be adjusted or split. We'll ignore the error, as
> +  * removing of memory cannot fail.
> +  */
> + ret = release_mem_region_adjustable(_resource, start, size);
> + if (ret) {
> + resource_size_t endres = start + size - 1;
> +
> + pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
> + , , ret);
> + }
> +}

The types seem confused here.  Should `start' and `size' be
resource_size_t?  Or maybe phys_addr_t.

release_mem_region_adjustable() takes resource_size_t's.

Is %pa the way to print a resource_size_t?  I guess it happens to work
because resource_size_t happens to map onto phys_addr_t, which isn't
ideal.

Wanna have a headscratch over that?

>  /**
>   * remove_memory
>   * @nid: the node ID
> @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>   memblock_remove(start, size);
>  
>   arch_remove_memory(nid, start, size, NULL);
> + __release_memory_resource(start, size);
>  
>   try_offline_node(nid);



[PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()

2019-04-09 Thread David Hildenbrand
__add_pages() doesn't add the memory resource, so __remove_pages()
shouldn't remove it. Let's factor it out. Especially as it is a special
case for memory used as system memory, added via add_memory() and
friends.

We now remove the resource after removing the sections instead of doing
it the other way around. I don't think this change is problematic.

add_memory()
register memory resource
arch_add_memory()

remove_memory
arch_remove_memory()
release memory resource

While at it, explain why we ignore errors and that it only happeny if
we remove memory in a different granularity as we added it.

Cc: Andrew Morton 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: David Hildenbrand 
Cc: Pavel Tatashin 
Cc: Wei Yang 
Cc: Qian Cai 
Cc: Arun KS 
Cc: Mathieu Malaterre 
Signed-off-by: David Hildenbrand 
---
 mm/memory_hotplug.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4970ff658055..696ed7ee5e28 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned long 
phys_start_pfn,
if (is_dev_zone(zone)) {
if (altmap)
map_offset = vmem_altmap_offset(altmap);
-   } else {
-   resource_size_t start, size;
-
-   start = phys_start_pfn << PAGE_SHIFT;
-   size = nr_pages * PAGE_SIZE;
-
-   ret = release_mem_region_adjustable(_resource, start,
-   size);
-   if (ret) {
-   resource_size_t endres = start + size - 1;
-
-   pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
-   , , ret);
-   }
}
 
clear_zone_contiguous(zone);
@@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
 }
 EXPORT_SYMBOL(try_offline_node);
 
+static void __release_memory_resource(u64 start, u64 size)
+{
+   int ret;
+
+   /*
+* When removing memory in the same granularity as it was added,
+* this function never fails. It might only fail if resources
+* have to be adjusted or split. We'll ignore the error, as
+* removing of memory cannot fail.
+*/
+   ret = release_mem_region_adjustable(_resource, start, size);
+   if (ret) {
+   resource_size_t endres = start + size - 1;
+
+   pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
+   , , ret);
+   }
+}
+
 /**
  * remove_memory
  * @nid: the node ID
@@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
memblock_remove(start, size);
 
arch_remove_memory(nid, start, size, NULL);
+   __release_memory_resource(start, size);
 
try_offline_node(nid);
 
-- 
2.17.2