Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-10 Thread Oscar Salvador
On Sat, Aug 11, 2018 at 12:25:39AM +1000, Rashmica Gupta wrote:
> On Fri, Aug 10, 2018 at 11:00 PM, Michal Hocko  wrote:
> > On Fri 10-08-18 16:55:40, Rashmica Gupta wrote:
> > [...]
> >> Most memory hotplug/hotremove seems to be block or section based, and
> >> always adds and removes memory at the same place.
> >
> > Yes and that is hard wired to the memory hotplug code. It is not easy to
> > make it work outside of section units restriction. So whatever your
> > memtrace is doing and if it relies on subsection hotplug it cannot
> > possibly work with the current code.
> >
> > I didn't get to review your patch but if it is only needed for an
> > unmerged code I would rather incline to not merge it unless it is a
> > clear win to the resource subsystem. A report from Oscar shows that this
> > is not the case though.
> >
> 
> Yup, makes sense. I'll work on it and see if I can not break things.

In __case__ we really need this patch, I think that one way to fix this is
to only call merge_node_resources() in case the node is already online.
Something like this (completely untested):

+struct resource *request_resource_and_merge(struct resource *parent,
+   struct resource *new, int nid)
+{
+   struct resource *conflict;
+
+   conflict = request_resource_conflict(parent, new);
+
+   if (conflict)
+   return conflict;
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+   /* We do not need to merge any resources on a node that is being
+* hot-added together with its memory.
+* The node will be allocated later.
+*/
+   if (node_online(nid))
+   merge_node_resources(nid, parent);
+#endif /* CONFIG_MEMORY_HOTREMOVE */

Although as Michal said, all memory-hotplug code is section-oriented, so
whatever it is that interacts with it should expect that.
Otherwise it can fail soon or later.

-- 
Oscar Salvador
SUSE L3


Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-10 Thread Rashmica Gupta
On Fri, Aug 10, 2018 at 11:00 PM, Michal Hocko  wrote:
> On Fri 10-08-18 16:55:40, Rashmica Gupta wrote:
> [...]
>> Most memory hotplug/hotremove seems to be block or section based, and
>> always adds and removes memory at the same place.
>
> Yes and that is hard wired to the memory hotplug code. It is not easy to
> make it work outside of section units restriction. So whatever your
> memtrace is doing and if it relies on subsection hotplug it cannot
> possibly work with the current code.
>
> I didn't get to review your patch but if it is only needed for an
> unmerged code I would rather incline to not merge it unless it is a
> clear win to the resource subsystem. A report from Oscar shows that this
> is not the case though.
>

Yup, makes sense. I'll work on it and see if I can not break things.

> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-10 Thread Michal Hocko
On Fri 10-08-18 16:55:40, Rashmica Gupta wrote:
[...]
> Most memory hotplug/hotremove seems to be block or section based, and
> always adds and removes memory at the same place.

Yes and that is hard wired to the memory hotplug code. It is not easy to
make it work outside of section units restriction. So whatever your
memtrace is doing and if it relies on subsection hotplug it cannot
possibly work with the current code.

I didn't get to review your patch but if it is only needed for an
unmerged code I would rather incline to not merge it unless it is a
clear win to the resource subsystem. A report from Oscar shows that this
is not the case though.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-10 Thread Oscar Salvador
On Thu, Aug 09, 2018 at 12:54:09PM +1000, Rashmica Gupta wrote:
> When hot-removing memory release_mem_region_adjustable() splits
> iomem resources if they are not the exact size of the memory being
> hot-deleted. Adding this memory back to the kernel adds a new
> resource.
> 
> Eg a node has memory 0x0 - 0xf. Offlining and hot-removing
> 1GB from 0xf4000 results in the single resource 0x0-0xf being
> split into two resources: 0x0-0xf3fff and 0xf8000-0xf.
> 
> When we hot-add the memory back we now have three resources:
> 0x0-0xf3fff, 0xf4000-0xf7fff, and 0xf8000-0xf.
> 
> Now if we try to remove some memory that overlaps these resources,
> like 2GB from 0xf4000, release_mem_region_adjustable() fails as it
> expects the chunk of memory to be within the boundaries of a single
> resource.
> 
> This patch adds a function request_resource_and_merge(). This is called
> instead of request_resource_conflict() when registering a resource in
> add_memory(). It calls request_resource_conflict() and if hot-removing is
> enabled (if it isn't we won't get resource fragmentation) we attempt to
> merge contiguous resources on the node.
> 
> Signed-off-by: Rashmica Gupta 

Hi Rashmica,

Unfortunately this patch breaks memory-hotplug.

It makes my kernel go boom when hot-adding memory via qemu:

Way to reproduce it:

# connect to a qemu console
# add hot memory:

(qemu) object_add memory-backend-ram,id=ram0,size=1G
(qemu) device_add pc-dimm,id=dimm2,memdev=ram0,node=1

and...

kernel: BUG: unable to handle kernel paging request at 00029ce8
kernel: PGD 0 P4D 0 
kernel: Oops:  [#1] SMP PTI
kernel: CPU: 1 PID: 7 Comm: kworker/u4:0 Tainted: GE 
4.18.0-rc8-next-20180810-1-default+ #292
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.0.0-prebuilt.qemu-project.org 04/01/2014
kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
kernel: RIP: 0010:request_resource_and_merge+0x51/0x120
kernel: Code: df e8 13 e6 ff ff c6 05 bc eb 50 01 00 48 85 c0 74 09 5b 5d 41 5c 
41 5d 41 5e c3 4a 8b 04 e5 40 e6 00 82 48 c7 c7 d0 70 58 82 <4c> 8b a8 e8 9c 02 
00 4d 89 ec 4c 03 a8 f8 9c 02 00 e8 89 aa 57 00
kernel: RSP: 0018:c9367d48 EFLAGS: 00010246
kernel: RAX:  RBX: 81e4e060 RCX: 00013fff
kernel: RDX: 0001 RSI: 880077467580 RDI: 825870d0
kernel: RBP: 880077467580 R08: 88007ffabcf0 R09: 880077467580
kernel: R10:  R11: 8800376eec09 R12: 0001
kernel: R13: 4000 R14: 0001 R15: 0001
kernel: FS:  () GS:88007db0() 
knlGS:
kernel: CS:  0010 DS:  ES:  CR0: 80050033
kernel: CR2: 00029ce8 CR3: 783ac000 CR4: 06a0
kernel: DR0:  DR1:  DR2: 
kernel: DR3:  DR6: fffe0ff0 DR7: 0400
kernel: Call Trace:
kernel:  add_memory+0x68/0x120
kernel:  acpi_memory_device_add+0x134/0x2e0
kernel:  acpi_bus_attach+0xd9/0x190
kernel:  acpi_bus_scan+0x37/0x70
kernel:  acpi_device_hotplug+0x389/0x4e0
kernel:  acpi_hotplug_work_fn+0x1a/0x30
kernel:  process_one_work+0x15f/0x350
kernel:  worker_thread+0x49/0x3e0
kernel:  kthread+0xf5/0x130
kernel:  ? max_active_store+0x60/0x60
kernel:  ? kthread_bind+0x10/0x10
kernel:  ret_from_fork+0x35/0x40
kernel: Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) 
xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) 
ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) 
iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) 
iptable_filter(E) ip_tables(E) x_tables(E) bochs_drm(E) ttm(E) 
drm_kms_helper(E) drm(E) virtio_net(E) net_failover(E) i2c_piix4(E) 
parport_pc(E) parport(E) failover(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) 
fb_sys_fops(E) nfit(E) libnvdimm(E) button(E) pcspkr(E) btrfs(E) libcrc32c(E) 
xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) sd_mod(E) 
ata_generic(E) ata_piix(E) ahci(E) libahci(E) virtio_pci(E) virtio_ring(E) 
virtio(E) serio_raw(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
kernel: CR2: 00029ce8
kernel: ---[ end trace be1a8c4d1824ebf4 ]---
kernel: RIP: 0010:request_resource_and_merge+0x51/0x120
kernel: Code: df e8 13 e6 ff ff c6 05 bc eb 50 01 00 48 85 c0 74 09 5b 5d 41 5c 
41 5d 41 5e c3 4a 8b 04 e5 40 e6 00 82 48 c7 c7 d0 70 58 82 <4c> 8b a8 e8 9c 02 
00 4d 89 ec 4c 03 a8 f8 9c 02 00 e8 89 aa 57 00
kernel: RSP: 0018:c9367d48 EFLAGS: 00010246
kernel: RAX:  RBX: 81e4e060 RCX: 00013fff
kernel: RDX: 0001 RSI: 880077467580 RDI: 825870d0
kernel: RBP: 880077467580 R08: 88007ffabcf0 R09: 880077467580
kernel: R10:  R11: 8800376eec09 R12: 0001
kernel: R13: 4000 R14: 0001 R15: 000

Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-10 Thread Vlastimil Babka
On 08/10/2018 08:55 AM, Rashmica Gupta wrote:
> On Fri, Aug 10, 2018 at 11:12 AM, Andrew Morton
>  wrote:
>>
>> What is the end-user impact of this patch?
>>
> 
> Only architectures/setups that allow the user to remove and add memory of
> different sizes or different start addresses from the kernel at runtime will
> potentially encounter the resource fragmentation.
> 
> Trying to remove memory that overlaps iomem resources the first time
> gives us this warning: "Unable to release resource <%pa-%pa>".
> 
> Attempting a second time results in a kernel oops (on ppc at least).

An oops? I think that should be investigated and fixed, even if resource
merging prevents it. Do you have the details?

Thanks,
Vlastimil


Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-09 Thread Rashmica Gupta
On Fri, Aug 10, 2018 at 11:12 AM, Andrew Morton
 wrote:
> On Thu,  9 Aug 2018 12:54:09 +1000 Rashmica Gupta  
> wrote:
>
>> When hot-removing memory release_mem_region_adjustable() splits
>> iomem resources if they are not the exact size of the memory being
>> hot-deleted. Adding this memory back to the kernel adds a new
>> resource.
>>
>> Eg a node has memory 0x0 - 0xf. Offlining and hot-removing
>> 1GB from 0xf4000 results in the single resource 0x0-0xf being
>> split into two resources: 0x0-0xf3fff and 0xf8000-0xf.
>>
>> When we hot-add the memory back we now have three resources:
>> 0x0-0xf3fff, 0xf4000-0xf7fff, and 0xf8000-0xf.
>>
>> Now if we try to remove some memory that overlaps these resources,
>> like 2GB from 0xf4000, release_mem_region_adjustable() fails as it
>> expects the chunk of memory to be within the boundaries of a single
>> resource.
>>
>> This patch adds a function request_resource_and_merge(). This is called
>> instead of request_resource_conflict() when registering a resource in
>> add_memory(). It calls request_resource_conflict() and if hot-removing is
>> enabled (if it isn't we won't get resource fragmentation) we attempt to
>> merge contiguous resources on the node.
>
> What is the end-user impact of this patch?
>

Only architectures/setups that allow the user to remove and add memory of
different sizes or different start addresses from the kernel at runtime will
potentially encounter the resource fragmentation.

Trying to remove memory that overlaps iomem resources the first time
gives us this warning: "Unable to release resource <%pa-%pa>".

Attempting a second time results in a kernel oops (on ppc at least).

With this patch the user will not be restricted, by resource fragmentation
caused by previous hotremove/hotplug attempts, to what chunks of memory
they can remove.



> Do you believe the fix should be merged into 4.18?  Backporting into
> -stable kernels?  If so, why?


I hit this when adding hot-add code to memtrace on ppc.

Most memory hotplug/hotremove seems to be block or section based, and
always adds and removes memory at the same place.

Memtrace on ppc is different in that given a size (aligned to a block size),
it scans each node and finds a chunk of memory of that size that we can offline
and then removes it.

As this is possibly only as issue for memtrace on ppc with a patch that is not
in 4.18, I don't think this code needs to go in 4.18.


>
> Thanks.


Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-09 Thread Andrew Morton
On Thu,  9 Aug 2018 12:54:09 +1000 Rashmica Gupta  wrote:

> When hot-removing memory release_mem_region_adjustable() splits
> iomem resources if they are not the exact size of the memory being
> hot-deleted. Adding this memory back to the kernel adds a new
> resource.
> 
> Eg a node has memory 0x0 - 0xf. Offlining and hot-removing
> 1GB from 0xf4000 results in the single resource 0x0-0xf being
> split into two resources: 0x0-0xf3fff and 0xf8000-0xf.
> 
> When we hot-add the memory back we now have three resources:
> 0x0-0xf3fff, 0xf4000-0xf7fff, and 0xf8000-0xf.
> 
> Now if we try to remove some memory that overlaps these resources,
> like 2GB from 0xf4000, release_mem_region_adjustable() fails as it
> expects the chunk of memory to be within the boundaries of a single
> resource.
> 
> This patch adds a function request_resource_and_merge(). This is called
> instead of request_resource_conflict() when registering a resource in
> add_memory(). It calls request_resource_conflict() and if hot-removing is
> enabled (if it isn't we won't get resource fragmentation) we attempt to
> merge contiguous resources on the node.

What is the end-user impact of this patch?

Do you believe the fix should be merged into 4.18?  Backporting into
-stable kernels?  If so, why?

Thanks.


Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-09 Thread Mike Rapoport
On Thu, Aug 09, 2018 at 12:54:09PM +1000, Rashmica Gupta wrote:
> When hot-removing memory release_mem_region_adjustable() splits
> iomem resources if they are not the exact size of the memory being
> hot-deleted. Adding this memory back to the kernel adds a new
> resource.
> 
> Eg a node has memory 0x0 - 0xf. Offlining and hot-removing
> 1GB from 0xf4000 results in the single resource 0x0-0xf being
> split into two resources: 0x0-0xf3fff and 0xf8000-0xf.
> 
> When we hot-add the memory back we now have three resources:
> 0x0-0xf3fff, 0xf4000-0xf7fff, and 0xf8000-0xf.
> 
> Now if we try to remove some memory that overlaps these resources,
> like 2GB from 0xf4000, release_mem_region_adjustable() fails as it
> expects the chunk of memory to be within the boundaries of a single
> resource.
> 
> This patch adds a function request_resource_and_merge(). This is called
> instead of request_resource_conflict() when registering a resource in
> add_memory(). It calls request_resource_conflict() and if hot-removing is
> enabled (if it isn't we won't get resource fragmentation) we attempt to
> merge contiguous resources on the node.
> 
> Signed-off-by: Rashmica Gupta 

Reviewed-by: Mike Rapoport 

> ---
> v2->v3: Update Xen balloon, make the commit msg and a comment clearer,
> and changed '>' to '>=' when comparing the end of a resource and the
> end of a node.
> 
> v1->v2: Only attempt to merge resources if hot-remove is enabled.
> 
>  drivers/xen/balloon.c  |   3 +-
>  include/linux/ioport.h |   2 +
>  include/linux/memory_hotplug.h |   2 +-
>  kernel/resource.c  | 120 
> +
>  mm/memory_hotplug.c|  22 
>  5 files changed, 136 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 065f0b607373..9b972b37b0da 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -401,7 +401,8 @@ static enum bp_state reserve_additional_memory(void)
>* callers drop the mutex before trying again.
>*/
>   mutex_unlock(&balloon_mutex);
> - rc = add_memory_resource(nid, resource, memhp_auto_online);
> + rc = add_memory_resource(nid, resource->start, resource_size(resource),
> +  memhp_auto_online);
>   mutex_lock(&balloon_mutex);
> 
>   if (rc) {
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..f5b93a711e86 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -189,6 +189,8 @@ extern int allocate_resource(struct resource *root, 
> struct resource *new,
>  resource_size_t,
>  resource_size_t),
>void *alignf_data);
> +extern struct resource *request_resource_and_merge(struct resource *parent,
> +struct resource *new, int 
> nid);
>  struct resource *lookup_resource(struct resource *root, resource_size_t 
> start);
>  int adjust_resource(struct resource *res, resource_size_t start,
>   resource_size_t size);
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 4e9828cda7a2..9c00f97c8cc6 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -322,7 +322,7 @@ static inline void remove_memory(int nid, u64 start, u64 
> size) {}
>  extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
>   void *arg, int (*func)(struct memory_block *, void *));
>  extern int add_memory(int nid, u64 start, u64 size);
> -extern int add_memory_resource(int nid, struct resource *resource, bool 
> online);
> +extern int add_memory_resource(int nid, u64 start, u64 size, bool online);
>  extern int arch_add_memory(int nid, u64 start, u64 size,
>   struct vmem_altmap *altmap, bool want_memblock);
>  extern void move_pfn_range_to_zone(struct zone *zone, unsigned long 
> start_pfn,
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 30e1bc68503b..a31d3f5bccb7 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1621,3 +1621,123 @@ static int __init strict_iomem(char *str)
>  }
> 
>  __setup("iomem=", strict_iomem);
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/*
> + * Attempt to merge resource and it's sibling
> + */
> +static int merge_resources(struct resource *res)
> +{
> + struct resource *next;
> + struct resource *tmp;
> + uint64_t size;
> + int ret = -EINVAL;
> +
> + next = res->sibling;
> +
> + /*
> +  * Not sure how to handle two different children. So only attempt
> +  * to merge two resources if neither have children, only one has a
> +  * child or if both have the same child.
> +  */
> + if ((res->child && next->child) && (res->child !

Re: [PATCH v3] resource: Merge resources on a node when hot-adding memory

2018-08-09 Thread Vlastimil Babka
On 08/09/2018 04:54 AM, Rashmica Gupta wrote:
> When hot-removing memory release_mem_region_adjustable() splits
> iomem resources if they are not the exact size of the memory being
> hot-deleted. Adding this memory back to the kernel adds a new
> resource.
> 
> Eg a node has memory 0x0 - 0xf. Offlining and hot-removing
> 1GB from 0xf4000 results in the single resource 0x0-0xf being
> split into two resources: 0x0-0xf3fff and 0xf8000-0xf.
> 
> When we hot-add the memory back we now have three resources:
> 0x0-0xf3fff, 0xf4000-0xf7fff, and 0xf8000-0xf.
> 
> Now if we try to remove some memory that overlaps these resources,
> like 2GB from 0xf4000, release_mem_region_adjustable() fails as it
> expects the chunk of memory to be within the boundaries of a single
> resource.
> 
> This patch adds a function request_resource_and_merge(). This is called
> instead of request_resource_conflict() when registering a resource in
> add_memory(). It calls request_resource_conflict() and if hot-removing is
> enabled (if it isn't we won't get resource fragmentation) we attempt to
> merge contiguous resources on the node.
> 
> Signed-off-by: Rashmica Gupta 

Acked-by: Vlastimil Babka