[v4 1/6] mm/memory_hotplug: enforce block size aligned range check

2018-02-15 Thread Pavel Tatashin
Start qemu with the following arguments:

-m 64G,slots=2,maxmem=66G -object memory-backend-ram,id=mem1,size=2G

This boots a machine with 64G of memory and defines a 2G memory backend,
mem1, that can be hotplugged later.

Also make sure that .config has the following options turned on:

CONFIG_MEMORY_HOTPLUG
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
CONFIG_ACPI_HOTPLUG_MEMORY

Using the qemu monitor, hotplug the memory:

(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

The operation will fail with the following trace:

WARNING: CPU: 0 PID: 91 at drivers/base/memory.c:205
pages_correctly_reserved+0xe6/0x110
Modules linked in:
CPU: 0 PID: 91 Comm: systemd-udevd Not tainted 4.16.0-rc1_pt_master #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:pages_correctly_reserved+0xe6/0x110
RSP: 0018:be5086b53d98 EFLAGS: 00010246
RAX: 9acb3fff3180 RBX: 9acaf7646038 RCX: 0800
RDX: 9acb3fff3000 RSI: 0218 RDI: 010c
RBP: 0108 R08: e81f8340 R09: 0110
R10: 9acb3fff6000 R11: 0246 R12: 0008
R13:  R14: be5086b53f08 R15: 9acaf7506f20
FS:  7fd7f20da8c0() GS:9acb3fc0() knlGS:000
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fd7f20f2000 CR3: 000ff7ac2001 CR4: 001606f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 memory_subsys_online+0x44/0xa0
 device_online+0x51/0x80
 store_mem_state+0x5e/0xe0
 kernfs_fop_write+0xfa/0x170
 __vfs_write+0x2e/0x150
 ? __inode_security_revalidate+0x47/0x60
 ? selinux_file_permission+0xd5/0x130
 ? _cond_resched+0x10/0x20
 vfs_write+0xa8/0x1a0
 ? find_vma+0x54/0x60
 SyS_write+0x4d/0xb0
 do_syscall_64+0x5d/0x110
 entry_SYSCALL_64_after_hwframe+0x21/0x86
RIP: 0033:0x7fd7f0d3a840
RSP: 002b:7fff5db77c68 EFLAGS: 0246 ORIG_RAX: 0001
RAX: ffda RBX: 0006 RCX: 7fd7f0d3a840
RDX: 0006 RSI: 7fd7f20f2000 RDI: 0007
RBP: 7fd7f20f2000 R08: 55db265c4ab0 R09: 7fd7f20da8c0
R10: 0006 R11: 0246 R12: 55db265c49d0
R13: 0006 R14: 55db265c5510 R15: 000b
Code: fe ff ff 07 00 77 24 48 89 f8 48 c1 e8 17 49 8b 14 c2 48 85 d2 74 14
40 0f b6 c6 49 81 c0 00 00 20 00 48 c1 e0 04 48 01 d0 75 93 <0f> ff 31 c0
c3 b8 01 00 00 00 c3 31 d2 48 c7 c7 b0 32 67 a6 31
---[ end trace 6203bc4f1a5d30e8 ]---

The problem is detected in: drivers/base/memory.c

static bool pages_correctly_reserved(unsigned long start_pfn)
 if (WARN_ON_ONCE(!pfn_valid(pfn)))

This function loops through every section in the newly added memory block
and verifies that the first pfn in each section is valid, meaning that the
section exists, has a mapping (struct page array), and is online.
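
The per-section check described above can be modelled in userspace. This
is only a sketch, not the kernel code: model_pfn_valid(), struct
populated_range and model_block_fully_populated() are hypothetical names,
and the 4K page and 128M section sizes are assumed x86_64 values.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT        12	/* assumption: 4K pages */
#define SECTION_SIZE_BITS 27	/* assumption: 128M sections, as on x86_64 */
#define PAGES_PER_SECTION (1ULL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Hypothetical stand-in for sparsemem state: one populated pfn range. */
struct populated_range {
	uint64_t start_pfn;	/* inclusive */
	uint64_t end_pfn;	/* exclusive */
};

/* Simplified model of pfn_valid(): the pfn lies inside populated memory. */
static bool model_pfn_valid(const struct populated_range *r, uint64_t pfn)
{
	return pfn >= r->start_pfn && pfn < r->end_pfn;
}

/* Check the first pfn of every section in one memory block. */
static bool model_block_fully_populated(const struct populated_range *r,
					uint64_t block_start_pfn,
					uint64_t sections_per_block)
{
	uint64_t pfn = block_start_pfn;
	uint64_t i;

	for (i = 0; i < sections_per_block; i++, pfn += PAGES_PER_SECTION) {
		if (!model_pfn_valid(r, pfn))
			return false;	/* the kernel warns at this point */
	}
	return true;
}
```

With memory populated up to 67G (pfn 0x10c0000) and 16 sections per 2G
block, the block starting at 64G passes, while the block starting at 66G
fails at its ninth section.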

The block size on x86 is usually 128M, but when the machine is booted with
more than 64G of memory, the block size is changed to 2G:

$ cat /sys/devices/system/memory/block_size_bytes
80000000

or

$ dmesg | grep "block size"
[0.086469] x86/mm: Memory block size: 2048MB

During memory hotplug and hotremove, we verify that the range is section
size aligned, but we must actually verify that it is block size aligned,
because the memory block is the proper unit for hotplug operations.  See:
Documentation/memory-hotplug.txt

So, when the start_pfn of newly added memory is not block size aligned, we
can get a memory block with partially populated sections.

In our case, the start_pfn of the new memory is the last_pfn (the end of
physical memory).

$ dmesg | grep last_pfn
[0.00] e820: last_pfn = 0x1040000 max_arch_pfn = 0x400000000

0x1040000 == 65G, and so is not 2G aligned!
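
The arithmetic can be checked with a small userspace sketch. It assumes
4K pages and a last_pfn of 0x1040000 (the pfn that corresponds to 65G);
pfn_to_bytes() and is_aligned() are hypothetical helper names, not kernel
functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K pages */

/* Convert a page frame number to a physical byte address. */
static uint64_t pfn_to_bytes(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* True if value is a multiple of size; size must be a power of two. */
static bool is_aligned(uint64_t value, uint64_t size)
{
	return (value & (size - 1)) == 0;
}
```

pfn_to_bytes(0x1040000) is 65G, which is_aligned() accepts for a 128M
section size but rejects for a 2G block size. This is why the section
size check let the range through.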

The fix is to enforce that memory that is hotplugged and hotremoved is
block size aligned.
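
A userspace sketch of such a check is given here for illustration. The
function name model_check_hotplug_memory_range() and the block_sz
parameter are assumptions; the kernel obtains the block size from
memory_block_size_bytes(), and the exact condition used by the patch is
not reproduced here.

```c
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K pages */

/*
 * Reject a hotplug range unless both its start and its size are
 * multiples of the memory block size (block_sz, a power of two).
 */
static int model_check_hotplug_memory_range(uint64_t start, uint64_t size,
					    uint64_t block_sz)
{
	uint64_t block_nr_pages = block_sz >> PAGE_SHIFT;
	uint64_t nr_pages = size >> PAGE_SHIFT;
	uint64_t start_pfn = start >> PAGE_SHIFT;	/* like PFN_DOWN(start) */

	if (!nr_pages ||
	    (start_pfn & (block_nr_pages - 1)) ||
	    (nr_pages & (block_nr_pages - 1)))
		return -EINVAL;

	return 0;
}
```

With a 2G block size, a 2G range starting at 65G is rejected with -EINVAL,
while the same range starting at 66G is accepted.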

With this fix, running the above sequence yields the following result:

(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
Block size [0x80000000] unaligned hotplug range: start 0x1040000000,
size 0x80000000
acpi PNP0C80:00: add_memory failed
acpi PNP0C80:00: acpi_memory_enable_device() error
acpi PNP0C80:00: Enumeration failure

Signed-off-by: Pavel Tatashin 
Reviewed-by: Ingo Molnar 
Acked-by: Michal Hocko 
---
 mm/memory_hotplug.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b2bd52ff7605..565048f496f7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1083,15 +1083,16 @@ int try_online_node(int nid)
 
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
-   u64 start_pfn = PFN_DOWN(start);
+   unsigned long block_sz = memory_block_size_bytes();
+   u64 block_nr_pages = block_sz >> PAGE_SHIFT;
    u64 nr_pages = size >> PAGE_SHIFT;
+   u64 start_pfn = PFN_DOWN(start);
 
-