Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()

2019-11-19 Thread Andrew Morton
On Tue, 19 Nov 2019 15:16:22 +0100 David Hildenbrand  wrote:

> On 14.10.19 21:17, Andrew Morton wrote:
> > On Mon, 14 Oct 2019 11:32:13 +0200 David Hildenbrand  
> > wrote:
> > 
> >>> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> >>
> >> @Andrew, can you convert that to
> >>
> >> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
> >> memory to zones until online") # visible after d0dc12e86b319
> >>
> >> and add
> >>
> >> Cc: sta...@vger.kernel.org # v4.13+
> > 
> > Done, thanks.
> > 
> 
> Just a note that Toshiki reported a BUG (race between delayed
> initialization of ZONE_DEVICE memmaps without holding the memory
> hotplug lock and concurrent zone shrinking).
> 
> https://lkml.org/lkml/2019/11/14/1040
> 
> "Iteration of create and destroy namespace causes the panic as below:
> 
> [   41.207694] kernel BUG at mm/page_alloc.c:535!
> [   41.208109] invalid opcode:  [#1] SMP PTI
> [   41.208508] CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
> [   41.209064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
> [   41.210175] RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
> [   41.210643] Code: 04 41 83 e2 3c 48 8d 04 a8 48 c1 e0 07 48 03 04 dd e0 59 
> 55 bb 48 8b 58 68 48 39 da 73 0e 48 c7 c6 70 ac 11 bb e8 1b b2 fd ff <0f> 0b 
> 48 03 58 78 48 39 da 73 e9 49 01 ca b9 3f 00 00 00 4f 8d 0c
> [   41.212354] RSP: 0018:ac0d41557c80 EFLAGS: 00010246
> [   41.212821] RAX: 004a RBX: 00244a00 RCX: 
> 
> [   41.213459] RDX:  RSI:  RDI: 
> bb1197dc
> [   41.214100] RBP: 000c R08: 0439 R09: 
> 0059
> [   41.214736] R10:  R11: ac0d41557b08 R12: 
> 8be475ea72b0
> [   41.215376] R13: fa00 R14: 0025 R15: 
> fffc0bb5
> [   41.216008] FS:  7f30862ab600() GS:8be57bc4() 
> knlGS:
> [   41.216771] CS:  0010 DS:  ES:  CR0: 80050033
> [   41.217299] CR2: 55e824d0d508 CR3: 000231dac000 CR4: 
> 06e0
> [   41.217934] Call Trace:
> [   41.218225]  memmap_init_zone_device+0x165/0x17c
> [   41.218642]  memremap_pages+0x4c1/0x540
> [   41.218989]  devm_memremap_pages+0x1d/0x60
> [   41.219367]  pmem_attach_disk+0x16b/0x600 [nd_pmem]
> [   41.219804]  ? devm_nsio_enable+0xb8/0xe0
> [   41.220172]  nvdimm_bus_probe+0x69/0x1c0
> [   41.220526]  really_probe+0x1c2/0x3e0
> [   41.220856]  driver_probe_device+0xb4/0x100
> [   41.221238]  device_driver_attach+0x4f/0x60
> [   41.221611]  bind_store+0xc9/0x110
> [   41.221919]  kernfs_fop_write+0x116/0x190
> [   41.222326]  vfs_write+0xa5/0x1a0
> [   41.222626]  ksys_write+0x59/0xd0
> [   41.222927]  do_syscall_64+0x5b/0x180
> [   41.223264]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   41.223714] RIP: 0033:0x7f30865d0ed8
> [   41.224037] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 
> 0f 1e fa 48 8d 05 45 78 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 
> 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [   41.225920] RSP: 002b:7fffe5d30a78 EFLAGS: 0246 ORIG_RAX: 
> 0001
> [   41.226608] RAX: ffda RBX: 55e824d07f40 RCX: 
> 7f30865d0ed8
> [   41.227242] RDX: 0007 RSI: 55e824d07f40 RDI: 
> 0004
> [   41.227870] RBP: 0007 R08: 0007 R09: 
> 0006
> [   41.228753] R10:  R11: 0246 R12: 
> 0004
> [   41.229419] R13: 7f30862ab528 R14: 0001 R15: 
> 55e824d07f40
> 
> While creating a namespace and initializing memmap, if you destroy the 
> namespace
> and shrink the zone, it will initialize the memmap outside the zone and
> trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page) in
> set_pfnblock_flags_mask()."
> 
> 
> This BUG is also mitigated by this commit, where we for now stop to
> shrink the ZONE_DEVICE zone until we can do it in a safe and clean
> way.
> 

OK, thanks.  I updated the changelog, added Reported-by:Toshiki and
shall squeeze this fix into 5.4.



Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()

2019-11-19 Thread David Hildenbrand
On 14.10.19 21:17, Andrew Morton wrote:
> On Mon, 14 Oct 2019 11:32:13 +0200 David Hildenbrand  wrote:
> 
>>> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
>>
>> @Andrew, can you convert that to
>>
>> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
>> memory to zones until online") # visible after d0dc12e86b319
>>
>> and add
>>
>> Cc: sta...@vger.kernel.org # v4.13+
> 
> Done, thanks.
> 

Just a note that Toshiki reported a BUG (race between delayed
initialization of ZONE_DEVICE memmaps without holding the memory
hotplug lock and concurrent zone shrinking).

https://lkml.org/lkml/2019/11/14/1040

"Iteration of create and destroy namespace causes the panic as below:

[   41.207694] kernel BUG at mm/page_alloc.c:535!
[   41.208109] invalid opcode:  [#1] SMP PTI
[   41.208508] CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
[   41.209064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[   41.210175] RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
[   41.210643] Code: 04 41 83 e2 3c 48 8d 04 a8 48 c1 e0 07 48 03 04 dd e0 59 
55 bb 48 8b 58 68 48 39 da 73 0e 48 c7 c6 70 ac 11 bb e8 1b b2 fd ff <0f> 0b 48 
03 58 78 48 39 da 73 e9 49 01 ca b9 3f 00 00 00 4f 8d 0c
[   41.212354] RSP: 0018:ac0d41557c80 EFLAGS: 00010246
[   41.212821] RAX: 004a RBX: 00244a00 RCX: 
[   41.213459] RDX:  RSI:  RDI: bb1197dc
[   41.214100] RBP: 000c R08: 0439 R09: 0059
[   41.214736] R10:  R11: ac0d41557b08 R12: 8be475ea72b0
[   41.215376] R13: fa00 R14: 0025 R15: fffc0bb5
[   41.216008] FS:  7f30862ab600() GS:8be57bc4() 
knlGS:
[   41.216771] CS:  0010 DS:  ES:  CR0: 80050033
[   41.217299] CR2: 55e824d0d508 CR3: 000231dac000 CR4: 06e0
[   41.217934] Call Trace:
[   41.218225]  memmap_init_zone_device+0x165/0x17c
[   41.218642]  memremap_pages+0x4c1/0x540
[   41.218989]  devm_memremap_pages+0x1d/0x60
[   41.219367]  pmem_attach_disk+0x16b/0x600 [nd_pmem]
[   41.219804]  ? devm_nsio_enable+0xb8/0xe0
[   41.220172]  nvdimm_bus_probe+0x69/0x1c0
[   41.220526]  really_probe+0x1c2/0x3e0
[   41.220856]  driver_probe_device+0xb4/0x100
[   41.221238]  device_driver_attach+0x4f/0x60
[   41.221611]  bind_store+0xc9/0x110
[   41.221919]  kernfs_fop_write+0x116/0x190
[   41.222326]  vfs_write+0xa5/0x1a0
[   41.222626]  ksys_write+0x59/0xd0
[   41.222927]  do_syscall_64+0x5b/0x180
[   41.223264]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   41.223714] RIP: 0033:0x7f30865d0ed8
[   41.224037] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 
0f 1e fa 48 8d 05 45 78 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[   41.225920] RSP: 002b:7fffe5d30a78 EFLAGS: 0246 ORIG_RAX: 
0001
[   41.226608] RAX: ffda RBX: 55e824d07f40 RCX: 7f30865d0ed8
[   41.227242] RDX: 0007 RSI: 55e824d07f40 RDI: 0004
[   41.227870] RBP: 0007 R08: 0007 R09: 0006
[   41.228753] R10:  R11: 0246 R12: 0004
[   41.229419] R13: 7f30862ab528 R14: 0001 R15: 55e824d07f40

While creating a namespace and initializing memmap, if you destroy the namespace
and shrink the zone, it will initialize the memmap outside the zone and
trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page) in
set_pfnblock_flags_mask()."


This BUG is also mitigated by this commit, where we for now stop to
shrink the ZONE_DEVICE zone until we can do it in a safe and clean
way.

-- 

Thanks,

David / dhildenb



Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()

2019-10-14 Thread Andrew Morton
On Mon, 14 Oct 2019 11:32:13 +0200 David Hildenbrand  wrote:

> > Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> 
> @Andrew, can you convert that to
> 
> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded 
> memory to zones until online") # visible after d0dc12e86b319
> 
> and add
> 
> Cc: sta...@vger.kernel.org # v4.13+

Done, thanks.


Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()

2019-10-14 Thread David Hildenbrand

On 06.10.19 10:56, David Hildenbrand wrote:

Let's limit shrinking to !ZONE_DEVICE so we can fix the current code. We
should never try to touch the memmap of offline sections where we could
have uninitialized memmaps and could trigger BUGs when calling
page_to_nid() on poisoned pages.

There is no reliable way to distinguish an uninitialized memmap from an
initialized memmap that belongs to ZONE_DEVICE, as we don't have
anything like SECTION_IS_ONLINE we can use similar to
pfn_to_online_section() for !ZONE_DEVICE memory. E.g.,
set_zone_contiguous() similarly relies on pfn_to_online_section() and
will therefore never set a ZONE_DEVICE zone consecutive. Stopping to
shrink the ZONE_DEVICE therefore results in no observable changes,
besides /proc/zoneinfo indicating different boundaries - something we
can totally live with.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized with 0 and the node with the
right value. So the zone might be wrong but not garbage. After that
commit, both the zone and the node will be garbage when touching
uninitialized memmaps.

Cc: Andrew Morton 
Cc: Oscar Salvador 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Pavel Tatashin 
Cc: Dan Williams 
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")


@Andrew, can you convert that to

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded 
memory to zones until online") # visible after d0dc12e86b319


and add

Cc: sta...@vger.kernel.org # v4.13+


--

Thanks,

David / dhildenb


[PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()

2019-10-06 Thread David Hildenbrand
Let's limit shrinking to !ZONE_DEVICE so we can fix the current code. We
should never try to touch the memmap of offline sections where we could
have uninitialized memmaps and could trigger BUGs when calling
page_to_nid() on poisoned pages.

There is no reliable way to distinguish an uninitialized memmap from an
initialized memmap that belongs to ZONE_DEVICE, as we don't have
anything like SECTION_IS_ONLINE we can use similar to
pfn_to_online_section() for !ZONE_DEVICE memory. E.g.,
set_zone_contiguous() similarly relies on pfn_to_online_section() and
will therefore never set a ZONE_DEVICE zone consecutive. Stopping to
shrink the ZONE_DEVICE therefore results in no observable changes,
besides /proc/zoneinfo indicating different boundaries - something we
can totally live with.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized with 0 and the node with the
right value. So the zone might be wrong but not garbage. After that
commit, both the zone and the node will be garbage when touching
uninitialized memmaps.

Cc: Andrew Morton 
Cc: Oscar Salvador 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Pavel Tatashin 
Cc: Dan Williams 
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Reported-by: Aneesh Kumar K.V 
Signed-off-by: David Hildenbrand 
---
 mm/memory_hotplug.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86b4dc18e831..f96608d24f6a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -331,7 +331,7 @@ static unsigned long find_smallest_section_pfn(int nid, 
struct zone *zone,
 unsigned long end_pfn)
 {
for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) {
-   if (unlikely(!pfn_valid(start_pfn)))
+   if (unlikely(!pfn_to_online_page(start_pfn)))
continue;
 
if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -356,7 +356,7 @@ static unsigned long find_biggest_section_pfn(int nid, 
struct zone *zone,
/* pfn is the end pfn of a memory section. */
pfn = end_pfn - 1;
for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) {
-   if (unlikely(!pfn_valid(pfn)))
+   if (unlikely(!pfn_to_online_page(pfn)))
continue;
 
if (unlikely(pfn_to_nid(pfn) != nid))
@@ -415,7 +415,7 @@ static void shrink_zone_span(struct zone *zone, unsigned 
long start_pfn,
 */
pfn = zone_start_pfn;
for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-   if (unlikely(!pfn_valid(pfn)))
+   if (unlikely(!pfn_to_online_page(pfn)))
continue;
 
if (page_zone(pfn_to_page(pfn)) != zone)
@@ -463,6 +463,16 @@ static void __remove_zone(struct zone *zone, unsigned long 
start_pfn,
struct pglist_data *pgdat = zone->zone_pgdat;
unsigned long flags;
 
+#ifdef CONFIG_ZONE_DEVICE
+   /*
+* Zone shrinking code cannot properly deal with ZONE_DEVICE. So
+* we will not try to shrink the zones - which is okay as
+* set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+*/
+   if (zone_idx(zone) == ZONE_DEVICE)
+   return;
+#endif
+
pgdat_resize_lock(zone->zone_pgdat, );
shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
update_pgdat_span(pgdat);
-- 
2.21.0