Re: NULL pointer dereference during memory hotremove

2019-05-20 Thread Michal Hocko
On Fri 17-05-19 13:33:25, Pavel Tatashin wrote:
> On Fri, May 17, 2019 at 1:24 PM Pavel Tatashin
>  wrote:
> >
> > On Fri, May 17, 2019 at 1:22 PM Pavel Tatashin
> >  wrote:
> > >
> > > On Fri, May 17, 2019 at 10:38 AM Michal Hocko  wrote:
> > > >
> > > > On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
> > > > > This panic is unrelated to circular lock issue that I reported in a
> > > > > separate thread, that also happens during memory hotremove.
> > > > >
> > > > > xakep ~/x/linux$ git describe
> > > > > v5.1-12317-ga6a4b66bd8f4
> > > >
> > > > Does this happen on 5.0 as well?
> > >
> > > Yes, just reproduced it on 5.0 as well. Unfortunately, I do not have a
> > > script, and have to do it manually, also it does not happen every
> > > time, it happened on 3rd time for me.
> >
> > Actually, sorry, I have not tested 5.0, I compiled 5.0, but my script
> > still tested v5.1-12317-ga6a4b66bd8f4 build. I will report later if I
> > am able to reproduce it on 5.0.
> 
> OK, confirmed on 5.0 as well, took 4 tries to reproduce:

What is the last version that survives? Can you bisect?
-- 
Michal Hocko
SUSE Labs


Re: NULL pointer dereference during memory hotremove

2019-05-17 Thread Pavel Tatashin
On Fri, May 17, 2019 at 1:24 PM Pavel Tatashin
 wrote:
>
> On Fri, May 17, 2019 at 1:22 PM Pavel Tatashin
>  wrote:
> >
> > On Fri, May 17, 2019 at 10:38 AM Michal Hocko  wrote:
> > >
> > > On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
> > > > This panic is unrelated to circular lock issue that I reported in a
> > > > separate thread, that also happens during memory hotremove.
> > > >
> > > > xakep ~/x/linux$ git describe
> > > > v5.1-12317-ga6a4b66bd8f4
> > >
> > > Does this happen on 5.0 as well?
> >
> > Yes, just reproduced it on 5.0 as well. Unfortunately, I do not have a
> > script, and have to do it manually, also it does not happen every
> > time, it happened on 3rd time for me.
>
> Actually, sorry, I have not tested 5.0, I compiled 5.0, but my script
> still tested v5.1-12317-ga6a4b66bd8f4 build. I will report later if I
> am able to reproduce it on 5.0.

OK, confirmed on 5.0 as well; it took 4 tries to reproduce:
(qemu) [   17.104486] Offlined Pages 32768
[   17.105543] Built 1 zonelists, mobility grouping on.  Total pages: 1515892
[   17.106475] Policy zone: Normal
[   17.107029] BUG: unable to handle kernel NULL pointer dereference at 0698
[   17.107645] #PF error: [normal kernel read fault]
[   17.108038] PGD 0 P4D 0
[   17.108287] Oops:  [#1] SMP PTI
[   17.108557] CPU: 5 PID: 313 Comm: kworker/u16:5 Not tainted 5.0.0_pt_pmem1 #2
[   17.109128] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
[   17.109910] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[   17.110323] RIP: 0010:__remove_pages+0x2f/0x520
[   17.110674] Code: 41 56 41 55 49 89 fd 41 54 55 53 48 89 d3 48 83 ec 68 48 89 4c 24 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 60 31 c0 48 89 f8 <48> 2b 47 58 48 3d 00 19 00 00 0f 85 7f 03 00 00 48 85 c9 0f 84 df
[   17.112114] RSP: 0018:b43b815f3ca8 EFLAGS: 00010246
[   17.112518] RAX: 0640 RBX: 0004 RCX: 
[   17.113073] RDX: 0004 RSI: 0024 RDI: 0640
[   17.113615] RBP: 00024000 R08:  R09: 4000
[   17.114186] R10: 4000 R11: 00024000 R12: e382c900
[   17.114743] R13: 0640 R14: 0004 R15: 0024
[   17.115288] FS:  () GS:979539b4() knlGS:
[   17.115911] CS:  0010 DS:  ES:  CR0: 80050033
[   17.116356] CR2: 0698 CR3: 000133c22004 CR4: 00360ee0
[   17.116913] DR0:  DR1:  DR2: 
[   17.117467] DR3:  DR6: fffe0ff0 DR7: 0400
[   17.118016] Call Trace:
[   17.118214]  ? memblock_isolate_range+0xc4/0x139
[   17.118570]  ? firmware_map_remove+0x48/0x90
[   17.118908]  arch_remove_memory+0x7b/0xc0
[   17.119216]  __remove_memory+0x93/0xc0
[   17.119528]  acpi_memory_device_remove+0x67/0xe0
[   17.119890]  acpi_bus_trim+0x50/0x90
[   17.120167]  acpi_device_hotplug+0x2fc/0x460
[   17.120498]  acpi_hotplug_work_fn+0x15/0x20
[   17.120834]  process_one_work+0x2a0/0x650
[   17.121146]  worker_thread+0x34/0x3d0
[   17.121432]  ? process_one_work+0x650/0x650
[   17.121772]  kthread+0x118/0x130
[   17.122032]  ? kthread_create_on_node+0x60/0x60
[   17.122413]  ret_from_fork+0x3a/0x50
[   17.122727] Modules linked in:
[   17.122983] CR2: 0698
[   17.123250] ---[ end trace 389c4034f6d42e6f ]---
[   17.123618] RIP: 0010:__remove_pages+0x2f/0x520
[   17.123979] Code: 41 56 41 55 49 89 fd 41 54 55 53 48 89 d3 48 83 ec 68 48 89 4c 24 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 60 31 c0 48 89 f8 <48> 2b 47 58 48 3d 00 19 00 00 0f 85 7f 03 00 00 48 85 c9 0f 84 df
[   17.125410] RSP: 0018:b43b815f3ca8 EFLAGS: 00010246
[   17.125818] RAX: 0640 RBX: 0004 RCX: 
[   17.126359] RDX: 0004 RSI: 0024 RDI: 0640
[   17.126906] RBP: 00024000 R08:  R09: 4000
[   17.127453] R10: 4000 R11: 00024000 R12: e382c900
[   17.128008] R13: 0640 R14: 0004 R15: 0024
[   17.128555] FS:  () GS:979539b4() knlGS:
[   17.129182] CS:  0010 DS:  ES:  CR0: 80050033
[   17.129627] CR2: 0698 CR3: 000133c22004 CR4: 00360ee0
[   17.130182] DR0:  DR1:  DR2: 
[   17.130744] DR3:  DR6: fffe0ff0 DR7: 0400
[   17.131293] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:34
[   17.132050] in_atomic(): 0, irqs_disabled(): 1, pid: 313, name: kworker/u16:5
[   17.132596] INFO: lockdep is turned off.
[   17.132908] irq event stamp: 14046
[   17.133175] hardirqs last  enabled at (14045): [] kfree+0xba/0x230
[   17.133777] hardirqs last disabled at (14046): []

Re: NULL pointer dereference during memory hotremove

2019-05-17 Thread Pavel Tatashin
On Fri, May 17, 2019 at 1:22 PM Pavel Tatashin
 wrote:
>
> On Fri, May 17, 2019 at 10:38 AM Michal Hocko  wrote:
> >
> > On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
> > > This panic is unrelated to circular lock issue that I reported in a
> > > separate thread, that also happens during memory hotremove.
> > >
> > > xakep ~/x/linux$ git describe
> > > v5.1-12317-ga6a4b66bd8f4
> >
> > Does this happen on 5.0 as well?
>
> Yes, just reproduced it on 5.0 as well. Unfortunately, I do not have a
> script, and have to do it manually, also it does not happen every
> time, it happened on 3rd time for me.

Actually, sorry, I have not tested 5.0. I compiled 5.0, but my script
still tested the v5.1-12317-ga6a4b66bd8f4 build. I will report later whether
I am able to reproduce it on 5.0.

Pasha


Re: NULL pointer dereference during memory hotremove

2019-05-17 Thread Pavel Tatashin
On Fri, May 17, 2019 at 10:38 AM Michal Hocko  wrote:
>
> On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
> > This panic is unrelated to circular lock issue that I reported in a
> > separate thread, that also happens during memory hotremove.
> >
> > xakep ~/x/linux$ git describe
> > v5.1-12317-ga6a4b66bd8f4
>
> Does this happen on 5.0 as well?

Yes, I just reproduced it on 5.0 as well. Unfortunately, I do not have a
script and have to do it manually. It also does not happen every
time; it happened on the 3rd try for me.

Pasha


Re: NULL pointer dereference during memory hotremove

2019-05-17 Thread David Hildenbrand
On 17.05.19 16:38, Michal Hocko wrote:
> On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
>> This panic is unrelated to circular lock issue that I reported in a
>> separate thread, that also happens during memory hotremove.
>>
>> xakep ~/x/linux$ git describe
>> v5.1-12317-ga6a4b66bd8f4
> 
> Does this happen on 5.0 as well?
> 

We have this on the list:

[PATCH V3 1/4] mm/hotplug: Reorder arch_remove_memory() call in __remove_memory()

Could that help?

-- 

Thanks,

David / dhildenb


Re: NULL pointer dereference during memory hotremove

2019-05-17 Thread Michal Hocko
On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
> This panic is unrelated to circular lock issue that I reported in a
> separate thread, that also happens during memory hotremove.
> 
> xakep ~/x/linux$ git describe
> v5.1-12317-ga6a4b66bd8f4

Does this happen on 5.0 as well?
-- 
Michal Hocko
SUSE Labs


NULL pointer dereference during memory hotremove

2019-05-17 Thread Pavel Tatashin
This panic is unrelated to the circular locking issue that I reported in a
separate thread, which also happens during memory hotremove.

xakep ~/x/linux$ git describe
v5.1-12317-ga6a4b66bd8f4

The config is attached; the qemu script is as follows:

qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-parallel none \
-echr 1 \
-serial none \
-chardev stdio,id=console,signal=off,mux=on \
-serial chardev:console \
-mon chardev=console \
-vga none \
-display none \
-kernel pmem/native/arch/x86/boot/bzImage \
-m 8G,slots=1,maxmem=16G \
-smp 8 \
-fsdev local,id=virtfs1,path=/,security_model=none \
-device virtio-9p-pci,fsdev=virtfs1,mount_tag=hostfs \
-append 'earlyprintk=serial,ttyS0,115200 console=ttyS0 TERM=xterm ip=dhcp memmap=2G!6G loglevel=7'

The unusual thing about this script is that 2G is reserved for a pmem device
(memmap=2G!6G). Otherwise, it is a normal layout. Unfortunately, the panic does
not happen every time, but I have hit it a couple of times.
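For reference, the kernel's memmap=nn[KMG]!ss[KMG] parameter marks the physical range from ss to ss+nn as protected (e820 type 12), which the kernel then exposes as an emulated pmem device. Below is a minimal sketch that decodes the value used above; it assumes only K/M/G suffixes, and the helper names to_bytes and memmap_range are hypothetical, chosen for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size with an optional K/M/G suffix to bytes.
# Assumption: only K, M and G suffixes (or a plain number) are handled.
to_bytes() {
  local v=$1
  case "$v" in
    *K) echo $(( ${v%K} << 10 )) ;;
    *M) echo $(( ${v%M} << 20 )) ;;
    *G) echo $(( ${v%G} << 30 )) ;;
    *)  echo $(( v )) ;;
  esac
}

# Hypothetical helper: print the inclusive physical range reserved by a
# memmap=<size>!<start> argument, e.g. memmap=2G!6G.
memmap_range() {
  local sep='!'                     # kept in a variable to avoid history expansion
  local arg=${1#memmap=}            # strip the "memmap=" prefix -> "2G!6G"
  local size=${arg%%"$sep"*}        # text before '!': region size
  local start=${arg#*"$sep"}        # text after '!': start address
  local s n
  s=$(to_bytes "$start")
  n=$(to_bytes "$size")
  printf '0x%x-0x%x\n' "$s" $(( s + n - 1 ))
}

memmap_range 'memmap=2G!6G'
```

For memmap=2G!6G this prints 0x180000000-0x1ffffffff, i.e. the 2 GiB directly above the first 6 GiB of physical address space.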


# QEMU 4.0.0 monitor - type 'help' for more information
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
# echo online_movable > /sys/devices/system/memory/memory79/state
[   40.219090] Built 1 zonelists, mobility grouping on.  Total pages: 1529279
[   40.223258] Policy zone: Normal
# (qemu) device_del dimm1
(qemu) [   49.624600] Offlined Pages 32768
[   49.625796] Built 1 zonelists, mobility grouping on.  Total pages: 1516352
[   49.627841] Policy zone: Normal
[   49.630932] BUG: kernel NULL pointer dereference, address: 0698
[   49.633704] #PF: supervisor read access in kernel mode
[   49.635689] #PF: error_code(0x) - not-present page
[   49.637620] PGD 800236b59067 P4D 800236b59067 PUD 2358fe067 PMD 0
[   49.640163] Oops:  [#1] SMP PTI
[   49.641223] CPU: 0 PID: 7 Comm: kworker/u16:0 Not tainted 5.1.0_pt_pmem #38
[   49.643183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
[   49.645858] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[   49.647101] RIP: 0010:__remove_pages+0x1a/0x460
[   49.648165] Code: e9 bb a9 fd ff 0f 0b 66 0f 1f 84 00 00 00 00 00 41 57 48 89 f8 49 89 ff 41 56 49 89 f6 41 55 41 54 55 53 48 89 d3 48 83 ec 50 <48> 2b 47 58 48 89 4c 24 48 48 3d 00 19 00 00 75 09 48 85 c9 0f 85
[   49.651925] RSP: 0018:bd1000c8fcb8 EFLAGS: 00010286
[   49.652857] RAX: 0640 RBX: 0004 RCX: 
[   49.654139] RDX: 0004 RSI: 0024 RDI: 0640
[   49.655393] RBP:  R08:  R09: 4000
[   49.656523] R10: 4000 R11: 00024000 R12: 4000
[   49.657654] R13: 00024000 R14: 0024 R15: 0640
[   49.658828] FS:  () GS:9b4bf980() knlGS:
[   49.660178] CS:  0010 DS:  ES:  CR0: 80050033
[   49.661033] CR2: 0698 CR3: 0002382e0006 CR4: 00360ef0
[   49.662114] DR0:  DR1:  DR2: 
[   49.663172] DR3:  DR6: fffe0ff0 DR7: 0400
[   49.664243] Call Trace:
[   49.664622]  ? memblock_isolate_range+0xc4/0x139
[   49.665290]  ? firmware_map_add_hotplug+0x7e/0xde
[   49.665908]  ? memblock_remove_region+0x30/0x74
[   49.666498]  arch_remove_memory+0x6f/0xa0
[   49.667012]  __remove_memory+0xab/0x130
[   49.667492]  ? walk_memory_range+0xa1/0xe0
[   49.668008]  acpi_memory_device_remove+0x67/0xe0
[   49.668595]  acpi_bus_trim+0x50/0x90
[   49.669051]  acpi_device_hotplug+0x2fa/0x3e0
[   49.669590]  acpi_hotplug_work_fn+0x15/0x20
[   49.670116]  process_one_work+0x2a0/0x650
[   49.670577]  worker_thread+0x34/0x3d0
[   49.670997]  ? process_one_work+0x650/0x650
[   49.671503]  kthread+0x118/0x130
[   49.671879]  ? kthread_create_on_node+0x60/0x60
[   49.672411]  ret_from_fork+0x3a/0x50
[   49.672836] Modules linked in:
[   49.673190] CR2: 0698
[   49.673583] ---[ end trace 6b727d3a8ce48aa1 ]---
[   49.674120] RIP: 0010:__remove_pages+0x1a/0x460
[   49.674624] Code: e9 bb a9 fd ff 0f 0b 66 0f 1f 84 00 00 00 00 00 41 57 48 89 f8 49 89 ff 41 56 49 89 f6 41 55 41 54