Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Nov 19, 2015 at 08:58:27AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> > On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > > > > I couldn't see any problem.
> > > > > > > >
> > > > > > > > However, in this round, I did another test which is same one
> > > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > > > > >
> > > > > > > Could you share updated test?
> > > > > >
> > > > > > It's part of my testing suite so I should factor it out.
> > > > > > I will send it when I go to office tomorrow.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > > >
> > > > > > Before leaving office, I queued it up and result is below.
> > > > > > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > > > > > Anyway, please confirm and say to me what I should add more patches
> > > > > > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > > > > > fix patches.
> > > > >
> > > > > The two my patches which are not in the mmotm-2015-11-10-15-53 release:
> > > > >
> > > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > > >
> > > > 1. mm: fix __page_mapcount()
> > > > 2. thp: fix leak due split_huge_page() vs. exit race
> > > >
> > > > If I missed some patches, let me know it.
> > > >
> > > > I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
> > > > But unfortunately, the result was below.
> > > >
> > > > Now, I am making test program I can send to you but it seems to be not easy
> > > > because small changes for factoring it out from testing suite seems to change
> > > > something(ex, timing) and makes hard to reproduce. I will try it again.
> > >
> > > Your test suite seems generate quite a few bug reports. Don't mind make whole
> > > suite public?
> >
> > It's tough due to including company internal stuffs.
> > That's why I try to factor the part I can share out but unfortunately,
> > I couldn't grab a time for retrying until now. :(
> >
> > >
> > > > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
> > > > flags: 0x40040018(uptodate|dirty|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > > page->mem_cgroup:880077cf0c00
> > > > [ cut here ]
> > > > kernel BUG at mm/huge_memory.c:3272!
> > > > invalid opcode: [#1] SMP
> > > > Dumping ftrace buffer:
> > > >    (ftrace buffer empty)
> > > > Modules linked in:
> > > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > > > RIP: 0010:[] [] split_huge_page_to_list+0x8fb/0x910
> > > > RSP: 0018:88007344f968 EFLAGS: 00010286
> > > > RAX: 0021 RBX: ea240080 RCX:
> > > > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > > > RBP: 88007344f9e8 R08: R09: 880bc600
> > > > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > > > R13: ea240088 R14: ea240080 R15:
> > > > FS: () GS:88007830() knlGS:
> > > > CS: 0010 DS: ES: CR0: 8005003b
> > > > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > > > Stack:
> > > > cccd ea240080 88007344fa00 ea240088
> > > > 88007344fa00 88007344f9e8 810f0200
> > > > ea24 ea240080
> > > > Call Trace:
> > > > [] ? __lock_page+0xa0/0xb0
> > > > [] deferred_split_scan+0x115/0x240
> > > > [] ? list_lru_count_one+0x1c/0x30
> > > > [] shrink_slab.part.42+0x1e3/0x350
> > > > [] shrink_zone+0x26a/0x280
> > > > [] do_try_to_free_pages+0x12d/0x3b0
> > > > [] try_to_free_pages+0xb4/0x140
> > > > [] __alloc_pages_nodemask+0x459/0x920
> > > > [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > > > [] khugepaged+0x155/0x1b10
Re: kernel oops on mmotm-2015-10-15-15-20
> On Nov 19, 2015, at 14:58, Kirill A. Shutemov wrote:
>
> uncharged

I also encounter this crash.
I also encounter a crash like this in qemu:

[2.703436] [] do_execveat_common.isra.36+0x4f0/0x630
[2.703624] [] do_execve+0x24/0x30
[2.703767] [] SyS_execve+0x1c/0x2c
[2.703923] BUG: Bad page map in process init pte:604837ebd3 pmd:b29e7003
[2.704140] page:ffc07f00af80 count:2 mapcount:-1 mapping: (null) index:0x1
[2.704414] flags: 0x4014(referenced|dirty)
[2.704563] page dumped because: bad pte
[2.704666] addr:007fafb7e000 vm_flags:00100073 anon_vma:ffc0729bdb90 mapping: (null) index:7fafb7e
[2.704906] file: (null) fault: (null) mmap: (null) readpage: (null)
[2.705117] CPU: 0 PID: 84 Comm: init Tainted: GB 4.2.0ajb-5-g11a9bf3 #80
[2.705315] Hardware name: ranchu (DT)
[2.705408] Call trace:
[2.705488] [] dump_backtrace+0x0/0x124
[2.705657] [] show_stack+0x10/0x1c
[2.705797] [] dump_stack+0x78/0x98
[2.705971] [] print_bad_pte+0x154/0x1f0
[2.706102] [] unmap_single_vma+0x574/0x704
[2.706236] [] unmap_vmas+0x54/0x70
[2.706354] [] exit_mmap+0x88/0xfc
[2.706473] [] mmput+0x48/0xe8
[2.706584] [] flush_old_exec+0x30c/0x79c
[2.706719] [] load_elf_binary+0x21c/0x1098
[2.706856] [] search_binary_handler+0xa8/0x224
[2.706995] [] do_execveat_common.isra.36+0x4f0/0x630
[2.707144] [] do_execve+0x24/0x30
[2.707263] [] SyS_execve+0x1c/0x2c
[2.707392] BUG: Bad page map in process init pte:604837fbd3 pmd:b29e7003
[2.707752] page:ffc07f00afc0 count:2 mapcount:-1 mapping: (null) index:0x1
[2.708167] flags: 0x4014(referenced|dirty)
[2.708333] page dumped because: bad pte
[2.708501] addr:007fafb7f000 vm_flags:00100073 anon_vma:ffc0729bdb90 mapping: (null) index:7fafb7f
[2.709084] file: (null) fault: (null) mmap: (null) readpage: (null)
[2.709306] CPU: 0 PID: 84 Comm: init Tainted: GB 4.2.0ajb-5-g11a9bf3 #80
[2.709494] Hardware name: ranchu (DT)

It seems the page map count is not correct.
My build is based on mmotm-2015-10-21-14-41.

Thanks
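
For readers following the "mapcount:-1" lines in the report above, here is a
small userspace sketch of why a negative value is reported as a bad page map.
It is a toy model, not kernel code: the struct and every toy_* name are made up
for illustration, and it assumes the usual convention that the stored per-page
counter starts at -1, so the reported mapcount is the stored value plus one.

```c
#include <assert.h>

/* Toy model only: mimics a per-page map counter that is biased by -1,
 * so a page with no mappings stores -1 and reports a mapcount of 0. */
struct toy_page {
	int _mapcount;	/* stored value; -1 means "not mapped" */
};

/* Reported mapcount: stored value plus the bias. */
static int toy_page_mapcount(const struct toy_page *page)
{
	return page->_mapcount + 1;
}

/* Account one new mapping of the page. */
static void toy_map(struct toy_page *page)
{
	page->_mapcount++;
}

/* Drop one mapping of the page. */
static void toy_unmap(struct toy_page *page)
{
	page->_mapcount--;
}

/* A negative reported mapcount means more unmaps than maps were
 * accounted, which is the situation the "bad pte" report flags. */
static int toy_is_bad_map(const struct toy_page *page)
{
	return toy_page_mapcount(page) < 0;
}
```

In this model, one toy_map() followed by two toy_unmap() calls leaves the
reported mapcount at -1, the same value shown in both page dumps. Whether the
real accounting went wrong on the map side or the unmap side cannot be told
from the log alone.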
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > > > I couldn't see any problem.
> > > > > > >
> > > > > > > However, in this round, I did another test which is same one
> > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > > > >
> > > > > > Could you share updated test?
> > > > >
> > > > > It's part of my testing suite so I should factor it out.
> > > > > I will send it when I go to office tomorrow.
> > > >
> > > > Thanks.
> > > >
> > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > >
> > > > > Before leaving office, I queued it up and result is below.
> > > > > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > > > > Anyway, please confirm and say to me what I should add more patches
> > > > > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > > > > fix patches.
> > > >
> > > > The two my patches which are not in the mmotm-2015-11-10-15-53 release:
> > > >
> > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > >
> > > 1. mm: fix __page_mapcount()
> > > 2. thp: fix leak due split_huge_page() vs. exit race
> > >
> > > If I missed some patches, let me know it.
> > >
> > > I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
> > > But unfortunately, the result was below.
> > >
> > > Now, I am making test program I can send to you but it seems to be not easy
> > > because small changes for factoring it out from testing suite seems to change
> > > something(ex, timing) and makes hard to reproduce. I will try it again.
> >
> > Your test suite seems generate quite a few bug reports. Don't mind make whole
> > suite public?
>
> It's tough due to including company internal stuffs.
> That's why I try to factor the part I can share out but unfortunately,
> I couldn't grab a time for retrying until now. :(
>
> >
> > > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
> > > flags: 0x40040018(uptodate|dirty|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > page->mem_cgroup:880077cf0c00
> > > [ cut here ]
> > > kernel BUG at mm/huge_memory.c:3272!
> > > invalid opcode: [#1] SMP
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > > RIP: 0010:[] [] split_huge_page_to_list+0x8fb/0x910
> > > RSP: 0018:88007344f968 EFLAGS: 00010286
> > > RAX: 0021 RBX: ea240080 RCX:
> > > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > > RBP: 88007344f9e8 R08: R09: 880bc600
> > > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > > R13: ea240088 R14: ea240080 R15:
> > > FS: () GS:88007830() knlGS:
> > > CS: 0010 DS: ES: CR0: 8005003b
> > > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > > Stack:
> > > cccd ea240080 88007344fa00 ea240088
> > > 88007344fa00 88007344f9e8 810f0200
> > > ea24 ea240080
> > > Call Trace:
> > > [] ? __lock_page+0xa0/0xb0
> > > [] deferred_split_scan+0x115/0x240
> > > [] ? list_lru_count_one+0x1c/0x30
> > > [] shrink_slab.part.42+0x1e3/0x350
> > > [] shrink_zone+0x26a/0x280
> > > [] do_try_to_free_pages+0x12d/0x3b0
> > > [] try_to_free_pages+0xb4/0x140
> > > [] __alloc_pages_nodemask+0x459/0x920
> > > [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > > [] khugepaged+0x155/0x1b10
> > > [] ? prepare_to_wait_event+0xf0/0xf0
> > > [] ? __split_huge_pmd_locked+0x4e0/0x4e0
> > > [] kthread+0xc9/0xe0
> > > [] ? kthread_park+0x60/0x60
> > > [] ret_from_fork+0x3f/0x70
> > > [] ? kthread_park+0x60/0x60
> > > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f
Re: kernel oops on mmotm-2015-10-15-15-20
On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > > I couldn't see any problem.
> > > > > >
> > > > > > However, in this round, I did another test which is same one
> > > > > > I attached but a little bit different because it doesn't do
> > > > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > > >
> > > > > Could you share updated test?
> > > >
> > > > It's part of my testing suite so I should factor it out.
> > > > I will send it when I go to office tomorrow.
> > >
> > > Thanks.
> > >
> > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > >
> > > > Before leaving office, I queued it up and result is below.
> > > > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > > > Anyway, please confirm and say to me what I should add more patches
> > > > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > > > fix patches.
> > >
> > > The two my patches which are not in the mmotm-2015-11-10-15-53 release:
> > >
> > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> >
> > 1. mm: fix __page_mapcount()
> > 2. thp: fix leak due split_huge_page() vs. exit race
> >
> > If I missed some patches, let me know it.
> >
> > I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
> > But unfortunately, the result was below.
> >
> > Now, I am making test program I can send to you but it seems to be not easy
> > because small changes for factoring it out from testing suite seems to change
> > something(ex, timing) and makes hard to reproduce. I will try it again.
>
> Your test suite seems generate quite a few bug reports. Don't mind make whole
> suite public?

It's tough due to including company internal stuffs.
That's why I try to factor the part I can share out but unfortunately,
I couldn't grab a time for retrying until now. :(

>
> > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
> > flags: 0x40040018(uptodate|dirty|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > page->mem_cgroup:880077cf0c00
> > [ cut here ]
> > kernel BUG at mm/huge_memory.c:3272!
> > invalid opcode: [#1] SMP
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > RIP: 0010:[] [] split_huge_page_to_list+0x8fb/0x910
> > RSP: 0018:88007344f968 EFLAGS: 00010286
> > RAX: 0021 RBX: ea240080 RCX:
> > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > RBP: 88007344f9e8 R08: R09: 880bc600
> > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > R13: ea240088 R14: ea240080 R15:
> > FS: () GS:88007830() knlGS:
> > CS: 0010 DS: ES: CR0: 8005003b
> > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > Stack:
> > cccd ea240080 88007344fa00 ea240088
> > 88007344fa00 88007344f9e8 810f0200
> > ea24 ea240080
> > Call Trace:
> > [] ? __lock_page+0xa0/0xb0
> > [] deferred_split_scan+0x115/0x240
> > [] ? list_lru_count_one+0x1c/0x30
> > [] shrink_slab.part.42+0x1e3/0x350
> > [] shrink_zone+0x26a/0x280
> > [] do_try_to_free_pages+0x12d/0x3b0
> > [] try_to_free_pages+0xb4/0x140
> > [] __alloc_pages_nodemask+0x459/0x920
> > [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > [] khugepaged+0x155/0x1b10
> > [] ? prepare_to_wait_event+0xf0/0xf0
> > [] ? __split_huge_pmd_locked+0x4e0/0x4e0
> > [] kthread+0xc9/0xe0
> > [] ? kthread_park+0x60/0x60
> > [] ret_from_fork+0x3f/0x70
> > [] ? kthread_park+0x60/0x60
> > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01
> > e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7
> > c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90
> > RIP [] split_huge_page_to_list+0x8fb/0x910
> > RSP
> > ---[ end trace 0ee39378e850d8de ]---
> > Kernel panic - not syncing: Fatal
Re: kernel oops on mmotm-2015-10-15-15-20
On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > I couldn't see any problem.
> > > > >
> > > > > However, in this round, I did another test which is same one
> > > > > I attached but a little bit different because it doesn't do
> > > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > >
> > > > Could you share updated test?
> > >
> > > It's part of my testing suite so I should factor it out.
> > > I will send it when I go to office tomorrow.
> >
> > Thanks.
> >
> > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > >
> > > Before leaving office, I queued it up and result is below.
> > > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > > Anyway, please confirm and say to me what I should add more patches
> > > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > > fix patches.
> >
> > The two my patches which are not in the mmotm-2015-11-10-15-53 release:
> >
> > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
>
> 1. mm: fix __page_mapcount()
> 2. thp: fix leak due split_huge_page() vs. exit race
>
> If I missed some patches, let me know it.
>
> I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
> But unfortunately, the result was below.
>
> Now, I am making test program I can send to you but it seems to be not easy
> because small changes for factoring it out from testing suite seems to change
> something(ex, timing) and makes hard to reproduce.
> I will try it again.

Your test suite seems to generate quite a few bug reports.
Would you mind making the whole suite public?

> page:ea240080 count:2 mapcount:1 mapping:88007eff3321
> index:0x60e02
> flags: 0x40040018(uptodate|dirty|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3272!
> invalid opcode: [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> RIP: 0010:[] []
> split_huge_page_to_list+0x8fb/0x910
> RSP: 0018:88007344f968 EFLAGS: 00010286
> RAX: 0021 RBX: ea240080 RCX:
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 88007344f9e8 R08: R09: 880bc600
> R10: 8163e2c0 R11: 4b47 R12: ea240080
> R13: ea240088 R14: ea240080 R15:
> FS: () GS:88007830() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> Stack:
> cccd ea240080 88007344fa00 ea240088
> 88007344fa00 88007344f9e8 810f0200
> ea24 ea240080
> Call Trace:
> [] ? __lock_page+0xa0/0xb0
> [] deferred_split_scan+0x115/0x240
> [] ? list_lru_count_one+0x1c/0x30
> [] shrink_slab.part.42+0x1e3/0x350
> [] shrink_zone+0x26a/0x280
> [] do_try_to_free_pages+0x12d/0x3b0
> [] try_to_free_pages+0xb4/0x140
> [] __alloc_pages_nodemask+0x459/0x920
> [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> [] khugepaged+0x155/0x1b10
> [] ? prepare_to_wait_event+0xf0/0xf0
> [] ? __split_huge_pmd_locked+0x4e0/0x4e0
> [] kthread+0xc9/0xe0
> [] ? kthread_park+0x60/0x60
> [] ret_from_fork+0x3f/0x70
> [] ? kthread_park+0x60/0x60
> Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9
> 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48
> c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90
> RIP [] split_huge_page_to_list+0x8fb/0x910
> RSP
> ---[ end trace 0ee39378e850d8de ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled

I looked more into it. It seems to be a race between split_huge_page() and
deferred_split_scan(), as the dumped page is not huge.

Could you check if the patch below makes any difference to the situation?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 91e2f4b7ca39..923c0f6eb50a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3186,13 +3186,6 @@ static
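
The suspected race can be pictured with a small user-space sketch. It is only an analogy and not the patch referenced above: every name in it (toy_page, toy_split, toy_deferred_scan) is hypothetical, and it assumes the problem has the general shape of a queue entry going stale because another path split the page after it was queued. The sketch re-checks the page state once the lock is held and skips entries that are no longer huge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
struct toy_page {
    bool huge;     /* is this still a compound (huge) page? */
    bool locked;   /* stand-in for the page lock */
};

/* Split a page. As with the kernel check, the caller must hold the lock. */
static void toy_split(struct toy_page *p)
{
    assert(p->locked);   /* plays the role of VM_BUG_ON_PAGE(!PageLocked(page)) */
    p->huge = false;
}

/*
 * Walk a deferred queue and split whatever is still huge.
 * Entries may be stale: another path may have split the page after it
 * was queued, so the state is re-checked once the lock is held.
 * Returns the number of pages split by this scan.
 */
static size_t toy_deferred_scan(struct toy_page **queue, size_t n)
{
    size_t split = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_page *p = queue[i];

        p->locked = true;        /* lock_page() stand-in */
        if (p->huge) {           /* re-validate under the lock */
            toy_split(p);
            split++;
        }
        p->locked = false;       /* unlock_page() stand-in */
    }
    return split;
}
```

With a two-entry queue in which one page was already split by another path, toy_deferred_scan() splits only the remaining huge page and leaves both pages unlocked.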
Re: kernel oops on mmotm-2015-10-15-15-20
On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > I couldn't see any problem.
> > > >
> > > > However, in this round, I did another test which is same one
> > > > I attached but a little bit different because it doesn't do
> > > > (memcg things/kill/swapoff) for testing program long-live test.
> > >
> > > Could you share updated test?
> >
> > It's part of my testing suite so I should factor it out.
> > I will send it when I go to office tomorrow.
>
> Thanks.
>
> > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> >
> > Before leaving office, I queued it up and result is below.
> > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > Anyway, please confirm and say to me what I should add more patches
> > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > fix patches.
>
> The two my patches which are not in the mmotm-2015-11-10-15-53 release:
>
> http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

1. mm: fix __page_mapcount()
2. thp: fix leak due split_huge_page() vs. exit race

If I missed some patches, let me know it.

I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
But unfortunately, the result was below.

Now, I am making test program I can send to you but it seems to be not easy
because small changes for factoring it out from testing suite seems to change
something(ex, timing) and makes hard to reproduce.
I will try it again.

page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
flags: 0x40040018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/huge_memory.c:3272!
invalid opcode: [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[] [] split_huge_page_to_list+0x8fb/0x910
RSP: 0018:88007344f968 EFLAGS: 00010286
RAX: 0021 RBX: ea240080 RCX:
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344f9e8 R08: R09: 880bc600
R10: 8163e2c0 R11: 4b47 R12: ea240080
R13: ea240088 R14: ea240080 R15:
FS: () GS:88007830() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
Stack:
cccd ea240080 88007344fa00 ea240088
88007344fa00 88007344f9e8 810f0200
ea24 ea240080
Call Trace:
[] ? __lock_page+0xa0/0xb0
[] deferred_split_scan+0x115/0x240
[] ? list_lru_count_one+0x1c/0x30
[] shrink_slab.part.42+0x1e3/0x350
[] shrink_zone+0x26a/0x280
[] do_try_to_free_pages+0x12d/0x3b0
[] try_to_free_pages+0xb4/0x140
[] __alloc_pages_nodemask+0x459/0x920
[] ? trace_event_raw_event_tick_stop+0xd0/0xd0
[] khugepaged+0x155/0x1b10
[] ? prepare_to_wait_event+0xf0/0xf0
[] ? __split_huge_pmd_locked+0x4e0/0x4e0
[] kthread+0xc9/0xe0
[] ? kthread_park+0x60/0x60
[] ret_from_fork+0x3f/0x70
[] ? kthread_park+0x60/0x60
Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90
RIP [] split_huge_page_to_list+0x8fb/0x910
RSP
---[ end trace 0ee39378e850d8de ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel oops on mmotm-2015-10-15-15-20
On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > During the test with MADV_FREE on kernel I applied your patches,
> > > I couldn't see any problem.
> > >
> > > However, in this round, I did another test which is same one
> > > I attached but a little bit different because it doesn't do
> > > (memcg things/kill/swapoff) for testing program long-live test.
> >
> > Could you share updated test?
>
> It's part of my testing suite so I should factor it out.
> I will send it when I go to office tomorrow.

Thanks.

> > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
>
> Before leaving office, I queued it up and result is below.
> It seems you fixed already but didn't apply it to mmotm yet. Right?
> Anyway, please confirm and say to me what I should add more patches
> into mmotm-2015-11-10-15-53 for follow up your recent many bug
> fix patches.

The two my patches which are not in the mmotm-2015-11-10-15-53 release:

http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

--
 Kirill A. Shutemov
Re: kernel oops on mmotm-2015-10-15-15-20
On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > During the test with MADV_FREE on kernel I applied your patches,
> > I couldn't see any problem.
> >
> > However, in this round, I did another test which is same one
> > I attached but a little bit different because it doesn't do
> > (memcg things/kill/swapoff) for testing program long-live test.
>
> Could you share updated test?

It's part of my testing suite so I should factor it out.
I will send it when I go to office tomorrow.

> And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

Before leaving office, I queued it up and result is below.
It seems you fixed already but didn't apply it to mmotm yet. Right?
Anyway, please confirm and say to me what I should add more patches
into mmotm-2015-11-10-15-53 for follow up your recent many bug
fix patches.

Thanks.

page:ea553fc0 count:3 mapcount:1 mapping:88007f717a01 index:0x602ff
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/migrate.c:889!
invalid opcode: [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 10 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[] [] migrate_pages+0x8e6/0x950
RSP: 0018:88007344fa00 EFLAGS: 00010282
RAX: 0021 RBX: ea0001a0bbc0 RCX:
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344fa80 R08: R09: 880b9540
R10: 8163e2c0 R11: 02c2 R12:
R13: ea553f80 R14: ea553fc0 R15: 8189db40
FS: () GS:88007834() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 7f45cc0091d8 CR3: 7eba7000 CR4: 06a0
Stack:
880073441a40 81114880 81116420 ea553fe0 88007344fb30 88007344fb20 88007344fb20
Call Trace:
[] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
[] ? isolate_freepages_block+0x3d0/0x3d0
[] compact_zone+0x2bb/0x720
[] ? list_del+0xd/0x30
[] compact_zone_order+0x6d/0xa0
[] try_to_compact_pages+0xed/0x200
[] __alloc_pages_direct_compact+0x3b/0xd4
[] __alloc_pages_nodemask+0x3fb/0x920
[] khugepaged+0x155/0x1b10
[] ? prepare_to_wait_event+0xf0/0xf0
[] ? __split_huge_pmd_locked+0x4e0/0x4e0
[] kthread+0xc9/0xe0
[] ? kthread_park+0x60/0x60
[] ret_from_fork+0x3f/0x70
[] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 f4 fa ff ff 48 c7 c6 b8 f6 77 81 4c 89 f7 e8 fa 36 fd ff <0f> 0b 48 83 e8 01 e9 d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff
RIP [] migrate_pages+0x8e6/0x950
RSP
---[ end trace 337555313b7e45be ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Re: kernel oops on mmotm-2015-10-15-15-20
On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> During the test with MADV_FREE on kernel I applied your patches,
> I couldn't see any problem.
>
> However, in this round, I did another test which is same one
> I attached but a little bit different because it doesn't do
> (memcg things/kill/swapoff) for testing program long-live test.

Could you share updated test?

And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

> With that, I encountered this problem.
>
> page:eaf60080 count:1 mapcount:0 mapping:88007f584691
> index:0x62a02
> flags: 0x4006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3340!
> invalid opcode: [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b0f1a40 ti: 88004ced4000 task.ti: 88004ced4000
> RIP: 0010:[] []
> split_huge_page_to_list+0x907/0x920
> RSP: 0018:88004ced7a38 EFLAGS: 00010296
> RAX: 0021 RBX: eaf60080 RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 88004ced7ab8 R08: R09: 880bc560
> R10: 8163d880 R11: 00014f25 R12: eaf60080
> R13: eaf60088 R14: eaf60080 R15:
> FS: 7f43d3ced740() GS:8800782e() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 7ff1f6fcdb98 CR3: 4cf56000 CR4: 06a0
> Stack:
> cccd eaf60080 88004ced7ad0 eaf60088
> 88004ced7ad0 88004ced7ab8 810ef9d0
> eaf6 eaf60080
> Call Trace:
> [] ? __lock_page+0xa0/0xb0
> [] deferred_split_scan+0x11c/0x260
> [] ? list_lru_count_one+0x1c/0x30
> [] shrink_slab.part.42+0x1e3/0x350
> [] shrink_zone+0x26a/0x280
> [] do_try_to_free_pages+0x12d/0x3b0
> [] try_to_free_pages+0xb4/0x140
> [] __alloc_pages_nodemask+0x459/0x920
> [] handle_mm_fault+0xc77/0x1000
> [] ? retint_kernel+0x10/0x10
> [] __do_page_fault+0x189/0x400
> [] do_page_fault+0xc/0x10
> [] page_fault+0x22/0x30
> Code: ff ff 48 c7 c6 f0 b2 77 81 4c 89 f7 e8 13 c3 fc ff 0f 0b 48 83 e8 01 e9
> 88 f7 ff ff 48 c7 c6 70 a1 77 81 4c 89 f7 e8 f9 c2 fc ff <0f> 0b 48 c7 c6 38
> af 77 81 4c 89 e7 e8 e8 c2 fc ff 0f 0b 66 0f
> RIP [] split_huge_page_to_list+0x907/0x920
> RSP
> ---[ end trace c9a60522e3a296e4 ]---

I don't see how it's possible: we call lock_page() just before
split_huge_page() in deferred_split_scan().

> So, I reverted all MADV_FREE patches and changed it with MADV_DONTNEED.
> In this time, I saw below oops in this time.
> If I miss somethings, please let me know it.
>
> [ cut here ]
> kernel BUG at include/linux/swapops.h:129!

Looks similar to what I fixed by inserting smp_wmb() just before
clear_compound_head() in __split_huge_page_tail().

Do you have this in place? Like in last -mm tree?

> Another hit:
>
> page:ea520080 count:2 mapcount:0 mapping:880072b38a51
> index:0x62602
> flags: 0x40048028(uptodate|lru|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3306!

The same as the first one: no idea.

--
 Kirill A. Shutemov
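
The ordering concern behind the smp_wmb() remark can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the kernel code: toy_page, toy_split_tail and toy_read_page are hypothetical names, and a release store paired with an acquire load stands in for the kernel barrier. The idea shown is that a page's fields are written first and the flag that makes the page look independent is cleared last, so a reader that sees the flag cleared also sees the fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tail page; this is not the kernel's struct page. */
struct toy_page {
    unsigned long index;     /* plain field, written before publication */
    unsigned long flags;     /* plain field, written before publication */
    atomic_bool   is_tail;   /* true while the page still belongs to the compound page */
};

/* Writer side: initialise the fields, then publish by clearing is_tail. */
static void toy_split_tail(struct toy_page *p, unsigned long index,
                           unsigned long flags)
{
    assert(p != NULL);
    p->index = index;
    p->flags = flags;
    /*
     * The release store keeps the two plain writes above ordered before
     * the flag update. That is the kind of guarantee the smp_wmb() before
     * clear_compound_head() is described as providing in the thread.
     */
    atomic_store_explicit(&p->is_tail, false, memory_order_release);
}

/*
 * Reader side: returns false while the page is still a tail page.
 * Otherwise it copies out the fields, which are initialised because the
 * acquire load pairs with the release store in toy_split_tail().
 */
static bool toy_read_page(struct toy_page *p, unsigned long *index,
                          unsigned long *flags)
{
    assert(p != NULL && index != NULL && flags != NULL);
    if (atomic_load_explicit(&p->is_tail, memory_order_acquire))
        return false;
    *index = p->index;
    *flags = p->flags;
    return true;
}
```

In a real setting the writer and the reader would run on different CPUs; called in sequence from one thread, toy_read_page() fails before toy_split_tail() has run and returns the published values afterwards.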
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Nov 12, 2015 at 09:36:14AM +0900, Minchan Kim wrote: > > > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for > > > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no > > > MADV_FREE code in there > > > + pte_mkdirty patch > > > + freeze/unfreeze patch > > > + do_page_add_anon_rmap patch > > > + above split_huge_pmd > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k > > > FS > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k > > > FS > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k > > > FS > > > BUG: Bad rss-counter state mm:88007fa3bb80 idx:1 val:512 > > > > With the patch below my test setup run for 2+ days without triggering the > > bug. split_huge_pmd patch should be dropped. > > > > Please test. > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index 14cbbad54a3e..7aa0a3fef2aa 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -2841,9 +2841,6 @@ static void __split_huge_pmd_locked(struct > > vm_area_struct *vma, pmd_t *pmd, > > write = pmd_write(*pmd); > > young = pmd_young(*pmd); > > > > - /* leave pmd empty until pte is filled */ > > - pmdp_huge_clear_flush_notify(vma, haddr, pmd); > > - > > pgtable = pgtable_trans_huge_withdraw(mm, pmd); > > pmd_populate(mm, &_pmd, pgtable); > > > > @@ -2893,6 +2890,28 @@ static void __split_huge_pmd_locked(struct > > vm_area_struct *vma, pmd_t *pmd, > > } > > > > smp_wmb(); /* make pte visible before pmd */ > > + /* > > +* Up to this point the pmd is present and huge and userland has the > > +* whole access to the hugepage during the split (which happens in > > +* place). 
If we overwrite the pmd with the not-huge version pointing > > +* to the pte here (which of course we could if all CPUs were bug > > +* free), userland could trigger a small page size TLB miss on the > > +* small sized TLB while the hugepage TLB entry is still established in > > +* the huge TLB. Some CPU doesn't like that. > > +* See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum > > +* 383 on page 93. Intel should be safe but is also warns that it's > > +* only safe if the permission and cache attributes of the two entries > > +* loaded in the two TLB is identical (which should be the case here). > > +* But it is generally safer to never allow small and huge TLB entries > > +* for the same virtual address to be loaded simultaneously. So instead > > +* of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the > > +* current pmd notpresent (atomically because here the pmd_trans_huge > > +* and pmd_trans_splitting must remain set at all times on the pmd > > +* until the split is complete for this pmd), then we flush the SMP TLB > > +* and finally we write the non-huge version of the pmd entry with > > +* pmd_populate. > > +*/ > > + pmdp_invalidate(vma, haddr, pmd); > > pmd_populate(mm, pmd, pgtable); > > > > if (freeze) { > > I have been testing this patch with MADV_DONTNEED for a few days and > I couldn't see the problem any more. And I will continue to test it > with MADV_FREE. During the test with MADV_FREE on a kernel with your patches applied, I couldn't see any problem. However, in this round, I did another test which is the same as the one I attached but a little bit different: it doesn't do the memcg things/kill/swapoff, so that the test program runs as a long-lived test. With that, I encountered this problem.
page:eaf60080 count:1 mapcount:0 mapping:88007f584691 index:0x62a02 flags: 0x4006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked) page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) page->mem_cgroup:880077cf0c00 [ cut here ] kernel BUG at mm/huge_memory.c:3340! invalid opcode: [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 88006b0f1a40 ti: 88004ced4000 task.ti: 88004ced4000 RIP: 0010:[] [] split_huge_page_to_list+0x907/0x920 RSP: 0018:88004ced7a38 EFLAGS: 00010296 RAX: 0021 RBX: eaf60080 RCX: 81830db8 RDX: 0001 RSI: 0246 RDI: 821df4d8 RBP: 88004ced7ab8 R08: R09: 880bc560 R10: 8163d880 R11: 00014f25 R12: eaf60080 R13: eaf60088 R14: eaf60080 R15: FS: 7f43d3ced740() GS:8800782e() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7ff1f6fcdb98 CR3: 4cf56000 CR4: 06a0 Stack: cccd eaf60080 88004ced7ad0 eaf60088 88004ced7ad0 88004ced7ab8
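The ordering argument in the quoted patch comment (mark the pmd not present, flush the TLB, then pmd_populate) can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: every type and function below (toy_mmu, toy_access, toy_flush and so on) is hypothetical and exists only for this illustration. None of it is kernel code, and it does not model real TLB hardware or SMP behaviour. It only shows why the "populate first, flush later" order leaves a window in which a huge and a small translation for the same address can be cached together, while the "invalidate first" order does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are hypothetical, not kernel APIs. */
enum toy_pmd { TOY_PMD_HUGE, TOY_PMD_NOT_PRESENT, TOY_PMD_TABLE };

struct toy_mmu {
    enum toy_pmd pmd;  /* current state of the page-table entry */
    bool tlb_huge;     /* a huge-page translation is cached */
    bool tlb_small;    /* a small-page translation is cached */
    bool conflict;     /* both sizes were cached at the same time */
};

static void toy_init(struct toy_mmu *m)
{
    assert(m != NULL);
    m->pmd = TOY_PMD_HUGE;
    m->tlb_huge = true;   /* userland has already touched the huge page */
    m->tlb_small = false;
    m->conflict = false;
}

/* A userland access that walks the page table and fills the matching TLB. */
static void toy_access(struct toy_mmu *m)
{
    if (m->pmd == TOY_PMD_HUGE)
        m->tlb_huge = true;
    else if (m->pmd == TOY_PMD_TABLE)
        m->tlb_small = true;
    /* TOY_PMD_NOT_PRESENT: the access faults and fills nothing. */
    if (m->tlb_huge && m->tlb_small)
        m->conflict = true;
}

/* Drop every cached translation for the range. */
static void toy_flush(struct toy_mmu *m)
{
    m->tlb_huge = false;
    m->tlb_small = false;
}

/* "populate, then flush" order, with an access racing in between. */
bool toy_split_populate_first(void)
{
    struct toy_mmu m;

    toy_init(&m);
    m.pmd = TOY_PMD_TABLE;  /* non-huge pmd written while huge entry is cached */
    toy_access(&m);         /* small entry loaded next to the stale huge one */
    toy_flush(&m);
    toy_access(&m);
    return m.conflict;
}

/* "invalidate, flush, populate" order, with an access racing at each step. */
bool toy_split_invalidate_first(void)
{
    struct toy_mmu m;

    toy_init(&m);
    m.pmd = TOY_PMD_NOT_PRESENT;
    toy_access(&m);         /* faults; nothing new is cached */
    toy_flush(&m);          /* stale huge entry is gone */
    toy_access(&m);         /* still faults */
    m.pmd = TOY_PMD_TABLE;
    toy_access(&m);         /* only a small entry is loaded */
    return m.conflict;
}
```

In this model, toy_split_populate_first() reports a conflict and toy_split_invalidate_first() does not, which is the property the patch comment argues for. Whether a given CPU actually misbehaves in the conflicting case is outside what the sketch can show.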
Re: kernel oops on mmotm-2015-10-15-15-20
On Mon, Nov 09, 2015 at 12:55:22AM +0200, Kirill A. Shutemov wrote: > On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote: > > On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote: > > > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote: > > > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote: > > > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote: > > > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote: > > > > > > > Hello Kirill, > > > > > > > > > > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov > > > > > > > wrote: > > > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote: > > > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov > > > > > > > > > wrote: > > > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote: > > > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. > > > > > > > > > > > Shutemov wrote: > > > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim > > > > > > > > > > > > wrote: > > > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > Hello Hugh, > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh > > > > > > > > > > > > > > Dickins wrote: > > > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I added the code to check it and queued it > > > > > > > > > > > > > > > > again but I had another oops > > > > > > > > > > > > > > > > in this time but symptom is related to > > > > > > > > > > > > > > > > anon_vma, too. 
> > > > > > > > > > > > > > > > (kernel is based on recent mmotm + > > > > > > > > > > > > > > > > unconditional mkdirty for bug fix) > > > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since > > > > > > > > > > > > > > > > the page was not page_mapped > > > > > > > > > > > > > > > > at that time but second check of page_mapped > > > > > > > > > > > > > > > > right before try_to_unmap seems > > > > > > > > > > > > > > > > to be true. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 > > > > > > > > > > > > > > > > extents:1 across:4191228k FS > > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 > > > > > > > > > > > > > > > > extents:1 across:4191228k FS > > > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 > > > > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff > > > > > > > > > > > > > > > > flags: > > > > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > > > > > > > > > > > > > > page dumped because: > > > > > > > > > > > > > > > > VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) > > > > > > > > > > > > > > > > && !anon_vma) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That's interesting, that's one I added in my page > > > > > > > > > > > > > > > migration series. > > > > > > > > > > > > > > > Let me think on it, but it could well relate to > > > > > > > > > > > > > > > the one you got before. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I will roll back to > > > > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > > > > > > > > > > > > > > instead of next-20151021 to remove noise from your > > > > > > > > > > > > > > migration cleanup > > > > > > > > > > > > > > series and will test it again. > > > > > > > > > > > > > > If it is fixed, I will test again with your > > > > > > > > > > > > > > migration patchset, then. 
> > > > > > > > > > > > > > > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I > > > > > > > > > > > > > attach for a long time. > > > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration > > > > > > > > > > > > > patch in there. > > > > > > > > > > > > > And I added below debug code with request from Kirill > > > > > > > > > > > > > to all test kernels. > > > > > > > > > > > > > > > > > > > > > > > > It took too long time (and a lot of printk()), but I > > > > > > > > > > > > think I track it down > > > > > > > > > > > > finally. > > > > > > > > > > > > > > > > > > > > > > > > The patch below seems fixes issue for me. It's not yet > > > > > > > > > > > > properly tested, but > > > > > > > > > > > > looks like it works. > > > > > > > > > > > > > > > > > > > > > > > > The problem was my wrong assumption on how migration > > > > > > > > > > > > works: I thought that > > > > > > > > > > > > kernel would wait migration to finish on before > > > > > > > > > > > > deconstruction mapping. > > > > > > > > > > > > > > > > > > > > > > > > But turn out that's not true. > > > > > > > > > > > > > > > > > > > > > > > > As result if zap_pte_range() races with
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote: > On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote: > > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote: > > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote: > > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote: > > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote: > > > > > > Hello Kirill, > > > > > > > > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote: > > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote: > > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov > > > > > > > > wrote: > > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote: > > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. > > > > > > > > > > Shutemov wrote: > > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim > > > > > > > > > > > wrote: > > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hello Hugh, > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh > > > > > > > > > > > > > Dickins wrote: > > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I added the code to check it and queued it again > > > > > > > > > > > > > > > but I had another oops > > > > > > > > > > > > > > > in this time but symptom is related to anon_vma, > > > > > > > > > > > > > > > too. 
> > > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional > > > > > > > > > > > > > > > mkdirty for bug fix) > > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the > > > > > > > > > > > > > > > page was not page_mapped > > > > > > > > > > > > > > > at that time but second check of page_mapped > > > > > > > > > > > > > > > right before try_to_unmap seems > > > > > > > > > > > > > > > to be true. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 > > > > > > > > > > > > > > > extents:1 across:4191228k FS > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 > > > > > > > > > > > > > > > extents:1 across:4191228k FS > > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 > > > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff > > > > > > > > > > > > > > > flags: > > > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > > > > > > > > > > > > > page dumped because: > > > > > > > > > > > > > > > VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) > > > > > > > > > > > > > > > && !anon_vma) > > > > > > > > > > > > > > > > > > > > > > > > > > > > That's interesting, that's one I added in my page > > > > > > > > > > > > > > migration series. > > > > > > > > > > > > > > Let me think on it, but it could well relate to the > > > > > > > > > > > > > > one you got before. > > > > > > > > > > > > > > > > > > > > > > > > > > I will roll back to > > > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > > > > > > > > > > > > > instead of next-20151021 to remove noise from your > > > > > > > > > > > > > migration cleanup > > > > > > > > > > > > > series and will test it again. > > > > > > > > > > > > > If it is fixed, I will test again with your migration > > > > > > > > > > > > > patchset, then. 
> > > > > > > > > > > > > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I > > > > > > > > > > > > attach for a long time. > > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration > > > > > > > > > > > > patch in there. > > > > > > > > > > > > And I added below debug code with request from Kirill > > > > > > > > > > > > to all test kernels. > > > > > > > > > > > > > > > > > > > > > > It took too long time (and a lot of printk()), but I > > > > > > > > > > > think I track it down > > > > > > > > > > > finally. > > > > > > > > > > > > > > > > > > > > > > The patch below seems fixes issue for me. It's not yet > > > > > > > > > > > properly tested, but > > > > > > > > > > > looks like it works. > > > > > > > > > > > > > > > > > > > > > > The problem was my wrong assumption on how migration > > > > > > > > > > > works: I thought that > > > > > > > > > > > kernel would wait migration to finish on before > > > > > > > > > > > deconstruction mapping. > > > > > > > > > > > > > > > > > > > > > > But turn out that's not true. > > > > > > > > > > > > > > > > > > > > > > As result if zap_pte_range() races with > > > > > > > > > > > split_huge_page(), we can end up > > > > > > > > > > > with page which is not mapped anymore but has _count and > > > > > > > > > > > _mapcount > > > > > > > > > > > elevated. The page is on LRU too. So it's still reachable > >
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote: > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote: > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote: > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote: > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote: > > > > > Hello Kirill, > > > > > > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote: > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote: > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov > > > > > > > wrote: > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote: > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov > > > > > > > > > wrote: > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote: > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim > > > > > > > > > > > wrote: > > > > > > > > > > > > Hello Hugh, > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins > > > > > > > > > > > > wrote: > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > I added the code to check it and queued it again > > > > > > > > > > > > > > but I had another oops > > > > > > > > > > > > > > in this time but symptom is related to anon_vma, > > > > > > > > > > > > > > too. > > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional > > > > > > > > > > > > > > mkdirty for bug fix) > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the > > > > > > > > > > > > > > page was not page_mapped > > > > > > > > > > > > > > at that time but second check of page_mapped right > > > > > > > > > > > > > > before try_to_unmap seems > > > > > > > > > > > > > > to be true. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 > > > > > > > > > > > > > > extents:1 across:4191228k FS > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 > > > > > > > > > > > > > > extents:1 across:4191228k FS > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 > > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff > > > > > > > > > > > > > > flags: > > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) > > > > > > > > > > > > > > && !PageKsm(page) && !anon_vma) > > > > > > > > > > > > > > > > > > > > > > > > > > That's interesting, that's one I added in my page > > > > > > > > > > > > > migration series. > > > > > > > > > > > > > Let me think on it, but it could well relate to the > > > > > > > > > > > > > one you got before. > > > > > > > > > > > > > > > > > > > > > > > > I will roll back to > > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > > > > > > > > > > > > instead of next-20151021 to remove noise from your > > > > > > > > > > > > migration cleanup > > > > > > > > > > > > series and will test it again. > > > > > > > > > > > > If it is fixed, I will test again with your migration > > > > > > > > > > > > patchset, then. > > > > > > > > > > > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I > > > > > > > > > > > attach for a long time. > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration > > > > > > > > > > > patch in there. > > > > > > > > > > > And I added below debug code with request from Kirill to > > > > > > > > > > > all test kernels. > > > > > > > > > > > > > > > > > > > > It took too long time (and a lot of printk()), but I think > > > > > > > > > > I track it down > > > > > > > > > > finally. > > > > > > > > > > > > > > > > > > > > The patch below seems fixes issue for me. 
It's not yet > > > > > > > > > > properly tested, but > > > > > > > > > > looks like it works. > > > > > > > > > > > > > > > > > > > > The problem was my wrong assumption on how migration works: > > > > > > > > > > I thought that > > > > > > > > > > kernel would wait migration to finish on before > > > > > > > > > > deconstruction mapping. > > > > > > > > > > > > > > > > > > > > But turn out that's not true. > > > > > > > > > > > > > > > > > > > > As result if zap_pte_range() races with split_huge_page(), > > > > > > > > > > we can end up > > > > > > > > > > with page which is not mapped anymore but has _count and > > > > > > > > > > _mapcount > > > > > > > > > > elevated. The page is on LRU too. So it's still reachable > > > > > > > > > > by vmscan and by > > > > > > > > > > pfn scanners (Sasha showed few similar traces from > > > > > > > > > > compaction too). > > > > > > > > > > It's likely that page->mapping in this case would point to > > > > > > > > > > freed anon_vma. > > > > > > > > > > > > > >
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > Hello Kirill,
> > > >
> > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > Hello Hugh,
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > > to be true.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > >
> > > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > > >
> > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > > series and will test it again.
> > > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > > >
> > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > > >
> > > > > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > > > > finally.
> > > > > > > > >
> > > > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > > > > looks like it works.
> > > > > > > > >
> > > > > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > > > >
> > > > > > > > > But turn out that's not true.
> > > > > > > > >
> > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > > > > >
> > > > > > > > > BOOM!
> > > > > > > > >
> > > > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > > > > we get pin back and put page on rmap.
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > Hello Kirill,
> > > > >
> > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > > > >
> > > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > > > >
> > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > > > >
> > > > > > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > > > > > finally.
> > > > > > > > > >
> > > > > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > > > > > looks like it works.
> > > > > > > > > >
> > > > > > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > > > > >
> > > > > > > > > > But turn out that's not true.
> > > > > > > > > >
> > > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
Re: kernel oops on mmotm-2015-10-15-15-20
On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > Hello Kirill,
> > >
> > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > Hello Hugh,
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > to be true.
> > > > > > > > > > > >
> > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > >
> > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > >
> > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > series and will test it again.
> > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > >
> > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > >
> > > > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > > > finally.
> > > > > > > >
> > > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > > > looks like it works.
> > > > > > > >
> > > > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > > >
> > > > > > > > But turn out that's not true.
> > > > > > > >
> > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > > > >
> > > > > > > > BOOM!
> > > > > > > >
> > > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > > > > will be removed under us we don't corrupt page's state.
> > > > > > > >
> > > > > > > > Please, test.
> > > > > > > >
> > > > > > >
> > > > > > > kernel: On mmotm-2015-10-15-15-20
Re: kernel oops on mmotm-2015-10-15-15-20
On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > Hello Kirill,
> >
> > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > Hello Hugh,
> > > > > > > > >
> > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > >
> > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > to be true.
> > > > > > > > > > >
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > >
> > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > >
> > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > series and will test it again.
> > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > >
> > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > >
> > > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > > finally.
> > > > > > >
> > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > > looks like it works.
> > > > > > >
> > > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > >
> > > > > > > But turn out that's not true.
> > > > > > >
> > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > > >
> > > > > > > BOOM!
> > > > > > >
> > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > > > will be removed under us we don't corrupt page's state.
> > > > > > >
> > > > > > > Please, test.
> > > > > > >
> > > > > >
> > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > >
> > > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> > > > > > flags:
Re: kernel oops on mmotm-2015-10-15-15-20
On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> Hello Kirill,
>
> On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > Hello Hugh,
> > > > > > > >
> > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > >
> > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > to be true.
> > > > > > > > > >
> > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > >
> > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > >
> > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > series and will test it again.
> > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > >
> > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > >
> > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > finally.
> > > > > >
> > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > looks like it works.
> > > > > >
> > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > >
> > > > > > But turn out that's not true.
> > > > > >
> > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > >
> > > > > > BOOM!
> > > > > >
> > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > > will be removed under us we don't corrupt page's state.
> > > > > >
> > > > > > Please, test.
> > > > > >
> > > > >
> > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > >
> > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> > > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > page->mem_cgroup:88007f613c00
> > > >
> > > > Ignore my previous answer. Still sleeping.
> > > >
> > > > The right way to fix I think is something like:
> > > >
> > > > diff --git a/mm/rmap.c
Re: kernel oops on mmotm-2015-10-15-15-20
Hello Kirill, On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote: > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote: > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote: > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote: > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote: > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote: > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote: > > > > > > > Hello Hugh, > > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > > > > > > > > > > > > > I added the code to check it and queued it again but I had > > > > > > > > > another oops > > > > > > > > > in this time but symptom is related to anon_vma, too. > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for > > > > > > > > > bug fix) > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was > > > > > > > > > not page_mapped > > > > > > > > > at that time but second check of page_mapped right before > > > > > > > > > try_to_unmap seems > > > > > > > > > to be true. > > > > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 > > > > > > > > > across:4191228k FS > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 > > > > > > > > > across:4191228k FS > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 > > > > > > > > > mapping:88007f1b5f51 index:0x60aff > > > > > > > > > flags: > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && > > > > > > > > > !PageKsm(page) && !anon_vma) > > > > > > > > > > > > > > > > That's interesting, that's one I added in my page migration > > > > > > > > series. 
> > > > > > > > Let me think on it, but it could well relate to the one you got > > > > > > > > before. > > > > > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > > > > > > > instead of next-20151021 to remove noise from your migration > > > > > > > cleanup > > > > > > > series and will test it again. > > > > > > > If it is fixed, I will test again with your migration patchset, > > > > > > > then. > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a > > > > > > long time. > > > > > > Therefore, there is no patchset from Hugh's migration patch in > > > > > > there. > > > > > > And I added below debug code with request from Kirill to all test > > > > > > kernels. > > > > > > > > > > It took too long time (and a lot of printk()), but I think I track it > > > > > down > > > > > finally. > > > > > > > > > > The patch below seems fixes issue for me. It's not yet properly > > > > > tested, but > > > > > looks like it works. > > > > > > > > > > The problem was my wrong assumption on how migration works: I thought > > > > > that > > > > > kernel would wait migration to finish on before deconstruction > > > > > mapping. > > > > > > > > > > But turn out that's not true. > > > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end > > > > > up > > > > > with page which is not mapped anymore but has _count and _mapcount > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan > > > > > and by > > > > > pfn scanners (Sasha showed few similar traces from compaction too). > > > > > It's likely that page->mapping in this case would point to freed > > > > > anon_vma. > > > > > > > > > > BOOM! > > > > > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration > > > > > entries logic: on setup we remove page from rmap and drop pin, on > > > > > removing > > > > > we get pin back and put page on rmap. 
This way even if migration entry > > > > > will be removed under us we don't corrupt page's state. > > > > > > > > > > Please, test. > > > > > > > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, > > > > I tested > > > > one I sent to you(ie, oops.c + memcg_test.sh) > > > > > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 > > > > index:0x61800 compound_mapcount: 0 > > > > flags: 0x40044009(locked|uptodate|head|swapbacked) > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page)) > > > > page->mem_cgroup:88007f613c00 > > > > > > Ignore my previous answer. Still sleeping. > > > > > > The right way to fix I think is something like: > > > > > > diff --git a/mm/rmap.c b/mm/rmap.c > > > index 35643176bc15..f2d46792a554 100644 > > > --- a/mm/rmap.c > > > +++ b/mm/rmap.c > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page, > > > bool compound = flags & RMAP_COMPOUND; > > > bool first; > > > > > > - if (PageTransCompound(page)) { > > >
Re: kernel oops on mmotm-2015-10-15-15-20
On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote: > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote: > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote: > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote: > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote: > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote: > > > > > > Hello Hugh, > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > > > > > > > > > > > I added the code to check it and queued it again but I had > > > > > > > > another oops > > > > > > > > in this time but symptom is related to anon_vma, too. > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for > > > > > > > > bug fix) > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not > > > > > > > > page_mapped > > > > > > > > at that time but second check of page_mapped right before > > > > > > > > try_to_unmap seems > > > > > > > > to be true. > > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 > > > > > > > > across:4191228k FS > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 > > > > > > > > across:4191228k FS > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 > > > > > > > > mapping:88007f1b5f51 index:0x60aff > > > > > > > > flags: > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && > > > > > > > > !PageKsm(page) && !anon_vma) > > > > > > > > > > > > > > That's interesting, that's one I added in my page migration > > > > > > > series. > > > > > > > Let me think on it, but it could well relate to the one you got > > > > > > > before. 
> > > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > > > > > > instead of next-20151021 to remove noise from your migration cleanup > > > > > > series and will test it again. > > > > > > If it is fixed, I will test again with your migration patchset, > > > > > > then. > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long > > > > > time. > > > > > Therefore, there is no patchset from Hugh's migration patch in there. > > > > > And I added below debug code with request from Kirill to all test > > > > > kernels. > > > > > > > > It took too long time (and a lot of printk()), but I think I track it > > > > down > > > > finally. > > > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, > > > > but > > > > looks like it works. > > > > > > > > The problem was my wrong assumption on how migration works: I thought > > > > that > > > > kernel would wait migration to finish on before deconstruction mapping. > > > > > > > > But turn out that's not true. > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up > > > > with page which is not mapped anymore but has _count and _mapcount > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and > > > > by > > > > pfn scanners (Sasha showed few similar traces from compaction too). > > > > It's likely that page->mapping in this case would point to freed > > > > anon_vma. > > > > > > > > BOOM! > > > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration > > > > entries logic: on setup we remove page from rmap and drop pin, on > > > > removing > > > > we get pin back and put page on rmap. This way even if migration entry > > > > will be removed under us we don't corrupt page's state. > > > > > > > > Please, test. 
> > > > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I > > > tested > > > one I sent to you(ie, oops.c + memcg_test.sh) > > > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 > > > index:0x61800 compound_mapcount: 0 > > > flags: 0x40044009(locked|uptodate|head|swapbacked) > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page)) > > > page->mem_cgroup:88007f613c00 > > > > Ignore my previous answer. Still sleeping. > > > > The right way to fix I think is something like: > > > > diff --git a/mm/rmap.c b/mm/rmap.c > > index 35643176bc15..f2d46792a554 100644 > > --- a/mm/rmap.c > > +++ b/mm/rmap.c > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page, > > bool compound = flags & RMAP_COMPOUND; > > bool first; > > > > - if (PageTransCompound(page)) { > > + if (PageTransCompound(page) && compound) { > > + atomic_t *mapcount; > > VM_BUG_ON_PAGE(!PageLocked(page), page); > > - if (compound) { > > - atomic_t *mapcount; > > - > > - VM_BUG_ON_PAGE(!PageTransHuge(page), page); > > - mapcount =
Re: kernel oops on mmotm-2015-10-15-15-20
On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote: > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote: > > Hello Kirill, > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote: > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote: > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote: > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote: > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote: > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote: > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote: > > > > > > > > > Hello Hugh, > > > > > > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > > > > > > > > > > > > > > > > > I added the code to check it and queued it again but I > > > > > > > > > > > had another oops > > > > > > > > > > > in this time but symptom is related to anon_vma, too. > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty > > > > > > > > > > > for bug fix) > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page > > > > > > > > > > > was not page_mapped > > > > > > > > > > > at that time but second check of page_mapped right before > > > > > > > > > > > try_to_unmap seems > > > > > > > > > > > to be true. > > > > > > > > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 > > > > > > > > > > > across:4191228k FS > > > > > > > > > > > Adding 4191228k swap on /dev/vda5. 
Priority:-1 extents:1 > > > > > > > > > > > across:4191228k FS > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff > > > > > > > > > > > flags: > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && > > > > > > > > > > > !PageKsm(page) && !anon_vma) > > > > > > > > > > > > > > > > > > > > That's interesting, that's one I added in my page migration > > > > > > > > > > series. > > > > > > > > > > Let me think on it, but it could well relate to the one you > > > > > > > > > > got before. > > > > > > > > > > > > > > > > > > I will roll back to > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > > > > > > > > > instead of next-20151021 to remove noise from your migration > > > > > > > > > cleanup > > > > > > > > > series and will test it again. > > > > > > > > > If it is fixed, I will test again with your migration > > > > > > > > > patchset, then. > > > > > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for > > > > > > > > a long time. > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in > > > > > > > > there. > > > > > > > > And I added below debug code with request from Kirill to all > > > > > > > > test kernels. > > > > > > > > > > > > > > It took too long time (and a lot of printk()), but I think I > > > > > > > track it down > > > > > > > finally. > > > > > > > > > > > > > > The patch below seems fixes issue for me. It's not yet properly > > > > > > > tested, but > > > > > > > looks like it works. > > > > > > > > > > > > > > The problem was my wrong assumption on how migration works: I > > > > > > > thought that > > > > > > > kernel would wait migration to finish on before deconstruction > > > > > > > mapping. > > > > > > > > > > > > > > But turn out that's not true. 
> > > > > > > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can > > > > > > > end up > > > > > > > with page which is not mapped anymore but has _count and _mapcount > > > > > > > elevated. The page is on LRU too. So it's still reachable by > > > > > > > vmscan and by > > > > > > > pfn scanners (Sasha showed few similar traces from compaction > > > > > > > too). > > > > > > > It's likely that page->mapping in this case would point to freed > > > > > > > anon_vma. > > > > > > > > > > > > > > BOOM! > > > > > > > > > > > > > > The patch modify freeze/unfreeze_page() code to match normal > > > > > > > migration > > > > > > > entries logic: on setup we remove page from rmap and drop pin, on > > > > > > > removing > > > > > > > we get pin back and put page on rmap. This way even if migration > > > > > > > entry > > > > > > > will be removed under us we don't corrupt page's state. > > > > > > > > > > > > > > Please, test. > > > > > > > > > > > > > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new > > > > > > patch, I tested > > > > > > one I sent to you(ie, oops.c + memcg_test.sh) > > > > > > > > > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 > > > > > > index:0x61800 compound_mapcount: 0 > > > > > > flags:
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > Hello Hugh,
> > > > >
> > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > >
> > > > > > > I added the code to check it and queued it again but I had
> > > > > > > another oops
> > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug
> > > > > > > fix)
> > > > > > > It seems page_get_anon_vma returns NULL since the page was not
> > > > > > > page_mapped
> > > > > > > at that time but second check of page_mapped right before
> > > > > > > try_to_unmap seems
> > > > > > > to be true.
> > > > > > >
> > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > > > across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > > > across:4191228k FS
> > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51
> > > > > > > index:0x60aff
> > > > > > > flags:
> > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) &&
> > > > > > > !PageKsm(page) && !anon_vma)
> > > > > >
> > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > Let me think on it, but it could well relate to the one you got
> > > > > > before.
> > > > >
> > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > series and will test it again.
> > > > > If it is fixed, I will test again with your migration patchset, then.
> > > >
> > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long
> > > > time.
> > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > And I added below debug code with request from Kirill to all test
> > > > kernels.
> > >
> > > It took too long time (and a lot of printk()), but I think I track it down
> > > finally.
> > >
> > > The patch below seems fixes issue for me. It's not yet properly tested,
> > > but
> > > looks like it works.
> > >
> > > The problem was my wrong assumption on how migration works: I thought that
> > > kernel would wait migration to finish on before deconstruction mapping.
> > >
> > > But turn out that's not true.
> > >
> > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > with page which is not mapped anymore but has _count and _mapcount
> > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > It's likely that page->mapping in this case would point to freed anon_vma.
> > >
> > > BOOM!
> > >
> > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > we get pin back and put page on rmap. This way even if migration entry
> > > will be removed under us we don't corrupt page's state.
> > >
> > > Please, test.
> > >
> >
> > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I
> > tested
> > one I sent to you(ie, oops.c + memcg_test.sh)
> >
> > page:ea00016a count:3 mapcount:0 mapping:88007f49d001
> > index:0x61800 compound_mapcount: 0
> > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > page->mem_cgroup:88007f613c00
>
> Ignore my previous answer. Still sleeping.
>
> The right way to fix I think is something like:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 35643176bc15..f2d46792a554 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
>  	bool compound = flags & RMAP_COMPOUND;
>  	bool first;
>
> -	if (PageTransCompound(page)) {
> +	if (PageTransCompound(page) && compound) {
> +		atomic_t *mapcount;
>  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> -		if (compound) {
> -			atomic_t *mapcount;
> -
> -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> -			mapcount = compound_mapcount_ptr(page);
> -			first = atomic_inc_and_test(mapcount);
> -		} else {
> -			/* Anon THP always mapped first with PMD */
> -			first = 0;
> -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> -			atomic_inc(&page->_mapcount);
> -
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > >
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > >
> > > > > > I added the code to check it and queued it again but I had another
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > >
> > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51
> > > > > > index:0x60aff
> > > > > > flags:
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) &&
> > > > > > !PageKsm(page) && !anon_vma)
> > > > >
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got
> > > > > before.
> > > >
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > >
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> >
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> >
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> >
> > But turn out that's not true.
> >
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> >
> > BOOM!
> >
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> >
> > Please, test.
> >
>
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
>
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> page->mem_cgroup:88007f613c00

Ignore my previous answer. Still sleeping.

The right way to fix I think is something like:

diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..f2d46792a554 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (PageTransCompound(page) && compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first
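
[Editor's sketch, not part of the original mail.] The branch structure proposed in the diff can be modelled in userspace C to see what changes for a THP subpage that is mapped by PTE. This is only a toy under stated assumptions: struct toy_page and every toy_* function are hypothetical names, counters start at -1 to mean "unmapped", and the PTE branch is a guess, because the quoted diff is cut off at "first".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace toy, not kernel code. Counters start at -1, meaning "unmapped". */
struct toy_page {
	atomic_int mapcount;          /* stands in for page->_mapcount */
	atomic_int compound_mapcount; /* stands in for *compound_mapcount_ptr(page) */
	bool trans_compound;          /* stands in for PageTransCompound(page) */
};

void toy_page_init(struct toy_page *page, bool trans_compound)
{
	atomic_init(&page->mapcount, -1);
	atomic_init(&page->compound_mapcount, -1);
	page->trans_compound = trans_compound;
}

/* Same contract as atomic_inc_and_test(): true if the incremented value is 0. */
bool toy_inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Returns "first": true when this call created the first mapping of its kind. */
bool toy_add_anon_rmap(struct toy_page *page, bool compound)
{
	/* Patched shape: only a PMD (compound) mapping uses the compound counter. */
	if (page->trans_compound && compound)
		return toy_inc_and_test(&page->compound_mapcount);

	/* Stands in for VM_BUG_ON_PAGE(compound, page) in the else branch. */
	assert(!compound);

	/* ASSUMPTION: PTE mappings, THP subpage or not, share the ordinary path. */
	return toy_inc_and_test(&page->mapcount);
}

/* Number of PTE mappings, with the -1 bias removed. */
int toy_page_mapcount(struct toy_page *page)
{
	return atomic_load(&page->mapcount) + 1;
}
```

In this model a subpage whose mapcount has dropped to zero can be added again and simply reports first == true, where the old code path asserted that the subpage was already mapped.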
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > >
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > >
> > > > > > I added the code to check it and queued it again but I had another
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > >
> > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51
> > > > > > index:0x60aff
> > > > > > flags:
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) &&
> > > > > > !PageKsm(page) && !anon_vma)
> > > > >
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got
> > > > > before.
> > > >
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > >
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> >
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> >
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> >
> > But turn out that's not true.
> >
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> >
> > BOOM!
> >
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> >
> > Please, test.
> >
>
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
>
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))

The VM_BUG_ON_PAGE() is bogus after the patch. Just drop it.

> page->mem_cgroup:88007f613c00
> [ cut here ]
> kernel BUG at mm/rmap.c:1156!
> invalid opcode: [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+
> #1573
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
> RIP: 0010:[] []
> do_page_add_anon_rmap+0x323/0x360
> RSP: :8805f758 EFLAGS: 00010292
> RAX: 0021 RBX: ea00016a RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 8805f780 R08: R09: 880b8be0
> R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
> R13: 6180 R14: R15: 88007e85ddc0
> FS: 7f5cd5fea740() GS:8800bfae() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: 64c03000 CR3: 7f017000 CR4: 06a0
> Stack:
> 88007f351000 88007f352000 ea00016a 6180
> 88007e85ddc0 8805f790 81128278 8805f800
> 81146dbb 000619ff
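
[Editor's sketch, not part of the original mail.] One way to read "bogus after the patch", offered as an interpretation and not something the thread spells out: the freeze step described in the quoted text removes the page from rmap and drops the pin, so by the time the unfreeze step puts it back, a mapcount of zero is the expected state, and an assertion that the page is already mapped has to fire. The toy below uses plain logical counts (0 means unmapped); struct toy_frozen_page and the toy_* helpers are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Userspace toy, not kernel code: logical counters only. */
struct toy_frozen_page {
	int refcount; /* pins held on the page */
	int mapcount; /* how many times the page is mapped; 0 = unmapped */
};

/* "on setup we remove page from rmap and drop pin" */
void toy_freeze(struct toy_frozen_page *p)
{
	p->mapcount--;
	p->refcount--;
}

/* Models the dropped check VM_BUG_ON_PAGE(!page_mapcount(page), page). */
bool toy_old_assert_would_fire(const struct toy_frozen_page *p)
{
	return p->mapcount == 0;
}

/* "on removing we get pin back and put page on rmap" */
void toy_unfreeze(struct toy_frozen_page *p)
{
	p->refcount++;
	p->mapcount++;
}
```

For a page mapped exactly once, toy_freeze() leaves mapcount at 0, which is the value shown in the page dump above, and toy_unfreeze() restores both counters.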
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > Hello Hugh,
> > >
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > >
> > > > > I added the code to check it and queued it again but I had another
> > > > > oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not
> > > > > page_mapped
> > > > > at that time but second check of page_mapped right before
> > > > > try_to_unmap seems
> > > > > to be true.
> > > > >
> > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1
> > > > > across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51
> > > > > index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page)
> > > > > && !anon_vma)
> > > >
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> > >
> > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > instead of next-20151021 to remove noise from your migration cleanup
> > > series and will test it again.
> > > If it is fixed, I will test again with your migration patchset, then.
> >
> > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > Therefore, there is no patchset from Hugh's migration patch in there.
> > And I added below debug code with request from Kirill to all test kernels.
>
> It took too long time (and a lot of printk()), but I think I track it down
> finally.
>
> The patch below seems fixes issue for me. It's not yet properly tested, but
> looks like it works.
>
> The problem was my wrong assumption on how migration works: I thought that
> kernel would wait migration to finish on before deconstruction mapping.
>
> But turn out that's not true.
>
> As result if zap_pte_range() races with split_huge_page(), we can end up
> with page which is not mapped anymore but has _count and _mapcount
> elevated. The page is on LRU too. So it's still reachable by vmscan and by
> pfn scanners (Sasha showed few similar traces from compaction too).
> It's likely that page->mapping in this case would point to freed anon_vma.
>
> BOOM!
>
> The patch modify freeze/unfreeze_page() code to match normal migration
> entries logic: on setup we remove page from rmap and drop pin, on removing
> we get pin back and put page on rmap. This way even if migration entry
> will be removed under us we don't corrupt page's state.
>
> Please, test.
>

kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
one I sent to you(ie, oops.c + memcg_test.sh)

page:ea00016a count:3 mapcount:0 mapping:88007f49d001
index:0x61800 compound_mapcount: 0
flags: 0x40044009(locked|uptodate|head|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
page->mem_cgroup:88007f613c00
[ cut here ]
kernel BUG at mm/rmap.c:1156!
invalid opcode: [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ #1573
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
RIP: 0010:[] [] do_page_add_anon_rmap+0x323/0x360
RSP: :8805f758 EFLAGS: 00010292
RAX: 0021 RBX: ea00016a RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8805f780 R08: R09: 880b8be0
R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
R13: 6180 R14: R15: 88007e85ddc0
FS: 7f5cd5fea740() GS:8800bfae() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 64c03000 CR3: 7f017000 CR4: 06a0
Stack:
 88007f351000 88007f352000 ea00016a 6180
 88007e85ddc0 8805f790 81128278 8805f800
 81146dbb 000619ff 00061800 1600
Call Trace:
 [] page_add_anon_rmap+0x18/0x20
 [] unfreeze_page+0x24b/0x330
 [] split_huge_page_to_list+0x3df/0x920
 [] ? scan_swap_map+0x37f/0x550
 [] add_to_swap+0xb6/0x100
 [] shrink_page_list+0x3b7/0xdc0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x58f/0x730
 [] shrink_zone+0xd4/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 []
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > >
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > >
> > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > to be true.
> > > > > >
> > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > >
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got before.
> > > >
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > >
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> >
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> >
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> >
> > But turn out that's not true.
> >
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> >
> > BOOM!
> >
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> >
> > Please, test.
> >
>
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> one I sent to you(ie, oops.c + memcg_test.sh)
>
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> page->mem_cgroup:88007f613c00

Ignore my previous answer. Still sleeping.

The right way to fix it, I think, is something like:

diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..f2d46792a554 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (PageTransCompound(page) && compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first
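For readers not used to the "first = atomic_inc_and_test(mapcount)" idiom in the patch above, here is a rough userspace sketch of how I understand it: the map counter is stored biased by -1, so an increment that lands on zero means "this was the first mapping". This is only a toy model using C11 atomics; struct toy_mapcount and all toy_* helpers are hypothetical names made up for the illustration, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page's map counter, stored biased by -1. */
struct toy_mapcount {
	atomic_int val;		/* -1 means "not mapped anywhere" */
};

void toy_mapcount_init(struct toy_mapcount *m)
{
	atomic_init(&m->val, -1);
}

/* Number of mappings as a human would count them (0, 1, 2, ...). */
int toy_mapcount_read(struct toy_mapcount *m)
{
	return atomic_load(&m->val) + 1;
}

/* Add one mapping; true if the stored value became 0, i.e. first mapping. */
bool toy_inc_and_test(struct toy_mapcount *m)
{
	return atomic_fetch_add(&m->val, 1) + 1 == 0;
}

/* Drop one mapping; true if the stored value went negative, i.e. last one. */
bool toy_drop_is_last(struct toy_mapcount *m)
{
	assert(toy_mapcount_read(m) > 0);	/* dropping an unmapped page is a bug */
	return atomic_fetch_sub(&m->val, 1) - 1 < 0;
}
```

With this model, the VM_BUG_ON_PAGE(!page_mapcount(page)) that Minchan hit corresponds to toy_mapcount_read() returning 0 at a point where the code assumed the page was already mapped.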
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > Hello Hugh,
> >
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > >
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > to be true.
> > > >
> > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > >
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.
> >
> > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > instead of next-20151021 to remove noise from your migration cleanup
> > series and will test it again.
> > If it is fixed, I will test again with your migration patchset, then.
>
> I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> Therefore, there is no patchset from Hugh's migration patch in there.
> And I added below debug code with request from Kirill to all test kernels.

It took too long (and a lot of printk()), but I think I have finally
tracked it down.

The patch below seems to fix the issue for me. It's not yet properly
tested, but it looks like it works.

The problem was my wrong assumption about how migration works: I thought
that the kernel would wait for migration to finish before tearing down
the mapping.

But it turns out that's not true.

As a result, if zap_pte_range() races with split_huge_page(), we can end
up with a page which is not mapped anymore but has _count and _mapcount
elevated. The page is on the LRU too, so it's still reachable by vmscan
and by pfn scanners (Sasha showed a few similar traces from compaction
too). It's likely that page->mapping in this case would point to a freed
anon_vma.

BOOM!

The patch modifies the freeze/unfreeze_page() code to match the normal
migration entries logic: on setup we remove the page from rmap and drop
the pin; on removal we get the pin back and put the page on rmap. This
way, even if the migration entry is removed under us, we don't corrupt
the page's state.

Please test.

Not-Yet-Signed-off-by: Kirill A. Shutemov

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e0fe82a0fae..192b50c7526c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
+
+	if (freeze) {
+		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+			page_remove_rmap(page + i, false);
+			put_page(page + i);
+		}
+	}
 }
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
@@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (pte_soft_dirty(entry))
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
+		page_remove_rmap(page, false);
+		put_page(page);
 	}
 	pte_unmap_unlock(pte, ptl);
 }
@@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		return;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
 	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
-		if (!page_mapped(page))
-			continue;
 		if (!is_swap_pte(pte[i]))
 			continue;
 
@@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (migration_entry_to_page(swp_entry) != page)
 			continue;
 
+		get_page(page);
+		page_add_anon_rmap(page, vma, address, false);
+
 		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
 		entry = pte_mkdirty(entry);
 		if (is_write_migration_entry(swp_entry))
-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at
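The invariant described in the message above (a migration entry owns neither a pin nor an rmap reference, so it can be zapped at any time without touching the counters) can be sketched in plain userspace C. This is only a toy model under my own assumptions: struct toy_page, enum toy_pte and the toy_* functions are hypothetical names, and nothing here is kernel code or takes any locking into account.

```c
#include <assert.h>

/* Toy stand-in for struct page: just the two counters from the message. */
struct toy_page {
	int refcount;	/* pins, playing the role of _count */
	int mapcount;	/* number of page-table mappings, 0 = unmapped */
};

/* One page-table slot: empty, mapping the page, or a migration entry. */
enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_MIGRATION };

/* Map the page: the mapping takes a pin and an rmap reference. */
void toy_map(struct toy_page *page, enum toy_pte *pte)
{
	assert(*pte == TOY_PTE_NONE);
	page->refcount++;
	page->mapcount++;
	*pte = TOY_PTE_PRESENT;
}

/* Freeze: swap the mapping for a migration entry and give up pin + rmap now. */
void toy_freeze(struct toy_page *page, enum toy_pte *pte)
{
	if (*pte != TOY_PTE_PRESENT)
		return;
	assert(page->mapcount > 0 && page->refcount > 0);
	page->mapcount--;
	page->refcount--;
	*pte = TOY_PTE_MIGRATION;
}

/* Zap (think of a racing unmap): a migration entry owns nothing to release. */
void toy_zap(struct toy_page *page, enum toy_pte *pte)
{
	if (*pte == TOY_PTE_PRESENT) {
		page->mapcount--;
		page->refcount--;
	}
	*pte = TOY_PTE_NONE;
}

/* Unfreeze: re-take pin + rmap only where the migration entry survived. */
void toy_unfreeze(struct toy_page *page, enum toy_pte *pte)
{
	if (*pte != TOY_PTE_MIGRATION)
		return;
	page->refcount++;
	page->mapcount++;
	*pte = TOY_PTE_PRESENT;
}
```

In this model, a zap that lands between freeze and unfreeze leaves the counters matching the mappings that still exist, which is the property the patch is after.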
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > Hello Hugh,
> > >
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > >
> > > > > I added the code to check it and queued it again but I had another oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > to be true.
> > > > >
> > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > >
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
>
> I think I have introduced a bug there; or rather, made more evident
> a pre-existing bug. But I'm not sure yet: the stacktrace was from
> compaction (called by khugepaged, but that may not be relevant at all),
> and thinking through the races with isolate_migratepages_block() is
> never easy.
>
> What's certain is that I was not giving any thought to
> isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
> I was thinking about "stable" anonymous pages, and how they get
> faulted back in from swapcache while holding page lock.
>
> It looks to me now as if a page might not yet be PageAnon when it's
> first tested in __unmap_and_move(), when going to page_get_anon_vma();
> but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
> where I inserted the VM_BUG_ON_PAGE().
>
> If so, the code would always have been wrong (trying to unmap the
> anonymous page, and later remap its replacement, without a hold on
> the anon_vma needed to guide both lookups); but I'll have made it
> more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
> that's a good step forward :)
>
> There's a reference count check in isolate_migratepages_block()
> before this, which would make it unlikely, but I doubt rules it out.
>
> However... you did hit an anon_vma reference counting problem before
> my migration changes went in, and Kirill had a vague suspicion that
> he might be screwing up anon_vma refcounting in split_huge_page():
> if he confirms that, I'd say it's more likely to be the cause of
> your crash on this occasion.
>
> Not hard to fix mine (though we'll probably have to lose the
> VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
> trivial fix), I just want to give the races more thought.

And after giving it more thought, I realize that I was wrong yesterday,
and the new VM_BUG_ON_PAGE() should be good as is: my guess is that it
is simply alerting you to the same anon_vma reference counting issue as
you had already hit without that patch.

What I was forgetting yesterday, is that isolate_migratepages_block()
can only take the page for migration when it's PageLRU(): and
do_anonymous_page() only adds a page to the LRU after it has been
marked as mapped and PageAnon.

So the window that worried me yesterday, that __unmap_and_move() might
see !PageAnon, then reach try_to_unmap() with it page_mapped and
PageAnon: that window does not exist, with or without my changes.

Hugh

>
> However it turns out, I think you have a very useful test there.
>
> (And I've observed no PageDirty problems with your recent patchsets,
> though I don't use MADV_FREE at all myself.)
>
> Hugh
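The ordering argument in the message above (a page only becomes isolatable once it is on the LRU, and it is put on the LRU only after it is marked anon and mapped) can be written down as a tiny sequential model. This is a sketch under my own assumptions: struct toy_page, the three-step fault and the toy_* helpers are hypothetical, and the model deliberately ignores real concurrency and memory ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page state: only the three properties the argument talks about. */
struct toy_page {
	bool anon;	/* plays the role of PageAnon */
	bool mapped;	/* plays the role of page_mapped() */
	bool lru;	/* plays the role of PageLRU */
};

enum { TOY_FAULT_STEPS = 3 };

/* Apply one step of a hypothetical anonymous fault; LRU insertion is last. */
void toy_fault_step(struct toy_page *page, int step)
{
	assert(step >= 0 && step < TOY_FAULT_STEPS);
	switch (step) {
	case 0:
		page->anon = true;
		break;
	case 1:
		page->mapped = true;
		break;
	case 2:
		page->lru = true;
		break;
	}
}

/* The isolating side may only pick up pages that are on the LRU. */
bool toy_can_isolate(const struct toy_page *page)
{
	return page->lru;
}

/* Invariant: whenever the page can be isolated, it is already anon + mapped. */
bool toy_isolation_is_safe(const struct toy_page *page)
{
	return !toy_can_isolate(page) || (page->anon && page->mapped);
}
```

Checking toy_isolation_is_safe() after every step shows there is no intermediate state in which the page is isolatable but not yet anon and mapped, which is the window that was ruled out.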
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> Hello Hugh,
>
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > >
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap seems
> > > to be true.
> > >
> > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> >
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
>
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

I tested mmotm-2015-10-15-15-20 for a long time with the test program
I attach. So there is nothing from Hugh's migration patchset in there.
And, at Kirill's request, I added the debug code below to all test
kernels.

diff --git a/mm/rmap.c b/mm/rmap.c
index ddfb9be72366..1c23b70b1f57 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -513,6 +513,13 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
 	root_anon_vma = READ_ONCE(anon_vma->root);
+
+	if (root_anon_vma == NULL) {
+		printk("anon_vma %p refcount %d\n", anon_vma,
+				atomic_read(&anon_vma->refcount));
+		VM_BUG_ON_PAGE(1, page);
+	}
+
 	if (down_read_trylock(&root_anon_vma->rwsem)) {
 		/*
 		 * If the page is still mapped, then this anon_vma is still

1. mmotm-2015-10-15-15-20 + kirill's pte_mkdirty

1st trial:
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f1ed780 idx:1 val:488
BUG: Bad rss-counter state mm:88007f1ed780 idx:2 val:24

2nd trial:
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:8800a5cca680 idx:1 val:512
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS

2. mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP.

1st trial:
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f4c2d80 idx:1 val:511
BUG: Bad rss-counter state mm:88007f4c2d80 idx:2 val:1

2nd trial:
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
anon_vma 88089aa0 refcount 0
page:ea0001a2ea40 count:3 mapcount:1 mapping:88089aa1 index:0x647a9

I tested it in a KVM guest with 12 cores and 3G of memory.
For mmotm-2015-10-15-15-20-no-madvise_free, I tweaked the test program
to do madvise_dontneed instead of madvise_free, via the patch below.

For the testing,

	gcc -o oops oops.c
	./memcg_test.sh

I will be off from now on, so please understand if my response is late,
but I hope my test program will reproduce it on your machine.

diff --git a/oops.c b/oops.c
index e50330a..c8298f8 100644
--- a/oops.c
+++ b/oops.c
@@ -8,7 +8,7 @@
 #include
 #include
 
-#define MADV_FREE 5
+#define MADV_FREE 4
 
 int pid;

memcg_move_task.sh
Description: Bourne shell script

memcg_test.sh
Description: Bourne shell script

#include
#include
#include
#include
#include
#include
#include
#include
#include

#define MADV_FREE 4

int pid;

void sig_handler(int signo)
{
	printf("pid %d sig received %d\n", pid, signo);
	exit(1);
}

void free_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;

	for (i = 0; i < buf_count; i++) {
		if (bufs[i] != NULL) {
			munmap(bufs[i], buf_size);
			bufs[i] = NULL;
		}
	}
}

void alloc_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;
	time_t rawtime;
	struct tm * timeinfo;
	void *addr = (void*)0x6000;

	for (i = 0; i < buf_count; i++) {
		void *ptr = NULL;

		ptr = mmap(addr, buf_size, PROT_READ|PROT_WRITE,
				MAP_ANON|MAP_PRIVATE|MAP_FIXED, 0, 0);
		if (ptr == MAP_FAILED) {
			char bufs[64];
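Since the attached oops.c is cut off in the archive, here is a much smaller sketch of the kind of operation the MADV_DONTNEED variant of the test exercises. It is not the original test program: toy_dontneed_cycle() is a hypothetical helper, it assumes Linux, and it only does a single map, dirty, madvise, read-back cycle. As far as I know, the value 4 that the patch above assigns to the test's MADV_FREE macro is the value of MADV_DONTNEED on Linux, which is why that one-line change switches the behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for MAP_ANONYMOUS and madvise() under strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map a private anonymous buffer, dirty every byte, discard the contents
 * with MADV_DONTNEED, then check that the range reads back as zeroes.
 * Returns 0 on success and -1 on any failure.
 */
int toy_dontneed_cycle(size_t len)
{
	unsigned char *buf;
	size_t i;
	int ret = 0;

	assert(len > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* fault the pages in and dirty them */

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* On Linux, private anonymous pages are zero-filled on the next touch. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0) {
			ret = -1;
			break;
		}
	}

	munmap(buf, len);
	return ret;
}
```

A stress test in the spirit of the original would presumably call something like this in a loop from several processes while memcg moves and swap activity happen in the background; that part is not reproduced here.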
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > Hello Hugh,
> >
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > >
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > to be true.
> > > >
> > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > >
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.

I think I have introduced a bug there; or rather, made more evident
a pre-existing bug. But I'm not sure yet: the stacktrace was from
compaction (called by khugepaged, but that may not be relevant at all),
and thinking through the races with isolate_migratepages_block() is
never easy.

What's certain is that I was not giving any thought to
isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
I was thinking about "stable" anonymous pages, and how they get
faulted back in from swapcache while holding page lock.

It looks to me now as if a page might not yet be PageAnon when it's
first tested in __unmap_and_move(), when going to page_get_anon_vma();
but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
where I inserted the VM_BUG_ON_PAGE().

If so, the code would always have been wrong (trying to unmap the
anonymous page, and later remap its replacement, without a hold on
the anon_vma needed to guide both lookups); but I'll have made it
more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
that's a good step forward :)

There's a reference count check in isolate_migratepages_block()
before this, which would make it unlikely, but I doubt rules it out.

However... you did hit an anon_vma reference counting problem before
my migration changes went in, and Kirill had a vague suspicion that
he might be screwing up anon_vma refcounting in split_huge_page():
if he confirms that, I'd say it's more likely to be the cause of
your crash on this occasion.

Not hard to fix mine (though we'll probably have to lose the
VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
trivial fix), I just want to give the races more thought.

However it turns out, I think you have a very useful test there.

(And I've observed no PageDirty problems with your recent patchsets,
though I don't use MADV_FREE at all myself.)

Hugh
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, 22 Oct 2015, Minchan Kim wrote: > Hello Hugh, > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > I added the code to check it and queued it again but I had another oops > > > in this time but symptom is related to anon_vma, too. > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix) > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped > > > at that time but second check of page_mapped right before try_to_unmap > > > seems > > > to be true. > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k > > > FS > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k > > > FS > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 > > > index:0x60aff > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && > > > !anon_vma) > > > > That's interesting, that's one I added in my page migration series. > > Let me think on it, but it could well relate to the one you got before. > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > instead of next-20151021 to remove noise from your migration cleanup > series and will test it again. > If it is fixed, I will test again with your migration patchset, then. Not a good use of your time, I think. It's sure to be fixed in the rc5-mmotm because that VM_BUG_ON_PAGE(blah) just does not exist in that tree: I added it to verify my reasoning in changing the comments about page_get_anon_vma() and PageSwapCache in mm/migrate.c. > > > > > > page->mem_cgroup:88007f3dcc00 > > > [ cut here ] > > > kernel BUG at mm/migrate.c:889! 
> > > invalid opcode: [#1] SMP > > > Dumping ftrace buffer: > > >(ftrace buffer empty) > > > Modules linked in: > > > CPU: 11 PID: 59 Comm: khugepaged Not tainted > > > 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557 > > > > Hmm, it might be me to blame, or it might be Kirill, don't know yet. > > It might be me, either. > > > > > Oh, hold on, I think Andrew has just posted a new mmotm, and it includes > > an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch: > > I haven't digested yet, but it might turn out to be relevant. Sorry, I think that was an irrelevant suggestion: today's new rc6-mmotm is identical to yesterday's there, and the patch that was removed appears to be identical to the one added. Hugh
Re: kernel oops on mmotm-2015-10-15-15-20
Hello Hugh, On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > I added the code to check it and queued it again but I had another oops > > in this time but symptom is related to anon_vma, too. > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix) > > It seems page_get_anon_vma returns NULL since the page was not page_mapped > > at that time but second check of page_mapped right before try_to_unmap seems > > to be true. > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 > > index:0x60aff > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && > > !anon_vma) > > That's interesting, that's one I added in my page migration series. > Let me think on it, but it could well relate to the one you got before. I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 instead of next-20151021 to remove noise from your migration cleanup series and will test it again. If it is fixed, I will test again with your migration patchset, then. > > > page->mem_cgroup:88007f3dcc00 > > [ cut here ] > > kernel BUG at mm/migrate.c:889! > > invalid opcode: [#1] SMP > > Dumping ftrace buffer: > >(ftrace buffer empty) > > Modules linked in: > > CPU: 11 PID: 59 Comm: khugepaged Not tainted > > 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557 > > Hmm, it might be me to blame, or it might be Kirill, don't know yet. It might be me, either. > > Oh, hold on, I think Andrew has just posted a new mmotm, and it includes > an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch: > I haven't digested yet, but it might turn out to be relevant. 
> > Hugh
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, 22 Oct 2015, Minchan Kim wrote:
>
> I added the code to check it and queued it again but I had another oops
> in this time but symptom is related to anon_vma, too.
> (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> It seems page_get_anon_vma returns NULL since the page was not page_mapped
> at that time but second check of page_mapped right before try_to_unmap seems
> to be true.
>
> Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51
> index:0x60aff
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) &&
> !anon_vma)

That's interesting, that's one I added in my page migration series.
Let me think on it, but it could well relate to the one you got before.

> page->mem_cgroup:88007f3dcc00
> [ cut here ]
> kernel BUG at mm/migrate.c:889!
> invalid opcode: [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 11 PID: 59 Comm: khugepaged Not tainted
> 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557

Hmm, it might be me to blame, or it might be Kirill, don't know yet.

Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
I haven't digested yet, but it might turn out to be relevant.

Hugh
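For readers following the dump: the `mapping:` value ending in `...51` is what makes PageAnon() true and PageKsm() false for this page, because both tests look only at the low bits of page->mapping. The sketch below decodes those bits in userspace. The flag values are as I recall them from kernels of this era and should be treated as an assumption, the helper names are hypothetical, and the address is used exactly as it appears in the archived dump (which seems to have lost its leading hex digits; only the low bits matter here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed low-bit encoding of page->mapping (values from memory). */
#define PAGE_MAPPING_ANON  UINT64_C(1)
#define PAGE_MAPPING_KSM   UINT64_C(2)
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* Anonymous pages carry the ANON bit in the mapping pointer. */
static bool mapping_is_anon(uint64_t mapping)
{
    return (mapping & PAGE_MAPPING_ANON) != 0;
}

/* KSM pages carry both low bits. */
static bool mapping_is_ksm(uint64_t mapping)
{
    return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_FLAGS;
}

/* Strip the flag bits to recover the anon_vma address. */
static uint64_t mapping_to_anon_vma(uint64_t mapping)
{
    assert(mapping_is_anon(mapping));
    return mapping & ~PAGE_MAPPING_FLAGS;
}
```

Applied to the value in the dump, the low two bits are `01`: anonymous, not KSM, with the anon_vma address one byte below the mapping value. That matches the first two terms of the VM_BUG_ON_PAGE() condition, leaving the NULL anon_vma as the term that tripped it.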
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, Oct 21, 2015 at 02:07:23PM +0300, Kirill A. Shutemov wrote: > On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote: > > I detach this report from my patchset thread because I see below > > problem with removing MADV_FREE related code and I can reproduce > > same oops with MADV_FREE + recent patches(both my SetPageDirty > > and Kirill's pte_mkdirty) within 7 hours. > > Could you share code for your workload? It's part of test suite so I need time to factor it out. I will do/test and send it. > > > I can not be sure it's THP refcount redesign's problem but it was > > one of big change in MM between mmotm-2015-10-15-15-20 and > > mmotm-2015-10-06-16-30 so it could be a culprit. > > > > In page_lock_anon_vma_read, anon_vma_root was NULL. > > I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result. > > Hm. That's tricky.. :-/ > > Could you please dump anon_vma->refcount too? I added the code to check it and queued it again but I had another oops in this time but symptom is related to anon_vma, too. (kernel is based on recent mmotm + unconditional mkdirty for bug fix) It seems page_get_anon_vma returns NULL since the page was not page_mapped at that time but second check of page_mapped right before try_to_unmap seems to be true. Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma) page->mem_cgroup:88007f3dcc00 [ cut here ] kernel BUG at mm/migrate.c:889! 
invalid opcode: [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 8800b9851a40 ti: 8800b985c000 task.ti: 8800b985c000 RIP: 0010:[] [] migrate_pages+0x8e6/0x950 RSP: 0018:8800b985fa00 EFLAGS: 00010286 RAX: 0021 RBX: ea0002dd7fc0 RCX: 81830db8 RDX: 0001 RSI: 0246 RDI: 821df4d8 RBP: 8800b985fa80 R08: R09: 880bb160 R10: 8163e000 R11: 01e0 R12: R13: ea0001cfbf80 R14: ea0001cfbfc0 R15: 8189de80 FS: () GS:8800bfb6() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 5594f9d7e578 CR3: 01808000 CR4: 06a0 Stack: 8800b9851a40 811144b0 81115fb0 ea0001cfbfe0 8800b985fb30 8800b985fb20 8800b985fb20 Call Trace: [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0 [] ? isolate_freepages_block+0x3d0/0x3d0 [] compact_zone+0x2bb/0x720 [] ? retint_kernel+0x10/0x10 [] ? list_del+0xd/0x30 [] compact_zone_order+0x6d/0xa0 [] try_to_compact_pages+0xed/0x200 [] __alloc_pages_direct_compact+0x3b/0xd4 [] __alloc_pages_nodemask+0x3fb/0x920 [] khugepaged+0x158/0x1b90 [] ? hrtick_update+0x51/0x70 [] ? prepare_to_wait_event+0xf0/0xf0 [] ? unfreeze_page+0x320/0x320 [] kthread+0xc9/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x3f/0x70 [] ? kthread_park+0x60/0x60 Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 f4 fa ff ff 48 c7 c6 58 e9 77 81 4c 89 f7 e8 fa 2a fd ff <0f> 0b 48 83 e8 01 e9 d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff RIP [] migrate_pages+0x8e6/0x950 RSP ---[ end trace 59eb35cc15af8a53 ]--- Kernel panic - not syncing: Fatal exception Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled > > I have vage suspicion that I'm screwing up anon_vma refcounting during > split_huge_page. > > It would be great to see if the page was part of THP before. > > > > > .. > > .. > > Adding 4191228k swap on /dev/vda5. 
Priority:-1 extents:1 across:4191228k FS > > page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 > > index:0x61445 > > page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 > > index:0x615ef > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > page dumped because: VM_BUG_ON_PAGE(1) > > page->mem_cgroup:88007f2de000 > > [ cut here ] > > kernel BUG at mm/rmap.c:517! > > invalid opcode: [#1] SMP > > Dumping ftrace buffer: > >(ftrace buffer empty) > > Modules linked in: > > CPU: 0 PID: 24935 Comm: madvise_test Not tainted > > 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > > task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000 > > RIP: 0010:[] [] > >
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> I detach this report from my patchset thread because I see below
> problem with removing MADV_FREE related code and I can reproduce
> same oops with MADV_FREE + recent patches(both my SetPageDirty
> and Kirill's pte_mkdirty) within 7 hours.

Could you share code for your workload?

> I can not be sure it's THP refcount redesign's problem but it was
> one of big change in MM between mmotm-2015-10-15-15-20 and
> mmotm-2015-10-06-16-30 so it could be a culprit.
>
> In page_lock_anon_vma_read, anon_vma_root was NULL.
> I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

Hm. That's tricky.. :-/

Could you please dump anon_vma->refcount too?

I have a vague suspicion that I'm screwing up anon_vma refcounting during
split_huge_page.

It would be great to see if the page was part of THP before.

>
> ..
> ..
> Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461
> index:0x61445
> page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461
> index:0x615ef
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:88007f2de000
> [ cut here ]
> kernel BUG at mm/rmap.c:517!
> invalid opcode: [#1] SMP > Dumping ftrace buffer: >(ftrace buffer empty) > Modules linked in: > CPU: 0 PID: 24935 Comm: madvise_test Not tainted > 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000 > RIP: 0010:[] [] > page_lock_anon_vma_read+0x18e/0x190 > RSP: :8800ada2b868 EFLAGS: 00010296 > RAX: 0021 RBX: ea0001b87bc0 RCX: > RDX: 0001 RSI: 0282 RDI: 81830db0 > RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75 > R10: 01ff14bc R11: R12: 88007e806461 > R13: 88007e806460 R14: R15: 818464c0 > FS: 7f6d93212740() GS:8800bfa0() knlGS: > CS: 0010 DS: ES: CR0: 8005003b > CR2: 63c14000 CR3: a674b000 CR4: 06b0 > Stack: > ea0001b87bc0 8800ada2b8f8 88007f2de000 > 8800ada2b8d0 81129593 8800 8105f8c0 > ea0001b87bc0 8800ada2b9f8 88007f2de000 > Call Trace: > [] rmap_walk+0x1b3/0x3f0 > [] ? finish_task_switch+0x70/0x260 > [] page_referenced+0x1a3/0x220 > [] ? __page_check_address+0x1d0/0x1d0 > [] ? page_get_anon_vma+0xd0/0xd0 > [] ? anon_vma_ctor+0x40/0x40 > [] shrink_page_list+0x5ce/0xdc0 > [] shrink_inactive_list+0x18c/0x4b0 > [] shrink_lruvec+0x58f/0x730 > [] shrink_zone+0xd4/0x280 > [] do_try_to_free_pages+0x12d/0x3b0 > [] try_to_free_mem_cgroup_pages+0x9d/0x120 > [] try_charge+0x175/0x720 > [] ? __activate_page+0x230/0x230 > [] mem_cgroup_try_charge+0x85/0x1d0 > [] handle_mm_fault+0xc9a/0x1000 > [] ? 
__set_cpus_allowed_ptr+0x9b/0x1a0 > [] __do_page_fault+0x189/0x400 > [] do_page_fault+0xc/0x10 > [] page_fault+0x22/0x30 > Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff > 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 > 00 55 48 89 e5 41 57 41 56 45 31 f6 > 41 55 4c > RIP [] page_lock_anon_vma_read+0x18e/0x190 > RSP > ---[ end trace cfbb87f54f12290e ]--- > Kernel panic - not syncing: Fatal exception > Dumping ftrace buffer: >(ftrace buffer empty) > Kernel Offset: disabled > > On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote: > > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote: > > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote: > > > > Hello, it's too late since I sent previos patch. > > > > https://lkml.org/lkml/2015/6/3/37 > > > > > > > > This patch is alomost new compared to previos approach. > > > > I think this is more simple, clear and easy to review. > > > > > > > > One thing I should notice is that I have tested this patch > > > > and couldn't find any critical problem so I rebased patchset > > > > onto recent mmotm(ie, mmotm-2015-10-15-15-20) to send formal > > > > patchset. Unfortunately, I start to see sudden discarding of > > > > the page we shouldn't do. IOW, application's valid anonymous page > > > > was disappeared suddenly. > > > > > > > > When I look through THP changes, I think we could lose > > > > dirty bit of pte between freeze_page and unfreeze_page > > > > when we mark it as migration entry and restore it. > > > > So, I added below simple code without enough considering > > > > and cannot see the problem any more. > > > > I hope it's good hint to find right fix this problem. > > > > > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > > >
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, 22 Oct 2015, Minchan Kim wrote: > Hello Hugh, > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > I added the code to check it and queued it again but I had another oops > > > in this time but symptom is related to anon_vma, too. > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix) > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped > > > at that time but second check of page_mapped right before try_to_unmap > > > seems > > > to be true. > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k > > > FS > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k > > > FS > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 > > > index:0x60aff > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && > > > !anon_vma) > > > > That's interesting, that's one I added in my page migration series. > > Let me think on it, but it could well relate to the one you got before. > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 > instead of next-20151021 to remove noise from your migration cleanup > series and will test it again. > If it is fixed, I will test again with your migration patchset, then. Not a good use of your time, I think. It's sure to be fixed in the rc5-mmotm because that VM_BUG_ON_PAGE(blah) just does not exist in that tree: I added it to verify my reasoning in changing the comments about page_get_anon_vma() and PageSwapCache in mm/migrate.c. > > > > > > page->mem_cgroup:88007f3dcc00 > > > [ cut here ] > > > kernel BUG at mm/migrate.c:889! 
> > > invalid opcode: [#1] SMP > > > Dumping ftrace buffer: > > >(ftrace buffer empty) > > > Modules linked in: > > > CPU: 11 PID: 59 Comm: khugepaged Not tainted > > > 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557 > > > > Hmm, it might be me to blame, or it might be Kirill, don't know yet. > > It might be me, either. > > > > > Oh, hold on, I think Andrew has just posted a new mmotm, and it includes > > an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch: > > I haven't digested yet, but it might turn out to be relevant. Sorry, I think that was an irrelevant suggestion: today's new rc6-mmotm is identical to yesterday's there, and the patch that was removed appears to be identical to the one added. Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, 21 Oct 2015, Hugh Dickins wrote: > On Thu, 22 Oct 2015, Minchan Kim wrote: > > Hello Hugh, > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > > > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > > > > > I added the code to check it and queued it again but I had another oops > > > > in this time but symptom is related to anon_vma, too. > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix) > > > > It seems page_get_anon_vma returns NULL since the page was not > > > > page_mapped > > > > at that time but second check of page_mapped right before try_to_unmap > > > > seems > > > > to be true. > > > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 > > > > across:4191228k FS > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 > > > > across:4191228k FS > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 > > > > index:0x60aff > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && > > > > !anon_vma) > > > > > > That's interesting, that's one I added in my page migration series. > > > Let me think on it, but it could well relate to the one you got before. I think I have introduced a bug there; or rather, made more evident a pre-existing bug. But I'm not sure yet: the stacktrace was from compaction (called by khugepaged, but that may not be relevant at all), and thinking through the races with isolate_migratepages_block() is never easy. What's certain is that I was not giving any thought to isolate_migratepages_block() when I added that VM_BUG_ON_PAGE(): I was thinking about "stable" anonymous pages, and how they get faulted back in from swapcache while holding page lock. 
It looks to me now as if a page might not yet be PageAnon when it's first tested in __unmap_and_move(), when going to page_get_anon_vma(); but is page_mapped() and PageAnon() by time of calling try_to_unmap(), where I inserted the VM_BUG_ON_PAGE(). If so, the code would always have been wrong (trying to unmap the anonymous page, and later remap its replacement, without a hold on the anon_vma needed to guide both lookups); but I'll have made it more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend that's a good step forward :) There's a reference count check in isolated_migratepages_block() before this, which would make it unlikely, but I doubt rules it out. However... you did hit an anon_vma reference counting problem before my migration changes went in, and Kirill had a vague suspicion that he might be screwing up anon_vma refcounting in split_huge_page(): if he confirms that, I'd say it's more likely to be the cause of your crash on this occasion. Not hard to fix mine (though we'll probably have to lose the VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that trivial fix), I just want to give the races more thought. However it turns out, I think you have a very useful test there. (And I've observed no PageDirty problems with your recent patchsets, though I don't use MADV_FREE at all myself.) Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel oops on mmotm-2015-10-15-15-20
On Thu, 22 Oct 2015, Minchan Kim wrote: > > I added the code to check it and queued it again but I had another oops > in this time but symptom is related to anon_vma, too. > (kernel is based on recent mmotm + unconditional mkdirty for bug fix) > It seems page_get_anon_vma returns NULL since the page was not page_mapped > at that time but second check of page_mapped right before try_to_unmap seems > to be true. > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 > index:0x60aff > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && > !anon_vma) That's interesting, that's one I added in my page migration series. Let me think on it, but it could well relate to the one you got before. > page->mem_cgroup:88007f3dcc00 > [ cut here ] > kernel BUG at mm/migrate.c:889! > invalid opcode: [#1] SMP > Dumping ftrace buffer: >(ftrace buffer empty) > Modules linked in: > CPU: 11 PID: 59 Comm: khugepaged Not tainted > 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557 Hmm, it might be me to blame, or it might be Kirill, don't know yet. Oh, hold on, I think Andrew has just posted a new mmotm, and it includes an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch: I haven't digested yet, but it might turn out to be relevant. Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel oops on mmotm-2015-10-15-15-20
Hello Hugh, On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote: > On Thu, 22 Oct 2015, Minchan Kim wrote: > > > > I added the code to check it and queued it again but I had another oops > > in this time but symptom is related to anon_vma, too. > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix) > > It seems page_get_anon_vma returns NULL since the page was not page_mapped > > at that time but second check of page_mapped right before try_to_unmap seems > > to be true. > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 > > index:0x60aff > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && > > !anon_vma) > > That's interesting, that's one I added in my page migration series. > Let me think on it, but it could well relate to the one you got before. I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20 instead of next-20151021 to remove noise from your migration cleanup series and will test it again. If it is fixed, I will test again with your migration patchset, then. > > > page->mem_cgroup:88007f3dcc00 > > [ cut here ] > > kernel BUG at mm/migrate.c:889! > > invalid opcode: [#1] SMP > > Dumping ftrace buffer: > >(ftrace buffer empty) > > Modules linked in: > > CPU: 11 PID: 59 Comm: khugepaged Not tainted > > 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557 > > Hmm, it might be me to blame, or it might be Kirill, don't know yet. It might be me, either. > > Oh, hold on, I think Andrew has just posted a new mmotm, and it includes > an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch: > I haven't digested yet, but it might turn out to be relevant. 
> > Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, Oct 21, 2015 at 02:07:23PM +0300, Kirill A. Shutemov wrote: > On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote: > > I detach this report from my patchset thread because I see below > > problem with removing MADV_FREE related code and I can reproduce > > same oops with MADV_FREE + recent patches(both my SetPageDirty > > and Kirill's pte_mkdirty) within 7 hours. > > Could you share code for your workload? It's part of test suite so I need time to factor it out. I will do/test and send it. > > > I can not be sure it's THP refcount redesign's problem but it was > > one of big change in MM between mmotm-2015-10-15-15-20 and > > mmotm-2015-10-06-16-30 so it could be a culprit. > > > > In page_lock_anon_vma_read, anon_vma_root was NULL. > > I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result. > > Hm. That's tricky.. :-/ > > Could you please dump anon_vma->refcount too? I added the code to check it and queued it again but I had another oops in this time but symptom is related to anon_vma, too. (kernel is based on recent mmotm + unconditional mkdirty for bug fix) It seems page_get_anon_vma returns NULL since the page was not page_mapped at that time but second check of page_mapped right before try_to_unmap seems to be true. Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma) page->mem_cgroup:88007f3dcc00 [ cut here ] kernel BUG at mm/migrate.c:889! 
invalid opcode: [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 8800b9851a40 ti: 8800b985c000 task.ti: 8800b985c000 RIP: 0010:[] [] migrate_pages+0x8e6/0x950 RSP: 0018:8800b985fa00 EFLAGS: 00010286 RAX: 0021 RBX: ea0002dd7fc0 RCX: 81830db8 RDX: 0001 RSI: 0246 RDI: 821df4d8 RBP: 8800b985fa80 R08: R09: 880bb160 R10: 8163e000 R11: 01e0 R12: R13: ea0001cfbf80 R14: ea0001cfbfc0 R15: 8189de80 FS: () GS:8800bfb6() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 5594f9d7e578 CR3: 01808000 CR4: 06a0 Stack: 8800b9851a40 811144b0 81115fb0 ea0001cfbfe0 8800b985fb30 8800b985fb20 8800b985fb20 Call Trace: [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0 [] ? isolate_freepages_block+0x3d0/0x3d0 [] compact_zone+0x2bb/0x720 [] ? retint_kernel+0x10/0x10 [] ? list_del+0xd/0x30 [] compact_zone_order+0x6d/0xa0 [] try_to_compact_pages+0xed/0x200 [] __alloc_pages_direct_compact+0x3b/0xd4 [] __alloc_pages_nodemask+0x3fb/0x920 [] khugepaged+0x158/0x1b90 [] ? hrtick_update+0x51/0x70 [] ? prepare_to_wait_event+0xf0/0xf0 [] ? unfreeze_page+0x320/0x320 [] kthread+0xc9/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x3f/0x70 [] ? kthread_park+0x60/0x60 Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 f4 fa ff ff 48 c7 c6 58 e9 77 81 4c 89 f7 e8 fa 2a fd ff <0f> 0b 48 83 e8 01 e9 d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff RIP [] migrate_pages+0x8e6/0x950 RSP ---[ end trace 59eb35cc15af8a53 ]--- Kernel panic - not syncing: Fatal exception Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled > > I have vage suspicion that I'm screwing up anon_vma refcounting during > split_huge_page. > > It would be great to see if the page was part of THP before. > > > > > .. > > .. > > Adding 4191228k swap on /dev/vda5. 
Priority:-1 extents:1 across:4191228k FS > > page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 > > index:0x61445 > > page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 > > index:0x615ef > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked) > > page dumped because: VM_BUG_ON_PAGE(1) > > page->mem_cgroup:88007f2de000 > > [ cut here ] > > kernel BUG at mm/rmap.c:517! > > invalid opcode: [#1] SMP > > Dumping ftrace buffer: > >(ftrace buffer empty) > > Modules linked in: > > CPU: 0 PID: 24935 Comm: madvise_test Not tainted > > 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > > task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000 > > RIP: 0010:[] [] > >
Re: kernel oops on mmotm-2015-10-15-15-20
On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> I detach this report from my patchset thread because I see below
> problem with removing MADV_FREE related code and I can reproduce
> same oops with MADV_FREE + recent patches(both my SetPageDirty
> and Kirill's pte_mkdirty) within 7 hours.

Could you share code for your workload?

> I can not be sure it's THP refcount redesign's problem but it was
> one of big change in MM between mmotm-2015-10-15-15-20 and
> mmotm-2015-10-06-16-30 so it could be a culprit.
>
> In page_lock_anon_vma_read, anon_vma_root was NULL.
> I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

Hm. That's tricky.. :-/

Could you please dump anon_vma->refcount too?
I have a vague suspicion that I'm screwing up anon_vma refcounting during
split_huge_page. It would be great to see if the page was part of a THP
before.

>
> ..
> ..
> Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461
> index:0x61445
> page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461
> index:0x615ef
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:88007f2de000
> [ cut here ]
> kernel BUG at mm/rmap.c:517!
> invalid opcode: [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 24935 Comm: madvise_test Not tainted
> 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> RIP: 0010:[] []
> page_lock_anon_vma_read+0x18e/0x190
> RSP: :8800ada2b868 EFLAGS: 00010296
> RAX: 0021 RBX: ea0001b87bc0 RCX:
> RDX: 0001 RSI: 0282 RDI: 81830db0
> RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75
> R10: 01ff14bc R11: R12: 88007e806461
> R13: 88007e806460 R14: R15: 818464c0
> FS: 7f6d93212740() GS:8800bfa0() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: 63c14000 CR3: a674b000 CR4: 06b0
> Stack:
> ea0001b87bc0 8800ada2b8f8 88007f2de000
> 8800ada2b8d0 81129593 8800 8105f8c0
> ea0001b87bc0 8800ada2b9f8 88007f2de000
> Call Trace:
> [] rmap_walk+0x1b3/0x3f0
> [] ? finish_task_switch+0x70/0x260
> [] page_referenced+0x1a3/0x220
> [] ? __page_check_address+0x1d0/0x1d0
> [] ? page_get_anon_vma+0xd0/0xd0
> [] ? anon_vma_ctor+0x40/0x40
> [] shrink_page_list+0x5ce/0xdc0
> [] shrink_inactive_list+0x18c/0x4b0
> [] shrink_lruvec+0x58f/0x730
> [] shrink_zone+0xd4/0x280
> [] do_try_to_free_pages+0x12d/0x3b0
> [] try_to_free_mem_cgroup_pages+0x9d/0x120
> [] try_charge+0x175/0x720
> [] ? __activate_page+0x230/0x230
> [] mem_cgroup_try_charge+0x85/0x1d0
> [] handle_mm_fault+0xc9a/0x1000
> [] ? __set_cpus_allowed_ptr+0x9b/0x1a0
> [] __do_page_fault+0x189/0x400
> [] do_page_fault+0xc/0x10
> [] page_fault+0x22/0x30
> Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff
> 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00
> 00 55 48 89 e5 41 57 41 56 45 31 f6
> 41 55 4c
> RIP [] page_lock_anon_vma_read+0x18e/0x190
> RSP
> ---[ end trace cfbb87f54f12290e ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
>
> On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > > Hello, it's too late since I sent previous patch.
> > > > https://lkml.org/lkml/2015/6/3/37
> > > >
> > > > This patch is almost new compared to previous approach.
> > > > I think this is more simple, clear and easy to review.
> > > >
> > > > One thing I should notice is that I have tested this patch
> > > > and couldn't find any critical problem so I rebased patchset
> > > > onto recent mmotm(ie, mmotm-2015-10-15-15-20) to send formal
> > > > patchset. Unfortunately, I start to see sudden discarding of
> > > > the page we shouldn't do. IOW, application's valid anonymous page
> > > > was disappeared suddenly.
> > > >
> > > > When I look through THP changes, I think we could lose
> > > > dirty bit of pte between freeze_page and unfreeze_page
> > > > when we mark it as migration entry and restore it.
> > > > So, I added below simple code without enough considering
> > > > and cannot see the problem any more.
> > > > I hope it's good hint to find right fix this problem.
> > > >
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > >
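For readers following the dump: the mapping value and the R12/R13 pair differ only in the lowest bit, which is how the kernel tags page->mapping for anonymous pages (PAGE_MAPPING_ANON) before page_lock_anon_vma_read() turns it back into an anon_vma pointer. Below is a minimal user-space sketch of that decoding, not kernel code. The helper names mapping_is_anon() and mapping_to_anon_vma() are made up for illustration, and the hex values are used exactly as they appear in the dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bit of page->mapping marks an anonymous page (PAGE_MAPPING_ANON in the kernel). */
#define PAGE_MAPPING_ANON 0x1UL

/* Hypothetical helper: does this mapping value carry the anon tag? */
static bool mapping_is_anon(uint64_t mapping)
{
	return (mapping & PAGE_MAPPING_ANON) != 0;
}

/* Hypothetical helper: strip the tag to recover the anon_vma address. */
static uint64_t mapping_to_anon_vma(uint64_t mapping)
{
	assert(mapping_is_anon(mapping));
	return mapping & ~(uint64_t)PAGE_MAPPING_ANON;
}
```

Feeding in the dumped mapping value 0x88007e806461 gives 0x88007e806460, which matches R13 in the register dump. So the anon_vma pointer itself looks sane and it is the root pointer inside it that is NULL, as reported.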
kernel oops on mmotm-2015-10-15-15-20
I detach this report from my patchset thread because I see below
problem with removing MADV_FREE related code and I can reproduce
same oops with MADV_FREE + recent patches(both my SetPageDirty
and Kirill's pte_mkdirty) within 7 hours.

I can not be sure it's THP refcount redesign's problem but it was
one of big change in MM between mmotm-2015-10-15-15-20 and
mmotm-2015-10-06-16-30 so it could be a culprit.

In page_lock_anon_vma_read, anon_vma_root was NULL.
I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

..
..
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 index:0x61445
page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 index:0x615ef
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(1)
page->mem_cgroup:88007f2de000
[ cut here ]
kernel BUG at mm/rmap.c:517!
invalid opcode: [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
RIP: 0010:[] [] page_lock_anon_vma_read+0x18e/0x190
RSP: :8800ada2b868 EFLAGS: 00010296
RAX: 0021 RBX: ea0001b87bc0 RCX:
RDX: 0001 RSI: 0282 RDI: 81830db0
RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75
R10: 01ff14bc R11: R12: 88007e806461
R13: 88007e806460 R14: R15: 818464c0
FS: 7f6d93212740() GS:8800bfa0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 63c14000 CR3: a674b000 CR4: 06b0
Stack:
ea0001b87bc0 8800ada2b8f8 88007f2de000
8800ada2b8d0 81129593 8800 8105f8c0
ea0001b87bc0 8800ada2b9f8 88007f2de000
Call Trace:
[] rmap_walk+0x1b3/0x3f0
[] ? finish_task_switch+0x70/0x260
[] page_referenced+0x1a3/0x220
[] ? __page_check_address+0x1d0/0x1d0
[] ? page_get_anon_vma+0xd0/0xd0
[] ? anon_vma_ctor+0x40/0x40
[] shrink_page_list+0x5ce/0xdc0
[] shrink_inactive_list+0x18c/0x4b0
[] shrink_lruvec+0x58f/0x730
[] shrink_zone+0xd4/0x280
[] do_try_to_free_pages+0x12d/0x3b0
[] try_to_free_mem_cgroup_pages+0x9d/0x120
[] try_charge+0x175/0x720
[] ? __activate_page+0x230/0x230
[] mem_cgroup_try_charge+0x85/0x1d0
[] handle_mm_fault+0xc9a/0x1000
[] ? __set_cpus_allowed_ptr+0x9b/0x1a0
[] __do_page_fault+0x189/0x400
[] do_page_fault+0xc/0x10
[] page_fault+0x22/0x30
Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 45 31 f6 41 55 4c
RIP [] page_lock_anon_vma_read+0x18e/0x190
RSP
---[ end trace cfbb87f54f12290e ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > Hello, it's too late since I sent previous patch.
> > > https://lkml.org/lkml/2015/6/3/37
> > >
> > > This patch is almost new compared to previous approach.
> > > I think this is more simple, clear and easy to review.
> > >
> > > One thing I should notice is that I have tested this patch
> > > and couldn't find any critical problem so I rebased patchset
> > > onto recent mmotm(ie, mmotm-2015-10-15-15-20) to send formal
> > > patchset. Unfortunately, I start to see sudden discarding of
> > > the page we shouldn't do. IOW, application's valid anonymous page
> > > was disappeared suddenly.
> > >
> > > When I look through THP changes, I think we could lose
> > > dirty bit of pte between freeze_page and unfreeze_page
> > > when we mark it as migration entry and restore it.
> > > So, I added below simple code without enough considering
> > > and cannot see the problem any more.
> > > I hope it's good hint to find right fix this problem.
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index d5ea516ffb54..e881c04f5950 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -3138,6 +3138,9 @@ static void unfreeze_page_vma(struct vm_area_struct
> > > *vma, struct page *page,
> > > if (is_write_migration_entry(swp_entry))
> > > entry = maybe_mkwrite(entry, vma);
> > >
> > > + if (PageDirty(page))
> > > + SetPageDirty(page);
> >
> > The condition of PageDirty was typo. I didn't add the condition.
> > Just added.
> >
> >
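To make the suspected dirty-bit loss easier to picture, here is a toy user-space model of the freeze/unfreeze round trip described in the quoted mail. It is only a sketch under loud assumptions: struct toy_pte, struct toy_page, struct toy_migration_entry and the toy_* functions are invented for illustration and are not the kernel's types or functions. The model assumes the migration entry remembers only the write permission, and that a page with neither a dirty pte nor a dirty page flag is what reclaim would treat as discardable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the kernel's data structures. */
struct toy_page { bool page_dirty; };
struct toy_pte { bool present; bool write; bool dirty; };
struct toy_migration_entry { bool write; };

/* Model of freezing: the pte becomes a migration entry keeping only "write". */
static struct toy_migration_entry toy_freeze(struct toy_pte *pte)
{
	struct toy_migration_entry e = { .write = pte->write };

	assert(pte->present);
	/* The pte dirty bit is not recorded anywhere in this model. */
	pte->present = pte->write = pte->dirty = false;
	return e;
}

/* Model of unfreezing: rebuild the pte; optionally apply the workaround. */
static struct toy_pte toy_unfreeze(struct toy_migration_entry e,
				   struct toy_page *page, bool workaround)
{
	struct toy_pte pte = { .present = true, .write = e.write, .dirty = false };

	if (workaround)
		page->page_dirty = true; /* models the unconditional SetPageDirty(page) */
	return pte;
}

/* Assumed reclaim view: neither pte nor page is dirty, so data looks disposable. */
static bool toy_looks_discardable(const struct toy_pte *pte,
				  const struct toy_page *page)
{
	return !pte->dirty && !page->page_dirty;
}
```

In this model a dirty, writable pte that goes through toy_freeze() and then toy_unfreeze() without the workaround comes back writable but clean, so toy_looks_discardable() returns true even though the application wrote to the page. With the workaround the page flag carries the dirtiness and the page no longer looks discardable, which is consistent with the problem going away after the quoted change.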