Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-24 Thread Minchan Kim
On Thu, Nov 19, 2015 at 08:58:27AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> > On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > > During the test with MADV_FREE on a kernel with your patches applied,
> > > > > > > > I couldn't see any problem.
> > > > > > > > 
> > > > > > > > However, in this round, I ran another test, which is the same as the one
> > > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > > (memcg things/kill/swapoff), so that the test program is long-lived.
> > > > > > > 
> > > > > > > Could you share updated test?
> > > > > > 
> > > > > > It's part of my testing suite so I should factor it out.
> > > > > > I will send it when I go to office tomorrow.
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > > > 
> > > > > > Before leaving the office, I queued it up and the result is below.
> > > > > > It seems you already fixed it but haven't applied the fix to mmotm yet. Right?
> > > > > > Anyway, please confirm and tell me which additional patches I should add
> > > > > > to mmotm-2015-11-10-15-53 to follow up on your many recent bug
> > > > > > fix patches.
> > > > > 
> > > > > My two patches which are not in the mmotm-2015-11-10-15-53 release:
> > > > > 
> > > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > > > 
> > > > 1. mm: fix __page_mapcount()
> > > > 2. thp: fix leak due split_huge_page() vs. exit race
> > > > 
> > > > If I missed some patches, let me know.
> > > > 
> > > > I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
> > > > But unfortunately, the result was below.
> > > > 
> > > > Now, I am making a test program I can send to you, but it seems not to be easy
> > > > because small changes for factoring it out of the testing suite seem to change
> > > > something (e.g., timing) and make it hard to reproduce. I will try it again.
> > > 
> > > Your test suite seems to generate quite a few bug reports. Would you mind making the whole
> > > suite public?
> > 
> > It's tough because it includes company-internal stuff.
> > That's why I am trying to factor out the part I can share, but unfortunately
> > I couldn't find time to retry until now. :(
> > 
> > >  
> > > > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
> > > > flags: 0x40040018(uptodate|dirty|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > > page->mem_cgroup:880077cf0c00
> > > > [ cut here ]
> > > > kernel BUG at mm/huge_memory.c:3272!
> > > > invalid opcode:  [#1] SMP 
> > > > Dumping ftrace buffer:
> > > >    (ftrace buffer empty)
> > > > Modules linked in:
> > > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > > > RIP: 0010:[]  [] split_huge_page_to_list+0x8fb/0x910
> > > > RSP: 0018:88007344f968  EFLAGS: 00010286
> > > > RAX: 0021 RBX: ea240080 RCX: 
> > > > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > > > RBP: 88007344f9e8 R08:  R09: 880bc600
> > > > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > > > R13: ea240088 R14: ea240080 R15: 
> > > > FS:  () GS:88007830() knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 8005003b
> > > > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > > > Stack:
> > > >  cccd ea240080 88007344fa00 ea240088
> > > >  88007344fa00  88007344f9e8 810f0200
> > > >  ea24   ea240080
> > > > Call Trace:
> > > >  [] ? __lock_page+0xa0/0xb0
> > > >  [] deferred_split_scan+0x115/0x240
> > > >  [] ? list_lru_count_one+0x1c/0x30
> > > >  [] shrink_slab.part.42+0x1e3/0x350
> > > >  [] shrink_zone+0x26a/0x280
> > > >  [] do_try_to_free_pages+0x12d/0x3b0
> > > >  [] try_to_free_pages+0xb4/0x140
> > > >  [] __alloc_pages_nodemask+0x459/0x920
> > > >  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > > >  [] khugepaged+0x155/0x1b10
> > > >  

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-19 Thread yalin wang

> On Nov 19, 2015, at 14:58, Kirill A. Shutemov  wrote:
> 
> uncharged
I also encountered this crash.

I also hit a crash like this in QEMU:


[2.703436] [] do_execveat_common.isra.36+0x4f0/0x630
[2.703624] [] do_execve+0x24/0x30
[2.703767] [] SyS_execve+0x1c/0x2c
[2.703923] BUG: Bad page map in process init  pte:604837ebd3 pmd:b29e7003
[2.704140] page:ffc07f00af80 count:2 mapcount:-1 mapping:  (null) index:0x1
[2.704414] flags: 0x4014(referenced|dirty)
[2.704563] page dumped because: bad pte
[2.704666] addr:007fafb7e000 vm_flags:00100073 anon_vma:ffc0729bdb90 mapping:  (null) index:7fafb7e
[2.704906] file:  (null) fault:  (null) mmap:  (null) readpage:  (null)
[2.705117] CPU: 0 PID: 84 Comm: init Tainted: GB   4.2.0ajb-5-g11a9bf3 #80
[2.705315] Hardware name: ranchu (DT)
[2.705408] Call trace:
[2.705488] [] dump_backtrace+0x0/0x124
[2.705657] [] show_stack+0x10/0x1c
[2.705797] [] dump_stack+0x78/0x98
[2.705971] [] print_bad_pte+0x154/0x1f0
[2.706102] [] unmap_single_vma+0x574/0x704
[2.706236] [] unmap_vmas+0x54/0x70
[2.706354] [] exit_mmap+0x88/0xfc
[2.706473] [] mmput+0x48/0xe8
[2.706584] [] flush_old_exec+0x30c/0x79c
[2.706719] [] load_elf_binary+0x21c/0x1098
[2.706856] [] search_binary_handler+0xa8/0x224
[2.706995] [] do_execveat_common.isra.36+0x4f0/0x630
[2.707144] [] do_execve+0x24/0x30
[2.707263] [] SyS_execve+0x1c/0x2c
[2.707392] BUG: Bad page map in process init  pte:604837fbd3 pmd:b29e7003
[2.707752] page:ffc07f00afc0 count:2 mapcount:-1 mapping:  (null) index:0x1
[2.708167] flags: 0x4014(referenced|dirty)
[2.708333] page dumped because: bad pte
[2.708501] addr:007fafb7f000 vm_flags:00100073 anon_vma:ffc0729bdb90 mapping:  (null) index:7fafb7f
[2.709084] file:  (null) fault:  (null) mmap:  (null) readpage:  (null)
[2.709306] CPU: 0 PID: 84 Comm: init Tainted: GB   4.2.0ajb-5-g11a9bf3 #80
[2.709494] Hardware name: ranchu (DT)

It seems the page mapcount is not correct.
My build is based on mmotm-2015-10-21-14-41.

Thanks



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-18 Thread Kirill A. Shutemov
On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > During the test with MADV_FREE on a kernel with your patches applied,
> > > > > > > I couldn't see any problem.
> > > > > > > 
> > > > > > > However, in this round, I ran another test, which is the same as the one
> > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > (memcg things/kill/swapoff), so that the test program is long-lived.
> > > > > > 
> > > > > > Could you share updated test?
> > > > > 
> > > > > It's part of my testing suite so I should factor it out.
> > > > > I will send it when I go to office tomorrow.
> > > > 
> > > > Thanks.
> > > > 
> > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > > 
> > > > > Before leaving the office, I queued it up and the result is below.
> > > > > It seems you already fixed it but haven't applied the fix to mmotm yet. Right?
> > > > > Anyway, please confirm and tell me which additional patches I should add
> > > > > to mmotm-2015-11-10-15-53 to follow up on your many recent bug
> > > > > fix patches.
> > > > 
> > > > My two patches which are not in the mmotm-2015-11-10-15-53 release:
> > > > 
> > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > > 
> > > 1. mm: fix __page_mapcount()
> > > 2. thp: fix leak due split_huge_page() vs. exit race
> > > 
> > > If I missed some patches, let me know.
> > > 
> > > I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
> > > But unfortunately, the result was below.
> > > 
> > > Now, I am making a test program I can send to you, but it seems not to be easy
> > > because small changes for factoring it out of the testing suite seem to change
> > > something (e.g., timing) and make it hard to reproduce. I will try it again.
> > 
> > Your test suite seems to generate quite a few bug reports. Would you mind making the whole
> > suite public?
> 
> It's tough because it includes company-internal stuff.
> That's why I am trying to factor out the part I can share, but unfortunately
> I couldn't find time to retry until now. :(
> 
> >  
> > > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
> > > flags: 0x40040018(uptodate|dirty|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > page->mem_cgroup:880077cf0c00
> > > [ cut here ]
> > > kernel BUG at mm/huge_memory.c:3272!
> > > invalid opcode:  [#1] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > > RIP: 0010:[]  [] split_huge_page_to_list+0x8fb/0x910
> > > RSP: 0018:88007344f968  EFLAGS: 00010286
> > > RAX: 0021 RBX: ea240080 RCX: 
> > > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > > RBP: 88007344f9e8 R08:  R09: 880bc600
> > > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > > R13: ea240088 R14: ea240080 R15: 
> > > FS:  () GS:88007830() knlGS:
> > > CS:  0010 DS:  ES:  CR0: 8005003b
> > > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > > Stack:
> > >  cccd ea240080 88007344fa00 ea240088
> > >  88007344fa00  88007344f9e8 810f0200
> > >  ea24   ea240080
> > > Call Trace:
> > >  [] ? __lock_page+0xa0/0xb0
> > >  [] deferred_split_scan+0x115/0x240
> > >  [] ? list_lru_count_one+0x1c/0x30
> > >  [] shrink_slab.part.42+0x1e3/0x350
> > >  [] shrink_zone+0x26a/0x280
> > >  [] do_try_to_free_pages+0x12d/0x3b0
> > >  [] try_to_free_pages+0xb4/0x140
> > >  [] __alloc_pages_nodemask+0x459/0x920
> > >  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > >  [] khugepaged+0x155/0x1b10
> > >  [] ? prepare_to_wait_event+0xf0/0xf0
> > >  [] ? __split_huge_pmd_locked+0x4e0/0x4e0
> > >  [] kthread+0xc9/0xe0
> > >  [] ? kthread_park+0x60/0x60
> > >  [] ret_from_fork+0x3f/0x70
> > >  [] ? kthread_park+0x60/0x60
> > > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-18 Thread Minchan Kim
On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > During the test with MADV_FREE on a kernel with your patches applied,
> > > > > > I couldn't see any problem.
> > > > > > 
> > > > > > However, in this round, I ran another test, which is the same as the one
> > > > > > I attached but a little bit different because it doesn't do
> > > > > > (memcg things/kill/swapoff), so that the test program is long-lived.
> > > > > 
> > > > > Could you share updated test?
> > > > 
> > > > It's part of my testing suite so I should factor it out.
> > > > I will send it when I go to office tomorrow.
> > > 
> > > Thanks.
> > > 
> > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > 
> > > > Before leaving the office, I queued it up and the result is below.
> > > > It seems you already fixed it but haven't applied the fix to mmotm yet. Right?
> > > > Anyway, please confirm and tell me which additional patches I should add
> > > > to mmotm-2015-11-10-15-53 to follow up on your many recent bug
> > > > fix patches.
> > > 
> > > My two patches which are not in the mmotm-2015-11-10-15-53 release:
> > > 
> > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > 
> > 1. mm: fix __page_mapcount()
> > 2. thp: fix leak due split_huge_page() vs. exit race
> > 
> > If I missed some patches, let me know.
> > 
> > I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
> > But unfortunately, the result was below.
> > 
> > Now, I am making a test program I can send to you, but it seems not to be easy
> > because small changes for factoring it out of the testing suite seem to change
> > something (e.g., timing) and make it hard to reproduce. I will try it again.
> 
> Your test suite seems to generate quite a few bug reports. Would you mind making the whole
> suite public?

It's tough because it includes company-internal stuff.
That's why I am trying to factor out the part I can share, but unfortunately
I couldn't find time to retry until now. :(

>  
> > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
> > flags: 0x40040018(uptodate|dirty|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > page->mem_cgroup:880077cf0c00
> > [ cut here ]
> > kernel BUG at mm/huge_memory.c:3272!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > RIP: 0010:[]  [] split_huge_page_to_list+0x8fb/0x910
> > RSP: 0018:88007344f968  EFLAGS: 00010286
> > RAX: 0021 RBX: ea240080 RCX: 
> > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > RBP: 88007344f9e8 R08:  R09: 880bc600
> > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > R13: ea240088 R14: ea240080 R15: 
> > FS:  () GS:88007830() knlGS:
> > CS:  0010 DS:  ES:  CR0: 8005003b
> > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > Stack:
> >  cccd ea240080 88007344fa00 ea240088
> >  88007344fa00  88007344f9e8 810f0200
> >  ea24   ea240080
> > Call Trace:
> >  [] ? __lock_page+0xa0/0xb0
> >  [] deferred_split_scan+0x115/0x240
> >  [] ? list_lru_count_one+0x1c/0x30
> >  [] shrink_slab.part.42+0x1e3/0x350
> >  [] shrink_zone+0x26a/0x280
> >  [] do_try_to_free_pages+0x12d/0x3b0
> >  [] try_to_free_pages+0xb4/0x140
> >  [] __alloc_pages_nodemask+0x459/0x920
> >  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> >  [] khugepaged+0x155/0x1b10
> >  [] ? prepare_to_wait_event+0xf0/0xf0
> >  [] ? __split_huge_pmd_locked+0x4e0/0x4e0
> >  [] kthread+0xc9/0xe0
> >  [] ? kthread_park+0x60/0x60
> >  [] ret_from_fork+0x3f/0x70
> >  [] ? kthread_park+0x60/0x60
> > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 
> > e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 
> > c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> > RIP  [] split_huge_page_to_list+0x8fb/0x910
> >  RSP 
> > ---[ end trace 0ee39378e850d8de ]---
> > Kernel panic - not syncing: Fatal 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-17 Thread Kirill A. Shutemov
On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > I couldn't see any problem.
> > > > > 
> > > > > However, in this round, I did another test which is same one
> > > > > I attached but a little bit different because it doesn't do
> > > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > > 
> > > > Could you share updated test?
> > > 
> > > It's part of my testing suite so I should factor it out.
> > > I will send it when I go to office tomorrow.
> > 
> > Thanks.
> > 
> > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > 
> > > Before leaving office, I queued it up and result is below.
> > > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > > Anyway, please confirm and say to me what I should add more patches
> > > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > > fix patches.
> > 
> > My two patches that are not in the mmotm-2015-11-10-15-53 release:
> > 
> > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> 
> 1. mm: fix __page_mapcount()
> 2. thp: fix leak due split_huge_page() vs. exit race
> 
> If I missed some patches, let me know it.
> 
> I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
> But unfortunately, the result was below.
> 
> Now, I am making test program I can send to you but it seems to be not easy
> because small changes for factoring it out from testing suite seems to change
> something(ex, timing) and makes hard to reproduce. I will try it again.

Your test suite seems to generate quite a few bug reports. Would you mind
making the whole suite public?
 
> page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
> flags: 0x40040018(uptodate|dirty|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3272!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> RIP: 0010:[]  [] split_huge_page_to_list+0x8fb/0x910
> RSP: 0018:88007344f968  EFLAGS: 00010286
> RAX: 0021 RBX: ea240080 RCX: 
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 88007344f9e8 R08:  R09: 880bc600
> R10: 8163e2c0 R11: 4b47 R12: ea240080
> R13: ea240088 R14: ea240080 R15: 
> FS:  () GS:88007830() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> Stack:
>  cccd ea240080 88007344fa00 ea240088
>  88007344fa00  88007344f9e8 810f0200
>  ea24   ea240080
> Call Trace:
>  [] ? __lock_page+0xa0/0xb0
>  [] deferred_split_scan+0x115/0x240
>  [] ? list_lru_count_one+0x1c/0x30
>  [] shrink_slab.part.42+0x1e3/0x350
>  [] shrink_zone+0x26a/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_pages+0xb4/0x140
>  [] __alloc_pages_nodemask+0x459/0x920
>  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
>  [] khugepaged+0x155/0x1b10
>  [] ? prepare_to_wait_event+0xf0/0xf0
>  [] ? __split_huge_pmd_locked+0x4e0/0x4e0
>  [] kthread+0xc9/0xe0
>  [] ? kthread_park+0x60/0x60
>  [] ret_from_fork+0x3f/0x70
>  [] ? kthread_park+0x60/0x60
> Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 
> 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 
> c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> RIP  [] split_huge_page_to_list+0x8fb/0x910
>  RSP 
> ---[ end trace 0ee39378e850d8de ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled

I looked more into it. It seems to be a race between split_huge_page() and
deferred_split_scan(), as the dumped page is not huge.

Could you check if the patch below makes any difference to the situation?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 91e2f4b7ca39..923c0f6eb50a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3186,13 +3186,6 @@ static 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-16 Thread Minchan Kim
On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > I couldn't see any problem.
> > > > 
> > > > However, in this round, I did another test which is same one
> > > > I attached but a little bit different because it doesn't do
> > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > 
> > > Could you share updated test?
> > 
> > It's part of my testing suite so I should factor it out.
> > I will send it when I go to office tomorrow.
> 
> Thanks.
> 
> > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > 
> > Before leaving office, I queued it up and result is below.
> > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > Anyway, please confirm and say to me what I should add more patches
> > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > fix patches.
> 
> My two patches that are not in the mmotm-2015-11-10-15-53 release:
> 
> http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

1. mm: fix __page_mapcount()
2. thp: fix leak due split_huge_page() vs. exit race

If I missed some patches, let me know it.

I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
But unfortunately, the result was below.

Now, I am making test program I can send to you but it seems to be not easy
because small changes for factoring it out from testing suite seems to change
something(ex, timing) and makes hard to reproduce. I will try it again.


page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
flags: 0x40040018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/huge_memory.c:3272!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[]  [] split_huge_page_to_list+0x8fb/0x910
RSP: 0018:88007344f968  EFLAGS: 00010286
RAX: 0021 RBX: ea240080 RCX: 
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344f9e8 R08:  R09: 880bc600
R10: 8163e2c0 R11: 4b47 R12: ea240080
R13: ea240088 R14: ea240080 R15: 
FS:  () GS:88007830() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
Stack:
 cccd ea240080 88007344fa00 ea240088
 88007344fa00  88007344f9e8 810f0200
 ea24   ea240080
Call Trace:
 [] ? __lock_page+0xa0/0xb0
 [] deferred_split_scan+0x115/0x240
 [] ? list_lru_count_one+0x1c/0x30
 [] shrink_slab.part.42+0x1e3/0x350
 [] shrink_zone+0x26a/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 [] try_to_free_pages+0xb4/0x140
 [] __alloc_pages_nodemask+0x459/0x920
 [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
 [] khugepaged+0x155/0x1b10
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 
94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 
77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
RIP  [] split_huge_page_to_list+0x8fb/0x910
 RSP 
---[ end trace 0ee39378e850d8de ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-16 Thread Kirill A. Shutemov
On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > During the test with MADV_FREE on kernel I applied your patches,
> > > I couldn't see any problem.
> > > 
> > > However, in this round, I did another test which is same one
> > > I attached but a little bit different because it doesn't do
> > > (memcg things/kill/swapoff) for testing program long-live test.
> > 
> > Could you share updated test?
> 
> It's part of my testing suite so I should factor it out.
> I will send it when I go to office tomorrow.

Thanks.

> > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> 
> Before leaving office, I queued it up and result is below.
> It seems you fixed already but didn't apply it to mmotm yet. Right?
> Anyway, please confirm and say to me what I should add more patches
> into mmotm-2015-11-10-15-53 for follow up your recent many bug
> fix patches.

My two patches that are not in the mmotm-2015-11-10-15-53 release:

http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

-- 
 Kirill A. Shutemov


Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-16 Thread Minchan Kim
On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > During the test with MADV_FREE on kernel I applied your patches,
> > I couldn't see any problem.
> > 
> > However, in this round, I did another test which is same one
> > I attached but a little bit different because it doesn't do
> > (memcg things/kill/swapoff) for testing program long-live test.
> 
> Could you share updated test?

It's part of my testing suite so I should factor it out.
I will send it when I go to office tomorrow.

> 
> And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

Before leaving office, I queued it up and result is below.
It seems you fixed already but didn't apply it to mmotm yet. Right?
Anyway, please confirm and say to me what I should add more patches
into mmotm-2015-11-10-15-53 for follow up your recent many bug
fix patches.

Thanks.

page:ea553fc0 count:3 mapcount:1 mapping:88007f717a01 index:0x602ff
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/migrate.c:889!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 10 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[]  [] migrate_pages+0x8e6/0x950
RSP: 0018:88007344fa00  EFLAGS: 00010282
RAX: 0021 RBX: ea0001a0bbc0 RCX: 
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344fa80 R08:  R09: 880b9540
R10: 8163e2c0 R11: 02c2 R12: 
R13: ea553f80 R14: ea553fc0 R15: 8189db40
FS:  () GS:88007834() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f45cc0091d8 CR3: 7eba7000 CR4: 06a0
Stack:
 880073441a40   
 81114880  81116420 ea553fe0
 88007344fb30 88007344fb20  88007344fb20
Call Trace:
 [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [] ? isolate_freepages_block+0x3d0/0x3d0
 [] compact_zone+0x2bb/0x720
 [] ? list_del+0xd/0x30
 [] compact_zone_order+0x6d/0xa0
 [] try_to_compact_pages+0xed/0x200
 [] __alloc_pages_direct_compact+0x3b/0xd4
 [] __alloc_pages_nodemask+0x3fb/0x920
 [] khugepaged+0x155/0x1b10
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 
f4 fa ff ff 48 c7 c6 b8 f6 77 81 4c 89 f7 e8 fa 36 fd ff <0f> 0b 48 83 e8 01 e9 
d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff 
RIP  [] migrate_pages+0x8e6/0x950
 RSP 
---[ end trace 337555313b7e45be ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled



Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-16 Thread Kirill A. Shutemov
On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> During the test with MADV_FREE on kernel I applied your patches,
> I couldn't see any problem.
> 
> However, in this round, I did another test which is same one
> I attached but a little bit different because it doesn't do
> (memcg things/kill/swapoff) for testing program long-live test.

Could you share updated test?

And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

> With that, I encountered this problem.
> 
> page:eaf60080 count:1 mapcount:0 mapping:88007f584691 index:0x62a02
> flags: 0x4006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3340!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b0f1a40 ti: 88004ced4000 task.ti: 88004ced4000
> RIP: 0010:[]  [] split_huge_page_to_list+0x907/0x920
> RSP: 0018:88004ced7a38  EFLAGS: 00010296
> RAX: 0021 RBX: eaf60080 RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 88004ced7ab8 R08:  R09: 880bc560
> R10: 8163d880 R11: 00014f25 R12: eaf60080
> R13: eaf60088 R14: eaf60080 R15: 
> FS:  7f43d3ced740() GS:8800782e() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7ff1f6fcdb98 CR3: 4cf56000 CR4: 06a0
> Stack:
>  cccd eaf60080 88004ced7ad0 eaf60088
>  88004ced7ad0  88004ced7ab8 810ef9d0
>  eaf6   eaf60080
> Call Trace:
>  [] ? __lock_page+0xa0/0xb0
>  [] deferred_split_scan+0x11c/0x260
>  [] ? list_lru_count_one+0x1c/0x30
>  [] shrink_slab.part.42+0x1e3/0x350
>  [] shrink_zone+0x26a/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_pages+0xb4/0x140
>  [] __alloc_pages_nodemask+0x459/0x920
>  [] handle_mm_fault+0xc77/0x1000
>  [] ? retint_kernel+0x10/0x10
>  [] __do_page_fault+0x189/0x400
>  [] do_page_fault+0xc/0x10
>  [] page_fault+0x22/0x30
> Code: ff ff 48 c7 c6 f0 b2 77 81 4c 89 f7 e8 13 c3 fc ff 0f 0b 48 83 e8 01 e9 
> 88 f7 ff ff 48 c7 c6 70 a1 77 81 4c 89 f7 e8 f9 c2 fc ff <0f> 0b 48 c7 c6 38 
> af 77 81 4c 89 e7 e8 e8 c2 fc ff 0f 0b 66 0f 
> RIP  [] split_huge_page_to_list+0x907/0x920
>  RSP 
> ---[ end trace c9a60522e3a296e4 ]---

I don't see how it's possible: we call lock_page() just before
split_huge_page() in deferred_split_scan().
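
For reference, the flags line in the dump above lists no "locked" bit, which
is at least consistent with the assertion that fired. The sketch below decodes
the low flag bits the way the dump prints them. It is only an illustration:
the bit positions are inferred from the dumps quoted in this thread, not taken
from the kernel headers (the real layout depends on kernel version and
config), and decode_flags() and page_is_locked() are hypothetical helper
names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit positions are an assumption, inferred from the dumps in this thread. */
struct flag_name {
	unsigned int bit;
	const char *name;
};

static const struct flag_name flag_names[] = {
	{ 0, "locked" },     { 3, "uptodate" },   { 4, "dirty" },
	{ 5, "lru" },        { 13, "writeback" }, { 15, "swapcache" },
	{ 17, "reclaim" },   { 18, "swapbacked" },
};

/* Returns 1 if the (assumed) "locked" bit, bit 0, is set. */
int page_is_locked(unsigned long flags)
{
	return (flags & 1UL) != 0;
}

/* Writes the known flag names, joined by '|', into buf and returns buf.
 * Bits that are not in the table above are silently ignored. */
char *decode_flags(unsigned long flags, char *buf, size_t len)
{
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(flags & (1UL << flag_names[i].bit)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flag_names[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

With the value from the dump, decode_flags(0x4006a028UL, ...) gives
"uptodate|lru|writeback|swapcache|reclaim|swapbacked", the same names the dump
prints, and page_is_locked() returns 0 for it.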

> So, I reverted all MADV_FREE patches and changed it to MADV_DONTNEED.
> This time, I saw the oops below.
> If I missed something, please let me know.
> 
> [ cut here ]
> kernel BUG at include/linux/swapops.h:129!

Looks similar to what I fixed by inserting smp_wmb() just before
clear_compound_head() in __split_huge_page_tail().

Do you have this in place? Like in last -mm tree?
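
To illustrate the ordering such a fix relies on: the writer has to make the
tail page's fields visible before it clears the compound-head link, otherwise
a reader that sees the link cleared may still observe stale fields. Below is a
user-space sketch with C11 atomics, not the kernel code. struct fake_tail,
publish_split_tail() and read_if_split() are made-up names, and the release
fence only stands in for smp_wmb() (it is a somewhat stronger barrier).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a tail page; not the kernel's struct page. */
struct fake_tail {
	unsigned long flags;         /* plain data, written before publishing */
	unsigned long index;
	atomic_ulong compound_head;  /* non-zero while still a tail page */
};

/* Writer: fill in the fields, then publish by clearing compound_head.
 * The release fence keeps the field stores ordered before the clear. */
void publish_split_tail(struct fake_tail *t, unsigned long flags,
			unsigned long index)
{
	t->flags = flags;
	t->index = index;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&t->compound_head, 0, memory_order_relaxed);
}

/* Reader: trust the fields only after observing compound_head == 0.
 * Returns 1 and fills *flags / *index if the page has been published. */
int read_if_split(struct fake_tail *t, unsigned long *flags,
		  unsigned long *index)
{
	if (atomic_load_explicit(&t->compound_head, memory_order_relaxed) != 0)
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*flags = t->flags;
	*index = t->index;
	return 1;
}
```

Without the fence on the writer side, the store that clears compound_head
could become visible before the stores to flags and index, which is the kind
of window that lets a reader act on a half-initialised page.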

> Another hit:
> 
> page:ea520080 count:2 mapcount:0 mapping:880072b38a51 index:0x62602
> flags: 0x40048028(uptodate|lru|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3306!

The same as the first one: no idea.

-- 
 Kirill A. Shutemov


Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-16 Thread Minchan Kim
On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > I couldn't see any problem.
> > > > 
> > > > However, in this round, I did another test which is same one
> > > > I attached but a little bit different because it doesn't do
> > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > 
> > > Could you share updated test?
> > 
> > It's part of my testing suite so I should factor it out.
> > I will send it when I go to office tomorrow.
> 
> Thanks.
> 
> > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > 
> > Before leaving office, I queued it up and result is below.
> > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > Anyway, please confirm and say to me what I should add more patches
> > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > fix patches.
> 
> My two patches that are not in the mmotm-2015-11-10-15-53 release:
> 
> http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

1. mm: fix __page_mapcount()
2. thp: fix leak due split_huge_page() vs. exit race

If I missed any patches, please let me know.

I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested
again. Unfortunately, it still fails; the result is below.

I am now preparing a test program that I can send to you, but it is not easy:
even small changes made while factoring it out of my test suite seem to change
something (e.g., timing) and make the problem hard to reproduce. I will keep
trying.


page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
flags: 0x40040018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/huge_memory.c:3272!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[]  [] split_huge_page_to_list+0x8fb/0x910
RSP: 0018:88007344f968  EFLAGS: 00010286
RAX: 0021 RBX: ea240080 RCX: 
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344f9e8 R08:  R09: 880bc600
R10: 8163e2c0 R11: 4b47 R12: ea240080
R13: ea240088 R14: ea240080 R15: 
FS:  () GS:88007830() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
Stack:
 cccd ea240080 88007344fa00 ea240088
 88007344fa00  88007344f9e8 810f0200
 ea24   ea240080
Call Trace:
 [] ? __lock_page+0xa0/0xb0
 [] deferred_split_scan+0x115/0x240
 [] ? list_lru_count_one+0x1c/0x30
 [] shrink_slab.part.42+0x1e3/0x350
 [] shrink_zone+0x26a/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 [] try_to_free_pages+0xb4/0x140
 [] __alloc_pages_nodemask+0x459/0x920
 [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
 [] khugepaged+0x155/0x1b10
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
RIP  [] split_huge_page_to_list+0x8fb/0x910
 RSP 
---[ end trace 0ee39378e850d8de ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
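
For reference when reading the page dumps in this thread, the flag names can
be recovered from the raw flags value. The helper below is a user-space
sketch, not kernel code. Its bit positions are inferred only from the three
dumps quoted in this thread (0x40040018, 0x4006a028, 0x40048019) and may
differ on other kernel versions or configs. The names decode_page_flags() and
page_is_locked() are made up for illustration, and higher bits such as
0x40000000 are left undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit positions inferred from the dumps in this thread (an assumption). */
struct flag_name {
	unsigned int bit;
	const char *name;
};

static const struct flag_name flag_names[] = {
	{ 0, "locked" },     { 3, "uptodate" },   { 4, "dirty" },
	{ 5, "lru" },        { 13, "writeback" }, { 15, "swapcache" },
	{ 17, "reclaim" },   { 18, "swapbacked" },
};

/* Write "a|b|c" for the known bits set in flags into buf and return buf. */
const char *decode_page_flags(unsigned long flags, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(flags & (1UL << flag_names[i].bit)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flag_names[i].name, len - strlen(buf) - 1);
	}
	return buf;
}

/* Nonzero if the (assumed) locked bit, bit 0, is set. */
int page_is_locked(unsigned long flags)
{
	return (flags & 1UL) != 0;
}
```

For the dump above, decode_page_flags(0x40040018UL, buf, sizeof(buf)) gives
"uptodate|dirty|swapbacked" with no "locked" entry, which is consistent with
VM_BUG_ON_PAGE(!PageLocked(page)) firing.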


Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-15 Thread Minchan Kim
On Thu, Nov 12, 2015 at 09:36:14AM +0900, Minchan Kim wrote:



> > > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> > > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> > > MADV_FREE code in there
> > >  + pte_mkdirty patch
> > >  + freeze/unfreeze patch
> > >  + do_page_add_anon_rmap patch
> > >  + above split_huge_pmd
> > > 
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: Bad rss-counter state mm:88007fa3bb80 idx:1 val:512
> > 
> > With the patch below my test setup run for 2+ days without triggering the
> > bug. split_huge_pmd patch should be dropped.
> > 
> > Please test.
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 14cbbad54a3e..7aa0a3fef2aa 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2841,9 +2841,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >     write = pmd_write(*pmd);
> >     young = pmd_young(*pmd);
> >  
> > -   /* leave pmd empty until pte is filled */
> > -   pmdp_huge_clear_flush_notify(vma, haddr, pmd);
> > -
> >     pgtable = pgtable_trans_huge_withdraw(mm, pmd);
> >     pmd_populate(mm, &_pmd, pgtable);
> >  
> > @@ -2893,6 +2890,28 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >     }
> >  
> >     smp_wmb(); /* make pte visible before pmd */
> > +   /*
> > +    * Up to this point the pmd is present and huge and userland has the
> > +    * whole access to the hugepage during the split (which happens in
> > +    * place). If we overwrite the pmd with the not-huge version pointing
> > +    * to the pte here (which of course we could if all CPUs were bug
> > +    * free), userland could trigger a small page size TLB miss on the
> > +    * small sized TLB while the hugepage TLB entry is still established in
> > +    * the huge TLB. Some CPU doesn't like that.
> > +    * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
> > +    * 383 on page 93. Intel should be safe but is also warns that it's
> > +    * only safe if the permission and cache attributes of the two entries
> > +    * loaded in the two TLB is identical (which should be the case here).
> > +    * But it is generally safer to never allow small and huge TLB entries
> > +    * for the same virtual address to be loaded simultaneously. So instead
> > +    * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
> > +    * current pmd notpresent (atomically because here the pmd_trans_huge
> > +    * and pmd_trans_splitting must remain set at all times on the pmd
> > +    * until the split is complete for this pmd), then we flush the SMP TLB
> > +    * and finally we write the non-huge version of the pmd entry with
> > +    * pmd_populate.
> > +    */
> > +   pmdp_invalidate(vma, haddr, pmd);
> >     pmd_populate(mm, pmd, pgtable);
> >  
> >     if (freeze) {
> 
> I have been testing this patch with MADV_DONTNEED for a few days and
> I haven't seen the problem any more. I will continue to test it
> with MADV_FREE.

While testing MADV_FREE on a kernel with your patches applied,
I didn't see any problem.

However, in this round I ran another test. It is the same one
I attached, but slightly different: it doesn't do the
memcg/kill/swapoff steps, so the test program runs as a long-lived test.

With that, I hit the problem below.

page:eaf60080 count:1 mapcount:0 mapping:88007f584691 index:0x62a02
flags: 0x4006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/huge_memory.c:3340!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88006b0f1a40 ti: 88004ced4000 task.ti: 88004ced4000
RIP: 0010:[]  [] split_huge_page_to_list+0x907/0x920
RSP: 0018:88004ced7a38  EFLAGS: 00010296
RAX: 0021 RBX: eaf60080 RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88004ced7ab8 R08:  R09: 880bc560
R10: 8163d880 R11: 00014f25 R12: eaf60080
R13: eaf60088 R14: eaf60080 R15: 
FS:  7f43d3ced740() GS:8800782e() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7ff1f6fcdb98 CR3: 4cf56000 CR4: 06a0
Stack:
 cccd eaf60080 88004ced7ad0 eaf60088
 88004ced7ad0  88004ced7ab8 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-11 Thread Minchan Kim
On Mon, Nov 09, 2015 at 12:55:22AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote:
> > On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> > > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > > > Hello Kirill,
> > > > > > > 
> > > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov 
> > > > > > > wrote:
> > > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov 
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. 
> > > > > > > > > > > Shutemov wrote:
> > > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh 
> > > > > > > > > > > > > > Dickins wrote:
> > > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I added the code to check it and queued it 
> > > > > > > > > > > > > > > > again but I had another oops
> > > > > > > > > > > > > > > > in this time but symptom is related to 
> > > > > > > > > > > > > > > > anon_vma, too.
> > > > > > > > > > > > > > > > (kernel is based on recent mmotm + 
> > > > > > > > > > > > > > > > unconditional mkdirty for bug fix)
> > > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since 
> > > > > > > > > > > > > > > > the page was not page_mapped
> > > > > > > > > > > > > > > > at that time but second check of page_mapped 
> > > > > > > > > > > > > > > > right before try_to_unmap seems
> > > > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > > > flags: 
> > > > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > > page dumped because: 
> > > > > > > > > > > > > > > > VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > > > > > > > > > > > > && !anon_vma)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > > > > migration series.
> > > > > > > > > > > > > > > Let me think on it, but it could well relate to 
> > > > > > > > > > > > > > > the one you got before.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I will roll back to 
> > > > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > > > > migration cleanup
> > > > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > > > If it is fixed, I will test again with your 
> > > > > > > > > > > > > > migration patchset, then.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I 
> > > > > > > > > > > > > attach for a long time.
> > > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration 
> > > > > > > > > > > > > patch in there.
> > > > > > > > > > > > > And I added below debug code with request from Kirill 
> > > > > > > > > > > > > to all test kernels.
> > > > > > > > > > > > 
> > > > > > > > > > > > It took too long time (and a lot of printk()), but I 
> > > > > > > > > > > > think I track it down
> > > > > > > > > > > > finally.
> > > > > > > > > > > >  
> > > > > > > > > > > > The patch below seems fixes issue for me. It's not yet 
> > > > > > > > > > > > properly tested, but
> > > > > > > > > > > > looks like it works.
> > > > > > > > > > > > 
> > > > > > > > > > > > The problem was my wrong assumption on how migration 
> > > > > > > > > > > > works: I thought that
> > > > > > > > > > > > kernel would wait migration to finish on before 
> > > > > > > > > > > > deconstruction mapping.
> > > > > > > > > > > > 
> > > > > > > > > > > > But turn out that's not true.
> > > > > > > > > > > > 
> > > > > > > > > > > > As result if zap_pte_range() races with 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-08 Thread Kirill A. Shutemov
On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote:
> On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > > Hello Kirill,
> > > > > > 
> > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov 
> > > > > > > > wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. 
> > > > > > > > > > Shutemov wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh 
> > > > > > > > > > > > > Dickins wrote:
> > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I added the code to check it and queued it again 
> > > > > > > > > > > > > > > but I had another oops
> > > > > > > > > > > > > > > in this time but symptom is related to anon_vma, 
> > > > > > > > > > > > > > > too.
> > > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the 
> > > > > > > > > > > > > > > page was not page_mapped
> > > > > > > > > > > > > > > at that time but second check of page_mapped 
> > > > > > > > > > > > > > > right before try_to_unmap seems
> > > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > > flags: 
> > > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > page dumped because: 
> > > > > > > > > > > > > > > VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > > > > > > > > > > > && !anon_vma)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > > > migration series.
> > > > > > > > > > > > > > Let me think on it, but it could well relate to the 
> > > > > > > > > > > > > > one you got before.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I will roll back to 
> > > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > > > migration cleanup
> > > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > > > > patchset, then.
> > > > > > > > > > > > 
> > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I 
> > > > > > > > > > > > attach for a long time.
> > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration 
> > > > > > > > > > > > patch in there.
> > > > > > > > > > > > And I added below debug code with request from Kirill 
> > > > > > > > > > > > to all test kernels.
> > > > > > > > > > > 
> > > > > > > > > > > It took too long time (and a lot of printk()), but I 
> > > > > > > > > > > think I track it down
> > > > > > > > > > > finally.
> > > > > > > > > > >  
> > > > > > > > > > > The patch below seems fixes issue for me. It's not yet 
> > > > > > > > > > > properly tested, but
> > > > > > > > > > > looks like it works.
> > > > > > > > > > > 
> > > > > > > > > > > The problem was my wrong assumption on how migration 
> > > > > > > > > > > works: I thought that
> > > > > > > > > > > kernel would wait migration to finish on before 
> > > > > > > > > > > deconstruction mapping.
> > > > > > > > > > > 
> > > > > > > > > > > But turn out that's not true.
> > > > > > > > > > > 
> > > > > > > > > > > As result if zap_pte_range() races with 
> > > > > > > > > > > split_huge_page(), we can end up
> > > > > > > > > > > with page which is not mapped anymore but has _count and 
> > > > > > > > > > > _mapcount
> > > > > > > > > > > elevated. The page is on LRU too. So it's still reachable 
> > 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-04 Thread Minchan Kim
On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > Hello Kirill,
> > > > > 
> > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov 
> > > > > > > wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov 
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > 
> > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I added the code to check it and queued it again 
> > > > > > > > > > > > > > but I had another oops
> > > > > > > > > > > > > > in this time but symptom is related to anon_vma, 
> > > > > > > > > > > > > > too.
> > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the 
> > > > > > > > > > > > > > page was not page_mapped
> > > > > > > > > > > > > > at that time but second check of page_mapped right 
> > > > > > > > > > > > > > before try_to_unmap seems
> > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > flags: 
> > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) 
> > > > > > > > > > > > > > && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > > migration series.
> > > > > > > > > > > > > Let me think on it, but it could well relate to the 
> > > > > > > > > > > > > one you got before.
> > > > > > > > > > > > 
> > > > > > > > > > > > I will roll back to 
> > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > > migration cleanup
> > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > > > patchset, then.
> > > > > > > > > > > 
> > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I 
> > > > > > > > > > > attach for a long time.
> > > > > > > > > > > Therefore, there is no patchset from Hugh's migration 
> > > > > > > > > > > patch in there.
> > > > > > > > > > > And I added below debug code with request from Kirill to 
> > > > > > > > > > > all test kernels.
> > > > > > > > > > 
> > > > > > > > > > It took too long time (and a lot of printk()), but I think 
> > > > > > > > > > I track it down
> > > > > > > > > > finally.
> > > > > > > > > >  
> > > > > > > > > > The patch below seems fixes issue for me. It's not yet 
> > > > > > > > > > properly tested, but
> > > > > > > > > > looks like it works.
> > > > > > > > > > 
> > > > > > > > > > The problem was my wrong assumption on how migration works: 
> > > > > > > > > > I thought that
> > > > > > > > > > kernel would wait migration to finish on before 
> > > > > > > > > > deconstruction mapping.
> > > > > > > > > > 
> > > > > > > > > > But turn out that's not true.
> > > > > > > > > > 
> > > > > > > > > > As result if zap_pte_range() races with split_huge_page(), 
> > > > > > > > > > we can end up
> > > > > > > > > > with page which is not mapped anymore but has _count and 
> > > > > > > > > > _mapcount
> > > > > > > > > > elevated. The page is on LRU too. So it's still reachable 
> > > > > > > > > > by vmscan and by
> > > > > > > > > > pfn scanners (Sasha showed few similar traces from 
> > > > > > > > > > compaction too).
> > > > > > > > > > It's likely that page->mapping in this case would point to 
> > > > > > > > > > freed anon_vma.
> > > > > > > > > > 
> > > > 
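
Note on the dump quoted above: the low bits of page->mapping are type tags, so the printed value can be decoded by hand. The sketch below is a userspace illustration only, not kernel code. The tag values (1 for anon, 2 for KSM) are the ones I believe 4.x-era kernels use; the helper names are hypothetical, and the mapping value is taken exactly as printed in the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tag bits kept in the low bits of page->mapping (assumed 4.x-era values). */
#define TOY_MAPPING_ANON  1ULL
#define TOY_MAPPING_KSM   2ULL
#define TOY_MAPPING_FLAGS (TOY_MAPPING_ANON | TOY_MAPPING_KSM)

/* Hypothetical helper: true if the mapping word is tagged as anonymous. */
static bool toy_mapping_is_anon(uint64_t mapping)
{
    return (mapping & TOY_MAPPING_ANON) != 0;
}

/* Hypothetical helper: KSM pages carry both tag bits. */
static bool toy_mapping_is_ksm(uint64_t mapping)
{
    return (mapping & TOY_MAPPING_FLAGS) == TOY_MAPPING_FLAGS;
}

/* Strip the tag bits to recover the anon_vma address stored in the word. */
static uint64_t toy_mapping_to_anon_vma(uint64_t mapping)
{
    assert(toy_mapping_is_anon(mapping));
    return mapping & ~TOY_MAPPING_FLAGS;
}

/*
 * Mirrors the condition in the quoted VM_BUG_ON_PAGE: an anon, non-KSM
 * page for which the anon_vma lookup came back NULL (0 here).
 */
static bool toy_would_trip_bug(uint64_t mapping, uint64_t looked_up_anon_vma)
{
    return toy_mapping_is_anon(mapping) &&
           !toy_mapping_is_ksm(mapping) &&
           looked_up_anon_vma == 0;
}
```

With the dumped value 0x88007f1b5f51 the page is tagged anon and not KSM, so the check fires as soon as the anon_vma lookup returns NULL, even though the mapping word itself still holds a non-NULL (possibly stale) pointer.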

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-04 Thread Kirill A. Shutemov
On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > Hello Kirill,
> > > > 
> > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov 
> > > > > > > > wrote:
> > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > 
> > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I added the code to check it and queued it again but 
> > > > > > > > > > > > > I had another oops
> > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the 
> > > > > > > > > > > > > page was not page_mapped
> > > > > > > > > > > > > at that time but second check of page_mapped right 
> > > > > > > > > > > > > before try_to_unmap seems
> > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > flags: 
> > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > 
> > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > migration series.
> > > > > > > > > > > > Let me think on it, but it could well relate to the one 
> > > > > > > > > > > > you got before.
> > > > > > > > > > > 
> > > > > > > > > > > I will roll back to 
> > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > migration cleanup
> > > > > > > > > > > series and will test it again.
> > > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > > patchset, then.
> > > > > > > > > > 
> > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach 
> > > > > > > > > > for a long time.
> > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch 
> > > > > > > > > > in there.
> > > > > > > > > > And I added below debug code with request from Kirill to 
> > > > > > > > > > all test kernels.
> > > > > > > > > 
> > > > > > > > > It took too long time (and a lot of printk()), but I think I 
> > > > > > > > > track it down
> > > > > > > > > finally.
> > > > > > > > >  
> > > > > > > > > The patch below seems fixes issue for me. It's not yet 
> > > > > > > > > properly tested, but
> > > > > > > > > looks like it works.
> > > > > > > > > 
> > > > > > > > > The problem was my wrong assumption on how migration works: I 
> > > > > > > > > thought that
> > > > > > > > > kernel would wait migration to finish on before 
> > > > > > > > > deconstruction mapping.
> > > > > > > > > 
> > > > > > > > > But turn out that's not true.
> > > > > > > > > 
> > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we 
> > > > > > > > > can end up
> > > > > > > > > with page which is not mapped anymore but has _count and 
> > > > > > > > > _mapcount
> > > > > > > > > elevated. The page is on LRU too. So it's still reachable by 
> > > > > > > > > vmscan and by
> > > > > > > > > pfn scanners (Sasha showed few similar traces from compaction 
> > > > > > > > > too).
> > > > > > > > > It's likely that page->mapping in this case would point to 
> > > > > > > > > freed anon_vma.
> > > > > > > > > 
> > > > > > > > > BOOM!
> > > > > > > > > 
> > > > > > > > > The patch modify freeze/unfreeze_page() code to match normal 
> > > > > > > > > migration
> > > > > > > > > entries logic: on setup we remove page from rmap and drop 
> > > > > > > > > pin, on removing
> > > > > > > > > we get pin back and put page on rmap. 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-03 Thread Minchan Kim
On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > Hello Kirill,
> > > 
> > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov 
> > > > > > > wrote:
> > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > Hello Hugh,
> > > > > > > > > > 
> > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > I added the code to check it and queued it again but I 
> > > > > > > > > > > > had another oops
> > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page 
> > > > > > > > > > > > was not page_mapped
> > > > > > > > > > > > at that time but second check of page_mapped right 
> > > > > > > > > > > > before try_to_unmap seems
> > > > > > > > > > > > to be true.
> > > > > > > > > > > > 
> > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > flags: 
> > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > > > > > > 
> > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > migration series.
> > > > > > > > > > > Let me think on it, but it could well relate to the one 
> > > > > > > > > > > you got before.
> > > > > > > > > > 
> > > > > > > > > > I will roll back to 
> > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > migration cleanup
> > > > > > > > > > series and will test it again.
> > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > patchset, then.
> > > > > > > > > 
> > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach 
> > > > > > > > > for a long time.
> > > > > > > > > Therefore, there is no patchset from Hugh's migration patch 
> > > > > > > > > in there.
> > > > > > > > > And I added below debug code with request from Kirill to all 
> > > > > > > > > test kernels.
> > > > > > > > 
> > > > > > > > It took a long time (and a lot of printk()), but I think I have
> > > > > > > > finally tracked it down.
> > > > > > > >  
> > > > > > > > The patch below seems to fix the issue for me. It's not yet
> > > > > > > > properly tested, but it looks like it works.
> > > > > > > > 
> > > > > > > > The problem was my wrong assumption about how migration works: I
> > > > > > > > thought that the kernel would wait for migration to finish before
> > > > > > > > deconstructing the mapping.
> > > > > > > > 
> > > > > > > > But it turns out that's not true.
> > > > > > > > 
> > > > > > > > As a result, if zap_pte_range() races with split_huge_page(), we
> > > > > > > > can end up with a page which is not mapped anymore but has _count
> > > > > > > > and _mapcount elevated. The page is on the LRU too, so it's still
> > > > > > > > reachable by vmscan and by pfn scanners (Sasha showed a few similar
> > > > > > > > traces from compaction too). It's likely that page->mapping in this
> > > > > > > > case would point to a freed anon_vma.
> > > > > > > > 
> > > > > > > > BOOM!
> > > > > > > > 
> > > > > > > > The patch modifies the freeze/unfreeze_page() code to match the
> > > > > > > > normal migration entries logic: on setup we remove the page from
> > > > > > > > rmap and drop the pin; on removal we get the pin back and put the
> > > > > > > > page back on rmap. This way, even if the migration entry is removed
> > > > > > > > under us, we don't corrupt the page's state.
> > > > > > > > 
> > > > > > > > Please, test.
> > > > > > > > 
> > > > > > > 
> > > > > > > kernel: On mmotm-2015-10-15-15-20 
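
Note on the race described in the quoted explanation: it can be followed with a small userspace toy model. This is only a sketch under stated assumptions, not the actual patch. All names are hypothetical, mapcount is a plain counter rather than the kernel's biased field, and the "old" behaviour (freeze leaves the reference and rmap entry in place) is inferred from the description of the symptom, not taken from the source.

```c
#include <assert.h>
#include <stdbool.h>

/* State of the single PTE slot in this toy model. */
enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_MIGRATION };

struct toy_page {
    int count;    /* stands in for _count */
    int mapcount; /* number of PTEs mapping the page */
};

/* Mapping a page takes one reference and one rmap entry. */
static void toy_map(struct toy_page *page, enum toy_pte *pte)
{
    assert(*pte == TOY_PTE_NONE);
    page->count++;
    page->mapcount++;
    *pte = TOY_PTE_PRESENT;
}

/*
 * Zap: a present PTE releases its reference and rmap entry. A migration
 * entry is simply cleared, on the assumption that it holds nothing.
 */
static void toy_zap(struct toy_page *page, enum toy_pte *pte)
{
    if (*pte == TOY_PTE_PRESENT) {
        page->mapcount--;
        page->count--;
    }
    *pte = TOY_PTE_NONE;
}

/* Freeze: replace the PTE with a migration entry.
 * drop_refs == true models the fixed logic (leave rmap, drop the pin). */
static void toy_freeze(struct toy_page *page, enum toy_pte *pte, bool drop_refs)
{
    if (*pte != TOY_PTE_PRESENT)
        return;
    if (drop_refs) {
        page->mapcount--;
        page->count--;
    }
    *pte = TOY_PTE_MIGRATION;
}

/* Unfreeze: restore the PTE only if the migration entry is still there. */
static void toy_unfreeze(struct toy_page *page, enum toy_pte *pte, bool drop_refs)
{
    if (*pte != TOY_PTE_MIGRATION)
        return; /* entry was zapped under us, nothing to restore */
    if (drop_refs) {
        page->count++;
        page->mapcount++;
    }
    *pte = TOY_PTE_PRESENT;
}

/* freeze -> racing zap -> unfreeze; returns the mapcount left behind. */
static int toy_race_leftover_mapcount(bool drop_refs)
{
    struct toy_page page = { .count = 1, .mapcount = 0 }; /* splitter's own ref */
    enum toy_pte pte = TOY_PTE_NONE;

    toy_map(&page, &pte);
    toy_freeze(&page, &pte, drop_refs);
    toy_zap(&page, &pte);
    toy_unfreeze(&page, &pte, drop_refs);
    return page.mapcount;
}
```

In this model, toy_race_leftover_mapcount(false) leaves a mapcount of 1 on a page that no PTE maps any more, which is the inconsistent state described above. With drop_refs set to true the counters are already released when the zap arrives, so nothing is left behind.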

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-02 Thread Minchan Kim
On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > Hello Kirill,
> > 
> > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > Hello Hugh,
> > > > > > > > > 
> > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > 
> > > > > > > > > > > I added the code to check it and queued it again but I 
> > > > > > > > > > > had another oops
> > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty 
> > > > > > > > > > > for bug fix)
> > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page 
> > > > > > > > > > > was not page_mapped
> > > > > > > > > > > at that time but second check of page_mapped right before 
> > > > > > > > > > > try_to_unmap seems
> > > > > > > > > > > to be true.
> > > > > > > > > > > 
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > > > across:4191228k FS
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > > > across:4191228k FS
> > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > flags: 
> > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > > > > > 
> > > > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > > > series.
> > > > > > > > > > Let me think on it, but it could well relate to the one you 
> > > > > > > > > > got before.
> > > > > > > > > 
> > > > > > > > > I will roll back to 
> > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > > > cleanup
> > > > > > > > > series and will test it again.
> > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > patchset, then.
> > > > > > > > 
> > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for 
> > > > > > > > a long time.
> > > > > > > > Therefore, there is no patchset from Hugh's migration patch in 
> > > > > > > > there.
> > > > > > > > And I added below debug code with request from Kirill to all 
> > > > > > > > test kernels.
> > > > > > > 
> > > > > > > It took a long time (and a lot of printk()), but I think I have
> > > > > > > finally tracked it down.
> > > > > > >  
> > > > > > > The patch below seems to fix the issue for me. It's not yet
> > > > > > > properly tested, but it looks like it works.
> > > > > > > 
> > > > > > > The problem was my wrong assumption about how migration works: I
> > > > > > > thought that the kernel would wait for migration to finish before
> > > > > > > tearing down the mapping.
> > > > > > > 
> > > > > > > But it turns out that's not true.
> > > > > > > 
> > > > > > > As a result, if zap_pte_range() races with split_huge_page(), we
> > > > > > > can end up with a page which is not mapped anymore but has _count
> > > > > > > and _mapcount elevated. The page is on the LRU too, so it's still
> > > > > > > reachable by vmscan and by pfn scanners (Sasha showed a few
> > > > > > > similar traces from compaction too).
> > > > > > > It's likely that page->mapping in this case would point to a
> > > > > > > freed anon_vma.
> > > > > > > 
> > > > > > > BOOM!
> > > > > > > 
> > > > > > > The patch modifies the freeze/unfreeze_page() code to match the
> > > > > > > normal migration entries logic: on setup we remove the page from
> > > > > > > rmap and drop the pin; on removal we get the pin back and put the
> > > > > > > page on rmap. This way, even if the migration entry is removed
> > > > > > > under us, we don't corrupt the page's state.
> > > > > > > 
> > > > > > > Please test.
> > > > > > > 
> > > > > > 
> > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new 
> > > > > > patch, I tested
> > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > > 
> > > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > > > > > index:0x61800 compound_mapcount: 0
> > > > > > flags: 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-02 Thread Kirill A. Shutemov
On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> Hello Kirill,
> 
> On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > Hello Hugh,
> > > > > > > > 
> > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > 
> > > > > > > > > > I added the code to check it and queued it again but I had 
> > > > > > > > > > another oops
> > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty 
> > > > > > > > > > for bug fix)
> > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was 
> > > > > > > > > > not page_mapped
> > > > > > > > > > at that time but second check of page_mapped right before 
> > > > > > > > > > try_to_unmap seems
> > > > > > > > > > to be true.
> > > > > > > > > > 
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > > across:4191228k FS
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > > across:4191228k FS
> > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > flags: 
> > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > > > > 
> > > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > > series.
> > > > > > > > > Let me think on it, but it could well relate to the one you 
> > > > > > > > > got before.
> > > > > > > > 
> > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > > cleanup
> > > > > > > > series and will test it again.
> > > > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > > > then.
> > > > > > > 
> > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a 
> > > > > > > long time.
> > > > > > > Therefore, there is no patchset from Hugh's migration patch in 
> > > > > > > there.
> > > > > > > And I added below debug code with request from Kirill to all test 
> > > > > > > kernels.
> > > > > > 
> > > > > > It took a long time (and a lot of printk()), but I think I have
> > > > > > finally tracked it down.
> > > > > >  
> > > > > > The patch below seems to fix the issue for me. It's not yet
> > > > > > properly tested, but it looks like it works.
> > > > > > 
> > > > > > The problem was my wrong assumption about how migration works: I
> > > > > > thought that the kernel would wait for migration to finish before
> > > > > > tearing down the mapping.
> > > > > > 
> > > > > > But it turns out that's not true.
> > > > > > 
> > > > > > As a result, if zap_pte_range() races with split_huge_page(), we
> > > > > > can end up with a page which is not mapped anymore but has _count
> > > > > > and _mapcount elevated. The page is on the LRU too, so it's still
> > > > > > reachable by vmscan and by pfn scanners (Sasha showed a few
> > > > > > similar traces from compaction too).
> > > > > > It's likely that page->mapping in this case would point to a
> > > > > > freed anon_vma.
> > > > > > 
> > > > > > BOOM!
> > > > > > 
> > > > > > The patch modifies the freeze/unfreeze_page() code to match the
> > > > > > normal migration entries logic: on setup we remove the page from
> > > > > > rmap and drop the pin; on removal we get the pin back and put the
> > > > > > page on rmap. This way, even if the migration entry is removed
> > > > > > under us, we don't corrupt the page's state.
> > > > > > 
> > > > > > Please test.
> > > > > > 
> > > > > 
> > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new 
> > > > > patch, I tested
> > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > 
> > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > > > > index:0x61800 compound_mapcount: 0
> > > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > page->mem_cgroup:88007f613c00
> > > > 
> > > > Ignore my previous answer. Still sleeping.
> > > > 
> > > > The right way to fix I think is something like:
> > > > 
> > > > diff --git a/mm/rmap.c 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-02 Thread Minchan Kim
Hello Kirill,

On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > Hello Hugh,
> > > > > > > 
> > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > 
> > > > > > > > > I added the code to check it and queued it again but I had 
> > > > > > > > > another oops
> > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for 
> > > > > > > > > bug fix)
> > > > > > > > > It seems page_get_anon_vma returns NULL since the page was 
> > > > > > > > > not page_mapped
> > > > > > > > > at that time but second check of page_mapped right before 
> > > > > > > > > try_to_unmap seems
> > > > > > > > > to be true.
> > > > > > > > > 
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > across:4191228k FS
> > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > flags: 
> > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > > > 
> > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > series.
> > > > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > > > before.
> > > > > > > 
> > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > cleanup
> > > > > > > series and will test it again.
> > > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > > then.
> > > > > > 
> > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a 
> > > > > > long time.
> > > > > > Therefore, there is no patchset from Hugh's migration patch in 
> > > > > > there.
> > > > > > And I added below debug code with request from Kirill to all test 
> > > > > > kernels.
> > > > > 
> > > > > It took a long time (and a lot of printk()), but I think I have
> > > > > finally tracked it down.
> > > > >  
> > > > > The patch below seems to fix the issue for me. It's not yet
> > > > > properly tested, but it looks like it works.
> > > > > 
> > > > > The problem was my wrong assumption about how migration works: I
> > > > > thought that the kernel would wait for migration to finish before
> > > > > tearing down the mapping.
> > > > > 
> > > > > But it turns out that's not true.
> > > > > 
> > > > > As a result, if zap_pte_range() races with split_huge_page(), we
> > > > > can end up with a page which is not mapped anymore but has _count
> > > > > and _mapcount elevated. The page is on the LRU too, so it's still
> > > > > reachable by vmscan and by pfn scanners (Sasha showed a few
> > > > > similar traces from compaction too).
> > > > > It's likely that page->mapping in this case would point to a
> > > > > freed anon_vma.
> > > > > 
> > > > > BOOM!
> > > > > 
> > > > > The patch modifies the freeze/unfreeze_page() code to match the
> > > > > normal migration entries logic: on setup we remove the page from
> > > > > rmap and drop the pin; on removal we get the pin back and put the
> > > > > page on rmap. This way, even if the migration entry is removed
> > > > > under us, we don't corrupt the page's state.
> > > > > 
> > > > > Please test.
> > > > > 
> > > > 
> > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, 
> > > > I tested
> > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > 
> > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > > > index:0x61800 compound_mapcount: 0
> > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > page->mem_cgroup:88007f613c00
> > > 
> > > Ignore my previous answer. Still sleeping.
> > > 
> > > The right way to fix I think is something like:
> > > 
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 35643176bc15..f2d46792a554 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > >   bool compound = flags & RMAP_COMPOUND;
> > >   bool first;
> > >  
> > > - if (PageTransCompound(page)) {
> > > 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-02 Thread Kirill A. Shutemov
On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > Hello Hugh,
> > > > > > 
> > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > 
> > > > > > > > I added the code to check it and queued the test again, but this
> > > > > > > > time I hit another oops whose symptom is also related to anon_vma.
> > > > > > > > (The kernel is based on recent mmotm + the unconditional mkdirty
> > > > > > > > bug fix.)
> > > > > > > > It seems page_get_anon_vma() returned NULL because the page was not
> > > > > > > > page_mapped() at that time, but the second page_mapped() check
> > > > > > > > right before try_to_unmap() seems to have been true.
> > > > > > > > 
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > 
> > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > series.
> > > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > > before.
> > > > > > 
> > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > series and will test it again.
> > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > then.
> > > > > 
> > > > > > I tested mmotm-2015-10-15-15-20 for a long time with the attached
> > > > > > test program.
> > > > > > Therefore, Hugh's migration patchset is not in there.
> > > > > > I also added the debug code below, at Kirill's request, to all test
> > > > > > kernels.
> > > > 
> > > > It took a long time (and a lot of printk()), but I think I have
> > > > finally tracked it down.
> > > >  
> > > > The patch below seems to fix the issue for me. It's not yet
> > > > properly tested, but it looks like it works.
> > > > 
> > > > The problem was my wrong assumption about how migration works: I
> > > > thought that the kernel would wait for migration to finish before
> > > > tearing down the mapping.
> > > > 
> > > > But it turns out that's not true.
> > > > 
> > > > As a result, if zap_pte_range() races with split_huge_page(), we
> > > > can end up with a page which is not mapped anymore but has _count
> > > > and _mapcount elevated. The page is on the LRU too, so it's still
> > > > reachable by vmscan and by pfn scanners (Sasha showed a few
> > > > similar traces from compaction too).
> > > > It's likely that page->mapping in this case would point to a
> > > > freed anon_vma.
> > > > 
> > > > BOOM!
> > > > 
> > > > The patch modifies the freeze/unfreeze_page() code to match the
> > > > normal migration entries logic: on setup we remove the page from
> > > > rmap and drop the pin; on removal we get the pin back and put the
> > > > page on rmap. This way, even if the migration entry is removed
> > > > under us, we don't corrupt the page's state.
> > > > 
> > > > Please test.
> > > > 
> > > 
> > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch,
> > > I ran the test I sent to you (i.e., oops.c + memcg_test.sh).
> > > 
> > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > page->mem_cgroup:88007f613c00
> > 
> > Ignore my previous answer. Still sleeping.
> > 
> > The right way to fix I think is something like:
> > 
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 35643176bc15..f2d46792a554 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > bool compound = flags & RMAP_COMPOUND;
> > bool first;
> >  
> > -   if (PageTransCompound(page)) {
> > +   if (PageTransCompound(page) && compound) {
> > +   atomic_t *mapcount;
> > VM_BUG_ON_PAGE(!PageLocked(page), page);
> > -   if (compound) {
> > -   atomic_t *mapcount;
> > -
> > -   VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > -   mapcount = 
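
The pin/rmap discipline described in the quoted explanation (drop the
pin and the rmap reference when the migration entry is set up, take both
back only when the entry is removed) can be sketched as a small userspace
model in C. This is a toy under stated assumptions, not the actual patch:
struct toy_page, enum toy_pte and every toy_*() function are hypothetical
names invented for illustration, refcount and mapcount merely stand in
for _count and _mapcount, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_MIGRATION };

struct toy_page {
	int refcount;	/* stands in for page->_count */
	int mapcount;	/* number of present PTEs mapping the page */
};

/* Map the page: a present PTE holds one pin and one rmap reference. */
static void toy_map(struct toy_page *page, enum toy_pte *pte)
{
	assert(*pte == TOY_PTE_NONE);
	page->refcount++;
	page->mapcount++;
	*pte = TOY_PTE_PRESENT;
}

/* Setup: install a migration entry, leave rmap and drop the pin. */
static void toy_freeze(struct toy_page *page, enum toy_pte *pte)
{
	if (*pte != TOY_PTE_PRESENT)
		return;
	*pte = TOY_PTE_MIGRATION;
	page->mapcount--;
	page->refcount--;
}

/*
 * Concurrent unmap. In this model a migration entry owns no pin and no
 * rmap reference, so zapping it only clears the entry.
 */
static void toy_zap(struct toy_page *page, enum toy_pte *pte)
{
	if (*pte == TOY_PTE_PRESENT) {
		page->mapcount--;
		page->refcount--;
	}
	*pte = TOY_PTE_NONE;
}

/* Removal: restore pin and rmap only if the entry is still there. */
static bool toy_unfreeze(struct toy_page *page, enum toy_pte *pte)
{
	if (*pte != TOY_PTE_MIGRATION)
		return false;
	page->refcount++;
	page->mapcount++;
	*pte = TOY_PTE_PRESENT;
	return true;
}

/* Replays freeze -> zap -> unfreeze and checks the final counters. */
bool toy_race_leaves_page_consistent(void)
{
	/* refcount starts at 1: the reference held by the caller itself. */
	struct toy_page page = { .refcount = 1, .mapcount = 0 };
	enum toy_pte pte = TOY_PTE_NONE;
	bool restored;

	toy_map(&page, &pte);		/* refcount 2, mapcount 1 */
	toy_freeze(&page, &pte);	/* refcount 1, mapcount 0 */
	toy_zap(&page, &pte);		/* entry gone, counters untouched */
	restored = toy_unfreeze(&page, &pte);	/* nothing to restore */

	return !restored && page.mapcount == 0 && page.refcount == 1 &&
	       pte == TOY_PTE_NONE;
}
```

In this model toy_race_leaves_page_consistent() returns true: because
the entry itself carries no counts, removing it between toy_freeze() and
toy_unfreeze() leaves neither counter elevated.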

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-02 Thread Minchan Kim
Hello Kirill,

On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > Hello Hugh,
> > > > > > > 
> > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > 
> > > > > > > > > I added the code to check it and queued it again but I had 
> > > > > > > > > another oops
> > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for 
> > > > > > > > > bug fix)
> > > > > > > > > It seems page_get_anon_vma returns NULL since the page was 
> > > > > > > > > not page_mapped
> > > > > > > > > at that time but second check of page_mapped right before 
> > > > > > > > > try_to_unmap seems
> > > > > > > > > to be true.
> > > > > > > > > 
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > across:4191228k FS
> > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > flags: 
> > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > > > 
> > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > series.
> > > > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > > > before.
> > > > > > > 
> > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > cleanup
> > > > > > > series and will test it again.
> > > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > > then.
> > > > > > 
> > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a 
> > > > > > long time.
> > > > > > Therefore, there is no patchset from Hugh's migration patch in 
> > > > > > there.
> > > > > > And I added below debug code with request from Kirill to all test 
> > > > > > kernels.
> > > > > 
> > > > > It took too long time (and a lot of printk()), but I think I track it 
> > > > > down
> > > > > finally.
> > > > >  
> > > > > The patch below seems fixes issue for me. It's not yet properly 
> > > > > tested, but
> > > > > looks like it works.
> > > > > 
> > > > > The problem was my wrong assumption on how migration works: I thought 
> > > > > that
> > > > > kernel would wait migration to finish on before deconstruction 
> > > > > mapping.
> > > > > 
> > > > > But turn out that's not true.
> > > > > 
> > > > > As result if zap_pte_range() races with split_huge_page(), we can end 
> > > > > up
> > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > elevated. The page is on LRU too. So it's still reachable by vmscan 
> > > > > and by
> > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > It's likely that page->mapping in this case would point to freed 
> > > > > anon_vma.
> > > > > 
> > > > > BOOM!
> > > > > 
> > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > entries logic: on setup we remove page from rmap and drop pin, on 
> > > > > removing
> > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > will be removed under us we don't corrupt page's state.
> > > > > 
> > > > > Please, test.
> > > > > 
> > > > 
> > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, 
> > > > I tested
> > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > 
> > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > > > index:0x61800 compound_mapcount: 0
> > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > page->mem_cgroup:88007f613c00
> > > 
> > > Ignore my previous answer. Still sleeping.
> > > 
> > > The right way to fix I think is something like:
> > > 
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 35643176bc15..f2d46792a554 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > >   bool compound = flags & RMAP_COMPOUND;
> > >   bool first;
> > >  
> > > - if (PageTransCompound(page)) {
> > > 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-30 Thread Minchan Kim
On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > Hello Hugh,
> > > > > 
> > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > 
> > > > > > > I added the code to check it and queued it again but I had 
> > > > > > > another oops
> > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > > fix)
> > > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > > page_mapped
> > > > > > > at that time but second check of page_mapped right before 
> > > > > > > try_to_unmap seems
> > > > > > > to be true.
> > > > > > > 
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > across:4191228k FS
> > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > > index:0x60aff
> > > > > > > flags: 
> > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > 
> > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > before.
> > > > > 
> > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > series and will test it again.
> > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > 
> > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > > time.
> > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > And I added below debug code with request from Kirill to all test 
> > > > kernels.
> > > 
> > > It took too long time (and a lot of printk()), but I think I track it down
> > > finally.
> > >  
> > > The patch below seems fixes issue for me. It's not yet properly tested, 
> > > but
> > > looks like it works.
> > > 
> > > The problem was my wrong assumption on how migration works: I thought that
> > > kernel would wait migration to finish on before deconstruction mapping.
> > > 
> > > But turn out that's not true.
> > > 
> > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > with page which is not mapped anymore but has _count and _mapcount
> > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > 
> > > BOOM!
> > > 
> > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > we get pin back and put page on rmap. This way even if migration entry
> > > will be removed under us we don't corrupt page's state.
> > > 
> > > Please, test.
> > > 
> > 
> > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> > tested
> > one I sent to you(ie, oops.c + memcg_test.sh)
> > 
> > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > index:0x61800 compound_mapcount: 0
> > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > page->mem_cgroup:88007f613c00
> 
> Ignore my previous answer. Still sleeping.
> 
> The right way to fix I think is something like:
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 35643176bc15..f2d46792a554 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
>   bool compound = flags & RMAP_COMPOUND;
>   bool first;
>  
> - if (PageTransCompound(page)) {
> + if (PageTransCompound(page) && compound) {
> + atomic_t *mapcount;
>   VM_BUG_ON_PAGE(!PageLocked(page), page);
> - if (compound) {
> - atomic_t *mapcount;
> -
> - VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> - mapcount = compound_mapcount_ptr(page);
> - first = atomic_inc_and_test(mapcount);
> - } else {
> - /* Anon THP always mapped first with PMD */
> - first = 0;
> - VM_BUG_ON_PAGE(!page_mapcount(page), page);
> - atomic_inc(&page->_mapcount);
> -   

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Kirill A. Shutemov
On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> page->mem_cgroup:88007f613c00

Ignore my previous answer. Still sleeping.

The right way to fix it, I think, is something like:

diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..f2d46792a554 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (PageTransCompound(page) && compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first 
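
For readers following the diff, here is a rough userspace sketch of the
branch structure it leaves behind. It is not kernel code: struct toy_page,
toy_inc_and_test() and toy_add_anon_rmap() are made-up names, plain ints
stand in for atomic_t, and the body of the final else branch is an
assumption, since the quoted diff is cut off at that point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct toy_page {
	int mapcount;          /* models page->_mapcount, -1 == unmapped */
	int compound_mapcount; /* models *compound_mapcount_ptr(page), -1 == unmapped */
	bool trans_compound;   /* models PageTransCompound(page) */
};

/* Same contract as atomic_inc_and_test(): true if the new value is 0. */
static bool toy_inc_and_test(int *v)
{
	*v += 1;
	return *v == 0;
}

/* Returns "first": whether this call created the first mapping of its kind. */
static bool toy_add_anon_rmap(struct toy_page *page, bool compound)
{
	if (page->trans_compound && compound)
		return toy_inc_and_test(&page->compound_mapcount);

	/* Models VM_BUG_ON_PAGE(compound, page) in the else branch. */
	assert(!compound);
	/*
	 * Assumed continuation of the truncated line: a PTE mapping of a
	 * small page or of a THP subpage is counted the same way, with no
	 * requirement that the page is already mapped.
	 */
	return toy_inc_and_test(&page->mapcount);
}
```

With a toy_page whose counters both start at -1, a PMD-style call
(compound == true) and the first PTE-style call each return true, and a
second PTE-style call returns false.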

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Kirill A. Shutemov
On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))

The VM_BUG_ON_PAGE() is bogus after the patch. Just drop it.
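
A toy userspace model of why the check cannot hold any more, assuming
freeze_page() now takes the page off rmap as described in the patch. The
names toy_subpage, toy_freeze() and toy_old_check_fires() are invented for
illustration, and compound_mapcount is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one THP subpage. */
struct toy_subpage {
	int mapcount; /* models page->_mapcount, -1 == no PTE maps the page */
};

/* Models page_mapcount() for a page with no PMD mapping. */
static int toy_page_mapcount(const struct toy_subpage *p)
{
	return p->mapcount + 1;
}

/* Freeze as in the patch: migration entry installed, page removed from rmap. */
static void toy_freeze(struct toy_subpage *p)
{
	assert(toy_page_mapcount(p) > 0);
	p->mapcount--;
}

/* Would VM_BUG_ON_PAGE(!page_mapcount(page), page) trigger right now? */
static bool toy_old_check_fires(const struct toy_subpage *p)
{
	return toy_page_mapcount(p) == 0;
}
```

A subpage mapped once (mapcount field 0) reaches page_mapcount() == 0 after
toy_freeze(), so the old check fires on the very first rmap add in
unfreeze_page() even though nothing is wrong.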

> page->mem_cgroup:88007f613c00
> [ cut here ]
> kernel BUG at mm/rmap.c:1156!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ 
> #1573
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
> RIP: 0010:[]  [] 
> do_page_add_anon_rmap+0x323/0x360
> RSP: :8805f758  EFLAGS: 00010292
> RAX: 0021 RBX: ea00016a RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 8805f780 R08:  R09: 880b8be0
> R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
> R13: 6180 R14:  R15: 88007e85ddc0
> FS:  7f5cd5fea740() GS:8800bfae() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 64c03000 CR3: 7f017000 CR4: 06a0
> Stack:
>  88007f351000 88007f352000 ea00016a 6180
>  88007e85ddc0 8805f790 81128278 8805f800
>  81146dbb 000619ff 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Minchan Kim
On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another 
> > > > > oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > page_mapped
> > > > > at that time but second check of page_mapped right before 
> > > > > try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> > > 
> > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > instead of next-20151021 to remove noise from your migration cleanup
> > > series and will test it again.
> > > If it is fixed, I will test again with your migration patchset, then.
> > 
> > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > Therefore, there is no patchset from Hugh's migration patch in there.
> > And I added below debug code with request from Kirill to all test kernels.
> 
> It took a long time (and a lot of printk()), but I think I have finally
> tracked it down.
>  
> The patch below seems to fix the issue for me. It's not yet properly tested,
> but it looks like it works.
> 
> The problem was my wrong assumption about how migration works: I thought that
> the kernel would wait for migration to finish before tearing down the mapping.
> 
> But it turns out that's not true.
> 
> As a result, if zap_pte_range() races with split_huge_page(), we can end up
> with a page which is not mapped anymore but has _count and _mapcount
> elevated. The page is on the LRU too, so it's still reachable by vmscan and by
> pfn scanners (Sasha showed a few similar traces from compaction too).
> It's likely that page->mapping in this case would point to a freed anon_vma.
> 
> BOOM!
> 
> The patch modifies the freeze/unfreeze_page() code to match the normal
> migration entry logic: on setup we remove the page from rmap and drop the pin;
> on removal we get the pin back and put the page on rmap. This way, even if the
> migration entry is removed under us, we don't corrupt the page's state.
> 
> Please, test.
> 

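The accounting argument quoted above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: toy_page and
every toy_* function are made-up names, mapcount here simply counts live
mappings (0 == unmapped), and the zap path is assumed to clear a migration
entry without touching the page, as the explanation implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
	int count;    /* models page->_count (references) */
	int mapcount; /* number of page-table mappings, 0 == unmapped */
};

/* Mapping a page takes one reference and one rmap entry. */
static void toy_map(struct toy_page *p)
{
	p->count++;
	p->mapcount++;
}

/* Old scheme: migration entry installed, pin and rmap kept for unfreeze. */
static void toy_freeze_old(struct toy_page *p)
{
	(void)p;
}

/* New scheme: drop the page from rmap and drop the pin at freeze time. */
static void toy_freeze_new(struct toy_page *p)
{
	assert(p->mapcount > 0 && p->count > 0);
	p->mapcount--;
	p->count--;
}

/* New scheme: take the pin back and put the page on rmap again. */
static void toy_unfreeze_new(struct toy_page *p)
{
	p->count++;
	p->mapcount++;
}

/* Assumed zap behaviour: the entry goes away, the page is not touched. */
static void toy_zap_migration_entry(struct toy_page *p)
{
	(void)p;
}

/* True if the counters claim mappings that no page table holds. */
static bool toy_state_is_stale(const struct toy_page *p,
			       int live_mappings, int base_refs)
{
	return p->mapcount != live_mappings ||
	       p->count != base_refs + live_mappings;
}
```

Starting from one base reference, map + toy_freeze_old() + zap leaves
count == 2 and mapcount == 1 with no mapping left, which is the stale state
described above. The same sequence with toy_freeze_new() leaves count == 1
and mapcount == 0.
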
Kernel: mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch. I ran the
test I sent to you (i.e., oops.c + memcg_test.sh).

page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
index:0x61800 compound_mapcount: 0
flags: 0x40044009(locked|uptodate|head|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
page->mem_cgroup:88007f613c00
[ cut here ]
kernel BUG at mm/rmap.c:1156!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ 
#1573
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
RIP: 0010:[]  [] 
do_page_add_anon_rmap+0x323/0x360
RSP: :8805f758  EFLAGS: 00010292
RAX: 0021 RBX: ea00016a RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8805f780 R08:  R09: 880b8be0
R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
R13: 6180 R14:  R15: 88007e85ddc0
FS:  7f5cd5fea740() GS:8800bfae() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 64c03000 CR3: 7f017000 CR4: 06a0
Stack:
 88007f351000 88007f352000 ea00016a 6180
 88007e85ddc0 8805f790 81128278 8805f800
 81146dbb 000619ff 00061800 1600
Call Trace:
 [] page_add_anon_rmap+0x18/0x20
 [] unfreeze_page+0x24b/0x330
 [] split_huge_page_to_list+0x3df/0x920
 [] ? scan_swap_map+0x37f/0x550
 [] add_to_swap+0xb6/0x100
 [] shrink_page_list+0x3b7/0xdc0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x58f/0x730
 [] shrink_zone+0xd4/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 [] 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Minchan Kim
On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another 
> > > > > oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > page_mapped
> > > > > at that time but second check of page_mapped right before 
> > > > > try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> > > 
> > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > instead of next-20151021 to remove noise from your migration cleanup
> > > series and will test it again.
> > > If it is fixed, I will test again with your migration patchset, then.
> > 
> > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > Therefore, there is no patchset from Hugh's migration patch in there.
> > And I added below debug code with request from Kirill to all test kernels.
> 
> It took too long time (and a lot of printk()), but I think I track it down
> finally.
>  
> The patch below seems fixes issue for me. It's not yet properly tested, but
> looks like it works.
> 
> The problem was my wrong assumption on how migration works: I thought that
> kernel would wait migration to finish on before deconstruction mapping.
> 
> But turn out that's not true.
> 
> As result if zap_pte_range() races with split_huge_page(), we can end up
> with page which is not mapped anymore but has _count and _mapcount
> elevated. The page is on LRU too. So it's still reachable by vmscan and by
> pfn scanners (Sasha showed few similar traces from compaction too).
> It's likely that page->mapping in this case would point to freed anon_vma.
> 
> BOOM!
> 
> The patch modify freeze/unfreeze_page() code to match normal migration
> entries logic: on setup we remove page from rmap and drop pin, on removing
> we get pin back and put page on rmap. This way even if migration entry
> will be removed under us we don't corrupt page's state.
> 
> Please, test.
> 

On mmotm-2015-10-15-15-20 + the pte_mkdirty patch + your new patch, I ran
the test I sent to you (i.e., oops.c + memcg_test.sh) and hit this:

page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
flags: 0x40044009(locked|uptodate|head|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
page->mem_cgroup:88007f613c00
[ cut here ]
kernel BUG at mm/rmap.c:1156!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ #1573
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
RIP: 0010:[]  [] do_page_add_anon_rmap+0x323/0x360
RSP: :8805f758  EFLAGS: 00010292
RAX: 0021 RBX: ea00016a RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8805f780 R08:  R09: 880b8be0
R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
R13: 6180 R14:  R15: 88007e85ddc0
FS:  7f5cd5fea740() GS:8800bfae() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 64c03000 CR3: 7f017000 CR4: 06a0
Stack:
 88007f351000 88007f352000 ea00016a 6180
 88007e85ddc0 8805f790 81128278 8805f800
 81146dbb 000619ff 00061800 1600
Call Trace:
 [] page_add_anon_rmap+0x18/0x20
 [] unfreeze_page+0x24b/0x330
 [] split_huge_page_to_list+0x3df/0x920
 [] ? scan_swap_map+0x37f/0x550
 [] add_to_swap+0xb6/0x100
 [] shrink_page_list+0x3b7/0xdc0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x58f/0x730
 [] shrink_zone+0xd4/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 [] 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Kirill A. Shutemov
On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))

The VM_BUG_ON_PAGE() is bogus after the patch. Just drop it.

> page->mem_cgroup:88007f613c00
> [ cut here ]
> kernel BUG at mm/rmap.c:1156!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ 
> #1573
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
> RIP: 0010:[]  [] 
> do_page_add_anon_rmap+0x323/0x360
> RSP: :8805f758  EFLAGS: 00010292
> RAX: 0021 RBX: ea00016a RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 8805f780 R08:  R09: 880b8be0
> R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
> R13: 6180 R14:  R15: 88007e85ddc0
> FS:  7f5cd5fea740() GS:8800bfae() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 64c03000 CR3: 7f017000 CR4: 06a0
> Stack:
>  88007f351000 88007f352000 ea00016a 6180
>  88007e85ddc0 8805f790 81128278 8805f800
>  81146dbb 000619ff 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Kirill A. Shutemov
On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> page->mem_cgroup:88007f613c00

Ignore my previous answer. Still sleeping.

I think the right way to fix it is something like:

diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..f2d46792a554 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (PageTransCompound(page) && compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-28 Thread Kirill A. Shutemov
On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap 
> > > > seems
> > > > to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > > > !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.
> > 
> > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > instead of next-20151021 to remove noise from your migration cleanup
> > series and will test it again.
> > If it is fixed, I will test again with your migration patchset, then.
> 
> I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> Therefore, there is no patchset from Hugh's migration patch in there.
> And I added below debug code with request from Kirill to all test kernels.

It took too long (and a lot of printk()), but I think I have finally
tracked it down.

The patch below seems to fix the issue for me. It's not yet properly tested,
but it looks like it works.

The problem was my wrong assumption about how migration works: I thought that
the kernel would wait for migration to finish before tearing down the mapping.

But it turns out that's not true.

As a result, if zap_pte_range() races with split_huge_page(), we can end up
with a page which is not mapped anymore but has _count and _mapcount
elevated. The page is on the LRU too, so it's still reachable by vmscan and by
pfn scanners (Sasha showed a few similar traces from compaction too).
It's likely that page->mapping in this case would point to a freed anon_vma.

BOOM!

The patch modifies the freeze/unfreeze_page() code to match the logic of
normal migration entries: on setup we remove the page from rmap and drop the
pin, on removal we take the pin back and put the page on rmap. This way, even
if the migration entry is removed under us, we don't corrupt the page's state.
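
To illustrate the accounting argument above, here is a minimal user-space
sketch. It is a toy model, not kernel code: struct toy_page, enum toy_pte and
the toy_*() helpers are hypothetical names, mapcount is a plain counter
(not the kernel's biased _mapcount), and it assumes that zapping a migration
entry leaves the page's counters untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: plain counters only. */
struct toy_page {
	int refcount;	/* pins held on the page */
	int mapcount;	/* number of PTEs that map the page */
};

/* Toy stand-in for the state of one PTE slot. */
enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_MIGRATION };

/* Install a migration entry; optionally drop the rmap and the pin. */
static void toy_freeze(struct toy_page *p, enum toy_pte *pte, bool drop_pin)
{
	*pte = TOY_PTE_MIGRATION;
	if (drop_pin) {
		p->mapcount--;
		p->refcount--;
	}
}

/* Assumption: a zap of a migration entry clears the slot and nothing else. */
static void toy_zap(enum toy_pte *pte)
{
	*pte = TOY_PTE_NONE;
}

/* Restore the mapping only if the migration entry is still there. */
static void toy_unfreeze(struct toy_page *p, enum toy_pte *pte, bool drop_pin)
{
	if (*pte != TOY_PTE_MIGRATION)
		return;
	if (drop_pin) {
		p->refcount++;
		p->mapcount++;
	}
	*pte = TOY_PTE_PRESENT;
}

/*
 * Run freeze -> (optional racing zap) -> unfreeze and return how many
 * mappings the page claims beyond what the PTE slot actually holds.
 */
static int toy_leftover_mapcount(bool drop_pin, bool zap_races)
{
	struct toy_page page = { .refcount = 1, .mapcount = 1 };
	enum toy_pte pte = TOY_PTE_PRESENT;

	toy_freeze(&page, &pte, drop_pin);
	if (zap_races)
		toy_zap(&pte);
	toy_unfreeze(&page, &pte, drop_pin);

	return page.mapcount - (pte == TOY_PTE_PRESENT ? 1 : 0);
}
```

In this model, toy_leftover_mapcount(false, true) leaves one stale mapping
(the "not mapped anymore but _mapcount elevated" case), while
toy_leftover_mapcount(true, true) leaves none.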

Please, test.

Not-Yet-Signed-off-by: Kirill A. Shutemov 

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e0fe82a0fae..192b50c7526c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
+
+	if (freeze) {
+		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+			page_remove_rmap(page + i, false);
+			put_page(page + i);
+		}
+	}
 }
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
@@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (pte_soft_dirty(entry))
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
+		page_remove_rmap(page, false);
+		put_page(page);
 	}
 	pte_unmap_unlock(pte, ptl);
 }
@@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		return;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
 	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
-		if (!page_mapped(page))
-			continue;
 		if (!is_swap_pte(pte[i]))
 			continue;
 
@@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (migration_entry_to_page(swp_entry) != page)
 			continue;
 
+		get_page(page);
+		page_add_anon_rmap(page, vma, address, false);
+
 		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
 		entry = pte_mkdirty(entry);
 		if (is_write_migration_entry(swp_entry))
-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Hugh Dickins
On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another 
> > > > > oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > page_mapped
> > > > > at that time but second check of page_mapped right before 
> > > > > try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> 
> I think I have introduced a bug there; or rather, made more evident
> a pre-existing bug.  But I'm not sure yet: the stacktrace was from
> compaction (called by khugepaged, but that may not be relevant at all),
> and thinking through the races with isolate_migratepages_block() is
> never easy.
> 
> What's certain is that I was not giving any thought to
> isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
> I was thinking about "stable" anonymous pages, and how they get
> faulted back in from swapcache while holding page lock.
> 
> It looks to me now as if a page might not yet be PageAnon when it's
> first tested in __unmap_and_move(), when going to page_get_anon_vma();
> but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
> where I inserted the VM_BUG_ON_PAGE().
> 
> If so, the code would always have been wrong (trying to unmap the
> anonymous page, and later remap its replacement, without a hold on
> the anon_vma needed to guide both lookups); but I'll have made it
> more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
> that's a good step forward :)
> 
> There's a reference count check in isolate_migratepages_block()
> before this, which would make it unlikely, but I doubt rules it out.
> 
> However... you did hit an anon_vma reference counting problem before
> my migration changes went in, and Kirill had a vague suspicion that
> he might be screwing up anon_vma refcounting in split_huge_page():
> if he confirms that, I'd say it's more likely to be the cause of
> your crash on this occasion.
> 
> Not hard to fix mine (though we'll probably have to lose the
> VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
> trivial fix), I just want to give the races more thought.

And after giving it more thought, I realize that I was wrong yesterday,
and the new VM_BUG_ON_PAGE() should be good as is: my guess is that it
is simply alerting you to the same anon_vma reference counting issue
as you had already hit without that patch.

What I was forgetting yesterday, is that isolate_migratepages_block()
can only take the page for migration when it's PageLRU(): and
do_anonymous_page() only adds a page to the LRU after it has been
marked as mapped and PageAnon.

So the window that worried me yesterday, that __unmap_and_move()
might see !PageAnon, then reach try_to_unmap() with it page_mapped
and PageAnon: that window does not exist, with or without my changes.

Hugh

> 
> However it turns out, I think you have a very useful test there.
> 
> (And I've observed no PageDirty problems with your recent patchsets,
> though I don't use MADV_FREE at all myself.)
> 
> Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Minchan Kim
On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap 
> > > seems
> > > to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k 
> > > FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k 
> > > FS
> > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > index:0x60aff
> > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > > !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

I tested mmotm-2015-10-15-15-20 for a long time with the test program I attach.
So Hugh's migration patchset is not in there.
And, at Kirill's request, I added the debug code below to all test kernels.

diff --git a/mm/rmap.c b/mm/rmap.c
index ddfb9be72366..1c23b70b1f57 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -513,6 +513,13 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
 	root_anon_vma = READ_ONCE(anon_vma->root);
+
+	if (root_anon_vma == NULL) {
+		printk("anon_vma %p refcount %d\n", anon_vma,
+				atomic_read(&anon_vma->refcount));
+		VM_BUG_ON_PAGE(1, page);
+	}
+
 	if (down_read_trylock(&root_anon_vma->rwsem)) {
 		/*
 		 * If the page is still mapped, then this anon_vma is still
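
For readers following the "anon_mapping - PAGE_MAPPING_ANON" arithmetic in
the hunk above: as far as I understand it, page->mapping of an anonymous page
holds the anon_vma pointer with the low bit set, which is why the dumps in
this thread show a mapping value one higher than the printed anon_vma. A
minimal user-space sketch of that tagging follows; TOY_MAPPING_ANON, struct
toy_anon_vma and the toy_*() helpers are hypothetical names, and the tag
value of 1 is an assumption mirroring PAGE_MAPPING_ANON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror PAGE_MAPPING_ANON: bit 0 marks an anonymous mapping. */
#define TOY_MAPPING_ANON ((uintptr_t)1)

/* Toy stand-in for struct anon_vma; int alignment keeps bit 0 free. */
struct toy_anon_vma {
	int refcount;
};

/* Store the pointer with the anon tag added, as page->mapping would. */
static uintptr_t toy_encode_mapping(struct toy_anon_vma *av)
{
	return (uintptr_t)av + TOY_MAPPING_ANON;
}

/* Test the tag bit, roughly what PageAnon() checks. */
static bool toy_mapping_is_anon(uintptr_t mapping)
{
	return (mapping & TOY_MAPPING_ANON) != 0;
}

/* Strip the tag to recover the pointer, as in the hunk above. */
static struct toy_anon_vma *toy_decode_mapping(uintptr_t mapping)
{
	return (struct toy_anon_vma *)(mapping - TOY_MAPPING_ANON);
}
```

The decoded pointer is only as good as the object behind it: if the anon_vma
was already freed, the subtraction still succeeds and the dereference of
->root reads stale memory, which is what the added debug check reports.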


1. mmotm-2015-10-15-15-20 + kirill's pte_mkdirty

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f1ed780 idx:1 val:488
BUG: Bad rss-counter state mm:88007f1ed780 idx:2 val:24

2nd trial:

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:8800a5cca680 idx:1 val:512
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS

2. mmotm-2015-10-15-15-20-no-madvise_free, i.e. git HEAD at
54bad5da4834 ("arm64: add pmd_[dirty|mkclean] for THP").

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f4c2d80 idx:1 val:511
BUG: Bad rss-counter state mm:88007f4c2d80 idx:2 val:1

2nd trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
anon_vma 88089aa0 refcount 0
page:ea0001a2ea40 count:3 mapcount:1 mapping:88089aa1 
index:0x647a9

I tested it in a KVM guest with 12 cores and 3G of memory.
For mmotm-2015-10-15-15-20-no-madvise_free, I tweaked the test program to do
madvise_dontneed instead of madvise_free via the patch below.

For the testing,

gcc -o oops oops.c
./memcg_test.sh

I will be off from now on, so please excuse late responses,
but I hope my test program will reproduce it on your machine.

diff --git a/oops.c b/oops.c
index e50330a..c8298f8 100644
--- a/oops.c
+++ b/oops.c
@@ -8,7 +8,7 @@
 #include 
 #include 
 
-#define MADV_FREE 5
+#define MADV_FREE 4
 
 int pid;
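
The one-line change above works because MADV_DONTNEED has the value 4 on
Linux, so the test's local "#define MADV_FREE 4" makes its madvise() calls
request MADV_DONTNEED. A minimal sketch of what that advice does to a private
anonymous mapping follows; dontneed_first_byte() is a hypothetical helper and
not part of oops.c.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map len bytes of private anonymous memory, dirty it, drop it with
 * MADV_DONTNEED, and return the first byte read back afterwards.
 * Returns -1 if mmap() or madvise() fails.
 */
static int dontneed_first_byte(size_t len)
{
	unsigned char *buf;
	int value;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* fault the pages in and dirty them */

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	value = buf[0];			/* refaults a fresh zero-filled page */
	munmap(buf, len);
	return value;
}
```

On Linux the pages are discarded immediately, so the read after madvise()
sees zero-filled memory rather than the 0xaa pattern.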



memcg_move_task.sh
Description: Bourne shell script


memcg_test.sh
Description: Bourne shell script
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MADV_FREE 4

int pid;

void sig_handler(int signo)
{
	printf("pid %d sig received %d\n", pid, signo);
	exit(1);
}

void free_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;

	for (i = 0; i < buf_count; i++) {
		if (bufs[i] != NULL) {
			munmap(bufs[i],  buf_size);
			bufs[i] = NULL;
		}
	}
}

void alloc_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;
	time_t rawtime;
	struct tm * timeinfo;
	void *addr = (void*)0x6000;

	for (i = 0; i < buf_count; i++) {
		void *ptr = NULL;

		ptr = mmap(addr, buf_size, PROT_READ|PROT_WRITE,
			MAP_ANON|MAP_PRIVATE|MAP_FIXED, 0, 0);

		if (ptr == MAP_FAILED) {
			char bufs[64];

			

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Minchan Kim
On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap 
> > > seems
> > > to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k 
> > > FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k 
> > > FS
> > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > index:0x60aff
> > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > > !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

I tested mmotm-2015-10-15-15-20 for a long time with the test program I attach.
So none of Hugh's migration patchset is in there.
At Kirill's request, I added the debug code below to all test kernels.

diff --git a/mm/rmap.c b/mm/rmap.c
index ddfb9be72366..1c23b70b1f57 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -513,6 +513,13 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
 	root_anon_vma = READ_ONCE(anon_vma->root);
+
+	if (root_anon_vma == NULL) {
+		printk("anon_vma %p refcount %d\n", anon_vma,
+			atomic_read(&anon_vma->refcount));
+		VM_BUG_ON_PAGE(1, page);
+	}
+
 	if (down_read_trylock(&root_anon_vma->rwsem)) {
 		/*
 		 * If the page is still mapped, then this anon_vma is still


1. mmotm-2015-10-15-15-20 + kirill's pte_mkdirty

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f1ed780 idx:1 val:488
BUG: Bad rss-counter state mm:88007f1ed780 idx:2 val:24

2nd trial:

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:8800a5cca680 idx:1 val:512
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS

2. mmotm-2015-10-15-15-20-no-madvise_free, IOW git head is
54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP.

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f4c2d80 idx:1 val:511
BUG: Bad rss-counter state mm:88007f4c2d80 idx:2 val:1

2nd trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
anon_vma 88089aa0 refcount 0
page:ea0001a2ea40 count:3 mapcount:1 mapping:88089aa1 index:0x647a9
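
The 2nd trial output above shows the relationship the debug patch relies on: the page's mapping value (88089aa1) is the printed anon_vma pointer (88089aa0) with the low bit set. A minimal sketch of that tagging in C, assuming PAGE_MAPPING_ANON is 1 and PAGE_MAPPING_KSM is 2 as in kernels of this era, and using the addresses exactly as printed in the log (the archive appears to have dropped their leading digits). The helper names are hypothetical, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Type tags kept in the low bits of page->mapping (assumed values). */
#define PAGE_MAPPING_ANON  1UL
#define PAGE_MAPPING_KSM   2UL
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* Hypothetical helper: true for an anonymous, non-KSM mapping word. */
bool mapping_is_plain_anon(unsigned long mapping)
{
	return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_ANON;
}

/*
 * Hypothetical helper: strip the tag, mirroring the expression
 * "anon_mapping - PAGE_MAPPING_ANON" seen in the debug patch.
 */
unsigned long mapping_to_anon_vma(unsigned long mapping)
{
	assert(mapping_is_plain_anon(mapping));
	return mapping - PAGE_MAPPING_ANON;
}
```

Calling mapping_to_anon_vma(0x88089aa1UL) gives 0x88089aa0UL, which matches the "anon_vma 88089aa0 refcount 0" line printed by the debug code.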

I tested it on KVM; the guest system has 12 cores and 3G of memory.
For mmotm-2015-10-15-15-20-no-madvise_free, I tweaked the test program to do
madvise_dontneed instead of madvise_free via the patch below.

For the testing,

gcc -o oops oops.c
./memcg_test.sh

I will be away from now on, so please excuse late responses,
but I hope my test program will reproduce it on your machine.

diff --git a/oops.c b/oops.c
index e50330a..c8298f8 100644
--- a/oops.c
+++ b/oops.c
@@ -8,7 +8,7 @@
 #include 
 #include 
 
-#define MADV_FREE 5
+#define MADV_FREE 4
 
 int pid;
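
The one-line tweak above works because 4 is the value of MADV_DONTNEED on Linux for x86, so every madvise() call the test makes with its private MADV_FREE macro becomes a MADV_DONTNEED call. A minimal, Linux-specific sketch of what that advice does to a private anonymous page follows. The function name is hypothetical and the code is not part of oops.c:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous page, dirty it, drop it with MADV_DONTNEED and check
 * what it reads back as. Returns 1 if the page is zero-filled afterwards,
 * 0 if old contents survived, -1 if mmap() or madvise() failed.
 */
int dontneed_zeroes_page(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int result = 1;
	long i;

	/* sysconf(_SC_PAGESIZE) is not expected to fail on Linux. */
	assert(page_size > 0);

	buf = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Dirty the page so there is something to throw away. */
	memset(buf, 0xaa, (size_t)page_size);

	if (madvise(buf, (size_t)page_size, MADV_DONTNEED) != 0) {
		munmap(buf, (size_t)page_size);
		return -1;
	}

	/* The next access faults in a fresh zero page. */
	for (i = 0; i < page_size; i++) {
		if (buf[i] != 0) {
			result = 0;
			break;
		}
	}

	munmap(buf, (size_t)page_size);
	return result;
}
```

With MADV_DONTNEED the discard is immediate, so this variant of the test keeps the same allocation and madvise pattern while taking the MADV_FREE code path out of the picture.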



memcg_move_task.sh
Description: Bourne shell script


memcg_test.sh
Description: Bourne shell script
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MADV_FREE 4

int pid;

void sig_handler(int signo)
{
	printf("pid %d sig received %d\n", pid, signo);
	exit(1);
}

void free_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;

	for (i = 0; i < buf_count; i++) {
		if (bufs[i] != NULL) {
			munmap(bufs[i],  buf_size);
			bufs[i] = NULL;
		}
	}
}

void alloc_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;
	time_t rawtime;
	struct tm * timeinfo;
	void *addr = (void*)0x6000;

	for (i = 0; i < buf_count; i++) {
		void *ptr = NULL;

		ptr = mmap(addr, buf_size, PROT_READ|PROT_WRITE,
			MAP_ANON|MAP_PRIVATE|MAP_FIXED, 0, 0);

		if (ptr == MAP_FAILED) {
			char bufs[64];

			

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Hugh Dickins
On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > at that time but second check of page_mapped right before try_to_unmap
> > > > > seems to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> 
> I think I have introduced a bug there; or rather, made more evident
> a pre-existing bug.  But I'm not sure yet: the stacktrace was from
> compaction (called by khugepaged, but that may not be relevant at all),
> and thinking through the races with isolate_migratepages_block() is
> never easy.
> 
> What's certain is that I was not giving any thought to
> isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
> I was thinking about "stable" anonymous pages, and how they get
> faulted back in from swapcache while holding page lock.
> 
> It looks to me now as if a page might not yet be PageAnon when it's
> first tested in __unmap_and_move(), when going to page_get_anon_vma();
> but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
> where I inserted the VM_BUG_ON_PAGE().
> 
> If so, the code would always have been wrong (trying to unmap the
> anonymous page, and later remap its replacement, without a hold on
> the anon_vma needed to guide both lookups); but I'll have made it
> more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
> that's a good step forward :)
> 
> There's a reference count check in isolate_migratepages_block()
> before this, which would make it unlikely, but I doubt rules it out.
> 
> However... you did hit an anon_vma reference counting problem before
> my migration changes went in, and Kirill had a vague suspicion that
> he might be screwing up anon_vma refcounting in split_huge_page():
> if he confirms that, I'd say it's more likely to be the cause of
> your crash on this occasion.
> 
> Not hard to fix mine (though we'll probably have to lose the
> VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
> trivial fix), I just want to give the races more thought.

And after giving it more thought, I realize that I was wrong yesterday,
and the new VM_BUG_ON_PAGE() should be good as is: my guess is that it
is simply alerting you to the same anon_vma reference counting issue
as you had already hit without that patch.

What I was forgetting yesterday, is that isolate_migratepages_block()
can only take the page for migration when it's PageLRU(): and
do_anonymous_page() only adds a page to the LRU after it has been
marked as mapped and PageAnon.

So the window that worried me yesterday, that __unmap_and_move()
might see !PageAnon, then reach try_to_unmap() with it page_mapped
and PageAnon: that window does not exist, with or without my changes.
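
The ordering argument above can be illustrated with a toy, single-threaded model. This is only a sketch: struct toy_page and every toy_* name are hypothetical stand-ins rather than kernel code, and it checks the page state between fault-in steps instead of modelling real concurrency:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified page state; not the kernel's struct page. */
struct toy_page {
	bool anon;	/* stands in for PageAnon() */
	bool mapped;	/* stands in for page_mapped() */
	bool lru;	/* stands in for PageLRU() */
};

/* Fault-in steps in the order described: anon and mapped first, LRU last. */
enum toy_step { TOY_SET_ANON, TOY_SET_MAPPED, TOY_ADD_TO_LRU, TOY_NR_STEPS };

void toy_fault_step(struct toy_page *page, enum toy_step step)
{
	switch (step) {
	case TOY_SET_ANON:
		page->anon = true;
		break;
	case TOY_SET_MAPPED:
		page->mapped = true;
		break;
	case TOY_ADD_TO_LRU:
		page->lru = true;
		break;
	default:
		assert(0 && "unknown step");
	}
}

/* Isolation precondition: only pages already on the LRU can be taken. */
bool toy_can_isolate(const struct toy_page *page)
{
	return page->lru;
}

/* The feared window: page taken for migration while not yet anon. */
bool toy_in_race_window(const struct toy_page *page)
{
	return toy_can_isolate(page) && !page->anon;
}

/* Walk the fault-in steps and report whether the window ever opens. */
bool toy_window_never_opens(void)
{
	struct toy_page page = { false, false, false };
	int step;

	for (step = 0; step < TOY_NR_STEPS; step++) {
		if (toy_in_race_window(&page))
			return false;
		toy_fault_step(&page, (enum toy_step)step);
	}
	return !toy_in_race_window(&page);
}
```

Because the LRU step comes last, toy_window_never_opens() returns true. Moving TOY_ADD_TO_LRU ahead of TOY_SET_ANON in the enum would make it return false, which is the situation the earlier mail worried about.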

Hugh

> 
> However it turns out, I think you have a very useful test there.
> 
> (And I've observed no PageDirty problems with your recent patchsets,
> though I don't use MADV_FREE at all myself.)
> 
> Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Hugh Dickins
On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap
> > > > seems to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.

I think I have introduced a bug there; or rather, made more evident
a pre-existing bug.  But I'm not sure yet: the stacktrace was from
compaction (called by khugepaged, but that may not be relevant at all),
and thinking through the races with isolate_migratepages_block() is
never easy.

What's certain is that I was not giving any thought to
isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
I was thinking about "stable" anonymous pages, and how they get
faulted back in from swapcache while holding page lock.

It looks to me now as if a page might not yet be PageAnon when it's
first tested in __unmap_and_move(), when going to page_get_anon_vma();
but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
where I inserted the VM_BUG_ON_PAGE().

If so, the code would always have been wrong (trying to unmap the
anonymous page, and later remap its replacement, without a hold on
the anon_vma needed to guide both lookups); but I'll have made it
more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
that's a good step forward :)

There's a reference count check in isolate_migratepages_block()
before this, which would make it unlikely, but I doubt rules it out.

However... you did hit an anon_vma reference counting problem before
my migration changes went in, and Kirill had a vague suspicion that
he might be screwing up anon_vma refcounting in split_huge_page():
if he confirms that, I'd say it's more likely to be the cause of
your crash on this occasion.

Not hard to fix mine (though we'll probably have to lose the
VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
trivial fix), I just want to give the races more thought.

However it turns out, I think you have a very useful test there.

(And I've observed no PageDirty problems with your recent patchsets,
though I don't use MADV_FREE at all myself.)

Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Hugh Dickins
On Thu, 22 Oct 2015, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap
> > > seems to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

Not a good use of your time, I think.  It's sure to be fixed in the
rc5-mmotm because that VM_BUG_ON_PAGE(blah) just does not exist in
that tree: I added it to verify my reasoning in changing the comments
about page_get_anon_vma() and PageSwapCache in mm/migrate.c.

> 
> > 
> > > page->mem_cgroup:88007f3dcc00
> > > [ cut here ]
> > > kernel BUG at mm/migrate.c:889!
> > > invalid opcode:  [#1] SMP 
> > > Dumping ftrace buffer:
> > >(ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> > 
> > Hmm, it might be me to blame, or it might be Kirill, don't know yet.
> 
> It might be me, either.
> 
> > 
> > Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> > an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> > I haven't digested yet, but it might turn out to be relevant.

Sorry, I think that was an irrelevant suggestion: today's new rc6-mmotm
is identical to yesterday's there, and the patch that was removed appears
to be identical to the one added.

Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Minchan Kim
Hello Hugh,

On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > 
> > I added the code to check it and queued it again but I had another oops
> > in this time but symptom is related to anon_vma, too.
> > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > at that time but second check of page_mapped right before try_to_unmap seems
> > to be true.
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> 
> That's interesting, that's one I added in my page migration series.
> Let me think on it, but it could well relate to the one you got before.

I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
instead of next-20151021 to remove noise from your migration cleanup
series and will test it again.
If it is fixed, I will test again with your migration patchset, then.

> 
> > page->mem_cgroup:88007f3dcc00
> > [ cut here ]
> > kernel BUG at mm/migrate.c:889!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >(ftrace buffer empty)
> > Modules linked in:
> > CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> 
> Hmm, it might be me to blame, or it might be Kirill, don't know yet.

It might be me, too.

> 
> Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> I haven't digested yet, but it might turn out to be relevant.
> 
> Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Hugh Dickins
On Thu, 22 Oct 2015, Minchan Kim wrote:
> 
> I added the code to check it and queued it again but I had another oops
> in this time but symptom is related to anon_vma, too.
> (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> It seems page_get_anon_vma returns NULL since the page was not page_mapped
> at that time but second check of page_mapped right before try_to_unmap seems
> to be true.
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)

That's interesting, that's one I added in my page migration series.
Let me think on it, but it could well relate to the one you got before.

> page->mem_cgroup:88007f3dcc00
> [ cut here ]
> kernel BUG at mm/migrate.c:889!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557

Hmm, it might be me to blame, or it might be Kirill, don't know yet.

Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
I haven't digested yet, but it might turn out to be relevant.

Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Minchan Kim
On Wed, Oct 21, 2015 at 02:07:23PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> > I detach this report from my patchset thread because I see below
> > problem with removing MADV_FREE related code and I can reproduce
> > same oops with MADV_FREE + recent patches(both my SetPageDirty
> > and Kirill's pte_mkdirty) within 7 hours.
> 
> Could you share code for your workload?

It's part of my test suite, so I need time to factor it out.
I will do that, test it, and send it.

> 
> > I can not be sure it's THP refcount redesign's problem but it was
> > one of big change in MM between mmotm-2015-10-15-15-20 and
> > mmotm-2015-10-06-16-30 so it could be a culprit.
> > 
> > In page_lock_anon_vma_read, anon_vma_root was NULL.
> > I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.
> 
> Hm. That's tricky.. :-/
> 
> Could you please dump anon_vma->refcount too?

I added the code to check it and queued it again, but this time I hit another
oops; its symptom is related to anon_vma, too.
(kernel is based on recent mmotm + unconditional mkdirty for bug fix)
It seems page_get_anon_vma returned NULL since the page was not page_mapped
at that time, but the second check of page_mapped right before try_to_unmap
seems to have been true.

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:88007f3dcc00
[ cut here ]
kernel BUG at mm/migrate.c:889!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b9851a40 ti: 8800b985c000 task.ti: 8800b985c000
RIP: 0010:[]  [] migrate_pages+0x8e6/0x950
RSP: 0018:8800b985fa00  EFLAGS: 00010286
RAX: 0021 RBX: ea0002dd7fc0 RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8800b985fa80 R08:  R09: 880bb160
R10: 8163e000 R11: 01e0 R12: 
R13: ea0001cfbf80 R14: ea0001cfbfc0 R15: 8189de80
FS:  () GS:8800bfb6() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 5594f9d7e578 CR3: 01808000 CR4: 06a0
Stack:
 8800b9851a40   
 811144b0  81115fb0 ea0001cfbfe0
 8800b985fb30 8800b985fb20  8800b985fb20
Call Trace:
 [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [] ? isolate_freepages_block+0x3d0/0x3d0
 [] compact_zone+0x2bb/0x720
 [] ? retint_kernel+0x10/0x10
 [] ? list_del+0xd/0x30
 [] compact_zone_order+0x6d/0xa0
 [] try_to_compact_pages+0xed/0x200
 [] __alloc_pages_direct_compact+0x3b/0xd4
 [] __alloc_pages_nodemask+0x3fb/0x920
 [] khugepaged+0x158/0x1b90
 [] ? hrtick_update+0x51/0x70
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? unfreeze_page+0x320/0x320
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 f4 fa ff ff 48 c7 c6 58 e9 77 81 4c 89 f7 e8 fa 2a fd ff <0f> 0b 48 83 e8 01 e9 d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff
RIP  [] migrate_pages+0x8e6/0x950
 RSP 
---[ end trace 59eb35cc15af8a53 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

> 
> I have a vague suspicion that I'm screwing up anon_vma refcounting during
> split_huge_page.
> 
> It would be great to see if the page was part of THP before.
> 
> > 
> > ..
> > ..
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 index:0x61445
> > page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 index:0x615ef
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(1)
> > page->mem_cgroup:88007f2de000
> > [ cut here ]
> > kernel BUG at mm/rmap.c:517!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >(ftrace buffer empty)
> > Modules linked in:
> > CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> > RIP: 0010:[]  [] 
> > 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Kirill A. Shutemov
On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> I detach this report from my patchset thread because I see below
> problem with removing MADV_FREE related code and I can reproduce
> same oops with MADV_FREE + recent patches(both my SetPageDirty
> and Kirill's pte_mkdirty) within 7 hours.

Could you share code for your workload?

> I can not be sure it's THP refcount redesign's problem but it was
> one of big change in MM between mmotm-2015-10-15-15-20 and
> mmotm-2015-10-06-16-30 so it could be a culprit.
> 
> In page_lock_anon_vma_read, anon_vma_root was NULL.
> I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

Hm. That's tricky.. :-/

Could you please dump anon_vma->refcount too?

I have a vague suspicion that I'm screwing up anon_vma refcounting during
split_huge_page.

It would be great to see if the page was part of THP before.

> 
> ..
> ..
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 index:0x61445
> page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 index:0x615ef
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:88007f2de000
> [ cut here ]
> kernel BUG at mm/rmap.c:517!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> RIP: 0010:[]  [] page_lock_anon_vma_read+0x18e/0x190
> RSP: :8800ada2b868  EFLAGS: 00010296
> RAX: 0021 RBX: ea0001b87bc0 RCX: 
> RDX: 0001 RSI: 0282 RDI: 81830db0
> RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75
> R10: 01ff14bc R11:  R12: 88007e806461
> R13: 88007e806460 R14:  R15: 818464c0
> FS:  7f6d93212740() GS:8800bfa0() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 63c14000 CR3: a674b000 CR4: 06b0
> Stack:
>  ea0001b87bc0 8800ada2b8f8 88007f2de000 
>  8800ada2b8d0 81129593 8800 8105f8c0
>  ea0001b87bc0 8800ada2b9f8 88007f2de000 
> Call Trace:
>  [] rmap_walk+0x1b3/0x3f0
>  [] ? finish_task_switch+0x70/0x260
>  [] page_referenced+0x1a3/0x220
>  [] ? __page_check_address+0x1d0/0x1d0
>  [] ? page_get_anon_vma+0xd0/0xd0
>  [] ? anon_vma_ctor+0x40/0x40
>  [] shrink_page_list+0x5ce/0xdc0
>  [] shrink_inactive_list+0x18c/0x4b0
>  [] shrink_lruvec+0x58f/0x730
>  [] shrink_zone+0xd4/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [] try_charge+0x175/0x720
>  [] ? __activate_page+0x230/0x230
>  [] mem_cgroup_try_charge+0x85/0x1d0
>  [] handle_mm_fault+0xc9a/0x1000
>  [] ? __set_cpus_allowed_ptr+0x9b/0x1a0
>  [] __do_page_fault+0x189/0x400
>  [] do_page_fault+0xc/0x10
>  [] page_fault+0x22/0x30
> Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 45 31 f6 41 55 4c
> RIP  [] page_lock_anon_vma_read+0x18e/0x190
>  RSP 
> ---[ end trace cfbb87f54f12290e ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> 
> On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > > Hello, it's been a long time since I sent the previous patch.
> > > > https://lkml.org/lkml/2015/6/3/37
> > > > 
> > > > This patch is almost new compared to the previous approach.
> > > > I think this is more simple, clear and easy to review.
> > > > 
> > > > One thing I should notice is that I have tested this patch
> > > > and couldn't find any critical problem so I rebased patchset
> > > > onto recent mmotm(ie, mmotm-2015-10-15-15-20) to send formal
> > > > patchset. Unfortunately, I start to see sudden discarding of
> > > > the page we shouldn't do. IOW, application's valid anonymous page
> > > > was disappeared suddenly.
> > > > 
> > > > When I look through THP changes, I think we could lose
> > > > dirty bit of pte between freeze_page and unfreeze_page
> > > > when we mark it as migration entry and restore it.
> > > > So, I added below simple code without enough considering
> > > > and cannot see the problem any more.
> > > > I hope it's good hint to find right fix this problem.
> > > > 
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Hugh Dickins
On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap 
> > > > seems
> > > > to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > > > !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.

I think I have introduced a bug there; or rather, made more evident
a pre-existing bug.  But I'm not sure yet: the stacktrace was from
compaction (called by khugepaged, but that may not be relevant at all),
and thinking through the races with isolate_migratepages_block() is
never easy.

What's certain is that I was not giving any thought to
isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
I was thinking about "stable" anonymous pages, and how they get
faulted back in from swapcache while holding page lock.

It looks to me now as if a page might not yet be PageAnon when it's
first tested in __unmap_and_move(), when going to page_get_anon_vma();
but is page_mapped() and PageAnon() by the time try_to_unmap() is called,
which is where I inserted the VM_BUG_ON_PAGE().

If so, the code would always have been wrong (trying to unmap the
anonymous page, and later remap its replacement, without a hold on
the anon_vma needed to guide both lookups); but I'll have made it
more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
that's a good step forward :)
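
The ordering described above can be modelled in user space. What follows
is only a toy sketch under stated assumptions, not the code in
mm/migrate.c: struct model_page, struct model_anon_vma and every model_*
helper are hypothetical stand-ins, the racing fault is simulated by a
plain function call rather than a second thread, and the PageKsm() part
of the assertion is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; this is not the kernel's struct page layout. */
struct model_anon_vma {
	int refcount;
};

struct model_page {
	struct model_anon_vma *anon_vma; /* NULL until the page is anonymous */
	int mapcount;                    /* > 0 once the page is mapped */
};

static bool model_page_anon(const struct model_page *page)
{
	return page->anon_vma != NULL;
}

static bool model_page_mapped(const struct model_page *page)
{
	return page->mapcount > 0;
}

/* First check: a reference is taken only if the page is already anon and mapped. */
static struct model_anon_vma *model_get_anon_vma(struct model_page *page)
{
	if (!model_page_anon(page) || !model_page_mapped(page))
		return NULL;
	page->anon_vma->refcount++;
	return page->anon_vma;
}

/* Stands in for a fault that maps the page between the two checks. */
static void model_racing_fault(struct model_page *page,
			       struct model_anon_vma *av)
{
	page->anon_vma = av;
	page->mapcount = 1;
}

/*
 * Returns true when the second check sees a mapped anonymous page for
 * which no anon_vma reference was taken by the first check. The toy
 * never drops the reference it takes; only the ordering is modelled.
 */
static bool model_unmap_and_move(struct model_page *page,
				 struct model_anon_vma *av, bool race)
{
	struct model_anon_vma *held;

	assert(page != NULL && av != NULL);
	held = model_get_anon_vma(page);       /* time of check */
	if (race)
		model_racing_fault(page, av);  /* window between the checks */
	if (!model_page_mapped(page))          /* time of use */
		return false;
	return model_page_anon(page) && held == NULL;
}
```

Called with race set on a page that starts out unmapped, the function
reports the state the VM_BUG_ON_PAGE() complains about; called on a page
that is already anonymous and mapped, it takes the reference and reports
nothing.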

There's a reference count check in isolate_migratepages_block()
before this, which would make it unlikely, but I doubt it rules it out.

However... you did hit an anon_vma reference counting problem before
my migration changes went in, and Kirill had a vague suspicion that
he might be screwing up anon_vma refcounting in split_huge_page():
if he confirms that, I'd say it's more likely to be the cause of
your crash on this occasion.

Not hard to fix mine (though we'll probably have to lose the
VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
trivial fix), I just want to give the races more thought.

However it turns out, I think you have a very useful test there.

(And I've observed no PageDirty problems with your recent patchsets,
though I don't use MADV_FREE at all myself.)

Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Hugh Dickins
On Thu, 22 Oct 2015, Minchan Kim wrote:
> 
> I added the code to check it and queued it again but I had another oops
> in this time but symptom is related to anon_vma, too.
> (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> It seems page_get_anon_vma returns NULL since the page was not page_mapped
> at that time but second check of page_mapped right before try_to_unmap seems
> to be true.
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)

That's interesting, that's one I added in my page migration series.
Let me think on it, but it could well relate to the one you got before.

> page->mem_cgroup:88007f3dcc00
> [ cut here ]
> kernel BUG at mm/migrate.c:889!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 11 PID: 59 Comm: khugepaged Not tainted 
> 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557

Hmm, it might be me to blame, or it might be Kirill, don't know yet.

Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
I haven't digested yet, but it might turn out to be relevant.

Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Minchan Kim
Hello Hugh,

On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > 
> > I added the code to check it and queued it again but I had another oops
> > in this time but symptom is related to anon_vma, too.
> > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > at that time but second check of page_mapped right before try_to_unmap seems
> > to be true.
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> 
> That's interesting, that's one I added in my page migration series.
> Let me think on it, but it could well relate to the one you got before.

I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
instead of next-20151021, to remove the noise from your migration cleanup
series, and will test it again.
If the problem goes away, I will then test again with your migration patchset.

> 
> > page->mem_cgroup:88007f3dcc00
> > [ cut here ]
> > kernel BUG at mm/migrate.c:889!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >(ftrace buffer empty)
> > Modules linked in:
> > CPU: 11 PID: 59 Comm: khugepaged Not tainted 
> > 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> 
> Hmm, it might be me to blame, or it might be Kirill, don't know yet.

It might be me, too.

> 
> Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> I haven't digested yet, but it might turn out to be relevant.
> 
> Hugh


Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Minchan Kim
On Wed, Oct 21, 2015 at 02:07:23PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> > I detach this report from my patchset thread because I see below
> > problem with removing MADV_FREE related code and I can reproduce
> > same oops with MADV_FREE + recent patches(both my SetPageDirty
> > and Kirill's pte_mkdirty) within 7 hours.
> 
> Could you share code for your workload?

It's part of my test suite, so I need time to factor it out.
I will do that, test it, and send it.

> 
> > I can not be sure it's THP refcount redesign's problem but it was
> > one of big change in MM between mmotm-2015-10-15-15-20 and
> > mmotm-2015-10-06-16-30 so it could be a culprit.
> > 
> > In page_lock_anon_vma_read, anon_vma_root was NULL.
> > I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.
> 
> Hm. That's tricky.. :-/
> 
> Could you please dump anon_vma->refcount too?

I added code to check it and queued the test again, but this time I hit
a different oops, although the symptom is again related to anon_vma.
(The kernel is based on recent mmotm + unconditional mkdirty for the bug fix.)
It seems page_get_anon_vma returned NULL because the page was not page_mapped
at that time, but the second check of page_mapped right before try_to_unmap
seems to have been true.

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:88007f3dcc00
[ cut here ]
kernel BUG at mm/migrate.c:889!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b9851a40 ti: 8800b985c000 task.ti: 8800b985c000
RIP: 0010:[]  [] migrate_pages+0x8e6/0x950
RSP: 0018:8800b985fa00  EFLAGS: 00010286
RAX: 0021 RBX: ea0002dd7fc0 RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8800b985fa80 R08:  R09: 880bb160
R10: 8163e000 R11: 01e0 R12: 
R13: ea0001cfbf80 R14: ea0001cfbfc0 R15: 8189de80
FS:  () GS:8800bfb6() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 5594f9d7e578 CR3: 01808000 CR4: 06a0
Stack:
 8800b9851a40   
 811144b0  81115fb0 ea0001cfbfe0
 8800b985fb30 8800b985fb20  8800b985fb20
Call Trace:
 [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [] ? isolate_freepages_block+0x3d0/0x3d0
 [] compact_zone+0x2bb/0x720
 [] ? retint_kernel+0x10/0x10
 [] ? list_del+0xd/0x30
 [] compact_zone_order+0x6d/0xa0
 [] try_to_compact_pages+0xed/0x200
 [] __alloc_pages_direct_compact+0x3b/0xd4
 [] __alloc_pages_nodemask+0x3fb/0x920
 [] khugepaged+0x158/0x1b90
 [] ? hrtick_update+0x51/0x70
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? unfreeze_page+0x320/0x320
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 
f4 fa ff ff 48 c7 c6 58 e9 77 81 4c 89 f7 e8 fa 2a fd ff <0f> 0b 48 83 e8 01 e9 
d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff 
RIP  [] migrate_pages+0x8e6/0x950
 RSP 
---[ end trace 59eb35cc15af8a53 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

> 
> I have a vague suspicion that I'm screwing up anon_vma refcounting during
> split_huge_page.
> 
> It would be great to see if the page was part of THP before.
> 
> > 
> > ..
> > ..
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 
> > index:0x61445
> > page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 
> > index:0x615ef
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(1)
> > page->mem_cgroup:88007f2de000
> > [ cut here ]
> > kernel BUG at mm/rmap.c:517!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >(ftrace buffer empty)
> > Modules linked in:
> > CPU: 0 PID: 24935 Comm: madvise_test Not tainted 
> > 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> > RIP: 0010:[]  [] 
> > 

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Kirill A. Shutemov
On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> I detach this report from my patchset thread because I see below
> problem with removing MADV_FREE related code and I can reproduce
> same oops with MADV_FREE + recent patches(both my SetPageDirty
> and Kirill's pte_mkdirty) within 7 hours.

Could you share code for your workload?

> I can not be sure it's THP refcount redesign's problem but it was
> one of big change in MM between mmotm-2015-10-15-15-20 and
> mmotm-2015-10-06-16-30 so it could be a culprit.
> 
> In page_lock_anon_vma_read, anon_vma_root was NULL.
> I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

Hm. That's tricky.. :-/

Could you please dump anon_vma->refcount too?

I have a vague suspicion that I'm screwing up anon_vma refcounting during
split_huge_page.

It would be great to see if the page was part of THP before.

> 
> ..
> ..
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 
> index:0x61445
> page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 
> index:0x615ef
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:88007f2de000
> [ cut here ]
> kernel BUG at mm/rmap.c:517!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 24935 Comm: madvise_test Not tainted 
> 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> RIP: 0010:[]  [] 
> page_lock_anon_vma_read+0x18e/0x190
> RSP: :8800ada2b868  EFLAGS: 00010296
> RAX: 0021 RBX: ea0001b87bc0 RCX: 
> RDX: 0001 RSI: 0282 RDI: 81830db0
> RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75
> R10: 01ff14bc R11:  R12: 88007e806461
> R13: 88007e806460 R14:  R15: 818464c0
> FS:  7f6d93212740() GS:8800bfa0() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 63c14000 CR3: a674b000 CR4: 06b0
> Stack:
>  ea0001b87bc0 8800ada2b8f8 88007f2de000 
>  8800ada2b8d0 81129593 8800 8105f8c0
>  ea0001b87bc0 8800ada2b9f8 88007f2de000 
> Call Trace:
>  [] rmap_walk+0x1b3/0x3f0
>  [] ? finish_task_switch+0x70/0x260
>  [] page_referenced+0x1a3/0x220
>  [] ? __page_check_address+0x1d0/0x1d0
>  [] ? page_get_anon_vma+0xd0/0xd0
>  [] ? anon_vma_ctor+0x40/0x40
>  [] shrink_page_list+0x5ce/0xdc0
>  [] shrink_inactive_list+0x18c/0x4b0
>  [] shrink_lruvec+0x58f/0x730
>  [] shrink_zone+0xd4/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [] try_charge+0x175/0x720
>  [] ? __activate_page+0x230/0x230
>  [] mem_cgroup_try_charge+0x85/0x1d0
>  [] handle_mm_fault+0xc9a/0x1000
>  [] ? __set_cpus_allowed_ptr+0x9b/0x1a0
>  [] __do_page_fault+0x189/0x400
>  [] do_page_fault+0xc/0x10
>  [] page_fault+0x22/0x30
> Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 
> 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 
> 00 55 48 89 e5 41 57 41 56 45 31 f6 
> 41 55 4c 
> RIP  [] page_lock_anon_vma_read+0x18e/0x190
>  RSP 
> ---[ end trace cfbb87f54f12290e ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> 
> On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > > Hello, it's been too long since I sent the previous patch.
> > > > https://lkml.org/lkml/2015/6/3/37
> > > > 
> > > > This patch is almost new compared to the previous approach.
> > > > I think this is more simple, clear and easy to review.
> > > > 
> > > > One thing I should note is that I had tested this patch
> > > > and couldn't find any critical problem, so I rebased the patchset
> > > > onto recent mmotm (i.e., mmotm-2015-10-15-15-20) to send a formal
> > > > patchset. Unfortunately, I started to see sudden discarding of
> > > > pages we shouldn't discard. IOW, an application's valid anonymous page
> > > > suddenly disappeared.
> > > > 
> > > > When I look through THP changes, I think we could lose
> > > > dirty bit of pte between freeze_page and unfreeze_page
> > > > when we mark it as migration entry and restore it.
> > > > So, I added below simple code without enough considering
> > > > and cannot see the problem any more.
> > > > I hope it's good hint to find right fix this problem.
> > > > 
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > 

kernel oops on mmotm-2015-10-15-15-20

2015-10-20 Thread Minchan Kim
I am detaching this report from my patchset thread because I see the
problem below even with the MADV_FREE related code removed, and I can
reproduce the same oops with MADV_FREE + recent patches (both my
SetPageDirty and Kirill's pte_mkdirty) within 7 hours.

I cannot be sure it is a problem in the THP refcount redesign, but that was
one of the big changes in MM between mmotm-2015-10-06-16-30 and
mmotm-2015-10-15-15-20, so it could be the culprit.

In page_lock_anon_vma_read, anon_vma_root was NULL.
I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result below.

..
..
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 index:0x61445
page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 index:0x615ef
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(1)
page->mem_cgroup:88007f2de000
[ cut here ]
kernel BUG at mm/rmap.c:517!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
RIP: 0010:[]  [] page_lock_anon_vma_read+0x18e/0x190
RSP: :8800ada2b868  EFLAGS: 00010296
RAX: 0021 RBX: ea0001b87bc0 RCX: 
RDX: 0001 RSI: 0282 RDI: 81830db0
RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75
R10: 01ff14bc R11:  R12: 88007e806461
R13: 88007e806460 R14:  R15: 818464c0
FS:  7f6d93212740() GS:8800bfa0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 63c14000 CR3: a674b000 CR4: 06b0
Stack:
 ea0001b87bc0 8800ada2b8f8 88007f2de000 
 8800ada2b8d0 81129593 8800 8105f8c0
 ea0001b87bc0 8800ada2b9f8 88007f2de000 
Call Trace:
 [] rmap_walk+0x1b3/0x3f0
 [] ? finish_task_switch+0x70/0x260
 [] page_referenced+0x1a3/0x220
 [] ? __page_check_address+0x1d0/0x1d0
 [] ? page_get_anon_vma+0xd0/0xd0
 [] ? anon_vma_ctor+0x40/0x40
 [] shrink_page_list+0x5ce/0xdc0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x58f/0x730
 [] shrink_zone+0xd4/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 [] try_to_free_mem_cgroup_pages+0x9d/0x120
 [] try_charge+0x175/0x720
 [] ? __activate_page+0x230/0x230
 [] mem_cgroup_try_charge+0x85/0x1d0
 [] handle_mm_fault+0xc9a/0x1000
 [] ? __set_cpus_allowed_ptr+0x9b/0x1a0
 [] __do_page_fault+0x189/0x400
 [] do_page_fault+0xc/0x10
 [] page_fault+0x22/0x30
Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 
89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 00 
55 48 89 e5 41 57 41 56 45 31 f6 
41 55 4c 
RIP  [] page_lock_anon_vma_read+0x18e/0x190
 RSP 
---[ end trace cfbb87f54f12290e ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
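
For reading the dump above, it may help to recall how page->mapping
encodes an anonymous page. The sketch below assumes the low-bit tags
PAGE_MAPPING_ANON and PAGE_MAPPING_KSM have the values I remember from
include/linux/mm.h in kernels of this period; the mapping_* helper names
are hypothetical, and the addresses are used exactly as they are printed
in the dump. R12 (88007e806461) and R13 (88007e806460) differ only in
bit 0, which would be consistent with R12 holding page->mapping and R13
the anon_vma pointer derived from it, though that reading of the
registers is a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values of the low-bit tags in page->mapping. */
#define PAGE_MAPPING_ANON  UINT64_C(1)
#define PAGE_MAPPING_KSM   UINT64_C(2)
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* Hypothetical helper: bit 0 set means the mapping points at an anon_vma. */
static bool mapping_is_anon(uint64_t mapping)
{
	return (mapping & PAGE_MAPPING_ANON) != 0;
}

/* Hypothetical helper: both tag bits set marks a KSM page. */
static bool mapping_is_ksm(uint64_t mapping)
{
	return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_FLAGS;
}

/* Hypothetical helper: strip the anon tag to recover the anon_vma address. */
static uint64_t mapping_to_anon_vma(uint64_t mapping)
{
	assert(mapping_is_anon(mapping));
	return mapping - PAGE_MAPPING_ANON;
}
```

Applied to mapping:88007e806461 this gives an anonymous, non-KSM page
whose anon_vma would sit at 88007e806460.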

On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > Hello, it's been too long since I sent the previous patch.
> > > https://lkml.org/lkml/2015/6/3/37
> > > 
> > > This patch is almost new compared to the previous approach.
> > > I think this is more simple, clear and easy to review.
> > > 
> > > One thing I should note is that I had tested this patch
> > > and couldn't find any critical problem, so I rebased the patchset
> > > onto recent mmotm (i.e., mmotm-2015-10-15-15-20) to send a formal
> > > patchset. Unfortunately, I started to see sudden discarding of
> > > pages we shouldn't discard. IOW, an application's valid anonymous page
> > > suddenly disappeared.
> > > 
> > > When I look through THP changes, I think we could lose
> > > dirty bit of pte between freeze_page and unfreeze_page
> > > when we mark it as migration entry and restore it.
> > > So, I added below simple code without enough considering
> > > and cannot see the problem any more.
> > > I hope it's good hint to find right fix this problem.
> > > 
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index d5ea516ffb54..e881c04f5950 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -3138,6 +3138,9 @@ static void unfreeze_page_vma(struct vm_area_struct 
> > > *vma, struct page *page,
> > >   if (is_write_migration_entry(swp_entry))
> > >   entry = maybe_mkwrite(entry, vma);
> > >  
> > > + if (PageDirty(page))
> > > + SetPageDirty(page);
> > 
> > The condition of PageDirty was typo. I didn't add the condition.
> > Just added.
> > 
> > 
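
The suspicion quoted above, that the dirty bit of a pte can be lost
between freeze_page and unfreeze_page, can be illustrated with a toy
model. Everything here is an assumption made for illustration: the toy_*
types and helpers are hypothetical, the migration entry is modelled as
carrying only the write bit, and the workaround flag mimics the
unconditional SetPageDirty() described in the quoted follow-up. It is
not the code in mm/huge_memory.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a pte, a migration entry and a page. */
struct toy_pte {
	bool present;
	bool write;
	bool dirty;
};

struct toy_migration_entry {
	bool write;              /* assumed: no room for the dirty bit */
};

struct toy_page {
	bool dirty;
};

/* Freeze: replace the pte with a migration entry, keeping only 'write'. */
static struct toy_migration_entry toy_freeze(struct toy_pte pte)
{
	assert(pte.present);
	return (struct toy_migration_entry){ .write = pte.write };
}

/* Unfreeze: rebuild the pte; optionally mark the page dirty unconditionally. */
static struct toy_pte toy_unfreeze(struct toy_migration_entry entry,
				   struct toy_page *page, bool workaround)
{
	if (workaround)
		page->dirty = true;
	return (struct toy_pte){ .present = true, .write = entry.write,
				 .dirty = false };
}

/* True if either the pte or the page still records that data was written. */
static bool toy_dirty_survives(bool workaround)
{
	struct toy_page page = { .dirty = false };
	struct toy_pte pte = { .present = true, .write = true, .dirty = true };

	pte = toy_unfreeze(toy_freeze(pte), &page, workaround);
	return pte.dirty || page.dirty;
}
```

In this model the dirty state vanishes after a freeze and unfreeze round
trip unless the workaround is enabled, which matches the symptom
reported in the quoted mail only under the assumptions stated above.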
