
On Thu, Jul 16, 2020 at 09:22:11PM +0300, Maxim Levitsky wrote:
> On Thu, 2020-07-16 at 21:21 +0300, Andy Shevchenko wrote:
> > On Thu, Jul 16, 2020 at 09:00:00PM +0300, Maxim Levitsky wrote:
> > > On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:

...

> > > It works (no more oops)
> > 
> > Thanks for testing. I'm about to send formal patch, can you give your 
> > Tested-by tag there then?
> 
> Of course.
> 
> Tested-by: Maxim Levitsky 

Thanks, I meant there [1] :-)

[1]: https://lore.kernel.org/lkml/20200716182747.54929-1-andriy.shevche...@linux.intel.com/T/#u

-- 
With Best Regards,
Andy Shevchenko

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

On Thu, Jul 16, 2020 at 09:00:00PM +0300, Maxim Levitsky wrote:
> On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:
> > On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > > Hi!
> > > 
> > > Few days ago I bisected a regression on 5.8 kernel:
> > > 
> > > I have nvidia rtx 2070s and its USB type C port driver (which is open 
> > > source)
> > > started to crash on load:
> > 
> > ...
> > 
> > > Reverting the commit helped fix this oops.
> > > 
> > > My .config attached.
> > > If any more info is needed I'll be happy to provide it,
> > > and of course test patches.
> > 
> > Can you test below?
> > 
> > diff --git a/drivers/base/property.c b/drivers/base/property.c
> > index 1e6d75e65938..d58aa98fe964 100644
> > --- a/drivers/base/property.c
> > +++ b/drivers/base/property.c
> > @@ -721,7 +721,7 @@ struct fwnode_handle *device_get_next_child_node(struct device *dev,
> >  		return next;
> >  
> >  	/* When no more children in primary, continue with secondary */
> > -	if (!IS_ERR_OR_NULL(fwnode->secondary))
> > +	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
> >  		next = fwnode_get_next_child_node(fwnode->secondary, child);
> >  
> >  	return next;
> 
> It works (no more oops)

Thanks for testing. I'm about to send a formal patch; can you give your
Tested-by tag there then?

-- 
With Best Regards,
Andy Shevchenko

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

On Thu, 2020-07-16 at 21:21 +0300, Andy Shevchenko wrote:
> On Thu, Jul 16, 2020 at 09:00:00PM +0300, Maxim Levitsky wrote:
> > On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:
> > > On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > > > Hi!
> > > > 
> > > > Few days ago I bisected a regression on 5.8 kernel:
> > > > 
> > > > I have nvidia rtx 2070s and its USB type C port driver (which is open 
> > > > source)
> > > > started to crash on load:
> > > 
> > > ...
> > > 
> > > > Reverting the commit helped fix this oops.
> > > > 
> > > > My .config attached.
> > > > If any more info is needed I'll be happy to provide it,
> > > > and of course test patches.
> > > 
> > > Can you test below?
> > > 
> > > diff --git a/drivers/base/property.c b/drivers/base/property.c
> > > index 1e6d75e65938..d58aa98fe964 100644
> > > --- a/drivers/base/property.c
> > > +++ b/drivers/base/property.c
> > > @@ -721,7 +721,7 @@ struct fwnode_handle *device_get_next_child_node(struct device *dev,
> > >  		return next;
> > >  
> > >  	/* When no more children in primary, continue with secondary */
> > > -	if (!IS_ERR_OR_NULL(fwnode->secondary))
> > > +	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
> > >  		next = fwnode_get_next_child_node(fwnode->secondary, child);
> > >  
> > >  	return next;
> > 
> > It works (no more oops)
> 
> Thanks for testing. I'm about to send formal patch, can you give your 
> Tested-by tag there then?

Of course.

Tested-by: Maxim Levitsky 

Best regards,
Maxim Levitsky

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

On Thu, 2020-07-16 at 18:47 +0300, Andy Shevchenko wrote:
> On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > Hi!
> > 
> > Few days ago I bisected a regression on 5.8 kernel:
> > 
> > I have nvidia rtx 2070s and its USB type C port driver (which is open 
> > source)
> > started to crash on load:
> 
> ...
> 
> > Reverting the commit helped fix this oops.
> > 
> > My .config attached.
> > If any more info is needed I'll be happy to provide it,
> > and of course test patches.
> 
> Can you test below?
> 
> diff --git a/drivers/base/property.c b/drivers/base/property.c
> index 1e6d75e65938..d58aa98fe964 100644
> --- a/drivers/base/property.c
> +++ b/drivers/base/property.c
> @@ -721,7 +721,7 @@ struct fwnode_handle *device_get_next_child_node(struct device *dev,
>  		return next;
>  
>  	/* When no more children in primary, continue with secondary */
> -	if (!IS_ERR_OR_NULL(fwnode->secondary))
> +	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
>  		next = fwnode_get_next_child_node(fwnode->secondary, child);
>  
>  	return next;

It works (no more oops)

Best regards,
Maxim Levitsky

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

On Thu, 2020-07-16 at 17:34 +0300, Andy Shevchenko wrote:
> On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > Hi!
> > 
> > Few days ago I bisected a regression on 5.8 kernel:
> > 
> > I have nvidia rtx 2070s and its USB type C port driver (which is open 
> > source)
> > started to crash on load:
> 
> I'm looking at this, but I have questions:
> - any pointers to the device tree excerpt which this tries to iterate over
> - can you provide full Code: line?
> 
> Only way I see, why it happens, is that fwnode is not initialized properly
> somewhere (means it has garbage in the secondary pointer).
> 
> > [  +0.43] CPU: 19 PID: 31281 Comm: kworker/19:1 Tainted: PW  O  
> > 5.8.0-rc3.stable #133
> > [  +0.45] Hardware name: Gigabyte Technology Co., Ltd. TRX40 
> > DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020
> > [  +0.30] Workqueue: events_long ucsi_init_work [typec_ucsi]
> > [  +0.48] RIP: 0010:device_get_next_child_node+0x5b/0xb0
> > [  +0.24] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 
> > 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 
> > <48> 8b 03 48 85 c0 74 f3 48>
> > [  +0.65] RSP: 0018:c900038d7e08 EFLAGS: 00010246
> > [  +0.44] RAX: 889fb6b62f00 RBX:  RCX: 
> > 0001
> > [  +0.27] RDX: 889fb6fd4a70 RSI:  RDI: 
> > 889fb6b63608
> > [  +0.46] RBP:  R08: 0001 R09: 
> > 7fff
> > [  +0.24] R10: 2075ce282580 R11: 0062de3e R12: 
> > 889fb6b63608
> > [  +0.43] R13: 0001 R14: 889fb6b63018 R15: 
> > 0001
> > [  +0.44] FS:  () GS:889fbe4c() 
> > knlGS:
> > [  +0.24] CS:  0010 DS:  ES:  CR0: 80050033
> > [  +0.42] CR2:  CR3: 00175621b000 CR4: 
> > 00340ea0
> > [  +0.46] Call Trace:
> > [  +0.30]  ucsi_init+0x213/0x530 [typec_ucsi]
> > [  +0.28]  ucsi_init_work+0x12/0x20 [typec_ucsi]
> > [  +0.49]  process_one_work+0x1d2/0x390
> > [  +0.27]  worker_thread+0x4a/0x3b0
> > [  +0.25]  ? process_one_work+0x390/0x390
> > [  +0.49]  kthread+0xf9/0x130
> > [  +0.26]  ? kthread_park+0x90/0x90
> > [  +0.28]  ret_from_fork+0x1f/0x30
> > [  +0.48] Modules linked in: ucsi_ccg typec_ucsi typec hfsplus cdrom 
> > ntfs msdos vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost 
> > vhost_iotlb tap xfs rfcomm xt_M>
> > [  +0.39]  usb_storage ext4 mbcache jbd2 amdgpu gpu_sched ttm 
> > drm_kms_helper syscopyarea sysfillrect ahci sysimgblt fb_sys_fops 
> > crc32_pclmul libahci crc32c_intel igb ccp >
> > [  +0.000289] CR2: 
> > [  +0.26] ---[ end trace 38ebb9aebd55fbff ]---
> > [  +0.014201] RIP: 0010:device_get_next_child_node+0x5b/0xb0
> > [  +0.30] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 
> > 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 
> > <48> 8b 03 48 85 c0 74 f3 48>
> > [  +0.75] RSP: 0018:c900038d7e08 EFLAGS: 00010246
> > [  +0.27] RAX: 889fb6b62f00 RBX:  RCX: 
> > 0001
> > [  +0.48] RDX: 889fb6fd4a70 RSI:  RDI: 
> > 889fb6b63608
> > [  +0.49] RBP:  R08: 0001 R09: 
> > 7fff
> > [  +0.27] R10: 2075ce282580 R11: 0062de3e R12: 
> > 889fb6b63608
> > [  +0.49] R13: 0001 R14: 889fb6b63018 R15: 
> > 0001
> > [  +0.50] FS:  () GS:889fbe4c() 
> > knlGS:
> > [  +0.27] CS:  0010 DS:  ES:  CR0: 80050033
> > [  +0.50] CR2:  CR3: 00175621b000 CR4: 
> > 00340ea0
> > 
> > I bisected this, while passing the UCSI controller to a VM, and this
> > is the result:
> > 
> > git bisect start
> > # good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7
> > git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
> > # bad: [48778464bb7d346b47157d21ffde2af6b2d39110] Linux 5.8-rc2
> > git bisect bad 48778464bb7d346b47157d21ffde2af6b2d39110
> > # good: [a98f670e41a99f53acb1fb33cee9c6abbb2e6f23] Merge tag 'media/v5.8-1' 
> > of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> > git bisect good a98f670e41a99f53acb1fb33cee9c6abbb2e6f23
> > # good: [081096d98bb23946f16215357b141c5616b234bf] Merge tag 'tty-5.8-rc1' 
> > of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > git bisect good 081096d98bb23946f16215357b141c5616b234bf
> > # bad: [3a2a8751742133a7bbc49b9d1bcbd52e212edff6] Merge tag 'for-v5.8' of 
> > git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
> > git bisect bad 3a2a8751742133a7bbc49b9d1bcbd52e212edff6
> > # bad: [a1e81f9654eef650d3ee35c94a8cab00b5cd379c] m68k: implement 
> > flush_icache_user_range
> > git bisect bad

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> Hi!
> 
> Few days ago I bisected a regression on 5.8 kernel:
> 
> I have nvidia rtx 2070s and its USB type C port driver (which is open source)
> started to crash on load:

...

> Reverting the commit helped fix this oops.
> 
> My .config attached.
> If any more info is needed I'll be happy to provide it,
> and of course test patches.

Can you test below?

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 1e6d75e65938..d58aa98fe964 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -721,7 +721,7 @@ struct fwnode_handle *device_get_next_child_node(struct device *dev,
 		return next;
 
 	/* When no more children in primary, continue with secondary */
-	if (!IS_ERR_OR_NULL(fwnode->secondary))
+	if (fwnode && !IS_ERR_OR_NULL(fwnode->secondary))
 		next = fwnode_get_next_child_node(fwnode->secondary, child);
 
 	return next;
-- 
With Best Regards,
Andy Shevchenko
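
For readers following the thread, here is a minimal user-space sketch of what
the added "fwnode &&" test guards against. It assumes, as the patch itself
suggests, that the device's fwnode pointer can be NULL. The struct layout, the
nchildren field, is_err_or_null() and next_child() are simplified stand-ins
invented for illustration; they are not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct fwnode_handle (illustration only). */
struct fwnode_handle {
	struct fwnode_handle *secondary;
	int nchildren;	/* hypothetical: how many children this node reports */
};

#define MAX_ERRNO 4095

/* Stand-in for IS_ERR_OR_NULL(): true for NULL or an errno-encoded pointer. */
int is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical placeholder for fwnode_get_next_child_node(); tolerates NULL. */
const struct fwnode_handle *next_child(const struct fwnode_handle *fwnode)
{
	if (is_err_or_null(fwnode) || fwnode->nchildren == 0)
		return NULL;
	return fwnode;	/* pretend the node itself is the child that was found */
}

/* Same shape as the patched device_get_next_child_node(). */
const struct fwnode_handle *get_next_child(const struct fwnode_handle *fwnode)
{
	const struct fwnode_handle *next = next_child(fwnode);

	if (next)
		return next;

	/* Without the "fwnode &&" test, a NULL fwnode would be dereferenced here. */
	if (fwnode && !is_err_or_null(fwnode->secondary))
		next = next_child(fwnode->secondary);

	return next;
}
```

With this sketch, get_next_child(NULL) returns NULL instead of faulting, and a
primary node without children falls through to its secondary node.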

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> Hi!
> 
> Few days ago I bisected a regression on 5.8 kernel:
> 
> I have nvidia rtx 2070s and its USB type C port driver (which is open source)
> started to crash on load:

I'm looking at this, but I have questions:
- Do you have any pointers to the device tree excerpt which this tries to
  iterate over?
- Can you provide the full Code: line?

The only way I can see this happening is that fwnode is not initialized
properly somewhere (meaning it has garbage in the secondary pointer).
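
As an aside on the Code: line requested above: in an x86 oops, the byte at the
faulting instruction pointer is wrapped in angle brackets, and the copy quoted
below appears to be cut short (each long line ends in ">"). The kernel tree
ships scripts/decodecode to disassemble such a line; the C helper below is
only a hypothetical sketch of how the marker can be located, and its name and
signature are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan the byte string of an oops "Code:" line and
 * report how many opcode bytes precede the <xx> marker, plus the value of
 * the marked byte. Returns 0 on success, -1 if there is no marker.
 */
int find_trap_byte(const char *code, int *bytes_before, unsigned int *trap_byte)
{
	const char *mark = strchr(code, '<');
	const char *p = code;
	char *end;
	int count = 0;

	if (!mark)
		return -1;

	/* Count the whitespace-separated hex bytes in front of the marker. */
	while (p < mark) {
		(void)strtoul(p, &end, 16);
		if (end == p)
			break;	/* nothing parsed: we reached the marker */
		count++;
		p = end;
	}

	*bytes_before = count;
	*trap_byte = (unsigned int)strtoul(mark + 1, NULL, 16);
	return 0;
}
```

Applied to the Code: line in this report, the marked byte is 0x48, which
begins the sequence 48 8b 03. If I decode that correctly, it is a load through
RBX, so a complete line helps tie the fault to one specific pointer
dereference.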

> [  +0.43] CPU: 19 PID: 31281 Comm: kworker/19:1 Tainted: PW  O
>   5.8.0-rc3.stable #133
> [  +0.45] Hardware name: Gigabyte Technology Co., Ltd. TRX40 
> DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020
> [  +0.30] Workqueue: events_long ucsi_init_work [typec_ucsi]
> [  +0.48] RIP: 0010:device_get_next_child_node+0x5b/0xb0
> [  +0.24] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 
> 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 
> 03 48 85 c0 74 f3 48>
> [  +0.65] RSP: 0018:c900038d7e08 EFLAGS: 00010246
> [  +0.44] RAX: 889fb6b62f00 RBX:  RCX: 
> 0001
> [  +0.27] RDX: 889fb6fd4a70 RSI:  RDI: 
> 889fb6b63608
> [  +0.46] RBP:  R08: 0001 R09: 
> 7fff
> [  +0.24] R10: 2075ce282580 R11: 0062de3e R12: 
> 889fb6b63608
> [  +0.43] R13: 0001 R14: 889fb6b63018 R15: 
> 0001
> [  +0.44] FS:  () GS:889fbe4c() 
> knlGS:
> [  +0.24] CS:  0010 DS:  ES:  CR0: 80050033
> [  +0.42] CR2:  CR3: 00175621b000 CR4: 
> 00340ea0
> [  +0.46] Call Trace:
> [  +0.30]  ucsi_init+0x213/0x530 [typec_ucsi]
> [  +0.28]  ucsi_init_work+0x12/0x20 [typec_ucsi]
> [  +0.49]  process_one_work+0x1d2/0x390
> [  +0.27]  worker_thread+0x4a/0x3b0
> [  +0.25]  ? process_one_work+0x390/0x390
> [  +0.49]  kthread+0xf9/0x130
> [  +0.26]  ? kthread_park+0x90/0x90
> [  +0.28]  ret_from_fork+0x1f/0x30
> [  +0.48] Modules linked in: ucsi_ccg typec_ucsi typec hfsplus cdrom ntfs 
> msdos vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost vhost_iotlb 
> tap xfs rfcomm xt_M>
> [  +0.39]  usb_storage ext4 mbcache jbd2 amdgpu gpu_sched ttm 
> drm_kms_helper syscopyarea sysfillrect ahci sysimgblt fb_sys_fops 
> crc32_pclmul libahci crc32c_intel igb ccp >
> [  +0.000289] CR2: 
> [  +0.26] ---[ end trace 38ebb9aebd55fbff ]---
> [  +0.014201] RIP: 0010:device_get_next_child_node+0x5b/0xb0
> [  +0.30] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 
> 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 
> 03 48 85 c0 74 f3 48>
> [  +0.75] RSP: 0018:c900038d7e08 EFLAGS: 00010246
> [  +0.27] RAX: 889fb6b62f00 RBX:  RCX: 
> 0001
> [  +0.48] RDX: 889fb6fd4a70 RSI:  RDI: 
> 889fb6b63608
> [  +0.49] RBP:  R08: 0001 R09: 
> 7fff
> [  +0.27] R10: 2075ce282580 R11: 0062de3e R12: 
> 889fb6b63608
> [  +0.49] R13: 0001 R14: 889fb6b63018 R15: 
> 0001
> [  +0.50] FS:  () GS:889fbe4c() 
> knlGS:
> [  +0.27] CS:  0010 DS:  ES:  CR0: 80050033
> [  +0.50] CR2:  CR3: 00175621b000 CR4: 
> 00340ea0
> 
> I bisected this, while passing the UCSI controller to a VM, and this
> is the result:
> 
> git bisect start
> # good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7
> git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
> # bad: [48778464bb7d346b47157d21ffde2af6b2d39110] Linux 5.8-rc2
> git bisect bad 48778464bb7d346b47157d21ffde2af6b2d39110
> # good: [a98f670e41a99f53acb1fb33cee9c6abbb2e6f23] Merge tag 'media/v5.8-1' 
> of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> git bisect good a98f670e41a99f53acb1fb33cee9c6abbb2e6f23
> # good: [081096d98bb23946f16215357b141c5616b234bf] Merge tag 'tty-5.8-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> git bisect good 081096d98bb23946f16215357b141c5616b234bf
> # bad: [3a2a8751742133a7bbc49b9d1bcbd52e212edff6] Merge tag 'for-v5.8' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
> git bisect bad 3a2a8751742133a7bbc49b9d1bcbd52e212edff6
> # bad: [a1e81f9654eef650d3ee35c94a8cab00b5cd379c] m68k: implement 
> flush_icache_user_range
> git bisect bad a1e81f9654eef650d3ee35c94a8cab00b5cd379c
> # good: [c336c022503d1be719ca06f2526c211709e3d2d3] staging: wfx: remove false 
> positive warning
> git bisect good c336c022503d1be719ca06f2526c211709e3d2d3
> # good: [05c8a4fc44a916dd897769ca69b42381f9177ec4] habanalabs: correctly cast 
> u64

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

On Thu, 2020-07-16 at 10:28 +0200, Greg KH wrote:
> On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > Hi!
> > 
> > Few days ago I bisected a regression on 5.8 kernel:
> > 
> > I have nvidia rtx 2070s and its USB type C port driver (which is open 
> > source)
> 
> Is that driver merged into the tree?  If not, do you have a pointer to
> it somewhere?
> 
> thanks,
> 
> greg k-h
> 
It is in the tree.

CONFIG_TYPEC_UCSI selects the generic UCSI driver.
CONFIG_UCSI_CCG selects the hardware driver, which is an i2c driver that
binds to an i2c device (I think at address 0x8) on an i2c controller. That
controller is exposed by function 3 of the NVIDIA card and uses the
CONFIG_I2C_NVIDIA_GPU driver.

We also have CONFIG_TYPEC_NVIDIA_ALTMODE, but I haven't looked into what
it does.

Best regards,
Maxim Levitsky

Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'

2020-07-16 Thread Greg KH

On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> Hi!
> 
> Few days ago I bisected a regression on 5.8 kernel:
> 
> I have nvidia rtx 2070s and its USB type C port driver (which is open source)

Is that driver merged into the tree?  If not, do you have a pointer to
it somewhere?

thanks,

greg k-h

Re: Kernel Oops on 4.8.0-rc8 while running trinity tests

2016-09-27 Thread Abdul Haleem


The kernel oops is still reproducible on 4.8.0-rc8 on PowerPC bare metal.

While running the trinity system call fuzzer, I see these kernel oops messages:

Unable to handle kernel paging request for data at address 0xe45f7702
Faulting instruction address: 0xc0055380
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: torture leds_powernv led_class powernv_op_panel powernv_rng rng_core autofs4 [last unloaded: rcutorture]

CPU: 28 PID: 19687 Comm: trinity-main Not tainted 4.8.0-rc8-autotest #1
task: c007dc61c600 task.stack: c007ddb2
NIP: c0055380 LR: c0234968 CTR: 
REGS: c007ddb23640 TRAP: 0300   Not tainted (4.8.0-rc8-autotest)
MSR: 90009033   CR: 24002442  XER: 


CFAR: c00087d0 DAR: e45f7702 DSISR: 4000 SOFTE: 1
GPR00: 0007 c007ddb238c0 c0f7c100 c000
GPR04:  0009  
GPR08: e45f7702  007f 0015
GPR12:  c000  1000
GPR16: 0001  c2e02798 10034120
GPR20: 10034108 c007ddf842e0 c0ff0df8 
GPR24: c1fff7ff  c007ddb23a60 0100
GPR28: 0100 c2e02400 c2e02464 
NIP [c0055380] __find_linux_pte_or_hugepte+0x1c0/0x330
LR [c0234968] __unmap_hugepage_range+0x178/0x670
Call Trace:
[c007ddb23980] [c0234e80] __unmap_hugepage_range_final+0x20/0x50
[c007ddb239b0] [c020a52c] unmap_single_vma+0xcc/0x120
[c007ddb239f0] [c020a984] unmap_vmas+0x84/0x120
[c007ddb23a40] [c0212c00] unmap_region+0xd0/0x1a0
[c007ddb23b30] [c0214e8c] do_munmap+0x2dc/0x4a0
[c007ddb23ba0] [c0216800] mmap_region+0x1c0/0x6e0
[c007ddb23c90] [c02170fc] do_mmap+0x3dc/0x4c0
[c007ddb23d20] [c01f1034] vm_mmap_pgoff+0xc4/0x100
[c007ddb23d90] [c02144d0] SyS_mmap_pgoff+0x100/0x2a0
[c007ddb23e10] [c0012424] sys_mmap+0x44/0x70
[c007ddb23e30] [c00095e0] system_call+0x38/0x108
Instruction dump:
7d290030 79081764 3929 3860 7d2a07b4 7c895c36 7d494838 78630044
7908f5e6 79291f24 7d081b78 796b0020 <7d49402a> 7c694214 2eaa f941ffd0
---[ end trace f4f25c6801290199 ]---


On Friday 26 August 2016 12:02 PM, Abdul Haleem wrote:

Hi,

Trinity tests failed on mainline 4.8.0-rc3 with the following error message:


Machine Type : PowerPC Bare Metal & also reproducible on PowerVM LPAR
config : attached

06:05:25 20:36:07 INFO | Test: running trinity tests
06:05:25 20:36:07 INFO | trinity
06:05:25 20:36:07 INFO | START trinity trinity timestamp=1471912567 localtime=Aug 22 20:36:07
06:06:19 Unable to handle kernel paging request for data at address 
0xe475e1dc0700

06:06:19 Faulting instruction address: 0xc00553a0
06:06:19 Oops: Kernel access of bad area, sig: 11 [#1]
06:06:19 SMP NR_CPUS=32 NUMA PowerNV
06:06:19 Modules linked in: torture iptable_mangle ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
nf_reject_ipv4 xt_tcpudp tun bridge stp llc iptable_filter ip_tables 
x_tables binfmt_misc kvm_hv kvm leds_powernv led_class 
powernv_op_panel powernv_rng rng_core autofs4 btrfs xor raid6_pq [last 
unloaded: rcutorture]
06:06:19 CPU: 24 PID: 16309 Comm: trinity-main Not tainted 
4.8.0-rc3-autotest #1

06:06:19 task: c007de33 task.stack: c007d85dc000
06:06:19 NIP: c00553a0 LR: c02345a8 CTR: 
06:06:19 REGS: c007d85df640 TRAP: 0300   Not tainted 
(4.8.0-rc3-autotest)
06:06:19 MSR: 90009033   CR: 
24002452  XER: 
06:06:19 CFAR: c00087d0 DAR: e475e1dc0700 DSISR: 4000 
SOFTE: 1
06:06:19 GPR00: 0007 c007d85df8c0 c0f7ad00 
c000
06:06:19 GPR04:  0009 0700 

06:06:19 GPR08: e475e1dc0700  007f 
0015
06:06:19 GPR12:  cfffe000  
1000
06:06:19 GPR16: 0001  c007ddfa6798 
100341e0
06:06:19 GPR20: 100341c8 c007dc336508 c0ff0df8 

06:06:19 GPR24: c1fff7ff  c007d85dfa60 
0100
06:06:19 GPR28: 0100 c007ddfa6400 c007ddfa6464 
0007

06:06:19 NIP [c00553a0] __find_linux_pte_or_hugepte+0x1c0/0x330
06:06:19 LR [c02345a8] __unmap_hugepage_range+0x178/0x670
06:06:19 Call Trace:
06:06:19 [c007d85df980] [c0234ac0] 
__unmap_hugepage_range_final+0x20/0x50
06:06:19

Re: kernel OOPS in MM(?)

2016-03-15 Thread Evgenii Lepikhin

Hello,

On 2016-03-10 12:31, Evgenii Lepikhin wrote:

> We need help to understand the source of the problem and maybe to create a 
> bug report. Here is the crash report:
>
> Mar 10 04:03:51 l28 kernel: [2075560.434445] BUG: unable to handle kernel 
> paging request at 40008021
> Mar 10 04:03:51 l28 kernel: [2075560.434669] IP: [] 
> __kmalloc+0x69/0x100
> Mar 10 04:03:51 l28 kernel: [2075560.434800] PGD b7e462067 PUD 0 
> Mar 10 04:03:51 l28 kernel: [2075560.434913] Oops:  [#1] SMP 
> Mar 10 04:03:51 l28 kernel: [2075560.435044] Modules linked in:
> tcm_loop iscsi_target_mod target_core_pscsi target_core_file
> target_core_iblock target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi fuse [last unloaded: ipfw_mod]
> Mar 10 04:03:51 l28 kernel: [2075560.435539] CPU: 4 PID: 27141 Comm: rm 
> Tainted: G   O 3.12.51-jl-2015-12-25 #1
> Mar 10 04:03:51 l28 kernel: [2075560.435734] Hardware name: Intel Corporation 
> S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 
> 02/26/2013
> Mar 10 04:03:51 l28 kernel: [2075560.435939] task: 880e622ccba0 ti: 
> 880eeb008000 task.ti: 880eeb008000
> Mar 10 04:03:51 l28 kernel: [2075560.436131] RIP: 0010:[]  
> [] __kmalloc+0x69/0x100
> Mar 10 04:03:51 l28 kernel: [2075560.436333] RSP: 0018:880eeb009b38  
> EFLAGS: 00010282
> Mar 10 04:03:51 l28 kernel: [2075560.436439] RAX:  RBX: 
>  RCX: a8a73dc2
> Mar 10 04:03:51 l28 kernel: [2075560.436632] RDX: a8a73dc1 RSI: 
>  RDI: 00013500
> Mar 10 04:03:51 l28 kernel: [2075560.438248] RBP: 880eeb009b58 R08: 
> 88103fc13500 R09: 811a0267
> Mar 10 04:03:51 l28 kernel: [2075560.438446] R10: 880eeb009d84 R11: 
>  R12: 88081f803a00
> Mar 10 04:03:51 l28 kernel: [2075560.438656] R13: 40008021 R14: 
> 0250 R15: 880250e833b0
> Mar 10 04:03:51 l28 kernel: [2075560.438851] FS:  7fe2316dd700() 
> GS:88103fc0() knlGS:
> Mar 10 04:03:51 l28 kernel: [2075560.439045] CS:  0010 DS:  ES:  CR0: 
> 80050033
> Mar 10 04:03:51 l28 kernel: [2075560.439152] CR2: 40008021 CR3: 
> 000a20736000 CR4: 000407e0
> Mar 10 04:03:51 l28 kernel: [2075560.439343] Stack:
> Mar 10 04:03:51 l28 kernel: [2075560.439439]   
> 0250 0060 
> Mar 10 04:03:51 l28 kernel: [2075560.439663]  880eeb009b88 
> 811a0267 881015fb7fe0 0060
> Mar 10 04:03:51 l28 kernel: [2075560.439898]  880250e83490 
>  880eeb009ba8 811a02f8
> Mar 10 04:03:51 l28 kernel: [2075560.440153] Call Trace:
> Mar 10 04:03:51 l28 kernel: [2075560.440257]  [] 
> kmem_alloc+0x67/0xe0
> Mar 10 04:03:51 l28 kernel: [2075560.440365]  [] 
> kmem_zalloc+0x18/0x40
> Mar 10 04:03:51 l28 kernel: [2075560.440473]  [] 
> xfs_log_commit_cil+0x373/0x4c0
> Mar 10 04:03:51 l28 kernel: [2075560.440585]  [] ? 
> xfs_bmap_search_multi_extents+0xe0/0x110
> Mar 10 04:03:51 l28 kernel: [2075560.440783]  [] 
> xfs_trans_commit+0x6c/0x250
> Mar 10 04:03:51 l28 kernel: [2075560.440899]  [] 
> xfs_bmap_finish+0xb7/0x1a0

Another issue on the same server, same instruction pointer:

Mar 16 04:53:54 l28 kernel: [521052.387878] BUG: unable to handle kernel paging 
request at 40008021
Mar 16 04:53:54 l28 kernel: [521052.388022] IP: [] 
__kmalloc+0x69/0x100
Mar 16 04:53:54 l28 kernel: [521052.388171] PGD 0 
Mar 16 04:53:54 l28 kernel: [521052.388289] Oops:  [#1] SMP 
Mar 16 04:53:54 l28 kernel: [521052.388410] Modules linked in: tcm_loop 
iscsi_target_mod target_core_pscsi target_core_file target_core_iblock 
target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi fuse
Mar 16 04:53:54 l28 kernel: [521052.388913] CPU: 6 PID: 5947 Comm: iscsi_trx 
Tainted: G   O 3.12.51-jl-2015-12-25 #1
Mar 16 04:53:54 l28 kernel: [521052.389125] Hardware name: Intel Corporation 
S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Mar 16 04:53:54 l28 kernel: [521052.389351] task: 88081a3a6720 ti: 
8808162de000 task.ti: 8808162de000
Mar 16 04:53:54 l28 kernel: [521052.389566] RIP: 0010:[]  
[] __kmalloc+0x69/0x100
Mar 16 04:53:54 l28 kernel: [521052.389782] RSP: 0018:8808162dfd18  EFLAGS: 
00010286
Mar 16 04:53:54 l28 kernel: [521052.389899] RAX:  RBX: 
880819a51800 RCX: 03b305d3
Mar 16 04:53:54 l28 kernel: [521052.390112] RDX: 03b305d2 RSI: 
 RDI: 00013500
Mar 16 04:53:54 l28 kernel: [521052.390309] RBP: 8808162dfd38 R08: 
88103fd13500 R09: a00e7072
Mar 16 04:53:54 l28 kernel: [521052.390503] R10: 0010 R11: 
0030 R12: 88081f803a00
Mar 16 04:53:54 l28 kernel: [521052.390694] R13: 40008021 R14: 
80d0 R15: 8808162dfdd0
Mar 16

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-24 Thread Minchan Kim

On Thu, Nov 19, 2015 at 08:58:27AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> > On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > > > > I couldn't see any problem.
> > > > > > > > 
> > > > > > > > However, in this round, I did another test, which is the same one
> > > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > > (memcg things/kill/swapoff), for a long-lived run of the test program.
> > > > > > > 
> > > > > > > Could you share updated test?
> > > > > > 
> > > > > > It's part of my testing suite so I should factor it out.
> > > > > > I will send it when I go to office tomorrow.
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > > > 
> > > > > > Before leaving the office, I queued it up and the result is below.
> > > > > > It seems you already fixed it but didn't apply it to mmotm yet. Right?
> > > > > > Anyway, please confirm and tell me which additional patches I should
> > > > > > add to mmotm-2015-11-10-15-53 to follow up on your many recent bug
> > > > > > fix patches.
> > > > > 
> > > > > My two patches that are not in the mmotm-2015-11-10-15-53 release:
> > > > > 
> > > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > > > 
> > > > 1. mm: fix __page_mapcount()
> > > > 2. thp: fix leak due split_huge_page() vs. exit race
> > > > 
> > > > If I missed some patches, let me know it.
> > > > 
> > > > I applied the above two patches on top of mmotm-2015-11-10-15-53 and
> > > > tested again.
> > > > But unfortunately, the result was below.
> > > > 
> > > > Now, I am making a test program I can send to you, but it does not seem
> > > > to be easy, because small changes for factoring it out of the testing
> > > > suite seem to change something (e.g., timing) and make it hard to
> > > > reproduce. I will try it again.
> > > 
> > > Your test suite seems to generate quite a few bug reports. Would you mind
> > > making the whole suite public?
> > 
> > It's tough because it includes company-internal stuff.
> > That's why I am trying to factor out the part I can share, but unfortunately
> > I haven't found the time to retry until now. :(
> > 
> > >  
> > > > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 
> > > > index:0x60e02
> > > > flags: 0x40040018(uptodate|dirty|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > > page->mem_cgroup:880077cf0c00
> > > > [ cut here ]
> > > > kernel BUG at mm/huge_memory.c:3272!
> > > > invalid opcode:  [#1] SMP 
> > > > Dumping ftrace buffer:
> > > >(ftrace buffer empty)
> > > > Modules linked in:
> > > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
> > > > 01/01/2011
> > > > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > > > RIP: 0010:[]  [] 
> > > > split_huge_page_to_list+0x8fb/0x910
> > > > RSP: 0018:88007344f968  EFLAGS: 00010286
> > > > RAX: 0021 RBX: ea240080 RCX: 
> > > > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > > > RBP: 88007344f9e8 R08:  R09: 880bc600
> > > > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > > > R13: ea240088 R14: ea240080 R15: 
> > > > FS:  () GS:88007830() 
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 8005003b
> > > > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > > > Stack:
> > > >  cccd ea240080 88007344fa00 ea240088
> > > >  88007344fa00  88007344f9e8 810f0200
> > > >  ea24   ea240080
> > > > Call Trace:
> > > >  [] ? __lock_page+0xa0/0xb0
> > > >  [] deferred_split_scan+0x115/0x240
> > > >  [] ? list_lru_count_one+0x1c/0x30
> > > >  [] shrink_slab.part.42+0x1e3/0x350
> > > >  [] shrink_zone+0x26a/0x280
> > > >  [] do_try_to_free_pages+0x12d/0x3b0
> > > >  [] try_to_free_pages+0xb4/0x140
> > > >  [] __alloc_pages_nodemask+0x459/0x920
> > > >  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > > >  [] khugepaged+0x155/0x1b10
> > > >

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-19 Thread yalin wang


> On Nov 19, 2015, at 14:58, Kirill A. Shutemov wrote:
> 
> uncharged
I also hit this crash.

I also hit a crash like this in QEMU:


[2.703436] [] do_execveat_common.isra.36+0x4f0/0x630
[2.703624] [] do_execve+0x24/0x30
[2.703767] [] SyS_execve+0x1c/0x2c
[2.703923] BUG: Bad page map in process init  pte:604837ebd3 
pmd:b29e7003
[2.704140] page:ffc07f00af80 count:2 mapcount:-1 mapping:  
(null) index:0x1
[2.704414] flags: 0x4014(referenced|dirty)
[2.704563] page dumped because: bad pte
[2.704666] addr:007fafb7e000 vm_flags:00100073 
anon_vma:ffc0729bdb90 mapping:  (null) index:7fafb7e
[2.704906] file:  (null) fault:  (null) mmap:  
(null) readpage:  (null)
[2.705117] CPU: 0 PID: 84 Comm: init Tainted: GB   
4.2.0ajb-5-g11a9bf3 #80
[2.705315] Hardware name: ranchu (DT)
[2.705408] Call trace:
[2.705488] [] dump_backtrace+0x0/0x124
[2.705657] [] show_stack+0x10/0x1c
[2.705797] [] dump_stack+0x78/0x98
[2.705971] [] print_bad_pte+0x154/0x1f0
[2.706102] [] unmap_single_vma+0x574/0x704
[2.706236] [] unmap_vmas+0x54/0x70
[2.706354] [] exit_mmap+0x88/0xfc
[2.706473] [] mmput+0x48/0xe8
[2.706584] [] flush_old_exec+0x30c/0x79c
[2.706719] [] load_elf_binary+0x21c/0x1098
[2.706856] [] search_binary_handler+0xa8/0x224
[2.706995] [] do_execveat_common.isra.36+0x4f0/0x630
[2.707144] [] do_execve+0x24/0x30
[2.707263] [] SyS_execve+0x1c/0x2c
[2.707392] BUG: Bad page map in process init  pte:604837fbd3 
pmd:b29e7003
[2.707752] page:ffc07f00afc0 count:2 mapcount:-1 mapping:  
(null) index:0x1
[2.708167] flags: 0x4014(referenced|dirty)
[2.708333] page dumped because: bad pte
[2.708501] addr:007fafb7f000 vm_flags:00100073 
anon_vma:ffc0729bdb90 mapping:  (null) index:7fafb7f
[2.709084] file:  (null) fault:  (null) mmap:  
(null) readpage:  (null)
[2.709306] CPU: 0 PID: 84 Comm: init Tainted: GB   
4.2.0ajb-5-g11a9bf3 #80
[2.709494] Hardware name: ranchu (DT)

It seems the page mapcount is not correct.
My build is based on mmotm-2015-10-21-14-41.

Thanks




Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-18 Thread Kirill A. Shutemov

On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > > > I couldn't see any problem.
> > > > > > > 
> > > > > > > However, in this round, I did another test, which is the same one
> > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > (memcg things/kill/swapoff), for a long-lived run of the test program.
> > > > > > 
> > > > > > Could you share updated test?
> > > > > 
> > > > > It's part of my testing suite so I should factor it out.
> > > > > I will send it when I go to office tomorrow.
> > > > 
> > > > Thanks.
> > > > 
> > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > > 
> > > > > Before leaving the office, I queued it up and the result is below.
> > > > > It seems you already fixed it but didn't apply it to mmotm yet. Right?
> > > > > Anyway, please confirm and tell me which additional patches I should
> > > > > add to mmotm-2015-11-10-15-53 to follow up on your many recent bug
> > > > > fix patches.
> > > > 
> > > > My two patches that are not in the mmotm-2015-11-10-15-53 release:
> > > > 
> > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > > 
> > > 1. mm: fix __page_mapcount()
> > > 2. thp: fix leak due split_huge_page() vs. exit race
> > > 
> > > If I missed some patches, let me know it.
> > > 
> > > I applied the above two patches on top of mmotm-2015-11-10-15-53 and
> > > tested again.
> > > But unfortunately, the result was below.
> > > 
> > > Now, I am making a test program I can send to you, but it does not seem
> > > to be easy, because small changes for factoring it out of the testing
> > > suite seem to change something (e.g., timing) and make it hard to
> > > reproduce. I will try it again.
> > 
> > Your test suite seems to generate quite a few bug reports. Would you mind
> > making the whole suite public?
> 
> It's tough because it includes company-internal stuff.
> That's why I am trying to factor out the part I can share, but unfortunately
> I haven't found the time to retry until now. :(
> 
> >  
> > > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 
> > > index:0x60e02
> > > flags: 0x40040018(uptodate|dirty|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > page->mem_cgroup:880077cf0c00
> > > [ cut here ]
> > > kernel BUG at mm/huge_memory.c:3272!
> > > invalid opcode:  [#1] SMP 
> > > Dumping ftrace buffer:
> > >(ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
> > > 01/01/2011
> > > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > > RIP: 0010:[]  [] 
> > > split_huge_page_to_list+0x8fb/0x910
> > > RSP: 0018:88007344f968  EFLAGS: 00010286
> > > RAX: 0021 RBX: ea240080 RCX: 
> > > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > > RBP: 88007344f9e8 R08:  R09: 880bc600
> > > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > > R13: ea240088 R14: ea240080 R15: 
> > > FS:  () GS:88007830() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 8005003b
> > > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > > Stack:
> > >  cccd ea240080 88007344fa00 ea240088
> > >  88007344fa00  88007344f9e8 810f0200
> > >  ea24   ea240080
> > > Call Trace:
> > >  [] ? __lock_page+0xa0/0xb0
> > >  [] deferred_split_scan+0x115/0x240
> > >  [] ? list_lru_count_one+0x1c/0x30
> > >  [] shrink_slab.part.42+0x1e3/0x350
> > >  [] shrink_zone+0x26a/0x280
> > >  [] do_try_to_free_pages+0x12d/0x3b0
> > >  [] try_to_free_pages+0xb4/0x140
> > >  [] __alloc_pages_nodemask+0x459/0x920
> > >  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > >  [] khugepaged+0x155/0x1b10
> > >  [] ? prepare_to_wait_event+0xf0/0xf0
> > >  [] ? __split_huge_pmd_locked+0x4e0/0x4e0
> > >  [] kthread+0xc9/0xe0
> > >  [] ? kthread_park+0x60/0x60
> > >  [] ret_from_fork+0x3f/0x70
> > >  [] ? kthread_park+0x60/0x60
> > > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-18 Thread Minchan Kim

On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > > I couldn't see any problem.
> > > > > > 
> > > > > > However, in this round, I did another test, which is the same one
> > > > > > I attached but a little bit different because it doesn't do
> > > > > > (memcg things/kill/swapoff), for a long-lived run of the test program.
> > > > > 
> > > > > Could you share updated test?
> > > > 
> > > > It's part of my testing suite so I should factor it out.
> > > > I will send it when I go to office tomorrow.
> > > 
> > > Thanks.
> > > 
> > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > 
> > > > Before leaving the office, I queued it up and the result is below.
> > > > It seems you already fixed it but didn't apply it to mmotm yet. Right?
> > > > Anyway, please confirm and tell me which additional patches I should
> > > > add to mmotm-2015-11-10-15-53 to follow up on your many recent bug
> > > > fix patches.
> > > 
> > > My two patches that are not in the mmotm-2015-11-10-15-53 release:
> > > 
> > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> > 
> > 1. mm: fix __page_mapcount()
> > 2. thp: fix leak due split_huge_page() vs. exit race
> > 
> > If I missed some patches, let me know it.
> > 
> > I applied the above two patches on top of mmotm-2015-11-10-15-53 and
> > tested again.
> > But unfortunately, the result was below.
> > 
> > Now, I am making a test program I can send to you, but it does not seem
> > to be easy, because small changes for factoring it out of the testing
> > suite seem to change something (e.g., timing) and make it hard to
> > reproduce. I will try it again.
> 
> Your test suite seems to generate quite a few bug reports. Would you mind
> making the whole suite public?

It's tough because it includes company-internal stuff.
That's why I am trying to factor out the part I can share, but unfortunately
I haven't found the time to retry until now. :(

>  
> > page:ea240080 count:2 mapcount:1 mapping:88007eff3321 
> > index:0x60e02
> > flags: 0x40040018(uptodate|dirty|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > page->mem_cgroup:880077cf0c00
> > [ cut here ]
> > kernel BUG at mm/huge_memory.c:3272!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >(ftrace buffer empty)
> > Modules linked in:
> > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> > RIP: 0010:[]  [] 
> > split_huge_page_to_list+0x8fb/0x910
> > RSP: 0018:88007344f968  EFLAGS: 00010286
> > RAX: 0021 RBX: ea240080 RCX: 
> > RDX: 0001 RSI: 0246 RDI: 821df4d8
> > RBP: 88007344f9e8 R08:  R09: 880bc600
> > R10: 8163e2c0 R11: 4b47 R12: ea240080
> > R13: ea240088 R14: ea240080 R15: 
> > FS:  () GS:88007830() knlGS:
> > CS:  0010 DS:  ES:  CR0: 8005003b
> > CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> > Stack:
> >  cccd ea240080 88007344fa00 ea240088
> >  88007344fa00  88007344f9e8 810f0200
> >  ea24   ea240080
> > Call Trace:
> >  [] ? __lock_page+0xa0/0xb0
> >  [] deferred_split_scan+0x115/0x240
> >  [] ? list_lru_count_one+0x1c/0x30
> >  [] shrink_slab.part.42+0x1e3/0x350
> >  [] shrink_zone+0x26a/0x280
> >  [] do_try_to_free_pages+0x12d/0x3b0
> >  [] try_to_free_pages+0xb4/0x140
> >  [] __alloc_pages_nodemask+0x459/0x920
> >  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> >  [] khugepaged+0x155/0x1b10
> >  [] ? prepare_to_wait_event+0xf0/0xf0
> >  [] ? __split_huge_pmd_locked+0x4e0/0x4e0
> >  [] kthread+0xc9/0xe0
> >  [] ? kthread_park+0x60/0x60
> >  [] ret_from_fork+0x3f/0x70
> >  [] ? kthread_park+0x60/0x60
> > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 
> > e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 
> > c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> > RIP  [] split_huge_page_to_list+0x8fb/0x910
> >  RSP 
> > ---[ end trace 0ee39378e850d8de ]---
> > Kernel panic - not syncing: Fatal

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-17 Thread Kirill A. Shutemov

On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > > I couldn't see any problem.
> > > > > 
> > > > > However, in this round, I did another test which is same one
> > > > > I attached but a little bit different because it doesn't do
> > > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > > 
> > > > Could you share updated test?
> > > 
> > > It's part of my testing suite so I should factor it out.
> > > I will send it when I go to office tomorrow.
> > 
> > Thanks.
> > 
> > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > 
> > > > Before leaving office, I queued it up and result is below.
> > > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > > Anyway, please confirm and say to me what I should add more patches
> > > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > > fix patches.
> > 
> > The two my patches which are not in the mmotm-2015-11-10-15-53 release:
> > 
> > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com
> 
> 1. mm: fix __page_mapcount()
> 2. thp: fix leak due split_huge_page() vs. exit race
> 
> If I missed some patches, let me know it.
> 
> I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
> But unfortunately, the result was below.
> 
> Now, I am making test program I can send to you but it seems to be not easy
> because small changes for factoring it out from testing suite seems to change
> something(ex, timing) and makes hard to reproduce. I will try it again.

Your test suite seems to generate quite a few bug reports. Would you mind
making the whole suite public?
 
> page:ea240080 count:2 mapcount:1 mapping:88007eff3321 
> index:0x60e02
> flags: 0x40040018(uptodate|dirty|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3272!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
> RIP: 0010:[]  [] 
> split_huge_page_to_list+0x8fb/0x910
> RSP: 0018:88007344f968  EFLAGS: 00010286
> RAX: 0021 RBX: ea240080 RCX: 
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 88007344f9e8 R08:  R09: 880bc600
> R10: 8163e2c0 R11: 4b47 R12: ea240080
> R13: ea240088 R14: ea240080 R15: 
> FS:  () GS:88007830() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
> Stack:
>  cccd ea240080 88007344fa00 ea240088
>  88007344fa00  88007344f9e8 810f0200
>  ea24   ea240080
> Call Trace:
>  [] ? __lock_page+0xa0/0xb0
>  [] deferred_split_scan+0x115/0x240
>  [] ? list_lru_count_one+0x1c/0x30
>  [] shrink_slab.part.42+0x1e3/0x350
>  [] shrink_zone+0x26a/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_pages+0xb4/0x140
>  [] __alloc_pages_nodemask+0x459/0x920
>  [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
>  [] khugepaged+0x155/0x1b10
>  [] ? prepare_to_wait_event+0xf0/0xf0
>  [] ? __split_huge_pmd_locked+0x4e0/0x4e0
>  [] kthread+0xc9/0xe0
>  [] ? kthread_park+0x60/0x60
>  [] ret_from_fork+0x3f/0x70
>  [] ? kthread_park+0x60/0x60
> Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 
> 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 
> c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> RIP  [] split_huge_page_to_list+0x8fb/0x910
>  RSP 
> ---[ end trace 0ee39378e850d8de ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled

I looked more into it. It seems to be a race between split_huge_page() and
deferred_split_scan(), as the dumped page is not huge.

Could you check if the patch below makes any difference to the situation?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 91e2f4b7ca39..923c0f6eb50a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3186,13 +3186,6 @@ static

Re: kernel oops on mmotm-2015-10-15-15-20

On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > During the test with MADV_FREE on kernel I applied your patches,
> > > > I couldn't see any problem.
> > > > 
> > > > However, in this round, I did another test which is same one
> > > > I attached but a little bit different because it doesn't do
> > > > (memcg things/kill/swapoff) for testing program long-live test.
> > > 
> > > Could you share updated test?
> > 
> > It's part of my testing suite so I should factor it out.
> > I will send it when I go to office tomorrow.
> 
> Thanks.
> 
> > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > 
> > Before leaving office, I queued it up and result is below.
> > It seems you fixed already but didn't apply it to mmotm yet. Right?
> > Anyway, please confirm and say to me what I should add more patches
> > into mmotm-2015-11-10-15-53 for follow up your recent many bug
> > fix patches.
> 
> The two my patches which are not in the mmotm-2015-11-10-15-53 release:
> 
> http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

1. mm: fix __page_mapcount()
2. thp: fix leak due split_huge_page() vs. exit race

If I missed any patches, please let me know.

I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested
again. Unfortunately, the result is below.

I am now writing a test program that I can send to you, but it is not easy:
small changes made while factoring it out of my testing suite seem to change
something (e.g., timing) and make the problem hard to reproduce. I will try
again.


page:ea240080 count:2 mapcount:1 mapping:88007eff3321 
index:0x60e02
flags: 0x40040018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/huge_memory.c:3272!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[]  [] 
split_huge_page_to_list+0x8fb/0x910
RSP: 0018:88007344f968  EFLAGS: 00010286
RAX: 0021 RBX: ea240080 RCX: 
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344f9e8 R08:  R09: 880bc600
R10: 8163e2c0 R11: 4b47 R12: ea240080
R13: ea240088 R14: ea240080 R15: 
FS:  () GS:88007830() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
Stack:
 cccd ea240080 88007344fa00 ea240088
 88007344fa00  88007344f9e8 810f0200
 ea24   ea240080
Call Trace:
 [] ? __lock_page+0xa0/0xb0
 [] deferred_split_scan+0x115/0x240
 [] ? list_lru_count_one+0x1c/0x30
 [] shrink_slab.part.42+0x1e3/0x350
 [] shrink_zone+0x26a/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 [] try_to_free_pages+0xb4/0x140
 [] __alloc_pages_nodemask+0x459/0x920
 [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
 [] khugepaged+0x155/0x1b10
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 
94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 
77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
RIP  [] split_huge_page_to_list+0x8fb/0x910
 RSP 
---[ end trace 0ee39378e850d8de ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel oops on mmotm-2015-10-15-15-20

On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > During the test with MADV_FREE on kernel I applied your patches,
> > > I couldn't see any problem.
> > > 
> > > However, in this round, I did another test which is same one
> > > I attached but a little bit different because it doesn't do
> > > (memcg things/kill/swapoff) for testing program long-live test.
> > 
> > Could you share updated test?
> 
> It's part of my testing suite so I should factor it out.
> I will send it when I go to office tomorrow.

Thanks.

> > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> 
> Before leaving office, I queued it up and result is below.
> It seems you fixed already but didn't apply it to mmotm yet. Right?
> Anyway, please confirm and say to me what I should add more patches
> into mmotm-2015-11-10-15-53 for follow up your recent many bug
> fix patches.

My two patches that are not in the mmotm-2015-11-10-15-53 release:

http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

-- 
 Kirill A. Shutemov

Re: kernel oops on mmotm-2015-10-15-15-20

On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > During the test with MADV_FREE on kernel I applied your patches,
> > I couldn't see any problem.
> > 
> > However, in this round, I did another test which is same one
> > I attached but a little bit different because it doesn't do
> > (memcg things/kill/swapoff) for testing program long-live test.
> 
> Could you share updated test?

It's part of my testing suite so I should factor it out.
I will send it when I go to office tomorrow.

> 
> And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

Before leaving the office, I queued it up, and the result is below.
It seems you have already fixed this but the fix is not in mmotm yet. Right?
Anyway, please confirm and tell me which additional patches I should apply
on top of mmotm-2015-11-10-15-53 to pick up your many recent bug
fixes.

Thanks.

page:ea553fc0 count:3 mapcount:1 mapping:88007f717a01 
index:0x602ff
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:880077cf0c00
[ cut here ]
kernel BUG at mm/migrate.c:889!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 10 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[]  [] migrate_pages+0x8e6/0x950
RSP: 0018:88007344fa00  EFLAGS: 00010282
RAX: 0021 RBX: ea0001a0bbc0 RCX: 
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344fa80 R08:  R09: 880b9540
R10: 8163e2c0 R11: 02c2 R12: 
R13: ea553f80 R14: ea553fc0 R15: 8189db40
FS:  () GS:88007834() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f45cc0091d8 CR3: 7eba7000 CR4: 06a0
Stack:
 880073441a40   
 81114880  81116420 ea553fe0
 88007344fb30 88007344fb20  88007344fb20
Call Trace:
 [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [] ? isolate_freepages_block+0x3d0/0x3d0
 [] compact_zone+0x2bb/0x720
 [] ? list_del+0xd/0x30
 [] compact_zone_order+0x6d/0xa0
 [] try_to_compact_pages+0xed/0x200
 [] __alloc_pages_direct_compact+0x3b/0xd4
 [] __alloc_pages_nodemask+0x3fb/0x920
 [] khugepaged+0x155/0x1b10
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 
f4 fa ff ff 48 c7 c6 b8 f6 77 81 4c 89 f7 e8 fa 36 fd ff <0f> 0b 48 83 e8 01 e9 
d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff 
RIP  [] migrate_pages+0x8e6/0x950
 RSP 
---[ end trace 337555313b7e45be ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled


Re: kernel oops on mmotm-2015-10-15-15-20

On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> During the test with MADV_FREE on kernel I applied your patches,
> I couldn't see any problem.
> 
> However, in this round, I did another test which is same one
> I attached but a little bit different because it doesn't do
> (memcg things/kill/swapoff) for testing program long-live test.

Could you share updated test?

And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

> With that, I encountered this problem.
> 
> page:eaf60080 count:1 mapcount:0 mapping:88007f584691 
> index:0x62a02
> flags: 0x4006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3340!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b0f1a40 ti: 88004ced4000 task.ti: 88004ced4000
> RIP: 0010:[]  [] 
> split_huge_page_to_list+0x907/0x920
> RSP: 0018:88004ced7a38  EFLAGS: 00010296
> RAX: 0021 RBX: eaf60080 RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 88004ced7ab8 R08:  R09: 880bc560
> R10: 8163d880 R11: 00014f25 R12: eaf60080
> R13: eaf60088 R14: eaf60080 R15: 
> FS:  7f43d3ced740() GS:8800782e() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7ff1f6fcdb98 CR3: 4cf56000 CR4: 06a0
> Stack:
>  cccd eaf60080 88004ced7ad0 eaf60088
>  88004ced7ad0  88004ced7ab8 810ef9d0
>  eaf6   eaf60080
> Call Trace:
>  [] ? __lock_page+0xa0/0xb0
>  [] deferred_split_scan+0x11c/0x260
>  [] ? list_lru_count_one+0x1c/0x30
>  [] shrink_slab.part.42+0x1e3/0x350
>  [] shrink_zone+0x26a/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_pages+0xb4/0x140
>  [] __alloc_pages_nodemask+0x459/0x920
>  [] handle_mm_fault+0xc77/0x1000
>  [] ? retint_kernel+0x10/0x10
>  [] __do_page_fault+0x189/0x400
>  [] do_page_fault+0xc/0x10
>  [] page_fault+0x22/0x30
> Code: ff ff 48 c7 c6 f0 b2 77 81 4c 89 f7 e8 13 c3 fc ff 0f 0b 48 83 e8 01 e9 
> 88 f7 ff ff 48 c7 c6 70 a1 77 81 4c 89 f7 e8 f9 c2 fc ff <0f> 0b 48 c7 c6 38 
> af 77 81 4c 89 e7 e8 e8 c2 fc ff 0f 0b 66 0f 
> RIP  [] split_huge_page_to_list+0x907/0x920
>  RSP 
> ---[ end trace c9a60522e3a296e4 ]---

I don't see how that is possible: we call lock_page() just before
split_huge_page() in deferred_split_scan().
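
For readers following along, here is a minimal userspace sketch of the
invariant under discussion. It is not the kernel code: model_page,
model_split() and model_deferred_scan() are hypothetical names made up for
illustration, and the real functions do far more. It only shows why a caller
that takes the lock right before splitting should never trip a
"page must be locked" check in the callee.

```c
/* Userspace model only; every name below is hypothetical. */
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool locked;	/* stands in for the page lock bit */
	bool huge;	/* stands in for "this is a huge page" */
};

/*
 * Model of a split routine with a "caller holds the lock" precondition.
 * Returns 1 if the page was split, 0 if there was nothing to split,
 * and -1 where the kernel would hit VM_BUG_ON_PAGE(!PageLocked(page)).
 */
int model_split(struct model_page *page)
{
	if (!page->locked)
		return -1;
	if (!page->huge)
		return 0;
	page->huge = false;
	return 1;
}

/*
 * Model of a scan loop: lock, split, unlock, for each queued page.
 * Returns how many pages were actually split.
 */
int model_deferred_scan(struct model_page *queue, size_t n)
{
	int nr_split = 0;

	for (size_t i = 0; i < n; i++) {
		queue[i].locked = true;		/* models lock_page() */
		if (model_split(&queue[i]) == 1)
			nr_split++;
		queue[i].locked = false;	/* models unlock_page() */
	}
	return nr_split;
}
```

In this model the -1 path is unreachable from model_deferred_scan(), which
is the reason the reported BUG looks surprising at first glance.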

> So, I reverted all MADV_FREE patches and replaced it with MADV_DONTNEED.
> This time, I saw the oops below.
> If I missed something, please let me know.
> 
> [ cut here ]
> kernel BUG at include/linux/swapops.h:129!

Looks similar to what I fixed by inserting smp_wmb() just before
clear_compound_head() in __split_huge_page_tail().

Do you have this in place, as in the latest -mm tree?
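
As a rough illustration of the ordering problem that a write barrier
addresses, here is a userspace sketch using C11 atomics. It is an analogy
under stated assumptions, not the kernel fix: model_tail, model_publish()
and model_read() are hypothetical names, and a C11 release fence is somewhat
stronger than smp_wmb(), which only orders stores against stores. The idea is
that all fields are written first, then the barrier, then the store that
makes the object visible as independent.

```c
/* Userspace analogy only; every name below is hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>

struct model_tail {
	int payload;			/* state that must be set up first */
	atomic_bool independent;	/* stands in for "head link cleared" */
};

/* Writer: initialize, fence, then publish. */
void model_publish(struct model_tail *t, int value)
{
	t->payload = value;
	/* Plays roughly the role of smp_wmb() before the publishing store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&t->independent, true, memory_order_relaxed);
}

/* Reader: only touch the payload after observing the published flag. */
bool model_read(struct model_tail *t, int *out)
{
	if (!atomic_load_explicit(&t->independent, memory_order_acquire))
		return false;
	*out = t->payload;
	return true;
}
```

Without the fence, a reader on another CPU could in principle observe the
flag before the payload store, which is the kind of window a missing barrier
leaves open.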

> Another hit:
> 
> page:ea520080 count:2 mapcount:0 mapping:880072b38a51 
> index:0x62602
> flags: 0x40048028(uptodate|lru|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:880077cf0c00
> [ cut here ]
> kernel BUG at mm/huge_memory.c:3306!

The same as the first one: no idea.

-- 
 Kirill A. Shutemov

Re: kernel oops on mmotm-2015-10-15-15-20

On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > While testing MADV_FREE on a kernel with your patches applied,
> > > > I couldn't see any problem.
> > > > 
> > > > However, in this round I ran another test. It is the same one
> > > > I attached, with a small difference: it doesn't do the
> > > > (memcg things/kill/swapoff) steps, so that it tests a long-lived program.
> > > 
> > > Could you share updated test?
> > 
> > It's part of my testing suite so I should factor it out.
> > I will send it when I go to office tomorrow.
> 
> Thanks.
> 
> > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > 
> > Before leaving the office, I queued it up and the result is below.
> > It seems you have already fixed it but the fix isn't in mmotm yet. Right?
> > Anyway, please confirm and tell me which patches I should add on top
> > of mmotm-2015-11-10-15-53 to pick up your many recent bug
> > fixes.
> 
> My two patches that are not in the mmotm-2015-11-10-15-53 release:
> 
> http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shute...@linux.intel.com
> http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shute...@linux.intel.com

1. mm: fix __page_mapcount()
2. thp: fix leak due split_huge_page() vs. exit race

If I missed any patches, let me know.

I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
Unfortunately, the result was as below.

I am now preparing a test program I can send you, but it is not easy: even
small changes made while factoring it out of the testing suite seem to change
something (e.g., timing) and make the problem hard to reproduce. I will try again.


page:ea240080 count:2 mapcount:1 mapping:88007eff3321 index:0x60e02
flags: 0x40040018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:880077cf0c00
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:3272!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880073441a40 ti: 88007344c000 task.ti: 88007344c000
RIP: 0010:[]  [] split_huge_page_to_list+0x8fb/0x910
RSP: 0018:88007344f968  EFLAGS: 00010286
RAX: 0021 RBX: ea240080 RCX: 
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88007344f9e8 R08:  R09: 880bc600
R10: 8163e2c0 R11: 4b47 R12: ea240080
R13: ea240088 R14: ea240080 R15: 
FS:  () GS:88007830() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7ffd59edcd68 CR3: 01808000 CR4: 06a0
Stack:
 cccd ea240080 88007344fa00 ea240088
 88007344fa00  88007344f9e8 810f0200
 ea24   ea240080
Call Trace:
 [] ? __lock_page+0xa0/0xb0
 [] deferred_split_scan+0x115/0x240
 [] ? list_lru_count_one+0x1c/0x30
 [] shrink_slab.part.42+0x1e3/0x350
 [] shrink_zone+0x26a/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 [] try_to_free_pages+0xb4/0x140
 [] __alloc_pages_nodemask+0x459/0x920
 [] ? trace_event_raw_event_tick_stop+0xd0/0xd0
 [] khugepaged+0x155/0x1b10
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
RIP  [] split_huge_page_to_list+0x8fb/0x910
 RSP 
---[ end trace 0ee39378e850d8de ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
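
The "flags:" lines in the page dumps of this thread pair a hex value with a list of flag names. As a reading aid, here is a minimal userspace sketch that decodes such a value. The bit positions in the table are an assumption: they are inferred only from the three dumps quoted in this thread (which all agree with each other), not copied from the kernel's page-flags.h, and they change between kernel versions and configs. The helper name decode_page_flags() is hypothetical, and bits outside the table (for example the upper zone/node bits) are simply ignored.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table: bit positions inferred from the dumps in this thread. */
struct flag_name {
	unsigned int bit;
	const char *name;
};

static const struct flag_name flag_names[] = {
	{ 0, "locked" },     { 3, "uptodate" },   { 4, "dirty" },
	{ 5, "lru" },        { 13, "writeback" }, { 15, "swapcache" },
	{ 17, "reclaim" },   { 18, "swapbacked" },
};

/* Write the names of all known set bits into buf, separated by '|'. */
static char *decode_page_flags(unsigned long flags, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		int n;

		if (!(flags & (1UL << flag_names[i].bit)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer too small: stop, output is truncated */
		used += (size_t)n;
	}
	return buf;
}
```

Called with the value from the dump above (0x40040018, as it appears in this archive) and a 128-byte buffer, the helper reproduces the name list printed by the kernel for that page.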

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-15 Thread Minchan Kim

On Thu, Nov 12, 2015 at 09:36:14AM +0900, Minchan Kim wrote:



> > > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> > > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> > > MADV_FREE code in there
> > >  + pte_mkdirty patch
> > >  + freeze/unfreeze patch
> > >  + do_page_add_anon_rmap patch
> > >  + above split_huge_pmd
> > > 
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: Bad rss-counter state mm:88007fa3bb80 idx:1 val:512
> > 
> > With the patch below my test setup ran for 2+ days without triggering the
> > bug. The split_huge_pmd patch should be dropped.
> > 
> > Please test.
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 14cbbad54a3e..7aa0a3fef2aa 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2841,9 +2841,6 @@ static void __split_huge_pmd_locked(struct 
> > vm_area_struct *vma, pmd_t *pmd,
> > write = pmd_write(*pmd);
> > young = pmd_young(*pmd);
> >  
> > -   /* leave pmd empty until pte is filled */
> > -   pmdp_huge_clear_flush_notify(vma, haddr, pmd);
> > -
> > pgtable = pgtable_trans_huge_withdraw(mm, pmd);
> > pmd_populate(mm, &_pmd, pgtable);
> >  
> > @@ -2893,6 +2890,28 @@ static void __split_huge_pmd_locked(struct 
> > vm_area_struct *vma, pmd_t *pmd,
> > }
> >  
> > smp_wmb(); /* make pte visible before pmd */
> > +   /*
> > +* Up to this point the pmd is present and huge and userland has the
> > +* whole access to the hugepage during the split (which happens in
> > +* place). If we overwrite the pmd with the not-huge version pointing
> > +* to the pte here (which of course we could if all CPUs were bug
> > +* free), userland could trigger a small page size TLB miss on the
> > +* small sized TLB while the hugepage TLB entry is still established in
> > +* the huge TLB. Some CPU doesn't like that.
> > +* See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
> > +* 383 on page 93. Intel should be safe but is also warns that it's
> > +* only safe if the permission and cache attributes of the two entries
> > +* loaded in the two TLB is identical (which should be the case here).
> > +* But it is generally safer to never allow small and huge TLB entries
> > +* for the same virtual address to be loaded simultaneously. So instead
> > +* of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
> > +* current pmd notpresent (atomically because here the pmd_trans_huge
> > +* and pmd_trans_splitting must remain set at all times on the pmd
> > +* until the split is complete for this pmd), then we flush the SMP TLB
> > +* and finally we write the non-huge version of the pmd entry with
> > +* pmd_populate.
> > +*/
> > +   pmdp_invalidate(vma, haddr, pmd);
> > pmd_populate(mm, pmd, pgtable);
> >  
> > if (freeze) {
> 
> I have been testing this patch with MADV_DONTNEED for a few days and
> I haven't seen the problem any more. I will continue to test it
> with MADV_FREE.
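
The comment in the quoted patch argues for a specific ordering when splitting a huge pmd: mark the pmd not present, flush the TLB, and only then install the page-table pmd. The following is a toy userspace model of that argument, not kernel code. All names (toy_mmu, toy_tlb_miss() and so on) are hypothetical, and the "racing access" is modelled as a single TLB miss at the worst possible moment. It only illustrates why the populate-then-flush order can leave a huge and a small TLB entry for the same address at once, while the invalidate-first order cannot.

```c
#include <stdbool.h>

/* Toy model only: these types and helpers are hypothetical, not kernel APIs. */
enum toy_pmd { TOY_PMD_HUGE, TOY_PMD_NOT_PRESENT, TOY_PMD_TABLE };

struct toy_mmu {
	enum toy_pmd pmd;	/* what the page table currently says */
	bool huge_tlb_entry;	/* a huge-page translation is cached */
	bool small_tlb_entry;	/* a small-page translation is cached */
};

/* A TLB miss walks the page table and caches whatever the pmd describes. */
static void toy_tlb_miss(struct toy_mmu *m)
{
	if (m->pmd == TOY_PMD_HUGE)
		m->huge_tlb_entry = true;
	else if (m->pmd == TOY_PMD_TABLE)
		m->small_tlb_entry = true;
	/* TOY_PMD_NOT_PRESENT: the access faults and nothing is cached. */
}

static void toy_tlb_flush(struct toy_mmu *m)
{
	m->huge_tlb_entry = false;
	m->small_tlb_entry = false;
}

/* The state the patch comment wants to rule out. */
static bool toy_mixed_entries(const struct toy_mmu *m)
{
	return m->huge_tlb_entry && m->small_tlb_entry;
}

/* Ordering the comment argues against: populate first, flush afterwards. */
static bool toy_split_populate_then_flush(struct toy_mmu *m)
{
	bool mixed;

	m->pmd = TOY_PMD_TABLE;
	toy_tlb_miss(m);		/* racing access before the flush */
	mixed = toy_mixed_entries(m);
	toy_tlb_flush(m);
	return mixed;
}

/* Ordering the patch uses: invalidate, flush, then populate. */
static bool toy_split_invalidate_first(struct toy_mmu *m)
{
	bool mixed;

	m->pmd = TOY_PMD_NOT_PRESENT;
	toy_tlb_flush(m);
	toy_tlb_miss(m);		/* racing access faults, caches nothing */
	mixed = toy_mixed_entries(m);
	m->pmd = TOY_PMD_TABLE;
	toy_tlb_miss(m);		/* later access caches only a small entry */
	return mixed || toy_mixed_entries(m);
}
```

Starting from a state with a huge pmd and a cached huge TLB entry, the first function reports a mixed state and the second does not. Real hardware behaviour is more involved (separate instruction and data TLBs, multiple CPUs), so this is only a reading aid for the comment.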

While testing MADV_FREE on a kernel with your patches applied,
I couldn't see any problem.

However, in this round I ran another test. It is the same one
I attached, with a small difference: it doesn't do the
(memcg things/kill/swapoff) steps, so that it tests a long-lived program.

With that, I encountered this problem.

page:eaf60080 count:1 mapcount:0 mapping:88007f584691 index:0x62a02
flags: 0x4006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:880077cf0c00
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:3340!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88006b0f1a40 ti: 88004ced4000 task.ti: 88004ced4000
RIP: 0010:[]  [] split_huge_page_to_list+0x907/0x920
RSP: 0018:88004ced7a38  EFLAGS: 00010296
RAX: 0021 RBX: eaf60080 RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 88004ced7ab8 R08:  R09: 880bc560
R10: 8163d880 R11: 00014f25 R12: eaf60080
R13: eaf60088 R14: eaf60080 R15: 
FS:  7f43d3ced740() GS:8800782e() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7ff1f6fcdb98 CR3: 4cf56000 CR4: 06a0
Stack:
 cccd eaf60080 88004ced7ad0 eaf60088
 88004ced7ad0  88004ced7ab8

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-11 Thread Minchan Kim

On Mon, Nov 09, 2015 at 12:55:22AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote:
> > On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> > > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > > > Hello Kirill,
> > > > > > > 
> > > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov 
> > > > > > > wrote:
> > > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov 
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. 
> > > > > > > > > > > Shutemov wrote:
> > > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh 
> > > > > > > > > > > > > > Dickins wrote:
> > > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I added the code to check it and queued it 
> > > > > > > > > > > > > > > > again but I had another oops
> > > > > > > > > > > > > > > > in this time but symptom is related to 
> > > > > > > > > > > > > > > > anon_vma, too.
> > > > > > > > > > > > > > > > (kernel is based on recent mmotm + 
> > > > > > > > > > > > > > > > unconditional mkdirty for bug fix)
> > > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since 
> > > > > > > > > > > > > > > > the page was not page_mapped
> > > > > > > > > > > > > > > > at that time but second check of page_mapped 
> > > > > > > > > > > > > > > > right before try_to_unmap seems
> > > > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > > > flags: 
> > > > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > > page dumped because: 
> > > > > > > > > > > > > > > > VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > > > > > > > > > > > > && !anon_vma)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > > > > migration series.
> > > > > > > > > > > > > > > Let me think on it, but it could well relate to 
> > > > > > > > > > > > > > > the one you got before.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I will roll back to 
> > > > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > > > > migration cleanup
> > > > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > > > If it is fixed, I will test again with your 
> > > > > > > > > > > > > > migration patchset, then.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I 
> > > > > > > > > > > > > attach for a long time.
> > > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration 
> > > > > > > > > > > > > patch in there.
> > > > > > > > > > > > > And I added below debug code with request from Kirill 
> > > > > > > > > > > > > to all test kernels.
> > > > > > > > > > > > 
> > > > > > > > > > > > > It took a long time (and a lot of printk()), but I think I've finally
> > > > > > > > > > > > > tracked it down.
> > > > > > > > > > > > >  
> > > > > > > > > > > > > The patch below seems to fix the issue for me. It's not yet properly tested,
> > > > > > > > > > > > > but it looks like it works.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The problem was my wrong assumption about how migration works: I thought the
> > > > > > > > > > > > > kernel would wait for migration to finish before tearing down the mapping.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > But it turns out that's not true.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > As a result, if zap_pte_range() races with

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-08 Thread Kirill A. Shutemov

On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote:
> On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > > Hello Kirill,
> > > > > > 
> > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov 
> > > > > > > > wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. 
> > > > > > > > > > Shutemov wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh 
> > > > > > > > > > > > > Dickins wrote:
> > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I added the code to check it and queued it again 
> > > > > > > > > > > > > > > but I had another oops
> > > > > > > > > > > > > > > in this time but symptom is related to anon_vma, 
> > > > > > > > > > > > > > > too.
> > > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the 
> > > > > > > > > > > > > > > page was not page_mapped
> > > > > > > > > > > > > > > at that time but second check of page_mapped 
> > > > > > > > > > > > > > > right before try_to_unmap seems
> > > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 
> > > > > > > > > > > > > > > extents:1 across:4191228k FS
> > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > > flags: 
> > > > > > > > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > page dumped because: 
> > > > > > > > > > > > > > > VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > > > > > > > > > > > && !anon_vma)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > > > migration series.
> > > > > > > > > > > > > > Let me think on it, but it could well relate to the 
> > > > > > > > > > > > > > one you got before.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I will roll back to 
> > > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > > > migration cleanup
> > > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > > > > patchset, then.
> > > > > > > > > > > > 
> > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I 
> > > > > > > > > > > > attach for a long time.
> > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration 
> > > > > > > > > > > > patch in there.
> > > > > > > > > > > > And I added below debug code with request from Kirill 
> > > > > > > > > > > > to all test kernels.
> > > > > > > > > > > 
> > > > > > > > > > > > It took a long time (and a lot of printk()), but I think I've finally
> > > > > > > > > > > > tracked it down.
> > > > > > > > > > > >  
> > > > > > > > > > > > The patch below seems to fix the issue for me. It's not yet properly tested,
> > > > > > > > > > > > but it looks like it works.
> > > > > > > > > > > > 
> > > > > > > > > > > > The problem was my wrong assumption about how migration works: I thought the
> > > > > > > > > > > > kernel would wait for migration to finish before tearing down the mapping.
> > > > > > > > > > > > 
> > > > > > > > > > > > But it turns out that's not true.
> > > > > > > > > > > > 
> > > > > > > > > > > > As a result, if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > > > > > with a page which is not mapped anymore but has _count and _mapcount
> > > > > > > > > > > > elevated. The page is on the LRU too. So it's still reachable 
> >

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-04 Thread Minchan Kim

On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > Hello Kirill,
> > > > > 
> > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I added the code to check it and queued it again 
> > > > > > > > > > > > > > but I had another oops
> > > > > > > > > > > > > > in this time but symptom is related to anon_vma, 
> > > > > > > > > > > > > > too.
> > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the 
> > > > > > > > > > > > > > page was not page_mapped
> > > > > > > > > > > > > > at that time but second check of page_mapped right 
> > > > > > > > > > > > > > before try_to_unmap seems
> > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > > migration series.
> > > > > > > > > > > > > Let me think on it, but it could well relate to the 
> > > > > > > > > > > > > one you got before.
> > > > > > > > > > > > 
> > > > > > > > > > > > I will roll back to 
> > > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > > migration cleanup
> > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > > > patchset, then.
> > > > > > > > > > > 
> > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I 
> > > > > > > > > > > attach for a long time.
> > > > > > > > > > > Therefore, there is no patchset from Hugh's migration 
> > > > > > > > > > > patch in there.
> > > > > > > > > > > And I added below debug code with request from Kirill to 
> > > > > > > > > > > all test kernels.
> > > > > > > > > > 
> > > > > > > > > > It took too long time (and a lot of printk()), but I think 
> > > > > > > > > > I track it down
> > > > > > > > > > finally.
> > > > > > > > > >  
> > > > > > > > > > The patch below seems fixes issue for me. It's not yet 
> > > > > > > > > > properly tested, but
> > > > > > > > > > looks like it works.
> > > > > > > > > > 
> > > > > > > > > > The problem was my wrong assumption on how migration works: 
> > > > > > > > > > I thought that
> > > > > > > > > > kernel would wait migration to finish on before 
> > > > > > > > > > deconstruction mapping.
> > > > > > > > > > 
> > > > > > > > > > But turn out that's not true.
> > > > > > > > > > 
> > > > > > > > > > As result if zap_pte_range() races with split_huge_page(), 
> > > > > > > > > > we can end up
> > > > > > > > > > with page which is not mapped anymore but has _count and 
> > > > > > > > > > _mapcount
> > > > > > > > > > elevated. The page is on LRU too. So it's still reachable 
> > > > > > > > > > by vmscan and by
> > > > > > > > > > pfn scanners (Sasha showed few similar traces from 
> > > > > > > > > > compaction too).
> > > > > > > > > > It's likely that page->mapping in this case would point to 
> > > > > > > > > > freed anon_vma.
> > > > > > > > > > 
> > > >
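
For readers decoding the page dump quoted above, here is a minimal
user-space sketch of how the low bits of page->mapping feed the
VM_BUG_ON_PAGE condition. It is an illustration only, not kernel code.
The TOY_* names and helpers are hypothetical; the bit values (0x1 for
anon, 0x2 for KSM) are an assumption based on how PAGE_MAPPING_ANON and
PAGE_MAPPING_KSM were defined in kernels of that era. The mapping value
is taken as printed in the dump, where the leading address digits
appear to have been lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag bits stored in the low bits of page->mapping. */
#define TOY_MAPPING_ANON  ((uint64_t)0x1)
#define TOY_MAPPING_KSM   ((uint64_t)0x2)
#define TOY_MAPPING_FLAGS (TOY_MAPPING_ANON | TOY_MAPPING_KSM)

/* Anonymous page: the ANON bit is set. */
static bool toy_page_anon(uint64_t mapping)
{
    return (mapping & TOY_MAPPING_ANON) != 0;
}

/* KSM page: both the ANON and KSM bits are set. */
static bool toy_page_ksm(uint64_t mapping)
{
    return (mapping & TOY_MAPPING_FLAGS) == TOY_MAPPING_FLAGS;
}

/* Strip the flag bits to get the pointer value the mapping encodes. */
static uint64_t toy_mapping_pointer(uint64_t mapping)
{
    return mapping & ~TOY_MAPPING_FLAGS;
}

/* The condition from the dump: an anon, non-KSM page with no anon_vma. */
static bool toy_bug_condition(uint64_t mapping, const void *anon_vma)
{
    return toy_page_anon(mapping) && !toy_page_ksm(mapping) &&
           anon_vma == NULL;
}
```

With mapping 0x88007f1b5f51 the low two bits are 01, so the page counts
as anonymous and not KSM, and the check fires only because the anon_vma
lookup returned NULL.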

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-04 Thread Kirill A. Shutemov

On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > Hello Kirill,
> > > > 
> > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > 
> > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I added the code to check it and queued it again but 
> > > > > > > > > > > > > I had another oops
> > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the 
> > > > > > > > > > > > > page was not page_mapped
> > > > > > > > > > > > > at that time but second check of page_mapped right 
> > > > > > > > > > > > > before try_to_unmap seems
> > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > 
> > > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > > migration series.
> > > > > > > > > > > > Let me think on it, but it could well relate to the one 
> > > > > > > > > > > > you got before.
> > > > > > > > > > > 
> > > > > > > > > > > I will roll back to 
> > > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > > migration cleanup
> > > > > > > > > > > series and will test it again.
> > > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > > patchset, then.
> > > > > > > > > > 
> > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach 
> > > > > > > > > > for a long time.
> > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch 
> > > > > > > > > > in there.
> > > > > > > > > > And I added below debug code with request from Kirill to 
> > > > > > > > > > all test kernels.
> > > > > > > > > 
> > > > > > > > > It took too long time (and a lot of printk()), but I think I 
> > > > > > > > > track it down
> > > > > > > > > finally.
> > > > > > > > >  
> > > > > > > > > The patch below seems fixes issue for me. It's not yet 
> > > > > > > > > properly tested, but
> > > > > > > > > looks like it works.
> > > > > > > > > 
> > > > > > > > > The problem was my wrong assumption on how migration works: I 
> > > > > > > > > thought that
> > > > > > > > > kernel would wait migration to finish on before 
> > > > > > > > > deconstruction mapping.
> > > > > > > > > 
> > > > > > > > > But turn out that's not true.
> > > > > > > > > 
> > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we 
> > > > > > > > > can end up
> > > > > > > > > with page which is not mapped anymore but has _count and 
> > > > > > > > > _mapcount
> > > > > > > > > elevated. The page is on LRU too. So it's still reachable by 
> > > > > > > > > vmscan and by
> > > > > > > > > pfn scanners (Sasha showed few similar traces from compaction 
> > > > > > > > > too).
> > > > > > > > > It's likely that page->mapping in this case would point to 
> > > > > > > > > freed anon_vma.
> > > > > > > > > 
> > > > > > > > > BOOM!
> > > > > > > > > 
> > > > > > > > > The patch modify freeze/unfreeze_page() code to match normal 
> > > > > > > > > migration
> > > > > > > > > entries logic: on setup we remove page from rmap and drop 
> > > > > > > > > pin, on removing
> > > > > > > > > we get pin back and put page on rmap.
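
A minimal user-space sketch of the accounting described in the quoted
explanation, assuming a single PTE slot and a page with just two
counters. This is not the patch itself (the patch is not shown in this
part of the thread); every toy_* name is hypothetical. It only models
the rule "on setup remove the page from rmap and drop the pin, on
removal take the pin back and put the page on rmap", and shows that a
zap arriving in between leaves the counters consistent.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the two counters under discussion. */
struct toy_page {
    int count;     /* references (pins) held on the page */
    int mapcount;  /* number of PTEs that map the page */
};

/* State of the single PTE slot in this model. */
enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_MIGRATION };

/* Setup: replace the mapping with a migration entry that holds no counts. */
static void toy_freeze(struct toy_page *p, enum toy_pte *pte)
{
    if (*pte != TOY_PTE_PRESENT)
        return;
    *pte = TOY_PTE_MIGRATION;
    p->mapcount--;  /* remove the page from rmap */
    p->count--;     /* drop the pin held on behalf of the mapping */
}

/* Removal: restore the mapping only if the migration entry is still there. */
static void toy_unfreeze(struct toy_page *p, enum toy_pte *pte)
{
    if (*pte != TOY_PTE_MIGRATION)
        return;     /* entry was zapped under us: nothing to restore */
    p->count++;     /* take the pin back */
    p->mapcount++;  /* put the page back on rmap */
    *pte = TOY_PTE_PRESENT;
}

/* Concurrent teardown of the mapping, standing in for zap_pte_range(). */
static void toy_zap(struct toy_page *p, enum toy_pte *pte)
{
    if (*pte == TOY_PTE_PRESENT) {
        p->mapcount--;
        p->count--;
    }
    /* A migration entry owns no counts here, so clearing it is enough. */
    *pte = TOY_PTE_NONE;
}

/* freeze -> zap -> unfreeze; true if no counts are left for a dead mapping. */
static bool toy_race_is_consistent(void)
{
    struct toy_page page = { .count = 2, .mapcount = 1 }; /* 1 map + 1 caller pin */
    enum toy_pte pte = TOY_PTE_PRESENT;

    toy_freeze(&page, &pte);
    toy_zap(&page, &pte);      /* the racing teardown */
    toy_unfreeze(&page, &pte);

    return pte == TOY_PTE_NONE && page.mapcount == 0 && page.count == 1;
}
```

In this model the only reference left after the race is the caller's
own pin, which is the property the explanation above asks for: an
unmapped page must not keep an elevated count or mapcount.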

Re: kernel oops on mmotm-2015-10-15-15-20

2015-11-03 Thread Minchan Kim

On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > Hello Kirill,
> > > 
> > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > 
> > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > I added the code to check it and queued it again but I 
> > > > > > > > > > > > had another oops
> > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > (kernel is based on recent mmotm + unconditional 
> > > > > > > > > > > > mkdirty for bug fix)
> > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page 
> > > > > > > > > > > > was not page_mapped
> > > > > > > > > > > > at that time but second check of page_mapped right 
> > > > > > > > > > > > before try_to_unmap seems
> > > > > > > > > > > > to be true.
> > > > > > > > > > > > 
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > 
> > > > > > > > > > > That's interesting, that's one I added in my page 
> > > > > > > > > > > migration series.
> > > > > > > > > > > Let me think on it, but it could well relate to the one 
> > > > > > > > > > > you got before.
> > > > > > > > > > 
> > > > > > > > > > I will roll back to 
> > > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > instead of next-20151021 to remove noise from your 
> > > > > > > > > > migration cleanup
> > > > > > > > > > series and will test it again.
> > > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > > patchset, then.
> > > > > > > > > 
> > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach 
> > > > > > > > > for a long time.
> > > > > > > > > Therefore, there is no patchset from Hugh's migration patch 
> > > > > > > > > in there.
> > > > > > > > > And I added below debug code with request from Kirill to all 
> > > > > > > > > test kernels.
> > > > > > > > 
> > > > > > > > It took too long time (and a lot of printk()), but I think I 
> > > > > > > > track it down
> > > > > > > > finally.
> > > > > > > >  
> > > > > > > > The patch below seems fixes issue for me. It's not yet properly 
> > > > > > > > tested, but
> > > > > > > > looks like it works.
> > > > > > > > 
> > > > > > > > The problem was my wrong assumption on how migration works: I 
> > > > > > > > thought that
> > > > > > > > kernel would wait migration to finish on before deconstruction 
> > > > > > > > mapping.
> > > > > > > > 
> > > > > > > > But turn out that's not true.
> > > > > > > > 
> > > > > > > > As result if zap_pte_range() races with split_huge_page(), we 
> > > > > > > > can end up
> > > > > > > > with page which is not mapped anymore but has _count and 
> > > > > > > > _mapcount
> > > > > > > > elevated. The page is on LRU too. So it's still reachable by 
> > > > > > > > vmscan and by
> > > > > > > > pfn scanners (Sasha showed few similar traces from compaction 
> > > > > > > > too).
> > > > > > > > It's likely that page->mapping in this case would point to 
> > > > > > > > freed anon_vma.
> > > > > > > > 
> > > > > > > > BOOM!
> > > > > > > > 
> > > > > > > > The patch modify freeze/unfreeze_page() code to match normal 
> > > > > > > > migration
> > > > > > > > entries logic: on setup we remove page from rmap and drop pin, 
> > > > > > > > on removing
> > > > > > > > we get pin back and put page on rmap. This way even if 
> > > > > > > > migration entry
> > > > > > > > will be removed under us we don't corrupt page's state.
> > > > > > > > 
> > > > > > > > Please, test.
> > > > > > > > 
> > > > > > > 
> > > > > > > kernel: On mmotm-2015-10-15-15-20

Re: kernel oops on mmotm-2015-10-15-15-20

On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > Hello Kirill,
> > 
> > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > Hello Hugh,
> > > > > > > > > 
> > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > 
> > > > > > > > > > > I added the code to check it and queued it again, but this
> > > > > > > > > > > time I hit another oops; the symptom is related to
> > > > > > > > > > > anon_vma, too.
> > > > > > > > > > > (The kernel is based on a recent mmotm + unconditional
> > > > > > > > > > > mkdirty for the bug fix.)
> > > > > > > > > > > It seems page_get_anon_vma returned NULL since the page
> > > > > > > > > > > was not page_mapped at that time, but the second check of
> > > > > > > > > > > page_mapped right before try_to_unmap seems to be true.
> > > > > > > > > > > 
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > 
> > > > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > > > series.
> > > > > > > > > > Let me think on it, but it could well relate to the one you 
> > > > > > > > > > got before.
> > > > > > > > > 
> > > > > > > > > I will roll back to 
> > > > > > > > > mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > > > cleanup
> > > > > > > > > series and will test it again.
> > > > > > > > > If it is fixed, I will test again with your migration 
> > > > > > > > > patchset, then.
> > > > > > > > 
> > > > > > > > I tested mmotm-2015-10-15-15-20 for a long time with the test
> > > > > > > > program I attached, so Hugh's migration patchset is not in
> > > > > > > > there. I also added the debug code below, at Kirill's request,
> > > > > > > > to all test kernels.
> > > > > > > 
> > > > > > > It took a long time (and a lot of printk()), but I think I
> > > > > > > have finally tracked it down.
> > > > > > > 
> > > > > > > The patch below seems to fix the issue for me. It's not yet
> > > > > > > properly tested, but it looks like it works.
> > > > > > > 
> > > > > > > The problem was my wrong assumption about how migration works:
> > > > > > > I thought that the kernel would wait for migration to finish
> > > > > > > before tearing down the mapping.
> > > > > > > 
> > > > > > > But it turns out that's not true.
> > > > > > > 
> > > > > > > As a result, if zap_pte_range() races with split_huge_page(),
> > > > > > > we can end up with a page which is not mapped anymore but has
> > > > > > > _count and _mapcount elevated. The page is on the LRU too, so
> > > > > > > it's still reachable by vmscan and by pfn scanners (Sasha
> > > > > > > showed a few similar traces from compaction too). It's likely
> > > > > > > that page->mapping in this case would point to a freed
> > > > > > > anon_vma.
> > > > > > > 
> > > > > > > BOOM!
> > > > > > > 
> > > > > > > The patch modifies the freeze/unfreeze_page() code to match
> > > > > > > the normal migration entry logic: on setup we remove the page
> > > > > > > from rmap and drop the pin; on removal we get the pin back and
> > > > > > > put the page back on rmap. This way, even if the migration
> > > > > > > entry is removed under us, we don't corrupt the page's state.
> > > > > > > 
> > > > > > > Please test.
> > > > > > > 
> > > > > > 
> > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new
> > > > > > patch, I ran the test I sent to you (i.e., oops.c + memcg_test.sh).
> > > > > > 
> > > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> > > > > > flags:

Re: kernel oops on mmotm-2015-10-15-15-20

On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> Hello Kirill,
> 
> On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > Hello Hugh,
> > > > > > > > 
> > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > 
> > > > > > > > > > I added the code to check it and queued it again, but this
> > > > > > > > > > time I hit another oops; the symptom is related to
> > > > > > > > > > anon_vma, too.
> > > > > > > > > > (The kernel is based on a recent mmotm + unconditional
> > > > > > > > > > mkdirty for the bug fix.)
> > > > > > > > > > It seems page_get_anon_vma returned NULL since the page
> > > > > > > > > > was not page_mapped at that time, but the second check of
> > > > > > > > > > page_mapped right before try_to_unmap seems to be true.
> > > > > > > > > > 
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > 
> > > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > > series.
> > > > > > > > > Let me think on it, but it could well relate to the one you 
> > > > > > > > > got before.
> > > > > > > > 
> > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > > cleanup
> > > > > > > > series and will test it again.
> > > > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > > > then.
> > > > > > > 
> > > > > > > I tested mmotm-2015-10-15-15-20 for a long time with the test
> > > > > > > program I attached, so Hugh's migration patchset is not in
> > > > > > > there. I also added the debug code below, at Kirill's request,
> > > > > > > to all test kernels.
> > > > > > 
> > > > > > It took a long time (and a lot of printk()), but I think I
> > > > > > have finally tracked it down.
> > > > > > 
> > > > > > The patch below seems to fix the issue for me. It's not yet
> > > > > > properly tested, but it looks like it works.
> > > > > > 
> > > > > > The problem was my wrong assumption about how migration works:
> > > > > > I thought that the kernel would wait for migration to finish
> > > > > > before tearing down the mapping.
> > > > > > 
> > > > > > But it turns out that's not true.
> > > > > > 
> > > > > > As a result, if zap_pte_range() races with split_huge_page(),
> > > > > > we can end up with a page which is not mapped anymore but has
> > > > > > _count and _mapcount elevated. The page is on the LRU too, so
> > > > > > it's still reachable by vmscan and by pfn scanners (Sasha
> > > > > > showed a few similar traces from compaction too). It's likely
> > > > > > that page->mapping in this case would point to a freed
> > > > > > anon_vma.
> > > > > > 
> > > > > > BOOM!
> > > > > > 
> > > > > > The patch modifies the freeze/unfreeze_page() code to match
> > > > > > the normal migration entry logic: on setup we remove the page
> > > > > > from rmap and drop the pin; on removal we get the pin back and
> > > > > > put the page back on rmap. This way, even if the migration
> > > > > > entry is removed under us, we don't corrupt the page's state.
> > > > > > 
> > > > > > Please test.
> > > > > > 
> > > > > 
> > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new
> > > > > patch, I ran the test I sent to you (i.e., oops.c + memcg_test.sh).
> > > > > 
> > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> > > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > page->mem_cgroup:88007f613c00
> > > > 
> > > > Ignore my previous answer. Still sleeping.
> > > > 
> > > > I think the right way to fix it is something like:
> > > > 
> > > > diff --git a/mm/rmap.c

Re: kernel oops on mmotm-2015-10-15-15-20

Hello Kirill,

On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > Hello Hugh,
> > > > > > > 
> > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > 
> > > > > > > > > I added the code to check it and queued it again, but this
> > > > > > > > > time I hit another oops; the symptom is related to
> > > > > > > > > anon_vma, too.
> > > > > > > > > (The kernel is based on a recent mmotm + unconditional
> > > > > > > > > mkdirty for the bug fix.)
> > > > > > > > > It seems page_get_anon_vma returned NULL since the page
> > > > > > > > > was not page_mapped at that time, but the second check of
> > > > > > > > > page_mapped right before try_to_unmap seems to be true.
> > > > > > > > > 
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > 
> > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > series.
> > > > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > > > before.
> > > > > > > 
> > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > cleanup
> > > > > > > series and will test it again.
> > > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > > then.
> > > > > > 
> > > > > > I tested mmotm-2015-10-15-15-20 for a long time with the test
> > > > > > program I attached, so Hugh's migration patchset is not in
> > > > > > there. I also added the debug code below, at Kirill's request,
> > > > > > to all test kernels.
> > > > > 
> > > > > It took a long time (and a lot of printk()), but I think I
> > > > > have finally tracked it down.
> > > > > 
> > > > > The patch below seems to fix the issue for me. It's not yet
> > > > > properly tested, but it looks like it works.
> > > > > 
> > > > > The problem was my wrong assumption about how migration works:
> > > > > I thought that the kernel would wait for migration to finish
> > > > > before tearing down the mapping.
> > > > > 
> > > > > But it turns out that's not true.
> > > > > 
> > > > > As a result, if zap_pte_range() races with split_huge_page(),
> > > > > we can end up with a page which is not mapped anymore but has
> > > > > _count and _mapcount elevated. The page is on the LRU too, so
> > > > > it's still reachable by vmscan and by pfn scanners (Sasha
> > > > > showed a few similar traces from compaction too). It's likely
> > > > > that page->mapping in this case would point to a freed
> > > > > anon_vma.
> > > > > 
> > > > > BOOM!
> > > > > 
> > > > > The patch modifies the freeze/unfreeze_page() code to match
> > > > > the normal migration entry logic: on setup we remove the page
> > > > > from rmap and drop the pin; on removal we get the pin back and
> > > > > put the page back on rmap. This way, even if the migration
> > > > > entry is removed under us, we don't corrupt the page's state.
> > > > > 
> > > > > Please test.
> > > > > 
> > > > 
> > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new
> > > > patch, I ran the test I sent to you (i.e., oops.c + memcg_test.sh).
> > > > 
> > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > page->mem_cgroup:88007f613c00
> > > 
> > > Ignore my previous answer. Still sleeping.
> > > 
> > > I think the right way to fix it is something like:
> > > 
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 35643176bc15..f2d46792a554 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > >   bool compound = flags & RMAP_COMPOUND;
> > >   bool first;
> > >  
> > > - if (PageTransCompound(page)) {
> > >

Re: kernel oops on mmotm-2015-10-15-15-20

On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > Hello Hugh,
> > > > > > 
> > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > 
> > > > > > > > I added the code to check it and queued it again, but this
> > > > > > > > time I hit another oops; the symptom is related to
> > > > > > > > anon_vma, too.
> > > > > > > > (The kernel is based on a recent mmotm + unconditional
> > > > > > > > mkdirty for the bug fix.)
> > > > > > > > It seems page_get_anon_vma returned NULL since the page
> > > > > > > > was not page_mapped at that time, but the second check of
> > > > > > > > page_mapped right before try_to_unmap seems to be true.
> > > > > > > > 
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > 
> > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > series.
> > > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > > before.
> > > > > > 
> > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > series and will test it again.
> > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > then.
> > > > > 
> > > > > I tested mmotm-2015-10-15-15-20 for a long time with the test
> > > > > program I attached, so Hugh's migration patchset is not in
> > > > > there. I also added the debug code below, at Kirill's request,
> > > > > to all test kernels.
> > > > 
> > > > It took a long time (and a lot of printk()), but I think I
> > > > have finally tracked it down.
> > > > 
> > > > The patch below seems to fix the issue for me. It's not yet
> > > > properly tested, but it looks like it works.
> > > > 
> > > > The problem was my wrong assumption about how migration works:
> > > > I thought that the kernel would wait for migration to finish
> > > > before tearing down the mapping.
> > > > 
> > > > But it turns out that's not true.
> > > > 
> > > > As a result, if zap_pte_range() races with split_huge_page(),
> > > > we can end up with a page which is not mapped anymore but has
> > > > _count and _mapcount elevated. The page is on the LRU too, so
> > > > it's still reachable by vmscan and by pfn scanners (Sasha
> > > > showed a few similar traces from compaction too). It's likely
> > > > that page->mapping in this case would point to a freed
> > > > anon_vma.
> > > > 
> > > > BOOM!
> > > > 
> > > > The patch modifies the freeze/unfreeze_page() code to match
> > > > the normal migration entry logic: on setup we remove the page
> > > > from rmap and drop the pin; on removal we get the pin back and
> > > > put the page back on rmap. This way, even if the migration
> > > > entry is removed under us, we don't corrupt the page's state.
> > > > 
> > > > Please test.
> > > > 
> > > 
> > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new
> > > patch, I ran the test I sent to you (i.e., oops.c + memcg_test.sh).
> > > 
> > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
> > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > page->mem_cgroup:88007f613c00
> > 
> > Ignore my previous answer. Still sleeping.
> > 
> > I think the right way to fix it is something like:
> > 
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 35643176bc15..f2d46792a554 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > bool compound = flags & RMAP_COMPOUND;
> > bool first;
> >  
> > -   if (PageTransCompound(page)) {
> > +   if (PageTransCompound(page) && compound) {
> > +   atomic_t *mapcount;
> > VM_BUG_ON_PAGE(!PageLocked(page), page);
> > -   if (compound) {
> > -   atomic_t *mapcount;
> > -
> > -   VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > -   mapcount =

Re: kernel oops on mmotm-2015-10-15-15-20

On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> Hello Kirill,
> 
> On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > Hello Hugh,
> > > > > > > > 
> > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > 
> > > > > > > > > > I added the code to check it and queued it again, but this
> > > > > > > > > > time I hit another oops; the symptom is related to
> > > > > > > > > > anon_vma, too.
> > > > > > > > > > (The kernel is based on a recent mmotm + unconditional
> > > > > > > > > > mkdirty for the bug fix.)
> > > > > > > > > > It seems page_get_anon_vma returned NULL since the page
> > > > > > > > > > was not page_mapped at that time, but the second check of
> > > > > > > > > > page_mapped right before try_to_unmap seems to be true.
> > > > > > > > > > 
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > 
> > > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > > series.
> > > > > > > > > Let me think on it, but it could well relate to the one you 
> > > > > > > > > got before.
> > > > > > > > 
> > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > > cleanup
> > > > > > > > series and will test it again.
> > > > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > > > then.
> > > > > > > 
> > > > > > > I tested mmotm-2015-10-15-15-20 for a long time with the test
> > > > > > > program I attached, so Hugh's migration patchset is not in
> > > > > > > there. I also added the debug code below, at Kirill's request,
> > > > > > > to all test kernels.
> > > > > > 
> > > > > > It took too long time (and a lot of printk()), but I think I track 
> > > > > > it down
> > > > > > finally.
> > > > > >  
> > > > > > The patch below seems fixes issue for me. It's not yet properly 
> > > > > > tested, but
> > > > > > looks like it works.
> > > > > > 
> > > > > > The problem was my wrong assumption on how migration works: I 
> > > > > > thought that
> > > > > > kernel would wait migration to finish on before deconstruction 
> > > > > > mapping.
> > > > > > 
> > > > > > But turn out that's not true.
> > > > > > 
> > > > > > As result if zap_pte_range() races with split_huge_page(), we can 
> > > > > > end up
> > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan 
> > > > > > and by
> > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > It's likely that page->mapping in this case would point to freed 
> > > > > > anon_vma.
> > > > > > 
> > > > > > BOOM!
> > > > > > 
> > > > > > The patch modify freeze/unfreeze_page() code to match normal 
> > > > > > migration
> > > > > > entries logic: on setup we remove page from rmap and drop pin, on 
> > > > > > removing
> > > > > > we get pin back and put page on rmap. This way even if migration 
> > > > > > entry
> > > > > > will be removed under us we don't corrupt page's state.
> > > > > > 
> > > > > > Please, test.
> > > > > > 
> > > > > 
> > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new 
> > > > > patch, I tested
> > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > 
> > > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > > > > index:0x61800 compound_mapcount: 0
> > > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > page->mem_cgroup:88007f613c00
> > > > 
> > > > Ignore my previous answer. Still sleeping.
> > > > 
> > > > The right way to fix I think is something like:
> > > > 
> > > > diff --git a/mm/rmap.c

Re: kernel oops on mmotm-2015-10-15-15-20

Hello Kirill,

On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > Hello Hugh,
> > > > > > > 
> > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > 
> > > > > > > > > I added the code to check it and queued it again but I had 
> > > > > > > > > another oops
> > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for 
> > > > > > > > > bug fix)
> > > > > > > > > It seems page_get_anon_vma returns NULL since the page was 
> > > > > > > > > not page_mapped
> > > > > > > > > at that time but second check of page_mapped right before 
> > > > > > > > > try_to_unmap seems
> > > > > > > > > to be true.
> > > > > > > > > 
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > > > across:4191228k FS
> > > > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 
> > > > > > > > > mapping:88007f1b5f51 index:0x60aff
> > > > > > > > > flags: 
> > > > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > > > 
> > > > > > > > That's interesting, that's one I added in my page migration 
> > > > > > > > series.
> > > > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > > > before.
> > > > > > > 
> > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > instead of next-20151021 to remove noise from your migration 
> > > > > > > cleanup
> > > > > > > series and will test it again.
> > > > > > > If it is fixed, I will test again with your migration patchset, 
> > > > > > > then.
> > > > > > 
> > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a 
> > > > > > long time.
> > > > > > Therefore, there is no patchset from Hugh's migration patch in 
> > > > > > there.
> > > > > > And I added below debug code with request from Kirill to all test 
> > > > > > kernels.
> > > > > 
> > > > > It took too long time (and a lot of printk()), but I think I track it 
> > > > > down
> > > > > finally.
> > > > >  
> > > > > The patch below seems fixes issue for me. It's not yet properly 
> > > > > tested, but
> > > > > looks like it works.
> > > > > 
> > > > > The problem was my wrong assumption on how migration works: I thought 
> > > > > that
> > > > > kernel would wait migration to finish on before deconstruction 
> > > > > mapping.
> > > > > 
> > > > > But turn out that's not true.
> > > > > 
> > > > > As result if zap_pte_range() races with split_huge_page(), we can end 
> > > > > up
> > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > elevated. The page is on LRU too. So it's still reachable by vmscan 
> > > > > and by
> > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > It's likely that page->mapping in this case would point to freed 
> > > > > anon_vma.
> > > > > 
> > > > > BOOM!
> > > > > 
> > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > entries logic: on setup we remove page from rmap and drop pin, on 
> > > > > removing
> > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > will be removed under us we don't corrupt page's state.
> > > > > 
> > > > > Please, test.
> > > > > 
> > > > 
> > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, 
> > > > I tested
> > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > 
> > > > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > > > index:0x61800 compound_mapcount: 0
> > > > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > page->mem_cgroup:88007f613c00
> > > 
> > > Ignore my previous answer. Still sleeping.
> > > 
> > > The right way to fix I think is something like:
> > > 
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 35643176bc15..f2d46792a554 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > >   bool compound = flags & RMAP_COMPOUND;
> > >   bool first;
> > >  
> > > - if (PageTransCompound(page)) {
> > >

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-30 Thread Minchan Kim

On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > Hello Hugh,
> > > > > 
> > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > 
> > > > > > > I added the code to check it and queued it again but I had 
> > > > > > > another oops
> > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > > fix)
> > > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > > page_mapped
> > > > > > > at that time but second check of page_mapped right before 
> > > > > > > try_to_unmap seems
> > > > > > > to be true.
> > > > > > > 
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > > across:4191228k FS
> > > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > > index:0x60aff
> > > > > > > flags: 
> > > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > > !PageKsm(page) && !anon_vma)
> > > > > > 
> > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > Let me think on it, but it could well relate to the one you got 
> > > > > > before.
> > > > > 
> > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > series and will test it again.
> > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > 
> > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > > time.
> > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > And I added below debug code with request from Kirill to all test 
> > > > kernels.
> > > 
> > > It took too long time (and a lot of printk()), but I think I track it down
> > > finally.
> > >  
> > > The patch below seems fixes issue for me. It's not yet properly tested, 
> > > but
> > > looks like it works.
> > > 
> > > The problem was my wrong assumption on how migration works: I thought that
> > > kernel would wait migration to finish on before deconstruction mapping.
> > > 
> > > But turn out that's not true.
> > > 
> > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > with page which is not mapped anymore but has _count and _mapcount
> > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > 
> > > BOOM!
> > > 
> > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > we get pin back and put page on rmap. This way even if migration entry
> > > will be removed under us we don't corrupt page's state.
> > > 
> > > Please, test.
> > > 
> > 
> > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> > tested
> > one I sent to you(ie, oops.c + memcg_test.sh)
> > 
> > page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> > index:0x61800 compound_mapcount: 0
> > flags: 0x40044009(locked|uptodate|head|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > page->mem_cgroup:88007f613c00
> 
> Ignore my previous answer. Still sleeping.
> 
> The right way to fix I think is something like:
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 35643176bc15..f2d46792a554 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
>   bool compound = flags & RMAP_COMPOUND;
>   bool first;
>  
> - if (PageTransCompound(page)) {
> + if (PageTransCompound(page) && compound) {
> + atomic_t *mapcount;
>   VM_BUG_ON_PAGE(!PageLocked(page), page);
> - if (compound) {
> - atomic_t *mapcount;
> -
> - VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> - mapcount = compound_mapcount_ptr(page);
> - first = atomic_inc_and_test(mapcount);
> - } else {
> - /* Anon THP always mapped first with PMD */
> - first = 0;
> - VM_BUG_ON_PAGE(!page_mapcount(page), page);
> - atomic_inc(&page->_mapcount);
> -

Re: kernel oops on mmotm-2015-10-15-15-20

On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> page->mem_cgroup:88007f613c00

Ignore my previous answer. I was still half asleep.

I think the right way to fix it is something like:

diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..f2d46792a554 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (PageTransCompound(page) && compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first
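
To illustrate what the hunk above changes, here is a small user-space model of the patched branch logic. It is only a sketch: struct model_page, model_page_init() and model_add_anon_rmap() are hypothetical names, not kernel code, and it assumes the usual convention that both map counters start at -1. Since the last line of the hunk is cut off in the message, the PTE branch below is also an assumption. The point is that a THP subpage mapped by PTE now goes through the ordinary path, where "first" comes from its own counter and a mapcount of zero no longer trips an assertion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the two counters matter here. */
struct model_page {
	atomic_int compound_mapcount;	/* -1 means no PMD mapping yet */
	atomic_int mapcount;		/* -1 means no PTE mapping yet */
	bool trans_compound;		/* models PageTransCompound() */
};

static void model_page_init(struct model_page *page, bool trans_compound)
{
	atomic_init(&page->compound_mapcount, -1);
	atomic_init(&page->mapcount, -1);
	page->trans_compound = trans_compound;
}

/* Same contract as atomic_inc_and_test(): true if the new value is 0. */
static bool model_inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Returns "first": true when this call created the first mapping of its kind. */
static bool model_add_anon_rmap(struct model_page *page, bool compound)
{
	if (page->trans_compound && compound)
		return model_inc_and_test(&page->compound_mapcount);

	/* Mirrors VM_BUG_ON_PAGE(compound, page) in the else branch. */
	assert(!compound);
	/* PTE mapping, including a THP subpage whose mapcount was 0. */
	return model_inc_and_test(&page->mapcount);
}
```

In this model, a PTE mapping added to a compound page with no existing PTE mappings simply reports first == true, which is the case that hit VM_BUG_ON_PAGE(!page_mapcount(page)) before the change.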

Re: kernel oops on mmotm-2015-10-15-15-20

On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))

The VM_BUG_ON_PAGE() is bogus after the patch. Just drop it.

> page->mem_cgroup:88007f613c00
> ------------[ cut here ]------------
> kernel BUG at mm/rmap.c:1156!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ 
> #1573
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
> RIP: 0010:[]  [] 
> do_page_add_anon_rmap+0x323/0x360
> RSP: :8805f758  EFLAGS: 00010292
> RAX: 0021 RBX: ea00016a RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 8805f780 R08:  R09: 880b8be0
> R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
> R13: 6180 R14:  R15: 88007e85ddc0
> FS:  7f5cd5fea740() GS:8800bfae() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 64c03000 CR3: 7f017000 CR4: 06a0
> Stack:
>  88007f351000 88007f352000 ea00016a 6180
>  88007e85ddc0 8805f790 81128278 8805f800
>  81146dbb 000619ff

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Minchan Kim

On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another 
> > > > > oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > page_mapped
> > > > > at that time but second check of page_mapped right before 
> > > > > try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> > > 
> > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > instead of next-20151021 to remove noise from your migration cleanup
> > > series and will test it again.
> > > If it is fixed, I will test again with your migration patchset, then.
> > 
> > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > Therefore, there is no patchset from Hugh's migration patch in there.
> > And I added below debug code with request from Kirill to all test kernels.
> 
> It took a long time (and a lot of printk()), but I think I have finally
> tracked it down.
>  
> The patch below seems to fix the issue for me. It's not yet properly tested,
> but it looks like it works.
> 
> The problem was my wrong assumption about how migration works: I thought the
> kernel would wait for migration to finish before tearing down the mapping.
> 
> But it turns out that's not true.
> 
> As a result, if zap_pte_range() races with split_huge_page(), we can end up
> with a page which is not mapped anymore but has _count and _mapcount
> elevated. The page is on the LRU too, so it's still reachable by vmscan and
> by pfn scanners (Sasha showed a few similar traces from compaction too).
> It's likely that page->mapping in this case would point to a freed anon_vma.
> 
> BOOM!
> 
> The patch modifies the freeze/unfreeze_page() code to match the normal
> migration entry logic: on setup we remove the page from rmap and drop the pin;
> on removal we take the pin back and put the page on rmap. This way, even if
> the migration entry is removed under us, we don't corrupt the page's state.
> 
> Please, test.
> 

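The freeze/unfreeze accounting described in the quoted text can be sketched as a tiny single-threaded model. This is not the kernel implementation: struct model_page, model_freeze() and model_unfreeze() are hypothetical names, and mapcount here is a plain count of mappings rather than the kernel's -1-biased counter.

```c
#include <assert.h>

/* Hypothetical model: count is the reference count, mapcount the number of mappings. */
struct model_page {
	int count;
	int mapcount;
};

/* Setting up a migration entry: take the page off rmap and drop the pin. */
static void model_freeze(struct model_page *page)
{
	assert(page->mapcount > 0 && page->count > 0);
	page->mapcount--;
	page->count--;
}

/* Removing the migration entry: take the pin back and put the page on rmap. */
static void model_unfreeze(struct model_page *page)
{
	page->count++;
	page->mapcount++;
}
```

If the migration entry is zapped while the page is frozen, model_unfreeze() never runs for it, and the counters already describe an unmapped, unpinned page instead of staying elevated.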
kernel: mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch.
I ran the test I sent to you (i.e., oops.c + memcg_test.sh):

page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
index:0x61800 compound_mapcount: 0
flags: 0x40044009(locked|uptodate|head|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
page->mem_cgroup:88007f613c00
------------[ cut here ]------------
kernel BUG at mm/rmap.c:1156!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ 
#1573
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
RIP: 0010:[]  [] 
do_page_add_anon_rmap+0x323/0x360
RSP: :8805f758  EFLAGS: 00010292
RAX: 0021 RBX: ea00016a RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8805f780 R08:  R09: 880b8be0
R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
R13: 6180 R14:  R15: 88007e85ddc0
FS:  7f5cd5fea740() GS:8800bfae() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 64c03000 CR3: 7f017000 CR4: 06a0
Stack:
 88007f351000 88007f352000 ea00016a 6180
 88007e85ddc0 8805f790 81128278 8805f800
 81146dbb 000619ff 00061800 1600
Call Trace:
 [] page_add_anon_rmap+0x18/0x20
 [] unfreeze_page+0x24b/0x330
 [] split_huge_page_to_list+0x3df/0x920
 [] ? scan_swap_map+0x37f/0x550
 [] add_to_swap+0xb6/0x100
 [] shrink_page_list+0x3b7/0xdc0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x58f/0x730
 [] shrink_zone+0xd4/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 []

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-29 Thread Minchan Kim

On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another 
> > > > > oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > page_mapped
> > > > > at that time but second check of page_mapped right before 
> > > > > try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> > > 
> > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > instead of next-20151021 to remove noise from your migration cleanup
> > > series and will test it again.
> > > If it is fixed, I will test again with your migration patchset, then.
> > 
> > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > Therefore, there is no patchset from Hugh's migration patch in there.
> > And I added below debug code with request from Kirill to all test kernels.
> 
> It took too long time (and a lot of printk()), but I think I track it down
> finally.
>  
> The patch below seems fixes issue for me. It's not yet properly tested, but
> looks like it works.
> 
> The problem was my wrong assumption on how migration works: I thought that
> kernel would wait migration to finish on before deconstruction mapping.
> 
> But turn out that's not true.
> 
> As result if zap_pte_range() races with split_huge_page(), we can end up
> with page which is not mapped anymore but has _count and _mapcount
> elevated. The page is on LRU too. So it's still reachable by vmscan and by
> pfn scanners (Sasha showed few similar traces from compaction too).
> It's likely that page->mapping in this case would point to freed anon_vma.
> 
> BOOM!
> 
> The patch modify freeze/unfreeze_page() code to match normal migration
> entries logic: on setup we remove page from rmap and drop pin, on removing
> we get pin back and put page on rmap. This way even if migration entry
> will be removed under us we don't corrupt page's state.
> 
> Please, test.
> 

Kernel: mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch. I tested
with the programs I sent to you (i.e., oops.c + memcg_test.sh).

page:ea00016a count:3 mapcount:0 mapping:88007f49d001 index:0x61800 compound_mapcount: 0
flags: 0x40044009(locked|uptodate|head|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
page->mem_cgroup:88007f613c00
[ cut here ]
kernel BUG at mm/rmap.c:1156!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ #1573
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
RIP: 0010:[]  [] do_page_add_anon_rmap+0x323/0x360
RSP: :8805f758  EFLAGS: 00010292
RAX: 0021 RBX: ea00016a RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8805f780 R08:  R09: 880b8be0
R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
R13: 6180 R14:  R15: 88007e85ddc0
FS:  7f5cd5fea740() GS:8800bfae() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 64c03000 CR3: 7f017000 CR4: 06a0
Stack:
 88007f351000 88007f352000 ea00016a 6180
 88007e85ddc0 8805f790 81128278 8805f800
 81146dbb 000619ff 00061800 1600
Call Trace:
 [] page_add_anon_rmap+0x18/0x20
 [] unfreeze_page+0x24b/0x330
 [] split_huge_page_to_list+0x3df/0x920
 [] ? scan_swap_map+0x37f/0x550
 [] add_to_swap+0xb6/0x100
 [] shrink_page_list+0x3b7/0xdc0
 [] shrink_inactive_list+0x18c/0x4b0
 [] shrink_lruvec+0x58f/0x730
 [] shrink_zone+0xd4/0x280
 [] do_try_to_free_pages+0x12d/0x3b0
 []

Re: kernel oops on mmotm-2015-10-15-15-20

On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))

The VM_BUG_ON_PAGE() is bogus after the patch. Just drop it.

> page->mem_cgroup:88007f613c00
> [ cut here ]
> kernel BUG at mm/rmap.c:1156!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ 
> #1573
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800b8804ec0 ti: 8805c000 task.ti: 8805c000
> RIP: 0010:[]  [] 
> do_page_add_anon_rmap+0x323/0x360
> RSP: :8805f758  EFLAGS: 00010292
> RAX: 0021 RBX: ea00016a RCX: 81830db8
> RDX: 0001 RSI: 0246 RDI: 821df4d8
> RBP: 8805f780 R08:  R09: 880b8be0
> R10: 8163d7c0 R11: 01a5 R12: 88007e85ddc0
> R13: 6180 R14:  R15: 88007e85ddc0
> FS:  7f5cd5fea740() GS:8800bfae() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 64c03000 CR3: 7f017000 CR4: 06a0
> Stack:
>  88007f351000 88007f352000 ea00016a 6180
>  88007e85ddc0 8805f790 81128278 8805f800
>  81146dbb 000619ff

Re: kernel oops on mmotm-2015-10-15-15-20

On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another 
> > > > > > oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug 
> > > > > > fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > > page_mapped
> > > > > > at that time but second check of page_mapped right before 
> > > > > > try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > > across:4191228k FS
> > > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > > index:0x60aff
> > > > > > flags: 
> > > > > > 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && 
> > > > > > !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got 
> > > > > before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long 
> > > time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I 
> tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ea00016a count:3 mapcount:0 mapping:88007f49d001 
> index:0x61800 compound_mapcount: 0
> flags: 0x40044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> page->mem_cgroup:88007f613c00

Ignore my previous answer. Still sleeping.

I think the right way to fix it is something like:

diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..f2d46792a554 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (PageTransCompound(page) && compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first
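
The diff above is cut off in the archive, so what follows is only a hedged userspace sketch of the mapcount convention it relies on, not the kernel code itself. It assumes the kernel convention that a per-page mapcount starts at -1 and that atomic_inc_and_test() returns true when the incremented value reaches zero, so "first" is true only for the mapping that takes the page from unmapped to mapped. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the mapcount is modeled. */
struct toy_page {
	atomic_int mapcount;	/* -1 means "not mapped" (assumed kernel convention) */
};

void toy_page_init(struct toy_page *page)
{
	atomic_init(&page->mapcount, -1);
}

/* Mimics atomic_inc_and_test(): increment, return true if the result is 0. */
bool toy_inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Returns true only for the mapping that makes the page mapped. */
bool toy_add_rmap(struct toy_page *page)
{
	return toy_inc_and_test(&page->mapcount);
}

/* Number of mappings as a human would count them (0 = unmapped). */
int toy_mapcount(struct toy_page *page)
{
	return atomic_load(&page->mapcount) + 1;
}
```

With this convention a page whose rmap was dropped earlier can be re-added without asserting that it is already mapped: the first toy_add_rmap() simply reports "first" again.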

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-28 Thread Kirill A. Shutemov

On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap 
> > > > seems
> > > > to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > > > !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.
> > 
> > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > instead of next-20151021 to remove noise from your migration cleanup
> > series and will test it again.
> > If it is fixed, I will test again with your migration patchset, then.
> 
> I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> Therefore, there is no patchset from Hugh's migration patch in there.
> And I added below debug code with request from Kirill to all test kernels.

It took too long (and a lot of printk()), but I think I have finally tracked
it down.
 
The patch below seems to fix the issue for me. It's not yet properly tested,
but it looks like it works.

The problem was my wrong assumption about how migration works: I thought the
kernel would wait for migration to finish before tearing down the mapping.

But it turns out that's not true.

As a result, if zap_pte_range() races with split_huge_page(), we can end up
with a page which is not mapped anymore but has _count and _mapcount
elevated. The page is on the LRU too, so it's still reachable by vmscan and by
pfn scanners (Sasha showed a few similar traces from compaction too).
It's likely that page->mapping in this case would point to a freed anon_vma.

BOOM!

The patch modifies the freeze/unfreeze_page() code to match the normal
migration entry logic: on setup we remove the page from rmap and drop the pin;
on removal we get the pin back and put the page back on rmap. This way, even
if the migration entry is removed under us, we don't corrupt the page's state.
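
The accounting described above can be illustrated with a small userspace toy model. This is a sketch under stated assumptions, not the kernel implementation: it models one page and one PTE, is single-threaded, and treats the zap as simply discarding whatever the PTE holds. The toy_* names and fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	int refcount;	/* pins held on the page */
	int mapcount;	/* number of rmap entries (0 = unmapped) */
};

struct toy_pte {
	bool present;		/* PTE maps the page directly */
	bool migration_entry;	/* placeholder installed by freeze */
};

/* Freeze: swap the mapping for a migration entry, drop rmap and pin now. */
void toy_freeze(struct toy_page *page, struct toy_pte *pte)
{
	if (!pte->present)
		return;
	pte->present = false;
	pte->migration_entry = true;
	page->mapcount--;
	page->refcount--;
}

/* Zap: tear down the PTE; only a present mapping holds rmap and a pin. */
void toy_zap(struct toy_page *page, struct toy_pte *pte)
{
	if (pte->present) {
		page->mapcount--;
		page->refcount--;
	}
	pte->present = false;
	pte->migration_entry = false;
}

/* Unfreeze: restore the mapping only if the migration entry survived. */
void toy_unfreeze(struct toy_page *page, struct toy_pte *pte)
{
	if (!pte->migration_entry)
		return;
	pte->migration_entry = false;
	pte->present = true;
	page->refcount++;
	page->mapcount++;
}
```

In this model, if toy_zap() runs between toy_freeze() and toy_unfreeze(), the counters stay where freeze left them, so nothing is left elevated for an entry that no longer exists.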

Please, test.

Not-Yet-Signed-off-by: Kirill A. Shutemov 

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e0fe82a0fae..192b50c7526c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
+
+	if (freeze) {
+		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+			page_remove_rmap(page + i, false);
+			put_page(page + i);
+		}
+	}
 }
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
@@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (pte_soft_dirty(entry))
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
+		page_remove_rmap(page, false);
+		put_page(page);
 	}
 	pte_unmap_unlock(pte, ptl);
 }
@@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		return;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
 	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
-		if (!page_mapped(page))
-			continue;
 		if (!is_swap_pte(pte[i]))
 			continue;
 
@@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (migration_entry_to_page(swp_entry) != page)
 			continue;
 
+		get_page(page);
+		page_add_anon_rmap(page, vma, address, false);
+
 		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
 		entry = pte_mkdirty(entry);
 		if (is_write_migration_entry(swp_entry))
-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Hugh Dickins

On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another 
> > > > > oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > > page_mapped
> > > > > at that time but second check of page_mapped right before 
> > > > > try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > > across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > > index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) 
> > > > > && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> 
> I think I have introduced a bug there; or rather, made more evident
> a pre-existing bug.  But I'm not sure yet: the stacktrace was from
> compaction (called by khugepaged, but that may not be relevant at all),
> and thinking through the races with isolate_migratepages_block() is
> never easy.
> 
> What's certain is that I was not giving any thought to
> isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
> I was thinking about "stable" anonymous pages, and how they get
> faulted back in from swapcache while holding page lock.
> 
> It looks to me now as if a page might not yet be PageAnon when it's
> first tested in __unmap_and_move(), when going to page_get_anon_vma();
> but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
> where I inserted the VM_BUG_ON_PAGE().
> 
> If so, the code would always have been wrong (trying to unmap the
> anonymous page, and later remap its replacement, without a hold on
> the anon_vma needed to guide both lookups); but I'll have made it
> more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
> that's a good step forward :)
> 
> There's a reference count check in isolated_migratepages_block()
> before this, which would make it unlikely, but I doubt rules it out.
> 
> However... you did hit an anon_vma reference counting problem before
> my migration changes went in, and Kirill had a vague suspicion that
> he might be screwing up anon_vma refcounting in split_huge_page():
> if he confirms that, I'd say it's more likely to be the cause of
> your crash on this occasion.
> 
> Not hard to fix mine (though we'll probably have to lose the
> VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
> trivial fix), I just want to give the races more thought.

And after giving it more thought, I realize that I was wrong yesterday,
and the new VM_BUG_ON_PAGE() should be good as is: my guess is that it
is simply alerting you to the same anon_vma reference counting issue
as you had already hit without that patch.

What I was forgetting yesterday, is that isolate_migratepages_block()
can only take the page for migration when it's PageLRU(): and
do_anonymous_page() only adds a page to the LRU after it has been
marked as mapped and PageAnon.

So the window that worried me yesterday, that __unmap_and_move()
might see !PageAnon, then reach try_to_unmap() with it page_mapped
and PageAnon: that window does not exist, with or without my changes.
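
The ordering argument above can be restated as a tiny userspace toy model. It is only a sketch of the reasoning, not kernel code: the fault path is reduced to three hypothetical steps, and isolation is reduced to a check of the LRU flag. All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	bool anon;	/* stands in for PageAnon */
	bool mapped;	/* stands in for page_mapped */
	bool lru;	/* stands in for PageLRU */
};

/* Steps of the toy anonymous fault path, in the order described above. */
enum toy_fault_step { STEP_ALLOCATED, STEP_MAPPED_ANON, STEP_ON_LRU };

/* Build the page state as it would look after the given step. */
void toy_fault_until(struct toy_page *page, enum toy_fault_step step)
{
	*page = (struct toy_page){ false, false, false };
	if (step >= STEP_MAPPED_ANON) {
		page->anon = true;
		page->mapped = true;
	}
	if (step >= STEP_ON_LRU)
		page->lru = true;
}

/* Isolation for migration is only possible once the page is on the LRU. */
bool toy_isolate(const struct toy_page *page)
{
	return page->lru;
}
```

Because the LRU flag is the last thing set, any page the toy isolation accepts is already anon and mapped, which is the invariant the argument depends on.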

Hugh

> 
> However it turns out, I think you have a very useful test there.
> 
> (And I've observed no PageDirty problems with your recent patchsets,
> though I don't use MADV_FREE at all myself.)
> 
> Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Minchan Kim

On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap 
> > > seems
> > > to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k 
> > > FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k 
> > > FS
> > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > index:0x60aff
> > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > > !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

I tested mmotm-2015-10-15-15-20 for a long time with the test program I attach.
So Hugh's migration patchset is not in there.
I also added the debug code below, at Kirill's request, to all test kernels.

diff --git a/mm/rmap.c b/mm/rmap.c
index ddfb9be72366..1c23b70b1f57 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -513,6 +513,13 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)

 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
 	root_anon_vma = READ_ONCE(anon_vma->root);
+
+	if (root_anon_vma == NULL) {
+		printk("anon_vma %p refcount %d\n", anon_vma,
+				atomic_read(&anon_vma->refcount));
+		VM_BUG_ON_PAGE(1, page);
+	}
+
 	if (down_read_trylock(&root_anon_vma->rwsem)) {
 		/*
 		 * If the page is still mapped, then this anon_vma is still
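
The first context line of the debug patch recovers the anon_vma pointer by subtracting PAGE_MAPPING_ANON from page->mapping. A hedged userspace sketch of that low-bit tagging follows; it assumes the tag value is 1, which appears consistent with the odd mapping values in the dumps in this thread. The toy_* names are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAPPING_ANON 0x1UL	/* assumed to match PAGE_MAPPING_ANON */

/* Toy stand-in; int alignment keeps the low pointer bit free for the tag. */
struct toy_anon_vma {
	int refcount;
};

/* Store the pointer with the low bit set to mark the mapping as anonymous. */
uintptr_t toy_encode_anon(struct toy_anon_vma *av)
{
	return (uintptr_t)av + TOY_MAPPING_ANON;
}

/* A mapping value is anonymous if the tag bit is set. */
bool toy_is_anon(uintptr_t mapping)
{
	return (mapping & TOY_MAPPING_ANON) != 0;
}

/* Strip the tag to get the pointer back, as the patch context does. */
struct toy_anon_vma *toy_decode_anon(uintptr_t mapping)
{
	return (struct toy_anon_vma *)(mapping - TOY_MAPPING_ANON);
}
```

Decoding only yields a usable pointer while the object is still alive, which is why a stale page->mapping pointing at a freed anon_vma is a problem.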

1. mmotm-2015-10-15-15-20 + kirill's pte_mkdirty

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f1ed780 idx:1 val:488
BUG: Bad rss-counter state mm:88007f1ed780 idx:2 val:24

2nd trial:

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:8800a5cca680 idx:1 val:512
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS

2. mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP.

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f4c2d80 idx:1 val:511
BUG: Bad rss-counter state mm:88007f4c2d80 idx:2 val:1

2nd trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
anon_vma 88089aa0 refcount 0
page:ea0001a2ea40 count:3 mapcount:1 mapping:88089aa1 
index:0x647a9

I tested it in a KVM guest with 12 cores and 3G of memory.
For mmotm-2015-10-15-15-20-no-madvise_free, I tweaked the test program to do
madvise_dontneed instead of madvise_free via the patch below.

For the testing,

gcc -o oops oops.c
./memcg_test.sh

I will be away from now on, so please excuse late responses,
but I hope my test program will reproduce it on your machine.

diff --git a/oops.c b/oops.c
index e50330a..c8298f8 100644
--- a/oops.c
+++ b/oops.c
@@ -8,7 +8,7 @@
 #include 
 #include 

-#define MADV_FREE 5
+#define MADV_FREE 4

 int pid;
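
The one-line change above works because 4 is the value of MADV_DONTNEED on Linux, so the test program ends up issuing MADV_DONTNEED instead. As a hedged illustration of what that advice does to private anonymous memory on Linux, here is a small sketch; the function name dontneed_zero_fills is hypothetical and it is not part of oops.c.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Returns 1 if the buffer reads back as zeroes after MADV_DONTNEED,
 * 0 if old data is still visible, and -1 if mmap() or madvise() failed.
 */
int dontneed_zero_fills(size_t len)
{
	unsigned char *buf;
	size_t i;
	int zeroed = 1;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Dirty every byte so the pages are really populated. */
	memset(buf, 0xaa, len);

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* On Linux the next access to these pages faults in zero pages. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0) {
			zeroed = 0;
			break;
		}
	}

	munmap(buf, len);
	return zeroed;
}
```

Calling dontneed_zero_fills(4096) on a Linux system is expected to return 1; other systems may treat MADV_DONTNEED differently.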

memcg_move_task.sh
Description: Bourne shell script

memcg_test.sh
Description: Bourne shell script
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MADV_FREE 4

int pid;

void sig_handler(int signo)
{
	printf("pid %d sig received %d\n", pid, signo);
	exit(1);
}

void free_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;

	for (i = 0; i < buf_count; i++) {
		if (bufs[i] != NULL) {
			munmap(bufs[i],  buf_size);
			bufs[i] = NULL;
		}
	}
}

void alloc_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;
	time_t rawtime;
	struct tm * timeinfo;
	void *addr = (void*)0x6000;

	for (i = 0; i < buf_count; i++) {
		void *ptr = NULL;

		ptr = mmap(addr, buf_size, PROT_READ|PROT_WRITE,
			MAP_ANON|MAP_PRIVATE|MAP_FIXED, 0, 0);

		if (ptr == MAP_FAILED) {
			char bufs[64];

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Minchan Kim

On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap seems
> > > to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

I ran mmotm-2015-10-15-15-20 for a long time with the test program I attach.
That tree does not contain Hugh's migration patchset.
At Kirill's request, I added the debug code below to all test kernels.

diff --git a/mm/rmap.c b/mm/rmap.c
index ddfb9be72366..1c23b70b1f57 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -513,6 +513,13 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
 	root_anon_vma = READ_ONCE(anon_vma->root);
+
+	if (root_anon_vma == NULL) {
+		printk("anon_vma %p refcount %d\n", anon_vma,
+			atomic_read(&anon_vma->refcount));
+		VM_BUG_ON_PAGE(1, page);
+	}
+
 	if (down_read_trylock(&root_anon_vma->rwsem)) {
 		/*
 		 * If the page is still mapped, then this anon_vma is still

1. mmotm-2015-10-15-15-20 + kirill's pte_mkdirty

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f1ed780 idx:1 val:488
BUG: Bad rss-counter state mm:88007f1ed780 idx:2 val:24

2nd trial:

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:8800a5cca680 idx:1 val:512
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS

2. mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP.

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:88007f4c2d80 idx:1 val:511
BUG: Bad rss-counter state mm:88007f4c2d80 idx:2 val:1

2nd trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
anon_vma 88089aa0 refcount 0
page:ea0001a2ea40 count:3 mapcount:1 mapping:88089aa1 
index:0x647a9
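
In the last two lines, the printed mapping value is the anon_vma value with
the low bit set. That matches the debug code above, which subtracts
PAGE_MAPPING_ANON from page->mapping to get the anon_vma. A minimal
user-space sketch of that tagging follows. The helper names are hypothetical,
the constants are assumed to mirror the kernel's definitions of this era, and
the sample values are used exactly as they appear in the log:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the kernel's low-bit tags on page->mapping. */
#define PAGE_MAPPING_ANON	1UL
#define PAGE_MAPPING_KSM	2UL
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* Hypothetical helper: is this mapping value tagged as anonymous? */
static int mapping_is_anon(uintptr_t mapping)
{
	return (mapping & PAGE_MAPPING_ANON) != 0;
}

/* Hypothetical helper: KSM pages carry both tag bits. */
static int mapping_is_ksm(uintptr_t mapping)
{
	return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_FLAGS;
}

/* Hypothetical helper: strip the anon tag to recover the anon_vma address. */
static uintptr_t mapping_to_anon_vma(uintptr_t mapping)
{
	return mapping - PAGE_MAPPING_ANON;
}
```

With the values from the log, mapping_to_anon_vma(0x88089aa1) gives
0x88089aa0, the anon_vma that was reported with refcount 0.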

I tested it in a KVM guest with 12 cores and 3G of memory.
For mmotm-2015-10-15-15-20-no-madvise_free, I tweaked the test program to do
madvise_dontneed instead of madvise_free via the patch below.

For the testing,

gcc -o oops oops.c
./memcg_test.sh

I will be away from now on, so please excuse late responses,
but I hope my test program will reproduce it on your machine.

diff --git a/oops.c b/oops.c
index e50330a..c8298f8 100644
--- a/oops.c
+++ b/oops.c
@@ -8,7 +8,7 @@
 #include 
 #include 

-#define MADV_FREE 5
+#define MADV_FREE 4

 int pid;
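
The patch above works because 4 is the value of MADV_DONTNEED in the Linux
headers on x86, so the program's madvise() call becomes a plain
MADV_DONTNEED. A minimal sketch of what that call does to a private anonymous
mapping follows. The helper name dontneed_zeroes_range() is hypothetical and
is not part of oops.c; it only illustrates the advice value the tweaked test
ends up using:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not taken from oops.c: map len bytes, dirty them,
 * apply MADV_DONTNEED, and return 0 if the range then reads back as zeroes.
 * Returns -1 on any failure.
 */
static int dontneed_zeroes_range(size_t len)
{
	unsigned char *p;
	size_t i;
	int ret = 0;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Fault the pages in and dirty them. */
	memset(p, 0xaa, len);

	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/* On Linux, dropped private anonymous pages refault zero-filled. */
	for (i = 0; i < len; i++) {
		if (p[i] != 0) {
			ret = -1;
			break;
		}
	}

	munmap(p, len);
	return ret;
}
```

Calling dontneed_zeroes_range(4096) from a main() should return 0 on Linux.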

memcg_move_task.sh
Description: Bourne shell script

memcg_test.sh
Description: Bourne shell script
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MADV_FREE 4

int pid;

void sig_handler(int signo)
{
	printf("pid %d sig received %d\n", pid, signo);
	exit(1);
}

void free_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;

	for (i = 0; i < buf_count; i++) {
		if (bufs[i] != NULL) {
			munmap(bufs[i],  buf_size);
			bufs[i] = NULL;
		}
	}
}

void alloc_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;
	time_t rawtime;
	struct tm * timeinfo;
	void *addr = (void*)0x6000;

	for (i = 0; i < buf_count; i++) {
		void *ptr = NULL;

		ptr = mmap(addr, buf_size, PROT_READ|PROT_WRITE,
			MAP_ANON|MAP_PRIVATE|MAP_FIXED, 0, 0);

		if (ptr == MAP_FAILED) {
			char bufs[64];

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-22 Thread Hugh Dickins

On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> 
> I think I have introduced a bug there; or rather, made more evident
> a pre-existing bug.  But I'm not sure yet: the stacktrace was from
> compaction (called by khugepaged, but that may not be relevant at all),
> and thinking through the races with isolate_migratepages_block() is
> never easy.
> 
> What's certain is that I was not giving any thought to
> isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
> I was thinking about "stable" anonymous pages, and how they get
> faulted back in from swapcache while holding page lock.
> 
> It looks to me now as if a page might not yet be PageAnon when it's
> first tested in __unmap_and_move(), when going to page_get_anon_vma();
> but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
> where I inserted the VM_BUG_ON_PAGE().
> 
> If so, the code would always have been wrong (trying to unmap the
> anonymous page, and later remap its replacement, without a hold on
> the anon_vma needed to guide both lookups); but I'll have made it
> more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
> that's a good step forward :)
> 
> There's a reference count check in isolate_migratepages_block()
> before this, which would make it unlikely, but I doubt rules it out.
> 
> However... you did hit an anon_vma reference counting problem before
> my migration changes went in, and Kirill had a vague suspicion that
> he might be screwing up anon_vma refcounting in split_huge_page():
> if he confirms that, I'd say it's more likely to be the cause of
> your crash on this occasion.
> 
> Not hard to fix mine (though we'll probably have to lose the
> VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
> trivial fix), I just want to give the races more thought.

And after giving it more thought, I realize that I was wrong yesterday,
and the new VM_BUG_ON_PAGE() should be good as is: my guess is that it
is simply alerting you to the same anon_vma reference counting issue
as you had already hit without that patch.

What I was forgetting yesterday, is that isolate_migratepages_block()
can only take the page for migration when it's PageLRU(): and
do_anonymous_page() only adds a page to the LRU after it has been
marked as mapped and PageAnon.

So the window that worried me yesterday, that __unmap_and_move()
might see !PageAnon, then reach try_to_unmap() with it page_mapped
and PageAnon: that window does not exist, with or without my changes.
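
The ordering argument above can be put as a toy model. This is only a sketch
under the stated assumption that the fault path marks the page mapped and
PageAnon before it adds the page to the LRU, and that isolation requires
PageLRU. The struct and function names are hypothetical and are not kernel
code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the three page states the argument depends on. */
struct page_model {
	bool anon;	/* PageAnon */
	bool mapped;	/* page_mapped */
	bool lru;	/* PageLRU */
};

/* Steps of the anonymous fault path, in the order described above. */
enum fault_step { STEP_ALLOCATED, STEP_RMAP_ADDED, STEP_ON_LRU };

/* Return the page state after the fault path has reached the given step. */
static struct page_model fault_in_until(enum fault_step step)
{
	struct page_model p = { false, false, false };

	if (step >= STEP_RMAP_ADDED) {
		p.anon = true;
		p.mapped = true;
	}
	if (step >= STEP_ON_LRU)
		p.lru = true;
	return p;
}

/* Isolation for migration only takes pages that are on the LRU. */
static bool can_isolate(const struct page_model *p)
{
	return p->lru;
}

/* True if some step lets migration isolate a page that is not yet anon. */
static bool anon_window_exists(void)
{
	int step;

	for (step = STEP_ALLOCATED; step <= STEP_ON_LRU; step++) {
		struct page_model p = fault_in_until((enum fault_step)step);

		if (can_isolate(&p) && !p.anon)
			return true;
	}
	return false;
}
```

In this model anon_window_exists() returns false: at every step where the
page can be isolated, it is already anon and mapped.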

Hugh

> 
> However it turns out, I think you have a very useful test there.
> 
> (And I've observed no PageDirty problems with your recent patchsets,
> though I don't use MADV_FREE at all myself.)
> 
> Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel oops on mmotm-2015-10-15-15-20

On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.

I think I have introduced a bug there; or rather, made more evident
a pre-existing bug.  But I'm not sure yet: the stacktrace was from
compaction (called by khugepaged, but that may not be relevant at all),
and thinking through the races with isolate_migratepages_block() is
never easy.

What's certain is that I was not giving any thought to
isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
I was thinking about "stable" anonymous pages, and how they get
faulted back in from swapcache while holding page lock.

It looks to me now as if a page might not yet be PageAnon when it's
first tested in __unmap_and_move(), when going to page_get_anon_vma();
but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
where I inserted the VM_BUG_ON_PAGE().

If so, the code would always have been wrong (trying to unmap the
anonymous page, and later remap its replacement, without a hold on
the anon_vma needed to guide both lookups); but I'll have made it
more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
that's a good step forward :)

There's a reference count check in isolate_migratepages_block()
before this, which would make it unlikely, but I doubt rules it out.

However... you did hit an anon_vma reference counting problem before
my migration changes went in, and Kirill had a vague suspicion that
he might be screwing up anon_vma refcounting in split_huge_page():
if he confirms that, I'd say it's more likely to be the cause of
your crash on this occasion.

Not hard to fix mine (though we'll probably have to lose the
VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
trivial fix), I just want to give the races more thought.

However it turns out, I think you have a very useful test there.

(And I've observed no PageDirty problems with your recent patchsets,
though I don't use MADV_FREE at all myself.)

Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

On Thu, 22 Oct 2015, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap seems
> > > to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

Not a good use of your time, I think.  It's sure to be fixed in the
rc5-mmotm because that VM_BUG_ON_PAGE(blah) just does not exist in
that tree: I added it to verify my reasoning in changing the comments
about page_get_anon_vma() and PageSwapCache in mm/migrate.c.

> 
> > 
> > > page->mem_cgroup:88007f3dcc00
> > > [ cut here ]
> > > kernel BUG at mm/migrate.c:889!
> > > invalid opcode:  [#1] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> > 
> > Hmm, it might be me to blame, or it might be Kirill, don't know yet.
> 
> It might be me, either.
> 
> > 
> > Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> > an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> > I haven't digested yet, but it might turn out to be relevant.

Sorry, I think that was an irrelevant suggestion: today's new rc6-mmotm
is identical to yesterday's there, and the patch that was removed appears
to be identical to the one added.

Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

Hello Hugh,

On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > 
> > I added the code to check it and queued it again but I had another oops
> > in this time but symptom is related to anon_vma, too.
> > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > at that time but second check of page_mapped right before try_to_unmap seems
> > to be true.
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> 
> That's interesting, that's one I added in my page migration series.
> Let me think on it, but it could well relate to the one you got before.

I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
instead of next-20151021 to remove noise from your migration cleanup
series and will test it again.
If it is fixed, I will test again with your migration patchset, then.

> 
> > page->mem_cgroup:88007f3dcc00
> > [ cut here ]
> > kernel BUG at mm/migrate.c:889!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> 
> Hmm, it might be me to blame, or it might be Kirill, don't know yet.

It might be me, either.

> 
> Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> I haven't digested yet, but it might turn out to be relevant.
> 
> Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

On Thu, 22 Oct 2015, Minchan Kim wrote:
> 
> I added the code to check it and queued it again but I had another oops
> in this time but symptom is related to anon_vma, too.
> (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> It seems page_get_anon_vma returns NULL since the page was not page_mapped
> at that time but second check of page_mapped right before try_to_unmap seems
> to be true.
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)

That's interesting, that's one I added in my page migration series.
Let me think on it, but it could well relate to the one you got before.

> page->mem_cgroup:88007f3dcc00
> [ cut here ]
> kernel BUG at mm/migrate.c:889!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557

Hmm, it might be me to blame, or it might be Kirill, don't know yet.

Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
I haven't digested yet, but it might turn out to be relevant.

Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

On Wed, Oct 21, 2015 at 02:07:23PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> > I detach this report from my patchset thread because I see below
> > problem with removing MADV_FREE related code and I can reproduce
> > same oops with MADV_FREE + recent patches(both my SetPageDirty
> > and Kirill's pte_mkdirty) within 7 hours.
> 
> Could you share code for your workload?

It's part of test suite so I need time to factor it out.
I will do/test and send it.

> 
> > I can not be sure it's THP refcount redesign's problem but it was
> > one of big change in MM between mmotm-2015-10-15-15-20 and
> > mmotm-2015-10-06-16-30 so it could be a culprit.
> > 
> > In page_lock_anon_vma_read, anon_vma_root was NULL.
> > I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.
> 
> Hm. That's tricky.. :-/
> 
> Could you please dump anon_vma->refcount too?

I added the code to check it and queued it again but I had another oops
in this time but symptom is related to anon_vma, too.
(kernel is based on recent mmotm + unconditional mkdirty for bug fix)
It seems page_get_anon_vma returns NULL since the page was not page_mapped
at that time but second check of page_mapped right before try_to_unmap seems
to be true.

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 index:0x60aff
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:88007f3dcc00
[ cut here ]
kernel BUG at mm/migrate.c:889!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b9851a40 ti: 8800b985c000 task.ti: 8800b985c000
RIP: 0010:[]  [] migrate_pages+0x8e6/0x950
RSP: 0018:8800b985fa00  EFLAGS: 00010286
RAX: 0021 RBX: ea0002dd7fc0 RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8800b985fa80 R08:  R09: 880bb160
R10: 8163e000 R11: 01e0 R12: 
R13: ea0001cfbf80 R14: ea0001cfbfc0 R15: 8189de80
FS:  () GS:8800bfb6() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 5594f9d7e578 CR3: 01808000 CR4: 06a0
Stack:
 8800b9851a40   
 811144b0  81115fb0 ea0001cfbfe0
 8800b985fb30 8800b985fb20  8800b985fb20
Call Trace:
 [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [] ? isolate_freepages_block+0x3d0/0x3d0
 [] compact_zone+0x2bb/0x720
 [] ? retint_kernel+0x10/0x10
 [] ? list_del+0xd/0x30
 [] compact_zone_order+0x6d/0xa0
 [] try_to_compact_pages+0xed/0x200
 [] __alloc_pages_direct_compact+0x3b/0xd4
 [] __alloc_pages_nodemask+0x3fb/0x920
 [] khugepaged+0x158/0x1b90
 [] ? hrtick_update+0x51/0x70
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? unfreeze_page+0x320/0x320
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 f4 fa ff ff 48 c7 c6 58 e9 77 81 4c 89 f7 e8 fa 2a fd ff <0f> 0b 48 83 e8 01 e9 d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff
RIP  [] migrate_pages+0x8e6/0x950
 RSP 
---[ end trace 59eb35cc15af8a53 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

> 
> I have a vague suspicion that I'm screwing up anon_vma refcounting during
> split_huge_page.
> 
> It would be great to see if the page was part of THP before.
> 
> > 
> > ..
> > ..
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 index:0x61445
> > page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 index:0x615ef
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(1)
> > page->mem_cgroup:88007f2de000
> > [ cut here ]
> > kernel BUG at mm/rmap.c:517!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> > RIP: 0010:[]  [] 
> >

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Kirill A. Shutemov

On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> I detach this report from my patchset thread because I see below
> problem with removing MADV_FREE related code and I can reproduce
> same oops with MADV_FREE + recent patches(both my SetPageDirty
> and Kirill's pte_mkdirty) within 7 hours.

Could you share code for your workload?

> I can not be sure it's THP refcount redesign's problem but it was
> one of big change in MM between mmotm-2015-10-15-15-20 and
> mmotm-2015-10-06-16-30 so it could be a culprit.
> 
> In page_lock_anon_vma_read, anon_vma_root was NULL.
> I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

Hm. That's tricky.. :-/

Could you please dump anon_vma->refcount too?

I have a vague suspicion that I'm screwing up anon_vma refcounting during
split_huge_page.

It would be great to see if the page was part of THP before.

> 
> ..
> ..
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 index:0x61445
> page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 index:0x615ef
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:88007f2de000
> [ cut here ]
> kernel BUG at mm/rmap.c:517!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> RIP: 0010:[]  [] page_lock_anon_vma_read+0x18e/0x190
> RSP: :8800ada2b868  EFLAGS: 00010296
> RAX: 0021 RBX: ea0001b87bc0 RCX: 
> RDX: 0001 RSI: 0282 RDI: 81830db0
> RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75
> R10: 01ff14bc R11:  R12: 88007e806461
> R13: 88007e806460 R14:  R15: 818464c0
> FS:  7f6d93212740() GS:8800bfa0() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 63c14000 CR3: a674b000 CR4: 06b0
> Stack:
>  ea0001b87bc0 8800ada2b8f8 88007f2de000 
>  8800ada2b8d0 81129593 8800 8105f8c0
>  ea0001b87bc0 8800ada2b9f8 88007f2de000 
> Call Trace:
>  [] rmap_walk+0x1b3/0x3f0
>  [] ? finish_task_switch+0x70/0x260
>  [] page_referenced+0x1a3/0x220
>  [] ? __page_check_address+0x1d0/0x1d0
>  [] ? page_get_anon_vma+0xd0/0xd0
>  [] ? anon_vma_ctor+0x40/0x40
>  [] shrink_page_list+0x5ce/0xdc0
>  [] shrink_inactive_list+0x18c/0x4b0
>  [] shrink_lruvec+0x58f/0x730
>  [] shrink_zone+0xd4/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [] try_charge+0x175/0x720
>  [] ? __activate_page+0x230/0x230
>  [] mem_cgroup_try_charge+0x85/0x1d0
>  [] handle_mm_fault+0xc9a/0x1000
>  [] ? __set_cpus_allowed_ptr+0x9b/0x1a0
>  [] __do_page_fault+0x189/0x400
>  [] do_page_fault+0xc/0x10
>  [] page_fault+0x22/0x30
> Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 45 31 f6 41 55 4c
> RIP  [] page_lock_anon_vma_read+0x18e/0x190
>  RSP 
> ---[ end trace cfbb87f54f12290e ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> 
> On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > > Hello, it's too late since I sent previous patch.
> > > > https://lkml.org/lkml/2015/6/3/37
> > > > 
> > > > This patch is almost new compared to previous approach.
> > > > I think this is more simple, clear and easy to review.
> > > > 
> > > > One thing I should notice is that I have tested this patch
> > > > and couldn't find any critical problem so I rebased patchset
> > > > onto recent mmotm(ie, mmotm-2015-10-15-15-20) to send formal
> > > > patchset. Unfortunately, I start to see sudden discarding of
> > > > the page we shouldn't do. IOW, application's valid anonymous page
> > > > was disappeared suddenly.
> > > > 
> > > > When I look through THP changes, I think we could lose
> > > > dirty bit of pte between freeze_page and unfreeze_page
> > > > when we mark it as migration entry and restore it.
> > > > So, I added below simple code without enough considering
> > > > and cannot see the problem any more.
> > > > I hope it's good hint to find right fix this problem.
> > > > 
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > >

Re: kernel oops on mmotm-2015-10-15-15-20

On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not 
> > > > page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap 
> > > > seems
> > > > to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 
> > > > across:4191228k FS
> > > > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > > > index:0x60aff
> > > > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > > > !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.

I think I have introduced a bug there; or rather, made more evident
a pre-existing bug.  But I'm not sure yet: the stacktrace was from
compaction (called by khugepaged, but that may not be relevant at all),
and thinking through the races with isolate_migratepages_block() is
never easy.

What's certain is that I was not giving any thought to
isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
I was thinking about "stable" anonymous pages, and how they get
faulted back in from swapcache while holding page lock.

It looks to me now as if a page might not yet be PageAnon when it's
first tested in __unmap_and_move(), when going to page_get_anon_vma();
but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
where I inserted the VM_BUG_ON_PAGE().
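
The ordering described above can be modelled in user space. This is only a sketch under stated assumptions: `model_page`, `model_anon_vma` and the three helpers are hypothetical stand-ins, not the kernel's `struct page`, `page_get_anon_vma()` or `try_to_unmap()`, and the KSM case is left out. It shows how a page that is not yet anonymous at the first check can satisfy the asserted condition by the second.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's structures. */
struct model_anon_vma {
    int refcount;
};

struct model_page {
    bool anon;                  /* models PageAnon(page) */
    bool mapped;                /* models page_mapped(page) */
    struct model_anon_vma *av;  /* anon_vma the page would resolve to */
};

/* Early step: take a reference only if the page is already anon and mapped. */
static struct model_anon_vma *model_get_anon_vma(struct model_page *p)
{
    if (!p->anon || !p->mapped || p->av == NULL)
        return NULL;
    p->av->refcount++;
    return p->av;
}

/* Another task faults the page in between the early and the late step. */
static void model_fault_in(struct model_page *p, struct model_anon_vma *av)
{
    p->av = av;
    p->anon = true;
    p->mapped = true;
}

/* Late step: true when a mapped anon page is reached with no anon_vma held. */
static bool model_check_trips(const struct model_page *p,
                              const struct model_anon_vma *held)
{
    return p->mapped && p->anon && held == NULL;
}
```

Calling `model_get_anon_vma()` first, then `model_fault_in()`, then `model_check_trips()` with the NULL result of the first call reproduces the suspected window: no reference was taken, yet the page is now mapped and anonymous.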

If so, the code would always have been wrong (trying to unmap the
anonymous page, and later remap its replacement, without a hold on
the anon_vma needed to guide both lookups); but I'll have made it
more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
that's a good step forward :)

There's a reference count check in isolate_migratepages_block()
before this, which would make it unlikely, but I doubt rules it out.

However... you did hit an anon_vma reference counting problem before
my migration changes went in, and Kirill had a vague suspicion that
he might be screwing up anon_vma refcounting in split_huge_page():
if he confirms that, I'd say it's more likely to be the cause of
your crash on this occasion.

Not hard to fix mine (though we'll probably have to lose the
VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
trivial fix), I just want to give the races more thought.

However it turns out, I think you have a very useful test there.

(And I've observed no PageDirty problems with your recent patchsets,
though I don't use MADV_FREE at all myself.)

Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

On Thu, 22 Oct 2015, Minchan Kim wrote:
> 
> I added the code to check it and queued it again but I had another oops
> in this time but symptom is related to anon_vma, too.
> (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> It seems page_get_anon_vma returns NULL since the page was not page_mapped
> at that time but second check of page_mapped right before try_to_unmap seems
> to be true.
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> index:0x60aff
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> !anon_vma)

That's interesting, that's one I added in my page migration series.
Let me think on it, but it could well relate to the one you got before.

> page->mem_cgroup:88007f3dcc00
> [ cut here ]
> kernel BUG at mm/migrate.c:889!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 11 PID: 59 Comm: khugepaged Not tainted 
> 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557

Hmm, it might be me to blame, or it might be Kirill, don't know yet.

Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
I haven't digested yet, but it might turn out to be relevant.

Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

Hello Hugh,

On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > 
> > I added the code to check it and queued it again but I had another oops
> > in this time but symptom is related to anon_vma, too.
> > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > at that time but second check of page_mapped right before try_to_unmap seems
> > to be true.
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
> > index:0x60aff
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
> > !anon_vma)
> 
> That's interesting, that's one I added in my page migration series.
> Let me think on it, but it could well relate to the one you got before.

I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
instead of next-20151021 to remove noise from your migration cleanup
series and will test it again.
If it is fixed, I will test again with your migration patchset, then.

> 
> > page->mem_cgroup:88007f3dcc00
> > [ cut here ]
> > kernel BUG at mm/migrate.c:889!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >(ftrace buffer empty)
> > Modules linked in:
> > CPU: 11 PID: 59 Comm: khugepaged Not tainted 
> > 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> 
> Hmm, it might be me to blame, or it might be Kirill, don't know yet.

It might be me, too.

> 
> Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> I haven't digested yet, but it might turn out to be relevant.
> 
> Hugh

Re: kernel oops on mmotm-2015-10-15-15-20

On Wed, Oct 21, 2015 at 02:07:23PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> > I detach this report from my patchset thread because I see below
> > problem with removing MADV_FREE related code and I can reproduce
> > same oops with MADV_FREE + recent patches(both my SetPageDirty
> > and Kirill's pte_mkdirty) within 7 hours.
> 
> Could you share code for your workload?

It's part of test suite so I need time to factor it out.
I will do/test and send it.

> 
> > I can not be sure it's THP refcount redesign's problem but it was
> > one of big change in MM between mmotm-2015-10-15-15-20 and
> > mmotm-2015-10-06-16-30 so it could be a culprit.
> > 
> > In page_lock_anon_vma_read, anon_vma_root was NULL.
> > I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.
> 
> Hm. That's tricky.. :-/
> 
> Could you please dump anon_vma->refcount too?

I added the code to check it and queued the test again, but this time I hit
a different oops, although its symptom is also related to anon_vma.
(The kernel is based on recent mmotm + unconditional mkdirty for the bug fix.)
It seems page_get_anon_vma() returned NULL because the page was not
page_mapped() at that time, but the second page_mapped() check right before
try_to_unmap() seems to have been true.

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
page:ea0001cfbfc0 count:3 mapcount:1 mapping:88007f1b5f51 
index:0x60aff
flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && 
!anon_vma)
page->mem_cgroup:88007f3dcc00
[ cut here ]
kernel BUG at mm/migrate.c:889!
invalid opcode:  [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 11 PID: 59 Comm: khugepaged Not tainted 
4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800b9851a40 ti: 8800b985c000 task.ti: 8800b985c000
RIP: 0010:[]  [] migrate_pages+0x8e6/0x950
RSP: 0018:8800b985fa00  EFLAGS: 00010286
RAX: 0021 RBX: ea0002dd7fc0 RCX: 81830db8
RDX: 0001 RSI: 0246 RDI: 821df4d8
RBP: 8800b985fa80 R08:  R09: 880bb160
R10: 8163e000 R11: 01e0 R12: 
R13: ea0001cfbf80 R14: ea0001cfbfc0 R15: 8189de80
FS:  () GS:8800bfb6() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 5594f9d7e578 CR3: 01808000 CR4: 06a0
Stack:
 8800b9851a40   
 811144b0  81115fb0 ea0001cfbfe0
 8800b985fb30 8800b985fb20  8800b985fb20
Call Trace:
 [] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [] ? isolate_freepages_block+0x3d0/0x3d0
 [] compact_zone+0x2bb/0x720
 [] ? retint_kernel+0x10/0x10
 [] ? list_del+0xd/0x30
 [] compact_zone_order+0x6d/0xa0
 [] try_to_compact_pages+0xed/0x200
 [] __alloc_pages_direct_compact+0x3b/0xd4
 [] __alloc_pages_nodemask+0x3fb/0x920
 [] khugepaged+0x158/0x1b90
 [] ? hrtick_update+0x51/0x70
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] ? unfreeze_page+0x320/0x320
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 
f4 fa ff ff 48 c7 c6 58 e9 77 81 4c 89 f7 e8 fa 2a fd ff <0f> 0b 48 83 e8 01 e9 
d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff 
RIP  [] migrate_pages+0x8e6/0x950
 RSP 
---[ end trace 59eb35cc15af8a53 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

> 
> I have a vague suspicion that I'm screwing up anon_vma refcounting during
> split_huge_page.
> 
> It would be great to see if the page was part of THP before.
> 
> > 
> > ..
> > ..
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 
> > index:0x61445
> > page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 
> > index:0x615ef
> > flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(1)
> > page->mem_cgroup:88007f2de000
> > [ cut here ]
> > kernel BUG at mm/rmap.c:517!
> > invalid opcode:  [#1] SMP 
> > Dumping ftrace buffer:
> >(ftrace buffer empty)
> > Modules linked in:
> > CPU: 0 PID: 24935 Comm: madvise_test Not tainted 
> > 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> > RIP: 0010:[]  [] 
> >

Re: kernel oops on mmotm-2015-10-15-15-20

2015-10-21 Thread Kirill A. Shutemov

On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> I detach this report from my patchset thread because I see below
> problem with removing MADV_FREE related code and I can reproduce
> same oops with MADV_FREE + recent patches(both my SetPageDirty
> and Kirill's pte_mkdirty) within 7 hours.

Could you share code for your workload?

> I can not be sure it's THP refcount redesign's problem but it was
> one of big change in MM between mmotm-2015-10-15-15-20 and
> mmotm-2015-10-06-16-30 so it could be a culprit.
> 
> In page_lock_anon_vma_read, anon_vma_root was NULL.
> I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

Hm. That's tricky.. :-/

Could you please dump anon_vma->refcount too?

I have a vague suspicion that I'm screwing up anon_vma refcounting during
split_huge_page.

It would be great to see if the page was part of THP before.

> 
> ..
> ..
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ea0001b81140 count:3 mapcount:1 mapping:88007e806461 
> index:0x61445
> page:ea0001b87bc0 count:3 mapcount:1 mapping:88007e806461 
> index:0x615ef
> flags: 0x40048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:88007f2de000
> [ cut here ]
> kernel BUG at mm/rmap.c:517!
> invalid opcode:  [#1] SMP 
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 24935 Comm: madvise_test Not tainted 
> 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88ce8000 ti: 8800ada28000 task.ti: 8800ada28000
> RIP: 0010:[]  [] 
> page_lock_anon_vma_read+0x18e/0x190
> RSP: :8800ada2b868  EFLAGS: 00010296
> RAX: 0021 RBX: ea0001b87bc0 RCX: 
> RDX: 0001 RSI: 0282 RDI: 81830db0
> RBP: 8800ada2b888 R08: 0021 R09: 8800ba40eb75
> R10: 01ff14bc R11:  R12: 88007e806461
> R13: 88007e806460 R14:  R15: 818464c0
> FS:  7f6d93212740() GS:8800bfa0() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 63c14000 CR3: a674b000 CR4: 06b0
> Stack:
>  ea0001b87bc0 8800ada2b8f8 88007f2de000 
>  8800ada2b8d0 81129593 8800 8105f8c0
>  ea0001b87bc0 8800ada2b9f8 88007f2de000 
> Call Trace:
>  [] rmap_walk+0x1b3/0x3f0
>  [] ? finish_task_switch+0x70/0x260
>  [] page_referenced+0x1a3/0x220
>  [] ? __page_check_address+0x1d0/0x1d0
>  [] ? page_get_anon_vma+0xd0/0xd0
>  [] ? anon_vma_ctor+0x40/0x40
>  [] shrink_page_list+0x5ce/0xdc0
>  [] shrink_inactive_list+0x18c/0x4b0
>  [] shrink_lruvec+0x58f/0x730
>  [] shrink_zone+0xd4/0x280
>  [] do_try_to_free_pages+0x12d/0x3b0
>  [] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [] try_charge+0x175/0x720
>  [] ? __activate_page+0x230/0x230
>  [] mem_cgroup_try_charge+0x85/0x1d0
>  [] handle_mm_fault+0xc9a/0x1000
>  [] ? __set_cpus_allowed_ptr+0x9b/0x1a0
>  [] __do_page_fault+0x189/0x400
>  [] do_page_fault+0xc/0x10
>  [] page_fault+0x22/0x30
> Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 
> 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 
> 00 55 48 89 e5 41 57 41 56 45 31 f6 
> 41 55 4c 
> RIP  [] page_lock_anon_vma_read+0x18e/0x190
>  RSP 
> ---[ end trace cfbb87f54f12290e ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> 
> On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > > Hello, it's been a long time since I sent the previous patch.
> > > > https://lkml.org/lkml/2015/6/3/37
> > > > 
> > > > This patch is almost new compared to the previous approach.
> > > > I think this is more simple, clear and easy to review.
> > > > 
> > > > One thing I should notice is that I have tested this patch
> > > > and couldn't find any critical problem so I rebased patchset
> > > > onto recent mmotm(ie, mmotm-2015-10-15-15-20) to send formal
> > > > patchset. Unfortunately, I start to see sudden discarding of
> > > > the page we shouldn't do. IOW, application's valid anonymous page
> > > > was disappeared suddenly.
> > > > 
> > > > When I look through THP changes, I think we could lose
> > > > dirty bit of pte between freeze_page and unfreeze_page
> > > > when we mark it as migration entry and restore it.
> > > > So, I added below simple code without enough considering
> > > > and cannot see the problem any more.
> > > > I hope it's good hint to find right fix this problem.
> > > > 
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > >

Re: Kernel Oops: btusb: 4.2rc1 System lockup with BT dongle insert - log attached

2015-07-19 Thread simon


>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6f957724b94cb19f5c1c97efd01dd4df8ced323c
>>
>
> Certainly looks like a plausible solution, will build kernel tonight to
> confirm.

Just to confirm; 4.2rc1 + above patch, and 4.2rc2 both function correctly
and I no longer see the lock up/Oops.

Thanks to all who helped out,
Simon.


Re: Kernel Oops: btusb: 4.2rc1 System lockup with BT dongle insert - log attached

> On 07/17/2015 08:14 AM, si...@mungewell.org wrote:
>>
>>>> So in summary this problem is showing up now as the 'User Helper
>>>> Fallback' is now forced on, obviously the underlying problem needs
>>>> to be fixed - but I don't know when it crept in.
>>>>
>>>
>>> The 'CONFIG_FW_LOADER_USER_HELPER_FALLBACK' enables to load firmware
>>> data manually by accessing /sys/class/firmware/<name>/data. It runs in
>>> case the firmware file is missing.
>>> This user helper fallback will be enabled if one of LP55xx driver is
>>> included in your dot config. Please see my patch below.
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/leds?id=b67893206fc0a0e8af87130e67f3d8ae553fc87c
>>>
>>> However, I'm not sure why this affects your system lockup. Can I have
>>> more details?
>>
>> Hi Milo,
>> I'm not suggesting that your patch is the cause, just that it is an
>> 'enabler' and explains why the problem (system lockup when I plug
>> USB Bluetooth dongle in) appears now.
>>
>> A full Oops log is further back in this thread:
>> http://www.spinics.net/lists/linux-bluetooth/msg63090.html
>>
>>
>>>> Will try building 4.1 with this option to see if it fails.
>>
>> A very quick test as I was leaving the house this morning shows that 4.1
>> with 'CONFIG_FW_LOADER_USER_HELPER_FALLBACK' does not show the problem.
>>
>> So at least we know the 'real' problem is a recent change to the code.
>> Simon
>>
>
> I think this was reported and fixed
>
> https://lkml.org/lkml/2015/7/8/858
> https://lkml.org/lkml/2015/7/8/1199
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6f957724b94cb19f5c1c97efd01dd4df8ced323c
>

Certainly looks like a plausible solution, will build kernel tonight to
confirm.

If Shuah is still looking for the trigger, see above note regarding
'CONFIG_FW_LOADER_USER_HELPER_FALLBACK'.

Thanks, and have an awesome weekend. :-)
Simon


Re: Kernel Oops: btusb: 4.2rc1 System lockup with BT dongle insert - log attached

2015-07-17 Thread Laura Abbott

On 07/17/2015 08:14 AM, si...@mungewell.org wrote:
>
>>> So in summary this problem is showing up now as the 'User Helper
>>> Fallback' is now forced on, obviously the underlying problem needs
>>> to be fixed - but I don't know when it crept in.
>>>
>>
>> The 'CONFIG_FW_LOADER_USER_HELPER_FALLBACK' enables to load firmware
>> data manually by accessing /sys/class/firmware/<name>/data. It runs in
>> case the firmware file is missing.
>> This user helper fallback will be enabled if one of LP55xx driver is
>> included in your dot config. Please see my patch below.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/leds?id=b67893206fc0a0e8af87130e67f3d8ae553fc87c
>>
>> However, I'm not sure why this affects your system lockup. Can I have
>> more details?
>
> Hi Milo,
> I'm not suggesting that your patch is the cause, just that it is an
> 'enabler' and explains why the problem (system lockup when I plug USB
> Bluetooth dongle in) appears now.
>
> A full Oops log is further back in this thread:
> http://www.spinics.net/lists/linux-bluetooth/msg63090.html
>
>>> Will try building 4.1 with this option to see if it fails.
>
> A very quick test as I was leaving the house this morning shows that 4.1
> with 'CONFIG_FW_LOADER_USER_HELPER_FALLBACK' does not show the problem.
>
> So at least we know the 'real' problem is a recent change to the code.
> Simon
>

I think this was reported and fixed

https://lkml.org/lkml/2015/7/8/858
https://lkml.org/lkml/2015/7/8/1199
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6f957724b94cb19f5c1c97efd01dd4df8ced323c

Thanks,
Laura


Re: Kernel Oops: btusb: 4.2rc1 System lockup with BT dongle insert - log attached


>> So in summary this problem is showing up now as the 'User Helper
>> Fallback'
>> is now forced on, obviously the underlying problem needs to be fixed -
>> but
>> I don't know when it crept in.
>>
>
> The 'CONFIG_FW_LOADER_USER_HELPER_FALLBACK' enables to load firmware
> data manually by accessing /sys/class/firmware/<name>/data. It runs in
> case the firmware file is missing.
> This user helper fallback will be enabled if one of LP55xx driver is
> included in your dot config. Please see my patch below.
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/leds?id=b67893206fc0a0e8af87130e67f3d8ae553fc87c
>
> However, I'm not sure why this affects your system lockup. Can I have
> more details?

Hi Milo,
I'm not suggesting that your patch is the cause, just that it is an
'enabler' and explains why the problem (system lockup when I plug USB
Bluetooth dongle in) appears now.

A full Oops log is further back in this thread:
http://www.spinics.net/lists/linux-bluetooth/msg63090.html


>> Will try building 4.1 with this option to see if it fails.

A very quick test as I was leaving the house this morning shows that 4.1
with 'CONFIG_FW_LOADER_USER_HELPER_FALLBACK' does not show the problem.

So at least we know the 'real' problem is a recent change to the code.
Simon


Re: Kernel Oops: btusb: 4.2rc1 System lockup with BT dongle insert - log attached


> It looks like the firmware 'opt_flags' must be different, so this may be a
> contributing factor.

The plot thickens: the kernel config has changed since I built 4.1.0rc7, but I
don't recall doing it or starting afresh.

/boot/config-4.1.0-rc7+
--
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
CONFIG_WANT_DEV_COREDUMP=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEV_COREDUMP=y
--

/boot/config-4.2.0-rc1+
--
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y   !!!
CONFIG_WANT_DEV_COREDUMP=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEV_COREDUMP=y
--


Has a kconfig forced a change? Grrr
--
$ git blame ./drivers/leds/Kconfig
--
c93d08fa7 (Milo(Woogyom) Kim 2013-02-05 18:01:23 +0900 228) config
LEDS_LP55XX_COMMON
33b3a561f (Kim, Milo 2013-07-09 02:11:37 -0700 229)
tristate Common Driver for TI/National LP5521/5523/55231/5562/8501
33b3a561f (Kim, Milo 2013-07-09 02:11:37 -0700 230)
depends on LEDS_LP5521 || LEDS_LP5523 || LEDS_LP5562 || LEDS_LP8501
10c06d178 (Milo(Woogyom) Kim 2013-02-05 19:17:20 +0900 231)
select FW_LOADER
b67893206 (Milo Kim  2015-06-28 17:39:14 -0700 232)
select FW_LOADER_USER_HELPER_FALLBACK
-
c93d08fa7 (Milo(Woogyom) Kim 2013-02-05 18:01:23 +0900 233) help
33b3a561f (Kim, Milo 2013-07-09 02:11:37 -0700 234)  
This option supports common operations for LP5521/5523/55231/5562/8501
c93d08fa7 (Milo(Woogyom) Kim 2013-02-05 18:01:23 +0900 235)  
devices.
--

So in summary this problem is showing up now as the 'User Helper Fallback'
is now forced on, obviously the underlying problem needs to be fixed - but
I don't know when it crept in.
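
A build-time option of this kind typically changes behaviour by altering the flags passed down a call path. The sketch below only illustrates that mechanism: the `DEMO_*` names and both helpers are hypothetical and are not taken from the kernel's firmware loader. Building with `-DDEMO_USER_HELPER_FALLBACK` models the option being forced on by a `select`.

```c
/* Hypothetical flag bits, for illustration only. */
#define DEMO_OPT_UEVENT      (1U << 0)
#define DEMO_OPT_USERHELPER  (1U << 2)

/* The fallback bit only exists when the option is enabled at build time. */
#ifdef DEMO_USER_HELPER_FALLBACK
#define DEMO_OPT_FALLBACK    DEMO_OPT_USERHELPER
#else
#define DEMO_OPT_FALLBACK    0U
#endif

/* Flags that an ordinary firmware request would carry in this model. */
static unsigned int demo_request_flags(void)
{
    return DEMO_OPT_UEVENT | DEMO_OPT_FALLBACK;
}

/* Non-zero when the request would go on to try the user helper. */
static int demo_uses_fallback(unsigned int opt_flags)
{
    return (opt_flags & DEMO_OPT_USERHELPER) != 0;
}
```

With the macro undefined, `demo_request_flags()` never carries the user helper bit, so a code path that only misbehaves in the fallback case stays hidden until some unrelated driver selects the option.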

Will try building 4.1 with this option to see if it fails.
Simon


Re: Kernel Oops: btusb: 4.2rc1 System lockup with BT dongle insert - log attached

2015-07-17 Thread Kim, Milo


Hi Simon,

On 7/17/2015 3:14 PM, si...@mungewell.org wrote:
>
>> It looks like the firmware 'opt_flags' must be different, so this may be a
>> contributing factor.
>
> The plot thickens: the kernel config has changed since I built 4.1.0rc7, but I
> don't recall doing it or starting afresh.
>
> /boot/config-4.1.0-rc7+
> --
> CONFIG_PREVENT_FIRMWARE_BUILD=y
> CONFIG_FW_LOADER=y
> CONFIG_FIRMWARE_IN_KERNEL=y
> CONFIG_EXTRA_FIRMWARE=""
> CONFIG_FW_LOADER_USER_HELPER=y
> # CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
> CONFIG_WANT_DEV_COREDUMP=y
> CONFIG_ALLOW_DEV_COREDUMP=y
> CONFIG_DEV_COREDUMP=y
> --
>
> /boot/config-4.2.0-rc1+
> --
> CONFIG_PREVENT_FIRMWARE_BUILD=y
> CONFIG_FW_LOADER=y
> CONFIG_FIRMWARE_IN_KERNEL=y
> CONFIG_EXTRA_FIRMWARE=""
> CONFIG_FW_LOADER_USER_HELPER=y
> CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y   !!!
> CONFIG_WANT_DEV_COREDUMP=y
> CONFIG_ALLOW_DEV_COREDUMP=y
> CONFIG_DEV_COREDUMP=y
> --
>
>
> Has a kconfig forced a change? Grrr
> --
> $ git blame ./drivers/leds/Kconfig
> --
> c93d08fa7 (Milo(Woogyom) Kim 2013-02-05 18:01:23 +0900 228) config
> LEDS_LP55XX_COMMON
> 33b3a561f (Kim, Milo 2013-07-09 02:11:37 -0700 229)
> tristate Common Driver for TI/National LP5521/5523/55231/5562/8501
> 33b3a561f (Kim, Milo 2013-07-09 02:11:37 -0700 230)
> depends on LEDS_LP5521 || LEDS_LP5523 || LEDS_LP5562 || LEDS_LP8501
> 10c06d178 (Milo(Woogyom) Kim 2013-02-05 19:17:20 +0900 231)
> select FW_LOADER
> b67893206 (Milo Kim  2015-06-28 17:39:14 -0700 232)
> select FW_LOADER_USER_HELPER_FALLBACK
> -
> c93d08fa7 (Milo(Woogyom) Kim 2013-02-05 18:01:23 +0900 233) help
> 33b3a561f (Kim, Milo 2013-07-09 02:11:37 -0700 234)
> This option supports common operations for LP5521/5523/55231/5562/8501
> c93d08fa7 (Milo(Woogyom) Kim 2013-02-05 18:01:23 +0900 235)
> devices.
> --
>
> So in summary this problem is showing up now as the 'User Helper Fallback'
> is now forced on, obviously the underlying problem needs to be fixed - but
> I don't know when it crept in.

The 'CONFIG_FW_LOADER_USER_HELPER_FALLBACK' enables to load firmware 
data manually by accessing /sys/class/firmware/<name>/data. It runs in
case the firmware file is missing.
This user helper fallback will be enabled if one of LP55xx driver is 
included in your dot config. Please see my patch below.
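
For illustration, the manual load described above is conventionally done by writing 1 to the request's `loading` attribute, writing the image to `data`, then writing 0 to `loading`. The sketch below assumes that layout; `write_attr()` and `fw_fallback_load()` are hypothetical helper names, and the request directory is passed in by the caller rather than discovered.

```c
#include <stdio.h>
#include <string.h>

/* Write len bytes to the file dir/name; returns 0 on success, -1 on error. */
static int write_attr(const char *dir, const char *name,
                      const void *buf, size_t len)
{
    char path[512];
    FILE *f;
    size_t n;
    int r;

    r = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (r < 0 || (size_t)r >= sizeof(path))
        return -1;
    f = fopen(path, "wb");
    if (!f)
        return -1;
    n = fwrite(buf, 1, len, f);
    if (fclose(f) != 0 || n != len)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: feed a firmware image through the user helper
 * fallback files found in dir (assumed to hold "loading" and "data").
 */
static int fw_fallback_load(const char *dir, const void *image, size_t len)
{
    if (write_attr(dir, "loading", "1", 1) != 0)    /* start the transfer */
        return -1;
    if (write_attr(dir, "data", image, len) != 0) {
        write_attr(dir, "loading", "-1", 2);        /* abort the request */
        return -1;
    }
    return write_attr(dir, "loading", "0", 1);      /* transfer complete */
}
```

A caller would pass the per-request directory under /sys/class/firmware/ together with the image bytes; the function returns 0 only if all three writes succeed.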


https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/leds?id=b67893206fc0a0e8af87130e67f3d8ae553fc87c

However, I'm not sure why this affects your system lockup. Can I have 
more details?


Best regards,
Milo

Re: Kernel Oops: btusb: 4.2rc1 System lockup with BT dongle insert - log attached