Re: [PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds

2017-05-07 Thread Baoquan He
On 05/04/17 at 09:25am, Thomas Garnier wrote:

> > I think this needs a "Fixes:" tag and Cc: .

Sorry for the late response. Should I resend with them?

> 
> Agreed.
> 
> >
> > Other than that:
> >
> > Reviewed-by: Dan Williams 
> 
> Thanks again!
> 
> Reviewed-by: Thomas Garnier 
> -- 
> Thomas


Re: [PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds

2017-05-04 Thread Thomas Garnier
On Wed, May 3, 2017 at 7:35 PM, Dan Williams  wrote:
> On Wed, May 3, 2017 at 7:25 PM, Baoquan He  wrote:
>> Jeff Moyer reported that on his system with two memory regions, 0~64G
>> and 1T~1T+192G, and the kernel option "memmap=192G!1024G" added,
>> enabling KASLR makes the system hang intermittently during boot, while
>> adding 'nokaslr' does not.
>>
>> This is because the for loop count calculation in sync_global_pgds is
>> incorrect. When a mapping area crosses pgd entries, the loop should
>> advance to the starting address of the region covered by the next pgd
>> entry, rather than add PGDIR_SIZE directly. The old code works correctly
>> only if the mapping area is a multiple of PGDIR_SIZE; otherwise the end
>> region could be skipped, so that it is never synchronized from the
>> kernel pgd, init_mm.pgd, to all other processes.
>>
>> In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
>> PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, which
>> keeps this area mapped within a single pgd entry. With KASLR enabled,
>> this area could cross two pgd entries, and then the second pgd entry is
>> not synced to all other processes. That is why we saw an empty PGD.
>>
>> Fix it in this patch.
>>
>> The backtrace is below:
>>
>> [9.988867] IP: memcpy_erms+0x6/0x10
>> [9.988868] PGD 0
>> [9.988868]
>> [9.988870] Oops:  [#1] SMP
>> [9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) 
>> syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) 
>> ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) 
>> nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E)
>> i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
>> [9.96] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: GE   4.11.0-rc5+ #43
>> [9.97] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
>> [9.98] task: 9267dc2f8000 task.stack: ba92c783c000
>> [9.988890] RIP: 0010:memcpy_erms+0x6/0x10
>> [9.988891] RSP: 0018:ba92c783f9b8 EFLAGS: 00010286
>> [9.988892] RAX: 925f19e27000 RBX:  RCX: 1000
>> [9.988893] RDX: 1000 RSI: 9387bfff RDI: 925f19e27000
>> [9.988893] RBP: ba92c783fa38 R08:  R09: 1780
>> [9.988894] R10:  R11: 9387bfff R12: 925fde811ed8
>> [9.988895] R13: 002f R14: 1000 R15: 925f19e27000
>> [9.988896] FS:  7f1ee18e68c0() GS:925fdec0() knlGS:
>> [9.988896] CS:  0010 DS:  ES:  CR0: 80050033
>> [9.988897] CR2: 9387bfff CR3: 00081ba28000 CR4: 001406f0
>> [9.988897] Call Trace:
>> [9.988902]  ? pmem_do_bvec+0x93/0x290 [nd_pmem]
>> [9.988904]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
>> [9.988905]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
>> [9.988907]  pmem_rw_page+0x3a/0x60 [nd_pmem]
>> [9.988909]  bdev_read_page+0x81/0xb0
>> [9.988911]  do_mpage_readpage+0x56f/0x770
>> [9.988912]  ? I_BDEV+0x20/0x20
>> [9.988915]  ? lru_cache_add+0xe/0x10
>> [9.988917]  mpage_readpages+0x148/0x1e0
>> [9.988917]  ? I_BDEV+0x20/0x20
>> [9.988918]  ? I_BDEV+0x20/0x20
>> [9.988921]  ? alloc_pages_current+0x88/0x120
>> [9.988923]  blkdev_readpages+0x1d/0x20
>> [9.988924]  __do_page_cache_readahead+0x1ce/0x2c0
>> [9.988926]  force_page_cache_readahead+0xa2/0x100
>> [9.988927]  page_cache_sync_readahead+0x3f/0x50
>> [9.988930]  generic_file_read_iter+0x60d/0x8c0
>> [9.988931]  blkdev_read_iter+0x37/0x40
>> [9.988933]  __vfs_read+0xe0/0x150
>> [9.988934]  vfs_read+0x8c/0x130
>> [9.988936]  SyS_read+0x55/0xc0
>> [9.988939]  entry_SYSCALL_64_fastpath+0x1a/0xa9
>> [9.988940] RIP: 0033:0x7f1ee0822480
>> [9.988941] RSP: 002b:7ffcf9e741f8 EFLAGS: 0246 ORIG_RAX:
>> [9.988942] RAX: ffda RBX:  RCX: 7f1ee0822480
>> [9.988943] RDX: 0040 RSI: 561b7e1aabc8 RDI: 0008
>> [9.988943] RBP: 561b7e1a86a0 R08: 0005 R09: 0068
>> [9.988944] R10: 7ffcf9e73f80 R11: 0246 R12:
>> [9.988945] R13: 0001 R14: 561b7e1a61b0 R15: 561b7e1a55e0
>> [9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1  a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>> [9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ba92c783f9b8
>> [9.988962] CR2: 9387bfff
>> [9.989022] ---[ end trace fe34c0fc0fe685ab ]---
>> [9.998690] Kernel panic - not syncing: Fatal exception

Re: [PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds

2017-05-03 Thread Dan Williams
On Wed, May 3, 2017 at 7:25 PM, Baoquan He  wrote:
> Jeff Moyer reported that on his system with two memory regions, 0~64G
> and 1T~1T+192G, and the kernel option "memmap=192G!1024G" added,
> enabling KASLR makes the system hang intermittently during boot, while
> adding 'nokaslr' does not.
>
> This is because the for loop count calculation in sync_global_pgds is
> incorrect. When a mapping area crosses pgd entries, the loop should
> advance to the starting address of the region covered by the next pgd
> entry, rather than add PGDIR_SIZE directly. The old code works correctly
> only if the mapping area is a multiple of PGDIR_SIZE; otherwise the end
> region could be skipped, so that it is never synchronized from the
> kernel pgd, init_mm.pgd, to all other processes.
>
> In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
> PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, which
> keeps this area mapped within a single pgd entry. With KASLR enabled,
> this area could cross two pgd entries, and then the second pgd entry is
> not synced to all other processes. That is why we saw an empty PGD.
>
> Fix it in this patch.
>
> The backtrace is below:
>
> [9.988867] IP: memcpy_erms+0x6/0x10
> [9.988868] PGD 0
> [9.988868]
> [9.988870] Oops:  [#1] SMP
> [9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) 
> syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) 
> ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) 
> nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E)
> i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> [9.96] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: GE   4.11.0-rc5+ #43
> [9.97] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
> [9.98] task: 9267dc2f8000 task.stack: ba92c783c000
> [9.988890] RIP: 0010:memcpy_erms+0x6/0x10
> [9.988891] RSP: 0018:ba92c783f9b8 EFLAGS: 00010286
> [9.988892] RAX: 925f19e27000 RBX:  RCX: 1000
> [9.988893] RDX: 1000 RSI: 9387bfff RDI: 925f19e27000
> [9.988893] RBP: ba92c783fa38 R08:  R09: 1780
> [9.988894] R10:  R11: 9387bfff R12: 925fde811ed8
> [9.988895] R13: 002f R14: 1000 R15: 925f19e27000
> [9.988896] FS:  7f1ee18e68c0() GS:925fdec0() knlGS:
> [9.988896] CS:  0010 DS:  ES:  CR0: 80050033
> [9.988897] CR2: 9387bfff CR3: 00081ba28000 CR4: 001406f0
> [9.988897] Call Trace:
> [9.988902]  ? pmem_do_bvec+0x93/0x290 [nd_pmem]
> [9.988904]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
> [9.988905]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
> [9.988907]  pmem_rw_page+0x3a/0x60 [nd_pmem]
> [9.988909]  bdev_read_page+0x81/0xb0
> [9.988911]  do_mpage_readpage+0x56f/0x770
> [9.988912]  ? I_BDEV+0x20/0x20
> [9.988915]  ? lru_cache_add+0xe/0x10
> [9.988917]  mpage_readpages+0x148/0x1e0
> [9.988917]  ? I_BDEV+0x20/0x20
> [9.988918]  ? I_BDEV+0x20/0x20
> [9.988921]  ? alloc_pages_current+0x88/0x120
> [9.988923]  blkdev_readpages+0x1d/0x20
> [9.988924]  __do_page_cache_readahead+0x1ce/0x2c0
> [9.988926]  force_page_cache_readahead+0xa2/0x100
> [9.988927]  page_cache_sync_readahead+0x3f/0x50
> [9.988930]  generic_file_read_iter+0x60d/0x8c0
> [9.988931]  blkdev_read_iter+0x37/0x40
> [9.988933]  __vfs_read+0xe0/0x150
> [9.988934]  vfs_read+0x8c/0x130
> [9.988936]  SyS_read+0x55/0xc0
> [9.988939]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [9.988940] RIP: 0033:0x7f1ee0822480
> [9.988941] RSP: 002b:7ffcf9e741f8 EFLAGS: 0246 ORIG_RAX:
> [9.988942] RAX: ffda RBX:  RCX: 7f1ee0822480
> [9.988943] RDX: 0040 RSI: 561b7e1aabc8 RDI: 0008
> [9.988943] RBP: 561b7e1a86a0 R08: 0005 R09: 0068
> [9.988944] R10: 7ffcf9e73f80 R11: 0246 R12:
> [9.988945] R13: 0001 R14: 561b7e1a61b0 R15: 561b7e1a55e0
> [9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1  a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> [9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ba92c783f9b8
> [9.988962] CR2: 9387bfff
> [9.989022] ---[ end trace fe34c0fc0fe685ab ]---
> [9.998690] Kernel panic - not syncing: Fatal exception
> [   10.004708] Kernel Offset: 0x1100 from 0x8100 (relocation range: 0x8000-0xbfff)
>
> Reported-by: Jeff Moyer 

[PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds

2017-05-03 Thread Baoquan He
Jeff Moyer reported that on his system with two memory regions, 0~64G
and 1T~1T+192G, and the kernel option "memmap=192G!1024G" added,
enabling KASLR makes the system hang intermittently during boot, while
adding 'nokaslr' does not.

This is because the for loop count calculation in sync_global_pgds is
incorrect. When a mapping area crosses pgd entries, the loop should
advance to the starting address of the region covered by the next pgd
entry, rather than add PGDIR_SIZE directly. The old code works correctly
only if the mapping area is a multiple of PGDIR_SIZE; otherwise the end
region could be skipped, so that it is never synchronized from the
kernel pgd, init_mm.pgd, to all other processes.
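To make the stepping problem concrete, here is a minimal user-space
sketch (not the code of this patch) that records which pgd indices each
loop style visits. It assumes x86-64 4-level paging, so PGDIR_SHIFT is
39 and one pgd entry covers 512G, and it assumes the loop runs while
address <= end with an inclusive end address; the loop itself is not
quoted in this message. visit_pgds_old() and visit_pgds_new() are
hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 4-level paging, where one PGD entry maps 512 GiB. */
#define GiB          (1ULL << 30)
#define PGDIR_SHIFT  39
#define PGDIR_SIZE   (1ULL << PGDIR_SHIFT)
#define PTRS_PER_PGD 512

static unsigned pgd_index_of(uint64_t addr)
{
	return (unsigned)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Old stepping (hypothetical helper): add PGDIR_SIZE to a possibly
 * unaligned start. Records each visited PGD index in seen[]; n < max
 * bounds the loop. 'end' is inclusive.
 */
static int visit_pgds_old(uint64_t start, uint64_t end, unsigned *seen, int max)
{
	int n = 0;

	assert(start <= end);
	for (uint64_t addr = start; addr <= end && n < max; addr += PGDIR_SIZE)
		seen[n++] = pgd_index_of(addr);
	return n;
}

/*
 * Fixed stepping (hypothetical helper): jump to the first address covered
 * by the next PGD entry, i.e. round up to the next PGDIR_SIZE boundary.
 */
static int visit_pgds_new(uint64_t start, uint64_t end, unsigned *seen, int max)
{
	int n = 0;
	uint64_t addr = start;

	assert(start <= end);
	while (addr <= end && n < max) {
		uint64_t next = (addr | (PGDIR_SIZE - 1)) + 1; /* next PGD boundary */

		seen[n++] = pgd_index_of(addr);
		if (next <= addr)	/* wrapped past the top of the address space */
			break;
		addr = next;
	}
	return n;
}
```

For a 192G area starting at 400G (illustrative values that cross the
512G boundary), visit_pgds_old() visits only pgd index 0, while
visit_pgds_new() visits indices 0 and 1. When the start is already on a
512G boundary, both styles visit the same entries.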

In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, which
keeps this area mapped within a single pgd entry. With KASLR enabled,
this area could cross two pgd entries, and then the second pgd entry is
not synced to all other processes. That is why we saw an empty PGD.
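The alignment argument can be checked with plain arithmetic. The sketch
below counts how many pgd entries the byte range
[base + 1024G, base + 1216G) touches, again assuming 4-level paging with
512G per pgd entry. A base of 0 stands in for a 1T-aligned PAGE_OFFSET
(for this count only the base modulo 512G matters), and the 400G shift
used for the KASLR case is a hypothetical value chosen for illustration,
not one taken from Jeff's report. pgd_entries_spanned() is likewise a
hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 4-level paging, where one PGD entry maps 512 GiB. */
#define GiB         (1ULL << 30)
#define PGDIR_SHIFT 39

/* Number of PGD entries touched by the byte range [start, start + size). */
static unsigned pgd_entries_spanned(uint64_t start, uint64_t size)
{
	uint64_t first, last;

	assert(size > 0);
	first = start >> PGDIR_SHIFT;            /* entry holding the first byte */
	last = (start + size - 1) >> PGDIR_SHIFT; /* entry holding the last byte */
	return (unsigned)(last - first + 1);
}
```

With the aligned base, [1024G, 1216G) stays inside the entry that starts
at 1024G, so the count is 1. With the hypothetical 400G shift, the range
becomes [1424G, 1616G), which straddles the 1536G boundary, so the count
is 2; this second entry is the one the old loop could skip.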

Fix it in this patch.

The backtrace is below:

[9.988867] IP: memcpy_erms+0x6/0x10
[9.988868] PGD 0
[9.988868]
[9.988870] Oops:  [#1] SMP
[9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) 
syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) ttm(E) 
libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) nd_pmem(E) dca(E) 
drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E)
i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[9.96] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: GE   4.11.0-rc5+ #43
[9.97] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
[9.98] task: 9267dc2f8000 task.stack: ba92c783c000
[9.988890] RIP: 0010:memcpy_erms+0x6/0x10
[9.988891] RSP: 0018:ba92c783f9b8 EFLAGS: 00010286
[9.988892] RAX: 925f19e27000 RBX:  RCX: 1000
[9.988893] RDX: 1000 RSI: 9387bfff RDI: 925f19e27000
[9.988893] RBP: ba92c783fa38 R08:  R09: 1780
[9.988894] R10:  R11: 9387bfff R12: 925fde811ed8
[9.988895] R13: 002f R14: 1000 R15: 925f19e27000
[9.988896] FS:  7f1ee18e68c0() GS:925fdec0() knlGS:
[9.988896] CS:  0010 DS:  ES:  CR0: 80050033
[9.988897] CR2: 9387bfff CR3: 00081ba28000 CR4: 001406f0
[9.988897] Call Trace:
[9.988902]  ? pmem_do_bvec+0x93/0x290 [nd_pmem]
[9.988904]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[9.988905]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[9.988907]  pmem_rw_page+0x3a/0x60 [nd_pmem]
[9.988909]  bdev_read_page+0x81/0xb0
[9.988911]  do_mpage_readpage+0x56f/0x770
[9.988912]  ? I_BDEV+0x20/0x20
[9.988915]  ? lru_cache_add+0xe/0x10
[9.988917]  mpage_readpages+0x148/0x1e0
[9.988917]  ? I_BDEV+0x20/0x20
[9.988918]  ? I_BDEV+0x20/0x20
[9.988921]  ? alloc_pages_current+0x88/0x120
[9.988923]  blkdev_readpages+0x1d/0x20
[9.988924]  __do_page_cache_readahead+0x1ce/0x2c0
[9.988926]  force_page_cache_readahead+0xa2/0x100
[9.988927]  page_cache_sync_readahead+0x3f/0x50
[9.988930]  generic_file_read_iter+0x60d/0x8c0
[9.988931]  blkdev_read_iter+0x37/0x40
[9.988933]  __vfs_read+0xe0/0x150
[9.988934]  vfs_read+0x8c/0x130
[9.988936]  SyS_read+0x55/0xc0
[9.988939]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[9.988940] RIP: 0033:0x7f1ee0822480
[9.988941] RSP: 002b:7ffcf9e741f8 EFLAGS: 0246 ORIG_RAX:
[9.988942] RAX: ffda RBX:  RCX: 7f1ee0822480
[9.988943] RDX: 0040 RSI: 561b7e1aabc8 RDI: 0008
[9.988943] RBP: 561b7e1a86a0 R08: 0005 R09: 0068
[9.988944] R10: 7ffcf9e73f80 R11: 0246 R12: 
[9.988945] R13: 0001 R14: 561b7e1a61b0 R15: 561b7e1a55e0
[9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1  a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ba92c783f9b8
[9.988962] CR2: 9387bfff
[9.989022] ---[ end trace fe34c0fc0fe685ab ]---
[9.998690] Kernel panic - not syncing: Fatal exception
[   10.004708] Kernel Offset: 0x1100 from 0x8100 (relocation range: 0x8000-0xbfff)

Reported-by: Jeff Moyer 
Signed-off-by: Baoquan He 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Andrew Morton 
Cc: Yasuaki Ishimatsu 
Cc: Jinbum Park 
Cc: Dave Hansen 
Cc: "Kirill A. Shutemov" 
Cc: Yinghai