Re: [PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds
On 05/04/17 at 09:25am, Thomas Garnier wrote:
> > I think this needs a "Fixes:" tag and Cc:.

Sorry for late response, should I resend with them?

> Agreed.
>
> > Other than that:
> >
> > Reviewed-by: Dan Williams
>
> Thanks again!
>
> Reviewed-by: Thomas Garnier
> --
> Thomas
Re: [PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds
On Wed, May 3, 2017 at 7:35 PM, Dan Williams wrote: > On Wed, May 3, 2017 at 7:25 PM, Baoquan He wrote: >> Jeff Moyer reported that on his system with two memory regions 0~64G and >> 1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling kaslr >> will make system hang intermittently during boot. While adding 'nokaslr' >> won't. >> >> This is because the for loop count calculation in sync_global_pgds is >> not correct. When a mapping area crosses pgd entries, we should >> calculate the starting address of region which next pgd covers and assign >> it to next for loop count, but not add PGDIR_SIZE directly. The old >> code works right only if the mapping area is times of PGDIR_SIZE, >> otherwize the end region could be skipped so that it can't be synchronized >> to all other processes from kernel pgd init_mm.pgd. >> >> In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than >> PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it >> makes this area be mapped inside one pgd entry. With kaslr enabled, >> this area could cross two pgd entries, then the next pgd entry won't >> be synced to all other processes. That is why we saw empty PGD. >> >> Fix it in this patch. 
>> >> The back trace is pasted as below: >> >> [9.988867] IP: memcpy_erms+0x6/0x10 >> [9.988868] PGD 0 >> [9.988868] >> [9.988870] Oops: [#1] SMP >> [9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) >> syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) >> ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) >> nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E) >> i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) >> [9.96] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: GE >> 4.11.0-rc5+ #43 >> [9.97] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS >> SE5C600.86B.02.01.SP06.050920141054 05/09/2014 >> [9.98] task: 9267dc2f8000 task.stack: ba92c783c000 >> [9.988890] RIP: 0010:memcpy_erms+0x6/0x10 >> [9.988891] RSP: 0018:ba92c783f9b8 EFLAGS: 00010286 >> [9.988892] RAX: 925f19e27000 RBX: RCX: >> 1000 >> [9.988893] RDX: 1000 RSI: 9387bfff RDI: >> 925f19e27000 >> [9.988893] RBP: ba92c783fa38 R08: R09: >> 1780 >> [9.988894] R10: R11: 9387bfff R12: >> 925fde811ed8 >> [9.988895] R13: 002f R14: 1000 R15: >> 925f19e27000 >> [9.988896] FS: 7f1ee18e68c0() GS:925fdec0() >> knlGS: >> [9.988896] CS: 0010 DS: ES: CR0: 80050033 >> [9.988897] CR2: 9387bfff CR3: 00081ba28000 CR4: >> 001406f0 >> [9.988897] Call Trace: >> [9.988902] ? pmem_do_bvec+0x93/0x290 [nd_pmem] >> [9.988904] ? radix_tree_node_alloc.constprop.20+0x85/0xc0 >> [9.988905] ? radix_tree_node_alloc.constprop.20+0x85/0xc0 >> [9.988907] pmem_rw_page+0x3a/0x60 [nd_pmem] >> [9.988909] bdev_read_page+0x81/0xb0 >> [9.988911] do_mpage_readpage+0x56f/0x770 >> [9.988912] ? I_BDEV+0x20/0x20 >> [9.988915] ? lru_cache_add+0xe/0x10 >> [9.988917] mpage_readpages+0x148/0x1e0 >> [9.988917] ? I_BDEV+0x20/0x20 >> [9.988918] ? I_BDEV+0x20/0x20 >> [9.988921] ? 
alloc_pages_current+0x88/0x120 >> [9.988923] blkdev_readpages+0x1d/0x20 >> [9.988924] __do_page_cache_readahead+0x1ce/0x2c0 >> [9.988926] force_page_cache_readahead+0xa2/0x100 >> [9.988927] page_cache_sync_readahead+0x3f/0x50 >> [9.988930] generic_file_read_iter+0x60d/0x8c0 >> [9.988931] blkdev_read_iter+0x37/0x40 >> [9.988933] __vfs_read+0xe0/0x150 >> [9.988934] vfs_read+0x8c/0x130 >> [9.988936] SyS_read+0x55/0xc0 >> [9.988939] entry_SYSCALL_64_fastpath+0x1a/0xa9 >> [9.988940] RIP: 0033:0x7f1ee0822480 >> [9.988941] RSP: 002b:7ffcf9e741f8 EFLAGS: 0246 ORIG_RAX: >> >> [9.988942] RAX: ffda RBX: RCX: >> 7f1ee0822480 >> [9.988943] RDX: 0040 RSI: 561b7e1aabc8 RDI: >> 0008 >> [9.988943] RBP: 561b7e1a86a0 R08: 0005 R09: >> 0068 >> [9.988944] R10: 7ffcf9e73f80 R11: 0246 R12: >> >> [9.988945] R13: 0001 R14: 561b7e1a61b0 R15: >> 561b7e1a55e0 >> [9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 >> e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 >> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 >> [9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ba92c783f9b8 >> [9.988962] CR2: 9387bfff >> [9.989022] ---[ end trace fe34c0fc0fe685ab ]--- >> [9.998690] Kernel panic - not syncing: Fatal exception >> [
Re: [PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds
On Wed, May 3, 2017 at 7:25 PM, Baoquan He wrote: > Jeff Moyer reported that on his system with two memory regions 0~64G and > 1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling kaslr > will make system hang intermittently during boot. While adding 'nokaslr' > won't. > > This is because the for loop count calculation in sync_global_pgds is > not correct. When a mapping area crosses pgd entries, we should > calculate the starting address of region which next pgd covers and assign > it to next for loop count, but not add PGDIR_SIZE directly. The old > code works right only if the mapping area is times of PGDIR_SIZE, > otherwize the end region could be skipped so that it can't be synchronized > to all other processes from kernel pgd init_mm.pgd. > > In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than > PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it > makes this area be mapped inside one pgd entry. With kaslr enabled, > this area could cross two pgd entries, then the next pgd entry won't > be synced to all other processes. That is why we saw empty PGD. > > Fix it in this patch. 
> > The back trace is pasted as below: > > [9.988867] IP: memcpy_erms+0x6/0x10 > [9.988868] PGD 0 > [9.988868] > [9.988870] Oops: [#1] SMP > [9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) > syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) > ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) > nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E) > i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) > [9.96] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: GE > 4.11.0-rc5+ #43 > [9.97] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS > SE5C600.86B.02.01.SP06.050920141054 05/09/2014 > [9.98] task: 9267dc2f8000 task.stack: ba92c783c000 > [9.988890] RIP: 0010:memcpy_erms+0x6/0x10 > [9.988891] RSP: 0018:ba92c783f9b8 EFLAGS: 00010286 > [9.988892] RAX: 925f19e27000 RBX: RCX: > 1000 > [9.988893] RDX: 1000 RSI: 9387bfff RDI: > 925f19e27000 > [9.988893] RBP: ba92c783fa38 R08: R09: > 1780 > [9.988894] R10: R11: 9387bfff R12: > 925fde811ed8 > [9.988895] R13: 002f R14: 1000 R15: > 925f19e27000 > [9.988896] FS: 7f1ee18e68c0() GS:925fdec0() > knlGS: > [9.988896] CS: 0010 DS: ES: CR0: 80050033 > [9.988897] CR2: 9387bfff CR3: 00081ba28000 CR4: > 001406f0 > [9.988897] Call Trace: > [9.988902] ? pmem_do_bvec+0x93/0x290 [nd_pmem] > [9.988904] ? radix_tree_node_alloc.constprop.20+0x85/0xc0 > [9.988905] ? radix_tree_node_alloc.constprop.20+0x85/0xc0 > [9.988907] pmem_rw_page+0x3a/0x60 [nd_pmem] > [9.988909] bdev_read_page+0x81/0xb0 > [9.988911] do_mpage_readpage+0x56f/0x770 > [9.988912] ? I_BDEV+0x20/0x20 > [9.988915] ? lru_cache_add+0xe/0x10 > [9.988917] mpage_readpages+0x148/0x1e0 > [9.988917] ? I_BDEV+0x20/0x20 > [9.988918] ? I_BDEV+0x20/0x20 > [9.988921] ? 
alloc_pages_current+0x88/0x120 > [9.988923] blkdev_readpages+0x1d/0x20 > [9.988924] __do_page_cache_readahead+0x1ce/0x2c0 > [9.988926] force_page_cache_readahead+0xa2/0x100 > [9.988927] page_cache_sync_readahead+0x3f/0x50 > [9.988930] generic_file_read_iter+0x60d/0x8c0 > [9.988931] blkdev_read_iter+0x37/0x40 > [9.988933] __vfs_read+0xe0/0x150 > [9.988934] vfs_read+0x8c/0x130 > [9.988936] SyS_read+0x55/0xc0 > [9.988939] entry_SYSCALL_64_fastpath+0x1a/0xa9 > [9.988940] RIP: 0033:0x7f1ee0822480 > [9.988941] RSP: 002b:7ffcf9e741f8 EFLAGS: 0246 ORIG_RAX: > > [9.988942] RAX: ffda RBX: RCX: > 7f1ee0822480 > [9.988943] RDX: 0040 RSI: 561b7e1aabc8 RDI: > 0008 > [9.988943] RBP: 561b7e1a86a0 R08: 0005 R09: > 0068 > [9.988944] R10: 7ffcf9e73f80 R11: 0246 R12: > > [9.988945] R13: 0001 R14: 561b7e1a61b0 R15: > 561b7e1a55e0 > [9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 > 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 > a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 > [9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ba92c783f9b8 > [9.988962] CR2: 9387bfff > [9.989022] ---[ end trace fe34c0fc0fe685ab ]--- > [9.998690] Kernel panic - not syncing: Fatal exception > [ 10.004708] Kernel Offset: 0x1100 from 0x8100 (relocation > range: 0x8000-0xbfff) > > Reported-by: Jeff Moyer
[PATCH v3] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds
Jeff Moyer reported that on his system with two memory regions, 0~64G and
1T~1T+192G, and with the kernel option "memmap=192G!1024G" added, enabling
KASLR makes the system hang intermittently during boot, while adding
'nokaslr' does not.

This is because the for loop count calculation in sync_global_pgds() is
not correct. When a mapping area crosses pgd entries, we should calculate
the starting address of the region which the next pgd covers and assign it
to the next for loop count, rather than adding PGDIR_SIZE directly. The old
code works correctly only if the mapping area is a multiple of PGDIR_SIZE;
otherwise the end region could be skipped, so that it can't be synchronized
to all other processes from the kernel pgd init_mm.pgd.

On Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, which makes
this area be mapped inside one pgd entry. With KASLR enabled, this area
could cross two pgd entries, and then the next pgd entry won't be synced to
all other processes. That is why we saw an empty PGD.

Fix it in this patch.

The back trace is pasted as below:

[9.988867] IP: memcpy_erms+0x6/0x10
[9.988868] PGD 0
[9.988868]
[9.988870] Oops: [#1] SMP
[9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E) i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[9.96] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: GE 4.11.0-rc5+ #43
[9.97] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
[9.98] task: 9267dc2f8000 task.stack: ba92c783c000
[9.988890] RIP: 0010:memcpy_erms+0x6/0x10
[9.988891] RSP: 0018:ba92c783f9b8 EFLAGS: 00010286
[9.988892] RAX: 925f19e27000 RBX: RCX: 1000
[9.988893] RDX: 1000 RSI: 9387bfff RDI: 925f19e27000
[9.988893] RBP: ba92c783fa38 R08: R09: 1780
[9.988894] R10: R11: 9387bfff R12: 925fde811ed8
[9.988895] R13: 002f R14: 1000 R15: 925f19e27000
[9.988896] FS: 7f1ee18e68c0() GS:925fdec0() knlGS:
[9.988896] CS: 0010 DS: ES: CR0: 80050033
[9.988897] CR2: 9387bfff CR3: 00081ba28000 CR4: 001406f0
[9.988897] Call Trace:
[9.988902] ? pmem_do_bvec+0x93/0x290 [nd_pmem]
[9.988904] ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[9.988905] ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[9.988907] pmem_rw_page+0x3a/0x60 [nd_pmem]
[9.988909] bdev_read_page+0x81/0xb0
[9.988911] do_mpage_readpage+0x56f/0x770
[9.988912] ? I_BDEV+0x20/0x20
[9.988915] ? lru_cache_add+0xe/0x10
[9.988917] mpage_readpages+0x148/0x1e0
[9.988917] ? I_BDEV+0x20/0x20
[9.988918] ? I_BDEV+0x20/0x20
[9.988921] ? alloc_pages_current+0x88/0x120
[9.988923] blkdev_readpages+0x1d/0x20
[9.988924] __do_page_cache_readahead+0x1ce/0x2c0
[9.988926] force_page_cache_readahead+0xa2/0x100
[9.988927] page_cache_sync_readahead+0x3f/0x50
[9.988930] generic_file_read_iter+0x60d/0x8c0
[9.988931] blkdev_read_iter+0x37/0x40
[9.988933] __vfs_read+0xe0/0x150
[9.988934] vfs_read+0x8c/0x130
[9.988936] SyS_read+0x55/0xc0
[9.988939] entry_SYSCALL_64_fastpath+0x1a/0xa9
[9.988940] RIP: 0033:0x7f1ee0822480
[9.988941] RSP: 002b:7ffcf9e741f8 EFLAGS: 0246 ORIG_RAX:
[9.988942] RAX: ffda RBX: RCX: 7f1ee0822480
[9.988943] RDX: 0040 RSI: 561b7e1aabc8 RDI: 0008
[9.988943] RBP: 561b7e1a86a0 R08: 0005 R09: 0068
[9.988944] R10: 7ffcf9e73f80 R11: 0246 R12:
[9.988945] R13: 0001 R14: 561b7e1a61b0 R15: 561b7e1a55e0
[9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ba92c783f9b8
[9.988962] CR2: 9387bfff
[9.989022] ---[ end trace fe34c0fc0fe685ab ]---
[9.998690] Kernel panic - not syncing: Fatal exception
[ 10.004708] Kernel Offset: 0x1100 from 0x8100 (relocation range: 0x8000-0xbfff)

Reported-by: Jeff Moyer
Signed-off-by: Baoquan He
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: x...@kernel.org
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Andrew Morton
Cc: Yasuaki Ishimatsu
Cc: Jinbum Park
Cc: Dave Hansen
Cc: "Kirill A. Shutemov"
Cc: Yinghai
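
For readers following the thread, the loop-stepping problem described in the
commit message can be illustrated with a small userspace sketch. This is not
the kernel code or the patch itself: the helper names (align_up,
pgd_entries_spanned, visits_fixed_stride, visits_next_boundary) are made up
for illustration, the range is treated as inclusive, and PGDIR_SHIFT = 39
(512G per pgd entry, as with 4-level paging on x86-64) is an assumption. It
only counts how many pgd entries each stepping strategy would visit.

```c
#include <stdint.h>

/* Assumed 4-level x86-64 paging: one pgd entry covers 2^39 bytes (512G). */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)
#define GiB         (UINT64_C(1) << 30)

/* Round x up to the next multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Number of pgd entries that overlap the inclusive range [start, end]. */
static unsigned pgd_entries_spanned(uint64_t start, uint64_t end)
{
	return (unsigned)((end >> PGDIR_SHIFT) - (start >> PGDIR_SHIFT)) + 1;
}

/* Fixed stride: always advance by PGDIR_SIZE, even from an unaligned start. */
static unsigned visits_fixed_stride(uint64_t start, uint64_t end)
{
	unsigned visited = 0;

	for (uint64_t addr = start; addr <= end; addr += PGDIR_SIZE)
		visited++;
	return visited;
}

/*
 * Boundary stepping: advance to the start of the region that the next
 * pgd entry covers. Wraparound at the top of the address space is
 * ignored in this sketch.
 */
static unsigned visits_next_boundary(uint64_t start, uint64_t end)
{
	unsigned visited = 0;

	for (uint64_t addr = start; addr <= end;
	     addr = align_up(addr + 1, PGDIR_SIZE))
		visited++;
	return visited;
}
```

With a 192G area that begins 400G into a pgd entry (a hypothetical offset
standing in for a randomized base), the range [400G, 592G) spans two pgd
entries. The fixed stride jumps from 400G to 912G and visits only the first
entry, while boundary stepping moves to 512G and visits both. When the area
starts on a 512G boundary, as with [1024G, 1216G), both strategies visit the
single entry it needs.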