Re: [PATCH v3] ARM: mm: support big-endian page tables
Hi Russell,

Could you please merge this to mainline?

Thanks!
Jianguo Wu.

On 2014/4/24 10:51, Jianguo Wu wrote:
> On 2014/4/23 21:20, Will Deacon wrote:
>> Hi Jianguo,
>>
>> On Thu, Apr 17, 2014 at 10:43:01AM +0100, Marc Zyngier wrote:
>>> On Thu, Apr 17 2014 at 10:31:37 am BST, Jianguo Wu
>>> <wujian...@huawei.com> wrote:
>>>> When LPAE and big-endian are enabled on a HiSilicon board, specifying
>>>> mem=384M mem=512M@7680M produces a bad page state:
>>>>
>>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>>> BUG: Bad page state in process init  pfn:fa442
>>>> page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
>>>> page flags: 0x4400(reserved)
>>>> Modules linked in:
>>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>>> [<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
>>>> [<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
>>>> [<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
>>>> [<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
>>>> [<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
>>>> [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
>>>> [<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
>>>> [<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
>>>> [<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)
>>
>> [...]
>>
>> Please can you put this into Russell's patch system? You can also add my
>> ack:
>>
>> Acked-by: Will Deacon <will.dea...@arm.com>
>>
>> You should also CC stable <sta...@vger.kernel.org> in the commit log.
>
> Hi Will,
> I have submitted it to
> http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8037/1.
>
> Thanks,
> Jianguo Wu.
>
>> Cheers,
>>
>> Will
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] ARM: mm: support big-endian page tables
On 2014/4/23 21:20, Will Deacon wrote:
> Hi Jianguo,
>
> On Thu, Apr 17, 2014 at 10:43:01AM +0100, Marc Zyngier wrote:
>> On Thu, Apr 17 2014 at 10:31:37 am BST, Jianguo Wu
>> <wujian...@huawei.com> wrote:
>>> When LPAE and big-endian are enabled on a HiSilicon board, specifying
>>> mem=384M mem=512M@7680M produces a bad page state:
>>>
>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>> BUG: Bad page state in process init  pfn:fa442
>>> page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
>>> page flags: 0x4400(reserved)
>>> Modules linked in:
>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>> [<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
>>> [<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
>>> [<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
>>> [<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
>>> [<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
>>> [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
>>> [<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
>>> [<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
>>> [<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)
>
> [...]
>
> Please can you put this into Russell's patch system? You can also add my
> ack:
>
> Acked-by: Will Deacon <will.dea...@arm.com>
>
> You should also CC stable <sta...@vger.kernel.org> in the commit log.

Hi Will,
I have submitted it to
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8037/1.

Thanks,
Jianguo Wu.

> Cheers,
>
> Will
[PATCH v3] ARM: mm: support big-endian page tables
When LPAE and big-endian are enabled on a HiSilicon board, specifying
mem=384M mem=512M@7680M produces a bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
page flags: 0x4400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory (mem=384M mem=512M@7680M). After
debugging, I found that in the page fault handler, the pfn read back from
the pte just after setting it is wrong, as follows:

do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));

	//read out the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
	...
}

pfn:     0x1fa4f5, pte:0xc1fa4f575f
new_pfn: 0xfa4f5,  new_pte:0xc0fa4f5f5f	//new pfn/pte is wrong.

The bug happens in cpu_v7_set_pte_ext(ptep, pte):

An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the
r2 and r3 registers. On an LE kernel, r2 contains the LSB of the PTE,
and r3 the MSB. On a BE kernel, the assignment is reversed.

Unfortunately, the current code always assumes the LE case, leading to
corruption of the PTE when clearing/setting bits.
This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.

Signed-off-by: Jianguo Wu <wujian...@huawei.com>
Cc: <sta...@vger.kernel.org>
---
-v2: Refactoring code suggested by Ben Dooks.
-v3: Rewrite commit message suggested by Marc Zyngier.
---
 arch/arm/mm/proc-v7-3level.S | 18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 01a719e..22e3ad6 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
 	mov	pc, lr
 ENDPROC(cpu_v7_switch_mm)
 
+#ifdef __ARMEB__
+#define rl r3
+#define rh r2
+#else
+#define rl r2
+#define rh r3
+#endif
+
 /*
  *	cpu_v7_set_pte_ext(ptep, pte)
  *
@@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
-	tst	r2, #L_PTE_VALID
+	tst	rl, #L_PTE_VALID
 	beq	1f
-	tst	r3, #1 << (57 - 32)		@ L_PTE_NONE
-	bicne	r2, #L_PTE_VALID
+	tst	rh, #1 << (57 - 32)		@ L_PTE_NONE
+	bicne	rl, #L_PTE_VALID
 	bne	1f
-	tst	r3, #1 << (55 - 32)		@ L_PTE_DIRTY
-	orreq	r2, #L_PTE_RDONLY
+	tst	rh, #1 << (55 - 32)		@ L_PTE_DIRTY
+	orreq	rl, #L_PTE_RDONLY
 1:	strd	r2, r3, [r0]
 	ALT_SMP(W(nop))
 	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
--
1.7.1
Re: [PATCH v2] ARM: mm: support big-endian page tables
On 2014/4/16 20:28, Marc Zyngier wrote:
> On 16/04/14 03:45, Jianguo Wu wrote:
>> On 2014/4/14 19:14, Marc Zyngier wrote:
>>> On 14/04/14 11:43, Will Deacon wrote:
>>>> (catching up on old email)
>>>>
>>>> On Tue, Mar 18, 2014 at 07:35:59AM +0000, Jianguo Wu wrote:
>>>>> Could you please take a look at this?
>>>>
>>>> [...]
>>>>
>>>>> On 2014/2/17 15:05, Jianguo Wu wrote:
>>>>>> When LPAE and big-endian are enabled on a HiSilicon board, specifying
>>>>>> mem=384M mem=512M@7680M produces a bad page state:
>>>>>>
>>>>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>>>>> BUG: Bad page state in process init  pfn:fa442
>>>>>> page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
>>>>>> page flags: 0x4400(reserved)
>>>>>> Modules linked in:
>>>>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>>>>> [<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
>>>>>> [<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
>>>>>> [<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
>>>>>> [<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
>>>>>> [<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
>>>>>> [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
>>>>>> [<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
>>>>>> [<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
>>>>>> [<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)
>>>>
>>>> [...]
>>>>
>>>>>> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
>>>>>> when pte is 64-bit: for little-endian, the low 32 bits are stored in
>>>>>> r2 and the high 32 bits in r3; for big-endian, the low 32 bits are
>>>>>> stored in r3 and the high 32 bits in r2. This causes a wrong pfn to
>>>>>> be stored in the pte, so we should exchange r2 and r3 for big-endian.
>>
>> Hi Marc,
>> How about this:
>>
>> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
>> - It tests L_PTE_NONE in one word, and possibly clears L_PTE_VALID in
>>   the other:
>> 	tst	r3, #1 << (57 - 32)		@ L_PTE_NONE
>> 	bicne	r2, #L_PTE_VALID
>> - Same for L_PTE_DIRTY, respectively setting L_PTE_RDONLY.
>>
>> With LPAE, the pte is 64 bits, and the contents of r2/r3 depend on the
>> endianness: for little-endian, the low 32 bits are stored in r2 and the
>> high 32 bits in r3; for big-endian, the low 32 bits are stored in r3 and
>> the high 32 bits in r2. So the wrong bit is cleared or set, and we get a
>> wrong pfn. We should therefore exchange r2 and r3 for big-endian.
>
> May I suggest the following instead:
>
> "An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the
> r2 and r3 registers.
> On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
> On a BE kernel, the assignment is reversed.
>
> Unfortunately, the current code always assumes the LE case,
> leading to corruption of the PTE when clearing/setting bits.
>
> This patch fixes this issue much like it has been done already in the
> cpu_v7_switch_mm case."

OK, I will send a new version, thanks!

> Cheers,
>
> M.
Re: [PATCH v2] ARM: mm: support big-endian page tables
On 2014/4/14 19:14, Marc Zyngier wrote:
> On 14/04/14 11:43, Will Deacon wrote:
>> (catching up on old email)
>>
>> On Tue, Mar 18, 2014 at 07:35:59AM +0000, Jianguo Wu wrote:
>>> Could you please take a look at this?
>>
>> [...]
>>
>>> On 2014/2/17 15:05, Jianguo Wu wrote:
>>>> When LPAE and big-endian are enabled on a HiSilicon board, specifying
>>>> mem=384M mem=512M@7680M produces a bad page state:
>>>>
>>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>>> BUG: Bad page state in process init  pfn:fa442
>>>> page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
>>>> page flags: 0x4400(reserved)
>>>> Modules linked in:
>>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>>> [<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
>>>> [<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
>>>> [<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
>>>> [<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
>>>> [<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
>>>> [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
>>>> [<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
>>>> [<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
>>>> [<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)
>>
>> [...]
>>
>>>> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
>>>> when pte is 64-bit: for little-endian, the low 32 bits are stored in
>>>> r2 and the high 32 bits in r3; for big-endian, the low 32 bits are
>>>> stored in r3 and the high 32 bits in r2. This causes a wrong pfn to
>>>> be stored in the pte, so we should exchange r2 and r3 for big-endian.

Hi Marc,
How about this:

The bug happens in cpu_v7_set_pte_ext(ptep, pte):
- It tests L_PTE_NONE in one word, and possibly clears L_PTE_VALID in
  the other:
	tst	r3, #1 << (57 - 32)		@ L_PTE_NONE
	bicne	r2, #L_PTE_VALID
- Same for L_PTE_DIRTY, respectively setting L_PTE_RDONLY.

With LPAE, the pte is 64 bits, and the contents of r2/r3 depend on the
endianness: for little-endian, the low 32 bits are stored in r2 and the
high 32 bits in r3; for big-endian, the low 32 bits are stored in r3 and
the high 32 bits in r2. So the wrong bit is cleared or set, and we get a
wrong pfn. We should therefore exchange r2 and r3 for big-endian.

Thanks,
Jianguo Wu.

>> I believe that Marc (added to CC) has been running LPAE-enabled, big-endian
>> KVM guests without any issues, so it seems unlikely that we're storing the
>> PTEs backwards. Can you check the configuration of SCTLR.EE?
>
> So, for the record:
>
> root@when-the-lie-s-so-big:~# cat /proc/cpuinfo
> processor	: 0
> model name	: ARMv7 Processor rev 4 (v7b)
> Features	: swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
> CPU implementer	: 0x41
> CPU architecture: 7
> CPU variant	: 0x0
> CPU part	: 0xc07
> CPU revision	: 4
>
> processor	: 1
> model name	: ARMv7 Processor rev 4 (v7b)
> Features	: swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
> CPU implementer	: 0x41
> CPU architecture: 7
> CPU variant	: 0x0
> CPU part	: 0xc07
> CPU revision	: 4
>
> Hardware	: Dummy Virtual Machine
> Revision	:
> Serial	:
>
> root@when-the-lie-s-so-big:~# uname -a
> Linux when-the-lie-s-so-big 3.14.0+ #2465 SMP PREEMPT Tue Apr 8 13:05:11 BST 2014 armv7b GNU/Linux
>
> Now, looking at the patch, I think it makes some sense:
> - Depending on the endianness, we have to test L_PTE_NONE in one word,
>   and possibly clear L_PTE_VALID in the other
> - Same for L_PTE_DIRTY, respectively setting L_PTE_RDONLY
>
> The commit message looks wrong though, as it mentions the PTE storage in
> memory (which looks completely fine to me, and explains why I was able to
> boot a guest). As none of my guest RAM is above 4GB IPA, I didn't see
> the corruption of bit 32 in the PTE (which should have been bit 0,
> corresponding to L_PTE_VALID).
>
> So, provided that the commit message is rewritten to match what it does,
> I'm fine with that patch.
>
> Thanks,
>
> M.
Re: [PATCH 3.4 93/99] iwlwifi: always copy first 16 bytes of commands
On 2014/3/25 17:29, Andreas Sturmlechner wrote:
> Original Message from: Ben Hutchings <b...@decadent.org.uk>
>>
>> One piece of my backport to 3.2.y went missing in the forward-port to
>> 3.4.y. Can you test 3.4.83 with this patch on top?
>>
>> Ben.
>
> iwlwifi works with the additional patch, thanks :)

Sorry for the missing part, thanks, Ben.
Re: [PATCH v2] ARM: mm: support big-endian page tables
Hi Russell,
Could you please take a look at this? Thanks!

On 2014/2/17 15:05, Jianguo Wu wrote:
> When LPAE and big-endian are enabled on a HiSilicon board, specifying
> mem=384M mem=512M@7680M produces a bad page state:
>
> Freeing unused kernel memory: 180K (c0466000 - c0493000)
> BUG: Bad page state in process init  pfn:fa442
> page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
> page flags: 0x4400(reserved)
> Modules linked in:
> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
> [<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
> [<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
> [<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
> [<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
> [<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
> [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
> [<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
> [<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
> [<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)
>
> The bad pfn:fa442 is not system memory (mem=384M mem=512M@7680M). After
> debugging, I found that in the page fault handler, the pfn read back from
> the pte just after setting it is wrong, as follows:
>
> do_anonymous_page()
> {
> 	...
> 	set_pte_at(mm, address, page_table, entry);
>
> 	//debug code
> 	pfn = pte_pfn(entry);
> 	pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));
>
> 	//read out the pte just set
> 	new_pte = pte_offset_map(pmd, address);
> 	new_pfn = pte_pfn(*new_pte);
> 	pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
> 	...
> }
>
> pfn:     0x1fa4f5, pte:0xc1fa4f575f
> new_pfn: 0xfa4f5,  new_pte:0xc0fa4f5f5f	//new pfn/pte is wrong.
>
> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
> when pte is 64-bit: for little-endian, the low 32 bits are stored in r2
> and the high 32 bits in r3; for big-endian, the low 32 bits are stored
> in r3 and the high 32 bits in r2. This causes a wrong pfn to be stored
> in the pte, so we should exchange r2 and r3 for big-endian.
>
> Signed-off-by: Jianguo Wu <wujian...@huawei.com>
> ---
>  arch/arm/mm/proc-v7-3level.S | 18 +++++++++++++-----
>  1 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 01a719e..22e3ad6 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
>  	mov	pc, lr
>  ENDPROC(cpu_v7_switch_mm)
>  
> +#ifdef __ARMEB__
> +#define rl r3
> +#define rh r2
> +#else
> +#define rl r2
> +#define rh r3
> +#endif
> +
>  /*
>   *	cpu_v7_set_pte_ext(ptep, pte)
>   *
> @@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
>   */
>  ENTRY(cpu_v7_set_pte_ext)
>  #ifdef CONFIG_MMU
> -	tst	r2, #L_PTE_VALID
> +	tst	rl, #L_PTE_VALID
>  	beq	1f
> -	tst	r3, #1 << (57 - 32)		@ L_PTE_NONE
> -	bicne	r2, #L_PTE_VALID
> +	tst	rh, #1 << (57 - 32)		@ L_PTE_NONE
> +	bicne	rl, #L_PTE_VALID
>  	bne	1f
> -	tst	r3, #1 << (55 - 32)		@ L_PTE_DIRTY
> -	orreq	r2, #L_PTE_RDONLY
> +	tst	rh, #1 << (55 - 32)		@ L_PTE_DIRTY
> +	orreq	rl, #L_PTE_RDONLY
> 1:	strd	r2, r3, [r0]
>  	ALT_SMP(W(nop))
>  	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
Re: [patch 00/11] userspace out of memory handling
On 2014/3/6 10:52, David Rientjes wrote:
> On Wed, 5 Mar 2014, Andrew Morton wrote:
>
>>> This patchset introduces a standard interface through memcg that allows
>>> both of these conditions to be handled in the same clean way: users
>>> define memory.oom_reserve_in_bytes to define the reserve, and this
>>> amount is allowed to be overcharged to the memcg of the process handling
>>> the oom condition. If used with the root memcg, this amount is allowed
>>> to be allocated below the per-zone watermarks for root processes that
>>> are handling such conditions (only root may write to
>>> cgroup.event_control for the root memcg).
>>
>> If process A is trying to allocate memory, cannot do so and the
>> userspace oom-killer is invoked, there must be means via which process
>> A waits for the userspace oom-killer's action.
>
> It does so by relooping in the page allocator, waiting for memory to be
> freed, just as it would if the kernel oom killer were called and process
> A were waiting for the oom kill victim process B to exit; we don't have
> the ability to put it on a waitqueue because we don't touch the freeing
> hotpath. The userspace oom handler may not even necessarily kill
> anything; it may be able to free its own memory and start throttling
> other processes, for example.
>
>> And there must be fallbacks which occur if the userspace oom killer
>> fails to clear the oom condition, or times out.
>
> I agree completely, and proposed this before as memory.oom_delay_millisecs
> at http://lwn.net/Articles/432226, which we use internally when memory
> can't be freed or a memcg's limit cannot be expanded. I guess it makes
> more sense alongside the rest of this patchset now; I can add it as an
> additional patch next time around.
>
>> Would be interested to see a description of how all this works.
>
> There's an article for LWN also being developed on this topic. As
> mentioned in that article, I think it would be best to generalize a lot
> of the common functions and the eventfd handling entirely into a library.
> I've attached an example implementation that just invokes a function to
> handle the situation.
>
> For Google's usecase specifically, at the root memcg level (system oom) we
> want to do priority-based memcg killing. We want to kill from within the
> memcg hierarchy that has the lowest priority relative to other memcgs.
> This cannot be implemented with /proc/pid/oom_score_adj today. Those
> priorities may also change depending on whether a memcg hierarchy is
> "overlimit", i.e. its limit has been increased temporarily because it has
> hit a memcg oom and additional memory is readily available on the system.
>
> So why not just introduce a memcg tunable that specifies a priority?
> Well, it's not that simple. Other users will want to implement different
> policies on system oom (think about things like the existing panic_on_oom
> or oom_kill_allocating_task sysctls). I introduced oom_kill_allocating_task
> originally for SGI because they wanted a fast oom kill rather than an
> expensive tasklist scan: the allocating task itself is rather irrelevant,
> it was just the unlucky task that happened to be allocating at the moment
> oom was triggered. What's guaranteed is that current in that case will
> always free memory from under oom (it's not a member of some other
> mempolicy or cpuset that would be needlessly killed). Both sysctls could
> trivially be reimplemented in userspace with this feature.
>
> I have other customers who don't run in a memcg environment at all; they
> simply reattach all processes to root and delete all other memcgs. These
> customers are only concerned about system oom conditions and want to do
> something "interesting" before a process is killed. Some want to log the
> VM statistics as an artifact to examine later, some want to examine heap
> profiles, others can start throttling and freeing memory rather than kill
> anything. All of this is impossible today because the kernel oom killer
> will simply kill something immediately, and any stats we collect afterwards
> don't represent the oom condition. The heap profiles are lost, throttling
> is useless, etc.
>
> Jianguo (cc'd) may also have usecases not described here.
>
I want to log memory usage, like slabinfo, vmalloc info, page-cache info,
etc. before killing anything.

>> It is unfortunate that this feature is memcg-only. Surely it could
>> also be used by non-memcg setups. Would like to see at least a
>> detailed description of how this will all be presented and implemented.
>> We should aim to make the memcg and non-memcg userspace interfaces and
>> user-visible behaviour as similar as possible.
>
> It's memcg-only because it can handle both system and memcg oom conditions
> with the same clean interface. It would be possible to implement only
> system oom condition handling through procfs (a little sloppy since it
> needs to register the eventfd), but then a userspace oom handler would
> need to determine which interface to use based on whether it was running
> in a memcg or
[PATCH v2] ARM: mm: support big-endian page tables
With LPAE and big-endian enabled on a HiSilicon board, specifying
mem=384M mem=512M@7680M produces a bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
page flags: 0x4400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[] (unwind_backtrace+0x0/0x11c) from [] (show_stack+0x10/0x14)
[] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
[] (bad_page+0xd4/0x104) from [] (free_pages_prepare+0xa8/0x14c)
[] (free_pages_prepare+0xa8/0x14c) from [] (free_hot_cold_page+0x18/0xf0)
[] (free_hot_cold_page+0x18/0xf0) from [] (handle_pte_fault+0xcf4/0xdc8)
[] (handle_pte_fault+0xcf4/0xdc8) from [] (handle_mm_fault+0xf4/0x120)
[] (handle_mm_fault+0xf4/0x120) from [] (do_page_fault+0xfc/0x354)
[] (do_page_fault+0xfc/0x354) from [] (do_DataAbort+0x2c/0x90)
[] (do_DataAbort+0x2c/0x90) from [] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not in system memory (mem=384M mem=512M@7680M).
After debugging, I found that in the page fault handler, a wrong pfn is
read back from the pte immediately after setting it, as follows:

do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));

	//read back the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
	...
}

pfn: 0x1fa4f5, pte:0xc1fa4f575f
new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f	//new pfn/pte is wrong.

The bug occurs in cpu_v7_set_pte_ext(ptep, pte): when the pte is 64-bit,
little-endian passes the low 32 bits in r2 and the high 32 bits in r3,
while big-endian passes the low 32 bits in r3 and the high 32 bits in r2.
This causes a wrong pfn to be stored in the pte, so r2 and r3 must be
swapped for big-endian.

Signed-off-by: Jianguo Wu
---
arch/arm/mm/proc-v7-3level.S | 18 +-
1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 01a719e..22e3ad6 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
 	mov	pc, lr
 ENDPROC(cpu_v7_switch_mm)

+#ifdef __ARMEB__
+#define rl r3
+#define rh r2
+#else
+#define rl r2
+#define rh r3
+#endif
+
 /*
  *	cpu_v7_set_pte_ext(ptep, pte)
  *
@@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
-	tst	r2, #L_PTE_VALID
+	tst	rl, #L_PTE_VALID
 	beq	1f
-	tst	r3, #1 << (57 - 32)	@ L_PTE_NONE
-	bicne	r2, #L_PTE_VALID
+	tst	rh, #1 << (57 - 32)	@ L_PTE_NONE
+	bicne	rl, #L_PTE_VALID
 	bne	1f
-	tst	r3, #1 << (55 - 32)	@ L_PTE_DIRTY
-	orreq	r2, #L_PTE_RDONLY
+	tst	rh, #1 << (55 - 32)	@ L_PTE_DIRTY
+	orreq	rl, #L_PTE_RDONLY
 1:	strd	r2, r3, [r0]
 	ALT_SMP(W(nop))
 	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)	@ flush_pte
--
1.7.1
Re: [PATCH] ARM: mm: support big-endian page tables
Ping...

On 2014/2/12 14:54, Jianguo Wu wrote:
> On 2014/2/11 18:40, Ben Dooks wrote:
>
>> On 11/02/14 09:20, Jianguo Wu wrote:
>>> With LPAE and big-endian enabled on a HiSilicon board, specifying
>>> mem=384M mem=512M@7680M produces a bad page state:
>>>
>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>> BUG: Bad page state in process init pfn:fa442
>>> page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0
>>> page flags: 0x4400(reserved)
>>> Modules linked in:
>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>> [] (unwind_backtrace+0x0/0x11c) from [] (show_stack+0x10/0x14)
>>> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
>>> [] (bad_page+0xd4/0x104) from [] (free_pages_prepare+0xa8/0x14c)
>>> [] (free_pages_prepare+0xa8/0x14c) from [] (free_hot_cold_page+0x18/0xf0)
>>> [] (free_hot_cold_page+0x18/0xf0) from [] (handle_pte_fault+0xcf4/0xdc8)
>>> [] (handle_pte_fault+0xcf4/0xdc8) from [] (handle_mm_fault+0xf4/0x120)
>>> [] (handle_mm_fault+0xf4/0x120) from [] (do_page_fault+0xfc/0x354)
>>> [] (do_page_fault+0xfc/0x354) from [] (do_DataAbort+0x2c/0x90)
>>> [] (do_DataAbort+0x2c/0x90) from [] (__dabt_usr+0x34/0x40)
>>>
>>> The bad pfn:fa442 is not in system memory (mem=384M mem=512M@7680M).
>>> After debugging, I found that in the page fault handler, a wrong pfn is
>>> read back from the pte immediately after setting it, as follows:
>>>
>>> do_anonymous_page()
>>> {
>>> 	...
>>> 	set_pte_at(mm, address, page_table, entry);
>>>
>>> 	//debug code
>>> 	pfn = pte_pfn(entry);
>>> 	pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));
>>>
>>> 	//read back the pte just set
>>> 	new_pte = pte_offset_map(pmd, address);
>>> 	new_pfn = pte_pfn(*new_pte);
>>> 	pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
>>> 	...
>>> }
>>
>> Thanks, must have missed tickling this one.
>>
>>> pfn: 0x1fa4f5, pte:0xc1fa4f575f
>>> new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f	//new pfn/pte is wrong.
>>>
>>> The bug occurs in cpu_v7_set_pte_ext(ptep, pte): when the pte is 64-bit,
>>> little-endian passes the low 32 bits in r2 and the high 32 bits in r3,
>>> while big-endian passes the low 32 bits in r3 and the high 32 bits in r2.
>>> This causes a wrong pfn to be stored in the pte, so r2 and r3 must be
>>> swapped for big-endian.
>>>
>>> Signed-off-by: Jianguo Wu
>>> ---
>>> arch/arm/mm/proc-v7-3level.S | 10 ++
>>> 1 files changed, 10 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>>> index 6ba4bd9..71b3892 100644
>>> --- a/arch/arm/mm/proc-v7-3level.S
>>> +++ b/arch/arm/mm/proc-v7-3level.S
>>> @@ -65,6 +65,15 @@ ENDPROC(cpu_v7_switch_mm)
>>>   */
>>>  ENTRY(cpu_v7_set_pte_ext)
>>>  #ifdef CONFIG_MMU
>>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>>> +	tst	r3, #L_PTE_VALID
>>> +	beq	1f
>>> +	tst	r2, #1 << (57 - 32)	@ L_PTE_NONE
>>> +	bicne	r3, #L_PTE_VALID
>>> +	bne	1f
>>> +	tst	r2, #1 << (55 - 32)	@ L_PTE_DIRTY
>>> +	orreq	r3, #L_PTE_RDONLY
>>> +#else
>>>  	tst	r2, #L_PTE_VALID
>>>  	beq	1f
>>>  	tst	r3, #1 << (57 - 32)	@ L_PTE_NONE
>>> @@ -72,6 +81,7 @@ ENTRY(cpu_v7_set_pte_ext)
>>>  	bne	1f
>>>  	tst	r3, #1 << (55 - 32)	@ L_PTE_DIRTY
>>>  	orreq	r2, #L_PTE_RDONLY
>>> +#endif
>>>  1:	strd	r2, r3, [r0]
>>>  	ALT_SMP(W(nop))
>>>  	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)	@ flush_pte
>>> --
>>> 1.7.1
>>
>> If possible can we avoid large #ifdef blocks here?
>>
>> Two ideas are
>>
>> ARM_LE(tst r2, #L_PTE_VALID)
>> ARM_BE(tst r3, #L_PTE_VALID)
>>
>> or change the r2, r3 pair to, say, rlow, rhi and
>>
>> #ifdef CONFIG_CPU_ENDIAN_BE8
>> #define rlow r3
>> #define rhi r2
>> #else
>> #define rlow r2
>> #define rhi r3
>> #endif
>
> Hi Ben,
> Thanks for your suggestion, how about this?
>
> Signed-off-by: Jianguo Wu
> ---
> arch/ar
Re: [PATCH] ARM: mm: support big-endian page tables
On 2014/2/11 18:40, Ben Dooks wrote: > On 11/02/14 09:20, Jianguo Wu wrote: >> When enable LPAE and big-endian in a hisilicon board, while specify >> mem=384M mem=512M@7680M, will get bad page state: >> >> Freeing unused kernel memory: 180K (c0466000 - c0493000) >> BUG: Bad page state in process init pfn:fa442 >> page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0 >> page flags: 0x4400(reserved) >> Modules linked in: >> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66 >> [] (unwind_backtrace+0x0/0x11c) from [] >> (show_stack+0x10/0x14) >> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104) >> [] (bad_page+0xd4/0x104) from [] >> (free_pages_prepare+0xa8/0x14c) >> [] (free_pages_prepare+0xa8/0x14c) from [] >> (free_hot_cold_page+0x18/0xf0) >> [] (free_hot_cold_page+0x18/0xf0) from [] >> (handle_pte_fault+0xcf4/0xdc8) >> [] (handle_pte_fault+0xcf4/0xdc8) from [] >> (handle_mm_fault+0xf4/0x120) >> [] (handle_mm_fault+0xf4/0x120) from [] >> (do_page_fault+0xfc/0x354) >> [] (do_page_fault+0xfc/0x354) from [] >> (do_DataAbort+0x2c/0x90) >> [] (do_DataAbort+0x2c/0x90) from [] >> (__dabt_usr+0x34/0x40) >> >> The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after >> debugging, >> I find in page fault handler, will get wrong pfn from pte just after set pte, >> as follow: >> do_anonymous_page() >> { >> ... >> set_pte_at(mm, address, page_table, entry); >> >> //debug code >> pfn = pte_pfn(entry); >> pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry)); >> >> //read out the pte just set >> new_pte = pte_offset_map(pmd, address); >> new_pfn = pte_pfn(*new_pte); >> pr_info("new pfn:0x%lx, new pte:0x%llx\n", pfn, pte_val(entry)); >> ... >> } > > Thanks, must have missed tickling this one. > >> >> pfn: 0x1fa4f5, pte:0xc1fa4f575f >> new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f//new pfn/pte is wrong. 
>> >> The bug is happened in cpu_v7_set_pte_ext(ptep, pte): >> when pte is 64-bit, for little-endian, will store low 32-bit in r2, >> high 32-bit in r3; for big-endian, will store low 32-bit in r3, >> high 32-bit in r2, this will cause wrong pfn stored in pte, >> so we should exchange r2 and r3 for big-endian. >> >> Signed-off-by: Jianguo Wu >> --- >> arch/arm/mm/proc-v7-3level.S | 10 ++ >> 1 files changed, 10 insertions(+), 0 deletions(-) >> >> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S >> index 6ba4bd9..71b3892 100644 >> --- a/arch/arm/mm/proc-v7-3level.S >> +++ b/arch/arm/mm/proc-v7-3level.S >> @@ -65,6 +65,15 @@ ENDPROC(cpu_v7_switch_mm) >>*/ >> ENTRY(cpu_v7_set_pte_ext) >> #ifdef CONFIG_MMU >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> +tstr3, #L_PTE_VALID >> +beq1f >> +tstr2, #1 << (57 - 32)@ L_PTE_NONE >> +bicner3, #L_PTE_VALID >> +bne1f >> +tstr2, #1 << (55 - 32)@ L_PTE_DIRTY >> +orreqr3, #L_PTE_RDONLY >> +#else >> tstr2, #L_PTE_VALID >> beq1f >> tstr3, #1 << (57 - 32)@ L_PTE_NONE >> @@ -72,6 +81,7 @@ ENTRY(cpu_v7_set_pte_ext) >> bne1f >> tstr3, #1 << (55 - 32)@ L_PTE_DIRTY >> orreqr2, #L_PTE_RDONLY >> +#endif >> 1:strdr2, r3, [r0] >> ALT_SMP(W(nop)) >> ALT_UP (mcrp15, 0, r0, c7, c10, 1)@ flush_pte >> -- 1.7.1 > > If possible can we avoid large #ifdef blocks here? > > Two ideas are > > ARM_LE(tst r2, #L_PTE_VALID) > ARM_BE(tst r3, #L_PTE_VALID) > > or change r2, r3 pair to say rlow, rhi and > > #ifdef CONFIG_CPU_ENDIAN_BE8 > #define rlow r3 > #define rhi r2 > #else > #define rlow r2 > #define rhi r3 > #endif > Hi Ben, Thanks for your suggestion, how about this? 
Signed-off-by: Jianguo Wu
---
 arch/arm/mm/proc-v7-3level.S | 18 +-
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 01a719e..22e3ad6 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
 	mov	pc, lr
 ENDPROC(cpu_v7_switch_mm)
 
+#ifdef __ARMEB__
+#define rl r3
+#define rh r2
+#else
+#define rl r2
+#define rh r3
+#endif
+
 /*
  * cpu_v7_set_pte_ext(ptep, pte)
  *
@@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
-	tst	r2, #L_PTE_VALID
+	tst	rl, #L_PTE_VALID
 	beq	1f
-	tst	r3, #1 << (57 - 32)	@ L_PTE_NONE
-	bicne	r2, #L_PTE_VALID
+	tst	rh, #1 << (57 - 32)	@ L_PTE_NONE
+	bicne	rl, #L_PTE_VALID
 	bne	1f
-	tst	r3, #1 << (55 - 32)	@ L_PTE_DIRTY
-	orreq	r2, #L_PTE_RDONLY
+	tst	rh, #1 << (55 - 32)	@ L_PTE_DIRTY
+	orreq	rl, #L_PTE_RDONLY
 1:	strd	r2, r3, [r0]
 	ALT_SMP(W(nop))
 	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)	@ flush_pte
--
1.7.1

Thanks,
Jianguo Wu
[PATCH] ARM: mm: support big-endian page tables
When LPAE and big-endian are enabled on a HiSilicon board, and
mem=384M mem=512M@7680M is specified, we get a bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x4400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory (mem=384M mem=512M@7680M). After
debugging, I found that in the page fault handler we read back a wrong pfn
from the pte just after setting it, as follows:

do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));

	//read back the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
	...
}

pfn:     0x1fa4f5, pte:0xc1fa4f575f
new_pfn: 0xfa4f5,  new_pte:0xc0fa4f5f5f	//new pfn/pte is wrong.

The bug happens in cpu_v7_set_pte_ext(ptep, pte): when the pte is 64-bit,
little-endian code stores the low 32 bits in r2 and the high 32 bits in
r3, while big-endian code stores the low 32 bits in r3 and the high 32
bits in r2. This causes a wrong pfn to be stored in the pte, so we should
exchange r2 and r3 for big-endian.
Signed-off-by: Jianguo Wu
---
 arch/arm/mm/proc-v7-3level.S | 10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 6ba4bd9..71b3892 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -65,6 +65,15 @@ ENDPROC(cpu_v7_switch_mm)
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_ENDIAN_BE8
+	tst	r3, #L_PTE_VALID
+	beq	1f
+	tst	r2, #1 << (57 - 32)	@ L_PTE_NONE
+	bicne	r3, #L_PTE_VALID
+	bne	1f
+	tst	r2, #1 << (55 - 32)	@ L_PTE_DIRTY
+	orreq	r3, #L_PTE_RDONLY
+#else
 	tst	r2, #L_PTE_VALID
 	beq	1f
 	tst	r3, #1 << (57 - 32)	@ L_PTE_NONE
@@ -72,6 +81,7 @@ ENTRY(cpu_v7_set_pte_ext)
 	bne	1f
 	tst	r3, #1 << (55 - 32)	@ L_PTE_DIRTY
 	orreq	r2, #L_PTE_RDONLY
+#endif
 1:	strd	r2, r3, [r0]
 	ALT_SMP(W(nop))
 	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)	@ flush_pte
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
On 2014/1/22 4:41, David Rientjes wrote:
> On Tue, 21 Jan 2014, Jianguo Wu wrote:
>
>>> The problem is that slabinfo becomes excessively verbose and dumping it
>>> all to the kernel log often times causes important messages to be lost.
>>> This is why we control things like the tasklist dump with a VM sysctl.
>>> It would be possible to dump, say, the top ten slab caches with the
>>> highest memory usage, but it will only be helpful for slab leaks.
>>> Typically there are better debugging tools available than analyzing the
>>> kernel log; if you see unusually high slab memory in the meminfo dump,
>>> you can enable it.
>>
>> But when an OOM has happened, we can only use the kernel log; the
>> slab/vmalloc info from proc is stale. Maybe we can dump slab/vmalloc
>> info with a VM sysctl, and only the top 10/20 entries?
>
> You could, but it's a tradeoff between how much to dump to a general
> resource such as the kernel log and how many sysctls we add that control
> every possible thing. Slab leaks would definitely be a minority of oom
> conditions and you should normally be able to reproduce them by running
> the same workload; just use slabtop(1) or manually inspect /proc/slabinfo
> while such a workload is running for indicators. I don't think we want to
> add the information by default, though, nor do we want to add sysctls to
> control the behavior (you'd still need to reproduce the issue after
> enabling it).
>
> We are currently discussing userspace oom handlers, though, that would
> allow you to run a process that would be notified and allowed to allocate
> a small amount of memory on oom conditions. It would then be trivial to
> dump any information you feel pertinent in userspace prior to killing
> something. I like to inspect heap profiles for memory hogs while
> debugging our malloc() issues, for example, and you could look more
> closely at kernel memory.
>
> I'll cc you on future discussions of that feature.
Hi David,

Thanks for your kind explanation. Do you have any specific plans for this?

Thanks,
Jianguo Wu.
Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
On 2014/1/21 13:34, David Rientjes wrote:
> On Mon, 20 Jan 2014, Jianguo Wu wrote:
>
>> When an OOM happens, the kernel dumps buddy free-area info, hugetlb
>> pages info, the memory state of all eligible tasks, and per-cpu memory
>> info. But it does not dump slab/vmalloc info, and sometimes that is not
>> enough to figure out why the OOM happened.
>>
>> So, my questions are:
>> 1. Should slab/vmalloc info be dumped when an OOM happens? Though we
>> can get these from proc files, usually we do not monitor the logs and
>> check the proc files immediately when an OOM happens.

Hi David, Thank you for your patience to answer!

> The problem is that slabinfo becomes excessively verbose and dumping it
> all to the kernel log often times causes important messages to be lost.
> This is why we control things like the tasklist dump with a VM sysctl.
> It would be possible to dump, say, the top ten slab caches with the
> highest memory usage, but it will only be helpful for slab leaks.
> Typically there are better debugging tools available than analyzing the
> kernel log; if you see unusually high slab memory in the meminfo dump,
> you can enable it.

But when an OOM has happened, we can only use the kernel log; the
slab/vmalloc info from proc is stale. Maybe we can dump slab/vmalloc info
with a VM sysctl, and only the top 10/20 entries?

Thanks.

>> 2. /proc/$pid/smaps and pagecache info are also helpful when an OOM
>> happens; should they also be dumped?
>
> Also very verbose and would cause important messages to be lost; we try
> to avoid spamming the kernel log with all of this information as much as
> possible.
>
>> 3. Without this info, how do you usually figure out the OOM reason?
>
> Analyze the memory usage in the meminfo and determine what is unusually
> high; if it's mostly anonymous memory, you can usually correlate it back
> to a high rss for a process in the tasklist that you didn't suspect to be
> using that much memory, for example.
[question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
When an OOM happens, the kernel dumps buddy free-area info, hugetlb pages
info, the memory state of all eligible tasks, and per-cpu memory info. But
it does not dump slab/vmalloc info, and sometimes that is not enough to
figure out why the OOM happened.

So, my questions are:
1. Should slab/vmalloc info be dumped when an OOM happens? Though we can
get these from proc files, usually we do not monitor the logs and check
the proc files immediately when an OOM happens.
2. /proc/$pid/smaps and pagecache info are also helpful when an OOM
happens; should they also be dumped?
3. Without this info, how do you usually figure out the OOM reason?
Re: [PATCH] mm/kmemleak: add support for re-enable kmemleak at runtime
On 2014/1/17 20:04, Catalin Marinas wrote:
> On Fri, Jan 17, 2014 at 09:40:02AM +0000, Jianguo Wu wrote:
>> Now disabling kmemleak is an irreversible operation, but sometimes
>> we may need to re-enable kmemleak at runtime. So add a knob to enable
>> kmemleak at runtime:
>> echo on > /sys/kernel/debug/kmemleak
>
> It is irreversible for a very good reason: once it has missed the
> initial memory allocations, there is no way for kmemleak to build the
> object reference graph and you'll get lots of false positives, pretty
> much making it unusable.

Do you mean that memory allocations are not traced while kmemleak is
disabled, and that this memory may reference newly allocated objects
after re-enabling?
[PATCH] mm/kmemleak: add support for re-enable kmemleak at runtime
Now disabling kmemleak is an irreversible operation, but sometimes we may
need to re-enable kmemleak at runtime. So add a knob to enable kmemleak at
runtime:
echo on > /sys/kernel/debug/kmemleak

Signed-off-by: Jianguo Wu
---
 Documentation/kmemleak.txt |    3 ++-
 mm/kmemleak.c              |   37 +++++++++++++++++++++++++++++++-----
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
index b6e3973..8ec56ad 100644
--- a/Documentation/kmemleak.txt
+++ b/Documentation/kmemleak.txt
@@ -44,7 +44,8 @@ objects to be reported as orphan.
 Memory scanning parameters can be modified at run-time by writing to the
 /sys/kernel/debug/kmemleak file. The following parameters are supported:
 
-  off		- disable kmemleak (irreversible)
+  off		- disable kmemleak
+  on		- enable kmemleak
   stack=on	- enable the task stacks scanning (default)
   stack=off	- disable the tasks stacks scanning
   scan=on	- start the automatic memory scanning thread (default)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 31f01c5..02f292c 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -260,6 +260,7 @@ static struct early_log
 static int crt_early_log __initdata;
 
 static void kmemleak_disable(void);
+static void kmemleak_enable(void);
 
 /*
  * Print a warning and dump the stack trace.
@@ -1616,9 +1617,6 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
 	int buf_size;
 	int ret;
 
-	if (!atomic_read(&kmemleak_enabled))
-		return -EBUSY;
-
 	buf_size = min(size, (sizeof(buf) - 1));
 	if (strncpy_from_user(buf, user_buf, buf_size) < 0)
 		return -EFAULT;
@@ -1628,6 +1626,19 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
 	if (ret < 0)
 		return ret;
 
+	if (strncmp(buf, "on", 2) == 0) {
+		if (atomic_read(&kmemleak_enabled))
+			ret = -EBUSY;
+		else
+			kmemleak_enable();
+		goto out;
+	}
+
+	if (!atomic_read(&kmemleak_enabled)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	if (strncmp(buf, "off", 3) == 0)
 		kmemleak_disable();
 	else if (strncmp(buf, "stack=on", 8) == 0)
@@ -1703,7 +1714,7 @@ static DECLARE_WORK(cleanup_work, kmemleak_do_cleanup);
 
 /*
  * Disable kmemleak. No memory allocation/freeing will be traced once this
- * function is called. Disabling kmemleak is an irreversible operation.
+ * function is called.
  */
 static void kmemleak_disable(void)
 {
@@ -1721,6 +1732,24 @@ static void kmemleak_disable(void)
 	pr_info("Kernel memory leak detector disabled\n");
 }
 
+static void kmemleak_enable(void)
+{
+	struct kmemleak_object *object;
+
+	/* free the kmemleak internal objects the previous thread scanned */
+	rcu_read_lock();
+	list_for_each_entry_rcu(object, &object_list, object_list)
+		delete_object_full(object->pointer);
+	rcu_read_unlock();
+
+	atomic_set(&kmemleak_enabled, 1);
+	atomic_set(&kmemleak_error, 0);
+
+	start_scan_thread();
+
+	pr_info("Kernel memory leak detector enabled\n");
+}
+
 /*
  * Allow boot-time kmemleak disabling (enabled by default).
  */
--
1.7.7
Re: [PATCH 2/2] mm: free memblock.memory in free_all_bootmem
On 2014/1/7 23:16, Philipp Hachtmann wrote:
> When calling free_all_bootmem() the free areas under memblock's
> control are released to the buddy allocator. Additionally the
> reserved list is freed if it was reallocated by memblock.
> The same should apply for the memory list.
>
> Signed-off-by: Philipp Hachtmann
> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c            | 12 ++++++++++++
>  mm/nobootmem.c           |  7 ++++++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 77c60e5..d174922 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -52,6 +52,7 @@ phys_addr_t memblock_find_in_range_node(phys_addr_t start, phys_addr_t end,
>  phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
>                                     phys_addr_t size, phys_addr_t align);
>  phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr);
> +phys_addr_t get_allocated_memblock_memory_regions_info(phys_addr_t *addr);
>  void memblock_allow_resize(void);
>  int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
>  int memblock_add(phys_addr_t base, phys_addr_t size);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 53e477b..1a11d04 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -271,6 +271,18 @@ phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
>                            memblock.reserved.max);
>  }
>
> +phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info(
> +                                       phys_addr_t *addr)
> +{
> +       if (memblock.memory.regions == memblock_memory_init_regions)
> +               return 0;
> +
> +       *addr = __pa(memblock.memory.regions);
> +
> +       return PAGE_ALIGN(sizeof(struct memblock_region) *
> +                         memblock.memory.max);
> +}
> +
>  /**
>   * memblock_double_array - double the size of the memblock regions array
>   * @type: memblock type of the regions array being doubled
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index 3a7e14d..83f36d3 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -122,11 +122,16 @@ static unsigned long __init free_low_memory_core_early(void)
>         for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
>                 count += __free_memory_core(start, end);
>
> -       /* free range that is used for reserved array if we allocate it */
> +       /* Free memblock.reserved array if it was allocated */
>         size = get_allocated_memblock_reserved_regions_info(&start);
>         if (size)
>                 count += __free_memory_core(start, start + size);
>
> +       /* Free memblock.memory array if it was allocated */
> +       size = get_allocated_memblock_memory_regions_info(&start);
> +       if (size)
> +               count += __free_memory_core(start, start + size);
> +

Hi Philipp,

Some archs, like arm64, use memblock.memory after system boot, so we cannot simply release it to the buddy allocator; this may need to be guarded by !defined(CONFIG_ARCH_DISCARD_MEMBLOCK). For example, arm64's pfn_valid() walks memblock.memory:

#ifdef CONFIG_HAVE_ARCH_PFN_VALID
int pfn_valid(unsigned long pfn)
{
	return memblock_is_memory(pfn << PAGE_SHIFT);
}
EXPORT_SYMBOL(pfn_valid);
#endif

Thanks,
Jianguo Wu

>         return count;
>  }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
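The guard in get_allocated_memblock_memory_regions_info() relies on memblock's bootstrap pattern: the regions array begins life in static storage and is reallocated only if it has to grow, so there is something to free only in the reallocated case. A minimal user-space analogue of that pattern (all names here are illustrative stand-ins, not kernel API, and the kernel's PAGE_ALIGN rounding is omitted for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins for memblock's types. */
struct region { unsigned long base, size; };

static struct region init_regions[4];     /* static bootstrap storage */

struct region_array {
    struct region *regions;
    unsigned long max;                    /* allocated slots */
};

/*
 * Mirror of the patch's get_allocated_memblock_memory_regions_info():
 * report a freeable range only when the array was reallocated away
 * from the static bootstrap storage; otherwise there is nothing to free.
 */
static unsigned long allocated_regions_info(struct region_array *a,
                                            struct region **addr)
{
    if (a->regions == init_regions)
        return 0;
    *addr = a->regions;
    return sizeof(struct region) * a->max;
}
```

The same pointer comparison is what makes the kernel function safe to call unconditionally from free_low_memory_core_early().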
Re: [PATCH] mm/hugetlb: check for pte NULL pointer in __page_check_address()
Hi Kirill,

On 2013/12/16 22:25, Kirill A. Shutemov wrote:
> Jianguo Wu wrote:
>> In __page_check_address(), if address's pud is not present,
>> huge_pte_offset() will return NULL, we should check the return value.
>>
>> Signed-off-by: Jianguo Wu
>
> Looks okay to me.
>
> Acked-by: Kirill A. Shutemov
>
> Have you triggered a crash there? Or just spotted by reading the code?

By reading the code.

Thanks,
Jianguo Wu
[PATCH] mm/hugetlb: check for pte NULL pointer in __page_check_address()
In __page_check_address(), if the address's pud is not present, huge_pte_offset() will return NULL; we should check the return value.

Signed-off-by: Jianguo Wu
---
 mm/rmap.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 55c8b8d..068522d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -600,7 +600,11 @@ pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
 	spinlock_t *ptl;
 
 	if (unlikely(PageHuge(page))) {
+		/* when pud is not present, pte will be NULL */
 		pte = huge_pte_offset(mm, address);
+		if (!pte)
+			return NULL;
+
 		ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
 		goto check;
 	}
--
1.7.1
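The failure mode here is the classic one of a table walk that can stop early: when the upper-level entry is not present, there is no pte to hand back. A tiny user-space analogue of the pattern the patch enforces (illustrative types, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pte_t;

struct pud { pte_t *table; };   /* table == NULL means "not present" */

/* Analogue of huge_pte_offset(): may legitimately return NULL. */
static pte_t *huge_pte_lookup(const struct pud *pud, size_t idx)
{
    if (!pud->table)
        return NULL;            /* pud not present: caller must check */
    return &pud->table[idx];
}

/* Analogue of the fixed caller: bail out instead of dereferencing. */
static int check_address(const struct pud *pud, size_t idx, pte_t expect)
{
    pte_t *pte = huge_pte_lookup(pud, idx);
    if (!pte)                   /* the patch's added "if (!pte) return NULL;" */
        return 0;
    return *pte == expect;
}
```

Without the NULL check, the caller would dereference the absent entry exactly as huge_pte_lockptr() would have in the unpatched kernel.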
[PATCH] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
Changelog:
 - Only set PageHWPoison on the error raw page if the page is freed into buddy

After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy (over-commit page). If the page is in buddy, page_hstate(page) will be NULL, and we hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page():

[  890.677918] BUG: unable to handle kernel NULL pointer dereference at 0058
[  890.685741] IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
[  890.692861] PGD c23762067 PUD c24be2067 PMD 0
[  890.697314] Oops: [#1] SMP

So check PageHuge(page) after migrate_pages() succeeds.

Tested-by: Naoya Horiguchi
Cc: sta...@vger.kernel.org
Signed-off-by: Jianguo Wu
---
 mm/memory-failure.c | 14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b7c1716..db08af9 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1505,10 +1505,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
 		if (ret > 0)
 			ret = -EIO;
 	} else {
-		set_page_hwpoison_huge_page(hpage);
-		dequeue_hwpoisoned_huge_page(hpage);
-		atomic_long_add(1 << compound_order(hpage),
-				&num_poisoned_pages);
+		/* overcommit hugetlb page will be freed to buddy */
+		if (PageHuge(page)) {
+			set_page_hwpoison_huge_page(hpage);
+			dequeue_hwpoisoned_huge_page(hpage);
+			atomic_long_add(1 << compound_order(hpage),
+					&num_poisoned_pages);
+		} else {
+			SetPageHWPoison(page);
+			atomic_long_inc(&num_poisoned_pages);
+		}
 	}
 	return ret;
 }
--
1.7.1
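In user-space terms, the decision the final patch makes after a successful migration looks like this (a sketch with illustrative names; the real code manipulates struct page flags and a global counter):

```c
#include <assert.h>
#include <stdbool.h>

struct spage { bool hwpoison; };

/*
 * Sketch of the fixed tail of soft_offline_huge_page(): if the source
 * is still a hugetlb page, the whole compound page is poisoned and
 * dequeued from the hugepage freelist; if it was an over-committed
 * hugepage already released to the buddy allocator, only the single
 * error raw page is poisoned.  Returns the number of pages poisoned
 * (the amount added to num_poisoned_pages in the kernel).
 */
static long poison_after_migrate(struct spage *hpage, int order,
                                 struct spage *err_page, bool still_huge)
{
    if (still_huge) {
        for (int i = 0; i < (1 << order); i++)   /* whole hugepage */
            hpage[i].hwpoison = true;
        return 1L << order;
    }
    err_page->hwpoison = true;                   /* only the raw page */
    return 1;
}
```

The earlier v2 poisoned all nr_pages raw pages in the buddy case; the changelog line above records the narrowing to just the error page.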
Re: [PATCH v2] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
Hi,

On 2013/12/13 10:32, Naoya Horiguchi wrote:
> On Fri, Dec 13, 2013 at 09:09:52AM +0800, Jianguo Wu wrote:
>> After a successful hugetlb page migration by soft offline, the source page
>> will either be freed into hugepage_freelists or buddy (over-commit page).
>> If page is in buddy, page_hstate(page) will be NULL. It will hit a NULL
>> pointer dereference in dequeue_hwpoisoned_huge_page().
>>
>> [  890.677918] BUG: unable to handle kernel NULL pointer dereference at 0058
>> [  890.685741] IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
>> [  890.692861] PGD c23762067 PUD c24be2067 PMD 0
>> [  890.697314] Oops: [#1] SMP
>>
>> So check PageHuge(page) after call migrate_pages() successfully.
>>
>> Tested-by: Naoya Horiguchi
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Jianguo Wu
>> ---
>>  mm/memory-failure.c | 19 ++++++++++++++-----
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b7c1716..e5567f2 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
>>
>>  static int soft_offline_huge_page(struct page *page, int flags)
>>  {
>> -	int ret;
>> +	int ret, i;
>> +	unsigned long nr_pages;
>>  	unsigned long pfn = page_to_pfn(page);
>>  	struct page *hpage = compound_head(page);
>>  	LIST_HEAD(pagelist);
>> @@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
>>  	}
>>  	unlock_page(hpage);
>>
>> +	nr_pages = 1 << compound_order(hpage);
>> +
>>  	/* Keep page count to indicate a given hugepage is isolated. */
>>  	list_move(&hpage->lru, &pagelist);
>>  	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
>> @@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
>>  		if (ret > 0)
>>  			ret = -EIO;
>>  	} else {
>> -		set_page_hwpoison_huge_page(hpage);
>> -		dequeue_hwpoisoned_huge_page(hpage);
>> -		atomic_long_add(1 << compound_order(hpage),
>> -				&num_poisoned_pages);
>> +		/* overcommit hugetlb page will be freed to buddy */
>> +		if (PageHuge(page)) {
>> +			set_page_hwpoison_huge_page(hpage);
>> +			dequeue_hwpoisoned_huge_page(hpage);
>> +		} else {
>> +			for (i = 0; i < nr_pages; i++)
>> +				SetPageHWPoison(hpage + i);
>
> Why don't you set PageHWPoison only on the error raw page instead
> of the whole error hugepage, or is there some problem of doing so?

Oh, yes, we should only poison the error raw page. I will resend a new version.

Thanks,
Jianguo Wu

> Thanks,
> Naoya
>
>> +		}
>> +
>> +		atomic_long_add(nr_pages, &num_poisoned_pages);
>>  	}
>>  	return ret;
>>  }
>> --
>> 1.8.2.2
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
[PATCH v2] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy (over-commit page). If the page is in buddy, page_hstate(page) will be NULL. It will hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().

[  890.677918] BUG: unable to handle kernel NULL pointer dereference at 0058
[  890.685741] IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
[  890.692861] PGD c23762067 PUD c24be2067 PMD 0
[  890.697314] Oops: [#1] SMP

So check PageHuge(page) after call migrate_pages() successfully.

Tested-by: Naoya Horiguchi
Cc: sta...@vger.kernel.org
Signed-off-by: Jianguo Wu
---
 mm/memory-failure.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b7c1716..e5567f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 
 static int soft_offline_huge_page(struct page *page, int flags)
 {
-	int ret;
+	int ret, i;
+	unsigned long nr_pages;
 	unsigned long pfn = page_to_pfn(page);
 	struct page *hpage = compound_head(page);
 	LIST_HEAD(pagelist);
@@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 	unlock_page(hpage);
 
+	nr_pages = 1 << compound_order(hpage);
+
 	/* Keep page count to indicate a given hugepage is isolated. */
 	list_move(&hpage->lru, &pagelist);
 	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
@@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
 		if (ret > 0)
 			ret = -EIO;
 	} else {
-		set_page_hwpoison_huge_page(hpage);
-		dequeue_hwpoisoned_huge_page(hpage);
-		atomic_long_add(1 << compound_order(hpage),
-				&num_poisoned_pages);
+		/* overcommit hugetlb page will be freed to buddy */
+		if (PageHuge(page)) {
+			set_page_hwpoison_huge_page(hpage);
+			dequeue_hwpoisoned_huge_page(hpage);
+		} else {
+			for (i = 0; i < nr_pages; i++)
+				SetPageHWPoison(hpage + i);
+		}
+
+		atomic_long_add(nr_pages, &num_poisoned_pages);
 	}
 	return ret;
 }
--
1.8.2.2
Re: [PATCH] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfull
Hi Naoya,

On 2013/12/13 1:39, Naoya Horiguchi wrote:
> (Cced: Chen Gong)
>
> I confirmed that this patch fixes the reported bug.
> And I'll send a test patch for mce-test later privately.
>
> Tested-by: Naoya Horiguchi
>
> Jianguo, could you put "Cc: sta...@vger.kernel.org"
> in patch description?
> And please fix a typo in subject line.

OK, thanks for your testing!

Thanks,
Jianguo Wu

> Thanks,
> Naoya Horiguchi
>
> On Thu, Dec 12, 2013 at 09:14:05PM +0800, Jianguo Wu wrote:
>> After a successful hugetlb page migration by soft offline, the source page
>> will either be freed into hugepage_freelists or buddy (over-commit page).
>> If page is in buddy, page_hstate(page) will be NULL. It will hit a NULL
>> pointer dereference in dequeue_hwpoisoned_huge_page().
>>
>> [  890.677918] BUG: unable to handle kernel NULL pointer dereference at 0058
>> [  890.685741] IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
>> [  890.692861] PGD c23762067 PUD c24be2067 PMD 0
>> [  890.697314] Oops: [#1] SMP
>>
>> So check PageHuge(page) after call migrate_pages() successfull.
>>
>> Signed-off-by: Jianguo Wu
>> ---
>>  mm/memory-failure.c | 19 ++++++++++++++-----
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b7c1716..e5567f2 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
>>
>>  static int soft_offline_huge_page(struct page *page, int flags)
>>  {
>> -	int ret;
>> +	int ret, i;
>> +	unsigned long nr_pages;
>>  	unsigned long pfn = page_to_pfn(page);
>>  	struct page *hpage = compound_head(page);
>>  	LIST_HEAD(pagelist);
>> @@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
>>  	}
>>  	unlock_page(hpage);
>>
>> +	nr_pages = 1 << compound_order(hpage);
>> +
>>  	/* Keep page count to indicate a given hugepage is isolated. */
>>  	list_move(&hpage->lru, &pagelist);
>>  	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
>> @@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
>>  		if (ret > 0)
>>  			ret = -EIO;
>>  	} else {
>> -		set_page_hwpoison_huge_page(hpage);
>> -		dequeue_hwpoisoned_huge_page(hpage);
>> -		atomic_long_add(1 << compound_order(hpage),
>> -				&num_poisoned_pages);
>> +		/* over-commit hugetlb page will be freed into buddy */
>> +		if (PageHuge(page)) {
>> +			set_page_hwpoison_huge_page(hpage);
>> +			dequeue_hwpoisoned_huge_page(hpage);
>> +		} else {
>> +			for (i = 0; i < nr_pages; i++)
>> +				SetPageHWPoison(hpage + i);
>> +		}
>> +
>> +		atomic_long_add(nr_pages, &num_poisoned_pages);
>>  	}
>>  	return ret;
>>  }
>> --
>> 1.8.2.2
[PATCH] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfull
After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy (over-commit page). If the page is in buddy, page_hstate(page) will be NULL. It will hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().

[  890.677918] BUG: unable to handle kernel NULL pointer dereference at 0058
[  890.685741] IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
[  890.692861] PGD c23762067 PUD c24be2067 PMD 0
[  890.697314] Oops: [#1] SMP

So check PageHuge(page) after call migrate_pages() successfull.

Signed-off-by: Jianguo Wu
---
 mm/memory-failure.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b7c1716..e5567f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 
 static int soft_offline_huge_page(struct page *page, int flags)
 {
-	int ret;
+	int ret, i;
+	unsigned long nr_pages;
 	unsigned long pfn = page_to_pfn(page);
 	struct page *hpage = compound_head(page);
 	LIST_HEAD(pagelist);
@@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 	unlock_page(hpage);
 
+	nr_pages = 1 << compound_order(hpage);
+
 	/* Keep page count to indicate a given hugepage is isolated. */
 	list_move(&hpage->lru, &pagelist);
 	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
@@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
 		if (ret > 0)
 			ret = -EIO;
 	} else {
-		set_page_hwpoison_huge_page(hpage);
-		dequeue_hwpoisoned_huge_page(hpage);
-		atomic_long_add(1 << compound_order(hpage),
-				&num_poisoned_pages);
+		/* over-commit hugetlb page will be freed into buddy */
+		if (PageHuge(page)) {
+			set_page_hwpoison_huge_page(hpage);
+			dequeue_hwpoisoned_huge_page(hpage);
+		} else {
+			for (i = 0; i < nr_pages; i++)
+				SetPageHWPoison(hpage + i);
+		}
+
+		atomic_long_add(nr_pages, &num_poisoned_pages);
 	}
 	return ret;
 }
--
1.8.2.2
Re: [PATCH] mm: do_mincore() cleanup
On 2013/12/5 22:39, Naoya Horiguchi wrote: > On Thu, Dec 05, 2013 at 04:52:52PM +0800, Jianguo Wu wrote: >> Two cleanups: >> 1. remove redundant codes for hugetlb pages. >> 2. end = pmd_addr_end(addr, end) restricts [addr, end) within PMD_SIZE, >> this may increase do_mincore() calls, remove it. >> >> Signed-off-by: Jianguo Wu > > Reviewed-by: Naoya Horiguchi Hi Naoya, thanks for your review! Jianguo Wu > > Thanks! > > Naoya >> --- >> mm/mincore.c | 7 ------- >> 1 files changed, 0 insertions(+), 7 deletions(-) >> >> diff --git a/mm/mincore.c b/mm/mincore.c >> index da2be56..1016233 100644 >> --- a/mm/mincore.c >> +++ b/mm/mincore.c >> @@ -225,13 +225,6 @@ static long do_mincore(unsigned long addr, unsigned >> long pages, unsigned char *v >> >> end = min(vma->vm_end, addr + (pages << PAGE_SHIFT)); >> >> -if (is_vm_hugetlb_page(vma)) { >> -mincore_hugetlb_page_range(vma, addr, end, vec); >> -return (end - addr) >> PAGE_SHIFT; >> -} >> - >> -end = pmd_addr_end(addr, end); >> - >> if (is_vm_hugetlb_page(vma)) >> mincore_hugetlb_page_range(vma, addr, end, vec); >> else >> -- >> 1.7.1
[PATCH] mm: do_mincore() cleanup
Two cleanups: 1. remove redundant codes for hugetlb pages. 2. end = pmd_addr_end(addr, end) restricts [addr, end) within PMD_SIZE, this may increase do_mincore() calls, remove it. Signed-off-by: Jianguo Wu --- mm/mincore.c | 7 ------- 1 files changed, 0 insertions(+), 7 deletions(-) diff --git a/mm/mincore.c b/mm/mincore.c index da2be56..1016233 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -225,13 +225,6 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v end = min(vma->vm_end, addr + (pages << PAGE_SHIFT)); - if (is_vm_hugetlb_page(vma)) { - mincore_hugetlb_page_range(vma, addr, end, vec); - return (end - addr) >> PAGE_SHIFT; - } - - end = pmd_addr_end(addr, end); - if (is_vm_hugetlb_page(vma)) mincore_hugetlb_page_range(vma, addr, end, vec); else -- 1.7.1
[Resend with ACK][PATCH] mm/arch: use NUMA_NO_NODE
Use more appropriate NUMA_NO_NODE instead of -1 in all archs' module_alloc() Signed-off-by: Jianguo Wu Acked-by: Ralf Baechle --- arch/arm/kernel/module.c|2 +- arch/arm64/kernel/module.c |2 +- arch/mips/kernel/module.c |2 +- arch/parisc/kernel/module.c |2 +- arch/s390/kernel/module.c |2 +- arch/sparc/kernel/module.c |2 +- arch/x86/kernel/module.c|2 +- 7 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c index 85c3fb6..8f4cff3 100644 --- a/arch/arm/kernel/module.c +++ b/arch/arm/kernel/module.c @@ -40,7 +40,7 @@ void *module_alloc(unsigned long size) { return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, - GFP_KERNEL, PAGE_KERNEL_EXEC, -1, + GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE, __builtin_return_address(0)); } #endif diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c index ca0e3d5..8f898bd 100644 --- a/arch/arm64/kernel/module.c +++ b/arch/arm64/kernel/module.c @@ -29,7 +29,7 @@ void *module_alloc(unsigned long size) { return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, - GFP_KERNEL, PAGE_KERNEL_EXEC, -1, + GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE, __builtin_return_address(0)); } diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c index 977a623..b507e07 100644 --- a/arch/mips/kernel/module.c +++ b/arch/mips/kernel/module.c @@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(dbe_lock); void *module_alloc(unsigned long size) { return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END, - GFP_KERNEL, PAGE_KERNEL, -1, + GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE, __builtin_return_address(0)); } #endif diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c index 2a625fb..50dfafc 100644 --- a/arch/parisc/kernel/module.c +++ b/arch/parisc/kernel/module.c @@ -219,7 +219,7 @@ void *module_alloc(unsigned long size) * init_data correctly */ return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END, GFP_KERNEL | __GFP_HIGHMEM, - 
PAGE_KERNEL_RWX, -1, + PAGE_KERNEL_RWX, NUMA_NO_NODE, __builtin_return_address(0)); } diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c index 7845e15..b89b591 100644 --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -50,7 +50,7 @@ void *module_alloc(unsigned long size) if (PAGE_ALIGN(size) > MODULES_LEN) return NULL; return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, - GFP_KERNEL, PAGE_KERNEL, -1, + GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE, __builtin_return_address(0)); } #endif diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c index 4435488..97655e0 100644 --- a/arch/sparc/kernel/module.c +++ b/arch/sparc/kernel/module.c @@ -29,7 +29,7 @@ static void *module_map(unsigned long size) if (PAGE_ALIGN(size) > MODULES_LEN) return NULL; return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, - GFP_KERNEL, PAGE_KERNEL, -1, + GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE, __builtin_return_address(0)); } #else diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index 216a4d7..18be189 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -49,7 +49,7 @@ void *module_alloc(unsigned long size) return NULL; return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC, - -1, __builtin_return_address(0)); + NUMA_NO_NODE, __builtin_return_address(0)); } #ifdef CONFIG_X86_32 -- 1.7.1
Re: [PATCH] mm/ksm: return NULL when doesn't get mergeable page
On 2013/9/19 16:33, Petr Holasek wrote: > On Mon, 16 Sep 2013, Jianguo Wu wrote: >> In get_mergeable_page() local variable page is not initialized, >> it may hold a garbage value, when find_mergeable_vma() return NULL, >> get_mergeable_page() may return a garbage value to the caller. >> >> So initialize page as NULL. >> >> Signed-off-by: Jianguo Wu >> --- >> mm/ksm.c | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/mm/ksm.c b/mm/ksm.c >> index b6afe0c..87efbae 100644 >> --- a/mm/ksm.c >> +++ b/mm/ksm.c >> @@ -460,7 +460,7 @@ static struct page *get_mergeable_page(struct rmap_item >> *rmap_item) >> struct mm_struct *mm = rmap_item->mm; >> unsigned long addr = rmap_item->address; >> struct vm_area_struct *vma; >> -struct page *page; >> +struct page *page = NULL; >> >> down_read(&mm->mmap_sem); >> vma = find_mergeable_vma(mm, addr); >> -- >> 1.7.1 >> > > When find_mergeable_vma returned NULL, NULL is assigned to page in "out" > statement. > Oh, yes, thanks, Petr. >
[RESEND PATCH] mm/mempolicy: use NUMA_NO_NODE
Use more appropriate NUMA_NO_NODE instead of -1 Signed-off-by: Jianguo Wu Acked-by: KOSAKI Motohiro --- mm/mempolicy.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 4baf12e..4f0cd20 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1083,7 +1083,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, tmp = *from; while (!nodes_empty(tmp)) { int s,d; - int source = -1; + int source = NUMA_NO_NODE; int dest = 0; for_each_node_mask(s, tmp) { @@ -1118,7 +1118,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, if (!node_isset(dest, tmp)) break; } - if (source == -1) + if (source == NUMA_NO_NODE) break; node_clear(source, tmp); @@ -1765,7 +1765,7 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned nnodes = nodes_weight(pol->v.nodes); unsigned target; int c; - int nid = -1; + int nid = NUMA_NO_NODE; if (!nnodes) return numa_node_id(); @@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy *pol, /* * Return the bit number of a random bit set in the nodemask. - * (returns -1 if nodemask is empty) + * (returns NUMA_NO_NODE if nodemask is empty) */ int node_random(const nodemask_t *maskp) { - int w, bit = -1; + int w, bit = NUMA_NO_NODE; w = nodes_weight(*maskp); if (w) -- 1.7.1
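The node_random() hunk above documents the empty-mask case the patch renames to NUMA_NO_NODE. A toy userspace analogue of that semantics (hypothetical helper, not kernel code; a plain unsigned long stands in for nodemask_t and NO_BIT for NUMA_NO_NODE):

```c
#include <stdlib.h>

#define NO_BIT (-1)   /* stands in for NUMA_NO_NODE */

/* Return the index of a randomly chosen set bit of `mask`, or NO_BIT
 * when the mask is empty -- mirroring node_random()'s contract. */
int mask_random(unsigned long mask)
{
    int weight = __builtin_popcountl(mask);   /* ~ nodes_weight() */
    int target, bit;

    if (weight == 0)
        return NO_BIT;

    target = rand() % weight;                 /* pick the target-th set bit */
    for (bit = 0; bit < (int)(8 * sizeof(mask)); bit++)
        if ((mask & (1UL << bit)) && target-- == 0)
            return bit;

    return NO_BIT;                            /* unreachable for non-empty mask */
}
```

With a single-bit mask the result is deterministic; with several bits set it is one of the set-bit indices.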
Re: [PATCH] mm/mempolicy: use NUMA_NO_NODE
On 2013/9/17 4:26, Cody P Schafer wrote: > >> @@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct >> mempolicy *pol, >> >> /* >> * Return the bit number of a random bit set in the nodemask. >> - * (returns -1 if nodemask is empty) >> + * (returns NUMA_NO_NOD if nodemask is empty) > > s/NUMA_NO_NOD/NUMA_NO_NODE/ > Thanks, I will resent this. >> */ >> int node_random(const nodemask_t *maskp) >> { >> -int w, bit = -1; >> +int w, bit = NUMA_NO_NODE; >> >> w = nodes_weight(*maskp); >> if (w) >>
Re: [PATCH] mm/mempolicy: use NUMA_NO_NODE
On 2013/9/17 0:19, KOSAKI Motohiro wrote: > (9/16/13 8:53 AM), Jianguo Wu wrote: >> Use more appropriate NUMA_NO_NODE instead of -1 >> >> Signed-off-by: Jianguo Wu >> --- >> mm/mempolicy.c | 10 +++++----- >> 1 files changed, 5 insertions(+), 5 deletions(-) > > I think this patch don't make any functional change, right? > Yes. > Acked-by: KOSAKI Motohiro Thanks for your ack.
[PATCH] mm/mempolicy: use NUMA_NO_NODE
Use more appropriate NUMA_NO_NODE instead of -1 Signed-off-by: Jianguo Wu --- mm/mempolicy.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 4baf12e..4f73025 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1083,7 +1083,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, tmp = *from; while (!nodes_empty(tmp)) { int s,d; - int source = -1; + int source = NUMA_NO_NODE; int dest = 0; for_each_node_mask(s, tmp) { @@ -1118,7 +1118,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, if (!node_isset(dest, tmp)) break; } - if (source == -1) + if (source == NUMA_NO_NODE) break; node_clear(source, tmp); @@ -1765,7 +1765,7 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned nnodes = nodes_weight(pol->v.nodes); unsigned target; int c; - int nid = -1; + int nid = NUMA_NO_NODE; if (!nnodes) return numa_node_id(); @@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy *pol, /* * Return the bit number of a random bit set in the nodemask. - * (returns -1 if nodemask is empty) + * (returns NUMA_NO_NOD if nodemask is empty) */ int node_random(const nodemask_t *maskp) { - int w, bit = -1; + int w, bit = NUMA_NO_NODE; w = nodes_weight(*maskp); if (w) -- 1.7.1
[PATCH] mm/ksm: return NULL when doesn't get mergeable page
In get_mergeable_page() local variable page is not initialized, it may hold a garbage value, when find_mergeable_vma() return NULL, get_mergeable_page() may return a garbage value to the caller. So initialize page as NULL. Signed-off-by: Jianguo Wu --- mm/ksm.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index b6afe0c..87efbae 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -460,7 +460,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item) struct mm_struct *mm = rmap_item->mm; unsigned long addr = rmap_item->address; struct vm_area_struct *vma; - struct page *page; + struct page *page = NULL; down_read(&mm->mmap_sem); vma = find_mergeable_vma(mm, addr); -- 1.7.1
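The hazard this patch defends against is the classic one of a local that is only assigned on some paths. A toy illustration (hypothetical code, not mm/ksm.c — and as Petr's review in this thread points out, ksm's "out" label already assigns NULL, so the real function was safe):

```c
#include <stddef.h>

struct item { int mergeable; };

/* If `result` were left uninitialized, the early-return path would
 * hand back stack garbage; initializing it to NULL is the defensive
 * pattern the patch applies to get_mergeable_page(). */
const char *lookup(const struct item *it)
{
    const char *result = NULL;   /* the defensive init */

    if (it && it->mergeable)
        result = "merged";       /* the only assigning path */

    return result;               /* without the init: garbage here */
}
```

Compilers often warn about this (`-Wmaybe-uninitialized`), but only when they can prove the unassigned path exists.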
[RESEND PATCH] mm/mempolicy: use NUMA_NO_NODE
Use more appropriate NUMA_NO_NODE instead of -1

Signed-off-by: Jianguo Wu <wujian...@huawei.com>
Acked-by: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
---
 mm/mempolicy.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4baf12e..4f0cd20 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1083,7 +1083,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 	tmp = *from;
 	while (!nodes_empty(tmp)) {
 		int s,d;
-		int source = -1;
+		int source = NUMA_NO_NODE;
 		int dest = 0;

 		for_each_node_mask(s, tmp) {
@@ -1118,7 +1118,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 			if (!node_isset(dest, tmp))
 				break;
 		}
-		if (source == -1)
+		if (source == NUMA_NO_NODE)
 			break;

 		node_clear(source, tmp);
@@ -1765,7 +1765,7 @@ static unsigned offset_il_node(struct mempolicy *pol,
 	unsigned nnodes = nodes_weight(pol->v.nodes);
 	unsigned target;
 	int c;
-	int nid = -1;
+	int nid = NUMA_NO_NODE;

 	if (!nnodes)
 		return numa_node_id();
@@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
 /*
  * Return the bit number of a random bit set in the nodemask.
- * (returns -1 if nodemask is empty)
+ * (returns NUMA_NO_NODE if nodemask is empty)
  */
 int node_random(const nodemask_t *maskp)
 {
-	int w, bit = -1;
+	int w, bit = NUMA_NO_NODE;

 	w = nodes_weight(*maskp);
 	if (w)
--
1.7.1
Re: [PATCH v3 1/5] memblock: Introduce allocation direction to memblock.
Hi Tang,

On 2013/9/13 17:30, Tang Chen wrote:
> The Linux kernel cannot migrate pages used by the kernel. As a result, kernel
> pages cannot be hot-removed. So we cannot allocate hotpluggable memory for
> the kernel.
>
> ACPI SRAT (System Resource Affinity Table) contains the memory hotplug info.
> But before SRAT is parsed, memblock has already started to allocate memory
> for the kernel. So we need to prevent memblock from doing this.
>
> In a memory hotplug system, any numa node the kernel resides in should
> be unhotpluggable. And for a modern server, each node could have at least
> 16GB memory. So memory around the kernel image is highly likely
> unhotpluggable.
>
> So the basic idea is: Allocate memory from the end of the kernel image and
> to the higher memory. Since memory allocation before SRAT is parsed won't
> be too much, it could highly likely be in the same node with kernel image.
>
> The current memblock can only allocate memory from high address to low.
> So this patch introduces the allocation direction to memblock. It could be
> used to tell memblock to allocate memory from high to low or from low
> to high.
>
> Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
> Reviewed-by: Zhang Yanfei <zhangyan...@cn.fujitsu.com>
> ---
>  include/linux/memblock.h |   22 ++
>  mm/memblock.c            |   13 +
>  2 files changed, 35 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 31e95ac..a7d3436 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -19,6 +19,11 @@
>
>  #define INIT_MEMBLOCK_REGIONS	128
>
> +/* Allocation order. */

s/order/direction/

> +#define MEMBLOCK_DIRECTION_HIGH_TO_LOW	0
> +#define MEMBLOCK_DIRECTION_LOW_TO_HIGH	1
> +#define MEMBLOCK_DIRECTION_DEFAULT	MEMBLOCK_DIRECTION_HIGH_TO_LOW
> +
>  struct memblock_region {
>  	phys_addr_t base;
>  	phys_addr_t size;
> @@ -35,6 +40,7 @@ struct memblock_type {
>  };
>
>  struct memblock {
> +	int current_direction;	/* allocate from higher or lower address */
>  	phys_addr_t current_limit;
>  	struct memblock_type memory;
>  	struct memblock_type reserved;
> @@ -148,6 +154,12 @@ phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
>
>  phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
>
> +static inline bool memblock_direction_bottom_up(void)
> +{
> +	return memblock.current_direction == MEMBLOCK_DIRECTION_LOW_TO_HIGH;
> +}
> +
> +
>  /* Flags for memblock_alloc_base() amd __memblock_alloc_base() */
>  #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
>  #define MEMBLOCK_ALLOC_ACCESSIBLE	0
> @@ -175,6 +187,16 @@ static inline void memblock_dump_all(void)
>  }
>
>  /**
> + * memblock_set_current_direction - Set current allocation direction to allow
> + *                                  allocating memory from higher to lower
> + *                                  address or from lower to higher address
> + *
> + * @direction: In which order to allocate memory. Could be
> + *             MEMBLOCK_DIRECTION_{HIGH_TO_LOW|LOW_TO_HIGH}
> + */

s/order/direction/

> +void memblock_set_current_direction(int direction);
> +
> +/**
>   * memblock_set_current_limit - Set the current allocation limit to allow
>   *                              limiting allocations to what is currently
>   *                              accessible during boot
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 0ac412a..f24ca2e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -32,6 +32,7 @@ struct memblock memblock __initdata_memblock = {
>  	.reserved.cnt		= 1,	/* empty dummy entry */
>  	.reserved.max		= INIT_MEMBLOCK_REGIONS,
>
> +	.current_direction	= MEMBLOCK_DIRECTION_DEFAULT,
>  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
> };
>
> @@ -995,6 +996,18 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
>  	}
>  }
>
> +void __init_memblock memblock_set_current_direction(int direction)
> +{
> +	if (direction != MEMBLOCK_DIRECTION_HIGH_TO_LOW &&
> +	    direction != MEMBLOCK_DIRECTION_LOW_TO_HIGH) {
> +		pr_warn("memblock: Failed to set allocation order. "
> +			"Invalid order type: %d\n", direction);

s/order/direction/

> +		return;
> +	}
> +
> +	memblock.current_direction = direction;
> +}
> +
>  void __init_memblock memblock_set_current_limit(phys_addr_t limit)
>  {
>  	memblock.current_limit = limit;
Re: [PATCH v2] mm/thp: fix stale comments of transparent_hugepage_flags
Hi Wanpeng,

Thanks for your review, but this patch has a minor format problem, please see below.
Please review the resend one, thanks.

Thanks,
Jianguo Wu

On 2013/9/5 16:09, Wanpeng Li wrote:
> On Thu, Sep 05, 2013 at 03:57:47PM +0800, Jianguo Wu wrote:
>> Changelog:
>> *v1 -> v2: also update the stale comments about default transparent
>> hugepage support pointed by Wanpeng Li.
>>
>> Since commit 13ece886d9 (thp: transparent hugepage config choice),
>> transparent hugepage support is disabled by default, and
>> TRANSPARENT_HUGEPAGE_ALWAYS is configured when TRANSPARENT_HUGEPAGE=y.
>>
>> And since commit d39d33c332 (thp: enable direct defrag), defrag is
>> enabled for all transparent hugepage page faults by default, not only in
>> MADV_HUGEPAGE regions.
>
> Reviewed-by: Wanpeng Li <liw...@linux.vnet.ibm.com>
>
>> Signed-off-by: Jianguo Wu <wujian...@huawei.com>
>> ---
>>  mm/huge_memory.c |   12 ++--
>>  1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a92012a..0e42a70 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -26,12 +26,12 @@
>>  #include <asm/pgalloc.h>
>>  #include "internal.h"
>>
>> -/*
>> - * By default transparent hugepage support is enabled for all mappings
>> - * and khugepaged scans all mappings. Defrag is only invoked by
>> - * khugepaged hugepage allocations and by page faults inside
>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>> - * allocations.
>> +/* By default transparent hugepage support is disabled in order that avoid

Should be:

+/*
+ * By default transparent hugepage support is disabled in order that avoid

Please review the resend one. Thanks.

>> + * to risk increase the memory footprint of applications without a guaranteed
>> + * benefit. When transparent hugepage support is enabled, is for all mappings,
>> + * and khugepaged scans all mappings.
>> + * Defrag is invoked by khugepaged hugepage allocations and by page faults
>> + * for all hugepage allocations.
>>   */
>>  unsigned long transparent_hugepage_flags __read_mostly =
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>> --
>> 1.7.1
[PATCH v2][RESEND] mm/thp: fix stale comments of transparent_hugepage_flags
Changelog:
*v1 -> v2: also update the stale comments about default transparent
hugepage support pointed by Wanpeng Li.

Since commit 13ece886d9 (thp: transparent hugepage config choice),
transparent hugepage support is disabled by default, and
TRANSPARENT_HUGEPAGE_ALWAYS is configured when TRANSPARENT_HUGEPAGE=y.

And since commit d39d33c332 (thp: enable direct defrag), defrag is
enabled for all transparent hugepage page faults by default, not only in
MADV_HUGEPAGE regions.

Signed-off-by: Jianguo Wu <wujian...@huawei.com>
---
 mm/huge_memory.c |   11 ++-
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..90ce6de 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -27,11 +27,12 @@
 #include "internal.h"

 /*
- * By default transparent hugepage support is enabled for all mappings
- * and khugepaged scans all mappings. Defrag is only invoked by
- * khugepaged hugepage allocations and by page faults inside
- * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
- * allocations.
+ * By default transparent hugepage support is disabled in order that avoid
+ * to risk increase the memory footprint of applications without a guaranteed
+ * benefit. When transparent hugepage support is enabled, is for all mappings,
+ * and khugepaged scans all mappings.
+ * Defrag is invoked by khugepaged hugepage allocations and by page faults
+ * for all hugepage allocations.
  */
 unsigned long transparent_hugepage_flags __read_mostly =
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
--
1.7.1
[PATCH v2] mm/thp: fix stale comments of transparent_hugepage_flags
Changelog:
*v1 -> v2: also update the stale comments about default transparent
hugepage support pointed by Wanpeng Li.

Since commit 13ece886d9 (thp: transparent hugepage config choice),
transparent hugepage support is disabled by default, and
TRANSPARENT_HUGEPAGE_ALWAYS is configured when TRANSPARENT_HUGEPAGE=y.

And since commit d39d33c332 (thp: enable direct defrag), defrag is
enabled for all transparent hugepage page faults by default, not only in
MADV_HUGEPAGE regions.

Signed-off-by: Jianguo Wu <wujian...@huawei.com>
---
 mm/huge_memory.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..0e42a70 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -26,12 +26,12 @@
 #include <asm/pgalloc.h>
 #include "internal.h"

-/*
- * By default transparent hugepage support is enabled for all mappings
- * and khugepaged scans all mappings. Defrag is only invoked by
- * khugepaged hugepage allocations and by page faults inside
- * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
- * allocations.
+/* By default transparent hugepage support is disabled in order that avoid
+ * to risk increase the memory footprint of applications without a guaranteed
+ * benefit. When transparent hugepage support is enabled, is for all mappings,
+ * and khugepaged scans all mappings.
+ * Defrag is invoked by khugepaged hugepage allocations and by page faults
+ * for all hugepage allocations.
  */
 unsigned long transparent_hugepage_flags __read_mostly =
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
--
1.7.1
Re: [PATCH] mm/thp: fix comments in transparent_hugepage_flags
On 2013/9/5 12:58, Wanpeng Li wrote:
> Hi Jianguo,
> On Thu, Sep 05, 2013 at 11:54:00AM +0800, Jianguo Wu wrote:
>> On 2013/9/5 11:37, Wanpeng Li wrote:
>>> On Thu, Sep 05, 2013 at 11:04:22AM +0800, Jianguo Wu wrote:
>>>> Hi Wanpeng,
>>>>
>>>> On 2013/9/5 10:11, Wanpeng Li wrote:
>>>>> Hi Jianguo,
>>>>> On Wed, Sep 04, 2013 at 09:30:22PM +0800, Jianguo Wu wrote:
>>>>>> Since commit d39d33c332 (thp: enable direct defrag), defrag is enabled
>>>>>> for all transparent hugepage page faults by default, not only in
>>>>>> MADV_HUGEPAGE regions.
>>>>>>
>>>>>> Signed-off-by: Jianguo Wu <wujian...@huawei.com>
>>>>>> ---
>>>>>>  mm/huge_memory.c | 6 ++----
>>>>>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index a92012a..abf047e 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -28,10 +28,8 @@
>>>>>>
>>>>>>  /*
>>>>>>   * By default transparent hugepage support is enabled for all mappings
>>>>>
>>>>> This is also stale. TRANSPARENT_HUGEPAGE_ALWAYS is not configured by
>>>>> default in order that avoid to risk increase the memory footprint of
>>>>> applications w/o a guaranteed benefit.
>>>>
>>>> Right, how about this:
>>>>
>>>> By default transparent hugepage support is disabled in order that avoid to risk
>>>
>>> I don't think it's disabled. TRANSPARENT_HUGEPAGE_MADVISE is configured
>>> by default.
>>
>> Hi Wanpeng,
>>
>> We have TRANSPARENT_HUGEPAGE and
>> TRANSPARENT_HUGEPAGE_ALWAYS/TRANSPARENT_HUGEPAGE_MADVISE,
>> TRANSPARENT_HUGEPAGE_ALWAYS or TRANSPARENT_HUGEPAGE_MADVISE is configured
>> only if TRANSPARENT_HUGEPAGE is configured.
>>
>> By default, TRANSPARENT_HUGEPAGE=n, and TRANSPARENT_HUGEPAGE_ALWAYS is
>> configured when TRANSPARENT_HUGEPAGE=y.
>>
>> commit 13ece886d9 (thp: transparent hugepage config choice):
>>
>>  config TRANSPARENT_HUGEPAGE
>> -	bool "Transparent Hugepage Support" if EMBEDDED
>> +	bool "Transparent Hugepage Support"
>>  	depends on X86 && MMU
>> -	default y
>>
>> +choice
>> +	prompt "Transparent Hugepage Support sysfs defaults"
>> +	depends on TRANSPARENT_HUGEPAGE
>> +	default TRANSPARENT_HUGEPAGE_ALWAYS
>
> mmotm tree:
>
> grep 'TRANSPARENT_HUGEPAGE' .config
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
> CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
>
> distro:
>
> grep 'TRANSPARENT_HUGEPAGE' config-3.8.0-26-generic
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
> CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y

Hi Wanpeng,

I'm a little confused, at mm/Kconfig, TRANSPARENT_HUGEPAGE is not configured
by default. And in x86_64, linus tree:

$ make defconfig
$ grep 'TRANSPARENT_HUGEPAGE' .config
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set

Do I misunderstand something here?

Thanks

>> Thanks,
>> Jianguo Wu
>>
>>> Regards,
>>> Wanpeng Li
>>>
>>>> increase the memory footprint of applications w/o a guaranteed benefit, and
>>>> khugepaged scans all mappings when transparent hugepage enabled.
>>>> Defrag is invoked by khugepaged hugepage allocations and by page faults for all
>>>> hugepage allocations.
>>>>
>>>> Thanks,
>>>> Jianguo Wu
>>>>
>>>>> Regards,
>>>>> Wanpeng Li
>>>>>
>>>>>> - * and khugepaged scans all mappings. Defrag is only invoked by
>>>>>> - * khugepaged hugepage allocations and by page faults inside
>>>>>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>>>>>> - * allocations.
>>>>>> + * and khugepaged scans all mappings. Defrag is invoked by khugepaged
>>>>>> + * hugepage allocations and by page faults for all hugepage allocations.
>>>>>>   */
>>>>>>  unsigned long transparent_hugepage_flags __read_mostly =
>>>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>>>>>> --
>>>>>> 1.8.1.2
Re: [PATCH] mm/thp: fix comments in transparent_hugepage_flags
On 2013/9/5 11:37, Wanpeng Li wrote:
> On Thu, Sep 05, 2013 at 11:04:22AM +0800, Jianguo Wu wrote:
>> Hi Wanpeng,
>>
>> On 2013/9/5 10:11, Wanpeng Li wrote:
>>
>>> Hi Jianguo,
>>> On Wed, Sep 04, 2013 at 09:30:22PM +0800, Jianguo Wu wrote:
>>>> Since commit d39d33c332(thp: enable direct defrag), defrag is enable
>>>> for all transparent hugepage page faults by default, not only in
>>>> MADV_HUGEPAGE regions.
>>>>
>>>> Signed-off-by: Jianguo Wu
>>>> ---
>>>> mm/huge_memory.c | 6 ++
>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index a92012a..abf047e 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -28,10 +28,8 @@
>>>>
>>>> /*
>>>> * By default transparent hugepage support is enabled for all mappings
>>>
>>> This is also stale. TRANSPARENT_HUGEPAGE_ALWAYS is not configured by default in
>>> order that avoid to risk increase the memory footprint of applications w/o a
>>> guaranteed benefit.
>>>
>>
>> Right, how about this:
>>
>> By default transparent hugepage support is disabled in order that avoid to risk
>
> I don't think it's disabled. TRANSPARENT_HUGEPAGE_MADVISE is configured
> by default.
>

Hi Wanpeng,

We have TRANSPARENT_HUGEPAGE and TRANSPARENT_HUGEPAGE_ALWAYS/TRANSPARENT_HUGEPAGE_MADVISE;
TRANSPARENT_HUGEPAGE_ALWAYS or TRANSPARENT_HUGEPAGE_MADVISE is configured only if
TRANSPARENT_HUGEPAGE is configured.
By default, TRANSPARENT_HUGEPAGE=n, and TRANSPARENT_HUGEPAGE_ALWAYS is configured
when TRANSPARENT_HUGEPAGE=y.

commit 13ece886d9 (thp: transparent hugepage config choice):

 config TRANSPARENT_HUGEPAGE
-	bool "Transparent Hugepage Support" if EMBEDDED
+	bool "Transparent Hugepage Support"
 	depends on X86 && MMU
-	default y

+choice
+	prompt "Transparent Hugepage Support sysfs defaults"
+	depends on TRANSPARENT_HUGEPAGE
+	default TRANSPARENT_HUGEPAGE_ALWAYS

Thanks,
Jianguo Wu

> Regards,
> Wanpeng Li
>
>> increase the memory footprint of applications w/o a guaranteed benefit, and
>> khugepaged scans all mappings when transparent hugepage enabled.
>> Defrag is invoked by khugepaged hugepage allocations and by page faults for all
>> hugepage allocations.
>>
>> Thanks,
>> Jianguo Wu
>>
>>> Regards,
>>> Wanpeng Li
>>>
>>>> - * and khugepaged scans all mappings. Defrag is only invoked by
>>>> - * khugepaged hugepage allocations and by page faults inside
>>>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>>>> - * allocations.
>>>> + * and khugepaged scans all mappings. Defrag is invoked by khugepaged
>>>> + * hugepage allocations and by page faults for all hugepage allocations.
>>>> */
>>>> unsigned long transparent_hugepage_flags __read_mostly =
>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>>>> --
>>>> 1.8.1.2
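For readers outside the kernel tree, the relationship between the Kconfig choice above and the runtime default can be sketched in plain userspace C. This is an illustrative model, not kernel code: the enum constants mirror the kernel's flag names, but compiling with no CONFIG_* macro defined here simply models TRANSPARENT_HUGEPAGE=n.

```c
#include <assert.h>

/* Flag bit positions, named after the kernel's constants. */
enum {
    TRANSPARENT_HUGEPAGE_FLAG,          /* sysfs default "always"  */
    TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, /* sysfs default "madvise" */
};

/* The default value of transparent_hugepage_flags follows the Kconfig
 * choice; with THP not configured at all, no flag is set. */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
unsigned long transparent_hugepage_flags = 1UL << TRANSPARENT_HUGEPAGE_FLAG;
#elif defined(CONFIG_TRANSPARENT_HUGEPAGE_MADVISE)
unsigned long transparent_hugepage_flags = 1UL << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG;
#else
unsigned long transparent_hugepage_flags = 0; /* TRANSPARENT_HUGEPAGE=n */
#endif
```

This matches the point made in the thread: the `choice` block only exists under TRANSPARENT_HUGEPAGE, so "ALWAYS is the default" is conditional on THP being configured at all.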
Re: [PATCH] mm/thp: fix comments in transparent_hugepage_flags
Hi Wanpeng,

On 2013/9/5 10:11, Wanpeng Li wrote:
> Hi Jianguo,
> On Wed, Sep 04, 2013 at 09:30:22PM +0800, Jianguo Wu wrote:
>> Since commit d39d33c332(thp: enable direct defrag), defrag is enable
>> for all transparent hugepage page faults by default, not only in
>> MADV_HUGEPAGE regions.
>>
>> Signed-off-by: Jianguo Wu
>> ---
>> mm/huge_memory.c | 6 ++
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a92012a..abf047e 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -28,10 +28,8 @@
>>
>> /*
>> * By default transparent hugepage support is enabled for all mappings
>
> This is also stale. TRANSPARENT_HUGEPAGE_ALWAYS is not configured by default in
> order that avoid to risk increase the memory footprint of applications w/o a
> guaranteed benefit.
>

Right, how about this:

By default transparent hugepage support is disabled in order that avoid to risk
increase the memory footprint of applications w/o a guaranteed benefit, and
khugepaged scans all mappings when transparent hugepage enabled.
Defrag is invoked by khugepaged hugepage allocations and by page faults for all
hugepage allocations.

Thanks,
Jianguo Wu

> Regards,
> Wanpeng Li
>
>> - * and khugepaged scans all mappings. Defrag is only invoked by
>> - * khugepaged hugepage allocations and by page faults inside
>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>> - * allocations.
>> + * and khugepaged scans all mappings. Defrag is invoked by khugepaged
>> + * hugepage allocations and by page faults for all hugepage allocations.
>> */
>> unsigned long transparent_hugepage_flags __read_mostly =
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>> --
>> 1.8.1.2
[PATCH] mm/thp: fix comments in transparent_hugepage_flags
Since commit d39d33c332 (thp: enable direct defrag), defrag is enabled for all transparent hugepage page faults by default, not only in MADV_HUGEPAGE regions.

Signed-off-by: Jianguo Wu
---
 mm/huge_memory.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..abf047e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -28,10 +28,8 @@

 /*
  * By default transparent hugepage support is enabled for all mappings
- * and khugepaged scans all mappings. Defrag is only invoked by
- * khugepaged hugepage allocations and by page faults inside
- * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
- * allocations.
+ * and khugepaged scans all mappings. Defrag is invoked by khugepaged
+ * hugepage allocations and by page faults for all hugepage allocations.
  */
 unsigned long transparent_hugepage_flags __read_mostly =
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
--
1.8.1.2
Re: [PATCH 4/4] mm/arch: use NUMA_NO_NODE
Cc linux...@kvack.org

On 2013/8/30 10:06, Jianguo Wu wrote:
> Use more appropriate NUMA_NO_NODE instead of -1 in some archs' module_alloc()
>
> Signed-off-by: Jianguo Wu
> ---
> arch/arm/kernel/module.c    |    2 +-
> arch/arm64/kernel/module.c  |    2 +-
> arch/mips/kernel/module.c   |    2 +-
> arch/parisc/kernel/module.c |    2 +-
> arch/s390/kernel/module.c   |    2 +-
> arch/sparc/kernel/module.c  |    2 +-
> arch/x86/kernel/module.c    |    2 +-
> 7 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 85c3fb6..8f4cff3 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -40,7 +40,7 @@
> void *module_alloc(unsigned long size)
> {
> 	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> +				GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
> 				__builtin_return_address(0));
> }
> #endif
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index ca0e3d5..8f898bd 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -29,7 +29,7 @@
> void *module_alloc(unsigned long size)
> {
> 	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> +				GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
> 				__builtin_return_address(0));
> }
>
> diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
> index 977a623..b507e07 100644
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(dbe_lock);
> void *module_alloc(unsigned long size)
> {
> 	return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
> 				__builtin_return_address(0));
> }
> #endif
> diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
> index 2a625fb..50dfafc 100644
> --- a/arch/parisc/kernel/module.c
> +++ b/arch/parisc/kernel/module.c
> @@ -219,7 +219,7 @@ void *module_alloc(unsigned long size)
> 	 * init_data correctly */
> 	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> 				GFP_KERNEL | __GFP_HIGHMEM,
> -				PAGE_KERNEL_RWX, -1,
> +				PAGE_KERNEL_RWX, NUMA_NO_NODE,
> 				__builtin_return_address(0));
> }
>
> diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
> index 7845e15..b89b591 100644
> --- a/arch/s390/kernel/module.c
> +++ b/arch/s390/kernel/module.c
> @@ -50,7 +50,7 @@ void *module_alloc(unsigned long size)
> 	if (PAGE_ALIGN(size) > MODULES_LEN)
> 		return NULL;
> 	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
> 				__builtin_return_address(0));
> }
> #endif
> diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> index 4435488..97655e0 100644
> --- a/arch/sparc/kernel/module.c
> +++ b/arch/sparc/kernel/module.c
> @@ -29,7 +29,7 @@ static void *module_map(unsigned long size)
> 	if (PAGE_ALIGN(size) > MODULES_LEN)
> 		return NULL;
> 	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
> 				__builtin_return_address(0));
> }
> #else
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 216a4d7..18be189 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -49,7 +49,7 @@ void *module_alloc(unsigned long size)
> 		return NULL;
> 	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> 				GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
> -				-1, __builtin_return_address(0));
> +				NUMA_NO_NODE, __builtin_return_address(0));
> }
>
> #ifdef CONFIG_X86_32
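The change above is purely a readability cleanup: NUMA_NO_NODE is a named constant with the same value as the bare -1 it replaces, meaning "no node preference". A minimal userspace sketch of the pattern — the `resolve_node` helper is hypothetical, invented here only to show how such a sentinel is typically consumed, and is not a kernel function:

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)  /* same value as the bare -1 it replaces */

/* Hypothetical helper: NUMA_NO_NODE means "no preference", so fall
 * back to a caller-supplied default node; any other nid is honored. */
int resolve_node(int nid, int default_node)
{
    return (nid == NUMA_NO_NODE) ? default_node : nid;
}
```

Because the value is unchanged, the patch is behavior-neutral; only the call sites become self-documenting.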
Re: [PATCH 1/5] mm/vmalloc: use N_MEMORY instead of N_HIGH_MEMORY
On 2013/8/30 11:36, Jianguo Wu wrote:
> Since commit 8219fc48a (mm: node_states: introduce N_MEMORY),
> we introduced N_MEMORY, now N_MEMORY stands for the nodes that has any memory,
> and N_HIGH_MEMORY stands for the nodes that has normal or high memory.
>
> The code here need to handle with the nodes which have memory,
> we should use N_MEMORY instead.
>

As Michal pointed out in http://marc.info/?l=linux-kernel&m=137784852720861&w=2,
N_HIGH_MEMORY should be kept in these places, please ignore this series.
Sorry for the noise. Thanks.

> Signed-off-by: Jianguo Wu
> ---
> mm/vmalloc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 13a5495..1152947 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2573,7 +2573,7 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
> 	for (nr = 0; nr < v->nr_pages; nr++)
> 		counters[page_to_nid(v->pages[nr])]++;
>
> -	for_each_node_state(nr, N_HIGH_MEMORY)
> +	for_each_node_state(nr, N_MEMORY)
> 		if (counters[nr])
> 			seq_printf(m, " N%u=%u", nr, counters[nr]);
> }
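The distinction that sank this series can be shown with a toy model: N_HIGH_MEMORY is a subset of N_MEMORY, because a node whose memory is entirely movable counts as "has memory" but not as "has normal/high memory the kernel can allocate from". The bitmaps and node layout below are made-up example data, not kernel values:

```c
#include <assert.h>
#include <stdbool.h>

/* Example data: three nodes. Node 2's memory is all movable, so it is
 * in N_MEMORY ("has any memory") but not in N_HIGH_MEMORY ("has
 * normal or high memory usable for kernel allocations"). */
unsigned long n_memory_mask      = 0x7; /* nodes 0, 1, 2 */
unsigned long n_high_memory_mask = 0x3; /* nodes 0, 1    */

/* Model of node_state(): is node `nid` in the given state mask? */
bool model_node_state(int nid, unsigned long mask)
{
    return (mask >> nid) & 1UL;
}
```

This is why call sites that feed `vzalloc_node()` must keep N_HIGH_MEMORY, while call sites that merely iterate over populated nodes can use N_MEMORY.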
Re: [PATCH] mm/vmalloc: use helper function to get vmalloc area size
On 2013/8/30 16:49, Wanpeng Li wrote:
> On Fri, Aug 30, 2013 at 04:42:49PM +0800, Jianguo Wu wrote:
>> Use get_vm_area_size() to get vmalloc area's actual size without guard page.
>>
>
> Do you see this?
>
> http://marc.info/?l=linux-mm&m=137698172417316&w=2
>

Hi Wanpeng,

Sorry for not noticing your post, please ignore this patch. Thanks.

>> Signed-off-by: Jianguo Wu
>> ---
>> mm/vmalloc.c | 12 ++--
>> 1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 13a5495..abe13bc 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1263,7 +1263,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
>> int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
>> {
>> 	unsigned long addr = (unsigned long)area->addr;
>> -	unsigned long end = addr + area->size - PAGE_SIZE;
>> +	unsigned long end = addr + get_vm_area_size(area);
>> 	int err;
>>
>> 	err = vmap_page_range(addr, end, prot, *pages);
>> @@ -1558,7 +1558,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>> 	unsigned int nr_pages, array_size, i;
>> 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>>
>> -	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>> +	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>> 	array_size = (nr_pages * sizeof(struct page *));
>>
>> 	area->nr_pages = nr_pages;
>> @@ -1990,7 +1990,7 @@ long vread(char *buf, char *addr, unsigned long count)
>>
>> 	vm = va->vm;
>> 	vaddr = (char *) vm->addr;
>> -	if (addr >= vaddr + vm->size - PAGE_SIZE)
>> +	if (addr >= vaddr + get_vm_area_size(vm))
>> 		continue;
>> 	while (addr < vaddr) {
>> 		if (count == 0)
>> @@ -2000,7 +2000,7 @@ long vread(char *buf, char *addr, unsigned long count)
>> 		addr++;
>> 		count--;
>> 	}
>> -	n = vaddr + vm->size - PAGE_SIZE - addr;
>> +	n = vaddr + get_vm_area_size(vm) - addr;
>> 	if (n > count)
>> 		n = count;
>> 	if (!(vm->flags & VM_IOREMAP))
>> @@ -2072,7 +2072,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
>>
>> 	vm = va->vm;
>> 	vaddr = (char *) vm->addr;
>> -	if (addr >= vaddr + vm->size - PAGE_SIZE)
>> +	if (addr >= vaddr + get_vm_area_size(vm))
>> 		continue;
>> 	while (addr < vaddr) {
>> 		if (count == 0)
>> @@ -2081,7 +2081,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
>> 		addr++;
>> 		count--;
>> 	}
>> -	n = vaddr + vm->size - PAGE_SIZE - addr;
>> +	n = vaddr + get_vm_area_size(vm) - addr;
>> 	if (n > count)
>> 		n = count;
>> 	if (!(vm->flags & VM_IOREMAP)) {
>> --
>> 1.7.1
[PATCH] mm/vmalloc: use helper function to get vmalloc area size
Use get_vm_area_size() to get vmalloc area's actual size without guard page.

Signed-off-by: Jianguo Wu
---
 mm/vmalloc.c | 12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 13a5495..abe13bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1263,7 +1263,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 {
 	unsigned long addr = (unsigned long)area->addr;
-	unsigned long end = addr + area->size - PAGE_SIZE;
+	unsigned long end = addr + get_vm_area_size(area);
 	int err;

 	err = vmap_page_range(addr, end, prot, *pages);
@@ -1558,7 +1558,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;

-	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
+	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));

 	area->nr_pages = nr_pages;
@@ -1990,7 +1990,7 @@ long vread(char *buf, char *addr, unsigned long count)

 	vm = va->vm;
 	vaddr = (char *) vm->addr;
-	if (addr >= vaddr + vm->size - PAGE_SIZE)
+	if (addr >= vaddr + get_vm_area_size(vm))
 		continue;
 	while (addr < vaddr) {
 		if (count == 0)
@@ -2000,7 +2000,7 @@ long vread(char *buf, char *addr, unsigned long count)
 		addr++;
 		count--;
 	}
-	n = vaddr + vm->size - PAGE_SIZE - addr;
+	n = vaddr + get_vm_area_size(vm) - addr;
 	if (n > count)
 		n = count;
 	if (!(vm->flags & VM_IOREMAP))
@@ -2072,7 +2072,7 @@ long vwrite(char *buf, char *addr, unsigned long count)

 	vm = va->vm;
 	vaddr = (char *) vm->addr;
-	if (addr >= vaddr + vm->size - PAGE_SIZE)
+	if (addr >= vaddr + get_vm_area_size(vm))
 		continue;
 	while (addr < vaddr) {
 		if (count == 0)
@@ -2081,7 +2081,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 		addr++;
 		count--;
 	}
-	n = vaddr + vm->size - PAGE_SIZE - addr;
+	n = vaddr + get_vm_area_size(vm) - addr;
 	if (n > count)
 		n = count;
 	if (!(vm->flags & VM_IOREMAP)) {
--
1.7.1
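For readers unfamiliar with the helper this patch relies on: every vmalloc area carries a one-page guard area at its end, and `area->size` includes it, which is why the open-coded sites all subtract PAGE_SIZE. A standalone userspace sketch of the idea — the struct is reduced to the minimum and PAGE_SIZE is hard-coded here purely for illustration (the kernel's value is per-architecture):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* illustrative; per-arch in the real kernel */

/* Reduced model of struct vm_struct: `size` covers the usable mapping
 * plus the guard page appended after it. */
struct vm_struct {
    void *addr;
    unsigned long size;
};

/* Mirrors the kernel helper: usable size excludes the guard page. */
unsigned long get_vm_area_size(const struct vm_struct *area)
{
    return area->size - PAGE_SIZE;
}
```

Centralizing the subtraction in one helper means the "minus one guard page" convention lives in a single place instead of being repeated (and potentially forgotten) at every call site — which is exactly what this patch cleans up.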
Re: [PATCH 5/5] mm/cgroup: use N_MEMORY instead of N_HIGH_MEMORY
On 2013/8/30 15:41, Michal Hocko wrote:
> On Fri 30-08-13 11:44:57, Jianguo Wu wrote:
>> Since commit 8219fc48a (mm: node_states: introduce N_MEMORY),
>
> But this very same commit also says:
> "
> A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice:
>
> One is in page_cgroup_init(void):
> 	for_each_node_state(nid, N_HIGH_MEMORY) {
>
> It means if the node have memory, we will allocate page_cgroup map for
> the node. We should use N_MEMORY instead here to gaim more clearly.
>
> The second using is in alloc_page_cgroup():
> 	if (node_state(nid, N_HIGH_MEMORY))
> 		addr = vzalloc_node(size, nid);
>
> It means if the node has high or normal memory that can be allocated
> from kernel. We should keep N_HIGH_MEMORY here, and it will be better
> if the "any memory" semantic of N_HIGH_MEMORY is removed.
> "
>
> Which to me sounds like N_HIGH_MEMORY should be kept here. To be honest,

Hi Michal,

You are right: here we need normal or high memory, not movable memory, so N_HIGH_MEMORY should be kept, the same as in the other patches. Please drop this series. Thank you for pointing it out.

Thanks,
Jianguo Wu.

> the distinction is not entirely clear to me. It was supposed to make
> code cleaner but it apparently causes confusion.
>
> It would also help if you CCed Lai Jiangshan who has introduced this
> distinction. CCed now.
>
> I wasn't CCed on the rest of the series but if you do the same
> conversion, please make sure that this is not the case for others as
> well.
>
>> we introduced N_MEMORY, now N_MEMORY stands for the nodes that has any memory,
>> and N_HIGH_MEMORY stands for the nodes that has normal or high memory.
>>
>> The code here need to handle with the nodes which have memory,
>> we should use N_MEMORY instead.
>>
>> Signed-off-by: Xishi Qiu
>> ---
>> mm/page_cgroup.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
>> index 6d757e3..f6f7603 100644
>> --- a/mm/page_cgroup.c
>> +++ b/mm/page_cgroup.c
>> @@ -116,7 +116,7 @@ static void *__meminit alloc_page_cgroup(size_t size, int nid)
>> 		return addr;
>> 	}
>>
>> -	if (node_state(nid, N_HIGH_MEMORY))
>> +	if (node_state(nid, N_MEMORY))
>> 		addr = vzalloc_node(size, nid);
>> 	else
>> 		addr = vzalloc(size);
>> --
>> 1.7.1
[PATCH] mm/vmalloc: use help function to get vmalloc area size
Use get_vm_area_size() to get a vmalloc area's actual size without the guard page.

Signed-off-by: Jianguo Wu <wujian...@huawei.com>
---
 mm/vmalloc.c | 12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 13a5495..abe13bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1263,7 +1263,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 {
 	unsigned long addr = (unsigned long)area->addr;
-	unsigned long end = addr + area->size - PAGE_SIZE;
+	unsigned long end = addr + get_vm_area_size(area);
 	int err;
 
 	err = vmap_page_range(addr, end, prot, *pages);
@@ -1558,7 +1558,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 
-	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
+	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
 	area->nr_pages = nr_pages;
@@ -1990,7 +1990,7 @@ long vread(char *buf, char *addr, unsigned long count)
 		vm = va->vm;
 		vaddr = (char *) vm->addr;
-		if (addr >= vaddr + vm->size - PAGE_SIZE)
+		if (addr >= vaddr + get_vm_area_size(vm))
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -2000,7 +2000,7 @@ long vread(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + vm->size - PAGE_SIZE - addr;
+		n = vaddr + get_vm_area_size(vm) - addr;
 		if (n > count)
 			n = count;
 		if (!(vm->flags & VM_IOREMAP))
@@ -2072,7 +2072,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 		vm = va->vm;
 		vaddr = (char *) vm->addr;
-		if (addr >= vaddr + vm->size - PAGE_SIZE)
+		if (addr >= vaddr + get_vm_area_size(vm))
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -2081,7 +2081,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + vm->size - PAGE_SIZE - addr;
+		n = vaddr + get_vm_area_size(vm) - addr;
 		if (n > count)
 			n = count;
 		if (!(vm->flags & VM_IOREMAP)) {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
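For readers outside the kernel tree: the helper this patch switches to simply subtracts the trailing guard page from the area's recorded size. A minimal userspace sketch of that arithmetic (assuming 4 KiB pages, with `struct vm_struct` reduced to the two fields used here):

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Reduced model of struct vm_struct: the recorded size of a vmalloc
 * area includes one trailing guard page. */
struct vm_struct {
	void *addr;
	unsigned long size;	/* usable size plus one guard page */
};

/* Mirrors get_vm_area_size(): the usable size is the recorded size
 * minus the guard page, which is exactly what the open-coded
 * `area->size - PAGE_SIZE` expressions in the diff computed. */
static unsigned long get_vm_area_size(const struct vm_struct *area)
{
	return area->size - PAGE_SIZE;
}
```

So an area recorded as five pages reports four usable pages, matching each `- PAGE_SIZE` site the diff removes.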
Re: [PATCH] mm/vmalloc: use helper function to get vmalloc area size
On 2013/8/30 16:49, Wanpeng Li wrote:
> On Fri, Aug 30, 2013 at 04:42:49PM +0800, Jianguo Wu wrote:
>> Use get_vm_area_size() to get a vmalloc area's actual size without the guard page.
>
> Do you see this? http://marc.info/?l=linux-mm&m=137698172417316&w=2

Hi Wanpeng,

Sorry for not noticing your post, please ignore this patch.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org
Re: [PATCH 1/5] mm/vmalloc: use N_MEMORY instead of N_HIGH_MEMORY
On 2013/8/30 11:36, Jianguo Wu wrote:
> Since commit 8219fc48a ("mm: node_states: introduce N_MEMORY"), we have
> N_MEMORY: N_MEMORY stands for the nodes that have any memory, while
> N_HIGH_MEMORY stands for the nodes that have normal or high memory.
>
> The code here needs to handle the nodes which have memory, so we should
> use N_MEMORY instead.

As Michal pointed out in
http://marc.info/?l=linux-kernel&m=137784852720861&w=2,
N_HIGH_MEMORY should be kept in these places, please ignore this series.
Sorry for the noise.

Thanks.

> Signed-off-by: Jianguo Wu <wujian...@huawei.com>
> ---
>  mm/vmalloc.c | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 13a5495..1152947 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2573,7 +2573,7 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
>  	for (nr = 0; nr < v->nr_pages; nr++)
>  		counters[page_to_nid(v->pages[nr])]++;
>
> -	for_each_node_state(nr, N_HIGH_MEMORY)
> +	for_each_node_state(nr, N_MEMORY)
>  		if (counters[nr])
>  			seq_printf(m, " N%u=%u", nr, counters[nr]);
>  }
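The distinction the objection hinges on can be modeled in plain C. This is a toy model, not the kernel's real nodemask_t machinery: the key point is that a node whose memory is entirely in ZONE_MOVABLE is counted by N_MEMORY but not by N_HIGH_MEMORY.

```c
/* Toy model of two kernel node states (not the real implementation):
 * N_HIGH_MEMORY: nodes with normal or high memory,
 * N_MEMORY:      nodes with any memory at all (including movable-only). */
enum node_states { N_HIGH_MEMORY, N_MEMORY };

struct node_info {
	int has_normal_or_high;	/* normal/highmem pages present */
	int has_movable_only;	/* only ZONE_MOVABLE pages present */
};

/* Stand-in for the kernel's node_state(nid, state) predicate. */
static int node_state(const struct node_info *n, enum node_states s)
{
	if (s == N_HIGH_MEMORY)
		return n->has_normal_or_high;
	/* N_MEMORY: any memory counts */
	return n->has_normal_or_high || n->has_movable_only;
}
```

Under this model the two states only disagree for movable-only nodes, which is why substituting one for the other is not always a no-op.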
Re: [PATCH 4/4] mm/arch: use NUMA_NO_NODE
Cc linux...@kvack.org

On 2013/8/30 10:06, Jianguo Wu wrote:
> Use the more appropriate NUMA_NO_NODE instead of -1 in some archs' module_alloc().
>
> Signed-off-by: Jianguo Wu <wujian...@huawei.com>
> ---
>  arch/arm/kernel/module.c    | 2 +-
>  arch/arm64/kernel/module.c  | 2 +-
>  arch/mips/kernel/module.c   | 2 +-
>  arch/parisc/kernel/module.c | 2 +-
>  arch/s390/kernel/module.c   | 2 +-
>  arch/sparc/kernel/module.c  | 2 +-
>  arch/x86/kernel/module.c    | 2 +-
>  7 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 85c3fb6..8f4cff3 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -40,7 +40,7 @@ void *module_alloc(unsigned long size)
>  {
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> +				GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index ca0e3d5..8f898bd 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -29,7 +29,7 @@ void *module_alloc(unsigned long size)
>  {
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> +				GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
> diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
> index 977a623..b507e07 100644
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(dbe_lock);
>  void *module_alloc(unsigned long size)
>  {
>  	return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
> index 2a625fb..50dfafc 100644
> --- a/arch/parisc/kernel/module.c
> +++ b/arch/parisc/kernel/module.c
> @@ -219,7 +219,7 @@ void *module_alloc(unsigned long size)
>  	 * init_data correctly */
>  	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
>  				    GFP_KERNEL | __GFP_HIGHMEM,
> -				    PAGE_KERNEL_RWX, -1,
> +				    PAGE_KERNEL_RWX, NUMA_NO_NODE,
>  				    __builtin_return_address(0));
>  }
> diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
> index 7845e15..b89b591 100644
> --- a/arch/s390/kernel/module.c
> +++ b/arch/s390/kernel/module.c
> @@ -50,7 +50,7 @@ void *module_alloc(unsigned long size)
>  	if (PAGE_ALIGN(size) > MODULES_LEN)
>  		return NULL;
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				    GFP_KERNEL, PAGE_KERNEL, -1,
> +				    GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>  				    __builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> index 4435488..97655e0 100644
> --- a/arch/sparc/kernel/module.c
> +++ b/arch/sparc/kernel/module.c
> @@ -29,7 +29,7 @@ static void *module_map(unsigned long size)
>  	if (PAGE_ALIGN(size) > MODULES_LEN)
>  		return NULL;
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
>  #else
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 216a4d7..18be189 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -49,7 +49,7 @@ void *module_alloc(unsigned long size)
>  		return NULL;
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
>  				GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
> -				-1, __builtin_return_address(0));
> +				NUMA_NO_NODE, __builtin_return_address(0));
>  }
>
>  #ifdef CONFIG_X86_32
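The change above is purely cosmetic: the kernel defines NUMA_NO_NODE as (-1), so every call site behaves identically before and after. A hedged userspace sketch of the idea (toy_alloc_node() is a hypothetical stand-in, not a real kernel API):

```c
#include <stdlib.h>

#define NUMA_NO_NODE (-1)	/* the same value the bare -1 spelled out */

/* Hypothetical allocator taking a node hint: NUMA_NO_NODE means "no
 * preference", while other negative values are treated as errors. */
static void *toy_alloc_node(size_t size, int node)
{
	if (node != NUMA_NO_NODE && node < 0)
		return NULL;	/* reject bogus negative node ids */
	return calloc(1, size);	/* node hint ignored in this sketch */
}
```

The named constant makes the "no preference" case visible at the call site, which a bare -1 does not.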
Re: [PATCH 5/5] mm/cgroup: use N_MEMORY instead of N_HIGH_MEMORY
On 2013/8/30 11:44, Jianguo Wu wrote:
> Since commit 8219fc48a ("mm: node_states: introduce N_MEMORY"),
> we have N_MEMORY: N_MEMORY stands for the nodes that have any memory,
> and N_HIGH_MEMORY stands for the nodes that have normal or high memory.
>
> The code here needs to handle the nodes which have memory,
> so we should use N_MEMORY instead.
>
> Signed-off-by: Xishi Qiu

Sorry, it should be "Signed-off-by: Jianguo Wu "

> ---
>  mm/page_cgroup.c | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 6d757e3..f6f7603 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -116,7 +116,7 @@ static void *__meminit alloc_page_cgroup(size_t size, int nid)
>  		return addr;
>  	}
>
> -	if (node_state(nid, N_HIGH_MEMORY))
> +	if (node_state(nid, N_MEMORY))
>  		addr = vzalloc_node(size, nid);
>  	else
>  		addr = vzalloc(size);
[PATCH 5/5] mm/cgroup: use N_MEMORY instead of N_HIGH_MEMORY
Since commit 8219fc48a ("mm: node_states: introduce N_MEMORY"),
we have N_MEMORY: N_MEMORY stands for the nodes that have any memory,
and N_HIGH_MEMORY stands for the nodes that have normal or high memory.

The code here needs to handle the nodes which have memory,
so we should use N_MEMORY instead.

Signed-off-by: Xishi Qiu
---
 mm/page_cgroup.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 6d757e3..f6f7603 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -116,7 +116,7 @@ static void *__meminit alloc_page_cgroup(size_t size, int nid)
 		return addr;
 	}
 
-	if (node_state(nid, N_HIGH_MEMORY))
+	if (node_state(nid, N_MEMORY))
 		addr = vzalloc_node(size, nid);
 	else
 		addr = vzalloc(size);
-- 
1.7.1
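The pattern the hunk touches, prefer a node-local allocation when the node has memory and otherwise allocate from anywhere, can be sketched in userspace C. This is an illustrative model only: node_has_memory() stands in for node_state(nid, N_MEMORY), and the node hint is simulated rather than honored.

```c
#include <stdlib.h>

/* Stand-in for node_state(nid, N_MEMORY): pretend only node 0 has memory. */
static int node_has_memory(int nid)
{
	return nid == 0;
}

/* Mirrors the alloc_page_cgroup() fallback: node-local when possible
 * (like vzalloc_node()), any node otherwise (like vzalloc()).
 * Reports which node was actually used via *used_nid. */
static void *alloc_pref_node(size_t size, int nid, int *used_nid)
{
	*used_nid = node_has_memory(nid) ? nid : 0;
	return calloc(1, size);	/* zeroed, like vzalloc()/vzalloc_node() */
}
```

Checking the node state first matters because asking for node-local memory on a memoryless node would otherwise fail or silently degrade.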
[PATCH 4/5] mm/ia64: use N_MEMORY instead of N_HIGH_MEMORY
Since commit 8219fc48a ("mm: node_states: introduce N_MEMORY"),
we have N_MEMORY: N_MEMORY stands for the nodes that have any memory,
and N_HIGH_MEMORY stands for the nodes that have normal or high memory.

The code here needs to handle the nodes which have memory,
so we should use N_MEMORY instead.

Signed-off-by: Jianguo Wu
---
 arch/ia64/kernel/uncached.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index a96bcf8..d2e5545 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -196,7 +196,7 @@ unsigned long uncached_alloc_page(int starting_nid, int n_pages)
 	nid = starting_nid;
 
 	do {
-		if (!node_state(nid, N_HIGH_MEMORY))
+		if (!node_state(nid, N_MEMORY))
 			continue;
 		uc_pool = &uncached_pools[nid];
 		if (uc_pool->pool == NULL)
-- 
1.7.1
[PATCH 3/5] mm/vmemmap: use N_MEMORY instead of N_HIGH_MEMORY
Since commit 8219fc48a ("mm: node_states: introduce N_MEMORY"),
we have N_MEMORY: N_MEMORY stands for the nodes that have any memory,
and N_HIGH_MEMORY stands for the nodes that have normal or high memory.

The code here needs to handle the nodes which have memory,
so we should use N_MEMORY instead.

Signed-off-by: Jianguo Wu
---
 mm/sparse-vmemmap.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 27eeab3..ca8f46b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -52,7 +52,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 	if (slab_is_available()) {
 		struct page *page;
 
-		if (node_state(node, N_HIGH_MEMORY))
+		if (node_state(node, N_MEMORY))
 			page = alloc_pages_node(
 				node, GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT,
 				get_order(size));
-- 
1.7.1