Re: [PATCH v3] ARM: mm: support big-endian page tables

2014-05-28 Thread Jianguo Wu
Hi Russell,
Could you please merge this into mainline? Thanks!

Jianguo Wu.

On 2014/4/24 10:51, Jianguo Wu wrote:

> On 2014/4/23 21:20, Will Deacon wrote:
> 
>> Hi Jianguo,
>>
>> On Thu, Apr 17, 2014 at 10:43:01AM +0100, Marc Zyngier wrote:
>>> On Thu, Apr 17 2014 at 10:31:37 am BST, Jianguo Wu <wujian...@huawei.com> 
>>> wrote:
>>>> When LPAE and big-endian are enabled on a HiSilicon board, and
>>>> mem=384M mem=512M@7680M is specified, we get a bad page state:
>>>>
>>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>>> BUG: Bad page state in process init  pfn:fa442
>>>> page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
>>>> page flags: 0x4400(reserved)
>>>> Modules linked in:
>>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>>> [] (unwind_backtrace+0x0/0x11c) from [] 
>>>> (show_stack+0x10/0x14)
>>>> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
>>>> [] (bad_page+0xd4/0x104) from [] 
>>>> (free_pages_prepare+0xa8/0x14c)
>>>> [] (free_pages_prepare+0xa8/0x14c) from [] 
>>>> (free_hot_cold_page+0x18/0xf0)
>>>> [] (free_hot_cold_page+0x18/0xf0) from [] 
>>>> (handle_pte_fault+0xcf4/0xdc8)
>>>> [] (handle_pte_fault+0xcf4/0xdc8) from [] 
>>>> (handle_mm_fault+0xf4/0x120)
>>>> [] (handle_mm_fault+0xf4/0x120) from [] 
>>>> (do_page_fault+0xfc/0x354)
>>>> [] (do_page_fault+0xfc/0x354) from [] 
>>>> (do_DataAbort+0x2c/0x90)
>>>> [] (do_DataAbort+0x2c/0x90) from [] 
>>>> (__dabt_usr+0x34/0x40)
>>
>>
>> [...]
>>
>> Please can you put this into Russell's patch system? You can also add my
>> ack:
>>
>>   Acked-by: Will Deacon <will.dea...@arm.com>
>>
>> You should also CC stable <sta...@vger.kernel.org> in the commit log.
>>
> 
> Hi Will,
> I have submitted it to 
> http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8037/1.
> 
> Thanks,
> Jianguo Wu.
> 
>> Cheers,
>>
>> Will
>>


Re: [PATCH v3] ARM: mm: support big-endian page tables

2014-04-23 Thread Jianguo Wu
On 2014/4/23 21:20, Will Deacon wrote:

> Hi Jianguo,
> 
> On Thu, Apr 17, 2014 at 10:43:01AM +0100, Marc Zyngier wrote:
>> On Thu, Apr 17 2014 at 10:31:37 am BST, Jianguo Wu <wujian...@huawei.com> 
>> wrote:
>>> When LPAE and big-endian are enabled on a HiSilicon board, and
>>> mem=384M mem=512M@7680M is specified, we get a bad page state:
>>>
>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>> BUG: Bad page state in process init  pfn:fa442
>>> page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
>>> page flags: 0x4400(reserved)
>>> Modules linked in:
>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>> [] (unwind_backtrace+0x0/0x11c) from [] 
>>> (show_stack+0x10/0x14)
>>> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
>>> [] (bad_page+0xd4/0x104) from [] 
>>> (free_pages_prepare+0xa8/0x14c)
>>> [] (free_pages_prepare+0xa8/0x14c) from [] 
>>> (free_hot_cold_page+0x18/0xf0)
>>> [] (free_hot_cold_page+0x18/0xf0) from [] 
>>> (handle_pte_fault+0xcf4/0xdc8)
>>> [] (handle_pte_fault+0xcf4/0xdc8) from [] 
>>> (handle_mm_fault+0xf4/0x120)
>>> [] (handle_mm_fault+0xf4/0x120) from [] 
>>> (do_page_fault+0xfc/0x354)
>>> [] (do_page_fault+0xfc/0x354) from [] 
>>> (do_DataAbort+0x2c/0x90)
>>> [] (do_DataAbort+0x2c/0x90) from [] 
>>> (__dabt_usr+0x34/0x40)
> 
> 
> [...]
> 
> Please can you put this into Russell's patch system? You can also add my
> ack:
> 
>   Acked-by: Will Deacon <will.dea...@arm.com>
> 
> You should also CC stable <sta...@vger.kernel.org> in the commit log.
> 

Hi Will,
> I have submitted it to 
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8037/1.

Thanks,
Jianguo Wu.

> Cheers,
> 
> Will
> 


[PATCH v3] ARM: mm: support big-endian page tables

2014-04-17 Thread Jianguo Wu
When LPAE and big-endian are enabled on a HiSilicon board, and
mem=384M mem=512M@7680M is specified, we get a bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x4400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[] (unwind_backtrace+0x0/0x11c) from [] 
(show_stack+0x10/0x14)
[] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
[] (bad_page+0xd4/0x104) from [] 
(free_pages_prepare+0xa8/0x14c)
[] (free_pages_prepare+0xa8/0x14c) from [] 
(free_hot_cold_page+0x18/0xf0)
[] (free_hot_cold_page+0x18/0xf0) from [] 
(handle_pte_fault+0xcf4/0xdc8)
[] (handle_pte_fault+0xcf4/0xdc8) from [] 
(handle_mm_fault+0xf4/0x120)
[] (handle_mm_fault+0xf4/0x120) from [] 
(do_page_fault+0xfc/0x354)
[] (do_page_fault+0xfc/0x354) from [] 
(do_DataAbort+0x2c/0x90)
[] (do_DataAbort+0x2c/0x90) from [] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not in system memory (mem=384M mem=512M@7680M). After
debugging, I found that the page fault handler reads back a wrong pfn from
the pte just after setting it, as follows:
do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));

	//read back the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
	...
}

pfn:   0x1fa4f5, pte:0xc1fa4f575f
new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f   //new pfn/pte is wrong.

The bug happens in cpu_v7_set_pte_ext(ptep, pte):
An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 
registers.
On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
On a BE kernel, the assignment is reversed.

Unfortunately, the current code always assumes the LE case,
leading to corruption of the PTE when clearing/setting bits.

This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.
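
(To make the register assignment concrete, here is a small endian-portable
user-space sketch; it is not part of the patch, and the pte value is the one
from the log above. Per the AAPCS a 64-bit argument is passed in a register
pair as if it had been loaded from memory with LDM, so r2 receives whichever
32-bit word sits first in memory.)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t pte = 0xc1fa4f575fULL;	/* pte value from the debug output */
	union {
		uint64_t v;
		uint32_t w[2];	/* w[0] plays the role of r2, w[1] of r3 */
	} u = { .v = pte };

	printf("r2 = 0x%08x, r3 = 0x%08x\n", u.w[0], u.w[1]);
	/*
	 * LE: r2 = 0xfa4f575f -- the low word, where L_PTE_VALID lives.
	 * BE: r2 = 0x000000c1 -- the high word, so "bicne r2, #L_PTE_VALID"
	 * clears bit 32 of the pte instead of bit 0: exactly the
	 * 0xc1... -> 0xc0... corruption visible in the log above.
	 */
	return 0;
}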

Signed-off-by: Jianguo Wu <wujian...@huawei.com>
Cc: sta...@vger.kernel.org
---
-v2: Refactoring code suggested by Ben Dooks.
-v3: Rewrite commit message suggested by Marc Zyngier.
---
 arch/arm/mm/proc-v7-3level.S |   18 +-
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 01a719e..22e3ad6 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
mov pc, lr
 ENDPROC(cpu_v7_switch_mm)
 
+#ifdef __ARMEB__
+#define rl r3
+#define rh r2
+#else
+#define rl r2
+#define rh r3
+#endif
+
 /*
  * cpu_v7_set_pte_ext(ptep, pte)
  *
@@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
-   tst r2, #L_PTE_VALID
+   tst rl, #L_PTE_VALID
beq 1f
-   tst r3, #1 << (57 - 32) @ L_PTE_NONE
-   bicne   r2, #L_PTE_VALID
+   tst rh, #1 << (57 - 32) @ L_PTE_NONE
+   bicne   rl, #L_PTE_VALID
bne 1f
-   tst r3, #1 << (55 - 32) @ L_PTE_DIRTY
-   orreq   r2, #L_PTE_RDONLY
+   tst rh, #1 << (55 - 32) @ L_PTE_DIRTY
+   orreq   rl, #L_PTE_RDONLY
 1:	strd	r2, r3, [r0]
ALT_SMP(W(nop))
ALT_UP (mcr p15, 0, r0, c7, c10, 1) @ flush_pte
-- 
1.7.1



Re: [PATCH v2] ARM: mm: support big-endian page tables

2014-04-16 Thread Jianguo Wu
On 2014/4/16 20:28, Marc Zyngier wrote:

> On 16/04/14 03:45, Jianguo Wu wrote:
>> On 2014/4/14 19:14, Marc Zyngier wrote:
>>
>>> On 14/04/14 11:43, Will Deacon wrote:
>>>> (catching up on old email)
>>>>
>>>> On Tue, Mar 18, 2014 at 07:35:59AM +, Jianguo Wu wrote:
>>>>> Could you please take a look at this?
>>>>
>>>> [...]
>>>>
>>>>> On 2014/2/17 15:05, Jianguo Wu wrote:
>>>>>> When LPAE and big-endian are enabled on a HiSilicon board, and
>>>>>> mem=384M mem=512M@7680M is specified, we get a bad page state:
>>>>>>
>>>>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>>>>> BUG: Bad page state in process init  pfn:fa442
>>>>>> page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
>>>>>> page flags: 0x4400(reserved)
>>>>>> Modules linked in:
>>>>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>>>>> [] (unwind_backtrace+0x0/0x11c) from [] 
>>>>>> (show_stack+0x10/0x14)
>>>>>> [] (show_stack+0x10/0x14) from [] 
>>>>>> (bad_page+0xd4/0x104)
>>>>>> [] (bad_page+0xd4/0x104) from [] 
>>>>>> (free_pages_prepare+0xa8/0x14c)
>>>>>> [] (free_pages_prepare+0xa8/0x14c) from [] 
>>>>>> (free_hot_cold_page+0x18/0xf0)
>>>>>> [] (free_hot_cold_page+0x18/0xf0) from [] 
>>>>>> (handle_pte_fault+0xcf4/0xdc8)
>>>>>> [] (handle_pte_fault+0xcf4/0xdc8) from [] 
>>>>>> (handle_mm_fault+0xf4/0x120)
>>>>>> [] (handle_mm_fault+0xf4/0x120) from [] 
>>>>>> (do_page_fault+0xfc/0x354)
>>>>>> [] (do_page_fault+0xfc/0x354) from [] 
>>>>>> (do_DataAbort+0x2c/0x90)
>>>>>> [] (do_DataAbort+0x2c/0x90) from [] 
>>>>>> (__dabt_usr+0x34/0x40)
>>>>
>>>> [...]
>>>>
>>>>>> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
>>>>>> when the pte is 64-bit, little-endian stores the low 32 bits in r2
>>>>>> and the high 32 bits in r3, while big-endian stores the low 32 bits
>>>>>> in r3 and the high 32 bits in r2. This causes a wrong pfn to be
>>>>>> stored in the pte, so we should exchange r2 and r3 for big-endian.
>>>>
>>
>> Hi Marc,
>> How about this:
>>
>> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
>> - It tests L_PTE_NONE in one word and possibly clears L_PTE_VALID in
>>   the other:
>>   tst    r3, #1 << (57 - 32)   @ L_PTE_NONE
>>   bicne  r2, #L_PTE_VALID
>> - Same for L_PTE_DIRTY, respectively setting L_PTE_RDONLY
>>
>> As for LPAE, the pte is 64 bits, and the contents of r2/r3 depend on the
>> endianness: little-endian stores the low 32 bits in r2 and the high 32
>> bits in r3, while big-endian stores the low 32 bits in r3 and the high
>> 32 bits in r2. This causes the wrong bit to be cleared or set, and a
>> wrong pfn to be read back. So we should exchange r2 and r3 for
>> big-endian.
> 
> May I suggest the following instead:
> 
> "An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the
>  r2 and r3 registers.
>  On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
>  On a BE kernel, the assignment is reversed.
> 
>  Unfortunately, the current code always assumes the LE case,
>  leading to corruption of the PTE when clearing/setting bits.
> 
>  This patch fixes this issue much like it has been done already in the
>  cpu_v7_switch_mm case."
> 

OK, I will send a new version, thanks!

> Cheers,
> 
>   M.



Re: [PATCH v2] ARM: mm: support big-endian page tables

2014-04-15 Thread Jianguo Wu
On 2014/4/14 19:14, Marc Zyngier wrote:

> On 14/04/14 11:43, Will Deacon wrote:
>> (catching up on old email)
>>
>> On Tue, Mar 18, 2014 at 07:35:59AM +, Jianguo Wu wrote:
>>> Could you please take a look at this?
>>
>> [...]
>>
>>> On 2014/2/17 15:05, Jianguo Wu wrote:
>>>> When LPAE and big-endian are enabled on a HiSilicon board, and
>>>> mem=384M mem=512M@7680M is specified, we get a bad page state:
>>>>
>>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>>> BUG: Bad page state in process init  pfn:fa442
>>>> page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
>>>> page flags: 0x4400(reserved)
>>>> Modules linked in:
>>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>>> [] (unwind_backtrace+0x0/0x11c) from [] 
>>>> (show_stack+0x10/0x14)
>>>> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
>>>> [] (bad_page+0xd4/0x104) from [] 
>>>> (free_pages_prepare+0xa8/0x14c)
>>>> [] (free_pages_prepare+0xa8/0x14c) from [] 
>>>> (free_hot_cold_page+0x18/0xf0)
>>>> [] (free_hot_cold_page+0x18/0xf0) from [] 
>>>> (handle_pte_fault+0xcf4/0xdc8)
>>>> [] (handle_pte_fault+0xcf4/0xdc8) from [] 
>>>> (handle_mm_fault+0xf4/0x120)
>>>> [] (handle_mm_fault+0xf4/0x120) from [] 
>>>> (do_page_fault+0xfc/0x354)
>>>> [] (do_page_fault+0xfc/0x354) from [] 
>>>> (do_DataAbort+0x2c/0x90)
>>>> [] (do_DataAbort+0x2c/0x90) from [] 
>>>> (__dabt_usr+0x34/0x40)
>>
>> [...]
>>
>>>> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
>>>> when the pte is 64-bit, little-endian stores the low 32 bits in r2
>>>> and the high 32 bits in r3, while big-endian stores the low 32 bits
>>>> in r3 and the high 32 bits in r2. This causes a wrong pfn to be
>>>> stored in the pte, so we should exchange r2 and r3 for big-endian.
>>

Hi Marc,
How about this:

The bug happens in cpu_v7_set_pte_ext(ptep, pte):
- It tests L_PTE_NONE in one word and possibly clears L_PTE_VALID in the
  other:
  tst   r3, #1 << (57 - 32)     @ L_PTE_NONE
  bicne r2, #L_PTE_VALID
- Same for L_PTE_DIRTY, respectively setting L_PTE_RDONLY

As for LPAE, the pte is 64 bits, and the contents of r2/r3 depend on the
endianness: little-endian stores the low 32 bits in r2 and the high 32 bits
in r3, while big-endian stores the low 32 bits in r3 and the high 32 bits
in r2. This causes the wrong bit to be cleared or set, and a wrong pfn to
be read back. So we should exchange r2 and r3 for big-endian.

Thanks,
Jianguo Wu.

>> I believe that Marc (added to CC) has been running LPAE-enabled, big-endian
>> KVM guests without any issues, so it seems unlikely that we're storing the
>> PTEs backwards. Can you check the configuration of SCTLR.EE?
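
(For reference: SCTLR.EE is bit 25 of the ARMv7 System Control Register; it
selects the endianness of translation table walks and the data endianness
set on exception entry. A hypothetical kernel-mode helper to dump it, not
something posted in this thread, could look like:)

#include <linux/printk.h>
#include <linux/types.h>

static inline u32 read_sctlr(void)
{
	u32 val;

	/* SCTLR lives in cp15 c1: mrc p15, 0, <Rt>, c1, c0, 0 */
	asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r" (val));
	return val;
}

static void dump_sctlr_ee(void)
{
	u32 sctlr = read_sctlr();

	pr_info("SCTLR = 0x%08x, EE = %u\n", sctlr, (sctlr >> 25) & 1);
}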
> 
> So, for the record:
> 
> root@when-the-lie-s-so-big:~# cat /proc/cpuinfo 
> processor : 0
> model name: ARMv7 Processor rev 4 (v7b)
> Features  : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 
> idiva idivt vfpd32 lpae evtstrm 
> CPU implementer   : 0x41
> CPU architecture: 7
> CPU variant   : 0x0
> CPU part  : 0xc07
> CPU revision  : 4
> 
> processor : 1
> model name: ARMv7 Processor rev 4 (v7b)
> Features  : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 
> idiva idivt vfpd32 lpae evtstrm 
> CPU implementer   : 0x41
> CPU architecture: 7
> CPU variant   : 0x0
> CPU part  : 0xc07
> CPU revision  : 4
> 
> Hardware  : Dummy Virtual Machine
> Revision  : 
> Serial: 
> root@when-the-lie-s-so-big:~# uname -a
> Linux when-the-lie-s-so-big 3.14.0+ #2465 SMP PREEMPT Tue Apr 8 13:05:11 BST 
> 2014 armv7b GNU/Linux
> 
> Now, looking at the patch, I think it makes some sense:
> - Depending on the endianness, we have to test the L_PTE_NONE in one 
> word or the other, and possibly clear L_PTE_VALID
> - Same for L_PTE_DIRTY, respectively setting L_PTE_RDONLY
> 
> The commit message looks wrong though, as it mentions the PTE storage in 
> memory (which looks completely fine to me, and explains why I was able to
> boot a guest). As none of my guest RAM is above 4GB IPA, I didn't see 
> the corruption of bit 32 in the PTE (which should have been bit 0,
> corresponding to L_PTE_VALID).
> 
> So, provided that the commit message is rewritten to match what it does,
> I'm fine with that patch.
> 
> Thanks,
> 
>   M.



Re: [PATCH 3.4 93/99] iwlwifi: always copy first 16 bytes of commands

2014-03-25 Thread Jianguo Wu
On 2014/3/25 17:29, Andreas Sturmlechner wrote:

> Original Message from: Ben Hutchings <b...@decadent.org.uk>
>>
>> One piece of my backport to 3.2.y went missing in the forward-port to
>> 3.4.y.  Can you test 3.4.83 with this patch on top?
>>
>> Ben.
> 
> iwlwifi works with the additional patch, thanks :)
> 
> 
> 


Sorry for the missing part, thanks, Ben.

Re: [PATCH v2] ARM: mm: support big-endian page tables

2014-03-18 Thread Jianguo Wu
Hi Russell,

Could you please take a look at this?

Thanks!

On 2014/2/17 15:05, Jianguo Wu wrote:

> When LPAE and big-endian are enabled on a HiSilicon board, and
> mem=384M mem=512M@7680M is specified, we get a bad page state:
> 
> Freeing unused kernel memory: 180K (c0466000 - c0493000)
> BUG: Bad page state in process init  pfn:fa442
> page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
> page flags: 0x4400(reserved)
> Modules linked in:
> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
> [] (unwind_backtrace+0x0/0x11c) from [] 
> (show_stack+0x10/0x14)
> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
> [] (bad_page+0xd4/0x104) from [] 
> (free_pages_prepare+0xa8/0x14c)
> [] (free_pages_prepare+0xa8/0x14c) from [] 
> (free_hot_cold_page+0x18/0xf0)
> [] (free_hot_cold_page+0x18/0xf0) from [] 
> (handle_pte_fault+0xcf4/0xdc8)
> [] (handle_pte_fault+0xcf4/0xdc8) from [] 
> (handle_mm_fault+0xf4/0x120)
> [] (handle_mm_fault+0xf4/0x120) from [] 
> (do_page_fault+0xfc/0x354)
> [] (do_page_fault+0xfc/0x354) from [] 
> (do_DataAbort+0x2c/0x90)
> [] (do_DataAbort+0x2c/0x90) from [] (__dabt_usr+0x34/0x40)
> 
> The bad pfn:fa442 is not in system memory (mem=384M mem=512M@7680M). After
> debugging, I found that the page fault handler reads back a wrong pfn from
> the pte just after setting it, as follows:
> do_anonymous_page()
> {
>   ...
>   set_pte_at(mm, address, page_table, entry);
>   
>   //debug code
>   pfn = pte_pfn(entry);
>   pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));
> 
>   //read out the pte just set
>   new_pte = pte_offset_map(pmd, address);
>   new_pfn = pte_pfn(*new_pte);
>   pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
>   ...
> }
> 
> pfn:   0x1fa4f5, pte:0xc1fa4f575f
> new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f //new pfn/pte is wrong.
> 
> The bug happens in cpu_v7_set_pte_ext(ptep, pte):
> when the pte is 64-bit, little-endian stores the low 32 bits in r2 and
> the high 32 bits in r3, while big-endian stores the low 32 bits in r3
> and the high 32 bits in r2. This causes a wrong pfn to be stored in the
> pte, so we should exchange r2 and r3 for big-endian.
> 
> Signed-off-by: Jianguo Wu <wujian...@huawei.com>
> ---
>  arch/arm/mm/proc-v7-3level.S |   18 +-
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 01a719e..22e3ad6 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
>   mov pc, lr
>  ENDPROC(cpu_v7_switch_mm)
>  
> +#ifdef __ARMEB__
> +#define rl r3
> +#define rh r2
> +#else
> +#define rl r2
> +#define rh r3
> +#endif
> +
>  /*
>   * cpu_v7_set_pte_ext(ptep, pte)
>   *
> @@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
>   */
>  ENTRY(cpu_v7_set_pte_ext)
>  #ifdef CONFIG_MMU
> - tst r2, #L_PTE_VALID
> + tst rl, #L_PTE_VALID
>   beq 1f
> - tst r3, #1 << (57 - 32) @ L_PTE_NONE
> - bicne   r2, #L_PTE_VALID
> + tst rh, #1 << (57 - 32) @ L_PTE_NONE
> + bicne   rl, #L_PTE_VALID
>   bne 1f
> - tst r3, #1 << (55 - 32) @ L_PTE_DIRTY
> - orreq   r2, #L_PTE_RDONLY
> + tst rh, #1 << (55 - 32) @ L_PTE_DIRTY
> + orreq   rl, #L_PTE_RDONLY
>  1:	strd	r2, r3, [r0]
>   ALT_SMP(W(nop))
>   ALT_UP (mcr p15, 0, r0, c7, c10, 1) @ flush_pte



Re: [patch 00/11] userspace out of memory handling

2014-03-11 Thread Jianguo Wu
On 2014/3/6 10:52, David Rientjes wrote:

> On Wed, 5 Mar 2014, Andrew Morton wrote:
> 
>>> This patchset introduces a standard interface through memcg that allows
>>> both of these conditions to be handled in the same clean way: users
>>> define memory.oom_reserve_in_bytes to define the reserve and this
>>> amount is allowed to be overcharged to the process handling the oom
>>> condition's memcg.  If used with the root memcg, this amount is allowed
>>> to be allocated below the per-zone watermarks for root processes that
>>> are handling such conditions (only root may write to
>>> cgroup.event_control for the root memcg).
>>
>> If process A is trying to allocate memory, cannot do so and the
>> userspace oom-killer is invoked, there must be means via which process
>> A waits for the userspace oom-killer's action.
> 
> It does so by relooping in the page allocator waiting for memory to be 
> freed just like it would if the kernel oom killer were called and process 
> A was waiting for the oom kill victim process B to exit, we don't have the 
> ability to put it on a waitqueue because we don't touch the freeing 
> hotpath.  The userspace oom handler may not even necessarily kill 
> anything, it may be able to free its own memory and start throttling other 
> processes, for example.
> 
>> And there must be
>> fallbacks which occur if the userspace oom killer fails to clear the
>> oom condition, or times out.
>>
> 
> I agree completely and proposed this before as memory.oom_delay_millisecs 
> at http://lwn.net/Articles/432226 which we use internally when memory 
> can't be freed or a memcg's limit cannot be expanded.  I guess it makes 
> more sense alongside the rest of this patchset now, I can add it as an 
> additional patch next time around.
> 
>> Would be interested to see a description of how all this works.
>>
> 
> There's an article for LWN also being developed on this topic.  As 
> mentioned in that article, I think it would be best to generalize a lot of 
> the common functions and the eventfd handling entirely into a library.  
> I've attached an example implementation that just invokes a function to 
> handle the situation.
> 
> For Google's usecase specifically, at the root memcg level (system oom) we 
> want to do priority based memcg killing.  We want to kill from within a 
> memcg hierarchy that has the lowest priority relative to other memcgs.  
> This cannot be implemented with /proc/pid/oom_score_adj today.  Those 
> priorities may also change depending on whether a memcg hierarchy is 
> "overlimit", i.e. its limit has been increased temporarily because it has 
> hit a memcg oom and additional memory is readily available on the system.
> 
> So why not just introduce a memcg tunable that specifies a priority?  
> Well, it's not that simple.  Other users will want to implement different 
> policies on system oom (think about things like existing panic_on_oom or 
> oom_kill_allocating_task sysctls).  I introduced oom_kill_allocating_task 
> originally for SGI because they wanted a fast oom kill rather than 
> expensive tasklist scan: the allocating task itself is rather irrelevant, 
> it was just the unlucky task that was allocating at the moment that oom 
> was triggered.  What's guaranteed is that current in that case will always 
> free memory from under oom (it's not a member of some other mempolicy or 
> cpuset that would be needlessly killed).  Both sysctls could trivially be 
> reimplemented in userspace with this feature.
> 
> I have other customers who don't run in a memcg environment at all, they 
> simply reattach all processes to root and delete all other memcgs.  These 
> customers are only concerned about system oom conditions and want to do 
> something "interesting" before a process is killed.  Some want to log the 
> VM statistics as an artifact to examine later, some want to examine heap 
> profiles, others can start throttling and freeing memory rather than kill 
> anything.  All of this is impossible today because the kernel oom killer 
> will simply kill something immediately and any stats we collect afterwards 
> don't represent the oom condition.  The heap profiles are lost, throttling 
> is useless, etc.
> 
> Jianguo (cc'd) may also have usecases not described here.
> 

I want to log memory usage, like slabinfo, vmalloc info, page-cache info,
etc. before killing anything.
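
For what it's worth, a minimal sketch of such a handler on top of the
existing cgroup-v1 memory.oom_control/eventfd notification mechanism (the
memcg path "mygroup" and the choice of files to log are my own illustrative
assumptions, not part of the patchset):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define MEMCG "/sys/fs/cgroup/memory/mygroup"	/* illustrative path */

/* Append the contents of a /proc file to our log (stderr here). */
static void log_file(const char *path)
{
	char buf[4096];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, (size_t)n, stderr);
	close(fd);
}

int main(void)
{
	char line[64];
	uint64_t count;
	int efd = eventfd(0, 0);
	int ofd = open(MEMCG "/memory.oom_control", O_RDONLY);
	int cfd = open(MEMCG "/cgroup.event_control", O_WRONLY);

	if (efd < 0 || ofd < 0 || cfd < 0)
		return 1;

	/* Register for notifications: "<eventfd> <memory.oom_control fd>". */
	snprintf(line, sizeof(line), "%d %d", efd, ofd);
	if (write(cfd, line, strlen(line)) < 0)
		return 1;

	/* Each read unblocks once per oom event in this memcg. */
	while (read(efd, &count, sizeof(count)) == sizeof(count)) {
		/* Snapshot state before deciding what to free or kill. */
		log_file("/proc/slabinfo");
		log_file("/proc/vmallocinfo");
		log_file("/proc/meminfo");
	}
	return 0;
}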

>> It is unfortunate that this feature is memcg-only.  Surely it could
>> also be used by non-memcg setups.  Would like to see at least a
>> detailed description of how this will all be presented and implemented.
>> We should aim to make the memcg and non-memcg userspace interfaces and
>> user-visible behaviour as similar as possible.
>>
> 
> It's memcg only because it can handle both system and memcg oom conditions 
> with the same clean interface, it would be possible to implement only 
> system oom condition handling through procfs (a little sloppy since it 
> needs to register the eventfd) but then a userspace oom handler would need 
> to determine which interface to use based on whether it was running in a 
> memcg or

[PATCH v2] ARM: mm: support big-endian page tables

2014-02-16 Thread Jianguo Wu
When LPAE and big-endian are enabled on a HiSilicon board, and
mem=384M mem=512M@7680M is specified, we get a bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x4400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[] (unwind_backtrace+0x0/0x11c) from [] 
(show_stack+0x10/0x14)
[] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
[] (bad_page+0xd4/0x104) from [] 
(free_pages_prepare+0xa8/0x14c)
[] (free_pages_prepare+0xa8/0x14c) from [] 
(free_hot_cold_page+0x18/0xf0)
[] (free_hot_cold_page+0x18/0xf0) from [] 
(handle_pte_fault+0xcf4/0xdc8)
[] (handle_pte_fault+0xcf4/0xdc8) from [] 
(handle_mm_fault+0xf4/0x120)
[] (handle_mm_fault+0xf4/0x120) from [] 
(do_page_fault+0xfc/0x354)
[] (do_page_fault+0xfc/0x354) from [] 
(do_DataAbort+0x2c/0x90)
[] (do_DataAbort+0x2c/0x90) from [] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not in system memory (mem=384M mem=512M@7680M). After
debugging, I found that the page fault handler reads back a wrong pfn from
the pte just after setting it, as follows:
do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));

	//read back the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llx\n", new_pfn, pte_val(*new_pte));
	...
}

pfn:   0x1fa4f5, pte:0xc1fa4f575f
new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f   //new pfn/pte is wrong.

The bug happens in cpu_v7_set_pte_ext(ptep, pte):
when the pte is 64-bit, little-endian stores the low 32 bits in r2 and the
high 32 bits in r3, while big-endian stores the low 32 bits in r3 and the
high 32 bits in r2. This causes a wrong pfn to be stored in the pte, so we
should exchange r2 and r3 for big-endian.

Signed-off-by: Jianguo Wu <wujian...@huawei.com>
---
 arch/arm/mm/proc-v7-3level.S |   18 +-
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 01a719e..22e3ad6 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
mov pc, lr
 ENDPROC(cpu_v7_switch_mm)
 
+#ifdef __ARMEB__
+#define rl r3
+#define rh r2
+#else
+#define rl r2
+#define rh r3
+#endif
+
 /*
  * cpu_v7_set_pte_ext(ptep, pte)
  *
@@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
-   tst r2, #L_PTE_VALID
+   tst rl, #L_PTE_VALID
beq 1f
-   tst r3, #1 << (57 - 32) @ L_PTE_NONE
-   bicne   r2, #L_PTE_VALID
+   tst rh, #1 << (57 - 32) @ L_PTE_NONE
+   bicne   rl, #L_PTE_VALID
bne 1f
-   tst r3, #1 << (55 - 32) @ L_PTE_DIRTY
-   orreq   r2, #L_PTE_RDONLY
+   tst rh, #1 << (55 - 32) @ L_PTE_DIRTY
+   orreq   rl, #L_PTE_RDONLY
 1:	strd	r2, r3, [r0]
ALT_SMP(W(nop))
ALT_UP (mcr p15, 0, r0, c7, c10, 1) @ flush_pte
-- 
1.7.1


Re: [PATCH] ARM: mm: support big-endian page tables

2014-02-15 Thread Jianguo Wu
Ping...

On 2014/2/12 14:54, Jianguo Wu wrote:

> On 2014/2/11 18:40, Ben Dooks wrote:
> 
>> On 11/02/14 09:20, Jianguo Wu wrote:
>>> When enable LPAE and big-endian in a hisilicon board, while specify
>>> mem=384M mem=512M@7680M, will get bad page state:
>>>
>>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>>> BUG: Bad page state in process init  pfn:fa442
>>> page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
>>> page flags: 0x4400(reserved)
>>> Modules linked in:
>>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>>> [] (unwind_backtrace+0x0/0x11c) from [] 
>>> (show_stack+0x10/0x14)
>>> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
>>> [] (bad_page+0xd4/0x104) from [] 
>>> (free_pages_prepare+0xa8/0x14c)
>>> [] (free_pages_prepare+0xa8/0x14c) from [] 
>>> (free_hot_cold_page+0x18/0xf0)
>>> [] (free_hot_cold_page+0x18/0xf0) from [] 
>>> (handle_pte_fault+0xcf4/0xdc8)
>>> [] (handle_pte_fault+0xcf4/0xdc8) from [] 
>>> (handle_mm_fault+0xf4/0x120)
>>> [] (handle_mm_fault+0xf4/0x120) from [] 
>>> (do_page_fault+0xfc/0x354)
>>> [] (do_page_fault+0xfc/0x354) from [] 
>>> (do_DataAbort+0x2c/0x90)
>>> [] (do_DataAbort+0x2c/0x90) from [] 
>>> (__dabt_usr+0x34/0x40)
>>>
>>> The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after 
>>> debugging,
>>> I find in page fault handler, will get wrong pfn from pte just after set 
>>> pte,
>>> as follow:
>>> do_anonymous_page()
>>> {
>>> ...
>>> set_pte_at(mm, address, page_table, entry);
>>> 
>>> //debug code
>>> pfn = pte_pfn(entry);
>>> pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));
>>>
>>> //read out the pte just set
>>> new_pte = pte_offset_map(pmd, address);
>>> new_pfn = pte_pfn(*new_pte);
>>> pr_info("new pfn:0x%lx, new pte:0x%llx\n", pfn, pte_val(entry));
>>> ...
>>> }
>>
>> Thanks, must have missed tickling this one.
>>
>>>
>>> pfn:   0x1fa4f5, pte:0xc1fa4f575f
>>> new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f//new pfn/pte is wrong.
>>>
>>> The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
>>> when pte is 64-bit, for little-endian, will store low 32-bit in r2,
>>> high 32-bit in r3; for big-endian, will store low 32-bit in r3,
>>> high 32-bit in r2, this will cause wrong pfn stored in pte,
>>> so we should exchange r2 and r3 for big-endian.
>>>
>>> Signed-off-by: Jianguo Wu 
>>> ---
>>>   arch/arm/mm/proc-v7-3level.S |   10 ++
>>>   1 files changed, 10 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>>> index 6ba4bd9..71b3892 100644
>>> --- a/arch/arm/mm/proc-v7-3level.S
>>> +++ b/arch/arm/mm/proc-v7-3level.S
>>> @@ -65,6 +65,15 @@ ENDPROC(cpu_v7_switch_mm)
>>>*/
>>>   ENTRY(cpu_v7_set_pte_ext)
>>>   #ifdef CONFIG_MMU
>>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>>> +    tst    r3, #L_PTE_VALID
>>> +    beq    1f
>>> +    tst    r2, #1 << (57 - 32)    @ L_PTE_NONE
>>> +    bicne  r3, #L_PTE_VALID
>>> +    bne    1f
>>> +    tst    r2, #1 << (55 - 32)    @ L_PTE_DIRTY
>>> +    orreq  r3, #L_PTE_RDONLY
>>> +#else
>>>      tst    r2, #L_PTE_VALID
>>>      beq    1f
>>>      tst    r3, #1 << (57 - 32)    @ L_PTE_NONE
>>> @@ -72,6 +81,7 @@ ENTRY(cpu_v7_set_pte_ext)
>>>      bne    1f
>>>      tst    r3, #1 << (55 - 32)    @ L_PTE_DIRTY
>>>      orreq  r2, #L_PTE_RDONLY
>>> +#endif
>>>   1:  strd   r2, r3, [r0]
>>>      ALT_SMP(W(nop))
>>>      ALT_UP (mcr    p15, 0, r0, c7, c10, 1)    @ flush_pte
>>> -- 1.7.1
>>
>> If possible can we avoid large #ifdef blocks here?
>>
>> Two ideas are
>>
>> ARM_LE(tst r2, #L_PTE_VALID)
>> ARM_BE(tst r3, #L_PTE_VALID)
>>
>> or change r2, r3 pair to say rlow, rhi and
>>
>> #ifdef  CONFIG_CPU_ENDIAN_BE8
>> #define rlow r3
>> #define rhi r2
>> #else
>> #define rlow r2
>> #define rhi r3
>> #endif
>>
> 
> Hi Ben,
> Thanks for your suggestion, how about this?
> 
> Signed-off-by: Jianguo Wu 
> ---
>  arch/ar

Re: [PATCH] ARM: mm: support big-endian page tables

2014-02-11 Thread Jianguo Wu
On 2014/2/11 18:40, Ben Dooks wrote:

> On 11/02/14 09:20, Jianguo Wu wrote:
>> When enable LPAE and big-endian in a hisilicon board, while specify
>> mem=384M mem=512M@7680M, will get bad page state:
>>
>> Freeing unused kernel memory: 180K (c0466000 - c0493000)
>> BUG: Bad page state in process init  pfn:fa442
>> page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
>> page flags: 0x4400(reserved)
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
>> [] (unwind_backtrace+0x0/0x11c) from [] 
>> (show_stack+0x10/0x14)
>> [] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
>> [] (bad_page+0xd4/0x104) from [] 
>> (free_pages_prepare+0xa8/0x14c)
>> [] (free_pages_prepare+0xa8/0x14c) from [] 
>> (free_hot_cold_page+0x18/0xf0)
>> [] (free_hot_cold_page+0x18/0xf0) from [] 
>> (handle_pte_fault+0xcf4/0xdc8)
>> [] (handle_pte_fault+0xcf4/0xdc8) from [] 
>> (handle_mm_fault+0xf4/0x120)
>> [] (handle_mm_fault+0xf4/0x120) from [] 
>> (do_page_fault+0xfc/0x354)
>> [] (do_page_fault+0xfc/0x354) from [] 
>> (do_DataAbort+0x2c/0x90)
>> [] (do_DataAbort+0x2c/0x90) from [] 
>> (__dabt_usr+0x34/0x40)
>>
>> The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after 
>> debugging,
>> I find in page fault handler, will get wrong pfn from pte just after set pte,
>> as follow:
>> do_anonymous_page()
>> {
>> ...
>> set_pte_at(mm, address, page_table, entry);
>> 
>> //debug code
>> pfn = pte_pfn(entry);
>> pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));
>>
>> //read out the pte just set
>> new_pte = pte_offset_map(pmd, address);
>> new_pfn = pte_pfn(*new_pte);
>> pr_info("new pfn:0x%lx, new pte:0x%llx\n", pfn, pte_val(entry));
>> ...
>> }
> 
> Thanks, must have missed tickling this one.
> 
>>
>> pfn:   0x1fa4f5, pte:0xc1fa4f575f
>> new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f//new pfn/pte is wrong.
>>
>> The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
>> when pte is 64-bit, for little-endian, will store low 32-bit in r2,
>> high 32-bit in r3; for big-endian, will store low 32-bit in r3,
>> high 32-bit in r2, this will cause wrong pfn stored in pte,
>> so we should exchange r2 and r3 for big-endian.
>>
>> Signed-off-by: Jianguo Wu 
>> ---
>>   arch/arm/mm/proc-v7-3level.S |   10 ++
>>   1 files changed, 10 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>> index 6ba4bd9..71b3892 100644
>> --- a/arch/arm/mm/proc-v7-3level.S
>> +++ b/arch/arm/mm/proc-v7-3level.S
>> @@ -65,6 +65,15 @@ ENDPROC(cpu_v7_switch_mm)
>>*/
>>   ENTRY(cpu_v7_set_pte_ext)
>>   #ifdef CONFIG_MMU
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +    tst    r3, #L_PTE_VALID
>> +    beq    1f
>> +    tst    r2, #1 << (57 - 32)    @ L_PTE_NONE
>> +    bicne  r3, #L_PTE_VALID
>> +    bne    1f
>> +    tst    r2, #1 << (55 - 32)    @ L_PTE_DIRTY
>> +    orreq  r3, #L_PTE_RDONLY
>> +#else
>>      tst    r2, #L_PTE_VALID
>>      beq    1f
>>      tst    r3, #1 << (57 - 32)    @ L_PTE_NONE
>> @@ -72,6 +81,7 @@ ENTRY(cpu_v7_set_pte_ext)
>>      bne    1f
>>      tst    r3, #1 << (55 - 32)    @ L_PTE_DIRTY
>>      orreq  r2, #L_PTE_RDONLY
>> +#endif
>>   1:  strd   r2, r3, [r0]
>>      ALT_SMP(W(nop))
>>      ALT_UP (mcr    p15, 0, r0, c7, c10, 1)    @ flush_pte
>> -- 1.7.1
> 
> If possible can we avoid large #ifdef blocks here?
> 
> Two ideas are
> 
> ARM_LE(tst r2, #L_PTE_VALID)
> ARM_BE(tst r3, #L_PTE_VALID)
> 
> or change r2, r3 pair to say rlow, rhi and
> 
> #ifdef  CONFIG_CPU_ENDIAN_BE8
> #define rlow r3
> #define rhi r2
> #else
> #define rlow r2
> #define rhi r3
> #endif
> 

Hi Ben,
Thanks for your suggestion, how about this?

Signed-off-by: Jianguo Wu 
---
 arch/arm/mm/proc-v7-3level.S |   18 +-
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 01a719e..22e3ad6 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -64,6 +64,14 @@ ENTRY(cpu_v7_switch_mm)
mov pc, lr
 ENDPROC(cpu_v7_switch_mm)
 
+#ifdef __ARMEB__
+#define rl r3
+#define rh r2
+#else
+#define rl r2
+#define rh r3
+#endif
+
 /*
  * cpu_v7_set_pte_ext(ptep, pte)
  *
@@ -73,13 +81,13 @@ ENDPROC(cpu_v7_switch_mm)
  */

[PATCH] ARM: mm: support big-endian page tables

2014-02-11 Thread Jianguo Wu
When LPAE and big-endian are enabled on a HiSilicon board, specifying
mem=384M mem=512M@7680M results in a bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x4400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[] (unwind_backtrace+0x0/0x11c) from [] 
(show_stack+0x10/0x14)
[] (show_stack+0x10/0x14) from [] (bad_page+0xd4/0x104)
[] (bad_page+0xd4/0x104) from [] 
(free_pages_prepare+0xa8/0x14c)
[] (free_pages_prepare+0xa8/0x14c) from [] 
(free_hot_cold_page+0x18/0xf0)
[] (free_hot_cold_page+0x18/0xf0) from [] 
(handle_pte_fault+0xcf4/0xdc8)
[] (handle_pte_fault+0xcf4/0xdc8) from [] 
(handle_mm_fault+0xf4/0x120)
[] (handle_mm_fault+0xf4/0x120) from [] 
(do_page_fault+0xfc/0x354)
[] (do_page_fault+0xfc/0x354) from [] 
(do_DataAbort+0x2c/0x90)
[] (do_DataAbort+0x2c/0x90) from [] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory (mem=384M mem=512M@7680M). After
debugging, I found that the page fault handler reads back a wrong pfn from
the pte just after setting it, as follows:
do_anonymous_page()
{
...
set_pte_at(mm, address, page_table, entry);

//debug code
pfn = pte_pfn(entry);
pr_info("pfn:0x%lx, pte:0x%llx\n", pfn, pte_val(entry));

//read out the pte just set
new_pte = pte_offset_map(pmd, address);
new_pfn = pte_pfn(*new_pte);
pr_info("new pfn:0x%lx, new pte:0x%llx\n", pfn, pte_val(entry));
...
}

pfn:   0x1fa4f5, pte:0xc1fa4f575f
new_pfn:0xfa4f5, new_pte:0xc0fa4f5f5f   //new pfn/pte is wrong.

The bug happens in cpu_v7_set_pte_ext(ptep, pte):
when the pte is 64-bit, little-endian passes the low 32 bits in r2 and the
high 32 bits in r3, while big-endian passes the low 32 bits in r3 and the
high 32 bits in r2. Testing the wrong half corrupts the pfn stored in the
pte, so r2 and r3 must be swapped for big-endian.

Signed-off-by: Jianguo Wu 
---
 arch/arm/mm/proc-v7-3level.S |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 6ba4bd9..71b3892 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -65,6 +65,15 @@ ENDPROC(cpu_v7_switch_mm)
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_ENDIAN_BE8
+   tst r3, #L_PTE_VALID
+   beq 1f
+   tst r2, #1 << (57 - 32) @ L_PTE_NONE
+   bicne   r3, #L_PTE_VALID
+   bne 1f
+   tst r2, #1 << (55 - 32) @ L_PTE_DIRTY
+   orreq   r3, #L_PTE_RDONLY
+#else
tst r2, #L_PTE_VALID
beq 1f
tst r3, #1 << (57 - 32) @ L_PTE_NONE
@@ -72,6 +81,7 @@ ENTRY(cpu_v7_set_pte_ext)
bne 1f
tst r3, #1 << (55 - 32) @ L_PTE_DIRTY
orreq   r2, #L_PTE_RDONLY
+#endif
 1: strd    r2, r3, [r0]
ALT_SMP(W(nop))
ALT_UP (mcr p15, 0, r0, c7, c10, 1) @ flush_pte
-- 
1.7.1



Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?

2014-02-10 Thread Jianguo Wu
On 2014/1/22 4:41, David Rientjes wrote:

> On Tue, 21 Jan 2014, Jianguo Wu wrote:
> 
>>> The problem is that slabinfo becomes excessively verbose and dumping it 
>>> all to the kernel log often times causes important messages to be lost.  
>>> This is why we control things like the tasklist dump with a VM sysctl.  It 
>>> would be possible to dump, say, the top ten slab caches with the highest 
>>> memory usage, but it will only be helpful for slab leaks.  Typically there 
>>> are better debugging tools available than analyzing the kernel log; if you 
>>> see unusually high slab memory in the meminfo dump, you can enable it.
>>>
>>
>> But, when OOM has happened, we can only use kernel log, slab/vmalloc info 
>> from proc
>> is stale. Maybe we can dump slab/vmalloc with a VM sysctl, and only top 
>> 10/20 entrys?
>>
> 
> You could, but it's a tradeoff between how much to dump to a general 
> resource such as the kernel log and how many sysctls we add that control 
> every possible thing.  Slab leaks would definitely be a minority of oom 
> conditions and you should normally be able to reproduce them by running 
> the same workload; just use slabtop(1) or manually inspect /proc/slabinfo 
> while such a workload is running for indicators.  I don't think we want to 
> add the information by default, though, nor do we want to add sysctls to 
> control the behavior (you'd still need to reproduce the issue after 
> enabling it).
> 
> We are currently discussing userspace oom handlers, though, that would 
> allow you to run a process that would be notified and allowed to allocate 
> a small amount of memory on oom conditions.  It would then be trivial to 
> dump any information you feel pertinent in userspace prior to killing 
> something.  I like to inspect heap profiles for memory hogs while 
> debugging our malloc() issues, for example, and you could look more 
> closely at kernel memory.
> 
> I'll cc you on future discussions of that feature.
> 

Hi David,

Thanks for the kind explanation; do you have any specific plans for this?

Thanks,
Jianguo Wu.

> 






Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?

2014-01-21 Thread Jianguo Wu
On 2014/1/21 13:34, David Rientjes wrote:

> On Mon, 20 Jan 2014, Jianguo Wu wrote:
> 
>> When OOM happen, will dump buddy free areas info, hugetlb pages info,
>> memory state of all eligible tasks, per-cpu memory info.
>> But do not dump slab/vmalloc info, sometime, it's not enough to figure out 
>> the
>> reason OOM happened.
>>
>> So, my questions are:
>> 1. Should dump slab/vmalloc info when OOM happen? Though we can get these 
>> from proc file,
>> but usually we do not monitor the logs and check proc file immediately when 
>> OOM happened.
>>
> 

Hi David,
Thank you for taking the time to answer!

> The problem is that slabinfo becomes excessively verbose and dumping it 
> all to the kernel log often times causes important messages to be lost.  
> This is why we control things like the tasklist dump with a VM sysctl.  It 
> would be possible to dump, say, the top ten slab caches with the highest 
> memory usage, but it will only be helpful for slab leaks.  Typically there 
> are better debugging tools available than analyzing the kernel log; if you 
> see unusually high slab memory in the meminfo dump, you can enable it.
> 

But when an OOM has happened, we can only use the kernel log; the slab/vmalloc
info in proc is stale by then. Maybe we could dump slab/vmalloc info under a
VM sysctl, and only the top 10/20 entries?

Thanks.

>> 2. /proc/$pid/smaps and pagecache info also helpful when OOM, should also be 
>> dumped?

>>
> 
> Also very verbose and would cause important messages to be lost, we try to 
> avoid spamming the kernel log with all of this information as much as 
> possible.
> 
>> 3. Without these info, usually how to figure out OOM reason?
>>
> 
> Analyze the memory usage in the meminfo and determine what is unusually 
> high; if it's mostly anonymous memory, you can usually correlate it back 
> to a high rss for a process in the tasklist that you didn't suspect to be 
> using that much memory, for example.
> 
> 






[question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?

2014-01-20 Thread Jianguo Wu
When an OOM happens, the kernel dumps buddy free-area info, hugetlb page info,
the memory state of all eligible tasks, and per-cpu memory info.
But it does not dump slab/vmalloc info, and sometimes that is not enough to
figure out why the OOM happened.

So, my questions are:
1. Should slab/vmalloc info be dumped when an OOM happens? We can get it from
proc files, but usually we do not monitor the logs and check the proc files
immediately when an OOM happens.

2. /proc/$pid/smaps and pagecache info are also helpful at OOM time; should
they be dumped as well?

3. Without this info, how does one usually figure out the OOM reason?




Re: [PATCH] mm/kmemleak: add support for re-enable kmemleak at runtime

2014-01-18 Thread Jianguo Wu
On 2014/1/17 20:04, Catalin Marinas wrote:

> On Fri, Jan 17, 2014 at 09:40:02AM +0000, Jianguo Wu wrote:
>> Now disabling kmemleak is an irreversible operation, but sometimes
>> we may need to re-enable kmemleak at runtime. So add a knob to enable
>> kmemleak at runtime:
>> echo on > /sys/kernel/debug/kmemleak
> 
> It is irreversible for very good reason: once it missed the initial
> memory allocations, there is no way for kmemleak to build the object
> reference graph and you'll get lots of false positives, pretty much
> making it unusable.
> 

Do you mean that memory allocations are not traced while kmemleak is disabled,
and that this untracked memory may reference newly allocated objects after
re-enabling?
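
A minimal sketch of the scenario described above, with hypothetical
allocations (this is not code from the patch):

#include <linux/slab.h>

struct holder {
	void *child;
};

static void example(void)
{
	/* Allocated while kmemleak is disabled: no tracking metadata
	 * is ever created for 'h'. */
	struct holder *h = kmalloc(sizeof(*h), GFP_KERNEL);

	if (!h)
		return;

	/* Allocated after re-enabling: 'child' is tracked. */
	h->child = kmalloc(64, GFP_KERNEL);

	/*
	 * The scanner has no record of 'h', so it never scans it and
	 * never finds the only reference to 'child'; 'child' is then
	 * reported as a leak even though it is still reachable.
	 */
}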




[PATCH] mm/kmemleak: add support for re-enable kmemleak at runtime

2014-01-17 Thread Jianguo Wu
Now disabling kmemleak is an irreversible operation, but sometimes
we may need to re-enable kmemleak at runtime. So add a knob to enable
kmemleak at runtime:
echo on > /sys/kernel/debug/kmemleak

Signed-off-by: Jianguo Wu 
---
 Documentation/kmemleak.txt |3 ++-
 mm/kmemleak.c  |   37 +
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
index b6e3973..8ec56ad 100644
--- a/Documentation/kmemleak.txt
+++ b/Documentation/kmemleak.txt
@@ -44,7 +44,8 @@ objects to be reported as orphan.
 Memory scanning parameters can be modified at run-time by writing to the
 /sys/kernel/debug/kmemleak file. The following parameters are supported:
 
-  off  - disable kmemleak (irreversible)
+  off  - disable kmemleak
+  on   - enable kmemleak
   stack=on - enable the task stacks scanning (default)
   stack=off- disable the tasks stacks scanning
   scan=on  - start the automatic memory scanning thread (default)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 31f01c5..02f292c 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -260,6 +260,7 @@ static struct early_log
 static int crt_early_log __initdata;
 
 static void kmemleak_disable(void);
+static void kmemleak_enable(void);
 
 /*
  * Print a warning and dump the stack trace.
@@ -1616,9 +1617,6 @@ static ssize_t kmemleak_write(struct file *file, const 
char __user *user_buf,
int buf_size;
int ret;
 
-   if (!atomic_read(&kmemleak_enabled))
-   return -EBUSY;
-
buf_size = min(size, (sizeof(buf) - 1));
if (strncpy_from_user(buf, user_buf, buf_size) < 0)
return -EFAULT;
@@ -1628,6 +1626,19 @@ static ssize_t kmemleak_write(struct file *file, const 
char __user *user_buf,
if (ret < 0)
return ret;
 
+   if (strncmp(buf, "on", 2) == 0) {
+   if (atomic_read(&kmemleak_enabled))
+   ret = -EBUSY;
+   else
+   kmemleak_enable();
+   goto out;
+   }
+
+   if (!atomic_read(&kmemleak_enabled)) {
+   ret = -EBUSY;
+   goto out;
+   }
+
if (strncmp(buf, "off", 3) == 0)
kmemleak_disable();
else if (strncmp(buf, "stack=on", 8) == 0)
@@ -1703,7 +1714,7 @@ static DECLARE_WORK(cleanup_work, kmemleak_do_cleanup);
 
 /*
  * Disable kmemleak. No memory allocation/freeing will be traced once this
- * function is called. Disabling kmemleak is an irreversible operation.
+ * function is called.
  */
 static void kmemleak_disable(void)
 {
@@ -1721,6 +1732,24 @@ static void kmemleak_disable(void)
pr_info("Kernel memory leak detector disabled\n");
 }
 
+static void kmemleak_enable(void)
+{
+   struct kmemleak_object *object;
+
+   /* free the kmemleak internal objects the previous thread scanned */
+   rcu_read_lock();
+   list_for_each_entry_rcu(object, &object_list, object_list)
+   delete_object_full(object->pointer);
+   rcu_read_unlock();
+
+   atomic_set(&kmemleak_enabled, 1);
+   atomic_set(&kmemleak_error, 0);
+
+   start_scan_thread();
+
+   pr_info("Kernel memory leak detector enabled\n");
+}
+
 /*
  * Allow boot-time kmemleak disabling (enabled by default).
  */
-- 
1.7.7





Re: [PATCH 2/2] mm: free memblock.memory in free_all_bootmem

2014-01-07 Thread Jianguo Wu
On 2014/1/7 23:16, Philipp Hachtmann wrote:

> When calling free_all_bootmem() the free areas under memblock's
> control are released to the buddy allocator. Additionally the
> reserved list is freed if it was reallocated by memblock.
> The same should apply for the memory list.
> 
> Signed-off-by: Philipp Hachtmann 
> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c| 12 
>  mm/nobootmem.c   |  7 ++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 77c60e5..d174922 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -52,6 +52,7 @@ phys_addr_t memblock_find_in_range_node(phys_addr_t start, 
> phys_addr_t end,
>  phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
>  phys_addr_t size, phys_addr_t align);
>  phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr);
> +phys_addr_t get_allocated_memblock_memory_regions_info(phys_addr_t *addr);
>  void memblock_allow_resize(void);
>  int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
>  int memblock_add(phys_addr_t base, phys_addr_t size);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 53e477b..1a11d04 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -271,6 +271,18 @@ phys_addr_t __init_memblock 
> get_allocated_memblock_reserved_regions_info(
> memblock.reserved.max);
>  }
>  
> +phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info(
> + phys_addr_t *addr)
> +{
> + if (memblock.memory.regions == memblock_memory_init_regions)
> + return 0;
> +
> + *addr = __pa(memblock.memory.regions);
> +
> + return PAGE_ALIGN(sizeof(struct memblock_region) *
> +   memblock.memory.max);
> +}
> +
>  /**
>   * memblock_double_array - double the size of the memblock regions array
>   * @type: memblock type of the regions array being doubled
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index 3a7e14d..83f36d3 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -122,11 +122,16 @@ static unsigned long __init 
> free_low_memory_core_early(void)
>   for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
>   count += __free_memory_core(start, end);
>  
> - /* free range that is used for reserved array if we allocate it */
> + /* Free memblock.reserved array if it was allocated */
>   size = get_allocated_memblock_reserved_regions_info(&start);
>   if (size)
>   count += __free_memory_core(start, start + size);
>  
> + /* Free memblock.memory array if it was allocated */
> + size = get_allocated_memblock_memory_regions_info(&start);
> + if (size)
> + count += __free_memory_core(start, start + size);
> +

Hi Philipp,

Some archs, like arm64, use memblock.memory after system boot, so we cannot
simply release it to the buddy allocator; maybe this needs to be guarded by
!defined(CONFIG_ARCH_DISCARD_MEMBLOCK) (see the sketch after the quoted code
below).

#ifdef CONFIG_HAVE_ARCH_PFN_VALID
int pfn_valid(unsigned long pfn)
{
return memblock_is_memory(pfn << PAGE_SHIFT);
}
EXPORT_SYMBOL(pfn_valid);

Thanks,
Jianguo Wu

>   return count;
>  }
>  
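
A minimal sketch of the guard suggested above, applied to the hunk being
discussed; the #ifdef placement illustrates the idea and is not necessarily
the fix that was eventually merged:

	/* Free the memblock.memory array only on architectures that discard
	 * memblock data after boot; arm64's pfn_valid() keeps using
	 * memblock.memory at runtime. */
#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
	size = get_allocated_memblock_memory_regions_info(&start);
	if (size)
		count += __free_memory_core(start, start + size);
#endif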






Re: [PATCH] mm/hugetlb: check for pte NULL pointer in __page_check_address()

2013-12-16 Thread Jianguo Wu
Hi Kirill,

On 2013/12/16 22:25, Kirill A. Shutemov wrote:

> Jianguo Wu wrote:
>> In __page_check_address(), if address's pud is not present,
>> huge_pte_offset() will return NULL, we should check the return value.
>>
>> Signed-off-by: Jianguo Wu 
> 
> Looks okay to me.
> 
> Acked-by: Kirill A. Shutemov 
> 
> Have you triggered a crash there? Or just spotted by reading the code?
> 


By reading the code.

Thanks,
Jianguo Wu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mm/hugetlb: check for pte NULL pointer in __page_check_address()

2013-12-16 Thread Jianguo Wu
In __page_check_address(), if the address's pud is not present,
huge_pte_offset() will return NULL, so we should check the return value.

Signed-off-by: Jianguo Wu 
---
 mm/rmap.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 55c8b8d..068522d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -600,7 +600,11 @@ pte_t *__page_check_address(struct page *page, struct 
mm_struct *mm,
spinlock_t *ptl;
 
if (unlikely(PageHuge(page))) {
+   /* when pud is not present, pte will be NULL */
pte = huge_pte_offset(mm, address);
+   if (!pte)
+   return NULL;
+
ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
goto check;
}
-- 
1.7.1
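
For context, a simplified sketch of why huge_pte_offset() can return NULL;
this follows the shape of the 3.x-era x86 implementation and is illustrative
rather than exact:

pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd = pgd_offset(mm, addr);
	pud_t *pud;
	pmd_t *pmd = NULL;

	if (pgd_present(*pgd)) {
		pud = pud_offset(pgd, addr);
		if (pud_present(*pud)) {
			if (pud_large(*pud))
				return (pte_t *)pud;	/* 1GB hugepage */
			pmd = pmd_offset(pud, addr);
		}
	}
	return (pte_t *)pmd;	/* NULL when the pud is not present */
}

So __page_check_address() must be prepared for a NULL return before taking
the pte lock.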





[PATCH] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully

2013-12-12 Thread Jianguo Wu
Changelog:
 - Only set PageHWPoison on the error raw page if page is freed into buddy

After a successful hugetlb page migration by soft offline, the source page
is freed either into hugepage_freelists or into the buddy allocator (for an
over-committed page). If the page is in buddy, page_hstate(page) will be NULL,
and we hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().

[  890.677918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[  890.685741] IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
[  890.692861] PGD c23762067 PUD c24be2067 PMD 0
[  890.697314] Oops:  [#1] SMP

So check PageHuge(page) after call migrate_pages() successfully.

Tested-by: Naoya Horiguchi 
Cc: sta...@vger.kernel.org
Signed-off-by: Jianguo Wu 
---
 mm/memory-failure.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b7c1716..db08af9 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1505,10 +1505,16 @@ static int soft_offline_huge_page(struct page *page, 
int flags)
if (ret > 0)
ret = -EIO;
} else {
-   set_page_hwpoison_huge_page(hpage);
-   dequeue_hwpoisoned_huge_page(hpage);
-   atomic_long_add(1 << compound_order(hpage),
-   &num_poisoned_pages);
+   /* overcommit hugetlb page will be freed to buddy */
+   if (PageHuge(page)) {
+   set_page_hwpoison_huge_page(hpage);
+   dequeue_hwpoisoned_huge_page(hpage);
+   atomic_long_add(1 << compound_order(hpage),
+   &num_poisoned_pages);
+   } else {
+   SetPageHWPoison(page);
+   atomic_long_inc(&num_poisoned_pages);
+   }
}
return ret;
 }
-- 
1.7.1
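
For context, page_hstate() around this kernel version looked roughly like the
sketch below. Once the hugepage is released to the buddy allocator it is no
longer a compound page, so size_to_hstate() finds no matching hstate and
returns NULL; dequeue_hwpoisoned_huge_page() then dereferences that NULL
hstate (presumably the 0x58 offset seen in the oops):

static inline struct hstate *page_hstate(struct page *page)
{
	VM_BUG_ON(!PageHuge(page));
	return size_to_hstate(PAGE_SIZE << compound_order(page));
}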




Re: [PATCH v2] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully

2013-12-12 Thread Jianguo Wu
Hi,

On 2013/12/13 10:32, Naoya Horiguchi wrote:

> On Fri, Dec 13, 2013 at 09:09:52AM +0800, Jianguo Wu wrote:
>> After a successful hugetlb page migration by soft offline, the source page
>> will either be freed into hugepage_freelists or buddy(over-commit page). If 
>> page is in
>> buddy, page_hstate(page) will be NULL. It will hit a NULL pointer
>> dereference in dequeue_hwpoisoned_huge_page().
>>
>> [  890.677918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
>> [  890.685741] IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
>> [  890.692861] PGD c23762067 PUD c24be2067 PMD 0
>> [  890.697314] Oops:  [#1] SMP
>>
>> So check PageHuge(page) after call migrate_pages() successfully.
>>
>> Tested-by: Naoya Horiguchi 
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Jianguo Wu 
>> ---
>>  mm/memory-failure.c | 19 ++-
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b7c1716..e5567f2 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned 
>> long pfn, int flags)
>>  
>>  static int soft_offline_huge_page(struct page *page, int flags)
>>  {
>> -int ret;
>> +int ret, i;
>> +unsigned long nr_pages;
>>  unsigned long pfn = page_to_pfn(page);
>>  struct page *hpage = compound_head(page);
>>  LIST_HEAD(pagelist);
>> @@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, 
>> int flags)
>>  }
>>  unlock_page(hpage);
>>  
>> +nr_pages = 1 << compound_order(hpage);
>> +
>>  /* Keep page count to indicate a given hugepage is isolated. */
>>  list_move(&hpage->lru, &pagelist);
>>  ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
>> @@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, 
>> int flags)
>>  if (ret > 0)
>>  ret = -EIO;
>>  } else {
>> -set_page_hwpoison_huge_page(hpage);
>> -dequeue_hwpoisoned_huge_page(hpage);
>> -atomic_long_add(1 << compound_order(hpage),
>> -&num_poisoned_pages);
>> +/* overcommit hugetlb page will be freed to buddy */
>> +if (PageHuge(page)) {
>> +set_page_hwpoison_huge_page(hpage);
>> +dequeue_hwpoisoned_huge_page(hpage);
>> +} else {
>> +for (i = 0; i < nr_pages; i++)
>> +SetPageHWPoison(hpage + i);
> 
> Why don't you set PageHWPoison only on the error raw page instead
> of the whole error hugepage, or is there some problem of doing so?
> 

Oh, yes, we should only poison the error raw page. I will resend a new version.

Thanks,
Jianguo Wu

> Thanks,
> Naoya
> 
>> +}
>> +
>> +atomic_long_add(nr_pages, &num_poisoned_pages);
>>  }
>>  return ret;
>>  }
>> -- 
>> 1.8.2.2
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: em...@kvack.org
>>
> 
> .
> 





[PATCH v2] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully

2013-12-12 Thread Jianguo Wu
After a successful hugetlb page migration by soft offline, the source page
will either be freed into hugepage_freelists or buddy(over-commit page). If 
page is in
buddy, page_hstate(page) will be NULL. It will hit a NULL pointer
dereference in dequeue_hwpoisoned_huge_page().

[  890.677918] BUG: unable to handle kernel NULL pointer dereference at
 0058
[  890.685741] IP: [81163761]
dequeue_hwpoisoned_huge_page+0x131/0x1d0
[  890.692861] PGD c23762067 PUD c24be2067 PMD 0
[  890.697314] Oops:  [#1] SMP

So check PageHuge(page) after call migrate_pages() successfully.

Tested-by: Naoya Horiguchi n-horigu...@ah.jp.nec.com
Cc: sta...@vger.kernel.org
Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/memory-failure.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b7c1716..e5567f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned long 
pfn, int flags)
 
 static int soft_offline_huge_page(struct page *page, int flags)
 {
-   int ret;
+   int ret, i;
+   unsigned long nr_pages;
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_head(page);
LIST_HEAD(pagelist);
@@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, int 
flags)
}
unlock_page(hpage);
 
+   nr_pages = 1 << compound_order(hpage);
+
/* Keep page count to indicate a given hugepage is isolated. */
list_move(&hpage->lru, &pagelist);
ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
@@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, 
int flags)
if (ret > 0)
ret = -EIO;
} else {
-   set_page_hwpoison_huge_page(hpage);
-   dequeue_hwpoisoned_huge_page(hpage);
-   atomic_long_add(1 << compound_order(hpage),
-   &num_poisoned_pages);
+   /* overcommit hugetlb page will be freed to buddy */
+   if (PageHuge(page)) {
+   set_page_hwpoison_huge_page(hpage);
+   dequeue_hwpoisoned_huge_page(hpage);
+   } else {
+   for (i = 0; i < nr_pages; i++)
+   SetPageHWPoison(hpage + i);
+   }
+
+   atomic_long_add(nr_pages, &num_poisoned_pages);
}
return ret;
 }
-- 
1.8.2.2




Re: [PATCH] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfull

2013-12-12 Thread Jianguo Wu
Hi Naoya,

On 2013/12/13 1:39, Naoya Horiguchi wrote:

> (Cced: Chen Gong)
> 
> I confirmed that this patch fixes the reported bug.
> And I'll send a test patch for mce-test later privately.
> 
> Tested-by: Naoya Horiguchi n-horigu...@ah.jp.nec.com
> 
> Jianguo, could you put "Cc: sta...@vger.kernel.org"
> in patch description?
> And please fix a typo in subject line.
> 

OK, thanks for your tested!

Thanks,
Jianguo Wu

> Thanks,
> Naoya Horiguchi
> 
> On Thu, Dec 12, 2013 at 09:14:05PM +0800, Jianguo Wu wrote:
>> After a successful hugetlb page migration by soft offline, the source page
>> will either be freed into hugepage_freelists or buddy(over-commit page). If 
>> page is in
>> buddy, page_hstate(page) will be NULL. It will hit a NULL pointer
>> dereference in dequeue_hwpoisoned_huge_page().
>>
>> [  890.677918] BUG: unable to handle kernel NULL pointer dereference at
>>  0058
>> [  890.685741] IP: [81163761]
>> dequeue_hwpoisoned_huge_page+0x131/0x1d0
>> [  890.692861] PGD c23762067 PUD c24be2067 PMD 0
>> [  890.697314] Oops:  [#1] SMP
>>
>> So check PageHuge(page) after call migrate_pages() successfull.
>>
>> Signed-off-by: Jianguo Wu wujian...@huawei.com
>> ---
>>  mm/memory-failure.c | 19 ++-
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b7c1716..e5567f2 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned 
>> long pfn, int flags)
>>  
>>  static int soft_offline_huge_page(struct page *page, int flags)
>>  {
>> -int ret;
>> +int ret, i;
>> +unsigned long nr_pages;
>>  unsigned long pfn = page_to_pfn(page);
>>  struct page *hpage = compound_head(page);
>>  LIST_HEAD(pagelist);
>> @@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, 
>> int flags)
>>  }
>>  unlock_page(hpage);
>>  
>> +nr_pages = 1 << compound_order(hpage);
>> +
>>  /* Keep page count to indicate a given hugepage is isolated. */
>>  list_move(&hpage->lru, &pagelist);
>>  ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
>> @@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, 
>> int flags)
>>  if (ret > 0)
>>  ret = -EIO;
>>  } else {
>> -set_page_hwpoison_huge_page(hpage);
>> -dequeue_hwpoisoned_huge_page(hpage);
>> -atomic_long_add(1 << compound_order(hpage),
>> -&num_poisoned_pages);
>> +/* over-commit hugetlb page will be freed into buddy */
>> +if (PageHuge(page)) {
>> +set_page_hwpoison_huge_page(hpage);
>> +dequeue_hwpoisoned_huge_page(hpage);
>> +} else {
>> +for (i = 0; i < nr_pages; i++)
>> +SetPageHWPoison(hpage + i);
>> +}
>> +
>> +atomic_long_add(nr_pages, &num_poisoned_pages);
>>  }
>>  return ret;
>>  }
>> -- 
>> 1.8.2.2
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: em...@kvack.org
>>
> 
> .
> 





[PATCH] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfull

2013-12-12 Thread Jianguo Wu
After a successful hugetlb page migration by soft offline, the source page
will either be freed into hugepage_freelists or buddy(over-commit page). If 
page is in
buddy, page_hstate(page) will be NULL. It will hit a NULL pointer
dereference in dequeue_hwpoisoned_huge_page().

[  890.677918] BUG: unable to handle kernel NULL pointer dereference at
 0058
[  890.685741] IP: [81163761]
dequeue_hwpoisoned_huge_page+0x131/0x1d0
[  890.692861] PGD c23762067 PUD c24be2067 PMD 0
[  890.697314] Oops:  [#1] SMP

So check PageHuge(page) after call migrate_pages() successfull.

Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/memory-failure.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b7c1716..e5567f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1471,7 +1471,8 @@ static int get_any_page(struct page *page, unsigned long 
pfn, int flags)
 
 static int soft_offline_huge_page(struct page *page, int flags)
 {
-   int ret;
+   int ret, i;
+   unsigned long nr_pages;
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_head(page);
LIST_HEAD(pagelist);
@@ -1489,6 +1490,8 @@ static int soft_offline_huge_page(struct page *page, int 
flags)
}
unlock_page(hpage);
 
+   nr_pages = 1 << compound_order(hpage);
+
/* Keep page count to indicate a given hugepage is isolated. */
list_move(&hpage->lru, &pagelist);
ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
@@ -1505,10 +1508,16 @@ static int soft_offline_huge_page(struct page *page, 
int flags)
if (ret > 0)
ret = -EIO;
} else {
-   set_page_hwpoison_huge_page(hpage);
-   dequeue_hwpoisoned_huge_page(hpage);
-   atomic_long_add(1 << compound_order(hpage),
-   &num_poisoned_pages);
+   /* over-commit hugetlb page will be freed into buddy */
+   if (PageHuge(page)) {
+   set_page_hwpoison_huge_page(hpage);
+   dequeue_hwpoisoned_huge_page(hpage);
+   } else {
+   for (i = 0; i < nr_pages; i++)
+   SetPageHWPoison(hpage + i);
+   }
+
+   atomic_long_add(nr_pages, &num_poisoned_pages);
}
return ret;
 }
-- 
1.8.2.2


Re: [PATCH] mm: do_mincore() cleanup

2013-12-05 Thread Jianguo Wu
On 2013/12/5 22:39, Naoya Horiguchi wrote:

> On Thu, Dec 05, 2013 at 04:52:52PM +0800, Jianguo Wu wrote:
>> Two cleanups:
>> 1. remove redundant codes for hugetlb pages.
>> 2. end = pmd_addr_end(addr, end) restricts [addr, end) within PMD_SIZE,
>>this may increase do_mincore() calls, remove it.
>>
>> Signed-off-by: Jianguo Wu wujian...@huawei.com
> 
> Reviewed-by: Naoya Horiguchi n-horigu...@ah.jp.nec.com

Hi Naoya, thanks for your review!

Jianguo Wu

> 
> Thanks!
> 
> Naoya
> 
>> ---
>>  mm/mincore.c |7 ---
>>  1 files changed, 0 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/mincore.c b/mm/mincore.c
>> index da2be56..1016233 100644
>> --- a/mm/mincore.c
>> +++ b/mm/mincore.c
>> @@ -225,13 +225,6 @@ static long do_mincore(unsigned long addr, unsigned 
>> long pages, unsigned char *v
>>  
>>  end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
>>  
>> -if (is_vm_hugetlb_page(vma)) {
>> -mincore_hugetlb_page_range(vma, addr, end, vec);
>> -return (end - addr) >> PAGE_SHIFT;
>> -}
>> -
>> -end = pmd_addr_end(addr, end);
>> -
>>  if (is_vm_hugetlb_page(vma))
>>  mincore_hugetlb_page_range(vma, addr, end, vec);
>>  else
>> -- 
>> 1.7.1
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: em...@kvack.org
>>
> 
> .
> 





[PATCH] mm: do_mincore() cleanup

2013-12-05 Thread Jianguo Wu
Two cleanups:
1. remove redundant codes for hugetlb pages.
2. end = pmd_addr_end(addr, end) restricts [addr, end) within PMD_SIZE,
   this may increase do_mincore() calls, remove it.

Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/mincore.c |7 ---
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index da2be56..1016233 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -225,13 +225,6 @@ static long do_mincore(unsigned long addr, unsigned long 
pages, unsigned char *v
 
end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
 
-   if (is_vm_hugetlb_page(vma)) {
-   mincore_hugetlb_page_range(vma, addr, end, vec);
-   return (end - addr) >> PAGE_SHIFT;
-   }
-
-   end = pmd_addr_end(addr, end);
-
if (is_vm_hugetlb_page(vma))
mincore_hugetlb_page_range(vma, addr, end, vec);
else
-- 
1.7.1
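
(The second cleanup is easiest to see with the clamping written out.
A sketch of pmd_addr_end() semantics, assuming x86_64 with 4 KiB pages
so PMD_SIZE is 2 MiB; the addresses are illustrative only.)

#include <stdio.h>

#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))

/* clamp 'end' to the next PMD boundary, as the removed line did */
static unsigned long pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

int main(void)
{
	unsigned long addr = 0x100000;			/* 1 MiB */
	unsigned long end  = addr + (10UL << 20);	/* +10 MiB */

	/* clipped to 0x200000: sys_mincore() must call do_mincore() again */
	printf("end clipped from %#lx to %#lx\n", end, pmd_addr_end(addr, end));
	return 0;
}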


[Resend with ACK][PATCH] mm/arch: use NUMA_NO_NODE

2013-09-23 Thread Jianguo Wu
Use more appropriate NUMA_NO_NODE instead of -1 in all archs' module_alloc()

Signed-off-by: Jianguo Wu wujian...@huawei.com
Acked-by: Ralf Baechle r...@linux-mips.org
---
 arch/arm/kernel/module.c|2 +-
 arch/arm64/kernel/module.c  |2 +-
 arch/mips/kernel/module.c   |2 +-
 arch/parisc/kernel/module.c |2 +-
 arch/s390/kernel/module.c   |2 +-
 arch/sparc/kernel/module.c  |2 +-
 arch/x86/kernel/module.c|2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 85c3fb6..8f4cff3 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -40,7 +40,7 @@
 void *module_alloc(unsigned long size)
 {
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
+   GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 #endif
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index ca0e3d5..8f898bd 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -29,7 +29,7 @@
 void *module_alloc(unsigned long size)
 {
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
+   GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 977a623..b507e07 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(dbe_lock);
 void *module_alloc(unsigned long size)
 {
return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
-   GFP_KERNEL, PAGE_KERNEL, -1,
+   GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 #endif
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index 2a625fb..50dfafc 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -219,7 +219,7 @@ void *module_alloc(unsigned long size)
 * init_data correctly */
return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
GFP_KERNEL | __GFP_HIGHMEM,
-   PAGE_KERNEL_RWX, -1,
+   PAGE_KERNEL_RWX, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 7845e15..b89b591 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -50,7 +50,7 @@ void *module_alloc(unsigned long size)
if (PAGE_ALIGN(size) > MODULES_LEN)
return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, -1,
+   GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 #endif
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 4435488..97655e0 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -29,7 +29,7 @@ static void *module_map(unsigned long size)
if (PAGE_ALIGN(size) > MODULES_LEN)
return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, -1,
+   GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 #else
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 216a4d7..18be189 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -49,7 +49,7 @@ void *module_alloc(unsigned long size)
return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
-   -1, __builtin_return_address(0));
+   NUMA_NO_NODE, __builtin_return_address(0));
 }
 
 #ifdef CONFIG_X86_32
-- 
1.7.1
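
(The change is purely cosmetic: NUMA_NO_NODE is (-1) in
include/linux/numa.h, so behaviour is identical and only the call
sites become self-documenting. A toy illustration follows;
alloc_on_node() is invented for the demo and is not a kernel API.)

#include <stdio.h>

#define NUMA_NO_NODE	(-1)	/* matches include/linux/numa.h */

static void alloc_on_node(unsigned long size, int nid)
{
	if (nid == NUMA_NO_NODE)
		printf("alloc %lu bytes from any node\n", size);
	else
		printf("alloc %lu bytes on node %d\n", size, nid);
}

int main(void)
{
	alloc_on_node(4096, NUMA_NO_NODE);	/* reads better than bare -1 */
	alloc_on_node(4096, 0);
	return 0;
}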


Re: [PATCH] mm/ksm: return NULL when doesn't get mergeable page

2013-09-21 Thread Jianguo Wu
On 2013/9/19 16:33, Petr Holasek wrote:

> On Mon, 16 Sep 2013, Jianguo Wu wrote:
>> In get_mergeable_page() local variable page is not initialized,
>> it may hold a garbage value, when find_mergeable_vma() return NULL,
>> get_mergeable_page() may return a garbage value to the caller.
>>
>> So initialize page as NULL.
>>
>> Signed-off-by: Jianguo Wu wujian...@huawei.com
>> ---
>>  mm/ksm.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index b6afe0c..87efbae 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -460,7 +460,7 @@ static struct page *get_mergeable_page(struct rmap_item 
>> *rmap_item)
>>  struct mm_struct *mm = rmap_item->mm;
>>  unsigned long addr = rmap_item->address;
>>  struct vm_area_struct *vma;
>> -struct page *page;
>> +struct page *page = NULL;
>>  
>>  down_read(&mm->mmap_sem);
>>  vma = find_mergeable_vma(mm, addr);
>> -- 
>> 1.7.1
>>
> 
> When find_mergeable_vma returned NULL, NULL is assigned to page in "out"
> statement.
> 

Oh, yes, thanks, Petr.

> 
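
(Petr's point, sketched standalone: the failure path jumps to an 'out'
label that itself assigns page = NULL, so the initializer is dead code.
The fake find_mergeable_vma() below always fails; illustration only.)

#include <stdio.h>
#include <stddef.h>

struct vm_area_struct;

static struct vm_area_struct *find_mergeable_vma(unsigned long addr)
{
	return NULL;	/* simulate the no-vma case */
}

static void *get_mergeable_page(unsigned long addr)
{
	struct vm_area_struct *vma;
	void *page;	/* uninitialized, as in the original */

	vma = find_mergeable_vma(addr);
	if (!vma)
		goto out;
	/* real code: follow_page(), PageAnon() checks, ... */
	return NULL;
out:
	page = NULL;	/* the 'out' assignment Petr refers to */
	return page;
}

int main(void)
{
	printf("page = %p\n", get_mergeable_page(0));
	return 0;
}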



[RESEND PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
Use more appropriate NUMA_NO_NODE instead of -1

Signed-off-by: Jianguo Wu wujian...@huawei.com
Acked-by: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
---
 mm/mempolicy.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4baf12e..4f0cd20 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1083,7 +1083,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
tmp = *from;
while (!nodes_empty(tmp)) {
int s,d;
-   int source = -1;
+   int source = NUMA_NO_NODE;
int dest = 0;
 
for_each_node_mask(s, tmp) {
@@ -1118,7 +1118,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
if (!node_isset(dest, tmp))
break;
}
-   if (source == -1)
+   if (source == NUMA_NO_NODE)
break;
 
node_clear(source, tmp);
@@ -1765,7 +1765,7 @@ static unsigned offset_il_node(struct mempolicy *pol,
unsigned nnodes = nodes_weight(pol->v.nodes);
unsigned target;
int c;
-   int nid = -1;
+   int nid = NUMA_NO_NODE;
 
if (!nnodes)
return numa_node_id();
@@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy 
*pol,
 
 /*
  * Return the bit number of a random bit set in the nodemask.
- * (returns -1 if nodemask is empty)
+ * (returns NUMA_NO_NODE if nodemask is empty)
  */
 int node_random(const nodemask_t *maskp)
 {
-   int w, bit = -1;
+   int w, bit = NUMA_NO_NODE;
 
w = nodes_weight(*maskp);
if (w)
-- 
1.7.1
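
(Of the hunks above, node_random() carries the one contract worth
spelling out: an empty mask yields NUMA_NO_NODE. A user-space sketch
with a plain bitmask standing in for nodemask_t; the helpers are
simplified, not the kernel's.)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUMA_NO_NODE	(-1)

typedef unsigned long nodemask_t;	/* one bit per node, toy version */

static int nodes_weight(nodemask_t mask)
{
	return __builtin_popcountl(mask);
}

/* pick a random set bit; NUMA_NO_NODE if the mask is empty */
static int node_random(nodemask_t mask)
{
	int w, bit = NUMA_NO_NODE;

	w = nodes_weight(mask);
	if (w) {
		int target = rand() % w;

		for (bit = 0; ; bit++) {
			if ((mask & (1UL << bit)) && target-- == 0)
				break;
		}
	}
	return bit;
}

int main(void)
{
	srand((unsigned)time(NULL));
	printf("empty mask  -> %d\n", node_random(0));		/* NUMA_NO_NODE */
	printf("mask 0b1010 -> node %d\n", node_random(0xA));	/* 1 or 3 */
	return 0;
}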




Re: [PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
On 2013/9/17 4:26, Cody P Schafer wrote:

> 
>> @@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct 
>> mempolicy *pol,
>>
>>   /*
>>* Return the bit number of a random bit set in the nodemask.
>> - * (returns -1 if nodemask is empty)
>> + * (returns NUMA_NO_NOD if nodemask is empty)
> 
> s/NUMA_NO_NOD/NUMA_NO_NODE/

> 

Thanks, I will resend this.

>>*/
>>   int node_random(const nodemask_t *maskp)
>>   {
>> -int w, bit = -1;
>> +int w, bit = NUMA_NO_NODE;
>>
>>   w = nodes_weight(*maskp);
>>   if (w)
>>
> 
> 
> 





Re: [PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
On 2013/9/17 0:19, KOSAKI Motohiro wrote:

> (9/16/13 8:53 AM), Jianguo Wu wrote:
>> Use more appropriate NUMA_NO_NODE instead of -1
>>
>> Signed-off-by: Jianguo Wu wujian...@huawei.com
>> ---
>>   mm/mempolicy.c |   10 +-
>>   1 files changed, 5 insertions(+), 5 deletions(-)
> 
> I think this patch don't make any functional change, right?
> 

Yes.

> Acked-by: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com

Thanks for your ack.

> 
> 
> 
> 





[PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
Use more appropriate NUMA_NO_NODE instead of -1

Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/mempolicy.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4baf12e..4f73025 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1083,7 +1083,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
tmp = *from;
while (!nodes_empty(tmp)) {
int s,d;
-   int source = -1;
+   int source = NUMA_NO_NODE;
int dest = 0;
 
for_each_node_mask(s, tmp) {
@@ -1118,7 +1118,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
if (!node_isset(dest, tmp))
break;
}
-   if (source == -1)
+   if (source == NUMA_NO_NODE)
break;
 
node_clear(source, tmp);
@@ -1765,7 +1765,7 @@ static unsigned offset_il_node(struct mempolicy *pol,
unsigned nnodes = nodes_weight(pol->v.nodes);
unsigned target;
int c;
-   int nid = -1;
+   int nid = NUMA_NO_NODE;
 
if (!nnodes)
return numa_node_id();
@@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy 
*pol,
 
 /*
  * Return the bit number of a random bit set in the nodemask.
- * (returns -1 if nodemask is empty)
+ * (returns NUMA_NO_NOD if nodemask is empty)
  */
 int node_random(const nodemask_t *maskp)
 {
-   int w, bit = -1;
+   int w, bit = NUMA_NO_NODE;
 
w = nodes_weight(*maskp);
if (w)
-- 
1.7.1




[PATCH] mm/ksm: return NULL when doesn't get mergeable page

2013-09-16 Thread Jianguo Wu
In get_mergeable_page() local variable page is not initialized,
it may hold a garbage value, when find_mergeable_vma() return NULL,
get_mergeable_page() may return a garbage value to the caller.

So initialize page as NULL.

Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/ksm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index b6afe0c..87efbae 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -460,7 +460,7 @@ static struct page *get_mergeable_page(struct rmap_item 
*rmap_item)
struct mm_struct *mm = rmap_item->mm;
unsigned long addr = rmap_item->address;
struct vm_area_struct *vma;
-   struct page *page;
+   struct page *page = NULL;
 
down_read(&mm->mmap_sem);
vma = find_mergeable_vma(mm, addr);
-- 
1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mm/ksm: return NULL when doesn't get mergeable page

2013-09-16 Thread Jianguo Wu
In get_mergeable_page() local variable page is not initialized,
it may hold a garbage value, when find_mergeable_vma() return NULL,
get_mergeable_page() may return a garbage value to the caller.

So initialize page as NULL.

Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/ksm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index b6afe0c..87efbae 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -460,7 +460,7 @@ static struct page *get_mergeable_page(struct rmap_item 
*rmap_item)
struct mm_struct *mm = rmap_item-mm;
unsigned long addr = rmap_item-address;
struct vm_area_struct *vma;
-   struct page *page;
+   struct page *page = NULL;
 
down_read(mm-mmap_sem);
vma = find_mergeable_vma(mm, addr);
-- 
1.7.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
Use more appropriate NUMA_NO_NODE instead of -1

Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/mempolicy.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4baf12e..4f73025 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1083,7 +1083,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
tmp = *from;
while (!nodes_empty(tmp)) {
int s,d;
-   int source = -1;
+   int source = NUMA_NO_NODE;
int dest = 0;
 
for_each_node_mask(s, tmp) {
@@ -1118,7 +1118,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
if (!node_isset(dest, tmp))
break;
}
-   if (source == -1)
+   if (source == NUMA_NO_NODE)
break;
 
node_clear(source, tmp);
@@ -1765,7 +1765,7 @@ static unsigned offset_il_node(struct mempolicy *pol,
unsigned nnodes = nodes_weight(pol-v.nodes);
unsigned target;
int c;
-   int nid = -1;
+   int nid = NUMA_NO_NODE;
 
if (!nnodes)
return numa_node_id();
@@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy 
*pol,
 
 /*
  * Return the bit number of a random bit set in the nodemask.
- * (returns -1 if nodemask is empty)
+ * (returns NUMA_NO_NOD if nodemask is empty)
  */
 int node_random(const nodemask_t *maskp)
 {
-   int w, bit = -1;
+   int w, bit = NUMA_NO_NODE;
 
w = nodes_weight(*maskp);
if (w)
-- 
1.7.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
On 2013/9/17 0:19, KOSAKI Motohiro wrote:

 (9/16/13 8:53 AM), Jianguo Wu wrote:
 Use more appropriate NUMA_NO_NODE instead of -1

 Signed-off-by: Jianguo Wu wujian...@huawei.com
 ---
   mm/mempolicy.c |   10 +-
   1 files changed, 5 insertions(+), 5 deletions(-)
 
 I think this patch don't make any functional change, right?
 

Yes.

 Acked-by: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com

Thanks for your ack.

 
 
 
 





Re: [PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
On 2013/9/17 4:26, Cody P Schafer wrote:

>> @@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
>>
>>  /*
>>   * Return the bit number of a random bit set in the nodemask.
>> - * (returns -1 if nodemask is empty)
>> + * (returns NUMA_NO_NOD if nodemask is empty)
>
> s/NUMA_NO_NOD/NUMA_NO_NODE/

Thanks, I will resend this.

>>   */
>>  int node_random(const nodemask_t *maskp)
>>  {
>> -	int w, bit = -1;
>> +	int w, bit = NUMA_NO_NODE;
>>
>>  	w = nodes_weight(*maskp);
>>  	if (w)





[RESEND PATCH] mm/mempolicy: use NUMA_NO_NODE

2013-09-16 Thread Jianguo Wu
Use the more appropriate NUMA_NO_NODE instead of -1

Signed-off-by: Jianguo Wu wujian...@huawei.com
Acked-by: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
---
 mm/mempolicy.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4baf12e..4f0cd20 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1083,7 +1083,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
tmp = *from;
while (!nodes_empty(tmp)) {
int s,d;
-   int source = -1;
+   int source = NUMA_NO_NODE;
int dest = 0;
 
for_each_node_mask(s, tmp) {
@@ -1118,7 +1118,7 @@ int do_migrate_pages(struct mm_struct *mm, const 
nodemask_t *from,
if (!node_isset(dest, tmp))
break;
}
-   if (source == -1)
+   if (source == NUMA_NO_NODE)
break;
 
node_clear(source, tmp);
@@ -1765,7 +1765,7 @@ static unsigned offset_il_node(struct mempolicy *pol,
	unsigned nnodes = nodes_weight(pol->v.nodes);
unsigned target;
int c;
-   int nid = -1;
+   int nid = NUMA_NO_NODE;
 
if (!nnodes)
return numa_node_id();
@@ -1802,11 +1802,11 @@ static inline unsigned interleave_nid(struct mempolicy 
*pol,
 
 /*
  * Return the bit number of a random bit set in the nodemask.
- * (returns -1 if nodemask is empty)
+ * (returns NUMA_NO_NODE if nodemask is empty)
  */
 int node_random(const nodemask_t *maskp)
 {
-   int w, bit = -1;
+   int w, bit = NUMA_NO_NODE;
 
w = nodes_weight(*maskp);
if (w)
-- 
1.7.1




Re: [PATCH v3 1/5] memblock: Introduce allocation direction to memblock.

2013-09-13 Thread Jianguo Wu
Hi Tang,

On 2013/9/13 17:30, Tang Chen wrote:

> The Linux kernel cannot migrate pages used by the kernel. As a result, kernel
> pages cannot be hot-removed. So we cannot allocate hotpluggable memory for
> the kernel.
> 
> ACPI SRAT (System Resource Affinity Table) contains the memory hotplug info.
> But before SRAT is parsed, memblock has already started to allocate memory
> for the kernel. So we need to prevent memblock from doing this.
> 
> In a memory hotplug system, any numa node the kernel resides in should
> be unhotpluggable. And for a modern server, each node could have at least
> 16GB memory. So memory around the kernel image is highly likely 
> unhotpluggable.
> 
> So the basic idea is: Allocate memory from the end of the kernel image and
> to the higher memory. Since memory allocation before SRAT is parsed won't
> be too much, it could highly likely be in the same node with kernel image.
> 
> The current memblock can only allocate memory from high address to low.
> So this patch introduces the allocation direct to memblock. It could be
> used to tell memblock to allocate memory from high to low or from low
> to high.
> 
> Signed-off-by: Tang Chen 
> Reviewed-by: Zhang Yanfei 
> ---
>  include/linux/memblock.h |   22 ++
>  mm/memblock.c|   13 +
>  2 files changed, 35 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 31e95ac..a7d3436 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -19,6 +19,11 @@
>  
>  #define INIT_MEMBLOCK_REGIONS128
>  
> +/* Allocation order. */

s/order/direction/

> +#define MEMBLOCK_DIRECTION_HIGH_TO_LOW   0
> +#define MEMBLOCK_DIRECTION_LOW_TO_HIGH   1
> +#define MEMBLOCK_DIRECTION_DEFAULT   MEMBLOCK_DIRECTION_HIGH_TO_LOW
> +
>  struct memblock_region {
>   phys_addr_t base;
>   phys_addr_t size;
> @@ -35,6 +40,7 @@ struct memblock_type {
>  };
>  
>  struct memblock {
> + int current_direction;  /* allocate from higher or lower address */
>   phys_addr_t current_limit;
>   struct memblock_type memory;
>   struct memblock_type reserved;
> @@ -148,6 +154,12 @@ phys_addr_t memblock_alloc_try_nid(phys_addr_t size, 
> phys_addr_t align, int nid)
>  
>  phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
>  
> +static inline bool memblock_direction_bottom_up(void)
> +{
> + return memblock.current_direction == MEMBLOCK_DIRECTION_LOW_TO_HIGH;
> +}
> +
> +
>  /* Flags for memblock_alloc_base() amd __memblock_alloc_base() */
>  #define MEMBLOCK_ALLOC_ANYWHERE  (~(phys_addr_t)0)
>  #define MEMBLOCK_ALLOC_ACCESSIBLE0
> @@ -175,6 +187,16 @@ static inline void memblock_dump_all(void)
>  }
>  
>  /**
> + * memblock_set_current_direction - Set current allocation direction to allow
> + *  allocating memory from higher to lower
> + *  address or from lower to higher address
> + *
> + * @direction: In which order to allocate memory. Could be

s/order/direction/

> + * MEMBLOCK_DIRECTION_{HIGH_TO_LOW|LOW_TO_HIGH}
> + */
> +void memblock_set_current_direction(int direction);
> +
> +/**
>   * memblock_set_current_limit - Set the current allocation limit to allow
>   * limiting allocations to what is currently
>   * accessible during boot
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 0ac412a..f24ca2e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -32,6 +32,7 @@ struct memblock memblock __initdata_memblock = {
>   .reserved.cnt   = 1,/* empty dummy entry */
>   .reserved.max   = INIT_MEMBLOCK_REGIONS,
>  
> + .current_direction  = MEMBLOCK_DIRECTION_DEFAULT,
>   .current_limit  = MEMBLOCK_ALLOC_ANYWHERE,
>  };
>  
> @@ -995,6 +996,18 @@ void __init_memblock memblock_trim_memory(phys_addr_t 
> align)
>   }
>  }
>  
> +void __init_memblock memblock_set_current_direction(int direction)
> +{
> + if (direction != MEMBLOCK_DIRECTION_HIGH_TO_LOW &&
> + direction != MEMBLOCK_DIRECTION_LOW_TO_HIGH) {
> + pr_warn("memblock: Failed to set allocation order. "
> + "Invalid order type: %d\n", direction);

s/order/direction/

> + return;
> + }
> +
> + memblock.current_direction = direction;
> +}
> +
>  void __init_memblock memblock_set_current_limit(phys_addr_t limit)
>  {
>   memblock.current_limit = limit;
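
For illustration, the proposed interface would presumably be used around
SRAT parsing roughly as follows. This is only a sketch against the API
quoted above (hypothetical caller and call site), not code from the
patch set:

/* Sketch: allocate bottom-up, near the kernel image, until the SRAT has
 * been parsed and the hotpluggable ranges are known. */
static void __init early_alloc_example(void)
{
	phys_addr_t addr;

	memblock_set_current_direction(MEMBLOCK_DIRECTION_LOW_TO_HIGH);
	addr = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
	/* addr now likely lies in the same node as the kernel image */

	/* ... parse SRAT, mark hotpluggable regions ... */

	/* restore the default top-down behaviour */
	memblock_set_current_direction(MEMBLOCK_DIRECTION_HIGH_TO_LOW);
}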





Re: [PATCH v2] mm/thp: fix stale comments of transparent_hugepage_flags

2013-09-05 Thread Jianguo Wu
Hi Wanpeng,
Thanks for your review, but this patch has a minor format problem; please see below.
Please review the resend one, thanks.

Thanks,
Jianguo Wu

On 2013/9/5 16:09, Wanpeng Li wrote:

> On Thu, Sep 05, 2013 at 03:57:47PM +0800, Jianguo Wu wrote:
>> Changelog:
>> *v1 -> v2: also update the stale comments about default transparent
>> hugepage support pointed by Wanpeng Li.
>>
>> Since commit 13ece886d9(thp: transparent hugepage config choice),
>> transparent hugepage support is disabled by default, and
>> TRANSPARENT_HUGEPAGE_ALWAYS is configured when TRANSPARENT_HUGEPAGE=y.
>>
>> And since commit d39d33c332(thp: enable direct defrag), defrag is
>> enable for all transparent hugepage page faults by default, not only in
>> MADV_HUGEPAGE regions.
>>
> 
> Reviewed-by: Wanpeng Li 
> 
>> Signed-off-by: Jianguo Wu 
>> ---
>> mm/huge_memory.c |   12 ++--
>> 1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a92012a..0e42a70 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -26,12 +26,12 @@
>> #include 
>> #include "internal.h"
>>
>> -/*
>> - * By default transparent hugepage support is enabled for all mappings
>> - * and khugepaged scans all mappings. Defrag is only invoked by
>> - * khugepaged hugepage allocations and by page faults inside
>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>> - * allocations.
>> +/* By default transparent hugepage support is disabled in order that avoid

Should be:
+/*
+ * By default transparent hugepage support is disabled in order that avoid

Please review the resend one.

Thanks.

>> + * to risk increase the memory footprint of applications without a 
>> guaranteed
>> + * benefit. When transparent hugepage support is enabled, is for all 
>> mappings,
>> + * and khugepaged scans all mappings.
>> + * Defrag is invoked by khugepaged hugepage allocations and by page faults
>> + * for all hugepage allocations.
>>  */
>> unsigned long transparent_hugepage_flags __read_mostly =
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>> -- 
>> 1.7.1
>>
> 
> 
> 





[PATCH v2][RESEND] mm/thp: fix stale comments of transparent_hugepage_flags

2013-09-05 Thread Jianguo Wu
Changelog:
 *v1 -> v2: also update the stale comments about default transparent
hugepage support pointed out by Wanpeng Li.

Since commit 13ece886d9(thp: transparent hugepage config choice),
transparent hugepage support is disabled by default, and
TRANSPARENT_HUGEPAGE_ALWAYS is configured when TRANSPARENT_HUGEPAGE=y.

And since commit d39d33c332(thp: enable direct defrag), defrag is
enabled for all transparent hugepage page faults by default, not only
in MADV_HUGEPAGE regions.

Signed-off-by: Jianguo Wu 
---
 mm/huge_memory.c |   11 ++-
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..90ce6de 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -27,11 +27,12 @@
 #include "internal.h"
 
 /*
- * By default transparent hugepage support is enabled for all mappings
- * and khugepaged scans all mappings. Defrag is only invoked by
- * khugepaged hugepage allocations and by page faults inside
- * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
- * allocations.
+ * By default transparent hugepage support is disabled in order that avoid
+ * to risk increase the memory footprint of applications without a guaranteed
+ * benefit. When transparent hugepage support is enabled, is for all mappings,
+ * and khugepaged scans all mappings.
+ * Defrag is invoked by khugepaged hugepage allocations and by page faults
+ * for all hugepage allocations.
  */
 unsigned long transparent_hugepage_flags __read_mostly =
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
-- 
1.7.1
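
For context, the initializer this comment sits above looks roughly like
the following in 3.x-era mm/huge_memory.c (sketched from memory, so the
exact flag set may differ between versions). The Kconfig choice only
selects which enable bit is set by default; the defrag bits are set
unconditionally, which is what the corrected comment describes:

unsigned long transparent_hugepage_flags __read_mostly =
#ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
	(1<<TRANSPARENT_HUGEPAGE_FLAG)|
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE_MADVISE
	(1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)|
#endif
	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_FLAG)|
	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);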




[PATCH v2] mm/thp: fix stale comments of transparent_hugepage_flags

2013-09-05 Thread Jianguo Wu
Changelog:
 *v1 -> v2: also update the stale comments about default transparent
hugepage support pointed out by Wanpeng Li.

Since commit 13ece886d9(thp: transparent hugepage config choice),
transparent hugepage support is disabled by default, and
TRANSPARENT_HUGEPAGE_ALWAYS is configured when TRANSPARENT_HUGEPAGE=y.

And since commit d39d33c332(thp: enable direct defrag), defrag is
enabled for all transparent hugepage page faults by default, not only
in MADV_HUGEPAGE regions.

Signed-off-by: Jianguo Wu 
---
 mm/huge_memory.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..0e42a70 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -26,12 +26,12 @@
 #include 
 #include "internal.h"
 
-/*
- * By default transparent hugepage support is enabled for all mappings
- * and khugepaged scans all mappings. Defrag is only invoked by
- * khugepaged hugepage allocations and by page faults inside
- * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
- * allocations.
+/* By default transparent hugepage support is disabled in order that avoid
+ * to risk increase the memory footprint of applications without a guaranteed
+ * benefit. When transparent hugepage support is enabled, is for all mappings,
+ * and khugepaged scans all mappings.
+ * Defrag is invoked by khugepaged hugepage allocations and by page faults
+ * for all hugepage allocations.
  */
 unsigned long transparent_hugepage_flags __read_mostly =
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
-- 
1.7.1




Re: [PATCH] mm/thp: fix comments in transparent_hugepage_flags

2013-09-05 Thread Jianguo Wu
On 2013/9/5 12:58, Wanpeng Li wrote:

> Hi Jianguo,
> On Thu, Sep 05, 2013 at 11:54:00AM +0800, Jianguo Wu wrote:
>> On 2013/9/5 11:37, Wanpeng Li wrote:
>>
>>> On Thu, Sep 05, 2013 at 11:04:22AM +0800, Jianguo Wu wrote:
>>>> Hi Wanpeng,
>>>>
>>>> On 2013/9/5 10:11, Wanpeng Li wrote:
>>>>
>>>>> Hi Jianguo,
>>>>> On Wed, Sep 04, 2013 at 09:30:22PM +0800, Jianguo Wu wrote:
>>>>>> Since commit d39d33c332(thp: enable direct defrag), defrag is enable
>>>>>> for all transparent hugepage page faults by default, not only in
>>>>>> MADV_HUGEPAGE regions.
>>>>>>
>>>>>> Signed-off-by: Jianguo Wu 
>>>>>> ---
>>>>>> mm/huge_memory.c | 6 ++
>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index a92012a..abf047e 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -28,10 +28,8 @@
>>>>>>
>>>>>> /*
>>>>>>  * By default transparent hugepage support is enabled for all mappings
>>>>>
>>>>> This is also stale. TRANSPARENT_HUGEPAGE_ALWAYS is not configured by 
>>>>> default in
>>>>> order that avoid to risk increase the memory footprint of applications 
>>>>> w/o a 
>>>>> guaranteed benefit.
>>>>>
>>>>
>>>> Right, how about this:
>>>>
>>>> By default transparent hugepage support is disabled in order that avoid to 
>>>> risk
>>>
>>> I don't think it's disabled. TRANSPARENT_HUGEPAGE_MADVISE is configured
>>> by default.
>>>
>>
>> Hi Wanpeng,
>>
>> We have TRANSPARENT_HUGEPAGE and 
>> TRANSPARENT_HUGEPAGE_ALWAYS/TRANSPARENT_HUGEPAGE_MADVISE,
>> TRANSPARENT_HUGEPAGE_ALWAYS or TRANSPARENT_HUGEPAGE_MADVISE is configured 
>> only if TRANSPARENT_HUGEPAGE
>> is configured.
>>
>> By default, TRANSPARENT_HUGEPAGE=n, and TRANSPARENT_HUGEPAGE_ALWAYS is 
>> configured when TRANSPARENT_HUGEPAGE=y.
>>
>> commit 13ece886d9(thp: transparent hugepage config choice):
>>
>> config TRANSPARENT_HUGEPAGE
>> -   bool "Transparent Hugepage Support" if EMBEDDED
>> +   bool "Transparent Hugepage Support"
>>depends on X86 && MMU
>> -   default y
>>
>> +choice
>> +   prompt "Transparent Hugepage Support sysfs defaults"
>> +   depends on TRANSPARENT_HUGEPAGE
>> +   default TRANSPARENT_HUGEPAGE_ALWAYS
>>
> 
> mmotm tree:
> 
> grep 'TRANSPARENT_HUGEPAGE' .config
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
> CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
> 
> distro:
> 
> grep 'TRANSPARENT_HUGEPAGE' config-3.8.0-26-generic 
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
> CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
> 

Hi Wanpeng,

I'm a little confused: in mm/Kconfig, TRANSPARENT_HUGEPAGE is not configured by
default.

and in x86_64, linus tree:

$make defconfig
$grep 'TRANSPARENT_HUGEPAGE' .config
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set

Do I misunderstand something here?

Thanks

> 
>> Thanks,
>> Jianguo Wu
>>
>>> Regards,
>>> Wanpeng Li 
>>>
>>>> increase the memory footprint of applications w/o a guaranteed benefit, and
>>>> khugepaged scans all mappings when transparent hugepage enabled.
>>>> Defrag is invoked by khugepaged hugepage allocations and by page faults 
>>>> for all
>>>> hugepage allocations.
>>>>
>>>> Thanks,
>>>> Jianguo Wu
>>>>
>>>>> Regards,
>>>>> Wanpeng Li 
>>>>>
>>>>>> - * and khugepaged scans all mappings. Defrag is only invoked by
>>>>>> - * khugepaged hugepage allocations and by page faults inside
>>>>>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>>>>>> - * allocations.
>>>>>> + * and khugepaged scans all mappings. Defrag is invoked by khugepaged
>>>>>> + * hugepage allocations and by page faults for all hugepage allocations.
>>>>>>  */
>>>>>> unsigned long transparent_hugepage_flags __read_mostly =
>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>>>>>> -- 
>>>>>> 1.8.1.2
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>>
> 
> 
> 







Re: [PATCH] mm/thp: fix comments in transparent_hugepage_flags

2013-09-04 Thread Jianguo Wu
On 2013/9/5 11:37, Wanpeng Li wrote:

> On Thu, Sep 05, 2013 at 11:04:22AM +0800, Jianguo Wu wrote:
>> Hi Wanpeng,
>>
>> On 2013/9/5 10:11, Wanpeng Li wrote:
>>
>>> Hi Jianguo,
>>> On Wed, Sep 04, 2013 at 09:30:22PM +0800, Jianguo Wu wrote:
>>>> Since commit d39d33c332(thp: enable direct defrag), defrag is enable
>>>> for all transparent hugepage page faults by default, not only in
>>>> MADV_HUGEPAGE regions.
>>>>
>>>> Signed-off-by: Jianguo Wu 
>>>> ---
>>>> mm/huge_memory.c | 6 ++
>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index a92012a..abf047e 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -28,10 +28,8 @@
>>>>
>>>> /*
>>>>  * By default transparent hugepage support is enabled for all mappings
>>>
>>> This is also stale. TRANSPARENT_HUGEPAGE_ALWAYS is not configured by 
>>> default in
>>> order that avoid to risk increase the memory footprint of applications w/o 
>>> a 
>>> guaranteed benefit.
>>>
>>
>> Right, how about this:
>>
>> By default transparent hugepage support is disabled in order that avoid to 
>> risk
> 
> I don't think it's disabled. TRANSPARENT_HUGEPAGE_MADVISE is configured
> by default.
> 

Hi Wanpeng,

We have TRANSPARENT_HUGEPAGE and 
TRANSPARENT_HUGEPAGE_ALWAYS/TRANSPARENT_HUGEPAGE_MADVISE,
TRANSPARENT_HUGEPAGE_ALWAYS or TRANSPARENT_HUGEPAGE_MADVISE is configured only 
if TRANSPARENT_HUGEPAGE
is configured.

By default, TRANSPARENT_HUGEPAGE=n, and TRANSPARENT_HUGEPAGE_ALWAYS is 
configured when TRANSPARENT_HUGEPAGE=y.

commit 13ece886d9(thp: transparent hugepage config choice):

 config TRANSPARENT_HUGEPAGE
-   bool "Transparent Hugepage Support" if EMBEDDED
+   bool "Transparent Hugepage Support"
depends on X86 && MMU
-   default y

+choice
+   prompt "Transparent Hugepage Support sysfs defaults"
+   depends on TRANSPARENT_HUGEPAGE
+   default TRANSPARENT_HUGEPAGE_ALWAYS

Thanks,
Jianguo Wu

> Regards,
> Wanpeng Li 
> 
>> increase the memory footprint of applications w/o a guaranteed benefit, and
>> khugepaged scans all mappings when transparent hugepage enabled.
>> Defrag is invoked by khugepaged hugepage allocations and by page faults for 
>> all
>> hugepage allocations.
>>
>> Thanks,
>> Jianguo Wu
>>
>>> Regards,
>>> Wanpeng Li 
>>>
>>>> - * and khugepaged scans all mappings. Defrag is only invoked by
>>>> - * khugepaged hugepage allocations and by page faults inside
>>>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>>>> - * allocations.
>>>> + * and khugepaged scans all mappings. Defrag is invoked by khugepaged
>>>> + * hugepage allocations and by page faults for all hugepage allocations.
>>>>  */
>>>> unsigned long transparent_hugepage_flags __read_mostly =
>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>>>> -- 
>>>> 1.8.1.2
>>>>
>>>
>>>
>>
>>
> 
> 
> .
> 





Re: [PATCH] mm/thp: fix comments in transparent_hugepage_flags

2013-09-04 Thread Jianguo Wu
Hi Wanpeng,

On 2013/9/5 10:11, Wanpeng Li wrote:

> Hi Jianguo,
> On Wed, Sep 04, 2013 at 09:30:22PM +0800, Jianguo Wu wrote:
>> Since commit d39d33c332(thp: enable direct defrag), defrag is enable
>> for all transparent hugepage page faults by default, not only in
>> MADV_HUGEPAGE regions.
>>
>> Signed-off-by: Jianguo Wu 
>> ---
>> mm/huge_memory.c | 6 ++
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a92012a..abf047e 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -28,10 +28,8 @@
>>
>> /*
>>  * By default transparent hugepage support is enabled for all mappings
> 
> This is also stale. TRANSPARENT_HUGEPAGE_ALWAYS is not configured by default 
> in
> order that avoid to risk increase the memory footprint of applications w/o a 
> guaranteed benefit.
> 

Right, how about this:

By default transparent hugepage support is disabled in order that avoid to risk
increase the memory footprint of applications w/o a guaranteed benefit, and
khugepaged scans all mappings when transparent hugepage enabled.
Defrag is invoked by khugepaged hugepage allocations and by page faults for all
hugepage allocations.

Thanks,
Jianguo Wu

> Regards,
> Wanpeng Li 
> 
>> - * and khugepaged scans all mappings. Defrag is only invoked by
>> - * khugepaged hugepage allocations and by page faults inside
>> - * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
>> - * allocations.
>> + * and khugepaged scans all mappings. Defrag is invoked by khugepaged
>> + * hugepage allocations and by page faults for all hugepage allocations.
>>  */
>> unsigned long transparent_hugepage_flags __read_mostly =
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
>> -- 
>> 1.8.1.2
>>
> 
> 





[PATCH] mm/thp: fix comments in transparent_hugepage_flags

2013-09-04 Thread Jianguo Wu
Since commit d39d33c332(thp: enable direct defrag), defrag is enabled
for all transparent hugepage page faults by default, not only in
MADV_HUGEPAGE regions.

Signed-off-by: Jianguo Wu 
---
 mm/huge_memory.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..abf047e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -28,10 +28,8 @@
 
 /*
  * By default transparent hugepage support is enabled for all mappings
- * and khugepaged scans all mappings. Defrag is only invoked by
- * khugepaged hugepage allocations and by page faults inside
- * MADV_HUGEPAGE regions to avoid the risk of slowing down short lived
- * allocations.
+ * and khugepaged scans all mappings. Defrag is invoked by khugepaged
+ * hugepage allocations and by page faults for all hugepage allocations.
  */
 unsigned long transparent_hugepage_flags __read_mostly =
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
-- 
1.8.1.2





Re: [PATCH 4/4] mm/arch: use NUMA_NO_NODE

2013-08-30 Thread Jianguo Wu
Cc linux...@kvack.org

On 2013/8/30 10:06, Jianguo Wu wrote:

> Use more appropriate NUMA_NO_NODE instead of -1 in some archs' module_alloc()
> 
> Signed-off-by: Jianguo Wu 
> ---
>  arch/arm/kernel/module.c|2 +-
>  arch/arm64/kernel/module.c  |2 +-
>  arch/mips/kernel/module.c   |2 +-
>  arch/parisc/kernel/module.c |2 +-
>  arch/s390/kernel/module.c   |2 +-
>  arch/sparc/kernel/module.c  |2 +-
>  arch/x86/kernel/module.c|2 +-
>  7 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 85c3fb6..8f4cff3 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -40,7 +40,7 @@
>  void *module_alloc(unsigned long size)
>  {
>   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> + GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
>   __builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index ca0e3d5..8f898bd 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -29,7 +29,7 @@
>  void *module_alloc(unsigned long size)
>  {
>   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> + GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
>   __builtin_return_address(0));
>  }
>  
> diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
> index 977a623..b507e07 100644
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(dbe_lock);
>  void *module_alloc(unsigned long size)
>  {
>   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
> - GFP_KERNEL, PAGE_KERNEL, -1,
> + GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>   __builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
> index 2a625fb..50dfafc 100644
> --- a/arch/parisc/kernel/module.c
> +++ b/arch/parisc/kernel/module.c
> @@ -219,7 +219,7 @@ void *module_alloc(unsigned long size)
>* init_data correctly */
>   return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
>   GFP_KERNEL | __GFP_HIGHMEM,
> - PAGE_KERNEL_RWX, -1,
> + PAGE_KERNEL_RWX, NUMA_NO_NODE,
>   __builtin_return_address(0));
>  }
>  
> diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
> index 7845e15..b89b591 100644
> --- a/arch/s390/kernel/module.c
> +++ b/arch/s390/kernel/module.c
> @@ -50,7 +50,7 @@ void *module_alloc(unsigned long size)
>   if (PAGE_ALIGN(size) > MODULES_LEN)
>   return NULL;
>   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, -1,
> + GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>   __builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> index 4435488..97655e0 100644
> --- a/arch/sparc/kernel/module.c
> +++ b/arch/sparc/kernel/module.c
> @@ -29,7 +29,7 @@ static void *module_map(unsigned long size)
>   if (PAGE_ALIGN(size) > MODULES_LEN)
>   return NULL;
>   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, -1,
> + GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>   __builtin_return_address(0));
>  }
>  #else
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 216a4d7..18be189 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -49,7 +49,7 @@ void *module_alloc(unsigned long size)
>   return NULL;
>   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
>   GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
> - -1, __builtin_return_address(0));
> + NUMA_NO_NODE, __builtin_return_address(0));
>  }
>  
>  #ifdef CONFIG_X86_32
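
For reference, NUMA_NO_NODE is defined as (-1) in include/linux/numa.h,
so the conversion is purely cosmetic: the node argument of
__vmalloc_node_range() takes either a concrete node id or this "no
preference" sentinel. A sketch of the two call shapes (hypothetical
callers; the signature is the one visible in the diff above):

static void *alloc_any_node(unsigned long size)
{
	/* no node preference -- equivalent to the old bare -1 */
	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
				    GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
				    __builtin_return_address(0));
}

static void *alloc_on_node0(unsigned long size)
{
	/* a concrete node id expresses a preferred node instead */
	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
				    GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
				    __builtin_return_address(0));
}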





Re: [PATCH 1/5] mm/vmalloc: use N_MEMORY instead of N_HIGH_MEMORY

2013-08-30 Thread Jianguo Wu
On 2013/8/30 11:36, Jianguo Wu wrote:

> Since commit 8219fc48a(mm: node_states: introduce N_MEMORY),
> we introduced N_MEMORY, now N_MEMORY stands for the nodes that has any memory,
> and N_HIGH_MEMORY stands for the nodes that has normal or high memory.
> 
> The code here need to handle with the nodes which have memory,
> we should use N_MEMORY instead.
> 

As Michal pointed out in http://marc.info/?l=linux-kernel&m=137784852720861&w=2,
N_HIGH_MEMORY should be kept in these places, please ignore this series.

Sorry for the noise.

Thanks.

> Signed-off-by: Jianguo Wu 
> ---
>  mm/vmalloc.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 13a5495..1152947 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2573,7 +2573,7 @@ static void show_numa_info(struct seq_file *m, struct 
> vm_struct *v)
>   for (nr = 0; nr < v->nr_pages; nr++)
>   counters[page_to_nid(v->pages[nr])]++;
>  
> - for_each_node_state(nr, N_HIGH_MEMORY)
> + for_each_node_state(nr, N_MEMORY)
>   if (counters[nr])
>   seq_printf(m, " N%u=%u", nr, counters[nr]);
>   }





Re: [PATCH] mm/vmalloc: use help function to get vmalloc area size

2013-08-30 Thread Jianguo Wu
On 2013/8/30 16:49, Wanpeng Li wrote:

> On Fri, Aug 30, 2013 at 04:42:49PM +0800, Jianguo Wu wrote:
>> Use get_vm_area_size() to get vmalloc area's actual size without guard page.
>>
> 
> Do you see this?
> 
> http://marc.info/?l=linux-mm&m=137698172417316&w=2
> 

Hi Wanpeng,
Sorry for not noticing your post; please ignore this patch.

Thanks.

>> Signed-off-by: Jianguo Wu 
>> ---
>> mm/vmalloc.c |   12 ++--
>> 1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 13a5495..abe13bc 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1263,7 +1263,7 @@ void unmap_kernel_range(unsigned long addr, unsigned 
>> long size)
>> int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
>> {
>>  unsigned long addr = (unsigned long)area->addr;
>> -unsigned long end = addr + area->size - PAGE_SIZE;
>> +unsigned long end = addr + get_vm_area_size(area);
>>  int err;
>>
>>  err = vmap_page_range(addr, end, prot, *pages);
>> @@ -1558,7 +1558,7 @@ static void *__vmalloc_area_node(struct vm_struct 
>> *area, gfp_t gfp_mask,
>>  unsigned int nr_pages, array_size, i;
>>  gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>>
>> -nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>> +nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>>  array_size = (nr_pages * sizeof(struct page *));
>>
>>  area->nr_pages = nr_pages;
>> @@ -1990,7 +1990,7 @@ long vread(char *buf, char *addr, unsigned long count)
>>
>>  vm = va->vm;
>>  vaddr = (char *) vm->addr;
>> -if (addr >= vaddr + vm->size - PAGE_SIZE)
>> +if (addr >= vaddr + get_vm_area_size(vm))
>>  continue;
>>  while (addr < vaddr) {
>>  if (count == 0)
>> @@ -2000,7 +2000,7 @@ long vread(char *buf, char *addr, unsigned long count)
>>  addr++;
>>  count--;
>>  }
>> -n = vaddr + vm->size - PAGE_SIZE - addr;
>> +n = vaddr + get_vm_area_size(vm) - addr;
>>  if (n > count)
>>  n = count;
>>  if (!(vm->flags & VM_IOREMAP))
>> @@ -2072,7 +2072,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
>>
>>  vm = va->vm;
>>  vaddr = (char *) vm->addr;
>> -if (addr >= vaddr + vm->size - PAGE_SIZE)
>> +if (addr >= vaddr + get_vm_area_size(vm))
>>  continue;
>>  while (addr < vaddr) {
>>  if (count == 0)
>> @@ -2081,7 +2081,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
>>  addr++;
>>  count--;
>>  }
>> -n = vaddr + vm->size - PAGE_SIZE - addr;
>> +n = vaddr + get_vm_area_size(vm) - addr;
>>  if (n > count)
>>  n = count;
>>  if (!(vm->flags & VM_IOREMAP)) {
>> -- 
>> 1.7.1
>>
>>
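
The guard page being subtracted everywhere above is the extra, unmapped page
that vmalloc appends to each area so that a sequential overrun faults instead
of silently spilling into the next area. A minimal sketch of the arithmetic,
assuming the 3.x-era behaviour where every area gets exactly one guard page:

    struct vm_struct *area = get_vm_area(PAGE_SIZE, VM_ALLOC);
    /* area->size == 2 * PAGE_SIZE: the requested page plus the guard page */
    size_t usable = get_vm_area_size(area);    /* == PAGE_SIZE */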





[PATCH] mm/vmalloc: use helper function to get vmalloc area size

2013-08-30 Thread Jianguo Wu
Use get_vm_area_size() to get the vmalloc area's actual size without the guard page.

Signed-off-by: Jianguo Wu 
---
 mm/vmalloc.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 13a5495..abe13bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1263,7 +1263,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long 
size)
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 {
unsigned long addr = (unsigned long)area->addr;
-   unsigned long end = addr + area->size - PAGE_SIZE;
+   unsigned long end = addr + get_vm_area_size(area);
int err;
 
err = vmap_page_range(addr, end, prot, *pages);
@@ -1558,7 +1558,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, 
gfp_t gfp_mask,
unsigned int nr_pages, array_size, i;
gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 
-   nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
+   nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
array_size = (nr_pages * sizeof(struct page *));
 
area->nr_pages = nr_pages;
@@ -1990,7 +1990,7 @@ long vread(char *buf, char *addr, unsigned long count)
 
vm = va->vm;
vaddr = (char *) vm->addr;
-   if (addr >= vaddr + vm->size - PAGE_SIZE)
+   if (addr >= vaddr + get_vm_area_size(vm))
continue;
while (addr < vaddr) {
if (count == 0)
@@ -2000,7 +2000,7 @@ long vread(char *buf, char *addr, unsigned long count)
addr++;
count--;
}
-   n = vaddr + vm->size - PAGE_SIZE - addr;
+   n = vaddr + get_vm_area_size(vm) - addr;
if (n > count)
n = count;
if (!(vm->flags & VM_IOREMAP))
@@ -2072,7 +2072,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 
vm = va->vm;
vaddr = (char *) vm->addr;
-   if (addr >= vaddr + vm->size - PAGE_SIZE)
+   if (addr >= vaddr + get_vm_area_size(vm))
continue;
while (addr < vaddr) {
if (count == 0)
@@ -2081,7 +2081,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
addr++;
count--;
}
-   n = vaddr + vm->size - PAGE_SIZE - addr;
+   n = vaddr + get_vm_area_size(vm) - addr;
if (n > count)
n = count;
if (!(vm->flags & VM_IOREMAP)) {
-- 
1.7.1
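
For reference, the helper relied on above is defined in include/linux/vmalloc.h,
roughly as follows (a sketch of the 3.x-era header; exact placement and the
comment wording are approximate):

    static inline size_t get_vm_area_size(const struct vm_struct *area)
    {
            /* area->size includes the one trailing guard page */
            return area->size - PAGE_SIZE;
    }

Each converted call site therefore computes exactly what it did before; the
open-coded "- PAGE_SIZE" just moves behind a name that says why it is there.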




Re: [PATCH 5/5] mm/cgroup: use N_MEMORY instead of N_HIGH_MEMORY

2013-08-30 Thread Jianguo Wu
On 2013/8/30 15:41, Michal Hocko wrote:

> On Fri 30-08-13 11:44:57, Jianguo Wu wrote:
>> Since commit 8219fc48a (mm: node_states: introduce N_MEMORY),
> 
> But this very same commit also says:
> "
> A.example 2) mm/page_cgroup.c uses N_HIGH_MEMORY twice:
> 
> One is in page_cgroup_init(void):
> for_each_node_state(nid, N_HIGH_MEMORY) {
> 
> It means if the node has memory, we will allocate a page_cgroup map for
> the node. We should use N_MEMORY instead here to gain more clarity.
> 
> The second use is in alloc_page_cgroup():
> if (node_state(nid, N_HIGH_MEMORY))
> addr = vzalloc_node(size, nid);
> 
> It means the node has high or normal memory that can be allocated
> from the kernel. We should keep N_HIGH_MEMORY here, and it will be better
> if the "any memory" semantic of N_HIGH_MEMORY is removed.
> "
> 
> Which to me sounds like N_HIGH_MEMORY should be kept here. To be honest,

Hi Michal,

You are right: here we need normal or high memory, not movable memory,
so N_HIGH_MEMORY should be kept, the same as in the other patches; please drop
this series.

Thank you for pointing this out.

Thanks,
Jianguo Wu.

> the distinction is not entirely clear to me. It was supposed to make
> code cleaner but it apparently causes confusion.
> 
> It would also help if you CCed Lai Jiangshan who has introduced this
> distinction. CCed now.
> 
> I wasn't CCed on the rest of the series but if you do the same
> conversion, please make sure that this is not the case for others as
> well.
> 
>> we introduced N_MEMORY; now N_MEMORY stands for the nodes that have any 
>> memory,
>> and N_HIGH_MEMORY stands for the nodes that have normal or high memory.
>>
>> The code here needs to handle the nodes which have memory,
>> so we should use N_MEMORY instead.
>>
>> Signed-off-by: Xishi Qiu 
>> ---
>>  mm/page_cgroup.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
>> index 6d757e3..f6f7603 100644
>> --- a/mm/page_cgroup.c
>> +++ b/mm/page_cgroup.c
>> @@ -116,7 +116,7 @@ static void *__meminit alloc_page_cgroup(size_t size, 
>> int nid)
>>  return addr;
>>  }
>>  
>> -if (node_state(nid, N_HIGH_MEMORY))
>> +if (node_state(nid, N_MEMORY))
>>  addr = vzalloc_node(size, nid);
>>  else
>>  addr = vzalloc(size);
>> -- 
>> 1.7.1
>>
>>
> 
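
Concretely, after dropping the patch the call site in mm/page_cgroup.c stays
as below; the comments are added here to summarise the reasoning above and
are not in the file. vzalloc_node() hands back kernel-mapped memory from
@nid, so the test must ask whether the node has memory the kernel can
allocate from, which is N_HIGH_MEMORY; N_MEMORY would also match a
movable-only node, where the request could not be satisfied locally:

    if (node_state(nid, N_HIGH_MEMORY))
            addr = vzalloc_node(size, nid); /* node-local kernel memory */
    else
            addr = vzalloc(size);           /* fall back to any node */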





Re: [PATCH 4/4] mm/arch: use NUMA_NO_NODE

2013-08-30 Thread Jianguo Wu
Cc linux...@kvack.org

On 2013/8/30 10:06, Jianguo Wu wrote:

> Use more appropriate NUMA_NO_NODE instead of -1 in some archs' module_alloc()
> 
> Signed-off-by: Jianguo Wu wujian...@huawei.com
> ---
>  arch/arm/kernel/module.c    |2 +-
>  arch/arm64/kernel/module.c  |2 +-
>  arch/mips/kernel/module.c   |2 +-
>  arch/parisc/kernel/module.c |2 +-
>  arch/s390/kernel/module.c   |2 +-
>  arch/sparc/kernel/module.c  |2 +-
>  arch/x86/kernel/module.c    |2 +-
>  7 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 85c3fb6..8f4cff3 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -40,7 +40,7 @@
>  void *module_alloc(unsigned long size)
>  {
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> +				GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index ca0e3d5..8f898bd 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -29,7 +29,7 @@
>  void *module_alloc(unsigned long size)
>  {
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> +				GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
> 
> diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
> index 977a623..b507e07 100644
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(dbe_lock);
>  void *module_alloc(unsigned long size)
>  {
>  	return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
> index 2a625fb..50dfafc 100644
> --- a/arch/parisc/kernel/module.c
> +++ b/arch/parisc/kernel/module.c
> @@ -219,7 +219,7 @@ void *module_alloc(unsigned long size)
>  	 * init_data correctly */
>  	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
>  				    GFP_KERNEL | __GFP_HIGHMEM,
> -				    PAGE_KERNEL_RWX, -1,
> +				    PAGE_KERNEL_RWX, NUMA_NO_NODE,
>  				    __builtin_return_address(0));
>  }
> 
> diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
> index 7845e15..b89b591 100644
> --- a/arch/s390/kernel/module.c
> +++ b/arch/s390/kernel/module.c
> @@ -50,7 +50,7 @@ void *module_alloc(unsigned long size)
>  	if (PAGE_ALIGN(size) > MODULES_LEN)
>  		return NULL;
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
>  #endif
> diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> index 4435488..97655e0 100644
> --- a/arch/sparc/kernel/module.c
> +++ b/arch/sparc/kernel/module.c
> @@ -29,7 +29,7 @@ static void *module_map(unsigned long size)
>  	if (PAGE_ALIGN(size) > MODULES_LEN)
>  		return NULL;
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL, -1,
> +				GFP_KERNEL, PAGE_KERNEL, NUMA_NO_NODE,
>  				__builtin_return_address(0));
>  }
>  #else
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 216a4d7..18be189 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -49,7 +49,7 @@ void *module_alloc(unsigned long size)
>  		return NULL;
>  	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
>  				GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
> -				-1, __builtin_return_address(0));
> +				NUMA_NO_NODE, __builtin_return_address(0));
>  }
> 
>  #ifdef CONFIG_X86_32
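
NUMA_NO_NODE is just the named form of the -1 being replaced; it is defined
in include/linux/numa.h as:

    #define NUMA_NO_NODE    (-1)

so the conversion is purely cosmetic: __vmalloc_node_range() still receives
-1, meaning "no node preference", but the call sites now say so by name.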





Re: [PATCH 5/5] mm/cgroup: use N_MEMORY instead of N_HIGH_MEMORY

2013-08-29 Thread Jianguo Wu
On 2013/8/30 11:44, Jianguo Wu wrote:

> Since commit 8219fc48a (mm: node_states: introduce N_MEMORY),
> we introduced N_MEMORY; now N_MEMORY stands for the nodes that have any memory,
> and N_HIGH_MEMORY stands for the nodes that have normal or high memory.
> 
> The code here needs to handle the nodes which have memory,
> so we should use N_MEMORY instead.
> 
> Signed-off-by: Xishi Qiu 

Sorry, it should be "Signed-off-by: Jianguo Wu "

> ---
>  mm/page_cgroup.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 6d757e3..f6f7603 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -116,7 +116,7 @@ static void *__meminit alloc_page_cgroup(size_t size, int 
> nid)
>   return addr;
>   }
>  
> - if (node_state(nid, N_HIGH_MEMORY))
> + if (node_state(nid, N_MEMORY))
>   addr = vzalloc_node(size, nid);
>   else
>   addr = vzalloc(size);





[PATCH 5/5] mm/cgroup: use N_MEMORY instead of N_HIGH_MEMORY

2013-08-29 Thread Jianguo Wu
Since commit 8219fc48a (mm: node_states: introduce N_MEMORY),
we introduced N_MEMORY; now N_MEMORY stands for the nodes that have any memory,
and N_HIGH_MEMORY stands for the nodes that have normal or high memory.

The code here needs to handle the nodes which have memory,
so we should use N_MEMORY instead.

Signed-off-by: Xishi Qiu 
---
 mm/page_cgroup.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 6d757e3..f6f7603 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -116,7 +116,7 @@ static void *__meminit alloc_page_cgroup(size_t size, int 
nid)
return addr;
}
 
-   if (node_state(nid, N_HIGH_MEMORY))
+   if (node_state(nid, N_MEMORY))
addr = vzalloc_node(size, nid);
else
addr = vzalloc(size);
-- 
1.7.1




[PATCH 4/5] mm/ia64: use N_MEMORY instead of N_HIGH_MEMORY

2013-08-29 Thread Jianguo Wu
Since commit 8219fc48a (mm: node_states: introduce N_MEMORY),
we introduced N_MEMORY; now N_MEMORY stands for the nodes that have any memory,
and N_HIGH_MEMORY stands for the nodes that have normal or high memory.

The code here needs to handle the nodes which have memory,
so we should use N_MEMORY instead.

Signed-off-by: Jianguo Wu 
---
 arch/ia64/kernel/uncached.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index a96bcf8..d2e5545 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -196,7 +196,7 @@ unsigned long uncached_alloc_page(int starting_nid, int 
n_pages)
nid = starting_nid;
 
do {
-   if (!node_state(nid, N_HIGH_MEMORY))
+   if (!node_state(nid, N_MEMORY))
continue;
uc_pool = &uncached_pools[nid];
if (uc_pool->pool == NULL)
-- 
1.7.1
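
Worth noting for this hunk in particular: ia64 has no CONFIG_HIGHMEM, so on
that architecture the node states collapse (per the enum sketched earlier in
this digest, and assuming CONFIG_MOVABLE_NODE is not available there):

    N_HIGH_MEMORY == N_NORMAL_MEMORY == N_MEMORY

which is why this part of the retracted conversion would have made no
functional difference anyway.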




[PATCH 3/5] mm/vmemmap: use N_MEMORY instead of N_HIGH_MEMORY

2013-08-29 Thread Jianguo Wu
Since commit 8219fc48a (mm: node_states: introduce N_MEMORY),
we introduced N_MEMORY; now N_MEMORY stands for the nodes that have any memory,
and N_HIGH_MEMORY stands for the nodes that have normal or high memory.

The code here needs to handle the nodes which have memory,
so we should use N_MEMORY instead.

Signed-off-by: Jianguo Wu 
---
 mm/sparse-vmemmap.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 27eeab3..ca8f46b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -52,7 +52,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int 
node)
if (slab_is_available()) {
struct page *page;
 
-   if (node_state(node, N_HIGH_MEMORY))
+   if (node_state(node, N_MEMORY))
page = alloc_pages_node(
node, GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT,
get_order(size));
-- 
1.7.1
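
One detail in the hunk above that is easy to gloss over: get_order()
converts the byte count into a buddy-allocator order, rounding up to the
next power-of-two number of pages. A few illustrative values, shown here
for clarity (standard kernel semantics):

    get_order(PAGE_SIZE)        == 0    /* one page */
    get_order(PAGE_SIZE + 1)    == 1    /* two pages */
    get_order(8 * PAGE_SIZE)    == 3    /* eight pages */

so alloc_pages_node() is asked for a physically contiguous block large
enough to hold the vmemmap chunk.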



