Re: [PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks

2014-03-06 Thread Sasha Levin
On 03/06/2014 11:08 AM, Naoya Horiguchi wrote:
> And I found my patch was totally wrong because it should check
> !pte_present(), not pte_present().
> I'm testing fixed one (see below), and the problem seems not to reproduce
> in my environment at least for now.
> But I'm not 100% sure, so I need your double checking.

Nope, I still see the problem. Same NULL deref and trace as before.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks

2014-03-05 Thread Sasha Levin
On 03/04/2014 06:49 PM, Naoya Horiguchi wrote:
> On Tue, Mar 04, 2014 at 05:46:52PM -0500, Sasha Levin wrote:
>> On 03/04/2014 04:32 PM, Naoya Horiguchi wrote:
>>> # sorry if duplicate message
>>>
>>> On Mon, Mar 03, 2014 at 04:38:41PM -0500, Sasha Levin wrote:
>>>> On 03/03/2014 03:06 PM, Sasha Levin wrote:
>>>>> On 03/03/2014 12:02 AM, Naoya Horiguchi wrote:
>>>>>> Hi Sasha,
>>>>>>
>>>>>>> I can confirm that with this patch the lockdep issue is gone. However,
>>>>>>> the NULL deref in walk_pte_range() and the BUG at mm/hugemem.c:3580
>>>>>>> still appear.
>>>>>> I spotted the cause of this problem.
>>>>>> Could you try testing if this patch fixes it?
>>>>>
>>>>> I'm seeing a different failure with this patch:
>>>>
>>>> And the NULL deref still happens.
>>>
>>> I don't yet find out the root reason why this issue remains.
>>> So I tried to run trinity myself but the problem didn't reproduce.
>>> (I did simply like "./trinity --group vm --dangerous" a few hours.)
>>> Could you show more detail or tips about how the problem occurs?
>>
>> I run it as root in a disposable vm, that may be the difference here.
> 
> Sorry, I didn't write it but I also run it as root on VM, so condition is
> the same. It might depend on kernel config, so I'm now trying the config
> you previously gave me, but it doesn't boot correctly on my environment
> (panic in initialization). I may need some time to get over this.

I'd be happy to help with anything off-list, it shouldn't be too difficult
to get that kernel to boot :)

I've also reverted the page walker series for now, it makes it impossible
to test anything else since it seems that hitting one of the issues is quite
easy.


Thanks,
Sasha



Re: [PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks

2014-03-04 Thread Sasha Levin
On 03/04/2014 04:32 PM, Naoya Horiguchi wrote:
> # sorry if duplicate message
> 
> On Mon, Mar 03, 2014 at 04:38:41PM -0500, Sasha Levin wrote:
>> On 03/03/2014 03:06 PM, Sasha Levin wrote:
>>> On 03/03/2014 12:02 AM, Naoya Horiguchi wrote:
>>>> Hi Sasha,
>>>>
>>>>> I can confirm that with this patch the lockdep issue is gone. However,
>>>>> the NULL deref in walk_pte_range() and the BUG at mm/hugemem.c:3580
>>>>> still appear.
>>>> I spotted the cause of this problem.
>>>> Could you try testing if this patch fixes it?
>>>
>>> I'm seeing a different failure with this patch:
>>
>> And the NULL deref still happens.
> 
> I don't yet find out the root reason why this issue remains.
> So I tried to run trinity myself but the problem didn't reproduce.
> (I did simply like "./trinity --group vm --dangerous" a few hours.)
> Could you show more detail or tips about how the problem occurs?

I run it as root in a disposable vm, that may be the difference here.


Thanks,
Sasha



Re: [PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks

2014-03-03 Thread Sasha Levin

On 03/03/2014 03:06 PM, Sasha Levin wrote:
> On 03/03/2014 12:02 AM, Naoya Horiguchi wrote:
>> Hi Sasha,
>>
>>> I can confirm that with this patch the lockdep issue is gone. However,
>>> the NULL deref in walk_pte_range() and the BUG at mm/hugemem.c:3580
>>> still appear.
>>
>> I spotted the cause of this problem.
>> Could you try testing if this patch fixes it?
>
> I'm seeing a different failure with this patch:

And the NULL deref still happens.


Thanks,
Sasha


Re: [PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks

2014-03-03 Thread Sasha Levin

On 03/03/2014 12:02 AM, Naoya Horiguchi wrote:
> Hi Sasha,
>
>> I can confirm that with this patch the lockdep issue is gone. However,
>> the NULL deref in walk_pte_range() and the BUG at mm/hugemem.c:3580
>> still appear.
>
> I spotted the cause of this problem.
> Could you try testing if this patch fixes it?

I'm seeing a different failure with this patch:

[ 1860.669114] BUG: unable to handle kernel NULL pointer dereference at 0050
[ 1860.670498] IP: [] vm_normal_page+0x3f/0x90
[ 1860.672795] PGD 6c1c84067 PUD 6e0a3d067 PMD 0
[ 1860.672795] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1860.672795] Dumping ftrace buffer:
[ 1860.672795]    (ftrace buffer empty)
[ 1860.672795] Modules linked in:
[ 1860.672795] CPU: 4 PID: 34914 Comm: trinity-c184 Tainted: GW 3.14.0-rc4-ne
[ 1860.672795] task: 880717d9 ti: 88070b3da000 task.ti: 88070b3da000
[ 1860.672795] RIP: 0010:[]  [] vm_normal_page+0x3f/
[ 1860.672795] RSP: 0018:88070b3dbba8  EFLAGS: 00010202
[ 1860.672795] RAX: 767f RBX: 88070b3dbdd8 RCX: 88070b3dbd78
[ 1860.672795] RDX: 8767f225 RSI: 01699000 RDI: 8767f225
[ 1860.672795] RBP: 88070b3dbba8 R08:  R09: 
[ 1860.672795] R10: 0001 R11:  R12: 880717df24c8
[ 1860.672795] R13: 0020 R14: 01699000 R15: 0180
[ 1860.672795] FS:  7f20a3584700() GS:88052b80() knlGS:0
[ 1860.672795] CS:  0010 DS:  ES:  CR0: 80050033
[ 1860.672795] CR2: 0050 CR3: 0006d73cf000 CR4: 06e0
[ 1860.672795] Stack:
[ 1860.672795]  88070b3dbbd8 812c2f3d 812b2dc0 0169a000
[ 1860.672795]  880717df24c8 88070b3dbd78 88070b3dbc28 812b2e00
[ 1860.672795]   88072956bcf0 88070b3dbc28 8806e0a3d018
[ 1860.672795] Call Trace:
[ 1860.672795]  [] queue_pages_pte+0x3d/0xd0
[ 1860.672795]  [] ? walk_pte_range+0xc0/0x180
[ 1860.672795]  [] walk_pte_range+0x100/0x180
[ 1860.672795]  [] walk_pmd_range+0x211/0x240
[ 1860.672795]  [] walk_pud_range+0x12b/0x160
[ 1860.672795]  [] ? __slab_free+0x384/0x5e0
[ 1860.672795]  [] walk_pgd_range+0x109/0x140
[ 1860.672795]  [] __walk_page_range+0x35/0x40
[ 1860.672795]  [] walk_page_range+0xf2/0x130
[ 1860.672795]  [] queue_pages_range+0x71/0x90
[ 1860.672795]  [] ? queue_pages_hugetlb+0xa0/0xa0
[ 1860.672795]  [] ? queue_pages_range+0x90/0x90
[ 1860.672795]  [] ? change_prot_numa+0x30/0x30
[ 1860.672795]  [] do_mbind+0x321/0x340
[ 1860.672795]  [] ? might_fault+0x9f/0xb0
[ 1860.672795]  [] ? might_fault+0x56/0xb0
[ 1860.672795]  [] SYSC_mbind+0x89/0xb0
[ 1860.672795]  [] ? context_tracking_user_exit+0x195/0x1d0
[ 1860.672795]  [] SyS_mbind+0xe/0x10
[ 1860.672795]  [] tracesys+0xdd/0xe2


Thanks,
Sasha


[PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks

2014-03-02 Thread Naoya Horiguchi
Hi Sasha,

> I can confirm that with this patch the lockdep issue is gone. However,
> the NULL deref in walk_pte_range() and the BUG at mm/hugemem.c:3580
> still appear.

I spotted the cause of this problem.
Could you try testing if this patch fixes it?

Thanks,
Naoya
---
The page table walker doesn't check for non-present hugetlb entries in the
common path, so hugetlb_entry() callbacks must check for them. The reason
for this behavior is that some callers want to handle such entries in their
own way.

However, some callers don't check this now, which causes unpredictable
results, for example when there is a race between migrating a hugepage and
reading /proc/pid/numa_maps. This patch fixes it by adding pte_present
checks to the buggy callbacks.

This bug has existed for a long time and became visible with the
introduction of hugepage migration.

Reported-by: Sasha Levin 
Signed-off-by: Naoya Horiguchi 
Cc: sta...@vger.kernel.org # 3.12+
---
 fs/proc/task_mmu.c | 3 +++
 mm/mempolicy.c | 6 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git next-20140228.orig/fs/proc/task_mmu.c next-20140228/fs/proc/task_mmu.c
index 3746d89c768b..a4cecadce867 100644
--- next-20140228.orig/fs/proc/task_mmu.c
+++ next-20140228/fs/proc/task_mmu.c
@@ -1299,6 +1299,9 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long addr,
 	if (pte_none(*pte))
 		return 0;
 
+	if (pte_present(*pte))
+		return 0;
+
 	page = pte_page(*pte);
 	if (!page)
 		return 0;
diff --git next-20140228.orig/mm/mempolicy.c next-20140228/mm/mempolicy.c
index c0d1cbd68790..1e171186ee6d 100644
--- next-20140228.orig/mm/mempolicy.c
+++ next-20140228/mm/mempolicy.c
@@ -524,8 +524,12 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long addr,
 	unsigned long flags = qp->flags;
 	int nid;
 	struct page *page;
+	pte_t entry;
 
-	page = pte_page(huge_ptep_get(pte));
+	entry = huge_ptep_get(pte);
+	if (pte_present(entry))
+		return 0;
+	page = pte_page(entry);
 	nid = page_to_nid(page);
 	if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
 		return 0;
-- 
1.8.5.3
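
Note on the direction of the check: the guard in the diff above returns
early when the entry is present, which is the opposite of what the commit
message describes, and the 2014-03-06 follow-up in this thread says it
should be !pte_present(). Below is a minimal user-space sketch of the
intended guard order. It is not kernel code: struct mock_pte,
mock_pte_none(), mock_pte_present() and gather_stats() are hypothetical
stand-ins made up for illustration, and the real callbacks work on pte_t
with the usual locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb page table entry (not pte_t). */
struct mock_pte {
	bool none;	/* entry is empty */
	bool present;	/* entry maps a page that is currently in memory */
	int nid;	/* node of the backing page; meaningful only if present */
};

static bool mock_pte_none(struct mock_pte e)
{
	return e.none;
}

static bool mock_pte_present(struct mock_pte e)
{
	return e.present;
}

/*
 * Sketch of a hugetlb_entry-style callback: count the page on its node.
 * Returns 1 if the entry was counted, 0 if it was skipped.
 */
static int gather_stats(struct mock_pte e, int *per_node, size_t nr_nodes)
{
	assert(per_node != NULL);

	if (mock_pte_none(e))
		return 0;

	/*
	 * Skip non-present entries (for example, an entry that is being
	 * migrated). Note the negation: skipping *present* entries would
	 * let exactly the unsafe ones through.
	 */
	if (!mock_pte_present(e))
		return 0;

	if (e.nid < 0 || (size_t)e.nid >= nr_nodes)
		return 0;

	per_node[e.nid]++;
	return 1;
}
```

With this ordering, only entries that are both non-empty and present reach
the code that looks at the backing page.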
