Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-23 Thread Sasha Levin
On 05/23/2014 05:16 AM, Kirill A. Shutemov wrote:
> On Fri, May 23, 2014 at 12:21:47AM -0400, Sasha Levin wrote:
>> On 05/22/2014 09:58 AM, Dave Jones wrote:
>>> Not sure if Sasha has already reported this on -next (It's getting hard
>>> to keep track of all the VM bugs he's been finding), but I hit this overnight
>>> on .15-rc6.  First time I've seen this one.
>>
>> Unfortunately I had to disable transhuge/hugetlb in my testing .config since
>> the open issues in -next get hit pretty often, and were unfixed for a while
>> now.
> 
> What THP-related is not fixed by now? The collapse hang? What else?

Besides the collapse hang, we have this: https://lkml.org/lkml/2013/3/29/103 .

I know it's not a "real" bug, just DEBUG_PAGEALLOC misbehaving, but it's
still something that makes testing difficult.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-23 Thread Vlastimil Babka

On 05/22/2014 05:41 PM, Dave Jones wrote:
> On Thu, May 22, 2014 at 05:08:09PM +0200, Vlastimil Babka wrote:
>
>  > > RIP: 0010:[]  [] PageTransHuge.part.23+0xb/0xd
>  > > Call Trace:
>  > >   [] isolate_migratepages_range+0x7a3/0x870
>  > >   [] compact_zone+0x370/0x560
>  > >   [] compact_zone_order+0xa2/0x110
>  > >   [] try_to_compact_pages+0x101/0x130
>  > > ...
>  > > Code: 75 1d 55 be 6c 00 00 00 48 c7 c7 8a 2f a2 bb 48 89 e5 e8 6c 49 95 ff
>  > > 5d c6 05 74 16 65 00 01 c3 55 31 f6 48 89 e5 e8 28 bd a3 ff <0f> 0b 0f 1f
>  > > 44 00 00 55 48 89 e5 41 57 45 31 ff 41 56 49 89 fe
>  > > RIP  []
>  > >
>  > > That BUG is..
>  > >
>  > > 413 static inline int PageTransHuge(struct page *page)
>  > > 414 {
>  > > 415 VM_BUG_ON_PAGE(PageTail(page), page);
>  > > 416 return PageHead(page);
>  > > 417 }
>  >
>  > Any idea which of the two PageTransHuge() calls in
>  > isolate_migratepages_range() that is? Offset far in the function suggest
>  > it's where the lru lock is already held, but I'm not sure as decodecode
>  > of your dump and objdump of my own compile look widely different.
>
> Yeah, the only thing the code: matches is the BUG() which is in another section.

Oh right, it's not a simple BUG_ON that would be inlined.

> (see end of file at http://paste.fedoraproject.org/104155/40077293/raw/)
>
> Maybe you can make more sense of that disassembly than I can..

Could you try adding -r to objdump, as now I have no idea where all
those calls go :/

> Dave


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org





Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-23 Thread Kirill A. Shutemov
On Fri, May 23, 2014 at 12:21:47AM -0400, Sasha Levin wrote:
> On 05/22/2014 09:58 AM, Dave Jones wrote:
> > Not sure if Sasha has already reported this on -next (It's getting hard
> > to keep track of all the VM bugs he's been finding), but I hit this overnight
> > on .15-rc6.  First time I've seen this one.
> 
> Unfortunately I had to disable transhuge/hugetlb in my testing .config since
> the open issues in -next get hit pretty often, and were unfixed for a while
> now.

What THP-related is not fixed by now? The collapse hang? What else?

-- 
 Kirill A. Shutemov


Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-22 Thread Vlastimil Babka

On 23.5.2014 6:21, Sasha Levin wrote:
> On 05/22/2014 09:58 AM, Dave Jones wrote:
>> Not sure if Sasha has already reported this on -next (It's getting hard
>> to keep track of all the VM bugs he's been finding), but I hit this overnight
>> on .15-rc6.  First time I've seen this one.
>
> Unfortunately I had to disable transhuge/hugetlb in my testing .config since
> the open issues in -next get hit pretty often, and were unfixed for a while
> now.

Meh, I lost track. We should really consider something like bugzilla for
this, IMHO.

> Keeping them enabled just made it impossible to test anything else.
>
>
> Thanks,
> Sasha



Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-22 Thread Sasha Levin
On 05/22/2014 09:58 AM, Dave Jones wrote:
> Not sure if Sasha has already reported this on -next (It's getting hard
> to keep track of all the VM bugs he's been finding), but I hit this overnight
> on .15-rc6.  First time I've seen this one.

Unfortunately I had to disable transhuge/hugetlb in my testing .config since
the open issues in -next get hit pretty often, and were unfixed for a while
now.

Keeping them enabled just made it impossible to test anything else.


Thanks,
Sasha


Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-22 Thread Dave Jones
On Thu, May 22, 2014 at 05:08:09PM +0200, Vlastimil Babka wrote:
 
 > > RIP: 0010:[]  [] 
 > > PageTransHuge.part.23+0xb/0xd
 > > Call Trace:
 > >   [] isolate_migratepages_range+0x7a3/0x870
 > >   [] compact_zone+0x370/0x560
 > >   [] compact_zone_order+0xa2/0x110
 > >   [] try_to_compact_pages+0x101/0x130
 > > ...
 > > Code: 75 1d 55 be 6c 00 00 00 48 c7 c7 8a 2f a2 bb 48 89 e5 e8 6c 49 95 ff 
 > > 5d c6 05 74 16 65 00 01 c3 55 31 f6 48 89 e5 e8 28 bd a3 ff <0f> 0b 0f 1f 
 > > 44 00 00 55 48 89 e5 41 57 45 31 ff 41 56 49 89 fe
 > > RIP  []
 > >
 > > That BUG is..
 > >
 > > 413 static inline int PageTransHuge(struct page *page)
 > > 414 {
 > > 415 VM_BUG_ON_PAGE(PageTail(page), page);
 > > 416 return PageHead(page);
 > > 417 }
 > 
 > Any idea which of the two PageTransHuge() calls in 
 > isolate_migratepages_range() that is? Offset far in the function suggest 
 > it's where the lru lock is already held, but I'm not sure as decodecode 
 > of your dump and objdump of my own compile look widely different.

Yeah, the only thing the code: matches is the BUG() which is in another section.
(see end of file at http://paste.fedoraproject.org/104155/40077293/raw/)

Maybe you can make more sense of that disassembly than I can..

Dave




Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-22 Thread Vlastimil Babka

On 05/22/2014 03:58 PM, Dave Jones wrote:
> Not sure if Sasha has already reported this on -next (It's getting hard
> to keep track of all the VM bugs he's been finding), but I hit this overnight
> on .15-rc6.  First time I've seen this one.
>
>
> page:ea0004599800 count:0 mapcount:0 mapping:  (null) index:0x2
> page flags: 0x208000(tail)
> [ cut here ]
> kernel BUG at include/linux/page-flags.h:415!
> invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 1 PID: 6858 Comm: trinity-c42 Not tainted 3.15.0-rc6+ #216
> task: 88012d18e900 ti: 88009e87a000 task.ti: 88009e87a000
> RIP: 0010:[]  [] PageTransHuge.part.23+0xb/0xd
> RSP: :88009e87b940  EFLAGS: 00010246
> RAX: 0001 RBX: 00116660 RCX: 0006
> RDX:  RSI: bb0c00f8 RDI: bb0bfed2
> RBP: 88009e87b940 R08: bc01203c R09: 03da
> R10: 03d9 R11: 0003 R12: 0001
> R13: 00116800 R14: 88024d64ce00 R15: ea0004599800
> FS:  7f4fd192e740() GS:88024d04() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 04c0 CR3: a19ce000 CR4: 001407e0
> DR0: 024f4000 DR1: 01d43000 DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0600
> Stack:
>  88009e87b9e8 bb1728a3 88009e87b9e8 88009e87baa8
>  88012d18e900 88009e87ba60  00040016
>   88009e87bfd8 08b3 88009e87ba50
> Call Trace:
>  [] isolate_migratepages_range+0x7a3/0x870
>  [] compact_zone+0x370/0x560
>  [] compact_zone_order+0xa2/0x110
>  [] try_to_compact_pages+0x101/0x130
>  [] __alloc_pages_direct_compact+0xac/0x1d0
>  [] __alloc_pages_nodemask+0x6ab/0xaf0
>  [] alloc_pages_vma+0x9a/0x160
>  [] do_huge_pmd_anonymous_page+0xfd/0x3c0
>  [] ? get_parent_ip+0xd/0x50
>  [] handle_mm_fault+0x158/0xcb0
>  [] ? retint_restore_args+0xe/0xe
>  [] __do_page_fault+0x1a6/0x620
>  [] ? __acct_update_integrals+0x8e/0x120
>  [] ? get_parent_ip+0xd/0x50
>  [] ? preempt_count_sub+0x6b/0xf0
>  [] do_page_fault+0x1e/0x70
> Code: 75 1d 55 be 6c 00 00 00 48 c7 c7 8a 2f a2 bb 48 89 e5 e8 6c 49 95 ff 5d
> c6 05 74 16 65 00 01 c3 55 31 f6 48 89 e5 e8 28 bd a3 ff <0f> 0b 0f 1f 44 00 00
> 55 48 89 e5 41 57 45 31 ff 41 56 49 89 fe
> RIP  []
>
> That BUG is..
>
> 413 static inline int PageTransHuge(struct page *page)
> 414 {
> 415 VM_BUG_ON_PAGE(PageTail(page), page);
> 416 return PageHead(page);
> 417 }


Any idea which of the two PageTransHuge() calls in
isolate_migratepages_range() that is? The offset far into the function
suggests it's where the lru lock is already held, but I'm not sure, as
decodecode of your dump and objdump of my own compile look widely different.


If it's indeed the later PageTransHuge() call, it means that somebody 
else has cleared PageLRU and set PageTail (I don't think a page could 
have both at once) between the checks for PageLRU() and PageTransHuge() 
in isolate_migratepages_range(), while the latter was holding lru_lock. 
That's quite weird...


Vlastimil




3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

2014-05-22 Thread Dave Jones
Not sure if Sasha has already reported this on -next (It's getting hard
to keep track of all the VM bugs he's been finding), but I hit this overnight
on .15-rc6.  First time I've seen this one.


page:ea0004599800 count:0 mapcount:0 mapping:  (null) index:0x2
page flags: 0x208000(tail)
[ cut here ]
kernel BUG at include/linux/page-flags.h:415!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 6858 Comm: trinity-c42 Not tainted 3.15.0-rc6+ #216
task: 88012d18e900 ti: 88009e87a000 task.ti: 88009e87a000
RIP: 0010:[]  [] 
PageTransHuge.part.23+0xb/0xd
RSP: :88009e87b940  EFLAGS: 00010246
RAX: 0001 RBX: 00116660 RCX: 0006
RDX:  RSI: bb0c00f8 RDI: bb0bfed2
RBP: 88009e87b940 R08: bc01203c R09: 03da
R10: 03d9 R11: 0003 R12: 0001
R13: 00116800 R14: 88024d64ce00 R15: ea0004599800
FS:  7f4fd192e740() GS:88024d04() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 04c0 CR3: a19ce000 CR4: 001407e0
DR0: 024f4000 DR1: 01d43000 DR2: 
DR3:  DR6: fffe0ff0 DR7: 0600
Stack:
 88009e87b9e8 bb1728a3 88009e87b9e8 88009e87baa8
 88012d18e900 88009e87ba60  00040016
  88009e87bfd8 08b3 88009e87ba50
Call Trace:
 [] isolate_migratepages_range+0x7a3/0x870
 [] compact_zone+0x370/0x560
 [] compact_zone_order+0xa2/0x110
 [] try_to_compact_pages+0x101/0x130
 [] __alloc_pages_direct_compact+0xac/0x1d0
 [] __alloc_pages_nodemask+0x6ab/0xaf0
 [] alloc_pages_vma+0x9a/0x160
 [] do_huge_pmd_anonymous_page+0xfd/0x3c0
 [] ? get_parent_ip+0xd/0x50
 [] handle_mm_fault+0x158/0xcb0
 [] ? retint_restore_args+0xe/0xe
 [] __do_page_fault+0x1a6/0x620
 [] ? __acct_update_integrals+0x8e/0x120
 [] ? get_parent_ip+0xd/0x50
 [] ? preempt_count_sub+0x6b/0xf0
 [] do_page_fault+0x1e/0x70
Code: 75 1d 55 be 6c 00 00 00 48 c7 c7 8a 2f a2 bb 48 89 e5 e8 6c 49 95 ff 5d 
c6 05 74 16 65 00 01 c3 55 31 f6 48 89 e5 e8 28 bd a3 ff <0f> 0b 0f 1f 44 00 00 
55 48 89 e5 41 57 45 31 ff 41 56 49 89 fe 
RIP  []

That BUG is..

413 static inline int PageTransHuge(struct page *page)
414 {
415 VM_BUG_ON_PAGE(PageTail(page), page);
416 return PageHead(page);
417 }


