Re: mm, something wrong in page_lock_anon_vma_read()?

2017-06-08 Thread zhong jiang
On 2017/6/8 21:59, Vlastimil Babka wrote:
> On 06/08/2017 03:44 PM, Xishi Qiu wrote:
>> On 2017/5/23 17:33, Vlastimil Babka wrote:
>>
>>> On 05/23/2017 11:21 AM, zhong jiang wrote:
 On 2017/5/23 0:51, Vlastimil Babka wrote:
> On 05/20/2017 05:01 AM, zhong jiang wrote:
>> On 2017/5/20 10:40, Hugh Dickins wrote:
>>> On Sat, 20 May 2017, Xishi Qiu wrote:
 Here is a bug report from Red Hat:
 https://bugzilla.redhat.com/show_bug.cgi?id=1305620
 I have hit the bug too. However, it is hard to reproduce, and
 624483f3ea82598 ("mm: rmap: fix use-after-free in __put_anon_vma") does
 not help.

 From the vmcore, it seems that the page is still mapped (_mapcount=0
 and _count=2), and the value of mapping is a valid address (mapping =
 0x8801b3e2a101), but the anon_vma has been corrupted.

 Any ideas?
>>> Sorry, no.  I assume that _mapcount has been misaccounted, for example
>>> a pte mapped in on top of another pte; but cannot begin to tell you where
>>> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>>>
>>> Hugh
>>>
>> Hi, Hugh
>>
>> I found the following message in dmesg.
>>
>> [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1
>>
>> I can prove that _mapcount is misaccounted: after the task has exited,
>> the rmap still exists.
> Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
> thp: fix SMP race condition between THP page fault and MADV_DONTNEED")
   Hi, Vlastimil

   Our kernel is missing that patch.
>>> Try applying it then, there's a good chance the error and crash will go
>>> away, even if your workload doesn't actually run any madvise(MADV_DONTNEED).
>>>
>> Hi Vlastimil,
>>
>> I found that this error was reported by Kirill here, right?
>> https://patchwork.kernel.org/patch/7550401/
> That was reported by Minchan.
>
>> The call trace is quite similar to ours.
> In that thread, the error seems to have just disappeared in the end.
  Without any patch? I wonder how it disappeared.
> So, did you apply the patch I suggested? Did it help?
 Yes, I applied the patch and tested for two weeks; no panic occurred.
 But the last panic only occurred after one month, so we are still not
 sure that it really resolves the issue.

  Thanks
zhongjiang
>> Thanks,
>> Xishi Qiu
>>
 When I read the patch, I found the following issue, but I am sure it is
 right.

     if (unlikely(pmd_trans_unstable(pmd)))
         return 0;
     /*
      * A regular pmd is established and it can't morph into a huge pmd
      * from under us anymore at this point because we hold the mmap_sem
      * read mode and khugepaged takes it in write mode. So now it's
      * safe to run pte_offset_map().
      */
     pte = pte_offset_map(pmd, address);

 After the pmd_trans_unstable() call there is no further protection.
 According to the comment, pte_offset_map() is then safe, but the pmd
 may still become unstable before pte_offset_map() is called. Is that
 possible?
>>> IIRC it's "unstable" wrt possible none->huge->none transition. But once
>>> we've seen it's a regular pmd via pmd_trans_unstable(), we're safe as a
>>> transition from regular pmd can't happen.
>>>
   Thanks
 zhongjiang
>> Thanks
>> zhongjiang
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 
>>




Re: mm, something wrong in page_lock_anon_vma_read()?

2017-06-08 Thread Vlastimil Babka
On 06/08/2017 03:44 PM, Xishi Qiu wrote:
> On 2017/5/23 17:33, Vlastimil Babka wrote:
> 
>> On 05/23/2017 11:21 AM, zhong jiang wrote:
>>> On 2017/5/23 0:51, Vlastimil Babka wrote:
 On 05/20/2017 05:01 AM, zhong jiang wrote:
> On 2017/5/20 10:40, Hugh Dickins wrote:
>> On Sat, 20 May 2017, Xishi Qiu wrote:
>>> Here is a bug report from Red Hat:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>>> I have hit the bug too. However, it is hard to reproduce, and
>>> 624483f3ea82598 ("mm: rmap: fix use-after-free in __put_anon_vma") does
>>> not help.
>>>
>>> From the vmcore, it seems that the page is still mapped (_mapcount=0
>>> and _count=2), and the value of mapping is a valid address (mapping =
>>> 0x8801b3e2a101), but the anon_vma has been corrupted.
>>>
>>> Any ideas?
>> Sorry, no.  I assume that _mapcount has been misaccounted, for example
>> a pte mapped in on top of another pte; but cannot begin to tell you where
>> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>>
>> Hugh
>>
> Hi, Hugh
>
> I found the following message in dmesg.
>
> [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1
>
> I can prove that _mapcount is misaccounted: after the task has exited,
> the rmap still exists.
 Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
 thp: fix SMP race condition between THP page fault and MADV_DONTNEED")
>>>   Hi, Vlastimil
>>>
>>>   Our kernel is missing that patch.
>>
>> Try applying it then, there's a good chance the error and crash will go
>> away, even if your workload doesn't actually run any madvise(MADV_DONTNEED).
>>
> 
> Hi Vlastimil,
> 
> I found that this error was reported by Kirill here, right?
> https://patchwork.kernel.org/patch/7550401/

That was reported by Minchan.

> The call trace is quite similar to ours.

In that thread, the error seems to have just disappeared in the end.

So, did you apply the patch I suggested? Did it help?

> Thanks,
> Xishi Qiu
> 
>>> When I read the patch, I found the following issue, but I am sure it is
>>> right.
>>>
>>>     if (unlikely(pmd_trans_unstable(pmd)))
>>>         return 0;
>>>     /*
>>>      * A regular pmd is established and it can't morph into a huge pmd
>>>      * from under us anymore at this point because we hold the mmap_sem
>>>      * read mode and khugepaged takes it in write mode. So now it's
>>>      * safe to run pte_offset_map().
>>>      */
>>>     pte = pte_offset_map(pmd, address);
>>>
>>> After the pmd_trans_unstable() call there is no further protection.
>>> According to the comment, pte_offset_map() is then safe, but the pmd
>>> may still become unstable before pte_offset_map() is called. Is that
>>> possible?
>>
>> IIRC it's "unstable" wrt possible none->huge->none transition. But once
>> we've seen it's a regular pmd via pmd_trans_unstable(), we're safe as a
>> transition from regular pmd can't happen.
>>
>>>   Thanks
>>> zhongjiang
> Thanks
> zhongjiang



Re: mm, something wrong in page_lock_anon_vma_read()?

2017-06-08 Thread Xishi Qiu
On 2017/5/23 17:33, Vlastimil Babka wrote:

> On 05/23/2017 11:21 AM, zhong jiang wrote:
>> On 2017/5/23 0:51, Vlastimil Babka wrote:
>>> On 05/20/2017 05:01 AM, zhong jiang wrote:
 On 2017/5/20 10:40, Hugh Dickins wrote:
> On Sat, 20 May 2017, Xishi Qiu wrote:
>> Here is a bug report from Red Hat:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>> I have hit the bug too. However, it is hard to reproduce, and
>> 624483f3ea82598 ("mm: rmap: fix use-after-free in __put_anon_vma") does
>> not help.
>>
>> From the vmcore, it seems that the page is still mapped (_mapcount=0
>> and _count=2), and the value of mapping is a valid address (mapping =
>> 0x8801b3e2a101), but the anon_vma has been corrupted.
>>
>> Any ideas?
> Sorry, no.  I assume that _mapcount has been misaccounted, for example
> a pte mapped in on top of another pte; but cannot begin to tell you where
> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>
> Hugh
>
 Hi, Hugh

 I found the following message in dmesg.

 [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1

 I can prove that _mapcount is misaccounted: after the task has exited,
 the rmap still exists.
>>> Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
>>> thp: fix SMP race condition between THP page fault and MADV_DONTNEED")
>>   Hi, Vlastimil
>>
>>   Our kernel is missing that patch.
>
> Try applying it then, there's a good chance the error and crash will go
> away, even if your workload doesn't actually run any madvise(MADV_DONTNEED).
> 

Hi Vlastimil,

I found that this error was reported by Kirill here, right?
https://patchwork.kernel.org/patch/7550401/

The call trace is quite similar to ours.

Thanks,
Xishi Qiu

>> When I read the patch, I found the following issue, but I am sure it is
>> right.
>>
>>     if (unlikely(pmd_trans_unstable(pmd)))
>>         return 0;
>>     /*
>>      * A regular pmd is established and it can't morph into a huge pmd
>>      * from under us anymore at this point because we hold the mmap_sem
>>      * read mode and khugepaged takes it in write mode. So now it's
>>      * safe to run pte_offset_map().
>>      */
>>     pte = pte_offset_map(pmd, address);
>>
>> After the pmd_trans_unstable() call there is no further protection.
>> According to the comment, pte_offset_map() is then safe, but the pmd
>> may still become unstable before pte_offset_map() is called. Is that
>> possible?
> 
> IIRC it's "unstable" wrt possible none->huge->none transition. But once
> we've seen it's a regular pmd via pmd_trans_unstable(), we're safe as a
> transition from regular pmd can't happen.
> 
>>   Thanks
>> zhongjiang
 Thanks
 zhongjiang






Re: mm, something wrong in page_lock_anon_vma_read()?

2017-05-23 Thread zhong jiang
On 2017/5/23 17:33, Vlastimil Babka wrote:
> On 05/23/2017 11:21 AM, zhong jiang wrote:
>> On 2017/5/23 0:51, Vlastimil Babka wrote:
>>> On 05/20/2017 05:01 AM, zhong jiang wrote:
 On 2017/5/20 10:40, Hugh Dickins wrote:
> On Sat, 20 May 2017, Xishi Qiu wrote:
>> Here is a bug report from Red Hat:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>> I have hit the bug too. However, it is hard to reproduce, and
>> 624483f3ea82598 ("mm: rmap: fix use-after-free in __put_anon_vma") does
>> not help.
>>
>> From the vmcore, it seems that the page is still mapped (_mapcount=0
>> and _count=2), and the value of mapping is a valid address (mapping =
>> 0x8801b3e2a101), but the anon_vma has been corrupted.
>>
>> Any ideas?
> Sorry, no.  I assume that _mapcount has been misaccounted, for example
> a pte mapped in on top of another pte; but cannot begin to tell you where
> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>
> Hugh
>
 Hi, Hugh

 I found the following message in dmesg.

 [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1

 I can prove that _mapcount is misaccounted: after the task has exited,
 the rmap still exists.
>>> Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
>>> thp: fix SMP race condition between THP page fault and MADV_DONTNEED")
>>   Hi, Vlastimil
>>
>>   Our kernel is missing that patch.
> Try applying it then, there's a good chance the error and crash will go
> away, even if your workload doesn't actually run any madvise(MADV_DONTNEED).
 OK, I will try. Thanks.
>> When I read the patch, I found the following issue, but I am sure it is
>> right.
>>
>>     if (unlikely(pmd_trans_unstable(pmd)))
>>         return 0;
>>     /*
>>      * A regular pmd is established and it can't morph into a huge pmd
>>      * from under us anymore at this point because we hold the mmap_sem
>>      * read mode and khugepaged takes it in write mode. So now it's
>>      * safe to run pte_offset_map().
>>      */
>>     pte = pte_offset_map(pmd, address);
>>
>> After the pmd_trans_unstable() call there is no further protection.
>> According to the comment, pte_offset_map() is then safe, but the pmd
>> may still become unstable before pte_offset_map() is called. Is that
>> possible?
> IIRC it's "unstable" wrt possible none->huge->none transition. But once
> we've seen it's a regular pmd via pmd_trans_unstable(), we're safe as a
> transition from regular pmd can't happen.
  Thank you for clarifying.
 
  Regards
 zhongjiang
>>   Thanks
>> zhongjiang
 Thanks
 zhongjiang





Re: mm, something wrong in page_lock_anon_vma_read()?

2017-05-23 Thread Vlastimil Babka
On 05/23/2017 11:21 AM, zhong jiang wrote:
> On 2017/5/23 0:51, Vlastimil Babka wrote:
>> On 05/20/2017 05:01 AM, zhong jiang wrote:
>>> On 2017/5/20 10:40, Hugh Dickins wrote:
 On Sat, 20 May 2017, Xishi Qiu wrote:
> Here is a bug report from Red Hat:
> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
> I have hit the bug too. However, it is hard to reproduce, and
> 624483f3ea82598 ("mm: rmap: fix use-after-free in __put_anon_vma") does
> not help.
>
> From the vmcore, it seems that the page is still mapped (_mapcount=0
> and _count=2), and the value of mapping is a valid address (mapping =
> 0x8801b3e2a101), but the anon_vma has been corrupted.
>
> Any ideas?
 Sorry, no.  I assume that _mapcount has been misaccounted, for example
 a pte mapped in on top of another pte; but cannot begin to tell you where
 in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.

 Hugh

>>> Hi, Hugh
>>>
>>> I found the following message in dmesg.
>>>
>>> [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1
>>>
>>> I can prove that _mapcount is misaccounted: after the task has exited,
>>> the rmap still exists.
>> Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
>> thp: fix SMP race condition between THP page fault and MADV_DONTNEED")
>   Hi, Vlastimil
>
>   Our kernel is missing that patch.

Try applying it then, there's a good chance the error and crash will go
away, even if your workload doesn't actually run any madvise(MADV_DONTNEED).

> When I read the patch, I found the following issue, but I am sure it is
> right.
>
>     if (unlikely(pmd_trans_unstable(pmd)))
>         return 0;
>     /*
>      * A regular pmd is established and it can't morph into a huge pmd
>      * from under us anymore at this point because we hold the mmap_sem
>      * read mode and khugepaged takes it in write mode. So now it's
>      * safe to run pte_offset_map().
>      */
>     pte = pte_offset_map(pmd, address);
>
> After the pmd_trans_unstable() call there is no further protection.
> According to the comment, pte_offset_map() is then safe, but the pmd
> may still become unstable before pte_offset_map() is called. Is that
> possible?

IIRC it's "unstable" wrt possible none->huge->none transition. But once
we've seen it's a regular pmd via pmd_trans_unstable(), we're safe as a
transition from regular pmd can't happen.

>   Thanks
> zhongjiang
>>> Thanks
>>> zhongjiang



Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-23 Thread Vlastimil Babka
On 05/23/2017 11:21 AM, zhong jiang wrote:
> On 2017/5/23 0:51, Vlastimil Babka wrote:
>> On 05/20/2017 05:01 AM, zhong jiang wrote:
>>> On 2017/5/20 10:40, Hugh Dickins wrote:
 On Sat, 20 May 2017, Xishi Qiu wrote:
> Here is a bug report form redhat: 
> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
> And I meet the bug too. However it is hard to reproduce, and 
> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not 
> help.
>
> From the vmcore, it seems that the page is still mapped(_mapcount=0 and 
> _count=2),
> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
> but anon_vma has been corrupted.
>
> Any ideas?
 Sorry, no.  I assume that _mapcount has been misaccounted, for example
 a pte mapped in on top of another pte; but cannot begin tell you where
 in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.

 Hugh

 .

>>> Hi, Hugh
>>>
>>> I find the following message from the dmesg.
>>>
>>> [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1
>>>
>>> I can prove that the __mapcount is misaccount.  when task is exited. the 
>>> rmap
>>> still exist.
>> Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
>> thp: fix SMP race condition between THP page fault and MADV_DONTNEED")
>   HI, Vlastimil
>  
>   I miss the patch.

Try applying it then, there's good chance the error and crash will go
away. Even if your workload doesn't actually run any madvise(MADV_DONTNEED).

> when I read the patch. I find the following issue. but I am sure it is right.
> 
>	if (unlikely(pmd_trans_unstable(pmd)))
>		return 0;
>	/*
>	 * A regular pmd is established and it can't morph into a huge pmd
>	 * from under us anymore at this point because we hold the mmap_sem
>	 * read mode and khugepaged takes it in write mode. So now it's
>	 * safe to run pte_offset_map().
>	 */
>	pte = pte_offset_map(pmd, address);
> 
>   after pmd_trans_unstable call,  without any protect method.  by the comments,
>   it think the pte_offset_map is safe. before pte_offset_map call, it still may be
>   unstable. it is possible?

IIRC it's "unstable" wrt possible none->huge->none transition. But once
we've seen it's a regular pmd via pmd_trans_unstable(), we're safe as a
transition from regular pmd can't happen.
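A toy model may help with the argument above. The sketch below is a minimal userspace illustration, not kernel code. It assumes a three-state pmd (empty, huge, page table) and hard-codes the rule described in the reply: with mmap_sem held only for read, an empty or huge pmd can still change, but a pmd that already points to a page table can change only with mmap_sem held for write. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a pmd entry; not the kernel's pmd_t. */
enum toy_pmd { TOY_PMD_NONE, TOY_PMD_HUGE, TOY_PMD_TABLE };

/*
 * Hypothetical transition rule encoding the argument above: with
 * mmap_sem held only for read, an empty or huge pmd may still change
 * (for example none->huge on a THP fault, huge->none on MADV_DONTNEED),
 * but a pmd that already points to a page table may only change with
 * mmap_sem held for write (e.g. khugepaged collapsing it).
 */
static bool toy_transition_allowed(enum toy_pmd from, enum toy_pmd to,
                                   bool mmap_sem_write)
{
    assert(from <= TOY_PMD_TABLE && to <= TOY_PMD_TABLE);
    if (from == to)
        return true;
    if (from == TOY_PMD_TABLE)
        return mmap_sem_write;
    return true;
}

/* Analogue of pmd_trans_unstable(): only a page-table pmd counts as stable. */
static bool toy_pmd_trans_unstable(enum toy_pmd pmd)
{
    return pmd != TOY_PMD_TABLE;
}

/*
 * After the check, is it safe to map the ptes while holding mmap_sem
 * for read?  True only if no read-side transition can move the pmd
 * away from the state that was observed.
 */
static bool toy_safe_to_map_ptes(enum toy_pmd seen)
{
    if (toy_pmd_trans_unstable(seen))
        return false;                   /* caller bails out: "return 0;" */
    for (int to = TOY_PMD_NONE; to <= TOY_PMD_TABLE; to++)
        if (to != (int)seen &&
            toy_transition_allowed(seen, (enum toy_pmd)to, false))
            return false;
    return true;
}
```

Under these assumptions, the page-table state is the only one in which walking the ptes is safe. Returning early when pmd_trans_unstable() is true achieves exactly that.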

>   Thanks
> zhongjiang
>>> Thanks
>>> zhongjiang
>>>
>>
>> .
>>
> 



Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-23 Thread zhong jiang
On 2017/5/23 0:51, Vlastimil Babka wrote:
> On 05/20/2017 05:01 AM, zhong jiang wrote:
>> On 2017/5/20 10:40, Hugh Dickins wrote:
>>> On Sat, 20 May 2017, Xishi Qiu wrote:
>>>> Here is a bug report form redhat:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>>>> And I meet the bug too. However it is hard to reproduce, and
>>>> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not help.
>>>>
>>>> From the vmcore, it seems that the page is still mapped(_mapcount=0 and _count=2),
>>>> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
>>>> but anon_vma has been corrupted.
>>>>
>>>> Any ideas?
>>> Sorry, no.  I assume that _mapcount has been misaccounted, for example
>>> a pte mapped in on top of another pte; but cannot begin tell you where
>>> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>>>
>>> Hugh
>>>
>>> .
>>>
>> Hi, Hugh
>>
>> I find the following message from the dmesg.
>>
>> [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1
>>
>> I can prove that the __mapcount is misaccount.  when task is exited. the rmap
>> still exist.
> Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
> thp: fix SMP race condition between THP page fault and MADV_DONTNEED")
Hi, Vlastimil

I am missing that patch. When I read the patch, I found the following issue, but I
am sure it is right.

	if (unlikely(pmd_trans_unstable(pmd)))
		return 0;
	/*
	 * A regular pmd is established and it can't morph into a huge pmd
	 * from under us anymore at this point because we hold the mmap_sem
	 * read mode and khugepaged takes it in write mode. So now it's
	 * safe to run pte_offset_map().
	 */
	pte = pte_offset_map(pmd, address);

After the pmd_trans_unstable() call there is no further protection. According to
the comment, pte_offset_map() is then safe, but could the pmd still become
unstable before pte_offset_map() is called? Is that possible?

  Thanks
zhongjiang
>> Thanks
>> zhongjiang
>>
>
> .
>




Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-22 Thread Hugh Dickins
On Tue, 23 May 2017, Xishi Qiu wrote:
> On 2017/5/23 3:26, Hugh Dickins wrote:
> > I mean, there are various places in mm/memory.c which decide what they
> > intend to do based on orig_pte, then take pte lock, then check that
> > pte_same(pte, orig_pte) before taking it any further.  If a pte_same()
> > check were missing (I do not know of any such case), then two racing
> > tasks might install the same pte, one on top of the other - page
> > mapcount being incremented twice, but decremented only once when
> > that pte is finally unmapped later.
> > 
> 
> Hi Hugh,
> 
> Do you mean that the ptes from two racing point to the same page?
> or the two racing point to two pages, but one covers the other later?
> and the first page maybe alone in the lru list, and it will never be freed
> when the process exit.
> 
> We got this info before crash.
> [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1

I might mean either: you are taking my suggestion too seriously,
it is merely a suggestion of one way in which this could happen.

Another way is ordinary memory corruption (whether by software error
or by flipped DRAM bits) of a page table: that could end up here too.

Hugh


Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-22 Thread Xishi Qiu
On 2017/5/23 3:26, Hugh Dickins wrote:

> On Mon, 22 May 2017, Xishi Qiu wrote:
>> On 2017/5/20 10:40, Hugh Dickins wrote:
>>> On Sat, 20 May 2017, Xishi Qiu wrote:

>>>> Here is a bug report form redhat:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>>>> And I meet the bug too. However it is hard to reproduce, and
>>>> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not help.
>>>>
>>>> From the vmcore, it seems that the page is still mapped(_mapcount=0 and _count=2),
>>>> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
>>>> but anon_vma has been corrupted.
>>>>
>>>> Any ideas?
>>>
>>> Sorry, no.  I assume that _mapcount has been misaccounted, for example
>>> a pte mapped in on top of another pte; but cannot begin tell you where
>>
>> Hi Hugh,
>>
>> What does "a pte mapped in on top of another pte" mean? Could you give more info?
> 
> I mean, there are various places in mm/memory.c which decide what they
> intend to do based on orig_pte, then take pte lock, then check that
> pte_same(pte, orig_pte) before taking it any further.  If a pte_same()
> check were missing (I do not know of any such case), then two racing
> tasks might install the same pte, one on top of the other - page
> mapcount being incremented twice, but decremented only once when
> that pte is finally unmapped later.
> 

Hi Hugh,

Do you mean that the ptes installed by the two racing tasks point to the same page?
Or that they point to two different pages, and one later overwrites the other?
Then the first page may be left alone on the LRU list, and it will never be freed
when the process exits.

We got this message before the crash:
[26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1

Thanks,
Xishi Qiu

> Please see similar discussion in the earlier thread at
> marc.info/?l=linux-mm&m=148222656211837&w=2
> 
> Hugh
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>>> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>>>
>>> Hugh
> 
> .
> 





Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-22 Thread Hugh Dickins
On Mon, 22 May 2017, Xishi Qiu wrote:
> On 2017/5/20 10:40, Hugh Dickins wrote:
> > On Sat, 20 May 2017, Xishi Qiu wrote:
> >>
> >> Here is a bug report form redhat: 
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
> >> And I meet the bug too. However it is hard to reproduce, and
> >> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not help.
> >>
> >> From the vmcore, it seems that the page is still mapped(_mapcount=0 and _count=2),
> >> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
> >> but anon_vma has been corrupted.
> >>
> >> Any ideas?
> > 
> > Sorry, no.  I assume that _mapcount has been misaccounted, for example
> > a pte mapped in on top of another pte; but cannot begin tell you where
> 
> Hi Hugh,
> 
> What does "a pte mapped in on top of another pte" mean? Could you give more info?

I mean, there are various places in mm/memory.c which decide what they
intend to do based on orig_pte, then take pte lock, then check that
pte_same(pte, orig_pte) before taking it any further.  If a pte_same()
check were missing (I do not know of any such case), then two racing
tasks might install the same pte, one on top of the other - page
mapcount being incremented twice, but decremented only once when
that pte is finally unmapped later.
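The check-then-recheck pattern described above, and what goes wrong without it, can be sketched in plain C. This is a hypothetical single-threaded replay rather than kernel code:

- toy_pt stands in for one pte slot and one page.
- The pte lock is reduced to a flag that only enforces lock/unlock pairing.
- Unlike the kernel's _mapcount, which starts at -1, the toy mapcount starts at 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page table: one pte slot plus the mapcount of one page. */
struct toy_pt {
    bool locked;    /* stands in for the pte lock in this replay */
    int pte;        /* 0 means pte_none(), otherwise a made-up pfn */
    int mapcount;   /* ptes mapping the page; starts at 0 here */
};

static void toy_lock(struct toy_pt *pt)
{
    assert(!pt->locked);    /* single-threaded replay: never contended */
    pt->locked = true;
}

static void toy_unlock(struct toy_pt *pt)
{
    assert(pt->locked);
    pt->locked = false;
}

/*
 * Decide from orig_pte, take the lock, then recheck before installing.
 * check_same == false models a hypothetical fault path that lacks the
 * pte_same(pte, orig_pte) recheck.
 */
static bool toy_install_pte(struct toy_pt *pt, int orig_pte, int new_pte,
                            bool check_same)
{
    bool installed = false;

    toy_lock(pt);
    if (!check_same || pt->pte == orig_pte) {
        pt->pte = new_pte;      /* may land on top of an existing pte */
        pt->mapcount++;         /* one increment per install */
        installed = true;
    }
    toy_unlock(pt);
    return installed;
}

/* Unmapping the slot drops the mapcount once, since there is one pte. */
static void toy_zap_pte(struct toy_pt *pt)
{
    toy_lock(pt);
    if (pt->pte != 0) {
        pt->pte = 0;
        pt->mapcount--;
    }
    toy_unlock(pt);
}

/*
 * Deterministic replay of the race: both tasks sample orig_pte while
 * the slot is still empty, then install one after the other.  Returns
 * the mapcount left once the only pte has been unmapped.
 */
static int toy_race(bool check_same)
{
    struct toy_pt pt = { .locked = false, .pte = 0, .mapcount = 0 };
    int orig_a = pt.pte;        /* task A reads orig_pte */
    int orig_b = pt.pte;        /* task B reads orig_pte */

    toy_install_pte(&pt, orig_a, 42, check_same);
    toy_install_pte(&pt, orig_b, 42, check_same);
    toy_zap_pte(&pt);
    assert(pt.pte == 0);
    return pt.mapcount;         /* 0: balanced, 1: leaked mapping */
}
```

With the recheck, the second task sees that the pte is no longer the orig_pte it sampled and backs off. The single unmap then brings the mapcount back to 0.

Without the recheck, the second install lands on top of the first. The mapcount is incremented twice but decremented only once, so one mapping is leaked.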

Please see similar discussion in the earlier thread at
marc.info/?l=linux-mm&m=148222656211837&w=2

Hugh

> 
> Thanks,
> Xishi Qiu
> 
> > in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
> > 
> > Hugh


Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-22 Thread Vlastimil Babka
On 05/20/2017 05:01 AM, zhong jiang wrote:
> On 2017/5/20 10:40, Hugh Dickins wrote:
>> On Sat, 20 May 2017, Xishi Qiu wrote:
>>> Here is a bug report form redhat: 
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>>> And I meet the bug too. However it is hard to reproduce, and
>>> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not help.
>>>
>>> From the vmcore, it seems that the page is still mapped(_mapcount=0 and _count=2),
>>> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
>>> but anon_vma has been corrupted.
>>>
>>> Any ideas?
>> Sorry, no.  I assume that _mapcount has been misaccounted, for example
>> a pte mapped in on top of another pte; but cannot begin tell you where
>> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>>
>> Hugh
>>
>> .
>>
> Hi, Hugh
> 
> I find the following message from the dmesg.
> 
> [26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1
> 
> I can prove that the __mapcount is misaccount.  when task is exited. the rmap
> still exist.

Check if the kernel in question contains this commit: ad33bb04b2a6 ("mm:
thp: fix SMP race condition between THP page fault and MADV_DONTNEED")

> Thanks
> zhongjiang
> 



Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-22 Thread Xishi Qiu
On 2017/5/20 10:40, Hugh Dickins wrote:

> On Sat, 20 May 2017, Xishi Qiu wrote:
>>
>> Here is a bug report form redhat: 
>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>> And I meet the bug too. However it is hard to reproduce, and
>> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not help.
>>
>> From the vmcore, it seems that the page is still mapped(_mapcount=0 and _count=2),
>> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
>> but anon_vma has been corrupted.
>>
>> Any ideas?
> 
> Sorry, no.  I assume that _mapcount has been misaccounted, for example
> a pte mapped in on top of another pte; but cannot begin tell you where

Hi Hugh,

What does "a pte mapped in on top of another pte" mean? Could you give more info?

Thanks,
Xishi Qiu

> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
> 
> Hugh
> 
> .
> 





Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread zhong jiang
On 2017/5/20 10:40, Hugh Dickins wrote:
> On Sat, 20 May 2017, Xishi Qiu wrote:
>> Here is a bug report form redhat: 
>> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
>> And I meet the bug too. However it is hard to reproduce, and
>> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not help.
>>
>> From the vmcore, it seems that the page is still mapped(_mapcount=0 and _count=2),
>> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
>> but anon_vma has been corrupted.
>>
>> Any ideas?
> Sorry, no.  I assume that _mapcount has been misaccounted, for example
> a pte mapped in on top of another pte; but cannot begin tell you where
> in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.
>
> Hugh
>
> .
>
Hi, Hugh

I found the following message in dmesg:

[26068.316592] BUG: Bad rss-counter state mm:8800a7de2d80 idx:1 val:1

I can prove that _mapcount is misaccounted: when the task exited, the rmap
still existed.

Thanks
zhongjiang
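The dmesg line is printed by the rss-counter check that runs when an mm is torn down (check_mm() in kernel/fork.c). In kernels of this era the counter indices are:

- MM_FILEPAGES (0)
- MM_ANONPAGES (1)
- MM_SWAPENTS (2)

So idx:1 val:1 means one anonymous page was still accounted to the mm at exit. The userspace sketch below is a simplified model of that check, not the kernel implementation; toy_mm and the toy_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Counter indices in the order used by 3.10-era mm_struct rss_stat. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, NR_MM_COUNTERS };

struct toy_mm {
    long rss_stat[NR_MM_COUNTERS];
};

/* Mapping or unmapping a page adjusts the matching counter. */
static void toy_add_mm_counter(struct toy_mm *mm, int member, long value)
{
    assert(member >= 0 && member < NR_MM_COUNTERS);
    mm->rss_stat[member] += value;
}

/*
 * Simplified teardown check: any counter that is not back to zero is
 * reported in the same format as the dmesg line.  Returns the number
 * of bad counters.
 */
static int toy_check_mm(const struct toy_mm *mm)
{
    int bad = 0;

    for (int i = 0; i < NR_MM_COUNTERS; i++) {
        long x = mm->rss_stat[i];

        if (x) {
            printf("BUG: Bad rss-counter state mm:%p idx:%d val:%ld\n",
                   (const void *)mm, i, x);
            bad++;
        }
    }
    return bad;
}
```

A pte installed twice but zapped once, as in the "pte mapped in on top of another pte" case quoted above, would leave exactly this residue. Other accounting bugs would too, so the message alone does not pinpoint the cause.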



Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread Hugh Dickins
On Sat, 20 May 2017, Xishi Qiu wrote:
> 
> Here is a bug report form redhat: 
> https://bugzilla.redhat.com/show_bug.cgi?id=1305620
> And I meet the bug too. However it is hard to reproduce, and 
> 624483f3ea82598("mm: rmap: fix use-after-free in __put_anon_vma") is not help.
> 
> From the vmcore, it seems that the page is still mapped(_mapcount=0 and _count=2),
> and the value of mapping is a valid address(mapping = 0x8801b3e2a101),
> but anon_vma has been corrupted.
> 
> Any ideas?

Sorry, no.  I assume that _mapcount has been misaccounted, for example
a pte mapped in on top of another pte; but cannot begin to tell you where
in Red Hat's kernel-3.10.0-229.4.2.el7 that might happen.

Hugh


Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread Xishi Qiu
On 2017/5/20 10:02, Hugh Dickins wrote:

> On Sat, 20 May 2017, Xishi Qiu wrote:
>> On 2017/5/20 6:00, Hugh Dickins wrote:
>>>
>>> You're ignoring the rcu_read_lock() on entry to page_lock_anon_vma_read(),
>>> and the SLAB_DESTROY_BY_RCU (recently renamed SLAB_TYPESAFE_BY_RCU) nature
>>> of the anon_vma_cachep kmem cache.  It is not safe to muck with anon_vma->
>>> root in anon_vma_free(), others could still be looking at it.
>>>
>>> Hugh
>>>
>>
>> Hi Hugh,
>>
>> Thanks for your reply.
>>
>> SLAB_DESTROY_BY_RCU will let it call call_rcu() in free_slab(), but if the
>> anon_vma *reuse* by someone again, access root_anon_vma is not safe, right?
> 
> That is safe, on reuse it is still a struct anon_vma; then the test for
> !page_mapped(page) will show that it's no longer a reliable anon_vma for
> this page, so page_lock_anon_vma_read() returns NULL.
> 
> But of course, if page->_mapcount has been corrupted or misaccounted,
> it may think page_mapped(page) when actually page is not mapped,
> and the anon_vma is not good for it.
> 

Hi Hugh,

Here is a bug report from Red Hat:
https://bugzilla.redhat.com/show_bug.cgi?id=1305620
I have hit the bug too. However, it is hard to reproduce, and
624483f3ea82598 ("mm: rmap: fix use-after-free in __put_anon_vma") does not help.

From the vmcore, it seems that the page is still mapped (_mapcount=0 and
_count=2), and the value of mapping is a valid address (mapping = 0x8801b3e2a101),
but the anon_vma has been corrupted.

Any ideas?

Thanks,
Xishi Qiu
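To make the vmcore fields above concrete, here is a minimal sketch of how they are read. It assumes the 3.10-era encoding:

- page->_mapcount is biased by -1, so _mapcount=0 means one pte maps the page.
- page_mapped() tests _mapcount >= 0.
- An anonymous page stores its anon_vma address in page->mapping with the PAGE_MAPPING_ANON bit (0x1) set.

The toy_* helpers are hypothetical. The mapping value is used exactly as it appears in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tag bits kept in page->mapping (3.10-era values). */
#define PAGE_MAPPING_ANON  0x1UL
#define PAGE_MAPPING_KSM   0x2UL
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* _mapcount is biased by -1: -1 = not mapped, 0 = mapped by one pte. */
static bool toy_page_mapped(int mapcount_field)
{
    return mapcount_field >= 0;          /* what page_mapped() tests */
}

static int toy_nr_ptes(int mapcount_field)
{
    assert(mapcount_field >= -1);
    return mapcount_field + 1;
}

/*
 * First steps of page_lock_anon_vma_read(), simplified: accept only a
 * plain anonymous mapping, then strip the tag bit to get the address
 * of the anon_vma.  Returns 0 for anything else.
 */
static uintptr_t toy_anon_vma_addr(uintptr_t mapping)
{
    if ((mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
        return 0;
    return mapping - PAGE_MAPPING_ANON;
}
```

Decoded this way, mapping = 0x8801b3e2a101 points at an anon_vma at 0x8801b3e2a100, and _mapcount=0 reports one mapping. page_lock_anon_vma_read() would therefore go on to read that anon_vma's root pointer, which is where a corrupted anon_vma gets dereferenced.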

>>
>> e.g. if I clean the root pointer before free it, then access root_anon_vma
>> in page_lock_anon_vma_read() is NULL pointer access, right?
> 
> Yes, cleaning root pointer before free may result in NULL pointer access.
> 
> Hugh
> 
>>
>> anon_vma_free()
>>  ...
>>  anon_vma->root = NULL;
>>  kmem_cache_free(anon_vma_cachep, anon_vma);
>>  ...
>>
>> Thanks,
>> Xishi Qiu
> 
> .
> 





Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread Hugh Dickins
On Sat, 20 May 2017, Xishi Qiu wrote:
> On 2017/5/20 6:00, Hugh Dickins wrote:
> > 
> > You're ignoring the rcu_read_lock() on entry to page_lock_anon_vma_read(),
> > and the SLAB_DESTROY_BY_RCU (recently renamed SLAB_TYPESAFE_BY_RCU) nature
> > of the anon_vma_cachep kmem cache.  It is not safe to muck with anon_vma->
> > root in anon_vma_free(), others could still be looking at it.
> > 
> > Hugh
> > 
> 
> Hi Hugh,
> 
> Thanks for your reply.
> 
> SLAB_DESTROY_BY_RCU will let it call call_rcu() in free_slab(), but if the
> anon_vma *reuse* by someone again, access root_anon_vma is not safe, right?

That is safe, on reuse it is still a struct anon_vma; then the test for
!page_mapped(page) will show that it's no longer a reliable anon_vma for
this page, so page_lock_anon_vma_read() returns NULL.

But of course, if page->_mapcount has been corrupted or misaccounted,
it may think page_mapped(page) when actually page is not mapped,
and the anon_vma is not good for it.
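The reasoning above can be replayed in a small single-threaded model. This is a hypothetical sketch, not the kernel code:

- Type-stable (SLAB_TYPESAFE_BY_RCU) memory is modeled only by the fact that a freed toy_anon_vma remains a toy_anon_vma.
- The root rwsem is a bool.
- The refcount slow path of page_lock_anon_vma_read() is left out.
- The toy_* hooks stand in for another CPU acting inside the race window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy anon_vma with only the fields this sketch needs. */
struct toy_anon_vma {
    struct toy_anon_vma *root;
    bool locked;                   /* stands in for root->rwsem */
};

struct toy_page {
    struct toy_anon_vma *anon_vma; /* left in place after unmapping */
    int mapcount;                  /* kernel-style bias: -1 = not mapped */
};

enum toy_result { TOY_LOCKED, TOY_NOT_MAPPED, TOY_NULL_ROOT };

/* Optional hook run inside the race window, as if by another CPU. */
typedef void (*toy_hook)(struct toy_page *page);

/*
 * Simplified reader in the spirit of page_lock_anon_vma_read().  The
 * object behind page->anon_vma is always a struct toy_anon_vma, even
 * after being freed and reused, so reading ->root is memory-safe; the
 * page_mapped() recheck under the root lock decides validity.
 */
static enum toy_result toy_lock_anon_vma(struct toy_page *page, toy_hook race,
                                         struct toy_anon_vma **out)
{
    assert(out != NULL);
    *out = NULL;
    if (page->mapcount < 0)                /* !page_mapped(page) */
        return TOY_NOT_MAPPED;

    struct toy_anon_vma *anon_vma = page->anon_vma;
    if (race)
        race(page);                        /* unmap/free/reuse happens here */

    struct toy_anon_vma *root = anon_vma->root;
    if (root == NULL)
        return TOY_NULL_ROOT;              /* the kernel would oops here */
    root->locked = true;                   /* down_read_trylock() succeeded */
    if (page->mapcount < 0) {              /* recheck under the lock */
        root->locked = false;
        return TOY_NOT_MAPPED;
    }
    *out = anon_vma;
    return TOY_LOCKED;
}

/* Root of whichever process reuses the freed object (hypothetical). */
static struct toy_anon_vma toy_other_root;

/* Page unmapped, anon_vma freed and reused by another process. */
static void toy_unmap_and_reuse(struct toy_page *page)
{
    page->mapcount = -1;
    page->anon_vma->root = &toy_other_root;
}

/* Same, but with "anon_vma->root = NULL" added to anon_vma_free(). */
static void toy_unmap_and_clear_root(struct toy_page *page)
{
    page->mapcount = -1;
    page->anon_vma->root = NULL;
}

/* Reuse while a misaccounted _mapcount still claims the page is mapped. */
static void toy_reuse_misaccounted(struct toy_page *page)
{
    page->anon_vma->root = &toy_other_root;
}
```

The replay covers three race cases:

- **Reuse:** the page_mapped() recheck rejects the reused anon_vma.
- **Cleared root:** clearing ->root in anon_vma_free() turns that harmless window into a NULL dereference.
- **Misaccounted mapcount:** the reader believes it holds the right lock but has locked an unrelated root.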

> 
> e.g. if I clean the root pointer before free it, then access root_anon_vma
> in page_lock_anon_vma_read() is NULL pointer access, right?

Yes, cleaning root pointer before free may result in NULL pointer access.

Hugh

> 
> anon_vma_free()
>   ...
>   anon_vma->root = NULL;
>   kmem_cache_free(anon_vma_cachep, anon_vma);
>   ...
> 
> Thanks,
> Xishi Qiu


Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread Xishi Qiu
On 2017/5/20 6:00, Hugh Dickins wrote:

> On Fri, 19 May 2017, Xishi Qiu wrote:
>> On 2017/5/19 16:52, Xishi Qiu wrote:
>>> On 2017/5/18 17:46, Xishi Qiu wrote:
>>>
>>>> Hi, my system triggers this bug, and the vmcore shows the anon_vma seems
>>>> be freed.
>>>> The kernel is RHEL 7.2, and the bug is hard to reproduce, so I don't know
>>>> if it exists in mainline, any reply is welcome!
>>>>
>>>
>>> When we alloc anon_vma, we will init the value of anon_vma->root,
>>> so can we set anon_vma->root to NULL when calling
>>> anon_vma_free -> kmem_cache_free(anon_vma_cachep, anon_vma);
>>>
>>> anon_vma_free()
>>> 	...
>>> 	anon_vma->root = NULL;
>>> 	kmem_cache_free(anon_vma_cachep, anon_vma);
>>>
>>> I find if we do this above, system boot failed, why?
>>>
>>
>> If anon_vma was freed, we should not to access the root_anon_vma, because it
>> maybe also freed(e.g. anon_vma == root_anon_vma), right?
>>
>> page_lock_anon_vma_read()
>> 	...
>> 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
>> 	root_anon_vma = ACCESS_ONCE(anon_vma->root);
>> 	if (down_read_trylock(&root_anon_vma->rwsem)) {  // it's not safe
>> 	...
>> 	if (!atomic_inc_not_zero(&anon_vma->refcount)) {  // check anon_vma was not freed
>> 	...
>> 	anon_vma_lock_read(anon_vma);  // it's safe
>> 	...
> 
> You're ignoring the rcu_read_lock() on entry to page_lock_anon_vma_read(),
> and the SLAB_DESTROY_BY_RCU (recently renamed SLAB_TYPESAFE_BY_RCU) nature
> of the anon_vma_cachep kmem cache.  It is not safe to muck with
> anon_vma->root in anon_vma_free(), others could still be looking at it.
> 
> Hugh
> 

Hi Hugh,

Thanks for your reply.

SLAB_DESTROY_BY_RCU makes free_slab() release the slab page through call_rcu(),
but if the anon_vma is *reused* by someone else, accessing root_anon_vma is
not safe, right?

e.g. if I clear the root pointer before freeing it, then accessing
root_anon_vma in page_lock_anon_vma_read() is a NULL pointer access, right?

anon_vma_free()
...
anon_vma->root = NULL;
kmem_cache_free(anon_vma_cachep, anon_vma);
...

Thanks,
Xishi Qiu

> .
> 





Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread Hugh Dickins
On Fri, 19 May 2017, Xishi Qiu wrote:
> On 2017/5/19 16:52, Xishi Qiu wrote:
> > On 2017/5/18 17:46, Xishi Qiu wrote:
> > 
> >> Hi, my system triggers this bug, and the vmcore shows the anon_vma seems
> >> to have been freed. The kernel is RHEL 7.2, and the bug is hard to
> >> reproduce, so I don't know if it exists in mainline. Any reply is welcome!
> >>
> > 
> > When we allocate an anon_vma, we initialize anon_vma->root,
> > so can we set anon_vma->root to NULL when calling
> > anon_vma_free -> kmem_cache_free(anon_vma_cachep, anon_vma)?
> > 
> > anon_vma_free()
> > ...
> > anon_vma->root = NULL;
> > kmem_cache_free(anon_vma_cachep, anon_vma);
> > 
> > I find that if we do the above, the system fails to boot. Why?
> > 
> 
> If anon_vma was freed, we should not access the root_anon_vma, because it
> may also have been freed (e.g. anon_vma == root_anon_vma), right?
> 
> page_lock_anon_vma_read()
>   ...
>   anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
>   root_anon_vma = ACCESS_ONCE(anon_vma->root);
>   if (down_read_trylock(&root_anon_vma->rwsem)) {  // it's not safe
>   ...
>   if (!atomic_inc_not_zero(&anon_vma->refcount)) {  // check anon_vma was not freed
>   ...
>   anon_vma_lock_read(anon_vma);  // it's safe
>   ...

You're ignoring the rcu_read_lock() on entry to page_lock_anon_vma_read(),
and the SLAB_DESTROY_BY_RCU (recently renamed SLAB_TYPESAFE_BY_RCU) nature
of the anon_vma_cachep kmem cache.  It is not safe to muck with
anon_vma->root in anon_vma_free(), others could still be looking at it.

Hugh


Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread Xishi Qiu
On 2017/5/19 16:52, Xishi Qiu wrote:

> On 2017/5/18 17:46, Xishi Qiu wrote:
> 
>> Hi, my system triggers this bug, and the vmcore shows the anon_vma seems
>> to have been freed. The kernel is RHEL 7.2, and the bug is hard to
>> reproduce, so I don't know if it exists in mainline. Any reply is welcome!
>>
> 
> When we allocate an anon_vma, we initialize anon_vma->root,
> so can we set anon_vma->root to NULL when calling
> anon_vma_free -> kmem_cache_free(anon_vma_cachep, anon_vma)?
> 
> anon_vma_free()
>   ...
>   anon_vma->root = NULL;
>   kmem_cache_free(anon_vma_cachep, anon_vma);
> 
> I find that if we do the above, the system fails to boot. Why?
> 

If anon_vma was freed, we should not access the root_anon_vma, because it
may also have been freed (e.g. anon_vma == root_anon_vma), right?

page_lock_anon_vma_read()
...
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
root_anon_vma = ACCESS_ONCE(anon_vma->root);
if (down_read_trylock(&root_anon_vma->rwsem)) {  // it's not safe
...
if (!atomic_inc_not_zero(&anon_vma->refcount)) {  // check anon_vma was not freed
...
anon_vma_lock_read(anon_vma);  // it's safe
...


> Thanks,
> Xishi Qiu
> 
>> [35030.332666] general protection fault:  [#1] SMP
>> [35030.333016] Modules linked in: veth ipt_MASQUERADE nf_nat_masquerade_ipv4 
>> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
>> iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc dm_thin_pool 
>> dm_persistent_data dm_bio_prison dm_bufio libcrc32c rtos_kbox_panic(OE) 
>> ipmi_devintf ipmi_si ipmi_msghandler signo_catch(O) cirrus syscopyarea 
>> sysfillrect sysimgblt ttm crc32_pclmul ghash_clmulni_intel drm_kms_helper 
>> aesni_intel ppdev drm lrw gf128mul parport_pc glue_helper ablk_helper 
>> serio_raw cryptd i2c_piix4 parport pcspkr sg floppy i2c_core dm_mod 
>> sha512_generic ip_tables sd_mod crc_t10dif crct10dif_generic sr_mod cdrom 
>> virtio_console virtio_scsi virtio_net ata_generic pata_acpi crct10dif_pclmul 
>> crct10dif_common crc32c_intel virtio_pci virtio_ring virtio ata_piix libata 
>> ext4 mbcache
>> [35030.333016]  jbd2
>> [35030.333016] CPU: 3 PID: 48 Comm: kswapd0 Tainted: G   OE   
>> ---   3.10.0-327.36.58.4.x86_64 #1
>> [35030.333016] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 
>> rel-1.8.1-0-g4adadbd-20160826_03-hghoulaslx112 04/01/2014
>> [35030.333016] task: 8801b2d2 ti: 8801b4c38000 task.ti: 
>> 8801b4c38000
>> [35030.333016] RIP: 0010:[]  [] 
>> down_read_trylock+0x5/0x50
>> [35030.333016] RSP: :8801b4c3ba90  EFLAGS: 00010282
>> [35030.333016] RAX:  RBX: 8801b3e2a100 RCX: 
>> 
>> [35030.333016] RDX:  RSI:  RDI: 
>> deb604d497705c5d
>> [35030.333016] RBP: 8801b4c3bab8 R08: ea0002c34460 R09: 
>> 8801b3d7e8a0
>> [35030.333016] R10: 0004 R11: fff0fe00 R12: 
>> 8801b3e2a101
>> [35030.333016] R13: ea0002c34440 R14: deb604d497705c5d R15: 
>> ea0002c34440
>> [35030.333016] FS:  () GS:8801bed8() 
>> knlGS:
>> [35030.333016] CS:  0010 DS:  ES:  CR0: 80050033
>> [35030.333016] CR2: 00c422011080 CR3: 01976000 CR4: 
>> 001407e0
>> [35030.333016] DR0:  DR1:  DR2: 
>> 
>> [35030.333016] DR3:  DR6: 0ff0 DR7: 
>> 0400
>> [35030.333016] Stack:
>> [35030.333016]  811b2795 ea0002c34440  
>> 000f
>> [35030.333016]  0001 8801b4c3bb30 811b2a17 
>> 8800a712d640
>> [35030.333016]  0c4229e2 8801b4c3bb80 0001 
>> 0c41fe38
>> [35030.333016] Call Trace:
>> [35030.333016]  [] ? page_lock_anon_vma_read+0x55/0x110
>> [35030.333016]  [] page_referenced+0x1c7/0x350
>> [35030.333016]  [] shrink_active_list+0x1e4/0x400
>> [35030.333016]  [] shrink_lruvec+0x4bd/0x770
>> [35030.333016]  [] shrink_zone+0x76/0x1a0
>> [35030.333016]  [] balance_pgdat+0x49c/0x610
>> [35030.333016]  [] kswapd+0x173/0x450
>> [35030.333016]  [] ? wake_up_atomic_t+0x30/0x30
>> [35030.333016]  [] ? balance_pgdat+0x610/0x610
>> [35030.333016]  [] kthread+0xcf/0xe0
>> [35030.333016]  [] ? kthread_create_on_node+0x120/0x120
>> [35030.333016]  [] ret_from_fork+0x58/0x90
>> [35030.333016]  [] ? kthread_create_on_node+0x120/0x120
>> [35030.333016] Code: 00 ba ff ff ff ff 48 89 d8 f0 48 0f c1 10 79 05 e8 31 
>> 06 27 00 5b 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 
>> <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7
>> [35030.333016] RIP  [] down_read_trylock+0x5/0x50
>> [35030.333016]  RSP 
>> [35030.333016] [ cut here ]
>>
>> struct page {
>>   flags = 9007194960298056,
>>   mapping = 0x8801b3e2a101,
>>   {
>> {
>>   index = 

Re: mm, something wring in page_lock_anon_vma_read()?

2017-05-19 Thread Xishi Qiu
On 2017/5/18 17:46, Xishi Qiu wrote:

> Hi, my system triggers this bug, and the vmcore shows the anon_vma seems
> to have been freed. The kernel is RHEL 7.2, and the bug is hard to
> reproduce, so I don't know if it exists in mainline. Any reply is welcome!
> 

When we allocate an anon_vma, we initialize anon_vma->root,
so can we set anon_vma->root to NULL when calling
anon_vma_free -> kmem_cache_free(anon_vma_cachep, anon_vma)?

anon_vma_free()
...
anon_vma->root = NULL;
kmem_cache_free(anon_vma_cachep, anon_vma);

I find that if we do the above, the system fails to boot. Why?

Thanks,
Xishi Qiu

> [35030.332666] general protection fault:  [#1] SMP
> [35030.333016] Modules linked in: veth ipt_MASQUERADE nf_nat_masquerade_ipv4 
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
> iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc dm_thin_pool 
> dm_persistent_data dm_bio_prison dm_bufio libcrc32c rtos_kbox_panic(OE) 
> ipmi_devintf ipmi_si ipmi_msghandler signo_catch(O) cirrus syscopyarea 
> sysfillrect sysimgblt ttm crc32_pclmul ghash_clmulni_intel drm_kms_helper 
> aesni_intel ppdev drm lrw gf128mul parport_pc glue_helper ablk_helper 
> serio_raw cryptd i2c_piix4 parport pcspkr sg floppy i2c_core dm_mod 
> sha512_generic ip_tables sd_mod crc_t10dif crct10dif_generic sr_mod cdrom 
> virtio_console virtio_scsi virtio_net ata_generic pata_acpi crct10dif_pclmul 
> crct10dif_common crc32c_intel virtio_pci virtio_ring virtio ata_piix libata 
> ext4 mbcache
> [35030.333016]  jbd2
> [35030.333016] CPU: 3 PID: 48 Comm: kswapd0 Tainted: G   OE   
> ---   3.10.0-327.36.58.4.x86_64 #1
> [35030.333016] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 
> rel-1.8.1-0-g4adadbd-20160826_03-hghoulaslx112 04/01/2014
> [35030.333016] task: 8801b2d2 ti: 8801b4c38000 task.ti: 
> 8801b4c38000
> [35030.333016] RIP: 0010:[]  [] 
> down_read_trylock+0x5/0x50
> [35030.333016] RSP: :8801b4c3ba90  EFLAGS: 00010282
> [35030.333016] RAX:  RBX: 8801b3e2a100 RCX: 
> 
> [35030.333016] RDX:  RSI:  RDI: 
> deb604d497705c5d
> [35030.333016] RBP: 8801b4c3bab8 R08: ea0002c34460 R09: 
> 8801b3d7e8a0
> [35030.333016] R10: 0004 R11: fff0fe00 R12: 
> 8801b3e2a101
> [35030.333016] R13: ea0002c34440 R14: deb604d497705c5d R15: 
> ea0002c34440
> [35030.333016] FS:  () GS:8801bed8() 
> knlGS:
> [35030.333016] CS:  0010 DS:  ES:  CR0: 80050033
> [35030.333016] CR2: 00c422011080 CR3: 01976000 CR4: 
> 001407e0
> [35030.333016] DR0:  DR1:  DR2: 
> 
> [35030.333016] DR3:  DR6: 0ff0 DR7: 
> 0400
> [35030.333016] Stack:
> [35030.333016]  811b2795 ea0002c34440  
> 000f
> [35030.333016]  0001 8801b4c3bb30 811b2a17 
> 8800a712d640
> [35030.333016]  0c4229e2 8801b4c3bb80 0001 
> 0c41fe38
> [35030.333016] Call Trace:
> [35030.333016]  [] ? page_lock_anon_vma_read+0x55/0x110
> [35030.333016]  [] page_referenced+0x1c7/0x350
> [35030.333016]  [] shrink_active_list+0x1e4/0x400
> [35030.333016]  [] shrink_lruvec+0x4bd/0x770
> [35030.333016]  [] shrink_zone+0x76/0x1a0
> [35030.333016]  [] balance_pgdat+0x49c/0x610
> [35030.333016]  [] kswapd+0x173/0x450
> [35030.333016]  [] ? wake_up_atomic_t+0x30/0x30
> [35030.333016]  [] ? balance_pgdat+0x610/0x610
> [35030.333016]  [] kthread+0xcf/0xe0
> [35030.333016]  [] ? kthread_create_on_node+0x120/0x120
> [35030.333016]  [] ret_from_fork+0x58/0x90
> [35030.333016]  [] ? kthread_create_on_node+0x120/0x120
> [35030.333016] Code: 00 ba ff ff ff ff 48 89 d8 f0 48 0f c1 10 79 05 e8 31 06 
> 27 00 5b 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <48> 
> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7
> [35030.333016] RIP  [] down_read_trylock+0x5/0x50
> [35030.333016]  RSP 
> [35030.333016] [ cut here ]
> 
> struct page {
>   flags = 9007194960298056,
>   mapping = 0x8801b3e2a101,
>   {
> {
>   index = 34324593617,
>   freelist = 0x7fde7bbd1,
>   pfmemalloc = 209,
>   thp_mmu_gather = {
> counter = -35144751
>   },
>   pmd_huge_pte = 0x7fde7bbd1
> },
> {
>   counters = 8589934592,
>   {
> {
>   _mapcount = {
> counter = 0
>   },
>   {
> inuse = 0,
> objects = 0,
> frozen = 0
>   },
>   units = 0
> },
> _count = {
>   counter = 2
> }
>   }
> }
>   },
>   {
> lru = {
>   next = 0xdead00100100,
>   prev = 0xdead00200200
> },
> {
>   next = 0xdead00100100,
>   pages = 

mm, something wring in page_lock_anon_vma_read()?

2017-05-18 Thread Xishi Qiu
Hi, my system triggers this bug, and the vmcore shows the anon_vma seems to
have been freed. The kernel is RHEL 7.2, and the bug is hard to reproduce, so
I don't know if it exists in mainline. Any reply is welcome!

[35030.332666] general protection fault:  [#1] SMP
[35030.333016] Modules linked in: veth ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc dm_thin_pool 
dm_persistent_data dm_bio_prison dm_bufio libcrc32c rtos_kbox_panic(OE) 
ipmi_devintf ipmi_si ipmi_msghandler signo_catch(O) cirrus syscopyarea 
sysfillrect sysimgblt ttm crc32_pclmul ghash_clmulni_intel drm_kms_helper 
aesni_intel ppdev drm lrw gf128mul parport_pc glue_helper ablk_helper serio_raw 
cryptd i2c_piix4 parport pcspkr sg floppy i2c_core dm_mod sha512_generic 
ip_tables sd_mod crc_t10dif crct10dif_generic sr_mod cdrom virtio_console 
virtio_scsi virtio_net ata_generic pata_acpi crct10dif_pclmul crct10dif_common 
crc32c_intel virtio_pci virtio_ring virtio ata_piix libata ext4 mbcache
[35030.333016]  jbd2
[35030.333016] CPU: 3 PID: 48 Comm: kswapd0 Tainted: G   OE   ---   3.10.0-327.36.58.4.x86_64 #1
[35030.333016] Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.8.1-0-g4adadbd-20160826_03-hghoulaslx112 04/01/2014
[35030.333016] task: 8801b2d2 ti: 8801b4c38000 task.ti: 8801b4c38000
[35030.333016] RIP: 0010:[]  [] down_read_trylock+0x5/0x50
[35030.333016] RSP: :8801b4c3ba90  EFLAGS: 00010282
[35030.333016] RAX:  RBX: 8801b3e2a100 RCX: 
[35030.333016] RDX:  RSI:  RDI: deb604d497705c5d
[35030.333016] RBP: 8801b4c3bab8 R08: ea0002c34460 R09: 8801b3d7e8a0
[35030.333016] R10: 0004 R11: fff0fe00 R12: 8801b3e2a101
[35030.333016] R13: ea0002c34440 R14: deb604d497705c5d R15: ea0002c34440
[35030.333016] FS:  () GS:8801bed8() knlGS:
[35030.333016] CS:  0010 DS:  ES:  CR0: 80050033
[35030.333016] CR2: 00c422011080 CR3: 01976000 CR4: 001407e0
[35030.333016] DR0:  DR1:  DR2: 
[35030.333016] DR3:  DR6: 0ff0 DR7: 0400
[35030.333016] Stack:
[35030.333016]  811b2795 ea0002c34440  000f
[35030.333016]  0001 8801b4c3bb30 811b2a17 8800a712d640
[35030.333016]  0c4229e2 8801b4c3bb80 0001 0c41fe38
[35030.333016] Call Trace:
[35030.333016]  [] ? page_lock_anon_vma_read+0x55/0x110
[35030.333016]  [] page_referenced+0x1c7/0x350
[35030.333016]  [] shrink_active_list+0x1e4/0x400
[35030.333016]  [] shrink_lruvec+0x4bd/0x770
[35030.333016]  [] shrink_zone+0x76/0x1a0
[35030.333016]  [] balance_pgdat+0x49c/0x610
[35030.333016]  [] kswapd+0x173/0x450
[35030.333016]  [] ? wake_up_atomic_t+0x30/0x30
[35030.333016]  [] ? balance_pgdat+0x610/0x610
[35030.333016]  [] kthread+0xcf/0xe0
[35030.333016]  [] ? kthread_create_on_node+0x120/0x120
[35030.333016]  [] ret_from_fork+0x58/0x90
[35030.333016]  [] ? kthread_create_on_node+0x120/0x120
[35030.333016] Code: 00 ba ff ff ff ff 48 89 d8 f0 48 0f c1 10 79 05 e8 31 06 27 00 5b 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7
[35030.333016] RIP  [] down_read_trylock+0x5/0x50
[35030.333016]  RSP 
[35030.333016] [ cut here ]

struct page {
  flags = 9007194960298056,
  mapping = 0x8801b3e2a101,
  {
    {
      index = 34324593617,
      freelist = 0x7fde7bbd1,
      pfmemalloc = 209,
      thp_mmu_gather = {
        counter = -35144751
      },
      pmd_huge_pte = 0x7fde7bbd1
    },
    {
      counters = 8589934592,
      {
        {
          _mapcount = {
            counter = 0
          },
          {
            inuse = 0,
            objects = 0,
            frozen = 0
          },
          units = 0
        },
        _count = {
          counter = 2
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead00100100,
      prev = 0xdead00200200
    },
    {
      next = 0xdead00100100,
      pages = 2097664,
      pobjects = -559087616
    },
    list = {
      next = 0xdead00100100,
      prev = 0xdead00200200
    },
    slab_page = 0xdead00100100
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}
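The two struct page fields that matter here can be decoded by hand. A minimal sketch, assuming the 3.10-era conventions that PAGE_MAPPING_ANON is bit 0 of page->mapping and that page_mapped() tests _mapcount >= 0 (the field starts at -1); the helper names and MODEL_PAGE_MAPPING_ANON are hypothetical. With the values as printed, the anon_vma pointer is 0x8801b3e2a100, the same value RBX holds in the oops, and _mapcount = 0 means the page is still counted as mapped exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: as in 3.10-era kernels, bit 0 of page->mapping marks anon. */
#define MODEL_PAGE_MAPPING_ANON 0x1ULL

/* Strip the anon flag to recover the struct anon_vma address. */
static uint64_t decode_anon_vma(uint64_t mapping)
{
	assert(mapping & MODEL_PAGE_MAPPING_ANON);   /* must be an anon page */
	return mapping - MODEL_PAGE_MAPPING_ANON;
}

/* page_mapped(): _mapcount starts at -1, so any value >= 0 means mapped. */
static bool model_page_mapped(int mapcount_field)
{
	return mapcount_field >= 0;
}

/* page_mapcount(): number of page table entries mapping the page. */
static int model_page_mapcount(int mapcount_field)
{
	return mapcount_field + 1;
}
```

So page_lock_anon_vma_read() would pass its page_mapped() check for this page and go on to trust the anon_vma at 0x8801b3e2a100.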



crash> struct anon_vma 0x8801b3e2a100
struct anon_vma {
  root = 0xdeb604d497705c55,
  rwsem = {
    count = -8192007903225070328,
    wait_lock = {
  

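The registers line up with this anon_vma dump: RDI (and R14) hold deb604d497705c5d, which is the bogus root 0xdeb604d497705c55 plus 8, i.e. &root_anon_vma->rwsem, and the faulting bytes `<48> 8b 07` decode to `mov (%rdi),%rax`, the load of the semaphore count at the start of down_read_trylock(). Because that address is non-canonical on x86-64, the access raises a general protection fault rather than a page fault. A small sketch of the arithmetic, assuming an LP64 layout in which the rw_semaphore directly follows the 8-byte root pointer (struct model_anon_vma and the helpers are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the leading fields of the 3.10-era struct anon_vma
 * on x86-64: an 8-byte root pointer, then the rw_semaphore whose first
 * member is the count word that down_read_trylock() loads.
 */
struct model_anon_vma {
	struct model_anon_vma *root;
	long rwsem_count;                  /* stands in for rwsem.count */
};

/* Address that down_read_trylock(&root->rwsem) reads first. */
static uint64_t model_rwsem_addr(uint64_t root)
{
	return root + offsetof(struct model_anon_vma, rwsem_count);
}

/* 48-bit x86-64 virtual addresses: bits 63..47 must all be equal. */
static bool is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;         /* the top 17 bits */
	return top == 0 || top == 0x1ffffULL;
}
```

A normal direct-map address such as 0xffff8801b3e2a100 passes is_canonical(), while the rwsem address derived from the corrupted root does not.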