Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-26 Thread Eric Biggers
On Fri, Aug 25, 2017 at 04:41:36PM -0700, Mike Kravetz wrote:
> >
> > If madvise(..., MADV_FREE) split a transparent hugepage, it called
> > put_page() before unlock_page().  This was wrong because put_page() can
> > free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
> > removed it from the memory mapping.  put_page() then rightfully 
> > complained
> > about freeing a locked page.
> >
> > Fix this by moving the unlock_page() before put_page().
> >>>
> >>> Quick grep shows that a similar flow (put_page() followed by an
> >>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem 
> >>> as
> >>> well?
> >>
> >> I assume you are asking about this block of code?
> > 
> > Yes.
> > 
> >>
> >>/*
> >> * page_put due to reference from alloc_huge_page()
> >> * unlock_page because locked by add_to_page_cache()
> >> */
> >>put_page(page);
> >>unlock_page(page);
> >>
> >> Well, there is a typo (page_put) in the comment. :(
> >>
> >> However, in this case we have just added the huge page to a hugetlbfs
> >> file.  The put_page() is there just to drop the reference count on the
> >> page (taken when allocated).  It will still be non-zero as we have
> >> successfully added it to the page cache.  So, we are not freeing the
> >> page here, just dropping the reference count.
> >>
> >> This should not cause a problem like that seen in madvise.
> > 
> > Thanks for the quick response.
> > 
> > I am not too familiar with this piece of code, so just for the matter of
> > understanding: what prevents the page from being removed from the page cache
> > shortly after it is added (even if it is highly unlikely)? The page lock? 
> > The
> > inode lock?
> 
> Someone would need to acquire the inode lock to remove the page.  This
> is held until we exit the routine.  Also note that put_page for this
> type of huge page almost always results in the page being put back
> on a free list within the hugetlb(fs) subsystem.  It is not returned
> to the 'normal' memory allocators for general use.
> 

I'm not sure about that.  What about sys_fadvise64(..., POSIX_FADV_DONTNEED)?
That removes pages from the page cache without taking the inode lock.  It won't
remove locked pages though, so presumably it is only the page lock that prevents
the race with hugetlbfs_fallocate().

But in any case, why not do the "obviously correct" thing --- unlock before put?

Eric


Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-26 Thread Eric Biggers
On Fri, Aug 25, 2017 at 04:41:36PM -0700, Mike Kravetz wrote:
> >
> > If madvise(..., MADV_FREE) split a transparent hugepage, it called
> > put_page() before unlock_page().  This was wrong because put_page() can
> > free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
> > removed it from the memory mapping.  put_page() then rightfully 
> > complained
> > about freeing a locked page.
> >
> > Fix this by moving the unlock_page() before put_page().
> >>>
> >>> Quick grep shows that a similar flow (put_page() followed by an
> >>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem 
> >>> as
> >>> well?
> >>
> >> I assume you are asking about this block of code?
> > 
> > Yes.
> > 
> >>
> >>/*
> >> * page_put due to reference from alloc_huge_page()
> >> * unlock_page because locked by add_to_page_cache()
> >> */
> >>put_page(page);
> >>unlock_page(page);
> >>
> >> Well, there is a typo (page_put) in the comment. :(
> >>
> >> However, in this case we have just added the huge page to a hugetlbfs
> >> file.  The put_page() is there just to drop the reference count on the
> >> page (taken when allocated).  It will still be non-zero as we have
> >> successfully added it to the page cache.  So, we are not freeing the
> >> page here, just dropping the reference count.
> >>
> >> This should not cause a problem like that seen in madvise.
> > 
> > Thanks for the quick response.
> > 
> > I am not too familiar with this piece of code, so just for the matter of
> > understanding: what prevents the page from being removed from the page cache
> > shortly after it is added (even if it is highly unlikely)? The page lock? 
> > The
> > inode lock?
> 
> Someone would need to acquire the inode lock to remove the page.  This
> is held until we exit the routine.  Also note that put_page for this
> type of huge page almost always results in the page being put back
> on a free list within the hugetlb(fs) subsystem.  It is not returned
> to the 'normal' memory allocators for general use.
> 

I'm not sure about that.  What about sys_fadvise64(..., POSIX_FADV_DONTNEED)?
That removes pages from the page cache without taking the inode lock.  It won't
remove locked pages though, so presumably it is only the page lock that prevents
the race with hugetlbfs_fallocate().

But in any case, why not do the "obviously correct" thing --- unlock before put?

Eric


Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Mike Kravetz
On 08/25/2017 03:51 PM, Nadav Amit wrote:
> Mike Kravetz  wrote:
> 
>> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>>> Michal Hocko  wrote:
>>>
 Hmm, I do not see this neither in linux-mm nor LKML. Strange

 On Wed 23-08-17 14:41:21, Andrew Morton wrote:
> From: Eric Biggers 
> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>
> If madvise(..., MADV_FREE) split a transparent hugepage, it called
> put_page() before unlock_page().  This was wrong because put_page() can
> free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
> removed it from the memory mapping.  put_page() then rightfully complained
> about freeing a locked page.
>
> Fix this by moving the unlock_page() before put_page().
>>>
>>> Quick grep shows that a similar flow (put_page() followed by an
>>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
>>> well?
>>
>> I assume you are asking about this block of code?
> 
> Yes.
> 
>>
>>/*
>> * page_put due to reference from alloc_huge_page()
>> * unlock_page because locked by add_to_page_cache()
>> */
>>put_page(page);
>>unlock_page(page);
>>
>> Well, there is a typo (page_put) in the comment. :(
>>
>> However, in this case we have just added the huge page to a hugetlbfs
>> file.  The put_page() is there just to drop the reference count on the
>> page (taken when allocated).  It will still be non-zero as we have
>> successfully added it to the page cache.  So, we are not freeing the
>> page here, just dropping the reference count.
>>
>> This should not cause a problem like that seen in madvise.
> 
> Thanks for the quick response.
> 
> I am not too familiar with this piece of code, so just for the matter of
> understanding: what prevents the page from being removed from the page cache
> shortly after it is added (even if it is highly unlikely)? The page lock? The
> inode lock?

Someone would need to acquire the inode lock to remove the page.  This
is held until we exit the routine.  Also note that put_page for this
type of huge page almost always results in the page being put back
on a free list within the hugetlb(fs) subsystem.  It is not returned
to the 'normal' memory allocators for general use.

-- 
Mike Kravetz


Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Mike Kravetz
On 08/25/2017 03:51 PM, Nadav Amit wrote:
> Mike Kravetz  wrote:
> 
>> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>>> Michal Hocko  wrote:
>>>
 Hmm, I do not see this neither in linux-mm nor LKML. Strange

 On Wed 23-08-17 14:41:21, Andrew Morton wrote:
> From: Eric Biggers 
> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>
> If madvise(..., MADV_FREE) split a transparent hugepage, it called
> put_page() before unlock_page().  This was wrong because put_page() can
> free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
> removed it from the memory mapping.  put_page() then rightfully complained
> about freeing a locked page.
>
> Fix this by moving the unlock_page() before put_page().
>>>
>>> Quick grep shows that a similar flow (put_page() followed by an
>>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
>>> well?
>>
>> I assume you are asking about this block of code?
> 
> Yes.
> 
>>
>>/*
>> * page_put due to reference from alloc_huge_page()
>> * unlock_page because locked by add_to_page_cache()
>> */
>>put_page(page);
>>unlock_page(page);
>>
>> Well, there is a typo (page_put) in the comment. :(
>>
>> However, in this case we have just added the huge page to a hugetlbfs
>> file.  The put_page() is there just to drop the reference count on the
>> page (taken when allocated).  It will still be non-zero as we have
>> successfully added it to the page cache.  So, we are not freeing the
>> page here, just dropping the reference count.
>>
>> This should not cause a problem like that seen in madvise.
> 
> Thanks for the quick response.
> 
> I am not too familiar with this piece of code, so just for the matter of
> understanding: what prevents the page from being removed from the page cache
> shortly after it is added (even if it is highly unlikely)? The page lock? The
> inode lock?

Someone would need to acquire the inode lock to remove the page.  This
is held until we exit the routine.  Also note that put_page for this
type of huge page almost always results in the page being put back
on a free list within the hugetlb(fs) subsystem.  It is not returned
to the 'normal' memory allocators for general use.

-- 
Mike Kravetz


Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Nadav Amit
Mike Kravetz  wrote:

> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>> Michal Hocko  wrote:
>> 
>>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>> 
>>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
 From: Eric Biggers 
 Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
 
 If madvise(..., MADV_FREE) split a transparent hugepage, it called
 put_page() before unlock_page().  This was wrong because put_page() can
 free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
 removed it from the memory mapping.  put_page() then rightfully complained
 about freeing a locked page.
 
 Fix this by moving the unlock_page() before put_page().
>> 
>> Quick grep shows that a similar flow (put_page() followed by an
>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
>> well?
> 
> I assume you are asking about this block of code?

Yes.

> 
>/*
> * page_put due to reference from alloc_huge_page()
> * unlock_page because locked by add_to_page_cache()
> */
>put_page(page);
>unlock_page(page);
> 
> Well, there is a typo (page_put) in the comment. :(
> 
> However, in this case we have just added the huge page to a hugetlbfs
> file.  The put_page() is there just to drop the reference count on the
> page (taken when allocated).  It will still be non-zero as we have
> successfully added it to the page cache.  So, we are not freeing the
> page here, just dropping the reference count.
> 
> This should not cause a problem like that seen in madvise.

Thanks for the quick response.

I am not too familiar with this piece of code, so just for the matter of
understanding: what prevents the page from being removed from the page cache
shortly after it is added (even if it is highly unlikely)? The page lock? The
inode lock?

Thanks again,
Nadav



Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Nadav Amit
Mike Kravetz  wrote:

> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>> Michal Hocko  wrote:
>> 
>>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>> 
>>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
 From: Eric Biggers 
 Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
 
 If madvise(..., MADV_FREE) split a transparent hugepage, it called
 put_page() before unlock_page().  This was wrong because put_page() can
 free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
 removed it from the memory mapping.  put_page() then rightfully complained
 about freeing a locked page.
 
 Fix this by moving the unlock_page() before put_page().
>> 
>> Quick grep shows that a similar flow (put_page() followed by an
>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
>> well?
> 
> I assume you are asking about this block of code?

Yes.

> 
>/*
> * page_put due to reference from alloc_huge_page()
> * unlock_page because locked by add_to_page_cache()
> */
>put_page(page);
>unlock_page(page);
> 
> Well, there is a typo (page_put) in the comment. :(
> 
> However, in this case we have just added the huge page to a hugetlbfs
> file.  The put_page() is there just to drop the reference count on the
> page (taken when allocated).  It will still be non-zero as we have
> successfully added it to the page cache.  So, we are not freeing the
> page here, just dropping the reference count.
> 
> This should not cause a problem like that seen in madvise.

Thanks for the quick response.

I am not too familiar with this piece of code, so just for the matter of
understanding: what prevents the page from being removed from the page cache
shortly after it is added (even if it is highly unlikely)? The page lock? The
inode lock?

Thanks again,
Nadav



Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Mike Kravetz
On 08/25/2017 03:02 PM, Nadav Amit wrote:
> Michal Hocko  wrote:
> 
>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>
>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>> From: Eric Biggers 
>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>
>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>> put_page() before unlock_page().  This was wrong because put_page() can
>>> free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
>>> removed it from the memory mapping.  put_page() then rightfully complained
>>> about freeing a locked page.
>>>
>>> Fix this by moving the unlock_page() before put_page().
> 
> Quick grep shows that a similar flow (put_page() followed by an
> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
> well?

I assume you are asking about this block of code?

/*
 * page_put due to reference from alloc_huge_page()
 * unlock_page because locked by add_to_page_cache()
 */
put_page(page);
unlock_page(page);

Well, there is a typo (page_put) in the comment. :(

However, in this case we have just added the huge page to a hugetlbfs
file.  The put_page() is there just to drop the reference count on the
page (taken when allocated).  It will still be non-zero as we have
successfully added it to the page cache.  So, we are not freeing the
page here, just dropping the reference count.

This should not cause a problem like that seen in madvise.
-- 
Mike Kravetz


Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Mike Kravetz
On 08/25/2017 03:02 PM, Nadav Amit wrote:
> Michal Hocko  wrote:
> 
>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>
>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>> From: Eric Biggers 
>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>
>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>> put_page() before unlock_page().  This was wrong because put_page() can
>>> free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
>>> removed it from the memory mapping.  put_page() then rightfully complained
>>> about freeing a locked page.
>>>
>>> Fix this by moving the unlock_page() before put_page().
> 
> Quick grep shows that a similar flow (put_page() followed by an
> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
> well?

I assume you are asking about this block of code?

/*
 * page_put due to reference from alloc_huge_page()
 * unlock_page because locked by add_to_page_cache()
 */
put_page(page);
unlock_page(page);

Well, there is a typo (page_put) in the comment. :(

However, in this case we have just added the huge page to a hugetlbfs
file.  The put_page() is there just to drop the reference count on the
page (taken when allocated).  It will still be non-zero as we have
successfully added it to the page cache.  So, we are not freeing the
page here, just dropping the reference count.

This should not cause a problem like that seen in madvise.
-- 
Mike Kravetz


Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Nadav Amit
Michal Hocko  wrote:

> Hmm, I do not see this neither in linux-mm nor LKML. Strange
> 
> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>> From: Eric Biggers 
>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>> 
>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>> put_page() before unlock_page().  This was wrong because put_page() can
>> free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
>> removed it from the memory mapping.  put_page() then rightfully complained
>> about freeing a locked page.
>> 
>> Fix this by moving the unlock_page() before put_page().

Quick grep shows that a similar flow (put_page() followed by an
unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
well?



Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree

2017-08-25 Thread Nadav Amit
Michal Hocko  wrote:

> Hmm, I do not see this neither in linux-mm nor LKML. Strange
> 
> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>> From: Eric Biggers 
>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>> 
>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>> put_page() before unlock_page().  This was wrong because put_page() can
>> free the page, e.g.  if a concurrent madvise(..., MADV_DONTNEED) has
>> removed it from the memory mapping.  put_page() then rightfully complained
>> about freeing a locked page.
>> 
>> Fix this by moving the unlock_page() before put_page().

Quick grep shows that a similar flow (put_page() followed by an
unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
well?