On Mon, 30 May 2011 11:37:38 +0200
Jan Kara <[email protected]> wrote:

> Under heavy memory and filesystem load, users observe the assertion
> mapping->nrpages == 0 in end_writeback() trigger. This can be caused
> by page reclaim reclaiming the last page from a mapping in the following
> race:
>       CPU0                            CPU1
>   ...
>   shrink_page_list()
>     __remove_mapping()
>       __delete_from_page_cache()
>         radix_tree_delete()
>                                       evict_inode()
>                                         truncate_inode_pages()
>                                           truncate_inode_pages_range()
>                                             pagevec_lookup() - finds nothing
>                                         end_writeback()
>                                           mapping->nrpages != 0 -> BUG
>         page->mapping = NULL
>         mapping->nrpages--
> 
> Fix the problem by cycling the mapping->tree_lock at the end of
> truncate_inode_pages_range() to synchronize with page reclaim.
> 
> Analyzed by Jay <[email protected]>, lost in LKML, and dug
> out by Miklos Szeredi <[email protected]>.
> 
> CC: Jay <[email protected]>
> CC: [email protected]
> Acked-by: Miklos Szeredi <[email protected]>
> Signed-off-by: Jan Kara <[email protected]>
> ---
>  mm/truncate.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
>  Andrew, would you merge this patch please? Thanks.
> 
> diff --git a/mm/truncate.c b/mm/truncate.c
> index a956675..ec3d292 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -291,6 +291,13 @@ void truncate_inode_pages_range(struct address_space 
> *mapping,
>               pagevec_release(&pvec);
>               mem_cgroup_uncharge_end();
>       }
> +     /*
> +      * Cycle the tree_lock to make sure all __delete_from_page_cache()
> +      * calls run from page reclaim have finished as well (this handles the
> +      * case when page reclaim took the last page from our range).
> +      */
> +     spin_lock_irq(&mapping->tree_lock);
> +     spin_unlock_irq(&mapping->tree_lock);
>  }
>  EXPORT_SYMBOL(truncate_inode_pages_range);

That's one ugly patch.


Perhaps this regression was added by Nick's RCUification of pagecache. 

Before that patch, mapping->nrpages and the radix-tree state were
coherent for holders of tree_lock.  So pagevec_lookup() would never
return "no pages" while ->nrpages is non-zero.

After that patch, find_get_pages() uses RCU to protect the radix-tree
but I don't think it correctly protects the aggregate (radix-tree +
nrpages).


If it's not that then I see another possibility. 
truncate_inode_pages_range() does

        if (mapping->nrpages == 0)
                return;

Is there anything to prevent a page getting added to the inode _after_
this test?  i_mutex?  If not, that would trigger the BUG.


Either way, I don't think that the uglypatch expresses a full
understanding of te bug ;)

_______________________________________________
stable mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/stable

Reply via email to