Re: [f2fs-dev] [PATCH] mm: workingset: fix NULL ptr dereference

2018-04-10 Thread Minchan Kim
On Tue, Apr 10, 2018 at 05:05:28AM -0700, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 10:26:43AM +0200, Michal Hocko wrote:
> > On Mon 09-04-18 12:40:44, Matthew Wilcox wrote:
> > > The problem is that the mapping gfp flags are used not only for allocating
> > > pages, but also for allocating the page cache data structures that hold
> > > the pages.  F2FS is the only filesystem that set the __GFP_ZERO bit,
> > > so it's the first time anyone's noticed that the page cache passes the
> > > __GFP_ZERO bit through to the radix tree allocation routines, which
> > > causes the radix tree nodes to be zeroed instead of constructed.
> > > 
> > > I think the right solution to this is:
> > 
> > This just hides the underlying problem that the node is not fully and
> > properly initialized. Relying on the previous released state is just too
> > subtle.
> 
> That's the fundamental design of slab-with-constructors.  The user provides
> a constructor, so all newly allocagted objects are initialised to a known
> state, then the user will restore the object to that state when it frees
> the object to slab.
> 
> > Are you going to blacklist all potential gfp flags that come
> > from the mapping? This is just unmaintainable! If anything this should
> > be an explicit & with the allowed set of allowed flags.
> 
> Oh, I agree that using the set of flags used to allocate the page
> in order to allocate the radix tree nodes is a pretty horrible idea.
> 
> Your suggestion, then, is:
> 
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
> 
> correct?
> 

Looks much better.

Finally, it seems everyone agree on this. However, I won't include
warning part of slab allocator because I think it's improve stuff
not bug fix so it could be separted.
If anyone really want to include it in this stable patch,
please discuss with slub maintainers before.

Thanks for the reivew, Matthew, Michal, Jan and Johannes.

>From 652bb75124896fa040df78b98496a354f54fc524 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Tue, 10 Apr 2018 22:13:50 +0900
Subject: [PATCH v4] mm: workingset: fix NULL ptr dereference

GFP mask passed to page cache functions (often coming from
mapping->gfp_mask) is used both for allocation of page cache page and for
allocation of radix tree metadata necessary to add the page to the page
cache. When the mask contains __GFP_ZERO (as is the case for some f2fs
metadata mappings), this breaks radix tree code as that code expects
allocated radix tree nodes to be properly initialized by the slab
constructor and not zeroed. In particular node->private_list is failing
list_empty() check and the following list operation in
workingset_update_node() will dereference NULL.

Fix the problem by removing non-reclimable flags by GFP_RECLAIM_MASK
for radix tree allocations.

Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Cc: Johannes Weiner 
Cc: Jan Kara 
Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: Christopher Lameter 
Cc: linux-fsde...@vger.kernel.org
Cc: sta...@vger.kernel.org
Suggested-by: Matthew Wilcox 
Reported-by: Chris Fries 
Signed-off-by: Minchan Kim 
---
 mm/filemap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ab77e19ab09c..5f3311edfea4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -786,7 +786,7 @@ int replace_page_cache_page(struct page *old, struct page 
*new, gfp_t gfp_mask)
VM_BUG_ON_PAGE(!PageLocked(new), new);
VM_BUG_ON_PAGE(new->mapping, new);
 
-   error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+   error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
if (!error) {
struct address_space *mapping = old->mapping;
void (*freepage)(struct page *);
@@ -842,7 +842,7 @@ static int __add_to_page_cache_locked(struct page *page,
return error;
}
 
-   error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
+   error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
if (error) {
if (!huge)
mem_cgroup_cancel_charge(page, memcg, false);
-- 
2.17.0.484.g0c8726318c-goog



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] mm: workingset: fix NULL ptr dereference

2018-04-09 Thread Minchan Kim
On Mon, Apr 09, 2018 at 07:41:52PM -0700, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 11:33:39AM +0900, Minchan Kim wrote:
> > @@ -522,7 +532,7 @@ EXPORT_SYMBOL(radix_tree_preload);
> >   */
> >  int radix_tree_maybe_preload(gfp_t gfp_mask)
> >  {
> > -   if (gfpflags_allow_blocking(gfp_mask))
> > +   if (gfpflags_allow_blocking(gfp_mask) && !(gfp_mask & __GFP_ZERO))
> > return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
> > /* Preloading doesn't help anything with this gfp mask, skip it */
> > preempt_disable();
> 
> No, you've completely misunderstood what's going on in this function.

Okay, I hope this version clear current concerns.

>From fb37c41b90f7d3ead1798e5cb7baef76709afd94 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Tue, 10 Apr 2018 11:54:57 +0900
Subject: [PATCH v3] mm: workingset: fix NULL ptr dereference

It assumes shadow entries of radix tree rely on the init state
that node->private_list allocated newly is list_empty state
for the working. Currently, it's initailized in SLAB constructor
which means node of radix tree would be initialized only when
*slub allocates new page*, not *slub alloctes new object*.

If some FS or subsystem pass gfp_mask to __GFP_ZERO, that means
newly allocated node can have !list_empty(node->private_list)
by memset of slab allocator. It ends up calling NULL deference
at workingset_update_node by failing list_empty check.

This patch fixes it.

Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Cc: Johannes Weiner 
Cc: Jan Kara 
Cc: Matthew Wilcox 
Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: Christopher Lameter 
Cc: linux-fsde...@vger.kernel.org
Cc: sta...@vger.kernel.org
Reported-by: Chris Fries 
Signed-off-by: Minchan Kim 
---
 lib/radix-tree.c | 9 +
 mm/filemap.c | 5 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index da9e10c827df..7569e637dbaa 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -470,6 +470,15 @@ static __must_check int __radix_tree_preload(gfp_t 
gfp_mask, unsigned nr)
struct radix_tree_node *node;
int ret = -ENOMEM;
 
+   /*
+* New allocate node must have node->private_list as INIT_LIST_HEAD
+* state by workingset shadow memory implementation.
+* If user pass  __GFP_ZERO by mistake, slab allocator will clear
+* node->private_list, which makes a BUG. Rather than going Oops,
+* just fix and warn about it.
+*/
+   if (WARN_ON(gfp_mask & __GFP_ZERO))
+   gfp_mask &= ~__GFP_ZERO;
/*
 * Nodes preloaded by one cgroup can be be used by another cgroup, so
 * they should never be accounted to any particular memory cgroup.
diff --git a/mm/filemap.c b/mm/filemap.c
index ab77e19ab09c..b6de9d691c8a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -786,7 +786,7 @@ int replace_page_cache_page(struct page *old, struct page 
*new, gfp_t gfp_mask)
VM_BUG_ON_PAGE(!PageLocked(new), new);
VM_BUG_ON_PAGE(new->mapping, new);
 
-   error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+   error = radix_tree_preload(gfp_mask & ~(__GFP_HIGHMEM | __GFP_ZERO));
if (!error) {
struct address_space *mapping = old->mapping;
void (*freepage)(struct page *);
@@ -842,7 +842,8 @@ static int __add_to_page_cache_locked(struct page *page,
return error;
}
 
-   error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
+   error = radix_tree_maybe_preload(gfp_mask &
+   ~(__GFP_HIGHMEM | __GFP_ZERO));
if (error) {
if (!huge)
mem_cgroup_cancel_charge(page, memcg, false);
-- 
2.17.0.484.g0c8726318c-goog



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] mm: workingset: fix NULL ptr dereference

2018-04-09 Thread Minchan Kim
On Tue, Apr 10, 2018 at 11:33:39AM +0900, Minchan Kim wrote:
> On Mon, Apr 09, 2018 at 06:12:11PM -0700, Matthew Wilcox wrote:
> > On Tue, Apr 10, 2018 at 08:04:09AM +0900, Minchan Kim wrote:
> > > On Mon, Apr 09, 2018 at 08:20:32AM -0700, Matthew Wilcox wrote:
> > > > I don't think this is something the radix tree should know about.
> > > 
> > > Because shadow entry implementation is hidden by radix tree implemetation.
> > > IOW, radix tree user cannot know how it works.
> > 
> > I have no idea what you mean.
> > 
> > > > SLAB should be checking for it (the patch I posted earlier in this
> > > 
> > > I don't think it's right approach. SLAB constructor can initialize
> > > some metadata for slab page populated as well as page zeroing.
> > > However, __GFP_ZERO means only clearing pages, not metadata.
> > > So it's different semantic. No need to mix out.
> > 
> > No, __GFP_ZERO is specified to clear the allocated memory whether
> > you're allocating from alloc_pages or from slab.  What makes no sense
> > is allocating an object from slab with a constructor *and* __GFP_ZERO.
> > They're in conflict, and slab can't fulfill both of those requirements.
> 
> It's a stable material. If you really think it does make sense,
> please submit patch separately.
> 
> > 
> > > > thread), but the right place to filter this out is in the caller of
> > > > radix_tree_maybe_preload -- it's already filtering out HIGHMEM pages,
> > > > and should filter out GFP_ZERO too.
> > > 
> > > radix_tree_[maybe]_preload is exported API, which are error-prone
> > > for out of modules or upcoming customers.
> > > 
> > > More proper place is __radix_tree_preload.
> > 
> > I could not disagree with you more.  It is the responsibility of the
> > callers of radix_tree_preload to avoid calling it with nonsense flags
> > like __GFP_DMA, __GFP_HIGHMEM or __GFP_ZERO.
> 
> How about this?
> 
> It would fix current problem and warn potential bugs as well.
> radix_tree_preload already has done such warning and
> radix_tree_maybe_preload has skipping for misbehaivor gfp.
> 
> From 27ecf7a009d3570d1155c528c7f08040ede68ed3 Mon Sep 17 00:00:00 2001
> From: Minchan Kim 
> Date: Tue, 10 Apr 2018 11:20:11 +0900
> Subject: [PATCH v2] mm: workingset: fix NULL ptr dereference
> 
> It assumes shadow entries of radix tree rely on the init state
> that node->private_list allocated newly is list_empty state
> for the working. Currently, it's initailized in SLAB constructor
> which means node of radix tree would be initialized only when
> *slub allocates new page*, not *slub alloctes new object*.
> 
> If some FS or subsystem pass gfp_mask to __GFP_ZERO, that means
> newly allocated node can have !list_empty(node->private_list)
> by memset of slab allocator. It ends up calling NULL deference
> at workingset_update_node by failing list_empty check.
> 
> This patch fixes it.
> 
> Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
> Cc: Johannes Weiner 
> Cc: Jan Kara 
> Cc: Matthew Wilcox 
> Cc: Jaegeuk Kim 
> Cc: Chao Yu 
> Cc: Christopher Lameter 
> Cc: linux-fsde...@vger.kernel.org
> Reported-by: Chris Fries 
> Signed-off-by: Minchan Kim 
> ---
>  lib/radix-tree.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index da9e10c827df..9d68f2a7888e 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -511,6 +511,16 @@ int radix_tree_preload(gfp_t gfp_mask)
>  {
>   /* Warn on non-sensical use... */
>   WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask));
> + /*
> +  * New allocate node must have node->private_list as INIT_LIST_HEAD
> +  * state by workingset shadow memory implementation.
> +  * If user pass  __GFP_ZERO by mistake, slab allocator will clear
> +  * node->private_list, which makes a BUG. Rather than going Oops,
> +  * just fix and warn about it.
> +  */
> + if (WARN_ON(gfp_mask & __GFP_ZERO))
> + gfp_mask &= ~GFP_ZERO
 
Build fail.

If others are okay for this patch, I will resend fixed patch with stable mark.
I will wait feedback from others.

Thanks.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] mm: workingset: fix NULL ptr dereference

2018-04-09 Thread Minchan Kim
On Mon, Apr 09, 2018 at 06:12:11PM -0700, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 08:04:09AM +0900, Minchan Kim wrote:
> > On Mon, Apr 09, 2018 at 08:20:32AM -0700, Matthew Wilcox wrote:
> > > I don't think this is something the radix tree should know about.
> > 
> > Because shadow entry implementation is hidden by radix tree implemetation.
> > IOW, radix tree user cannot know how it works.
> 
> I have no idea what you mean.
> 
> > > SLAB should be checking for it (the patch I posted earlier in this
> > 
> > I don't think it's right approach. SLAB constructor can initialize
> > some metadata for slab page populated as well as page zeroing.
> > However, __GFP_ZERO means only clearing pages, not metadata.
> > So it's different semantic. No need to mix out.
> 
> No, __GFP_ZERO is specified to clear the allocated memory whether
> you're allocating from alloc_pages or from slab.  What makes no sense
> is allocating an object from slab with a constructor *and* __GFP_ZERO.
> They're in conflict, and slab can't fulfill both of those requirements.

It's a stable material. If you really think it does make sense,
please submit patch separately.

> 
> > > thread), but the right place to filter this out is in the caller of
> > > radix_tree_maybe_preload -- it's already filtering out HIGHMEM pages,
> > > and should filter out GFP_ZERO too.
> > 
> > radix_tree_[maybe]_preload is exported API, which are error-prone
> > for out of modules or upcoming customers.
> > 
> > More proper place is __radix_tree_preload.
> 
> I could not disagree with you more.  It is the responsibility of the
> callers of radix_tree_preload to avoid calling it with nonsense flags
> like __GFP_DMA, __GFP_HIGHMEM or __GFP_ZERO.

How about this?

It would fix current problem and warn potential bugs as well.
radix_tree_preload already has done such warning and
radix_tree_maybe_preload has skipping for misbehaivor gfp.

>From 27ecf7a009d3570d1155c528c7f08040ede68ed3 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Tue, 10 Apr 2018 11:20:11 +0900
Subject: [PATCH v2] mm: workingset: fix NULL ptr dereference

It assumes shadow entries of radix tree rely on the init state
that node->private_list allocated newly is list_empty state
for the working. Currently, it's initailized in SLAB constructor
which means node of radix tree would be initialized only when
*slub allocates new page*, not *slub alloctes new object*.

If some FS or subsystem pass gfp_mask to __GFP_ZERO, that means
newly allocated node can have !list_empty(node->private_list)
by memset of slab allocator. It ends up calling NULL deference
at workingset_update_node by failing list_empty check.

This patch fixes it.

Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Cc: Johannes Weiner 
Cc: Jan Kara 
Cc: Matthew Wilcox 
Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: Christopher Lameter 
Cc: linux-fsde...@vger.kernel.org
Reported-by: Chris Fries 
Signed-off-by: Minchan Kim 
---
 lib/radix-tree.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index da9e10c827df..9d68f2a7888e 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -511,6 +511,16 @@ int radix_tree_preload(gfp_t gfp_mask)
 {
/* Warn on non-sensical use... */
WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask));
+   /*
+* New allocate node must have node->private_list as INIT_LIST_HEAD
+* state by workingset shadow memory implementation.
+* If user pass  __GFP_ZERO by mistake, slab allocator will clear
+* node->private_list, which makes a BUG. Rather than going Oops,
+* just fix and warn about it.
+*/
+   if (WARN_ON(gfp_mask & __GFP_ZERO))
+   gfp_mask &= ~GFP_ZERO
+
return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
 }
 EXPORT_SYMBOL(radix_tree_preload);
@@ -522,7 +532,7 @@ EXPORT_SYMBOL(radix_tree_preload);
  */
 int radix_tree_maybe_preload(gfp_t gfp_mask)
 {
-   if (gfpflags_allow_blocking(gfp_mask))
+   if (gfpflags_allow_blocking(gfp_mask) && !(gfp_mask & __GFP_ZERO))
return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
/* Preloading doesn't help anything with this gfp mask, skip it */
preempt_disable();
-- 
2.17.0.484.g0c8726318c-goog



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] mm: workingset: fix NULL ptr dereference

2018-04-09 Thread Minchan Kim
On Mon, Apr 09, 2018 at 08:20:32AM -0700, Matthew Wilcox wrote:
> On Mon, Apr 09, 2018 at 11:49:58PM +0900, Minchan Kim wrote:
> > On Mon, Apr 09, 2018 at 08:25:06PM +0800, Chao Yu wrote:
> > > On 2018/4/9 19:25, Minchan Kim wrote:
> > > > On Mon, Apr 09, 2018 at 04:14:03AM -0700, Matthew Wilcox wrote:
> > > >> On Mon, Apr 09, 2018 at 12:09:30PM +0900, Minchan Kim wrote:
> > > >>> Look at fs/f2fs/inode.c
> > > >>> mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
> > > >>>
> > > >>> __add_to_page_cache_locked
> > > >>>   radix_tree_maybe_preload
> > > >>>
> > > >>> add_to_page_cache_lru
> > > 
> > > No, sometimes, we need to write meta data to new allocated block address,
> > > then we will allocate a zeroed page in inner inode's address space, and
> > > fill partial data in it, and leave other place with zero value which means
> > > some fields are initial status.
> > 
> > Thanks for the explaining.
> > 
> > > There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
> > > I have just checked them, for both of them, we can avoid using __GFP_ZERO,
> > > and do initialization by ourselves to avoid unneeded/redundant zeroing
> > > from mm.
> > 
> > Yub, it would be desirable for f2fs. Please go ahead for f2fs side.
> > However, I think current problem is orthgonal. Now, the problem is
> > radix_tree_node allocation is bind to page cache allocation.
> > Why does FS cannot allocate page cache with __GFP_ZERO?
> > I agree if the concern is only performance matter as Matthew mentioned.
> > But it is beyond that because it shouldn't do due to limitation
> > of workingset shadow entry implementation. I think such coupling is
> > not a good idea.
> > 
> > I think right approach to abstract shadow entry in radix_tree is
> > to mask off __GFP_ZERO in radix_tree's allocation APIs.
> 
> I don't think this is something the radix tree should know about.

Because shadow entry implementation is hidden by radix tree implemetation.
IOW, radix tree user cannot know how it works.

> SLAB should be checking for it (the patch I posted earlier in this

I don't think it's right approach. SLAB constructor can initialize
some metadata for slab page populated as well as page zeroing.
However, __GFP_ZERO means only clearing pages, not metadata.
So it's different semantic. No need to mix out.

> thread), but the right place to filter this out is in the caller of
> radix_tree_maybe_preload -- it's already filtering out HIGHMEM pages,
> and should filter out GFP_ZERO too.

radix_tree_[maybe]_preload is exported API, which are error-prone
for out of modules or upcoming customers.

More proper place is __radix_tree_preload.

> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c2147682f4c3..a87a523eea8e 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -785,7 +785,7 @@ int replace_page_cache_page(struct page *old, struct page 
> *new, gfp_t gfp_mask)
>   VM_BUG_ON_PAGE(!PageLocked(new), new);
>   VM_BUG_ON_PAGE(new->mapping, new);
>  
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_preload(gfp_mask & ~(__GFP_HIGHMEM | __GFP_ZERO));
>   if (!error) {
>   struct address_space *mapping = old->mapping;
>   void (*freepage)(struct page *);
> @@ -841,7 +841,8 @@ static int __add_to_page_cache_locked(struct page *page,
>   return error;
>   }
>  
> - error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_maybe_preload(gfp_mask &
> + ~(__GFP_HIGHMEM | __GFP_ZERO));
>   if (error) {
>   if (!huge)
>   mem_cgroup_cancel_charge(page, memcg, false);

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] mm: workingset: fix NULL ptr dereference

2018-04-09 Thread Minchan Kim
On Mon, Apr 09, 2018 at 08:25:06PM +0800, Chao Yu wrote:
> On 2018/4/9 19:25, Minchan Kim wrote:
> > On Mon, Apr 09, 2018 at 04:14:03AM -0700, Matthew Wilcox wrote:
> >> On Mon, Apr 09, 2018 at 12:09:30PM +0900, Minchan Kim wrote:
> >>> On Sun, Apr 08, 2018 at 07:49:25PM -0700, Matthew Wilcox wrote:
> >>>> On Mon, Apr 09, 2018 at 10:58:15AM +0900, Minchan Kim wrote:
> >>>>> It assumes shadow entry of radix tree relies on the init state
> >>>>> that node->private_list allocated should be list_empty state.
> >>>>> Currently, it's initailized in SLAB constructor which means
> >>>>> node of radix tree would be initialized only when *slub allocates
> >>>>> new page*, not *new object*. So, if some FS or subsystem pass
> >>>>> gfp_mask to __GFP_ZERO, slub allocator will do memset blindly.
> >>>>
> >>>> Wait, what?  Who's declaring their radix tree with GFP_ZERO flags?
> >>>> I don't see anyone using INIT_RADIX_TREE or RADIX_TREE or RADIX_TREE_INIT
> >>>> with GFP_ZERO.
> >>>
> >>> Look at fs/f2fs/inode.c
> >>> mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
> >>>
> >>> __add_to_page_cache_locked
> >>>   radix_tree_maybe_preload
> >>>
> >>> add_to_page_cache_lru
> >>>
> >>> What's the wrong with setting __GFP_ZERO with mapping->gfp_mask?
> >>
> >> Because it's a stupid thing to do.  Pages are allocated and then filled
> >> from disk.  Zeroing them before DMAing to them is just a waste of time.
> > 
> > Every FSes do address_space to read pages from storage? I'm not sure.
> 
> No, sometimes, we need to write meta data to new allocated block address,
> then we will allocate a zeroed page in inner inode's address space, and
> fill partial data in it, and leave other place with zero value which means
> some fields are initial status.

Thanks for the explaining.

> 
> There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
> I have just checked them, for both of them, we can avoid using __GFP_ZERO,
> and do initialization by ourselves to avoid unneeded/redundant zeroing
> from mm.

Yub, it would be desirable for f2fs. Please go ahead for f2fs side.
However, I think current problem is orthgonal. Now, the problem is
radix_tree_node allocation is bind to page cache allocation.
Why does FS cannot allocate page cache with __GFP_ZERO?
I agree if the concern is only performance matter as Matthew mentioned.
But it is beyond that because it shouldn't do due to limitation
of workingset shadow entry implementation. I think such coupling is
not a good idea.

I think right approach to abstract shadow entry in radix_tree is
to mask off __GFP_ZERO in radix_tree's allocation APIs.

> 
> To Jaegeuk, if I missed something, please let me know.
> 
> ---
>  fs/f2fs/inode.c | 4 ++--
>  fs/f2fs/node.c  | 2 ++
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index c852e800..cc63f8c448f0 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -339,10 +339,10 @@ struct inode *f2fs_iget(struct super_block *sb, 
> unsigned long ino)
>  make_now:
>   if (ino == F2FS_NODE_INO(sbi)) {
>   inode->i_mapping->a_ops = &f2fs_node_aops;
> - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
> + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
>   } else if (ino == F2FS_META_INO(sbi)) {
>   inode->i_mapping->a_ops = &f2fs_meta_aops;
> - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
> + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
>   } else if (S_ISREG(inode->i_mode)) {
>   inode->i_op = &f2fs_file_inode_operations;
>   inode->i_fop = &f2fs_file_operations;
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 9dedd4b5e077..31e5ecf98ffd 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1078,6 +1078,7 @@ struct page *new_node_page(struct dnode_of_data *dn, 
> unsigned int ofs)
>   set_node_addr(sbi, &new_ni, NEW_ADDR, false);
> 
>   f2fs_wait_on_page_writeback(page, NODE, true);
> + memset(F2FS_NODE(page), 0, PAGE_SIZE);
>   fill_node_footer(page, dn->nid, dn->inode->i_ino, ofs, true);
>   set_cold_node(page, S_ISDIR(dn->inode->i_mode));
>   if (!PageUptodate(page))
> @@ -2321,6 +2322,7 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct 
> page *page)
> 
&

Re: [f2fs-dev] [PATCH] mm: workingset: fix NULL ptr dereference

2018-04-09 Thread Minchan Kim
On Mon, Apr 09, 2018 at 04:14:03AM -0700, Matthew Wilcox wrote:
> On Mon, Apr 09, 2018 at 12:09:30PM +0900, Minchan Kim wrote:
> > On Sun, Apr 08, 2018 at 07:49:25PM -0700, Matthew Wilcox wrote:
> > > On Mon, Apr 09, 2018 at 10:58:15AM +0900, Minchan Kim wrote:
> > > > It assumes shadow entry of radix tree relies on the init state
> > > > that node->private_list allocated should be list_empty state.
> > > > Currently, it's initailized in SLAB constructor which means
> > > > node of radix tree would be initialized only when *slub allocates
> > > > new page*, not *new object*. So, if some FS or subsystem pass
> > > > gfp_mask to __GFP_ZERO, slub allocator will do memset blindly.
> > > 
> > > Wait, what?  Who's declaring their radix tree with GFP_ZERO flags?
> > > I don't see anyone using INIT_RADIX_TREE or RADIX_TREE or RADIX_TREE_INIT
> > > with GFP_ZERO.
> > 
> > Look at fs/f2fs/inode.c
> > mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
> > 
> > __add_to_page_cache_locked
> >   radix_tree_maybe_preload
> > 
> > add_to_page_cache_lru
> > 
> > What's the wrong with setting __GFP_ZERO with mapping->gfp_mask?
> 
> Because it's a stupid thing to do.  Pages are allocated and then filled
> from disk.  Zeroing them before DMAing to them is just a waste of time.

Every FSes do address_space to read pages from storage? I'm not sure.

If you're right, we need to insert WARN_ON to catch up __GFP_ZERO
on mapping_set_gfp_mask at the beginning and remove all of those
stupid thins. 

Jaegeuk, why do you need __GFP_ZERO? Could you explain?

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel