Re: [PATCH 4/7] mm: introduce memalloc_nofs_{save,restore} API

2017-03-09 Thread David Sterba
On Tue, Mar 07, 2017 at 04:09:56PM +0100, Michal Hocko wrote:
> On Mon 06-03-17 13:22:14, Andrew Morton wrote:
> > On Mon,  6 Mar 2017 14:14:05 +0100 Michal Hocko  wrote:
> [...]
> > > --- a/include/linux/gfp.h
> > > +++ b/include/linux/gfp.h
> > > @@ -210,8 +210,16 @@ struct vm_area_struct;
> > >   *
> > >   * GFP_NOIO will use direct reclaim to discard clean pages or slab pages
> > >   *   that do not require the starting of any physical IO.
> > > + *   Please try to avoid using this flag directly and instead use
> > > + *   memalloc_noio_{save,restore} to mark the whole scope which cannot
> > > + *   perform any IO with a short explanation why. All allocation requests
> > > + *   will inherit GFP_NOIO implicitly.
> > >   *
> > >   * GFP_NOFS will use direct reclaim but will not use any filesystem 
> > > interfaces.
> > > + *   Please try to avoid using this flag directly and instead use
> > > + *   memalloc_nofs_{save,restore} to mark the whole scope which 
> > > cannot/shouldn't
> > > + *   recurse into the FS layer with a short explanation why. All 
> > > allocation
> > > + *   requests will inherit GFP_NOFS implicitly.
> > 
> > I wonder if these are worth a checkpatch rule.
> 
> I am not really sure, to be honest. This may easilly end up people
> replacing
> 
> do_alloc(GFP_NOFS)
> 
> with
> 
> memalloc_nofs_save()
> do_alloc(GFP_KERNEL)
> memalloc_nofs_restore()
> 
> which doesn't make any sense of course. From my experience, people tend
> to do stupid things just to silent checkpatch warnings very often.
> Moreover I believe we need to do the transition to the new api first
> before we can push back on the explicit GFP_NOFS usage. Maybe then we
> can think about the a checkpatch warning.

I agree will all your objections against adding that to checkpatch, at
this point it's less harmful to use GFP_NOFS.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] mm: introduce memalloc_nofs_{save,restore} API

2017-03-07 Thread Michal Hocko
On Mon 06-03-17 13:22:14, Andrew Morton wrote:
> On Mon,  6 Mar 2017 14:14:05 +0100 Michal Hocko  wrote:
[...]
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -210,8 +210,16 @@ struct vm_area_struct;
> >   *
> >   * GFP_NOIO will use direct reclaim to discard clean pages or slab pages
> >   *   that do not require the starting of any physical IO.
> > + *   Please try to avoid using this flag directly and instead use
> > + *   memalloc_noio_{save,restore} to mark the whole scope which cannot
> > + *   perform any IO with a short explanation why. All allocation requests
> > + *   will inherit GFP_NOIO implicitly.
> >   *
> >   * GFP_NOFS will use direct reclaim but will not use any filesystem 
> > interfaces.
> > + *   Please try to avoid using this flag directly and instead use
> > + *   memalloc_nofs_{save,restore} to mark the whole scope which 
> > cannot/shouldn't
> > + *   recurse into the FS layer with a short explanation why. All allocation
> > + *   requests will inherit GFP_NOFS implicitly.
> 
> I wonder if these are worth a checkpatch rule.

I am not really sure, to be honest. This may easilly end up people
replacing

do_alloc(GFP_NOFS)

with

memalloc_nofs_save()
do_alloc(GFP_KERNEL)
memalloc_nofs_restore()

which doesn't make any sense of course. From my experience, people tend
to do stupid things just to silent checkpatch warnings very often.
Moreover I believe we need to do the transition to the new api first
before we can push back on the explicit GFP_NOFS usage. Maybe then we
can think about the a checkpatch warning.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] mm: introduce memalloc_nofs_{save,restore} API

2017-03-06 Thread Andrew Morton
On Mon,  6 Mar 2017 14:14:05 +0100 Michal Hocko  wrote:

> From: Michal Hocko 
> 
> GFP_NOFS context is used for the following 5 reasons currently
>   - to prevent from deadlocks when the lock held by the allocation
> context would be needed during the memory reclaim
>   - to prevent from stack overflows during the reclaim because
> the allocation is performed from a deep context already
>   - to prevent lockups when the allocation context depends on
> other reclaimers to make a forward progress indirectly
>   - just in case because this would be safe from the fs POV
>   - silence lockdep false positives
> 
> Unfortunately overuse of this allocation context brings some problems
> to the MM. Memory reclaim is much weaker (especially during heavy FS
> metadata workloads), OOM killer cannot be invoked because the MM layer
> doesn't have enough information about how much memory is freeable by the
> FS layer.
> 
> In many cases it is far from clear why the weaker context is even used
> and so it might be used unnecessarily. We would like to get rid of
> those as much as possible. One way to do that is to use the flag in
> scopes rather than isolated cases. Such a scope is declared when really
> necessary, tracked per task and all the allocation requests from within
> the context will simply inherit the GFP_NOFS semantic.
> 
> Not only this is easier to understand and maintain because there are
> much less problematic contexts than specific allocation requests, this
> also helps code paths where FS layer interacts with other layers (e.g.
> crypto, security modules, MM etc...) and there is no easy way to convey
> the allocation context between the layers.
> 
> Introduce memalloc_nofs_{save,restore} API to control the scope
> of GFP_NOFS allocation context. This is basically copying
> memalloc_noio_{save,restore} API we have for other restricted allocation
> context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is
> just an alias for PF_FSTRANS which has been xfs specific until recently.
> There are no more PF_FSTRANS users anymore so let's just drop it.
> 
> PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
> implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags
> is renamed to current_gfp_context because it now cares about both
> PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve
> their semantic. kmem_flags_convert() doesn't need to evaluate the flag
> anymore.
> 
> This patch shouldn't introduce any functional changes.
> 
> Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
> usage as much as possible and only use a properly documented
> memalloc_nofs_{save,restore} checkpoints where they are appropriate.
> 
> 
>
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -210,8 +210,16 @@ struct vm_area_struct;
>   *
>   * GFP_NOIO will use direct reclaim to discard clean pages or slab pages
>   *   that do not require the starting of any physical IO.
> + *   Please try to avoid using this flag directly and instead use
> + *   memalloc_noio_{save,restore} to mark the whole scope which cannot
> + *   perform any IO with a short explanation why. All allocation requests
> + *   will inherit GFP_NOIO implicitly.
>   *
>   * GFP_NOFS will use direct reclaim but will not use any filesystem 
> interfaces.
> + *   Please try to avoid using this flag directly and instead use
> + *   memalloc_nofs_{save,restore} to mark the whole scope which 
> cannot/shouldn't
> + *   recurse into the FS layer with a short explanation why. All allocation
> + *   requests will inherit GFP_NOFS implicitly.

I wonder if these are worth a checkpatch rule.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] mm: introduce memalloc_nofs_{save,restore} API

2017-03-06 Thread Michal Hocko
From: Michal Hocko 

GFP_NOFS context is used for the following 5 reasons currently
- to prevent from deadlocks when the lock held by the allocation
  context would be needed during the memory reclaim
- to prevent from stack overflows during the reclaim because
  the allocation is performed from a deep context already
- to prevent lockups when the allocation context depends on
  other reclaimers to make a forward progress indirectly
- just in case because this would be safe from the fs POV
- silence lockdep false positives

Unfortunately overuse of this allocation context brings some problems
to the MM. Memory reclaim is much weaker (especially during heavy FS
metadata workloads), OOM killer cannot be invoked because the MM layer
doesn't have enough information about how much memory is freeable by the
FS layer.

In many cases it is far from clear why the weaker context is even used
and so it might be used unnecessarily. We would like to get rid of
those as much as possible. One way to do that is to use the flag in
scopes rather than isolated cases. Such a scope is declared when really
necessary, tracked per task and all the allocation requests from within
the context will simply inherit the GFP_NOFS semantic.

Not only this is easier to understand and maintain because there are
much less problematic contexts than specific allocation requests, this
also helps code paths where FS layer interacts with other layers (e.g.
crypto, security modules, MM etc...) and there is no easy way to convey
the allocation context between the layers.

Introduce memalloc_nofs_{save,restore} API to control the scope
of GFP_NOFS allocation context. This is basically copying
memalloc_noio_{save,restore} API we have for other restricted allocation
context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is
just an alias for PF_FSTRANS which has been xfs specific until recently.
There are no more PF_FSTRANS users anymore so let's just drop it.

PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags
is renamed to current_gfp_context because it now cares about both
PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve
their semantic. kmem_flags_convert() doesn't need to evaluate the flag
anymore.

This patch shouldn't introduce any functional changes.

Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
usage as much as possible and only use a properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.

Acked-by: Vlastimil Babka 
Signed-off-by: Michal Hocko 
---
 fs/xfs/kmem.h|  2 +-
 include/linux/gfp.h  |  8 
 include/linux/sched.h|  8 +++-
 include/linux/sched/mm.h | 26 +++---
 kernel/locking/lockdep.c |  6 +++---
 mm/page_alloc.c  | 10 ++
 mm/vmscan.c  |  6 +++---
 7 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index d973dbfc2bfa..ae08cfd9552a 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -50,7 +50,7 @@ kmem_flags_convert(xfs_km_flags_t flags)
lflags = GFP_ATOMIC | __GFP_NOWARN;
} else {
lflags = GFP_KERNEL | __GFP_NOWARN;
-   if ((current->flags & PF_MEMALLOC_NOFS) || (flags & KM_NOFS))
+   if (flags & KM_NOFS)
lflags &= ~__GFP_FS;
}
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 978232a3b4ae..2bfcfd33e476 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -210,8 +210,16 @@ struct vm_area_struct;
  *
  * GFP_NOIO will use direct reclaim to discard clean pages or slab pages
  *   that do not require the starting of any physical IO.
+ *   Please try to avoid using this flag directly and instead use
+ *   memalloc_noio_{save,restore} to mark the whole scope which cannot
+ *   perform any IO with a short explanation why. All allocation requests
+ *   will inherit GFP_NOIO implicitly.
  *
  * GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
+ *   Please try to avoid using this flag directly and instead use
+ *   memalloc_nofs_{save,restore} to mark the whole scope which 
cannot/shouldn't
+ *   recurse into the FS layer with a short explanation why. All allocation
+ *   requests will inherit GFP_NOFS implicitly.
  *
  * GFP_USER is for userspace allocations that also need to be directly
  *   accessibly by the kernel or hardware. It is typically used by hardware
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4528f7c9789f..9c3ee2281a56 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1211,9 +1211,9 @@ extern struct pid *cad_pid;
 #define PF_USED_ASYNC  0x4000  /* Used async_schedule*(), used 
by module init */
 #define PF_NOFREEZE