Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Tue 14-04-15 06:36:25, Johannes Weiner wrote:
> On Mon, Apr 13, 2015 at 02:46:14PM +0200, Michal Hocko wrote:
[...]
> > AFAIU, David wasn't asking for the OOM killer as much as he was
> > interested in getting access to a small amount of reserves in order to
> > make progress. __GFP_HIGH is there for this purpose.
>
> That's not just any reserve pool available to the generic caller, it's
> the reserve pool for interrupts, which can not wait and replenish it.
> It relies on kswapd to run soon after the interrupt, or right away on
> SMP. But locks held in the filesystem can hold up kswapd (the reason
> we even still perform direct reclaim) so NOFS allocs shouldn't use it.
>
> [hannes@dexter linux]$ git grep '__GFP_HIGH\b' | wc -l
> 39
> [hannes@dexter linux]$ git grep GFP_ATOMIC | wc -l
> 4324
>
> Interrupts have *no other option*.

Atomic context in general can use ALLOC_HARDER, so it has access to
additional reserves wrt. __GFP_HIGH|__GFP_WAIT.

> It's misguided to deplete their
> reserves, cause loss of network packets, loss of input events, from
> allocations that can actually perform reclaim and have perfectly
> acceptable fallback strategies in the caller.

OK, I thought that it was clear that the proposed __GFP_HIGH is a
fallback strategy for those paths which cannot do much better. Not a
random solution for "this shouldn't fail too eagerly".

> Generally, for any reserve system there must be a way to replenish it.
> For interrupts it's kswapd, for the OOM reserves I proposed it's the
> OOM victim exiting soon after the allocation, if not right away.

And my understanding was that the fallback mode would be used in a
context which would lead to release of the fs pressure, thus releasing
memory as well.

> __GFP_NOFAIL is the odd one out here because accessing the system's
> emergency reserves without any prospect of near-future replenishing is
> just slightly better than deadlocking right away. Which is why this
> reserve access can not be separated out: if you can do *anything*
> better than hanging, do it. If not, use __GFP_NOFAIL.

Agreed.

> > > My question here would be: are there any NOFS allocations that *don't*
> > > want this behavior? Does it even make sense to require this separate
> > > annotation or should we just make it the default?
> > >
> > > The argument here was always that NOFS allocations are very limited in
> > > their reclaim powers and will trigger OOM prematurely. However, the
> > > way we limit dirty memory these days forces most cache to be clean at
> > > all times, and direct reclaim in general hasn't been allowed to issue
> > > page writeback for quite some time. So these days, NOFS reclaim isn't
> > > really weaker than regular direct reclaim.
> >
> > What about [di]cache and some other fs specific shrinkers (and heavy
> > metadata loads)?
>
> My bad, I forgot about those. But it doesn't really change the basic
> question of whether we want to change the GFP_NOFS default or merely
> annotate individual sites that want to try harder.

My understanding was the latter one. If you look at page cache
allocations which use mapping_gfp_mask (e.g. xfs is using GFP_NOFS for
that context all the time) then those do not really have to try harder.

> > > The only exception is that
> > > it might block writeback, so we'd go OOM if the only reclaimables left
> > > were dirty pages against that filesystem. That should be acceptable.
> >
> > OOM killer is hardly acceptable by most users I've heard from. OOM
> > killer is the _last_ resort and if the allocation is restricted then
> > we shouldn't use the big hammer.
>
> We *are* talking about the last resort for these allocations! There
> is nothing else we can do to avoid allocation failure at this point.
> Absent a reservation system, we have the choice between failing after
> reclaim - which Dave said was too fragile for XFS - or OOM killing.

As per other emails in this thread (e.g.
http://marc.info/?l=linux-mm&m=142897087230385&w=2), I understood that
access to a small portion of the emergency pool would be sufficient to
release the pressure, and that sounds preferable to me over destructive
reclaim attempts.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Mon, Apr 13, 2015 at 02:46:14PM +0200, Michal Hocko wrote:
> [Sorry for a late reply]
>
> On Tue 07-04-15 10:18:22, Johannes Weiner wrote:
> > On Wed, Apr 01, 2015 at 05:19:20PM +0200, Michal Hocko wrote:
> > > On Mon 30-03-15 11:32:40, Dave Chinner wrote:
> > > > On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote:
> > > [...]
> > > > > GFP_NOFS sites are currently one of the sites that can deadlock inside
> > > > > the allocator, even though many of them seem to have fallback code.
> > > > > My reasoning here is that if you *have* an exit strategy for failing
> > > > > allocations that is smarter than hanging, we should probably use that.
> > > >
> > > > We already do that for allocations where we can handle failure in
> > > > GFP_NOFS conditions. It is, however, somewhat useless if we can't
> > > > tell the allocator to try really hard if we've already had a failure
> > > > and we are already in memory reclaim conditions (e.g. a shrinker
> > > > trying to clean dirty objects so they can be reclaimed).
> > > >
> > > > From that perspective, I think that this patch set aims to force us
> > > > away from handling fallbacks ourselves because a) it makes GFP_NOFS
> > > > more likely to fail, and b) provides no mechanism to "try harder"
> > > > when we really need the allocation to succeed.
> > >
> > > You can ask for this "try harder" by the __GFP_HIGH flag. Would that
> > > help in your fallback case?
> >
> > I would think __GFP_REPEAT would be more suitable here. From the doc:
> >
> >  * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
> >  * _might_ fail. This depends upon the particular VM implementation.
> >
> > so we can make the semantics of GFP_NOFS | __GFP_REPEAT such that they
> > are allowed to use the OOM killer and dip into the OOM reserves.
>
> __GFP_REPEAT is quite subtle already. It makes a difference only
> for high order allocations

That's an implementation detail, owed to the fact that smaller orders
already imply that behavior. That doesn't change the semantics. And
people currently *use* it all over the tree for small orders, because
of how the flag is defined in gfp.h; not because of how it's currently
implemented.

> and it is not clear to me why it should imply OOM killer for small
> orders now. Or did you suggest making it special only with
> GFP_NOFS? That sounds even more ugly.

Small orders already invoke the OOM killer. I suggested using this
flag to override the specialness of GFP_NOFS - not OOM killing - in
response to whether we can provide an annotation to make some GFP_NOFS
sites more robust. This is exactly what __GFP_REPEAT is: try the
allocation harder than you would without this flag. It identifies a
caller that is willing to put in extra effort or be more aggressive
because the allocation is more important than other allocations of the
otherwise same gfp_mask.

> AFAIU, David wasn't asking for the OOM killer as much as he was
> interested in getting access to a small amount of reserves in order to
> make progress. __GFP_HIGH is there for this purpose.

That's not just any reserve pool available to the generic caller, it's
the reserve pool for interrupts, which can not wait and replenish it.
It relies on kswapd to run soon after the interrupt, or right away on
SMP. But locks held in the filesystem can hold up kswapd (the reason
we even still perform direct reclaim) so NOFS allocs shouldn't use it.

[hannes@dexter linux]$ git grep '__GFP_HIGH\b' | wc -l
39
[hannes@dexter linux]$ git grep GFP_ATOMIC | wc -l
4324

Interrupts have *no other option*. It's misguided to deplete their
reserves, cause loss of network packets, loss of input events, from
allocations that can actually perform reclaim and have perfectly
acceptable fallback strategies in the caller.

Generally, for any reserve system there must be a way to replenish it.
For interrupts it's kswapd, for the OOM reserves I proposed it's the
OOM victim exiting soon after the allocation, if not right away.

__GFP_NOFAIL is the odd one out here because accessing the system's
emergency reserves without any prospect of near-future replenishing is
just slightly better than deadlocking right away. Which is why this
reserve access can not be separated out: if you can do *anything*
better than hanging, do it. If not, use __GFP_NOFAIL.

> > My question here would be: are there any NOFS allocations that *don't*
> > want this behavior? Does it even make sense to require this separate
> > annotation or should we just make it the default?
> >
> > The argument here was always that NOFS allocations are very limited in
> > their reclaim powers and will trigger OOM prematurely. However, the
> > way we limit dirty memory these days forces most cache to be clean at
> > all times, and direct reclaim in general hasn't been allowed to issue
> > page writeback for quite some time. So these days, NOFS reclaim isn't
> > really weaker than regular direct reclaim.
>
> What about [di]cache and some other fs specific shrinkers (and heavy
> metadata loads)?

My bad, I forgot about those. But it doesn't really change the basic
question of whether we want to change the GFP_NOFS default or merely
annotate individual sites that want to try harder.

> > The only exception is that
> > it might block writeback, so we'd go OOM if the only reclaimables left
> > were dirty pages against that filesystem. That should be acceptable.
>
> OOM killer is hardly acceptable by most users I've heard from. OOM
> killer is the _last_ resort and if the allocation is restricted then
> we shouldn't use the big hammer.

We *are* talking about the last resort for these allocations! There
is nothing else we can do to avoid allocation failure at this point.
Absent a reservation system, we have the choice between failing after
reclaim - which Dave said was too fragile for XFS - or OOM killing.
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Tue 14-04-15 10:11:18, Dave Chinner wrote:
> On Mon, Apr 13, 2015 at 02:46:14PM +0200, Michal Hocko wrote:
> > [Sorry for a late reply]
> >
> > On Tue 07-04-15 10:18:22, Johannes Weiner wrote:
> > > On Wed, Apr 01, 2015 at 05:19:20PM +0200, Michal Hocko wrote:
> > > My question here would be: are there any NOFS allocations that *don't*
> > > want this behavior? Does it even make sense to require this separate
> > > annotation or should we just make it the default?
> > >
> > > The argument here was always that NOFS allocations are very limited in
> > > their reclaim powers and will trigger OOM prematurely. However, the
> > > way we limit dirty memory these days forces most cache to be clean at
> > > all times, and direct reclaim in general hasn't been allowed to issue
> > > page writeback for quite some time. So these days, NOFS reclaim isn't
> > > really weaker than regular direct reclaim.
> >
> > What about [di]cache and some other fs specific shrinkers (and heavy
> > metadata loads)?
>
> We don't do direct reclaim for fs shrinkers in GFP_NOFS context,
> either.

Yeah, but we invoke fs shrinkers for the _regular_ direct reclaim (with
__GFP_FS), which was the point I've tried to make here.

> *HOWEVER*
>
> The shrinker reclaim we can not execute is deferred to the next
> context that can do the reclaim, which is usually kswapd. So the
> reclaim gets done according to the GFP_NOFS memory pressure that is
> occurring, it is just done in a different context...

Right, deferring to kswapd is the reason why I think direct reclaim
shouldn't invoke the OOM killer in this context - that would be
premature, because kswapd can still make some progress. Sorry for not
being more clear.

> > > The only exception is that
> > > it might block writeback, so we'd go OOM if the only reclaimables left
> > > were dirty pages against that filesystem. That should be acceptable.
> >
> > OOM killer is hardly acceptable by most users I've heard from. OOM
> > killer is the _last_ resort and if the allocation is restricted then
> > we shouldn't use the big hammer. The allocator might use __GFP_HIGH to
> > get access to memory reserves if it can fail or __GFP_NOFAIL if it
> > cannot. With your patches the NOFAIL case would get access to memory
> > reserves as well. So I do not really see a reason to change the
> > GFP_NOFS vs. OOM killer semantics.
>
> So, really, what we want is something like:
>
> #define __GFP_USE_LOWMEM_RESERVE	__GFP_HIGH
>
> So that it documents the code that is using it effectively and we
> can find them easily with cscope/grep?

I wouldn't be opposed. To be honest I was never fond of __GFP_HIGH.
The naming is counterintuitive. So I would rather go with renaming it.
We do not have that many users in the tree:

git grep "GFP_HIGH\>" | wc -l
40
-- 
Michal Hocko
SUSE Labs
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Mon, Apr 13, 2015 at 02:46:14PM +0200, Michal Hocko wrote:
> [Sorry for a late reply]
>
> On Tue 07-04-15 10:18:22, Johannes Weiner wrote:
> > On Wed, Apr 01, 2015 at 05:19:20PM +0200, Michal Hocko wrote:
> > My question here would be: are there any NOFS allocations that *don't*
> > want this behavior? Does it even make sense to require this separate
> > annotation or should we just make it the default?
> >
> > The argument here was always that NOFS allocations are very limited in
> > their reclaim powers and will trigger OOM prematurely. However, the
> > way we limit dirty memory these days forces most cache to be clean at
> > all times, and direct reclaim in general hasn't been allowed to issue
> > page writeback for quite some time. So these days, NOFS reclaim isn't
> > really weaker than regular direct reclaim.
>
> What about [di]cache and some other fs specific shrinkers (and heavy
> metadata loads)?

We don't do direct reclaim for fs shrinkers in GFP_NOFS context,
either.

*HOWEVER*

The shrinker reclaim we can not execute is deferred to the next
context that can do the reclaim, which is usually kswapd. So the
reclaim gets done according to the GFP_NOFS memory pressure that is
occurring, it is just done in a different context...

> > The only exception is that
> > it might block writeback, so we'd go OOM if the only reclaimables left
> > were dirty pages against that filesystem. That should be acceptable.
>
> OOM killer is hardly acceptable by most users I've heard from. OOM
> killer is the _last_ resort and if the allocation is restricted then
> we shouldn't use the big hammer. The allocator might use __GFP_HIGH to
> get access to memory reserves if it can fail or __GFP_NOFAIL if it
> cannot. With your patches the NOFAIL case would get access to memory
> reserves as well. So I do not really see a reason to change the
> GFP_NOFS vs. OOM killer semantics.

So, really, what we want is something like:

#define __GFP_USE_LOWMEM_RESERVE	__GFP_HIGH

So that it documents the code that is using it effectively and we
can find them easily with cscope/grep?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Sat 11-04-15 16:29:26, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > The argument here was always that NOFS allocations are very limited in
> > their reclaim powers and will trigger OOM prematurely. However, the
> > way we limit dirty memory these days forces most cache to be clean at
> > all times, and direct reclaim in general hasn't been allowed to issue
> > page writeback for quite some time. So these days, NOFS reclaim isn't
> > really weaker than regular direct reclaim. The only exception is that
> > it might block writeback, so we'd go OOM if the only reclaimables left
> > were dirty pages against that filesystem. That should be acceptable.
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 47981c5e54c3..fe3cb2b0b85b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2367,16 +2367,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >  	/* The OOM killer does not needlessly kill tasks for lowmem */
> >  	if (ac->high_zoneidx < ZONE_NORMAL)
> >  		goto out;
> > -	/* The OOM killer does not compensate for IO-less reclaim */
> > -	if (!(gfp_mask & __GFP_FS)) {
> > -		/*
> > -		 * XXX: Page reclaim didn't yield anything,
> > -		 * and the OOM killer can't be invoked, but
> > -		 * keep looping as per tradition.
> > -		 */
> > -		*did_some_progress = 1;
> > -		goto out;
> > -	}
> >  	if (pm_suspended_storage())
> >  		goto out;
> >  	/* The OOM killer may not free memory on a specific node */
>
> I think this change will allow calling out_of_memory() which results in
> the "oom_kill_process() is trivially called via pagefault_out_of_memory()"
> problem described in https://lkml.org/lkml/2015/3/18/219 .
>
> I myself think that we should trigger the OOM killer for !__GFP_FS
> allocations in order to make forward progress in case the OOM victim is
> blocked. So, my question about this change is whether we can accept
> invoking the OOM killer from a page fault, no matter how trivially the
> OOM killer will kill some process?

We have triggered the OOM killer from the page fault path for ages. In
fact memcg will trigger the memcg OOM killer _only_ from the page fault
path, because that context is safe - we do not sit on any locks at the
time.
-- 
Michal Hocko
SUSE Labs
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
[Sorry for a late reply] On Tue 07-04-15 10:18:22, Johannes Weiner wrote: > On Wed, Apr 01, 2015 at 05:19:20PM +0200, Michal Hocko wrote: > > On Mon 30-03-15 11:32:40, Dave Chinner wrote: > > > On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote: > > [...] > > > > GFP_NOFS sites are currently one of the sites that can deadlock inside > > > > the allocator, even though many of them seem to have fallback code. > > > > My reasoning here is that if you *have* an exit strategy for failing > > > > allocations that is smarter than hanging, we should probably use that. > > > > > > We already do that for allocations where we can handle failure in > > > GFP_NOFS conditions. It is, however, somewhat useless if we can't > > > tell the allocator to try really hard if we've already had a failure > > > and we are already in memory reclaim conditions (e.g. a shrinker > > > trying to clean dirty objects so they can be reclaimed). > > > > > > From that perspective, I think that this patch set aims force us > > > away from handling fallbacks ourselves because a) it makes GFP_NOFS > > > more likely to fail, and b) provides no mechanism to "try harder" > > > when we really need the allocation to succeed. > > > > You can ask for this "try harder" by __GFP_HIGH flag. Would that help > > in your fallback case? > > I would think __GFP_REPEAT would be more suitable here. From the doc: > > * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt > * _might_ fail. This depends upon the particular VM implementation. > > so we can make the semantics of GFP_NOFS | __GFP_REPEAT such that they > are allowed to use the OOM killer and dip into the OOM reserves. __GFP_REPEAT is quite subtle already. It makes a difference only for high order allocations and it is not clear to me why it should imply OOM killer for small orders now. Or did you suggest making it special only with GFP_NOFS? That sounds even more ugly. 
AFAIU, David wasn't asking for the OOM killer as much as he was interested in getting access to a small amount of reserves in order to make a progress. __GFP_HIGH is there for this purpose. > My question here would be: are there any NOFS allocations that *don't* > want this behavior? Does it even make sense to require this separate > annotation or should we just make it the default? > > The argument here was always that NOFS allocations are very limited in > their reclaim powers and will trigger OOM prematurely. However, the > way we limit dirty memory these days forces most cache to be clean at > all times, and direct reclaim in general hasn't been allowed to issue > page writeback for quite some time. So these days, NOFS reclaim isn't > really weaker than regular direct reclaim. What about [di]cache and some others fs specific shrinkers (and heavy metadata loads)? > The only exception is that > it might block writeback, so we'd go OOM if the only reclaimables left > were dirty pages against that filesystem. That should be acceptable. OOM killer is hardly acceptable by most users I've heard from. OOM killer is the _last_ resort and if the allocation is restricted then we shouldn't use the big hammer. The allocator might use __GFP_HIGH to get access to memory reserves if it can fail or __GFP_NOFAIL if it cannot. With your patches the NOFAIL case would get an access to memory reserves as well. So I do not really see a reason to change GFP_NOFS vs. OOM killer semantic. 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 47981c5e54c3..fe3cb2b0b85b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2367,16 +2367,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> 	/* The OOM killer does not needlessly kill tasks for lowmem */
> 	if (ac->high_zoneidx < ZONE_NORMAL)
> 		goto out;
> -	/* The OOM killer does not compensate for IO-less reclaim */
> -	if (!(gfp_mask & __GFP_FS)) {
> -		/*
> -		 * XXX: Page reclaim didn't yield anything,
> -		 * and the OOM killer can't be invoked, but
> -		 * keep looping as per tradition.
> -		 */
> -		*did_some_progress = 1;
> -		goto out;
> -	}
> 	if (pm_suspended_storage())
> 		goto out;
> 	/* The OOM killer may not free memory on a specific node */

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Mon, Apr 13, 2015 at 02:46:14PM +0200, Michal Hocko wrote:
> [Sorry for a late reply]
>
> On Tue 07-04-15 10:18:22, Johannes Weiner wrote:
> > My question here would be: are there any NOFS allocations that
> > *don't* want this behavior? Does it even make sense to require this
> > separate annotation or should we just make it the default?
> >
> > The argument here was always that NOFS allocations are very limited
> > in their reclaim powers and will trigger OOM prematurely. However,
> > the way we limit dirty memory these days forces most cache to be
> > clean at all times, and direct reclaim in general hasn't been
> > allowed to issue page writeback for quite some time. So these days,
> > NOFS reclaim isn't really weaker than regular direct reclaim.
>
> What about the [di]cache and some other fs-specific shrinkers (and
> heavy metadata loads)?

We don't do direct reclaim for fs shrinkers in GFP_NOFS context,
either. *HOWEVER* the shrinker reclaim we can not execute is deferred
to the next context that can do the reclaim, which is usually kswapd.
So the reclaim gets done according to the GFP_NOFS memory pressure
that is occurring, it is just done in a different context...

> > The only exception is that it might block writeback, so we'd go OOM
> > if the only reclaimables left were dirty pages against that
> > filesystem. That should be acceptable.
>
> The OOM killer is hardly acceptable by most users I've heard from.
> The OOM killer is the _last_ resort and if the allocation is
> restricted then we shouldn't use the big hammer. The allocator might
> use __GFP_HIGH to get access to memory reserves if it can fail, or
> __GFP_NOFAIL if it cannot. With your patches the NOFAIL case would
> get access to memory reserves as well. So I do not really see a
> reason to change the GFP_NOFS vs. OOM killer semantics.

So, really, what we want is something like:

#define __GFP_USE_LOWMEM_RESERVE	__GFP_HIGH

so that it documents the code that is using it effectively and we can
find them easily with cscope/grep?
Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Sat 11-04-15 16:29:26, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > The argument here was always that NOFS allocations are very limited
> > in their reclaim powers and will trigger OOM prematurely. However,
> > the way we limit dirty memory these days forces most cache to be
> > clean at all times, and direct reclaim in general hasn't been
> > allowed to issue page writeback for quite some time. So these days,
> > NOFS reclaim isn't really weaker than regular direct reclaim. The
> > only exception is that it might block writeback, so we'd go OOM if
> > the only reclaimables left were dirty pages against that
> > filesystem. That should be acceptable.
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 47981c5e54c3..fe3cb2b0b85b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2367,16 +2367,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> > 	/* The OOM killer does not needlessly kill tasks for lowmem */
> > 	if (ac->high_zoneidx < ZONE_NORMAL)
> > 		goto out;
> > -	/* The OOM killer does not compensate for IO-less reclaim */
> > -	if (!(gfp_mask & __GFP_FS)) {
> > -		/*
> > -		 * XXX: Page reclaim didn't yield anything,
> > -		 * and the OOM killer can't be invoked, but
> > -		 * keep looping as per tradition.
> > -		 */
> > -		*did_some_progress = 1;
> > -		goto out;
> > -	}
> > 	if (pm_suspended_storage())
> > 		goto out;
> > 	/* The OOM killer may not free memory on a specific node */
>
> I think this change will allow calling out_of_memory(), which results
> in the "oom_kill_process() is trivially called via
> pagefault_out_of_memory()" problem described in
> https://lkml.org/lkml/2015/3/18/219 .
>
> I myself think that we should trigger the OOM killer for !__GFP_FS
> allocations in order to make forward progress in case the OOM victim
> is blocked. So, my question about this change is whether we can
> accept invoking the OOM killer from a page fault, no matter how
> trivially the OOM killer will kill some process?

We have triggered the OOM killer from the page fault path for ages.
In fact memcg will trigger its OOM killer _only_ from the page fault
path, because that context is safe: we do not sit on any locks at that
time.
-- 
Michal Hocko
SUSE Labs
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
Johannes Weiner wrote:
> The argument here was always that NOFS allocations are very limited in
> their reclaim powers and will trigger OOM prematurely. However, the
> way we limit dirty memory these days forces most cache to be clean at
> all times, and direct reclaim in general hasn't been allowed to issue
> page writeback for quite some time. So these days, NOFS reclaim isn't
> really weaker than regular direct reclaim. The only exception is that
> it might block writeback, so we'd go OOM if the only reclaimables left
> were dirty pages against that filesystem. That should be acceptable.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 47981c5e54c3..fe3cb2b0b85b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2367,16 +2367,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> 	/* The OOM killer does not needlessly kill tasks for lowmem */
> 	if (ac->high_zoneidx < ZONE_NORMAL)
> 		goto out;
> -	/* The OOM killer does not compensate for IO-less reclaim */
> -	if (!(gfp_mask & __GFP_FS)) {
> -		/*
> -		 * XXX: Page reclaim didn't yield anything,
> -		 * and the OOM killer can't be invoked, but
> -		 * keep looping as per tradition.
> -		 */
> -		*did_some_progress = 1;
> -		goto out;
> -	}
> 	if (pm_suspended_storage())
> 		goto out;
> 	/* The OOM killer may not free memory on a specific node */

I think this change will allow calling out_of_memory(), which results
in the "oom_kill_process() is trivially called via
pagefault_out_of_memory()" problem described in
https://lkml.org/lkml/2015/3/18/219 .

I myself think that we should trigger the OOM killer for !__GFP_FS
allocations in order to make forward progress in case the OOM victim is
blocked. So, my question about this change is whether we can accept
invoking the OOM killer from a page fault, no matter how trivially the
OOM killer will kill some process?
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Wed, Apr 01, 2015 at 05:19:20PM +0200, Michal Hocko wrote:
> On Mon 30-03-15 11:32:40, Dave Chinner wrote:
> > On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote:
> [...]
> > > GFP_NOFS sites are currently one of the sites that can deadlock
> > > inside the allocator, even though many of them seem to have
> > > fallback code. My reasoning here is that if you *have* an exit
> > > strategy for failing allocations that is smarter than hanging, we
> > > should probably use that.
> >
> > We already do that for allocations where we can handle failure in
> > GFP_NOFS conditions. It is, however, somewhat useless if we can't
> > tell the allocator to try really hard if we've already had a failure
> > and we are already in memory reclaim conditions (e.g. a shrinker
> > trying to clean dirty objects so they can be reclaimed).
> >
> > From that perspective, I think that this patch set aims to force us
> > away from handling fallbacks ourselves because a) it makes GFP_NOFS
> > more likely to fail, and b) provides no mechanism to "try harder"
> > when we really need the allocation to succeed.
>
> You can ask for this "try harder" by the __GFP_HIGH flag. Would that
> help in your fallback case?

I would think __GFP_REPEAT would be more suitable here. From the doc:

 * __GFP_REPEAT: Try hard to allocate the memory, but the allocation
 *   attempt _might_ fail. This depends upon the particular VM
 *   implementation.

so we can make the semantics of GFP_NOFS | __GFP_REPEAT such that they
are allowed to use the OOM killer and dip into the OOM reserves.

My question here would be: are there any NOFS allocations that *don't*
want this behavior? Does it even make sense to require this separate
annotation or should we just make it the default?

The argument here was always that NOFS allocations are very limited in
their reclaim powers and will trigger OOM prematurely. However, the way
we limit dirty memory these days forces most cache to be clean at all
times, and direct reclaim in general hasn't been allowed to issue page
writeback for quite some time. So these days, NOFS reclaim isn't really
weaker than regular direct reclaim. The only exception is that it might
block writeback, so we'd go OOM if the only reclaimables left were
dirty pages against that filesystem. That should be acceptable.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47981c5e54c3..fe3cb2b0b85b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2367,16 +2367,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	/* The OOM killer does not needlessly kill tasks for lowmem */
 	if (ac->high_zoneidx < ZONE_NORMAL)
 		goto out;
-	/* The OOM killer does not compensate for IO-less reclaim */
-	if (!(gfp_mask & __GFP_FS)) {
-		/*
-		 * XXX: Page reclaim didn't yield anything,
-		 * and the OOM killer can't be invoked, but
-		 * keep looping as per tradition.
-		 */
-		*did_some_progress = 1;
-		goto out;
-	}
 	if (pm_suspended_storage())
 		goto out;
 	/* The OOM killer may not free memory on a specific node */
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Thu 02-04-15 08:39:02, Dave Chinner wrote:
> On Wed, Apr 01, 2015 at 05:19:20PM +0200, Michal Hocko wrote:
> > On Mon 30-03-15 11:32:40, Dave Chinner wrote:
> > > On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote:
> > [...]
> > > > GFP_NOFS sites are currently one of the sites that can deadlock
> > > > inside the allocator, even though many of them seem to have
> > > > fallback code. My reasoning here is that if you *have* an exit
> > > > strategy for failing allocations that is smarter than hanging,
> > > > we should probably use that.
> > >
> > > We already do that for allocations where we can handle failure in
> > > GFP_NOFS conditions. It is, however, somewhat useless if we can't
> > > tell the allocator to try really hard if we've already had a
> > > failure and we are already in memory reclaim conditions (e.g. a
> > > shrinker trying to clean dirty objects so they can be reclaimed).
> > >
> > > From that perspective, I think that this patch set aims to force
> > > us away from handling fallbacks ourselves because a) it makes
> > > GFP_NOFS more likely to fail, and b) provides no mechanism to
> > > "try harder" when we really need the allocation to succeed.
> >
> > You can ask for this "try harder" by the __GFP_HIGH flag. Would
> > that help in your fallback case?
>
> That dips into GFP_ATOMIC reserves, right? What is the impact on the
> GFP_ATOMIC allocations that need it?

Yes, the memory reserve is shared, but the flag would be used only
after a previous GFP_NOFS allocation has failed, which means that the
system is close to the OOM and the chances for GFP_ATOMIC allocations
(which are GFP_NOWAIT and cannot perform any reclaim) to succeed are
quite low already.

> We typically see network cards fail GFP_ATOMIC allocations before XFS
> starts complaining about allocation failures, so I suspect that this
> might just make things worse rather than better...

My understanding is that a GFP_ATOMIC allocation would fall back to a
GFP_WAIT type of allocation in a deferred context in the networking
code. There would be some performance hit, but again we are talking
about close-to-OOM conditions here.
-- 
Michal Hocko
SUSE Labs
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Wed, Apr 01, 2015 at 05:19:20PM +0200, Michal Hocko wrote:
> On Mon 30-03-15 11:32:40, Dave Chinner wrote:
> > On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote:
> [...]
> > > GFP_NOFS sites are currently one of the sites that can deadlock
> > > inside the allocator, even though many of them seem to have
> > > fallback code. My reasoning here is that if you *have* an exit
> > > strategy for failing allocations that is smarter than hanging, we
> > > should probably use that.
> >
> > We already do that for allocations where we can handle failure in
> > GFP_NOFS conditions. It is, however, somewhat useless if we can't
> > tell the allocator to try really hard if we've already had a failure
> > and we are already in memory reclaim conditions (e.g. a shrinker
> > trying to clean dirty objects so they can be reclaimed).
> >
> > From that perspective, I think that this patch set aims to force us
> > away from handling fallbacks ourselves because a) it makes GFP_NOFS
> > more likely to fail, and b) provides no mechanism to "try harder"
> > when we really need the allocation to succeed.
>
> You can ask for this "try harder" by the __GFP_HIGH flag. Would that
> help in your fallback case?

That dips into GFP_ATOMIC reserves, right? What is the impact on the
GFP_ATOMIC allocations that need it?

We typically see network cards fail GFP_ATOMIC allocations before XFS
starts complaining about allocation failures, so I suspect that this
might just make things worse rather than better...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Mon 30-03-15 11:32:40, Dave Chinner wrote:
> On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote:
[...]
> > GFP_NOFS sites are currently one of the sites that can deadlock
> > inside the allocator, even though many of them seem to have fallback
> > code. My reasoning here is that if you *have* an exit strategy for
> > failing allocations that is smarter than hanging, we should probably
> > use that.
>
> We already do that for allocations where we can handle failure in
> GFP_NOFS conditions. It is, however, somewhat useless if we can't
> tell the allocator to try really hard if we've already had a failure
> and we are already in memory reclaim conditions (e.g. a shrinker
> trying to clean dirty objects so they can be reclaimed).
>
> From that perspective, I think that this patch set aims to force us
> away from handling fallbacks ourselves because a) it makes GFP_NOFS
> more likely to fail, and b) provides no mechanism to "try harder"
> when we really need the allocation to succeed.

You can ask for this "try harder" by the __GFP_HIGH flag. Would that
help in your fallback case?
-- 
Michal Hocko
SUSE Labs
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Mon, Mar 30, 2015 at 11:32:40AM +1100, Dave Chinner wrote:
> On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote:
> > GFP_NOFS sites are currently one of the sites that can deadlock inside
> > the allocator, even though many of them seem to have fallback code.
> > My reasoning here is that if you *have* an exit strategy for failing
> > allocations that is smarter than hanging, we should probably use that.
>
> We already do that for allocations where we can handle failure in
> GFP_NOFS conditions. It is, however, somewhat useless if we can't
> tell the allocator to try really hard if we've already had a failure
> and we are already in memory reclaim conditions (e.g. a shrinker
> trying to clean dirty objects so they can be reclaimed).

What do you mean you already do that? These allocations currently won't
fail. They loop forever in the allocator. Fallback code is dead code
right now. (Unless you do order-4 and up, which I doubt.)

> From that perspective, I think that this patch set aims to force us
> away from handling fallbacks ourselves because a) it makes GFP_NOFS
> more likely to fail, and b) provides no mechanism to "try harder"
> when we really need the allocation to succeed.

If by "more likely" you mean "at all possible", then yes. However, as
far as trying harder goes, that sounds like a good idea. It should be
possible for NOFS contexts to use the OOM killer and its reserves. But
still, they should be allowed to propagate allocation failures rather
than just hanging in the allocator.

> > > > mm: page_alloc: emergency reserve access for __GFP_NOFAIL allocations
> > > >
> > > > An exacerbation of the victim-stuck-behind-allocation scenario are
> > > > __GFP_NOFAIL allocations, because they will actually deadlock. To
> > > > avoid this, or try to, give __GFP_NOFAIL allocations access to not
> > > > just the OOM reserves but also the system's emergency reserves.
> > > >
> > > > This is basically a poor man's reservation system, which could or
> > > > should be replaced later on with an explicit reservation system that
> > > > e.g. filesystems have control over for use by transactions.
> > > >
> > > > It's obviously not bulletproof and might still lock up, but it should
> > > > greatly reduce the likelihood. AFAIK Andrea, whose idea this was, has
> > > > been using this successfully for some time.
> > >
> > > So, if we want GFP_NOFS allocations to be able to dip into a
> > > small extra reservation to make progress at ENOMEM, we have to
> > > use __GFP_NOFAIL because looping ourselves won't allow use of
> > > these extra reserves?
> >
> > As I said, this series is not about providing reserves just yet. It
> > is about using the fallback strategies you already implemented. And
> > where you don't have any, it's about making the allocator's last way
> > of forward progress, the OOM killer, more reliable.
>
> Sure - but you're doing that by adding a special reserve for
> GFP_NOFAIL allocations to dip into when the OOM killer is active.
> That can only be accessed by GFP_NOFAIL allocations - anyone who
> has a fallback but really needs the allocation to succeed if at all
> possible (i.e. should only fail to avoid a deadlock situation) can't
> communicate that fact to the allocator.

Hm? It's not restricted to NOFAIL at all, look closer at my patch
series. What you are describing is exactly how I propose the allocator
should handle all regular allocations: exhaust reclaimable pages, use
the OOM killer, dip into OOM reserves, but ultimately fail. The only
thing __GFP_NOFAIL does in *addition* to that is use the last emergency
reserves of the system in an attempt to avoid deadlocking.

[ Once those reserves are depleted, however, the system will deadlock,
  so we can only give them to allocations that would otherwise lock up
  anyway, i.e. __GFP_NOFAIL. It would be silly to risk a system
  deadlock for an allocation that has a fallback strategy. That is why
  you have to let the allocator know whether you can fall back. ]

The notable exception to this behavior are NOFS callers because of
their current OOM kill restrictions. But as I said, I'm absolutely open
to addressing this and either let them generally use the OOM killer
after some time, or provide you with another annotation that lets you
come back to try harder. I don't really care which way, that depends on
your requirements.

> > > > This patch makes NOFS allocations fail if reclaim can't free anything.
> > > >
> > > > It would be good if the filesystem people could weigh in on whether
> > > > they can deal with failing GFP_NOFS allocations, or annotate the
> > > > exceptions with __GFP_NOFAIL etc. It could well be that a middle
> > > > ground is required that allows using the OOM killer before giving up.
> > >
> > > ... which looks to me like a catch-22 situation for us: We
> > > have reserves, but callers need to use __GFP_NOFAIL to access them.
> > > GFP_NOFS is going to fail more often, so callers need to handle
> > > that in some way, either by looping or erroring out.
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Fri, Mar 27, 2015 at 11:05:09AM -0400, Johannes Weiner wrote:
> On Fri, Mar 27, 2015 at 06:58:22AM +1100, Dave Chinner wrote:
> > On Wed, Mar 25, 2015 at 02:17:04AM -0400, Johannes Weiner wrote:
> > > Hi everybody,
> > >
> > > in the recent past we've had several reports and discussions on how
> > > to deal with allocations hanging in the allocator upon OOM.
> > >
> > > The idea of this series is mainly to make the mechanism of detecting
> > > OOM situations reliable enough that we can be confident about
> > > failing allocations, and then leave the fallback strategy to the
> > > caller rather than looping forever in the allocator.
> > >
> > > The other part is trying to reduce the __GFP_NOFAIL deadlock rate,
> > > at least for the short term while we don't have a reservation
> > > system yet.
> >
> > A valid goal, but I think this series goes about it the wrong way.
> > i.e. it forces us to use __GFP_NOFAIL rather than providing us a
> > valid fallback mechanism to access reserves.
>
> I think you misunderstood the goal.
>
> While I agree that reserves would be the optimal fallback strategy,
> this series is about avoiding deadlocks in existing callsites that
> currently can not fail. This is about getting the best out of our
> existing mechanisms until we have universal reservation coverage,
> which will take time to devise and transition our codebase to.

That might be the goal, but it looks like the wrong path to me.

> GFP_NOFS sites are currently one of the sites that can deadlock inside
> the allocator, even though many of them seem to have fallback code.
> My reasoning here is that if you *have* an exit strategy for failing
> allocations that is smarter than hanging, we should probably use that.

We already do that for allocations where we can handle failure in
GFP_NOFS conditions. It is, however, somewhat useless if we can't
tell the allocator to try really hard if we've already had a failure
and we are already in memory reclaim conditions (e.g. a shrinker
trying to clean dirty objects so they can be reclaimed).

From that perspective, I think that this patch set aims to force us
away from handling fallbacks ourselves because a) it makes GFP_NOFS
more likely to fail, and b) provides no mechanism to "try harder"
when we really need the allocation to succeed.

> > > mm: page_alloc: emergency reserve access for __GFP_NOFAIL allocations
> > >
> > > An exacerbation of the victim-stuck-behind-allocation scenario are
> > > __GFP_NOFAIL allocations, because they will actually deadlock. To
> > > avoid this, or try to, give __GFP_NOFAIL allocations access to not
> > > just the OOM reserves but also the system's emergency reserves.
> > >
> > > This is basically a poor man's reservation system, which could or
> > > should be replaced later on with an explicit reservation system that
> > > e.g. filesystems have control over for use by transactions.
> > >
> > > It's obviously not bulletproof and might still lock up, but it
> > > should greatly reduce the likelihood. AFAIK Andrea, whose idea this
> > > was, has been using this successfully for some time.
> >
> > So, if we want GFP_NOFS allocations to be able to dip into a
> > small extra reservation to make progress at ENOMEM, we have to
> > use __GFP_NOFAIL because looping ourselves won't allow use of
> > these extra reserves?
>
> As I said, this series is not about providing reserves just yet. It
> is about using the fallback strategies you already implemented. And
> where you don't have any, it's about making the allocator's last way
> of forward progress, the OOM killer, more reliable.

Sure - but you're doing that by adding a special reserve for
GFP_NOFAIL allocations to dip into when the OOM killer is active.
That can only be accessed by GFP_NOFAIL allocations - anyone who
has a fallback but really needs the allocation to succeed if at all
possible (i.e. should only fail to avoid a deadlock situation) can't
communicate that fact to the allocator.

> > > This patch makes NOFS allocations fail if reclaim can't free anything.
> > >
> > > It would be good if the filesystem people could weigh in on whether
> > > they can deal with failing GFP_NOFS allocations, or annotate the
> > > exceptions with __GFP_NOFAIL etc. It could well be that a middle
> > > ground is required that allows using the OOM killer before giving up.
> >
> > ... which looks to me like a catch-22 situation for us: We
> > have reserves, but callers need to use __GFP_NOFAIL to access them.
> > GFP_NOFS is going to fail more often, so callers need to handle that
> > in some way, either by looping or erroring out.
> >
> > But if we loop manually because we try to handle ENOMEM situations
> > gracefully (e.g. try a number of times before erroring out) we can't
> > dip into the reserves because the only semantics being provided are
> > "try-once-without-reserves" or "try-forever-with-reserves". i.e.
> > what we actually need here is "try-once-with-reserves" semantics
> > so that we can make progress after a failing GFP_NOFS
> > "try-once-without-reserves" allocation.
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Fri, Mar 27, 2015 at 06:58:22AM +1100, Dave Chinner wrote:
> On Wed, Mar 25, 2015 at 02:17:04AM -0400, Johannes Weiner wrote:
> > Hi everybody,
> >
> > in the recent past we've had several reports and discussions on how
> > to deal with allocations hanging in the allocator upon OOM.
> >
> > The idea of this series is mainly to make the mechanism of detecting
> > OOM situations reliable enough that we can be confident about failing
> > allocations, and then leave the fallback strategy to the caller
> > rather than looping forever in the allocator.
> >
> > The other part is trying to reduce the __GFP_NOFAIL deadlock rate, at
> > least for the short term while we don't have a reservation system yet.
>
> A valid goal, but I think this series goes about it the wrong way.
> i.e. it forces us to use __GFP_NOFAIL rather than providing us a
> valid fallback mechanism to access reserves.

I think you misunderstood the goal.

While I agree that reserves would be the optimal fallback strategy,
this series is about avoiding deadlocks in existing callsites that
currently can not fail. This is about getting the best out of our
existing mechanisms until we have universal reservation coverage,
which will take time to devise and transition our codebase to.

GFP_NOFS sites are currently one of the sites that can deadlock inside
the allocator, even though many of them seem to have fallback code.
My reasoning here is that if you *have* an exit strategy for failing
allocations that is smarter than hanging, we should probably use that.

> > mm: page_alloc: emergency reserve access for __GFP_NOFAIL allocations
> >
> > An exacerbation of the victim-stuck-behind-allocation scenario are
> > __GFP_NOFAIL allocations, because they will actually deadlock. To
> > avoid this, or try to, give __GFP_NOFAIL allocations access to not
> > just the OOM reserves but also the system's emergency reserves.
> >
> > This is basically a poor man's reservation system, which could or
> > should be replaced later on with an explicit reservation system that
> > e.g. filesystems have control over for use by transactions.
> >
> > It's obviously not bulletproof and might still lock up, but it should
> > greatly reduce the likelihood. AFAIK Andrea, whose idea this was, has
> > been using this successfully for some time.
>
> So, if we want GFP_NOFS allocations to be able to dip into a
> small extra reservation to make progress at ENOMEM, we have to
> use __GFP_NOFAIL because looping ourselves won't allow use of
> these extra reserves?

As I said, this series is not about providing reserves just yet. It
is about using the fallback strategies you already implemented. And
where you don't have any, it's about making the allocator's last way
of forward progress, the OOM killer, more reliable.

If you have an allocation site that is endlessly looping around calls
to the allocator, it means you DON'T have a fallback strategy. In that
case, it would be in your interest to tell the allocator, such that it
can take measures to break the infinite loop.

However, those measures are not without their own risk and they need
to be carefully sequenced to reduce the risk for deadlocks. E.g. we
can not give __GFP_NOFAIL allocations access to the statically-sized
emergency reserves without taking steps to free memory at the same
time, because then we'd just trade forward progress of that allocation
against forward progress of some memory reclaimer later on which finds
the emergency reserves exhausted.

> > mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
> >
> > Another hang that was reported was from NOFS allocations. The trouble
> > with these is that they can't issue or wait for writeback during page
> > reclaim, and so we don't want to OOM kill on their behalf. However,
> > with such restrictions on making progress, they are prone to hangs.
>
> And because this effectively means GFP_NOFS allocations are
> going to fail much more often, we're either going to have to loop
> ourselves or use __GFP_NOFAIL...

> > This patch makes NOFS allocations fail if reclaim can't free anything.
> >
> > It would be good if the filesystem people could weigh in on whether
> > they can deal with failing GFP_NOFS allocations, or annotate the
> > exceptions with __GFP_NOFAIL etc. It could well be that a middle
> > ground is required that allows using the OOM killer before giving up.
>
> ... which looks to me like a catch-22 situation for us: We
> have reserves, but callers need to use __GFP_NOFAIL to access them.
> GFP_NOFS is going to fail more often, so callers need to handle that
> in some way, either by looping or erroring out.
>
> But if we loop manually because we try to handle ENOMEM situations
> gracefully (e.g. try a number of times before erroring out) we can't
> dip into the reserves because the only semantics being provided are
> "try-once-without-reserves" or "try-forever-with-reserves". i.e.
> what we actually need here is "try-once-with-reserves" semantics
> so that we can make progress after a failing GFP_NOFS
> "try-once-without-reserves" allocation.
Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
On Wed, Mar 25, 2015 at 02:17:04AM -0400, Johannes Weiner wrote: > Hi everybody, > > in the recent past we've had several reports and discussions on how to > deal with allocations hanging in the allocator upon OOM. > > The idea of this series is mainly to make the mechanism of detecting > OOM situations reliable enough that we can be confident about failing > allocations, and then leave the fallback strategy to the caller rather > than looping forever in the allocator. > > The other part is trying to reduce the __GFP_NOFAIL deadlock rate, at > least for the short term while we don't have a reservation system yet. A valid goal, but I think this series goes about it the wrong way. i.e. it forces us to use __GFP_NOFAIL rather than providing us a valid fallback mechanism to access reserves. > mm: page_alloc: emergency reserve access for __GFP_NOFAIL allocations > > An exacerbation of the victim-stuck-behind-allocation scenario are > __GFP_NOFAIL allocations, because they will actually deadlock. To > avoid this, or try to, give __GFP_NOFAIL allocations access to not > just the OOM reserves but also the system's emergency reserves. > > This is basically a poor man's reservation system, which could or > should be replaced later on with an explicit reservation system that > e.g. filesystems have control over for use by transactions. > > It's obviously not bulletproof and might still lock up, but it should > greatly reduce the likelihood. AFAIK Andrea, whose idea this was, has > been using this successfully for some time. So, if we want GFP_NOFS allocations to be able to dip into a small extra reservation to make progress at ENOMEM, we have to use use __GFP_NOFAIL because looping ourselves won't allow use of these extra reserves? > mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM > > Another hang that was reported was from NOFS allocations. 
The trouble > with these is that they can't issue or wait for writeback during page > reclaim, and so we don't want to OOM kill on their behalf. However, > with such restrictions on making progress, they are prone to hangs. And because this effectively means GFP_NOFS allocations are going to fail much more often, we're either going to have to loop ourselves or use __GFP_NOFAIL... > This patch makes NOFS allocations fail if reclaim can't free anything. > > It would be good if the filesystem people could weigh in on whether > they can deal with failing GFP_NOFS allocations, or annotate the > exceptions with __GFP_NOFAIL etc. It could well be that a middle > ground is required that allows using the OOM killer before giving up. ... which looks to me like a catch-22 situation for us: We have reserves, but callers need to use __GFP_NOFAIL to access them. GFP_NOFS is going to fail more often, so callers need to handle that in some way, either by looping or erroring out. But if we loop manually because we try to handle ENOMEM situations gracefully (e.g. try a number of times before erroring out) we can't dip into the reserves because the only semantics being provided are "try-once-without-reserves" or "try-forever-with-reserves". i.e. what we actually need here is "try-once-with-reserves" semantics so that we can make progress after a failing GFP_NOFS "try-once-without-reserves" allocation. IOWS, __GFP_NOFAIL is not the answer here - it's GFP_NOFS | __GFP_USE_RESERVE that we need on the failure fallback path. Which, incidentally, is trivial to add to the XFS allocation code. Indeed, I'll request that you test series like this on metadata intensive filesystem workloads on XFS under memory stress and quantify how many new "XFS: possible deadlock in memory allocation" warnings are emitted. If the patch set floods the system with such warnings, then it means the proposed means the fallback for "caller handles allocation failure" is not making progress. Cheers, Dave. 
-- 
Dave Chinner
da...@fromorbit.com
[patch 00/12] mm: page_alloc: improve OOM mechanism and policy
Hi everybody,

in the recent past we've had several reports and discussions on how to
deal with allocations hanging in the allocator upon OOM.

The idea of this series is mainly to make the mechanism of detecting
OOM situations reliable enough that we can be confident about failing
allocations, and then leave the fallback strategy to the caller rather
than looping forever in the allocator.

The other part is trying to reduce the __GFP_NOFAIL deadlock rate, at
least for the short term while we don't have a reservation system yet.

Here is a breakdown of the proposed changes:

mm: oom_kill: remove pointless locking in oom_enable()
mm: oom_kill: clean up victim marking and exiting interfaces
mm: oom_kill: remove misleading test-and-clear of known TIF_MEMDIE
mm: oom_kill: remove pointless locking in exit_oom_victim()
mm: oom_kill: generalize OOM progress waitqueue
mm: oom_kill: simplify OOM killer locking
mm: page_alloc: inline should_alloc_retry() contents

These are preparatory patches to clean up parts of the OOM killer and
the page allocator. Filesystem folks and others that only care about
allocation semantics may want to skip over these.

mm: page_alloc: wait for OOM killer progress before retrying

One of the hangs we have seen reported is from lower order allocations
that loop infinitely in the allocator. In an attempt to address that,
it has been proposed to limit the number of retry loops - possibly
even make that number configurable from userspace - and return NULL
once we are certain that the system is "truly OOM". But it wasn't
clear how high that number needs to be to reliably determine a global
OOM situation from the perspective of an individual allocation.

An issue is that OOM killing is currently an asynchronous operation,
and the optimal retry number depends on how long it takes an OOM kill
victim to exit and release its memory - which of course varies with
system load and the exiting task.

To address this, this patch makes OOM killing synchronous and only
returns to the allocator once the victim has actually exited. With
that, the allocator no longer requires retry loops just to poll for
the victim releasing memory.

mm: page_alloc: private memory reserves for OOM-killing allocations

Once out_of_memory() is synchronous, there are still two issues that
can make determining system-wide OOM from a single allocation context
unreliable. For one, concurrent allocations can swoop in right after a
kill and steal the memory, causing spurious allocation failures for
contexts that actually freed memory. But also, the OOM victim could
get blocked on some state that the allocation is holding, which would
delay the release of the memory (and refilling of the reserves) until
after the allocation has completed.

This patch creates private reserves for allocations that have issued
an OOM kill. Once these reserves run dry, it seems reasonable to
assume that other allocations are not succeeding anymore either.

mm: page_alloc: emergency reserve access for __GFP_NOFAIL allocations

An exacerbation of the victim-stuck-behind-allocation scenario are
__GFP_NOFAIL allocations, because they will actually deadlock. To
avoid this, or try to, give __GFP_NOFAIL allocations access to not
just the OOM reserves but also the system's emergency reserves.

This is basically a poor man's reservation system, which could or
should be replaced later on with an explicit reservation system that
e.g. filesystems have control over for use by transactions.

It's obviously not bulletproof and might still lock up, but it should
greatly reduce the likelihood. AFAIK Andrea, whose idea this was, has
been using this successfully for some time.

mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM

Another hang that was reported was from NOFS allocations. The trouble
with these is that they can't issue or wait for writeback during page
reclaim, and so we don't want to OOM kill on their behalf. However,
with such restrictions on making progress, they are prone to hangs.

This patch makes NOFS allocations fail if reclaim can't free anything.

It would be good if the filesystem people could weigh in on whether
they can deal with failing GFP_NOFS allocations, or annotate the
exceptions with __GFP_NOFAIL etc. It could well be that a middle
ground is required that allows using the OOM killer before giving up.

mm: page_alloc: do not lock up low-order allocations upon OOM

With both OOM killing and "true OOM situation" detection more
reliable, this patch finally allows allocations up to order 3 to
actually fail on OOM and leave the fallback strategy to the caller -
as opposed to the current policy of hanging in the allocator.

Comments?

 drivers/staging/android/lowmemorykiller.c |  2 +-
 include/linux/mmzone.h                    |  2 +
 include/linux/oom.h                       | 12 +-
 kernel/exit.c                             |  2 +-
 mm/internal.h                             |  3 +-
 mm/memcontrol.c