Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-19 Thread Vlastimil Babka
On 05/18/2017 07:07 PM, Christoph Lameter wrote: > On Thu, 18 May 2017, Vlastimil Babka wrote: > >>> The race is where? If you expand the node set during the move of the >>> application then you are safe in terms of the legacy apps that did not >>> include static bindings. >> >> No, that

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-19 Thread Michal Hocko
On Thu 18-05-17 14:07:45, Christoph Lameter wrote: > On Thu, 18 May 2017, Michal Hocko wrote: > > > > See above. OOM Kill in a cpuset does not kill an innocent task but a task > > > that does an allocation in that specific context meaning a task in that > > > cpuset that also has a memory

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-18 Thread Christoph Lameter
On Thu, 18 May 2017, Michal Hocko wrote: > > See above. OOM Kill in a cpuset does not kill an innocent task but a task > > that does an allocation in that specific context meaning a task in that > > cpuset that also has a memory policy. > > No, the oom killer will choose the largest task in the
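
The dispute above is over victim selection during a cpuset-constrained OOM. Below is a minimal self-contained model of the eligibility rule in question (the task names and node masks are hypothetical, and a plain 64-bit word stands in for the kernel's nodemask_t): a candidate is skipped unless its allowed nodes intersect those of the allocating task, and among the eligible tasks the largest one is chosen.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct task { const char *name; uint64_t mems_allowed; long rss_pages; };

/* analogous to a nodes_intersects() check on mems_allowed */
static bool shares_memory_nodes(const struct task *a, const struct task *b)
{
        return (a->mems_allowed & b->mems_allowed) != 0;
}

int main(void)
{
        struct task alloc_task = { "allocator", 0x1, 100 };      /* node 0 */
        struct task tasks[] = {
                { "huge-other-cpuset", 0x2, 100000 },  /* node 1: skipped  */
                { "big-same-cpuset",   0x1, 50000 },   /* node 0: eligible */
        };
        const struct task *victim = NULL;

        for (unsigned i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++) {
                if (!shares_memory_nodes(&tasks[i], &alloc_task))
                        continue;        /* disjoint cpuset: not a victim */
                if (!victim || tasks[i].rss_pages > victim->rss_pages)
                        victim = &tasks[i];
        }
        printf("victim: %s\n", victim ? victim->name : "none");
        return 0;
}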

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-18 Thread Michal Hocko
On Thu 18-05-17 11:57:55, Christoph Lameter wrote: > On Thu, 18 May 2017, Michal Hocko wrote: > > > > Nope. The OOM in a cpuset gets the process doing the alloc killed. Or what > > > that changed? > > ! > > > > > > > At this point you have messed up royally and nothing is going to rescue >

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-18 Thread Christoph Lameter
On Thu, 18 May 2017, Vlastimil Babka wrote: > > The race is where? If you expand the node set during the move of the > > application then you are safe in terms of the legacy apps that did not > > include static bindings. > > No, that expand/shrink by itself doesn't work against parallel
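
The expand/shrink sequence being discussed, as a hedged admin-side sketch (the cgroup path is an assumption and the page-migration step is elided): the point of contention is that a task allocating in parallel can observe a transient nodemask between the two writes.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_file(const char *path, const char *val)
{
        int fd = open(path, O_WRONLY);

        if (fd < 0) { perror(path); return -1; }
        if (write(fd, val, strlen(val)) < 0) { perror(path); close(fd); return -1; }
        close(fd);
        return 0;
}

int main(void)
{
        const char *mems = "/sys/fs/cgroup/cpuset/app/cpuset.mems";

        write_file(mems, "0-1"); /* 1. expand: old node 0 plus new node 1 */
        /* 2. migrate pages here (e.g. via cpuset.memory_migrate) ...     */
        write_file(mems, "1");   /* 3. shrink to the new node only        */
        return 0;
}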

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-18 Thread Christoph Lameter
On Thu, 18 May 2017, Michal Hocko wrote: > > Nope. The OOM in a cpuset gets the process doing the alloc killed. Or what > > that changed? ! > > > > At this point you have messed up royally and nothing is going to rescue > > you anyways. OOM or not does not matter anymore. The app will fail.

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-18 Thread Vlastimil Babka
On 05/17/2017 04:48 PM, Christoph Lameter wrote: > On Wed, 17 May 2017, Michal Hocko wrote: > So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy case in a raceless way? >>> >>> You don't have to do that if you do not create an empty mempolicy in the >>> first place.

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-18 Thread Michal Hocko
On Wed 17-05-17 10:25:09, Christoph Lameter wrote: > On Wed, 17 May 2017, Michal Hocko wrote: > > > > If you have screwy things like static mbinds in there then you are > > > hopelessly lost anyways. You may have moved the process to another set > > > of nodes but the static bindings may refer

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-17 Thread Christoph Lameter
On Wed, 17 May 2017, Michal Hocko wrote: > > The race is where? If you expand the node set during the move of the > > application then you are safe in terms of the legacy apps that did not > > include static bindings. > > I am pretty sure it is described in those changelogs and I won't repeat > it

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-17 Thread Christoph Lameter
On Wed, 17 May 2017, Michal Hocko wrote: > > If you have screwy things like static mbinds in there then you are > > hopelessly lost anyways. You may have moved the process to another set > > of nodes but the static bindings may refer to a node no longer > > available. Thus the OOM is legitimate.
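
In userspace terms, the static-binding case reads roughly like the sketch below (compile with -lnuma; the mapping size and node choice are arbitrary). MPOL_F_STATIC_NODES pins the policy to physical node numbers, so it is deliberately not remapped when cpuset.mems changes; if node 0 is then removed from the cpuset, faults on this range have nowhere legal to go.

#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 16 * 4096;
        unsigned long nodemask = 1UL << 0;       /* physical node 0, always */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* static binding: survives cpuset moves unremapped */
        if (mbind(p, len, MPOL_BIND | MPOL_F_STATIC_NODES,
                  &nodemask, 8 * sizeof(nodemask), 0) != 0)
                perror("mbind");
        return 0;
}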

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-17 Thread Michal Hocko
On Wed 17-05-17 09:48:25, Christoph Lameter wrote: > On Wed, 17 May 2017, Michal Hocko wrote: > > > > > So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy > > > > case in a raceless way? > > > > > > You don't have to do that if you do not create an empty mempolicy in the > >

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-17 Thread Christoph Lameter
On Wed, 17 May 2017, Michal Hocko wrote: > > > So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy > > > case in a raceless way? > > > > You don't have to do that if you do not create an empty mempolicy in the > > first place. The current kernel code avoids that by first
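
What "avoids that by first..." alludes to can be shown with a toy model (not the kernel code; a 64-bit word stands in for nodemask_t): grow the mask before shrinking it, so a concurrent reader never sees it empty, whereas a one-shot replacement would expose an empty intersection in between.

#include <stdint.h>
#include <stdio.h>

static uint64_t policy_nodes = 0x1;      /* currently bound to node 0 */

static void rebind(uint64_t new_nodes)
{
        policy_nodes |= new_nodes;  /* step 1: grow, mask never empty */
        /* concurrent allocators running here still find a node */
        policy_nodes &= new_nodes;  /* step 2: shrink to the new set */
}

int main(void)
{
        rebind(0x2);                             /* move to node 1 */
        printf("policy nodes: %#llx\n", (unsigned long long)policy_nodes);
        return 0;
}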

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-17 Thread Michal Hocko
On Wed 17-05-17 08:56:34, Christoph Lameter wrote: > On Wed, 17 May 2017, Michal Hocko wrote: > > > > We certainly can do that. The failure of the page faults is due to the > > > admin trying to move an application that is not aware of this and is using > > > mempols. That could be an error.

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-17 Thread Christoph Lameter
On Wed, 17 May 2017, Michal Hocko wrote: > > We certainly can do that. The failure of the page faults is due to the > > admin trying to move an application that is not aware of this and is using > > mempols. That could be an error. Trying to move an application that > > contains both absolute

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-05-17 Thread Michal Hocko
On Sun 30-04-17 16:33:10, Christoph Lameter wrote: > On Wed, 26 Apr 2017, Vlastimil Babka wrote: > > > > Such an application typically already has such logic and executes a > > > binding after discovering its numa node configuration on startup. It would > > > have to be modified to redo that

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-30 Thread Christoph Lameter
On Wed, 26 Apr 2017, Vlastimil Babka wrote: > > Such an application typically already has such logic and executes a > > binding after discovering its numa node configuration on startup. It would > > have to be modified to redo that action when it gets some sort of a signal > > from the script
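
A hedged sketch of that modification (the SIGUSR1 convention is hypothetical, and a single 64-bit mask is assumed to cover all nodes; compile with -lnuma): the handler only sets a flag, and the work loop re-derives the binding from get_mempolicy(MPOL_F_MEMS_ALLOWED) before calling set_mempolicy(2) again.

#include <numaif.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t rebind_requested;

static void on_sigusr1(int sig) { (void)sig; rebind_requested = 1; }

static void rebind_to_allowed_nodes(void)
{
        unsigned long mask = 0;

        /* ask the kernel which nodes this thread may use now */
        if (get_mempolicy(NULL, &mask, 8 * sizeof(mask), NULL,
                          MPOL_F_MEMS_ALLOWED) != 0) {
                perror("get_mempolicy");
                return;
        }
        if (set_mempolicy(MPOL_BIND, &mask, 8 * sizeof(mask)) != 0)
                perror("set_mempolicy");
}

int main(void)
{
        signal(SIGUSR1, on_sigusr1);
        for (;;) {
                pause();                 /* stands in for application work */
                if (rebind_requested) {
                        rebind_requested = 0;
                        rebind_to_allowed_nodes();
                }
        }
}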

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-26 Thread Vlastimil Babka
On 04/14/2017 10:37 PM, Christoph Lameter wrote: > On Thu, 13 Apr 2017, Vlastimil Babka wrote: > >> >> I doubt we can change that now, because that can break existing >> programs. It also makes some sense at least to me, because a task can >> control its own mempolicy (for performance reasons),

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-14 Thread Christoph Lameter
On Thu, 13 Apr 2017, Vlastimil Babka wrote: > > I doubt we can change that now, because that can break existing > programs. It also makes some sense at least to me, because a task can > control its own mempolicy (for performance reasons), but cpuset changes > are admin decisions that the task

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-13 Thread Vlastimil Babka
On 04/12/2017 11:25 PM, Christoph Lameter wrote: > On Tue, 11 Apr 2017, Vlastimil Babka wrote: > >>> The fallback was only intended for a cpuset on which boundaries are not >>> enforced >>> in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL) >>> should fail the allocation. >>

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-13 Thread Vlastimil Babka
On 04/13/2017 08:06 AM, Vlastimil Babka wrote: >> Did you really mean node_zonelist() in both the instances above? Because >> that function just picks up either FALLBACK_ZONELIST or NOFALLBACK_ZONELIST >> depending upon the passed GFP flags in the allocation request and does not >> deal with
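
For context, node_zonelist() in the sources of that era is (approximately) just an index into the node's two zonelists, with __GFP_THISNODE selecting the no-fallback one; this is Anshuman's point that it does not itself consult nodemasks or cpusets:

/* include/linux/gfp.h, v4.11-era, lightly trimmed */
static inline int gfp_zonelist(gfp_t flags)
{
#ifdef CONFIG_NUMA
        if (unlikely(flags & __GFP_THISNODE))
                return ZONELIST_NOFALLBACK;
#endif
        return ZONELIST_FALLBACK;
}

static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
{
        return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
}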

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-13 Thread Vlastimil Babka
On 04/13/2017 07:42 AM, Anshuman Khandual wrote: > On 04/11/2017 07:36 PM, Vlastimil Babka wrote: >> Commit e47483bca2cc ("mm, page_alloc: fix premature OOM when racing with >> cpuset >> mems update") has fixed known recent regressions found by LTP's cpuset01 >> testcase. I have however found

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-12 Thread Anshuman Khandual
On 04/11/2017 07:36 PM, Vlastimil Babka wrote: > Commit e47483bca2cc ("mm, page_alloc: fix premature OOM when racing with > cpuset > mems update") has fixed known recent regressions found by LTP's cpuset01 > testcase. I have however found that by modifying the testcase to use per-vma >

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-12 Thread Christoph Lameter
On Tue, 11 Apr 2017, Vlastimil Babka wrote: > > The fallback was only intended for a cpuset on which boundaries are not > > enforced > > in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL) > > should fail the allocation. > > Hmm just to clarify - I'm talking about ignoring the
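
A simplified paraphrase of the v4.11-era __cpuset_node_allowed() (locking and the OOM-victim and exiting-task exceptions omitted), showing the split Christoph describes: a __GFP_HARDWALL request stops at the task's own mems_allowed, while a softwall request may fall back to the nearest ancestor cpuset with CS_MEM_HARDWALL set.

/* kernel/cpuset.c, simplified: softwall vs. hardwall allocation check */
bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
{
        struct cpuset *cs;

        if (in_interrupt())
                return true;
        if (node_isset(node, current->mems_allowed))
                return true;
        if (gfp_mask & __GFP_HARDWALL)  /* hardwall request: stop here */
                return false;

        /* softwall: fall back to the nearest hardwall ancestor's nodes */
        cs = nearest_hardwall_ancestor(task_cs(current));
        return node_isset(node, cs->mems_allowed);
}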

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-11 Thread Vlastimil Babka
+CC linux-api On 11.4.2017 19:24, Christoph Lameter wrote: > On Tue, 11 Apr 2017, Vlastimil Babka wrote: > >> The root of the problem is that the cpuset's mems_allowed and mempolicy's >> nodemask can temporarily have no intersection, thus get_page_from_freelist() >> cannot find any usable zone.

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-11 Thread Christoph Lameter
On Tue, 11 Apr 2017, Vlastimil Babka wrote: > The root of the problem is that the cpuset's mems_allowed and mempolicy's > nodemask can temporarily have no intersection, thus get_page_from_freelist() > cannot find any usable zone. The current semantic for empty intersection is to > ignore
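
The "no usable zone" condition can be modeled in a few self-contained lines (hypothetical masks, with a 64-bit word and bitwise AND standing in for nodemask_t and nodes_and()): a mid-rebind snapshot where the cpuset already points at node 1 while the policy still says node 0 leaves an empty intersection, which the allocator can only report as OOM.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 64

static bool find_usable_node(uint64_t mems_allowed, uint64_t policy_nodes,
                             int *node)
{
        uint64_t usable = mems_allowed & policy_nodes;

        for (int n = 0; n < MAX_NODES; n++) {
                if (usable & (1ULL << n)) { *node = n; return true; }
        }
        return false;            /* empty intersection: looks like OOM */
}

int main(void)
{
        int node;

        /* mid-rebind: cpuset moved to node 1, policy still on node 0 */
        if (!find_usable_node(0x2, 0x1, &node))
                printf("no usable node: premature OOM path\n");
        return 0;
}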

[RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-11 Thread Vlastimil Babka
Commit e47483bca2cc ("mm, page_alloc: fix premature OOM when racing with cpuset mems update") has fixed known recent regressions found by LTP's cpuset01 testcase. I have however found that by modifying the testcase to use per-vma mempolicies via mbind(2) instead of per-task mempolicies via
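
The per-task vs. per-vma distinction the modified testcase exercises, as a minimal sketch (compile with -lnuma; sizes and node choice are arbitrary): set_mempolicy(2) installs a policy for the whole task, while mbind(2) attaches one to a specific mapping.

#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        unsigned long nodemask = 1UL << 0;               /* node 0 */
        size_t len = 16 * 4096;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* per-task policy: all future allocations of this task */
        if (set_mempolicy(MPOL_BIND, &nodemask, 8 * sizeof(nodemask)) != 0)
                perror("set_mempolicy");

        /* per-vma policy: only faults within [p, p + len) */
        if (mbind(p, len, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0) != 0)
                perror("mbind");
        return 0;
}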

[RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-11 Thread Vlastimil Babka
Commit e47483bca2cc ("mm, page_alloc: fix premature OOM when racing with cpuset mems update") has fixed known recent regressions found by LTP's cpuset01 testcase. I have however found that by modifying the testcase to use per-vma mempolicies via bind(2) instead of per-task mempolicies via