Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-06 Thread Michal Hocko
On Fri 06-11-20 17:08:57, Feng Tang wrote: [...] > You are right, there are quite a few types of page allocation failures. > The callstack in patch 2/2 is a GFP_HIGHUSER from pipe_write, and there > are more types of kernel allocation requests which will get blocked by > the different check. My

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-06 Thread Feng Tang
On Fri, Nov 06, 2020 at 09:10:26AM +0100, Michal Hocko wrote: > > > > The incoming parameter nodemask is NULL, and the function will first > > > > try the > > > > cpuset nodemask (1 here), and the zoneidx is only granted 2, which > > > > makes the > > > > 'ac's preferred zone to be NULL. so it

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-06 Thread Michal Hocko
On Fri 06-11-20 15:06:56, Feng Tang wrote: > On Thu, Nov 05, 2020 at 05:16:12PM +0100, Michal Hocko wrote: > > On Thu 05-11-20 21:43:05, Feng Tang wrote: > > > On Thu, Nov 05, 2020 at 02:12:45PM +0100, Michal Hocko wrote: > > > > On Thu 05-11-20 21:07:10, Feng Tang wrote: > > > > [...] > > > > >

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Michal Hocko
On Fri 06-11-20 12:32:44, Huang, Ying wrote: > Michal Hocko writes: > > > On Thu 05-11-20 09:40:28, Feng Tang wrote: > >> On Wed, Nov 04, 2020 at 09:53:43AM +0100, Michal Hocko wrote: > >> > >> > > > As I've said in reply to your second patch. I think we can make the > >> > > > oom > >> > > >

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Feng Tang
On Thu, Nov 05, 2020 at 05:16:12PM +0100, Michal Hocko wrote: > On Thu 05-11-20 21:43:05, Feng Tang wrote: > > On Thu, Nov 05, 2020 at 02:12:45PM +0100, Michal Hocko wrote: > > > On Thu 05-11-20 21:07:10, Feng Tang wrote: > > > [...] > > > > My debug traces show it is, and its gfp_mask is

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Huang, Ying
Michal Hocko writes: > On Thu 05-11-20 09:40:28, Feng Tang wrote: >> On Wed, Nov 04, 2020 at 09:53:43AM +0100, Michal Hocko wrote: >> >> > > > As I've said in reply to your second patch. I think we can make the oom >> > > > killer behavior more sensible in these misconfigured cases but I do not

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Michal Hocko
On Thu 05-11-20 21:43:05, Feng Tang wrote: > On Thu, Nov 05, 2020 at 02:12:45PM +0100, Michal Hocko wrote: > > On Thu 05-11-20 21:07:10, Feng Tang wrote: > > [...] > > > My debug traces show it is, and its gfp_mask is 'GFP_KERNEL' > > > > Can you provide the full information please? Which node

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Feng Tang
On Thu, Nov 05, 2020 at 02:12:45PM +0100, Michal Hocko wrote: > On Thu 05-11-20 21:07:10, Feng Tang wrote: > [...] > > My debug traces show it is, and its gfp_mask is 'GFP_KERNEL' > > Can you provide the full information please? Which node has been > requested. Which cpuset the calling process

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Vlastimil Babka
On 11/5/20 2:19 PM, Michal Hocko wrote: On Thu 05-11-20 14:14:25, Vlastimil Babka wrote: On 11/5/20 1:58 PM, Michal Hocko wrote: > On Thu 05-11-20 13:53:24, Vlastimil Babka wrote: > > On 11/5/20 1:08 PM, Michal Hocko wrote: > > > On Thu 05-11-20 09:40:28, Feng Tang wrote: > > > > > > Could you

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Michal Hocko
On Thu 05-11-20 14:14:25, Vlastimil Babka wrote: > On 11/5/20 1:58 PM, Michal Hocko wrote: > > On Thu 05-11-20 13:53:24, Vlastimil Babka wrote: > > > On 11/5/20 1:08 PM, Michal Hocko wrote: > > > > On Thu 05-11-20 09:40:28, Feng Tang wrote: > > > > > > > Could you be more specific? This sounds

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Vlastimil Babka
On 11/5/20 1:58 PM, Michal Hocko wrote: On Thu 05-11-20 13:53:24, Vlastimil Babka wrote: On 11/5/20 1:08 PM, Michal Hocko wrote: > On Thu 05-11-20 09:40:28, Feng Tang wrote: > > > > Could you be more specific? This sounds like a bug. Allocations > > > shouldn't spill over to a node which is not

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Michal Hocko
On Thu 05-11-20 21:07:10, Feng Tang wrote: [...] > My debug traces show it is, and its gfp_mask is 'GFP_KERNEL' Can you provide the full information please? Which node has been requested? Which cpuset does the calling process run in, and which node has the allocation succeeded from? A bare dump_stack

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Feng Tang
On Thu, Nov 05, 2020 at 01:58:28PM +0100, Michal Hocko wrote: > On Thu 05-11-20 13:53:24, Vlastimil Babka wrote: > > On 11/5/20 1:08 PM, Michal Hocko wrote: > > > On Thu 05-11-20 09:40:28, Feng Tang wrote: > > > > > > Could you be more specific? This sounds like a bug. Allocations > > > > >

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Michal Hocko
On Thu 05-11-20 13:53:24, Vlastimil Babka wrote: > On 11/5/20 1:08 PM, Michal Hocko wrote: > > On Thu 05-11-20 09:40:28, Feng Tang wrote: > > > > > Could you be more specific? This sounds like a bug. Allocations > > > > shouldn't spill over to a node which is not in the cpuset. There are few > > >

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Vlastimil Babka
On 11/5/20 1:08 PM, Michal Hocko wrote: On Thu 05-11-20 09:40:28, Feng Tang wrote: > > Could you be more specific? This sounds like a bug. Allocations > shouldn't spill over to a node which is not in the cpuset. There are a few > exceptions like IRQ context but that shouldn't happen regularly.

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-05 Thread Michal Hocko
On Thu 05-11-20 09:40:28, Feng Tang wrote: > On Wed, Nov 04, 2020 at 09:53:43AM +0100, Michal Hocko wrote: > > > > > As I've said in reply to your second patch. I think we can make the oom > > > > killer behavior more sensible in these misconfigured cases but I do not > > > > think we want to break

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-04 Thread Feng Tang
On Wed, Nov 04, 2020 at 09:53:43AM +0100, Michal Hocko wrote: > > > As I've said in reply to your second patch. I think we can make the oom > > > killer behavior more sensible in these misconfigured cases but I do not > > > think we want to break the cpuset isolation for such a configuration. > > >

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-04 Thread Michal Hocko
On Wed 04-11-20 16:40:21, Feng Tang wrote: > On Wed, Nov 04, 2020 at 08:58:19AM +0100, Michal Hocko wrote: > > On Wed 04-11-20 15:38:26, Feng Tang wrote: > > [...] > > > > Could you be more specific about the usecase here? Why do you need a > > > > binding to a pure movable node? > > > > > > One

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-04 Thread Feng Tang
On Wed, Nov 04, 2020 at 08:58:19AM +0100, Michal Hocko wrote: > On Wed 04-11-20 15:38:26, Feng Tang wrote: > [...] > > > Could you be more specific about the usecase here? Why do you need a > > > binding to a pure movable node? > > > > One common configuration for a platform is small size of

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-03 Thread Michal Hocko
On Wed 04-11-20 15:38:26, Feng Tang wrote: [...] > > Could you be more specific about the usecase here? Why do you need a > > binding to a pure movable node? > > One common configuration for a platform is small size of DRAM plus huge > size of PMEM (which is slower but cheaper), and my guess of

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-03 Thread Feng Tang
Hi Michal, Thanks for the prompt review! On Wed, Nov 04, 2020 at 08:13:08AM +0100, Michal Hocko wrote: > On Wed 04-11-20 14:10:08, Feng Tang wrote: > > Hi, > > > > This patchset tries to report a problem and get suggestions/review > > for the RFC fix patches. > > > > We recently got an OOM

Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-03 Thread Michal Hocko
On Wed 04-11-20 14:10:08, Feng Tang wrote: > Hi, > > This patchset tries to report a problem and get suggestions/review > for the RFC fix patches. > > We recently got an OOM report, that when a user tries to bind a docker (container) > instance to a memory node which only has movable zones, and OOM

[RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

2020-11-03 Thread Feng Tang
Hi, This patchset tries to report a problem and get suggestions/review for the RFC fix patches. We recently got an OOM report: when a user tries to bind a docker (container) instance to a memory node which only has movable zones, even OOM killing can't resolve the page allocation failure. The