Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri, Dec 07, 2018 at 03:05:28PM -0800, David Rientjes wrote: > > > Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat > > > different policy for hugepage allocations, which were allocated through > > > alloc_hugepage_vma(). For hugepage allocations, if the allocating > > > process's node is in the set of allowed nodes, allocate with > > > __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with > > > __GFP_THISNODE instead). > > > > Why is it wrong to fallback to an explicitly configured mbind mask? > > > > The new_page() case is similar to the shmem_alloc_hugepage() case. Prior > to 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into > alloc_hugepage_direct_gfpmask"), shmem_alloc_hugepage() did > alloc_pages_vma() with hugepage == true, which effected a different > allocation policy: if the node current is running on is allowed by the > policy, use __GFP_THISNODE (considering ac5b2c18911ff is reverted, which > it is in Linus's tree). > > After 89c83fb539f9, we lose that and can fallback to remote memory. Since > the discussion is on-going wrt the NUMA aspects of hugepage allocations, > it's better to have a stable 4.20 tree while that is being worked out and > likely deserves separate patches for both new_page() and > shmem_alloc_hugepage(). For the latter specifically, I assume it would be > nice to get an Acked-by by Kirill who implemented shmem_alloc_hugepage() > with hugepage == true back in 4.8 that also had the __GFP_THISNODE > behavior before the allocation policy is suddenly changed. I do not have much experience with page_alloc/compaction/reclaim paths and I don't feel that my opinion should have much weight here. Do not gate it on me. (I do follow the discussion, but I don't have anything meaningful to contribute so far.) -- Kirill A. Shutemov
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri 07-12-18 15:05:28, David Rientjes wrote: > On Fri, 7 Dec 2018, Michal Hocko wrote: [...] > > > Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat > > > different policy for hugepage allocations, which were allocated through > > > alloc_hugepage_vma(). For hugepage allocations, if the allocating > > > process's node is in the set of allowed nodes, allocate with > > > __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with > > > __GFP_THISNODE instead). > > > > Why is it wrong to fallback to an explicitly configured mbind mask? > > > > The new_page() case is similar to the shmem_alloc_hugepage() case. Prior > to 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into > alloc_hugepage_direct_gfpmask"), shmem_alloc_hugepage() did > alloc_pages_vma() with hugepage == true, which effected a different > allocation policy: if the node current is running on is allowed by the > policy, use __GFP_THISNODE (considering ac5b2c18911ff is reverted, which > it is in Linus's tree). > > After 89c83fb539f9, we lose that and can fallback to remote memory. Since > the discussion is on-going wrt the NUMA aspects of hugepage allocations, > it's better to have a stable 4.20 tree while that is being worked out and > likely deserves separate patches for both new_page() and > shmem_alloc_hugepage(). For the latter specifically, I assume it would be > nice to get an Acked-by by Kirill who implemented shmem_alloc_hugepage() > with hugepage == true back in 4.8 that also had the __GFP_THISNODE > behavior before the allocation policy is suddenly changed. This should be a part of the changelog. -- Michal Hocko SUSE Labs
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri, 7 Dec 2018, Michal Hocko wrote: > > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > > > There are a couple of issues with 89c83fb539f9 independent of its partial > > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > > allocations"): > > > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > > the same for shared vma policies, triggering the WARN_ON_ONCE() in > > policy_node(). > > Could you share a test case? > Sorry, as Vlastimil pointed out this race does not exist anymore since commit 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") since it removed the restructuring of alloc_hugepage_direct_gfpmask(). It existed prior to this commit for shared vma policies that could modify the policy between alloc_hugepage_direct_gfpmask() and alloc_pages_vma() if the policy switches to MPOL_BIND in that window. > > Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat > > different policy for hugepage allocations, which were allocated through > > alloc_hugepage_vma(). For hugepage allocations, if the allocating > > process's node is in the set of allowed nodes, allocate with > > __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with > > __GFP_THISNODE instead). > > Why is it wrong to fallback to an explicitly configured mbind mask? > The new_page() case is similar to the shmem_alloc_hugepage() case. Prior to 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"), shmem_alloc_hugepage() did alloc_pages_vma() with hugepage == true, which effected a different allocation policy: if the node current is running on is allowed by the policy, use __GFP_THISNODE (considering ac5b2c18911ff is reverted, which it is in Linus's tree). After 89c83fb539f9, we lose that and can fallback to remote memory. Since the discussion is on-going wrt the NUMA aspects of hugepage allocations, it's better to have a stable 4.20 tree while that is being worked out and likely deserves separate patches for both new_page() and shmem_alloc_hugepage(). For the latter specifically, I assume it would be nice to get an Acked-by by Kirill who implemented shmem_alloc_hugepage() with hugepage == true back in 4.8 that also had the __GFP_THISNODE behavior before the allocation policy is suddenly changed.
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri, 7 Dec 2018, Michal Hocko wrote: > > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > > > There are a couple of issues with 89c83fb539f9 independent of its partial > > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > > allocations"): > > > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > > the same for shared vma policies, triggering the WARN_ON_ONCE() in > > policy_node(). > > Could you share a test case? > Sorry, as Vlastimil pointed out this race does not exist anymore since commit 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") since it removed the restructuring of alloc_hugepage_direct_gfpmask(). It existed prior to this commit for shared vma policies that could modify the policy between alloc_hugepage_direct_gfpmask() and alloc_pages_vma() if the policy switches to MPOL_BIND in that window. > > Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat > > different policy for hugepage allocations, which were allocated through > > alloc_hugepage_vma(). For hugepage allocations, if the allocating > > process's node is in the set of allowed nodes, allocate with > > __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with > > __GFP_THISNODE instead). > > Why is it wrong to fallback to an explicitly configured mbind mask? > The new_page() case is similar to the shmem_alloc_hugepage() case. Prior to 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"), shmem_alloc_hugepage() did alloc_pages_vma() with hugepage == true, which effected a different allocation policy: if the node current is running on is allowed by the policy, use __GFP_THISNODE (considering ac5b2c18911ff is reverted, which it is in Linus's tree). After 89c83fb539f9, we lose that and can fallback to remote memory. Since the discussion is on-going wrt the NUMA aspects of hugepage allocations, it's better to have a stable 4.20 tree while that is being worked out and likely deserves separate patches for both new_page() and shmem_alloc_hugepage(). For the latter specifically, I assume it would be nice to get an Acked-by by Kirill who implemented shmem_alloc_hugepage() with hugepage == true back in 4.8 that also had the __GFP_THISNODE behavior before the allocation policy is suddenly changed.
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri, Dec 7, 2018 at 2:27 PM David Rientjes wrote: > > I noticed the race in 89c83fb539f9 ("mm, thp: consolidate THP gfp handling > into alloc_hugepage_direct_gfpmask") that is fixed by the revert, but as > you noted it didn't cleanup the second part which is the balancing act for > gfp flags between alloc_hugepage_direct_gfpmask() and alloc_pages_vma(). > Syzbot found this to trigger the WARN_ON_ONCE() you mention. Can you rewrite the commit message to explain this better rather than the misleading "race" description? Linus smime.p7s Description: S/MIME Cryptographic Signature
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri, Dec 7, 2018 at 2:27 PM David Rientjes wrote: > > I noticed the race in 89c83fb539f9 ("mm, thp: consolidate THP gfp handling > into alloc_hugepage_direct_gfpmask") that is fixed by the revert, but as > you noted it didn't cleanup the second part which is the balancing act for > gfp flags between alloc_hugepage_direct_gfpmask() and alloc_pages_vma(). > Syzbot found this to trigger the WARN_ON_ONCE() you mention. Can you rewrite the commit message to explain this better rather than the misleading "race" description? Linus smime.p7s Description: S/MIME Cryptographic Signature
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri, 7 Dec 2018, Vlastimil Babka wrote: > > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > > > There are a couple of issues with 89c83fb539f9 independent of its partial > > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > > allocations"): > > > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > > the same for shared vma policies, triggering the WARN_ON_ONCE() in > > policy_node(). > > AFAICS 2f0799a0ffc0 removed the policy check in > alloc_hugepage_direct_gfpmask() comlpetely, so it's not racy and the > warning will always trigger for a MPOL_BIND policy right now? > Yup, looks like you hit it on the head. This revert should have been done alongside 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations"), I didn't appreciate how invasive the consolidation patch was. I noticed the race in 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") that is fixed by the revert, but as you noted it didn't cleanup the second part which is the balancing act for gfp flags between alloc_hugepage_direct_gfpmask() and alloc_pages_vma(). Syzbot found this to trigger the WARN_ON_ONCE() you mention. So we certainly need this patch for 4.20 as a follow-up to 2f0799a0ffc0. It's likely better to regroup and discuss NUMA aspects of all thp allocations separately with a stable 4.20.
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Fri, 7 Dec 2018, Vlastimil Babka wrote: > > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > > > There are a couple of issues with 89c83fb539f9 independent of its partial > > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > > allocations"): > > > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > > the same for shared vma policies, triggering the WARN_ON_ONCE() in > > policy_node(). > > AFAICS 2f0799a0ffc0 removed the policy check in > alloc_hugepage_direct_gfpmask() comlpetely, so it's not racy and the > warning will always trigger for a MPOL_BIND policy right now? > Yup, looks like you hit it on the head. This revert should have been done alongside 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations"), I didn't appreciate how invasive the consolidation patch was. I noticed the race in 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") that is fixed by the revert, but as you noted it didn't cleanup the second part which is the balancing act for gfp flags between alloc_hugepage_direct_gfpmask() and alloc_pages_vma(). Syzbot found this to trigger the WARN_ON_ONCE() you mention. So we certainly need this patch for 4.20 as a follow-up to 2f0799a0ffc0. It's likely better to regroup and discuss NUMA aspects of all thp allocations separately with a stable 4.20.
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On 12/6/18 11:00 PM, David Rientjes wrote: > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > There are a couple of issues with 89c83fb539f9 independent of its partial > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > allocations"): > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > the same for shared vma policies, triggering the WARN_ON_ONCE() in > policy_node(). AFAICS 2f0799a0ffc0 removed the policy check in alloc_hugepage_direct_gfpmask() comlpetely, so it's not racy and the warning will always trigger for a MPOL_BIND policy right now?
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On 12/6/18 11:00 PM, David Rientjes wrote: > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > There are a couple of issues with 89c83fb539f9 independent of its partial > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > allocations"): > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > the same for shared vma policies, triggering the WARN_ON_ONCE() in > policy_node(). AFAICS 2f0799a0ffc0 removed the policy check in alloc_hugepage_direct_gfpmask() comlpetely, so it's not racy and the warning will always trigger for a MPOL_BIND policy right now?
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Thu 06-12-18 14:00:20, David Rientjes wrote: > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > There are a couple of issues with 89c83fb539f9 independent of its partial > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > allocations"): > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > the same for shared vma policies, triggering the WARN_ON_ONCE() in > policy_node(). Could you share a test case? > Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat > different policy for hugepage allocations, which were allocated through > alloc_hugepage_vma(). For hugepage allocations, if the allocating > process's node is in the set of allowed nodes, allocate with > __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with > __GFP_THISNODE instead). Why is it wrong to fallback to an explicitly configured mbind mask? > This was changed for shmem_alloc_hugepage() to > allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in > mm/mempolicy.c which is functionally different behavior and removes the > requirement to only allocate hugepages locally. Also why should new_page behave any differently for THP from any other pages? There is no fallback allocation for those and so the mbind could easily fail. So what is the actual rationale? > The latter should have been reverted as part of 2f0799a0ffc0 as well. > > Fully revert 89c83fb539f9 so that hugepage allocation policy is fully > restored and there is no race between alloc_hugepage_direct_gfpmask() and > alloc_pages_vma(). > > Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into > alloc_hugepage_direct_gfpmask") > Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") > Signed-off-by: David Rientjes > --- > include/linux/gfp.h | 12 > mm/huge_memory.c| 27 +-- > mm/mempolicy.c | 32 +--- > mm/shmem.c | 2 +- > 4 files changed, 51 insertions(+), 22 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -510,18 +510,22 @@ alloc_pages(gfp_t gfp_mask, unsigned int order) > } > extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, > struct vm_area_struct *vma, unsigned long addr, > - int node); > + int node, bool hugepage); > +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ > + alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true) > #else > #define alloc_pages(gfp_mask, order) \ > alloc_pages_node(numa_node_id(), gfp_mask, order) > -#define alloc_pages_vma(gfp_mask, order, vma, addr, node)\ > +#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\ > + alloc_pages(gfp_mask, order) > +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ > alloc_pages(gfp_mask, order) > #endif > #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0) > #define alloc_page_vma(gfp_mask, vma, addr) \ > - alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id()) > + alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false) > #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ > - alloc_pages_vma(gfp_mask, 0, vma, addr, node) > + alloc_pages_vma(gfp_mask, 0, vma, addr, node, false) > > extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); > extern unsigned long get_zeroed_page(gfp_t gfp_mask); > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -629,30 +629,30 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct > vm_fault *vmf, > * available > * never: never stall for any thp allocation > */ > -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct > *vma, unsigned long addr) > +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) > { > const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); > - const gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT | __GFP_THISNODE; > > /* Always do synchronous compaction */ > if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, > _hugepage_flags)) > - return GFP_TRANSHUGE | __GFP_THISNODE | > -(vma_madvised ? 0 : __GFP_NORETRY); > + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); > > /* Kick kcompactd and fail quickly */ > if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, > _hugepage_flags)) > - return gfp_mask | __GFP_KSWAPD_RECLAIM; > + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; >
Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
On Thu 06-12-18 14:00:20, David Rientjes wrote: > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. > > There are a couple of issues with 89c83fb539f9 independent of its partial > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage > allocations"): > > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be > the same for shared vma policies, triggering the WARN_ON_ONCE() in > policy_node(). Could you share a test case? > Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat > different policy for hugepage allocations, which were allocated through > alloc_hugepage_vma(). For hugepage allocations, if the allocating > process's node is in the set of allowed nodes, allocate with > __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with > __GFP_THISNODE instead). Why is it wrong to fallback to an explicitly configured mbind mask? > This was changed for shmem_alloc_hugepage() to > allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in > mm/mempolicy.c which is functionally different behavior and removes the > requirement to only allocate hugepages locally. Also why should new_page behave any differently for THP from any other pages? There is no fallback allocation for those and so the mbind could easily fail. So what is the actual rationale? > The latter should have been reverted as part of 2f0799a0ffc0 as well. > > Fully revert 89c83fb539f9 so that hugepage allocation policy is fully > restored and there is no race between alloc_hugepage_direct_gfpmask() and > alloc_pages_vma(). > > Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into > alloc_hugepage_direct_gfpmask") > Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") > Signed-off-by: David Rientjes > --- > include/linux/gfp.h | 12 > mm/huge_memory.c| 27 +-- > mm/mempolicy.c | 32 +--- > mm/shmem.c | 2 +- > 4 files changed, 51 insertions(+), 22 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -510,18 +510,22 @@ alloc_pages(gfp_t gfp_mask, unsigned int order) > } > extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, > struct vm_area_struct *vma, unsigned long addr, > - int node); > + int node, bool hugepage); > +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ > + alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true) > #else > #define alloc_pages(gfp_mask, order) \ > alloc_pages_node(numa_node_id(), gfp_mask, order) > -#define alloc_pages_vma(gfp_mask, order, vma, addr, node)\ > +#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\ > + alloc_pages(gfp_mask, order) > +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ > alloc_pages(gfp_mask, order) > #endif > #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0) > #define alloc_page_vma(gfp_mask, vma, addr) \ > - alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id()) > + alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false) > #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ > - alloc_pages_vma(gfp_mask, 0, vma, addr, node) > + alloc_pages_vma(gfp_mask, 0, vma, addr, node, false) > > extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); > extern unsigned long get_zeroed_page(gfp_t gfp_mask); > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -629,30 +629,30 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct > vm_fault *vmf, > * available > * never: never stall for any thp allocation > */ > -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct > *vma, unsigned long addr) > +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) > { > const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); > - const gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT | __GFP_THISNODE; > > /* Always do synchronous compaction */ > if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, > _hugepage_flags)) > - return GFP_TRANSHUGE | __GFP_THISNODE | > -(vma_madvised ? 0 : __GFP_NORETRY); > + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); > > /* Kick kcompactd and fail quickly */ > if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, > _hugepage_flags)) > - return gfp_mask | __GFP_KSWAPD_RECLAIM; > + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; >
[patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. There are a couple of issues with 89c83fb539f9 independent of its partial revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations"): Firstly, the interaction between alloc_hugepage_direct_gfpmask() and alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be the same for shared vma policies, triggering the WARN_ON_ONCE() in policy_node(). Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat different policy for hugepage allocations, which were allocated through alloc_hugepage_vma(). For hugepage allocations, if the allocating process's node is in the set of allowed nodes, allocate with __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with __GFP_THISNODE instead). This was changed for shmem_alloc_hugepage() to allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in mm/mempolicy.c which is functionally different behavior and removes the requirement to only allocate hugepages locally. The latter should have been reverted as part of 2f0799a0ffc0 as well. Fully revert 89c83fb539f9 so that hugepage allocation policy is fully restored and there is no race between alloc_hugepage_direct_gfpmask() and alloc_pages_vma(). Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") Signed-off-by: David Rientjes --- include/linux/gfp.h | 12 mm/huge_memory.c| 27 +-- mm/mempolicy.c | 32 +--- mm/shmem.c | 2 +- 4 files changed, 51 insertions(+), 22 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -510,18 +510,22 @@ alloc_pages(gfp_t gfp_mask, unsigned int order) } extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, struct vm_area_struct *vma, unsigned long addr, - int node); + int node, bool hugepage); +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ + alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true) #else #define alloc_pages(gfp_mask, order) \ alloc_pages_node(numa_node_id(), gfp_mask, order) -#define alloc_pages_vma(gfp_mask, order, vma, addr, node)\ +#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\ + alloc_pages(gfp_mask, order) +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ alloc_pages(gfp_mask, order) #endif #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0) #define alloc_page_vma(gfp_mask, vma, addr)\ - alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id()) + alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false) #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ - alloc_pages_vma(gfp_mask, 0, vma, addr, node) + alloc_pages_vma(gfp_mask, 0, vma, addr, node, false) extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); diff --git a/mm/huge_memory.c b/mm/huge_memory.c --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -629,30 +629,30 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, * available * never: never stall for any thp allocation */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr) +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) { const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); - const gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT | __GFP_THISNODE; /* Always do synchronous compaction */ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, _hugepage_flags)) - return GFP_TRANSHUGE | __GFP_THISNODE | - (vma_madvised ? 0 : __GFP_NORETRY); + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); /* Kick kcompactd and fail quickly */ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, _hugepage_flags)) - return gfp_mask | __GFP_KSWAPD_RECLAIM; + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; /* Synchronous compaction if madvised, otherwise kick kcompactd */ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, _hugepage_flags)) - return gfp_mask | (vma_madvised ? __GFP_DIRECT_RECLAIM : - __GFP_KSWAPD_RECLAIM); + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? __GFP_DIRECT_RECLAIM : + __GFP_KSWAPD_RECLAIM);
[patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. There are a couple of issues with 89c83fb539f9 independent of its partial revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations"): Firstly, the interaction between alloc_hugepage_direct_gfpmask() and alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND. alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be the same for shared vma policies, triggering the WARN_ON_ONCE() in policy_node(). Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat different policy for hugepage allocations, which were allocated through alloc_hugepage_vma(). For hugepage allocations, if the allocating process's node is in the set of allowed nodes, allocate with __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with __GFP_THISNODE instead). This was changed for shmem_alloc_hugepage() to allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in mm/mempolicy.c which is functionally different behavior and removes the requirement to only allocate hugepages locally. The latter should have been reverted as part of 2f0799a0ffc0 as well. Fully revert 89c83fb539f9 so that hugepage allocation policy is fully restored and there is no race between alloc_hugepage_direct_gfpmask() and alloc_pages_vma(). Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") Signed-off-by: David Rientjes --- include/linux/gfp.h | 12 mm/huge_memory.c| 27 +-- mm/mempolicy.c | 32 +--- mm/shmem.c | 2 +- 4 files changed, 51 insertions(+), 22 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -510,18 +510,22 @@ alloc_pages(gfp_t gfp_mask, unsigned int order) } extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, struct vm_area_struct *vma, unsigned long addr, - int node); + int node, bool hugepage); +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ + alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true) #else #define alloc_pages(gfp_mask, order) \ alloc_pages_node(numa_node_id(), gfp_mask, order) -#define alloc_pages_vma(gfp_mask, order, vma, addr, node)\ +#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\ + alloc_pages(gfp_mask, order) +#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ alloc_pages(gfp_mask, order) #endif #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0) #define alloc_page_vma(gfp_mask, vma, addr)\ - alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id()) + alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false) #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ - alloc_pages_vma(gfp_mask, 0, vma, addr, node) + alloc_pages_vma(gfp_mask, 0, vma, addr, node, false) extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); diff --git a/mm/huge_memory.c b/mm/huge_memory.c --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -629,30 +629,30 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, * available * never: never stall for any thp allocation */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr) +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) { const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); - const gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT | __GFP_THISNODE; /* Always do synchronous compaction */ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, _hugepage_flags)) - return GFP_TRANSHUGE | __GFP_THISNODE | - (vma_madvised ? 0 : __GFP_NORETRY); + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); /* Kick kcompactd and fail quickly */ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, _hugepage_flags)) - return gfp_mask | __GFP_KSWAPD_RECLAIM; + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; /* Synchronous compaction if madvised, otherwise kick kcompactd */ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, _hugepage_flags)) - return gfp_mask | (vma_madvised ? __GFP_DIRECT_RECLAIM : - __GFP_KSWAPD_RECLAIM); + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? __GFP_DIRECT_RECLAIM : + __GFP_KSWAPD_RECLAIM);