Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
On 31.7.2015 23:25, David Rientjes wrote: On Thu, 30 Jul 2015, Vlastimil Babka wrote: diff --git a/mm/huge_memory.c b/mm/huge_memory.c index aa58a32..56355f2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm, */ up_read(mm-mmap_sem); -*hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER); +*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); if (unlikely(!*hpage)) { count_vm_event(THP_COLLAPSE_ALLOC_FAILED); *hpage = ERR_PTR(-ENOMEM); @@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm, VM_BUG_ON(address ~HPAGE_PMD_MASK); -/* Only allocate from the target node */ -gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) | -__GFP_THISNODE; +gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0); /* release the mmap_sem read lock. */ new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node); Hmm, where is the __GFP_THISNODE enforcement in khugepaged_alloc_page() that is removed in collapse_huge_page()? I also don't see what happened to the __GFP_OTHER_NODE. Crap, I messed up with git, this hunk was supposed to be gone. Thanks for noticing. Please apply without the collapse_huge_page hunk, or tell me to resend once more. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
On Thu, 30 Jul 2015, Vlastimil Babka wrote: diff --git a/mm/huge_memory.c b/mm/huge_memory.c index aa58a32..56355f2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm, */ up_read(mm-mmap_sem); - *hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER); + *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); if (unlikely(!*hpage)) { count_vm_event(THP_COLLAPSE_ALLOC_FAILED); *hpage = ERR_PTR(-ENOMEM); @@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm, VM_BUG_ON(address ~HPAGE_PMD_MASK); - /* Only allocate from the target node */ - gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) | - __GFP_THISNODE; + gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0); /* release the mmap_sem read lock. */ new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node); Hmm, where is the __GFP_THISNODE enforcement in khugepaged_alloc_page() that is removed in collapse_huge_page()? I also don't see what happened to the __GFP_OTHER_NODE. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
On Thu, 30 Jul 2015, Vlastimil Babka wrote: NAK. This is changing slob behavior. With no node specified it must use alloc_pages because that obeys NUMA memory policies etc etc. It should not force allocation from the current node like what is happening here after the patch. See the code in slub.c that is similar. Doh, somehow I convinced myself that there's #else and alloc_pages() is only used for !CONFIG_NUMA so it doesn't matter. Here's a fixed version. Acked-by: Christoph Lameter c...@linux.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
On 07/30/2015 07:58 PM, Christoph Lameter wrote: On Thu, 30 Jul 2015, Vlastimil Babka wrote: --- a/mm/slob.c +++ b/mm/slob.c void *page; -#ifdef CONFIG_NUMA -if (node != NUMA_NO_NODE) -page = alloc_pages_exact_node(node, gfp, order); -else -#endif -page = alloc_pages(gfp, order); +page = alloc_pages_node(node, gfp, order); NAK. This is changing slob behavior. With no node specified it must use alloc_pages because that obeys NUMA memory policies etc etc. It should not force allocation from the current node like what is happening here after the patch. See the code in slub.c that is similar. Doh, somehow I convinced myself that there's #else and alloc_pages() is only used for !CONFIG_NUMA so it doesn't matter. Here's a fixed version. --8-- From: Vlastimil Babka vba...@suse.cz Date: Fri, 24 Jul 2015 15:49:47 +0200 Subject: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 (page allocator: do not check NUMA node ID when the caller knows the node is valid) as an optimized variant of alloc_pages_node(), that doesn't fallback to current node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily suggest that the allocation is restricted to the given node and fails otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is passed among the gfp flags. The misleading name has lead to mistakes in the past, see 5265047ac301 (mm, thp: really limit transparent hugepage allocation to local node) and b360edb43f8e (mm, mempolicy: migrate_to_node should only migrate to node). Another issue with the name is that there's a family of alloc_pages_exact*() functions where 'exact' means exact size (instead of page order), which leads to more confusion. To prevent further mistakes, this patch effectively renames alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's an optimized variant of alloc_pages_node() not intended for general usage. Both functions get described in comments. It has been also considered to really provide a convenience function for allocations restricted to a node, but the major opinion seems to be that __GFP_THISNODE already provides that functionality and we shouldn't duplicate the API needlessly. The number of users would be small anyway. Existing callers of alloc_pages_exact_node() are simply converted to call __alloc_pages_node(), with the exception of sba_alloc_coherent() which open-codes the check for NUMA_NO_NODE, so it is converted to use alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON checks, and since the current check for nid in alloc_pages_node() uses a 'nid 0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which would be previously exposed. Both differences will be rectified by the next patch. To sum up, this patch makes no functional changes, except temporarily hiding potentially buggy callers. Restricting the checks in alloc_pages_node() is left for the next patch which can in turn expose more existing buggy callers. Signed-off-by: Vlastimil Babka vba...@suse.cz Acked-by: Johannes Weiner han...@cmpxchg.org Cc: Mel Gorman mgor...@suse.de Cc: David Rientjes rient...@google.com Cc: Greg Thelen gthe...@google.com Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Cc: Christoph Lameter c...@linux.com Cc: Pekka Enberg penb...@kernel.org Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Naoya Horiguchi n-horigu...@ah.jp.nec.com Cc: Tony Luck tony.l...@intel.com Cc: Fenghua Yu fenghua...@intel.com Cc: Arnd Bergmann a...@arndb.de Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: Paul Mackerras pau...@samba.org Acked-by: Michael Ellerman m...@ellerman.id.au Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Cliff Whickman c...@sgi.com Acked-by: Robin Holt robinmh...@gmail.com --- arch/ia64/hp/common/sba_iommu.c | 6 +- arch/ia64/kernel/uncached.c | 2 +- arch/ia64/sn/pci/pci_dma.c| 2 +- arch/powerpc/platforms/cell/ras.c | 2 +- arch/x86/kvm/vmx.c| 2 +- drivers/misc/sgi-xp/xpc_uv.c | 2 +- include/linux/gfp.h | 23 +++ kernel/profile.c | 8 mm/filemap.c | 2 +- mm/huge_memory.c | 6 ++ mm/hugetlb.c | 4 ++-- mm/memory-failure.c | 2 +- mm/mempolicy.c| 4 ++-- mm/migrate.c | 4 ++-- mm/page_alloc.c | 2 -- mm/slab.c | 2 +- mm/slob.c | 4 ++-- mm/slub.c | 2 +- 18 files changed, 39 insertions(+), 40 deletions(-) diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common
[PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 (page allocator: do not check NUMA node ID when the caller knows the node is valid) as an optimized variant of alloc_pages_node(), that doesn't fallback to current node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily suggest that the allocation is restricted to the given node and fails otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is passed among the gfp flags. The misleading name has lead to mistakes in the past, see 5265047ac301 (mm, thp: really limit transparent hugepage allocation to local node) and b360edb43f8e (mm, mempolicy: migrate_to_node should only migrate to node). Another issue with the name is that there's a family of alloc_pages_exact*() functions where 'exact' means exact size (instead of page order), which leads to more confusion. To prevent further mistakes, this patch effectively renames alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's an optimized variant of alloc_pages_node() not intended for general usage. Both functions get described in comments. It has been also considered to really provide a convenience function for allocations restricted to a node, but the major opinion seems to be that __GFP_THISNODE already provides that functionality and we shouldn't duplicate the API needlessly. The number of users would be small anyway. Existing callers of alloc_pages_exact_node() are simply converted to call __alloc_pages_node(), with two exceptions. sba_alloc_coherent() and slob_new_page() both open-code the check for NUMA_NO_NODE, so they are converted to use alloc_pages_node() instead. This means they no longer perform some VM_BUG_ON checks, and since the current check for nid in alloc_pages_node() uses a 'nid 0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which would be previously exposed. Both differences will be rectified by the next patch. To sum up, this patch makes no functional changes, except temporarily hiding potentially buggy callers. Restricting the checks in alloc_pages_node() is left for the next patch which can in turn expose more existing buggy callers. Signed-off-by: Vlastimil Babka vba...@suse.cz Cc: Mel Gorman mgor...@suse.de Cc: David Rientjes rient...@google.com Cc: Greg Thelen gthe...@google.com Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Cc: Christoph Lameter c...@linux.com Cc: Pekka Enberg penb...@kernel.org Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Naoya Horiguchi n-horigu...@ah.jp.nec.com Cc: Tony Luck tony.l...@intel.com Cc: Fenghua Yu fenghua...@intel.com Cc: Arnd Bergmann a...@arndb.de Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: Paul Mackerras pau...@samba.org Acked-by: Michael Ellerman m...@ellerman.id.au Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Cliff Whickman c...@sgi.com Acked-by: Robin Holt robinmh...@gmail.com --- Based on feedback from v1 and v2, The name is __alloc_pages_node() instead of alloc_pages_prefer_node() from v1. Two callsites were also converted to alloc_pages_node() instead. v2 was a RFC for linux-mm to settle on the API first. It tried keeping alloc_pages_exact_node() as a wrapper that adds __GFP_THISNODE but the consensus was to drop it. I'm CC'ing also maintainers of the callsites so they can verify that the callsites that don't pass __GFP_THISNODE are really not intended to restrict allocation to the given node. I went through them myself and each looked like it's better off if it can successfully allocate on a fallback node rather than fail. DavidR checked them also I think, but it's better if maintainers can verify that. I'm not completely sure about all the usages in sl*b due to multiple layers through which gfp flags are being passed. Patches 2 and 3 are mm-only so I don't CC everyone. arch/ia64/hp/common/sba_iommu.c | 6 +- arch/ia64/kernel/uncached.c | 2 +- arch/ia64/sn/pci/pci_dma.c| 2 +- arch/powerpc/platforms/cell/ras.c | 2 +- arch/x86/kvm/vmx.c| 2 +- drivers/misc/sgi-xp/xpc_uv.c | 2 +- include/linux/gfp.h | 23 +++ kernel/profile.c | 8 mm/filemap.c | 2 +- mm/huge_memory.c | 6 ++ mm/hugetlb.c | 4 ++-- mm/memory-failure.c | 2 +- mm/mempolicy.c| 4 ++-- mm/migrate.c | 4 ++-- mm/page_alloc.c | 2 -- mm/slab.c | 2 +- mm/slob.c | 14 -- mm/slub.c | 2 +- 18 files changed, 41 insertions(+), 48 deletions(-) diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index 344387a..a6d6190 100644 ---
Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
On Thu, 30 Jul 2015, Vlastimil Babka wrote: --- a/mm/slob.c +++ b/mm/slob.c void *page; -#ifdef CONFIG_NUMA - if (node != NUMA_NO_NODE) - page = alloc_pages_exact_node(node, gfp, order); - else -#endif - page = alloc_pages(gfp, order); + page = alloc_pages_node(node, gfp, order); NAK. This is changing slob behavior. With no node specified it must use alloc_pages because that obeys NUMA memory policies etc etc. It should not force allocation from the current node like what is happening here after the patch. See the code in slub.c that is similar. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev