Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-07-31 Thread Vlastimil Babka
On 31.7.2015 23:25, David Rientjes wrote:
 On Thu, 30 Jul 2015, Vlastimil Babka wrote:
 
 diff --git a/mm/huge_memory.c b/mm/huge_memory.c
 index aa58a32..56355f2 100644
 --- a/mm/huge_memory.c
 +++ b/mm/huge_memory.c
 @@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, 
 struct mm_struct *mm,
   */
  up_read(mm-mmap_sem);
  
 -*hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER);
 +*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
  if (unlikely(!*hpage)) {
  count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
  *hpage = ERR_PTR(-ENOMEM);
 @@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm,
  
  VM_BUG_ON(address  ~HPAGE_PMD_MASK);
  
 -/* Only allocate from the target node */
 -gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) |
 -__GFP_THISNODE;
 +gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0);
  
  /* release the mmap_sem read lock. */
  new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
 
 Hmm, where is the __GFP_THISNODE enforcement in khugepaged_alloc_page() 
 that is removed in collapse_huge_page()?  I also don't see what happened 
 to the __GFP_OTHER_NODE.

Crap, I messed up with git, this hunk was supposed to be gone. Thanks for
noticing. Please apply without the collapse_huge_page hunk, or tell me to resend
once more.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-07-31 Thread David Rientjes
On Thu, 30 Jul 2015, Vlastimil Babka wrote:

 diff --git a/mm/huge_memory.c b/mm/huge_memory.c
 index aa58a32..56355f2 100644
 --- a/mm/huge_memory.c
 +++ b/mm/huge_memory.c
 @@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, 
 struct mm_struct *mm,
*/
   up_read(mm-mmap_sem);
  
 - *hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER);
 + *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
   if (unlikely(!*hpage)) {
   count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
   *hpage = ERR_PTR(-ENOMEM);
 @@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm,
  
   VM_BUG_ON(address  ~HPAGE_PMD_MASK);
  
 - /* Only allocate from the target node */
 - gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) |
 - __GFP_THISNODE;
 + gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0);
  
   /* release the mmap_sem read lock. */
   new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);

Hmm, where is the __GFP_THISNODE enforcement in khugepaged_alloc_page() 
that is removed in collapse_huge_page()?  I also don't see what happened 
to the __GFP_OTHER_NODE.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-07-30 Thread Christoph Lameter
On Thu, 30 Jul 2015, Vlastimil Babka wrote:

  NAK. This is changing slob behavior. With no node specified it must use
  alloc_pages because that obeys NUMA memory policies etc etc. It should not
  force allocation from the current node like what is happening here after
  the patch. See the code in slub.c that is similar.

 Doh, somehow I convinced myself that there's #else and alloc_pages() is only
 used for !CONFIG_NUMA so it doesn't matter. Here's a fixed version.

Acked-by: Christoph Lameter c...@linux.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-07-30 Thread Vlastimil Babka
On 07/30/2015 07:58 PM, Christoph Lameter wrote:
 On Thu, 30 Jul 2015, Vlastimil Babka wrote:
 
 --- a/mm/slob.c
 +++ b/mm/slob.c
  void *page;

 -#ifdef CONFIG_NUMA
 -if (node != NUMA_NO_NODE)
 -page = alloc_pages_exact_node(node, gfp, order);
 -else
 -#endif
 -page = alloc_pages(gfp, order);
 +page = alloc_pages_node(node, gfp, order);
 
 NAK. This is changing slob behavior. With no node specified it must use
 alloc_pages because that obeys NUMA memory policies etc etc. It should not
 force allocation from the current node like what is happening here after
 the patch. See the code in slub.c that is similar.
 
Doh, somehow I convinced myself that there's #else and alloc_pages() is only
used for !CONFIG_NUMA so it doesn't matter. Here's a fixed version.

--8--
From: Vlastimil Babka vba...@suse.cz
Date: Fri, 24 Jul 2015 15:49:47 +0200
Subject: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to
 __alloc_pages_node

The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 (page
allocator: do not check NUMA node ID when the caller knows the node is valid)
as an optimized variant of alloc_pages_node(), that doesn't fallback to current
node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily
suggest that the allocation is restricted to the given node and fails
otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is
passed among the gfp flags.

The misleading name has lead to mistakes in the past, see 5265047ac301 (mm,
thp: really limit transparent hugepage allocation to local node) and
b360edb43f8e (mm, mempolicy: migrate_to_node should only migrate to node).

Another issue with the name is that there's a family of alloc_pages_exact*()
functions where 'exact' means exact size (instead of page order), which leads
to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's
an optimized variant of alloc_pages_node() not intended for general usage.
Both functions get described in comments.

It has been also considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't duplicate
the API needlessly. The number of users would be small anyway.

Existing callers of alloc_pages_exact_node() are simply converted to call
__alloc_pages_node(), with the exception of sba_alloc_coherent() which
open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON
checks, and since the current check for nid in alloc_pages_node() uses a 'nid 
0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which
would be previously exposed. Both differences will be rectified by the next
patch.

To sum up, this patch makes no functional changes, except temporarily hiding
potentially buggy callers. Restricting the checks in alloc_pages_node() is
left for the next patch which can in turn expose more existing buggy callers.

Signed-off-by: Vlastimil Babka vba...@suse.cz
Acked-by: Johannes Weiner han...@cmpxchg.org
Cc: Mel Gorman mgor...@suse.de
Cc: David Rientjes rient...@google.com
Cc: Greg Thelen gthe...@google.com
Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Cc: Christoph Lameter c...@linux.com
Cc: Pekka Enberg penb...@kernel.org
Cc: Joonsoo Kim iamjoonsoo@lge.com
Cc: Naoya Horiguchi n-horigu...@ah.jp.nec.com
Cc: Tony Luck tony.l...@intel.com
Cc: Fenghua Yu fenghua...@intel.com
Cc: Arnd Bergmann a...@arndb.de
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Acked-by: Michael Ellerman m...@ellerman.id.au
Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Cliff Whickman c...@sgi.com
Acked-by: Robin Holt robinmh...@gmail.com
---
 arch/ia64/hp/common/sba_iommu.c   |  6 +-
 arch/ia64/kernel/uncached.c   |  2 +-
 arch/ia64/sn/pci/pci_dma.c|  2 +-
 arch/powerpc/platforms/cell/ras.c |  2 +-
 arch/x86/kvm/vmx.c|  2 +-
 drivers/misc/sgi-xp/xpc_uv.c  |  2 +-
 include/linux/gfp.h   | 23 +++
 kernel/profile.c  |  8 
 mm/filemap.c  |  2 +-
 mm/huge_memory.c  |  6 ++
 mm/hugetlb.c  |  4 ++--
 mm/memory-failure.c   |  2 +-
 mm/mempolicy.c|  4 ++--
 mm/migrate.c  |  4 ++--
 mm/page_alloc.c   |  2 --
 mm/slab.c |  2 +-
 mm/slob.c |  4 ++--
 mm/slub.c |  2 +-
 18 files changed, 39 insertions(+), 40 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common

[PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-07-30 Thread Vlastimil Babka
The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 (page
allocator: do not check NUMA node ID when the caller knows the node is valid)
as an optimized variant of alloc_pages_node(), that doesn't fallback to current
node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily
suggest that the allocation is restricted to the given node and fails
otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is
passed among the gfp flags.

The misleading name has lead to mistakes in the past, see 5265047ac301 (mm,
thp: really limit transparent hugepage allocation to local node) and
b360edb43f8e (mm, mempolicy: migrate_to_node should only migrate to node).

Another issue with the name is that there's a family of alloc_pages_exact*()
functions where 'exact' means exact size (instead of page order), which leads
to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's
an optimized variant of alloc_pages_node() not intended for general usage.
Both functions get described in comments.

It has been also considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't duplicate
the API needlessly. The number of users would be small anyway.

Existing callers of alloc_pages_exact_node() are simply converted to call
__alloc_pages_node(), with two exceptions. sba_alloc_coherent() and
slob_new_page() both open-code the check for NUMA_NO_NODE, so they are
converted to use alloc_pages_node() instead. This means they no longer perform
some VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid  0' comparison (which includes NUMA_NO_NODE),
it may hide wrong values which would be previously exposed. Both differences
will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily hiding
potentially buggy callers. Restricting the checks in alloc_pages_node() is
left for the next patch which can in turn expose more existing buggy callers.

Signed-off-by: Vlastimil Babka vba...@suse.cz
Cc: Mel Gorman mgor...@suse.de
Cc: David Rientjes rient...@google.com
Cc: Greg Thelen gthe...@google.com
Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Cc: Christoph Lameter c...@linux.com
Cc: Pekka Enberg penb...@kernel.org
Cc: Joonsoo Kim iamjoonsoo@lge.com
Cc: Naoya Horiguchi n-horigu...@ah.jp.nec.com
Cc: Tony Luck tony.l...@intel.com
Cc: Fenghua Yu fenghua...@intel.com
Cc: Arnd Bergmann a...@arndb.de
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Acked-by: Michael Ellerman m...@ellerman.id.au
Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Cliff Whickman c...@sgi.com
Acked-by: Robin Holt robinmh...@gmail.com
---
 Based on feedback from v1 and v2, The name is __alloc_pages_node() instead of
 alloc_pages_prefer_node() from v1. Two callsites were also converted to
 alloc_pages_node() instead. v2 was a RFC for linux-mm to settle on the API
 first. It tried keeping alloc_pages_exact_node() as a wrapper that adds
 __GFP_THISNODE but the consensus was to drop it.

 I'm CC'ing also maintainers of the callsites so they can verify that the
 callsites that don't pass __GFP_THISNODE are really not intended to restrict
 allocation to the given node. I went through them myself and each looked like
 it's better off if it can successfully allocate on a fallback node rather
 than fail. DavidR checked them also I think, but it's better if maintainers
 can verify that. I'm not completely sure about all the usages in sl*b due to
 multiple layers through which gfp flags are being passed.

 Patches 2 and 3 are mm-only so I don't CC everyone.

 arch/ia64/hp/common/sba_iommu.c   |  6 +-
 arch/ia64/kernel/uncached.c   |  2 +-
 arch/ia64/sn/pci/pci_dma.c|  2 +-
 arch/powerpc/platforms/cell/ras.c |  2 +-
 arch/x86/kvm/vmx.c|  2 +-
 drivers/misc/sgi-xp/xpc_uv.c  |  2 +-
 include/linux/gfp.h   | 23 +++
 kernel/profile.c  |  8 
 mm/filemap.c  |  2 +-
 mm/huge_memory.c  |  6 ++
 mm/hugetlb.c  |  4 ++--
 mm/memory-failure.c   |  2 +-
 mm/mempolicy.c|  4 ++--
 mm/migrate.c  |  4 ++--
 mm/page_alloc.c   |  2 --
 mm/slab.c |  2 +-
 mm/slob.c | 14 --
 mm/slub.c |  2 +-
 18 files changed, 41 insertions(+), 48 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 344387a..a6d6190 100644
--- 

Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-07-30 Thread Christoph Lameter
On Thu, 30 Jul 2015, Vlastimil Babka wrote:

 --- a/mm/slob.c
 +++ b/mm/slob.c
   void *page;

 -#ifdef CONFIG_NUMA
 - if (node != NUMA_NO_NODE)
 - page = alloc_pages_exact_node(node, gfp, order);
 - else
 -#endif
 - page = alloc_pages(gfp, order);
 + page = alloc_pages_node(node, gfp, order);

NAK. This is changing slob behavior. With no node specified it must use
alloc_pages because that obeys NUMA memory policies etc etc. It should not
force allocation from the current node like what is happening here after
the patch. See the code in slub.c that is similar.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev