Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus

2020-03-16 Thread Joonsoo Kim
2020년 3월 13일 (금) 오후 8:38, Vlastimil Babka 님이 작성:
>
> On 3/13/20 12:04 PM, Srikar Dronamraju wrote:
> >> I've lost all memory of it. :)
> >> Anyway, how about this?
> >>
> >> 1. make node_present_pages() safer
> >> static inline unsigned long node_present_pages(int nid)
> >> {
> >>	if (!node_online(nid))
> >>		return 0;
> >>	return NODE_DATA(nid)->node_present_pages;
> >> }
> >>
> >
> > Yes this would help.
>
> Looks good, yeah.
>
> >> 2. make node_to_mem_node() safer for all cases
> >> In ppc arch's mem_topology_setup(void)
> >> for_each_present_cpu(cpu) {
> >>  numa_setup_cpu(cpu);
> >>  mem_node = node_to_mem_node(numa_mem_id());
> >>  if (!node_present_pages(mem_node)) {
> >>   _node_numa_mem_[numa_mem_id()] = first_online_node;
> >>  }
> >> }
> >>
> >
> > But here, as discussed above, we miss the case of possible but not present
> > nodes.
> > For such nodes, the above change may not update anything, so they would
> > still have 0. And node 0 can be only possible but not present.

Oops, I didn't read the full thread, so I missed that case.

> So is there other way to do the setup so that node_to_mem_node() returns an
> online+present node when called for any possible node?

Two changes seem to be sufficient.

1. Initialize every node's _node_numa_mem_[] to first_online_node in
mem_topology_setup() (see the sketch after the diff below).
2. Replace the node with an online+present node for _node_numa_mem_[]
in set_cpu_numa_mem().

 static inline void set_cpu_numa_mem(int cpu, int node)
 {
per_cpu(_numa_mem_, cpu) = node;
+   if (!node_present_pages(node))
+   node = first_online_node;
_node_numa_mem_[cpu_to_node(cpu)] = node;
 }
 #endif
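For the first change, a minimal sketch of the initialization (hedged: this
assumes it runs early in ppc's mem_topology_setup(), and that the generic
_node_numa_mem_[] array from CONFIG_HAVE_MEMORYLESS_NODES is reachable there;
it is an illustration of the idea, not a tested patch):

	int nid;

	/* Give every possible node a safe fallback up front, so that
	 * node_to_mem_node() never returns a node without present memory. */
	for_each_node(nid)
		_node_numa_mem_[nid] = first_online_node;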

With these two changes, we can safely call node_to_mem_node() anywhere.

Thanks.


Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus

2020-03-13 Thread Joonsoo Kim
2020년 3월 13일 (금) 오전 1:42, Vlastimil Babka 님이 작성:
>
> On 3/12/20 5:13 PM, Srikar Dronamraju wrote:
> > * Vlastimil Babka  [2020-03-12 14:51:38]:
> >
> >> > * Vlastimil Babka  [2020-03-12 10:30:50]:
> >> >
> >> >> On 3/12/20 9:23 AM, Sachin Sant wrote:
> >> >> >> On 12-Mar-2020, at 10:57 AM, Srikar Dronamraju 
> >> >> >>  wrote:
> >> >> >> * Michal Hocko  [2020-03-11 12:57:35]:
> >> >> >>> On Wed 11-03-20 16:32:35, Srikar Dronamraju wrote:
> >> >>  To ensure a cpuless, memoryless dummy node is not online, powerpc 
> >> >>  need
> >> >>  to make sure all possible but not present cpu_to_node are set to a
> >> >>  proper node.
> >> >> >>>
> >> >> >>> Just curious, is this somehow related to
> >> >> >>> http://lkml.kernel.org/r/20200227182650.gg3...@dhcp22.suse.cz?
> >> >> >>>
> >> >> >>
> >> >> >> The issue I am trying to fix is a known issue in Powerpc for many
> >> >> >> years. So this is surely not a problem introduced by a75056fc1e7c
> >> >> >> ("mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node").
> >> >> >>
> >> >
> >> > While I am not an expert in the slub area, I looked at the patch
> >> > a75056fc1e7c and had some thoughts on why this could be causing this 
> >> > issue.
> >> >
> >> > On the system where the crash happens, the possible number of nodes is
> >> > much greater than the number of onlined nodes. The pgdat (NODE_DATA) is
> >> > only available for onlined nodes.
> >> >
> >> > With a75056fc1e7c's memcg_alloc_shrinker_maps, we end up calling
> >> > kzalloc_node for all possible nodes, and in ___slab_alloc we end up
> >> > looking at node_present_pages, which is NODE_DATA(nid)->node_present_pages,
> >> > i.e. for a node whose pgdat struct is not allocated, we are trying to
> >> > dereference it.
> >>
> >> From what we saw, the pgdat does exist, the problem is that slab's 
> >> per-node data
> >> doesn't exist for a node that doesn't have present pages, as it would be a 
> >> waste
> >> of memory.
> >
> > Just to be clear
> > Before my 3 patches to fix dummy node:
> > srikar@ltc-zzci-2 /sys/devices/system/node $ cat $PWD/possible
> > 0-31
> > srikar@ltc-zzci-2 /sys/devices/system/node $ cat $PWD/online
> > 0-1
>
> OK
>
> >>
> >> Uh actually you are probably right, the NODE_DATA doesn't exist anymore? In
> >> Sachin's first report [1] we have
> >>
> >> [0.00] numa:   NODE_DATA [mem 0x8bfedc900-0x8bfee3fff]
> >> [0.00] numa: NODE_DATA(0) on node 1
> >> [0.00] numa:   NODE_DATA [mem 0x8bfed5200-0x8bfedc8ff]
> >>
> >
> > So even if pgdat would exist for nodes 0 and 1, there is no pgdat for the
> > rest 30 nodes.
>
> I see. Perhaps node_present_pages(node) is not safe in SLUB then and it should
> check online first, as you suggested.
>
> >> But in this thread, with your patches Sachin reports:
> >
> > and with my patches
> > srikar@ltc-zzci-2 /sys/devices/system/node $ cat $PWD/possible
> > 0-31
> > srikar@ltc-zzci-2 /sys/devices/system/node $ cat $PWD/online
> > 1
> >
> >>
> >> [0.00] numa:   NODE_DATA [mem 0x8bfedc900-0x8bfee3fff]
> >>
> >
> > so we only see one pgdat.
> >
> >> So I assume it's just node 1. In that case, node_present_pages is really 
> >> dangerous.
> >>
> >> [1]
> >> https://lore.kernel.org/linux-next/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com/
> >>
> >> > Also for a memoryless/cpuless node or possible but not present nodes,
> >> > node_to_mem_node(node) will still end up as node (at least on powerpc).
> >>
> >> I think that's the place where this would be best to fix.
> >>
> >
> > Maybe. I thought about it but the current set_numa_mem semantics are apt
> > for a memoryless cpu node and not for possible nodes.  We could have up to 256
> > possible nodes and only 2 nodes (1,2) with cpu and 1 node (1) with memory.
> > node_to_mem_node seems to return what is set in set_numa_mem().
> > set_numa_mem() seems to say set my numa_mem node for the current memoryless
> > node to the param passed.
> >
> > But how do we set numa_mem for all the other 253 possible nodes, which
> > probably will have 0 as default?
> >
> > Should we introduce another API such that we could update for all possible
> > nodes?
>
> If we want to rely on node_to_mem_node() to give us something safe for each
> possible node, then probably it would have to be like that, yeah.
>
> >> > I tried with this hunk below and it works.
> >> >
> >> > But I am not sure if we need to check at other places were
> >> > node_present_pages is being called.
> >>
> >> I think this seems to defeat the purpose of node_to_mem_node()? Shouldn't 
> >> it
> >> return only nodes that are online with present memory?
> >> CCing Joonsoo who seems to have introduced this in ad2c8144418c 
> >> ("topology: add
> >> support for node_to_mem_node() to determine the fallback node")
> >>
> >
> > Agree
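(For reference, the generic helpers being discussed look roughly like this in
include/linux/topology.h when CONFIG_HAVE_MEMORYLESS_NODES is set; paraphrased
from memory, not a verbatim copy:)

DECLARE_PER_CPU(int, _numa_mem_);
extern int _node_numa_mem_[MAX_NUMNODES];

/* Returns the node holding memory to be used for allocations on @node. */
static inline int node_to_mem_node(int node)
{
	return _node_numa_mem_[node];
}

static inline void set_cpu_numa_mem(int cpu, int node)
{
	per_cpu(_numa_mem_, cpu) = node;
	_node_numa_mem_[cpu_to_node(cpu)] = node;
}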

I've lost all memory of it. :)
Anyway, how about this?

1. make node_present_pages() safer
static inline node_present_pages(nid)
{
if 

Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-13 Thread Joonsoo Kim
2018-07-12 16:15 GMT+09:00 Christoph Hellwig :
> On Thu, Jul 12, 2018 at 11:48:47AM +0900, Joonsoo Kim wrote:
>> One of the existing users is the general DMA layer, and it takes gfp flags
>> provided by the caller. I haven't checked all the DMA allocation sites, but
>> how can you be sure that none of them try to use anything other
>> than GFP_KERNEL [|__GFP_NOWARN]?
>
> They use a few other things still like __GFP_COMP, __GFP_DMA or
> GFP_HUGEPAGE.  But all these are bogus as we have various implementations
> that can't respect them.  I plan to get rid of the gfp_t argument
> in the dma_map_ops alloc method in a few merge windows because of that,
> but it needs further implementation consolidation first.

Okay. If those are the only such flags, this change would be fine.

As a reminder of why this gfp flag was introduced in cma_alloc() in the
first place, see the following link.

https://marc.info/?l=linux-mm=148431452118407

Thanks.


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-11 Thread Joonsoo Kim
2018-07-11 17:54 GMT+09:00 Michal Hocko :
> On Wed 11-07-18 16:35:28, Joonsoo Kim wrote:
>> 2018-07-10 18:50 GMT+09:00 Michal Hocko :
>> > On Tue 10-07-18 16:19:32, Joonsoo Kim wrote:
>> >> Hello, Marek.
>> >>
>> >> 2018-07-09 21:19 GMT+09:00 Marek Szyprowski :
>> >> > cma_alloc() function doesn't really support gfp flags other than
>> >> > __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn 
>> >> > parameter.
>> >>
>> >> Although gfp_mask isn't used in cma_alloc() except for no_warn, it can be
>> >> used in alloc_contig_range(). For example, if the passed gfp mask lacks
>> >> __GFP_FS, compaction (isolation) would work differently. Have you
>> >> considered such a case?
>> >
>> > Does any of cma_alloc users actually care about GFP_NO{FS,IO}?
>>
>> I don't know. My guess is that cma_alloc() is used for DMA allocation, so
>> block devices would use it, too. If the fs/block subsystem initiates the
>> request for the device, it would be possible for cma_alloc() to be called
>> with such a flag. Again, I don't know much about those subsystems, so I
>> could be wrong.
>
> The patch converts existing users and none of them really tries to use
> anything other than GFP_KERNEL [|__GFP_NOWARN] so this doesn't seem to
> be the case. Should there be a new user requiring more restricted
> gfp_mask, we should carefully re-evaluate and think about how to support it.

One of the existing users is the general DMA layer, and it takes gfp flags
provided by the caller. I haven't checked all the DMA allocation sites, but
how can you be sure that none of them try to use anything other
than GFP_KERNEL [|__GFP_NOWARN]?

Thanks.


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-11 Thread Joonsoo Kim
2018-07-10 18:50 GMT+09:00 Michal Hocko :
> On Tue 10-07-18 16:19:32, Joonsoo Kim wrote:
>> Hello, Marek.
>>
>> 2018-07-09 21:19 GMT+09:00 Marek Szyprowski :
>> > cma_alloc() function doesn't really support gfp flags other than
>> > __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.
>>
>> Although gfp_mask isn't used in cma_alloc() except for no_warn, it can be
>> used in alloc_contig_range(). For example, if the passed gfp mask lacks
>> __GFP_FS, compaction (isolation) would work differently. Have you
>> considered such a case?
>
> Does any of cma_alloc users actually care about GFP_NO{FS,IO}?

I don't know. My guess is that cma_alloc() is used for DMA allocation, so
block devices would use it, too. If the fs/block subsystem initiates the
request for the device, it would be possible for cma_alloc() to be called
with such a flag. Again, I don't know much about those subsystems, so I
could be wrong.

Thanks.


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-10 Thread Joonsoo Kim
Hello, Marek.

2018-07-09 21:19 GMT+09:00 Marek Szyprowski :
> cma_alloc() function doesn't really support gfp flags other than
> __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.

Although gfp_mask isn't used in cma_alloc() except for no_warn, it can be
used in alloc_contig_range(). For example, if the passed gfp mask lacks
__GFP_FS, compaction (isolation) would work differently. Have you
considered such a case?
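(To make the concern concrete: before this patch, the caller-supplied mask
reached the isolation/compaction path roughly like this; paraphrased from
mm/cma.c of that time, not a verbatim quote.)

struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
		       gfp_t gfp_mask)
{
	...
	/* gfp_mask (e.g. one lacking __GFP_FS) influences how
	 * migration/compaction behaves during the range allocation. */
	ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp_mask);
	...
}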

Thanks.


Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-11 Thread Joonsoo Kim
On Fri, Jul 08, 2016 at 04:48:38PM -0400, Kees Cook wrote:
> On Fri, Jul 8, 2016 at 1:41 PM, Kees Cook  wrote:
> > On Fri, Jul 8, 2016 at 12:20 PM, Christoph Lameter  wrote:
> >> On Fri, 8 Jul 2016, Kees Cook wrote:
> >>
> >>> Is check_valid_pointer() making sure the pointer is within the usable
> >>> size? It seemed like it was checking that it was within the slub
> >>> object (checks against s->size, wants it above base after moving
> >>> pointer to include redzone, etc).
> >>
> >> check_valid_pointer verifies that a pointer is pointing to the start of an
> >> object. It is used to verify the internal points that SLUB used and
> >> should not be modified to do anything different.
> >
> > Yup, no worries -- I won't touch it. :) I just wanted to verify my
> > understanding.
> >
> > And after playing a bit more, I see that the only thing to the left is
> > padding and redzone. SLUB layout, from what I saw:
> >
> > offset: what's there
> > ---
> > start: padding, redzone
> > red_left_pad: object itself
> > inuse: rest of metadata
> > size: start of next slub object
> >
> > (and object_size == inuse - red_left_pad)
> >
> > i.e. a pointer must be between red_left_pad and inuse, which is the
> > same as pointer - ref_left_pad being less than object_size.
> >
> > So, as found already, the position in the usercopy check needs to be
> > bumped down by red_left_pad, which is what Michael's fix does, so I'll
> > include it in the next version.
> 
> Actually, after some offline chats, I think this is better, since it
> makes sure the ptr doesn't end up somewhere weird before we start the
> calculations. This leaves the pointer as-is, but explicitly handles
> the redzone on the offset instead, with no wrapping, etc:
> 
> /* Find offset within object. */
> offset = (ptr - page_address(page)) % s->size;
> 
> +   /* Adjust for redzone and reject if within the redzone. */
> +   if (s->flags & SLAB_RED_ZONE) {
> +   if (offset < s->red_left_pad)
> +   return s->name;
> +   offset -= s->red_left_pad;
> +   }
> +
> /* Allow address range falling entirely within object size. */
> if (offset <= s->object_size && n <= s->object_size - offset)
> return NULL;
> 

As Christoph said, please use slab_ksize() rather than
s->object_size.
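(For illustration only, the check in __check_heap_object() would then become
something like the following; a sketch of the suggestion, not the code as
merged:)

	/* Allow address range falling entirely within the usable object size. */
	if (offset <= slab_ksize(s) && n <= slab_ksize(s) - offset)
		return NULL;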

Otherwise, looks good to me.

Thanks.


Re: [PATCH v3] powerpc/mm: fix undefined reference to `.__kernel_map_pages' on FSL PPC64

2015-01-27 Thread Joonsoo Kim
2015-01-28 10:01 GMT+09:00 Michael Ellerman m...@ellerman.id.au:
 On Mon, 2015-01-26 at 13:22 -0600, Kim Phillips wrote:
 arch/powerpc has __kernel_map_pages implementations in mm/pgtable_32.c, and
 mm/hash_utils_64.c, of which the former is built for PPC32, and the latter
 for PPC64 machines with PPC_STD_MMU.  Fix arch/powerpc/Kconfig to not select
 ARCH_SUPPORTS_DEBUG_PAGEALLOC when CONFIG_PPC_STD_MMU_64 isn't defined,
 i.e., for 64-bit book3e builds to use the generic __kernel_map_pages()
 in mm/debug-pagealloc.c.

   LD  init/built-in.o
 mm/built-in.o: In function `kernel_map_pages':
 include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
 include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
 include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
 Makefile:925: recipe for target 'vmlinux' failed
 make: *** [vmlinux] Error 1

 Signed-off-by: Kim Phillips kim.phill...@freescale.com
 ---
 v3:
 - fix wording for hash_utils_64.c implementation pointed out by
 Michael Ellerman
 - changed designation from 'mm:' to 'powerpc/mm:', as I think this
 now belongs in ppc-land

 v2:
 - corrected SUPPORTS_DEBUG_PAGEALLOC selection to enable
 non-STD_MMU_64 builds to use the generic __kernel_map_pages().

 I'd be happy to take this through the powerpc tree for 3.20, but for this:

 depends on:
 From: Joonsoo Kim iamjoonsoo@lge.com
 Date: Thu, 22 Jan 2015 10:28:58 +0900
 Subject: [PATCH] mm/debug_pagealloc: fix build failure on ppc and some other 
 archs

 I don't have that patch in my tree.

 But in what way does this patch depend on that one?

 It looks to me like it'd be safe to take this on its own, or am I wrong?


Hello,

These two patches are now merged into Andrew's tree.

Thanks.

Re: [PATCH 2/2] mm: fix undefined reference to `.kernel_map_pages' on PPC builds

2015-01-26 Thread Joonsoo Kim
On Thu, Jan 22, 2015 at 10:45:51AM +0900, Joonsoo Kim wrote:
 On Wed, Jan 21, 2015 at 09:57:59PM +0900, Akinobu Mita wrote:
  2015-01-21 9:07 GMT+09:00 Andrew Morton a...@linux-foundation.org:
   On Tue, 20 Jan 2015 15:01:50 -0800 j...@joshtriplett.org wrote:
  
   On Tue, Jan 20, 2015 at 02:02:00PM -0600, Kim Phillips wrote:
It's possible to configure DEBUG_PAGEALLOC without PAGE_POISONING on
ppc.  Fix building the generic kernel_map_pages() implementation in
this case:
   
  LD  init/built-in.o
mm/built-in.o: In function `free_pages_prepare':
mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
mm/built-in.o: In function `prep_new_page':
mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
mm/built-in.o: In function `map_pages':
mm/compaction.c:61: undefined reference to `.kernel_map_pages'
make: *** [vmlinux] Error 1
  
  kernel_map_pages() is static inline function since commit 031bc5743f15
  (mm/debug-pagealloc: make debug-pagealloc boottime configurable).
  
  But there is old declaration in 'arch/powerpc/include/asm/cacheflush.h'.
  Removing it or changing s/kernel_map_pages/__kernel_map_pages/ in this
  header file or something can fix this problem?
  
  The architecture which has ARCH_SUPPORTS_DEBUG_PAGEALLOC
  including PPC should not build mm/debug-pagealloc.o
 
 Yes, architectures with ARCH_SUPPORTS_DEBUG_PAGEALLOC should not build
 mm/debug-pagealloc.o. I attached a patch to remove the old declarations.
 I hope it will fix Kim's problem.
 
 --8--
 From 7cb9d1ed8a785df152cb8934e187031c8ebd1bb2 Mon Sep 17 00:00:00 2001
 From: Joonsoo Kim iamjoonsoo@lge.com
 Date: Thu, 22 Jan 2015 10:28:58 +0900
 Subject: [PATCH] mm/debug_pagealloc: fix build failure on ppc and some other
  archs
 
 Kim Phillips reported following build failure.
 
   LD  init/built-in.o
   mm/built-in.o: In function `free_pages_prepare':
   mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
   mm/built-in.o: In function `prep_new_page':
   mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
   mm/built-in.o: In function `map_pages':
   mm/compaction.c:61: undefined reference to `.kernel_map_pages'
   make: *** [vmlinux] Error 1
 
 Reason for this problem is that commit 031bc5743f15
 (mm/debug-pagealloc: make debug-pagealloc boottime configurable) forgot
 to remove old declaration of kernel_map_pages() in some architectures.
 This patch removes them to fix build failure.
 
 Reported-by: Kim Phillips kim.phill...@freescale.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Hello, Andrew.

Could you take this patch?
It is also needed to fix the build failure.

Thanks.


Re: [PATCH 2/2] mm: fix undefined reference to `.kernel_map_pages' on PPC builds

2015-01-21 Thread Joonsoo Kim
On Wed, Jan 21, 2015 at 09:57:59PM +0900, Akinobu Mita wrote:
 2015-01-21 9:07 GMT+09:00 Andrew Morton a...@linux-foundation.org:
  On Tue, 20 Jan 2015 15:01:50 -0800 j...@joshtriplett.org wrote:
 
  On Tue, Jan 20, 2015 at 02:02:00PM -0600, Kim Phillips wrote:
   It's possible to configure DEBUG_PAGEALLOC without PAGE_POISONING on
   ppc.  Fix building the generic kernel_map_pages() implementation in
   this case:
  
 LD  init/built-in.o
   mm/built-in.o: In function `free_pages_prepare':
   mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
   mm/built-in.o: In function `prep_new_page':
   mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
   mm/built-in.o: In function `map_pages':
   mm/compaction.c:61: undefined reference to `.kernel_map_pages'
   make: *** [vmlinux] Error 1
 
 kernel_map_pages() is static inline function since commit 031bc5743f15
 (mm/debug-pagealloc: make debug-pagealloc boottime configurable).
 
 But there is old declaration in 'arch/powerpc/include/asm/cacheflush.h'.
 Removing it or changing s/kernel_map_pages/__kernel_map_pages/ in this
 header file or something can fix this problem?
 
 The architecture which has ARCH_SUPPORTS_DEBUG_PAGEALLOC
 including PPC should not build mm/debug-pagealloc.o
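(For context: after commit 031bc5743f15, the generic kernel_map_pages() is a
static inline in include/linux/mm.h, roughly as below; a sketch from memory,
not a verbatim copy. A stale extern prototype of the same name in an arch
header is what leads to the unresolved `.kernel_map_pages' references.)

#ifdef CONFIG_DEBUG_PAGEALLOC
extern bool _debug_pagealloc_enabled;
extern void __kernel_map_pages(struct page *page, int numpages, int enable);

static inline bool debug_pagealloc_enabled(void)
{
	return _debug_pagealloc_enabled;
}

static inline void
kernel_map_pages(struct page *page, int numpages, int enable)
{
	if (!debug_pagealloc_enabled())
		return;

	__kernel_map_pages(page, numpages, enable);
}
#endif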

Yes, architectures with ARCH_SUPPORTS_DEBUG_PAGEALLOC should not build
mm/debug-pagealloc.o. I attach a patch below to remove the old declarations.
I hope it will fix Kim's problem.

--8--
From 7cb9d1ed8a785df152cb8934e187031c8ebd1bb2 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim iamjoonsoo@lge.com
Date: Thu, 22 Jan 2015 10:28:58 +0900
Subject: [PATCH] mm/debug_pagealloc: fix build failure on ppc and some other
 archs

Kim Phillips reported the following build failure.

  LD  init/built-in.o
  mm/built-in.o: In function `free_pages_prepare':
  mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
  mm/built-in.o: In function `prep_new_page':
  mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
  mm/built-in.o: In function `map_pages':
  mm/compaction.c:61: undefined reference to `.kernel_map_pages'
  make: *** [vmlinux] Error 1

The reason for this problem is that commit 031bc5743f15
("mm/debug-pagealloc: make debug-pagealloc boottime configurable") forgot
to remove the old declarations of kernel_map_pages() in some architectures.
This patch removes them to fix the build failure.

Reported-by: Kim Phillips kim.phill...@freescale.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
---
 arch/mn10300/include/asm/cacheflush.h  |7 ---
 arch/powerpc/include/asm/cacheflush.h  |7 ---
 arch/s390/include/asm/cacheflush.h |4 
 arch/sparc/include/asm/cacheflush_64.h |5 -
 4 files changed, 23 deletions(-)

diff --git a/arch/mn10300/include/asm/cacheflush.h 
b/arch/mn10300/include/asm/cacheflush.h
index faed902..6d6df83 100644
--- a/arch/mn10300/include/asm/cacheflush.h
+++ b/arch/mn10300/include/asm/cacheflush.h
@@ -159,13 +159,6 @@ extern void flush_icache_range(unsigned long start, 
unsigned long end);
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy(dst, src, len)
 
-/*
- * Internal debugging function
- */
-#ifdef CONFIG_DEBUG_PAGEALLOC
-extern void kernel_map_pages(struct page *page, int numpages, int enable);
-#endif
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_CACHEFLUSH_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index 5b93122..30b35ff 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -60,13 +60,6 @@ extern void flush_dcache_phys_range(unsigned long start, 
unsigned long stop);
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy(dst, src, len)
 
-
-
-#ifdef CONFIG_DEBUG_PAGEALLOC
-/* internal debugging function */
-void kernel_map_pages(struct page *page, int numpages, int enable);
-#endif
-
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_POWERPC_CACHEFLUSH_H */
diff --git a/arch/s390/include/asm/cacheflush.h 
b/arch/s390/include/asm/cacheflush.h
index 3e20383..58fae7d 100644
--- a/arch/s390/include/asm/cacheflush.h
+++ b/arch/s390/include/asm/cacheflush.h
@@ -4,10 +4,6 @@
 /* Caches aren't brain-dead on the s390. */
 #include asm-generic/cacheflush.h
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
-void kernel_map_pages(struct page *page, int numpages, int enable);
-#endif
-
 int set_memory_ro(unsigned long addr, int numpages);
 int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
diff --git a/arch/sparc/include/asm/cacheflush_64.h 
b/arch/sparc/include/asm/cacheflush_64.h
index 3896537..68513c4 100644
--- a/arch/sparc/include/asm/cacheflush_64.h
+++ b/arch/sparc/include/asm/cacheflush_64.h
@@ -74,11 +74,6 @@ void flush_ptrace_access(struct vm_area_struct *, struct 
page *,
 #define flush_cache_vmap(start, end)   do { } while (0)
 #define flush_cache_vunmap(start, end) do { } while (0

Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use

2014-08-19 Thread Joonsoo Kim
On Thu, Aug 14, 2014 at 03:03:07PM +1000, Alexey Kardashevskiy wrote:
 fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
 functional change but this is not true as it calls get_order() (which
 takes bytes) where it should have called ilog2() and the kernel stops
 on VM_BUG_ON().
 
 This replaces get_order() with order_base_2() (round-up version of ilog2).
 
 Suggested-by: Paul Mackerras pau...@samba.org
 Cc: Alexander Graf ag...@suse.de
 Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Cc: Joonsoo Kim iamjoonsoo@lge.com
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Sorry, my fault. :(
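(For illustration of the mix-up, assuming 4K pages: kvm_rma_pages =
(1 << 27) >> PAGE_SHIFT = 32768 pages. get_order() expects a size in bytes,
so get_order(32768) evaluates to 3, whereas order_base_2(32768) gives the
intended order of 15.)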

Acked-by: Joonsoo Kim iamjoonsoo@lge.com

Thanks.

Re: [PATCH] powerpc/kvm: Fix build break with CONFIG_KVM_BOOK3S_64_HV=y

2014-06-24 Thread Joonsoo Kim
On Tue, Jun 24, 2014 at 04:36:47PM +1000, Michael Ellerman wrote:
 Commit e58e263 ("PPC, KVM, CMA: use general CMA reserved area management
 framework") in next-20140624 removed arch/powerpc/kvm/book3s_hv_cma.c but
 neglected to update the Makefile, thus breaking the build.
 
 Signed-off-by: Michael Ellerman m...@ellerman.id.au
 ---
 
 Hi Andrew,
 
 This is in your akpm-current and is breaking some of the builds for
 powerpc in linux-next. Squashing this fix into the original patch would
 be best for us.
 

I sent essentially the same patch 10 minutes ago. :)

Thanks.

Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code

2014-06-24 Thread Joonsoo Kim
On Wed, Jun 18, 2014 at 01:51:44PM -0700, Andrew Morton wrote:
 On Tue, 17 Jun 2014 10:25:07 +0900 Joonsoo Kim iamjoonsoo@lge.com wrote:
 
   v2:
  - Although this patchset looks very different with v1, the end result,
  that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.
   
   This patchset is based on linux-next 20140610.
   
   Thanks for taking care of this. I will test it with my setup and if
   everything goes well, I will take it to my -next tree. If any branch
   is required for anyone to continue his works on top of those patches,
   let me know, I will also prepare it.
  
  Hello,
  
  I'm glad to hear that. :)
  But there is one concern. As you already know, I am preparing further
  patches ("Aggressively allocate the pages on CMA reserved memory"). They
  may be highly related to the MM branch and also slightly depend on these
  CMA changes. In this case, what is the best strategy for merging this
  patchset? IMHO, Andrew's tree is the more appropriate branch. If there is
  no issue with this, I am willing to develop further patches based
  on your tree.
 
 That's probably easier.  Marek, I'll merge these into -mm (and hence
 -next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
 and shall hold them pending your review/ack/test/etc, OK?

Hello, Marek.

Could you share your decision about this patchset?

Thanks.

Re: [PATCH v3 -next 4/9] DMA, CMA: support arbitrary bitmap granularity

2014-06-19 Thread Joonsoo Kim
On Wed, Jun 18, 2014 at 01:48:15PM -0700, Andrew Morton wrote:
 On Mon, 16 Jun 2014 14:40:46 +0900 Joonsoo Kim iamjoonsoo@lge.com wrote:
 
  PPC KVM's CMA area management requires arbitrary bitmap granularity,
  since they want to reserve very large memory and manage this region
  with bitmap that one bit for several pages to reduce management overheads.
  So support arbitrary bitmap granularity for following generalization.
  
  ...
 
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -38,6 +38,7 @@ struct cma {
  unsigned long   base_pfn;
  unsigned long   count;
  unsigned long   *bitmap;
  +   unsigned int order_per_bit; /* Order of pages represented by one bit */
  struct mutexlock;
   };
   
  @@ -157,9 +158,37 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
   
   static DEFINE_MUTEX(cma_mutex);
   
  +static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
  +{
  +   return (1 << (align_order >> cma->order_per_bit)) - 1;
  +}
 
 Might want a 1UL << ... here.

Okay!

 
  +static unsigned long cma_bitmap_maxno(struct cma *cma)
  +{
  +   return cma->count >> cma->order_per_bit;
  +}
  +
  +static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
  +   unsigned long pages)
  +{
  +   return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
  +}
 
 Ditto.  I'm not really sure what the compiler will do in these cases,
 but would prefer not to rely on it anyway!

Okay!
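(i.e. the helpers would then shift an unsigned long constant; a sketch of the
fixed versions, for illustration only:)

static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
{
	return (1UL << (align_order >> cma->order_per_bit)) - 1;
}

static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
					unsigned long pages)
{
	return ALIGN(pages, 1UL << cma->order_per_bit) >> cma->order_per_bit;
}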

Thanks for the fix!

Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code

2014-06-16 Thread Joonsoo Kim
On Mon, Jun 16, 2014 at 11:11:35AM +0200, Marek Szyprowski wrote:
 Hello,
 
 On 2014-06-16 07:40, Joonsoo Kim wrote:
 Currently, there are two users on CMA functionality, one is the DMA
 subsystem and the other is the KVM on powerpc. They have their own code
 to manage CMA reserved area even if they looks really similar.
 From my guess, it is caused by some needs on bitmap management. Kvm side
 wants to maintain bitmap not for 1 page, but for more size. Eventually it
 use bitmap where one bit represents 64 pages.
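(Concretely: with one bit per 64 pages, i.e. order_per_bit = 6, a 128MB
reservation of 4K pages is 32768 pages, so the bitmap shrinks from 32768
bits (4KB) to 512 bits (64 bytes); the arithmetic is only an illustration,
not taken from the patchset.)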
 
 When I implement CMA related patches, I should change those two places
 to apply my change and it seem to be painful to me. I want to change
 this situation and reduce future code management overhead through
 this patch.
 
 This change could also help developer who want to use CMA in their
 new feature development, since they can use CMA easily without
 copying & pasting this reserved area management code.
 
 v3:
- Simplify old patch 1(log format fix) and move it to the end of patchset.
- Patch 2: Pass aligned base and size to dma_contiguous_early_fixup()
- Patch 5: Add some accessor functions to pass aligned base and size to
dma_contiguous_early_fixup() function
- Patch 5: Move MAX_CMA_AREAS definition to cma.h
- Patch 6: Add CMA region zeroing to PPC KVM's CMA alloc function
- Patch 8: put 'base' ahead of 'size' in cma_declare_contiguous()
- Remaining minor fixes are noted in commit description of each one
 
 v2:
- Although this patchset looks very different with v1, the end result,
that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.
 
 This patchset is based on linux-next 20140610.
 
 Thanks for taking care of this. I will test it with my setup and if
 everything goes well, I will take it to my -next tree. If any branch
 is required for anyone to continue his works on top of those patches,
 let me know, I will also prepare it.

Hello,

I'm glad to hear that. :)
But there is one concern. As you already know, I am preparing further
patches ("Aggressively allocate the pages on CMA reserved memory"). They
may be highly related to the MM branch and also slightly depend on these
CMA changes. In this case, what is the best strategy for merging this
patchset? IMHO, Andrew's tree is the more appropriate branch. If there is
no issue with this, I am willing to develop further patches based
on your tree.

Thanks.

Re: [PATCH v3 -next 1/9] DMA, CMA: fix possible memory leak

2014-06-16 Thread Joonsoo Kim
On Mon, Jun 16, 2014 at 03:27:19PM +0900, Minchan Kim wrote:
 Hi, Joonsoo
 
 On Mon, Jun 16, 2014 at 02:40:43PM +0900, Joonsoo Kim wrote:
  We should free memory for bitmap when we find zone mis-match,
  otherwise this memory will leak.
  
  Additionally, I copy code comment from PPC KVM's CMA code to inform
  why we need to check zone mis-match.
  
  * Note
  Minchan suggested adding a stable tag, but I didn't, because I found
  this possibility during code review and, IMO, this patch isn't
  suitable for the stable tree.
 
 Nice idea to put the comment in here. Thanks Joonsoo.
 
 It seems you obey "It must fix a real bug that bothers people" from
 Documentation/stable_kernel_rules.txt, but it's a really obvious bug and it
 is hard to get a report from people, because there are few users and such a
 small memory leak is hard to detect.
 
 In my experience, Andrew preferred a stable marking for an obvious problem
 with a simple fix like this, but I'm not sure, so let's pass the decision
 to him; we will learn a lesson from it and follow the decision from now on.

Yes, I intended to pass the decision to others. :)

 
 Thanks.
 
 Acked-by: Minchan Kim minc...@kernel.org

Thanks.

Re: [PATCH v2 01/10] DMA, CMA: clean-up log message

2014-06-15 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 11:53:16AM +0200, Michal Nazarewicz wrote:
 On Thu, Jun 12 2014, Michal Nazarewicz min...@mina86.com wrote:
  I used “function(arg1, arg2, …)” at the *beginning* of functions when
  the arguments passed to the function were included in the message.  In
  all other cases I left it at just “function:” (or just no additional
  prefix).  IMO that's a reasonable strategy.
 
 At closer inspection, I realised drivers/base/dma-contiguous.c is
 Marek's code, but the above I think is still reasonable thing to do, so
 I'd rather standardise on having “function(…)” only at the beginning of
 a function.  Just my 0.02 CHF.

Hello,

Now I realize that these changes aren't needed in this patchset, so I have
simplified this patch to just remove the redundant 'CMA' prefix. Other things
can be done after merging, if needed.

Thanks.

Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region

2014-06-15 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 12:02:38PM +0200, Michal Nazarewicz wrote:
 On Thu, Jun 12 2014, Joonsoo Kim iamjoonsoo@lge.com wrote:
  ppc kvm's cma area management needs alignment constraint on
 
 I've noticed it earlier and cannot seem to get to terms with this.  It
 should IMO be PPC, KVM and CMA since those are acronyms.  But if you
 have strong feelings, it's not a big issue.

Yes, I will fix it.

 
  cma region. So support it to prepare generalization of cma area
  management functionality.
 
  Additionally, add some comments which tell us why alignment
  constraint is needed on cma region.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Acked-by: Michal Nazarewicz min...@mina86.com
 
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index 8a44c82..bc4c171 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas);
* @size: Size of the reserved area (in bytes),
* @base: Base address of the reserved area optional, use 0 for any
* @limit: End address of the reserved memory (optional, 0 for any).
  + * @alignment: Alignment for the contiguous memory area, should be
  power of 2
 
 “must be power of 2 or zero”.

Okay.

* @res_cma: Pointer to store the created cma region.
* @fixed: hint about where to place the reserved area
*
  @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas);
*/
   static int __init __dma_contiguous_reserve_area(phys_addr_t size,
  phys_addr_t base, phys_addr_t limit,
  +   phys_addr_t alignment,
  struct cma **res_cma, bool fixed)
   {
  struct cma *cma = cma_areas[cma_area_count];
  -   phys_addr_t alignment;
  int ret = 0;
   
  -   pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
  -(unsigned long)size, (unsigned long)base,
  -(unsigned long)limit);
  +   pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n",
  +   __func__, (unsigned long)size, (unsigned long)base,
  +   (unsigned long)limit, (unsigned long)alignment);
 
 Nit: Align with the rest of the arguments, i.e.:
 
 + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n",
 +  __func__, (unsigned long)size, (unsigned long)base,
 +  (unsigned long)limit, (unsigned long)alignment);

What's the difference between mine and yours?

Thanks.

Re: [PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity

2014-06-15 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 12:19:54PM +0200, Michal Nazarewicz wrote:
 On Thu, Jun 12 2014, Joonsoo Kim iamjoonsoo@lge.com wrote:
  ppc kvm's cma region management requires arbitrary bitmap granularity,
  since they want to reserve very large memory and manage this region
  with bitmap that one bit for several pages to reduce management overheads.
  So support arbitrary bitmap granularity for following generalization.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Acked-by: Michal Nazarewicz min...@mina86.com
 
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index bc4c171..9bc9340 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -38,6 +38,7 @@ struct cma {
  unsigned long   base_pfn;
  unsigned long   count;
 
 Have you considered replacing count with maxno?

No, I haven't.
I think that count is better than maxno, since it represents the number of
pages in this region.

 
  unsigned long   *bitmap;
  +   int order_per_bit; /* Order of pages represented by one bit */
 
 I'd make it unsigned.

Will fix it.

  struct mutexlock;
   };
   
  +static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int
  count)
 
 For consistency cma_clear_bitmap would make more sense I think.  On the
 other hand, you're just moving stuff around so perhaps renaming the
 function at this point is not worth it any more.

Will fix it.

  +{
  +   unsigned long bitmapno, nr_bits;
  +
  +   bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
  +   nr_bits = cma_bitmap_pages_to_bits(cma, count);
  +
  +   mutex_lock(&cma->lock);
  +   bitmap_clear(cma->bitmap, bitmapno, nr_bits);
  +   mutex_unlock(&cma->lock);
  +}
  +
   static int __init cma_activate_area(struct cma *cma)
   {
  -   int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
  +   int bitmap_maxno = cma_bitmap_maxno(cma);
  +   int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
   unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
   unsigned i = cma->count >> pageblock_order;
  struct zone *zone;
 
 bitmap_maxno is never used again, perhaps:
 
 + int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
 
 instead? Up to you.

Okay!!

Thanks.

Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs

2014-06-15 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 02:37:43PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:40PM +0900, Joonsoo Kim wrote:
  To prepare future generalization work on cma area management code,
  we need to separate core cma management codes from DMA APIs.
  We will extend these core functions to cover requirements of
  ppc kvm's cma area management functionality in following patches.
  This separation helps us not to touch DMA APIs while extending
  core functions.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index fb0cdce..8a44c82 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas);
* If @fixed is true, reserve contiguous area at exactly @base.  If false,
* reserve in range from @base to @limit.
*/
  -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
  -  phys_addr_t limit, struct cma **res_cma,
  -  bool fixed)
  +static int __init __dma_contiguous_reserve_area(phys_addr_t size,
  +   phys_addr_t base, phys_addr_t limit,
  +   struct cma **res_cma, bool fixed)
   {
  struct cma *cma = cma_areas[cma_area_count];
  phys_addr_t alignment;
  @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t 
  size, phys_addr_t base,
   
  pr_info(%s(): reserved %ld MiB at %08lx\n,
  __func__, (unsigned long)size / SZ_1M, (unsigned long)base);
  -
  -   /* Architecture specific contiguous memory fixup. */
  -   dma_contiguous_early_fixup(base, size);
  return 0;
  +
   err:
  pr_err(%s(): failed to reserve %ld MiB\n,
  __func__, (unsigned long)size / SZ_1M);
  return ret;
   }
   
  +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
  +  phys_addr_t limit, struct cma **res_cma,
  +  bool fixed)
  +{
  +   int ret;
  +
  +   ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
  +   if (ret)
  +   return ret;
  +
  +   /* Architecture specific contiguous memory fixup. */
  +   dma_contiguous_early_fixup(base, size);
 
 In the old code, base and size were aligned to 'alignment' and then passed
 into the arch fixup, but your patch changes that.
 I didn't look at what kind of side effects it may have, but I just wanted to
 confirm.

Good catch!!!
I will fix it.

  +
  +   return 0;
  +}
  +
   static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
   {
  mutex_lock(cma-lock);
  @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, 
  unsigned long pfn, int count)
* global one. Requires architecture specific dev_get_cma_area() helper
* function.
*/
  -struct page *dma_alloc_from_contiguous(struct device *dev, int count,
  +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
 unsigned int align)
   {
  unsigned long mask, pfn, pageno, start = 0;
  -   struct cma *cma = dev_get_cma_area(dev);
  struct page *page = NULL;
  int ret;
   
  if (!cma || !cma-count)
  return NULL;
   
  -   if (align > CONFIG_CMA_ALIGNMENT)
  -   align = CONFIG_CMA_ALIGNMENT;
  -
  pr_debug(%s(cma %p, count %d, align %d)\n, __func__, (void *)cma,
   count, align);
   
  @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device 
  *dev, int count,
  return page;
   }
   
 
 Please move the description in __dma_alloc_from_contiguous to here exported 
 API.
 

Okay.

  +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
  +  unsigned int align)
  +{
  +   struct cma *cma = dev_get_cma_area(dev);
  +
  +   if (align > CONFIG_CMA_ALIGNMENT)
  +   align = CONFIG_CMA_ALIGNMENT;
  +
  +   return __dma_alloc_from_contiguous(cma, count, align);
  +}
  +
   /**
* dma_release_from_contiguous() - release allocated pages
* @dev:   Pointer to device for which the pages were allocated.
  @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device 
  *dev, int count,
* It returns false when provided pages do not belong to contiguous area 
  and
* true otherwise.
*/
  -bool dma_release_from_contiguous(struct device *dev, struct page *pages,
  +static bool __dma_release_from_contiguous(struct cma *cma, struct page 
  *pages,
   int count)
   {
  -   struct cma *cma = dev_get_cma_area(dev);
  unsigned long pfn;
   
  if (!cma || !pages)
  @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, 
  struct page *pages,
   
  return true;
   }
  +
 
 Ditto.
 
Okay.

Thanks.

Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality

2014-06-15 Thread Joonsoo Kim
On Sat, Jun 14, 2014 at 03:46:44PM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  Currently, there are two users on CMA functionality, one is the DMA
  subsystem and the other is the kvm on powerpc. They have their own code
  to manage CMA reserved area even if they looks really similar.
  From my guess, it is caused by some needs on bitmap management. Kvm side
  wants to maintain bitmap not for 1 page, but for more size. Eventually it
  use bitmap where one bit represents 64 pages.
 
  When I implement CMA related patches, I should change those two places
  to apply my change and it seem to be painful to me. I want to change
  this situation and reduce future code management overhead through
  this patch.
 
  This change could also help developer who want to use CMA in their
  new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.
 
  In previous patches, we have prepared some features to generalize
  CMA reserved area management and now it's time to do it. This patch
  moves core functions to mm/cma.c and change DMA APIs to use
  these functions.
 
  There is no functional change in DMA APIs.
 
  v2: There is no big change from v1 in mm/cma.c. Mostly renaming.
 
  Acked-by: Michal Nazarewicz min...@mina86.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 
 .
 
  +
  +   mask = cma_bitmap_aligned_mask(cma, align);
  +   bitmap_maxno = cma_bitmap_maxno(cma);
  +   nr_bits = cma_bitmap_pages_to_bits(cma, count);
  +
  +   for (;;) {
  +   mutex_lock(&cma->lock);
  +   bitmapno = bitmap_find_next_zero_area(cma->bitmap,
  +   bitmap_maxno, start, nr_bits, mask);
  +   if (bitmapno >= bitmap_maxno) {
  +   mutex_unlock(&cma->lock);
  +   break;
  +   }
  +   bitmap_set(cma->bitmap, bitmapno, nr_bits);
  +   /*
  +* It's safe to drop the lock here. We've marked this region for
  +* our exclusive use. If the migration fails we will take the
  +* lock again and unmark it.
  +*/
  +   mutex_unlock(&cma->lock);
  +
  +   pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
  +   mutex_lock(&cma_mutex);
  +   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
  +   mutex_unlock(&cma_mutex);
  +   if (ret == 0) {
  +   page = pfn_to_page(pfn);
  +   break;
  +   } else if (ret != -EBUSY) {
  +   clear_cma_bitmap(cma, pfn, count);
  +   break;
  +   }
  +   
 
 
 For setting bit map we do
    bitmap_set(cma->bitmap, bitmapno, nr_bits);
 alloc_contig()..
 if (error)
 clear_cma_bitmap(cma, pfn, count);
 
 Why ?
 
 why not bitmap_clear() ?
 

Unlike your pseudo code, for setting the bitmap, we do
- grab the mutex
- bitmap_set
- release the mutex

clear_cma_bitmap() handles these things.
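(For reference, roughly what that helper does in the same patchset;
paraphrased, not a verbatim quote:)

static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
{
	unsigned long bitmapno, nr_bits;

	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
	nr_bits = cma_bitmap_pages_to_bits(cma, count);

	/* The bitmap is always updated under cma->lock. */
	mutex_lock(&cma->lock);
	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
	mutex_unlock(&cma->lock);
}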

Thanks.


Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-15 Thread Joonsoo Kim
On Sat, Jun 14, 2014 at 03:35:33PM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  Now, we have general CMA reserved area management framework,
  so use it for future maintainabilty. There is no functional change.
 
  Acked-by: Michal Nazarewicz min...@mina86.com
  Acked-by: Paolo Bonzini pbonz...@redhat.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
  diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
  b/arch/powerpc/kvm/book3s_hv_builtin.c
  index 7cde8a6..28ec226 100644
  --- a/arch/powerpc/kvm/book3s_hv_builtin.c
  +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
  @@ -16,12 +16,14 @@
   #include linux/init.h
   #include linux/memblock.h
   #include linux/sizes.h
  +#include linux/cma.h
 
   #include asm/cputable.h
   #include asm/kvm_ppc.h
   #include asm/kvm_book3s.h
 
  -#include book3s_hv_cma.h
   +#define KVM_CMA_CHUNK_ORDER	18
  +
   /*
* Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
* should be power of 2.
  @@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
    unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT;	/* 128MB */
   EXPORT_SYMBOL_GPL(kvm_rma_pages);
 
  +static struct cma *kvm_cma;
  +
   /* Work out RMLS (real mode limit selector) field value for a given RMA 
  size.
  Assumes POWER7 or PPC970. */
   static inline int lpcr_rmls(unsigned long rma_size)
  @@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
  ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
  if (!ri)
  return NULL;
  -   page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
  +   page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
  if (!page)
  goto err_out;
  atomic_set(ri-use_count, 1);
  @@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
   void kvm_release_rma(struct kvm_rma_info *ri)
   {
  if (atomic_dec_and_test(ri-use_count)) {
  -   kvm_release_cma(pfn_to_page(ri-base_pfn), kvm_rma_pages);
  +   cma_release(kvm_cma, pfn_to_page(ri-base_pfn), kvm_rma_pages);
  kfree(ri);
  }
   }
  @@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
  /* Old CPUs require HPT aligned on a multiple of its size */
  if (!cpu_has_feature(CPU_FTR_ARCH_206))
  align_pages = nr_pages;
  -   return kvm_alloc_cma(nr_pages, align_pages);
  +   return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
   }
   EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
 
   void kvm_release_hpt(struct page *page, unsigned long nr_pages)
   {
  -   kvm_release_cma(page, nr_pages);
  +   cma_release(kvm_cma, page, nr_pages);
   }
   EXPORT_SYMBOL_GPL(kvm_release_hpt);
 
  @@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void)
  align_size = HPT_ALIGN_PAGES  PAGE_SHIFT;
 
  align_size = max(kvm_rma_pages  PAGE_SHIFT, align_size);
  -   kvm_cma_declare_contiguous(selected_size, align_size);
  +   cma_declare_contiguous(selected_size, 0, 0, align_size,
  +   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, kvm_cma, false);
  }
   }
 
  diff --git a/arch/powerpc/kvm/book3s_hv_cma.c 
  b/arch/powerpc/kvm/book3s_hv_cma.c
  deleted file mode 100644
  index d9d3d85..000
  --- a/arch/powerpc/kvm/book3s_hv_cma.c
  +++ /dev/null
  @@ -1,240 +0,0 @@
  -/*
  - * Contiguous Memory Allocator for ppc KVM hash pagetable  based on CMA
  - * for DMA mapping framework
  - *
  - * Copyright IBM Corporation, 2013
  - * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  - *
  - * This program is free software; you can redistribute it and/or
  - * modify it under the terms of the GNU General Public License as
  - * published by the Free Software Foundation; either version 2 of the
  - * License or (at your optional) any later version of the license.
  - *
  - */
  -#define pr_fmt(fmt) kvm_cma:  fmt
  -
  -#ifdef CONFIG_CMA_DEBUG
  -#ifndef DEBUG
  -#  define DEBUG
  -#endif
  -#endif
  -
  -#include linux/memblock.h
  -#include linux/mutex.h
  -#include linux/sizes.h
  -#include linux/slab.h
  -
  -#include book3s_hv_cma.h
  -
  -struct kvm_cma {
  -   unsigned long   base_pfn;
  -   unsigned long   count;
  -   unsigned long   *bitmap;
  -};
  -
  -static DEFINE_MUTEX(kvm_cma_mutex);
  -static struct kvm_cma kvm_cma_area;
  -
  -/**
  - * kvm_cma_declare_contiguous() - reserve area for contiguous memory 
  handling
  - *   for kvm hash pagetable
  - * @size:  Size of the reserved memory.
  - * @alignment:  Alignment for the contiguous memory area
  - *
  - * This function reserves memory for kvm cma area. It should be
  - * called by arch code when early allocator (memblock or bootmem)
  - * is still activate.
  - */
  -long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t 
  alignment)
  -{
  -   long base_pfn;
  -   phys_addr_t addr;
  -   struct kvm_cma *cma = kvm_cma_area;
  -
  -   pr_debug(%s(size %lx)\n, __func__, (unsigned long)size);
  -
  -   if (!size

Re: [PATCH v2 00/10] CMA: generalize CMA reserved area management code

2014-06-15 Thread Joonsoo Kim
On Sat, Jun 14, 2014 at 12:55:39PM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  Currently, there are two users on CMA functionality, one is the DMA
  subsystem and the other is the kvm on powerpc. They have their own code
  to manage CMA reserved area even if they looks really similar.
  From my guess, it is caused by some needs on bitmap management. Kvm side
  wants to maintain bitmap not for 1 page, but for more size. Eventually it
  use bitmap where one bit represents 64 pages.
 
  When I implement CMA related patches, I should change those two places
  to apply my change and it seem to be painful to me. I want to change
  this situation and reduce future code management overhead through
  this patch.
 
  This change could also help developer who want to use CMA in their
  new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.
 
  v2:
Although this patchset looks very different with v1, the end result,
that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.
 
  Patch 1-5 prepare some features to cover ppc kvm's requirements.
  Patch 6-7 generalize CMA reserved area management code and change users
  to use it.
  Patch 8-10 clean-up minor things.
 
 
 I wanted to test the ppc changes and found that the patch series doesn't apply
 against v3.15 . Do you have a kernel tree which I can clone to test this
 series ?

This is based on linux-next -next-20140610.
And my tree is on following link.

https://github.com/JoonsooKim/linux/tree/cma-general-v2.0-next-20140610

But, I think I'm late, because you have already added a Tested-by tag.

Thanks.

Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-15 Thread Joonsoo Kim
On Sat, Jun 14, 2014 at 02:23:59PM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  Now, we have general CMA reserved area management framework,
  so use it for future maintainabilty. There is no functional change.
 
  Acked-by: Michal Nazarewicz min...@mina86.com
  Acked-by: Paolo Bonzini pbonz...@redhat.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Need this. We may want to keep the VM_BUG_ON by moving
 KVM_CMA_CHUNK_ORDER around.
 
 diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
 b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 index 8056107..1932e0e 100644
 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -37,8 +37,6 @@
  #include asm/ppc-opcode.h
  #include asm/cputable.h
  
 -#include book3s_hv_cma.h
 -
  /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
  #define MAX_LPID_970   63
  
 @@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 }
  
 kvm-arch.hpt_cma_alloc = 0;
  -   VM_BUG_ON(order > KVM_CMA_CHUNK_ORDER);
  page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
 if (page) {
 hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
 
 
 
 -aneesh

Okay.
So do you also want this?

@@ -131,16 +135,18 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
unsigned long align_pages = HPT_ALIGN_PAGES;
 
+   VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+
/* Old CPUs require HPT aligned on a multiple of its size */
if (!cpu_has_feature(CPU_FTR_ARCH_206))
align_pages = nr_pages;
-   return kvm_alloc_cma(nr_pages, align_pages);
+   return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }

Thanks.

[PATCH v3 -next 2/9] DMA, CMA: separate core CMA management codes from DMA APIs

2014-06-15 Thread Joonsoo Kim
To prepare future generalization work on CMA area management code,
we need to separate core CMA management codes from DMA APIs.
We will extend these core functions to cover requirements of
PPC KVM's CMA area management functionality in following patches.
This separation helps us not to touch DMA APIs while extending
core functions.

v3: move descriptions to exported APIs (Minchan)
pass aligned base and size to dma_contiguous_early_fixup() (Minchan)

Acked-by: Michal Nazarewicz min...@mina86.com
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 6467c91..9021762 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -213,26 +213,9 @@ static int __init cma_init_reserved_areas(void)
 }
 core_initcall(cma_init_reserved_areas);
 
-/**
- * dma_contiguous_reserve_area() - reserve custom contiguous area
- * @size: Size of the reserved area (in bytes),
- * @base: Base address of the reserved area optional, use 0 for any
- * @limit: End address of the reserved memory (optional, 0 for any).
- * @res_cma: Pointer to store the created cma region.
- * @fixed: hint about where to place the reserved area
- *
- * This function reserves memory from early allocator. It should be
- * called by arch specific code once the early allocator (memblock or bootmem)
- * has been activated and all other subsystems have already allocated/reserved
- * memory. This function allows to create custom reserved areas for specific
- * devices.
- *
- * If @fixed is true, reserve contiguous area at exactly @base.  If false,
- * reserve in range from @base to @limit.
- */
-int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
-  phys_addr_t limit, struct cma **res_cma,
-  bool fixed)
+static int __init __dma_contiguous_reserve_area(phys_addr_t size,
+   phys_addr_t base, phys_addr_t limit,
+   struct cma **res_cma, bool fixed)
 {
struct cma *cma = cma_areas[cma_area_count];
phys_addr_t alignment;
@@ -286,15 +269,47 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 
pr_info(CMA: reserved %ld MiB at %08lx\n, (unsigned long)size / SZ_1M,
(unsigned long)base);
-
-   /* Architecture specific contiguous memory fixup. */
-   dma_contiguous_early_fixup(base, size);
return 0;
+
 err:
pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
return ret;
 }
 
+/**
+ * dma_contiguous_reserve_area() - reserve custom contiguous area
+ * @size: Size of the reserved area (in bytes),
+ * @base: Base address of the reserved area optional, use 0 for any
+ * @limit: End address of the reserved memory (optional, 0 for any).
+ * @res_cma: Pointer to store the created cma region.
+ * @fixed: hint about where to place the reserved area
+ *
+ * This function reserves memory from early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory. This function allows to create custom reserved areas for specific
+ * devices.
+ *
+ * If @fixed is true, reserve contiguous area at exactly @base.  If false,
+ * reserve in range from @base to @limit.
+ */
+int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
+  phys_addr_t limit, struct cma **res_cma,
+  bool fixed)
+{
+   int ret;
+
+   ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
+   if (ret)
+   return ret;
+
+   /* Architecture specific contiguous memory fixup. */
+   dma_contiguous_early_fixup(PFN_PHYS((*res_cma)->base_pfn),
+   (*res_cma)->count << PAGE_SHIFT);
+
+   return 0;
+}
+
 static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 {
mutex_lock(&cma->lock);
@@ -302,31 +317,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
mutex_unlock(&cma->lock);
 }
 
-/**
- * dma_alloc_from_contiguous() - allocate pages from contiguous area
- * @dev:   Pointer to device for which the allocation is performed.
- * @count: Requested number of pages.
- * @align: Requested alignment of pages (in PAGE_SIZE order).
- *
- * This function allocates memory buffer for specified device. It uses
- * device specific contiguous memory area if available or the default
- * global one. Requires architecture specific dev_get_cma_area() helper
- * function.
- */
-struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count

[PATCH v3 -next 3/9] DMA, CMA: support alignment constraint on CMA region

2014-06-15 Thread Joonsoo Kim
PPC KVM's CMA area management needs alignment constraint on
CMA region. So support it to prepare generalization of CMA area
management functionality.

Additionally, add some comments which tell us why alignment
constraint is needed on CMA region.
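
For a concrete sense of the floor this enforces (illustrative numbers, not
taken from the patch; they assume 4 KiB pages, MAX_ORDER = 11 and
pageblock_order = 9), the sanitised minimum alignment works out as:

    min_alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)
                  = 4 KiB << max(10, 9)
                  = 4 MiB

so any caller-supplied alignment below 4 MiB is raised to 4 MiB, and base,
size and limit are then rounded to that boundary by the ALIGN() calls in the
diff below.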

v3: fix wrongly spelled word, align_order->alignment (Minchan)
clear code documentation by Minchan's comment (Minchan)

Acked-by: Michal Nazarewicz min...@mina86.com
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 9021762..5f62c28 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -32,6 +32,7 @@
 #include <linux/swap.h>
 #include <linux/mm_types.h>
 #include <linux/dma-contiguous.h>
+#include <linux/log2.h>
 
 struct cma {
unsigned long   base_pfn;
@@ -215,17 +216,16 @@ core_initcall(cma_init_reserved_areas);
 
 static int __init __dma_contiguous_reserve_area(phys_addr_t size,
phys_addr_t base, phys_addr_t limit,
+   phys_addr_t alignment,
struct cma **res_cma, bool fixed)
 {
struct cma *cma = &cma_areas[cma_area_count];
-   phys_addr_t alignment;
int ret = 0;
 
-   pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
-(unsigned long)size, (unsigned long)base,
-(unsigned long)limit);
+   pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
+   __func__, (unsigned long)size, (unsigned long)base,
+   (unsigned long)limit, (unsigned long)alignment);
 
-   /* Sanity checks */
if (cma_area_count == ARRAY_SIZE(cma_areas)) {
pr_err("Not enough slots for CMA reserved regions!\n");
return -ENOSPC;
@@ -234,8 +234,17 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size,
if (!size)
return -EINVAL;
 
-   /* Sanitise input arguments */
-   alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+   if (alignment && !is_power_of_2(alignment))
+   return -EINVAL;
+
+   /*
+* Sanitise input arguments.
+* Pages both ends in CMA area could be merged into adjacent unmovable
+* migratetype page by page allocator's buddy algorithm. In the case,
+* you couldn't get a contiguous memory, which is not what we want.
+*/
+   alignment = max(alignment,
+   (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
base = ALIGN(base, alignment);
size = ALIGN(size, alignment);
limit &= ~(alignment - 1);
@@ -299,7 +308,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 {
int ret;
 
-   ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
+   ret = __dma_contiguous_reserve_area(size, base, limit, 0,
+   res_cma, fixed);
if (ret)
return ret;
 
-- 
1.7.9.5


[PATCH v3 -next 8/9] mm, CMA: change cma_declare_contiguous() to obey coding convention

2014-06-15 Thread Joonsoo Kim
Conventionally, we put output param to the end of param list
and put the 'base' ahead of 'size', but cma_declare_contiguous()
doesn't look like that, so change it.

Additionally, move down cma_areas reference code to the position
where it is really needed.

v3: put 'base' ahead of 'size' (Minchan)

Acked-by: Michal Nazarewicz min...@mina86.com
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 3960e0b..6cf498a 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -185,8 +185,8 @@ void __init kvm_cma_reserve(void)
align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;
 
align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
-   cma_declare_contiguous(selected_size, 0, 0, align_size,
-   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
+   cma_declare_contiguous(0, selected_size, 0, align_size,
+   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, &kvm_cma);
}
 }
 
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 0411c1c..6606abd 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -165,7 +165,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 {
int ret;
 
-   ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed);
+   ret = cma_declare_contiguous(base, size, limit, 0, 0, fixed, res_cma);
if (ret)
return ret;
 
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 69d3726..32cab7a 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -15,7 +15,7 @@ extern unsigned long cma_get_size(struct cma *cma);
 extern int __init cma_declare_contiguous(phys_addr_t size,
phys_addr_t base, phys_addr_t limit,
phys_addr_t alignment, unsigned int order_per_bit,
-   struct cma **res_cma, bool fixed);
+   bool fixed, struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
 extern bool cma_release(struct cma *cma, struct page *pages, int count);
 #endif
diff --git a/mm/cma.c b/mm/cma.c
index b442a13..9961120 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -141,13 +141,13 @@ core_initcall(cma_init_reserved_areas);
 
 /**
  * cma_declare_contiguous() - reserve custom contiguous area
- * @size: Size of the reserved area (in bytes),
  * @base: Base address of the reserved area optional, use 0 for any
+ * @size: Size of the reserved area (in bytes),
  * @limit: End address of the reserved memory (optional, 0 for any).
  * @alignment: Alignment for the CMA area, should be power of 2 or zero
  * @order_per_bit: Order of pages represented by one bit on bitmap.
- * @res_cma: Pointer to store the created cma region.
  * @fixed: hint about where to place the reserved area
+ * @res_cma: Pointer to store the created cma region.
  *
  * This function reserves memory from early allocator. It should be
  * called by arch specific code once the early allocator (memblock or bootmem)
@@ -157,12 +157,12 @@ core_initcall(cma_init_reserved_areas);
  * If @fixed is true, reserve contiguous area at exactly @base.  If false,
  * reserve in range from @base to @limit.
  */
-int __init cma_declare_contiguous(phys_addr_t size,
-   phys_addr_t base, phys_addr_t limit,
+int __init cma_declare_contiguous(phys_addr_t base,
+   phys_addr_t size, phys_addr_t limit,
phys_addr_t alignment, unsigned int order_per_bit,
-   struct cma **res_cma, bool fixed)
+   bool fixed, struct cma **res_cma)
 {
-   struct cma *cma = &cma_areas[cma_area_count];
+   struct cma *cma;
int ret = 0;
 
pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
@@ -218,6 +218,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
 * Each reserved area must be initialised later, when more kernel
 * subsystems (like slab allocator) are available.
 */
+   cma = &cma_areas[cma_area_count];
cma->base_pfn = PFN_DOWN(base);
cma->count = size >> PAGE_SHIFT;
cma->order_per_bit = order_per_bit;
-- 
1.7.9.5


[PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code

2014-06-15 Thread Joonsoo Kim
Currently, there are two users on CMA functionality, one is the DMA
subsystem and the other is the KVM on powerpc. They have their own code
to manage the CMA reserved area even though they look really similar.
From my guess, it is caused by some needs on bitmap management. The KVM side
wants to maintain a bitmap not for 1 page, but for a larger size. Eventually it
uses a bitmap where one bit represents 64 pages.
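
For reference, the 64-pages-per-bit figure follows from the chunk order the
PPC KVM code uses, assuming 4 KiB pages (PAGE_SHIFT = 12); a minimal sketch
of the arithmetic:

/* Illustration only: bitmap granularity of the PPC KVM CMA user. */
#define KVM_CMA_CHUNK_ORDER	18	/* 256 KiB chunks, as in book3s_hv_builtin.c */
#define PAGE_SHIFT		12	/* assumed 4 KiB pages */

/* order_per_bit handed to the generalized CMA code: 18 - 12 = 6 */
#define KVM_CMA_ORDER_PER_BIT	(KVM_CMA_CHUNK_ORDER - PAGE_SHIFT)

/* pages covered by a single bitmap bit: 1 << 6 = 64 */
#define KVM_CMA_PAGES_PER_BIT	(1UL << KVM_CMA_ORDER_PER_BIT)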

When I implement CMA related patches, I should change those two places
to apply my change and it seem to be painful to me. I want to change
this situation and reduce future code management overhead through
this patch.

This change could also help developer who want to use CMA in their
new feature development, since they can use CMA easily without
copying & pasting this reserved area management code.

v3:
  - Simplify old patch 1(log format fix) and move it to the end of patchset.
  - Patch 2: Pass aligned base and size to dma_contiguous_early_fixup()
  - Patch 5: Add some accessor functions to pass aligned base and size to
  dma_contiguous_early_fixup() function
  - Patch 5: Move MAX_CMA_AREAS definition to cma.h
  - Patch 6: Add CMA region zeroing to PPC KVM's CMA alloc function
  - Patch 8: put 'base' ahead of 'size' in cma_declare_contiguous()
  - Remaining minor fixes are noted in commit description of each one

v2:
  - Although this patchset looks very different with v1, the end result,
  that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.

This patchset is based on linux-next 20140610.

Patch 1-4 prepare some features to cover PPC KVM's requirements.
Patch 5-6 generalize CMA reserved area management code and change users
to use it.
Patch 7-9 clean-up minor things.

Joonsoo Kim (9):
  DMA, CMA: fix possible memory leak
  DMA, CMA: separate core CMA management codes from DMA APIs
  DMA, CMA: support alignment constraint on CMA region
  DMA, CMA: support arbitrary bitmap granularity
  CMA: generalize CMA reserved area management functionality
  PPC, KVM, CMA: use general CMA reserved area management framework
  mm, CMA: clean-up CMA allocation error path
  mm, CMA: change cma_declare_contiguous() to obey coding convention
  mm, CMA: clean-up log message

 arch/arm/mm/dma-mapping.c|1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |4 +-
 arch/powerpc/kvm/book3s_hv_builtin.c |   19 +-
 arch/powerpc/kvm/book3s_hv_cma.c |  240 
 arch/powerpc/kvm/book3s_hv_cma.h |   27 ---
 drivers/base/Kconfig |   10 -
 drivers/base/dma-contiguous.c|  210 ++---
 include/linux/cma.h  |   21 +++
 include/linux/dma-contiguous.h   |   11 +-
 mm/Kconfig   |   11 ++
 mm/Makefile  |1 +
 mm/cma.c |  335 ++
 12 files changed, 397 insertions(+), 493 deletions(-)
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.c
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.h
 create mode 100644 include/linux/cma.h
 create mode 100644 mm/cma.c

-- 
1.7.9.5


[PATCH v3 -next 6/9] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-15 Thread Joonsoo Kim
Now that we have a general CMA reserved area management framework,
use it for future maintainability. There is no functional change.
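
For quick reference, the conversion in the diff below boils down to this
mapping between the old PPC KVM helpers and the generalized CMA API:

	kvm_cma_declare_contiguous(size, align)  ->  cma_declare_contiguous(size, 0, 0, align,
	                                                 KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false)
	kvm_alloc_cma(nr_pages, align_pages)     ->  cma_alloc(kvm_cma, nr_pages, get_order(align_pages))
	kvm_release_cma(page, nr_pages)          ->  cma_release(kvm_cma, page, nr_pages)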

v3: add zeroing to CMA region (Aneesh)
fix compile error (Aneesh)
move VM_BUG_ON() to kvm_alloc_hpt() in book3s_hv_builtin.c (Aneesh)

Acked-by: Michal Nazarewicz min...@mina86.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
Tested-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8056107..a41e625 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -37,8 +37,6 @@
 #include <asm/ppc-opcode.h>
 #include <asm/cputable.h>
 
-#include "book3s_hv_cma.h"
-
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
 #define MAX_LPID_970   63
 
@@ -64,10 +62,10 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
 
kvm->arch.hpt_cma_alloc = 0;
-   VM_BUG_ON(order > KVM_CMA_CHUNK_ORDER);
page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
+   memset((void *)hpt, 0, (1 << order));
kvm->arch.hpt_cma_alloc = 1;
}
 
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7cde8a6..3960e0b 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -16,12 +16,14 @@
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/sizes.h>
+#include <linux/cma.h>
 
 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 
-#include "book3s_hv_cma.h"
+#define KVM_CMA_CHUNK_ORDER	18
+
 /*
  * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
  * should be power of 2.
@@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT; /* 128MB */
 EXPORT_SYMBOL_GPL(kvm_rma_pages);
 
+static struct cma *kvm_cma;
+
 /* Work out RMLS (real mode limit selector) field value for a given RMA size.
Assumes POWER7 or PPC970. */
 static inline int lpcr_rmls(unsigned long rma_size)
@@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
if (!ri)
return NULL;
-   page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
+   page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
if (!page)
goto err_out;
atomic_set(&ri->use_count, 1);
@@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
 void kvm_release_rma(struct kvm_rma_info *ri)
 {
if (atomic_dec_and_test(&ri->use_count)) {
-   kvm_release_cma(pfn_to_page(ri->base_pfn), kvm_rma_pages);
+   cma_release(kvm_cma, pfn_to_page(ri->base_pfn), kvm_rma_pages);
kfree(ri);
}
 }
@@ -131,16 +135,18 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
unsigned long align_pages = HPT_ALIGN_PAGES;
 
+   VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+
/* Old CPUs require HPT aligned on a multiple of its size */
if (!cpu_has_feature(CPU_FTR_ARCH_206))
align_pages = nr_pages;
-   return kvm_alloc_cma(nr_pages, align_pages);
+   return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
 
 void kvm_release_hpt(struct page *page, unsigned long nr_pages)
 {
-   kvm_release_cma(page, nr_pages);
+   cma_release(kvm_cma, page, nr_pages);
 }
 EXPORT_SYMBOL_GPL(kvm_release_hpt);
 
@@ -179,7 +185,8 @@ void __init kvm_cma_reserve(void)
align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;
 
align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
-   kvm_cma_declare_contiguous(selected_size, align_size);
+   cma_declare_contiguous(selected_size, 0, 0, align_size,
+   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
}
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
deleted file mode 100644
index d9d3d85..000
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ /dev/null
@@ -1,240 +0,0 @@
-/*
- * Contiguous Memory Allocator for ppc KVM hash pagetable  based on CMA
- * for DMA mapping framework
- *
- * Copyright IBM Corporation, 2013
- * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of the
- * License or (at your optional) any later version of the license.
- *
- */
-#define pr_fmt(fmt) "kvm_cma: " fmt
-
-#ifdef CONFIG_CMA_DEBUG
-#ifndef DEBUG
-#  define DEBUG
-#endif
-#endif
-
-#include <linux/memblock.h>
-#include <linux/mutex.h>
-#include <linux/sizes.h>
-#include <linux/slab.h>

[PATCH v3 -next 1/9] DMA, CMA: fix possible memory leak

2014-06-15 Thread Joonsoo Kim
We should free memory for bitmap when we find zone mis-match,
otherwise this memory will leak.

Additionally, I copy the code comment from PPC KVM's CMA code to explain
why we need to check for a zone mismatch.

* Note
Minchan suggested to add a tag for the stable, but, I don't do it,
because I found this possibility during code-review and, IMO,
this patch isn't suitable for stable tree.

Acked-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
Reviewed-by: Michal Nazarewicz min...@mina86.com
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 83969f8..6467c91 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -176,14 +176,24 @@ static int __init cma_activate_area(struct cma *cma)
base_pfn = pfn;
for (j = pageblock_nr_pages; j; --j, pfn++) {
WARN_ON_ONCE(!pfn_valid(pfn));
+   /*
+* alloc_contig_range requires the pfn range
+* specified to be in the same zone. Make this
+* simple by forcing the entire CMA resv range
+* to be in the same zone.
+*/
if (page_zone(pfn_to_page(pfn)) != zone)
-   return -EINVAL;
+   goto err;
}
init_cma_reserved_pageblock(pfn_to_page(base_pfn));
} while (--i);
 
mutex_init(&cma->lock);
return 0;
+
+err:
+   kfree(cma->bitmap);
+   return -EINVAL;
 }
 
 static struct cma cma_areas[MAX_CMA_AREAS];
-- 
1.7.9.5


[PATCH v3 -next 7/9] mm, CMA: clean-up CMA allocation error path

2014-06-15 Thread Joonsoo Kim
We can remove one call site for clear_cma_bitmap() if we first
call it before checking the error number.

Acked-by: Minchan Kim minc...@kernel.org
Reviewed-by: Michal Nazarewicz min...@mina86.com
Reviewed-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/cma.c b/mm/cma.c
index 0cf50da..b442a13 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -285,11 +285,12 @@ struct page *cma_alloc(struct cma *cma, int count, 
unsigned int align)
if (ret == 0) {
page = pfn_to_page(pfn);
break;
-   } else if (ret != -EBUSY) {
-   cma_clear_bitmap(cma, pfn, count);
-   break;
}
+
cma_clear_bitmap(cma, pfn, count);
+   if (ret != -EBUSY)
+   break;
+
pr_debug("%s(): memory range at %p is busy, retrying\n",
 __func__, pfn_to_page(pfn));
/* try again with a bit different memory target */
-- 
1.7.9.5


[PATCH v3 -next 9/9] mm, CMA: clean-up log message

2014-06-15 Thread Joonsoo Kim
We don't need explicit 'CMA:' prefix, since we already define prefix
'cma:' in pr_fmt. So remove it.

Acked-by: Michal Nazarewicz min...@mina86.com
Reviewed-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/cma.c b/mm/cma.c
index 9961120..4b251b0 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -225,12 +225,12 @@ int __init cma_declare_contiguous(phys_addr_t base,
*res_cma = cma;
cma_area_count++;
 
-   pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M,
+   pr_info("Reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M,
(unsigned long)base);
return 0;
 
 err:
-   pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
+   pr_err("Failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
return ret;
 }
 
-- 
1.7.9.5


[PATCH v3 -next 4/9] DMA, CMA: support arbitrary bitmap granularity

2014-06-15 Thread Joonsoo Kim
PPC KVM's CMA area management requires arbitrary bitmap granularity,
since it wants to reserve very large memory and manage this region
with a bitmap where one bit stands for several pages, to reduce
management overhead. So support arbitrary bitmap granularity for the
following generalization.
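
To make the saving concrete (illustrative numbers assuming 4 KiB pages, not
taken from the patch): for a 16 GiB reservation, a one-bit-per-page bitmap
needs 4 Mi bits (512 KiB), while order_per_bit = 6 (one bit per 64 pages)
shrinks it to 64 Ki bits (8 KiB). A standalone sketch of the arithmetic:

#include <stdio.h>

int main(void)
{
	unsigned long pages = (16UL << 30) >> 12;	/* 16 GiB of 4 KiB pages = 4194304 */
	unsigned int order_per_bit = 6;			/* one bit covers 64 pages */

	printf("per-page bitmap:      %lu KiB\n", pages / 8 / 1024);			/* 512 */
	printf("order_per_bit bitmap: %lu KiB\n", (pages >> order_per_bit) / 8 / 1024);	/* 8 */
	return 0;
}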

v3: use consistent local variable name (Minchan)
use unsigned int for order_per_bit (Michal)
change clear_cma_bitmap to cma_clear_bitmap for consistency (Michal)
remove un-needed local variable, bitmap_maxno (Michal)

Acked-by: Michal Nazarewicz min...@mina86.com
Acked-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
Acked-by: Minchan Kim minc...@kernel.org
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 5f62c28..c6eeb2c 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -38,6 +38,7 @@ struct cma {
unsigned long   base_pfn;
unsigned long   count;
unsigned long   *bitmap;
+   unsigned int order_per_bit; /* Order of pages represented by one bit */
struct mutex	lock;
 };
 
@@ -157,9 +158,37 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 
 static DEFINE_MUTEX(cma_mutex);
 
+static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
+{
+   return (1 << (align_order >> cma->order_per_bit)) - 1;
+}
+
+static unsigned long cma_bitmap_maxno(struct cma *cma)
+{
+   return cma->count >> cma->order_per_bit;
+}
+
+static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
+   unsigned long pages)
+{
+   return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
+}
+
+static void cma_clear_bitmap(struct cma *cma, unsigned long pfn, int count)
+{
+   unsigned long bitmap_no, bitmap_count;
+
+   bitmap_no = (pfn - cma->base_pfn) >> cma->order_per_bit;
+   bitmap_count = cma_bitmap_pages_to_bits(cma, count);
+
+   mutex_lock(&cma->lock);
+   bitmap_clear(cma->bitmap, bitmap_no, bitmap_count);
+   mutex_unlock(&cma->lock);
+}
+
 static int __init cma_activate_area(struct cma *cma)
 {
-   int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
+   int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
unsigned i = cma->count >> pageblock_order;
struct zone *zone;
@@ -215,9 +244,9 @@ static int __init cma_init_reserved_areas(void)
 core_initcall(cma_init_reserved_areas);
 
 static int __init __dma_contiguous_reserve_area(phys_addr_t size,
-   phys_addr_t base, phys_addr_t limit,
-   phys_addr_t alignment,
-   struct cma **res_cma, bool fixed)
+   phys_addr_t base, phys_addr_t limit,
+   phys_addr_t alignment, unsigned int order_per_bit,
+   struct cma **res_cma, bool fixed)
 {
struct cma *cma = &cma_areas[cma_area_count];
int ret = 0;
@@ -249,6 +278,10 @@ static int __init 
__dma_contiguous_reserve_area(phys_addr_t size,
size = ALIGN(size, alignment);
limit = ~(alignment - 1);
 
+   /* size should be aligned with order_per_bit */
+   if (!IS_ALIGNED(size >> PAGE_SHIFT, 1 << order_per_bit))
+   return -EINVAL;
+
/* Reserve memory */
if (base && fixed) {
if (memblock_is_region_reserved(base, size) ||
@@ -273,6 +306,7 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t 
size,
 */
cma->base_pfn = PFN_DOWN(base);
cma->count = size >> PAGE_SHIFT;
+   cma->order_per_bit = order_per_bit;
*res_cma = cma;
cma_area_count++;
 
@@ -308,7 +342,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 {
int ret;
 
-   ret = __dma_contiguous_reserve_area(size, base, limit, 0,
+   ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0,
res_cma, fixed);
if (ret)
return ret;
@@ -320,17 +354,11 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
return 0;
 }
 
-static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
-{
-   mutex_lock(&cma->lock);
-   bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
-   mutex_unlock(&cma->lock);
-}
-
 static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
   unsigned int align)
 {
-   unsigned long mask, pfn, pageno, start = 0;
+   unsigned long mask, pfn, start = 0;
+   unsigned long bitmap_maxno, bitmap_no, bitmap_count;
struct page *page = NULL;
int ret;
 
@@ -343,18 +371,19 @@ static struct page *__dma_alloc_from_contiguous(struct 
cma *cma, int count,
if (!count

[PATCH v3 -next 5/9] CMA: generalize CMA reserved area management functionality

2014-06-15 Thread Joonsoo Kim
Currently, there are two users on CMA functionality, one is the DMA
subsystem and the other is the KVM on powerpc. They have their own code
to manage the CMA reserved area even though they look really similar.
From my guess, it is caused by some needs on bitmap management. The KVM side
wants to maintain a bitmap not for 1 page, but for a larger size. Eventually it
uses a bitmap where one bit represents 64 pages.

When I implement CMA related patches, I should change those two places
to apply my change and it seem to be painful to me. I want to change
this situation and reduce future code management overhead through
this patch.

This change could also help developer who want to use CMA in their
new feature development, since they can use CMA easily without
copying & pasting this reserved area management code.

In previous patches, we have prepared some features to generalize
CMA reserved area management and now it's time to do it. This patch
moves core functions to mm/cma.c and change DMA APIs to use
these functions.

There is no functional change in DMA APIs.

v2: There is no big change from v1 in mm/cma.c. Mostly renaming.
v3: remove log2.h in dma-contiguous.c (Minchan)
add some accessor functions to pass aligned base and size to
dma_contiguous_early_fixup() function
move MAX_CMA_AREAS to cma.h

Acked-by: Michal Nazarewicz min...@mina86.com
Acked-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
Acked-by: Minchan Kim minc...@kernel.org
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 4c88935..3116880 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -26,6 +26,7 @@
 #include <linux/io.h>
 #include <linux/vmalloc.h>
 #include <linux/sizes.h>
+#include <linux/cma.h>
 
 #include <asm/memory.h>
 #include <asm/highmem.h>
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 00e13ce..4eac559 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -283,16 +283,6 @@ config CMA_ALIGNMENT
 
  If unsure, leave the default value 8.
 
-config CMA_AREAS
-   int "Maximum count of the CMA device-private areas"
-   default 7
-   help
- CMA allows to create CMA areas for particular devices. This parameter
- sets the maximum number of such device private CMA areas in the
- system.
-
- If unsure, leave the default value 7.
-
 endif
 
 endmenu
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index c6eeb2c..0411c1c 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -24,25 +24,9 @@
 
 #include <linux/memblock.h>
 #include <linux/err.h>
-#include <linux/mm.h>
-#include <linux/mutex.h>
-#include <linux/page-isolation.h>
 #include <linux/sizes.h>
-#include <linux/slab.h>
-#include <linux/swap.h>
-#include <linux/mm_types.h>
 #include <linux/dma-contiguous.h>
-#include <linux/log2.h>
-
-struct cma {
-   unsigned long   base_pfn;
-   unsigned long   count;
-   unsigned long   *bitmap;
-   unsigned int order_per_bit; /* Order of pages represented by one bit */
-   struct mutex	lock;
-};
-
-struct cma *dma_contiguous_default_area;
+#include <linux/cma.h>
 
 #ifdef CONFIG_CMA_SIZE_MBYTES
 #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
@@ -50,6 +34,8 @@ struct cma *dma_contiguous_default_area;
 #define CMA_SIZE_MBYTES 0
 #endif
 
+struct cma *dma_contiguous_default_area;
+
 /*
  * Default global CMA area size can be defined in kernel's .config.
  * This is useful mainly for distro maintainers to create a kernel
@@ -156,169 +142,6 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
}
 }
 
-static DEFINE_MUTEX(cma_mutex);
-
-static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
-{
-   return (1 << (align_order >> cma->order_per_bit)) - 1;
-}
-
-static unsigned long cma_bitmap_maxno(struct cma *cma)
-{
-   return cma->count >> cma->order_per_bit;
-}
-
-static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
-   unsigned long pages)
-{
-   return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
-}
-
-static void cma_clear_bitmap(struct cma *cma, unsigned long pfn, int count)
-{
-   unsigned long bitmap_no, bitmap_count;
-
-   bitmap_no = (pfn - cma->base_pfn) >> cma->order_per_bit;
-   bitmap_count = cma_bitmap_pages_to_bits(cma, count);
-
-   mutex_lock(&cma->lock);
-   bitmap_clear(cma->bitmap, bitmap_no, bitmap_count);
-   mutex_unlock(&cma->lock);
-}
-
-static int __init cma_activate_area(struct cma *cma)
-{
-   int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
-   unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-   unsigned i = cma->count >> pageblock_order;
-   struct zone *zone;
-
-   cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
-
-   if (!cma->bitmap)
-   return -ENOMEM;
-
-   WARN_ON_ONCE(!pfn_valid(pfn

Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region

2014-06-12 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 02:52:20PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:41PM +0900, Joonsoo Kim wrote:
  ppc kvm's cma area management needs alignment constraint on
  cma region. So support it to prepare generalization of cma area
  management functionality.
  
  Additionally, add some comments which tell us why alignment
  constraint is needed on cma region.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index 8a44c82..bc4c171 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -32,6 +32,7 @@
   #include linux/swap.h
   #include linux/mm_types.h
   #include linux/dma-contiguous.h
  +#include linux/log2.h
   
   struct cma {
  unsigned long   base_pfn;
  @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas);
* @size: Size of the reserved area (in bytes),
* @base: Base address of the reserved area optional, use 0 for any
* @limit: End address of the reserved memory (optional, 0 for any).
  + * @alignment: Alignment for the contiguous memory area, should be power 
  of 2
* @res_cma: Pointer to store the created cma region.
* @fixed: hint about where to place the reserved area
*
 
 Pz, move the all description to new API function rather than internal one.

Reason I leave all description as is is that I will remove it in
following patch. I think that moving these makes patch bigger and hard
to review.

But, if it is necessary, I will do it. :)

 
  @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas);
*/
   static int __init __dma_contiguous_reserve_area(phys_addr_t size,
  phys_addr_t base, phys_addr_t limit,
  +   phys_addr_t alignment,
  struct cma **res_cma, bool fixed)
   {
  struct cma *cma = cma_areas[cma_area_count];
  -   phys_addr_t alignment;
  int ret = 0;
   
  -   pr_debug(%s(size %lx, base %08lx, limit %08lx)\n, __func__,
  -(unsigned long)size, (unsigned long)base,
  -(unsigned long)limit);
  +   pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n",
 
 Why is it called by align_order?

Oops... mistake.
I will fix it.

Thanks.


Re: [PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity

2014-06-12 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 03:06:10PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:42PM +0900, Joonsoo Kim wrote:
  ppc kvm's cma region management requires arbitrary bitmap granularity,
  since they want to reserve very large memory and manage this region
  with bitmap that one bit for several pages to reduce management overheads.
  So support arbitrary bitmap granularity for following generalization.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index bc4c171..9bc9340 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -38,6 +38,7 @@ struct cma {
  unsigned long   base_pfn;
  unsigned long   count;
  unsigned long   *bitmap;
  +   int order_per_bit; /* Order of pages represented by one bit */
 
 Hmm, I'm not sure it's good as *general* interface even though it covers
 existing usecases.
 
 It forces a cma area should be handled by same size unit. Right?
 It's really important point for this patchset's motivation so I will stop
 review and wait other opinions.

If you pass 0 for order_per_bit, you can manage the cma area at any
granularity (page units) you want. If you pass a certain number for
order_per_bit, you can allocate and release the cma area only in multiples
of that page order.

I think that this is a more general implementation than the previous versions.
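
To illustrate the point outside the kernel (a standalone sketch, not patch
code; the rounding mirrors cma_bitmap_pages_to_bits() from this series):

#include <stdio.h>

/* bits consumed in the CMA bitmap for a request of 'pages' pages */
static unsigned long pages_to_bits(unsigned long pages, unsigned int order_per_bit)
{
	unsigned long per_bit = 1UL << order_per_bit;

	return (pages + per_bit - 1) / per_bit;	/* ALIGN(pages, 1 << order) >> order */
}

int main(void)
{
	/* order_per_bit = 0: one bit per page, any count is tracked exactly */
	printf("%lu bits\n", pages_to_bits(100, 0));	/* 100 bits */

	/* order_per_bit = 6: one bit per 64 pages, counts round up to 64-page units */
	printf("%lu bits\n", pages_to_bits(100, 6));	/* 2 bits, i.e. 128 pages accounted */
	return 0;
}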

Thanks.

Re: [PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity

2014-06-12 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 04:08:11PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:42PM +0900, Joonsoo Kim wrote:
  ppc kvm's cma region management requires arbitrary bitmap granularity,
  since they want to reserve very large memory and manage this region
  with bitmap that one bit for several pages to reduce management overheads.
  So support arbitrary bitmap granularity for following generalization.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 Acked-by: Minchan Kim minc...@kernel.org
 

Thanks.

[snip...]
   /**
* dma_alloc_from_contiguous() - allocate pages from contiguous area
* @dev:   Pointer to device for which the allocation is performed.
  @@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned 
  long pfn, int count)
   static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
 unsigned int align)
   {
  -   unsigned long mask, pfn, pageno, start = 0;
  +   unsigned long mask, pfn, start = 0;
  +   unsigned long bitmap_maxno, bitmapno, nr_bits;
 
 Just Nit: bitmap_maxno, bitmap_no or something consistent.
 I know you love consistent when I read description in first patch
 in this patchset. ;-)

Yeah, I will fix it. :)

Thanks.

Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality

2014-06-12 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 04:13:11PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:43PM +0900, Joonsoo Kim wrote:
  Currently, there are two users on CMA functionality, one is the DMA
  subsystem and the other is the kvm on powerpc. They have their own code
  to manage CMA reserved area even if they looks really similar.
  From my guess, it is caused by some needs on bitmap management. Kvm side
  wants to maintain bitmap not for 1 page, but for more size. Eventually it
  use bitmap where one bit represents 64 pages.
  
  When I implement CMA related patches, I should change those two places
  to apply my change and it seem to be painful to me. I want to change
  this situation and reduce future code management overhead through
  this patch.
  
  This change could also help developer who want to use CMA in their
  new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.
  
  In previous patches, we have prepared some features to generalize
  CMA reserved area management and now it's time to do it. This patch
  moves core functions to mm/cma.c and change DMA APIs to use
  these functions.
  
  There is no functional change in DMA APIs.
  
  v2: There is no big change from v1 in mm/cma.c. Mostly renaming.
  
  Acked-by: Michal Nazarewicz min...@mina86.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Acutally, I want to remove bool return of cma_release but it's not
 a out of scope in this patchset.
 
 Acked-by: Minchan Kim minc...@kernel.org
 
  
  diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
  index 00e13ce..4eac559 100644
  --- a/drivers/base/Kconfig
  +++ b/drivers/base/Kconfig
  @@ -283,16 +283,6 @@ config CMA_ALIGNMENT
   
If unsure, leave the default value 8.
   
  -config CMA_AREAS
  -   int Maximum count of the CMA device-private areas
  -   default 7
  -   help
  - CMA allows to create CMA areas for particular devices. This parameter
  - sets the maximum number of such device private CMA areas in the
  - system.
  -
  - If unsure, leave the default value 7.
  -
   endif
   
   endmenu
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index 9bc9340..f177f73 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -24,25 +24,10 @@
   
   #include linux/memblock.h
   #include linux/err.h
  -#include linux/mm.h
  -#include linux/mutex.h
  -#include linux/page-isolation.h
   #include linux/sizes.h
  -#include linux/slab.h
  -#include linux/swap.h
  -#include linux/mm_types.h
   #include linux/dma-contiguous.h
   #include linux/log2.h
 
 Should we remain log2.h in here?
 

We should remove it. I will fix it.

Thanks.


Re: [PATCH v2 09/10] mm, cma: move output param to the end of param list

2014-06-12 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 04:19:31PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:46PM +0900, Joonsoo Kim wrote:
  Conventionally, we put output param to the end of param list.
  cma_declare_contiguous() doesn't look like that, so change it.
 
 If you says Conventionally, I'd like to suggest one more thing.
 Conventionally, we put 'base' ahead 'size' but dma_contiguous_reserve_area
 is opposite.

Okay. I will do it. :)

Thanks.


Re: [PATCH v2 10/10] mm, cma: use spinlock instead of mutex

2014-06-12 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 04:40:29PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:47PM +0900, Joonsoo Kim wrote:
  Currently, we should take the mutex for manipulating bitmap.
  This job may be really simple and short so we don't need to sleep
  if contended. So I change it to spinlock.
 
 I'm not sure it would be good always.
 Maybe you remember we discussed about similar stuff about bitmap
 searching in vmap friend internally, which was really painful
 when it was fragmented. So, at least we need number if you really want
 and I hope the number from ARM machine most popular platform for CMA
 at the moment.

Good Point!! Agreed. I will drop this one in next spin and re-submit
in separate patchset after some testing.

Thanks.

[PATCH v2 08/10] mm, cma: clean-up cma allocation error path

2014-06-11 Thread Joonsoo Kim
We can remove one call site for clear_cma_bitmap() if we first
call it before checking the error number.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/cma.c b/mm/cma.c
index 1e1b017..01a0713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -282,11 +282,12 @@ struct page *cma_alloc(struct cma *cma, int count, 
unsigned int align)
if (ret == 0) {
page = pfn_to_page(pfn);
break;
-   } else if (ret != -EBUSY) {
-   clear_cma_bitmap(cma, pfn, count);
-   break;
}
+
clear_cma_bitmap(cma, pfn, count);
+   if (ret != -EBUSY)
+   break;
+
pr_debug("%s(): memory range at %p is busy, retrying\n",
 __func__, pfn_to_page(pfn));
/* try again with a bit different memory target */
-- 
1.7.9.5


[PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region

2014-06-11 Thread Joonsoo Kim
ppc kvm's cma area management needs alignment constraint on
cma region. So support it to prepare generalization of cma area
management functionality.

Additionally, add some comments which tell us why alignment
constraint is needed on cma region.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 8a44c82..bc4c171 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -32,6 +32,7 @@
 #include <linux/swap.h>
 #include <linux/mm_types.h>
 #include <linux/dma-contiguous.h>
+#include <linux/log2.h>
 
 struct cma {
unsigned long   base_pfn;
@@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas);
  * @size: Size of the reserved area (in bytes),
  * @base: Base address of the reserved area optional, use 0 for any
  * @limit: End address of the reserved memory (optional, 0 for any).
+ * @alignment: Alignment for the contiguous memory area, should be power of 2
  * @res_cma: Pointer to store the created cma region.
  * @fixed: hint about where to place the reserved area
  *
@@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas);
  */
 static int __init __dma_contiguous_reserve_area(phys_addr_t size,
phys_addr_t base, phys_addr_t limit,
+   phys_addr_t alignment,
struct cma **res_cma, bool fixed)
 {
struct cma *cma = &cma_areas[cma_area_count];
-   phys_addr_t alignment;
int ret = 0;
 
-   pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
-(unsigned long)size, (unsigned long)base,
-(unsigned long)limit);
+   pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n",
+   __func__, (unsigned long)size, (unsigned long)base,
+   (unsigned long)limit, (unsigned long)alignment);
 
/* Sanity checks */
if (cma_area_count == ARRAY_SIZE(cma_areas)) {
@@ -253,8 +255,17 @@ static int __init 
__dma_contiguous_reserve_area(phys_addr_t size,
if (!size)
return -EINVAL;
 
-   /* Sanitise input arguments */
-   alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+   if (alignment && !is_power_of_2(alignment))
+   return -EINVAL;
+
+   /*
+* Sanitise input arguments.
+* CMA area should be at least MAX_ORDER - 1 aligned. Otherwise,
+* CMA area could be merged into other MIGRATE_TYPE by buddy mechanism
+* and CMA property will be broken.
+*/
+   alignment = max(alignment,
+   (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
base = ALIGN(base, alignment);
size = ALIGN(size, alignment);
limit &= ~(alignment - 1);
@@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 {
int ret;
 
-   ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
+   ret = __dma_contiguous_reserve_area(size, base, limit, 0,
+   res_cma, fixed);
if (ret)
return ret;
 
-- 
1.7.9.5


[PATCH v2 10/10] mm, cma: use spinlock instead of mutex

2014-06-11 Thread Joonsoo Kim
Currently, we should take the mutex for manipulating bitmap.
This job may be really simple and short so we don't need to sleep
if contended. So I change it to spinlock.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/cma.c b/mm/cma.c
index 22a5b23..3085e8c 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -27,6 +27,7 @@
 #include <linux/memblock.h>
 #include <linux/err.h>
 #include <linux/mm.h>
+#include <linux/spinlock.h>
 #include <linux/mutex.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
@@ -36,7 +37,7 @@ struct cma {
unsigned long   count;
unsigned long   *bitmap;
int order_per_bit; /* Order of pages represented by one bit */
-   struct mutex	lock;
+   spinlock_t  lock;
 };
 
 /*
@@ -72,9 +73,9 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long 
pfn, int count)
bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
nr_bits = cma_bitmap_pages_to_bits(cma, count);
 
-   mutex_lock(&cma->lock);
+   spin_lock(&cma->lock);
bitmap_clear(cma->bitmap, bitmapno, nr_bits);
-   mutex_unlock(&cma->lock);
+   spin_unlock(&cma->lock);
 }
 
 static int __init cma_activate_area(struct cma *cma)
@@ -112,7 +113,7 @@ static int __init cma_activate_area(struct cma *cma)
init_cma_reserved_pageblock(pfn_to_page(base_pfn));
} while (--i);
 
-   mutex_init(&cma->lock);
+   spin_lock_init(&cma->lock);
return 0;
 
 err:
@@ -261,11 +262,11 @@ struct page *cma_alloc(struct cma *cma, int count, 
unsigned int align)
nr_bits = cma_bitmap_pages_to_bits(cma, count);
 
for (;;) {
-   mutex_lock(&cma->lock);
+   spin_lock(&cma->lock);
bitmapno = bitmap_find_next_zero_area(cma->bitmap,
bitmap_maxno, start, nr_bits, mask);
if (bitmapno >= bitmap_maxno) {
-   mutex_unlock(&cma->lock);
+   spin_unlock(&cma->lock);
break;
}
bitmap_set(cma->bitmap, bitmapno, nr_bits);
@@ -274,7 +275,7 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned 
int align)
 * our exclusive use. If the migration fails we will take the
 * lock again and unmark it.
 */
-   mutex_unlock(&cma->lock);
+   spin_unlock(&cma->lock);
 
pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
mutex_lock(&cma_mutex);
-- 
1.7.9.5


[PATCH v2 02/10] DMA, CMA: fix possible memory leak

2014-06-11 Thread Joonsoo Kim
We should free memory for bitmap when we find zone mis-match,
otherwise this memory will leak.

Additionally, I copy the code comment from ppc kvm's cma code to explain
why we need to check for a zone mismatch.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index bd0bb81..fb0cdce 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
base_pfn = pfn;
for (j = pageblock_nr_pages; j; --j, pfn++) {
WARN_ON_ONCE(!pfn_valid(pfn));
+   /*
+* alloc_contig_range requires the pfn range
+* specified to be in the same zone. Make this
+* simple by forcing the entire CMA resv range
+* to be in the same zone.
+*/
if (page_zone(pfn_to_page(pfn)) != zone)
-   return -EINVAL;
+   goto err;
}
init_cma_reserved_pageblock(pfn_to_page(base_pfn));
} while (--i);
 
mutex_init(&cma->lock);
return 0;
+
+err:
+   kfree(cma->bitmap);
+   return -EINVAL;
 }
 
 static struct cma cma_areas[MAX_CMA_AREAS];
-- 
1.7.9.5


[PATCH v2 01/10] DMA, CMA: clean-up log message

2014-06-11 Thread Joonsoo Kim
We don't need explicit 'CMA:' prefix, since we already define prefix
'cma:' in pr_fmt. So remove it.

And, some logs print the function name and others don't. This looks
bad to me, so I unify the log format to print the function name consistently.

Lastly, I add one more debug log on cma_activate_area().

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 83969f8..bd0bb81 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
}
 
if (selected_size  !dma_contiguous_default_area) {
-   pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+   pr_debug("%s(): reserving %ld MiB for global area\n", __func__,
 (unsigned long)selected_size / SZ_1M);
 
dma_contiguous_reserve_area(selected_size, selected_base,
@@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma)
unsigned i = cma->count >> pageblock_order;
struct zone *zone;
 
-   cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+   pr_debug("%s()\n", __func__);
 
+   cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
if (!cma->bitmap)
return -ENOMEM;
 
@@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 
/* Sanity checks */
if (cma_area_count == ARRAY_SIZE(cma_areas)) {
-   pr_err("Not enough slots for CMA reserved regions!\n");
+   pr_err("%s(): Not enough slots for CMA reserved regions!\n",
+   __func__);
return -ENOSPC;
}
 
@@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
*res_cma = cma;
cma_area_count++;
 
-   pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M,
-   (unsigned long)base);
+   pr_info("%s(): reserved %ld MiB at %08lx\n",
+   __func__, (unsigned long)size / SZ_1M, (unsigned long)base);
 
/* Architecture specific contiguous memory fixup. */
dma_contiguous_early_fixup(base, size);
return 0;
 err:
-   pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
+   pr_err("%s(): failed to reserve %ld MiB\n",
+   __func__, (unsigned long)size / SZ_1M);
return ret;
 }
 
-- 
1.7.9.5


[PATCH v2 09/10] mm, cma: move output param to the end of param list

2014-06-11 Thread Joonsoo Kim
Conventionally, we put output param to the end of param list.
cma_declare_contiguous() doesn't look like that, so change it.

Additionally, move down cma_areas reference code to the position
where it is really needed.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 28ec226..97613ea 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -184,7 +184,7 @@ void __init kvm_cma_reserve(void)
 
align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
cma_declare_contiguous(selected_size, 0, 0, align_size,
-   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
+   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, &kvm_cma);
}
 }
 
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index f177f73..bfd4553 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -149,7 +149,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 {
int ret;
 
-   ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed);
+   ret = cma_declare_contiguous(size, base, limit, 0, 0, fixed, res_cma);
if (ret)
return ret;
 
diff --git a/include/linux/cma.h b/include/linux/cma.h
index e38efe9..e53eead 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -6,7 +6,7 @@ struct cma;
 extern int __init cma_declare_contiguous(phys_addr_t size,
phys_addr_t base, phys_addr_t limit,
phys_addr_t alignment, int order_per_bit,
-   struct cma **res_cma, bool fixed);
+   bool fixed, struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
 extern bool cma_release(struct cma *cma, struct page *pages, int count);
 #endif
diff --git a/mm/cma.c b/mm/cma.c
index 01a0713..22a5b23 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -142,8 +142,8 @@ core_initcall(cma_init_reserved_areas);
  * @limit: End address of the reserved memory (optional, 0 for any).
  * @alignment: Alignment for the contiguous memory area, should be power of 2
  * @order_per_bit: Order of pages represented by one bit on bitmap.
- * @res_cma: Pointer to store the created cma region.
  * @fixed: hint about where to place the reserved area
+ * @res_cma: Pointer to store the created cma region.
  *
  * This function reserves memory from early allocator. It should be
  * called by arch specific code once the early allocator (memblock or bootmem)
@@ -156,9 +156,9 @@ core_initcall(cma_init_reserved_areas);
 int __init cma_declare_contiguous(phys_addr_t size,
phys_addr_t base, phys_addr_t limit,
phys_addr_t alignment, int order_per_bit,
-   struct cma **res_cma, bool fixed)
+   bool fixed, struct cma **res_cma)
 {
-   struct cma *cma = &cma_areas[cma_area_count];
+   struct cma *cma;
int ret = 0;
 
pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
@@ -214,6 +214,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
 * Each reserved area must be initialised later, when more kernel
 * subsystems (like slab allocator) are available.
 */
+   cma = &cma_areas[cma_area_count];
cma->base_pfn = PFN_DOWN(base);
cma->count = size >> PAGE_SHIFT;
cma->order_per_bit = order_per_bit;
-- 
1.7.9.5


[PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity

2014-06-11 Thread Joonsoo Kim
ppc kvm's cma region management requires arbitrary bitmap granularity,
since it wants to reserve very large memory and manage this region
with a bitmap where one bit stands for several pages, to reduce
management overhead. So support arbitrary bitmap granularity for the
following generalization.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index bc4c171..9bc9340 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -38,6 +38,7 @@ struct cma {
unsigned long   base_pfn;
unsigned long   count;
unsigned long   *bitmap;
+   int order_per_bit; /* Order of pages represented by one bit */
struct mutex	lock;
 };
 
@@ -157,9 +158,38 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 
 static DEFINE_MUTEX(cma_mutex);
 
+static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
+{
+   return (1 << (align_order >> cma->order_per_bit)) - 1;
+}
+
+static unsigned long cma_bitmap_maxno(struct cma *cma)
+{
+   return cma->count >> cma->order_per_bit;
+}
+
+static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
+   unsigned long pages)
+{
+   return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
+}
+
+static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
+{
+   unsigned long bitmapno, nr_bits;
+
+   bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
+   nr_bits = cma_bitmap_pages_to_bits(cma, count);
+
+   mutex_lock(&cma->lock);
+   bitmap_clear(cma->bitmap, bitmapno, nr_bits);
+   mutex_unlock(&cma->lock);
+}
+
 static int __init cma_activate_area(struct cma *cma)
 {
-   int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
+   int bitmap_maxno = cma_bitmap_maxno(cma);
+   int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
unsigned i = cma->count >> pageblock_order;
struct zone *zone;
@@ -221,6 +251,7 @@ core_initcall(cma_init_reserved_areas);
  * @base: Base address of the reserved area optional, use 0 for any
  * @limit: End address of the reserved memory (optional, 0 for any).
  * @alignment: Alignment for the contiguous memory area, should be power of 2
+ * @order_per_bit: Order of pages represented by one bit on bitmap.
  * @res_cma: Pointer to store the created cma region.
  * @fixed: hint about where to place the reserved area
  *
@@ -235,7 +266,7 @@ core_initcall(cma_init_reserved_areas);
  */
 static int __init __dma_contiguous_reserve_area(phys_addr_t size,
phys_addr_t base, phys_addr_t limit,
-   phys_addr_t alignment,
+   phys_addr_t alignment, int order_per_bit,
struct cma **res_cma, bool fixed)
 {
struct cma *cma = &cma_areas[cma_area_count];
@@ -269,6 +300,8 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t 
size,
base = ALIGN(base, alignment);
size = ALIGN(size, alignment);
limit &= ~(alignment - 1);
+   /* size should be aligned with order_per_bit */
+   BUG_ON(!IS_ALIGNED(size >> PAGE_SHIFT, 1 << order_per_bit));
 
/* Reserve memory */
if (base && fixed) {
@@ -294,6 +327,7 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t 
size,
 */
cma->base_pfn = PFN_DOWN(base);
cma->count = size >> PAGE_SHIFT;
+   cma->order_per_bit = order_per_bit;
*res_cma = cma;
cma_area_count++;
 
@@ -313,7 +347,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 {
int ret;
 
-   ret = __dma_contiguous_reserve_area(size, base, limit, 0,
+   ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0,
res_cma, fixed);
if (ret)
return ret;
@@ -324,13 +358,6 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
return 0;
 }
 
-static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
-{
-   mutex_lock(&cma->lock);
-   bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
-   mutex_unlock(&cma->lock);
-}
-
 /**
  * dma_alloc_from_contiguous() - allocate pages from contiguous area
  * @dev:   Pointer to device for which the allocation is performed.
@@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long 
pfn, int count)
 static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
   unsigned int align)
 {
-   unsigned long mask, pfn, pageno, start = 0;
+   unsigned long mask, pfn, start = 0;
+   unsigned long bitmap_maxno, bitmapno, nr_bits;
struct page *page = NULL;
int ret;
 
@@ -358,18 +386,19 @@ static struct page *__dma_alloc_from_contiguous(struct 
cma *cma

[PATCH v2 00/10] CMA: generalize CMA reserved area management code

2014-06-11 Thread Joonsoo Kim
Currently, there are two users of the CMA functionality: one is the DMA
subsystem and the other is KVM on powerpc. They each have their own code
to manage the CMA reserved area even though that code looks really similar.
My guess is that this is caused by differing needs in bitmap management:
the KVM side wants to maintain a bitmap not with one bit per page, but at a
coarser granularity; eventually it uses a bitmap where one bit represents 64 pages.

When I implement CMA related patches, I have to change both of those places
to apply my change, which is painful. I want to change this situation and
reduce future code management overhead through this patchset.

This change could also help developers who want to use CMA in their
new feature development, since they can use CMA easily without
copying & pasting this reserved area management code.
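As a rough usage sketch of the generalized API (hypothetical driver code:
the my_cma names are made up for illustration, error handling is minimal,
and the argument order of cma_declare_contiguous() follows the kvm
conversion in patch 7, before patch 9 moves the output parameter to the
end of the list):

#include <linux/cma.h>
#include <linux/sizes.h>
#include <linux/init.h>
#include <linux/printk.h>

static struct cma *my_cma;

/* called from early/arch setup code while memblock is still active */
static void __init my_cma_reserve(void)
{
	/* 16MB anywhere, default alignment, one bit per page (order_per_bit = 0) */
	if (cma_declare_contiguous(SZ_16M, 0, 0, 0, 0, &my_cma, false))
		pr_warn("my_cma: reservation failed\n");
}

/* later, at runtime */
static struct page *my_cma_get(unsigned long nr_pages)
{
	return cma_alloc(my_cma, nr_pages, 0);	/* no extra alignment */
}

static void my_cma_put(struct page *page, unsigned long nr_pages)
{
	cma_release(my_cma, page, nr_pages);
}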

v2:
  Although this patchset looks very different from v1, the end result,
  that is, mm/cma.c, is the same as in v1. So I carry the Acks over to patches 6-7.

Patch 1-5 prepare some features to cover ppc kvm's requirements.
Patch 6-7 generalize CMA reserved area management code and change users
to use it.
Patch 8-10 clean-up minor things.

Joonsoo Kim (10):
  DMA, CMA: clean-up log message
  DMA, CMA: fix possible memory leak
  DMA, CMA: separate core cma management codes from DMA APIs
  DMA, CMA: support alignment constraint on cma region
  DMA, CMA: support arbitrary bitmap granularity
  CMA: generalize CMA reserved area management functionality
  PPC, KVM, CMA: use general CMA reserved area management framework
  mm, cma: clean-up cma allocation error path
  mm, cma: move output param to the end of param list
  mm, cma: use spinlock instead of mutex

 arch/powerpc/kvm/book3s_hv_builtin.c |   17 +-
 arch/powerpc/kvm/book3s_hv_cma.c |  240 
 arch/powerpc/kvm/book3s_hv_cma.h |   27 ---
 drivers/base/Kconfig |   10 -
 drivers/base/dma-contiguous.c|  248 ++---
 include/linux/cma.h  |   12 ++
 include/linux/dma-contiguous.h   |3 +-
 mm/Kconfig   |   11 ++
 mm/Makefile  |1 +
 mm/cma.c |  333 ++
 10 files changed, 382 insertions(+), 520 deletions(-)
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.c
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.h
 create mode 100644 include/linux/cma.h
 create mode 100644 mm/cma.c

-- 
1.7.9.5


[PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs

2014-06-11 Thread Joonsoo Kim
To prepare for future generalization work on the cma area management code,
we need to separate the core cma management code from the DMA APIs.
We will extend these core functions to cover the requirements of
ppc kvm's cma area management functionality in the following patches.
This separation lets us extend the core functions without touching the
DMA APIs.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index fb0cdce..8a44c82 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas);
  * If @fixed is true, reserve contiguous area at exactly @base.  If false,
  * reserve in range from @base to @limit.
  */
-int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
-  phys_addr_t limit, struct cma **res_cma,
-  bool fixed)
+static int __init __dma_contiguous_reserve_area(phys_addr_t size,
+   phys_addr_t base, phys_addr_t limit,
+   struct cma **res_cma, bool fixed)
 {
struct cma *cma = cma_areas[cma_area_count];
phys_addr_t alignment;
@@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
 
pr_info("%s(): reserved %ld MiB at %08lx\n",
__func__, (unsigned long)size / SZ_1M, (unsigned long)base);
-
-   /* Architecture specific contiguous memory fixup. */
-   dma_contiguous_early_fixup(base, size);
return 0;
+
 err:
pr_err("%s(): failed to reserve %ld MiB\n",
__func__, (unsigned long)size / SZ_1M);
return ret;
 }
 
+int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
+  phys_addr_t limit, struct cma **res_cma,
+  bool fixed)
+{
+   int ret;
+
+   ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
+   if (ret)
+   return ret;
+
+   /* Architecture specific contiguous memory fixup. */
+   dma_contiguous_early_fixup(base, size);
+
+   return 0;
+}
+
 static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 {
mutex_lock(&cma->lock);
@@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned 
long pfn, int count)
  * global one. Requires architecture specific dev_get_cma_area() helper
  * function.
  */
-struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
   unsigned int align)
 {
unsigned long mask, pfn, pageno, start = 0;
-   struct cma *cma = dev_get_cma_area(dev);
struct page *page = NULL;
int ret;
 
if (!cma || !cma->count)
return NULL;
 
-   if (align > CONFIG_CMA_ALIGNMENT)
-   align = CONFIG_CMA_ALIGNMENT;
-
pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
 count, align);
 
@@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device *dev, 
int count,
return page;
 }
 
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+  unsigned int align)
+{
+   struct cma *cma = dev_get_cma_area(dev);
+
+   if (align > CONFIG_CMA_ALIGNMENT)
+   align = CONFIG_CMA_ALIGNMENT;
+
+   return __dma_alloc_from_contiguous(cma, count, align);
+}
+
 /**
  * dma_release_from_contiguous() - release allocated pages
  * @dev:   Pointer to device for which the pages were allocated.
@@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device *dev, 
int count,
  * It returns false when provided pages do not belong to contiguous area and
  * true otherwise.
  */
-bool dma_release_from_contiguous(struct device *dev, struct page *pages,
+static bool __dma_release_from_contiguous(struct cma *cma, struct page *pages,
 int count)
 {
-   struct cma *cma = dev_get_cma_area(dev);
unsigned long pfn;
 
if (!cma || !pages)
@@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, 
struct page *pages,
 
return true;
 }
+
+bool dma_release_from_contiguous(struct device *dev, struct page *pages,
+int count)
+{
+   struct cma *cma = dev_get_cma_area(dev);
+
+   return __dma_release_from_contiguous(cma, pages, count);
+}
-- 
1.7.9.5


[PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-11 Thread Joonsoo Kim
Now, we have general CMA reserved area management framework,
so use it for future maintainability. There is no functional change.

Acked-by: Michal Nazarewicz min...@mina86.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7cde8a6..28ec226 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -16,12 +16,14 @@
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/sizes.h>
+#include <linux/cma.h>
 
 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 
-#include "book3s_hv_cma.h"
+#define KVM_CMA_CHUNK_ORDER    18
+
 /*
  * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
  * should be power of 2.
@@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT; /* 128MB */
 EXPORT_SYMBOL_GPL(kvm_rma_pages);
 
+static struct cma *kvm_cma;
+
 /* Work out RMLS (real mode limit selector) field value for a given RMA size.
Assumes POWER7 or PPC970. */
 static inline int lpcr_rmls(unsigned long rma_size)
@@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
if (!ri)
return NULL;
-   page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
+   page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
if (!page)
goto err_out;
atomic_set(&ri->use_count, 1);
@@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
 void kvm_release_rma(struct kvm_rma_info *ri)
 {
if (atomic_dec_and_test(&ri->use_count)) {
-   kvm_release_cma(pfn_to_page(ri->base_pfn), kvm_rma_pages);
+   cma_release(kvm_cma, pfn_to_page(ri->base_pfn), kvm_rma_pages);
kfree(ri);
}
 }
@@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
/* Old CPUs require HPT aligned on a multiple of its size */
if (!cpu_has_feature(CPU_FTR_ARCH_206))
align_pages = nr_pages;
-   return kvm_alloc_cma(nr_pages, align_pages);
+   return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
 
 void kvm_release_hpt(struct page *page, unsigned long nr_pages)
 {
-   kvm_release_cma(page, nr_pages);
+   cma_release(kvm_cma, page, nr_pages);
 }
 EXPORT_SYMBOL_GPL(kvm_release_hpt);
 
@@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void)
align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;
 
align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
-   kvm_cma_declare_contiguous(selected_size, align_size);
+   cma_declare_contiguous(selected_size, 0, 0, align_size,
+   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
}
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
deleted file mode 100644
index d9d3d85..000
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ /dev/null
@@ -1,240 +0,0 @@
-/*
- * Contiguous Memory Allocator for ppc KVM hash pagetable  based on CMA
- * for DMA mapping framework
- *
- * Copyright IBM Corporation, 2013
- * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of the
- * License or (at your optional) any later version of the license.
- *
- */
-#define pr_fmt(fmt) "kvm_cma: " fmt
-
-#ifdef CONFIG_CMA_DEBUG
-#ifndef DEBUG
-#  define DEBUG
-#endif
-#endif
-
-#include <linux/memblock.h>
-#include <linux/mutex.h>
-#include <linux/sizes.h>
-#include <linux/slab.h>
-
-#include "book3s_hv_cma.h"
-
-struct kvm_cma {
-   unsigned long   base_pfn;
-   unsigned long   count;
-   unsigned long   *bitmap;
-};
-
-static DEFINE_MUTEX(kvm_cma_mutex);
-static struct kvm_cma kvm_cma_area;
-
-/**
- * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling
- *   for kvm hash pagetable
- * @size:  Size of the reserved memory.
- * @alignment:  Alignment for the contiguous memory area
- *
- * This function reserves memory for kvm cma area. It should be
- * called by arch code when early allocator (memblock or bootmem)
- * is still activate.
- */
-long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t alignment)
-{
-   long base_pfn;
-   phys_addr_t addr;
-   struct kvm_cma *cma = &kvm_cma_area;
-
-   pr_debug("%s(size %lx)\n", __func__, (unsigned long)size);
-
-   if (!size)
-   return -EINVAL;
-   /*
-* Sanitise input arguments.
-* We should be pageblock aligned for CMA.
-*/
-   alignment = max(alignment, (phys_addr_t)(PAGE_SIZE << pageblock_order));
-   size = ALIGN(size, alignment

Re: [PATCH v2 01/10] DMA, CMA: clean-up log message

2014-06-11 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 10:11:19AM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  We don't need explicit 'CMA:' prefix, since we already define prefix
  'cma:' in pr_fmt. So remove it.
 
  And, some logs print function name and others doesn't. This looks
  bad to me, so I unify log format to print function name consistently.
 
  Lastly, I add one more debug log on cma_activate_area().
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index 83969f8..bd0bb81 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
  }
 
  if (selected_size  !dma_contiguous_default_area) {
  -   pr_debug("%s: reserving %ld MiB for global area\n", __func__,
  +   pr_debug("%s(): reserving %ld MiB for global area\n", __func__,
   (unsigned long)selected_size / SZ_1M);
 
 Do we need to do function(), or just function:. I have seen the later
 usage in other parts of the kernel.

Hello,

I also haven't seen this format in other kernel code, but the cma code uses
this format, as follows.

function(arg1, arg2, ...): some message

If we all dislike this format, we can change it after merging this
patchset. Until then, it seems better to me to leave it as is.

 
 
  dma_contiguous_reserve_area(selected_size, selected_base,
  @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma)
  unsigned i = cma->count >> pageblock_order;
  struct zone *zone;
 
  -   cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
  +   pr_debug("%s()\n", __func__);
 
 why ?
 

This pr_debug() comes from ppc kvm's kvm_cma_init_reserved_areas().
I want to maintain all log messages as much as possible to reduce confusion
with this generalization.

Thanks.

Re: [PATCH v2 01/10] DMA, CMA: clean-up log message

2014-06-11 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 02:18:53PM +0900, Minchan Kim wrote:
 Hi Joonsoo,
 
 On Thu, Jun 12, 2014 at 12:21:38PM +0900, Joonsoo Kim wrote:
  We don't need explicit 'CMA:' prefix, since we already define prefix
  'cma:' in pr_fmt. So remove it.
  
  And, some logs print function name and others doesn't. This looks
  bad to me, so I unify log format to print function name consistently.
  
  Lastly, I add one more debug log on cma_activate_area().
 
 When I take a look, it just indicates cma_activate_area was called or not,
 without what range for the area was reserved successfully so I couldn't see
 the intention for new message. Description should explain it so that everybody
 can agree on your claim.
 

Hello,

I paste the answer in other thread.

This pr_debug() comes from ppc kvm's kvm_cma_init_reserved_areas().
I want to maintain all log messages as much as possible to reduce
confusion with this generalization.

If I need to respin this patchset, I will explain more about it.

Thanks.

Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak

2014-06-11 Thread Joonsoo Kim
On Thu, Jun 12, 2014 at 02:25:43PM +0900, Minchan Kim wrote:
 On Thu, Jun 12, 2014 at 12:21:39PM +0900, Joonsoo Kim wrote:
  We should free memory for bitmap when we find zone mis-match,
  otherwise this memory will leak.
 
 Then, -stable stuff?

I don't think so. This is just possible leak candidate, so we don't
need to push this to stable tree.

 
  
  Additionally, I copy code comment from ppc kvm's cma code to notify
  why we need to check zone mis-match.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index bd0bb81..fb0cdce 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
  base_pfn = pfn;
  for (j = pageblock_nr_pages; j; --j, pfn++) {
  WARN_ON_ONCE(!pfn_valid(pfn));
  +   /*
  +* alloc_contig_range requires the pfn range
  +* specified to be in the same zone. Make this
  +* simple by forcing the entire CMA resv range
  +* to be in the same zone.
  +*/
  if (page_zone(pfn_to_page(pfn)) != zone)
  -   return -EINVAL;
  +   goto err;
 
 At a first glance, I thought it would be better to handle such error
 before activating.
 So when I see the registration code(ie, dma_contiguous_revere_area),
 I realized it is impossible because we didn't set up zone yet. :(
 
 If so, when we detect to fail here, it would be better to report more
 meaningful error message like what was successful zone and what is
 new zone and failed pfn number?

What I want to do in early phase of this patchset is to make cma code
on DMA APIs similar to ppc kvm's cma code. ppc kvm's cma code already
has this error handling logic, so I make this patch.

If we think that we need more things, we can do that on general cma code
after merging this patchset.

Thanks.


Re: [RFC PATCH 1/3] CMA: generalize CMA reserved area management functionality

2014-06-09 Thread Joonsoo Kim
On Tue, Jun 03, 2014 at 08:56:00AM +0200, Michal Nazarewicz wrote:
 On Tue, Jun 03 2014, Joonsoo Kim wrote:
  Currently, there are two users on CMA functionality, one is the DMA
  subsystem and the other is the kvm on powerpc. They have their own code
  to manage CMA reserved area even if they looks really similar.
  From my guess, it is caused by some needs on bitmap management. Kvm side
  wants to maintain bitmap not for 1 page, but for more size. Eventually it
  use bitmap where one bit represents 64 pages.
 
  When I implement CMA related patches, I should change those two places
  to apply my change and it seem to be painful to me. I want to change
  this situation and reduce future code management overhead through
  this patch.
 
  This change could also help developer who want to use CMA in their
  new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Some small comments below, but in general
 
 Acked-by: Michal Nazarewicz min...@mina86.com

Hello, Michal.

Thanks!

 
 
  diff --git a/include/linux/cma.h b/include/linux/cma.h
  new file mode 100644
  index 000..60ba06f
  --- /dev/null
  +++ b/include/linux/cma.h
  @@ -0,0 +1,28 @@
  +/*
  + * Contiguous Memory Allocator
  + *
  + * Copyright LG Electronics Inc., 2014
  + * Written by:
  + * Joonsoo Kim iamjoonsoo@lge.com
  + *
  + * This program is free software; you can redistribute it and/or
  + * modify it under the terms of the GNU General Public License as
  + * published by the Free Software Foundation; either version 2 of the
  + * License or (at your optional) any later version of the license.
  + *
 
 Superfluous empty comment line.
 
 Also, I'm not certain whether this copyright notice is appropriate here,
 but that's another story.

Yeah, I will remove copyright notice in .h file.

 
  + */
  +
  +#ifndef __CMA_H__
  +#define __CMA_H__
  +
  +struct cma;
  +
  +extern struct page *cma_alloc(struct cma *cma, unsigned long count,
  +   unsigned long align);
  +extern bool cma_release(struct cma *cma, struct page *pages,
  +   unsigned long count);
  +extern int __init cma_declare_contiguous(phys_addr_t size, phys_addr_t 
  base,
  +   phys_addr_t limit, phys_addr_t alignment,
  +   unsigned long bitmap_shift, bool fixed,
  +   struct cma **res_cma);
  +#endif
 
  diff --git a/mm/cma.c b/mm/cma.c
  new file mode 100644
  index 000..0dae88d
  --- /dev/null
  +++ b/mm/cma.c
  @@ -0,0 +1,329 @@
 
  +static int __init cma_activate_area(struct cma *cma)
  +{
  +   int max_bitmapno = cma_bitmap_max_no(cma);
  +   int bitmap_size = BITS_TO_LONGS(max_bitmapno) * sizeof(long);
  +   unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
  +   unsigned i = cma->count >> pageblock_order;
  +   struct zone *zone;
  +
  +   pr_debug("%s()\n", __func__);
  +   if (!cma->count)
  +   return 0;
 
 Alternatively:
 
 + if (!i)
 + return 0;

I prefer cma->count to i, since it describes what it represents.

  +
  +   cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
  +   if (!cma->bitmap)
  +   return -ENOMEM;
  +
  +   WARN_ON_ONCE(!pfn_valid(pfn));
  +   zone = page_zone(pfn_to_page(pfn));
  +
  +   do {
  +   unsigned j;
  +
  +   base_pfn = pfn;
  +   for (j = pageblock_nr_pages; j; --j, pfn++) {
  +   WARN_ON_ONCE(!pfn_valid(pfn));
  +   /*
  +* alloc_contig_range requires the pfn range
  +* specified to be in the same zone. Make this
  +* simple by forcing the entire CMA resv range
  +* to be in the same zone.
  +*/
  +   if (page_zone(pfn_to_page(pfn)) != zone)
  +   goto err;
  +   }
  +   init_cma_reserved_pageblock(pfn_to_page(base_pfn));
  +   } while (--i);
  +
  +   mutex_init(&cma->lock);
  +   return 0;
  +
  +err:
  +   kfree(cma->bitmap);
  +   return -EINVAL;
  +}
 
  +static int __init cma_init_reserved_areas(void)
  +{
  +   int i;
  +
  +   for (i = 0; i < cma_area_count; i++) {
  +   int ret = cma_activate_area(cma_areas[i]);
  +
  +   if (ret)
  +   return ret;
  +   }
  +
  +   return 0;
  +}
 
 Or even:
 
 static int __init cma_init_reserved_areas(void)
 {
   int i, ret = 0;
    for (i = 0; !ret && i < cma_area_count; ++i)
   ret = cma_activate_area(cma_areas[i]);
   return ret;
 }

I think that the original implementation is better, since it seems
more readable to me.

  +int __init cma_declare_contiguous(phys_addr_t size, phys_addr_t base,
  +   phys_addr_t limit, phys_addr_t alignment,
  +   unsigned long bitmap_shift, bool fixed

Re: [RFC PATCH 2/3] DMA, CMA: use general CMA reserved area management framework

2014-06-09 Thread Joonsoo Kim
On Tue, Jun 03, 2014 at 09:00:48AM +0200, Michal Nazarewicz wrote:
 On Tue, Jun 03 2014, Joonsoo Kim wrote:
  Now, we have general CMA reserved area management framework,
  so use it for future maintainabilty. There is no functional change.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Acked-by: Michal Nazarewicz min...@mina86.com
 
  diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
  index dfb1dc9..ecb85ac 100644
  --- a/include/linux/dma-contiguous.h
  +++ b/include/linux/dma-contiguous.h
  @@ -53,9 +53,10 @@
   
   #ifdef __KERNEL__
   
  +#include <linux/device.h>
  +
 
 Why is this suddenly required?
 
   struct cma;
   struct page;
  -struct device;
   
   #ifdef CONFIG_DMA_CMA
 

Without including device.h, a build failure occurs.
In dma-contiguous.h, we try to access dev->cma_area, so we need
device.h. In the past, we included it by luck via swap.h in
drivers/base/dma-contiguous.c: swap.h includes node.h, and node.h
includes device.h, so we were fine. But in this patch I remove the
'#include <linux/swap.h>', so we need to include device.h explicitly.
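For reference, the helpers in dma-contiguous.h that need the full struct
device definition look roughly like this (sketched from the existing
header, not new code in this series):

static inline struct cma *dev_get_cma_area(struct device *dev)
{
	if (dev && dev->cma_area)	/* dereferences struct device */
		return dev->cma_area;
	return dma_contiguous_default_area;
}

static inline void dev_set_cma_area(struct device *dev, struct cma *cma)
{
	if (dev)
		dev->cma_area = cma;
}

A forward declaration of struct device is enough for the prototypes, but
not for these inline bodies, hence the explicit include.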

Thanks.

Re: [RFC PATCH 1/3] CMA: generalize CMA reserved area management functionality

2014-06-09 Thread Joonsoo Kim
On Thu, Jun 05, 2014 at 11:09:05PM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  Currently, there are two users on CMA functionality, one is the DMA
  subsystem and the other is the kvm on powerpc. They have their own code
  to manage CMA reserved area even if they looks really similar.
  From my guess, it is caused by some needs on bitmap management. Kvm side
  wants to maintain bitmap not for 1 page, but for more size. Eventually it
  use bitmap where one bit represents 64 pages.
 
  When I implement CMA related patches, I should change those two places
  to apply my change and it seem to be painful to me. I want to change
  this situation and reduce future code management overhead through
  this patch.
 
  This change could also help developer who want to use CMA in their
  new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 
 The way patches are split makes it difficult to review. Will it be
 possible to make changes against one implementation and them move that
 to generic code. That helps in finding out what exactly changed.
 

Hello,

You are right! I will respin this patchset as the form you
recommended.

 
  diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
  index 00e13ce..b3fe1cc 100644
  --- a/drivers/base/Kconfig
  +++ b/drivers/base/Kconfig
  @@ -283,7 +283,7 @@ config CMA_ALIGNMENT
 
If unsure, leave the default value 8.
 
  -config CMA_AREAS
  +config DMA_CMA_AREAS
   int "Maximum count of the CMA device-private areas"
  default 7
  help
 
 for ex: The above can be a seperate patch along with further DMA related
 cleanups . 

Okay.

 
  diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
  index 83969f8..48cdac8 100644
  --- a/drivers/base/dma-contiguous.c
  +++ b/drivers/base/dma-contiguous.c
  @@ -186,7 +186,7 @@ static int __init cma_activate_area(struct cma *cma)
  return 0;
   }
 
  -static struct cma cma_areas[MAX_CMA_AREAS];
  +static struct cma cma_areas[MAX_DMA_CMA_AREAS];
   static unsigned cma_area_count;
 
   static int __init cma_init_reserved_areas(void)
  diff --git a/include/linux/cma.h b/include/linux/cma.h
  new file mode 100644
  index 000..60ba06f
  --- /dev/null
  +++ b/include/linux/cma.h
  @@ -0,0 +1,28 @@
  +/*
  + * Contiguous Memory Allocator
  + *
  + * Copyright LG Electronics Inc., 2014
  + * Written by:
  + * Joonsoo Kim iamjoonsoo@lge.com
  + *
  + * This program is free software; you can redistribute it and/or
  + * modify it under the terms of the GNU General Public License as
  + * published by the Free Software Foundation; either version 2 of the
  + * License or (at your optional) any later version of the license.
  + *
  + */
  +
  +#ifndef __CMA_H__
  +#define __CMA_H__
  +
  +struct cma;
  +
  +extern struct page *cma_alloc(struct cma *cma, unsigned long count,
  +   unsigned long align);
  +extern bool cma_release(struct cma *cma, struct page *pages,
  +   unsigned long count);
  +extern int __init cma_declare_contiguous(phys_addr_t size, phys_addr_t 
  base,
  +   phys_addr_t limit, phys_addr_t alignment,
  +   unsigned long bitmap_shift, bool fixed,
  +   struct cma **res_cma);
  +#endif
  diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
  index 772eab5..dfb1dc9 100644
  --- a/include/linux/dma-contiguous.h
  +++ b/include/linux/dma-contiguous.h
  @@ -63,7 +63,7 @@ struct device;
* There is always at least global CMA area and a few optional device
* private areas configured in kernel .config.
*/
  -#define MAX_CMA_AREAS  (1 + CONFIG_CMA_AREAS)
  +#define MAX_DMA_CMA_AREAS  (1 + CONFIG_DMA_CMA_AREAS)
 
   extern struct cma *dma_contiguous_default_area;
 
  @@ -123,7 +123,7 @@ bool dma_release_from_contiguous(struct device *dev, 
  struct page *pages,
 
   #else
 
  -#define MAX_CMA_AREAS  (0)
  +#define MAX_DMA_CMA_AREAS  (0)
 
   static inline struct cma *dev_get_cma_area(struct device *dev)
   {
  diff --git a/mm/Kconfig b/mm/Kconfig
  index 7511b4a..0877ddc 100644
  --- a/mm/Kconfig
  +++ b/mm/Kconfig
  @@ -515,6 +515,17 @@ config CMA_DEBUG
processing calls such as dma_alloc_from_contiguous().
This option does not affect warning and error messages.
 
  +config CMA_AREAS
  +   int "Maximum count of the CMA areas"
  +   depends on CMA
  +   default 7
  +   help
  + CMA allows to create CMA areas for particular purpose, mainly,
  + used as device private area. This parameter sets the maximum
  + number of CMA area in the system.
  +
  + If unsure, leave the default value 7.
  +
 
 If we have 7 declare continugous request, a kvm cma allocation request will 
 fail ?

Yes. If you need more, you can increase it. :)

   config ZBUD
  tristate

[RFC PATCH 0/3] CMA: generalize CMA reserved area management code

2014-06-02 Thread Joonsoo Kim
Currently, there are two users on CMA functionality, one is the DMA
subsystem and the other is the kvm on powerpc. They have their own code
to manage CMA reserved area even if they looks really similar.
From my guess, it is caused by some needs on bitmap management. Kvm side
wants to maintain bitmap not for 1 page, but for more size. Eventually it
use bitmap where one bit represents 64 pages.

When I implement CMA related patches, I should change those two places
to apply my change and it seem to be painful to me. I want to change
this situation and reduce future code management overhead through
this patch.

This change could also help developer who want to use CMA in their
new feature development, since they can use CMA easily without
copying & pasting this reserved area management code.

Now, we are in the merge window, so this is not for merging. I'd like to
hear opinions from people related to this stuff before actually
trying to merge this patchset. If all agree with this change, I will
resend it after -rc1.

Thanks.

Joonsoo Kim (3):
  CMA: generalize CMA reserved area management functionality
  DMA, CMA: use general CMA reserved area management framework
  PPC, KVM, CMA: use general CMA reserved area management framework

 arch/powerpc/kvm/book3s_hv_builtin.c |   17 +-
 arch/powerpc/kvm/book3s_hv_cma.c |  240 -
 arch/powerpc/kvm/book3s_hv_cma.h |   27 ---
 drivers/base/Kconfig |   10 --
 drivers/base/dma-contiguous.c|  230 ++--
 include/linux/cma.h  |   28 +++
 include/linux/dma-contiguous.h   |7 +-
 mm/Kconfig   |   11 ++
 mm/Makefile  |1 +
 mm/cma.c |  329 ++
 10 files changed, 396 insertions(+), 504 deletions(-)
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.c
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.h
 create mode 100644 include/linux/cma.h
 create mode 100644 mm/cma.c

-- 
1.7.9.5


[RFC PATCH 1/3] CMA: generalize CMA reserved area management functionality

2014-06-02 Thread Joonsoo Kim
Currently, there are two users on CMA functionality, one is the DMA
subsystem and the other is the kvm on powerpc. They have their own code
to manage CMA reserved area even if they looks really similar.
From my guess, it is caused by some needs on bitmap management. Kvm side
wants to maintain bitmap not for 1 page, but for more size. Eventually it
use bitmap where one bit represents 64 pages.

When I implement CMA related patches, I should change those two places
to apply my change and it seem to be painful to me. I want to change
this situation and reduce future code management overhead through
this patch.

This change could also help developer who want to use CMA in their
new feature development, since they can use CMA easily without
copying & pasting this reserved area management code.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 00e13ce..b3fe1cc 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -283,7 +283,7 @@ config CMA_ALIGNMENT
 
  If unsure, leave the default value 8.
 
-config CMA_AREAS
+config DMA_CMA_AREAS
int "Maximum count of the CMA device-private areas"
default 7
help
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 83969f8..48cdac8 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -186,7 +186,7 @@ static int __init cma_activate_area(struct cma *cma)
return 0;
 }
 
-static struct cma cma_areas[MAX_CMA_AREAS];
+static struct cma cma_areas[MAX_DMA_CMA_AREAS];
 static unsigned cma_area_count;
 
 static int __init cma_init_reserved_areas(void)
diff --git a/include/linux/cma.h b/include/linux/cma.h
new file mode 100644
index 000..60ba06f
--- /dev/null
+++ b/include/linux/cma.h
@@ -0,0 +1,28 @@
+/*
+ * Contiguous Memory Allocator
+ *
+ * Copyright LG Electronics Inc., 2014
+ * Written by:
+ * Joonsoo Kim iamjoonsoo@lge.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ *
+ */
+
+#ifndef __CMA_H__
+#define __CMA_H__
+
+struct cma;
+
+extern struct page *cma_alloc(struct cma *cma, unsigned long count,
+   unsigned long align);
+extern bool cma_release(struct cma *cma, struct page *pages,
+   unsigned long count);
+extern int __init cma_declare_contiguous(phys_addr_t size, phys_addr_t base,
+   phys_addr_t limit, phys_addr_t alignment,
+   unsigned long bitmap_shift, bool fixed,
+   struct cma **res_cma);
+#endif
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index 772eab5..dfb1dc9 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -63,7 +63,7 @@ struct device;
  * There is always at least global CMA area and a few optional device
  * private areas configured in kernel .config.
  */
-#define MAX_CMA_AREAS  (1 + CONFIG_CMA_AREAS)
+#define MAX_DMA_CMA_AREAS  (1 + CONFIG_DMA_CMA_AREAS)
 
 extern struct cma *dma_contiguous_default_area;
 
@@ -123,7 +123,7 @@ bool dma_release_from_contiguous(struct device *dev, struct 
page *pages,
 
 #else
 
-#define MAX_CMA_AREAS  (0)
+#define MAX_DMA_CMA_AREAS  (0)
 
 static inline struct cma *dev_get_cma_area(struct device *dev)
 {
diff --git a/mm/Kconfig b/mm/Kconfig
index 7511b4a..0877ddc 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -515,6 +515,17 @@ config CMA_DEBUG
  processing calls such as dma_alloc_from_contiguous().
  This option does not affect warning and error messages.
 
+config CMA_AREAS
+   int "Maximum count of the CMA areas"
+   depends on CMA
+   default 7
+   help
+ CMA allows to create CMA areas for particular purpose, mainly,
+ used as device private area. This parameter sets the maximum
+ number of CMA area in the system.
+
+ If unsure, leave the default value 7.
+
 config ZBUD
tristate
default n
diff --git a/mm/Makefile b/mm/Makefile
index 1eaa70b..bc0422b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -62,3 +62,4 @@ obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
 obj-$(CONFIG_ZBUD) += zbud.o
 obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
 obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
+obj-$(CONFIG_CMA)  += cma.o
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 000..0dae88d
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,329 @@
+/*
+ * Contiguous Memory Allocator
+ *
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Copyright IBM Corporation, 2013
+ * Copyright LG Electronics Inc., 2014
+ * Written by:
+ * Marek Szyprowski m.szyprow...@samsung.com
+ * Michal Nazarewicz min...@mina86.com
+ * Aneesh Kumar K.V

[RFC PATCH 2/3] DMA, CMA: use general CMA reserved area management framework

2014-06-02 Thread Joonsoo Kim
Now, we have general CMA reserved area management framework,
so use it for future maintainability. There is no functional change.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index b3fe1cc..4eac559 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -283,16 +283,6 @@ config CMA_ALIGNMENT
 
  If unsure, leave the default value 8.
 
-config DMA_CMA_AREAS
-   int "Maximum count of the CMA device-private areas"
-   default 7
-   help
- CMA allows to create CMA areas for particular devices. This parameter
- sets the maximum number of such device private CMA areas in the
- system.
-
- If unsure, leave the default value 7.
-
 endif
 
 endmenu
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 48cdac8..4bce4e1 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -24,23 +24,9 @@
 
 #include <linux/memblock.h>
 #include <linux/err.h>
-#include <linux/mm.h>
-#include <linux/mutex.h>
-#include <linux/page-isolation.h>
 #include <linux/sizes.h>
-#include <linux/slab.h>
-#include <linux/swap.h>
-#include <linux/mm_types.h>
 #include <linux/dma-contiguous.h>
-
-struct cma {
-   unsigned long   base_pfn;
-   unsigned long   count;
-   unsigned long   *bitmap;
-   struct mutex    lock;
-};
-
-struct cma *dma_contiguous_default_area;
+#include <linux/cma.h>
 
 #ifdef CONFIG_CMA_SIZE_MBYTES
 #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
@@ -48,6 +34,8 @@ struct cma *dma_contiguous_default_area;
 #define CMA_SIZE_MBYTES 0
 #endif
 
+struct cma *dma_contiguous_default_area;
+
 /*
  * Default global CMA area size can be defined in kernel's .config.
  * This is useful mainly for distro maintainers to create a kernel
@@ -154,55 +142,6 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
}
 }
 
-static DEFINE_MUTEX(cma_mutex);
-
-static int __init cma_activate_area(struct cma *cma)
-{
-   int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
-   unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-   unsigned i = cma->count >> pageblock_order;
-   struct zone *zone;
-
-   cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
-
-   if (!cma->bitmap)
-   return -ENOMEM;
-
-   WARN_ON_ONCE(!pfn_valid(pfn));
-   zone = page_zone(pfn_to_page(pfn));
-
-   do {
-   unsigned j;
-   base_pfn = pfn;
-   for (j = pageblock_nr_pages; j; --j, pfn++) {
-   WARN_ON_ONCE(!pfn_valid(pfn));
-   if (page_zone(pfn_to_page(pfn)) != zone)
-   return -EINVAL;
-   }
-   init_cma_reserved_pageblock(pfn_to_page(base_pfn));
-   } while (--i);
-
-   mutex_init(&cma->lock);
-   return 0;
-}
-
-static struct cma cma_areas[MAX_DMA_CMA_AREAS];
-static unsigned cma_area_count;
-
-static int __init cma_init_reserved_areas(void)
-{
-   int i;
-
-   for (i = 0; i < cma_area_count; i++) {
-   int ret = cma_activate_area(cma_areas[i]);
-   if (ret)
-   return ret;
-   }
-
-   return 0;
-}
-core_initcall(cma_init_reserved_areas);
-
 /**
  * dma_contiguous_reserve_area() - reserve custom contiguous area
  * @size: Size of the reserved area (in bytes),
@@ -224,176 +163,31 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
phys_addr_t base,
   phys_addr_t limit, struct cma **res_cma,
   bool fixed)
 {
-   struct cma *cma = cma_areas[cma_area_count];
-   phys_addr_t alignment;
-   int ret = 0;
-
-   pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
-(unsigned long)size, (unsigned long)base,
-(unsigned long)limit);
-
-   /* Sanity checks */
-   if (cma_area_count == ARRAY_SIZE(cma_areas)) {
-   pr_err("Not enough slots for CMA reserved regions!\n");
-   return -ENOSPC;
-   }
-
-   if (!size)
-   return -EINVAL;
-
-   /* Sanitise input arguments */
-   alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
-   base = ALIGN(base, alignment);
-   size = ALIGN(size, alignment);
-   limit &= ~(alignment - 1);
-
-   /* Reserve memory */
-   if (base && fixed) {
-   if (memblock_is_region_reserved(base, size) ||
-   memblock_reserve(base, size) < 0) {
-   ret = -EBUSY;
-   goto err;
-   }
-   } else {
-   phys_addr_t addr = memblock_alloc_range(size, alignment, base,
-   limit);
-   if (!addr) {
-   ret = -ENOMEM;
-   goto err;
-   } else {
-   base = addr;
-   }
-   }
-
-   /*
-* Each

[RFC PATCH 3/3] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-02 Thread Joonsoo Kim
Now, we have general CMA reserved area management framework,
so use it for future maintainabilty. There is no functional change.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 8cd0dae..43c3f81 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -15,12 +15,14 @@
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/sizes.h>
+#include <linux/cma.h>
 
 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 
-#include "book3s_hv_cma.h"
+#define KVM_CMA_CHUNK_ORDER    18
+
 /*
  * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
  * should be power of 2.
@@ -42,6 +44,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT; /* 128MB */
 EXPORT_SYMBOL_GPL(kvm_rma_pages);
 
+static struct cma *kvm_cma;
+
 /* Work out RMLS (real mode limit selector) field value for a given RMA size.
Assumes POWER7 or PPC970. */
 static inline int lpcr_rmls(unsigned long rma_size)
@@ -96,7 +100,7 @@ struct kvm_rma_info *kvm_alloc_rma()
ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
if (!ri)
return NULL;
-   page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
+   page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
if (!page)
goto err_out;
atomic_set(&ri->use_count, 1);
@@ -111,7 +115,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
 void kvm_release_rma(struct kvm_rma_info *ri)
 {
if (atomic_dec_and_test(&ri->use_count)) {
-   kvm_release_cma(pfn_to_page(ri->base_pfn), kvm_rma_pages);
+   cma_release(kvm_cma, pfn_to_page(ri->base_pfn), kvm_rma_pages);
kfree(ri);
}
 }
@@ -133,13 +137,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
/* Old CPUs require HPT aligned on a multiple of its size */
if (!cpu_has_feature(CPU_FTR_ARCH_206))
align_pages = nr_pages;
-   return kvm_alloc_cma(nr_pages, align_pages);
+   return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
 
 void kvm_release_hpt(struct page *page, unsigned long nr_pages)
 {
-   kvm_release_cma(page, nr_pages);
+   cma_release(kvm_cma, page, nr_pages);
 }
 EXPORT_SYMBOL_GPL(kvm_release_hpt);
 
@@ -178,6 +182,7 @@ void __init kvm_cma_reserve(void)
align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;
 
align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
-   kvm_cma_declare_contiguous(selected_size, align_size);
+   cma_declare_contiguous(selected_size, 0, 0, align_size,
+   KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, &kvm_cma);
}
 }
diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
deleted file mode 100644
index d9d3d85..000
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ /dev/null
@@ -1,240 +0,0 @@
-/*
- * Contiguous Memory Allocator for ppc KVM hash pagetable  based on CMA
- * for DMA mapping framework
- *
- * Copyright IBM Corporation, 2013
- * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of the
- * License or (at your optional) any later version of the license.
- *
- */
-#define pr_fmt(fmt) "kvm_cma: " fmt
-
-#ifdef CONFIG_CMA_DEBUG
-#ifndef DEBUG
-#  define DEBUG
-#endif
-#endif
-
-#include <linux/memblock.h>
-#include <linux/mutex.h>
-#include <linux/sizes.h>
-#include <linux/slab.h>
-
-#include "book3s_hv_cma.h"
-
-struct kvm_cma {
-   unsigned long   base_pfn;
-   unsigned long   count;
-   unsigned long   *bitmap;
-};
-
-static DEFINE_MUTEX(kvm_cma_mutex);
-static struct kvm_cma kvm_cma_area;
-
-/**
- * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling
- *   for kvm hash pagetable
- * @size:  Size of the reserved memory.
- * @alignment:  Alignment for the contiguous memory area
- *
- * This function reserves memory for kvm cma area. It should be
- * called by arch code when early allocator (memblock or bootmem)
- * is still activate.
- */
-long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t alignment)
-{
-   long base_pfn;
-   phys_addr_t addr;
-   struct kvm_cma *cma = &kvm_cma_area;
-
-   pr_debug("%s(size %lx)\n", __func__, (unsigned long)size);
-
-   if (!size)
-   return -EINVAL;
-   /*
-* Sanitise input arguments.
-* We should be pageblock aligned for CMA.
-*/
-   alignment = max(alignment, (phys_addr_t)(PAGE_SIZE << pageblock_order));
-   size = ALIGN(size, alignment);
-   /*
-* Reserve memory
-* Use __memblock_alloc_base() since

Re: [RFC PATCH 1/3] slub: search partial list on numa_mem_id(), instead of numa_node_id()

2014-05-18 Thread Joonsoo Kim
On Fri, May 16, 2014 at 04:37:35PM -0700, Nishanth Aravamudan wrote:
 On 06.02.2014 [17:07:04 +0900], Joonsoo Kim wrote:
  Currently, if allocation constraint to node is NUMA_NO_NODE, we search
  a partial slab on numa_node_id() node. This doesn't work properly on the
  system having memoryless node, since it can have no memory on that node and
  there must be no partial slab on that node.
  
  On that node, page allocation always fallback to numa_mem_id() first. So
  searching a partial slab on numa_node_id() in that case is proper solution
  for memoryless node case.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Acked-by: Nishanth Aravamudan n...@linux.vnet.ibm.com
 
 Joonsoo, would you send this one on to Andrew?

Hello,

Okay. I will do it.

Thanks.

Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-23 Thread Joonsoo Kim
On Tue, Feb 18, 2014 at 10:38:01AM -0600, Christoph Lameter wrote:
 On Mon, 17 Feb 2014, Joonsoo Kim wrote:
 
  On Wed, Feb 12, 2014 at 04:16:11PM -0600, Christoph Lameter wrote:
   Here is another patch with some fixes. The additional logic is only
   compiled in if CONFIG_HAVE_MEMORYLESS_NODES is set.
  
   Subject: slub: Memoryless node support
  
   Support memoryless nodes by tracking which allocations are failing.
 
  I still don't understand why this tracking is needed.
 
 Its an optimization to avoid calling the page allocator to figure out if
 there is memory available on a particular node.
 
  All we need for allcation targeted to memoryless node is to fallback proper
  node, that it, numa_mem_id() node of targeted node. My previous patch
  implements it and use proper fallback node on every allocation code path.
  Why this tracking is needed? Please elaborate more on this.
 
 Its too slow to do that on every alloc. One needs to be able to satisfy
 most allocations without switching percpu slabs for optimal performance.

I don't think that we need to switch percpu slabs on every alloc.
Allocations targeted to a specific node are rare, and most of them are
targeted to either numa_node_id() or numa_mem_id(). My patch considers
these cases, so most allocations are still served by the percpu slabs.
There is no suboptimal performance.

 
   Allocations targeted to the nodes without memory fall back to the
   current available per cpu objects and if that is not available will
   create a new slab using the page allocator to fallback from the
   memoryless node to some other node.
 
 And what about the next alloc? Assuem there are N allocs from a memoryless
 node this means we push back the partial slab on each alloc and then fall
 back?
 
{
 void *object;
   - int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
   + int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
  
 object = get_partial_node(s, get_node(s, searchnode), c, flags);
 if (object || node != NUMA_NO_NODE)
 
  This isn't enough.
  Consider that allcation targeted to memoryless node.
 
 It will not common get there because of the tracking. Instead a per cpu
 object will be used.
  get_partial_node() always fails even if there are some partial slab on
  memoryless node's neareast node.
 
 Correct and that leads to a page allocator action whereupon the node will
 be marked as empty.

Why do we need to go to the page allocator if there is a partial slab?
Checking whether a node is memoryless or not is really easy, so we don't need
to skip that check. Skipping it is a suboptimal solution.

  We should fallback to some proper node in this case, since there is no slab
  on memoryless node.
 
 NUMA is about optimization of memory allocations. It is often *not* about
 correctness but heuristics are used in many cases. F.e. see the zone
 reclaim logic, zone reclaim mode, fallback scenarios in the page allocator
 etc etc.

Okay. But, 'do our best' is preferable to me.

Thanks.

Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-16 Thread Joonsoo Kim
On Wed, Feb 12, 2014 at 04:16:11PM -0600, Christoph Lameter wrote:
 Here is another patch with some fixes. The additional logic is only
 compiled in if CONFIG_HAVE_MEMORYLESS_NODES is set.
 
 Subject: slub: Memoryless node support
 
 Support memoryless nodes by tracking which allocations are failing.

I still don't understand why this tracking is needed.
All we need for an allocation targeted to a memoryless node is to fall back
to the proper node, that is, the numa_mem_id() node of the targeted node.
My previous patch implements it and uses the proper fallback node on every
allocation code path. Why is this tracking needed? Please elaborate more on this.

 Allocations targeted to the nodes without memory fall back to the
 current available per cpu objects and if that is not available will
 create a new slab using the page allocator to fallback from the
 memoryless node to some other node.
 
 Signed-off-by: Christoph Lameter c...@linux.com
 
 @@ -1722,7 +1738,7 @@ static void *get_partial(struct kmem_cac
   struct kmem_cache_cpu *c)
  {
   void *object;
 - int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
 + int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
 
   object = get_partial_node(s, get_node(s, searchnode), c, flags);
   if (object || node != NUMA_NO_NODE)

This isn't enough.
Consider an allocation targeted to a memoryless node:
get_partial_node() always fails even if there are some partial slabs on
the memoryless node's nearest node.
We should fall back to some proper node in this case, since there is no slab
on the memoryless node.
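A minimal sketch of the fallback I have in mind (illustrative only, not
the exact patch; it assumes a node_to_mem_node()-style helper, as
discussed for patch 2 of this series, that returns the nearest node with
memory for a given node):

static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
		struct kmem_cache_cpu *c)
{
	void *object;
	int searchnode;

	if (node == NUMA_NO_NODE) {
		searchnode = numa_mem_id();
	} else {
		searchnode = node;
		/* memoryless node: search the nearest node that has memory */
		if (!node_present_pages(node))
			searchnode = node_to_mem_node(node);
	}

	object = get_partial_node(s, get_node(s, searchnode), c, flags);
	if (object || node != NUMA_NO_NODE)
		return object;

	return get_any_partial(s, flags, c);
}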

Thanks.


Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-16 Thread Joonsoo Kim
On Wed, Feb 12, 2014 at 10:51:37PM -0800, Nishanth Aravamudan wrote:
 Hi Joonsoo,
 Also, given that only ia64 and (hopefuly soon) ppc64 can set
 CONFIG_HAVE_MEMORYLESS_NODES, does that mean x86_64 can't have
 memoryless nodes present? Even with fakenuma? Just curious.

I don't know, because I'm not expert on NUMA system :)
At first glance, fakenuma can't be used for testing
CONFIG_HAVE_MEMORYLESS_NODES. Maybe some modification is needed.

Thanks.


Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-10 Thread Joonsoo Kim
On Mon, Feb 10, 2014 at 11:13:21AM -0800, Nishanth Aravamudan wrote:
 Hi Christoph,
 
 On 07.02.2014 [12:51:07 -0600], Christoph Lameter wrote:
  Here is a draft of a patch to make this work with memoryless nodes.
  
  The first thing is that we modify node_match to also match if we hit an
  empty node. In that case we simply take the current slab if its there.
  
  If there is no current slab then a regular allocation occurs with the
  memoryless node. The page allocator will fallback to a possible node and
  that will become the current slab. Next alloc from a memoryless node
  will then use that slab.
  
  For that we also add some tracking of allocations on nodes that were not
  satisfied using the empty_node[] array. A successful alloc on a node
  clears that flag.
  
  I would rather avoid the empty_node[] array since its global and there may
  be thread specific allocation restrictions but it would be expensive to do
  an allocation attempt via the page allocator to make sure that there is
  really no page available from the page allocator.
 
 With this patch on our test system (I pulled out the numa_mem_id()
 change, since you Acked Joonsoo's already), on top of 3.13.0 + my
 kthread locality change + CONFIG_HAVE_MEMORYLESS_NODES + Joonsoo's RFC
 patch 1):
 
 MemTotal:8264704 kB
 MemFree: 5924608 kB
 ...
 Slab:1402496 kB
 SReclaimable: 102848 kB
 SUnreclaim:  1299648 kB
 
 And Anton's slabusage reports:
 
 slab   mem objsslabs
   used   active   active
 
 kmalloc-16384   207 MB   98.60%  100.00%
 task_struct 134 MB   97.82%  100.00%
 kmalloc-8192117 MB  100.00%  100.00%
 pgtable-2^12111 MB  100.00%  100.00%
 pgtable-2^10104 MB  100.00%  100.00%
 
 For comparison, Anton's patch applied at the same point in the series:
 
 meminfo:
 
 MemTotal:8264704 kB
 MemFree: 4150464 kB
 ...
 Slab:1590336 kB
 SReclaimable: 208768 kB
 SUnreclaim:  1381568 kB
 
 slabusage:
 
 slab   mem objsslabs
   used   active   active
 
 kmalloc-16384   227 MB   98.63%  100.00%
 kmalloc-8192130 MB  100.00%  100.00%
 task_struct 129 MB   97.73%  100.00%
 pgtable-2^12112 MB  100.00%  100.00%
 pgtable-2^10106 MB  100.00%  100.00%
 
 
 Consider this patch:
 
 Acked-by: Nishanth Aravamudan n...@linux.vnet.ibm.com
 Tested-by: Nishanth Aravamudan n...@linux.vnet.ibm.com

Hello,

I still think that there is another problem.
Your report about CONFIG_SLAB said that SLAB uses just 200MB.
Below is your previous report.

  Ok, with your patches applied and CONFIG_SLAB enabled:

  MemTotal:8264640 kB
  MemFree: 7119680 kB
  Slab: 207232 kB
  SReclaimable:  32896 kB
  SUnreclaim:   174336 kB

The numbers on CONFIG_SLUB with these patches tell us that SLUB uses 1.4GB.
There is a large difference in slab usage.

And, I should note that the number of active objects in slabinfo can be wrong
in some situations, since it doesn't consider the cpu slab (and cpu partial slabs).

I recommend confirming page_to_nid() and the other things I mentioned earlier.

Thanks.


Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-09 Thread Joonsoo Kim
On Sat, Feb 08, 2014 at 01:57:39AM -0800, David Rientjes wrote:
 On Fri, 7 Feb 2014, Joonsoo Kim wrote:
 
   It seems like a better approach would be to do this when a node is 
   brought 
   online and determine the fallback node based not on the zonelists as you 
   do here but rather on locality (such as through a SLIT if provided, see 
   node_distance()).
  
  Hmm...
  I guess that zonelist is base on locality. Zonelist is generated using
  node_distance(), so I think that it reflects locality. But, I'm not expert
  on NUMA, so please let me know what I am missing here :)
  
 
 The zonelist is, yes, but I'm talking about memoryless and cpuless nodes.  
 If your solution is going to become the generic kernel API that determines 
 what node has local memory for a particular node, then it will have to 
 support all definitions of node.  That includes nodes that consist solely 
 of I/O, chipsets, networking, or storage devices.  These nodes may not 
 have memory or cpus, so doing it as part of onlining cpus isn't going to 
 be generic enough.  You want a node_to_mem_node() API for all possible 
 node types (the possible node types listed above are straight from the 
 ACPI spec).  For 99% of people, node_to_mem_node(X) is always going to be 
 X and we can optimize for that, but any solution that relies on cpu online 
 is probably shortsighted right now.
 
 I think it would be much better to do this as a part of setting a node to 
 be online.

Okay. I got your point.
I will change it to rely on node online if this patch is really needed.

Thanks!



Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-09 Thread Joonsoo Kim
On Fri, Feb 07, 2014 at 01:38:55PM -0800, Nishanth Aravamudan wrote:
 On 07.02.2014 [12:51:07 -0600], Christoph Lameter wrote:
  Here is a draft of a patch to make this work with memoryless nodes.
 
 Hi Christoph, this should be tested instead of Joonsoo's patch 2 (and 3)?

Hello,

I guess that your system has another problem that makes my patches ineffective.
Maybe it will also affect Christoph's one. Could you confirm page_to_nid(),
numa_mem_id() and node_present_pages(), although I mostly suspect
page_to_nid()?
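Something like the quick debug hack below (illustrative only, not meant
for merging), dropped into a convenient init path, would show whether
those three agree on your system:

	/* illustrative debug hack: check node reporting for a fresh page */
	struct page *page = alloc_pages_node(numa_node_id(), GFP_KERNEL, 0);

	if (page) {
		pr_info("cpu %d: node_id %d mem_id %d page_to_nid %d present %lu\n",
			raw_smp_processor_id(), numa_node_id(), numa_mem_id(),
			page_to_nid(page),
			node_present_pages(page_to_nid(page)));
		__free_pages(page, 0);
	}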

Thanks.


Re: [RFC PATCH 3/3] slub: fallback to get_numa_mem() node if we want to allocate on memoryless node

2014-02-09 Thread Joonsoo Kim
On Fri, Feb 07, 2014 at 11:49:57AM -0600, Christoph Lameter wrote:
 On Fri, 7 Feb 2014, Joonsoo Kim wrote:
 
   This check would need to be something that checks for other contingencies
   in the page allocator as well. A simple solution would be to actually run
   a GFP_THISNODE alloc to see if you can grab a page from the proper node.
   If that fails then fall back. See how fallback_alloc() does it in slab.
  
 
  Hello, Christoph.
 
  This !node_present_pages() ensures that allocation on this node cannot
  succeed, so we can directly use numa_mem_id() here.
 
 Yes of course we can use numa_mem_id().
 
 But the check is only for not having any memory at all on a node. There
 are other reason for allocations to fail on a certain node. The node could
 have memory that cannot be reclaimed, all dirty, beyond certain
 thresholds, not in the current set of allowed nodes etc etc.

Yes, there are many other cases, but I prefer that we consider them separately.
Maybe they need another approach. For now, to solve the memoryless node problem,
my solution is sufficient and safe.

Thanks.


Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-09 Thread Joonsoo Kim
On Fri, Feb 07, 2014 at 12:51:07PM -0600, Christoph Lameter wrote:
 Here is a draft of a patch to make this work with memoryless nodes.
 
 The first thing is that we modify node_match to also match if we hit an
 empty node. In that case we simply take the current slab if its there.

Why not inspect whether we can get the page from the best node, such as the
numa_mem_id() node?

 
 If there is no current slab then a regular allocation occurs with the
 memoryless node. The page allocator will fallback to a possible node and
 that will become the current slab. Next alloc from a memoryless node
 will then use that slab.
 
 For that we also add some tracking of allocations on nodes that were not
 satisfied using the empty_node[] array. A successful alloc on a node
 clears that flag.
 
 I would rather avoid the empty_node[] array since its global and there may
 be thread specific allocation restrictions but it would be expensive to do
 an allocation attempt via the page allocator to make sure that there is
 really no page available from the page allocator.
 
 Index: linux/mm/slub.c
 ===
 --- linux.orig/mm/slub.c  2014-02-03 13:19:22.896853227 -0600
 +++ linux/mm/slub.c   2014-02-07 12:44:49.311494806 -0600
 @@ -132,6 +132,8 @@ static inline bool kmem_cache_has_cpu_pa
  #endif
  }
 
 +static int empty_node[MAX_NUMNODES];
 +
  /*
   * Issues still to be resolved:
   *
 @@ -1405,16 +1407,22 @@ static struct page *new_slab(struct kmem
   void *last;
   void *p;
   int order;
 + int alloc_node;
 
   BUG_ON(flags & GFP_SLAB_BUG_MASK);
 
   page = allocate_slab(s,
   flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
 - if (!page)
 + if (!page) {
 + if (node != NUMA_NO_NODE)
 + empty_node[node] = 1;
   goto out;
 + }

empty_node cannot be set for a memoryless node, since the page allocation would
succeed on a different node.
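
To illustrate the point (just a sketch of the sequence, not a counter-patch):
on a cpu whose local node N is memoryless,

	page = allocate_slab(s, flags, N);
	/* the page allocator falls back to a node with memory, so page != NULL
	 * and the !page branch above is never taken; empty_node[N] stays 0
	 * even though node N itself can never satisfy the request */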

Thanks.


Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory

2014-02-07 Thread Joonsoo Kim
On Thu, Feb 06, 2014 at 11:28:12AM -0800, Nishanth Aravamudan wrote:
 On 06.02.2014 [10:59:55 -0800], Nishanth Aravamudan wrote:
  On 06.02.2014 [17:04:18 +0900], Joonsoo Kim wrote:
   On Wed, Feb 05, 2014 at 06:07:57PM -0800, Nishanth Aravamudan wrote:
On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
 On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
 
  Thank you for clarifying and providing  a test patch. I ran with 
  this on
  the system showing the original problem, configured to have 15GB of
  memory.
  
  With your patch after boot:
  
  MemTotal:   15604736 kB
  MemFree: 8768192 kB
  Slab:3882560 kB
  SReclaimable: 105408 kB
  SUnreclaim:  3777152 kB
  
  With Anton's patch after boot:
  
  MemTotal:   15604736 kB
  MemFree:11195008 kB
  Slab:1427968 kB
  SReclaimable: 109184 kB
  SUnreclaim:  1318784 kB
  
  
  I know that's fairly unscientific, but the numbers are 
  reproducible. 
  
 
 I don't think the goal of the discussion is to reduce the amount of 
 slab 
 allocated, but rather get the most local slab memory possible by use 
 of 
 kmalloc_node().  When a memoryless node is being passed to 
 kmalloc_node(), 
 which is probably cpu_to_node() for a cpu bound to a node without 
 memory, 
 my patch is allocating it on the most local node; Anton's patch is 
 allocating it on whatever happened to be the cpu slab.
 
   diff --git a/mm/slub.c b/mm/slub.c
   --- a/mm/slub.c
   +++ b/mm/slub.c
   @@ -2278,10 +2278,14 @@ redo:
   
 if (unlikely(!node_match(page, node))) {
 stat(s, ALLOC_NODE_MISMATCH);
    - deactivate_slab(s, page, c->freelist);
    - c->page = NULL;
    - c->freelist = NULL;
    - goto new_slab;
    + if (unlikely(!node_present_pages(node)))
    + node = numa_mem_id();
    + if (!node_match(page, node)) {
    + deactivate_slab(s, page, c->freelist);
    + c->page = NULL;
    + c->freelist = NULL;
   + goto new_slab;
   + }
  
  Semantically, and please correct me if I'm wrong, this patch is 
  saying
  if we have a memoryless node, we expect the page's locality to be 
  that
  of numa_mem_id(), and we still deactivate the slab if that isn't 
  true.
  Just wanting to make sure I understand the intent.
  
 
 Yeah, the default policy should be to fallback to local memory if the 
 node 
 passed is memoryless.
 
  What I find odd is that there are only 2 nodes on this system, node   0
  (empty) and node 1. So won't numa_mem_id() always be 1? And every 
  page
  should be coming from node 1 (thus node_match() should always be 
  true?)
  
 
 The nice thing about slub is its debugging ability, what is 
 /sys/kernel/slab/<cache>/objects showing in comparison between the two 
 patches?

Ok, I finally got around to writing a script that compares the objects
output from both kernels.

log1 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
and Joonsoo's patch.

log2 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
and Anton's patch.

slab                   objects   objects   percent
                       log1      log2      change
---------------------------------------------------
:t-104                 71190     85680     20.353982 %
UDP                    4352      3392      22.058824 %
inode_cache            54302     41923     22.796582 %
fscache_cookie_jar     3276      2457      25.00 %
:t-896                 438       292       33.33 %
:t-080                 310401    195323    37.073978 %
ext4_inode_cache       335       201       40.00 %
:t-192                 89408     128898    44.168307 %
:t-184                 151300    81880     45.882353 %
:t-512                 49698     73648     48.191074 %
:at-192                242867    120948    50.199904 %
xfs_inode              34350     15221     55.688501 %
:t-0016384             11005     17257     56.810541 %
proc_inode_cache       103868    34717     66.575846 %
tw_sock_TCP            768       256       66.67 %
:t-0004096             15240     25672     68.451444 %
nfs_inode_cache        1008      315       68.75 %
:t-0001024             14528     24720     70.154185 %
:t-0032768             6551312

[RFC PATCH 3/3] slub: fallback to get_numa_mem() node if we want to allocate on memoryless node

2014-02-06 Thread Joonsoo Kim
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/slub.c b/mm/slub.c
index cc1f995..c851f82 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1700,6 +1700,14 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
flags, int node,
void *object;
int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
 
+   if (node == NUMA_NO_NODE)
+   searchnode = numa_mem_id();
+   else {
+   searchnode = node;
+   if (!node_present_pages(node))
+   searchnode = get_numa_mem(node);
+   }
+
object = get_partial_node(s, get_node(s, searchnode), c, flags);
if (object || node != NUMA_NO_NODE)
return object;
@@ -2277,11 +2285,18 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t 
gfpflags, int node,
 redo:
 
if (unlikely(!node_match(page, node))) {
-   stat(s, ALLOC_NODE_MISMATCH);
-   deactivate_slab(s, page, c->freelist);
-   c->page = NULL;
-   c->freelist = NULL;
-   goto new_slab;
+   int searchnode = node;
+
+   if (node != NUMA_NO_NODE && !node_present_pages(node))
+   searchnode = get_numa_mem(node);
+
+   if (!node_match(page, searchnode)) {
+   stat(s, ALLOC_NODE_MISMATCH);
+   deactivate_slab(s, page, c->freelist);
+   c->page = NULL;
+   c->freelist = NULL;
+   goto new_slab;
+   }
}
 
/*
-- 
1.7.9.5



[RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-06 Thread Joonsoo Kim
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 12ae6ce..a6d5438 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -233,11 +233,20 @@ static inline int numa_node_id(void)
  * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
  */
 DECLARE_PER_CPU(int, _numa_mem_);
+int _node_numa_mem_[MAX_NUMNODES];
 
 #ifndef set_numa_mem
 static inline void set_numa_mem(int node)
 {
this_cpu_write(_numa_mem_, node);
+   _node_numa_mem_[numa_node_id()] = node;
+}
+#endif
+
+#ifndef get_numa_mem
+static inline int get_numa_mem(int node)
+{
+   return _node_numa_mem_[node];
 }
 #endif
 
@@ -260,6 +269,7 @@ static inline int cpu_to_mem(int cpu)
 static inline void set_cpu_numa_mem(int cpu, int node)
 {
per_cpu(_numa_mem_, cpu) = node;
+   _node_numa_mem_[numa_node_id()] = node;
 }
 #endif
 
@@ -273,6 +283,13 @@ static inline int numa_mem_id(void)
 }
 #endif
 
+#ifndef get_numa_mem
+static inline int get_numa_mem(int node)
+{
+   return node;
+}
+#endif
+
 #ifndef cpu_to_mem
 static inline int cpu_to_mem(int cpu)
 {
-- 
1.7.9.5



[RFC PATCH 1/3] slub: search partial list on numa_mem_id(), instead of numa_node_id()

2014-02-06 Thread Joonsoo Kim
Currently, if the allocation constraint for the node is NUMA_NO_NODE, we search
for a partial slab on the numa_node_id() node. This doesn't work properly on a
system with a memoryless node, since that node has no memory and therefore can
never have a partial slab.

On such a node, page allocation always falls back to numa_mem_id() first. So
searching for a partial slab on numa_mem_id() in that case is the proper
solution for the memoryless node case.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/slub.c b/mm/slub.c
index 545a170..cc1f995 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1698,7 +1698,7 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
flags, int node,
struct kmem_cache_cpu *c)
 {
void *object;
-   int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
+   int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
 
object = get_partial_node(s, get_node(s, searchnode), c, flags);
if (object || node != NUMA_NO_NODE)
-- 
1.7.9.5
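
For context, a small illustration of why the search node matters (node numbers
are assumed: a cpu attached to memoryless node 0 whose nearest memory is on
node 1):

	numa_node_id();	/* 0: no memory here, so no partial slab can ever exist on this node */
	numa_mem_id();	/* 1: the node the page allocator actually falls back to */

Searching the partial list of numa_mem_id() therefore matches where the slabs
really end up.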



Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-06 Thread Joonsoo Kim
2014-02-06 David Rientjes rient...@google.com:
 On Thu, 6 Feb 2014, Joonsoo Kim wrote:

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com


 I may be misunderstanding this patch and there's no help because there's
 no changelog.

Sorry about that.
I made this patch just for testing. :)
Thanks for looking at this.

 diff --git a/include/linux/topology.h b/include/linux/topology.h
 index 12ae6ce..a6d5438 100644
 --- a/include/linux/topology.h
 +++ b/include/linux/topology.h
 @@ -233,11 +233,20 @@ static inline int numa_node_id(void)
   * Use the accessor functions set_numa_mem(), numa_mem_id() and 
 cpu_to_mem().
   */
  DECLARE_PER_CPU(int, _numa_mem_);
 +int _node_numa_mem_[MAX_NUMNODES];

  #ifndef set_numa_mem
  static inline void set_numa_mem(int node)
  {
   this_cpu_write(_numa_mem_, node);
 + _node_numa_mem_[numa_node_id()] = node;
 +}
 +#endif
 +
 +#ifndef get_numa_mem
 +static inline int get_numa_mem(int node)
 +{
 + return _node_numa_mem_[node];
  }
  #endif

 @@ -260,6 +269,7 @@ static inline int cpu_to_mem(int cpu)
  static inline void set_cpu_numa_mem(int cpu, int node)
  {
   per_cpu(_numa_mem_, cpu) = node;
 + _node_numa_mem_[numa_node_id()] = node;

 The intention seems to be that _node_numa_mem_[X] for a node X will return
 a node Y with memory that has the nearest distance?  In other words,
 caching the value returned by local_memory_node(X)?

Yes, you are right.

 That doesn't seem to be what it's doing since numa_node_id() is the node
 of the cpu that current is running on so this ends up getting initialized
 to whatever local_memory_node(cpu_to_node(cpu)) is for the last bit set in
 cpu_possible_mask.

Yes, I made a mistake.
Thanks for the pointer.
I fixed it and attached v2.
Now I'm out of the office, so I'm not sure this second version is correct :(

Thanks.

--8--
From bf691e7eb07f966e3aed251eaeb18f229ee32d1f Mon Sep 17 00:00:00 2001
From: Joonsoo Kim iamjoonsoo@lge.com
Date: Thu, 6 Feb 2014 17:07:05 +0900
Subject: [RFC PATCH 2/3 v2] topology: support node_numa_mem() for
determining the
 fallback node

We need to determine the fallback node in the slub allocator if the allocation
target node is a memoryless node. Without it, SLUB wrongly selects a node which
has no memory and can't use a partial slab, because of the node mismatch. The
introduced function, node_numa_mem(X), returns a node Y with memory that has
the nearest distance. If X is a memoryless node, it returns the nearest node
with memory; if X is a normal node, it returns itself.

We will use this function in following patch to determine the fallback
node.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 12ae6ce..66b19b8 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -233,11 +233,20 @@ static inline int numa_node_id(void)
  * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
  */
 DECLARE_PER_CPU(int, _numa_mem_);
+int _node_numa_mem_[MAX_NUMNODES];

 #ifndef set_numa_mem
 static inline void set_numa_mem(int node)
 {
  this_cpu_write(_numa_mem_, node);
+ _node_numa_mem_[numa_node_id()] = node;
+}
+#endif
+
+#ifndef get_numa_mem
+static inline int get_numa_mem(int node)
+{
+ return _node_numa_mem_[node];
 }
 #endif

@@ -260,6 +269,7 @@ static inline int cpu_to_mem(int cpu)
 static inline void set_cpu_numa_mem(int cpu, int node)
 {
  per_cpu(_numa_mem_, cpu) = node;
+ _node_numa_mem_[cpu_to_node(cpu)] = node;
 }
 #endif

@@ -273,6 +283,13 @@ static inline int numa_mem_id(void)
 }
 #endif

+#ifndef get_numa_mem
+static inline int get_numa_mem(int node)
+{
+ return node;
+}
+#endif
+
 #ifndef cpu_to_mem
 static inline int cpu_to_mem(int cpu)
 {
-- 
1.7.9.5
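
As a usage illustration (not part of the patch; node numbers are assumed, with
node 0 memoryless and node 1 its nearest node with memory):

	/* boot-time setup fills the table through set_numa_mem()/set_cpu_numa_mem():
	 *   _node_numa_mem_[0] == 1   (fallback for the memoryless node)
	 *   _node_numa_mem_[1] == 1   (a normal node maps to itself)
	 */
	get_numa_mem(0);	/* returns 1 */
	get_numa_mem(1);	/* returns 1 */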


Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-06 Thread Joonsoo Kim
On Thu, Feb 06, 2014 at 11:11:31AM -0800, Nishanth Aravamudan wrote:
  diff --git a/include/linux/topology.h b/include/linux/topology.h
  index 12ae6ce..66b19b8 100644
  --- a/include/linux/topology.h
  +++ b/include/linux/topology.h
  @@ -233,11 +233,20 @@ static inline int numa_node_id(void)
* Use the accessor functions set_numa_mem(), numa_mem_id() and 
  cpu_to_mem().
*/
   DECLARE_PER_CPU(int, _numa_mem_);
  +int _node_numa_mem_[MAX_NUMNODES];
 
 Should be static, I think?

Yes, will update it.

Thanks.


Re: [RFC PATCH 3/3] slub: fallback to get_numa_mem() node if we want to allocate on memoryless node

2014-02-06 Thread Joonsoo Kim
On Thu, Feb 06, 2014 at 11:30:20AM -0600, Christoph Lameter wrote:
 On Thu, 6 Feb 2014, Joonsoo Kim wrote:
 
  diff --git a/mm/slub.c b/mm/slub.c
  index cc1f995..c851f82 100644
  --- a/mm/slub.c
  +++ b/mm/slub.c
  @@ -1700,6 +1700,14 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
  flags, int node,
  void *object;
  int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
 
  +   if (node == NUMA_NO_NODE)
  +   searchnode = numa_mem_id();
  +   else {
  +   searchnode = node;
  +   if (!node_present_pages(node))
 
 This check would need to be something that checks for other contingencies
 in the page allocator as well. A simple solution would be to actually run
 a GFP_THISNODE alloc to see if you can grab a page from the proper node.
 If that fails then fall back. See how fallback_alloc() does it in slab.
 

Hello, Christoph.

This !node_present_pages() ensures that allocation on this node cannot succeed,
so we can directly use numa_mem_id() here.

  +   searchnode = get_numa_mem(node);
  +   }
 
  @@ -2277,11 +2285,18 @@ static void *__slab_alloc(struct kmem_cache *s, 
  gfp_t gfpflags, int node,
   redo:
 
  if (unlikely(!node_match(page, node))) {
  -   stat(s, ALLOC_NODE_MISMATCH);
  -   deactivate_slab(s, page, c->freelist);
  -   c->page = NULL;
  -   c->freelist = NULL;
  -   goto new_slab;
  +   int searchnode = node;
  +
  +   if (node != NUMA_NO_NODE && !node_present_pages(node))
 
 Same issue here. I would suggest not deactivating the slab and first check
 if the node has no pages. If so then just take an object from the current
 cpu slab. If that is not available do an allocation from the indicated
 node and take whatever the page allocator gave you.

What I do here is precisely not to deactivate the slab. I first check whether
the node has no pages. And then, instead of taking an object from the current
cpu slab right away, I check whether the current cpu slab comes from the proper
node obtained from the introduced get_numa_mem(). I think this approach is
better than just handing out an object regardless of which node was requested.
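
A short sketch of that flow (only a restatement of the hunk above, not new
code):

	if (!node_match(page, node)) {
		if (node != NUMA_NO_NODE && !node_present_pages(node))
			node = get_numa_mem(node);	/* remap the memoryless node */
		if (node_match(page, node))
			/* keep the current cpu slab: it already sits on the fallback node */;
		else
			/* deactivate and allocate from the requested node */;
	}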

Thanks.


Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-06 Thread Joonsoo Kim
On Thu, Feb 06, 2014 at 12:52:11PM -0800, David Rientjes wrote:
 On Thu, 6 Feb 2014, Joonsoo Kim wrote:
 
  From bf691e7eb07f966e3aed251eaeb18f229ee32d1f Mon Sep 17 00:00:00 2001
  From: Joonsoo Kim iamjoonsoo@lge.com
  Date: Thu, 6 Feb 2014 17:07:05 +0900
  Subject: [RFC PATCH 2/3 v2] topology: support node_numa_mem() for
  determining the
   fallback node
  
  We need to determine the fallback node in the slub allocator if the allocation
  target node is a memoryless node. Without it, SLUB wrongly selects a node which
  has no memory and can't use a partial slab, because of the node mismatch. The
  introduced function, node_numa_mem(X), returns a node Y with memory that has
  the nearest distance. If X is a memoryless node, it returns the nearest node
  with memory; if X is a normal node, it returns itself.
  
  We will use this function in following patch to determine the fallback
  node.
  
 
 I like the approach and it may fix the problem today, but it may not be 
 sufficient in the future: nodes may not only be memoryless but they may 
 also be cpuless.  It's possible that a node can only have I/O, networking, 
 or storage devices and we can define affinity for them that is remote from 
 every cpu and/or memory by the ACPI specification.
 
 It seems like a better approach would be to do this when a node is brought 
 online and determine the fallback node based not on the zonelists as you 
 do here but rather on locality (such as through a SLIT if provided, see 
 node_distance()).

Hmm...
I guess that the zonelist is based on locality. The zonelist is generated
using node_distance(), so I think it reflects locality. But I'm not an
expert on NUMA, so please let me know what I am missing here :)

 Also, the names aren't very descriptive: {get,set}_numa_mem() doesn't make 
 a lot of sense in generic code.  I'd suggest something like 
 node_to_mem_node().

It's much better!
If this patch eventually turns out to be needed, I will update it.

Thanks.


Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory

2014-01-26 Thread Joonsoo Kim
On Fri, Jan 24, 2014 at 05:10:42PM -0800, Nishanth Aravamudan wrote:
 On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
  On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
  
   Thank you for clarifying and providing  a test patch. I ran with this on
   the system showing the original problem, configured to have 15GB of
   memory.
   
   With your patch after boot:
   
   MemTotal:   15604736 kB
   MemFree: 8768192 kB
   Slab:3882560 kB
   SReclaimable: 105408 kB
   SUnreclaim:  3777152 kB
   
   With Anton's patch after boot:
   
   MemTotal:   15604736 kB
   MemFree:11195008 kB
   Slab:1427968 kB
   SReclaimable: 109184 kB
   SUnreclaim:  1318784 kB
   
   
   I know that's fairly unscientific, but the numbers are reproducible. 
   

Hello,

I think that there is one mistake in David's patch, although I'm not sure
that it is the reason for this result.

With David's patch, get_partial() in new_slab_objects() doesn't work properly,
because we only change the node id in the !node_match() case. If we hit just the
!freelist case, we pass the node id directly to new_slab_objects(), so we always
try to allocate a new slab page regardless of the existence of partial pages.
We should solve that.

Could you try this one?

Thanks.

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1698,8 +1698,10 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
flags, int node,
struct kmem_cache_cpu *c)
 {
void *object;
-   int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
+   int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
 
+   if (node != NUMA_NO_NODE && !node_present_pages(node))
+   searchnode = numa_mem_id();
object = get_partial_node(s, get_node(s, searchnode), c, flags);
if (object || node != NUMA_NO_NODE)
return object;
@@ -2278,10 +2280,14 @@ redo:
 
if (unlikely(!node_match(page, node))) {
stat(s, ALLOC_NODE_MISMATCH);
-   deactivate_slab(s, page, c->freelist);
-   c->page = NULL;
-   c->freelist = NULL;
-   goto new_slab;
+   if (unlikely(!node_present_pages(node)))
+   node = numa_mem_id();
+   if (!node_match(page, node)) {
+   deactivate_slab(s, page, c->freelist);
+   c->page = NULL;
+   c->freelist = NULL;
+   goto new_slab;
+   }
}
 
/*



Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory

2014-01-08 Thread Joonsoo Kim
On Tue, Jan 07, 2014 at 05:52:31PM +0800, Wanpeng Li wrote:
 On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
 On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:

  Index: b/mm/slub.c
  ===
  --- a/mm/slub.c
  +++ b/mm/slub.c
  @@ -2278,10 +2278,17 @@ redo:
   
 if (unlikely(!node_match(page, node))) {
 stat(s, ALLOC_NODE_MISMATCH);
  -  deactivate_slab(s, page, c->freelist);
  -  c->page = NULL;
  -  c->freelist = NULL;
  -  goto new_slab;
  +
  +  /*
  +   * If the node contains no memory there is no point in trying
  +   * to allocate a new node local slab
  +   */
  +  if (node_spanned_pages(node)) {
  +  deactivate_slab(s, page, c->freelist);
  +  c->page = NULL;
  +  c->freelist = NULL;
  +  goto new_slab;
  +  }
 }
   
 /*
 
 Hello,
 
 I think that we need more efforts to solve unbalanced node problem.
 
 With this patch, even if node of current cpu slab is not favorable to
 unbalanced node, allocation would proceed and we would get the unintended 
 memory.
 
 And there is one more problem. Even if we have some partial slabs on
 compatible node, we would allocate new slab, because get_partial() cannot 
 handle
 this unbalance node case.
 
 To fix this correctly, how about following patch?
 
 Thanks.
 
 -8
 diff --git a/mm/slub.c b/mm/slub.c
 index c3eb3d3..a1f6dfa 100644
 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
 flags, int node,
  {
 void *object;
 int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
 +   struct zonelist *zonelist;
 +   struct zoneref *z;
 +   struct zone *zone;
 +   enum zone_type high_zoneidx = gfp_zone(flags);
 
 +   if (!node_present_pages(searchnode)) {
 +   zonelist = node_zonelist(searchnode, flags);
 +   for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
 +   searchnode = zone_to_nid(zone);
 +   if (node_present_pages(searchnode))
 +   break;
 +   }
 +   }
 
 Why change searchnode instead of depending on fallback zones/nodes in 
 get_any_partial() to allocate partial slabs?
 

If node != NUMA_NO_NODE, get_any_partial() isn't called.
That's why I change searchnode here instead of get_any_partial().

Thanks.


Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory

2014-01-07 Thread Joonsoo Kim
On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
 
 We noticed a huge amount of slab memory consumed on a large ppc64 box:
 
 Slab:2094336 kB
 
 Almost 2GB. This box is not balanced and some nodes do not have local
 memory, causing slub to be very inefficient in its slab usage.
 
 Each time we call kmem_cache_alloc_node slub checks the per cpu slab,
 sees it isn't node local, deactivates it and tries to allocate a new
 slab. On empty nodes we will allocate a new remote slab and use the
 first slot, but as explained above when we get called a second time
 we will just deactivate that slab and retry.
 
 As such we end up only using 1 entry in each slab:
 
 slab                      mem      objects
                           used     active
 
 kmalloc-16384          1404 MB      4.90%
 task_struct             668 MB      2.90%
 kmalloc-128             193 MB      3.61%
 kmalloc-192             152 MB      5.23%
 kmalloc-8192             72 MB     23.40%
 kmalloc-16               64 MB      7.43%
 kmalloc-512              33 MB     22.41%
 
 The patch below checks that a node is not empty before deactivating a
 slab and trying to allocate it again. With this patch applied we now
 use about 352MB:
 
 Slab: 360192 kB
 
 And our efficiency is much better:
 
 slab                      mem      objects
                           used     active
 
 kmalloc-16384 92 MB   74.27%
 task_struct   23 MB   83.46%
 idr_layer_cache   18 MB  100.00%
 pgtable-2^12  17 MB  100.00%
 kmalloc-65536 15 MB  100.00%
 inode_cache   14 MB  100.00%
 kmalloc-256   14 MB   97.81%
 kmalloc-8192  14 MB   85.71%
 
 Signed-off-by: Anton Blanchard an...@samba.org
 ---
 
 Thoughts? It seems like we could hit a similar situation if a machine
 is balanced but we run out of memory on a single node.
 
 Index: b/mm/slub.c
 ===
 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -2278,10 +2278,17 @@ redo:
  
   if (unlikely(!node_match(page, node))) {
   stat(s, ALLOC_NODE_MISMATCH);
 - deactivate_slab(s, page, c->freelist);
 - c->page = NULL;
 - c->freelist = NULL;
 - goto new_slab;
 +
 + /*
 +  * If the node contains no memory there is no point in trying
 +  * to allocate a new node local slab
 +  */
 + if (node_spanned_pages(node)) {
 + deactivate_slab(s, page, c->freelist);
 + c->page = NULL;
 + c->freelist = NULL;
 + goto new_slab;
 + }
   }
  
   /*

Hello,

I think that we need more effort to solve the unbalanced node problem.

With this patch, even if the node of the current cpu slab is not favorable for
the unbalanced node, the allocation would proceed and we would get unintended
memory.

And there is one more problem. Even if we have some partial slabs on a
compatible node, we would allocate a new slab, because get_partial() cannot
handle this unbalanced node case.

To fix this correctly, how about the following patch?

Thanks.

-8
diff --git a/mm/slub.c b/mm/slub.c
index c3eb3d3..a1f6dfa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
flags, int node,
 {
void *object;
int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
+   struct zonelist *zonelist;
+   struct zoneref *z;
+   struct zone *zone;
+   enum zone_type high_zoneidx = gfp_zone(flags);
 
+   if (!node_present_pages(searchnode)) {
+   zonelist = node_zonelist(searchnode, flags);
+   for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+   searchnode = zone_to_nid(zone);
+   if (node_present_pages(searchnode))
+   break;
+   }
+   }
object = get_partial_node(s, get_node(s, searchnode), c, flags);
if (object || node != NUMA_NO_NODE)
return object;


Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory

2014-01-07 Thread Joonsoo Kim
On Tue, Jan 07, 2014 at 04:48:40PM +0800, Wanpeng Li wrote:
 Hi Joonsoo,
 On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
 On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
  
 [...]
 Hello,
 
 I think that we need more efforts to solve unbalanced node problem.
 
 With this patch, even if node of current cpu slab is not favorable to
 unbalanced node, allocation would proceed and we would get the unintended 
 memory.
 
 
 We have a machine:
 
 [0.00] Node 0 Memory:
 [0.00] Node 4 Memory: 0x0-0x1000 0x2000-0x6000 
 0x8000-0xc000
 [0.00] Node 6 Memory: 0x1000-0x2000 0x6000-0x8000
 [0.00] Node 10 Memory: 0xc000-0x18000
 
 [0.041486] Node 0 CPUs: 0-19
 [0.041490] Node 4 CPUs:
 [0.041492] Node 6 CPUs:
 [0.041495] Node 10 CPUs:
 
 The pages of current cpu slab should be allocated from fallback zones/nodes 
 of the memoryless node in buddy system, how can not favorable happen? 

Hi, Wanpeng.

IIRC, if we call kmem_cache_alloc_node() with a certain node #, we try to
allocate the page from the fallback zones/nodes of that node #. So that fallback
list isn't related to the fallback list of the memoryless node #. Am I wrong?

Thanks.

 
 And there is one more problem. Even if we have some partial slabs on
 compatible node, we would allocate new slab, because get_partial() cannot 
 handle
 this unbalance node case.
 
 To fix this correctly, how about following patch?
 
 
 So I think we should fold both of your two patches to one.
 
 Regards,
 Wanpeng Li 
 
 Thanks.
 
 -8
 diff --git a/mm/slub.c b/mm/slub.c
 index c3eb3d3..a1f6dfa 100644
 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t 
 flags, int node,
  {
 void *object;
 int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
 +   struct zonelist *zonelist;
 +   struct zoneref *z;
 +   struct zone *zone;
 +   enum zone_type high_zoneidx = gfp_zone(flags);
 
 +   if (!node_present_pages(searchnode)) {
 +   zonelist = node_zonelist(searchnode, flags);
 +   for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
 +   searchnode = zone_to_nid(zone);
 +   if (node_present_pages(searchnode))
 +   break;
 +   }
 +   }
 object = get_partial_node(s, get_node(s, searchnode), c, flags);
 if (object || node != NUMA_NO_NODE)
 return object;
 


Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory

2014-01-07 Thread Joonsoo Kim
On Tue, Jan 07, 2014 at 05:21:45PM +0800, Wanpeng Li wrote:
 On Tue, Jan 07, 2014 at 06:10:16PM +0900, Joonsoo Kim wrote:
 On Tue, Jan 07, 2014 at 04:48:40PM +0800, Wanpeng Li wrote:
  Hi Joonsoo,
  On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
  On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
   
  [...]
  Hello,
  
  I think that we need more efforts to solve unbalanced node problem.
  
  With this patch, even if node of current cpu slab is not favorable to
  unbalanced node, allocation would proceed and we would get the unintended 
  memory.
  
  
  We have a machine:
  
  [0.00] Node 0 Memory:
  [0.00] Node 4 Memory: 0x0-0x1000 0x2000-0x6000 
  0x8000-0xc000
  [0.00] Node 6 Memory: 0x1000-0x2000 0x6000-0x8000
  [0.00] Node 10 Memory: 0xc000-0x18000
  
  [0.041486] Node 0 CPUs: 0-19
  [0.041490] Node 4 CPUs:
  [0.041492] Node 6 CPUs:
  [0.041495] Node 10 CPUs:
  
  The pages of current cpu slab should be allocated from fallback 
  zones/nodes 
  of the memoryless node in buddy system, how can not favorable happen? 
 
 Hi, Wanpeng.
 
 IIRC, if we call kmem_cache_alloc_node() with certain node #, we try to
 allocate the page in fallback zones/node of that node #. So fallback list 
 isn't
 related to fallback one of memoryless node #. Am I wrong?
 
 
 Anton added the node_spanned_pages(node) check, so the current cpu slab
 mentioned above is for the memoryless node. Am I missing something?

I was thinking of the following scenario.

memoryless node # : 1
1's fallback node # : 0

On node 1's cpu,

1. kmem_cache_alloc_node (node 2)
2. allocate the page on node 2 for the slab; now the cpu slab is that one.
3. kmem_cache_alloc_node (local node, that is, node 1)
4. it checks node_spanned_pages(), finds that node 1 is memoryless,
   and so returns node 2's memory.

Is this scenario impossible?
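
In code form (a sketch of the steps above; the cache pointer and GFP flags are
assumed):

	/* running on a cpu of memoryless node 1, whose real fallback is node 0 */
	p1 = kmem_cache_alloc_node(s, GFP_KERNEL, 2);
	/* the cpu slab is now backed by a node 2 page */
	p2 = kmem_cache_alloc_node(s, GFP_KERNEL, 1);
	/* node_spanned_pages(1) == 0, so the mismatch is ignored and p2 comes
	 * from the node 2 cpu slab instead of node 1's fallback, node 0 */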

Thanks.



[PATCH] powerpc: change free_bootmem() to kfree()

2012-11-12 Thread Joonsoo Kim
commit ea96025a ('Don't use alloc_bootmem() in init_IRQ() path')
changed alloc_bootmem() to kzalloc(),
but missed changing free_bootmem() to kfree().
So correct it.

Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/arch/powerpc/platforms/82xx/pq2ads-pci-pic.c 
b/arch/powerpc/platforms/82xx/pq2ads-pci-pic.c
index 328d221..74861a7 100644
--- a/arch/powerpc/platforms/82xx/pq2ads-pci-pic.c
+++ b/arch/powerpc/platforms/82xx/pq2ads-pci-pic.c
@@ -16,7 +16,6 @@
 #include <linux/spinlock.h>
 #include <linux/irq.h>
 #include <linux/types.h>
-#include <linux/bootmem.h>
 #include <linux/slab.h>
 
 #include <asm/io.h>
@@ -149,7 +148,7 @@ int __init pq2ads_pci_init_irq(void)
priv->regs = of_iomap(np, 0);
if (!priv->regs) {
printk(KERN_ERR "Cannot map PCI PIC registers.\n");
-   goto out_free_bootmem;
+   goto out_free_kmalloc;
}
 
/* mask all PCI interrupts */
@@ -171,9 +170,8 @@ int __init pq2ads_pci_init_irq(void)
 
 out_unmap_regs:
iounmap(priv->regs);
-out_free_bootmem:
-   free_bootmem((unsigned long)priv,
-sizeof(struct pq2ads_pci_pic));
+out_free_kmalloc:
+   kfree(priv);
of_node_put(np);
 out_unmap_irq:
irq_dispose_mapping(irq);
-- 
1.7.9.5
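
The rule the fix restores (a trivial illustration, not part of the patch):
memory obtained from kzalloc() has to be returned with kfree(), not handed to
the bootmem allocator.

	priv = kzalloc(sizeof(*priv), GFP_KERNEL);	/* since commit ea96025a */
	/* ... */
	kfree(priv);					/* instead of free_bootmem() */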
