[PATCHv2] staging: go7007: fix use of uninitialised pointer
The go variable is initialised only after the switch statement, so it
cannot be dereferenced before that happens.

Signed-off-by: Michal Nazarewicz <min...@mina86.com>
---
 drivers/staging/media/go7007/go7007-usb.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

On Sun, Nov 10 2013, Dan Carpenter wrote:
> There are 3 other uses before go gets initialized.

Argh...  Other occurrences of the letters "GO" deceived my eyes.  Sorry
about that, and thanks.

diff --git a/drivers/staging/media/go7007/go7007-usb.c b/drivers/staging/media/go7007/go7007-usb.c
index 58684da..ee8ab89 100644
--- a/drivers/staging/media/go7007/go7007-usb.c
+++ b/drivers/staging/media/go7007/go7007-usb.c
@@ -1057,7 +1057,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 	char *name;
 	int video_pipe, i, v_urb_len;
 
-	dev_dbg(&go->dev, "probing new GO7007 USB board\n");
+	dev_dbg(&intf->dev, "probing new GO7007 USB board\n");
 
 	switch (id->driver_info) {
 	case GO7007_BOARDID_MATRIX_II:
@@ -1097,13 +1097,13 @@ static int go7007_usb_probe(struct usb_interface *intf,
 		board = &board_px_tv402u;
 		break;
 	case GO7007_BOARDID_LIFEVIEW_LR192:
-		dev_err(&go->dev, "The Lifeview TV Walker Ultra is not supported. Sorry!\n");
+		dev_err(&intf->dev, "The Lifeview TV Walker Ultra is not supported. Sorry!\n");
 		return -ENODEV;
 		name = "Lifeview TV Walker Ultra";
 		board = &board_lifeview_lr192;
 		break;
 	case GO7007_BOARDID_SENSORAY_2250:
-		dev_info(&go->dev, "Sensoray 2250 found\n");
+		dev_info(&intf->dev, "Sensoray 2250 found\n");
 		name = "Sensoray 2250/2251";
 		board = &board_sensoray_2250;
 		break;
@@ -1112,7 +1112,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 		board = &board_ads_usbav_709;
 		break;
 	default:
-		dev_err(&go->dev, "unknown board ID %d!\n",
+		dev_err(&intf->dev, "unknown board ID %d!\n",
 			(unsigned int)id->driver_info);
 		return -ENODEV;
 	}
-- 
1.8.4.1
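For readers not following the driver, a condensed sketch of the bug
(the function and struct names match the driver, but the body here is
heavily abridged and illustrative): the struct go7007 pointer is only
assigned once the board has been identified, so any dereference of go
before that point reads an uninitialised pointer, while the
usb_interface's embedded struct device is valid on entry -- hence the
switch to &intf->dev.

	static int go7007_usb_probe(struct usb_interface *intf,
				    const struct usb_device_id *id)
	{
		struct go7007 *go;	/* declared, not yet assigned */

		dev_dbg(&go->dev, "...\n");	/* BUG: go is uninitialised here */
		dev_dbg(&intf->dev, "...\n");	/* OK: intf is valid on entry */

		switch (id->driver_info) {
			/* board selection; go is assigned only after this */
		}
		/* ... go = <allocation based on the selected board> ... */
	}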
[PATCH] staging: go7007: fix use of uninitialised pointer
From: Michal Nazarewicz <min...@mina86.com>

The go variable is declared without initialisation, and the invocation
of dev_dbg() immediately tries to dereference it.
---
 drivers/staging/media/go7007/go7007-usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/go7007/go7007-usb.c b/drivers/staging/media/go7007/go7007-usb.c
index 58684da..457ab63 100644
--- a/drivers/staging/media/go7007/go7007-usb.c
+++ b/drivers/staging/media/go7007/go7007-usb.c
@@ -1057,7 +1057,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 	char *name;
 	int video_pipe, i, v_urb_len;
 
-	dev_dbg(&go->dev, "probing new GO7007 USB board\n");
+	pr_debug("probing new GO7007 USB board\n");
 
 	switch (id->driver_info) {
 	case GO7007_BOARDID_MATRIX_II:
-- 
1.8.3.2
[PATCH 1/7] staging: go7007: fix use of uninitialised pointer
Signed-off-by: Michal Nazarewicz <min...@mina86.com>
---
 drivers/staging/media/go7007/go7007-usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

On Sun, Nov 10 2013, Greg Kroah-Hartman wrote:
> Please either delete this entirely, or use the struct device in the
> usb_interface pointer.  A driver should never have a raw pr_* call.

diff --git a/drivers/staging/media/go7007/go7007-usb.c b/drivers/staging/media/go7007/go7007-usb.c
index 58684da..e8c708c 100644
--- a/drivers/staging/media/go7007/go7007-usb.c
+++ b/drivers/staging/media/go7007/go7007-usb.c
@@ -1057,7 +1057,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 	char *name;
 	int video_pipe, i, v_urb_len;
 
-	dev_dbg(&go->dev, "probing new GO7007 USB board\n");
+	dev_dbg(&intf->dev, "probing new GO7007 USB board\n");
 
 	switch (id->driver_info) {
 	case GO7007_BOARDID_MATRIX_II:
-- 
1.8.3.2
Re: [PATCH 2/2] dma-buf: add helpers for attacher dma-parms
Tomasz Stanislawski <t.stanisl...@samsung.com> writes:
> I recommend to change the semantics for unlimited number of segments
> from 'value 0' to:
>
>	#define DMA_SEGMENTS_COUNT_UNLIMITED ((unsigned long)INT_MAX)
>
> Using INT_MAX will allow using safe conversions between signed and
> unsigned integers.

LONG_MAX seems cleaner regardless.
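For concreteness, the two sentinel choices under discussion side by
side.  Neither is in a posted patch -- the macro name is Tomasz's
proposal, and only one of the two definitions would actually exist:

	/* Proposed: INT_MAX survives a round-trip through a signed int
	 * without changing value. */
	#define DMA_SEGMENTS_COUNT_UNLIMITED ((unsigned long)INT_MAX)

	/* Alternative from the reply: LONG_MAX, the largest value that
	 * still converts back to a non-negative signed long. */
	/* #define DMA_SEGMENTS_COUNT_UNLIMITED ((unsigned long)LONG_MAX) */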
Re: [Linaro-mm-sig] [PATCHv23 00/16] Contiguous Memory Allocator
On Wed, 29 Feb 2012 10:35:42 +0100, Barry Song <21cn...@gmail.com> wrote:
> 2012/2/23 Marek Szyprowski <m.szyprow...@samsung.com>:
>> This is (yet another) quick update of CMA patches.  I've rebased them
>> onto the next-20120222 tree from
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>> and fixed the bug pointed out by Aaro Koskinen.
>
> For the whole series:
> Tested-by: Barry Song <baohua.s...@csr.com>
>
> and i also wrote a simple kernel helper to test the CMA:

Would it make sense to make a patch out of it, putting it in tools/cma
(or similar)?
Re: [Linaro-mm-sig] [PATCHv23 00/16] Contiguous Memory Allocator
On Wed, 29 Feb 2012 14:34:02 +0100, Barry Song <21cn...@gmail.com> wrote:
> 2012/2/29 Michal Nazarewicz <min...@mina86.com>:
>> On Wed, 29 Feb 2012 10:35:42 +0100, Barry Song <21cn...@gmail.com> wrote:
>>> and i also wrote a simple kernel helper to test the CMA:
>>
>> Would it make sense to make a patch out of it, putting it in
>> tools/cma (or similar)?
>
> i can send a patch for this.  i am just thinking, should it be placed
> in tools/ as a test utility, or in Documentation/ as an example to
> explain CMA to users who want to use it?

I'd put it in tools/.

> i also think we should have a separate document explaining cma in
> detail in Documentation/, and my helper program can be placed there.

I'm currently writing an article for LWN which I hope will also lead to
something worth putting in Documentation/.
Re: [Linaro-mm-sig] [PATCHv21 12/16] mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
On Fri, Feb 10, 2012 at 11:32 AM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> @@ -5637,6 +5642,56 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> 	return ret > 0 ? 0 : ret;
>  }
>
> +/*
> + * Update zone's cma pages counter used for watermark level calculation.
> + */
> +static inline void __update_cma_wmark_pages(struct zone *zone, int count)
> +{
> +	unsigned long flags;
> +	spin_lock_irqsave(&zone->lock, flags);
> +	zone->min_cma_pages += count;
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +	setup_per_zone_wmarks();
> +}
> +
> +/*
> + * Trigger memory pressure bump to reclaim some pages in order to be able to
> + * allocate 'count' pages in single page units. Does similar work as
> + * __alloc_pages_slowpath() function.
> + */
> +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
> +{
> +	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> +	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
> +	int did_some_progress = 0;
> +	int order = 1;
> +	unsigned long watermark;
> +
> +	/*
> +	 * Increase level of watermarks to force kswapd do his job
> +	 * to stabilise at new watermark level.
> +	 */
> +	__modify_min_cma_pages(zone, count);

On Mon, 13 Feb 2012 10:57:58 -0800, Robert Nelson <robertcnel...@gmail.com> wrote:
> Hi Marek,
>
> This ^^^ function doesn't seem to exist in this patchset, is it in
> another set posted to lkml?

This should read __update_cma_wmark_pages().  Sorry for the incorrect
patch.
Re: [Linaro-mm-sig] [PATCHv21 12/16] mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
On Fri, Feb 10, 2012 at 11:32 AM, Marek Szyprowski wrote:
>>> +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
>>> +{
>>> +	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
>>> +	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
>>> +	int did_some_progress = 0;
>>> +	int order = 1;
>>> +	unsigned long watermark;
>>> +
>>> +	/*
>>> +	 * Increase level of watermarks to force kswapd do his job
>>> +	 * to stabilise at new watermark level.
>>> +	 */
>>> +	__modify_min_cma_pages(zone, count);

2012/2/13 Michal Nazarewicz <min...@mina86.com>:
>> This should read __update_cma_wmark_pages().  Sorry for the incorrect
>> patch.

On Mon, 13 Feb 2012 13:15:13 -0800, Robert Nelson <robertcnel...@gmail.com> wrote:
> Thanks Michal, that fixed it..

You are most welcome.

> cma-v21 with Rob Clark's omapdrm works great on the Beagle xM..

Can we take that as:

	Tested-by: Robert Nelson <robertcnel...@gmail.com>

? ;)
Re: [Linaro-mm-sig] [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks
On Wed, 08 Feb 2012 03:04:18 +0100, sandeep patil <psandee...@gmail.com> wrote:
> There's another problem I am facing with zone watermarks and CMA.
>
> Test details:
>   Memory      : 480 MB of total memory, 128 MB CMA region
>   Test case   : around 600 MB of file transfer over USB RNDIS onto target
>   System load : ftpd with console running on target.  No one is doing
>                 CMA allocations except for the DMA allocations done by
>                 the drivers.
>   Result      : After about 300 MB transfer, I start getting GFP_ATOMIC
>                 allocation failures.  This only happens if the CMA
>                 region is reserved.
> [...]
> Total memory available is way above the zone watermarks.  So, we ended
> up starving UNMOVABLE/RECLAIMABLE atomic allocations that cannot fall
> back on the CMA region.

This looks like something Mel warned me about.  I don't really have a
good solution for that yet. ;/
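To make the failure mode concrete, a hypothetical sketch of the
accounting gap (ALLOC_CMA and the free-CMA counter are assumed names
here, not code from the posted series): free pages sitting in
MIGRATE_CMA pageblocks count toward the zone watermark, yet an atomic
unmovable allocation cannot take them, so the watermark check passes
while every page the request could actually use is already gone.

	/* Hypothetical sketch, not from the patchset: discount free CMA
	 * pages for callers that cannot allocate from MIGRATE_CMA. */
	unsigned long usable_free = free_pages;

	if (!(alloc_flags & ALLOC_CMA))		/* assumed flag */
		usable_free -= free_cma_pages;	/* assumed per-zone counter */

	if (usable_free <= min_watermark)
		return false;			/* zone is not actually OK */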
Re: [PATCH 05/15] mm: compaction: export some of the functions
On Fri, Feb 3, 2012 at 8:18 PM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> This commit exports some of the functions from the compaction.c file
> outside of it, adding their declarations into the internal.h header
> file so that other mm related code can use them.
>
> This forced compaction.c to always be compiled (as opposed to being
> compiled only if CONFIG_COMPACTION is defined), but so as to avoid
> introducing code that the user did not ask for, part of compaction.c
> is now wrapped in an #ifdef.

On Sun, 05 Feb 2012 08:40:08 +0100, Hillf Danton <dhi...@gmail.com> wrote:
> What if both compaction and CMA are not enabled?

What about it?  If both are enabled, both will be compiled and usable.
Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
>>> +static inline bool migrate_async_suitable(int migratetype)

On Fri, 03 Feb 2012 15:19:54 +0100, Hillf Danton <dhi...@gmail.com> wrote:
>>> Just nitpick, since the helper is not directly related to what async
>>> means, how about migrate_suitable(int migrate_type)?

2012/2/3 Michal Nazarewicz <min...@mina86.com>:
>> I feel the current name is better suited since it says that it's OK
>> to scan this block if it's an asynchronous compaction run.

On Sat, 04 Feb 2012 10:09:02 +0100, Hillf Danton <dhi...@gmail.com> wrote:
> The input is the migrate type of the page considered, and async is
> only one of the modes in which compaction can be carried out.  Plus,
> the helper is also used in other cases where async is entirely not
> concerned.  That said, the naming is not clear, if not misleading.

In the first version the function was called is_migrate_cma_or_movable(),
which described what the function checked.  Mel did not like it though,
hence the change to migrate_async_suitable().  Honestly, I'm not sure
what would be the best name for the function.
Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
On Fri, Feb 3, 2012 at 8:18 PM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d5174c4..a6e7c64 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -45,6 +45,11 @@ static void map_pages(struct list_head *list)
> 	}
>  }
>
> +static inline bool migrate_async_suitable(int migratetype)

On Fri, 03 Feb 2012 15:19:54 +0100, Hillf Danton <dhi...@gmail.com> wrote:
> Just nitpick, since the helper is not directly related to what async
> means, how about migrate_suitable(int migrate_type)?

I feel the current name is better suited since it says that it's OK to
scan this block if it's an asynchronous compaction run.

> +{
> +	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
> +}
> +
>  /*
>   * Isolate free pages onto a private freelist. Caller must hold zone->lock.
>   * If @strict is true, will abort returning 0 on any invalid PFNs or non-free
> @@ -277,7 +282,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> 		 */
> 		pageblock_nr = low_pfn >> pageblock_order;
> 		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
> -		    get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
> +		    migrate_async_suitable(get_pageblock_migratetype(page))) {

> Here compaction looks corrupted if CMA not enabled, Mel?

Damn, yes, this should be !migrate_async_suitable(...).  Sorry about
that.
Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
> Pages which have an incorrect migrate type on free finally cause a
> pageblock migration type change from MIGRATE_CMA to MIGRATE_MOVABLE.

On Thu, 02 Feb 2012 13:47:29 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> I'm not quite seeing this.  In free_hot_cold_page(), the pageblock
> type is checked, so the page private should be set to MIGRATE_CMA or
> MIGRATE_ISOLATE for the CMA area.  It's not clear how this can change
> a pageblock to MIGRATE_MOVABLE in error.

Here's what I think may happen:

When drain_all_pages() is called, __free_one_page() is called for each
page on the pcp list with a migrate type deduced from page_private(),
which is MIGRATE_CMA.  This results in the page being put on the
MIGRATE_CMA freelist, even though its pageblock's migrate type is
MIGRATE_ISOLATE.

When an allocation happens and the pcp list is empty, rmqueue_bulk()
will get executed with the migratetype argument set to MIGRATE_MOVABLE.
It calls __rmqueue() to grab some pages and, because the page described
above is on the MIGRATE_CMA freelist, it may be returned back to
rmqueue_bulk().  But the pageblock's migrate type is not MIGRATE_CMA
but MIGRATE_ISOLATE, so the following code:

	#ifdef CONFIG_CMA
		if (is_pageblock_cma(page))
			set_page_private(page, MIGRATE_CMA);
		else
	#endif
			set_page_private(page, migratetype);

will set its private to MIGRATE_MOVABLE, and in the end the page lands
back on the MIGRATE_MOVABLE pcp list, but this time with page_private
== MIGRATE_MOVABLE and not MIGRATE_CMA.

One more drain_all_pages() (which may happen since alloc_contig_range()
calls set_migratetype_isolate() for each block) and the next
__rmqueue_fallback() may convert the whole pageblock to
MIGRATE_MOVABLE.

I know, this sounds crazy and improbable, but I couldn't find an easier
path to destruction.

As you pointed out, once the page is allocated, free_hot_cold_page()
will do the right thing by reading the pageblock's migrate type.

Marek is currently experimenting with various patches, including the
following change:

	#ifdef CONFIG_CMA
		int mt = get_pageblock_migratetype(page);
		if (is_migrate_cma(mt) || mt == MIGRATE_ISOLATE)
			set_page_private(page, mt);
		else
	#endif
			set_page_private(page, migratetype);

As a matter of fact, if __rmqueue() was changed to return the migrate
type of the freelist it took the page from, we could avoid this
get_pageblock_migratetype() altogether.  For now, however, I'd rather
not go that way just yet -- I'll be happy to dig into it once CMA gets
merged.
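A minimal sketch of the __rmqueue() idea floated above -- the extra
out-parameter is hypothetical and not part of the posted series; it
only spells out the direction being suggested:

	/* Hypothetical: let __rmqueue() report which freelist the page
	 * was actually taken from, so callers can set page_private()
	 * without re-reading the pageblock bitmap. */
	static struct page *__rmqueue(struct zone *zone, unsigned int order,
				      int migratetype, int *freelist_mt)
	{
		struct page *page = __rmqueue_smallest(zone, order, migratetype);

		*freelist_mt = migratetype;
		if (unlikely(!page))
			/* the fallback path would likewise need to report
			 * the list it actually used */
			page = __rmqueue_fallback(zone, order, migratetype);
		return page;
	}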
Re: [PATCH 05/15] mm: compaction: export some of the functions
On Thu, Jan 26, 2012 at 10:00:47AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -16,30 +16,11 @@
>  #include <linux/sysfs.h>
>  #include "internal.h"
>
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +

On Mon, 30 Jan 2012 12:57:26 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> This is pedantic but you reference CONFIG_CMA before the patch that
> declares it.  The only time this really matters is when it breaks
> bisection but I do not think that is the case here.

I think I'll choose to be lazy on this one. ;)  I actually tried to
move some commits around to resolve this future-reference, but this
resulted in quite a few conflicts during rebase, and after several
minutes I decided that it's not worth the effort.

> Whether you fix this or not by moving the CONFIG_CMA check to the
> same patch that declares it in Kconfig
>
> Acked-by: Mel Gorman <m...@csn.ul.ie>
Re: [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().
On Thu, Jan 26, 2012 at 10:00:45AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> @@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> 		} else if (!locked)
> 			spin_lock_irq(&zone->lru_lock);
>
> -		if (!pfn_valid_within(low_pfn))
> +		if (!pfn_valid(low_pfn))
> 			continue;
> 		nr_scanned++;

On Mon, 30 Jan 2012 12:24:28 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> This chunk looks unrelated to the rest of the patch.  I think what you
> are doing is patching around a bug that CMA exposed, which is very
> similar to the bug report at
> http://www.spinics.net/lists/linux-mm/msg29260.html .  Is this true?
>
> If so, I posted a fix that only calls pfn_valid() when necessary.  Can
> you check if that works for you and if so, drop this hunk please?  If
> the patch does not work for you, then this hunk still needs to be in a
> separate patch and handled separately as it would also be a fix for
> -stable.

I actually never encountered this bug myself, and CMA is unlikely to
expose it, since it always operates on contiguous memory regions with
no holes.  I made this change because, looking at the code, it seemed
like it might cause problems in some cases.  The crash that you linked
to looks like the kind of problem I was thinking about.

I'll drop this hunk and let you resolve this independently of CMA.
Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
On Thu, Jan 26, 2012 at 10:00:50AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> @@ -875,10 +895,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>  /*
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][3] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
> 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE, MIGRATE_RESERVE },
> +#ifdef CONFIG_CMA
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA, MIGRATE_RESERVE },

On Mon, 30 Jan 2012 13:35:42 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> This is a curious choice.  MIGRATE_CMA is allowed to contain movable
> pages.  By using MIGRATE_RECLAIMABLE and MIGRATE_UNMOVABLE for movable
> pages instead of MIGRATE_CMA, you increase the chances that unmovable
> pages will need to use MIGRATE_MOVABLE in the future, which impacts
> fragmentation avoidance.  I would recommend that you change this to
>
>	{ MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }

At the beginning the idea was to try hard not to get pages from
MIGRATE_CMA allocated at all, thus it was put at the end of the
fallbacks list, but on a busy system this probably won't help anyway,
so I'll change it per your suggestion.

>> @@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>> 			rmv_page_order(page);
>>
>> 			/* Take ownership for orders >= pageblock_order */
>> -			if (current_order >= pageblock_order)
>> +			if (current_order >= pageblock_order &&
>> +					!is_pageblock_cma(page))
>> 				change_pageblock_range(page, current_order,
>> 							start_migratetype);
>>
>> -			expand(zone, page, order, current_order, area, migratetype);
>> +			expand(zone, page, order, current_order, area,
>> +			       is_migrate_cma(start_migratetype)
>> +			     ? start_migratetype : migratetype);
>
> What is this check meant to be doing?  start_migratetype is determined
> by allocflags_to_migratetype() and that never will be MIGRATE_CMA, so
> is_migrate_cma(start_migratetype) should always be false.

Right, thanks!  This should be the other way around, ie.:

	expand(zone, page, order, current_order, area,
	       is_migrate_cma(migratetype)
	     ? migratetype : start_migratetype);

I'll fix this and the calls to is_pageblock_cma().
Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> @@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>> 	spin_unlock_irqrestore(&zone->lock, flags);
>> 	return ret ? 0 : -EBUSY;
>>  }
>> +
>> +/* must hold zone->lock */
>> +void update_pcp_isolate_block(unsigned long pfn)
>> +{
>> +	unsigned long end_pfn = pfn + pageblock_nr_pages;
>> +	struct page *page;
>> +
>> +	while (pfn < end_pfn) {
>> +		if (!pfn_valid_within(pfn)) {
>> +			++pfn;
>> +			continue;
>> +		}
>> +
>
> There is a potential problem here that you need to be aware of.
> set_pageblock_migratetype() is called from start_isolate_page_range().
> I do not think there is a guarantee that pfn + pageblock_nr_pages is
> not in a different block of MAX_ORDER_NR_PAGES.  If that is right then
> your options are to add a check like this;
>
>	if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
>		break;
>
> or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and
> in the same block as pfn and relying on the caller to have called
> pfn_valid.

	pfn = round_down(pfn, pageblock_nr_pages);
	end_pfn = pfn + pageblock_nr_pages;

should do the trick as well, right?  move_freepages_block() seems to be
doing the same thing.

>> +		page = pfn_to_page(pfn);
>> +		if (PageBuddy(page)) {
>> +			pfn += 1 << page_order(page);
>> +		} else if (page_count(page) == 0) {
>> +			set_page_private(page, MIGRATE_ISOLATE);
>> +			++pfn;
>
> This is dangerous for two reasons.  If the page_count is 0, it could
> be because the page is in the process of being freed and is not
> necessarily on the per-cpu lists yet, and you cannot be sure if the
> contents of page->private are important.  Second, there is nothing to
> prevent another CPU allocating this page from its per-cpu list while
> the private field is getting updated from here, which might lead to
> some interesting races.
>
> I recognise that what you are trying to do is respond to Gilad's
> request that you really check if an IPI here is necessary.  I think
> what you need to do is check if a page with a count of 0 is
> encountered and if it is, then a draining of the per-cpu lists is
> necessary.  To address Gilad's concerns, be sure to only do this once
> per attempt at CMA rather than for every page encountered with a count
> of 0 to avoid a storm of IPIs.

It's actually more than that.  This is the same issue that I first
fixed with a change to the free_pcppages_bulk() function [1].  At the
time of posting, you said you'd like me to try and find a different
solution which would not involve paying the price of calling
get_pageblock_migratetype().  Later I also realised that that solution
was not enough.

[1] http://article.gmane.org/gmane.linux.kernel.mm/70314

My next attempt was to drain the PCP list while holding zone->lock [2],
but that quickly proved to be a broken approach when Marek started
testing it on an SMP system.

[2] http://article.gmane.org/gmane.linux.kernel.mm/72016

This patch is yet another attempt at solving this old issue.  Even
though it has a potential race condition, we came to the conclusion
that the actual chances of it causing any problems are slim.  Various
stress tests did not, in fact, show the race to be an issue.

The problem is that if a page is on a PCP list and its underlying
pageblock's migrate type is changed to MIGRATE_ISOLATE, the page (i)
will still remain on the PCP list and thus someone can allocate it, and
(ii) when removed from the PCP list, the page will be put on the
freelist of the migrate type it had prior to the change.

(i) is actually not such a big issue, since the next thing that happens
after isolation is migration, so all the pages will get freed.  (ii) is
the actual problem, and if [1] is not an acceptable solution I really
don't have a good fix for it.

One thing that comes to mind is calling drain_all_pages() prior to
acquiring zone->lock in set_migratetype_isolate().  This is however
prone to races, since after the drain and before zone->lock is
acquired, pages might get moved back to a PCP list.

Draining the PCP list after acquiring zone->lock is not possible,
because smp_call_function_many() cannot be called with interrupts
disabled, and changing spin_lock_irqsave() to spin_lock() followed by
local_irq_save() causes a deadlock (that's what [2] attempted to do).

Any suggestions are welcome!

>> +		} else {
>> +			++pfn;
>> +		}
>> +	}
>> +}
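To illustrate the "drain first, lock later" option mentioned above and
why it races -- a sketch only, with the function body abridged, not a
posted patch:

	static int set_migratetype_isolate(struct page *page)
	{
		struct zone *zone = page_zone(page);
		unsigned long flags;
		int ret = -EBUSY;

		drain_all_pages();	/* sends IPIs; needs IRQs enabled */

		/*
		 * Race window: between the drain above and the lock below,
		 * another CPU may free pages from this pageblock back onto
		 * its pcp list, recreating exactly the state just drained.
		 */
		spin_lock_irqsave(&zone->lock, flags);
		/* ... mark pageblock MIGRATE_ISOLATE, move free pages ... */
		spin_unlock_irqrestore(&zone->lock, flags);
		return ret;
	}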
Re: [PATCHv19 00/15] Contiguous Memory Allocator
On Mon, 30 Jan 2012 14:25:12 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> I reviewed the core MM changes and I've acked most of them, so the
> next release should have a few acks where you expect them.  I did not
> add a reviewed-by because I did not build and test the thing.

Thanks!  I've either replied to your comments or applied the suggested
changes.  If anyone cares, the (not yet tested) changes are available
at

	git://github.com/mina86/linux-2.6.git cma
Re: [PATCHv19 00/15] Contiguous Memory Allocator
On Thu, 26 Jan 2012 16:31:40 +0100, Arnd Bergmann <a...@arndb.de> wrote:
> I haven't followed the last two releases so closely.  It seems that in
> v17 the movable pages with pending I/O were still a major problem, but
> in v18 you added a solution.  Is that right?  What is still left to be
> done here then?

In the current version, when an allocation fails because of a page with
pending I/O, CMA automatically tries the allocation in another region.
Re: [Linaro-mm-sig] [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
On Tue, 17 Jan 2012 22:54:28 +0100, sandeep patil <psandee...@gmail.com> wrote:
> Marek,
>
> I am running a CMA test where I keep allocating from a CMA region
> until the allocation fails due to lack of space.  However, I am seeing
> failures much before I expect them to happen.  When the allocation
> fails, I see a warning coming from __alloc_contig_range(), because
> test_pages_isolated() returned true.

Yeah, we are wondering ourselves about that.  Could you try
cherry-picking commit ad10eb079c97e27b4d27bc755c605226ce1625de ("update
migrate type on pcp when isolating") from
git://github.com/mina86/linux-2.6.git?  It probably won't apply
cleanly, but resolving the conflicts should not be hard (alternatively
you can try branch cma from the same repo, but it is a work in progress
at the moment).

> I tried to find out why this happened and added a debug print inside
> __test_page_isolated_in_pageblock().  Here's the resulting log ..
> [...]
> From the log it looks like the warning showed up because page->private
> is set to MIGRATE_CMA instead of MIGRATE_ISOLATE.

My understanding of that situation is that the page is on a pcp list,
in which case its page_private is not updated.  Draining and the first
patch in the series (and also the commit I've pointed to above) are
designed to fix that, but I'm unsure why they don't work all the time.

> I've also had a test case where it failed because (page_count() != 0).
> Have you or anyone else seen this during the CMA testing?
>
> Also, could this be because we are finding a page within (start, end)
> that actually belongs to a higher order Buddy block?

Higher order free buddy blocks are skipped in the "if (PageBuddy(page))"
path of __test_page_isolated_in_pageblock().  Then again, now that I
think of it, something fishy may be happening at the edges.  Moving the
check outside of __alloc_contig_migrate_range() to after outer_start is
calculated in alloc_contig_range() could help.  I'll take a look at it.
Re: [Linaro-mm-sig] [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
>> My understanding of that situation is that the page is on a pcp
>> list, in which case its page_private is not updated.  Draining and
>> the first patch in the series (and also the commit I've pointed to
>> above) are designed to fix that, but I'm unsure why they don't work
>> all the time.

On Wed, 18 Jan 2012 01:46:37 +0100, sandeep patil <psandee...@gmail.com> wrote:
> Will verify this if the page is found on the pcp list as well.

I was wondering, in general, whether "!PageBuddy(page) &&
!page_count(page)" means the page is on a PCP list.  From what I've
seen in page_isolation.c, it seems to be the case.

>> I've also had a test case where it failed because (page_count() != 0)
>
> With this, when it failed the page_count() returned a value of 2.  I
> am not sure why, but I will try and see if I can reproduce this.

If I'm not mistaken, page_count() != 0 means the page is allocated.  I
can see the following scenarios which can lead to the page being
allocated when test_pages_isolated() is called:

1. The page failed to migrate.  In this case, however, the code would
   abort earlier.

2. The page was migrated but then allocated.  This is not possible,
   since migrated pages are freed, which puts the page on the
   MIGRATE_ISOLATE freelist, which guarantees that the page will not be
   allocated.

3. The page was removed from a PCP list but with migratetype ==
   MIGRATE_CMA.  This is something the first patch in the series, as
   well as the commit I've mentioned, tries to address, so hopefully it
   won't be an issue any more.

4. The page was allocated from a PCP list.  This may happen because the
   draining of PCP lists happens after IRQs are enabled in
   set_migratetype_isolate().  I don't have a solution for that just
   yet.  One option is to alter update_pcp_isolate_block() to remove
   the page from the PCP list.  I haven't looked at the specifics of
   how to implement this just yet.

>> Moving the check outside of __alloc_contig_migrate_range() to after
>> outer_start is calculated in alloc_contig_range() could help.
>
> I was going to suggest that; moving the check until after outer_start
> is calculated will definitely help IMO.  I am sure I've seen a case
> where page_count(page) = page->private = 0 and PageBuddy(page) was
> false.

Yep, I've pushed new content to my branch
(git://github.com/mina86/linux-2.6.git cma) and will try to get Marek
to test it some time soon (I'm currently swamped with non-Linux related
work myself).
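As an aside, the heuristic relied on above can be written down
explicitly.  This helper is hypothetical -- no such function exists in
the series -- it only names the check being discussed:

	/* Hypothetical helper: a page that is neither in the buddy
	 * allocator nor has any references is, in practice, sitting on
	 * some CPU's pcp list (or about to be put on one). */
	static bool page_seems_on_pcplist(struct page *page)
	{
		return !PageBuddy(page) && page_count(page) == 0;
	}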
Re: [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
On Mon, 16 Jan 2012 10:01:10 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Fri, Jan 13, 2012 at 09:04:31PM +0100, Michal Nazarewicz wrote:
>>> On Thu, Dec 29, 2011 at 01:39:05PM +0100, Marek Szyprowski wrote:
>>>> From: Michal Nazarewicz <min...@mina86.com>
>>>>
>>>> +	/* Make sure all pages are isolated. */
>>>> +	if (!ret) {
>>>> +		lru_add_drain_all();
>>>> +		drain_all_pages();
>>>> +		if (WARN_ON(test_pages_isolated(start, end)))
>>>> +			ret = -EBUSY;
>>>> +	}
>>>
>>> On Tue, 10 Jan 2012 15:16:13 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
>>> Another global IPI seems overkill.  Drain pages only from the local
>>> CPU (drain_pages(get_cpu()); put_cpu()) and test if the pages are
>>> isolated.
>>
>> Is get_cpu() + put_cpu() required?  Won't drain_local_pages() work?
>
> drain_local_pages() calls smp_processor_id() without preemption
> disabled.

Thanks, I wasn't sure if preemption was an issue.
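Spelled out, the preemption-safe local drain Mel suggests looks like
this: drain_pages() takes an explicit CPU id, while drain_local_pages()
derives it from smp_processor_id() and therefore must not run
preemptible.

	int cpu = get_cpu();	/* disables preemption, returns this CPU */
	drain_pages(cpu);	/* flush this CPU's per-cpu page lists */
	put_cpu();		/* re-enables preemption */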
Re: [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
On Thu, Dec 29, 2011 at 01:39:05PM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> +	/* Make sure all pages are isolated. */
> +	if (!ret) {
> +		lru_add_drain_all();
> +		drain_all_pages();
> +		if (WARN_ON(test_pages_isolated(start, end)))
> +			ret = -EBUSY;
> +	}

On Tue, 10 Jan 2012 15:16:13 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> Another global IPI seems overkill.  Drain pages only from the local
> CPU (drain_pages(get_cpu()); put_cpu()) and test if the pages are
> isolated.

Is get_cpu() + put_cpu() required?  Won't drain_local_pages() work?
Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().
On Tue, 10 Jan 2012 14:43:51 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Thu, Dec 29, 2011 at 01:39:03PM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> +static unsigned long
>> +isolate_freepages_range(struct zone *zone,
>> +			unsigned long start_pfn, unsigned long end_pfn,
>> +			struct list_head *freelist)
>>  {
>> -	unsigned long zone_end_pfn, end_pfn;
>> -	int nr_scanned = 0, total_isolated = 0;
>> -	struct page *cursor;
>> -
>> -	/* Get the last PFN we should scan for free pages at */
>> -	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
>> -	end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
>> +	unsigned long nr_scanned = 0, total_isolated = 0;
>> +	unsigned long pfn = start_pfn;
>> +	struct page *page;
>>
>> -	/* Find the first usable PFN in the block to initialse page cursor */
>> -	for (; blockpfn < end_pfn; blockpfn++) {
>> -		if (pfn_valid_within(blockpfn))
>> -			break;
>> -	}
>> -	cursor = pfn_to_page(blockpfn);
>> +	VM_BUG_ON(!pfn_valid(pfn));
>> +	page = pfn_to_page(pfn);
>
> The existing code is able to find the first usable PFN in a pageblock
> with pfn_valid_within().  It's allowed to do that because it knows the
> pageblock is valid, so calling pfn_valid() is unnecessary.  It is
> curious to change this to something that can sometimes BUG_ON
> !pfn_valid(pfn) instead of having a PFN walker that knows how to
> handle pfn_valid().

Actually, this walker seems redundant, since one is already present in
isolate_freepages(): if !pfn_valid(pfn), the loop in
isolate_freepages() will "continue;" and the function will never get
called.

>> +cleanup:
>> +	/*
>> +	 * Undo what we have done so far, and return.  We know all pages
>> +	 * from [start_pfn, pfn) are free because we have just freed
>> +	 * them.  If one of the pages in the range was not freed, we
>> +	 * would end up here earlier.
>> +	 */
>> +	for (; start_pfn < pfn; ++start_pfn)
>> +		__free_page(pfn_to_page(start_pfn));
>> +	return 0;
>
> This returns 0 even though you could have isolated some pages.

The loop's intention is to "unisolate" (ie. free) anything that got
isolated, so at the end the number of isolated pages is in fact zero.

> Overall, there would have been less contortion if you implemented
> isolate_freepages_range() in terms of the existing
> isolate_freepages_block().  Something that looked kinda like this
> not-compiled, untested illustration:
>
> static unsigned long isolate_freepages_range(struct zone *zone,
>		unsigned long start_pfn, unsigned long end_pfn,
>		struct list_head *list)
> {
>	unsigned long pfn = start_pfn;
>	unsigned long isolated;
>
>	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>		if (!pfn_valid(pfn))
>			continue;
>		isolated += isolate_freepages_block(zone, pfn, list);
>	}
>	return isolated;
> }
>
> I neglected to update isolate_freepages_block() to take the end_pfn,
> meaning it will in fact isolate too much, but that would be easy to
> address.

This is not a problem, since my isolate_freepages_range()
implementation can go beyond end_pfn anyway.  Of course, the function
taking end_pfn is an optimisation, since it does not have to compute
zone_end each time.

> It would be up to yourself whether to shuffle the tracepoint around,
> because with this example it will be triggered once per pageblock.
> You'd still need the cleanup code and freelist check in
> isolate_freepages_block() of course.
>
> The point is that it would minimise the disruption to the existing
> code and make it easier to get details like pfn_valid() right, without
> odd contortions, more gotos than should be necessary, and trying to
> keep pfn and page straight.

Even though I'd personally go with my approach, I see merit in your
point, so I will change it.

>>  }
>>
>>  /* Returns true if the page is within a block suitable for migration to */
>> @@ -365,17 +403,49 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>> 		nr_isolated++;
>>
>> 		/* Avoid isolating too much */
>> -		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
>> +		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
>> +			++low_pfn;
>> 			break;
>> +		}
>> 	}
>
> This minor optimisation is unrelated to the implementation of
> isolate_migratepages_range() and should be in its own patch.

It's actually not a mere minor optimisation, but rather making the
function work according to the documentation added, ie. that it returns
the "PFN of the first page that was not scanned".
Re: [PATCH 05/11] mm: mmzone: MIGRATE_CMA migration type added
On Tue, 10 Jan 2012 15:38:36 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Thu, Dec 29, 2011 at 01:39:06PM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> @@ -35,13 +35,35 @@
>>   */
>>  #define PAGE_ALLOC_COSTLY_ORDER 3
>>
>> -#define MIGRATE_UNMOVABLE     0
>> -#define MIGRATE_RECLAIMABLE   1
>> -#define MIGRATE_MOVABLE       2
>> -#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
>> -#define MIGRATE_RESERVE       3
>> -#define MIGRATE_ISOLATE       4 /* can't allocate from here */
>> -#define MIGRATE_TYPES         5
>> +enum {
>> +	MIGRATE_UNMOVABLE,
>> +	MIGRATE_RECLAIMABLE,
>> +	MIGRATE_MOVABLE,
>> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
>> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
>> +	/*
>> +	 * MIGRATE_CMA migration type is designed to mimic the way
>> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
>> +	 * from MIGRATE_CMA pageblocks and page allocator never
>> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
>> +	 *
>> +	 * The way to use it is to change migratetype of a range of
>> +	 * pageblocks to MIGRATE_CMA which can be done by
>> +	 * __free_pageblock_cma() function.  What is important though
>> +	 * is that a range of pageblocks must be aligned to
>> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
>> +	 * a single pageblock.
>> +	 */
>> +	MIGRATE_CMA,
>> +	MIGRATE_ISOLATE,	/* can't allocate from here */
>> +	MIGRATE_TYPES
>> +};
>
> MIGRATE_CMA is being added whether or not CONFIG_CMA is set.  This
> increases the size of the pageblock bitmap, and where that is just
> 1 bit per pageblock in the system, it may be noticeable on large
> machines.

Wasn't aware of that, will do.  In fact, in earlier versions it was
done this way, but that resulted in more #ifdefs.

>> +
>> +#ifdef CONFIG_CMA
>> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
>> +#else
>> +#  define is_migrate_cma(migratetype) false
>> +#endif
>
> Use static inlines.

I decided to use a #define for the sake of situations like
is_migrate_cma(get_pageblock_migratetype(page)).  With a static inline
it would still have to read the pageblock's migrate type even if
!CONFIG_CMA.  A macro gets rid of this operation altogether.

>>  #define for_each_migratetype_order(order, type) \
>>	for (order = 0; order < MAX_ORDER; order++) \
>>
>> @@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
>>	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>>  }
>>
>> +static inline bool is_pageblock_cma(struct page *page)
>> +{
>> +	return is_migrate_cma(get_pageblock_migratetype(page));
>> +}
>> +
>
> This allows additional calls to get_pageblock_migratetype() even if
> CONFIG_CMA is not set.

>>  struct free_area {
>>	struct list_head	free_list[MIGRATE_TYPES];
>>	unsigned long		nr_free;
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index d305080..af650db 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -37,4 +37,7 @@ extern void unset_migratetype_isolate(struct page *page);
>>  int alloc_contig_range(unsigned long start, unsigned long end);
>>  void free_contig_range(unsigned long pfn, unsigned nr_pages);
>>
>> +/* CMA stuff */
>> +extern void init_cma_reserved_pageblock(struct page *page);
>> +
>>  #endif
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 011b110..e080cac 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -192,7 +192,7 @@ config COMPACTION
>>  config MIGRATION
>>	bool "Page migration"
>>	def_bool y
>> -	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
>> +	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA
>>	help
>>	  Allows the migration of the physical location of pages of
>>	  processes while the virtual addresses are not changed.  This is
>>	  useful in
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 8733441..46783b4 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -21,6 +21,11 @@
>>  #define CREATE_TRACE_POINTS
>>  #include <trace/events/compaction.h>
>>
>> +static inline bool is_migrate_cma_or_movable(int migratetype)
>> +{
>> +	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
>> +}
>> +
>
> That is not a name that helps any.  migrate_async_suitable() would be
> marginally better.

>>  /**
>>   * isolate_freepages_range() - isolate free pages, must hold zone->lock.
>>   * @zone: Zone pages are in.
>>
>> @@ -213,7 +218,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>>		 */
>>		pageblock_nr = low_pfn >> pageblock_order;
>>		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
>> -		    get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
>> +		    is_migrate_cma_or_movable(get_pageblock_migratetype(page))) {
>>			low_pfn += pageblock_nr_pages;
>>			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1
Re: [PATCH 01/11] mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating
On Thu, Dec 29, 2011 at 2:39 PM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> When set_migratetype_isolate() sets a pageblock's migrate type, it
> does not change each page's page_private data.  This makes sense, as
> the function has no way of knowing what kind of information
> page_private stores.
>
> A side effect is that instead of draining pages from all zones,
> set_migratetype_isolate() now drains only pages from the zone the
> pageblock it operates on is in.
>
> +/* Caller must hold zone->lock. */
> +static void __zone_drain_local_pages(void *arg)
> +{
> +	struct per_cpu_pages *pcp;
> +	struct zone *zone = arg;
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	pcp = &per_cpu_ptr(zone->pageset, smp_processor_id())->pcp;
> +	if (pcp->count) {
> +		/* Caller holds zone->lock, no need to grab it. */
> +		__free_pcppages_bulk(zone, pcp->count, pcp);
> +		pcp->count = 0;
> +	}
> +	local_irq_restore(flags);
> +}
> +
> +/*
> + * Like drain_all_pages() but operates on a single zone.  Caller must
> + * hold zone->lock.
> + */
> +static void __zone_drain_all_pages(struct zone *zone)
> +{
> +	on_each_cpu(__zone_drain_local_pages, zone, 1);
> +}
> +

On Sun, 01 Jan 2012 08:49:13 +0100, Gilad Ben-Yossef <gi...@benyossef.com> wrote:
> Please consider whether sending an IPI to all processors in the system
> and interrupting them is appropriate here.  You seem to assume that it
> is probable that each CPU of the possibly 4,096 (MAXSMP on x86) has a
> per-cpu page for the specified zone,

I'm not really assuming that (in fact I expect what you fear, ie. that
most CPUs won't have pages from the specified zone on a PCP list);
however, I really need to make sure to get them off all PCP lists.

> otherwise you're just interrupting them out of doing something useful,
> or out of a power-saving idle, for nothing.

Exactly what's happening now anyway.

> While that may or may not be a reasonable assumption for the general
> drain_all_pages that drains pcps from all zones, I feel it is less
> likely to be the right thing once you limit the drain to a single
> zone.

Currently, set_migratetype_isolate() seems to do more than it needs to,
ie. it drains all the pages even though all it cares about is a single
zone.

> Some background on my attempt to reduce IPI noise in the system in
> this context is probably useful here as well:
> https://lkml.org/lkml/2011/11/22/133

Looks interesting.  I'm not entirely sure why it does not end up a race
condition, but in the case of __zone_drain_all_pages() we already hold
zone->lock, so my fears are somewhat gone.  I'll give it a try and
prepare a patch for __zone_drain_all_pages().
Re: [PATCH 01/11] mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating
On Sun, 01 Jan 2012 17:06:53 +0100, Gilad Ben-Yossef <gi...@benyossef.com> wrote:
> 2012/1/1 Michal Nazarewicz <min...@mina86.com>:
>> Looks interesting.  I'm not entirely sure why it does not end up a
>> race condition, but in the case of __zone_drain_all_pages() we
>> already hold
>
> If a page is in the PCP list when we check, you'll send the IPI and
> all is well.  If it isn't when we check and gets added later, you
> could just the same have a situation where we send the IPI, try to
> drain an empty PCP list, and then the page gets added.  So we are not
> adding a race condition that is not there already :-)

Right, makes sense.

>> zone->lock, so my fears are somewhat gone.  I'll give it a try and
>> prepare a patch for __zone_drain_all_pages().
>
> I plan to send v5 of the IPI noise patch after some testing.  It has a
> new version of drain_all_pages, with no allocation in the reclaim path
> and no locking.  You might want to wait till that one is out to base
> on it.

This shouldn't be a problem for my case, as set_migratetype_isolate()
is hardly ever called in the reclaim path. :)  The change so far seems
rather obvious:

 mm/page_alloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 424d36a..eaa686b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1181,7 +1181,19 @@ static void __zone_drain_local_pages(void *arg)
  */
 static void __zone_drain_all_pages(struct zone *zone)
 {
-	on_each_cpu(__zone_drain_local_pages, zone, 1);
+	struct per_cpu_pageset *pcp;
+	cpumask_var_t cpus;
+	int cpu;
+
+	if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC | __GFP_NOWARN))) {
+		for_each_online_cpu(cpu)
+			if (per_cpu_ptr(zone->pageset, cpu)->pcp.count)
+				cpumask_set_cpu(cpu, cpus);
+		on_each_cpu_mask(cpus, __zone_drain_local_pages, zone, 1);
+		free_cpumask_var(cpus);
+	} else {
+		on_each_cpu(__zone_drain_local_pages, zone, 1);
+	}
 }
 
 #ifdef CONFIG_HIBERNATION
Re: [PATCH 01/11] mm: page_alloc: handle MIGRATE_ISOLATE in free_pcppages_bulk()
On Fri, Nov 18, 2011 at 05:43:08PM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9dd443d..58d1a2e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -628,6 +628,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> 			page = list_entry(list->prev, struct page, lru);
> 			/* must delete as __free_one_page list manipulates */
> 			list_del(&page->lru);
> +
> +			/*
> +			 * When page is isolated in set_migratetype_isolate()
> +			 * function it's page_private is not changed since the
> +			 * function has no way of knowing if it can touch it.
> +			 * This means that when a page is on PCP list, it's
> +			 * page_private no longer matches the desired migrate
> +			 * type.
> +			 */
> +			if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
> +				set_page_private(page, MIGRATE_ISOLATE);
> +

On Mon, 12 Dec 2011 14:42:35 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> How much of a problem is this in practice?

IIRC, this led to allocations being made from an area marked as
isolated, or some such.

> [...] I'd go as far to say that it would be preferable to drain the
> per-CPU lists after you set pageblocks MIGRATE_ISOLATE.  The IPIs also
> have overhead but it will be incurred for the rare rather than the
> common case.

I'll look into that.
Re: [PATCH 03/11] mm: mmzone: introduce zone_pfn_same_memmap()
On Fri, Nov 18, 2011 at 05:43:10PM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6afae0e..09c9702 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -111,7 +111,10 @@ skip:
>
>  next:
> 		pfn += isolated;
> -		page += isolated;
> +		if (zone_pfn_same_memmap(pfn - isolated, pfn))
> +			page += isolated;
> +		else
> +			page = pfn_to_page(pfn);
> 	}

On Mon, 12 Dec 2011 15:19:53 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> Is this necessary?  We are isolating pages, the largest of which is a
> MAX_ORDER_NR_PAGES page. [...]

This is not true for CMA.

> That said, everywhere else managed to avoid checks like this by always
> scanning in units of pageblocks.  Maybe this should be structured the
> same way to guarantee pfn_valid is called at least per pageblock (even
> though only once per MAX_ORDER_NR_PAGES is necessary).

I'll look into that.
Re: [PATCH 04/11] mm: compaction: export some of the functions
On Mon, 12 Dec 2011 15:29:07 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Fri, Nov 18, 2011 at 05:43:11PM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> This commit exports some of the functions from the compaction.c file
>> outside of it, adding their declarations into the internal.h header
>> file so that other mm related code can use them.
>>
>> This forced compaction.c to always be compiled (as opposed to being
>> compiled only if CONFIG_COMPACTION is defined), but so as to avoid
>> introducing code that the user did not ask for, part of compaction.c
>> is now wrapped in an #ifdef.
>>
>> Signed-off-by: Michal Nazarewicz <min...@mina86.com>
>> Signed-off-by: Marek Szyprowski <m.szyprow...@samsung.com>
>> ---
>>  mm/Makefile     |   3 +-
>>  mm/compaction.c | 112 +++++++++++++++++++++++-------------------------
>>  mm/internal.h   |  35 ++++++++++++++++
>>  3 files changed, 83 insertions(+), 67 deletions(-)
>>
>> diff --git a/mm/Makefile b/mm/Makefile
>> index 50ec00e..24ed801 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -13,7 +13,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
>>			   readahead.o swap.o truncate.o vmscan.o shmem.o \
>>			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
>>			   page_isolation.o mm_init.o mmu_context.o percpu.o \
>> -			   $(mmu-y)
>> +			   $(mmu-y) compaction.o
>
> That should be compaction.o $(mmu-y) for consistency.
>
> Overall, this patch implies that CMA is always compiled in.

Not really.  But yes, it produces some bloat when neither CMA nor
compaction is compiled in.  I assume that the linker will be able to
deal with that (since the functions are not EXPORT_SYMBOL'ed).  Note
also that the vast majority of compaction.c is #ifdef'd, so only a
handful of functions are compiled.

> Why not just make CMA depend on COMPACTION to keep things simpler?

I could imagine that someone would want to have CMA but not compaction,
hence I decided to give that choice.

> For example, if you enable CMA and do not enable COMPACTION, you lose
> things like the vmstat counters that can aid debugging.  In fact, as
> parts of compaction.c are using defines like COMPACTBLOCKS, I'm not
> even sure compaction.c can compile without CONFIG_COMPACTION because
> of the vmstat stuff.
Re: [PATCH 03/11] mm: mmzone: introduce zone_pfn_same_memmap()
On Fri, Nov 18, 2011 at 05:43:10PM +0100, Marek Szyprowski wrote:
From: Michal Nazarewicz min...@mina86.com

diff --git a/mm/compaction.c b/mm/compaction.c
index 6afae0e..09c9702 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -111,7 +111,10 @@ skip:
 next:
 		pfn += isolated;
-		page += isolated;
+		if (zone_pfn_same_memmap(pfn - isolated, pfn))
+			page += isolated;
+		else
+			page = pfn_to_page(pfn);
 	}

On Mon, 12 Dec 2011 15:19:53 +0100, Mel Gorman m...@csn.ul.ie wrote:
Is this necessary? We are isolating pages, the largest of which is a MAX_ORDER_NR_PAGES page. [...]

On Mon, 12 Dec 2011 15:40:30 +0100, Mel Gorman m...@csn.ul.ie wrote:
To be clear, I'm referring to a single page being isolated here. It may or may not be a high-order page, but it's still going to be less than MAX_ORDER_NR_PAGES, so you should be able to check when a new block is entered and pfn_to_page is necessary.

Do you mean something like:

	if (same pageblock)
		just do arithmetic;
	else
		use pfn_to_page;

? I've discussed it with Dave and he suggested that approach as an optimisation, since in some configurations zone_pfn_same_memmap() is always true and the compiler will strip the else part, whereas a same-pageblock test will be false on occasion regardless of kernel configuration.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
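[Editor's note: for reference, the pageblock-based cursor advance discussed in this subthread could look roughly like the sketch below. This is hypothetical illustration, not code from the posted series; advance_page_cursor is an invented helper name.]

/*
 * Inside one pageblock the memmap is contiguous, so plain pointer
 * arithmetic is safe; on crossing a pageblock boundary the pointer is
 * recomputed, which also guarantees pfn_to_page() runs at least once
 * per pageblock.
 */
static struct page *advance_page_cursor(struct page *page,
					unsigned long pfn,
					unsigned long nr)
{
	if ((pfn >> pageblock_order) == ((pfn + nr) >> pageblock_order))
		return page + nr;		/* same pageblock */
	return pfn_to_page(pfn + nr);		/* crossed a boundary */
}

[In configurations where zone_pfn_same_memmap() is compile-time true, the compiler can drop the recompute branch entirely, which is the optimisation attributed to Dave above.]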
Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().
On Fri, Nov 18, 2011 at 05:43:09PM +0100, Marek Szyprowski wrote:
From: Michal Nazarewicz min...@mina86.com

diff --git a/mm/compaction.c b/mm/compaction.c
index 899d956..6afae0e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -54,51 +54,64 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return count;
 }

-/* Isolate free pages onto a private freelist. Must hold zone->lock */
-static unsigned long isolate_freepages_block(struct zone *zone,
-				unsigned long blockpfn,
-				struct list_head *freelist)
+/**
+ * isolate_freepages_range() - isolate free pages, must hold zone->lock.
+ * @zone:	Zone pages are in.
+ * @start:	The first PFN to start isolating.
+ * @end:	The one-past-last PFN.
+ * @freelist:	A list to save isolated pages to.
+ *
+ * If @freelist is not provided, holes in range (either non-free pages
+ * or invalid PFNs) are considered an error and function undoes its
+ * actions and returns zero.
+ *
+ * If @freelist is provided, function will simply skip non-free and
+ * missing pages and put only the ones isolated on the list.
+ *
+ * Returns number of isolated pages.  This may be more than end-start
+ * if end fell in the middle of a free page.
+ */
+static unsigned long
+isolate_freepages_range(struct zone *zone,
+			unsigned long start, unsigned long end,
+			struct list_head *freelist)

On Mon, 12 Dec 2011 15:07:28 +0100, Mel Gorman m...@csn.ul.ie wrote:
Use start_pfn and end_pfn to keep it consistent with the rest of compaction.c.

Will do.

 {
-	unsigned long zone_end_pfn, end_pfn;
-	int nr_scanned = 0, total_isolated = 0;
-	struct page *cursor;
-
-	/* Get the last PFN we should scan for free pages at */
-	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
+	unsigned long nr_scanned = 0, total_isolated = 0;
+	unsigned long pfn = start;
+	struct page *page;

-	/* Find the first usable PFN in the block to initialse page cursor */
-	for (; blockpfn < end_pfn; blockpfn++) {
-		if (pfn_valid_within(blockpfn))
-			break;
-	}
-	cursor = pfn_to_page(blockpfn);
+	VM_BUG_ON(!pfn_valid(pfn));
+	page = pfn_to_page(pfn);

 	/* Isolate free pages. This assumes the block is valid */
-	for (; blockpfn < end_pfn; blockpfn++, cursor++) {
-		int isolated, i;
-		struct page *page = cursor;
-
-		if (!pfn_valid_within(blockpfn))
-			continue;
-		nr_scanned++;
-
-		if (!PageBuddy(page))
-			continue;
+	while (pfn < end) {
+		unsigned isolated = 1, i;

Do not use implicit types. These are unsigned ints, call them unsigned ints.

Will do.

+		if (!pfn_valid_within(pfn))
+			goto skip;

The flow of this function in general with gotos of skipped and next is confusing in comparison to the existing function. For example, if this PFN is not valid, and no freelist is provided, then we call __free_page() on a PFN that is known to be invalid.

+		++nr_scanned;
+
+		if (!PageBuddy(page)) {
+skip:
+			if (freelist)
+				goto next;
+			for (; start < pfn; ++start)
+				__free_page(pfn_to_page(pfn));
+			return 0;
+		}

So if a PFN is valid and !PageBuddy and no freelist is provided, we call __free_page() on it regardless of reference count. That does not sound safe.

Sorry about that. It's a bug in the code which was caught later on. The code should read “__free_page(pfn_to_page(start))”.
 		/* Found a free page, break it into order-0 pages */
 		isolated = split_free_page(page);
 		total_isolated += isolated;
-		for (i = 0; i < isolated; i++) {
-			list_add(&page->lru, freelist);
-			page++;
+		if (freelist) {
+			struct page *p = page;
+			for (i = isolated; i; --i, ++p)
+				list_add(&p->lru, freelist);
 		}

-		/* If a page was split, advance to the end of it */
-		if (isolated) {
-			blockpfn += isolated - 1;
-			cursor += isolated - 1;
-		}
+next:
+		pfn += isolated;
+		page += isolated;

The name isolated is now confusing because it can mean either pages isolated or pages scanned depending on context. Your patch appears to be doing a lot more than is necessary to convert isolate_freepages_block into isolate_freepages_range and at this point, it's unclear why you did that.

When CMA uses this function, it requires all pages in the range to be valid and free. (Both conditions should be met but you never know.)
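[Editor's note: taking the inline bug fix above together with Mel's comments, the intended control flow of the new function is easier to see in a compact form. The sketch below is hypothetical (it is not the posted patch): it recomputes pfn_to_page() on every iteration for simplicity, sidestepping the memmap-contiguity question handled elsewhere in the series, and it assumes the caller holds zone->lock.]

static unsigned long
isolate_freepages_range(struct zone *zone,
			unsigned long start, unsigned long end,
			struct list_head *freelist)
{
	unsigned long pfn = start, total_isolated = 0;

	/* Caller must hold zone->lock. */
	while (pfn < end) {
		struct page *page = pfn_to_page(pfn);
		unsigned int isolated = 1, i;

		if (!pfn_valid_within(pfn) || !PageBuddy(page)) {
			/* Lenient mode: skip this pfn, isolate the rest. */
			if (freelist)
				goto next;
			/* Strict mode: undo everything isolated so far. */
			for (; start < pfn; ++start)
				__free_page(pfn_to_page(start));
			return 0;
		}

		/* Found a free page, break it into order-0 pages. */
		isolated = split_free_page(page);
		total_isolated += isolated;
		if (freelist)
			for (i = 0; i < isolated; ++i)
				list_add(&page[i].lru, freelist);
next:
		pfn += isolated;
	}

	return total_isolated;
}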
Re: [PATCH 04/11] mm: compaction: export some of the functions
On Mon, 12 Dec 2011 15:29:07 +0100, Mel Gorman m...@csn.ul.ie wrote:
Overall, this patch implies that CMA is always compiled in.

On Mon, Dec 12, 2011 at 03:41:04PM +0100, Michal Nazarewicz wrote:
Not really. But yes, it produces some bloat when neither CMA nor compaction are compiled. I assume that linker will be able to deal with that (since the functions are not EXPORT_SYMBOL'ed).

On Mon, 12 Dec 2011 16:40:15 +0100, Mel Gorman m...@csn.ul.ie wrote:
The bloat exists either way. I don't believe the linker strips it out so overall it would make more sense to depend on compaction to keep the vmstat counters for debugging reasons if nothing else. It's not something I feel very strongly about though.

KK, I'll do that then.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().
On Mon, 12 Dec 2011 17:30:52 +0100, Mel Gorman m...@csn.ul.ie wrote:
On Mon, Dec 12, 2011 at 04:22:39PM +0100, Michal Nazarewicz wrote:

[SNIP]

+		if (!pfn_valid_within(pfn))
+			goto skip;

The flow of this function in general with gotos of skipped and next is confusing in comparison to the existing function. For example, if this PFN is not valid, and no freelist is provided, then we call __free_page() on a PFN that is known to be invalid.

+		++nr_scanned;
+
+		if (!PageBuddy(page)) {
+skip:
+			if (freelist)
+				goto next;
+			for (; start < pfn; ++start)
+				__free_page(pfn_to_page(pfn));
+			return 0;
+		}

So if a PFN is valid and !PageBuddy and no freelist is provided, we call __free_page() on it regardless of reference count. That does not sound safe.

Sorry about that. It's a bug in the code which was caught later on. The code should read “__free_page(pfn_to_page(start))”.

On Mon, 12 Dec 2011 17:30:52 +0100, Mel Gorman m...@csn.ul.ie wrote:
That will call free on valid PFNs but why is it safe to call __free_page() at all? You say later that CMA requires that all pages in the range be valid, but if the pages are in use, that does not mean that calling __free_page() is safe. I suspect you have not seen a problem because the pages in the range were free as expected and not in use because of MIGRATE_ISOLATE.

All pages from [start, pfn) have passed through the loop body, which means that they are valid and have been removed from buddy (for the caller's use). Also, because of split_free_page(), all of those pages have been split into order-0 pages. Therefore, in error recovery, to undo what the loop has done so far, we give the pages back to buddy by calling __free_page() on each order-0 page.

 		/* Found a free page, break it into order-0 pages */
 		isolated = split_free_page(page);
 		total_isolated += isolated;
-		for (i = 0; i < isolated; i++) {
-			list_add(&page->lru, freelist);
-			page++;
+		if (freelist) {
+			struct page *p = page;
+			for (i = isolated; i; --i, ++p)
+				list_add(&p->lru, freelist);
 		}

-		/* If a page was split, advance to the end of it */
-		if (isolated) {
-			blockpfn += isolated - 1;
-			cursor += isolated - 1;
-		}
+next:
+		pfn += isolated;
+		page += isolated;

The name isolated is now confusing because it can mean either pages isolated or pages scanned depending on context. Your patch appears to be doing a lot more than is necessary to convert isolate_freepages_block into isolate_freepages_range and at this point, it's unclear why you did that.

When CMA uses this function, it requires all pages in the range to be valid and free. (Both conditions should be met but you never know.) To be clear, I meant that CMA expects the pages to be in buddy when the function is called, but after the function finishes, all the pages in the range are removed from buddy. This, among other things, is why the call to split_free_page() is necessary.

It seems racy but I guess you are depending on MIGRATE_ISOLATE to keep things sane, which is fine. However, I strongly suspect that if there is a race and a page is in use, then you will need to retry the migration step. Calling __free_page does not look right because something still has a reference to the page.

This change adds a second way isolate_freepages_range() works: when freelist is not specified, it aborts on an invalid or non-free page, but continues as usual if a freelist is provided.
Ok, I think you should be able to do that by not calling split_free_page or adding to the list if !freelist, with a comment explaining why the pages are left on the buddy lists for the caller to figure out. Bail if a page-in-use is found and have the caller check that the return value of isolate_freepages_block == end_pfn - start_pfn.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
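[Editor's note: a sketch of the caller-side contract Mel describes, under the assumption that the strict-mode function leaves pages on the buddy lists. This is illustrative code, not from the series; claim_range_or_busy is an invented name.]

/*
 * Anything short of the full range means a page in [start_pfn,
 * end_pfn) was in use, so the caller retries migration or bails.
 */
static int claim_range_or_busy(struct zone *zone,
			       unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long isolated;

	spin_lock_irq(&zone->lock);
	isolated = isolate_freepages_range(zone, start_pfn, end_pfn, NULL);
	spin_unlock_irq(&zone->lock);

	return isolated == end_pfn - start_pfn ? 0 : -EBUSY;
}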
Re: [PATCH] mm: cma: hack/workaround for some allocation issues
On Fri, 25 Nov 2011 17:43:07 +0100, Marek Szyprowski m.szyprow...@samsung.com wrote:
This is a quick and dirty patch and hack to solve some memory allocation issues that appeared at CMA v17 after switching migration code from hotplug to memory compaction. Especially the issue with watermark adjustment needs a real fix instead of disabling the code.

Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
---
Hello,

This patch fixes the issues that have been reported recently. It should be considered only as a temporary solution until a new version of CMA patches is ready.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
---
 mm/compaction.c |  5 -
 mm/page_alloc.c | 12 +---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3e07341..41976f8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -79,8 +79,9 @@ isolate_freepages_range(struct zone *zone,
 skip:
 			if (freelist)
 				goto next;
+failed:
 			for (; start < pfn; ++start)
-				__free_page(pfn_to_page(pfn));
+				__free_page(pfn_to_page(start));
 			return 0;
 		}

Yeah, my mistake, sorry about that. ;)

@@ -91,6 +92,8 @@ skip:
 			struct page *p = page;
 			for (i = isolated; i; --i, ++p)
 				list_add(&p->lru, freelist);
+		} else if (!isolated) {
+			goto failed;
 		}

 next:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 714b1c1..b4a46c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1303,12 +1303,12 @@
 	zone = page_zone(page);
 	order = page_order(page);
-
+#if 0
 	/* Obey watermarks as if the page was being allocated */
 	watermark = low_wmark_pages(zone) + (1 << order);
 	if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
 		return 0;
-
+#endif
 	/* Remove page from free list */
 	list_del(&page->lru);
 	zone->free_area[order].nr_free--;

Come to think of it, this watermark check seems a little meaningless in the case of CMA. With CMA, the pages that we are splitting here have migrate type ISOLATE, so they aren't “free” at all: buddy will never use them for allocation. That means that we don't really allocate any pages, we just want to split them into order-0 pages. Also, if we bail out now, it's a huge waste of time and effort. So, if the watermarks need to be checked, it should happen somewhere before we do the migration and related work. This may be due to my ignorance, but I don't know whether we really need the watermark check at all if we decide to use plain alloc_page() as the allocator for migrate_pages() rather than compaction_alloc().

@@ -5734,6 +5734,12 @@ static unsigned long pfn_align_to_maxpage_up(unsigned long pfn)
 	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
 }

+static struct page *
+cma_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
 static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 {
 	/* This function is based on compact_zone() from compaction.c. */

@@ -5801,7 +5807,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	}

 	/* Try to migrate. */
-	ret = migrate_pages(&cc.migratepages, compaction_alloc,
+	ret = migrate_pages(&cc.migratepages, cma_migrate_alloc,
 			    (unsigned long)&cc, false, cc.sync);

 	/* Migrated all of them? Great! */

Yep, that makes sense to me. compaction_alloc() takes only free pages (i.e. pages from the buddy system) from the given zone. This means that no pages get discarded or swapped to disk. That makes sense for compaction, since it's opportunistic in its nature. We, however, want pages to be discarded/swapped if that's the only way of getting pages to migrate to.
Of course, with this change, the “(unsigned long)&cc” part can be safely replaced with “NULL”, and the “cc.nr_freepages -= release_freepages(&cc.freepages);” statement at the end of the function (not visible in this patch) can be removed along with the line that follows it.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
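[Editor's note: a hypothetical sketch of the migration step after the simplification described above. The migrate_pages() signature is the one used in the quoted diff (private as unsigned long, so NULL becomes 0 here); cma_migrate_alloc() ignores its private argument, so no compact_control needs to be threaded through, and the release_freepages() bookkeeping disappears with it.]

static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
{
	struct compact_control cc = {
		.sync = true,
	};
	int ret = 0;

	INIT_LIST_HEAD(&cc.migratepages);

	/* ... isolate the LRU pages in [start, end) onto cc.migratepages ... */

	if (!list_empty(&cc.migratepages))
		ret = migrate_pages(&cc.migratepages, cma_migrate_alloc,
				    0 /* private: unused by the allocator */,
				    false, cc.sync);
	return ret;
}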
Re: [Linaro-mm-sig] [PATCHv17 0/11] Contiguous Memory Allocator
On Fri, 18 Nov 2011 22:20:48 +0100, sandeep patil wrote:
So, I guess my question is, until all the migration failures are tracked down and fixed, is there a plan to retry the contiguous allocation from a new range in the CMA region?

2011/11/18 Michal Nazarewicz min...@mina86.com:
No. The current CMA implementation will stick to the same range of pages also on subsequent allocations of the same size.

On Sat, 19 Nov 2011 00:30:49 +0100, sandeep patil psandee...@gmail.com wrote:
Doesn't that mean the drivers that fail to allocate from the contiguous DMA region will fail, if the migration fails?

Yes. I have some ideas how that could be mitigated. The easiest would be to try another region to allocate from on failure. A more complicated option would be to try to wait for the I/O transfer to finish. I'll try to work on it during the upcoming week.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
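[Editor's note: one possible shape of the "try another range on failure" idea, sketched with the field names from the CMA allocator quoted later in this document. This is hypothetical code, never part of the posted series; cma_mutex locking is omitted for brevity.]

static struct page *cma_alloc_retry(struct cma *cma, int count,
				    unsigned long align_mask)
{
	unsigned long start = 0, pageno, pfn;

	for (;;) {
		pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
						    start, count, align_mask);
		if (pageno >= cma->count)
			return NULL;		/* region exhausted */

		pfn = cma->base_pfn + pageno;
		if (alloc_contig_range(pfn, pfn + count, 0,
				       MIGRATE_CMA) == 0) {
			bitmap_set(cma->bitmap, pageno, count);
			return pfn_to_page(pfn);
		}
		/* Migration failed here; retry further into the region. */
		start = pageno + count;
	}
}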
Re: [Linaro-mm-sig] [PATCHv17 0/11] Contiguous Memory Allocator
On Fri, 18 Nov 2011 22:20:48 +0100, sandeep patil psandee...@gmail.com wrote:
I am running a simple test to allocate contiguous regions and write a log to a file on sdcard simultaneously. I can reproduce this migration failure 100% of the time with it. When I tracked the pages that failed to migrate, I found them on the buffer head lru list with a reference held on the buffer_head in the page, which causes drop_buffers() to fail.

So, I guess my question is, until all the migration failures are tracked down and fixed, is there a plan to retry the contiguous allocation from a new range in the CMA region?

No. The current CMA implementation will stick to the same range of pages also on subsequent allocations of the same size.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: CMA v16 and DMA-mapping v13 patch series
Marek Szyprowski m.szyprow...@samsung.com writes:
Linux v3.1-rc10 with both CMA v16 and DMA-mapping v3: git://git.infradead.org/users/kmpark/linux-2.6-samsung 3.1-rc10-cma-v16-dma-v3

I've pushed a new version based on Mel's suggestions to git://github.com/mina86/linux-2.6.git cma-17

Unfortunately, it took me more time than I anticipated, and so I had no time to test the code in any way (other than compiling it on x86_64).

--
Best regards, Michal “mina86” Nazarewicz (mina86-mina86.com, jid:mina86-jabber.org)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Tue, 01 Nov 2011 16:04:48 +0100, Mel Gorman m...@csn.ul.ie wrote:
For the purposes of review, have a separate patch for moving isolate_freepages_block to another file that does not alter the function in any way. When the function is updated in a follow-on patch, it'll be far easier to see what has changed.

Will do.

page_isolation.c may also be a better fit than page_alloc.c

Since isolate_freepages_block() is the only user of split_free_page(), would it make sense to move split_free_page() to page_isolation.c as well? I sort of like the idea of making it static and removing it from the header file.

I confess I didn't read closely because of the mess in page_alloc.c but the intent seems fine.

No worries. I just needed a quick comment on whether I'm headed in the right direction. :)

Hopefully there will be a new version of CMA posted that will be easier to review.

I'll try to create the code no later than the weekend, so hopefully the new version will be sent next week.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
On Tue, 18 Oct 2011 06:08:26 -0700, Mel Gorman m...@csn.ul.ie wrote:
This does mean that MIGRATE_CMA also does not have a per-cpu list. I don't know if that matters to you but all allocations using MIGRATE_CMA will take the zone lock.

On Mon, 24 Oct 2011 21:32:45 +0200, Michal Nazarewicz min...@mina86.com wrote:
This is sort of an artefact of my misunderstanding of pcp lists in the past. I'll have to re-evaluate the decision not to include CMA on pcp list.

Actually, sorry, my comment above is somewhat invalid. CMA does not need to be on a pcp list of its own because CMA pages are never allocated via standard kmalloc() and friends. Because of the fallbacks in rmqueue_bulk(), the CMA pages end up being added to a pcp list of the MOVABLE type, and so when kmalloc() and friends allocate a MOVABLE page they can end up grabbing a CMA page. So it's quite OK that CMA does not have its own pcp list, as the list would not be used anyway.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
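[Editor's note: the fallback mechanism referred to here is the per-migratetype fallback table in mm/page_alloc.c. The illustration below is not the exact patch, but it shows the shape implied by the thread: MIGRATE_CMA wired in as the last type tried for movable allocations, so rmqueue_bulk() can refill the MOVABLE per-cpu list from CMA pageblocks even though CMA has no pcp list of its own.]

static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES - 1] = {
	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,
				  MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE,
				  MIGRATE_RESERVE },
	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,
				  MIGRATE_CMA, MIGRATE_RESERVE },
	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* never used */
};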
Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
On Thu, Oct 06, 2011 at 03:54:44PM +0200, Marek Szyprowski wrote:
The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks, and (ii) the page allocator will never change the migration type of MIGRATE_CMA pageblocks. This guarantees that a page in a MIGRATE_CMA pageblock can always be migrated somewhere else (unless there's no memory left in the system).

On Tue, 18 Oct 2011 06:08:26 -0700, Mel Gorman m...@csn.ul.ie wrote:
Or the count is permanently elevated by a device driver for some reason, or if the page is backed by a filesystem with a broken or unusable migrate_page() function. This is unavoidable, I'm just pointing out that you can still have migration failures, particularly if GFP_MOVABLE has been improperly used.

CMA does not handle that well right now. I guess it's something to think about once the rest is nice and working.

It is designed to be used with the Contiguous Memory Allocator (CMA) for allocating big chunks (eg. 10MiB) of physically contiguous memory. Once a driver requests contiguous memory, CMA will migrate pages from MIGRATE_CMA pageblocks. To minimise the number of migrations, the MIGRATE_CMA migration type is the last type tried when the page allocator falls back to migration types other than the one requested.

It would be preferable if you could figure out how to reuse the MIGRATE_RESERVE type for just the bitmap.

I'm not entirely sure what you mean here.

Like MIGRATE_CMA, it does not change type except when min_free_kbytes changes. However, it is something that could be done in the future to keep the size of the pageblock bitmap where it is now.

+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and page allocator never
+	 * implicitly changes migration type of MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,

This does mean that MIGRATE_CMA also does not have a per-cpu list. I don't know if that matters to you but all allocations using MIGRATE_CMA will take the zone lock.

This is sort of an artefact of my misunderstanding of pcp lists in the past. I'll have to re-evaluate the decision not to include CMA on the pcp lists. Still, I think that CMA not being on pcp lists should not be a problem for us. At least we can try to get CMA running and then consider adding CMA to the pcp lists.

I'm not sure this can be easily avoided because if there is a per-CPU list for MIGRATE_CMA, it might use a new cache line for it and incur a different set of performance problems.

+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};

diff --git a/mm/compaction.c b/mm/compaction.c
index 97254e4..9cf6b2b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;

+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.
+	 * We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+

This is another reason why CMA and compaction should be using almost identical code. It does mean that the compact_control may need to be renamed and get flags to control things like the setting of pageblock flags, but it would be preferable to having two almost identical pieces of code.

I've addressed it in my other mail, where I've changed split_free_page() to not touch CMA and ISOLATE pageblocks. I think that this change should make the above comment no longer accurate and the check unnecessary.

 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;

@@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 	     --current_order) {
Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:

+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+

On Tue, 18 Oct 2011 06:43:21 -0700, Mel Gorman m...@csn.ul.ie wrote:
Is this being called too early yet? What prevents you setting up the CMA regions after the page allocator is brought up, for example? I understand that there is a need for the memory to be coherent, so maybe that is the obstacle.

Another reason is that we want to be sure that we can get a given range of pages. After the page allocator is set up, someone could allocate a non-movable page from the range that interests us, and that wouldn't be nice for us.

+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
+		 count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    (1 << align) - 1);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+

If alloc_contig_range returns failure, the bitmap is still set. It will never be freed, so now the area cannot be used for CMA allocations any more.

The bitmap is cleared at the “free:” label.

+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}

+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s(page %p)\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pfn, count);
+
+	mutex_unlock(&cma_mutex);

It feels like the mutex could be a lot lighter here. If the bitmap is protected by a spinlock, it would only need to be held while the bitmap was being cleared. Free the contig pages outside the spinlock and clear the bitmap afterwards. It's not particularly important as the scalability of CMA is not something to be concerned with at this point.

The mutex is used also to protect the core operations, i.e. isolating pages and such. This is because two CMA calls may want to work on the same pageblock and we have to prevent that from happening. We could add a spinlock for protecting the bitmap, but we would still need the mutex for other uses.
--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Fri, 21 Oct 2011 03:06:24 -0700, Mel Gorman m...@csn.ul.ie wrote:
On Tue, Oct 18, 2011 at 10:26:37AM -0700, Michal Nazarewicz wrote:
On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman m...@csn.ul.ie wrote:
At this point, I'm going to apologise for not reviewing this a long long time ago.

On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

This commit introduces the alloc_contig_freed_pages() function which allocates (i.e. removes from the buddy system) free pages in a range. The caller has to guarantee that all pages in the range are in the buddy system.

Straight away, I'm wondering why you didn't use mm/compaction.c#isolate_freepages(). It knows how to isolate pages within ranges. All its control information is passed via struct compact_control, which I recognise may be awkward for CMA, but compaction.c knows how to manage all the isolated pages and pass them to migrate.c appropriately.

It is something to consider. At first glance, I see that isolate_freepages seems to operate on pageblocks, which is not desired for CMA.

isolate_freepages_block operates on a range of pages that happens to be hard-coded to be a pageblock because that was the requirement. It calculates end_pfn and it is possible to make that a function parameter.

Yes, this seems doable. I'll try to rewrite the patches to use it. The biggest difference is in how CMA and compaction treat pages which are not free: CMA treats it as an error, whereas compaction just skips those. This is solvable with an argument, though.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman m...@csn.ul.ie wrote:
At this point, I'm going to apologise for not reviewing this a long long time ago.

On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

This commit introduces alloc_contig_freed_pages() function which allocates (ie. removes from buddy system) free pages in range. Caller has to guarantee that all pages in range are in buddy system.

Straight away, I'm wondering why you didn't use mm/compaction.c#isolate_freepages(). It knows how to isolate pages within ranges. All its control information is passed via struct compact_control, which I recognise may be awkward for CMA, but compaction.c knows how to manage all the isolated pages and pass them to migrate.c appropriately.

It is something to consider. At first glance, I see that isolate_freepages seems to operate on pageblocks, which is not desired for CMA.

I haven't read all the patches yet, but isolate_freepages() does break everything up into order-0 pages. This may not be to your liking but it would not be possible to change.

Splitting everything into order-0 pages is desired behaviour.

Along with this function, a free_contig_pages() function is provided which frees all (or a subset of) pages allocated with alloc_contig_free_pages().

mm/compaction.c#release_freepages()

It sort of does the same thing, but release_freepages() assumes that the pages being freed are not contiguous and need to be on the lru list. With free_contig_pages(), we can assume all pages are contiguous.

+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+/*
+ * Both PFNs must be from the same zone!  If this function returns
+ * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
+ */
+static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
+{
+	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
+}
+

Why do you care what section the page is in? The zone is important all right, but not the section. Also, offhand I'm unsure if being in the same section guarantees the same zone. Sections are ordinarily fully populated (except on ARM but hey) but I can't remember anything enforcing that zones be section-aligned. Later I think I see that the intention was to reduce the use of pfn_to_page().

That is correct.

You can do this in a more general fashion by checking the zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES. That will not be SPARSEMEM specific.

I've tried doing stuff that way but it ended up with much more code. Dave suggested the above function to check if pointer arithmetic is valid. Please see also https://lkml.org/lkml/2011/9/21/220.

+#else
+
+#define zone_pfn_same_memmap(pfn1, pfn2) (true)
+
+#endif
+
 #endif /* !__GENERATING_BOUNDS.H */
 #endif /* !__ASSEMBLY__ */
 #endif /* _LINUX_MMZONE_H */

@@ -5706,6 +5706,73 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));

VM_BUG_ON seems very harsh here. WARN_ON_ONCE and returning 0 to the caller seems reasonable.

+	page = pfn_to_page(start);
+	zone = page_zone(page);
+
+	spin_lock_irq(&zone->lock);
+
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
+			  page_zone(page) != zone);
+

Here you will VM_BUG_ON with the zone lock held, leading to the system halting very shortly.
+		list_del(&page->lru);
+		order = page_order(page);
+		count = 1UL << order;
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
+

The callers need to check in advance if watermarks are sufficient for this. In compaction, it happens in compaction_suitable() because it only needed to be checked once. Your requirements might be different.

+		pfn += count;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+

On ARM, it's possible to encounter invalid pages. VM_BUG_ON is serious overkill.

+		if (zone_pfn_same_memmap(pfn - count, pfn))
+			page += count;
+		else
+			page = pfn_to_page(pfn);
+	}
+
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	count = pfn - start;
+	pfn = start;
+	for (page = pfn_to_page(pfn); count; --count) {
+		prep_new_page(page, 0, flag);
+		++pfn;
+		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
+
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Tue, 18 Oct 2011 10:48:46 -0700, Dave Hansen d...@linux.vnet.ibm.com wrote:
On Tue, 2011-10-18 at 10:26 -0700, Michal Nazarewicz wrote:
You can do this in a more general fashion by checking the zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES. That will not be SPARSEMEM specific.

I've tried doing stuff that way but it ended up with much more code.

I guess instead of:

+static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
+{
+	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
+}

You could do:

	static inline bool zone_pfn_same_maxorder(unsigned long pfn1, unsigned long pfn2)
	{
		unsigned long mask = MAX_ORDER_NR_PAGES - 1;
		return (pfn1 & ~mask) == (pfn2 & ~mask);
	}

I think that works. Should be the same code you have now, basically.

Makes sense. It'd require calling pfn_to_page() every MAX_ORDER_NR_PAGES even in memory models that have a linear mapping of struct page, but I guess that's not that bad.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Sat, 15 Oct 2011 01:29:33 +0200, Andrew Morton a...@linux-foundation.org wrote:
On Thu, 06 Oct 2011 15:54:42 +0200 Marek Szyprowski m.szyprow...@samsung.com wrote:
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

This commit introduces alloc_contig_freed_pages() function

The “freed” seems redundant to me. Wouldn't alloc_contig_pages be a better name?

The “freed” is there because the function operates on pages that are in the buddy system, i.e. it is given a range of PFNs that are to be removed from the buddy system. There's also an alloc_contig_range() function (added by the next patch) which frees pages in a given range and then calls alloc_contig_freed_pages() to allocate them.

IMO, if there were an alloc_contig_pages() function, it would have to be one level up (i.e. it would figure out where to allocate memory and then call alloc_contig_range()). (That's really what CMA is doing.)

Still, as I think of it now, maybe alloc_contig_free_range() would be better?
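[Editor's note: a simplified sketch of the layering described above, for orientation only. The real series also isolates the pageblocks first, undoes the isolation on the way out, and trims the excess pages alloc_contig_freed_pages() may claim past the requested end; all of that is omitted here.]

int alloc_contig_range(unsigned long start, unsigned long end,
		       gfp_t flags, unsigned migratetype)
{
	int ret;

	/* Migrate movable pages away so [start, end) becomes free. */
	ret = __alloc_contig_migrate_range(start, end);
	if (ret)
		return ret;

	/* Claim the now-free pages straight out of the buddy system. */
	alloc_contig_freed_pages(start, end, flags);
	return 0;
}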
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Sun, 16 Oct 2011 10:01:36 +0200 Michal Nazarewicz wrote:
Still, as I think of it now, maybe alloc_contig_free_range() would be better?

On Sun, 16 Oct 2011 10:31:16 +0200, Andrew Morton wrote:
Nope. Of *course* the pages were free. Otherwise we couldn't (re)allocate them. I still think the free part is redundant.

Makes sense.

What could be improved is the alloc part. This really isn't an allocation operation. The pages are being removed from buddy then moved into the free arena of a different memory manager from where they will _later_ be allocated.

Not quite. After alloc_contig_range() returns, the pages are passed with no further processing to the caller, i.e. the area is not later split into several parts nor kept in CMA's pool unused. alloc_contig_freed_pages() is a little different, since it must be called on a buddy page boundary and may return more than requested (because of the way the buddy system merges buddies), so there is a little processing after it returns (namely freeing of the excess pages).

So we should move away from the alloc/free naming altogether for this operation and think up new terms. How about claim and release? claim_contig_pages, claim_contig_range, release_contig_pages, etc? Or we could use take/return.

Personally, I'm not convinced about changing the names of alloc_contig_range() and free_contig_pages(), but I see merit in changing alloc_contig_freed_pages() to something else. Since at the moment it's used only by alloc_contig_range(), I'd lean towards removing it from page-isolation.h, marking it static and renaming it to __alloc_contig_range().

Also, if we have no expectation that anything apart from CMA will use these interfaces (?), the names could/should be prefixed with cma_.

In Kamezawa's original patchset he used those for a somewhat different approach (IIRC, Kamezawa's patchset introduced a function that scanned memory and tried to allocate contiguous memory where it could), so I can imagine that someone will make use of those functions. They may be used in any situation where a range of pages is either free (i.e. in the buddy system) or movable and one wants to allocate them for some reason.

--
Best regards, Michal “mina86” Nazarewicz (min...@mina86.com, min...@jabber.org)
Re: [PATCH 2/8] mm: alloc_contig_freed_pages() added
On Thu, 08 Sep 2011 20:05:52 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote:
On Fri, 2011-08-19 at 16:27 +0200, Marek Szyprowski wrote:

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));

This implies that start-end are entirely contained in a single zone. What enforces that?

In the case of CMA, the __cma_activate_area() function from 6/8 has the check:

151	VM_BUG_ON(!pfn_valid(pfn));
152	VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);

This guarantees that CMA will never try to call alloc_contig_freed_pages() on a range that spans multiple zones.

If some higher layer enforces that, I think we probably need at least a VM_BUG_ON() in here and a comment about who enforces it.

Agreed.

+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}

This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.

I'm fine with that. I've used pointer arithmetic for performance reasons, but if it may potentially lead to bugs then obviously pfn_to_page() should be used.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
[PATCH 1/3] fixup! mm: alloc_contig_freed_pages() added
From: Michal Nazarewicz min...@mina86.com

Signed-off-by: Michal Nazarewicz min...@mina86.com
---
 include/linux/page-isolation.h |  4 ++-
 mm/page_alloc.c                | 66 ++-
 2 files changed, 60 insertions(+), 10 deletions(-)

On Fri, 2011-08-19 at 16:27 +0200, Marek Szyprowski wrote:

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));

On Thu, 08 Sep 2011 20:05:52 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote:
This implies that start-end are entirely contained in a single zone. What enforces that? If some higher layer enforces that, I think we probably need at least a VM_BUG_ON() in here and a comment about who enforces it.

+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}

This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.

Do the attached changes seem to make sense? I wanted to avoid calling pfn_to_page() each time, as it seems fairly expensive in sparsemem and discontig modes. At the same time, the macro trickery is there so that users of sparsemem-vmemmap and flatmem won't have to pay the price.

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index b2a81fd..003c52f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,11 +46,13 @@
 static inline void unset_migratetype_isolate(struct page *page)
 {
 	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
 }
+
+/* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      gfp_t flags, unsigned migratetype);
-extern void free_contig_pages(struct page *page, int nr_pages);
+extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);

 /*
  * For migration.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 46e78d4..32fda5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5716,9 +5716,41 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }

+#if defined(CONFIG_FLATMEM) || defined(CONFIG_SPARSEMEM_VMEMMAP)
+
+/*
+ * In FLATMEM and CONFIG_SPARSEMEM_VMEMMAP we can safely increment the page
+ * pointer and get the same value as if we were to get by calling
+ * pfn_to_page() on incremented pfn counter.
+ */
+#define __contig_next_page(page, pageblock_left, pfn, increment)	\
+	((page) + (increment))
+
+#define __contig_first_page(pageblock_left, pfn) pfn_to_page(pfn)
+
+#else
+
+/*
+ * If we cross pageblock boundary, make sure we get a valid page pointer.  If
+ * we are within pageblock, incrementing the pointer is good enough, and is
+ * a bit of an optimisation.
+ */
+#define __contig_next_page(page, pageblock_left, pfn, increment)	\
+	(likely((pageblock_left) -= (increment)) ?			\
+		(page) + (increment)					\
+	 : (((pageblock_left) = pageblock_nr_pages), pfn_to_page(pfn)))
+
+#define __contig_first_page(pageblock_left, pfn) (			\
+	((pageblock_left) = pageblock_nr_pages -			\
+			    ((pfn) & (pageblock_nr_pages - 1))),	\
+	pfn_to_page(pfn))
+
+#endif
+
 unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
 				       gfp_t flag)
 {
+	unsigned long pageblock_left __unused;
 	unsigned long pfn = start, count;
 	struct page *page;
 	struct zone *zone;
 	int order;

@@ -5729,27 +5761,37 @@

 	spin_lock_irq(&zone->lock);

-	page = pfn_to_page(pfn);
+	page = __contig_first_page(pageblock_left, pfn);
 	for (;;) {
-		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		VM_BUG_ON(!page_count(page) || !PageBuddy(page
[PATCH 1/3] fixup! mm: alloc_contig_freed_pages() added
From: Michal Nazarewicz min...@mina86.com

Signed-off-by: Michal Nazarewicz min...@mina86.com
---
 include/asm-generic/memory_model.h | 17 ++
 include/linux/page-isolation.h     |  4 ++-
 mm/page_alloc.c                    | 43 +++
 3 files changed, 53 insertions(+), 11 deletions(-)

On Wed, 2011-09-21 at 17:19 +0200, Michal Nazarewicz wrote:
I wanted to avoid calling pfn_to_page() each time as it seems fairly expensive in sparsemem and discontig modes. At the same time, the macro trickery is so that users of sparsemem-vmemmap and flatmem won't have to pay the price.

On Wed, 21 Sep 2011 17:45:59 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote:
Personally, I'd say the (incredibly minuscule) runtime cost is worth the cost of making folks' eyes bleed when they see those macros. I think there are some nicer ways to do it.

Yeah. I wasn't amazed by them either.

Is there a reason you can't logically do?

	page = pfn_to_page(pfn);
	for (;;) {
		if (pfn_to_section_nr(pfn) == pfn_to_section_nr(pfn + 1))
			page++;
		else
			page = pfn_to_page(pfn + 1);
	}

Done. Thanks for the suggestions!

+#define __contig_next_page(page, pageblock_left, pfn, increment)	\
+	(likely((pageblock_left) -= (increment)) ? (page) + (increment) \
+	 : (((pageblock_left) = pageblock_nr_pages), pfn_to_page(pfn)))
+
+#define __contig_first_page(pageblock_left, pfn) (			\
+	((pageblock_left) = pageblock_nr_pages -			\
+			    ((pfn) & (pageblock_nr_pages - 1))),	\
+	pfn_to_page(pfn))
+
+#endif

For the love of Pete, please make those into functions if you're going to keep them.

That was tricky because they modify pageblock_left. Not relevant now anyway, though.

diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index fb2d63f..900da88 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -69,6 +69,23 @@
 })
 #endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */

+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+
+/*
+ * Both PFNs must be from the same zone!  If this function returns
+ * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
+ */
+static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
+{
+	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
+}
+
+#else
+
+#define zone_pfn_same_memmap(pfn1, pfn2) (true)
+
+#endif
+
 #define page_to_pfn __page_to_pfn
 #define pfn_to_page __pfn_to_page

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index b2a81fd..003c52f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,11 +46,13 @@
 static inline void unset_migratetype_isolate(struct page *page)
 {
 	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
 }
+
+/* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      gfp_t flags, unsigned migratetype);
-extern void free_contig_pages(struct page *page, int nr_pages);
+extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);

 /*
  * For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 46e78d4..bc200a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5725,31 +5725,46 @@
 	int order;

 	VM_BUG_ON(!pfn_valid(start));
-	zone = page_zone(pfn_to_page(start));
+	page = pfn_to_page(start);
+	zone = page_zone(page);

 	spin_lock_irq(&zone->lock);

-	page = pfn_to_page(pfn);
 	for (;;) {
-		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		VM_BUG_ON(!page_count(page) || !PageBuddy(page) ||
+			  page_zone(page) != zone);
+
 		list_del(&page->lru);
 		order = page_order(page);
+		count = 1UL << order;
 		zone->free_area[order].nr_free--;
 		rmv_page_order(page);
-		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
-		pfn += 1 << order;
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
+
+		pfn += count;
 		if (pfn >= end)
 			break;
 		VM_BUG_ON(!pfn_valid(pfn));
-		page += 1 << order;
+
+		if (zone_pfn_same_memmap(pfn - count, pfn))
+			page += count;
+		else
+			page = pfn_to_page(pfn);
 	}

 	spin_unlock_irq(&zone->lock);

 	/* After
Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
On Wed, 06 Jul 2011 18:05:00 +0200, Christoph Lameter c...@linux.com wrote:
ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot DMA into all of memory (and so is ZONE_DMA32). Memory from ZONE_NORMAL can be used for DMA as well, and a fully capable device would be expected to handle any memory in the system for DMA transfers.

Guaranteed DMA-able memory? DMA abilities are device specific. Well, maybe you can call ZONE_DMA memory guaranteed if you guarantee that any device must at minimum be able to perform DMA into ZONE_DMA memory. But there may not be much of that memory around, so you would want to limit the use of that scarce resource.

As pointed out in Marek's other mail, this reasoning does not help in any way here. In the case of the video codec on various Samsung devices (and judging from some other threads, this is not limited to Samsung), the codec needs separate buffers in separate memory banks.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 22 Jun 2011 09:03:30 +0200, Hans Verkuil hverk...@xs4all.nl wrote:
What I was wondering about is how this patch series changes the allocation in case it can't allocate from the CMA pool. Will it attempt to fall back to a 'normal' allocation?

Unless Marek changed something since I wrote the code, which I doubt, if CMA cannot obtain memory from the CMA region, it will fail. Part of the reason is that CMA lacks the knowledge of where else to allocate memory from. For instance, in the case of several memory banks, it does not know which memory bank to allocate from. It is, in my opinion, a task for higher-level functions (read: the DMA layer) to try another mechanism if CMA fails.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
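[Editor's note: a sketch of the higher-level fallback described above. This is hypothetical and not part of the patch set; alloc_dma_buffer is an invented name, while dma_alloc_from_contiguous() is the allocator from the series.]

static struct page *alloc_dma_buffer(struct device *dev, int count,
				     unsigned int order)
{
	struct page *page = dma_alloc_from_contiguous(dev, count, order);

	/* CMA exhausted or not configured: fall back to the buddy. */
	if (!page)
		page = alloc_pages(GFP_KERNEL | __GFP_NOWARN,
				   get_order(count << PAGE_SHIFT));
	return page;
}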
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:
We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.

On Wed, 22 Jun 2011 15:15:35 +0200, Marek Szyprowski wrote:
Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.

If you are able to allocate a pageblock of free memory from the buddy system, you should be able to convert it to CMA memory with no problems. Also, if you want to convert CMA memory back to regular memory, you should be able to do that even if some of the memory is used by CMA (it just won't be available right away, but only when CMA frees it). It is important to note that, because of the use of the migration type, all such conversions have to be performed on a pageblock basis.

I don't think this is a feature we should consider for the first patch, though. We started with an overgrown idea about what CMA might do and it didn't get us far. Let's first get the basics right and then start implementing features as they become needed.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
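[Editor's note: a rough sketch of the run-time "grow the CMA pool" idea discussed above, which was never implemented in this series. It assumes the alloc_contig_range() signature from the patches; cma_grow_one_pageblock is an invented name.]

static int cma_grow_one_pageblock(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);
	unsigned long i;

	/* Conversions must cover exactly one aligned pageblock. */
	if (pfn & (pageblock_nr_pages - 1))
		return -EINVAL;

	/* Claim the block; fails if it is not free or cannot be emptied. */
	if (alloc_contig_range(pfn, pfn + pageblock_nr_pages, 0,
			       MIGRATE_MOVABLE))
		return -EBUSY;

	set_pageblock_migratetype(page, MIGRATE_CMA);

	/* The block came back as order-0 pages; free them as CMA memory. */
	for (i = 0; i < pageblock_nr_pages; ++i)
		__free_page(page + i);
	return 0;
}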
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 22 Jun 2011 15:39:23 +0200, Arnd Bergmann a...@arndb.de wrote:
Why that? I would expect you can do the same that hugepages (used to) do and just attempt high-order allocations. If they succeed, you can add them as a CMA region and free them again, into the movable set of pages, otherwise you just fail the request from user space when the memory is already fragmented.

The problem with that is that CMA needs whole pageblocks allocated, and buddy can allocate at most half a pageblock.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann a...@arndb.de wrote: * We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both. I don't think I follow. What does IOMMU has to do with CMA? -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 15 Jun 2011 13:20:42 +0200, Arnd Bergmann a...@arndb.de wrote: The point is that on the higher level device drivers, we want to hide the presence of CMA and/or IOMMU behind the dma mapping API, but the device drivers do need to know about the bank properties. Gotcha, thanks. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote: Please explain the exact requirements that lead you to defining multiple contexts. Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote: Please explain the exact requirements that lead you to defining multiple contexts. On Tuesday 14 June 2011, Michal Nazarewicz wrote: Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes. On Tue, 14 Jun 2011 18:03:00 +0200, Arnd Bergmann wrote: For all I know, that is something that is only true for a few very special Samsung devices, Maybe. I'm just answering your question. :) Ah yes, I forgot that separate regions for different purposes could decrease fragmentation. I would suggest going forward without having multiple regions: Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost. * Remove the registration of specific addresses from the initial patch set (but keep the patch). * Add a heuristic plus command-line override to automatically come up with a reasonable location+size for *one* CMA area in the system. I'm not arguing those. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote: Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost. On Tue, 14 Jun 2011 20:30:07 +0200, Arnd Bergmann wrote: What is bad IMHO is making them the default, which forces the board code to care about memory management details. I would much prefer to have contiguous allocation parameters tuned automatically to just work on most boards before we add ways to do board-specific hacks. I see those as orthogonal problems. The code can have support for multiple contexts but by default use a single global context exported as cma_global variable (or some such). And I'm not arguing against having “contiguous allocation parameters tuned automatically to just work on most boards”. I just don't see the reason to delete functionality that is already there, does not add much code and can be useful. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] lib: genalloc: Generic allocator improvements
On Fri, 10 Jun 2011 14:52:17 +0200, Alan Cox a...@lxorguk.ukuu.org.uk wrote: I plan to replace it with lib/bitmap.c bitmap_* based allocator (similar like it it is used by dma_declare_coherent_memory() and friends in drivers/base/dma-coherent.c). We need something really simple for CMA area management. IMHO allocate_resource and friends a bit too heavy here, but good to know that such allocator also exists. Not sure I'd class allocate_resource as heavyweight but providing it's using something that already exists rather than inventing yet another allocator. genalloc is already in the kernel and is used in a few places, so we either let everyone use it as they see fit or we deprecate the library. If we don't deprecate it I see no reason why CMA should not use it. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Fri, 2011-04-01 at 00:51 +0200, Michal Nazarewicz wrote: The function is called from alloc_contig_range() (see patch 05/12) which makes sure that the PFN is valid. Situation where there is not enough space is caught earlier in alloc_contig_range(). alloc_contig_freed_pages() must be given a valid PFN range such that all the pages in that range are free (as in are within the region tracked by page allocator) and of MIGRATE_ISOLATE so that page allocator won't touch them. On Fri, 01 Apr 2011 16:03:16 +0200, Dave Hansen wrote: OK, so it really is a low-level function only. How about a comment that explicitly says this? Only called from $FOO with the area already isolated. It probably also deserves an __ prefix. Yes, it's not really for general use. Comment may indeed be useful here. That's why invalid PFN is a bug in the caller and not an exception that has to be handled. Also, the function is not called during boot time. It is called while system is already running. What kind of success have you had running this in practice? I'd be worried that some silly task or a sticky dentry would end up in the range that you want to allocate in. I'm not sure what you are asking. The function requires the range to be marked as MIGRATE_ISOLATE and all pages being free, so nothing can be allocated there while the function is running. If you are asking about CMA in general, the range that CMA uses is marked as MIGRATE_CMA (a new migrate type) which means that only MIGRATE_MOVABLE pages can be allocated there. This means, that in theory, if there is enough memory the pages can always be moved out of the region. At leasts that's my understanding of the type. If this is correct, the allocation should always succeed provided enough memory for the pages within the region to be moved to is available. As of practice, I have run some simple test to see if the code works and they succeeded. Also, Marek has run some test with actual hardware and those worked well as well (but I'll let Marek talk about any details). -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/12] mm: alloc_contig_range() added
On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: + ret = 0; + while (!PageBuddy(pfn_to_page(start (~0UL ret + if (WARN_ON(++ret = MAX_ORDER)) + return -EINVAL; On Thu, 31 Mar 2011 18:02:41 +0200, Dave Hansen wrote: Holy cow, that's dense. Is there really no more straightforward way to do that? Which part exactly is dense? What would be qualify as a more straightforward way? In any case, please pull the ++ret bit out of the WARN_ON(). Some people like to do: #define WARN_ON(...) do{}while(0) to save space on some systems. I don't think that's the case. Even if WARN_ON() decides not to print a warning, it will still return the value of the argument. If not, a lot of code will brake. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/12] mm: alloc_contig_range() added
On Thu, 31 Mar 2011 21:28:21 +0200, Steven Rostedt wrote: WARN_ON() should never do anything but test. That ret++ does not belong inside the WARN_ON() condition. If there are other locations in the kernel that do that, then those locations need to be fixed. Testing implies evaluating, so if we allow: if (++i == end) { /* ... */ } I see no reason why not to allow: if (WARN_ON(++i == end)) { /* ... */ } In both cases the condition is tested. On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: + ret = 0; + while (!PageBuddy(pfn_to_page(start (~0UL ret + if (WARN_ON(++ret = MAX_ORDER)) + return -EINVAL; On Thu, Mar 31, 2011 at 09:02:41AM -0700, Dave Hansen wrote: In any case, please pull the ++ret bit out of the WARN_ON(). Some people like to do: #define WARN_ON(...) do{}while(0) to save space on some systems. On Thu, 31 Mar 2011 21:26:50 +0200, Steven Rostedt wrote: That should be fixed, as the if (WARN_ON()) has become a standard in most of the kernel. Removing WARN_ON() should be: #define WARN_ON(x) ({0;}) This would break a lot of code which expect that testing to take place. Also see http://lxr.linux.no/linux+*/include/asm-generic/bug.h#L108. But I agree, that there should be no side effects inside a WARN_ON(), which that ++ret is definitely one. Thus I don't really agree with this point. At any rate, I don't really care. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Thu, 31 Mar 2011 17:58:03 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote: On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); This seems kinda mean. Could we return an error? I understand that this is largely going to be an early-boot thing, but surely trying to punt on crappy input beats a full-on BUG(). Actually, I would have to check but I think that the usage of this function (in this patchset) is that the caller expects the function to succeed. It is quite a low-level function so before running it a lot of preparation is needed and the caller must make sure that several conditions are met. I don't really see advantage of returning a value rather then BUG()ing. Also, CMA does not call this function at boot time. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/12] mm: alloc_contig_range() added
On Thu, 31 Mar 2011 22:28:42 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote: On Thu, 2011-03-31 at 18:26 +0200, Michal Nazarewicz wrote: On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: + ret = 0; + while (!PageBuddy(pfn_to_page(start (~0UL ret + if (WARN_ON(++ret = MAX_ORDER)) + return -EINVAL; On Thu, 31 Mar 2011 18:02:41 +0200, Dave Hansen wrote: Holy cow, that's dense. Is there really no more straightforward way to do that? Which part exactly is dense? What would be qualify as a more straightforward way? I'm still not 100% sure what it's trying to do. It looks like it attempts to check all of start's buddy pages. No. I'm going up through parents. This is because even though start falls in a free block (ie. one that page allocator tracks), the actual page that is in buddy system is larger then start and this loop looks for beginning of that page. int order; for (order = 0; order = MAX_ORDER; order++) { unsigned long buddy_pfn = find_buddy(start, order); struct page *buddy = pfn_to_page(buddy_pfn); if (PageBuddy(buddy) break; WARN(); return -EINVAL; } The WARN() and return would have to be outside of the loop and, as I described, instead of find_buddy() something like find_parent() would have to be used. I'm wondering also if you can share some code with __rmqueue(). Doubtful since start does not (have to) point to a page that is tracked by page allocator but a page inside such a page. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); On Thu, 31 Mar 2011 23:14:38 +0200, Dave Hansen wrote: We BUG_ON() in bootmem. Basically if we try to allocate an early-boot structure and fail, we're screwed. We can't keep running without an inode hash, or a mem_map[]. This looks like it's going to at least get partially used in drivers, at least from the examples. Are these kinds of things that, if the driver fails to load, that the system is useless and hosed? Or, is it something where we might limp along to figure out what went wrong before we reboot? Bug in the above place does not mean that we could not allocate memory. It means caller is broken. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Fri, 01 Apr 2011 00:26:51 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote: On Fri, 2011-04-01 at 00:18 +0200, Michal Nazarewicz wrote: On Thu, 31 Mar 2011 23:14:38 +0200, Dave Hansen wrote: We BUG_ON() in bootmem. Basically if we try to allocate an early-boot structure and fail, we're screwed. We can't keep running without an inode hash, or a mem_map[]. This looks like it's going to at least get partially used in drivers, at least from the examples. Are these kinds of things that, if the driver fails to load, that the system is useless and hosed? Or, is it something where we might limp along to figure out what went wrong before we reboot? Bug in the above place does not mean that we could not allocate memory. It means caller is broken. Could you explain that a bit? Is this a case where a device is mapped to a very *specific* range of physical memory and no where else? What are the reasons for not marking it off limits at boot? I also saw some bits of isolation and migration in those patches. Can't the migration fail? The function is called from alloc_contig_range() (see patch 05/12) which makes sure that the PFN is valid. Situation where there is not enough space is caught earlier in alloc_contig_range(). alloc_contig_freed_pages() must be given a valid PFN range such that all the pages in that range are free (as in are within the region tracked by page allocator) and of MIGRATETYPE_ISOLATE so that page allocator won't touch them. That's why invalid PFN is a bug in the caller and not an exception that has to be handled. Also, the function is not called during boot time. It is called while system is already running. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Fri, 18 Feb 2011 11:31:30 +0100, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Michal, On Friday 18 February 2011 00:09:51 Michal Nazarewicz wrote: On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: There's an additional problem with that: assume that streaming is happening, and a S_FMT changing the resolution was sent. There's no way to warrant that the very next frame will have the new resolution. So, a meta-data with the frame resolution (and format) would be needed. Em 17-02-2011 17:26, Guennadi Liakhovetski escreveu: Sorry, we are not going to allow format changes during a running capture. You have to stop streaming, set new formats (possibly switch to another queue) and restart streaming. What am I missing? On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: If you're stopping the stream, the current API will work as-is. If all of your concerns is about reserving a bigger buffer queue, I think that one of the reasons for the CMA allocator it for such usage. From: Guennadi Liakhovetski g.liakhovet...@gmx.de Not just bigger, say, with our preview / still-shot example, we would have one queue with a larger number of small buffers for drop-free preview, and a small number of larger buffers for still images. Ie. waste memory? As in you have both those queues allocated but only one is used at given time? It's a trade-off between memory and speed. Preallocating still image capture buffers will give you better snapshot performances, at the expense of memory. The basic problems we have here is that taking snapshots is slow with the current API if we need to stop capture, free buffers, change the format, allocate new buffers (and perform cache management operations) and restart the stream. To fix this we're considering a way to preallocate still image capture buffers, but I'm open to proposals for other ways to solve the issue :-) What I'm trying to say is that it would be best if one could configure the device in such a way that switching between modes would not require the device to free buffers (even though in user space they could be inaccessible). This is what I have in mind the usage would look like: 1. Open device Kernel creates some control structures, the usual stuff. 2. Initialize multi-format (specifying what formats user space will use). Kernel calculates amount of memory needed for the most demanding format and allocates it. 3. Set format (restricted to one of formats specified in step 2) Kernel has memory already allocated, so it only needs to split it to buffers needed in given format. 4. Allocate buffers. Kernel allocates memory needed for most demanding format (calculated in step 2). Once memory is allocated, splits it to buffers needed in given format. 5. Do the loop... queue, dequeue, all the usual stuff. Kernel instructs device to handle buffers, the usual stuff. 6. Free buffers. Kernel space destroys the buffers needed for given format but DOES NOT free memory. 7. If not done, go to step 3. 8. Finish multi-format mode. Kernel actually frees the memory. 9. Close the device. A V4L device driver could just ignore step 2 and 7 and work in the less optimal mode. If I understand V4L2 correctly, the API does not allow for step 2 and 8. In theory, they could be merged with step 1 and 9 respectively, I don't know id that feasible though. -- Best regards, _ _ .o. 
| Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
Em 18-02-2011 08:31, Laurent Pinchart escreveu: It's a trade-off between memory and speed. Preallocating still image capture buffers will give you better snapshot performances, at the expense of memory. The basic problems we have here is that taking snapshots is slow with the current API if we need to stop capture, free buffers, change the format, allocate new buffers (and perform cache management operations) and restart the stream. To fix this we're considering a way to preallocate still image capture buffers, but I'm open to proposals for other ways to solve the issue :-) On Fri, 18 Feb 2011 13:33:12 +0100, Mauro Carvalho Chehab wrote: From the above operations, considering that CMA is used to reserve a non-shared memory with enough space for the new buffer size/qtd, I don't think that the most expensive operation would be to realloc the memory. The logic to stop/start streaming seems to be the most consuming one, as driver will need to wait for the current I/O operation to complete, and this can take hundreds of milisseconds (the duration of one frame). How much time would CMA need to free and re-allocate the buffers for, let's say, something in the range of 1-10 MB, on a pre-allocated, non shared memory space? Like I said, if memory is shared with page allocator this can potentially take a lot of time. If device driver could preallocate some space and then spit it into buffers when needed, it could be a better solution then to free chunk and allocate a new one. CMA was not designed with very low latency in mind. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Fri, 18 Feb 2011 13:57:25 +0100, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Michal, On Friday 18 February 2011 12:37:24 Michal Nazarewicz wrote: [snip] What I'm trying to say is that it would be best if one could configure the device in such a way that switching between modes would not require the device to free buffers (even though in user space they could be inaccessible). This is what I have in mind the usage would look like: 1. Open device Kernel creates some control structures, the usual stuff. 2. Initialize multi-format (specifying what formats user space will use). Kernel calculates amount of memory needed for the most demanding format and allocates it. Don't forget that applications can also use USERPTR. We need a low-latency solution for that as well. That would probably work best if user provided one big buffer. Again, I don't know how this maps to V4L API. 3. Set format (restricted to one of formats specified in step 2) Kernel has memory already allocated, so it only needs to split it to buffers needed in given format. 4. Allocate buffers. Kernel allocates memory needed for most demanding format (calculated in step 2). Once memory is allocated, splits it to buffers needed in given format. 5. Do the loop... queue, dequeue, all the usual stuff. Kernel instructs device to handle buffers, the usual stuff. When buffers are queued cache needs to be cleaned. This is an expensive operation, and we need to be able to pre-queue (or at least pre-clean) buffers. Cache operations are always needed, aren't they? Whatever you do, you will always have to handle cache coherency (in one way or another) so there's nothing we can do about it, or is there? 6. Free buffers. Kernel space destroys the buffers needed for given format but DOES NOT free memory. 7. If not done, go to step 3. 8. Finish multi-format mode. Kernel actually frees the memory. 9. Close the device. A V4L device driver could just ignore step 2 and 7 and work in the less optimal mode. If I understand V4L2 correctly, the API does not allow for step 2 and 8. In theory, they could be merged with step 1 and 9 respectively, I don't know id that feasible though. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Friday 18 February 2011 14:19:44 Michal Nazarewicz wrote: Cache operations are always needed, aren't they? Whatever you do, you will always have to handle cache coherency (in one way or another) so there's nothing we can do about it, or is there? On Fri, 18 Feb 2011 14:21:53 +0100, Laurent Pinchart wrote: To achieve low latency still image capture, you need to minimize the time spent reconfiguring the device from viewfinder to still capture. Cache cleaning is always needed, but you can prequeue buffers you can clean the cache in advance, avoiding an extra delay when the user presses the still image capture button. If there is enough time to perform those operation while preview is shown (ie. several frames pare second), why would there not be enough time to perform those operations for still image capture? If I understand you correctly, what you are describing is a situation where one has set of buffers for preview and a buffer for still image laying around waiting to be used, right? Such scheme will of course work, but I'm just suggesting to think of a scheme where the unused buffer for still image is reused for preview frames when preview is shown. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: There's an additional problem with that: assume that streaming is happening, and a S_FMT changing the resolution was sent. There's no way to warrant that the very next frame will have the new resolution. So, a meta-data with the frame resolution (and format) would be needed. Em 17-02-2011 17:26, Guennadi Liakhovetski escreveu: Sorry, we are not going to allow format changes during a running capture. You have to stop streaming, set new formats (possibly switch to another queue) and restart streaming. What am I missing? On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: If you're stopping the stream, the current API will work as-is. If all of your concerns is about reserving a bigger buffer queue, I think that one of the reasons for the CMA allocator it for such usage. From: Guennadi Liakhovetski g.liakhovet...@gmx.de Not just bigger, say, with our preview / still-shot example, we would have one queue with a larger number of small buffers for drop-free preview, and a small number of larger buffers for still images. Ie. waste memory? As in you have both those queues allocated but only one is used at given time? Currently you would have to allocate a large number of large buffers, which would waste memory. Or you would have to reallocate the queue, losing time. AFAICS, CMA doesn't manage our memory for us. It only provides an API to reserve memory for various uses with various restrictions (alignment, etc.) and use different allocators to obtain that memory. I'm not sure if I understand you here. CMA has some API for reserving memory at boot time but it sure does manage this reserved memory, ie. when system is running you can allocate chunks of memory from this reserved block. Also note, that latest CMA uses only one allocator. So, are you suggesting, that with that in place, we would first allocate the preview queue from this memory, then free it, when switching to snapshooting, allocate our large-buffer queue from the _same_ memory, capture images, free and allocate preview queue again? Would that be fast enough? If CMA is considered, the most important thing to note is that CMA may share memory with page allocator (so that other parts of the system can use it if CMA-compatible devices are not using it). When CMA allocates memory chunk it may potentially need to migrate memory pages which may take so time (there is room for improvement, but still). Sharing can be disabled in which case allocation should be quite fast (the last CMA patchset uses a first-fit bitmap-based gen_allocator API but O(log n) best-fit algorithm can easily used instead). To sum things up, if sharing is disabled, CMA should be able to fulfil your requirements, however it may be undesirable as it wastes space. If sharing is enabled, on the other hand, the delay may potentially be noticeable. In fact, it would be kind of smart to reuse the same memory for both queues, but if we could do it without re-allocations?... What I would do is allocate a few big buffers and when needed divide them into smaller chunks (or even allocate one big block and later divide it in whatever way needed). I'm not sure if such usage would map well to V4L2 API. This usage is, as a matter of fact, supported by CMA. You can allocate a big block and then run cma_create() on it to create a new CMA context. Using this context you can allocate a lot of small blocks, then free them all, to finally allocate few big blocks. Again, I'm not sure how it maps to V4L2 API. 
If you can change formats while retaining V4L device instance's state, this should be doable. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv8 07/12] mm: cma: Contiguous Memory Allocator added
On Wed, Dec 15, 2010 at 09:34:27PM +0100, Michal Nazarewicz wrote: +unsigned long cma_reserve(unsigned long start, unsigned long size, + unsigned long alignment) +{ + pr_debug(%s(%p+%p/%p)\n, __func__, (void *)start, (void *)size, +(void *)alignment); + + /* Sanity checks */ + if (!size || (alignment (alignment - 1))) + return (unsigned long)-EINVAL; + + /* Sanitise input arguments */ + start = PAGE_ALIGN(start); + size = PAGE_ALIGN(size); + if (alignment PAGE_SIZE) + alignment = PAGE_SIZE; + + /* Reserve memory */ + if (start) { + if (memblock_is_region_reserved(start, size) || + memblock_reserve(start, size) 0) + return (unsigned long)-EBUSY; + } else { + /* +* Use __memblock_alloc_base() since +* memblock_alloc_base() panic()s. +*/ + u64 addr = __memblock_alloc_base(size, alignment, 0); + if (!addr) { + return (unsigned long)-ENOMEM; + } else if (addr + size ~(unsigned long)0) { + memblock_free(addr, size); + return (unsigned long)-EOVERFLOW; + } else { + start = addr; + } + } + On Wed, 02 Feb 2011 13:43:33 +0100, Ankita Garg ank...@in.ibm.com wrote: Reserving the areas of memory belonging to CMA using memblock_reserve, would preclude that range from the zones, due to which it would not be available for buddy allocations right ? Correct. CMA however, injects allocated pageblocks to buddy so they end up in buddy with migratetype set to MIGRATE_CMA. + return start; +} -- Best regards, _ _ .o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz (o o) ooo +-email/jid: mnazarew...@google.comooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv8 00/12] Contiguous Memory Allocator
Russell King - ARM Linux li...@arm.linux.org.uk writes: Has anyone addressed my issue with it that this is wide-open for abuse by allocating large chunks of memory, and then remapping them in some way with different attributes, thereby violating the ARM architecture specification? In other words, do we _actually_ have a use for this which doesn't involve doing something like allocating 32MB of memory from it, remapping it so that it's DMA coherent, and then performing DMA on the resulting buffer? Huge pages. Also, don't treat it as coherent memory and just flush/clear/invalidate cache before and after each DMA transaction. I never understood what's wrong with that approach. -- Best regards, _ _ .o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz (o o) ooo +--mina86-tlen.pl--jid:mina86-jabber.org--ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv8 00/12] Contiguous Memory Allocator
On Thu, Dec 23, 2010 at 03:04:07PM +0100, Tomasz Fujak wrote: In other words, should we take your response as yet another NAK? Or would you try harder and at least point us to some direction that would not doom the effort from the very beginning. On Thu, Dec 23, 2010 at 4:16 PM, Russell King - ARM Linux li...@arm.linux.org.uk wrote: What the fsck do you think I've been doing? This is NOT THE FIRST time I've raised this issue. I gave up raising it after the first couple of attempts because I wasn't being listened to. You say about _me_ not being very helpful. How about the CMA proponents start taking the issue I've raised seriously, and try to work out how to solve it? And how about blaming them for the months of wasted time on this issue _because_ _they_ have chosen to ignore it? Felipe Contreras felipe.contre...@gmail.com writes: I've also raised the issue for ARM. However, I don't see what is the big problem. A generic solution (that I think I already proposed) would be to reserve a chunk of memory for the CMA that can be removed from the normally mapped kernel memory through memblock at boot time. The size of this memory region would be configurable through kconfig. Then, the CMA would have a dma flag or something, Having exactly that usage in mind, in v8 I've added notion of private CMA contexts which can be used for DMA coherent RAM as well as memory mapped devices. and take chunks out of it until there's no more, and then return errors. That would work for ARM. -- Best regards, _ _ .o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz (o o) ooo +--mina86-tlen.pl--jid:mina86-jabber.org--ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv8 01/12] mm: migrate.c: fix compilation error
GCC complained about update_mmu_cache() not being defined in migrate.c. Including asm/tlbflush.h seems to solve the problem. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com --- mm/migrate.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index fe5a3c6..6ae8a66 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -35,6 +35,8 @@ #include linux/hugetlb.h #include linux/gfp.h +#include asm/tlbflush.h + #include internal.h #define lru_to_page(_head) (list_entry((_head)-prev, struct page, lru)) -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv8 02/12] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
From: Michal Nazarewicz min...@mina86.com This commit adds a bitmap_find_next_zero_area_off() function which works like bitmap_find_next_zero_area() function expect it allows an offset to be specified when alignment is checked. This lets caller request a bit such that its number plus the offset is aligned according to the mask. Signed-off-by: Michal Nazarewicz min...@mina86.com --- include/linux/bitmap.h | 24 +++- lib/bitmap.c | 22 -- 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index daf8c48..c0528d1 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -45,6 +45,7 @@ * bitmap_set(dst, pos, nbits) Set specified bit area * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area + * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above * bitmap_shift_right(dst, src, n, nbits) *dst = *src n * bitmap_shift_left(dst, src, n, nbits) *dst = *src n * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) @@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits); extern void bitmap_set(unsigned long *map, int i, int len); extern void bitmap_clear(unsigned long *map, int start, int nr); -extern unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask); + +extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask, + unsigned long align_offset); + +static inline unsigned long +bitmap_find_next_zero_area(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask) +{ + return bitmap_find_next_zero_area_off(map, size, start, nr, + align_mask, 0); +} extern int bitmap_scnprintf(char *buf, unsigned int len, const unsigned long *src, int nbits); diff --git a/lib/bitmap.c b/lib/bitmap.c index 741fae9..8e75a6f 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr) } EXPORT_SYMBOL(bitmap_clear); -/* +/** * bitmap_find_next_zero_area - find a contiguous aligned zero area * @map: The address to base the search on * @size: The bitmap size in bits * @start: The bitnumber to start searching at * @nr: The number of zeroed bits we're looking for * @align_mask: Alignment mask for zero area + * @align_offset: Alignment offset for zero area. * * The @align_mask should be one less than a power of 2; the effect is that - * the bit offset of all zero areas this function finds is multiples of that - * power of 2. A @align_mask of 0 means no alignment is required. + * the bit offset of all zero areas this function finds plus @align_offset + * is multiple of that power of 2. 
*/ -unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask) +unsigned long bitmap_find_next_zero_area_off(unsigned long *map, +unsigned long size, +unsigned long start, +unsigned int nr, +unsigned long align_mask, +unsigned long align_offset) { unsigned long index, end, i; again: index = find_next_zero_bit(map, size, start); /* Align allocation */ - index = __ALIGN_MASK(index, align_mask); + index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset; end = index + nr; if (end size) @@ -350,7 +352,7 @@ again: } return index; } -EXPORT_SYMBOL(bitmap_find_next_zero_area); +EXPORT_SYMBOL(bitmap_find_next_zero_area_off); /* * Bitmap printing parsing functions: first version by Bill Irwin, -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body
[PATCHv8 05/12] mm: alloc_contig_freed_pages() added
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com This commit introduces alloc_contig_freed_pages() function which allocates (ie. removes from buddy system) free pages in range. Caller has to guarantee that all pages in range are in buddy system. Along with this function, a free_contig_pages() function is provided which frees all (or a subset of) pages allocated with alloc_contig_free_pages(). Michal Nazarewicz has modified the function to make it easier to allocate not MAX_ORDER_NR_PAGES aligned pages by making it return pfn of one-past-the-last allocated page. Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com --- include/linux/page-isolation.h |3 ++ mm/page_alloc.c| 44 2 files changed, 47 insertions(+), 0 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 58cdbac..f1417ed 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); */ extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); +extern unsigned long alloc_contig_freed_pages(unsigned long start, + unsigned long end, gfp_t flag); +extern void free_contig_pages(struct page *page, int nr_pages); /* * For migration. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 826ba69..be240a3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5425,6 +5425,50 @@ out: spin_unlock_irqrestore(zone-lock, flags); } +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); + zone = page_zone(pfn_to_page(start)); + + spin_lock_irq(zone-lock); + + page = pfn_to_page(pfn); + for (;;) { + VM_BUG_ON(page_count(page) || !PageBuddy(page)); + list_del(page-lru); + order = page_order(page); + zone-free_area[order].nr_free--; + rmv_page_order(page); + __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL order)); + pfn += 1 order; + if (pfn = end) + break; + VM_BUG_ON(!pfn_valid(pfn)); + page += 1 order; + } + + spin_unlock_irq(zone-lock); + + /* After this, pages in the range can be freed one be one */ + page = pfn_to_page(start); + for (count = pfn - start; count; --count, ++page) + prep_new_page(page, 0, flag); + + return pfn; +} + +void free_contig_pages(struct page *page, int nr_pages) +{ + for (; nr_pages; --nr_pages, ++page) + __free_page(page); +} + #ifdef CONFIG_MEMORY_HOTREMOVE /* * All pages in the range must be isolated before calling this. -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv8 04/12] mm: move some functions from memory_hotplug.c to page_isolation.c
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com Memory hotplug is a logic for making pages unused in the specified range of pfn. So, some of core logics can be used for other purpose as allocating a very large contigous memory block. This patch moves some functions from mm/memory_hotplug.c to mm/page_isolation.c. This helps adding a function for large-alloc in page_isolation.c with memory-unplug technique. Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [mina86: reworded commit message] Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com --- include/linux/page-isolation.h |7 +++ mm/memory_hotplug.c| 108 -- mm/page_isolation.c| 111 3 files changed, 118 insertions(+), 108 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 051c1b1..58cdbac 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); +/* + * For migration. + */ + +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn); +unsigned long scan_lru_pages(unsigned long start, unsigned long end); +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn); #endif diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 2c6523a..2b18cb5 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -634,114 +634,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) } /* - * Confirm all pages in a range [start, end) is belongs to the same zone. - */ -static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct zone *zone = NULL; - struct page *page; - int i; - for (pfn = start_pfn; -pfn end_pfn; -pfn += MAX_ORDER_NR_PAGES) { - i = 0; - /* This is just a CONFIG_HOLES_IN_ZONE check.*/ - while ((i MAX_ORDER_NR_PAGES) !pfn_valid_within(pfn + i)) - i++; - if (i == MAX_ORDER_NR_PAGES) - continue; - page = pfn_to_page(pfn + i); - if (zone page_zone(page) != zone) - return 0; - zone = page_zone(page); - } - return 1; -} - -/* - * Scanning pfn is much easier than scanning lru list. - * Scan pfn from start to end and Find LRU page. - */ -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) -{ - unsigned long pfn; - struct page *page; - for (pfn = start; pfn end; pfn++) { - if (pfn_valid(pfn)) { - page = pfn_to_page(pfn); - if (PageLRU(page)) - return pfn; - } - } - return 0; -} - -static struct page * -hotremove_migrate_alloc(struct page *page, unsigned long private, int **x) -{ - /* This should be improved!! */ - return alloc_page(GFP_HIGHUSER_MOVABLE); -} - -#define NR_OFFLINE_AT_ONCE_PAGES (256) -static int -do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct page *page; - int move_pages = NR_OFFLINE_AT_ONCE_PAGES; - int not_managed = 0; - int ret = 0; - LIST_HEAD(source); - - for (pfn = start_pfn; pfn end_pfn move_pages 0; pfn++) { - if (!pfn_valid(pfn)) - continue; - page = pfn_to_page(pfn); - if (!page_count(page)) - continue; - /* -* We can skip free pages. And we can only deal with pages on -* LRU. 
-*/ - ret = isolate_lru_page(page); - if (!ret) { /* Success */ - list_add_tail(page-lru, source); - move_pages--; - inc_zone_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - - } else { -#ifdef CONFIG_DEBUG_VM - printk(KERN_ALERT removing pfn %lx from LRU failed\n, - pfn); - dump_page(page); -#endif - /* Becasue we don't have big zone-lock. we should - check this again here. */ - if (page_count(page)) { - not_managed++; - ret = -EBUSY; - break; - } - } - } - if (!list_empty(source)) { - if (not_managed
[PATCHv8 12/12] ARM: cma: Added CMA to Aquila, Goni and c210 universal boards
This commit adds CMA memory reservation code to Aquila, Goni and c210 universal boards. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- arch/arm/mach-s5pv210/Kconfig |2 + arch/arm/mach-s5pv210/mach-aquila.c |2 + arch/arm/mach-s5pv210/mach-goni.c |2 + arch/arm/mach-s5pv310/Kconfig |1 + arch/arm/mach-s5pv310/mach-universal_c210.c |2 + arch/arm/plat-s5p/Makefile |2 + arch/arm/plat-s5p/cma-stub.c| 49 +++ arch/arm/plat-s5p/include/plat/cma-stub.h | 21 +++ 8 files changed, 81 insertions(+), 0 deletions(-) create mode 100644 arch/arm/plat-s5p/cma-stub.c create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h diff --git a/arch/arm/mach-s5pv210/Kconfig b/arch/arm/mach-s5pv210/Kconfig index 53aabef..b395a16 100644 --- a/arch/arm/mach-s5pv210/Kconfig +++ b/arch/arm/mach-s5pv210/Kconfig @@ -68,6 +68,7 @@ config MACH_AQUILA select S5P_DEV_ONENAND select S5PV210_SETUP_FB_24BPP select S5PV210_SETUP_SDHCI + select CMA_DEVICE_POSSIBLE help Machine support for the Samsung Aquila target based on S5PC110 SoC @@ -92,6 +93,7 @@ config MACH_GONI select S5PV210_SETUP_I2C2 select S5PV210_SETUP_KEYPAD select S5PV210_SETUP_SDHCI + select CMA_DEVICE_POSSIBLE help Machine support for Samsung GONI board S5PC110(MCP) is one of package option of S5PV210 diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c index 28677ca..8608a16 100644 --- a/arch/arm/mach-s5pv210/mach-aquila.c +++ b/arch/arm/mach-s5pv210/mach-aquila.c @@ -39,6 +39,7 @@ #include plat/fb.h #include plat/fimc-core.h #include plat/sdhci.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define AQUILA_UCON_DEFAULT(S3C2410_UCON_TXILEVEL |\ @@ -690,4 +691,5 @@ MACHINE_START(AQUILA, Aquila) .map_io = aquila_map_io, .init_machine = aquila_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c index b1dcf96..b1bf079 100644 --- a/arch/arm/mach-s5pv210/mach-goni.c +++ b/arch/arm/mach-s5pv210/mach-goni.c @@ -45,6 +45,7 @@ #include plat/keypad.h #include plat/sdhci.h #include plat/clock.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define GONI_UCON_DEFAULT (S3C2410_UCON_TXILEVEL |\ @@ -865,4 +866,5 @@ MACHINE_START(GONI, GONI) .map_io = goni_map_io, .init_machine = goni_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv310/Kconfig b/arch/arm/mach-s5pv310/Kconfig index d64efe0..ae4e0da 100644 --- a/arch/arm/mach-s5pv310/Kconfig +++ b/arch/arm/mach-s5pv310/Kconfig @@ -85,6 +85,7 @@ config MACH_UNIVERSAL_C210 select S5P_DEV_ONENAND select S3C_DEV_I2C1 select S5PV310_SETUP_I2C1 + select CMA_DEVICE_POSSIBLE help Machine support for Samsung Mobile Universal S5PC210 Reference Board. 
S5PC210(MCP) is one of package option of S5PV310 diff --git a/arch/arm/mach-s5pv310/mach-universal_c210.c b/arch/arm/mach-s5pv310/mach-universal_c210.c index 16d8fc0..d65703a 100644 --- a/arch/arm/mach-s5pv310/mach-universal_c210.c +++ b/arch/arm/mach-s5pv310/mach-universal_c210.c @@ -21,6 +21,7 @@ #include plat/s5pv310.h #include plat/cpu.h #include plat/devs.h +#include plat/cma-stub.h #include mach/map.h @@ -152,6 +153,7 @@ MACHINE_START(UNIVERSAL_C210, UNIVERSAL_C210) .boot_params= S5P_PA_SDRAM + 0x100, .init_irq = s5pv310_init_irq, .map_io = universal_map_io, + .reserve= cma_mach_reserve, .init_machine = universal_machine_init, .timer = s5pv310_timer, MACHINE_END diff --git a/arch/arm/plat-s5p/Makefile b/arch/arm/plat-s5p/Makefile index de65238..6fdb6ce 100644 --- a/arch/arm/plat-s5p/Makefile +++ b/arch/arm/plat-s5p/Makefile @@ -28,3 +28,5 @@ obj-$(CONFIG_S5P_DEV_FIMC0) += dev-fimc0.o obj-$(CONFIG_S5P_DEV_FIMC1)+= dev-fimc1.o obj-$(CONFIG_S5P_DEV_FIMC2)+= dev-fimc2.o obj-$(CONFIG_S5P_DEV_ONENAND) += dev-onenand.o + +obj-$(CONFIG_CMA) += cma-stub.o diff --git a/arch/arm/plat-s5p/cma-stub.c b/arch/arm/plat-s5p/cma-stub.c new file mode 100644 index 000..c175ba8 --- /dev/null +++ b/arch/arm/plat-s5p/cma-stub.c @@ -0,0 +1,49 @@ +/* + * This file is just a quick and dirty hack to get CMA testing device + * working. The cma_mach_reserve() should be called as mach's reserve + * callback. CMA testing device will use cma_ctx for allocations
[PATCHv8 07/12] mm: cma: Contiguous Memory Allocator added
The Contiguous Memory Allocator is a set of functions that lets one initialise a region of memory from which contiguous memory chunks can then be allocated. CMA allows for creation of private and non-private contexts. The former is reserved for CMA and no other kernel subsystem can use it. The latter allows for movable pages to be allocated within CMA's managed memory so that it can be used for page cache when CMA devices do not use it. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/cma.h | 219 ++ mm/Kconfig | 22 mm/Makefile |1 + mm/cma.c| 328 +++ 4 files changed, 570 insertions(+), 0 deletions(-) create mode 100644 include/linux/cma.h create mode 100644 mm/cma.c diff --git a/include/linux/cma.h b/include/linux/cma.h new file mode 100644 index 000..e9575fd --- /dev/null +++ b/include/linux/cma.h @@ -0,0 +1,219 @@ +#ifndef __LINUX_CMA_H +#define __LINUX_CMA_H + +/* + * Contiguous Memory Allocator + * Copyright (c) 2010 by Samsung Electronics. + * Written by Michal Nazarewicz (m.nazarew...@samsung.com) + */ + +/* + * Contiguous Memory Allocator + * + * The Contiguous Memory Allocator (CMA) makes it possible for + * device drivers to allocate big contiguous chunks of memory after + * the system has booted. + * + * It requires some machine- and/or platform-specific initialisation + * code which prepares memory ranges to be used with CMA and later, + * device drivers can allocate memory from those ranges. + * + * Why is it needed? + * + * Various devices on embedded systems have no scatter-gather and/or + * IO map support and require contiguous blocks of memory to + * operate. They include devices such as cameras, hardware video + * coders, etc. + * + * Such devices often require big memory buffers (a full HD frame + * is, for instance, more than 2 megapixels, i.e. more than 6 + * MB of memory), which makes mechanisms such as kmalloc() or + * alloc_page() ineffective. + * + * At the same time, a solution where a big memory region is + * reserved for a device is suboptimal since often more memory is + * reserved than strictly required and, moreover, the memory is + * inaccessible to the page allocator even if device drivers don't use it. + * + * CMA tries to solve this issue by operating on memory regions + * where only movable pages can be allocated from. This way, the kernel + * can use the memory for pagecache and when a device driver requests + * it, allocated pages can be migrated. + * + * Driver usage + * + * For a device driver to use CMA it needs to have a pointer to a CMA + * context represented by a struct cma (which is an opaque data + * type). + * + * Once such a pointer is obtained, a device driver may allocate a + * contiguous memory chunk using the following function: + * + * cm_alloc() + * + * This function returns a pointer to struct cm (another opaque data + * type) which represents a contiguous memory chunk. This pointer + * may be used with the following functions: + * + * cm_free()-- frees allocated contiguous memory + * cm_pin() -- pins memory + * cm_unpin() -- unpins memory + * cm_vmap()-- maps memory in kernel space + * cm_vunmap() -- unmaps memory from kernel space + * + * See the respective functions for more information. + * + * Platform/machine integration + * + * For device drivers to be able to use CMA, platform or machine + * initialisation code must create a CMA context and pass it to + * device drivers.
The latter may be done by a global variable or + * a platform/machine specific function. For the former CMA + * provides the following functions: + * + * cma_reserve() + * cma_create() + * + * The cma_reserve() function must be called when memblock is still + * operational and reserving memory with it is still possible. On + * the ARM platform the machine's reserve callback is a perfect place to + * call it. + * + * The last function creates a CMA context on a range of previously + * initialised memory addresses. Because it uses kmalloc() it needs + * to be called after SLAB is initialised. + */ + +/* Kernel level API */ + +#if defined __KERNEL__ && defined CONFIG_CMA + +/* CMA context */ +struct cma; +/* Contiguous Memory chunk */ +struct cm; + +/** + * cma_reserve() - reserves memory. + * @start: hint for the start address of the memory range in bytes; if unsure, + * pass zero. + * @size: size of the memory to reserve in bytes. + * @alignment: desired alignment in bytes (must be power of two or zero). + * + * It will use memblock to allocate memory. @start and @size will be + * aligned to PAGE_SIZE. + * + * Returns the reserved area's physical address or a value that yields + * true when checked with IS_ERR_VALUE().
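The driver-side flow documented above is short enough to sketch end to end. The snippet below is only an illustration: cm_alloc()'s argument list and error convention are not visible in this posting, so the (ctx, size, alignment) signature, the ERR_PTR() convention, and cm_pin() returning a physical address are assumptions drawn from the comment block, not from the patch body.

#include <linux/cma.h>
#include <linux/err.h>

/* Hypothetical driver-side use of the API described above. */
static int example_use_cma(struct cma *ctx)
{
	struct cm *cm;
	unsigned long phys;
	void *virt;

	/* assumed signature: cm_alloc(context, size, alignment) */
	cm = cm_alloc(ctx, 2 << 20, 0);		/* 2 MiB chunk */
	if (IS_ERR(cm))				/* error convention assumed */
		return PTR_ERR(cm);

	phys = cm_pin(cm);	/* assumed to return the physical address */
	virt = cm_vmap(cm);	/* optional kernel-space mapping */

	/* ... program the device with phys, fill the buffer via virt ... */

	cm_vunmap(cm);
	cm_unpin(cm);		/* chunk may be migrated again */
	cm_free(cm);
	return 0;
}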
[PATCHv8 10/12] mm: MIGRATE_CMA support added to CMA
This commit adds MIGRATE_CMA migratetype support to the CMA. The advantage is that an (almost) arbitrary memory range can be marked as MIGRATE_CMA which may not be the case with ZONE_MOVABLE. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/cma.h | 58 --- mm/cma.c| 161 +-- 2 files changed, 194 insertions(+), 25 deletions(-) diff --git a/include/linux/cma.h b/include/linux/cma.h index e9575fd..8952531 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -71,9 +71,14 @@ * a platform/machine specific function. For the former CMA * provides the following functions: * + * cma_init_migratetype() * cma_reserve() * cma_create() * + * The first one initialises a portion of reserved memory so that it + * can be used with CMA. The second first tries to reserve memory + * (using memblock) and then initialise it. + * * The cma_reserve() function must be called when memblock is still * operational and reserving memory with it is still possible. On * ARM platform the reserve machine callback is a perfect place to @@ -93,21 +98,56 @@ struct cma; /* Contiguous Memory chunk */ struct cm; +#ifdef CONFIG_MIGRATE_CMA + +/** + * cma_init_migratetype() - initialises range of physical memory to be used + * with CMA context. + * @start: start address of the memory range in bytes. + * @size: size of the memory range in bytes. + * + * The range must be MAX_ORDER_NR_PAGES aligned and it must have been + * already reserved (eg. with memblock). + * + * The actual initialisation is deferred until subsys initcalls are + * evaluated (unless this has already happened). + * + * Returns zero on success or negative error. + */ +int cma_init_migratetype(unsigned long start, unsigned long end); + +#else + +static inline int cma_init_migratetype(unsigned long start, unsigned long end) +{ + (void)start; (void)end; + return -EOPNOTSUPP; +} + +#endif + /** * cma_reserve() - reserves memory. * @start: start address of the memory range in bytes hint; if unsure * pass zero. * @size: size of the memory to reserve in bytes. * @alignment: desired alignment in bytes (must be power of two or zero). + * @init_migratetype: whether to initialise pageblocks. + * + * It will use memblock to allocate memory. If @init_migratetype is + * true, the function will also call cma_init_migratetype() on + * reserved region so that a non-private CMA context can be created on + * given range. * - * It will use memblock to allocate memory. @start and @size will be - * aligned to PAGE_SIZE. + * @start and @size will be aligned to PAGE_SIZE if @init_migratetype + * is false or to (MAX_ORDER_NR_PAGES PAGE_SHIFT) if + * @init_migratetype is true. * * Returns reserved's area physical address or value that yields true * when checked with IS_ERR_VALUE(). */ unsigned long cma_reserve(unsigned long start, unsigned long size, - unsigned long alignment); + unsigned long alignment, _Bool init_migratetype); /** * cma_create() - creates a CMA context. @@ -118,12 +158,14 @@ unsigned long cma_reserve(unsigned long start, unsigned long size, * * The range must be page aligned. Different contexts cannot overlap. * - * Unless @private is true the memory range must lay in ZONE_MOVABLE. - * If @private is true no underlaying memory checking is done and - * during allocation no pages migration will be performed - it is - * assumed that the memory is reserved and only CMA manages it. 
+ * Unless @private is true the memory range must either lie in + * ZONE_MOVABLE or must have been initialised with the + * cma_init_migratetype() function. If @private is true no + * underlying memory checking is done and during allocation no page + * migration will be performed - it is assumed that the memory is + * reserved and only CMA manages it. * - * @start and @size must be page and @min_alignment alignment. + * @start and @size must be page and @min_alignment aligned. + * @min_alignment specifies the minimal alignment that the user will be + * able to request through the cm_alloc() function. In most cases one + * will probably pass zero as @min_alignment but if the CMA context diff --git a/mm/cma.c b/mm/cma.c index d82361b..4017dee 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -57,21 +57,130 @@ static unsigned long phys_to_pfn(phys_addr_t phys) /* Initialise CMA */ +#ifdef CONFIG_MIGRATE_CMA + +static struct cma_grabbed { + unsigned long start; + unsigned long size; +} cma_grabbed[8] __initdata; +static unsigned cma_grabbed_count __initdata; + +#ifdef CONFIG_DEBUG_VM + +static int __cma_give_back(unsigned long start, unsigned long size) +{ + unsigned long pfn = phys_to_pfn(start
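The platform-side sequence these comments describe condenses to a stub like the one below (the cma-stub.c posted with the v7 series later in this archive does essentially this). The four-argument cma_reserve() form is taken from this patch; cma_create()'s full argument list is only partially visible here, so the (start, size, min_alignment, private) order is an assumption.

#include <linux/cma.h>
#include <linux/err.h>
#include <linux/init.h>

#define EXAMPLE_CMA_SIZE	(16UL << 20)	/* 16 MiB */

static struct cma *example_ctx;
static unsigned long example_cma_start __initdata;

/* Called from the machine's .reserve callback, while memblock is live. */
void __init example_mach_reserve(void)
{
	/* init_migratetype = true: pageblocks become MIGRATE_CMA so a
	 * non-private context may later be created on this range. */
	unsigned long start = cma_reserve(0, EXAMPLE_CMA_SIZE, 0, true);

	if (!IS_ERR_VALUE(start))
		example_cma_start = start;
}

/* cma_create() needs kmalloc(), hence the initcall. */
static int __init example_cma_setup(void)
{
	struct cma *ctx;

	if (!example_cma_start)
		return -ENOMEM;
	/* argument order (start, size, min_alignment, private) assumed */
	ctx = cma_create(example_cma_start, EXAMPLE_CMA_SIZE, 0, false);
	if (IS_ERR(ctx))
		return PTR_ERR(ctx);
	example_ctx = ctx;
	return 0;
}
device_initcall(example_cma_setup);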
[PATCHv8 11/12] mm: cma: Test device and application added
This patch adds a cma misc device which lets user space use the CMA API. This device is meant for testing. A testing application is also provided. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/misc/Kconfig | 28 +++ drivers/misc/Makefile |1 + drivers/misc/cma-dev.c | 238 + include/linux/cma.h| 29 +++ tools/cma/cma-test.c | 457 5 files changed, 753 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/cma-dev.c create mode 100644 tools/cma/cma-test.c diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 1e1a4be..b90e36b 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -458,4 +458,32 @@ source "drivers/misc/cb710/Kconfig" source "drivers/misc/iwmc3200top/Kconfig" source "drivers/misc/ti-st/Kconfig" +# Select this for platforms/machines that implemented code making +# the CMA test device usable. +config CMA_DEVICE_POSSIBLE + bool + +config CMA_DEVICE + tristate "CMA test device (DEVELOPMENT)" + depends on CMA && CMA_DEVICE_POSSIBLE + help + The CMA misc device allows allocating contiguous memory areas + from user space. This is for testing of the CMA framework. + + If unsure, say n + endif # MISC_DEVICES diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 98009cc..f8eadd4 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o obj-$(CONFIG_PCH_PHUB) += pch_phub.o obj-y += ti-st/ obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o +obj-$(CONFIG_CMA_DEVICE) += cma-dev.o diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c new file mode 100644 index 000..6c36064 --- /dev/null +++ b/drivers/misc/cma-dev.c @@ -0,0 +1,238 @@ +/* + * Contiguous Memory Allocator userspace driver + * Copyright (c) 2010 by Samsung Electronics. + * Written by Michal Nazarewicz (m.nazarew...@samsung.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License or (at your option) any later version of the license.
+ */ + +#define pr_fmt(fmt) cma: fmt + +#ifdef CONFIG_CMA_DEBUG +# define DEBUG +#endif + +#include linux/errno.h /* Error numbers */ +#include linux/err.h /* IS_ERR_VALUE() */ +#include linux/fs.h /* struct file */ +#include linux/mm.h /* Memory stuff */ +#include linux/mman.h +#include linux/slab.h +#include linux/module.h /* Standard module stuff */ +#include linux/device.h /* struct device, dev_dbg() */ +#include linux/types.h /* Just to be safe ;) */ +#include linux/uaccess.h /* __copy_{to,from}_user */ +#include linux/miscdevice.h /* misc_register() and company */ + +#include linux/cma.h + +#include plat/cma-stub.h + +static int cma_file_open(struct inode *inode, struct file *file); +static int cma_file_release(struct inode *inode, struct file *file); +static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg); +static int cma_file_mmap(struct file *file, struct vm_area_struct *vma); + +static struct miscdevice cma_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = cma, + .fops = (const struct file_operations) { + .owner = THIS_MODULE, + .open = cma_file_open, + .release= cma_file_release, + .unlocked_ioctl = cma_file_ioctl, + .mmap = cma_file_mmap, + }, +}; +#define cma_dev (cma_miscdev.this_device) + +struct cma_private_data { + struct cm *cm; + unsigned long size; + unsigned long phys; +}; + +static int cma_file_open(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (!cma_ctx) + return -EOPNOTSUPP; + + prv = kzalloc(sizeof *prv, GFP_KERNEL); + if (!prv) + return -ENOMEM; + + file-private_data = prv; + + return 0; +} + +static int cma_file_release(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv = file-private_data; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (prv-cm
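From user space the 238-line driver above reduces to an open/ioctl/mmap triple. The sketch below shows the expected calling sequence; the ioctl request macro is spelled IOCTL_CMA_ALLOC purely as a placeholder, since the include/linux/cma.h hunk defining it is truncated in this posting (only CMA_MAGIC and the magic/size fields of struct cma_alloc_request are visible in the driver code). The real client is tools/cma/cma-test.c from this patch.

/* Hypothetical user-space client of /dev/cma. */
#include <fcntl.h>
#include <linux/cma.h>		/* struct cma_alloc_request, CMA_MAGIC */
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	struct cma_alloc_request req = {
		.magic = CMA_MAGIC,
		.size  = 1 << 20,	/* 1 MiB */
	};
	void *buf;
	int fd = open("/dev/cma", O_RDWR);

	if (fd < 0)
		return 1;
	if (ioctl(fd, IOCTL_CMA_ALLOC, &req) < 0)	/* placeholder cmd name */
		return 1;
	/* the driver's mmap handler exposes the chunk allocated above */
	buf = mmap(NULL, req.size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		return 1;
	/* ... exercise the contiguous buffer ... */
	munmap(buf, req.size);
	close(fd);	/* the release path unpins and frees the chunk */
	return 0;
}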
[PATCHv8 08/12] mm: MIGRATE_CMA migration type added
The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks and (ii) the page allocator will never change the migration type of MIGRATE_CMA pageblocks. This guarantees that a page in a MIGRATE_CMA pageblock can always be migrated somewhere else (unless there's no memory left in the system). It is designed to be used with the Contiguous Memory Allocator (CMA) for allocating big chunks (eg. 10MiB) of physically contiguous memory. Once a driver requests contiguous memory, CMA will migrate pages from MIGRATE_CMA pageblocks. To minimise the number of migrations, the MIGRATE_CMA migration type is the last type tried when the page allocator falls back to migration types other than the requested one. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/mmzone.h | 43 mm/Kconfig | 14 mm/compaction.c| 10 + mm/internal.h |3 ++ mm/page_alloc.c| 87 +-- 5 files changed, 131 insertions(+), 26 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 39c24eb..cc798b1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -35,13 +35,37 @@ */ #define PAGE_ALLOC_COSTLY_ORDER 3 -#define MIGRATE_UNMOVABLE 0 -#define MIGRATE_RECLAIMABLE 1 -#define MIGRATE_MOVABLE 2 -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */ -#define MIGRATE_RESERVE 3 -#define MIGRATE_ISOLATE 4 /* can't allocate from here */ -#define MIGRATE_TYPES 5 +enum { + MIGRATE_UNMOVABLE, + MIGRATE_RECLAIMABLE, + MIGRATE_MOVABLE, + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ + MIGRATE_RESERVE = MIGRATE_PCPTYPES, +#ifdef CONFIG_MIGRATE_CMA + /* +* MIGRATE_CMA migration type is designed to mimic the way +* ZONE_MOVABLE works. Only movable pages can be allocated +* from MIGRATE_CMA pageblocks and the page allocator never +* implicitly changes the migration type of a MIGRATE_CMA pageblock. +* +* The way to use it is to change the migratetype of a range of +* pageblocks to MIGRATE_CMA which can be done by the +* __free_pageblock_cma() function. What is important though +* is that a range of pageblocks must be aligned to +* MAX_ORDER_NR_PAGES should the biggest page be bigger than +* a single pageblock. +*/ + MIGRATE_CMA, +#endif + MIGRATE_ISOLATE,/* can't allocate from here */ + MIGRATE_TYPES +}; + +#ifdef CONFIG_MIGRATE_CMA +# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) +#else +# define is_migrate_cma(migratetype) false +#endif #define for_each_migratetype_order(order, type) \ for (order = 0; order < MAX_ORDER; order++) \ @@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page) return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end); } +static inline bool is_pageblock_cma(struct page *page) +{ + return is_migrate_cma(get_pageblock_migratetype(page)); +} + struct free_area { struct list_headfree_list[MIGRATE_TYPES]; unsigned long nr_free; diff --git a/mm/Kconfig b/mm/Kconfig index 2beab4d..32fb085 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -343,6 +343,20 @@ config CMA For more information see include/linux/cma.h. If unsure, say n. +config MIGRATE_CMA + bool "Use MIGRATE_CMA migratetype" + depends on CMA + default y + help + This enables the use of the MIGRATE_CMA migrate type in the CMA. + MIGRATE_CMA lets CMA work on almost arbitrary memory range and + not only inside ZONE_MOVABLE. + + This option can also be selected by code that uses MIGRATE_CMA + even if CMA is not present. + + If unsure, say y.
+ config CMA_DEBUG bool CMA debug messages (DEVELOPEMENT) depends on CMA diff --git a/mm/compaction.c b/mm/compaction.c index 4d709ee..c5e404b 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page) if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE) return false; + /* Keep MIGRATE_CMA alone as well. */ + /* +* XXX Revisit. We currently cannot let compaction touch CMA +* pages since compaction insists on changing their migration +* type to MIGRATE_MOVABLE (see split_free_page() called from +* isolate_freepages_block() above). +*/ + if (is_migrate_cma(migratetype)) + return false; + /* If the page is a large free page, then allow migration */ if (PageBuddy(page) page_order(page) = pageblock_order) return true; diff --git a/mm
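The compaction hunk above is the crux of the guarantee: nothing may silently remove the MIGRATE_CMA marking. Restated as a stand-alone predicate (the wrapper function is illustrative, not code from the patch):

#include <linux/mmzone.h>

/* Would handing out this pageblock's free pages as a migration
 * target be safe?  Mirrors the suitable_migration_target() logic
 * added above; the wrapper itself is an example. */
static bool example_suitable_target(struct page *page)
{
	int migratetype = get_pageblock_migratetype(page);

	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
		return false;

	/* split_free_page() would remark the block MIGRATE_MOVABLE and
	 * silently drop the CMA guarantee, so CMA blocks stay out. */
	if (is_migrate_cma(migratetype))
		return false;

	return true;
}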
[PATCHv8 09/12] mm: MIGRATE_CMA isolation functions added
This commit changes various functions that change pages and pageblocks migrate type between MIGRATE_ISOLATE and MIGRATE_MOVABLE in such a way as to allow to work with MIGRATE_CMA migrate type. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/page-isolation.h | 40 +++- mm/page_alloc.c| 19 --- mm/page_isolation.c| 15 --- 3 files changed, 47 insertions(+), 27 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index c5d1a7c..177b307 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -3,39 +3,53 @@ /* * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE. - * If specified range includes migrate types other than MOVABLE, + * If specified range includes migrate types other than MOVABLE or CMA, * this will fail with -EBUSY. * * For isolating all pages in the range finally, the caller have to * free all pages in the range. test_page_isolated() can be used for * test it. */ -extern int -start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype); + +static inline int +start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +} + +int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype); /* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. * target range is [start_pfn, end_pfn) */ -extern int -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +static inline int +undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +} /* - * test all pages in [start_pfn, end_pfn)are isolated or not. + * Test all pages in [start_pfn, end_pfn) are isolated or not. */ -extern int -test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); /* - * Internal funcs.Changes pageblock's migrate type. - * Please use make_pagetype_isolated()/make_pagetype_movable(). + * Internal functions. Changes pageblock's migrate type. 
*/ -extern int set_migratetype_isolate(struct page *page); -extern void unset_migratetype_isolate(struct page *page); +int set_migratetype_isolate(struct page *page); +void __unset_migratetype_isolate(struct page *page, unsigned migratetype); +static inline void unset_migratetype_isolate(struct page *page) +{ + __unset_migratetype_isolate(page, MIGRATE_MOVABLE); +} extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); extern int alloc_contig_range(unsigned long start, unsigned long end, - gfp_t flags); + gfp_t flags, unsigned migratetype); extern void free_contig_pages(struct page *page, int nr_pages); /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e706282..7f913d1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5460,7 +5460,7 @@ out: return ret; } -void unset_migratetype_isolate(struct page *page) +void __unset_migratetype_isolate(struct page *page, unsigned migratetype) { struct zone *zone; unsigned long flags; @@ -5468,8 +5468,8 @@ void unset_migratetype_isolate(struct page *page) spin_lock_irqsave(&zone->lock, flags); if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) goto out; - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - move_freepages_block(zone, page, MIGRATE_MOVABLE); + set_pageblock_migratetype(page, migratetype); + move_freepages_block(zone, page, migratetype); out: spin_unlock_irqrestore(&zone->lock, flags); } @@ -5574,6 +5574,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * @start: start PFN to allocate * @end: one-past-the-last PFN to allocate * @flags: flags passed to alloc_contig_freed_pages(). + * @migratetype: migratetype of the underlying pageblocks (either + * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks + * in range must have the same migratetype and it must + * be either of the two. * * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES * aligned, however it is the caller's responsibility to guarantee that we @@ -5585,7 +5589,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * need to be freed with free_contig_pages(). */ int
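Taken together, the renamed entry points compose as follows for a CMA-backed range. This is a sketch; alloc_contig_range() from patch 06/12, extended here with a migratetype argument, is the real caller and performs the migration step in the middle.

#include <linux/page-isolation.h>

/* Illustrative composition of the migratetype-aware isolation API.
 * start_pfn/end_pfn must cover whole pageblocks. */
static int example_isolate_cma(unsigned long start_pfn, unsigned long end_pfn)
{
	int ret;

	/* fails with -EBUSY unless blocks are MOVABLE or CMA */
	ret = __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
	if (ret)
		return ret;

	/* ... migrate everything out, drain pcp lists ... */

	if (test_pages_isolated(start_pfn, end_pfn))
		ret = -EBUSY;	/* something still holds a page */

	/* restore MIGRATE_CMA rather than unconditionally MOVABLE */
	__undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
	return ret;
}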
[PATCHv8 06/12] mm: alloc_contig_range() added
This commit adds the alloc_contig_range() function which tries to allocate a given range of pages. It tries to migrate all already allocated pages that fall in the range, thus freeing them. Once all pages in the range are freed they are removed from the buddy system and thus allocated for the caller to use. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/page-isolation.h |2 + mm/page_alloc.c| 144 2 files changed, 146 insertions(+), 0 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index f1417ed..c5d1a7c 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); +extern int alloc_contig_range(unsigned long start, unsigned long end, + gfp_t flags); extern void free_contig_pages(struct page *page, int nr_pages); /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index be240a3..008a6e8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5463,6 +5463,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, return pfn; } +static unsigned long pfn_to_maxpage(unsigned long pfn) +{ + return pfn & ~(MAX_ORDER_NR_PAGES - 1); +} + +static unsigned long pfn_to_maxpage_up(unsigned long pfn) +{ + return ALIGN(pfn, MAX_ORDER_NR_PAGES); +} + +#define MIGRATION_RETRY 5 +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) +{ + int migration_failed = 0, ret; + unsigned long pfn = start; + + /* +* Some code borrowed from KAMEZAWA Hiroyuki's +* __alloc_contig_pages(). +*/ + + for (;;) { + pfn = scan_lru_pages(pfn, end); + if (!pfn || pfn >= end) + break; + + ret = do_migrate_range(pfn, end); + if (!ret) { + migration_failed = 0; + } else if (ret != -EBUSY + || ++migration_failed >= MIGRATION_RETRY) { + return ret; + } else { + /* There are unstable pages on pagevec. */ + lru_add_drain_all(); + /* +* there may be pages on pcplist before +* we mark the range as ISOLATED. +*/ + drain_all_pages(); + } + cond_resched(); + } + + if (!migration_failed) { + /* drop all pages in pagevec and pcp list */ + lru_add_drain_all(); + drain_all_pages(); + } + + /* Make sure all pages are isolated */ + if (WARN_ON(test_pages_isolated(start, end))) + return -EBUSY; + + return 0; +} + +/** + * alloc_contig_range() -- tries to allocate given range of pages + * @start: start PFN to allocate + * @end: one-past-the-last PFN to allocate + * @flags: flags passed to alloc_contig_freed_pages(). + * + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES + * aligned, however it is the caller's responsibility to guarantee that we + * are the only thread that changes migrate type of pageblocks the + * pages fall in. + * + * Returns zero on success or negative error code. On success all + * pages whose PFN is in [start, end) are allocated for the caller and + * need to be freed with free_contig_pages(). + */ +int alloc_contig_range(unsigned long start, unsigned long end, + gfp_t flags) +{ + unsigned long _start, _end; + int ret; + + /* +* What we do here is we mark all pageblocks in range as +* MIGRATE_ISOLATE.
Because of the way the page allocator works, we +* align the range to MAX_ORDER pages so that the page allocator +* won't try to merge buddies from different pageblocks and +* change MIGRATE_ISOLATE to some other migration type. +* +* Once the pageblocks are marked as MIGRATE_ISOLATE, we +* migrate the pages from an unaligned range (ie. pages that +* we are interested in). This will put all the pages in +* range back to page allocator as MIGRATE_ISOLATE. +* +* When this is done, we take the pages in range from page +* allocator removing them from the buddy system. This way +* page allocator will never consider using them. +* +* This lets us mark the pageblocks back as +* MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the +* MAX_ORDER aligned range
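From the caller's perspective the whole dance collapses to one pair of calls. A hypothetical user is sketched below (the wrapper function is not part of the patch; flags = 0 mirrors the 0 passed to alloc_contig_freed_pages() elsewhere in this series):

#include <linux/mm.h>
#include <linux/page-isolation.h>

/* Hypothetical caller: allocate nr_pages contiguous pages starting at
 * start_pfn, then give them back.  The caller must be the only thread
 * changing the migratetype of the affected pageblocks. */
static int example_alloc_contig(unsigned long start_pfn, int nr_pages)
{
	int ret = alloc_contig_range(start_pfn, start_pfn + nr_pages, 0);

	if (ret)
		return ret;	/* typically -EBUSY when migration failed */

	/* PFNs [start_pfn, start_pfn + nr_pages) now belong to us */

	free_contig_pages(pfn_to_page(start_pfn), nr_pages);
	return 0;
}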
[PATCHv8 03/12] lib: genalloc: Generic allocator improvements
This commit adds a gen_pool_alloc_aligned() function to the generic allocator API. It allows specifying alignment for the allocated block. This feature uses the bitmap_find_next_zero_area_off() function. It also fixes a possible issue with the bitmap's last element not being fully allocated (ie. space allocated for chunk->bits is not a multiple of sizeof(long)). It also makes some other smaller changes: - moves structure definitions out of the header file, - adds __must_check to functions returning value, - makes gen_pool_add() return -ENOMEM rather than -1 on error, - changes list_for_each to list_for_each_entry, and - makes use of bitmap_clear(). Signed-off-by: Michal Nazarewicz min...@mina86.com --- include/linux/genalloc.h | 46 ++-- lib/genalloc.c | 182 ++--- 2 files changed, 129 insertions(+), 99 deletions(-) diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 9869ef3..8ac7337 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -8,29 +8,31 @@ * Version 2. See the file COPYING for more details. */ +struct gen_pool; -/* - * General purpose special memory pool descriptor. - */ -struct gen_pool { - rwlock_t lock; - struct list_head chunks;/* list of chunks in this pool */ - int min_alloc_order;/* minimum allocation order */ -}; +struct gen_pool *__must_check gen_pool_create(unsigned order, int nid); -/* - * General purpose special memory pool chunk descriptor. +int __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr, + size_t size, int nid); + +void gen_pool_destroy(struct gen_pool *pool); + +unsigned long __must_check +gen_pool_alloc_aligned(struct gen_pool *pool, size_t size, + unsigned alignment_order); + +/** + * gen_pool_alloc() - allocate special memory from the pool + * @pool: Pool to allocate from. + * @size: Number of bytes to allocate from the pool. + * + * Allocate the requested number of bytes from the specified pool. + * Uses a first-fit algorithm. */ -struct gen_pool_chunk { - spinlock_t lock; - struct list_head next_chunk;/* next chunk in pool */ - unsigned long start_addr; /* starting address of memory chunk */ - unsigned long end_addr; /* ending address of memory chunk */ - unsigned long bits[0]; /* bitmap for allocating memory chunk */ -}; +static inline unsigned long __must_check +gen_pool_alloc(struct gen_pool *pool, size_t size) +{ + return gen_pool_alloc_aligned(pool, size, 0); +} -extern struct gen_pool *gen_pool_create(int, int); -extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int); -extern void gen_pool_destroy(struct gen_pool *); -extern unsigned long gen_pool_alloc(struct gen_pool *, size_t); -extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); +void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size); diff --git a/lib/genalloc.c b/lib/genalloc.c index 1923f14..0761079 100644 --- a/lib/genalloc.c +++ b/lib/genalloc.c @@ -16,53 +16,80 @@ #include <linux/genalloc.h> +/* General purpose special memory pool descriptor. */ +struct gen_pool { + rwlock_t lock; /* protects chunks list */ + struct list_head chunks;/* list of chunks in this pool */ + unsigned order; /* minimum allocation order */ +};

+/* General purpose special memory pool chunk descriptor.
*/ +struct gen_pool_chunk { + spinlock_t lock;/* protects bits */ + struct list_head next_chunk;/* next chunk in pool */ + unsigned long start;/* start of memory chunk */ + unsigned long size; /* number of bits */ + unsigned long bits[0]; /* bitmap for allocating memory chunk */ +}; + + /** - * gen_pool_create - create a new special memory pool - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents - * @nid: node id of the node the pool structure should be allocated on, or -1 + * gen_pool_create() - create a new special memory pool + * @order: Log base 2 of number of bytes each bitmap bit + * represents. + * @nid: Node id of the node the pool structure should be allocated + * on, or -1. This will also be used for other allocations. * * Create a new special memory pool that can be used to manage special purpose * memory not managed by the regular kmalloc/kfree interface. */ -struct gen_pool *gen_pool_create(int min_alloc_order, int nid) +struct gen_pool *__must_check gen_pool_create(unsigned order, int nid) { struct gen_pool *pool; - pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid); - if (pool != NULL) { + if (WARN_ON(order >= BITS_PER_LONG)) + return NULL; + + pool = kmalloc_node(sizeof *pool, GFP_KERNEL, nid
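The reworked interface, used end to end. All calls below are the ones declared in this patch; only the chunk address and sizes are made up for the example.

#include <linux/genalloc.h>

/* Example use of the reworked API: a pool with 8-byte granularity
 * (order 3) backed by a single chunk at a made-up address. */
static int example_genalloc(void)
{
	struct gen_pool *pool = gen_pool_create(3 /* order */, -1 /* any node */);
	unsigned long addr;

	if (!pool)
		return -ENOMEM;

	/* now returns -ENOMEM instead of -1 on failure */
	if (gen_pool_add(pool, 0x40000000, 1 << 20, -1)) {
		gen_pool_destroy(pool);
		return -ENOMEM;
	}

	/* 256 bytes whose start address is 64-byte aligned (order 6) */
	addr = gen_pool_alloc_aligned(pool, 256, 6);
	if (addr)
		gen_pool_free(pool, addr, 256);

	gen_pool_destroy(pool);
	return 0;
}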
[PATCHv8 00/12] Contiguous Memory Allocator
Hello everyone, This is yet another version of CMA, this time stripped of a lot of code and with a working migration implementation. The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted. For more information see the 7th patch in the set. This version fixes some things Kamezawa suggested, plus it separates code that uses MIGRATE_CMA from the rest of the code. This, I hope, will help in grasping the overall idea of CMA. The current version is just an allocator that handles allocation of contiguous memory blocks. The differences between this patchset and Kamezawa's alloc_contig_pages() are: 1. alloc_contig_pages() requires MAX_ORDER alignment of allocations which may be unsuitable for embedded systems where a few MiBs are required. Lack of the requirement on the alignment means that several threads might try to access the same pageblock/page. To prevent this from happening CMA uses a mutex so that only one cm_alloc()/cm_free() function may run at a time. 2. CMA may use its own migratetype (MIGRATE_CMA) which behaves similarly to ZONE_MOVABLE but can be put in arbitrary places. This is required for us since we need to define two disjoint memory ranges inside system RAM (ie. in two memory banks (do not confuse with nodes)). 3. alloc_contig_pages() scans memory in search of a range that could be migrated. CMA on the other hand maintains its own allocator to decide where to allocate memory for device drivers and then tries to migrate pages from that part if needed. This is not strictly required but I somehow feel it might be faster. Links to previous versions of the patchset: v7: http://article.gmane.org/gmane.linux.kernel.mm/55626 v6: http://article.gmane.org/gmane.linux.kernel.mm/55626 v5: (intentionally left out as CMA v5 was identical to CMA v4) v4: http://article.gmane.org/gmane.linux.kernel.mm/52010 v3: http://article.gmane.org/gmane.linux.kernel.mm/51573 v2: http://article.gmane.org/gmane.linux.kernel.mm/50986 v1: http://article.gmane.org/gmane.linux.kernel.mm/50669 Changelog: v8: 1. The alloc_contig_range() function has now been separated from CMA and put in page_allocator.c. This function tries to migrate all LRU pages in the specified range and then allocate the range using alloc_contig_freed_pages(). 2. Support for MIGRATE_CMA has been separated from the CMA code. I have not tested if CMA works with ZONE_MOVABLE but I see no reason why it shouldn't. 3. I have added a @private argument when creating CMA contexts so that one can reserve memory and not share it with the rest of the system. This way, CMA acts only as an allocation algorithm. v7: 1. A lot of functionality that handled driver-allocator_context mapping has been removed from the patchset. This is not to say that this code is not needed, it's just not worth posting everything in one patchset. Currently, CMA is just an allocator. It uses its own migratetype (MIGRATE_CMA) for defining ranges of pageblocks which behave just like ZONE_MOVABLE but unlike the latter can be put in arbitrary places. 2. The migration code that was introduced in the previous version actually started working. v6: 1. Most importantly, v6 introduces support for memory migration. The implementation is not yet complete though. Migration support means that when CMA is not using memory reserved for it, the page allocator can allocate pages from it. When CMA wants to use the memory, the pages have to be moved and/or evicted so as to make room for CMA.
To make this possible it must be guaranteed that only movable and reclaimable pages are allocated in CMA controlled regions. This is done by introducing a MIGRATE_CMA migrate type that guarantees exactly that. Some of the migration code is borrowed from Kamezawa Hiroyuki's alloc_contig_pages() implementation. The main difference is that thanks to the MIGRATE_CMA migrate type CMA assumes that memory controlled by CMA is always movable or reclaimable so that it makes allocation decisions regardless of whether some pages are actually allocated and migrates them if needed. The most interesting patches from the patchset that implement the functionality are: 09/13: mm: alloc_contig_free_pages() added 10/13: mm: MIGRATE_CMA migration type added 11/13: mm: MIGRATE_CMA isolation functions added 12/13: mm: cma: Migration support added [wip] Currently, the kernel panics in some situations, which I am trying to investigate. 2. cma_pin() and cma_unpin() functions have been added (after a conversation with Johan Mossberg). The idea is that whenever hardware does not use the memory (no transaction is on) the chunk can be moved around.
Re: [PATCHv7 06/10] mm: MIGRATE_CMA migration type added
On Mon, 13 Dec 2010 12:26:47 +0100 Michal Nazarewicz m.nazarew...@samsung.com wrote: diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h @@ -35,13 +35,24 @@ */ #define PAGE_ALLOC_COSTLY_ORDER 3 -#define MIGRATE_UNMOVABLE 0 -#define MIGRATE_RECLAIMABLE 1 -#define MIGRATE_MOVABLE 2 -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */ -#define MIGRATE_RESERVE 3 -#define MIGRATE_ISOLATE 4 /* can't allocate from here */ -#define MIGRATE_TYPES 5 +enum { +MIGRATE_UNMOVABLE, +MIGRATE_RECLAIMABLE, +MIGRATE_MOVABLE, +MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ +MIGRATE_RESERVE = MIGRATE_PCPTYPES, +MIGRATE_ISOLATE,/* can't allocate from here */ +#ifdef CONFIG_MIGRATE_CMA +MIGRATE_CMA,/* only movable */ +#endif +MIGRATE_TYPES +}; KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com writes: I personaly would like to put MIGRATE_ISOLATE to be bottom of enum because it means _not_for_allocation. Will change. I didn't want to change the value of MIGRATE_ISOLATE in fear of breaking something but hopefully no one depends on MIGRATE_ISOLATE's value. diff --git a/mm/compaction.c b/mm/compaction.c @@ -824,11 +848,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, * This array describes the order lists are fallen back to when * the free lists for the desirable migrate type are depleted */ -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = { +static int fallbacks[MIGRATE_TYPES][4] = { [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, +#ifdef CONFIG_MIGRATE_CMA +[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA, MIGRATE_RESERVE }, +#else [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, -[MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */ +#endif +[MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */ }; /* @@ -924,12 +952,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) /* Find the largest possible block of pages in the other list */ for (current_order = MAX_ORDER-1; current_order = order; --current_order) { -for (i = 0; i MIGRATE_TYPES - 1; i++) { +for (i = 0; i ARRAY_SIZE(fallbacks[0]); i++) { Why fallbacks[0] ? and why do you need to change this ? I've changed the dimensions of fallbacks matrix, in particular second dimension from MIGRATE_TYPE - 1 to 4 so this place needed to be changed as well. Now, I think changing to ARRAY_SIZE() is just the safest option available. This is actually just a minor optimisation. migratetype = fallbacks[start_migratetype][i]; /* MIGRATE_RESERVE handled later if necessary */ if (migratetype == MIGRATE_RESERVE) -continue; +break; Isn't this change enough for your purpose ? This is mostly just an optimisation really. I'm not sure what you think is my purpose here. ;) It does fix an this issue of some of the fallback[*] arrays having MIGRATETYPE_UNMOVABLE at the end. @@ -1042,7 +1083,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add(page-lru, list); else list_add_tail(page-lru, list); -set_page_private(page, migratetype); +#ifdef CONFIG_MIGRATE_CMA +if (is_pageblock_cma(page)) +set_page_private(page, MIGRATE_CMA); +else +#endif +set_page_private(page, migratetype); Hmm, doesn't this meet your changes which makes MIGRATE_CMA MIGRATE_PCPLIST ? And I think putting mixture of pages marked of MIGRATE_TYPE onto a pcplist is ugly. 
You mean that the pcplist a page is on can disagree with page_private()? I didn't think this was such a big deal honestly. Unless MIGRATE_CMA < MIGRATE_PCPTYPES, a special case is needed either here or in free_pcppages_bulk(), so I think this comes down to whether to make MIGRATE_CMA < MIGRATE_PCPTYPES, for which see below. @@ -1181,9 +1227,16 @@ void free_hot_cold_page(struct page *page, int cold) * offlined but treat RESERVE as movable pages so we can get those * areas back if necessary. Otherwise, we may have to free * excessively into the page allocator + * + * Still, do not change the migration type of MIGRATE_CMA pages (if + * they'd be recorded as MIGRATE_MOVABLE an unmovable page could + * be allocated from a MIGRATE_CMA block and we don't want to allow + * that). In this respect, treat
Re: [PATCHv7 08/10] mm: cma: Contiguous Memory Allocator added
On Mon, 13 Dec 2010 12:26:49 +0100 Michal Nazarewicz m.nazarew...@samsung.com wrote: +/* Initialise CMA */ + +static struct cma_grabbed { +unsigned long start; +unsigned long size; +} cma_grabbed[8] __initdata; +static unsigned cma_grabbed_count __initdata; + +int cma_init(unsigned long start, unsigned long size) +{ +pr_debug(%s(%p+%p)\n, __func__, (void *)start, (void *)size); + +if (!size) +return -EINVAL; +if ((start | size) ((MAX_ORDER_NR_PAGES PAGE_SHIFT) - 1)) +return -EINVAL; +if (start + size start) +return -EOVERFLOW; + +if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed)) +return -ENOSPC; + +cma_grabbed[cma_grabbed_count].start = start; +cma_grabbed[cma_grabbed_count].size = size; +++cma_grabbed_count; +return 0; +} + KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com writes: Is it guaranteed that there are no memory holes, or zone overlap in the range ? I think correctness of the range must be checked. I keep thinking about it myself. The idea is that you get memory range reserved using memblock (or some such) thus it should not contain any memory holes. I'm not entirely sure about spanning different zones. I'll add the checking code. +#define MIGRATION_RETRY 5 +static int __cm_migrate(unsigned long start, unsigned long end) +{ [...] +} + +static int __cm_alloc(unsigned long start, unsigned long size) +{ +unsigned long end, _start, _end; +int ret; + [...] + +start = phys_to_pfn(start); +end = start + (size PAGE_SHIFT); + +pr_debug(\tisolate range(%lx, %lx)\n, + pfn_to_maxpage(start), pfn_to_maxpage_up(end)); +ret = __start_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end), MIGRATE_CMA); +if (ret) +goto done; + +pr_debug(\tmigrate range(%lx, %lx)\n, start, end); +ret = __cm_migrate(start, end); +if (ret) +goto done; + [...] + +pr_debug(\tfinding buddy\n); +ret = 0; +while (!PageBuddy(pfn_to_page(start (~0UL ret +if (WARN_ON(++ret = MAX_ORDER)) +return -EINVAL; + +_start = start (~0UL ret); +pr_debug(\talloc freed(%lx, %lx)\n, _start, end); +_end = alloc_contig_freed_pages(_start, end, 0); + +/* Free head and tail (if any) */ +pr_debug(\tfree contig(%lx, %lx)\n, _start, start); +free_contig_pages(pfn_to_page(_start), start - _start); +pr_debug(\tfree contig(%lx, %lx)\n, end, _end); +free_contig_pages(pfn_to_page(end), _end - end); + +ret = 0; + +done: +pr_debug(\tundo isolate range(%lx, %lx)\n, + pfn_to_maxpage(start), pfn_to_maxpage_up(end)); +__undo_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end), MIGRATE_CMA); + +pr_debug(ret = %d\n, ret); +return ret; +} + +static void __cm_free(unsigned long start, unsigned long size) +{ +pr_debug(%s(%p+%p)\n, __func__, (void *)start, (void *)size); + +free_contig_pages(pfn_to_page(phys_to_pfn(start)), + size PAGE_SHIFT); +} Hmm, it seems __cm_alloc() and __cm_migrate() has no special codes for CMA. I'd like reuse this for my own contig page allocator. So, could you make these function be more generic (name) ? as __alloc_range(start, size, mirate_type); Then, what I have to do is only to add search range functions. Sure thing. I'll post it tomorrow or Friday. How about alloc_contig_range() maybe? -- Pozdrawiam_ _ .o. | Wasal Jasnie Oswieconej Pani Informatyki o' \,=./ `o ..o | Michal mina86 Nazarewicz mina86*tlen.pl(o o) ooo +---jid:mina86-jabber.org---tlen:mina86---ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv7 01/10] mm: migrate.c: fix compilation error
GCC complained about update_mmu_cache() not being defined in migrate.c. Including asm/tlbflush.h seems to solve the problem. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- mm/migrate.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index fe5a3c6..6ae8a66 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -35,6 +35,8 @@ #include <linux/hugetlb.h> #include <linux/gfp.h> +#include <asm/tlbflush.h> + #include "internal.h" #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru)) -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv7 00/10] Contiguous Memory Allocator
Hello everyone, This is yet another version of CMA, this time stripped of a lot of code and with a working migration implementation. The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted. For more information see the 8th patch in the set. The current version is just an allocator that handles allocation of contiguous memory blocks. The differences between this patchset and Kamezawa's alloc_contig_pages() are: 1. alloc_contig_pages() requires MAX_ORDER alignment of allocations which may be unsuitable for embedded systems where a few MiBs are required. 2. CMA uses its own migratetype (MIGRATE_CMA) as was suggested during one of the previous iterations. The migrate type behaves similarly to ZONE_MOVABLE but can be put in arbitrary places. This is required for us since we need to define two disjoint memory ranges inside system's RAM (ie. in two memory banks (do not confuse with nodes)). 3. alloc_contig_pages() scans memory in search of a range that could be migrated. CMA on the other hand maintains its own allocator to decide where to allocate memory for device drivers and then tries to migrate pages from that part if needed. This is not strictly required but I somehow feel it might be faster. Links to previous versions of the patchset: v6: http://article.gmane.org/gmane.linux.kernel.mm/55626/ v5: (intentionally left out as CMA v5 was identical to CMA v4) v4: http://article.gmane.org/gmane.linux.kernel.mm/52010/ v3: http://article.gmane.org/gmane.linux.kernel.mm/51573/ v2: http://article.gmane.org/gmane.linux.kernel.mm/50986/ v1: http://article.gmane.org/gmane.linux.kernel.mm/50669/ Changelog: v7: 1. A lot of functionality that handled driver-allocator_context mapping has been removed from the patchset. This is not to say that this code is not needed, it's just not worth posting everything in one patchset. Currently, CMA is just an allocator. It uses its own migratetype (MIGRATE_CMA) for defining ranges of pageblocks which behave just like ZONE_MOVABLE but unlike the latter can be put in arbitrary places. 2. The migration code that was introduced in the previous version actually started working. v6: 1. Most importantly, v6 introduces support for memory migration. The implementation is not yet complete though. Migration support means that when CMA is not using memory reserved for it, the page allocator can allocate pages from it. When CMA wants to use the memory, the pages have to be moved and/or evicted so as to make room for CMA. To make this possible it must be guaranteed that only movable and reclaimable pages are allocated in CMA controlled regions. This is done by introducing a MIGRATE_CMA migrate type that guarantees exactly that. Some of the migration code is borrowed from Kamezawa Hiroyuki's alloc_contig_pages() implementation. The main difference is that thanks to the MIGRATE_CMA migrate type CMA assumes that memory controlled by CMA is always movable or reclaimable so that it makes allocation decisions regardless of whether some pages are actually allocated and migrates them if needed. The most interesting patches from the patchset that implement the functionality are: 09/13: mm: alloc_contig_free_pages() added 10/13: mm: MIGRATE_CMA migration type added 11/13: mm: MIGRATE_CMA isolation functions added 12/13: mm: cma: Migration support added [wip] Currently, the kernel panics in some situations, which I am trying to investigate. 2.
cma_pin() and cma_unpin() functions have been added (after a conversation with Johan Mossberg). The idea is that whenever hardware does not use the memory (no transaction is on) the chunk can be moved around. This would allow defragmentation to be implemented if desired. No defragmentation algorithm is provided at this time. 3. Sysfs support has been replaced with debugfs. I always felt unsure about the sysfs interface and when Greg KH pointed it out I finally got to rewrite it to debugfs. v5: (intentionally left out as CMA v5 was identical to CMA v4) v4: 1. The asterisk flag has been removed in favour of requiring that the platform provide a *=regions rule in the map attribute. 2. The terminology has been changed slightly, renaming "kind" to "type" of memory. In the previous revisions, the documentation indicated that device drivers define memory kinds and now, v3: 1. The command line parameters have been removed (and moved to a separate patch, the fourth one). As a consequence, the cma_set_defaults() function has been changed -- it no longer accepts a string with a list of regions but an
[PATCHv7 10/10] ARM: cma: Added CMA to Aquila, Goni and c210 universal boards
This commit adds CMA memory reservation code to Aquila, Goni and c210 universal boards. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- arch/arm/mach-s5pv210/mach-aquila.c |2 + arch/arm/mach-s5pv210/mach-goni.c |2 + arch/arm/mach-s5pv310/mach-universal_c210.c |2 + arch/arm/plat-s5p/Makefile |2 + arch/arm/plat-s5p/cma-stub.c| 49 +++ arch/arm/plat-s5p/include/plat/cma-stub.h | 21 +++ 6 files changed, 78 insertions(+), 0 deletions(-) create mode 100644 arch/arm/plat-s5p/cma-stub.c create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c index 28677ca..8608a16 100644 --- a/arch/arm/mach-s5pv210/mach-aquila.c +++ b/arch/arm/mach-s5pv210/mach-aquila.c @@ -39,6 +39,7 @@ #include plat/fb.h #include plat/fimc-core.h #include plat/sdhci.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define AQUILA_UCON_DEFAULT(S3C2410_UCON_TXILEVEL |\ @@ -690,4 +691,5 @@ MACHINE_START(AQUILA, Aquila) .map_io = aquila_map_io, .init_machine = aquila_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c index b1dcf96..b1bf079 100644 --- a/arch/arm/mach-s5pv210/mach-goni.c +++ b/arch/arm/mach-s5pv210/mach-goni.c @@ -45,6 +45,7 @@ #include plat/keypad.h #include plat/sdhci.h #include plat/clock.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define GONI_UCON_DEFAULT (S3C2410_UCON_TXILEVEL |\ @@ -865,4 +866,5 @@ MACHINE_START(GONI, GONI) .map_io = goni_map_io, .init_machine = goni_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv310/mach-universal_c210.c b/arch/arm/mach-s5pv310/mach-universal_c210.c index 16d8fc0..d65703a 100644 --- a/arch/arm/mach-s5pv310/mach-universal_c210.c +++ b/arch/arm/mach-s5pv310/mach-universal_c210.c @@ -21,6 +21,7 @@ #include plat/s5pv310.h #include plat/cpu.h #include plat/devs.h +#include plat/cma-stub.h #include mach/map.h @@ -152,6 +153,7 @@ MACHINE_START(UNIVERSAL_C210, UNIVERSAL_C210) .boot_params= S5P_PA_SDRAM + 0x100, .init_irq = s5pv310_init_irq, .map_io = universal_map_io, + .reserve= cma_mach_reserve, .init_machine = universal_machine_init, .timer = s5pv310_timer, MACHINE_END diff --git a/arch/arm/plat-s5p/Makefile b/arch/arm/plat-s5p/Makefile index de65238..6fdb6ce 100644 --- a/arch/arm/plat-s5p/Makefile +++ b/arch/arm/plat-s5p/Makefile @@ -28,3 +28,5 @@ obj-$(CONFIG_S5P_DEV_FIMC0) += dev-fimc0.o obj-$(CONFIG_S5P_DEV_FIMC1)+= dev-fimc1.o obj-$(CONFIG_S5P_DEV_FIMC2)+= dev-fimc2.o obj-$(CONFIG_S5P_DEV_ONENAND) += dev-onenand.o + +obj-$(CONFIG_CMA) += cma-stub.o diff --git a/arch/arm/plat-s5p/cma-stub.c b/arch/arm/plat-s5p/cma-stub.c new file mode 100644 index 000..716e56d --- /dev/null +++ b/arch/arm/plat-s5p/cma-stub.c @@ -0,0 +1,49 @@ +/* + * This file is just a quick and dirty hack to get CMA testing device + * working. The cma_mach_reserve() should be called as mach's reserve + * callback. CMA testing device will use cma_ctx for allocations. 
+ */ + +#include plat/cma-stub.h + +#include linux/cma.h +#include linux/err.h +#include linux/init.h +#include linux/kernel.h + +struct cma *cma_ctx; + +#define cma_size (32UL 20) /* 32 MiB */ + +static unsigned long cma_start __initdata; + +void __init cma_mach_reserve(void) +{ + unsigned long start = cma_reserve(0, cma_size, 0); + if (IS_ERR_VALUE(start)) + printk(KERN_WARNING cma: unable to reserve %lu for CMA: %d\n, + cma_size 20, (int)start); + else + cma_start = start; +} + +static int __init cma_mach_init(void) +{ + int ret = -ENOMEM; + + if (cma_start) { + struct cma *ctx = cma_create(cma_start, cma_size); + if (IS_ERR(ctx)) { + ret = PTR_ERR(ctx); + printk(KERN_WARNING + cma: cma_create(%p, %p) failed: %d\n, + (void *)cma_start, (void *)cma_size, ret); + } else { + cma_ctx = ctx; + ret = 0; + } + } + + return ret; +} +device_initcall(cma_mach_init); diff --git a/arch/arm/plat-s5p/include/plat/cma-stub.h b/arch/arm/plat-s5p/include/plat/cma-stub.h new file mode 100644 index 000..a24a03b --- /dev/null +++ b/arch/arm/plat-s5p/include/plat/cma
[PATCHv7 09/10] mm: cma: Test device and application added
This patch adds a cma misc device which lets user space use the CMA API. This device is meant for testing. A testing application is also provided. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/misc/Kconfig | 10 + drivers/misc/Makefile |1 + drivers/misc/cma-dev.c | 238 include/linux/cma.h| 29 +++ tools/cma/cma-test.c | 466 5 files changed, 744 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/cma-dev.c create mode 100644 tools/cma/cma-test.c diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 1e1a4be..0dd1e8c 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -458,4 +458,14 @@ source drivers/misc/cb710/Kconfig source drivers/misc/iwmc3200top/Kconfig source drivers/misc/ti-st/Kconfig +config CMA_DEVICE + tristate CMA misc device (DEVELOPEMENT) + depends on CMA + help + The CMA misc device allows allocating contiguous memory areas + from user space. This is mostly for testing of the CMA + framework. + + If unsure, say n + endif # MISC_DEVICES diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 98009cc..f8eadd4 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o obj-$(CONFIG_PCH_PHUB) += pch_phub.o obj-y += ti-st/ obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o +obj-$(CONFIG_CMA_DEVICE) += cma-dev.o diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c new file mode 100644 index 000..6c36064 --- /dev/null +++ b/drivers/misc/cma-dev.c @@ -0,0 +1,238 @@ +/* + * Contiguous Memory Allocator userspace driver + * Copyright (c) 2010 by Samsung Electronics. + * Written by Michal Nazarewicz (m.nazarew...@samsung.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License or (at your optional) any later version of the license. 
+ */ + +#define pr_fmt(fmt) cma: fmt + +#ifdef CONFIG_CMA_DEBUG +# define DEBUG +#endif + +#include linux/errno.h /* Error numbers */ +#include linux/err.h /* IS_ERR_VALUE() */ +#include linux/fs.h /* struct file */ +#include linux/mm.h /* Memory stuff */ +#include linux/mman.h +#include linux/slab.h +#include linux/module.h /* Standard module stuff */ +#include linux/device.h /* struct device, dev_dbg() */ +#include linux/types.h /* Just to be safe ;) */ +#include linux/uaccess.h /* __copy_{to,from}_user */ +#include linux/miscdevice.h /* misc_register() and company */ + +#include linux/cma.h + +#include plat/cma-stub.h + +static int cma_file_open(struct inode *inode, struct file *file); +static int cma_file_release(struct inode *inode, struct file *file); +static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg); +static int cma_file_mmap(struct file *file, struct vm_area_struct *vma); + +static struct miscdevice cma_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = cma, + .fops = (const struct file_operations) { + .owner = THIS_MODULE, + .open = cma_file_open, + .release= cma_file_release, + .unlocked_ioctl = cma_file_ioctl, + .mmap = cma_file_mmap, + }, +}; +#define cma_dev (cma_miscdev.this_device) + +struct cma_private_data { + struct cm *cm; + unsigned long size; + unsigned long phys; +}; + +static int cma_file_open(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (!cma_ctx) + return -EOPNOTSUPP; + + prv = kzalloc(sizeof *prv, GFP_KERNEL); + if (!prv) + return -ENOMEM; + + file-private_data = prv; + + return 0; +} + +static int cma_file_release(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv = file-private_data; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (prv-cm) { + cm_unpin(prv-cm); + cm_free(prv-cm); + } + kfree(prv); + + return 0; +} + +static long cma_file_ioctl_req(struct cma_private_data *prv, unsigned long arg) +{ + struct cma_alloc_request req; + struct cm *cm; + + dev_dbg(cma_dev, %s()\n, __func__); + + if (!arg) + return -EINVAL; + + if (copy_from_user(req, (void *)arg, sizeof req)) + return -EFAULT; + + if (req.magic != CMA_MAGIC) + return -ENOTTY; + + /* May happen on 32 bit system. */ + if (req.size ~(unsigned
[PATCHv7 02/10] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
This commit adds a bitmap_find_next_zero_area_off() function which works like the bitmap_find_next_zero_area() function except that it allows an offset to be specified when alignment is checked. This lets the caller request a bit such that its number plus the offset is aligned according to the mask. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/bitmap.h | 24 +++- lib/bitmap.c | 22 -- 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index daf8c48..c0528d1 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -45,6 +45,7 @@ * bitmap_set(dst, pos, nbits) Set specified bit area * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area + * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above * bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n * bitmap_shift_left(dst, src, n, nbits) *dst = *src << n * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) @@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits); extern void bitmap_set(unsigned long *map, int i, int len); extern void bitmap_clear(unsigned long *map, int start, int nr); -extern unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask); + +extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask, + unsigned long align_offset); + +static inline unsigned long +bitmap_find_next_zero_area(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask) +{ + return bitmap_find_next_zero_area_off(map, size, start, nr, + align_mask, 0); +} extern int bitmap_scnprintf(char *buf, unsigned int len, const unsigned long *src, int nbits); diff --git a/lib/bitmap.c b/lib/bitmap.c index 741fae9..8e75a6f 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr) } EXPORT_SYMBOL(bitmap_clear); -/* +/** * bitmap_find_next_zero_area - find a contiguous aligned zero area * @map: The address to base the search on * @size: The bitmap size in bits * @start: The bitnumber to start searching at * @nr: The number of zeroed bits we're looking for * @align_mask: Alignment mask for zero area + * @align_offset: Alignment offset for zero area. * * The @align_mask should be one less than a power of 2; the effect is that - * the bit offset of all zero areas this function finds is multiples of that - * power of 2. A @align_mask of 0 means no alignment is required. + * the bit offset of all zero areas this function finds plus @align_offset + * is a multiple of that power of 2.
*/ -unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask) +unsigned long bitmap_find_next_zero_area_off(unsigned long *map, +unsigned long size, +unsigned long start, +unsigned int nr, +unsigned long align_mask, +unsigned long align_offset) { unsigned long index, end, i; again: index = find_next_zero_bit(map, size, start); /* Align allocation */ - index = __ALIGN_MASK(index, align_mask); + index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset; end = index + nr; if (end size) @@ -350,7 +352,7 @@ again: } return index; } -EXPORT_SYMBOL(bitmap_find_next_zero_area); +EXPORT_SYMBOL(bitmap_find_next_zero_area_off); /* * Bitmap printing parsing functions: first version by Bill Irwin, -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux
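To make the offset semantics concrete, a small worked call (illustrative
values, not part of the patch; kernel context assumed):

	#include <linux/bitmap.h>

	static void example(void)
	{
		/* bits 0-2 busy; want 4 bits with (index + 3) a multiple of 4,
		 * i.e. align_mask = 3 and align_offset = 3 */
		unsigned long map[1] = { 0x7 };
		unsigned long idx;

		idx = bitmap_find_next_zero_area_off(map, BITS_PER_LONG, 0, 4, 3, 3);
		/*
		 * find_next_zero_bit() returns 3, then
		 * __ALIGN_MASK(3 + 3, 3) - 3 == 8 - 3 == 5, so idx == 5:
		 * bits 5-8 are free and idx + 3 == 8 is a multiple of 4.
		 */
	}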
[PATCHv7 03/10] lib: genalloc: Generic allocator improvements
This commit adds a gen_pool_alloc_aligned() function to the generic
allocator API.  It allows specifying alignment for the allocated
block.  This feature uses the bitmap_find_next_zero_area_off()
function.

It also fixes a possible issue with the bitmap's last element not
being fully allocated (i.e. space allocated for chunk->bits not being
a multiple of sizeof(long)).

It also makes some other smaller changes:
- moves structure definitions out of the header file,
- adds __must_check to functions returning value,
- makes gen_pool_add() return -ENOMEM rather than -1 on error,
- changes list_for_each to list_for_each_entry, and
- makes use of bitmap_clear().

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/genalloc.h |   46 ++++++----
 lib/genalloc.c           |  182 ++++++++++++++++++++++++----------------
 2 files changed, 129 insertions(+), 99 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 9869ef3..8ac7337 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -8,29 +8,31 @@
 * Version 2.  See the file COPYING for more details.
 */
 
+struct gen_pool;
 
-/*
- *  General purpose special memory pool descriptor.
- */
-struct gen_pool {
-	rwlock_t lock;
-	struct list_head chunks;	/* list of chunks in this pool */
-	int min_alloc_order;		/* minimum allocation order */
-};
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid);
 
-/*
- *  General purpose special memory pool chunk descriptor.
+int  __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr,
+			       size_t size, int nid);
+
+void gen_pool_destroy(struct gen_pool *pool);
+
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+		       unsigned alignment_order);
+
+/**
+ * gen_pool_alloc() - allocate special memory from the pool
+ * @pool:	Pool to allocate from.
+ * @size:	Number of bytes to allocate from the pool.
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses a first-fit algorithm.
 */
-struct gen_pool_chunk {
-	spinlock_t lock;
-	struct list_head next_chunk;	/* next chunk in pool */
-	unsigned long start_addr;	/* starting address of memory chunk */
-	unsigned long end_addr;		/* ending address of memory chunk */
-	unsigned long bits[0];		/* bitmap for allocating memory chunk */
-};
+static inline unsigned long __must_check
+gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+	return gen_pool_alloc_aligned(pool, size, 0);
+}
 
-extern struct gen_pool *gen_pool_create(int, int);
-extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 1923f14..0761079 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -16,53 +16,80 @@
 
 #include <linux/genalloc.h>
 
+/* General purpose special memory pool descriptor. */
+struct gen_pool {
+	rwlock_t lock;			/* protects chunks list */
+	struct list_head chunks;	/* list of chunks in this pool */
+	unsigned order;			/* minimum allocation order */
+};
+
+/* General purpose special memory pool chunk descriptor. */
+struct gen_pool_chunk {
+	spinlock_t lock;		/* protects bits */
+	struct list_head next_chunk;	/* next chunk in pool */
+	unsigned long start;		/* start of memory chunk */
+	unsigned long size;		/* number of bits */
+	unsigned long bits[0];		/* bitmap for allocating memory chunk */
+};
+
+
 /**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @order:	Log base 2 of number of bytes each bitmap bit
+ *		represents.
+ * @nid:	Node id of the node the pool structure should be allocated
+ *		on, or -1.  This will be also used for other allocations.
 *
 * Create a new special memory pool that can be used to manage special purpose
 * memory not managed by the regular kmalloc/kfree interface.
 */
-struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid)
 {
 	struct gen_pool *pool;
 
-	pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
-	if (pool != NULL) {
+	if (WARN_ON(order >= BITS_PER_LONG))
+		return NULL
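A sketch of how a caller would use the reworked API (hypothetical
driver code; the base address, the sizes and the reading of
alignment_order as the log2 of the required byte alignment are
assumptions, not taken from the patch):

	#include <linux/genalloc.h>
	#include <linux/mm.h>	/* PAGE_SHIFT */

	static int example(void)
	{
		struct gen_pool *pool;
		unsigned long addr;

		pool = gen_pool_create(PAGE_SHIFT, -1);	/* one bitmap bit per page */
		if (!pool)
			return -ENOMEM;

		/* gen_pool_add() now reports -ENOMEM rather than -1 on failure */
		if (gen_pool_add(pool, 0x20000000, 16 << 20, -1)) {
			gen_pool_destroy(pool);
			return -ENOMEM;
		}

		/* 1 MiB block aligned to a 1 MiB (2^20 byte) boundary */
		addr = gen_pool_alloc_aligned(pool, 1 << 20, 20);
		if (addr)
			gen_pool_free(pool, addr, 1 << 20);

		gen_pool_destroy(pool);
		return 0;
	}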
[PATCHv7 06/10] mm: MIGRATE_CMA migration type added
The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA pageblocks
and (ii) the page allocator will never change the migration type of
MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can always be
migrated somewhere else (unless there's no memory left in the system).

It is designed to be used with the Contiguous Memory Allocator (CMA)
for allocating big chunks (e.g. 10MiB) of physically contiguous
memory.  Once a driver requests contiguous memory, CMA will migrate
pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration type
is the last type tried when the page allocator falls back to migration
types other than the requested one.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/mmzone.h |   30 ++++++++++------
 mm/Kconfig             |    8 ++++
 mm/compaction.c        |   10 +++++
 mm/internal.h          |    3 +
 mm/page_alloc.c        |   97 ++++++++++++++++++++++++++++++++---
 5 files changed, 124 insertions(+), 24 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 39c24eb..1b95899 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,24 @@
 */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+#ifdef CONFIG_MIGRATE_CMA
+	MIGRATE_CMA,		/* only movable */
+#endif
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_MIGRATE_CMA
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +65,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/mm/Kconfig b/mm/Kconfig
index b911ad3..7818b07 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1,3 +1,11 @@
+config MIGRATE_CMA
+	bool
+	help
+	  This option should be selected by code that requires MIGRATE_CMA
+	  migration type to be present.  Once a page block has this
+	  migration type, only movable pages can be allocated from it and
+	  the page block never changes its migration type.
+
 config SELECT_MEMORY_MODEL
 	def_bool y
 	depends on EXPERIMENTAL || ARCH_SELECT_MEMORY_MODEL
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..c5e404b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/internal.h b/mm/internal.h
index dedb0af..cc24e74 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page);
 * in mm/page_alloc.c
 */
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
+#ifdef CONFIG_MIGRATE_CMA
+extern void __free_pageblock_cma(struct page *page);
+#endif
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
 extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 997f6c8..537d1f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -717,6 +717,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_MIGRATE_CMA
+
+/*
+ * Free whole pageblock and set it's migration type
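The "tried last" behaviour the commit message describes comes down to
where MIGRATE_CMA sits in the page allocator's fallback table.  A
sketch of the shape that table takes (illustrative only; the actual
hunk is not shown in this message):

	#include <linux/mmzone.h>

	/*
	 * MIGRATE_CMA appears only as the very last fallback for
	 * MIGRATE_MOVABLE requests, so CMA pageblocks are tapped only for
	 * movable allocations and only once every other type has been tried.
	 */
	static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES - 1] = {
		[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,
					  MIGRATE_RESERVE },
		[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE,
					  MIGRATE_RESERVE },
	#ifdef CONFIG_MIGRATE_CMA
		[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,
					  MIGRATE_RESERVE, MIGRATE_CMA },
	#else
		[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,
					  MIGRATE_RESERVE },
	#endif
		[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* never used */
	};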
[RFCv6 01/13] lib: rbtree: rb_root_init() function added
Added a rb_root_init() function which initialises a rb_root structure
as a red-black tree with at most one element.  The rationale is that
using rb_root_init(root, node) is more straightforward and cleaner
than first initialising an empty tree followed by an insert operation.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/rbtree.h |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 7066acb..5b6dc66 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -130,6 +130,17 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 }
 
 #define RB_ROOT	(struct rb_root) { NULL, }
+
+static inline void rb_root_init(struct rb_root *root, struct rb_node *node)
+{
+	root->rb_node = node;
+	if (node) {
+		node->rb_parent_color = RB_BLACK;	/* black, no parent */
+		node->rb_left  = NULL;
+		node->rb_right = NULL;
+	}
+}
+
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
-- 
1.7.2.3
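For illustration, the kind of caller the helper targets (hypothetical
entry type and field names, not taken from the patch):

	#include <linux/rbtree.h>

	/* Seed a tree with one preallocated node instead of starting from
	 * RB_ROOT and calling rb_link_node() + rb_insert_color(). */
	struct my_entry {
		struct rb_node node;
		unsigned long key;
	};

	static struct my_entry first_entry = { .key = 0 };
	static struct rb_root my_tree;

	static void my_tree_setup(void)
	{
		/* my_tree now contains exactly one node, coloured black */
		rb_root_init(&my_tree, &first_entry.node);
	}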
[RFCv6 02/13] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
This commit adds a bitmap_find_next_zero_area_off() function which
works like the bitmap_find_next_zero_area() function except that it
allows an offset to be specified when alignment is checked.  This lets
the caller request a bit such that its number plus the offset is
aligned according to the mask.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/bitmap.h |   24 +++++++++++++++++-------
 lib/bitmap.c           |   22 ++++++++++++----------
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..c0528d1 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -45,6 +45,7 @@
 * bitmap_set(dst, pos, nbits)			Set specified bit area
 * bitmap_clear(dst, pos, nbits)		Clear specified bit area
 * bitmap_find_next_zero_area(buf, len, pos, n, mask)	Find bit free area
+* bitmap_find_next_zero_area_off(buf, len, pos, n, mask)	as above
 * bitmap_shift_right(dst, src, n, nbits)	*dst = *src >> n
 * bitmap_shift_left(dst, src, n, nbits)	*dst = *src << n
 * bitmap_remap(dst, src, old, new, nbits)	*dst = map(old, new)(src)
@@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits);
 
 extern void bitmap_set(unsigned long *map, int i, int len);
 extern void bitmap_clear(unsigned long *map, int start, int nr);
-extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
-					 unsigned long size,
-					 unsigned long start,
-					 unsigned int nr,
-					 unsigned long align_mask);
+
+extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+						    unsigned long size,
+						    unsigned long start,
+						    unsigned int nr,
+						    unsigned long align_mask,
+						    unsigned long align_offset);
+
+static inline unsigned long
+bitmap_find_next_zero_area(unsigned long *map,
+			   unsigned long size,
+			   unsigned long start,
+			   unsigned int nr,
+			   unsigned long align_mask)
+{
+	return bitmap_find_next_zero_area_off(map, size, start, nr,
+					      align_mask, 0);
+}
 
 extern int bitmap_scnprintf(char *buf, unsigned int len,
 			const unsigned long *src, int nbits);
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 741fae9..8e75a6f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr)
 }
 EXPORT_SYMBOL(bitmap_clear);
 
-/*
+/**
 * bitmap_find_next_zero_area - find a contiguous aligned zero area
 * @map: The address to base the search on
 * @size: The bitmap size in bits
 * @start: The bitnumber to start searching at
 * @nr: The number of zeroed bits we're looking for
 * @align_mask: Alignment mask for zero area
+* @align_offset: Alignment offset for zero area.
 *
 * The @align_mask should be one less than a power of 2; the effect is that
-* the bit offset of all zero areas this function finds is multiples of that
-* power of 2. A @align_mask of 0 means no alignment is required.
+* the bit offset of all zero areas this function finds plus @align_offset
+* is multiple of that power of 2.
 */
-unsigned long bitmap_find_next_zero_area(unsigned long *map,
-					 unsigned long size,
-					 unsigned long start,
-					 unsigned int nr,
-					 unsigned long align_mask)
+unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+					     unsigned long size,
+					     unsigned long start,
+					     unsigned int nr,
+					     unsigned long align_mask,
+					     unsigned long align_offset)
 {
 	unsigned long index, end, i;
 again:
 	index = find_next_zero_bit(map, size, start);
 
 	/* Align allocation */
-	index = __ALIGN_MASK(index, align_mask);
+	index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset;
 
 	end = index + nr;
 	if (end > size)
@@ -350,7 +352,7 @@ again:
 	}
 	return index;
 }
-EXPORT_SYMBOL(bitmap_find_next_zero_area);
+EXPORT_SYMBOL(bitmap_find_next_zero_area_off);
 
 /*
 * Bitmap printing & parsing functions: first version by Bill Irwin,
-- 
1.7.2.3
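This is the same change reposted from the PATCHv7 series above, so just
one extra note: the rounding expression it introduces can be
sanity-checked in isolation.  A standalone sketch, assuming the
kernel's definition of __ALIGN_MASK():

	#include <assert.h>

	/* same definition as the kernel's __ALIGN_MASK() */
	#define __ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))

	int main(void)
	{
		unsigned long index = 3, off = 3, mask = 3;	/* align to 4 */
		unsigned long got = __ALIGN_MASK(index + off, mask) - off;

		assert(got == 5);			/* smallest index' >= index ... */
		assert(((got + off) & mask) == 0);	/* ... with index' + off aligned */
		return 0;
	}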
[RFCv6 05/13] mm: cma: debugfs support added
The debugfs development interface lets one change the map attribute at
run time as well as observe what regions have been reserved.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 Documentation/contiguous-memory.txt |    4 +
 include/linux/cma.h                 |   11 +
 mm/Kconfig                          |   25 ++-
 mm/cma.c                            |  501 ++++++++++++++++++++++++++++++++-
 4 files changed, 537 insertions(+), 4 deletions(-)

diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
index f1715ba..ec09d8e 100644
--- a/Documentation/contiguous-memory.txt
+++ b/Documentation/contiguous-memory.txt
@@ -258,6 +258,10 @@
      iff it matched in previous pattern.  If the second part is
      omitted it will match any type of memory requested by device.
 
+     If debugfs support is enabled, this attribute is accessible via
+     debugfs and can be changed at run-time by writing to
+     contiguous/map.
+
      Some examples (whitespace added for better readability):
 
          cma_map = foo/quaz = r1;
diff --git a/include/linux/cma.h b/include/linux/cma.h
index a6031a7..8437104 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -24,6 +24,7 @@
 struct device;
 struct cma_info;
+struct dentry;
 
 /**
 * struct cma - an allocated contiguous chunk of memory.
@@ -276,6 +277,11 @@ struct cma_region {
 	unsigned users;
 	struct list_head list;
 
+#if defined CONFIG_CMA_DEBUGFS
+	const char *to_alloc_link, *from_alloc_link;
+	struct dentry *dir, *to_alloc, *from_alloc;
+#endif
+
 	unsigned used:1;
 	unsigned registered:1;
 	unsigned reserved:1;
@@ -382,6 +388,11 @@ struct cma_allocator {
 	void (*unpin)(struct cma *chunk);
 
 	struct list_head list;
+
+#if defined CONFIG_CMA_DEBUGFS
+	const char *dir_name;
+	struct dentry *regs_dir;
+#endif
 };
 
 /**
diff --git a/mm/Kconfig b/mm/Kconfig
index c7eb1bc..a5480ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -351,16 +351,35 @@ config CMA
 	  For more information see <Documentation/contiguous-memory.txt>.
 	  If unsure, say "n".
 
-config CMA_DEBUG
-	bool "CMA debug messages (DEVELOPEMENT)"
+config CMA_DEVELOPEMENT
+	bool "Include CMA developement features"
 	depends on CMA
 	help
+	  This lets you enable some developement features of the CMA
+	  framework.  It does not add any code to the kernel.
+
+	  Those options are mostly usable during development and testing.
+	  If unsure, say "n".
+
+config CMA_DEBUG
+	bool "CMA debug messages"
+	depends on CMA_DEVELOPEMENT
+	help
 	  Turns on debug messages in CMA.  This produces KERN_DEBUG
 	  messages for every CMA call as well as various messages while
 	  processing calls such as cma_alloc().  This option does not
 	  affect warning and error messages.
 
-	  This is mostly used during development.  If unsure, say "n".
+config CMA_DEBUGFS
+	bool "CMA debugfs interface support"
+	depends on CMA_DEVELOPEMENT && DEBUG_FS
+	help
+	  Enable support for debugfs interface.  It is available under the
+	  "contiguous" directory in the debugfs root directory.  Each
+	  region and allocator is represented there.
+
+	  For more information consult
+	  <Documentation/contiguous-memory.txt>.
 
 config CMA_GENERIC_ALLOCATOR
 	bool "CMA generic allocator"
diff --git a/mm/cma.c b/mm/cma.c
index 17276b3..dfdeeb7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -34,11 +34,16 @@
 #include <linux/slab.h>		/* kmalloc() */
 #include <linux/string.h>	/* str*() */
 #include <linux/genalloc.h>	/* gen_pool_*() */
+#include <linux/debugfs.h>	/* debugfs stuff */
+#include <linux/uaccess.h>	/* copy_{to,from}_user */
 
 #include <linux/cma.h>
 
-/* Protects cma_regions, cma_allocators, cma_map and cma_map_length. */
+/*
+ * Protects cma_regions, cma_allocators, cma_map, cma_map_length,
+ * cma_dfs_regions and cma_dfs_allocators.
+ */
 static DEFINE_MUTEX(cma_mutex);
 
 
@@ -139,7 +144,13 @@ int __init __must_check cma_early_region_register(struct cma_region *reg)
 
 /* Regions & Allocators */
 
+static void __cma_dfs_region_add(struct cma_region *reg);
+static void __cma_dfs_region_alloc_update(struct cma_region *reg);
+static void __cma_dfs_allocator_add(struct cma_allocator *alloc);
+
 static int __cma_region_attach_alloc(struct cma_region *reg);
+static void __maybe_unused __cma_region_detach_alloc(struct cma_region *reg);
+
 
 /* List of all regions.  Named regions are kept before unnamed. */
 static LIST_HEAD(cma_regions);
@@ -222,6 +233,8 @@ int __must_check cma_region_register(struct cma_region *reg)
 	else
 		list_add_tail(&reg->list, &cma_regions);
 
+	__cma_dfs_region_add(reg);
+
 done:
 	mutex_unlock(&cma_mutex);
 
@@ -298,6 +311,8 @@ int
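The message is truncated above, but the debugfs plumbing follows the
usual kernel pattern.  A minimal sketch, with hypothetical cma_dfs_*
names standing in for the patch's real __cma_dfs_* helpers declared in
the hunk above:

	#include <linux/debugfs.h>
	#include <linux/err.h>
	#include <linux/fs.h>
	#include <linux/module.h>

	static struct dentry *cma_dfs_root;

	static ssize_t map_read(struct file *file, char __user *buf,
				size_t len, loff_t *off)
	{
		/* would copy the current "map" attribute out to userspace */
		return 0;
	}

	static ssize_t map_write(struct file *file, const char __user *buf,
				 size_t len, loff_t *off)
	{
		/* would parse the new map string and install it under cma_mutex */
		return len;
	}

	static const struct file_operations cma_dfs_map_ops = {
		.owner = THIS_MODULE,
		.read  = map_read,
		.write = map_write,
	};

	static int __init cma_dfs_init(void)
	{
		/* everything lives under <debugfs>/contiguous/ */
		cma_dfs_root = debugfs_create_dir("contiguous", NULL);
		if (IS_ERR_OR_NULL(cma_dfs_root))
			return -ENODEV;

		/* the run-time writable "map" attribute mentioned in the docs */
		debugfs_create_file("map", 0600, cma_dfs_root, NULL,
				    &cma_dfs_map_ops);
		return 0;
	}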