[PATCHv2] staging: go7007: fix use of uninitialised pointer
The go variable is initialised only after the switch statement, so it
cannot be dereferenced before that happens.

Signed-off-by: Michal Nazarewicz <min...@mina86.com>
---
 drivers/staging/media/go7007/go7007-usb.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

On Sun, Nov 10 2013, Dan Carpenter wrote:
> There are 3 other uses before go gets initialized.

Argh...  Other occurrences of the letters "GO" deceived my eyes.  Sorry
about that, and thanks.

diff --git a/drivers/staging/media/go7007/go7007-usb.c b/drivers/staging/media/go7007/go7007-usb.c
index 58684da..ee8ab89 100644
--- a/drivers/staging/media/go7007/go7007-usb.c
+++ b/drivers/staging/media/go7007/go7007-usb.c
@@ -1057,7 +1057,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 	char *name;
 	int video_pipe, i, v_urb_len;
 
-	dev_dbg(&go->dev, "probing new GO7007 USB board\n");
+	dev_dbg(&intf->dev, "probing new GO7007 USB board\n");
 
 	switch (id->driver_info) {
 	case GO7007_BOARDID_MATRIX_II:
@@ -1097,13 +1097,13 @@ static int go7007_usb_probe(struct usb_interface *intf,
 		board = &board_px_tv402u;
 		break;
 	case GO7007_BOARDID_LIFEVIEW_LR192:
-		dev_err(&go->dev, "The Lifeview TV Walker Ultra is not supported. Sorry!\n");
+		dev_err(&intf->dev, "The Lifeview TV Walker Ultra is not supported. Sorry!\n");
 		return -ENODEV;
 		name = "Lifeview TV Walker Ultra";
 		board = &board_lifeview_lr192;
 		break;
 	case GO7007_BOARDID_SENSORAY_2250:
-		dev_info(&go->dev, "Sensoray 2250 found\n");
+		dev_info(&intf->dev, "Sensoray 2250 found\n");
 		name = "Sensoray 2250/2251";
 		board = &board_sensoray_2250;
 		break;
@@ -1112,7 +1112,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 		board = &board_ads_usbav_709;
 		break;
 	default:
-		dev_err(&go->dev, "unknown board ID %d!\n",
+		dev_err(&intf->dev, "unknown board ID %d!\n",
 			(unsigned int)id->driver_info);
 		return -ENODEV;
 	}
-- 
1.8.4.1
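For readers not following the driver, a condensed sketch of the bug
(the function and struct names match the driver, but the body here is
heavily abridged and illustrative): the struct go7007 pointer is only
assigned once the board has been identified, so any dereference of go
before that point reads an uninitialised pointer, while the
usb_interface's embedded struct device is valid on entry -- hence the
switch to &intf->dev.

	static int go7007_usb_probe(struct usb_interface *intf,
				    const struct usb_device_id *id)
	{
		struct go7007 *go;	/* declared, not yet assigned */

		dev_dbg(&go->dev, "...\n");	/* BUG: go is uninitialised here */
		dev_dbg(&intf->dev, "...\n");	/* OK: intf is valid on entry */

		switch (id->driver_info) {
			/* board selection; go is assigned only after this */
		}
		/* ... go = <allocation based on the selected board> ... */
	}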
[PATCH] staging: go7007: fix use of uninitialised pointer
From: Michal Nazarewicz <min...@mina86.com>

The go variable is declared without initialisation, and the invocation
of dev_dbg() immediately tries to dereference it.
---
 drivers/staging/media/go7007/go7007-usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/go7007/go7007-usb.c b/drivers/staging/media/go7007/go7007-usb.c
index 58684da..457ab63 100644
--- a/drivers/staging/media/go7007/go7007-usb.c
+++ b/drivers/staging/media/go7007/go7007-usb.c
@@ -1057,7 +1057,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 	char *name;
 	int video_pipe, i, v_urb_len;
 
-	dev_dbg(&go->dev, "probing new GO7007 USB board\n");
+	pr_debug("probing new GO7007 USB board\n");
 
 	switch (id->driver_info) {
 	case GO7007_BOARDID_MATRIX_II:
-- 
1.8.3.2
[PATCH 1/7] staging: go7007: fix use of uninitialised pointer
Signed-off-by: Michal Nazarewicz <min...@mina86.com>
---
 drivers/staging/media/go7007/go7007-usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

On Sun, Nov 10 2013, Greg Kroah-Hartman wrote:
> Please either delete this entirely, or use the struct device in the
> usb_interface pointer.  A driver should never have a raw pr_* call.

diff --git a/drivers/staging/media/go7007/go7007-usb.c b/drivers/staging/media/go7007/go7007-usb.c
index 58684da..e8c708c 100644
--- a/drivers/staging/media/go7007/go7007-usb.c
+++ b/drivers/staging/media/go7007/go7007-usb.c
@@ -1057,7 +1057,7 @@ static int go7007_usb_probe(struct usb_interface *intf,
 	char *name;
 	int video_pipe, i, v_urb_len;
 
-	dev_dbg(&go->dev, "probing new GO7007 USB board\n");
+	dev_dbg(&intf->dev, "probing new GO7007 USB board\n");
 
 	switch (id->driver_info) {
 	case GO7007_BOARDID_MATRIX_II:
-- 
1.8.3.2
Re: [PATCH 2/2] dma-buf: add helpers for attacher dma-parms
Tomasz Stanislawski <t.stanisl...@samsung.com> writes:
> I recommend to change the semantics for unlimited number of segments
> from 'value 0' to:
>
>	#define DMA_SEGMENTS_COUNT_UNLIMITED ((unsigned long)INT_MAX)
>
> Using INT_MAX will allow using safe conversions between signed and
> unsigned integers.

LONG_MAX seems cleaner regardless.
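For concreteness, the two sentinel choices under discussion side by
side.  Neither is in a posted patch -- the macro name is Tomasz's
proposal, and only one of the two definitions would actually exist:

	/* Proposed: INT_MAX survives a round-trip through a signed int
	 * without changing value. */
	#define DMA_SEGMENTS_COUNT_UNLIMITED ((unsigned long)INT_MAX)

	/* Alternative from the reply: LONG_MAX, the largest value that
	 * still converts back to a non-negative signed long. */
	/* #define DMA_SEGMENTS_COUNT_UNLIMITED ((unsigned long)LONG_MAX) */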
Re: [Linaro-mm-sig] [PATCHv23 00/16] Contiguous Memory Allocator
On Wed, 29 Feb 2012 10:35:42 +0100, Barry Song <21cn...@gmail.com> wrote:
> 2012/2/23 Marek Szyprowski <m.szyprow...@samsung.com>:
>> This is (yet another) quick update of CMA patches.  I've rebased them
>> onto the next-20120222 tree from
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>> and fixed the bug pointed out by Aaro Koskinen.
>
> For the whole series:
> Tested-by: Barry Song <baohua.s...@csr.com>
>
> and i also wrote a simple kernel helper to test the CMA:

Would it make sense to make a patch out of it, putting it in tools/cma
(or similar)?
Re: [Linaro-mm-sig] [PATCHv23 00/16] Contiguous Memory Allocator
On Wed, 29 Feb 2012 14:34:02 +0100, Barry Song <21cn...@gmail.com> wrote:
> 2012/2/29 Michal Nazarewicz <min...@mina86.com>:
>> On Wed, 29 Feb 2012 10:35:42 +0100, Barry Song <21cn...@gmail.com> wrote:
>>> and i also wrote a simple kernel helper to test the CMA:
>>
>> Would it make sense to make a patch out of it, putting it in
>> tools/cma (or similar)?
>
> i can send a patch for this.  i am just thinking, should it be placed
> in tools/ as a test utility, or in Documentation/ as an example to
> explain CMA to users who want to use it?

I'd put it in tools/.

> i also think we should have a separate document explaining cma in
> detail in Documentation/, and my helper program can be placed there.

I'm currently writing an article for LWN which I hope will also lead to
something worth putting in Documentation/.
Re: [Linaro-mm-sig] [PATCHv21 12/16] mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
On Fri, Feb 10, 2012 at 11:32 AM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> @@ -5637,6 +5642,56 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> 	return ret > 0 ? 0 : ret;
>  }
>
> +/*
> + * Update zone's cma pages counter used for watermark level calculation.
> + */
> +static inline void __update_cma_wmark_pages(struct zone *zone, int count)
> +{
> +	unsigned long flags;
> +	spin_lock_irqsave(&zone->lock, flags);
> +	zone->min_cma_pages += count;
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +	setup_per_zone_wmarks();
> +}
> +
> +/*
> + * Trigger memory pressure bump to reclaim some pages in order to be able to
> + * allocate 'count' pages in single page units. Does similar work as
> + * __alloc_pages_slowpath() function.
> + */
> +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
> +{
> +	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> +	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
> +	int did_some_progress = 0;
> +	int order = 1;
> +	unsigned long watermark;
> +
> +	/*
> +	 * Increase level of watermarks to force kswapd do his job
> +	 * to stabilise at new watermark level.
> +	 */
> +	__modify_min_cma_pages(zone, count);

On Mon, 13 Feb 2012 10:57:58 -0800, Robert Nelson <robertcnel...@gmail.com> wrote:
> Hi Marek,
>
> This ^^^ function doesn't seem to exist in this patchset, is it in
> another set posted to lkml?

This should read __update_cma_wmark_pages().  Sorry for the incorrect
patch.
Re: [Linaro-mm-sig] [PATCHv21 12/16] mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
On Fri, Feb 10, 2012 at 11:32 AM, Marek Szyprowski wrote:
>>> +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
>>> +{
>>> +	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
>>> +	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
>>> +	int did_some_progress = 0;
>>> +	int order = 1;
>>> +	unsigned long watermark;
>>> +
>>> +	/*
>>> +	 * Increase level of watermarks to force kswapd do his job
>>> +	 * to stabilise at new watermark level.
>>> +	 */
>>> +	__modify_min_cma_pages(zone, count);

2012/2/13 Michal Nazarewicz <min...@mina86.com>:
>> This should read __update_cma_wmark_pages().  Sorry for the incorrect
>> patch.

On Mon, 13 Feb 2012 13:15:13 -0800, Robert Nelson <robertcnel...@gmail.com> wrote:
> Thanks Michal, that fixed it..

You are most welcome.

> cma-v21 with Rob Clark's omapdrm works great on the Beagle xM..

Can we take that as:

	Tested-by: Robert Nelson <robertcnel...@gmail.com>

? ;)
Re: [Linaro-mm-sig] [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks
On Wed, 08 Feb 2012 03:04:18 +0100, sandeep patil <psandee...@gmail.com> wrote:
> There's another problem I am facing with zone watermarks and CMA.
>
> Test details:
>   Memory      : 480 MB of total memory, 128 MB CMA region
>   Test case   : around 600 MB of file transfer over USB RNDIS onto target
>   System load : ftpd with console running on target.  No one is doing
>                 CMA allocations except for the DMA allocations done by
>                 the drivers.
>   Result      : After about 300 MB transfer, I start getting GFP_ATOMIC
>                 allocation failures.  This only happens if the CMA
>                 region is reserved.
> [...]
> Total memory available is way above the zone watermarks.  So, we ended
> up starving UNMOVABLE/RECLAIMABLE atomic allocations that cannot fall
> back on the CMA region.

This looks like something Mel warned me about.  I don't really have a
good solution for that yet. ;/
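To make the failure mode concrete, a hypothetical sketch of the
accounting gap (ALLOC_CMA and the free-CMA counter are assumed names
here, not code from the posted series): free pages sitting in
MIGRATE_CMA pageblocks count toward the zone watermark, yet an atomic
unmovable allocation cannot take them, so the watermark check passes
while every page the request could actually use is already gone.

	/* Hypothetical sketch, not from the patchset: discount free CMA
	 * pages for callers that cannot allocate from MIGRATE_CMA. */
	unsigned long usable_free = free_pages;

	if (!(alloc_flags & ALLOC_CMA))		/* assumed flag */
		usable_free -= free_cma_pages;	/* assumed per-zone counter */

	if (usable_free <= min_watermark)
		return false;			/* zone is not actually OK */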
Re: [PATCH 05/15] mm: compaction: export some of the functions
On Fri, Feb 3, 2012 at 8:18 PM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> This commit exports some of the functions from the compaction.c file
> outside of it, adding their declarations into the internal.h header
> file so that other mm related code can use them.
>
> This forced compaction.c to always be compiled (as opposed to being
> compiled only if CONFIG_COMPACTION is defined), but so as to avoid
> introducing code that the user did not ask for, part of compaction.c
> is now wrapped in an #ifdef.

On Sun, 05 Feb 2012 08:40:08 +0100, Hillf Danton <dhi...@gmail.com> wrote:
> What if both compaction and CMA are not enabled?

What about it?  If both are enabled, both will be compiled and usable.
Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
>>> +static inline bool migrate_async_suitable(int migratetype)

On Fri, 03 Feb 2012 15:19:54 +0100, Hillf Danton <dhi...@gmail.com> wrote:
>>> Just nitpick, since the helper is not directly related to what async
>>> means, how about migrate_suitable(int migrate_type)?

2012/2/3 Michal Nazarewicz <min...@mina86.com>:
>> I feel the current name is better suited since it says that it's OK
>> to scan this block if it's an asynchronous compaction run.

On Sat, 04 Feb 2012 10:09:02 +0100, Hillf Danton <dhi...@gmail.com> wrote:
> The input is the migrate type of the page considered, and async is
> only one of the modes in which compaction can be carried out.  Plus,
> the helper is also used in other cases where async is entirely not
> concerned.  That said, the naming is not clear, if not misleading.

In the first version the function was called is_migrate_cma_or_movable(),
which described what the function checked.  Mel did not like it though,
hence the change to migrate_async_suitable().  Honestly, I'm not sure
what would be the best name for the function.
Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
On Fri, Feb 3, 2012 at 8:18 PM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d5174c4..a6e7c64 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -45,6 +45,11 @@ static void map_pages(struct list_head *list)
> 	}
>  }
>
> +static inline bool migrate_async_suitable(int migratetype)

On Fri, 03 Feb 2012 15:19:54 +0100, Hillf Danton <dhi...@gmail.com> wrote:
> Just nitpick, since the helper is not directly related to what async
> means, how about migrate_suitable(int migrate_type)?

I feel the current name is better suited since it says that it's OK to
scan this block if it's an asynchronous compaction run.

> +{
> +	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
> +}
> +
>  /*
>   * Isolate free pages onto a private freelist. Caller must hold zone->lock.
>   * If @strict is true, will abort returning 0 on any invalid PFNs or non-free
> @@ -277,7 +282,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> 		 */
> 		pageblock_nr = low_pfn >> pageblock_order;
> 		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
> -		    get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
> +		    migrate_async_suitable(get_pageblock_migratetype(page))) {

> Here compaction looks corrupted if CMA not enabled, Mel?

Damn, yes, this should be !migrate_async_suitable(...).  Sorry about
that.
Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
> Pages which have an incorrect migrate type on free finally cause a
> pageblock migration type change from MIGRATE_CMA to MIGRATE_MOVABLE.

On Thu, 02 Feb 2012 13:47:29 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> I'm not quite seeing this.  In free_hot_cold_page(), the pageblock
> type is checked, so the page private should be set to MIGRATE_CMA or
> MIGRATE_ISOLATE for the CMA area.  It's not clear how this can change
> a pageblock to MIGRATE_MOVABLE in error.

Here's what I think may happen:

When drain_all_pages() is called, __free_one_page() is called for each
page on the pcp list with a migrate type deduced from page_private(),
which is MIGRATE_CMA.  This results in the page being put on the
MIGRATE_CMA freelist, even though its pageblock's migrate type is
MIGRATE_ISOLATE.

When an allocation happens and the pcp list is empty, rmqueue_bulk()
will get executed with the migratetype argument set to MIGRATE_MOVABLE.
It calls __rmqueue() to grab some pages and, because the page described
above is on the MIGRATE_CMA freelist, it may be returned back to
rmqueue_bulk().  But the pageblock's migrate type is not MIGRATE_CMA
but MIGRATE_ISOLATE, so the following code:

	#ifdef CONFIG_CMA
		if (is_pageblock_cma(page))
			set_page_private(page, MIGRATE_CMA);
		else
	#endif
			set_page_private(page, migratetype);

will set its private to MIGRATE_MOVABLE, and in the end the page lands
back on the MIGRATE_MOVABLE pcp list, but this time with page_private
== MIGRATE_MOVABLE and not MIGRATE_CMA.

One more drain_all_pages() (which may happen since alloc_contig_range()
calls set_migratetype_isolate() for each block) and the next
__rmqueue_fallback() may convert the whole pageblock to
MIGRATE_MOVABLE.

I know, this sounds crazy and improbable, but I couldn't find an easier
path to destruction.

As you pointed out, once the page is allocated, free_hot_cold_page()
will do the right thing by reading the pageblock's migrate type.

Marek is currently experimenting with various patches, including the
following change:

	#ifdef CONFIG_CMA
		int mt = get_pageblock_migratetype(page);
		if (is_migrate_cma(mt) || mt == MIGRATE_ISOLATE)
			set_page_private(page, mt);
		else
	#endif
			set_page_private(page, migratetype);

As a matter of fact, if __rmqueue() was changed to return the migrate
type of the freelist it took the page from, we could avoid this
get_pageblock_migratetype() altogether.  For now, however, I'd rather
not go that way just yet -- I'll be happy to dig into it once CMA gets
merged.
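A minimal sketch of the __rmqueue() idea floated above -- the extra
out-parameter is hypothetical and not part of the posted series; it
only spells out the direction being suggested:

	/* Hypothetical: let __rmqueue() report which freelist the page
	 * was actually taken from, so callers can set page_private()
	 * without re-reading the pageblock bitmap. */
	static struct page *__rmqueue(struct zone *zone, unsigned int order,
				      int migratetype, int *freelist_mt)
	{
		struct page *page = __rmqueue_smallest(zone, order, migratetype);

		*freelist_mt = migratetype;
		if (unlikely(!page))
			/* the fallback path would likewise need to report
			 * the list it actually used */
			page = __rmqueue_fallback(zone, order, migratetype);
		return page;
	}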
Re: [PATCH 05/15] mm: compaction: export some of the functions
On Thu, Jan 26, 2012 at 10:00:47AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -16,30 +16,11 @@
>  #include <linux/sysfs.h>
>  #include "internal.h"
>
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +

On Mon, 30 Jan 2012 12:57:26 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> This is pedantic but you reference CONFIG_CMA before the patch that
> declares it.  The only time this really matters is when it breaks
> bisection but I do not think that is the case here.

I think I'll choose to be lazy on this one. ;)  I actually tried to
move some commits around to resolve this future-reference, but this
resulted in quite a few conflicts during rebase, and after several
minutes I decided that it's not worth the effort.

> Whether you fix this or not by moving the CONFIG_CMA check to the
> same patch that declares it in Kconfig
>
> Acked-by: Mel Gorman <m...@csn.ul.ie>
Re: [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().
On Thu, Jan 26, 2012 at 10:00:45AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> @@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> 		} else if (!locked)
> 			spin_lock_irq(&zone->lru_lock);
>
> -		if (!pfn_valid_within(low_pfn))
> +		if (!pfn_valid(low_pfn))
> 			continue;
> 		nr_scanned++;

On Mon, 30 Jan 2012 12:24:28 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> This chunk looks unrelated to the rest of the patch.  I think what you
> are doing is patching around a bug that CMA exposed, which is very
> similar to the bug report at
> http://www.spinics.net/lists/linux-mm/msg29260.html .  Is this true?
>
> If so, I posted a fix that only calls pfn_valid() when necessary.  Can
> you check if that works for you and if so, drop this hunk please?  If
> the patch does not work for you, then this hunk still needs to be in a
> separate patch and handled separately as it would also be a fix for
> -stable.

I actually never encountered this bug myself, and CMA is unlikely to
expose it, since it always operates on contiguous memory regions with
no holes.  I made this change because, looking at the code, it seemed
like it might cause problems in some cases.  The crash that you linked
to looks like the kind of problem I was thinking about.

I'll drop this hunk and let you resolve this independently of CMA.
Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
On Thu, Jan 26, 2012 at 10:00:50AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> @@ -875,10 +895,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>  /*
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][3] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
> 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE, MIGRATE_RESERVE },
> +#ifdef CONFIG_CMA
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA, MIGRATE_RESERVE },

On Mon, 30 Jan 2012 13:35:42 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> This is a curious choice.  MIGRATE_CMA is allowed to contain movable
> pages.  By using MIGRATE_RECLAIMABLE and MIGRATE_UNMOVABLE for movable
> pages instead of MIGRATE_CMA, you increase the chances that unmovable
> pages will need to use MIGRATE_MOVABLE in the future, which impacts
> fragmentation avoidance.  I would recommend that you change this to
>
>	{ MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }

At the beginning the idea was to try hard not to get pages from
MIGRATE_CMA allocated at all, thus it was put at the end of the
fallbacks list, but on a busy system this probably won't help anyway,
so I'll change it per your suggestion.

>> @@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>> 			rmv_page_order(page);
>>
>> 			/* Take ownership for orders >= pageblock_order */
>> -			if (current_order >= pageblock_order)
>> +			if (current_order >= pageblock_order &&
>> +					!is_pageblock_cma(page))
>> 				change_pageblock_range(page, current_order,
>> 							start_migratetype);
>>
>> -			expand(zone, page, order, current_order, area, migratetype);
>> +			expand(zone, page, order, current_order, area,
>> +			       is_migrate_cma(start_migratetype)
>> +			     ? start_migratetype : migratetype);
>
> What is this check meant to be doing?  start_migratetype is determined
> by allocflags_to_migratetype() and that never will be MIGRATE_CMA, so
> is_migrate_cma(start_migratetype) should always be false.

Right, thanks!  This should be the other way around, ie.:

	expand(zone, page, order, current_order, area,
	       is_migrate_cma(migratetype)
	     ? migratetype : start_migratetype);

I'll fix this and the calls to is_pageblock_cma().
Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> @@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>> 	spin_unlock_irqrestore(&zone->lock, flags);
>> 	return ret ? 0 : -EBUSY;
>>  }
>> +
>> +/* must hold zone->lock */
>> +void update_pcp_isolate_block(unsigned long pfn)
>> +{
>> +	unsigned long end_pfn = pfn + pageblock_nr_pages;
>> +	struct page *page;
>> +
>> +	while (pfn < end_pfn) {
>> +		if (!pfn_valid_within(pfn)) {
>> +			++pfn;
>> +			continue;
>> +		}
>> +
>
> There is a potential problem here that you need to be aware of.
> set_pageblock_migratetype() is called from start_isolate_page_range().
> I do not think there is a guarantee that pfn + pageblock_nr_pages is
> not in a different block of MAX_ORDER_NR_PAGES.  If that is right then
> your options are to add a check like this;
>
>	if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
>		break;
>
> or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and
> in the same block as pfn and relying on the caller to have called
> pfn_valid.

	pfn = round_down(pfn, pageblock_nr_pages);
	end_pfn = pfn + pageblock_nr_pages;

should do the trick as well, right?  move_freepages_block() seems to be
doing the same thing.

>> +		page = pfn_to_page(pfn);
>> +		if (PageBuddy(page)) {
>> +			pfn += 1 << page_order(page);
>> +		} else if (page_count(page) == 0) {
>> +			set_page_private(page, MIGRATE_ISOLATE);
>> +			++pfn;
>
> This is dangerous for two reasons.  If the page_count is 0, it could
> be because the page is in the process of being freed and is not
> necessarily on the per-cpu lists yet, and you cannot be sure if the
> contents of page->private are important.  Second, there is nothing to
> prevent another CPU allocating this page from its per-cpu list while
> the private field is getting updated from here, which might lead to
> some interesting races.
>
> I recognise that what you are trying to do is respond to Gilad's
> request that you really check if an IPI here is necessary.  I think
> what you need to do is check if a page with a count of 0 is
> encountered and if it is, then a draining of the per-cpu lists is
> necessary.  To address Gilad's concerns, be sure to only do this once
> per attempt at CMA rather than for every page encountered with a count
> of 0 to avoid a storm of IPIs.

It's actually more than that.  This is the same issue that I first
fixed with a change to the free_pcppages_bulk() function [1].  At the
time of posting, you said you'd like me to try and find a different
solution which would not involve paying the price of calling
get_pageblock_migratetype().  Later I also realised that that solution
was not enough.

[1] http://article.gmane.org/gmane.linux.kernel.mm/70314

My next attempt was to drain the PCP list while holding zone->lock [2],
but that quickly proved to be a broken approach when Marek started
testing it on an SMP system.

[2] http://article.gmane.org/gmane.linux.kernel.mm/72016

This patch is yet another attempt at solving this old issue.  Even
though it has a potential race condition, we came to the conclusion
that the actual chances of it causing any problems are slim.  Various
stress tests did not, in fact, show the race to be an issue.

The problem is that if a page is on a PCP list and its underlying
pageblock's migrate type is changed to MIGRATE_ISOLATE, the page (i)
will still remain on the PCP list and thus someone can allocate it, and
(ii) when removed from the PCP list, the page will be put on the
freelist of the migrate type it had prior to the change.

(i) is actually not such a big issue, since the next thing that happens
after isolation is migration, so all the pages will get freed.  (ii) is
the actual problem, and if [1] is not an acceptable solution I really
don't have a good fix for it.

One thing that comes to mind is calling drain_all_pages() prior to
acquiring zone->lock in set_migratetype_isolate().  This is however
prone to races, since after the drain and before zone->lock is
acquired, pages might get moved back to a PCP list.

Draining the PCP list after acquiring zone->lock is not possible,
because smp_call_function_many() cannot be called with interrupts
disabled, and changing spin_lock_irqsave() to spin_lock() followed by
local_irq_save() causes a deadlock (that's what [2] attempted to do).

Any suggestions are welcome!

>> +		} else {
>> +			++pfn;
>> +		}
>> +	}
>> +}
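To illustrate the "drain first, lock later" option mentioned above and
why it races -- a sketch only, with the function body abridged, not a
posted patch:

	static int set_migratetype_isolate(struct page *page)
	{
		struct zone *zone = page_zone(page);
		unsigned long flags;
		int ret = -EBUSY;

		drain_all_pages();	/* sends IPIs; needs IRQs enabled */

		/*
		 * Race window: between the drain above and the lock below,
		 * another CPU may free pages from this pageblock back onto
		 * its pcp list, recreating exactly the state just drained.
		 */
		spin_lock_irqsave(&zone->lock, flags);
		/* ... mark pageblock MIGRATE_ISOLATE, move free pages ... */
		spin_unlock_irqrestore(&zone->lock, flags);
		return ret;
	}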
Re: [PATCHv19 00/15] Contiguous Memory Allocator
On Mon, 30 Jan 2012 14:25:12 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> I reviewed the core MM changes and I've acked most of them, so the
> next release should have a few acks where you expect them.  I did not
> add a reviewed-by because I did not build and test the thing.

Thanks!  I've either replied to your comments or applied the suggested
changes.  If anyone cares, the (not yet tested) changes are available
at

	git://github.com/mina86/linux-2.6.git cma
Re: [PATCHv19 00/15] Contiguous Memory Allocator
On Thu, 26 Jan 2012 16:31:40 +0100, Arnd Bergmann <a...@arndb.de> wrote:
> I haven't followed the last two releases so closely.  It seems that in
> v17 the movable pages with pending I/O were still a major problem, but
> in v18 you added a solution.  Is that right?  What is still left to be
> done here then?

In the current version, when an allocation fails because of a page with
pending I/O, CMA automatically tries the allocation in another region.
Re: [Linaro-mm-sig] [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
On Tue, 17 Jan 2012 22:54:28 +0100, sandeep patil <psandee...@gmail.com> wrote:
> Marek,
>
> I am running a CMA test where I keep allocating from a CMA region
> until the allocation fails due to lack of space.  However, I am seeing
> failures much before I expect them to happen.  When the allocation
> fails, I see a warning coming from __alloc_contig_range(), because
> test_pages_isolated() returned true.

Yeah, we are wondering ourselves about that.  Could you try
cherry-picking commit ad10eb079c97e27b4d27bc755c605226ce1625de ("update
migrate type on pcp when isolating") from
git://github.com/mina86/linux-2.6.git?  It probably won't apply
cleanly, but resolving the conflicts should not be hard (alternatively
you can try branch cma from the same repo, but it is a work in progress
at the moment).

> I tried to find out why this happened and added a debug print inside
> __test_page_isolated_in_pageblock().  Here's the resulting log ..
> [...]
> From the log it looks like the warning showed up because page->private
> is set to MIGRATE_CMA instead of MIGRATE_ISOLATE.

My understanding of that situation is that the page is on a pcp list,
in which case its page_private is not updated.  Draining and the first
patch in the series (and also the commit I've pointed to above) are
designed to fix that, but I'm unsure why they don't work all the time.

> I've also had a test case where it failed because (page_count() != 0).
> Have you or anyone else seen this during the CMA testing?
>
> Also, could this be because we are finding a page within (start, end)
> that actually belongs to a higher order Buddy block?

Higher order free buddy blocks are skipped in the "if (PageBuddy(page))"
path of __test_page_isolated_in_pageblock().  Then again, now that I
think of it, something fishy may be happening at the edges.  Moving the
check outside of __alloc_contig_migrate_range() to after outer_start is
calculated in alloc_contig_range() could help.  I'll take a look at it.
Re: [Linaro-mm-sig] [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
>> My understanding of that situation is that the page is on a pcp
>> list, in which case its page_private is not updated.  Draining and
>> the first patch in the series (and also the commit I've pointed to
>> above) are designed to fix that, but I'm unsure why they don't work
>> all the time.

On Wed, 18 Jan 2012 01:46:37 +0100, sandeep patil <psandee...@gmail.com> wrote:
> Will verify this if the page is found on the pcp list as well.

I was wondering, in general, whether "!PageBuddy(page) &&
!page_count(page)" means the page is on a PCP list.  From what I've
seen in page_isolation.c, it seems to be the case.

>> I've also had a test case where it failed because (page_count() != 0)
>
> With this, when it failed the page_count() returned a value of 2.  I
> am not sure why, but I will try and see if I can reproduce this.

If I'm not mistaken, page_count() != 0 means the page is allocated.  I
can see the following scenarios which can lead to the page being
allocated when test_pages_isolated() is called:

1. The page failed to migrate.  In this case, however, the code would
   abort earlier.

2. The page was migrated but then allocated.  This is not possible,
   since migrated pages are freed, which puts the page on the
   MIGRATE_ISOLATE freelist, which guarantees that the page will not be
   allocated.

3. The page was removed from a PCP list but with migratetype ==
   MIGRATE_CMA.  This is something the first patch in the series, as
   well as the commit I've mentioned, tries to address, so hopefully it
   won't be an issue any more.

4. The page was allocated from a PCP list.  This may happen because the
   draining of PCP lists happens after IRQs are enabled in
   set_migratetype_isolate().  I don't have a solution for that just
   yet.  One option is to alter update_pcp_isolate_block() to remove
   the page from the PCP list.  I haven't looked at the specifics of
   how to implement this just yet.

>> Moving the check outside of __alloc_contig_migrate_range() to after
>> outer_start is calculated in alloc_contig_range() could help.
>
> I was going to suggest that; moving the check until after outer_start
> is calculated will definitely help IMO.  I am sure I've seen a case
> where page_count(page) = page->private = 0 and PageBuddy(page) was
> false.

Yep, I've pushed new content to my branch
(git://github.com/mina86/linux-2.6.git cma) and will try to get Marek
to test it some time soon (I'm currently swamped with non-Linux related
work myself).
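As an aside, the heuristic relied on above can be written down
explicitly.  This helper is hypothetical -- no such function exists in
the series -- it only names the check being discussed:

	/* Hypothetical helper: a page that is neither in the buddy
	 * allocator nor has any references is, in practice, sitting on
	 * some CPU's pcp list (or about to be put on one). */
	static bool page_seems_on_pcplist(struct page *page)
	{
		return !PageBuddy(page) && page_count(page) == 0;
	}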
Re: [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
On Mon, 16 Jan 2012 10:01:10 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Fri, Jan 13, 2012 at 09:04:31PM +0100, Michal Nazarewicz wrote:
>>> On Thu, Dec 29, 2011 at 01:39:05PM +0100, Marek Szyprowski wrote:
>>>> From: Michal Nazarewicz <min...@mina86.com>
>>>>
>>>> +	/* Make sure all pages are isolated. */
>>>> +	if (!ret) {
>>>> +		lru_add_drain_all();
>>>> +		drain_all_pages();
>>>> +		if (WARN_ON(test_pages_isolated(start, end)))
>>>> +			ret = -EBUSY;
>>>> +	}
>>>
>>> On Tue, 10 Jan 2012 15:16:13 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
>>> Another global IPI seems overkill.  Drain pages only from the local
>>> CPU (drain_pages(get_cpu()); put_cpu()) and test if the pages are
>>> isolated.
>>
>> Is get_cpu() + put_cpu() required?  Won't drain_local_pages() work?
>
> drain_local_pages() calls smp_processor_id() without preemption
> disabled.

Thanks, I wasn't sure if preemption was an issue.
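Spelled out, the preemption-safe local drain Mel suggests looks like
this: drain_pages() takes an explicit CPU id, while drain_local_pages()
derives it from smp_processor_id() and therefore must not run
preemptible.

	int cpu = get_cpu();	/* disables preemption, returns this CPU */
	drain_pages(cpu);	/* flush this CPU's per-cpu page lists */
	put_cpu();		/* re-enables preemption */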
Re: [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()
On Thu, Dec 29, 2011 at 01:39:05PM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> +	/* Make sure all pages are isolated. */
> +	if (!ret) {
> +		lru_add_drain_all();
> +		drain_all_pages();
> +		if (WARN_ON(test_pages_isolated(start, end)))
> +			ret = -EBUSY;
> +	}

On Tue, 10 Jan 2012 15:16:13 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> Another global IPI seems overkill.  Drain pages only from the local
> CPU (drain_pages(get_cpu()); put_cpu()) and test if the pages are
> isolated.

Is get_cpu() + put_cpu() required?  Won't drain_local_pages() work?
Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().
On Tue, 10 Jan 2012 14:43:51 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Thu, Dec 29, 2011 at 01:39:03PM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> +static unsigned long
>> +isolate_freepages_range(struct zone *zone,
>> +			unsigned long start_pfn, unsigned long end_pfn,
>> +			struct list_head *freelist)
>>  {
>> -	unsigned long zone_end_pfn, end_pfn;
>> -	int nr_scanned = 0, total_isolated = 0;
>> -	struct page *cursor;
>> -
>> -	/* Get the last PFN we should scan for free pages at */
>> -	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
>> -	end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
>> +	unsigned long nr_scanned = 0, total_isolated = 0;
>> +	unsigned long pfn = start_pfn;
>> +	struct page *page;
>>
>> -	/* Find the first usable PFN in the block to initialse page cursor */
>> -	for (; blockpfn < end_pfn; blockpfn++) {
>> -		if (pfn_valid_within(blockpfn))
>> -			break;
>> -	}
>> -	cursor = pfn_to_page(blockpfn);
>> +	VM_BUG_ON(!pfn_valid(pfn));
>> +	page = pfn_to_page(pfn);
>
> The existing code is able to find the first usable PFN in a pageblock
> with pfn_valid_within().  It's allowed to do that because it knows the
> pageblock is valid, so calling pfn_valid() is unnecessary.  It is
> curious to change this to something that can sometimes BUG_ON
> !pfn_valid(pfn) instead of having a PFN walker that knows how to
> handle pfn_valid().

Actually, this walker seems redundant, since one is already present in
isolate_freepages(): if !pfn_valid(pfn), the loop in
isolate_freepages() will "continue;" and the function will never get
called.

>> +cleanup:
>> +	/*
>> +	 * Undo what we have done so far, and return.  We know all pages
>> +	 * from [start_pfn, pfn) are free because we have just freed
>> +	 * them.  If one of the pages in the range was not freed, we
>> +	 * would end up here earlier.
>> +	 */
>> +	for (; start_pfn < pfn; ++start_pfn)
>> +		__free_page(pfn_to_page(start_pfn));
>> +	return 0;
>
> This returns 0 even though you could have isolated some pages.

The loop's intention is to "unisolate" (ie. free) anything that got
isolated, so at the end the number of isolated pages is in fact zero.

> Overall, there would have been less contortion if you implemented
> isolate_freepages_range() in terms of the existing
> isolate_freepages_block().  Something that looked kinda like this
> not-compiled, untested illustration:
>
> static unsigned long isolate_freepages_range(struct zone *zone,
>		unsigned long start_pfn, unsigned long end_pfn,
>		struct list_head *list)
> {
>	unsigned long pfn = start_pfn;
>	unsigned long isolated;
>
>	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>		if (!pfn_valid(pfn))
>			continue;
>		isolated += isolate_freepages_block(zone, pfn, list);
>	}
>	return isolated;
> }
>
> I neglected to update isolate_freepages_block() to take the end_pfn,
> meaning it will in fact isolate too much, but that would be easy to
> address.

This is not a problem, since my isolate_freepages_range()
implementation can go beyond end_pfn anyway.  Of course, the function
taking end_pfn is an optimisation, since it does not have to compute
zone_end each time.

> It would be up to yourself whether to shuffle the tracepoint around,
> because with this example it will be triggered once per pageblock.
> You'd still need the cleanup code and freelist check in
> isolate_freepages_block() of course.
>
> The point is that it would minimise the disruption to the existing
> code and make it easier to get details like pfn_valid() right, without
> odd contortions, more gotos than should be necessary, and trying to
> keep pfn and page straight.

Even though I'd personally go with my approach, I see merit in your
point, so I will change it.

>>  }
>>
>>  /* Returns true if the page is within a block suitable for migration to */
>> @@ -365,17 +403,49 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>> 		nr_isolated++;
>>
>> 		/* Avoid isolating too much */
>> -		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
>> +		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
>> +			++low_pfn;
>> 			break;
>> +		}
>> 	}
>
> This minor optimisation is unrelated to the implementation of
> isolate_migratepages_range() and should be in its own patch.

It's actually not a mere minor optimisation, but rather making the
function work according to the documentation added, ie. that it returns
the "PFN of the first page that was not scanned".
Re: [PATCH 05/11] mm: mmzone: MIGRATE_CMA migration type added
On Tue, 10 Jan 2012 15:38:36 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Thu, Dec 29, 2011 at 01:39:06PM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> @@ -35,13 +35,35 @@
>>   */
>>  #define PAGE_ALLOC_COSTLY_ORDER 3
>>
>> -#define MIGRATE_UNMOVABLE     0
>> -#define MIGRATE_RECLAIMABLE   1
>> -#define MIGRATE_MOVABLE       2
>> -#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
>> -#define MIGRATE_RESERVE       3
>> -#define MIGRATE_ISOLATE       4 /* can't allocate from here */
>> -#define MIGRATE_TYPES         5
>> +enum {
>> +	MIGRATE_UNMOVABLE,
>> +	MIGRATE_RECLAIMABLE,
>> +	MIGRATE_MOVABLE,
>> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
>> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
>> +	/*
>> +	 * MIGRATE_CMA migration type is designed to mimic the way
>> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
>> +	 * from MIGRATE_CMA pageblocks and page allocator never
>> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
>> +	 *
>> +	 * The way to use it is to change migratetype of a range of
>> +	 * pageblocks to MIGRATE_CMA which can be done by
>> +	 * __free_pageblock_cma() function.  What is important though
>> +	 * is that a range of pageblocks must be aligned to
>> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
>> +	 * a single pageblock.
>> +	 */
>> +	MIGRATE_CMA,
>> +	MIGRATE_ISOLATE,	/* can't allocate from here */
>> +	MIGRATE_TYPES
>> +};
>
> MIGRATE_CMA is being added whether or not CONFIG_CMA is set.  This
> increases the size of the pageblock bitmap, and where that is just
> 1 bit per pageblock in the system, it may be noticeable on large
> machines.

Wasn't aware of that, will do.  In fact, in earlier versions it was
done this way, but that resulted in more #ifdefs.

>> +
>> +#ifdef CONFIG_CMA
>> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
>> +#else
>> +#  define is_migrate_cma(migratetype) false
>> +#endif
>
> Use static inlines.

I decided to use a #define for the sake of situations like
is_migrate_cma(get_pageblock_migratetype(page)).  With a static inline
it would still have to read the pageblock's migrate type even if
!CONFIG_CMA.  A macro gets rid of this operation altogether.

>>  #define for_each_migratetype_order(order, type) \
>>	for (order = 0; order < MAX_ORDER; order++) \
>>
>> @@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
>>	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>>  }
>>
>> +static inline bool is_pageblock_cma(struct page *page)
>> +{
>> +	return is_migrate_cma(get_pageblock_migratetype(page));
>> +}
>> +
>
> This allows additional calls to get_pageblock_migratetype() even if
> CONFIG_CMA is not set.

>>  struct free_area {
>>	struct list_head	free_list[MIGRATE_TYPES];
>>	unsigned long		nr_free;
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index d305080..af650db 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -37,4 +37,7 @@ extern void unset_migratetype_isolate(struct page *page);
>>  int alloc_contig_range(unsigned long start, unsigned long end);
>>  void free_contig_range(unsigned long pfn, unsigned nr_pages);
>>
>> +/* CMA stuff */
>> +extern void init_cma_reserved_pageblock(struct page *page);
>> +
>>  #endif
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 011b110..e080cac 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -192,7 +192,7 @@ config COMPACTION
>>  config MIGRATION
>>	bool "Page migration"
>>	def_bool y
>> -	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
>> +	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA
>>	help
>>	  Allows the migration of the physical location of pages of
>>	  processes while the virtual addresses are not changed.  This is
>>	  useful in
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 8733441..46783b4 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -21,6 +21,11 @@
>>  #define CREATE_TRACE_POINTS
>>  #include <trace/events/compaction.h>
>>
>> +static inline bool is_migrate_cma_or_movable(int migratetype)
>> +{
>> +	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
>> +}
>> +
>
> That is not a name that helps any.  migrate_async_suitable() would be
> marginally better.

>>  /**
>>   * isolate_freepages_range() - isolate free pages, must hold zone->lock.
>>   * @zone: Zone pages are in.
>>
>> @@ -213,7 +218,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>>		 */
>>		pageblock_nr = low_pfn >> pageblock_order;
>>		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
>> -		    get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
>> +		    is_migrate_cma_or_movable(get_pageblock_migratetype(page))) {
>>			low_pfn += pageblock_nr_pages;
>>			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1
Re: [PATCH 01/11] mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating
On Thu, Dec 29, 2011 at 2:39 PM, Marek Szyprowski <m.szyprow...@samsung.com> wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> When set_migratetype_isolate() sets a pageblock's migrate type, it
> does not change each page's page_private data.  This makes sense, as
> the function has no way of knowing what kind of information
> page_private stores.
>
> A side effect is that instead of draining pages from all zones,
> set_migratetype_isolate() now drains only pages from the zone the
> pageblock it operates on is in.
>
> +/* Caller must hold zone->lock. */
> +static void __zone_drain_local_pages(void *arg)
> +{
> +	struct per_cpu_pages *pcp;
> +	struct zone *zone = arg;
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	pcp = &per_cpu_ptr(zone->pageset, smp_processor_id())->pcp;
> +	if (pcp->count) {
> +		/* Caller holds zone->lock, no need to grab it. */
> +		__free_pcppages_bulk(zone, pcp->count, pcp);
> +		pcp->count = 0;
> +	}
> +	local_irq_restore(flags);
> +}
> +
> +/*
> + * Like drain_all_pages() but operates on a single zone.  Caller must
> + * hold zone->lock.
> + */
> +static void __zone_drain_all_pages(struct zone *zone)
> +{
> +	on_each_cpu(__zone_drain_local_pages, zone, 1);
> +}
> +

On Sun, 01 Jan 2012 08:49:13 +0100, Gilad Ben-Yossef <gi...@benyossef.com> wrote:
> Please consider whether sending an IPI to all processors in the system
> and interrupting them is appropriate here.  You seem to assume that it
> is probable that each CPU of the possibly 4,096 (MAXSMP on x86) has a
> per-cpu page for the specified zone,

I'm not really assuming that (in fact I expect what you fear, ie. that
most CPUs won't have pages from the specified zone on a PCP list);
however, I really need to make sure to get them off all PCP lists.

> otherwise you're just interrupting them out of doing something useful,
> or out of a power-saving idle, for nothing.

Exactly what's happening now anyway.

> While that may or may not be a reasonable assumption for the general
> drain_all_pages that drains pcps from all zones, I feel it is less
> likely to be the right thing once you limit the drain to a single
> zone.

Currently, set_migratetype_isolate() seems to do more than it needs to,
ie. it drains all the pages even though all it cares about is a single
zone.

> Some background on my attempt to reduce IPI noise in the system in
> this context is probably useful here as well:
> https://lkml.org/lkml/2011/11/22/133

Looks interesting.  I'm not entirely sure why it does not end up a race
condition, but in the case of __zone_drain_all_pages() we already hold
zone->lock, so my fears are somewhat gone.  I'll give it a try and
prepare a patch for __zone_drain_all_pages().
Re: [PATCH 01/11] mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating
On Sun, 01 Jan 2012 17:06:53 +0100, Gilad Ben-Yossef <gi...@benyossef.com> wrote:
> 2012/1/1 Michal Nazarewicz <min...@mina86.com>:
>> Looks interesting.  I'm not entirely sure why it does not end up a
>> race condition, but in the case of __zone_drain_all_pages() we
>> already hold
>
> If a page is in the PCP list when we check, you'll send the IPI and
> all is well.  If it isn't when we check and gets added later, you
> could just the same have a situation where we send the IPI, try to
> drain an empty PCP list, and then the page gets added.  So we are not
> adding a race condition that is not there already :-)

Right, makes sense.

>> zone->lock, so my fears are somewhat gone.  I'll give it a try and
>> prepare a patch for __zone_drain_all_pages().
>
> I plan to send v5 of the IPI noise patch after some testing.  It has a
> new version of drain_all_pages, with no allocation in the reclaim path
> and no locking.  You might want to wait till that one is out to base
> on it.

This shouldn't be a problem for my case, as set_migratetype_isolate()
is hardly ever called in the reclaim path. :)  The change so far seems
rather obvious:

 mm/page_alloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 424d36a..eaa686b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1181,7 +1181,19 @@ static void __zone_drain_local_pages(void *arg)
  */
 static void __zone_drain_all_pages(struct zone *zone)
 {
-	on_each_cpu(__zone_drain_local_pages, zone, 1);
+	struct per_cpu_pageset *pcp;
+	cpumask_var_t cpus;
+	int cpu;
+
+	if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC | __GFP_NOWARN))) {
+		for_each_online_cpu(cpu)
+			if (per_cpu_ptr(zone->pageset, cpu)->pcp.count)
+				cpumask_set_cpu(cpu, cpus);
+		on_each_cpu_mask(cpus, __zone_drain_local_pages, zone, 1);
+		free_cpumask_var(cpus);
+	} else {
+		on_each_cpu(__zone_drain_local_pages, zone, 1);
+	}
 }
 
 #ifdef CONFIG_HIBERNATION
Re: [PATCH 01/11] mm: page_alloc: handle MIGRATE_ISOLATE in free_pcppages_bulk()
On Fri, Nov 18, 2011 at 05:43:08PM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9dd443d..58d1a2e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -628,6 +628,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> 			page = list_entry(list->prev, struct page, lru);
> 			/* must delete as __free_one_page list manipulates */
> 			list_del(&page->lru);
> +
> +			/*
> +			 * When page is isolated in set_migratetype_isolate()
> +			 * function it's page_private is not changed since the
> +			 * function has no way of knowing if it can touch it.
> +			 * This means that when a page is on PCP list, it's
> +			 * page_private no longer matches the desired migrate
> +			 * type.
> +			 */
> +			if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
> +				set_page_private(page, MIGRATE_ISOLATE);
> +

On Mon, 12 Dec 2011 14:42:35 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> How much of a problem is this in practice?

IIRC, this led to allocations being made from an area marked as
isolated, or some such.

> [...] I'd go as far to say that it would be preferable to drain the
> per-CPU lists after you set pageblocks MIGRATE_ISOLATE.  The IPIs also
> have overhead but it will be incurred for the rare rather than the
> common case.

I'll look into that.
Re: [PATCH 03/11] mm: mmzone: introduce zone_pfn_same_memmap()
On Fri, Nov 18, 2011 at 05:43:10PM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <min...@mina86.com>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6afae0e..09c9702 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -111,7 +111,10 @@ skip:
>
>  next:
> 		pfn += isolated;
> -		page += isolated;
> +		if (zone_pfn_same_memmap(pfn - isolated, pfn))
> +			page += isolated;
> +		else
> +			page = pfn_to_page(pfn);
> 	}

On Mon, 12 Dec 2011 15:19:53 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> Is this necessary?  We are isolating pages, the largest of which is a
> MAX_ORDER_NR_PAGES page. [...]

This is not true for CMA.

> That said, everywhere else managed to avoid checks like this by always
> scanning in units of pageblocks.  Maybe this should be structured the
> same way to guarantee pfn_valid is called at least per pageblock (even
> though only once per MAX_ORDER_NR_PAGES is necessary).

I'll look into that.
Re: [PATCH 04/11] mm: compaction: export some of the functions
On Mon, 12 Dec 2011 15:29:07 +0100, Mel Gorman <m...@csn.ul.ie> wrote:
> On Fri, Nov 18, 2011 at 05:43:11PM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <min...@mina86.com>
>>
>> This commit exports some of the functions from the compaction.c file
>> outside of it, adding their declarations into the internal.h header
>> file so that other mm related code can use them.
>>
>> This forced compaction.c to always be compiled (as opposed to being
>> compiled only if CONFIG_COMPACTION is defined), but so as to avoid
>> introducing code that the user did not ask for, part of compaction.c
>> is now wrapped in an #ifdef.
>>
>> Signed-off-by: Michal Nazarewicz <min...@mina86.com>
>> Signed-off-by: Marek Szyprowski <m.szyprow...@samsung.com>
>> ---
>>  mm/Makefile     |   3 +-
>>  mm/compaction.c | 112 +++++++++++++++++++++++-------------------------
>>  mm/internal.h   |  35 ++++++++++++++++
>>  3 files changed, 83 insertions(+), 67 deletions(-)
>>
>> diff --git a/mm/Makefile b/mm/Makefile
>> index 50ec00e..24ed801 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -13,7 +13,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
>>			   readahead.o swap.o truncate.o vmscan.o shmem.o \
>>			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
>>			   page_isolation.o mm_init.o mmu_context.o percpu.o \
>> -			   $(mmu-y)
>> +			   $(mmu-y) compaction.o
>
> That should be compaction.o $(mmu-y) for consistency.
>
> Overall, this patch implies that CMA is always compiled in.

Not really.  But yes, it produces some bloat when neither CMA nor
compaction is compiled in.  I assume that the linker will be able to
deal with that (since the functions are not EXPORT_SYMBOL'ed).  Note
also that the vast majority of compaction.c is #ifdef'd, so only a
handful of functions are compiled.

> Why not just make CMA depend on COMPACTION to keep things simpler?

I could imagine that someone would want to have CMA but not compaction,
hence I decided to give that choice.

> For example, if you enable CMA and do not enable COMPACTION, you lose
> things like the vmstat counters that can aid debugging.  In fact, as
> parts of compaction.c are using defines like COMPACTBLOCKS, I'm not
> even sure compaction.c can compile without CONFIG_COMPACTION because
> of the vmstat stuff.
Re: [PATCH 03/11] mm: mmzone: introduce zone_pfn_same_memmap()
On Fri, Nov 18, 2011 at 05:43:10PM +0100, Marek Szyprowski wrote:
From: Michal Nazarewicz min...@mina86.com

diff --git a/mm/compaction.c b/mm/compaction.c
index 6afae0e..09c9702 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -111,7 +111,10 @@ skip:
 next:
 		pfn += isolated;
-		page += isolated;
+		if (zone_pfn_same_memmap(pfn - isolated, pfn))
+			page += isolated;
+		else
+			page = pfn_to_page(pfn);
 	}

On Mon, 12 Dec 2011 15:19:53 +0100, Mel Gorman m...@csn.ul.ie wrote:
Is this necessary? We are isolating pages, the largest of which is a MAX_ORDER_NR_PAGES page. [...]

On Mon, 12 Dec 2011 15:40:30 +0100, Mel Gorman m...@csn.ul.ie wrote:
To be clear, I'm referring to a single page being isolated here. It may or may not be a high-order page, but it's still going to be less than MAX_ORDER_NR_PAGES, so you should be able to check when a new block is entered and pfn_to_page is necessary.

Do you mean something like:

	if (same pageblock)
		just do arithmetic;
	else
		use pfn_to_page;

? I've discussed it with Dave and he suggested that approach as an optimisation, since in some configurations zone_pfn_same_memmap() is always true and the compiler will strip the else part, whereas a same-pageblock test will be false on occasion regardless of kernel configuration.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
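[Editor's note: for reference, the pageblock-based cursor advance discussed in this subthread could look roughly like the sketch below. This is hypothetical illustration, not code from the posted series; advance_page_cursor is an invented helper name.]

/*
 * Inside one pageblock the memmap is contiguous, so plain pointer
 * arithmetic is safe; on crossing a pageblock boundary the pointer is
 * recomputed, which also guarantees pfn_to_page() runs at least once
 * per pageblock.
 */
static struct page *advance_page_cursor(struct page *page,
					unsigned long pfn,
					unsigned long nr)
{
	if ((pfn >> pageblock_order) == ((pfn + nr) >> pageblock_order))
		return page + nr;		/* same pageblock */
	return pfn_to_page(pfn + nr);		/* crossed a boundary */
}

[In configurations where zone_pfn_same_memmap() is compile-time true, the compiler can drop the recompute branch entirely, which is the optimisation attributed to Dave above.]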
Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().
On Fri, Nov 18, 2011 at 05:43:09PM +0100, Marek Szyprowski wrote:
From: Michal Nazarewicz min...@mina86.com

diff --git a/mm/compaction.c b/mm/compaction.c
index 899d956..6afae0e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -54,51 +54,64 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return count;
 }

-/* Isolate free pages onto a private freelist. Must hold zone->lock */
-static unsigned long isolate_freepages_block(struct zone *zone,
-				unsigned long blockpfn,
-				struct list_head *freelist)
+/**
+ * isolate_freepages_range() - isolate free pages, must hold zone->lock.
+ * @zone:	Zone pages are in.
+ * @start:	The first PFN to start isolating.
+ * @end:	The one-past-last PFN.
+ * @freelist:	A list to save isolated pages to.
+ *
+ * If @freelist is not provided, holes in range (either non-free pages
+ * or invalid PFNs) are considered an error and function undoes its
+ * actions and returns zero.
+ *
+ * If @freelist is provided, function will simply skip non-free and
+ * missing pages and put only the ones isolated on the list.
+ *
+ * Returns number of isolated pages.  This may be more than end-start
+ * if end fell in the middle of a free page.
+ */
+static unsigned long
+isolate_freepages_range(struct zone *zone,
+			unsigned long start, unsigned long end,
+			struct list_head *freelist)

On Mon, 12 Dec 2011 15:07:28 +0100, Mel Gorman m...@csn.ul.ie wrote:
Use start_pfn and end_pfn to keep it consistent with the rest of compaction.c.

Will do.

 {
-	unsigned long zone_end_pfn, end_pfn;
-	int nr_scanned = 0, total_isolated = 0;
-	struct page *cursor;
-
-	/* Get the last PFN we should scan for free pages at */
-	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
+	unsigned long nr_scanned = 0, total_isolated = 0;
+	unsigned long pfn = start;
+	struct page *page;

-	/* Find the first usable PFN in the block to initialse page cursor */
-	for (; blockpfn < end_pfn; blockpfn++) {
-		if (pfn_valid_within(blockpfn))
-			break;
-	}
-	cursor = pfn_to_page(blockpfn);
+	VM_BUG_ON(!pfn_valid(pfn));
+	page = pfn_to_page(pfn);

 	/* Isolate free pages. This assumes the block is valid */
-	for (; blockpfn < end_pfn; blockpfn++, cursor++) {
-		int isolated, i;
-		struct page *page = cursor;
-
-		if (!pfn_valid_within(blockpfn))
-			continue;
-		nr_scanned++;
-
-		if (!PageBuddy(page))
-			continue;
+	while (pfn < end) {
+		unsigned isolated = 1, i;

Do not use implicit types. These are unsigned ints, call them unsigned ints.

Will do.

+		if (!pfn_valid_within(pfn))
+			goto skip;

The flow of this function in general with gotos of skipped and next is confusing in comparison to the existing function. For example, if this PFN is not valid, and no freelist is provided, then we call __free_page() on a PFN that is known to be invalid.

+		++nr_scanned;
+
+		if (!PageBuddy(page)) {
+skip:
+			if (freelist)
+				goto next;
+			for (; start < pfn; ++start)
+				__free_page(pfn_to_page(pfn));
+			return 0;
+		}

So if a PFN is valid and !PageBuddy and no freelist is provided, we call __free_page() on it regardless of reference count. That does not sound safe.

Sorry about that. It's a bug in the code which was caught later on. The code should read “__free_page(pfn_to_page(start))”.
 		/* Found a free page, break it into order-0 pages */
 		isolated = split_free_page(page);
 		total_isolated += isolated;
-		for (i = 0; i < isolated; i++) {
-			list_add(&page->lru, freelist);
-			page++;
+		if (freelist) {
+			struct page *p = page;
+			for (i = isolated; i; --i, ++p)
+				list_add(&p->lru, freelist);
 		}

-		/* If a page was split, advance to the end of it */
-		if (isolated) {
-			blockpfn += isolated - 1;
-			cursor += isolated - 1;
-		}
+next:
+		pfn += isolated;
+		page += isolated;

The name isolated is now confusing because it can mean either pages isolated or pages scanned depending on context. Your patch appears to be doing a lot more than is necessary to convert isolate_freepages_block into isolate_freepages_range and at this point, it's unclear why you did that.

When CMA uses this function, it requires all pages in the range to be valid and free. (Both conditions should be met but you never know.)
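[Editor's note: taking the inline bug fix above together with Mel's comments, the intended control flow of the new function is easier to see in a compact form. The sketch below is hypothetical (it is not the posted patch): it recomputes pfn_to_page() on every iteration for simplicity, sidestepping the memmap-contiguity question handled elsewhere in the series, and it assumes the caller holds zone->lock.]

static unsigned long
isolate_freepages_range(struct zone *zone,
			unsigned long start, unsigned long end,
			struct list_head *freelist)
{
	unsigned long pfn = start, total_isolated = 0;

	/* Caller must hold zone->lock. */
	while (pfn < end) {
		struct page *page = pfn_to_page(pfn);
		unsigned int isolated = 1, i;

		if (!pfn_valid_within(pfn) || !PageBuddy(page)) {
			/* Lenient mode: skip this pfn, isolate the rest. */
			if (freelist)
				goto next;
			/* Strict mode: undo everything isolated so far. */
			for (; start < pfn; ++start)
				__free_page(pfn_to_page(start));
			return 0;
		}

		/* Found a free page, break it into order-0 pages. */
		isolated = split_free_page(page);
		total_isolated += isolated;
		if (freelist)
			for (i = 0; i < isolated; ++i)
				list_add(&page[i].lru, freelist);
next:
		pfn += isolated;
	}

	return total_isolated;
}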
Re: [PATCH 04/11] mm: compaction: export some of the functions
On Mon, 12 Dec 2011 15:29:07 +0100, Mel Gorman m...@csn.ul.ie wrote:
Overall, this patch implies that CMA is always compiled in.

On Mon, Dec 12, 2011 at 03:41:04PM +0100, Michal Nazarewicz wrote:
Not really. But yes, it produces some bloat when neither CMA nor compaction are compiled. I assume that linker will be able to deal with that (since the functions are not EXPORT_SYMBOL'ed).

On Mon, 12 Dec 2011 16:40:15 +0100, Mel Gorman m...@csn.ul.ie wrote:
The bloat exists either way. I don't believe the linker strips it out so overall it would make more sense to depend on compaction to keep the vmstat counters for debugging reasons if nothing else. It's not something I feel very strongly about though.

KK, I'll do that then.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().
On Mon, 12 Dec 2011 17:30:52 +0100, Mel Gorman m...@csn.ul.ie wrote:
On Mon, Dec 12, 2011 at 04:22:39PM +0100, Michal Nazarewicz wrote:

[SNIP]

+		if (!pfn_valid_within(pfn))
+			goto skip;

The flow of this function in general with gotos of skipped and next is confusing in comparison to the existing function. For example, if this PFN is not valid, and no freelist is provided, then we call __free_page() on a PFN that is known to be invalid.

+		++nr_scanned;
+
+		if (!PageBuddy(page)) {
+skip:
+			if (freelist)
+				goto next;
+			for (; start < pfn; ++start)
+				__free_page(pfn_to_page(pfn));
+			return 0;
+		}

So if a PFN is valid and !PageBuddy and no freelist is provided, we call __free_page() on it regardless of reference count. That does not sound safe.

Sorry about that. It's a bug in the code which was caught later on. The code should read “__free_page(pfn_to_page(start))”.

On Mon, 12 Dec 2011 17:30:52 +0100, Mel Gorman m...@csn.ul.ie wrote:
That will call free on valid PFNs but why is it safe to call __free_page() at all? You say later that CMA requires that all pages in the range be valid, but if the pages are in use, that does not mean that calling __free_page() is safe. I suspect you have not seen a problem because the pages in the range were free as expected and not in use because of MIGRATE_ISOLATE.

All pages from [start, pfn) have passed through the loop body, which means that they are valid and have been removed from buddy (for the caller's use). Also, because of split_free_page(), all of those pages have been split into order-0 pages. Therefore, in error recovery, to undo what the loop has done so far, we give the pages back to buddy by calling __free_page() on each order-0 page.

 		/* Found a free page, break it into order-0 pages */
 		isolated = split_free_page(page);
 		total_isolated += isolated;
-		for (i = 0; i < isolated; i++) {
-			list_add(&page->lru, freelist);
-			page++;
+		if (freelist) {
+			struct page *p = page;
+			for (i = isolated; i; --i, ++p)
+				list_add(&p->lru, freelist);
 		}

-		/* If a page was split, advance to the end of it */
-		if (isolated) {
-			blockpfn += isolated - 1;
-			cursor += isolated - 1;
-		}
+next:
+		pfn += isolated;
+		page += isolated;

The name isolated is now confusing because it can mean either pages isolated or pages scanned depending on context. Your patch appears to be doing a lot more than is necessary to convert isolate_freepages_block into isolate_freepages_range and at this point, it's unclear why you did that.

When CMA uses this function, it requires all pages in the range to be valid and free. (Both conditions should be met but you never know.) To be clear, I meant that CMA expects the pages to be in buddy when the function is called, but after the function finishes, all the pages in the range are removed from buddy. This, among other things, is why the call to split_free_page() is necessary.

It seems racy but I guess you are depending on MIGRATE_ISOLATE to keep things sane, which is fine. However, I strongly suspect that if there is a race and a page is in use, then you will need to retry the migration step. Calling __free_page does not look right because something still has a reference to the page.

This change adds a second way isolate_freepages_range() works: when freelist is not specified, it aborts on an invalid or non-free page, but continues as usual if a freelist is provided.
Ok, I think you should be able to do that by not calling split_free_page or adding to the list if !freelist, with a comment explaining why the pages are left on the buddy lists for the caller to figure out. Bail if a page-in-use is found and have the caller check that the return value of isolate_freepages_block == end_pfn - start_pfn.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
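[Editor's note: a sketch of the caller-side contract Mel describes, under the assumption that the strict-mode function leaves pages on the buddy lists. This is illustrative code, not from the series; claim_range_or_busy is an invented name.]

/*
 * Anything short of the full range means a page in [start_pfn,
 * end_pfn) was in use, so the caller retries migration or bails.
 */
static int claim_range_or_busy(struct zone *zone,
			       unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long isolated;

	spin_lock_irq(&zone->lock);
	isolated = isolate_freepages_range(zone, start_pfn, end_pfn, NULL);
	spin_unlock_irq(&zone->lock);

	return isolated == end_pfn - start_pfn ? 0 : -EBUSY;
}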
Re: [PATCH] mm: cma: hack/workaround for some allocation issues
On Fri, 25 Nov 2011 17:43:07 +0100, Marek Szyprowski m.szyprow...@samsung.com wrote:
This is a quick and dirty patch and hack to solve some memory allocation issues that appeared at CMA v17 after switching migration code from hotplug to memory compaction. Especially the issue with watermark adjustment needs a real fix instead of disabling the code.

Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
---
Hello,

This patch fixes the issues that have been reported recently. It should be considered only as a temporary solution until a new version of CMA patches is ready.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
---
 mm/compaction.c |  5 -
 mm/page_alloc.c | 12 +---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3e07341..41976f8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -79,8 +79,9 @@ isolate_freepages_range(struct zone *zone,
 skip:
 			if (freelist)
 				goto next;
+failed:
 			for (; start < pfn; ++start)
-				__free_page(pfn_to_page(pfn));
+				__free_page(pfn_to_page(start));
 			return 0;
 		}

Yeah, my mistake, sorry about that. ;)

@@ -91,6 +92,8 @@ skip:
 			struct page *p = page;
 			for (i = isolated; i; --i, ++p)
 				list_add(&p->lru, freelist);
+		} else if (!isolated) {
+			goto failed;
 		}

 next:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 714b1c1..b4a46c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1303,12 +1303,12 @@
 	zone = page_zone(page);
 	order = page_order(page);
-
+#if 0
 	/* Obey watermarks as if the page was being allocated */
 	watermark = low_wmark_pages(zone) + (1 << order);
 	if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
 		return 0;
-
+#endif
 	/* Remove page from free list */
 	list_del(&page->lru);
 	zone->free_area[order].nr_free--;

Come to think of it, this watermark check seems a little meaningless in the case of CMA. With CMA, the pages that we are splitting here have migrate type ISOLATE, so they aren't “free” at all: buddy will never use them for allocation. That means that we don't really allocate any pages, we just want to split them into order-0 pages. Also, if we bail out now, it's a huge waste of time and effort. So, if the watermarks need to be checked, it should happen somewhere before we do the migration and related work. This may be due to my ignorance, but I don't know whether we really need the watermark check at all if we decide to use plain alloc_page() as the allocator for migrate_pages() rather than compaction_alloc().

@@ -5734,6 +5734,12 @@ static unsigned long pfn_align_to_maxpage_up(unsigned long pfn)
 	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
 }

+static struct page *
+cma_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
 static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 {
 	/* This function is based on compact_zone() from compaction.c. */

@@ -5801,7 +5807,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	}

 	/* Try to migrate. */
-	ret = migrate_pages(&cc.migratepages, compaction_alloc,
+	ret = migrate_pages(&cc.migratepages, cma_migrate_alloc,
 			    (unsigned long)&cc, false, cc.sync);

 	/* Migrated all of them? Great! */

Yep, that makes sense to me. compaction_alloc() takes only free pages (i.e. pages from the buddy system) from the given zone. This means that no pages get discarded or swapped to disk. That makes sense for compaction, since it's opportunistic in its nature. We, however, want pages to be discarded/swapped if that's the only way of getting pages to migrate to.
Of course, with this change, the “(unsigned long)&cc” part can be safely replaced with “NULL”, and the “cc.nr_freepages -= release_freepages(&cc.freepages);” statement at the end of the function (not visible in this patch) can be removed along with the line that follows it.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
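[Editor's note: a hypothetical sketch of the migration step after the simplification described above. The migrate_pages() signature is the one used in the quoted diff (private as unsigned long, so NULL becomes 0 here); cma_migrate_alloc() ignores its private argument, so no compact_control needs to be threaded through, and the release_freepages() bookkeeping disappears with it.]

static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
{
	struct compact_control cc = {
		.sync = true,
	};
	int ret = 0;

	INIT_LIST_HEAD(&cc.migratepages);

	/* ... isolate the LRU pages in [start, end) onto cc.migratepages ... */

	if (!list_empty(&cc.migratepages))
		ret = migrate_pages(&cc.migratepages, cma_migrate_alloc,
				    0 /* private: unused by the allocator */,
				    false, cc.sync);
	return ret;
}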
Re: [Linaro-mm-sig] [PATCHv17 0/11] Contiguous Memory Allocator
On Fri, 18 Nov 2011 22:20:48 +0100, sandeep patil wrote:
So, I guess my question is, until all the migration failures are tracked down and fixed, is there a plan to retry the contiguous allocation from a new range in the CMA region?

2011/11/18 Michal Nazarewicz min...@mina86.com:
No. The current CMA implementation will stick to the same range of pages also on subsequent allocations of the same size.

On Sat, 19 Nov 2011 00:30:49 +0100, sandeep patil psandee...@gmail.com wrote:
Doesn't that mean the drivers that fail to allocate from the contiguous DMA region will fail, if the migration fails?

Yes. I have some ideas how that could be mitigated. The easiest would be to try another region to allocate from on failure. A more complicated option would be to try to wait for the I/O transfer to finish. I'll try to work on it during the upcoming week.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
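[Editor's note: one possible shape of the "try another range on failure" idea, sketched with the field names from the CMA allocator quoted later in this document. This is hypothetical code, never part of the posted series; cma_mutex locking is omitted for brevity.]

static struct page *cma_alloc_retry(struct cma *cma, int count,
				    unsigned long align_mask)
{
	unsigned long start = 0, pageno, pfn;

	for (;;) {
		pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
						    start, count, align_mask);
		if (pageno >= cma->count)
			return NULL;		/* region exhausted */

		pfn = cma->base_pfn + pageno;
		if (alloc_contig_range(pfn, pfn + count, 0,
				       MIGRATE_CMA) == 0) {
			bitmap_set(cma->bitmap, pageno, count);
			return pfn_to_page(pfn);
		}
		/* Migration failed here; retry further into the region. */
		start = pageno + count;
	}
}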
Re: [Linaro-mm-sig] [PATCHv17 0/11] Contiguous Memory Allocator
On Fri, 18 Nov 2011 22:20:48 +0100, sandeep patil psandee...@gmail.com wrote:
I am running a simple test to allocate contiguous regions and write a log to a file on sdcard simultaneously. I can reproduce this migration failure 100% of the time with it. When I tracked the pages that failed to migrate, I found them on the buffer head lru list with a reference held on the buffer_head in the page, which causes drop_buffers() to fail.

So, I guess my question is, until all the migration failures are tracked down and fixed, is there a plan to retry the contiguous allocation from a new range in the CMA region?

No. The current CMA implementation will stick to the same range of pages also on subsequent allocations of the same size.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: CMA v16 and DMA-mapping v13 patch series
Marek Szyprowski m.szyprow...@samsung.com writes:
Linux v3.1-rc10 with both CMA v16 and DMA-mapping v3: git://git.infradead.org/users/kmpark/linux-2.6-samsung 3.1-rc10-cma-v16-dma-v3

I've pushed a new version based on Mel's suggestions to git://github.com/mina86/linux-2.6.git cma-17

Unfortunately, it took me more time than I anticipated, and so I had no time to test the code in any way (other than compiling it on x86_64).

--
Best regards, Michal “mina86” Nazarewicz (mina86-mina86.com, jid:mina86-jabber.org)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Tue, 01 Nov 2011 16:04:48 +0100, Mel Gorman m...@csn.ul.ie wrote:
For the purposes of review, have a separate patch for moving isolate_freepages_block to another file that does not alter the function in any way. When the function is updated in a follow-on patch, it'll be far easier to see what has changed.

Will do.

page_isolation.c may also be a better fit than page_alloc.c

Since isolate_freepages_block() is the only user of split_free_page(), would it make sense to move split_free_page() to page_isolation.c as well? I sort of like the idea of making it static and removing it from the header file.

I confess I didn't read closely because of the mess in page_alloc.c but the intent seems fine.

No worries. I just needed a quick comment on whether I'm headed in the right direction. :)

Hopefully there will be a new version of CMA posted that will be easier to review.

I'll try to create the code no later than the weekend, so hopefully the new version will be sent next week.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
On Tue, 18 Oct 2011 06:08:26 -0700, Mel Gorman m...@csn.ul.ie wrote:
This does mean that MIGRATE_CMA also does not have a per-cpu list. I don't know if that matters to you but all allocations using MIGRATE_CMA will take the zone lock.

On Mon, 24 Oct 2011 21:32:45 +0200, Michal Nazarewicz min...@mina86.com wrote:
This is sort of an artefact of my misunderstanding of pcp lists in the past. I'll have to re-evaluate the decision not to include CMA on pcp list.

Actually, sorry, my comment above is somewhat invalid. CMA does not need to be on a pcp list of its own because CMA pages are never allocated via standard kmalloc() and friends. Because of the fallbacks in rmqueue_bulk(), the CMA pages end up being added to a pcp list of the MOVABLE type, and so when kmalloc() and friends allocate a MOVABLE page they can end up grabbing a CMA page. So it's quite OK that CMA does not have its own pcp list, as the list would not be used anyway.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
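[Editor's note: the fallback mechanism referred to here is the per-migratetype fallback table in mm/page_alloc.c. The illustration below is not the exact patch, but it shows the shape implied by the thread: MIGRATE_CMA wired in as the last type tried for movable allocations, so rmqueue_bulk() can refill the MOVABLE per-cpu list from CMA pageblocks even though CMA has no pcp list of its own.]

static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES - 1] = {
	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,
				  MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE,
				  MIGRATE_RESERVE },
	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,
				  MIGRATE_CMA, MIGRATE_RESERVE },
	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* never used */
};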
Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added
On Thu, Oct 06, 2011 at 03:54:44PM +0200, Marek Szyprowski wrote:
The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks, and (ii) the page allocator will never change the migration type of MIGRATE_CMA pageblocks. This guarantees that a page in a MIGRATE_CMA pageblock can always be migrated somewhere else (unless there's no memory left in the system).

On Tue, 18 Oct 2011 06:08:26 -0700, Mel Gorman m...@csn.ul.ie wrote:
Or the count is permanently elevated by a device driver for some reason, or if the page is backed by a filesystem with a broken or unusable migrate_page() function. This is unavoidable, I'm just pointing out that you can still have migration failures, particularly if GFP_MOVABLE has been improperly used.

CMA does not handle that well right now. I guess it's something to think about once the rest is nice and working.

It is designed to be used with the Contiguous Memory Allocator (CMA) for allocating big chunks (eg. 10MiB) of physically contiguous memory. Once a driver requests contiguous memory, CMA will migrate pages from MIGRATE_CMA pageblocks. To minimise the number of migrations, the MIGRATE_CMA migration type is the last type tried when the page allocator falls back to migration types other than the one requested.

It would be preferable if you could figure out how to reuse the MIGRATE_RESERVE type for just the bitmap.

I'm not entirely sure what you mean here.

Like MIGRATE_CMA, it does not change type except when min_free_kbytes changes. However, it is something that could be done in the future to keep the size of the pageblock bitmap where it is now.

+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and page allocator never
+	 * implicitly changes migration type of MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,

This does mean that MIGRATE_CMA also does not have a per-cpu list. I don't know if that matters to you but all allocations using MIGRATE_CMA will take the zone lock.

This is sort of an artefact of my misunderstanding of pcp lists in the past. I'll have to re-evaluate the decision not to include CMA on the pcp lists. Still, I think that CMA not being on pcp lists should not be a problem for us. At least we can try to get CMA running and then consider adding CMA to the pcp lists.

I'm not sure this can be easily avoided because if there is a per-CPU list for MIGRATE_CMA, it might use a new cache line for it and incur a different set of performance problems.

+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};

diff --git a/mm/compaction.c b/mm/compaction.c
index 97254e4..9cf6b2b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -115,6 +115,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;

+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.
+	 * We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+

This is another reason why CMA and compaction should be using almost identical code. It does mean that the compact_control may need to be renamed and get flags to control things like the setting of pageblock flags, but it would be preferable to having two almost identical pieces of code.

I've addressed it in my other mail, where I've changed split_free_page() to not touch CMA and ISOLATE pageblocks. I think that this change should make the above comment no longer accurate and the check unnecessary.

 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;

@@ -940,12 +963,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 	     --current_order) {
Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator
On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:

+static unsigned long __init __cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+

On Tue, 18 Oct 2011 06:43:21 -0700, Mel Gorman m...@csn.ul.ie wrote:
Is this being called too early yet? What prevents you setting up the CMA regions after the page allocator is brought up, for example? I understand that there is a need for the memory to be coherent, so maybe that is the obstacle.

Another reason is that we want to be sure that we can get a given range of pages. After the page allocator is set up, someone could allocate a non-movable page from the range that interests us, and that wouldn't be nice for us.

+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
+		 count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    (1 << align) - 1);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+

If alloc_contig_range returns failure, the bitmap is still set. It will never be freed, so now the area cannot be used for CMA allocations any more.

The bitmap is cleared at the “free:” label.

+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}

+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s(page %p)\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pfn, count);
+
+	mutex_unlock(&cma_mutex);

It feels like the mutex could be a lot lighter here. If the bitmap is protected by a spinlock, it would only need to be held while the bitmap was being cleared. Free the contig pages outside the spinlock and clear the bitmap afterwards. It's not particularly important as the scalability of CMA is not something to be concerned with at this point.

The mutex is used also to protect the core operations, i.e. isolating pages and such. This is because two CMA calls may want to work on the same pageblock and we have to prevent that from happening. We could add a spinlock for protecting the bitmap, but we would still need the mutex for other uses.
--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Fri, 21 Oct 2011 03:06:24 -0700, Mel Gorman m...@csn.ul.ie wrote:
On Tue, Oct 18, 2011 at 10:26:37AM -0700, Michal Nazarewicz wrote:
On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman m...@csn.ul.ie wrote:
At this point, I'm going to apologise for not reviewing this a long long time ago.

On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

This commit introduces the alloc_contig_freed_pages() function which allocates (i.e. removes from the buddy system) free pages in a range. The caller has to guarantee that all pages in the range are in the buddy system.

Straight away, I'm wondering why you didn't use mm/compaction.c#isolate_freepages(). It knows how to isolate pages within ranges. All its control information is passed via struct compact_control, which I recognise may be awkward for CMA, but compaction.c knows how to manage all the isolated pages and pass them to migrate.c appropriately.

It is something to consider. At first glance, I see that isolate_freepages seems to operate on pageblocks, which is not desired for CMA.

isolate_freepages_block operates on a range of pages that happens to be hard-coded to be a pageblock because that was the requirement. It calculates end_pfn and it is possible to make that a function parameter.

Yes, this seems doable. I'll try to rewrite the patches to use it. The biggest difference is in how CMA and compaction treat pages which are not free: CMA treats it as an error, whereas compaction just skips those. This is solvable with an argument, though.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman m...@csn.ul.ie wrote:
At this point, I'm going to apologise for not reviewing this a long long time ago.

On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

This commit introduces alloc_contig_freed_pages() function which allocates (ie. removes from buddy system) free pages in range. Caller has to guarantee that all pages in range are in buddy system.

Straight away, I'm wondering why you didn't use mm/compaction.c#isolate_freepages(). It knows how to isolate pages within ranges. All its control information is passed via struct compact_control, which I recognise may be awkward for CMA, but compaction.c knows how to manage all the isolated pages and pass them to migrate.c appropriately.

It is something to consider. At first glance, I see that isolate_freepages seems to operate on pageblocks, which is not desired for CMA.

I haven't read all the patches yet, but isolate_freepages() does break everything up into order-0 pages. This may not be to your liking but it would not be possible to change.

Splitting everything into order-0 pages is desired behaviour.

Along with this function, a free_contig_pages() function is provided which frees all (or a subset of) pages allocated with alloc_contig_free_pages().

mm/compaction.c#release_freepages()

It sort of does the same thing, but release_freepages() assumes that the pages being freed are not contiguous and need to be on the lru list. With free_contig_pages(), we can assume all pages are contiguous.

+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+/*
+ * Both PFNs must be from the same zone!  If this function returns
+ * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
+ */
+static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
+{
+	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
+}
+

Why do you care what section the page is in? The zone is important all right, but not the section. Also, offhand I'm unsure if being in the same section guarantees the same zone. Sections are ordinarily fully populated (except on ARM but hey) but I can't remember anything enforcing that zones be section-aligned. Later I think I see that the intention was to reduce the use of pfn_to_page().

That is correct.

You can do this in a more general fashion by checking the zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES. That will not be SPARSEMEM specific.

I've tried doing stuff that way but it ended up with much more code. Dave suggested the above function to check if pointer arithmetic is valid. Please see also https://lkml.org/lkml/2011/9/21/220.

+#else
+
+#define zone_pfn_same_memmap(pfn1, pfn2) (true)
+
+#endif
+
 #endif /* !__GENERATING_BOUNDS.H */
 #endif /* !__ASSEMBLY__ */
 #endif /* _LINUX_MMZONE_H */

@@ -5706,6 +5706,73 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));

VM_BUG_ON seems very harsh here. WARN_ON_ONCE and returning 0 to the caller seems reasonable.

+	page = pfn_to_page(start);
+	zone = page_zone(page);
+
+	spin_lock_irq(&zone->lock);
+
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
+			  page_zone(page) != zone);
+

Here you will VM_BUG_ON with the zone lock held, leading to the system halting very shortly.
+		list_del(&page->lru);
+		order = page_order(page);
+		count = 1UL << order;
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
+

The callers need to check in advance if watermarks are sufficient for this. In compaction, it happens in compaction_suitable() because it only needed to be checked once. Your requirements might be different.

+		pfn += count;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+

On ARM, it's possible to encounter invalid pages. VM_BUG_ON is serious overkill.

+		if (zone_pfn_same_memmap(pfn - count, pfn))
+			page += count;
+		else
+			page = pfn_to_page(pfn);
+	}
+
+	spin_unlock_irq(&zone->lock);
+
+	/* After this, pages in the range can be freed one by one */
+	count = pfn - start;
+	pfn = start;
+	for (page = pfn_to_page(pfn); count; --count) {
+		prep_new_page(page, 0, flag);
+		++pfn;
+		if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
+
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Tue, 18 Oct 2011 10:48:46 -0700, Dave Hansen d...@linux.vnet.ibm.com wrote:
On Tue, 2011-10-18 at 10:26 -0700, Michal Nazarewicz wrote:
You can do this in a more general fashion by checking the zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES. That will not be SPARSEMEM specific.

I've tried doing stuff that way but it ended up with much more code.

I guess instead of:

+static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
+{
+	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
+}

You could do:

	static inline bool zone_pfn_same_maxorder(unsigned long pfn1, unsigned long pfn2)
	{
		unsigned long mask = MAX_ORDER_NR_PAGES - 1;
		return (pfn1 & ~mask) == (pfn2 & ~mask);
	}

I think that works. Should be the same code you have now, basically.

Makes sense. It'd require calling pfn_to_page() every MAX_ORDER_NR_PAGES even in memory models that have a linear mapping of struct page, but I guess that's not that bad.

--
Best regards, Michał “mina86” Nazarewicz (email/xmpp: m...@google.com)
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Sat, 15 Oct 2011 01:29:33 +0200, Andrew Morton a...@linux-foundation.org wrote:
On Thu, 06 Oct 2011 15:54:42 +0200 Marek Szyprowski m.szyprow...@samsung.com wrote:
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

This commit introduces alloc_contig_freed_pages() function

The “freed” seems redundant to me. Wouldn't alloc_contig_pages be a better name?

The “freed” is there because the function operates on pages that are in the buddy system, i.e. it is given a range of PFNs that are to be removed from the buddy system. There's also an alloc_contig_range() function (added by the next patch) which frees pages in a given range and then calls alloc_contig_freed_pages() to allocate them.

IMO, if there were an alloc_contig_pages() function, it would have to be one level up (i.e. it would figure out where to allocate memory and then call alloc_contig_range()). (That's really what CMA is doing.)

Still, as I think of it now, maybe alloc_contig_free_range() would be better?
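[Editor's note: a simplified sketch of the layering described above, for orientation only. The real series also isolates the pageblocks first, undoes the isolation on the way out, and trims the excess pages alloc_contig_freed_pages() may claim past the requested end; all of that is omitted here.]

int alloc_contig_range(unsigned long start, unsigned long end,
		       gfp_t flags, unsigned migratetype)
{
	int ret;

	/* Migrate movable pages away so [start, end) becomes free. */
	ret = __alloc_contig_migrate_range(start, end);
	if (ret)
		return ret;

	/* Claim the now-free pages straight out of the buddy system. */
	alloc_contig_freed_pages(start, end, flags);
	return 0;
}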
Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added
On Sun, 16 Oct 2011 10:01:36 +0200 Michal Nazarewicz wrote:
Still, as I think of it now, maybe alloc_contig_free_range() would be better?

On Sun, 16 Oct 2011 10:31:16 +0200, Andrew Morton wrote:
Nope. Of *course* the pages were free. Otherwise we couldn't (re)allocate them. I still think the free part is redundant.

Makes sense.

What could be improved is the alloc part. This really isn't an allocation operation. The pages are being removed from buddy then moved into the free arena of a different memory manager from where they will _later_ be allocated.

Not quite. After alloc_contig_range() returns, the pages are passed with no further processing to the caller, i.e. the area is not later split into several parts nor kept in CMA's pool unused. alloc_contig_freed_pages() is a little different, since it must be called on a buddy page boundary and may return more than requested (because of the way the buddy system merges buddies), so there is a little processing after it returns (namely freeing of the excess pages).

So we should move away from the alloc/free naming altogether for this operation and think up new terms. How about claim and release? claim_contig_pages, claim_contig_range, release_contig_pages, etc? Or we could use take/return.

Personally, I'm not convinced about changing the names of alloc_contig_range() and free_contig_pages(), but I see merit in changing alloc_contig_freed_pages() to something else. Since at the moment it's used only by alloc_contig_range(), I'd lean towards removing it from page-isolation.h, marking it static and renaming it to __alloc_contig_range().

Also, if we have no expectation that anything apart from CMA will use these interfaces (?), the names could/should be prefixed with cma_.

In Kamezawa's original patchset he used those for a somewhat different approach (IIRC, Kamezawa's patchset introduced a function that scanned memory and tried to allocate contiguous memory where it could), so I can imagine that someone will make use of those functions. They may be used in any situation where a range of pages is either free (i.e. in the buddy system) or movable and one wants to allocate them for some reason.

--
Best regards, Michal “mina86” Nazarewicz (min...@mina86.com, min...@jabber.org)
Re: [PATCH 2/8] mm: alloc_contig_freed_pages() added
On Thu, 08 Sep 2011 20:05:52 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote:
On Fri, 2011-08-19 at 16:27 +0200, Marek Szyprowski wrote:

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));

This implies that start-end are entirely contained in a single zone. What enforces that?

In the case of CMA, the __cma_activate_area() function from 6/8 has the check:

151	VM_BUG_ON(!pfn_valid(pfn));
152	VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);

This guarantees that CMA will never try to call alloc_contig_freed_pages() on a range that spans multiple zones.

If some higher layer enforces that, I think we probably need at least a VM_BUG_ON() in here and a comment about who enforces it.

Agreed.

+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}

This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.

I'm fine with that. I've used pointer arithmetic for performance reasons, but if it may potentially lead to bugs then obviously pfn_to_page() should be used.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
[PATCH 1/3] fixup! mm: alloc_contig_freed_pages() added
From: Michal Nazarewicz min...@mina86.com

Signed-off-by: Michal Nazarewicz min...@mina86.com
---
 include/linux/page-isolation.h |  4 ++-
 mm/page_alloc.c                | 66 ++-
 2 files changed, 60 insertions(+), 10 deletions(-)

On Fri, 2011-08-19 at 16:27 +0200, Marek Szyprowski wrote:

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+				       gfp_t flag)
+{
+	unsigned long pfn = start, count;
+	struct page *page;
+	struct zone *zone;
+	int order;
+
+	VM_BUG_ON(!pfn_valid(start));
+	zone = page_zone(pfn_to_page(start));

On Thu, 08 Sep 2011 20:05:52 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote:
This implies that start-end are entirely contained in a single zone. What enforces that? If some higher layer enforces that, I think we probably need at least a VM_BUG_ON() in here and a comment about who enforces it.

+	spin_lock_irq(&zone->lock);
+
+	page = pfn_to_page(pfn);
+	for (;;) {
+		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		list_del(&page->lru);
+		order = page_order(page);
+		zone->free_area[order].nr_free--;
+		rmv_page_order(page);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+		pfn += 1 << order;
+		if (pfn >= end)
+			break;
+		VM_BUG_ON(!pfn_valid(pfn));
+		page += 1 << order;
+	}

This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.

Do the attached changes seem to make sense? I wanted to avoid calling pfn_to_page() each time, as it seems fairly expensive in sparsemem and discontig modes. At the same time, the macro trickery is there so that users of sparsemem-vmemmap and flatmem won't have to pay the price.

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index b2a81fd..003c52f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,11 +46,13 @@
 static inline void unset_migratetype_isolate(struct page *page)
 {
 	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
 }
+
+/* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      gfp_t flags, unsigned migratetype);
-extern void free_contig_pages(struct page *page, int nr_pages);
+extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);

 /*
  * For migration.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 46e78d4..32fda5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5716,9 +5716,41 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }

+#if defined(CONFIG_FLATMEM) || defined(CONFIG_SPARSEMEM_VMEMMAP)
+
+/*
+ * In FLATMEM and CONFIG_SPARSEMEM_VMEMMAP we can safely increment the page
+ * pointer and get the same value as if we were to get by calling
+ * pfn_to_page() on incremented pfn counter.
+ */
+#define __contig_next_page(page, pageblock_left, pfn, increment)	\
+	((page) + (increment))
+
+#define __contig_first_page(pageblock_left, pfn) pfn_to_page(pfn)
+
+#else
+
+/*
+ * If we cross pageblock boundary, make sure we get a valid page pointer.  If
+ * we are within pageblock, incrementing the pointer is good enough, and is
+ * a bit of an optimisation.
+ */
+#define __contig_next_page(page, pageblock_left, pfn, increment)	\
+	(likely((pageblock_left) -= (increment)) ?			\
+		(page) + (increment)					\
+	 : (((pageblock_left) = pageblock_nr_pages), pfn_to_page(pfn)))
+
+#define __contig_first_page(pageblock_left, pfn) (			\
+	((pageblock_left) = pageblock_nr_pages -			\
+			    ((pfn) & (pageblock_nr_pages - 1))),	\
+	pfn_to_page(pfn))
+
+#endif
+
 unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
 				       gfp_t flag)
 {
+	unsigned long pageblock_left __unused;
 	unsigned long pfn = start, count;
 	struct page *page;
 	struct zone *zone;
 	int order;

@@ -5729,27 +5761,37 @@

 	spin_lock_irq(&zone->lock);

-	page = pfn_to_page(pfn);
+	page = __contig_first_page(pageblock_left, pfn);
 	for (;;) {
-		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		VM_BUG_ON(!page_count(page) || !PageBuddy(page
[PATCH 1/3] fixup! mm: alloc_contig_freed_pages() added
From: Michal Nazarewicz min...@mina86.com

Signed-off-by: Michal Nazarewicz min...@mina86.com
---
 include/asm-generic/memory_model.h | 17 ++
 include/linux/page-isolation.h     |  4 ++-
 mm/page_alloc.c                    | 43 +++
 3 files changed, 53 insertions(+), 11 deletions(-)

On Wed, 2011-09-21 at 17:19 +0200, Michal Nazarewicz wrote:
I wanted to avoid calling pfn_to_page() each time as it seems fairly expensive in sparsemem and discontig modes. At the same time, the macro trickery is so that users of sparsemem-vmemmap and flatmem won't have to pay the price.

On Wed, 21 Sep 2011 17:45:59 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote:
Personally, I'd say the (incredibly minuscule) runtime cost is worth the cost of making folks' eyes bleed when they see those macros. I think there are some nicer ways to do it.

Yeah. I wasn't amazed by them either.

Is there a reason you can't logically do?

	page = pfn_to_page(pfn);
	for (;;) {
		if (pfn_to_section_nr(pfn) == pfn_to_section_nr(pfn + 1))
			page++;
		else
			page = pfn_to_page(pfn + 1);
	}

Done. Thanks for the suggestions!

+#define __contig_next_page(page, pageblock_left, pfn, increment)	\
+	(likely((pageblock_left) -= (increment)) ? (page) + (increment) \
+	 : (((pageblock_left) = pageblock_nr_pages), pfn_to_page(pfn)))
+
+#define __contig_first_page(pageblock_left, pfn) (			\
+	((pageblock_left) = pageblock_nr_pages -			\
+			    ((pfn) & (pageblock_nr_pages - 1))),	\
+	pfn_to_page(pfn))
+
+#endif

For the love of Pete, please make those into functions if you're going to keep them.

That was tricky because they modify pageblock_left. Not relevant now anyway, though.

diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index fb2d63f..900da88 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -69,6 +69,23 @@
 })
 #endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */

+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+
+/*
+ * Both PFNs must be from the same zone!  If this function returns
+ * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
+ */
+static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
+{
+	return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
+}
+
+#else
+
+#define zone_pfn_same_memmap(pfn1, pfn2) (true)
+
+#endif
+
 #define page_to_pfn __page_to_pfn
 #define pfn_to_page __pfn_to_page

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index b2a81fd..003c52f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -46,11 +46,13 @@
 static inline void unset_migratetype_isolate(struct page *page)
 {
 	__unset_migratetype_isolate(page, MIGRATE_MOVABLE);
 }
+
+/* The below functions must be run on a range from a single zone. */
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      gfp_t flags, unsigned migratetype);
-extern void free_contig_pages(struct page *page, int nr_pages);
+extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);

 /*
  * For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 46e78d4..bc200a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5725,31 +5725,46 @@
 	int order;

 	VM_BUG_ON(!pfn_valid(start));
-	zone = page_zone(pfn_to_page(start));
+	page = pfn_to_page(start);
+	zone = page_zone(page);

 	spin_lock_irq(&zone->lock);

-	page = pfn_to_page(pfn);
 	for (;;) {
-		VM_BUG_ON(page_count(page) || !PageBuddy(page));
+		VM_BUG_ON(!page_count(page) || !PageBuddy(page) ||
+			  page_zone(page) != zone);
+
 		list_del(&page->lru);
 		order = page_order(page);
+		count = 1UL << order;
 		zone->free_area[order].nr_free--;
 		rmv_page_order(page);
-		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
-		pfn += 1 << order;
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count);
+
+		pfn += count;
 		if (pfn >= end)
 			break;
 		VM_BUG_ON(!pfn_valid(pfn));
-		page += 1 << order;
+
+		if (zone_pfn_same_memmap(pfn - count, pfn))
+			page += count;
+		else
+			page = pfn_to_page(pfn);
 	}

 	spin_unlock_irq(&zone->lock);

 	/* After
Re: [PATCH 6/8] drivers: add Contiguous Memory Allocator
On Wed, 06 Jul 2011 18:05:00 +0200, Christoph Lameter c...@linux.com wrote:
ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot DMA into all of memory (and so is ZONE_DMA32). Memory from ZONE_NORMAL can be used for DMA as well, and a fully capable device would be expected to handle any memory in the system for DMA transfers.

Guaranteed DMA-able memory? DMA abilities are device specific. Well, maybe you can call ZONE_DMA memory guaranteed if you guarantee that any device must at minimum be able to perform DMA into ZONE_DMA memory. But there may not be much of that memory around, so you would want to limit the use of that scarce resource.

As pointed out in Marek's other mail, this reasoning does not help in any way here. In the case of the video codec on various Samsung devices (and judging from some other threads, this is not limited to Samsung), the codec needs separate buffers in separate memory banks.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 22 Jun 2011 09:03:30 +0200, Hans Verkuil hverk...@xs4all.nl wrote:
What I was wondering about is how this patch series changes the allocation in case it can't allocate from the CMA pool. Will it attempt to fall back to a 'normal' allocation?

Unless Marek changed something since I wrote the code, which I doubt, if CMA cannot obtain memory from the CMA region, it will fail. Part of the reason is that CMA lacks the knowledge of where else to allocate memory from. For instance, in the case of several memory banks, it does not know which memory bank to allocate from. It is, in my opinion, a task for higher-level functions (read: the DMA layer) to try another mechanism if CMA fails.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
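[Editor's note: a sketch of the higher-level fallback described above. This is hypothetical and not part of the patch set; alloc_dma_buffer is an invented name, while dma_alloc_from_contiguous() is the allocator from the series.]

static struct page *alloc_dma_buffer(struct device *dev, int count,
				     unsigned int order)
{
	struct page *page = dma_alloc_from_contiguous(dev, count, order);

	/* CMA exhausted or not configured: fall back to the buddy. */
	if (!page)
		page = alloc_pages(GFP_KERNEL | __GFP_NOWARN,
				   get_order(count << PAGE_SHIFT));
	return page;
}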
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:
We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.

On Wed, 22 Jun 2011 15:15:35 +0200, Marek Szyprowski wrote:
Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.

If you are able to allocate a pageblock of free memory from the buddy system, you should be able to convert it to CMA memory with no problems. Also, if you want to convert CMA memory back to regular memory, you should be able to do that even if some of the memory is used by CMA (it just won't be available right away, but only when CMA frees it). It is important to note that, because of the use of the migration type, all such conversions have to be performed on a pageblock basis.

I don't think this is a feature we should consider for the first patch, though. We started with an overgrown idea about what CMA might do and it didn't get us far. Let's first get the basics right and then start implementing features as they become needed.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
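[Editor's note: a rough sketch of the run-time "grow the CMA pool" idea discussed above, which was never implemented in this series. It assumes the alloc_contig_range() signature from the patches; cma_grow_one_pageblock is an invented name.]

static int cma_grow_one_pageblock(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);
	unsigned long i;

	/* Conversions must cover exactly one aligned pageblock. */
	if (pfn & (pageblock_nr_pages - 1))
		return -EINVAL;

	/* Claim the block; fails if it is not free or cannot be emptied. */
	if (alloc_contig_range(pfn, pfn + pageblock_nr_pages, 0,
			       MIGRATE_MOVABLE))
		return -EBUSY;

	set_pageblock_migratetype(page, MIGRATE_CMA);

	/* The block came back as order-0 pages; free them as CMA memory. */
	for (i = 0; i < pageblock_nr_pages; ++i)
		__free_page(page + i);
	return 0;
}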
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 22 Jun 2011 15:39:23 +0200, Arnd Bergmann a...@arndb.de wrote:
Why that? I would expect you can do the same that hugepages (used to) do and just attempt high-order allocations. If they succeed, you can add them as a CMA region and free them again, into the movable set of pages, otherwise you just fail the request from user space when the memory is already fragmented.

The problem with that is that CMA needs whole pageblocks allocated, and buddy can allocate at most half a pageblock.

--
Best regards, Michal “mina86” Nazarewicz (email/xmpp: mnazarew...@google.com)
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann a...@arndb.de wrote: * We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both. I don't think I follow. What does IOMMU has to do with CMA? -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 15 Jun 2011 13:20:42 +0200, Arnd Bergmann a...@arndb.de wrote: The point is that on the higher level device drivers, we want to hide the presence of CMA and/or IOMMU behind the dma mapping API, but the device drivers do need to know about the bank properties. Gotcha, thanks. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote: Please explain the exact requirements that lead you to defining multiple contexts. Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote: Please explain the exact requirements that lead you to defining multiple contexts. On Tuesday 14 June 2011, Michal Nazarewicz wrote: Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes. On Tue, 14 Jun 2011 18:03:00 +0200, Arnd Bergmann wrote: For all I know, that is something that is only true for a few very special Samsung devices, Maybe. I'm just answering your question. :) Ah yes, I forgot that separate regions for different purposes could decrease fragmentation. I would suggest going forward without having multiple regions: Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost. * Remove the registration of specific addresses from the initial patch set (but keep the patch). * Add a heuristic plus command-line override to automatically come up with a reasonable location+size for *one* CMA area in the system. I'm not arguing those. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote: Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost. On Tue, 14 Jun 2011 20:30:07 +0200, Arnd Bergmann wrote: What is bad IMHO is making them the default, which forces the board code to care about memory management details. I would much prefer to have contiguous allocation parameters tuned automatically to just work on most boards before we add ways to do board-specific hacks. I see those as orthogonal problems. The code can have support for multiple contexts but by default use a single global context exported as cma_global variable (or some such). And I'm not arguing against having “contiguous allocation parameters tuned automatically to just work on most boards”. I just don't see the reason to delete functionality that is already there, does not add much code and can be useful. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] lib: genalloc: Generic allocator improvements
On Fri, 10 Jun 2011 14:52:17 +0200, Alan Cox a...@lxorguk.ukuu.org.uk wrote: I plan to replace it with lib/bitmap.c bitmap_* based allocator (similar like it it is used by dma_declare_coherent_memory() and friends in drivers/base/dma-coherent.c). We need something really simple for CMA area management. IMHO allocate_resource and friends a bit too heavy here, but good to know that such allocator also exists. Not sure I'd class allocate_resource as heavyweight but providing it's using something that already exists rather than inventing yet another allocator. genalloc is already in the kernel and is used in a few places, so we either let everyone use it as they see fit or we deprecate the library. If we don't deprecate it I see no reason why CMA should not use it. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Fri, 2011-04-01 at 00:51 +0200, Michal Nazarewicz wrote: The function is called from alloc_contig_range() (see patch 05/12) which makes sure that the PFN is valid. Situation where there is not enough space is caught earlier in alloc_contig_range(). alloc_contig_freed_pages() must be given a valid PFN range such that all the pages in that range are free (as in are within the region tracked by page allocator) and of MIGRATE_ISOLATE so that page allocator won't touch them. On Fri, 01 Apr 2011 16:03:16 +0200, Dave Hansen wrote: OK, so it really is a low-level function only. How about a comment that explicitly says this? Only called from $FOO with the area already isolated. It probably also deserves an __ prefix. Yes, it's not really for general use. Comment may indeed be useful here. That's why invalid PFN is a bug in the caller and not an exception that has to be handled. Also, the function is not called during boot time. It is called while system is already running. What kind of success have you had running this in practice? I'd be worried that some silly task or a sticky dentry would end up in the range that you want to allocate in. I'm not sure what you are asking. The function requires the range to be marked as MIGRATE_ISOLATE and all pages being free, so nothing can be allocated there while the function is running. If you are asking about CMA in general, the range that CMA uses is marked as MIGRATE_CMA (a new migrate type) which means that only MIGRATE_MOVABLE pages can be allocated there. This means, that in theory, if there is enough memory the pages can always be moved out of the region. At leasts that's my understanding of the type. If this is correct, the allocation should always succeed provided enough memory for the pages within the region to be moved to is available. As of practice, I have run some simple test to see if the code works and they succeeded. Also, Marek has run some test with actual hardware and those worked well as well (but I'll let Marek talk about any details). -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/12] mm: alloc_contig_range() added
On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: + ret = 0; + while (!PageBuddy(pfn_to_page(start (~0UL ret + if (WARN_ON(++ret = MAX_ORDER)) + return -EINVAL; On Thu, 31 Mar 2011 18:02:41 +0200, Dave Hansen wrote: Holy cow, that's dense. Is there really no more straightforward way to do that? Which part exactly is dense? What would be qualify as a more straightforward way? In any case, please pull the ++ret bit out of the WARN_ON(). Some people like to do: #define WARN_ON(...) do{}while(0) to save space on some systems. I don't think that's the case. Even if WARN_ON() decides not to print a warning, it will still return the value of the argument. If not, a lot of code will brake. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/12] mm: alloc_contig_range() added
On Thu, 31 Mar 2011 21:28:21 +0200, Steven Rostedt wrote: WARN_ON() should never do anything but test. That ret++ does not belong inside the WARN_ON() condition. If there are other locations in the kernel that do that, then those locations need to be fixed. Testing implies evaluating, so if we allow: if (++i == end) { /* ... */ } I see no reason why not to allow: if (WARN_ON(++i == end)) { /* ... */ } In both cases the condition is tested. On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: + ret = 0; + while (!PageBuddy(pfn_to_page(start (~0UL ret + if (WARN_ON(++ret = MAX_ORDER)) + return -EINVAL; On Thu, Mar 31, 2011 at 09:02:41AM -0700, Dave Hansen wrote: In any case, please pull the ++ret bit out of the WARN_ON(). Some people like to do: #define WARN_ON(...) do{}while(0) to save space on some systems. On Thu, 31 Mar 2011 21:26:50 +0200, Steven Rostedt wrote: That should be fixed, as the if (WARN_ON()) has become a standard in most of the kernel. Removing WARN_ON() should be: #define WARN_ON(x) ({0;}) This would break a lot of code which expect that testing to take place. Also see http://lxr.linux.no/linux+*/include/asm-generic/bug.h#L108. But I agree, that there should be no side effects inside a WARN_ON(), which that ++ret is definitely one. Thus I don't really agree with this point. At any rate, I don't really care. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Thu, 31 Mar 2011 17:58:03 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote: On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); This seems kinda mean. Could we return an error? I understand that this is largely going to be an early-boot thing, but surely trying to punt on crappy input beats a full-on BUG(). Actually, I would have to check but I think that the usage of this function (in this patchset) is that the caller expects the function to succeed. It is quite a low-level function so before running it a lot of preparation is needed and the caller must make sure that several conditions are met. I don't really see advantage of returning a value rather then BUG()ing. Also, CMA does not call this function at boot time. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/12] mm: alloc_contig_range() added
On Thu, 31 Mar 2011 22:28:42 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote: On Thu, 2011-03-31 at 18:26 +0200, Michal Nazarewicz wrote: On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: + ret = 0; + while (!PageBuddy(pfn_to_page(start (~0UL ret + if (WARN_ON(++ret = MAX_ORDER)) + return -EINVAL; On Thu, 31 Mar 2011 18:02:41 +0200, Dave Hansen wrote: Holy cow, that's dense. Is there really no more straightforward way to do that? Which part exactly is dense? What would be qualify as a more straightforward way? I'm still not 100% sure what it's trying to do. It looks like it attempts to check all of start's buddy pages. No. I'm going up through parents. This is because even though start falls in a free block (ie. one that page allocator tracks), the actual page that is in buddy system is larger then start and this loop looks for beginning of that page. int order; for (order = 0; order = MAX_ORDER; order++) { unsigned long buddy_pfn = find_buddy(start, order); struct page *buddy = pfn_to_page(buddy_pfn); if (PageBuddy(buddy) break; WARN(); return -EINVAL; } The WARN() and return would have to be outside of the loop and, as I described, instead of find_buddy() something like find_parent() would have to be used. I'm wondering also if you can share some code with __rmqueue(). Doubtful since start does not (have to) point to a page that is tracked by page allocator but a page inside such a page. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote: +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); On Thu, 31 Mar 2011 23:14:38 +0200, Dave Hansen wrote: We BUG_ON() in bootmem. Basically if we try to allocate an early-boot structure and fail, we're screwed. We can't keep running without an inode hash, or a mem_map[]. This looks like it's going to at least get partially used in drivers, at least from the examples. Are these kinds of things that, if the driver fails to load, that the system is useless and hosed? Or, is it something where we might limp along to figure out what went wrong before we reboot? Bug in the above place does not mean that we could not allocate memory. It means caller is broken. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added
On Fri, 01 Apr 2011 00:26:51 +0200, Dave Hansen d...@linux.vnet.ibm.com wrote: On Fri, 2011-04-01 at 00:18 +0200, Michal Nazarewicz wrote: On Thu, 31 Mar 2011 23:14:38 +0200, Dave Hansen wrote: We BUG_ON() in bootmem. Basically if we try to allocate an early-boot structure and fail, we're screwed. We can't keep running without an inode hash, or a mem_map[]. This looks like it's going to at least get partially used in drivers, at least from the examples. Are these kinds of things that, if the driver fails to load, that the system is useless and hosed? Or, is it something where we might limp along to figure out what went wrong before we reboot? Bug in the above place does not mean that we could not allocate memory. It means caller is broken. Could you explain that a bit? Is this a case where a device is mapped to a very *specific* range of physical memory and no where else? What are the reasons for not marking it off limits at boot? I also saw some bits of isolation and migration in those patches. Can't the migration fail? The function is called from alloc_contig_range() (see patch 05/12) which makes sure that the PFN is valid. Situation where there is not enough space is caught earlier in alloc_contig_range(). alloc_contig_freed_pages() must be given a valid PFN range such that all the pages in that range are free (as in are within the region tracked by page allocator) and of MIGRATETYPE_ISOLATE so that page allocator won't touch them. That's why invalid PFN is a bug in the caller and not an exception that has to be handled. Also, the function is not called during boot time. It is called while system is already running. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Fri, 18 Feb 2011 11:31:30 +0100, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Michal, On Friday 18 February 2011 00:09:51 Michal Nazarewicz wrote: On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: There's an additional problem with that: assume that streaming is happening, and a S_FMT changing the resolution was sent. There's no way to warrant that the very next frame will have the new resolution. So, a meta-data with the frame resolution (and format) would be needed. Em 17-02-2011 17:26, Guennadi Liakhovetski escreveu: Sorry, we are not going to allow format changes during a running capture. You have to stop streaming, set new formats (possibly switch to another queue) and restart streaming. What am I missing? On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: If you're stopping the stream, the current API will work as-is. If all of your concerns is about reserving a bigger buffer queue, I think that one of the reasons for the CMA allocator it for such usage. From: Guennadi Liakhovetski g.liakhovet...@gmx.de Not just bigger, say, with our preview / still-shot example, we would have one queue with a larger number of small buffers for drop-free preview, and a small number of larger buffers for still images. Ie. waste memory? As in you have both those queues allocated but only one is used at given time? It's a trade-off between memory and speed. Preallocating still image capture buffers will give you better snapshot performances, at the expense of memory. The basic problems we have here is that taking snapshots is slow with the current API if we need to stop capture, free buffers, change the format, allocate new buffers (and perform cache management operations) and restart the stream. To fix this we're considering a way to preallocate still image capture buffers, but I'm open to proposals for other ways to solve the issue :-) What I'm trying to say is that it would be best if one could configure the device in such a way that switching between modes would not require the device to free buffers (even though in user space they could be inaccessible). This is what I have in mind the usage would look like: 1. Open device Kernel creates some control structures, the usual stuff. 2. Initialize multi-format (specifying what formats user space will use). Kernel calculates amount of memory needed for the most demanding format and allocates it. 3. Set format (restricted to one of formats specified in step 2) Kernel has memory already allocated, so it only needs to split it to buffers needed in given format. 4. Allocate buffers. Kernel allocates memory needed for most demanding format (calculated in step 2). Once memory is allocated, splits it to buffers needed in given format. 5. Do the loop... queue, dequeue, all the usual stuff. Kernel instructs device to handle buffers, the usual stuff. 6. Free buffers. Kernel space destroys the buffers needed for given format but DOES NOT free memory. 7. If not done, go to step 3. 8. Finish multi-format mode. Kernel actually frees the memory. 9. Close the device. A V4L device driver could just ignore step 2 and 7 and work in the less optimal mode. If I understand V4L2 correctly, the API does not allow for step 2 and 8. In theory, they could be merged with step 1 and 9 respectively, I don't know id that feasible though. -- Best regards, _ _ .o. 
| Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
Em 18-02-2011 08:31, Laurent Pinchart escreveu: It's a trade-off between memory and speed. Preallocating still image capture buffers will give you better snapshot performances, at the expense of memory. The basic problems we have here is that taking snapshots is slow with the current API if we need to stop capture, free buffers, change the format, allocate new buffers (and perform cache management operations) and restart the stream. To fix this we're considering a way to preallocate still image capture buffers, but I'm open to proposals for other ways to solve the issue :-) On Fri, 18 Feb 2011 13:33:12 +0100, Mauro Carvalho Chehab wrote: From the above operations, considering that CMA is used to reserve a non-shared memory with enough space for the new buffer size/qtd, I don't think that the most expensive operation would be to realloc the memory. The logic to stop/start streaming seems to be the most consuming one, as driver will need to wait for the current I/O operation to complete, and this can take hundreds of milisseconds (the duration of one frame). How much time would CMA need to free and re-allocate the buffers for, let's say, something in the range of 1-10 MB, on a pre-allocated, non shared memory space? Like I said, if memory is shared with page allocator this can potentially take a lot of time. If device driver could preallocate some space and then spit it into buffers when needed, it could be a better solution then to free chunk and allocate a new one. CMA was not designed with very low latency in mind. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Fri, 18 Feb 2011 13:57:25 +0100, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Michal, On Friday 18 February 2011 12:37:24 Michal Nazarewicz wrote: [snip] What I'm trying to say is that it would be best if one could configure the device in such a way that switching between modes would not require the device to free buffers (even though in user space they could be inaccessible). This is what I have in mind the usage would look like: 1. Open device Kernel creates some control structures, the usual stuff. 2. Initialize multi-format (specifying what formats user space will use). Kernel calculates amount of memory needed for the most demanding format and allocates it. Don't forget that applications can also use USERPTR. We need a low-latency solution for that as well. That would probably work best if user provided one big buffer. Again, I don't know how this maps to V4L API. 3. Set format (restricted to one of formats specified in step 2) Kernel has memory already allocated, so it only needs to split it to buffers needed in given format. 4. Allocate buffers. Kernel allocates memory needed for most demanding format (calculated in step 2). Once memory is allocated, splits it to buffers needed in given format. 5. Do the loop... queue, dequeue, all the usual stuff. Kernel instructs device to handle buffers, the usual stuff. When buffers are queued cache needs to be cleaned. This is an expensive operation, and we need to be able to pre-queue (or at least pre-clean) buffers. Cache operations are always needed, aren't they? Whatever you do, you will always have to handle cache coherency (in one way or another) so there's nothing we can do about it, or is there? 6. Free buffers. Kernel space destroys the buffers needed for given format but DOES NOT free memory. 7. If not done, go to step 3. 8. Finish multi-format mode. Kernel actually frees the memory. 9. Close the device. A V4L device driver could just ignore step 2 and 7 and work in the less optimal mode. If I understand V4L2 correctly, the API does not allow for step 2 and 8. In theory, they could be merged with step 1 and 9 respectively, I don't know id that feasible though. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Friday 18 February 2011 14:19:44 Michal Nazarewicz wrote: Cache operations are always needed, aren't they? Whatever you do, you will always have to handle cache coherency (in one way or another) so there's nothing we can do about it, or is there? On Fri, 18 Feb 2011 14:21:53 +0100, Laurent Pinchart wrote: To achieve low latency still image capture, you need to minimize the time spent reconfiguring the device from viewfinder to still capture. Cache cleaning is always needed, but you can prequeue buffers you can clean the cache in advance, avoiding an extra delay when the user presses the still image capture button. If there is enough time to perform those operation while preview is shown (ie. several frames pare second), why would there not be enough time to perform those operations for still image capture? If I understand you correctly, what you are describing is a situation where one has set of buffers for preview and a buffer for still image laying around waiting to be used, right? Such scheme will of course work, but I'm just suggesting to think of a scheme where the unused buffer for still image is reused for preview frames when preview is shown. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] frame-size switching: preview / single-shot use-case
On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: There's an additional problem with that: assume that streaming is happening, and a S_FMT changing the resolution was sent. There's no way to warrant that the very next frame will have the new resolution. So, a meta-data with the frame resolution (and format) would be needed. Em 17-02-2011 17:26, Guennadi Liakhovetski escreveu: Sorry, we are not going to allow format changes during a running capture. You have to stop streaming, set new formats (possibly switch to another queue) and restart streaming. What am I missing? On Thu, 17 Feb 2011, Mauro Carvalho Chehab wrote: If you're stopping the stream, the current API will work as-is. If all of your concerns is about reserving a bigger buffer queue, I think that one of the reasons for the CMA allocator it for such usage. From: Guennadi Liakhovetski g.liakhovet...@gmx.de Not just bigger, say, with our preview / still-shot example, we would have one queue with a larger number of small buffers for drop-free preview, and a small number of larger buffers for still images. Ie. waste memory? As in you have both those queues allocated but only one is used at given time? Currently you would have to allocate a large number of large buffers, which would waste memory. Or you would have to reallocate the queue, losing time. AFAICS, CMA doesn't manage our memory for us. It only provides an API to reserve memory for various uses with various restrictions (alignment, etc.) and use different allocators to obtain that memory. I'm not sure if I understand you here. CMA has some API for reserving memory at boot time but it sure does manage this reserved memory, ie. when system is running you can allocate chunks of memory from this reserved block. Also note, that latest CMA uses only one allocator. So, are you suggesting, that with that in place, we would first allocate the preview queue from this memory, then free it, when switching to snapshooting, allocate our large-buffer queue from the _same_ memory, capture images, free and allocate preview queue again? Would that be fast enough? If CMA is considered, the most important thing to note is that CMA may share memory with page allocator (so that other parts of the system can use it if CMA-compatible devices are not using it). When CMA allocates memory chunk it may potentially need to migrate memory pages which may take so time (there is room for improvement, but still). Sharing can be disabled in which case allocation should be quite fast (the last CMA patchset uses a first-fit bitmap-based gen_allocator API but O(log n) best-fit algorithm can easily used instead). To sum things up, if sharing is disabled, CMA should be able to fulfil your requirements, however it may be undesirable as it wastes space. If sharing is enabled, on the other hand, the delay may potentially be noticeable. In fact, it would be kind of smart to reuse the same memory for both queues, but if we could do it without re-allocations?... What I would do is allocate a few big buffers and when needed divide them into smaller chunks (or even allocate one big block and later divide it in whatever way needed). I'm not sure if such usage would map well to V4L2 API. This usage is, as a matter of fact, supported by CMA. You can allocate a big block and then run cma_create() on it to create a new CMA context. Using this context you can allocate a lot of small blocks, then free them all, to finally allocate few big blocks. Again, I'm not sure how it maps to V4L2 API. 
If you can change formats while retaining V4L device instance's state, this should be doable. -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz(o o) ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv8 07/12] mm: cma: Contiguous Memory Allocator added
On Wed, Dec 15, 2010 at 09:34:27PM +0100, Michal Nazarewicz wrote: +unsigned long cma_reserve(unsigned long start, unsigned long size, + unsigned long alignment) +{ + pr_debug(%s(%p+%p/%p)\n, __func__, (void *)start, (void *)size, +(void *)alignment); + + /* Sanity checks */ + if (!size || (alignment (alignment - 1))) + return (unsigned long)-EINVAL; + + /* Sanitise input arguments */ + start = PAGE_ALIGN(start); + size = PAGE_ALIGN(size); + if (alignment PAGE_SIZE) + alignment = PAGE_SIZE; + + /* Reserve memory */ + if (start) { + if (memblock_is_region_reserved(start, size) || + memblock_reserve(start, size) 0) + return (unsigned long)-EBUSY; + } else { + /* +* Use __memblock_alloc_base() since +* memblock_alloc_base() panic()s. +*/ + u64 addr = __memblock_alloc_base(size, alignment, 0); + if (!addr) { + return (unsigned long)-ENOMEM; + } else if (addr + size ~(unsigned long)0) { + memblock_free(addr, size); + return (unsigned long)-EOVERFLOW; + } else { + start = addr; + } + } + On Wed, 02 Feb 2011 13:43:33 +0100, Ankita Garg ank...@in.ibm.com wrote: Reserving the areas of memory belonging to CMA using memblock_reserve, would preclude that range from the zones, due to which it would not be available for buddy allocations right ? Correct. CMA however, injects allocated pageblocks to buddy so they end up in buddy with migratetype set to MIGRATE_CMA. + return start; +} -- Best regards, _ _ .o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz (o o) ooo +-email/jid: mnazarew...@google.comooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv8 00/12] Contiguous Memory Allocator
Russell King - ARM Linux li...@arm.linux.org.uk writes: Has anyone addressed my issue with it that this is wide-open for abuse by allocating large chunks of memory, and then remapping them in some way with different attributes, thereby violating the ARM architecture specification? In other words, do we _actually_ have a use for this which doesn't involve doing something like allocating 32MB of memory from it, remapping it so that it's DMA coherent, and then performing DMA on the resulting buffer? Huge pages. Also, don't treat it as coherent memory and just flush/clear/invalidate cache before and after each DMA transaction. I never understood what's wrong with that approach. -- Best regards, _ _ .o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz (o o) ooo +--mina86-tlen.pl--jid:mina86-jabber.org--ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv8 00/12] Contiguous Memory Allocator
On Thu, Dec 23, 2010 at 03:04:07PM +0100, Tomasz Fujak wrote: In other words, should we take your response as yet another NAK? Or would you try harder and at least point us to some direction that would not doom the effort from the very beginning. On Thu, Dec 23, 2010 at 4:16 PM, Russell King - ARM Linux li...@arm.linux.org.uk wrote: What the fsck do you think I've been doing? This is NOT THE FIRST time I've raised this issue. I gave up raising it after the first couple of attempts because I wasn't being listened to. You say about _me_ not being very helpful. How about the CMA proponents start taking the issue I've raised seriously, and try to work out how to solve it? And how about blaming them for the months of wasted time on this issue _because_ _they_ have chosen to ignore it? Felipe Contreras felipe.contre...@gmail.com writes: I've also raised the issue for ARM. However, I don't see what is the big problem. A generic solution (that I think I already proposed) would be to reserve a chunk of memory for the CMA that can be removed from the normally mapped kernel memory through memblock at boot time. The size of this memory region would be configurable through kconfig. Then, the CMA would have a dma flag or something, Having exactly that usage in mind, in v8 I've added notion of private CMA contexts which can be used for DMA coherent RAM as well as memory mapped devices. and take chunks out of it until there's no more, and then return errors. That would work for ARM. -- Best regards, _ _ .o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal mina86 Nazarewicz (o o) ooo +--mina86-tlen.pl--jid:mina86-jabber.org--ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv8 01/12] mm: migrate.c: fix compilation error
GCC complained about update_mmu_cache() not being defined in migrate.c. Including asm/tlbflush.h seems to solve the problem. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com --- mm/migrate.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index fe5a3c6..6ae8a66 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -35,6 +35,8 @@ #include linux/hugetlb.h #include linux/gfp.h +#include asm/tlbflush.h + #include internal.h #define lru_to_page(_head) (list_entry((_head)-prev, struct page, lru)) -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv8 02/12] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
From: Michal Nazarewicz min...@mina86.com This commit adds a bitmap_find_next_zero_area_off() function which works like bitmap_find_next_zero_area() function expect it allows an offset to be specified when alignment is checked. This lets caller request a bit such that its number plus the offset is aligned according to the mask. Signed-off-by: Michal Nazarewicz min...@mina86.com --- include/linux/bitmap.h | 24 +++- lib/bitmap.c | 22 -- 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index daf8c48..c0528d1 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -45,6 +45,7 @@ * bitmap_set(dst, pos, nbits) Set specified bit area * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area + * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above * bitmap_shift_right(dst, src, n, nbits) *dst = *src n * bitmap_shift_left(dst, src, n, nbits) *dst = *src n * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) @@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits); extern void bitmap_set(unsigned long *map, int i, int len); extern void bitmap_clear(unsigned long *map, int start, int nr); -extern unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask); + +extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask, + unsigned long align_offset); + +static inline unsigned long +bitmap_find_next_zero_area(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask) +{ + return bitmap_find_next_zero_area_off(map, size, start, nr, + align_mask, 0); +} extern int bitmap_scnprintf(char *buf, unsigned int len, const unsigned long *src, int nbits); diff --git a/lib/bitmap.c b/lib/bitmap.c index 741fae9..8e75a6f 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr) } EXPORT_SYMBOL(bitmap_clear); -/* +/** * bitmap_find_next_zero_area - find a contiguous aligned zero area * @map: The address to base the search on * @size: The bitmap size in bits * @start: The bitnumber to start searching at * @nr: The number of zeroed bits we're looking for * @align_mask: Alignment mask for zero area + * @align_offset: Alignment offset for zero area. * * The @align_mask should be one less than a power of 2; the effect is that - * the bit offset of all zero areas this function finds is multiples of that - * power of 2. A @align_mask of 0 means no alignment is required. + * the bit offset of all zero areas this function finds plus @align_offset + * is multiple of that power of 2. 
*/ -unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask) +unsigned long bitmap_find_next_zero_area_off(unsigned long *map, +unsigned long size, +unsigned long start, +unsigned int nr, +unsigned long align_mask, +unsigned long align_offset) { unsigned long index, end, i; again: index = find_next_zero_bit(map, size, start); /* Align allocation */ - index = __ALIGN_MASK(index, align_mask); + index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset; end = index + nr; if (end size) @@ -350,7 +352,7 @@ again: } return index; } -EXPORT_SYMBOL(bitmap_find_next_zero_area); +EXPORT_SYMBOL(bitmap_find_next_zero_area_off); /* * Bitmap printing parsing functions: first version by Bill Irwin, -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body
[PATCHv8 05/12] mm: alloc_contig_freed_pages() added
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com This commit introduces alloc_contig_freed_pages() function which allocates (ie. removes from buddy system) free pages in range. Caller has to guarantee that all pages in range are in buddy system. Along with this function, a free_contig_pages() function is provided which frees all (or a subset of) pages allocated with alloc_contig_free_pages(). Michal Nazarewicz has modified the function to make it easier to allocate not MAX_ORDER_NR_PAGES aligned pages by making it return pfn of one-past-the-last allocated page. Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com --- include/linux/page-isolation.h |3 ++ mm/page_alloc.c| 44 2 files changed, 47 insertions(+), 0 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 58cdbac..f1417ed 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); */ extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); +extern unsigned long alloc_contig_freed_pages(unsigned long start, + unsigned long end, gfp_t flag); +extern void free_contig_pages(struct page *page, int nr_pages); /* * For migration. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 826ba69..be240a3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5425,6 +5425,50 @@ out: spin_unlock_irqrestore(zone-lock, flags); } +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); + zone = page_zone(pfn_to_page(start)); + + spin_lock_irq(zone-lock); + + page = pfn_to_page(pfn); + for (;;) { + VM_BUG_ON(page_count(page) || !PageBuddy(page)); + list_del(page-lru); + order = page_order(page); + zone-free_area[order].nr_free--; + rmv_page_order(page); + __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL order)); + pfn += 1 order; + if (pfn = end) + break; + VM_BUG_ON(!pfn_valid(pfn)); + page += 1 order; + } + + spin_unlock_irq(zone-lock); + + /* After this, pages in the range can be freed one be one */ + page = pfn_to_page(start); + for (count = pfn - start; count; --count, ++page) + prep_new_page(page, 0, flag); + + return pfn; +} + +void free_contig_pages(struct page *page, int nr_pages) +{ + for (; nr_pages; --nr_pages, ++page) + __free_page(page); +} + #ifdef CONFIG_MEMORY_HOTREMOVE /* * All pages in the range must be isolated before calling this. -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv8 04/12] mm: move some functions from memory_hotplug.c to page_isolation.c
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com Memory hotplug is a logic for making pages unused in the specified range of pfn. So, some of core logics can be used for other purpose as allocating a very large contigous memory block. This patch moves some functions from mm/memory_hotplug.c to mm/page_isolation.c. This helps adding a function for large-alloc in page_isolation.c with memory-unplug technique. Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [mina86: reworded commit message] Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com --- include/linux/page-isolation.h |7 +++ mm/memory_hotplug.c| 108 -- mm/page_isolation.c| 111 3 files changed, 118 insertions(+), 108 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 051c1b1..58cdbac 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); +/* + * For migration. + */ + +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn); +unsigned long scan_lru_pages(unsigned long start, unsigned long end); +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn); #endif diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 2c6523a..2b18cb5 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -634,114 +634,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) } /* - * Confirm all pages in a range [start, end) is belongs to the same zone. - */ -static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct zone *zone = NULL; - struct page *page; - int i; - for (pfn = start_pfn; -pfn end_pfn; -pfn += MAX_ORDER_NR_PAGES) { - i = 0; - /* This is just a CONFIG_HOLES_IN_ZONE check.*/ - while ((i MAX_ORDER_NR_PAGES) !pfn_valid_within(pfn + i)) - i++; - if (i == MAX_ORDER_NR_PAGES) - continue; - page = pfn_to_page(pfn + i); - if (zone page_zone(page) != zone) - return 0; - zone = page_zone(page); - } - return 1; -} - -/* - * Scanning pfn is much easier than scanning lru list. - * Scan pfn from start to end and Find LRU page. - */ -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) -{ - unsigned long pfn; - struct page *page; - for (pfn = start; pfn end; pfn++) { - if (pfn_valid(pfn)) { - page = pfn_to_page(pfn); - if (PageLRU(page)) - return pfn; - } - } - return 0; -} - -static struct page * -hotremove_migrate_alloc(struct page *page, unsigned long private, int **x) -{ - /* This should be improved!! */ - return alloc_page(GFP_HIGHUSER_MOVABLE); -} - -#define NR_OFFLINE_AT_ONCE_PAGES (256) -static int -do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct page *page; - int move_pages = NR_OFFLINE_AT_ONCE_PAGES; - int not_managed = 0; - int ret = 0; - LIST_HEAD(source); - - for (pfn = start_pfn; pfn end_pfn move_pages 0; pfn++) { - if (!pfn_valid(pfn)) - continue; - page = pfn_to_page(pfn); - if (!page_count(page)) - continue; - /* -* We can skip free pages. And we can only deal with pages on -* LRU. 
-*/ - ret = isolate_lru_page(page); - if (!ret) { /* Success */ - list_add_tail(page-lru, source); - move_pages--; - inc_zone_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - - } else { -#ifdef CONFIG_DEBUG_VM - printk(KERN_ALERT removing pfn %lx from LRU failed\n, - pfn); - dump_page(page); -#endif - /* Becasue we don't have big zone-lock. we should - check this again here. */ - if (page_count(page)) { - not_managed++; - ret = -EBUSY; - break; - } - } - } - if (!list_empty(source)) { - if (not_managed
[PATCHv8 12/12] ARM: cma: Added CMA to Aquila, Goni and c210 universal boards
This commit adds CMA memory reservation code to Aquila, Goni and c210 universal boards. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- arch/arm/mach-s5pv210/Kconfig |2 + arch/arm/mach-s5pv210/mach-aquila.c |2 + arch/arm/mach-s5pv210/mach-goni.c |2 + arch/arm/mach-s5pv310/Kconfig |1 + arch/arm/mach-s5pv310/mach-universal_c210.c |2 + arch/arm/plat-s5p/Makefile |2 + arch/arm/plat-s5p/cma-stub.c| 49 +++ arch/arm/plat-s5p/include/plat/cma-stub.h | 21 +++ 8 files changed, 81 insertions(+), 0 deletions(-) create mode 100644 arch/arm/plat-s5p/cma-stub.c create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h diff --git a/arch/arm/mach-s5pv210/Kconfig b/arch/arm/mach-s5pv210/Kconfig index 53aabef..b395a16 100644 --- a/arch/arm/mach-s5pv210/Kconfig +++ b/arch/arm/mach-s5pv210/Kconfig @@ -68,6 +68,7 @@ config MACH_AQUILA select S5P_DEV_ONENAND select S5PV210_SETUP_FB_24BPP select S5PV210_SETUP_SDHCI + select CMA_DEVICE_POSSIBLE help Machine support for the Samsung Aquila target based on S5PC110 SoC @@ -92,6 +93,7 @@ config MACH_GONI select S5PV210_SETUP_I2C2 select S5PV210_SETUP_KEYPAD select S5PV210_SETUP_SDHCI + select CMA_DEVICE_POSSIBLE help Machine support for Samsung GONI board S5PC110(MCP) is one of package option of S5PV210 diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c index 28677ca..8608a16 100644 --- a/arch/arm/mach-s5pv210/mach-aquila.c +++ b/arch/arm/mach-s5pv210/mach-aquila.c @@ -39,6 +39,7 @@ #include plat/fb.h #include plat/fimc-core.h #include plat/sdhci.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define AQUILA_UCON_DEFAULT(S3C2410_UCON_TXILEVEL |\ @@ -690,4 +691,5 @@ MACHINE_START(AQUILA, Aquila) .map_io = aquila_map_io, .init_machine = aquila_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c index b1dcf96..b1bf079 100644 --- a/arch/arm/mach-s5pv210/mach-goni.c +++ b/arch/arm/mach-s5pv210/mach-goni.c @@ -45,6 +45,7 @@ #include plat/keypad.h #include plat/sdhci.h #include plat/clock.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define GONI_UCON_DEFAULT (S3C2410_UCON_TXILEVEL |\ @@ -865,4 +866,5 @@ MACHINE_START(GONI, GONI) .map_io = goni_map_io, .init_machine = goni_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv310/Kconfig b/arch/arm/mach-s5pv310/Kconfig index d64efe0..ae4e0da 100644 --- a/arch/arm/mach-s5pv310/Kconfig +++ b/arch/arm/mach-s5pv310/Kconfig @@ -85,6 +85,7 @@ config MACH_UNIVERSAL_C210 select S5P_DEV_ONENAND select S3C_DEV_I2C1 select S5PV310_SETUP_I2C1 + select CMA_DEVICE_POSSIBLE help Machine support for Samsung Mobile Universal S5PC210 Reference Board. 
S5PC210(MCP) is one of package option of S5PV310 diff --git a/arch/arm/mach-s5pv310/mach-universal_c210.c b/arch/arm/mach-s5pv310/mach-universal_c210.c index 16d8fc0..d65703a 100644 --- a/arch/arm/mach-s5pv310/mach-universal_c210.c +++ b/arch/arm/mach-s5pv310/mach-universal_c210.c @@ -21,6 +21,7 @@ #include plat/s5pv310.h #include plat/cpu.h #include plat/devs.h +#include plat/cma-stub.h #include mach/map.h @@ -152,6 +153,7 @@ MACHINE_START(UNIVERSAL_C210, UNIVERSAL_C210) .boot_params= S5P_PA_SDRAM + 0x100, .init_irq = s5pv310_init_irq, .map_io = universal_map_io, + .reserve= cma_mach_reserve, .init_machine = universal_machine_init, .timer = s5pv310_timer, MACHINE_END diff --git a/arch/arm/plat-s5p/Makefile b/arch/arm/plat-s5p/Makefile index de65238..6fdb6ce 100644 --- a/arch/arm/plat-s5p/Makefile +++ b/arch/arm/plat-s5p/Makefile @@ -28,3 +28,5 @@ obj-$(CONFIG_S5P_DEV_FIMC0) += dev-fimc0.o obj-$(CONFIG_S5P_DEV_FIMC1)+= dev-fimc1.o obj-$(CONFIG_S5P_DEV_FIMC2)+= dev-fimc2.o obj-$(CONFIG_S5P_DEV_ONENAND) += dev-onenand.o + +obj-$(CONFIG_CMA) += cma-stub.o diff --git a/arch/arm/plat-s5p/cma-stub.c b/arch/arm/plat-s5p/cma-stub.c new file mode 100644 index 000..c175ba8 --- /dev/null +++ b/arch/arm/plat-s5p/cma-stub.c @@ -0,0 +1,49 @@ +/* + * This file is just a quick and dirty hack to get CMA testing device + * working. The cma_mach_reserve() should be called as mach's reserve + * callback. CMA testing device will use cma_ctx for allocations
[PATCHv8 07/12] mm: cma: Contiguous Memory Allocator added
The Contiguous Memory Allocator is a set of functions that lets one initialise a region of memory from which contiguous memory chunks can then be allocated. CMA allows for creation of private and non-private contexts. The former is reserved for CMA and no other kernel subsystem can use it. The latter allows for movable pages to be allocated within CMA's managed memory so that it can be used for page cache when CMA devices do not use it. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/cma.h | 219 ++ mm/Kconfig | 22 mm/Makefile |1 + mm/cma.c| 328 +++ 4 files changed, 570 insertions(+), 0 deletions(-) create mode 100644 include/linux/cma.h create mode 100644 mm/cma.c diff --git a/include/linux/cma.h b/include/linux/cma.h new file mode 100644 index 000..e9575fd --- /dev/null +++ b/include/linux/cma.h @@ -0,0 +1,219 @@ +#ifndef __LINUX_CMA_H +#define __LINUX_CMA_H + +/* + * Contiguous Memory Allocator + * Copyright (c) 2010 by Samsung Electronics. + * Written by Michal Nazarewicz (m.nazarew...@samsung.com) + */ + +/* + * Contiguous Memory Allocator + * + * The Contiguous Memory Allocator (CMA) makes it possible for + * device drivers to allocate big contiguous chunks of memory after + * the system has booted. + * + * It requires some machine- and/or platform-specific initialisation + * code which prepares memory ranges to be used with CMA and later, + * device drivers can allocate memory from those ranges. + * + * Why is it needed? + * + * Various devices on embedded systems have no scatter-gather and/or + * IO map support and require contiguous blocks of memory to + * operate. They include devices such as cameras, hardware video + * coders, etc. + * + * Such devices often require big memory buffers (a full HD frame + * is, for instance, more than 2 megapixels, i.e. more than 6 + * MB of memory), which makes mechanisms such as kmalloc() or + * alloc_page() ineffective. + * + * At the same time, a solution where a big memory region is + * reserved for a device is suboptimal since often more memory is + * reserved than strictly required and, moreover, the memory is + * inaccessible to the page allocator even if device drivers don't use it. + * + * CMA tries to solve this issue by operating on memory regions + * where only movable pages can be allocated from. This way, the kernel + * can use the memory for pagecache and when a device driver requests + * it, allocated pages can be migrated. + * + * Driver usage + * + * For a device driver to use CMA it needs to have a pointer to a CMA + * context represented by a struct cma (which is an opaque data + * type). + * + * Once such a pointer is obtained, a device driver may allocate a + * contiguous memory chunk using the following function: + * + * cm_alloc() + * + * This function returns a pointer to struct cm (another opaque data + * type) which represents a contiguous memory chunk. This pointer + * may be used with the following functions: + * + * cm_free()-- frees allocated contiguous memory + * cm_pin() -- pins memory + * cm_unpin() -- unpins memory + * cm_vmap()-- maps memory in kernel space + * cm_vunmap() -- unmaps memory from kernel space + * + * See the respective functions for more information. + * + * Platform/machine integration + * + * For device drivers to be able to use CMA, platform or machine + * initialisation code must create a CMA context and pass it to + * device drivers.
The latter may be done by a global variable or + * a platform/machine specific function. For the former CMA + * provides the following functions: + * + * cma_reserve() + * cma_create() + * + * The cma_reserve() function must be called when memblock is still + * operational and reserving memory with it is still possible. On + * the ARM platform the machine's reserve callback is a perfect place to + * call it. + * + * The last function creates a CMA context on a range of previously + * initialised memory addresses. Because it uses kmalloc() it needs + * to be called after SLAB is initialised. + */ + +/* Kernel level API */ + +#if defined __KERNEL__ && defined CONFIG_CMA + +/* CMA context */ +struct cma; +/* Contiguous Memory chunk */ +struct cm; + +/** + * cma_reserve() - reserves memory. + * @start: hint for the start address of the memory range in bytes; if unsure, + * pass zero. + * @size: size of the memory to reserve in bytes. + * @alignment: desired alignment in bytes (must be power of two or zero). + * + * It will use memblock to allocate memory. @start and @size will be + * aligned to PAGE_SIZE. + * + * Returns the reserved area's physical address or a value that yields + * true when checked with IS_ERR_VALUE().
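The driver-side flow documented above is short enough to sketch end to end. The snippet below is only an illustration: cm_alloc()'s argument list and error convention are not visible in this posting, so the (ctx, size, alignment) signature, the ERR_PTR() convention, and cm_pin() returning a physical address are assumptions drawn from the comment block, not from the patch body.

#include <linux/cma.h>
#include <linux/err.h>

/* Hypothetical driver-side use of the API described above. */
static int example_use_cma(struct cma *ctx)
{
	struct cm *cm;
	unsigned long phys;
	void *virt;

	/* assumed signature: cm_alloc(context, size, alignment) */
	cm = cm_alloc(ctx, 2 << 20, 0);		/* 2 MiB chunk */
	if (IS_ERR(cm))				/* error convention assumed */
		return PTR_ERR(cm);

	phys = cm_pin(cm);	/* assumed to return the physical address */
	virt = cm_vmap(cm);	/* optional kernel-space mapping */

	/* ... program the device with phys, fill the buffer via virt ... */

	cm_vunmap(cm);
	cm_unpin(cm);		/* chunk may be migrated again */
	cm_free(cm);
	return 0;
}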
[PATCHv8 10/12] mm: MIGRATE_CMA support added to CMA
This commit adds MIGRATE_CMA migratetype support to the CMA. The advantage is that an (almost) arbitrary memory range can be marked as MIGRATE_CMA which may not be the case with ZONE_MOVABLE. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/cma.h | 58 --- mm/cma.c| 161 +-- 2 files changed, 194 insertions(+), 25 deletions(-) diff --git a/include/linux/cma.h b/include/linux/cma.h index e9575fd..8952531 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -71,9 +71,14 @@ * a platform/machine specific function. For the former CMA * provides the following functions: * + * cma_init_migratetype() * cma_reserve() * cma_create() * + * The first one initialises a portion of reserved memory so that it + * can be used with CMA. The second first tries to reserve memory + * (using memblock) and then initialise it. + * * The cma_reserve() function must be called when memblock is still * operational and reserving memory with it is still possible. On * ARM platform the reserve machine callback is a perfect place to @@ -93,21 +98,56 @@ struct cma; /* Contiguous Memory chunk */ struct cm; +#ifdef CONFIG_MIGRATE_CMA + +/** + * cma_init_migratetype() - initialises range of physical memory to be used + * with CMA context. + * @start: start address of the memory range in bytes. + * @size: size of the memory range in bytes. + * + * The range must be MAX_ORDER_NR_PAGES aligned and it must have been + * already reserved (eg. with memblock). + * + * The actual initialisation is deferred until subsys initcalls are + * evaluated (unless this has already happened). + * + * Returns zero on success or negative error. + */ +int cma_init_migratetype(unsigned long start, unsigned long end); + +#else + +static inline int cma_init_migratetype(unsigned long start, unsigned long end) +{ + (void)start; (void)end; + return -EOPNOTSUPP; +} + +#endif + /** * cma_reserve() - reserves memory. * @start: start address of the memory range in bytes hint; if unsure * pass zero. * @size: size of the memory to reserve in bytes. * @alignment: desired alignment in bytes (must be power of two or zero). + * @init_migratetype: whether to initialise pageblocks. + * + * It will use memblock to allocate memory. If @init_migratetype is + * true, the function will also call cma_init_migratetype() on + * reserved region so that a non-private CMA context can be created on + * given range. * - * It will use memblock to allocate memory. @start and @size will be - * aligned to PAGE_SIZE. + * @start and @size will be aligned to PAGE_SIZE if @init_migratetype + * is false or to (MAX_ORDER_NR_PAGES PAGE_SHIFT) if + * @init_migratetype is true. * * Returns reserved's area physical address or value that yields true * when checked with IS_ERR_VALUE(). */ unsigned long cma_reserve(unsigned long start, unsigned long size, - unsigned long alignment); + unsigned long alignment, _Bool init_migratetype); /** * cma_create() - creates a CMA context. @@ -118,12 +158,14 @@ unsigned long cma_reserve(unsigned long start, unsigned long size, * * The range must be page aligned. Different contexts cannot overlap. * - * Unless @private is true the memory range must lay in ZONE_MOVABLE. - * If @private is true no underlaying memory checking is done and - * during allocation no pages migration will be performed - it is - * assumed that the memory is reserved and only CMA manages it. 
+ * Unless @private is true the memory range must either lie in + * ZONE_MOVABLE or must have been initialised with the + * cma_init_migratetype() function. If @private is true no + * underlying memory checking is done and during allocation no page + * migration will be performed - it is assumed that the memory is + * reserved and only CMA manages it. * - * @start and @size must be page and @min_alignment alignment. + * @start and @size must be page and @min_alignment aligned. + * @min_alignment specifies the minimal alignment that the user will be + * able to request through the cm_alloc() function. In most cases one + * will probably pass zero as @min_alignment but if the CMA context diff --git a/mm/cma.c b/mm/cma.c index d82361b..4017dee 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -57,21 +57,130 @@ static unsigned long phys_to_pfn(phys_addr_t phys) /* Initialise CMA */ +#ifdef CONFIG_MIGRATE_CMA + +static struct cma_grabbed { + unsigned long start; + unsigned long size; +} cma_grabbed[8] __initdata; +static unsigned cma_grabbed_count __initdata; + +#ifdef CONFIG_DEBUG_VM + +static int __cma_give_back(unsigned long start, unsigned long size) +{ + unsigned long pfn = phys_to_pfn(start
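The platform-side sequence these comments describe condenses to a stub like the one below (the cma-stub.c posted with the v7 series later in this archive does essentially this). The four-argument cma_reserve() form is taken from this patch; cma_create()'s full argument list is only partially visible here, so the (start, size, min_alignment, private) order is an assumption.

#include <linux/cma.h>
#include <linux/err.h>
#include <linux/init.h>

#define EXAMPLE_CMA_SIZE	(16UL << 20)	/* 16 MiB */

static struct cma *example_ctx;
static unsigned long example_cma_start __initdata;

/* Called from the machine's .reserve callback, while memblock is live. */
void __init example_mach_reserve(void)
{
	/* init_migratetype = true: pageblocks become MIGRATE_CMA so a
	 * non-private context may later be created on this range. */
	unsigned long start = cma_reserve(0, EXAMPLE_CMA_SIZE, 0, true);

	if (!IS_ERR_VALUE(start))
		example_cma_start = start;
}

/* cma_create() needs kmalloc(), hence the initcall. */
static int __init example_cma_setup(void)
{
	struct cma *ctx;

	if (!example_cma_start)
		return -ENOMEM;
	/* argument order (start, size, min_alignment, private) assumed */
	ctx = cma_create(example_cma_start, EXAMPLE_CMA_SIZE, 0, false);
	if (IS_ERR(ctx))
		return PTR_ERR(ctx);
	example_ctx = ctx;
	return 0;
}
device_initcall(example_cma_setup);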
[PATCHv8 11/12] mm: cma: Test device and application added
This patch adds a cma misc device which lets user space use the CMA API. This device is meant for testing. A testing application is also provided. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/misc/Kconfig | 28 +++ drivers/misc/Makefile |1 + drivers/misc/cma-dev.c | 238 + include/linux/cma.h| 29 +++ tools/cma/cma-test.c | 457 5 files changed, 753 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/cma-dev.c create mode 100644 tools/cma/cma-test.c diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 1e1a4be..b90e36b 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -458,4 +458,32 @@ source "drivers/misc/cb710/Kconfig" source "drivers/misc/iwmc3200top/Kconfig" source "drivers/misc/ti-st/Kconfig" +# Select this for platforms/machines that implemented code making +# the CMA test device usable. +config CMA_DEVICE_POSSIBLE + bool + +config CMA_DEVICE + tristate "CMA test device (DEVELOPMENT)" + depends on CMA && CMA_DEVICE_POSSIBLE + help + The CMA misc device allows allocating contiguous memory areas + from user space. This is for testing of the CMA framework. + + If unsure, say n + endif # MISC_DEVICES diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 98009cc..f8eadd4 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o obj-$(CONFIG_PCH_PHUB) += pch_phub.o obj-y += ti-st/ obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o +obj-$(CONFIG_CMA_DEVICE) += cma-dev.o diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c new file mode 100644 index 000..6c36064 --- /dev/null +++ b/drivers/misc/cma-dev.c @@ -0,0 +1,238 @@ +/* + * Contiguous Memory Allocator userspace driver + * Copyright (c) 2010 by Samsung Electronics. + * Written by Michal Nazarewicz (m.nazarew...@samsung.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License or (at your option) any later version of the license.
+ */ + +#define pr_fmt(fmt) cma: fmt + +#ifdef CONFIG_CMA_DEBUG +# define DEBUG +#endif + +#include linux/errno.h /* Error numbers */ +#include linux/err.h /* IS_ERR_VALUE() */ +#include linux/fs.h /* struct file */ +#include linux/mm.h /* Memory stuff */ +#include linux/mman.h +#include linux/slab.h +#include linux/module.h /* Standard module stuff */ +#include linux/device.h /* struct device, dev_dbg() */ +#include linux/types.h /* Just to be safe ;) */ +#include linux/uaccess.h /* __copy_{to,from}_user */ +#include linux/miscdevice.h /* misc_register() and company */ + +#include linux/cma.h + +#include plat/cma-stub.h + +static int cma_file_open(struct inode *inode, struct file *file); +static int cma_file_release(struct inode *inode, struct file *file); +static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg); +static int cma_file_mmap(struct file *file, struct vm_area_struct *vma); + +static struct miscdevice cma_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = cma, + .fops = (const struct file_operations) { + .owner = THIS_MODULE, + .open = cma_file_open, + .release= cma_file_release, + .unlocked_ioctl = cma_file_ioctl, + .mmap = cma_file_mmap, + }, +}; +#define cma_dev (cma_miscdev.this_device) + +struct cma_private_data { + struct cm *cm; + unsigned long size; + unsigned long phys; +}; + +static int cma_file_open(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (!cma_ctx) + return -EOPNOTSUPP; + + prv = kzalloc(sizeof *prv, GFP_KERNEL); + if (!prv) + return -ENOMEM; + + file-private_data = prv; + + return 0; +} + +static int cma_file_release(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv = file-private_data; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (prv-cm
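From user space the 238-line driver above reduces to an open/ioctl/mmap triple. The sketch below shows the expected calling sequence; the ioctl request macro is spelled IOCTL_CMA_ALLOC purely as a placeholder, since the include/linux/cma.h hunk defining it is truncated in this posting (only CMA_MAGIC and the magic/size fields of struct cma_alloc_request are visible in the driver code). The real client is tools/cma/cma-test.c from this patch.

/* Hypothetical user-space client of /dev/cma. */
#include <fcntl.h>
#include <linux/cma.h>		/* struct cma_alloc_request, CMA_MAGIC */
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	struct cma_alloc_request req = {
		.magic = CMA_MAGIC,
		.size  = 1 << 20,	/* 1 MiB */
	};
	void *buf;
	int fd = open("/dev/cma", O_RDWR);

	if (fd < 0)
		return 1;
	if (ioctl(fd, IOCTL_CMA_ALLOC, &req) < 0)	/* placeholder cmd name */
		return 1;
	/* the driver's mmap handler exposes the chunk allocated above */
	buf = mmap(NULL, req.size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		return 1;
	/* ... exercise the contiguous buffer ... */
	munmap(buf, req.size);
	close(fd);	/* the release path unpins and frees the chunk */
	return 0;
}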
[PATCHv8 08/12] mm: MIGRATE_CMA migration type added
The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks and (ii) the page allocator will never change the migration type of MIGRATE_CMA pageblocks. This guarantees that a page in a MIGRATE_CMA pageblock can always be migrated somewhere else (unless there's no memory left in the system). It is designed to be used with the Contiguous Memory Allocator (CMA) for allocating big chunks (eg. 10MiB) of physically contiguous memory. Once a driver requests contiguous memory, CMA will migrate pages from MIGRATE_CMA pageblocks. To minimise the number of migrations, the MIGRATE_CMA migration type is the last type tried when the page allocator falls back to migration types other than the requested one. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/mmzone.h | 43 mm/Kconfig | 14 mm/compaction.c| 10 + mm/internal.h |3 ++ mm/page_alloc.c| 87 +-- 5 files changed, 131 insertions(+), 26 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 39c24eb..cc798b1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -35,13 +35,37 @@ */ #define PAGE_ALLOC_COSTLY_ORDER 3 -#define MIGRATE_UNMOVABLE 0 -#define MIGRATE_RECLAIMABLE 1 -#define MIGRATE_MOVABLE 2 -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */ -#define MIGRATE_RESERVE 3 -#define MIGRATE_ISOLATE 4 /* can't allocate from here */ -#define MIGRATE_TYPES 5 +enum { + MIGRATE_UNMOVABLE, + MIGRATE_RECLAIMABLE, + MIGRATE_MOVABLE, + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ + MIGRATE_RESERVE = MIGRATE_PCPTYPES, +#ifdef CONFIG_MIGRATE_CMA + /* +* MIGRATE_CMA migration type is designed to mimic the way +* ZONE_MOVABLE works. Only movable pages can be allocated +* from MIGRATE_CMA pageblocks and the page allocator never +* implicitly changes the migration type of a MIGRATE_CMA pageblock. +* +* The way to use it is to change the migratetype of a range of +* pageblocks to MIGRATE_CMA which can be done by the +* __free_pageblock_cma() function. What is important though +* is that a range of pageblocks must be aligned to +* MAX_ORDER_NR_PAGES should the biggest page be bigger than +* a single pageblock. +*/ + MIGRATE_CMA, +#endif + MIGRATE_ISOLATE,/* can't allocate from here */ + MIGRATE_TYPES +}; + +#ifdef CONFIG_MIGRATE_CMA +# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) +#else +# define is_migrate_cma(migratetype) false +#endif #define for_each_migratetype_order(order, type) \ for (order = 0; order < MAX_ORDER; order++) \ @@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page) return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end); } +static inline bool is_pageblock_cma(struct page *page) +{ + return is_migrate_cma(get_pageblock_migratetype(page)); +} + struct free_area { struct list_headfree_list[MIGRATE_TYPES]; unsigned long nr_free; diff --git a/mm/Kconfig b/mm/Kconfig index 2beab4d..32fb085 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -343,6 +343,20 @@ config CMA For more information see include/linux/cma.h. If unsure, say n. +config MIGRATE_CMA + bool "Use MIGRATE_CMA migratetype" + depends on CMA + default y + help + This enables the use of the MIGRATE_CMA migrate type in the CMA. + MIGRATE_CMA lets CMA work on almost arbitrary memory range and + not only inside ZONE_MOVABLE. + + This option can also be selected by code that uses MIGRATE_CMA + even if CMA is not present. + + If unsure, say y.
+ config CMA_DEBUG bool CMA debug messages (DEVELOPEMENT) depends on CMA diff --git a/mm/compaction.c b/mm/compaction.c index 4d709ee..c5e404b 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page) if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE) return false; + /* Keep MIGRATE_CMA alone as well. */ + /* +* XXX Revisit. We currently cannot let compaction touch CMA +* pages since compaction insists on changing their migration +* type to MIGRATE_MOVABLE (see split_free_page() called from +* isolate_freepages_block() above). +*/ + if (is_migrate_cma(migratetype)) + return false; + /* If the page is a large free page, then allow migration */ if (PageBuddy(page) page_order(page) = pageblock_order) return true; diff --git a/mm
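The compaction hunk above is the crux of the guarantee: nothing may silently remove the MIGRATE_CMA marking. Restated as a stand-alone predicate (the wrapper function is illustrative, not code from the patch):

#include <linux/mmzone.h>

/* Would handing out this pageblock's free pages as a migration
 * target be safe?  Mirrors the suitable_migration_target() logic
 * added above; the wrapper itself is an example. */
static bool example_suitable_target(struct page *page)
{
	int migratetype = get_pageblock_migratetype(page);

	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
		return false;

	/* split_free_page() would remark the block MIGRATE_MOVABLE and
	 * silently drop the CMA guarantee, so CMA blocks stay out. */
	if (is_migrate_cma(migratetype))
		return false;

	return true;
}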
[PATCHv8 09/12] mm: MIGRATE_CMA isolation functions added
This commit changes various functions that change pages and pageblocks migrate type between MIGRATE_ISOLATE and MIGRATE_MOVABLE in such a way as to allow to work with MIGRATE_CMA migrate type. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/page-isolation.h | 40 +++- mm/page_alloc.c| 19 --- mm/page_isolation.c| 15 --- 3 files changed, 47 insertions(+), 27 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index c5d1a7c..177b307 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -3,39 +3,53 @@ /* * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE. - * If specified range includes migrate types other than MOVABLE, + * If specified range includes migrate types other than MOVABLE or CMA, * this will fail with -EBUSY. * * For isolating all pages in the range finally, the caller have to * free all pages in the range. test_page_isolated() can be used for * test it. */ -extern int -start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype); + +static inline int +start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +} + +int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype); /* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. * target range is [start_pfn, end_pfn) */ -extern int -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +static inline int +undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +} /* - * test all pages in [start_pfn, end_pfn)are isolated or not. + * Test all pages in [start_pfn, end_pfn) are isolated or not. */ -extern int -test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); /* - * Internal funcs.Changes pageblock's migrate type. - * Please use make_pagetype_isolated()/make_pagetype_movable(). + * Internal functions. Changes pageblock's migrate type. 
*/ -extern int set_migratetype_isolate(struct page *page); -extern void unset_migratetype_isolate(struct page *page); +int set_migratetype_isolate(struct page *page); +void __unset_migratetype_isolate(struct page *page, unsigned migratetype); +static inline void unset_migratetype_isolate(struct page *page) +{ + __unset_migratetype_isolate(page, MIGRATE_MOVABLE); +} extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); extern int alloc_contig_range(unsigned long start, unsigned long end, - gfp_t flags); + gfp_t flags, unsigned migratetype); extern void free_contig_pages(struct page *page, int nr_pages); /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e706282..7f913d1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5460,7 +5460,7 @@ out: return ret; } -void unset_migratetype_isolate(struct page *page) +void __unset_migratetype_isolate(struct page *page, unsigned migratetype) { struct zone *zone; unsigned long flags; @@ -5468,8 +5468,8 @@ void unset_migratetype_isolate(struct page *page) spin_lock_irqsave(&zone->lock, flags); if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) goto out; - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - move_freepages_block(zone, page, MIGRATE_MOVABLE); + set_pageblock_migratetype(page, migratetype); + move_freepages_block(zone, page, migratetype); out: spin_unlock_irqrestore(&zone->lock, flags); } @@ -5574,6 +5574,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * @start: start PFN to allocate * @end: one-past-the-last PFN to allocate * @flags: flags passed to alloc_contig_freed_pages(). + * @migratetype: migratetype of the underlying pageblocks (either + * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks + * in range must have the same migratetype and it must + * be either of the two. * * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES * aligned, however it is the caller's responsibility to guarantee that we @@ -5585,7 +5589,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * need to be freed with free_contig_pages(). */ int
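Taken together, the renamed entry points compose as follows for a CMA-backed range. This is a sketch; alloc_contig_range() from patch 06/12, extended here with a migratetype argument, is the real caller and performs the migration step in the middle.

#include <linux/page-isolation.h>

/* Illustrative composition of the migratetype-aware isolation API.
 * start_pfn/end_pfn must cover whole pageblocks. */
static int example_isolate_cma(unsigned long start_pfn, unsigned long end_pfn)
{
	int ret;

	/* fails with -EBUSY unless blocks are MOVABLE or CMA */
	ret = __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
	if (ret)
		return ret;

	/* ... migrate everything out, drain pcp lists ... */

	if (test_pages_isolated(start_pfn, end_pfn))
		ret = -EBUSY;	/* something still holds a page */

	/* restore MIGRATE_CMA rather than unconditionally MOVABLE */
	__undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
	return ret;
}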
[PATCHv8 06/12] mm: alloc_contig_range() added
This commit adds the alloc_contig_range() function which tries to allocate a given range of pages. It tries to migrate all already allocated pages that fall in the range, thus freeing them. Once all pages in the range are freed they are removed from the buddy system and thus allocated for the caller to use. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/page-isolation.h |2 + mm/page_alloc.c| 144 2 files changed, 146 insertions(+), 0 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index f1417ed..c5d1a7c 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); +extern int alloc_contig_range(unsigned long start, unsigned long end, + gfp_t flags); extern void free_contig_pages(struct page *page, int nr_pages); /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index be240a3..008a6e8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5463,6 +5463,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, return pfn; } +static unsigned long pfn_to_maxpage(unsigned long pfn) +{ + return pfn & ~(MAX_ORDER_NR_PAGES - 1); +} + +static unsigned long pfn_to_maxpage_up(unsigned long pfn) +{ + return ALIGN(pfn, MAX_ORDER_NR_PAGES); +} + +#define MIGRATION_RETRY 5 +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) +{ + int migration_failed = 0, ret; + unsigned long pfn = start; + + /* +* Some code borrowed from KAMEZAWA Hiroyuki's +* __alloc_contig_pages(). +*/ + + for (;;) { + pfn = scan_lru_pages(pfn, end); + if (!pfn || pfn >= end) + break; + + ret = do_migrate_range(pfn, end); + if (!ret) { + migration_failed = 0; + } else if (ret != -EBUSY + || ++migration_failed >= MIGRATION_RETRY) { + return ret; + } else { + /* There are unstable pages on pagevec. */ + lru_add_drain_all(); + /* +* there may be pages on pcplist before +* we mark the range as ISOLATED. +*/ + drain_all_pages(); + } + cond_resched(); + } + + if (!migration_failed) { + /* drop all pages in pagevec and pcp list */ + lru_add_drain_all(); + drain_all_pages(); + } + + /* Make sure all pages are isolated */ + if (WARN_ON(test_pages_isolated(start, end))) + return -EBUSY; + + return 0; +} + +/** + * alloc_contig_range() -- tries to allocate given range of pages + * @start: start PFN to allocate + * @end: one-past-the-last PFN to allocate + * @flags: flags passed to alloc_contig_freed_pages(). + * + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES + * aligned, however it is the caller's responsibility to guarantee that we + * are the only thread that changes migrate type of pageblocks the + * pages fall in. + * + * Returns zero on success or negative error code. On success all + * pages whose PFN is in [start, end) are allocated for the caller and + * need to be freed with free_contig_pages(). + */ +int alloc_contig_range(unsigned long start, unsigned long end, + gfp_t flags) +{ + unsigned long _start, _end; + int ret; + + /* +* What we do here is we mark all pageblocks in range as +* MIGRATE_ISOLATE.
Because of the way the page allocator works, we +* align the range to MAX_ORDER pages so that the page allocator +* won't try to merge buddies from different pageblocks and +* change MIGRATE_ISOLATE to some other migration type. +* +* Once the pageblocks are marked as MIGRATE_ISOLATE, we +* migrate the pages from an unaligned range (ie. pages that +* we are interested in). This will put all the pages in +* range back to page allocator as MIGRATE_ISOLATE. +* +* When this is done, we take the pages in range from page +* allocator removing them from the buddy system. This way +* page allocator will never consider using them. +* +* This lets us mark the pageblocks back as +* MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the +* MAX_ORDER aligned range
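From the caller's perspective the whole dance collapses to one pair of calls. A hypothetical user is sketched below (the wrapper function is not part of the patch; flags = 0 mirrors the 0 passed to alloc_contig_freed_pages() elsewhere in this series):

#include <linux/mm.h>
#include <linux/page-isolation.h>

/* Hypothetical caller: allocate nr_pages contiguous pages starting at
 * start_pfn, then give them back.  The caller must be the only thread
 * changing the migratetype of the affected pageblocks. */
static int example_alloc_contig(unsigned long start_pfn, int nr_pages)
{
	int ret = alloc_contig_range(start_pfn, start_pfn + nr_pages, 0);

	if (ret)
		return ret;	/* typically -EBUSY when migration failed */

	/* PFNs [start_pfn, start_pfn + nr_pages) now belong to us */

	free_contig_pages(pfn_to_page(start_pfn), nr_pages);
	return 0;
}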
[PATCHv8 03/12] lib: genalloc: Generic allocator improvements
This commit adds a gen_pool_alloc_aligned() function to the generic allocator API. It allows specifying alignment for the allocated block. This feature uses the bitmap_find_next_zero_area_off() function. It also fixes a possible issue with the bitmap's last element not being fully allocated (ie. space allocated for chunk->bits is not a multiple of sizeof(long)). It also makes some other smaller changes: - moves structure definitions out of the header file, - adds __must_check to functions returning value, - makes gen_pool_add() return -ENOMEM rather than -1 on error, - changes list_for_each to list_for_each_entry, and - makes use of bitmap_clear(). Signed-off-by: Michal Nazarewicz min...@mina86.com --- include/linux/genalloc.h | 46 ++-- lib/genalloc.c | 182 ++--- 2 files changed, 129 insertions(+), 99 deletions(-) diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 9869ef3..8ac7337 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -8,29 +8,31 @@ * Version 2. See the file COPYING for more details. */ +struct gen_pool; -/* - * General purpose special memory pool descriptor. - */ -struct gen_pool { - rwlock_t lock; - struct list_head chunks;/* list of chunks in this pool */ - int min_alloc_order;/* minimum allocation order */ -}; +struct gen_pool *__must_check gen_pool_create(unsigned order, int nid); -/* - * General purpose special memory pool chunk descriptor. +int __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr, + size_t size, int nid); + +void gen_pool_destroy(struct gen_pool *pool); + +unsigned long __must_check +gen_pool_alloc_aligned(struct gen_pool *pool, size_t size, + unsigned alignment_order); + +/** + * gen_pool_alloc() - allocate special memory from the pool + * @pool: Pool to allocate from. + * @size: Number of bytes to allocate from the pool. + * + * Allocate the requested number of bytes from the specified pool. + * Uses a first-fit algorithm. */ -struct gen_pool_chunk { - spinlock_t lock; - struct list_head next_chunk;/* next chunk in pool */ - unsigned long start_addr; /* starting address of memory chunk */ - unsigned long end_addr; /* ending address of memory chunk */ - unsigned long bits[0]; /* bitmap for allocating memory chunk */ -}; +static inline unsigned long __must_check +gen_pool_alloc(struct gen_pool *pool, size_t size) +{ + return gen_pool_alloc_aligned(pool, size, 0); +} -extern struct gen_pool *gen_pool_create(int, int); -extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int); -extern void gen_pool_destroy(struct gen_pool *); -extern unsigned long gen_pool_alloc(struct gen_pool *, size_t); -extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); +void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size); diff --git a/lib/genalloc.c b/lib/genalloc.c index 1923f14..0761079 100644 --- a/lib/genalloc.c +++ b/lib/genalloc.c @@ -16,53 +16,80 @@ #include <linux/genalloc.h> +/* General purpose special memory pool descriptor. */ +struct gen_pool { + rwlock_t lock; /* protects chunks list */ + struct list_head chunks;/* list of chunks in this pool */ + unsigned order; /* minimum allocation order */ +};

+/* General purpose special memory pool chunk descriptor.
*/ +struct gen_pool_chunk { + spinlock_t lock;/* protects bits */ + struct list_head next_chunk;/* next chunk in pool */ + unsigned long start;/* start of memory chunk */ + unsigned long size; /* number of bits */ + unsigned long bits[0]; /* bitmap for allocating memory chunk */ +}; + + /** - * gen_pool_create - create a new special memory pool - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents - * @nid: node id of the node the pool structure should be allocated on, or -1 + * gen_pool_create() - create a new special memory pool + * @order: Log base 2 of number of bytes each bitmap bit + * represents. + * @nid: Node id of the node the pool structure should be allocated + * on, or -1. This will also be used for other allocations. * * Create a new special memory pool that can be used to manage special purpose * memory not managed by the regular kmalloc/kfree interface. */ -struct gen_pool *gen_pool_create(int min_alloc_order, int nid) +struct gen_pool *__must_check gen_pool_create(unsigned order, int nid) { struct gen_pool *pool; - pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid); - if (pool != NULL) { + if (WARN_ON(order >= BITS_PER_LONG)) + return NULL; + + pool = kmalloc_node(sizeof *pool, GFP_KERNEL, nid
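The reworked interface, used end to end. All calls below are the ones declared in this patch; only the chunk address and sizes are made up for the example.

#include <linux/genalloc.h>

/* Example use of the reworked API: a pool with 8-byte granularity
 * (order 3) backed by a single chunk at a made-up address. */
static int example_genalloc(void)
{
	struct gen_pool *pool = gen_pool_create(3 /* order */, -1 /* any node */);
	unsigned long addr;

	if (!pool)
		return -ENOMEM;

	/* now returns -ENOMEM instead of -1 on failure */
	if (gen_pool_add(pool, 0x40000000, 1 << 20, -1)) {
		gen_pool_destroy(pool);
		return -ENOMEM;
	}

	/* 256 bytes whose start address is 64-byte aligned (order 6) */
	addr = gen_pool_alloc_aligned(pool, 256, 6);
	if (addr)
		gen_pool_free(pool, addr, 256);

	gen_pool_destroy(pool);
	return 0;
}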
[PATCHv8 00/12] Contiguous Memory Allocator
Hello everyone, This is yet another version of CMA, this time stripped of a lot of code and with a working migration implementation. The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted. For more information see the 7th patch in the set. This version fixes some things Kamezawa suggested, plus it separates code that uses MIGRATE_CMA from the rest of the code. This, I hope, will help in grasping the overall idea of CMA. The current version is just an allocator that handles allocation of contiguous memory blocks. The differences between this patchset and Kamezawa's alloc_contig_pages() are: 1. alloc_contig_pages() requires MAX_ORDER alignment of allocations which may be unsuitable for embedded systems where a few MiBs are required. Lack of the requirement on the alignment means that several threads might try to access the same pageblock/page. To prevent this from happening CMA uses a mutex so that only one cm_alloc()/cm_free() function may run at a time. 2. CMA may use its own migratetype (MIGRATE_CMA) which behaves similarly to ZONE_MOVABLE but can be put in arbitrary places. This is required for us since we need to define two disjoint memory ranges inside system RAM (ie. in two memory banks (do not confuse with nodes)). 3. alloc_contig_pages() scans memory in search of a range that could be migrated. CMA on the other hand maintains its own allocator to decide where to allocate memory for device drivers and then tries to migrate pages from that part if needed. This is not strictly required but I somehow feel it might be faster. Links to previous versions of the patchset: v7: http://article.gmane.org/gmane.linux.kernel.mm/55626 v6: http://article.gmane.org/gmane.linux.kernel.mm/55626 v5: (intentionally left out as CMA v5 was identical to CMA v4) v4: http://article.gmane.org/gmane.linux.kernel.mm/52010 v3: http://article.gmane.org/gmane.linux.kernel.mm/51573 v2: http://article.gmane.org/gmane.linux.kernel.mm/50986 v1: http://article.gmane.org/gmane.linux.kernel.mm/50669 Changelog: v8: 1. The alloc_contig_range() function has now been separated from CMA and put in page_allocator.c. This function tries to migrate all LRU pages in the specified range and then allocate the range using alloc_contig_freed_pages(). 2. Support for MIGRATE_CMA has been separated from the CMA code. I have not tested if CMA works with ZONE_MOVABLE but I see no reason why it shouldn't. 3. I have added a @private argument when creating CMA contexts so that one can reserve memory and not share it with the rest of the system. This way, CMA acts only as an allocation algorithm. v7: 1. A lot of functionality that handled driver-allocator_context mapping has been removed from the patchset. This is not to say that this code is not needed, it's just not worth posting everything in one patchset. Currently, CMA is just an allocator. It uses its own migratetype (MIGRATE_CMA) for defining ranges of pageblocks which behave just like ZONE_MOVABLE but unlike the latter can be put in arbitrary places. 2. The migration code that was introduced in the previous version actually started working. v6: 1. Most importantly, v6 introduces support for memory migration. The implementation is not yet complete though. Migration support means that when CMA is not using memory reserved for it, the page allocator can allocate pages from it. When CMA wants to use the memory, the pages have to be moved and/or evicted so as to make room for CMA.
To make this possible it must be guaranteed that only movable and reclaimable pages are allocated in CMA controlled regions. This is done by introducing a MIGRATE_CMA migrate type that guarantees exactly that. Some of the migration code is borrowed from Kamezawa Hiroyuki's alloc_contig_pages() implementation. The main difference is that thanks to the MIGRATE_CMA migrate type CMA assumes that memory controlled by CMA is always movable or reclaimable so that it makes allocation decisions regardless of whether some pages are actually allocated and migrates them if needed. The most interesting patches from the patchset that implement the functionality are: 09/13: mm: alloc_contig_free_pages() added 10/13: mm: MIGRATE_CMA migration type added 11/13: mm: MIGRATE_CMA isolation functions added 12/13: mm: cma: Migration support added [wip] Currently, the kernel panics in some situations, which I am trying to investigate. 2. cma_pin() and cma_unpin() functions have been added (after a conversation with Johan Mossberg). The idea is that whenever hardware does not use the memory (no transaction is on) the chunk can be moved around.
Re: [PATCHv7 06/10] mm: MIGRATE_CMA migration type added
On Mon, 13 Dec 2010 12:26:47 +0100 Michal Nazarewicz m.nazarew...@samsung.com wrote: diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h @@ -35,13 +35,24 @@ */ #define PAGE_ALLOC_COSTLY_ORDER 3 -#define MIGRATE_UNMOVABLE 0 -#define MIGRATE_RECLAIMABLE 1 -#define MIGRATE_MOVABLE 2 -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */ -#define MIGRATE_RESERVE 3 -#define MIGRATE_ISOLATE 4 /* can't allocate from here */ -#define MIGRATE_TYPES 5 +enum { +MIGRATE_UNMOVABLE, +MIGRATE_RECLAIMABLE, +MIGRATE_MOVABLE, +MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ +MIGRATE_RESERVE = MIGRATE_PCPTYPES, +MIGRATE_ISOLATE,/* can't allocate from here */ +#ifdef CONFIG_MIGRATE_CMA +MIGRATE_CMA,/* only movable */ +#endif +MIGRATE_TYPES +}; KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com writes: I personaly would like to put MIGRATE_ISOLATE to be bottom of enum because it means _not_for_allocation. Will change. I didn't want to change the value of MIGRATE_ISOLATE in fear of breaking something but hopefully no one depends on MIGRATE_ISOLATE's value. diff --git a/mm/compaction.c b/mm/compaction.c @@ -824,11 +848,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, * This array describes the order lists are fallen back to when * the free lists for the desirable migrate type are depleted */ -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = { +static int fallbacks[MIGRATE_TYPES][4] = { [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, +#ifdef CONFIG_MIGRATE_CMA +[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA, MIGRATE_RESERVE }, +#else [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, -[MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */ +#endif +[MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */ }; /* @@ -924,12 +952,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) /* Find the largest possible block of pages in the other list */ for (current_order = MAX_ORDER-1; current_order = order; --current_order) { -for (i = 0; i MIGRATE_TYPES - 1; i++) { +for (i = 0; i ARRAY_SIZE(fallbacks[0]); i++) { Why fallbacks[0] ? and why do you need to change this ? I've changed the dimensions of fallbacks matrix, in particular second dimension from MIGRATE_TYPE - 1 to 4 so this place needed to be changed as well. Now, I think changing to ARRAY_SIZE() is just the safest option available. This is actually just a minor optimisation. migratetype = fallbacks[start_migratetype][i]; /* MIGRATE_RESERVE handled later if necessary */ if (migratetype == MIGRATE_RESERVE) -continue; +break; Isn't this change enough for your purpose ? This is mostly just an optimisation really. I'm not sure what you think is my purpose here. ;) It does fix an this issue of some of the fallback[*] arrays having MIGRATETYPE_UNMOVABLE at the end. @@ -1042,7 +1083,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add(page-lru, list); else list_add_tail(page-lru, list); -set_page_private(page, migratetype); +#ifdef CONFIG_MIGRATE_CMA +if (is_pageblock_cma(page)) +set_page_private(page, MIGRATE_CMA); +else +#endif +set_page_private(page, migratetype); Hmm, doesn't this meet your changes which makes MIGRATE_CMA MIGRATE_PCPLIST ? And I think putting mixture of pages marked of MIGRATE_TYPE onto a pcplist is ugly. 
You mean that the pcplist a page is on can disagree with page_private()? I didn't think this was such a big deal honestly. Unless MIGRATE_CMA < MIGRATE_PCPTYPES, a special case is needed either here or in free_pcppages_bulk(), so I think this comes down to whether to make MIGRATE_CMA < MIGRATE_PCPTYPES, for which see below. @@ -1181,9 +1227,16 @@ void free_hot_cold_page(struct page *page, int cold) * offlined but treat RESERVE as movable pages so we can get those * areas back if necessary. Otherwise, we may have to free * excessively into the page allocator + * + * Still, do not change the migration type of MIGRATE_CMA pages (if + * they'd be recorded as MIGRATE_MOVABLE an unmovable page could + * be allocated from a MIGRATE_CMA block and we don't want to allow + * that). In this respect, treat
Re: [PATCHv7 08/10] mm: cma: Contiguous Memory Allocator added
On Mon, 13 Dec 2010 12:26:49 +0100 Michal Nazarewicz m.nazarew...@samsung.com wrote: +/* Initialise CMA */ + +static struct cma_grabbed { +unsigned long start; +unsigned long size; +} cma_grabbed[8] __initdata; +static unsigned cma_grabbed_count __initdata; + +int cma_init(unsigned long start, unsigned long size) +{ +pr_debug(%s(%p+%p)\n, __func__, (void *)start, (void *)size); + +if (!size) +return -EINVAL; +if ((start | size) ((MAX_ORDER_NR_PAGES PAGE_SHIFT) - 1)) +return -EINVAL; +if (start + size start) +return -EOVERFLOW; + +if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed)) +return -ENOSPC; + +cma_grabbed[cma_grabbed_count].start = start; +cma_grabbed[cma_grabbed_count].size = size; +++cma_grabbed_count; +return 0; +} + KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com writes: Is it guaranteed that there are no memory holes, or zone overlap in the range ? I think correctness of the range must be checked. I keep thinking about it myself. The idea is that you get memory range reserved using memblock (or some such) thus it should not contain any memory holes. I'm not entirely sure about spanning different zones. I'll add the checking code. +#define MIGRATION_RETRY 5 +static int __cm_migrate(unsigned long start, unsigned long end) +{ [...] +} + +static int __cm_alloc(unsigned long start, unsigned long size) +{ +unsigned long end, _start, _end; +int ret; + [...] + +start = phys_to_pfn(start); +end = start + (size PAGE_SHIFT); + +pr_debug(\tisolate range(%lx, %lx)\n, + pfn_to_maxpage(start), pfn_to_maxpage_up(end)); +ret = __start_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end), MIGRATE_CMA); +if (ret) +goto done; + +pr_debug(\tmigrate range(%lx, %lx)\n, start, end); +ret = __cm_migrate(start, end); +if (ret) +goto done; + [...] + +pr_debug(\tfinding buddy\n); +ret = 0; +while (!PageBuddy(pfn_to_page(start (~0UL ret +if (WARN_ON(++ret = MAX_ORDER)) +return -EINVAL; + +_start = start (~0UL ret); +pr_debug(\talloc freed(%lx, %lx)\n, _start, end); +_end = alloc_contig_freed_pages(_start, end, 0); + +/* Free head and tail (if any) */ +pr_debug(\tfree contig(%lx, %lx)\n, _start, start); +free_contig_pages(pfn_to_page(_start), start - _start); +pr_debug(\tfree contig(%lx, %lx)\n, end, _end); +free_contig_pages(pfn_to_page(end), _end - end); + +ret = 0; + +done: +pr_debug(\tundo isolate range(%lx, %lx)\n, + pfn_to_maxpage(start), pfn_to_maxpage_up(end)); +__undo_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end), MIGRATE_CMA); + +pr_debug(ret = %d\n, ret); +return ret; +} + +static void __cm_free(unsigned long start, unsigned long size) +{ +pr_debug(%s(%p+%p)\n, __func__, (void *)start, (void *)size); + +free_contig_pages(pfn_to_page(phys_to_pfn(start)), + size PAGE_SHIFT); +} Hmm, it seems __cm_alloc() and __cm_migrate() has no special codes for CMA. I'd like reuse this for my own contig page allocator. So, could you make these function be more generic (name) ? as __alloc_range(start, size, mirate_type); Then, what I have to do is only to add search range functions. Sure thing. I'll post it tomorrow or Friday. How about alloc_contig_range() maybe? -- Pozdrawiam_ _ .o. | Wasal Jasnie Oswieconej Pani Informatyki o' \,=./ `o ..o | Michal mina86 Nazarewicz mina86*tlen.pl(o o) ooo +---jid:mina86-jabber.org---tlen:mina86---ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv7 01/10] mm: migrate.c: fix compilation error
GCC complained about update_mmu_cache() not being defined in migrate.c. Including asm/tlbflush.h seems to solve the problem. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- mm/migrate.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index fe5a3c6..6ae8a66 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -35,6 +35,8 @@ #include <linux/hugetlb.h> #include <linux/gfp.h> +#include <asm/tlbflush.h> + #include "internal.h" #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru)) -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv7 00/10] Contiguous Memory Allocator
Hello everyone, This is yet another version of CMA, this time stripped of a lot of code and with a working migration implementation. The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted. For more information see the 8th patch in the set. The current version is just an allocator that handles allocation of contiguous memory blocks. The differences between this patchset and Kamezawa's alloc_contig_pages() are: 1. alloc_contig_pages() requires MAX_ORDER alignment of allocations which may be unsuitable for embedded systems where a few MiBs are required. 2. CMA uses its own migratetype (MIGRATE_CMA) as was suggested during one of the previous iterations. The migrate type behaves similarly to ZONE_MOVABLE but can be put in arbitrary places. This is required for us since we need to define two disjoint memory ranges inside system's RAM (ie. in two memory banks (do not confuse with nodes)). 3. alloc_contig_pages() scans memory in search of a range that could be migrated. CMA on the other hand maintains its own allocator to decide where to allocate memory for device drivers and then tries to migrate pages from that part if needed. This is not strictly required but I somehow feel it might be faster. Links to previous versions of the patchset: v6: http://article.gmane.org/gmane.linux.kernel.mm/55626/ v5: (intentionally left out as CMA v5 was identical to CMA v4) v4: http://article.gmane.org/gmane.linux.kernel.mm/52010/ v3: http://article.gmane.org/gmane.linux.kernel.mm/51573/ v2: http://article.gmane.org/gmane.linux.kernel.mm/50986/ v1: http://article.gmane.org/gmane.linux.kernel.mm/50669/ Changelog: v7: 1. A lot of functionality that handled driver-allocator_context mapping has been removed from the patchset. This is not to say that this code is not needed, it's just not worth posting everything in one patchset. Currently, CMA is just an allocator. It uses its own migratetype (MIGRATE_CMA) for defining ranges of pageblocks which behave just like ZONE_MOVABLE but unlike the latter can be put in arbitrary places. 2. The migration code that was introduced in the previous version actually started working. v6: 1. Most importantly, v6 introduces support for memory migration. The implementation is not yet complete though. Migration support means that when CMA is not using memory reserved for it, the page allocator can allocate pages from it. When CMA wants to use the memory, the pages have to be moved and/or evicted so as to make room for CMA. To make this possible it must be guaranteed that only movable and reclaimable pages are allocated in CMA controlled regions. This is done by introducing a MIGRATE_CMA migrate type that guarantees exactly that. Some of the migration code is borrowed from Kamezawa Hiroyuki's alloc_contig_pages() implementation. The main difference is that thanks to the MIGRATE_CMA migrate type CMA assumes that memory controlled by CMA is always movable or reclaimable so that it makes allocation decisions regardless of whether some pages are actually allocated and migrates them if needed. The most interesting patches from the patchset that implement the functionality are: 09/13: mm: alloc_contig_free_pages() added 10/13: mm: MIGRATE_CMA migration type added 11/13: mm: MIGRATE_CMA isolation functions added 12/13: mm: cma: Migration support added [wip] Currently, the kernel panics in some situations, which I am trying to investigate. 2.
cma_pin() and cma_unpin() functions have been added (after a conversation with Johan Mossberg). The idea is that whenever hardware does not use the memory (no transaction is on) the chunk can be moved around. This would allow defragmentation to be implemented if desired. No defragmentation algorithm is provided at this time. 3. Sysfs support has been replaced with debugfs. I always felt unsure about the sysfs interface and when Greg KH pointed it out I finally got to rewrite it to debugfs. v5: (intentionally left out as CMA v5 was identical to CMA v4) v4: 1. The asterisk flag has been removed in favour of requiring that the platform provide a *=regions rule in the map attribute. 2. The terminology has been changed slightly, renaming "kind" to "type" of memory. In the previous revisions, the documentation indicated that device drivers define memory kinds and now, v3: 1. The command line parameters have been removed (and moved to a separate patch, the fourth one). As a consequence, the cma_set_defaults() function has been changed -- it no longer accepts a string with a list of regions but an
[PATCHv7 10/10] ARM: cma: Added CMA to Aquila, Goni and c210 universal boards
This commit adds CMA memory reservation code to Aquila, Goni and c210 universal boards. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- arch/arm/mach-s5pv210/mach-aquila.c |2 + arch/arm/mach-s5pv210/mach-goni.c |2 + arch/arm/mach-s5pv310/mach-universal_c210.c |2 + arch/arm/plat-s5p/Makefile |2 + arch/arm/plat-s5p/cma-stub.c| 49 +++ arch/arm/plat-s5p/include/plat/cma-stub.h | 21 +++ 6 files changed, 78 insertions(+), 0 deletions(-) create mode 100644 arch/arm/plat-s5p/cma-stub.c create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c index 28677ca..8608a16 100644 --- a/arch/arm/mach-s5pv210/mach-aquila.c +++ b/arch/arm/mach-s5pv210/mach-aquila.c @@ -39,6 +39,7 @@ #include plat/fb.h #include plat/fimc-core.h #include plat/sdhci.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define AQUILA_UCON_DEFAULT(S3C2410_UCON_TXILEVEL |\ @@ -690,4 +691,5 @@ MACHINE_START(AQUILA, Aquila) .map_io = aquila_map_io, .init_machine = aquila_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c index b1dcf96..b1bf079 100644 --- a/arch/arm/mach-s5pv210/mach-goni.c +++ b/arch/arm/mach-s5pv210/mach-goni.c @@ -45,6 +45,7 @@ #include plat/keypad.h #include plat/sdhci.h #include plat/clock.h +#include plat/cma-stub.h /* Following are default values for UCON, ULCON and UFCON UART registers */ #define GONI_UCON_DEFAULT (S3C2410_UCON_TXILEVEL |\ @@ -865,4 +866,5 @@ MACHINE_START(GONI, GONI) .map_io = goni_map_io, .init_machine = goni_machine_init, .timer = s3c24xx_timer, + .reserve= cma_mach_reserve, MACHINE_END diff --git a/arch/arm/mach-s5pv310/mach-universal_c210.c b/arch/arm/mach-s5pv310/mach-universal_c210.c index 16d8fc0..d65703a 100644 --- a/arch/arm/mach-s5pv310/mach-universal_c210.c +++ b/arch/arm/mach-s5pv310/mach-universal_c210.c @@ -21,6 +21,7 @@ #include plat/s5pv310.h #include plat/cpu.h #include plat/devs.h +#include plat/cma-stub.h #include mach/map.h @@ -152,6 +153,7 @@ MACHINE_START(UNIVERSAL_C210, UNIVERSAL_C210) .boot_params= S5P_PA_SDRAM + 0x100, .init_irq = s5pv310_init_irq, .map_io = universal_map_io, + .reserve= cma_mach_reserve, .init_machine = universal_machine_init, .timer = s5pv310_timer, MACHINE_END diff --git a/arch/arm/plat-s5p/Makefile b/arch/arm/plat-s5p/Makefile index de65238..6fdb6ce 100644 --- a/arch/arm/plat-s5p/Makefile +++ b/arch/arm/plat-s5p/Makefile @@ -28,3 +28,5 @@ obj-$(CONFIG_S5P_DEV_FIMC0) += dev-fimc0.o obj-$(CONFIG_S5P_DEV_FIMC1)+= dev-fimc1.o obj-$(CONFIG_S5P_DEV_FIMC2)+= dev-fimc2.o obj-$(CONFIG_S5P_DEV_ONENAND) += dev-onenand.o + +obj-$(CONFIG_CMA) += cma-stub.o diff --git a/arch/arm/plat-s5p/cma-stub.c b/arch/arm/plat-s5p/cma-stub.c new file mode 100644 index 000..716e56d --- /dev/null +++ b/arch/arm/plat-s5p/cma-stub.c @@ -0,0 +1,49 @@ +/* + * This file is just a quick and dirty hack to get CMA testing device + * working. The cma_mach_reserve() should be called as mach's reserve + * callback. CMA testing device will use cma_ctx for allocations. 
+ */ + +#include plat/cma-stub.h + +#include linux/cma.h +#include linux/err.h +#include linux/init.h +#include linux/kernel.h + +struct cma *cma_ctx; + +#define cma_size (32UL 20) /* 32 MiB */ + +static unsigned long cma_start __initdata; + +void __init cma_mach_reserve(void) +{ + unsigned long start = cma_reserve(0, cma_size, 0); + if (IS_ERR_VALUE(start)) + printk(KERN_WARNING cma: unable to reserve %lu for CMA: %d\n, + cma_size 20, (int)start); + else + cma_start = start; +} + +static int __init cma_mach_init(void) +{ + int ret = -ENOMEM; + + if (cma_start) { + struct cma *ctx = cma_create(cma_start, cma_size); + if (IS_ERR(ctx)) { + ret = PTR_ERR(ctx); + printk(KERN_WARNING + cma: cma_create(%p, %p) failed: %d\n, + (void *)cma_start, (void *)cma_size, ret); + } else { + cma_ctx = ctx; + ret = 0; + } + } + + return ret; +} +device_initcall(cma_mach_init); diff --git a/arch/arm/plat-s5p/include/plat/cma-stub.h b/arch/arm/plat-s5p/include/plat/cma-stub.h new file mode 100644 index 000..a24a03b --- /dev/null +++ b/arch/arm/plat-s5p/include/plat/cma
[PATCHv7 09/10] mm: cma: Test device and application added
This patch adds a cma misc device which lets user space use the CMA API. This device is meant for testing. A testing application is also provided. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/misc/Kconfig | 10 + drivers/misc/Makefile |1 + drivers/misc/cma-dev.c | 238 include/linux/cma.h| 29 +++ tools/cma/cma-test.c | 466 5 files changed, 744 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/cma-dev.c create mode 100644 tools/cma/cma-test.c diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 1e1a4be..0dd1e8c 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -458,4 +458,14 @@ source drivers/misc/cb710/Kconfig source drivers/misc/iwmc3200top/Kconfig source drivers/misc/ti-st/Kconfig +config CMA_DEVICE + tristate CMA misc device (DEVELOPEMENT) + depends on CMA + help + The CMA misc device allows allocating contiguous memory areas + from user space. This is mostly for testing of the CMA + framework. + + If unsure, say n + endif # MISC_DEVICES diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 98009cc..f8eadd4 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o obj-$(CONFIG_PCH_PHUB) += pch_phub.o obj-y += ti-st/ obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o +obj-$(CONFIG_CMA_DEVICE) += cma-dev.o diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c new file mode 100644 index 000..6c36064 --- /dev/null +++ b/drivers/misc/cma-dev.c @@ -0,0 +1,238 @@ +/* + * Contiguous Memory Allocator userspace driver + * Copyright (c) 2010 by Samsung Electronics. + * Written by Michal Nazarewicz (m.nazarew...@samsung.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License or (at your optional) any later version of the license. 
+ */ + +#define pr_fmt(fmt) cma: fmt + +#ifdef CONFIG_CMA_DEBUG +# define DEBUG +#endif + +#include linux/errno.h /* Error numbers */ +#include linux/err.h /* IS_ERR_VALUE() */ +#include linux/fs.h /* struct file */ +#include linux/mm.h /* Memory stuff */ +#include linux/mman.h +#include linux/slab.h +#include linux/module.h /* Standard module stuff */ +#include linux/device.h /* struct device, dev_dbg() */ +#include linux/types.h /* Just to be safe ;) */ +#include linux/uaccess.h /* __copy_{to,from}_user */ +#include linux/miscdevice.h /* misc_register() and company */ + +#include linux/cma.h + +#include plat/cma-stub.h + +static int cma_file_open(struct inode *inode, struct file *file); +static int cma_file_release(struct inode *inode, struct file *file); +static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg); +static int cma_file_mmap(struct file *file, struct vm_area_struct *vma); + +static struct miscdevice cma_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = cma, + .fops = (const struct file_operations) { + .owner = THIS_MODULE, + .open = cma_file_open, + .release= cma_file_release, + .unlocked_ioctl = cma_file_ioctl, + .mmap = cma_file_mmap, + }, +}; +#define cma_dev (cma_miscdev.this_device) + +struct cma_private_data { + struct cm *cm; + unsigned long size; + unsigned long phys; +}; + +static int cma_file_open(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (!cma_ctx) + return -EOPNOTSUPP; + + prv = kzalloc(sizeof *prv, GFP_KERNEL); + if (!prv) + return -ENOMEM; + + file-private_data = prv; + + return 0; +} + +static int cma_file_release(struct inode *inode, struct file *file) +{ + struct cma_private_data *prv = file-private_data; + + dev_dbg(cma_dev, %s(%p)\n, __func__, (void *)file); + + if (prv-cm) { + cm_unpin(prv-cm); + cm_free(prv-cm); + } + kfree(prv); + + return 0; +} + +static long cma_file_ioctl_req(struct cma_private_data *prv, unsigned long arg) +{ + struct cma_alloc_request req; + struct cm *cm; + + dev_dbg(cma_dev, %s()\n, __func__); + + if (!arg) + return -EINVAL; + + if (copy_from_user(req, (void *)arg, sizeof req)) + return -EFAULT; + + if (req.magic != CMA_MAGIC) + return -ENOTTY; + + /* May happen on 32 bit system. */ + if (req.size ~(unsigned
[PATCHv7 02/10] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
This commit adds a bitmap_find_next_zero_area_off() function which works like the bitmap_find_next_zero_area() function except that it allows an offset to be specified when alignment is checked. This lets the caller request a bit such that its number plus the offset is aligned according to the mask. Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- include/linux/bitmap.h | 24 +++- lib/bitmap.c | 22 -- 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index daf8c48..c0528d1 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -45,6 +45,7 @@ * bitmap_set(dst, pos, nbits) Set specified bit area * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area + * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above * bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n * bitmap_shift_left(dst, src, n, nbits) *dst = *src << n * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) @@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits); extern void bitmap_set(unsigned long *map, int i, int len); extern void bitmap_clear(unsigned long *map, int start, int nr); -extern unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask); + +extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask, + unsigned long align_offset); + +static inline unsigned long +bitmap_find_next_zero_area(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask) +{ + return bitmap_find_next_zero_area_off(map, size, start, nr, + align_mask, 0); +} extern int bitmap_scnprintf(char *buf, unsigned int len, const unsigned long *src, int nbits); diff --git a/lib/bitmap.c b/lib/bitmap.c index 741fae9..8e75a6f 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr) } EXPORT_SYMBOL(bitmap_clear); -/* +/** * bitmap_find_next_zero_area - find a contiguous aligned zero area * @map: The address to base the search on * @size: The bitmap size in bits * @start: The bitnumber to start searching at * @nr: The number of zeroed bits we're looking for * @align_mask: Alignment mask for zero area + * @align_offset: Alignment offset for zero area. * * The @align_mask should be one less than a power of 2; the effect is that - * the bit offset of all zero areas this function finds is multiples of that - * power of 2. A @align_mask of 0 means no alignment is required. + * the bit offset of all zero areas this function finds plus @align_offset + * is a multiple of that power of 2.
*/ -unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask) +unsigned long bitmap_find_next_zero_area_off(unsigned long *map, +unsigned long size, +unsigned long start, +unsigned int nr, +unsigned long align_mask, +unsigned long align_offset) { unsigned long index, end, i; again: index = find_next_zero_bit(map, size, start); /* Align allocation */ - index = __ALIGN_MASK(index, align_mask); + index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset; end = index + nr; if (end size) @@ -350,7 +352,7 @@ again: } return index; } -EXPORT_SYMBOL(bitmap_find_next_zero_area); +EXPORT_SYMBOL(bitmap_find_next_zero_area_off); /* * Bitmap printing parsing functions: first version by Bill Irwin, -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux
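To make the offset semantics concrete, a small worked call (illustrative
values, not part of the patch; kernel context assumed):

	#include <linux/bitmap.h>

	static void example(void)
	{
		/* bits 0-2 busy; want 4 bits with (index + 3) a multiple of 4,
		 * i.e. align_mask = 3 and align_offset = 3 */
		unsigned long map[1] = { 0x7 };
		unsigned long idx;

		idx = bitmap_find_next_zero_area_off(map, BITS_PER_LONG, 0, 4, 3, 3);
		/*
		 * find_next_zero_bit() returns 3, then
		 * __ALIGN_MASK(3 + 3, 3) - 3 == 8 - 3 == 5, so idx == 5:
		 * bits 5-8 are free and idx + 3 == 8 is a multiple of 4.
		 */
	}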
[PATCHv7 03/10] lib: genalloc: Generic allocator improvements
This commit adds a gen_pool_alloc_aligned() function to the generic
allocator API.  It allows specifying alignment for the allocated
block.  This feature uses the bitmap_find_next_zero_area_off()
function.

It also fixes a possible issue with the bitmap's last element not
being fully allocated (i.e. space allocated for chunk->bits not being
a multiple of sizeof(long)).

It also makes some other smaller changes:
- moves structure definitions out of the header file,
- adds __must_check to functions returning value,
- makes gen_pool_add() return -ENOMEM rather than -1 on error,
- changes list_for_each to list_for_each_entry, and
- makes use of bitmap_clear().

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/genalloc.h |   46 ++++++----
 lib/genalloc.c           |  182 ++++++++++++++++++++++++----------------
 2 files changed, 129 insertions(+), 99 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 9869ef3..8ac7337 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -8,29 +8,31 @@
 * Version 2.  See the file COPYING for more details.
 */
 
+struct gen_pool;
 
-/*
- *  General purpose special memory pool descriptor.
- */
-struct gen_pool {
-	rwlock_t lock;
-	struct list_head chunks;	/* list of chunks in this pool */
-	int min_alloc_order;		/* minimum allocation order */
-};
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid);
 
-/*
- *  General purpose special memory pool chunk descriptor.
+int  __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr,
+			       size_t size, int nid);
+
+void gen_pool_destroy(struct gen_pool *pool);
+
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+		       unsigned alignment_order);
+
+/**
+ * gen_pool_alloc() - allocate special memory from the pool
+ * @pool:	Pool to allocate from.
+ * @size:	Number of bytes to allocate from the pool.
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses a first-fit algorithm.
 */
-struct gen_pool_chunk {
-	spinlock_t lock;
-	struct list_head next_chunk;	/* next chunk in pool */
-	unsigned long start_addr;	/* starting address of memory chunk */
-	unsigned long end_addr;		/* ending address of memory chunk */
-	unsigned long bits[0];		/* bitmap for allocating memory chunk */
-};
+static inline unsigned long __must_check
+gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+	return gen_pool_alloc_aligned(pool, size, 0);
+}
 
-extern struct gen_pool *gen_pool_create(int, int);
-extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 1923f14..0761079 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -16,53 +16,80 @@
 
 #include <linux/genalloc.h>
 
+/* General purpose special memory pool descriptor. */
+struct gen_pool {
+	rwlock_t lock;			/* protects chunks list */
+	struct list_head chunks;	/* list of chunks in this pool */
+	unsigned order;			/* minimum allocation order */
+};
+
+/* General purpose special memory pool chunk descriptor. */
+struct gen_pool_chunk {
+	spinlock_t lock;		/* protects bits */
+	struct list_head next_chunk;	/* next chunk in pool */
+	unsigned long start;		/* start of memory chunk */
+	unsigned long size;		/* number of bits */
+	unsigned long bits[0];		/* bitmap for allocating memory chunk */
+};
+
+
 /**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @order:	Log base 2 of number of bytes each bitmap bit
+ *		represents.
+ * @nid:	Node id of the node the pool structure should be allocated
+ *		on, or -1.  This will be also used for other allocations.
 *
 * Create a new special memory pool that can be used to manage special purpose
 * memory not managed by the regular kmalloc/kfree interface.
 */
-struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid)
 {
 	struct gen_pool *pool;
 
-	pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
-	if (pool != NULL) {
+	if (WARN_ON(order >= BITS_PER_LONG))
+		return NULL
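A sketch of how a caller would use the reworked API (hypothetical
driver code; the base address, the sizes and the reading of
alignment_order as the log2 of the required byte alignment are
assumptions, not taken from the patch):

	#include <linux/genalloc.h>
	#include <linux/mm.h>	/* PAGE_SHIFT */

	static int example(void)
	{
		struct gen_pool *pool;
		unsigned long addr;

		pool = gen_pool_create(PAGE_SHIFT, -1);	/* one bitmap bit per page */
		if (!pool)
			return -ENOMEM;

		/* gen_pool_add() now reports -ENOMEM rather than -1 on failure */
		if (gen_pool_add(pool, 0x20000000, 16 << 20, -1)) {
			gen_pool_destroy(pool);
			return -ENOMEM;
		}

		/* 1 MiB block aligned to a 1 MiB (2^20 byte) boundary */
		addr = gen_pool_alloc_aligned(pool, 1 << 20, 20);
		if (addr)
			gen_pool_free(pool, addr, 1 << 20);

		gen_pool_destroy(pool);
		return 0;
	}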
[PATCHv7 06/10] mm: MIGRATE_CMA migration type added
The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA pageblocks
and (ii) the page allocator will never change the migration type of
MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can always be
migrated somewhere else (unless there's no memory left in the system).

It is designed to be used with the Contiguous Memory Allocator (CMA)
for allocating big chunks (e.g. 10MiB) of physically contiguous
memory.  Once a driver requests contiguous memory, CMA will migrate
pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration type
is the last type tried when the page allocator falls back to migration
types other than the requested one.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/mmzone.h |   30 ++++++++++------
 mm/Kconfig             |    8 ++++
 mm/compaction.c        |   10 +++++
 mm/internal.h          |    3 +
 mm/page_alloc.c        |   97 ++++++++++++++++++++++++++++++++---
 5 files changed, 124 insertions(+), 24 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 39c24eb..1b95899 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,24 @@
 */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+#ifdef CONFIG_MIGRATE_CMA
+	MIGRATE_CMA,		/* only movable */
+#endif
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_MIGRATE_CMA
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +65,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/mm/Kconfig b/mm/Kconfig
index b911ad3..7818b07 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1,3 +1,11 @@
+config MIGRATE_CMA
+	bool
+	help
+	  This option should be selected by code that requires MIGRATE_CMA
+	  migration type to be present.  Once a page block has this
+	  migration type, only movable pages can be allocated from it and
+	  the page block never changes its migration type.
+
 config SELECT_MEMORY_MODEL
 	def_bool y
 	depends on EXPERIMENTAL || ARCH_SELECT_MEMORY_MODEL
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..c5e404b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
 
+	/* Keep MIGRATE_CMA alone as well. */
+	/*
+	 * XXX Revisit.  We currently cannot let compaction touch CMA
+	 * pages since compaction insists on changing their migration
+	 * type to MIGRATE_MOVABLE (see split_free_page() called from
+	 * isolate_freepages_block() above).
+	 */
+	if (is_migrate_cma(migratetype))
+		return false;
+
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
diff --git a/mm/internal.h b/mm/internal.h
index dedb0af..cc24e74 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page);
 * in mm/page_alloc.c
 */
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
+#ifdef CONFIG_MIGRATE_CMA
+extern void __free_pageblock_cma(struct page *page);
+#endif
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
 extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 997f6c8..537d1f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -717,6 +717,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	}
 }
 
+#ifdef CONFIG_MIGRATE_CMA
+
+/*
+ * Free whole pageblock and set it's migration type
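The "tried last" behaviour the commit message describes comes down to
where MIGRATE_CMA sits in the page allocator's fallback table.  A
sketch of the shape that table takes (illustrative only; the actual
hunk is not shown in this message):

	#include <linux/mmzone.h>

	/*
	 * MIGRATE_CMA appears only as the very last fallback for
	 * MIGRATE_MOVABLE requests, so CMA pageblocks are tapped only for
	 * movable allocations and only once every other type has been tried.
	 */
	static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES - 1] = {
		[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,
					  MIGRATE_RESERVE },
		[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE,
					  MIGRATE_RESERVE },
	#ifdef CONFIG_MIGRATE_CMA
		[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,
					  MIGRATE_RESERVE, MIGRATE_CMA },
	#else
		[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,
					  MIGRATE_RESERVE },
	#endif
		[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* never used */
	};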
[RFCv6 01/13] lib: rbtree: rb_root_init() function added
Added a rb_root_init() function which initialises a rb_root structure
as a red-black tree with at most one element.  The rationale is that
using rb_root_init(root, node) is more straightforward and cleaner
than first initialising an empty tree followed by an insert operation.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/rbtree.h |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 7066acb..5b6dc66 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -130,6 +130,17 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 }
 
 #define RB_ROOT	(struct rb_root) { NULL, }
+
+static inline void rb_root_init(struct rb_root *root, struct rb_node *node)
+{
+	root->rb_node = node;
+	if (node) {
+		node->rb_parent_color = RB_BLACK;	/* black, no parent */
+		node->rb_left  = NULL;
+		node->rb_right = NULL;
+	}
+}
+
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
-- 
1.7.2.3
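For illustration, the kind of caller the helper targets (hypothetical
entry type and field names, not taken from the patch):

	#include <linux/rbtree.h>

	/* Seed a tree with one preallocated node instead of starting from
	 * RB_ROOT and calling rb_link_node() + rb_insert_color(). */
	struct my_entry {
		struct rb_node node;
		unsigned long key;
	};

	static struct my_entry first_entry = { .key = 0 };
	static struct rb_root my_tree;

	static void my_tree_setup(void)
	{
		/* my_tree now contains exactly one node, coloured black */
		rb_root_init(&my_tree, &first_entry.node);
	}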
[RFCv6 02/13] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
This commit adds a bitmap_find_next_zero_area_off() function which
works like the bitmap_find_next_zero_area() function except that it
allows an offset to be specified when alignment is checked.  This lets
the caller request a bit such that its number plus the offset is
aligned according to the mask.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/bitmap.h |   24 +++++++++++++++++-------
 lib/bitmap.c           |   22 ++++++++++++----------
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..c0528d1 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -45,6 +45,7 @@
 * bitmap_set(dst, pos, nbits)			Set specified bit area
 * bitmap_clear(dst, pos, nbits)		Clear specified bit area
 * bitmap_find_next_zero_area(buf, len, pos, n, mask)	Find bit free area
+* bitmap_find_next_zero_area_off(buf, len, pos, n, mask)	as above
 * bitmap_shift_right(dst, src, n, nbits)	*dst = *src >> n
 * bitmap_shift_left(dst, src, n, nbits)	*dst = *src << n
 * bitmap_remap(dst, src, old, new, nbits)	*dst = map(old, new)(src)
@@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits);
 
 extern void bitmap_set(unsigned long *map, int i, int len);
 extern void bitmap_clear(unsigned long *map, int start, int nr);
-extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
-					 unsigned long size,
-					 unsigned long start,
-					 unsigned int nr,
-					 unsigned long align_mask);
+
+extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+						    unsigned long size,
+						    unsigned long start,
+						    unsigned int nr,
+						    unsigned long align_mask,
+						    unsigned long align_offset);
+
+static inline unsigned long
+bitmap_find_next_zero_area(unsigned long *map,
+			   unsigned long size,
+			   unsigned long start,
+			   unsigned int nr,
+			   unsigned long align_mask)
+{
+	return bitmap_find_next_zero_area_off(map, size, start, nr,
+					      align_mask, 0);
+}
 
 extern int bitmap_scnprintf(char *buf, unsigned int len,
 			const unsigned long *src, int nbits);
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 741fae9..8e75a6f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr)
 }
 EXPORT_SYMBOL(bitmap_clear);
 
-/*
+/**
 * bitmap_find_next_zero_area - find a contiguous aligned zero area
 * @map: The address to base the search on
 * @size: The bitmap size in bits
 * @start: The bitnumber to start searching at
 * @nr: The number of zeroed bits we're looking for
 * @align_mask: Alignment mask for zero area
+* @align_offset: Alignment offset for zero area.
 *
 * The @align_mask should be one less than a power of 2; the effect is that
-* the bit offset of all zero areas this function finds is multiples of that
-* power of 2. A @align_mask of 0 means no alignment is required.
+* the bit offset of all zero areas this function finds plus @align_offset
+* is multiple of that power of 2.
 */
-unsigned long bitmap_find_next_zero_area(unsigned long *map,
-					 unsigned long size,
-					 unsigned long start,
-					 unsigned int nr,
-					 unsigned long align_mask)
+unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+					     unsigned long size,
+					     unsigned long start,
+					     unsigned int nr,
+					     unsigned long align_mask,
+					     unsigned long align_offset)
 {
 	unsigned long index, end, i;
 again:
 	index = find_next_zero_bit(map, size, start);
 
 	/* Align allocation */
-	index = __ALIGN_MASK(index, align_mask);
+	index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset;
 
 	end = index + nr;
 	if (end > size)
@@ -350,7 +352,7 @@ again:
 	}
 	return index;
 }
-EXPORT_SYMBOL(bitmap_find_next_zero_area);
+EXPORT_SYMBOL(bitmap_find_next_zero_area_off);
 
 /*
 * Bitmap printing & parsing functions: first version by Bill Irwin,
-- 
1.7.2.3
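This is the same change reposted from the PATCHv7 series above, so just
one extra note: the rounding expression it introduces can be
sanity-checked in isolation.  A standalone sketch, assuming the
kernel's definition of __ALIGN_MASK():

	#include <assert.h>

	/* same definition as the kernel's __ALIGN_MASK() */
	#define __ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))

	int main(void)
	{
		unsigned long index = 3, off = 3, mask = 3;	/* align to 4 */
		unsigned long got = __ALIGN_MASK(index + off, mask) - off;

		assert(got == 5);			/* smallest index' >= index ... */
		assert(((got + off) & mask) == 0);	/* ... with index' + off aligned */
		return 0;
	}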
[RFCv6 05/13] mm: cma: debugfs support added
The debugfs development interface lets one change the map attribute at
run time as well as observe what regions have been reserved.

Signed-off-by: Michal Nazarewicz <m.nazarew...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 Documentation/contiguous-memory.txt |    4 +
 include/linux/cma.h                 |   11 +
 mm/Kconfig                          |   25 ++-
 mm/cma.c                            |  501 ++++++++++++++++++++++++++++++++-
 4 files changed, 537 insertions(+), 4 deletions(-)

diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
index f1715ba..ec09d8e 100644
--- a/Documentation/contiguous-memory.txt
+++ b/Documentation/contiguous-memory.txt
@@ -258,6 +258,10 @@
      iff it matched in previous pattern.  If the second part is
      omitted it will match any type of memory requested by device.
 
+     If debugfs support is enabled, this attribute is accessible via
+     debugfs and can be changed at run-time by writing to
+     contiguous/map.
+
      Some examples (whitespace added for better readability):
 
          cma_map = foo/quaz = r1;
diff --git a/include/linux/cma.h b/include/linux/cma.h
index a6031a7..8437104 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -24,6 +24,7 @@
 struct device;
 struct cma_info;
+struct dentry;
 
 /**
 * struct cma - an allocated contiguous chunk of memory.
@@ -276,6 +277,11 @@ struct cma_region {
 	unsigned users;
 	struct list_head list;
 
+#if defined CONFIG_CMA_DEBUGFS
+	const char *to_alloc_link, *from_alloc_link;
+	struct dentry *dir, *to_alloc, *from_alloc;
+#endif
+
 	unsigned used:1;
 	unsigned registered:1;
 	unsigned reserved:1;
@@ -382,6 +388,11 @@ struct cma_allocator {
 	void (*unpin)(struct cma *chunk);
 
 	struct list_head list;
+
+#if defined CONFIG_CMA_DEBUGFS
+	const char *dir_name;
+	struct dentry *regs_dir;
+#endif
 };
 
 /**
diff --git a/mm/Kconfig b/mm/Kconfig
index c7eb1bc..a5480ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -351,16 +351,35 @@ config CMA
 	  For more information see <Documentation/contiguous-memory.txt>.
 	  If unsure, say "n".
 
-config CMA_DEBUG
-	bool "CMA debug messages (DEVELOPEMENT)"
+config CMA_DEVELOPEMENT
+	bool "Include CMA developement features"
 	depends on CMA
 	help
+	  This lets you enable some developement features of the CMA
+	  framework.  It does not add any code to the kernel.
+
+	  Those options are mostly usable during development and testing.
+	  If unsure, say "n".
+
+config CMA_DEBUG
+	bool "CMA debug messages"
+	depends on CMA_DEVELOPEMENT
+	help
 	  Turns on debug messages in CMA.  This produces KERN_DEBUG
 	  messages for every CMA call as well as various messages while
 	  processing calls such as cma_alloc().  This option does not
 	  affect warning and error messages.
 
-	  This is mostly used during development.  If unsure, say "n".
+config CMA_DEBUGFS
+	bool "CMA debugfs interface support"
+	depends on CMA_DEVELOPEMENT && DEBUG_FS
+	help
+	  Enable support for debugfs interface.  It is available under the
+	  "contiguous" directory in the debugfs root directory.  Each
+	  region and allocator is represented there.
+
+	  For more information consult
+	  <Documentation/contiguous-memory.txt>.
 
 config CMA_GENERIC_ALLOCATOR
 	bool "CMA generic allocator"
diff --git a/mm/cma.c b/mm/cma.c
index 17276b3..dfdeeb7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -34,11 +34,16 @@
 #include <linux/slab.h>		/* kmalloc() */
 #include <linux/string.h>	/* str*() */
 #include <linux/genalloc.h>	/* gen_pool_*() */
+#include <linux/debugfs.h>	/* debugfs stuff */
+#include <linux/uaccess.h>	/* copy_{to,from}_user */
 
 #include <linux/cma.h>
 
-/* Protects cma_regions, cma_allocators, cma_map and cma_map_length. */
+/*
+ * Protects cma_regions, cma_allocators, cma_map, cma_map_length,
+ * cma_dfs_regions and cma_dfs_allocators.
+ */
 static DEFINE_MUTEX(cma_mutex);
 
 
@@ -139,7 +144,13 @@ int __init __must_check cma_early_region_register(struct cma_region *reg)
 
 /* Regions & Allocators */
 
+static void __cma_dfs_region_add(struct cma_region *reg);
+static void __cma_dfs_region_alloc_update(struct cma_region *reg);
+static void __cma_dfs_allocator_add(struct cma_allocator *alloc);
+
 static int __cma_region_attach_alloc(struct cma_region *reg);
+static void __maybe_unused __cma_region_detach_alloc(struct cma_region *reg);
+
 
 /* List of all regions.  Named regions are kept before unnamed. */
 static LIST_HEAD(cma_regions);
@@ -222,6 +233,8 @@ int __must_check cma_region_register(struct cma_region *reg)
 	else
 		list_add_tail(&reg->list, &cma_regions);
 
+	__cma_dfs_region_add(reg);
+
 done:
 	mutex_unlock(&cma_mutex);
 
@@ -298,6 +311,8 @@ int
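The message is truncated above, but the debugfs plumbing follows the
usual kernel pattern.  A minimal sketch, with hypothetical cma_dfs_*
names standing in for the patch's real __cma_dfs_* helpers declared in
the hunk above:

	#include <linux/debugfs.h>
	#include <linux/err.h>
	#include <linux/fs.h>
	#include <linux/module.h>

	static struct dentry *cma_dfs_root;

	static ssize_t map_read(struct file *file, char __user *buf,
				size_t len, loff_t *off)
	{
		/* would copy the current "map" attribute out to userspace */
		return 0;
	}

	static ssize_t map_write(struct file *file, const char __user *buf,
				 size_t len, loff_t *off)
	{
		/* would parse the new map string and install it under cma_mutex */
		return len;
	}

	static const struct file_operations cma_dfs_map_ops = {
		.owner = THIS_MODULE,
		.read  = map_read,
		.write = map_write,
	};

	static int __init cma_dfs_init(void)
	{
		/* everything lives under <debugfs>/contiguous/ */
		cma_dfs_root = debugfs_create_dir("contiguous", NULL);
		if (IS_ERR_OR_NULL(cma_dfs_root))
			return -ENODEV;

		/* the run-time writable "map" attribute mentioned in the docs */
		debugfs_create_file("map", 0600, cma_dfs_root, NULL,
				    &cma_dfs_map_ops);
		return 0;
	}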