Re: [PATCH 2/9] mm: Provide new get_vaddr_frames() helper

2015-05-11 Thread Mel Gorman
On Mon, May 11, 2015 at 04:00:19PM +0200, Jan Kara wrote:
   +int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
   +  bool write, bool force, struct frame_vector *vec)
   +{
   + struct mm_struct *mm = current->mm;
   + struct vm_area_struct *vma;
   + int ret = 0;
   + int err;
   + int locked = 1;
   +
  
  bool locked.
   It cannot be bool. It is passed to get_user_pages_locked() which expects
 int *.
 

My bad.

   +int frame_vector_to_pages(struct frame_vector *vec)
   +{
  
  I think it's probably best to make the relevant counters in frame_vector
  signed and limit the maximum possible size of it. It's still not putting
  any practical limit on the size of the frame_vector.

   I don't see a reason why counters in frame_vector should be signed... Can
 you share your reason?  I've added a check into frame_vector_create() to
 limit the number of frames to INT_MAX / sizeof(void *) / 2 to avoid arithmetic
 overflow. Thanks for review!
 

Only that frame_vector_to_pages() returns an int whereas the potential range
being converted is an unsigned int. I don't think there are any mistakes in
the signed/unsigned handling, but I don't see any advantage in using unsigned
and then limiting it to INT_MAX either. It's not a big deal.
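
For reference, the guard Jan describes in frame_vector_create() would be
roughly of this shape (a sketch, not the exact hunk):

	if (WARN_ON_ONCE(nr_frames > INT_MAX / sizeof(void *) / 2))
		return NULL;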

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/9] mm: Provide new get_vaddr_frames() helper

2015-05-08 Thread Mel Gorman
 mmap_sem as necessary.
 + */
 +int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 +  bool write, bool force, struct frame_vector *vec)
 +{
  + struct mm_struct *mm = current->mm;
 + struct vm_area_struct *vma;
 + int ret = 0;
 + int err;
 + int locked = 1;
 +

bool locked.

 + if (nr_frames == 0)
 + return 0;
 +
  + if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
  + nr_frames = vec->nr_allocated;
 +
  + down_read(&mm->mmap_sem);
 + vma = find_vma_intersection(mm, start, start + 1);
 + if (!vma) {
 + ret = -EFAULT;
 + goto out;
 + }
  + if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
  + vec->got_ref = 1;
  + vec->is_pfns = 0;

They're bools and while correct, it looks weird. There are a few
instances of this but I won't comment on it again.

  + ret = get_user_pages_locked(current, mm, start, nr_frames,
  + write, force, (struct page **)(vec->ptrs), &locked);
 + goto out;
 + }
 +
  + vec->got_ref = 0;
  + vec->is_pfns = 1;
 + do {
 + unsigned long *nums = frame_vector_pfns(vec);
 +
  + while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
 + err = follow_pfn(vma, start, nums[ret]);
 + if (err) {
 + if (ret == 0)
 + ret = err;
 + goto out;
 + }
 + start += PAGE_SIZE;
 + ret++;
 + }
 + /*
 +  * We stop if we have enough pages or if VMA doesn't completely
 +  * cover the tail page.
 +  */
  + if (ret >= nr_frames || start < vma->vm_end)
 + break;
 + vma = find_vma_intersection(mm, start, start + 1);
  + } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
 +out:
 + if (locked)
  + up_read(&mm->mmap_sem);
 + if (!ret)
 + ret = -EFAULT;
  + if (ret > 0)
  + vec->nr_frames = ret;
 + return ret;
 +}
 +EXPORT_SYMBOL(get_vaddr_frames);
 +
 +/**
 + * put_vaddr_frames() - drop references to pages if get_vaddr_frames() 
 acquired
 + *   them
 + * @vec: frame vector to put
 + *
 + * Drop references to pages if get_vaddr_frames() acquired them. We also
 + * invalidate the frame vector so that it is prepared for the next call into
 + * get_vaddr_frames().
 + */
 +void put_vaddr_frames(struct frame_vector *vec)
 +{
 + int i;
 + struct page **pages;
 +
  + if (!vec->got_ref)
 + goto out;
 + pages = frame_vector_pages(vec);
 + /*
  +  * frame_vector_pages() might need to do a conversion when
 +  * get_vaddr_frames() got pages but vec was later converted to pfns.
 +  * But it shouldn't really fail to convert pfns back...
 +  */
 + BUG_ON(IS_ERR(pages));

It's hard to see how it could fail, but it is recoverable, so BUG_ON is
overkill. WARN_ON and return instead, even though we may have leaked the
page references at that point.
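
Something along these lines would do (sketch only):

	if (WARN_ON(IS_ERR(pages)))
		goto out;	/* the page references are potentially leaked */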

  + for (i = 0; i < vec->nr_frames; i++)
 + put_page(pages[i]);
  + vec->got_ref = 0;
 +out:
  + vec->nr_frames = 0;
 +}
 +EXPORT_SYMBOL(put_vaddr_frames);
 +
 +/**
 + * frame_vector_to_pages - convert frame vector to contain page pointers
 + * @vec: frame vector to convert
 + *
 + * Convert @vec to contain array of page pointers.  If the conversion is
 + * successful, return 0. Otherwise return an error.
 + */

If you do another version, it would not hurt to mention that a page
reference is not taken when doing this conversion.
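
For example, a one-line addition to the kerneldoc would cover it (a sketch):

 * Note that a page reference is not taken on the pages when doing this
 * conversion.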

 +int frame_vector_to_pages(struct frame_vector *vec)
 +{

I think it's probably best to make the relevant counters in frame_vector
signed and limit the maximum possible size of it. It's still not putting
any practical limit on the size of the frame_vector.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/9] mm: Provide new get_vaddr_pfns() helper

2015-04-30 Thread Mel Gorman
;
 + int size = sizeof(struct pinned_pfns) + sizeof(unsigned long) * nr_pfns;
 +

Should be sizeof(void *) 

  + if (WARN_ON_ONCE(nr_pfns <= 0))
 + return NULL;
  + if (size <= PAGE_SIZE)
 + pfns = kmalloc(size, GFP_KERNEL);

kmalloc supports larger sizes than this but I suspect this was
deliberate to avoid high-order allocations.

 + else
 + pfns = vmalloc(size);
 + if (!pfns)
 + return NULL;
  + pfns->nr_allocated_pfns = nr_pfns;
  + pfns->nr_pfns = 0;
 + return pfns;
 +}
 +EXPORT_SYMBOL(pfns_vector_create);
 +
 +/**
 + * pfns_vector_destroy() - free memory allocated to carry pinned pfns
  + * @pfns: Memory to free
 + *
 + * Free structure allocated by pfns_vector_create() to carry pinned pfns.
 + */
 +void pfns_vector_destroy(struct pinned_pfns *pfns)
 +{
 + if (!is_vmalloc_addr(pfns))
 + kfree(pfns);
 + else
 + vfree(pfns);
 +}
 +EXPORT_SYMBOL(pfns_vector_destroy);
 +
 +/**
   * get_dump_page() - pin user page in memory while writing it to core dump
   * @addr: user address
   *
 -- 
 2.1.4
 

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks

2012-02-10 Thread Mel Gorman
On Wed, Feb 08, 2012 at 04:14:46PM +0100, Marek Szyprowski wrote:
   SNIP
   +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
   +{
   + enum zone_type high_zoneidx = gfp_zone(gfp_mask);
   + struct zonelist *zonelist = node_zonelist(0, gfp_mask);
   + int did_some_progress = 0;
   + int order = 1;
   + unsigned long watermark;
   +
   + /*
   +  * Increase level of watermarks to force kswapd do his job
   +  * to stabilize at new watermark level.
   +  */
   + min_free_kbytes += count * PAGE_SIZE / 1024;
  
  There is a risk of overflow here although it is incredibly
  small. Still, a potentially nicer way of doing this was
  
   count << (PAGE_SHIFT - 10)
  
   + setup_per_zone_wmarks();
   +
  
  Nothing prevents two or more processes updating the wmarks at the same
  time which is racy and unpredictable. Today it is not much of a problem
  but CMA makes this path hotter than it was and you may see weirdness
  if two processes are updating zonelists at the same time. Swap-over-NFS
  actually starts with a patch that serialises setup_per_zone_wmarks()
  
  You also potentially have a BIG problem here if this happens
  
  min_free_kbytes = 32768
  Process a: min_free_kbytes  += 65536
  Process a: start direct reclaim
   echo 16374 > /proc/sys/vm/min_free_kbytes
  Process a: exit direct_reclaim
  Process a: min_free_kbytes -= 65536
  
  min_free_kbytes now wraps negative and the machine hangs.
  
  The damage is confined to CMA though so I am not going to lose sleep
  over it but you might want to consider at least preventing parallel
  updates to min_free_kbytes from proc.
 
 Right. This approach was definitely too hacky. What do you think about 
 replacing 
 it with the following code (I assume that setup_per_zone_wmarks() 
 serialization 
 patch will be merged anyway so I skipped it here):
 

It's part of a larger series and the rest of that series is
controversial. That single patch can be split out obviously so feel free
to add it to your series and stick your Signed-off-by on the end of it.
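
For completeness, the serialisation in question is roughly of this shape
(a sketch; the lock is an assumption, not the exact patch):

	static DEFINE_MUTEX(wmark_mutex);

	void setup_per_zone_wmarks(void)
	{
		mutex_lock(&wmark_mutex);
		__setup_per_zone_wmarks();	/* the existing body, renamed */
		mutex_unlock(&wmark_mutex);
	}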

 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
 index 82f4fa5..bb9ae41 100644
 --- a/include/linux/mmzone.h
 +++ b/include/linux/mmzone.h
 @@ -371,6 +371,13 @@ struct zone {
 /* see spanned/present_pages for more description */
 seqlock_t   span_seqlock;
  #endif
 +#ifdef CONFIG_CMA
 +   /*
 +* CMA needs to increase watermark levels during the allocation
 +* process to make sure that the system is not starved.
 +*/
 +   unsigned long   min_cma_pages;
 +#endif
 struct free_areafree_area[MAX_ORDER];
 
  #ifndef CONFIG_SPARSEMEM
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 824fb37..1ca52f0 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -5044,6 +5044,11 @@ void setup_per_zone_wmarks(void)
 
 	zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + (tmp >> 2);
 	zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);
 +#ifdef CONFIG_CMA
 +   zone->watermark[WMARK_MIN] += zone->min_cma_pages;
 +   zone->watermark[WMARK_LOW] += zone->min_cma_pages;
 +   zone->watermark[WMARK_HIGH] += zone->min_cma_pages;
 +#endif
 setup_zone_migrate_reserve(zone);
 	spin_unlock_irqrestore(&zone->lock, flags);
 }

This is better in that it is not vulnerable to parallel updates of
min_free_kbytes. It would be slightly tidier to introduce something
like cma_wmark_pages() that returns min_cma_pages if CONFIG_CMA is set
and 0 otherwise, and to use that helper to get rid of the ifdef
CONFIG_CMA within setup_per_zone_wmarks().
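
Something like this, for example (a sketch):

#ifdef CONFIG_CMA
static unsigned long cma_wmark_pages(struct zone *zone)
{
	return zone->min_cma_pages;
}
#else
static unsigned long cma_wmark_pages(struct zone *zone)
{
	return 0;
}
#endif

and then in setup_per_zone_wmarks():

	zone->watermark[WMARK_MIN]  += cma_wmark_pages(zone);
	zone->watermark[WMARK_LOW]  += cma_wmark_pages(zone);
	zone->watermark[WMARK_HIGH] += cma_wmark_pages(zone);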

You'll still have the problem of kswapd not taking CMA pages properly into
account when deciding whether to reclaim or not though.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating

2012-02-03 Thread Mel Gorman
On Thu, Feb 02, 2012 at 08:53:25PM +0100, Michal Nazarewicz wrote:
 On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
 Pages, which have incorrect migrate type on free finally
 causes pageblock migration type change from MIGRATE_CMA to MIGRATE_MOVABLE.
 
 On Thu, 02 Feb 2012 13:47:29 +0100, Mel Gorman m...@csn.ul.ie wrote:
 I'm not quite seeing this. In free_hot_cold_page(), the pageblock
 type is checked so the page private should be set to MIGRATE_CMA or
 MIGRATE_ISOLATE for the CMA area. It's not clear how this can change a
 pageblock to MIGRATE_MOVABLE in error.
 
 Here's what I think may happen:
 
 When drain_all_pages() is called, __free_one_page() is called for each page on
 pcp list with migrate type deducted from page_private() which is MIGRATE_CMA.
 This result in the page being put on MIGRATE_CMA freelist even though its
 pageblock's migrate type is MIGRATE_ISOLATE.
 

Ok, although it will only be allocated for MIGRATE_CMA-compatible
requests so it is not a disaster.

 When allocation happens and pcp list is empty, rmqueue_bulk() will get 
 executed
 with migratetype argument set to MIGRATE_MOVABLE.  It calls __rmqueue() to 
 grab
 some pages and because the page described above is on MIGRATE_CMA freelist it
 may be returned back to rmqueue_bulk().
 

This will allocate the page from a pageblock we are trying to isolate
pages from, but only for a movable page that can still be migrated. It
does mean that CMA is doing more work than it should of course and
the problem also impacts memory hot-remove. It's worse for memory
hot-remove because potentially an UNMOVABLE page was allocated from
a MIGRATE_ISOLATE pageblock.

 But, pageblock's migrate type is not MIGRATE_CMA but MIGRATE_ISOLATE, so the
 following code:
 
 #ifdef CONFIG_CMA
   if (is_pageblock_cma(page))
   set_page_private(page, MIGRATE_CMA);
   else
 #endif
   set_page_private(page, migratetype);
 
 will set its private to MIGRATE_MOVABLE and in the end the page lands back
 on MIGRATE_MOVABLE pcp list but this time with page_private == MIGRATE_MOVABLE
 and not MIGRATE_CMA.
 
 One more drain_all_pages() (which may happen since alloc_contig_range() calls
 set_migratetype_isolate() for each block) and next __rmqueue_fallback() may
 convert the whole pageblock to MIGRATE_MOVABLE.
 
 I know, this sounds crazy and improbable, but I couldn't find an easier path
 to destruction.  As you pointed out, once the page is allocated,
 free_hot_cold_page()
 will do the right thing by reading pageblock's migrate type.
 

Ok, it's crazy but the problem is there.

 Marek is currently experimenting with various patches including the following
 change:
 
 #ifdef CONFIG_CMA
 int mt = get_pageblock_migratetype(page);
 if (is_migrate_cma(mt) || mt == MIGRATE_ISOLATE)
 set_page_private(page, mt);
 else
 #endif
 set_page_private(page, migratetype);
 
 As a matter of fact, if __rmqueue() was changed to return migrate type of the
 freelist it took the page from, we could avoid this get_pageblock_migratetype()
 altogether.  For now, however, I'd rather not go that way just yet -- I'll be happy
 to dig into it once CMA gets merged.
 

Ok, thanks for persisting with this.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 03/15] mm: compaction: introduce map_pages()

2012-02-03 Thread Mel Gorman
On Fri, Feb 03, 2012 at 01:18:46PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 This commit creates a map_pages() function which maps pages freed
 using split_free_pages().  This merely moves some code from
 isolate_freepages() so that it can be reused in other places.
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com

Acked-by: Mel Gorman m...@csn.ul.ie

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added

2012-02-03 Thread Mel Gorman
On Fri, Feb 03, 2012 at 01:18:51PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 The MIGRATE_CMA migration type has two main characteristics:
 (i) only movable pages can be allocated from MIGRATE_CMA
 pageblocks and (ii) page allocator will never change migration
 type of MIGRATE_CMA pageblocks.
 
 This guarantees (to some degree) that page in a MIGRATE_CMA page
 block can always be migrated somewhere else (unless there's no
 memory left in the system).
 
 It is designed to be used for allocating big chunks (eg. 10MiB)
 of physically contiguous memory.  Once driver requests
 contiguous memory, pages from MIGRATE_CMA pageblocks may be
 migrated away to create a contiguous block.
 
 To minimise number of migrations, MIGRATE_CMA migration type
 is the last type tried when page allocator falls back to other
 migration types than requested.
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 Tested-by: Rob Clark rob.cl...@linaro.org
 Tested-by: Ohad Ben-Cohen o...@wizery.com
 Tested-by: Benjamin Gaignard benjamin.gaign...@linaro.org

Minor comments that can be handled as a follow-up but otherwise

Acked-by: Mel Gorman m...@csn.ul.ie

 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 238fcec..993c375 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -750,6 +750,26 @@ void __meminit __free_pages_bootmem(struct page *page, 
 unsigned int order)
   __free_pages(page, order);
  }
  
 +#ifdef CONFIG_CMA
 +/*
 + * Free whole pageblock and set it's migration type to MIGRATE_CMA.
 + */
 +void __init init_cma_reserved_pageblock(struct page *page)
 +{
 + unsigned i = pageblock_nr_pages;
 + struct page *p = page;
 +
 + do {
 + __ClearPageReserved(p);
 + set_page_count(p, 0);
 + } while (++p, --i);
 +
 + set_page_refcounted(page);
 + set_pageblock_migratetype(page, MIGRATE_CMA);
 + __free_pages(page, pageblock_order);
 + totalram_pages += pageblock_nr_pages;
 +}
 +#endif
  

This chunk is not used by this patch. Usually a hunk like this would
be part of the patch that uses it.  Functionally it looks ok but
I see that the function that calls it is *not* __init. That should
trigger a section warning. Do a make CONFIG_DEBUG_SECTION_MISMATCH=y
and clean it up if necessary.

  /*
   * The order of subdivision here is critical for the IO subsystem.
 @@ -875,10 +895,15 @@ struct page *__rmqueue_smallest(struct zone *zone, 
 unsigned int order,
   * This array describes the order lists are fallen back to when
   * the free lists for the desirable migrate type are depleted
   */
 -static int fallbacks[MIGRATE_TYPES][3] = {
 - [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 - [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 - [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
 +static int fallbacks[MIGRATE_TYPES][4] = {
 + [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
 + [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE, MIGRATE_RESERVE },
 +#ifdef CONFIG_CMA
 + [MIGRATE_MOVABLE] = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
 + [MIGRATE_CMA] = { MIGRATE_RESERVE }, /* Never used */
 +#else
 + [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
 +#endif
   [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
   [MIGRATE_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */
  };
 @@ -995,11 +1020,18 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* pages to the preferred allocation list. If falling
* back for a reclaimable kernel allocation, be more
* aggressive about taking ownership of free pages
 +  *
 +  * On the other hand, never change migration
 +  * type of MIGRATE_CMA pageblocks nor move CMA
 +  * pages on different free lists. We don't
 +  * want unmovable pages to be allocated from
 +  * MIGRATE_CMA areas.
*/
 - if (unlikely(current_order >= (pageblock_order >> 1)) ||
 - start_migratetype == MIGRATE_RECLAIMABLE ||
 - page_group_by_mobility_disabled) {
 - unsigned long pages;
 + if (!is_migrate_cma(migratetype) &&
 + (unlikely(current_order >= pageblock_order / 2) ||
 +  start_migratetype == MIGRATE_RECLAIMABLE ||
 +  page_group_by_mobility_disabled

Re: [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks

2012-02-03 Thread Mel Gorman
On Fri, Feb 03, 2012 at 01:18:54PM +0100, Marek Szyprowski wrote:
 alloc_contig_range() performs memory allocation so it also should keep
 track on keeping the correct level of memory watermarks. This commit adds
 a call to *_slowpath style reclaim to grab enough pages to make sure that
 the final collection of contiguous pages from freelists will not starve
 the system.
 
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 CC: Michal Nazarewicz min...@mina86.com
 Tested-by: Rob Clark rob.cl...@linaro.org
 Tested-by: Ohad Ben-Cohen o...@wizery.com
 Tested-by: Benjamin Gaignard benjamin.gaign...@linaro.org

I still do not intend to ack this patch and any damage is confined to
CMA but I have a few comments anyway.

 ---
  mm/page_alloc.c |   47 +++
  1 files changed, 47 insertions(+), 0 deletions(-)
 
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 983ccba..371a79f 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -5632,6 +5632,46 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	return ret > 0 ? 0 : ret;
  }
  
 +/*
 + * Trigger memory pressure bump to reclaim some pages in order to be able to
 + * allocate 'count' pages in single page units. Does similar work as
 + * __alloc_pages_slowpath() function.
 + */
 +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
 +{
 + enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 + struct zonelist *zonelist = node_zonelist(0, gfp_mask);
 + int did_some_progress = 0;
 + int order = 1;
 + unsigned long watermark;
 +
 + /*
 +  * Increase level of watermarks to force kswapd do his job
 +  * to stabilize at new watermark level.
 +  */
 + min_free_kbytes += count * PAGE_SIZE / 1024;

There is a risk of overflow here although it is incredibly
small. Still, a potentially nicer way of doing this was

count << (PAGE_SHIFT - 10)

 + setup_per_zone_wmarks();
 +

Nothing prevents two or more processes updating the wmarks at the same
time which is racy and unpredictable. Today it is not much of a problem
but CMA makes this path hotter than it was and you may see weirdness
if two processes are updating zonelists at the same time. Swap-over-NFS
actually starts with a patch that serialises setup_per_zone_wmarks()

You also potentially have a BIG problem here if this happens

min_free_kbytes = 32768
Process a: min_free_kbytes  += 65536
Process a: start direct reclaim
echo 16374 > /proc/sys/vm/min_free_kbytes
Process a: exit direct_reclaim
Process a: min_free_kbytes -= 65536

min_free_kbytes now wraps negative and the machine hangs.

The damage is confined to CMA though so I am not going to lose sleep
over it but you might want to consider at least preventing parallel
updates to min_free_kbytes from proc.

-- 
Mel Gorman
SUSE Labs


Re: [PATCHv20 00/15] Contiguous Memory Allocator

2012-02-03 Thread Mel Gorman
On Fri, Feb 03, 2012 at 01:18:43PM +0100, Marek Szyprowski wrote:
 Welcome everyone again!
 
 This is yet another quick update on Contiguous Memory Allocator patches.
 This version includes another set of code cleanups requested by Mel
 Gorman and a few minor bug fixes. I really hope that this version will
 be accepted for merging and future development will be handled by
 incremental patches.

FWIW, I've acked all I'm going to ack of this series and made some
suggestions on follow-ups on the core MM parts that could be done
in-tree. I think the current reclaim logic is going to burn CMA with
race conditions but it is a CMA-specific problem so watch out for
that :)

As before, I did not even look at the CMA driver itself or the
arch-specific parts. I'm assuming Arnd has that side of things covered.

Thanks Marek.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating

2012-02-02 Thread Mel Gorman
On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
   +   page = pfn_to_page(pfn);
   +   if (PageBuddy(page)) {
  +   pfn += 1 << page_order(page);
   +   } else if (page_count(page) == 0) {
   +   set_page_private(page, MIGRATE_ISOLATE);
   +   ++pfn;
   
   This is dangerous for two reasons. If the page_count is 0, it could
   be because the page is in the process of being freed and is not
   necessarily on the per-cpu lists yet and you cannot be sure if the
  contents of page->private are important. Second, there is nothing to
   prevent another CPU allocating this page from its per-cpu list while
   the private field is getting updated from here which might lead to
   some interesting races.
   
   I recognise that what you are trying to do is respond to Gilad's
   request that you really check if an IPI here is necessary. I think what
   you need to do is check if a page with a count of 0 is encountered
   and if it is, then a draining of the per-cpu lists is necessary. To
  address Gilad's concerns, be sure to only do this once per attempt at
   CMA rather than for every page encountered with a count of 0 to avoid a
   storm of IPIs.
  
  It's actually more than that.
  
   This is the same issue that I first fixed with a change to 
   free_pcppages_bulk()
  function[1].  At the time of posting, you said you'd like me to try and find
   find
   a different solution which would not involve paying the price of calling
   get_pageblock_migratetype().  Later I also realised that this solution is
   not enough.
  
   [1] http://article.gmane.org/gmane.linux.kernel.mm/70314
  
  
  Yes. I had forgotten the history but looking at that patch again,
  I would reach the conclusion that this was adding a new call to
  get_pageblock_migratetype() in the bulk free path. That would affect
  everybody whether they were using CMA or not.
 
 This will be a bit ugly, but we can also use that code and compile it 
 conditionally
 when CMA has been enabled.

That would also be very unfortunate because it means enabling CMA incurs
a performance cost to everyone whether they use CMA or not. For ARM,
this may not be a problem but it would be for other arches if they
wanted to use CMA or if it ever became part of a distro config.

 Pages, which have incorrect migrate type on free finally
 causes pageblock migration type change from MIGRATE_CMA to MIGRATE_MOVABLE.

I'm not quite seeing this. In free_hot_cold_page(), the pageblock
type is checked so the page private should be set to MIGRATE_CMA or
MIGRATE_ISOLATE for the CMA area. It's not clear how this can change a
pageblock to MIGRATE_MOVABLE in error. If it turns out that you
absolutely have to call get_pageblock_migratetype() from
free_pcppages_bulk() and my alternative suggestion did not work out then
document all these issues in a comment when putting the call under
CONFIG_CMA so that it is not forgotten.
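
If it does come to that, the shape would be roughly as follows (a sketch;
the surrounding free_pcppages_bulk() context is assumed):

	int mt = page_private(page);
#ifdef CONFIG_CMA
	/*
	 * With CMA, page->private can hold a stale migratetype: the
	 * pageblock may have been isolated or converted to MIGRATE_CMA
	 * after the page was put on the pcp list, so re-read the
	 * pageblock's migratetype before freeing.
	 */
	mt = get_pageblock_migratetype(page);
#endif
	__free_one_page(page, zone, 0, mt);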

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 01/15] mm: page_alloc: remove trailing whitespace

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:43AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com

Ordinarily, I do not like these sort of patches because they can
interfere with git blame but as it is comments that are affected;

Acked-by: Mel Gorman m...@csn.ul.ie

Thanks

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 This commit changes set_migratetype_isolate() so that it updates
 migrate type of pages on pcp list which is saved in their
 page_private.
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 ---
  include/linux/page-isolation.h |6 ++
  mm/page_alloc.c|1 +
  mm/page_isolation.c|   24 
  3 files changed, 31 insertions(+), 0 deletions(-)
 
 diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
 index 051c1b1..8c02c2b 100644
 --- a/include/linux/page-isolation.h
 +++ b/include/linux/page-isolation.h
 @@ -27,6 +27,12 @@ extern int
  test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
  
  /*
 + * Check all pages in pageblock, find the ones on pcp list, and set
 + * their page_private to MIGRATE_ISOLATE.
 + */
 +extern void update_pcp_isolate_block(unsigned long pfn);
 +
 +/*
   * Internal funcs. Changes pageblock's migrate type.
   * Please use make_pagetype_isolated()/make_pagetype_movable().
   */
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index e1c5656..70709e7 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -5465,6 +5465,7 @@ out:
   if (!ret) {
   set_pageblock_migratetype(page, MIGRATE_ISOLATE);
   move_freepages_block(zone, page, MIGRATE_ISOLATE);
 + update_pcp_isolate_block(pfn);
   }
  
 	spin_unlock_irqrestore(&zone->lock, flags);
 diff --git a/mm/page_isolation.c b/mm/page_isolation.c
 index 4ae42bb..9ea2f6e 100644
 --- a/mm/page_isolation.c
 +++ b/mm/page_isolation.c
 @@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	spin_unlock_irqrestore(&zone->lock, flags);
   return ret ? 0 : -EBUSY;
  }
 +
 +/* must hold zone->lock */
 +void update_pcp_isolate_block(unsigned long pfn)
 +{
 + unsigned long end_pfn = pfn + pageblock_nr_pages;
 + struct page *page;
 +
 + while (pfn < end_pfn) {
 + if (!pfn_valid_within(pfn)) {
 + ++pfn;
 + continue;
 + }
 +

There is a potential problem here that you need to be aware of.
set_pageblock_migratetype() is called from start_isolate_page_range().
I do not think there is a guarantee that pfn + pageblock_nr_pages is
not in a different block of MAX_ORDER_NR_PAGES. If that is right then
your options are to add a check like this;

if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
break;

or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and in
the same block as pfn and relying on the caller to have called
pfn_valid.

 + page = pfn_to_page(pfn);
 + if (PageBuddy(page)) {
 + pfn += 1 << page_order(page);
 + } else if (page_count(page) == 0) {
 + set_page_private(page, MIGRATE_ISOLATE);
 + ++pfn;

This is dangerous for two reasons. If the page_count is 0, it could
be because the page is in the process of being freed and is not
necessarily on the per-cpu lists yet and you cannot be sure if the
contents of page->private are important. Second, there is nothing to
prevent another CPU allocating this page from its per-cpu list while
the private field is getting updated from here which might lead to
some interesting races.

I recognise that what you are trying to do is respond to Gilad's
request that you really check if an IPI here is necessary. I think what
you need to do is check if a page with a count of 0 is encountered
and if it is, then a draining of the per-cpu lists is necessary. To
address Gilad's concerns, be sure to only do this once per attempt at
CMA rather than for every page encountered with a count of 0 to avoid a
storm of IPIs.

 + } else {
 + ++pfn;
 + }
 + }
 +}
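
To illustrate the above (a sketch with assumed names, not a tested patch):

/* Scan the pageblock and report whether a single pcp drain is needed. */
static bool pageblock_needs_pcp_drain(unsigned long pfn)
{
	unsigned long end_pfn = pfn + pageblock_nr_pages;
	bool need_drain = false;

	while (pfn < end_pfn) {
		struct page *page;

		if (!pfn_valid_within(pfn)) {
			++pfn;
			continue;
		}
		page = pfn_to_page(pfn);
		if (PageBuddy(page)) {
			pfn += 1 << page_order(page);
		} else {
			if (page_count(page) == 0)
				need_drain = true;
			++pfn;
		}
	}
	return need_drain;
}

The caller would then issue drain_all_pages() at most once per
alloc_contig_range() attempt instead of touching page->private of pages it
does not own.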

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:45AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 This commit introduces isolate_migratepages_range() function which
 extracts functionality from isolate_migratepages() so that it can be
 used on arbitrary PFN ranges.
 
 isolate_migratepages() function is implemented as a simple wrapper
 around isolate_migratepages_range().
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com

Super, this is much easier to read. I have just one nit below but once
that is fixed;

Acked-by: Mel Gorman m...@csn.ul.ie

 @@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	} else if (!locked)
 		spin_lock_irq(&zone->lru_lock);
  
 - if (!pfn_valid_within(low_pfn))
 + if (!pfn_valid(low_pfn))
   continue;
   nr_scanned++;
  

This chunk looks unrelated to the rest of the patch.

I think what you are doing is patching around a bug that CMA exposed
which is very similar to the bug report at
http://www.spinics.net/lists/linux-mm/msg29260.html . Is this true?

If so, I posted a fix that only calls pfn_valid() when necessary. Can
you check if that works for you and if so, drop this hunk please? If
the patch does not work for you, then this hunk still needs to be
in a separate patch and handled separately as it would also be a fix
for -stable.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 04/15] mm: compaction: introduce isolate_freepages_range()

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:46AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 This commit introduces isolate_freepages_range() function which
 generalises isolate_freepages_block() so that it can be used on
 arbitrary PFN ranges.
 
 isolate_freepages_block() is left with only minor changes.
 

The minor changes to isolate_freepages_block() look fine in
terms of how current compaction works. I have a minor comment on
isolate_freepages_range() but it is up to you whether to address them
or not. Whether you alter isolate_freepages_range() or not;

Acked-by: Mel Gorman m...@csn.ul.ie

 SNIP
 @@ -105,6 +109,80 @@ static unsigned long isolate_freepages_block(struct zone 
 *zone,
   return total_isolated;
  }
  
 +/**
 + * isolate_freepages_range() - isolate free pages.
 + * @start_pfn: The first PFN to start isolating.
 + * @end_pfn:   The one-past-last PFN.
 + *
 + * Non-free pages, invalid PFNs, or zone boundaries within the
 + * [start_pfn, end_pfn) range are considered errors, cause function to
 + * undo its actions and return zero.
 + *
 + * Otherwise, function returns one-past-the-last PFN of isolated page
 + * (which may be greater than end_pfn if end fell in a middle of
 + * a free page).
 + */
 +static unsigned long
 +isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
 +{
 + unsigned long isolated, pfn, block_end_pfn, flags;
 + struct zone *zone = NULL;
 + LIST_HEAD(freelist);
 + struct page *page;
 +
 + for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
 + if (!pfn_valid(pfn))
 + break;
 +
 + if (!zone)
 + zone = page_zone(pfn_to_page(pfn));
 + else if (zone != page_zone(pfn_to_page(pfn)))
 + break;
 +

So what you are checking for here is if you straddle zones.
You could just initialise zone outside of the for loop. You can
then check outside the loop if end_pfn is in a different zone to
start_pfn. If it is, either adjust end_pfn accordingly or bail the
entire operation avoiding the need for release_freepages() later. This
will be a little cheaper.

 + /*
 +  * On subsequent iterations round_down() is actually not
 +  * needed, but we keep it that we not to complicate the code.
 +  */
 + block_end_pfn = round_down(pfn, pageblock_nr_pages)
 + + pageblock_nr_pages;

Seems a little more involved than it needs to be. Something like
this might suit and be a bit nicer?

block_end_pfn = ALIGN(pfn+1, pageblock_nr_pages);

 + block_end_pfn = min(block_end_pfn, end_pfn);
 +
 + spin_lock_irqsave(&zone->lock, flags);
 + isolated = isolate_freepages_block(pfn, block_end_pfn,
 +freelist, true);
 + spin_unlock_irqrestore(&zone->lock, flags);
 +
 + /*
 +  * In strict mode, isolate_freepages_block() returns 0 if
 +  * there are any holes in the block (ie. invalid PFNs or
 +  * non-free pages).
 +  */
 + if (!isolated)
 + break;
 +
 + /*
 +  * If we managed to isolate pages, it is always (1 << n) *
 +  * pageblock_nr_pages for some non-negative n.  (Max order
 +  * page may span two pageblocks).
 +  */
 + }
 +
 + /* split_free_page does not map the pages */
 + list_for_each_entry(page, freelist, lru) {
 + arch_alloc_page(page, 0);
 + kernel_map_pages(page, 1, 1);
 + }
 +

This block is copied in two places - isolate_freepages and
isolate_freepages_range() so sharing a common helper would be nice. I
suspect you didn't because it would interfere with existing code more
than was strictly necessary which I complained about previously as
it made review harder. If that was your thinking, then just create
this helper in a separate patch. It's not critical though.
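
For instance (a sketch of such a helper; this appears to be what the later
map_pages() patch in this series extracts):

static void map_pages(struct list_head *list)
{
	struct page *page;

	list_for_each_entry(page, list, lru) {
		arch_alloc_page(page, 0);
		kernel_map_pages(page, 1, 1);
	}
}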

 + if (pfn < end_pfn) {
 + /* Loop terminated early, cleanup. */
 + release_freepages(freelist);
 + return 0;
 + }
 +
 + /* We don't use freelists for anything. */
 + return pfn;
 +}
 +
  /* Returns true if the page is within a block suitable for migration to */
  static bool suitable_migration_target(struct page *page)
  {

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 04/15] mm: compaction: introduce isolate_freepages_range()

2012-01-30 Thread Mel Gorman
On Mon, Jan 30, 2012 at 11:48:20AM +, Mel Gorman wrote:
  +   if (!zone)
  +   zone = page_zone(pfn_to_page(pfn));
  +   else if (zone != page_zone(pfn_to_page(pfn)))
  +   break;
  +
 
 So what you are checking for here is if you straddle zones.
 You could just initialise zone outside of the for loop. You can
 then check outside the loop if end_pfn is in a different zone to
 start_pfn. If it is, either adjust end_pfn accordingly or bail the
 entire operation avoiding the need for release_freepages() later. This
 will be a little cheaper.
 

Whoops, silly me! You are watching for overlapping zones which can
happen in some rare configurations and for that checking page_zone()
like this is necessary. You can still initialise zone outside the loop
but the page_zone() check is still necessary.

My bad.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 05/15] mm: compaction: export some of the functions

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:47AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 This commit exports some of the functions from compaction.c file
 outside of it adding their declaration into internal.h header
 file so that other mm related code can use them.
 
 This forced compaction.c to always be compiled (as opposed to being
 compiled only if CONFIG_COMPACTION is defined) but as to avoid
 introducing code that the user did not ask for, part of compaction.c
 is now wrapped in an #ifdef.
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 ---
  mm/Makefile |3 +-
  mm/compaction.c |  314 
 ++-
  mm/internal.h   |   33 ++
  3 files changed, 184 insertions(+), 166 deletions(-)
 
 diff --git a/mm/Makefile b/mm/Makefile
 index 50ec00e..8aada89 100644
 --- a/mm/Makefile
 +++ b/mm/Makefile
 @@ -13,7 +13,7 @@ obj-y   := filemap.o mempool.o 
 oom_kill.o fadvise.o \
  readahead.o swap.o truncate.o vmscan.o shmem.o \
  prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
  page_isolation.o mm_init.o mmu_context.o percpu.o \
 -$(mmu-y)
 +compaction.o $(mmu-y)
  obj-y += init-mm.o
  
  ifdef CONFIG_NO_BOOTMEM
 @@ -32,7 +32,6 @@ obj-$(CONFIG_NUMA)  += mempolicy.o
  obj-$(CONFIG_SPARSEMEM)  += sparse.o
  obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
  obj-$(CONFIG_SLOB) += slob.o
 -obj-$(CONFIG_COMPACTION) += compaction.o
  obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
  obj-$(CONFIG_KSM) += ksm.o
  obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
 diff --git a/mm/compaction.c b/mm/compaction.c
 index 63f82be..3e21d28 100644
 --- a/mm/compaction.c
 +++ b/mm/compaction.c
 @@ -16,30 +16,11 @@
  #include linux/sysfs.h
  #include internal.h
  
 +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
 +

This is pedantic but you reference CONFIG_CMA before the patch that
declares it. The only time this really matters is when it breaks
bisection but I do not think that is the case here.

Whether you fix this or not by moving the CONFIG_CMA check to the same
patch that declares it in Kconfig

Acked-by: Mel Gorman m...@csn.ul.ie

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 06/15] mm: page_alloc: introduce alloc_contig_range()

2012-01-30 Thread Mel Gorman
 and
 +  * change MIGRATE_ISOLATE to some other migration type.
 +  *
 +  * Once the pageblocks are marked as MIGRATE_ISOLATE, we
 +  * migrate the pages from an unaligned range (ie. pages that
 +  * we are interested in).  This will put all the pages in
 +  * range back to page allocator as MIGRATE_ISOLATE.
 +  *
 +  * When this is done, we take the pages in range from page
 +  * allocator removing them from the buddy system.  This way
 +  * page allocator will never consider using them.
 +  *
 +  * This lets us mark the pageblocks back as
 +  * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
 +  * MAX_ORDER aligned range but not in the unaligned, original
 +  * range are put back to page allocator so that buddy can use
 +  * them.
 +  */
 +
 + ret = start_isolate_page_range(pfn_align_to_maxpage_down(start),
 +pfn_align_to_maxpage_up(end));
 + if (ret)
 + goto done;
 +
 + ret = __alloc_contig_migrate_range(start, end);
 + if (ret)
 + goto done;
 +
 + /*
 +  * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
 +  * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
 +  * more, all pages in [start, end) are free in page allocator.
 +  * What we are going to do is to allocate all pages from
 +  * [start, end) (that is remove them from page allocator).
 +  *
 +  * The only problem is that pages at the beginning and at the
 +  * end of interesting range may be not aligned with pages that
 +  * page allocator holds, ie. they can be part of higher order
 +  * pages.  Because of this, we reserve the bigger range and
 +  * once this is done free the pages we are not interested in.
 +  */
 +
 + lru_add_drain_all();
 + drain_all_pages();
 +

You unconditionally drain all pages here. It's up to you whether to
keep that or try reduce IPIs by only sending one if a page with count
0 is found in the range. I think it is something that could be followed
up on later and is not necessary for initial merging and wider testing.

 + order = 0;
 + outer_start = start;
 + while (!PageBuddy(pfn_to_page(outer_start))) {
 + if (WARN_ON(++order >= MAX_ORDER)) {
 + ret = -EINVAL;
 + goto done;
 + }
 + outer_start &= ~0UL << order;
 + }
 +

Just a small note here - you are checking PageBuddy without zone->lock.
As you have isolated the range, you have a reasonable expectation that
this is safe but if you spin another version of the patch it might
justify a small comment.
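
Something as simple as this would do:

	/*
	 * Note: PageBuddy() is tested without holding zone->lock. The range
	 * has been isolated above, so this is expected to be safe.
	 */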

 + /* Make sure the range is really isolated. */
 + if (test_pages_isolated(outer_start, end)) {
 + pr_warn("__alloc_contig_migrate_range: test_pages_isolated(%lx, %lx) failed\n",
 +	outer_start, end);
 + ret = -EBUSY;
 + goto done;
 + }
 +
 + outer_end = isolate_freepages_range(outer_start, end);
 + if (!outer_end) {
 + ret = -EBUSY;
 + goto done;
 + }
 +
 + /* Free head and tail (if any) */
 + if (start != outer_start)
 + free_contig_range(outer_start, start - outer_start);
 + if (end != outer_end)
 + free_contig_range(end, outer_end - end);
 +
 +done:
 + undo_isolate_page_range(pfn_align_to_maxpage_down(start),
 + pfn_align_to_maxpage_up(end));
 + return ret;
 +}
 +
 +void free_contig_range(unsigned long pfn, unsigned nr_pages)
 +{
 + for (; nr_pages--; ++pfn)
 + __free_page(pfn_to_page(pfn));
 +}
 +
 +#endif
 +
 +

Bit of whitespace damage there.

I confess that I did not read this one quite as carefully because I
think I looked a previous version that looked ok at the time. As it
affects CMA and only CMA I also expect others will be spending a lot
of effort and testing on this. Nothing obvious or horrible jumped out
at me other than the page_isolation.h thing and that could be argued
either way so;

Acked-by: Mel Gorman m...@csn.ul.ie

  /*
  #ifdef CONFIG_MEMORY_HOTREMOVE
  /*
   * All pages in the range must be isolated before calling this.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 07/15] mm: page_alloc: change fallbacks array handling

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:49AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 This commit adds a row for MIGRATE_ISOLATE type to the fallbacks array
 which was missing from it.  It also changes the array traversal logic
 a little, making MIGRATE_RESERVE an end marker.  The latter change
 removes the implicit MIGRATE_UNMOVABLE from the end of each row which
 was read by __rmqueue_fallback() function.
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com

Acked-by: Mel Gorman m...@csn.ul.ie

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added

2012-01-30 Thread Mel Gorman
 / 2) ||
 +  start_migratetype == MIGRATE_RECLAIMABLE ||
 +  page_group_by_mobility_disabled)) {
 + int pages;

You call is_pageblock_cma(page) here which in turn calls
get_pageblock_migratetype(). get_pageblock_migratetype() should be
avoided where possible and it is unnecessary in this context because
we know what the migratetype of the page is. Use that information instead of
calling get_pageblock_migratetype().

   pages = move_freepages_block(zone, page,
   
 start_migratetype);
  
 @@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, int 
 start_migratetype)
   rmv_page_order(page);
  
   /* Take ownership for orders = pageblock_order */
 - if (current_order >= pageblock_order)
 + if (current_order >= pageblock_order &&
 + !is_pageblock_cma(page))

Same, the get_pageblock_migratetype() call can be avoided.

   change_pageblock_range(page, current_order,
   start_migratetype);
  
 - expand(zone, page, order, current_order, area, migratetype);
 + expand(zone, page, order, current_order, area,
 +is_migrate_cma(start_migratetype)
 +  ? start_migratetype : migratetype);
  

What is this check meant to be doing?

start_migratetype is determined by allocflags_to_migratetype() and
that never will be MIGRATE_CMA so is_migrate_cma(start_migratetype)
should always be false.

   trace_mm_page_alloc_extfrag(page, order, current_order,
   start_migratetype, migratetype);
 @@ -1093,7 +1128,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned 
 int order,
   list_add(page-lru, list);
   else
   list_add_tail(page-lru, list);
 - set_page_private(page, migratetype);
 +#ifdef CONFIG_CMA
 + if (is_pageblock_cma(page))
 + set_page_private(page, MIGRATE_CMA);
 + else
 +#endif
 + set_page_private(page, migratetype);
   list = page-lru;
   }
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 @@ -1337,8 +1377,12 @@ int split_free_page(struct page *page)
  
 	if (order >= pageblock_order - 1) {
 	struct page *endpage = page + (1 << order) - 1;
 - for (; page < endpage; page += pageblock_nr_pages)
 - set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 + for (; page < endpage; page += pageblock_nr_pages) {
 + int mt = get_pageblock_migratetype(page);
 + if (mt != MIGRATE_ISOLATE && !is_migrate_cma(mt))
 + set_pageblock_migratetype(page,
 +   MIGRATE_MOVABLE);
 + }
   }
  
   return 1  order;
 @@ -5375,8 +5419,8 @@ __count_immobile_pages(struct zone *zone, struct page 
 *page, int count)
*/
   if (zone_idx(zone) == ZONE_MOVABLE)
   return true;
 -
 - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
 + if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
 + is_pageblock_cma(page))
   return true;
  
   pfn = page_to_pfn(page);
 diff --git a/mm/vmstat.c b/mm/vmstat.c
 index f600557..ace5383 100644
 --- a/mm/vmstat.c
 +++ b/mm/vmstat.c
 @@ -613,6 +613,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
   Reclaimable,
   Movable,
   Reserve,
 +#ifdef CONFIG_CMA
 + CMA,
 +#endif
   Isolate,
  };
  
 -- 
 1.7.1.569.g6f426
 

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 10/15] mm: extract reclaim code from __alloc_pages_direct_reclaim()

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:52AM +0100, Marek Szyprowski wrote:
 This patch extracts common reclaim code from __alloc_pages_direct_reclaim()
 function to separate function: __perform_reclaim() which can be later used
 by alloc_contig_range().
 
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 CC: Michal Nazarewicz min...@mina86.com
 ---
  mm/page_alloc.c |   30 +-
  1 files changed, 21 insertions(+), 9 deletions(-)
 
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 4e60c0b..e35d06b 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -2094,16 +2094,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
 int order,
  }
  #endif /* CONFIG_COMPACTION */
  
 -/* The really slow allocator path where we enter direct reclaim */
 -static inline struct page *
 -__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 - struct zonelist *zonelist, enum zone_type high_zoneidx,
 - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
 - int migratetype, unsigned long *did_some_progress)
 +/* Perform direct synchronous page reclaim */
 +static inline int
 +__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist 
 *zonelist,
 +   nodemask_t *nodemask)

This function is too large to be inlined. Make it a static int. Once
that is fixed add a

Acked-by: Mel Gorman m...@csn.ul.ie

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks

2012-01-30 Thread Mel Gorman
On Thu, Jan 26, 2012 at 10:00:53AM +0100, Marek Szyprowski wrote:
 alloc_contig_range() performs memory allocation so it also should keep
 track on keeping the correct level of memory watermarks. This commit adds
 a call to *_slowpath style reclaim to grab enough pages to make sure that
 the final collection of contiguous pages from freelists will not starve
 the system.
 
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 CC: Michal Nazarewicz min...@mina86.com
 ---
  mm/page_alloc.c |   36 
  1 files changed, 36 insertions(+), 0 deletions(-)
 
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index e35d06b..05eaa82 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -5613,6 +5613,34 @@ static int __alloc_contig_migrate_range(unsigned long 
 start, unsigned long end)
   return ret;
  }
  
 +/*
 + * Trigger memory pressure bump to reclaim some pages in order to be able to
 + * allocate 'count' pages in single page units. Does similar work as
 + * __alloc_pages_slowpath() function.
 + */
 +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
 +{
 + enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 + struct zonelist *zonelist = node_zonelist(0, gfp_mask);
 + int did_some_progress = 0;
 + int order = 1;
 + unsigned long watermark;
 +
 + /* Obey watermarks as if the page was being allocated */
 + watermark = low_wmark_pages(zone) + count;
 + while (!zone_watermark_ok(zone, 0, watermark, 0, 0)) {
 + wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone));
 +
 + did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
 +   NULL);
 + if (!did_some_progress) {
 + /* Exhausted what can be done so it's blamo time */
 + out_of_memory(zonelist, gfp_mask, order, NULL);
 + }

There are three problems here

1. CMA can trigger the OOM killer.

That seems like overkill to me but as I do not know the consequences
of CMA failing, it's your call.

2. You cannot guarantee that try_to_free_pages will free pages from the
   zone you care about or that kswapd will do anything

You check the watermarks and take into account the size of the pending
CMA allocation. kswapd in vmscan.c on the other hand will simply check
the watermarks and probably go back to sleep. You should be aware of
this in case you ever get bugs that CMA takes too long and that it
appears to be stuck in this loop with kswapd staying asleep.

3. You reclaim from zones other than your target zone

try_to_free_pages is not necessarily going to free pages in the
zone you are checking for. It'll work on ARM in many cases because
there will be only one zone but on other arches, this logic will
be problematic and will potentially livelock. You need to pass in
a zonelist that only contains the zone that CMA cares about. If it
cannot reclaim, did_some_progress == 0 and it'll exit. Otherwise
there is a possibility that this will loop forever reclaiming pages
from the wrong zones.

I won't ack this particular patch but I am not going to insist that
you fix these prior to merging either. If you leave problem 3 as it
is, I would really like to see a comment explaning the problem for
future users of CMA on other arches (if they exist).

-- 
Mel Gorman
SUSE Labs


Re: [PATCHv19 00/15] Contiguous Memory Allocator

2012-01-30 Thread Mel Gorman
On Fri, Jan 27, 2012 at 04:26:24PM -0800, Andrew Morton wrote:
 On Thu, 26 Jan 2012 15:31:40 +
 Arnd Bergmann a...@arndb.de wrote:
 
  On Thursday 26 January 2012, Marek Szyprowski wrote:
   Welcome everyone!
   
   Yes, that's true. This is yet another release of the Contiguous Memory
   Allocator patches. This version mainly includes code cleanups requested
   by Mel Gorman and a few minor bug fixes.
  
  Hi Marek,
  
  Thanks for keeping up this work! I really hope it works out for the
  next merge window.
 
 Someone please tell me when it's time to start paying attention
 again ;)
 
 These patches don't seem to have as many acked-bys and reviewed-bys as
 I'd expect. 

I reviewed the core MM changes and I've acked most of them so the
next release should have a few acks where you expect them. I did not
add a reviewed-by because I did not build and test the thing.

For me, Patch 2 is the only one that must be fixed prior to merging
as it can interfere with pages on a remote per-cpu list which is
dangerous. I know your suggestion will be to delete the per-cpu lists
and be done with it but I am a bit away from doing that just yet.

Patch 8 could do with a bit more care too but it is not a
potential hand grenade like patch 2 and could be fixed as part of
a follow-up. Even if you don't see an ack from me there, it should not
be treated as a show stopper.

I highlighted some issues on how CMA interacts with reclaim but I
think this is a problem specific to CMA and should not prevent it being
merged. I just wanted to be sure that the CMA people were aware of the
potential issues so they will recognise the class of bug if it occurs.

 Given the scope and duration of this, it would be useful
 to gather these up.  But please ensure they are real ones - people
 sometimes like to ack things without showing much sign of having
 actually read them.
 

FWIW, the acks I put on the core MM changes are real acks :)

 The patches do seem to have been going round in ever-decreasing circles
 lately and I think we have decided to merge them (yes?) so we may as well
 get on and do that and sort out remaining issues in-tree.
 

I'm a lot happier with the core MM patches than I was when I reviewed
this first around last September or October.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().

2012-01-30 Thread Mel Gorman
On Mon, Jan 30, 2012 at 01:42:50PM +0100, Michal Nazarewicz wrote:
 On Thu, Jan 26, 2012 at 10:00:45AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 @@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct 
 zone *zone,
 } else if (!locked)
 spin_lock_irq(&zone->lru_lock);
 
 -   if (!pfn_valid_within(low_pfn))
 +   if (!pfn_valid(low_pfn))
 continue;
 nr_scanned++;
 
 
 On Mon, 30 Jan 2012 12:24:28 +0100, Mel Gorman m...@csn.ul.ie wrote:
 This chunk looks unrelated to the rest of the patch.
 
 I think what you are doing is patching around a bug that CMA exposed
 which is very similar to the bug report at
 http://www.spinics.net/lists/linux-mm/msg29260.html . Is this true?
 
 If so, I posted a fix that only calls pfn_valid() when necessary. Can
 you check if that works for you and if so, drop this hunk please? If
 the patch does not work for you, then this hunk still needs to be
 in a separate patch and handled separately as it would also be a fix
 for -stable.
 
 I'll actually never encountered this bug myself and CMA is unlikely to
 expose it, since it always operates on continuous memory regions with
 no holes.
 
 I've made this change because looking at the code it seemed like this
 may cause problems in some cases.  The crash that you linked to looks
 like the kind of problem I was thinking about.
 
 I'll drop this hunk and let you resolve this independently of CMA.
 

Ok, thanks.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added

2012-01-30 Thread Mel Gorman
On Mon, Jan 30, 2012 at 02:06:50PM +0100, Michal Nazarewicz wrote:
 @@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, 
 int start_migratetype)
 rmv_page_order(page);
 
 /* Take ownership for orders >= pageblock_order */
 -   if (current_order >= pageblock_order)
 +   if (current_order >= pageblock_order &&
 +   !is_pageblock_cma(page))
 change_pageblock_range(page, current_order,
 start_migratetype);
 
 -   expand(zone, page, order, current_order, area, 
 migratetype);
 +   expand(zone, page, order, current_order, area,
 +  is_migrate_cma(start_migratetype)
 +? start_migratetype : migratetype);
 
 
 What is this check meant to be doing?
 
 start_migratetype is determined by allocflags_to_migratetype() and
 that never will be MIGRATE_CMA so is_migrate_cma(start_migratetype)
 should always be false.
 
 Right, thanks!  This should be the other way around, ie.:
 
 + expand(zone, page, order, current_order, area,
 +is_migrate_cma(migratetype)
 +  ? migratetype : start_migratetype);
 
 I'll fix this and the calls to is_pageblock_cma().
 

That makes a lot more sense. Thanks.

I have a vague recollection that there was a problem with finding
unmovable pages in MIGRATE_CMA regions. This might have been part of
the problem.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating

2012-01-30 Thread Mel Gorman
On Mon, Jan 30, 2012 at 04:41:22PM +0100, Michal Nazarewicz wrote:
 On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman m...@csn.ul.ie wrote:
 
 On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 @@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, 
 unsigned long end_pfn)
 spin_unlock_irqrestore(zone-lock, flags);
 return ret ? 0 : -EBUSY;
  }
 +
 +/* must hold zone->lock */
 +void update_pcp_isolate_block(unsigned long pfn)
 +{
 +   unsigned long end_pfn = pfn + pageblock_nr_pages;
 +   struct page *page;
 +
 +   while (pfn < end_pfn) {
 +   if (!pfn_valid_within(pfn)) {
 +   ++pfn;
 +   continue;
 +   }
 +
 
 On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman m...@csn.ul.ie wrote:
 There is a potential problem here that you need to be aware of.
 set_pageblock_migratetype() is called from start_isolate_page_range().
 I do not think there is a guarantee that pfn + pageblock_nr_pages is
 not in a different block of MAX_ORDER_NR_PAGES. If that is right then
 your options are to add a check like this;
 
 if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
  break;
 
 or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and in
 the same block as pfn and relying on the caller to have called
 pfn_valid.
 
   pfn = round_down(pfn, pageblock_nr_pages);
   end_pfn = pfn + pageblock_nr_pages;
 
 should do the trick as well, right?  move_freepages_block() seem to be
 doing the same thing.
 

That would also do it the trick.

 +   page = pfn_to_page(pfn);
 +   if (PageBuddy(page)) {
 +   pfn += 1 << page_order(page);
 +   } else if (page_count(page) == 0) {
 +   set_page_private(page, MIGRATE_ISOLATE);
 +   ++pfn;
 
 This is dangerous for two reasons. If the page_count is 0, it could
 be because the page is in the process of being freed and is not
 necessarily on the per-cpu lists yet and you cannot be sure if the
 contents of page-private are important. Second, there is nothing to
 prevent another CPU allocating this page from its per-cpu list while
 the private field is getting updated from here which might lead to
 some interesting races.
 
 I recognise that what you are trying to do is respond to Gilad's
 request that you really check if an IPI here is necessary. I think what
 you need to do is check if a page with a count of 0 is encountered
 and if it is, then a draining of the per-cpu lists is necessary. To
 address Gilad's concerns, be sure to only do this once per attempt at
 CMA rather than for every page encountered with a count of 0 to avoid a
 storm of IPIs.
 
 It's actually more than that.
 
 This is the same issue that I first fixed with a change to the
 free_pcppages_bulk() function [1].  At the time of posting, you said you'd
 like me to try and find a different solution which would not involve paying
 the price of calling get_pageblock_migratetype().  Later I also realised that
 this solution is not enough.
 
 [1] http://article.gmane.org/gmane.linux.kernel.mm/70314
 

Yes. I had forgotten the history but looking at that patch again,
I would reach the conclusion that this was adding a new call to
get_pageblock_migratetype() in the bulk free path. That would affect
everybody whether they were using CMA or not.

 My next attempt was to run a drain of the PCP list while holding
 zone->lock [2], but that quickly proved to be a broken approach when Marek
 started testing it on an SMP system.
 
 [2] http://article.gmane.org/gmane.linux.kernel.mm/72016
 
 This patch is yet another attempt at solving this old issue.  Even though it
 has a potential race condition, we came to the conclusion that the actual
 chances of causing any problems are slim.  Various stress tests did not, in
 fact, show the race to be an issue.
 

It is a really small race. To cause a problem CPU 1 must find a page
with count 0, CPU 2 must then allocate the page and set page-private
before CPU 1 overwrites that value but it's there.

 The problem is that if a page is on a PCP list and its underlying pageblock's
 migrate type is changed to MIGRATE_ISOLATE, the page (i) will still remain on
 the PCP list and thus someone can allocate it, and (ii) when removed from the
 PCP list, the page will be put on the freelist of the migrate type it had
 prior to the change.
 
 (i) is actually not such a big issue since the next thing that happens after
 isolation is migration, so all the pages will get freed.  (ii) is the actual
 problem, and if [1] is not an acceptable solution I really don't have a good
 fix for that.
 
 One thing that comes to mind is calling drain_all_pages() prior to acquiring
 zone->lock in set_migratetype_isolate().  This is however prone to races since
 after the drain and before zone->lock is acquired, pages might get moved
 back to the PCP list.
 
 Draining the PCP list after acquiring zone->lock is not possible because

Re: [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()

2012-01-16 Thread Mel Gorman
On Fri, Jan 13, 2012 at 09:04:31PM +0100, Michal Nazarewicz wrote:
 On Thu, Dec 29, 2011 at 01:39:05PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 +   /* Make sure all pages are isolated. */
 +   if (!ret) {
 +   lru_add_drain_all();
 +   drain_all_pages();
 +   if (WARN_ON(test_pages_isolated(start, end)))
 +   ret = -EBUSY;
 +   }
 
 On Tue, 10 Jan 2012 15:16:13 +0100, Mel Gorman m...@csn.ul.ie wrote:
 Another global IPI seems overkill. Drain pages only from the local CPU
 (drain_pages(get_cpu()); put_cpu()) and test if the pages are isolated.
 
 Is get_cpu() + put_cpu() required? Won't drain_local_pages() work?
 

drain_local_pages() calls smp_processor_id() without preemption
disabled. 
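
Spelled out, the preemption-safe local drain suggested above is simply
(untested):

	int cpu = get_cpu();

	drain_pages(cpu);
	put_cpu();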

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 04/11] mm: page_alloc: introduce alloc_contig_range()

2012-01-10 Thread Mel Gorman
, warning and
returning -EBUSY. I expect the common case is that a global drain is
unneccessary.

 +
 + /* Release pages */
 + putback_lru_pages(cc.migratepages);
 +
 + return ret;
 +}
 +
 +/**
 + * alloc_contig_range() -- tries to allocate given range of pages
 + * @start:   start PFN to allocate
 + * @end: one-past-the-last PFN to allocate
 + *
 + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 + * aligned, hovewer it's callers responsibility to guarantee that we

s/hovewer/however/

s/it's/it's the/

 + * are the only thread that changes migrate type of pageblocks the
 + * pages fall in.
 + *
 + * Returns zero on success or negative error code.  On success all
 + * pages which PFN is in (start, end) are allocated for the caller and
 + * need to be freed with free_contig_range().
 + */
 +int alloc_contig_range(unsigned long start, unsigned long end)
 +{
 + unsigned long outer_start, outer_end;
 + int ret;
 +
 + /*
 +  * What we do here is we mark all pageblocks in range as
 +  * MIGRATE_ISOLATE.  Because of the way page allocator work, we
 +  * align the range to MAX_ORDER pages so that page allocator
 +  * won't try to merge buddies from different pageblocks and
 +  * change MIGRATE_ISOLATE to some other migration type.
 +  *
 +  * Once the pageblocks are marked as MIGRATE_ISOLATE, we
 +  * migrate the pages from an unaligned range (ie. pages that
 +  * we are interested in).  This will put all the pages in
 +  * range back to page allocator as MIGRATE_ISOLATE.
 +  *
 +  * When this is done, we take the pages in range from page
 +  * allocator removing them from the buddy system.  This way
 +  * page allocator will never consider using them.
 +  *
 +  * This lets us mark the pageblocks back as
 +  * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
 +  * MAX_ORDER aligned range but not in the unaligned, original
 +  * range are put back to page allocator so that buddy can use
 +  * them.
 +  */
 +
 + ret = start_isolate_page_range(pfn_align_to_maxpage_down(start),
 +pfn_align_to_maxpage_up(end));
 + if (ret)
 + goto done;
 +
 + ret = __alloc_contig_migrate_range(start, end);
 + if (ret)
 + goto done;
 +
 + /*
 +  * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
 +  * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
 +  * more, all pages in [start, end) are free in page allocator.
 +  * What we are going to do is to allocate all pages from
 +  * [start, end) (that is remove them from page allocater).
 +  *
 +  * The only problem is that pages at the beginning and at the
 +  * end of interesting range may be not aligned with pages that
 +  * page allocator holds, ie. they can be part of higher order
 +  * pages.  Because of this, we reserve the bigger range and
 +  * once this is done free the pages we are not interested in.
 +  */
 +
 + ret = 0;
 + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
 + if (WARN_ON(++ret >= MAX_ORDER)) {
 + ret = -EINVAL;
 + goto done;
 + }
 +
 + outer_start = start & (~0UL << ret);
 + outer_end = isolate_freepages_range(page_zone(pfn_to_page(outer_start)),
 + outer_start, end, NULL);
 + if (!outer_end) {
 + ret = -EBUSY;
 + goto done;
 + }
 + outer_end += outer_start;
 +
 + /* Free head and tail (if any) */
 + if (start != outer_start)
 + free_contig_range(outer_start, start - outer_start);
 + if (end != outer_end)
 + free_contig_range(end, outer_end - end);
 +
 + ret = 0;
 +done:
 + undo_isolate_page_range(pfn_align_to_maxpage_down(start),
 + pfn_align_to_maxpage_up(end));
 + return ret;
 +}
 +
 +void free_contig_range(unsigned long pfn, unsigned nr_pages)
 +{
 + for (; nr_pages--; ++pfn)
 + __free_page(pfn_to_page(pfn));
 +}
 +
  #ifdef CONFIG_MEMORY_HOTREMOVE
  /*
   * All pages in the range must be isolated before calling this.
 -- 
 1.7.1.569.g6f426
 

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 05/11] mm: mmzone: MIGRATE_CMA migration type added

2012-01-10 Thread Mel Gorman
 +  * pages on different free lists. We don't
 +  * want unmovable pages to be allocated from
 +  * MIGRATE_CMA areas.
*/
 - if (unlikely(current_order >= (pageblock_order >> 1)) ||
 - start_migratetype == 
 MIGRATE_RECLAIMABLE ||
 - page_group_by_mobility_disabled) {
 - unsigned long pages;
 + if (!is_pageblock_cma(page) &&
 + (unlikely(current_order >= pageblock_order / 2) ||
 +  start_migratetype == MIGRATE_RECLAIMABLE ||
 +  page_group_by_mobility_disabled)) {
 + int pages;
   pages = move_freepages_block(zone, page,
   
 start_migratetype);
  
 @@ -971,11 +997,14 @@ __rmqueue_fallback(struct zone *zone, int order, int 
 start_migratetype)
   rmv_page_order(page);
  
 /* Take ownership for orders >= pageblock_order */
 - if (current_order >= pageblock_order)
 + if (current_order >= pageblock_order &&
 + !is_pageblock_cma(page))
   change_pageblock_range(page, current_order,
   start_migratetype);
  
 - expand(zone, page, order, current_order, area, 
 migratetype);
 + expand(zone, page, order, current_order, area,
 +is_migrate_cma(start_migratetype)
 +  ? start_migratetype : migratetype);
  
   trace_mm_page_alloc_extfrag(page, order, current_order,
   start_migratetype, migratetype);
 @@ -1047,7 +1076,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned 
 int order,
 list_add(&page->lru, list);
 else
 list_add_tail(&page->lru, list);
 - set_page_private(page, migratetype);
 + if (is_pageblock_cma(page))
 + set_page_private(page, MIGRATE_CMA);
 + else
 + set_page_private(page, migratetype);
 list = &page->lru;
 }
 __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 @@ -1308,8 +1340,12 @@ int split_free_page(struct page *page)
  
 if (order >= pageblock_order - 1) {
 struct page *endpage = page + (1 << order) - 1;
 - for (; page < endpage; page += pageblock_nr_pages)
 - set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 + for (; page < endpage; page += pageblock_nr_pages) {
 + int mt = get_pageblock_migratetype(page);
 + if (mt != MIGRATE_ISOLATE && !is_migrate_cma(mt))
 + set_pageblock_migratetype(page,
 +   MIGRATE_MOVABLE);
 + }
   }
  
 return 1 << order;
 @@ -5599,8 +5635,8 @@ __count_immobile_pages(struct zone *zone, struct page 
 *page, int count)
*/
   if (zone_idx(zone) == ZONE_MOVABLE)
   return true;
 -
 - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
 + if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
 + is_pageblock_cma(page))
   return true;
  
   pfn = page_to_pfn(page);
 diff --git a/mm/vmstat.c b/mm/vmstat.c
 index 8fd603b..9fb1789 100644
 --- a/mm/vmstat.c
 +++ b/mm/vmstat.c
 @@ -613,6 +613,7 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
   Reclaimable,
   Movable,
   Reserve,
 + Cma,
   Isolate,
  };
  
 -- 
 1.7.1.569.g6f426
 

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 07/11] mm: add optional memory reclaim in split_free_page()

2012-01-10 Thread Mel Gorman
)
 +{
 + gfp_t gfp_mask = GFP_HIGHUSER_MOVABLE;
 + enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 + struct zonelist *zonelist = node_zonelist(0, gfp_mask);
 + struct reclaim_state reclaim_state;
 + int did_some_progress = 0;
 +
 + wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone));
 +
 + /* We now go into synchronous reclaim */
 + cpuset_memory_pressure_bump();
 + current->flags |= PF_MEMALLOC;
 + lockdep_set_current_reclaim_state(gfp_mask);
 + reclaim_state.reclaimed_slab = 0;
 + current->reclaim_state = &reclaim_state;
 +
 + did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, NULL);
 +
 + current->reclaim_state = NULL;
 + lockdep_clear_current_reclaim_state();
 + current->flags &= ~PF_MEMALLOC;
 +
 + cond_resched();
 +
 + if (!did_some_progress) {
 + /* Exhausted what can be done so it's blamo time */
 + out_of_memory(zonelist, gfp_mask, order, NULL);
 + }
 + return order;
 +}

At the very least, __alloc_pages_direct_reclaim() should be sharing this
code.
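
As a sketch of the sharing I mean (untested; the helper name is made up),
both paths could wrap try_to_free_pages() in one place:

	static unsigned long do_synchronous_reclaim(gfp_t gfp_mask, int order,
					struct zonelist *zonelist,
					nodemask_t *nodemask)
	{
		struct reclaim_state reclaim_state = { .reclaimed_slab = 0 };
		unsigned long progress;

		current->flags |= PF_MEMALLOC;
		lockdep_set_current_reclaim_state(gfp_mask);
		current->reclaim_state = &reclaim_state;

		progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);

		current->reclaim_state = NULL;
		lockdep_clear_current_reclaim_state();
		current->flags &= ~PF_MEMALLOC;

		cond_resched();
		return progress;
	}

__alloc_pages_direct_reclaim() and __force_memory_reclaim() would then only
differ in how they pick the zonelist and in what they do when no progress
is made.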

 +
  /*
   * Similar to split_page except the page is already free. As this is only
   * being used for migration, the migratetype of the block also changes.
 - * As this is called with interrupts disabled, the caller is responsible
 - * for calling arch_alloc_page() and kernel_map_page() after interrupts
 - * are enabled.
 + * The caller is responsible for calling arch_alloc_page() and 
 kernel_map_page()
 + * after interrupts are enabled.
   *
   * Note: this is probably too low level an operation for use in drivers.
   * Please consult with lkml before using this in your driver.
   */
 -int split_free_page(struct page *page)
 +int split_free_page(struct page *page, int force_reclaim)
  {
   unsigned int order;
   unsigned long watermark;
 @@ -1323,10 +1371,15 @@ int split_free_page(struct page *page)
   zone = page_zone(page);
   order = page_order(page);
  
 +try_again:
   /* Obey watermarks as if the page was being allocated */
 watermark = low_wmark_pages(zone) + (1 << order);
 - if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
 - return 0;
 + if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) {
 + if (force_reclaim && __force_memory_reclaim(order, zone))
 + goto try_again;
 + else
 + return 0;
 + }
  
   /* Remove page from free list */
 list_del(&page->lru);
 @@ -2086,18 +2139,6 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned 
 int order,
   return page;
  }
  
 -static inline
 -void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
 - enum zone_type high_zoneidx,
 - enum zone_type classzone_idx)
 -{
 - struct zoneref *z;
 - struct zone *zone;
 -
 - for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
 - wakeup_kswapd(zone, order, classzone_idx);
 -}
 -
  static inline int
  gfp_to_alloc_flags(gfp_t gfp_mask)
  {
 @@ -5917,7 +5958,7 @@ int alloc_contig_range(unsigned long start, unsigned 
 long end,
  
 outer_start = start & (~0UL << ret);
   outer_end = isolate_freepages_range(page_zone(pfn_to_page(outer_start)),
 - outer_start, end, NULL);
 + outer_start, end, NULL, true);
   if (!outer_end) {
   ret = -EBUSY;
   goto done;

I think it makes far more sense to have the logic related to calling
try_to_free_pages here in alloc_contig_range rather than trying to pass
it into compaction helper functions.
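
Roughly this (untested, and the arguments to __force_memory_reclaim() are
only illustrative):

	struct zone *zone = page_zone(pfn_to_page(outer_start));

	outer_end = isolate_freepages_range(zone, outer_start, end, NULL);
	if (!outer_end && __force_memory_reclaim(ret, zone))
		/* reclaim in the caller, then retry the isolation once */
		outer_end = isolate_freepages_range(zone, outer_start, end, NULL);
	if (!outer_end) {
		ret = -EBUSY;
		goto done;
	}

That way split_free_page() and isolate_freepages_range() stay ignorant of
reclaim and the force_reclaim parameter goes away.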

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 01/11] mm: page_alloc: handle MIGRATE_ISOLATE in free_pcppages_bulk()

2011-12-12 Thread Mel Gorman
On Fri, Nov 18, 2011 at 05:43:08PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 If page is on PCP list while pageblock it belongs to gets isolated,
 the page's private still holds the old migrate type.  This means
 that free_pcppages_bulk() will put the page on a freelist of the
 old migrate type instead of MIGRATE_ISOLATE.
 
 This commit changes that by explicitly checking whether page's
 pageblock's migrate type is MIGRATE_ISOLATE and if it is, overwrites
 page's private data.
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 ---
  mm/page_alloc.c |   12 
  1 files changed, 12 insertions(+), 0 deletions(-)
 
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 9dd443d..58d1a2e 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -628,6 +628,18 @@ static void free_pcppages_bulk(struct zone *zone, int 
 count,
 page = list_entry(list->prev, struct page, lru);
 /* must delete as __free_one_page list manipulates */
 list_del(&page->lru);
 +
 + /*
 +  * When page is isolated in set_migratetype_isolate()
 +  * function it's page_private is not changed since the
 +  * function has no way of knowing if it can touch it.
 +  * This means that when a page is on PCP list, it's
 +  * page_private no longer matches the desired migrate
 +  * type.
 +  */
 + if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
 + set_page_private(page, MIGRATE_ISOLATE);
 +

How much of a problem is this in practice? It's adding overhead to the
free path for what should be a very rare case which is undesirable. I
know we are already calling get_pageblock_migrate() when freeing
pages but it should be unnecessary to call it again. I'd go as far as
to say that it would be preferable to drain the per-CPU lists after
you set pageblocks MIGRATE_ISOLATE. The IPIs also have overhead but it
will be incurred for the rare rather than the common case.
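
In other words, something like this at the start of a CMA attempt (a sketch
only, using the existing helpers):

	ret = start_isolate_page_range(start_pfn, end_pfn);
	if (ret)
		return ret;

	/* One round of IPIs per isolation attempt, not a check per freed page */
	drain_all_pages();

rather than adding a get_pageblock_migratetype() lookup to
free_pcppages_bulk() that everybody pays for.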

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().

2011-12-12 Thread Mel Gorman
 recognise that
incrementing low_pfn would prevent an already isolated PFN being
scanned the next time but it should be a separate patch.

  
   acct_isolated(zone, cc);
  
 spin_unlock_irq(&zone->lru_lock);
 - cc->migrate_pfn = low_pfn;
  
   trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
  
 + return low_pfn;
 +}
 +
 +/*
 + * Isolate all pages that can be migrated from the block pointed to by
 + * the migrate scanner within compact_control.
 + */
 +static isolate_migrate_t isolate_migratepages(struct zone *zone,
 + struct compact_control *cc)
 +{
 + unsigned long low_pfn, end_pfn;
 +
 + /* Do not scan outside zone boundaries */
 + low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
 +
 + /* Only scan within a pageblock boundary */
 + end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
 +
 + /* Do not cross the free scanner or scan within a memory hole */
 + if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
 + cc->migrate_pfn = end_pfn;
 + return ISOLATE_NONE;
 + }
 +
 + /* Perform the isolation */
 + low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
 + if (!low_pfn)
 + return ISOLATE_ABORT;
 +
 + cc->migrate_pfn = low_pfn;
 +
   return ISOLATE_SUCCESS;
  }
  
 -- 
 1.7.1.569.g6f426
 

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 03/11] mm: mmzone: introduce zone_pfn_same_memmap()

2011-12-12 Thread Mel Gorman
On Mon, Dec 12, 2011 at 03:35:03PM +0100, Michal Nazarewicz wrote:
 On Fri, Nov 18, 2011 at 05:43:10PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 diff --git a/mm/compaction.c b/mm/compaction.c
 index 6afae0e..09c9702 100644
 --- a/mm/compaction.c
 +++ b/mm/compaction.c
 @@ -111,7 +111,10 @@ skip:
 
  next:
 pfn += isolated;
 -   page += isolated;
 +   if (zone_pfn_same_memmap(pfn - isolated, pfn))
 +   page += isolated;
 +   else
 +   page = pfn_to_page(pfn);
 }
 
 On Mon, 12 Dec 2011 15:19:53 +0100, Mel Gorman m...@csn.ul.ie wrote:
 Is this necessary?
 
 We are isolating pages, the largest of which is a MAX_ORDER_NR_PAGES
 page.  [...]
 
 This is not true for CMA.
 

To be clear, I'm referring to a single page being isolated here. It may
or may not be a high-order page but it's still going to be less than
MAX_ORDER_NR_PAGES so you should be able check when a new block is
entered and pfn_to_page is necessary.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 01/11] mm: page_alloc: handle MIGRATE_ISOLATE in free_pcppages_bulk()

2011-12-12 Thread Mel Gorman
On Mon, Dec 12, 2011 at 03:23:02PM +0100, Michal Nazarewicz wrote:
 On Fri, Nov 18, 2011 at 05:43:08PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 9dd443d..58d1a2e 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -628,6 +628,18 @@ static void free_pcppages_bulk(struct zone *zone, int 
 count,
 page = list_entry(list->prev, struct page, lru);
 /* must delete as __free_one_page list manipulates */
 list_del(&page->lru);
 +
 +   /*
 +* When page is isolated in set_migratetype_isolate()
 +* function it's page_private is not changed since the
 +* function has no way of knowing if it can touch it.
 +* This means that when a page is on PCP list, it's
 +* page_private no longer matches the desired migrate
 +* type.
 +*/
 +   if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
 +   set_page_private(page, MIGRATE_ISOLATE);
 +
 
 On Mon, 12 Dec 2011 14:42:35 +0100, Mel Gorman m...@csn.ul.ie wrote:
 How much of a problem is this in practice?
 
 IIRC, this lead to allocation being made from area marked as isolated
 or some such.
 

And I believe that nothing prevents that from happening. I was just
wondering how common it was in practice. Draining the per-cpu lists
should work as a substitute either way.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 04/11] mm: compaction: export some of the functions

2011-12-12 Thread Mel Gorman
On Mon, Dec 12, 2011 at 03:41:04PM +0100, Michal Nazarewicz wrote:
 On Mon, 12 Dec 2011 15:29:07 +0100, Mel Gorman m...@csn.ul.ie wrote:
 
 On Fri, Nov 18, 2011 at 05:43:11PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 
 This commit exports some of the functions from compaction.c file
 outside of it adding their declaration into internal.h header
 file so that other mm related code can use them.
 
 This forced compaction.c to always be compiled (as opposed to being
 compiled only if CONFIG_COMPACTION is defined) but as to avoid
 introducing code that user did not ask for, part of the compaction.c
 is now wrapped in on #ifdef.
 
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 ---
  mm/Makefile |3 +-
  mm/compaction.c |  112 
  +++
  mm/internal.h   |   35 +
  3 files changed, 83 insertions(+), 67 deletions(-)
 
 diff --git a/mm/Makefile b/mm/Makefile
 index 50ec00e..24ed801 100644
 --- a/mm/Makefile
 +++ b/mm/Makefile
 @@ -13,7 +13,7 @@ obj-y := filemap.o mempool.o 
 oom_kill.o fadvise.o \
readahead.o swap.o truncate.o vmscan.o shmem.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
page_isolation.o mm_init.o mmu_context.o percpu.o \
 -  $(mmu-y)
 +  $(mmu-y) compaction.o
 
 That should be
 
 compaction.o $(mmu-y)
 
 for consistency.
 
 Overall, this patch implies that CMA is always compiled in.
 
 Not really.  But yes, it produces some bloat when neither CMA nor
 compaction are compiled.  I assume that linker will be able to deal
 with that (since the functions are not EXPORT_SYMBOL'ed).
 

The bloat exists either way. I don't believe the linker strips it out so
overall it would make more sense to depend on compaction to keep the
vmstat counters for debugging reasons if nothing else. It's not
something I feel very strongly about though.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 03/11] mm: mmzone: introduce zone_pfn_same_memmap()

2011-12-12 Thread Mel Gorman
On Mon, Dec 12, 2011 at 03:51:55PM +0100, Michal Nazarewicz wrote:
 On Fri, Nov 18, 2011 at 05:43:10PM +0100, Marek Szyprowski wrote:
 From: Michal Nazarewicz min...@mina86.com
 diff --git a/mm/compaction.c b/mm/compaction.c
 index 6afae0e..09c9702 100644
 --- a/mm/compaction.c
 +++ b/mm/compaction.c
 @@ -111,7 +111,10 @@ skip:
 
  next:
 pfn += isolated;
 -   page += isolated;
 +   if (zone_pfn_same_memmap(pfn - isolated, pfn))
 +   page += isolated;
 +   else
 +   page = pfn_to_page(pfn);
 }
 
 On Mon, 12 Dec 2011 15:19:53 +0100, Mel Gorman m...@csn.ul.ie wrote:
 Is this necessary?
 
 We are isolating pages, the largest of which is a MAX_ORDER_NR_PAGES
 page.  [...]
 
 On Mon, 12 Dec 2011 15:40:30 +0100, Mel Gorman m...@csn.ul.ie wrote:
 To be clear, I'm referring to a single page being isolated here. It may
 or may not be a high-order page but it's still going to be less than
 MAX_ORDER_NR_PAGES so you should be able check when a new block is
 entered and pfn_to_page is necessary.
 
 Do you mean something like:
 
 if (same pageblock)
   just do arithmetic;
 else
   use pfn_to_page;
 

something like the following untested snippet.

/*
 * Resolve pfn_to_page every MAX_ORDER_NR_PAGES to handle the case where
 * memmap is not contiguous such as with SPARSEMEM memory model without
 * VMEMMAP
 */
pfn += isolated;
page += isolated;
if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0)
page = pfn_to_page(pfn);

That would be closer to what other PFN walkers do

 ?
 
 I've discussed it with Dave and he suggested that approach as an
 optimisation since in some configurations zone_pfn_same_memmap()
 is always true thus compiler will strip the else part, whereas
 same pageblock test will be false on occasions regardless of kernel
 configuration.
 

Ok, while I recognise it's an optimisation, it's a very small
optimisation and I'm not keen on introducing something new for
CMA that has been coped with in the past by always walking PFNs in
pageblock-sized ranges with pfn_valid checks where necessary.

See setup_zone_migrate_reserve as one example where pfn_to_page is
only called once per pageblock and calls pageblock_is_reserved()
for examining pages within a pageblock. Still, if you really want
the helper, at least keep it in compaction.c as there should be no
need to have it in mmzone.h

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().

2011-12-12 Thread Mel Gorman
On Mon, Dec 12, 2011 at 04:22:39PM +0100, Michal Nazarewicz wrote:
  SNIP
 
 +   if (!pfn_valid_within(pfn))
 +   goto skip;
 
 The flow of this function in general with gotos of skipped and next
 is confusing in comparison to the existing function. For example,
 if this PFN is not valid, and no freelist is provided, then we call
 __free_page() on a PFN that is known to be invalid.
 
 +   ++nr_scanned;
 +
 +   if (!PageBuddy(page)) {
 +skip:
 +   if (freelist)
 +   goto next;
 +   for (; start  pfn; ++start)
 +   __free_page(pfn_to_page(pfn));
 +   return 0;
 +   }
 
 So if a PFN is valid and !PageBuddy and no freelist is provided, we
 call __free_page() on it regardless of reference count. That does not
 sound safe.
 
 Sorry about that.  It's a bug in the code which was caught later on.  The
 code should read "__free_page(pfn_to_page(start))".
 

That will call free on valid PFNs but why is it safe to call
__free_page() at all?  You say later that CMA requires that all
pages in the range be valid but if the pages are in use, that does
not mean that calling __free_page() is safe. I suspect you have not
seen a problem because the pages in the range were free as expected
and not in use because of MIGRATE_ISOLATE.

 /* Found a free page, break it into order-0 pages */
 isolated = split_free_page(page);
 total_isolated += isolated;
 -   for (i = 0; i < isolated; i++) {
 -   list_add(&page->lru, freelist);
 -   page++;
 +   if (freelist) {
 +   struct page *p = page;
 +   for (i = isolated; i; --i, ++p)
 +   list_add(&p->lru, freelist);
 }
 
 -   /* If a page was split, advance to the end of it */
 -   if (isolated) {
 -   blockpfn += isolated - 1;
 -   cursor += isolated - 1;
 -   }
 +next:
 +   pfn += isolated;
 +   page += isolated;
 
 The name isolated is now confusing because it can mean either
 pages isolated or pages scanned depending on context. Your patch
 appears to be doing a lot more than is necessary to convert
 isolate_freepages_block into isolate_freepages_range and at this point,
 it's unclear why you did that.
 
 When CMA uses this function, it requires all pages in the range to be valid
 and free. (Both conditions should be met but you never know.) 

It seems racy but I guess you are depending on MIGRATE_ISOLATE to keep
things sane which is fine. However, I strongly suspect that if there
is a race and a page is in use, then you will need to retry the
migration step. Calling __free_page does not look right because
something still has a reference to the page.

 This change
 adds a second way isolate_freepages_range() works, which is when freelist is
 not specified, abort on invalid or non-free page, but continue as usual if
 freelist is provided.
 

Ok, I think you should be able to do that by not calling split_free_page
or adding to the list if !freelist with a comment explaining why the
pages are left on the buddy lists for the caller to figure out. Bail if
a page-in-use is found and have the caller check that the return value
of isolate_freepages_block == end_pfn - start_pfn.
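
For the !freelist (CMA) case, the skeleton I have in mind is something like
this (untested sketch, variable names approximate; it assumes zone->lock is
held as it is for the existing block scanner, and the freelist/compaction
path keeps its current behaviour):

	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
		if (!pfn_valid_within(pfn))
			break;			/* hole: caller sees a short count */
		page = pfn_to_page(pfn);
		if (!PageBuddy(page))
			break;			/* page in use: caller retries migration */
		/*
		 * Leave the page on the buddy lists; the caller knows the
		 * range and takes the pages out of buddy itself.
		 */
		isolated = 1UL << page_order(page);
		total_isolated += isolated;
	}

	/* Caller treats total_isolated != end_pfn - start_pfn as -EBUSY */
	return total_isolated;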

 I can try and restructure this function a bit so that there are fewer "gotos",
 but without the above change, CMA won't really be able to use it effectively
 (it would have to provide a freelist and then validate if pages on it are
 added in order).
 

Please do and double check that __free_page logic too.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/11] mm: compaction: introduce isolate_{free,migrate}pages_range().

2011-12-12 Thread Mel Gorman
On Mon, Dec 12, 2011 at 05:46:13PM +0100, Michal Nazarewicz wrote:
 On Mon, 12 Dec 2011 17:30:52 +0100, Mel Gorman m...@csn.ul.ie wrote:
 
 On Mon, Dec 12, 2011 at 04:22:39PM +0100, Michal Nazarewicz wrote:
  SNIP
 
 + if (!pfn_valid_within(pfn))
 + goto skip;
 
 The flow of this function in general with gotos of skipped and next
 is confusing in comparison to the existing function. For example,
 if this PFN is not valid, and no freelist is provided, then we call
 __free_page() on a PFN that is known to be invalid.
 
 + ++nr_scanned;
 +
 + if (!PageBuddy(page)) {
 +skip:
 + if (freelist)
 + goto next;
 + for (; start < pfn; ++start)
 + __free_page(pfn_to_page(pfn));
 + return 0;
 + }
 
 So if a PFN is valid and !PageBuddy and no freelist is provided, we
 call __free_page() on it regardless of reference count. That does not
 sound safe.
 
 Sorry about that.  It's a bug in the code which was caught later on.  The
 code should read "__free_page(pfn_to_page(start))".
 
 On Mon, 12 Dec 2011 17:30:52 +0100, Mel Gorman m...@csn.ul.ie wrote:
 That will call free on valid PFNs but why is it safe to call
 __free_page() at all?  You say later that CMA requires that all
 pages in the range be valid but if the pages are in use, that does
 not mean that calling __free_page() is safe. I suspect you have not
 seen a problem because the pages in the range were free as expected
 and not in use because of MIGRATE_ISOLATE.
 
 All pages from [start, pfn) have passed through the loop body which
 means that they are valid and they have been removed from buddy (for
 caller's use). Also, because of split_free_page(), all of those pages
 have been split into 0-order pages. 

Ah, I see. Even though you are not putting the pages on a freelist, the
function still returns with an elevated reference count and it's up to
the caller to find them again.

 Therefore, in error recovery, to
 undo what the loop has done so far, we give them back to buddy by
 calling __free_page() on each 0-order page.
 

Ok.

   /* Found a free page, break it into order-0 pages */
   isolated = split_free_page(page);
   total_isolated += isolated;
 - for (i = 0; i < isolated; i++) {
 - list_add(&page->lru, freelist);
 - page++;
 + if (freelist) {
 + struct page *p = page;
 + for (i = isolated; i; --i, ++p)
 + list_add(&p->lru, freelist);
   }
 
 - /* If a page was split, advance to the end of it */
 - if (isolated) {
 - blockpfn += isolated - 1;
 - cursor += isolated - 1;
 - }
 +next:
 + pfn += isolated;
 + page += isolated;
 
 The name isolated is now confusing because it can mean either
 pages isolated or pages scanned depending on context. Your patch
 appears to be doing a lot more than is necessary to convert
 isolate_freepages_block into isolate_freepages_range and at this point,
 it's unclear why you did that.
 
 When CMA uses this function, it requires all pages in the range to be valid
 and free. (Both conditions should be met but you never know.)
 
 To be clear, I meant that the CMA expects pages to be in buddy when the
 function is called but after the function finishes, all the pages in the
 range are removed from buddy.  This, among other things, is why the call to
 split_free_page() is necessary.
 

Understood.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added

2011-11-01 Thread Mel Gorman
On Sun, Oct 23, 2011 at 09:05:05PM -0700, Michal Nazarewicz wrote:
  On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
  This commit introduces alloc_contig_freed_pages() function
  which allocates (ie. removes from buddy system) free pages
  in range. Caller has to guarantee that all pages in range
  are in buddy system.
 
 On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman m...@csn.ul.ie wrote:
  Straight away, I'm wondering why you didn't use
  mm/compaction.c#isolate_freepages()
 
 Does the below look like a step in the right direction?
 
 It basically moves isolate_freepages_block() to page_alloc.c (changing

For the purposes of review, have a separate patch for moving
isolate_freepages_block to another file that does not alter the
function in any way. When the function is updated in a follow-on patch,
it'll be far easier to see what has changed.

page_isolation.c may also be a better fit than page_alloc.c

As it is, the patch for isolate_freepages_block is almost impossible
to read because it's getting munged with existing code that is already
in page_alloc.c . About all I caught from it is that scannedp does
not have a type. It defaults to unsigned int but it's unnecessarily
obscure.

 its name to isolate_freepages_range()) and changes it so that depending
 on arguments it treats holes (either invalid PFN or non-free page) as
 errors so that CMA can use it.
 

I haven't actually read the function because it's too badly mixed with
page_alloc.c code but this description fits what I'm looking for.

 It also accepts a range rather than just assuming a single pageblock
 thus the change moves range calculation in compaction.c from
 isolate_freepages_block() up to isolate_freepages().
 
 The change also modifies split_free_page() so that it does not try to
 change pageblock's migrate type if current migrate type is ISOLATE or
 CMA.
 

This is fine. Later, the flags that determine what happens to pageblocks
may be placed in compact_control.

 ---
  include/linux/mm.h |1 -
  include/linux/page-isolation.h |4 +-
  mm/compaction.c|   73 +++
  mm/internal.h  |5 ++
  mm/page_alloc.c|  128 
 +---
  5 files changed, 95 insertions(+), 116 deletions(-)
 

I confess I didn't read closely because of the mess in page_alloc.c but
the intent seems fine. Hopefully there will be a new version of CMA
posted that will be easier to review.

Thanks

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added

2011-11-01 Thread Mel Gorman
On Tue, Nov 01, 2011 at 07:06:56PM +0100, Michal Nazarewicz wrote:
 page_isolation.c may also be a better fit than page_alloc.c
 
 Since isolate_freepages_block() is the only user of split_free_page(),
 would it make sense to move split_free_page() to page_isolation.c as
 well?  I sort of like the idea of making it static and removing from
 header file.
 

I see no problem with that. It'll be separate from split_page() but that
is not earth shattering.

Thanks.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added

2011-10-21 Thread Mel Gorman
On Tue, Oct 18, 2011 at 10:26:37AM -0700, Michal Nazarewicz wrote:
 On Tue, 18 Oct 2011 05:21:09 -0700, Mel Gorman m...@csn.ul.ie wrote:
 
 At this point, I'm going to apologise for not reviewing this a long long
 time ago.
 
 On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
 From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
 
 This commit introduces alloc_contig_freed_pages() function
 which allocates (ie. removes from buddy system) free pages
 in range. Caller has to guarantee that all pages in range
 are in buddy system.
 
 
 Straight away, I'm wondering why you didn't use
 
 mm/compaction.c#isolate_freepages()
 
 It knows how to isolate pages within ranges. All its control information
 is passed via struct compact_control() which I recognise may be awkward
 for CMA but compaction.c know how to manage all the isolated pages and
 pass them to migrate.c appropriately.
 
 It is something to consider.  At first glance, I see that isolate_freepages
 seems to operate on pageblocks, which is not desired for CMA.
 

isolate_freepages_block operates on a range of pages that happens to be
hard-coded to be a pageblock because that was the requirements. It calculates
end_pfn and it is possible to make that a function parameter.
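
i.e. something like this (untested; the name and argument order are only
illustrative):

	static unsigned long isolate_freepages_range(struct zone *zone,
				unsigned long start_pfn, unsigned long end_pfn,
				struct list_head *freelist);

with compaction preserving its current behaviour by passing in exactly one
pageblock:

	end_pfn = min(ALIGN(pfn + 1, pageblock_nr_pages),
		      zone->zone_start_pfn + zone->spanned_pages);
	isolated = isolate_freepages_range(zone, pfn, end_pfn, freelist);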

 I haven't read all the patches yet but isolate_freepages() does break
 everything up into order-0 pages. This may not be to your liking but it
 would not be possible to change.
 
 Splitting everything into order-0 pages is desired behaviour.
 

Great.

 Along with this function, a free_contig_pages() function is
 provided which frees all (or a subset of) pages allocated
 with alloc_contig_free_pages().
 
 mm/compaction.c#release_freepages()
 
 It sort of does the same thing but release_freepages() assumes that pages
 that are being freed are not-continuous and they need to be on the lru list.
 With free_contig_pages(), we can assume all pages are continuous.
 

Ok, I jumped the gun here. release_freepages() may not be a perfect fit.
release_freepages() is also used when finishing compaction where as it
is a real free function that is required here.

 You can do this in a more general fashion by checking the
 zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
 That will not be SPARSEMEM specific.
 
 I've tried doing stuff that way but it ended up with much more code.
 
 Dave suggested the above function to check if pointer arithmetic is valid.
 
 Please see also https://lkml.org/lkml/2011/9/21/220.
 

Ok, I'm still not fully convinced but I confess I'm not thinking about this
particular function too deeply because I am expecting the problem would
go away if compaction and CMA shared common code for freeing contiguous
regions via page migration.

  SNIP
 +   if (zone_pfn_same_memmap(pfn - count, pfn))
 +   page += count;
 +   else
 +   page = pfn_to_page(pfn);
 +   }
 +
 +   spin_unlock_irq(&zone->lock);
 +
 +   /* After this, pages in the range can be freed one by one */
 +   count = pfn - start;
 +   pfn = start;
 +   for (page = pfn_to_page(pfn); count; --count) {
 +   prep_new_page(page, 0, flag);
 +   ++pfn;
 +   if (likely(zone_pfn_same_memmap(pfn - 1, pfn)))
 +   ++page;
 +   else
 +   page = pfn_to_page(pfn);
 +   }
 +
 
 Here it looks like you have implemented something like split_free_page().
 
 split_free_page() takes a single page, removes it from buddy system, and 
 finally
 splits it. 

I'm referring to just this chunk.

split_free_page takes a page, checks the watermarks and performs similar
operations to prep_new_page(). There should be no need to introduce a
new similar function. split_free_page() does affect the pageblock
migratetype and that is undesirable but that part could be taken out and
moved to compaction.c if necessary.

On the watermarks thing, CMA does not pay much attention to them. I have
a strong feeling that it is easy to deadlock a system by using CMA while
under memory pressure. Compaction had the same problem early in its
development FWIW. This is partially why I'd prefer to see CMA and
compaction sharing as much code as possible because compaction gets
continual testing.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 1/9] mm: move some functions from memory_hotplug.c to page_isolation.c

2011-10-18 Thread Mel Gorman
On Thu, Oct 06, 2011 at 03:54:41PM +0200, Marek Szyprowski wrote:
 From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
 
 Memory hotplug is a logic for making pages unused in the specified
 range of pfn. So, some of core logics can be used for other purpose
 as allocating a very large contiguous memory block.
 
 This patch moves some functions from mm/memory_hotplug.c to
 mm/page_isolation.c. This helps adding a function for large-alloc in
 page_isolation.c with memory-unplug technique.
 
 Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
 [m.nazarewicz: reworded commit message]
 Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 [m.szyprowski: rebased and updated to Linux v3.0-rc1]
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 CC: Michal Nazarewicz min...@mina86.com
 Acked-by: Arnd Bergmann a...@arndb.de
 ---
  include/linux/page-isolation.h |7 +++
  mm/memory_hotplug.c|  111 --
  mm/page_isolation.c|  114 
 
  3 files changed, 121 insertions(+), 111 deletions(-)
 
 diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
 index 051c1b1..58cdbac 100644
 --- a/include/linux/page-isolation.h
 +++ b/include/linux/page-isolation.h
 @@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long 
 end_pfn);
  extern int set_migratetype_isolate(struct page *page);
  extern void unset_migratetype_isolate(struct page *page);
  
 +/*
 + * For migration.
 + */
 +
 +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);

bool

 +unsigned long scan_lru_pages(unsigned long start, unsigned long end);

Both function names are misleading.

 +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
  

scan_lru_pages as it is used by memory hotplug is also extremely
expensive. Unplugging memory is rare so the performance is not a concern
but it may be for CMA.

I think it would have made more sense to either express this as an iterator
like for_each_lru_page_in_range() and use cursors or reuse the compaction
code. As it is, these functions are a bit rough. I'm biased but this code
seems to have very similar responsibilities to the compaction.c code for
isolate_migratepages and how it handles migration. It also knows how to avoid
isolating so much memory as to put the system in the risk of being livelocked,
isolate pages from the LRU in batch etc.
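
For illustration, the iterator I have in mind would be along these lines
(both helper names are made up, nothing like this exists yet):

	/*
	 * next_lru_page_in_range() would advance *pfn to the next page in
	 * [*pfn, end_pfn) that is on an LRU list and return it, or NULL
	 * when the range is exhausted.
	 */
	#define for_each_lru_page_in_range(page, pfn, start_pfn, end_pfn)	\
		for ((pfn) = (start_pfn);					\
		     ((page) = next_lru_page_in_range(&(pfn), (end_pfn)));	\
		     (pfn)++)

so callers keep a cursor and do not rescan the range from the start on every
call, which is what makes scan_lru_pages() so expensive.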

This is not a show-stopper as such but personally I would prefer that
the memory hotplug code be sharing code with compaction than CMA adding
a new dependency on memory hotplug.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/9] mm: alloc_contig_freed_pages() added

2011-10-18 Thread Mel Gorman
At this point, I'm going to apologise for not reviewing this a long long
time ago.

On Thu, Oct 06, 2011 at 03:54:42PM +0200, Marek Szyprowski wrote:
 From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
 
 This commit introduces alloc_contig_freed_pages() function
 which allocates (ie. removes from buddy system) free pages
 in range. Caller has to guarantee that all pages in range
 are in buddy system.
 

Straight away, I'm wondering why you didn't use

mm/compaction.c#isolate_freepages()

It knows how to isolate pages within ranges. All its control information
is passed via struct compact_control() which I recognise may be awkward
for CMA but compaction.c know how to manage all the isolated pages and
pass them to migrate.c appropriately.

I haven't read all the patches yet but isolate_freepages() does break
everything up into order-0 pages. This may not be to your liking but it
would not be possible to change.

 Along with this function, a free_contig_pages() function is
 provided which frees all (or a subset of) pages allocated
 with alloc_contig_free_pages().
 

mm/compaction.c#release_freepages()

 Michal Nazarewicz has modified the function to make it easier
 to allocate not MAX_ORDER_NR_PAGES aligned pages by making it
 return pfn of one-past-the-last allocated page.
 
 Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
 Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com
 [m.nazarewicz: added checks if all allocated pages comes from the
 same memory zone]
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 [m.szyprowski: fixed wrong condition in VM_BUG_ON assert]
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 Acked-by: Arnd Bergmann a...@arndb.de
 ---
  include/linux/mmzone.h |   16 +
  include/linux/page-isolation.h |5 +++
  mm/page_alloc.c|   67 
 
  3 files changed, 88 insertions(+), 0 deletions(-)
 
 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
 index a2760bb..862a834 100644
 --- a/include/linux/mmzone.h
 +++ b/include/linux/mmzone.h
 @@ -1168,6 +1168,22 @@ static inline int memmap_valid_within(unsigned long 
 pfn,
  }
  #endif /* CONFIG_ARCH_HAS_HOLES_MEMORYMODEL */
  
 +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 +/*
 + * Both PFNs must be from the same zone!  If this function returns
 + * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2).
 + */
 +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long 
 pfn2)
 +{
 + return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2);
 +}
 +

Why do you care what section the page is in? The zone is important all
right, but not the section. Also, offhand I'm unsure if being in the
same section guarantees the same zone. sections are ordinarily fully
populated (except on ARM but hey) but I can't remember anything
enforcing that zones be section-aligned.

Later I think I see that the intention was to reduce the use of
pfn_to_page(). You can do this in a more general fashion by checking the
zone boundaries and resolving the pfn->page every MAX_ORDER_NR_PAGES.
That will not be SPARSEMEM specific.
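
i.e. a walker along these lines (untested sketch):

	struct zone *zone = page_zone(pfn_to_page(start));
	unsigned long zone_end = zone->zone_start_pfn + zone->spanned_pages;
	unsigned long pfn;
	struct page *page;

	for (pfn = start, page = pfn_to_page(start); pfn < end; pfn++, page++) {
		if (pfn >= zone_end)
			break;			/* stay within the zone */
		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
			/* memmap is only guaranteed contiguous within a
			 * MAX_ORDER_NR_PAGES block, so re-resolve here */
			if (!pfn_valid(pfn))
				break;
			page = pfn_to_page(pfn);
		}
		/* ... operate on page ... */
	}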

 +#else
 +
 +#define zone_pfn_same_memmap(pfn1, pfn2) (true)
 +
 +#endif
 +
  #endif /* !__GENERATING_BOUNDS.H */
  #endif /* !__ASSEMBLY__ */
  #endif /* _LINUX_MMZONE_H */
 diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
 index 58cdbac..b9fc428 100644
 --- a/include/linux/page-isolation.h
 +++ b/include/linux/page-isolation.h
 @@ -33,6 +33,11 @@ test_pages_isolated(unsigned long start_pfn, unsigned long 
 end_pfn);
  extern int set_migratetype_isolate(struct page *page);
  extern void unset_migratetype_isolate(struct page *page);
  
 +/* The below functions must be run on a range from a single zone. */
 +extern unsigned long alloc_contig_freed_pages(unsigned long start,
 +   unsigned long end, gfp_t flag);
 +extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
 +
  /*
   * For migration.
   */
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index bf4399a..fbfb920 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -5706,6 +5706,73 @@ out:
 spin_unlock_irqrestore(&zone->lock, flags);
  }
  
 +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long 
 end,
 +gfp_t flag)
 +{
 + unsigned long pfn = start, count;
 + struct page *page;
 + struct zone *zone;
 + int order;
 +
 + VM_BUG_ON(!pfn_valid(start));

VM_BUG_ON seems very harsh here. WARN_ON_ONCE and returning 0 to the
caller sees reasonable.

 + page = pfn_to_page(start);
 + zone = page_zone(page);
 +
 + spin_lock_irq(&zone->lock);
 +
 + for (;;) {
 + VM_BUG_ON(page_count(page) || !PageBuddy(page) ||
 +   page_zone(page) != zone);
 +

Here you will VM_BUG_ON with the 

Re: [PATCH 3/9] mm: alloc_contig_range() added

2011-10-18 Thread Mel Gorman
On Thu, Oct 06, 2011 at 03:54:43PM +0200, Marek Szyprowski wrote:
 From: Michal Nazarewicz m.nazarew...@samsung.com
 
 This commit adds the alloc_contig_range() function which tries
 to allocate given range of pages.  It tries to migrate all
 already allocated pages that fall in the range thus freeing them.
 Once all pages in the range are freed they are removed from the
 buddy system thus allocated for the caller to use.
 
 Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 [m.szyprowski: renamed some variables for easier code reading]
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 CC: Michal Nazarewicz min...@mina86.com
 Acked-by: Arnd Bergmann a...@arndb.de
 ---
  include/linux/page-isolation.h |2 +
  mm/page_alloc.c|  148 
 
  2 files changed, 150 insertions(+), 0 deletions(-)
 
 diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
 index b9fc428..774ecec 100644
 --- a/include/linux/page-isolation.h
 +++ b/include/linux/page-isolation.h
 @@ -36,6 +36,8 @@ extern void unset_migratetype_isolate(struct page *page);
  /* The below functions must be run on a range from a single zone. */
  extern unsigned long alloc_contig_freed_pages(unsigned long start,
 unsigned long end, gfp_t flag);
 +extern int alloc_contig_range(unsigned long start, unsigned long end,
 +   gfp_t flags);
  extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
  
  /*
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index fbfb920..8010854 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -5773,6 +5773,154 @@ void free_contig_pages(unsigned long pfn, unsigned 
 nr_pages)
   }
  }
  
 +static unsigned long pfn_to_maxpage(unsigned long pfn)
 +{
 + return pfn & ~(MAX_ORDER_NR_PAGES - 1);
 +}
 +

pfn_to_maxpage is a very confusing name here. It would be preferable to
create a MAX_ORDER_MASK that you apply directly.

Maybe something like SECTION_ALIGN_UP and SECTION_ALIGN_DOWN.
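
Roughly this shape (names are only suggestions, untested):

	/* Round a pfn down/up to a MAX_ORDER_NR_PAGES boundary. */
	#define MAX_ORDER_MASK			(~((unsigned long)MAX_ORDER_NR_PAGES - 1))
	#define MAX_ORDER_ALIGN_DOWN(pfn)	((pfn) & MAX_ORDER_MASK)
	#define MAX_ORDER_ALIGN_UP(pfn)		ALIGN((pfn), MAX_ORDER_NR_PAGES)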

 +static unsigned long pfn_to_maxpage_up(unsigned long pfn)
 +{
 + return ALIGN(pfn, MAX_ORDER_NR_PAGES);
 +}
 +
 +#define MIGRATION_RETRY  5
 +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 +{
 + int migration_failed = 0, ret;
 + unsigned long pfn = start;
 +
 + /*
 +  * Some code borrowed from KAMEZAWA Hiroyuki's
 +  * __alloc_contig_pages().
 +  */
 +

There is no need to put a comment like this here. Credit him in the
changelog.

 + /* drop all pages in pagevec and pcp list */
 + lru_add_drain_all();
 + drain_all_pages();
 +

Very similar to migrate_prep(). drain_all_pages should not be required
at this point.
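
i.e. the two calls above could collapse into the existing helper, something
like (untested):

	/* migrate_prep() already does the lru_add_drain_all() for us. */
	migrate_prep();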

 + for (;;) {
 + pfn = scan_lru_pages(pfn, end);

scan_lru_pages() is inefficient, so this is going to be costly.

 + if (!pfn || pfn >= end)
 + break;
 +
 + ret = do_migrate_range(pfn, end);
 + if (!ret) {
 + migration_failed = 0;
 + } else if (ret != -EBUSY
 + || ++migration_failed >= MIGRATION_RETRY) {
 + return ret;
 + } else {
 + /* There are unstable pages.on pagevec. */
 + lru_add_drain_all();
 + /*
 +  * there may be pages on pcplist before
 +  * we mark the range as ISOLATED.
 +  */
 + drain_all_pages();
 + }
 + cond_resched();
 + }
 +
 + if (!migration_failed) {
 + /* drop all pages in pagevec and pcp list */
 + lru_add_drain_all();
 + drain_all_pages();
 + }
 +
 + /* Make sure all pages are isolated */
 + if (WARN_ON(test_pages_isolated(start, end)))
 + return -EBUSY;
 +

In some respects, this is very similar to compact_zone() in mm/compaction.c.
They could have shared significant code if you reworked compact_zone()
to work on ranges of memory and expressed compact_zone() as operating on
zone->zone_start_pfn to zone->zone_start_pfn + zone->spanned_pages. The
compaction code is 

 + return 0;
 +}
 +
 +/**
 + * alloc_contig_range() -- tries to allocate given range of pages
 + * @start:   start PFN to allocate
 + * @end: one-past-the-last PFN to allocate
 + * @flags:   flags passed to alloc_contig_freed_pages().
 + *
 + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 + * aligned, however it is the caller's responsibility to guarantee that
 + * we are the only thread that changes the migrate type of the
 + * pageblocks the pages fall in.
 + *
 + * Returns zero on success or a negative error code.  On success all
 + * pages whose PFN is in [start, end) are allocated for the caller and
 + * need to be freed with free_contig_pages().
 + */
 +int 

Re: [PATCH 4/9] mm: MIGRATE_CMA migration type added

2011-10-18 Thread Mel Gorman
On Thu, Oct 06, 2011 at 03:54:44PM +0200, Marek Szyprowski wrote:
 From: Michal Nazarewicz m.nazarew...@samsung.com
 
 The MIGRATE_CMA migration type has two main characteristics:
 (i) only movable pages can be allocated from MIGRATE_CMA
 pageblocks and (ii) page allocator will never change migration
 type of MIGRATE_CMA pageblocks.
 
 This guarantees that page in a MIGRATE_CMA page block can
 always be migrated somewhere else (unless there's no memory left
 in the system).
 

Or the count is permanently elevated by a device driver for some reason, or
the page is backed by a filesystem with a broken or unusable migrate_page()
function. This is unavoidable; I'm just pointing out that you can still have
migration failures, particularly if GFP_MOVABLE has been improperly
used.

 It is designed to be used with Contiguous Memory Allocator
 (CMA) for allocating big chunks (eg. 10MiB) of physically
 contiguous memory.  Once driver requests contiguous memory,
 CMA will migrate pages from MIGRATE_CMA pageblocks.
 
 To minimise the number of migrations, the MIGRATE_CMA migration type
 is the last type tried when the page allocator falls back to
 migration types other than the one requested.
 

It would be preferable if you could figure out how to reuse the
MIGRATE_RESERVE type for just the bitmap. Like MIGRATE_CMA, it does not
change type except when min_free_kbytes changes. However, it is
something that could be done in the future to keep the size of the
pageblock bitmap where it is now.

 Signed-off-by: Michal Nazarewicz m.nazarew...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 [m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 [m.nazarewicz: fixed incorrect handling of pages from ISOLATE page blocks]
 Signed-off-by: Michal Nazarewicz min...@mina86.com
 Acked-by: Arnd Bergmann a...@arndb.de
 ---
  include/linux/mmzone.h |   41 +
  include/linux/page-isolation.h |1 +
  mm/Kconfig |8 -
  mm/compaction.c|   10 +
  mm/page_alloc.c|   79 ++--
  5 files changed, 112 insertions(+), 27 deletions(-)
 
 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
 index 862a834..cc34965 100644
 --- a/include/linux/mmzone.h
 +++ b/include/linux/mmzone.h
 @@ -35,13 +35,35 @@
   */
  #define PAGE_ALLOC_COSTLY_ORDER 3
  
 -#define MIGRATE_UNMOVABLE 0
 -#define MIGRATE_RECLAIMABLE   1
 -#define MIGRATE_MOVABLE   2
 -#define MIGRATE_PCPTYPES  3 /* the number of types on the pcp lists */
 -#define MIGRATE_RESERVE   3
 -#define MIGRATE_ISOLATE   4 /* can't allocate from here */
 -#define MIGRATE_TYPES 5
 +enum {
 + MIGRATE_UNMOVABLE,
 + MIGRATE_RECLAIMABLE,
 + MIGRATE_MOVABLE,
 + MIGRATE_PCPTYPES,   /* the number of types on the pcp lists */
 + MIGRATE_RESERVE = MIGRATE_PCPTYPES,
 + /*
 +  * MIGRATE_CMA migration type is designed to mimic the way
 +  * ZONE_MOVABLE works.  Only movable pages can be allocated
 +  * from MIGRATE_CMA pageblocks and the page allocator never
 +  * implicitly changes the migration type of a MIGRATE_CMA pageblock.
 +  *
 +  * The way to use it is to change migratetype of a range of
 +  * pageblocks to MIGRATE_CMA which can be done by
 +  * __free_pageblock_cma() function.  What is important though
 +  * is that a range of pageblocks must be aligned to
 +  * MAX_ORDER_NR_PAGES, should the biggest page be bigger than
 +  * a single pageblock.
 +  */
 + MIGRATE_CMA,

This does mean that MIGRATE_CMA also does not have a per-cpu list. I
don't know if that matters to you but all allocations using MIGRATE_CMA
will take the zone lock. I'm not sure this can be easily avoided because
if there is a per-CPU list for MIGRATE_CMA, it might use a new cache
line for it and incur a different set of performance problems.
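
To make the point concrete: per-cpu lists only exist for the first
MIGRATE_PCPTYPES types, so anything placed after MIGRATE_PCPTYPES in the
enum can only be handed out from the buddy lists under zone->lock. An
illustrative helper (not the actual allocator code) would boil down to:

	/*
	 * Illustration only: types at or beyond MIGRATE_PCPTYPES have no
	 * per-cpu free list and must be allocated under zone->lock.
	 */
	static inline bool migratetype_has_pcplist(int migratetype)
	{
		return migratetype < MIGRATE_PCPTYPES;
	}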

 + MIGRATE_ISOLATE,/* can't allocate from here */
 + MIGRATE_TYPES
 +};
 +
 +#ifdef CONFIG_CMA_MIGRATE_TYPE
 +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
 +#else
 +#  define is_migrate_cma(migratetype) false
 +#endif
  
  #define for_each_migratetype_order(order, type) \
  for (order = 0; order < MAX_ORDER; order++) \
 @@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page)
   return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
  }
  
 +static inline bool is_pageblock_cma(struct page *page)
 +{
 + return is_migrate_cma(get_pageblock_migratetype(page));
 +}
 +
  struct free_area {
   struct list_headfree_list[MIGRATE_TYPES];
   unsigned long   nr_free;
 diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
 index 774ecec..9b6aa8a 100644
 --- a/include/linux/page-isolation.h
 +++ b/include/linux/page-isolation.h
 @@ -48,4 +48,5 @@ int 

Re: [PATCH 6/9] drivers: add Contiguous Memory Allocator

2011-10-18 Thread Mel Gorman
On Thu, Oct 06, 2011 at 03:54:46PM +0200, Marek Szyprowski wrote:
 The Contiguous Memory Allocator is a set of helper functions for DMA
 mapping framework that improves allocations of contiguous memory chunks.
 
 CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
 gives it back to the system. The kernel is allowed to allocate movable
 pages within CMA's managed memory so that it can be used, for example,
 for page cache when the DMA mapping does not use it. On a
 dma_alloc_from_contiguous() request such pages are migrated out of the
 CMA area to free the required contiguous block and fulfill the request.
 This allows large contiguous chunks of memory to be allocated at any
 time, assuming that there is enough free memory available in the system.
 
 This code is heavily based on earlier works by Michal Nazarewicz.
 
 Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
 Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
 CC: Michal Nazarewicz min...@mina86.com
 ---
  arch/Kconfig |3 +
  drivers/base/Kconfig |   79 +++
  drivers/base/Makefile|1 +
  drivers/base/dma-contiguous.c|  386 ++
  include/asm-generic/dma-contiguous.h |   27 +++
  include/linux/device.h   |4 +
  include/linux/dma-contiguous.h   |  106 ++
  7 files changed, 606 insertions(+), 0 deletions(-)
  create mode 100644 drivers/base/dma-contiguous.c
  create mode 100644 include/asm-generic/dma-contiguous.h
  create mode 100644 include/linux/dma-contiguous.h
 
 diff --git a/arch/Kconfig b/arch/Kconfig
 index 4b0669c..a3b39a2 100644
 --- a/arch/Kconfig
 +++ b/arch/Kconfig
 @@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK
  config HAVE_DMA_ATTRS
   bool
  
 +config HAVE_DMA_CONTIGUOUS
 + bool
 +
  config USE_GENERIC_SMP_HELPERS
   bool
  
 diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
 index 21cf46f..a5e6d75 100644
 --- a/drivers/base/Kconfig
 +++ b/drivers/base/Kconfig
 @@ -174,4 +174,83 @@ config SYS_HYPERVISOR
  
  source drivers/base/regmap/Kconfig
  
 +config CMA
 + bool Contiguous Memory Allocator (EXPERIMENTAL)
 + depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
 + select MIGRATION
 + select CMA_MIGRATE_TYPE
 + help
 +   This enables the Contiguous Memory Allocator which allows drivers
 +   to allocate big physically-contiguous blocks of memory for use with
 +   hardware components that do not support I/O map nor scatter-gather.
 +
 +   For more information see include/linux/dma-contiguous.h.
 +   If unsure, say n.
 +
 +if CMA
 +
 +config CMA_DEBUG
 + bool CMA debug messages (DEVELOPEMENT)

s/DEVELOPEMENT/DEVELOPMENT/

Should it be under DEBUG_KERNEL?

 + help
 +   Turns on debug messages in CMA.  This produces KERN_DEBUG
 +   messages for every CMA call as well as various messages while
 +   processing calls such as dma_alloc_from_contiguous().
 +   This option does not affect warning and error messages.
 +
 +comment Default contiguous memory area size:
 +
 +config CMA_SIZE_ABSOLUTE
 + int Absolute size (in MiB)
 + depends on !CMA_SIZE_SEL_PERCENTAGE
 + default 16
 + help
 +   Defines the size (in MiB) of the default memory area for Contiguous
 +   Memory Allocator.
 +
 +config CMA_SIZE_PERCENTAGE
 + int Percentage of total memory
 + depends on !CMA_SIZE_SEL_ABSOLUTE
 + default 10
 + help
 +   Defines the size of the default memory area for Contiguous Memory
 +   Allocator as a percentage of the total memory in the system.
 +

Why is this not a kernel parameter rather than a config option?

Better yet, why do drivers not register how much CMA memory they are
interested in and then have the driver core figure out whether it can
allocate that much or not?
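
i.e. something in the direction of a registration hook in the driver core.
This is a hypothetical sketch only; cma_register_requirement() does not
exist and the name is made up:

	/*
	 * Drivers declare how much contiguous memory they expect to need;
	 * the core sums the requests and sizes the CMA region accordingly
	 * instead of relying on a compile-time CMA_SIZE_* option.
	 */
	int cma_register_requirement(struct device *dev, unsigned long bytes);

	/* e.g. from a driver's probe(): */
	err = cma_register_requirement(&pdev->dev, 16 * 1024 * 1024);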

 +choice
 + prompt Selected region size
 + default CMA_SIZE_SEL_ABSOLUTE
 +
 +config CMA_SIZE_SEL_ABSOLUTE
 + bool Use absolute value only
 +
 +config CMA_SIZE_SEL_PERCENTAGE
 + bool Use percentage value only
 +
 +config CMA_SIZE_SEL_MIN
 + bool Use lower value (minimum)
 +
 +config CMA_SIZE_SEL_MAX
 + bool Use higher value (maximum)
 +
 +endchoice
 +
 +config CMA_ALIGNMENT
 + int Maximum PAGE_SIZE order of alignment for contiguous buffers
 + range 4 9
 + default 8
 + help
 +   DMA mapping framework by default aligns all buffers to the smallest
 +   PAGE_SIZE order which is greater than or equal to the requested buffer
 +   size. This works well for buffers up to a few hundred kilobytes, but
 +   for larger buffers it is just a waste of memory. With this parameter you can
 +   specify the maximum PAGE_SIZE order for contiguous buffers. Larger
 +   buffers will be aligned only to this specified order. The order is
 +   expressed as a power of two multiplied by the PAGE_SIZE.
 +
 +   For example, if your system defaults to 4KiB pages, the order value
 +   of 8 

Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

2010-08-26 Thread Mel Gorman
On Fri, Aug 20, 2010 at 03:15:10PM +0200, Peter Zijlstra wrote:
 On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
  Hello everyone,
  
  The following patchset implements a Contiguous Memory Allocator.  For
  those who have not yet stumbled across CMA an excerpt from
  documentation:
  
 The Contiguous Memory Allocator (CMA) is a framework, which allows
 setting up a machine-specific configuration for physically-contiguous
 memory management. Memory for devices is then allocated according
 to that configuration.
  
 The main role of the framework is not to allocate memory, but to
 parse and manage memory configurations, as well as to act as an
 in-between between device drivers and pluggable allocators. It is
 thus not tied to any memory allocation method or strategy.
  
  For more information please refer to the second patch from the
  patchset which contains the documentation.
 

I'm only taking a quick look at this - slow as ever so pardon me if I
missed anything.

 So the idea is to grab a large chunk of memory at boot time and then
 later allow some device to use it?

 I'd much rather we'd improve the regular page allocator to be smarter
 about this. We recently added a lot of smarts to it like memory
 compaction, which allows large gobs of contiguous memory to be freed for
 things like huge pages.
 

Quick glance tells me that buffer sizes of 20MB are being thrown about
which the core page allocator doesn't handle very well (and couldn't
without major modification). Fragmentation avoidance only works well on
sizes < MAX_ORDER_NR_PAGES which likely will be 2MB or 4MB.

That said, there are things the core VM can do to help. One is related
to ZONE_MOVABLE and the second is on the use of MIGRATE_ISOLATE.

ZONE_MOVABLE is setup when the command line has kernelcore= or movablecore=
specified. In ZONE_MOVABLE only pages that can be migrated are allocated
(or huge pages if specifically configured to be allowed).  The zone is setup
during initialisation by slicing pieces from the end of existing zones and
for various reasons, it would be best to maintain that behaviour unless CMA
had a specific requirement for memory in the middle of an existing zone.

So lets say the maximum amount of contiguous memory required by all
devices is 64M and ZONE_MOVABLE is 64M. During normal operation, normal
order-0 pages can be allocated from this zone meaning the memory is not
pinned and unusable by anybody else. This avoids wasting memory. When a
device needs a new buffer, compaction would need some additional smarts
to compact or reclaim the size of memory needed by the driver but
because all the pages in the zone are movable, it should be possible.
Ideally it would have swap to reclaim because if not, compaction needs
to know how to move pages outside a zone (something it currently
avoids).

Essentially, cma_alloc() would be a normal alloc_pages that uses
ZONE_MOVABLE for buffers < MAX_ORDER_NR_PAGES but would need additional
compaction smarts for the larger buffers. I think it would reuse as much
of the existing VM as possible but without reviewing the code, I don't
know for sure how useful the suggestion is.
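
A very rough sketch of the shape I have in mind (the name is hypothetical,
there is no error handling and the large-buffer path is hand-waved):

	struct page *cma_alloc_sketch(unsigned long nr_pages)
	{
		unsigned int order = get_order(nr_pages << PAGE_SHIFT);

		/*
		 * Small buffers: an ordinary movable allocation, which can
		 * be satisfied from ZONE_MOVABLE like any other movable page.
		 */
		if (nr_pages < MAX_ORDER_NR_PAGES)
			return alloc_pages(GFP_HIGHUSER_MOVABLE, order);

		/*
		 * Larger buffers would need the extra compaction/reclaim
		 * smarts discussed above to clear a contiguous range in
		 * ZONE_MOVABLE first.
		 */
		return NULL;
	}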

 If you want guarantees you can free stuff, why not add constraints to
 the page allocation type and only allow MIGRATE_MOVABLE pages inside a
 certain region, those pages are easily freed/moved aside to satisfy
 large contiguous allocations.
 

Relatively handy to do something like this. It can also be somewhat
constrained by doing something similar to MIGRATE_ISOLATE to have
contiguous regions of memory in a zone unusable by non-movable
allocations. It would be a lot trickier when interacting with reclaim
though, so using ZONE_MOVABLE would have fewer gotchas.

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

2010-08-26 Thread Mel Gorman
On Thu, Aug 26, 2010 at 04:40:46AM +0200, Michał Nazarewicz wrote:
 Hello Andrew,

 I think Pawel has replied to most of your comments, so I'll just add my own
 0.02 KRW. ;)

 Peter Zijlstra pet...@infradead.org wrote:
 So the idea is to grab a large chunk of memory at boot time and then
 later allow some device to use it?

 I'd much rather we'd improve the regular page allocator to be smarter
 about this. We recently added a lot of smarts to it like memory
 compaction, which allows large gobs of contiguous memory to be freed for
 things like huge pages.

 If you want guarantees you can free stuff, why not add constraints to
 the page allocation type and only allow MIGRATE_MOVABLE pages inside a
 certain region, those pages are easily freed/moved aside to satisfy
 large contiguous allocations.

 On Thu, 26 Aug 2010 00:58:14 +0200, Andrew Morton a...@linux-foundation.org 
 wrote:
 That would be good.  Although I expect that the allocation would need
 to be 100% rock-solid reliable, otherwise the end user has a
 non-functioning device.  Could generic core VM provide the required level
 of service?

 I think that the biggest problem is fragmentation here.  For instance,
 I think that a situation where there is enough free space but it's
 fragmented so no single contiguous chunk can be allocated is a serious
 problem.  However, I would argue that if there's simply no space left,
 a multimedia device could fail and even though it's not desirable, it
 would not be such a big issue in my eyes.


For handling fragmentation, there is the option of ZONE_MOVABLE so it's
usable by normal allocations but the CMA can take action to get it
cleared out if necessary. Another option that is trickier but less
disruptive would be to select a range of memory in a normal zone for CMA
and mark it MIGRATE_MOVABLE so that movable pages are allocated from it.
The trickier part is you need to make that bit stick so that non-movable
pages are never allocated from that range. That would be trickish to
implement but possible and it would avoid the fragmentation
problem without pinning memory.
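
The marking itself would be little more than this at init time (a sketch;
as I said, the hard part is making the type stick in the fallback paths):

	static void mark_range_movable(unsigned long start_pfn,
				       unsigned long end_pfn)
	{
		unsigned long pfn;

		/*
		 * Mark every pageblock in [start_pfn, end_pfn) movable so
		 * that only movable allocations are placed there.
		 */
		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
			set_pageblock_migratetype(pfn_to_page(pfn),
						  MIGRATE_MOVABLE);
	}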

 So, if only movable or discardable pages are allocated in CMA managed
 regions all should work well.  When a device needs memory discardable
 pages would get freed and movable moved unless there is no space left
 on the device in which case allocation would fail.

 Critical devices (just hypothetical entities) could have separate
 regions on which only discardable pages can be allocated so that memory
 can always be allocated for them.

 I agree that having two contiguous memory allocators floating about
 on the list is distressing.  Are we really all 100% diligently certain
 that there is no commonality here with Zach's work?

 As Pawel said, I think Zach's trying to solve a different problem.  No
 matter, as I've said in response to Konrad's message, I have thought
 about unifying Zach's IOMMU and CMA in such a way that devices could
 work on both systems with and without IOMMU if only they would limit
 the usage of the API to some subset which always works.

 Please cc me on future emails on this topic?

 Not a problem.

 -- 
 Best regards,_ _
 | Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
 | Computer Science,  Michał mina86 Nazarewicz   (o o)
 +[mina86*mina86.com]---[mina86*jabber.org]ooO--(_)--Ooo--



-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab