[PATCH v12 0/7] make balloon pages movable by compaction

2012-11-11 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might significantly reduce the
number of 2MB contiguous memory blocks that can be used within a guest, thus
imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch set follows the main idea discussed at the 2012 LSF/MM Summit session
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/ --
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and to allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.

The following numbers demonstrate the benefit of this patch set in allowing
compaction to be more effective on memory-ballooned guests.

Results for the STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
running on a 4GB RAM KVM guest which was ballooning 512MB of RAM in 64MB
chunks (inflating/deflating) every minute while the test was running:

===BEGIN stress-highalloc

STRESS-HIGHALLOC
                 highalloc-3.7     highalloc-3.7
                     rc4-clean         rc4-patch
Pass 1          55.00 ( 0.00%)    62.00 ( 7.00%)
Pass 2          54.00 ( 0.00%)    62.00 ( 8.00%)
while Rested    75.00 ( 0.00%)    80.00 ( 5.00%)

MMTests Statistics: duration
 3.7 3.7
   rc4-clean   rc4-patch
User 1207.59 1207.46
System   1300.55 1299.61
Elapsed  2273.72 2157.06

MMTests Statistics: vmstat
                                3.7         3.7
                          rc4-clean   rc4-patch
Page Ins                    3581516     2374368
Page Outs                  11148692    10410332
Swap Ins                         80          47
Swap Outs                      3641         476
Direct pages scanned          37978       33826
Kswapd pages scanned        1828245     1342869
Kswapd pages reclaimed      1710236     1304099
Direct pages reclaimed        32207       31005
Kswapd efficiency               93%         97%
Kswapd velocity             804.077     622.546
Direct efficiency               84%         91%
Direct velocity              16.703      15.682
Percentage direct scans          2%          2%
Page writes by reclaim        79252        9704
Page writes file              75611        9228
Page writes anon               3641         476
Page reclaim immediate        16764       11014
Page rescued immediate            0           0
Slabs scanned               2171904     2152448
Direct inode steals             385        2261
Kswapd inode steals          659137      609670
Kswapd skipped wait               1          69
THP fault alloc                 546         631
THP collapse alloc              361         339
THP splits                      259         263
THP fault fallback               98          50
THP collapse fail                20          17
Compaction stalls               747         499
Compaction success              244         145
Compaction failures             503         354
Compaction pages moved       370888      474837
Compaction move failure       77378       65259

===END stress-highalloc

Rafael Aquini (7):
  mm: adjust address_space_operations.migratepage() return code
  mm: redefine address_space.assoc_mapping
  mm: introduce a common interface for balloon pages mobility
  mm: introduce compaction and migration for ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce putback_movable_pages()
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c| 139 +++--
 fs/buffer.c|  12 +-
 fs/gfs2/glock.c|   2 +-
 fs/hugetlbfs/inode.c   |   4 +-
 fs/inode.c |   2 +-
 fs/nilfs2/page.c   |   2 +-
 include/linux/balloon_compaction.h | 263 
 include/linux/fs.h |   2 +-
 include/linux/migrate.h|  19 +++
 include/linux/pagemap.h|  16 ++
 include/linux/vm_event_item.h  |   7 +-
 mm/Kconfig |  15 ++
 mm/Makefile|   3 +-
 mm/balloon_compaction.c| 304 +
 mm/compaction.c|  27 +++-
 mm/migrate.c   |  86 ---
 mm/page_alloc.c|   2 +-
 mm/vmstat.c|   9 +-
 18 files changed, 862 insertions(+), 52 deletions(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

Change log:
v12:
 * Address the last suggestions on sorting out the memory barrier usage (Mel Gorman);
 * Fix reported build breakages for CONFIG_BALLOON_COMPACTION=n (Andrew Morton);
 * Enhance commentary on the locking scheme used for balloon page compaction;
 * Move all the 'balloon 

[PATCH v12 1/7] mm: adjust address_space_operations.migratepage() return code

2012-11-11 Thread Rafael Aquini
This patch introduces MIGRATEPAGE_SUCCESS as the default return code for the
address_space_operations.migratepage() method and documents the expected
return codes for that method in failure cases.
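
For illustration, a minimal migratepage() implementation honoring the new
convention would read as follows (an editor's sketch mirroring the
conversions below; example_migrate_page is a hypothetical name, not part of
the patch):

static int example_migrate_page(struct address_space *mapping,
				struct page *newpage, struct page *page,
				enum migrate_mode mode)
{
	int rc;

	/* returns MIGRATEPAGE_SUCCESS or a negative errno (e.g. -EAGAIN) */
	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
	if (rc != MIGRATEPAGE_SUCCESS)
		return rc;

	migrate_page_copy(newpage, page);
	return MIGRATEPAGE_SUCCESS;
}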

Signed-off-by: Rafael Aquini <aqu...@redhat.com>
---
 fs/hugetlbfs/inode.c|  4 ++--
 include/linux/migrate.h |  7 +++
 mm/migrate.c| 33 +++--
 3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 14bc0c1..fed1cd5 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -582,11 +582,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
int rc;
 
rc = migrate_huge_page_move_mapping(mapping, newpage, page);
-   if (rc)
+   if (rc != MIGRATEPAGE_SUCCESS)
return rc;
migrate_page_copy(newpage, page);
 
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 
 static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9a5afea..fab15ae 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -7,6 +7,13 @@
 
 typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
+/*
+ * Return values from address_space_operations.migratepage():
+ * - negative errno on page migration failure;
+ * - zero on page migration success;
+ */
+#define MIGRATEPAGE_SUCCESS		0
+
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
diff --git a/mm/migrate.c b/mm/migrate.c
index 0c5ec37..6f408c7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -286,7 +286,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
expected_count += 1;
if (page_count(page) != expected_count)
return -EAGAIN;
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
}
 
	spin_lock_irq(&mapping->tree_lock);
@@ -356,7 +356,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
}
	spin_unlock_irq(&mapping->tree_lock);
 
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 
 /*
@@ -372,7 +372,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
if (!mapping) {
if (page_count(page) != 1)
return -EAGAIN;
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
}
 
	spin_lock_irq(&mapping->tree_lock);
@@ -399,7 +399,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
page_unfreeze_refs(page, expected_count - 1);
 
	spin_unlock_irq(&mapping->tree_lock);
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 
 /*
@@ -486,11 +486,11 @@ int migrate_page(struct address_space *mapping,
 
rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
 
-   if (rc)
+   if (rc != MIGRATEPAGE_SUCCESS)
return rc;
 
migrate_page_copy(newpage, page);
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 EXPORT_SYMBOL(migrate_page);
 
@@ -513,7 +513,7 @@ int buffer_migrate_page(struct address_space *mapping,
 
rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
 
-   if (rc)
+   if (rc != MIGRATEPAGE_SUCCESS)
return rc;
 
/*
@@ -549,7 +549,7 @@ int buffer_migrate_page(struct address_space *mapping,
 
} while (bh != head);
 
-   return 0;
+   return MIGRATEPAGE_SUCCESS;
 }
 EXPORT_SYMBOL(buffer_migrate_page);
 #endif
@@ -628,7 +628,7 @@ static int fallback_migrate_page(struct address_space *mapping,
  *
  * Return value:
  *   < 0 - error code
- *  == 0 - success
+ *  MIGRATEPAGE_SUCCESS - success
  */
 static int move_to_new_page(struct page *newpage, struct page *page,
int remap_swapcache, enum migrate_mode mode)
@@ -665,7 +665,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
else
rc = fallback_migrate_page(mapping, newpage, page, mode);
 
-   if (rc) {
+   if (rc != MIGRATEPAGE_SUCCESS) {
	newpage->mapping = NULL;
} else {
if (remap_swapcache)
@@ -814,7 +814,7 @@ skip_unmap:
put_anon_vma(anon_vma);
 
 uncharge:
-   mem_cgroup_end_migration(mem, page, newpage, rc == 0);
+   mem_cgroup_end_migration(mem, page, newpage, rc == MIGRATEPAGE_SUCCESS);
 unlock:
unlock_page(page);
 out:
@@ -987,7 +987,7 @@ int migrate_pages(struct list_head *from,
case -EAGAIN:
retry++;
break;
-   case 0:
+   case MIGRATEPAGE_SUCCESS:
break;
default:
/* Permanent failure */
@@ -996,15 +996,12 @@ int migrate_pages(struct list_head *from,

[PATCH v12 2/7] mm: redefine address_space.assoc_mapping

2012-11-11 Thread Rafael Aquini
This patch renames struct address_space.assoc_mapping to
address_space.private_data and redefines its type as void*. This approach
gives the .private_* elements of struct address_space consistent names, and
it allows extended usage: an address_space can now be associated with other
data structures through ->private_data.

Also, all users of the old ->assoc_mapping element are converted to reflect
its new name and type change (->private_data).
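
To sketch the flexibility the void * type buys (an editor's illustration, not
part of the patch; the balloon usage mirrors patch 3 of this series):

	/* fs/buffer.c keeps its old association, now through a void *: */
	mapping->private_data = buffer_mapping;

	/* ...while another subsystem can stash its own descriptor there: */
	mapping->private_data = b_dev_info;
	struct balloon_dev_info *info = mapping->private_data;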

Signed-off-by: Rafael Aquini <aqu...@redhat.com>
---
 fs/buffer.c| 12 ++--
 fs/gfs2/glock.c|  2 +-
 fs/inode.c |  2 +-
 fs/nilfs2/page.c   |  2 +-
 include/linux/fs.h |  2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b5f0442..e0bad95 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -555,7 +555,7 @@ void emergency_thaw_all(void)
  */
 int sync_mapping_buffers(struct address_space *mapping)
 {
-	struct address_space *buffer_mapping = mapping->assoc_mapping;
+	struct address_space *buffer_mapping = mapping->private_data;
 
 	if (buffer_mapping == NULL || list_empty(&mapping->private_list))
return 0;
@@ -588,10 +588,10 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
 	struct address_space *buffer_mapping = bh->b_page->mapping;
 
 	mark_buffer_dirty(bh);
-	if (!mapping->assoc_mapping) {
-		mapping->assoc_mapping = buffer_mapping;
+	if (!mapping->private_data) {
+		mapping->private_data = buffer_mapping;
 	} else {
-		BUG_ON(mapping->assoc_mapping != buffer_mapping);
+		BUG_ON(mapping->private_data != buffer_mapping);
 	}
 	if (!bh->b_assoc_map) {
 		spin_lock(&buffer_mapping->private_lock);
@@ -788,7 +788,7 @@ void invalidate_inode_buffers(struct inode *inode)
 	if (inode_has_buffers(inode)) {
 		struct address_space *mapping = &inode->i_data;
 		struct list_head *list = &mapping->private_list;
-		struct address_space *buffer_mapping = mapping->assoc_mapping;
+		struct address_space *buffer_mapping = mapping->private_data;
 
 		spin_lock(&buffer_mapping->private_lock);
 		while (!list_empty(list))
@@ -811,7 +811,7 @@ int remove_inode_buffers(struct inode *inode)
 	if (inode_has_buffers(inode)) {
 		struct address_space *mapping = &inode->i_data;
 		struct list_head *list = &mapping->private_list;
-		struct address_space *buffer_mapping = mapping->assoc_mapping;
+		struct address_space *buffer_mapping = mapping->private_data;
 
 		spin_lock(&buffer_mapping->private_lock);
 		while (!list_empty(list)) {
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 6114571..904a808 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -766,7 +766,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 		mapping->host = s->s_bdev->bd_inode;
 		mapping->flags = 0;
 		mapping_set_gfp_mask(mapping, GFP_NOFS);
-		mapping->assoc_mapping = NULL;
+		mapping->private_data = NULL;
 		mapping->backing_dev_info = s->s_bdi;
 		mapping->writeback_index = 0;
}
diff --git a/fs/inode.c b/fs/inode.c
index b03c719..4cac8e1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -165,7 +165,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	mapping->host = inode;
 	mapping->flags = 0;
 	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
-	mapping->assoc_mapping = NULL;
+	mapping->private_data = NULL;
 	mapping->backing_dev_info = &default_backing_dev_info;
 	mapping->writeback_index = 0;
 
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 3e7b2a0..07f76db 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -431,7 +431,7 @@ void nilfs_mapping_init(struct address_space *mapping, struct inode *inode,
 	mapping->host = inode;
 	mapping->flags = 0;
 	mapping_set_gfp_mask(mapping, GFP_NOFS);
-	mapping->assoc_mapping = NULL;
+	mapping->private_data = NULL;
 	mapping->backing_dev_info = bdi;
 	mapping->a_ops = &empty_aops;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b33cfc9..0982565 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -418,7 +418,7 @@ struct address_space {
 	struct backing_dev_info *backing_dev_info; /* device readahead, etc */
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
-	struct address_space	*assoc_mapping;	/* ditto */
+	void			*private_data;	/* ditto */
 } __attribute__((aligned(sizeof(long))));
/*
 * On most architectures that alignment is already the case; but
-- 
1.7.11.7


[PATCH v12 3/7] mm: introduce a common interface for balloon pages mobility

2012-11-11 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might significantly reduce the
number of 2MB contiguous memory blocks that can be used within a guest, thus
imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces a common interface to help a balloon driver make its
page set movable by compaction, thus allowing the system to better leverage
compaction efforts on memory defragmentation.
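
As a rough sketch of the intended usage (an editor's illustration pieced from
the declarations in the diff below and their use in patch 5; error handling
elided):

	struct balloon_dev_info *vb_dev_info;
	struct page *page;

	/* register a page book-keeper for this balloon device */
	vb_dev_info = balloon_devinfo_alloc(vb);	/* vb: driver descriptor */

	/* inflate path: get a movable page enqueued on vb_dev_info->pages */
	page = balloon_page_enqueue(vb_dev_info);

	/* deflate path: safely detach a page from vb_dev_info->pages */
	page = balloon_page_dequeue(vb_dev_info);

	/* teardown */
	balloon_devinfo_free(vb_dev_info);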

Signed-off-by: Rafael Aquini <aqu...@redhat.com>
Acked-by: Mel Gorman <m...@csn.ul.ie>
---
 include/linux/balloon_compaction.h | 256 +++
 include/linux/migrate.h|  10 ++
 include/linux/pagemap.h|  16 ++
 mm/Kconfig |  15 ++
 mm/Makefile|   3 +-
 mm/balloon_compaction.c| 302 +
 6 files changed, 601 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
new file mode 100644
index 000..2e63d94
--- /dev/null
+++ b/include/linux/balloon_compaction.h
@@ -0,0 +1,256 @@
+/*
+ * include/linux/balloon_compaction.h
+ *
+ * Common interface definitions for making balloon pages movable by compaction.
+ *
+ * Although it is perfectly possible to migrate ballooned pages, they make a
+ * special corner case for compaction scans because balloon pages are not
+ * enlisted on any LRU list like the other pages we compact / migrate.
+ *
+ * As the page isolation scanning step a compaction thread does is a lockless
+ * procedure (from a page standpoint), it might bring some racy situations
+ * while performing balloon page compaction. In order to sort out these racy
+ * scenarios and safely perform balloon's page compaction and migration we
+ * must, always, ensure following these three simple rules:
+ *
+ *   i. when updating a balloon's page ->mapping element, strictly do it under
+ *      the following lock order, independently of the far superior
+ *      locking scheme (lru_lock, balloon_lock):
+ *	    +-page_lock(page);
+ *	      +--spin_lock_irq(&b_dev_info->pages_lock);
+ *	            ... page->mapping updates here ...
+ *
+ *  ii. before isolating or dequeueing a balloon page from the balloon device
+ *      pages list, the page reference counter must be raised by one and the
+ *      extra refcount must be dropped when the page is enqueued back into
+ *      the balloon device page list, thus a balloon page keeps its reference
+ *      counter raised only while it is under our special handling;
+ *
+ * iii. after the lockless scan step has selected a potential balloon page for
+ *      isolation, re-test the page->mapping flags and the page ref counter
+ *      under the proper page lock, to ensure isolating a valid balloon page
+ *      (not yet isolated, nor under release procedure).
+ *
+ * The functions provided by this interface are placed here to help cope with
+ * the aforementioned balloon page corner case, as well as to ensure that the
+ * simple set of exposed rules is satisfied while we are dealing with balloon
+ * pages compaction / migration.
+ *
+ * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini <aqu...@redhat.com>
+ */
+#ifndef _LINUX_BALLOON_COMPACTION_H
+#define _LINUX_BALLOON_COMPACTION_H
+#include <linux/pagemap.h>
+#include <linux/migrate.h>
+#include <linux/gfp.h>
+#include <linux/err.h>
+
+/*
+ * Balloon device information descriptor.
+ * This struct is used to allow the common balloon compaction interface
+ * procedures to find the proper balloon device holding the memory pages
+ * they'll have to cope with during page compaction / migration, and it also
+ * serves the balloon driver as a page book-keeper for its registered balloon
+ * devices.
+ */
+struct balloon_dev_info {
+	void *balloon_device;		/* balloon device descriptor */
+	struct address_space *mapping;	/* balloon special page->mapping */
+	unsigned long isolated_pages;	/* # of isolated pages for migration */
+	spinlock_t pages_lock;		/* Protection to pages list */
+	struct list_head pages;		/* Pages enqueued & handled to Host */
+};
+
+extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
+extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
+extern struct balloon_dev_info *balloon_devinfo_alloc(
+   void *balloon_dev_descriptor);
+
+static inline void balloon_devinfo_free(struct balloon_dev_info *b_dev_info)
+{
+   kfree(b_dev_info);
+}
+
+/*
+ * balloon_page_free - release a balloon page back to the page free lists
+ * @page: ballooned page to be set free
+ *
+ * This function must be used to properly set free an isolated/dequeued
+ * balloon page at the end of a successful page migration, or 
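
To make rule (i) of the header comment concrete, the ordering a balloon
driver must follow whenever it updates page->mapping looks like this
(editor's sketch, not part of the patch):

	/* rule (i): take the page lock first, then the balloon list lock */
	lock_page(page);
	spin_lock_irq(&b_dev_info->pages_lock);
	page->mapping = balloon_mapping;	/* or NULL, when releasing */
	spin_unlock_irq(&b_dev_info->pages_lock);
	unlock_page(page);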

[PATCH v12 4/7] mm: introduce compaction and migration for ballooned pages

2012-11-11 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might significantly reduce the
number of 2MB contiguous memory blocks that can be used within a guest, thus
imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini <aqu...@redhat.com>
Acked-by: Mel Gorman <m...@csn.ul.ie>
---
 mm/compaction.c | 21 +++--
 mm/migrate.c| 34 --
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..76abd84 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
 #include <linux/backing-dev.h>
 #include <linux/sysctl.h>
 #include <linux/sysfs.h>
+#include <linux/balloon_compaction.h>
 #include "internal.h"
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -565,9 +566,24 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
goto next_pageblock;
}
 
-		/* Check may be lockless but that's ok as we recheck later */
-		if (!PageLRU(page))
+		/*
+		 * Check may be lockless but that's ok as we recheck later.
+		 * It's possible to migrate LRU pages and balloon pages.
+		 * Skip any other type of page.
+		 */
+		if (!PageLRU(page)) {
+			if (unlikely(balloon_page_movable(page))) {
+				if (locked && balloon_page_isolate(page)) {
+					/* Successfully isolated */
+					cc->finished_update_migrate = true;
+					list_add(&page->lru, migratelist);
+					cc->nr_migratepages++;
+					nr_isolated++;
+					goto check_compact_cluster;
+				}
+			}
 			continue;
+		}
 
/*
 * PageLRU is set. lru_lock normally excludes isolation
@@ -621,6 +637,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 		cc->nr_migratepages++;
nr_isolated++;
 
+check_compact_cluster:
/* Avoid isolating too much */
 		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
++low_pfn;
diff --git a/mm/migrate.c b/mm/migrate.c
index 6f408c7..a771751 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -35,6 +35,7 @@
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
 #include <linux/gfp.h>
+#include <linux/balloon_compaction.h>
 
 #include <asm/tlbflush.h>
 
@@ -79,7 +80,10 @@ void putback_lru_pages(struct list_head *l)
 		list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   putback_lru_page(page);
+   if (unlikely(balloon_page_movable(page)))
+   balloon_page_putback(page);
+   else
+   putback_lru_page(page);
}
 }
 
@@ -778,6 +782,18 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
}
}
 
+	if (unlikely(balloon_page_movable(page))) {
+		/*
+		 * A ballooned page does not need any special attention from
+		 * physical to virtual reverse mapping procedures.
+		 * Skip any attempt to unmap PTEs or to remap swap cache,
+		 * in order to avoid burning cycles at rmap level, and perform
+		 * the page migration right away (protected by page lock).
+		 */
+		rc = balloon_page_migrate(newpage, page, mode);
+		goto uncharge;
+	}
+
/*
 * Corner case handling:
 * 1. When a new swap-cache page is read into, it is added to the LRU
@@ -814,7 +830,9 @@ skip_unmap:
put_anon_vma(anon_vma);
 
 uncharge:
-   mem_cgroup_end_migration(mem, page, newpage, rc == MIGRATEPAGE_SUCCESS);
+   mem_cgroup_end_migration(mem, page, newpage,
+(rc == MIGRATEPAGE_SUCCESS ||
+ rc == MIGRATEPAGE_BALLOON_SUCCESS));
 unlock:
unlock_page(page);
 out:
@@ -846,6 +864,18 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
goto out;
 
rc = __unmap_and_move(page, newpage, force, offlining, mode);
+
+   if (unlikely(rc == MIGRATEPAGE_BALLOON_SUCCESS)) {
+   /*
+* A ballooned page has been 
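
The hunk is truncated here; judging from the context lines that reappear in
patch 7, the branch completes roughly as follows (an editor's reconstruction,
not verbatim from this message):

	if (unlikely(rc == MIGRATEPAGE_BALLOON_SUCCESS)) {
		/* the balloon page was migrated and released already:
		 * undo the isolation accounting and report plain success */
		dec_zone_page_state(page, NR_ISOLATED_ANON +
				page_is_file_cache(page));
		balloon_page_free(page);
		return MIGRATEPAGE_SUCCESS;
	}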

[PATCH v12 5/7] virtio_balloon: introduce migration primitives to balloon pages

2012-11-11 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might significantly reduce the
number of 2MB contiguous memory blocks that can be used within a guest, thus
imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing the
necessary primitives to perform balloon page migration/compaction, this patch
also introduces the following locking scheme, in order to enhance the
synchronization methods for accessing elements of struct virtio_balloon, thus
providing protection against concurrent accesses introduced by parallel
memory migration threads.

 - balloon_lock (mutex) : synchronizes the access demand to elements of
  struct virtio_balloon and its queue operations;

Signed-off-by: Rafael Aquini <aqu...@redhat.com>
Acked-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 139 
 1 file changed, 127 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 586395c..8f92ab7 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/balloon_compaction.h>
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -34,6 +35,7 @@
  * page units.
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
+#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 
 struct virtio_balloon
 {
@@ -52,15 +54,19 @@ struct virtio_balloon
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
/*
-* The pages we've told the Host we're not using.
+* The pages we've told the Host we're not using are enqueued
+* at vb_dev_info->pages list.
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
 * to num_pages above.
 */
-   struct list_head pages;
+   struct balloon_dev_info *vb_dev_info;
+
+   /* Synchronize access/update to this struct virtio_balloon elements */
+   struct mutex balloon_lock;
 
/* The array of pfns we tell the Host about. */
unsigned int num_pfns;
-   u32 pfns[256];
+   u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
/* Memory statistics */
int need_stats_update;
@@ -122,17 +128,20 @@ static void set_page_pfns(u32 pfns[], struct page *page)
 
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
+	struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
+
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
+	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-					__GFP_NOMEMALLOC | __GFP_NOWARN);
+		struct page *page = balloon_page_enqueue(vb_dev_info);
+
 		if (!page) {
 			dev_info_ratelimited(&vb->vdev->dev,
-					     "Out of puff! Can't get %zu pages\n",
-					     num);
+					     "Out of puff! Can't get %u pages\n",
+					     VIRTIO_BALLOON_PAGES_PER_PAGE);
 			/* Sleep for at least 1/5 of a second before retry. */
 			msleep(200);
 			break;
@@ -140,7 +149,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		totalram_pages--;
-		list_add(&page->lru, &vb->pages);
}
 
/* Didn't get any?  Oh well. */
@@ -148,6 +156,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 		return;
 
 	tell_host(vb, vb->inflate_vq);
+	mutex_unlock(&vb->balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -156,7 +165,7 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 
/* Find pfns pointing at start of each page, get pages and free them. */
 	for (i = 0; i < num; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   __free_page(balloon_pfn_to_page(pfns[i]));
+   balloon_page_free(balloon_pfn_to_page(pfns[i]));
totalram_pages++;
}
 }
@@ -164,14 +173,17 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
struct page *page;
+	struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
 
/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
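
The rest of leak_balloon() is truncated here; assuming the deflate path
mirrors fill_balloon()'s locking, its body would read roughly as follows
(an editor's sketch, not verbatim from this message):

	mutex_lock(&vb->balloon_lock);
	for (vb->num_pfns = 0; vb->num_pfns < num;
	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
		/* dequeue under balloon_lock so parallel migration
		 * threads cannot race with the deflate path */
		page = balloon_page_dequeue(vb_dev_info);
		if (!page)
			break;
		set_page_pfns(vb->pfns + vb->num_pfns, page);
		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
	}
	tell_host(vb, vb->deflate_vq);
	mutex_unlock(&vb->balloon_lock);
	release_pages_by_pfn(vb->pfns, vb->num_pfns);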
 

[PATCH v12 7/7] mm: add vm event counters for balloon pages compaction

2012-11-11 Thread Rafael Aquini
This patch introduces a new set of vm event counters to keep track of
ballooned pages compaction activity.
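
Once the series is applied and CONFIG_BALLOON_COMPACTION=y, the counters show
up in /proc/vmstat; an illustrative read (values invented for the example):

	$ grep compact_balloon /proc/vmstat
	compact_balloon_isolated 173
	compact_balloon_migrated 151
	compact_balloon_returned 22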

Signed-off-by: Rafael Aquini <aqu...@redhat.com>
---
 include/linux/balloon_compaction.h | 7 +++
 include/linux/vm_event_item.h  | 7 ++-
 mm/balloon_compaction.c| 2 ++
 mm/migrate.c   | 1 +
 mm/vmstat.c| 9 -
 5 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 2e63d94..68893bc 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -197,8 +197,15 @@ static inline bool balloon_compaction_check(void)
return true;
 }
 
+static inline void balloon_event_count(enum vm_event_item item)
+{
+   count_vm_event(item);
+}
 #else /* !CONFIG_BALLOON_COMPACTION */
 
+/* A macro, to avoid generating references to the undefined COMPACTBALLOON* */
+#define balloon_event_count(item) do { } while (0)
+
 static inline void *balloon_mapping_alloc(void *balloon_device,
const struct address_space_operations *a_ops)
 {
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3d31145..bd67c3f 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#ifdef CONFIG_BALLOON_COMPACTION
+   COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+	COMPACTBALLOONMIGRATED, /* balloon page successfully migrated */
+   COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+#endif /* CONFIG_BALLOON_COMPACTION */
+#endif /* CONFIG_COMPACTION */
 #ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
 #endif
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 07dbc8e..2c8ce49 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -242,6 +242,7 @@ bool balloon_page_isolate(struct page *page)
 	if (__is_movable_balloon_page(page) &&
 	    page_count(page) == 2) {
__isolate_balloon_page(page);
+   balloon_event_count(COMPACTBALLOONISOLATED);
unlock_page(page);
return true;
}
@@ -265,6 +266,7 @@ void balloon_page_putback(struct page *page)
__putback_balloon_page(page);
/* drop the extra ref count taken for page isolation */
put_page(page);
+   balloon_event_count(COMPACTBALLOONRETURNED);
} else {
WARN_ON(1);
dump_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index 107a281..ecae213 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -894,6 +894,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
balloon_page_free(page);
+   balloon_event_count(COMPACTBALLOONMIGRATED);
return MIGRATEPAGE_SUCCESS;
}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..18a76ea 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -781,7 +781,14 @@ const char * const vmstat_text[] = {
 	"compact_stall",
 	"compact_fail",
 	"compact_success",
-#endif
+
+#ifdef CONFIG_BALLOON_COMPACTION
+	"compact_balloon_isolated",
+	"compact_balloon_migrated",
+	"compact_balloon_returned",
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+#endif /* CONFIG_COMPACTION */
 
 #ifdef CONFIG_HUGETLB_PAGE
 	"htlb_buddy_alloc_success",
-- 
1.7.11.7



Re: [PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

2012-11-11 Thread Rafael Aquini
On Sat, Nov 10, 2012 at 05:55:38PM +0200, Michael S. Tsirkin wrote:
> > 	mutex_unlock(&vb->balloon_lock);
> > +	balloon_event_count(COMPACTBALLOONMIGRATED);
> > 
> > 	return MIGRATEPAGE_BALLOON_SUCCESS;
> > }
> 
> Looks like any balloon would need to do this.
> Can this chunk go into caller instead?


Good catch. It's done already (v12 just hit the wild).

Thanks!
-- Rafael


Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

2012-11-11 Thread Rusty Russell
Rafael Aquini <aqu...@redhat.com> writes:

> On Thu, Nov 08, 2012 at 09:32:18AM +1030, Rusty Russell wrote:
> > The first one can be delayed, the second one can be delayed if the host
> > didn't ask for VIRTIO_BALLOON_F_MUST_TELL_HOST (qemu doesn't).
> > 
> > We could implement a proper request queue for these, and return -EAGAIN
> > if the queue fills.  Though in practice, it's not important (it might
> > help performance).
> 
> I liked the idea. Give me the directions to accomplish it and I'll give it
> a try for sure.

OK, let's get this applied first, but here are some pointers:

Here's the current callback function when the host has processed the
buffers we put in the queue:

 static void balloon_ack(struct virtqueue *vq)
 {
	struct virtio_balloon *vb = vq->vdev->priv;

	wake_up(&vb->acked);
 }

It's almost a noop: here's how we use it to make our queues synchronous:

 static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 {
struct scatterlist sg;
unsigned int len;

	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);

/* We should always be able to add one buffer to an empty queue. */
	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
BUG();
virtqueue_kick(vq);

/* When host has read buffer, this completes via balloon_ack */
	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
 }

And we set up the callback when we create the virtqueue:

	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
...
	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);

So off the top of my head it should be as simple as changing tell_host()
to only wait if the virtqueue_add_buf() fails (ie. queue is full).

Hmm, though you will want to synchronize the inflate and deflate queues:
if we tell the host we're giving a page up we want it to have seen that
before we tell it we're using it again...
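
A rough sketch of that idea (an editor's illustration, untested; a real
implementation would also need per-request pfn buffers instead of reusing the
single vb->pfns array, which is what makes it a proper request queue):

static void tell_host_async(struct virtio_balloon *vb, struct virtqueue *vq)
{
	struct scatterlist sg;
	unsigned int len;

	/* reap any buffers the host has already consumed */
	while (virtqueue_get_buf(vq, &len))
		;

	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);

	/* only sleep when the virtqueue is actually full */
	while (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
		wait_event(vb->acked, virtqueue_get_buf(vq, &len));

	virtqueue_kick(vq);
}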

Cheers,
Rusty.