Re: [virtio-dev] Re: [virtio-comment] [PATCH v3 2/3] content: Document balloon feature page poison

2020-05-27 Thread Wei Wang

On 05/26/2020 11:38 PM, Cornelia Huck wrote:

On Tue, 26 May 2020 17:28:00 +0200
David Hildenbrand  wrote:


On 26.05.20 16:50, Alexander Duyck wrote:

On Tue, May 26, 2020 at 1:24 AM David Hildenbrand  wrote:

Still wondering what to do with free page hinting ... in the meantime
I'll have a look at free page reporting :)

The problem is it is already out there so I worry we wouldn't ever be
able to get rid of it. At most we could deprecate it, but we are still
stuck with it consuming bit resources and such.

Yeah, that's not an issue, they will simply turn to dead bits with
minimal documentation. I just don't see us fixing/supporting that
feature, really. Let's see what @MST things when he has time to look
into this.


If free page hinting is broken enough that we don't want anybody to try
implementing it, we maybe could:


May I know the issues that you got with FREE_PAGE_HINT?

I thought the discussion was about the PAGE_POISON bit (not related to 
FREE_PAGE_HINT).

It just enables a way for guest to tell host about the poison value.
I think currently we don't have a usage in QEMU to use the poison value 
(not sure about other userspace),

so it's disabled by default.

Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH] virtio-balloon: fix a use-after-free case

2019-03-12 Thread Wei Wang
The elem could theorically contain both outbuf and inbufs. We move the
free operation to the end of this function to avoid using elem->in_sg
while elem has been freed.

Fixes: c13c4153f7
("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Reported-by: Peter Maydell 
Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e3a6594..b614552 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -391,6 +391,7 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 VirtQueueElement *elem;
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtQueue *vq = dev->free_page_vq;
+bool ret = true;
 
 while (dev->block_iothread) {
 qemu_cond_wait(>free_page_cond, >free_page_lock);
@@ -405,13 +406,12 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 uint32_t id;
 size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
  , sizeof(id));
-virtqueue_push(vq, elem, size);
-g_free(elem);
 
 virtio_tswap32s(vdev, );
 if (unlikely(size != sizeof(id))) {
 virtio_error(vdev, "received an incorrect cmd id");
-return false;
+ret = false;
+goto out;
 }
 if (id == dev->free_page_report_cmd_id) {
 dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
@@ -431,11 +431,12 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
-virtqueue_push(vq, elem, 1);
-g_free(elem);
 }
 
-return true;
+out:
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+return ret;
 }
 
 static void virtio_ballloon_get_free_page_hints(void *opaque)
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-03-05 Thread Wei Wang

On 03/05/2019 10:50 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when there is an error occurred in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
  hw/virtio/virtio-balloon.c  | 263 
  include/hw/virtio/virtio-balloon.h  |  28 ++-
  include/standard-headers/linux/virtio_balloon.h |   5 +
  3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
+config.free_page_report_cmd_id =
+   cpu_to_le32(dev->free_page_report_cmd_id);
+} else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
+config.free_page_report_cmd_id =
+   cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+} else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_DONE) {
+config.free_page_report_cmd_id =
+   cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_DONE_ID);
+}

It looks like somewhere in the last 3 months the name in the kernel
changed; so I think I've fixed this correctly but please shout if it's
wrong:

 if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
 config.free_page_report_cmd_id =
cpu_to_le32(dev->free_page_report_cmd_id);
 } else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
 config.free_page_report_cmd_id =
cpu_to_le32(VIRTIO_BALLOON_CMD_ID_STOP);
 } else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_DONE) {
 config.free_page_report_cmd_id =
cpu_to_le32(VIRTIO_BALLOON_CMD_ID_DONE);
 }

and I've dropped the kernel header update since it's already there.



Looks good. Thanks for the update.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH 2/2] virtio: document virtio_config_ops restrictions

2019-01-14 Thread Wei Wang

On 01/04/2019 12:08 AM, Cornelia Huck wrote:

Some transports (e.g. virtio-ccw) implement virtio operations that
seem to be a simple read/write as something more involved that
cannot be done from an atomic context.

Give at least a hint about that.

Signed-off-by: Cornelia Huck 
---
  include/linux/virtio_config.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 7087ef946ba7..987b6491b946 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -12,6 +12,11 @@ struct irq_affinity;
  
  /**

   * virtio_config_ops - operations for configuring a virtio device
+ * Note: Do not assume that a transport implements all of the operations
+ *   getting/setting a value as a simple read/write! Generally speaking,
+ *   any of @get/@set, @get_status/@set_status, or @get_features/
+ *   @finalize_features are NOT safe to be called from an atomic
+ *   context.
   * @get: read the value of a configuration field
   *vdev: the virtio_device
   *offset: the offset of the configuration field



Reviewed-by: Wei Wang 

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH 1/2] virtio: fix virtio_config_ops description

2019-01-14 Thread Wei Wang

On 01/14/2019 10:09 PM, Cornelia Huck wrote:

On Thu,  3 Jan 2019 17:08:03 +0100
Cornelia Huck  wrote:


- get_features has returned 64 bits since commit d025477368792
   ("virtio: add support for 64 bit features.")
- properly mark all optional callbacks

Signed-off-by: Cornelia Huck 
---
  include/linux/virtio_config.h | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 32baf8e26735..7087ef946ba7 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -22,7 +22,7 @@ struct irq_affinity;
   *offset: the offset of the configuration field
   *buf: the buffer to read the field value from.
   *len: the length of the buffer
- * @generation: config generation counter
+ * @generation: config generation counter (optional)
   *vdev: the virtio_device
   *Returns the config generation counter
   * @get_status: read the status byte
@@ -48,17 +48,17 @@ struct irq_affinity;
   * @del_vqs: free virtqueues found by find_vqs().
   * @get_features: get the array of feature bits for this device.
   *vdev: the virtio_device
- * Returns the first 32 feature bits (all we currently need).
+ * Returns the first 64 feature bits (all we currently need).
   * @finalize_features: confirm what device features we'll be using.
   *vdev: the virtio_device
   *This gives the final feature bits for the device: it can change
   *the dev->feature bits if it wants.
   *Returns 0 on success or error status
- * @bus_name: return the bus name associated with the device
+ * @bus_name: return the bus name associated with the device (optional)
   *vdev: the virtio_device
   *  This returns a pointer to the bus name a la pci_name from which
   *  the caller can then copy.
- * @set_vq_affinity: set the affinity for a virtqueue.
+ * @set_vq_affinity: set the affinity for a virtqueue (optional).
   * @get_vq_affinity: get the affinity for a virtqueue (optional).
   */
  typedef void vq_callback_t(struct virtqueue *);

Ping. Any feedback on that patch?


Reviewed-by: Wei Wang 

Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-09 Thread Wei Wang

On 01/08/2019 04:46 PM, Christian Borntraeger wrote:


On 08.01.2019 06:35, Wei Wang wrote:

On 01/07/2019 09:49 PM, Christian Borntraeger wrote:

On 07.01.2019 08:01, Wei Wang wrote:

virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

The cmd_id_received is also renamed to cmd_id_received_cache, and
the value should be obtained via virtio_balloon_cmd_id_received.

Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 

Together with
virtio_pci: use queue idx instead of array idx to set up the vq
virtio: don't allocate vqs when names[i] = NULL

Tested-by: Christian Borntraeger 

OK. I don't plan to send a new version of the above patches as no changes 
needed so far.

Michael, if the above two patches look good to you, please help add the related 
tested-by
and reviewed-by tags. Thanks.

Can we also make sure that

virtio_pci: use queue idx instead of array idx to set up the vq
virtio: don't allocate vqs when names[i] = NULL

also land in stable?



You could also send the request to stable after it gets merged to Linus' 
tree.

The stable review committee will decide whether to take it.

Please see Option 2:

https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Wei Wang

On 01/07/2019 09:49 PM, Christian Borntraeger wrote:


On 07.01.2019 08:01, Wei Wang wrote:

virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

The cmd_id_received is also renamed to cmd_id_received_cache, and
the value should be obtained via virtio_balloon_cmd_id_received.

Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 

Together with
   virtio_pci: use queue idx instead of array idx to set up the vq
   virtio: don't allocate vqs when names[i] = NULL

Tested-by: Christian Borntraeger 


OK. I don't plan to send a new version of the above patches as no 
changes needed so far.


Michael, if the above two patches look good to you, please help add the 
related tested-by

and reviewed-by tags. Thanks.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v4 2/3] virtio-balloon: improve update_balloon_size_func

2019-01-07 Thread Wei Wang
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.

Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
Tested-by: Christian Borntraeger 
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 45d32f5..d48c12c 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -457,9 +457,12 @@ static void update_balloon_size_func(struct work_struct 
*work)
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (!diff)
+   return;
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
-   else if (diff < 0)
+   else
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
 
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v4 1/3] virtio-balloon: tweak config_changed implementation

2019-01-07 Thread Wei Wang
virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

The cmd_id_received is also renamed to cmd_id_received_cache, and
the value should be obtained via virtio_balloon_cmd_id_received.

Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
Tested-by: Christian Borntraeger 
---
 drivers/virtio/virtio_balloon.c | 98 +++--
 1 file changed, 65 insertions(+), 33 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1..45d32f5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -61,6 +61,10 @@ enum virtio_balloon_vq {
VIRTIO_BALLOON_VQ_MAX
 };
 
+enum virtio_balloon_config_read {
+   VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
@@ -77,14 +81,20 @@ struct virtio_balloon {
/* Prevent updating balloon when it is being canceled. */
spinlock_t stop_update_lock;
bool stop_update;
+   /* Bitmap to indicate if reading the related config fields are needed */
+   unsigned long config_read_bitmap;
 
/* The list of allocated free pages, waiting to be given back to mm */
struct list_head free_page_list;
spinlock_t free_page_list_lock;
/* The number of free page blocks on the above list */
unsigned long num_free_page_blocks;
-   /* The cmd id received from host */
-   u32 cmd_id_received;
+   /*
+* The cmd id received from host.
+* Read it via virtio_balloon_cmd_id_received to get the latest value
+* sent from host.
+*/
+   u32 cmd_id_received_cache;
/* The cmd id that is actively in use */
__virtio32 cmd_id_active;
/* Buffer to store the stop sign */
@@ -390,37 +400,31 @@ static unsigned long return_free_pages_to_mm(struct 
virtio_balloon *vb,
return num_returned;
 }
 
+static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
+{
+   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+   return;
+
+   /* No need to queue the work if the bit was already set. */
+   if (test_and_set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
+>config_read_bitmap))
+   return;
+
+   queue_work(vb->balloon_wq, >report_free_page_work);
+}
+
 static void virtballoon_changed(struct virtio_device *vdev)
 {
struct virtio_balloon *vb = vdev->priv;
unsigned long flags;
-   s64 diff = towards_target(vb);
-
-   if (diff) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq,
-  >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
 
-   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-   virtio_cread(vdev, struct virtio_balloon_config,
-free_page_report_cmd_id, >cmd_id_received);
-   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
-   /* Pass ULONG_MAX to give back all the free pages */
-   return_free_pages_to_mm(vb, ULONG_MAX);
-   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
-  vb->cmd_id_received !=
-  virtio32_to_cpu(vdev, vb->cmd_id_active)) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update) {
-   queue_work(vb->balloon_wq,
-  >report_free_page_work);
-   }
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update) {
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   virtio_balloon_queue_free_page_work(vb);
}
+   spin_unlock_irqrestore(>stop_update_lock, flags);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
@@ -527,6 +531,17 @@ static int init_vqs(struct virtio_balloon *vb)
return 0;
 }
 
+static u32 virtio_balloon_cmd_id_received(struct 

[virtio-dev] [PATCH v4 3/3] virtio_balloon: remove the unnecessary 0-initialization

2019-01-07 Thread Wei Wang
We've changed to kzalloc the vb struct, so no need to 0-initialize
this field one more time.

Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
---
 drivers/virtio/virtio_balloon.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d48c12c..048959a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -925,7 +925,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
  VIRTIO_BALLOON_CMD_ID_STOP);
vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
  VIRTIO_BALLOON_CMD_ID_STOP);
-   vb->num_free_page_blocks = 0;
spin_lock_init(>free_page_list_lock);
INIT_LIST_HEAD(>free_page_list);
if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v4 0/3] virtio-balloon: tweak config_changed

2019-01-07 Thread Wei Wang
Since virtio-ccw doesn't work with accessing to the config space
inside an interrupt context, this patch series avoids that issue by
moving the config register accesses to the related workqueue contexts.

v3->v4 ChangeLog:
- change virtio32_to_cpu to cpu_to_virtio_32 in send_cmd_id_start;
v2->v3 ChangeLog:
- rename cmd_id_received to cmd_id_received_cache, and have call sites
  read the latest value via virtio_balloon_cmd_id_received. (Still
  kept Cornelia and Halil's reviewed-by as it's a minor change)
- remove zeroing vb->num_free_page_blocks in probe since vb is
  allocated via kzalloc.
v1->v2 ChangeLog:
- add config_read_bitmap to indicate to the workqueue callbacks about
  the necessity of reading the related config fields.

Wei Wang (3):
  virtio-balloon: tweak config_changed implementation
  virtio-balloon: improve update_balloon_size_func
  virtio_balloon: remove the unnecessary 0-initialization

 drivers/virtio/virtio_balloon.c | 104 ++--
 1 file changed, 69 insertions(+), 35 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v3 3/3] virtio_balloon: remove the unnecessary 0-initialization

2019-01-06 Thread Wei Wang
We've changed to kzalloc the vb struct, so no need to 0-initialize
this field one more time.

Signed-off-by: Wei Wang 
---
 drivers/virtio/virtio_balloon.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index e33dc8e..f19061b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -925,7 +925,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
  VIRTIO_BALLOON_CMD_ID_STOP);
vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
  VIRTIO_BALLOON_CMD_ID_STOP);
-   vb->num_free_page_blocks = 0;
spin_lock_init(>free_page_list_lock);
INIT_LIST_HEAD(>free_page_list);
if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-06 Thread Wei Wang
virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

The cmd_id_received is also renamed to cmd_id_received_cache, and
the value should be obtained via virtio_balloon_cmd_id_received.

Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
---
 drivers/virtio/virtio_balloon.c | 98 +++--
 1 file changed, 65 insertions(+), 33 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1..fb12fe2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -61,6 +61,10 @@ enum virtio_balloon_vq {
VIRTIO_BALLOON_VQ_MAX
 };
 
+enum virtio_balloon_config_read {
+   VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
@@ -77,14 +81,20 @@ struct virtio_balloon {
/* Prevent updating balloon when it is being canceled. */
spinlock_t stop_update_lock;
bool stop_update;
+   /* Bitmap to indicate if reading the related config fields are needed */
+   unsigned long config_read_bitmap;
 
/* The list of allocated free pages, waiting to be given back to mm */
struct list_head free_page_list;
spinlock_t free_page_list_lock;
/* The number of free page blocks on the above list */
unsigned long num_free_page_blocks;
-   /* The cmd id received from host */
-   u32 cmd_id_received;
+   /*
+* The cmd id received from host.
+* Read it via virtio_balloon_cmd_id_received to get the latest value
+* sent from host.
+*/
+   u32 cmd_id_received_cache;
/* The cmd id that is actively in use */
__virtio32 cmd_id_active;
/* Buffer to store the stop sign */
@@ -390,37 +400,31 @@ static unsigned long return_free_pages_to_mm(struct 
virtio_balloon *vb,
return num_returned;
 }
 
+static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
+{
+   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+   return;
+
+   /* No need to queue the work if the bit was already set. */
+   if (test_and_set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
+>config_read_bitmap))
+   return;
+
+   queue_work(vb->balloon_wq, >report_free_page_work);
+}
+
 static void virtballoon_changed(struct virtio_device *vdev)
 {
struct virtio_balloon *vb = vdev->priv;
unsigned long flags;
-   s64 diff = towards_target(vb);
-
-   if (diff) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq,
-  >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
 
-   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-   virtio_cread(vdev, struct virtio_balloon_config,
-free_page_report_cmd_id, >cmd_id_received);
-   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
-   /* Pass ULONG_MAX to give back all the free pages */
-   return_free_pages_to_mm(vb, ULONG_MAX);
-   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
-  vb->cmd_id_received !=
-  virtio32_to_cpu(vdev, vb->cmd_id_active)) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update) {
-   queue_work(vb->balloon_wq,
-  >report_free_page_work);
-   }
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update) {
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   virtio_balloon_queue_free_page_work(vb);
}
+   spin_unlock_irqrestore(>stop_update_lock, flags);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
@@ -527,6 +531,17 @@ static int init_vqs(struct virtio_balloon *vb)
return 0;
 }
 
+static u32 virtio_balloon_cmd_id_received(struct virtio_balloon *vb)
+{
+   if (test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
+  >config_r

[virtio-dev] [PATCH v3 2/3] virtio-balloon: improve update_balloon_size_func

2019-01-06 Thread Wei Wang
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.

Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index fb12fe2..e33dc8e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -457,9 +457,12 @@ static void update_balloon_size_func(struct work_struct 
*work)
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (!diff)
+   return;
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
-   else if (diff < 0)
+   else
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
 
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v2 2/2] virtio-balloon: improve update_balloon_size_func

2019-01-03 Thread Wei Wang
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.

Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 35ee762..af1e6d9 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -453,9 +453,12 @@ static void update_balloon_size_func(struct work_struct 
*work)
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (!diff)
+   return;
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
-   else if (diff < 0)
+   else
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
 
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v2 1/2] virtio-balloon: tweak config_changed implementation

2019-01-03 Thread Wei Wang
virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
---
 drivers/virtio/virtio_balloon.c | 81 +++--
 1 file changed, 53 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1..35ee762 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -61,6 +61,10 @@ enum virtio_balloon_vq {
VIRTIO_BALLOON_VQ_MAX
 };
 
+enum virtio_balloon_config_read {
+   VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
@@ -77,6 +81,8 @@ struct virtio_balloon {
/* Prevent updating balloon when it is being canceled. */
spinlock_t stop_update_lock;
bool stop_update;
+   /* Bitmap to indicate if reading the related config fields are needed */
+   unsigned long config_read_bitmap;
 
/* The list of allocated free pages, waiting to be given back to mm */
struct list_head free_page_list;
@@ -390,37 +396,31 @@ static unsigned long return_free_pages_to_mm(struct 
virtio_balloon *vb,
return num_returned;
 }
 
+static void virtio_balloon_queue_free_page_work(struct virtio_balloon *vb)
+{
+   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+   return;
+
+   /* No need to queue the work if the bit was already set. */
+   if (test_and_set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
+>config_read_bitmap))
+   return;
+
+   queue_work(vb->balloon_wq, >report_free_page_work);
+}
+
 static void virtballoon_changed(struct virtio_device *vdev)
 {
struct virtio_balloon *vb = vdev->priv;
unsigned long flags;
-   s64 diff = towards_target(vb);
-
-   if (diff) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq,
-  >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
 
-   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-   virtio_cread(vdev, struct virtio_balloon_config,
-free_page_report_cmd_id, >cmd_id_received);
-   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
-   /* Pass ULONG_MAX to give back all the free pages */
-   return_free_pages_to_mm(vb, ULONG_MAX);
-   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
-  vb->cmd_id_received !=
-  virtio32_to_cpu(vdev, vb->cmd_id_active)) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update) {
-   queue_work(vb->balloon_wq,
-  >report_free_page_work);
-   }
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update) {
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   virtio_balloon_queue_free_page_work(vb);
}
+   spin_unlock_irqrestore(>stop_update_lock, flags);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
@@ -609,6 +609,16 @@ static int get_free_page_and_send(struct virtio_balloon 
*vb)
return 0;
 }
 
+static void virtio_balloon_read_cmd_id_received(struct virtio_balloon *vb)
+{
+   if (!test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
+   >config_read_bitmap))
+   return;
+
+   virtio_cread(vb->vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+}
+
 static int send_free_pages(struct virtio_balloon *vb)
 {
int err;
@@ -620,6 +630,7 @@ static int send_free_pages(struct virtio_balloon *vb)
 * stop the reporting.
 */
cmd_id_active = virtio32_to_cpu(vb->vdev, vb->cmd_id_active);
+   virtio_balloon_read_cmd_id_received(vb);
if (cmd_id_active != vb->cmd_id_received)
break;
 
@@ -637,11 +648,9 @@ static int send_free_pages(struct virtio_balloon *vb)
return 0;
 }
 
-static void repo

[virtio-dev] [PATCH v2 0/2] virtio-balloon: tweak config_changed

2019-01-03 Thread Wei Wang
Since virtio-ccw doesn't work with accessing to the config space
inside an interrupt context, this patch series avoids that issue by
moving the config register accesses to the related workqueue contexts.

v1->v2 ChangeLog:
- add config_read_bitmap to indicate to the workqueue callbacks about
  the necessity of reading the related config fields.

Wei Wang (2):
  virtio-balloon: tweak config_changed implementation
  virtio-balloon: improve update_balloon_size_func

 drivers/virtio/virtio_balloon.c | 86 +++--
 1 file changed, 57 insertions(+), 29 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v1 1/2] virtio-balloon: tweak config_changed implementation

2019-01-03 Thread Wei Wang

On 01/04/2019 12:42 AM, Michael S. Tsirkin wrote:

On Thu, Jan 03, 2019 at 01:31:01PM +0800, Wei Wang wrote:

virtio-ccw has deadlock issues with reading config registers inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.

Signed-off-by: Wei Wang 
---
  drivers/virtio/virtio_balloon.c | 54 -
  1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1..9a82a11 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -394,33 +394,15 @@ static void virtballoon_changed(struct virtio_device 
*vdev)
  {
struct virtio_balloon *vb = vdev->priv;
unsigned long flags;
-   s64 diff = towards_target(vb);
-
-   if (diff) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq,
-  >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
  
-	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {

-   virtio_cread(vdev, struct virtio_balloon_config,
-free_page_report_cmd_id, >cmd_id_received);

There's one problem with this approach:

previously updating the cmd_id_received here would immediately
stop the report in send_free_pages.

With this approach we are waiting for the wq to schedule,
which might be blocked waiting for report to complete.
So host can no longer quickly stop the report in progress.

A simple work-around would be to set some kind of flag whenever there
is a change interrupt, then have send_free_pages test it
and re-read cmd_id_received.

Needs to be an atomic I guess ...



Yes, sounds better..will update the patch.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v1 1/2] virtio-balloon: tweak config_changed implementation

2019-01-03 Thread Wei Wang

On 01/03/2019 05:40 PM, Cornelia Huck wrote:

On Thu,  3 Jan 2019 13:31:01 +0800
Wei Wang  wrote:


virtio-ccw has deadlock issues with reading config registers inside the

s/config registers/the config space/ ?


Sounds good.




interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.

Also credit Christian with a Reported-by:?


Yes, definitely. Sorry for missing that.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v1 0/2] Virtio: fix some vq allocation issues

2019-01-02 Thread Wei Wang

On 12/28/2018 03:57 PM, Christian Borntraeger wrote:


On 28.12.2018 03:26, Wei Wang wrote:

Some vqs don't need to be allocated when the related feature bits are
disabled. Callers notice the vq allocation layer by setting the related
names[i] to be NULL.

This patch series fixes the find_vqs implementations to handle this case.

So the random crashes during boot are gone.
What still does not work is actually using the balloon.

So in the qemu monitor using lets say "balloon 1000"  will hang the guest.
Seems to be a deadlock in the virtio-ccw code.  We seem to call the
config code in the interrupt handler.


Please try with applying both this series and the "virtio-balloon: tweak 
config_changed"

series that I just sent.
This series fixes the ccw booting issue and that series tries to fix the 
ccw deadlock issue.


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v1 1/2] virtio-balloon: tweak config_changed implementation

2019-01-02 Thread Wei Wang
virtio-ccw has deadlock issues with reading config registers inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.

Signed-off-by: Wei Wang 
---
 drivers/virtio/virtio_balloon.c | 54 -
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1..9a82a11 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -394,33 +394,15 @@ static void virtballoon_changed(struct virtio_device 
*vdev)
 {
struct virtio_balloon *vb = vdev->priv;
unsigned long flags;
-   s64 diff = towards_target(vb);
-
-   if (diff) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq,
-  >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
 
-   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-   virtio_cread(vdev, struct virtio_balloon_config,
-free_page_report_cmd_id, >cmd_id_received);
-   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
-   /* Pass ULONG_MAX to give back all the free pages */
-   return_free_pages_to_mm(vb, ULONG_MAX);
-   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
-  vb->cmd_id_received !=
-  virtio32_to_cpu(vdev, vb->cmd_id_active)) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update) {
-   queue_work(vb->balloon_wq,
-  >report_free_page_work);
-   }
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update) {
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+   queue_work(vb->balloon_wq, >report_free_page_work);
}
+   spin_unlock_irqrestore(>stop_update_lock, flags);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
@@ -637,11 +619,9 @@ static int send_free_pages(struct virtio_balloon *vb)
return 0;
 }
 
-static void report_free_page_func(struct work_struct *work)
+static void virtio_balloon_report_free_page(struct virtio_balloon *vb)
 {
int err;
-   struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
-report_free_page_work);
struct device *dev = >vdev->dev;
 
/* Start by sending the received cmd id to host with an outbuf. */
@@ -659,6 +639,24 @@ static void report_free_page_func(struct work_struct *work)
dev_err(dev, "Failed to send a stop id, err = %d\n", err);
 }
 
+static void report_free_page_func(struct work_struct *work)
+{
+   struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
+report_free_page_work);
+
+   virtio_cread(vb->vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+
+   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
+   /* Pass ULONG_MAX to give back all the free pages */
+   return_free_pages_to_mm(vb, ULONG_MAX);
+   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
+  vb->cmd_id_received !=
+  virtio32_to_cpu(vb->vdev, vb->cmd_id_active)) {
+   virtio_balloon_report_free_page(vb);
+   }
+}
+
 #ifdef CONFIG_BALLOON_COMPACTION
 /*
  * virtballoon_migratepage - perform the balloon page migration on behalf of
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v1 0/2] virtio-balloon: tweak config_changed

2019-01-02 Thread Wei Wang
Since virtio-ccw doesn't work with accessing to the config registers
inside an interrupt context, this patch series avoids that issue by
moving the config register accesses to the related workqueue contexts.

Wei Wang (2):
  virtio-balloon: tweak config_changed implementation
  virtio-balloon: improve update_balloon_size_func

 drivers/virtio/virtio_balloon.c | 59 +
 1 file changed, 30 insertions(+), 29 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v1 2/2] virtio-balloon: improve update_balloon_size_func

2019-01-02 Thread Wei Wang
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.

Signed-off-by: Wei Wang 
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9a82a11..4884b85 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -435,9 +435,12 @@ static void update_balloon_size_func(struct work_struct 
*work)
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (!diff)
+   return;
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
-   else if (diff < 0)
+   else
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
 
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v37 0/3] Virtio-balloon: support free page reporting

2018-12-27 Thread Wei Wang

On 12/27/2018 08:17 PM, Christian Borntraeger wrote:


On 27.12.2018 12:59, Christian Borntraeger wrote:

On 27.12.2018 12:31, Christian Borntraeger wrote:

This patch triggers random crashes in the guest kernel on s390 early during 
boot.
No migration and no setting of the balloon is involved.


Adding Conny and Halil,

As the QEMU provides no PAGE_HINT feature yet, this quick hack makes the
guest boot fine again:


diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1eea305..aa2e1864c5736 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -492,7 +492,7 @@ static int init_vqs(struct virtio_balloon *vb)
 callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
 }
  
-   err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,

+   err = vb->vdev->config->find_vqs(vb->vdev, 3, //VIRTIO_BALLOON_VQ_MAX,
  vqs, callbacks, names, NULL, NULL);
 if (err)
 return err;


To me it looks like that virtio_ccw_find_vqs will abort if any of the virtqueues
that it is been asked for does not exist (including the earlier ones).


This "hack" makes the random crashes go away, but the balloon interface itself
does not work. (setting the value to anything will hang the guest).
As patch 1 also modifies the main path, there seem to be additional issues, 
maybe
endianess

Looking at things like

+   vb->cmd_id_received = VIRTIO_BALLOON_CMD_ID_STOP;
+   vb->cmd_id_active = cpu_to_virtio32(vb->vdev,
+ VIRTIO_BALLOON_CMD_ID_STOP);
+   vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
+ VIRTIO_BALLOON_CMD_ID_STOP);


Why is cmd_id_received not using cpu_to_virtio32?



That conversion is only needed when we need to send the value to the device.
cmd_id_received doesn't need to be sent to the device.

Best,
Wei



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v1 0/2] Virtio: fix some vq allocation issues

2018-12-27 Thread Wei Wang
Some vqs don't need to be allocated when the related feature bits are
disabled. Callers notice the vq allocation layer by setting the related
names[i] to be NULL.

This patch series fixes the find_vqs implementations to handle this case.

Wei Wang (2):
  virtio_pci: use queue idx instead of array idx to set up the vq
  virtio: don't allocate vqs when names[i] = NULL

 drivers/misc/mic/vop/vop_main.c|  9 +++--
 drivers/remoteproc/remoteproc_virtio.c |  9 +++--
 drivers/s390/virtio/virtio_ccw.c   | 12 +---
 drivers/virtio/virtio_mmio.c   |  9 +++--
 drivers/virtio/virtio_pci_common.c |  8 
 5 files changed, 34 insertions(+), 13 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v11 1/7] bitmap: fix bitmap_count_one

2018-12-13 Thread Wei Wang

On 12/13/2018 10:28 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

BITMAP_LAST_WORD_MASK(nbits) returns 0x when "nbits=0", which
makes bitmap_count_one fail to handle the "nbits=0" case. It appears to be
preferred to remain BITMAP_LAST_WORD_MASK identical to the kernel
implementation that it is ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.

OK; it's a little odd that it's only bitmap_count_one that's being fixed
for this case; but OK.


Reviewed-by: Dr. David Alan Gilbert 

Thanks.

We could also help fix other callers outside this series.
(this one is put here as it helps this optimization feature avoid that 
issue).


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v11 4/7] migration: API to clear bits of guest free pages from the dirty bitmap

2018-12-11 Thread Wei Wang
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Spilt the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
Reviewed-by: Peter Xu 
---
 include/migration/misc.h |  2 ++
 migration/ram.c  | 47 +++
 2 files changed, 49 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 01d267f..c13b44f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3131,6 +3131,53 @@ static void ram_state_resume_prepare(RAMState *rs, 
QEMUFile *out)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len, addr += used_len) {
+block = qemu_ram_block_from_host(addr, false, );
+if (unlikely(!block || offset >= block->used_length)) {
+/*
+ * The implementation might not support RAMBlock resize during
+ * live migration, but it could happen in theory with future
+ * updates. So we add a check here to capture that case.
+ */
+error_report_once("%s unexpected error", __func__);
+return;
+}
+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+}
+
+start = offset >> TARGET_PAGE_BITS;
+npages = used_len >> TARGET_PAGE_BITS;
+
+qemu_mutex_lock(_state->bitmap_mutex);
+ram_state->migration_dirty_pages -=
+  bitmap_count_one_with_offset(block->bmap, start, npages);
+bitmap_clear(block->bmap, start, npages);
+qemu_mutex_unlock(_state->bitmap_mutex);
+}
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-11 Thread Wei Wang
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when there is an error occurred in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/standard-headers/linux/virtio_balloon.h |   5 +
 3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1728e4f..543bbd4 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/misc.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -308,6 +309,184 @@ out:
 }
 }
 
+static void virtio_balloon_handle_free_page_vq(VirtIODevice *vdev,
+   VirtQueue *vq)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+qemu_bh_schedule(s->free_page_bh);
+}
+
+static bool get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->block_iothread) {
+qemu_cond_wait(>free_page_cond, >free_page_lock);
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return false;
+}
+
+if (elem->out_num) {
+uint32_t id;
+size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
+ , sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);
+
+virtio_tswap32s(vdev, );
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return false;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+}
+}
+}
+
+if (elem->in_num) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+}
+
+return true;
+}
+
+static void virtio_ballloon_get_free_page_hints(void *opaque)
+{
+VirtIOBalloon *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+bool continue_to_get_hints;
+
+do {
+qemu_mutex_lock(>free_page_lock);
+virtio_queue_set_notification(vq, 0);
+continue_to_get_hints = get_free_page_hints(dev);
+qemu_mutex_unlock(>free_page_lock);
+virtio_notify(vdev, vq);
+  /*
+   * Start to poll the vq once the reporting started. Otherwise, continue
+   * only when there are entries on the vq, which need to be given back.
+   */
+} while (continue_to_get_hints ||
+ dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+virtio_queue_set_notification(vq, 1);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(VirtIOBalloon *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+/* For the stop and copy phase, we don't need to start the optimization */
+if (!vdev->vm_running) {
+return;
+}
+
+if (s->free_page_report_cmd_id == UINT_MAX) {
+s->free_page_report_cmd_id

[virtio-dev] [PATCH v11 5/7] migration/ram.c: add a notifier chain for precopy

2018-12-11 Thread Wei Wang
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
Reviewed-by: Peter Xu 
---
 include/migration/misc.h | 19 ++
 migration/ram.c  | 51 +---
 migration/savevm.c   | 15 ++
 vl.c |  1 +
 4 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..15f8d00 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,6 +19,25 @@
 
 /* migration/ram.c */
 
+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_SETUP = 0,
+PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
+PRECOPY_NOTIFY_AFTER_BITMAP_SYNC = 2,
+PRECOPY_NOTIFY_COMPLETE = 3,
+PRECOPY_NOTIFY_CLEANUP = 4,
+PRECOPY_NOTIFY_MAX = 5,
+} PrecopyNotifyReason;
+
+typedef struct PrecopyNotifyData {
+enum PrecopyNotifyReason reason;
+Error **errp;
+} PrecopyNotifyData;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(NotifierWithReturn *n);
+void precopy_remove_notifier(NotifierWithReturn *n);
+int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
 
diff --git a/migration/ram.c b/migration/ram.c
index c13b44f..1e00d92 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -328,6 +328,32 @@ typedef struct RAMState RAMState;
 
 static RAMState *ram_state;
 
+static NotifierWithReturnList precopy_notifier_list;
+
+void precopy_infrastructure_init(void)
+{
+notifier_with_return_list_init(_notifier_list);
+}
+
+void precopy_add_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_list_add(_notifier_list, n);
+}
+
+void precopy_remove_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_remove(n);
+}
+
+int precopy_notify(PrecopyNotifyReason reason, Error **errp)
+{
+PrecopyNotifyData pnd;
+pnd.reason = reason;
+pnd.errp = errp;
+
+return notifier_with_return_list_notify(_notifier_list, );
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1701,6 +1727,25 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
+static void migration_bitmap_sync_precopy(RAMState *rs)
+{
+Error *local_err = NULL;
+
+/*
+ * The current notifier usage is just an optimization to migration, so we
+ * don't stop the normal migration process in the error case.
+ */
+if (precopy_notify(PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC, _err)) {
+error_report_err(local_err);
+}
+
+migration_bitmap_sync(rs);
+
+if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, _err)) {
+error_report_err(local_err);
+}
+}
+
 /**
  * save_zero_page_to_file: send the zero page to the file
  *
@@ -3072,7 +3117,7 @@ static void ram_init_bitmaps(RAMState *rs)
 
 ram_list_init_bitmaps();
 memory_global_dirty_log_start();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 
 rcu_read_unlock();
 qemu_mutex_unlock_ramlist();
@@ -3348,7 +3393,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 rcu_read_lock();
 
 if (!migration_in_postcopy()) {
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -3397,7 +3442,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 remaining_size < max_size) {
 qemu_mutex_lock_iothread();
 rcu_read_lock();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 rcu_read_unlock();
 qemu_mutex_unlock_iothread();
 remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e45fb4..ec74f44 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1018,6 +1018,7 @@ void qemu_savevm_state_header(QEMUFile *f)
 void qemu_savevm_state_setup(QEMUFile *f)
 {
 SaveStateEntry *se;
+Error *local_err = NULL;
 int ret;
 
 trace_savevm_state_setup();
@@ -1039,6 +1040,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
 break;
 }
 }
+
+if (precopy_notify(PRECOPY_NOTIFY_SETUP, _err)) {
+error_report_err(local_err);
+}
 }
 
 int qemu_savevm_state_resume_prepare(MigrationState *s)
@@ -1181,6 +1186,11 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool 
iterable_only,
 SaveStateEntry *se;
 int ret;
 bool in_postcopy = migration_in_postcopy();
+Error *local_err = NULL;
+
+if (precopy_notify(PRECOPY_NOTIFY_COMPLETE, _err)) {
+error_report_err(local_err);
+}
 
 trace_savevm_state_complete_precopy();
 
@@ -1313,6 +1323,1

[virtio-dev] [PATCH v11 2/7] bitmap: bitmap_count_one_with_offset

2018-12-11 Thread Wei Wang
Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/qemu/bitmap.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 679f1bd..5c31334 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -232,6 +232,19 @@ static inline long bitmap_count_one(const unsigned long 
*bitmap, long nbits)
 }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+long offset, long nbits)
+{
+long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+long redundant_bits = offset - aligned_offset;
+long bits_to_count = nbits + redundant_bits;
+const unsigned long *bitmap_start = bitmap +
+aligned_offset / BITS_PER_LONG;
+
+return bitmap_count_one(bitmap_start, bits_to_count) -
+   bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v11 1/7] bitmap: fix bitmap_count_one

2018-12-11 Thread Wei Wang
BITMAP_LAST_WORD_MASK(nbits) returns 0x when "nbits=0", which
makes bitmap_count_one fail to handle the "nbits=0" case. It appears to be
preferred to remain BITMAP_LAST_WORD_MASK identical to the kernel
implementation that it is ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.

Inital Discussion Link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg554316.html
Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..679f1bd 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -221,6 +221,10 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 
 static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
 {
+if (unlikely(!nbits)) {
+return 0;
+}
+
 if (small_nbits(nbits)) {
 return ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
 } else {
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v11 0/7] virtio-balloon: free page hint support

2018-12-11 Thread Wei Wang
This is the deivce part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
  (i.e. make bzImage -j4) running
  1) Live Migration Time:
 Optimization v.s. Legacy = 2273ms v.s. 4502ms
 --> ~50% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min42s v.s. 8min43s
 --> no obvious difference

2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
  1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
  2) Linux Compilation Time:
Optimization v.s. Legacy = 8min30s v.s. 8min36s
--> no obvious difference

ChangeLog:
v10->v11:
migration:
- qemu_guest_free_page_hint:
- "offset >= block->used_length", instead of
  "offset > block->used_length";
- RAMState: enable the "fpo_enabled" flag, when the free page
  optimization feature is used, instead of disabling ram_bulk_stage.
  Please see patch 6 commit log for details.

Previous changelog:
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00055.html

Wei Wang (7):
  bitmap: fix bitmap_count_one
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  migration/ram.c: add a notifier chain for precopy
  migration/ram.c: add the free page optimization enable flag
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/migration/misc.h|  22 ++
 include/qemu/bitmap.h   |  17 ++
 include/standard-headers/linux/virtio_balloon.h |   5 +
 migration/ram.c | 121 ++-
 migration/savevm.c  |  15 ++
 vl.c|   1 +
 8 files changed, 466 insertions(+), 6 deletions(-)

-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v10 6/7] migration/ram.c: add a function to disable the bulk stage

2018-12-03 Thread Wei Wang

On 12/03/2018 04:20 PM, Wei Wang wrote:

On 12/03/2018 01:31 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:30AM +0800, Wei Wang wrote:
This patch adds a function to enable a precopy notifier callback 
outside
the migration subsystem to disable the bulk stage flag. This is 
needed by

the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  include/migration/misc.h | 1 +
  migration/ram.c  | 9 +
  2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 15f8d00..47e7ff5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -37,6 +37,7 @@ void precopy_infrastructure_init(void);
  void precopy_add_notifier(NotifierWithReturn *n);
  void precopy_remove_notifier(NotifierWithReturn *n);
  int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+void precopy_disable_bulk_stage(void);
void ram_mig_init(void);
  void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index b90a3f2..739dc97 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -354,6 +354,15 @@ int precopy_notify(PrecopyNotifyReason reason, 
Error **errp)
  return 
notifier_with_return_list_notify(_notifier_list, );

  }
  +void precopy_disable_bulk_stage(void)
+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+

This one is a bit tricky.  E.g., I think it'll at least affect XBZRLE
and compression somehow.  Since we will have the on-off switch for
balloon free page hinting so the user can at least decide what
features to use, with that I don't see much issue with it so far. But
I'd also like to see how other people see this change.



Yes. I think it would be better that this optimization could co-exist 
with the

compression feature.

How about adding a new flag "ram_state->free_page_optimization_enabled"?

We will then need a change in migration_bitmap_find_dirty:
 - if (rs->ram_bulk_stage && start > 0) {
+ if (!rs->free_page_optimization_enabled && rs->ram_bulk_stage && 
start > 0) {

next = start + 1;
} else {
next = find_next_bit(bitmap, size, start);
}



Btw, with the new flag, this patch is not needed, so "ram_bulk_stage" 
will not be changed.


Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v10 6/7] migration/ram.c: add a function to disable the bulk stage

2018-12-03 Thread Wei Wang

On 12/03/2018 01:31 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:30AM +0800, Wei Wang wrote:

This patch adds a function to enable a precopy notifier callback outside
the migration subsystem to disable the bulk stage flag. This is needed by
the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  include/migration/misc.h | 1 +
  migration/ram.c  | 9 +
  2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 15f8d00..47e7ff5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -37,6 +37,7 @@ void precopy_infrastructure_init(void);
  void precopy_add_notifier(NotifierWithReturn *n);
  void precopy_remove_notifier(NotifierWithReturn *n);
  int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+void precopy_disable_bulk_stage(void);
  
  void ram_mig_init(void);

  void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index b90a3f2..739dc97 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -354,6 +354,15 @@ int precopy_notify(PrecopyNotifyReason reason, Error 
**errp)
  return notifier_with_return_list_notify(_notifier_list, );
  }
  
+void precopy_disable_bulk_stage(void)

+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+

This one is a bit tricky.  E.g., I think it'll at least affect XBZRLE
and compression somehow.  Since we will have the on-off switch for
balloon free page hinting so the user can at least decide what
features to use, with that I don't see much issue with it so far. But
I'd also like to see how other people see this change.



Yes. I think it would be better that this optimization could co-exist 
with the

compression feature.

How about adding a new flag "ram_state->free_page_optimization_enabled"?

We will then need a change in migration_bitmap_find_dirty:
 - if (rs->ram_bulk_stage && start > 0) {
+ if (!rs->free_page_optimization_enabled && rs->ram_bulk_stage && start 
> 0) {

next = start + 1;
} else {
next = find_next_bit(bitmap, size, start);
}

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v10 5/7] migration/ram.c: add a notifier chain for precopy

2018-12-03 Thread Wei Wang

On 12/03/2018 01:20 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:29AM +0800, Wei Wang wrote:

This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---

[...]


+int precopy_notify(PrecopyNotifyReason reason, Error **errp);

This function could be hidden from include/migration/misc.h, but I
don't know whether that is a reason for a repost.

And what I meant was that we fail precopy if notifier failed the
hooks, but warning is fine too anyway no real use case there.


Yes, I was also thinking about this. Failing precopy needs to change
some upper layer functions which are defined with "void".

There is no use case that needs to fail precopy currently, so I chose to
keep the change minimal. But we can make more changes if people think
it's necessary to do.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v10 4/7] migration: API to clear bits of guest free pages from the dirty bitmap

2018-12-03 Thread Wei Wang

On 12/03/2018 01:10 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:28AM +0800, Wei Wang wrote:

This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Spilt the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---

[...]


+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len, addr += used_len) {
+block = qemu_ram_block_from_host(addr, false, );
+if (unlikely(!block || offset > block->used_length)) {

Maybe >=?  My fault if it is...


It would not cause problems (npages will be 0 below),
but I agree it sounds better to have ">=" earlier here. Thanks!
No problem, I can post another new version.


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v10 5/7] migration/ram.c: add a notifier chain for precopy

2018-12-02 Thread Wei Wang
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 19 ++
 migration/ram.c  | 51 +---
 migration/savevm.c   | 15 ++
 vl.c |  1 +
 4 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..15f8d00 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,6 +19,25 @@
 
 /* migration/ram.c */
 
+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_SETUP = 0,
+PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
+PRECOPY_NOTIFY_AFTER_BITMAP_SYNC = 2,
+PRECOPY_NOTIFY_COMPLETE = 3,
+PRECOPY_NOTIFY_CLEANUP = 4,
+PRECOPY_NOTIFY_MAX = 5,
+} PrecopyNotifyReason;
+
+typedef struct PrecopyNotifyData {
+enum PrecopyNotifyReason reason;
+Error **errp;
+} PrecopyNotifyData;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(NotifierWithReturn *n);
+void precopy_remove_notifier(NotifierWithReturn *n);
+int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
 
diff --git a/migration/ram.c b/migration/ram.c
index 4f7a3fe..b90a3f2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -328,6 +328,32 @@ typedef struct RAMState RAMState;
 
 static RAMState *ram_state;
 
+static NotifierWithReturnList precopy_notifier_list;
+
+void precopy_infrastructure_init(void)
+{
+notifier_with_return_list_init(_notifier_list);
+}
+
+void precopy_add_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_list_add(_notifier_list, n);
+}
+
+void precopy_remove_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_remove(n);
+}
+
+int precopy_notify(PrecopyNotifyReason reason, Error **errp)
+{
+PrecopyNotifyData pnd;
+pnd.reason = reason;
+pnd.errp = errp;
+
+return notifier_with_return_list_notify(_notifier_list, );
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1701,6 +1727,25 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
+static void migration_bitmap_sync_precopy(RAMState *rs)
+{
+Error *local_err = NULL;
+
+/*
+ * The current notifier usage is just an optimization to migration, so we
+ * don't stop the normal migration process in the error case.
+ */
+if (precopy_notify(PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC, _err)) {
+error_report_err(local_err);
+}
+
+migration_bitmap_sync(rs);
+
+if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, _err)) {
+error_report_err(local_err);
+}
+}
+
 /**
  * save_zero_page_to_file: send the zero page to the file
  *
@@ -3072,7 +3117,7 @@ static void ram_init_bitmaps(RAMState *rs)
 
 ram_list_init_bitmaps();
 memory_global_dirty_log_start();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 
 rcu_read_unlock();
 qemu_mutex_unlock_ramlist();
@@ -3348,7 +3393,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 rcu_read_lock();
 
 if (!migration_in_postcopy()) {
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -3397,7 +3442,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 remaining_size < max_size) {
 qemu_mutex_lock_iothread();
 rcu_read_lock();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 rcu_read_unlock();
 qemu_mutex_unlock_iothread();
 remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e45fb4..ec74f44 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1018,6 +1018,7 @@ void qemu_savevm_state_header(QEMUFile *f)
 void qemu_savevm_state_setup(QEMUFile *f)
 {
 SaveStateEntry *se;
+Error *local_err = NULL;
 int ret;
 
 trace_savevm_state_setup();
@@ -1039,6 +1040,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
 break;
 }
 }
+
+if (precopy_notify(PRECOPY_NOTIFY_SETUP, _err)) {
+error_report_err(local_err);
+}
 }
 
 int qemu_savevm_state_resume_prepare(MigrationState *s)
@@ -1181,6 +1186,11 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool 
iterable_only,
 SaveStateEntry *se;
 int ret;
 bool in_postcopy = migration_in_postcopy();
+Error *local_err = NULL;
+
+if (precopy_notify(PRECOPY_NOTIFY_COMPLETE, _err)) {
+error_report_err(local_err);
+}
 
 trace_savevm_state_complete_precopy();
 
@@ -1313,6 +1323,11 @@ void qemu_savevm_state_pending(QE

[virtio-dev] [PATCH v10 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-02 Thread Wei Wang
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when there is an error occurred in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/standard-headers/linux/virtio_balloon.h |   5 +
 3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1728e4f..5376f9b 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/misc.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -308,6 +309,184 @@ out:
 }
 }
 
+static void virtio_balloon_handle_free_page_vq(VirtIODevice *vdev,
+   VirtQueue *vq)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+qemu_bh_schedule(s->free_page_bh);
+}
+
+static bool get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->block_iothread) {
+qemu_cond_wait(>free_page_cond, >free_page_lock);
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return false;
+}
+
+if (elem->out_num) {
+uint32_t id;
+size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
+ , sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);
+
+virtio_tswap32s(vdev, );
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return false;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+}
+}
+}
+
+if (elem->in_num) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+}
+
+return true;
+}
+
+static void virtio_ballloon_get_free_page_hints(void *opaque)
+{
+VirtIOBalloon *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+bool continue_to_get_hints;
+
+do {
+qemu_mutex_lock(>free_page_lock);
+virtio_queue_set_notification(vq, 0);
+continue_to_get_hints = get_free_page_hints(dev);
+qemu_mutex_unlock(>free_page_lock);
+virtio_notify(vdev, vq);
+  /*
+   * Start to poll the vq once the reporting started. Otherwise, continue
+   * only when there are entries on the vq, which need to be given back.
+   */
+} while (continue_to_get_hints ||
+ dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+virtio_queue_set_notification(vq, 1);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(VirtIOBalloon *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+/* For the stop and copy phase, we don't need to start the optimization */
+if (!vdev->vm_running) {
+return;
+}
+
+if (s->free_page_report_cmd_id == UINT_MAX) {
+s->free_page_report_cmd_id

[virtio-dev] [PATCH v10 2/7] bitmap: bitmap_count_one_with_offset

2018-12-02 Thread Wei Wang
Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/qemu/bitmap.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 679f1bd..5c31334 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -232,6 +232,19 @@ static inline long bitmap_count_one(const unsigned long 
*bitmap, long nbits)
 }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+long offset, long nbits)
+{
+long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+long redundant_bits = offset - aligned_offset;
+long bits_to_count = nbits + redundant_bits;
+const unsigned long *bitmap_start = bitmap +
+aligned_offset / BITS_PER_LONG;
+
+return bitmap_count_one(bitmap_start, bits_to_count) -
+   bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v10 6/7] migration/ram.c: add a function to disable the bulk stage

2018-12-02 Thread Wei Wang
This patch adds a function to enable a precopy notifier callback outside
the migration subsystem to disable the bulk stage flag. This is needed by
the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 1 +
 migration/ram.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 15f8d00..47e7ff5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -37,6 +37,7 @@ void precopy_infrastructure_init(void);
 void precopy_add_notifier(NotifierWithReturn *n);
 void precopy_remove_notifier(NotifierWithReturn *n);
 int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+void precopy_disable_bulk_stage(void);
 
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index b90a3f2..739dc97 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -354,6 +354,15 @@ int precopy_notify(PrecopyNotifyReason reason, Error 
**errp)
 return notifier_with_return_list_notify(_notifier_list, );
 }
 
+void precopy_disable_bulk_stage(void)
+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v10 4/7] migration: API to clear bits of guest free pages from the dirty bitmap

2018-12-02 Thread Wei Wang
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Spilt the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h |  2 ++
 migration/ram.c  | 47 +++
 2 files changed, 49 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 01d267f..4f7a3fe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3131,6 +3131,53 @@ static void ram_state_resume_prepare(RAMState *rs, 
QEMUFile *out)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len, addr += used_len) {
+block = qemu_ram_block_from_host(addr, false, );
+if (unlikely(!block || offset > block->used_length)) {
+/*
+ * The implementation might not support RAMBlock resize during
+ * live migration, but it could happen in theory with future
+ * updates. So we add a check here to capture that case.
+ */
+error_report_once("%s unexpected error", __func__);
+return;
+}
+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+}
+
+start = offset >> TARGET_PAGE_BITS;
+npages = used_len >> TARGET_PAGE_BITS;
+
+qemu_mutex_lock(_state->bitmap_mutex);
+ram_state->migration_dirty_pages -=
+  bitmap_count_one_with_offset(block->bmap, start, npages);
+bitmap_clear(block->bmap, start, npages);
+qemu_mutex_unlock(_state->bitmap_mutex);
+}
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v10 1/7] bitmap: fix bitmap_count_one

2018-12-02 Thread Wei Wang
BITMAP_LAST_WORD_MASK(nbits) returns 0x when "nbits=0", which
makes bitmap_count_one fail to handle the "nbits=0" case. It appears to be
preferred to remain BITMAP_LAST_WORD_MASK identical to the kernel
implementation that it is ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.

Inital Discussion Link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg554316.html
Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..679f1bd 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -221,6 +221,10 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 
 static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
 {
+if (unlikely(!nbits)) {
+return 0;
+}
+
 if (small_nbits(nbits)) {
 return ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
 } else {
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v10 0/7] virtio-balloon: free page hint support

2018-12-02 Thread Wei Wang
This is the deivce part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
  (i.e. make bzImage -j4) running
  1) Live Migration Time:
 Optimization v.s. Legacy = 2273ms v.s. 4502ms
 --> ~50% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min42s v.s. 8min43s
 --> no obvious difference

2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
  1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min30s v.s. 8min36s
 --> no obvious difference

ChangeLog:
v9->v10:
migration:
- qemu_guest_free_page_hint:
- move "addr += used_len" to the for() loop;
- add some comments about the ram block resize case.
- precopy notifier:
- enable callbacks to report errors (as the postcopy notifier);
- rename the notifier reasons:
- PRECOPY_NOTIFY_COMPLETE notifies the callback about the
  end of precopy;
- PRECOPY_NOTIFY_CLEANUP notifies the callback in the case
  migration is cancelled;
- relocate some notifications to the upper layer functions
  (i.e. qemu_savevm_state_*), which can get the callbacks notified
  when non-ram-saving code has errors which stops the precopy.
- migration_bitmap_sync_precopy to be called by the precopy code path
  with precopy notifier supported, and migration_bitmap_sync is left
  for callers without the need of precopy notifier (e.g. some postcopy
  code).
- avoid exposing migrate_postcopy;
virtio-balloon:
- virtio_balloon_free_page_report_notify: remove the migrate_postcopy
  check becasue the migration code has guaranteed that the notifier
  callback is invoked when in the precopy mode.

Previous changelog:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg02854.html

Wei Wang (7):
  bitmap: fix bitmap_count_one
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  migration/ram.c: add a notifier chain for precopy
  migration/ram.c: add a function to disable the bulk stage
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/migration/misc.h|  22 ++
 include/qemu/bitmap.h   |  17 ++
 include/standard-headers/linux/virtio_balloon.h |   5 +
 migration/ram.c | 112 +-
 migration/savevm.c  |  15 ++
 vl.c|   1 +
 8 files changed, 458 insertions(+), 5 deletions(-)

-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v10 3/7] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-12-02 Thread Wei Wang
The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 migration/ram.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..01d267f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -316,7 +316,7 @@ struct RAMState {
 uint64_t target_page_count;
 /* number of dirty bits in the bitmap */
 uint64_t migration_dirty_pages;
-/* protects modification of the bitmap */
+/* Protects modification of the bitmap and migration dirty pages */
 QemuMutex bitmap_mutex;
 /* The RAMBlock used in the last src_page_requests */
 RAMBlock *last_req_rb;
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,
 {
 bool ret;
 
+qemu_mutex_lock(>bitmap_mutex);
 ret = test_and_clear_bit(page, rb->bmap);
 
 if (ret) {
 rs->migration_dirty_pages--;
 }
+qemu_mutex_unlock(>bitmap_mutex);
+
 return ret;
 }
 
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-29 Thread Wei Wang

On 11/29/2018 01:10 PM, Peter Xu wrote:

On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
I think this precopy notifier callchain is expected to be used only for
the precopy mode. Postcopy has its dedicated notifier callchain that
users could use.

How about changing the migrate_postcopy() check to "ms->start_postcopy":

bool migration_postcopy_start(void)
{
 MigrationState *s;

 s = migrate_get_current();

 return atomic_read(>start_postcopy);
}


static void precopy_notify(PrecopyNotifyReason reason)
{
 if (migration_postcopy_start())
 return;

 notifier_list_notify(_notifier_list, );
}

If postcopy started with precopy, the precopy optimization feature
could still be used until it switches to the postcopy mode.
I'm not sure we can use start_postcopy.  It's a variable being set in
the QMP handler but it does not mean postcopy has started.  I'm afraid
there can be race where it's still precopy but the variable is set so
event could be missed...


Peter,
I just found that migration_bitmap_sync is also called by 
ram_postcopy_send_discard_bitmap().

But we don't expect notifier_list_notify to be called in the postcopy mode.

So, probably we still need the start_postcopy check in 
notifier_list_notify though we have

the COMPLETE notifier.

Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-28 Thread Wei Wang

On 11/29/2018 01:10 PM, Peter Xu wrote:

On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:

On 11/28/2018 05:32 PM, Peter Xu wrote:

I'm not sure we can use start_postcopy.  It's a variable being set in
the QMP handler but it does not mean postcopy has started.  I'm afraid
there can be race where it's still precopy but the variable is set so
event could be missed...

IMHO the problem is not that complicated.  How about this proposal:

[1]

   typedef enum PrecopyNotifyReason {
 PRECOPY_NOTIFY_RAM_START,
 PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
 PRECOPY_NOTIFY_RAM_AFTER_SYNC,
 PRECOPY_NOTIFY_COMPLETE,
 PRECOPY_NOTIFY_RAM_CLEANUP,
   };

The first three keep the same as your old ones.  Notify RAM_CLEANUP in
ram_save_cleanup() to make sure it'll always be cleaned up (the same
as PRECOPY_END, just another name).  Notify COMPLETE in
qemu_savevm_state_complete_precopy() to show that precopy is
completed.  Meanwhile on balloon side you should stop the hinting for
either RAM_CLEANUP or COMPLETE event.  Then either:

   - precopy is switching to postcopy, or
   - precopy completed, or
   - precopy failed/cancelled

You should always get at least a notification to stop the balloon.
Though you could also get one RAM_CLEANUP after one COMPLETE, but
the balloon should easily handle it (stop the hinting twice).


Sounds better, thanks.




Here maybe you can even remove the "RAM_" in both RAM_START and
RAM_CLEANUP if we're going to have COMPLETE since after all it'll be
not only limited to RAM.

Another suggestion is that you can add an Error into the notify hooks,
please refer to the postcopy one:

   int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);

So the hook functions have a way to even stop the migration (though
for balloon hinting it'll be always optional so no error should be
reported...), then the two interfaces are matched.


Yes, I removed that "errp" as it's not needed by the balloon usage.
But no problem, I can add it back if no objections from others.

Best,
Wei



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-28 Thread Wei Wang

On 11/28/2018 05:32 PM, Peter Xu wrote:


So what I am worrying here are corner cases where we might forget to
stop the hinting.  I'm fabricating one example sequence of events:

   (start migration)
   START_MIGRATION
   BEFORE_SYNC
   AFTER_SYNC
   ...
   BEFORE_SYNC
   AFTER_SYNC
   (some SaveStateEntry failed rather than RAM, then
migration_detect_error returned MIG_THR_ERR_FATAL so we need to
fail the migration, however when running the previous
ram_save_iterate for RAM's specific SaveStateEntry we didn't see
any error so no ERROR event detected)

Then it seems the hinting will last forever.  Considering that now I'm
not sure whether this can be done ram-only, since even if you capture
ram_save_complete() and at the same time you introduce PRECOPY_END you
may still miss the PRECOPY_END event since AFAIU ram_save_complete()
won't be called at all in this case.

Could this happen?


Thanks, indeed this case could happen if we add PRECOPY_END in
ram_save_complete.

How about putting PRECOPY_END in ram_save_cleanup?
I think it would be called in any case.

I'm also thinking probably we don't need PRECOPY_ERR when we have 
PRECOPY_END,

and what do you think of the notifier names below:

+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_RAM_SAVE_END = 0,
+PRECOPY_NOTIFY_RAM_SAVE_START = 1,
+PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
+PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
+PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
+} PrecopyNotifyReason;







[1]


Another thing to mention about the "reasons" (though I see it more
like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
It might help in some cases:

 - then you don't need to trickily export the migrate_postcopy()
   since you'll notify that before postcopy starts

I'm thinking probably we don't need to export migrate_postcopy even now.
It's more like a sanity check, and not needed because now we have the
notifier registered to the precopy specific callchain, which has ensured
that
it is invoked via precopy.

But postcopy will always start with precopy, no?

Yes, but I think we could add the check in precopy_notify()

I'm not sure that's good.  If the notifier could potentially have
other user, they might still work with postcopy, and they might expect
e.g. BEFORE_SYNC to be called for every sync, even if it's at the
precopy stage of a postcopy.


I think this precopy notifier callchain is expected to be used only for
the precopy mode. Postcopy has its dedicated notifier callchain that
users could use.

How about changing the migrate_postcopy() check to "ms->start_postcopy":

bool migration_postcopy_start(void)
{
MigrationState *s;

s = migrate_get_current();

return atomic_read(>start_postcopy);
}


static void precopy_notify(PrecopyNotifyReason reason)
{
if (migration_postcopy_start())
return;

notifier_list_notify(_notifier_list, );
}

If postcopy started with precopy, the precopy optimization feature
could still be used until it switches to the postcopy mode.




In that sense I still feel the
PRECOPY_END is better (so contantly call it at the end of precopy, no
matter whether there's another postcopy afterwards).  It sounds like a
cleaner interface.


Probably I still haven't got the point how PRECOPY_END could help above yet.

Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-28 Thread Wei Wang

On 11/28/2018 01:26 PM, Peter Xu wrote:


Ok thanks.  Please just make sure you will capture all the error
cases, e.g., I also see path like this (a few lines below):

 if (pages < 0) {
 qemu_file_set_error(f, pages);
 break;
 }

It seems that you missed that one.


I think that one should be fine. This notification is actually put at the
bottom of ram_save_iterate. All the above error will bail out to the "out:"
path and then go to call precopy_notify(PRECOPY_NOTIFY_ERR).



I would even suggest that you capture the error with higher level.
E.g., in migration_iteration_run() after qemu_savevm_state_iterate().
Or we can just check the return value of qemu_savevm_state_iterate(),
which we have had ignored so far.


Not very sure about the higher level, because other SaveStateEntry may cause
errors that this feature don't need to care, I think we may only need it 
in ram_save.




[1]




Another thing to mention about the "reasons" (though I see it more
like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
It might help in some cases:

- then you don't need to trickily export the migrate_postcopy()
  since you'll notify that before postcopy starts

I'm thinking probably we don't need to export migrate_postcopy even now.
It's more like a sanity check, and not needed because now we have the
notifier registered to the precopy specific callchain, which has ensured
that
it is invoked via precopy.

But postcopy will always start with precopy, no?


Yes, but I think we could add the check in precopy_notify()





- you'll have a solid point that you'll 100% guarantee that we'll
  stop the free page hinting and don't need to worry about whether
  there is chance the hinting will be running without an end [2].

Thanks, I think it makes sense. Plan to add PRECOPY_NOTIFY_END in
ram_save_complete.

Yeah you can.

Btw, if you're mostly adding the notifies only in RAM-only codes, then
you can consider add the "RAM" into the names of events too to be
clear.


Sounds good, will try and see if the name would become too long :)



My suggestion at [1] is precopy general, but you can still capture it
at the end of ram_save_iterate, then they are RAM-only again.  Please
feel free to choose what fits more...


OK, thanks.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-27 Thread Wei Wang

On 11/27/2018 03:38 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:08:01PM +0800, Wei Wang wrote:
  
+typedef enum PrecopyNotifyReason {

+PRECOPY_NOTIFY_ERR = 0,
+PRECOPY_NOTIFY_START_ITERATION = 1,
+PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP = 2,
+PRECOPY_NOTIFY_AFTER_SYNC_BITMAP = 3,
+PRECOPY_NOTIFY_MAX = 4,

It would be nice to add some comments for each of the notify reason.
E.g., from the name PRECOPY_NOTIFY_START_ITERATION seems more like a
hook at the start of each iteration but according to [1] it should be
at the start of migration rather than each iteration (or when
migration restarts, though I'm not sure whether we really have this
yet).


OK. I think It would be better if the name itself could be straightforward.
Probably we could change PRECOPY_NOTIFY_START_ITERATION to
PRECOPY_NOTIFY_START_MIGRATION.



+} PrecopyNotifyReason;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(Notifier *n);
+void precopy_remove_notifier(Notifier *n);
+
  void ram_mig_init(void);
  void qemu_guest_free_page_hint(void *addr, size_t len);
  
diff --git a/migration/ram.c b/migration/ram.c

index 229b791..65b1223 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -292,6 +292,8 @@ struct RAMState {
  bool ram_bulk_stage;
  /* How many times we have dirty too many pages */
  int dirty_rate_high_cnt;
+/* ram save states used for notifiers */
+int ram_save_state;

This can be removed?


Yes, thanks.




  /* these variables are used for bitmap sync */
  /* last time we did a full bitmap_sync */
  int64_t time_last_bitmap_sync;
@@ -328,6 +330,28 @@ typedef struct RAMState RAMState;
  
  static RAMState *ram_state;
  
+static NotifierList precopy_notifier_list;

+
+void precopy_infrastructure_init(void)
+{
+notifier_list_init(_notifier_list);
+}
+
+void precopy_add_notifier(Notifier *n)
+{
+notifier_list_add(_notifier_list, n);
+}
+
+void precopy_remove_notifier(Notifier *n)
+{
+notifier_remove(n);
+}
+
+static void precopy_notify(PrecopyNotifyReason reason)
+{
+notifier_list_notify(_notifier_list, );
+}
+
  uint64_t ram_bytes_remaining(void)
  {
  return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1642,6 +1666,8 @@ static void migration_bitmap_sync(RAMState *rs)
  int64_t end_time;
  uint64_t bytes_xfer_now;
  
+precopy_notify(PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP);

+
  ram_counters.dirty_sync_count++;
  
  if (!rs->time_last_bitmap_sync) {

@@ -1699,6 +1725,8 @@ static void migration_bitmap_sync(RAMState *rs)
  if (migrate_use_events()) {
  qapi_event_send_migration_pass(ram_counters.dirty_sync_count);
  }
+
+precopy_notify(PRECOPY_NOTIFY_AFTER_SYNC_BITMAP);
  }
  
  /**

@@ -2555,6 +2583,8 @@ static void ram_state_reset(RAMState *rs)
  rs->last_page = 0;
  rs->last_version = ram_list.version;
  rs->ram_bulk_stage = true;
+
+precopy_notify(PRECOPY_NOTIFY_START_ITERATION);

[1]


  }
  
  #define MAX_WAIT 50 /* ms, half buffered_file limit */

@@ -3324,6 +3354,7 @@ out:
  
  ret = qemu_file_get_error(f);

  if (ret < 0) {
+precopy_notify(PRECOPY_NOTIFY_ERR);

Could you show me which function is this line in?

Line 3324 here is ram_save_complete(), but I cannot find this exact
place.


Sure, it's in ram_save_iterate():
...
out:
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
qemu_fflush(f);
ram_counters.transferred += 8;

ret = qemu_file_get_error(f);
if (ret < 0) {
+precopy_notify(PRECOPY_NOTIFY_ERR);
return ret;
}

return done;
}




Another thing to mention about the "reasons" (though I see it more
like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
It might help in some cases:

   - then you don't need to trickily export the migrate_postcopy()
 since you'll notify that before postcopy starts


I'm thinking probably we don't need to export migrate_postcopy even now.
It's more like a sanity check, and not needed because now we have the
notifier registered to the precopy specific callchain, which has ensured 
that

it is invoked via precopy.


   - you'll have a solid point that you'll 100% guarantee that we'll
 stop the free page hinting and don't need to worry about whether
 there is chance the hinting will be running without an end [2].


Thanks, I think it makes sense. Plan to add PRECOPY_NOTIFY_END in 
ram_save_complete.





Regarding [2] above: now the series only stops the hinting when
PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP is notified.  Could there be a case
that it's missing?  E.g., what if we cancel/fail a migration during
precopy?  Have you tried it?



I think it has been handled by the above PRECOPY_NOTIFY_ERR

Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-27 Thread Wei Wang

On 11/27/2018 03:41 PM, Peter Xu wrote:


Ok then I'm fine with it.  Though you could update the comments too if
you like:

 /* protects modification of the bitmap and migration_dirty_pages */
 QemuMutex bitmap_mutex;

And it's tricky that sometimes we don't take the lock when reading
this variable "migration_dirty_pages".  I don't see obvious issue so
far, hope it's true (at least I skipped the colo ones...).


The caller reads the value just to estimate the remaining_size, and
it seems fine without a lock, because whether it reads a
value before the update or after the update seem not causing
an issue.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 4/8] migration: API to clear bits of guest free pages from the dirty bitmap

2018-11-26 Thread Wei Wang

On 11/27/2018 02:06 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:08:00PM +0800, Wei Wang wrote:
Again, is it possible to resize during migration?

So I think the check is fine, but uncertain about the comment.


Yes, resize would not happen with the current implementation.
But heard it could just be a temporal implementation. Probably
we could improve the comment like this:

"
Though the implementation might not support ram resize currently,
this could happen in theory with future updates. So the check here
handles the case that RAMBLOCK is resized after the free page hint is
reported.
"



And shall we print something if that happened?  We can use
error_report_once(), and squashing the above assert:

   if (!block || offset > block->used_length) {
 /* should never happen, but if it happens we ignore the hints and warn */
 error_report_once("...");
 return;
   }

What do you think?


Sounds good.




+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+addr += used_len;

Maybe moving this line into the for() could be a bit better?

   for (; len > 0; len -= used_len, addr += used_len) {



Yes, I think it looks better, thanks.

Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-26 Thread Wei Wang

On 11/27/2018 02:02 PM, Wei Wang wrote:

On 11/27/2018 01:40 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:07:59PM +0800, Wei Wang wrote:

The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  migration/ram.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..ef69dbe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,

  {
  bool ret;
  +qemu_mutex_lock(>bitmap_mutex);
  ret = test_and_clear_bit(page, rb->bmap);
if (ret) {
  rs->migration_dirty_pages--;
  }
+qemu_mutex_unlock(>bitmap_mutex);
+
  return ret;
  }

It seems fine to me, but have you thought about
test_and_clear_bit_atomic()?  Note that we just had
test_and_set_bit_atomic() a few months ago.


Thanks for sharing. I think we might also need to
mutex migration_dirty_pages.



And not related to this patch: I'm unclear on why we have had
bitmap_mutex before, since it seems unnecessary.


OK. This is because with the optimization we have a thread
which clears bits (of free pages) from the bitmap and updates
migration_dirty_pages. So we need to synchronization between
the migration thread and the optimization thread.



And before this feature, I think yes, that bitmap_mutex is not needed.
It was left there due to some historical reasons.
I remember Dave previous said he was about to remove it. But the new
feature will need it again.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-26 Thread Wei Wang

On 11/27/2018 01:40 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:07:59PM +0800, Wei Wang wrote:

The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  migration/ram.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..ef69dbe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,
  {
  bool ret;
  
+qemu_mutex_lock(>bitmap_mutex);

  ret = test_and_clear_bit(page, rb->bmap);
  
  if (ret) {

  rs->migration_dirty_pages--;
  }
+qemu_mutex_unlock(>bitmap_mutex);
+
  return ret;
  }

It seems fine to me, but have you thought about
test_and_clear_bit_atomic()?  Note that we just had
test_and_set_bit_atomic() a few months ago.


Thanks for sharing. I think we might also need to
mutex migration_dirty_pages.



And not related to this patch: I'm unclear on why we have had
bitmap_mutex before, since it seems unnecessary.


OK. This is because with the optimization we have a thread
which clears bits (of free pages) from the bitmap and updates
migration_dirty_pages. So we need to synchronization between
the migration thread and the optimization thread.

Best,
Wei



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9 0/8] virtio-balloon: free page hint support

2018-11-26 Thread Wei Wang

On 11/15/2018 06:07 PM, Wei Wang wrote:

This is the deivce part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
 Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
 Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
 2.1 Guest setup: 8G RAM, 4 vCPU
 2.1.1 Idle guest live migration time
 Optimization v.s. Legacy = 620ms vs 2970ms
 --> ~79% reduction
 2.1.2 Guest live migration with Linux compilation workload
   (i.e. make bzImage -j4) running
   1) Live Migration Time:
  Optimization v.s. Legacy = 2273ms v.s. 4502ms
  --> ~50% reduction
   2) Linux Compilation Time:
  Optimization v.s. Legacy = 8min42s v.s. 8min43s
  --> no obvious difference

 2.2 Guest setup: 128G RAM, 4 vCPU
 2.2.1 Idle guest live migration time
 Optimization v.s. Legacy = 5294ms vs 41651ms
 --> ~87% reduction
 2.2.2 Guest live migration with Linux compilation workload
   1) Live Migration Time:
 Optimization v.s. Legacy = 8816ms v.s. 54201ms
 --> 84% reduction
   2) Linux Compilation Time:
  Optimization v.s. Legacy = 8min30s v.s. 8min36s
  --> no obvious difference

ChangeLog:
v8->v9:
bitmap:
 - fix bitmap_count_one to handle the nbits=0 case
migration:
 - replace the ram save notifier chain with a more general precopy
   notifier chain, which is similar to the postcopy notifier chain.
 - Avoid exposing the RAMState struct, and add a function,
   precopy_disable_bulk_stage, to let the virtio-balloon notifier
   callback to disable the bulk stage flag.


Hi Dave and Peter,

Could you continue to review the patches? Thanks!

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support

2018-11-15 Thread Wei Wang

On 11/16/2018 02:50 AM, no-re...@patchew.org wrote:

Hi,

This series failed docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

   CC  net/filter.o
   CC  net/filter-buffer.o
   CC  net/filter-mirror.o
   CC  net/colo-compare.o
/tmp/qemu-test/src/migration/rdma.c: In function 'qemu_rdma_accept':
/tmp/qemu-test/src/migration/rdma.c:3353:5: error: implicit declaration of 
function 'migrate_postcopy' [-Werror=implicit-function-declaration]
  if (migrate_postcopy() && !rdma->is_return_path) {
  ^
/tmp/qemu-test/src/migration/rdma.c:3353:5: error: nested extern declaration of 
'migrate_postcopy' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [migration/rdma.o] Error 1
make: *** Waiting for unfinished jobs


This is caused by missing "migration/misc.h" in rdma.c, since we moved 
the migrate_postcopy() declaration there. I'll add it.


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 4/8] migration: API to clear bits of guest free pages from the dirty bitmap

2018-11-15 Thread Wei Wang
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Spilt the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h |  2 ++
 migration/ram.c  | 48 
 2 files changed, 50 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index ef69dbe..229b791 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3131,6 +3131,54 @@ static void ram_state_resume_prepare(RAMState *rs, 
QEMUFile *out)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len) {
+block = qemu_ram_block_from_host(addr, false, );
+assert(block);
+
+/*
+ * This handles the case that the RAMBlock is resized after the free
+ * page hint is reported.
+ */
+if (unlikely(offset > block->used_length)) {
+return;
+}
+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+addr += used_len;
+}
+
+start = offset >> TARGET_PAGE_BITS;
+npages = used_len >> TARGET_PAGE_BITS;
+
+qemu_mutex_lock(_state->bitmap_mutex);
+ram_state->migration_dirty_pages -=
+  bitmap_count_one_with_offset(block->bmap, start, npages);
+bitmap_clear(block->bmap, start, npages);
+qemu_mutex_unlock(_state->bitmap_mutex);
+}
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-15 Thread Wei Wang
The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 migration/ram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..ef69dbe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,
 {
 bool ret;
 
+qemu_mutex_lock(>bitmap_mutex);
 ret = test_and_clear_bit(page, rb->bmap);
 
 if (ret) {
 rs->migration_dirty_pages--;
 }
+qemu_mutex_unlock(>bitmap_mutex);
+
 return ret;
 }
 
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 1/8] bitmap: fix bitmap_count_one

2018-11-15 Thread Wei Wang
BITMAP_LAST_WORD_MASK(nbits) returns 0x when "nbits=0", which
makes bitmap_count_one fail to handle the "nbits=0" case. It appears to be
preferred to remain BITMAP_LAST_WORD_MASK identical to the kernel
implementation that it is ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.

Inital Discussion Link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg554316.html
Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..679f1bd 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -221,6 +221,10 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 
 static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
 {
+if (unlikely(!nbits)) {
+return 0;
+}
+
 if (small_nbits(nbits)) {
 return ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
 } else {
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 6/8] migration/ram.c: add a function to disable the bulk stage

2018-11-15 Thread Wei Wang
This patch adds a function to enable a precopy notifier callback outside
the migration subsystem to disable the bulk stage flag. This is needed by
the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 1 +
 migration/ram.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 0bac623..67cb275 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -30,6 +30,7 @@ typedef enum PrecopyNotifyReason {
 void precopy_infrastructure_init(void);
 void precopy_add_notifier(Notifier *n);
 void precopy_remove_notifier(Notifier *n);
+void precopy_disable_bulk_stage(void);
 
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index 65b1223..8745ca3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -352,6 +352,15 @@ static void precopy_notify(PrecopyNotifyReason reason)
 notifier_list_notify(_notifier_list, );
 }
 
+void precopy_disable_bulk_stage(void)
+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 2/8] bitmap: bitmap_count_one_with_offset

2018-11-15 Thread Wei Wang
Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/qemu/bitmap.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 679f1bd..5c31334 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -232,6 +232,19 @@ static inline long bitmap_count_one(const unsigned long 
*bitmap, long nbits)
 }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+long offset, long nbits)
+{
+long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+long redundant_bits = offset - aligned_offset;
+long bits_to_count = nbits + redundant_bits;
+const unsigned long *bitmap_start = bitmap +
+aligned_offset / BITS_PER_LONG;
+
+return bitmap_count_one(bitmap_start, bits_to_count) -
+   bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 8/8] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-11-15 Thread Wei Wang
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when there is an error occurred in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c  | 255 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/standard-headers/linux/virtio_balloon.h |   5 +
 3 files changed, 287 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1728e4f..4a3eb30 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/misc.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -308,6 +309,176 @@ out:
 }
 }
 
+static void virtio_balloon_handle_free_page_vq(VirtIODevice *vdev,
+   VirtQueue *vq)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+qemu_bh_schedule(s->free_page_bh);
+}
+
+static bool get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->block_iothread) {
+qemu_cond_wait(>free_page_cond, >free_page_lock);
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return false;
+}
+
+if (elem->out_num) {
+uint32_t id;
+size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
+ , sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);
+
+virtio_tswap32s(vdev, );
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return false;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+}
+}
+}
+
+if (elem->in_num) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+}
+
+return true;
+}
+
+static void virtio_ballloon_get_free_page_hints(void *opaque)
+{
+VirtIOBalloon *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+bool continue_to_get_hints;
+
+do {
+qemu_mutex_lock(>free_page_lock);
+virtio_queue_set_notification(vq, 0);
+continue_to_get_hints = get_free_page_hints(dev);
+qemu_mutex_unlock(>free_page_lock);
+virtio_notify(vdev, vq);
+  /*
+   * Start to poll the vq once the reporting started. Otherwise, continue
+   * only when there are entries on the vq, which need to be given back.
+   */
+} while (continue_to_get_hints ||
+ dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+virtio_queue_set_notification(vq, 1);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(VirtIOBalloon *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+/* For the stop and copy phase, we don't need to start the optimization */
+if (!vdev->vm_running) {
+return;
+}
+
+if (s->free_page_report_cmd_id == UINT_MAX) {
+s->free_page_report_cmd_id

[virtio-dev] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-15 Thread Wei Wang
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 12 
 migration/ram.c  | 31 +++
 vl.c |  1 +
 3 files changed, 44 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..0bac623 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,6 +19,18 @@
 
 /* migration/ram.c */
 
+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_ERR = 0,
+PRECOPY_NOTIFY_START_ITERATION = 1,
+PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP = 2,
+PRECOPY_NOTIFY_AFTER_SYNC_BITMAP = 3,
+PRECOPY_NOTIFY_MAX = 4,
+} PrecopyNotifyReason;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(Notifier *n);
+void precopy_remove_notifier(Notifier *n);
+
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
 
diff --git a/migration/ram.c b/migration/ram.c
index 229b791..65b1223 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -292,6 +292,8 @@ struct RAMState {
 bool ram_bulk_stage;
 /* How many times we have dirty too many pages */
 int dirty_rate_high_cnt;
+/* ram save states used for notifiers */
+int ram_save_state;
 /* these variables are used for bitmap sync */
 /* last time we did a full bitmap_sync */
 int64_t time_last_bitmap_sync;
@@ -328,6 +330,28 @@ typedef struct RAMState RAMState;
 
 static RAMState *ram_state;
 
+static NotifierList precopy_notifier_list;
+
+void precopy_infrastructure_init(void)
+{
+notifier_list_init(_notifier_list);
+}
+
+void precopy_add_notifier(Notifier *n)
+{
+notifier_list_add(_notifier_list, n);
+}
+
+void precopy_remove_notifier(Notifier *n)
+{
+notifier_remove(n);
+}
+
+static void precopy_notify(PrecopyNotifyReason reason)
+{
+notifier_list_notify(_notifier_list, );
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1642,6 +1666,8 @@ static void migration_bitmap_sync(RAMState *rs)
 int64_t end_time;
 uint64_t bytes_xfer_now;
 
+precopy_notify(PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP);
+
 ram_counters.dirty_sync_count++;
 
 if (!rs->time_last_bitmap_sync) {
@@ -1699,6 +1725,8 @@ static void migration_bitmap_sync(RAMState *rs)
 if (migrate_use_events()) {
 qapi_event_send_migration_pass(ram_counters.dirty_sync_count);
 }
+
+precopy_notify(PRECOPY_NOTIFY_AFTER_SYNC_BITMAP);
 }
 
 /**
@@ -2555,6 +2583,8 @@ static void ram_state_reset(RAMState *rs)
 rs->last_page = 0;
 rs->last_version = ram_list.version;
 rs->ram_bulk_stage = true;
+
+precopy_notify(PRECOPY_NOTIFY_START_ITERATION);
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -3324,6 +3354,7 @@ out:
 
 ret = qemu_file_get_error(f);
 if (ret < 0) {
+precopy_notify(PRECOPY_NOTIFY_ERR);
 return ret;
 }
 
diff --git a/vl.c b/vl.c
index fa25d1a..48ae0e8 100644
--- a/vl.c
+++ b/vl.c
@@ -3057,6 +3057,7 @@ int main(int argc, char **argv, char **envp)
 module_call_init(MODULE_INIT_OPTS);
 
 runstate_init();
+precopy_infrastructure_init();
 postcopy_infrastructure_init();
 monitor_init_globals();
 
-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 0/8] virtio-balloon: free page hint support

2018-11-15 Thread Wei Wang
This is the deivce part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
  (i.e. make bzImage -j4) running
  1) Live Migration Time:
 Optimization v.s. Legacy = 2273ms v.s. 4502ms
 --> ~50% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min42s v.s. 8min43s
 --> no obvious difference

2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
  1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min30s v.s. 8min36s
 --> no obvious difference

ChangeLog:
v8->v9:
bitmap:
- fix bitmap_count_one to handle the nbits=0 case
migration:
- replace the ram save notifier chain with a more general precopy
  notifier chain, which is similar to the postcopy notifier chain.
- Avoid exposing the RAMState struct, and add a function,
  precopy_disable_bulk_stage, to let the virtio-balloon notifier
  callback to disable the bulk stage flag.
virtio-balloon:
- Remove VIRTIO_BALLOON_F_PAGE_POISON from this series as it is not
  a limit to the free page optimization now. Plan to add the device
  support for VIRTIO_BALLOON_F_PAGE_POISON in another patch later.

Previous changelog:
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01908.html

Wei Wang (8):
  bitmap: fix bitmap_count_one
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  migration/ram.c: add a notifier chain for precopy
  migration/ram.c: add a function to disable the bulk stage
  migration: move migrate_postcopy() to include/migration/misc.h
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

 hw/virtio/virtio-balloon.c  | 255 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/migration/misc.h|  16 ++
 include/qemu/bitmap.h   |  17 ++
 include/standard-headers/linux/virtio_balloon.h |   5 +
 migration/migration.h   |   2 -
 migration/ram.c |  91 +
 vl.c|   1 +
 8 files changed, 412 insertions(+), 3 deletions(-)

-- 
1.8.3.1


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v1 1/2] content/virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-11-13 Thread Wei Wang
The VIRTIO_BALLOON_F_FREE_PAGE_HINT feature supports driver reporting
free page hints to the device.

Live migration can use the virtio-balloon device to get the guest free
page hints, and avoid sending those free pages. It is not concerned that
the memory pages are used (e.g. guest reclaimed some of them due to memory
pressure) after they are given to the device as hints of free pages,
because they will be tracked by the hypervisor and transferred in the
subsequent round if they are used and written.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Wei Wang 
---
 content.tex | 122 +---
 1 file changed, 116 insertions(+), 6 deletions(-)

diff --git a/content.tex b/content.tex
index c346183..673c891 100644
--- a/content.tex
+++ b/content.tex
@@ -4367,9 +4367,9 @@ The traditional virtio memory balloon device is a 
primitive device for
 managing guest memory: the device asks for a certain amount of
 memory, and the driver supplies it (or withdraws it, if the device
 has more than it asks for). This allows the guest to adapt to
-changes in allowance of underlying physical memory. If the
-feature is negotiated, the device can also be used to communicate
-guest memory statistics to the host.
+changes in allowance of underlying physical memory. The device can
+also be used to communicate guest memory statistics and guest free
+memory pages to the host when the related features are negotiated.
 
 \subsection{Device ID}\label{sec:Device Types / Memory Balloon Device / Device 
ID}
   5
@@ -4378,11 +4378,14 @@ guest memory statistics to the host.
 \begin{description}
 \item[0] inflateq
 \item[1] deflateq
-\item[2] statsq.
+\item[2] statsq
+\item[3] freepageq
 \end{description}
 
   Virtqueue 2 only exists if VIRTIO_BALLON_F_STATS_VQ set.
 
+  Virtqueue 3 only exists if VIRTIO_BALLON_F_FREE_PAGE_HINT set.
+
 \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / 
Feature bits}
 \begin{description}
 \item[VIRTIO_BALLOON_F_MUST_TELL_HOST (0)] Host has to be told before
@@ -4392,6 +4395,8 @@ guest memory statistics to the host.
 memory statistics is present.
 \item[VIRTIO_BALLOON_F_DEFLATE_ON_OOM (2) ] Deflate balloon on
 guest out of memory condition.
+\item[VIRTIO_BALLOON_F_FREE_PAGE_HINT(3) ] Guest reports free pages
+to host.
 
 \end{description}
 
@@ -4415,16 +4420,32 @@ allow guest to use memory before notifying host if
 VIRTIO_BALLOON_F_MUST_TELL_HOST is not negotiated.
 
 \subsection{Device configuration layout}\label{sec:Device Types / Memory 
Balloon Device / Device configuration layout}
-  Both fields of this configuration
-  are always available.
+
+\begin{lstlisting}
+#define VIRTIO_BALLOON_CMD_ID_MIN 0x8000
+#define VIRTIO_BALLOON_CMD_ID_STOP 0x0
+#define VIRTIO_BALLOON_CMD_ID_DONE 0x1
+\end{lstlisting}
 
 \begin{lstlisting}
 struct virtio_balloon_config {
 le32 num_pages;
 le32 actual;
+le32 free_page_report_cmd_id;
 };
 \end{lstlisting}
 
+\field{num_pages} and \field{actual} are always available.
+
+\field{free_page_report_cmd_id} is available if
+VIRTIO_BALLON_F_FREE_PAGE_HINT negotiated.
+
+\devicenormative{\subsubsection}{Device configuration layout}{Device Types / 
Traditional Memory Balloon Device / Device configuration layout}
+
+The device MUST NOT set \field{free_page_report_cmd_id} to a value
+between \field{VIRTIO_BALLOON_CMD_ID_DONE} and
+\field{VIRTIO_BALLOON_CMD_ID_MIN} exclusive.
+
 \subparagraph{Legacy Interface: Device configuration layout}\label{sec:Device 
Types / Memory Balloon Device / Device
 configuration layout / Legacy Interface: Device configuration layout}
 When using the legacy interface, transitional devices and drivers
@@ -4448,8 +4469,20 @@ The device initialization process is outlined below:
   \item DRIVER_OK is set: device operation begins.
   \item Notify the device about the stats virtqueue buffer.
   \end{enumerate}
+
+\item If the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature bit is
+  negotiated:
+  \begin{enumerate}
+  \item Identify the free page virtqueue.
+  \item Set \field{free_page_report_cmd_id} to
+\field{VIRTIO_BALLOON_CMD_ID_MIN}.
+  \item Register a notifier with an external component (e.g. a live
+migration agent) who will request for the free page reporting
+from the driver.
+  \end{enumerate}
 \end{enumerate}
 
+
 \subsection{Device Operation}\label{sec:Device Types / Memory Balloon Device / 
Device Operation}
 
 The device is driven either by the receipt of a configuration
@@ -4718,6 +4751,83 @@ first buffer.
   allocations in the guest.
 \end{description}
 
+\subsubsection{Free Page Reporting}\label{sec:Device Types / Memory Balloon 
Device / Device Operation / Free Page Reporting}
+
+The free page reporting is driven by the device. The device MAY
+request the driver to start or stop the free page reporting.
+
+A request to start the free page reporting proceeds as follows:
+
+\begin{enumerate}
+\item The device updates

[virtio-dev] [PATCH v1 0/2] Virtio-balloon Updates

2018-11-13 Thread Wei Wang
This patch series adds 2 features to the virtio-balloon spec.
The VIRTIO_BALLOON_F_FREE_PAGE_HINT feature supports guest to report
free page hints to host.
The VIRTIO_BALLOON_F_PAGE_POISON feature supports guest to report the
memory poison value to host.

Wei Wang (2):
  content/virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  content/virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 content.tex | 140 +---
 1 file changed, 134 insertions(+), 6 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v1 2/2] content/virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

2018-11-13 Thread Wei Wang
The VIRTIO_BALLOON_F_PAGE_POISON feature supports guest to report the
memory poison value to host.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Wei Wang 
---
 content.tex | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/content.tex b/content.tex
index 673c891..df3c6c2 100644
--- a/content.tex
+++ b/content.tex
@@ -4397,6 +4397,8 @@ memory pages to the host when the related features are 
negotiated.
 guest out of memory condition.
 \item[VIRTIO_BALLOON_F_FREE_PAGE_HINT(3) ] Guest reports free pages
 to host.
+\item[VIRTIO_BALLOON_F_PAGE_POISON(4) ] Indicate if guest is using
+page poisoning.
 
 \end{description}
 
@@ -4404,6 +4406,10 @@ memory pages to the host when the related features are 
negotiated.
 The driver SHOULD accept the VIRTIO_BALLOON_F_MUST_TELL_HOST
 feature if offered by the device.
 
+The driver SHOULD accept the VIRTIO_BALLOON_F_PAGE_POISON feature
+only when the guest is using page poison and the feature is offered
+by the device.
+
 \devicenormative{\subsubsection}{Feature bits}{Device Types / Memory Balloon 
Device / Feature bits}
 If the device offers the VIRTIO_BALLOON_F_MUST_TELL_HOST feature
 bit, and if the driver did not accept this feature bit, the
@@ -4432,6 +4438,7 @@ struct virtio_balloon_config {
 le32 num_pages;
 le32 actual;
 le32 free_page_report_cmd_id;
+le32 poison_val;
 };
 \end{lstlisting}
 
@@ -4440,12 +4447,20 @@ struct virtio_balloon_config {
 \field{free_page_report_cmd_id} is available if
 VIRTIO_BALLON_F_FREE_PAGE_HINT negotiated.
 
+\field{poison_val} is available if VIRTIO_BALLOON_F_PAGE_POISON
+negotiated.
+
 \devicenormative{\subsubsection}{Device configuration layout}{Device Types / 
Traditional Memory Balloon Device / Device configuration layout}
 
 The device MUST NOT set \field{free_page_report_cmd_id} to a value
 between \field{VIRTIO_BALLOON_CMD_ID_DONE} and
 \field{VIRTIO_BALLOON_CMD_ID_MIN} exclusive.
 
+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / 
Traditional Memory Balloon Device / Device configuration layout}
+
+The driver SHOULD write the guest used poison value to
+\field{poison_val} if VIRTIO_BALLOON_F_PAGE_POISON negotiated.
+
 \subparagraph{Legacy Interface: Device configuration layout}\label{sec:Device 
Types / Memory Balloon Device / Device
 configuration layout / Legacy Interface: Device configuration layout}
 When using the legacy interface, transitional devices and drivers
@@ -4480,6 +4495,9 @@ The device initialization process is outlined below:
 migration agent) who will request for the free page reporting
 from the driver.
   \end{enumerate}
+
+\item If the VIRTIO_BALLOON_F_PAGE_POISON feature bit is negotiated,
+  write the guest used poison value to \field{poison_val}.
 \end{enumerate}
 
 
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v37 0/3] Virtio-balloon: support free page reporting

2018-10-25 Thread Wei Wang

On 10/25/2018 08:58 AM, Michael S. Tsirkin wrote:

On Mon, Aug 27, 2018 at 09:32:16AM +0800, Wei Wang wrote:

The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, implemented by this
series enables the virtio-balloon driver to report hints of guest free
pages to host. It can be used to accelerate virtual machine (VM) live
migration. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to have the hypervisor write-protect all the guest memory.

This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It is not concerned that the memory
pages are used after they are given to the hypervisor as a hint of the
free pages, because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.

OK so it will be in linux-next.  Now can I trouble you for a virtio spec
patch with the description please?


No problem, I'll start to patch the spec. Thanks!

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v36 0/5] Virtio-balloon: support free page reporting

2018-09-06 Thread Wei Wang

On 07/23/2018 10:36 PM, Dr. David Alan Gilbert wrote:

* Michael S. Tsirkin (m...@redhat.com) wrote:

On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:

This patch series is separated from the previous "Virtio-balloon
Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
implemented by this series enables the virtio-balloon driver to report
hints of guest free pages to the host. It can be used to accelerate live
migration of VMs. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to write-protect all the guest memory.

This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It is not concerned that the memory
pages are used after they are given to the hypervisor as a hint of the
free pages, because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.

* Tests
- Test Environment
 Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
 Guest: 8G RAM, 4 vCPU
 Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second

- Test Results
 - Idle Guest Live Migration Time (results are averaged over 10 runs):
 - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
(setting page poisoning zero and enabling ksm don't affect the
  comparison result)
 - Guest with Linux Compilation Workload (make bzImage -j4):
 - Live Migration Time (average)
   Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
 - Linux Compilation Time
   Optimization v.s. Legacy = 5min4s v.s. 5min12s
   --> no obvious difference

I'd like to see dgilbert's take on whether this kind of gain
justifies adding a PV interfaces, and what kind of guest workload
is appropriate.

Cc'd.

Well, 44% is great ... although the measurement is a bit weird.

a) A 2 second downtime is very large; 300-500ms is more normal
b) I'm not sure what the 'average' is  - is that just between a bunch of
repeated migrations?
c) What load was running in the guest during the live migration?

An interesting measurement to add would be to do the same test but
with a VM with a lot more RAM but the same load;  you'd hope the gain
would be even better.
It would be interesting, especially because the users who are interested
are people creating VMs allocated with lots of extra memory (for the
worst case) but most of the time migrating when it's fairly idle.

Dave



Hi Dave,

The results of the added experiments have been shown in the v37 cover 
letter.

Could you have a look at https://lkml.org/lkml/2018/8/27/29 . Thanks.

Best,
Wei


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v37 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-08-26 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are
obtained one by one from the mm free list via the regular allocation
function.

Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register. When
the guest starts to report, it first sends a start cmd to host via the
free page vq, which acks to host the cmd id received. When the guest
finishes reporting free pages, a stop cmd is sent to host via the vq.
Host may also send a stop cmd id to the guest to stop the reporting.

VIRTIO_BALLOON_CMD_ID_STOP: Host sends this cmd to stop the guest
reporting.
VIRTIO_BALLOON_CMD_ID_DONE: Host sends this cmd to tell the guest that
the reported pages are ready to be freed.

Why does the guest free the reported pages when host tells it is ready to
free?
This is because freeing pages appears to be expensive for live migration.
free_pages() dirties memory very quickly and makes the live migraion not
converge in some cases. So it is good to delay the free_page operation
when the migration is done, and host sends a command to guest about that.

Why do we need the new VIRTIO_BALLOON_CMD_ID_DONE, instead of reusing
VIRTIO_BALLOON_CMD_ID_STOP?
This is because live migration is usually done in several rounds. At the
end of each round, host needs to send a VIRTIO_BALLOON_CMD_ID_STOP cmd to
the guest to stop (or say pause) the reporting. The guest resumes the
reporting when it receives a new command id at the beginning of the next
round. So we need a new cmd id to distinguish between "stop reporting" and
"ready to free the reported pages".

TODO:
- Add a batch page allocation API to amortize the allocation overhead.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Linus Torvalds 
---
 drivers/virtio/virtio_balloon.c | 364 
 include/uapi/linux/virtio_balloon.h |   5 +
 2 files changed, 336 insertions(+), 33 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d1c1f62..a185678 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -41,13 +41,34 @@
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
+__GFP_NOMEMALLOC)
+/* The order of free page blocks to report to host */
+#define VIRTIO_BALLOON_FREE_PAGE_ORDER (MAX_ORDER - 1)
+/* The size of a free page block in bytes */
+#define VIRTIO_BALLOON_FREE_PAGE_SIZE \
+   (1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT))
+
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -57,6 +78,18 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The list of allocated free pages, waiting to be given back to mm */
+   struct list_head free_page_list;
+   spinlock_t free_page_list_lock;
+   /* The number of free page blocks on the above list */
+   unsigned long num_free_page_blocks;
+   /* The cmd id received from host */
+   u32 cmd_id_received;
+   /* The cmd id that is actively in use */
+   __virtio32 cmd_id_active;
+   /* Buffer to store the stop sign */
+   __virtio32 cmd_id_stop;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -320,17 +353,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct 

[virtio-dev] [PATCH v37 0/3] Virtio-balloon: support free page reporting

2018-08-26 Thread Wei Wang
The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, implemented by this
series enables the virtio-balloon driver to report hints of guest free
pages to host. It can be used to accelerate virtual machine (VM) live
migration. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to have the hypervisor write-protect all the guest memory.

This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It is not concerned that the memory
pages are used after they are given to the hypervisor as a hint of the
free pages, because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.

* Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
  (i.e. make bzImage -j4) running
  1) Live Migration Time:
 Optimization v.s. Legacy = 2273ms v.s. 4502ms
 --> ~50% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min42s v.s. 8min43s
 --> no obvious difference

2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
  1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min30s v.s. 8min36s
 --> no obvious difference

ChangeLog:
v36->v37:
- free the reported pages to mm when receives a DONE cmd from host.
  Please see patch 1's commit log for reasons. Please see patch 1's
  commit for detailed explanations.

For ChangeLogs from v22 to v36, please reference
https://lkml.org/lkml/2018/7/20/199

For ChangeLogs before v21, please reference
https://lwn.net/Articles/743660/

Wei Wang (3):
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 drivers/virtio/virtio_balloon.c | 374 
 include/uapi/linux/virtio_balloon.h |   8 +
 mm/page_poison.c|   6 +
 3 files changed, 355 insertions(+), 33 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v37 3/3] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

2018-08-26 Thread Wei Wang
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value that is in use.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 10 ++
 include/uapi/linux/virtio_balloon.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a185678..728ecd1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -825,6 +825,7 @@ static int virtio_balloon_register_shrinker(struct 
virtio_balloon *vb)
 static int virtballoon_probe(struct virtio_device *vdev)
 {
struct virtio_balloon *vb;
+   __u32 poison_val;
int err;
 
if (!vdev->config->get) {
@@ -892,6 +893,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->num_free_page_blocks = 0;
spin_lock_init(>free_page_list_lock);
INIT_LIST_HEAD(>free_page_list);
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+   memset(_val, PAGE_POISON, sizeof(poison_val));
+   virtio_cwrite(vb->vdev, struct virtio_balloon_config,
+ poison_val, _val);
+   }
}
/*
 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
@@ -992,6 +998,9 @@ static int virtballoon_restore(struct virtio_device *vdev)
 
 static int virtballoon_validate(struct virtio_device *vdev)
 {
+   if (!page_poisoning_enabled())
+   __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+
__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
return 0;
 }
@@ -1001,6 +1010,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
+   VIRTIO_BALLOON_F_PAGE_POISON,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index 47c9eb4..a1966cd7 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -48,6 +49,8 @@ struct virtio_balloon_config {
__u32 actual;
/* Free page report command id, readonly by guest */
__u32 free_page_report_cmd_id;
+   /* Stores PAGE_POISON if page poisoning is in use */
+   __u32 poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v4 2/3] virtio-balloon: kzalloc the vb struct

2018-08-16 Thread Wei Wang
Zero all the vb fields at alloaction, so that we don't need to
zero-initialize each field one by one later.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Tetsuo Handa 
---
 drivers/virtio/virtio_balloon.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8100e77..d97d73c 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -561,7 +561,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
return -EINVAL;
}
 
-   vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
+   vdev->priv = vb = kzalloc(sizeof(*vb), GFP_KERNEL);
if (!vb) {
err = -ENOMEM;
goto out;
@@ -570,8 +570,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
INIT_WORK(>update_balloon_stats_work, update_balloon_stats_func);
INIT_WORK(>update_balloon_size_work, update_balloon_size_func);
spin_lock_init(>stop_update_lock);
-   vb->stop_update = false;
-   vb->num_pages = 0;
mutex_init(>balloon_lock);
init_waitqueue_head(>acked);
vb->vdev = vdev;
@@ -602,7 +600,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
err = PTR_ERR(vb->vb_dev_info.inode);
kern_unmount(balloon_mnt);
unregister_oom_notifier(>nb);
-   vb->vb_dev_info.inode = NULL;
goto out_del_vqs;
}
vb->vb_dev_info.inode->i_mapping->a_ops = _aops;
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v4 1/3] virtio-balloon: remove BUG() in init_vqs

2018-08-16 Thread Wei Wang
It's a bit overkill to use BUG when failing to add an entry to the
stats_vq in init_vqs. So remove it and just return the error to the
caller to bail out nicely.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 3988c09..8100e77 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -455,9 +455,13 @@ static int init_vqs(struct virtio_balloon *vb)
num_stats = update_balloon_stats(vb);
 
sg_init_one(, vb->stats, sizeof(vb->stats[0]) * num_stats);
-   if (virtqueue_add_outbuf(vb->stats_vq, , 1, vb, GFP_KERNEL)
-   < 0)
-   BUG();
+   err = virtqueue_add_outbuf(vb->stats_vq, , 1, vb,
+  GFP_KERNEL);
+   if (err) {
+   dev_warn(>vdev->dev, "%s: add stat_vq failed\n",
+__func__);
+   return err;
+   }
virtqueue_kick(vb->stats_vq);
}
return 0;
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v4 0/3] virtio-balloon: some improvements

2018-08-16 Thread Wei Wang
This series is split from the "Virtio-balloon: support free page
reporting" series to make some improvements.

ChangeLog:
v3->v4:
- use kzalloc to allocate the vb struct so that we don't need to zero
  initialize each field one by one later;
- also remove vb->shrinker.batch = 0, which is not needed now.
v2->v3:
- shrink the balloon pages according to the amount requested by the
  claimer, instead of using a user specified number;
v1->v2:
- register the shrinker when VIRTIO_BALLOON_F_DEFLATE_ON_OOM is
  negotiated.

Wei Wang (3):
  virtio-balloon: remove BUG() in init_vqs
  virtio-balloon: kzalloc the vb struct
  virtio_balloon: replace oom notifier with shrinker

 drivers/virtio/virtio_balloon.c | 125 +---
 1 file changed, 67 insertions(+), 58 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v4 3/3] virtio_balloon: replace oom notifier with shrinker

2018-08-16 Thread Wei Wang
The OOM notifier is getting deprecated to use for the reasons:
- As a callout from the oom context, it is too subtle and easy to
  generate bugs and corner cases which are hard to track;
- It is called too late (after the reclaiming has been performed).
  Drivers with large amuont of reclaimable memory is expected to
  release them at an early stage of memory pressure;
- The notifier callback isn't aware of oom contrains;
Link: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure. The balloon pages are
given back to mm adaptively by returning the number of pages that the
reclaimer is asking for (i.e. sc->nr_to_scan).

Currently the max possible value of sc->nr_to_scan passed to the balloon
shrinker is SHRINK_BATCH, which is 128. This is smaller than the
limitation that only VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) pages can be
returned via one invocation of leak_balloon. But this patch still
considers the case that SHRINK_BATCH or shrinker->batch could be changed
to a value larger than VIRTIO_BALLOON_ARRAY_PFNS_MAX, which will need to
do multiple invocations of leak_balloon.

Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been used
to release balloon pages on OOM. We continue to use this feature bit for
the shrinker, so the shrinker is only registered when this feature bit
has been negotiated with host.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Tetsuo Handa 
---
 drivers/virtio/virtio_balloon.c | 110 +---
 1 file changed, 59 insertions(+), 51 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d97d73c..d1c1f62 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -40,13 +39,8 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> 
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
-static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
-module_param(oom_pages, int, S_IRUSR | S_IWUSR);
-MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
-
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
 #endif
@@ -86,8 +80,8 @@ struct virtio_balloon {
/* Memory statistics */
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-   /* To register callback in oom notifier call chain */
-   struct notifier_block nb;
+   /* To register a shrinker to shrink memory upon memory pressure */
+   struct shrinker shrinker;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -365,38 +359,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
  );
 }
 
-/*
- * virtballoon_oom_notify - release pages when system is under severe
- * memory pressure (called from out_of_memory())
- * @self : notifier block struct
- * @dummy: not used
- * @parm : returned - number of freed pages
- *
- * The balancing of memory by use of the virtio balloon should not cause
- * the termination of processes while there are pages in the balloon.
- * If virtio balloon manages to release some memory, it will make the
- * system return and retry the allocation that forced the OOM killer
- * to run.
- */
-static int virtballoon_oom_notify(struct notifier_block *self,
- unsigned long dummy, void *parm)
-{
-   struct virtio_balloon *vb;
-   unsigned long *freed;
-   unsigned num_freed_pages;
-
-   vb = container_of(self, struct virtio_balloon, nb);
-   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-   return NOTIFY_OK;
-
-   freed = parm;
-   num_freed_pages = leak_balloon(vb, oom_pages);
-   update_balloon_size(vb);
-   *freed += num_freed_pages;
-
-   return NOTIFY_OK;
-}
-
 static void update_balloon_stats_func(struct work_struct *work)
 {
struct virtio_balloon *vb;
@@ -550,6 +512,52 @@ static struct file_system_type balloon_fs = {
 
 #endif /* CONFIG_BALLOON_COMPACTION */
 
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+   unsigned long pages_to_free, pages_freed = 0;
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   pages_to_free = sc->nr_to_scan * VIRTIO_BALLOON_PAGES_PER_PAGE;
+
+   /*
+* One invocation of leak_balloon can deflate at most
+* VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+* multiple times to deflate pages till reaching pages_to_free.
+*/
+   while 

[virtio-dev] Re: [PATCH v3 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-06 Thread Wei Wang

On 08/03/2018 08:11 PM, Tetsuo Handa wrote:

On 2018/08/03 17:32, Wei Wang wrote:

+static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
+{
+   vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
+   vb->shrinker.count_objects = virtio_balloon_shrinker_count;
+   vb->shrinker.batch = 0;
+   vb->shrinker.seeks = DEFAULT_SEEKS;

Why flags field is not set? If vb is allocated by kmalloc(GFP_KERNEL)
and is nowhere zero-cleared, KASAN would complain it.


Could you point where in the code that would complain it?
I only see two shrinker flags (NUMA_AWARE and MEMCG_AWARE), and they 
seem not related to that.



Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v3 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-05 Thread Wei Wang

On 08/04/2018 03:15 AM, Michael S. Tsirkin wrote:

On Fri, Aug 03, 2018 at 04:32:26PM +0800, Wei Wang wrote:

The OOM notifier is getting deprecated to use for the reasons:
- As a callout from the oom context, it is too subtle and easy to
   generate bugs and corner cases which are hard to track;
- It is called too late (after the reclaiming has been performed).
   Drivers with large amuont of reclaimable memory is expected to
   release them at an early stage of memory pressure;
- The notifier callback isn't aware of oom contrains;
Link: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure. The balloon pages are
given back to mm adaptively by returning the number of pages that the
reclaimer is asking for (i.e. sc->nr_to_scan).

Currently the max possible value of sc->nr_to_scan passed to the balloon
shrinker is SHRINK_BATCH, which is 128. This is smaller than the
limitation that only VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) pages can be
returned via one invocation of leak_balloon. But this patch still
considers the case that SHRINK_BATCH or shrinker->batch could be changed
to a value larger than VIRTIO_BALLOON_ARRAY_PFNS_MAX, which will need to
do multiple invocations of leak_balloon.

Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been used
to release balloon pages on OOM. We continue to use this feature bit for
the shrinker, so the shrinker is only registered when this feature bit
has been negotiated with host.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 


Could you add data at how was this tested and how did guest
behaviour change. Which configurations see an improvement?



Yes. Please see the differences from the "*1" and "*2" cases below.

Taking this chance, I use "*2" and "*3" to show Michal etc the 
differences of applying and not applying the shrinker fix patch here: 
https://lkml.org/lkml/2018/8/3/384



*1. V3 patches
1)After inflating some amount of memory, actual=101536 Bytes
free -m
  totalusedfree  shared buff/cache   
available

Mem:   79757289 514  10 171 447
Swap: 10236   0   10236

2) dd if=478MB_file of=/dev/null, actual=1058721792 Bytes
free -m
  totalusedfree  shared buff/cache   
available

Mem:   79757233 102  10 639 475
Swap: 10236   0   10236

The advantage is that the inflated pages are given back to mm based on 
the number, i.e. ~56MB(diff "actual" above) of the reclaimer is asking 
for. This is more adaptive.




*2. V2 paches, balloon_pages_to_shrink=100 pages (around 4GB), with 
the shrinker fix patches applied.

1)After inflating some amount of memory, actual=101536 Bytes
free -m
  totalusedfree  shared buff/cache   
available

Mem:   79757288 530  10 157 455
Swap: 10236   0   10236

2)dd if=478MB_file of=/dev/null, actual=5096001536 Bytes
free -m
  totalusedfree  shared buff/cache   
available

Mem:   797533813953  10 6404327
Swap: 10236   0   10236

In the above example, we set 4GB to shrink to make the difference 
obvious. Though the claimer only needs to reclaim ~56MB memory, 4GB 
inflated pages are given back to mm, which is unnecessary. From the 
user's perspective, it has no idea of how many pages to given back at 
the time of setting the module parameter (balloon_pages_to_shrink). So I 
think the above "*1" is better.




*3.  V2 paches, balloon_pages_to_shrink=100 pages (around 4GB), 
without the shrinker fix patches applied.

1) After inflating some amount of memory, actual=101536 Bytes
free -m
   totalusedfree  shared buff/cache   
available

Mem:   79757292 524  10 158 450
Swap: 10236   0   10236

2) dd if=478MB_file of=/dev/null, actual=8589934592 Bytes
free -m
 totalusedfree  shared  buff/cache 
available

Mem:   7975  537281  10 6407656
Swap: 10236   0   10236

Compared to *2, all the balloon pages are shrunk, but users expect 4GB 
to shrink. The reason is that do_slab_shrink has a mistake in 
calculating schrinkctl->nr_scanned, which should be the actual number of 
pages that the shrinker has freed, but do slab_shrink still treat that 
value as 128 (but 4GB has actually been freed).



Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v3 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-03 Thread Wei Wang
The OOM notifier is getting deprecated to use for the reasons:
- As a callout from the oom context, it is too subtle and easy to
  generate bugs and corner cases which are hard to track;
- It is called too late (after the reclaiming has been performed).
  Drivers with large amuont of reclaimable memory is expected to
  release them at an early stage of memory pressure;
- The notifier callback isn't aware of oom contrains;
Link: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure. The balloon pages are
given back to mm adaptively by returning the number of pages that the
reclaimer is asking for (i.e. sc->nr_to_scan).

Currently the max possible value of sc->nr_to_scan passed to the balloon
shrinker is SHRINK_BATCH, which is 128. This is smaller than the
limitation that only VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) pages can be
returned via one invocation of leak_balloon. But this patch still
considers the case that SHRINK_BATCH or shrinker->batch could be changed
to a value larger than VIRTIO_BALLOON_ARRAY_PFNS_MAX, which will need to
do multiple invocations of leak_balloon.

Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been used
to release balloon pages on OOM. We continue to use this feature bit for
the shrinker, so the shrinker is only registered when this feature bit
has been negotiated with host.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 111 ++--
 1 file changed, 60 insertions(+), 51 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8100e77..612a359 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -40,13 +39,8 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> 
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
-static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
-module_param(oom_pages, int, S_IRUSR | S_IWUSR);
-MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
-
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
 #endif
@@ -86,8 +80,8 @@ struct virtio_balloon {
/* Memory statistics */
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-   /* To register callback in oom notifier call chain */
-   struct notifier_block nb;
+   /* To register a shrinker to shrink memory upon memory pressure */
+   struct shrinker shrinker;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -365,38 +359,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
  );
 }
 
-/*
- * virtballoon_oom_notify - release pages when system is under severe
- * memory pressure (called from out_of_memory())
- * @self : notifier block struct
- * @dummy: not used
- * @parm : returned - number of freed pages
- *
- * The balancing of memory by use of the virtio balloon should not cause
- * the termination of processes while there are pages in the balloon.
- * If virtio balloon manages to release some memory, it will make the
- * system return and retry the allocation that forced the OOM killer
- * to run.
- */
-static int virtballoon_oom_notify(struct notifier_block *self,
- unsigned long dummy, void *parm)
-{
-   struct virtio_balloon *vb;
-   unsigned long *freed;
-   unsigned num_freed_pages;
-
-   vb = container_of(self, struct virtio_balloon, nb);
-   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-   return NOTIFY_OK;
-
-   freed = parm;
-   num_freed_pages = leak_balloon(vb, oom_pages);
-   update_balloon_size(vb);
-   *freed += num_freed_pages;
-
-   return NOTIFY_OK;
-}
-
 static void update_balloon_stats_func(struct work_struct *work)
 {
struct virtio_balloon *vb;
@@ -550,6 +512,53 @@ static struct file_system_type balloon_fs = {
 
 #endif /* CONFIG_BALLOON_COMPACTION */
 
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+   unsigned long pages_to_free, pages_freed = 0;
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   pages_to_free = sc->nr_to_scan * VIRTIO_BALLOON_PAGES_PER_PAGE;
+
+   /*
+* One invocation of leak_balloon can deflate at most
+* VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+* multiple times to deflate pages till reaching pages_to_free.
+*/
+   while 

[virtio-dev] [PATCH v3 1/2] virtio-balloon: remove BUG() in init_vqs

2018-08-03 Thread Wei Wang
It's a bit overkill to use BUG when failing to add an entry to the
stats_vq in init_vqs. So remove it and just return the error to the
caller to bail out nicely.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 3988c09..8100e77 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -455,9 +455,13 @@ static int init_vqs(struct virtio_balloon *vb)
num_stats = update_balloon_stats(vb);
 
sg_init_one(, vb->stats, sizeof(vb->stats[0]) * num_stats);
-   if (virtqueue_add_outbuf(vb->stats_vq, , 1, vb, GFP_KERNEL)
-   < 0)
-   BUG();
+   err = virtqueue_add_outbuf(vb->stats_vq, , 1, vb,
+  GFP_KERNEL);
+   if (err) {
+   dev_warn(>vdev->dev, "%s: add stat_vq failed\n",
+__func__);
+   return err;
+   }
virtqueue_kick(vb->stats_vq);
}
return 0;
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v3 0/2] virtio-balloon: some improvements

2018-08-03 Thread Wei Wang
This series is split from the "Virtio-balloon: support free page
reporting" series to make some improvements.

ChangeLog:
v2->v3:
- shrink the balloon pages according to the amount requested by the
  claimer, instead of using a user specified number;
v1->v2:
- register the shrinker when VIRTIO_BALLOON_F_DEFLATE_ON_OOM is
  negotiated.

Wei Wang (2):
  virtio-balloon: remove BUG() in init_vqs
  virtio_balloon: replace oom notifier with shrinker

 drivers/virtio/virtio_balloon.c | 121 ++--
 1 file changed, 67 insertions(+), 54 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-02 Thread Wei Wang

On 08/02/2018 07:47 PM, Michal Hocko wrote:

On Thu 02-08-18 18:32:44, Wei Wang wrote:

On 08/01/2018 07:34 PM, Michal Hocko wrote:

On Wed 01-08-18 19:12:25, Wei Wang wrote:

On 07/30/2018 05:00 PM, Michal Hocko wrote:

On Fri 27-07-18 17:24:55, Wei Wang wrote:

The OOM notifier is getting deprecated to use for the reasons mentioned
here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure.

It would be great to document the replacement. This is not a small
change...

OK. I plan to document the following to the commit log:

The OOM notifier is getting deprecated to use for the reasons:
  - As a callout from the oom context, it is too subtle and easy to
generate bugs and corner cases which are hard to track;
  - It is called too late (after the reclaiming has been performed).
Drivers with large amuont of reclaimable memory is expected to be
released them at an early age of memory pressure;
  - The notifier callback isn't aware of the oom contrains;
  Link: https://lkml.org/lkml/2018/7/12/314

  This patch replaces the virtio-balloon oom notifier with a shrinker
  to release balloon pages on memory pressure. Users can set the amount of
  memory pages to release each time a shrinker_scan is called via the
  module parameter balloon_pages_to_shrink, and the default amount is 256
  pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
  been used to release balloon pages on OOM. We continue to use this
  feature bit for the shrinker, so the shrinker is only registered when
  this feature bit has been negotiated with host.

Do you have any numbers for how does this work in practice?

It works in this way: for example, we can set the parameter,
balloon_pages_to_shrink, to shrink 1GB memory once shrink scan is called.
Now, we have a 8GB guest, and we balloon out 7GB. When shrink scan is
called, the balloon driver will get back 1GB memory and give them back to
mm, then the ballooned memory becomes 6GB.

When the shrinker scan is called the second time, another 1GB will be given
back to mm. So the ballooned pages are given back to mm gradually.


Let's say
you have a medium page cache workload which triggers kswapd to do a
light reclaim? Hardcoded shrinking sounds quite dubious to me but I have
no idea how people expect this to work. Shouldn't this be more
adaptive? How precious are those pages anyway?

Those pages are given to host to use usually because the guest has enough
free memory, and host doesn't want to waste those pieces of memory as they
are not used by this guest. When the guest needs them, it is reasonable that
the guest has higher priority to take them back.
But I'm not sure if there would be a more adaptive approach than "gradually
giving back as the guest wants more".

I am not sure I follow. Let me be more specific. Say you have a trivial
stream IO triggering reclaim to recycle clean page cache. This will
invoke slab shrinkers as well. Do you really want to drop your batch of
pages on each invocation? Doesn't that remove them very quickly? Just
try to dd if=large_file of=/dev/null and see how your pages are
disappearing. Shrinkers usually scale the number of objects they are
going to reclaim based on the memory pressure (aka targer to be
reclaimed).


Thanks, I think it looks better to make it more adaptive. I'll send out 
a new version for review.


Best,
Wei



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-02 Thread Wei Wang

On 08/02/2018 07:00 PM, Tetsuo Handa wrote:

On 2018/08/02 19:32, Wei Wang wrote:

On 08/01/2018 07:34 PM, Michal Hocko wrote:

Do you have any numbers for how does this work in practice?

It works in this way: for example, we can set the parameter, 
balloon_pages_to_shrink,
to shrink 1GB memory once shrink scan is called. Now, we have a 8GB guest, and 
we balloon
out 7GB. When shrink scan is called, the balloon driver will get back 1GB 
memory and give
them back to mm, then the ballooned memory becomes 6GB.

Since shrinker might be called concurrently (am I correct?),


Not sure about it being concurrently, but I think it would be called 
repeatedly as should_continue_reclaim() returns true.



Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-02 Thread Wei Wang

On 08/01/2018 07:34 PM, Michal Hocko wrote:

On Wed 01-08-18 19:12:25, Wei Wang wrote:

On 07/30/2018 05:00 PM, Michal Hocko wrote:

On Fri 27-07-18 17:24:55, Wei Wang wrote:

The OOM notifier is getting deprecated to use for the reasons mentioned
here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure.

It would be great to document the replacement. This is not a small
change...

OK. I plan to document the following to the commit log:

   The OOM notifier is getting deprecated to use for the reasons:
 - As a callout from the oom context, it is too subtle and easy to
   generate bugs and corner cases which are hard to track;
 - It is called too late (after the reclaiming has been performed).
   Drivers with large amuont of reclaimable memory is expected to be
   released them at an early age of memory pressure;
 - The notifier callback isn't aware of the oom contrains;
 Link: https://lkml.org/lkml/2018/7/12/314

 This patch replaces the virtio-balloon oom notifier with a shrinker
 to release balloon pages on memory pressure. Users can set the amount of
 memory pages to release each time a shrinker_scan is called via the
 module parameter balloon_pages_to_shrink, and the default amount is 256
 pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
 been used to release balloon pages on OOM. We continue to use this
 feature bit for the shrinker, so the shrinker is only registered when
 this feature bit has been negotiated with host.

Do you have any numbers for how does this work in practice?


It works in this way: for example, we can set the parameter, 
balloon_pages_to_shrink, to shrink 1GB memory once shrink scan is 
called. Now, we have a 8GB guest, and we balloon out 7GB. When shrink 
scan is called, the balloon driver will get back 1GB memory and give 
them back to mm, then the ballooned memory becomes 6GB.


When the shrinker scan is called the second time, another 1GB will be 
given back to mm. So the ballooned pages are given back to mm gradually.



Let's say
you have a medium page cache workload which triggers kswapd to do a
light reclaim? Hardcoded shrinking sounds quite dubious to me but I have
no idea how people expect this to work. Shouldn't this be more
adaptive? How precious are those pages anyway?


Those pages are given to host to use usually because the guest has 
enough free memory, and host doesn't want to waste those pieces of 
memory as they are not used by this guest. When the guest needs them, it 
is reasonable that the guest has higher priority to take them back.
But I'm not sure if there would be a more adaptive approach than 
"gradually giving back as the guest wants more".


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-01 Thread Wei Wang

On 07/30/2018 05:00 PM, Michal Hocko wrote:

On Fri 27-07-18 17:24:55, Wei Wang wrote:

The OOM notifier is getting deprecated to use for the reasons mentioned
here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure.

It would be great to document the replacement. This is not a small
change...


OK. I plan to document the following to the commit log:

  The OOM notifier is getting deprecated to use for the reasons:
- As a callout from the oom context, it is too subtle and easy to
  generate bugs and corner cases which are hard to track;
- It is called too late (after the reclaiming has been performed).
  Drivers with large amuont of reclaimable memory is expected to be
  released them at an early age of memory pressure;
- The notifier callback isn't aware of the oom contrains;
Link: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure. Users can set the 
amount of

memory pages to release each time a shrinker_scan is called via the
module parameter balloon_pages_to_shrink, and the default amount is 256
pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
been used to release balloon pages on OOM. We continue to use this
feature bit for the shrinker, so the shrinker is only registered when
this feature bit has been negotiated with host.

In addition, the bug in the replaced virtballoon_oom_notify that only
VIRTIO_BALLOON_ARRAY_PFNS_MAX (i.e 256) balloon pages can be freed
though the user has specified more than that number is fixed in the
shrinker_scan function.


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v2 0/2] virtio-balloon: some improvements

2018-07-27 Thread Wei Wang
This series is split from the "Virtio-balloon: support free page
reporting" series to make some improvements.

v1->v2 ChangeLog:
- register the shrinker when VIRTIO_BALLOON_F_DEFLATE_ON_OOM is negotiated.

Wei Wang (2):
  virtio-balloon: remove BUG() in init_vqs
  virtio_balloon: replace oom notifier with shrinker

 drivers/virtio/virtio_balloon.c | 125 +++-
 1 file changed, 72 insertions(+), 53 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v2 1/2] virtio-balloon: remove BUG() in init_vqs

2018-07-27 Thread Wei Wang
It's a bit overkill to use BUG when failing to add an entry to the
stats_vq in init_vqs. So remove it and just return the error to the
caller to bail out nicely.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6b237e3..9356a1a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -455,9 +455,13 @@ static int init_vqs(struct virtio_balloon *vb)
num_stats = update_balloon_stats(vb);
 
sg_init_one(, vb->stats, sizeof(vb->stats[0]) * num_stats);
-   if (virtqueue_add_outbuf(vb->stats_vq, , 1, vb, GFP_KERNEL)
-   < 0)
-   BUG();
+   err = virtqueue_add_outbuf(vb->stats_vq, , 1, vb,
+  GFP_KERNEL);
+   if (err) {
+   dev_warn(>vdev->dev, "%s: add stat_vq failed\n",
+__func__);
+   return err;
+   }
virtqueue_kick(vb->stats_vq);
}
return 0;
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker

2018-07-27 Thread Wei Wang
The OOM notifier is getting deprecated to use for the reasons mentioned
here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure.

In addition, the bug in the replaced virtballoon_oom_notify that only
VIRTIO_BALLOON_ARRAY_PFNS_MAX (i.e 256) balloon pages can be freed
though the user has specified more than that number is fixed in the
shrinker_scan function.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 115 +++-
 1 file changed, 65 insertions(+), 50 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9356a1a..6b2229b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -40,12 +39,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> 
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
-static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
-module_param(oom_pages, int, S_IRUSR | S_IWUSR);
-MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
+static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
+module_param(balloon_pages_to_shrink, ulong, 0600);
+MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory presure");
 
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
@@ -86,8 +85,8 @@ struct virtio_balloon {
/* Memory statistics */
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-   /* To register callback in oom notifier call chain */
-   struct notifier_block nb;
+   /* To register a shrinker to shrink memory upon memory pressure */
+   struct shrinker shrinker;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -365,38 +364,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
  );
 }
 
-/*
- * virtballoon_oom_notify - release pages when system is under severe
- * memory pressure (called from out_of_memory())
- * @self : notifier block struct
- * @dummy: not used
- * @parm : returned - number of freed pages
- *
- * The balancing of memory by use of the virtio balloon should not cause
- * the termination of processes while there are pages in the balloon.
- * If virtio balloon manages to release some memory, it will make the
- * system return and retry the allocation that forced the OOM killer
- * to run.
- */
-static int virtballoon_oom_notify(struct notifier_block *self,
- unsigned long dummy, void *parm)
-{
-   struct virtio_balloon *vb;
-   unsigned long *freed;
-   unsigned num_freed_pages;
-
-   vb = container_of(self, struct virtio_balloon, nb);
-   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-   return NOTIFY_OK;
-
-   freed = parm;
-   num_freed_pages = leak_balloon(vb, oom_pages);
-   update_balloon_size(vb);
-   *freed += num_freed_pages;
-
-   return NOTIFY_OK;
-}
-
 static void update_balloon_stats_func(struct work_struct *work)
 {
struct virtio_balloon *vb;
@@ -548,6 +515,54 @@ static struct file_system_type balloon_fs = {
 
 #endif /* CONFIG_BALLOON_COMPACTION */
 
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+   unsigned long pages_to_free = balloon_pages_to_shrink,
+ pages_freed = 0;
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   /*
+* One invocation of leak_balloon can deflate at most
+* VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+* multiple times to deflate pages till reaching
+* balloon_pages_to_shrink pages.
+*/
+   while (vb->num_pages && pages_to_free) {
+   pages_to_free = balloon_pages_to_shrink - pages_freed;
+   pages_freed += leak_balloon(vb, pages_to_free);
+   }
+   update_balloon_size(vb);
+
+   return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
+  struct shrink_control *sc)
+{
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   return min_t(unsigned long, vb->num_pages, balloon_pages_to_shrink) /
+  VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static voi

[virtio-dev] Re: [PATCH v36 0/5] Virtio-balloon: support free page reporting

2018-07-24 Thread Wei Wang

On 07/23/2018 10:36 PM, Dr. David Alan Gilbert wrote:

* Michael S. Tsirkin (m...@redhat.com) wrote:

On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:

This patch series is separated from the previous "Virtio-balloon
Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
implemented by this series enables the virtio-balloon driver to report
hints of guest free pages to the host. It can be used to accelerate live
migration of VMs. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to write-protect all the guest memory.

This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It is not concerned that the memory
pages are used after they are given to the hypervisor as a hint of the
free pages, because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.

* Tests
- Test Environment
 Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
 Guest: 8G RAM, 4 vCPU
 Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second

- Test Results
 - Idle Guest Live Migration Time (results are averaged over 10 runs):
 - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
(setting page poisoning zero and enabling ksm don't affect the
  comparison result)
 - Guest with Linux Compilation Workload (make bzImage -j4):
 - Live Migration Time (average)
   Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
 - Linux Compilation Time
   Optimization v.s. Legacy = 5min4s v.s. 5min12s
   --> no obvious difference

I'd like to see dgilbert's take on whether this kind of gain
justifies adding a PV interfaces, and what kind of guest workload
is appropriate.

Cc'd.

Well, 44% is great ... although the measurement is a bit weird.

a) A 2 second downtime is very large; 300-500ms is more normal


No problem, I will set downtime to 400ms for the tests.


b) I'm not sure what the 'average' is  - is that just between a bunch of
repeated migrations?


Yes, just repeatedly ("source<>destination" migration) do the tests 
and get an averaged result.




c) What load was running in the guest during the live migration?


The first one above just uses a guest without running any specific 
workload (named idle guests).

The second one uses a guest with the Linux compilation workload running.



An interesting measurement to add would be to do the same test but
with a VM with a lot more RAM but the same load;  you'd hope the gain
would be even better.
It would be interesting, especially because the users who are interested
are people creating VMs allocated with lots of extra memory (for the
worst case) but most of the time migrating when it's fairly idle.


OK. I will add tests of a guest with larger memory.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker

2018-07-23 Thread Wei Wang

On 07/22/2018 10:48 PM, Michael S. Tsirkin wrote:

On Fri, Jul 20, 2018 at 04:33:02PM +0800, Wei Wang wrote:
  
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,

+ struct shrink_control *sc)
+{
+   unsigned long pages_to_free = balloon_pages_to_shrink,
+ pages_freed = 0;
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   /*
+* One invocation of leak_balloon can deflate at most
+* VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+* multiple times to deflate pages till reaching
+* balloon_pages_to_shrink pages.
+*/
+   while (vb->num_pages && pages_to_free) {
+   pages_to_free = balloon_pages_to_shrink - pages_freed;
+   pages_freed += leak_balloon(vb, pages_to_free);
+   }
+   update_balloon_size(vb);

Are you sure that this is never called if count returned 0?


Yes. Please see do_shrink_slab, it just returns if count is 0.




+
+   return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
+  struct shrink_control *sc)
+{
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   /*
+* We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to handle the
+* case when shrinker needs to be invoked to relieve memory pressure.
+*/
+   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+   return 0;

So why not skip notifier registration when deflate on oom
is clear?


Sounds good, thanks.



vb->vb_dev_info.inode->i_mapping->a_ops = _aops;
  #endif
+   err = virtio_balloon_register_shrinker(vb);
+   if (err)
+   goto out_del_vqs;
  
So we can get scans before device is ready. Leak will fail

then. Why not register later after device is ready?


Probably no.

- it would be better not to set device ready when register_shrinker failed.
- When the device isn't ready, ballooning won't happen, that is, 
vb->num_pages will be 0, which results in shrinker_count=0 and 
shrinker_scan won't be called.


So I think it would be better to have shrinker registered before 
device_ready.


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v36 1/5] virtio-balloon: remove BUG() in init_vqs

2018-07-20 Thread Wei Wang
It's a bit overkill to use BUG when failing to add an entry to the
stats_vq in init_vqs. So remove it and just return the error to the
caller to bail out nicely.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6b237e3..9356a1a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -455,9 +455,13 @@ static int init_vqs(struct virtio_balloon *vb)
num_stats = update_balloon_stats(vb);
 
sg_init_one(, vb->stats, sizeof(vb->stats[0]) * num_stats);
-   if (virtqueue_add_outbuf(vb->stats_vq, , 1, vb, GFP_KERNEL)
-   < 0)
-   BUG();
+   err = virtqueue_add_outbuf(vb->stats_vq, , 1, vb,
+  GFP_KERNEL);
+   if (err) {
+   dev_warn(>vdev->dev, "%s: add stat_vq failed\n",
+__func__);
+   return err;
+   }
virtqueue_kick(vb->stats_vq);
}
return 0;
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v36 4/5] mm/page_poison: expose page_poisoning_enabled to kernel modules

2018-07-20 Thread Wei Wang
In some usages, e.g. virtio-balloon, a kernel module needs to know if
page poisoning is in use. This patch exposes the page_poisoning_enabled
function to kernel modules.

Signed-off-by: Wei Wang 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Acked-by: Andrew Morton 
---
 mm/page_poison.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index aa2b3d3..830f604 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -17,6 +17,11 @@ static int __init early_page_poison_param(char *buf)
 }
 early_param("page_poison", early_page_poison_param);
 
+/**
+ * page_poisoning_enabled - check if page poisoning is enabled
+ *
+ * Return true if page poisoning is enabled, or false if not.
+ */
 bool page_poisoning_enabled(void)
 {
/*
@@ -29,6 +34,7 @@ bool page_poisoning_enabled(void)
(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
debug_pagealloc_enabled()));
 }
+EXPORT_SYMBOL_GPL(page_poisoning_enabled);
 
 static void poison_page(struct page *page)
 {
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v36 0/5] Virtio-balloon: support free page reporting

2018-07-20 Thread Wei Wang
This patch series is separated from the previous "Virtio-balloon
Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,  
implemented by this series enables the virtio-balloon driver to report
hints of guest free pages to the host. It can be used to accelerate live
migration of VMs. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to write-protect all the guest memory.

This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It is not concerned that the memory
pages are used after they are given to the hypervisor as a hint of the
free pages, because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.

* Tests
- Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Guest: 8G RAM, 4 vCPU
Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second

- Test Results
- Idle Guest Live Migration Time (results are averaged over 10 runs):
- Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
(setting page poisoning zero and enabling ksm don't affect the
 comparison result)
- Guest with Linux Compilation Workload (make bzImage -j4):
- Live Migration Time (average)
  Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
- Linux Compilation Time
  Optimization v.s. Legacy = 5min4s v.s. 5min12s
  --> no obvious difference

ChangeLog:
v35->v36:
- remove the mm patch, as Linus has a suggestion to get free page
  addresses via allocation, instead of reading from the free page
  list.
- virtio-balloon:
- replace oom notifier with shrinker;
- the guest to host communication interface remains the same as
  v32.
- allocate free page blocks and send to host one by one, and free
  them after sending all the pages.

For ChangeLogs from v22 to v35, please reference
https://lwn.net/Articles/759413/

For ChangeLogs before v21, please reference
https://lwn.net/Articles/743660/

Wei Wang (5):
  virtio-balloon: remove BUG() in init_vqs
  virtio_balloon: replace oom notifier with shrinker
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 drivers/virtio/virtio_balloon.c | 456 ++--
 include/uapi/linux/virtio_balloon.h |   7 +
 mm/page_poison.c|   6 +
 3 files changed, 394 insertions(+), 75 deletions(-)

-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v36 5/5] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

2018-07-20 Thread Wei Wang
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value that is in use.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 10 ++
 include/uapi/linux/virtio_balloon.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 82cd497..6340cc1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -814,6 +814,7 @@ static int virtio_balloon_register_shrinker(struct 
virtio_balloon *vb)
 static int virtballoon_probe(struct virtio_device *vdev)
 {
struct virtio_balloon *vb;
+   __u32 poison_val;
int err;
 
if (!vdev->config->get) {
@@ -883,6 +884,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
  VIRTIO_BALLOON_CMD_ID_STOP);
spin_lock_init(>free_page_list_lock);
INIT_LIST_HEAD(>free_page_list);
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+   memset(_val, PAGE_POISON, sizeof(poison_val));
+   virtio_cwrite(vb->vdev, struct virtio_balloon_config,
+ poison_val, _val);
+   }
}
 
err = virtio_balloon_register_shrinker(vb);
@@ -979,6 +985,9 @@ static int virtballoon_restore(struct virtio_device *vdev)
 
 static int virtballoon_validate(struct virtio_device *vdev)
 {
+   if (!page_poisoning_enabled())
+   __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+
__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
return 0;
 }
@@ -988,6 +997,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
+   VIRTIO_BALLOON_F_PAGE_POISON,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index 18ee430..80a7b7e 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -47,6 +48,8 @@ struct virtio_balloon_config {
__u32 actual;
/* Free page report command id, readonly by guest */
__u32 free_page_report_cmd_id;
+   /* Stores PAGE_POISON if page poisoning is in use */
+   __u32 poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v36 3/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-07-20 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are
obtained one by one from the mm free list via the regular allocation
function. The allocated pages are given back to mm after they are put onto
the vq.

Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register. When
the guest starts to report, it first sends a start cmd to host via the
free page vq, which acks to host the cmd id received. When the guest
finishes reporting free pages, a stop cmd is sent to host via the vq.

TODO:
- Add a batch page allocation API to amortize the allocation overhead.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Linus Torvalds 
---
 drivers/virtio/virtio_balloon.c | 331 +---
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 307 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index c6fd406..82cd497 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,14 @@
 #define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
+__GFP_NOMEMALLOC)
+/* The order of free page blocks to report to host */
+#define VIRTIO_BALLOON_FREE_PAGE_ORDER (MAX_ORDER - 1)
+/* The size of a free page block in bytes */
+#define VIRTIO_BALLOON_FREE_PAGE_SIZE \
+   (1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT))
+
 static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
 module_param(balloon_pages_to_shrink, ulong, 0600);
 MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory presure");
@@ -50,9 +58,22 @@ MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on 
memory presure");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -62,6 +83,16 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The list of allocated free pages, waiting to be given back to mm */
+   struct list_head free_page_list;
+   spinlock_t free_page_list_lock;
+   /* The cmd id received from host */
+   u32 cmd_id_received;
+   /* The cmd id that is actively in use */
+   __virtio32 cmd_id_active;
+   /* Buffer to store the stop sign */
+   __virtio32 cmd_id_stop;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -325,17 +356,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -352,6 +372,52 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id,

[virtio-dev] [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker

2018-07-20 Thread Wei Wang
The OOM notifier is getting deprecated to use for the reasons mentioned
here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure.

In addition, the bug in the replaced virtballoon_oom_notify that only
VIRTIO_BALLOON_ARRAY_PFNS_MAX (i.e 256) balloon pages can be freed
though the user has specified more than that number is fixed in the
shrinker_scan function.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Linus Torvalds 
---
 drivers/virtio/virtio_balloon.c | 113 +++-
 1 file changed, 65 insertions(+), 48 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9356a1a..c6fd406 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -40,12 +39,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> 
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
-static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
-module_param(oom_pages, int, S_IRUSR | S_IWUSR);
-MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
+static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
+module_param(balloon_pages_to_shrink, ulong, 0600);
+MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory presure");
 
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
@@ -86,8 +85,8 @@ struct virtio_balloon {
/* Memory statistics */
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-   /* To register callback in oom notifier call chain */
-   struct notifier_block nb;
+   /* To register a shrinker to shrink memory upon memory pressure */
+   struct shrinker shrinker;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -365,38 +364,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
  );
 }
 
-/*
- * virtballoon_oom_notify - release pages when system is under severe
- * memory pressure (called from out_of_memory())
- * @self : notifier block struct
- * @dummy: not used
- * @parm : returned - number of freed pages
- *
- * The balancing of memory by use of the virtio balloon should not cause
- * the termination of processes while there are pages in the balloon.
- * If virtio balloon manages to release some memory, it will make the
- * system return and retry the allocation that forced the OOM killer
- * to run.
- */
-static int virtballoon_oom_notify(struct notifier_block *self,
- unsigned long dummy, void *parm)
-{
-   struct virtio_balloon *vb;
-   unsigned long *freed;
-   unsigned num_freed_pages;
-
-   vb = container_of(self, struct virtio_balloon, nb);
-   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-   return NOTIFY_OK;
-
-   freed = parm;
-   num_freed_pages = leak_balloon(vb, oom_pages);
-   update_balloon_size(vb);
-   *freed += num_freed_pages;
-
-   return NOTIFY_OK;
-}
-
 static void update_balloon_stats_func(struct work_struct *work)
 {
struct virtio_balloon *vb;
@@ -548,6 +515,61 @@ static struct file_system_type balloon_fs = {
 
 #endif /* CONFIG_BALLOON_COMPACTION */
 
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+   unsigned long pages_to_free = balloon_pages_to_shrink,
+ pages_freed = 0;
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   /*
+* One invocation of leak_balloon can deflate at most
+* VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+* multiple times to deflate pages till reaching
+* balloon_pages_to_shrink pages.
+*/
+   while (vb->num_pages && pages_to_free) {
+   pages_to_free = balloon_pages_to_shrink - pages_freed;
+   pages_freed += leak_balloon(vb, pages_to_free);
+   }
+   update_balloon_size(vb);
+
+   return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
+  struct shrink_control *sc)
+{
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   /*
+* We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to handle the
+* case when shrinker needs 

[virtio-dev] Re: [PATCH v35 1/5] mm: support to get hints of free page blocks

2018-07-12 Thread Wei Wang

On 07/12/2018 07:49 PM, Michal Hocko wrote:

On Thu 12-07-18 19:34:16, Wei Wang wrote:

On 07/12/2018 04:13 PM, Michal Hocko wrote:

On Thu 12-07-18 10:52:08, Wei Wang wrote:

On 07/12/2018 10:30 AM, Linus Torvalds wrote:

On Wed, Jul 11, 2018 at 7:17 PM Wei Wang  wrote:

Would it be better to remove __GFP_THISNODE? We actually want to get all
the guest free pages (from all the nodes).

Maybe. Or maybe it would be better to have the memory balloon logic be
per-node? Maybe you don't want to remove too much memory from one
node? I think it's one of those "play with it" things.

I don't think that's the big issue, actually. I think the real issue
is how to react quickly and gracefully to "oops, I'm trying to give
memory away, but now the guest wants it back" while you're in the
middle of trying to create that 2TB list of pages.

OK. virtio-balloon has already registered an oom notifier
(virtballoon_oom_notify). I plan to add some control there. If oom happens,
- stop the page allocation;
- immediately give back the allocated pages to mm.

Please don't. Oom notifier is an absolutely hideous interface which
should go away sooner or later (I would much rather like the former) so
do not build a new logic on top of it. I would appreciate if you
actually remove the notifier much more.

You can give memory back from the standard shrinker interface. If we are
reaching low reclaim priorities then we are struggling to reclaim memory
and then you can start returning pages back.

OK. Just curious why oom notifier is thought to be hideous, and has it been
a consensus?

Because it is a completely non-transparent callout from the OOM context
which is really subtle on its own. It is just too easy to end up in
weird corner cases. We really have to be careful and be as swift as
possible. Any potential sleep would make the OOM situation much worse
because nobody would be able to make a forward progress or (in)direct
dependency on MM subsystem can easily deadlock. Those are really hard
to track down and defining the notifier as blockable by design which
just asks for bad implementations because most people simply do not
realize how subtle the oom context is.

Another thing is that it happens way too late when we have basically
reclaimed the world and didn't get out of the memory pressure so you can
expect any workload is suffering already. Anybody sitting on a large
amount of reclaimable memory should have released that memory by that
time. Proportionally to the reclaim pressure ideally.

The notifier API is completely unaware of oom constrains. Just imagine
you are OOM in a subset of numa nodes. Callback doesn't have any idea
about that.

Moreover we do have proper reclaim mechanism that has a feedback
loop and that should be always preferable to an abrupt reclaim.


Sounds very reasonable, thanks for the elaboration. I'll try with shrinker.

Best,
Wei




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v35 1/5] mm: support to get hints of free page blocks

2018-07-12 Thread Wei Wang

On 07/12/2018 04:13 PM, Michal Hocko wrote:

On Thu 12-07-18 10:52:08, Wei Wang wrote:

On 07/12/2018 10:30 AM, Linus Torvalds wrote:

On Wed, Jul 11, 2018 at 7:17 PM Wei Wang  wrote:

Would it be better to remove __GFP_THISNODE? We actually want to get all
the guest free pages (from all the nodes).

Maybe. Or maybe it would be better to have the memory balloon logic be
per-node? Maybe you don't want to remove too much memory from one
node? I think it's one of those "play with it" things.

I don't think that's the big issue, actually. I think the real issue
is how to react quickly and gracefully to "oops, I'm trying to give
memory away, but now the guest wants it back" while you're in the
middle of trying to create that 2TB list of pages.

OK. virtio-balloon has already registered an oom notifier
(virtballoon_oom_notify). I plan to add some control there. If oom happens,
- stop the page allocation;
- immediately give back the allocated pages to mm.

Please don't. Oom notifier is an absolutely hideous interface which
should go away sooner or later (I would much rather like the former) so
do not build a new logic on top of it. I would appreciate if you
actually remove the notifier much more.

You can give memory back from the standard shrinker interface. If we are
reaching low reclaim priorities then we are struggling to reclaim memory
and then you can start returning pages back.


OK. Just curious why oom notifier is thought to be hideous, and has it 
been a consensus?


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v35 1/5] mm: support to get hints of free page blocks

2018-07-11 Thread Wei Wang

On 07/12/2018 10:30 AM, Linus Torvalds wrote:

On Wed, Jul 11, 2018 at 7:17 PM Wei Wang  wrote:

Would it be better to remove __GFP_THISNODE? We actually want to get all
the guest free pages (from all the nodes).

Maybe. Or maybe it would be better to have the memory balloon logic be
per-node? Maybe you don't want to remove too much memory from one
node? I think it's one of those "play with it" things.

I don't think that's the big issue, actually. I think the real issue
is how to react quickly and gracefully to "oops, I'm trying to give
memory away, but now the guest wants it back" while you're in the
middle of trying to create that 2TB list of pages.


OK. virtio-balloon has already registered an oom notifier 
(virtballoon_oom_notify). I plan to add some control there. If oom happens,

- stop the page allocation;
- immediately give back the allocated pages to mm.

Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v35 1/5] mm: support to get hints of free page blocks

2018-07-11 Thread Wei Wang

On 07/12/2018 12:23 AM, Linus Torvalds wrote:

On Wed, Jul 11, 2018 at 2:21 AM Michal Hocko  wrote:

We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER -1).
So why do we need any array based interface?

That was actually my original argument in the original thread - that
the only new interface people might want is one that just tells how
many of those MAX_ORDER-1 pages there are.

See the thread in v33 with the subject

   "[PATCH v33 1/4] mm: add a function to get free page blocks"

and look for me suggesting just using

 #define GFP_MINFLAGS (__GFP_NORETRY | __GFP_NOWARN |
__GFP_THISNODE | __GFP_NOMEMALLOC)


Would it be better to remove __GFP_THISNODE? We actually want to get all 
the guest free pages (from all the nodes).


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v35 1/5] mm: support to get hints of free page blocks

2018-07-11 Thread Wei Wang

On 07/11/2018 05:21 PM, Michal Hocko wrote:

On Tue 10-07-18 18:44:34, Linus Torvalds wrote:
[...]

That was what I tried to encourage with actually removing the pages
form the page list. That would be an _incremental_ interface. You can
remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark
them free for ballooning that way. And if you still feel you have tons
of free memory, just continue removing more pages from the free list.

We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER -1).
So why do we need any array based interface?


Yes, I'm trying to get free pages directly via alloc_pages, so there 
will be no new mm APIs.
I plan to let free page allocation stop when the remaining system free 
memory becomes close to min_free_kbytes (prevent swapping).



Best,
Wei



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



  1   2   3   4   5   6   >