[PATCH v37 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the support of reporting hints of guest free pages to host via virtio-balloon. Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are obtained one by one from the mm free list via the regular allocation function. Host requests the guest to report free page hints by sending a new cmd id to the guest via the free_page_report_cmd_id configuration register. When the guest starts to report, it first sends a start cmd to host via the free page vq, which acks to host the cmd id received. When the guest finishes reporting free pages, a stop cmd is sent to host via the vq. Host may also send a stop cmd id to the guest to stop the reporting. VIRTIO_BALLOON_CMD_ID_STOP: Host sends this cmd to stop the guest reporting. VIRTIO_BALLOON_CMD_ID_DONE: Host sends this cmd to tell the guest that the reported pages are ready to be freed. Why does the guest free the reported pages when host tells it is ready to free? This is because freeing pages appears to be expensive for live migration. free_pages() dirties memory very quickly and makes the live migraion not converge in some cases. So it is good to delay the free_page operation when the migration is done, and host sends a command to guest about that. Why do we need the new VIRTIO_BALLOON_CMD_ID_DONE, instead of reusing VIRTIO_BALLOON_CMD_ID_STOP? This is because live migration is usually done in several rounds. At the end of each round, host needs to send a VIRTIO_BALLOON_CMD_ID_STOP cmd to the guest to stop (or say pause) the reporting. The guest resumes the reporting when it receives a new command id at the beginning of the next round. So we need a new cmd id to distinguish between "stop reporting" and "ready to free the reported pages". TODO: - Add a batch page allocation API to amortize the allocation overhead. Signed-off-by: Wei Wang Signed-off-by: Liang Li Cc: Michael S. Tsirkin Cc: Michal Hocko Cc: Andrew Morton Cc: Linus Torvalds --- drivers/virtio/virtio_balloon.c | 364 include/uapi/linux/virtio_balloon.h | 5 + 2 files changed, 336 insertions(+), 33 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index d1c1f62..a185678 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -41,13 +41,34 @@ #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80 +#define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \ +__GFP_NOMEMALLOC) +/* The order of free page blocks to report to host */ +#define VIRTIO_BALLOON_FREE_PAGE_ORDER (MAX_ORDER - 1) +/* The size of a free page block in bytes */ +#define VIRTIO_BALLOON_FREE_PAGE_SIZE \ + (1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT)) + #ifdef CONFIG_BALLOON_COMPACTION static struct vfsmount *balloon_mnt; #endif +enum virtio_balloon_vq { + VIRTIO_BALLOON_VQ_INFLATE, + VIRTIO_BALLOON_VQ_DEFLATE, + VIRTIO_BALLOON_VQ_STATS, + VIRTIO_BALLOON_VQ_FREE_PAGE, + VIRTIO_BALLOON_VQ_MAX +}; + struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; + + /* Balloon's own wq for cpu-intensive work items */ + struct workqueue_struct *balloon_wq; + /* The free page reporting work item submitted to the balloon wq */ + struct work_struct report_free_page_work; /* The balloon servicing is delegated to a freezable workqueue. */ struct work_struct update_balloon_stats_work; @@ -57,6 +78,18 @@ struct virtio_balloon { spinlock_t stop_update_lock; bool stop_update; + /* The list of allocated free pages, waiting to be given back to mm */ + struct list_head free_page_list; + spinlock_t free_page_list_lock; + /* The number of free page blocks on the above list */ + unsigned long num_free_page_blocks; + /* The cmd id received from host */ + u32 cmd_id_received; + /* The cmd id that is actively in use */ + __virtio32 cmd_id_active; + /* Buffer to store the stop sign */ + __virtio32 cmd_id_stop; + /* Waiting for host to ack the pages we released. */ wait_queue_head_t acked; @@ -320,17 +353,6 @@ static void stats_handle_request(struct virtio_balloon *vb) virtqueue_kick(vq); } -static void virtballoon_changed(struct virtio_device *vdev) -{ - struct virtio_balloon *vb = vdev->priv; - unsigned long flags; - - spin_lock_irqsave(>stop_update_lock, flags); - if (!vb->stop_update) - queue_work(system_freezable_wq, >update_balloon_size_work); - spin_unlock_irqrestore(>stop_update_lock, flags); -} - static inline s64 towards_target(struct virtio_balloon *vb) { s64 target;
[PATCH v37 2/3] mm/page_poison: expose page_poisoning_enabled to kernel modules
In some usages, e.g. virtio-balloon, a kernel module needs to know if page poisoning is in use. This patch exposes the page_poisoning_enabled function to kernel modules. Signed-off-by: Wei Wang Cc: Andrew Morton Cc: Michal Hocko Cc: Michael S. Tsirkin Acked-by: Andrew Morton --- mm/page_poison.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/mm/page_poison.c b/mm/page_poison.c index aa2b3d3..830f604 100644 --- a/mm/page_poison.c +++ b/mm/page_poison.c @@ -17,6 +17,11 @@ static int __init early_page_poison_param(char *buf) } early_param("page_poison", early_page_poison_param); +/** + * page_poisoning_enabled - check if page poisoning is enabled + * + * Return true if page poisoning is enabled, or false if not. + */ bool page_poisoning_enabled(void) { /* @@ -29,6 +34,7 @@ bool page_poisoning_enabled(void) (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && debug_pagealloc_enabled())); } +EXPORT_SYMBOL_GPL(page_poisoning_enabled); static void poison_page(struct page *page) { -- 2.7.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v37 3/3] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the guest is using page poisoning. Guest writes to the poison_val config field to tell host about the page poisoning value that is in use. Suggested-by: Michael S. Tsirkin Signed-off-by: Wei Wang Cc: Michael S. Tsirkin Cc: Michal Hocko Cc: Andrew Morton --- drivers/virtio/virtio_balloon.c | 10 ++ include/uapi/linux/virtio_balloon.h | 3 +++ 2 files changed, 13 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index a185678..728ecd1 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -825,6 +825,7 @@ static int virtio_balloon_register_shrinker(struct virtio_balloon *vb) static int virtballoon_probe(struct virtio_device *vdev) { struct virtio_balloon *vb; + __u32 poison_val; int err; if (!vdev->config->get) { @@ -892,6 +893,11 @@ static int virtballoon_probe(struct virtio_device *vdev) vb->num_free_page_blocks = 0; spin_lock_init(>free_page_list_lock); INIT_LIST_HEAD(>free_page_list); + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) { + memset(_val, PAGE_POISON, sizeof(poison_val)); + virtio_cwrite(vb->vdev, struct virtio_balloon_config, + poison_val, _val); + } } /* * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a @@ -992,6 +998,9 @@ static int virtballoon_restore(struct virtio_device *vdev) static int virtballoon_validate(struct virtio_device *vdev) { + if (!page_poisoning_enabled()) + __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON); + __virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM); return 0; } @@ -1001,6 +1010,7 @@ static unsigned int features[] = { VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, VIRTIO_BALLOON_F_FREE_PAGE_HINT, + VIRTIO_BALLOON_F_PAGE_POISON, }; static struct virtio_driver virtio_balloon_driver = { diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h index 47c9eb4..a1966cd7 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -35,6 +35,7 @@ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */ +#define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */ /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 @@ -48,6 +49,8 @@ struct virtio_balloon_config { __u32 actual; /* Free page report command id, readonly by guest */ __u32 free_page_report_cmd_id; + /* Stores PAGE_POISON if page poisoning is in use */ + __u32 poison_val; }; #define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */ -- 2.7.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v37 0/3] Virtio-balloon: support free page reporting
The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, implemented by this series enables the virtio-balloon driver to report hints of guest free pages to host. It can be used to accelerate virtual machine (VM) live migration. Here is an introduction of this usage: Live migration needs to transfer the VM's memory from the source machine to the destination round by round. For the 1st round, all the VM's memory is transferred. From the 2nd round, only the pieces of memory that were written by the guest (after the 1st round) are transferred. One method that is popularly used by the hypervisor to track which part of memory is written is to have the hypervisor write-protect all the guest memory. This feature enables the optimization by skipping the transfer of guest free pages during VM live migration. It is not concerned that the memory pages are used after they are given to the hypervisor as a hint of the free pages, because they will be tracked by the hypervisor and transferred in the subsequent round if they are used and written. * Tests 1 Test Environment Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms 2 Test Results (results are averaged over several repeated runs) 2.1 Guest setup: 8G RAM, 4 vCPU 2.1.1 Idle guest live migration time Optimization v.s. Legacy = 620ms vs 2970ms --> ~79% reduction 2.1.2 Guest live migration with Linux compilation workload (i.e. make bzImage -j4) running 1) Live Migration Time: Optimization v.s. Legacy = 2273ms v.s. 4502ms --> ~50% reduction 2) Linux Compilation Time: Optimization v.s. Legacy = 8min42s v.s. 8min43s --> no obvious difference 2.2 Guest setup: 128G RAM, 4 vCPU 2.2.1 Idle guest live migration time Optimization v.s. Legacy = 5294ms vs 41651ms --> ~87% reduction 2.2.2 Guest live migration with Linux compilation workload 1) Live Migration Time: Optimization v.s. Legacy = 8816ms v.s. 54201ms --> 84% reduction 2) Linux Compilation Time: Optimization v.s. Legacy = 8min30s v.s. 8min36s --> no obvious difference ChangeLog: v36->v37: - free the reported pages to mm when receives a DONE cmd from host. Please see patch 1's commit log for reasons. Please see patch 1's commit for detailed explanations. For ChangeLogs from v22 to v36, please reference https://lkml.org/lkml/2018/7/20/199 For ChangeLogs before v21, please reference https://lwn.net/Articles/743660/ Wei Wang (3): virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT mm/page_poison: expose page_poisoning_enabled to kernel modules virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON drivers/virtio/virtio_balloon.c | 374 include/uapi/linux/virtio_balloon.h | 8 + mm/page_poison.c| 6 + 3 files changed, 355 insertions(+), 33 deletions(-) -- 2.7.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization