[PATCH v2] migration/postcopy: ensure preempt channel is ready before loading states

2024-04-04 Thread Wei Wang
Before loading the guest states, ensure that the preempt channel is ready
to use, as some of the states (e.g. via virtio_load) might trigger page
faults that will be handled through the preempt channel. So yield to the
main thread in the case that the channel create event hasn't been
dispatched yet.

Originally-by: Lei Wang 
Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced...@intel.com/T/
Suggested-by: Peter Xu 
Signed-off-by: Lei Wang 
Signed-off-by: Wei Wang 
---
 migration/savevm.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 388d7af7cd..63f9991a8a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2342,6 +2342,23 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
 
 QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
 
+/*
+ * Before loading the guest states, ensure that the preempt channel is
+ * ready to use, as some of the states (e.g. via virtio_load) might
+ * trigger page faults that will be handled through the preempt channel.
+ * So yield to the main thread in the case that the channel create event
+ * hasn't been dispatched.
+ */
+do {
+if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+mis->postcopy_qemufile_dst) {
+break;
+}
+
+aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
+} while (1);
+
 ret = qemu_loadvm_state_main(packf, mis);
 trace_loadvm_handle_cmd_packaged_main(ret);
 qemu_fclose(packf);
-- 
2.27.0




[PATCH v1] migration/postcopy: ensure preempt channel is ready before loading states

2024-04-04 Thread Wei Wang
Before loading the guest states, ensure that the preempt channel is ready
to use, as some of the states (e.g. via virtio_load) might trigger page
faults that will be handled through the preempt channel. So yield to the
main thread in the case that the channel create event hasn't been
dispatched yet.

Originally-by: Lei Wang 
Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced...@intel.com/T/
Suggested-by: Peter Xu 
Signed-off-by: Lei Wang 
Signed-off-by: Wei Wang 
---
 migration/savevm.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 388d7af7cd..fbc9f2bdd4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2342,6 +2342,23 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
 
 QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
 
+/*
+ * Before loading the guest states, ensure that the preempt channel is
+ * ready to use, as some of the states (e.g. via virtio_load) might
+ * trigger page faults that will be handled through the preempt channel.
+ * So yield to the main thread in the case that the channel create event
+ * hasn't been dispatched.
+ */
+do {
+if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+mis->postcopy_qemufile_dst) {
+break;
+}
+
+aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
+} while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
+
 ret = qemu_loadvm_state_main(packf, mis);
 trace_loadvm_handle_cmd_packaged_main(ret);
 qemu_fclose(packf);
-- 
2.27.0




[PATCH v2] migration: refactor migration_completion

2023-08-04 Thread Wei Wang
The current migration_completion function is a bit long. Refactor it into
different subfunctions:
- migration_completion_precopy: completion code related to precopy
- migration_completion_postcopy: completion code related to postcopy
- close_return_path_on_source: return-path thread cleanup on migration
completion. It is named to match open_return_path_on_source.

This improves readability and makes future updates easier (e.g. adding new
subfunctions when completion code for new features is needed). No
functional changes intended.

Signed-off-by: Wei Wang 
---
Changelog:
- Merge await_return_path_close_on_source into
  close_return_path_on_source, as the latter basically just calls the
  former;
- make migration_completion_postcopy "void" as it doesn't return a
  value.
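
As a rough sketch of the resulting structure (based on the subfunction names
above; a simplified illustration, not the exact committed code), the
dispatcher ends up looking like:

    static void migration_completion(MigrationState *s)
    {
        int ret = 0;
        int current_active_state = s->state;

        if (s->state == MIGRATION_STATUS_ACTIVE) {
            ret = migration_completion_precopy(s, &current_active_state);
        } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
            migration_completion_postcopy(s);
        } else {
            ret = -1;
        }

        if (ret < 0 || close_return_path_on_source(s) < 0) {
            /* roll back to the state we entered with and mark the failure */
            migrate_set_state(&s->state, current_active_state,
                              MIGRATION_STATUS_FAILED);
            return;
        }

        migrate_set_state(&s->state, current_active_state,
                          MIGRATION_STATUS_COMPLETED);
    }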

 migration/migration.c | 174 --
 1 file changed, 98 insertions(+), 76 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5528acb65e..f1c55d1148 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2048,9 +2048,13 @@ static int open_return_path_on_source(MigrationState *ms,
 return 0;
 }
 
-/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
-static int await_return_path_close_on_source(MigrationState *ms)
+static int close_return_path_on_source(MigrationState *ms)
 {
+if (!ms->rp_state.rp_thread_created) {
+return 0;
+}
+trace_migration_return_path_end_before();
+
 /*
  * If this is a normal exit then the destination will send a SHUT and the
  * rp_thread will exit, however if there's an error we need to cause
@@ -2068,6 +2072,8 @@ static int await_return_path_close_on_source(MigrationState *ms)
 qemu_thread_join(&ms->rp_state.rp_thread);
 ms->rp_state.rp_thread_created = false;
 trace_await_return_path_close_on_source_close();
+
+trace_migration_return_path_end_after(ms->rp_state.error);
 return ms->rp_state.error;
 }
 
@@ -2301,66 +2307,107 @@ static int migration_maybe_pause(MigrationState *s,
 return s->state == new_state ? 0 : -EINVAL;
 }
 
-/**
- * migration_completion: Used by migration_thread when there's not much left.
- *   The caller 'breaks' the loop when this returns.
- *
- * @s: Current migration state
- */
-static void migration_completion(MigrationState *s)
+static int migration_completion_precopy(MigrationState *s,
+int *current_active_state)
 {
 int ret;
-int current_active_state = s->state;
 
-if (s->state == MIGRATION_STATUS_ACTIVE) {
-qemu_mutex_lock_iothread();
-s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+qemu_mutex_lock_iothread();
+s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
-s->vm_old_state = runstate_get();
-global_state_store();
+s->vm_old_state = runstate_get();
+global_state_store();
 
-ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-trace_migration_completion_vm_stop(ret);
-if (ret >= 0) {
-ret = migration_maybe_pause(s, &current_active_state,
-MIGRATION_STATUS_DEVICE);
-}
-if (ret >= 0) {
-/*
- * Inactivate disks except in COLO, and track that we
- * have done so in order to remember to reactivate
- * them if migration fails or is cancelled.
- */
-s->block_inactive = !migrate_colo();
-migration_rate_set(RATE_LIMIT_DISABLED);
-ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
- s->block_inactive);
-}
+ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+trace_migration_completion_vm_stop(ret);
+if (ret < 0) {
+goto out_unlock;
+}
 
-qemu_mutex_unlock_iothread();
+ret = migration_maybe_pause(s, current_active_state,
+MIGRATION_STATUS_DEVICE);
+if (ret < 0) {
+goto out_unlock;
+}
 
-if (ret < 0) {
-goto fail;
-}
-} else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
-trace_migration_completion_postcopy_end();
+/*
+ * Inactivate disks except in COLO, and track that we have done so in order
+ * to remember to reactivate them if migration fails or is cancelled.
+ */
+s->block_inactive = !migrate_colo();
+migration_rate_set(RATE_LIMIT_DISABLED);
+ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
+ s->block_inactive);
+out_unlock:
+qemu_mutex_unlock_iothread();
+return ret;
+}
 
-qemu_

[PATCH v1] migration: refactor migration_completion

2023-07-14 Thread Wei Wang
The current migration_completion function is a bit long. Refactor it into
different subfunctions:
- migration_completion_precopy: completion code related to precopy
- migration_completion_postcopy: completion code related to postcopy
- close_return_path_on_source: return-path thread cleanup on migration
completion. It is named to match open_return_path_on_source.

This improves readability and makes future updates easier (e.g. adding new
subfunctions when completion code for new features is needed). No
functional changes intended.

Signed-off-by: Wei Wang 
---
 migration/migration.c | 181 +-
 1 file changed, 106 insertions(+), 75 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 91bba630a8..83f1c74f99 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2058,6 +2058,21 @@ static int await_return_path_close_on_source(MigrationState *ms)
 return ms->rp_state.error;
 }
 
+static int close_return_path_on_source(MigrationState *ms)
+{
+int ret;
+
+if (!ms->rp_state.rp_thread_created) {
+return 0;
+}
+
+trace_migration_return_path_end_before();
+ret = await_return_path_close_on_source(ms);
+trace_migration_return_path_end_after(ret);
+
+return ret;
+}
+
 static inline void
 migration_wait_main_channel(MigrationState *ms)
 {
@@ -2288,66 +2303,107 @@ static int migration_maybe_pause(MigrationState *s,
 return s->state == new_state ? 0 : -EINVAL;
 }
 
-/**
- * migration_completion: Used by migration_thread when there's not much left.
- *   The caller 'breaks' the loop when this returns.
- *
- * @s: Current migration state
- */
-static void migration_completion(MigrationState *s)
+static int migration_completion_precopy(MigrationState *s,
+int *current_active_state)
 {
 int ret;
-int current_active_state = s->state;
 
-if (s->state == MIGRATION_STATUS_ACTIVE) {
-qemu_mutex_lock_iothread();
-s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+qemu_mutex_lock_iothread();
+s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
-s->vm_old_state = runstate_get();
-global_state_store();
+s->vm_old_state = runstate_get();
+global_state_store();
 
-ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-trace_migration_completion_vm_stop(ret);
-if (ret >= 0) {
-ret = migration_maybe_pause(s, &current_active_state,
-MIGRATION_STATUS_DEVICE);
-}
-if (ret >= 0) {
-/*
- * Inactivate disks except in COLO, and track that we
- * have done so in order to remember to reactivate
- * them if migration fails or is cancelled.
- */
-s->block_inactive = !migrate_colo();
-migration_rate_set(RATE_LIMIT_DISABLED);
-ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
- s->block_inactive);
-}
+ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+trace_migration_completion_vm_stop(ret);
+if (ret < 0) {
+goto out_unlock;
+}
 
-qemu_mutex_unlock_iothread();
+ret = migration_maybe_pause(s, current_active_state,
+MIGRATION_STATUS_DEVICE);
+if (ret < 0) {
+goto out_unlock;
+}
 
-if (ret < 0) {
-goto fail;
-}
-} else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
-trace_migration_completion_postcopy_end();
+/*
+ * Inactivate disks except in COLO, and track that we have done so in order
+ * to remember to reactivate them if migration fails or is cancelled.
+ */
+s->block_inactive = !migrate_colo();
+migration_rate_set(RATE_LIMIT_DISABLED);
+ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
+ s->block_inactive);
+out_unlock:
+qemu_mutex_unlock_iothread();
+return ret;
+}
 
-qemu_mutex_lock_iothread();
-qemu_savevm_state_complete_postcopy(s->to_dst_file);
-qemu_mutex_unlock_iothread();
+static int migration_completion_postcopy(MigrationState *s)
+{
+trace_migration_completion_postcopy_end();
+
+qemu_mutex_lock_iothread();
+qemu_savevm_state_complete_postcopy(s->to_dst_file);
+qemu_mutex_unlock_iothread();
+
+/*
+ * Shutdown the postcopy fast path thread.  This is only needed when dest
+ * QEMU binary is old (7.1/7.2).  QEMU 8.0+ doesn't need this.
+ */
+if (migrate_postcopy_preempt() && s->preempt_pre_7_2) {
+postcopy_preempt_shutdo

[PATCH RESEND v2 0/2] Enforce multifd and postcopy preempt setting

2023-06-06 Thread Wei Wang
Setting the cap/params of multifd and postcopy preempt is expected
to be done before incoming starts, as the number of multifd channels or
postcopy ram channels is used by qemu_start_incoming_migration (to size
the backlog of pending connections passed to listen()).

Enforce the cap/params of multifd and postcopy preempt to be set
before incoming starts. If not, disable the feature and return error
messages to remind users to adjust the commands. Fix the
qtest/migration-test to do so.
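
As a rough illustration of why the ordering matters (a simplified sketch,
not the exact QEMU code; the helper name below is made up for the example):
the incoming side sizes listen()'s backlog from the channel settings in
effect when migrate-incoming runs, so caps/params changed afterwards cannot
affect the already-listening socket.

    /* Simplified sketch: how many connections the destination must accept. */
    static int incoming_channel_count(void)
    {
        if (migrate_multifd()) {
            return migrate_multifd_channels();
        }
        if (migrate_postcopy_preempt()) {
            return 2;   /* main channel + postcopy preempt channel */
        }
        return 1;       /* main migration channel only */
    }

    /* ... used, conceptually, as: listen(sockfd, incoming_channel_count()); */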

v2 RESEND change:
- Only the commit message changed, with more explanations

Wei Wang (2):
  migration: enforce multifd and postcopy preempt to be set before
incoming
  qtest/migration-tests.c: use "-incoming defer" for postcopy tests

 migration/options.c  | 36 +++-
 tests/qtest/migration-test.c | 10 --
 2 files changed, 39 insertions(+), 7 deletions(-)

-- 
2.27.0




[PATCH v2 1/2] migration: enforce multifd and postcopy preempt to be set before incoming

2023-06-06 Thread Wei Wang
qemu_start_incoming_migration needs to check the number of multifd
channels or postcopy ram channels to configure the backlog parameter (i.e.
the maximum length to which the queue of pending connections for sockfd
may grow) of listen(). So enforce the usage of postcopy-preempt and
multifd as below:
- need to use "-incoming defer" on the destination; and
- set_capability and set_parameter need to be done before migrate_incoming

Otherwise, disable the use of the features and report error messages to
remind users to adjust the commands.

Signed-off-by: Wei Wang 
Reviewed-by: Peter Xu 
---
 migration/options.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index b62ab30cd5..01403e5eaa 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -415,6 +415,11 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
 MIGRATION_CAPABILITY_VALIDATE_UUID,
 MIGRATION_CAPABILITY_ZERO_COPY_SEND);
 
+static bool migrate_incoming_started(void)
+{
+return !!migration_incoming_get_current()->transport_data;
+}
+
 /**
  * @migration_caps_check - check capability compatibility
  *
@@ -538,6 +543,12 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 error_setg(errp, "Postcopy preempt not compatible with compress");
 return false;
 }
+
+if (migrate_incoming_started()) {
+error_setg(errp,
+   "Postcopy preempt must be set before incoming starts");
+return false;
+}
 }
 
 if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
@@ -545,6 +556,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 error_setg(errp, "Multifd is not compatible with compress");
 return false;
 }
+if (migrate_incoming_started()) {
+error_setg(errp, "Multifd must be set before incoming starts");
+return false;
+}
 }
 
 return true;
@@ -998,11 +1013,22 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
 
 /* x_checkpoint_delay is now always positive */
 
-if (params->has_multifd_channels && (params->multifd_channels < 1)) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
-   "multifd_channels",
-   "a value between 1 and 255");
-return false;
+if (params->has_multifd_channels) {
+if (params->multifd_channels < 1) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+   "multifd_channels",
+   "a value between 1 and 255");
+return false;
+}
+if (migrate_incoming_started()) {
+MigrationState *ms = migrate_get_current();
+
+ms->capabilities[MIGRATION_CAPABILITY_MULTIFD] = false;
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+   "multifd_channels",
+   "must be set before incoming starts");
+return false;
+}
 }
 
 if (params->has_multifd_zlib_level &&
-- 
2.27.0




[PATCH v2 2/2] qtest/migration-tests.c: use "-incoming defer" for postcopy tests

2023-06-06 Thread Wei Wang
The Postcopy preempt capability is expected to be set before incoming
starts, so change the postcopy tests to start with deferred incoming and
call migrate-incoming after the cap has been set.

Why didn't the existing tests (without this patch) fail?
There could be two reasons:
1) "backlog" specifies the number of pending connections. As long as the
   server accepts connections faster than the client side connects, the
   connections will succeed. The preempt test uses only 2 channels, so it
   is very unlikely to have pending connections.
2) per my tests (on kernel 6.2), the number of pending connections allowed
   is actually "backlog + 1", which is 2 in this case.
That said, the implementation of socket_start_incoming_migration_internal
expects "migrate defer" to be used, and for safety, change the test to
work with the expected usage.

Signed-off-by: Wei Wang 
Reviewed-by: Peter Xu 
---
 tests/qtest/migration-test.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..cbdbc932de 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1143,10 +1143,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 QTestState **to_ptr,
 MigrateCommon *args)
 {
-g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+g_autofree char *uri = NULL;
 QTestState *from, *to;
+QDict *rsp;
 
-if (test_migrate_start(&from, &to, uri, &args->start)) {
+if (test_migrate_start(&from, &to, "defer", &args->start)) {
 return -1;
 }
 
@@ -1165,9 +1166,14 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
 migrate_ensure_non_converge(from);
 
+rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
+   "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+qobject_unref(rsp);
+
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
 
+uri = migrate_get_socket_address(to, "socket-address");
 migrate_qmp(from, uri, "{}");
 
 wait_for_migration_pass(from);
-- 
2.27.0




[PATCH v2 1/2] migration: enforce multifd and postcopy preempt to be set before incoming

2023-05-30 Thread Wei Wang
qemu_start_incoming_migration needs to check the number of multifd
channels or postcopy ram channels to configure the backlog parameter (i.e.
the maximum length to which the queue of pending connections for sockfd
may grow) of listen(). So enforce the usage of postcopy-preempt and
multifd as below:
- need to use "-incoming defer" on the destination; and
- set_capability and set_parameter need to be done before migrate_incoming

Otherwise, disable the use of the features and report error messages to
remind users to adjust the commands.

Signed-off-by: Wei Wang 
---
 migration/options.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index a560483871..954a39aa6d 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -415,6 +415,11 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
 MIGRATION_CAPABILITY_VALIDATE_UUID,
 MIGRATION_CAPABILITY_ZERO_COPY_SEND);
 
+static bool migrate_incoming_started(void)
+{
+return !!migration_incoming_get_current()->transport_data;
+}
+
 /**
  * @migration_caps_check - check capability compatibility
  *
@@ -538,6 +543,12 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 error_setg(errp, "Postcopy preempt not compatible with compress");
 return false;
 }
+
+if (migrate_incoming_started()) {
+error_setg(errp,
+   "Postcopy preempt must be set before incoming starts");
+return false;
+}
 }
 
 if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
@@ -545,6 +556,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 error_setg(errp, "Multifd is not compatible with compress");
 return false;
 }
+if (migrate_incoming_started()) {
+error_setg(errp, "Multifd must be set before incoming starts");
+return false;
+}
 }
 
 return true;
@@ -998,11 +1013,22 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
 
 /* x_checkpoint_delay is now always positive */
 
-if (params->has_multifd_channels && (params->multifd_channels < 1)) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
-   "multifd_channels",
-   "a value between 1 and 255");
-return false;
+if (params->has_multifd_channels) {
+if (params->multifd_channels < 1) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+   "multifd_channels",
+   "a value between 1 and 255");
+return false;
+}
+if (migrate_incoming_started()) {
+MigrationState *ms = migrate_get_current();
+
+ms->capabilities[MIGRATION_CAPABILITY_MULTIFD] = false;
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+   "multifd_channels",
+   "must be set before incoming starts");
+return false;
+}
 }
 
 if (params->has_multifd_zlib_level &&
-- 
2.27.0




[PATCH v2 2/2] qtest/migration-tests.c: use "-incoming defer" for postcopy tests

2023-05-30 Thread Wei Wang
The Postcopy preempt capability needs to be set before incoming
starts, so change the postcopy tests to start with deferred incoming and
call migrate-incoming after the cap has been set.

Signed-off-by: Wei Wang 
---
 tests/qtest/migration-test.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b99b49a314..31ce3d959c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1158,10 +1158,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 QTestState **to_ptr,
 MigrateCommon *args)
 {
-g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+g_autofree char *uri = NULL;
 QTestState *from, *to;
+QDict *rsp;
 
-if (test_migrate_start(&from, &to, uri, &args->start)) {
+if (test_migrate_start(&from, &to, "defer", &args->start)) {
 return -1;
 }
 
@@ -1180,9 +1181,14 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
 migrate_ensure_non_converge(from);
 
+rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
+   "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+qobject_unref(rsp);
+
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
 
+uri = migrate_get_socket_address(to, "socket-address");
 migrate_qmp(from, uri, "{}");
 
 wait_for_migration_pass(from);
-- 
2.27.0




[PATCH v2 0/2] Enforce multifd and postcopy preempt setting

2023-05-30 Thread Wei Wang
Setting the cap/params of multifd and postcopy preempt is expected
to be done before incoming starts, as the number of multifd channels or
postcopy ram channels is used by qemu_start_incoming_migration (to size
the backlog of pending connections passed to listen()).

Enforce the cap/params of multifd and postcopy preempt to be set
before incoming starts. If not, disable the feature and return error
messages to remind users to adjust the commands. Fix the
qtest/migration-test to do so.

Wei Wang (2):
  migration: enforce multifd and postcopy preempt to be set before
incoming
  qtest/migration-tests.c: use "-incoming defer" for postcopy tests

 migration/options.c  | 36 +++-
 tests/qtest/migration-test.c | 10 --
 2 files changed, 39 insertions(+), 7 deletions(-)

-- 
2.27.0




[PATCH v1] migration/multifd: always post the sync semaphores when senders exit

2023-05-24 Thread Wei Wang
The sem_sync and channels_ready semaphores are used by the multifd
sender threads to synchronize with the migration thread. It is possible
that one of the sender threads exits due to an error (e.g. one that occurred
when writing to the socket). The erroring thread sets
multifd_send_state->exiting to force the other sender threads to exit.
Those threads that exit (due to the setting of multifd_send_state->exiting)
won't post the sem_sync and channels_ready semaphores, as they didn't
encounter any error themselves (i.e. ret is 0 when they exit). In this case,
if the migration thread is waiting on the sem_sync semaphores from all the
sender threads, it will wait there forever, which is not expected.

Change the sender threads to always post the sem_sync and channels_ready
semaphores when they exit. This unblocks the migration thread in the
above case. If the migration thread isn't waiting on the semaphores
when the sender threads exit, posting the semaphores is simply a no-op.
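
For context, the waiting side in the migration thread looks roughly like the
sketch below (a simplified view of multifd_send_sync_main(), not verbatim); a
sender thread that exits without posting p->sem_sync leaves this loop blocked
forever:

    /* migration thread, simplified: wait for every sender to ack the sync */
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

        qemu_sem_wait(&p->sem_sync);
    }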

Signed-off-by: Wei Wang 
---
 migration/multifd.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 0bf5958a9c..1ff461675e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -757,10 +757,8 @@ out:
  * Error happen, I will exit, but I can't just leave, tell
  * who pay attention to me.
  */
-if (ret != 0) {
-qemu_sem_post(&p->sem_sync);
-qemu_sem_post(&multifd_send_state->channels_ready);
-}
+qemu_sem_post(&p->sem_sync);
+qemu_sem_post(&multifd_send_state->channels_ready);
 
 qemu_mutex_lock(&p->mutex);
 p->running = false;
-- 
2.27.0




[PATCH v1] migration: fix migrate_params_test_apply to set the dest param correctly

2023-05-24 Thread Wei Wang
qmp_migrate_set_parameters expects to use tmp for the parameter check,
so migrate_params_test_apply is expected to copy the related fields from
params to tmp. Fix migrate_params_test_apply to use the function
parameter, *dest, rather than the global one. The dest->has_xxx fields
(xxx being the feature name) also need to be set, as they will be checked
by migrate_params_check.
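
For context, the intended call flow in qmp_migrate_set_parameters is roughly
the following (a simplified sketch, not verbatim):

    MigrationParameters tmp;

    /* tmp starts from a clean state; migrate_params_test_apply() fills in
     * only the fields present in "params" (and sets their has_xxx flags),
     * and migrate_params_check() then validates exactly those fields. */
    memset(&tmp, 0, sizeof(tmp));
    migrate_params_test_apply(params, &tmp);
    if (!migrate_params_check(&tmp, errp)) {
        return;                         /* invalid values: nothing applied */
    }
    migrate_params_apply(params, errp);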

Signed-off-by: Wei Wang 
---
 migration/options.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index b62ab30cd5..a560483871 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1089,39 +1089,45 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
 static void migrate_params_test_apply(MigrateSetParameters *params,
   MigrationParameters *dest)
 {
-*dest = migrate_get_current()->parameters;
-
 /* TODO use QAPI_CLONE() instead of duplicating it inline */
 
 if (params->has_compress_level) {
+dest->has_compress_level = true;
 dest->compress_level = params->compress_level;
 }
 
 if (params->has_compress_threads) {
+dest->has_compress_threads = true;
 dest->compress_threads = params->compress_threads;
 }
 
 if (params->has_compress_wait_thread) {
+dest->has_compress_wait_thread = true;
 dest->compress_wait_thread = params->compress_wait_thread;
 }
 
 if (params->has_decompress_threads) {
+dest->has_decompress_threads = true;
 dest->decompress_threads = params->decompress_threads;
 }
 
 if (params->has_throttle_trigger_threshold) {
+dest->has_throttle_trigger_threshold = true;
 dest->throttle_trigger_threshold = params->throttle_trigger_threshold;
 }
 
 if (params->has_cpu_throttle_initial) {
+dest->has_cpu_throttle_initial = true;
 dest->cpu_throttle_initial = params->cpu_throttle_initial;
 }
 
 if (params->has_cpu_throttle_increment) {
+dest->has_cpu_throttle_increment = true;
 dest->cpu_throttle_increment = params->cpu_throttle_increment;
 }
 
 if (params->has_cpu_throttle_tailslow) {
+dest->has_cpu_throttle_tailslow = true;
 dest->cpu_throttle_tailslow = params->cpu_throttle_tailslow;
 }
 
@@ -1136,45 +1142,58 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
 }
 
 if (params->has_max_bandwidth) {
+dest->has_max_bandwidth = true;
 dest->max_bandwidth = params->max_bandwidth;
 }
 
 if (params->has_downtime_limit) {
+dest->has_downtime_limit = true;
 dest->downtime_limit = params->downtime_limit;
 }
 
 if (params->has_x_checkpoint_delay) {
+dest->has_x_checkpoint_delay = true;
 dest->x_checkpoint_delay = params->x_checkpoint_delay;
 }
 
 if (params->has_block_incremental) {
+dest->has_block_incremental = true;
 dest->block_incremental = params->block_incremental;
 }
 if (params->has_multifd_channels) {
+dest->has_multifd_channels = true;
 dest->multifd_channels = params->multifd_channels;
 }
 if (params->has_multifd_compression) {
+dest->has_multifd_compression = true;
 dest->multifd_compression = params->multifd_compression;
 }
 if (params->has_xbzrle_cache_size) {
+dest->has_xbzrle_cache_size = true;
 dest->xbzrle_cache_size = params->xbzrle_cache_size;
 }
 if (params->has_max_postcopy_bandwidth) {
+dest->has_max_postcopy_bandwidth = true;
 dest->max_postcopy_bandwidth = params->max_postcopy_bandwidth;
 }
 if (params->has_max_cpu_throttle) {
+dest->has_max_cpu_throttle = true;
 dest->max_cpu_throttle = params->max_cpu_throttle;
 }
 if (params->has_announce_initial) {
+dest->has_announce_initial = true;
 dest->announce_initial = params->announce_initial;
 }
 if (params->has_announce_max) {
+dest->has_announce_max = true;
 dest->announce_max = params->announce_max;
 }
 if (params->has_announce_rounds) {
+dest->has_announce_rounds = true;
 dest->announce_rounds = params->announce_rounds;
 }
 if (params->has_announce_step) {
+dest->has_announce_step = true;
 dest->announce_step = params->announce_step;
 }
 
@@ -1321,6 +1340,7 @@ void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
 params->tls_hostname->u.s = strdup("");
 }
 
+memset(&tmp, 0, sizeof(MigrationParameters));
 migrate_params_test_apply(params, &tmp);
 
 if (!migrate_params_check(&tmp, errp)) {
-- 
2.27.0




[PATCH v1] migration: fail the cap check if it requires the use of deferred incoming

2023-05-18 Thread Wei Wang
qemu_start_incoming_migration needs to check the number of multifd
channels or postcopy ram channels to configure the backlog parameter (i.e.
the maximum length to which the queue of pending connections for sockfd
may grow) of listen(). So the multifd and postcopy-preempt caps require the
use of deferred incoming, that is, calling qemu_start_incoming_migration
should be deferred via QMP or HMP commands until after the multifd and
postcopy-preempt caps are configured.

Check if deferred incoming is used when enabling multifd or
postcopy-preempt, and fail the check with error messages if not.

Signed-off-by: Wei Wang 
---
 migration/options.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index c2a278ee2d..25b333b3f4 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -537,6 +537,11 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 error_setg(errp, "Postcopy preempt not compatible with compress");
 return false;
 }
+
+if (mis->transport_data) {
+error_setg(errp, "Postcopy preempt should use deferred incoming");
+return false;
+}
 }
 
 if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
@@ -544,6 +549,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 error_setg(errp, "Multifd is not compatible with compress");
 return false;
 }
+if (mis->transport_data) {
+error_setg(errp, "Multifd should use deferred incoming");
+return false;
+}
 }
 
 return true;
-- 
2.27.0




[PATCH v1] migration/multifd: correct multifd_send_thread to trace the flags

2023-03-09 Thread Wei Wang
p->flags can be updated via the send_prepare callback, e.g. OR-ed
with MULTIFD_FLAG_ZLIB by zlib_send_prepare. Assigning p->flags to the
local "flags" before the send_prepare callback captures only part of
p->flags. Fix it by moving the assignment of p->flags to the local flags
to after the callback, so that the correct flags can be traced.

Fixes: ab7cbb0b9a3b ("multifd: Make no compression operations into its own structure")
Signed-off-by: Wei Wang 
---
 migration/multifd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 5e85c3ea9b..cbc0dfe39b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -677,7 +677,7 @@ static void *multifd_send_thread(void *opaque)
 
 if (p->pending_job) {
 uint64_t packet_num = p->packet_num;
-uint32_t flags = p->flags;
+uint32_t flags;
 p->normal_num = 0;
 
 if (use_zero_copy_send) {
@@ -699,6 +699,7 @@ static void *multifd_send_thread(void *opaque)
 }
 }
 multifd_send_fill_packet(p);
+flags = p->flags;
 p->flags = 0;
 p->num_packets++;
 p->total_normal_pages += p->normal_num;
-- 
2.27.0




Re: KVM developers conference call agenda

2023-03-06 Thread Wei Wang


On 2/21/23 21:46, juan.quint...@gmail.com wrote:

KVM developers conference call


Hi, for today's call:
Trivial stuff (less than 5 mins each)
- should we record the sessions
- should we have the call every week.
We have on the backburner:
* TDX migration


Hi Juan,
Can I get 30 min (or more if there are no other topics) in tomorrow's call to
continue the discussion on TDX live migration?
I also did some investigation on the previous comments about MigTD-to-MigTD
communication (removing socat), and have an update to discuss.

Thanks,
Wei


* VFIO/VPDA/Vhost migration
* Single binary qemu


The future of icount

Do we have an agenda for next week's KVM call yet? If there is space I'd
like to take some time to discuss the future direction of icount.

Specifically I believe there might be some proposals for how we could
support icount with MTTCG worth discussing. From my point of view icount
provides two things:

- a sense of time vaguely related to execution rather than wall clock
- determinism

I would love to divorce the former from icount and punt it to plugins.
The plugin would be free to instrument as heavily or lightly as it sees
fit and provide its best guess as to guest time on demand. I wrote this
idea up as a card in Linaro's JIRA if anyone is interested:

https://linaro.atlassian.net/browse/QEMU-481

Being able to punt cost modelling and sense of time into plugins would
allow the core icount support to concentrate on determinism. Then any
attempt to enable icount for MTTCG would then have to ensure it stays
deterministic.

Richard and I have discussed the problem a few times and weren't sure it
was solvable but I'm totally open to hearing ideas on how to do it.
Fundamentally I think we would have to ensure any TBs doing IO
execute in an exclusive context.
mechanisms to ensure all IO is only done at the end of blocks so it
doesn't seem a huge leap to ensure we execute those blocks exclusively.
However there is still the problem of what to do about other pure
computation threads getting ahead of or behind the IO blocks on
subsequent runs.

KVM developers conference call
Tuesday 2023-02-21 ⋅ 15:00 – 16:00 (Central European Time - Madrid)
If you need call details, please contact me: quint...@redhat.com


Location

https://meet.jit.si/kvmcallmeeting




Guests

Philippe Mathieu-Daudé 
Joao Martins 
quint...@redhat.com 
Meirav Dean 
Felipe Franciosi 
afaer...@suse.de 
bazu...@redhat.com 
bbau...@redhat.com 
c...@f00f.org 
dustin.kirkl...@canonical.com 
ebl...@redhat.com 
edgar.igles...@gmail.com 
Eric Northup 
eric.au...@redhat.com 
i...@theiggy.com 
jan.kis...@web.de 
jidong.x...@gmail.com 
jjhe...@linux.vnet.ibm.com 
m...@linux.vnet.ibm.com 
Peter Maydell 
richard.hender...@linaro.org 
stefa...@gmail.com 
Warner Losh 
z@139.com 
zwu.ker...@gmail.com 
Jason Gunthorpe 
Neo Jia 
David Edmondson 
Elena Ufimtseva 
Konrad Wilk 
a...@rev.ng 
a...@rev.ng 
Shameerali Kolothum Thodi 
Wang, Wei W 
Chao Peng 
kvm-devel 
qemu-devel@nongnu.org 
mbur...@qti.qualcomm.com 



Re: [PATCH v7 11/14] KVM: Register/unregister the guest private memory regions

2022-07-21 Thread Wei Wang




On 7/21/22 00:21, Sean Christopherson wrote:

On Wed, Jul 20, 2022, Gupta, Pankaj wrote:

+bool __weak kvm_arch_private_mem_supported(struct kvm *kvm)

Use kvm_arch_has_private_mem(), both because "has" makes it obvious this is 
checking
a flag of sorts, and to align with other helpers of this nature (and with
CONFIG_HAVE_KVM_PRIVATE_MEM).

   $ git grep kvm_arch | grep supported | wc -l
   0
   $ git grep kvm_arch | grep has | wc -l
   26


+#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
+   case KVM_MEMORY_ENCRYPT_REG_REGION:
+   case KVM_MEMORY_ENCRYPT_UNREG_REGION: {
+   struct kvm_enc_region region;
+
+   if (!kvm_arch_private_mem_supported(kvm))
+   goto arch_vm_ioctl;
+
+   r = -EFAULT;
+   if (copy_from_user(&region, argp, sizeof(region)))
+   goto out;
+
+   r = kvm_vm_ioctl_set_encrypted_region(kvm, ioctl, &region);

this is to store private region metadata not only the encrypted region?

Correct.

Sorry for not being clear, was suggesting name change of this function from:
"kvm_vm_ioctl_set_encrypted_region" to "kvm_vm_ioctl_set_private_region"

Though I don't have strong reason to change it, I'm fine with this and

Yes, no strong reason, just thought "kvm_vm_ioctl_set_private_region" would
depict the actual functionality :)


this name matches the above kvm_arch_private_mem_supported perfectly.

BTW could not understand this, how "kvm_vm_ioctl_set_encrypted_region"
matches "kvm_arch_private_mem_supported"?

Chao is saying that kvm_vm_ioctl_set_private_region() pairs nicely with
kvm_arch_private_mem_supported(), not that the "encrypted" variant pairs nicely.

I also like using "private" instead of "encrypted", though we should probably
find a different verb than "set", because calling "set_private" when making the
region shared is confusing.  I'm struggling to come up with a good alternative
though.

kvm_vm_ioctl_set_memory_region() is already taken by KVM_SET_USER_MEMORY_REGION,
and that also means that anything with "memory_region" in the name is bound to 
be
confusing.

Hmm, and if we move away from "encrypted", it probably makes sense to pass in
addr+size instead of a kvm_enc_region.

Maybe this?

static int kvm_vm_ioctl_set_or_clear_mem_private(struct kvm *kvm, gpa_t gpa,
 gpa_t size, bool set_private)

and then:

#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
case KVM_MEMORY_ENCRYPT_REG_REGION:
case KVM_MEMORY_ENCRYPT_UNREG_REGION: {
bool set = ioctl == KVM_MEMORY_ENCRYPT_REG_REGION;
struct kvm_enc_region region;

if (!kvm_arch_private_mem_supported(kvm))
goto arch_vm_ioctl;

r = -EFAULT;
if (copy_from_user(&region, argp, sizeof(region)))
goto out;

r = kvm_vm_ioctl_set_or_clear_mem_private(kvm, region.addr,
  region.size, set);
break;
}
#endif

I don't love it, so if someone has a better idea...

Maybe you could tag it with cgs for all the confidential guest support 
related stuff:

e.g. kvm_vm_ioctl_set_cgs_mem()

bool is_private = ioctl == KVM_MEMORY_ENCRYPT_REG_REGION;
...
kvm_vm_ioctl_set_cgs_mem(&region, is_private)




[PATCH v3] migration: clear the memory region dirty bitmap when skipping free pages

2021-07-22 Thread Wei Wang
When skipping free pages to send, their corresponding dirty bits in the
memory region dirty bitmap need to be cleared. Otherwise the skipped
pages will be sent in the next round after the migration thread syncs
dirty bits from the memory region dirty bitmap.

Cc: David Hildenbrand 
Cc: Peter Xu 
Cc: Michael S. Tsirkin 
Reported-by: David Hildenbrand 
Signed-off-by: Wei Wang 
---
 migration/ram.c | 74 +
 1 file changed, 56 insertions(+), 18 deletions(-)

v2->v3 changelog:
- change migration_clear_memory_region_dirty_bitmap_range to use
  QEMU_ALIGN

v1->v2 changelog:
- move migration_clear_memory_region_dirty_bitmap under bitmap_mutex as
  we lack confidence to have it outside the lock for now.
- clean the unnecessary subproject commit.
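
As an illustration of the range clearing with hypothetical numbers: with 4KiB
target pages and the default clear_bmap_shift of 18, one clear_bmap chunk
covers 2^18 pages (1GiB). A free-page hint for pages [262100, 262400) then
straddles a chunk boundary, so chunk_start = QEMU_ALIGN_DOWN(262100, 262144)
= 0 and chunk_end = QEMU_ALIGN_UP(262400, 262144) = 524288, and the loop
clears the memory region dirty bitmap for both 1GiB chunks.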

diff --git a/migration/ram.c b/migration/ram.c
index b5fc454b2f..c8262cf3bc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -789,6 +789,53 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
 return find_next_bit(bitmap, size, start);
 }
 
+static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
+   RAMBlock *rb,
+   unsigned long page)
+{
+uint8_t shift;
+hwaddr size, start;
+
+if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page)) {
+return;
+}
+
+shift = rb->clear_bmap_shift;
+/*
+ * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
+ * can make things easier sometimes since then start address
+ * of the small chunk will always be 64 pages aligned so the
+ * bitmap will always be aligned to unsigned long. We should
+ * even be able to remove this restriction but I'm simply
+ * keeping it.
+ */
+assert(shift >= 6);
+
+size = 1ULL << (TARGET_PAGE_BITS + shift);
+start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
+trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+memory_region_clear_dirty_bitmap(rb->mr, start, size);
+}
+
+static void
+migration_clear_memory_region_dirty_bitmap_range(RAMState *rs,
+ RAMBlock *rb,
+ unsigned long start,
+ unsigned long npages)
+{
+unsigned long i, chunk_pages = 1UL << rb->clear_bmap_shift;
+unsigned long chunk_start = QEMU_ALIGN_DOWN(start, chunk_pages);
+unsigned long chunk_end = QEMU_ALIGN_UP(start + npages, chunk_pages);
+
+/*
+ * Clear pages from start to start + npages - 1, so the end boundary is
+ * exclusive.
+ */
+for (i = chunk_start; i < chunk_end; i += chunk_pages) {
+migration_clear_memory_region_dirty_bitmap(rs, rb, i);
+}
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 RAMBlock *rb,
 unsigned long page)
@@ -803,26 +850,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
  * the page in the chunk we clear the remote dirty bitmap for all.
  * Clearing it earlier won't be a problem, but too late will.
  */
-if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
-uint8_t shift = rb->clear_bmap_shift;
-hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
-hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
-
-/*
- * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
- * can make things easier sometimes since then start address
- * of the small chunk will always be 64 pages aligned so the
- * bitmap will always be aligned to unsigned long.  We should
- * even be able to remove this restriction but I'm simply
- * keeping it.
- */
-assert(shift >= 6);
-trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
-memory_region_clear_dirty_bitmap(rb->mr, start, size);
-}
+migration_clear_memory_region_dirty_bitmap(rs, rb, page);
 
 ret = test_and_clear_bit(page, rb->bmap);
-
 if (ret) {
 rs->migration_dirty_pages--;
 }
@@ -2741,6 +2771,14 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
 npages = used_len >> TARGET_PAGE_BITS;
 
+qemu_mutex_lock(&ram_state->bitmap_mutex);
+/*
+ * The skipped free pages are equivalent to having been sent, from clear_bmap's
+ * perspective, so clear the bits from the memory region bitmap which
+ * are initially set. Otherwise those skipped pages will be sent in
+ * the next round after syncing from the memory region bitmap.
+ */
+migration_clear_memory_region_dirty_bitmap_range(ram_state, block,
+   

[PATCH v2] migration: clear the memory region dirty bitmap when skipping free pages

2021-07-15 Thread Wei Wang
When skipping free pages to send, their corresponding dirty bits in the
memory region dirty bitmap need to be cleared. Otherwise the skipped
pages will be sent in the next round after the migration thread syncs
dirty bits from the memory region dirty bitmap.

Cc: David Hildenbrand 
Cc: Peter Xu 
Cc: Michael S. Tsirkin 
Reported-by: David Hildenbrand 
Signed-off-by: Wei Wang 
---
 migration/ram.c | 72 -
 1 file changed, 54 insertions(+), 18 deletions(-)

v1->v2 changelog:
- move migration_clear_memory_region_dirty_bitmap under bitmap_mutex as
  we lack confidence to have it outside the lock for now.
- clean the unnecessary subproject commit.

diff --git a/migration/ram.c b/migration/ram.c
index b5fc454b2f..69e06b55ec 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -789,6 +789,51 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
 return find_next_bit(bitmap, size, start);
 }
 
+static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
+   RAMBlock *rb,
+   unsigned long page)
+{
+uint8_t shift;
+hwaddr size, start;
+
+if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page)) {
+return;
+}
+
+shift = rb->clear_bmap_shift;
+/*
+ * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
+ * can make things easier sometimes since then start address
+ * of the small chunk will always be 64 pages aligned so the
+ * bitmap will always be aligned to unsigned long. We should
+ * even be able to remove this restriction but I'm simply
+ * keeping it.
+ */
+assert(shift >= 6);
+
+size = 1ULL << (TARGET_PAGE_BITS + shift);
+start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
+trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+memory_region_clear_dirty_bitmap(rb->mr, start, size);
+}
+
+static void
+migration_clear_memory_region_dirty_bitmap_range(RAMState *rs,
+ RAMBlock *rb,
+ unsigned long start,
+ unsigned long npages)
+{
+unsigned long page_to_clear, i, nchunks;
+unsigned long chunk_pages = 1UL << rb->clear_bmap_shift;
+
+nchunks = (start + npages) / chunk_pages - start / chunk_pages + 1;
+
+for (i = 0; i < nchunks; i++) {
+page_to_clear = start + i * chunk_pages;
+migration_clear_memory_region_dirty_bitmap(rs, rb, page_to_clear);
+}
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 RAMBlock *rb,
 unsigned long page)
@@ -803,26 +848,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
  * the page in the chunk we clear the remote dirty bitmap for all.
  * Clearing it earlier won't be a problem, but too late will.
  */
-if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
-uint8_t shift = rb->clear_bmap_shift;
-hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
-hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
-
-/*
- * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
- * can make things easier sometimes since then start address
- * of the small chunk will always be 64 pages aligned so the
- * bitmap will always be aligned to unsigned long.  We should
- * even be able to remove this restriction but I'm simply
- * keeping it.
- */
-assert(shift >= 6);
-trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
-memory_region_clear_dirty_bitmap(rb->mr, start, size);
-}
+migration_clear_memory_region_dirty_bitmap(rs, rb, page);
 
 ret = test_and_clear_bit(page, rb->bmap);
-
 if (ret) {
 rs->migration_dirty_pages--;
 }
@@ -2741,6 +2769,14 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
 npages = used_len >> TARGET_PAGE_BITS;
 
+qemu_mutex_lock(&ram_state->bitmap_mutex);
+/*
+ * The skipped free pages are equivalent to having been sent, from clear_bmap's
+ * perspective, so clear the bits from the memory region bitmap which
+ * are initially set. Otherwise those skipped pages will be sent in
+ * the next round after syncing from the memory region bitmap.
+ */
+migration_clear_memory_region_dirty_bitmap_range(ram_state, block,
+ start, npages);
 ram_state->migration_dirty_pages -=
   bitmap_count_one_with_offset(block->bmap, start, npages);
 bitmap_clear(block->bmap, start, npages);
-- 
2.25.1




[PATCH v1] migration: clear the memory region dirty bitmap when skipping free pages

2021-07-14 Thread Wei Wang
When skipping free pages, their corresponding dirty bits in the memory
region dirty bitmap need to be cleared. Otherwise the skipped pages will
be sent in the next round after the migration thread syncs dirty bits
from the memory region dirty bitmap.

migration_clear_memory_region_dirty_bitmap_range is put outside the
bitmap_mutex, because memory_region_clear_dirty_bitmap may block on the
kvm slot mutex (we don't want to hold bitmap_mutex while blocked on
another mutex), and clear_bmap_test_and_clear uses atomic operations.

Cc: David Hildenbrand 
Cc: Peter Xu 
Reported-by: David Hildenbrand 
Signed-off-by: Wei Wang 
---
 capstone|  2 +-
 migration/ram.c | 73 +
 slirp   |  2 +-
 ui/keycodemapdb |  2 +-
 4 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/capstone b/capstone
index f8b1b83301..22ead3e0bf 16
--- a/capstone
+++ b/capstone
@@ -1 +1 @@
-Subproject commit f8b1b833015a4ae47110ed068e0deb7106ced66d
+Subproject commit 22ead3e0bfdb87516656453336160e0a37b066bf
diff --git a/migration/ram.c b/migration/ram.c
index 88ff34f574..c44c6e2fed 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -789,6 +789,51 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
 return find_next_bit(bitmap, size, start);
 }
 
+static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
+   RAMBlock *rb,
+   unsigned long page)
+{
+uint8_t shift;
+hwaddr size, start;
+
+if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page)) {
+return;
+}
+
+shift = rb->clear_bmap_shift;
+/*
+ * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
+ * can make things easier sometimes since then start address
+ * of the small chunk will always be 64 pages aligned so the
+ * bitmap will always be aligned to unsigned long. We should
+ * even be able to remove this restriction but I'm simply
+ * keeping it.
+ */
+assert(shift >= 6);
+
+size = 1ULL << (TARGET_PAGE_BITS + shift);
+start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
+trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+memory_region_clear_dirty_bitmap(rb->mr, start, size);
+}
+
+static void
+migration_clear_memory_region_dirty_bitmap_range(RAMState *rs,
+ RAMBlock *rb,
+ unsigned long start,
+ unsigned long npages)
+{
+unsigned long page_to_clear, i, nchunks;
+unsigned long chunk_pages = 1UL << rb->clear_bmap_shift;
+
+nchunks = (start + npages) / chunk_pages - start / chunk_pages + 1;
+
+for (i = 0; i < nchunks; i++) {
+page_to_clear = start + i * chunk_pages;
+migration_clear_memory_region_dirty_bitmap(rs, rb, page_to_clear);
+}
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 RAMBlock *rb,
 unsigned long page)
@@ -805,26 +850,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
  * the page in the chunk we clear the remote dirty bitmap for all.
  * Clearing it earlier won't be a problem, but too late will.
  */
-if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
-uint8_t shift = rb->clear_bmap_shift;
-hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
-hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
-
-/*
- * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
- * can make things easier sometimes since then start address
- * of the small chunk will always be 64 pages aligned so the
- * bitmap will always be aligned to unsigned long.  We should
- * even be able to remove this restriction but I'm simply
- * keeping it.
- */
-assert(shift >= 6);
-trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
-memory_region_clear_dirty_bitmap(rb->mr, start, size);
-}
+migration_clear_memory_region_dirty_bitmap(rs, rb, page);
 
 ret = test_and_clear_bit(page, rb->bmap);
-
 if (ret) {
 rs->migration_dirty_pages--;
 }
@@ -2742,6 +2770,15 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
 start = offset >> TARGET_PAGE_BITS;
 npages = used_len >> TARGET_PAGE_BITS;
 
+/*
+ * The skipped free pages are equivalent to having been sent, from clear_bmap's
+ * perspective, so clear the bits from the memory region bitmap which
+ * are initially set. Otherwise those 

Re: [raw] Guest stuck during live live-migration

2020-12-14 Thread Wei Wang

On 11/23/2020 05:36 PM, Quentin Grolleau wrote:

Hello,

In our company, we are hosting a large number of VMs behind
OpenStack (so libvirt/qemu).
A large majority of our VMs run with local data only, stored
on NVMe, and most of them are RAW disks.


With Qemu 4.0 (and even with older versions) we see strange
live-migration behaviour:

- some VMs live migrate at very high speed without issue (> 6 Gbps)
- some VMs are running correctly, but migrate at a strangely low
speed (3 Gbps)
- some VMs migrate at a very low speed (1 Gbps, sometimes
less) and during the migration the guest is completely I/O stuck
When this issue happens the VM is completely blocked; iostat in the VM
shows us a latency of 30 secs


First we thought it was related to a hardware issue. We checked it,
comparing different hardware, but no issue was found there.


So one of my colleagues had the idea to limit, with "tc", the bandwidth
on the interface the migration was done over, and it worked: the VM didn't
lose any ping nor get I/O stuck.
Important point: once the VM has been migrated (with the limitation)
one time, if we migrate it again right after, the migration will be
done at full speed (8-9 Gb/s) without freezing the VM.


It only happens on existing VMs; we tried to replicate with a fresh
instance with exactly the same spec and nothing was happening.


We tried to replicate the workload inside the VM but there was no way
to replicate the case. So it was not related to the workload nor to
the server that hosts the VM.


So we thought about the disk of the instance: the raw file.

We also tried to "strace -c" the process during the live-migration and
it was doing a lot of "lseek".


and we found this:
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00462.html


So I rebuilt Qemu with this patch and the live-migration went well, at
high speed and with no VM freeze.

( https://github.com/qemu/qemu/blob/master/block/file-posix.c#L2601)

Do you have a way to avoid the "lseek" mechanism, as it consumes more
resources to find the holes in the disk and doesn't leave any for the VM?
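
For reference, the hole scanning that the strace above shows is the standard
lseek(SEEK_DATA)/lseek(SEEK_HOLE) pattern; a minimal sketch of that pattern,
with a hypothetical file name and not QEMU's actual implementation:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("disk.raw", O_RDONLY);    /* hypothetical raw image */
        off_t end, data = 0, hole;

        if (fd < 0) {
            perror("open");
            return 1;
        }
        end = lseek(fd, 0, SEEK_END);

        /* Walk the image, jumping from one allocated data extent to the
         * next and skipping the holes in between. */
        while (data < end && (data = lseek(fd, data, SEEK_DATA)) >= 0) {
            hole = lseek(fd, data, SEEK_HOLE);  /* end of this data extent */
            printf("data extent: %lld..%lld\n",
                   (long long)data, (long long)hole);
            data = hole;
        }
        close(fd);
        return 0;
    }

Each such scan has to query the filesystem's extent metadata, which is the
extra cost being described above.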



Server hosting the VM:
- dual-Xeon hosts with NVMe storage and a 10 Gb network card
- Qemu 4.0 and Libvirt 5.4
- Kernel 4.18.0.25

Guest having the issue:
- raw image with Debian 8

Here is the qemu-img info of the disk:
> qemu-img info disk
image: disk
file format: raw
virtual size: 400G (429496729600 bytes)
disk size: 400G



Could you share the migration options that you use and "info migrate" 
for both stuck and non-stuck cases?


Best,
Wei





[PATCH] migration: fix xbzrle encoding rate calculation

2020-06-07 Thread Wei Wang
An implicit conversion error from "unsigned long" to "double" was
reported when compiling with Clang 10. Simply make the encoding rate 0
when encoded_size is 0.
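
For illustration, with hypothetical numbers: if 100 XBZRLE pages (409600
bytes unencoded with 4KiB pages) were encoded into 8192 bytes since the last
update, the reported rate is 409600 / 8192 = 50; when encoded_size is 0 the
division is undefined, and with this patch the reported rate is simply 0.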

Fixes: e460a4b1a4
Reported-by: Richard Henderson 
Signed-off-by: Wei Wang 
---
 migration/ram.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 41cc530..069b6e3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -913,10 +913,8 @@ static void migration_update_rates(RAMState *rs, int64_t end_time)
 unencoded_size = (xbzrle_counters.pages - rs->xbzrle_pages_prev) *
  TARGET_PAGE_SIZE;
 encoded_size = xbzrle_counters.bytes - rs->xbzrle_bytes_prev;
-if (xbzrle_counters.pages == rs->xbzrle_pages_prev) {
+if (xbzrle_counters.pages == rs->xbzrle_pages_prev || !encoded_size) {
 xbzrle_counters.encoding_rate = 0;
-} else if (!encoded_size) {
-xbzrle_counters.encoding_rate = UINT64_MAX;
 } else {
 xbzrle_counters.encoding_rate = unencoded_size / encoded_size;
 }
-- 
2.7.4




Re: [PATCH v3] migration/xbzrle: add encoding rate

2020-06-04 Thread Wei Wang

On 06/05/2020 12:57 AM, Richard Henderson wrote:

On 6/4/20 3:27 AM, Wei Wang wrote:

On 06/04/2020 05:38 PM, Dr. David Alan Gilbert wrote:

Hmm OK; I'll admit to not liking NaN/Inf in output.

Dave


OK. To deal with the reported issue, how about using FLT_MAX (as opposed to
UINT64_MAX or inf):
xbzrle_counters.encoding_rate = FLT_MAX;

So you'd rather see 340282346638528859811704183484516925440.00 printed?

It's arbitrary and not correct in any mathematical sense.

If you *really* insist on not printing Inf (which may have some diagnostic
value), then 0 is just as arbitrary, and at least smaller in the output.


0 works fine (though it logically means the lowest encoding rate).
I slightly prefer the biggest number or inf, which naturally means it's 
very encoding friendly.

Let's see if Dave has any thought about the choices :)

Best,
Wei




Re: [PATCH v3] migration/xbzrle: add encoding rate

2020-06-04 Thread Wei Wang

On 06/04/2020 05:38 PM, Dr. David Alan Gilbert wrote:

* Richard Henderson (richard.hender...@linaro.org) wrote:

On 6/3/20 7:58 PM, Wei Wang wrote:

It is possible that encoded_size==0, but unencoded_size !=0. For example,
a page is written with the same data that it already has.

That really contains 0 bytes?
Not even the ones that say "same data"?

You certainly have a magical compression algorithm there.
Or bad accounting.

We just don't bother sending the page at all in the case it's not
changed; no headers, no nothing:

 if (encoded_len == 0) {
 trace_save_xbzrle_page_skipping();
 return 0;

and that's xbzrle having correctly done it's job.



The encoding_rate is expected to reflect if the page is xbzrle encoding 
friendly.
The larger, the more friendly, so 0 might not be a good representation here.

Maybe, we could change UINT64_MAX above to "~0ULL" to avoid the issue?

~0ull is no different than UINT64_MAX -- indeed, they are *exactly* the same
value -- and is not an exactly representable floating-point value.

If unencoded_size != 0, and (somehow) encoded_size == 0, then

   unencoded_size / encoded_size = Inf

which is indeed the limit of x -> 0, n / x.

Which is *also* printable by %0.2f.

I still contend that the middle if should be removed, and you should print out
whatever's left.  Either NaN or Inf is instructive.  Certainly nothing in the
middle cares about the actual value.

Hmm OK; I'll admit to not liking NaN/Inf in output.

Dave



OK. To deal with the reported issue, how about using FLT_MAX (as opposed 
to UINT64_MAX or inf):

xbzrle_counters.encoding_rate = FLT_MAX;


Best,
Wei





Re: [PATCH v3] migration/xbzrle: add encoding rate

2020-06-04 Thread Wei Wang

On 06/04/2020 11:22 AM, Richard Henderson wrote:

On 6/3/20 7:58 PM, Wei Wang wrote:

It is possible that encoded_size==0, but unencoded_size !=0. For example,
a page is written with the same data that it already has.

That really contains 0 bytes?
Not even the ones that say "same data"?


Yes. It's just a delta operation; the diff (encoded_size) is 0 in that case.



You certainly have a magical compression algorithm there.
Or bad accounting.


The encoding_rate is expected to reflect if the page is xbzrle encoding 
friendly.
The larger, the more friendly, so 0 might not be a good representation here.

Maybe, we could change UINT64_MAX above to "~0ULL" to avoid the issue?

~0ull is no different than UINT64_MAX -- indeed, they are *exactly* the same
value -- and is not an exactly representable floating-point value.

If unencoded_size != 0, and (somehow) encoded_size == 0, then

   unencoded_size / encoded_size = Inf

which is indeed the limit of x -> 0, n / x.

Which is *also* printable by %0.2f.

I still contend that the middle if should be removed, and you should print out
whatever's left.  Either NaN or Inf is instructive.  Certainly nothing in the
middle cares about the actual value.



OK, leaving it as "inf" looks good to me. Will send a patch to remove the
middle branch. Thanks!


Best,
Wei
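
To make the "identical rewrite" case above concrete, here is a minimal self-contained sketch of a delta encoder in the same spirit (this is not QEMU's xbzrle_encode_buffer; toy_delta_encode and its output format are made up for illustration). A page rewritten with the very same contents produces an empty delta, so encoded_size is 0 while unencoded_size is still a full page:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Returns how many delta bytes a run-length style encoder would emit:
 * 0 when the new page is byte-for-byte identical to the cached copy. */
static int toy_delta_encode(const unsigned char *old_buf,
                            const unsigned char *new_buf, int len)
{
    int i = 0, encoded = 0;

    while (i < len) {
        while (i < len && old_buf[i] == new_buf[i]) {
            i++;                       /* skip the run of unchanged bytes */
        }
        if (i == len) {
            break;
        }
        int start = i;
        while (i < len && old_buf[i] != new_buf[i]) {
            i++;                       /* measure the run of changed bytes */
        }
        /* a real encoder would emit: zero-run length, data-run length, data */
        encoded += 2 + (i - start);
    }
    return encoded;
}

int main(void)
{
    unsigned char cached[PAGE_SIZE], current[PAGE_SIZE];

    memset(cached, 0xAA, PAGE_SIZE);
    memcpy(current, cached, PAGE_SIZE);     /* guest rewrote identical data */
    printf("identical rewrite -> encoded_size = %d\n",
           toy_delta_encode(cached, current, PAGE_SIZE));   /* prints 0 */

    current[100] ^= 0xFF;                   /* now one byte differs */
    printf("one byte changed  -> encoded_size = %d\n",
           toy_delta_encode(cached, current, PAGE_SIZE));   /* prints 3 */
    return 0;
}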






Re: [PATCH v3] migration/xbzrle: add encoding rate

2020-06-03 Thread Wei Wang

On 06/04/2020 03:28 AM, Richard Henderson wrote:

On Wed, 29 Apr 2020 at 18:54, Wei Wang  wrote:

+if (xbzrle_counters.pages == rs->xbzrle_pages_prev) {
+xbzrle_counters.encoding_rate = 0;
+} else if (!encoded_size) {
+xbzrle_counters.encoding_rate = UINT64_MAX;
+} else {
+xbzrle_counters.encoding_rate = unencoded_size / encoded_size;
+}

With clang 10, this produces

   CC  aarch64-softmmu/migration/ram.o
/home/rth/qemu/qemu/migration/ram.c:919:45: error: implicit conversion
from 'unsigned long' to 'double' changes value from
18446744073709551615 to 18446744073709551616
[-Werror,-Wimplicit-int-float-conversion]
 xbzrle_counters.encoding_rate = UINT64_MAX;
   ~ ^~
/usr/include/stdint.h:130:23: note: expanded from macro 'UINT64_MAX'
# define UINT64_MAX (__UINT64_C(18446744073709551615))
  ^~~~
/usr/include/stdint.h:107:25: note: expanded from macro '__UINT64_C'
#  define __UINT64_C(c) c ## UL
 ^~~
:36:1: note: expanded from here
18446744073709551615UL
^~
1 error generated.

UINT64_MAX appears both arbitrary and wrong.

The only way I can imagine encoded_size == 0 is unencoded_size == 0,
so 0 seems like the correct answer.  Moreover, it really seems like
the first test sufficiently covers that possibility.


It is possible that encoded_size==0, but unencoded_size !=0. For example,
a page is written with the same data that it already has.




In addition, the only user of this value is


+monitor_printf(mon, "xbzrle encoding rate: %0.2f\n",
+   info->xbzrle_cache->encoding_rate);

which would be quite happy to print NaN even if the 0/0 computation
were to run.  Though as I say above, I don't think that's reachable.


The encoding_rate is expected to reflect if the page is xbzrle encoding 
friendly.

The larger, the more friendly, so 0 might not be a good representation here.

Maybe, we could change UINT64_MAX above to "~0ULL" to avoid the issue?

Best,
Wei



[PATCH v3] migration/xbzrle: add encoding rate

2020-04-29 Thread Wei Wang
Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---
 migration/migration.c |  1 +
 migration/ram.c   | 39 +--
 monitor/hmp-cmds.c|  2 ++
 qapi/migration.json   |  5 -
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 187ac0410c..e40421353c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -930,6 +930,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->xbzrle_cache->pages = xbzrle_counters.pages;
 info->xbzrle_cache->cache_miss = xbzrle_counters.cache_miss;
 info->xbzrle_cache->cache_miss_rate = xbzrle_counters.cache_miss_rate;
+info->xbzrle_cache->encoding_rate = xbzrle_counters.encoding_rate;
 info->xbzrle_cache->overflow = xbzrle_counters.overflow;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 04f13feb2e..41b75a0a0f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -327,6 +327,10 @@ struct RAMState {
 uint64_t num_dirty_pages_period;
 /* xbzrle misses since the beginning of the period */
 uint64_t xbzrle_cache_miss_prev;
+/* Amount of xbzrle pages since the beginning of the period */
+uint64_t xbzrle_pages_prev;
+/* Amount of xbzrle encoded bytes since the beginning of the period */
+uint64_t xbzrle_bytes_prev;
 
 /* compression statistics since the beginning of the period */
 /* amount of count that no free thread to compress data */
@@ -696,6 +700,18 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 return -1;
 }
 
+/*
+ * Reaching here means the page has hit the xbzrle cache, no matter what
+ * encoding result it is (normal encoding, overflow or skipping the page),
+ * count the page as encoded. This is used to calculate the encoding rate.
+ *
+ * Example: 2 pages (8KB) being encoded, first page encoding generates 2KB,
+ * 2nd page turns out to be skipped (i.e. no new bytes written to the
+ * page), the overall encoding rate will be 8KB / 2KB = 4, which has the
+ * skipped page included. In this way, the encoding rate can tell if the
+ * guest page is good for xbzrle encoding.
+ */
+xbzrle_counters.pages++;
 prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
 
 /* save current buffer into memory */
@@ -726,6 +742,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 } else if (encoded_len == -1) {
 trace_save_xbzrle_page_overflow();
 xbzrle_counters.overflow++;
+xbzrle_counters.bytes += TARGET_PAGE_SIZE;
 return -1;
 }
 
@@ -736,8 +753,12 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 qemu_put_be16(rs->f, encoded_len);
 qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
 bytes_xbzrle += encoded_len + 1 + 2;
-xbzrle_counters.pages++;
-xbzrle_counters.bytes += bytes_xbzrle;
+/*
+ * Like compressed_size (please see update_compress_thread_counts),
+ * the xbzrle encoded bytes don't count the 8 byte header with
+ * RAM_SAVE_FLAG_CONTINUE.
+ */
+xbzrle_counters.bytes += bytes_xbzrle - 8;
 ram_counters.transferred += bytes_xbzrle;
 
 return 1;
@@ -870,9 +891,23 @@ static void migration_update_rates(RAMState *rs, int64_t 
end_time)
 }
 
 if (migrate_use_xbzrle()) {
+double encoded_size, unencoded_size;
+
 xbzrle_counters.cache_miss_rate = (double)(xbzrle_counters.cache_miss -
 rs->xbzrle_cache_miss_prev) / page_count;
 rs->xbzrle_cache_miss_prev = xbzrle_counters.cache_miss;
+unencoded_size = (xbzrle_counters.pages - rs->xbzrle_pages_prev) *
+ TARGET_PAGE_SIZE;
+encoded_size = xbzrle_counters.bytes - rs->xbzrle_bytes_prev;
+if (xbzrle_counters.pages == rs->xbzrle_pages_prev) {
+xbzrle_counters.encoding_rate = 0;
+} else if (!encoded_size) {
+xbzrle_counters.encoding_rate = UINT64_MAX;
+} else {
+xbzrle_counters.encoding_rate = unencoded_size / encoded_size;
+}
+rs->xbzrle_pages_prev = xbzrle_counters.pages;
+rs->xbzrle_bytes_prev = xbzrle_counters.bytes;
 }
 
 if (migrate_use_compression()) {
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9b94e67879..c2a3a667ae 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -303,6 +303,8 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->xbzrle_cache->cache_miss);
 monitor_printf(mon, "xbzrle cache miss rate: %0.2f\n",
info->xbzrle_cache->cache_miss_rate);
+mo

Re: [PATCH v2] migration/xbzrle: add encoding rate

2020-04-28 Thread Wei Wang

On 04/28/2020 10:51 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---
  migration/migration.c |  1 +
  migration/ram.c   | 38 --
  monitor/hmp-cmds.c|  2 ++
  qapi/migration.json   |  5 -
  4 files changed, 43 insertions(+), 3 deletions(-)

ChangeLog:
- include the 3 bytes (ENCODING_FLAG_XBZRLE flag and encoded_len) when
   calculating the encoding rate. Similar to the compress rate
   calculation, the 8 byte RAM_SAVE_FLAG_CONTINUE flag isn't included in
   the calculation.

diff --git a/migration/migration.c b/migration/migration.c
index 187ac04..e404213 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -930,6 +930,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
  info->xbzrle_cache->pages = xbzrle_counters.pages;
  info->xbzrle_cache->cache_miss = xbzrle_counters.cache_miss;
  info->xbzrle_cache->cache_miss_rate = xbzrle_counters.cache_miss_rate;
+info->xbzrle_cache->encoding_rate = xbzrle_counters.encoding_rate;
  info->xbzrle_cache->overflow = xbzrle_counters.overflow;
  }
  
diff --git a/migration/ram.c b/migration/ram.c

index 04f13fe..f46ab96 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -327,6 +327,10 @@ struct RAMState {
  uint64_t num_dirty_pages_period;
  /* xbzrle misses since the beginning of the period */
  uint64_t xbzrle_cache_miss_prev;
+/* Amount of xbzrle pages since the beginning of the period */
+uint64_t xbzrle_pages_prev;
+/* Amount of xbzrle encoded bytes since the beginning of the period */
+uint64_t xbzrle_bytes_prev;
  
  /* compression statistics since the beginning of the period */

  /* amount of count that no free thread to compress data */
@@ -696,6 +700,18 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
  return -1;
  }
  
+/*

+ * Reaching here means the page has hit the xbzrle cache, no matter what
+ * encoding result it is (normal encoding, overflow or skipping the page),
+ * count the page as encoded. This is used to calculate the encoding rate.
+ *
+ * Example: 2 pages (8KB) being encoded, first page encoding generates 2KB,
+ * 2nd page turns out to be skipped (i.e. no new bytes written to the
+ * page), the overall encoding rate will be 8KB / 2KB = 4, which has the
+ * skipped page included. In this way, the encoding rate can tell if the
+ * guest page is good for xbzrle encoding.
+ */
+xbzrle_counters.pages++;

Can you explain how that works with overflow?  Doesn't overflow return
-1 here, not increment the bytes, so it looks like you've xbzrle'd a
page, but the encoding rate hasn't incremented.


OK. How about adding below before returning -1 (for the overflow case):

...
xbzrle_counters.bytes += TARGET_PAGE_SIZE;
return -1;

Example: if we have 2 pages encoded as below:
4KB--> after normal encoding: 2KB
4KB--> after overflow: 4KB (will be sent as non-encoded page)
then the encoding rate is 8KB / 6KB = ~1.3
(if we skip the counting of the overflow case,
the encoding rate will be 4KB/ 2KB=2. Users may think that's
good to go with xbzrle, unless they do another calculation with
checking the overflow numbers themselves)

Best,
Wei
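
The arithmetic in the example above, written out as a tiny standalone program (the 2KB/4KB/8KB figures are the hypothetical numbers from that example, not measurements):

#include <stdio.h>

int main(void)
{
    double page = 4096.0;             /* TARGET_PAGE_SIZE */
    double normal_encoded = 2048.0;   /* page 1 encodes down to 2KB */
    double overflow_sent = page;      /* page 2 overflows, sent unencoded */

    /* Overflow page counted in both numerator and denominator:
     * 8KB / 6KB ~= 1.33, reflecting the real benefit of xbzrle. */
    printf("rate, overflow counted: %0.2f\n",
           (2 * page) / (normal_encoded + overflow_sent));

    /* Overflow page ignored: 4KB / 2KB = 2.00, which overstates how
     * xbzrle-friendly the workload is unless the user also checks the
     * overflow counter. */
    printf("rate, overflow ignored: %0.2f\n", page / normal_encoded);
    return 0;
}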



[PATCH v2] migration/xbzrle: add encoding rate

2020-04-27 Thread Wei Wang
Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---
 migration/migration.c |  1 +
 migration/ram.c   | 38 --
 monitor/hmp-cmds.c|  2 ++
 qapi/migration.json   |  5 -
 4 files changed, 43 insertions(+), 3 deletions(-)

ChangeLog:
- include the 3 bytes (ENCODING_FLAG_XBZRLE flag and encoded_len) when
  calculating the encoding rate. Similar to the compress rate
  calculation, the 8 byte RAM_SAVE_FLAG_CONTINUE flag isn't included in
  the calculation.

diff --git a/migration/migration.c b/migration/migration.c
index 187ac04..e404213 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -930,6 +930,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->xbzrle_cache->pages = xbzrle_counters.pages;
 info->xbzrle_cache->cache_miss = xbzrle_counters.cache_miss;
 info->xbzrle_cache->cache_miss_rate = xbzrle_counters.cache_miss_rate;
+info->xbzrle_cache->encoding_rate = xbzrle_counters.encoding_rate;
 info->xbzrle_cache->overflow = xbzrle_counters.overflow;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 04f13fe..f46ab96 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -327,6 +327,10 @@ struct RAMState {
 uint64_t num_dirty_pages_period;
 /* xbzrle misses since the beginning of the period */
 uint64_t xbzrle_cache_miss_prev;
+/* Amount of xbzrle pages since the beginning of the period */
+uint64_t xbzrle_pages_prev;
+/* Amount of xbzrle encoded bytes since the beginning of the period */
+uint64_t xbzrle_bytes_prev;
 
 /* compression statistics since the beginning of the period */
 /* amount of count that no free thread to compress data */
@@ -696,6 +700,18 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 return -1;
 }
 
+/*
+ * Reaching here means the page has hit the xbzrle cache, no matter what
+ * encoding result it is (normal encoding, overflow or skipping the page),
+ * count the page as encoded. This is used to calculate the encoding rate.
+ *
+ * Example: 2 pages (8KB) being encoded, first page encoding generates 2KB,
+ * 2nd page turns out to be skipped (i.e. no new bytes written to the
+ * page), the overall encoding rate will be 8KB / 2KB = 4, which has the
+ * skipped page included. In this way, the encoding rate can tell if the
+ * guest page is good for xbzrle encoding.
+ */
+xbzrle_counters.pages++;
 prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
 
 /* save current buffer into memory */
@@ -736,8 +752,12 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 qemu_put_be16(rs->f, encoded_len);
 qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
 bytes_xbzrle += encoded_len + 1 + 2;
-xbzrle_counters.pages++;
-xbzrle_counters.bytes += bytes_xbzrle;
+/*
+ * Like compressed_size (please see update_compress_thread_counts),
+ * the xbzrle encoded bytes don't count the 8 byte header with
+ * RAM_SAVE_FLAG_CONTINUE.
+ */
+xbzrle_counters.bytes += bytes_xbzrle - 8;
 ram_counters.transferred += bytes_xbzrle;
 
 return 1;
@@ -870,9 +890,23 @@ static void migration_update_rates(RAMState *rs, int64_t 
end_time)
 }
 
 if (migrate_use_xbzrle()) {
+double encoded_size, unencoded_size;
+
 xbzrle_counters.cache_miss_rate = (double)(xbzrle_counters.cache_miss -
 rs->xbzrle_cache_miss_prev) / page_count;
 rs->xbzrle_cache_miss_prev = xbzrle_counters.cache_miss;
+unencoded_size = (xbzrle_counters.pages - rs->xbzrle_pages_prev) *
+ TARGET_PAGE_SIZE;
+encoded_size = xbzrle_counters.bytes - rs->xbzrle_bytes_prev;
+if (xbzrle_counters.pages == rs->xbzrle_pages_prev) {
+xbzrle_counters.encoding_rate = 0;
+} else if (!encoded_size) {
+xbzrle_counters.encoding_rate = UINT64_MAX;
+} else {
+xbzrle_counters.encoding_rate = unencoded_size / encoded_size;
+}
+rs->xbzrle_pages_prev = xbzrle_counters.pages;
+rs->xbzrle_bytes_prev = xbzrle_counters.bytes;
 }
 
 if (migrate_use_compression()) {
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9b94e67..c2a3a66 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -303,6 +303,8 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->xbzrle_cache->cache_miss);
 monitor_printf(mon, "xbzrle cache miss rate: %0.2f\n",
info->xbzrle_cache->cache_miss_rate);
+monitor_printf(mon, "xbzrle encoding rate: %0.2f\n",
+  

Re: [PATCH v1 1/2] migration/xbzrle: replace transferred xbzrle bytes with encoded bytes

2020-04-27 Thread Wei Wang

On 04/24/2020 06:47 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 04/22/2020 03:21 AM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

Like compressed_size which indicates how many bytes are compressed, we
need encoded_size to understand how many bytes are encoded with xbzrle
during migration.

Replace the old xbzrle_counter.bytes, instead of adding a new counter,
because we don't find a usage of xbzrle_counter.bytes currently, which
includes 3 more bytes of the migration transfer protocol header (in
addition to the encoding header). The encoded_size will further be used
to calculate the encoding rate.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 

Can you explain why these 3 bytes matter?  Certainly the 2 bytes of the
encoded_len are an overhead that's a cost of using XBZRLE; so if you're
trying to figure out whether xbzrle is worth it, then you should include
those 2 bytes in the cost.
That other byte, that holds ENCODING_FLAG_XBZRLE also seems to be pure
overhead of XBZRLE; so your cost of using XBZRLE really does include
those 3 bytes.

So to me it makes sense to include the 3 bytes as it currently does.

Dave

Thanks Dave for sharing your thoughts.

We hope to do a fair comparison of the compression rate and the xbzrle encoding
rate.
The current compression_rate doesn't include the migration flag overhead
(please see update_compress_thread_counts()). So for the xbzrle encoding rate,
we wanted it not to include the migration protocol flags either (but the 2-byte
xbzrle encoding overhead is kept there, as the compression rate includes the
compression header overhead).

Or would you think it is necessary to add the migration flag (8 bytes) for
compression
when calculating the compression rate?

I don't think the migration flag (8 bytes) matters, because everyone has
that; but isn't this patch about the 3 bytes (1 byte
ENCODING_FLAG_XBZRLE) (2 byte encoded_len)?

The 2 byte encoded_len in this code corresponds to the 4 byte blen in
qemu_put_compression_data; I'm not sure, but I think that 4 bytes is
included in the length update_compress_thread_counts() sees - if so,
that makes it equivalent to including the length.



Right, that makes sense, thanks.

Best,
Wei
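
As a small standalone sketch of the byte accounting discussed in this thread (encoded_len here is a hypothetical value; the 8-, 1- and 2-byte sizes are the ones quoted above), the difference between what goes on the wire and what xbzrle_counters.bytes accumulates is just the common 8-byte migration header:

#include <stdio.h>

int main(void)
{
    int encoded_len = 200;        /* hypothetical xbzrle delta size */
    int ram_save_header = 8;      /* offset | flags header, common to all pages */
    int xbzrle_flag = 1;          /* ENCODING_FLAG_XBZRLE byte */
    int xbzrle_len_field = 2;     /* the be16 encoded_len field */

    int wire_bytes = ram_save_header + xbzrle_flag + xbzrle_len_field
                     + encoded_len;
    /* "bytes_xbzrle - 8" in the patch: keep the xbzrle-specific cost (flag +
     * length + delta) and drop the generic migration header, mirroring how
     * compressed_size is accounted. */
    int counted_bytes = wire_bytes - ram_save_header;

    printf("on the wire: %d bytes, counted by xbzrle_counters.bytes: %d bytes\n",
           wire_bytes, counted_bytes);
    return 0;
}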





Re: [PATCH v1 1/2] migration/xbzrle: replace transferred xbzrle bytes with encoded bytes

2020-04-21 Thread Wei Wang

On 04/22/2020 03:21 AM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

Like compressed_size which indicates how many bytes are compressed, we
need encoded_size to understand how many bytes are encoded with xbzrle
during migration.

Replace the old xbzrle_counter.bytes, instead of adding a new counter,
because we don't find a usage of xbzrle_counter.bytes currently, which
includes 3 more bytes of the migration transfer protocol header (in
addition to the encoding header). The encoded_size will further be used
to calculate the encoding rate.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 

Can you explain why these 3 bytes matter?  Certainly the 2 bytes of the
encoded_len are an overhead that's a cost of using XBZRLE; so if you're
trying to figure out whether xbzrle is worth it, then you should include
those 2 bytes in the cost.
That other byte, that holds ENCODING_FLAG_XBZRLE also seems to be pure
overhead of XBZRLE; so your cost of using XBZRLE really does include
those 3 bytes.

So to me it makes sense to include the 3 bytes as it currently does.

Dave


Thanks Dave for sharing your thoughts.

We hope to do a fair comparison of the compression rate and the xbzrle encoding
rate.
The current compression_rate doesn't include the migration flag overhead
(please see update_compress_thread_counts()). So for the xbzrle encoding rate,
we wanted it not to include the migration protocol flags either (but the 2-byte
xbzrle encoding overhead is kept there, as the compression rate includes the
compression header overhead).

Or would you think it is necessary to add the migration flag (8 bytes) 
for compression

when calculating the compression rate?

Best,
Wei



Re: [PATCH v1 2/2] migration/xbzrle: add encoding rate

2020-04-20 Thread Wei Wang

On 04/20/2020 10:53 PM, Eric Blake wrote:

On 4/19/20 10:06 PM, Wei Wang wrote:

Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---


In addition to Dan's review comments,


+++ b/qapi/migration.json
@@ -70,6 +70,8 @@
  #
  # @cache-miss-rate: rate of cache miss (since 2.1)
  #
+# @encoding-rate: rate of encoded bytes


This is missing a '(since 5.1)' tag.


OK, will add it, thanks.

Best,
Wei



Re: [PATCH v1 1/2] migration/xbzrle: replace transferred xbzrle bytes with encoded bytes

2020-04-20 Thread Wei Wang

On 04/20/2020 05:29 PM, Daniel P. Berrangé wrote:

On Mon, Apr 20, 2020 at 11:06:42AM +0800, Wei Wang wrote:

Like compressed_size which indicates how many bytes are compressed, we
need encoded_size to understand how many bytes are encoded with xbzrle
during migration.

Replace the old xbzrle_counter.bytes, instead of adding a new counter,
because we don't find a usage of xbzrle_counter.bytes currently, which
includes 3 more bytes of the migration transfer protocol header (in
addition to the encoding header). The encoded_size will further be used
to calculate the encoding rate.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---
  migration/migration.c |  2 +-
  migration/ram.c   | 18 +-
  monitor/hmp-cmds.c|  4 ++--
  qapi/migration.json   |  6 +++---
  4 files changed, 15 insertions(+), 15 deletions(-)



diff --git a/qapi/migration.json b/qapi/migration.json
index eca2981d0a..bf195ff6ac 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -62,7 +62,7 @@
  #
  # @cache-size: XBZRLE cache size
  #
-# @bytes: amount of bytes already transferred to the target VM
+# @encoded_size: amount of bytes encoded

Woah, this is part of QEMU's public API, so it isn't permissible to just
arbitrarily remove a field with no warning, and replace it with a new
field reporting different data. Adding a new field is allowed, but any
existing field should be deprecated first, if there is a genuine need
to remove it. If it isn't costly though, just leave the existing field
unchanged.

I would also note that the other fields in this struct use a hyphen, not
an underscore.

OK. Thanks for reviewing and pointing it out.
We can add it as a new field using a hyphen in this case.
Will wait for other comments before posting a new version.

Best,
Wei



[PATCH v1 1/2] migration/xbzrle: replace transferred xbzrle bytes with encoded bytes

2020-04-19 Thread Wei Wang
Like compressed_size which indicates how many bytes are compressed, we
need encoded_size to understand how many bytes are encoded with xbzrle
during migration.

Replace the old xbzrle_counter.bytes, instead of adding a new counter,
because we don't find a usage of xbzrle_counter.bytes currently, which
includes 3 more bytes of the migration transfer protocol header (in
addition to the encoding header). The encoded_size will further be used
to calculate the encoding rate.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---
 migration/migration.c |  2 +-
 migration/ram.c   | 18 +-
 monitor/hmp-cmds.c|  4 ++--
 qapi/migration.json   |  6 +++---
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 187ac0410c..8e7ee0d76b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -926,7 +926,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->has_xbzrle_cache = true;
 info->xbzrle_cache = g_malloc0(sizeof(*info->xbzrle_cache));
 info->xbzrle_cache->cache_size = migrate_xbzrle_cache_size();
-info->xbzrle_cache->bytes = xbzrle_counters.bytes;
+info->xbzrle_cache->encoded_size = xbzrle_counters.encoded_size;
 info->xbzrle_cache->pages = xbzrle_counters.pages;
 info->xbzrle_cache->cache_miss = xbzrle_counters.cache_miss;
 info->xbzrle_cache->cache_miss_rate = xbzrle_counters.cache_miss_rate;
diff --git a/migration/ram.c b/migration/ram.c
index 04f13feb2e..bca5878f25 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -677,7 +677,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 ram_addr_t current_addr, RAMBlock *block,
 ram_addr_t offset, bool last_stage)
 {
-int encoded_len = 0, bytes_xbzrle;
+int encoded_size = 0, bytes_xbzrle;
 uint8_t *prev_cached_page;
 
 if (!cache_is_cached(XBZRLE.cache, current_addr,
@@ -702,7 +702,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
 
 /* XBZRLE encoding (if there is no overflow) */
-encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
+encoded_size = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
TARGET_PAGE_SIZE);
 
@@ -710,7 +710,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
  * Update the cache contents, so that it corresponds to the data
  * sent, in all cases except where we skip the page.
  */
-if (!last_stage && encoded_len != 0) {
+if (!last_stage && encoded_size != 0) {
 memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
 /*
  * In the case where we couldn't compress, ensure that the caller
@@ -720,10 +720,10 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 *current_data = prev_cached_page;
 }
 
-if (encoded_len == 0) {
+if (encoded_size == 0) {
 trace_save_xbzrle_page_skipping();
 return 0;
-} else if (encoded_len == -1) {
+} else if (encoded_size == -1) {
 trace_save_xbzrle_page_overflow();
 xbzrle_counters.overflow++;
 return -1;
@@ -733,11 +733,11 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 bytes_xbzrle = save_page_header(rs, rs->f, block,
 offset | RAM_SAVE_FLAG_XBZRLE);
 qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
-qemu_put_be16(rs->f, encoded_len);
-qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
-bytes_xbzrle += encoded_len + 1 + 2;
+qemu_put_be16(rs->f, encoded_size);
+qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_size);
+bytes_xbzrle += encoded_size + 1 + 2;
 xbzrle_counters.pages++;
-xbzrle_counters.bytes += bytes_xbzrle;
+xbzrle_counters.encoded_size += encoded_size;
 ram_counters.transferred += bytes_xbzrle;
 
 return 1;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9b94e67879..6d3b35ca64 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -295,8 +295,8 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 if (info->has_xbzrle_cache) {
 monitor_printf(mon, "cache size: %" PRIu64 " bytes\n",
info->xbzrle_cache->cache_size);
-monitor_printf(mon, "xbzrle transferred: %" PRIu64 " kbytes\n",
-   info->xbzrle_cache->bytes >> 10);
+monitor_printf(mon, "xbzrle encoded size: %" PRIu64 " kbytes\n",
+   info->xbzrle_cache->encoded_size >> 10);

[PATCH v1 2/2] migration/xbzrle: add encoding rate

2020-04-19 Thread Wei Wang
Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---
 migration/migration.c |  1 +
 migration/ram.c   | 31 +--
 monitor/hmp-cmds.c|  2 ++
 qapi/migration.json   |  5 -
 slirp |  2 +-
 5 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8e7ee0d76b..84a556a4ac 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -930,6 +930,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->xbzrle_cache->pages = xbzrle_counters.pages;
 info->xbzrle_cache->cache_miss = xbzrle_counters.cache_miss;
 info->xbzrle_cache->cache_miss_rate = xbzrle_counters.cache_miss_rate;
+info->xbzrle_cache->encoding_rate = xbzrle_counters.encoding_rate;
 info->xbzrle_cache->overflow = xbzrle_counters.overflow;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index bca5878f25..c87c38ec80 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -327,6 +327,10 @@ struct RAMState {
 uint64_t num_dirty_pages_period;
 /* xbzrle misses since the beginning of the period */
 uint64_t xbzrle_cache_miss_prev;
+/* Amount of xbzrle pages since the beginning of the period */
+uint64_t xbzrle_pages_prev;
+/* Amount of encoded bytes since the beginning of the period */
+uint64_t encoded_size_prev;
 
 /* compression statistics since the beginning of the period */
 /* amount of count that no free thread to compress data */
@@ -696,6 +700,18 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 return -1;
 }
 
+/*
+ * Reaching here means the page has hit the xbzrle cache, no matter what
+ * encoding result it is (normal encoding, overflow or skipping the page),
+ * count the page as encoded. This is used to calculate the encoding rate.
+ *
+ * Example: 2 pages (8KB) being encoded, first page encoding generates 2KB,
+ * 2nd page turns out to be skipped (i.e. no new bytes written to the
+ * page), the overall encoding rate will be 8KB / 2KB = 4, which has the
+ * skipped page included. In this way, the encoding rate can tell if the
+ * guest page is good for xbzrle encoding.
+ */
+xbzrle_counters.pages++;
 prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
 
 /* save current buffer into memory */
@@ -736,7 +752,6 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 qemu_put_be16(rs->f, encoded_size);
 qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_size);
 bytes_xbzrle += encoded_size + 1 + 2;
-xbzrle_counters.pages++;
 xbzrle_counters.encoded_size += encoded_size;
 ram_counters.transferred += bytes_xbzrle;
 
@@ -859,7 +874,7 @@ uint64_t ram_get_total_transferred_pages(void)
 static void migration_update_rates(RAMState *rs, int64_t end_time)
 {
 uint64_t page_count = rs->target_page_count - rs->target_page_count_prev;
-double compressed_size;
+double compressed_size, encoded_size, unencoded_size;
 
 /* calculate period counters */
 ram_counters.dirty_pages_rate = rs->num_dirty_pages_period * 1000
@@ -873,6 +888,18 @@ static void migration_update_rates(RAMState *rs, int64_t 
end_time)
 xbzrle_counters.cache_miss_rate = (double)(xbzrle_counters.cache_miss -
 rs->xbzrle_cache_miss_prev) / page_count;
 rs->xbzrle_cache_miss_prev = xbzrle_counters.cache_miss;
+unencoded_size = (xbzrle_counters.pages - rs->xbzrle_pages_prev) *
+ TARGET_PAGE_SIZE;
+encoded_size = xbzrle_counters.encoded_size - rs->encoded_size_prev;
+if (xbzrle_counters.pages == rs->xbzrle_pages_prev) {
+xbzrle_counters.encoding_rate = 0;
+} else if (!encoded_size) {
+xbzrle_counters.encoding_rate = UINT64_MAX;
+} else {
+xbzrle_counters.encoding_rate = unencoded_size / encoded_size;
+}
+rs->xbzrle_pages_prev = xbzrle_counters.pages;
+rs->encoded_size_prev = xbzrle_counters.encoded_size;
 }
 
 if (migrate_use_compression()) {
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 6d3b35ca64..07f41e582f 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -303,6 +303,8 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->xbzrle_cache->cache_miss);
 monitor_printf(mon, "xbzrle cache miss rate: %0.2f\n",
info->xbzrle_cache->cache_miss_rate);
+monitor_printf(mon, "xbzrle encoding rate: %0.2f\n",
+   info->xbzrle_cache->encoding_rate);
 monitor_printf(mon, "xbz

[PATCH v1 0/2] Migration xbzrle changes

2020-04-19 Thread Wei Wang
This patches change/add some xbzrle statistics to give users some useful
feedbacks about the delta operations.

Wei Wang (2):
  migration/xbzrle: replace transferred xbzrle bytes with encoded bytes
  migration/xbzrle: add encoding rate

 migration/migration.c |  3 ++-
 migration/ram.c   | 49 +--
 monitor/hmp-cmds.c|  6 --
 qapi/migration.json   | 11 ++
 slirp |  2 +-
 5 files changed, 52 insertions(+), 19 deletions(-)

-- 
2.20.1




Re: [Qemu-devel] [PATCH] migration/ram.c: Fix codes conflict about bitmap_mutex

2019-04-01 Thread Wei Wang

On 03/30/2019 06:29 AM, Zhang Chen wrote:

From: Zhang Chen 

I found that upstream code conflicts with COLO and leads to a crash,
and I traced it to this patch:

commit 386a907b37a9321bc5d699bc37104d6ffba1b34d
Author: Wei Wang 
Date:   Tue Dec 11 16:24:49 2018 +0800

migration: use bitmap_mutex in migration_bitmap_clear_dirty

My colleague Wei's patch adds bitmap_mutex in migration_bitmap_clear_dirty,
but COLO didn't initialize the bitmap_mutex. So we always get an error
when COLO starts up, like this:
qemu-system-x86_64: util/qemu-thread-posix.c:64: qemu_mutex_lock_impl: Assertion 
`mutex->initialized' failed.

This patch adds bitmap_mutex initialization and destruction to the COLO
lifecycle.

Signed-off-by: Zhang Chen 
---
  migration/ram.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index d7f8fe45a8..f68bff 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3918,6 +3918,7 @@ int colo_init_ram_cache(void)
  }
  ram_state = g_new0(RAMState, 1);
  ram_state->migration_dirty_pages = 0;
+qemu_mutex_init(&ram_state->bitmap_mutex);
  memory_global_dirty_log_start();
  
  return 0;

@@ -3956,6 +3957,7 @@ void colo_release_ram_cache(void)
  }
  
  rcu_read_unlock();

+qemu_mutex_destroy(&ram_state->bitmap_mutex);
  g_free(ram_state);
  ram_state = NULL;
  }


Reviewed-by: Wei Wang 

Best,
Wei



[Qemu-devel] [PATCH] virtio-balloon: fix a use-after-free case

2019-03-12 Thread Wei Wang
The elem could theoretically contain both outbufs and inbufs. Move the
free operation to the end of the function to avoid using elem->in_sg
after elem has been freed.

Fixes: c13c4153f7
("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Reported-by: Peter Maydell 
Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e3a6594..b614552 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -391,6 +391,7 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 VirtQueueElement *elem;
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtQueue *vq = dev->free_page_vq;
+bool ret = true;
 
 while (dev->block_iothread) {
 qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
@@ -405,13 +406,12 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 uint32_t id;
 size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
 &id, sizeof(id));
-virtqueue_push(vq, elem, size);
-g_free(elem);
 
 virtio_tswap32s(vdev, &id);
 if (unlikely(size != sizeof(id))) {
 virtio_error(vdev, "received an incorrect cmd id");
-return false;
+ret = false;
+goto out;
 }
 if (id == dev->free_page_report_cmd_id) {
 dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
@@ -431,11 +431,12 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
-virtqueue_push(vq, elem, 1);
-g_free(elem);
 }
 
-return true;
+out:
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+return ret;
 }
 
 static void virtio_ballloon_get_free_page_hints(void *opaque)
-- 
1.8.3.1




Re: [Qemu-devel] [PULL 19/22] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-03-09 Thread Wei Wang

On 03/09/2019 02:30 AM, Dr. David Alan Gilbert wrote:

* Peter Maydell (peter.mayd...@linaro.org) wrote:

On Wed, 6 Mar 2019 at 11:55, Dr. David Alan Gilbert (git)
 wrote:

From: Wei Wang 


Wei: Can you look at this


Sure, I'll send a followup patch to fix this.





The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when an error occurs in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
Message-Id: <1544516693-5395-8-git-send-email-wei.w.w...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dr. David Alan Gilbert 
   dgilbert: Dropped kernel header update, fixed up CMD_ID_* name change
---

Hi -- Coverity points out a use-after-free here (CID 1399412):


+static bool get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->block_iothread) {
+qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return false;
+}
+
+if (elem->out_num) {
+uint32_t id;
+size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
+ &id, sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);

Here we free elem...


+
+virtio_tswap32s(vdev, &id);
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return false;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+}
+}
+}
+
+if (elem->in_num) {

...but we can fall through here and try to dereference elem...


+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 1);
+g_free(elem);

...and then free it again.

OK, so the question here is:
   Is it allowed to have both in and out in the same queue element; if
it's not then we need to error the device.
   If it is allowed then we need to fix up the out_num case.



I think it is allowed. From virtqueue_pop, an elem could consist of 
several inbufs and outbufs.


Probably we could just delay the free till the end of this
function. Will post the fix patch for review soon.


Best,
Wei



Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-03-05 Thread Wei Wang

On 03/05/2019 10:50 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when an error occurs in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
  hw/virtio/virtio-balloon.c  | 263 
  include/hw/virtio/virtio-balloon.h  |  28 ++-
  include/standard-headers/linux/virtio_balloon.h |   5 +
  3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
+config.free_page_report_cmd_id =
+   cpu_to_le32(dev->free_page_report_cmd_id);
+} else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
+config.free_page_report_cmd_id =
+   cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+} else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_DONE) {
+config.free_page_report_cmd_id =
+   cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_DONE_ID);
+}

It looks like somewhere in the last 3 months the name in the kernel
changed; so I think I've fixed this correctly but please shout if it's
wrong:

 if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
 config.free_page_report_cmd_id =
cpu_to_le32(dev->free_page_report_cmd_id);
 } else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
 config.free_page_report_cmd_id =
cpu_to_le32(VIRTIO_BALLOON_CMD_ID_STOP);
 } else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_DONE) {
 config.free_page_report_cmd_id =
cpu_to_le32(VIRTIO_BALLOON_CMD_ID_DONE);
 }

and I've dropped the kernel header update since it's already there.



Looks good. Thanks for the update.

Best,
Wei



Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-02-21 Thread Wei Wang

On 02/21/2019 06:18 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 02/20/2019 09:12 PM, Dr. David Alan Gilbert wrote:

* Wang, Wei W (wei.w.w...@intel.com) wrote:

On Friday, December 14, 2018 7:17 PM, Dr. David Alan Gilbert wrote:

On 12/14/2018 05:56 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 12/13/2018 11:45 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive
hints of guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier
chain. The notifier calls free_page_start after the migration
thread syncs the dirty bitmap, so that the free page
optimization starts to clear bits of free pages from the
bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round
of ram save. The free_page_stop is also called to stop the

optimization in the case when an error occurs in the process of
ram saving.

Note: balloon will report pages which were free at the time of this

call.

As the reporting happens asynchronously, dirty bit logging
must be enabled before this free_page_start call is made.
Guest reporting must be disabled before the migration dirty bitmap

is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 

I think I'm OK for this from the migration side, I'd appreciate
someone checking the virtio and aio bits.

I'm not too sure how it gets switched on and off - i.e. if we
get a nice new qemu on a new kernel, what happens when I try and
migrate to the same qemu on an older kernel without these hints?


This feature doesn't rely on the host kernel. Those hints are
reported from the guest kernel.
So migration across different hosts wouldn't affect the use of this

feature.

Please correct me if I didn't get your point.

Ah OK, yes;  now what about migrating from new->old qemu with a new
guest but old machine type?


I think normally, the source QEMU and destination QEMU should have the
same QEMU booting parameter. If the destination QEMU doesn't support
"--device virtio-balloon,free-page-hint=true", which the source QEMU
has, the destination side QEMU will fail to boot, and migration will
not happen then.

Ah that's OK; as long as free-page-hint is false by default that will work fine.

Dave


Hi Dave,

Could we have this feature in QEMU 4.0 (freeze on Mar 12)?

I think so; can you remind me where we're up to:
a) It looks like you've already got the kernel changes merged -
correct?

Yes, they were already merged half a year ago.


b) What about the virtio spec changes - where are they upto?

The spec changes are in progress. v1 was posted out, and a v2 is in
preparation.


c) Where are the other reviews upto - I think most are reviewed - is
it just 7/7 that is missing the review-by?

7/7 is about the virtio changes, and Michael has given the reviewed-by:
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg03732.html

OK, I was going to check with mst for (b) because I prefer it after the
spec changes have been merged, but since mst is OK with it, then we can
merge especially with (a) already merged.


OK, thanks!

Best,
Wei



Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-02-20 Thread Wei Wang

On 02/20/2019 09:12 PM, Dr. David Alan Gilbert wrote:

* Wang, Wei W (wei.w.w...@intel.com) wrote:

On Friday, December 14, 2018 7:17 PM, Dr. David Alan Gilbert wrote:

On 12/14/2018 05:56 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 12/13/2018 11:45 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive
hints of guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier
chain. The notifier calls free_page_start after the migration
thread syncs the dirty bitmap, so that the free page
optimization starts to clear bits of free pages from the
bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round
of ram save. The free_page_stop is also called to stop the

optimization in the case when an error occurs in the process of
ram saving.

Note: balloon will report pages which were free at the time of this

call.

As the reporting happens asynchronously, dirty bit logging
must be enabled before this free_page_start call is made.
Guest reporting must be disabled before the migration dirty bitmap

is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 

I think I'm OK for this from the migration side, I'd appreciate
someone checking the virtio and aio bits.

I'm not too sure how it gets switched on and off - i.e. if we
get a nice new qemu on a new kernel, what happens when I try and
migrate to the same qemu on an older kernel without these hints?


This feature doesn't rely on the host kernel. Those hints are
reported from the guest kernel.
So migration across different hosts wouldn't affect the use of this

feature.

Please correct me if I didn't get your point.

Ah OK, yes;  now what about migrating from new->old qemu with a new
guest but old machine type?


I think normally, the source QEMU and destination QEMU should have the
same QEMU booting parameter. If the destination QEMU doesn't support
"--device virtio-balloon,free-page-hint=true", which the source QEMU
has, the destination side QEMU will fail to boot, and migration will
not happen then.

Ah that's OK; as long as free-page-hint is false by default that will work fine.

Dave


Hi Dave,

Could we have this feature in QEMU 4.0 (freeze on Mar 12)?

I think so; can you remind me where we're up to:
   a) It looks like you've already got the kernel changes merged -
correct?


Yes, they were already merged half a year ago.


   b) What about the virtio spec changes - where are they upto?


The spec changes are in progress. v1 was posted out, and a v2 is in
preparation.



   c) Where are the other reviews upto - I think most are reviewed - is
it just 7/7 that is missing the review-by?

7/7 is about the virtio changes, and Michael has given the reviewed-by:
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg03732.html


Best,
Wei



Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-14 Thread Wei Wang

On 12/14/2018 05:56 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 12/13/2018 11:45 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when an error occurs in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 

I think I'm OK for this from the migration side, I'd appreciate
someone checking the virtio and aio bits.

I'm not too sure how it gets switched on and off - i.e. if we get a nice
new qemu on a new kernel, what happens when I try and migrate to the
same qemu on an older kernel without these hints?


This feature doesn't rely on the host kernel. Those hints are reported from
the guest kernel.
So migration across different hosts wouldn't affect the use of this feature.
Please correct me if I didn't get your point.

Ah OK, yes;  now what about migrating from new->old qemu with a new
guest but old machine type?



I think normally, the source QEMU and destination QEMU should have the same
QEMU booting parameter. If the destination QEMU doesn't support
"--device virtio-balloon,free-page-hint=true", which the source QEMU 
has, the

destination side QEMU will fail to boot, and migration will not happen then.

But I think there is still an option to make the migration possible. The 
"free-page-hint"
can be set to false via e.g. QMP on the source side, then the 
destination side QEMU

can boot without "free-page-hint".

Best,
Wei



Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-13 Thread Wei Wang

On 12/13/2018 11:45 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when an error occurs in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 

I think I'm OK for this from the migration side, I'd appreciate
someone checking the virtio and aio bits.

I'm not too sure how it gets switched on and off - i.e. if we get a nice
new qemu on a new kernel, what happens when I try and migrate to the
same qemu on an older kernel without these hints?



This feature doesn't rely on the host kernel. Those hints are reported 
from the guest kernel.

So migration across different hosts wouldn't affect the use of this feature.
Please correct me if I didn't get your point.

Best,
Wei



Re: [Qemu-devel] [PATCH v11 1/7] bitmap: fix bitmap_count_one

2018-12-13 Thread Wei Wang

On 12/13/2018 10:28 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

BITMAP_LAST_WORD_MASK(nbits) returns 0xffffffffffffffff when "nbits=0", which
makes bitmap_count_one fail to handle the "nbits=0" case. It appears preferable
to keep BITMAP_LAST_WORD_MASK identical to the kernel implementation that it is
ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.

OK; it's a little odd that it's only bitmap_count_one that's being fixed
for this case; but OK.


Reviewed-by: Dr. David Alan Gilbert 

Thanks.

We could also fix other callers outside this series.
(This one is included here as it helps this optimization feature avoid that
issue.)


Best,
Wei



[Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-11 Thread Wei Wang
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when an error occurs in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/standard-headers/linux/virtio_balloon.h |   5 +
 3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1728e4f..543bbd4 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/misc.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -308,6 +309,184 @@ out:
 }
 }
 
+static void virtio_balloon_handle_free_page_vq(VirtIODevice *vdev,
+   VirtQueue *vq)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+qemu_bh_schedule(s->free_page_bh);
+}
+
+static bool get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->block_iothread) {
+qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return false;
+}
+
+if (elem->out_num) {
+uint32_t id;
+size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
+ &id, sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);
+
+virtio_tswap32s(vdev, &id);
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return false;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+}
+}
+}
+
+if (elem->in_num) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+}
+
+return true;
+}
+
+static void virtio_ballloon_get_free_page_hints(void *opaque)
+{
+VirtIOBalloon *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+bool continue_to_get_hints;
+
+do {
+qemu_mutex_lock(&dev->free_page_lock);
+virtio_queue_set_notification(vq, 0);
+continue_to_get_hints = get_free_page_hints(dev);
+qemu_mutex_unlock(&dev->free_page_lock);
+virtio_notify(vdev, vq);
+  /*
+   * Start to poll the vq once the reporting started. Otherwise, continue
+   * only when there are entries on the vq, which need to be given back.
+   */
+} while (continue_to_get_hints ||
+ dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+virtio_queue_set_notification(vq, 1);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(VirtIOBalloon *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+/* For the stop and copy phase, we don't need to start the optimization */
+if (!vdev->vm_running) {
+return;
+}
+
+if (s->free_page_report_cmd_id == UINT_MAX) {
+s->free_page_report_cmd_id

[Qemu-devel] [PATCH v11 4/7] migration: API to clear bits of guest free pages from the dirty bitmap

2018-12-11 Thread Wei Wang
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.
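
As an illustrative (hypothetical) caller sketch, a hinted region can simply be
passed through; the split across RAMBlocks happens inside the API, and only
whole target pages are cleared:

/* Hypothetical caller sketch, not part of this patch. */
static void report_free_region(void *host_addr, size_t bytes)
{
    /*
     * The region may span several RAMBlocks; qemu_guest_free_page_hint()
     * splits it internally.  A trailing partial target page is ignored
     * because used_len is converted to pages with a right shift.
     */
    qemu_guest_free_page_hint(host_addr, bytes);
}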

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
Reviewed-by: Peter Xu 
---
 include/migration/misc.h |  2 ++
 migration/ram.c  | 47 +++
 2 files changed, 49 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 01d267f..c13b44f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3131,6 +3131,53 @@ static void ram_state_resume_prepare(RAMState *rs, 
QEMUFile *out)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len, addr += used_len) {
+block = qemu_ram_block_from_host(addr, false, &offset);
+if (unlikely(!block || offset >= block->used_length)) {
+/*
+ * The implementation might not support RAMBlock resize during
+ * live migration, but it could happen in theory with future
+ * updates. So we add a check here to capture that case.
+ */
+error_report_once("%s unexpected error", __func__);
+return;
+}
+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+}
+
+start = offset >> TARGET_PAGE_BITS;
+npages = used_len >> TARGET_PAGE_BITS;
+
+qemu_mutex_lock(&ram_state->bitmap_mutex);
+ram_state->migration_dirty_pages -=
+  bitmap_count_one_with_offset(block->bmap, start, npages);
+bitmap_clear(block->bmap, start, npages);
+qemu_mutex_unlock(&ram_state->bitmap_mutex);
+}
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1




[Qemu-devel] [PATCH v11 1/7] bitmap: fix bitmap_count_one

2018-12-11 Thread Wei Wang
BITMAP_LAST_WORD_MASK(nbits) returns ~0UL (a mask of all ones) when "nbits=0",
which makes bitmap_count_one fail to handle the "nbits=0" case. It appears
preferable to keep BITMAP_LAST_WORD_MASK identical to the kernel
implementation that it is ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.
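
For illustration, a small sketch of the failure mode, assuming the usual
kernel-derived macro definition (the demo function below is hypothetical and
not part of the patch):

/*
 * Illustrative sketch only.  With
 *   #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
 * nbits == 0 produces a shift of 0, i.e. an all-ones mask, so the unfixed
 * bitmap_count_one() counts every bit of *bitmap instead of none.
 */
static void demo_count_one_zero_bits(void)
{
    unsigned long word = 0xffUL;

    g_assert(bitmap_count_one(&word, 0) == 0);  /* fails without this fix */
    g_assert(bitmap_count_one(&word, 8) == 8);
}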

Initial Discussion Link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg554316.html
Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..679f1bd 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -221,6 +221,10 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 
 static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
 {
+if (unlikely(!nbits)) {
+return 0;
+}
+
 if (small_nbits(nbits)) {
 return ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
 } else {
-- 
1.8.3.1




[Qemu-devel] [PATCH v11 6/7] migration/ram.c: add the free page optimization enable flag

2018-12-11 Thread Wei Wang
This patch adds the free page optimization enable flag, and a function
to set this flag. When the free page optimization is enabled, not all
pages need to be sent in the bulk stage.

Why use a new flag, instead of directly disabling ram_bulk_stage when
the optimization is running?
Thanks to Peter Xu's reminder that disabling ram_bulk_stage will affect
the use of compression. Please see save_page_use_compression. When
xbzrle and compression are used, if the free page optimization causes
ram_bulk_stage to be disabled, save_page_use_compression will return
false, which disables the use of compression. That is, if the free page
optimization avoids sending half of the guest pages, the other half
loses the benefit of compression in the meantime. Using a new flag to
let migration_bitmap_find_dirty skip the free pages in the bulk stage
avoids the above issue.
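
For reference, a paraphrased sketch of the compression dependency mentioned
above (not the literal QEMU function; the exact condition may differ): with
xbzrle enabled, compression is only used while ram_bulk_stage is true, which
is why the new flag is checked instead of clearing ram_bulk_stage.

/* Paraphrased sketch, not the literal QEMU code. */
static bool save_page_use_compression_sketch(RAMState *rs)
{
    if (!migrate_use_compression()) {
        return false;
    }
    /* after the bulk stage, xbzrle takes over from compression */
    return rs->ram_bulk_stage || !migrate_use_xbzrle();
}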

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h |  1 +
 migration/ram.c  | 18 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 15f8d00..1227279 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -37,6 +37,7 @@ void precopy_infrastructure_init(void);
 void precopy_add_notifier(NotifierWithReturn *n);
 void precopy_remove_notifier(NotifierWithReturn *n);
 int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+void precopy_enable_free_page_optimization(void);
 
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index 1e00d92..188bbad 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -290,6 +290,8 @@ struct RAMState {
 uint32_t last_version;
 /* We are in the first round */
 bool ram_bulk_stage;
+/* The free page optimization is enabled */
+bool fpo_enabled;
 /* How many times we have dirty too many pages */
 int dirty_rate_high_cnt;
 /* these variables are used for bitmap sync */
@@ -354,6 +356,15 @@ int precopy_notify(PrecopyNotifyReason reason, Error 
**errp)
 return notifier_with_return_list_notify(&precopy_notifier_list, &pnd);
 }
 
+void precopy_enable_free_page_optimization(void)
+{
+if (!ram_state) {
+return;
+}
+
+ram_state->fpo_enabled = true;
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1567,7 +1578,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, 
RAMBlock *rb,
 return size;
 }
 
-if (rs->ram_bulk_stage && start > 0) {
+/*
+ * When the free page optimization is enabled, we need to check the bitmap
+ * to send the non-free pages rather than all the pages in the bulk stage.
+ */
+if (!rs->fpo_enabled && rs->ram_bulk_stage && start > 0) {
 next = start + 1;
 } else {
 next = find_next_bit(bitmap, size, start);
@@ -2600,6 +2615,7 @@ static void ram_state_reset(RAMState *rs)
 rs->last_page = 0;
 rs->last_version = ram_list.version;
 rs->ram_bulk_stage = true;
+rs->fpo_enabled = false;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
-- 
1.8.3.1




[Qemu-devel] [PATCH v11 2/7] bitmap: bitmap_count_one_with_offset

2018-12-11 Thread Wei Wang
Count the number of 1s in a bitmap starting from an offset.
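
A small worked example of the offset handling in the implementation below
(illustrative only, assuming a 64-bit host; the demo function is not part of
the patch):

/*
 * Counting bits [70, 90): aligned_offset = 64, redundant_bits = 6,
 * bits_to_count = 26, bitmap_start = bitmap + 1, so the result is
 * ones(bitmap + 1, 26) - ones(bitmap + 1, 6), i.e. exactly the requested range.
 */
static void demo_count_one_with_offset(void)
{
    unsigned long map[2] = { 0 };

    bitmap_set(map, 70, 3);                                   /* bits 70..72 */
    g_assert(bitmap_count_one_with_offset(map, 70, 20) == 3);
    g_assert(bitmap_count_one_with_offset(map, 0, 70) == 0);
}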

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/qemu/bitmap.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 679f1bd..5c31334 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -232,6 +232,19 @@ static inline long bitmap_count_one(const unsigned long 
*bitmap, long nbits)
 }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+long offset, long nbits)
+{
+long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+long redundant_bits = offset - aligned_offset;
+long bits_to_count = nbits + redundant_bits;
+const unsigned long *bitmap_start = bitmap +
+aligned_offset / BITS_PER_LONG;
+
+return bitmap_count_one(bitmap_start, bits_to_count) -
+   bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1




[Qemu-devel] [PATCH v11 3/7] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-12-11 Thread Wei Wang
The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
Reviewed-by: Peter Xu 
---
 migration/ram.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..01d267f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -316,7 +316,7 @@ struct RAMState {
 uint64_t target_page_count;
 /* number of dirty bits in the bitmap */
 uint64_t migration_dirty_pages;
-/* protects modification of the bitmap */
+/* Protects modification of the bitmap and migration dirty pages */
 QemuMutex bitmap_mutex;
 /* The RAMBlock used in the last src_page_requests */
 RAMBlock *last_req_rb;
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,
 {
 bool ret;
 
+qemu_mutex_lock(&rs->bitmap_mutex);
 ret = test_and_clear_bit(page, rb->bmap);
 
 if (ret) {
 rs->migration_dirty_pages--;
 }
+qemu_mutex_unlock(&rs->bitmap_mutex);
+
 return ret;
 }
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v11 0/7] virtio-balloon: free page hint support

2018-12-11 Thread Wei Wang
This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
  (i.e. make bzImage -j4) running
  1) Live Migration Time:
 Optimization v.s. Legacy = 2273ms v.s. 4502ms
 --> ~50% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min42s v.s. 8min43s
 --> no obvious difference

2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
  1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
  2) Linux Compilation Time:
Optimization v.s. Legacy = 8min30s v.s. 8min36s
--> no obvious difference

ChangeLog:
v10->v11:
migration:
- qemu_guest_free_page_hint:
- "offset >= block->used_length", instead of
  "offset > block->used_length";
- RAMState: enable the "fpo_enabled" flag, when the free page
  optimization feature is used, instead of disabling ram_bulk_stage.
  Please see patch 6 commit log for details.

Previous changelog:
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00055.html

Wei Wang (7):
  bitmap: fix bitmap_count_one
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  migration/ram.c: add a notifier chain for precopy
  migration/ram.c: add the free page optimization enable flag
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/migration/misc.h|  22 ++
 include/qemu/bitmap.h   |  17 ++
 include/standard-headers/linux/virtio_balloon.h |   5 +
 migration/ram.c | 121 ++-
 migration/savevm.c  |  15 ++
 vl.c|   1 +
 8 files changed, 466 insertions(+), 6 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH v11 5/7] migration/ram.c: add a notifier chain for precopy

2018-12-11 Thread Wei Wang
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.
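
For illustration, a hypothetical consumer of the new chain (not part of this
patch) would register a NotifierWithReturn and dispatch on the reason:

/* Hypothetical consumer sketch. */
static int my_precopy_notify(NotifierWithReturn *n, void *data)
{
    PrecopyNotifyData *pnd = data;

    switch (pnd->reason) {
    case PRECOPY_NOTIFY_SETUP:                /* precopy is starting */
    case PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC:   /* a bitmap sync is about to run */
    case PRECOPY_NOTIFY_AFTER_BITMAP_SYNC:    /* a bitmap sync just finished */
    case PRECOPY_NOTIFY_COMPLETE:             /* precopy completed */
    case PRECOPY_NOTIFY_CLEANUP:              /* migration cleaned up/cancelled */
        break;
    default:
        break;
    }
    /* returning non-zero (after setting an error via pnd->errp) would fail
     * the notification */
    return 0;
}

static NotifierWithReturn my_precopy_notifier = { .notify = my_precopy_notify };
/* during device init: precopy_add_notifier(&my_precopy_notifier); */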

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
Reviewed-by: Peter Xu 
---
 include/migration/misc.h | 19 ++
 migration/ram.c  | 51 +---
 migration/savevm.c   | 15 ++
 vl.c |  1 +
 4 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..15f8d00 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,6 +19,25 @@
 
 /* migration/ram.c */
 
+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_SETUP = 0,
+PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
+PRECOPY_NOTIFY_AFTER_BITMAP_SYNC = 2,
+PRECOPY_NOTIFY_COMPLETE = 3,
+PRECOPY_NOTIFY_CLEANUP = 4,
+PRECOPY_NOTIFY_MAX = 5,
+} PrecopyNotifyReason;
+
+typedef struct PrecopyNotifyData {
+enum PrecopyNotifyReason reason;
+Error **errp;
+} PrecopyNotifyData;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(NotifierWithReturn *n);
+void precopy_remove_notifier(NotifierWithReturn *n);
+int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
 
diff --git a/migration/ram.c b/migration/ram.c
index c13b44f..1e00d92 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -328,6 +328,32 @@ typedef struct RAMState RAMState;
 
 static RAMState *ram_state;
 
+static NotifierWithReturnList precopy_notifier_list;
+
+void precopy_infrastructure_init(void)
+{
+notifier_with_return_list_init(&precopy_notifier_list);
+}
+
+void precopy_add_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_list_add(&precopy_notifier_list, n);
+}
+
+void precopy_remove_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_remove(n);
+}
+
+int precopy_notify(PrecopyNotifyReason reason, Error **errp)
+{
+PrecopyNotifyData pnd;
+pnd.reason = reason;
+pnd.errp = errp;
+
+return notifier_with_return_list_notify(&precopy_notifier_list, &pnd);
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1701,6 +1727,25 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
+static void migration_bitmap_sync_precopy(RAMState *rs)
+{
+Error *local_err = NULL;
+
+/*
+ * The current notifier usage is just an optimization to migration, so we
+ * don't stop the normal migration process in the error case.
+ */
+if (precopy_notify(PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC, &local_err)) {
+error_report_err(local_err);
+}
+
+migration_bitmap_sync(rs);
+
+if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
+error_report_err(local_err);
+}
+}
+
 /**
  * save_zero_page_to_file: send the zero page to the file
  *
@@ -3072,7 +3117,7 @@ static void ram_init_bitmaps(RAMState *rs)
 
 ram_list_init_bitmaps();
 memory_global_dirty_log_start();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 
 rcu_read_unlock();
 qemu_mutex_unlock_ramlist();
@@ -3348,7 +3393,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 rcu_read_lock();
 
 if (!migration_in_postcopy()) {
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -3397,7 +3442,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 remaining_size < max_size) {
 qemu_mutex_lock_iothread();
 rcu_read_lock();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 rcu_read_unlock();
 qemu_mutex_unlock_iothread();
 remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e45fb4..ec74f44 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1018,6 +1018,7 @@ void qemu_savevm_state_header(QEMUFile *f)
 void qemu_savevm_state_setup(QEMUFile *f)
 {
 SaveStateEntry *se;
+Error *local_err = NULL;
 int ret;
 
 trace_savevm_state_setup();
@@ -1039,6 +1040,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
 break;
 }
 }
+
+if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
+error_report_err(local_err);
+}
 }
 
 int qemu_savevm_state_resume_prepare(MigrationState *s)
@@ -1181,6 +1186,11 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool 
iterable_only,
 SaveStateEntry *se;
 int ret;
 bool in_postcopy = migration_in_postcopy();
+Error *local_err = NULL;
+
+if (precopy_notify(PRECOPY_NOTIFY_COMPLETE, &local_err)) {
+error_report_err(local_err);
+}
 
 trace_savevm_state_complete_precopy();
 
@@ -1313,6 +1323,1

Re: [Qemu-devel] [PATCH v10 6/7] migration/ram.c: add a function to disable the bulk stage

2018-12-03 Thread Wei Wang

On 12/03/2018 01:31 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:30AM +0800, Wei Wang wrote:

This patch adds a function to enable a precopy notifier callback outside
the migration subsystem to disable the bulk stage flag. This is needed by
the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  include/migration/misc.h | 1 +
  migration/ram.c  | 9 +
  2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 15f8d00..47e7ff5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -37,6 +37,7 @@ void precopy_infrastructure_init(void);
  void precopy_add_notifier(NotifierWithReturn *n);
  void precopy_remove_notifier(NotifierWithReturn *n);
  int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+void precopy_disable_bulk_stage(void);
  
  void ram_mig_init(void);

  void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index b90a3f2..739dc97 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -354,6 +354,15 @@ int precopy_notify(PrecopyNotifyReason reason, Error 
**errp)
   return notifier_with_return_list_notify(&precopy_notifier_list, &pnd);
  }
  
+void precopy_disable_bulk_stage(void)

+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+

This one is a bit tricky.  E.g., I think it'll at least affect XBZRLE
and compression somehow.  Since we will have the on-off switch for
balloon free page hinting so the user can at least decide what
features to use, with that I don't see much issue with it so far. But
I'd also like to see how other people see this change.



Yes. I think it would be better that this optimization could co-exist
with the compression feature.

How about adding a new flag "ram_state->free_page_optimization_enabled"?

We will then need a change in migration_bitmap_find_dirty:
 - if (rs->ram_bulk_stage && start > 0) {
+ if (!rs->free_page_optimization_enabled && rs->ram_bulk_stage && start > 0) {

next = start + 1;
} else {
next = find_next_bit(bitmap, size, start);
}

Best,
Wei



Re: [Qemu-devel] [PATCH v10 5/7] migration/ram.c: add a notifier chain for precopy

2018-12-03 Thread Wei Wang

On 12/03/2018 01:20 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:29AM +0800, Wei Wang wrote:

This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---

[...]


+int precopy_notify(PrecopyNotifyReason reason, Error **errp);

This function could be hidden from include/migration/misc.h, but I
don't know whether that is a reason for a repost.

And what I meant was that we fail precopy if notifier failed the
hooks, but warning is fine too anyway no real use case there.


Yes, I was also thinking about this. Failing precopy needs to change
some upper layer functions which are defined with "void".

There is no use case that needs to fail precopy currently, so I chose to
keep the change minimal. But we can make more changes if people think
it's necessary to do.

Best,
Wei



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v10 6/7] migration/ram.c: add a function to disable the bulk stage

2018-12-03 Thread Wei Wang

On 12/03/2018 04:20 PM, Wei Wang wrote:

On 12/03/2018 01:31 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:30AM +0800, Wei Wang wrote:
This patch adds a function to enable a precopy notifier callback 
outside
the migration subsystem to disable the bulk stage flag. This is 
needed by

the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  include/migration/misc.h | 1 +
  migration/ram.c  | 9 +
  2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 15f8d00..47e7ff5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -37,6 +37,7 @@ void precopy_infrastructure_init(void);
  void precopy_add_notifier(NotifierWithReturn *n);
  void precopy_remove_notifier(NotifierWithReturn *n);
  int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+void precopy_disable_bulk_stage(void);
void ram_mig_init(void);
  void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index b90a3f2..739dc97 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -354,6 +354,15 @@ int precopy_notify(PrecopyNotifyReason reason, 
Error **errp)
  return 
notifier_with_return_list_notify(&precopy_notifier_list, &pnd);

  }
  +void precopy_disable_bulk_stage(void)
+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+

This one is a bit tricky.  E.g., I think it'll at least affect XBZRLE
and compression somehow.  Since we will have the on-off switch for
balloon free page hinting so the user can at least decide what
features to use, with that I don't see much issue with it so far. But
I'd also like to see how other people see this change.



Yes. I think it would be better that this optimization could co-exist
with the compression feature.

How about adding a new flag "ram_state->free_page_optimization_enabled"?

We will then need a change in migration_bitmap_find_dirty:
 - if (rs->ram_bulk_stage && start > 0) {
+ if (!rs->free_page_optimization_enabled && rs->ram_bulk_stage && start > 0) {

next = start + 1;
} else {
next = find_next_bit(bitmap, size, start);
}



Btw, with the new flag, this patch is not needed, so "ram_bulk_stage" 
will not be changed.


Best,
Wei




Re: [Qemu-devel] [PATCH v10 4/7] migration: API to clear bits of guest free pages from the dirty bitmap

2018-12-03 Thread Wei Wang

On 12/03/2018 01:10 PM, Peter Xu wrote:

On Mon, Dec 03, 2018 at 10:18:28AM +0800, Wei Wang wrote:

This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---

[...]


+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len, addr += used_len) {
+block = qemu_ram_block_from_host(addr, false, );
+if (unlikely(!block || offset > block->used_length)) {

Maybe >=?  My fault if it is...


It would not cause problems (npages will be 0 below),
but I agree it sounds better to have ">=" earlier here. Thanks!
No problem, I can post another new version.


Best,
Wei



[Qemu-devel] [PATCH v10 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-02 Thread Wei Wang
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case that an error occurs during ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/standard-headers/linux/virtio_balloon.h |   5 +
 3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1728e4f..5376f9b 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/misc.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -308,6 +309,184 @@ out:
 }
 }
 
+static void virtio_balloon_handle_free_page_vq(VirtIODevice *vdev,
+   VirtQueue *vq)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+qemu_bh_schedule(s->free_page_bh);
+}
+
+static bool get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->block_iothread) {
+qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return false;
+}
+
+if (elem->out_num) {
+uint32_t id;
+size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
+ &id, sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);
+
+virtio_tswap32s(vdev, &id);
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return false;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+}
+}
+}
+
+if (elem->in_num) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+}
+
+return true;
+}
+
+static void virtio_ballloon_get_free_page_hints(void *opaque)
+{
+VirtIOBalloon *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+bool continue_to_get_hints;
+
+do {
+qemu_mutex_lock(&dev->free_page_lock);
+virtio_queue_set_notification(vq, 0);
+continue_to_get_hints = get_free_page_hints(dev);
+qemu_mutex_unlock(&dev->free_page_lock);
+virtio_notify(vdev, vq);
+  /*
+   * Start to poll the vq once the reporting started. Otherwise, continue
+   * only when there are entries on the vq, which need to be given back.
+   */
+} while (continue_to_get_hints ||
+ dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+virtio_queue_set_notification(vq, 1);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(VirtIOBalloon *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+/* For the stop and copy phase, we don't need to start the optimization */
+if (!vdev->vm_running) {
+return;
+}
+
+if (s->free_page_report_cmd_id == UINT_MAX) {
+s->free_page_report_cmd_id

[Qemu-devel] [PATCH v10 6/7] migration/ram.c: add a function to disable the bulk stage

2018-12-02 Thread Wei Wang
This patch adds a function to enable a precopy notifier callback outside
the migration subsystem to disable the bulk stage flag. This is needed by
the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 1 +
 migration/ram.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 15f8d00..47e7ff5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -37,6 +37,7 @@ void precopy_infrastructure_init(void);
 void precopy_add_notifier(NotifierWithReturn *n);
 void precopy_remove_notifier(NotifierWithReturn *n);
 int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+void precopy_disable_bulk_stage(void);
 
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index b90a3f2..739dc97 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -354,6 +354,15 @@ int precopy_notify(PrecopyNotifyReason reason, Error 
**errp)
 return notifier_with_return_list_notify(&precopy_notifier_list, &pnd);
 }
 
+void precopy_disable_bulk_stage(void)
+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
-- 
1.8.3.1




[Qemu-devel] [PATCH v10 5/7] migration/ram.c: add a notifier chain for precopy

2018-12-02 Thread Wei Wang
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 19 ++
 migration/ram.c  | 51 +---
 migration/savevm.c   | 15 ++
 vl.c |  1 +
 4 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..15f8d00 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,6 +19,25 @@
 
 /* migration/ram.c */
 
+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_SETUP = 0,
+PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC = 1,
+PRECOPY_NOTIFY_AFTER_BITMAP_SYNC = 2,
+PRECOPY_NOTIFY_COMPLETE = 3,
+PRECOPY_NOTIFY_CLEANUP = 4,
+PRECOPY_NOTIFY_MAX = 5,
+} PrecopyNotifyReason;
+
+typedef struct PrecopyNotifyData {
+enum PrecopyNotifyReason reason;
+Error **errp;
+} PrecopyNotifyData;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(NotifierWithReturn *n);
+void precopy_remove_notifier(NotifierWithReturn *n);
+int precopy_notify(PrecopyNotifyReason reason, Error **errp);
+
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
 
diff --git a/migration/ram.c b/migration/ram.c
index 4f7a3fe..b90a3f2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -328,6 +328,32 @@ typedef struct RAMState RAMState;
 
 static RAMState *ram_state;
 
+static NotifierWithReturnList precopy_notifier_list;
+
+void precopy_infrastructure_init(void)
+{
+notifier_with_return_list_init(&precopy_notifier_list);
+}
+
+void precopy_add_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_list_add(&precopy_notifier_list, n);
+}
+
+void precopy_remove_notifier(NotifierWithReturn *n)
+{
+notifier_with_return_remove(n);
+}
+
+int precopy_notify(PrecopyNotifyReason reason, Error **errp)
+{
+PrecopyNotifyData pnd;
+pnd.reason = reason;
+pnd.errp = errp;
+
+return notifier_with_return_list_notify(&precopy_notifier_list, &pnd);
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1701,6 +1727,25 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
+static void migration_bitmap_sync_precopy(RAMState *rs)
+{
+Error *local_err = NULL;
+
+/*
+ * The current notifier usage is just an optimization to migration, so we
+ * don't stop the normal migration process in the error case.
+ */
+if (precopy_notify(PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC, &local_err)) {
+error_report_err(local_err);
+}
+
+migration_bitmap_sync(rs);
+
+if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
+error_report_err(local_err);
+}
+}
+
 /**
  * save_zero_page_to_file: send the zero page to the file
  *
@@ -3072,7 +3117,7 @@ static void ram_init_bitmaps(RAMState *rs)
 
 ram_list_init_bitmaps();
 memory_global_dirty_log_start();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 
 rcu_read_unlock();
 qemu_mutex_unlock_ramlist();
@@ -3348,7 +3393,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 rcu_read_lock();
 
 if (!migration_in_postcopy()) {
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -3397,7 +3442,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 remaining_size < max_size) {
 qemu_mutex_lock_iothread();
 rcu_read_lock();
-migration_bitmap_sync(rs);
+migration_bitmap_sync_precopy(rs);
 rcu_read_unlock();
 qemu_mutex_unlock_iothread();
 remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e45fb4..ec74f44 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1018,6 +1018,7 @@ void qemu_savevm_state_header(QEMUFile *f)
 void qemu_savevm_state_setup(QEMUFile *f)
 {
 SaveStateEntry *se;
+Error *local_err = NULL;
 int ret;
 
 trace_savevm_state_setup();
@@ -1039,6 +1040,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
 break;
 }
 }
+
+if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
+error_report_err(local_err);
+}
 }
 
 int qemu_savevm_state_resume_prepare(MigrationState *s)
@@ -1181,6 +1186,11 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool 
iterable_only,
 SaveStateEntry *se;
 int ret;
 bool in_postcopy = migration_in_postcopy();
+Error *local_err = NULL;
+
+if (precopy_notify(PRECOPY_NOTIFY_COMPLETE, &local_err)) {
+error_report_err(local_err);
+}
 
 trace_savevm_state_complete_precopy();
 
@@ -1313,6 +1323,11 @@ void qemu_savevm_state_pending(QE

[Qemu-devel] [PATCH v10 4/7] migration: API to clear bits of guest free pages from the dirty bitmap

2018-12-02 Thread Wei Wang
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h |  2 ++
 migration/ram.c  | 47 +++
 2 files changed, 49 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 01d267f..4f7a3fe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3131,6 +3131,53 @@ static void ram_state_resume_prepare(RAMState *rs, 
QEMUFile *out)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len, addr += used_len) {
+block = qemu_ram_block_from_host(addr, false, &offset);
+if (unlikely(!block || offset > block->used_length)) {
+/*
+ * The implementation might not support RAMBlock resize during
+ * live migration, but it could happen in theory with future
+ * updates. So we add a check here to capture that case.
+ */
+error_report_once("%s unexpected error", __func__);
+return;
+}
+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+}
+
+start = offset >> TARGET_PAGE_BITS;
+npages = used_len >> TARGET_PAGE_BITS;
+
+qemu_mutex_lock(&ram_state->bitmap_mutex);
+ram_state->migration_dirty_pages -=
+  bitmap_count_one_with_offset(block->bmap, start, npages);
+bitmap_clear(block->bmap, start, npages);
+qemu_mutex_unlock(&ram_state->bitmap_mutex);
+}
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1




[Qemu-devel] [PATCH v10 0/7] virtio-balloon: free page hint support

2018-12-02 Thread Wei Wang
This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
  (i.e. make bzImage -j4) running
  1) Live Migration Time:
 Optimization v.s. Legacy = 2273ms v.s. 4502ms
 --> ~50% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min42s v.s. 8min43s
 --> no obvious difference

2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
  1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min30s v.s. 8min36s
 --> no obvious difference

ChangeLog:
v9->v10:
migration:
- qemu_guest_free_page_hint:
- move "addr += used_len" to the for() loop;
- add some comments about the ram block resize case.
- precopy notifier:
- enable callbacks to report errors (as the postcopy notifier);
- rename the notifier reasons:
- PRECOPY_NOTIFY_COMPLETE notifies the callback about the
  end of precopy;
- PRECOPY_NOTIFY_CLEANUP notifies the callback in the case
  migration is cancelled;
- relocate some notifications to the upper layer functions
  (i.e. qemu_savevm_state_*), which can get the callbacks notified
  when non-ram-saving code has errors which stops the precopy.
- migration_bitmap_sync_precopy to be called by the precopy code path
  with precopy notifier supported, and migration_bitmap_sync is left
  for callers without the need of precopy notifier (e.g. some postcopy
  code).
- avoid exposing migrate_postcopy;
virtio-balloon:
- virtio_balloon_free_page_report_notify: remove the migrate_postcopy
check because the migration code has guaranteed that the notifier
  callback is invoked when in the precopy mode.

Previous changelog:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg02854.html

Wei Wang (7):
  bitmap: fix bitmap_count_one
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  migration/ram.c: add a notifier chain for precopy
  migration/ram.c: add a function to disable the bulk stage
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

 hw/virtio/virtio-balloon.c  | 263 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/migration/misc.h|  22 ++
 include/qemu/bitmap.h   |  17 ++
 include/standard-headers/linux/virtio_balloon.h |   5 +
 migration/ram.c | 112 +-
 migration/savevm.c  |  15 ++
 vl.c|   1 +
 8 files changed, 458 insertions(+), 5 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH v10 1/7] bitmap: fix bitmap_count_one

2018-12-02 Thread Wei Wang
BITMAP_LAST_WORD_MASK(nbits) returns ~0UL (a mask of all ones) when "nbits=0",
which makes bitmap_count_one fail to handle the "nbits=0" case. It appears
preferable to keep BITMAP_LAST_WORD_MASK identical to the kernel
implementation that it is ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.

Initial Discussion Link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg554316.html
Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..679f1bd 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -221,6 +221,10 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 
 static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
 {
+if (unlikely(!nbits)) {
+return 0;
+}
+
 if (small_nbits(nbits)) {
 return ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
 } else {
-- 
1.8.3.1




[Qemu-devel] [PATCH v10 2/7] bitmap: bitmap_count_one_with_offset

2018-12-02 Thread Wei Wang
Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/qemu/bitmap.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 679f1bd..5c31334 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -232,6 +232,19 @@ static inline long bitmap_count_one(const unsigned long 
*bitmap, long nbits)
 }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+long offset, long nbits)
+{
+long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+long redundant_bits = offset - aligned_offset;
+long bits_to_count = nbits + redundant_bits;
+const unsigned long *bitmap_start = bitmap +
+aligned_offset / BITS_PER_LONG;
+
+return bitmap_count_one(bitmap_start, bits_to_count) -
+   bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1




[Qemu-devel] [PATCH v10 3/7] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-12-02 Thread Wei Wang
The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 migration/ram.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..01d267f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -316,7 +316,7 @@ struct RAMState {
 uint64_t target_page_count;
 /* number of dirty bits in the bitmap */
 uint64_t migration_dirty_pages;
-/* protects modification of the bitmap */
+/* Protects modification of the bitmap and migration dirty pages */
 QemuMutex bitmap_mutex;
 /* The RAMBlock used in the last src_page_requests */
 RAMBlock *last_req_rb;
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,
 {
 bool ret;
 
+qemu_mutex_lock(&rs->bitmap_mutex);
 ret = test_and_clear_bit(page, rb->bmap);
 
 if (ret) {
 rs->migration_dirty_pages--;
 }
+qemu_mutex_unlock(&rs->bitmap_mutex);
+
 return ret;
 }
 
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-29 Thread Wei Wang

On 11/30/2018 01:57 PM, Peter Xu wrote:



Or you can introduce migration_bitmap_sync_precopy() and let precopy
code call that one instead.


Agree, thanks.


PS. I'm a bit unsure on why we need to sync bitmap in ram_init_bitmaps.
I feel like it can be removed...


Seems it was added in early days to fix some bug:
https://git.virtualopensystems.com/dev/qemu/commit/79536f4f16934d6759a1d67f0342b4e7ceb66671?w=1

Hopefully, Juan could share some details about that.

Best,
Wei







Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-29 Thread Wei Wang

On 11/29/2018 01:10 PM, Peter Xu wrote:

On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
I think this precopy notifier callchain is expected to be used only for
the precopy mode. Postcopy has its dedicated notifier callchain that
users could use.

How about changing the migrate_postcopy() check to "ms->start_postcopy":

bool migration_postcopy_start(void)
{
 MigrationState *s;

 s = migrate_get_current();

 return atomic_read(&s->start_postcopy);
}


static void precopy_notify(PrecopyNotifyReason reason)
{
 if (migration_postcopy_start())
 return;

 notifier_list_notify(&precopy_notifier_list, &reason);
}

If postcopy started with precopy, the precopy optimization feature
could still be used until it switches to the postcopy mode.
I'm not sure we can use start_postcopy.  It's a variable being set in
the QMP handler but it does not mean postcopy has started.  I'm afraid
there can be race where it's still precopy but the variable is set so
event could be missed...


Peter,
I just found that migration_bitmap_sync is also called by 
ram_postcopy_send_discard_bitmap().

But we don't expect notifier_list_notify to be called in the postcopy mode.

So, probably we still need the start_postcopy check in 
notifier_list_notify though we have

the COMPLETE notifier.

Best,
Wei




Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-28 Thread Wei Wang

On 11/29/2018 01:10 PM, Peter Xu wrote:

On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:

On 11/28/2018 05:32 PM, Peter Xu wrote:

I'm not sure we can use start_postcopy.  It's a variable being set in
the QMP handler but it does not mean postcopy has started.  I'm afraid
there can be race where it's still precopy but the variable is set so
event could be missed...

IMHO the problem is not that complicated.  How about this proposal:

[1]

typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_RAM_START,
    PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
    PRECOPY_NOTIFY_RAM_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_RAM_CLEANUP,
};

The first three keep the same as your old ones.  Notify RAM_CLEANUP in
ram_save_cleanup() to make sure it'll always be cleaned up (the same
as PRECOPY_END, just another name).  Notify COMPLETE in
qemu_savevm_state_complete_precopy() to show that precopy is
completed.  Meanwhile on balloon side you should stop the hinting for
either RAM_CLEANUP or COMPLETE event.  Then either:

   - precopy is switching to postcopy, or
   - precopy completed, or
   - precopy failed/cancelled

You should always get at least a notification to stop the balloon.
Though you could also get one RAM_CLEANUP after one COMPLETE, but
the balloon should easily handle it (stop the hinting twice).


Sounds better, thanks.




Here maybe you can even remove the "RAM_" in both RAM_START and
RAM_CLEANUP if we're going to have COMPLETE since after all it'll be
not only limited to RAM.

Another suggestion is that you can add an Error into the notify hooks,
please refer to the postcopy one:

   int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);

So the hook functions have a way to even stop the migration (though
for balloon hinting it'll be always optional so no error should be
reported...), then the two interfaces are matched.


Yes, I removed that "errp" as it's not needed by the balloon usage.
But no problem, I can add it back if no objections from others.

Best,
Wei





Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-28 Thread Wei Wang

On 11/28/2018 05:32 PM, Peter Xu wrote:


So what I am worrying here are corner cases where we might forget to
stop the hinting.  I'm fabricating one example sequence of events:

   (start migration)
   START_MIGRATION
   BEFORE_SYNC
   AFTER_SYNC
   ...
   BEFORE_SYNC
   AFTER_SYNC
   (some SaveStateEntry failed rather than RAM, then
migration_detect_error returned MIG_THR_ERR_FATAL so we need to
fail the migration, however when running the previous
ram_save_iterate for RAM's specific SaveStateEntry we didn't see
any error so no ERROR event detected)

Then it seems the hinting will last forever.  Considering that now I'm
not sure whether this can be done ram-only, since even if you capture
ram_save_complete() and at the same time you introduce PRECOPY_END you
may still miss the PRECOPY_END event since AFAIU ram_save_complete()
won't be called at all in this case.

Could this happen?


Thanks, indeed this case could happen if we add PRECOPY_END in
ram_save_complete.

How about putting PRECOPY_END in ram_save_cleanup?
I think it would be called in any case.

I'm also thinking probably we don't need PRECOPY_ERR when we have
PRECOPY_END, and what do you think of the notifier names below:

+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_RAM_SAVE_END = 0,
+PRECOPY_NOTIFY_RAM_SAVE_START = 1,
+PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
+PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
+PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
+} PrecopyNotifyReason;







[1]


Another thing to mention about the "reasons" (though I see it more
like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
It might help in some cases:

 - then you don't need to trickily export the migrate_postcopy()
   since you'll notify that before postcopy starts

I'm thinking probably we don't need to export migrate_postcopy even now.
It's more like a sanity check, and not needed because now we have the
notifier registered to the precopy specific callchain, which has ensured
that
it is invoked via precopy.

But postcopy will always start with precopy, no?

Yes, but I think we could add the check in precopy_notify()

I'm not sure that's good.  If the notifier could potentially have
other user, they might still work with postcopy, and they might expect
e.g. BEFORE_SYNC to be called for every sync, even if it's at the
precopy stage of a postcopy.


I think this precopy notifier callchain is expected to be used only for
the precopy mode. Postcopy has its dedicated notifier callchain that
users could use.

How about changing the migrate_postcopy() check to "ms->start_postcopy":

bool migration_postcopy_start(void)
{
MigrationState *s;

s = migrate_get_current();

return atomic_read(&s->start_postcopy);
}


static void precopy_notify(PrecopyNotifyReason reason)
{
if (migration_postcopy_start())
return;

notifier_list_notify(&precopy_notifier_list, &reason);
}

If postcopy started with precopy, the precopy optimization feature
could still be used until it switches to the postcopy mode.




In that sense I still feel the
PRECOPY_END is better (so contantly call it at the end of precopy, no
matter whether there's another postcopy afterwards).  It sounds like a
cleaner interface.


Probably I still haven't got the point how PRECOPY_END could help above yet.

Best,
Wei




Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-28 Thread Wei Wang

On 11/28/2018 01:26 PM, Peter Xu wrote:


Ok thanks.  Please just make sure you will capture all the error
cases, e.g., I also see path like this (a few lines below):

 if (pages < 0) {
 qemu_file_set_error(f, pages);
 break;
 }

It seems that you missed that one.


I think that one should be fine. This notification is actually put at the
bottom of ram_save_iterate. All the above errors will bail out to the "out:"
path and then call precopy_notify(PRECOPY_NOTIFY_ERR).



I would even suggest that you capture the error with higher level.
E.g., in migration_iteration_run() after qemu_savevm_state_iterate().
Or we can just check the return value of qemu_savevm_state_iterate(),
which we have had ignored so far.


Not very sure about the higher level, because other SaveStateEntry instances
may cause errors that this feature doesn't need to care about. I think we may
only need it in ram_save.




[1]




Another thing to mention about the "reasons" (though I see it more
like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
It might help in some cases:

- then you don't need to trickily export the migrate_postcopy()
  since you'll notify that before postcopy starts

I'm thinking probably we don't need to export migrate_postcopy even now.
It's more like a sanity check, and not needed because now we have the
notifier registered to the precopy specific callchain, which has ensured
that
it is invoked via precopy.

But postcopy will always start with precopy, no?


Yes, but I think we could add the check in precopy_notify()





- you'll have a solid point that you'll 100% guarantee that we'll
  stop the free page hinting and don't need to worry about whether
  there is chance the hinting will be running without an end [2].

Thanks, I think it makes sense. Plan to add PRECOPY_NOTIFY_END in
ram_save_complete.

Yeah you can.

Btw, if you're mostly adding the notifies only in RAM-only codes, then
you can consider add the "RAM" into the names of events too to be
clear.


Sounds good, will try and see if the name would become too long :)



My suggestion at [1] is precopy general, but you can still capture it
at the end of ram_save_iterate, then they are RAM-only again.  Please
feel free to choose what fits more...


OK, thanks.

Best,
Wei



Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-27 Thread Wei Wang

On 11/27/2018 03:38 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:08:01PM +0800, Wei Wang wrote:
  
+typedef enum PrecopyNotifyReason {

+PRECOPY_NOTIFY_ERR = 0,
+PRECOPY_NOTIFY_START_ITERATION = 1,
+PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP = 2,
+PRECOPY_NOTIFY_AFTER_SYNC_BITMAP = 3,
+PRECOPY_NOTIFY_MAX = 4,

It would be nice to add some comments for each of the notify reason.
E.g., from the name PRECOPY_NOTIFY_START_ITERATION seems more like a
hook at the start of each iteration but according to [1] it should be
at the start of migration rather than each iteration (or when
migration restarts, though I'm not sure whether we really have this
yet).


OK. I think it would be better if the name itself could be straightforward.
Probably we could change PRECOPY_NOTIFY_START_ITERATION to
PRECOPY_NOTIFY_START_MIGRATION.



+} PrecopyNotifyReason;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(Notifier *n);
+void precopy_remove_notifier(Notifier *n);
+
  void ram_mig_init(void);
  void qemu_guest_free_page_hint(void *addr, size_t len);
  
diff --git a/migration/ram.c b/migration/ram.c

index 229b791..65b1223 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -292,6 +292,8 @@ struct RAMState {
  bool ram_bulk_stage;
  /* How many times we have dirty too many pages */
  int dirty_rate_high_cnt;
+/* ram save states used for notifiers */
+int ram_save_state;

This can be removed?


Yes, thanks.




  /* these variables are used for bitmap sync */
  /* last time we did a full bitmap_sync */
  int64_t time_last_bitmap_sync;
@@ -328,6 +330,28 @@ typedef struct RAMState RAMState;
  
  static RAMState *ram_state;
  
+static NotifierList precopy_notifier_list;

+
+void precopy_infrastructure_init(void)
+{
+notifier_list_init(&precopy_notifier_list);
+}
+
+void precopy_add_notifier(Notifier *n)
+{
+notifier_list_add(&precopy_notifier_list, n);
+}
+
+void precopy_remove_notifier(Notifier *n)
+{
+notifier_remove(n);
+}
+
+static void precopy_notify(PrecopyNotifyReason reason)
+{
+notifier_list_notify(&precopy_notifier_list, &reason);
+}
+
  uint64_t ram_bytes_remaining(void)
  {
  return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1642,6 +1666,8 @@ static void migration_bitmap_sync(RAMState *rs)
  int64_t end_time;
  uint64_t bytes_xfer_now;
  
+precopy_notify(PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP);

+
  ram_counters.dirty_sync_count++;
  
  if (!rs->time_last_bitmap_sync) {

@@ -1699,6 +1725,8 @@ static void migration_bitmap_sync(RAMState *rs)
  if (migrate_use_events()) {
  qapi_event_send_migration_pass(ram_counters.dirty_sync_count);
  }
+
+precopy_notify(PRECOPY_NOTIFY_AFTER_SYNC_BITMAP);
  }
  
  /**

@@ -2555,6 +2583,8 @@ static void ram_state_reset(RAMState *rs)
  rs->last_page = 0;
  rs->last_version = ram_list.version;
  rs->ram_bulk_stage = true;
+
+precopy_notify(PRECOPY_NOTIFY_START_ITERATION);

[1]


  }
  
  #define MAX_WAIT 50 /* ms, half buffered_file limit */

@@ -3324,6 +3354,7 @@ out:
  
  ret = qemu_file_get_error(f);

  if (ret < 0) {
+precopy_notify(PRECOPY_NOTIFY_ERR);

Could you show me which function is this line in?

Line 3324 here is ram_save_complete(), but I cannot find this exact
place.


Sure, it's in ram_save_iterate():
...
out:
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
qemu_fflush(f);
ram_counters.transferred += 8;

ret = qemu_file_get_error(f);
if (ret < 0) {
+precopy_notify(PRECOPY_NOTIFY_ERR);
return ret;
}

return done;
}




Another thing to mention about the "reasons" (though I see it more
like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
It might help in some cases:

   - then you don't need to trickily export the migrate_postcopy()
 since you'll notify that before postcopy starts


I'm thinking probably we don't need to export migrate_postcopy even now.
It's more like a sanity check, and not needed because now we have the
notifier registered to the precopy specific callchain, which has ensured 
that

it is invoked via precopy.


   - you'll have a solid point that you'll 100% guarantee that we'll
 stop the free page hinting and don't need to worry about whether
 there is chance the hinting will be running without an end [2].


Thanks, I think it makes sense. Plan to add PRECOPY_NOTIFY_END in 
ram_save_complete.





Regarding [2] above: now the series only stops the hinting when
PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP is notified.  Could there be a case
that it's missing?  E.g., what if we cancel/fail a migration during
precopy?  Have you tried it?



I think it has been handled by the above PRECOPY_NOTIFY_ERR

Best,
Wei




Re: [Qemu-devel] [virtio-dev] Re: [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-27 Thread Wei Wang

On 11/27/2018 03:41 PM, Peter Xu wrote:


Ok then I'm fine with it.  Though you could update the comments too if
you like:

 /* protects modification of the bitmap and migration_dirty_pages */
 QemuMutex bitmap_mutex;

And it's tricky that sometimes we don't take the lock when reading
this variable "migration_dirty_pages".  I don't see obvious issue so
far, hope it's true (at least I skipped the colo ones...).


The caller reads the value just to estimate the remaining_size, and
it seems fine without a lock, because whether it reads the value
before or after an update does not seem to cause an issue.
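(For reference, the unlocked read in question is essentially the remaining-size
estimate, which in this series boils down to a single load, e.g.:

    remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE;

as in the ram_bytes_remaining() helper quoted elsewhere in this thread; a
slightly stale value only skews the estimate.)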

Best,
Wei



Re: [Qemu-devel] [PATCH v9 4/8] migration: API to clear bits of guest free pages from the dirty bitmap

2018-11-26 Thread Wei Wang

On 11/27/2018 02:06 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:08:00PM +0800, Wei Wang wrote:
Again, is it possible to resize during migration?

So I think the check is fine, but uncertain about the comment.


Yes, resize would not happen with the current implementation.
But I heard it could just be a temporary implementation. Probably
we could improve the comment like this:

"
Though the implementation might not support ram resize currently,
this could happen in theory with future updates. So the check here
handles the case where the RAMBlock is resized after the free page hint is
reported.
"



And shall we print something if that happened?  We can use
error_report_once(), and squashing the above assert:

   if (!block || offset > block->used_length) {
 /* should never happen, but if it happens we ignore the hints and warn */
 error_report_once("...");
 return;
   }

What do you think?


Sounds good.
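Concretely, the squashed check might end up looking like this (the warning
text below is only illustrative):

    block = qemu_ram_block_from_host(addr, false, &offset);
    if (!block || offset > block->used_length) {
        /*
         * Should never happen, but if it does (e.g. the RAMBlock was
         * resized after the hint was reported), ignore the hint and warn.
         */
        error_report_once("%s: invalid free page hint: addr %p, len %zu",
                          __func__, addr, len);
        return;
    }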




+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+addr += used_len;

Maybe moving this line into the for() could be a bit better?

   for (; len > 0; len -= used_len, addr += used_len) {



Yes, I think it looks better, thanks.
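The loop header then carries both updates, roughly like this (a sketch of the
reworked shape only):

    for (; len > 0; len -= used_len, addr += used_len) {
        block = qemu_ram_block_from_host(addr, false, &offset);
        /* ... RAMBlock sanity check as discussed above ... */

        if (len <= block->used_length - offset) {
            used_len = len;
        } else {
            used_len = block->used_length - offset;
        }

        /* ... clear the corresponding dirty bits under bitmap_mutex ... */
    }

with the explicit "addr += used_len" in the else branch dropped.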

Best,
Wei




Re: [Qemu-devel] [virtio-dev] Re: [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-26 Thread Wei Wang

On 11/27/2018 02:02 PM, Wei Wang wrote:

On 11/27/2018 01:40 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:07:59PM +0800, Wei Wang wrote:

The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  migration/ram.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..ef69dbe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,

  {
  bool ret;
  +qemu_mutex_lock(&rs->bitmap_mutex);
  ret = test_and_clear_bit(page, rb->bmap);
if (ret) {
  rs->migration_dirty_pages--;
  }
+qemu_mutex_unlock(&rs->bitmap_mutex);
+
  return ret;
  }

It seems fine to me, but have you thought about
test_and_clear_bit_atomic()?  Note that we just had
test_and_set_bit_atomic() a few months ago.


Thanks for sharing. I think we might also need to
mutex migration_dirty_pages.



And not related to this patch: I'm unclear on why we have had
bitmap_mutex before, since it seems unnecessary.


OK. This is because with the optimization we have a thread
which clears bits (of free pages) from the bitmap and updates
migration_dirty_pages. So we need synchronization between
the migration thread and the optimization thread.



And before this feature, I think yes, that bitmap_mutex is not needed.
It was left there due to some historical reasons.
I remember Dave previously said he was about to remove it. But the new
feature will need it again.

Best,
Wei



Re: [Qemu-devel] [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-26 Thread Wei Wang

On 11/27/2018 01:40 PM, Peter Xu wrote:

On Thu, Nov 15, 2018 at 06:07:59PM +0800, Wei Wang wrote:

The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  migration/ram.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..ef69dbe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,
  {
  bool ret;
  
+qemu_mutex_lock(&rs->bitmap_mutex);

  ret = test_and_clear_bit(page, rb->bmap);
  
  if (ret) {

  rs->migration_dirty_pages--;
  }
+qemu_mutex_unlock(&rs->bitmap_mutex);
+
  return ret;
  }

It seems fine to me, but have you thought about
test_and_clear_bit_atomic()?  Note that we just had
test_and_set_bit_atomic() a few months ago.


Thanks for sharing. I think we might also need to
mutex migration_dirty_pages.



And not related to this patch: I'm unclear on why we have had
bitmap_mutex before, since it seems unnecessary.


OK. This is because with the optimization we have a thread
which clears bits (of free pages) from the bitmap and updates
migration_dirty_pages. So we need synchronization between
the migration thread and the optimization thread.

Best,
Wei





Re: [Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support

2018-11-26 Thread Wei Wang

On 11/15/2018 06:07 PM, Wei Wang wrote:

This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
 Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
 Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
 2.1 Guest setup: 8G RAM, 4 vCPU
 2.1.1 Idle guest live migration time
 Optimization v.s. Legacy = 620ms vs 2970ms
 --> ~79% reduction
 2.1.2 Guest live migration with Linux compilation workload
   (i.e. make bzImage -j4) running
   1) Live Migration Time:
  Optimization v.s. Legacy = 2273ms v.s. 4502ms
  --> ~50% reduction
   2) Linux Compilation Time:
  Optimization v.s. Legacy = 8min42s v.s. 8min43s
  --> no obvious difference

 2.2 Guest setup: 128G RAM, 4 vCPU
 2.2.1 Idle guest live migration time
 Optimization v.s. Legacy = 5294ms vs 41651ms
 --> ~87% reduction
 2.2.2 Guest live migration with Linux compilation workload
   1) Live Migration Time:
 Optimization v.s. Legacy = 8816ms v.s. 54201ms
 --> 84% reduction
   2) Linux Compilation Time:
  Optimization v.s. Legacy = 8min30s v.s. 8min36s
  --> no obvious difference

ChangeLog:
v8->v9:
bitmap:
 - fix bitmap_count_one to handle the nbits=0 case
migration:
 - replace the ram save notifier chain with a more general precopy
   notifier chain, which is similar to the postcopy notifier chain.
 - Avoid exposing the RAMState struct, and add a function,
   precopy_disable_bulk_stage, to let the virtio-balloon notifier
   callback to disable the bulk stage flag.


Hi Dave and Peter,

Could you continue to review the patches? Thanks!

Best,
Wei



Re: [Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support

2018-11-15 Thread Wei Wang

On 11/16/2018 02:50 AM, no-re...@patchew.org wrote:

Hi,

This series failed docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

   CC  net/filter.o
   CC  net/filter-buffer.o
   CC  net/filter-mirror.o
   CC  net/colo-compare.o
/tmp/qemu-test/src/migration/rdma.c: In function 'qemu_rdma_accept':
/tmp/qemu-test/src/migration/rdma.c:3353:5: error: implicit declaration of 
function 'migrate_postcopy' [-Werror=implicit-function-declaration]
  if (migrate_postcopy() && !rdma->is_return_path) {
  ^
/tmp/qemu-test/src/migration/rdma.c:3353:5: error: nested extern declaration of 
'migrate_postcopy' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [migration/rdma.o] Error 1
make: *** Waiting for unfinished jobs


This is caused by a missing #include of "migration/misc.h" in rdma.c, since
we moved the migrate_postcopy() declaration there. I'll add it.


Best,
Wei



[Qemu-devel] [PATCH v9 8/8] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-11-15 Thread Wei Wang
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in case
an error occurs in the process of ram saving.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c  | 255 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/standard-headers/linux/virtio_balloon.h |   5 +
 3 files changed, 287 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1728e4f..4a3eb30 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/misc.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -308,6 +309,176 @@ out:
 }
 }
 
+static void virtio_balloon_handle_free_page_vq(VirtIODevice *vdev,
+   VirtQueue *vq)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+qemu_bh_schedule(s->free_page_bh);
+}
+
+static bool get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->block_iothread) {
+qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return false;
+}
+
+if (elem->out_num) {
+uint32_t id;
+size_t size = iov_to_buf(elem->out_sg, elem->out_num, 0,
+ &id, sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);
+
+virtio_tswap32s(vdev, &id);
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return false;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+}
+}
+}
+
+if (elem->in_num) {
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 1);
+g_free(elem);
+}
+
+return true;
+}
+
+static void virtio_ballloon_get_free_page_hints(void *opaque)
+{
+VirtIOBalloon *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+bool continue_to_get_hints;
+
+do {
+qemu_mutex_lock(&dev->free_page_lock);
+virtio_queue_set_notification(vq, 0);
+continue_to_get_hints = get_free_page_hints(dev);
+qemu_mutex_unlock(&dev->free_page_lock);
+virtio_notify(vdev, vq);
+  /*
+   * Start to poll the vq once the reporting started. Otherwise, continue
+   * only when there are entries on the vq, which need to be given back.
+   */
+} while (continue_to_get_hints ||
+ dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+virtio_queue_set_notification(vq, 1);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(VirtIOBalloon *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+/* For the stop and copy phase, we don't need to start the optimization */
+if (!vdev->vm_running) {
+return;
+}
+
+if (s->free_page_report_cmd_id == UINT_MAX) {
+s->free_page_report_cmd_id

[Qemu-devel] [PATCH v9 2/8] bitmap: bitmap_count_one_with_offset

2018-11-15 Thread Wei Wang
Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/qemu/bitmap.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 679f1bd..5c31334 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -232,6 +232,19 @@ static inline long bitmap_count_one(const unsigned long 
*bitmap, long nbits)
 }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+long offset, long nbits)
+{
+long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+long redundant_bits = offset - aligned_offset;
+long bits_to_count = nbits + redundant_bits;
+const unsigned long *bitmap_start = bitmap +
+aligned_offset / BITS_PER_LONG;
+
+return bitmap_count_one(bitmap_start, bits_to_count) -
+   bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1
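As a standalone illustration of the arithmetic above (not part of the patch):
the offset is rounded down to a word boundary, and the bits between the
aligned start and the real offset, which get counted once too many, are
subtracted again. Assuming 64-bit longs:

#include <stdio.h>

#define BITS_PER_LONG (8 * (long)sizeof(unsigned long))

/* naive reference counter, one bit at a time */
static long count_ones(const unsigned long *map, long nbits)
{
    long i, n = 0;

    for (i = 0; i < nbits; i++) {
        n += (map[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 1;
    }
    return n;
}

int main(void)
{
    unsigned long bmap[2] = { ~0UL, 0x3UL };  /* bits 0..63, 64 and 65 set */
    long offset = 65, nbits = 10;             /* count bits 65..74 */
    long aligned_offset = (offset / BITS_PER_LONG) * BITS_PER_LONG;  /* 64 */
    long redundant_bits = offset - aligned_offset;                   /* 1  */
    const unsigned long *start = bmap + aligned_offset / BITS_PER_LONG;

    /* 2 (bits 64, 65) - 1 (bit 64) = 1, i.e. only bit 65 is in range */
    printf("%ld\n", count_ones(start, redundant_bits + nbits) -
                    count_ones(start, redundant_bits));
    return 0;
}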




[Qemu-devel] [PATCH v9 6/8] migration/ram.c: add a function to disable the bulk stage

2018-11-15 Thread Wei Wang
This patch adds a function to enable a precopy notifier callback outside
the migration subsystem to disable the bulk stage flag. This is needed by
the free page optimization offered by virtio-balloon.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 1 +
 migration/ram.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 0bac623..67cb275 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -30,6 +30,7 @@ typedef enum PrecopyNotifyReason {
 void precopy_infrastructure_init(void);
 void precopy_add_notifier(Notifier *n);
 void precopy_remove_notifier(Notifier *n);
+void precopy_disable_bulk_stage(void);
 
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index 65b1223..8745ca3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -352,6 +352,15 @@ static void precopy_notify(PrecopyNotifyReason reason)
 notifier_list_notify(&precopy_notifier_list, &reason);
 }
 
+void precopy_disable_bulk_stage(void)
+{
+if (!ram_state) {
+return;
+}
+
+ram_state->ram_bulk_stage = false;
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
-- 
1.8.3.1
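For illustration, the intended use is from a precopy notifier callback,
roughly like this (a sketch only; which event the real virtio-balloon patch
hooks is up to that patch):

    /* inside a precopy notifier callback registered via precopy_add_notifier() */
    if (reason == PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP) {
        /* the free page optimization needs the bulk stage flag cleared */
        precopy_disable_bulk_stage();
    }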




[Qemu-devel] [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-11-15 Thread Wei Wang
The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 migration/ram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec..ef69dbe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1556,11 +1556,14 @@ static inline bool 
migration_bitmap_clear_dirty(RAMState *rs,
 {
 bool ret;
 
+qemu_mutex_lock(&rs->bitmap_mutex);
 ret = test_and_clear_bit(page, rb->bmap);
 
 if (ret) {
 rs->migration_dirty_pages--;
 }
+qemu_mutex_unlock(&rs->bitmap_mutex);
+
 return ret;
 }
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v9 7/8] migration: move migrate_postcopy() to include/migration/misc.h

2018-11-15 Thread Wei Wang
The ram save state notifier callback, for example the free page
optimization offered by virtio-balloon, may need to check if
postcopy is in use, so move migrate_postcopy() to the outside header.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/migration/misc.h | 1 +
 migration/migration.h| 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 67cb275..06b816f 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -70,6 +70,7 @@ bool migration_has_failed(MigrationState *);
 /* ...and after the device transmission */
 bool migration_in_postcopy_after_devices(MigrationState *);
 void migration_global_dump(Monitor *mon);
+bool migrate_postcopy(void);
 
 /* migration/block-dirty-bitmap.c */
 void dirty_bitmap_mig_init(void);
diff --git a/migration/migration.h b/migration/migration.h
index e413d4d..e1e92ff 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -249,8 +249,6 @@ bool migration_is_blocked(Error **errp);
 bool migration_in_postcopy(void);
 MigrationState *migrate_get_current(void);
 
-bool migrate_postcopy(void);
-
 bool migrate_release_ram(void);
 bool migrate_postcopy_ram(void);
 bool migrate_zero_blocks(void);
-- 
1.8.3.1




[Qemu-devel] [PATCH v9 1/8] bitmap: fix bitmap_count_one

2018-11-15 Thread Wei Wang
BITMAP_LAST_WORD_MASK(nbits) returns ~0UL (all bits set) when "nbits = 0", which
makes bitmap_count_one fail to handle the "nbits = 0" case. It is preferred to
keep BITMAP_LAST_WORD_MASK identical to the kernel implementation it is
ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.

Initial Discussion Link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg554316.html
Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..679f1bd 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -221,6 +221,10 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 
 static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
 {
+if (unlikely(!nbits)) {
+return 0;
+}
+
 if (small_nbits(nbits)) {
 return ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
 } else {
-- 
1.8.3.1




[Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support

2018-11-15 Thread Wei Wang
This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not sent by the migration thread to the destination.

*Tests
1 Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms

2 Test Results (results are averaged over several repeated runs)
2.1 Guest setup: 8G RAM, 4 vCPU
2.1.1 Idle guest live migration time
Optimization v.s. Legacy = 620ms vs 2970ms
--> ~79% reduction
2.1.2 Guest live migration with Linux compilation workload
  (i.e. make bzImage -j4) running
  1) Live Migration Time:
 Optimization v.s. Legacy = 2273ms v.s. 4502ms
 --> ~50% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min42s v.s. 8min43s
 --> no obvious difference

2.2 Guest setup: 128G RAM, 4 vCPU
2.2.1 Idle guest live migration time
Optimization v.s. Legacy = 5294ms vs 41651ms
--> ~87% reduction
2.2.2 Guest live migration with Linux compilation workload
  1) Live Migration Time:
Optimization v.s. Legacy = 8816ms v.s. 54201ms
--> 84% reduction
  2) Linux Compilation Time:
 Optimization v.s. Legacy = 8min30s v.s. 8min36s
 --> no obvious difference

ChangeLog:
v8->v9:
bitmap:
- fix bitmap_count_one to handle the nbits=0 case
migration:
- replace the ram save notifier chain with a more general precopy
  notifier chain, which is similar to the postcopy notifier chain.
- Avoid exposing the RAMState struct, and add a function,
  precopy_disable_bulk_stage, to let the virtio-balloon notifier
  callback to disable the bulk stage flag.
virtio-balloon:
- Remove VIRTIO_BALLOON_F_PAGE_POISON from this series as it is not
  a limit to the free page optimization now. Plan to add the device
  support for VIRTIO_BALLOON_F_PAGE_POISON in another patch later.

Previous changelog:
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01908.html

Wei Wang (8):
  bitmap: fix bitmap_count_one
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  migration/ram.c: add a notifier chain for precopy
  migration/ram.c: add a function to disable the bulk stage
  migration: move migrate_postcopy() to include/migration/misc.h
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

 hw/virtio/virtio-balloon.c  | 255 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/migration/misc.h|  16 ++
 include/qemu/bitmap.h   |  17 ++
 include/standard-headers/linux/virtio_balloon.h |   5 +
 migration/migration.h   |   2 -
 migration/ram.c |  91 +
 vl.c|   1 +
 8 files changed, 412 insertions(+), 3 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy

2018-11-15 Thread Wei Wang
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 12 
 migration/ram.c  | 31 +++
 vl.c |  1 +
 3 files changed, 44 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..0bac623 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -19,6 +19,18 @@
 
 /* migration/ram.c */
 
+typedef enum PrecopyNotifyReason {
+PRECOPY_NOTIFY_ERR = 0,
+PRECOPY_NOTIFY_START_ITERATION = 1,
+PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP = 2,
+PRECOPY_NOTIFY_AFTER_SYNC_BITMAP = 3,
+PRECOPY_NOTIFY_MAX = 4,
+} PrecopyNotifyReason;
+
+void precopy_infrastructure_init(void);
+void precopy_add_notifier(Notifier *n);
+void precopy_remove_notifier(Notifier *n);
+
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
 
diff --git a/migration/ram.c b/migration/ram.c
index 229b791..65b1223 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -292,6 +292,8 @@ struct RAMState {
 bool ram_bulk_stage;
 /* How many times we have dirty too many pages */
 int dirty_rate_high_cnt;
+/* ram save states used for notifiers */
+int ram_save_state;
 /* these variables are used for bitmap sync */
 /* last time we did a full bitmap_sync */
 int64_t time_last_bitmap_sync;
@@ -328,6 +330,28 @@ typedef struct RAMState RAMState;
 
 static RAMState *ram_state;
 
+static NotifierList precopy_notifier_list;
+
+void precopy_infrastructure_init(void)
+{
+notifier_list_init(&precopy_notifier_list);
+}
+
+void precopy_add_notifier(Notifier *n)
+{
+notifier_list_add(&precopy_notifier_list, n);
+}
+
+void precopy_remove_notifier(Notifier *n)
+{
+notifier_remove(n);
+}
+
+static void precopy_notify(PrecopyNotifyReason reason)
+{
+notifier_list_notify(&precopy_notifier_list, &reason);
+}
+
 uint64_t ram_bytes_remaining(void)
 {
 return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1642,6 +1666,8 @@ static void migration_bitmap_sync(RAMState *rs)
 int64_t end_time;
 uint64_t bytes_xfer_now;
 
+precopy_notify(PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP);
+
 ram_counters.dirty_sync_count++;
 
 if (!rs->time_last_bitmap_sync) {
@@ -1699,6 +1725,8 @@ static void migration_bitmap_sync(RAMState *rs)
 if (migrate_use_events()) {
 qapi_event_send_migration_pass(ram_counters.dirty_sync_count);
 }
+
+precopy_notify(PRECOPY_NOTIFY_AFTER_SYNC_BITMAP);
 }
 
 /**
@@ -2555,6 +2583,8 @@ static void ram_state_reset(RAMState *rs)
 rs->last_page = 0;
 rs->last_version = ram_list.version;
 rs->ram_bulk_stage = true;
+
+precopy_notify(PRECOPY_NOTIFY_START_ITERATION);
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -3324,6 +3354,7 @@ out:
 
 ret = qemu_file_get_error(f);
 if (ret < 0) {
+precopy_notify(PRECOPY_NOTIFY_ERR);
 return ret;
 }
 
diff --git a/vl.c b/vl.c
index fa25d1a..48ae0e8 100644
--- a/vl.c
+++ b/vl.c
@@ -3057,6 +3057,7 @@ int main(int argc, char **argv, char **envp)
 module_call_init(MODULE_INIT_OPTS);
 
 runstate_init();
+precopy_infrastructure_init();
 postcopy_infrastructure_init();
 monitor_init_globals();
 
-- 
1.8.3.1
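For completeness, a consumer outside the migration code would hook into the
chain roughly as below (a sketch only; the real user is the virtio-balloon
patch in this series, and the callback body here is just a placeholder).
precopy_notify() passes a pointer to the PrecopyNotifyReason as the notifier
data:

static void my_precopy_notify(Notifier *notifier, void *data)
{
    PrecopyNotifyReason reason = *(PrecopyNotifyReason *)data;

    switch (reason) {
    case PRECOPY_NOTIFY_BEFORE_SYNC_BITMAP:
        /* e.g. stop touching the dirty bitmap before the sync starts */
        break;
    case PRECOPY_NOTIFY_AFTER_SYNC_BITMAP:
        /* e.g. safe to resume clearing bits of free pages */
        break;
    default:
        break;
    }
}

static Notifier my_precopy_notifier = { .notify = my_precopy_notify };

/* called once at device init time */
static void my_device_init(void)
{
    precopy_add_notifier(&my_precopy_notifier);
}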




[Qemu-devel] [PATCH v9 4/8] migration: API to clear bits of guest free pages from the dirty bitmap

2018-11-15 Thread Wei Wang
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h |  2 ++
 migration/ram.c  | 48 
 2 files changed, 50 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index ef69dbe..229b791 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3131,6 +3131,54 @@ static void ram_state_resume_prepare(RAMState *rs, 
QEMUFile *out)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+RAMBlock *block;
+ram_addr_t offset;
+size_t used_len, start, npages;
+MigrationState *s = migrate_get_current();
+
+/* This function is currently expected to be used during live migration */
+if (!migration_is_setup_or_active(s->state)) {
+return;
+}
+
+for (; len > 0; len -= used_len) {
+block = qemu_ram_block_from_host(addr, false, &offset);
+assert(block);
+
+/*
+ * This handles the case that the RAMBlock is resized after the free
+ * page hint is reported.
+ */
+if (unlikely(offset > block->used_length)) {
+return;
+}
+
+if (len <= block->used_length - offset) {
+used_len = len;
+} else {
+used_len = block->used_length - offset;
+addr += used_len;
+}
+
+start = offset >> TARGET_PAGE_BITS;
+npages = used_len >> TARGET_PAGE_BITS;
+
+qemu_mutex_lock(&ram_state->bitmap_mutex);
+ram_state->migration_dirty_pages -=
+  bitmap_count_one_with_offset(block->bmap, start, npages);
+bitmap_clear(block->bmap, start, npages);
+qemu_mutex_unlock(&ram_state->bitmap_mutex);
+}
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2] bitmap: fix BITMAP_LAST_WORD_MASK

2018-08-07 Thread Wei Wang

On 08/07/2018 05:53 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 08/07/2018 03:39 PM, Peter Xu wrote:

On Tue, Jul 31, 2018 at 06:01:18PM +0800, Wei Wang wrote:

When "nbits = 0", which means no bits to mask, this macro is expected to
return 0, instead of 0x. This patch changes the macro to return
0 when there is no bit needs to be masked.

Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 

Reviewed-by: Peter Xu 

Is there any existing path that can trigger this nbits==0?

Not sure about other bitmap APIs which call this macro. But it happens in
the patches we are working on, which use bitmap_count_one.
It would be good to have the macro itself handle this corner case, so that
callers won't need to worry about that.

Given that I see you're having a similar discussion on the kernel list
we should see how that pans out before making qemu changes.


OK.
The situation is a little different in Linux, because all the callers
there have already taken responsibility for avoiding the "nbits = 0"
corner case, which is also the reason they want to stick with the old
way. Here in QEMU, most callers (e.g. bitmap_count_one, bitmap_fill,
bitmap_intersects) haven't checked that, so fixing the macro itself
might be a better choice here.


Best,
Wei



Re: [Qemu-devel] [PATCH v2] bitmap: fix BITMAP_LAST_WORD_MASK

2018-08-07 Thread Wei Wang

On 08/07/2018 08:17 PM, Peter Xu wrote:

On Tue, Aug 07, 2018 at 04:21:15PM +0800, Wei Wang wrote:

On 08/07/2018 03:39 PM, Peter Xu wrote:

On Tue, Jul 31, 2018 at 06:01:18PM +0800, Wei Wang wrote:

When "nbits = 0", which means no bits to mask, this macro is expected to
return 0, instead of 0x. This patch changes the macro to return
0 when there is no bit needs to be masked.

Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 

Reviewed-by: Peter Xu 

Is there any existing path that can trigger this nbits==0?

Not sure about other bitmap APIs which call this macro. But it happens in
the patches we are working on, which use bitmap_count_one.
It would be good to have the macro itself handle this corner case, so that
callers won't need to worry about that.

Yeah that makes sense.  Asked since that would matter on whether it's
3.0 material, then it's possibly not.



OK, thanks for checking.

Best,
Wei



Re: [Qemu-devel] [PATCH v2] bitmap: fix BITMAP_LAST_WORD_MASK

2018-08-07 Thread Wei Wang

On 08/07/2018 03:39 PM, Peter Xu wrote:

On Tue, Jul 31, 2018 at 06:01:18PM +0800, Wei Wang wrote:

When "nbits = 0", which means no bits to mask, this macro is expected to
return 0, instead of 0x. This patch changes the macro to return
0 when there is no bit needs to be masked.

Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 

Reviewed-by: Peter Xu 

Is there any existing path that can trigger this nbits==0?


Not sure about other bitmap APIs which call this macro. But it happens 
in the patches we are working on, which use bitmap_count_one.
It would be good to have the macro itself handle this corner case, so 
that callers won't need to worry about that.


Best,
Wei



[Qemu-devel] [PATCH v2] bitmap: fix BITMAP_LAST_WORD_MASK

2018-07-31 Thread Wei Wang
When "nbits = 0", which means no bits to mask, this macro is expected to
return 0, instead of 0x. This patch changes the macro to return
0 when there is no bit needs to be masked.

Signed-off-by: Wei Wang 
CC: Juan Quintela 
CC: Dr. David Alan Gilbert 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

v1->v2 ChangeLog:
- fix the macro directly, instead of fixing the callers one by one.

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..9372423 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -60,7 +60,10 @@
  */
 
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
-#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
+#define BITMAP_LAST_WORD_MASK(nbits)   \
+(  \
+nbits ? (~0UL >> (-(nbits) & (BITS_PER_LONG - 1))) : 0 \
+)
 
 #define DECLARE_BITMAP(name,bits)  \
 unsigned long name[BITS_TO_LONGS(bits)]
-- 
2.7.4
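A tiny standalone check of the corner case (assuming a 64-bit host, where
~0UL is 0xffffffffffffffff):

#include <stdio.h>

#define BITS_PER_LONG   (8 * (int)sizeof(unsigned long))
#define OLD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
#define NEW_MASK(nbits) ((nbits) ? (~0UL >> (-(nbits) & (BITS_PER_LONG - 1))) : 0)

int main(void)
{
    /* with nbits == 0 the old macro keeps every bit, the fixed one keeps none */
    printf("old: %#lx\n", OLD_MASK(0));                         /* all ones */
    printf("new: %#lx\n", NEW_MASK(0));                         /* 0 */
    printf("old: %#lx new: %#lx\n", OLD_MASK(5), NEW_MASK(5));  /* both 0x1f */
    return 0;
}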




Re: [Qemu-devel] [PATCH 1/2] bitmap.h: add comments to BITMAP_LAST_WORD_MASK

2018-07-31 Thread Wei Wang

On 07/30/2018 09:19 PM, Juan Quintela wrote:

Wei Wang  wrote:

On 07/30/2018 03:13 PM, Wei Wang wrote:

This macro was ported from Linux and we've reached an agreement there
that the corner case "nbits = 0" is not applicable to this macro, because
when "nbits = 0", which means there are no bits to mask, this macro is expected
to return 0 instead of ~0UL (all bits set). This patch simply adds a comment above
the macro as a note to users about the corner case.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
   include/qemu/bitmap.h | 1 +
   1 file changed, 1 insertion(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..f53c640 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -60,6 +60,7 @@
*/
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) &
(BITS_PER_LONG - 1)))
+/* "nbits = 0" is not applicable to this macro. Callers should avoid that. */
   #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 
1)))
 #define DECLARE_BITMAP(name,bits)  \

A better fix would be to directly change the macro to:  nbits ? (~0UL >>
(-(nbits) & (BITS_PER_LONG - 1))) : 0,

so that we don't need to fix other callers like bitmap_full,
bitmap_intersects.

So just post this out for a discussion whether it's preferred to just
adding note comments as we did for linux or fixing the macro directly.

Best,
Wei

On one hand:
a - we have copied it from Linux, we don't want to diverge
On the other hand:
b - it is much easier to use if we change the macro

So, it is a tough one.

I slightly prefer b), but will not object to a either.  As you are the
one doing the patch, your choice.


Thanks Juan. I plan to choose b - fixing the macro directly.

Best,
Wei



Re: [Qemu-devel] [PATCH 1/2] bitmap.h: add comments to BITMAP_LAST_WORD_MASK

2018-07-30 Thread Wei Wang

On 07/30/2018 03:13 PM, Wei Wang wrote:

This macro was ported from Linux and we've reached an agreement there
that the corner case "nbits = 0" is not applicable to this macro, because
when "nbits = 0", which means there are no bits to mask, this macro is expected
to return 0 instead of ~0UL (all bits set). This patch simply adds a comment above
the macro as a note to users about the corner case.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
  include/qemu/bitmap.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..f53c640 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -60,6 +60,7 @@
   */
  
  #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))

+/* "nbits = 0" is not applicable to this macro. Callers should avoid that. */
  #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 
1)))
  
  #define DECLARE_BITMAP(name,bits)  \


A better fix would be to directly change the macro to:  nbits ? (~0UL >> 
(-(nbits) & (BITS_PER_LONG - 1))) : 0,
so that we don't need to fix other callers like bitmap_full, 
bitmap_intersects.


So just post this out for a discussion whether it's preferred to just 
adding note comments as we did for linux or fixing the macro directly.


Best,
Wei



[Qemu-devel] [PATCH 1/2] bitmap.h: add comments to BITMAP_LAST_WORD_MASK

2018-07-30 Thread Wei Wang
This macro was ported from Linux and we've reached an agreement there
that the corner case "nbits = 0" is not applicable to this macro, because
when "nbits = 0", which means there are no bits to mask, this macro is expected
to return 0 instead of ~0UL (all bits set). This patch simply adds a comment above
the macro as a note to users about the corner case.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..f53c640 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -60,6 +60,7 @@
  */
 
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
+/* "nbits = 0" is not applicable to this macro. Callers should avoid that. */
 #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
 
 #define DECLARE_BITMAP(name,bits)  \
-- 
2.7.4




[Qemu-devel] [PATCH 0/2] bitmap: some fixes

2018-07-30 Thread Wei Wang
"nbits = 0" is not applicable to BITMAP_LAST_WORD_MASK, callers should
avoid that. Fix bitmap_count_one to avoid passing 0 to the macro.

Wei Wang (2):
  bitmap.h: add comments to BITMAP_LAST_WORD_MASK
  bitmap: fix bitmap_count_one

 include/qemu/bitmap.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.7.4




[Qemu-devel] [PATCH 2/2] bitmap: fix bitmap_count_one

2018-07-30 Thread Wei Wang
Since "nbits = 0" is not applicable to the BITMAP_LAST_WORD_MASK macro,
callers need to avoid passing "nbits = 0" to this macro, which generates
incorrect results.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 include/qemu/bitmap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index f53c640..e45e9c0 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -223,7 +223,7 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
 {
 if (small_nbits(nbits)) {
-return ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
+return nbits ? ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits)) : 0;
 } else {
 return slow_bitmap_count_one(bitmap, nbits);
 }
-- 
2.7.4




Re: [Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer

2018-06-28 Thread Wei Wang

On 06/28/2018 06:02 PM, Xiao Guangrong wrote:


CC: Paul, Peter Zijlstra, Stefani, Lai, who are all good at memory
barriers.



On 06/20/2018 12:52 PM, Peter Xu wrote:
On Mon, Jun 04, 2018 at 05:55:17PM +0800, guangrong.x...@gmail.com 
wrote:

From: Xiao Guangrong 

It's a simple lockless ring buffer implementation which supports both
single producer vs. single consumer and multiple producers vs.
single consumer.

Many lessons were learned from the Linux kernel's kfifo (1) and DPDK's
rte_ring (2) before I wrote this implementation. It corrects some memory
barrier bugs in kfifo and it is a simpler lockless version of
rte_ring, as currently multiple access is only allowed for the producer.


Could you provide some more information about the kfifo bug? Any
pointer would be appreciated.



Sure, I reported one of the memory barrier issues to the Linux kernel:
   https://lkml.org/lkml/2018/5/11/58

Actually, besides that, there is another memory barrier issue in kfifo;
please consider this case:

   at the beginning
   ring->size = 4
   ring->out = 0
   ring->in = 4

 Consumer                                Producer
 --------                                --------
 index = ring->out; /* index == 0 */
 ring->out++;       /* ring->out == 1 */
     < Re-Order >
                                         out = ring->out;
                                         if (ring->in - out >= ring->mask)
                                             return -EFULL;
                                         /* see the ring is not full */
                                         index = ring->in & ring->mask; /* index == 0 */
                                         ring->data[index] = new_data;
                                         ring->in++;

 data = ring->data[index];
 !! the old data is lost !!

So we need to make sure:
1) for the consumer, we should read the ring->data[] out before 
updating ring->out
2) for the producer, we should read ring->out before updating 
ring->data[]


as follows:

  Producer                              Consumer
  --------                              --------
  Reading ring->out                     Reading ring->data[index]

  smp_mb()                              smp_mb()

  Setting ring->data[index] = data      ring->out++

[ I used atomic_store_release() and atomic_load_acquire() instead of
  smp_mb() in the patch. ]

But I am not sure whether we can use smp_acquire__after_ctrl_dep() in the
producer.



I wonder if this could be solved by simply tweaking the above consumer 
implementation:


[1] index = ring->out;
[2] data = ring->data[index];
[3] index++;
[4] ring->out = index;

Now [2] and [3] form a WAR dependency, which avoids the reordering.
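For reference, the rule spelled out above (consumer: read the slot, then
publish the new 'out'; producer: observe 'out', then write the slot) maps
onto acquire/release ordering like the standalone single-producer /
single-consumer sketch below. This is not code from the series; it uses plain
C11 atomics for self-containment, where the series itself uses QEMU's
atomic_load_acquire()/atomic_store_release() helpers:

#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 4                      /* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

struct ring {
    _Atomic unsigned long in;            /* written by the producer only */
    _Atomic unsigned long out;           /* written by the consumer only */
    void *data[RING_SIZE];
};

/* producer: acquire 'out' before writing the slot, release 'in' after it */
int ring_put(struct ring *r, void *item)
{
    unsigned long in = atomic_load_explicit(&r->in, memory_order_relaxed);
    unsigned long out = atomic_load_explicit(&r->out, memory_order_acquire);

    if (in - out >= RING_SIZE) {
        return -1;                       /* full */
    }
    r->data[in & RING_MASK] = item;
    atomic_store_explicit(&r->in, in + 1, memory_order_release);
    return 0;
}

/* consumer: acquire 'in' before reading the slot, release 'out' after it */
void *ring_get(struct ring *r)
{
    unsigned long out = atomic_load_explicit(&r->out, memory_order_relaxed);
    unsigned long in = atomic_load_explicit(&r->in, memory_order_acquire);
    void *item;

    if (in == out) {
        return NULL;                     /* empty */
    }
    item = r->data[out & RING_MASK];
    atomic_store_explicit(&r->out, out + 1, memory_order_release);
    return item;
}

The producer's acquire load of 'out' pairs with the consumer's release store
of 'out', which is exactly what rules out the lost-update interleaving shown
earlier.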


Best,
Wei



Re: [Qemu-devel] [PATCH v8 4/6] migration/ram.c: add ram save state notifiers

2018-06-20 Thread Wei Wang

On 06/20/2018 01:31 AM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 06/12/2018 03:50 PM, Peter Xu wrote:

On Fri, Jun 08, 2018 at 04:10:41PM +0800, Wei Wang wrote:

This patch adds a ram save state notifier list, and exposes RAMState for
the notifier callbacks to use.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
   include/migration/misc.h | 52 +++
   migration/ram.c  | 64 
+---
   2 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..b970d7d 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -16,9 +16,61 @@
   #include "exec/cpu-common.h"
   #include "qemu/notify.h"
+#include "qemu/thread.h"
   /* migration/ram.c */
+typedef enum RamSaveState {
+RAM_SAVE_ERR = 0,
+RAM_SAVE_RESET = 1,
+RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
+RAM_SAVE_AFTER_SYNC_BITMAP = 3,
+RAM_SAVE_MAX = 4,
+} RamSaveState;
+
+/* State of RAM for migration */
+typedef struct RAMState {
+/* QEMUFile used for this migration */
+QEMUFile *f;
+/* Last block that we have visited searching for dirty pages */
+RAMBlock *last_seen_block;
+/* Last block from where we have sent data */
+RAMBlock *last_sent_block;
+/* Last dirty target page we have sent */
+ram_addr_t last_page;
+/* last ram version we have seen */
+uint32_t last_version;
+/* We are in the first round */
+bool ram_bulk_stage;
+/* How many times we have dirty too many pages */
+int dirty_rate_high_cnt;
+/* ram save states used for notifiers */
+int ram_save_state;
+/* these variables are used for bitmap sync */
+/* last time we did a full bitmap_sync */
+int64_t time_last_bitmap_sync;
+/* bytes transferred at start_time */
+uint64_t bytes_xfer_prev;
+/* number of dirty pages since start_time */
+uint64_t num_dirty_pages_period;
+/* xbzrle misses since the beginning of the period */
+uint64_t xbzrle_cache_miss_prev;
+/* number of iterations at the beginning of period */
+uint64_t iterations_prev;
+/* Iterations since start */
+uint64_t iterations;
+/* number of dirty bits in the bitmap */
+uint64_t migration_dirty_pages;
+/* protects modification of the bitmap */
+QemuMutex bitmap_mutex;
+/* The RAMBlock used in the last src_page_requests */
+RAMBlock *last_req_rb;
+/* Queue of outstanding page requests from the destination */
+QemuMutex src_page_req_mutex;
+QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
+} RAMState;

Why move this chunk?  Can it be avoided?


+
+void add_ram_save_state_change_notifier(Notifier *notify);
   void ram_mig_init(void);
   void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index 237f11e..d45b5a4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -56,6 +56,9 @@
   #include "qemu/uuid.h"
   #include "savevm.h"
+static NotifierList ram_save_state_notifiers =
+NOTIFIER_LIST_INITIALIZER(ram_save_state_notifiers);
+
   /***/
   /* ram save/restore */
@@ -267,49 +270,18 @@ struct RAMSrcPageRequest {
   QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
   };
-/* State of RAM for migration */
-struct RAMState {
-/* QEMUFile used for this migration */
-QEMUFile *f;
-/* Last block that we have visited searching for dirty pages */
-RAMBlock *last_seen_block;
-/* Last block from where we have sent data */
-RAMBlock *last_sent_block;
-/* Last dirty target page we have sent */
-ram_addr_t last_page;
-/* last ram version we have seen */
-uint32_t last_version;
-/* We are in the first round */
-bool ram_bulk_stage;
-/* How many times we have dirty too many pages */
-int dirty_rate_high_cnt;
-/* these variables are used for bitmap sync */
-/* last time we did a full bitmap_sync */
-int64_t time_last_bitmap_sync;
-/* bytes transferred at start_time */
-uint64_t bytes_xfer_prev;
-/* number of dirty pages since start_time */
-uint64_t num_dirty_pages_period;
-/* xbzrle misses since the beginning of the period */
-uint64_t xbzrle_cache_miss_prev;
-/* number of iterations at the beginning of period */
-uint64_t iterations_prev;
-/* Iterations since start */
-uint64_t iterations;
-/* number of dirty bits in the bitmap */
-uint64_t migration_dirty_pages;
-/* protects modification of the bitmap */
-QemuMutex bitmap_mutex;
-/* The RAMBlock used in the last src_page_requests */
-RAMBlock *last_req_rb;
-/* Queue of outstanding page requests from the destination */
-QemuMutex src_page_req_mutex;
-

Re: [Qemu-devel] [PATCH v8 4/6] migration/ram.c: add ram save state notifiers

2018-06-12 Thread Wei Wang

On 06/12/2018 03:50 PM, Peter Xu wrote:

On Fri, Jun 08, 2018 at 04:10:41PM +0800, Wei Wang wrote:

This patch adds a ram save state notifier list, and exposes RAMState for
the notifier callbacks to use.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
  include/migration/misc.h | 52 +++
  migration/ram.c  | 64 +---
  2 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..b970d7d 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -16,9 +16,61 @@
  
  #include "exec/cpu-common.h"

  #include "qemu/notify.h"
+#include "qemu/thread.h"
  
  /* migration/ram.c */
  
+typedef enum RamSaveState {

+RAM_SAVE_ERR = 0,
+RAM_SAVE_RESET = 1,
+RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
+RAM_SAVE_AFTER_SYNC_BITMAP = 3,
+RAM_SAVE_MAX = 4,
+} RamSaveState;
+
+/* State of RAM for migration */
+typedef struct RAMState {
+/* QEMUFile used for this migration */
+QEMUFile *f;
+/* Last block that we have visited searching for dirty pages */
+RAMBlock *last_seen_block;
+/* Last block from where we have sent data */
+RAMBlock *last_sent_block;
+/* Last dirty target page we have sent */
+ram_addr_t last_page;
+/* last ram version we have seen */
+uint32_t last_version;
+/* We are in the first round */
+bool ram_bulk_stage;
+/* How many times we have dirty too many pages */
+int dirty_rate_high_cnt;
+/* ram save states used for notifiers */
+int ram_save_state;
+/* these variables are used for bitmap sync */
+/* last time we did a full bitmap_sync */
+int64_t time_last_bitmap_sync;
+/* bytes transferred at start_time */
+uint64_t bytes_xfer_prev;
+/* number of dirty pages since start_time */
+uint64_t num_dirty_pages_period;
+/* xbzrle misses since the beginning of the period */
+uint64_t xbzrle_cache_miss_prev;
+/* number of iterations at the beginning of period */
+uint64_t iterations_prev;
+/* Iterations since start */
+uint64_t iterations;
+/* number of dirty bits in the bitmap */
+uint64_t migration_dirty_pages;
+/* protects modification of the bitmap */
+QemuMutex bitmap_mutex;
+/* The RAMBlock used in the last src_page_requests */
+RAMBlock *last_req_rb;
+/* Queue of outstanding page requests from the destination */
+QemuMutex src_page_req_mutex;
+QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
+} RAMState;

Why move this chunk?  Can it be avoided?


+
+void add_ram_save_state_change_notifier(Notifier *notify);
  void ram_mig_init(void);
  void qemu_guest_free_page_hint(void *addr, size_t len);
  
diff --git a/migration/ram.c b/migration/ram.c

index 237f11e..d45b5a4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -56,6 +56,9 @@
  #include "qemu/uuid.h"
  #include "savevm.h"
  
+static NotifierList ram_save_state_notifiers =

+NOTIFIER_LIST_INITIALIZER(ram_save_state_notifiers);
+
  /***/
  /* ram save/restore */
  
@@ -267,49 +270,18 @@ struct RAMSrcPageRequest {

  QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
  };
  
-/* State of RAM for migration */

-struct RAMState {
-/* QEMUFile used for this migration */
-QEMUFile *f;
-/* Last block that we have visited searching for dirty pages */
-RAMBlock *last_seen_block;
-/* Last block from where we have sent data */
-RAMBlock *last_sent_block;
-/* Last dirty target page we have sent */
-ram_addr_t last_page;
-/* last ram version we have seen */
-uint32_t last_version;
-/* We are in the first round */
-bool ram_bulk_stage;
-/* How many times we have dirty too many pages */
-int dirty_rate_high_cnt;
-/* these variables are used for bitmap sync */
-/* last time we did a full bitmap_sync */
-int64_t time_last_bitmap_sync;
-/* bytes transferred at start_time */
-uint64_t bytes_xfer_prev;
-/* number of dirty pages since start_time */
-uint64_t num_dirty_pages_period;
-/* xbzrle misses since the beginning of the period */
-uint64_t xbzrle_cache_miss_prev;
-/* number of iterations at the beginning of period */
-uint64_t iterations_prev;
-/* Iterations since start */
-uint64_t iterations;
-/* number of dirty bits in the bitmap */
-uint64_t migration_dirty_pages;
-/* protects modification of the bitmap */
-QemuMutex bitmap_mutex;
-/* The RAMBlock used in the last src_page_requests */
-RAMBlock *last_req_rb;
-/* Queue of outstanding page requests from the destination */
-QemuMutex src_page_req_mutex;
-QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
-};
-typedef struct RAMState 

[Qemu-devel] [PATCH v8 6/6] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-08 Thread Wei Wang
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration ram save state notifier list.
The notifier calls free_page_start after the migration thread syncs the
dirty bitmap, so that the free page hinting optimization starts to clear
bits of free pages from the bitmap. It calls the free_page_stop
before the migration thread syncs the bitmap, which is the end of the
current round of ram save. The free_page_stop is also called to stop the
optimization in case an error happens in the process of ram save.

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

TODO:
- If free pages are poisoned by guest, the hints are dropped currently.
  We will support clearing bits of poisoned pages from the bitmap in the
  future.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 
---
 hw/virtio/virtio-balloon.c  | 260 
 include/hw/virtio/virtio-balloon.h  |  28 ++-
 include/standard-headers/linux/virtio_balloon.h |   7 +
 3 files changed, 294 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1f7a87f..a7bb971 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -28,6 +28,7 @@
 #include "qapi/visitor.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/misc.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -310,6 +311,166 @@ out:
 }
 }
 
+static void get_free_page_hints(VirtIOBalloon *dev)
+{
+VirtQueueElement *elem;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+uint32_t id;
+size_t size;
+
+while (dev->block_iothread) {
+qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
+}
+
+/*
+ * If the migration thread actively stops the reporting, exit
+ * immediately.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
+return;
+}
+
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return;
+}
+
+if (elem->out_num) {
+size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
+virtqueue_push(vq, elem, size);
+g_free(elem);
+
+virtio_tswap32s(vdev, &id);
+if (unlikely(size != sizeof(id))) {
+virtio_error(vdev, "received an incorrect cmd id");
+return;
+}
+if (id == dev->free_page_report_cmd_id) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+} else {
+/*
+ * Stop the optimization only when it has started. This
+ * avoids a stale stop sign for the previous command.
+ */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+return;
+}
+}
+}
+
+if (elem->in_num) {
+/* TODO: send the poison value to the destination */
+if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
+!dev->poison_val) {
+qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+  elem->in_sg[0].iov_len);
+}
+virtqueue_push(vq, elem, 0);
+g_free(elem);
+}
+}
+
+static void virtio_balloon_poll_free_page_hints(void *opaque)
+{
+VirtIOBalloon *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtQueue *vq = dev->free_page_vq;
+
+while (dev->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+qemu_mutex_lock(&dev->free_page_lock);
+get_free_page_hints(dev);
+qemu_mutex_unlock(&dev->free_page_lock);
+}
+virtio_notify(vdev, vq);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static bool virtio_balloon_page_poison_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+}
+
+static void virtio_balloon_free_page_start(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+/* For the stop and copy phase, we don't need to start the optimization */
+if (!vdev->vm_running) {
+return;
+}
+
+if (s-

[Qemu-devel] [PATCH v8 4/6] migration/ram.c: add ram save state notifiers

2018-06-08 Thread Wei Wang
This patch adds a ram save state notifier list, and exposes RAMState for
the notifier callbacks to use.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 include/migration/misc.h | 52 +++
 migration/ram.c  | 64 +---
 2 files changed, 75 insertions(+), 41 deletions(-)
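
Note (illustrative only, not part of this patch): a minimal usage sketch of
the notifier added here, showing how a consumer such as the balloon device
might register for and react to the new states. It assumes the payload
handed to the callback is the exposed RAMState pointer, with its
ram_save_state field holding the state being entered; the notify call site
is truncated in this excerpt, so treat that as an assumption.

#include "migration/misc.h"
#include "qemu/notify.h"

static void ram_save_state_changed(Notifier *notifier, void *data)
{
    RAMState *rs = data;    /* assumed payload: the exposed RAMState */

    switch (rs->ram_save_state) {
    case RAM_SAVE_BEFORE_SYNC_BITMAP:
        /* e.g. start free page hinting before the dirty bitmap sync */
        break;
    case RAM_SAVE_AFTER_SYNC_BITMAP:
        /* e.g. stop the hinting once the sync has completed */
        break;
    default:
        /* RAM_SAVE_ERR, RAM_SAVE_RESET: nothing to do in this sketch */
        break;
    }
}

static Notifier ram_save_state_notifier = {
    .notify = ram_save_state_changed,
};

static void example_register(void)
{
    add_ram_save_state_change_notifier(&ram_save_state_notifier);
}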

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 113320e..b970d7d 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -16,9 +16,61 @@
 
 #include "exec/cpu-common.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 
 /* migration/ram.c */
 
+typedef enum RamSaveState {
+RAM_SAVE_ERR = 0,
+RAM_SAVE_RESET = 1,
+RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
+RAM_SAVE_AFTER_SYNC_BITMAP = 3,
+RAM_SAVE_MAX = 4,
+} RamSaveState;
+
+/* State of RAM for migration */
+typedef struct RAMState {
+/* QEMUFile used for this migration */
+QEMUFile *f;
+/* Last block that we have visited searching for dirty pages */
+RAMBlock *last_seen_block;
+/* Last block from where we have sent data */
+RAMBlock *last_sent_block;
+/* Last dirty target page we have sent */
+ram_addr_t last_page;
+/* last ram version we have seen */
+uint32_t last_version;
+/* We are in the first round */
+bool ram_bulk_stage;
+/* How many times we have dirty too many pages */
+int dirty_rate_high_cnt;
+/* ram save states used for notifiers */
+int ram_save_state;
+/* these variables are used for bitmap sync */
+/* last time we did a full bitmap_sync */
+int64_t time_last_bitmap_sync;
+/* bytes transferred at start_time */
+uint64_t bytes_xfer_prev;
+/* number of dirty pages since start_time */
+uint64_t num_dirty_pages_period;
+/* xbzrle misses since the beginning of the period */
+uint64_t xbzrle_cache_miss_prev;
+/* number of iterations at the beginning of period */
+uint64_t iterations_prev;
+/* Iterations since start */
+uint64_t iterations;
+/* number of dirty bits in the bitmap */
+uint64_t migration_dirty_pages;
+/* protects modification of the bitmap */
+QemuMutex bitmap_mutex;
+/* The RAMBlock used in the last src_page_requests */
+RAMBlock *last_req_rb;
+/* Queue of outstanding page requests from the destination */
+QemuMutex src_page_req_mutex;
+QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
+} RAMState;
+
+void add_ram_save_state_change_notifier(Notifier *notify);
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
 
diff --git a/migration/ram.c b/migration/ram.c
index 237f11e..d45b5a4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -56,6 +56,9 @@
 #include "qemu/uuid.h"
 #include "savevm.h"
 
+static NotifierList ram_save_state_notifiers =
+NOTIFIER_LIST_INITIALIZER(ram_save_state_notifiers);
+
 /***/
 /* ram save/restore */
 
@@ -267,49 +270,18 @@ struct RAMSrcPageRequest {
 QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
-/* State of RAM for migration */
-struct RAMState {
-/* QEMUFile used for this migration */
-QEMUFile *f;
-/* Last block that we have visited searching for dirty pages */
-RAMBlock *last_seen_block;
-/* Last block from where we have sent data */
-RAMBlock *last_sent_block;
-/* Last dirty target page we have sent */
-ram_addr_t last_page;
-/* last ram version we have seen */
-uint32_t last_version;
-/* We are in the first round */
-bool ram_bulk_stage;
-/* How many times we have dirty too many pages */
-int dirty_rate_high_cnt;
-/* these variables are used for bitmap sync */
-/* last time we did a full bitmap_sync */
-int64_t time_last_bitmap_sync;
-/* bytes transferred at start_time */
-uint64_t bytes_xfer_prev;
-/* number of dirty pages since start_time */
-uint64_t num_dirty_pages_period;
-/* xbzrle misses since the beginning of the period */
-uint64_t xbzrle_cache_miss_prev;
-/* number of iterations at the beginning of period */
-uint64_t iterations_prev;
-/* Iterations since start */
-uint64_t iterations;
-/* number of dirty bits in the bitmap */
-uint64_t migration_dirty_pages;
-/* protects modification of the bitmap */
-QemuMutex bitmap_mutex;
-/* The RAMBlock used in the last src_page_requests */
-RAMBlock *last_req_rb;
-/* Queue of outstanding page requests from the destination */
-QemuMutex src_page_req_mutex;
-QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
-};
-typedef struct RAMState RAMState;
-
 static RAMState *ram_state;
 
+void add_ram_save_state_change_notifier(Notifier *notify)
+{
+notifier_list_add(&ram_save_state_notifiers, notify);
+}
+
+stat

[Qemu-devel] [PATCH v8 2/6] migration: use bitmap_mutex in migration_bitmap_clear_dirty

2018-06-08 Thread Wei Wang
The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Michael S. Tsirkin 
CC: Peter Xu 
---
 migration/ram.c | 3 +++
 1 file changed, 3 insertions(+)
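
For context (illustrative only, not part of this patch): a simplified
sketch of the iothread-side path the commit message refers to, assuming
qemu_guest_free_page_hint() (declared in migration/misc.h by this series)
drops hinted pages from the dirty bitmap under the same mutex. The block
lookup and bookkeeping details below are assumptions for illustration,
not the actual implementation.

void qemu_guest_free_page_hint(void *addr, size_t len)
{
    RAMBlock *block;
    ram_addr_t offset;
    size_t start, npages, i;

    /* Hypothetical lookup of the RAMBlock backing the hinted range */
    block = qemu_ram_block_from_host(addr, false, &offset);
    if (!block || !block->bmap || offset + len > block->used_length) {
        /* no block, or no migration bitmap in flight: nothing to clear */
        return;
    }
    start = offset >> TARGET_PAGE_BITS;
    npages = len >> TARGET_PAGE_BITS;

    /* Same mutex that migration_bitmap_clear_dirty() now takes */
    qemu_mutex_lock(&ram_state->bitmap_mutex);
    for (i = 0; i < npages; i++) {
        if (test_and_clear_bit(start + i, block->bmap)) {
            ram_state->migration_dirty_pages--;
        }
    }
    qemu_mutex_unlock(&ram_state->bitmap_mutex);
}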

diff --git a/migration/ram.c b/migration/ram.c
index c53e836..2eabbe9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1093,11 +1093,14 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 {
 bool ret;
 
+qemu_mutex_lock(&rs->bitmap_mutex);
 ret = test_and_clear_bit(page, rb->bmap);
 
 if (ret) {
 rs->migration_dirty_pages--;
 }
+qemu_mutex_unlock(&rs->bitmap_mutex);
+
 return ret;
 }
 
-- 
1.8.3.1



